A database is an organized collection of information, structured so that it can be retrieved, managed, and updated. Almost every software project needs a reliable database, and the choice of database is a strategic decision that shapes the software architecture.
There are mainly two types of databases.
SQL – Such as Oracle, MySQL, PostgreSQL, Microsoft SQL Server
NoSQL – Such as Couchbase, MongoDB, Redis, Cassandra
NoSQL stands for ‘not only SQL’, whereas SQL databases are relational.
A NoSQL database is a non-relational database system used to store and retrieve data. It is not always possible to store data in tables with fixed schemas; examples include geolocation data, social graphs, IoT data, and user-generated content. On top of that, data volumes are now growing exponentially.
Data needs to be stored, processed, retrieved, and interpreted before it can be put to use, and NoSQL databases are built for this. With a NoSQL database, storing and retrieving documents, graph-based data, and key-value data is quick and straightforward, so intricate SQL joins can often be avoided. NoSQL databases also scale horizontally for web and enterprise business applications.
Simplicity of design and the ability to scale horizontally, which is challenging to accomplish with RDBMS databases, are what win favour for NoSQL databases.
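To make "scaling horizontally" concrete, here is a minimal sketch of hash-based sharding, where each key is routed to one of several independent stores. The names `shard_for` and `ShardedStore` are hypothetical, and plain dictionaries stand in for separate servers; real NoSQL systems use more elaborate placement schemes.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy store that spreads keys over several dicts, one per 'server'."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=4)
store.put("user:1", {"name": "Ada"})
store.put("user:2", {"name": "Bob"})
print(store.get("user:1"))
```

One caveat of this naive modulo scheme is that changing `num_shards` remaps most keys, which is why production systems tend to prefer schemes such as consistent hashing.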
NoSQL Database Types
Document Databases – Document databases pair each key with a complex data structure termed a document. A document can contain key-value pairs, key-array pairs, or nested documents. Document NoSQL databases include MongoDB, Couchbase, ArangoDB, Apache CouchDB, Cosmos DB, OrientDB, IBM Domino, and MarkLogic.
Key-value stores – Every item is stored as a key-value pair. Key-value stores are among the simplest NoSQL databases. Examples are Redis, Apache Ignite, Memcached, and Riak.
Wide-column stores – Data is stored in columns rather than rows, which makes queries over large datasets efficient. Examples are Cassandra, Scylla, and HBase.
Graph stores – When the data takes the form of graphs, such as social connections, networks, road maps, and transport links, a graph database such as AllegroGraph or Neo4j is used.
Elasticsearch is an open-source, scalable, distributed, enterprise-grade search engine that offers real-time full-text search.
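The four data models above can be pictured with plain Python structures. This is only an illustrative sketch: the keys, field names, and the `friends_of_friends` helper are invented for the example and do not correspond to any particular product's API.

```python
# Document model: one key maps to a nested, self-describing structure.
document_store = {
    "user:42": {
        "name": "Ada",
        "emails": ["ada@example.com"],
        "address": {"city": "London"},
    }
}

# Key-value model: the store treats the value as an opaque blob.
key_value_store = {"session:abc": "user:42"}

# Wide-column model: row key -> column family -> column -> value.
# Rows in the same table do not need to share the same columns.
wide_column_store = {
    "user:42": {"profile": {"name": "Ada"}, "activity": {"last_login": "2024-01-01"}},
    "user:43": {"profile": {"name": "Bob", "nickname": "bobby"}},
}

# Graph model: nodes plus edges, here as an adjacency list.
graph_store = {"Ada": {"Bob"}, "Bob": {"Ada", "Cy"}, "Cy": {"Bob"}}


def friends_of_friends(graph, person):
    """Return people two hops away, excluding the person and direct friends."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


print(friends_of_friends(graph_store, "Ada"))
```

The graph traversal is the kind of query that would need repeated self-joins in a relational schema, which is the main reason graph stores exist as a separate category.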
Here we shall take a look at four NoSQL databases in detail.
Cassandra can handle large quantities of structured data. It was developed at Facebook to power inbox search and is a distributed data storage system. The data is usually spread across several commodity servers. Storage capacity can be increased by adding nodes while the service stays online. It does not involve complex structures, as all nodes in a cluster play the same role.
Cassandra is written in Java, is queried through CQL (a SQL-like language), and is among the most widely used open-source databases, with users including Facebook, Cisco, Twitter, eBay, Netflix, Rackspace, and others.
- Scalable linearly
- Has quick response time
- Properties supported: atomicity, isolation, and durability at the row level, with tunable consistency rather than full ACID transactions
- MapReduce is supported with Apache Hadoop
- Data distribution flexibility is good
- Peer-to-peer architecture
- Highly scalable
- No single point of failure
- Multi-DC Replication
- Can integrate over other JVM applications
- Suitable for redundancy, multiple data center deployments, failover, and disaster recovery
- Poor support for aggregations
- Performance can be unpredictable
- Ad-hoc queries not supported
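The peer-to-peer architecture and data distribution listed above are commonly explained with a consistent hash ring. The following is a simplified sketch of that idea, not Cassandra's actual implementation: the `HashRing` class, the use of SHA-256 for tokens, and the default of 64 virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


def _token(value: str) -> int:
    """Derive a ring position from a string (SHA-256 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent hash ring with virtual nodes and replica selection."""

    def __init__(self, nodes, vnodes: int = 64):
        self._ring = []  # sorted list of (token, node name)
        self._vnodes = vnodes
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns several positions on the ring.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_token(f"{node}#{i}"), node))

    def replicas_for(self, key: str, replication_factor: int = 3):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        if not self._ring:
            return []
        start = bisect.bisect_right(self._ring, (_token(key), ""))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == replication_factor:
                break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas_for("user:42"))
```

Every node can compute the same owners for a key without asking a coordinator, and adding a node only moves the keys that now fall on the new node's positions, which is what makes growing the cluster comparatively cheap.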
Redis stands for Remote Dictionary Server. Redis is written in C, and client libraries exist for C++, Ruby, Perl, PHP, Scala, and other languages. It is one of the best-known key-value stores. Redis can manage up to 2 to the power of 32 keys, i.e. roughly 4.3 billion, and has been tested in practice to handle at least 250 million keys per instance. Redis is an in-memory database with on-disk persistence: it keeps all the data in RAM and backs it up to disk.
- Automatic failover
- Entire database held in memory
- Lua scripting for server-side logic
- Can replicate data to any number of replicas
- Keys can be given a limited time to live
- Eviction of keys through LRU
- Publish/Subscribe messaging
- Diverse types of data are supported
- Easy to install
- Performs about 110,000 SETs and 81,000 GETs per second
- Multi-utility tool
- Redis Sentinel provides monitoring and automatic failover for replicated deployments
- Doesn’t support joins
- Knowledge of Lua necessary for stored procedures
- Dataset has to fit in memory
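Two of the points above, keys with a limited time to live and eviction through LRU, can be modelled in a few lines. This is a toy in-process model with a hypothetical `TinyCache` class, not a Redis client and not how Redis implements eviction internally (Redis, for instance, uses an approximated LRU and only evicts under a configured memory limit).

```python
import time
from collections import OrderedDict


class TinyCache:
    """Toy in-memory store with per-key TTL and LRU eviction."""

    def __init__(self, max_keys: int, clock=time.monotonic):
        self._data = OrderedDict()  # key -> (value, expires_at or None)
        self._max_keys = max_keys
        self._clock = clock         # injectable so expiry can be tested

    def set(self, key, value, ttl=None) -> None:
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)
        self._data.move_to_end(key)          # mark as most recently used
        while len(self._data) > self._max_keys:
            self._data.popitem(last=False)   # evict the least recently used key

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]              # lazily drop an expired key
            return None
        self._data.move_to_end(key)
        return value


cache = TinyCache(max_keys=2)
cache.set("greeting", "hello", ttl=30)
print(cache.get("greeting"))
```

The bounded `max_keys` stands in for the constraint that the dataset has to fit in memory: once the limit is reached, something must be evicted or writes must fail.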
HBase is written in Java and licensed under the Apache License. It is a non-relational distributed database modeled on Google's Bigtable. HBase can host billions of rows and millions of columns, and exposes a Java API for client access. Capacity can be increased by adding servers, and availability is supported by running multiple master nodes.
- Automatic failure diagnosis support
- Linearly scalable
- Data replication provision
- Integrates with Hadoop, as a source as well as destination
- Fast lookups in large tables.
- Low-latency access to single rows among billions of records
- Java API for client
- Can manage large datasets with HDFS file storage
- Schema design flexibility
- Transactions not supported
- No built-in authentication or permissions
- Indexing and sorting only on the row key
- With a single HMaster, a single point of failure may occur
- SQL structure not supported
- Memory issues (on cluster)
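Fast lookups combined with indexing and sorting only on the key follow from keeping rows ordered by row key. The sketch below imitates that layout with a sorted list and binary search. `SortedRowStore` is a hypothetical class for illustration and is not the HBase client API; the `family:qualifier` column names and the `#` separator in row keys are just conventions chosen for the example.

```python
import bisect


class SortedRowStore:
    """Toy table that keeps rows ordered by row key, like a sorted map."""

    def __init__(self):
        self._keys = []  # row keys in sorted order
        self._rows = {}  # row key -> {column name: value}

    def put(self, row_key: str, column: str, value) -> None:
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key: str):
        """Point lookup by row key; returns None when the row is absent."""
        return self._rows.get(row_key)

    def scan(self, start_row: str, stop_row: str):
        """Yield rows with start_row <= row key < stop_row, in key order."""
        lo = bisect.bisect_left(self._keys, start_row)
        hi = bisect.bisect_left(self._keys, stop_row)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


table = SortedRowStore()
table.put("user#002", "info:name", "Bob")
table.put("user#001", "info:name", "Ada")
table.put("order#001", "info:total", "9.99")
print([key for key, _ in table.scan("user#", "user#~")])
```

Because only the row key is ordered, a query such as "all users named Ada" would have to read every row, which is why row key design matters so much in this kind of store.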
Memcached has been adopted by Facebook, Netlog, Wikipedia, Flickr, YouTube, Twitter, and others. It’s a high-performance, open-source, distributed memory caching system that speeds up dynamic web applications by decreasing database load. It is an in-memory key-value store for strings or objects that result from database calls, API calls, or page rendering.
- Client-server application (over TCP/UDP)
- Reduces database load
- Server is a huge hash-table
- Competent where database load is huge
- Memory caches combined to a logical pool
- Installation is quick
- Huge community and widely documented
- Runs on Linux and BSD-like operating systems
- Data redundancy, locks, and read-throughs not supported
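Since read-through is not built in, applications usually implement the cache-aside pattern themselves: check the cache, fall back to the database on a miss, then store the result. Here is a minimal sketch of that pattern. A plain dictionary stands in for the Memcached client, and `get_with_cache`, `invalidate`, and `slow_lookup` are hypothetical names for the example.

```python
def get_with_cache(key, cache, load_from_database):
    """Cache-aside read: try the cache first, load and store on a miss."""
    if key in cache:
        return cache[key]            # hit: no database work at all
    value = load_from_database(key)  # miss: pay for the expensive call once
    cache[key] = value
    return value


def invalidate(key, cache):
    """Drop a cached entry after the underlying data changes."""
    cache.pop(key, None)


database_calls = []


def slow_lookup(key):
    """Stand-in for a database query; records each call for inspection."""
    database_calls.append(key)
    return key.upper()


cache = {}
get_with_cache("profile:1", cache, slow_lookup)
get_with_cache("profile:1", cache, slow_lookup)
print(len(database_calls))
```

The second read is served from the cache, which is the mechanism behind "reduces database load". The trade-off is that the application owns invalidation, and stale entries are possible if a write forgets to call something like `invalidate`.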