What is Amazon DocumentDB?

Amazon DocumentDB is a NoSQL document database service, purpose-built for JSON data management at scale, with some degree of compatibility with MongoDB.

What is NoSQL?

"NoSQL" refers to nonrelational types of databases. NoSQL databases can be queried using APIs, declarative structured query languages, and query-by-example languages.

NoSQL databases are widely used in real-time web applications and big data workloads because of their ease of development, functionality, scalability, and performance. They differ from the relational databases offered by services such as Amazon RDS.

Benefits of NoSQL Databases

NoSQL databases are a good fit for modern applications such as mobile, web, and gaming, which need flexible, scalable, high-performance, and highly functional databases to provide great user experiences.

  • Flexibility: NoSQL databases provide flexible schemas that enable faster and more iterative development. They can handle structured, semi-structured, and unstructured data in a single data store.
  • Scalability: NoSQL databases scale out across distributed clusters of hardware instead of scaling up by moving to larger, more expensive servers. This lets them absorb increased traffic with little or no downtime.
  • High performance: The scale-out architecture of a NoSQL database is valuable when data volume or traffic increases, because it helps keep response times in the millisecond range. NoSQL databases can also ingest data and deliver it quickly and reliably, which is why they are used in applications that collect terabytes of data daily.
  • Highly functional: NoSQL databases provide highly functional APIs and are designed for distributed data stores with extremely large storage needs. This is why they are a common choice for big data, real-time web apps, customer 360, online shopping, gaming, IoT, social networks, and online advertising applications.
  • Availability: NoSQL databases improve availability and reduce latency for users by replicating data across multiple servers, data centers, or cloud resources.

Types of NoSQL Databases

There are five main types of NoSQL databases:

  • Key-value: This is the most flexible type of NoSQL database because it is highly partitionable and allows horizontal scaling. The application has complete control over what is stored in the value field, without any restrictions. Amazon DynamoDB is a key-value database that delivers single-digit millisecond latency.
  • Document: These databases are used for storing, retrieving, and managing semi-structured data. Application data is represented as objects or JSON-like documents. Amazon DocumentDB and MongoDB are popular document databases that provide powerful and intuitive APIs for flexible and iterative development.
  • Graph: This type of database organizes data as nodes and relationships, which show the connections between nodes, and serves applications with highly connected datasets. Graph databases are mostly used in social networks, reservation systems, and fraud detection. Amazon Neptune is a fully managed graph database service.
  • In-memory: These databases keep data in memory to deliver very fast performance, and are primarily used for modern microservices applications. Amazon MemoryDB adds durability to in-memory storage, and Amazon ElastiCache is a fully managed in-memory caching service.
  • Search: Amazon OpenSearch Service (formerly Amazon Elasticsearch Service, or Amazon ES) provides near-real-time visualizations and analytics of machine-generated data by indexing, aggregating, and searching semi-structured logs.

Relational vs. NoSQL Database

 

Data model
  • Relational database: The relational model normalizes data into tables with rows and columns. The database enforces referential integrity in relationships between tables.
  • NoSQL database: NoSQL databases provide a variety of data models, such as key-value, document, and graph, which are optimized for performance and scale.

Workloads
  • Relational database: Designed for OLTP and OLAP applications.
  • NoSQL database: Designed for low-latency data access patterns and for apps with semi-structured data.

ACID
  • Relational database: Yes.
  • NoSQL database: Generally no. ACID guarantees are often relaxed in exchange for scale and performance, although some NoSQL databases now offer transactions.

Performance
  • Relational database: Dependent on the optimization of queries, indexes, and table structures.
  • NoSQL database: Generally a function of the underlying hardware cluster size, network latency, and the calling application.

Scale
  • Relational database: Low. Typically scales up by moving to a larger server.
  • NoSQL database: High. Scales out across distributed clusters.

APIs
  • Relational database: Uses SQL to store and retrieve data.
  • NoSQL database: Uses APIs to store and retrieve data structures. Partition keys let apps look up key-value pairs, column sets, or semi-structured documents.

Amazon DynamoDB vs. Amazon DocumentDB vs. MongoDB

Now that we've clarified the NoSQL technology behind Amazon DocumentDB, let's get into the specifics of the service and compare it to related products.

  • Amazon DynamoDB is a fully managed NoSQL database service that provides fast, predictable performance and scalability. You can create a table that stores and retrieves any amount of data. Amazon DocumentDB is a fully managed NoSQL document database that is compatible with the MongoDB API, rather than being built on the MongoDB codebase, and is designed to reduce development and scaling effort.
  • DynamoDB uses tables, items, and attributes as its core components. A table is a collection of items, and each item is a collection of attributes. DynamoDB uses primary keys to uniquely identify each item in a table, and secondary indexes provide more flexibility. Both MongoDB and DocumentDB use JSON-like documents to store schema-free data. Because the data is schema-free, you can create documents without first defining their structure, and fields can differ from one document to the next.
  • In DynamoDB, a secondary index is created on key attributes and is used for querying or scanning. In MongoDB and DocumentDB, indexes are strongly recommended: if no index supports a query, every document in the collection must be examined to select the documents the query requests, which slows read performance.
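
To make the cost of a missing index concrete, here is a minimal Python sketch that models a collection as a list of schema-free dictionaries. The sample data and the scan and build_index helpers are hypothetical illustrations of the idea, not how DocumentDB or MongoDB implement indexes internally.

```python
from collections import defaultdict

# Hypothetical in-memory "collection": documents need not share the same fields.
blogs = [
    {"_id": 1, "blogname": "documentdb"},
    {"_id": 2, "blogname": "dynamoDB", "status": "completed"},
    {"_id": 3, "blogname": "neptune", "status": "draft"},
]

def scan(collection, field, value):
    """Collection scan: every document is examined, as happens without an index."""
    examined = 0
    matches = []
    for doc in collection:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined

def build_index(collection, field):
    """Map each value of `field` to the documents that contain it."""
    index = defaultdict(list)
    for doc in collection:
        if field in doc:
            index[doc[field]].append(doc)
    return index

# Without an index, the number of documents examined grows with the collection.
matches, examined = scan(blogs, "blogname", "dynamoDB")
print(len(matches), "match after examining", examined, "documents")

# With an index, the lookup goes straight to the matching documents.
blogname_index = build_index(blogs, "blogname")
print(blogname_index["dynamoDB"])
```

In the Mongo shell, the corresponding step would be creating an index on the queried field, for example with db.blog.createIndex({ blogname: 1 }).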

DocumentDB makes it easy to store, query, and index JSON data. A collection is the counterpart of a table in a relational database, except that no single schema is enforced on all documents. Collections let you group similar documents together without requiring an identical structure.

Developing With Amazon DocumentDB

Creating Amazon DocumentDB Clusters

In the AWS Management Console, navigate to Amazon DocumentDB > Dashboard > Create Cluster.

Once creation completes, open the Clusters list in the console to confirm that the cluster was created.

Connecting Programmatically to Amazon DocumentDB

Verify whether TLS is enabled (it is enabled by default).

Run the following AWS CLI command to determine the cluster's parameter group:

$ aws docdb describe-db-clusters \
    --db-cluster-identifier docdb-cd \
    --query 'DBClusters[*].[DBClusterIdentifier,DBClusterParameterGroup]'
[
    [
        "docdb-cd",
        "default.docdb4.0"
    ]
]

Run the following command to inspect the TLS parameter in your cluster's parameter group:

$ aws docdb describe-db-cluster-parameters \
    --db-cluster-parameter-group-name default.docdb4.0

The entry for the tls parameter should show "ParameterValue": "enabled":

{
    "ParameterName": "tls",
    "ParameterValue": "enabled",
    "Description": "Config to enable/disable TLS",
    "Source": "system",
    "ApplyType": "static",
    "DataType": "string",
    "AllowedValues": "disabled,enabled",
    "IsModifiable": true,
    "ApplyMethod": "pending-reboot"
}
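
If you would rather check this in a script than read the JSON by eye, the following Python sketch shows one way to do it. It assumes the CLI output has been captured as a string and that the parameters are listed under a top-level "Parameters" key; the sample text is an abbreviated, hypothetical stand-in for real output, and tls_enabled is an illustrative helper name.

```python
import json

# Abbreviated, hypothetical stand-in for the JSON printed by
# `aws docdb describe-db-cluster-parameters`; only the fields used below are kept.
cli_output = """
{
    "Parameters": [
        {"ParameterName": "audit_logs", "ParameterValue": "disabled"},
        {"ParameterName": "tls", "ParameterValue": "enabled"}
    ]
}
"""

def tls_enabled(raw_json):
    """Return True only if the parameter list contains tls set to enabled."""
    for param in json.loads(raw_json).get("Parameters", []):
        if param.get("ParameterName") == "tls":
            return param.get("ParameterValue") == "enabled"
    return False  # no tls entry found, so TLS is not confirmed

print("TLS enabled:", tls_enabled(cli_output))
```

Returning False when the entry is missing is a deliberate choice here: the script only reports TLS as enabled when the parameter group says so explicitly.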

Connecting to an Amazon DocumentDB Cluster From Outside an Amazon VPC

Amazon DocumentDB clusters are deployed within an Amazon Virtual Private Cloud (Amazon VPC). They can be accessed directly by Amazon EC2 instances or other AWS services that are deployed in the same Amazon VPC.

DocumentDB can also be accessed by EC2 instances or other AWS services in different VPCs, in the same AWS Region or in other Regions, via VPC peering. However, if the application needs to access Amazon DocumentDB resources from outside the cluster's VPC, you can use SSH tunneling.
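
As a rough illustration of the SSH tunneling option, the Python sketch below assembles an ssh local port-forwarding command that routes a local port through an EC2 instance to the cluster. The helper name ssh_tunnel_command is hypothetical, and the key file, EC2 host, and cluster endpoint are simply the example values that appear later in this walkthrough; substitute your own.

```python
import shlex

def ssh_tunnel_command(key_file, ec2_host, cluster_endpoint,
                       local_port=27017, remote_port=27017):
    """Build an ssh command that forwards local_port to the cluster via ec2_host."""
    return [
        "ssh",
        "-i", key_file,                                          # private key for the EC2 instance
        "-L", f"{local_port}:{cluster_endpoint}:{remote_port}",  # local port forwarding
        ec2_host,
        "-N",                                                    # open the tunnel only, no remote shell
    ]

command = ssh_tunnel_command(
    "mydemoec2key.pem",
    "ec2-user@ec2-50-17-32-57.compute-1.amazonaws.com",
    "docdb-cd.cckwiu90ey0i.us-east-1.docdb.amazonaws.com",
)

# Print a copy-pasteable command line; this sketch does not open the tunnel itself.
print(shlex.join(command))
```

With the tunnel open, a client on your machine would connect to localhost on the forwarded port. Because the TLS certificate is issued for the cluster endpoint rather than for localhost, such a client typically also has to relax hostname validation.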

For this example, we will use an EC2 instance in the same VPC as the DocumentDB cluster.

Step 1: Create an EC2 Instance

On the Amazon EC2 console, choose Launch instance.

Make sure to launch the instance in the same VPC as the DocumentDB cluster.

Step 2: Connect to EC2

$ ssh -i "mydemoec2key.pem" ec2-user@ec2-50-17-32-57.compute-1.amazonaws.com

Step 3: Install Mongo Shell in EC2

Because this Amazon DocumentDB cluster is compatible with MongoDB 4.0, install version 4.0 of the Mongo shell. Version 5.0 may cause compatibility issues.

In the EC2 SSH session, run the following command to create the yum repository file:

$ echo -e "[mongodb-org-4.0] \nname=MongoDB Repository\nbaseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/4.0/x86_64/\ngpgcheck=1 \nenabled=1 \ngpgkey=https://www.mongodb.org/static/pgp/server-4.0.asc" | sudo tee /etc/yum.repos.d/mongodb-org-4.0.repo

Install the Mongo shell with the following command:

$ sudo yum install -y mongodb-org-shell

Because TLS is enabled, the shell needs the CA certificate bundle to verify the connection. Download it with the following command:

$ wget https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem

Step 4: Connect to Your Amazon DocumentDB Cluster

Navigate to the DocumentDB cluster and locate the connection details in the 'Connectivity' tab.

Connect over TLS, using the username and password that were set when the cluster was created. The --tls options shown here were introduced with the MongoDB 4.2 shell as replacements for the deprecated --ssl options; if your shell is version 4.0, use --ssl and --sslCAFile instead.

Command to connect to DocumentDB from the Mongo shell:

$ mongo --tls --host docdb-cd.cckwiu90ey0i.us-east-1.docdb.amazonaws.com:27017 --tlsCAFile rds-combined-ca-bundle.pem --username <username removed> --password <password removed>
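
Applications usually connect with a driver and a connection string rather than the shell. The Python sketch below builds such a string from the same pieces the shell command uses. The docdb_uri helper and the demo credentials are hypothetical; retryWrites=false is included on the assumption that your cluster, like DocumentDB in general, does not support retryable writes.

```python
from urllib.parse import quote_plus, urlencode

def docdb_uri(username, password, host, port=27017,
              ca_file="rds-combined-ca-bundle.pem"):
    """Build a MongoDB-style connection string for a TLS-enabled cluster."""
    options = urlencode({
        "tls": "true",            # matches --tls in the shell command
        "tlsCAFile": ca_file,     # matches --tlsCAFile
        "retryWrites": "false",   # assumption: retryable writes are unsupported
    })
    # Credentials are percent-encoded so characters such as '@' or '/' stay valid.
    return (
        f"mongodb://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/?{options}"
    )

# Hypothetical credentials; use the ones chosen when the cluster was created.
uri = docdb_uri(
    "demo_user",
    "p@ss/word",
    "docdb-cd.cckwiu90ey0i.us-east-1.docdb.amazonaws.com",
)
print(uri)
```

A driver such as PyMongo (a third-party package, not used in this sketch) accepts a string of this form, for example as pymongo.MongoClient(uri).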

Step 5: Run DocumentDB Commands From the Mongo Shell

  • Show all collections:
rs0:PRIMARY> show collections
collection
demo
  • Insert document:
rs0:PRIMARY> db.blog.insert({"blogname":"documentdb"})
WriteResult({ "nInserted" : 1 })
rs0:PRIMARY> show collections
blog
collection
demo
rs0:PRIMARY>
  • Insert one more document:
rs0:PRIMARY> db.blog.insert({"blogname":"dynamoDB","status":"completed"})
WriteResult({ "nInserted" : 1 })
  • Find all documents:
rs0:PRIMARY> db.blog.find()
{ "_id" : ObjectId("62b7c4ee378d58cfaecef021"), "blogname" : "documentdb" }
{ "_id" : ObjectId("62b7c580378d58cfaecef022"), "blogname" : "dynamoDB", "status" : "completed" }
rs0:PRIMARY>
  • Find one document:
rs0:PRIMARY> db.blog.findOne()
{ "_id" : ObjectId("62b7c4ee378d58cfaecef021"), "blogname" : "documentdb" }
  • Search for a specific document:
rs0:PRIMARY> db.blog.find({"blogname":"dynamoDB"})
{ "_id" : ObjectId("62b7c580378d58cfaecef022"), "blogname" : "dynamoDB", "status" : "completed" }
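  • Find and modify a document. Note that $inc only accepts numeric values, so this attempt to write a string with it is rejected and the collection is left unchanged; a string field needs $set or a replacement document instead:
rs0:PRIMARY> db.blog.findAndModify({query: { blogname: "documentdb"},update: { $inc: { status: "completed" } }})
  • List the documents again (there are still two):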
rs0:PRIMARY> db.blog.find()
{ "_id" : ObjectId("62b7c580378d58cfaecef022"), "blogname" : "dynamoDB", "status" : "completed" }
{ "_id" : ObjectId("62b7e26f378d58cfaecef024"), "blogname" : "documentdb" }
  • Add a status to the document whose blogname is "documentdb" (by default, the shell prints the document as it was before the update):
rs0:PRIMARY> db.blog.findAndModify({query: {blogname: "documentdb"}, update : {blogname : "documentdb", status: "inProgress"}})
{ "_id" : ObjectId("62b7e26f378d58cfaecef024"), "blogname" : "documentdb" }
  • After the update, show all documents:
rs0:PRIMARY> db.blog.find()
{ "_id" : ObjectId("62b7c580378d58cfaecef022"), "blogname" : "dynamoDB", "status" : "completed" }
{ "_id" : ObjectId("62b7e26f378d58cfaecef024"), "blogname" : "documentdb", "status" : "inProgress" }
  • Remove a document with completed status (findAndModify removes only the first match):
rs0:PRIMARY> db.blog.findAndModify({query: {status: "completed"}, remove : true})
{
	"_id" : ObjectId("62b7c580378d58cfaecef022"),
	"blogname" : "dynamoDB",
	"status" : "completed"
}
  • After the removal, show all documents:
rs0:PRIMARY> db.blog.find()
{ "_id" : ObjectId("62b7e26f378d58cfaecef024"), "blogname" : "documentdb", "status" : "inProgress" }
rs0:PRIMARY>
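
The findAndModify calls above have two behaviors that are easy to miss: a plain update document replaces the whole document apart from its _id, and the call returns the document as it was before the change unless told otherwise. The Python sketch below models those semantics on an in-memory list. The find_and_modify function is a simplified, hypothetical stand-in that only understands equality queries, replacement documents, and $set; it is not the real server logic.

```python
import copy

def find_and_modify(collection, query, update=None, remove=False, new=False):
    """Change the first document matching `query`, loosely mimicking findAndModify."""
    update = update or {}
    for position, doc in enumerate(collection):
        if all(doc.get(key) == value for key, value in query.items()):
            original = copy.deepcopy(doc)
            if remove:
                del collection[position]          # only the first match is removed
                return original
            if "$set" in update:
                doc.update(update["$set"])        # change only the named fields
            else:
                replacement = {"_id": doc["_id"], **update}
                doc.clear()
                doc.update(replacement)           # replace everything except _id
            return copy.deepcopy(doc) if new else original
    return None  # nothing matched

# Hypothetical data mirroring the two documents in the shell session.
blog = [
    {"_id": 1, "blogname": "dynamoDB", "status": "completed"},
    {"_id": 2, "blogname": "documentdb"},
]

before = find_and_modify(
    blog,
    {"blogname": "documentdb"},
    update={"blogname": "documentdb", "status": "inProgress"},
)
print("returned:", before)   # the pre-update document, without a status field

removed = find_and_modify(blog, {"status": "completed"}, remove=True)
print("removed:", removed)
print("remaining:", blog)
```

Passing new=True in this sketch corresponds to the new: true option of findAndModify, which returns the modified document instead of the original.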

Should you make DocumentDB a part of your stack?

To summarize, DocumentDB can be a good choice if you need scalability and caching for real-time analytics. However, it is not built for transactional data, and it has some significant limitations with regard to its compatibility with MongoDB:

  • The DocumentDB 4.0 feature set most closely resembles MongoDB 3.0 and 3.2, which were released back in 2015.
  • Published compatibility testing indicates that it fails over two-thirds of the MongoDB API correctness tests.
  • Applications written for MongoDB may need to be rewritten to work with Amazon DocumentDB.

Here are some recommendations from Zuar:

  1. Consider running MongoDB apps on MongoDB Atlas, which is developed in-house by the MongoDB team.
  2. Utilize a robust ELT tool such as Runner to pull data from MongoDB into your data pipeline.
  3. If you plan to update your data strategy, Zuar can assess your current system for ways to improve efficiency and automate data processes.