What is DocumentDB used for?

Because of their flexible schema, document databases are well suited to collecting and storing data whose structure varies or changes over time. If you need, or may later need, to scale to multiple terabytes of data with hundreds of thousands of reads and writes per second, DocumentDB could be a good fit. It's a good choice when your data is document-centric and doesn't fit well into the schema of a relational database, when you need to accommodate massive scale, and when you are rapidly prototyping.

Is Amazon DocumentDB the same as MongoDB?

No. Both are NoSQL JSON document databases, but MongoDB is a database system developed by MongoDB, Inc. that can run on most operating systems, on premises or in any cloud, while DocumentDB is a proprietary, fully managed service developed by Amazon that runs only on AWS.

What is the difference between DynamoDB and DocumentDB?

DynamoDB tables have no practical limit on total size, but each individual item is limited to 400 KB. In DocumentDB, an individual document is limited to 16 MB, and a cluster's storage can grow to a maximum of 64 TB.
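Because DocumentDB rejects a document over the 16 MB limit mentioned above, it can help to check size before inserting. The following is a rough sketch under two stated assumptions: it takes 16 MB to mean 16 × 1024 × 1024 bytes, and it measures the UTF-8 length of the document's JSON form, which only approximates the BSON size the server actually checks. The helper names approxDocBytes and fitsDocumentLimit are hypothetical.

```javascript
// Assumption: the 16 MB per-document limit is taken as 16 * 1024 * 1024 bytes.
const MAX_DOC_BYTES = 16 * 1024 * 1024;

// Approximate a document's size as the UTF-8 byte length of its JSON encoding.
// The server measures BSON, which adds type tags and length prefixes, so this
// is only an estimate; leave headroom rather than treating it as exact.
function approxDocBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Hypothetical pre-insert check against the (approximate) limit.
function fitsDocumentLimit(doc, limit = MAX_DOC_BYTES) {
  return approxDocBytes(doc) <= limit;
}

// Usage
const small = { blogname: "documentdb", status: "completed" };
const large = { payload: "x".repeat(MAX_DOC_BYTES) }; // ASCII, so 1 byte per character
console.log(approxDocBytes(small), fitsDocumentLimit(small)); // 46 true
console.log(fitsDocumentLimit(large));                        // false
```

Documents that approach the limit are usually a sign that some embedded data should be split into a separate collection.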

Is DocumentDB a relational database?

No, it's not a relational database. Amazon DocumentDB (with MongoDB compatibility) is a database service that is purpose-built for JSON data management at scale, fully managed and integrated with AWS, and enterprise-ready with high durability.

Is DocumentDB compatible with MongoDB?

Both MongoDB and DocumentDB are non-relational document databases that provide JSON-like storage. However, Amazon DocumentDB emulates the MongoDB 4.0 API on a purpose-built database engine that uses a distributed, fault-tolerant, self-healing storage system. DocumentDB is not based on the MongoDB server; rather, it emulates the MongoDB API and runs on top of Amazon's Aurora backend platform. This creates significant architectural constraints, functionality limitations, and gaps in compatibility.

Is DocumentDB serverless?

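DocumentDB is not a serverless service. You provision a cluster of instances, choosing the instance class and the number of replicas, and pay for them while they run. AWS manages the underlying servers, patching, and backups for you.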

Is DocumentDB faster than MongoDB?

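Connection limits measure concurrency rather than raw speed. The largest DocumentDB instance supports 30,000 concurrent connections, whereas the largest MongoDB instance supports 128,000, so on a single instance MongoDB can serve far more simultaneous clients. Actual query speed depends on the workload, indexes, and instance sizes.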

What is DocumentDB in NoSQL?

A document database is a type of nonrelational database designed to store and query data as JSON-like documents. Document databases make it easier for developers to store and query data by using the same document-model format they use in their application code. Amazon DocumentDB is AWS's fully managed document database service.

How fast is AWS DocumentDB?

Amazon DocumentDB write operations are fast because the engine persists only write-ahead log records to storage and does not need to write full buffer pages. DocumentDB clusters can scale out to millions of reads per second by adding read replicas.

How do I query in DocumentDB?

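Use the find() method, just as in MongoDB. For example, db.blog.find({"blogname": "dynamoDB"}) returns the documents in the blog collection whose blogname is "dynamoDB", and db.blog.find() with no filter returns every document.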
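The way a find() filter selects documents can be sketched locally. This is a toy, in-memory stand-in, not a driver API (the functions matches and find here are hypothetical), and it handles only top-level equality: a document is returned when every field in the filter equals the document's value, and an empty filter matches every document.

```javascript
// Toy, in-memory stand-in for find() with plain equality filters on top-level fields.
// Real queries also support operators ($gt, $in, ...), dotted paths, and array matching.
function matches(doc, filter) {
  // Every field named in the filter must equal the document's value (implicit AND).
  return Object.entries(filter).every(([field, value]) => doc[field] === value);
}

function find(docs, filter = {}) {
  // An empty filter has no conditions, so every document matches.
  return docs.filter((doc) => matches(doc, filter));
}

const blog = [
  { _id: 1, blogname: "documentdb" },
  { _id: 2, blogname: "dynamoDB", status: "completed" },
];

console.log(find(blog).length);                                             // 2
console.log(find(blog, { blogname: "dynamoDB" }).map((d) => d._id));         // [ 2 ]
console.log(find(blog, { blogname: "dynamoDB", status: "draft" }).length);  // 0
```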


What is Amazon DocumentDB?

Amazon DocumentDB basics including types of NoSQL databases, DynamoDB vs. DocumentDB vs. MongoDB, and instructions for DocumentDB development.

Chinmayee Das

Jul 26, 2022

Amazon DocumentDB is a NoSQL document database service, purpose-built for JSON data management at scale, with some degree of compatibility with MongoDB.

What is NoSQL?

‘NoSQL’ refers to nonrelational databases. Depending on the database, NoSQL data can be queried through APIs, declarative structured query languages, or query-by-example languages.

NoSQL databases are widely used in real-time web applications and big data because of their ease of development, functionality, scalability, and performance. They differ from relational databases, such as those run on Amazon RDS.

Benefits of NoSQL Databases

NoSQL databases suit modern applications such as mobile, web, and gaming apps, which need flexible, scalable, high-performance, and highly functional databases to provide great user experiences.

Flexibility: NoSQL databases provide flexible schemas that enable faster, more iterative development. They can handle structured, semi-structured, and unstructured data in a single data store.
Scalability: NoSQL databases typically scale out by adding servers to a distributed cluster, instead of scaling up to larger, more expensive servers. This lets them absorb increased traffic, often without downtime.
High-performance: The scale-out architecture of a NoSQL database is valuable when data volume or traffic increases, and helps keep response times in the millisecond range. NoSQL databases can also ingest and deliver data quickly and reliably, which is why they are used in applications that collect terabytes of data daily.
Highly functional: NoSQL databases provide highly functional APIs and are designed for distributed data stores with extremely large storage needs. This makes them a common choice for big data, real-time web apps, customer 360, online shopping, gaming, IoT, social networks, and online advertising applications.
Availability: NoSQL databases improve availability and reduce latency for users by replicating data across multiple servers, data centers, or cloud resources.
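Scaling out usually means spreading data across many nodes by key. One common, generic technique is hash partitioning: hash each key and take the result modulo the number of partitions, so the same key always lands on the same partition and keys spread roughly evenly. The sketch below uses 32-bit FNV-1a purely for illustration. It is not the hash function or routing scheme of any particular AWS service, and partitionFor is a hypothetical name.

```javascript
// 32-bit FNV-1a hash over the UTF-8 bytes of a string.
function fnv1a32(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of new TextEncoder().encode(str)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 unsigned bits
  }
  return hash >>> 0;
}

// Map a key to one of `partitionCount` partitions.
function partitionFor(key, partitionCount) {
  return fnv1a32(key) % partitionCount;
}

// Usage: distribute 1,000 hypothetical user IDs across 4 partitions.
const counts = [0, 0, 0, 0];
for (let i = 0; i < 1000; i++) {
  counts[partitionFor(`user#${i}`, 4)] += 1;
}
console.log(counts); // roughly even; exact numbers depend on the keys
```

Plain modulo placement moves most keys when the partition count changes, which is why production systems tend to use consistent hashing or split partitions by key range instead.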

Types of NoSQL Databases

The main types of NoSQL databases include:

Key-value: Key-value databases are highly partitionable and scale horizontally well. The application has complete control over what is stored in the value field, without restrictions. Amazon DynamoDB is a key-value database that provides millisecond latency.
Document: Document databases store, retrieve, and manage semi-structured data, with application data represented as JSON-like documents. Amazon DocumentDB and MongoDB are popular document databases that provide powerful, intuitive APIs for flexible and iterative development.
Graph: Graph databases store data as nodes and the relationships between them, which suits applications with highly connected datasets. They are mostly used in social networks, reservation systems, and fraud detection. Amazon Neptune is a fully managed graph database service.
In-memory: Amazon MemoryDB for Redis delivers ultra-fast performance with durability and is primarily used for modern microservices applications. Amazon ElastiCache is a fully managed in-memory caching service.
Search: Amazon Elasticsearch Service (Amazon ES, since renamed Amazon OpenSearch Service) provides near-real-time visualizations and analytics of machine-generated data by indexing, aggregating, and searching semi-structured logs.

Relational vs. NoSQL Database

	Relational Database	NoSQL Database
Data Model	The relational model normalizes data into tables made up of rows and columns. The database enforces referential integrity in relationships between tables.	NoSQL databases provide a variety of data models, such as key-value, document, and graph, which are optimized for performance and scale.
Workloads	Designed for transactional (OLTP) and analytical (OLAP) applications.	Designed for a range of data access patterns, including low-latency applications and apps that work with semi-structured data.
ACID	Yes: relational databases provide atomicity, consistency, isolation, and durability.	Often relax some ACID properties in exchange for a more flexible data model that scales horizontally; support varies by database.
Performance	Dependent on optimization of queries, indexes, and table structures.	Generally a function of the underlying hardware cluster size, network latency, and the calling application.
Scale	Typically scales up (vertically) by moving to a larger server.	Typically scales out (horizontally) across distributed clusters of servers.
APIs	Uses SQL to store and retrieve data.	Uses object-based APIs to store and retrieve data structures. Partition keys let apps look up key-value pairs, column sets, or semi-structured documents.

The differences between relational databases and NoSQL databases
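To make the data-model row concrete, here is a small sketch that takes two normalized "tables" (hypothetical authors and posts rows linked by author_id) and folds them into one document per author with the posts embedded, which is the kind of shape a document database would store. Whether to embed or reference related data depends on access patterns; this only illustrates the difference in shape, and embedPosts is a hypothetical helper.

```javascript
// Relational style: two normalized tables linked by a foreign key (author_id).
const authors = [
  { author_id: 1, name: "Ana" },
  { author_id: 2, name: "Ben" },
];
const posts = [
  { post_id: 10, author_id: 1, title: "What is NoSQL?" },
  { post_id: 11, author_id: 1, title: "DynamoDB vs. DocumentDB" },
  { post_id: 12, author_id: 2, title: "Connecting to a cluster" },
];

// Document style: one document per author, with that author's posts embedded as an array.
// This does in application code what a SQL JOIN followed by grouping would do.
function embedPosts(authorRows, postRows) {
  return authorRows.map(({ author_id, name }) => ({
    _id: author_id,
    name,
    posts: postRows
      .filter((p) => p.author_id === author_id)
      .map(({ post_id, title }) => ({ post_id, title })),
  }));
}

console.log(JSON.stringify(embedPosts(authors, posts), null, 2));
```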

AWS DynamoDB vs AWS DocumentDB vs MongoDB

Now that we've covered the NoSQL concepts behind Amazon DocumentDB, let's get into the specifics of the service and compare it to related products.

Amazon DynamoDB is a fully managed NoSQL database service that provides fast, predictable performance and scales with demand. You can use it to create tables that store and retrieve any amount of data. Amazon DocumentDB is a fully managed document database that implements the MongoDB API on its own purpose-built engine rather than on MongoDB's source code, and it is designed to reduce the work of developing and scaling applications.
DynamoDB uses tables, items, and attributes as its core components. A table is a collection of items, and each item is a collection of attributes. DynamoDB uses primary keys to uniquely identify each item in a table, and secondary indexes provide more querying flexibility. Both MongoDB and DocumentDB use JSON-like documents to store schema-free data. Because there is no fixed schema, you can create documents without defining their structure first, and the fields can differ from one document to the next.
In DynamoDB, a secondary index is created from key attributes and is used for querying or scanning. In MongoDB and DocumentDB, indexes are strongly recommended for frequently queried fields: if an index is missing, every document in the collection must be examined to select the documents the query requests, which slows down read performance.
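The cost of a missing index can be illustrated with a toy in-memory model. This is a simplification: real MongoDB and DocumentDB indexes are B-tree style structures maintained by the server, whereas this sketch uses a JavaScript Map, and the names findByScan, buildIndex, and findByIndex are hypothetical. What it shows is how many documents a query has to examine with and without an index.

```javascript
const blog = [
  { _id: 1, blogname: "documentdb" },
  { _id: 2, blogname: "dynamoDB", status: "completed" },
  { _id: 3, blogname: "mongodb", status: "inProgress" },
  { _id: 4, blogname: "neptune", status: "completed" },
];

// No index: every document in the collection is examined (a collection scan).
function findByScan(docs, field, value) {
  const matches = docs.filter((doc) => doc[field] === value);
  return { matches, examined: docs.length };
}

// Build a simple index once: field value -> list of documents with that value.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  return index;
}

// Indexed lookup: only the documents stored under the matching key are examined.
function findByIndex(index, value) {
  const matches = index.get(value) ?? [];
  return { matches, examined: matches.length };
}

const statusIndex = buildIndex(blog, "status");
console.log(findByScan(blog, "status", "completed").examined); // 4: the whole collection
console.log(findByIndex(statusIndex, "completed").examined);   // 2: matching entries only
```

On a real cluster, the corresponding step is creating an index on the queried field, for example db.blog.createIndex({ status: 1 }) in the mongo shell.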

DocumentDB makes it easy to store, query, and index JSON data. Collections are the equivalent of tables in a relational database, except that no single schema is enforced on all documents. Collections let you group similar documents together without requiring them to share an identical structure.

Developing With Amazon DocumentDB

Creating Amazon DocumentDB Clusters

In the AWS Management Console, navigate to Amazon DocumentDB > Dashboard > Create cluster and enter the cluster settings.

Then open the list of clusters in the Amazon DocumentDB console to confirm that the cluster was created.

Connecting Programmatically to Amazon DocumentDB

First, verify that TLS is enabled on the cluster (it is enabled by default).

Run the following AWS CLI command to determine the cluster's parameter group:

$ aws docdb describe-db-clusters \
    --db-cluster-identifier docdb-cd \
    --query 'DBClusters[*].[DBClusterIdentifier,DBClusterParameterGroup]'

[
    [
        "docdb-cd",
        "default.docdb4.0"
    ]
]

Run the following command to check the tls parameter in the cluster's parameter group:

$ aws docdb describe-db-cluster-parameters \
    --db-cluster-parameter-group-name default.docdb4.0

In the output, the tls parameter should show "ParameterValue": "enabled":

{
    "ParameterName": "tls",
    "ParameterValue": "enabled",
    "Description": "Config to enable/disable TLS",
    "Source": "system",
    "ApplyType": "static",
    "DataType": "string",
    "AllowedValues": "disabled,enabled",
    "IsModifiable": true,
    "ApplyMethod": "pending-reboot"
}
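To script this check instead of reading the output by eye, a small Node.js helper can pull the tls value out of the command's JSON output. This sketch assumes the CLI's default JSON output, in which the parameters appear in a top-level "Parameters" array; the function name tlsSetting and the trimmed sample are illustrative.

```javascript
// Return the "tls" ParameterValue from `aws docdb describe-db-cluster-parameters`
// JSON output, or undefined if the parameter is not present.
function tlsSetting(cliOutput) {
  const { Parameters = [] } = JSON.parse(cliOutput);
  const tls = Parameters.find((p) => p.ParameterName === "tls");
  return tls ? tls.ParameterValue : undefined;
}

// Usage with a trimmed-down, illustrative sample of the command's output.
const sample = JSON.stringify({
  Parameters: [
    { ParameterName: "audit_logs", ParameterValue: "disabled" },
    { ParameterName: "tls", ParameterValue: "enabled" },
  ],
});
console.log(tlsSetting(sample)); // enabled
```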

Connecting to an Amazon DocumentDB Cluster From Outside an Amazon VPC

Amazon DocumentDB clusters are deployed within an Amazon Virtual Private Cloud (Amazon VPC). They can be accessed directly by Amazon EC2 instances or other AWS services deployed in the same Amazon VPC.

DocumentDB can also be accessed by EC2 instances or other AWS services in different VPCs, in the same AWS Region or in other Regions, via VPC peering. If the application needs to reach Amazon DocumentDB from outside the cluster's VPC without peering, such as from a local workstation, you can use SSH tunneling through an EC2 instance in the cluster's VPC.

For this example, we will use an EC2 instance in the same VPC as the DocumentDB cluster.

Step 1: Create an EC2 Instance

On the Amazon EC2 console, choose Launch instance.

Make sure to launch the instance in the same VPC as the DocumentDB cluster, and that the cluster's security group allows inbound connections on port 27017 from the instance.

Step 2: Connect to EC2

$ ssh -i "mydemoec2key.pem" ec2-user@ec2-50-17-32-57.compute-1.amazonaws.com

Step 3: Install Mongo Shell in EC2

Because Amazon DocumentDB is compatible with MongoDB 4.0, install the 4.0 version of the mongo shell; newer shell versions, such as 5.0, may have compatibility issues.

In the EC2 SSH session, run the following command to create a yum repository file:

$ echo -e "[mongodb-org-4.0] \nname=MongoDB Repository\nbaseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/4.0/x86_64/\ngpgcheck=1 \nenabled=1 \ngpgkey=https://www.mongodb.org/static/pgp/server-4.0.asc" | sudo tee /etc/yum.repos.d/mongodb-org-4.0.repo

Install the Mongo shell with the following command:

$ sudo yum install -y mongodb-org-shell

Because TLS is enabled, download the certificate authority (CA) bundle that the shell uses to verify the cluster's certificate:

$ wget https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem

Step 4: Connect to Your Amazon DocumentDB Cluster

In the Amazon DocumentDB console, open the cluster and find its connection details on the 'Connectivity' tab.

The 4.0 mongo shell installed above uses the --ssl and --sslCAFile options; the --tls equivalents were introduced in the 4.2 shell, where the --ssl names are deprecated. Connect with the username and password that were set when the cluster was created.

Command to connect to DocumentDB from the mongo shell:

$ mongo --ssl --host docdb-cd.cckwiu90ey0i.us-east-1.docdb.amazonaws.com:27017 --sslCAFile rds-combined-ca-bundle.pem --username <username removed> --password <password removed>
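Applications usually connect through a driver and a connection string rather than the shell. The sketch below builds such a string in Node.js. It uses standard MongoDB connection-string options of the kind AWS's DocumentDB examples use (tls, tlsCAFile, replicaSet=rs0, readPreference=secondaryPreferred, and retryWrites=false, since DocumentDB does not support retryable writes), but treat them as assumptions to check against your cluster's connection details. The function name buildDocDbUri and the sample credentials are hypothetical; credentials are percent-encoded so characters such as @ or : in a password do not break the URI.

```javascript
// Build a MongoDB-style connection string for an Amazon DocumentDB cluster:
// TLS with the downloaded CA bundle, replica set rs0, reads preferably from
// replicas, and retryable writes disabled.
function buildDocDbUri({ host, port = 27017, username, password, caFile }) {
  const auth = `${encodeURIComponent(username)}:${encodeURIComponent(password)}`;
  const options = new URLSearchParams({
    tls: "true",
    tlsCAFile: caFile,
    replicaSet: "rs0",
    readPreference: "secondaryPreferred",
    retryWrites: "false",
  });
  return `mongodb://${auth}@${host}:${port}/?${options}`;
}

// Usage with hypothetical credentials; load real ones from a secret store instead.
const uri = buildDocDbUri({
  host: "docdb-cd.cckwiu90ey0i.us-east-1.docdb.amazonaws.com",
  username: "demoUser",
  password: "p@ss:word",
  caFile: "rds-combined-ca-bundle.pem",
});
console.log(uri);
```

With the official mongodb npm package, a string like this would be passed to new MongoClient(uri).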

Step 5: DocumentDB Commands From the Mongo Shell

Show all collections:

rs0:PRIMARY> show collections
collection
demo

Insert a document (the blog collection is created automatically by the first insert):

rs0:PRIMARY> db.blog.insert({"blogname":"documentdb"})
WriteResult({ "nInserted" : 1 })
rs0:PRIMARY> show collections
blog
collection
demo
rs0:PRIMARY>

Insert one more document:

rs0:PRIMARY> db.blog.insert({"blogname":"dynamoDB","status":"completed"})
WriteResult({ "nInserted" : 1 })

Find all documents:

rs0:PRIMARY> db.blog.find()
{ "_id" : ObjectId("62b7c4ee378d58cfaecef021"), "blogname" : "documentdb" }
{ "_id" : ObjectId("62b7c580378d58cfaecef022"), "blogname" : "dynamoDB", "status" : "completed" }
rs0:PRIMARY>

Find one document:

rs0:PRIMARY> db.blog.findOne()
{ "_id" : ObjectId("62b7c4ee378d58cfaecef021"), "blogname" : "documentdb" }

Search for a specific document:

rs0:PRIMARY> db.blog.find({"blogname":"dynamoDB"})
{ "_id" : ObjectId("62b7c580378d58cfaecef022"), "blogname" : "dynamoDB", "status" : "completed" }

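Find and modify a document. Note that $inc only increments numeric fields, so this attempt to store a string in status is rejected with an error and the document keeps no status field; to set a string value, use $set instead:

rs0:PRIMARY> db.blog.findAndModify({query: { blogname: "documentdb"}, update: { $inc: { status: "completed" } }})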

There are two documents:

rs0:PRIMARY> db.blog.find()
{ "_id" : ObjectId("62b7c580378d58cfaecef022"), "blogname" : "dynamoDB", "status" : "completed" }
{ "_id" : ObjectId("62b7e26f378d58cfaecef024"), "blogname" : "documentdb" }

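Add a status to the document whose blogname is "documentdb". Because the update is a plain document rather than an operator such as $set, it replaces the stored document (apart from _id), so blogname must be repeated; findAndModify returns the document as it was before the update: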

rs0:PRIMARY> db.blog.findAndModify({query: {blogname: "documentdb"}, update : {blogname : "documentdb", status: "inProgress"}})
{ "_id" : ObjectId("62b7e26f378d58cfaecef024"), "blogname" : "documentdb" }
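To contrast the replacement-style update above with a field-level $set update, here is a local sketch on plain objects. applyReplacement and applySet are hypothetical helpers that only mimic the semantics for top-level fields; they are not part of any driver. On a real cluster, the field-level form would be an update such as db.blog.updateOne({ blogname: "documentdb" }, { $set: { status: "inProgress" } }).

```javascript
// Hypothetical helpers that mimic two update styles on plain objects (top-level fields only).

// Replacement: the new document replaces everything except _id.
function applyReplacement(doc, replacement) {
  return { ...replacement, _id: doc._id };
}

// $set: only the listed fields are added or overwritten; all other fields are kept.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

const stored = { _id: 3, blogname: "documentdb", tags: ["aws"] };

console.log(applyReplacement(stored, { blogname: "documentdb", status: "inProgress" })); // tags is gone
console.log(applySet(stored, { status: "inProgress" }));                                 // tags is kept
```

Both helpers return new objects and leave stored unchanged, which keeps the comparison side by side.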

Remove a document with completed status (findAndModify removes at most one matching document and returns it):

rs0:PRIMARY> db.blog.findAndModify({query: {status: "completed"}, remove : true})
{
	"_id" : ObjectId("62b7c580378d58cfaecef022"),
	"blogname" : "dynamoDB",
	"status" : "completed"
}

After remove, show all documents:

rs0:PRIMARY> db.blog.find()
{ "_id" : ObjectId("62b7e26f378d58cfaecef024"), "blogname" : "documentdb", "status" : "inProgress" }
rs0:PRIMARY>

Step-by-step commands for Amazon DocumentDB

Should you make DocumentDB a part of your stack?

To summarize, DocumentDB can be a good choice if you need scalability and caching for real-time analytics. However, it is not built for transactional data, and it has some significant limitations in its compatibility with MongoDB:

the DocumentDB 4.0 feature set most closely resembles MongoDB 3.0 and 3.2, released way back in 2015
compatibility testing reveals that it fails over two-thirds of the MongoDB API correctness tests
applications written for MongoDB must be re-written to work with Amazon DocumentDB

Here are some recommendations from Zuar:

Consider using MongoDB Atlas, the managed service developed in-house by MongoDB, Inc., to run MongoDB apps.
Utilize a robust ELT tool such as Runner to pull data from MongoDB into your data pipeline.
If you plan to update your data strategy, Zuar can assess your current system for ways to improve efficiency and automate data processes.
