Overview
Azure Cosmos DB is a fully managed database service for applications that need globally distributed, low-latency data access. Launched by Microsoft in 2017 as the successor to Azure DocumentDB, it is engineered for applications that demand high availability and elastic scalability. The service guarantees single-digit millisecond latency at the 99th percentile across chosen regions, backed by service level agreements (SLAs) for throughput, latency, availability, and consistency. This makes it suitable for scenarios such as IoT, gaming, retail, and web applications that serve a global user base.
A core differentiator of Azure Cosmos DB is its multi-model and multi-API capabilities. When creating an account, developers choose one of several APIs on top of the same underlying engine: the native NoSQL (Core) API, API for MongoDB, API for Apache Cassandra, API for Apache Gremlin (graph database), and API for Azure Table storage. This flexibility allows organizations to migrate existing applications built on these database technologies to Cosmos DB with minimal code changes, or to build new applications using the data model that best fits their use case. For instance, a development team familiar with MongoDB drivers can use the MongoDB API to connect to Cosmos DB without needing to learn a new query language or data model.
Cosmos DB's turnkey global distribution replicates data across multiple Azure regions. This enables applications to place data closer to users, reducing latency and improving responsiveness. Developers can configure consistency levels ranging from strong to eventual, which gives a spectrum of choices for balancing consistency, availability, and latency according to application requirements. This granular control over consistency is critical for distributed systems, a topic discussed in detail in Martin Fowler's article on consistency patterns in distributed systems.
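The "strong to eventual" spectrum can be pictured as an ordered list. The sketch below is illustrative only: it orders the five levels from strongest to weakest and checks whether a requested level is the same as or weaker than an account default, reflecting the general rule that a client may relax, but not strengthen, the account's default consistency. The helper name `can_relax_to` is hypothetical and not part of any SDK.

```python
# Consistency levels ordered from strongest to weakest.
CONSISTENCY_LEVELS = [
    "Strong",
    "BoundedStaleness",
    "Session",
    "ConsistentPrefix",
    "Eventual",
]


def can_relax_to(account_default: str, requested: str) -> bool:
    """Return True if `requested` is the same as or weaker than `account_default`.

    Hypothetical helper for illustration; raises ValueError for unknown names.
    """
    default_rank = CONSISTENCY_LEVELS.index(account_default)
    requested_rank = CONSISTENCY_LEVELS.index(requested)
    return requested_rank >= default_rank


if __name__ == "__main__":
    print(can_relax_to("Session", "Eventual"))  # weaker than the default
    print(can_relax_to("Session", "Strong"))    # stronger than the default
```

A check like this can be useful in configuration code that validates a per-client consistency setting before it reaches the service.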
The service operates on a request unit (RU) model, a normalized measure of the cost of database operations. Every operation, including reads, writes, queries, and stored procedure executions, consumes RUs. This abstraction simplifies capacity planning by letting developers provision throughput in RUs per second (RU/s) rather than in specific CPU, memory, or I/O metrics. Cosmos DB offers both provisioned throughput and serverless capacity options, catering to predictable high-volume workloads and to intermittent or spiky traffic patterns, respectively. When enabled on an account (limited to one account per Azure subscription), the free tier provides the first 1000 RU/s and 25 GB of storage free each month, enabling developers to experiment and build small-scale applications without immediate cost.
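As a rough illustration of RU-based capacity planning, the sketch below estimates the RU/s to provision from expected read and write rates. The per-operation costs are assumptions: about 1 RU for a point read of a 1 KB item is the commonly documented baseline, while the 5 RU write cost and the 25% headroom are placeholder values that should be replaced with charges measured on a real workload. The function name is hypothetical.

```python
import math

MIN_MANUAL_THROUGHPUT = 400  # RU/s, matching the minimum used in the example on this page
THROUGHPUT_STEP = 100        # provisioned RU/s is set in increments of 100


def required_rus_per_second(
    reads_per_sec: float,
    writes_per_sec: float,
    ru_per_read: float = 1.0,   # assumption: point read of a ~1 KB item
    ru_per_write: float = 5.0,  # assumption: write of a ~1 KB item
    headroom: float = 0.25,     # assumption: 25% safety margin
) -> int:
    """Estimate the RU/s to provision, rounded up to the next step of 100."""
    raw = reads_per_sec * ru_per_read + writes_per_sec * ru_per_write
    padded = raw * (1 + headroom)
    stepped = math.ceil(padded / THROUGHPUT_STEP) * THROUGHPUT_STEP
    return max(MIN_MANUAL_THROUGHPUT, stepped)


if __name__ == "__main__":
    # 500 reads/s and 100 writes/s: (500 + 500) * 1.25 = 1250, rounded up to 1300
    print(required_rus_per_second(500, 100))
```

Queries usually cost more than point reads and vary with the data they touch, so an estimate like this is only a starting point.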
Key features
- Globally Distributed with Turnkey Replication: Distribute data across any number of Azure regions with a single click, providing low-latency access for users worldwide and high availability.
- Multi-Model and Multi-API: Supports multiple data models (document, graph, key-value, column-family) and compatible APIs including NoSQL (Core), MongoDB, Cassandra, Gremlin, and Table API, allowing flexibility in data access and migration.
- Guaranteed Low Latency: Offers single-digit millisecond latency at the 99th percentile, backed by SLAs, for both reads and writes.
- Elastic Scalability: Independently scale throughput (Request Units/second) and storage based on application needs, handling unpredictable peaks and dynamic workloads.
- Five Consistency Models: Provides a spectrum of consistency choices (Strong, Bounded Staleness, Session, Consistent Prefix, Eventual) to optimize for performance, availability, and consistency requirements.
- High Availability: Guarantees 99.999% availability for multi-region accounts and 99.99% for single-region accounts, with automatic failover capabilities.
- Serverless and Provisioned Throughput: Choose between a serverless consumption-based model for intermittent workloads or provisioned throughput for predictable, high-performance needs.
- Change Feed: Provides a persistent record of changes to the items in a container, ordered within each partition key and readable in near real time, enabling reactive programming patterns and integration with other services like Azure Functions.
- Built-in Security: Provides enterprise-grade security with encryption at rest and in transit, IP firewall, virtual network integration, and role-based access control.
- Comprehensive SDKs: Supports popular languages including .NET, Java, Python, Node.js, and Go, facilitating development across various platforms.
Pricing
Azure Cosmos DB employs a pay-as-you-go model that bills for throughput, measured in Request Units (RUs), and for consumed storage. Users can select between two capacity modes: provisioned throughput, billed per RU/s reserved, or serverless, billed per RU consumed.
Provisioned Throughput
In provisioned throughput mode, users specify the RU/s needed, ensuring dedicated capacity. This mode is suitable for workloads with predictable performance requirements. Throughput can be provisioned at the database level or container level. Storage is charged per GB consumed per month.
Serverless
The serverless capacity mode is designed for workloads with intermittent or unpredictable traffic. Users are billed for the total Request Units consumed and the storage used, without needing to provision throughput in advance. This model automatically scales based on demand and can be cost-effective for spiky or low-utilization scenarios.
A free tier is also available: when enabled on an account (limited to one account per Azure subscription), the first 1000 RU/s and 25 GB of storage are free each month.
| Service Component | Unit | Example Price (USD, East US 2) | Notes |
|---|---|---|---|
| Provisioned Throughput (multi-region write) | 100 RU/s | $0.016 / hour | Billing typically hourly for provisioned RU/s. |
| Provisioned Throughput (single-region) | 100 RU/s | $0.008 / hour | For single-region deployments. |
| Serverless Request Units | 1,000,000 RUs | $0.25 | Billed per million RUs consumed. |
| Storage | 1 GB / month | $0.25 | Billed per GB of data stored per month. |
| Backup Storage | 1 GB / month | Included | Standard backup storage is included in the storage cost up to 100% of your transactional data size. Higher tiers are available. |
For more detailed pricing information and regional variations, refer to the official Azure Cosmos DB pricing page.
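To make the two capacity modes easier to compare, the following sketch computes rough monthly costs from the example prices in the table above. It is not a billing calculator: it assumes a 730-hour month, a single-region account, and no free-tier or reserved-capacity discounts, and all function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month

# Example prices taken from the table above (USD, East US 2).
PRICE_PER_100_RUS_PER_HOUR = 0.008    # provisioned, single-region
PRICE_PER_MILLION_SERVERLESS_RU = 0.25
PRICE_PER_GB_MONTH = 0.25


def provisioned_monthly_cost(ru_per_sec: float, storage_gb: float) -> float:
    """Monthly cost of provisioned throughput plus storage."""
    throughput = (ru_per_sec / 100) * PRICE_PER_100_RUS_PER_HOUR * HOURS_PER_MONTH
    return throughput + storage_gb * PRICE_PER_GB_MONTH


def serverless_monthly_cost(total_rus: float, storage_gb: float) -> float:
    """Monthly cost of consumed RUs plus storage in serverless mode."""
    requests = (total_rus / 1_000_000) * PRICE_PER_MILLION_SERVERLESS_RU
    return requests + storage_gb * PRICE_PER_GB_MONTH


def break_even_rus(ru_per_sec: float) -> float:
    """Monthly RU consumption at which serverless costs as much as provisioned."""
    throughput = (ru_per_sec / 100) * PRICE_PER_100_RUS_PER_HOUR * HOURS_PER_MONTH
    return throughput / PRICE_PER_MILLION_SERVERLESS_RU * 1_000_000


if __name__ == "__main__":
    print(f"Provisioned, 400 RU/s + 10 GB: ${provisioned_monthly_cost(400, 10):.2f}")
    print(f"Serverless, 20M RUs + 10 GB:  ${serverless_monthly_cost(20_000_000, 10):.2f}")
    print(f"Break-even for 400 RU/s: {break_even_rus(400) / 1_000_000:.2f} million RUs/month")
```

Under these assumptions, a workload that consumes well under the break-even volume each month is likely cheaper on serverless, while steady traffic above it favors provisioned throughput.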
Common integrations
- Azure Functions: Integrate Cosmos DB with serverless functions for real-time data processing, event-driven architectures, and backend operations. The Azure Functions Cosmos DB binding allows triggers and output bindings.
- Azure Stream Analytics: Process large volumes of streaming data from sources like Azure Event Hubs or IoT Hubs, and output results to Cosmos DB for real-time analytics. Refer to the Stream Analytics output to Azure Cosmos DB documentation.
- Azure Synapse Analytics: Use Azure Synapse Link to create a seamless connection between Cosmos DB and Azure Synapse Analytics, enabling near real-time analytics over operational data without impacting transactional workloads. The analytical store for Cosmos DB facilitates this integration.
- Power BI: Connect Cosmos DB data to Power BI for business intelligence dashboards and reporting, allowing visualization of operational data. The Power BI integration guide provides connection steps.
- Azure Logic Apps & Power Automate: Automate workflows and integrate Cosmos DB with hundreds of other services using visual designers. The Azure Cosmos DB connector for Logic Apps enables various actions.
- Kubernetes: Deploy applications that use Cosmos DB within Kubernetes clusters, leveraging the Cosmos DB SDKs for connectivity. While Cosmos DB is a managed service, applications running in Kubernetes environments can interact with it via its public endpoints.
Alternatives
- MongoDB Atlas: A fully managed cloud database service for MongoDB, offering global distribution, high availability, and scalability, with a focus on the document model.
- Amazon DynamoDB: AWS's fully managed, serverless NoSQL key-value and document database designed for single-digit millisecond performance at any scale.
- Google Cloud Firestore: A flexible, scalable NoSQL document database for mobile, web, and server development from Google Cloud, offering real-time synchronization and offline support.
- Apache Cassandra: An open-source, distributed NoSQL database system known for its high availability and linear scalability, often self-managed or available through cloud providers.
- Couchbase: An open-source, distributed NoSQL document database that combines the agility of JSON with the power of SQL, offering a memory-first architecture.
Getting started
To begin using Azure Cosmos DB, you typically provision an account, create a database, and then create containers (collections) within that database. The following Python example demonstrates how to connect to an Azure Cosmos DB for NoSQL account, create a database and container, and then add and query items. This example uses the official Azure Cosmos DB Python SDK.
```python
import os

from azure.cosmos import CosmosClient, PartitionKey, exceptions

# --- Configuration ---
# Replace with your Cosmos DB account endpoint and primary key.
# It's recommended to retrieve these from environment variables or a secure key store.
COSMOS_DB_ENDPOINT = os.environ.get("COSMOS_DB_ENDPOINT", "YOUR_COSMOS_DB_ENDPOINT")
COSMOS_DB_KEY = os.environ.get("COSMOS_DB_KEY", "YOUR_COSMOS_DB_KEY")
DATABASE_NAME = "cloudpicker-db"
CONTAINER_NAME = "items"


def main():
    # The account's primary key can be passed directly as the credential.
    client = CosmosClient(COSMOS_DB_ENDPOINT, credential=COSMOS_DB_KEY)

    try:
        # 1. Create a database.
        # If the database already exists, this operation returns the existing database.
        database = client.create_database_if_not_exists(id=DATABASE_NAME)
        print(f"Database '{database.id}' created or retrieved.")

        # 2. Create a container (collection).
        # Define a partition key for efficient scaling. Here, '/category' is used.
        # If the container already exists, this operation returns the existing container.
        container = database.create_container_if_not_exists(
            id=CONTAINER_NAME,
            partition_key=PartitionKey(path="/category"),
            # 400 RU/s is the minimum manually provisioned throughput;
            # omit this argument on serverless accounts.
            offer_throughput=400,
        )
        print(f"Container '{container.id}' created or retrieved.")

        # 3. Create some items (documents).
        item1 = {
            "id": "item1",
            "name": "Laptop",
            "category": "Electronics",
            "price": 1200.00,
        }
        item2 = {
            "id": "item2",
            "name": "Keyboard",
            "category": "Electronics",
            "price": 75.50,
        }
        item3 = {
            "id": "item3",
            "name": "Notebook",
            "category": "Stationery",
            "price": 5.99,
        }

        # upsert_item inserts the item, or replaces it if the id already exists.
        container.upsert_item(body=item1)
        container.upsert_item(body=item2)
        container.upsert_item(body=item3)
        print("Items upserted.")

        # 4. Query items.
        print("\nQuerying all items:")
        query = "SELECT * FROM c"
        # No partition key is given, so the query must fan out across partitions.
        items = list(container.query_items(query=query, enable_cross_partition_query=True))
        for item in items:
            print(item)

        print("\nQuerying items by category 'Electronics':")
        query = "SELECT * FROM c WHERE c.category = @category"
        params = [{"name": "@category", "value": "Electronics"}]
        items_by_category = list(container.query_items(
            query=query,
            parameters=params,
            partition_key="Electronics",  # Specify partition key for targeted query
        ))
        for item in items_by_category:
            print(item)

    except exceptions.CosmosHttpResponseError as e:
        # Raised for errors returned by the Cosmos DB service (auth, throttling, etc.).
        print(f"An error occurred: {e}")


if __name__ == "__main__":
    # Ensure environment variables are set or replace placeholders.
    # For a real application, consider Azure Key Vault for secrets management.
    if "YOUR_COSMOS_DB_ENDPOINT" in COSMOS_DB_ENDPOINT or "YOUR_COSMOS_DB_KEY" in COSMOS_DB_KEY:
        print("Please set COSMOS_DB_ENDPOINT and COSMOS_DB_KEY environment variables or replace placeholders.")
    else:
        main()
```
Before running this code, ensure you have the Azure Cosmos DB Python SDK installed (pip install azure-cosmos) and replace the placeholder YOUR_COSMOS_DB_ENDPOINT and YOUR_COSMOS_DB_KEY with your actual Azure Cosmos DB account endpoint and primary key, which can be found in the Azure portal under the "Keys" section of your Cosmos DB account. For production applications, it is recommended to manage credentials securely using Azure Key Vault or environment variables.
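When an item's id and partition key value are both known, a point read is generally cheaper than a query. The sketch below wraps the SDK's `read_item` call and reports the RU charge of the operation, which the Python SDK exposes through the `x-ms-request-charge` response header. The helper name `read_with_charge` is hypothetical, and it assumes `container` is a container client like the one created in the example above.

```python
def read_with_charge(container, item_id, partition_key_value):
    """Point-read one item and return (item, request_charge_in_RUs).

    `container` is assumed to be an azure.cosmos container client; the charge
    is read from the headers of the most recent response on its connection.
    """
    item = container.read_item(item=item_id, partition_key=partition_key_value)
    headers = container.client_connection.last_response_headers
    # The header value arrives as a string; default to 0.0 if it is absent.
    charge = float(headers.get("x-ms-request-charge", 0.0))
    return item, charge
```

With the items from the example above, this could be called as `item, charge = read_with_charge(container, "item1", "Electronics")`. Logging the charge for representative operations gives real numbers to use when sizing provisioned throughput.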