Code Library: MongoDB Horizontal Scaling

MongoDB Horizontal Scaling

One common reason for using MongoDB is its schema-less collections and the other is its inherent capacity to perform well and scale. In more recent versions, MongoDB supports auto-sharding for scaling horizontally with ease.

The fundamental concept of sharding is fairly similar to the idea of the column database’s masterworker pattern where data is distributed across multiple range servers. MongoDB allows ordered collections to be saved across multiple machines. Each machine that saves part of the collection is then a shard. Shards are replicated to allow failover. So, a large collection could be split into four shards and each shard in turn may be replicated three times. This would create 12 units of a MongoDB server. The two additional copies of each shard serve as failover units.

Shards are at the collection level and not at the database level. Thus, one collection in a database may reside on a single node, whereas another in the same database may be sharded out to multiple nodes. Each shard stores contiguous sets of the ordered documents. Such bundles are called chunks in MongoDB jargon. Each chunk is identifi ed by three attributes, namely the fi rst document key (min key), the last document key (max key), and the collection.

A collection can be sharded based on any valid shard key pattern. Any document fi eld of a collection or a combination of two or more document fi elds in a collection can be used as the basis of a shard key. Shard keys also contain an order direction property in addition to the fi eld to defi ne a shard key. The order direction can be 1, meaning ascending or –1, meaning descending. It’s important to choose the shard keys prudently and make sure those keys can partition the data in an evenly balanced manner.

All defi nitions about the shards and the chunks they maintain are kept in metadata catalogs in a config server. Like the shards themselves, confi g servers are also replicated to support failover.

Client processes reach out to a MongoDB cluster via a mongos process. A mongos process does not have a persistent state and pulls state from the confi g servers. There can be one or more mongos processes for a MongoDB cluster. Mongos processes have the responsibility of routing queries appropriately and combining results where required. A query to a MongoDB cluster can be targeted or can be global. All queries that can leverage the shard key on which the data is ordered typically are targeted queries and those that can’t leverage the index are global. Targeted queries are more efficient than global queries. Think of global queries as those involving full collection scans.

Source of Information : NoSQL

Code Library

Categories

0 comments:

Leave a Reply