Storing Data in Memory-Mapped Files
A memory-mapped file is a segment of virtual memory that is mapped byte-for-byte to a file, or a file-like resource, that can be referenced through a file descriptor. This means applications can interact with such files as if they were part of primary memory, which improves I/O performance compared with ordinary disk reads and writes: accessing and manipulating memory is much faster than making system calls. In addition, in many operating systems, such as Linux, a memory region mapped to a file is part of the buffer of disk-backed pages kept in RAM. This transparent buffer, implemented in the operating system's kernel, is commonly called the page cache.
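The idea can be seen in a few lines of Python using the standard-library `mmap` module; this is a generic illustration of memory-mapped file I/O, not MongoDB's internal code:

```python
import mmap
import os
import tempfile

# Create a one-page file to map into the process's address space.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:
        # Writes go through ordinary memory operations, not write() syscalls;
        # the kernel's page cache carries the dirty pages back to disk.
        mm[0:5] = b"hello"
        mm.flush()  # ask the kernel to sync the modified page

with open(path, "rb") as f:
    print(f.read(5))  # the bytes written through memory are now in the file
```

The process never issues an explicit `write()` for the payload; the kernel's page cache mediates between the mapped memory and the disk, which is exactly the transparency the paragraph above describes.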
MongoDB’s strategy of using memory-mapped files for storage is a clever one, but it has its ramifications. First, memory-mapped files imply that there is no separation between the operating system cache and the database cache, which also means there is no cache redundancy. Second, caching is controlled by the operating system, and virtual memory mapping does not work the same way on all operating systems; the cache-management policies that govern what is kept in cache and what is discarded therefore also vary from one operating system to another. Third, MongoDB can expand its database cache to use all available memory without any additional configuration, which means you can often improve MongoDB performance simply by adding RAM and allocating more virtual memory.
Memory mapping also introduces a few limitations. For example, MongoDB’s implementation restricts total data size to a maximum of 2 GB on 32-bit systems. This restriction does not apply to MongoDB running on 64-bit machines.
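The 2 GB figure follows from pointer width. A rough back-of-the-envelope sketch, assuming the common split where roughly half of the 32-bit address space is reserved for the kernel and half is available to user-space mappings:

```python
# A 32-bit pointer can address 2**32 bytes (4 GiB) in total.
address_space = 2 ** 32

# Under a typical kernel/user split, about half of that address space
# is available for user-space memory mappings such as database files.
user_mappable = address_space // 2

print(user_mappable // 2 ** 30, "GiB")  # roughly the 2 GB ceiling cited above
```

A 64-bit address space is vastly larger, which is why the limit disappears on 64-bit machines.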
Database size isn’t the only size limitation, though. Additional limitations govern the size of each document and the number of collections a MongoDB server can hold. A document can be no larger than 8 MiB, which means MongoDB is not appropriate for storing large blobs directly. If storing large documents is absolutely necessary, leverage GridFS to store documents larger than 8 MiB. Furthermore, there is a limit on the number of namespaces that can be assigned in a database instance. The default number of namespaces supported is 24,000, and each collection and each index uses up a namespace. This means that, by default, two indexes per collection would allow a maximum of 8,000 collections per database. Usually such a large number is enough; however, if you need to, you can raise the namespace limit beyond 24,000.
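The 8,000-collection figure is just the namespace budget divided by the namespaces each collection consumes. Restating the arithmetic from the paragraph above:

```python
# Default namespace budget per database instance.
default_namespaces = 24_000

# Each collection consumes one namespace for itself,
# plus one namespace per index (two indexes assumed here).
namespaces_per_collection = 1 + 2

max_collections = default_namespaces // namespaces_per_collection
print(max_collections)  # 8000
```

Heavily indexed collections consume the budget faster: with, say, five indexes per collection, the same 24,000 namespaces would allow only 4,000 collections.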
Increasing the namespace limit has implications and limitations as well. Each namespace uses up a few kilobytes: in MongoDB, an index is implemented as a B-tree, and each B-tree page is 8 kB, so every additional namespace, whether for a collection or an index, adds a few kilobytes of overhead. Namespaces for a MongoDB database named mydb are maintained in a file named mydb.ns, and an .ns file like mydb.ns can grow up to a maximum size of 2 GB.
Because size limitations can restrict unbounded database growth, it’s important to understand a few more behavioral patterns of collections and indexes.
Source of Information: NoSQL