Horizontal partitioning in MongoDB
In this article, I'll talk a bit about data partitioning. Specifically, I'll talk about horizontal data partitioning or sharding. After a brief 'theoretical' part, I'll show you an example of how you can configure sharding with MongoDB. You will be able to run three MongoDB shards, a configuration server, and a router on your computer, using Docker compose.
Why partition the data?
Note
Scaling up or scaling vertically is done by adding more resources to the existing system. Adding more storage, memory, processors, improving the network, etc.
Note
Scaling out or scaling horizontally means adding more servers or units, instead of upgrading those units as with scaling up.
Horizontal partitioning or sharding
Note
A hot partition is a partition or shard that receives a disproportionate amount of traffic.
Sharding with MongoDB
Configuring shards
version: "2"
services:
# Configuration server
config:
image: mongo
command: mongod --configsvr --replSet configserver --port 27017
volumes:
- ./scripts:/scripts
# Shards
shard1:
image: mongo
command: mongod --shardsvr --replSet shard1 --port 27018
volumes:
- ./scripts:/scripts
shard2:
image: mongo
command: mongod --shardsvr --replSet shard2 --port 27019
volumes:
- ./scripts:/scripts
shard3:
image: mongo
command: mongod --shardsvr --replSet shard3 --port 27020
volumes:
- ./scripts:/scripts
# Router
router:
image: mongo
command: mongos --configdb configserver/config:27017 --bind_ip_all --port 27017
ports:
- "27017:27017"
volumes:
- ./scripts:/scripts
depends_on:
- config
- shard1
- shard2
- shard3
docker-compose up
from the same folder the docker-compose.yaml
file is in.docker ps
to see all running containers.$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a4666f0b327a mongo "docker-entrypoint.s…" 8 minutes ago Up 8 minutes 0.0.0.0:27017->27017/tcp tutorial_router_1
a058e405aee1 mongo "docker-entrypoint.s…" 8 minutes ago Up 8 minutes 27017/tcp tutorial_config_1
db346afca10b mongo "docker-entrypoint.s…" 8 minutes ago Up 8 minutes 27017/tcp tutorial_shard1_1
2fc3a5129f35 mongo "docker-entrypoint.s…" 8 minutes ago Up 8 minutes 27017/tcp tutorial_shard2_1
ea84bd82760b mongo "docker-entrypoint.s…" 8 minutes ago Up 8 minutes 27017/tcp tutorial_shard3_1
Configuring the configuration server
init-config.js
with these contents:rs.initiate({
_id: 'configserver',
configsvr: true,
version: 1,
members: [
{
_id: 0,
host: 'config:27017',
},
],
});
exec
command to run this script inside the configuration container:$ docker-compose exec config sh -c "mongo --port 27017 < init-config.js"
MongoDB shell version v4.2.6
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("696d3b93-57c1-4d3e-8057-4f0b13e3ec0a") }
MongoDB server version: 4.2.6
{
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(1589317607, 1),
"electionId" : ObjectId("000000000000000000000000")
},
"lastCommittedOpTime" : Timestamp(0, 0),
"$clusterTime" : {
"clusterTime" : Timestamp(1589317607, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1589317607, 1)
}
bye
"ok": 1
in the output that indicates the command ran successfully.Configuring the shards
// -- start init-shard1.js --
rs.initiate({
_id: 'shard1',
version: 1,
members: [{ _id: 0, host: 'shard1:27018' }],
});
// -- end init-shard2.js --
// -- init-shard2.js --
rs.initiate({
_id: 'shard2',
version: 1,
members: [{ _id: 0, host: 'shard2:27019' }],
});
// -- end init-shard2.js --
// -- init-shard3.js --
rs.initiate({
_id: 'shard3',
version: 1,
members: [{ _id: 0, host: 'shard3:27020' }],
});
// -- end init-shard3.js --
docker-compose exec shard1 sh -c "mongo --port 27018 < init-shard1.js"
docker-compose exec shard2 sh -c "mongo --port 27019 < init-shard2.js"
docker-compose exec shard3 sh -c "mongo --port 27020 < init-shard3.js"
"ok": 1
line in the result.Configuring the router
init-router.js
file with these contents:sh.addShard('shard1/shard1:27018');
sh.addShard('shard2/shard2:27019');
sh.addShard('shard3/shard3:27020');
docker-compose exec
command:docker-compose exec router sh -c "mongo < init-router.js"
{
"shardAdded" : "shard1",
"ok" : 1,
"operationTime" : Timestamp(1586120900, 7),
"$clusterTime" : {
"clusterTime" : Timestamp(1586120900, 7),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
...
Verify everything
mongo
and then check the status:docker-compose exec router mongo
mongos>
prompt, you can run sh.status()
and you should get the status message that will contain the shards section that looks like this:shards:
{ "_id" : "shard1", "host" : "shard1/shard1:27018", "state" : 1 }
{ "_id" : "shard2", "host" : "shard2/shard2:27019", "state" : 1 }
{ "_id" : "shard3", "host" : "shard3/shard3:27020", "state" : 1 }
exit
.Use hashed sharding
router
container. Run docker-compose exec router mongo
to get the MongoDB shell.test
:mongos> sh.enableSharding('test')
{
"ok" : 1,
"operationTime" : Timestamp(1589320159, 5),
"$clusterTime" : {
"clusterTime" : Timestamp(1589320159, 5),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
books
collection and we will add data later to observe how it gets distributed between shards. Let's enable the collection sharding:mongos> sh.shardCollection("test.books", { title : "hashed" } )
{
"collectionsharded" : "test.books",
"collectionUUID" : UUID("deb3295d-120b-47aa-be78-77a077ca9220"),
"ok" : 1,
"operationTime" : Timestamp(1589320193, 42),
"$clusterTime" : {
"clusterTime" : Timestamp(1589320193, 42),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
Note
As a test, you could use the range-based approach. In that case, instead of use{ title: "hashed" }
you can set the value for the field to 1, like this:{ title: 1 }
exit
to exit this container before continuing.Import sample data
books.json
: curl https://gist.githubusercontent.com/peterj/a2eca621bd7f5253918cafdc00b66a33/raw/b1a7b9a6a151b2c25d31e552350de082de65154b/books.json --output books.json
books.json
file, we can use the mongoimport
command on the router
container to import the data into the books
collection.docker-compose exec router mongoimport --db test --collection books --file books.json
2020-05-12T22:01:58.469+0000 connected to: mongodb://localhost/
2020-05-12T22:01:58.529+0000 431 document(s) imported successfully. 0 document(s) failed to import.
Check sharding distribution
router
container:docker-compose exec router mongo
db.books.getShardDistribution()
. The output shows the details per each shard (how much data and documents) as well as the totals:2020-05-12T21:31:15.867+0000 I CONTROL [main]
mongos> db.books.getShardDistribution()
Shard shard1 at shard1/shard1:27018
data : 152KiB docs : 135 chunks : 2
estimated data per chunk : 76KiB
estimated docs per chunk : 67
Shard shard3 at shard3/shard3:27020
data : 164KiB docs : 142 chunks : 2
estimated data per chunk : 82KiB
estimated docs per chunk : 71
Shard shard2 at shard2/shard2:27019
data : 188KiB docs : 154 chunks : 2
estimated data per chunk : 94KiB
estimated docs per chunk : 77
Totals
data : 505KiB docs : 431 chunks : 6
Shard shard1 contains 30.19% data, 31.32% docs in cluster, avg obj size on shard : 1KiB
Shard shard3 contains 32.49% data, 32.94% docs in cluster, avg obj size on shard : 1KiB
Shard shard2 contains 37.31% data, 35.73% docs in cluster, avg obj size on shard : 1KiB
exit
to exit out of the container. To stop running all containers, you can type docker-compose stop
.