Definition

A sharded cluster is a set of Replica Sets (shards) whose function is to evenly distribute the workload, allowing us to scale our applications horizontally in order to work with large amounts of data. MongoDB uses sharding to support deployments with very large data sets.

Advantages of a sharded cluster

The main features that a sharded cluster brings us are the following:

  • Scalability: from an isolated server to distributed architectures of large clusters. It allows us to partition our database, transparently for the user, into as many shards as we have. This increases the efficiency of data processing, since each shard is smaller than the original database.
  • Automatic load balancing across the different shards. The balancer decides when to migrate data, and to which shard, so that it is evenly distributed among all the servers of the cluster. Each shard hosts the data corresponding to a range of the key chosen to partition our collection (see the sketch below).
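
As a preview of how a collection is partitioned (the subject of a later article), here is a minimal sketch run from a mongos shell; the database mydb, the collection users and the shard key user_id are hypothetical:

mongos> sh.enableSharding("mydb")                           // allow mydb to be partitioned
mongos> sh.shardCollection("mydb.users", { "user_id" : 1 }) // partition users by ranges of user_id

From that point on, the balancer splits mydb.users into chunks by ranges of user_id and migrates them between the shards as the data grows.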

Configuration of our Sharded Cluster

This is the configuration that our cluster will have:

  • Three config servers (the minimum for production): their function is to tell mongos which Replica Set, or shard, it has to route client requests to. They run on the machine: horla-mongodb
  • Two Replica Sets (a and b) of three nodes each (enough to show how the cluster is configured). Each Replica Set runs on a different machine: the first on horla-mongodb and the second on horla-mongodb2.
  • Two mongos (they route the requests from the clients to the shard that contains the requested information). They will run on the machine: horla-mongodb

Directory creation

Before we start the services we must create the directories where we will store our data (the creation commands are sketched after this list). We will use the following:

  • cfg0, cfg1 and cfg2 for each config server
  • a0, a1 and a2 for each of the three nodes of the first Replica Set
  • b0, b1 and b2 for each of the three nodes of the second Replica Set
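
For example, they can be created like this, assuming the base path /var/lib/mongodb/shardedcluster used throughout this article (adjust ownership to whatever user runs mongod):

# On horla-mongodb (config servers and first Replica Set)
mkdir -p /var/lib/mongodb/shardedcluster/{cfg0,cfg1,cfg2,a0,a1,a2}

# On horla-mongodb2 (second Replica Set)
mkdir -p /var/lib/mongodb/shardedcluster/{b0,b1,b2}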

Order

We will follow this order to start the instances that make up our sharded cluster:

  1. Config servers
  2. Mongods
  3. Mongos

Config servers

We start the config servers:

horla@horla-mongodb:/var/lib/mongodb/shardedcluster$ mongod --configsvr --port 26050 --logpath /var/lib/mongodb/shardedcluster/log.cfg0 --logappend --dbpath /var/lib/mongodb/shardedcluster/cfg0 --fork
about to fork child process, waiting until server is ready for connections.
forked process: 5453
child process started successfully, parent exiting
horla@horla-mongodb:/var/lib/mongodb/shardedcluster$ mongod --configsvr --port 26051 --logpath /var/lib/mongodb/shardedcluster/log.cfg1 --logappend --dbpath /var/lib/mongodb/shardedcluster/cfg1 --fork
about to fork child process, waiting until server is ready for connections.
forked process: 5469
child process started successfully, parent exiting
horla@horla-mongodb:/var/lib/mongodb/shardedcluster$ mongod --configsvr --port 26052 --logpath /var/lib/mongodb/shardedcluster/log.cfg2 --logappend --dbpath /var/lib/mongodb/shardedcluster/cfg2 --fork
about to fork child process, waiting until server is ready for connections.
forked process: 5498
child process started successfully, parent exiting
horla@horla-mongodb:/var/lib/mongodb/shardedcluster$

By means of the --configsvr option we indicate that the service will be a config server. As we can see, we are using the horla-mongodb machine to host all three config servers, which is why we use a different port for each one. If they were on different machines, it would not be necessary to specify a port at all.
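
In that case each config server would fall back to the default port that --configsvr sets (27019). A minimal sketch of what the command would look like on a dedicated machine (the dbpath and logpath shown are hypothetical):

mongod --configsvr --dbpath /var/lib/mongodb/cfg --logpath /var/lib/mongodb/log.cfg --logappend --fork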

Mongods

We will now start the services corresponding to the six shard nodes. The first three will form the first Replica Set (shard) and run on one machine. The last three will form the second shard and run on the other machine.

In the command we are going to use we have to indicate that the mongod will be part of a shard in a partitioned cluster. This is achieved through the --shardsvr option. The default port for these instances is 27018.

horla@horla-mongodb:/var/lib/mongodb/shardedcluster$ mongod --shardsvr --replSet a --dbpath /var/lib/mongodb/shardedcluster/a0 --logpath /var/lib/mongodb/shardedcluster/log.a0 --port 27000 --fork --logappend --smallfiles --oplogSize 50
about to fork child process, waiting until server is ready for connections.
forked process: 5515
child process started successfully, parent exiting
horla@horla-mongodb:/var/lib/mongodb/shardedcluster$ mongod --shardsvr --replSet a --dbpath /var/lib/mongodb/shardedcluster/a1 --logpath /var/lib/mongodb/shardedcluster/log.a1 --port 27001 --fork --logappend --smallfiles --oplogSize 50
about to fork child process, waiting until server is ready for connections.
forked process: 5562
child process started successfully, parent exiting
horla@horla-mongodb:/var/lib/mongodb/shardedcluster$ mongod --shardsvr --replSet a --dbpath /var/lib/mongodb/shardedcluster/a2 --logpath /var/lib/mongodb/shardedcluster/log.a2 --port 27002 --fork --logappend --smallfiles --oplogSize 50
about to fork child process, waiting until server is ready for connections.
forked process: 5609
child process started successfully, parent exiting
horla@horla-mongodb:/var/lib/mongodb/shardedcluster$

Now we go with the second shard, which will be hosted on the other machine:

horla@horla-mongodb2:/var/lib/mongodb/shardedcluster$ mongod --shardsvr --replSet b --dbpath /var/lib/mongodb/shardedcluster/b0 --logpath /var/lib/mongodb/shardedcluster/log.b0 --port 27000 --fork --logappend --smallfiles --oplogSize 50
about to fork child process, waiting until server is ready for connections.
forked process: 5658
child process started successfully, parent exiting
horla@horla-mongodb2:/var/lib/mongodb/shardedcluster$ mongod --shardsvr --replSet b --dbpath /var/lib/mongodb/shardedcluster/b1 --logpath /var/lib/mongodb/shardedcluster/log.b1 --port 27001 --fork --logappend --smallfiles --oplogSize 50
about to fork child process, waiting until server is ready for connections.
forked process: 5706
child process started successfully, parent exiting
horla@horla-mongodb2:/var/lib/mongodb/shardedcluster$ mongod --shardsvr --replSet b --dbpath /var/lib/mongodb/shardedcluster/b2 --logpath /var/lib/mongodb/shardedcluster/log.b2 --port 27002 --fork --logappend --smallfiles --oplogSize 50
about to fork child process, waiting until server is ready for connections.
forked process: 5754
child process started successfully, parent exiting
horla@horla-mongodb2:/var/lib/mongodb/shardedcluster$

Mongos

Except for administration tasks, we will never connect directly to a mongod or a config server. The applications will connect to the mongos, which are in charge of routing the requests to the corresponding shards. This is why we are specifying non-standard ports on all the other instances. The only gateway to the database has to be a mongos.
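
For instance, an application could list both mongos in a standard connection string, so that either one can serve its requests; a sketch, assuming the ports we configure below and a hypothetical database mydb:

mongodb://horla-mongodb:27017,horla-mongodb:26061/mydb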

This is how we start the mongos:

horla@horla-mongodb:/var/lib/mongodb/shardedcluster$ mongos --configdb horla-mongodb:26050,horla-mongodb:26051,horla-mongodb:26052 --fork --logappend --logpath /var/lib/mongodb/shardedcluster/log.mongos0
about to fork child process, waiting until server is ready for connections.
forked process: 5832
child process started successfully, parent exiting
horla@horla-mongodb:/var/lib/mongodb/shardedcluster$ mongos --configdb horla-mongodb:26050,horla-mongodb:26051,horla-mongodb:26052 --fork --logappend --logpath /var/lib/mongodb/shardedcluster/log.mongos1 --port 26061
about to fork child process, waiting until server is ready for connections.
forked process: 5894
child process started successfully, parent exiting
horla@horla-mongodb:/var/lib/mongodb/shardedcluster$

As we can see, since no port is specified for the first one, that mongos uses the default port (27017).
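
To open a shell against a specific mongos we simply indicate its port; for example, for the second one:

mongo --port 26061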

Check our services

Let’s now check that we have all instances running:

horla@horla-mongodb:/var/lib/mongodb/shardedcluster$ ps -ef | grep mongo
root 5453 2634 0 11:08 ? 00:00:10 mongod --configsvr --port 26050 --logpath /var/lib/mongodb/shardedcluster/log.cfg0 --logappend --dbpath /var/lib/mongodb/shardedcluster/cfg0 --fork
root 5469 2634 0 11:08 ? 00:00:10 mongod --configsvr --port 26051 --logpath /var/lib/mongodb/shardedcluster/log.cfg1 --logappend --dbpath /var/lib/mongodb/shardedcluster/cfg1 --fork
root 5498 2634 0 11:09 ? 00:00:10 mongod --configsvr --port 26052 --logpath /var/lib/mongodb/shardedcluster/log.cfg2 --logappend --dbpath /var/lib/mongodb/shardedcluster/cfg2 --fork
root 5515 2634 0 11:10 ? 00:00:10 mongod --shardsvr --replSet a --dbpath /var/lib/mongodb/shardedcluster/a0 --logpath /var/lib/mongodb/shardedcluster/log.a0 --port 27000 --fork --logappend --smallfiles --oplogSize 50
root 5562 2634 0 11:10 ? 00:00:10 mongod --shardsvr --replSet a --dbpath /var/lib/mongodb/shardedcluster/a1 --logpath /var/lib/mongodb/shardedcluster/log.a1 --port 27001 --fork --logappend --smallfiles --oplogSize 50
root 5609 2634 0 11:11 ? 00:00:09 mongod --shardsvr --replSet a --dbpath /var/lib/mongodb/shardedcluster/a2 --logpath /var/lib/mongodb/shardedcluster/log.a2 --port 27002 --fork --logappend --smallfiles --oplogSize 50
root 5832 2634 0 11:34 ? 00:00:00 mongos --configdb horla-mongodb:26050,horla-mongodb:26051,horla-mongodb:26052 --fork --logappend --logpath /var/lib/mongodb/shardedcluster/log.mongos0
root 5894 2634 0 11:34 ? 00:00:00 mongos --configdb horla-mongodb:26050,horla-mongodb:26051,horla-mongodb:26052 --fork --logappend --logpath /var/lib/mongodb/shardedcluster/log.mongos1 --port 26061
horla 5954 5196 0 11:35 pts/6 00:00:00 grep --color=auto mongo
horla@horla-mongodb:/var/lib/mongodb/shardedcluster$

And on the other machine we have the mongods corresponding to the second shard:

horla@horla-mongodb2:/var/lib/mongodb/shardedcluster$ ps -ef | grep mongo
root 5658 2634 0 11:12 ? 00:00:09 mongod --shardsvr --replSet b --dbpath /var/lib/mongodb/shardedcluster/b0 --logpath /var/lib/mongodb/shardedcluster/log.b0 --port 27000 --fork --logappend --smallfiles --oplogSize 50
root 5706 2634 0 11:12 ? 00:00:09 mongod --shardsvr --replSet b --dbpath /var/lib/mongodb/shardedcluster/b1 --logpath /var/lib/mongodb/shardedcluster/log.b1 --port 27001 --fork --logappend --smallfiles --oplogSize 50
root 5754 2634 0 11:12 ? 00:00:09 mongod --shardsvr --replSet b --dbpath /var/lib/mongodb/shardedcluster/b2 --logpath /var/lib/mongodb/shardedcluster/log.b2 --port 27002 --fork --logappend --smallfiles --oplogSize 50
horla@horla-mongodb2:/var/lib/mongodb/shardedcluster$

Replica Sets

Now we have to configure the two Replica Sets (as explained in the previous article). I only show how it is done for one of them because the process is identical for both:

horla@horla-mongodb:/var/lib/mongodb/replicaset$ mongo --port 27000
MongoDB shell version: 2.6.6
connecting to: 127.0.0.1:27000/test
Hello Horla. Have a good day!
> rs.initiate()
{
	"info2" : "no configuration explicitly specified -- making one",
	"me" : "horla-mongodb:27000",
	"info" : "Config now saved locally.  Should come online in about a minute.",
	"ok" : 1
}
a:PRIMARY> rs.add("horla-mongodb:27001")
{ "ok" : 1 }
a:PRIMARY> rs.add("horla-mongodb:27002")
{ "ok" : 1 }
a:PRIMARY>
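
Before moving on we can confirm that the Replica Set is healthy; rs.status() lists each member and its state (PRIMARY or SECONDARY). The same check would be run for Replica Set b on horla-mongodb2:

a:PRIMARY> rs.status()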

Cluster configuration

We just need to add our two shards to the cluster. We will do this from the first mongos. Before doing so, we can verify that our cluster does not yet contain any shards:

horla@horla-mongodb:/var/lib/mongodb/shardedcluster$ mongo
MongoDB shell version: 2.6.6
connecting to: test
Hello Horla. Have a good day!
mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
    "_id" : 1,
    "version" : 4,
    "minCompatibleVersion" : 4,
    "currentVersion" : 5,
    "clusterId" : ObjectId("548eb941260fb6e98e17d275")
  }
  shards:
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }

mongos>

We add our two shards to the cluster. It is enough to name a single member of each Replica Set; mongos discovers the remaining members automatically, as the sh.status() output further below shows:

mongos> sh.addShard("a/horla-mongodb:27000")
{ "shardAdded" : "a", "ok" : 1 }
mongos> sh.addShard("b/horla-mongodb2:27000")
{ "shardAdded" : "b", "ok" : 1 }
mongos>

Check the configuration of the cluster once the two shards have been added:

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
    "_id" : 1,
    "version" : 4,
    "minCompatibleVersion" : 4,
    "currentVersion" : 5,
    "clusterId" : ObjectId("548eb941260fb6e98e17d275")
  }
  shards:
    { "_id" : "a", "host" : "a/horla-mongodb:27000,horla-mongodb:27001,horla-mongodb:27002" }
    { "_id" : "b", "host" : "b/horla-mongodb2:27000,horla-mongodb2:27001,horla-mongodb2:27002" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }

mongos>

Balancer

This way we can verify that the balancer is active:

mongos> sh.getBalancerState()
true
mongos>
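
If we ever need to pause chunk migrations, for example during maintenance or a backup, the balancer can be toggled from any mongos; a quick sketch:

mongos> sh.setBalancerState(false)   // disable the balancer
mongos> sh.isBalancerRunning()       // confirm no migration is in progress
mongos> sh.setBalancerState(true)    // enable it again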

We have now finished configuring our sharded cluster. We leave the explanation of how to partition a collection for another article.

In the next article we will talk about how to upgrade a stand-alone node, a Replica Set and a Sharded Cluster.

Author

horla
