How to keep your XHGui (MongoDB) database in check

We installed XHGui on a client’s VM. It profiles only 1 in 100 requests, but the MongoDB database still grew rapidly, so we started looking into how to keep it in check.

The first step is to gather some data about our database (and its collection):

# mongo
MongoDB shell version: 2.6.5
connecting to: test
> show dbs
local   0.078GB
xhprof  3.952GB

> use xhprof
switched to db xhprof

> show collections
results
system.indexes

> db.results.find().count()
48417

We’re dealing with about 48k records, but many of them are useless: profiles of scripts that are already very fast and not worth the devs’ attention. So the first thing we wanted to do was remove most of that “noise” from our collection.

The document has this format:

> db.results.findOne()
{
	"_id" : ObjectId("547ef32ec2f6cd9253ec1053"),
	"profile" : {
		[...]
		"main()==>{closure}" : {
			"ct" : NumberLong(1),
			"wt" : NumberLong(27),
			"cpu" : NumberLong(0),
			"mu" : NumberLong(2952),
			"pmu" : NumberLong(0)
		},
		"main()" : {
			"ct" : NumberLong(1),
			"wt" : NumberLong(247),
			"cpu" : NumberLong(0),
			"mu" : NumberLong(8232),
			"pmu" : NumberLong(4880)
		}
	},
	[...]
}

So we want to filter by the wt (wall time, in microseconds) of main():

> db.results.find().count()
48417
> db.results.find({"profile.main().wt" : {$lt: 1000000}}).count()
48129
> db.results.find({"profile.main().wt" : {$lt: 500000}}).count()
47185
> db.results.find({"profile.main().wt" : {$lt: 100000}}).count()
41694
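
Writing thresholds as raw microsecond counts is easy to get wrong by a zero, so a small helper can build the filter from milliseconds instead. This is only a sketch in plain JavaScript: fastRunsFilter and countBelow are hypothetical names, the sample documents are made up, and countBelow works on an in-memory array purely to illustrate what the query matches.

```javascript
// Build the filter used above from a threshold in milliseconds.
// Assumption: wt is stored in microseconds, as the counts above imply.
function fastRunsFilter(thresholdMs) {
  return { "profile.main().wt": { $lt: thresholdMs * 1000 } };
}

// Illustration only: apply the same "less than" test to plain objects
// shaped like the documents in the results collection.
function countBelow(docs, thresholdMs) {
  var limit = thresholdMs * 1000;
  return docs.filter(function (doc) {
    return doc.profile["main()"].wt < limit;
  }).length;
}

// Made-up sample documents (not taken from the real collection).
var sample = [
  { profile: { "main()": { wt: 247 } } },
  { profile: { "main()": { wt: 620000 } } },
  { profile: { "main()": { wt: 1500000 } } }
];

console.log(JSON.stringify(fastRunsFilter(500)));
console.log(countBelow(sample, 500));
console.log(countBelow(sample, 1000));
```

Inside the mongo shell, the first function could be passed straight to find, for example db.results.find(fastRunsFilter(500)).count().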

As you can see, the vast majority of the results are below 100 milliseconds of wall time. Since we’re not interested in that many results and wanted to trim the database a bit, we made a backup and then removed everything faster than 500 milliseconds:

> db.results.remove({"profile.main().wt" : {$lt: 500000}})
WriteResult({ "nRemoved" : 47234 })
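
Since remove is destructive, it can help to count what a filter matches before deleting anything. A minimal sketch, assuming a collection object that exposes count(query) and remove(query) the way db.results does in the mongo shell. Both trimFastRuns and the stub collection are hypothetical and exist only so the example runs outside the shell.

```javascript
// Count first, delete only when explicitly asked to.
// `coll` is assumed to behave like the mongo shell's db.results.
function trimFastRuns(coll, thresholdUs, reallyDelete) {
  var query = { "profile.main().wt": { $lt: thresholdUs } };
  var matched = coll.count(query);
  if (!reallyDelete) {
    return { matched: matched, removed: 0 };
  }
  var result = coll.remove(query);
  return { matched: matched, removed: result.nRemoved };
}

// Stand-in for a real collection (hypothetical, for illustration):
// it stores only the main() wall times, in microseconds.
function stubCollection(wallTimes) {
  function isFast(query) {
    var limit = query["profile.main().wt"].$lt;
    return function (wt) { return wt < limit; };
  }
  return {
    count: function (query) {
      return wallTimes.filter(isFast(query)).length;
    },
    remove: function (query) {
      var before = wallTimes.length;
      var fast = isFast(query);
      wallTimes = wallTimes.filter(function (wt) { return !fast(wt); });
      return { nRemoved: before - wallTimes.length };
    },
    size: function () { return wallTimes.length; }
  };
}

var coll = stubCollection([247, 90000, 620000, 1500000]);
console.log(JSON.stringify(trimFastRuns(coll, 500000, false))); // dry run
console.log(JSON.stringify(trimFastRuns(coll, 500000, true)));  // real delete
```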

At this point, the database stats looked a lot more interesting:

> db.stats()
{
	"db" : "xhprof",
	"collections" : 3,
	"objects" : 1846,
	"avgObjSize" : 65844.85373781149,
	"dataSize" : 121549600,
	"storageSize" : 3583025152,
	"numExtents" : 24,
	"indexes" : 6,
	"indexSize" : 703136,
	"fileSize" : 4226809856,
	"nsSizeMB" : 16,
	"dataFileVersion" : {
		"major" : 4,
		"minor" : 5
	},
	"extentFreeList" : {
		"num" : 0,
		"totalSize" : 0
	},
	"ok" : 1
}

The dataSize field is the size of the actual data. The storageSize field is the space MongoDB has allocated for that data, while fileSize is the on-disk size of the database files (including indexes, etc.).
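
To put a number on the gap between those fields, here is a rough sketch that estimates how much of fileSize is not live data or indexes. The function name storageSummary and its output fields are our own invention, and the result is only an estimate, since preallocated and namespace space is not broken out separately.

```javascript
// Rough estimate of how much of the on-disk size is not live data.
// Field names are those printed by db.stats(); sizes are in bytes.
function storageSummary(stats) {
  var live = stats.dataSize + stats.indexSize;
  return {
    liveBytes: live,
    slackBytes: stats.fileSize - live,
    liveShare: live / stats.fileSize
  };
}

// Values copied from the db.stats() output above.
var afterRemove = {
  dataSize: 121549600,
  storageSize: 3583025152,
  indexSize: 703136,
  fileSize: 4226809856
};

var summary = storageSummary(afterRemove);
console.log(summary.slackBytes + " bytes are neither data nor indexes");
console.log((summary.liveShare * 100).toFixed(1) + "% of fileSize is data or indexes");
```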

As the MongoDB Storage FAQ explains, Mongo allocates data files of increasing size: the first is 64 MB, the second 128 MB and so on, doubling up to 2 GB. Each file after that is 2 GB in size, as you can see in this output:

[root@ws ~]# ls -lh /var/lib/mongo/
totale 4,1G
drwxr-xr-x 2 mongod mongod 4,0K  3 dic 12:19 journal
-rw------- 1 mongod mongod  64M 23 nov 22:44 local.0
-rw------- 1 mongod mongod  16M 23 nov 22:44 local.ns
-rwxr-xr-x 1 mongod mongod    6 23 nov 22:44 mongod.lock
drwxr-xr-x 2 mongod mongod 4,0K 29 nov 23:12 _tmp
-rw------- 1 mongod mongod  64M  3 dic 12:38 xhprof.0
-rw------- 1 mongod mongod 128M  3 dic 12:38 xhprof.1
-rw------- 1 mongod mongod 256M  3 dic 12:38 xhprof.2
-rw------- 1 mongod mongod 512M  3 dic 12:38 xhprof.3
-rw------- 1 mongod mongod 1,0G  3 dic 12:38 xhprof.4
-rw------- 1 mongod mongod 2,0G  3 dic 12:38 xhprof.5
-rw------- 1 mongod mongod  16M  3 dic 12:37 xhprof.ns
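
The doubling scheme is easy to reproduce, which helps when guessing how big the next file will be. A small sketch, assuming exactly the sizes described above (start at 64 MB, double each time, cap at 2048 MB). The function name dataFileSizesMB is hypothetical.

```javascript
// Sizes (in MB) of the first `count` data files for one database,
// assuming the doubling scheme described above: start at 64 MB,
// double each time, cap at 2048 MB.
function dataFileSizesMB(count) {
  var sizes = [];
  var size = 64;
  for (var i = 0; i < count; i++) {
    sizes.push(size);
    size = Math.min(size * 2, 2048);
  }
  return sizes;
}

// The listing above shows six data files: xhprof.0 to xhprof.5.
var sizes = dataFileSizesMB(6);
var total = sizes.reduce(function (sum, mb) { return sum + mb; }, 0);
console.log(sizes.join(", "));
console.log(total + " MB in total");
```

For six files this adds up to 4032 MB, in line with the roughly 4G reported by ls.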

To reclaim the space used by empty chunks, you’ll need to repair the database or drop it altogether and rebuild it from a backup.

WARNING: the db repair operation is BLOCKING, and while it runs it needs about twice the space taken by the files on disk (fileSize: in my case, about 4G), so make sure that much free space is available first.

After removing the records and performing the repair, we went down from 4 GB to about 200 MB:

> db.repairDatabase()
{ "ok" : 1 }
> db.stats()
{
	"db" : "xhprof",
	"collections" : 3,
	"objects" : 2079,
	"avgObjSize" : 65585.81625781626,
	"dataSize" : 136352912,
	"storageSize" : 167763968,
	"numExtents" : 13,
	"indexes" : 6,
	"indexSize" : 490560,
	"fileSize" : 201326592,
	"nsSizeMB" : 16,
	"dataFileVersion" : {
		"major" : 4,
		"minor" : 5
	},
	"extentFreeList" : {
		"num" : 0,
		"totalSize" : 0
	},
	"ok" : 1
}

A different approach would be to back up the database, drop it altogether and restore it from the backup. WARNING: this one requires downtime, too!

# echo "db.results.remove({\"profile.main().wt\" : {\$lt: 500000}})" | mongo xhprof
# mongodump -d xhprof
# echo 'db.dropDatabase()' | mongo xhprof
# sync
# mongorestore dump/xhprof

Note: MongoDB has a compact command for collections that removes data fragmentation (see the MongoDB documentation). This command does NOT free up disk space (the fileSize will stay the same):

> db.runCommand({ compact : 'results' })
{ "ok" : 1 }
> db.stats()
{
	"db" : "xhprof",
	"collections" : 3,
	"objects" : 2079,
	"avgObjSize" : 65585.81625781626,
	"dataSize" : 136352912,
	"storageSize" : 167759872,
	"numExtents" : 13,
	"indexes" : 6,
	"indexSize" : 490560,
	"fileSize" : 4226809856,
	"nsSizeMB" : 16,
	"dataFileVersion" : {
		"major" : 4,
		"minor" : 5
	},
	"extentFreeList" : {
		"num" : 24,
		"totalSize" : 3438862336
	},
	"ok" : 1
}
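
The extentFreeList section hints at where the space went: after compact, the extents appear to sit on a free list that MongoDB can reuse, while the files themselves keep their size. A short sketch to express that as a share of fileSize. The function name freeListShare is hypothetical.

```javascript
// Share of the data files sitting on the extent free list.
// Field names are those printed by db.stats(); sizes are in bytes.
function freeListShare(stats) {
  return stats.extentFreeList.totalSize / stats.fileSize;
}

// Values copied from the db.stats() output above (after compact).
var afterCompact = {
  storageSize: 167759872,
  fileSize: 4226809856,
  extentFreeList: { num: 24, totalSize: 3438862336 }
};

console.log((freeListShare(afterCompact) * 100).toFixed(1) + "% of fileSize is free extents");
```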
