MongoDB Monitoring

What should be monitored?
CPU, memory, disk I/O, the application itself (MongoDB), processes (ps aux), and error logs


1 Ways to monitor a MongoDB cluster


db.serverStatus()
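Shows the running state of the instance (memory usage, locks, client connections, and so on).
Performance can be analyzed by comparing a snapshot taken earlier with one taken later.

"connections" # connections currently open to this instance, and how many more are available
"activeClients" # clients connected to this instance that are actively running operations (reported under "globalLock")
"locks" # lock statistics
"opcounters" # operation counts by type since the instance started
"opcountersRepl" # replicated operation counts
"storageEngine" # the storage engine in use
"mem" # memory usage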
MyMongo:PRIMARY> db.serverStatus().connections
{ "current" : 6, "available" : 813, "totalCreated" : 3428 }
MyMongo:PRIMARY> db.serverStatus().connections.available
813
MyMongo:PRIMARY> db.serverStatus().opcounters.insert
765275
MyMongo:PRIMARY> db.serverStatus().ok
1
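
The snapshot comparison mentioned above can be sketched roughly as follows. This is a minimal sketch, assuming two "opcounters" documents captured a known number of seconds apart; the function name opcounterRates and the sample numbers in before and after are hypothetical, not output from a real instance.

```javascript
// Compute per-second operation rates from two opcounters snapshots.
// `before` and `after` have the shape of db.serverStatus().opcounters;
// `elapsedSeconds` is the time between the two snapshots.
function opcounterRates(before, after, elapsedSeconds) {
  if (elapsedSeconds <= 0) {
    throw new Error("elapsedSeconds must be positive");
  }
  const rates = {};
  for (const op of Object.keys(after)) {
    const delta = after[op] - (before[op] || 0);
    // Counters restart from zero when mongod restarts;
    // report 0 instead of a negative rate in that case.
    rates[op] = delta >= 0 ? delta / elapsedSeconds : 0;
  }
  return rates;
}

// Hypothetical snapshots taken 10 seconds apart.
const before = { insert: 765275, query: 1265, update: 14, delete: 10, getmore: 175274, command: 712854 };
const after = { insert: 765375, query: 1285, update: 14, delete: 10, getmore: 175304, command: 712904 };
console.log(opcounterRates(before, after, 10));
```

In the mongo shell the two inputs would come from two calls to db.serverStatus().opcounters separated by a sleep; the arithmetic stays the same.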

db.stats()
Explanation of the output fields
MyMongo:PRIMARY> db.stats()
{
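"db" : "test",//the database this output describes, here "test"
"collections" : 11,//number of collections in the database; run show collections to list them
"views" : 0,
"objects" : 17902,//total number of documents across all collections in the database; treat it as approximate
"avgObjSize" : 34.015193833091274,//average document size in bytes (dataSize divided by objects)
"dataSize" : 608940,//total uncompressed size of the data in bytes; this is not the space occupied on disk
"storageSize" : 577536,//disk space allocated for document storage, in bytes. Under MMAPv1 this included space preallocated ahead of time to reduce disk pressure during heavy inserts; under WiredTiger, compression can make it smaller than dataSize, as it is here
"numExtents" : 0,
"indexes" : 14,//total number of indexes across all collections in the database
"indexSize" : 385024,//disk space used by indexes, in bytes
"fsUsedSize" : 27890081792,//space in use, in bytes, on the filesystem where MongoDB stores its data
"fsTotalSize" : 52844687360,//total capacity, in bytes, of the filesystem where MongoDB stores its data (not a per-database preallocation)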
"ok" : 1,
"operationTime" : Timestamp(1521048613, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1521048613, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
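
The byte counts above are easier to read once converted. The following is a minimal sketch, assuming a plain object with the same field names as the db.stats() output; the helper name summarizeDbStats is hypothetical.

```javascript
// Convert the size fields of a db.stats() document to MiB and
// compute how full the underlying filesystem is.
function summarizeDbStats(stats) {
  const MiB = 1024 * 1024;
  return {
    dataMiB: stats.dataSize / MiB,
    storageMiB: stats.storageSize / MiB,
    indexMiB: stats.indexSize / MiB,
    // Fraction of the filesystem holding the data files that is in use.
    fsUsedRatio: stats.fsUsedSize / stats.fsTotalSize,
  };
}

// Values copied from the db.stats() output shown above.
const sampleStats = {
  dataSize: 608940,
  storageSize: 577536,
  indexSize: 385024,
  fsUsedSize: 27890081792,
  fsTotalSize: 52844687360,
};
console.log(summarizeDbStats(sampleStats));
```

A ratio such as fsUsedRatio is a natural candidate for a disk-space alert, since it describes the whole filesystem rather than a single database.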

2 mongostat


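mongostat shows real-time database status: reads and writes, locking, index hits, page faults, read/write queue lengths, and so on.
It refreshes once per second in a readable layout, and these values give a picture of overall MongoDB performance. (mongotop, also run below, reports the time spent per namespace.)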
/usr/local/mongodb/bin/mongotop -h 192.168.20.118:28002
/usr/local/mongodb/bin/mongostat -h 192.168.20.118:28002
[mongodb@hongquan1 conf]$ /usr/local/mongodb/bin/mongostat -h 192.168.20.118:28002
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
*0 *0 *0 *0 0 2|0 0.2% 52.4% 0 1.76G 320M 0|0 1|0 425b 60.4k 7 MyMongo PRI Mar 15 01:36:17.947
*0 *0 *0 *0 0 2|0 0.2% 52.4% 0 1.76G 320M 0|0 1|0 1.29k 60.7k 7 MyMongo PRI Mar 15 01:36:18.951
*0 *0 *0 *0 0 3|0 0.2% 52.4% 0 1.76G 320M 0|0 1|0 427b 60.6k 7 MyMongo PRI Mar 15 01:36:19.951
*0 *0 *0 *0 0 2|0 0.2% 52.4% 0 1.76G 320M 0|0 1|0 159b 60.4k 7 MyMongo PRI Mar 15 01:36:20.943
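insert: inserts per second
query: queries per second
update: updates per second
delete: deletes per second
conn: current number of connections
qr|qw (the qrw column above): lengths of the queues of clients waiting to read | write. Ideally 0; a backlog means the database is processing requests slowly.
ar|aw (the arw column above): number of active clients reading | writing
time: current time
The insert, query, update, delete, getmore, and command columns count how often each kind of operation occurred; a * prefix marks replicated operations. The faults column, which does not appear in the output above, is the number of page faults, that is, accesses to data that was not resident in memory. The lower the better, ideally no more than 100.
Adding --discover to mongostat shows the status of every member of a replica set or sharded cluster.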


[root@mysqlt1 ~]# /usr/local/mongodb/bin/mongostat --host=10.15.7.114 --port=28004 --discover

host insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
10.15.7.114:28001 *0 *0 *0 *0 0 3|0 0.1% 0.3% 0 1.44G 51.0M 0|0 1|0 668b 60.2k 8 MyMongo SEC Oct 11 09:40:43.487
10.15.7.114:28002 *0 *0 *0 *0 0 3|0 0.1% 0.2% 0 1.44G 49.0M 0|0 1|0 666b 60.1k 7 MyMongo SEC Oct 11 09:40:43.489
10.15.7.114:28004 *0 *0 *0 *0 0 3|0 0.1% 0.2% 0 1.53G 54.0M 0|0 1|0 664b 60.2k 13 MyMongo PRI Oct 11 09:40:43.487
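
The qrw and arw columns pack two numbers into one "read|write" string. The following is a minimal sketch of how a script could check them for a backlog, assuming the column value has already been cut out of a mongostat line; the function names parsePair and hasQueuedOperations are hypothetical.

```javascript
// Split a mongostat "read|write" pair such as "0|0" into two numbers.
function parsePair(value) {
  const [read, write] = value.split("|").map(Number);
  return { read, write };
}

// True when any reader or writer is waiting in the queue.
function hasQueuedOperations(qrw) {
  const { read, write } = parsePair(qrw);
  return read > 0 || write > 0;
}

console.log(hasQueuedOperations("0|0")); // false
console.log(hasQueuedOperations("3|1")); // true
```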

[mongodb@hongquan1 conf]$ /usr/local/mongodb/bin/mongotop -h 192.168.20.118:28002
2018-03-15T01:38:34.250+0800 connected to: 192.168.20.118:28002

ns total read write 2018-03-15T01:38:35+08:00
admin.system.keys 0ms 0ms 0ms
admin.system.roles 0ms 0ms 0ms
admin.system.users 0ms 0ms 0ms
admin.system.version 0ms 0ms 0ms
admin.tempusers 0ms 0ms 0ms
config.system.sessions 0ms 0ms 0ms
config.transactions 0ms 0ms 0ms
local.a 0ms 0ms 0ms
local.me 0ms 0ms 0ms
local.oplog.rs 0ms 0ms 0ms
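ns: the namespace, which combines the database name and the collection name.
total: total time mongod spent operating on this namespace.
read: time mongod spent performing read operations on this namespace.
write: time mongod spent performing write operations on this namespace.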

3 Database-level commands


db.currentOp()
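Shows the operations the database is currently executing.
Useful for finding long-running operations.
The output can be filtered by conditions such as running time, operation type, locks held, and time spent waiting for locks.
If an operation runs so long that it stalls the database, it can be killed by its opid: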
> db.killOp(608605)
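
Picking out long-running operations from the db.currentOp() result can be sketched like this. It is a minimal sketch, assuming the result document carries an inprog array whose entries have opid, op, ns, active, and secs_running fields; the function name findLongRunningOps and the sample entries are hypothetical.

```javascript
// Return the active operations that have been running for at least
// `minSeconds`, keeping only the fields needed to decide on a killOp.
function findLongRunningOps(currentOpResult, minSeconds) {
  return currentOpResult.inprog
    .filter((op) => op.active && op.secs_running >= minSeconds)
    .map((op) => ({
      opid: op.opid,
      op: op.op,
      ns: op.ns,
      secs_running: op.secs_running,
    }));
}

// Hypothetical db.currentOp() result with one slow query.
const sampleCurrentOp = {
  inprog: [
    { opid: 608605, op: "query", ns: "test.orders", active: true, secs_running: 42 },
    { opid: 608611, op: "insert", ns: "test.orders", active: true, secs_running: 0 },
    { opid: 608612, op: "none", ns: "", active: false },
  ],
};
console.log(findLongRunningOps(sampleCurrentOp, 5));
```

In the mongo shell the first argument would be db.currentOp(); each returned opid should be reviewed before it is passed to db.killOp().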

db.setProfilingLevel()
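Configures the database profiler, which records slow operations.
Profiling levels:
0: record nothing
1: record slow operations only
2: record all operations
Note: the level applies to the current database, while the slow-operation threshold (slowms) is global to the instance.
Enable profiling and check its status: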
> use test
> db.setProfilingLevel(2);
> db.getProfilingLevel()
> db.system.profile.find().sort({$natural:-1})
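ts: timestamp of the operation
info: details of the operation
millis: time the operation took, in milliseconds

Slow queries are recorded in the system.profile collection.
Turn off profiling: db.setProfilingLevel(0)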
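
Once profile entries have been collected, they can be condensed to the worst case per namespace. The following is a minimal sketch, assuming an array of documents shaped like system.profile entries with op, ns, and millis fields; the function name slowestByNamespace, the sample entries, and the 100 ms threshold are hypothetical choices for illustration.

```javascript
// For each namespace, keep the slowest operation whose duration is
// at least `thresholdMillis`.
function slowestByNamespace(profileDocs, thresholdMillis) {
  const worst = {};
  for (const doc of profileDocs) {
    if (doc.millis < thresholdMillis) continue;
    if (!(doc.ns in worst) || doc.millis > worst[doc.ns].millis) {
      worst[doc.ns] = { op: doc.op, millis: doc.millis };
    }
  }
  return worst;
}

// Hypothetical entries as they might come back from db.system.profile.find().
const sampleProfile = [
  { op: "query", ns: "test.orders", millis: 250 },
  { op: "update", ns: "test.orders", millis: 900 },
  { op: "query", ns: "test.users", millis: 20 },
  { op: "insert", ns: "test.logs", millis: 130 },
];
console.log(slowestByNamespace(sampleProfile, 100));
```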
Enterprise tool: Ops Manager. Official documentation: https://docs.opsmanager.mongodb.com/v3.6/

-------------Monitoring MongoDB with Zabbix
/usr/local/zabbix/bin/zabbix_get -s 127.0.0.1 -p 10050 -k "MongoDB.Status[connections,current]"
/usr/local/zabbix/bin/zabbix_get -s 127.0.0.1 -p 10050 -k "MongoDB.Status[backgroundFlushing,last_ms]"
/usr/local/zabbix/bin/zabbix_get -s 127.0.0.1 -p 10050 -k "net.tcp.port[,28002]"
/usr/local/zabbix/bin/zabbix_get -s 127.0.0.1 -p 10050 -k "MongoDB.Status[uptime]"
/usr/local/zabbix/bin/zabbix_get -s 127.0.0.1 -p 10050 -k "MongoDB.Status[opcounters,insert]"
/usr/local/zabbix/bin/zabbix_get -s 127.0.0.1 -p 10050 -k "net.tcp.listen[28002]"
/usr/local/zabbix/bin/zabbix_get -s 127.0.0.1 -p 10050 -k "MongoDB.Status[opcountersRepl,command]"

/usr/local/zabbix/bin/zabbix_get -s 127.0.0.1 -p 10050 -k "MongoDB.Status[repl,hosts]"

UserParameter=MongoDB.Status[*],/bin/echo "db.serverStatus().$1" | /usr/local/mongodb/bin/mongo 192.168.20.118:28002/admin | grep "$2" | awk -F ':' '{print $$2}' | awk -F ',' '{print $$1}'
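
To see what the grep and awk stages of this UserParameter do to the shell output, the same text processing can be mirrored in a few lines. This is a minimal sketch, assuming the mongo shell output is available as a string; the function name extractField is hypothetical, and unlike the pipeline it trims whitespace and returns only the first matching line.

```javascript
// Mirror of: grep "$2" | awk -F ':' '{print $2}' | awk -F ',' '{print $1}'
function extractField(shellOutput, field) {
  for (const line of shellOutput.split("\n")) {
    if (!line.includes(field)) continue;     // grep "$2"
    const afterColon = line.split(":")[1];   // awk -F ':' '{print $2}'
    if (afterColon === undefined) continue;
    return afterColon.split(",")[0].trim();  // awk -F ',' '{print $1}'
  }
  return null;
}

// The one-line form printed for db.serverStatus().connections earlier.
const oneLine = '{ "current" : 6, "available" : 813, "totalCreated" : 3428 }';
console.log(extractField(oneLine, "current"));   // "6"
console.log(extractField(oneLine, "available")); // also "6"
```

The second call shows a limitation of this approach: when the shell prints a sub-document on a single line, the pipeline returns the first value on that line no matter which field name matched. It only selects the right value when each field is printed on its own line.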

MyMongo:PRIMARY> db.serverStatus()
{
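"host" : "hongquan1:28002",//server hostname and port
"version" : "3.6.3",//MongoDB version
"process" : "mongod",//process name
"pid" : NumberLong(3624),//process ID; compare with $ ps -ef|grep mongodb
"uptime" : 330756,//time the process has been running, in seconds
"uptimeMillis" : NumberLong(330756195),
"uptimeEstimate" : NumberLong(330756),//uptime in seconds, based on MongoDB's internal coarse-grained timer
"localTime" : ISODate("2018-03-15T18:18:26.916Z"),//current time according to the server
"asserts" : {
"regular" : 0,//regular assertions raised since the server started (an assert is similar to exception handling)
"warning" : 0,//warnings raised since the server started
"msg" : 0,//message assertions: internal server errors with a well-defined text string
"user" : 197807,//user assertions: errors caused by user operations, such as a full disk or a duplicate key
"rollovers" : 0//number of times the assert counters have rolled over since the server started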
},
"connections" : { //connection counts
"current" : 6,
"available" : 813,
"totalCreated" : 33997
},
"extra_info" : {
"note" : "fields vary by platform",
"page_faults" : 128//total number of page faults for this process; Linux only
},
"globalLock" : {
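"totalTime" : NumberLong("330756195000"),//time since the global lock was created at startup, in microseconds
"currentQueue" : {
"total" : 0,//operations queued waiting for the global lock
"readers" : 0,//operations queued waiting for the read lock
"writers" : 0//operations queued waiting for the write lock
},
"activeClients" : {
"total" : 27,//active client connections to the server
"readers" : 0,//active clients performing read operations
"writers" : 0//active clients performing write operations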
}
},
"locks" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(8911776),
"w" : NumberLong(1840369),
"W" : NumberLong(669)
},
"acquireWaitCount" : {
"r" : NumberLong(44),
"w" : NumberLong(1),
"W" : NumberLong(10)
},
"timeAcquiringMicros" : {
"r" : NumberLong(38981),
"w" : NumberLong(142),
"W" : NumberLong(7605)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(3505895),
"w" : NumberLong(1805926),
"R" : NumberLong(123),
"W" : NumberLong(1381)
},
"acquireWaitCount" : {
"w" : NumberLong(15),
"W" : NumberLong(29)
},
"timeAcquiringMicros" : {
"w" : NumberLong(198585),
"W" : NumberLong(273441)
}
},
"Collection" : {
"acquireCount" : {
"r" : NumberLong(2503081),
"w" : NumberLong(732637)
}
},
"Metadata" : {
"acquireCount" : {
"W" : NumberLong(9)
}
},
"oplog" : {
"acquireCount" : {
"r" : NumberLong(1002769),
"w" : NumberLong(1073431)
}
}
},
"logicalSessionRecordCache" : {
"activeSessionsCount" : 0,
"sessionsCollectionJobCount" : 1102,
"lastSessionsCollectionJobDurationMillis" : 0,
"lastSessionsCollectionJobTimestamp" : ISODate("2018-03-15T18:15:57.322Z"),
"lastSessionsCollectionJobEntriesRefreshed" : 0,
"lastSessionsCollectionJobEntriesEnded" : 0,
"lastSessionsCollectionJobCursorsClosed" : 0,
"transactionReaperJobCount" : 1102,
"lastTransactionReaperJobDurationMillis" : 0,
"lastTransactionReaperJobTimestamp" : ISODate("2018-03-15T18:15:57.361Z"),
"lastTransactionReaperJobEntriesCleanedUp" : 0
},
"network" : {
"bytesIn" : NumberLong(498867911),
"bytesOut" : NumberLong("2604855044"),
"physicalBytesIn" : NumberLong(416438312),
"physicalBytesOut" : NumberLong("2368099190"),
"numRequests" : NumberLong(1408505),
"compression" : {
"snappy" : {
"compressor" : {
"bytesIn" : NumberLong(586258544),
"bytesOut" : NumberLong(337829036)
},
"decompressor" : {
"bytesIn" : NumberLong(316856552),
"bytesOut" : NumberLong(450798083)
}
}
},
"serviceExecutorTaskStats" : {
"executor" : "passthrough",
"threadsRunning" : 6
}
},
"opLatencies" : {
"reads" : {
"latency" : NumberLong(19500873),
"ops" : NumberLong(175622)
},
"writes" : {
"latency" : NumberLong(49719906),
"ops" : NumberLong(521342)
},
"commands" : {
"latency" : NumberLong(28760416),
"ops" : NumberLong(711539)
}
},
"opcounters" : {
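"insert" : 765275,//total insert operations since the server started
"query" : 1265,//total query operations since the server started
"update" : 14,//total update operations since the server started
"delete" : 10,//total delete operations since the server started
"getmore" : 175274,//total getMore calls on any cursor since the server started
"command" : 712854//total other commands executed since the server started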
},
"opcountersRepl" : {//replicated operation counters
"insert" : 0,
"query" : 0,
"update" : 0,
"delete" : 0,
"getmore" : 0,
"command" : 0
},
"repl" : {//replica set information
"hosts" : [
"192.168.20.118:28001",
"192.168.20.118:28002"
],
"arbiters" : [
"192.168.20.118:28003"
],
"setName" : "MyMongo",
"setVersion" : 1,
"ismaster" : true,
"secondary" : false,
"primary" : "192.168.20.118:28002",
"me" : "192.168.20.118:28002",
"electionId" : ObjectId("7fffffff0000000000000004"),
"lastWrite" : {
"opTime" : {
"ts" : Timestamp(1521137905, 1),
"t" : NumberLong(4)
},
"lastWriteDate" : ISODate("2018-03-15T18:18:25Z"),
"majorityOpTime" : {
"ts" : Timestamp(1521137905, 1),
"t" : NumberLong(4)
},
"majorityWriteDate" : ISODate("2018-03-15T18:18:25Z")
},
"rbid" : 1
},
"storageEngine" : {
"name" : "wiredTiger",
"supportsCommittedReads" : true,
"readOnly" : false,
"persistent" : true
},
"tcmalloc" : {
"generic" : {
"current_allocated_bytes" : 401898176,
"heap_size" : 498167808
},
"tcmalloc" : {
"pageheap_free_bytes" : 21684224,
"pageheap_unmapped_bytes" : 56627200,
"max_total_thread_cache_bytes" : 262144000,
"current_total_thread_cache_bytes" : 4529152,
"total_free_bytes" : 17958208,
"central_cache_free_bytes" : 8770304,
"transfer_cache_free_bytes" : 4658752,
"thread_cache_free_bytes" : 4529152,
"aggressive_memory_decommit" : 0,
"pageheap_committed_bytes" : 441540608,
"pageheap_scavenge_count" : 42266,
"pageheap_commit_count" : 69531,
"pageheap_total_commit_bytes" : NumberLong("37870575616"),
"pageheap_decommit_count" : 42266,
"pageheap_total_decommit_bytes" : NumberLong("37429035008"),
"pageheap_reserve_count" : 62,
"pageheap_total_reserve_bytes" : 498167808,
"formattedString" : "------------------------------------------------ MALLOC: 401898752 ( 383.3 MiB) Bytes in use by application MALLOC: + 21684224 ( 20.7 MiB) Bytes in page heap freelist MALLOC: + 8770304 ( 8.4 MiB) Bytes in central cache freelist MALLOC: + 4658752 ( 4.4 MiB) Bytes in transfer cache freelist MALLOC: + 4528576 ( 4.3 MiB) Bytes in thread cache freelists MALLOC: + 3170560 ( 3.0 MiB) Bytes in malloc metadata MALLOC: ------------ MALLOC: = 444711168 ( 424.1 MiB) Actual memory used (physical + swap) MALLOC: + 56627200 ( 54.0 MiB) Bytes released to OS (aka unmapped) MALLOC: ------------ MALLOC: = 501338368 ( 478.1 MiB) Virtual address space used MALLOC: MALLOC: 22340 Spans in use MALLOC: 72 Thread heaps in use MALLOC: 4096 Tcmalloc page size ------------------------------------------------ Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()). Bytes released to the OS take up virtual address space but no physical memory. "
}
},...
"mem" : {
"bits" : 64,//64-bit build
"resident" : 319,//physical (resident) memory in use, in MiB
"virtual" : 1807,//virtual memory in use, in MiB
"supported" : true,
"mapped" : 0, //mapped memory, in MiB
"mappedWithJournal" : 0
},
"metrics" : {
...
},
"ok" : 1,//whether serverStatus returned successfully
"operationTime" : Timestamp(1521137905, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1521137905, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
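
The opLatencies section reports a cumulative latency and an operation count, so an average has to be derived from the two. The following is a minimal sketch, assuming latency is the cumulative total in microseconds and that the NumberLong values have been converted to plain numbers; the function name averageLatencyMicros is hypothetical.

```javascript
// Average latency per operation, in microseconds, for each
// category (reads, writes, commands) of an opLatencies document.
function averageLatencyMicros(opLatencies) {
  const result = {};
  for (const [kind, { latency, ops }] of Object.entries(opLatencies)) {
    // Avoid dividing by zero when no operation of this kind has run yet.
    result[kind] = ops > 0 ? latency / ops : 0;
  }
  return result;
}

// Values copied from the serverStatus output above.
const sampleLatencies = {
  reads: { latency: 19500873, ops: 175622 },
  writes: { latency: 49719906, ops: 521342 },
  commands: { latency: 28760416, ops: 711539 },
};
console.log(averageLatencyMicros(sampleLatencies));
```

Because both numbers accumulate from startup, the result is a lifetime average; differencing two snapshots before dividing gives the average for a recent interval instead.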

Original article: https://www.cnblogs.com/yhq1314/p/10008063.html