InnoDB in Depth

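InnoDB tables are index-organized tables (IOT) built on a B+ tree, so every page must hold at least two rows; a row that is too large triggers row overflow.
In theory a VARCHAR can store 65,535 bytes, but the practical limit is 65,532 bytes (2 bytes go to the length prefix and 1 byte to the NULL flag of a nullable column), and that budget is shared by the combined length of all VARCHAR columns in the table.
With the utf8 character set a character takes up to 3 bytes, so the limit counted in characters shrinks to one third (21,844).
If you force the creation of a varchar(65535) column while sql_mode is not strict, it is implicitly converted to mediumtext.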
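The arithmetic behind these limits can be sketched as follows. This is a minimal illustration, assuming a table with a single nullable VARCHAR column; the function name `max_varchar_chars` and the charset table are made up for the example, and `utf8` here means MySQL's 3-byte utf8.

```python
# Sketch: largest VARCHAR(N) that fits within the 65,535-byte row limit,
# assuming the table has exactly one nullable VARCHAR column.
ROW_LIMIT_BYTES = 65535
LENGTH_PREFIX_BYTES = 2  # length prefix for values longer than 255 bytes
NULL_FLAG_BYTES = 1      # NULL bitmap byte for a nullable column

# Maximum bytes per character for a few common character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_varchar_chars(charset: str) -> int:
    """Return the largest N for VARCHAR(N) under the assumptions above."""
    usable_bytes = ROW_LIMIT_BYTES - LENGTH_PREFIX_BYTES - NULL_FLAG_BYTES
    return usable_bytes // MAX_BYTES_PER_CHAR[charset]


for charset in MAX_BYTES_PER_CHAR:
    print(charset, max_varchar_chars(charset))
```

Every additional column in the table consumes part of the same row budget, so real tables usually end up with a smaller N.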

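For VARCHAR as well as BLOB/TEXT columns, row overflow should not occur as long as a 16 KB page can still hold two rows.
Once a row does overflow, the first 768 bytes of the column (Antelope file format) stay in the current page; regular row data lives in B-tree Node pages, while the overflowed part is stored in Uncompressed BLOB pages.
Barracuda (the DYNAMIC and COMPRESSED row formats) uses full off-page storage instead, keeping only a 20-byte pointer to the overflow pages in the row.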

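[Recommendation] Do not run large delete transactions in production; delete a small batch of rows at a time, commit after each batch, and repeat until done.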
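A minimal sketch of this batch-delete pattern is shown below. It uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere; the helper `delete_in_batches`, the table `t`, and its columns are hypothetical. On MySQL the statement inside the loop would more likely be `DELETE FROM t WHERE ... LIMIT n` issued through a MySQL driver.

```python
import sqlite3


def delete_in_batches(conn, table, where, batch_size=1000):
    """Delete rows matching `where` in small committed batches.

    `table` and `where` are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE id IN "
            f"(SELECT id FROM {table} WHERE {where} LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


# Demo on an in-memory database (hypothetical schema).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
demo.executemany("INSERT INTO t (v) VALUES (?)", [(i,) for i in range(2500)])
demo.commit()
print(delete_in_batches(demo, "t", "v < 2000", batch_size=300))
```

Short transactions keep the undo volume and lock footprint of each step small, which is the point of the recommendation.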

Ø Innodb_scan_pages_contiguous
Counts the number of leaf pages that were read contiguously during the last query. It does not count the first page, so it will be 0 for single-page scans.

Ø Innodb_scan_pages_jumpy
Counts the number of leaf pages that were not read contiguously during the last query. It does not count the first page, so it will be 0 for single-page scans. Because it measures leaf pages only and “branch pages” must be inserted for scans on large tables, it will always be nonzero for large tables.

Ø Innodb_scan_data_in_pages
Counts the bytes used by records in the leaf pages that were scanned during the last query. To make the implementation more efficient, it does not count the last page in the scan.

Ø Innodb_scan_garbage_in_pages
Counts the bytes occupied by garbage (not used by records) in the leaf pages that were scanned during the last query. To make the implementation more efficient, it does not count the last page in the scan.

-- View table status information
(system@127.0.0.1:3306) [test]> show table status like 'test_user'\G
*************************** 1. row ***************************
           Name: test_user
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 9858355
 Avg_row_length: 288
    Data_length: 2843721728
Max_data_length: 0
   Index_length: 214712320
      Data_free: 7340032
 Auto_increment: 10210001
    Create_time: 2018-05-02 06:32:54
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_bin
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)
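The figures in this output can be cross-checked with a little arithmetic. The sketch below assumes the default 16 KB page size; the helper `summarize_table_status` is hypothetical, and the `Rows` value is only an estimate for InnoDB tables, so the per-row figure is approximate.

```python
PAGE_SIZE = 16 * 1024  # assumption: default 16 KB InnoDB page
MIB = 1024 * 1024

# Values copied from the SHOW TABLE STATUS output above.
status = {
    "Rows": 9858355,
    "Avg_row_length": 288,
    "Data_length": 2843721728,
    "Index_length": 214712320,
    "Data_free": 7340032,
}


def summarize_table_status(st):
    """Convert raw byte counts into pages, MiB and a rough free-space share."""
    data, index, free = st["Data_length"], st["Index_length"], st["Data_free"]
    return {
        "data_pages": data // PAGE_SIZE,      # clustered index (row data) pages
        "index_pages": index // PAGE_SIZE,    # secondary index pages
        "free_mib": free / MIB,               # allocated but unused space
        "bytes_per_row": data // st["Rows"],  # should be close to Avg_row_length
        "free_pct": round(100.0 * free / (data + index + free), 2),
    }


print(summarize_table_status(status))
```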

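• Eliminating fragmentation
Inserting new rows in random order can leave secondary indexes heavily fragmented, meaning that the physical order of index pages is far from the index order, or that the pages contain many holes.
Running ALTER TABLE XX ENGINE = INNODB; rebuilds the tablespace and removes the fragmentation.
Alternatively, back up the table, drop it, and re-import it.
• Reclaiming tablespace
The shared tablespace cannot be reclaimed online; to shrink it, every InnoDB table has to be exported, dropped, and imported again.
For per-table tablespaces the method above works, or you can simply TRUNCATE TABLE any history or temporary tables whose data no longer needs to be kept.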
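• Checkpoints
⁻ InnoDB flushes the dirty pages in the buffer pool, together with the redo log, to disk in batches; this is called a checkpoint.
⁻ It does not flush everything in a single pass, because that would degrade MySQL performance or even make the server unable to serve requests.
⁻ During recovery, InnoDB scans the transaction log forward and applies those dirty changes to the data on disk.
⁻ InnoDB uses its transaction log in a circular fashion, so old log entries are bound to be overwritten at some point; InnoDB must guarantee that, before old entries are overwritten, the dirty pages they describe have been flushed to disk.
⁻ If that cannot be guaranteed and the server crashes, the dirty pages in the buffer pool can never be recovered.
⁻ So whenever it switches log files, InnoDB must make a checkpoint and flush the dirty pages covered by the log that is about to be reused.
⁻ In that sense, the larger the InnoDB transaction log, the more disk I/O is saved and the better the performance, but crash recovery will certainly take longer.
⁻ InnoDB makes a checkpoint every few seconds.
⁻ After a log switch, however, the changes recorded in a log file must be fully flushed to disk before that file is reused; otherwise the system stalls.
⁻ Try a larger transaction log to reduce the number of disk writes during checkpoints (the saving comes from I/O merging).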
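The circular-log constraint described in the list above can be illustrated with a toy model. This is a deliberately simplified sketch, not InnoDB's actual implementation: the class `RedoLogSketch` and its fields are hypothetical, and real InnoDB keeps safety margins well before the log is completely full.

```python
class RedoLogSketch:
    """Toy model of a circular redo log guarded by a checkpoint."""

    def __init__(self, capacity):
        self.capacity = capacity  # total bytes available in the circular log
        self.lsn = 0              # log sequence number: bytes written so far
        self.checkpoint_lsn = 0   # changes before this are already in the data files

    @property
    def checkpoint_age(self):
        # Log bytes that still protect unflushed dirty pages.
        return self.lsn - self.checkpoint_lsn

    def can_write(self, nbytes):
        # Writing must never overwrite log that dirty pages still depend on.
        return self.checkpoint_age + nbytes <= self.capacity

    def write(self, nbytes):
        if not self.can_write(nbytes):
            raise RuntimeError("log full: flush dirty pages and checkpoint first")
        self.lsn += nbytes

    def checkpoint(self, flushed_up_to_lsn):
        # Call after all dirty pages older than flushed_up_to_lsn are on disk.
        self.checkpoint_lsn = max(self.checkpoint_lsn, min(flushed_up_to_lsn, self.lsn))


log = RedoLogSketch(capacity=100)
log.write(80)
print(log.can_write(30))   # the write would overrun the checkpoint
log.checkpoint(50)         # dirty pages up to LSN 50 have been flushed
print(log.can_write(30))   # space has been freed for reuse
```

A larger `capacity` lets more writes accumulate before a flush is forced, which is the trade-off between I/O merging and recovery time noted above.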
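Checkpoint trigger conditions
Ø Every 1 second
• If the dirty-page ratio in the buffer pool exceeds srv_max_buf_pool_modified_pct = 75, make a checkpoint and flush dirty pages: flush PCT_IO(100) dirty pages = 200 (with innodb_io_capacity = 200)
• If adaptive flushing is in use, compute the flush rate and perform whatever flushing is needed

Ø Every 10 seconds
• If the dirty-page ratio in the buffer pool does not exceed 70%, flush PCT_IO(10) dirty pages = 20
• Every 10 seconds log_checkpoint is always called once to make a checkpoint
Dirty-page ratio = pages that need to be flushed / (pages in use + free pages + 1)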
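The formula and the two thresholds above can be put into a small worked example. This is a sketch under stated assumptions: the function names are hypothetical, `pct_io` mirrors the PCT_IO(p) notation as a percentage of innodb_io_capacity, and the behaviour above 70% in the 10-second branch (flushing PCT_IO(100)) is an assumption that the notes do not spell out.

```python
def pct_io(pct, io_capacity=200):
    """PCT_IO(p): p percent of innodb_io_capacity, in pages."""
    return int(io_capacity * pct / 100)


def dirty_page_ratio(pages_to_flush, pages_in_use, free_pages):
    """Dirty-page ratio in percent, following the formula above."""
    return 100.0 * pages_to_flush / (pages_in_use + free_pages + 1)


def pages_to_flush_each_second(ratio, max_dirty_pct=75, io_capacity=200):
    # Every second: flush only when the ratio exceeds the configured maximum.
    return pct_io(100, io_capacity) if ratio > max_dirty_pct else 0


def pages_to_flush_every_10s(ratio, io_capacity=200):
    # Every 10 seconds: PCT_IO(10) when the ratio is at or below 70%.
    # The branch above 70% is an assumption, not stated in the notes.
    return pct_io(100, io_capacity) if ratio > 70 else pct_io(10, io_capacity)


ratio = dirty_page_ratio(pages_to_flush=800, pages_in_use=999, free_pages=0)
print(ratio, pages_to_flush_each_second(ratio), pages_to_flush_every_10s(50.0))
```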

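innodb_adaptive_flushing_lwm -- low-water mark for redo log flushing (a percentage of redo log capacity): once the redo log that still needs flushing exceeds this mark, adaptive flushing is forced on immediately, even if adaptive flushing has not been enabled
innodb_io_capacity = N -- the I/O budget available to InnoDB background tasks, such as flushing data pages from the buffer pool and merging data from the insert buffer. The default is 200; raise it appropriately under a busy OLTP workload.
On a single 5400 or 7200 rpm disk it can be set as low as 100; with multiple 15000 rpm disks in RAID it can be set higher.
innodb_io_capacity_max = N -- the upper limit that innodb_io_capacity may reach in an emergency
innodb_flushing_avg_loops = N -- the number of previous iterations over which InnoDB averages the page flush rate, so that flushing does not speed up too abruptly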
Background threads (15):
• master thread (1)
• lock monitor thread (1)
• error monitor thread (1)
• log thread (1)
• insert buffer thread (1)
• read/write threads (8; 4 of each by default)
• purge thread (1)
• page cleaner thread (1)
The master thread has the highest thread priority.
It consists of several internal loops: the main loop, the background loop, the flush loop, and the suspend loop.

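Operations performed once per second:
Flush the log buffer to disk, even if the transaction has not committed yet (always)
Merge the insert buffer (possibly)
Flush at most 100 dirty pages from the InnoDB buffer pool to disk (possibly)
Switch to the background loop if there is currently no user activity (possibly)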

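Next, the operations performed every 10 seconds:
Flush 100 dirty pages to disk (possibly)
Merge at most 5 insert buffer entries (always)
Flush the log buffer to disk (always)
Delete unused undo pages (always)
Flush 100 or 10 dirty pages to disk (always)
Make a checkpoint (always)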
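The two schedules can be summarized as a pair of small functions. This is only an outline of the lists above, not InnoDB source code: the function names and the boolean inputs `io_is_light` and `too_many_dirty_pages` are hypothetical stand-ins for the conditions behind each "possibly", and choosing between 100 and 10 pages by the 70% dirty-page ratio is an assumption.

```python
def every_second(io_is_light, too_many_dirty_pages, user_active):
    """Actions of the main loop's once-per-second step."""
    actions = ["flush log buffer to disk"]               # always
    if io_is_light:
        actions.append("merge insert buffer")            # possibly
    if too_many_dirty_pages:
        actions.append("flush up to 100 dirty pages")    # possibly
    if not user_active:
        actions.append("switch to background loop")      # possibly
    return actions


def every_ten_seconds(io_is_light, dirty_ratio_pct):
    """Actions of the main loop's once-per-10-seconds step."""
    actions = []
    if io_is_light:
        actions.append("flush 100 dirty pages")          # possibly
    actions.append("merge up to 5 insert buffer entries")  # always
    actions.append("flush log buffer to disk")           # always
    actions.append("delete unused undo pages")           # always
    # Always flush; the amount is assumed to depend on the dirty-page ratio.
    actions.append("flush 100 dirty pages" if dirty_ratio_pct > 70
                   else "flush 10 dirty pages")
    actions.append("make a checkpoint")                  # always
    return actions


for second in range(1, 11):
    print(second, every_second(io_is_light=True, too_many_dirty_pages=False,
                               user_active=True))
print(every_ten_seconds(io_is_light=False, dirty_ratio_pct=50))
```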

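[Key points]
1. Do not let dirty pages pile up; otherwise hot data cannot be cached effectively, the hit rate drops, and a sudden bulk flush of dirty pages also hurts IOPS.
2. Do not let undo pages pile up; otherwise ibdata1 may balloon or TPS may suffer.
3. Do not let checkpoints lag too far behind; otherwise crash recovery becomes very slow.
4. Most important of all: keep these background threads working steadily at a regular pace, neither stalling nor running too often.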

InnoDB monitor triggers (tables whose creation switches on the corresponding monitor)
• innodb_monitor
• innodb_lock_monitor
• innodb_table_monitor
• innodb_tablespace_monitor
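Some InnoDB parameters
innodb_buffer_pool_size #50-70% of physical memory
Problems caused by a buffer pool that is too small:
ERROR 1206 (HY000): The total number of locks exceeds the lock table size; for the fix see http://imysql.com/2007_08_03_locks_exceeds
innodb_buffer_pool_instances #split the pool into several smaller instances for better memory utilization
innodb_data_file_path
innodb_max_dirty_pages_pct=20-50
innodb_flush_method =O_DIRECT #on an XFS file system, O_DIRECT is still the right choice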
innodb_flush_log_at_trx_commit
innodb_flush_log_at_trx_commit =1
sync_binlog =1