（转帖）A deep look at MySQL 5.5 partitioning enhancements

The release of MySQL 5.5 has brought several enhancements. While most of the coverage went, understandably, to the semi-synchronous replication, the enhancements of partitioning were neglected, and sometimes there was some degree of misunderstanding on their true meaning. With this article, we want to explain these cool enhancements, especially the parts that were not fully understood.

The intuitive part: partition by non-integer columns

Anyone who has used partitions so far (see MySQL 5.1 partitions in practice) has experienced some frustration at the amount of problems to face when using non-integer columns. Partitions in 5.1 can only deal with integer values, meaning that if you want to do partitions on dates or strings, you had to convert these columns with a function.

The new additions work with RANGE and LIST partitioning. There is a new COLUMNS keyword that introduces the new functionality.

Let's assume a table like this one:


CREATE TABLE expenses (
  expense_date DATE NOT NULL,
  category VARCHAR(30),
  amount DECIMAL (10,3)
);

If you want to partition by category in MySQL 5.1, you will have to convert categories into integers, with an additional lookup table. As of MySQL 5.5., you can simply do


ALTER TABLE expenses
PARTITION BY LIST COLUMNS (category)
(
  PARTITION p01 VALUES IN ( 'lodging', 'food'),
  PARTITION p02 VALUES IN ( 'flights', 'ground transportation'),
  PARTITION p03 VALUES IN ( 'leisure', 'customer entertainment'),
  PARTITION p04 VALUES IN ( 'communications'),
  PARTITION p05 VALUES IN ( 'fees')
);

This statement, in addition to being clearly readable and to organizing the data into efficient chunks, has the beneficial side effect of ensuring that only the listed categories are accepted.

Another pain point in MySQL 5.1 is the handling of date columns. You can't use them directly, but you need to convert such columns using either YEAR or TO_DAYS, with situations like this one:


/* with MySQL 5.1*/
CREATE TABLE t2
(
  dt DATE
)
PARTITION BY RANGE (TO_DAYS(dt))
(
  PARTITION p01 VALUES LESS THAN (TO_DAYS('2007-01-01')),
  PARTITION p02 VALUES LESS THAN (TO_DAYS('2008-01-01')),
  PARTITION p03 VALUES LESS THAN (TO_DAYS('2009-01-01')),
  PARTITION p04 VALUES LESS THAN (MAXVALUE));

SHOW CREATE TABLE t2 \G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `dt` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (TO_DAYS(dt))
(PARTITION p01 VALUES LESS THAN (733042) ENGINE = MyISAM,
 PARTITION p02 VALUES LESS THAN (733407) ENGINE = MyISAM,
 PARTITION p03 VALUES LESS THAN (733773) ENGINE = MyISAM,
 PARTITION p04 VALUES LESS THAN MAXVALUE ENGINE = MyISAM) */

How dreadful. A real pain in the ... code. Of course, there were workarounds, but the trouble was quite a lot. Not to mention that it was really puzzling to define a partition using YEAR or TO_DAYS, and then having to query by bare column, as the queries by function did not kick the partition pruning.

Now it's a different story. Partitioning by date has become easy and immediate.


/*With MySQL 5.5*/
CREATE TABLE t2
(
  dt DATE
)
PARTITION BY RANGE COLUMNS (dt)
(
  PARTITION p01 VALUES LESS THAN ('2007-01-01'),
  PARTITION p02 VALUES LESS THAN ('2008-01-01'),
  PARTITION p03 VALUES LESS THAN ('2009-01-01'),
  PARTITION p04 VALUES LESS THAN (MAXVALUE));

SHOW CREATE TABLE t2 \G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `dt` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
/*!50500 PARTITION BY RANGE  COLUMNS(dt)
(PARTITION p01 VALUES LESS THAN ('2007-01-01') ENGINE = MyISAM,
 PARTITION p02 VALUES LESS THAN ('2008-01-01') ENGINE = MyISAM,
 PARTITION p03 VALUES LESS THAN ('2009-01-01') ENGINE = MyISAM,
 PARTITION p04 VALUES LESS THAN (MAXVALUE) ENGINE = MyISAM) */

The partition pruning will kick as in the previous case. There is no confusion between defining by function and querying by column because the definition is by column; the values we insert in the definition are preserved, making the DBA job much easier.

Everyone's happy then? Well, almost. Let's have a look at some more obscure trait of the COLUMNS feature.

The counter-intuitive part: multiple columns

The COLUMNS keyword does more than allowing string and date columns as partition definers. It also allows using multiple columns to define a partition.

You probably have seen some examples in the official docs, with something like the ones below:


CREATE TABLE p1 (
  a INT,
  b INT,
  c INT
)
PARTITION BY RANGE COLUMNS (a,b)
(
  PARTITION p01 VALUES LESS THAN (10,20),
  PARTITION p02 VALUES LESS THAN (20,30),
  PARTITION p03 VALUES LESS THAN (30,40),
  PARTITION p04 VALUES LESS THAN (40,MAXVALUE),
  PARTITION p05 VALUES LESS THAN (MAXVALUE,MAXVALUE)
);

CREATE TABLE p2 (
  a INT,
  b INT,
  c INT
)
PARTITION BY RANGE COLUMNS (a,b)
(
  PARTITION p01 VALUES LESS THAN (10,10),
  PARTITION p02 VALUES LESS THAN (10,20),
  PARTITION p03 VALUES LESS THAN (10,30),
  PARTITION p04 VALUES LESS THAN (10,MAXVALUE),
  PARTITION p05 VALUES LESS THAN (MAXVALUE,MAXVALUE)
)

There are also examples with PARTITION BY RANGE COLUMNS (a,b,c), and more. If you are the kind of reader who gets the whole idea from looking at such examples, feel free to make fun of me, because I didn't.

Having been using MySQL 5.1 partitions for long time, I failed to grasp immediately the significance of partitioning by multiple columns. What is the meaning of LESS THAN (10,10)? And what happens if the next partition is LESS THAN (10,20)? What if, instead, is a completely different pair, like (20,30)?

All these questions need an answer, and before an answer they need a better understanding of what we are dealing with.

In the beginning, there was some confusion, even among MySQL engineers. And it has fooled me as well! It was believed that, when all the partitions have different first range values, for all practical purposes it was the same as if the table were partitioned on one column only. But this is not the case. In the following example:


CREATE TABLE p1_single (
  a INT,
  b INT,
  c INT
)
PARTITION BY RANGE COLUMNS (a)
(
  PARTITION p01 VALUES LESS THAN (10),
  PARTITION p02 VALUES LESS THAN (20),
  PARTITION p03 VALUES LESS THAN (30),
  PARTITION p04 VALUES LESS THAN (40),
  PARTITION p05 VALUES LESS THAN (MAXVALUE)
);

This is not equivalent to the table p1 above. If you insert (10, 1, 1) in p1, it will go to the first partition. In p1_single, instead, it will go to the second one.

The reason is that (10,1) is LESS THAN (10, 10). If you only focus on the first value, you will fail to realize that you are comparing a tuple, not a single value.

Now for the difficult part. What happens when you need to determine where a row will be placed? How do you evaluate an operation like (10,9) < (10,10)? The answer is simple: the same way you evaluate two records when you are sorting them.


a=10
b=9
(a,b) < (10,10) ?

# evaluates to:

(a < 10)
OR
((a = 10) AND ( b < 10))

# which translates to:

(10 < 10)
OR
((10 = 10) AND ( 9 < 10))

If you have three columns, the expression is longer, but not more complex. You first test for less than on the first item. If two or more partitions match that, then you test the second item. If after that you still have more than one candidate partition, then you test the third item.

The figures below will walk you through the evaluation of three records being inserted into a table with a partition definition of


(10,10),
(10,20),
(10,30),
(10, MAXVALUE)

(1 < 10) OR ((1 = 10) AND (10 < 10)) = TRUE

Fig. 1. Comparing tuples. When the first value is less than the first range in the partition definition, things are easy. The row belongs here.

(10 < 10) OR ((10 = 10) AND (9 < 10)) = TRUE

Fig. 2. Comparing tuples. When the first value is equal to the first range in the partition definition, then we need to compare the second item. If that one is less than the second range, then the row belongs here.

(10 < 10) OR ((10 = 10) AND (10 < 10)) = FALSE

Fig. 3. Comparing tuples. Both the first and second values are equal to their corresponding ranges. The tuple is not LESS THAN the defined range, and thus it doesn't belong here. Next step.

(10 < 10) OR ((10 = 10) AND (10 < 20)) = TRUE

Fig. 4. Comparing tuples. At the next range, the first item is equel, and the second item is smaller. Thus the tuple is smaller, and the row belongs here.

With the help of these figures, we have now a better understanding of the procedure to insert a record into a multi-column partitioned table. That was the theory. To help you grasp the new feature better than I did at the beginning, let me offer a different example, which should make more sense to the non-theoretically oriented readers. I will use a table taken from the MySQL test employees database on Launchpad, with some modifications.


CREATE TABLE employees (
  emp_no int(11) NOT NULL,
  birth_date date NOT NULL,
  first_name varchar(14) NOT NULL,
  last_name varchar(16) NOT NULL,
  gender char(1) DEFAULT NULL,
  hire_date date NOT NULL
) ENGINE=MyISAM
PARTITION BY RANGE  COLUMNS(gender,hire_date)
(PARTITION p01 VALUES LESS THAN ('F','1990-01-01') ,
 PARTITION p02 VALUES LESS THAN ('F','2000-01-01') ,
 PARTITION p03 VALUES LESS THAN ('F',MAXVALUE) ,
 PARTITION p04 VALUES LESS THAN ('M','1990-01-01') ,
 PARTITION p05 VALUES LESS THAN ('M','2000-01-01') ,
 PARTITION p06 VALUES LESS THAN ('M',MAXVALUE) ,
 PARTITION p07 VALUES LESS THAN (MAXVALUE,MAXVALUE)

Unlike the above examples, which lie too much on the theoretical side, this one is understandable. The first partition will store female employees hired before 1990, the second one female employees hider between 1990 and 2000, and the third one all the remaining female employees. For partitions p04 to p06 we get the same cases, but for male employees. The last partition is a control case: if anyone ends up in this partition, there must have been a mistake somewhere.

Reading the latest sentence, you may rightfully ask: how do I know in which partition the rows are stored?

There are two ways actually. The first one is to use a query with the same conditions used to define the partitions.


SELECT
CASE
  WHEN gender = 'F' AND hire_date < '1990-01-01'
  THEN 'p1'
  WHEN gender = 'F' AND hire_date < '2000-01-01'
  THEN 'p2'
  WHEN gender = 'F' AND hire_date < '2999-01-01'
  THEN 'p3'
  WHEN gender = 'M' AND hire_date < '1990-01-01'
  THEN 'p4'
  WHEN gender = 'M' AND hire_date < '2000-01-01'
  THEN 'p5'
  WHEN gender = 'M' AND hire_date < '2999-01-01'
  THEN 'p6'
ELSE
  'p7'
END as p,
COUNT(*) AS rows
FROM employees
GROUP BY p;

+------+-------+
| p    | rows  |
+------+-------+
| p1   | 66212 |
| p2   | 53832 |
| p3   |     7 |
| p4   | 98585 |
| p5   | 81382 |
| p6   |     6 |
+------+-------+

If the table is MyISAM or ARCHIVE, then you can trust the statistics provided by the INFORMATION_SCHEMA.


SELECT
  partition_name part,
  partition_expression expr,
  partition_description descr,
  table_rows
FROM
  INFORMATION_SCHEMA.partitions
WHERE
  TABLE_SCHEMA = schema()
  AND TABLE_NAME='employees';
+------+------------------+-------------------+------------+
| part | expr             | descr             | table_rows |
+------+------------------+-------------------+------------+
| p01  | gender,hire_date | 'F','1990-01-01'  |      66212 |
| p02  | gender,hire_date | 'F','2000-01-01'  |      53832 |
| p03  | gender,hire_date | 'F',MAXVALUE      |          7 |
| p04  | gender,hire_date | 'M','1990-01-01'  |      98585 |
| p05  | gender,hire_date | 'M','2000-01-01'  |      81382 |
| p06  | gender,hire_date | 'M',MAXVALUE      |          6 |
| p07  | gender,hire_date | MAXVALUE,MAXVALUE |          0 |
+------+------------------+-------------------+------------+

If the engine is InnoDB, then the above values are approximated, and you can't trust them if you need exact values.

One more question may still be floating in the air after all the above explanation, and it's about performance. Do these enhancements trigger the partition pruning? The answer is an unequivocal yes. Unlike the 5.1, where partitioning by date only works with two functions, in 5.5 every partition defined with the COLUMNS keyword will use the partition pruning. Let's try:


select count(*) from employees where gender='F' and hire_date < '1990-01-01';
+----------+
| count(*) |
+----------+
|    66212 |
+----------+
1 row in set (0.05 sec)

explain partitions select count(*) from employees where gender='F' and hire_date < '1990-01-01'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: p01
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 300024
        Extra: Using where

Using the conditions that define the first partition, we get a very optimized query. Not only that, but also a partial condition will benefit from the partition pruning:


select count(*) from employees where gender='F';
+----------+
| count(*) |
+----------+
|   120051 |
+----------+
1 row in set (0.12 sec)

explain partitions select count(*) from employees where gender='F'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: p01,p02,p03,p04
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 300024
        Extra: Using where

This is the same algorithm used for composite indexes. If your condition refers to the leftmost part of the index, MySQL will use it. Similarly, if you refer to the leftmost part of the partition definition, MySQL will prune as much as possible. As it happens with composite indexes, if you only use the rightmost condition, the partition pruning doesn't work:


select count(*) from employees where hire_date < '1990-01-01';
+----------+
| count(*) |
+----------+
|   164797 |
+----------+
1 row in set (0.18 sec)

explain partitions select count(*) from employees where hire_date < '1990-01-01'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: p01,p02,p03,p04,p05,p06,p07
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 300024
        Extra: Using where

Referring to the second part of the partition definition without using the first one will generate a full table scan. This is always something to keep in mind when designing partitions and writing queries.

Usability enhancements: TRUNCATE PARTITION

One of the most appealing features of partitions id the ability of removing large amounts of records almost instantly. In a scheme that has become quite popular, DBAs are using partitions to rotate historical records in tables partitioned by date, dropping the partition with the oldest records at regular intervals. This method works very well. You drop the first partition (i.e. the one with the oldest records) and add a new one at the end (i.e. the one which will get the newest records).

All is well until you only need to trim from the bottom. But when you need to remove records from a partition in between, things are not as smooth. You can drop the partition, no problem about that. But if you want just to empty it, you face quite a painful problem. To remove all records from a partition you can:

Use the DELETE statement, thus relinquishing most of the advantages of partitions trimming;
use the DROP PARTITION, followed by a REORGANIZE PARTITIONS to re-create it, but it's is often more costly than the previous choice.

MySQL 5.5 introduces TRUNCATE PARTITION, a statement that works like DROP PARTITION, but leaving the partition in place, ready to be filled in again.

TRUNCATE PARTITION is a statement that should be in every DBA's tool chest.

More fine tuning: TO_SECONDS

As a bonus, the partitions enhancement package has a new function to manipulate DATE and DATETIME columns. With the TO_SECONDS function your can convert a date/time column into the number of seconds from year "0". It is useful if you want to partition on time intervals smaller than one day.

Like the rest of the enhancements, TO_SECONDS triggers the partition pruning, thus raising to three the number of date functions that you can efficiently use with partitions.

Unlike TO_DAYS, which can be reversed with FROM_DAYS, there is no such function for TO_SECONDS, but it is not that hard to create one.


drop function if exists from_seconds;

delimiter //
create function from_seconds (secs bigint)
returns DATETIME
begin
    declare days INT;
    declare secs_per_day INT;
    DECLARE ZH INT;
    DECLARE ZM INT;
    DECLARE ZS INT;
    set secs_per_day = 60 * 60 * 24;
    set days = floor(secs / secs_per_day);
    set secs = secs - (secs_per_day * days);
    set ZH = floor(secs / 3600);
    set ZM = floor(secs / 60) - ZH * 60;
    set ZS = secs - (ZH * 3600 + ZM * 60);
    return CAST(CONCAT(FROM_DAYS(days), ' ', ZH, ':', ZM, ':', ZS) as DATETIME);
end //

delimiter ;

Armed with these new weapons, we can confidently create a table with less than one day temporal partitions, as follows:


CREATE TABLE t2 (
  dt datetime
)
PARTITION BY RANGE (to_seconds(dt))
(
  PARTITION p01 VALUES LESS THAN (to_seconds('2009-11-30 08:00:00')) ,
  PARTITION p02 VALUES LESS THAN (to_seconds('2009-11-30 16:00:00')) ,
  PARTITION p03 VALUES LESS THAN (to_seconds('2009-12-01 00:00:00')) ,
  PARTITION p04 VALUES LESS THAN (to_seconds('2009-12-01 08:00:00')) ,
  PARTITION p05 VALUES LESS THAN (to_seconds('2009-12-01 16:00:00')) ,
  PARTITION p06 VALUES LESS THAN (MAXVALUE)
);
show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `dt` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
/*!50500 PARTITION BY RANGE (to_seconds(dt))
(PARTITION p01 VALUES LESS THAN (63426787200) ENGINE = MyISAM,
 PARTITION p02 VALUES LESS THAN (63426816000) ENGINE = MyISAM,
 PARTITION p03 VALUES LESS THAN (63426844800) ENGINE = MyISAM,
 PARTITION p04 VALUES LESS THAN (63426873600) ENGINE = MyISAM,
 PARTITION p05 VALUES LESS THAN (63426902400) ENGINE = MyISAM,
 PARTITION p06 VALUES LESS THAN MAXVALUE ENGINE = MyISAM) */

Since we aren't using the COLUMNS keyword (and we can't, because mixing COLUMNS and functions is not allowed), the values recorded in the table definitions are the results of the TO_SECONDS function.

But thanks to the new function, we can reverse the value and get a human readable value, as shown in this old blog post.


select
  partition_name part,
  partition_expression expr,
  from_seconds(partition_description) descr,
  table_rows
FROM
INFORMATION_SCHEMA.partitions
WHERE
    TABLE_SCHEMA = 'test'
    AND TABLE_NAME='t2';
+------+----------------+---------------------+------------+
| part | expr           | descr               | table_rows |
+------+----------------+---------------------+------------+
| p01  | to_seconds(dt) | 2009-11-30 08:00:00 |          0 |
| p02  | to_seconds(dt) | 2009-11-30 16:00:00 |          0 |
| p03  | to_seconds(dt) | 2009-12-01 00:00:00 |          0 |
| p04  | to_seconds(dt) | 2009-12-01 08:00:00 |          0 |
| p05  | to_seconds(dt) | 2009-12-01 16:00:00 |          0 |
| p06  | to_seconds(dt) | 0000-00-00 00:00:00 |          0 |
+------+----------------+---------------------+------------+

Summing up

MySQL 5.5 is definitely good news for partitions users. While there is no direct improvement on the performance (if you evaluate performance as response time), the ease of use of the enhancements and the time saved by the new TRUNCATE PARTITION statement will result in much time saved by the DBA, and sometime by the final users.

These additions will be still updated in the next milestone release, and eventually will be GA in mid 2010. Time for all partitions users to give it a try!