Notes on "High Performance MySQL", Ch. 5: Advanced MySQL Features

 

The MySQL Query Cache

The query cache keeps track of which tables a query uses, and if any of those tables changes, it invalidates the cache entry.

How MySQL Checks for a Cache Hit

The cache is a lookup table. The lookup key is a hash of the query text itself, the current database, the client protocol version, and a handful of other things that might affect the actual bytes in the query’s result.

The query cache will not store a result unless the query that generated it was deterministic. Thus, any query that contains a nondeterministic function, such as NOW() or CURRENT_DATE(), will not be cached.

MySQL marks a query as uncacheable as soon as it notices a construct that forbids caching, and the results generated by such a query are not stored.

There is a single global lock on the query cache, which will block all queries that need to access it. Accessing happens both when checking for a hit and when checking whether there are any queries to invalidate.

How the Cache Uses Memory

It keeps structures that help it figure out which memory in its pool is free, mappings between tables and query results, query text, and the query results.

Aside from some basic housekeeping structures, which require about 40KB, the query cache’s memory pool is available to be used in variable-sized blocks. Every block knows what type it is, how large it is, and how much data it contains, and it holds pointers to the next and previous logical and physical blocks.

Blocks can be of several types: they can store cache result, lists of tables used by a query, query text, and so on.

When the server starts, it initializes the memory for the query cache. The memory pool is initially a single free block. This block is as large as the entire amount of memory the cache is configured to use, minus the housekeeping structures.

When the server caches a query’s results, it allocates a block to store those results. This block must be a minimum of query_cache_min_res_unit bytes, though it may be larger if the server knows it is storing a large result.

When the Query Cache is Helpful

With the cache disabled, each read query has to execute and return its results, and each write query has to execute. With the cache enabled, each read has to first check the cache and then either return the stored result or, if there isn’t one, execute, generate the result, store it, and return it. Each write query has to execute and then check whether there are any cached queries that must be invalidated.

The type of query that benefits most from caching is one whose result is expensive to generate but doesn’t take much space in the cache, so it’s cheap to store, return to the client, and invalidate.

When the server receives a SELECT statement, it increments either the Qcache_hits or Com_select status variable, depending on whether the query was cached. Thus, the query cache hit rate is given by the formula Qcache_hits / (Qcache_hits + Com_select).
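The counters the formula needs can be read directly from the server’s status variables; a quick check might look like this (the numbers in the comment are made up for illustration):

```sql
-- Read the two counters the hit-rate formula needs:
SHOW GLOBAL STATUS LIKE 'Qcache_hits';
SHOW GLOBAL STATUS LIKE 'Com_select';

-- Hit rate = Qcache_hits / (Qcache_hits + Com_select).
-- For example, Qcache_hits = 8000 and Com_select = 2000 would give
-- 8000 / (8000 + 2000) = 0.80, i.e. an 80% hit rate.
```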

Cache invalidations can happen because of fragmentation, insufficient memory, or data modifications. If you have allocated enough memory to the cache and tuned the query_cache_min_res_unit value properly, most cache invalidations should be due to data modifications. You can see how many queries have modified data by examining the Com_* status variables (Com_update, Com_delete, and so forth), and you can see how many queries have been invalidated due to low memory by checking the Qcache_lowmem_prunes status variable.

How to Tune and Maintain the Query Cache

  • query_cache_type: OFF, ON, or DEMAND. If set to DEMAND, MySQL caches only queries that contain the SQL_CACHE hint.

  • query_cache_size: The total memory allocated to the cache. This must be a multiple of 1024 bytes.

  • query_cache_min_res_unit: The minimum size of a block allocated for storing query results.

  • query_cache_limit: The largest result the cache will store. If a result exceeds the specified limit, MySQL will increment the Qcache_not_cached status variable and discard the results cached so far.

  • query_cache_wlock_invalidate: Whether cached results that refer to a table another connection has locked for writing may still be served.
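As a sketch of how these variables fit together (the sizes are arbitrary, and tbl1 is a hypothetical table):

```sql
-- Cache only on demand, with 64MB of cache memory:
SET GLOBAL query_cache_type = DEMAND;
SET GLOBAL query_cache_size = 64 * 1024 * 1024;  -- must be a multiple of 1024
SET GLOBAL query_cache_limit = 1024 * 1024;      -- skip results larger than 1MB

-- Under DEMAND, only queries carrying the SQL_CACHE hint are cached:
SELECT SQL_CACHE id, name FROM tbl1 WHERE id = 5;
```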

Reducing fragmentation

You can keep large results from being cached by lowering the query_cache_limit variable, which can sometimes help achieve a better balance between fragmentation and the overhead of storing results in the cache.

If Qcache_free_blocks approaches Qcache_total_blocks/2, your query cache is severely fragmented.

You can defragment the query cache with FLUSH QUERY CACHE. This command compacts the query cache by moving all blocks “upward” and removing the free space between them, leaving a single free block at the bottom.  It blocks access to the query cache while it runs, which pretty much locks the whole server, but it’s usually fast unless your cache is large.

Contrary to its name, it does not remove queries from the cache. That’s what RESET QUERY CACHE does.
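Putting the fragmentation check and the two maintenance commands side by side:

```sql
-- If Qcache_free_blocks approaches half of Qcache_total_blocks,
-- the cache is severely fragmented:
SHOW GLOBAL STATUS LIKE 'Qcache_free_blocks';
SHOW GLOBAL STATUS LIKE 'Qcache_total_blocks';

-- Compact the cache without discarding cached results:
FLUSH QUERY CACHE;

-- Remove every cached result:
RESET QUERY CACHE;
```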

 

Improving query cache usage

To disable the query cache, set query_cache_size to 0. (Changing query_cache_type globally won’t affect connections that are already open, and it won’t return the memory to the server.)

InnoDB and the Query Cache

In MySQL 4.1 and newer, InnoDB indicates to the server, on a per-table basis, whether a transaction can access the query cache.

The factors that determine access are the transaction ID and whether there are any locks on the table. Each table in InnoDB’s in-memory data dictionary has an associated transaction ID counter.

Transactions whose IDs are less than the counter value are forbidden to read from or write to the query cache for queries that involve that table. Any locks on a table also make queries that access it uncacheable.

General Query Cache Optimizations

Alternatives to the Query Cache

Storing Code Inside MySQL

Stored Procedures and Functions

Triggers

Events

Preserving Comments in Stored Code

Cursors

MySQL currently provides read-only, forward-only server-side cursors that you can use only from within a MySQL stored procedure. A stored procedure can have multiple cursors open at once, and you can “nest” cursors in loops.

Cursors are read-only because they iterate over temporary tables rather than the tables where the data originated.

A cursor executes the entire query when you open it. If you close a cursor that fetches data from a large result set early, you won’t actually save work. If you need only a few rows, use LIMIT.
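A minimal sketch of a cursor inside a stored procedure, assuming a hypothetical table t with an integer column a:

```sql
DELIMITER //
CREATE PROCEDURE count_rows_via_cursor(OUT cnt INT)
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE v INT;
  -- Read-only, forward-only cursor over the hypothetical table `t`:
  DECLARE cur CURSOR FOR SELECT a FROM t;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  SET cnt = 0;
  OPEN cur;             -- the entire SELECT executes here
  fetch_loop: LOOP
    FETCH cur INTO v;   -- rows come back one at a time, forward only
    IF done THEN
      LEAVE fetch_loop;
    END IF;
    SET cnt = cnt + 1;
  END LOOP;
  CLOSE cur;            -- closing early would not undo the work done at OPEN
END//
DELIMITER ;
```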

Prepared Statements

When you create a prepared statement, the client library sends the server a prototype of the actual query you want to use. The server parses and processes this “skeleton” query, stores a structure representing the partially optimized query, and returns a statement handle to the client. The client library can execute the query repeatedly by specifying the statement handle.

Using prepared statements can be more efficient than executing a query repeatedly, for several reasons:

  • Sending parameters via the binary protocol is more efficient than sending them as ASCII text. The biggest savings are for BLOB and TEXT values, which can be sent to the server in chunks rather than as a single huge piece of data.

You can use the binary protocol only with prepared statements. Issuing queries through the normal mysql_query API function will not use the binary protocol.

Prepared Statement Optimization

The optimizations can be separated into three types, based on when they must be performed:

  • At preparation time

  • At first execution

The server simplifies nested joins and converts OUTER JOIN to INNER JOIN where possible.

  • At every execution

The SQL Interface to Prepared Statements
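A minimal sketch of the SQL interface, assuming a table t1 with an integer column a:

```sql
-- Parse and partially optimize the skeleton query once:
PREPARE stmt FROM 'SELECT a FROM t1 WHERE a = ?';

-- Execute it repeatedly with different parameter values:
SET @val = 1;
EXECUTE stmt USING @val;
SET @val = 2;
EXECUTE stmt USING @val;

-- Release the statement handle:
DEALLOCATE PREPARE stmt;
```

Note that with this SQL-level interface the parameters still travel as ordinary statement text, so it does not by itself give the binary-protocol savings described above; it is mostly useful for testing how a statement will be prepared.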

Limitations of Prepared Statements

User-Defined Functions

Unlike stored functions, which are written in SQL, you can write UDFs in any programming language that supports C calling conventions.

Views

You can’t have triggers on views.

MySQL can use “MERGE” and “TEMPTABLE” to execute the view, and it tries to use the MERGE algorithm when possible.

If you want to know whether a view will use MERGE or TEMPTABLE, you should EXPLAIN a trivial SELECT query against the view:
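For example (a hypothetical view over a hypothetical table tbl1):

```sql
CREATE ALGORITHM=TEMPTABLE VIEW v_tmp AS
  SELECT col1 FROM tbl1;

-- Inspect the select_type column of the EXPLAIN output:
EXPLAIN SELECT * FROM v_tmp;
```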

The presence of a DERIVED select type indicates that the view will use the TEMPTABLE algorithm.

Updatable Views

Any view that uses the TEMPTABLE algorithm is not updatable.

Performance Implications of Views

Limitations of Views

MySQL does not support materialized views. MySQL also doesn’t support indexed views.

Character Sets and Collations

A character set is a mapping from binary encodings to a defined set of symbols. A collation is a set of sorting rules for a character set.

How MySQL Uses Character Sets

Default for creating objects

MySQL has a default character set and collation for the server, for each database, and for each table.

Settings for client/server communication

When the server and the client communicate with each other, they may send data back and forth in different character sets. The server will translate as needed:

  • The server assumes the client is sending statements in the character set specified by character_set_client.
  • After the server receives a statement from the client, it translates it into the character set specified by character_set_connection. It also uses this setting to determine how to convert numbers into strings.
  • When the server returns results or error messages to the client, it translates them into the character set specified by character_set_results.

How MySQL compares values

When MySQL compares two values with different character sets, it must convert them to the same character set for the comparison.

MySQL also assigns a coercibility to values. This determines the priority of a value’s character set and influences which value MySQL will convert implicitly. You can use the CHARSET(), COLLATION(), and COERCIBILITY() functions to help debug errors related to character sets and collations.
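These functions can be applied to any expression; string literals work as a simple example:

```sql
-- Character set, collation, and coercibility of a plain literal:
SELECT CHARSET('abc'), COLLATION('abc'), COERCIBILITY('abc');

-- An explicit COLLATE clause lowers the coercibility value,
-- giving the value priority in implicit conversions:
SELECT COERCIBILITY(_utf8'abc' COLLATE utf8_bin);
```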

Special-case behaviors

-          The magical character_set_database setting

The character_set_database setting defaults to the default database’s setting.

-          LOAD DATA INFILE

LOAD DATA INFILE interprets incoming data according to the current setting of character_set_database.

-          SELECT INTO OUTFILE

MySQL writes all data from SELECT INTO OUTFILE without converting it.

-          Embedded escape sequences

MySQL interprets escape sequences in statements according to character_set_client, even when there’s an introducer or collate clause.

Choosing a Character Set and Collation

MySQL supports a three-byte subset of full UTF-8 that can store most characters in most languages. You can see the supported character sets with the SHOW CHARACTER SET and SHOW COLLATION commands.

The most common choices for collations are whether letters should sort in a case-sensitive or case-insensitive manner, or according to the encoding’s binary value. The collation names generally end with _cs, _ci, or _bin, so you can tell which is which easily.

How Character Sets and Collations Affect Queries

The server can use the index for sorting only if it’s sorted by the same collation as the one the query specifies. The index is sorted by the column’s collation.

If you join two tables on character columns that don’t have the same character set, MySQL has to convert one of them. This conversion can make it impossible to use an index, because it is just like a function enclosing the column.

The UTF-8 multibyte character set stores each character in a varying number of bytes (between one and three). MySQL uses fixed-size buffers internally for many string operations, so it must allocate enough space to accommodate the maximum possible length. Variable-length fields (VARCHAR, TEXT) do not suffer from this on disk, but in-memory temporary tables used for processing and sorting queries will always allocate the maximum length needed.

If you index a UTF-8 column, MySQL has to assume each character can take up to three bytes, so the usual length restrictions are suddenly shortened by a factor of three:

Note: 255 = floor(767/3), where 767 bytes is InnoDB’s index key prefix limit and 3 is the maximum bytes per character in this UTF-8 subset.
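A sketch of the effect, assuming InnoDB’s 767-byte index key limit (the table and column names are made up):

```sql
CREATE TABLE utf8_prefix_demo (
  s VARCHAR(1000) CHARACTER SET utf8,
  KEY (s(255))   -- 255 chars * 3 bytes = 765 bytes, just under the 767-byte limit
) ENGINE=InnoDB;
```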

Character sets are mostly useful for case-insensitive comparison, sorting, and string operations that need to be character-aware, such as SUBSTRING(). If you don’t need the database server to be aware of characters, you can store anything you want in BINARY columns, including UTF-8 data.

Full-Text Searching

In MySQL, only the MyISAM storage engine supports full-text indexing. It lets you search character-based content (CHAR, VARCHAR, and TEXT columns), and it supports both natural-language and Boolean searching.

A MyISAM full-text index operates on a full-text collection, which is made up of one or more character columns from a single table. In effect, MySQL builds the index by concatenating the columns in the collection and indexing them as one long string of text.

A MyISAM full-text index is a special type of B-Tree index with two levels. The first level holds keywords. Then, for each keyword, the second level holds a list of associated document pointers that point to full-text collections that contain that keyword. The index doesn’t contain every word in the collection; it prunes the words as follows:

  • A list of stopwords weeds out “noise” words by preventing them from being indexed. The stopword list is based on common English usage by default, but you can use the ft_stopword_file option to replace it with a list from an external file.
  • The index ignores words unless they’re longer than ft_min_word_len characters and shorter than ft_max_word_len characters.

Natural-Language Full-Text Searches

A natural-language search query determines each document’s relevance to the query. Relevance is based on the number of matched words and the frequency with which they occur in the document. Words that are less common in the entire index make a match more relevant. In contrast, extremely common words aren’t worth searching for at all. A natural-language full-text search excludes words that exist in more than 50% of the rows in the table, even if they’re not in the stopword list.

Unlike normal queries, the full-text search results are automatically ordered by relevance. MySQL cannot use an index for sorting when you perform a full-text search. Therefore, you shouldn’t specify an ORDER BY clause if you want to avoid a filesort.
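A sketch of a natural-language search, using a hypothetical film_text table:

```sql
CREATE TABLE film_text (
  film_id INT NOT NULL,
  title VARCHAR(255),
  description TEXT,
  FULLTEXT KEY (title, description)   -- the full-text collection
) ENGINE=MyISAM;

-- Results come back ordered by relevance; the MATCH() expression
-- in the select list returns the relevance score itself:
SELECT film_id, title,
       MATCH(title, description) AGAINST('factory casualties') AS relevance
FROM film_text
WHERE MATCH(title, description) AGAINST('factory casualties');
```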

Boolean Full-Text Searches

In Boolean searches, the query itself specifies the relative relevance of each word in a match. The results are unsorted.

A Boolean full-text search doesn’t actually require a full-text index to work. It will use a full-text index if there is one, but if there isn’t, it will just scan the entire table.
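A sketch of a Boolean-mode search, assuming a table film_text with a full-text index on (title, description):

```sql
-- '+' requires a word and '-' excludes one; the results are unsorted:
SELECT film_id, title
FROM film_text
WHERE MATCH(title, description)
      AGAINST('+factory -casualties' IN BOOLEAN MODE);
```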

Full-Text Changes in MySQL 5.1 and Beyond

Full-Text Tradeoffs and Workarounds

Full-Text Tuning and Optimization

Foreign Key Constraints

If you insert a row into a child table, the foreign key constraint will cause InnoDB to check for a corresponding value in the parent. It must also lock the row in the parent, to ensure it doesn’t get deleted before the transaction completes.

Instead of using foreign keys as constraints, it’s often a good idea to constrain the values in the application.
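A sketch of the check-and-lock behavior described above, with made-up parent/child tables:

```sql
CREATE TABLE parent (
  id INT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE child (
  id INT NOT NULL PRIMARY KEY,
  parent_id INT NOT NULL,
  FOREIGN KEY (parent_id) REFERENCES parent(id)
) ENGINE=InnoDB;

-- Inserting a child forces InnoDB to look up the matching parent row
-- and hold a lock on it until the transaction completes:
INSERT INTO parent (id) VALUES (1);
INSERT INTO child (id, parent_id) VALUES (10, 1);
```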

Merge Tables and Partitioning

Merge tables are a MySQL feature that combines multiple MyISAM tables into a single “virtual table”, much like a view that does a UNION over the tables. You create a merge table with the Merge storage engine. A merge table is not really a table per se; it’s more like a container for similarly defined tables.

Partitioned tables appear to be normal tables with special sets of instructions that tell MySQL where to physically store the rows.
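A minimal range-partitioned table might look like this (the schema is hypothetical):

```sql
CREATE TABLE sales (
  day DATE NOT NULL
) ENGINE=MyISAM
PARTITION BY RANGE (YEAR(day)) (
  PARTITION p_2010 VALUES LESS THAN (2011),
  PARTITION p_2011 VALUES LESS THAN (2012),
  PARTITION p_max  VALUES LESS THAN MAXVALUE
);
```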

The dirty little secret is that the storage code for partitioned tables is a lot like the code for merge tables.

Merge Tables

You can think of merge tables as an older, more limited version of partitioning.

mysql> create table t1(a int not null primary key) engine=MyISAM;
Query OK, 0 rows affected (0.20 sec)

mysql> create table t2(a int not null primary key) engine=MyISAM;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1(a) values(1), (2);
Query OK, 2 rows affected (0.14 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into t2(a) values(1), (2);
Query OK, 2 rows affected (0.09 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> create table merge(a int not null primary key)
    -> union=(t1,t2) insert_method=last;
Query OK, 0 rows affected (0.13 sec)

mysql> select * from merge;
Empty set (0.03 sec)

mysql> select * from t1;
+---+
| a |
+---+
| 1 |
| 2 |
+---+
2 rows in set (0.00 sec)

mysql> drop table merge;
Query OK, 0 rows affected (0.08 sec)

mysql> select * from merge;
ERROR 1146 (42S02): Table 'test.merge' doesn't exist

mysql> create table merge(a int not null primary key)
    -> engine=merge union=(t1, t2) insert_method=last;
Query OK, 0 rows affected (0.05 sec)

mysql> select * from merge;
+---+
| a |
+---+
| 1 |
| 1 |
| 2 |
| 2 |
+---+
4 rows in set (0.00 sec)

Please note that you need to specify ENGINE=MERGE when creating the merge table; otherwise, MySQL creates an ordinary table with the default storage engine, the UNION clause has no effect, and the table contains no records.

Notice that the underlying tables have exactly the same number and types of columns, and that all indexes that exist on the merge table also exist on the underlying tables. These are requirements when creating a merge table.

Each table inside the merge behaves normally, but the merge table doesn’t enforce constraints over the entire set of tables.

The INSERT_METHOD=LAST instruction to the table tells MySQL to send all INSERT statements to the last table in the merge. Specifying FIRST or LAST is the only control you have over where rows inserted into the merge table are placed.

Merge table performance impacts

  • A merge table requires more open file descriptors than a non-merge table containing the same data. Even though a merge table looks like a single table, it actually opens the underlying tables separately.
  • The CREATE statement that creates a merge table doesn’t check that the underlying tables are compatible.
  • Queries that access a merge table access every underlying table. It’s a good idea to limit the number of underlying tables in a merge table, especially if it is the second or later table in a join.
  • Unique key and primary key lookups stop as soon as they succeed.
  • The underlying tables are read in the order specified in the CREATE TABLE statement.

Merge table strengths

  • Merge tables excel for data that naturally has an active and inactive part.
  • They help manage huge volumes of data
  • It’s much faster to check and repair a collection of small tables than one huge one, especially if the huge table doesn’t fit in memory.
  • They’re handy for creating on-the-fly tables as needed.
  • A MyISAM table can be a member of many merge tables.
  • You can copy underlying tables between servers by copying the .frm, .MYI, and .MYD files.
  • You can use myisampack to compress some or all of the underlying tables.

Partitioned Tables

A partitioned table’s partitions are hidden by the MySQL server and are accessible only through the partitioned table.

Any given row of data is eligible to be stored in one and only one of the partitions.

Why partitioning works

Partitioning examples

Partitioned table limitations

  • At present, all partitions have to use the same storage engine.
  • Every unique index on a partitioned table must contain the columns referred to by the partition function.
  • Although MySQL may be able to avoid accessing all of the partitions in a partitioned table during a query, it still locks all the partitions.
  • Foreign keys don’t work.
  • You can’t use LOAD INDEX INTO CACHE.

Optimizing queries against partitioned tables

It’s very important to specify the partitioning key in the WHERE clause, even if it’s otherwise redundant, so the optimizer can prune unneeded partitions.

You can use EXPLAIN PARTITIONS to see whether the optimizer is pruning partitions.

At present, MySQL can prune only comparisons on the partitioning function’s columns. It cannot prune on the result of an expression, even if the expression is the same as the partitioning function.
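To illustrate, assume a hypothetical sales table partitioned by RANGE (YEAR(day)):

```sql
-- Prunable: the comparison is on the partitioning column itself.
EXPLAIN PARTITIONS
SELECT * FROM sales WHERE day = '2011-01-01';

-- Not prunable: the filter uses an expression, even though it is
-- identical to the partitioning function.
EXPLAIN PARTITIONS
SELECT * FROM sales WHERE YEAR(day) = 2011;
```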

Distributed (XA) Transactions

Whereas storage engine transactions give ACID properties inside the storage engine, a distributed (XA) transaction is a higher-level transaction that can extend some ACID properties outside the storage engine—and even outside the database—with a two-phase commit.

The MySQL server can participate in an externally managed distributed transaction, but it also uses XA internally to coordinate storage engines and binary logging.

Internal XA Transactions

Synchronizing a storage engine commit with “committing” an event to the binary log is a distributed transaction, because the server, not the storage engine, handles the binary log.

External XA Transactions

Original source: https://www.cnblogs.com/fangwenyu/p/2581412.html