Covering Indexes in MySQL, PostgreSQL, and MongoDB

Covering Indexes in MySQL, PostgreSQL, and MongoDB - Orange Matter https://orangematter.solarwinds.com/2019/02/01/covering-indexes-in-mysql-postgresql-and-mongodb/

Query Optimization — MongoDB Manual https://docs.mongodb.com/manual/core/query-optimization/#covered-query

Query Optimization — MongoDB Manual https://docs.mongodb.com/manual/core/query-optimization/#indexes-covered-queries

MySQL :: MySQL 8.0 Reference Manual :: 8.8.2 EXPLAIN Output Format https://dev.mysql.com/doc/refman/8.0/en/explain-output.html#jointype_index

A covering index is a fantastic query performance optimization. An index covers a query when the index has all the data needed to execute the query, so the server can retrieve the query’s data without reading any rows or documents. Covered queries are usually SELECT queries, but in addition to reads, indexes can cover portions of what a write query needs to do its work. In this article I’ll explain what a covering index is in a bit more detail; the technical implementation and caveats in MySQL, PostgreSQL, and MongoDB; and how to check that a query is using a covering index.

What’s a Covering Index?

A covering index, or index-covered query, doesn’t refer just to an index. Instead, it’s the combination of a query and an index: the index covers the query. An index that covers one query might not cover another query, so “covering” doesn’t solely describe the index, it describes the index and query together. To “cover” a query, an index must contain all of the data mentioned in the query. That is, the index has the data in the columns the query returns, as well as the columns the query uses for WHERE clauses, grouping, or any other part of the query. Covering indexes make queries run faster! That’s because the server is accessing less data, and potentially in a more optimized way. This is because of how indexes work. The most common implementation of indexing is basically a sorted copy of the original data, with a quick-lookup data structure to navigate it, and pointers back to the original rows or documents. It’s a lot like the index in a physical book: there’s a keyword in sorted order, followed by page numbers where you can find the keyword. Covering indexes optimize query speed because accessing rows or documents through an index is slow, just like finding page numbers in the book’s index and then flipping to the right page to find the related text. When the index covers the query, the server doesn’t have to go elsewhere to find more data. The index has everything needed. And a lot of times it has it in the order needed, too, which helps avoid random jumping around from place to place to find data.

How do Databases Support Covering Indexes?

Not all databases offer covering-index support, and those that do, usually have some caveats and small details to be aware of, lest the covering index be disabled or disallowed for helping optimize the query. Fortunately, there’s support for covering indexes in MySQL, PostgreSQL, and MongoDB. Here’s an overview of each of those. MySQL

Supported since: early versions
Terminology: covering index, index-covered query
Caveats: works best with the most common storage engines, like InnoDB and MyISAM. Some storage engines like HEAP don’t support covering indexes. With InnoDB, sometimes data that’s been modified by a transaction will be ineligible for a covering index optimization.
Extras: with the InnoDB storage engine, the primary key columns are always invisibly included in secondary keys, and can be used for index-covered queries.
Documentation for more details: https://dev.mysql.com/doc/refman/8.0/en/explain-output.html#jointype_index

PostgreSQL

Supported since: 9.2, with more robust support in version 9.6 and newer
Terminology: index-only scan
Caveats: works best with B-Tree indexes, but sometimes works with other index types. GIN indexes don’t work. Some rows’ visibility might not be set in the visibility map if they’ve been modified recently, so access to the original rows might still be required. The planner doesn’t always know how to use index-only scans in all theoretically possible cases.
Extras: can work with expression indexes too, but the rules are complicated. Since version 11, indexes can INCLUDE extra payload columns that aren’t indexed, but are useful for allowing index-only scans.
Documentation for more details: https://www.postgresql.org/docs/current/indexes-index-only-scans.html

MongoDB

Supported since: very old versions, with improvements over time
Terminology: covered query
Caveats: Doesn’t work with array fields or geospatial indexes. Doesn’t work when querying sharded collections through mongos unless the index contains the shard key.
Extras: in version 3.6, covering can work with fields within embedded documents.
Documentation for more details: https://docs.mongodb.com/manual/core/query-optimization/#covered-query

How To Use EXPLAIN To Tell If A Query Uses A Covering Index

The best way to figure out whether a query benefits from a covered index optimization is to explain the query and examine its execution plan. The output of explaining a query is pretty complicated and different for each database, so I’ll discuss this only at a pretty high level here, with links to documentation.

MySQL puts a note in the Extra field of the EXPLAIN output. In older-style tabular EXPLAIN, it’s “Using index.” There are a few other notes that can go in Extra that begin with “Using index” but have different meaning. Covering indexes are notated by “Using index” without any further details. In newer-style JSON format, there’s a property named using_index which is true or false.
PostgreSQL indicates the access type prominently in the EXPLAINoutput. It’ll appear something like “Index Only Scan using tbl1_pkey on tbl1.”
MongoDB is a bit trickier. In earlier versions of MongoDB, there’s an indexOnly true/false property in the JSON document that you get as a result of running a query with explain. In more recent versions, quoting the manual, “When an index covers a query, the explain result has an IXSCAN stage that is not a descendant of a FETCH stage, and in the executionStats, the totalDocsExamined is 0.”

Concluding Thoughts

Covering indexes are such a powerful optimization that if the query is a heavy hitter in general in your server, then creating a new index is often well worth the extra cost and disk space. You can use a workload profile to figure this out. For example, in this screenshot, you can see the first two SELECT queries are a huge proportion of the overall workload, making them potentially good candidates for optimizing with a covering index. vividcortex-profiler Using an index to cover a query can often speed it up by orders of magnitude. It's not always that dramatic an optimization, but I've personally seen those types of results many times. Of course your mileage may vary, etc. Now you know what a covering index (or covered query) is, why it's faster than a query that's not covered by an index, and how to check whether a query is covered by an index. Next time you're looking at a query that's slower than you want, check whether an index could potentially cover it. Happy query optimization!