How Do Database Indexes Speed Up Queries?

Database indexes are the difference between a query that completes in milliseconds and one that brings your application to its knees. After optimizing databases for over a decade—from small startups to systems handling billions of queries daily—I’ve learned that understanding indexes deeply is essential for building performant applications. A well-placed index can transform a 30-second query into one that completes in 10 milliseconds. This guide explains how indexes work internally and how to use them effectively in production.

What Are Database Indexes?

A database index is an auxiliary data structure that provides fast access to rows in a table based on the values of one or more columns. Think of it like a book’s index—instead of reading every page to find a topic, you look it up in the index and jump directly to the relevant pages.

The Problem Indexes Solve

Without indexes, databases perform full table scans—reading every row to find matching records:

-- Table with 10 million users
SELECT * FROM users WHERE email = '[email protected]';

-- Without index: Reads all 10 million rows
-- Query time: 15-30 seconds

With an index on the email column:

-- Same query with index
SELECT * FROM users WHERE email = '[email protected]';

-- With index: Reads 1 row (plus a few index nodes)
-- Query time: 2-5 milliseconds

This 5000x performance improvement comes from the index data structure enabling direct lookups instead of sequential scans.
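The scale of that difference comes straight from the access pattern. A toy Python model (an in-memory sketch, not actual database I/O) shows why scanning and searching diverge:

```python
import bisect

# Toy model: one sorted list of keys stands in for the index.
NUM_ROWS = 1_000_000
sorted_keys = list(range(NUM_ROWS))

def full_table_scan(target):
    """Examine keys one by one until a match: O(n) comparisons."""
    for key in sorted_keys:
        if key == target:
            return key
    return None

def index_lookup(target):
    """Binary search over sorted keys: O(log n), ~20 comparisons for 1M rows."""
    i = bisect.bisect_left(sorted_keys, target)
    if i < len(sorted_keys) and sorted_keys[i] == target:
        return sorted_keys[i]
    return None

print(index_lookup(999_999))  # found in ~20 comparisons instead of 1,000,000
```

The worst-case gap (last row in the table) is one million comparisons versus about twenty.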

Indexes vs. Sequential Scans

Databases use a query planner to choose between index scans and sequential scans:

-- PostgreSQL: View query execution plan
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = '[email protected]';

-- Without index:
-- Seq Scan on users  (cost=0.00..250000.00 rows=1 width=100) (actual time=15234.123..15234.567 rows=1 loops=1)
--   Filter: (email = '[email protected]')
--   Rows Removed by Filter: 9999999

-- With index:
-- Index Scan using users_email_idx on users  (cost=0.43..8.45 rows=1 width=100) (actual time=0.056..0.058 rows=1 loops=1)
--   Index Cond: (email = '[email protected]')

The cost estimates and actual time clearly show the dramatic difference. I always use EXPLAIN ANALYZE when optimizing slow queries—it’s the first tool in my performance debugging toolkit.

B-Tree Indexes: The Standard Workhorse

B-tree (Balanced Tree) indexes are the default index type in most relational databases. They provide efficient lookup, insertion, and deletion operations.

B-Tree Structure

A B-tree organizes data in a tree structure with multiple keys per node:

                    [50]
                   /    \
            [20, 35]      [75, 90]
           /   |   \      /   |   \
       [10] [25] [40] [60] [80] [95]

Each node contains:

  • Multiple sorted keys
  • Pointers to child nodes (for internal nodes)
  • Pointers to data rows (for leaf nodes)

All leaf nodes are at the same depth (balanced), ensuring consistent performance—lookups always traverse the same number of levels.

How B-Tree Lookups Work

To find email = '[email protected]':

  1. Start at root: Compare with root node keys
  2. Follow pointer: Based on comparison, descend to appropriate child
  3. Repeat: Continue until reaching leaf node
  4. Return result: Leaf node contains pointer to actual data row

-- Creating a B-tree index (default type)
CREATE INDEX users_email_idx ON users(email);

-- The index internally builds a tree structure:
-- Root node: Points to ranges (A-M), (N-Z)
-- Internal nodes: Further subdivide ranges
-- Leaf nodes: Contain actual email values with row pointers

For a table with 10 million rows, a B-tree typically has 3-4 levels. With roughly 100 keys per node, for example:

  • Root: 1 node
  • Level 2: ~10 nodes
  • Level 3: ~1,000 nodes
  • Leaves: ~100,000 nodes, each holding ~100 entries

Each node narrows the search by up to ~100x, so lookups achieve logarithmic time complexity: O(log n).
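The lookup steps above can be sketched in Python. This is a deliberately simplified two-level model of the earlier diagram, not how any database actually lays out pages:

```python
import bisect

class Node:
    """A minimal B-tree node: sorted keys, plus either child pointers
    (internal node) or row pointers (leaf node)."""
    def __init__(self, keys, children=None, rows=None):
        self.keys = keys          # sorted keys in this node
        self.children = children  # child nodes, for internal nodes
        self.rows = rows          # key -> row pointer, for leaf nodes

def btree_search(node, key):
    """Descend from the root, choosing one child per level, until a leaf."""
    while node.children is not None:
        i = bisect.bisect_right(node.keys, key)  # which child covers this key
        node = node.children[i]
    return node.rows.get(key)                    # leaf: fetch the row pointer

# Tiny tree: root [50] with two leaf children
left = Node(keys=[10, 20, 35], rows={10: "row-a", 20: "row-b", 35: "row-c"})
right = Node(keys=[60, 75, 90], rows={60: "row-d", 75: "row-e", 90: "row-f"})
root = Node(keys=[50], children=[left, right])

print(btree_search(root, 75))  # -> row-e
```

Each loop iteration descends one level, which is why lookup cost grows with tree depth (logarithmically) rather than with row count.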

B-Tree Index Efficiency

B-trees excel at:

Equality searches:

SELECT * FROM users WHERE id = 12345;
-- Index scan: O(log n)

Range queries:

SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';
-- Index scan followed by sequential read of matching entries

Sorting:

SELECT * FROM products ORDER BY price;
-- If index exists on price, result is already sorted

Prefix matching (for string columns):

SELECT * FROM users WHERE email LIKE 'alice%';
-- Index can be used for prefix search

B-trees are less effective for:

  • Suffix or middle wildcard searches: LIKE '%alice%'
  • Function-transformed columns: WHERE LOWER(email) = '[email protected]' (unless an expression index exists)
  • Columns with low cardinality (few distinct values): WHERE active = true on a column with 95% true values

Multi-Column B-Tree Indexes

Composite indexes on multiple columns enable efficient queries on column combinations:

-- Create multi-column index
CREATE INDEX orders_customer_date_idx ON orders(customer_id, order_date);

-- Efficient queries (uses index):
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND order_date > '2024-01-01';

-- Inefficient query (cannot use index):
SELECT * FROM orders WHERE order_date > '2024-01-01';
-- Index on (customer_id, order_date) cannot be used when querying only order_date

Critical rule: Multi-column indexes can only be used for queries that filter on a prefix of the indexed columns. An index on (A, B, C) works for:

  • WHERE A = ...
  • WHERE A = ... AND B = ...
  • WHERE A = ... AND B = ... AND C = ...

But NOT for:

  • WHERE B = ...
  • WHERE C = ...
  • WHERE B = ... AND C = ...

I discovered this the hard way when a multi-column index I created wasn’t used for half our queries. Understanding column order in composite indexes is essential.
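The prefix rule falls out of how composite index entries are sorted. A small Python model (with hypothetical customer IDs and dates) makes it visible:

```python
import bisect

# Model a composite index on (customer_id, order_date) as a sorted list
# of tuples -- exactly the order a B-tree would keep them in.
index = sorted([
    (123, "2024-03-01"),
    (123, "2024-06-15"),
    (456, "2024-02-10"),
    (789, "2024-05-05"),
])

def lookup_by_customer(customer_id):
    """Leftmost-prefix search: all entries for one customer form one
    contiguous run, so binary search finds them directly."""
    lo = bisect.bisect_left(index, (customer_id, ""))
    hi = bisect.bisect_left(index, (customer_id + 1, ""))
    return index[lo:hi]

# Filtering only on order_date would have to scan every tuple: dates for
# different customers are scattered throughout the sort order.
print(lookup_by_customer(123))
```

Equal leading values cluster together in the sort order; a trailing column alone has no such clustering, which is exactly why the (customer_id, order_date) index can't serve a date-only filter.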

Hash Indexes: Fast Equality Lookups

Hash indexes use a hash table for O(1) equality lookups but don’t support range queries.

Hash Index Structure

Hash indexes compute a hash of the key and store the result:

email = '[email protected]'
hash('[email protected]') = 2847563
bucket[2847563] = pointer to row

-- Create hash index (PostgreSQL)
CREATE INDEX users_email_hash_idx ON users USING HASH (email);

When to Use Hash Indexes

Hash indexes are optimal for:

Exact equality lookups:

SELECT * FROM users WHERE email = '[email protected]';
-- Hash index: O(1) - slightly faster than B-tree O(log n)

NOT suitable for:

  • Range queries: WHERE email > '[email protected]'
  • Sorting: ORDER BY email
  • Prefix matching: WHERE email LIKE 'alice%'

In practice, I rarely use hash indexes because:

  1. B-trees are nearly as fast for equality lookups
  2. B-trees support range queries and sorting
  3. Hash indexes have higher collision overhead
  4. Hash indexes weren't crash-safe (WAL-logged) before PostgreSQL 10

The only scenario where I’ve seen hash indexes provide measurable benefit is large tables (100M+ rows) with exclusively equality queries on high-cardinality columns.

Specialized Index Types

Modern databases provide specialized indexes for specific use cases.

Full-Text Search Indexes

For searching text content, full-text indexes vastly outperform LIKE queries:

-- Create full-text index (PostgreSQL)
CREATE INDEX articles_content_fts_idx ON articles USING GIN (to_tsvector('english', content));

-- Full-text search query
SELECT * FROM articles
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'database & performance');

-- Much faster than:
SELECT * FROM articles WHERE content LIKE '%database%' AND content LIKE '%performance%';

Full-text indexes tokenize text, remove stop words, and support:

  • Relevance ranking
  • Phrase searches
  • Language-specific stemming
  • Boolean operators (AND, OR, NOT)

I implemented full-text search on a 50 million document database, reducing search time from 45 seconds (LIKE query) to 200 milliseconds (GIN index).

Partial Indexes

Partial indexes only index rows matching a condition, saving space and improving performance:

-- Index only active users
CREATE INDEX users_active_email_idx ON users(email) WHERE active = true;

-- Efficient for queries on active users
SELECT * FROM users WHERE email = '[email protected]' AND active = true;

-- Regular index would waste space on inactive users (95% of table)

For a system where 5% of users are active but account for 95% of queries, partial indexes provide massive space savings (20x smaller) with full performance for relevant queries.

Expression Indexes

Index the result of an expression or function:

-- Index lowercase emails for case-insensitive search
CREATE INDEX users_email_lower_idx ON users(LOWER(email));

-- Now this query uses the index
SELECT * FROM users WHERE LOWER(email) = '[email protected]';

I created expression indexes for case-insensitive search, date truncation (day/month granularity), and JSON field extraction—transforming unusable queries into millisecond responses.

Covering Indexes

Covering indexes include all columns needed by a query, avoiding table access entirely:

-- Query needs id, email, created_at
SELECT id, email, created_at FROM users WHERE email = '[email protected]';

-- Covering index includes all needed columns
CREATE INDEX users_email_covering_idx ON users(email) INCLUDE (id, created_at);

-- Database reads only the index, never accessing the table
-- "Index Only Scan" in EXPLAIN output

Covering indexes are powerful but increase index size. Use them for critical, high-frequency queries where every millisecond matters.

Index Maintenance and Overhead

Indexes aren’t free—they consume space and slow down writes.

Storage Overhead

Every index is an additional data structure requiring disk space:

-- Check table and index sizes (PostgreSQL)
SELECT 
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS indexes_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

For a 10GB table, indexes might consume 5-15GB depending on the number and type of indexes.

Write Performance Impact

Every INSERT, UPDATE, or DELETE must update all relevant indexes:

-- Table with 5 indexes
INSERT INTO users (email, name, created_at, ...) VALUES (...);

-- Database must:
-- 1. Insert row into table
-- 2. Update users_email_idx
-- 3. Update users_name_idx
-- 4. Update users_created_at_idx
-- 5. Update users_email_name_idx
-- 6. Update users_email_created_at_idx

-- Each write now touches 6 structures instead of 1, so write cost
-- grows with every index added

I optimized a high-throughput system by removing redundant indexes, reducing INSERT time from 15ms to 3ms—5x improvement with no query performance degradation.

Index Fragmentation

Over time, indexes become fragmented as data changes, reducing performance:

-- List index sizes (PostgreSQL); accurate bloat estimation requires
-- the pgstattuple extension or a community bloat-estimation query
SELECT 
    schemaname,
    relname AS tablename,
    indexrelname AS indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

-- Rebuild fragmented indexes (PostgreSQL 12+ supports
-- REINDEX ... CONCURRENTLY to avoid blocking writes)
REINDEX INDEX users_email_idx;
-- Or rebuild all table indexes
REINDEX TABLE users;

I schedule weekly REINDEX operations during maintenance windows for high-churn tables, recovering 20-30% wasted space and improving query performance.

Practical Index Strategies

Choosing the right indexes requires understanding your query patterns.

Query Analysis

Start by identifying slow queries:

-- PostgreSQL: Enable query logging
ALTER DATABASE mydb SET log_min_duration_statement = 1000;  -- Log queries > 1s

-- Analyze slow queries via pg_stat_statements
-- (before PostgreSQL 13, the timing columns are named total_time,
-- mean_time, etc., without the _exec_ infix)
SELECT 
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    min_exec_time,
    max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

For each slow query, use EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT * FROM orders 
WHERE customer_id = 123 
  AND status = 'pending' 
  AND created_at > '2024-01-01';

-- Look for:
-- - Seq Scan: Missing index
-- - High "actual time": Query bottleneck
-- - "Rows Removed by Filter": Index not selective

Index Selection Guidelines

Based on production experience:

Index columns in WHERE clauses:

-- Frequent query
SELECT * FROM orders WHERE customer_id = 123;

-- Create index
CREATE INDEX orders_customer_id_idx ON orders(customer_id);

Index foreign keys:

-- orders.customer_id references users.id
CREATE INDEX orders_customer_id_idx ON orders(customer_id);

-- Speeds up joins
SELECT * FROM orders o
JOIN users u ON o.customer_id = u.id;

Index columns used for sorting:

-- Frequent query
SELECT * FROM products ORDER BY price DESC LIMIT 10;

-- Create index
CREATE INDEX products_price_idx ON products(price DESC);

Consider composite indexes for multi-column queries:

-- Frequent query
SELECT * FROM orders 
WHERE customer_id = 123 AND status = 'pending';

-- Composite index
CREATE INDEX orders_customer_status_idx ON orders(customer_id, status);

Don’t index:

  • Columns with low cardinality (few distinct values) unless combined with selective columns
  • Small tables (< 1000 rows) - sequential scan is faster
  • Columns rarely used in queries
  • Write-heavy tables with read-rarely columns

Monitoring Index Usage

Track which indexes are actually used:

-- PostgreSQL: Index usage statistics
SELECT 
    schemaname,
    relname AS tablename,
    indexrelname AS indexname,
    idx_scan AS scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;  -- Unused indexes first

-- Indexes with 0 scans are candidates for removal

I discovered dozens of unused indexes in a production database—removing them saved 50GB disk space and improved write performance by 30%.

Advanced Optimization Techniques

For high-performance systems, these advanced techniques provide additional optimization.

Index Hinting (Query Optimizer Hints)

Sometimes the query planner makes suboptimal choices. You can force index usage:

-- MySQL: Force specific index
SELECT * FROM orders FORCE INDEX (orders_customer_date_idx)
WHERE customer_id = 123 AND order_date > '2024-01-01';

-- PostgreSQL: Disable sequential scans (testing only!)
SET enable_seqscan = OFF;

Warning: Only use hints when you’ve proven the planner is wrong. The optimizer usually knows best.

Clustering

Physically reorder table rows to match index order:

-- PostgreSQL: Cluster table by index
CLUSTER users USING users_email_idx;

-- Subsequent range scans are faster due to data locality
SELECT * FROM users WHERE email BETWEEN 'a' AND 'c';

Clustering is a one-time operation—new rows aren’t automatically clustered. I recluster weekly for tables with sequential access patterns.

Index-Only Scans with INCLUDE

PostgreSQL’s INCLUDE clause creates covering indexes without adding columns to the B-tree structure:

-- Standard covering index (both columns in the tree)
CREATE INDEX users_email_created_idx ON users(email, created_at);

-- Better: INCLUDE clause (created_at not in the tree, only in leaf entries)
CREATE INDEX users_email_include_idx ON users(email) INCLUDE (created_at);

The INCLUDE version enables index-only scans while maintaining optimal B-tree size for searches.

Real-World Case Studies

These scenarios from production systems demonstrate indexing impact:

Case Study 1: Order Lookup by Email

Problem: Customer support searching orders by email took 45 seconds.

-- Slow query
SELECT * FROM orders WHERE customer_email = '[email protected]';
-- Seq Scan: 45 seconds on 100M row table

Solution: Created B-tree index on customer_email.

CREATE INDEX orders_customer_email_idx ON orders(customer_email);
-- Query time: 15ms (3000x faster)

Result: Customer support efficiency improved dramatically, reducing ticket resolution time.

Case Study 2: Time-Series Data

Problem: Dashboard queries on metrics table (1 billion rows) timing out.

-- Slow query
SELECT AVG(value) FROM metrics
WHERE device_id = 456 AND timestamp > NOW() - INTERVAL '1 day';
-- Seq Scan: timeout after 60 seconds

Solution: Composite index on (device_id, timestamp).

CREATE INDEX metrics_device_timestamp_idx ON metrics(device_id, timestamp);
-- Query time: 120ms

Optimization: Added covering index including value column.

CREATE INDEX metrics_device_timestamp_covering_idx 
ON metrics(device_id, timestamp) INCLUDE (value);
-- Query time: 8ms (index-only scan)

Result: Real-time dashboard became usable.

Case Study 3: Fuzzy Name Search

Problem: Searching users by name with partial matches was too slow.

-- Slow query
SELECT * FROM users WHERE name LIKE '%John%';
-- Seq Scan: 8 seconds on 50M row table

Solution: Full-text search index with trigram extension.

-- Enable trigram extension
CREATE EXTENSION pg_trgm;

-- Create GIN index for trigram search
CREATE INDEX users_name_trgm_idx ON users USING GIN (name gin_trgm_ops);

-- Fast query (% is pg_trgm's similarity operator; the same trigram
-- index also accelerates the original LIKE '%John%' predicate)
SELECT * FROM users WHERE name % 'John';
-- Index scan: 45ms

Result: User search feature became viable, enabling new application functionality.

Testing and Benchmarking

Always measure index impact:

-- Benchmark query without index
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE email = '[email protected]';
-- Note execution time and buffer hits

-- Create index
CREATE INDEX users_email_idx ON users(email);

-- Benchmark with index
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE email = '[email protected]';
-- Compare execution time and buffer hits

-- Check index size
SELECT pg_size_pretty(pg_relation_size('users_email_idx'));

Use realistic data volumes—indexes that work on 1000 rows may behave differently on 10 million rows.

Conclusion

Database indexes are essential for query performance, transforming impractical queries into millisecond responses. Understanding index data structures—primarily B-trees—and their trade-offs enables informed optimization decisions.

Key takeaways from production experience:

  • Indexes provide logarithmic lookups vs. linear full table scans
  • B-tree indexes are versatile, supporting equality, range, and sort operations
  • Composite indexes must match query column order
  • Indexes have storage and write performance costs
  • Monitor index usage and remove unused indexes
  • Use EXPLAIN ANALYZE to verify optimization impact
  • Specialize with partial, expression, and covering indexes for specific use cases

For deeper understanding, study database internals: the PostgreSQL documentation on indexes provides comprehensive coverage, and the MySQL indexing guide offers practical examples. Use the Index, Luke! by Markus Winand is an excellent resource for database indexing strategies. For B-tree internals, review The Art of Computer Programming, Volume 3 by Knuth. The Database Internals book by Alex Petrov explains modern storage engines. Query optimization is covered thoroughly in SQL Performance Explained by Markus Winand.

Thank you for reading! If you have any feedback or comments, please send them to the author at [email protected].