Database Indexing: A Practical Guide to Boosting Query Performance


Why Database Performance Matters

Imagine searching for a specific word in a 1000-page book without an index—you’d have to flip through every single page. That’s exactly what happens when your database queries run without proper indexing. In this hands-on guide, I’ll show you how database indexes can transform your query performance from painfully slow to lightning fast.


What is a Database Index?

A database index is a data structure that improves the speed of data retrieval operations on a database table. Think of it like an index in a book—instead of scanning every page (or row), the database can quickly jump to the exact location where your data lives.

Key Concepts

Without Index (Sequential Scan)

  • Database reads every row in the table
  • Time complexity: O(n) - linear time
  • Performance degrades as data grows

With Index (Index Scan)

  • Database uses a B-tree structure to quickly locate rows
  • Time complexity: O(log n) - logarithmic time
  • Consistently fast even with millions of records
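The gap between the two can be sketched with a step counter: a linear pass over a sorted array stands in for the sequential scan, and a binary search stands in for the way a B-tree narrows the search. This illustrates the complexity classes, not PostgreSQL internals:

```javascript
// Count lookup steps in a sorted array: linear scan (no index)
// versus binary search (how a B-tree narrows the search space).
function linearSteps(sorted, target) {
  let steps = 0;
  for (const value of sorted) {
    steps++;
    if (value === target) return steps;
  }
  return steps; // scanned everything without a match
}

function binarySteps(sorted, target) {
  let steps = 0, lo = 0, hi = sorted.length - 1;
  while (lo <= hi) {
    steps++;
    const mid = (lo + hi) >> 1;
    if (sorted[mid] === target) return steps;
    sorted[mid] < target ? (lo = mid + 1) : (hi = mid - 1);
  }
  return steps;
}

const rows = Array.from({ length: 100_000 }, (_, i) => i);
console.log(linearSteps(rows, 99_999)); // 100000 steps
console.log(binarySteps(rows, 99_999)); // 17 steps (~log2 of 100k)
```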

The Performance Impact: Real Numbers

In my demonstration project with 100,000 records, here’s what I observed:

| Operation       | Without Index   | With Index | Speedup       |
|-----------------|-----------------|------------|---------------|
| Search by email | ~50-100 ms      | ~2-5 ms    | 20-50x faster |
| Query plan      | Sequential Scan | Index Scan | Optimal       |

The difference becomes even more dramatic with larger datasets. A table with 1 million records could see 100x+ performance improvements!


Project Overview

I built a Node.js application that demonstrates indexing performance by comparing two identical PostgreSQL databases:

  1. db_with_index - Has an index on the email column
  2. db_without_index - No index (baseline for comparison)

Tech Stack

  • Backend: Node.js + Express.js
  • Database: PostgreSQL 16
  • Containerization: Docker & Docker Compose
  • Data Volume: 100,000 records per database

Architecture & Setup

Database Schema

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INT,
  email VARCHAR(255),
  product_name VARCHAR(150),
  status VARCHAR(20),
  amount NUMERIC(10,2),
  city VARCHAR(100),
  created_at TIMESTAMP
);

-- The magic line:
CREATE INDEX idx_orders_email ON orders(email);

Docker Configuration

Running two PostgreSQL instances simultaneously using Docker Compose:

services:
  db_with_index:
    image: postgres:16
    container_name: postgres_with_index
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin
      POSTGRES_DB: db_with_index
    ports:
      - "5433:5432"
    volumes:
      - ./init-db-with-index.sql:/docker-entrypoint-initdb.d/init.sql

  db_without_index:
    image: postgres:16
    container_name: postgres_without_index
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin
      POSTGRES_DB: db_without_index
    ports:
      - "5434:5432"
    volumes:
      - ./init-db-without-index.sql:/docker-entrypoint-initdb.d/init.sql

API Endpoints for Testing

Search by Email (The Key Comparison)

With Index:

GET /api/indexed/orders/email/user50000@example.com

Without Index:

GET /api/non-indexed/orders/email/user50000@example.com

Check Execution Plans

With Index:

GET /api/indexed/explain/email/user50000@example.com

Without Index:

GET /api/non-indexed/explain/email/user50000@example.com

Understanding Query Execution Plans

PostgreSQL’s EXPLAIN ANALYZE command shows exactly how the database executes a query.

Without Index (Sequential Scan)

Seq Scan on orders
  Filter: (email = 'user50000@example.com')
  Rows Removed by Filter: 99999
  Execution Time: 85.234 ms

What’s happening:

  • Database scans all 100,000 rows
  • Checks each row’s email against the search criteria
  • Discards 99,999 rows that don’t match
  • Takes ~85ms for a single record lookup

With Index (Index Scan)

Index Scan using idx_orders_email on orders
  Index Cond: (email = 'user50000@example.com')
  Execution Time: 0.234 ms

What’s happening:

  • Database uses the B-tree index structure
  • Directly jumps to matching row(s)
  • No unnecessary row scans
  • Takes <1ms - 350x faster!

Code Implementation Highlights

Connection Pool Management

// db/client.js
import pg from 'pg';

const { Pool } = pg;

export const poolWithIndex = new Pool({
  host: 'localhost',
  port: 5433,
  database: 'db_with_index',
  user: 'admin',
  password: 'admin',
});

export const poolWithoutIndex = new Pool({
  host: 'localhost',
  port: 5434,
  database: 'db_without_index',
  user: 'admin',
  password: 'admin',
});

Route Handler with Execution Plan

// routes/indexed.routes.js
import { Router } from 'express';
import { poolWithIndex } from '../db/client.js';
// asyncHandler (async error wrapper) and ApiResponse (response envelope)
// are app-level helpers defined elsewhere in the project.
const router = Router();

router.get('/explain/email/:email', asyncHandler(async (req, res) => {
  const { email } = req.params;
  
  const result = await poolWithIndex.query(
    'EXPLAIN ANALYZE SELECT * FROM orders WHERE email = $1',
    [email]
  );
  
  res.json(new ApiResponse(200, result.rows, 'Execution plan retrieved'));
}));
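To attach numbers to the comparison, a small helper can time any query function. The timeQuery name and shape below are illustrative sketches, not part of the project:

```javascript
// Hypothetical helper: time an async query function in milliseconds.
async function timeQuery(run) {
  const start = process.hrtime.bigint();
  const result = await run();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { ms, result };
}

// Usage sketch: run the same lookup against both pools and compare.
// const fast = await timeQuery(() =>
//   poolWithIndex.query('SELECT * FROM orders WHERE email = $1', [email]));
// const slow = await timeQuery(() =>
//   poolWithoutIndex.query('SELECT * FROM orders WHERE email = $1', [email]));
// console.log({ indexedMs: fast.ms, nonIndexedMs: slow.ms });
```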

When to Use Indexes

Good Candidates for Indexing

  1. Columns frequently used in WHERE clauses

    • Email addresses, user IDs, status codes
  2. Foreign keys in JOIN operations

    • Dramatically speeds up table joins
  3. Columns used for sorting (ORDER BY)

    • Pre-sorted data for faster retrieval
  4. High cardinality columns

    • Columns with many unique values (emails, UUIDs)

When NOT to Index

  1. Small tables (< 1000 rows)

    • Index overhead exceeds benefits
  2. Frequently updated columns

    • Index maintenance slows down INSERT/UPDATE operations
  3. Low cardinality columns

    • Boolean fields, gender, status (few unique values)
  4. Columns rarely queried

    • Wastes storage space and slows writes

Performance Testing Results

Test Scenario: Search for specific email in 100,000 records

| Metric                          | Without Index | With Index | Improvement   |
|---------------------------------|---------------|------------|---------------|
| First query (cold cache)        | 95 ms         | 3 ms       | 31x faster    |
| Subsequent queries (warm cache) | 45 ms         | 0.8 ms     | 56x faster    |
| CPU usage                       | High          | Low        | 70% reduction |
| Scan method                     | Sequential    | Index      | Optimal       |

Scalability Impact

As data grows, the performance gap widens dramatically: a sequential scan's cost grows linearly with row count, while a B-tree lookup's cost grows only logarithmically. Rough expectations:

  • 100K records: 30-50x faster with index
  • 1M records: 100-200x faster with index
  • 10M records: 500-1000x faster with index
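These multipliers are rough observations, but the trend follows from comparing n against log2(n). A quick sanity check (the ideal ratio is far larger than real-world speedups, which are damped by caching, I/O patterns, and per-row costs):

```javascript
// Rows touched by a full scan (n) versus a B-tree descent (~log2 n).
for (const n of [100_000, 1_000_000, 10_000_000]) {
  const ratio = n / Math.log2(n);
  console.log(`${n} rows -> ~${Math.round(ratio)}x fewer comparisons`);
}
```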

Index Types in PostgreSQL

1. B-tree (Default)

CREATE INDEX idx_email ON orders(email);
  • Best for equality and range queries
  • Supports <, <=, =, >=, >
  • Most commonly used index type

2. Hash

CREATE INDEX idx_email_hash ON orders USING HASH (email);
  • Only for equality comparisons (=)
  • Slightly faster than B-tree for exact matches
  • Cannot be used for range queries

3. GIN (Generalized Inverted Index)

CREATE INDEX idx_tags ON articles USING GIN (tags);
  • For array, JSON, and full-text search
  • Great for multi-value columns

4. GiST (Generalized Search Tree)

CREATE INDEX idx_location ON stores USING GIST (location);
  • For geometric and full-text data
  • Supports complex data types

Best Practices

1. Monitor Index Usage

SELECT 
  schemaname,
  tablename,
  indexname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0;

This query finds unused indexes that waste space and slow down writes.

2. Composite Indexes for Multiple Columns

CREATE INDEX idx_user_status_date ON orders(user_id, status, created_at);

Useful when queries filter by multiple columns together.

3. Partial Indexes for Filtered Queries

CREATE INDEX idx_active_orders ON orders(created_at) 
WHERE status = 'active';

Smaller, faster index for queries that always filter by the same condition.

4. Regular Index Maintenance

REINDEX INDEX idx_orders_email;
VACUUM ANALYZE orders;

Keeps indexes optimized and statistics up-to-date.


Common Pitfalls to Avoid

1. Over-Indexing

  • Every index slows down INSERT, UPDATE, DELETE operations
  • Indexes consume disk space
  • Rule of thumb: Start with few indexes, add based on query patterns

2. Indexing Low-Selectivity Columns

-- ❌ Bad: Only 2 possible values
CREATE INDEX idx_gender ON users(gender);

-- ✅ Good: Many unique values
CREATE INDEX idx_email ON users(email);

3. Ignoring Index Selectivity

-- Check selectivity (higher is better)
SELECT 
  COUNT(DISTINCT email)::float / COUNT(*) as selectivity
FROM orders;
-- Result: 1.0 = perfect (all unique)
-- Result: 0.01 = poor (only 1% unique)
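The same check is easy to run in application code over a sample of values; selectivity here is a hypothetical helper mirroring the SQL above:

```javascript
// Selectivity = distinct values / total rows (mirrors the SQL check above).
function selectivity(values) {
  if (values.length === 0) return 0;
  return new Set(values).size / values.length;
}

const emails = ['a@x.com', 'b@x.com', 'c@x.com', 'd@x.com'];
const statuses = ['active', 'active', 'shipped', 'active'];
console.log(selectivity(emails));   // 1.0 -> great index candidate
console.log(selectivity(statuses)); // 0.5 -> poor candidate
```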

4. Not Using EXPLAIN for Query Analysis

Always test with EXPLAIN ANALYZE to verify your index is being used!


Running the Demo Project

Quick Start

# 1. Clone and setup
cd database-indexing
npm install

# 2. Start databases
docker-compose up -d

# 3. Wait 10 seconds, then seed data
npm run seed

# 4. Start server
npm run dev

Test Endpoints

# Search without index (slow)
curl http://localhost:3000/api/non-indexed/orders/email/user50000@example.com

# Search with index (fast)
curl http://localhost:3000/api/indexed/orders/email/user50000@example.com

# Compare execution plans
curl http://localhost:3000/api/non-indexed/explain/email/user50000@example.com
curl http://localhost:3000/api/indexed/explain/email/user50000@example.com

Key Takeaways

  1. Indexes are essential for production databases - They can improve query performance by 10-1000x

  2. Index strategically, not excessively - Focus on columns in WHERE, JOIN, and ORDER BY clauses

  3. Monitor and maintain - Use PostgreSQL statistics to identify slow queries and unused indexes

  4. Always test - Use EXPLAIN ANALYZE to verify your indexes are being used effectively

  5. Consider trade-offs - Indexes speed up reads but slow down writes—balance based on your use case


Real-World Applications

This knowledge applies to:

  • E-commerce platforms: Product searches, order lookups
  • Social media: User profiles, friend lookups
  • Analytics dashboards: Time-series data, aggregations
  • API backends: Any database-backed REST/GraphQL API

Conclusion

Database indexing is one of the most powerful performance optimization techniques at your disposal. The difference between a sequential scan and an index scan can mean the difference between a snappy user experience and frustrated users abandoning your application.

The best part? In most cases, it’s just one line of SQL:

CREATE INDEX idx_your_column ON your_table(your_column);

Start small, measure the impact, and watch your queries fly! 🚀


Questions or suggestions? Feel free to reach out or leave a comment below. Happy indexing!