From 12 second queries to under 1s: Optimizing a 230 Million Row Dashboard - 14 Bottlenecks I Had to Fix

Published: February 10, 2026

Cine Pro Timeline

The first version was taking 9 to 12 seconds. I would click, wait, wonder if something was broken. The database had 230 million rows across multiple tables. The largest single table: 91 million records. Total size: 16GB.

This is the story of getting that down to under 3 seconds - most pages under a second.

I built this with Claude Code. The process was methodical: add a timer to each component, measure page load box by box, then tackle the slowest one. Claude Code would run investigations - writing test scripts, timing queries with curl and Python, tracing execution paths through frontend and backend. It would come back with a diagnosis: "This query is scanning 12 million rows because of a JOIN. Here's why the index isn't being used."

One problem at a time. Measure, diagnose, fix, measure again.

The app now runs on two backends simultaneously - Hetzner in Germany and Oracle Cloud in the US - with a frontend toggle to switch between them. Users can compare latency from different regions. Both backends serve the same 16GB DuckDB database with identical optimizations.

Here's everything I learned getting load times from 9 to 12 seconds down to 0.5 to 2.5 seconds.


Section 1: Pre-Compute Everything

The single biggest performance lever. Four separate bottlenecks, all solved by the same idea: do the expensive work once - create flat tables with pre-aggregations, add indexes, query the result.

Challenge 1: The JOIN That Killed Performance

The Deep Profile page shows a person's complete filmography - every movie, TV show, and short they've worked on. For someone like Morgan Freeman, that's 200+ titles with ratings, vote counts, and genres.

The original query looked reasonable:

SELECT tb.primaryTitle, tb.startYear, tr.averageRating
FROM title_principals tp
JOIN title_basics tb ON tp.tconst = tb.tconst
LEFT JOIN title_ratings tr ON tb.tconst = tr.tconst
WHERE tp.nconst = 'nm0000151'

Time: 1.6 seconds. Every single time.

The problem: DuckDB wasn't using the index on title_basics for the JOIN. It was doing a full sequential scan of 12 million rows to find the ~200 matching titles. The filter on title_principals was fast (indexed), but the JOIN to title_basics killed performance.

The fix: Pre-compute the JOIN once, store it as a denormalized table:

CREATE TABLE person_filmography AS
SELECT tp.nconst, tp.tconst, tp.category,
       tb.primaryTitle, tb.startYear, tb.titleType,
       tr.averageRating, tr.numVotes
FROM title_principals tp
JOIN title_basics tb ON tp.tconst = tb.tconst
LEFT JOIN title_ratings tr ON tb.tconst = tr.tconst

Result: 91 million rows, ~1.8GB storage. But now every person lookup is a simple indexed filter - no JOINs at query time.

Before: 1.6 seconds (JOIN with full table scan)
After: 0.5 seconds (indexed filter on denormalized table)
Improvement: 3.2x faster

The principle: if you're repeatedly JOINing the same tables for the same access pattern, create the JOIN once and store the result in an intermediate table.

Challenge 2: 15 HTTP Requests for a Static Page

The main dashboard shows database statistics, rating distributions, top movies by genre, hidden gems. None of this data changes - it's aggregate stats from a dataset that updates monthly at most.

But the page was making 15 parallel API calls on every load. Each request: DNS lookup, TLS handshake, routing through Vercel serverless, hitting the backend, returning JSON. Even with backend caching, the network overhead alone was 2-3 seconds.

I watched the Network tab: 15 requests, 15 responses, all returning small JSON payloads. Total data: maybe 100 rows across all queries. Total time: 3 seconds.

The page is static. Every user sees the same thing. Why query 15 times?

Fix: Store the entire dashboard response as a single JSON blob in DuckDB. One table, one row, one query returns everything. The JSON gets regenerated when the data pipeline runs (weekly).
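The regeneration step is simple to sketch. This is a minimal Python illustration, assuming the weekly pipeline already holds the results of the old 15 queries in memory - the function and section names here are hypothetical, not the production code:

```python
import json

def build_dashboard_blob(section_results: dict) -> str:
    # Collapse what used to be 15 separate API responses into one payload.
    # section_results maps a section name ("top_by_genre", "hidden_gems", ...)
    # to rows the pipeline has already fetched from DuckDB.
    return json.dumps({"sections": section_results})

# The pipeline writes this string into a one-row table; the dashboard
# endpoint returns it verbatim, so a single request serves the whole page.
blob = build_dashboard_blob({"stats": {"total_titles": 11_000_000}})
```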

Before: 15 HTTP requests, ~3 seconds
After: 1 HTTP request, ~0.65 seconds
Improvement: 4.6x faster

Challenge 3: In-Memory Query Cache - Why Most Queries Never Hit DuckDB

All the pre-computation above makes individual queries fast. But the fastest query is the one you don't run at all.

The backend has an in-memory cache sitting in front of DuckDB using cachetools (Python library). Every query result gets cached by its SQL hash. Same query comes in again - return the cached result, skip DuckDB entirely.

Most dashboard usage is repetitive. People search for popular actors. The same filmography queries get run over and over. After the first user looks up Morgan Freeman, every subsequent lookup is a cache hit - zero DuckDB work.

Cache resets on server restart since it's in-memory, not persisted. Cold starts are slower, but the cache warms up quickly with normal traffic. Config is 2000 entries max per database file with a 30-day TTL. Enough to cover the popular queries without eating too much RAM.
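The shape of that cache is easy to show. The real backend uses cachetools' TTLCache; here's a stdlib-only sketch of the same idea (the class and variable names are illustrative):

```python
import hashlib
import time

class QueryCache:
    """Stand-in for the app's cachetools.TTLCache(maxsize=2000, ttl=30*24*3600)."""

    def __init__(self, maxsize: int = 2000, ttl: float = 30 * 24 * 3600):
        self.maxsize, self.ttl = maxsize, ttl
        self._store = {}  # sql_hash -> (expires_at, rows)

    def _key(self, sql: str) -> str:
        # The cache key is a hash of the SQL text itself.
        return hashlib.sha256(sql.encode()).hexdigest()

    def get(self, sql: str):
        entry = self._store.get(self._key(sql))
        if entry and entry[0] > time.monotonic():
            return entry[1]
        return None  # miss or expired

    def put(self, sql: str, rows) -> None:
        if len(self._store) >= self.maxsize:
            self._store.pop(next(iter(self._store)))  # evict the oldest entry
        self._store[self._key(sql)] = (time.monotonic() + self.ttl, rows)

cache = QueryCache()
sql = "SELECT * FROM person_filmography WHERE nconst = 'nm0000151'"
if cache.get(sql) is None:
    rows = [("The Shawshank Redemption", 1994)]  # would come from DuckDB
    cache.put(sql, rows)
hit = cache.get(sql)  # every later identical query skips DuckDB entirely
```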

Challenge 4: Same Principle, Applied Three More Times

Once I saw pre-computation working, I applied it everywhere the pattern fit:

Leaderboards: The "Most Prolific" section shows top actors, actresses, and directors by era - six eras, three categories. Each combination was scanning 97 million rows, taking 2-5 seconds per query. Pre-computed all combinations into a summary table. Before: 2-5 seconds. After: ~50ms. Improvement: 40-100x faster.

Person stats: The stats header (total titles, average rating, career span) was running 6 parallel aggregation queries per person, adding up to ~300ms. Created a person_stats table with one row per person. Before: ~300ms. After: ~5ms. Improvement: 60x faster.

Search ranking: Search for "fost" and Adam Foster appeared before Jodie Foster - alphabetical order, ignoring that Jodie has 700+ credits. Used the pre-computed person_stats table to order results by total_titles. Famous people show up first. And because the stats are pre-computed, the ORDER BY adds ~0.1ms per result instead of ~5-10ms with runtime aggregation.
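The leaderboard pre-computation above can be sketched as a pipeline step that materializes every era/category combination up front. Table names, column names, and era boundaries here are assumptions for illustration (the app uses six eras; three are shown):

```python
# Hypothetical era boundaries and summary-table schema, not the production ones.
ERAS = {"1960s-70s": (1960, 1979), "1980s-90s": (1980, 1999), "2000s+": (2000, 2025)}
CATEGORIES = ("actor", "actress", "director")

def leaderboard_sql(era: str, category: str, top_n: int = 50) -> str:
    start, end = ERAS[era]
    # Run once per (era, category) in the data pipeline; the dashboard then
    # reads leaderboard_summary with a cheap indexed filter instead of
    # aggregating tens of millions of rows at request time.
    return (
        f"INSERT INTO leaderboard_summary "
        f"SELECT '{era}' AS era, '{category}' AS category, nconst, "
        f"COUNT(*) AS credits "
        f"FROM person_filmography "
        f"WHERE category = '{category}' AND startYear BETWEEN {start} AND {end} "
        f"GROUP BY nconst ORDER BY credits DESC LIMIT {top_n}"
    )

statements = [leaderboard_sql(e, c) for e in ERAS for c in CATEGORIES]  # 9 combos
```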

Storage is cheap. User patience is not.


Section 2: Query Patterns That Backfire

Challenge 5: Searching 15 Million People for Every Keystroke

The person search queries name_basics - 15 million people. Most are extras, background actors, or people with a single credit. Type "John" and the database scans all 15 million rows. About 2 seconds per search.

But most users search for known actors and directors. Out of 15 million people, only 2.3 million have 3 or more credits. That smaller set covers nearly all searches anyone will actually run.

Fix: Created a prolific_people table - just people with 3+ credits. Added a "Smart Search" toggle, on by default. Searches scan 2.3 million rows instead of 15 million.

Before: ~2 seconds (scanning 15M rows)
After: ~200ms (scanning 2.3M rows, 85% less data)

If someone doesn't show up, uncheck Smart Search to search the full database. In practice, nearly all searches are covered by the smaller set.
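The toggle boils down to choosing which table to scan. A minimal sketch of that routing, assuming hypothetical table and column names close to the IMDb schema:

```python
def search_sql(smart_search: bool) -> str:
    # Smart Search (on by default) hits the 2.3M-row prolific subset;
    # unchecking it falls back to the full 15M-row name_basics table.
    table = "prolific_people" if smart_search else "name_basics"
    return (
        f"SELECT nconst, primaryName FROM {table} "
        "WHERE primaryName ILIKE ? LIMIT 20"
    )
```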

Challenge 6: The ORDER BY That Killed My Index

After all the pre-computation work, one query was still slow: the filmography list. EXPLAIN ANALYZE showed the problem:

TABLE_SCAN - Type: Sequential Scan
Rows scanned: 91,479,013

91 million rows scanned. But I had an index on nconst. Why wasn't DuckDB using it?

The query:

SELECT * FROM person_filmography
WHERE nconst = 'nm0000151'
ORDER BY startYear DESC
LIMIT 500

The problem: DuckDB's ART indexes have a critical limitation. They only work for single-column equality/IN conditions. The moment you add ORDER BY on a different column, the optimizer decides a full table scan is "cheaper" than index lookup + sort. Remove the ORDER BY, and the EXPLAIN output changes to Index Scan.

The fix: Remove ORDER BY from SQL, sort in JavaScript instead.

filmography.sort((a, b) => (b.year || 0) - (a.year || 0))

Sorting 500 objects in JavaScript: less than 1ms. Saved by removing ORDER BY: 1100ms.

Before: 2.0 seconds (Sequential Scan because of ORDER BY)
After: 0.9 seconds (Index Scan, client-side sort)
Improvement: 55% faster

Takeaway: When DuckDB isn't using your index, check for ORDER BY. Remove it and sort client-side if the result set is small.

Challenge 7: Adaptive Query - Check the Count Before You Query

Challenge 6 removed ORDER BY to get Index Scan. But that created a new problem.

LIMIT without ORDER BY doesn't return the first N or the last N rows. It returns any N - whatever DuckDB finds first in storage order. For someone with 200 titles and a LIMIT of 5000, we get all 200. Client-side sort works fine. But for someone with 39,000 titles, LIMIT 5000 returns an arbitrary 5000. Client-side sort then orders this random sample by year. Completely wrong data.

The fix is an adaptive query. Before running the filmography query, make a separate API call to check how many titles this person has. One row from person_stats. The database lookup is instant, but this is a full round-trip through the network (frontend to Vercel to backend and back), so it costs a few hundred milliseconds.

If they have 5000 or fewer titles - download everything. No ORDER BY needed because LIMIT doesn't cut anything off. This is the fast path: Index Scan, ~0.9 seconds.

If they have more than 5000 titles - add ORDER BY to get the most recent 5000. This forces the slower Sequential Scan (~2 seconds), but we need the right data, not random data.

99.98% of people have 5000 titles or fewer - they get the fast path. Only 513 people in the entire database have more than 5000. The most prolific is Ekta Kapoor with 39,000 entries. The count check adds a network round-trip, but it saves over a second on the main query by keeping the Index Scan for nearly everyone.

An extra API call to check the count costs a few hundred milliseconds, but keeping the Index Scan saves over a second on the filmography query for 99.98% of lookups.
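The branch itself is tiny. A sketch of the adaptive choice, with the threshold from the text (function and column names are illustrative):

```python
FAST_PATH_LIMIT = 5000

def filmography_sql(title_count: int) -> str:
    base = "SELECT * FROM person_filmography WHERE nconst = ?"
    if title_count <= FAST_PATH_LIMIT:
        # Fast path (99.98% of people): no ORDER BY, so DuckDB keeps the
        # Index Scan, and LIMIT never cuts anything off - client-side
        # sorting of the full result is safe.
        return f"{base} LIMIT {FAST_PATH_LIMIT}"
    # Rare path: ORDER BY forces a Sequential Scan but guarantees we get
    # the most recent titles instead of an arbitrary 5000.
    return f"{base} ORDER BY startYear DESC LIMIT {FAST_PATH_LIMIT}"
```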

Challenge 8: EXISTS vs CTE - A 15x Performance Gap

The Discover page lets users filter movies by person - "show all Morgan Freeman movies." The original query used EXISTS:

SELECT tb.*, tr.*
FROM title_basics tb
JOIN title_ratings tr ON tb.tconst = tr.tconst
WHERE EXISTS (
    SELECT 1 FROM title_principals tp
    WHERE tp.tconst = tb.tconst AND tp.nconst = 'nm0000151'
)

Time: 2-3 seconds. The EXISTS subquery runs for every row being evaluated - millions of index lookups.

The fix: CTE pattern. Get the person's titles first (fast indexed lookup), then JOIN to that small set:

WITH person_titles AS (
    SELECT DISTINCT tconst
    FROM person_filmography
    WHERE nconst = 'nm0000151'
)
SELECT tb.*, tr.*
FROM person_titles pt
JOIN title_basics tb ON pt.tconst = tb.tconst
JOIN title_ratings tr ON tb.tconst = tr.tconst

The CTE returns ~200 titles. The main query joins to this tiny set instead of running EXISTS against millions of rows.

Before: 2-3 seconds (EXISTS subquery on every row)
After: ~200ms (CTE + small join)
Improvement: 10-15x faster

Takeaway: When filtering by a related entity, get that entity's IDs first with a CTE, then JOIN to the small result set.

Challenge 9: The "Check First" Anti-Pattern

The stats section needed to handle two cases: prolific people (in person_filmography table) and non-prolific people (fallback to slower JOINs). The original code ran a check query first - "is this person in the fast table?" - then ran the actual stats queries based on the result.

The filmography appeared instantly. Stats appeared 300ms later. The check query had to complete before the stats queries could start. Sequential bottleneck.

Fix: "Try fast, fallback if empty" pattern. Run all the fast-table queries immediately in parallel. If they come back empty, the person isn't in the fast table - run the fallback queries. For the 95% of lookups that hit the fast table, stats load instantly alongside the filmography. For the 5% that don't, there's a small penalty - fast queries return empty quickly, then fallback queries run.
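The "try fast, fallback if empty" pattern can be sketched with asyncio. The query runner is injected here and the SQL is abbreviated - a sketch of the control flow, not the production handler:

```python
import asyncio

# Placeholder for the slower JOIN queries against the base tables.
FALLBACK_SQL = "/* slower JOINs over title_principals + title_basics */"

async def person_stats(nconst: str, run_query) -> dict:
    # No "is this person in the fast table?" check. Fire the fast-table
    # queries immediately, in parallel with the filmography request.
    fast = await asyncio.gather(
        run_query("SELECT * FROM person_stats WHERE nconst = ?", nconst),
        run_query(
            "SELECT category, COUNT(*) AS n FROM person_filmography "
            "WHERE nconst = ? GROUP BY category", nconst),
    )
    if any(fast):
        return {"source": "fast", "rows": fast}
    # Empty results mean the person isn't in the pre-computed tables:
    # only now pay for the slow fallback queries.
    return {"source": "fallback", "rows": await run_query(FALLBACK_SQL, nconst)}
```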

Before: Check query (300ms), then stats queries in parallel
After: All queries in parallel, fallback if needed
Result: Stats appear with filmography, not after

Takeaway: Don't check which path to take. Try the fast path. Fallback if it fails.


Section 3: Stop Querying What You Already Have

Challenge 10: When 5 Database Queries Became Zero

The Top Stats Band shows highest rated titles, most voted titles, titles per decade, and title types. Original implementation: 5 parallel database queries, ~3-4 seconds total.

Then I realized: the filmography tab loads up to 5000 titles for this person. Those 5000 rows contain everything needed for the stats - ratings, votes, years, title types. The data is already in the browser.

Fix: Compute stats from the filmography array in JavaScript. Array.sort on 5000 objects takes less than 5ms. The network round-trip it replaced took 3-4 seconds.

Before: 5 database queries, 3-4 seconds
After: Array operations, less than 5ms
Improvement: ~99% faster

I applied this same pattern across the app. The filmography table has Role and Title Type filters - originally each filter change triggered a new database query (1-2 second wait). Now it's client-side Array.filter on the already-loaded data. Instant. The Compare tab showed side-by-side stats for multiple people - originally 5 database queries when you switched tabs. Now it computes comparison stats from the loaded filmography for the current person. Instant.

The principle: If you've already loaded the source data into the browser, don't make a round-trip to the database for derived views of that same data. Compute them client-side. For datasets under ~10,000 rows, client-side filtering and aggregation is almost always faster than a server round-trip.


Section 4: Infrastructure Realities

Challenge 11: The Container That Thought It Had 24GB of RAM

Oracle Cloud backend was 4x slower than Hetzner. Same code, same database, similar specs. OCI queries: ~4 seconds. Hetzner queries: ~1 second.

Investigated disk latency (ioping showed OCI was 10x slower - red herring, DuckDB caches in RAM). Investigated network (similar). Finally checked memory:

# Container limit
docker inspect <container> | grep Memory
# 8589934592 (8GB)
# What the container sees
cat /proc/meminfo | grep MemTotal
# 24576000 kB (24GB - host RAM!)

The problem: Docker containers share /proc/meminfo with the host. DuckDB reads /proc/meminfo to determine available RAM. It saw 24GB, set memory_limit to 80% = ~19GB, then tried to allocate 19GB in an 8GB container. Massive swapping.

The fix: Explicitly configure DuckDB memory via environment variable:

DUCKDB_MEMORY_LIMIT = os.getenv("DUCKDB_MEMORY_LIMIT", "7GB")
con.execute(f"SET memory_limit = '{DUCKDB_MEMORY_LIMIT}'")

Set to 7GB (leaving 1GB headroom below the 8GB container limit).

Before: 4 seconds (DuckDB thrashing in swap)
After: 1-1.5 seconds (DuckDB fits in container memory)
Improvement: 3-4x faster

Takeaway: When running DuckDB in containers, always explicitly set memory_limit. The container's cgroup limit is invisible to /proc/meminfo.

Challenge 12: The Single Worker Bottleneck

Backend was running uvicorn with default settings - a single worker process. Under load, requests queued behind each other.

Before: 1 worker, requests serialize under load
After: 2 workers, 2x concurrent capacity

Challenge 13: DuckDB Has No Query Timeout

A badly-written query could block a worker forever. A CROSS JOIN between two large tables - 12M x 1.5M = billions of rows - runs until the server runs out of resources.

First attempt: Use DuckDB's timeout setting. SET statement_timeout_ms = 30000. Result: error. DuckDB has no statement_timeout_ms parameter. Unlike PostgreSQL, there's no built-in query timeout.

Fix: Timeout at the Python/asyncio level. Wrap the query execution in asyncio.wait_for with a 30-second timeout. The asyncio timeout doesn't kill the DuckDB query - it just stops waiting and returns a timeout error to the user. The query continues in the background until it finishes, but the thread pool size (8) limits how many runaway queries can accumulate.
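A minimal sketch of that application-level timeout, assuming the blocking DuckDB call is handed to a thread pool (the function names and error shape are illustrative):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

QUERY_TIMEOUT_S = 30
pool = ThreadPoolExecutor(max_workers=8)  # mirrors the app's thread pool of 8

async def run_query(blocking_query, timeout: float = QUERY_TIMEOUT_S):
    # DuckDB calls block, so they run on the thread pool. wait_for only stops
    # *waiting*: the query keeps running in its thread until it finishes, and
    # the pool size caps how many runaway queries can stack up.
    loop = asyncio.get_running_loop()
    try:
        return await asyncio.wait_for(
            loop.run_in_executor(pool, blocking_query), timeout
        )
    except asyncio.TimeoutError:
        return {"error": "query timed out after %ss" % timeout}
```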

Takeaway: When your database doesn't support timeouts, implement them at the application layer.

Challenge 14: The Race Condition in Every Autocomplete

Type "fost" quickly. Results for "fo" arrive. Then "fos" results. Then "fost" results. The dropdown flashes between different result sets as each async response arrives out of order.

The problem: Each keystroke fires an async search. Responses arrive in arbitrary order. Old responses overwrite new ones.

Fix: Sequence number tracking. Every search increments a counter. When results arrive, check if the counter still matches. If a newer search has started, discard the stale results.

const thisSeq = ++searchSeqRef.current
const results = await executeSearch(term)
if (thisSeq !== searchSeqRef.current) return  // Stale, discard
setSearchResults(results)

Cost: zero. One integer increment, one comparison. Result: no more flashing, clean dropdown.

Takeaway: This pattern applies to any async operation where newer requests should supersede older ones.


What's Working Now

The dashboard pages that took 9-12 seconds now load in 0.5 to 2.5 seconds. Filters respond immediately. Search shows famous people first.

Every optimization came from the same process: measure, investigate, diagnose, fix, measure again. Claude Code made this efficient - it could trace execution paths, write test scripts, and propose fixes faster than I could type the problem description.

230 million rows. 14 bottlenecks removed. Sub-second response times. Running on a server that costs a few euros a month. Not because of any single optimization, but because each fix revealed the next bottleneck. The slow query you fix today exposes the medium query that becomes tomorrow's slow query.

Full implementations - every query, every config, every JavaScript pattern - are in the open source repos linked below.


Resources