---
title: "Go from a 200MB flat file with 1.5M records to analysis in minutes with my open-source AI-SQL App"
slug: go-from-a-200mb-flat-file-with-1-5m-records-to-analysis-in-minutes-with-my-open-source-ai-sql-app
date_published: 2025-09-15T11:55:04.690Z
original_url: https://www.tigzig.com/post/go-from-a-200mb-flat-file-with-1-5m-records-to-analysis-in-minutes-with-my-open-source-ai-sql-app
source: migrated
processed_at: 2025-12-03T13:30:00.000Z
---

# Go from a 200MB flat file with 1.5M records to analysis in minutes with my open-source AI-SQL App

20 Yrs ODI Cricket stats - I'm providing the data and tools. Go run it.

## 1. Get the Data

* **Data:** 25 years of ODI cricket data from cricsheet.org processed with Tigzig tools.
* **Format:** Pipe delimited raw TXT file, approx. 200MB, 1.5M records.

[Download from Google Drive](https://drive.google.com/drive/u/1/folders/1VHD9UzeYaJF_dBPecnjucHpoGocf4IvR)
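Before loading anything, it helps to look at the first few rows. Here is a minimal sketch using only the Python standard library. It assumes the file has a header row. The column names in the sample are hypothetical stand-ins, not the actual schema of the download.

```python
import csv
import io
from itertools import islice


def peek_pipe_delimited(handle, n_rows=3):
    """Return (header, first n_rows data rows) from a pipe-delimited text stream."""
    reader = csv.reader(handle, delimiter="|")
    header = next(reader)               # assumes the first line is a header row
    rows = list(islice(reader, n_rows))
    return header, rows


# Tiny in-memory stand-in for the real file; column names are hypothetical.
sample = io.StringIO(
    "match_id|striker|runs_off_bat\n"
    "1001|Player A|4\n"
    "1001|Player B|0\n"
)
header, rows = peek_pipe_delimited(sample)
print(header)
print(rows)
```

For the real file, pass an open file handle instead of the sample, for example `open(path, newline="", encoding="utf-8")`, where `path` points at wherever you saved the download.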

## 2. Get the Free Database

* **Platform:** [neon.com](http://neon.com)
* **Action:** Go to their site. Get a free, live Postgres database in seconds. No credit card required. Copy the credentials. This is your temporary analysis sandbox.

## 3. Load the Data

* **Platform:** [app.tigzig.com](http://app.tigzig.com) → Database Apps → DATS-4
* **Action:** Connect to your Neon DB, then upload the 200MB text file. The app handles the rest. Takes approx. 2 minutes.
  * Menu → Connect to Database
  * Menu → Choose File
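If you would rather load the file yourself, the rough shape of a manual load is a `CREATE TABLE` followed by a Postgres `COPY` with a pipe delimiter. The sketch below only builds the two SQL statements. It is not how DATS-4 does the load internally. The table name and column names are hypothetical, and every column is typed as `TEXT` to keep it simple.

```python
def build_load_sql(table, columns):
    """Build CREATE TABLE and COPY statements for a pipe-delimited file with a header row."""
    def quote(name):
        # Double-quote identifiers and escape any embedded double quotes.
        return '"' + name.replace('"', '""') + '"'

    cols = ", ".join(f"{quote(c)} TEXT" for c in columns)
    create_sql = f"CREATE TABLE IF NOT EXISTS {quote(table)} ({cols});"
    copy_sql = (
        f"COPY {quote(table)} FROM STDIN "
        "WITH (FORMAT csv, DELIMITER '|', HEADER true);"
    )
    return create_sql, copy_sql


# Hypothetical table and column names, for illustration only.
create_sql, copy_sql = build_load_sql(
    "odi_deliveries", ["match_id", "striker", "runs_off_bat"]
)
print(create_sql)
print(copy_sql)
```

With `psycopg2`, the `COPY` statement can be streamed from an open file via `cursor.copy_expert(copy_sql, file_handle)`. Casting the `TEXT` columns to proper types would be a follow-up step.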

## 4. Query with Natural Language

* **Action:** Just type or dictate your question.
* **Example 1:** "Show top 10 batsmen by runs off the bat with chart"
* **Example 2:** "For these, show runs, matches, overs, run rate per match & per over, with chart"
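For a sense of what Example 1 could translate to, here is a sketch of the kind of SQL a Text-to-SQL agent might produce. Assumptions: the table name `odi_deliveries` and the columns `striker` and `runs_off_bat` are hypothetical, and an in-memory SQLite database stands in for Neon Postgres so the snippet runs anywhere. The SQL the app actually generates may differ.

```python
import sqlite3

# In-memory SQLite stands in for the Postgres database; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE odi_deliveries (match_id INTEGER, striker TEXT, runs_off_bat INTEGER)"
)
conn.executemany(
    "INSERT INTO odi_deliveries VALUES (?, ?, ?)",
    [
        (1, "Player A", 4),
        (1, "Player B", 1),
        (1, "Player A", 6),
        (2, "Player B", 2),
        (2, "Player C", 0),
    ],
)

# One row per ball faced, so total runs is a SUM grouped by batsman.
top_batsmen_sql = """
SELECT striker, SUM(runs_off_bat) AS total_runs
FROM odi_deliveries
GROUP BY striker
ORDER BY total_runs DESC, striker
LIMIT 10
"""
top_batsmen = conn.execute(top_batsmen_sql).fetchall()
print(top_batsmen)
```

The same `SELECT` is valid Postgres, so it can be pasted into any SQL client connected to your Neon database once the table and column names are adjusted to the loaded schema.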

## DATS-4

My open-source SQL multi-agent app. It handles Text-to-SQL, Python charting, stats, instant Postgres creation, PDF outputs, and provides 9 reasoning models (Gemini, Claude, DeepSeek, more).

## Practitioner's Warning

This is a public-facing app. All credentials and API calls run through my backend server.

**Rule:** Use this public version for sandbox testing ONLY, with temporary databases and non-sensitive data.

**For Live Use:** Full source code shared. Deploy it on your VPN. Current setup is low-security for open testing; live use must tighten auth and access controls. Basic OAuth module with Auth0 included in source.

## Where it gets messy

This example uses a file I pre-processed for rapid analysis.

**Reality:** It is not click-click and a report appears. It's more like bang-head, bang-head, and then a drop appears.

**The Work:** Real analysis needs data cleaning, semantic layers, pre-computed metrics, marts, and summary tables. AI is a powerful tool, but it doesn't replace solid data engineering, even though I use AI for data engineering too, including the pre-processing of this data.
