# DATS-4 Database AI Suite

Connect to any PostgreSQL or MySQL database, analyze CSV/TXT files up to 1.5GB, and run multi-agent AI models for advanced analytics with charts and PDF reports.

## Links
- App: https://rexdb.tigzig.com
- Docs: https://tigzig.com/app-documentation/analyzer.html
- GitHub (Backend): https://github.com/amararun/shared-fastapi-rex-db-coolify
- GitHub (Frontend): https://github.com/amararun/shared-rexdb-auth-embed-v3-agentflowv2

## Tags
database-ai, text-to-sql, fastapi, postgresql, mysql, flowise, multi-agent

## Architecture

The full app has 7 major components:

1. Main App (React SPA) - UI with file uploads, database connection, interactive tables
2. FastAPI Server: Database Connector - Text-to-SQL processing, file uploads, SQL execution
3. FastAPI Server: Neon DB Creation - Temporary PostgreSQL database provisioning
4. LLM Agent: Sequential Agent Framework built with Flowise AI
5. Proxy Server: API calls to OpenAI / Gemini / OpenRouter
6. MCP Server: Markdown to PDF conversion
7. Quant Analyst: TIGZIG Quants Agent integrated into a single tab

```
Frontend (React) → Flowise Sequential Agent → FastAPI Backend → PostgreSQL/MySQL
                                            → Proxy Server → OpenAI/Gemini/OpenRouter
                                            → MCP Server → PDF Reports
```

### GitHub Repositories
- Frontend: https://github.com/amararun/shared-rexdb-auth-embed-v3-agentflowv2
- FastAPI Backend (DB Connect): https://github.com/amararun/shared-fastapi-rex-db-coolify
- FastAPI Backend (Neon DB): https://github.com/amararun/shared-rexdb-fastapi-neon
- LLM Proxy Server: https://github.com/amararun/shared-rtWebrtc-fastAPI-ephemeral
- MCP Markdown to PDF: https://github.com/amararun/shared-mcp-markdown-to-pdf
- Flowise Agent Schemas: In docs folder of Frontend repo

## Backend Technical Details

FastAPI server for connecting LLMs/AI tools to PostgreSQL and MySQL databases.

### API Endpoints

**SQL Query on Fixed Database**
- `GET /sqlquery/` - params: `sqlquery`, `cloud` (azure|aws|neon|filessio)

**SQL Query on Custom Database**
- `GET /connect-db/` - params: `host`, `database`, `user`, `password`, `port`, `db_type` (mysql|postgresql), `sqlquery`

**File Upload to PostgreSQL**
- `POST /upload-file-llm-pg/` - Upload CSV/gzip to fixed Neon PostgreSQL
- `POST /upload-file-custom-db-pg/` - Upload to custom PostgreSQL (params: host, database, user, password, port, schema)

**File Upload to MySQL**
- `POST /upload-file-llm-mysql/` - Upload CSV/gzip to fixed MySQL
- `POST /upload-file-custom-db-mysql/` - Upload to custom MySQL

**Export Table**
- `GET /connect-db-export/` - Stream table contents as text file
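As a sketch of how a client might assemble requests to the two query endpoints, the query strings can be built with the standard library. The base URL below is a placeholder, and the helper names are illustrative, not part of the backend:

```python
from urllib.parse import urlencode

# Hypothetical base URL -- substitute your own deployment of the
# FastAPI backend (this is not an official endpoint).
BASE_URL = "https://your-fastapi-server.example.com"

def build_sqlquery_url(sqlquery: str, cloud: str = "neon") -> str:
    """GET /sqlquery/ -- run SQL against one of the fixed databases."""
    return f"{BASE_URL}/sqlquery/?{urlencode({'sqlquery': sqlquery, 'cloud': cloud})}"

def build_connect_db_url(host, database, user, password, sqlquery,
                         port=5432, db_type="postgresql"):
    """GET /connect-db/ -- run SQL against a user-supplied database."""
    params = urlencode({
        "host": host, "database": database, "user": user,
        "password": password, "port": port,
        "db_type": db_type, "sqlquery": sqlquery,
    })
    return f"{BASE_URL}/connect-db/?{params}"

url = build_connect_db_url("db.example.com", "sales", "analyst", "secret",
                           "SELECT COUNT(*) FROM orders")
```

Note that credentials travel as query parameters here, so requests should only ever be sent over HTTPS.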

### File Upload Processing
- Size threshold: 100MB (files at or below this are processed in memory; larger files stream to disk)
- Disk write chunk size: 32MB
- Gzip files: streaming decompression (32MB chunks)
- Delimiter auto-detection: tab > comma > pipe
- Column name sanitization for SQL compatibility
- PostgreSQL bulk insert: COPY command via copy_expert()
- MySQL bulk insert: Polars scan_csv() + collect_chunks(), 100k rows per chunk, falls back to Pandas with on_bad_lines='skip'
- MySQL optimizations: disables foreign_key_checks, unique_checks, autocommit during bulk insert
- Tested up to 1.6GB compressed files
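The delimiter precedence and column-name sanitization above can be sketched as follows. The function names are hypothetical, not the backend's actual identifiers, and the comma fallback is an assumption:

```python
import re

def detect_delimiter(header_line: str) -> str:
    """Pick the delimiter by precedence: tab > comma > pipe."""
    for delim in ("\t", ",", "|"):
        if delim in header_line:
            return delim
    return ","  # assumption: fall back to comma if nothing matches

def sanitize_column(name: str) -> str:
    """Make a header SQL-safe: collapse non-word characters to underscores."""
    name = re.sub(r"\W+", "_", name.strip()).strip("_").lower()
    # Most SQL dialects reject identifiers that start with a digit.
    return f"col_{name}" if name and name[0].isdigit() else (name or "unnamed")

header = "Order ID,Amount ($),2024 Flag"
cols = [sanitize_column(c) for c in header.split(detect_delimiter(header))]
# e.g. ['order_id', 'amount', 'col_2024_flag']
```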

### Connection Pooling
PostgresPoolManager maintains connection pools keyed by `host:port:database:user`. Stale connections are detected with a `SELECT 1` probe, and reconnection is retried up to 3 times. An `asyncio.Lock()` guards pool access for concurrency safety.
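The pooling behavior can be sketched as an asyncio skeleton. This is illustrative only, not the backend's actual `PostgresPoolManager`; `connect_fn` stands in for a real async wrapper around `psycopg2.connect` so the key/probe/retry structure can be shown without a live database:

```python
import asyncio

class PoolManagerSketch:
    """Caches one connection per (host, port, database, user) key,
    validates it with SELECT 1, and reconnects up to `retries` times."""

    def __init__(self, connect_fn, retries=3):
        self._connect = connect_fn
        self._retries = retries
        self._pools = {}                 # key -> cached connection
        self._lock = asyncio.Lock()      # serialize pool access

    async def get(self, host, port, database, user):
        key = f"{host}:{port}:{database}:{user}"
        async with self._lock:
            conn = self._pools.get(key)
            for _ in range(self._retries):
                if conn is not None and await self._alive(conn):
                    break
                conn = await self._connect(host, port, database, user)
            self._pools[key] = conn
            return conn

    @staticmethod
    async def _alive(conn):
        try:
            await conn.execute("SELECT 1")   # stale-connection probe
            return True
        except Exception:
            return False
```

Keying by the full connection tuple means two users of the same database get separate pools, which keeps credentials isolated.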

### Timeouts
- Connection: 30 seconds
- Statement (PostgreSQL): 15 minutes
- Export queries: 10 minutes
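These limits are typically applied at connect time. A sketch of how they could map to psycopg2-style keyword arguments (illustrative; the backend's exact call may differ):

```python
# Map the timeouts above to psycopg2-style connect arguments.
CONNECT_KWARGS = {
    "connect_timeout": 30,                    # connection timeout: 30 s
    # PostgreSQL statement_timeout is in milliseconds: 15 min = 900000 ms
    "options": "-c statement_timeout=900000",
}
# usage sketch: psycopg2.connect(host=..., dbname=..., **CONNECT_KWARGS)
```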

### Environment Variables
- `RT_ENDPOINT` - URL of proxy server for OpenAI/LLM API calls (required for schema detection)
- `NEON_HOST/DATABASE/USER/PASSWORD` - PostgreSQL connection
- `FILESSIO_HOST/DATABASE/USER/PASSWORD/PORT` - MySQL connection
- `AWS_DATABASE_NAME/HOST/USER/PASSWORD` - MySQL connection
- `AZURE_DATABASE_NAME/HOST/USER/PASSWORD` - MySQL connection
- `RATE_LIMIT` - Default: 300/hour
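A minimal `.env` sketch using the variable names above; all values are placeholders, and only the Neon and filess.io groups are shown:

```bash
RT_ENDPOINT=https://your-proxy-server.example.com
NEON_HOST=your-project.neon.tech
NEON_DATABASE=neondb
NEON_USER=app_user
NEON_PASSWORD=change-me
FILESSIO_HOST=mysql.filess.io
FILESSIO_DATABASE=appdb
FILESSIO_USER=app_user
FILESSIO_PASSWORD=change-me
FILESSIO_PORT=3306
RATE_LIMIT=300/hour
```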

### Dependencies
fastapi, uvicorn, psycopg2-binary, mysql-connector-python, polars, pandas, python-multipart, slowapi, aiohttp

### Custom GPT Integration
Use the docs/gptJson.json schema for Custom GPT actions, replacing the server URL with your own deployment's.

### Security
- CORS whitelist configured in app.py
- IP-based rate limiting via slowapi
- SSL required for Neon connections (sslmode='require')

## Setup and Run

```bash
pip install -r requirements.txt
cp .env.example .env  # Edit with your credentials

# Production
uvicorn app:app --host 0.0.0.0 --port $PORT

# Development
python app.py  # Runs on localhost:8000
```

## User Guide

### Interface Tabs
- AI Data Structure: AI interpretation of file structure with PDF report
- AI Quick Insights: Quick analysis using 100 rows with PDF report
- AI Quants Analyst: Portfolio and quants reporting agent with database and Python capabilities
- AI Database Analyst: Main agent - sequential agent with multi-step reasoning, execution, error handling

### Menu Options
- Sample: Quick start with curated sample datasets
- Choose File: Upload CSV/TXT files
- Fast Insights → Table: Load files into interactive tables
- Fast Insights → Structure: AI analysis of file structure
- Fast Insights → Analysis: AI analysis using 100 rows
- Connect to DB: Connect to your warehouse using credentials
- Upload File to DB: Upload files to connected warehouse (temporary DB available without login)
- Export Data: Export from connected warehouse
- Create DB: Create temporary Postgres database (requires login)
- Model: Select AI model and agent

### Capabilities
- BYOW (Bring Your Own Warehouse): connect to MySQL/PostgreSQL; queries run in your warehouse, so table sizes are limited only by the warehouse itself (petabyte-scale)
- Natural language to SQL querying with sequential reasoning
- Interactive tables with sort, filter, search, column statistics
- File analysis: CSV/TXT up to 1.5GB (tested with a 600MB, 5-million-row file)
- Python-based statistical analysis and chart generation
- PDF report generation

### Video Guide
https://www.youtube.com/watch?v=hqn3zrdXVSQ (recorded for REX-3/Version 3, but roughly 80% applies to DATS-4)
