AI Co-Analyst - Live Multi-Agent App. Cost, quality, reliability - what works? what doesn't?
Published: March 2, 2025
Sonnet-3.7 the best, Deepseek 2nd, Gemini excellent. Try it at tigzig.com (open source)
Top Line
As an AI Co-Analyst LLM, Sonnet-3.7 is my top choice for deep, incisive analysis support. Loving Gemini-2.0-Flash for its balance of quality, reliability, and cost... and it's the fastest. Deepseek-R1 is close to Sonnet in quality but less reliable. o3-mini is the lowest cost but not too great.
Take it for a spin
- Go to tigzig.com → Click 'Sample' to auto-upload a sample file into a temporary Postgres database. Choose your advanced analyst agent: Gemini/Sonnet/R1/o3-mini. Use the sample prompt or modify it
- No login, database creds, or API keys needed
- Option: connect your own database...or upload your own files
Agent Setup → Flowise AI
Sequential Agents (LangGraph). A router agent sends regular queries to a general analyst agent and complex queries down the advanced analysis route: Reasoning LLM → analysis plan + SQL queries → execution agent (gpt-4o), which reviews, corrects, executes, and debugs before delivering results
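The routing flow above can be sketched in a few lines. This is a minimal illustration, not the actual Flowise/LangGraph configuration: `call_llm()`, the keyword-based router heuristic, and the prompts are all hypothetical stand-ins.

```python
# Sketch of the router -> analyst pipeline described above.
# call_llm() and the routing heuristic are hypothetical placeholders,
# not the real Flowise/LangGraph nodes.

def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return f"[{model} response to: {prompt[:40]}...]"

def route(query: str) -> str:
    """Toy router: send 'complex' analytical queries to the reasoning path."""
    complex_markers = ("score", "rank", "segment", "derive", "model")
    return "advanced" if any(m in query.lower() for m in complex_markers) else "regular"

def answer(query: str, reasoning_model: str = "sonnet-3.7") -> str:
    if route(query) == "regular":
        return call_llm("gpt-4o-mini", query)          # general analyst agent
    plan = call_llm(reasoning_model, query)            # analysis plan + SQL queries
    # Execution agent reviews, corrects, and runs the plan before replying.
    return call_llm("gpt-4o", f"Review, correct and execute:\n{plan}")

print(answer("Rank banks by credit card spend"))
```

In the live app the router is itself an LLM node rather than a keyword match; the sketch only shows the two-path shape of the graph.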
Quality
My (judgmental) ranking: reasoning & analysis
- Sonnet → best by far. Brilliant derived variables & approach. Score: 100 (baseline). Sometimes too deep for 4o to execute, but superb for iterative analysis
- R1 → close to Sonnet → 95
- Gemini → excellent → 85
- o3-mini → hmmm... → 50
CPQ (Cost per Query)
Reasoning-based analysis (breakdown in comments)
- o3-mini: ~8.5c
- Gemini: ~11c
- R1: ~13.5c
- Sonnet: ~20.5c
Variance: up to ±50% on the same query... models are evolving and variances are coming down.
Latencies: mostly 1-4 min, sometimes 10+ min... time of day matters (peak vs. off-peak). Gemini is the fastest.
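A per-query cost like the cents figures above can be roughed out from token counts, though as noted later in the post, actual billing is the only reliable ground truth. The prices and token counts below are illustrative placeholders, not any provider's real rates.

```python
# Rough per-query cost estimator. Prices are illustrative placeholders
# (USD per 1M tokens), NOT actual provider rates -- check real billing.

PRICE_PER_M = {                      # model -> (input, output) USD per 1M tokens
    "example-reasoning-llm": (3.00, 15.00),
    "example-execution-llm": (2.50, 10.00),
}

def query_cost_cents(usage: dict) -> float:
    """usage maps model name -> (input_tokens, output_tokens)."""
    total_usd = 0.0
    for model, (tok_in, tok_out) in usage.items():
        p_in, p_out = PRICE_PER_M[model]
        total_usd += tok_in / 1e6 * p_in + tok_out / 1e6 * p_out
    return round(total_usd * 100, 2)  # dollars -> cents

# One "query" may span 1-3 iterations; sum the token usage across them.
print(query_cost_cents({
    "example-reasoning-llm": (8_000, 4_000),
    "example-execution-llm": (12_000, 3_000),
}))  # ~14.4 cents under these made-up rates
```

The point of the exercise: output tokens dominate for reasoning models, so the same prompt can land on very different bills depending on how long the model "thinks", which is one source of the ±50% variance.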
CPQ → Regular Queries
- 4o-mini: ~0.10c
- 4o: ~1.5c
4o-mini is the workhorse; 4o when it stumbles... Gemini may take over
Variance: ±20% → stable in live deployments
Latencies: 15 sec to 3 min depending on query complexity and time of day.
Reliability
- o3-mini & Sonnet → high reliability; negligible API failures
- Gemini → high nowadays... but I'd like to watch it for some time
- R1 → low; API failures & latency spikes. Improving, likely temporary. Alternate hosting options are available.
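A common way to live with an unreliable but high-quality API is retry-then-fallback: retry the primary model on transient failures, then degrade to a more reliable one. A minimal sketch, assuming hypothetical `call_llm()` and model names (the post doesn't describe its exact failover logic):

```python
import time

# Retry-then-fallback sketch. call_llm() and the model names are
# hypothetical stand-ins, not the app's actual failover code.

def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a real LLM API call (may raise on API failure)."""
    return f"[{model}] ok"

def robust_call(prompt: str,
                primary: str = "deepseek-r1",
                fallback: str = "gemini-2.0-flash",
                retries: int = 2,
                backoff_s: float = 2.0) -> str:
    for attempt in range(retries + 1):
        try:
            return call_llm(primary, prompt)
        except Exception:                      # API failure / timeout
            time.sleep(backoff_s * (attempt + 1))  # simple linear backoff
    return call_llm(fallback, prompt)          # degrade gracefully

print(robust_call("Build a scoring plan for credit card customers"))
```

In production you would catch only transient error types (timeouts, 429s, 5xx) and let hard errors surface immediately.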
Demoed Example
- Scoring & Ranking of Indian Banks - Credit Card Segment
- Data Mart & Profile Summary for 1M customers + 10M transactions.
SQL Errors / API Failures / Data Validations?
See the detailed video guide for live debugging / error catching: YouTube Video
Source Codes, Architecture & Build Guide
5 repos + 7 Flowise schemas + video build guide. Main repo: GitHub
Caveats & Assumptions
Lots of them...plus tips...check comments...
Caveats, Assumptions & Tips
- Reasoning estimates: based on ~100 queries across 4 reasoning agents (1-3 iterations per request; 1 iteration = 1 query).
- Regular queries: Based on months of live usage (API calls, automation, web scraping, NL-to-SQL via custom UIs).
- Use case-specific: Estimates apply to queries demoed in the video.
- High variability for the same query: expected to come down as LLMs stabilize.
- Critical to estimate costs for your own use case.
- Check actual billing → pen-and-paper token math is unreliable.
- Time-based variability → example: R1 costs were very high a few weeks ago but are now more reasonable, even though rack-rate pricing is unchanged. Be mindful.
- Prototype app - live working prototype.
CPQ breakdown: reasoning & analysis
- o3-mini: ~8.5c (reasoning + execution)
- gemini-2.0-flash: ~11c (reasoning = free tier, execution = 11c). Paid tier is cheaper than gpt-4o-mini (0.10c additional).
- r1: ~13.5c (reasoning = 4c, execution = 9.5c)
- sonnet-3.7: ~20.5c (planning = 11.5c, execution = 9c)
CPQ: regular queries
- gpt-4o-mini → ~0.10c (my workhorse: solid performance, solid pricing)
- gpt-4o → ~1.5c (I shift to gpt-4o if gpt-4o-mini stumbles)
- sonnet → with 3.5, I used to get ~2.5c. With 3.7, costs are now much higher despite the same token pricing; likely a temporary issue.
Workhorse LLM: 4o-mini default; 4o when it stumbles. Flash2 may take over: better performance, quality, and cost, with improved reliability over last year's Gemini.
Detailed Video Guide
Demo, build guide, architecture, API call flows, error catching, repo walkthroughs and more.
GitHub Repos & Schemas
Main Repo: GitHub
With step-by-step build guide & links to other repos
Agent Schemas - Flowise
In the docs folder of the Main Repo