AI Co-Analyst - Live Multi-Agent App. Cost, quality, reliability - what works? what doesn't?

Published: March 2, 2025

Sonnet-3.7 the best, Deepseek 2nd, Gemini excellent. Try it 👉 tigzig.com (open source)

Top Line

As an AI Co-Analyst LLM, Sonnet-3.7 is my top choice for deep, incisive analysis support. Loving Gemini-2.0-Flash for its balance of quality, reliability, and cost, and it's the fastest. Deepseek-R1 quality is close to Sonnet but it is less reliable. o3-mini is the lowest cost, but its quality is not great.

Take it for a spin

Go to tigzig.com → Click 'Sample' to auto-upload a sample file into a temporary Postgres database. Choose your advanced analyst agent: Gemini, Sonnet, R1, or o3-mini. Use the sample prompt or modify it.
No login, database creds, or API keys needed.
Option: connect your own database or upload your own files.

Agent Setup ➔ Flowise AI

Sequential Agents (LangGraph). Router agent ➟ regular queries to a general analyst agent and complex queries to an advanced analysis route ➟ Reasoning LLM ➟ analysis plan + SQL queries ➟ execution agent (gpt-4o) reviews, corrects, executes, and debugs before delivering results
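The routing flow above can be sketched in plain Python. This is a minimal illustration, not the Flowise/LangGraph schema from the repo: `QueryState`, `run_pipeline`, and every stub below are hypothetical names, and the keyword check stands in for the router agent, which in the live app is an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QueryState:
    """State handed from one step to the next (hypothetical structure)."""
    question: str
    route: str = ""
    plan: str = ""
    result: str = ""
    trace: List[str] = field(default_factory=list)


def run_pipeline(
    question: str,
    is_complex: Callable[[str], bool],
    general_agent: Callable[[str], str],
    reasoning_llm: Callable[[str], str],
    execution_agent: Callable[[str], str],
) -> QueryState:
    """Router -> general agent, or router -> reasoning LLM -> execution agent."""
    state = QueryState(question=question)
    state.route = "advanced" if is_complex(question) else "general"
    state.trace.append("router")

    if state.route == "general":
        state.result = general_agent(question)
        state.trace.append("general_agent")
    else:
        # The reasoning model only writes the analysis plan + SQL.
        state.plan = reasoning_llm(question)
        state.trace.append("reasoning_llm")
        # The execution agent reviews, runs, and debugs that plan.
        state.result = execution_agent(state.plan)
        state.trace.append("execution_agent")
    return state


# --- Stubs standing in for real LLM calls (assumptions for the demo) ---
def demo_is_complex(question: str) -> bool:
    return any(word in question.lower() for word in ("rank", "score", "segment"))


def demo_general_agent(question: str) -> str:
    return "GENERAL: " + question


def demo_reasoning_llm(question: str) -> str:
    return "PLAN: " + question


def demo_execution_agent(plan: str) -> str:
    return "EXECUTED: " + plan


simple = run_pipeline(
    "How many rows are in the table?",
    demo_is_complex, demo_general_agent, demo_reasoning_llm, demo_execution_agent,
)
advanced = run_pipeline(
    "Score and rank the banks by card spend",
    demo_is_complex, demo_general_agent, demo_reasoning_llm, demo_execution_agent,
)
print(simple.route, simple.trace)
print(advanced.route, advanced.trace)
```

Keeping the reasoning step and the execution step as separate callables mirrors the split described above: the reasoning model can be swapped (Gemini, Sonnet, R1, o3-mini) without touching the agent that runs the SQL.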

Quality

My (judgmental) ranking – reasoning & analysis

Sonnet – best by far. Brilliant derived variables & approach. Score ➔ 100 (baseline). Sometimes too deep for 4o to execute, but superb for iterative analysis
R1 – close to Sonnet ➔ 95
Gemini – excellent ➔ 85
o3-mini – hmmm... ➔ 50

CPQ (Cost per Query)

Reasoning-based analysis (breakdown further below)

o3-mini: ~8.5c
Gemini: ~11c
R1: ~13.5c
Sonnet: ~20.5c

Variance: up to ±50% on the same query. Models are still evolving, and the variance is coming down.
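To make the numbers above easier to compare, here is a small sketch that turns each CPQ into a ±50% cost band and divides the quality score by the cost. The "quality points per cent" ratio is my own illustrative metric, not something measured in the app, and it inherits all the subjectivity of the quality ranking.

```python
# Figures copied from the Quality and CPQ lists above (costs in US cents).
QUALITY = {"sonnet": 100, "r1": 95, "gemini": 85, "o3-mini": 50}
CPQ_CENTS = {"o3-mini": 8.5, "gemini": 11.0, "r1": 13.5, "sonnet": 20.5}
VARIANCE = 0.50  # "up to ±50% on the same query"


def cost_band(cpq_cents: float, variance: float = VARIANCE) -> tuple:
    """Return (low, high) cost in cents for one query at the stated variance."""
    return (cpq_cents * (1 - variance), cpq_cents * (1 + variance))


def quality_per_cent(model: str) -> float:
    """Illustrative ratio: quality score divided by cost per query."""
    return QUALITY[model] / CPQ_CENTS[model]


bands = {model: cost_band(cost) for model, cost in CPQ_CENTS.items()}
ranked = sorted(CPQ_CENTS, key=quality_per_cent, reverse=True)

for model in ranked:
    low, high = bands[model]
    print(f"{model:8s} {low:5.2f}c to {high:5.2f}c  "
          f"quality/cent = {quality_per_cent(model):.2f}")
```

On this ratio Gemini comes out ahead and Sonnet last, which is the cost side of the trade-off: Sonnet buys the best analysis at the highest price per quality point. Note also that the bands overlap heavily, so a single query is a poor basis for comparing models.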

Latencies: mostly 1-4 mins, sometimes 10+ mins. Time of day matters (peak vs. off-peak). Gemini is the fastest.

CPQ – Regular Queries

4o-mini: ~0.10c
4o: ~1.5c

4o-mini is the workhorse; 4o when it stumbles. Gemini may take over.
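The "workhorse first, stronger model when it stumbles" idea can be written as a simple fallback chain. This is a sketch under assumptions: the post does not say whether the switch is manual or automatic, and "stumbles" is treated here as either an exception or an empty answer. `answer_with_fallback` and the stub callables are hypothetical; real model clients would replace them.

```python
from typing import Callable, Optional, Sequence, Tuple

ModelCall = Callable[[str], Optional[str]]


def answer_with_fallback(
    question: str,
    models: Sequence[Tuple[str, ModelCall]],
) -> Tuple[str, str]:
    """Try models in order (cheapest first); return (model_name, answer)."""
    for name, call in models:
        try:
            answer = call(question)
        except Exception:
            # Assumption: any error counts as a stumble, so move on.
            continue
        if answer:
            return name, answer
    raise RuntimeError("all models failed for this question")


# --- Stubs standing in for real API calls (assumptions for the demo) ---
def stub_4o_mini(question: str) -> Optional[str]:
    # Pretend the cheaper model gives up on window-function queries.
    if "window function" in question.lower():
        return None
    return "mini answer"


def stub_4o(question: str) -> Optional[str]:
    return "4o answer"


CHAIN = [("gpt-4o-mini", stub_4o_mini), ("gpt-4o", stub_4o)]

easy = answer_with_fallback("Total spend by month", CHAIN)
hard = answer_with_fallback("Rank customers using a window function", CHAIN)
print(easy)
print(hard)
```

Ordering the chain by cost keeps most traffic at the ~0.10c tier and only pays the ~1.5c tier on the queries that need it.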

Variance: ±20% – stable in live deployments

Latencies: 15 sec to 3 min depending on query complexity and time of day.

Reliability

o3-mini & Sonnet – high reliability, negligible API failures
Gemini – high lately, but I'd like to see it hold up for some time
R1 – low: API failures & latency spikes. Improving, so likely temporary. Alternate hosting options are available.

Demoed Example

Scoring & Ranking of Indian Banks - Credit Card Segment
Data Mart & Profile Summary for 1M customers + 10M transactions.

SQL Errors / API Failures / Data Validations?

See detailed video guide - for live debugging / error catching: YouTube Video

Source Codes, Architecture & Build Guide

5 repos + 7 Flowise schemas + video build guide. Main repo: GitHub

Caveats & Assumptions

Lots of them, plus tips. See the details below.

Caveats, Assumptions & Tips

Reasoning estimates: ~100 queries across 4 reasoning agents (1-3 iterations per request; 1 iteration = 1 query).
Regular queries: Based on months of live usage (API calls, automation, web scraping, NL-to-SQL via custom UIs).
Use case-specific: Estimates apply to queries demoed in the video.
High variability for the same query: expect it to come down as LLMs stabilize.
Critical to estimate costs for your own use case.
Check actual billing – Pen-and-paper token math is unreliable.
Time-based variability – Example: R1 costs were very high a few weeks ago but are now more reasonable, even though rack-rate pricing is unchanged. Be mindful.
Prototype app - live working prototype.

CPQ breakdown – reasoning & analysis

o3-mini: ~8.5c (reasoning + execution)
gemini-2.0-flash: ~11c (reasoning = free tier, execution = 11c). Paid tier is cheaper than gpt-4o-mini (~0.10c additional).
r1: ~13.5c (reasoning = 4c, execution = 9.5c)
sonnet-3.7: ~20.5c (planning = 11.5c, execution = 9c)
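A quick arithmetic check on the breakdown above: the reasoning and execution parts should add up to each total, and the split shows how much of the cost sits in the reasoning step. The figures are copied from the list; the dictionary layout and `reasoning_share` are just an illustrative way to hold them. o3-mini is left out because only its combined figure is given.

```python
# Costs in US cents per query, as listed in the breakdown above.
# gemini-2.0-flash reasoning ran on the free tier, so it is entered as 0.
BREAKDOWN_CENTS = {
    "gemini-2.0-flash": {"reasoning": 0.0, "execution": 11.0, "total": 11.0},
    "r1": {"reasoning": 4.0, "execution": 9.5, "total": 13.5},
    "sonnet-3.7": {"reasoning": 11.5, "execution": 9.0, "total": 20.5},
}


def parts_match_total(model: str) -> bool:
    """True if reasoning + execution equals the quoted total."""
    row = BREAKDOWN_CENTS[model]
    return abs(row["reasoning"] + row["execution"] - row["total"]) < 1e-9


def reasoning_share(model: str) -> float:
    """Fraction of the per-query cost spent on the reasoning/planning step."""
    row = BREAKDOWN_CENTS[model]
    return row["reasoning"] / row["total"]


for model in BREAKDOWN_CENTS:
    print(f"{model:18s} parts add up: {parts_match_total(model)}  "
          f"reasoning share: {reasoning_share(model):.0%}")
```

The execution cost is roughly the same across the three (9c to 11c), so most of the gap between Sonnet and the others comes from the reasoning step itself.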

CPQ – regular queries

gpt-4o-mini – ~0.10c (my workhorse – solid performance, solid pricing)
gpt-4o – ~1.5c (I shift to gpt-4o if gpt-4o-mini stumbles)
sonnet – With 3.5, I used to get ~2.5c. With 3.7, costs are now much higher despite the same token pricing; likely a temporary issue.

Workhorse LLM: 4o-mini by default; 4o when it stumbles. Gemini-2.0-Flash may take over: better performance, quality, and cost, with improved reliability over last year's Gemini.

Detailed Video Guide

Demo, build guide, architecture, API call flows, error catching, repo walkthroughs, and more.

GitHub Repos & Schemas

Main Repo: GitHub

With step-by-step build guide & links to other repos

Agent Schemas - Flowise

In the docs folder of the Main Repo

