---
title: "AI Co-Analyst - Live Multi-Agent App. Cost, quality, reliability - what works? what doesn't?"
slug: ai-co-analyst-live-multi-agent-app-cost-quality-reliability
date_published: 2025-03-02T13:34:27.717Z
original_url: https://www.tigzig.com/post/ai-co-analyst-live-multi-agent-app-cost-quality-reliability
source: migrated
processed_at: 2025-12-02T10:00:00.000Z
---

# AI Co-Analyst - Live Multi-Agent App. Cost, quality, reliability - what works? what doesn't?

Sonnet-3.7 the best, Deepseek 2nd, Gemini excellent. **Try it** 👉 [tigzig.com](http://tigzig.com/) (open source)

## Top Line

As an AI Co-Analyst LLM, Sonnet-3.7 is my top choice for deep, incisive analysis support. Loving Gemini-2.0-Flash for its balance of quality, reliability, and cost; it's also the fastest. Deepseek-R1 quality is close to Sonnet but less reliable. o3-mini is the lowest cost but not too great.

## Take it for a spin

* Go to [tigzig.com](http://tigzig.com/) → Click 'Sample' to auto-upload a sample file into a temporary Postgres database. Choose your advanced analyst agent - Gemini/Sonnet/R1/o3-mini. Use sample prompt or modify it
* No login, database creds, or API keys needed
* Option: connect your own database...or upload your own files

## Agent Setup ➔ Flowise AI

Sequential Agents (LangGraph). A router agent sends regular queries to a general analyst agent and complex queries down an advanced analysis route: a reasoning LLM produces an analysis plan + SQL queries, which an execution agent (gpt-4o) reviews, corrects, executes, and debugs before delivering results.
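The routing flow can be sketched in plain Python. This is only an illustration: the function names and the keyword-based classifier are hypothetical stand-ins, while the live app implements the same shape as a Flowise Sequential Agents (LangGraph) graph with LLM-driven routing.

```python
# Illustrative sketch of the router -> reasoning -> execution flow.
# All names and the routing heuristic are hypothetical; the real app
# uses Flowise Sequential Agents (LangGraph) with an LLM router.

def route(query: str) -> str:
    """Router agent: send complex analysis requests down the advanced path."""
    complex_markers = ("segment", "score", "rank", "profile", "derive")
    return "advanced" if any(m in query.lower() for m in complex_markers) else "general"

def reasoning_agent(query: str) -> dict:
    """Reasoning LLM (Sonnet/R1/Gemini/o3-mini): returns a plan plus draft SQL."""
    return {"plan": f"analysis plan for: {query}", "sql": ["SELECT 1 AS placeholder"]}

def execution_agent(draft: dict) -> str:
    """Execution agent (gpt-4o): reviews, corrects, runs, and debugs the SQL."""
    results = [f"executed: {stmt}" for stmt in draft["sql"]]
    return "\n".join([draft["plan"], *results])

def co_analyst(query: str) -> str:
    """End-to-end: route, then either general answer or plan + execution."""
    if route(query) == "advanced":
        return execution_agent(reasoning_agent(query))
    return f"general analyst answer for: {query}"

print(co_analyst("Score and rank banks in the credit card segment"))
```

The key design point carried over from the app: the reasoning model never touches the database directly; a cheaper, reliable execution model owns the review/execute/debug loop.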

## Quality

My (judgmental) ranking – reasoning & analysis

* Sonnet – best by far. Brilliant derived variables & approach. Score ➔ 100 (baseline). Sometimes too deep for 4o to execute, but superb for iterative analysis
* R1 – close to Sonnet ➔ 95
* Gemini – excellent ➔ 85
* o3-mini – hmmm... ➔ 50

## CPQ (Cost per Query)

Reasoning-based analysis (breakdown in comments)

* o3-mini: ~8.5c
* Gemini: ~11c
* R1: ~13.5c
* Sonnet: ~20.5c

**Variance:** up to ±50% on the same query. Models are evolving, and variances are coming down.
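Given the ±50% swing, the per-query figures above are better read as ranges. A quick sketch, using only the measured cents-per-query values from the list above:

```python
# Approximate cost per reasoning query, in US cents (measured values above).
cpq_cents = {"o3-mini": 8.5, "gemini": 11.0, "r1": 13.5, "sonnet": 20.5}

VARIANCE = 0.50  # up to +/-50% observed on the same query

for model, cost in cpq_cents.items():
    low, high = cost * (1 - VARIANCE), cost * (1 + VARIANCE)
    print(f"{model:8s} ~{cost:.1f}c  (plausible range {low:.2f}c - {high:.2f}c)")
```

So a nominal ~20.5c Sonnet query can plausibly land anywhere from ~10c to ~31c, which is why per-use-case measurement (see Caveats) matters more than the point estimates.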

**Latencies:** mostly 1-4 mins, sometimes 10+ mins. Time of day matters (peak vs. off-peak). Gemini is the fastest.

## CPQ – Regular Queries

* 4o-mini: ~0.10c
* 4o: ~1.5c

4o-mini is the workhorse; 4o steps in when it stumbles. Gemini may take over.

**Variance:** ±20% – stable in live deployments

**Latencies:** 15 sec to 3 min depending on query complexity and time of day.

## Reliability

* o3-mini & Sonnet – high reliability; negligible API failures
* Gemini – high nowadays, but I'd like to watch it for some time
* R1 – low; API failures & latency spikes. Improving, and likely temporary. Alternate hosting options are available.

## Demoed Example

* Scoring & Ranking of Indian Banks - Credit Card Segment
* Data Mart & Profile Summary for 1M Cust + 10M Trans.

## SQL Errors / API Failures / Data Validations?

See detailed video guide - for live debugging / error catching: [YouTube Video](https://www.youtube.com/watch?v=hqn3zrdXVSQ)

## Source Codes, Architecture & Build Guide

5 repos + 7 Flowise schemas + video build guide. Main repo: [GitHub](https://github.com/amararun/shared-rexdb-auth-embed-v2)

## Caveats & Assumptions

Lots of them...plus tips...check comments...

### Caveats, Assumptions & Tips

* Reasoning estimates: ~100 queries across 4 reasoning agents (1-3 iterations per request; 1 iteration = 1 query).
* Regular queries: Based on months of live usage (API calls, automation, web scraping, NL-to-SQL via custom UIs).
* Use case-specific: Estimates apply to queries demoed in the video.
* High variability for same query: expect to come down as LLMs stabilize
* Critical to estimate costs for your own use case.
* Check actual billing – Pen-and-paper token math is unreliable.
* Time-based variability – Example: R1 costs were very high a few weeks ago but are now more reasonable, even though rack-rate pricing is unchanged. Be mindful.
* Prototype app – a live, working prototype.

### CPQ breakdown – reasoning & analysis

* o3-mini: ~8.5c (reasoning + execution)
* gemini-2.0-flash: ~11c (reasoning = free tier, execution = 11c). The paid tier is cheaper than gpt-4o-mini (~0.10c additional).
* r1: ~13.5c (reasoning = 4c, execution = 9.5c)
* sonnet-3.7: ~20.5c (planning = 11.5c, execution = 9c)
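The component splits above can be sanity-checked with simple addition (values in cents; Gemini's reasoning leg ran on the free tier, hence 0, and o3-mini is omitted since only its combined figure is given):

```python
# Cost-per-query components in US cents, from the breakdown above.
# Gemini's reasoning leg ran on the free tier, so it contributes 0c here.
components = {
    "gemini-2.0-flash": {"reasoning": 0.0, "execution": 11.0},
    "r1": {"reasoning": 4.0, "execution": 9.5},
    "sonnet-3.7": {"planning": 11.5, "execution": 9.0},
}

totals = {model: sum(parts.values()) for model, parts in components.items()}
for model, total in totals.items():
    print(f"{model}: ~{total:.1f}c per query")
```

Note that the execution leg (gpt-4o) is a sizeable, roughly constant chunk of every advanced query, so the models differ mainly in what the reasoning/planning leg adds on top.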

### CPQ – regular queries

* gpt-4o-mini – ~0.10c (my workhorse – solid performance, solid pricing)
* gpt-4o – ~1.5c (I shift to gpt-4o if gpt-4o-mini stumbles)
* sonnet – With 3.5, I used to get ~2.5c. With 3.7, costs are now much higher despite the same token pricing, likely a temporary issue.

**Workhorse LLM:** 4o-mini by default; 4o when it stumbles. Flash2 may take over: better performance, quality, and cost, with improved reliability over last year's Gemini.

## Detailed Video Guide

Demo, build guide, architecture, API call flows, error catching, repo walkthroughs, and more.

## GitHub Repos & Schemas

**Main Repo:** [GitHub](https://github.com/amararun/shared-rexdb-auth-embed-v2)

With step-by-step build guide & links to other repos

**Agents Schemas - Flowise**

In the docs folder of the main repo

