How to build Voice-based AI Action Agents App to Execute Tasks, Automate Reports, and Analyze Data …and more.

Published: October 24, 2024


In this article I show how to build a voice-powered AI agentic app to automate business tasks and analyze live data warehouses.


My open-source platform with a ton of micro-apps and tooling for AI-driven analytics and data science

This 5-part series has very detailed video guides for numerous reusable components that can be used across many apps.

This app is a non-realtime implementation of voice AI agents. For a realtime voice AI agent app implemented with OpenAI's new WebRTC, check out the post linked below.

Chat is just one of the many things LLMs can do… They can act and execute.

VTEXER: Voice-Enabled LLM Action Agent App for Task Execution, Automation, Query and Research. I demonstrate how you can, with natural language voice instructions:

And yes, it can also chat.

A detailed, step-by-step, hands-on implementation guide, source code, GitHub repo, schemas and blueprints are shared in the sections below.

The source, deployed as-is on Vercel etc., will give you a fully functional voice bot right off the bat (with restricted app features).

By leveraging platforms like Flowise AI and Make.com, you can interact with and execute tasks, in voice and natural language text, with: your business data warehouses, thousands of platforms (Zoho, QuickBooks, Salesforce, AWS, Google, Microsoft, Azure, Slack, HubSpot, Stripe, WooCommerce, Shopify…) and, most importantly, your own existing APIs.

Implementation is relatively rapid: part of it is no-code, and any coding is done by AI-assisted coding tools.

As part of the series, I show you how to use AI-assisted coding tools like Claude Dev and Cursor AI to develop LLM apps with natural language commands and deploy them to the open internet.

The app uses LLM ReAct agents (Reasoning and Action agents), so called because they can reason about a request and then execute tasks via API calls, also known as function calls or tool calling. They are fairly easy to set up in Flowise. Flowise also has a marvelous feature: you can access an agent / chat flow via an API endpoint, which lets you fully customize the UI and response flow. I use that extensively here.
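To make the API-endpoint idea concrete, here is a minimal Python sketch of sending an already-transcribed voice instruction to a Flowise flow. It is an illustration under assumptions, not the code from the repo: it assumes the standard Flowise prediction route (`/api/v1/prediction/<chatflow id>`), a JSON body with a `question` field, and a reply whose answer sits under `text`. The base URL, chatflow ID, API key and the function names `build_prediction_request` and `ask_agent` are all placeholders.

```python
import json
import urllib.request
from typing import Callable, Optional


def build_prediction_request(
    base_url: str,
    chatflow_id: str,
    question: str,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build the POST request for a Flowise chat flow (route is an assumption)."""
    url = f"{base_url.rstrip('/')}/api/v1/prediction/{chatflow_id}"
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed if the flow is protected with an API key.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def ask_agent(
    request: urllib.request.Request,
    send: Callable = urllib.request.urlopen,
) -> str:
    """Send the request and return the agent's text reply.

    `send` is injectable so the function can be exercised without a network.
    """
    with send(request) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Assumes the reply text is under "text"; returns "" if it is absent.
    return payload.get("text", "")
```

In the app, the `question` would be the text produced by the speech-to-text step, and the returned string is what the custom UI displays or reads back. Whatever tools the agent calls along the way (sheet updates, emails, database queries) stay configured inside the Flowise flow, so the front end only ever talks to this one endpoint.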

ALL coding was done by GPT, LLMs and AI-assisted coding tools. For Google Scripts and Python I like GPT-4o. For React in general it is Claude Sonnet. For React.js apps like this one, my current favorites are Claude Dev, a VS Code extension, and Cursor AI IDE, a VS Code fork. Claude Dev also works on GitHub Codespaces… I have a bit of a preference for cloud over local. Both can create new files as well as modify files across the whole codebase. Claude Dev even has terminal access, so it does pretty much everything.

Hands on Implementation Guide

Video guide

Source code, JSON Schemas and blueprints on GitHub:

The repo, deployed as-is to Vercel / Netlify etc., will give you a fully functional voice bot (with restricted app features).

Top Flowise AI Resource:

Leon van Zyl's videos are a must-watch, from simple chatbots to complex agentic flows. Leon van Zyl - YouTube

AI Assisted Coding: Must Read

Implementation Videos with Time-Stamped Sections

Complete Playlist at: YouTube

Part 1: Demo and Agent Process Flow

Part 2: How to update Excel, Google Sheets and Databases with AI / LLM Voice Agents

Part 3: How To Automate Excel to PDF, Excel to Slides, Automatic Email with AI Agents and Google Scripts

Part 4: How to Build AI Voice Action Agent App with natural language with Claude Dev and Cursor AI

Part 5: How to create LLM Agents that query databases, do web search and take action

Hope you find the guide useful and enjoy using it as much as I enjoyed making it.
