Intelligent AI Web Scraper in Excel with Python (xlwings Lite)

Tell it what to extract - fields, filters, rules - all in plain English.


Live tool. Paste a list of URLs. The app extracts the fields you asked for and applies your transforms, filters, and data-cleaning instructions. It then writes the structured data to Excel, along with URL status, error logs, and a 30-KPI scraping dashboard.


▪ What Intelligence does

You can share instructions in conversational English.

Filter: 'Don't include any records from Mumbai, Chennai and Delhi'

Select: 'If there are multiple phone numbers, just keep the first one.'

Normalize: 'If a state is abbreviated (e.g., UP), replace with the full name (e.g., Uttar Pradesh).'

Derive: 'Combine first name and city into a unique ID. No spaces. All caps.'
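For intuition, here is roughly what those four instructions amount to if written by hand in Python. This is only a sketch and not the tool's code: the tool passes the plain-English instructions to the LLM. The field names (`first_name`, `city`, `state`, `phone`), the `apply_rules` helper, and the small state lookup are all hypothetical.

```python
# Hypothetical hand-written equivalents of the four example instructions.
EXCLUDED_CITIES = {"mumbai", "chennai", "delhi"}

# Illustrative subset only; a real lookup would cover every state.
STATE_NAMES = {"UP": "Uttar Pradesh", "MP": "Madhya Pradesh", "TN": "Tamil Nadu"}


def apply_rules(records):
    """Apply the Filter, Select, Normalize and Derive rules to a list of dicts."""
    cleaned = []
    for rec in records:
        # Filter: drop records from the excluded cities.
        if rec["city"].strip().lower() in EXCLUDED_CITIES:
            continue
        row = dict(rec)

        # Select: if several phone numbers are listed, keep the first one.
        phones = [p.strip() for p in rec.get("phone", "").split(",") if p.strip()]
        row["phone"] = phones[0] if phones else ""

        # Normalize: expand an abbreviated state to its full name.
        state = rec.get("state", "").strip()
        row["state"] = STATE_NAMES.get(state.upper(), state)

        # Derive: first name + city, no spaces, all caps.
        row["id"] = (rec["first_name"] + rec["city"]).replace(" ", "").upper()

        cleaned.append(row)
    return cleaned
```

The point of the tool is that you do not write this: the same rules are stated in English and the LLM applies them during extraction.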


▪ How it works

Jina.ai fetches the page text from each URL. Gemini extracts and transforms. xlwings Lite runs it all inside Excel. Control LLM behavior by tuning its parameters: topP, temperature, max tokens, and thinking budget. Configure scrape parameters such as request delays, max retries, and timeouts. Output is non-deterministic by default, since an LLM does the extraction. For stricter pulls, add a Python parser layer or use a hybrid of parser and LLM. Select LLMs by use case: cheap and fast for volume, slower and stronger for precision.
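The flow above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the tool's actual code: the helper names (`reader_url`, `build_gemini_payload`, `fetch_with_retries`) and the default values are hypothetical, the Jina Reader is assumed to be called by prefixing the target URL with `https://r.jina.ai/`, and the payload field names follow the Gemini REST `generationConfig` as I understand it, so check them against the current API docs. No network call is made here; the caller supplies the actual fetch function.

```python
import time

JINA_READER_PREFIX = "https://r.jina.ai/"  # assumption: Reader endpoint prefix


def reader_url(target_url):
    """Build the Jina Reader URL that returns the page as plain text."""
    return JINA_READER_PREFIX + target_url


def build_gemini_payload(page_text, instructions, temperature=0.2, top_p=0.9,
                         max_output_tokens=2048, thinking_budget=0):
    """Assemble a request body carrying the LLM parameters named in the text."""
    prompt = f"Instructions:\n{instructions}\n\nPage text:\n{page_text}"
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "temperature": temperature,
            "topP": top_p,
            "maxOutputTokens": max_output_tokens,
            # Assumed field name for the thinking budget; verify before use.
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


def fetch_with_retries(fetch, url, max_retries=3, delay_seconds=1.0,
                       sleep=time.sleep):
    """Call fetch(url), waiting delay_seconds between failed attempts."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # a sketch; real code would catch narrower errors
            last_error = exc
            if attempt < max_retries:
                sleep(delay_seconds)
    raise RuntimeError(f"gave up on {url} after {max_retries} attempts") from last_error
```

A lower temperature makes repeated runs more consistent but does not make them deterministic, which is why the stricter cases call for a parser layer. Timeouts are left to whatever `fetch` function you pass in.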


▪ How to customize

I use it as-is for common client scrapes and customize for tougher cases. To customize, hand the code to your AI Coder with change requests. Examples: switch to OpenAI or Anthropic, capture images, follow child URLs, add a Python parser step. The foundation is xlwings Lite. I've documented my process and examples in the xlwings Practice Lab: xlwings-lite.tigzig.com. Refer to my blog posts (links below) on choosing an AI Coder.
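As one illustration of a Python parser step, here is a small hybrid check for phone numbers: accept the LLM's value only if a regex also finds it in the page text, otherwise fall back to the first number the regex finds. It is a sketch, not part of the tool. The function names and the phone pattern are hypothetical, and the pattern is deliberately loose.

```python
import re

# Loose, illustrative pattern: optional "+", then digits mixed with
# spaces, hyphens or parentheses, at least nine digits/separators in total.
PHONE_RE = re.compile(r"\+?\d[\d\s\-()]{7,}\d")


def normalize_phone(raw):
    """Strip everything except digits and a leading plus sign."""
    return re.sub(r"[^\d+]", "", raw)


def phones_on_page(page_text):
    """Return every phone-like string on the page, normalized."""
    return [normalize_phone(m.group()) for m in PHONE_RE.finditer(page_text)]


def checked_phone(llm_value, page_text):
    """Keep the LLM's phone number only if the parser also sees it on the page."""
    found = phones_on_page(page_text)
    candidate = normalize_phone(llm_value)
    if candidate in found:
        return candidate
    # Fallback: the first number the parser found, or empty if none.
    return found[0] if found else ""
```

The same pattern applies to other fields with a rigid shape, such as emails or PIN codes: let the LLM handle the fuzzy parts and let a parser confirm the strict ones.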


▪ Live Tool, Source Code & Docs

Built with xlwings Lite by Felix Zumstein (lite.xlwings.com)


▪ Resources

xlwings Lite Practice Lab - xlwings-lite.tigzig.com

Technical Analysis Report - app.tigzig.com/technical-analysis-report

Database and Machine Learning - app.tigzig.com/xlwings-api-db

AI Coder Instruction file for xlwings Lite (Blog)



 
 
