Intelligent AI Web Scraper in Excel with Python (xlwings Lite)
- Amar Harolikar
- Nov 3
Tell it what to extract - fields, filters, rules - all in plain English.
Live Tool. Paste a URL list. The app extracts the fields you asked for and applies your transforms, filters, and any data cleaning instructions. It then writes structured data to Excel, along with URL status, error logs, and a 30-KPI scraping dashboard.
▪ What Intelligence does
You can share instructions in conversational English.
Filter: 'Don't include any records from Mumbai, Chennai and Delhi'
Select: 'If there are multiple phone numbers, just keep the first one.'
Normalize: 'If a state is abbreviated (e.g., UP), replace with the full name (e.g., Uttar Pradesh).'
Derive: 'Combine first name and city into a unique ID. No spaces. All caps.'
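To make the intent of those four instructions concrete, here is a minimal Python sketch of the results they describe. It is not the tool's code: the tool hands these rules to the LLM in plain English. The field names (`first_name`, `city`, `state`, `phone`), the state lookup table, and the sample records are all hypothetical.

```python
# Hypothetical lookup table; a real one would cover every state you expect.
STATE_NAMES = {"UP": "Uttar Pradesh", "MH": "Maharashtra", "TN": "Tamil Nadu"}
EXCLUDED_CITIES = {"mumbai", "chennai", "delhi"}


def apply_rules(records):
    """Apply the Filter, Select, Normalize and Derive rules to a list of dicts."""
    cleaned = []
    for rec in records:
        city = rec["city"].strip()

        # Filter: skip records from the excluded cities.
        if city.lower() in EXCLUDED_CITIES:
            continue

        # Select: keep only the first phone number (assumes comma-separated values).
        phones = [p.strip() for p in rec["phone"].split(",") if p.strip()]
        phone = phones[0] if phones else ""

        # Normalize: expand a known abbreviation, otherwise keep the value as-is.
        state = rec["state"].strip()
        state = STATE_NAMES.get(state.upper(), state)

        # Derive: first name + city, no spaces, all caps.
        unique_id = (rec["first_name"] + city).replace(" ", "").upper()

        cleaned.append({
            "unique_id": unique_id,
            "first_name": rec["first_name"],
            "city": city,
            "state": state,
            "phone": phone,
        })
    return cleaned


# Made-up sample rows, only for illustration.
sample = [
    {"first_name": "Asha", "city": "Pune", "state": "MH", "phone": "98200 11111, 98200 22222"},
    {"first_name": "Ravi", "city": "Mumbai", "state": "MH", "phone": "98200 33333"},
    {"first_name": "Meera", "city": "Greater Noida", "state": "UP", "phone": "98200 44444"},
]
cleaned = apply_rules(sample)
```

A deterministic version like this is also roughly what a Python parser layer would look like if you needed stricter, repeatable output for a fixed page layout.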
▪ How it works
Jina.ai fetches the text from each URL. Gemini extracts and transforms it. xlwings Lite runs it all. Control LLM Intelligence by tweaking LLM parameters: topP, temperature, max tokens, and thinking budget. Configure scrape parameters such as request delays, max retries, and timeout settings. Output is non-deterministic by default, since an LLM does the extraction. For stricter pulls, add a Python parser layer or use a hybrid of parser and LLM. Select LLMs by use case: cheap and fast for volume, slower and stronger for precision.
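The pipeline above can be sketched in a few small pieces. This is an illustration under stated assumptions, not the tool's source. The `r.jina.ai` prefix is Jina's public Reader endpoint. The payload field names (`generationConfig`, `topP`, `maxOutputTokens`, `thinkingConfig`) follow the Gemini REST API as I understand it, so check them against the current docs. The function names, default values, and the retry helper are hypothetical. No network call is made here: the HTTP call is passed in as a function, and that is where a timeout setting would apply.

```python
import time

JINA_READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url):
    """Build the Jina Reader URL: the target URL is appended to the prefix."""
    return JINA_READER_PREFIX + target_url


def build_gemini_payload(page_text, instructions, temperature=0.2, top_p=0.9,
                         max_tokens=2048, thinking_budget=None):
    """Assemble a request body with the tunable LLM parameters (defaults are illustrative)."""
    generation_config = {
        "temperature": temperature,
        "topP": top_p,
        "maxOutputTokens": max_tokens,
    }
    # Thinking budget is only sent when set; not every model supports it.
    if thinking_budget is not None:
        generation_config["thinkingConfig"] = {"thinkingBudget": thinking_budget}
    prompt = f"{instructions}\n\nPage content:\n{page_text}"
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": generation_config,
    }


def with_retries(fetch, url, max_retries=3, delay_seconds=1.0):
    """Call fetch(url), retrying on any exception with a fixed delay between attempts."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # broad on purpose: this is only a sketch
            last_error = exc
            if attempt < max_retries:
                time.sleep(delay_seconds)
    raise RuntimeError(f"{url} failed after {max_retries} attempts") from last_error
```

Lower `temperature` and `topP` values generally make repeated runs more consistent, but they do not make LLM output fully deterministic. That gap is what the parser layer or hybrid approach is meant to cover.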
▪ How to customize
I use it as-is for common client scrapes and customize it for tougher cases. To customize, hand the code to your AI Coder with change requests. Examples: switch to OpenAI or Anthropic, capture images, follow child URLs, add a Python parser step. The foundation is xlwings Lite. I've documented my process and examples in the xlwings Practice Lab: xlwings-lite.tigzig.com. Refer to my blog posts (links below) on choosing an AI Coder.
▪ Live Tool, Source Code & Docs
Built with xlwings Lite by Felix Zumstein (lite.xlwings.com)
▪ Resources
xlwings Lite Practice Lab - xlwings-lite.tigzig.com
Which AI Coder should you use for xlwings Lite (Blog): https://www.tigzig.com/post/which-ai-coder-should-you-use-for-xlwings-lite-python-in-excel
Technical Analysis Report - app.tigzig.com/technical-analysis-report
Database and Machine Learning - app.tigzig.com/xlwings-api-db
AI Coder Instruction file for xlwings Lite (Blog): https://www.tigzig.com/post/a-1-450-line-context-file-to-ensure-clean-efficient-xlwings-lite-code-ge

