Intelligent AI Web Scraper in Excel with Python (xlwings Lite)
Published: November 3, 2025
Tell it what to extract - fields, filters, rules - all in plain English.
Live Tool. Paste a URL list. The app extracts the fields you asked for and applies your transforms, filters, and data cleaning instructions. It then updates Excel with the structured data, URL status, error logs, and a 30-KPI scraping dashboard.
What Intelligence does
You can share instructions in conversational English.
- Filter: "Don't include any records from Mumbai, Chennai and Delhi"
- Select: "If there are multiple phone numbers, just keep the first one."
- Normalize: "If a state is abbreviated (e.g., UP), replace with the full name (e.g., Uttar Pradesh)."
- Derive: "Combine first name and city into a unique ID. No spaces. All caps."
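As a rough illustration of how plain-English instructions like these could reach the model, here is a minimal sketch that folds a field list and a rule list into one prompt. The function name `build_extraction_prompt`, the field names, and the prompt wording are all hypothetical and not taken from the tool's source.

```python
def build_extraction_prompt(fields, rules, page_text):
    """Combine field names, plain-English rules, and page text into one prompt.

    Hypothetical helper for illustration; the real tool may phrase this differently.
    """
    field_lines = "\n".join(f"- {name}" for name in fields)
    rule_lines = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        "Extract the following fields from the page text and return a JSON array "
        "of objects, one object per record.\n\n"
        f"Fields:\n{field_lines}\n\n"
        f"Rules:\n{rule_lines}\n\n"
        f"Page text:\n{page_text}"
    )


# Illustrative field names (assumed), plus the four example rules from the list above.
fields = ["first_name", "city", "state", "phone", "unique_id"]
rules = [
    "Don't include any records from Mumbai, Chennai and Delhi",
    "If there are multiple phone numbers, just keep the first one.",
    "If a state is abbreviated (e.g., UP), replace with the full name (e.g., Uttar Pradesh).",
    "Combine first name and city into a unique ID. No spaces. All caps.",
]

prompt = build_extraction_prompt(fields, rules, "...page text goes here...")
```

Keeping the rules as a numbered list makes it easy to add, remove, or reorder instructions without touching the rest of the prompt.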
How it works
Jina.ai fetches the text from each URL. Gemini extracts and transforms. xlwings Lite runs it all. Control the LLM's behavior by tweaking its parameters: topP, temperature, max tokens, and thinking budget. Configure scrape parameters such as request delays, max retries, and timeouts. Output is non-deterministic by default. For stricter pulls, add a Python parser layer or use a hybrid of parser and LLM. Select LLMs by use case: cheap and fast for volume, slower and stronger for precision.
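The pieces above can be sketched in a few lines of Python. This is a sketch under assumptions, not the tool's actual code: it assumes the Jina Reader convention of prefixing the target URL with `https://r.jina.ai/`, and a Gemini `generateContent` request body whose `generationConfig` carries `temperature`, `topP`, `maxOutputTokens`, and `thinkingConfig.thinkingBudget`. The helper names and default values are hypothetical. The HTTP client is passed in as a callable, since the right client depends on the environment xlwings Lite runs in.

```python
import time

# Assumed Jina Reader prefix: the reader URL is the prefix plus the target URL.
JINA_READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url):
    """Build the reader URL that returns the page as plain text."""
    return JINA_READER_PREFIX + target_url


def gemini_payload(prompt, temperature=0.0, top_p=0.95,
                   max_tokens=4096, thinking_budget=0):
    """Build a generateContent request body (field names are assumptions)."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "temperature": temperature,      # lower values give more repeatable output
            "topP": top_p,
            "maxOutputTokens": max_tokens,
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


def fetch_with_retries(fetch, url, max_retries=3, delay_seconds=2.0,
                       sleep=time.sleep):
    """Call fetch(url), pausing delay_seconds between failed attempts.

    `fetch` is any callable that returns text or raises on failure.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < max_retries:
                sleep(delay_seconds)
    raise RuntimeError(f"{url} failed after {max_retries} attempts") from last_error
```

A low temperature narrows the variation between runs but does not make the output fully deterministic, which is why the parser layer is suggested for stricter pulls.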
How to customize
I use it as-is for common client scrapes and customize it for tougher cases. To customize, hand the code to your AI Coder with change requests. Examples: switch to OpenAI or Anthropic, capture images, follow child URLs, add a Python parser step. The foundation is xlwings Lite. I've documented my process and examples in the xlwings Practice Lab: xlwings-lite.tigzig.com. Refer to my blog posts (links below) on choosing an AI Coder.
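For a sense of what a Python parser step might look like, here is a minimal sketch that enforces the four example rules deterministically on records the LLM has already extracted. The record keys (`first_name`, `city`, `state`, `phone`), the abbreviation table, and the function names are assumptions made for illustration, not part of the tool.

```python
import re

# Illustrative lookup tables; extend them to match your own data.
STATE_NAMES = {"UP": "Uttar Pradesh", "MH": "Maharashtra", "TN": "Tamil Nadu"}
EXCLUDED_CITIES = {"mumbai", "chennai", "delhi"}


def first_phone(raw):
    """Return the first non-empty phone number from a delimited string."""
    for part in re.split(r"[,;/|]", raw or ""):
        part = part.strip()
        if part:
            return part
    return ""


def clean_record(rec):
    """Apply filter, select, normalize, and derive rules to one record.

    Returns None when the record should be dropped.
    """
    city = (rec.get("city") or "").strip()
    if city.lower() in EXCLUDED_CITIES:          # Filter
        return None
    state = (rec.get("state") or "").strip()
    first_name = (rec.get("first_name") or "").strip()
    return {
        **rec,
        "city": city,
        "state": STATE_NAMES.get(state.upper(), state),                  # Normalize
        "phone": first_phone(rec.get("phone")),                          # Select
        "unique_id": re.sub(r"\s+", "", first_name + city).upper(),      # Derive
    }


def clean_records(records):
    """Clean a list of records, dropping the filtered ones."""
    cleaned = (clean_record(r) for r in records)
    return [r for r in cleaned if r is not None]
```

Running the LLM output through a step like this gives the same result on every run for these rules, while the model still handles the messy extraction.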
Live Tool, Source Code & Docs
Built with xlwings Lite by Felix Zumstein (lite.xlwings.org)
Resources
AI Coder Instruction file for xlwings Lite
xlwings Lite official site: lite.xlwings.org