# INTELISCAPE-X: AI-Powered Web Scraping in Excel

Describe the data you need from web pages in plain English, and AI extracts it into structured Excel tables. The workbook uses the Jina API for web rendering and the Gemini API for structured data extraction. It supports batch URL processing with custom column specifications, error tracking, and a performance dashboard.

## Links
- App: https://app.tigzig.com/web-scraper
- Video Guide: https://www.youtube.com/watch?v=41ZX46DibV4
- Blog Post: https://www.tigzig.com/post/intelligent-ai-web-scraper-in-excel-with-python-xlwings-lite

## Tags
python-in-excel, xlwings-lite

## Download
- Excel Workbook: https://www.tigzig.com/files/xlwings/INTELISCAPE_X_DYNAMIC_WEB_SCRAPER_v2_1125.xlsx

## Architecture

```
Excel (URLs + Config + Column Specs) → Jina API (Fetch & Render Web → Markdown)
  → Gemini API (AI Extraction → Structured JSON) → Python (Parse & Format)
    → Excel Output (DATA, DASHBOARD, ERROR_LOG, URL_LIST Status)
```

### Three-Stage Pipeline

**Stage 1: Configuration**
- xlwings Lite runs Python directly in Excel via WebAssembly
- Reads config from MASTER sheet (Jina and Gemini API keys, model settings, scraping parameters)
- Loads column definitions from COLUMN_INPUTS sheet
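
As a rough illustration of the configuration stage, the sketch below turns sheet rows into a config dict and a list of column specs. It works on plain Python lists so it runs outside Excel. The key names (`JINA_API_KEY`, `MODEL`, and so on), the three-column layout of COLUMN_INPUTS, and the `ColumnSpec` class are assumptions made for this example, not the workbook's actual layout.

```python
from dataclasses import dataclass


@dataclass
class ColumnSpec:
    # Hypothetical container for one COLUMN_INPUTS row
    name: str
    description: str
    filter_instructions: str = ""


def parse_config(rows):
    """Turn [key, value] rows into a dict, skipping blank rows."""
    config = {}
    for row in rows:
        if not row or row[0] is None or str(row[0]).strip() == "":
            continue
        key = str(row[0]).strip()
        value = row[1] if len(row) > 1 else None
        config[key] = value.strip() if isinstance(value, str) else value
    return config


def parse_column_specs(rows):
    """Turn [name, description, filter] rows into ColumnSpec objects.

    The first row is treated as a header and skipped.
    """
    specs = []
    for row in rows[1:]:
        cells = ["" if c is None else str(c).strip() for c in row]
        cells += [""] * (3 - len(cells))  # pad short rows
        if cells[0]:
            specs.append(ColumnSpec(cells[0], cells[1], cells[2]))
    return specs


# Illustrative sheet contents (assumed, not taken from the workbook)
master_rows = [
    ["JINA_API_KEY", "jina-xxx"],
    ["GEMINI_API_KEY", "gem-xxx"],
    ["MODEL", " gemini-2.5-flash "],
    [None, None],
]
column_rows = [
    ["Column Name", "Description", "Filter"],
    ["title", "Headline of the article", None],
    ["price", "Listed price in USD", "Skip items without a price"],
]

config = parse_config(master_rows)
specs = parse_column_specs(column_rows)
```

Inside the workbook, the row lists would come from the MASTER and COLUMN_INPUTS sheets rather than from literals.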

**Stage 2: Data Extraction**
- Sends each URL to the Jina API, which fetches the page and renders it as markdown
- Passes rendered content to Gemini API with column specifications
- Gemini extracts structured data matching your definitions
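
The two API calls in this stage could be prepared roughly as follows. This is a sketch under assumptions: it uses the public Jina Reader prefix `https://r.jina.ai/` and the Gemini REST `generateContent` endpoint with a `responseSchema`, but the prompt wording, the all-string schema, and the helper names are invented for illustration and may differ from what the workbook sends. The functions only build the requests; sending them is left to whichever HTTP client is available.

```python
import json

JINA_READER_BASE = "https://r.jina.ai/"
GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta/models"


def build_jina_request(url, api_key):
    """Return (request_url, headers) for fetching a page as markdown."""
    return JINA_READER_BASE + url, {"Authorization": f"Bearer {api_key}"}


def build_gemini_request(markdown, columns, model, api_key):
    """Return (endpoint, headers, body) asking for a JSON array of rows.

    `columns` is a list of (name, description) pairs. Every column is
    requested as a string, which is an assumption of this sketch.
    """
    properties = {
        name: {"type": "STRING", "description": desc} for name, desc in columns
    }
    schema = {
        "type": "ARRAY",
        "items": {
            "type": "OBJECT",
            "properties": properties,
            "required": [name for name, _ in columns],
        },
    }
    # Hypothetical prompt; the workbook's real prompt is not shown in this README
    prompt = "Extract one row per matching item from the page below.\n\n" + markdown
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseMimeType": "application/json",
            "responseSchema": schema,
        },
    }
    endpoint = f"{GEMINI_BASE}/{model}:generateContent"
    headers = {"x-goog-api-key": api_key, "Content-Type": "application/json"}
    return endpoint, headers, body


jina_url, jina_headers = build_jina_request("https://example.com/page", "k1")
endpoint, headers, body = build_gemini_request(
    "# Page", [("title", "Headline")], "gemini-2.5-flash", "k2"
)
payload = json.dumps(body)  # the string that would be POSTed to `endpoint`
```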

**Stage 3: Results**
- Writes extracted data to DATA sheet as formatted table
- Logs errors to ERROR_LOG sheet with timestamps
- Generates DASHBOARD with performance metrics and token usage
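
One way the results stage might split responses into data rows, error-log entries, and dashboard figures is sketched below. It assumes the standard Gemini `generateContent` response shape (`candidates[0].content.parts[0].text` plus `usageMetadata.totalTokenCount`). The output field names (`source_url`, `urls_failed`, and so on) are hypothetical and not the workbook's actual column headers.

```python
import json
from datetime import datetime, timezone


def parse_gemini_rows(response):
    """Return (rows, total_tokens) from one generateContent response."""
    text = response["candidates"][0]["content"]["parts"][0]["text"]
    rows = json.loads(text)
    tokens = response.get("usageMetadata", {}).get("totalTokenCount", 0)
    return rows, tokens


def collect_results(responses_by_url):
    """Split per-URL responses into data rows, error rows, and metrics."""
    data, errors, total_tokens = [], [], 0
    for url, response in responses_by_url.items():
        try:
            rows, tokens = parse_gemini_rows(response)
            tagged = [{"source_url": url, **row} for row in rows]
        except (KeyError, IndexError, TypeError, json.JSONDecodeError) as exc:
            # Malformed or empty responses go to the error log with a timestamp
            errors.append({
                "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
                "url": url,
                "error": f"{type(exc).__name__}: {exc}",
            })
            continue
        total_tokens += tokens
        data.extend(tagged)
    dashboard = {
        "urls_total": len(responses_by_url),
        "urls_failed": len(errors),
        "rows_extracted": len(data),
        "total_tokens": total_tokens,
    }
    return data, errors, dashboard


# Illustrative responses: one valid, one with no candidates
sample = {
    "https://example.com/a": {
        "candidates": [
            {"content": {"parts": [{"text": '[{"title": "A", "price": "10"}]'}]}}
        ],
        "usageMetadata": {"totalTokenCount": 120},
    },
    "https://example.com/b": {"candidates": []},
}
data, errors, dashboard = collect_results(sample)
```

The three return values map loosely onto the DATA, ERROR_LOG, and DASHBOARD sheets.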

### How to Use
1. Install xlwings Lite add-in in Excel
2. Configure MASTER sheet: add Jina and Gemini API keys, select model (e.g., gemini-2.5-flash)
3. Define extraction columns in COLUMN_INPUTS: column names, descriptions, filtering instructions
4. Add target URLs in URL_LIST
5. Run `scrape_urls_from_list` from the xlwings tab
6. Review results in DATA, DASHBOARD, ERROR_LOG sheets
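
To make the per-URL status in URL_LIST concrete, here is a minimal sketch of a batch loop that keeps going when one URL fails. It is not the body of `scrape_urls_from_list`. The `run_batch` function, the status strings, and the fake fetch and extract helpers are all hypothetical, and the network calls are injected as plain functions so the loop can run offline.

```python
def run_batch(urls, fetch_markdown, extract_rows):
    """Process each URL in turn and return (rows, statuses).

    fetch_markdown(url) -> str and extract_rows(markdown) -> list[dict]
    are passed in so the loop can be exercised without network access.
    """
    rows, statuses = [], []
    for url in urls:
        try:
            markdown = fetch_markdown(url)
            extracted = extract_rows(markdown)
        except Exception as exc:  # one bad URL should not stop the batch
            statuses.append((url, f"ERROR: {exc}"))
            continue
        rows.extend(extracted)
        statuses.append((url, f"OK ({len(extracted)} rows)"))
    return rows, statuses


def fake_fetch(url):
    # Stand-in for the Jina call
    if "broken" in url:
        raise RuntimeError("fetch failed")
    return f"# Page for {url}"


def fake_extract(markdown):
    # Stand-in for the Gemini call
    return [{"title": markdown.lstrip("# ")}]


rows, statuses = run_batch(
    ["https://example.com/ok", "https://example.com/broken"],
    fake_fetch,
    fake_extract,
)
```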

## Resources
- Jina AI Web Scraping API: https://jina.ai/api-dashboard/reader
- Google Gemini API (structured output): https://ai.google.dev/gemini-api/docs/structured-output?lang=rest
- xlwings Lite: https://lite.xlwings.org
