# xlwings Lite : Coding Guidelines for AI Coders (Humans Welcome)

# SYSTEM CONSTRAINTS (MUST FOLLOW)
These are the absolute highest priority rules. You must verify your code against these before outputting.
1. **NO INDEXES:** Always use `.options(index=False)` when writing DataFrames. `sheet.value = df` is FORBIDDEN.
2. **ROBUST SHEET DELETION:** Never assume a sheet doesn't exist. Always check and delete before creating: `if s.name == 'MySheet': s.delete()`.
3. **NO AUTO-FIT:** Never call `.autofit()`. It is not supported.
4. **NO RGB TUPLES:** Always use hex strings (e.g., '#F0F0F0') for colors.

### Last Update date: 30th NOVEMBER 2025

## 1. Introduction
This document provides coding guidelines for xlwings Lite - a browser-based Python environment running inside Excel.

## 2. Execution Context (Where Your Code Runs)
**YOUR CODE RUNS IN:**
- **File:** `main.py` inside the xlwings Lite add-in task pane (right side of Excel)
- **Environment:** Pyodide (Python compiled to WebAssembly, runs in browser sandbox)
- **Trigger:** User clicks play button or presses F5 to execute `@script` decorated functions
- **Access:** Your code has direct read/write access to the active workbook via `book: xw.Book`
- **Output:** `print()` statements appear in the console log below the code editor
- **Dependencies:** Managed via `requirements.txt` tab in the same task pane

## 3. AI Coder Checklist: Core Directives for All Scripts

**Golden Rules: These 20 directives are non-negotiable and MUST be applied in every script.**

1. **ALWAYS** use the `find_table_in_workbook()` helper to locate tables.
2. **ALWAYS** use `.options(index=False)` when writing DataFrames.
3. **ALWAYS** use hex strings (e.g., '#F0F0F0') for colors, **NEVER** RGB tuples which will raise `ValueError`.
4. **ALWAYS** make custom functions (`@func`) robust. Use `typing.Any` for arguments from cell references and handle data conversions safely inside the function. **NEVER** rely on specific type hints like `float` or `str` for cell inputs, as this will cause `#VALUE!` errors. See section `6.7` for the mandatory pattern.
5. **CRITICAL: NEVER** use `.expand()` on newly written data. It runs too fast and **WILL** fail with an `IndexError`. **ALWAYS** define ranges explicitly using `.resize(df.shape[0] + 1, df.shape[1])`). This is the most common point of failure.
6. **ALWAYS** wrap fallible operations like `sheet.tables.add()` in `try...except`.
7. **CRITICAL: NEVER** access a table's parent sheet with `.sheet` (e.g., `my_table.sheet`). It will fail. The helper from Rule #1 **MUST** return both the sheet and the table (`sheet, table = find_table...`). See Section 3.1 for the mandatory pattern.
8. **CRITICAL FOR SUMMARY TABLES:** When creating new tables from summary data, ensure the main descriptor column (e.g., 'Value') contains only strings. A column mixing text ('< 10k') and numbers (10000) **WILL** cause a silent GeneralException when `.tables.add()` is called. **ALWAYS** explicitly convert all values in such columns to strings (e.g., `str(value)`) before writing the data. See Section 3.1 for detailed examples of preventing mixed-type column failures.
9. **CRITICAL FOR PIVOTED SUMMARIES:** When a summary requires reshaping (e.g., turning rows into columns with `.unstack()`), you **MUST** use the robust iterative 'list-of-dicts' pattern. A direct `.groupby().agg()` is only safe for simple, non-reshaped summaries.
10. **ALWAYS USE `@script` FOR RUNNABLES:** Any Python function that users may execute via the xlwings Lite task pane **MUST** be decorated with `@script` — even if it's called elsewhere. Treat each user-facing functionality as a standalone runnable. **NEVER** omit `@script` on functions meant for direct execution.
11. **CRITICAL: NEVER** write a 2D Python list to a single starting cell (e.g., `sheet["A10"].value = [["Label", value]]`). This operation is unstable in the xlwings Lite API and is the known cause of silent InvalidArgument errors that Python cannot detect. **ALWAYS** write small, non-DataFrame data atomically (one cell at a time). See Section 3.4 for the mandatory pattern.
12. **CRITICAL FOR PERSISTENT UI ERRORS:** When a script consistently fails with a generic InvalidArgument error and produces partial or no output (e.g., only the first of several tables appears), especially if the user reports "it was working fine and suddenly stopped," you **MUST** stop suggesting code changes. Your immediate next step is to ask the user a targeted diagnostic question: "Have you recently manually renamed an output sheet that was created by a previous run of this same script?" This is a known cause of uncatchable API errors, and the only solution is user action. See Section 7.5.1 for the full explanation and user guidance.
13. **CRITICAL FOR SILENT UI ERRORS:** If you see a generic error in the Excel UI (e.g., InvalidArgument or Invalid parameter out of range) but no specific Python error in the logs, it's almost always a data type issue during the write operation. To prevent this, you **MUST** explicitly convert any DataFrame column that contains mixed data types (e.g., numbers and text) to a single, consistent type (e.g., `.astype(str)`) BEFORE writing it to Excel. See Section 7.5.2 for the full explanation.
14. **CRITICAL: PREVENT InvalidArgument ERRORS BY AVOIDING MIXED DATA TYPES.** Never mix data types within a DataFrame column. The most common cause of a silent InvalidArgument error is a column containing both numbers and empty strings (`''`). **ALWAYS** use `None` for missing numbers, not `''`. When a column must contain text and numbers (e.g., '<10k', 10000), you **MUST** convert the entire column to string via `.astype(str)` before writing. See Section 7.5.1 for the full troubleshooting guide on this error.

    Example:
    ```python
    # INCORRECT (WILL FAIL with InvalidArgument): Mixes float and string
    df['P-Value'] = [0.05, '', 0.05, ''] # -> dtype: object

    # CORRECT (Robust): Uses None, keeps column numeric
    df['P-Value'] = [0.05, None, 0.05, None] # -> dtype: float64

    # ALSO CORRECT (Robust): Explicitly converts to string for display
    df['Value'] = ['<600', 600, 700]
    df['Value'] = df['Value'].astype(str) # -> All values are strings
    ```

15. **CRITICAL: NEVER USE `sheet.autofit()` - IT'S NOT SUPPORTED.** Will raise `NotImplementedError`. Use `sheet.range().column_width = value` or rely on Excel's default formatting.

16. **CRITICAL FOR SAMPLING: ALWAYS use `sklearn.model_selection.train_test_split` with the `stratify` parameter for creating balanced samples** (e.g., Test/Control). This is the mandatory gold standard for ensuring representative samples in data science workflows.

17. **CRITICAL TO PREVENT InvalidArgument ERRORS: The `sheet.tables.add(name=...)` API call is unstable when used repeatedly in a single script run.** While naming one table is often safe, creating multiple named tables in a loop or in quick succession on the same sheet **WILL** lead to silent crashes. To balance functionality with stability, follow this logic:
   - **WHEN TO NAME (The Exception):** If the user provides a specific name for a table (e.g., "create a table named LEADS_CAT"), you **SHOULD** use the name parameter. This is considered a low-risk, single-table operation and is often essential for the workflow.
   - **WHEN TO OMIT (The Default):** If the user does not provide a specific name, OR if you are creating multiple tables on a single report sheet (e.g., a numeric and categorical profile on one audit sheet), you **MUST OMIT** the name parameter. This is the safest default behavior. Let Excel assign the names (Table1, Table2).
   - **Brittle (Will Fail in Loops/Multi-Table Reports):** `sheet.tables.add(source=my_range, name=f"DynamicName_{i}")`
   - **Robust (Always Works):** `sheet.tables.add(source=my_range)`

18. **ALWAYS `activate()` the last output sheet created or modified, where applicable, to enhance user experience.** This ensures users immediately see the results of the script execution instead of remaining on the original data sheet.

19. **CRITICAL: ROBUST NUMERIC CONVERSION AND IMPUTATION FOR STATISTICAL CONSISTENCY.** When converting columns to numeric types and subsequently calculating statistics (like median, mean, standard deviation) for purposes such as imputation or stratification, ensure the `pd.to_numeric(errors='coerce')` operation is completed and materialized before calculating the statistic. Chaining `.fillna(df['col'].median())` directly after `pd.to_numeric()` in a single line can lead to the `.median()` method (or other statistical functions) operating on an intermediate object dtype Series that has not yet had all non-numeric values reliably converted to `np.nan`. This can result in subtly incorrect imputed values or stratification bins, leading to inconsistent analytical results. **Always convert, then calculate statistics, then impute.**

Example:
```python
# INCORRECT (Potential Pitfall: Median calculation may be unreliable due to chaining):
# df['MyNumericColumn'] = pd.to_numeric(df['MyNumericColumn'], errors='coerce').fillna(df['MyNumericColumn'].median())

# CORRECT (Robust and Mandatory for reliable imputation/data preparation):
df['MyNumericColumn'] = pd.to_numeric(df['MyNumericColumn'], errors='coerce') # Step 1: Ensure column is fully numeric with NaNs

# Apply imputation logic based on requirement (e.g., conditional median, fixed value, or mean)
if df['MyNumericColumn'].isnull().any(): # Optional: only calculate/fill if NaNs are present
    impute_value = df['MyNumericColumn'].median() # Step 2: Calculate median (or mean) on the now-clean, numeric column
    df['MyNumericColumn'].fillna(impute_value, inplace=True) # Step 3: Fill NaNs
# For a fixed fill value (e.g., 0 for inquiry):
# df['InquiryColumn'].fillna(0, inplace=True)
```

20. **CRITICAL: SEPARATE DATA CALCULATION FROM EXCEL I/O.** For complex scripts involving multi-step data processing and reporting, **ALWAYS** separate the calculation and transformation logic (e.g., creating summary DataFrames, running statistical tests) from the Excel interaction logic (e.g., writing DataFrames to ranges, adding tables, formatting cells).
   - **Calculation Functions:** Should take DataFrames/parameters and **RETURN** transformed DataFrames or Python data structures. They should **NOT** directly interact with `xw.Sheet` or `xw.Range` objects.
   - **I/O Functions (or blocks):** Should take the calculated DataFrames/structures and a `xw.Sheet`/`xw.Range` object, then perform the writing and formatting operations.
   - This separation enhances testability, modularity, readability, and simplifies debugging by clearly isolating data processing errors from Excel API interaction errors.

21. **ALWAYS ENSURE RE-RUNNABILITY:** Before creating a new sheet with `book.sheets.add('Name')`, you **MUST** explicitly check if a sheet with that name already exists and delete it. Use a robust loop-based check (`if s.name == name: s.delete()`) rather than `if name in book.sheets`, which can be flaky.

### 3.1 Script Robustness & Reliability
> - **Robustly Locate Excel Tables:**
>   **CRITICAL:** The xlwings Lite Table object has no `.sheet` attribute. You cannot get a table's parent sheet from the table object itself. Relying on `book.sheets.active` is also unreliable.
>
>   Therefore, to access a table and its parent sheet, you **MUST** include and use the following helper function. This function's signature—returning both the sheet and the table—is mandatory.
>
>   ```python
>   # THIS HELPER FUNCTION IS MANDATORY FOR ALL SCRIPTS ACCESSING TABLES. The import and full type hints make this snippet robust and self-contained. Use it asis copy-paste.
>
>   from typing import Tuple, Optional, Any
>
>   def find_table_in_workbook(book: xw.Book, table_name: str) -> Tuple[Optional[xw.Sheet], Any]:
>   """
>   Searches all sheets for a table and returns both the sheet and table objects.
>   Returns: (xw.Sheet, xw.Table) or (None, None) if not found.
>   """
>   for sheet in book.sheets:
>       if table_name in sheet.tables:
>           return sheet, sheet.tables[table_name]
>   return None, None
>   ```
>
>   **Mandatory Usage Pattern:**
>   - **Incorrect (WILL FAIL):** `table = find_table(...) followed by table.sheet`
>   - **Correct (Robust):** `source_sheet, my_table = find_table_in_workbook(book, 'MyTable')`

> - **Choosing the Correct Pattern for Summary Tables: Robustness vs. Directness**
>   Creating summary tables is a common task, but it requires choosing the right pandas pattern to avoid errors. There are two primary methods. Using the wrong one for the task is a primary source of KeyError and ValueError.
>
>   **A. The Direct Method (.groupby().agg()): For Simple, Non-Pivoted Summaries**
>   This method is efficient and safe only when the output of the aggregation does not need to be reshaped.
>   Use this when: You need a simple summary where the grouping variable remains as the index (or a column after .reset_index()).
>   Example: Calculating average balance per lead category.
>   ```python
>   # SAFE AND CORRECT for this use case.
>   # The output structure is simple and predictable.
>   summary_df = df.groupby('LEADS_CAT').agg(
>       Count=('CUST_ID', 'count'),
>       Avg_Balance=('CURR_BAL', 'mean')
>   ).reset_index()
>
>   # This result is stable and can be written directly to Excel.
>   ```
>
>   **B. The Robust Iterative Method (list-of-dicts): Mandatory for Pivoted/Reshaped Summaries**
>   This method **MUST** be used whenever the final report structure requires reshaping, such as turning unique row values into columns (pivoting). This is common in comparison reports (e.g., Test vs. Control).
>   Use this when: The final table's columns are derived from the values of a column in the source data (e.g., 'TEST' and 'CONTROL' columns derived from the 'GROUP' column).
>   Example: Creating the Test vs. Control numeric profile.
>   ```python
>   # BRITTLE - DO NOT USE: The .unstack() call is fragile and will fail if a group is missing.
>   # failed_df = df.groupby(['GROUP', 'Variable']).agg(...).unstack()
>
>   # ROBUST AND MANDATORY for this use case.
>   report_rows = []
>   for var in numeric_vars:
>       # Safely get stats for each group
>       test_stats = df[df['GROUP'] == 'TEST'][var].describe()
>       control_stats = df[df['GROUP'] == 'CONTROL'][var].describe()
>
>       # Build a dictionary row-by-row, which is predictable and safe.
>       report_rows.append({
>           'Variable': var,
>           'Metric': 'mean',
>           'Test_Group_Value': test_stats.get('mean', 0),
>           'Control_Group_Value': control_stats.get('mean', 0)
>       })
>
>   # The final DataFrame is built from a simple, stable list structure.
>   final_df = pd.DataFrame(report_rows)
>   ```

> - **Prevent Mixed-Type Column Failures in New Tables:** The `.tables.add()` command can silently fail if a column in the source data contains mixed types (e.g., strings and numbers). This is common in summary reports where a 'Value' column might contain labels like '< 600' and numbers like 600. To prevent this, always ensure such columns are converted to a single, consistent type (str is safest) before writing to Excel.
>
>   **Incorrect (WILL CRASH):** The Value key has mixed types.
>   ```python
>   report_rows = [
>       {'Metric': 'BScore', 'Value': '< 600', 'Count': 10},
>       {'Metric': 'BScore', 'Value': 600, 'Count': 5}, # <-- This number will cause a crash
>   ]
>   df = pd.DataFrame(report_rows)
>   sheet["A1"].value = df
>   # This next line will likely cause a GeneralException
>   sheet.tables.add(source=sheet["A1"].expand())
>   ```
>
>   **Correct (Robust):** All values are explicitly converted to strings.
>   ```python
>   report_rows = [
>       {'Metric': 'BScore', 'Value': '< 600', 'Count': 10},
>       {'Metric': 'BScore', 'Value': str(600), 'Count': 5}, # <-- Safely converted to string
>   ]
>   df = pd.DataFrame(report_rows)
>   sheet["A1"].value = df
>   # This will now work reliably
>   sheet.tables.add(source=sheet["A1"].resize(df.shape[0] + 1, df.shape[1]))
>   ```

> - **Robustly Creating Quantile Bins (Tertiles, Deciles, etc.)**
>   A common data preparation step for stratification or analysis is binning a numeric column into quantiles (e.g., tertiles, deciles) using `pandas.qcut()`. This function is "brittle" by default and will crash with a `ValueError: Bin edges must be unique` if the column has too few unique values to create the requested number of bins. This is a common occurrence with real-world data.
>
>   To prevent this script-halting error, you **MUST** use the `duplicates='drop'` parameter. This tells pandas to gracefully create fewer bins if necessary, rather than crashing.
>
>   ```python
>   # INCORRECT (Brittle): This will crash if df['SCORE'] has few unique values.
>   # This line WILL FAIL on certain datasets.
>   df['score_tertile'] = pd.qcut(df['SCORE'], 3, labels=False)
>
>   # CORRECT (Robust): This is the mandatory, professional pattern.
>   # By adding duplicates='drop', the script becomes robust to any data distribution.
>   df['score_tertile'] = pd.qcut(df['SCORE'], 3, labels=False, duplicates='drop')
>   ```

> - **Ensure Type Consistency Within Report DataFrames**
>   When building DataFrames from lists of dictionaries for reports, be vigilant about data types. A common failure pattern is mixing numbers and strings in the same column.
>
>   **Incorrect (Brittle):** The P-Value key gets mixed types.
>   ```python
>   # This will create a column with mixed floats and strings, causing a crash.
>   report_rows = [
>       {'Metric': 'mean', 'Value': 105.3, 'P-Value': 0.04},
>       {'Metric': 'std', 'Value': 15.1, 'P-Value': ''}, # <-- Fails here!
>   ]
>   df = pd.DataFrame(report_rows)
>   # This write operation will fail with 'invalid argument'.
>   sheet["A1"].value = df
>   ```
>
>   **Correct (Robust):** Use None for missing numeric values.
>   ```python
>   # This creates a clean numeric column with NaN for missing values.
>   report_rows = [
>       {'Metric': 'mean', 'Value': 105.3, 'P-Value': 0.04},
>       {'Metric': 'std', 'Value': 15.1, 'P-Value': None}, # <-- Safe!
>   ]
>   df = pd.DataFrame(report_rows)
>   # This write will succeed.
>   sheet["A1"].value = df
>   ```

### 3.2 Formatting & Readability
> - **Ensure Visible Headers:** When setting a background color for a cell or range (`.color`), you **MUST** also explicitly set a contrasting font color (`.font.color`) in the same step. For a light background, use a dark font. **CRITICAL:** Only use hex color strings (e.g., '#F0F0F0'), as RGB tuples are not supported and will raise a `ValueError`.
>   - **Incorrect (Will Raise ValueError):** `header_range.color = (240, 240, 240)`
>   - **Incorrect (Unreadable):** `header_range.color = '#F0F0F0'  # Missing font color`
>   - **Correct (Always Readable):**
>     ```python
>     # ALWAYS use hex strings for both background and font colors.
>     # RGB tuples for .color are not supported and WILL cause a ValueError.
>     header_range.color = '#F0F0F0'         # Light gray background
>     header_range.font.color = '#000000'    # Black text
>     ```
>
> - **Use Clean Column Names:** Before writing a DataFrame to Excel, proactively rename columns for professional presentation (e.g., `df.rename(columns={'raw_name': 'Clean Name'})`).
>
> - **Narrate the Script's Progress:** Use descriptive `print()` statements at each major step of the script. This gives the user confidence and critical information if something goes wrong.
>
> - **Create Formal Excel Tables:** When writing a DataFrame (especially a summary) to a new region on a sheet, you **MUST** convert it into a formal Excel Table. Simply writing the data and coloring the header is insufficient and produces unprofessional results.
>   - **CRITICAL: Range Sizing:** Define the table's range explicitly using `.resize()` with the DataFrame's shape (`df.shape[0] + 1` for rows, `df.shape[1]` for columns). **NEVER** use `.expand()` on newly written data as it runs too fast and **WILL** fail with an `IndexError` before Excel can register the data.
>   - **Incorrect (WILL Fail):** `range_to_format = sheet["B2"].expand('down')`
>   - **Correct (Always Works):** `range_to_format = sheet["B2"].resize(df.shape[0] + 1, df.shape[1])`
>   - Wrap in `try...except`: As a fallible operation, the `sheet.tables.add()` call must be wrapped in a `try...except` block to ensure the script doesn't halt if table creation fails.
>   - For a best-practice implementation, see the table creation logic in the `XGBoost Response Model` script.

### 3.3 Known Limitations to Acknowledge
> - **Font Properties:** Font properties (`bold`, `italic`, `color`, `size`, `name`) can be **set**, but they cannot be **read**.
>
> - **Custom Script Arguments:** Custom scripts (decorated with `@script`) can only accept a single argument: `book: xw.Book`.
>
> - **No Direct API Access:** The `.api` property is not available.

### 3.4 Robust Data Writing: Preventing Silent InvalidArgument API Errors

#### The Problem: "Ghost" Errors After Successful Logs
A particularly difficult bug in xlwings Lite occurs when the Python console log shows that a script has completed successfully (✅ SUCCESS...), but a generic InvalidArgument error still appears in the Excel task pane UI. This happens when Python successfully sends a command, but the underlying Excel JavaScript API fails to execute it.

This error is most commonly triggered when writing small, non-DataFrame data structures, like Chi-Square statistics or summary values.

#### The Root Cause: Unstable vs. Stable Write Operations
The core issue lies in the distinction between two types of write operations:

- **Stable (DataFrame Writes):** xlwings has a highly optimized and robust converter for writing pandas DataFrames (`sheet["A1"].value = my_dataframe`). This is the professional standard for all tabular data and is proven to be reliable.

- **Unstable (2D Python List Writes):** The converter for native Python 2D lists (e.g., `[["Label", value]]`) has a bug in the Lite version when asked to auto-expand from a single starting cell. This operation's success is unreliable and can fail depending on the complexity of prior operations on the sheet, leading to the silent InvalidArgument error.

#### Mandatory Pattern: Use the Right Tool for the Write
To avoid this entire class of bugs, all write operations **MUST** adhere to the following patterns.

**For Tabular Data** (any data that is or can be a pd.DataFrame):
- **ALWAYS** write the entire DataFrame object in a single, efficient operation.
```python
# CORRECT AND PROFESSIONAL (Fast, Reliable)
# This pattern is used for writing the main numeric and categorical profile tables.
summary_df = pd.DataFrame(...)
sheet["A1"].value = summary_df
```

**For Small, Non-DataFrame Data** (e.g., summary stats, key-value pairs):
- **NEVER** use the unstable 2D list write.
- **ALWAYS** write this data one cell at a time. This is the only guaranteed-reliable method for this specific use case.
```python
# INCORRECT (Unstable, known to cause silent API errors)
# This was the exact cause of the bug in the Chi-Square stats write.
# stats_list = [["Chi-Square Statistic:", 1.6072], ["P-Value:", 0.8075]]
# sheet["A20"].value = stats_list

# CORRECT AND PROFESSIONAL (Always Reliable)
# This pattern is now the mandatory way to write non-DataFrame data.
chi2_value = 1.6072
p_value = 0.8075
sheet["A20"].value = "Chi-Square Statistic:"
sheet["B20"].value = chi2_value
sheet["A21"].value = "P-Value:"
sheet["B21"].value = p_value
```

By strictly distinguishing between these two data structures and using the correct, stable write method for each, we can ensure our scripts are robust and professional.

### 3.5 Best Practice: Separate Calculation from I/O
For complex scripts, adopt a two-phase structure to improve robustness and simplify debugging:

#### Phase 1: Calculation
Perform all data loading, cleaning, analysis, and DataFrame creation in memory. Use `print()` statements to log progress. At the end of this phase, you should have all your final DataFrames ready.

#### Phase 2: Writing
In a single, final block of code, write all the prepared DataFrames and values to Excel.

This separation prevents a failure during an early write operation from leaving the workbook in a partially updated, corrupted state. It also makes it easier to identify whether an error is in your Python logic or in the interaction with Excel.

Example:
```python
@script
def analyze_portfolio(book: xw.Book):
    # PHASE 1: CALCULATION
    print("📊 Loading and analyzing data...")

    # Load data
    sheet, table = find_table_in_workbook(book, 'Portfolio')
    df = table.range.options(pd.DataFrame, index=False).value

    # Perform all calculations
    summary_stats = calculate_summary_stats(df)
    risk_metrics = calculate_risk_metrics(df)
    allocation_df = calculate_allocations(df)

    print("✅ All calculations complete.")

    # PHASE 2: WRITING
    print("📝 Writing results to Excel...")
    try:
        # Create results sheet
        results_sheet = book.sheets.add('Portfolio_Analysis')

        # Write all results in sequence
        results_sheet["A1"].value = "Portfolio Analysis Results"
        results_sheet["A3"].value = summary_stats
        results_sheet["A10"].value = risk_metrics
        results_sheet["A20"].value = allocation_df

        print("✅ All results written successfully.")

    except Exception as e:
        print(f"❌ Error writing results: {e}")
```

## 4. Compatibility
Supported: Windows/macOS (Microsoft 365, Office 2021+), Excel on the Web (any modern browser).

## 6. Custom Functions
### 6.1 Basic Syntax
```python
from xlwings import func

@func
def hello(name):
    return f"Hello {name}!"
```
Call in Excel with: `=HELLO("World")` or `=HELLO(A1)`

### 6.2 Working with DataFrames
```python
import pandas as pd
from xlwings import func, arg, ret

@func
@arg("df", pd.DataFrame)
@ret(index=False, header=False)
def correl2(df):
    return df.corr()
```

### 6.3 Type Hints Support
```python
from xlwings import func
import pandas as pd

@func
def myfunction(df: pd.DataFrame) -> pd.DataFrame:
    return df
```

### 6.4 Variable Arguments
```python
from xlwings import func, arg

@func
@arg("*args", pd.DataFrame, index=False)
def concat(*args):
    return pd.concat(args)
```

### 6.5 Documentation
```python
from xlwings import func, arg

@func
@arg("name", doc='A name such as "World"')
def hello(name):
    """This is a classic Hello World example"""
    return f"Hello {name}!"
```

### 6.6 Date/Time Handling
```python
import datetime as dt
from xlwings import func

@func
@arg("date", dt.datetime)
def process_date(date):
    return date

# For multiple dates in a range
import xlwings as xw
@func
def process_dates(dates):
    return [xw.to_datetime(d) for d in dates]

# For DataFrames with dates
@func
@arg("df", pd.DataFrame, parse_dates=[0])
def timeseries_start(df):
    return df.index.min()
```

### 6.7 Robustness and Debugging for Custom Functions

**CRITICAL FOR AI CODERS:** The most common point of failure for custom functions (`@func`) is the `#VALUE!` error. This is almost always caused by a data type mismatch between the Excel cell and the function's argument type hint. This "pre-emptive type conversion" by the xlwings engine occurs *before* your Python code's `try...except` block can catch the error.

#### 6.7.1 The Mandatory Pattern for Robust Custom Functions

To prevent these failures, **all functions that accept arguments from Excel cell references MUST follow this pattern**:

1.  **Use `typing.Any` as the type hint.** This instructs the xlwings engine to pass the value as-is, without attempting a risky pre-conversion.
2.  **Perform data type conversions safely *inside* the function** using a robust helper function.

**Incorrect (Brittle) Approach:** Fails if `value` is blank or text.
```python
@func
def my_function(value: float):
    # This code is never reached if the pre-conversion fails.
    return value * 2
```

**Correct (Robust) Approach:** Handles any input gracefully.
```python
from typing import Any

def _to_float(value: Any, default_if_error: float) -> float:
    """Safely converts any value (int, str, None) to a float."""
    if value is None:
        return default_if_error
    try:
        return float(value)
    except (ValueError, TypeError):
        return default_if_error

@func
def my_function(value: Any):
    # The conversion is now handled safely inside our code.
    numeric_value = _to_float(value, 0.0)
    return numeric_value * 2
```

#### 6.7.2 Troubleshooting #VALUE! in Custom Functions
Follow this exact debugging sequence:
1. Check for Syntax Errors: First, confirm the function is recognized by Excel's autocomplete. If not, there is likely a syntax error in main.py (e.g., a misplaced import) preventing the file from loading. Test by replacing the entire file with a minimal function.

2. Use the Minimal Debug Function: To diagnose data type issues, use this universal test function. It bypasses all conversion issues and reports exactly what Python receives.

```python
from typing import Any

@func
@arg("CELL_VALUE", doc="A single cell to test.")
def final_debug_test(CELL_VALUE: Any) -> str:
    """Receives any value and reports its type and string representation."""
    try:
        value_type = type(CELL_VALUE).__name__
        str_value = str(CELL_VALUE)
        return f"Success! Type is '{value_type}', Value is '{str_value}'"
    except Exception as e:
        return f"Error: {e}"
```

3. Analyze and Implement: Apply the `=FINAL_DEBUG_TEST(A1)` formula. The output (Success! Type is 'int', ...) will reveal the data types. Refactor the failing function using the mandatory robust pattern from section 6.7.1.

### 6.8 Why Custom Scripts (@script) Avoid This Issue
This data type issue primarily affects custom functions (`@func`) and not scripts (`@script`) due to their fundamentally different data processing models:

| Aspect | Custom Script (@script) | Custom Function (@func) |
|--------|--------------|-------|
| Data Flow | Bulk Operation (entire tables/ranges) | Scalar Operation (one cell per argument) |
| Conversion Engine | pandas DataFrame converter | Direct "Argument Marshalling" Bridge |
| Robustness | High. Pandas is designed to handle messy, mixed-type data and infer column dtypes without crashing. | Low (by default). Prone to failure if a cell's type doesn't match the argument's type hint. |
| Solution | Use pd.to_numeric after loading data into a DataFrame. | Use the typing.Any pattern to handle conversion manually. |

## 7. Custom Scripts
### 7.1 Basic Syntax
Custom Scripts in xlwings Lite are Python functions that run at the click of a button and have access to the Excel object model. They are equivalent to VBA Subs or Office Scripts.

```python
import xlwings as xw
from xlwings import script

@script
def hello_world(book: xw.Book):
    sheet = book.sheets[0]
    sheet["A1"].value = "Hello xlwings!"
```

### 7.2 Running Scripts
- Click the run button or press F5 in the xlwings Lite add-in
- Select different scripts from the dropdown menu
- Changes to script names automatically update in the dropdown

### 7.3 Sheet Buttons
- Create buttons using Excel shapes with hyperlinks
- Name the shape in the name box (e.g., `xlwings_button`)
- Link the shape to a cell behind it (e.g., `B4`)
- Configure the script with:
```python
@script(button="[xlwings_button]Sheet1!B4", show_taskpane=True)
def hello_world(book: xw.Book):
    # your code here
```
Note: Button clicks change cell selection, so don't use for scripts that depend on selected cells.

### 7.4 Configuration Options
```python
@script(
    include=["Sheet1", "Sheet2"],  # Only include these sheets' content
    exclude=["BigData"],  # Exclude these sheets' content
    button="[mybutton]Sheet1!A1",  # Sheet button configuration
    show_taskpane=True  # Show taskpane when button clicked
)
```

### 7.5 Tips and Troubleshooting
- Use `include`/`exclude` to limit data transfer for large workbooks
- Only include sheets needed by your script
- Don't select the linked cell initially
- Verify button name in script decorator matches exactly
- Restart xlwings Lite to re-register event handlers
- Excel web doesn't support adding shape hyperlinks (but works if set up in desktop)

### 7.5.1 InvalidArgument Troubleshooting (Diagnose in Order)

**Cause 1: Mixed Data Types** (Most Common)
→ See Section 3.3 for examples. Use `None` not `''`, or `.astype(str)` for mixed columns.

**Cause 2: Sheet Renaming Contamination**
Script was working, now fails after user manually renamed an output sheet (e.g., `RESULTS` → `RESULTS_v1`).
- **Fix:** Ask user to delete the renamed sheet and re-run
- **Prevention:** Use "Move or Copy → Create a copy" instead of renaming

**Cause 3: API Instability (Last Resort)**
**Symptom:** Script logs success, but Excel UI shows generic error OR **`object of type 'int' has no len()`**. This specific error indicates the Excel JS API has returned an invalid handle.

If Causes 1 & 2 don't apply, force an API sync before `tables.add()` or `.number_format`:
```python
sheet["Z1"].value = "SYNC"  # Force write
_ = sheet["Z1"].value       # Force read roundtrip
sheet.tables.add(source=my_range)  # Now this works
sheet["Z1"].value = ""      # Cleanup
```

## 8. Comprehensive Guide to Limitations & Unsupported Features
This section provides a consolidated overview of all known limitations in xlwings Lite as of June 2025. Understanding these constraints is crucial for effective development.

### 8.1 Pyodide and Environment Constraints
xlwings Lite runs on Pyodide, which imposes several environment-level restrictions:
- **Python Version**: The Python version is fixed by the specific Pyodide distribution used in the add-in.
- **Memory Limit**: There is a 2GB memory limit for the Python environment.
- **Debugging**: There is no debugger support. Use `print()` statements to the Output Pane for debugging.
- **Concurrency**: `multiprocessing` and `threading` are not supported.
- **Package Availability**: Only packages that are pure Python or have been specifically compiled for the Pyodide environment can be used. Check the [official Pyodide packages list](https://pyodide.org/en/stable/usage/packages-in-pyodide.html) for availability.
- **Network Connections**: Direct TCP/IP sockets are not available. This means:
    - No direct connections to databases like PostgreSQL, MySQL, etc. (must use a web API layer).
    - All HTTP requests are subject to browser CORS (Cross-Origin Resource Sharing) policies.

### 8.2 Detailed Unsupported API Features (Anti-Hallucination List)
The following standard xlwings methods are **NOT** implemented in Lite. Do not attempt to use them.

| Object | Unsupported Methods/Properties (Common) |
|--------|-----------------------------------------|
| **App** | `quit()`, `screen_updating`, `calculation`, `display_alerts` |
| **Book** | `save()`, `to_pdf()`, `close()` |
| **Sheet** | `copy()`, `autofit()`, `used_range` (Use `find_table` helper instead), `select()` |
| **Range** | `copy()`, `paste()`, `merge()`, `unmerge()`, `autofill()`, `formula_array`, `to_png()` |
| **Chart** | `set_source_data()` (Must recreate chart to update), `to_png/pdf` |
| **Shape** | `delete()`, `duplicate()` |
| **Table** | `table.sheet` (**Fails silently** - use `find_table_in_workbook()` helper) |
| **Font** | Can **SET** (`bold`, `color`, `size`), but **CANNOT GET/READ** any font properties |
| **General** | `.api` property (No direct access to Excel JavaScript API) |

> **General Rule:** If you get `NotImplementedError`, the feature isn't available in Lite. Check the [xlwings Lite docs](https://docs.xlwings.org/en/latest/lite.html) for current status.

### 8.3 Planned Future Enhancements
The following features are on the development roadmap but are **not yet available** as of June 2025.

- **File System Access**:
  - ❌ No local file access
  - ❌ No direct file system operations
  - 🔄 Planned: Enable access to local files

- **Development Features**:
  - ❌ No interactive Python terminal
  - ❌ No multiple Python modules
  - ❌ No external code storage
  - ❌ Limited code completion
  - ❌ No dark mode
  - 🔄 Planned: All these features in development

- **Excel Integration**:
  - ❌ No streaming functions
  - ❌ No object handles
  - ❌ Can't use Excel calculated values in same script
  - ❌ Limited formatting and charting
  - 🔄 Planned: Improved Excel object model coverage

- **Advanced Features**:
  - ❌ No Git integration
  - ❌ No Jupyter/marimo notebook support
  - ❌ No backend server option
  - ❌ Fixed Pyodide version
  - 🔄 Planned: All these features in roadmap

> **Note:** When users request unavailable features, guide them to use available workarounds, consider alternative approaches, and watch for updates in newer versions.

## 9. Connecting to External Data & APIs
This section details how xlwings Lite interacts with external data sources, including web APIs and databases. Due to its browser-based environment (Pyodide), direct database connections are not supported; all interactions must occur via web APIs.

### 9.1 Working with Web APIs
xlwings Lite supports common Python HTTP libraries and Pyodide's native `pyfetch` for making web requests.

1.  **Supported Libraries**:
    *   `requests`: For synchronous HTTP requests.
    *   `httpx`, `aiohttp`: For asynchronous HTTP requests (requires `async/await` syntax).
    *   `pyfetch`: Pyodide's native asynchronous JavaScript fetch wrapper.

    ```python
    # Synchronous with requests
    import requests
    response = requests.get("https://api.example.com/data")
    
    # Async with aiohttp
    import aiohttp
    async with aiohttp.ClientSession() as session:
        async with session.get("https://api.example.com/data") as response:
            data = await response.json()
    ```

2.  **Handling API Responses**:
    *   For FastAPI servers returning file responses, use `await response.text()` to extract content.
    *   Pipe-delimited data (common in FastAPI file responses) can be parsed by splitting lines with `.split("
")` and columns with `.split("|")`.
    *   When working with RexDB server responses, process them as plain text rather than attempting to parse as JSON.

3.  **Best Practices for Web API Requests**:
    *   Always use HTTPS for API requests.
    *   Handle errors gracefully with `try...except` blocks.
    *   Log detailed error information for debugging.
    *   Consider implementing request retries for reliability.

### 9.2 Connecting to Databases via an API Layer
Direct SQL database connections are **not supported** in xlwings Lite (no TCP sockets in browser). All database interactions must go through a web API layer.

**Options:**
- **Custom API:** Build with FastAPI/Flask for full control
- **Ready-to-Use REST APIs:** PostgREST, Supabase (PostgreSQL), Oracle ORDS, MySQL REST Service, NocoDB
- **SQLite:** Can download `.db` file and process locally with `sqlite3` (add to requirements.txt)


## 10. Security Best Practices
Security is paramount when working with xlwings Lite, especially given its browser-based execution environment. This section outlines best practices for managing sensitive information and securing your API interactions.

### 10.1 Environment Variables for Secrets
xlwings Lite runs in a secure browser sandbox and cannot directly access local system environment variables. It provides two ways to set environment variables:

-   **Add-in Scope (Recommended for Secrets)**:
    -   Stored in the browser's local storage.
    -   Available across all workbooks.
    -   Never leaves your machine.
    -   **Use for API keys and sensitive secrets.**
    -   *Note*: These are cleared when the Office cache is cleared, so make backups!

-   **Workbook Scope**:
    -   Stored directly within the current workbook.
    -   **Not recommended for secrets** as they are embedded in the file.
    -   Specific to each workbook.

**Setting Environment Variables**:
1.  In the xlwings Lite add-in, navigate to the Environment Variables settings.
2.  Provide the Name (e.g., `OPENAI_API_KEY`), Value (e.g., `your-key`), and select the desired Scope (Add-in or Workbook).
3.  Click Save.
4.  Restart xlwings Lite for changes to take effect.

**Using Environment Variables in Code**:
```python
import os
import xlwings as xw
from xlwings import func, script

@script
def sample_script(book: xw.Book):
    key = os.getenv("OPENAI_API_KEY")
    if key is not None:
        print(key)
    else:
        raise Exception("Store your OPENAI_API_KEY key under Environment Variables!")
```

**Important Notes**:
-   Add-in scope overrides Workbook scope variables if names conflict.
-   Always back up important add-in scope variables.
-   Restart xlwings Lite after setting new variables to ensure they are loaded.

### 10.2 Cross-Origin Resource Sharing (CORS)
Since xlwings Lite runs in the browser, all HTTP requests are subject to CORS policies.

**Key Points:**
- Requests originate from `https://addin.xlwings.org` (include this in your server's `Access-Control-Allow-Origin`)
- Requests go directly from user's browser to your API server (not through xlwings.org)
- For IP whitelisting, whitelist the **client's IP**, not xlwings.org servers
- Include `Authorization` header in `pyfetch` for token-based auth

## 11. Python Dependencies Management
Packages are managed via `requirements.txt` tab in the editor. Packages must be Pyodide-compatible ([check list](https://pyodide.org/en/stable/usage/packages-in-pyodide.html)).

**Version Pinning Rules (IMPORTANT):**
```python
# Pure Python packages - PIN versions
xlwings==0.33.14
requests==2.31.0

# Pyodide-provided packages - DON'T PIN (use Pyodide's version)
pandas
numpy
```

**Notes:** Restart xlwings Lite after changing packages. Private packages can use direct wheel URLs.


## 12. Latest Features (as of June 2025)
Recent updates have added several important capabilities:

1. **Self-Hosting Support** (June 2025):
   - Build custom Docker images
   - Include additional packages
   - Self-host the add-in

2. **Sheet Button Support** (May 2025):
   - Create clickable buttons on sheets
   - Configure with `button` parameter
   - Requires xlwings 0.33.14+

3. **Performance Optimizations** (May 2025):
   - `include`/`exclude` configuration for scripts
   - Control workbook data transfer
   - Optimize for large workbooks

4. **Font Formatting** (April 2025):
   - Can now set font properties:
     - bold, italic, color
     - size, name
   - Note: Cannot read font properties

5. **Polars Support** (April 2025):
   - Native converter for Polars DataFrame and Series
   - Use `import polars as pl` (standard Pyodide package)
   - `sheet["A1"].value = pl_df` works directly without converting to pandas
   - Can offer performance gains over pandas in browser environment

6. **Bug Fixes and Improvements**:
   - Better error tracebacks in output pane
   - Fixed `Range.expand()`
  
## 13. Example Scripts (Quick Reference Patterns)

> **Note:** These examples focus on xlwings Lite-specific patterns only. For full working scripts, refer to your separate reference files.

### 13.1 Core Script Pattern
```python
@script
def my_script(book: xw.Book):
    # 1. Get data from table (ALWAYS use find_table_in_workbook helper)
    sheet, table = find_table_in_workbook(book, 'MyTable')
    df = table.range.options(pd.DataFrame, index=False).value

    # 2. Process data (pure Python/pandas)
    result_df = df.groupby('Category').agg({'Value': 'sum'}).reset_index()

    # 3. Write results - delete existing sheet first for re-runnability
    for s in book.sheets:
        if s.name == 'Results': s.delete()

    new_sheet = book.sheets.add(name='Results')
    new_sheet["A1"].options(index=False).value = result_df

    # 4. Create table with EXPLICIT range sizing (NEVER use .expand() on new data)
    table_range = new_sheet["A1"].resize(result_df.shape[0] + 1, result_df.shape[1])
    try:
        new_sheet.tables.add(source=table_range)
    except Exception as e:
        print(f"Table creation failed: {e}")

    new_sheet.activate()
```

### 13.2 Formatting (Hex Colors Only)
```python
header_range.color = '#4472C4'        # CORRECT: hex string
header_range.font.color = '#FFFFFF'
header_range.font.bold = True
# WRONG: header_range.color = (68, 114, 196)  # RGB tuples raise ValueError
```

### 13.3 Async API Pattern (pyfetch for External Data)
```python
from pyodide.http import pyfetch

@script
async def fetch_external_data(book: xw.Book):
    response = await pyfetch(
        "https://api.example.com/data",
        method="GET",
        headers={"Accept": "application/json"}
    )
    if response.ok:
        data = await response.json()
        # Process and write to Excel...
    else:
        print(f"API Error: {response.status}")
```

### 13.4 Writing Non-DataFrame Data (Cell-by-Cell)
```python
# WRONG: 2D list writes are unstable in Lite
# sheet["A1"].value = [["Label", value], ["Label2", value2]]

# CORRECT: Write atomically, one cell at a time
sheet["A1"].value = "Chi-Square:"
sheet["B1"].value = chi2_value
sheet["A2"].value = "P-Value:"
sheet["B2"].value = p_value
```

### 13.5 Mixed Data Type Prevention
```python
# WRONG: Mixed types cause InvalidArgument
report_rows = [
    {'Metric': 'Score', 'Value': '< 600'},
    {'Metric': 'Score', 'Value': 600},      # Number mixed with string!
]

# CORRECT: Convert entire column to string before writing
df = pd.DataFrame(report_rows)
df['Value'] = df['Value'].astype(str)
sheet["A1"].options(index=False).value = df
```

### 13.6 Matplotlib Charts to Excel via pictures.add()
```python
import matplotlib.pyplot as plt
import tempfile
import os

# 1. Create chart and save to temp file
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(df['DATE'], df['CLOSE'])
temp_dir = tempfile.gettempdir()
chart_path = os.path.join(temp_dir, "my_chart.png")
fig.savefig(chart_path, dpi=150, bbox_inches='tight')
plt.close(fig)

# 2. Add to Excel sheet
charts_sheet.pictures.add(chart_path, name="MyChart", update=True, anchor=charts_sheet["A1"])
# - update=True: Replaces existing picture with same name (re-runnable)
# - anchor: Cell reference for top-left corner positioning
```

### 13.7 Reading Variable-Length Data with .expand('down')
```python
# .expand() is SAFE for READING existing data (problematic only when WRITING new data)
names_range = sheet.range("A3").expand('down')  # Expands down to last non-empty cell
values = names_range.value  # Returns list if multiple values, single value if one cell

# Always handle both cases
if not isinstance(values, list):
    values = [values]  # Normalize to list
```

### 13.8 Robust API Data Parsing Pattern (Pipe-Delimited)
```python
# Use this helper when handling text/pipe-delimited responses from Web APIs
# Common when using FastAPI intermediaries to avoid JSON overhead or CORS issues
def _parse_pipe_delimited(text_content):
    """Parses pipe-delimited text response into a DataFrame. Robust to empty lines."""
    lines = text_content.strip().split("\n")
    if not lines or not lines[0]:
        return pd.DataFrame()

    headers = [h.strip() for h in lines[0].split("|")]
    data_rows = [
        [cell.strip() for cell in line.split("|")]
        for line in lines[1:] if line.strip()
    ]

    df = pd.DataFrame(data_rows, columns=headers)
    return df.apply(pd.to_numeric, errors='ignore')  # Auto-convert numeric columns

# Usage:
response = await pyfetch("https://myapi.com/data")
text_content = await response.text()
df = _parse_pipe_delimited(text_content)
```

<!-- END OF SECTION 13 - Full example scripts removed. Refer to separate reference files for complete implementations. -->