How I Identify and Map Every Holding — The ISIN Mapping Process

Published: February 18, 2026

ISIN Mapping Process

Every month, each mutual fund publishes an Excel file listing every stock, bond, and instrument they hold. The challenge? These files come from 15 different fund houses, each with different formats, naming conventions, and quirks. The same company can appear under dozens of different spellings. A stock split can change an identifier overnight. And short-term debt instruments like bank CDs can have 10 separate line items for the same bank.

This document explains — step by step — how I take these messy Excel files and turn them into a clean, comparable dataset where the same company always appears as one row, no matter which fund holds it or how they spell the name.


Step 1: Building the Reference Database (ISIN Master)

Every security traded in India has a unique 12-character code called an ISIN (International Securities Identification Number), assigned by NSDL (National Securities Depository Limited). Think of it as a universal ID card for every stock, bond, and instrument.

I download two public datasets and merge them into a single reference table:

NSDL Master — The complete register of every ISIN ever issued in India. Over 361,000 records covering equities, bonds, mutual fund units, certificates of deposit, commercial papers, treasury bills, government securities, and more. Each record has the ISIN, the official NSDL name, the issuer, and the security type.

NSE Equity List — The list of all stocks currently listed on the National Stock Exchange, with their trading symbols (like RELIANCE, TCS, HDFCBANK). About 2,000 records.

I merge these by joining on the ISIN code. The result: for every Indian security, I know its official name, and for every NSE-listed equity, I also know its trading symbol.

This reference table is the backbone of the entire process. When the converter encounters an ISIN in a fund's Excel file, it looks it up here to get the canonical name and trading symbol.


Step 2: Reading the Excel Files

Each AMC (Asset Management Company) uses a different Excel layout. Column positions vary, header rows are in different places, and some AMCs have quirks like putting data across multiple sheets.

The converter handles this automatically:

  1. Format detection — AMC websites sometimes serve files with the wrong extension (a .xlsx file that's actually in the old .xls binary format, or vice versa). The converter reads the first few bytes of the file to detect the actual format, regardless of the file extension.

  2. Schema auto-detection — The converter scans each sheet row by row, looking for a header row that contains the word "ISIN". Once found, it identifies which columns contain the company name, ISIN, quantity, and market value by matching keywords in the header cells.

  3. Grand total extraction — The converter finds the "Grand Total" row in the Excel file and reads the total market value. This number is used later for validation — my calculated total must match this exactly.

For example, Axis funds have the ISIN in column C, name in column B, and market value in column F. HDFC funds have ISIN in column B, name in column D, and market value in column G. Kotak funds are completely different again. The converter handles all of these automatically, with manual overrides for the few AMCs that don't follow any standard pattern.


Step 3: Identifying Each Holding — ISIN Lookup

For every row in the Excel file, the converter takes the ISIN and looks it up in the reference database.

If the ISIN is found — The lookup returns the official NSDL name and NSE symbol. This is the most common case for equity holdings.

For example, fund houses might spell the same company three different ways:

But they all report the same ISIN: INE009A01021. Looking this up in the NSDL master gives us the canonical name "INFOSYS LIMITED" and the NSE symbol "INFY". Problem solved — all three are now identified as the same company.

If the ISIN is not in the master — This can happen for very recent issuances or private placements. The converter falls back to the name as written in the Excel file.

If there is no valid ISIN at all — Many non-equity items (cash, TREPS, net receivables, futures, margins) don't have standard ISINs. These get assigned a placeholder code IN9999999999 and are handled separately (see Step 6 below).


Step 4: Understanding ISIN Structure — How I Detect Corporate Actions

An Indian ISIN isn't just a random code. It has a specific structure that encodes information about the security:

INE 238A 01 02 8
│   │    │  │  └─ Check digit
│   │    │  └──── Serial number (increments on corporate actions)
│   │    └─────── Security type (01 = equity, D6/16 = CD, 14 = CP)
│   └──────────── Issuer code (unique per company)
└──────────────── Country code + issuer type

Real example — Kotak Mahindra Bank face value split:

When a company does a stock split, bonus issue, or rights issue, NSDL issues a new ISIN with the serial number incremented by 1. The issuer code stays the same because it's still the same company. This is how I detect corporate actions automatically.

More real examples from my data:


Step 5: The Name-Cut Process — Finding Potential Duplicates

Corporate action detection relies on two ISINs being from the same company. But how do I even know which ISINs to compare? Comparing every ISIN against every other ISIN would be impractical with 7,000+ holdings.

This is where the name-cut process comes in.

How it works:

  1. Take the company name from the NSDL master (or the Excel file)
  2. Convert to lowercase and remove all spaces, punctuation, and special characters
  3. Take the first 7 characters

This gives us a "name-cut" — a short fingerprint of the company name.

Examples:

Original Name After Cleanup Name-Cut (7 chars)
AXIS BANK LIMITED axisbanklimited axisban
AXIS BANK CD 08JAN26 axisbankcd08jan26 axisban
KOTAK MAHINDRA BANK kotakmahindrabank kotakma
KOTAK MAHINDRA PRIME kotakmahindraprime kotakma
GOVERNMENT OF INDIA 364D TBILL governmentofindia364dtbill governm

Why 7 characters? It's a deliberate balance. Short enough to catch the same company across different instrument types (Axis Bank equity and Axis Bank CDs both become "axisban"). Long enough to mostly avoid false positives between unrelated companies.

Grouping by name-cut: All ISINs with the same 7-character name-cut are placed into the same group. These are the candidates for being "the same company under different ISINs."

Within each group, the algorithm does a deeper analysis using the ISIN structure:

5a. Same Issuer Code + Equity Type + Different Serial → Corporate Action

If two ISINs in the group have the same issuer code (positions 4–7) and both are equity type (01), but different serial numbers — this is a corporate action. The newer serial is the current ISIN; the older one is mapped to it.

5b. Same Issuer Code + CD Type → Certificate of Deposit Aggregation

If multiple ISINs share the same issuer code but have security type D6 or 16 (Certificate of Deposit), they are separate CD issuances from the same bank. These get aggregated into a single synthetic entry.

5c. Same Issuer Code + CP Type → Commercial Paper Aggregation

Same logic for security type 14 (Commercial Paper).

5d. Different Issuer Codes → No Action

If the group contains ISINs from different issuer codes, they are genuinely different companies that happen to share a similar name prefix. For example, "KOTAK MAHINDRA BANK" (issuer 237A) and "KOTAK MAHINDRA PRIME" (issuer 476A) both have name-cut "kotakma" but are different entities. These are flagged as NO_ACTION and left as separate holdings.


Step 6: Handling Non-Equity Items — Cash, TREPS, and Foreign Stocks

Equity mutual funds don't just hold stocks. They also hold various non-equity items for cash management, hedging, and regulatory compliance. These items typically don't have standard Indian ISINs.

When the converter encounters a holding without a valid Indian ISIN, I apply a three-step classification:

6a. Foreign Stock Check

Some equity funds (notably Parag Parikh Flexi Cap, SBI Contra, SBI Focused, ICICI Value) hold overseas stocks like Alphabet (Google), Meta, Microsoft, and Amazon. These have valid ISINs — just not Indian ones. For example, Alphabet's ISIN is US02079K3059.

The converter checks the raw ISIN from the Excel file against a curated list of known foreign stocks. If matched, the original US/foreign ISIN is preserved as-is. This allows the dashboard to identify and group foreign holdings correctly.

Current foreign stock holdings across my tracked funds:

Company ISIN Held By
Alphabet Inc (Google) US02079K3059 PPFAS Flexi Cap, SBI Contra
Meta Platforms US30303M1027 PPFAS Flexi Cap
Microsoft Corp US5949181045 PPFAS Flexi Cap
Amazon.com Inc US0231351067 PPFAS Flexi Cap
EPAM Systems US29414B1044 SBI Focused, SBI Contra
Cognizant Technology US1924461023 SBI Contra

6b. TREPS Check

TREPS (Tri-Party Repo) is an overnight lending instrument where mutual funds lend money to other financial institutions through the Clearing Corporation of India. Almost every fund holds TREPS as a cash management tool.

Different fund houses report TREPS under different names: "TREPS", "Tri-party Repo", "Triparty Repo", "CBLO", "Collateralized Borrowing", and others. The converter matches against a list of known patterns and groups all of them under a single synthetic ISIN: SYNTREPS00001.

6c. Everything Else → Cash & Other Assets

Items that are neither foreign stocks nor TREPS — such as net receivables, derivative margins, cash balances, and reverse repos — are grouped under the placeholder IN9999999999 with the display name "Cash & Other Assets."

The AI-in-the-Loop Review

After each monthly conversion, I run a cash classification report that scans all items classified as "Cash & Other Assets" for potential misclassifications:

This ensures the classification stays accurate as fund houses change their reporting formats.


Step 7: Synthetic ISINs — Creating Virtual Identifiers

When multiple instruments are aggregated into a single entry, I need a new identifier for the group. I call these synthetic ISINs — they follow a naming convention that encodes the source:

Type Format Example Represents
Certificate of Deposit SYN{issuer}CD01 SYN238ACD01 All Axis Bank CDs
Commercial Paper SYN{issuer}CP01 SYN296ACP01 All Bajaj Finance CPs
Treasury Bills SYNGOITBILL01 SYNGOITBILL01 All GOI T-Bills
Government Securities SYNGOIGSEC01 SYNGOIGSEC01 All GOI G-Secs
TREPS SYNTREPS00001 SYNTREPS00001 All TREPS instruments

The {issuer} in CD and CP synthetic ISINs is the 4-character issuer code from the original ISIN. This means each bank's CDs get their own synthetic ISIN — Axis Bank CDs, HDFC Bank CDs, and ICICI Bank CDs are separate entries, not lumped together.

Why aggregate CDs and CPs? A single bank like HDFC Bank might have 5–10 separate CDs with different maturity dates, each with its own ISIN. For an equity fund analysis, knowing that the fund holds "₹500 Cr in HDFC Bank CDs" is more useful than seeing 8 separate line items for each maturity date. The individual issuances are still available in the drill-down modal.

Why aggregate GOI T-Bills and G-Secs? Similarly, a fund might hold treasury bills across 4–5 different auction dates. Grouping them into "GOI T-BILL" gives a cleaner picture of the fund's government securities exposure.


Step 8: The Mapping File — Human-in-the-Loop Review

After the algorithm generates all mappings, it writes two files:

The Validation File (for human review)

A detailed text file with every potential duplicate detected, organized by category. Each entry shows:

This file is reviewed by a human before the mappings are applied. The reviewer checks:

The Mapping File (consumed by the converter)

A compact file with only the actionable mappings — one line per ISIN that needs to be remapped. This is the file that the converter reads on subsequent runs to apply the mappings.

Current mapping statistics (as of Feb 2026):


Step 9: Applying Mappings — The Re-Conversion

Here's an important detail: ISIN mapping is not incremental. Every time I add a new month's data, I regenerate mappings from the entire database and then re-convert every single month.

Why? Because a new month can introduce a corporate action that affects older data. For example:

In December 2025, Fund A holds BEML with ISIN INE258A01016. In January 2026, BEML does a stock split, and Fund B (which bought after the split) reports ISIN INE258A01024. If I only mapped January's data, December's BEML would still be on the old ISIN, and the dashboard would show them as two different companies.

By re-mapping the full database, December's BEML gets correctly mapped to the new ISIN, and both months show a single BEML entry.

The re-conversion process:

  1. Generate fresh mappings from the full database
  2. Re-convert every month × every category (e.g., 4 months × 6 categories = 24 runs)
  3. Each run replaces the data for that fund-month combination in the database

Step 10: Validation — Making Sure Nothing Is Lost

After all conversions are complete, I run a validation check on every fund-month combination. For each one:

  1. Read the "Grand Total" market value from the original Excel file
  2. Sum all market values for that fund-month in the database
  3. Compare the two numbers

They must match exactly (0.0000% difference). This confirms that no rows were accidentally skipped, no values were misread, and no data was corrupted during processing.

Current results (96 fund-month combinations):

These three exceptions are documented and verified. Every other fund-month is an exact match down to the paisa.


Step 11: Frontend Display — Virtual Grouping in the Browser

The processed database is served to your browser as a compact SQLite file (~2.5 MB). All the ISIN mapping, name resolution, and aggregation described above has already been applied in the data.

In the browser, one additional grouping step happens: non-equity items are organized into four category rows at the bottom of the comparison table:

Category What's In It Why Separate
Liquidity (TREPS, Repos) TREPS, reverse repos, treasury bills Near-cash instruments for daily cash management
Debt (CD, CP, GSecs) Certificates of deposit, commercial papers, government securities Interest-bearing debt instruments
Foreign Stocks (Alphabet, Meta, Amazon etc.) Overseas equity holdings Non-Indian equities identified by their US/GB/KY ISIN prefix
Others Cash balances, net receivables, margins, derivatives Everything else

Each row shows the combined value across all selected funds. Clicking any row opens a drill-down modal where you can see:

This means the aggregation keeps the main table clean while preserving full granularity in the drill-down.


Step 12: ISIN Mapping Disclosure in the Frontend

Transparency is a core principle. Every holding where an ISIN mapping has been applied is tagged with an ISN badge in the comparison table. This small grey label appears next to the company name.

Clicking the ISN badge (or the company row) opens a modal showing:

For example, clicking the ISN badge on "BEML LIMITED" would show:

Similarly, clicking "AXIS BANK CD" would show all 8 individual CD issuances with their original ISINs, maturity dates, and market values — aggregated into a single row for clarity but fully transparent.

The complete list of all ISIN mappings is also available in the ISIN Remaps tab in the Methodology section, where every single mapping record can be reviewed.


Summary: The Full Pipeline

Here's the complete flow from raw Excel to dashboard display:

  1. Download — Collect monthly portfolio Excel files from 15 AMC websites
  2. Read — Auto-detect format and schema, extract holdings data
  3. Identify — Look up each ISIN in the 361,000-record NSDL master
  4. Classify cash — Sort non-ISIN items into foreign stocks, TREPS, or other
  5. Name-cut grouping — Group ISINs by 7-character name prefix to find potential duplicates
  6. Detect corporate actions — Same issuer + same equity type + different serial = stock split or bonus
  7. Aggregate debt instruments — Group CDs, CPs, T-Bills, G-Secs by issuer with synthetic ISINs
  8. Human review — Review the validation file, verify mappings are correct
  9. Apply mappings — Re-convert all months with fresh mappings
  10. Validate — Confirm Excel totals match database totals exactly
  11. Deploy — Push the ~2.5 MB frontend database to production
  12. Display — Browser groups non-equity items into 4 categories with full drill-down