How I Identify and Map Every Holding — The ISIN Mapping Process
Published: February 18, 2026

Every month, each mutual fund publishes an Excel file listing every stock, bond, and instrument they hold. The challenge? These files come from 15 different fund houses, each with different formats, naming conventions, and quirks. The same company can appear under dozens of different spellings. A stock split can change an identifier overnight. And short-term debt instruments like bank CDs can have 10 separate line items for the same bank.
This document explains — step by step — how I take these messy Excel files and turn them into a clean, comparable dataset where the same company always appears as one row, no matter which fund holds it or how they spell the name.
Step 1: Building the Reference Database (ISIN Master)
Every security traded in India has a unique 12-character code called an ISIN (International Securities Identification Number), assigned by NSDL (National Securities Depository Limited). Think of it as a universal ID card for every stock, bond, and instrument.
I download two public datasets and merge them into a single reference table:
NSDL Master — The complete register of every ISIN ever issued in India. Over 361,000 records covering equities, bonds, mutual fund units, certificates of deposit, commercial papers, treasury bills, government securities, and more. Each record has the ISIN, the official NSDL name, the issuer, and the security type.
NSE Equity List — The list of all stocks currently listed on the National Stock Exchange, with their trading symbols (like RELIANCE, TCS, HDFCBANK). About 2,000 records.
I merge these by joining on the ISIN code. The result: for every Indian security, I know its official name, and for every NSE-listed equity, I also know its trading symbol.
This reference table is the backbone of the entire process. When the converter encounters an ISIN in a fund's Excel file, it looks it up here to get the canonical name and trading symbol.
Step 2: Reading the Excel Files
Each AMC (Asset Management Company) uses a different Excel layout. Column positions vary, header rows are in different places, and some AMCs have quirks like putting data across multiple sheets.
The converter handles this automatically:
Format detection — AMC websites sometimes serve files with the wrong extension (a .xlsx file that's actually in the old .xls binary format, or vice versa). The converter reads the first few bytes of the file to detect the actual format, regardless of the file extension.
Schema auto-detection — The converter scans each sheet row by row, looking for a header row that contains the word "ISIN". Once found, it identifies which columns contain the company name, ISIN, quantity, and market value by matching keywords in the header cells.
Grand total extraction — The converter finds the "Grand Total" row in the Excel file and reads the total market value. This number is used later for validation — my calculated total must match this exactly.
For example, Axis funds have the ISIN in column C, name in column B, and market value in column F. HDFC funds have ISIN in column B, name in column D, and market value in column G. Kotak funds are completely different again. The converter handles all of these automatically, with manual overrides for the few AMCs that don't follow any standard pattern.
Step 3: Identifying Each Holding — ISIN Lookup
For every row in the Excel file, the converter takes the ISIN and looks it up in the reference database.
If the ISIN is found — The lookup returns the official NSDL name and NSE symbol. This is the most common case for equity holdings.
For example, fund houses might spell the same company three different ways:
- "Infosys Ltd" (one fund)
- "Infosys Limited" (another fund)
- "INFOSYS LTD." (a third fund)
But they all report the same ISIN: INE009A01021. Looking this up in the NSDL master gives us the canonical name "INFOSYS LIMITED" and the NSE symbol "INFY". Problem solved — all three are now identified as the same company.
If the ISIN is not in the master — This can happen for very recent issuances or private placements. The converter falls back to the name as written in the Excel file.
If there is no valid ISIN at all — Many non-equity items (cash, TREPS, net receivables, futures, margins) don't have standard ISINs. These get assigned a placeholder code IN9999999999 and are handled separately (see Step 6 below).
Step 4: Understanding ISIN Structure — How I Detect Corporate Actions
An Indian ISIN isn't just a random code. It has a specific structure that encodes information about the security:
INE 238A 01 02 8
│ │ │ │ └─ Check digit
│ │ │ └──── Serial number (increments on corporate actions)
│ │ └─────── Security type (01 = equity, D6/16 = CD, 14 = CP)
│ └──────────── Issuer code (unique per company)
└──────────────── Country code + issuer type
Real example — Kotak Mahindra Bank face value split:
- Old ISIN:
INE237A01028— issuer code237A, security type01(equity), serial02 - New ISIN:
INE237A01036— same issuer237A, same security type01, serial03(incremented)
When a company does a stock split, bonus issue, or rights issue, NSDL issues a new ISIN with the serial number incremented by 1. The issuer code stays the same because it's still the same company. This is how I detect corporate actions automatically.
More real examples from my data:
- BEML Limited —
INE258A01016(serial 01) →INE258A01024(serial 02) after a face value split from Rs 10 to Rs 5 - Computer Age Management Services —
INE596I01012(serial 01) →INE596I01020(serial 02) after a bonus issue - Motilal Oswal Financial Services —
INE338I01027(serial 02) →INE338I01035(serial 03) after a face value split
Step 5: The Name-Cut Process — Finding Potential Duplicates
Corporate action detection relies on two ISINs being from the same company. But how do I even know which ISINs to compare? Comparing every ISIN against every other ISIN would be impractical with 7,000+ holdings.
This is where the name-cut process comes in.
How it works:
- Take the company name from the NSDL master (or the Excel file)
- Convert to lowercase and remove all spaces, punctuation, and special characters
- Take the first 7 characters
This gives us a "name-cut" — a short fingerprint of the company name.
Examples:
| Original Name | After Cleanup | Name-Cut (7 chars) |
|---|---|---|
| AXIS BANK LIMITED | axisbanklimited | axisban |
| AXIS BANK CD 08JAN26 | axisbankcd08jan26 | axisban |
| KOTAK MAHINDRA BANK | kotakmahindrabank | kotakma |
| KOTAK MAHINDRA PRIME | kotakmahindraprime | kotakma |
| GOVERNMENT OF INDIA 364D TBILL | governmentofindia364dtbill | governm |
Why 7 characters? It's a deliberate balance. Short enough to catch the same company across different instrument types (Axis Bank equity and Axis Bank CDs both become "axisban"). Long enough to mostly avoid false positives between unrelated companies.
Grouping by name-cut: All ISINs with the same 7-character name-cut are placed into the same group. These are the candidates for being "the same company under different ISINs."
Within each group, the algorithm does a deeper analysis using the ISIN structure:
5a. Same Issuer Code + Equity Type + Different Serial → Corporate Action
If two ISINs in the group have the same issuer code (positions 4–7) and both are equity type (01), but different serial numbers — this is a corporate action. The newer serial is the current ISIN; the older one is mapped to it.
5b. Same Issuer Code + CD Type → Certificate of Deposit Aggregation
If multiple ISINs share the same issuer code but have security type D6 or 16 (Certificate of Deposit), they are separate CD issuances from the same bank. These get aggregated into a single synthetic entry.
5c. Same Issuer Code + CP Type → Commercial Paper Aggregation
Same logic for security type 14 (Commercial Paper).
5d. Different Issuer Codes → No Action
If the group contains ISINs from different issuer codes, they are genuinely different companies that happen to share a similar name prefix. For example, "KOTAK MAHINDRA BANK" (issuer 237A) and "KOTAK MAHINDRA PRIME" (issuer 476A) both have name-cut "kotakma" but are different entities. These are flagged as NO_ACTION and left as separate holdings.
Step 6: Handling Non-Equity Items — Cash, TREPS, and Foreign Stocks
Equity mutual funds don't just hold stocks. They also hold various non-equity items for cash management, hedging, and regulatory compliance. These items typically don't have standard Indian ISINs.
When the converter encounters a holding without a valid Indian ISIN, I apply a three-step classification:
6a. Foreign Stock Check
Some equity funds (notably Parag Parikh Flexi Cap, SBI Contra, SBI Focused, ICICI Value) hold overseas stocks like Alphabet (Google), Meta, Microsoft, and Amazon. These have valid ISINs — just not Indian ones. For example, Alphabet's ISIN is US02079K3059.
The converter checks the raw ISIN from the Excel file against a curated list of known foreign stocks. If matched, the original US/foreign ISIN is preserved as-is. This allows the dashboard to identify and group foreign holdings correctly.
Current foreign stock holdings across my tracked funds:
| Company | ISIN | Held By |
|---|---|---|
| Alphabet Inc (Google) | US02079K3059 | PPFAS Flexi Cap, SBI Contra |
| Meta Platforms | US30303M1027 | PPFAS Flexi Cap |
| Microsoft Corp | US5949181045 | PPFAS Flexi Cap |
| Amazon.com Inc | US0231351067 | PPFAS Flexi Cap |
| EPAM Systems | US29414B1044 | SBI Focused, SBI Contra |
| Cognizant Technology | US1924461023 | SBI Contra |
6b. TREPS Check
TREPS (Tri-Party Repo) is an overnight lending instrument where mutual funds lend money to other financial institutions through the Clearing Corporation of India. Almost every fund holds TREPS as a cash management tool.
Different fund houses report TREPS under different names: "TREPS", "Tri-party Repo", "Triparty Repo", "CBLO", "Collateralized Borrowing", and others. The converter matches against a list of known patterns and groups all of them under a single synthetic ISIN: SYNTREPS00001.
6c. Everything Else → Cash & Other Assets
Items that are neither foreign stocks nor TREPS — such as net receivables, derivative margins, cash balances, and reverse repos — are grouped under the placeholder IN9999999999 with the display name "Cash & Other Assets."
The AI-in-the-Loop Review
After each monthly conversion, I run a cash classification report that scans all items classified as "Cash & Other Assets" for potential misclassifications:
- New foreign stocks — If a US ISIN appears in the cash bucket, it means a fund has added a new overseas holding that isn't in my list yet. The report flags it for addition.
- New TREPS variants — If a name containing "repo", "trep", "clearing", or "cblo" appears in cash, a fund house may be using a new naming convention for TREPS. The report flags it for review.
This ensures the classification stays accurate as fund houses change their reporting formats.
Step 7: Synthetic ISINs — Creating Virtual Identifiers
When multiple instruments are aggregated into a single entry, I need a new identifier for the group. I call these synthetic ISINs — they follow a naming convention that encodes the source:
| Type | Format | Example | Represents |
|---|---|---|---|
| Certificate of Deposit | SYN{issuer}CD01 |
SYN238ACD01 |
All Axis Bank CDs |
| Commercial Paper | SYN{issuer}CP01 |
SYN296ACP01 |
All Bajaj Finance CPs |
| Treasury Bills | SYNGOITBILL01 |
SYNGOITBILL01 |
All GOI T-Bills |
| Government Securities | SYNGOIGSEC01 |
SYNGOIGSEC01 |
All GOI G-Secs |
| TREPS | SYNTREPS00001 |
SYNTREPS00001 |
All TREPS instruments |
The {issuer} in CD and CP synthetic ISINs is the 4-character issuer code from the original ISIN. This means each bank's CDs get their own synthetic ISIN — Axis Bank CDs, HDFC Bank CDs, and ICICI Bank CDs are separate entries, not lumped together.
Why aggregate CDs and CPs? A single bank like HDFC Bank might have 5–10 separate CDs with different maturity dates, each with its own ISIN. For an equity fund analysis, knowing that the fund holds "₹500 Cr in HDFC Bank CDs" is more useful than seeing 8 separate line items for each maturity date. The individual issuances are still available in the drill-down modal.
Why aggregate GOI T-Bills and G-Secs? Similarly, a fund might hold treasury bills across 4–5 different auction dates. Grouping them into "GOI T-BILL" gives a cleaner picture of the fund's government securities exposure.
Step 8: The Mapping File — Human-in-the-Loop Review
After the algorithm generates all mappings, it writes two files:
The Validation File (for human review)
A detailed text file with every potential duplicate detected, organized by category. Each entry shows:
- The name-cut group that triggered the analysis
- The category (CORPORATE_ACTION, CD_AGGREGATE, CP_AGGREGATE, TBILL_AGGREGATE, GSEC_AGGREGATE, or NO_ACTION)
- The action taken (MAP, AGGREGATE, or NO_ACTION)
- Both the original and mapped ISINs
- The original and mapped names
- The total market value affected
- The reasoning
This file is reviewed by a human before the mappings are applied. The reviewer checks:
- Are the corporate action mappings correct? (Is the old ISIN really the same company as the new one?)
- Are the CD/CP aggregations grouping the right instruments? (Is "AXIS BANK CD 08JAN26" really from Axis Bank?)
- Are the NO_ACTION items correctly identified as different companies?
The Mapping File (consumed by the converter)
A compact file with only the actionable mappings — one line per ISIN that needs to be remapped. This is the file that the converter reads on subsequent runs to apply the mappings.
Current mapping statistics (as of Feb 2026):
- 9 corporate action mappings (face value splits, bonus issues)
- 114 CD aggregations (across ~15 banks)
- 9 CP aggregations (across ~5 companies)
- 61 T-Bill aggregations (various auction dates)
- 4 G-Sec aggregations (various coupon rates and maturities)
- 62 NO_ACTION records (different companies with similar names — verified and left alone)
Step 9: Applying Mappings — The Re-Conversion
Here's an important detail: ISIN mapping is not incremental. Every time I add a new month's data, I regenerate mappings from the entire database and then re-convert every single month.
Why? Because a new month can introduce a corporate action that affects older data. For example:
In December 2025, Fund A holds BEML with ISIN
INE258A01016. In January 2026, BEML does a stock split, and Fund B (which bought after the split) reports ISININE258A01024. If I only mapped January's data, December's BEML would still be on the old ISIN, and the dashboard would show them as two different companies.
By re-mapping the full database, December's BEML gets correctly mapped to the new ISIN, and both months show a single BEML entry.
The re-conversion process:
- Generate fresh mappings from the full database
- Re-convert every month × every category (e.g., 4 months × 6 categories = 24 runs)
- Each run replaces the data for that fund-month combination in the database
Step 10: Validation — Making Sure Nothing Is Lost
After all conversions are complete, I run a validation check on every fund-month combination. For each one:
- Read the "Grand Total" market value from the original Excel file
- Sum all market values for that fund-month in the database
- Compare the two numbers
They must match exactly (0.0000% difference). This confirms that no rows were accidentally skipped, no values were misread, and no data was corrupted during processing.
Current results (96 fund-month combinations):
- 93 exact matches (0.0000% difference)
- 3 known expected differences:
- Kotak Flexicap Sep 2025: Rs 10,414 difference (0.19%) — Kotak's Grand Total excludes futures mark-to-market adjustments, but I include the futures rows
- Kotak Midcap Jan 2026: Rs 13,391 difference (0.23%) — Same Kotak futures issue
- PPFAS Flexicap Sep 2025: Rs 15.06 difference (0.0001%) — PPFAS hardcodes their Grand Total cell instead of using a SUM formula; my sum matches their section sub-totals exactly
These three exceptions are documented and verified. Every other fund-month is an exact match down to the paisa.
Step 11: Frontend Display — Virtual Grouping in the Browser
The processed database is served to your browser as a compact SQLite file (~2.5 MB). All the ISIN mapping, name resolution, and aggregation described above has already been applied in the data.
In the browser, one additional grouping step happens: non-equity items are organized into four category rows at the bottom of the comparison table:
| Category | What's In It | Why Separate |
|---|---|---|
| Liquidity (TREPS, Repos) | TREPS, reverse repos, treasury bills | Near-cash instruments for daily cash management |
| Debt (CD, CP, GSecs) | Certificates of deposit, commercial papers, government securities | Interest-bearing debt instruments |
| Foreign Stocks (Alphabet, Meta, Amazon etc.) | Overseas equity holdings | Non-Indian equities identified by their US/GB/KY ISIN prefix |
| Others | Cash balances, net receivables, margins, derivatives | Everything else |
Each row shows the combined value across all selected funds. Clicking any row opens a drill-down modal where you can see:
- By Fund — How much each fund holds in this category
- By Instrument Group — Breakdown by type (e.g., CDs vs CPs within Debt)
- By Individual Instrument — Every single CD issuance, every T-Bill, every cash item
This means the aggregation keeps the main table clean while preserving full granularity in the drill-down.
Step 12: ISIN Mapping Disclosure in the Frontend
Transparency is a core principle. Every holding where an ISIN mapping has been applied is tagged with an ISN badge in the comparison table. This small grey label appears next to the company name.
Clicking the ISN badge (or the company row) opens a modal showing:
- The original ISIN(s) from each fund's Excel file
- The mapped ISIN after processing
- The mapping category (Corporate Action, CD Aggregate, etc.)
- The specific reason for the mapping
- Which funds held which ISINs
- The market values involved
For example, clicking the ISN badge on "BEML LIMITED" would show:
- Fund A reported ISIN
INE258A01016(old, pre-split) - Fund B reported ISIN
INE258A01024(new, post-split) - Mapping: Corporate Action — serial 01 → 02, face value split from Rs 10 to Rs 5
- Both ISINs now grouped under the newer ISIN
Similarly, clicking "AXIS BANK CD" would show all 8 individual CD issuances with their original ISINs, maturity dates, and market values — aggregated into a single row for clarity but fully transparent.
The complete list of all ISIN mappings is also available in the ISIN Remaps tab in the Methodology section, where every single mapping record can be reviewed.
Summary: The Full Pipeline
Here's the complete flow from raw Excel to dashboard display:
- Download — Collect monthly portfolio Excel files from 15 AMC websites
- Read — Auto-detect format and schema, extract holdings data
- Identify — Look up each ISIN in the 361,000-record NSDL master
- Classify cash — Sort non-ISIN items into foreign stocks, TREPS, or other
- Name-cut grouping — Group ISINs by 7-character name prefix to find potential duplicates
- Detect corporate actions — Same issuer + same equity type + different serial = stock split or bonus
- Aggregate debt instruments — Group CDs, CPs, T-Bills, G-Secs by issuer with synthetic ISINs
- Human review — Review the validation file, verify mappings are correct
- Apply mappings — Re-convert all months with fresh mappings
- Validate — Confirm Excel totals match database totals exactly
- Deploy — Push the ~2.5 MB frontend database to production
- Display — Browser groups non-equity items into 4 categories with full drill-down