
LlamaParse complex PDF / 10K / Annual Reports. Analyze with Excel. Push into RAG Pipeline

Updated: Apr 7

Extracting data, especially tables, from complex PDFs used to be a challenge. With the launch of LlamaParse by LlamaIndex, that is no longer the case.



Originally published on LinkedIn. Embedded post below.



Note for developers doing the conversion themselves with Python/JS scripts:


- Calling the API directly is faster than going through the Python package.


- Splitting the file into chunks before parsing speeds up the conversion.


- Currently, around 50 pages seems to be the optimal chunk size.


- Parsing in 50-page chunks is faster than parsing the full file at once, even for a report of, say, 100 pages.


- I tested chunk sizes from 25 to 100 pages; chunks both smaller and larger than 50 pages increased the conversion time.


- However, all this can change rapidly, as LlamaParse is evolving quickly. For example, just a few days ago they raised the file size limit from 200 to 700 pages.
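
To make the chunking idea in the notes above concrete, here is a minimal sketch in plain Python. It only computes the page ranges for a given chunk size and times a run over them; it does not call LlamaParse. The names `page_chunks`, `time_chunked_parse`, and `parse_chunk` are hypothetical and introduced purely for illustration: `parse_chunk` stands in for whatever code you use to cut out a page range and send it to the parser.

```python
import time
from typing import Callable, Iterator


def page_chunks(total_pages: int, chunk_size: int = 50) -> Iterator[tuple[int, int]]:
    """Yield (start, end) page ranges, 0-indexed, with end exclusive."""
    if total_pages < 0 or chunk_size <= 0:
        raise ValueError("total_pages must be >= 0 and chunk_size must be > 0")
    for start in range(0, total_pages, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size, total_pages)


def time_chunked_parse(
    total_pages: int,
    chunk_size: int,
    parse_chunk: Callable[[int, int], object],
) -> float:
    """Return the wall-clock seconds taken to parse every chunk in sequence.

    parse_chunk is a hypothetical callable supplied by the caller. It should
    extract pages [start, end) from the PDF and submit them to the parser.
    """
    started = time.perf_counter()
    for start, end in page_chunks(total_pages, chunk_size):
        parse_chunk(start, end)
    return time.perf_counter() - started


if __name__ == "__main__":
    # A 120-page report split at the 50-page size suggested above.
    print(list(page_chunks(120)))  # [(0, 50), (50, 100), (100, 120)]
```

To repeat the comparison, call `time_chunked_parse` with the same document and different `chunk_size` values (for example 25, 50, and 100). Since the service changes quickly, rerun the measurement rather than treating 50 as fixed.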
