
LlamaParse complex PDF / 10K / Annual Reports. Analyze with Excel. Push into RAG Pipeline

Updated: Jul 31



app.tigzig.com - my open-source platform with 25+ micro-apps and tools for AI-driven analytics and data science.


Including a LlamaParse PDF-to-Markdown converter.



-----------------------------

Extracting data, especially tables, from complex PDFs used to be a challenge. But with the launch of LlamaParse by LlamaIndex, that is no longer the case.







Originally published on LinkedIn. Embedded post below.



Note for 𝗱𝗲𝘃𝗲𝗹𝗼𝗽𝗲𝗿𝘀 𝗱𝗼𝗶𝗻𝗴 𝘁𝗵𝗲 𝗰𝗼𝗻𝘃𝗲𝗿𝘀𝗶𝗼𝗻 𝘁𝗵𝗲𝗺𝘀𝗲𝗹𝘃𝗲𝘀 𝘄𝗶𝘁𝗵 𝗣𝘆𝘁𝗵𝗼𝗻/𝗝𝗦 𝘀𝗰𝗿𝗶𝗽𝘁𝘀:


- The API call works faster than the Python package.


- Chunking the file before parsing improves speed.


- Currently, around 50 pages seems to be the optimal chunk size.


- Parsing is faster when done in 50-page chunks than when the full file is sent at once, even for, say, a 100-page report.


- I tested chunk sizes from 25 to 100 pages; chunks either smaller or larger than 50 pages increased the conversion time.


- However, all this can change rapidly as LlamaParse is evolving quickly. For example, just a few days ago they increased the file size limit from 200 to 700 pages.
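
To make the chunking idea concrete, here is a minimal Python sketch that only works out the page ranges for 50-page chunks. The function name `chunk_page_ranges` and its parameters are my own illustration, not part of LlamaParse; splitting the PDF itself and sending each chunk to the parser are left out, and the 50-page default simply reflects the timing observations above, which may not hold as the service changes.

```python
from typing import List, Tuple


def chunk_page_ranges(total_pages: int, chunk_size: int = 50) -> List[Tuple[int, int]]:
    """Return zero-based, half-open (start, end) page ranges covering the document.

    Hypothetical helper for illustration only; it does not call LlamaParse.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the document in chunk_size strides; the last chunk may be shorter.
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


if __name__ == "__main__":
    # Print human-readable (one-based) page ranges for a 120-page report.
    for start, end in chunk_page_ranges(120):
        print(f"pages {start + 1}-{end}")
```

Each range can then be used to cut a smaller PDF with whatever PDF library you prefer, and the chunks parsed one by one (or in parallel) before joining the Markdown output in page order.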

 
 


 
 