How I Simplified My RAG System and Climbed the Enterprise RAG Challenge Leaderboard
From Overengineering to Top-10: Lessons from the Enterprise RAG Challenge
The world of AI is evolving at breakneck speed, and nowhere is this more apparent than in the field of Retrieval-Augmented Generation (RAG) systems. With best practices still emerging and most real-world results hidden behind NDAs, sharing hands-on experiences is invaluable. In this article, I’ll walk you through my journey in the Enterprise RAG Challenge, how I moved from 30th to 7th place by simplifying my approach, and what I learned along the way.
What is the Enterprise RAG Challenge?
The Enterprise RAG Challenge, organized by the LLM Under the Hood community and TimeToAct Austria, is a competition focused on building effective RAG systems for enterprise data. While there were prizes, the real draw was the opportunity to test solutions on a realistic, industry-scale dataset—and, of course, to earn a spot on the leaderboard.
The Task
Participants were given:
- 100 corporate reports (each 100–1000 pages of dense text)
- 100 questions about these reports (e.g., “How many hotels does company X own?”, “Were there changes in company Y’s dividends?”, “Which company among X, Y, Z had the highest revenue?”)
- The goal: Answer each question and provide page references as proof
The challenge was time-constrained: a few hours for preprocessing and indexing the reports, then a few more hours to answer the questions using your system.
My Initial Approach: Structured Extraction
Inspired by previous winning solutions, I started by extracting structured data from the reports:
- Extract metrics and facts into a knowledge base
- Feed all extracted data into the prompt for each question
However, this approach quickly hit its limits:
- The questions were more varied (not just metrics)
- The dataset was larger and more complex
- The context window for LLMs is limited—too much irrelevant data degrades answer quality
Refining the Extraction
I analyzed the question generation code and identified nine types of questions. I updated my data model accordingly:
```python
from pydantic import BaseModel

class AnnualReport(BaseModel):
    financial_metrics: list[FinancialMetric]
    leadership_changes: list[LeadershipChange]
    # ... seven more lists for other data types
```
For each question, I:
- Parsed the company name from the question using structured output
- Identified the relevant data type (e.g., financial metrics, leadership changes)
- Included only the relevant data in the prompt to the LLM
This reduced noise and improved answer quality.
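To make that routing step concrete, here is a minimal sketch using structured output. The `QuestionRoute` schema, the prompt wording, and the `gpt-4o-mini` model name are illustrative placeholders, and it assumes the OpenAI Python SDK's structured-output `parse` helper:

```python
from enum import Enum
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class DataType(str, Enum):
    financial_metrics = "financial_metrics"
    leadership_changes = "leadership_changes"
    # ... the remaining seven categories

class QuestionRoute(BaseModel):
    company_names: list[str]  # companies mentioned in the question
    data_type: DataType       # which extracted list is relevant

def route_question(question: str) -> QuestionRoute:
    # The LLM fills the schema, so parsing the company name and data type is one call.
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Identify the company names and the relevant data type for this question."},
            {"role": "user", "content": question},
        ],
        response_format=QuestionRoute,
    )
    return completion.choices[0].message.parsed

route = route_question("Were there changes in company Y's dividends?")
# route.company_names and route.data_type then decide which lists go into the answer prompt.
```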
Don’t Forget the Proof!
A key requirement was to provide page numbers as references. I modified the extraction step to:
- Include page numbers in the structured data
- Pass these references through to the final answer
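As an illustration, each extracted item can simply carry the page it came from, so references fall out of the data model for free. The exact fields below are a sketch, not my production schema:

```python
from pydantic import BaseModel, Field

class FinancialMetric(BaseModel):
    name: str                 # e.g. "revenue" or "net income"
    value: float
    currency: str | None = None
    page: int = Field(description="1-based page number in the source PDF, used as proof")

class FinalAnswer(BaseModel):
    value: str
    references: list[int]     # page numbers carried through from the matched items
```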
The Real Bottleneck: PDF Extraction
Extracting clean text from PDFs is notoriously tricky, especially for tables. I experimented with:
- PyPDF2: Fast, but struggles with complex tables
- marker-pdf: Produces beautiful Markdown tables, but is slow (up to 30 minutes per report)
- Custom OCR/Vision models: Powerful, but expensive
In practice, even imperfect extraction is usually good enough for LLMs, but cleaner formatting helps with complex tables.
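For the fast path, a minimal PyPDF2 sketch looks roughly like this (it assumes a local report.pdf and returns flattened, imperfect text for complex tables):

```python
from PyPDF2 import PdfReader

def extract_pages(pdf_path: str) -> list[str]:
    """Return the raw text of every page; tables come out flattened but usable."""
    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]

pages = extract_pages("report.pdf")
print(f"Extracted {len(pages)} pages")
```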
The Breakthrough: Simpler is Better
As the competition progressed, I realized something crucial:
- Most questions only required data from a single report
- Pre-extracting all possible data was overkill
The Simplified Pipeline
I switched to a much simpler, question-driven approach:
- For each question, scan each page of the relevant report asynchronously with the LLM, asking: “Does this page contain the answer?” (see the sketch below)
- Aggregate all positive pages
- Prompt the LLM with only these pages to generate the final answer and references
This reduced complexity, improved speed, and boosted my score from 89 to 110 out of 133.
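Here is a minimal sketch of that page-scanning loop. It assumes the report pages are already extracted to text (as above), uses the OpenAI async client, and leaves out retries and rate limiting; the prompts and model name are placeholders:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def page_is_relevant(question: str, page_text: str) -> bool:
    # Cheap yes/no check for a single page; all pages are checked concurrently.
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer strictly YES or NO: does this page contain information needed to answer the question?"},
            {"role": "user", "content": f"Question: {question}\n\nPage:\n{page_text}"},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

async def answer_question(question: str, pages: list[str]) -> str:
    flags = await asyncio.gather(*(page_is_relevant(question, p) for p in pages))
    relevant = [(i + 1, page) for i, (flag, page) in enumerate(zip(flags, pages)) if flag]
    context = "\n\n".join(f"[page {num}]\n{text}" for num, text in relevant)
    # The final call sees only the pages flagged as relevant and cites their numbers.
    final = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer the question using only the pages below and cite the page numbers you used."},
            {"role": "user", "content": f"Question: {question}\n\n{context}"},
        ],
    )
    return final.choices[0].message.content

# answer = asyncio.run(answer_question(question, extract_pages("report.pdf")))
```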
What I Missed (and What the Winner Did Differently)
Some questions required cross-report answers (e.g., comparing metrics across companies). I didn’t handle these well in time for the deadline. The winning solution split such questions into sub-questions (one per report), answered them individually, and then combined the results—a simple but effective strategy.
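My reconstruction of that idea, building on the hypothetical `answer_question` helper and async `client` from the sketch above (an illustration, not the winner's actual code):

```python
async def answer_comparison(question: str, reports: dict[str, list[str]]) -> str:
    """reports maps a company name to its already-extracted pages."""
    # One sub-question per report: the original question, scoped to a single company.
    sub_answers = {
        name: await answer_question(f"{question} Answer only for {name}.", pages)
        for name, pages in reports.items()
    }
    # A final call combines the per-company answers into the comparison.
    summary = "\n".join(f"{name}: {answer}" for name, answer in sub_answers.items())
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Combine the per-company answers below to answer the comparison question."},
            {"role": "user", "content": f"Question: {question}\n\n{summary}"},
        ],
    )
    return resp.choices[0].message.content
```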
Key Takeaways
- Don’t overengineer: Complex retrieval architectures and embeddings are mainly about reducing LLM costs by filtering irrelevant data. As LLMs get cheaper and faster, simpler architectures become viable.
- Prompt engineering matters: Use structured output and targeted prompts to extract exactly what you need.
- Optimize for the task: If most questions are single-report, don’t waste time extracting everything upfront.
- PDF extraction is a pain point: Invest in good tools, but don’t let perfect be the enemy of good.
Conclusion
The Enterprise RAG Challenge taught me that sometimes, less is more. By focusing on the actual requirements and iterating quickly, I was able to climb the leaderboard and learn valuable lessons about building practical RAG systems. As LLMs continue to improve, expect to see more “dumb” but effective architectures competing with complex solutions.
Tip: Always analyze your task and data before building elaborate pipelines. Sometimes, the simplest solution is the best.
Interested in more practical AI engineering stories? Check out my AI & Pitfalls channel, including a series on the structured-output techniques used throughout this post.