Internal Knowledge RAG Case Study | Nie Er

The hard part of research Q&A is not making a model speak; it is making each useful claim traceable back to evidence.

Figure: overview of the five modules of the LLM-powered investment research platform. The RAG Q&A described here is one module of that platform, which also covers macro indicators, contract extraction, allocation reports, and private-fund due diligence.

Background

This project served an internal knowledge-work workflow at a financial institution. Users needed to search across internal material and market news. Public search could not cover the internal view, while traditional search struggled with follow-up questions, changing time scopes, and natural phrasing such as “what changed recently?”

The goal was not a general chatbot. Users needed to ask in normal language, receive a grounded answer, see where the answer came from, and continue the thread without rebuilding the whole question every time.

Screenshot, sanitized multi-turn Q&A view: the shape of a conversational question, the context-aware rewrite, and the evidence-backed answer are preserved, while exact questions, dates, and source content are blurred.

What Made It Difficult

Internal reports and news behaved like different products. Reports usually came from PDFs and had a more stable taxonomy. News moved faster, had looser structure, and contained more noise. Treating both as one flat vector index made retrieval look broad, but it weakened explainability and consistency.

The questions were also not clean search queries. A user might ask “how has it changed recently?” after several turns. The system needed to infer whether “it” referred to a company, sector, or macro event, and what window “recently” implied in that conversation instead of falling back to a fixed default.

The most important constraint was provenance. In this setting, a fluent answer without sources is worse than an incomplete answer, because it creates confidence without a way to audit the claim.

What I Helped Build

I worked on the Q&A flow as a two-stage pipeline: understand the question first, then answer from evidence.

The first stage rewrote the user question into a standalone query. It used recent conversation context to resolve references, made the time range explicit, and extracted entities, keywords, and retrieval tags. This step needed predictable structured output more than long-form reasoning.
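As a minimal sketch of what "predictable structured output" can mean in practice, the rewrite model can be asked for JSON that is validated before retrieval runs. The schema below (`StandaloneQuery`, its field names, and ISO date strings) is an assumption for illustration, not the project's actual format.

```python
import json
from dataclasses import dataclass, field
from datetime import date


@dataclass
class StandaloneQuery:
    """Hypothetical schema for the rewrite stage's output."""
    query: str
    start: date
    end: date
    entities: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


def parse_rewrite(raw: str) -> StandaloneQuery:
    """Validate the rewrite model's JSON; raise ValueError on anything unusable."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("rewrite output must be a JSON object")
    query = str(data.get("query", "")).strip()
    if not query:
        raise ValueError("rewrite produced an empty query")
    try:
        start = date.fromisoformat(data["start"])
        end = date.fromisoformat(data["end"])
    except (KeyError, TypeError) as exc:
        raise ValueError("rewrite must state an explicit time range") from exc
    if start > end:
        raise ValueError("time range is inverted")
    return StandaloneQuery(
        query=query,
        start=start,
        end=end,
        entities=list(data.get("entities", [])),
        keywords=list(data.get("keywords", [])),
        tags=list(data.get("tags", [])),
    )
```

Rejecting a malformed rewrite early keeps a bad first stage from silently turning into a retrieval problem later on.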

Retrieval sat between the two LLM stages and used a hybrid strategy. Where the taxonomy was reliable, tag-based retrieval came first because it mapped to how the team already organized material and could be explained clearly. Keyword and vector retrieval filled gaps when the wording was more open or the taxonomy signal was weaker. Before generation, weak evidence was filtered out so the answer model only saw material that could support citations.
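The tag-first ordering with fallback and filtering can be sketched as follows. This is an illustration under assumptions: the `Hit` type, the normalized score range, the `min_score` threshold, and the retriever callables are all hypothetical stand-ins for whatever search backends are in place.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float   # assumed normalized to [0, 1] by each retriever
    channel: str   # "tag", "keyword", or "vector"


def hybrid_retrieve(
    query: str,
    tags: list[str],
    by_tag: Callable[[list[str]], list[Hit]],
    fallbacks: list[Callable[[str], list[Hit]]],
    top_k: int = 8,
    min_score: float = 0.35,  # illustrative threshold, not a tuned value
) -> list[Hit]:
    """Tag hits first, then fill gaps from fallbacks; drop weak or duplicate hits."""
    kept: list[Hit] = []
    seen: set[str] = set()

    def absorb(hits: list[Hit]) -> None:
        for hit in hits:
            if hit.score >= min_score and hit.doc_id not in seen:
                seen.add(hit.doc_id)
                kept.append(hit)

    if tags:
        absorb(by_tag(tags))
    for search in fallbacks:
        if len(kept) >= top_k:
            break  # enough strong evidence; skip the remaining fallbacks
        absorb(search(query))
    return kept[:top_k]
```

Because a document keeps the channel that found it first, each piece of evidence can be explained as "matched by tag" or "matched by keyword or vector fallback".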

Answers were streamed back to the UI with SSE, so users did not have to wait for the full response before seeing progress. Key claims had to point back to a report or news source. When the retrieved material did not support a conclusion, the system refused to answer beyond the evidence instead of filling the gap with a plausible guess.
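A minimal sketch of the streaming side, assuming the answer model yields text chunks and citations are sent once the text is complete. The event names (`token`, `citations`, `done`) and payload shapes are illustrative assumptions. Only the frame layout, with `event:` and `data:` lines terminated by a blank line, comes from the Server-Sent Events format itself.

```python
import json
from typing import Iterable, Iterator


def sse_event(event: str, payload: dict) -> str:
    """Encode one SSE frame; JSON encoding keeps the data field on a single line."""
    body = json.dumps(payload, ensure_ascii=False)
    return f"event: {event}\ndata: {body}\n\n"


def stream_answer(chunks: Iterable[str], citations: list[dict]) -> Iterator[str]:
    """Yield text chunks as they arrive, then the citations, then an end marker."""
    for text in chunks:
        yield sse_event("token", {"text": text})
    yield sse_event("citations", {"items": citations})
    yield sse_event("done", {})
```

Any web framework that can stream a generator as a response with the `text/event-stream` content type could serve these frames.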

Screenshot, sanitized cited-answer view: the relationship between answer text, evidence snippets, and source material is kept visible, while source details and original text are blurred.

Tradeoffs

Using two LLM stages added some moving parts, but it separated two different jobs. If an answer was poor, the team could inspect whether the rewrite was wrong, retrieval was off, or generation failed to stay within the evidence.
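One way to make that inspection routine is to record what each stage produced for every turn. The sketch below is hypothetical: `TurnTrace`, its fields, and the triage rules are illustrative assumptions, and the result is only a hint about where to look first.

```python
from dataclasses import dataclass, field


@dataclass
class TurnTrace:
    """Hypothetical per-turn record; field names are illustrative."""
    question: str
    standalone_query: str = ""
    retrieved_ids: list[str] = field(default_factory=list)
    cited_ids: list[str] = field(default_factory=list)
    refused: bool = False


def first_suspect(trace: TurnTrace) -> str:
    """Coarse triage: name the earliest stage whose output looks wrong."""
    if not trace.standalone_query.strip():
        return "rewrite"
    if not trace.retrieved_ids:
        return "retrieval"
    # An answer with no citations is suspect unless it was an explicit refusal.
    uncited = not trace.cited_ids and not trace.refused
    # Citations must point at material that retrieval actually returned.
    outside = set(trace.cited_ids) - set(trace.retrieved_ids)
    if uncited or outside:
        return "generation"
    return "none"
```

A reviewer would still read the rewritten query and the evidence by hand; the function only narrows down which stage to open first.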

The project also avoided treating vector similarity as the whole retrieval strategy. In this setting, a mature internal taxonomy contains real workflow knowledge. It should be used when it is reliable, with vector search acting as a supplement rather than a replacement for every structured signal.

Refusal behavior was a product choice, not an error path. For serious knowledge work, “the current material does not support that conclusion” is often more useful than a polished paragraph with weak grounding.
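Treating refusal as a normal outcome can be sketched as a post-generation gate. The citation marker format (`[S1]`, `[S2]`), the sentence splitting, and the rule that every sentence must cite are assumptions made for this example. The rule is deliberately stricter than "key claims must cite" so the check stays short.

```python
import re

REFUSAL = "The current material does not support that conclusion."
CITATION = re.compile(r"\[(S\d+)\]")  # assumed marker format, e.g. [S1]


def enforce_grounding(answer: str, evidence_ids: set[str]) -> str:
    """Return the answer only if every sentence cites retrieved evidence."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not evidence_ids or not sentences:
        return REFUSAL
    for sentence in sentences:
        cited = set(CITATION.findall(sentence))
        # Refuse on an uncited sentence or a citation to unknown material.
        if not cited or not cited <= evidence_ids:
            return REFUSAL
    return answer
```

For example, `enforce_grounding("Margins will recover soon.", {"S1"})` returns the refusal text, because the sentence carries no citation.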

Result

Once tens of thousands of internal reports and tens of millions of news items were connected, the system supported follow-up questions, time-scope changes, and topic shifts. Answers began streaming in about 3–5 seconds and completed with citations in roughly a dozen seconds. Conversations held the thread across 10+ follow-up turns, and users could read the conclusion first, then inspect the clickable sources before trusting it.

The practical gain was straightforward: less time spent hunting for material, more time spent judging what the material meant. The system was useful because it put documents, citations, time handling, and refusal boundaries back into the knowledge-work workflow.

If you are evaluating an enterprise RAG, knowledge base, AI support, or agent workflow project, contact me by email at contact@aildnc.com. You can also reach me on Telegram at @NieErAI.
