Financial Services / Anonymized financial institution knowledge work team
Evidence-Backed RAG For Internal Knowledge Work
Knowledge workers needed to query internal material and fast-moving news while preserving time context, follow-up questions, and source traceability.
I helped build a two-stage RAG workflow: first rewrite the question from conversation context and parse its time range, entities, and keywords; then retrieve with a hybrid strategy and answer only from cited evidence.
Conversation context -> question rewriting -> time and entity parsing -> tag-first and hybrid retrieval -> evidence filtering -> cited SSE streaming answer.
The system connects tens of thousands of internal reports and tens of millions of news items. Answers begin streaming in about 3–5 seconds and complete with citations in roughly a dozen seconds. It supports 10+ follow-up turns without losing the thread and refuses when evidence is insufficient, so users spend less time finding material and more time judging it.
The hard part of research Q&A is not making a model speak; it is making each useful claim traceable back to evidence.
Background
This project served an internal knowledge-work workflow at a financial institution. Users needed to search across internal material and market news. Public search could not cover the internal view, while traditional search struggled with follow-up questions, changing time scopes, and natural phrasing such as “what changed recently?”
The goal was not a general chatbot. Users needed to ask in normal language, receive a grounded answer, see where the answer came from, and continue the thread without rebuilding the whole question every time.

What Made It Difficult
Internal reports and news behaved like different products. Reports usually came from PDFs and had a more stable taxonomy. News moved faster, had looser structure, and contained more noise. Treating both as one flat vector index made retrieval look broad, but it weakened explainability and consistency.
The questions were also not clean search queries. A user might ask “how has it changed recently?” after several turns. The system needed to infer whether “it” referred to a company, sector, or macro event, and whether “recently” should be interpreted from the question context rather than a fixed default.
The most important constraint was provenance. In this setting, a fluent answer without sources is worse than an incomplete answer, because it creates confidence without a way to audit the claim.
What I Helped Build
I worked on the Q&A flow as a two-stage pipeline: understand the question first, then answer from evidence.
The first stage rewrote the user question into a standalone query. It used recent conversation context to resolve references, made the time range explicit, and extracted entities, keywords, and retrieval tags. This step needed predictable structured output more than long-form reasoning.
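As an illustration of what "predictable structured output" can mean in practice, here is a minimal sketch of validating the rewrite stage's reply before retrieval runs. It assumes the model is asked to return JSON; the field names (standalone_question, time_start, time_end, entities, keywords, tags) and the parse_rewrite helper are hypothetical and not taken from the project.

```python
import json
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RewrittenQuery:
    """Structured output of the rewrite stage (all field names are hypothetical)."""
    standalone_question: str
    time_start: date
    time_end: date
    entities: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


def parse_rewrite(raw: str) -> RewrittenQuery:
    """Validate the model's JSON reply; raise ValueError on anything malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"rewrite output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("rewrite output must be a JSON object")

    question = data.get("standalone_question")
    if not isinstance(question, str) or not question.strip():
        raise ValueError("missing standalone_question")

    # The time range must be explicit ISO dates, never a phrase like "recently".
    try:
        start = date.fromisoformat(data["time_start"])
        end = date.fromisoformat(data["time_end"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"missing or invalid time range: {exc}") from exc
    if start > end:
        raise ValueError("time_start is after time_end")

    def str_list(key: str) -> list[str]:
        value = data.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key} must be a list of strings")
        return value

    return RewrittenQuery(
        standalone_question=question.strip(),
        time_start=start,
        time_end=end,
        entities=str_list("entities"),
        keywords=str_list("keywords"),
        tags=str_list("tags"),
    )
```

Rejecting a malformed reply at this boundary keeps a bad rewrite from silently turning into a bad retrieval, which is one reason to prefer a strict schema over free-form text for this step.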
The retrieval stage used a hybrid strategy. Where the taxonomy was reliable, tag-based retrieval came first because it mapped to how the team already organized material and could be explained clearly. Keyword and vector retrieval filled gaps when the wording was more open or the taxonomy signal was weaker. Before generation, weak evidence was filtered out so the answer model only saw material that could support citations.
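The tag-first ordering and the evidence filter described above can be sketched roughly as follows. This is a toy, in-memory version under stated assumptions: the Doc and Hit types, the scoring rules, the top_k and min_score defaults, and every function name are hypothetical, and keyword_search is only a stand-in for the real keyword and vector retrievers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    tags: frozenset[str]


@dataclass(frozen=True)
class Hit:
    doc: Doc
    score: float
    via: str  # which retriever produced the hit: "tag" or "keyword"


def tag_search(docs: list[Doc], tags: list[str]) -> list[Hit]:
    """Score each document by the fraction of requested tags it carries."""
    wanted = set(tags)
    hits = []
    for d in docs:
        overlap = len(wanted & d.tags)
        if overlap:
            hits.append(Hit(d, overlap / len(wanted), "tag"))
    return hits


def keyword_search(docs: list[Doc], keywords: list[str]) -> list[Hit]:
    """Stand-in for keyword and vector retrieval: fraction of keywords found in the text."""
    terms = [k.lower() for k in keywords]
    hits = []
    for d in docs:
        text = d.text.lower()
        matched = sum(1 for t in terms if t in text)
        if matched:
            hits.append(Hit(d, matched / len(terms), "keyword"))
    return hits


def hybrid_retrieve(docs, tags, keywords, top_k=5, min_score=0.5):
    # 1. Tag retrieval first, where the taxonomy gives a usable signal.
    hits = tag_search(docs, tags) if tags else []
    seen = {h.doc.doc_id for h in hits}
    # 2. Fill gaps with the looser retriever, skipping documents already found.
    if len(hits) < top_k and keywords:
        for h in keyword_search(docs, keywords):
            if h.doc.doc_id not in seen:
                hits.append(h)
                seen.add(h.doc.doc_id)
    # 3. Evidence filter: drop weak hits before anything reaches generation.
    strong = [h for h in hits if h.score >= min_score]
    # Tag hits rank ahead of fallback hits; higher scores first within each group.
    strong.sort(key=lambda h: (h.via != "tag", -h.score))
    return strong[:top_k]
```

Recording which retriever produced each hit (the via field) is what keeps the result explainable: a reviewer can see whether a source arrived through the taxonomy or through the looser fallback.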
Answers were streamed back to the UI with SSE, so users did not have to wait for the full response before seeing progress. Key claims had to point back to a report or news source. When the retrieved material did not support a conclusion, the system refused to answer beyond the evidence instead of filling the gap with a plausible guess.
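A minimal sketch of the streaming and refusal behavior, assuming the answer model yields text chunks and the evidence filter returns a list of source records. The event names (token, refusal, done), the payload shape, and the stream_answer helper are hypothetical; only the frame layout (an event line, a data line, then a blank line) follows the standard SSE format.

```python
import json
from typing import Iterable, Iterator

REFUSAL = "The current material does not support that conclusion."


def sse_frame(event: str, payload: dict) -> str:
    """Format one SSE message; json.dumps keeps the data field on a single line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def stream_answer(chunks: Iterable[str], evidence: list[dict]) -> Iterator[str]:
    """Yield SSE frames: answer tokens first, then one final frame with citations."""
    if not evidence:
        # Refuse instead of generating when nothing survived the evidence filter.
        yield sse_frame("refusal", {"text": REFUSAL})
        yield sse_frame("done", {"citations": []})
        return
    for chunk in chunks:
        yield sse_frame("token", {"text": chunk})
    citations = [{"id": e["id"], "title": e["title"]} for e in evidence]
    yield sse_frame("done", {"citations": citations})
```

In a web framework, this generator would typically be returned as a streaming response with the text/event-stream content type, so the UI can render tokens as they arrive and attach the clickable sources when the final frame lands.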

Tradeoffs
Using two LLM stages added some moving parts, but it separated two different jobs. If an answer was poor, the team could inspect whether the rewrite was wrong, retrieval was off, or generation failed to stay within the evidence.
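One way to make that stage-by-stage inspection concrete is to record each stage's output as the pipeline runs. The sketch below is an assumption about how such tracing could look, not the project's implementation; Trace, StageRecord, and answer_with_trace are hypothetical names, and the three stage functions are passed in by the caller.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class StageRecord:
    name: str
    output: Any
    seconds: float


@dataclass
class Trace:
    records: list[StageRecord] = field(default_factory=list)

    def run(self, name: str, fn: Callable[..., Any], *args: Any) -> Any:
        """Call one stage, keep its output and elapsed time, and return the output."""
        start = time.perf_counter()
        output = fn(*args)
        self.records.append(StageRecord(name, output, time.perf_counter() - start))
        return output


def answer_with_trace(question, history, rewrite, retrieve, generate):
    """Run rewrite -> retrieve -> generate, recording every intermediate result."""
    trace = Trace()
    query = trace.run("rewrite", rewrite, question, history)
    evidence = trace.run("retrieve", retrieve, query)
    answer = trace.run("generate", generate, query, evidence)
    return answer, trace
```

With a trace like this, a poor answer can be read backwards: if the rewritten query already names the wrong entity, the fault is in the first stage; if the query is right but the evidence is off-topic, retrieval is the place to look.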
The project also avoided treating vector similarity as the whole retrieval strategy. In this setting, a mature internal taxonomy contains real workflow knowledge. It should be used when it is reliable, with vector search acting as a supplement rather than a replacement for every structured signal.
Refusal behavior was a product choice, not an error path. For serious knowledge work, “the current material does not support that conclusion” is often more useful than a polished paragraph with weak grounding.
Result
After connecting tens of thousands of internal reports and tens of millions of news items, the system supported follow-up questions, time-scope changes, and topic shifts. Answers began streaming in about 3–5 seconds and completed with citations in roughly a dozen seconds, sustaining 10+ follow-up turns without losing the thread; users could read the conclusion first and then inspect the clickable sources before trusting it.
The practical gain was straightforward: less time spent hunting for material, more time spent judging what the material meant. The system was useful because it put documents, citations, time handling, and refusal boundaries back into the knowledge-work workflow.
Related Links
If you are evaluating an enterprise RAG, knowledge base, AI support, or agent workflow project, contact me by email at contact@aildnc.com. You can also reach me on Telegram at @NieErAI.
Contact
Discuss Similar Work
If you are evaluating a similar document AI, enterprise RAG, knowledge base, or AI workflow project, share the context first. Email works at contact@aildnc.com, and Telegram (@NieErAI) is available for a faster reply.