Query Understanding In Investment Research RAG | Nie Er

Most RAG projects start by tuning retrieval: which embedding model to use, how large each chunk should be, how many results to fetch, whether to add reranking.

Those choices matter. But in real investment research Q&A, I have seen many failures that were not caused by weak vector search. They happened earlier: the system did not understand what the user was asking.

Research users rarely write clean, self-contained prompts. They ask follow-up questions, switch scope, add constraints, and refer back to previous answers:

“What about it recently?”
“How does that compare with last month?”
“Only use overseas institution views.”
“Replace that company with the industry leader.”
“Is there research evidence for this conclusion?”

If these questions are sent directly into a vector database, retrieval may still return plausible text. The final answer can still be wrong, because the retrieval query was underspecified from the start: the referent, time window, or filter the user meant never reached the search.

RAG Does Not Start With Vector Search

For investment research, I think of RAG as a two-stage system.

The first stage is query understanding: converting a natural-language question into a retrieval intent that can be constrained, searched, and audited.

The second stage is retrieval and generation: finding source material and answering within the evidence boundary.

When the first stage is weak, BM25, embeddings, hybrid search, and reranking all become compensating mechanisms. You can tune many parameters, but the behavior remains unstable because the system is optimizing around a misunderstood question.

I prefer to keep query understanding as an explicit module instead of burying it inside the final answer prompt. At minimum, it should produce a rewritten query, time range, entities, topic tags, source types, filters, and any unresolved ambiguity.
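As a rough illustration, the output of such a module could be a small structured record. This is a minimal sketch, not a fixed schema: the class name `RetrievalIntent`, its field names, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class RetrievalIntent:
    """Structured output of query understanding (hypothetical schema)."""

    rewritten_query: str                      # standalone version of the question
    start_date: Optional[date] = None         # lower bound of the time range, if any
    end_date: Optional[date] = None           # upper bound of the time range, if any
    entities: List[str] = field(default_factory=list)       # canonical entity ids
    topic_tags: List[str] = field(default_factory=list)
    source_types: List[str] = field(default_factory=list)   # e.g. "research_report"
    filters: Dict[str, str] = field(default_factory=dict)   # hard metadata filters
    ambiguities: List[str] = field(default_factory=list)    # open questions

    def needs_clarification(self) -> bool:
        # True when at least one ambiguity is still unresolved.
        return bool(self.ambiguities)


# Illustrative values only.
intent = RetrievalIntent(
    rewritten_query="Research views on electric vehicle demand over the last quarter",
    topic_tags=["ev_demand"],
    source_types=["research_report"],
    filters={"institution_region": "overseas"},
    ambiguities=["'last quarter': calendar quarter or trailing 90 days?"],
)
```

Keeping unresolved ambiguity as its own field means the system can ask the user or narrow the answer instead of guessing silently.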

[Figure: Answer and evidence thread in an investment research RAG workflow]

Coreference Determines Whether Follow-Up Questions Work

Investment research conversations are usually multi-turn. A user may start with a company, sector, or macro theme, then continue with “it”, “that logic”, or “the risk you mentioned”.

Humans handle this naturally. A RAG system does not, unless it performs coreference resolution.

Suppose the first question is about why a company’s margin declined. The next question is, “Does it have room to recover this year?” What does “it” refer to: the company, the margin, sector demand, or a cost item mentioned in the previous answer?

The system needs to rewrite the follow-up question into a standalone query before retrieval. Simply passing chat history into the final generation step is not enough. A rewritten query also makes logs and error reviews much clearer: you can inspect what the system thought the user meant.
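The rewrite step can be sketched with a deliberately simple rule-based stand-in. In practice this step would more likely be an LLM call or a coreference model; the function `rewrite_follow_up`, the `Rewrite` record, and the idea that candidate referents are collected from earlier turns are assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Toy rule: only the pronoun "it" is handled here.
PRONOUN = re.compile(r"\bit\b", re.IGNORECASE)


@dataclass
class Rewrite:
    standalone_query: str
    ambiguity: Optional[str] = None  # set when the referent cannot be chosen safely


def rewrite_follow_up(question: str, candidates: List[str]) -> Rewrite:
    """Substitute the referent only when exactly one candidate exists."""
    if not PRONOUN.search(question):
        return Rewrite(question)
    if len(candidates) == 1:
        # A function replacement avoids backslash escapes in the candidate text.
        resolved = PRONOUN.sub(lambda _m: candidates[0], question, count=1)
        return Rewrite(resolved)
    options = ", ".join(candidates) or "nothing in the conversation history"
    return Rewrite(question, ambiguity=f"'it' could refer to: {options}")


result = rewrite_follow_up(
    "Does it have room to recover this year?",
    ["Company A's gross margin"],
)
print(result.standalone_query)
```

When several referents are plausible, the sketch returns the question unchanged together with an ambiguity note, so the uncertainty shows up in the logs rather than being resolved by chance.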

Time Range Is Not A Cosmetic Field

Time expressions in research questions are risky because most of them are relative, and what they mean depends on the topic.

“Recent” does not mean the same thing across topics. For news impact, it may mean a few days. For sector demand, it may mean one to three months. For macro cycles, the answer may need a longer history of data and research views.

If the system does not extract time range explicitly, two common errors appear.

One error is answering a current question with stale material. The answer has citations, but the citations are no longer suitable evidence.

The other error is using only the newest material when the user is asking about change over time. The system answers “what happened today” but misses “what changed compared with before”.

Time range should therefore be part of query understanding: start date, end date, comparison baseline, whether older background material is allowed, and which sources must be recent evidence. This affects both retrieval filters and the final answer boundary.
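A minimal sketch of explicit time-range resolution follows. The window lengths in `RECENT_WINDOW_DAYS`, the topic-type names, and the choice of baseline (an equal-length window immediately before the current one) are assumptions, not recommended values.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed defaults for what "recent" means per topic type.
RECENT_WINDOW_DAYS = {"news_impact": 7, "sector_demand": 90, "macro_cycle": 730}


@dataclass
class TimeRange:
    start: date
    end: date
    baseline_start: Optional[date] = None  # comparison window, if change is asked for
    baseline_end: Optional[date] = None
    allow_older_background: bool = False   # may older material be used as background?


def resolve_recent(topic_type: str, today: date, compare: bool = False) -> TimeRange:
    """Turn "recent" into explicit dates, with an optional comparison baseline."""
    days = RECENT_WINDOW_DAYS[topic_type]
    start = today - timedelta(days=days)
    time_range = TimeRange(start=start, end=today)
    if compare:
        # Baseline: a window of the same length ending the day before `start`.
        time_range.baseline_end = start - timedelta(days=1)
        time_range.baseline_start = time_range.baseline_end - timedelta(days=days)
    return time_range


news = resolve_recent("news_impact", date(2024, 6, 30), compare=True)
print(news.start, news.end, news.baseline_start, news.baseline_end)
```

The same record can drive both the retrieval filter (`start`, `end`) and the answer boundary (whether a baseline exists and whether older background is allowed).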

Entity Recognition Is Harder Than Keyword Matching

Financial entities are not just company names.

A single question can include listed companies, sectors, products, indexes, regions, policy bodies, macro indicators, and asset classes. The same entity may appear as a formal name, abbreviation, English name, ticker, or relationship in an internal taxonomy.

If retrieval relies only on the raw user sentence, it often returns passages that are semantically similar but attached to the wrong entity. This is especially common within the same sector, where many companies share language such as margin pressure, inventory digestion, or demand recovery.

Entity recognition is not about making the pipeline look structured. It tells the system which parts of the query require exact constraints and which parts can be expanded semantically.

Company names, funds, indexes, and macro indicators usually need stricter matching. Themes, risks, and investment logic can use semantic retrieval. Treating all of them the same makes retrieval look broad, but the answer becomes less trustworthy.
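One way to picture the split between exact constraints and semantic text is an alias table lookup. This is a sketch under assumptions: the alias table `ALIASES`, the entity "ACME" and its ticker, and the set `EXACT_MATCH_TYPES` are made up for illustration, and a real system would draw on an internal taxonomy rather than a hard-coded dictionary.

```python
import re
from typing import Set, Tuple

# Hypothetical alias table: surface form (lower case) -> (canonical id, entity type).
ALIASES = {
    "acme motors": ("ACME", "company"),
    "acme": ("ACME", "company"),
    "acme.x": ("ACME", "company"),  # made-up ticker
    "cpi": ("CPI", "macro_indicator"),
    "consumer price index": ("CPI", "macro_indicator"),
}
EXACT_MATCH_TYPES = {"company", "fund", "index", "macro_indicator"}


def split_constraints(query: str) -> Tuple[Set[str], str]:
    """Return (entity ids needing exact match, remaining text for semantic search)."""
    exact_ids: Set[str] = set()
    remaining = query
    # Longest aliases first, so "acme motors" wins over "acme".
    for alias in sorted(ALIASES, key=len, reverse=True):
        pattern = re.compile(rf"(?<!\w){re.escape(alias)}(?!\w)", re.IGNORECASE)
        if pattern.search(remaining):
            canonical, entity_type = ALIASES[alias]
            if entity_type in EXACT_MATCH_TYPES:
                exact_ids.add(canonical)
                remaining = pattern.sub(" ", remaining)
    semantic_text = " ".join(remaining.split())  # collapse leftover whitespace
    return exact_ids, semantic_text


ids, text = split_constraints("Acme Motors margin pressure versus CPI trend")
print(ids, "|", text)
```

The entity ids become filters, while the remaining phrase ("margin pressure", "trend") is what gets embedded or keyword-matched.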

Tag Filters Are Often More Explainable Than Pure Vectors

Many financial organizations already maintain useful metadata: sector, asset class, region, report type, research topic, publication date, and source level.

These tags should not be treated as display-only metadata. They should participate in retrieval.

When a user says “only use overseas institution views”, that is not a semantic similarity request. It is a filter. When the user asks for recent research views on electric vehicle demand over the last quarter, the query contains time, sector, topic, and source-type constraints. Pure vector search may retrieve related text, but it cannot reliably enforce all of those conditions.

Tag filtering also improves explainability. The system can show which topics, time windows, and source types were used. For research users, that is often more useful than saying the passages had high similarity scores.

In practice, investment research RAG is usually more stable with hybrid retrieval: entities and tags narrow the search space, keyword and vector search expand recall, and reranking is applied after the candidate set is already sane.
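The ordering described above (narrow first, then score) can be sketched on toy data. Everything here is an assumption for illustration: the `Doc` fields, the two-dimensional vectors standing in for real embeddings, and the equal 0.5 weights. The final sort is a simple score fusion, not a trained reranker.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Doc:
    doc_id: str
    text: str
    tags: Dict[str, str]      # e.g. {"source_region": "overseas", "sector": "ev"}
    published: date
    embedding: List[float]    # toy vector; a real system would use an embedding model


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    # Fraction of query terms that appear in the document text.
    q_terms, t_terms = set(query.lower().split()), set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def hybrid_search(docs, query, query_vec, filters, start, end, top_k=3):
    # 1. Hard constraints: time window and tag filters narrow the candidate set.
    candidates = [
        d for d in docs
        if start <= d.published <= end
        and all(d.tags.get(k) == v for k, v in filters.items())
    ]
    # 2. Keyword and vector scores are blended (weights are assumptions).
    scored = [
        (0.5 * keyword_score(query, d.text) + 0.5 * cosine(query_vec, d.embedding), d)
        for d in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


docs = [
    Doc("a", "EV demand recovery in Europe", {"source_region": "overseas"}, date(2024, 5, 10), [1.0, 0.0]),
    Doc("b", "EV demand recovery outlook", {"source_region": "domestic"}, date(2024, 5, 12), [1.0, 0.0]),
    Doc("c", "Battery cost trends", {"source_region": "overseas"}, date(2024, 5, 1), [0.0, 1.0]),
    Doc("d", "EV demand recovery", {"source_region": "overseas"}, date(2023, 1, 1), [1.0, 0.0]),
]
hits = hybrid_search(
    docs, "ev demand recovery", [1.0, 0.0],
    {"source_region": "overseas"}, date(2024, 4, 1), date(2024, 6, 30),
)
print([d.doc_id for d in hits])
```

Documents "b" and "d" are textually as close to the query as "a", but they never reach scoring because they fail the source filter and the time window respectively.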

[Figure: Evidence citation example in an investment research RAG workflow]

Evidence Boundaries Must Be Set Before Generation

The hard part is not that the model cannot answer. The hard part is that it can answer fluently even when the evidence does not support the answer.

When retrieval is weak, the model can still write a smooth industry view. When evidence covers only one company, it may generalize to the whole sector. When the source material is news rather than research reports, it may still phrase the answer as if it came from analyst research.

This cannot be fixed by adding “do not hallucinate” to the prompt. The system needs to establish the evidence boundary before generation:

Can the retrieved evidence answer this question?
Does the evidence cover a company, sector, or macro theme?
Are the sources research reports, news, announcements, or internal summaries?
Is there enough recent material?
Which conclusions can only be used as background?

If the boundary is weak, the answer should shrink. Sometimes it should say that there is not enough evidence. The value of RAG is not making the model always speak; it is making clear which statements are supported.
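A pre-generation boundary check along these lines might look like the sketch below. The `Evidence` fields, the rule that off-scope or off-type material counts as background only, and the threshold `min_recent=2` are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Evidence:
    source_type: str   # "research_report", "news", "announcement", "internal_summary"
    scope: str         # "company", "sector", or "macro"
    published: date


@dataclass
class Boundary:
    answerable: bool
    notes: List[str]


def check_boundary(evidence, required_scope, required_source, recent_after, min_recent=2):
    """Decide whether retrieved evidence can support an answer before generating."""
    notes = []
    # Only evidence with the right scope and source type can support conclusions.
    usable = [
        e for e in evidence
        if e.scope == required_scope and e.source_type == required_source
    ]
    recent = [e for e in usable if e.published >= recent_after]
    if len(usable) < len(evidence):
        notes.append(f"{len(evidence) - len(usable)} item(s) usable as background only")
    if len(recent) < min_recent:
        notes.append("not enough recent evidence")
    return Boundary(answerable=len(recent) >= min_recent, notes=notes)


evidence = [
    Evidence("research_report", "sector", date(2024, 6, 1)),
    Evidence("research_report", "sector", date(2024, 5, 20)),
    Evidence("news", "company", date(2024, 6, 10)),
]
boundary = check_boundary(evidence, "sector", "research_report", date(2024, 5, 1))
print(boundary)
```

The notes are meant to travel with the answer, so a user can see that one company-level news item was treated as background rather than as sector research.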

A More Useful Query Pipeline

In enterprise projects, I prefer an investment research RAG pipeline closer to this:

Read the current question and the necessary conversation history.
Resolve coreference and rewrite the question as a standalone query.
Extract time range, entities, topics, source types, and filters.
Decide which constraints require exact matching and which can be expanded semantically.
Run tag, keyword, and vector retrieval strategies in parallel.
Deduplicate, rank, and check the evidence boundary.
Generate the answer only from qualified evidence, with citations.

This is more work than “user question -> vector search -> LLM answer”. But it matches how research users actually interact with the system.
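The steps above can be sketched as a small orchestration function. Each stage is passed in as a callable, so nothing here assumes a particular model or search engine; the name `run_pipeline`, the stage signatures, and the use of document ids as retrieval results are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(question, history, *, rewrite, extract, retrievers, check, generate):
    """Wire the stages together; every stage is an injected callable."""
    standalone = rewrite(question, history)   # resolve coreference, rewrite
    intent = extract(standalone)              # time range, entities, tags, filters

    # Run the retrieval strategies (tag, keyword, vector, ...) in parallel.
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda retriever: retriever(intent), retrievers))

    # Deduplicate by document id, keeping first-seen order.
    seen, merged = set(), []
    for results in result_lists:
        for doc_id in results:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)

    qualified = check(intent, merged)         # evidence-boundary check
    if not qualified:
        # Shrink the answer instead of generating without support.
        return {"query": standalone, "answer": None, "citations": []}
    return {
        "query": standalone,
        "answer": generate(standalone, qualified),
        "citations": qualified,
    }


# Stub stages, for illustration only.
out = run_pipeline(
    "Does it recover?",
    ["Why did ACME margin fall?"],
    rewrite=lambda q, h: "Does ACME margin recover?",
    extract=lambda q: {"query": q},
    retrievers=[lambda i: ["d1", "d2"], lambda i: ["d2", "d3"]],
    check=lambda i, doc_ids: doc_ids[:2],
    generate=lambda q, ev: f"{q} -> {len(ev)} sources",
)
print(out)
```

Returning the rewritten query alongside the answer keeps the system's interpretation of the question visible for logging and error review.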

My View

Investment research RAG is not mainly about putting documents into a vector database. It is about turning an ambiguous natural-language question into executable retrieval constraints.

Vector search answers one question: which passages are semantically close? Real research Q&A also needs to answer: what does “it” refer to, how long is “recent”, which source type should be used, which entities must match, and how far the evidence can support the conclusion.

If these parts are not explicit, a RAG demo may look fine while real conversations become unreliable. The more naturally users speak, the more the system exposes weaknesses in query understanding.

So when I evaluate investment research RAG, I start with one question: does the system actually understand the query, or does it only search for similar text?

If you are evaluating an enterprise RAG, knowledge base, AI support, or agent workflow project, contact me by email at contact@aildnc.com. You can also reach me on Telegram at @NieErAI.