Project
Macro Analysis and Asset Allocation Report Pipeline
A macro research pipeline that turns reports, news, policy signals, and economic indicators into traceable asset allocation reference reports.
This project is an internal engineering pipeline for macro research and advisory workflows. It is not a public SaaS product or a standalone demo. The system takes research reports, news, policy text, and macro indicators, processes them through several specialized modules, and produces a structured asset allocation reference report with traceable source material.
The goal was not to ask an LLM to write a market commentary from a pile of documents. The useful part was turning noisy daily material into a repeatable workflow: every insight can be traced back to its raw document, indicator-level insights can be reviewed, policy signals have historical continuity, and the final report follows a shape that research and advisory teams can use every day.
Problem
Macro research is hard to automate because the inputs do not share a common format or rhythm. Research reports arrive as PDFs, news is a continuous text stream, and indicators such as GDP, PMI, inflation, employment, central bank activity, and funding conditions are published on different cycles. Feeding all of that directly into a model usually produces fluent prose, but it is difficult to audit and inconsistent from day to day.
The pipeline was designed around three practical requirements:
- Convert unstructured material into searchable, grouped, and traceable intermediate data.
- Turn policy, inflation, employment, sector, and liquidity signals into stable indicator-level insights.
- Force the final report to include both bullish and bearish reasoning, so the output does not collapse into one-sided commentary.
Stack
The implementation used Python data jobs, PDF OCR and Markdown conversion, LLM-based extraction and summarization, structured prompts, scheduled tasks, database persistence, time-series smoothing, and report templating.
LLMs were treated as task-specific components rather than a single chat endpoint. Separate steps handled topic labeling, date extraction, policy stance judgment, indicator insight generation, consensus and disagreement synthesis, and final report writing. Each step had its own prompt and output contract, which made the system easier to test and adjust.
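As an illustration of what a per-step output contract can look like, here is a minimal sketch for the policy stance step. It assumes the model is asked to return JSON. The field names (`stance`, `rationale`, `source_id`) and the function `parse_policy_stance` are hypothetical and not taken from the actual pipeline.

```python
import json
from dataclasses import dataclass

# Hypothetical label set; it mirrors the categories shown to end users.
ALLOWED_STANCES = {"easing", "neutral", "tightening"}
REQUIRED_FIELDS = {"stance", "rationale", "source_id"}


@dataclass(frozen=True)
class PolicyStanceResult:
    stance: str
    rationale: str
    source_id: str


def parse_policy_stance(raw: str) -> PolicyStanceResult:
    """Validate raw model output against the step's contract."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["stance"] not in ALLOWED_STANCES:
        raise ValueError(f"unexpected stance: {data['stance']!r}")
    return PolicyStanceResult(data["stance"], data["rationale"], data["source_id"])
```

Rejecting malformed output at the step boundary means a prompt regression shows up as a failed task rather than as a subtly wrong sentence in the final report.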
Architecture
The pipeline was split into four layers: raw material, signals, aggregation, and reporting.
The raw material layer converts PDF reports into Markdown, extracts titles, dates, topic labels, and source metadata, and stores news, policy text, and macro indicators with a consistent metadata shape. Traceability is a core requirement here: every downstream insight should be able to point back to the source text.
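A consistent metadata shape with a pointer back to the source can be sketched as follows. This is an assumed layout for illustration only: the class names, fields, and the character-offset span are hypothetical, and the real storage schema may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Tuple


@dataclass(frozen=True)
class SourceDocument:
    doc_id: str
    source_type: str          # e.g. "report", "news", "policy", "indicator"
    title: str
    published: date
    topics: Tuple[str, ...]
    markdown: str             # text after PDF-to-Markdown conversion


@dataclass(frozen=True)
class Insight:
    indicator: str
    text: str
    source_doc_id: str
    source_span: Tuple[int, int]   # character offsets into the source Markdown


def trace(insight: Insight, docs: Dict[str, SourceDocument]) -> str:
    """Return the source passage that an insight points back to."""
    start, end = insight.source_span
    return docs[insight.source_doc_id].markdown[start:end]
```

Storing an offset span rather than a copied quote keeps the insight small and makes it easy to show the surrounding context during review.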
The signal layer runs several focused jobs, including policy stance, overseas macro, domestic macro, inflation and employment, sector changes, central bank operations, and funding conditions. Each job works on a narrow input scope and emits structured fields instead of writing directly into the final report.
The aggregation layer reduces multiple raw observations about the same indicator into a smaller set of reviewable insights. It explicitly separates consensus from disagreement, and filters local or low-relevance noise before the information enters the macro view.
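The aggregation idea can be sketched in a few lines. This version uses a simple majority-share rule, which is an assumption made for illustration and not necessarily the rule the pipeline applies. The `Observation` fields and both thresholds are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Observation:
    indicator: str
    direction: str     # e.g. "up", "down", "flat"
    relevance: float   # 0.0 to 1.0, assumed to come from an earlier step
    source_id: str


def aggregate(observations: Iterable[Observation],
              min_relevance: float = 0.5,
              consensus_share: float = 0.75) -> Dict[str, dict]:
    """Group observations by indicator and mark consensus or disagreement."""
    grouped = defaultdict(list)
    for obs in observations:
        if obs.relevance >= min_relevance:   # drop low-relevance noise
            grouped[obs.indicator].append(obs)

    result = {}
    for indicator, group in grouped.items():
        counts = Counter(o.direction for o in group)
        _, top_count = counts.most_common(1)[0]
        share = top_count / len(group)
        result[indicator] = {
            "status": "consensus" if share >= consensus_share else "disagreement",
            "directions": dict(counts),
            "sources": sorted(o.source_id for o in group),
        }
    return result
```

Keeping the per-direction counts and source ids in the output lets a reviewer see who disagrees, not only that a disagreement exists.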
The reporting layer reads the outputs from previous modules and builds the asset allocation reference report. Its structure is constrained around asset classes, positive and negative arguments, and allocation suggestions, which helps avoid daily swings caused by news sentiment alone.
My Role
I worked on the pipeline design and several core modules: PDF report processing, LLM extraction tasks, policy signal scoring, indicator insight aggregation, report structure constraints, and end-to-end persistence.
For policy signals, I separated internal continuous scoring from external qualitative presentation. The system converts policy text and market interpretation into directional signals, then aggregates and smooths those signals into a continuous internal time series. End users see broader categories such as easing, neutral, or tightening, which is less misleading than exposing internal numeric scores as if they were precise forecasts.
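The split between internal score and external label can be illustrated with a small mapping function. The sign convention (positive means easing), the score range, and the neutral band width are all assumptions chosen for this sketch.

```python
def stance_label(score: float, neutral_band: float = 0.2) -> str:
    """Map an internal continuous score to a broad qualitative category.

    Assumes scores roughly in [-1, 1], with positive values meaning easing.
    """
    if score > neutral_band:
        return "easing"
    if score < -neutral_band:
        return "tightening"
    return "neutral"
```

With this shape, a move from 0.31 to 0.35 does not change what the end user sees, which is the intended effect: small numeric changes are not presented as meaningful shifts.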
For the final report, I focused on two constraints. Each asset class had to include both supportive and opposing arguments. Market-style mappings were also encoded into the structure, reducing cases where the model gave similar recommendations under conflicting market conditions.
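The two-sided constraint can be enforced with a structural check before the report is rendered. The sketch below is hypothetical: `AssetSection` and `validate_report` are illustrative names, and the real report template likely carries more fields.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class AssetSection:
    asset_class: str
    supportive: List[str] = field(default_factory=list)
    opposing: List[str] = field(default_factory=list)
    suggestion: str = ""


def validate_report(sections: Iterable[AssetSection]) -> List[str]:
    """Return a list of structural problems; an empty list means the report passes."""
    problems = []
    for section in sections:
        if not section.supportive:
            problems.append(f"{section.asset_class}: no supportive argument")
        if not section.opposing:
            problems.append(f"{section.asset_class}: no opposing argument")
        if not section.suggestion.strip():
            problems.append(f"{section.asset_class}: no allocation suggestion")
    return problems
```

A failed check can send the section back for regeneration instead of letting a one-sided section reach the final report.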
Hard Parts
The first challenge was noise. On a given day, several sources may discuss the same topic from opposite angles. A plain summary tends to blur these disagreements into generic language. The pipeline keeps source direction and context at the indicator level, then marks consensus and disagreement during aggregation instead of forcing everything into a neutral sentence.
The second challenge was stability. A single policy headline should not make the whole report reverse tone unless the underlying signal is strong enough. The internal scoring process therefore uses historical smoothing, balancing new information with recent state.
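One common way to balance new information against recent state is exponential smoothing. The source does not say which smoothing method the pipeline uses, so the sketch below is an assumption, as are the function name, the `alpha` value, and the sign convention (positive means easing).

```python
from typing import Iterable, List


def smooth_signals(signals: Iterable[float],
                   alpha: float = 0.2,
                   initial: float = 0.0) -> List[float]:
    """Exponentially smooth a sequence of daily directional signals.

    alpha is the weight on the newest signal; (1 - alpha) stays on prior state.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    state = initial
    series = []
    for signal in signals:
        state = alpha * signal + (1.0 - alpha) * state
        series.append(state)
    return series


# Five mildly tightening days followed by one strong easing headline.
series = smooth_signals([-0.5] * 5 + [1.0])
```

In this example the last value moves toward zero but stays negative, so a single headline softens the internal reading without reversing it. A lower `alpha` makes the series more stable and slower to react.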
The third challenge was data frequency. Not every macro indicator needs to be processed every day. The jobs are triggered according to data release rhythm and task type, which reduces wasted model calls and keeps the workflow closer to how macro research is actually done.
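Release-driven triggering can be sketched as a small rule table that is checked once per nightly run. The job names and cadences below are placeholders for illustration and do not reflect the actual release calendar of any indicator.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class JobRule:
    name: str
    cadence: str                        # "daily", "weekly", or "monthly"
    weekday: Optional[int] = None       # 0 = Monday, used when cadence is "weekly"
    day_of_month: Optional[int] = None  # used when cadence is "monthly"


def is_due(rule: JobRule, today: date) -> bool:
    """Decide whether a job should run on the given date."""
    if rule.cadence == "daily":
        return True
    if rule.cadence == "weekly":
        return today.weekday() == rule.weekday
    if rule.cadence == "monthly":
        return today.day == rule.day_of_month
    raise ValueError(f"unknown cadence: {rule.cadence!r}")


def due_jobs(rules: Iterable[JobRule], today: date) -> List[str]:
    return [rule.name for rule in rules if is_due(rule, today)]


# Placeholder rules, not real release dates.
RULES = [
    JobRule("news_insights", "daily"),
    JobRule("weekly_liquidity_review", "weekly", weekday=0),
    JobRule("monthly_indicator_refresh", "monthly", day_of_month=15),
]
```

Jobs that are not due are skipped entirely, so no model calls are spent on indicators that have no new data.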
Delivery Shape
The delivered system is a deployable backend pipeline and report generation workflow. The full pipeline runs nightly from roughly 00:30 to 05:00, so the report is ready before 5 a.m. It covers 23+ macro indicators (about 14 domestic activity indicators and about 9 U.S. macro indicators) and reduces thousands of news insights to about a hundred indicator-level insights per day. It reads source data in scheduled batches, writes intermediate insights to storage, and assembles the final report from structured module outputs.
The main outputs are:
- Report and news insights traceable to source documents.
- Daily or periodic views grouped by macro indicator.
- An internal time series for policy stance.
- A fixed-format asset allocation reference report.
- Intermediate tables, logs, and task records for troubleshooting.
Links
This project depends on private data sources, client-side workflow details, and internal report templates, so there is no public repository or demo URL. Public profile link: GitHub profile.
If you are evaluating an enterprise RAG, knowledge base, AI support, or agent workflow project, contact me by email at contact@aildnc.com. You can also reach me on Telegram at @NieErAI.