Project
Financial Contract Extraction And Calendar Rule Engine
An engineering pipeline that extracts asset-management contract clauses into standard rules and generates daily subscription/redemption calendars.
This project is an engineering pipeline for turning long asset-management contracts into standard business rules and daily transaction-calendar output. The input is a fund or asset-management contract. The output is a reviewable set of fee rules, performance-compensation rules, subscription/redemption rules, and daily open states.
It is not a public SaaS product and has no public repository or hosted demo. I am documenting it here as a sanitized engineering project, focused on architecture and tradeoffs.
Problem
In this workflow, contract extraction is not summarization. It prepares structured rules for downstream business systems.
The system had to handle several constraints:
- A 100k-character contract could not be sent to the model in one call under the internal compute environment.
- Fee and performance-compensation clauses include tier tables, implicit ranges, Chinese numeric expressions, percentages, and field-boundary issues.
- Subscription and redemption days must become one of nine cycle rule types.
- Postponement, advancement, business days, trading days, and continuous market-closed blocks affect the final calendar.
- Every prompt, chunking, or cleanup change needs field-level regression visibility.
Stack
The implementation is mainly Python. LLM calls handle extraction, while the surrounding system handles chunking, modular prompts, candidate fusion, enum and numeric cleanup, standard rule objects, transaction-calendar rules, parameter-fingerprint caching, and offline EDD evaluation.
There is no public repository URL or demo URL. Public contact points are available through the GitHub profile and this site.
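The parameter-fingerprint caching mentioned above can be sketched roughly as follows. This is a minimal illustration rather than the project's code: the names `fingerprint` and `FingerprintCache`, and the choice of which parameters go into the key, are assumptions. The idea is to hash a canonical JSON form of the run parameters with MD5 so that an unchanged parameter combination reuses the earlier result.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def fingerprint(params: Dict[str, Any]) -> str:
    """MD5 hex digest of a canonical JSON dump, so key order does not matter."""
    canonical = json.dumps(
        params, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


class FingerprintCache:
    """In-memory cache keyed by parameter fingerprint (hypothetical helper)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get_or_compute(self, params: Dict[str, Any], compute: Callable[[], Any]) -> Any:
        key = fingerprint(params)
        if key not in self._store:
            # Only run the expensive step when this parameter set is new.
            self._store[key] = compute()
        return self._store[key]
```

MD5 is used here only as a cheap content key, not for security.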
Architecture
contract text
-> chunk localization (chunk=8000, overlap=800)
-> 5 extraction modules
-> dual-prompt candidates for high-risk modules
-> candidate fusion and deterministic cleanup
-> standard rule VO
-> transaction-calendar rule engine
-> daily subscription/redemption open states
-> EDD field-level evaluation and regression comparison
The extraction side and calendar side are decoupled through a standard rule object. The calendar engine consumes flat rule fields rather than the internal model-output shape.
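The chunking step at the top of the pipeline (chunk=8000, overlap=800) can be illustrated with a simple sliding window. This is a sketch under assumptions: the function name `chunk_text` and the character-based splitting are mine, and the real localization step may also use section boundaries or keywords.

```python
from typing import List, Tuple


def chunk_text(text: str, chunk_size: int = 8000, overlap: int = 800) -> List[Tuple[int, str]]:
    """Split text into overlapping windows, returned as (start_offset, chunk) pairs."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks: List[Tuple[int, str]] = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

With these defaults, each window advances by 7,200 characters, so a clause that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the 800-character overlap.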
Core Modules
1. Contract extraction
The contract is split into five modules: fees, performance compensation, subscription liquidity, redemption liquidity, and other fields. Higher-risk modules use two prompt styles, with up to eight parallel LLM calls per case.
The point is better coverage for tier tables, numeric clauses, and liquidity fields. Dual prompts improved results by about 3 percentage points compared with the single-prompt baseline, at a higher token cost.
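The fan-out described above might look roughly like this. Everything named here is hypothetical: the module keys, the two style labels, which modules count as high-risk, and the `call_llm` callable that stands in for the real model client. The sketch plans one call per module, doubles it for high-risk modules, runs the calls in parallel, and keeps the more complete candidate per module.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

Candidate = Dict[str, Any]


def plan_calls(modules: Sequence[str], high_risk: Iterable[str],
               styles: Tuple[str, str] = ("style_a", "style_b")) -> List[Tuple[str, str]]:
    """One (module, style) call per module, two for high-risk modules."""
    risky = set(high_risk)
    calls = []
    for module in modules:
        for style in (styles if module in risky else styles[:1]):
            calls.append((module, style))
    return calls


def completeness(candidate: Candidate) -> int:
    """Number of fields that carry a non-empty value."""
    return sum(1 for v in candidate.values() if v not in (None, "", [], {}))


def extract_all(text: str, call_llm: Callable[[str, str, str], Candidate],
                modules: Sequence[str], high_risk: Iterable[str],
                max_workers: int = 8) -> Dict[str, Candidate]:
    calls = plan_calls(modules, high_risk)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(lambda c: call_llm(c[0], c[1], text), calls))
    best: Dict[str, Candidate] = {}
    for (module, _style), candidate in zip(calls, outputs):
        # Keep the candidate with more filled fields; ties keep the first style.
        if module not in best or completeness(candidate) > completeness(best[module]):
            best[module] = candidate
    return best
```

With five modules of which three are treated as high-risk, `plan_calls` yields eight calls, which matches the upper bound mentioned above.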
2. Fusion and cleanup
Model output does not go straight into business rules. The cleanup layer handles:
- Enum matching, such as T+2 / T+2日 / N+2 -> n2.
- Numeric parsing for Chinese percentage and positive-return expressions.
- KV shape normalization for bare values.
- paramList merging by left-bound key.
- Candidate selection based on field completeness.
The goal is not to make the model smarter. It is to make the output more stable.
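Two of the cleanup steps listed above, enum matching and percentage parsing, can be sketched with deterministic rules. This is an illustration only: the function names, the accepted suffixes, and the limit to Chinese numerals below 100 are assumptions, and positive-return expressions are not covered.

```python
import re
from decimal import Decimal
from typing import Optional

# Accepts T+2, T+2日, N+2 (half- or full-width plus sign); the suffix list is an assumption.
_SETTLE_RE = re.compile(r"^\s*[TtNn]\s*[+＋]\s*(\d+)\s*(?:日|个?工作日|个?交易日)?\s*$")
_CN_DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
              "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}


def normalize_settlement(raw: str) -> Optional[str]:
    """Map settlement-day spellings to an enum key such as 'n2'."""
    m = _SETTLE_RE.match(raw)
    return "n" + str(int(m.group(1))) if m else None


def _cn_to_decimal(text: str) -> Decimal:
    """Chinese numeral below 100, with an optional 点 fraction, as a Decimal."""
    whole, _, frac = text.partition("点")
    if "十" in whole:
        tens, _, ones = whole.partition("十")
        value = (_CN_DIGITS[tens] if tens else 1) * 10 + (_CN_DIGITS[ones] if ones else 0)
    else:
        value = 0
        for ch in whole:
            value = value * 10 + _CN_DIGITS[ch]
    frac_digits = "".join(str(_CN_DIGITS[ch]) for ch in frac)
    return Decimal(f"{value}.{frac_digits or '0'}")


def parse_rate(raw: str) -> Optional[Decimal]:
    """Parse '1.5%', '3‰', '百分之一点五' or '千分之三' into a fractional rate."""
    s = raw.strip().replace("％", "%")
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([%‰])", s)
    if m:
        return Decimal(m.group(1)) / (100 if m.group(2) == "%" else 1000)
    m = re.fullmatch(r"([百千])分之(.+)", s)
    if not m:
        return None
    body = m.group(2)
    try:
        number = Decimal(body) if re.fullmatch(r"\d+(?:\.\d+)?", body) else _cn_to_decimal(body)
    except KeyError:
        return None  # unknown character: leave the field for human review
    return number / (100 if m.group(1) == "百" else 1000)
```

Returning `None` instead of guessing keeps unparseable values visible for review rather than silently turning them into rule fields.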
3. Transaction-calendar rule engine
The calendar engine supports nine cycle types:
- monthly nth day
- quarterly nth day
- weekly weekday
- monthly nth-week weekday
- monthly reverse nth day
- quarterly reverse nth day
- yearly month-day
- yearly nth-week weekday
- yearly month reverse nth day
All cycle types first produce calendar-day candidates, then go through trading-calendar and holiday alignment. openDay=32 means the last day of the month.
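As a small illustration of the calendar-day candidate step, here is a sketch of two of the nine cycle types. The function names are hypothetical, and two behaviors are assumptions rather than documented rules: an nth day that does not exist in a given month yields no candidate, and reverse day 1 means the last day of the month.

```python
import calendar
from datetime import date
from typing import Optional

LAST_DAY_SENTINEL = 32  # openDay=32 means the last day of the month


def monthly_nth_day(year: int, month: int, open_day: int) -> Optional[date]:
    """Calendar-day candidate for the 'monthly nth day' cycle type."""
    last = calendar.monthrange(year, month)[1]  # number of days in the month
    if open_day == LAST_DAY_SENTINEL:
        return date(year, month, last)
    if 1 <= open_day <= last:
        return date(year, month, open_day)
    return None  # assumption: e.g. day 31 in a 30-day month gives no candidate


def monthly_reverse_nth_day(year: int, month: int, n: int) -> Optional[date]:
    """Candidate for 'monthly reverse nth day'; n=1 is taken as the last day."""
    last = calendar.monthrange(year, month)[1]
    if 1 <= n <= last:
        return date(year, month, last - n + 1)
    return None
```

These functions only produce calendar-day candidates. Whether a candidate is a trading day is decided later by the alignment step.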
4. Holiday alignment
Postponement and advancement share one alignment algorithm. The system identifies the continuous market-closed block containing each candidate date. Multiple candidates inside the same block are sorted and indexed. Candidate i maps to the ith trading day after the holiday for postponement, or before it for advancement.
This prevents several open days from collapsing into the same post-holiday trading day.
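The block-based alignment can be sketched as follows. This is a simplified reading of the description above, not the production code: the function names are hypothetical, the trading calendar is modeled as a plain set of dates, and for advancement I assume candidates are indexed from the end of the block so that their relative order is preserved.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, List, Optional, Set, Tuple

ONE_DAY = timedelta(days=1)


def closed_block(day: date, trading_days: Set[date], max_walk: int = 400) -> Tuple[date, date]:
    """First and last day of the continuous closed block containing `day`."""
    start = end = day
    for _ in range(max_walk):
        if (start - ONE_DAY) in trading_days:
            break
        start -= ONE_DAY
    for _ in range(max_walk):
        if (end + ONE_DAY) in trading_days:
            break
        end += ONE_DAY
    return start, end


def nth_trading_day(anchor: date, n: int, step: int, trading_days: Set[date],
                    max_walk: int = 400) -> Optional[date]:
    """n-th trading day strictly after (step=1) or before (step=-1) `anchor`."""
    current, found = anchor, 0
    for _ in range(max_walk):
        current += step * ONE_DAY
        if current in trading_days:
            found += 1
            if found == n:
                return current
    return None


def align(candidates: Iterable[date], trading_days: Set[date],
          postpone: bool = True) -> Dict[date, Optional[date]]:
    result: Dict[date, Optional[date]] = {}
    blocks: Dict[Tuple[date, date], List[date]] = defaultdict(list)
    for c in sorted(set(candidates)):
        if c in trading_days:
            result[c] = c  # already a trading day, nothing to align
        else:
            blocks[closed_block(c, trading_days)].append(c)
    for (start, end), members in blocks.items():
        ordered = members if postpone else list(reversed(members))
        anchor, step = (end, 1) if postpone else (start, -1)
        for i, c in enumerate(ordered, start=1):
            result[c] = nth_trading_day(anchor, i, step, trading_days)
    return result
```

For example, if the market is closed from 1 October to 7 October and the candidates are 1, 3, and 5 October, postponement maps them to the first, second, and third trading days after 7 October instead of all three landing on the same date.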
5. Offline EDD evaluation
The evaluation setup includes L0-single, L1-dual, and L2-dual-chunk ablation levels. It checks expected-null and spurious outputs, then uses multi-model consistency to separate likely model variance from extraction difficulty or prompt/gold-label issues.
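The field-level part of this evaluation can be illustrated with a small comparison routine. The outcome labels, function names, and sample field keys below are hypothetical, and the multi-model consistency step is not covered. The sketch classifies each field against a gold label and then lists the fields whose outcome changed between two runs.

```python
from typing import Any, Dict, Optional, Tuple


def classify_field(gold: Optional[Any], pred: Optional[Any]) -> str:
    """Outcome of one field against its gold label."""
    if gold is None and pred is None:
        return "expected_null"  # correctly left empty
    if gold is None:
        return "spurious"       # model produced a value that should not exist
    if pred is None:
        return "missing"
    return "match" if gold == pred else "mismatch"


def evaluate_case(gold: Dict[str, Any], pred: Dict[str, Any]) -> Dict[str, str]:
    """Field name -> outcome, over the union of gold and predicted fields."""
    fields = sorted(set(gold) | set(pred))
    return {f: classify_field(gold.get(f), pred.get(f)) for f in fields}


def compare_runs(baseline: Dict[str, str], candidate: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Fields whose outcome differs between two runs, as (before, after)."""
    return {
        f: (baseline[f], candidate[f])
        for f in baseline
        if f in candidate and baseline[f] != candidate[f]
    }
```

Comparing two ablation levels this way shows which fields a prompt or chunking change fixed and which it broke, rather than only reporting one aggregate score.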
My Role
I worked on:
- The five-module extraction flow and dual-prompt strategy.
- Candidate fusion, enum cleanup, numeric normalization, and KV shape repair.
- The standard rule object between extraction and calendar generation.
- The nine open-day cycle implementations.
- Continuous market-closed block alignment for postponement and advancement.
- MD5 parameter-fingerprint caching.
- EDD ablation evaluation and multi-model error attribution.
Business-rule semantics were confirmed with product stakeholders, so I describe those as shared rule decisions rather than individual business authority.
Challenges And Boundaries
The internal environment could not simply send the full 100k-character contract into the model, so chunk localization was a core design choice, not a performance tweak.
Dual prompts improved recall, but they did not eliminate errors. Some semantic inference fields had not fully settled, so I do not present them as solved.
The calendar engine also stays conservative near boundaries. It walks up to 400 days looking for nearby trading days. If none is found, it drops the candidate instead of drawing a questionable date.
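The bounded search described above can be sketched like this. The function names are hypothetical and only the forward direction is shown. The point is that a failed search returns `None` and the candidate is dropped rather than replaced with a guess.

```python
from datetime import date, timedelta
from typing import Iterable, List, Optional, Set

MAX_WALK_DAYS = 400  # search bound taken from the description above


def next_trading_day(day: date, trading_days: Set[date],
                     max_walk: int = MAX_WALK_DAYS) -> Optional[date]:
    """First trading day on or after `day`, or None if none lies within the bound."""
    for offset in range(max_walk + 1):
        probe = day + timedelta(days=offset)
        if probe in trading_days:
            return probe
    return None


def resolve_open_days(candidates: Iterable[date], trading_days: Set[date]) -> List[date]:
    """Resolve each candidate and drop those with no trading day in range."""
    resolved = (next_trading_day(c, trading_days) for c in candidates)
    return [d for d in resolved if d is not None]
```

This matters mostly at the edges of the loaded trading calendar, where a candidate late in the range may have no known trading day after it.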
Result
The confirmed result was that contract handling moved from hours to about two minutes per contract, and field-level results improved from the 60%+ range to around 90%. Human review remains part of the workflow, especially for high-risk and conflicting fields.
My takeaway from this project is simple: contract extraction becomes usable only when extraction, rule mapping, calendar generation, evaluation, and review boundaries fit together.
Related Links
If you are evaluating an enterprise RAG, knowledge base, AI support, or agent workflow project, contact me by email at contact@aildnc.com. For China-based inquiries, use the WeChat QR code below the article.