Financial Contract Extraction and Calendar Rule Engine
A practical engineering note on financial contract extraction, tiered fee rules, subscription and redemption calendars, holiday alignment, offline evaluation, and human review boundaries.
A financial contract extraction and calendar rule engine turns asset-management contract clauses into standard rules, then generates daily subscription and redemption availability. The hard part is not producing JSON. The hard part is making the JSON executable, measurable, and reviewable.
I wrote earlier about contract extraction evaluation. This version is closer to the real production problem: field extraction is only the first handoff. The output still has to survive rule mapping, transaction-calendar generation, cache invalidation, and human review.
The Missing Clause Can Be The Rule
In asset-management contracts, a single document can run to around 100k Chinese characters. Fees, performance compensation, subscription rules, redemption windows, lockups, and confirmation timing may be scattered across body clauses, appendices, and tables.
One failure case stayed with me.
A contract may state that excess return between 8% and 15% is charged at 5%, and anything above 15% is charged at 10%. It does not always say, in the same sentence, that excess return below 8% is charged at 0%. But the business rule needs that tier.
Early versions often missed this implicit tier. Other versions created the opposite problem: the model over-inferred threshold values into fields where they did not belong.
That is where “just improve the prompt” stops being useful. The extraction result has to be judged against field definitions, standard-rule mapping, and an evaluation loop, not just against whether the JSON looks plausible.
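A minimal sketch of the implicit-tier fix, assuming tiers are stored as (lower, upper, rate) tuples of decimal fractions and that each rate applies only to the slice of excess return inside its band. The tuple layout and the names `complete_tiers` and `performance_fee` are hypothetical, and the note does not say whether the production rule is sliced or whole-amount.

```python
from decimal import Decimal
from typing import List, Optional, Tuple

# Hypothetical tier layout: (lower bound, upper bound or None if open-ended, rate).
Tier = Tuple[Decimal, Optional[Decimal], Decimal]


def complete_tiers(tiers: List[Tier]) -> List[Tier]:
    """Insert the implicit 0% tier below the lowest stated threshold, if missing."""
    ordered = sorted(tiers, key=lambda t: t[0])
    if not ordered:
        return ordered
    lowest = ordered[0][0]
    if lowest > 0:
        ordered.insert(0, (Decimal("0"), lowest, Decimal("0")))
    return ordered


def performance_fee(excess_return: Decimal, tiers: List[Tier]) -> Decimal:
    """Charge each slice of excess return at its own tier rate (assumed sliced tiers)."""
    fee = Decimal("0")
    for lower, upper, rate in complete_tiers(tiers):
        if excess_return <= lower:
            break
        top = excess_return if upper is None else min(excess_return, upper)
        fee += (top - lower) * rate
    return fee


# The clause from the text: 8%-15% at 5%, above 15% at 10%; the 0% tier is implicit.
stated = [
    (Decimal("0.08"), Decimal("0.15"), Decimal("0.05")),
    (Decimal("0.15"), None, Decimal("0.10")),
]
```

Putting the completion step in deterministic code, instead of asking the model to emit the 0% row, is one way to keep the model from inventing thresholds elsewhere.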
One Prompt Was Too Blunt
I split the extraction side into five modules: fees, performance compensation, subscription liquidity, redemption liquidity, and other fields. The higher-risk modules used two prompt styles each. A single case could trigger up to eight parallel LLM calls.
That was a cost decision, not a style preference. A single prompt was cheaper, but missed tier tables and numeric edge cases more often. Dual prompts improved results by about 3 percentage points compared with the single-prompt setup. In this workflow, getting more high-risk fields into the review path was worth more than shaving token cost.
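The fan-out can be sketched with `asyncio`, assuming an async LLM client. `call_llm` is a placeholder, the prompt-style names are invented, and which three modules get a second prompt style is an assumption chosen only so the total matches the eight parallel calls described above.

```python
import asyncio
from typing import Dict, List

# Hypothetical plan: three modules treated as higher-risk get two prompt styles,
# giving 2 + 2 + 2 + 1 + 1 = 8 calls per contract.
PROMPT_PLAN: Dict[str, List[str]] = {
    "fees": ["direct", "table_first"],
    "performance_compensation": ["direct", "table_first"],
    "subscription_liquidity": ["direct", "rule_first"],
    "redemption_liquidity": ["direct"],
    "other_fields": ["direct"],
}


async def call_llm(module: str, style: str, contract_text: str) -> dict:
    """Placeholder for a real client call; returns an empty candidate set."""
    await asyncio.sleep(0)  # stands in for network latency
    return {"module": module, "style": style, "candidates": []}


async def extract_all(contract_text: str) -> List[dict]:
    tasks = [
        call_llm(module, style, contract_text)
        for module, styles in PROMPT_PLAN.items()
        for style in styles
    ]
    # Calls run concurrently; gather returns results in task order.
    return list(await asyncio.gather(*tasks))


results = asyncio.run(extract_all("contract text"))
```

Keeping each style's output separate lets the cleanup layer compare the two answers for one module field by field and send disagreements to review.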
The second prompt did not make the output production-ready by itself.
The cleanup layer did the less glamorous work: mapping T+2, T+2日, and N+2 into one enum; normalizing “postponed” and “business day” variants into system codes; converting Chinese percentage expressions into numeric values; and merging candidate paramList items by their left-bound keys.
The model found candidates. Deterministic code made them survivable.
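A minimal sketch of that cleanup layer. The enum codes such as `T_PLUS_2` and `POSTPONE`, the alias lists, the Chinese-numeral coverage (simple integers below 100 only), and the first-value-wins merge with conflict flags are illustrative assumptions, not the production rules.

```python
import re
from decimal import Decimal
from typing import Dict, List, Optional

CN_DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
             "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
# Hypothetical aliases for postponement/advancement wording.
ROLL_ALIASES = {"顺延": "POSTPONE", "延后": "POSTPONE", "postpone": "POSTPONE", "提前": "ADVANCE"}


def normalize_settlement(raw: str) -> Optional[str]:
    """Map T+2, T+2日, N+2 (and spacing variants) to one hypothetical enum code."""
    m = re.fullmatch(r"[TN]\s*\+\s*(\d+)\s*日?", raw.strip())
    return f"T_PLUS_{m.group(1)}" if m else None


def normalize_roll(text: str) -> Optional[str]:
    """Return the first roll code whose alias appears in the clause text."""
    for alias, code in ROLL_ALIASES.items():
        if alias in text:
            return code
    return None


def cn_int(text: str) -> int:
    """Parse simple Chinese integers below 100, e.g. 八, 十五, 二十."""
    if "十" in text:
        tens, _, ones = text.partition("十")
        return CN_DIGITS.get(tens, 1) * 10 + (CN_DIGITS[ones] if ones else 0)
    return CN_DIGITS[text]


def parse_percent(text: str) -> Optional[Decimal]:
    """Convert '5%', '5％' or '百分之五' into a decimal fraction such as 0.05."""
    text = text.strip().replace("％", "%")
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*%", text)
    if m:
        return Decimal(m.group(1)) / 100
    m = re.fullmatch(r"百分之([一二两三四五六七八九]?十[一二三四五六七八九]?|[零一二两三四五六七八九])", text)
    if m:
        return Decimal(cn_int(m.group(1))) / 100
    return None


def merge_param_list(items: List[dict]) -> Dict[str, dict]:
    """Merge candidate paramList rows sharing a left bound; flag disagreeing fields."""
    merged: Dict[str, dict] = {}
    for item in items:
        row = merged.setdefault(item["left"], {"conflicts": []})
        for key, value in item.items():
            if value is None:
                continue
            if key not in row:
                row[key] = value
            elif row[key] != value:
                row["conflicts"].append(key)
    return merged
```

Flagging conflicts instead of overwriting keeps disagreements between the two prompt styles visible to reviewers.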
An Open Day Is Not A Date
Subscription and redemption days look like a date extraction problem. They are not.
One real pattern was: the Tuesday after the third Friday of September, plus the next two business days, postponed if the day is not a business day. To execute that, the system needs to map natural language into a cycle type, month, week offset, weekday list, business-day semantics, and holiday alignment.
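One way to hold that pattern as data, assuming a dataclass whose field names (`cycle_type`, `anchor_weekday`, and so on) are hypothetical. The sketch resolves only the calendar-day part: the third Friday of September, the first Tuesday after it, and the next two trading days. Postponing a start date that falls on a closed day is left to the shared alignment layer.

```python
import calendar
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Set


@dataclass(frozen=True)
class CycleRule:
    cycle_type: str           # hypothetical code, e.g. "YEARLY_NTH_WEEK_WEEKDAY"
    month: int                # 9 = September
    week_offset: int          # 3 = third occurrence of the anchor weekday
    anchor_weekday: int       # calendar.FRIDAY
    target_weekday: int       # calendar.TUESDAY, first one after the anchor
    extra_business_days: int  # "plus the next two business days"
    alignment: str            # "POSTPONE", "ADVANCE" or "NONE"


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    first = date(year, month, 1)
    return first + timedelta(days=(weekday - first.weekday()) % 7 + 7 * (n - 1))


def open_window(rule: CycleRule, year: int, trading_days: Set[date]) -> List[date]:
    anchor = nth_weekday(year, rule.month, rule.anchor_weekday, rule.week_offset)
    # "After" means strictly after: the same weekday as the anchor moves a full week.
    start = anchor + timedelta(days=(rule.target_weekday - anchor.weekday()) % 7 or 7)
    window, day = [start], start
    for _ in range(400):  # arbitrary safety bound for this sketch
        if len(window) > rule.extra_business_days:
            break
        day += timedelta(days=1)
        if day in trading_days:
            window.append(day)
    return window


RULE = CycleRule("YEARLY_NTH_WEEK_WEEKDAY", 9, 3, calendar.FRIDAY, calendar.TUESDAY, 2, "POSTPONE")
```

In 2024, with weekdays as trading days, this yields September 24, 25, and 26.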
The rule engine supports nine cycle types: monthly nth day, quarterly nth day, weekly weekday, monthly nth-week weekday, month-end offsets, quarter-end offsets, yearly month-day, yearly nth-week weekday, and yearly month-end offsets. openDay=32 means “last day of the month” and is resolved dynamically to 28, 29, 30, or 31.
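A sketch of two of the nine cycle types, producing candidate dates on calendar-day logic only. Returning `None` when a fixed day such as 31 does not exist in a month, and counting month-end offsets in calendar days, are assumptions; the note does not say how the engine handles either case.

```python
import calendar
from datetime import date, timedelta
from typing import List, Optional

LAST_DAY_OF_MONTH = 32  # openDay sentinel described in the note


def resolve_open_day(year: int, month: int, open_day: int) -> Optional[date]:
    """Monthly nth day; openDay=32 resolves to 28, 29, 30 or 31."""
    last = calendar.monthrange(year, month)[1]
    if open_day == LAST_DAY_OF_MONTH:
        return date(year, month, last)
    if 1 <= open_day <= last:
        return date(year, month, open_day)
    return None  # assumption: a day that does not exist yields no candidate


def month_end_offset(year: int, month: int, offset: int) -> date:
    """Month-end offset: `offset` calendar days before the last day (assumed semantics)."""
    last = calendar.monthrange(year, month)[1]
    return date(year, month, last) - timedelta(days=offset)


def monthly_candidates(year: int, open_day: int) -> List[date]:
    days = (resolve_open_day(year, m, open_day) for m in range(1, 13))
    return [d for d in days if d is not None]
```

`calendar.monthrange` already accounts for leap years, so openDay=32 needs no special February handling.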
I split calendar generation into three steps:
- Generate candidate dates using calendar-day logic.
- Intersect with the trading calendar or send candidates to holiday alignment.
- Apply postponement, advancement, or no-op behavior for closed-market days.
This kept the nine cycle functions focused on date selection while using one shared alignment layer.
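The split can be sketched as one function that takes candidates from any cycle function and hands closed-market days to a pluggable alignment step. `build_calendar`, `AlignFn`, and `no_op_alignment` are hypothetical names. Only the no-op policy is shown, because postponement and advancement need the block handling discussed under holiday alignment.

```python
from datetime import date
from typing import Callable, Iterable, List, Optional, Set

# Alignment plug-in: closed-market candidates in, aligned dates out (None = dropped).
AlignFn = Callable[[List[date], Set[date]], List[Optional[date]]]


def no_op_alignment(closed: List[date], trading_days: Set[date]) -> List[Optional[date]]:
    """No-op policy: a candidate on a closed-market day is simply not an open day."""
    return [None] * len(closed)


def build_calendar(candidates: Iterable[date], trading_days: Set[date], align: AlignFn) -> List[date]:
    # Step 1 happens upstream: a cycle function yields calendar-day candidates.
    unique = sorted(set(candidates))
    # Step 2: intersect with the trading calendar and set the rest aside.
    open_days = {d for d in unique if d in trading_days}
    closed = [d for d in unique if d not in trading_days]
    # Step 3: one shared alignment layer, whatever cycle type produced the dates.
    for aligned in align(closed, trading_days):
        if aligned is not None:
            open_days.add(aligned)
    return sorted(open_days)
```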
Holiday Alignment Has A Trap
If several candidate dates fall inside the same market-closed block, pushing all of them to the first trading day after the holiday is wrong. It collapses multiple open days into one.
The engine first identifies the continuous closed-market block. It then sorts the candidates inside that block and assigns each one an index. For postponement, candidate i maps to the i-th trading day after the holiday. For advancement, it maps to the i-th trading day before the holiday.
Small detail. Big consequence.
Without it, the extracted word “postponed” still cannot produce a valid transaction calendar.
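A minimal sketch of the block-indexed alignment, assuming the trading calendar is available as a list of dates and every input candidate is a closed-market day. Using `bisect` to find each closed run is an assumption of this sketch, not necessarily how the engine walks the calendar: every day inside the same closed run shares the index of the next trading day, so that index works as the block key. `align_closed_candidates` and the policy strings are hypothetical. Following the text literally, advancement maps the earliest candidate to the nearest trading day before the holiday; the resulting set of open days is the same whichever end the index counts from.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional


def align_closed_candidates(closed: List[date], trading_days: List[date],
                            policy: str) -> Dict[date, Optional[date]]:
    """Map closed-market candidates to trading days without collapsing a block."""
    td = sorted(trading_days)
    blocks: Dict[int, List[date]] = defaultdict(list)
    for d in closed:
        # Days in the same closed run share the index of the next trading day.
        blocks[bisect_left(td, d)].append(d)
    result: Dict[date, Optional[date]] = {}
    for next_idx, members in blocks.items():
        for i, d in enumerate(sorted(members)):
            if policy == "POSTPONE":
                j = next_idx + i       # i-th trading day after the block
            elif policy == "ADVANCE":
                j = next_idx - 1 - i   # i-th trading day before the block
            else:
                j = -1                 # no-op: leave the candidate unaligned
            result[d] = td[j] if 0 <= j < len(td) else None
    return result


# Hypothetical calendar around a closure from 2024-10-01 to 2024-10-07.
TRADING = [date(2024, 9, 26), date(2024, 9, 27), date(2024, 9, 30),
           date(2024, 10, 8), date(2024, 10, 9), date(2024, 10, 10)]
```

With postponement, candidates on October 1 and October 3 land on October 8 and October 9 instead of both collapsing onto October 8. A candidate whose block runs past the end of the supplied calendar maps to `None` and is dropped.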
There is also a conservative boundary: the engine walks up to 400 days looking for nearby trading days. If it cannot find one, it drops the candidate instead of drawing a possibly wrong open day.
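The 400-day boundary can be sketched as a bounded walk that returns `None` rather than guessing. `nth_trading_day_from` and its `step` convention (+1 to postpone, -1 to advance) are hypothetical; the 400 comes from the text, and treating it as a count of calendar days is an assumption.

```python
from datetime import date, timedelta
from typing import Optional, Set

MAX_WALK_DAYS = 400  # search bound stated in the note


def nth_trading_day_from(d: date, trading_days: Set[date], n: int, step: int,
                         max_days: int = MAX_WALK_DAYS) -> Optional[date]:
    """Walk from d (step=+1 forward, -1 backward) and return the n-th trading day,
    or None if it is not reached within max_days calendar days."""
    found = 0
    current = d
    for _ in range(max_days):
        current += timedelta(days=step)
        if current in trading_days:
            found += 1
            if found == n:
                return current
    return None  # drop the candidate instead of drawing a possibly wrong open day
```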
Evaluation Tells You What To Blame
The first version had low accuracy, and standard-rule mapping broke often. The offline evaluation-driven development (EDD) setup changed the debugging loop.
The evaluation has three ablation levels: L0-single, L1-dual, and L2-dual-chunk. It also checks expected-null paths and spurious outputs, which matters when a contract does not state a field but the model fills one anyway.
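A sketch of field-level scoring that separates expected-null and spurious outcomes from ordinary value errors. The outcome labels, the example field names, and counting a correct null as a hit are assumptions; the note does not give its exact metric.

```python
from typing import Any, Dict, Optional


def score_field(gold: Optional[Any], pred: Optional[Any]) -> str:
    if gold is None:
        # Expected-null path: the contract does not state this field.
        return "correct_null" if pred is None else "spurious"
    if pred is None:
        return "missed"
    return "correct" if gold == pred else "wrong_value"


def score_case(gold: Dict[str, Any], pred: Dict[str, Any]) -> Dict[str, str]:
    fields = sorted(set(gold) | set(pred))
    return {f: score_field(gold.get(f), pred.get(f)) for f in fields}


def field_accuracy(outcomes: Dict[str, str]) -> float:
    hits = sum(o in ("correct", "correct_null") for o in outcomes.values())
    return hits / len(outcomes) if outcomes else 0.0
```

Running the same gold cases through L0-single, L1-dual, and L2-dual-chunk and diffing the per-field outcomes shows which fields each level improved or regressed.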
The most useful part is attribution. If several models make the same wrong non-null prediction, the issue may be the prompt or the gold label. If only one model fails, it may be model variance. If the right clause never reached the model, the fix is chunking or localization, not another prompt sentence.
This changed the conversation from “the new prompt feels better” to “these fields improved, these regressed, and this error is probably a rule-definition problem.”
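The attribution rules can be written down as a small decision function. This is a sketch of the heuristics in the text, not the production logic: `attribute_error`, its labels, the `clause_in_context` flag (whether the gold clause appeared in any chunk sent to the models), and the threshold of two agreeing models for "several" are all assumptions.

```python
from collections import Counter
from typing import Dict, Optional


def attribute_error(gold: Optional[str], predictions: Dict[str, Optional[str]],
                    clause_in_context: bool, min_agree: int = 2) -> str:
    """Suggest where to look for a field-level error across several models."""
    wrong = {m: p for m, p in predictions.items() if p != gold}
    if not wrong:
        return "no_error"
    if not clause_in_context:
        # The right clause never reached the models: fix chunking or localization.
        return "chunking_or_localization"
    counts = Counter(p for p in wrong.values() if p is not None)
    if counts and counts.most_common(1)[0][1] >= min_agree:
        # Several models agree on the same wrong non-null value.
        return "prompt_or_gold_label"
    if len(wrong) == 1:
        return "model_variance"
    return "manual_review"
```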
Result And Boundary
The confirmed result was practical: processing time moved from hours to about two minutes per contract, and field-level results improved from the 60%+ range to around 90%. Dual prompts added about 3 percentage points over the single-prompt baseline.
I still would not describe this as full automation. High-risk fields, conflicts, implicit tiers, and unsettled business definitions still need human review. The useful shift is narrower and more honest: reviewers no longer start from a 100k-character contract. They start from candidates, evidence, conflicts, and an evaluation trail.
If I were rebuilding this from day one, I would put EDD into the first version. In regulated contract workflows, admitting early that the model will be wrong is the faster way to build something usable.
Related Links
If you are evaluating contract extraction, document parsing, transaction-calendar rules, or AI workflows for financial operations, contact me by email at contact@aildnc.com. For China-based inquiries, use the WeChat QR code below the article.