Why A Regulatory LLM Must Justify Its Verdicts | Nie Er

In securities-regulation content control, a model that gets the verdict right but can’t say why is close to useless. A regulator doesn’t want “this is probably illegal.” They want “this is illegal because it hits stock-tipping slang, pushes a contact channel, and promises guaranteed returns” — a verdict with an audit-ready reason attached. That single requirement is why, on a securities RegTech project, I treated explainability as the training objective itself rather than something stapled on after training.

This is one slice that I, Nie Er (AILDNC), pulled out of that project. It isn’t a full walkthrough of the toolchain. It’s one argument: why a regulatory model’s reasons can’t be added afterward, and how I welded them into the training objective.

“Accurate” Is Just The Passing Bar Here

First, how demanding this setting is. Illegal securities activity — stock-tipping jargon, guaranteed-return pitches, off-exchange margin funding, fake licensed institutions, pre-IPO share scams — is scattered across nine major social platforms, spanning text, images, short video, and audio. Manual review can’t keep up, so the idea is to use an LLM. But a general model has two obvious weaknesses: it doesn’t understand domain slang like “follow the teacher’s calls / add me on WeChat to join the group / pre-IPO shares, guaranteed,” and its false-positive rate runs high.

The third weakness is the one that gets overlooked: even when the model happens to be right, it can’t say why.

In plenty of settings, that third point doesn’t matter. Nobody asks a recommender to justify a misclick. But in a regulatory workflow, a verdict has to be trusted, has to enter an enforcement process, and has to survive after-the-fact review. So it has to answer “on what grounds?” A 0.97-confidence output with no written reason is something the business side can’t use — they can’t put “the model said so” into an enforcement record.

So the goal here had two legs from the start: be accurate, and produce a reason that can be audited. The second leg isn’t a nicety. It’s a hard requirement.

Treating The Reason As A Post-Hoc Feature Is A Common, Dangerous Design

The easy path is to train a model that outputs only a three-way label — illegal / compliant / suspect — and, once it’s accurate, bolt on an “explanation module” that adds a justification. It sounds reasonable and it schedules nicely. I didn’t do it, because that kind of explanation is inherently untrustworthy.

Asking a model to justify a verdict after the fact is asking it to make up something plausible for a conclusion it has already reached. It’s incentivized to make the reason fluent, not true — there’s no training-time causal link between the reason and the verdict. What you end up with is a verdict plus some text that looks like a reason but doesn’t necessarily support it. In a regulatory context, “looks explainable” is more dangerous than “no explanation,” because it fools the reviewer.

What you actually want isn’t “the model can produce a paragraph.” It’s “the model’s verdict was reasoned out from that paragraph.” The two can look identical on the output side, but their trustworthiness is worlds apart. To get the latter, the reason can’t come afterward. It has to be part of the model’s reasoning, tied to the verdict at training time.

What I Did: Welding `<think>reason</think>verdict` Into Every Training Sample

The approach is direct. The target output of each training sample isn’t a bare label. It’s a two-part sequence: <think>{reasoning in Chinese}</think>{final verdict}. The model first writes out, in Chinese, which signals it saw and why those signals point to a violation, and only then gives the illegal / compliant / suspect label.

This reuses Qwen3’s native <think> chain-of-thought marker instead of inventing a format. Part of why I picked Qwen3-14B/32B as the base is its native thinking / non-thinking dual mode — the think block isn’t a foreign structure for it. When the weak-labeled samples are converted to LlamaFactory’s alpaca format, the reason span and the verdict span go in together as the target sequence. What the model learns during training is the full path — reason first, then verdict — not just the conclusion.
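As a minimal sketch of that conversion, the function below packs one weakly labeled post into an alpaca-style record whose target is the reason span followed by the verdict. The `instruction` / `input` / `output` keys are the standard alpaca fields; the function name, the instruction wording, and the example post are hypothetical, and the real project writes the reason in Chinese rather than the English placeholder used here.

```python
import json

VERDICTS = ("illegal", "compliant", "suspect")

# Hypothetical instruction text; the project's real prompt is not shown in this post.
DEFAULT_INSTRUCTION = (
    "Judge whether the post violates securities rules. "
    "Reason first, then answer illegal, compliant, or suspect."
)


def to_alpaca_sample(post_text, reason, verdict, instruction=DEFAULT_INSTRUCTION):
    """Build one training record whose target is <think>reason</think>verdict."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    reason = reason.strip()
    if not reason:
        # A sample without a reason would teach the model a bare label.
        raise ValueError("every sample must carry a reason")
    return {
        "instruction": instruction,
        "input": post_text,
        # Reason span and verdict span go in together as one target sequence.
        "output": f"<think>{reason}</think>{verdict}",
    }


if __name__ == "__main__":
    sample = to_alpaca_sample(
        "Follow the teacher's calls, add me on WeChat to join the group.",
        "Hits stock-tipping slang and pushes a contact channel.",
        "illegal",
    )
    print(json.dumps(sample, ensure_ascii=False, indent=2))
```

Rejecting samples with an empty reason at conversion time is a design choice of this sketch: it keeps label-only records from slipping into the training file.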

Why does this matter? When the reason is part of the training target, the model has to learn the reason that points to the verdict in order to learn the verdict correctly — the two are coupled in the loss. It can’t learn to slap on a label while letting the justification drift. The think block such a model produces isn’t rhetoric to back-fill a conclusion; it’s the signal the model actually relied on. That lines up with what the online side cares about most: every verdict can be traced back to “what was hit.”
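On the serving side, tracing a verdict back to "what was hit" means splitting the generated text into its reason span and its verdict span. The parser below is a sketch under assumptions: the function name is hypothetical, and falling back to "suspect" when the output is malformed is my illustrative choice, not something the project is confirmed to do.

```python
import re

VERDICTS = {"illegal", "compliant", "suspect"}

# Non-greedy match so the reason stops at the first closing tag.
THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*(.*)$", re.DOTALL)


def parse_output(raw):
    """Split '<think>reason</think>verdict' into an auditable record."""
    match = THINK_RE.match(raw)
    reason = match.group(1).strip() if match else ""
    verdict = match.group(2).strip() if match else ""
    well_formed = bool(reason) and verdict in VERDICTS
    if not well_formed:
        # Assumption: an output without a usable reason is never auto-cleared.
        verdict = "suspect"
    return {"reason": reason, "verdict": verdict, "well_formed": well_formed}


if __name__ == "__main__":
    print(parse_output("<think>Promises guaranteed returns.</think>illegal"))
    print(parse_output("illegal"))  # no think block, so it is flagged as malformed
```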

Where The Reasons Come From: Three-Layer Weak Labeling Bakes The “Why” Into The Golden Set

Making explainability a training objective has a precondition that’s easy to miss: the reasons in the training samples have to be correct first. The lack of labels is the first obstacle on this project: nobody tells you in advance which post is illegal and which is compliant, so supervised training has nowhere to start. I cold-started the first golden set with a three-layer joint judgment: rule scoring, LLM voting, expert review.

The rule layer doesn’t just emit a label, it emits a written reason. It assigns illegal / compliant / suspect by weighted signals, with the weights encoding regulatory experience — a contact-channel lure carries the strongest weight, slang density and promise-style phrasing next — and writes the hit signals out as human-readable reasons. This step is the crux: it means the reasons in the golden set aren’t back-filled. From the first moment of cold start, every label carries a “why.” What the model later learns are reasons that both the rules and the experts have signed off on.
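A rule layer of this shape can be sketched in a few lines. Everything numeric here is hypothetical: the integer weights only preserve the ordering described above (contact-channel lure strongest, slang density and promise phrasing next), and the two thresholds are placeholders, not the project's values. The signal names and the `score_post` function are likewise illustrative.

```python
from dataclasses import dataclass

# Hypothetical integer weights; only their ordering comes from the text.
SIGNAL_WEIGHTS = {
    "contact_channel_lure": 5,
    "slang_density": 3,
    "promise_phrasing": 3,
}
ILLEGAL_THRESHOLD = 7  # hypothetical
SUSPECT_THRESHOLD = 3  # hypothetical


@dataclass(frozen=True)
class RuleResult:
    label: str
    score: int
    reason: str


def score_post(hits):
    """hits maps a signal name to the phrases that triggered it."""
    fired = [
        (name, phrases)
        for name, phrases in hits.items()
        if phrases and name in SIGNAL_WEIGHTS
    ]
    # Strongest signals first, so the reason reads in order of importance.
    fired.sort(key=lambda item: -SIGNAL_WEIGHTS[item[0]])
    score = sum(SIGNAL_WEIGHTS[name] for name, _ in fired)

    if score >= ILLEGAL_THRESHOLD:
        label = "illegal"
    elif score >= SUSPECT_THRESHOLD:
        label = "suspect"
    else:
        label = "compliant"

    # The reason is written from the same hits that produced the score.
    reason = "; ".join(
        f"{name} (weight {SIGNAL_WEIGHTS[name]}): {', '.join(phrases)}"
        for name, phrases in fired
    ) or "no rule signals hit"
    return RuleResult(label, score, reason)


if __name__ == "__main__":
    print(score_post({
        "promise_phrasing": ["guaranteed"],
        "contact_channel_lure": ["add me on WeChat"],
    }))
```

Because the reason string is assembled from the hits that were summed into the score, the label and its "why" cannot drift apart at this layer.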

There’s a deliberate, conservative guardrail here. When the rules return a high-confidence “illegal” but the LLM majority disagrees, the system is forced to mark it “suspect” and route it to a human — it never lets the LLM quietly override a strong rule signal. In a regulatory setting, you’d rather send too much to humans than miss a violation. The confidence on suspect samples is deliberately capped low so they’re guaranteed to be routed to a person rather than auto-cleared. This sacrifices some automation rate to keep recall from being quietly traded away — and the reasons that reach the golden set have passed both rule signals and expert judgment, not the LLM talking to itself.
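The guardrail can be written as a small combiner over the rule result and the LLM votes. This is a sketch under assumptions: the two constants, the `Decision` type, and the function names are hypothetical, and sending every other disagreement to "suspect" as well is my conservative reading rather than a documented rule. Expert review of the golden set happens after this step and is not modeled.

```python
from collections import Counter
from dataclasses import dataclass

RULE_HIGH_CONFIDENCE = 0.9    # hypothetical cut-off for a "strong" rule signal
SUSPECT_CONFIDENCE_CAP = 0.5  # hypothetical cap that keeps suspects in the human queue


@dataclass(frozen=True)
class Decision:
    label: str
    confidence: float
    forced_to_human: bool
    note: str


def llm_majority(votes):
    """Return the label with a strict majority of votes, or None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count * 2 > len(votes) else None


def _suspect(rule_confidence, note):
    capped = min(rule_confidence, SUSPECT_CONFIDENCE_CAP)
    return Decision("suspect", capped, True, note)


def combine(rule_label, rule_confidence, llm_votes):
    majority = llm_majority(llm_votes)
    strong_illegal = rule_label == "illegal" and rule_confidence >= RULE_HIGH_CONFIDENCE
    if strong_illegal and majority != "illegal":
        # The LLM never overrides a strong rule signal; a person decides.
        return _suspect(rule_confidence, "strong rule signal contested by LLM votes")
    if majority == rule_label and rule_label != "suspect":
        return Decision(rule_label, rule_confidence, False, "rules and LLM majority agree")
    # Assumption: any other disagreement is also routed to a person.
    return _suspect(rule_confidence, "no clear agreement between rules and LLM votes")


if __name__ == "__main__":
    print(combine("illegal", 0.95, ["compliant", "compliant", "illegal"]))
```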

The Other Line To Hold During Training: Don’t Burn Out General Ability

While welding the domain verdict and its reason into the model, there’s a hazard to handle in the same breath: catastrophic forgetting. Teach the model securities judgment and it can easily degrade the general Chinese ability it started with — and “no catastrophic forgetting” is also a hard requirement on this project. That’s why I didn’t take an off-the-shelf finance vertical model (DianJin-R1, Fin-R1) as the base. Those narrow, heavily-reinforced models carry an undisclosed risk of regressing general ability, which collides with that requirement; I kept them only as comparison baselines.

Forgetting prevention is three stacked layers, all done with LlamaFactory’s native config and no upstream source changes:

LoRA incremental tuning (lora_rank: 32, lora_target: all) freezes the base and trains only side-attached low-rank patches.
General-corpus replay (interleave_probs: 0.85,0.15) interleaves general data back in at 85/15.
Knowledge distillation (ASFT, roughly equivalent to LwF, Learning without Forgetting; asft_alpha: 0.2) uses a KL regularizer to pull back updates that drift too far from the frozen base.

The design philosophy is to solve it with the framework’s native capabilities first and add complexity only if that isn’t enough: quantifiable verification comes before stacking elaborate regularizers.
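To make the replay ratio and the KL pull-back concrete, here is a toy illustration in plain Python. It is not LlamaFactory's implementation. I am assuming that 85/15 means 85% domain samples and 15% general samples, that the regularizer has the LwF-style form task loss plus alpha times KL(base || tuned), and that sampling with replacement is an acceptable simplification; the function names are hypothetical.

```python
import math
import random


def interleave(domain, general, probs=(0.85, 0.15), n=1000, seed=0):
    """Draw n samples, picking the source corpus with the given probabilities."""
    rng = random.Random(seed)
    sources = (domain, general)
    mixed = []
    for _ in range(n):
        corpus = rng.choices(sources, weights=probs, k=1)[0]
        mixed.append(rng.choice(corpus))  # sampling with replacement (simplification)
    return mixed


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def regularized_loss(task_loss, base_probs, tuned_probs, alpha=0.2):
    """Assumed LwF-style objective: the penalty grows as tuned drifts from base."""
    return task_loss + alpha * kl_divergence(base_probs, tuned_probs)


if __name__ == "__main__":
    batch = interleave(["domain sample"], ["general sample"], n=10_000)
    print("general share:", batch.count("general sample") / len(batch))

    base = [0.7, 0.2, 0.1]
    print("no drift:", regularized_loss(1.0, base, base))
    print("drifted: ", regularized_loss(1.0, base, [0.2, 0.2, 0.6]))
```

When the tuned distribution equals the base distribution the penalty is zero, so the regularizer only costs something once the update starts moving away from the frozen base.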

Configuring forgetting prevention isn’t enough; you have to be able to measure whether forgetting happened. So evaluation runs on two tracks. One measures domain accuracy: three-way precision, recall, and F1 on the golden set, plus a separate “reason correctness” metric, the character-level Jaccard similarity between the model’s reason and the expert’s. The other measures general-ability retention: a set of securities-unrelated probe questions is answered by both the base model and the tuned model, and I watch for regressions, meaning things the base could do that the tuned model can’t. Both tracks are hand-written with the Python standard library, with formulas checkable by hand and no sklearn or numpy, because regulators want every number to be verifiable by the business side itself.

One caveat: the real model hasn’t been trained yet (it’s waiting on GPU capacity and an expert-reviewed golden set), so the verdicts in the current demo come from the rule engine and represent recognition capability, not model performance. That’s why you won’t find a single effectiveness percentage in this post.
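The two tracks reduce to a handful of formulas that fit in standard-library Python. The sketch below is my own illustration, not the project's evaluation code: the function names are hypothetical, and I am assuming that character-level Jaccard compares the sets of distinct characters in the two reasons, and that a regression is a probe the base model passed and the tuned model failed. The labels and strings are toy inputs, not results.

```python
LABELS = ("illegal", "compliant", "suspect")


def per_class_prf(gold, pred, labels=LABELS):
    """Precision, recall, and F1 per label, counted directly from the pairs."""
    scores = {}
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def char_jaccard(model_reason, expert_reason):
    """|A & B| / |A | B| over the distinct characters of the two reasons."""
    a, b = set(model_reason), set(expert_reason)
    if not a and not b:
        return 1.0  # assumption: two empty reasons count as identical
    return len(a & b) / len(a | b)


def regressions(base_passed, tuned_passed):
    """Indices of probes the base model passed but the tuned model failed."""
    return [
        i
        for i, (base_ok, tuned_ok) in enumerate(zip(base_passed, tuned_passed))
        if base_ok and not tuned_ok
    ]


if __name__ == "__main__":
    gold = ["illegal", "illegal", "compliant", "suspect"]
    pred = ["illegal", "compliant", "compliant", "suspect"]
    print(per_class_prf(gold, pred))
    print(char_jaccard("abc", "bcd"))
    print(regressions([True, True, False], [True, False, True]))
```

Each number can be recomputed by counting rows by hand, which is the property the business side needs.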

Where This Transfers: Not Just Securities Regulation

Making explainability a training objective applies well beyond securities regulation. Any setting where “the conclusion has to be trusted by a person and survive later review” — compliance review, risk control, medical triage support, content safety — wants the same thing. The test is simple: if your model’s output is ultimately taken by a human to make a decision, and that human is accountable for it, then what they need isn’t a confidence score. It’s a reason they can review, challenge, and sign off on.

If you’re about to build a model in one of these settings, here’s what I’d nail down first:

Ask whether the reason is a hard requirement. If it is, carry the reason from the moment you label data — don’t add it after training. A post-hoc reason is rhetoric, not reasoning.
Put the reason in the training objective, in the loss alongside the verdict. Use the base model’s native CoT format (like Qwen3’s <think>) rather than a structure it never learned.
The golden set’s reasons have to withstand review first. Produce them with a rules + LLM + expert stack, don’t let the LLM quietly override strong rules, and route low-confidence samples to humans.
Evaluate reason quality on its own. Don’t only score verdict accuracy; turn “is the reason correct” into a quantifiable, hand-checkable number.
Run forgetting prevention alongside domain tuning. Add domain ability while watching general ability not regress, and use a two-track eval to turn forgetting into a number you can report.

The hard part of a regulatory model was never pushing accuracy up. It’s getting a person who’s accountable for a conclusion to put what the model said into an enforcement record. Whether you can do that is largely decided the moment you label your first data point.

Official docs for the tools mentioned: LlamaFactory, data-juicer, Qwen3.

If you’re working on domain LLM training in regulated settings, auditable AI judgment systems, or enterprise RAG, reach me by email at contact@aildnc.com. Code and projects are on GitHub: https://github.com/JV-X .