Part 2: The Solution
A story about scaling judgment in the age of AI
3 months later.
Ram closes his laptop at 7:15 PM. Not 9:47 PM like he used to.
He smiles.
Tomorrow, he'll present the Q3 results to the leadership team. They'll ask the usual question: "Can we do it faster?"
This time, his answer will be different.
The Realization
It started with a conversation Ram didn't expect.
One month after the lease classification disaster, Ram reached out to an AI Coach.
Ram was skeptical about "AI coaches." Most were either YouTubers or consultants who'd never built real systems. But this coach came recommended by a trusted colleague: "Thirty years in enterprise systems. Actually understands complexity. They've done this before."
Ram took the chance. In their first conversation, the coach didn't pitch tools or frameworks. Ram explained his situation: the constant re-learning, the classification errors, the bottleneck he'd become. The coach listened, then said something that made Ram sit up:
"The problem isn't your team or your documentation. The problem is that you're trying to store judgment in a medium that can't hold it."
The coach continued:
"Code stores logic. Documentation stores information. But judgment? Judgment lives in the space between the question and the answer. It's contextual. It's iterative. You can't freeze it in static formats."
Ram nodded. "That's exactly what I'm experiencing. But how do you solve it?"
"You don't store the judgment itself. You store the conversation that led to the judgment. You make the decision-making process visible, capturable, and queryable."
"I've seen this pattern in enterprise systems for 30 years. The solution isn't better documentation or training. It's building systems that surface ambiguities explicitly, capture expert judgment when it's provided, and make that judgment reusable."
That night, Ram couldn't sleep. But for a different reason—he finally saw a path forward.
The Insight
Ram had been thinking about the problem wrong.
He'd been trying to:
- Document every classification rule → static, gets stale
- Train developers on principles → oral tradition, doesn't scale
- Build decision trees → can't capture nuance
- Write better code → only stores logic, not reasoning
What if instead, he could build a system that:
- Surfaces ambiguities instead of hiding them
- Captures judgment when it's made
- Makes that judgment queryable when the same situation appears again
- Asks better questions based on accumulated context
Not "AI that classifies transactions."
But "AI that helps developers make classification decisions the way Ram would."
The Experiment
Ram started small. One client. One quarter. One cash flow statement.
He worked with the AI Coach to design and implement the system. The coach trained Ram's existing developers, helping them understand both the accounting domain and how to work with LLMs effectively. More importantly, the coach brought a proven methodology for encoding expert judgment in AI systems.
The approach was radical:
Step 1: Generate a Draft, Not a Final Statement
They took the client's Trial Balance and fed it to a local LLM (Llama-3 8B) along with:
- Historical accounting policies
- Prior year's cash flow statement
- Previous auditor comments
- IndAS 7 standard text
The LLM's job wasn't to produce the final cash flow statement.
Its job was to produce a first draft with explicit flags.
Step 2: Flag Ambiguities, Don't Resolve Them Silently
The system generated output like this:
Operating Activities:
Interest paid: ₹45 crore
⚠️ CLASSIFICATION AMBIGUITY:
Interest paid classification depends on company policy election per IndAS 7.
Prior year: Classified as Operating Activities
Current policy document: Not explicitly stated
Question for reviewer:
- Confirm policy election for interest paid classification
- If policy has changed, note reason for change
If Operating: No change needed
If Financing: Move to Financing Activities section
Another example:
Lease payments: ₹12 crore
⚠️ CLASSIFICATION AMBIGUITY:
Trial balance shows "Lease Payments" but doesn't distinguish between:
- Operating lease payments (Operating Activities)
- Finance lease principal (Financing Activities)
- Finance lease interest (Operating or Financing per policy)
Question for reviewer:
- What is the breakdown between operating and finance leases?
- For finance leases, is interest separated from principal?
- What is the policy election for finance lease interest?
This was different from traditional tools that either:
- Made assumptions silently (wrong)
- Required upfront configuration of every scenario (impossible)
Step 3: Accountant Reviews and Resolves Ambiguities
Here's the critical part: The accountant is the user, not the developer.
When the company accountant, Meera, uploaded the Trial Balance for Q1 2024, she received a notification 4 minutes later:
Draft cash flow statement ready for review
3 ambiguities require your attention
She opened the workspace. It looked like a review interface with a problems panel:
CASH FLOW STATEMENT REVIEW - Q1 2024
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Draft Generated: ✓
Ambiguities: 3 require resolution
Status: Awaiting review
AMBIGUITIES (3)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ Interest paid classification needs confirmation
Line 147: Interest paid ₹45 cr → Operating Activities
Context:
- IndAS 7 permits Operating OR Financing classification
- Prior year (Q4 2023): Classified as Operating Activities
- Current policy document: Not explicitly stated for this year
Question: Confirm policy election for current year
[Confirm Operating] [Change to Financing] [View Policy History]
⚠️ Lease payment breakdown required
Line 203: "Lease Payments ₹12 cr"
Context:
- Trial balance shows aggregate amount
- Need breakdown for proper classification:
* Operating leases → Operating Activities
* Finance lease principal → Financing Activities
* Finance lease interest → Per policy election
Question: Provide breakdown or request from finance team
[Enter Breakdown] [Request from Finance] [View Q4 Treatment]
⚠️ Foreign exchange gain treatment unclear
Line 298: "FX Gain ₹5 cr" in P&L
Context:
- P&L shows FX gain but cash impact unclear
- Could be: Realized (affects cash) or Unrealized (non-cash)
- Treatment depends on realization status
Question: Clarify if realized or unrealized, and source
[Provide Details] [Escalate to Ram] [View FX Policy]
Meera reviewed the first ambiguity.
Meera clicked [Confirm Operating] and typed her reasoning in plain English:
Resolution for: Interest paid classification
Decision: Operating Activities (confirmed)
Reasoning:
Per company accounting policy manual section 4.2:
All interest paid (on borrowings and finance leases) classified as Operating Activities.
Confirmed with CFO in policy review meeting January 2024.
Auditor (KPMG) approved this treatment in FY22 audit (ref: memo AM-22-Q4-15).
Continue applying consistently.
Approved by: Meera Sharma, Senior Accountant
Date: March 15, 2024
She clicked [Submit]. The system immediately:
- Updated the cash flow statement
- Marked the ambiguity as resolved ✓
- Stored Meera's reasoning for future reference
For the lease breakdown, Meera clicked [Request from Finance]. The finance team responded later that morning with the split. Meera entered the breakdown in plain English, the system processed it, and she confirmed. Ambiguity resolved ✓
The system stored these judgments—not as code, but as contextual reasoning attached to specific scenarios.
Step 4: LLM Reprocesses with New Context
The LLM took Meera's judgment and:
- Updated the cash flow statement
- Documented the classification reasoning
- Stored the pattern for future similar scenarios
Next time an accountant worked on a cash flow statement for this client, the system would:
- Automatically apply the known policy elections
- Only flag NEW ambiguities
- Reference past decisions when relevant
The First Real Test
Two weeks later, Q2 closed. Meera uploaded the Q2 Trial Balance into the system.
4 minutes later, she received a notification: "Draft ready for review."
She opened the workspace. This time, fewer ambiguities:
AMBIGUITIES (1)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ Interest paid: ₹52 crore → Operating Activities
Auto-applied based on Q1 policy decision
Reference: Meera's resolution from March 15, 2024
No action needed unless policy changed.
⚠️ Lease payments: ₹13 crore (NEW AMOUNT)
Trial balance shows ₹13 crore (increased from ₹12 cr in Q1)
System suggested provisional split based on prior-quarter ratios, pending confirmation:
- Operating leases: ~₹8.7 cr
- Finance principal: ~₹3.2 cr
- Finance interest: ~₹1.1 cr
Question: Please confirm actual Q2 breakdown
[Enter Breakdown] [Request from Finance] [Apply Q1 Ratio]
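The provisional split is simple ratio arithmetic: scale the prior quarter's breakdown to the new aggregate amount. A minimal sketch, assuming a hypothetical Q1 breakdown of ₹8.0 / ₹3.0 / ₹1.0 crore (the story never states Q1's actual split; these figures are chosen only so the scaled result matches the suggestion shown above):

```python
def provisional_split(prior_split: dict[str, float], new_total: float) -> dict[str, float]:
    """Scale a prior-quarter breakdown to a new aggregate amount,
    preserving the prior ratios. A suggestion only, pending confirmation."""
    prior_total = sum(prior_split.values())
    return {k: round(v * new_total / prior_total, 1) for k, v in prior_split.items()}

# Hypothetical Q1 breakdown of the ₹12 cr lease payments (assumed, not from the story):
q1_split = {"operating_leases": 8.0, "finance_principal": 3.0, "finance_interest": 1.0}

q2_suggestion = provisional_split(q1_split, new_total=13.0)
# → {'operating_leases': 8.7, 'finance_principal': 3.2, 'finance_interest': 1.1}
```

The point is that the system never presents this as the answer; it presents it as a starting point that the accountant must confirm or replace with the actual figures from finance.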
Meera saw only ONE ambiguity instead of three.
Interest classification? Already decided and stored. Applied automatically.
She clicked [Request from Finance] for the lease breakdown. Finance team responded that afternoon. Meera entered the breakdown, reviewed the proposed classification, and confirmed. Marked resolved ✓
Done.
Total time: 25 minutes (vs 4+ hours manually, vs 45 minutes in Q1).
Ambiguities to resolve: 1 (vs 3 in Q1, because policies already stored)
Classification errors: 0 (Ram spot-checked, found none)
Most important: Meera spent her time on what actually needed her judgment (new lease amounts), not re-deciding policies that were already established.
The Transformation
Over the next 3 months, Ram rolled out the system to every client accountant working with his team.
The Results:
Accountant time per statement: 6-8 hours → 25-45 minutes
Ram's review time: 2-3 hours → 15-20 minutes
Classification errors: 3-5 per statement → 0-1
New accountant ramp-up: 6 months → 2 months
85% reduction in errors. 80% time savings. Results varied by client complexity, but the trend was consistent.
But the numbers don't tell the full story.
What Actually Changed:
1. Ambiguities Became Visible and Explicit
Before: Accountants had to manually identify every classification decision point while building the cash flow statement. Many decisions were made implicitly or by pattern-matching without conscious review.
After: The system explicitly flagged every point where judgment was required, making invisible decisions visible.
Example:
Trial Balance shows "Dividend Received ₹5 cr"
Old workflow: Accountant manually classifies while building statement (Operating? Investing? Check last year... probably Operating... move on)
New workflow: System shows in the workspace:
⚠️ Dividend classification requires policy confirmation
Context: IndAS 7 permits Operating OR Investing classification
Prior treatment: Operating Activities
Policy rationale: "Company views dividends as returns on operational investments"
Status: ✓ Policy confirmed by Meera (Senior Accountant) on Jan 15, 2024
Action: Verify this dividend is consistent with policy, or update if changed.
[Apply Prior Treatment] [Update Policy] [View Full History]
Accountant now sees:
- This decision point explicitly (not buried in the process)
- What the established policy is
- The reasoning behind it
- Whether it applies or needs updating
Critically: Decisions that were implicit become explicit. Nothing slips through unreviewed.
2. Judgment Became Reusable
Before: Ram answered the same questions 100 times.
After: Ram answered each question once, and the system made it queryable.
Example: Junior accountant Kavya encounters "FX Gain ₹8 cr" in Trial Balance.
Old workflow: Kavya asks Ram, he explains 3 scenarios for 20 minutes.
New workflow: System shows Ram's stored guidance covering realized vs unrealized FX, with examples for each scenario. Kavya gets the answer in 2 minutes. Ram gets his time back.
3. Edge Cases Became Learning Opportunities
Before: Edge cases caused escalations and stress.
After: Edge cases became additions to the knowledge base.
Example: Complex foreign loan repayment with embedded FX loss. System flagged it as novel, escalated to Ram. Ram provided judgment with full reasoning. Stored in knowledge base. Next time this scenario appears, the system knows how to handle it.
But the system wasn't perfect:
One month later, a client had a similar-sounding but fundamentally different transaction: A forex hedge settlement that coincided with loan repayment but wasn't directly linked.
The system initially flagged it as matching Ram's prior guidance on "foreign loan with FX loss."
Meera reviewed the proposed classification and immediately saw the issue—this was a separate hedging transaction, not embedded FX on the loan itself.
She clicked [Reject Proposed Classification] and entered:
This is different from RAM-2024-05-22 scenario.
This is a cash flow hedge settlement (separate contract).
Classification: Operating Activities per hedging policy
Not related to financing activity.
Escalating to Ram to confirm and store as new pattern.
Ram confirmed Meera's assessment. The system learned the distinction.
This is exactly how it should work: The system proposes based on patterns, but the accountant is always the final authority. When the system gets it wrong, the human catches it, corrects it, and the system learns the nuance.
4. Clear User: Accountants Work Directly in the System
This was the fundamental shift.
Before: Developers built cash flow generation code. Accountants provided requirements. Lots of lost-in-translation errors.
After: Accountants work directly in the system. Developers improve the system itself (code, prompts).
The Roles:
- Accountant (User): Uploads Trial Balance, resolves ambiguities, provides judgment
- Ram (Domain Expert): Reviews complex scenarios, builds judgment library
- Auditor: Validates contentious items
- Developer (Builder): Improves system code and prompts—NOT in operational workflow
The Workflow:
- Accountant (Meera) uploads Trial Balance
- System generates draft + surfaces ambiguities
- Accountant resolves in workspace (plain English)
- System stores and applies judgment
- Accountant reviews final statement
- Ram spot-checks (optional for complex cases)
No translation layer. No handoffs. Accountant drives the process.
5. Junior Accountants Learned Faster (By Seeing Expert Reasoning)
Before: Junior accountants learned by making mistakes and getting corrections.
After: Junior accountants learned by seeing explicit reasoning for every decision.
Kavya (junior accountant, 3 months experience) said:
"Earlier, I'd see 'Interest paid ₹50 cr' in Trial Balance and panic. Is this Operating or Financing? Check last year's statement... check policy manual... ask Meera... hope I got it right.
Now, the system shows me: 'Interest paid → Operating Activities per company policy (ref: Meera's decision March 2024). Rationale: All interest classified as Operating per accounting manual section 4.2, confirmed with CFO, auditor-approved FY22.'
I'm learning the WHY behind decisions, not just copying prior treatments blindly."
Meera (Senior Accountant) said:
"Before, I'd explain the same classification rules to every new accountant. They'd still make mistakes because remembering rules without context is hard.
Now, every decision I make is captured with full reasoning. New accountants see my judgment, understand the context, and apply it consistently. My expertise scales without me repeating myself 100 times."
Ram said:
"I used to spend 2-3 hours reviewing each cash flow statement, finding the same classification errors repeatedly.
Now, the system catches 90% of those based on stored judgment. I spend 15-20 minutes on genuinely new scenarios—complex transactions, new standards, edge cases—where my expertise actually adds value."
What the System Doesn't Do
This is critical. Ram was very clear about the boundaries:
The System Does NOT:
- ❌ Make final classification decisions without human review
- ❌ Invent policy elections or make up guidance
- ❌ Override stored accountant/auditor judgment
- ❌ Use the LLM for final arithmetic reconciliation (that's handled by deterministic code)
- ❌ Produce "audit-ready" output without accountant sign-off
- ❌ Replace accountants or domain experts
The System DOES:
- ✓ Generate structured drafts from Trial Balances
- ✓ Surface ambiguities explicitly in a review workspace
- ✓ Accept judgment in plain English from accountants/auditors
- ✓ Store that judgment with full context and reasoning
- ✓ Reference historical judgment and apply established policies
- ✓ Ask targeted questions when context is missing
- ✓ Make expert judgment queryable and reusable across quarters/clients
Role Clarity:
- Accountant (Primary User): Uploads TB, resolves ambiguities, provides judgment, reviews output
- Ram (Domain Expert): Handles complex scenarios, builds judgment library, spot-checks
- Auditor: Validates contentious items, confirms treatments
- Developer (System Builder): Improves code and prompts—NOT involved in operational workflow
The Key Insight:
It's not "AI replaces accountants."
It's not "AI automates accounting decisions."
It's "AI surfaces decision points + stores expert judgment + makes it reusable."
The accountant is always in control. The accountant always decides.
The Technical Foundation
The AI Coach helped Ram understand why this approach worked:
Why a Local LLM (Llama-3 8B)?
- Data sensitivity: Trial balances are confidential, auditor comments are privileged
- Cost and control: Runs on a single GPU, predictable latency, no per-call token costs
- Right-sized: Not trying to "know everything"—good at flagging ambiguities and pattern matching
The Architecture (end-to-end flow):
Ingestion API (Rust)
Step 1 — Capture Two Trial Balances
The API receives opening and closing TB files from the user.
Step 2 — Convert Ledgers into Movements
The API computes ledger-wise deltas to create movement records.
Step 3 — Add Deterministic Machine Hints
The API enriches each movement with debit/credit, stock/flow, and working-capital hints.
Step 4 — Retrieve Relevant Past Judgments
The API queries the Knowledge Base for similar historical cases, then packages Movement + Hints + Context into a job.
Redis
Step 5 — Job Queued for Processing
The enriched job is pushed to the queue for asynchronous AI processing.
Orchestrator (Rust)
Step 6 — Layer Context Like an Accountant
The Orchestrator assembles facts, policy, auditor notes, and KB cases in the correct order.
LLM (Llama-3 8B / Qwen 2.5 14B)
Step 7 — AI Performs Major Classification
The LLM classifies each movement into Operating, Investing, or Financing.
Step 8 — AI Refines into Sub-classification & Treatments
The LLM determines sub-type and special accounting handling.
Orchestrator (Rust)
Step 9 — Deterministic Engine Builds the Cash Flow
After the AI responses, the Orchestrator constructs the statement using rule-based logic.
Step 10 — Accountant Reviews Only Ambiguities
Low-confidence items are routed to the human review workspace.
Step 11 — The System Learns from Every Decision
Human resolutions are stored back into the Knowledge Base.
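The production ingestion API is described as Rust; here is a minimal Python sketch of Steps 2 and 3, the movement deltas plus a couple of illustrative hints (the real hint set, per the architecture, also covers debit/credit and stock/flow; the rules below are invented for illustration):

```python
def movements(opening_tb: dict[str, float], closing_tb: dict[str, float]) -> dict[str, float]:
    """Step 2: ledger-wise deltas between opening and closing Trial Balances.
    Ledgers present in only one TB are treated as zero in the other."""
    ledgers = set(opening_tb) | set(closing_tb)
    return {l: closing_tb.get(l, 0.0) - opening_tb.get(l, 0.0) for l in ledgers}

# Illustrative-only working-capital ledger set (the production rules would be richer):
WORKING_CAPITAL = {"Trade Receivables", "Trade Payables", "Inventory"}

def machine_hints(ledger: str, delta: float) -> dict:
    """Step 3: deterministic hints attached to each movement record."""
    return {
        "direction": "increase" if delta > 0 else "decrease",
        "working_capital": ledger in WORKING_CAPITAL,
    }

opening = {"Trade Receivables": 100.0, "Term Loan": 250.0}
closing = {"Trade Receivables": 130.0, "Term Loan": 220.0, "Lease Liability": 40.0}
mv = movements(opening, closing)
# Trade Receivables: +30.0, Term Loan: -30.0, Lease Liability: +40.0 (new ledger)
```

Because these hints are computed deterministically before anything reaches the LLM, the model classifies enriched facts rather than raw ledger names, which is part of why a small local model suffices.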
Takeaway
- API prepares accounting facts
- Queue manages jobs
- Orchestrator thinks like an accountant
- LLM applies judgment
- Human resolves only edge cases
- System keeps learning
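Steps 9 and 10 hinge on confidence-based routing: high-confidence classifications flow straight into the deterministic engine, while low-confidence ones go to the review workspace. A hypothetical sketch (the 0.85 threshold and record shape are invented for illustration; the article doesn't specify them):

```python
def route(classified: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split classified movements into auto-accepted items and
    items routed to the human review workspace (Step 10)."""
    THRESHOLD = 0.85  # illustrative cut-off, not from the article
    auto = [m for m in classified if m["confidence"] >= THRESHOLD]
    review = [m for m in classified if m["confidence"] < THRESHOLD]
    return auto, review

items = [
    {"ledger": "Interest Paid", "section": "Operating", "confidence": 0.97},
    {"ledger": "Lease Payments", "section": "Financing", "confidence": 0.55},
]
auto, review = route(items)
# Interest Paid is auto-accepted; Lease Payments goes to the review workspace
```

Whatever the accountant decides about the reviewed items then feeds back into the Knowledge Base (Step 11), so the same ambiguity shouldn't surface twice.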
The CFO Conversation
Three months after rolling out the system, Ram met with the CFO.
CFO: "Your team's productivity is up 40%, but headcount is flat. What changed?"
Ram: "We stopped re-learning classification decisions. Accountants work directly in a system that captures their judgment once and makes it reusable."
CFO: "So you built a knowledge base?"
Ram: "Not quite. Traditional knowledge bases are static documents. This is conversational memory. The system doesn't just store what accountants decided—it stores why they decided it, what context mattered, and applies that reasoning to new situations."
CFO: "And this scales?"
Ram: "Yes. Every judgment Meera provides becomes part of the knowledge base. The system gets smarter organically. We're learning by doing, and the learning sticks across quarters and clients."
CFO: "What about new accountants?"
Ram: "Kavya generated her first cash flow statement with minimal errors. That would have taken 6 months to achieve before. She's learning from seeing Meera's reasoning for every decision, not from making mistakes and getting corrected."
CFO: "Wait—who's actually making the accounting decisions? The system or the accountant?"
Ram: "Always the accountant. Meera uploads the Trial Balance. The system generates a draft and flags ambiguities. Meera resolves them in plain English. The system stores her judgment and applies it going forward. The accountant is always in control."
CFO: "So accountability is clear?"
Ram: "Crystal clear. Every judgment is attributed—who decided, when, and why. Meera signs off on policy questions. I sign off on complex scenarios. Auditors validate contentious items. Full audit trail."
CFO: "That's... actually better governance than before."
Ram: "Exactly. Before, decisions were implicit—buried in spreadsheet formulas and manual processes. Now, every decision is explicit, documented, and traceable. Auditors love it."
CFO: "So you're... less busy?"
Ram smiled: "I close my laptop at 7 PM now. I spend my time on genuinely new problems, not reviewing the same classification errors repeatedly. Meera spends less time fixing mistakes, more time on strategic policy decisions."
CFO: "Can we scale this to other areas?"
Ram: "Absolutely. Anywhere expert judgment is being re-learned repeatedly—revenue recognition, tax provisions, lease accounting—yes."
What Ram Learned
Looking back, Ram realized the breakthrough wasn't just the technology. It was finding someone who understood the problem deeply—and brought both worlds together.
The AI Coach didn't try to sell him AI tools or promise automation. They understood:
- The actual problem: Judgment doesn't scale through documentation
- Both worlds: 30 years in enterprise systems + proven AI engineering experience
- A methodology: Systematic approach to encoding expertise, not just deploying tools
The Key Insights:
1. The problem wasn't tools. It was approach.
His team had tried GitHub Copilot, ChatGPT—all the tools. But they were using AI to speed up coding. Cash flow errors aren't coding errors. They're classification errors. The breakthrough was using AI to surface ambiguities and store judgment, not generate code.
2. Automation isn't the goal. Augmentation is.
The system didn't replace Ram. It multiplied him. It made human judgment visible, capturable, reusable, and scalable.
3. Context is everything.
The same AI (Llama-3 8B) that gave generic answers became incredibly useful when given client-specific history, auditor comments, and Ram's stored judgment. Context transformed it from "fast but wrong" to "helpful and reliable."
4. Knowledge transfer is really judgment transfer.
Ram had been trying to transfer knowledge through documentation and training. What he needed to transfer was judgment: how to recognize decision points, what questions to ask, what context matters, how similar situations were handled before.
Six Months Later
Ram closes his laptop at 7:15 PM.
His team now handles 40% more clients with the same team size.
Classification errors in production: down 85%.
Auditor queries: down 70%.
New accountant onboarding: 2 months (was 6 months).
But the biggest change?
Ram isn't the bottleneck anymore.
His knowledge is accessible. His judgment is queryable. His expertise scales.
When Meera gets promoted to Financial Controller at another division, the judgment she provided doesn't walk out the door with her.
When Kavya encounters a complex FX scenario she's never seen, she doesn't wait for Ram's next available slot.
When a new accountant joins, they don't spend 6 months making every mistake that's already been made.
Ram thinks back to that 9:47 PM night six months ago.
He'd been trapped by his own success. His expertise had become the constraint.
The problem wasn't his team.
The problem wasn't documentation.
The problem wasn't process.
The problem was that judgment couldn't be stored in code or documents.
Until now.
Ram isn't unique.
There are thousands of "Rams" across Indian IT:
- Directors carrying irreplaceable domain expertise
- Experts drowning in repetitive mentoring
- Leaders who've become the bottleneck
- Architects whose knowledge doesn't scale
The question is no longer: "Can judgment be scaled?"
The question is: "When will you scale yours?"
[End of Part 2]
About this article: This article was written jointly with R. Harini Maniam, Chartered Accountant.