Part 2: The Solution
A story about scaling judgment in the age of AI
3 months later.
Ram closes his laptop at 7:15 PM. Not 9:47 PM like he used to.
He smiles.
Tomorrow, he'll present the Q3 results to the leadership team. They'll ask the usual question: "Can we do it faster?"
This time, his answer will be different.
The Realization
It started with a conversation Ram didn't expect.
One month after the lease classification disaster, Ram reached out to an AI Coach.
Ram was skeptical about "AI coaches." Most were either YouTubers or consultants who'd never built real systems. But this coach came recommended by a trusted colleague: "Thirty years in enterprise systems. Actually understands complexity. They've done this before."
Ram took the chance. In their first conversation, the coach didn't pitch tools or frameworks. Ram explained his situation: the constant re-learning, the classification errors, the bottleneck he'd become. The coach listened, then said something that made Ram sit up:
"The problem isn't your team or your documentation. The problem is that you're trying to store judgment in a medium that can't hold it."
The coach continued:
"Code stores logic. Documentation stores information. But judgment? Judgment lives in the space between the question and the answer. It's contextual. It's iterative. You can't freeze it in static formats."
Ram nodded. "That's exactly what I'm experiencing. But how do you solve it?"
"You don't store the judgment itself. You store the conversation that led to the judgment. You make the decision-making process visible, capturable, and queryable."
"I've seen this pattern in enterprise systems for 30 years. The solution isn't better documentation or training. It's building systems that surface ambiguities explicitly, capture expert judgment when it's provided, and make that judgment reusable."
That night, Ram couldn't sleep. But for a different reason—he finally saw a path forward.
The Insight
Ram had been thinking about the problem wrong.
He'd been trying to:
- Document every classification rule → static, gets stale
- Train developers on principles → oral tradition, doesn't scale
- Build decision trees → can't capture nuance
- Write better code → only stores logic, not reasoning
What if instead, he could build a system that:
- Surfaces ambiguities instead of hiding them
- Captures judgment when it's made
- Makes that judgment queryable when the same situation appears again
- Asks better questions based on accumulated context
Not "AI that classifies transactions."
But "AI that helps developers make classification decisions the way Ram would."
The Experiment
Ram started small. One client. One quarter. One cash flow statement.
He worked with the AI Coach to design and implement the system. The coach trained Ram's existing developers, helping them understand both the accounting domain and how to work with LLMs effectively. More importantly, the coach brought a proven methodology for encoding expert judgment in AI systems.
The approach was radical:
Step 1: Generate a Draft, Not a Final Statement
They took the client's Trial Balance and fed it to a local LLM (Llama-3 8B) along with:
- Historical accounting policies
- Prior year's cash flow statement
- Previous auditor comments
- IndAS 7 standard text
The LLM's job wasn't to produce the final cash flow statement.
Its job was to produce a first draft with explicit flags.
Step 2: Flag Ambiguities, Don't Resolve Them Silently
The system generated output like this:
Operating Activities:
Interest paid: ₹45 crore
⚠️ CLASSIFICATION AMBIGUITY:
Interest paid classification depends on company policy election per IndAS 7.
Prior year: Classified as Operating Activities
Current policy document: Not explicitly stated
Question for reviewer:
- Confirm policy election for interest paid classification
- If policy has changed, note reason for change
If Operating: No change needed
If Financing: Move to Financing Activities section
Another example:
Lease payments: ₹12 crore
⚠️ CLASSIFICATION AMBIGUITY:
Trial balance shows "Lease Payments" but doesn't distinguish between:
- Operating lease payments (Operating Activities)
- Finance lease principal (Financing Activities)
- Finance lease interest (Operating or Financing per policy)
Question for reviewer:
- What is the breakdown between operating and finance leases?
- For finance leases, is interest separated from principal?
- What is the policy election for finance lease interest?
This was different from traditional tools that either:
- Made assumptions silently (wrong)
- Required upfront configuration of every scenario (impossible)
Step 3: Accountant Reviews and Resolves Ambiguities
Here's the critical part: The accountant is the user, not the developer.
When the company accountant, Meera, uploaded the Trial Balance for Q1 2024, she received a notification 4 minutes later:
Draft cash flow statement ready for review
3 ambiguities require your attention
She opened the workspace. It looked like a review interface with a problems panel:
CASH FLOW STATEMENT REVIEW - Q1 2024
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Draft Generated: ✓
Ambiguities: 3 require resolution
Status: Awaiting review
AMBIGUITIES (3)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ Interest paid classification needs confirmation
Line 147: Interest paid ₹45 cr → Operating Activities
Context:
- IndAS 7 permits Operating OR Financing classification
- Prior year (Q4 2023): Classified as Operating Activities
- Current policy document: Not explicitly stated for this year
Question: Confirm policy election for current year
[Confirm Operating] [Change to Financing] [View Policy History]
⚠️ Lease payment breakdown required
Line 203: "Lease Payments ₹12 cr"
Context:
- Trial balance shows aggregate amount
- Need breakdown for proper classification:
* Operating leases → Operating Activities
* Finance lease principal → Financing Activities
* Finance lease interest → Per policy election
Question: Provide breakdown or request from finance team
[Enter Breakdown] [Request from Finance] [View Q4 Treatment]
⚠️ Foreign exchange gain treatment unclear
Line 298: "FX Gain ₹5 cr" in P&L
Context:
- P&L shows FX gain but cash impact unclear
- Could be: Realized (affects cash) or Unrealized (non-cash)
- Treatment depends on realization status
Question: Clarify if realized or unrealized, and source
[Provide Details] [Escalate to Ram] [View FX Policy]
Meera reviewed the first ambiguity.
Meera clicked [Confirm Operating] and typed her reasoning in plain English:
Resolution for: Interest paid classification
Decision: Operating Activities (confirmed)
Reasoning:
Per company accounting policy manual section 4.2:
All interest paid (on borrowings and finance leases) classified as Operating Activities.
Confirmed with CFO in policy review meeting January 2024.
Auditor (KPMG) approved this treatment in FY22 audit (ref: memo AM-22-Q4-15).
Continue applying consistently.
Approved by: Meera Sharma, Senior Accountant
Date: March 15, 2024
She clicked [Submit]. The system immediately:
- Updated the cash flow statement
- Marked the ambiguity as resolved ✓
- Stored Meera's reasoning for future reference
For the lease breakdown, Meera clicked [Request from Finance]. The finance team responded later that morning with the split. Meera entered the breakdown in plain English, the system processed it, and she confirmed. Ambiguity resolved ✓
The system stored these judgments—not as code, but as contextual reasoning attached to specific scenarios.
Step 4: LLM Reprocesses with New Context
The LLM took Meera's judgment and:
- Updated the cash flow statement
- Documented the classification reasoning
- Stored the pattern for future similar scenarios
Next time an accountant worked on a cash flow statement for this client, the system would:
- Automatically apply the known policy elections
- Only flag NEW ambiguities
- Reference past decisions when relevant
The First Real Test
Two weeks later, Q2 closed. Meera uploaded the Q2 Trial Balance into the system.
4 minutes later, she received a notification: "Draft ready for review."
She opened the workspace. This time, fewer ambiguities:
AMBIGUITIES (1)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ Interest paid: ₹52 crore → Operating Activities
Auto-applied based on Q1 policy decision
Reference: Meera's resolution from March 15, 2024
No action needed unless policy changed.
⚠️ Lease payments: ₹13 crore (NEW AMOUNT)
Trial balance shows ₹13 crore (increased from ₹12 cr in Q1)
System suggested provisional split based on prior-quarter ratios, pending confirmation:
- Operating leases: ~₹8.7 cr
- Finance principal: ~₹3.2 cr
- Finance interest: ~₹1.1 cr
Question: Please confirm actual Q2 breakdown
[Enter Breakdown] [Request from Finance] [Apply Q1 Ratio]
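The provisional split is simple ratio arithmetic: scale the prior quarter's breakdown to the new aggregate amount. A minimal sketch, assuming a hypothetical Q1 breakdown of ₹8.0 / ₹3.0 / ₹1.0 crore (the story never states Q1's actual split; these figures are chosen only so the scaled result matches the suggestion shown above):

```python
def provisional_split(prior_split: dict[str, float], new_total: float) -> dict[str, float]:
    """Scale a prior-quarter breakdown to a new aggregate amount,
    preserving the prior ratios. A suggestion only, pending confirmation."""
    prior_total = sum(prior_split.values())
    return {k: round(v * new_total / prior_total, 1) for k, v in prior_split.items()}

# Hypothetical Q1 breakdown of the ₹12 cr lease payments (assumed, not from the story):
q1_split = {"operating_leases": 8.0, "finance_principal": 3.0, "finance_interest": 1.0}

q2_suggestion = provisional_split(q1_split, new_total=13.0)
# → {'operating_leases': 8.7, 'finance_principal': 3.2, 'finance_interest': 1.1}
```

The point is that the system never presents this as the answer; it presents it as a starting point that the accountant must confirm or replace with the actual figures from finance.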
Meera saw only ONE ambiguity instead of three.
Interest classification? Already decided and stored. Applied automatically.
She clicked [Request from Finance] for the lease breakdown. Finance team responded that afternoon. Meera entered the breakdown, reviewed the proposed classification, and confirmed. Marked resolved ✓
Done.
Total time: 25 minutes (vs 4+ hours manually, vs 45 minutes in Q1).
Ambiguities to resolve: 1 (vs 3 in Q1, because policies already stored)
Classification errors: 0 (Ram spot-checked, found none)
Most important: Meera spent her time on what actually needed her judgment (new lease amounts), not re-deciding policies that were already established.
The Transformation
Over the next 3 months, Ram rolled out the system to every client accountant working with his team.
The Results:
Accountant time per statement: 6-8 hours → 25-45 minutes
Ram's review time: 2-3 hours → 15-20 minutes
Classification errors: 3-5 per statement → 0-1
New accountant ramp-up: 6 months → 2 months
85% reduction in errors. 80% time savings. Results varied by client complexity, but the trend was consistent.
But the numbers don't tell the full story.
What Actually Changed:
1. Ambiguities Became Visible and Explicit
Before: Accountants had to manually identify every classification decision point while building the cash flow statement. Many decisions were made implicitly or by pattern-matching without conscious review.
After: The system explicitly flagged every point where judgment was required, making invisible decisions visible.
Example:
Trial Balance shows "Dividend Received ₹5 cr"
Old workflow: Accountant manually classifies while building statement (Operating? Investing? Check last year... probably Operating... move on)
New workflow: System shows in the workspace:
⚠️ Dividend classification requires policy confirmation
Context: IndAS 7 permits Operating OR Investing classification
Prior treatment: Operating Activities
Policy rationale: "Company views dividends as returns on operational investments"
Status: ✓ Policy confirmed by Meera (Senior Accountant) on Jan 15, 2024
Action: Verify this dividend is consistent with policy, or update if changed.
[Apply Prior Treatment] [Update Policy] [View Full History]
Accountant now sees:
- This decision point explicitly (not buried in the process)
- What the established policy is
- The reasoning behind it
- Whether it applies or needs updating
Critically: Decisions that were implicit become explicit. Nothing slips through unreviewed.
2. Judgment Became Reusable
Before: Ram answered the same questions 100 times.
After: Ram answered each question once, and the system made it queryable.
Example: Junior accountant Kavya encounters "FX Gain ₹8 cr" in Trial Balance.
Old workflow: Kavya asks Ram, he explains 3 scenarios for 20 minutes.
New workflow: System shows Ram's stored guidance covering realized vs unrealized FX, with examples for each scenario. Kavya gets the answer in 2 minutes. Ram gets his time back.
3. Edge Cases Became Learning Opportunities
Before: Edge cases caused escalations and stress.
After: Edge cases became additions to the knowledge base.
Example: Complex foreign loan repayment with embedded FX loss. System flagged it as novel, escalated to Ram. Ram provided judgment with full reasoning. Stored in knowledge base. Next time this scenario appears, the system knows how to handle it.
But the system wasn't perfect:
One month later, a client had a similar-sounding but fundamentally different transaction: A forex hedge settlement that coincided with loan repayment but wasn't directly linked.
The system initially flagged it as matching Ram's prior guidance on "foreign loan with FX loss."
Meera reviewed the proposed classification and immediately saw the issue—this was a separate hedging transaction, not embedded FX on the loan itself.
She clicked [Reject Proposed Classification] and entered:
This is different from RAM-2024-05-22 scenario.
This is a cash flow hedge settlement (separate contract).
Classification: Operating Activities per hedging policy
Not related to financing activity.
Escalating to Ram to confirm and store as new pattern.
Ram confirmed Meera's assessment. The system learned the distinction.
This is exactly how it should work: The system proposes based on patterns, but the accountant is always the final authority. When the system gets it wrong, the human catches it, corrects it, and the system learns the nuance.
4. Clear User: Accountants Work Directly in the System
This was the fundamental shift.
Before: Developers built cash flow generation code. Accountants provided requirements. Lots of lost-in-translation errors.
After: Accountants work directly in the system. Developers improve the system itself (code, prompts).
The Roles:
- Accountant (User): Uploads Trial Balance, resolves ambiguities, provides judgment
- Ram (Domain Expert): Reviews complex scenarios, builds judgment library
- Auditor: Validates contentious items
- Developer (Builder): Improves system code and prompts—NOT in operational workflow
The Workflow:
- Accountant (Meera) uploads Trial Balance
- System generates draft + surfaces ambiguities
- Accountant resolves in workspace (plain English)
- System stores and applies judgment
- Accountant reviews final statement
- Ram spot-checks (optional for complex cases)
No translation layer. No handoffs. Accountant drives the process.
5. Junior Accountants Learned Faster (By Seeing Expert Reasoning)
Before: Junior accountants learned by making mistakes and getting corrections.
After: Junior accountants learned by seeing explicit reasoning for every decision.
Kavya (junior accountant, 3 months experience) said:
"Earlier, I'd see 'Interest paid ₹50 cr' in Trial Balance and panic. Is this Operating or Financing? Check last year's statement... check policy manual... ask Meera... hope I got it right.
Now, the system shows me: 'Interest paid → Operating Activities per company policy (ref: Meera's decision March 2024). Rationale: All interest classified as Operating per accounting manual section 4.2, confirmed with CFO, auditor-approved FY22.'
I'm learning the WHY behind decisions, not just copying prior treatments blindly."
Meera (Senior Accountant) said:
"Before, I'd explain the same classification rules to every new accountant. They'd still make mistakes because remembering rules without context is hard.
Now, every decision I make is captured with full reasoning. New accountants see my judgment, understand the context, and apply it consistently. My expertise scales without me repeating myself 100 times."
Ram said:
"I used to spend 2-3 hours reviewing each cash flow statement, finding the same classification errors repeatedly.
Now, the system catches 90% of those based on stored judgment. I spend 15-20 minutes on genuinely new scenarios—complex transactions, new standards, edge cases—where my expertise actually adds value."
What the System Doesn't Do
This is critical. Ram was very clear about the boundaries:
The System Does NOT:
- ❌ Make final classification decisions without human review
- ❌ Invent policy elections or make up guidance
- ❌ Override stored accountant/auditor judgment
- ❌ Use the LLM for final arithmetic reconciliation (that's handled by deterministic code)
- ❌ Produce "audit-ready" output without accountant sign-off
- ❌ Replace accountants or domain experts
The System DOES:
- ✓ Generate structured drafts from Trial Balances
- ✓ Surface ambiguities explicitly in a review workspace
- ✓ Accept judgment in plain English from accountants/auditors
- ✓ Store that judgment with full context and reasoning
- ✓ Reference historical judgment and apply established policies
- ✓ Ask targeted questions when context is missing
- ✓ Make expert judgment queryable and reusable across quarters/clients
Role Clarity:
- Accountant (Primary User): Uploads TB, resolves ambiguities, provides judgment, reviews output
- Ram (Domain Expert): Handles complex scenarios, builds judgment library, spot-checks
- Auditor: Validates contentious items, confirms treatments
- Developer (System Builder): Improves code and prompts—NOT involved in operational workflow
The Key Insight:
It's not "AI replaces accountants."
It's not "AI automates accounting decisions."
It's "AI surfaces decision points + stores expert judgment + makes it reusable."
The accountant is always in control. The accountant always decides.
The Technical Foundation
The AI Coach helped Ram understand why this approach worked:
Why a Local LLM (Llama-3 8B)?
- Data sensitivity: Trial balances are confidential, auditor comments are privileged
- Cost and control: Runs on a single GPU, predictable latency, no per-call token costs
- Right-sized: Not trying to "know everything"—good at flagging ambiguities and pattern matching
The Architecture (end-to-end flow):
Ingestion API (Rust)
Step 1 — Capture Two Trial Balances
The API receives opening and closing TB files from the user.
Step 2 — Convert Ledgers into Movements
The API computes ledger-wise deltas to create movement records.
Step 3 — Add Deterministic Machine Hints
The API enriches each movement with debit/credit, stock/flow, and working-capital hints.
Step 4 — Retrieve Relevant Past Judgments
The API queries the Knowledge Base for similar historical cases, then packages Movement + Hints + Context into a job.
Redis
Step 5 — Job Queued for Processing
The enriched job is pushed to the queue for asynchronous AI processing.
Orchestrator (Rust)
Step 6 — Layer Context Like an Accountant
The Orchestrator assembles facts, policy, auditor notes, and KB cases in the correct order.
LLM (Llama-3 8B / Qwen 2.5 14B)
Step 7 — AI Performs Major Classification
The LLM classifies each movement into Operating, Investing, or Financing.
Step 8 — AI Refines into Sub-classification & Treatments
The LLM determines sub-type and special accounting handling.
Orchestrator (Rust)
Step 9 — Deterministic Engine Builds the Cash Flow
After the AI responses, the Orchestrator constructs the statement using rule-based logic.
Step 10 — Accountant Reviews Only Ambiguities
Low-confidence items are routed to the human review workspace.
Step 11 — The System Learns from Every Decision
Human resolutions are stored back into the Knowledge Base.
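The production ingestion API is described as Rust; here is a minimal Python sketch of Steps 2 and 3, the movement deltas plus a couple of illustrative hints (the real hint set, per the architecture, also covers debit/credit and stock/flow; the rules below are invented for illustration):

```python
def movements(opening_tb: dict[str, float], closing_tb: dict[str, float]) -> dict[str, float]:
    """Step 2: ledger-wise deltas between opening and closing Trial Balances.
    Ledgers present in only one TB are treated as zero in the other."""
    ledgers = set(opening_tb) | set(closing_tb)
    return {l: closing_tb.get(l, 0.0) - opening_tb.get(l, 0.0) for l in ledgers}

# Illustrative-only working-capital ledger set (the production rules would be richer):
WORKING_CAPITAL = {"Trade Receivables", "Trade Payables", "Inventory"}

def machine_hints(ledger: str, delta: float) -> dict:
    """Step 3: deterministic hints attached to each movement record."""
    return {
        "direction": "increase" if delta > 0 else "decrease",
        "working_capital": ledger in WORKING_CAPITAL,
    }

opening = {"Trade Receivables": 100.0, "Term Loan": 250.0}
closing = {"Trade Receivables": 130.0, "Term Loan": 220.0, "Lease Liability": 40.0}
mv = movements(opening, closing)
# Trade Receivables: +30.0, Term Loan: -30.0, Lease Liability: +40.0 (new ledger)
```

Because these hints are computed deterministically before anything reaches the LLM, the model classifies enriched facts rather than raw ledger names, which is part of why a small local model suffices.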
Takeaway
- API prepares accounting facts
- Queue manages jobs
- Orchestrator thinks like an accountant
- LLM applies judgment
- Human resolves only edge cases
- System keeps learning
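Steps 9 and 10 hinge on confidence-based routing: high-confidence classifications flow straight into the deterministic engine, while low-confidence ones go to the review workspace. A hypothetical sketch (the 0.85 threshold and record shape are invented for illustration; the article doesn't specify them):

```python
def route(classified: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split classified movements into auto-accepted items and
    items routed to the human review workspace (Step 10)."""
    THRESHOLD = 0.85  # illustrative cut-off, not from the article
    auto = [m for m in classified if m["confidence"] >= THRESHOLD]
    review = [m for m in classified if m["confidence"] < THRESHOLD]
    return auto, review

items = [
    {"ledger": "Interest Paid", "section": "Operating", "confidence": 0.97},
    {"ledger": "Lease Payments", "section": "Financing", "confidence": 0.55},
]
auto, review = route(items)
# Interest Paid is auto-accepted; Lease Payments goes to the review workspace
```

Whatever the accountant decides about the reviewed items then feeds back into the Knowledge Base (Step 11), so the same ambiguity shouldn't surface twice.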
The CFO Conversation
Three months after rolling out the system, Ram met with the CFO.
CFO: "Your team's productivity is up 40%, but headcount is flat. What changed?"
Ram: "We stopped re-learning classification decisions. Accountants work directly in a system that captures their judgment once and makes it reusable."
CFO: "So you built a knowledge base?"
Ram: "Not quite. Traditional knowledge bases are static documents. This is conversational memory. The system doesn't just store what accountants decided—it stores why they decided it, what context mattered, and applies that reasoning to new situations."
CFO: "And this scales?"
Ram: "Yes. Every judgment Meera provides becomes part of the knowledge base. The system gets smarter organically. We're learning by doing, and the learning sticks across quarters and clients."
CFO: "What about new accountants?"
Ram: "Kavya generated her first cash flow statement with minimal errors. That would have taken 6 months to achieve before. She's learning from seeing Meera's reasoning for every decision, not from making mistakes and getting corrected."
CFO: "Wait—who's actually making the accounting decisions? The system or the accountant?"
Ram: "Always the accountant. Meera uploads the Trial Balance. The system generates a draft and flags ambiguities. Meera resolves them in plain English. The system stores her judgment and applies it going forward. The accountant is always in control."
CFO: "So accountability is clear?"
Ram: "Crystal clear. Every judgment is attributed—who decided, when, and why. Meera signs off on policy questions. I sign off on complex scenarios. Auditors validate contentious items. Full audit trail."
CFO: "That's... actually better governance than before."
Ram: "Exactly. Before, decisions were implicit—buried in spreadsheet formulas and manual processes. Now, every decision is explicit, documented, and traceable. Auditors love it."
CFO: "So you're... less busy?"
Ram smiled: "I close my laptop at 7 PM now. I spend my time on genuinely new problems, not reviewing the same classification errors repeatedly. Meera spends less time fixing mistakes, more time on strategic policy decisions."
CFO: "Can we scale this to other areas?"
Ram: "Absolutely. Anywhere expert judgment is being re-learned repeatedly—revenue recognition, tax provisions, lease accounting—yes."
What Ram Learned
Looking back, Ram realized the breakthrough wasn't just the technology. It was finding someone who understood the problem deeply—and brought both worlds together.
The AI Coach didn't try to sell him AI tools or promise automation. They understood:
- The actual problem: Judgment doesn't scale through documentation
- Both worlds: 30 years in enterprise systems + proven AI engineering experience
- A methodology: Systematic approach to encoding expertise, not just deploying tools
The Key Insights:
1. The problem wasn't tools. It was approach.
His team had tried GitHub Copilot, ChatGPT—all the tools. But they were using AI to speed up coding. Cash flow errors aren't coding errors. They're classification errors. The breakthrough was using AI to surface ambiguities and store judgment, not generate code.
2. Automation isn't the goal. Augmentation is.
The system didn't replace Ram. It multiplied him. It made human judgment visible, capturable, reusable, and scalable.
3. Context is everything.
The same AI (Llama-3 8B) that gave generic answers became incredibly useful when given client-specific history, auditor comments, and Ram's stored judgment. Context transformed it from "fast but wrong" to "helpful and reliable."
4. Knowledge transfer is really judgment transfer.
Ram had been trying to transfer knowledge through documentation and training. What he needed to transfer was judgment: how to recognize decision points, what questions to ask, what context matters, how similar situations were handled before.
Six Months Later
Ram closes his laptop at 7:15 PM.
His team now handles 40% more clients with the same team size.
Classification errors in production: down 85%.
Auditor queries: down 70%.
New accountant onboarding: 2 months (was 6 months).
But the biggest change?
Ram isn't the bottleneck anymore.
His knowledge is accessible. His judgment is queryable. His expertise scales.
When Meera gets promoted to Financial Controller at another division, the judgment she provided doesn't walk out the door with her.
When Kavya encounters a complex FX scenario she's never seen, she doesn't wait for Ram's next available slot.
When a new accountant joins, they don't spend 6 months making every mistake that's already been made.
Ram thinks back to that 9:47 PM night six months ago.
He'd been trapped by his own success. His expertise had become the constraint.
The problem wasn't his team.
The problem wasn't documentation.
The problem wasn't process.
The problem was that judgment couldn't be stored in code or documents.
Until now.
Ram isn't unique.
There are thousands of "Rams" across Indian IT:
- Directors carrying irreplaceable domain expertise
- Experts drowning in repetitive mentoring
- Leaders who've become the bottleneck
- Architects whose knowledge doesn't scale
The question is no longer: "Can judgment be scaled?"
The question is: "When will you scale yours?"
[End of Part 2]
About this article: This article was written jointly with R. Harini Maniam, Chartered Accountant.