Part 1: When Expertise Can't Scale
A story about scaling judgment in the age of AI
Ram closes his laptop at 9:47 PM. Again.
Tomorrow, he'll present the Q2 roadmap to the leadership team. They'll ask the same question they always ask: "Can we do it faster?"
And he'll give the same answer he always gives: "We're working on it."
But here's what he won't say:
"I've been in this industry for 15 years. I built this cash flow statement platform from scratch. I know every classification rule in Ind AS 7, every reconciliation scenario, every edge case that makes auditors nervous. I can spot a misclassified transaction in seconds.
But I can't make my knowledge move faster than my mouth."
The Success That Became a Trap
Ram is technically successful. His Cash Flow Reporting product serves 200+ enterprise clients across banking, insurance, and manufacturing. His platform generates IndAS-compliant cash flow statements for some of India's largest companies.
His team ships on time. Auditors trust his output. His stakeholders trust him.
But success has a cost.
His best developers don't stay developers for long. Within 18-24 months, they:
- Become managers for other products internally
- Get poached by competitors at 40% salary bumps
- Start their own ventures
Ram doesn't begrudge them. In fact, he's proud. "I run a leadership factory," he jokes at team celebrations.
But the joke has stopped being funny.
Because every time a trained developer leaves, Ram's knowledge walks out the door with them.
And cash flow classification isn't something you learn from documentation alone.
The Impossible Math
Ram has learned to play a game he can't win:
The Cost Game:
Senior developers = ₹25-35 LPA. Junior developers = ₹8-12 LPA.
Finance keeps asking: "Why do you need so many developers?"
So Ram keeps the team young. The CFO is happy.
The Quality Game:
Junior developers need hand-holding. They don't understand the domain. They don't see the dependencies. They don't anticipate the edge cases.
So Ram mentors relentlessly. The juniors learn. The quality improves.
The Churn Game:
Trained developers become valuable. Other teams want them. Competitors want them.
They leave. Ram starts over.
Cost ↔ Quality ↔ Retention
Pick two. Ram can't have all three.
A Typical Tuesday
9:30 AM - Stand-up
Priya (joined 4 months ago): "I'm working on the interest payment classification module."
Ram: "Good. Operating or financing?"
Priya: "Um... it's interest, so... operating?"
Ram: (internally sighs) "Depends on the company's accounting policy choice. IndAS 7 clause 33 allows either classification, but it must be applied consistently once elected."
Priya: "But I thought there was one correct answer..."
Ram: "That's the complexity. Different clients make different policy elections. We need to handle both and ensure consistency within each client."
Priya looks confused.
Ram: "Let's have a quick call after standup."
He knows this "quick call" will take 30 minutes. And Priya will ask the same question again next month.
11:00 AM - Code Review
Arjun's PR: "Add support for dividend received transactions"
Ram opens the code. Line 47 catches his eye:
if (transactionType === 'DIVIDEND_RECEIVED') {
    classifyAs('OPERATING_ACTIVITIES');
}
Ram leaves a comment: "Not necessarily wrong, but incomplete. IndAS 7 permits dividends received to be classified as Operating OR Investing, depending on the company's accounting policy election. We need to check which policy this specific client elected and apply it consistently. Also - you're not handling the reconciliation impact on profit adjustments."
Arjun replies in 5 minutes: "But the requirement doc says operating activities?"
Ram: "The doc shows one client's example. Not a universal rule. Policy choice matters."
He rewrites the logic on a call. Arjun nods and understands.
Next month, someone else will make the same mistake.
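What would "handling both" actually look like? A minimal sketch of the policy-aware check Ram keeps describing, in Python (the enum, helper names, and client IDs are illustrative, not Ram's actual codebase):

```python
from enum import Enum

class Activity(Enum):
    OPERATING = "OPERATING_ACTIVITIES"
    INVESTING = "INVESTING_ACTIVITIES"
    FINANCING = "FINANCING_ACTIVITIES"

# Hypothetical per-client policy elections. In a real system this would be
# client configuration, reviewed against prior-year treatment.
CLIENT_POLICIES = {
    "client_a": {"DIVIDEND_RECEIVED": Activity.OPERATING},
    "client_b": {"DIVIDEND_RECEIVED": Activity.INVESTING},
}

def classify_dividend_received(client_id: str) -> Activity:
    """Look up the client's elected policy instead of hard-coding one answer."""
    policy = CLIENT_POLICIES.get(client_id, {})
    try:
        return policy["DIVIDEND_RECEIVED"]
    except KeyError:
        # A missing election is a data problem, not something to default away.
        raise ValueError(f"No dividend-received policy elected for {client_id}")
```

The point isn't the lookup table; it's that the "correct" answer is a function of the client's election, so the election has to exist somewhere the code can see it.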
1:30 PM - The Senior Dev Question
Vikram (18 months in the team, now a senior developer) pings:
"Ram, client is asking why their cash flow doesn't match their bank balance."
Ram: "Did you handle non-cash transactions?"
Vikram: "Yes, excluded depreciation and provisions."
Ram: "What about foreign exchange revaluation?"
Vikram: "Oh... that's non-cash too?"
Ram: "It's in P&L but doesn't affect cash. Needs to be backed out in reconciliation."
Vikram is smart. He learns fast. But there are 47 scenarios like this.
Ram knows them all. His team is discovering them one bug at a time.
Ram stares at his screen for a moment.
This is the pattern he's been living with for years.
Every classification decision is being re-learned from memory—his memory—because the system has no way to store accounting judgment. It only stores code. And code can't capture context.
Priya will ask about interest classification again next quarter.
Someone else will hard-code dividends as Operating next month.
Another developer will forget about FX revaluation in six weeks.
The knowledge lives in Ram's head. The code lives in the repository. But the connection between them—the judgment, the reasoning, the context—lives nowhere.
"This isn't sustainable," Ram thinks. But he doesn't know what else to do.
2:30 PM - The Blunder
Sneha pings: "Quick question - bank overdraft movement goes under financing activities, right?"
Ram's heart sinks. This should be obvious to anyone who's been here 6 months.
Ram: "Is it a temporary overdraft or a long-term facility?"
Sneha: "Temporary. Used for working capital."
Ram: "Then it's part of cash and cash equivalents per IndAS 7. Not a financing activity. It affects the opening/closing cash balance, not the activities themselves."
Sneha: "But it's borrowing..."
Ram: "Review the cash and cash equivalents definition in IndAS 7. And check the reconciliation notes template."
Ram knows Sneha is confused because this is counter-intuitive. Every transaction requires judgment. Every judgment requires context.
And context is what the team doesn't have.
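The overdraft rule itself is small enough to write down. A sketch, assuming the judgment can be captured in two questions (real systems would need more context than two booleans):

```python
def overdraft_treatment(repayable_on_demand: bool, fluctuates_with_cash_management: bool) -> str:
    """Ind AS 7 treatment Ram cites: a bank overdraft that is repayable on
    demand and forms an integral part of cash management is a component of
    cash and cash equivalents, not a financing activity."""
    if repayable_on_demand and fluctuates_with_cash_management:
        return "CASH_AND_CASH_EQUIVALENTS"
    # A long-term or structured facility behaves like borrowing.
    return "FINANCING_ACTIVITIES"
```

Two lines of logic; the hard part is knowing to ask the two questions at all.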
4:00 PM - The Escalation
The dreaded email arrives: "URGENT: Auditor flagged cash flow discrepancy for Q1"
Client: HDFC Bank subsidiary
Issue: ₹450 crore mismatch in Operating Activities
Auditor: "Classification appears incorrect for lease payments"
Ram's stomach tightens. He knows exactly what happened.
Someone classified operating lease payments as Operating Activities (correct).
And also classified finance lease payments as Operating Activities (wrong - should be Financing).
The code doesn't distinguish between lease types. It just sees "lease payment" and dumps everything into Operating.
But the distinction matters. Legally. To auditors. To regulators.
Ram looks at the git history. The code was written by Rajeev. Who left the company 3 months ago. The code was reviewed by Arjun. Who didn't know the lease classification rules.
This means:
- Evening war room call with the client
- Weekend RCA documentation
- Monday morning explanation to the CEO
- Auditor conversations
- Patch deployment and validation
- Re-running reports for 15 other clients to check if they're affected
"This isn't a math error," Ram thinks. "The calculation is perfect. It's a classification error."
And classification errors are the hardest to catch. Because they require understanding, not testing.
6:30 PM - The Realization
As Ram fixes the lease classification logic, he stares at the code:
// Handle lease payments
if (isLease(transaction)) {
    return 'OPERATING_ACTIVITIES';
}
The logic is clean. The code is simple. The test cases pass.
But it's wrong.
Ram remembers what the senior auditor told him last year during the FY24 audit:
"Ram, your calculations are always perfect. The math is never wrong. But sometimes... the thinking behind the numbers needs adjustment. That's what we're here to help with."
The auditor was being diplomatic. What he really meant was:
"Your team doesn't always understand classification nuances. And that's more dangerous than a math error because it's invisible until we review the reports."
Because the code doesn't know that:
- Finance leases under IndAS 116 are classified as Financing Activities (principal portion)
- Finance lease interest is Operating OR Financing (per company policy)
- Payments under operating leases are classified as Operating Activities under IndAS 7
- But only if you can distinguish them
The developer who wrote this didn't make a coding mistake.
They made a classification mistake.
And no unit test can catch a misunderstanding.
Ram adds a 40-line comment explaining the lease classification rules.
He adds a new parameter: leaseType.
He updates the test cases.
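The corrected handler might look something like this sketch (leaseType and the component split are the new inputs; the names are illustrative, and the interest portion is deliberately left to a policy lookup rather than a default):

```python
def classify_lease_payment(lease_type: str, component: str) -> str:
    """Route lease cash flows per the rules Ram just documented:
    - operating lease payments -> Operating Activities
    - finance lease principal  -> Financing Activities
    - finance lease interest   -> depends on the client's policy election
    """
    if lease_type == "OPERATING":
        return "OPERATING_ACTIVITIES"
    if lease_type == "FINANCE" and component == "PRINCIPAL":
        return "FINANCING_ACTIVITIES"
    # Interest follows the company's elected policy; refusing to guess is
    # safer than silently defaulting to Operating.
    raise ValueError("Interest portion requires the client's policy election")
```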
The tests now validate that the code executes correctly against the updated logic. But Ram knows:
Tests can validate execution against defined rules, but they cannot validate whether the rules themselves reflect correct accounting judgment.
The original code had passing tests. It did exactly what it was designed to do. The problem wasn't in the execution — it was in the design.
Tests verify the "how." They can't verify the "why."
He knows someone will need this again. And they won't read the comment.
"This is the pattern," Ram realizes. "Every bug we fix is a classification problem disguised as a code problem."
What Ram Has Tried
Ram isn't sitting idle. He's tried everything:
✓ Detailed Documentation
- 600+ pages on Confluence
- IndAS 7 clause-by-clause explanations
- Classification decision trees with 50+ examples
Result: Developers still make classification errors. Because reading ≠ judgment.
✓ Classification Matrix
- Comprehensive Excel: Transaction type → Activity category
- 200+ scenarios covered
Result: The matrix says "see policy" for 40% of cases. Excel can't capture nuance. "Interest paid" depends on 3 different context factors. Because rules ≠ context.
✓ Intensive Mentoring
- Weekly one-on-ones
- "Classification boot camp" for new joiners
- Real-world scenario discussions
Result: Developers learn... then leave. New developers arrive. The cycle repeats. Oral tradition doesn't scale. Because growth ≠ retention.
✓ Automated Tests & Reviews
- Unit tests for every transaction type
- 25-point review checklist
- Edge case coverage: 85%
Result: Tests verify code does what it's designed to do. But they can't verify that the design itself reflects correct accounting judgment. Checklists become boxes to tick. Because process ≠ comprehension.
Nothing sticks. The knowledge doesn't transfer.
The problem isn't that his team is incompetent. The problem is that classification requires judgment that can't be encoded in static documentation.
The Numbers Don't Lie
Ram keeps a private spreadsheet. Leadership doesn't see it.
Classification Errors (Last 12 Months):
- Production incidents due to misclassification: 23
- Auditor queries on classification: 47
- Client escalations: 11
- Emergency patches: 18
Every single one could have been prevented with proper classification understanding.
None of them were math errors. All of them were judgment errors.
Time Allocation (Typical Week):
- Explaining classification rules: 12 hours
- Reviewing classification logic: 8 hours
- "Quick questions" and escalations: 10 hours
- Actual strategic work: 5 hours
30 of his 40 hours go to answering questions about things he's already documented.
Cost Per Feature:
Simple feature (e.g., new transaction type): 3 weeks, 2 developers, ₹5L effective cost
- Coding: 2 days
- Understanding classification rules: 1 week
- Review iterations: 1 week
- Testing & validation: 3 days
Complex feature (e.g., new IndAS standard): 8 weeks, 4 developers, ₹25L effective cost
- Coding: 1 week
- Classification framework design: 3 weeks
- Review iterations: 2 weeks
- Validation with auditors: 2 weeks
Notice the pattern: coding is barely 20% of the effort. Classification understanding is the rest.
But Ram knows: If the team had his classification judgment, these timelines would halve.
The Churn Tax:
Every departing developer = 3-4 months of accumulated classification knowledge, gone.
Every new developer = 6-8 months to become productive in classification decisions.
With 30% annual churn, Ram is constantly training.
And classification knowledge can't be taught in a classroom. It comes from seeing 1000 scenarios.
The Conversation Ram Dreads
CFO: "Your cost per feature is 2x higher than the market benchmark."
Ram: "We maintain high quality. Zero production issues in the last quarter."
CFO: "Your competitor shipped the same feature in 3 weeks. You took 8."
Ram: "They have more senior developers."
CFO: "Which you can't afford because your cost is already too high."
Ram: ...
There's no good answer. Ram is trapped.
What Ram Thinks About at 2 AM
When he can't sleep, Ram thinks about the gap.
The gap between what he knows and what his team knows.
He knows that "interest paid" isn't a simple classification.
It depends on:
- Company accounting policy election (operating vs financing as permitted under IndAS 7)
- Whether it's on borrowings or lease liabilities
- Whether there's any capitalization involved
- Whether it's on working capital or long-term financing
His team sees: transaction_type: 'INTEREST_PAID' and picks one classification.
He knows that when profit is ₹100 crore but operating cash flow is ₹30 crore, you need to reconcile the gap by identifying non-cash items, working capital changes, and reclassifying items between activities — potentially 20+ different adjustment types.
His team knows: "add back depreciation and provisions."
He knows that classifying a transaction requires asking:
"What was the economic substance of this event?"
"What does IndAS 7 say, and which policy did we elect?"
"What did we do last year?"
"What will the auditor expect?"
His team asks: "Which column does this go in?"
This gap costs:
- Late nights fixing classification errors
- Auditor queries and escalations
- Delays in financial close
- Client trust
- His sanity
And Ram doesn't know how to close it.
Because the gap isn't about knowledge transfer. It's about judgment transfer.
And nobody has figured out how to scale judgment.
The AI Conversation
Last month, the CTO announced: "We're adopting AI tools. GitHub Copilot for everyone."
Ram's team was excited. For two days.
Day 3
Priya uses Copilot to generate transaction classification logic.
The code is clean. The structure is elegant. The syntax is perfect.
def classify_transaction(transaction):
    if transaction.type == 'INTEREST_PAID':
        return CashFlowActivity.OPERATING
    elif transaction.type == 'DIVIDEND_RECEIVED':
        return CashFlowActivity.OPERATING
    elif transaction.type == 'LEASE_PAYMENT':
        return CashFlowActivity.OPERATING
    ...
Ram reviews it. His heart sinks.
"Why is everything classified as operating?" Ram asks.
"That's what Copilot suggested," Priya says. "And it's based on the most common patterns online."
"The most common patterns are wrong," Ram says. "Or at least, incomplete. IndAS 7 allows multiple classifications depending on policy."
He rewrites it. Again.
Day 5
Arjun asks ChatGPT: "How to classify dividend paid in cash flow statement?"
ChatGPT responds (confidently):
"Under IndAS 7, dividends paid are classified as Financing Activities as they represent distribution to equity holders."
Arjun implements it. The test cases pass.
Ram catches it in code review: "IndAS 7 permits dividends paid to be classified as Operating OR Financing, depending on policy election. We have 40 clients who classify it as Operating."
"But ChatGPT said..." Arjun trails off.
"ChatGPT gives you the textbook answer," Ram says. "Reality is more nuanced."
Day 8
Sneha uses AI to generate reconciliation logic from P&L to cash flow.
The AI generates perfect code for:
- Adding back depreciation ✓
- Adding back provisions ✓
- Adjusting for working capital changes ✓
But it misses:
- Unrealized foreign exchange gains (not cash, needs backing out)
- Share-based payment expense (non-cash, needs backing out)
- Gain on sale of fixed assets (not operating, reclassify to investing)
- Interest capitalized (not expensed, doesn't affect reconciliation)
The reconciliation doesn't balance. Client's auditor flags it.
"Why didn't the AI catch this?" Sneha asks.
"Because," Ram explains, "the AI learned from code repositories. But classification errors in code repositories don't raise exceptions. They pass tests. They go to production. And then auditors catch them."
"The AI learned from mistakes. And now it's reproducing them."
Week 3
Ram runs an experiment. He asks ChatGPT:
"How should I classify bank overdraft in a cash flow statement under IndAS 7?"
ChatGPT responds:
"Bank overdrafts are typically classified as Financing Activities as they represent short-term borrowings from the bank."
Wrong. (At least, wrong for overdrafts that are repayable on demand and form part of cash management.)
He asks again with more context:
"Bank overdraft is repayable on demand and fluctuates between positive and negative balance. How should it be treated in cash flow statement under IndAS 7?"
ChatGPT responds:
"In this case, the bank overdraft should be included in cash and cash equivalents as permitted under IndAS 7."
Correct.
Ram realizes something important:
The AI wasn't wrong. It was context-blind.
The AI wasn't broken or malicious. It was doing exactly what it was trained to do: predict the most likely answer based on patterns in its training data. But it didn't have access to:
- This specific client's accounting policy elections
- Prior-year treatment and auditor approvals
- Industry-specific regulatory context
- The nuances of when exceptions apply
When you give it context, it can reason correctly.
When you don't, it defaults to the most common pattern — which may or may not apply.
The AI had knowledge. It lacked context.
And in classification, context is everything.
Week 4
The CTO asks: "How's the AI adoption going?"
Ram: "Code generation is faster. But classification errors are up 30%."
CTO: "Why?"
Ram: "AI generates syntactically correct code. But it can't verify semantic correctness. It doesn't know our clients' policies. It doesn't know which IndAS clause applies in which context."
CTO: "Can we train it?"
Ram: "On what? We don't have a dataset of correct classifications with reasoning. We have code. And half that code has subtle classification bugs we haven't discovered yet."
The CTO goes quiet.
Ram realizes something profound:
AI makes developers faster at writing code.
But cash flow errors aren't coding errors. They're classification errors.
And classification requires:
- Understanding the transaction's economic substance
- Knowing the relevant accounting standard clauses
- Understanding the company's accounting policy choices
- Connecting the transaction to its downstream impacts
- Exercising judgment based on context
AI tools accelerated the wrong thing.
His team was now producing wrong answers faster.
"There must be a better way," Ram thought. "But I don't know what it is."
The Breaking Point
Ram closes his laptop at 9:47 PM. Again.
He thinks about tomorrow's leadership meeting. The CFO will ask about cost per feature. The CTO will ask about AI adoption results. The CEO will ask about the audit escalation.
And Ram realizes something:
He's been solving the same problem for 15 years in different ways:
- Better documentation → Developers still make classification errors
- More mentoring → Knowledge leaves when people leave
- Stricter reviews → Bottleneck gets worse
- AI tools → Wrong answers delivered faster
Every solution has made him more essential. None has made his judgment transferable.
The problem isn't lack of tools. It's not lack of process. It's not even lack of talent.
The problem is that classification judgment doesn't scale through traditional means.
And he doesn't know what non-traditional means would look like.
Ram saves a note to himself:
"Classification errors aren't coding problems. They're judgment problems. Code can't store context. Documentation can't capture nuance. Tests can't validate semantic correctness.
If this problem could be solved, it would change everything. But I don't know how to solve it.
And I can't keep doing this for another 15 years."
He closes his laptop.
9:47 PM.
Tomorrow will be the same.
Unless...
[End of Part 1]
About this article: This article has been created jointly with R. Harini Maniam, Chartered Accountant.