Trustworthy AI Begins with Trustworthy Data: The Foundation We Often Overlook

TiE Charter Member at TiE Chennai – Founder & CEO at XPRUS Consulting Services – AI/ML Enthusiast, Data Privacy & Cybersecurity Specialist, Mentor, Advisor – CISSP, CCSP, 4x GCP, 2x AWS, 1x Azure

October 11, 2025

The Uncomfortable Truth: AI is only as trustworthy as the data it’s built upon. As we rush into the AI era, we are overlooking this fundamental principle, and that neglect is already causing a systemic crisis in digital governance.

Three Stories That Reveal a Systemic Crisis

As my friend recounted three recent, frustrating experiences with digital government systems, a troubling pattern emerged: each problem was caused by broken data, not broken algorithms.

1. The Phantom Challan from Thanjavur

My friend, whose two-wheeler never left a 10-mile radius in Chennai, found a ₹200 traffic challan on the Parivahan website, dated 2022, for an offense supposedly committed 340 kilometers away in Thanjavur. He had received no prior notification (SMS, email, or WhatsApp). When he tried to pay the fine anyway, simply accepting the system’s word, even the payment failed.

The Data Quality Issue: Incorrect location data, missing notification records, broken payment integration, and no clear dispute resolution mechanism.

2. The Professional Tax Mystery

He had diligently paid his startup’s professional tax for two years and held digital receipts for both. When he visited the municipality to pay the current year’s tax, the staff member told him that ₹6,000+ was pending from 2023-24. Shown the receipt, the staff member gave a telling response: “Maybe it’s a software error.” The only workaround was to pay a nonsensical ₹5 to clear the “pending” amount and unlock the current year’s payment.

The Data Quality Issue: Payment records not synchronized, staff working around system failures instead of fixing them, and citizens forced into illogical compromises.

3. The Ten-Year-Old Post Office Banking Error

A phone call and a letter demanded repayment of interest overpaid on his wife’s Monthly Income Scheme (MIS) account, an error dating back to 2015. The local post office explained that the error had occurred during the core banking migration. A system upgrade meant to improve efficiency had introduced an error that went undetected for a decade.

The Data Quality Issue: Migration errors left uncorrected, a lack of validation checks during data transfer, delayed detection spanning years, and the burden of proof shifted entirely to the citizen.

Before Trustworthy AI, We Need Trustworthy Data

We are debating the ethics of AI algorithms while ignoring the integrity of the data that feeds them. These cases reveal systemic issues that no amount of AI sophistication can overcome.

Achieving trustworthy data requires focusing on these key pillars:

🎯 Data Accuracy: If the source data is wrong (impossible vehicle locations, contradictory payment records), every downstream system will fail.
🔄 Data Integration: When systems can’t talk to each other or lose data during migration, citizens pay the price.

✅ Data Validation: Systems must implement flags for impossible scenarios (like simultaneous presence in two cities) and question data anomalies before they become citizen problems.
🔍 Data Transparency: Citizens must be able to see, verify, and dispute their data. “Software error” cannot be an acceptable explanation without clear recourse.
📝 Data Governance: Clear accountability for data quality is essential to prevent data degradation and system breakdowns.

⚖ Data Accountability: When data errors cause hardship, there must be mechanisms for correction, compensation, and prevention.
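To make the Data Validation pillar more concrete, here is a minimal Python sketch of an “impossible travel” check. This kind of flag could catch a challan like the Thanjavur one, provided a contemporaneous Chennai record exists for the same vehicle. For two recorded events involving one vehicle, the check computes the straight-line distance between them and flags the pair when the implied speed exceeds a plausible maximum. The `Event` record, the 120 km/h threshold, the coordinates, and the function names are hypothetical assumptions chosen for illustration. None of them comes from Parivahan or any other real system.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class Event:
    """Hypothetical record: a vehicle observed at a place and time."""
    vehicle_id: str
    lat: float
    lon: float
    timestamp: datetime


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (straight-line) distance in kilometres between two points."""
    r = 6371.0  # mean Earth radius in km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def is_impossible_travel(a: Event, b: Event, max_speed_kmh: float = 120.0) -> bool:
    """Flag two events for the same vehicle whose implied speed is implausible.

    The 120 km/h default is an illustrative assumption, not an official limit.
    """
    if a.vehicle_id != b.vehicle_id:
        return False  # different vehicles: nothing to compare
    distance = haversine_km(a.lat, a.lon, b.lat, b.lon)
    hours = abs((b.timestamp - a.timestamp).total_seconds()) / 3600
    if hours == 0:
        # Same instant in two places: flag unless the points nearly coincide.
        return distance > 1.0
    return distance / hours > max_speed_kmh


if __name__ == "__main__":
    # Illustrative, approximate coordinates for Chennai and Thanjavur
    # (roughly 280 km apart in a straight line).
    chennai = Event("VEHICLE-1", 13.08, 80.27, datetime(2022, 5, 1, 10, 0))
    thanjavur = Event("VEHICLE-1", 10.79, 79.14, datetime(2022, 5, 1, 10, 30))
    print(is_impossible_travel(chennai, thanjavur))  # True
```

In practice, the threshold would need tuning for GPS and data-entry noise. The design choice that matters is timing: a check this cheap can run before a penalty is issued, rather than after a citizen complains.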

The Hidden Cost: Burden on the Common Citizen

Behind every data error is a citizen who must bear the consequences, resulting in a significant hidden cost.

⏰ Time Tax: Citizens must take time off work, travel, and navigate bureaucracy to resolve errors they did not create.
💰 Financial Burden: Costs include travel, lost wages, printing documents, and sometimes making “workaround” payments like the absurd ₹5.
😰 Mental and Emotional Stress: The anxiety and frustration of facing a broken system affect mental health and productivity.

📄 Burden of Proof: The citizen is forced to prove their innocence, saving every receipt, while the system’s faulty data is presumed correct.
🚫 Service Denial: Until errors are resolved, citizens are denied services they are entitled to.

⚖ Erosion of Trust: Each unresolved error erodes citizen trust in digital governance. Without trust, technological advancement is meaningless.

When systems fail, it’s always the citizen who pays—in time, money, stress, and dignity. This is fundamentally unjust.

Building Trustworthy Systems: A Framework for Action

The conversation about trustworthy AI must begin with a commitment to trustworthy data. Here are 10 mandates for action:

Data Quality Audits: Conduct regular, independent audits of all databases and hold a Chief Data Quality Officer accountable for data integrity.
Citizen-Centric Validation: Implement real-time validation checks that flag impossible scenarios and question data anomalies before they become problems.
Transparent Dispute Resolution: Create accessible, time-bound dispute resolution mechanisms. The burden of proof must lie with the system, and citizens should be compensated for time spent resolving system errors.
Migration Safeguards: Mandate comprehensive data validation before, and reconciliation audits after, any system migration. Never allow migration errors to remain undetected for years.
Proactive Notification Systems: Use multiple channels (SMS, email, app) to provide timely notifications for any required action. Silence should not be an excuse to penalize citizens.
Interoperability Standards: Enforce strict data exchange standards so that when one system records a payment, all connected systems reflect it immediately.
Accountability Framework: Establish clear lines of accountability for data errors, ensuring consequences and corrective action.
Training and Capacity Building: Train staff to report and fix system errors, not just work around them.
Data Rights and Privacy: Give citizens the right to view, verify, and correct their data, ensuring both accuracy and security.
AI Readiness Assessment: Before deploying AI in any service, mandate a data quality assessment. AI readiness is data readiness.
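The Migration Safeguards and Interoperability Standards mandates both come down to reconciliation. After data moves between systems, each record on one side should be compared with its counterpart on the other, rather than assuming the transfer was faithful. The sketch below illustrates this under stated assumptions. Each system's data is loaded into a Python dictionary keyed by a hypothetical account ID, and the field names and amounts are invented for the example. `reconcile` is an illustrative helper, not part of any real core banking product.

```python
from decimal import Decimal


def reconcile(source: dict, target: dict) -> dict:
    """Compare records before and after a migration, keyed by account ID.

    Returns account IDs missing from the target, IDs that exist only in the
    target, and (ID, differences) pairs for records whose fields changed.
    """
    missing = sorted(source.keys() - target.keys())
    unexpected = sorted(target.keys() - source.keys())
    mismatched = []
    for account_id in sorted(source.keys() & target.keys()):
        src, tgt = source[account_id], target[account_id]
        # Every field whose value differs between the two systems.
        diffs = {
            field: (src.get(field), tgt.get(field))
            for field in sorted(src.keys() | tgt.keys())
            if src.get(field) != tgt.get(field)
        }
        if diffs:
            mismatched.append((account_id, diffs))
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


if __name__ == "__main__":
    # Hypothetical, illustrative figures only (amounts in rupees).
    legacy = {
        "MIS-001": {"principal": Decimal("100000.00"), "monthly_interest": Decimal("600.00")},
        "MIS-002": {"principal": Decimal("50000.00"), "monthly_interest": Decimal("300.00")},
    }
    migrated = {
        "MIS-001": {"principal": Decimal("100000.00"), "monthly_interest": Decimal("650.00")},
        "MIS-002": {"principal": Decimal("50000.00"), "monthly_interest": Decimal("300.00")},
    }
    report = reconcile(legacy, migrated)
    print(report["mismatched"])
    # [('MIS-001', {'monthly_interest': (Decimal('600.00'), Decimal('650.00'))})]
```

Amounts use `Decimal` rather than floats so that monetary comparisons are exact. The same comparison could run after every payment sync between connected systems, not only after a one-time migration. A report like this, produced right after the cutover and routed to an accountable owner, could help surface a decade-old error within days.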

The final statement:

“You can have the most sophisticated AI in the world, but if you feed it garbage data, you get garbage decisions—only faster and at scale.”

It’s time we prioritized data quality with the same urgency we bring to AI innovation. Because trustworthy governance begins with trustworthy data.

What data quality issue have you encountered recently that impacted you or your business? Share your story in the comments.