
AI Capability Proof Checklist

9 Questions That Expose Fake AI Before You Buy
Compliance checks won't save you. Capability tests will.
Bring this to your next vendor demo. Ask these questions in order.

The Smoke Test

Is this in production with referenceable federal clients?
  • Good answer: "Yes, [Agency X] has used it for 18 months. Here's the contact."
  • Red flag: "Currently in beta with select partners."
Does it learn from data, or follow predetermined rules?
  • Good answer: "We use supervised learning on [dataset]. The model retrains monthly."
  • Red flag: The vendor can't describe the learning mechanism (supervised, unsupervised, or reinforcement).
  • Follow-up: "If we remove the natural language interface, what decisions does the AI make independently?" This exposes "GenAI wrappers," where only the UI is AI.

The Wizard of Oz Check

What's your actual automation rate? Give me a number.
  • Good answer: "87% straight-through processing. 13% goes to human review for edge cases."
  • Red flag: A "hybrid approach" with no percentages.
  • Why it matters: Nate Inc. claimed 93-97% automation. In reality it was roughly 0%: human contractors in the Philippines and Romania processed transactions manually. The result was $42M in securities fraud, DOJ wire fraud charges, and a CEO indicted in April 2025. (A sketch for verifying a claimed rate yourself follows.)
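Verifying a claimed straight-through-processing rate is simple arithmetic over case logs, if the vendor will export them. Here is a minimal sketch of that check; the record layout and field names (`case_id`, `human_touches`) are hypothetical, so adapt them to whatever the vendor's system actually emits.

```python
# Verify a claimed straight-through-processing (STP) rate from a case log.
# Field names here are hypothetical; adapt to the vendor's actual export.

def automation_rate(cases: list[dict]) -> float:
    """Fraction of cases completed with zero human touches."""
    if not cases:
        raise ValueError("empty case log")
    untouched = sum(1 for c in cases if c["human_touches"] == 0)
    return untouched / len(cases)

cases = [
    {"case_id": 1, "human_touches": 0},
    {"case_id": 2, "human_touches": 0},
    {"case_id": 3, "human_touches": 2},  # escalated to human review
]
print(f"Measured STP rate: {automation_rate(cases):.0%}")  # 67%
```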
Do offshore contractors handle data, exceptions, or transactions?
  • Good answer: "No. All processing happens on US-based infrastructure with cleared personnel."
  • Red flag: Vague references to "global support teams."
  • Why it matters: Ironically, for government, human-in-the-loop is a feature, not a bug. OMB M-25-21 requires meaningful human review for high-impact AI decisions.

The Learning Test

What datasets trained the model? Do you have legal rights to that data?
  • Good answer: "Trained on [X million records] from [source]. We have licensing agreements."
  • Red flag: A "proprietary blend" with no specifics.
  • Why it matters: OMB M-25-21 requires training data transparency for high-impact AI systems.
How do you detect when model performance degrades?
  • Good answer: "We monitor accuracy weekly. If it drops below 92%, we trigger a retrain."
  • Red flag: No monitoring, no drift detection, no alerts.
  • Why it matters: "Drift" means accuracy drops as real-world data changes. Good AI vendors detect this; rule-based systems fail silently. OMB M-25-21 requires continuous monitoring for high-impact AI. (A minimal monitoring sketch follows.)
Why did the model make THIS decision? Show me.
  • Good answer: "Here are the top 5 features that drove this prediction, ranked by weight."
  • Red flag: "It's proprietary," or the vendor can only show rule logic, not learned patterns.
  • Why it matters: OMB M-25-21 requires explainability for high-impact AI. If they can't explain it, they can't defend it in an audit. (A sketch of what "ranked by weight" means follows.)
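In the simplest case, a linear model, "top 5 features ranked by weight" means each feature's contribution to one prediction is its learned weight times the feature value, sorted by magnitude. (For complex models, vendors typically produce the same kind of ranking with attribution tools such as SHAP.) Every name and number below is hypothetical:

```python
# Per-prediction feature attribution for a linear model:
# contribution = learned weight * feature value.
# All feature names and values here are hypothetical.

weights = {"doc_mismatch": 3.1, "prior_flags": 2.4, "claim_amount": 1.8,
           "tenure_years": -0.9, "filing_lag_days": 0.4, "region_code": -0.2}
applicant = {"doc_mismatch": 1.0, "prior_flags": 0.0, "claim_amount": 1.2,
             "tenure_years": 2.0, "filing_lag_days": 3.5, "region_code": 1.0}

contributions = {f: w * applicant[f] for f, w in weights.items()}
top5 = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:5]
for feature, contrib in top5:
    print(f"{feature:>16}: {contrib:+.2f}")
```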

The Proof Test

True story: A federal CIO required live demos with real agency data. Three vendors withdrew within 48 hours. The fourth passed—and delivered 40% faster processing.
Will you demo with OUR data before we sign?
  • Good answer: "Yes. Send us a sanitized dataset and we'll run it live."
  • Red flag: "Our demo environment doesn't support external data."
  • Why it matters: If they say no, walk away. If they withdraw, you just saved yourself a GAO finding.
Show me validated results: metrics, not testimonials.
  • Good answer: "Agency X reduced processing time by 40% and error rate by 12%. Here's the case study, with false positive/negative rates across demographic groups."
  • Red flag: Only quotes; no quantified outcomes, no error rate breakdowns.
  • Ask for: false positive rate, false negative rate, and performance on edge cases, compared against an industry baseline. OMB M-25-21 requires bias testing across protected classes. (A sketch of a per-group error breakdown follows.)
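A per-group error breakdown is straightforward to compute once you have true labels, predictions, and a group attribute for each record. A minimal sketch, with entirely hypothetical group labels and data; the point is the shape of the report, not the numbers:

```python
# False positive / false negative rates per demographic group.
# Group labels and records are hypothetical.
from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of (group, y_true, y_pred) with binary labels.
    Assumes each group has at least one positive and one negative."""
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        if y_true == 0:
            c["neg"] += 1
            c["fp"] += int(y_pred == 1)
        else:
            c["pos"] += 1
            c["fn"] += int(y_pred == 0)
    return {g: {"FPR": c["fp"] / c["neg"], "FNR": c["fn"] / c["pos"]}
            for g, c in counts.items()}

records = [("A", 0, 0), ("A", 0, 1), ("A", 1, 1), ("A", 1, 1),
           ("B", 0, 0), ("B", 0, 0), ("B", 1, 0), ("B", 1, 1)]
for group, rates in error_rates_by_group(records).items():
    print(group, {k: f"{v:.0%}" for k, v in rates.items()})
```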

Contractual Protection (Post-Award Risk Mitigation)

Before you sign, ensure your contract includes:

  • Performance commitments: the validated metrics from the demo (automation rate, false positive/negative rates) written in as measurable acceptance criteria
  • Data rights: warranties covering the vendor's training data licenses and your agency's data handling
  • Audit access: the explainability and monitoring evidence OMB M-25-21 requires, available on demand
  • Exit provisions: data portability and transition support to prevent vendor lock-in

Scoring: Count Your Checkmarks

  • 9/9: Strong candidate. Proceed to pilot.
  • 7-8/9: Investigate gaps before contract.
  • 5-6/9: High risk. Require additional validation.
  • <5/9: Walk away. This is likely AI washing.

Risk Categories:

  • Compliance Risk: OMB M-25-21 violations, audit failures, GAO findings
  • Reputational Risk: Congressional hearings, Federal News Network headlines, public trust erosion
  • Operational Lock-in: Vendor dependency, no real efficiency gains, data rights issues
By the numbers:

  • $5.6B: Federal AI spend, 2022-2024 (OMB)
  • $400K: SEC fines for AI washing (2024)
  • $42M: Nate Inc. fraud (DOJ indictment)
  • 70-80%: AI initiative failure rate
Need help running a vendor evaluation? Contact us

Learn More

For a deep dive on AI washing detection, real-world case studies, and the complete federal compliance framework, see our article: "The $400,000 Question: How to Avoid Buying AI Snake Oil" (link to be added when blog post is published).