AI in Medical Diagnostics: Capabilities vs Hype
Evidence-Based Analysis | FDA Clearance, Accuracy Claims, and Liability
⚠️ Critical Reality Check:
FDA clearance does NOT mean an AI diagnostic tool is "better than doctors." It means the device is
substantially equivalent to an existing predicate device. Most AI diagnostics are cleared
as clinical decision support: they assist physician judgment rather than replace it.
1. FDA 510(k) Clearance: What It Actually Means
Most AI diagnostic tools enter the market via the FDA's 510(k) pathway, which requires demonstrating
"substantial equivalence" to a predicate device already on the market. This is NOT the same as proving
clinical efficacy through randomized trials.
510(k) Clearance Process
1. Identify Predicate: an existing cleared device
2. Demonstrate Equivalence: same intended use, similar technology
3. Performance Testing: accuracy vs predicate on a test dataset
4. FDA Review: 90 days is the typical target; elapsed time is often longer once FDA requests for additional information are counted
5. Clearance Granted: the device can be marketed in the US
🚩 Marketing Misrepresentation:
"FDA Approved AI": This is technically incorrect. The FDA clears 510(k) devices;
it does not approve them. Approval (PMA pathway) requires rigorous clinical trials.
Most AI tools are cleared, not approved. This distinction matters for liability and evidence standards.
✓ What to Ask Vendors:
1. "What's your 510(k) number?" (Verify on FDA database: accessdata.fda.gov)
2. "What was your predicate device?" (Should be similar technology)
3. "What dataset was used for clearance?" (Size, diversity, inclusion/exclusion criteria)
4. "Is your AI locked or adaptive?" (Locked = fixed model; adaptive = learns over time and requires post-market monitoring)
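The first vendor question can be partly automated. The sketch below checks that a claimed 510(k) number has a plausible shape and builds a lookup URL for it, without making a network call. Two things here are assumptions rather than facts from this article: the pattern "K" followed by six digits, and the openFDA endpoint and `k_number` search field, both of which should be confirmed against the current openFDA documentation. `K123456` is a placeholder, not a real device.

```python
import re
from urllib.parse import urlencode

# Assumption: 510(k) numbers look like "K" followed by six digits.
K_NUMBER_PATTERN = re.compile(r"^K\d{6}$")

# Assumption: openFDA serves 510(k) records at this endpoint and supports
# searching on a "k_number" field. Verify against the openFDA docs.
OPENFDA_510K_URL = "https://api.fda.gov/device/510k.json"


def normalize_k_number(raw: str) -> str:
    """Return the upper-cased number, or raise ValueError if it is malformed."""
    candidate = raw.strip().upper()
    if not K_NUMBER_PATTERN.match(candidate):
        raise ValueError(f"not a plausible 510(k) number: {raw!r}")
    return candidate


def build_lookup_url(raw: str) -> str:
    """Build a query URL for a single 510(k) record. No request is sent."""
    k_number = normalize_k_number(raw)
    query = urlencode({"search": f"k_number:{k_number}", "limit": 1})
    return f"{OPENFDA_510K_URL}?{query}"


# Placeholder number for illustration only.
print(build_lookup_url("k123456"))
```

A well-formed number proves nothing by itself; the point is to fetch the record and compare the applicant, device name, and decision date with what the vendor told you. De Novo and PMA submissions use different number formats, which this sketch does not handle.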
2. Sensitivity & Specificity: Reading Between the Numbers
Vendors love to claim "95% accuracy!" But accuracy alone is meaningless without context. Here's what
actually matters for clinical decision-making:
Key Metrics Explained:
| Metric | Definition | Clinical Meaning | What to Look For |
| --- | --- | --- | --- |
| Sensitivity (True Positive Rate) | % of actual positives correctly identified | How good at NOT missing disease | High sensitivity for screening (catch everything) |
| Specificity (True Negative Rate) | % of actual negatives correctly identified | How good at NOT false alarming | High specificity for confirmatory tests (avoid false positives) |
| PPV (Positive Predictive Value) | % of positive results that are true positives | If AI says "disease present," how likely is it correct? | Depends on disease prevalence in your population |
| NPV (Negative Predictive Value) | % of negative results that are true negatives | If AI says "no disease," how confident can you be? | High NPV critical for ruling out conditions |
| AUC-ROC (Area Under Curve) | Overall discriminative ability (0.5 = chance, 1.0 = perfect) | How well AI distinguishes disease from no disease | Rule of thumb: >0.90 excellent, 0.80-0.90 good, <0.80 questionable |
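The first four metrics in the table all come from the same four counts. The sketch below computes them from a confusion matrix; the class name, function name, and patient counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # disease present, AI positive
    fp: int  # disease absent, AI positive
    fn: int  # disease present, AI negative
    tn: int  # disease absent, AI negative


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the standard diagnostic metrics from raw counts."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn),
    }


# Hypothetical validation set: 1,000 patients, 100 of whom have the disease.
counts = ConfusionCounts(tp=90, fp=45, fn=10, tn=855)
metrics = diagnostic_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these made-up counts the tool is 94.5% accurate, yet only two in three of its positive calls are correct (PPV of about 66.7%). That gap is why a single accuracy figure is not enough to judge a tool.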
📊 Prevalence Matters:
PPV and NPV depend heavily on disease prevalence. An AI with 95% sensitivity/specificity will have
very different PPV in a specialty clinic (high prevalence) vs primary care (low prevalence).
Always ask: "What was the prevalence in your validation dataset?"
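To make the prevalence effect concrete, the sketch below applies Bayes' rule to a tool with 95% sensitivity and 95% specificity. The two prevalence figures (1% and 30%) are illustrative assumptions, not numbers from any study, and `predictive_values` is a hypothetical helper name.

```python
def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at the given disease prevalence."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be strictly between 0 and 1")
    # Expected fraction of the population in each confusion-matrix cell.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative settings only.
for label, prevalence in [("primary care", 0.01), ("specialty clinic", 0.30)]:
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"{label}: prevalence={prevalence:.0%} PPV={ppv:.1%} NPV={npv:.1%}")
```

Under these assumptions the same tool has a PPV of about 16% at 1% prevalence and about 89% at 30% prevalence. In the low-prevalence setting, roughly five out of six positive flags are false alarms even though the headline sensitivity and specificity are unchanged.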
🚩 Red Flag Claims:
"99% accurate!": Accurate on what dataset? Internal test set? Public benchmark?
Real-world diverse population?
"Outperforms radiologists!": Compared to whom? Average radiologists? Experts?
Under what conditions? Was it a fair comparison (same data, same time limits)?
"Zero false positives!": Not credible on any large, realistic dataset. It usually indicates
overfitting, a very small test set, or testing on trivial cases.
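One way to pressure-test these claims is to ask how much uncertainty the test set size leaves. The sketch below uses two standard formulas: the exact one-sided upper bound on an event rate when zero events are observed, and the Wilson score interval for a proportion. The function names and the sample sizes in the usage lines are hypothetical.

```python
import math


def zero_event_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Upper bound on the true event rate when 0 events occur in n trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Solve (1 - p)^n = 1 - confidence for p.
    return 1 - (1 - confidence) ** (1 / n)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical claims, each resting on 100 cases.
print(zero_event_upper_bound(100))  # "zero false positives" in 100 negatives
print(wilson_interval(99, 100))     # "99% accurate" on 100 cases
```

With only 100 disease-free cases, observing zero false positives is still compatible with a true false positive rate of nearly 3%. Likewise, 99 correct out of 100 is consistent with a true accuracy anywhere from roughly 94.6% to 99.8%. Asking the vendor for the case counts behind each headline number lets you run this check yourself.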
3. AI Assists vs Replaces: The Reality
Despite marketing hype, AI diagnostic tools are, with rare exceptions, not authorized to replace physician judgment.
Nearly all FDA-cleared AI diagnostics function as Clinical Decision Support (CDS): they provide
information to assist, not dictate, clinical decisions.
✓ Appropriate Use Cases:
✓ Screening/Triage: AI flags potential abnormalities for priority review
✓ Second Reader: AI provides independent assessment to reduce missed findings
✓ Quantification: AI measures tumor volume, ejection fraction, etc. (objective metrics)
✓ Workflow Efficiency: AI prioritizes critical cases, auto-rules-out normals
✗ NOT Appropriate: Fully autonomous diagnosis without physician review (except in rare, specific FDA-authorized scenarios like certain diabetic retinopathy screening)
⚖️ The "Alert Fatigue" Problem:
If an AI system generates too many false positives, clinicians start ignoring its alerts — even
the true ones. Studies show alert acceptance rates drop below 50% when false positive rates exceed
10%. Before deploying any AI diagnostic, pilot it in your environment to measure real-world
false positive rates and clinician trust.
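A pilot like this comes down to logging three facts per case and summarizing them. The sketch below is one minimal way to do that; the record fields, the function name, and the idea of using "clinician acted on the alert" as a proxy for trust are all assumptions for illustration, not a validated methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotCase:
    ai_flagged: bool         # the AI raised an alert on this case
    disease_confirmed: bool  # result of the reference standard
    clinician_acted: bool    # clinician accepted the alert (only meaningful if flagged)


def summarize_pilot(cases: list[PilotCase]) -> dict[str, float]:
    """Summarize false alarms and alert acceptance from pilot logs."""
    flagged = [c for c in cases if c.ai_flagged]
    negatives = [c for c in cases if not c.disease_confirmed]
    if not flagged or not negatives:
        raise ValueError("need at least one alert and one disease-free case")
    false_alerts = [c for c in flagged if not c.disease_confirmed]
    return {
        # Share of disease-free cases that triggered an alert.
        "false_positive_rate": len(false_alerts) / len(negatives),
        # Share of all alerts that were false alarms (1 - PPV).
        "false_alert_share": len(false_alerts) / len(flagged),
        # Share of alerts the clinician accepted.
        "alert_acceptance_rate": sum(c.clinician_acted for c in flagged) / len(flagged),
    }
```

Tracking the acceptance rate week by week during the pilot is one way to see whether trust is eroding. The false alert share is often the number clinicians feel most directly, since it describes the alerts they actually see.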
4. Liability & Malpractice Implications
Who's liable when AI makes a wrong call? The short answer: you are. The physician
remains ultimately responsible for patient care decisions, regardless of AI recommendations.
Liability Scenarios:
| Scenario | Likely Liability | Risk Mitigation |
| --- | --- | --- |
| AI says "normal," physician agrees, but disease present | Physician (failure to independently assess) | Document independent review, don't blindly trust AI |
| AI flags abnormality, physician dismisses, disease present | Physician (ignoring clinical decision support) | Document rationale for disagreeing with AI |
| AI malfunctions, gives wrong result, physician relies on it | Physician + potential vendor liability | Verify vendor indemnification clauses in contract |
| AI not used (available but ignored), diagnosis missed | Physician (standard of care may now include AI) | Understand when AI use is considered standard of care |
📋 Documentation Best Practices:
Always document in the medical record: (1) AI tool used, (2) AI recommendation, (3) your independent
assessment, (4) rationale if disagreeing with AI. This creates a clear audit trail showing you used
AI as a tool, not as a crutch.
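If your EHR or reporting system allows structured notes, the four documentation items map naturally onto a small record type. The sketch below is a hypothetical structure, not any vendor's or EHR's schema; the class and field names are invented for illustration. It refuses to create a record that disagrees with the AI without stating why.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AIUseRecord:
    tool_name: str             # (1) AI tool used, ideally with version
    ai_recommendation: str     # (2) what the AI reported
    physician_assessment: str  # (3) the physician's independent assessment
    agrees_with_ai: bool
    disagreement_rationale: Optional[str] = None  # (4) required when disagreeing
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self) -> None:
        rationale = (self.disagreement_rationale or "").strip()
        if not self.agrees_with_ai and not rationale:
            raise ValueError("a rationale is required when disagreeing with the AI")

    def to_note(self) -> str:
        """Render the record as plain text for the medical record."""
        lines = [
            f"AI tool: {self.tool_name}",
            f"AI recommendation: {self.ai_recommendation}",
            f"Independent assessment: {self.physician_assessment}",
        ]
        if not self.agrees_with_ai:
            lines.append(f"Rationale for disagreement: {self.disagreement_rationale}")
        return "\n".join(lines)
```

Making the rationale mandatory at creation time means the audit trail cannot silently omit the one item that matters most when the physician overrides the tool.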
5. Evaluating AI Diagnostic Vendors
Due Diligence Checklist:
| Question | Acceptable Answer | 🚩 Red Flag |
| --- | --- | --- |
| What's your FDA 510(k) number? | Provides number, verifiable on FDA database | "In process" or "not required for our use case" |
| What was your validation dataset size? | Thousands of cases, multi-center, diverse demographics | Hundreds or less, single institution |
| Does your dataset match our patient population? | Similar demographics, disease prevalence, imaging equipment | "Our AI generalizes well" (no specifics) |
| What's your false positive rate in real-world use? | Post-market surveillance data from current customers | Only internal test set results |
| How do you handle edge cases / rare conditions? | "AI flags uncertainty, recommends expert review" | "Our AI handles all cases" |
| Do you update the model after deployment? | "Locked model" or "updates require new FDA submission" | "Our AI continuously learns" (regulatory red flag) |
| What's your indemnification policy? | Vendor indemnifies for defects in AI performance | "No indemnification" or "use at your own risk" |
💼 Service Details:
Avondale.AI offers AI Vendor Evaluation services including FDA clearance verification,
accuracy claim validation, contract review for liability clauses, and pilot program design. We help
you separate marketing from evidence before you commit.
Key Takeaways:
- FDA 510(k) clearance ≠ approval; it means "substantially equivalent" to existing devices
- Sensitivity/specificity alone do not tell you PPV or NPV; you also need prevalence context
- AI is Clinical Decision Support — physician remains ultimately responsible
- Document AI use: tool, recommendation, your assessment, rationale for disagreement
- Verify accuracy claims with independent, multi-center validation data
- Vendor contracts should include indemnification for AI performance defects