AI in Medical Diagnostics: Capabilities vs Hype

Evidence-Based Analysis | FDA Clearance, Accuracy Claims, and Liability

⚠️ Critical Reality Check: FDA clearance does NOT mean an AI diagnostic tool is "better than doctors." It means the device is substantially equivalent to an existing predicate device. Most AI diagnostics are cleared as clinical decision support — they assist, not replace, physician judgment.

1. FDA 510(k) Clearance: What It Actually Means

Most AI diagnostic tools enter the market via the FDA's 510(k) pathway, which requires demonstrating "substantial equivalence" to a predicate device already on the market. This is NOT the same as proving clinical efficacy through randomized trials.

510(k) Clearance Process

1. Identify Predicate: existing cleared device
2. Demonstrate Equivalence: same intended use, similar technology
3. Performance Testing: accuracy vs predicate on test dataset
4. FDA Review: 90 days typical
5. Clearance Granted: can market in US

🚩 Marketing Misrepresentation:
"FDA Approved AI": For a 510(k) device this is technically incorrect. The FDA clears 510(k) devices; it does not approve them. Approval (the PMA pathway) requires rigorous clinical trials. Most AI tools are cleared, not approved, and this distinction matters for liability and evidence standards.

✓ What to Ask Vendors:
1. "What's your 510(k) number?" (Verify it in the FDA database: accessdata.fda.gov)
2. "What was your predicate device?" (Should be similar technology)
3. "What dataset was used for clearance?" (Size, diversity, inclusion/exclusion criteria)
4. "Is your model locked or adaptive?" (Locked = fixed model; adaptive = learns over time and requires post-market monitoring)
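Before looking up a vendor-supplied number, it can help to check that it is even shaped like a 510(k) number. The sketch below is illustrative only. It assumes 510(k) numbers take the form "K" plus six digits and that the public openFDA device endpoint at api.fda.gov accepts a `k_number` search; both are assumptions not stated above and should be confirmed against FDA documentation. `K123456` is a made-up placeholder, not a real clearance.

```python
import re
from urllib.parse import urlencode

# Assumption: a 510(k) number is "K" followed by six digits (e.g. K123456).
K_NUMBER_PATTERN = re.compile(r"^K\d{6}$")

# Assumption: openFDA's public device 510(k) endpoint.
OPENFDA_510K_URL = "https://api.fda.gov/device/510k.json"


def normalize_k_number(raw: str) -> str:
    """Return the trimmed, upper-cased number, or raise ValueError if malformed."""
    candidate = raw.strip().upper()
    if not K_NUMBER_PATTERN.match(candidate):
        raise ValueError(f"not shaped like a 510(k) number: {raw!r}")
    return candidate


def build_lookup_url(raw: str) -> str:
    """Build a lookup URL for a vendor-supplied 510(k) number (no network call)."""
    k_number = normalize_k_number(raw)
    query = urlencode({"search": f"k_number:{k_number}", "limit": 1})
    return f"{OPENFDA_510K_URL}?{query}"


# Hypothetical placeholder number, used only to show the call.
lookup_url = build_lookup_url(" k123456 ")
```

A number that fails this check is not proof of misrepresentation (devices authorized through other pathways use different numbering), but it is a prompt to ask the vendor which pathway they actually used.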

2. Sensitivity & Specificity: Reading Between the Numbers

Vendors love to claim "95% accuracy!" But accuracy alone is meaningless without context. Here's what actually matters for clinical decision-making:

Key Metrics Explained:

Metric	Definition	Clinical Meaning	What to Look For
Sensitivity (True Positive Rate)	% of actual positives correctly identified	How good at NOT missing disease	High sensitivity for screening (catch everything)
Specificity (True Negative Rate)	% of actual negatives correctly identified	How good at NOT false alarming	High specificity for confirmatory tests (avoid false positives)
PPV (Positive Predictive Value)	% of positive results that are true positives	If AI says "disease present," how likely is it correct?	Depends on disease prevalence in your population
NPV (Negative Predictive Value)	% of negative results that are true negatives	If AI says "no disease," how confident can you be?	High NPV critical for ruling out conditions
AUC-ROC (Area Under Curve)	Overall discriminative ability (0.5-1.0)	How well AI distinguishes disease from no disease	>0.90 excellent, 0.80-0.90 good, <0.80 questionable
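To see why a headline accuracy figure says so little, the metrics in the table can be computed from the four confusion-matrix counts. This is a minimal sketch; the class and function names are illustrative, and the patient counts are hypothetical (1,000 patients, 10 with disease, and a "tool" that calls everyone normal).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # disease present, tool says present
    fp: int  # disease absent, tool says present
    tn: int  # disease absent, tool says absent
    fn: int  # disease present, tool says absent


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    # None marks a metric that is undefined because its denominator is zero.
    return numerator / denominator if denominator else None


def summarize(c: ConfusionCounts) -> Dict[str, Optional[float]]:
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


# Hypothetical: the tool never flags anyone, in a population with 1% prevalence.
always_negative = ConfusionCounts(tp=0, fp=0, tn=990, fn=10)
report = summarize(always_negative)
for name, value in report.items():
    print(name, "undefined" if value is None else f"{value:.1%}")
```

In this made-up case the tool is 99% "accurate" while missing every patient with disease (sensitivity 0%), which is why the per-metric breakdown matters more than the single number.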

📊 Prevalence Matters: PPV and NPV depend heavily on disease prevalence. An AI with 95% sensitivity/specificity will have very different PPV in a specialty clinic (high prevalence) vs primary care (low prevalence). Always ask: "What was the prevalence in your validation dataset?"
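The prevalence effect follows directly from Bayes' rule and is easy to check by hand or in a few lines of code. The sketch below uses the 95% sensitivity/specificity figure from the paragraph above; the two prevalence values (1% and 30%) are hypothetical stand-ins for a low-prevalence and a high-prevalence setting.

```python
import math
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at the given disease prevalence."""
    tp = sensitivity * prevalence              # true positives per patient
    fn = (1 - sensitivity) * prevalence        # missed disease
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false alarms
    ppv = tp / (tp + fp) if (tp + fp) > 0 else math.nan
    npv = tn / (tn + fn) if (tn + fn) > 0 else math.nan
    return ppv, npv


# Hypothetical settings; substitute the prevalence of your own population.
for label, prevalence in [("low prevalence", 0.01), ("high prevalence", 0.30)]:
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"{label}: prevalence={prevalence:.0%}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

With these inputs the same tool has a PPV of roughly 16% at 1% prevalence and roughly 89% at 30% prevalence, so most positive results in the low-prevalence setting would be false alarms.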

🚩 Red Flag Claims:
"99% accurate!": Accurate on what dataset? An internal test set? A public benchmark? A diverse real-world population?

"Outperforms radiologists!": Compared to whom? Average radiologists? Experts? Under what conditions? Was it a fair comparison (same data, same time limits)?

"Zero false positives!": No real-world diagnostic test achieves this at meaningful scale. The claim usually indicates overfitting, a very small test set, or testing on trivial cases.
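One way to put a number on a "zero false positives" claim is to ask what false positive rate is still statistically compatible with seeing none in the vendor's test set. The sketch below computes the exact one-sided upper confidence bound for zero observed events (the basis of the "rule of three" approximation, 3/n at 95% confidence). The function name and the sample sizes are illustrative.

```python
def upper_bound_when_zero_events(n_trials: int, confidence: float = 0.95) -> float:
    """Upper bound on the true event rate when 0 events were seen in n_trials.

    Solves (1 - p) ** n_trials = 1 - confidence for p.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return 1 - (1 - confidence) ** (1 / n_trials)


# Hypothetical test-set sizes: number of disease-free cases evaluated.
for n_negatives in (100, 1000):
    bound = upper_bound_when_zero_events(n_negatives)
    print(f"0 false positives in {n_negatives} negatives: "
          f"true rate could still be up to {bound:.2%}")
```

Zero false positives in 100 disease-free cases is still consistent with a true false positive rate of about 3%, so the claim says more about the size of the test set than about the tool.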

3. AI Assists vs Replaces: The Reality

Despite marketing hype, almost no AI diagnostic tool is authorized to replace physician judgment. With a few narrow exceptions, FDA-cleared AI diagnostics function as Clinical Decision Support (CDS): they provide information to assist, not dictate, clinical decisions.

✓ Appropriate Use Cases:
✓ Screening/Triage: AI flags potential abnormalities for priority review
✓ Second Reader: AI provides independent assessment to reduce missed findings
✓ Quantification: AI measures tumor volume, ejection fraction, etc. (objective metrics)
✓ Workflow Efficiency: AI prioritizes critical cases, auto-rules-out normals

✗ NOT Appropriate: Fully autonomous diagnosis without physician review (except in rare, specific FDA-authorized scenarios such as certain diabetic retinopathy screening systems)

⚖️ The "Alert Fatigue" Problem: If an AI system generates too many false positives, clinicians start ignoring its alerts — even the true ones. Studies show alert acceptance rates drop below 50% when false positive rates exceed 10%. Before deploying any AI diagnostic, pilot it in your environment to measure real-world false positive rates and clinician trust.
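A pilot of this kind can be summarized with three numbers: the false positive rate, the share of alerts that were real findings, and the share of alerts clinicians acted on. The sketch below assumes each pilot case is logged with three yes/no fields; the field names and the 100-case sample are hypothetical, and the reference standard for "disease confirmed" is whatever your site uses (final read, follow-up, pathology).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PilotCase:
    ai_flagged: bool         # the AI raised an alert on this case
    disease_confirmed: bool  # reference standard says disease was present
    clinician_acted: bool    # the clinician accepted or acted on the alert


def summarize_pilot(cases: List[PilotCase]) -> Dict[str, Optional[float]]:
    negatives = [c for c in cases if not c.disease_confirmed]
    alerts = [c for c in cases if c.ai_flagged]
    false_alerts = [c for c in alerts if not c.disease_confirmed]
    accepted = [c for c in alerts if c.clinician_acted]
    return {
        # Share of disease-free cases that still triggered an alert.
        "false_positive_rate": len(false_alerts) / len(negatives) if negatives else None,
        # Share of alerts that turned out to be real findings.
        "alert_ppv": (len(alerts) - len(false_alerts)) / len(alerts) if alerts else None,
        # Share of alerts the clinician acted on.
        "alert_acceptance_rate": len(accepted) / len(alerts) if alerts else None,
    }


# Hypothetical 100-case pilot log.
pilot = (
    [PilotCase(True, True, True)] * 8        # true alerts, all acted on
    + [PilotCase(False, True, False)] * 2    # missed disease
    + [PilotCase(True, False, True)] * 6     # false alerts that were acted on
    + [PilotCase(True, False, False)] * 12   # false alerts that were ignored
    + [PilotCase(False, False, False)] * 72  # correctly left unflagged
)
pilot_summary = summarize_pilot(pilot)
```

Tracking the acceptance rate alongside the false positive rate over the course of the pilot shows whether clinician trust is eroding before the tool is rolled out more widely.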

4. Liability & Malpractice Implications

Who's liable when AI makes a wrong call? The short answer: you are. The physician remains ultimately responsible for patient care decisions, regardless of AI recommendations.

Liability Scenarios:

Scenario	Likely Liability	Risk Mitigation
AI says "normal," physician agrees, but disease present	Physician (failure to independently assess)	Document independent review, don't blindly trust AI
AI flags abnormality, physician dismisses, disease present	Physician (ignoring clinical decision support)	Document rationale for disagreeing with AI
AI malfunctions, gives wrong result, physician relies on it	Physician + potential vendor liability	Verify vendor indemnification clauses in contract
AI not used (available but ignored), diagnosis missed	Physician (standard of care may now include AI)	Understand when AI use is considered standard of care

📋 Documentation Best Practices: Always document in the medical record: (1) AI tool used, (2) AI recommendation, (3) your independent assessment, (4) rationale if disagreeing with AI. This creates a clear audit trail showing you used AI as a tool, not as a crutch.
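If your team captures these notes in a structured form, the four elements map naturally onto a small record type that refuses to be saved when a required element is missing. This is only a sketch: the class, field names, and example values are hypothetical and not tied to any EHR system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AIUseNote:
    tool: str                  # (1) AI tool used
    ai_recommendation: str     # (2) what the AI reported
    physician_assessment: str  # (3) the physician's independent assessment
    agrees_with_ai: bool
    disagreement_rationale: Optional[str] = None  # (4) required when disagreeing

    def __post_init__(self) -> None:
        for name in ("tool", "ai_recommendation", "physician_assessment"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must be documented")
        if not self.agrees_with_ai and not (self.disagreement_rationale or "").strip():
            raise ValueError("a rationale is required when disagreeing with the AI")

    def as_text(self) -> str:
        lines = [
            f"AI tool used: {self.tool}",
            f"AI recommendation: {self.ai_recommendation}",
            f"Independent assessment: {self.physician_assessment}",
        ]
        if not self.agrees_with_ai:
            lines.append(f"Rationale for disagreement: {self.disagreement_rationale}")
        return "\n".join(lines)


# Hypothetical example values.
note = AIUseNote(
    tool="ExampleCAD (hypothetical product)",
    ai_recommendation="No abnormality flagged",
    physician_assessment="Suspicious opacity on independent review",
    agrees_with_ai=False,
    disagreement_rationale="Finding visible on lateral view; follow-up imaging ordered",
)
print(note.as_text())
```

Making the rationale mandatory only when the physician disagrees mirrors the practice described above while keeping routine documentation short.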

5. Evaluating AI Diagnostic Vendors

Due Diligence Checklist:

Question	Acceptable Answer	🚩 Red Flag
What's your FDA 510(k) number?	Provides number, verifiable on FDA database	"In process" or "not required for our use case"
What was your validation dataset size?	Thousands of cases, multi-center, diverse demographics	Hundreds or fewer, single institution
Does your dataset match our patient population?	Similar demographics, disease prevalence, imaging equipment	"Our AI generalizes well" (no specifics)
What's your false positive rate in real-world use?	Post-market surveillance data from current customers	Only internal test set results
How do you handle edge cases / rare conditions?	"AI flags uncertainty, recommends expert review"	"Our AI handles all cases"
Do you update the model after deployment?	"Locked model" or "updates require new FDA submission"	"Our AI continuously learns" (regulatory red flag)
What's your indemnification policy?	Vendor indemnifies for defects in AI performance	"No indemnification" or "use at your own risk"

💼 Service Details: Avondale.AI offers AI Vendor Evaluation services including FDA clearance verification, accuracy claim validation, contract review for liability clauses, and pilot program design. We help you separate marketing from evidence before you commit.

Key Takeaways:

FDA 510(k) clearance ≠ approval; it means "substantially equivalent" to existing devices
Sensitivity/specificity alone are meaningless without prevalence context
AI is Clinical Decision Support — physician remains ultimately responsible
Document AI use: tool, recommendation, your assessment, rationale for disagreement
Verify accuracy claims with independent, multi-center validation data
Vendor contracts should include indemnification for AI performance defects