Model Performance
Three ML models (Logistic Regression, Random Forest, MLP) were trained on TCGA RNA-seq data and evaluated with 5-fold stratified cross-validation.
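As a rough illustration of what 5-fold stratified cross-validation does, the sketch below assigns samples to folds so that each fold keeps the overall tumor/normal ratio. The function name `stratified_kfold` and the cohort sizes are hypothetical, chosen only for the example; in practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used, and the page does not say which implementation was used here.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class ratios."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin across folds
    # so every fold receives its share of each class.
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx


# Hypothetical cohort: 100 tumor (1) and 20 normal (0) samples.
labels = [1] * 100 + [0] * 20
splits = list(stratified_kfold(labels))
```

Each model would then be fit on `train_idx` and scored on `test_idx`, with the per-fold scores averaged afterwards.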
Best Accuracy (bal. accuracy): not loaded
Best AUC (across cancers): not loaded
Avg Specificity Gain (percentage points): not loaded
Cancer Types (TCGA cohorts): 5
Models (LR · RF · MLP): 3
Specificity Improvements by Cancer Type
MLP Performance Dashboard
| Cancer | Bal. Accuracy | Specificity | Sensitivity | AUC | MCC | Architecture | Samples (T/N) |
|---|---|---|---|---|---|---|---|
| Loading… | | | | | | | |
Task × Model Results
| Task | Model | Accuracy | Precision | Recall | ROC AUC |
|---|---|---|---|---|---|
| Loading… | | | | | |
Limitations
- Near-perfect AUC reflects the intrinsic separability of tumor vs. normal transcriptomes on the full DESeq2-filtered feature set (~5,000 genes), not signature-specific discriminatory power.
- PRAD has the lowest specificity across cancers (73.5%), attributed to tumor contamination of adjacent-normal samples.
- UCEC has only 201 samples (smallest dataset) and reaches AUC ≈ 1.000 on the full feature set.
- SMOTE oversampling is applied for PRAD and BLCA within CV folds. Class weighting may be more appropriate for high-dimensional data.
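The class-weighting alternative mentioned in the last limitation can be sketched as follows. The example computes inverse-frequency weights with the formula `n_samples / (n_classes * class_count)`, which is the heuristic behind scikit-learn's `class_weight="balanced"` option. The helper name `balanced_class_weights` and the class counts are hypothetical; the page does not report per-class sample counts.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced training fold: 450 tumor vs. 50 normal samples.
train_labels = ["tumor"] * 450 + ["normal"] * 50
weights = balanced_class_weights(train_labels)

# Each class contributes the same total weight to the loss.
total_weight = {
    cls: weights[cls] * train_labels.count(cls) for cls in weights
}
```

As with SMOTE, the weights would be derived from the training portion of each fold only, so that the held-out fold does not influence them. No synthetic samples are created, which is one reason this approach is sometimes preferred for high-dimensional expression data.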
All metrics are averaged over 5-fold stratified cross-validation.
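For reference, the threshold-based metrics named in the tables can be derived from per-fold confusion counts and then averaged across folds, as in this minimal sketch. The confusion counts are invented for illustration and are not results from this study. The function name `fold_metrics` is hypothetical, and tumor is assumed to be the positive class.

```python
import math
from statistics import mean


def fold_metrics(tp, fp, tn, fn):
    """Compute metrics for one fold from its confusion counts."""
    sensitivity = tp / (tp + fn)          # true-positive rate (recall)
    specificity = tn / (tn + fp)          # true-negative rate
    balanced_accuracy = (sensitivity + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }


# Hypothetical (tp, fp, tn, fn) counts for five folds.
fold_counts = [
    (18, 1, 3, 2),
    (19, 0, 4, 1),
    (17, 2, 2, 3),
    (20, 1, 3, 0),
    (16, 0, 4, 4),
]
per_fold = [fold_metrics(*counts) for counts in fold_counts]

# Average each metric over the folds.
averaged = {name: mean(m[name] for m in per_fold) for name in per_fold[0]}
```

AUC is omitted here because it depends on predicted scores rather than on a single confusion matrix.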
The MLP architecture is selected dynamically: hidden layers of 512→256→128 for datasets with n > 600 samples, and 256→128 for smaller datasets.
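The selection rule can be written as a one-line function. This is a sketch only: the name `select_hidden_layers` is hypothetical, and "smaller datasets" is read as n ≤ 600.

```python
def select_hidden_layers(n_samples, threshold=600):
    """Return MLP hidden-layer widths based on dataset size."""
    # Larger cohorts get the deeper 512→256→128 network;
    # cohorts at or below the threshold get 256→128.
    if n_samples > threshold:
        return (512, 256, 128)
    return (256, 128)


# UCEC, with 201 samples, falls under the smaller configuration.
ucec_layers = select_hidden_layers(201)
```

A tuple in this form could be passed directly to, for example, the `hidden_layer_sizes` parameter of scikit-learn's `MLPClassifier`, though the page does not state which framework was used.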