Pipeline Architecture
Interactive visualisation of the complete analysis pipeline, from data acquisition through ML classification to evolutionary candidate identification.
Pipeline Summary
Stage 1: Data Acquisition
Download TCGA RNA-seq HTSeq count files and MAF (Mutation Annotation Format) mutation files for five cancer types via the GDC API, then apply a DESeq2 pre-filter that reduces the expression matrix to ~13,660 genes.
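A minimal Python sketch of this stage follows, under stated assumptions. It builds (but does not send) GDC /files queries for the two file types and applies a simple low-count gene filter. The helper names (build_filters, gdc_files_url, prefilter_genes), the example project IDs, and the min_total=10 threshold are all hypothetical. The threshold mirrors the rowSums >= 10 example in the DESeq2 vignette, because the pipeline's actual filter criterion is not stated here. The filter field names reflect the GDC filter syntax as we understand it and should be checked against the GDC API documentation. Newer GDC data releases may no longer serve the "HTSeq - Counts" workflow, which is why the workflow string is a parameter.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode

# Public GDC files endpoint; this sketch only builds URLs and sends no request.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_filters(project_ids: List[str], data_type: str,
                  workflow_type: Optional[str] = None) -> dict:
    """GDC filter: project IN project_ids AND data_type [AND workflow_type]."""
    clauses = [
        {"op": "in", "content": {"field": "cases.project.project_id",
                                 "value": list(project_ids)}},
        {"op": "in", "content": {"field": "files.data_type",
                                 "value": [data_type]}},
    ]
    if workflow_type is not None:
        clauses.append({"op": "in", "content": {
            "field": "files.analysis.workflow_type", "value": [workflow_type]}})
    return {"op": "and", "content": clauses}


def gdc_files_url(filters: dict, size: int = 1000) -> str:
    """Encode the filter as a GET query string on the /files endpoint."""
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,cases.project.project_id",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


def prefilter_genes(counts: Dict[str, List[int]],
                    min_total: int = 10) -> Dict[str, List[int]]:
    """Keep genes whose raw counts, summed over all samples, reach min_total.

    Hypothetical threshold modelled on the DESeq2 vignette example; the
    pipeline's real criterion is not given in the summary.
    """
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}


if __name__ == "__main__":
    projects = ["TCGA-BRCA", "TCGA-LUAD"]  # hypothetical subset of the five cohorts
    print(gdc_files_url(build_filters(projects, "Gene Expression Quantification",
                                      "HTSeq - Counts")))
    print(gdc_files_url(build_filters(projects, "Masked Somatic Mutation")))
```

The JSON response to each URL lists file UUIDs, which are what the GDC data endpoint takes for download.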
Stage 2: ML Classification
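Train logistic regression (LR), random forest (RF), and multilayer perceptron (MLP) classifiers with 5-fold cross-validation (CV). Extract a feature-importance signature from each model, then take the union of the top-ranked genes across all three models.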
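The union step can be sketched in plain Python as below. This is an illustration under assumptions, not the pipeline's code: the summary does not say how importances are computed per model (for example LR coefficients versus RF impurity importance) or how many genes count as "top", so the sketch assumes each model has already been reduced to a gene-to-score mapping and takes a fixed, hypothetical top k per model ranked by absolute score. The function names top_genes and union_signature are hypothetical.

```python
from typing import Dict, List, Set


def top_genes(importances: Dict[str, float], k: int) -> List[str]:
    """Return the k genes with the largest absolute importance.

    Absolute value lets negative LR coefficients count as informative;
    ties are broken alphabetically so the ranking is reproducible.
    """
    ranked = sorted(importances, key=lambda gene: (-abs(importances[gene]), gene))
    return ranked[:k]


def union_signature(model_importances: Dict[str, Dict[str, float]],
                    k: int) -> Dict[str, Set[str]]:
    """Union the top-k genes of every model.

    Returns gene -> set of model names that ranked it in their top k, so
    genes supported by several models are easy to spot.
    """
    selected: Dict[str, Set[str]] = {}
    for model, importances in model_importances.items():
        for gene in top_genes(importances, k):
            selected.setdefault(gene, set()).add(model)
    return selected


if __name__ == "__main__":
    # Made-up toy scores for illustration only, not pipeline results.
    toy = {
        "LR": {"g1": 0.9, "g2": -0.7, "g3": 0.1},
        "RF": {"g1": 0.05, "g2": 0.02, "g4": 0.04},
        "MLP": {"g3": 0.3, "g4": 0.2, "g1": 0.01},
    }
    for gene, models in sorted(union_signature(toy, k=2).items()):
        print(gene, sorted(models))
```

Keeping the supporting models alongside each gene costs nothing and makes it easy to later tighten the union to, say, genes chosen by at least two models.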
Stage 3: Evolutionary Filtering
Retain genes under germline purifying selection (dN/dS < 0.3) that also show somatic positive selection (dN/dS ≥ 1.5 at FDR < 0.05). The intersection of these two gene sets forms the candidate list.
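The filtering logic can be sketched as follows, assuming per-gene germline dN/dS, somatic dN/dS, and somatic p-values have already been estimated upstream (the summary does not name the tools). The summary only states "FDR < 0.05", so the sketch assumes Benjamini-Hochberg adjustment over all genes that have a somatic p-value; the function names and that choice of procedure are hypothetical.

```python
from typing import Dict, Set


def benjamini_hochberg(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) per gene."""
    genes = sorted(pvalues, key=pvalues.get)  # ascending p-value
    m = len(genes)
    qvalues: Dict[str, float] = {}
    running_min = 1.0  # also caps q-values at 1
    # Walk from the largest p-value down so each q is the minimum of
    # p_(j) * m / j over all ranks j >= i, which keeps q monotone in p.
    for rank in range(m, 0, -1):
        gene = genes[rank - 1]
        running_min = min(running_min, pvalues[gene] * m / rank)
        qvalues[gene] = running_min
    return qvalues


def evolutionary_candidates(germline_dnds: Dict[str, float],
                            somatic_dnds: Dict[str, float],
                            somatic_p: Dict[str, float],
                            germline_max: float = 0.3,
                            somatic_min: float = 1.5,
                            fdr: float = 0.05) -> Set[str]:
    """Genes under germline purifying AND somatic positive selection."""
    # Germline constraint: dN/dS well below 1.
    purifying = {g for g, w in germline_dnds.items() if w < germline_max}
    # Somatic positive selection: elevated dN/dS that survives FDR control.
    q = benjamini_hochberg(somatic_p)
    positive = {g for g, w in somatic_dnds.items()
                if w >= somatic_min and q.get(g, 1.0) < fdr}
    return purifying & positive


if __name__ == "__main__":
    # Made-up toy values for illustration only, not pipeline results.
    germline = {"g1": 0.1, "g2": 0.2, "g3": 0.5, "g4": 0.25}
    somatic = {"g1": 2.0, "g2": 1.2, "g3": 3.0, "g4": 1.8}
    pvals = {"g1": 0.001, "g2": 0.01, "g3": 0.002, "g4": 0.04}
    print(sorted(evolutionary_candidates(germline, somatic, pvals)))
```

In the toy data, g3 passes the somatic test but fails the germline constraint, and g2 is conserved but lacks elevated somatic dN/dS, so neither reaches the intersection.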