Evolutionary Analysis
Two-scale evolutionary filtering: germline conservation (dN/dS < 0.3) identifies functionally constrained genes, and somatic positive selection (dN/dS ≥ 1.5, FDR < 0.05) identifies candidate cancer-driver genes.
Summary metrics:
- ML Predictive Genes
- Mean Germline dN/dS
- % Under Purifying Selection
- Somatic Genes Tested
- Median Somatic dN/dS
- Final Candidates (candidate genes identified)
🧬 Germline Conservation
Germline dN/dS < 0.3 indicates strong purifying selection across
vertebrate evolution: the gene is functionally constrained and largely intolerant of
amino-acid changes.
Interpretation: Most ML-predictive genes (≈80%) are under purifying selection
(dN/dS < 0.3), consistent with essential gene function. However, predictive genes show a
slightly higher mean dN/dS (0.234) than background genes (0.203); that is, they are marginally
less conserved on average. This difference is statistically significant but biologically
modest, and it does not undermine the finding that most predictive genes remain under strong purifying constraint.
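The two headline germline figures (mean dN/dS and the share of genes below 0.3) reduce to a simple summary over per-gene values. The sketch below assumes those values have already been retrieved (for example from Ensembl Compara); `germline_summary` and the toy inputs are hypothetical, and no significance test is included because this section does not say which test was used.

```python
from statistics import mean

PURIFYING_THRESHOLD = 0.3  # germline dN/dS cut-off used on this page


def germline_summary(dnds_values: list[float]) -> dict[str, float]:
    """Return the mean germline dN/dS and the % of genes below the purifying threshold."""
    if not dnds_values:
        raise ValueError("need at least one dN/dS value")
    n_purifying = sum(1 for w in dnds_values if w < PURIFYING_THRESHOLD)
    return {
        "mean_dnds": mean(dnds_values),
        "pct_purifying": 100.0 * n_purifying / len(dnds_values),
    }


# Toy values for illustration only; these are not the page's data.
predictive = [0.12, 0.25, 0.41, 0.18, 0.29]
background = [0.10, 0.22, 0.31, 0.15, 0.19]

for label, values in (("predictive", predictive), ("background", background)):
    summary = germline_summary(values)
    print(f"{label}: mean dN/dS = {summary['mean_dnds']:.3f}, "
          f"{summary['pct_purifying']:.0f}% under purifying selection")
```

Note that a gene exactly at 0.3 is not counted, matching the strict "< 0.3" rule in the text.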
| Gene | Group | dN/dS | Selection Type | Mouse %ID |
|---|---|---|---|---|
🔬 Somatic Selection
Somatic dN/dS ≥ 1.5 with FDR < 0.05 indicates positive selection
for non-synonymous mutations within tumours, characteristic of cancer driver genes.
ℹ️ Note: Somatic dN/dS is estimated using a simplified binomial test. This approach
overestimates the number of genes under positive selection compared with covariate-adjusted methods
(dNdScv; Martincorena et al., 2017). The large number of nominally significant genes reflects
background mutation-rate heterogeneity rather than genuine positive selection. Only genes passing all
three filters (ML-predictive + germline conserved + somatic dN/dS ≥ 1.5 with FDR < 0.05)
are reported as candidates.
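The candidate rule in the note above is a conjunction of the three filters. A minimal sketch, assuming one record per gene with the fields shown; `GeneRecord`, `is_candidate`, and the example genes are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    gene: str
    ml_predictive: bool   # flagged as predictive by the ML model
    germline_dnds: float  # e.g. an Ensembl Compara ortholog dN/dS
    somatic_dnds: float   # simplified binomial estimate (may be math.inf when S=0)
    somatic_fdr: float    # FDR-adjusted q-value for the somatic test


def is_candidate(r: GeneRecord) -> bool:
    """True only when the gene passes all three filters."""
    return (
        r.ml_predictive
        and r.germline_dnds < 0.3
        and r.somatic_dnds >= 1.5
        and r.somatic_fdr < 0.05
    )


# Toy records for illustration only.
genes = [
    GeneRecord("GENE_A", True, 0.10, 2.1, 0.01),   # passes all three filters
    GeneRecord("GENE_B", True, 0.45, 2.5, 0.001),  # fails germline conservation
    GeneRecord("GENE_C", False, 0.08, 3.0, 0.02),  # not ML-predictive
    GeneRecord("GENE_D", True, 0.20, 1.8, 0.20),   # fails the FDR cut-off
]
candidates = [g.gene for g in genes if is_candidate(g)]
print(candidates)
```

Because the filters are combined with `and`, a high somatic dN/dS alone (GENE_C) is never enough to make a gene a candidate.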
| Gene | n_nonsyn | n_syn | dN/dS | CI Low | CI High | FDR q |
|---|---|---|---|---|---|---|
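The table reports a confidence interval for each somatic dN/dS estimate, but this page does not say how it is computed. One possibility, sketched here as an assumption rather than the pipeline's method, is an exact Clopper-Pearson interval on the nonsynonymous fraction p = N/(N+S), mapped onto the dN/dS scale through ω = (p/(1−p)) / (p₀/(1−p₀)) with p₀ = 0.74. Bisection on the binomial tails keeps the sketch dependency-free; all function names are hypothetical.

```python
import math

P0 = 0.74  # expected nonsynonymous fraction under neutrality (value from the notes)
NEUTRAL_ODDS = P0 / (1 - P0)  # expected N:S ratio when dN/dS = 1


def binom_pmf(i: int, n: int, p: float) -> float:
    """Binomial pmf in log space, avoiding overflow for large n (requires 0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)


def root_increasing(f, iters: int = 100) -> float:
    """Bisection for the root of an increasing function f on (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def to_dnds(p: float) -> float:
    """Map a nonsynonymous fraction p to dN/dS = (p / (1 - p)) / NEUTRAL_ODDS."""
    return math.inf if p >= 1.0 else (p / (1 - p)) / NEUTRAL_ODDS


def dnds_clopper_pearson(n_nonsyn: int, n_syn: int, alpha: float = 0.05):
    """Exact (Clopper-Pearson) CI for N/(N+S), mapped onto the dN/dS scale."""
    n, k = n_nonsyn + n_syn, n_nonsyn
    if n == 0:
        raise ValueError("no mutations observed")

    def upper_tail(p: float) -> float:  # P(X >= k); increases with p
        return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

    def lower_tail(p: float) -> float:  # P(X <= k); decreases with p
        return sum(binom_pmf(i, n, p) for i in range(k + 1))

    p_lo = 0.0 if k == 0 else root_increasing(lambda p: upper_tail(p) - alpha / 2)
    # Negate the decreasing tail so the bisection sees an increasing function.
    p_hi = 1.0 if k == n else root_increasing(lambda p: alpha / 2 - lower_tail(p))
    return to_dnds(p_lo), to_dnds(p_hi)


ci_low, ci_high = dnds_clopper_pearson(40, 5)
print(f"dN/dS = {(40 / 5) / NEUTRAL_ODDS:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

For a gene with S=0 the upper bound is infinite, which is the same situation the S=0 caveat in the methodological notes describes.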
Germline vs. Somatic dN/dS
🌍 Multi-Species Conservation
Evolutionary Filtering Funnel
Progressive filtering through evolutionary constraints. See the full pipeline story for the complete analysis funnel.
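The funnel counts how many genes survive each successive filter. A minimal sketch, assuming the stages are applied in the order listed in the somatic-selection note (ML-predictive, then germline, then somatic); the dictionary field names, `filtering_funnel`, and the toy genes are hypothetical:

```python
from typing import Callable

Stage = tuple[str, Callable[[dict], bool]]


def filtering_funnel(genes: list[dict], stages: list[Stage]) -> list[tuple[str, int]]:
    """Apply each filter to the survivors of the previous one and record the count left."""
    counts = [("All genes", len(genes))]
    remaining = list(genes)
    for name, keep in stages:
        remaining = [g for g in remaining if keep(g)]
        counts.append((name, len(remaining)))
    return counts


# Field names and stage order are assumptions; thresholds follow the page text.
STAGES: list[Stage] = [
    ("ML-predictive", lambda g: g["ml_predictive"]),
    ("Germline dN/dS < 0.3", lambda g: g["germline_dnds"] < 0.3),
    ("Somatic dN/dS ≥ 1.5, FDR < 0.05",
     lambda g: g["somatic_dnds"] >= 1.5 and g["somatic_fdr"] < 0.05),
]

# Toy genes for illustration only.
genes = [
    {"gene": "GENE_A", "ml_predictive": True, "germline_dnds": 0.10, "somatic_dnds": 2.1, "somatic_fdr": 0.01},
    {"gene": "GENE_B", "ml_predictive": True, "germline_dnds": 0.45, "somatic_dnds": 2.5, "somatic_fdr": 0.001},
    {"gene": "GENE_C", "ml_predictive": False, "germline_dnds": 0.08, "somatic_dnds": 3.0, "somatic_fdr": 0.02},
    {"gene": "GENE_D", "ml_predictive": True, "germline_dnds": 0.20, "somatic_dnds": 1.8, "somatic_fdr": 0.20},
]

for stage, n in filtering_funnel(genes, STAGES):
    print(f"{stage:<35} {n}")
```

The final count equals the number of candidates whatever order the stages run in; only the intermediate counts depend on the order.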
Methodological Notes
- Germline dN/dS values are taken from Ensembl Compara (pre-computed ortholog alignments) rather than calculated de novo. This gives robust estimates, but the values depend on the Ensembl release and gene annotation version used.
- Somatic dN/dS is calculated using a simplified exact binomial test that compares the observed number of nonsynonymous mutations with the number expected under neutrality (expected nonsynonymous proportion ≈ 0.74). This is conceptually inspired by the dNdScv framework (Martincorena et al., Cell 2017) but does not use the full dNdScv negative-binomial model or gene-level covariates. A full dNdScv implementation is planned for a future update.
- Genes with zero synonymous mutations (S=0) produce infinite dN/dS. These are flagged but retained if they are established cancer drivers.
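The somatic procedure in these notes can be sketched in a few functions: a dN/dS point estimate (observed N:S ratio divided by the neutral ratio p₀/(1−p₀) ≈ 2.85 for p₀ = 0.74), an exact binomial p-value for an excess of nonsynonymous mutations, multiple-testing correction, and the S=0 rule. This is a minimal sketch under stated assumptions, not the pipeline's code: the one-sided test direction, Benjamini-Hochberg as the FDR procedure, the `ESTABLISHED_DRIVERS` allow-list, and all function names are assumptions, and it omits the covariates and negative-binomial model that dNdScv uses.

```python
import math

P0 = 0.74  # expected nonsynonymous fraction under neutrality (from the notes)
# Hypothetical allow-list of established drivers; a real pipeline would use a curated catalogue.
ESTABLISHED_DRIVERS = {"TP53", "KRAS", "PIK3CA"}


def binom_pmf(i: int, n: int, p: float) -> float:
    """Binomial pmf in log space (requires 0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)


def somatic_dnds_test(n_nonsyn: int, n_syn: int, p0: float = P0) -> tuple[float, float]:
    """Return (dN/dS estimate, one-sided exact binomial p-value for excess nonsynonymous mutations)."""
    n = n_nonsyn + n_syn
    if n == 0:
        raise ValueError("no mutations observed")
    # Observed N:S ratio relative to the ratio expected under neutrality; infinite when S=0.
    dnds = math.inf if n_syn == 0 else (n_nonsyn / n_syn) / (p0 / (1 - p0))
    # P(X >= n_nonsyn) for X ~ Binomial(n, p0); min() guards against rounding above 1.
    p_value = min(1.0, sum(binom_pmf(i, n, p0) for i in range(n_nonsyn, n + 1)))
    return dnds, p_value


def bh_qvalues(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # largest p-value first, enforcing monotone q-values
        j = order[rank - 1]
        running_min = min(running_min, p_values[j] * m / rank)
        q[j] = running_min
    return q


def keep_gene(gene: str, dnds: float) -> bool:
    """S=0 rule: an infinite dN/dS is flagged and kept only for established drivers."""
    return not math.isinf(dnds) or gene in ESTABLISHED_DRIVERS


# Toy (n_nonsyn, n_syn) counts for illustration only.
counts = {"GENE_A": (40, 5), "GENE_B": (30, 10), "GENE_C": (12, 8),
          "TP53": (25, 0), "GENE_X": (6, 0)}
results = {gene: somatic_dnds_test(n, s) for gene, (n, s) in counts.items()}
qvalues = dict(zip(results, bh_qvalues([p for _, p in results.values()])))

for gene, (dnds, p) in results.items():
    flag = " (S=0, flagged)" if math.isinf(dnds) else ""
    status = "kept" if keep_gene(gene, dnds) else "dropped"
    print(f"{gene}: dN/dS = {dnds:.2f}, p = {p:.3g}, q = {qvalues[gene]:.3g}{flag} -> {status}")
```

Here `keep_gene` only decides whether an S=0 gene stays in the analysis; a kept gene must still pass the germline, dN/dS ≥ 1.5, and FDR < 0.05 filters before it counts as a candidate.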