Executive summary
Malignant cells do not exist in isolation. The conceptualisation of cancer has undergone a profound paradigm shift — from early observations in the Babylonian Code of Hammurabi, the ancient Egyptian Ebers and Smith Papyri, and the Chinese Rites of the Zhou Dynasty, to a modern understanding in which tumorigenesis, progression, and metastasis are recognised as ecosystem-level phenomena. In other words, humanity has had roughly four thousand years to appreciate that cancer is complicated, and we are still very much working on it. These phenomena are orchestrated by dynamic, tripartite interactions within the tumour microenvironment (TME): a highly complex, heterogeneous matrix comprising malignant cells, non-malignant stromal cells, diverse immune cell populations, and the extracellular matrix (ECM).
Stromal and immune features help identify malignant cells through three distinct but complementary lenses. As compositional evidence, tumours alter the fractions of fibroblasts, endothelial cells, and immune infiltrates in ways that can be computationally deconvolved from bulk expression data. As spatial evidence, immune infiltration patterns, stromal barriers, and structured immune aggregates delineate tumour core, leading edge, and invasive margins — often more robustly than tumour-cell markers alone when malignant cells are sparse or spatially patchy. As mechanistic interaction evidence, ligand–receptor and downstream signalling enriched near malignant niches can define malignant "ecostates" even when malignant cell identity is ambiguous, though these inferences remain correlational unless experimentally validated.
Two pragmatic conclusions follow. First, stromal and immune features are rarely diagnostic of malignancy by themselves — inflammation, fibrosis, and immune activation also occur in benign disease — so they act as probabilistic context that improves discrimination when combined with tumour-cell measurements. A swollen lymph node is not a tumour; it might just be a bad week. Second, the strongest malignancy-identification strategies are increasingly multi-modal, combining tumour-cell intrinsic signals (mutations/CNVs, epithelial programmes), stromal and immune composition, and spatial organisation to infer tumour regions, borders, and malignant clones in situ.
1. An overview of malignant cells
Malignant transformation is a multi-step pathological process driven by the accumulation of genetic and epigenetic alterations, leading to the acquisition of distinct functional capabilities codified as the hallmarks of cancer. The transformation from normal physiology to malignancy represents a catastrophic failure of cellular homeostasis, influenced by an individual's genetics, environmental exposures, physical carcinogens (such as ultraviolet and ionising radiation), and unphysiological DNA recombination. Modern conceptualisations frame malignancy as acquisition of these functional capabilities plus enabling characteristics and systemic interactions — a framework iteratively updated from its original formulations (2000, 2011) through major later revisions (2022 and a further 2026 synthesis) that explicitly elevate microenvironmental and systemic contributions.123
Genomic instability, mutational burden, and clonal evolution
At the core of malignant behaviour is profound genomic instability, which serves as the primary engine for tumour heterogeneity and continuous clonal evolution. The inactivation of critical tumour suppressor genes dismantles the cell's DNA damage response, preventing cell cycle arrest and senescence. A central player is the p53 protein — often called the "guardian of the genome," which is a generous title for something that gets mutated in over half of all human cancers. Mutations leading to p53 dysfunction not only promote the unchecked accumulation of subsequent genetic aberrations but also initiate a cascading effect that fundamentally modulates the TME, creating conditions that favour chronic inflammation and tumour progression.
This genomic chaos allows for the emergence of cellular subpopulations with distinct phenotypic traits and heterogeneous driver genes. Aided by a high proliferative rate, the growing malignant lesion serves as a genetic reservoir, providing the raw material for Darwinian selection within the tumour ecosystem. Models of cancer evolution demonstrate that heterogeneity occurs through new mutations that occasionally confer a growth advantage, resulting in a clonal sweep of a more fit clone — evolution, but at its most inconvenient. This dynamic process drives therapeutic resistance and drastically enhances the metastatic potential of the disease. Within one tumour, subclones can differ in driver events, copy-number states, transcriptional programmes, and immune-evasion strategies — consistent with large-scale genomics showing both recurrent "mountain" driver genes and many lower-frequency "hills."
Replicative immortality and telomerase reactivation
Normal somatic cells are strictly limited in their proliferative capacity by the progressive shortening of telomeres during successive divisions, a mechanism that eventually triggers cellular senescence or apoptosis. Malignant cells overcome this barrier and achieve replicative immortality primarily through the reactivation of telomerase, aberrantly upregulating the telomerase reverse transcriptase (TERT) protein and the telomerase RNA component (TERC), its RNA template, to continually add telomeric repeats back onto chromosome ends. This allows them to maintain telomere length above the critical threshold that would otherwise halt division.
Metabolic reprogramming and angiogenesis
To support rapid proliferation and biomass accumulation, malignant cells undergo profound metabolic reprogramming. A classic hallmark is the Warburg effect, wherein cancer cells preferentially utilise aerobic glycolysis over oxidative phosphorylation even in the presence of abundant oxygen.4 This is, in practical terms, a cell choosing the less efficient petrol engine when a perfectly good electric one is available — deliberately, and to great effect. While glycolysis is significantly less efficient in ATP yield per glucose molecule, it proceeds at a faster rate and generates vital intermediate precursors for the de novo synthesis of nucleotides, amino acids, and lipids required to support fast proliferation. Certain cancer cells further adapt by utilising lactate as their primary energy source.
Concurrently, specific genetic mutations drive persistent metabolic and structural alterations. For example, inactivation of the von Hippel–Lindau (VHL) tumour suppressor gene in clear cell renal cell carcinoma (CC-RCC) allows hypoxia-inducible factor alpha (HIF-α) subunits to accumulate under normoxia, leading to constitutive expression of VEGF-A and sustained angiogenesis. Separately from HIF regulation, VHL normally suppresses IGF1R transcription and mRNA stability through a hypoxia-independent mechanism, so VHL loss is also associated with IGF1R upregulation. This points to a functional interplay between IGF1R and VEGF signalling that sustains pathological angiogenesis.
Phenotypic plasticity, invasion, and metastasis
The lethality of cancer is predominantly dictated by the ability of malignant cells to invade adjacent tissues and colonise distant sites. This metastatic cascade is intrinsically linked to phenotypic plasticity, particularly the epithelial-to-mesenchymal transition (EMT). During EMT, epithelial cancer cells lose their apical-basal polarity and downregulate cell–cell adhesion molecules, including specific integrins and cadherins (such as E-cadherin, encoded by CDH1), thereby acquiring a highly motile, mesenchymal phenotype capable of degrading the basement membrane.
Tumours actively secrete soluble factors, exosomes, and cytokines into the systemic circulation to establish a "pre-metastatic niche" — a distant microenvironment structurally and immunologically modified by mobilised stromal and immune cells, preparing the way for the arrival of cancer cells and enabling their survival across a range of tissues. The tumour, in other words, sends advance scouts. In recent iterations of the hallmarks of cancer conceptualisation, phenotypic plasticity, disrupted cellular differentiation, and nonmutational epigenetic reprogramming have been firmly established as discrete capabilities, alongside recognition that polymorphic microbiomes constitute distinctive enabling characteristics.3
2. The stromal compartment
Stromal cells form the connective tissue framework of the tumour and are heavily manipulated by malignant cells through paracrine signalling and direct cell–cell contact to provide structural support, metabolic cross-feeding, and pro-survival signals. Think of them as the tumour's contractors: initially recruited for legitimate construction work, then quietly redirected to build walls that keep the immune system out. The TME also comprises important acellular components: ECM proteins, growth factors, chemokines, and physical features (stiffness, hypoxia) that feed back on both malignant and non-malignant cells.
| Stromal cell type | Origin and key markers | Primary roles in the TME |
|---|---|---|
| Cancer-Associated Fibroblasts (CAFs) | Derived from resident fibroblasts, MSCs, or transdifferentiated cells; FAP, α-SMA, Periostin, COL1A1/COL1A2, DCN, LUM, PDGFRA/PDGFRB | Often the dominant stromal cell type in the TME. Responsible for extensive ECM deposition, collagen cross-linking, and desmoplasia. Create mechanical barriers that limit drug penetration and physically exclude T-cell infiltration. Secrete growth factors and crosstalk extensively with immune cells. |
| Tumour Endothelial Cells (TECs) | Inner lining of tumour blood and lymphatic vessels; PECAM1 (CD31), VWF | Form highly disorganised, structurally defective, and hyper-permeable vascular networks. Facilitate continuous angiogenesis and promote immune evasion by restricting T-cell extravasation into the tumour core. |
| Pericytes | Vascular mural cells; RGS5, CSPG4 (NG2), MCAM, PDGFRB | Regulate vascular stability and barrier function; increasingly appreciated as active participants in metastatic facilitation rather than passive vessel supports. |
| Mesenchymal Stem Cells (MSCs) | Multipotent stromal stem cells recruited from bone marrow or local tissues | Promote cancer cell growth, invasion, and metastasis. Interact directly with tumour and immune cells, differentiating into CAFs or modulating immune function via paracrine signalling. |
| Cancer-Associated Adipocytes (CAAs) | Adipocytes located adjacent to the tumour margin | Reprogrammed to secrete adipokines, pro-inflammatory cytokines, and free fatty acids. Provide dense metabolic fuel for cancer cells; particularly notable in lipid-rich ascites and omental tumour microenvironments of ovarian cancer. |
CAFs physically remodel the ECM through excessive collagen deposition and enzymatic manipulation of the matrix structure, creating a stiff, hypoxic, and high-interstitial-pressure environment that shields malignant cells from therapeutic agents and facilitates metastatic dissemination. Specific subpopulations of CAFs — such as those that are periostin-positive — are actively mobilised by highly metastatic cancer cells to accelerate tumour growth and metastasis.
The ECM is not merely scaffolding; it actively modulates signalling, migration routes, immune cell access, and mechanotransduction. ECM remodelling by tumour and stromal cells helps create a tumour-promoting niche and can physically restrict immune infiltration — one key mechanistic route to immune exclusion.
3. The immune compartment
The immune contexture of the TME — the type, density, functional orientation, and location of immune cells in the tumour and its margins — determines the natural history of the tumour and its response to clinical therapies.512 Malignant cells deploy sophisticated mechanisms to exhaust, exclude, or reprogram infiltrating immune cells, converting the body's primary defence mechanism into a pro-tumorigenic asset. This is the immunological equivalent of a hostile takeover where the security team ends up working for the new owner.
| Immune cell type | Functional characteristics | TME impact and pathological roles |
|---|---|---|
| Tumour-Associated Macrophages (TAMs) | Highly plastic innate immune cells; polarise into anti-tumorigenic (M1) or pro-tumorigenic (M2) phenotypes; CD68, LGALS3, LST1; M2-like: CD163, MRC1 | The most abundant innate immune cell in the TME, often localised near CAFs. Advanced tumours polarise TAMs toward the M2 state, secreting immunosuppressive cytokines and growth factors that promote angiogenesis and induce tissue fibrosis. |
| T Lymphocytes (CD8+ and CD4+) | Cytotoxic T cells (CD8+) and T helper cells (Th1 CD4+) execute robust anti-cancer immunity; CD3D/E, TRAC, CD8A/B, GZMB, PRF1, NKG7 | Frequently exhausted or excluded from the tumour core. The TME upregulates immune checkpoints to induce T-cell anergy. Regulatory T cells (CD4+ CD25+ FOXP3+ Tregs) are recruited to actively suppress effector T-cell activation. |
| Regulatory T Cells (Tregs) | Immunosuppressive CD4+ T cells; FOXP3, IL2RA (CD25), CTLA4 | Immunosuppressive niches co-localise with malignant progression and immune evasion; functional interpretation varies by context and therapy. |
| Myeloid-Derived Suppressor Cells (MDSCs) | Immature myeloid cells that expand during cancer and inflammation | Potently inhibit T-cell and NK-cell function through depletion of essential amino acids and production of reactive species. Critical components of the pre-metastatic niche. |
| B Cells / Plasma Cells / TLS | Adaptive immune cells that can form tertiary lymphoid structures (TLS); MS4A1 (CD20), CD79A; plasma cells: MZB1, XBP1 | Organised TLS can indicate structured antitumour adaptive responses and correlate with better outcomes in several settings, particularly with immunotherapy. Findings are promising but context-dependent. |
| Neutrophils | First responders of the innate immune system | Exhibit plasticity. Ly6G− populations may encourage anti-cancer activities, whereas Ly6G+ neutrophils frequently contribute to immunosuppressive pre-metastatic niches. |
| Natural Killer (NK) Cells | Innate lymphocytes responsible for destroying cells lacking MHC class I molecules | Frequently depleted or functionally impaired within primary tumours and metastases. Specific subsets, such as human decidual NK cells (CD56brightCD16−), can be hijacked to stimulate tumour growth via VEGF and placental growth factors. |
The tripartite interaction between cancer cells, the immune system, and the tumour stroma creates a highly regulated ecosystem. Tumour cells and immune cells shape immunosuppressive microenvironments through branches of arginine metabolism that favour polyamine synthesis or nitric oxide production. Elevated serum arginine levels in advanced breast cancer patients correlate positively with the expression of immunosuppressive cell markers, illustrating how metabolic reprogramming is weaponised for immune escape.
4. Deconvolution methods for identifying malignant cells
Given the profound interdependence between malignant cells and their microenvironment, the mere presence of transformed cells is often insufficient to result in clinically relevant disease — a concept termed "cancer without disease." The tumour, it turns out, needs the neighbourhood on its side before it can truly cause trouble. The progression to invasive carcinoma necessitates a permissive stroma, making the quantitative and qualitative assessment of non-malignant TME components an exceptionally powerful diagnostic, prognostic, and predictive tool.
Bulk transcriptomic deconvolution and prognostic algorithms
Clinical tumour biopsies inevitably contain a complex admixture of malignant, stromal, and immune cells. Computational algorithms have been developed to deconvolute bulk RNA-sequencing data, transforming the non-malignant cellular fraction into quantifiable diagnostic biomarkers. The ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumour tissues using Expression data) algorithm utilises highly specific gene expression signatures to infer the fraction of infiltrating stromal and immune cells within tumour samples.8 By calculating an "Immune Score" and a "Stromal Score," the ESTIMATE method inversely derives "Tumour Purity" across diverse samples, including those from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO).
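The scoring idea can be illustrated with a deliberately simplified sketch. It is not the ESTIMATE implementation (which uses single-sample gene set enrichment over much larger published signatures); it assumes a mean per-gene z-score as the signature score, and the two short gene lists are hypothetical stand-ins assembled from markers named elsewhere in this report. The function name `signature_scores` and the toy data are likewise illustrative.

```python
import numpy as np

# Hypothetical toy signatures (assumption): the published ESTIMATE
# signatures are far larger and were derived empirically.
IMMUNE_GENES = ["PTPRC", "CD3D", "CD8A", "MS4A1", "CD68"]
STROMAL_GENES = ["COL1A1", "COL1A2", "DCN", "LUM", "PDGFRB"]


def signature_scores(expr, genes, signature):
    """Mean per-gene z-score of the signature genes, one value per sample.

    expr: array of shape (n_genes, n_samples) with log-scale expression.
    genes: list of gene names, one per row of expr.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    z = (expr - mean) / std
    rows = [i for i, g in enumerate(genes) if g in signature]
    if not rows:
        raise ValueError("no signature genes found in the expression matrix")
    return z[rows].mean(axis=0)


# Synthetic data: sample 0 is immune-rich, sample 1 is stroma-rich,
# sample 2 is neither (a stand-in for a high-purity sample).
genes = IMMUNE_GENES + STROMAL_GENES + ["EPCAM", "KRT8"]
rng = np.random.default_rng(0)
expr = rng.normal(5.0, 0.1, size=(len(genes), 3))
expr[:5, 0] += 3.0
expr[5:10, 1] += 3.0

immune = signature_scores(expr, genes, IMMUNE_GENES)
stromal = signature_scores(expr, genes, STROMAL_GENES)
# Lower combined score suggests higher purity; this is only a ranking proxy.
purity_rank = np.argsort(immune + stromal)
```

The combined score is used here only to rank samples. Converting it into an absolute purity estimate requires the calibration published with the original method.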
Bulk deconvolution methods vary considerably in their assumptions:
- Fixed-signature methods (e.g., CIBERSORT, xCell, MCP-counter, quanTIseq) rely on precomputed leukocyte or stromal signature matrices or marker gene sets.
- Joint tumour+immune+stroma methods (e.g., EPIC) also estimate a cancer/other fraction.
- Bayesian, single-cell-informed methods (e.g., BayesPrism) integrate sc/snRNA reference atlases to better match tumour contexts.
Community benchmarking indicates that many published methods perform well for broad cell types but can struggle with finer functional states — notably within T cells — and with dataset shift. In other words: ask the algorithm whether there are T cells, and it will probably tell you. Ask it to distinguish exhausted from dysfunctional, and you may receive a confident answer that is quietly wrong.
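The linear-mixture assumption behind signature-based deconvolution can be sketched with non-negative least squares. This is a minimal illustration, not the algorithm of any named tool: the signature matrix values are invented, the three cell types and six genes are chosen for readability, and `deconvolve_nnls` is a hypothetical helper name.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve_nnls(signature, bulk):
    """Estimate cell-type fractions from one bulk profile.

    signature: (n_genes, n_cell_types) reference expression per cell type.
    bulk: (n_genes,) bulk expression, assumed to be a linear mixture.
    Returns non-negative fractions normalised to sum to 1.
    """
    coef, _residual_norm = nnls(np.asarray(signature, dtype=float),
                                np.asarray(bulk, dtype=float))
    total = coef.sum()
    if total == 0:
        raise ValueError("bulk profile is not explained by the signature")
    return coef / total


cell_types = ["T_cell", "fibroblast", "malignant"]
# Invented signature values (assumption), rows are genes.
signature = np.array([
    [10.0, 0.5, 0.2],   # CD3D
    [8.0, 0.3, 0.1],    # TRAC
    [0.2, 12.0, 0.5],   # COL1A1
    [0.1, 9.0, 0.4],    # DCN
    [0.1, 0.3, 11.0],   # EPCAM
    [0.2, 0.2, 7.0],    # KRT8
])
true_fractions = np.array([0.2, 0.3, 0.5])
bulk = signature @ true_fractions  # noise-free synthetic mixture

estimated = deconvolve_nnls(signature, bulk)
```

With noise-free input the fractions are recovered essentially exactly. Real data violate the assumptions through collinear signatures, tumour-expressed "immune" genes, and batch effects, which is where the failure modes discussed above appear.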
Clinical applications of these algorithms have yielded useful diagnostic and prognostic insights. In gastric cancer, analyses comparing high and low immune score groups identified 1,260 differentially expressed genes (DEGs), of which 869 were upregulated and 391 were downregulated in high-immune-infiltration samples. High stromal scores frequently correlate with EMT pathway activation and TGF-β signalling. Conversely, patients with low stromal scores often present with higher tumour mutation burden (TMB) and microsatellite instability (MSI), features associated with greater sensitivity to immune checkpoint inhibitors (ICIs) targeting PD-1 and PD-L1. The identification of hub genes such as BPIFB2 further illustrates how stromal scoring can serve as an independent indicator of overall survival and support molecular subtyping.
The Immunoscore has emerged as a critical prognostic indicator, surpassing traditional AJCC/UICC-TNM classification in specific contexts like colorectal cancer.67 It standardises the quantification of the density and spatial distribution of CD3+ and CD8+ T cells in the tumour core (CT) and the invasive margin (IM).
A simplified pseudocode for a bulk RNA-seq workflow illustrates the key steps:
# Input: bulk_expression_matrix (genes × samples)
# 1) Estimate stromal/immune scores and purity proxy
scores <- estimate_stromal_immune_scores(bulk_expression_matrix)
# scores: immune_score, stromal_score, tumour_purity_proxy
# 2) Immune deconvolution (immune fractions)
immune_fracs <- run_immune_deconvolution(bulk_expression_matrix,
                                         method = "CIBERSORT_or_quanTIseq")
# 3) Joint immune + stroma (+ cancer/other) deconvolution (optional)
cell_fracs <- run_joint_deconvolution(bulk_expression_matrix,
                                      method = "EPIC_or_BayesPrism")
# 4) Flag samples where tumour-cell calls are likely unreliable
flag_low_purity <- scores$tumour_purity_proxy < 0.4 # threshold is context-dependent
Single-cell approaches: malignant vs non-malignant calls via CNV inference
In single-cell RNA-seq from tumours, a common strategy separates malignant from non-malignant cells by inferring large-scale copy-number alterations (CNVs) from expression data, because many tumour cells harbour characteristic CNVs that immune and stromal cells lack. Methods such as CopyKAT and CONICS were explicitly motivated by the need to distinguish malignant from infiltrating stroma in scRNA data.9
Stromal and immune cells serve two essential roles in this workflow: they provide (i) a reference baseline for CNV inference, and (ii) contextual validation — if a candidate malignant cluster expresses immune markers (e.g., PTPRC/CD45) and lacks CNV signals, it is likely infiltrating immune rather than tumour. Benchmarking studies emphasise that performance depends on reference choice, CNV burden, platform, and runtime trade-offs.
# Input: adata (AnnData with scRNA counts)
# 1) QC
adata = filter_cells_and_genes(adata)
adata = remove_doublets(adata)
adata = correct_ambient_rna_if_needed(adata)
# 2) Normalise and cluster
adata = normalise_and_log1p(adata)
adata = run_pca_neighbors_umap(adata)
adata = cluster_cells(adata)
# 3) Cell type annotation (immune/stroma vs epithelial)
adata.obs["cell_type"] = label_by_marker_genes(adata, marker_dict)
# 4) CNV inference: use immune/stromal as reference "normal"
cnv = infer_cnv_from_expression(
    adata,
    reference_cells=adata.obs["cell_type"].isin(
        ["T_cell", "B_cell", "myeloid", "endothelial", "fibroblast"]
    )
)
adata.obs["malignant_call"] = cnv_based_label(cnv, threshold="dataset_specific")
# 5) Characterise microenvironment niches associated with malignant subclones
tme_summary = summarise_tme_states(adata, group_by=["patient", "region_or_cluster"])
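The core of expression-based CNV inference can be sketched in a few lines. This is a toy illustration, not the CopyKAT or CONICS algorithm. It assumes genes are already ordered by genomic position on a single chromosome, a simple moving average, and a burden threshold set from the reference cells. All function names and the simulated data are hypothetical.

```python
import numpy as np


def infer_cnv_profile(log_expr, is_reference, window=11):
    """Smoothed expression relative to the reference-cell mean.

    log_expr: (n_cells, n_genes), genes ordered by genomic position.
    is_reference: boolean mask marking presumed-normal cells.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    baseline = log_expr[is_reference].mean(axis=0)
    relative = log_expr - baseline
    kernel = np.ones(window) / window
    # Moving average along the genome damps gene-level noise and keeps
    # broad shifts that span many neighbouring genes.
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, relative
    )


def cnv_burden(profile):
    """Mean squared deviation from the reference, one value per cell."""
    return np.mean(profile ** 2, axis=1)


rng = np.random.default_rng(1)
n_genes = 200
normal = rng.normal(0.0, 0.3, size=(30, n_genes))   # immune/stromal stand-ins
tumour = rng.normal(0.0, 0.3, size=(20, n_genes))
tumour[:, 50:120] += 0.6                            # simulated segmental gain

log_expr = np.vstack([normal, tumour])
is_reference = np.arange(50) < 30

profile = infer_cnv_profile(log_expr, is_reference)
burden = cnv_burden(profile)
# Dataset-specific threshold (assumption): mean + 3 SD of reference burden.
threshold = burden[is_reference].mean() + 3 * burden[is_reference].std()
malignant_call = burden > threshold
```

The sketch also shows why the reference matters. If the "normal" mask accidentally includes malignant cells, the baseline absorbs part of the gain and the signal shrinks, which is one way reference choice degrades performance.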
Comparison of major computational approaches
| Approach | Typical inputs | Typical outputs | Key assumptions | Typical failure modes |
|---|---|---|---|---|
| Bulk scores | Bulk RNA expression | Stromal score, immune score, purity proxy | Signature validity across tissues; linear additivity | Misleading in atypical tumours; batch effects |
| Bulk deconvolution (signature-based) | Bulk RNA + signature matrix | Estimated cell-type fractions | Marker gene stability; limited cross-expression | Collinearity; tumour-expression contamination; limited functional state resolution |
| Bulk deconvolution (Bayesian, sc-informed) | Bulk RNA + sc/snRNA reference atlas | Posterior over cell fractions | Reference matches biology; model captures dataset shift | Reference mismatch; overconfidence in posteriors; computational cost |
| scRNA malignant labelling by CNV inference | scRNA counts + normal reference cells | Per-cell CNV profiles; malignant vs normal labels | CNVs large enough to infer from expression | Low CNV burden tumours; technical noise/ambient RNA |
| Spatial transcriptomics tumour-margin inference | Spatial gene expression + histology ± sc reference | Tumour core/margin maps; immune/stroma gradients | Spatial spots represent mixtures; signatures spatially stable | Limited resolution; spot mixtures; segmentation errors |
| Multiplex imaging (IMC / MIBI / CODEX) | Multiplex protein panels + images | Cell phenotypes + spatial relationships | Antibody specificity; segmentation correctness | Batch/antibody variability; segmentation artefacts |
| AI on histology/radiology | H&E slides or CT/MRI + labels | Predicted immune infiltration; risk/prognosis proxies | Training labels valid; model generalises | Dataset shift; confounding by site/scanner; limited interpretability |
5. Spatial transcriptomics and niche-specific biomarkers
The primary limitation of bulk RNA-seq, shared by dissociation-based single-cell methods, lies in the loss of spatial architecture when tissue is homogenised or dissociated. When you blend the tissue, you lose the map. The advent of spatial transcriptomics, using platforms such as Visium and GeoMx alongside related approaches such as physically interacting cell sequencing (PIC-seq), has made it possible to map molecular profiles directly onto intact histological structures. This shows how stromal and immune cells are localised relative to malignant cells and identifies distinct "niches" where specific cellular interactions drive pathogenesis.
Spatial neighbourhood structure often reveals general principles — immune exclusion zones, stromal barriers, and tumour–immune interfaces — that are invisible after dissociation. Even when tumour markers are diffuse, tumour margins often show coordinated changes: rising CAF/ECM programmes (reactive stroma), shifts in immune density and functional state (e.g., excluded-to-inflamed transitions), and altered neighbourhood composition. These gradients can localise malignant regions and annotate "leading edge" niches that are biologically and clinically relevant.
For example, in advanced spatial transcriptomic analyses of cancer-associated dermatomyositis lesions, researchers identified unique immune and stromal niches: cancer-associated skin lesions exhibited dispersed immune infiltrates enriched with macrophages, CD8+ T cells, and B cells with preserved vascular architecture, whereas non-cancer-associated lesions displayed dense myeloid infiltrates expressing IL1B and CXCL10, driving massive stromal remodelling and loss of vascular-associated fibroblasts.
Similarly, spatial omics has localised argininosuccinate lyase (ASL)+ malignant cells to the tumour–stroma interface, forming dense ligand–receptor dyads with endothelial and fibroblast cells via the MIF–CD74/CXCR4 signalling axis, providing dual metabolic and immune targets that link metabolic reprogramming directly to immune escape.
# Input: spatial_data (spots × genes), histology_image, optional sc_reference
spatial = normalise_spatial(spatial_data)
spatial.obs["tumour_score"] = score_signature(spatial, tumour_signature)
spatial.obs["immune_score"] = score_signature(spatial, immune_signature)
spatial.obs["stroma_score"] = score_signature(spatial, stroma_signature)
if sc_reference is not None:
    spatial = deconvolve_spots_with_sc_ref(spatial, sc_reference)
# Tumour margin detection as gradient/boundary problem
spatial.obs["margin_prob"] = detect_boundary(
    spatial,
    features=["tumour_score", "immune_score", "stroma_score"],
    spatial_graph=build_spatial_graph(spatial)
)
# Neighbourhood characterisation
neighbourhoods = identify_spatial_neighbourhoods(
    spatial, features=["immune_score", "stroma_score", "tumour_score"]
)
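Framing margin detection as a gradient problem can be made concrete with a toy example. The sketch assumes spots lie on a regular square grid (real spot-based platforms often use hexagonal layouts and need a neighbour graph instead) and uses the gradient magnitude of a single tumour score as the boundary signal. The function name `margin_score` and the 0.25 cutoff are illustrative choices.

```python
import numpy as np


def margin_score(tumour_score):
    """Gradient magnitude of a tumour score laid out on a regular grid."""
    d_rows, d_cols = np.gradient(np.asarray(tumour_score, dtype=float))
    return np.hypot(d_rows, d_cols)


# Toy tissue: left half stroma (score 0), right half tumour (score 1).
grid = np.zeros((10, 10))
grid[:, 5:] = 1.0

score = margin_score(grid)
# Columns in which any spot exceeds the (arbitrary) boundary cutoff.
margin_cols = np.unique(np.nonzero(score > 0.25)[1])
```

Only the two columns flanking the interface light up. In practice the same idea would be applied to several scores at once (tumour, immune, stroma) and smoothed, since single-spot noise produces spurious gradients.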
Follicular lymphoma grading: a spatial biomarker case study
One of the most precise clinical applications of spatial biomarkers is found in the grading of Follicular Lymphoma (FL). Reanalysis of spatial and single-cell transcriptomic data from normal germinal centres and FL samples has identified objective biomarkers capable of distinguishing aggressive Grade 3 malignant cells from indolent Grade 1–2 cells. ROC curve analyses established optimal cutoffs that proved highly reproducible across independent visual observers and digital algorithms.
| Marker | Target cell population / niche | Optimal cutoff for Grade 3 FL | Clinical significance |
|---|---|---|---|
| AICDA (AID) | Dark zone cells (centroblasts) / Grade 3A | 1.54% | Best discriminatory ability; high expression correlates with shorter disease-specific survival |
| CXCR4 | Dark zone cells (centroblasts) / Grade 3A | 21.9% | Highly expressed in aggressive malignant phenotypes |
| CD40 | Light zone cells (centrocytes) / Grade 1–2 | N/A | Differentiates low-grade, less aggressive malignant cells |
| TFRC (CD71) | Light zone cells (centrocytes) / Grade 1–2 | 7.57% | Useful for distinguishing grade variations in combination with other markers |
| Ki67 | General proliferation marker | 21.6% | Standard marker of cellular proliferation, significantly higher in Grade 3 FL |
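One common way to pick an "optimal" cutoff from a ROC analysis is to maximise Youden's J (sensitivity + specificity − 1). Whether the study behind the table used this exact criterion is an assumption here, and the marker percentages below are synthetic values invented for illustration, not data from that study.

```python
import numpy as np


def youden_cutoff(values, labels):
    """Return (cutoff, J) maximising sensitivity + specificity - 1.

    values: marker-positive percentages, one per case.
    labels: True for Grade 3, False for Grade 1-2.
    A case is called Grade 3 when its value is >= cutoff.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_cutoff, best_j = None, -np.inf
    for cutoff in np.unique(values):
        predicted = values >= cutoff
        sensitivity = np.mean(predicted[labels])
        specificity = np.mean(~predicted[~labels])
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cutoff, best_j = float(cutoff), float(j)
    return best_cutoff, best_j


# Synthetic percentages (assumption), loosely on the scale of a low-abundance marker.
low_grade = [0.2, 0.5, 0.8, 1.0, 1.2, 1.9]
grade3 = [1.6, 2.4, 3.1, 4.0, 5.5]
values = low_grade + grade3
labels = [False] * len(low_grade) + [True] * len(grade3)

cutoff, j = youden_cutoff(values, labels)
```

On these toy values the selected cutoff is 1.6, which accepts one false positive in exchange for full sensitivity. With small cohorts the chosen cutoff can shift noticeably when a single case is added, which is why reproducibility across observers and algorithms matters.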
Imaging biomarkers and computational pathology
Classical pathology already uses microenvironment cues: tumour-infiltrating lymphocytes (TILs) and stromal abundance are evaluated on H&E/IHC in some tumour settings, with published guidelines and clinical validation studies (notably in colon cancer and breast cancer). Digital pathology and multiplex imaging now quantify these spatial patterns more systematically, and machine learning models can predict immune-associated outcomes from routine slides, though generalisation and standardisation remain challenges.
6. Theoretical framework: entropy maximisation in biological networks
The identification of malignant cells based on the transcriptomic state of their surrounding stroma and immune cells relies not merely on phenomenological observation, but on rigorous statistical inference and systems biology. To formalise how cell types are defined, researchers utilise the principle of maximum entropy (MaxEnt), a theoretical framework rooted in statistical mechanics and information theory developed by pioneers such as Claude Shannon and E.T. Jaynes.1011 The core idea is elegant: given what you know, assume as little as possible about what you do not. Biologists, who have spent decades assuming a great deal about what they do not know, have found this surprisingly useful.
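The maximum entropy procedure seeks the probability distribution that maximises information entropy subject to the constraints of known testable information, such as the measured mean expression of specific genes and their pairwise correlations in a tissue sample. This constrained optimisation problem is solved using the method of Lagrange multipliers. For a probability distribution \( p(x) \) subject to normalisation and to constraints \( \sum_x p(x) f_k(x) = F_k \) on the expectations of observables \( f_k \), the Lagrangian takes the standard form:
\[ \mathcal{L}[p] = -\sum_x p(x) \ln p(x) - \lambda_0 \left( \sum_x p(x) - 1 \right) - \sum_k \lambda_k \left( \sum_x p(x) f_k(x) - F_k \right) \]
Setting \( \partial \mathcal{L} / \partial p(x) = 0 \) gives the exponential-family solution \( p(x) = Z^{-1} \exp\left( -\sum_k \lambda_k f_k(x) \right) \), where \( Z \) enforces normalisation and the multipliers \( \lambda_k \) are fixed by the constraints.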
Shore and Johnson demonstrated that the canonical Boltzmann–Gibbs–Shannon (BGS) form of entropy is the unique functional whose maximisation respects the addition and multiplication rules of probability, whereas non-extensive forms such as Tsallis entropy can introduce unwarranted biases.
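A classic small instance of the procedure is a six-sided die constrained only by its mean. The maximum entropy solution has the exponential form \( p(x) \propto e^{-\lambda x} \), and the single multiplier \( \lambda \) is found numerically. The sketch below assumes a one-dimensional root search with `scipy.optimize.brentq`; the function name and the target mean of 4.5 are illustrative.

```python
import numpy as np
from scipy.optimize import brentq


def maxent_distribution(values, target_mean):
    """Maximum entropy distribution over `values` with a fixed mean.

    Solves for lambda in p(x) proportional to exp(-lambda * x).
    """
    values = np.asarray(values, dtype=float)

    def mean_gap(lam):
        weights = np.exp(-lam * values)
        p = weights / weights.sum()
        return p @ values - target_mean

    # The mean is monotone in lambda, so a bracketing root finder suffices.
    lam = brentq(mean_gap, -50.0, 50.0)
    weights = np.exp(-lam * values)
    return weights / weights.sum(), lam


faces = np.arange(1, 7)
p, lam = maxent_distribution(faces, target_mean=4.5)
```

Because the target mean exceeds the fair-die mean of 3.5, the multiplier comes out negative and the probabilities increase with face value. Nothing else about the die is assumed, which is the point of the principle.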
The Ising model and biological network states
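In the context of biological networks, this mathematical foundation maps directly onto the Ising model, originally developed for ferromagnetism and now applied to retinal neuron firing patterns and genetic regulatory networks. The Ising model defines the probability \( P(\sigma) \) of a specific network state \( \sigma = (\sigma_1, \dots, \sigma_N) \) as:
\[ P(\sigma) = \frac{1}{Z} \exp\left( \sum_i h_i \sigma_i + \sum_{i<j} J_{ij} \sigma_i \sigma_j \right) \]
Here \( \sigma_i \) is the binarised state of unit \( i \) (for example, a gene being on or off), the fields \( h_i \) and couplings \( J_{ij} \) are the Lagrange multipliers fitted to the measured means and pairwise correlations (up to sign convention), and \( Z \) is the normalising partition function. Equivalently, \( P(\sigma) \propto e^{-E(\sigma)} \) with energy \( E(\sigma) = -\sum_i h_i \sigma_i - \sum_{i<j} J_{ij} \sigma_i \sigma_j \).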
When applied to single-cell RNA-seq or MERFISH spatial transcriptomics data capturing hundreds of genes across millions of cells, this maximum entropy construction generates a probabilistic energy landscape. The distribution of cell states exhibits multiple local maxima (thermodynamic energy minima). Grouping cells according to these thermodynamic attractors yields mathematically rigorous definitions of cell classes — distinguishing malignant cell states from complex stromal and immune cell states with a precision comparable to neural network classifiers, without relying on arbitrary clustering algorithms.
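The idea of grouping states by attractors can be shown on a network small enough to enumerate. The sketch assumes the standard pairwise energy \( E(\sigma) = -\sum_i h_i \sigma_i - \sum_{i<j} J_{ij} \sigma_i \sigma_j \), defines a local minimum as a state whose energy rises under every single-spin flip, and uses an invented four-gene coupling matrix with two mutually exclusive programmes. Real applications fit \( h \) and \( J \) to data and cannot enumerate all states.

```python
import itertools
import numpy as np


def ising_energy(state, h, J):
    """Pairwise Ising energy; J is symmetric with a zero diagonal."""
    s = np.asarray(state, dtype=float)
    return -(h @ s) - 0.5 * (s @ J @ s)  # 0.5 counts each pair once


def enumerate_landscape(h, J):
    """All 2^n states with their energies and Boltzmann probabilities."""
    states = np.array(list(itertools.product([-1, 1], repeat=len(h))))
    energies = np.array([ising_energy(s, h, J) for s in states])
    weights = np.exp(-energies)
    return states, energies, weights / weights.sum()


def local_minima(states, energies):
    """States whose energy strictly increases under any single flip."""
    index = {tuple(s): i for i, s in enumerate(states)}
    minima = []
    for i, s in enumerate(states):
        neighbours = []
        for k in range(len(s)):
            flipped = s.copy()
            flipped[k] *= -1
            neighbours.append(energies[index[tuple(flipped)]])
        if all(e > energies[i] for e in neighbours):
            minima.append(tuple(int(v) for v in s))
    return minima


# Invented couplings (assumption): genes 0-1 co-activate, genes 2-3
# co-activate, and the two pairs repress each other.
h = np.zeros(4)
J = np.array([
    [0.0, 1.0, -1.0, -1.0],
    [1.0, 0.0, -1.0, -1.0],
    [-1.0, -1.0, 0.0, 1.0],
    [-1.0, -1.0, 1.0, 0.0],
])

states, energies, probs = enumerate_landscape(h, J)
attractors = local_minima(states, energies)
```

The landscape has exactly two attractors, one per programme, and every other state sits on a slope leading to one of them. Assigning each observed cell to the attractor it descends to is the toy analogue of the cell-class definition described above.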
Maximum Caliber and extensions
Further extensions include the principle of Maximum Caliber (MaxCal), a dynamical analogue of MaxEnt that uses stochastic protein expression trajectories to infer the underlying rate parameters of protein synthesis and degradation, and that has been used to recover the details of auto-activating genetic feedback circuits. Machine learning approaches to quantum state tomography have also adopted the von Neumann entropy (VNE), with a game-theoretic justification for VNE maximisation over density matrices as the least-committed inference in spectral domains, an approach that may carry over to the processing of massive biological data sets.
7. Limitations
Biological heterogeneity and phenotypic ambiguity
The most significant biological limitation is the profound heterogeneity and plasticity of the stromal and immune compartments. CAFs are a highly heterogeneous population with varied developmental origins. There is a critical lack of specific, universally expressed target markers for CAFs that do not also overlap with normal physiological fibroblasts or other mesenchymal lineages. Markers like FAP or α-SMA are also upregulated during normal wound healing, tissue fibrosis, and benign inflammatory conditions — meaning stromal and immune signals should always be interpreted as probabilistic context, not as standalone proof of malignancy.
The genetic heterogeneity of the tumour dictates an ever-shifting immune landscape. As malignant cells undergo clonal sweeps, the resulting interactions with the immune system are highly dynamic. Diversity metrics borrowed from ecology and information theory, such as the Shannon index, are used to quantify this heterogeneity. For example, increased Shannon indices in oesophageal biopsy specimens correlate with a higher propensity for malignant progression of Barrett's oesophagus. However, this shifting diversity means that a static biopsy measuring the stromal or immune state may quickly become obsolete as the tumour evolves.
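For reference, the Shannon index of a sample with clone frequencies \( p_i \) is \( H = -\sum_i p_i \ln p_i \). A minimal sketch follows; the clone counts are invented, and the natural logarithm is an assumed convention (some studies use base 2).

```python
import numpy as np


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over non-empty clones."""
    counts = np.asarray(counts, dtype=float)
    if counts.sum() <= 0:
        raise ValueError("counts must contain at least one positive value")
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


# Invented clone counts from two hypothetical biopsies.
monoclonal = shannon_index([100, 0, 0, 0])    # a single dominant clone
even = shannon_index([25, 25, 25, 25])        # four equally sized clones
```

A single clone gives zero diversity, while four equal clones give the maximum for four categories, \( \ln 4 \). The index summarises one biopsy at one time point and so inherits the sampling and temporal caveats discussed in this section.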
EMT further obfuscates analysis: while bulk RNA-sequencing heavily correlates EMT signatures with tumour aggressiveness, it is often impossible to determine whether these signatures originate from malignant cells themselves or from reactive stromal fibroblasts.
Technical constraints of spatial transcriptomics
While spatial transcriptomics provides critical contextual data, widely used spot-based platforms suffer from inherent resolution constraints. A single transcriptomic "spot" may capture the boundary where a malignant cell, a CAF, and a macrophage physically interact — blending their transcriptomes, blurring true transcriptomic boundaries, and artificially inflating the apparent stromal–immune co-expression. Sophisticated computational deconvolution algorithms such as RCTD and cell2location attempt to correct for this mixing, but remain highly sensitive to the quality of the single-cell reference datasets and are prone to cross-platform batch effects.
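To make the mixing problem concrete, the sketch below treats one spot as a non-negative mixture of reference profiles and recovers the mixing fractions with a plain projected-gradient non-negative least squares. This is a deliberately simplified stand-in, not the probabilistic models used by RCTD or cell2location. The three reference profiles, the four marker genes, and the function name are hypothetical.

```python
def nnls_projected_gradient(profiles, spot, n_iter=500):
    """Minimise 0.5 * ||sum_k w[k]*profiles[k] - spot||^2 subject to w >= 0."""
    k, g = len(profiles), len(spot)
    # The squared Frobenius norm bounds the largest eigenvalue of R^T R,
    # so its inverse is a conservative but safe gradient step size.
    step = 1.0 / sum(v * v for p in profiles for v in p)
    w = [0.0] * k
    for _ in range(n_iter):
        fitted = [sum(w[c] * profiles[c][j] for c in range(k)) for j in range(g)]
        resid = [fitted[j] - spot[j] for j in range(g)]
        grad = [sum(profiles[c][j] * resid[j] for j in range(g)) for c in range(k)]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, w[c] - step * grad[c]) for c in range(k)]
    return w


# Hypothetical reference expression profiles over four marker genes.
reference = {
    "malignant": [10.0, 1.0, 1.0, 2.0],
    "CAF": [1.0, 10.0, 1.0, 2.0],
    "macrophage": [1.0, 1.0, 10.0, 2.0],
}
names = list(reference)
profiles = [reference[n] for n in names]

# Simulate a noise-free spot that straddles all three cell types.
true_fractions = [0.5, 0.3, 0.2]
spot = [sum(f * p[j] for f, p in zip(true_fractions, profiles)) for j in range(4)]

weights = nnls_projected_gradient(profiles, spot)
fractions = [w / sum(weights) for w in weights]
for n, f in zip(names, fractions):
    print(f"{n}: {f:.3f}")
```

With clean, well-separated reference profiles the fractions are recovered closely. The sensitivity to reference quality described above shows up as soon as the reference profiles differ from the profiles that actually generated the spot.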
The inherent trade-offs in spatially resolved transcriptomics between spatial resolution, transcriptome coverage depth, and transcript sensitivity remain central bottlenecks for precise, standalone clinical diagnostics. The single-cell references used for deconvolution carry their own artefacts: tissue dissociation can induce stress-response transcription and bias cell-type recovery, and ambient RNA contamination can further blur cell identity. These issues should be treated as essential variables to address, not optional polish.
Sampling bias, spatial heterogeneity, and temporal dynamics
Tumours are spatially heterogeneous; a core biopsy can miss malignant regions or oversample stroma. Any single "average" immune or stromal number from bulk sequencing can hide clinically relevant localisation. Time also matters: therapy reshapes immune and stromal ecologies, and the "same" tumour can switch immune phenotypes under selective pressure. The diagnostic snapshot provided by a single biopsy may therefore quickly become inaccurate for predicting long-term behaviour.
Statistical identifiability, overfitting, and misspecification
Bulk deconvolution is an underdetermined inverse problem when cell types are correlated or when tumour cells express "immune-like" genes (or vice versa). In imaging and ML-based inference, overfitting and confounding by site, scanner, and staining are major risks; external validation and model reporting standards are crucial before clinical reliance. Cell–cell communication inference (ligand–receptor) further illustrates misspecification and multiplicity issues: many tools exist with divergent assumptions, and colocalisation/diffusion constraints are often simplified.
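The identifiability problem can be illustrated with a two-cell-type toy case: when the two reference profiles are nearly collinear, a small measurement error in a single gene produces a large error in the estimated fractions. The sketch below solves the 2x2 normal equations of an unconstrained least-squares fit. The profiles, the perturbation size, and the function names are hypothetical.

```python
def solve_two_type(a, b, x):
    """Least-squares weights (wa, wb) for x ~ wa*a + wb*b via the 2x2 normal equations."""
    aa = sum(v * v for v in a)
    bb = sum(v * v for v in b)
    ab = sum(u * v for u, v in zip(a, b))
    ax = sum(u * v for u, v in zip(a, x))
    bx = sum(u * v for u, v in zip(b, x))
    det = aa * bb - ab * ab  # approaches zero as the profiles become collinear
    return (bb * ax - ab * bx) / det, (aa * bx - ab * ax) / det


def weight_error(a, b, true_w=(0.6, 0.4), bump=0.1):
    """Largest error in the recovered weights after perturbing one gene by `bump`."""
    x = [true_w[0] * u + true_w[1] * v for u, v in zip(a, b)]
    x[0] += bump  # small measurement error on the first gene only
    wa, wb = solve_two_type(a, b, x)
    return max(abs(wa - true_w[0]), abs(wb - true_w[1]))


# Hypothetical three-gene profiles.
tumour = [10.0, 5.0, 1.0]
immune_like_tumour = [9.5, 5.2, 1.1]  # strongly correlated with `tumour`
distinct_immune = [1.0, 5.0, 10.0]    # clearly different from `tumour`

err_correlated = weight_error(tumour, immune_like_tumour)
err_distinct = weight_error(tumour, distinct_immune)
print(f"correlated profiles: max weight error = {err_correlated:.4f}")
print(f"distinct profiles:   max weight error = {err_distinct:.4f}")
```

The same perturbation shifts the estimated fractions far more when the profiles overlap, which is one reason sensitivity analyses over reference choices matter.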
Thermodynamic and information-theoretic constraints
At the theoretical edge of systems biology, efforts to model the TME using information theory encounter fundamental physical constraints. Maximum entropy models such as the Ising model implicitly treat the biological system as an equilibrium steady state that obeys detailed balance. However, biological systems at all scales are strongly constrained by history and the arrow of time, making them fundamentally non-ergodic: over long timescales, evolutionary systems explore new, small areas of phase space rather than sampling the entire space evenly. The ergodic assumption is, to put it politely, aspirational.
Living cells constantly consume free energy and dissipate heat to maintain ordered states, operating far from thermodynamic equilibrium. This does not violate the second law of thermodynamics, because the entropy they shed is exported to their surroundings, but it does mean that they are not relaxing towards a maximum entropy state; rather, they are often described as limiting entropy production while performing vital biological work. When observing the TME via transcriptomics, the vast majority of internal degrees of freedom (rapid protein phosphorylation states, transient metabolite fluxes, and localised epigenetic shifts) remain hidden from the observer.
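One way to make the "balance" assumption concrete is the detailed-balance condition π_i P_ij = π_j P_ji for a Markov chain with transition matrix P and stationary distribution π. The sketch below compares two hypothetical three-state chains that share the same uniform stationary distribution: one satisfies detailed balance, while the other carries a net cyclic probability current, as a driven, non-equilibrium system would. The matrices and function names are illustrative assumptions.

```python
def stationary(P, n_iter=2000):
    """Approximate the stationary distribution by repeated application of P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def max_detailed_balance_violation(P):
    """Largest |pi_i * P_ij - pi_j * P_ji| over all pairs of states."""
    pi = stationary(P)
    n = len(P)
    return max(
        abs(pi[i] * P[i][j] - pi[j] * P[j][i]) for i in range(n) for j in range(n)
    )


# Symmetric transitions: forward and backward fluxes cancel pairwise.
P_reversible = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

# Biased cycle 0 -> 1 -> 2 -> 0: a steady state with a persistent current.
P_cyclic = [
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]

print("reversible chain:", max_detailed_balance_violation(P_reversible))
print("cyclic chain:    ", max_detailed_balance_violation(P_cyclic))
```

Both chains have the same static snapshot, a uniform distribution over states, yet only the first is an equilibrium. A model fitted to snapshots alone cannot tell the two apart.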
Mathematical optimisation frameworks reveal that this partial observability places strict bounds on the accuracy of predicting the underlying thermodynamic energy consumption and the true state of the cell. Recent research in quantum thermodynamics further reports that quantum engines made of correlated particles can exceed the traditional efficiency limit set by Carnot, suggesting that classical thermodynamic bounds may need revision when quantum correlations are involved. Nature, as usual, did not read the manual. Consequently, any diagnostic algorithm relying solely on classical transcriptomic snapshots of stromal and immune cell states operates under a persistent veil of mathematical and physical uncertainty, bounded by the physical limits of observability in non-equilibrium thermodynamic systems.
8. Practical recommendations and conclusion
Across modalities, robust use of stromal and immune indicators for malignancy identification follows three core principles:
1. Triangulate across evidence types
Combine tumour-intrinsic evidence (mutation/CNV/lineage markers) with microenvironment composition and spatial context rather than substituting one for another. No single signal layer is sufficient in general.
2. Quantify uncertainty and validate orthogonally
Use benchmarks, sensitivity analyses (reference choices, signature sets, thresholds), and orthogonal validation (IHC/IF, pathology review, targeted assays) to avoid overconfident calls from a single pipeline.
3. Treat microenvironment cues as context-dependent
Interpret stromal and immune patterns relative to tumour type, stage, site, and treatment history. Reporting should always include these metadata and modality constraints (e.g., spatial spot size, dissociation protocol).
Conclusion
The characterisation of malignant cells has irreversibly moved beyond the scope of isolated genomic aberrations into the realm of complex ecological and thermodynamic systems. The transformation, survival, and metastatic dissemination of cancer are shaped by an ongoing dialogue with stromal architects, such as CAFs and TECs, and with the immune system, from exhausted T cells to reprogrammed macrophages. Recognising the TME as an active participant in disease pathology has established non-malignant cells as critical diagnostic indicators, driving the adoption of computational deconvolution algorithms such as ESTIMATE and of high-resolution spatial transcriptomics.
The identification of highly specific spatial niches and mathematically defined cell-state attractors via the principle of maximum entropy offers an unprecedented ability to stratify disease severity, as evidenced by precise prognostic cutoffs in indications like follicular lymphoma. Nevertheless, utilising stroma and immune infiltrates as proxies for malignancy faces profound limitations. The phenotypic plasticity of non-malignant cells, the technical realities of multi-cell mixing in spatial assays, and the fundamental theoretical limits of modelling non-ergodic, non-equilibrium biological systems with hidden degrees of freedom all highlight the boundaries of current methodologies.
Ultimately, the future of precision oncology relies not merely on cataloguing the cells within the tumour space, but on dynamically modelling the energetic, transcriptomic, and spatial architecture of the entire tumour ecosystem as a unified, evolving entity — and on combining all available evidence layers, always with explicit acknowledgment of their assumptions and failure modes.
References
- 1. Hanahan, D. & Weinberg, R. A. (2000). The hallmarks of cancer. Cell, 100(1), 57–70. doi:10.1016/s0092-8674(00)81683-9
- 2. Hanahan, D. & Weinberg, R. A. (2011). Hallmarks of cancer: the next generation. Cell, 144(5), 646–674. doi:10.1016/j.cell.2011.02.013
- 3. Hanahan, D. (2022). Hallmarks of cancer: new dimensions. Cancer Discovery, 12(1), 31–46. doi:10.1158/2159-8290.CD-21-1059
- 4. Vander Heiden, M. G., Cantley, L. C. & Thompson, C. B. (2009). Understanding the Warburg effect: the metabolic requirements of cell proliferation. Science, 324(5930), 1029–1033. doi:10.1126/science.1160809
- 5. Fridman, W. H., Pagès, F., Sautès-Fridman, C. & Galon, J. (2012). The immune contexture in human tumours: impact on clinical outcome. Nature Reviews Cancer, 12(4), 298–306. doi:10.1038/nrc3245
- 6. Galon, J., Costes, A., Sanchez-Cabo, F., Kirilovsky, A., Mlecnik, B., Lagorce-Pagès, C., … Pagès, F. (2006). Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science, 313(5795), 1960–1964. doi:10.1126/science.1129139
- 7. Pagès, F., Mlecnik, B., Marliot, F., Bindea, G., Ou, F.-S., Bifulco, C., … Galon, J. (2018). International validation of the consensus Immunoscore for the classification of colon cancer: a prognostic and accuracy study. The Lancet, 391(10135), 2128–2139. doi:10.1016/S0140-6736(18)30789-X
- 8. Yoshihara, K., Shahmoradgoli, M., Martínez, E., Vegesna, R., Kim, H., Torres-Garcia, W., … Verhaak, R. G. W. (2013). Inferring tumour purity and stromal and immune cell admixture from expression data. Nature Communications, 4, 2612. doi:10.1038/ncomms3612
- 9. Gao, R., Bai, S., Henderson, Y. C., Lin, Y., Schalck, A., Yan, Y., … Navin, N. E. (2021). Delineating copy number and clonal substructure in human tumors from single-cell transcriptomes. Nature Biotechnology, 39, 599–608. doi:10.1038/s41587-020-00795-2
- 10. Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x
- 11. Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106(4), 620–630. doi:10.1103/PhysRev.106.620
- 12. Fridman, W. H., Zitvogel, L., Sautès-Fridman, C. & Kroemer, G. (2017). The immune contexture in cancer prognosis and treatment. Nature Reviews Clinical Oncology, 14(12), 717–734. doi:10.1038/nrclinonc.2017.101