Simple guide connecting your motivation-letter terms to what you already know from your thesis: single-cell data, GRN inference, KAN interpretability, perturbation validation, QC, normalization, HVG selection, and reproducible ML workflows.
My strongest background is computational: single-cell data, machine learning, statistics, and reproducible workflows. I am still learning the detailed immunology, but I can connect biological questions to analysis workflows.
Say: “I am still building deeper expertise in that specific biological area, so I do not want to overstate my knowledge. My current understanding is that ... Computationally, I would approach it by ...”
In my thesis, I learned how to turn noisy single-cell expression data into interpretable regulatory hypotheses. In this PhD, the same idea can be applied to disease datasets: identify cell states, regulatory programs, spatial niches, and immune-stromal interactions.
You are trainable, honest, computationally strong, careful with interpretation, and motivated to learn the immunology.
Fibroblasts are structural/stromal cells. In disease, they can become activated and start inflammatory, tissue-remodeling, or profibrotic programs.
Relate to thesis: In your thesis, you studied gene-expression states and regulatory programs. Here, instead of TF-target networks in general, you would identify regulatory programs that explain why fibroblasts become inflammatory or fibrotic.
Safe answer: Fibroblast activation means fibroblasts shift from structural cells into inflammatory, tissue-remodeling, or profibrotic states. Computationally, I would identify fibroblast subclusters, marker genes, pathway enrichment, disease-control differences, and candidate regulatory programs.
Inflammation = immune activation. Fibrosis = excessive tissue scarring/remodeling.
Relate to thesis: Your thesis dealt with changes in gene expression after perturbation. Here, disease processes can also be seen as changes in gene programs: inflammatory programs or profibrotic extracellular-matrix programs.
Safe answer: Inflammation refers to immune activation and inflammatory signaling. Fibrosis refers to excessive tissue remodeling and extracellular matrix deposition, often involving activated fibroblasts. Chronic inflammation can push fibroblasts toward fibrotic programs.
Gene-expression programs in stromal cells, especially fibroblasts.
Relate to thesis: In your thesis, you looked for TF-target dependencies. In this lab, you may look for TFs/pathways driving mesenchymal programs such as extracellular matrix production or tissue remodeling.
Safe answer: By mesenchymal programs, I mean gene-expression programs active in stromal cells such as fibroblasts. These may involve extracellular matrix, tissue remodeling, inflammatory signaling, migration, or profibrotic differentiation.
Which cells are physically close to each other in tissue.
Relate to thesis: Your thesis used single-cell expression without tissue location. Spatial analysis adds location: not only what a cell expresses, but where it is and which cells are nearby.
Safe answer: Spatial neighborhoods describe the local arrangement of cells in tissue. For example, an activated fibroblast may behave differently if it is near macrophages, T cells, endothelial cells, or other fibroblasts.
Immune cells and stromal cells signaling to each other.
Relate to thesis: Your thesis inferred regulatory relationships inside gene networks. This is similar in spirit, but now the relationships are between cell types, such as macrophages sending signals to fibroblasts.
Safe answer: Immune-stromal communication means signaling between immune cells and stromal cells such as fibroblasts. Computationally, I would study it using ligand-receptor inference, spatial co-localization, and pathway analysis, while treating results as hypotheses.
Resolving = tissue is moving toward repair. Pathogenic = tissue maintains inflammation or fibrosis.
Relate to thesis: Your perturbation validation compared predicted vs actual response. In disease data, you may compare resolving vs pathogenic states and ask which genes/pathways/regulators differ.
Safe answer: A resolving state moves tissue toward repair and reduced inflammation, while a pathogenic state maintains inflammation or fibrosis. Computationally, I would compare cell states, gene programs, pathway activity, and spatial organization between these conditions.
How disease signals spread or are maintained across tissue or cell populations.
Relate to thesis: Your thesis asked whether a TF perturbation changes downstream gene expression. Disease propagation is similar conceptually: which cells/signals drive downstream changes in other cells or tissue areas.
Safe answer: By disease propagation, I mean how inflammatory or fibrotic programs may spread across tissue compartments or become maintained over time. Computationally, I would study this using spatial data, cell-cell communication analysis, trajectories, and regulatory models.
Genes, TFs, pathways, or cell-cell signals that may cause or maintain a disease state.
Relate to thesis: In your thesis, TF feature importance ranked candidate TF-target relationships. In this PhD, similar ranking can prioritize candidate drivers of fibroblast activation or immune-stromal communication.
Safe answer: A regulatory model can prioritize candidate transcription factors or pathways that may drive a disease-associated state. I would not call them causal immediately; I would treat them as hypotheses for validation.
Atlas = what cell types/states are present. Mechanistic model = why they arise and what drives them.
Relate to thesis: Your thesis tried to go beyond prediction by extracting interpretable regulatory hypotheses. That is the same idea: go beyond describing clusters and ask what regulates them.
Safe answer: A descriptive atlas tells us which cell types and states are present. A mechanistic model tries to explain why those states arise, which regulators or signals drive them, and which candidates can be experimentally tested.
Meaning: remove or flag bad cells and artifacts. Thesis link: you checked detected genes, total counts, and mitochondrial percentage. Answer: I would use QC to remove technical artifacts before biological interpretation.
Meaning: combine datasets while reducing batch effects. Thesis link: like normalization/preprocessing, but for multiple patients/batches/technologies. Answer: I would correct technical variation while preserving true disease biology.
Meaning: assign labels to cells/clusters. Thesis link: you worked with genes/TFs; here you also label cells as fibroblasts, macrophages, T cells, etc.
Meaning: find genes up/down between conditions. Thesis link: related to log2FC and perturbation response. Answer: I would compare disease vs control within the same cell type/state.
Meaning: ask whether a cell type/state is more common in disease. Answer: disease may change both gene expression and the frequency of cell populations.
Meaning: infer a possible path from one cell state to another. Answer: useful for resting-to-activated fibroblast hypotheses, but static scRNA-seq does not directly observe time.
Meaning: predict possible cell-cell signaling. Thesis link: like a network, but between cell types instead of TF-target genes. Answer: useful but hypothesis-generating, not proof.
Meaning: study which cells are near each other in tissue. Thesis link: adds tissue location to expression analysis.
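The differential-expression and composition items above can be illustrated with a minimal sketch. The toy cells, the function names, and the pseudocount of 1 are all assumptions made for illustration; a real analysis would use a statistical test and account for donors and batches rather than comparing raw means.

```python
import math
from collections import Counter

# Hypothetical toy data: (condition, cell_type, expression of one gene) per cell.
cells = [
    ("control", "fibroblast", 1.0),
    ("control", "fibroblast", 3.0),
    ("control", "macrophage", 2.0),
    ("disease", "fibroblast", 6.0),
    ("disease", "fibroblast", 10.0),
    ("disease", "fibroblast", 8.0),
    ("disease", "macrophage", 2.0),
]

def mean_expression(cells, condition, cell_type):
    """Mean expression of the gene in one cell type under one condition."""
    values = [x for c, t, x in cells if c == condition and t == cell_type]
    return sum(values) / len(values)

def log2_fold_change(cells, cell_type, pseudocount=1.0):
    """Disease vs control log2FC within a single cell type.

    The pseudocount (an assumed value) avoids dividing by zero.
    """
    disease = mean_expression(cells, "disease", cell_type)
    control = mean_expression(cells, "control", cell_type)
    return math.log2((disease + pseudocount) / (control + pseudocount))

def cell_type_fraction(cells, condition, cell_type):
    """Fraction of cells in a condition that belong to the given cell type."""
    counts = Counter(t for c, t, _ in cells if c == condition)
    return counts[cell_type] / sum(counts.values())

fibro_lfc = log2_fold_change(cells, "fibroblast")
fibro_frac_control = cell_type_fraction(cells, "control", "fibroblast")
fibro_frac_disease = cell_type_fraction(cells, "disease", "fibroblast")
```

Comparing within one cell type keeps an expression change from being confused with a shift in how common that cell type is, which is why the two quantities are computed separately.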
Gene expression for individual cells. This is closest to your thesis.
Spatial transcriptomics with tissue spots. Each spot may contain multiple cells. Think: expression + approximate location.
High-resolution imaging-based spatial transcriptomics for selected genes. Think: more spatially precise, but targeted.
Imaging mass cytometry: spatial protein profiling in tissue. Think: protein markers + location.
Patient/sample information: disease, treatment, severity, tissue, batch, donor. Helps avoid confusing technical or patient effects with disease biology.
Fibroblast activation means fibroblasts shift from structural cells into inflammatory, tissue-remodeling, or profibrotic states. Computationally, I would identify fibroblast subclusters, marker genes, pathway enrichment, disease-control differences, and regulatory programs. I am still learning the detailed fibroblast biology, but this is how I understand the computational question.
Inflammation is immune activation and inflammatory signaling. Fibrosis is excessive tissue remodeling or scarring, often involving activated fibroblasts and extracellular matrix deposition. Chronic inflammation can contribute to fibrosis.
Spatial neighborhoods refer to which cells are physically close to each other in tissue. For example, an activated fibroblast may behave differently if it is near macrophages, T cells, endothelial cells, or other stromal cells.
It means signaling between immune cells and stromal cells such as fibroblasts. Computationally, I would study it using ligand-receptor inference, pathway analysis, and spatial co-localization. I would treat it as hypothesis-generating, not proof of signaling.
A regulatory network model can rank TFs, pathways, or genes associated with a disease state. These are candidate drivers, not proven causal mechanisms. They should be validated using perturbation data, spatial evidence, protein data, or experiments.
It predicts possible cell-cell communication by checking whether one cell type expresses a ligand and another expresses the corresponding receptor. It is useful for hypothesis generation but does not prove active signaling.
It is based mainly on RNA expression, so it does not prove protein abundance, spatial contact, signaling activity, or causality. I would strengthen it using spatial proximity, protein data, perturbation evidence, and known biology.
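As a rough illustration of the idea behind ligand-receptor inference, the sketch below scores sender-receiver pairs by multiplying the sender's mean ligand expression by the receiver's mean receptor expression. The expression values are hypothetical, and the product-of-means score is a simplification chosen for illustration, not the scoring rule of any particular tool.

```python
# Hypothetical mean expression per cell type (arbitrary units).
mean_expr = {
    "macrophage": {"TGFB1": 4.0, "TGFBR2": 0.5},
    "fibroblast": {"TGFB1": 0.5, "TGFBR2": 3.0},
    "t_cell": {"TGFB1": 1.0, "TGFBR2": 0.0},
}

def lr_score(mean_expr, sender, receiver, ligand, receptor):
    """Product of sender ligand expression and receiver receptor expression."""
    ligand_level = mean_expr[sender].get(ligand, 0.0)
    receptor_level = mean_expr[receiver].get(receptor, 0.0)
    return ligand_level * receptor_level

def rank_pairs(mean_expr, ligand, receptor):
    """Rank all ordered (sender, receiver) pairs of distinct cell types by score."""
    scores = {
        (sender, receiver): lr_score(mean_expr, sender, receiver, ligand, receptor)
        for sender in mean_expr
        for receiver in mean_expr
        if sender != receiver
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranking = rank_pairs(mean_expr, "TGFB1", "TGFBR2")
top_pair, top_score = ranking[0]
```

A high score here only says that both transcripts are present. It says nothing about protein levels, physical proximity, or downstream signaling, which is why the ranking is a list of hypotheses to check against spatial and protein data.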
I would use scRNA-seq to define cell types and states, spatial transcriptomics to locate them in tissue, IMC to validate protein-level spatial phenotypes, and clinical metadata to connect molecular patterns to disease group, severity, treatment, or outcome.
My understanding is that Visium gives spatial transcriptomic profiles across tissue spots, where each spot may contain multiple cells. MERSCOPE provides higher-resolution targeted spatial transcriptomics using imaging-based detection of selected genes.
A descriptive atlas tells us what cell types and states are present. A mechanistic model tries to explain why those states arise, which regulators or signals drive them, and what candidates can be tested experimentally.
I used writing assistance to polish the wording, but the motivation reflects the direction I am genuinely interested in. Some biological areas are still new to me, and I am actively studying them. My strongest contribution right now is computational: single-cell data, machine learning, reproducible workflows, and interpretable analysis.
I handled high-dimensional omics data by reducing technical noise and then reducing dimensionality in a biologically meaningful way. In my thesis, I worked with scRNA-seq data containing more than 36,000 genes. I applied QC, normalization, log1p transformation, and HVG selection to reduce the data to the top 1,000 informative genes. Then I used target-specific KAN models rather than one huge global model, and scaled training using GPU/HPC parallelization.
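The gene-reduction step can be sketched as follows. This is a simplified stand-in that ranks genes by plain variance. Established HVG methods also model the mean-variance relationship, and the exact method used in the thesis is not stated here, so the function name, gene names, and values below are illustrative assumptions.

```python
from statistics import pvariance

def top_variable_genes(expr, gene_names, k):
    """Return the k gene names with the largest variance across cells.

    expr is a list of cells, each a list of per-gene (log-normalized) values.
    """
    ranked = []
    for j, name in enumerate(gene_names):
        column = [cell[j] for cell in expr]
        ranked.append((pvariance(column), name))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in ranked[:k]]

# Hypothetical toy matrix: 3 cells x 3 genes.
gene_names = ["gene_a", "gene_b", "gene_c"]
expr = [
    [1.0, 0.0, 2.0],
    [1.0, 5.0, 2.5],
    [1.0, 10.0, 2.0],
]

selected = top_variable_genes(expr, gene_names, k=2)
```

In the thesis setting, k would be 1,000 and the input would be the normalized, log1p-transformed matrix.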
I used standard scRNA-seq QC. I inspected detected genes per cell, total transcript counts, and mitochondrial read percentage. Low detected genes can indicate poor-quality cells, very high counts can suggest doublets or multiplets, and high mitochondrial content can indicate stressed or dying cells. After QC inspection, I used median-depth normalization, log1p transformation, and HVG selection.
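The three QC metrics described above can be turned into simple per-cell flags. The thresholds, field names, and example cells in this sketch are hypothetical; in practice, cutoffs are chosen by inspecting each dataset's distributions.

```python
def qc_flags(cell, min_genes=200, max_counts=50000, max_mito_pct=20.0):
    """Return a list of QC flags for one cell (thresholds are assumed values)."""
    flags = []
    if cell["n_genes"] < min_genes:
        flags.append("low_genes")        # possible poor-quality cell
    if cell["total_counts"] > max_counts:
        flags.append("high_counts")      # possible doublet or multiplet
    total = cell["total_counts"]
    mito_pct = 100.0 * cell["mito_counts"] / total if total else 0.0
    if mito_pct > max_mito_pct:
        flags.append("high_mito")        # possible stressed or dying cell
    return flags

# Hypothetical example cells.
good_cell = {"n_genes": 1500, "total_counts": 6000, "mito_counts": 300}
stressed_cell = {"n_genes": 150, "total_counts": 1000, "mito_counts": 400}
doublet_like = {"n_genes": 6000, "total_counts": 80000, "mito_counts": 2000}

flagged = {
    "good": qc_flags(good_cell),
    "stressed": qc_flags(stressed_cell),
    "doublet_like": qc_flags(doublet_like),
}
```

Flagging rather than deleting keeps the decision visible, so borderline cells can be reviewed before filtering.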
Median-depth normalization corrects library-size differences using the typical cell depth, which is robust to extreme high-count cells. log1p compresses the skewed expression range and handles zeros naturally. This gave stable inputs for regression-based KAN models while preserving biologically meaningful variation.
Mean depth can be pulled upward by outlier cells with very high counts, such as doublets or technical artifacts. Median depth better represents the typical cell, so scaling to the median avoids over-amplifying noise caused by extreme cells.
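A small numeric sketch makes the mean-versus-median point concrete. The count matrix is hypothetical and includes one extreme cell. The function name is an assumption, and the code assumes empty cells were already removed during QC.

```python
import math
from statistics import mean, median

# Hypothetical raw counts: 4 cells x 3 genes; the last cell is an extreme outlier.
counts = [
    [2, 0, 8],         # depth 10
    [10, 0, 10],       # depth 20
    [10, 5, 15],       # depth 30
    [300, 100, 600],   # depth 1000
]

def normalize_log1p(counts, target_depth):
    """Scale each cell to target_depth total counts, then apply log1p.

    Assumes every cell has nonzero depth (empty cells removed during QC).
    """
    normalized = []
    for cell in counts:
        depth = sum(cell)
        normalized.append([math.log1p(x * target_depth / depth) for x in cell])
    return normalized

depths = [sum(cell) for cell in counts]
median_depth = median(depths)   # stays near the typical cell
mean_depth = mean(depths)       # pulled upward by the outlier cell

normalized = normalize_log1p(counts, median_depth)
```

Here the median depth is 25 while the mean is 265, so scaling to the mean would multiply the counts of the three typical cells by roughly 9 to 27 times. Note also that log1p maps a zero count to exactly zero, which is what "handles zeros naturally" refers to.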
I did not explicitly remove dropout using imputation. I handled noise and sparsity indirectly through QC, median-depth normalization, log1p transformation, HVG selection, regularized KAN training, and evaluation at both single-cell and gene-mean levels. The moderate single-cell correlations show the noise remained challenging, while the high gene-mean correlations show the model captured the dominant biological perturbation signal.
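The gap between single-cell and gene-mean correlations can be illustrated with a toy example. This is one plausible reading of the two evaluation levels, not the thesis's exact protocol. The values are hypothetical, and the model is assumed to predict each gene's average perfectly while missing cell-to-cell noise.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical observed expression: 4 cells x 3 genes, noisy around gene means.
observed = [
    [1.0, 4.0, 2.0],
    [3.0, 6.0, 0.0],
    [2.0, 5.0, 4.0],
    [2.0, 5.0, 2.0],
]
# Hypothetical predictions: the same per-gene value for every cell.
predicted = [[2.0, 5.0, 2.0] for _ in observed]

def flatten(matrix):
    return [value for row in matrix for value in row]

def gene_means(matrix):
    n_cells = len(matrix)
    return [sum(row[j] for row in matrix) / n_cells for j in range(len(matrix[0]))]

single_cell_r = pearson(flatten(observed), flatten(predicted))
gene_mean_r = pearson(gene_means(observed), gene_means(predicted))
```

Averaging over cells cancels much of the per-cell noise, so the gene-mean correlation comes out higher than the single-cell one even though the predictions are identical in both comparisons.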
A batch effect is unwanted technical variation caused by sample processing, sequencing run, lab, machine, reagent lot, or time point. It is dangerous because cells may cluster by batch rather than biology. Batch correction should remove technical variation without removing real disease signal.
Causality means that changing one variable directly produces a change in another. In GRN inference, correlation between TF A and gene B does not prove A regulates B. Stronger causal evidence comes from perturbation experiments and wet-lab validation, supported by time-course data and chromatin accessibility.
CD99 was an outlier because the experimentally observed expression was near zero in several perturbation settings, while the model predicted a small nonzero value. Since log2FC is ratio-based, dividing by near-zero actual expression produces a very large positive fold change. I would interpret the outlier as a ratio artifact driven by model overestimation, not as evidence of a real biological effect.
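The ratio effect can be shown with made-up numbers. The values below are hypothetical and are not the actual CD99 measurements. The function name is an assumption, and the exact log2FC formula used in the thesis is not given here, so this only illustrates how a near-zero denominator inflates a log ratio.

```python
import math

def log2_ratio(numerator, denominator, pseudocount=0.0):
    """log2 of (numerator + pseudocount) / (denominator + pseudocount)."""
    return math.log2((numerator + pseudocount) / (denominator + pseudocount))

predicted = 0.05     # hypothetical small nonzero model prediction
observed = 0.0001    # hypothetical near-zero measured expression

raw_lfc = log2_ratio(predicted, observed)                       # ratio of 500
damped_lfc = log2_ratio(predicted, observed, pseudocount=1.0)   # ratio near 1
absolute_error = abs(predicted - observed)
```

The absolute error is tiny, yet the raw log ratio is large because the denominator is close to zero. Adding a pseudocount, one common way to stabilize such ratios, brings the value back near zero.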
My strongest background is computational: high-dimensional biological data, ML, statistics, and reproducible workflows. I am still deepening my immunology, but I can contribute technically while learning the disease biology.
Fibroblast activation means fibroblasts shifting from structural cells into inflammatory, tissue-remodeling, or profibrotic states.
Spatial neighborhoods describe which cells are physically near each other in tissue, and they matter because cell function depends on local context.
Immune-stromal communication means signaling between immune cells and stromal cells, such as macrophages or T cells interacting with fibroblasts.
Ligand-receptor inference predicts possible cell-cell signaling based on ligand expression in one cell type and receptor expression in another. It is hypothesis-generating.
A descriptive atlas tells us what cell types and states exist. A mechanistic model tries to explain why those states arise and what drives them.
A regulatory hypothesis is a candidate mechanism suggested by a model, such as a TF or pathway that may drive a disease-associated state. It is not proof of causality until validated.
For unknown biology questions, be honest: I am still learning the detailed immunology, but computationally I would approach it through QC, annotation, differential analysis, pathways, spatial neighborhoods, communication analysis, and regulatory hypotheses.