# A single-cell atlas of the mouse and human prostate reveals heterogeneity and conservation of epithelial progenitors

1. Department of Medicine, Columbia University Irving Medical Center, United States
2. Department of Genetics and Development, Columbia University Irving Medical Center, United States
3. Department of Urology, Columbia University Irving Medical Center, United States
4. Department of Systems Biology, Columbia University Irving Medical Center, United States
5. Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, United States
6. Department of Biomedical Informatics, Columbia University Irving Medical Center, United States
7. Department of Pathology and Laboratory Medicine, Weill Medical College of Cornell University, United States
8. Department of Pathology and Cell Biology, Columbia University Irving Medical Center, United States
Research Article

## Abstract

Understanding the cellular constituents of the prostate is essential for identifying the cell of origin for prostate adenocarcinoma. Here, we describe a comprehensive single-cell atlas of the adult mouse prostate epithelium, which displays extensive heterogeneity. We observe distal lobe-specific luminal epithelial populations (LumA, LumD, LumL, and LumV), a proximally enriched luminal population (LumP) that is not lobe-specific, and a periurethral population (PrU) that shares both basal and luminal features. Functional analyses suggest that LumP and PrU cells have multipotent progenitor activity in organoid formation and tissue reconstitution assays. Furthermore, we show that mouse distal and proximal luminal cells are most similar to human acinar and ductal populations, that a PrU-like population is conserved between species, and that the mouse lateral prostate is most similar to the human peripheral zone. Our findings elucidate new prostate epithelial progenitors, and help resolve long-standing questions about anatomical relationships between the mouse and human prostate.

## Introduction

Significant anatomical differences between the mouse and human prostate have long hindered analyses of mouse models of prostate diseases. The mouse prostate can be separated into anterior (AP), dorsal (DP), lateral (LP), and ventral (VP) lobes; the mouse dorsal and lateral lobes are often combined as the dorsolateral prostate (DLP) (Cunha et al., 1987; Shappell et al., 2004; Shen and Abate-Shen, 2010). In contrast, the human prostate lacks defined lobes, and instead is divided into different histological zones (central, transition, and peripheral); the peripheral zone represents the predominant site of prostate adenocarcinoma, whereas benign prostatic hyperplasia (BPH) occurs in the transition zone (Cunha et al., 2018; Ittmann, 2018; Shappell et al., 2004). Moreover, unlike the mouse, the human prostate has distinct ductal and acinar regions. Although microarray gene expression profiling has suggested that the DLP is most similar to the human peripheral zone (Berquin et al., 2005), there is no consensus on the relationship between mouse lobes and human zones (Ittmann, 2018; Ittmann et al., 2013; Shappell et al., 2004).

The adult prostate epithelium is comprised of luminal, basal, and rare neuroendocrine cells (Shen and Abate-Shen, 2010; Toivanen and Shen, 2017), and cellular heterogeneity has been suggested within the luminal (Barros-Silva et al., 2018; Chua et al., 2014; Karthaus et al., 2020; Karthaus et al., 2014; Kwon et al., 2016; Liu et al., 2016) and basal compartments (Goldstein et al., 2008; Lawson et al., 2007; Wang et al., 2020). Lineage-tracing analyses have shown that the hormonally intact adult prostate epithelium is maintained by unipotent progenitors within the basal and luminal epithelial compartments (Choi et al., 2012; Lu et al., 2013; Wang et al., 2013). However, following tissue dissociation, both basal and luminal cells can act as bipotent progenitors in organoid or tissue reconstitution assays (Chua et al., 2014; Karthaus et al., 2014). The progenitor properties of basal cells may reflect their ability to generate luminal progeny during tissue repair after wounding or inflammation (Kwon et al., 2014; Toivanen et al., 2016), but the role of presumptive luminal progenitors has been less clear. In particular, several studies have suggested increased progenitor potential in the proximal region of the prostate (nearest to the urethra) (Burger et al., 2005; Goto et al., 2006; Tsujimura et al., 2002), particularly for proximal luminal cells (Guo et al., 2020; Karthaus et al., 2020; Kwon et al., 2016; Wei et al., 2019; Zhang et al., 2018). However, the nature and distribution of these epithelial populations have been poorly characterized.

## Results

### Distinct luminal epithelial populations in the mouse prostate

To examine cellular heterogeneity, we performed single-cell RNA-sequencing of whole prostates from adult wild-type mice at 10 weeks of age. We microdissected the full proximal-distal extent of each prostate lobe down to its junction with the urethral epithelium (Figure 1—figure supplement 1A–C). We noted that the anterior (AP), dorsal (DP), and lateral (LP) lobes joined the urethra in close proximity on the dorsal side, whereas the ventral lobe (VP) had a distinct junction ventrally. As previously described (Cunha et al., 1987; Shappell et al., 2004; Shen and Abate-Shen, 2010), each lobe has a characteristic morphology, pattern of ductal branching, and histological appearance (Figure 1—figure supplement 1D–F). Single-cell populations were obtained using a multi-step tissue dissociation protocol with successive enzymatic digestion, trituration, and filtering steps. Droplet-based single-cell RNA-sequencing was performed on the 10x Genomics Chromium platform (see Materials and methods for details).

For analysis of the resulting data, we performed batch effect correction if needed for aggregation of datasets (Stuart et al., 2019), followed by application of Randomly, an algorithm that uses random matrix theory to reduce noise in single-cell datasets (Aparicio et al., 2020). Using the universality property of random matrix theory on eigenvalues and eigenvectors of sparse matrices, Randomly discriminates biological signals from noise and sparsity-induced confounding signals, which typically comprise approximately 98% of the data, based on a survey of published single-cell datasets (Aparicio et al., 2020). The Randomly algorithm is based on the three-fold structure of a single-cell dataset: a random matrix (95% or more), a sparsity-induced (fake) signal, and a biological signal. The algorithm uses the universality properties of random matrix theory for both eigenvalues and eigenvectors to detect the biological signal. After de-noising of single-cell data, we performed clustering using the Leiden algorithm as implemented in Wolf et al., 2018, with selection of the number of clusters based on the mean silhouette score. Processing by Randomly followed by dimensional reduction for visualization using t-SNE (t-distributed Stochastic Neighbor Embedding) or UMAP (Uniform Manifold Approximation and Projection) plots facilitated the identification of cell populations with distinct transcriptional signatures (Figure 1—figure supplement 2). Additional description of computational methods is provided in Materials and methods.

We identified distinct luminal, basal, and neuroendocrine populations that were annotated based on the expression of marker genes, as visualized in an aggregated dataset composed of 5288 cells from two whole prostates (tSNE plot shown in Figure 1A,D; UMAP plot shown in Figure 1—figure supplement 3A). Notably, we could identify five different luminal epithelial populations, a single basal population, rare neuroendocrine cells, and a small population of epithelial cells that expresses both basal and luminal markers. We could also identify distinct stromal and immune components, corresponding to two different stromal subsets (Kwon et al., 2019), as well as immune cells (macrophages, T cells, B cells); some datasets also contained small populations of contaminating vas deferens and seminal vesicle cells.

Figure 1 with 5 supplements see all

To assess whether some of these epithelial populations might be lobe-specific, we next performed single-cell RNA-seq analyses of individual lobes (tSNE plots shown in Figure 1B; UMAP plots shown in Figure 1—figure supplement 3B). We found that four of the luminal populations identified in the aggregated dataset were highly lobe-specific; hence, we named these populations LumA (AP-specific), LumD (DP-specific), LumL (LP-specific), and LumV (VP-specific). The remaining luminal population was observed in the datasets for all four lobes and was highly enriched in the proximal portion of each lobe; thus, we termed this population LumP (proximal) (Figure 1C).

### Spatial localization and morphology of epithelial populations

To examine the lobe-specificity and spatial distribution of these luminal populations, we identified candidate markers based on gene expression patterns in our single-cell datasets (marker genes of interest and/or used in this study are shown in the dot plot of Figure 1D; a larger set of differentially expressed genes are shown in Figure 1—figure supplement 4). We focused our analyses on candidate markers with commercially available antibodies that displayed low background staining in control sections, and selected antibodies for further study that were specific for the cell populations predicted by single-cell analyses (Figure 1D–F; Figure 1—figure supplement 5). For example, we found that the LumA and LumD marker Tgm4 was highly expressed by luminal cells in the distal region of the AP and DP, and that the LumL marker Msmb marked distal luminal cells in the LP. In contrast, the LumP marker Ppp1r1b was highly enriched in luminal cells that were primarily found in the proximal regions of all four lobes (Figure 1D–F). However, more general luminal markers such as androgen receptor (Ar) were expressed in all luminal populations (Figure 1—figure supplement 5).

Next, we investigated the spatial localization of these luminal populations along the proximal-distal axis in each lobe. We found that the LumP-containing proximal region extended from inside the rhabdosphincter to the first major ductal branch point in the AP, DP, and VP, but not the LP, whereas the bulk of the lobes corresponded to distal regions (Figure 1C). In the AP and DP, we found a discrete boundary in the medial region between the proximal LumP population and distal LumA or LumD populations, respectively (Figure 2A,C; Figure 2—figure supplement 1A). In contrast, the LP had a population between the proximal and distal regions that expressed low levels of LumL markers (Figure 2A; Figure 2—figure supplement 1A). Histological analyses revealed that distal luminal cells of each lobe had a tall columnar appearance consistent with secretory function, whereas proximal LumP cells typically had a cuboidal morphology. Notably, this analysis also revealed heterogeneity in the distal region of each lobe, with rare clusters of 1–10 LumP cells observed in the distal AP, DP, and LP, but more frequent LumP clusters in the distal VP (Figure 2B).

Figure 2 with 1 supplement see all

To clarify the phenotypic differences between proximal and distal luminal populations, we performed scanning electron microscopy of an 8-week-old anterior lobe (Figure 2C; Figure 2—figure supplement 1B). LumA cells displayed dense regions of rough endoplasmic reticulum throughout the cytoplasm, many free ribosomes, and abundant secretory vesicles on the apical surface, typical of secretory cells. In contrast, LumP cells displayed areas of high mitochondrial density, complex membrane interdigitation, and no vesicles. At the proximal-distal boundary, we observed an abrupt transition between cellular morphologies that took place within 1–2 cell diameters. These ultrastructural differences indicate that the LumA and LumP populations represent distinct cell types, rather than cell states.

Next, we investigated the remaining epithelial population, which shares basal and luminal features in our single-cell RNA-seq analysis (Figure 1A). We found that this small population was enriched in the most proximal region of all four lobes, residing inside the rhabdosphincter and adjacent to the urethral junction (Figure 2D; Figure 1—figure supplement 1C); hence, we termed this novel population PrU (periurethral). Although this PrU population co-expresses some markers with LumP (Figure 1D), it also expresses several urothelial markers such as Ly6d and Aqp3 (Figure 2D).

The proximity of the PrU and LumP populations to the urethra and their co-expression of multiple markers led us to investigate their developmental origin. Consequently, we examined the expression of Nkx3-1, whose mRNA expression marks epithelial cells in ductal derivatives of the developing urogenital sinus, such as the prostate, but not the urothelium (Bhatia-Gaur et al., 1999); similarly, the Nkx3-1Cre driver also marks early prostate bud cells but not the urogenital sinus during development (Thomsen et al., 2008; Zhang et al., 2008). In the adult prostate, Nkx3-1 is expressed by all four distal luminal populations (LumA, LumD, LumL, LumV), but is not expressed by LumP or PrU (Figure 1—figure supplement 5). However, we found the Nkx3-1Cre driver lineage-marks most of the cells in all of these populations, including LumP and PrU, but not the urethra (Figure 2E). These data indicate that the LumP and PrU populations are derived from Nkx3-1 expressing prostate epithelial cells and are distinct from the urothelium.

### Functional analysis of epithelial populations

We used an approach based on optimal transport theory to ascertain the relationships of these prostate epithelial populations (see Materials and methods). We calculated the similarity among cell populations in mouse using the Wasserstein-1 distance as a measure for phenotypic distance among cell populations, based on clusters in the latent space previously obtained by the Randomly algorithm. The pair-wise comparisons between populations (Figure 3A) can be captured by a neighbor-joining tree (Figure 3B), in which lower Wasserstein distance indicates greater similarity. We found that the distal populations grouped together, with the LumA and LumD populations being most closely related, followed by LumL and LumV. These distal populations were next most closely related to LumP, which in turn was most similar to PrU, followed by basal cells, suggesting a lineage relationship between LumP and distal luminal populations.

Figure 3 with 2 supplements see all

To investigate the functional properties of each epithelial population, we developed a flow sorting strategy to isolate these populations (Figure 3C; Figure 3—figure supplement 1A); we performed re-sorting experiments to assess the purity of the isolated populations (Figure 3—figure supplement 1B). We performed organoid formation assays with isolated cell populations using the defined ENR-based medium (Drost et al., 2016; Karthaus et al., 2014) as well as hepatocyte medium (HM), which has a more complex composition including serum (Chua et al., 2014). Despite differences in overall efficiency between media conditions, we consistently found that the PrU and basal populations were most efficient at forming organoids, followed by LumP, whereas the efficiency of distal LumA, LumD, LumL, and LumV was significantly lower (Figure 3D,E).

Next, we assessed the progenitor potential of isolated epithelial populations using in vivo tissue reconstitution assays. These assays involve recombination of dissociated epithelial cells with rat embryonic urogenital mesenchyme followed by renal grafting, and have been extensively utilized for analysis of progenitor properties in the prostate (Lawson et al., 2007; Wang et al., 2013; Xin et al., 2003). We observed significant variation in the frequency of graft formation depending on the number and type of input epithelial cells (Figure 3F). Based on histological and immunostaining analyses, we found that each epithelial population typically gave rise to cells of the same type, but their ability to generate cells of other populations varied considerably (Figure 3G; Figure 3—figure supplement 2A). Notably, grafts using distal luminal cells required relatively large numbers of input cells to grow (approximately 30,000 cells) (Figure 3F), and were mostly composed of the same population (i.e. LumA cells generated LumA cells) together with a normal or reduced percentage of basal cells; additionally, these grafts could contain small patches of cells expressing LumP markers on their periphery (Figure 3G; Figure 3—figure supplement 2A). Interestingly, LumV cells had the lowest grafting efficiency and generally formed small ductal structures that lacked basal cells. In contrast, LumP, PrU, and basal cells could produce grafts with significantly lower input cell numbers (approximately 1,000 cells), which contained LumP cells together with multiple distal luminal populations as well as a normal ratio of basal cells. We excluded contribution of host cells and rat urogenital epithelium to grafts using control tissue reconstitution assays with GFP-expressing donor epithelial cells (Figure 3—figure supplement 2B).

Taken together, these results suggest a spectrum of progenitor potential among the different epithelial populations. Both PrU and basal cells possessed high progenitor activity in these assays, with LumP cells also displaying enhanced activity. In contrast, the distal luminal populations are much less efficient in these assays. These findings are consistent with the inferred relationships between these populations based on molecular (Figure 3A,B) as well as histological and ultrastructural analyses (Figure 2A,C).

### Comparison of human and mouse prostate epithelial populations

To examine the conservation of epithelial populations between the mouse and human prostate, we performed single-cell RNA-seq analyses of tissue samples from human prostatectomies (Figure 4—source data 1). We identified two distinct luminal populations as well as a PrU-like population (tSNE plots shown in Figure 4A–C; UMAP plots shown in Figure 4—figure supplement 1), and examined their spatial distribution by immunostaining of benign human prostate tissue (Figure 4—source data 1). Using specific markers, we found that one luminal population (Lum Ductal, expressing KRT7 and RARRES1) was primarily localized to ducts, whereas the second population was predominantly acinar (Lum Acinar, expressing MSMB and MME), although some intermixing could be observed within ducts (Figure 4G). As in the mouse, the PrU-like population expressed both basal and luminal genes; similarly, some PrU-like markers were shared with Lum Ductal.

Figure 4 with 1 supplement see all

To extend this analysis, we computed the Wasserstein distances for each cross-species pair-wise comparison between epithelial and major non-epithelial populations (Figure 4D–F). This analysis showed that a PrU-like population is conserved between species. Furthermore, the human Lum Ductal cells are most closely related to mouse LumP, whereas the Lum Acinar cells are most closely related to LumL followed by LumV. Notably, these relationships were observed in each dataset.

## Discussion

We have generated a comprehensive cellular atlas of the prostate epithelium and have defined spatial, morphological, and functional properties of each epithelial population. Our analyses have revealed spatial and functional heterogeneity primarily in the luminal epithelial compartment, including distinct cell populations along the proximal-distal axis as well as lobe-specific identities. Notably, we have shown marked differences in progenitor potential between cell identities, which likely correspond to distinct cell types rather than cell states (Morris, 2019). In tissue reconstitution assays, the ability of LumP and PrU cells to generate luminal distal and basal cells suggests that both populations have properties of multipotent progenitors. In contrast to the luminal compartment, basal cells appear relatively homogeneous, suggesting that previously reported basal heterogeneity (Goldstein et al., 2008; Lawson et al., 2007; Wang et al., 2020) may be more limited. Notably, the PrU population is not readily classified in either compartment as it is comprised of cells with both basal and luminal features.

Our findings also provide a broader context for other reports of epithelial heterogeneity. Recent single-cell RNA-seq analysis of the mouse anterior prostate identified three distinct luminal populations (Karthaus et al., 2020), where L1 appears to correspond to our LumA and L2 to LumP; we also identified a population expressing L3 genes in both mouse and human datasets, but have annotated this as ductus/vas deferens based on marker expression (Figures 1A and 4C), relying on previous findings (Blomqvist et al., 2006). Another recent paper based on single-cell analysis of the whole mouse prostate has described three luminal populations, in which Luminal-A appears to correspond to our LumV, Luminal-B to our LumA and LumD, and Luminal-C to our LumP; notably, this analysis used lineage-tracing to demonstrate progenitor properties of distal Luminal-C (LumP) cells (Guo et al., 2020). Based on patterns of gene expression, we suggest that the LumP population corresponds to the previously reported Sca1high luminal cells (Kwon et al., 2016) as well as Trop2 (Tacstd2)-positive cells (Crowell et al., 2019), which have been described as proximal progenitor populations with scattered distal cells, and may also be responsible for the enhanced serial grafting efficiency of proximal prostate (Goto et al., 2006). In the human prostate, the progenitor activity of LumP and/or PrU-like cells may have been observed by retrospective lineage tracing using mitochondrial mutations (Moad et al., 2017). In addition, the LumP and PrU populations may share some similarities with the ‘hillock’ and ‘club’ cells originally described in a cellular atlas of the mouse lung (Montoro et al., 2018) and reported in the human prostate (Henry et al., 2018; Joseph et al., 2020) and in benign human prostate organoids (McCray et al., 2019), but the precise relationship of these populations is unclear.

Since LumP and PrU cells display multipotent progenitor activity in both organoid formation and tissue reconstitution assays, their spatial distribution may reflect the ability of the prostate to repair itself from a proximal to distal direction in response to extensive tissue damage as well as from distal progenitors in response to more localized injury. Our findings suggest that the novel PrU population in particular may play a role in prostate tissue repair and/or regeneration, consistent with the previous identification of Ly6d-positive cells as a castration-resistant progenitor (Barros-Silva et al., 2018). Notably, the ability of Nkx3-1Cre to lineage-mark both PrU and LumP, but not the urethra, suggests that these cell populations are distinct from the urothelium, despite the molecular similarities between the proximal prostate and the urethra noted in a recent report (Joseph et al., 2020). Nonetheless, our results imply that lineage relationships among the tissues derived from the urogenital sinus (Georgas et al., 2015) require careful elucidation, since they are of fundamental importance for understanding the genesis of congenital defects in the urogenital system.

Our findings help resolve a long-standing question about the relationship of the mouse and human prostates. Specifically, we speculate that mouse proximal and distal regions are most related to human ductal and acinar regions, respectively, and that the mouse LP is most similar to the human peripheral zone. Few if any studies have specifically assessed tumor phenotypes in the LP, as it is small and usually combined with the DP as the dorsolateral prostate (DLP); however, our analyses show that the DP differs significantly from the LP at the anatomical and molecular levels. Consequently, we suggest that a re-evaluation of tumor phenotypes in genetically engineered mouse models may reveal a closer similarity to human prostate tumor histopathology than previously appreciated.

Finally, the elucidation of prostate epithelial heterogeneity has potentially significant implications for understanding the cell of origin for prostate adenocarcinoma. Previous studies have suggested that luminal cells as well as basal cells can serve as the cell of origin for prostate cancer (Wang et al., 2013; Wang et al., 2014; Xin, 2019), yet known differences in human prostate cancer outcome (e.g. Zhao et al., 2017) cannot be simply explained on this basis. Notably, previous studies suggest that both distal and proximal luminal cells can serve as cells of origin for prostate cancer. For example, prostate tumors arise following deletion of Pten by the Nkx3.1-CreERT2 driver, which is specific for distal luminal cells (Wang et al., 2013), as well as by the Ck4-CreERT2 driver, which is specific for LumP cells (Guo et al., 2020). Therefore, further analyses of epithelial heterogeneity and progenitor potential will likely lead to key insights into prostate tumor initiation and progression.

## Materials and methods

Key resources table

### Mouse strains and genotyping

Request a detailed protocol

Wild type C57BL6/N (C57BL/6NTac, 8–10 weeks old) and Swiss-Webster (8–10 weeks) mice were purchased from Taconic. The UBC-GFP (C57BL/6-Tg(UBC-GFP)30Scha/J, 8–13 weeks old; JAX #004353)(Schaefer et al., 2001) and R26R–YFP (B6.Cg–Gt(ROSA)26Sortm3(CAG?EYFP)Hze/J; JAX #007903) (Madisen et al., 2010) mice were obtained from the Jackson Laboratory. The Nkx3-1Cre allele has been previously described (Lin et al., 2007; Thomsen et al., 2008). As hosts for renal grafts, R2G2 mice (B6;129- Rag2tm1FwaIl2rgtm1Rsky/DwIHsd, 8–15 weeks old) were purchased from Envigo, and NOD/SCID mice (NOD.Cg-Prkdcscid/J, 8–14 weeks old; JAX #001303) were purchased from the Jackson Laboratory. To obtain urogenital mesenchyme for tissue recombination and renal grafting, we used E18 Sprague-Dawley rat embryos from timed matings (Charles River #400). Animal studies were approved by and conducted according to standards set by the Columbia University Irving Medical Center (CUIMC) Institutional Animal Care and Use Committee (IACUC).

### Isolation of mouse prostate tissue

Request a detailed protocol

The anterior (AP), dorsal (DP), lateral (LP), and ventral prostate (VP) lobes were dissected individually, using a transverse cut at the intersection of each lobe with the urethra to include the periurethral (PrU) region. For some analyses, we dissected PrU tissues in the most proximal regions (extending 0.5–2 mm from the connection of the lobes with the urethra), or the remainder of the proximal regions separately from the distal lobes. Paired lobes were collected from a single C57BL/6 mouse for each scRNA-seq experiment, or from 2 to 5 C57BL/6 mice for flow sorting, organoid culture, and tissue reconstitution experiments. For analyses of prostate anatomy, lobes were either dissected individually or a deep cut was made at the caudal end of the urethra for removal of entire urogenital apparatus.

### Acquisition and pathological assessment of human prostate tissue samples

Request a detailed protocol

Human prostate tissue specimens were obtained from patients undergoing cystoprostatectomy for bladder cancer or radical prostatectomy at Columbia University Irving Medical Center or at Weill Cornell Medicine. Patients were aged 54–79 years old and gave informed consent under Institutional Review Board-approved protocols. The clinical characteristics of these patients are provided in Figure 4—source data 1. Following surgery, prostate tissue was submitted for gross pathological annotation and sectioning, with ischemic time less than 1 hr.

To acquire samples for single-cell RNA-sequencing, the prostate was transversely sectioned perpendicular to the urethra in three main parts (apex, mid and base), which were further divided based on laterality (left or right). Each part was cut in thick sections that included all three prostatic zones (peripheral, transversal and central). Thick sections with low or no tumor burden were selected for the study based on clinical findings and/or biopsies, and divided in three plates by performing two parallel cuts. The upper flanking plate was flash-frozen, cryosectioned, and a rapid review was performed by a board-certified surgical pathologist (H.H.) to provide preliminary assessment on the presence of benign prostate tissue/absence of carcinoma. The middle flanking plate was stored in RPMI medium with 5% FBS on ice, and immediately transferred to the research facility for single-cell RNA sequencing. The lower flanking plate was processed by formalin fixation and paraffin embedding, followed by sectioning and histological review to confirm presence of benign prostate tissue/absence of carcinoma.

For immunostaining analysis of prostate tissue sections, blocks previously assessed as containing benign prostate histology were selected by a surgical pathologist (B.D.R.). Paraffin sections were immunostained for markers of interest as well as CK5 to confirm the presence of basal cells, and adjacent sections were stained by H and E. H and E sections were then reviewed to confirm benign pathology.

### Dissociation of mouse and human prostate tissue

Request a detailed protocol

Prostate tissues were minced with scissors and then incubated in papain (20 units/ml) with 0.1 mg/ml DNase I (Worthington LK003150) at 37°C with gentle agitation. After 45 min, samples were gently triturated, then incubated for another 20–45 min in papain as needed. Samples were gently triturated again, followed by quenching of the enzyme using 1 mg/ml ovomucoid/bovine serum albumin solution with 0.1 mg/ml DNase I (Worthington LK003150). Cells were passed through a 70 μm strainer (pluriSelect 43-10070-70) and washed with PBS-EDTA with 0.1 mg/ml DNase I (Corning MT-46034CI). If needed, the samples were additionally digested in TrypLE Express (Invitrogen 12605–036) for 3–5 min at 37°C with gentle agitation. The samples were gently triturated and the TrypLE was inactivated by addition of HBSS with 10% FBS and 0.1 mg/ml DNase I. Samples were passed through a 20 μm strainer (pluriSelect 43-10020-70), washed in 1x PBS, and resuspended in appropriate buffers for downstream analyses.

### Single-cell RNA-sequencing

Request a detailed protocol

Dissociated cells were washed twice in 1x PBS, passed twice through 20 μm strainers (pluriSelect 43-10020-70), and counted using a hemocytometer or the Countess II FL Automated Cell Counter (ThermoFisher). If the viability of samples was >80% and the single-cell fraction was >95%, the cells were resuspended in 1x PBS with 0.04% BSA at approximately 1 × 106 cells/ml. Samples were submitted to the Columbia JP Sulzberger Genome Center for single-cell RNA-sequencing on the 10x Genomics Chromium platform. Libraries were generated using the Chromium Single Cell 3’ Reagent Kit v2, with 12 cycles for cDNA amplification and 12 cycles for library construction. Samples were sequenced on a NovaSeq 6000 (r1 = 26, i1 = 8, r2 = 91). Sequencing data were aligned and quantified using the Cell Ranger Single-Cell Software Suite (v.2.1.1) with either the GRCm38 mouse or the GRCh38 human reference genomes.

For the mouse prostate, two independent biological replicate samples of whole prostate were submitted for scRNA-seq, with 2361 and 2,927 cells sequenced. Two separate biological replicates for individual lobes were also used for scRNA-seq, with 1,581–2,735 cells sequenced for sample. For human prostate (Figure 4—source data 1), three independent samples (#1–3) that corresponded to regions of benign histology were submitted for scRNA-seq, with 1,600, 2,303, and 2,825 cells sequenced, respectively. Single-cell datasets have been deposited in GEO under accession number GSE150692, and can also be accessed through the Broad Institute Single-Cell Portal (www.singlecell.broadinstitute.org).

### Flow cytometry

Request a detailed protocol

Dissociated cell suspensions were counted and resuspended in FACS buffer (1–3% FBS in 1x PBS or HBSS, 1 mM EDTA, and 0.1 mg/ml DNase I). Primary antibodies were added at the final concentrations indicated in the Key Resources Table and incubated at 4°C for 20–30 min. Samples were additionally incubated with 1 μM propidium iodide for 15 min before sorting for dead cell exclusion. Cells were sorted using a BD Influx or Aria, using the widest nozzles and lowest pressure settings (140 μm nozzle and 7 psi for the Influx; 130 μm nozzle and 8 psi for the Aria). Data analysis was conducted using FCS Express seven software. A slightly modified strategy was used to sort prostate cells from UBC-GFP mice.

### Organoid culture

Request a detailed protocol

For ENR conditions (Karthaus et al., 2014), sorted cells from UBC-GFP mice were resuspended at a final concentration of 1,000 cells per 30 μl droplet of Matrigel (Corning 354234), and placed in individual wells of a 24-well plate. The Matrigel was covered with 500 μl of ENR medium, supplemented with 10 nM dihydrotestosterone (DHT), 5 mM A 83-01, and 10 μM Y-27632. Media were replenished at day five and organoids were imaged at day 10. Organoid measurements were performed using the Fiji Particle Analysis Plugin (Rueden et al., 2017; Schindelin et al., 2012), excluding particles with area <2000 μm2 and roundness value <0.5. If needed, Watershed was applied to separate overlapping organoids/particles, with wells from at least four independent experiments analyzed.

For hepatocyte media (HM) conditions (Chua et al., 2014), sorted cells from wild-type C57BL/6 mice were plated at 2000 and 5,000 cells per well in ultra-low attachment 96-well plates (Corning 3474), and grown in hepatocyte media with 5% Matrigel, 10 nM DHT, and 10 μM Y-27632, with media replenished every 5 days. Organoid formation efficiencies were calculated on day 12–13 of culture. Since LumV cells tended to form small structures containing 1–4 cells, we used a required minimum cut-off size; organoid images were analyzed using ImageJ. Data were collected from three biological replicate experiments, with a minimum of 2–3 technical replicates for each population in each experiment.

### Renal grafting

Request a detailed protocol

For tissue reconstitution experiments, urogenital mesenchyme (UGM) cells were collected from embryonic day 18.5 rat embryos as described (Chua et al., 2018) and passed through a 100 μM filter (Pluriselect) before use. Sorted mouse epithelial cells were used at ranges from 250 to 60,000 cells depending on the specific population; since basal cells have been previously examined in graft experiments (e.g., Wang et al., 2013), we did not investigate the minimum number of basal cells required for graft growth in these experiments. Rat UGM cells and sorted mouse epithelial cells were combined at pre-determined ratios (e.g., 250,000 UGM/5,000 LumP cells) and resuspended in 10–15 μl buttons composed of 9:1 collagen 1 (Corning 354249):setting solution (10x EBSS, 0.2 M NaHCO3, and 50 mM NaOH). After solidification of the collagen, the buttons were incubated in DMEM media with 10% FBS and 10 nM DHT overnight, followed by grafting under the kidney capsule of host immunodeficient mice on the next day. At the time of surgery, a slow-release testosterone pellet (12.5 mg testosterone, 90 day release; Innovative Research of America NA-151) was inserted subcutaneously in each host mouse. Grafts were analyzed 8–12 weeks after surgery.

### Histology, immunostaining, and image analysis

Request a detailed protocol

For generation of paraffin blocks, prostate tissues were dissected in ice-cold HBSS and fixed in 10% formalin overnight, followed by processing through an ethanol gradient and embedding. For generation of frozen blocks, dissected tissues were fixed in 4% paraformaldehyde, immersed in sucrose overnight (30% in 1x PBS), embedded in OCT (Tissue-Tek 4583), and stored at ?80°C. Alternatively, samples were flash-frozen in 2-methyl-butane (Sigma-Aldrich M32631) at ?150°C for 1 hr, then stored at ?80°C. Paraffin-embedded tissues were sectioned using a MICROM HM 325 microtome, and cryo-preserved tissues were sectioned using a Leica CM 1900 cryostat, at thicknesses of 5–13 μm.

For histological analyses, hematoxylin-eosin (H and E) staining was performed on paraffin sections using standard procedures. For immunofluorescence staining of paraffin sections, antigen retrieval was performed by boiling slides in citrate-based or Tris-based antigen unmasking buffer (Vector Labs H3300 and H3301) for 15-45 min. For immunofluorescence of cryosections, slides were rapidly fixed in either 4% paraformaldehyde or 10% NBF for 5 min after sectioning. Slides were washed, blocked in 5% animal serum for 1 hr, and incubated with primary antibodies overnight at 4°C. Slides were washed and incubated with Alexa Fluor secondary antibodies (Life Technologies) for one hour. Sections were stained with DAPI, and mounted (Vector Labs H-1200). Fluorescent images were acquired using a Leica TCS SP2, a Leica TCS SP5, or a Nikon Ti Eclipse inverted confocal microscope.

### Electron microscopy

Request a detailed protocol

Prostate tissue from a C57BL/6 mouse at 8 weeks of age was dissected and fixed for 2 hr in 0.1 M sodium cacodylate buffer (pH 7.2) containing 2.5% glutaraldehyde and 2% paraformaldehyde. A portion of the AP lobe at the proximal-distal boundary was micro-dissected and post-fixed for two hours with 1% osmium tetroxide, contrasted with 1% aqueous uranyl acetate, dehydrated using an ethanol gradient and embedded in EMbed 812 (Electron Microscopy Sciences, Hatfield, PA). 70 nm ultrathin sections were cut, mounted on formvar coated slot copper grids, and stained with uranyl acetate and lead citrate. Stained grids were imaged with a Zeiss Gemini300 scanning electron microscope using the STEM detector.

### Statistical analysis

Request a detailed protocol

Prism v.8 was used for statistical analyses of functional data and for plot generation. For analyses of organoid formation efficiencies, the data passed the Shapiro-Wilk test for normal distribution (p>alpha = 0.05), but did not pass the Bartlett’s test for equal variance (p<0.05). We therefore used the Brown-Forsythe and Welch One-way Analysis of Variance (ANOVA) to confirm statistically significant differences between organoid populations (p<0.0001 for both HM and ENR conditions). Since all populations have fewer than 50 data points per sample, we used the Dunnett’s T3 multiple comparisons test to determine which populations significantly differ (p<0.05). The p-values on the graphs indicate the least significant difference observed between compared populations.

For analyses of graft efficiency, the data did not pass the Shapiro-Wilk test for normal distribution (p<a = 0.05), or Bartlett’s test for equal variance (p<0.05). We therefore used a non-parametric Kruskal-Wallis test to confirm statistically significant differences between graft input cell numbers by epithelial population (p<0.0001), followed by the two-stage linear step-up procedure of Benjamini, Krieger and Yekutieli (p<0.05). The p-value on the graph indicates the average significant difference between the up to 10 lowest input cell numbers for each distal luminal population compared to the LumP population.

### Bioinformatic analysis of single-cell RNA-seq data

#### Outline of analytical pipeline

Request a detailed protocol

We first present an overview of our analytical workflow for processing the single-cell RNA-seq data, and describe these steps in detail below. All of the code used in our analysis is publicly available, and links to these packages are provided.

1. Filtering of raw sequencing data

2. Batch effect correction (for aggregation of datasets; https://satijalab.org/seurat/) and processing of data using Randomly (https://rabadan.c2b2.columbia.edu/html/randomly/)

3. Clustering of data using the Leiden algorithm (https://scanpy.readthedocs.io/en/stable)

4. Dimensional reduction for visualization by tSNE and UMAP plots (incorporated as part of the Randomly package)

#### Filtering the expression matrix

Request a detailed protocol

The starting pool of cells in the mouse prostate analysis is 13,429 cells, which is composed of two whole prostate samples of 2361 and 2927 cells, and 4 samples corresponding to each of the lobes at 2735 (AP), 1781 (DP), 2044 (LP), and 1581 (VP) cells. The starting pool of cells for human prostate analyses is 6728 cells coming from three independent samples of 2303, 1600 and 2825 cells each. When filtering the data, we removed cells with less than 500 genes detected and cells with >10% of total transcripts derived from mitochondrial-encoded genes. The expression matrices are normalized by $log2(1+TPM)$, where $TPM$ denotes transcripts per million.

#### Batch effect correction

Request a detailed protocol

For Figure 1A, we have aggregated the two samples corresponding to the whole mouse prostate. As a first step to remove batch effects, we used the algorithm described in Stuart et al., 2019, using default parameters.

#### Random matrix theory application to single-cell transcriptomics

Request a detailed protocol

To process our single-cell data, we have used Randomly, an algorithm based on Random Matrix Theory (RMT) (Aparicio et al., 2020). This algorithm is a public Python package and can be found in https://rabadan.c2b2.columbia.edu/html/randomly/. The idea of this algorithm is based on the fact that a single-cell dataset shows a threefold structure: a random matrix, a sparsity-induced (fake) signal and a biological signal. Indeed, 95% or more of the single-cell expression matrix is compatible with being a random matrix and hence, in such a case, with being pure noise (Aparicio et al., 2020). In order to detect the part of the expression matrix compatible with noise, Randomly uses the universality properties of RMT. More specifically, let us suppose a N × P expression matrix X, where N is the number of cells and P is the number of genes, and where each column is independently drawn from a distribution with mean zero and variance $σ$, the corresponding Wishart matrix is defined as an N × N matrix:

$W=\frac{1}{P}X{X}^{T}$

The eigenvalues $λi$ and normalized eigenvectors $ψi$ of the Wishart matrix where $i=1, 2, …N$ are given by the following relation:

$W{\psi }_{i}={\lambda }_{i}{\psi }_{i}$

If X happens to be a random matrix (a matrix whose entries xij are randomly sampled from a given distribution), then W becomes a random covariance matrix and the properties of its eigenvalues and eigenvectors are described by Random Matrix Theory. Universality properties of RMT arise in the limit $N→∞, P →∞, γ=NP$ fixed. One of the consequences of universality at the level of eigenvalues $λi$, is that empirical density of states converges to the so-called Marchenko-Pastur (MP) distribution:

${\rho }_{MP}\left(\lambda \right)=\frac{1}{2\pi \gamma {\sigma }^{2}}\frac{\sqrt{\left({a}_{+}-\lambda \right)\left(\lambda -{a}_{-}\right)}}{\lambda }{\mathbb{I}}_{\left[{a}_{-},{a}_{+}\right]}$

where

${a}_{±}={{\sigma }^{2}\left(1±\sqrt{\gamma }\right)}^{2}$

and $σ$ represents the variance of the probability distribution that generates each element in the random matrix ensemble. Any deviation of the eigenvalues from MP distribution would imply that the expression matrix X is not completely random, and therefore contains a signal that could be further analyzed.

One of the key features of the Randomly algorithm is the analysis of eigenvectors. At the level of eigenvectors, RMT universality is manifested through the so-called eigenvector delocalization, which implies that the norm of the eigenvectors $ψi$ is equally distributed among all their components α:

$\left|{\psi }_{i}^{\left(\alpha \right)}\right|~\frac{1}{\sqrt{N}}$

Interestingly, the distribution of components for delocalized eigenvectors at large N approximates a Gaussian distribution with mean zero and 1/N variance

$f\left(\psi \right)~\frac{N}{\sqrt{2\pi }}{e}^{\left(\frac{-N{\psi }^{2}}{2}\right)}$

The presence of any localized (non-delocalized) eigenvector implies that expression matrix X is not completely random, and hence the existence of a signal that carries information.

However, in single-cell datasets, there is a very important subtlety due to the sparsity, which can generate a fake signal (Aparicio et al., 2020). At the single-cell analysis level, the presence of localized eigenvectors related with sparsity implies the existence of an undesired (fake) signal. The Randomly algorithm is able to eliminate the sparsity-induced (fake) signal and isolate the biological signal.

In Figure 1—figure supplement 2, we show one example of the performance of the Randomly algorithm. Figure 1—figure supplement 2E shows a comparison of the eigenvector localization with and without sparsity-induced signal in one of the single-cell mouse datasets. The number of eigenvectors that carries signal is larger for the case with sparsity-induced signal. After removal of the fake signal due to sparsity, Figure 1—figure supplement 2F shows the distribution of eigenvalues and the fraction behaving in agreement with the MP distribution. More than 95% of the expression matrix is compatible with a random matrix and therefore is equivalent to random noise. In Figure 1—figure supplement 2G and H, the algorithm projects the original expression matrix into the signal-like eigenvectors and the noise-like eigenvectors and performs a chi-squared test for the variance (normalized sample variance), which allows identification of the signal-like genes based on a false discovery rate.

#### Clustering

Request a detailed protocol

Clustering was performed using the Leiden algorithm (Wolf et al., 2018), with the number of clusters based on the mean silhouette score. More specifically, we performed a set of clustering performances for different Leiden resolution parameters and computed the mean silhouette score for each case. The silhouette coefficient for a specific cell is given by:

$s=\frac{b-a}{\mathrm{m}\mathrm{a}\mathrm{x}?\left(a,b\right)}$

where the $a$ is the mean distance between a cell and all the other cells of the same class, and parameter $b$ is the mean distance between a cell and all other cells in the next nearest cluster. Figure 1—figure supplement 2I shows the mean silhouette score as a function of the Leiden resolution parameter and the number of clusters for each case. The strategy we followed was to select the number of clusters that maximizes this correlation. In some cases, it could be also useful to sub-cluster some of the clusters, repeating the strategy just described for one specific cluster. The sub-clustering was used to disentangle the immune populations or the vas deferens and the seminal vesicle populations.

#### t-SNE and UMAP representations

Request a detailed protocol

In order to visualize single-cell clusters, we performed a further dimensional reduction to two dimensions by t-SNE (t-distributed Stochastic Neighbor Embedding) or UMAP (Uniform Manifold Approximation and Projection) representations, using default parameters. The tSNE, UMAP, dot-plots, and ridge-plots were generated using the visualization functions of the Randomly package (Aparicio et al., 2020).

#### Comparison of RMT with traditional pipelines based on PCA dimensional reduction

Request a detailed protocol

To show a comparison with traditional approaches based on PCA, we have followed the pipeline in a public tool (Wolf et al., 2018) often used for single-cell analysis. We have performed a PCA reduction, selecting principal components (PCs) through accumulated variance changes across the different PCs (Figure 1—figure supplement 2A). In this case, only 10 PCs are selected following this approach. After the dimensional reduction, we performed a clustering following the strategy described in the previous section (Figure 1—figure supplement 2B), selecting the number of clusters that maximize the mean silhouette score. Comparing Figure 1—figure supplement 2B and I, it is clear that the RMT generates a better clustering performance: the maximum of the silhouette score curve is larger than that generated by the traditional PCA approach, and one of these clusters is able to capture the periurethral (PrU) population (Figure 1—figure supplement 2J). On the other hand, the method based on traditional PCA is not able to capture the PrU population even if we allow for larger Leiden resolutions (Figure 1—figure supplement 2C and D).

#### Differential expression analysis

Request a detailed protocol

To test for differentially expressed genes among the different populations of prostate luminal cells, we used a t-test on the datasets after de-noising them with Randomly. The p-value was corrected for multiple hypothesis testing using the Benjamini-Hochberg procedure. We used the implementation of Wolf et al., 2018 with overestimation of the variance and comparison with a Wilcoxon test. Based on this analysis, we selected genes with a corrected p-value smaller than 0.001.

### Mouse population similarity

Request a detailed protocol

To calculate the phenotypic similitude/distance between epithelial populations in the mouse prostate, we performed an analysis based on Optimal Transport (OT) (Kolouri et al., 2017; Villani, 2003). More specifically, we used the Wasserstein-1 distance as a measure for phenotypic distance between cell populations, that is, among clusters in the latent space obtained after using Randomly. The Wasserstein-1 distance is defined as a distance function between probability distributions in a given metric space. Assuming that the metric space is Euclidean, the Wasserstein-1 distance (Kolouri et al., 2017; Villani, 2003) is defined as:

${W}_{1}\left(\mu ,\nu \right)=min\left\{{\int }_{{\mathbb{R}}^{d}×{\mathbb{R}}^{d}}^{\text{?}}‖y?x‖\text{?}\gamma \left(dx,dy\right):\text{?}\gamma \in couplings\left(\mu ,\nu \right)\right\}$

It is also known as the earth mover’s distance, in which each probability distribution can be seen as an amount of dirt piled in the metric space, with the Wasserstein distance corresponding to the cost of turning one pile into the other. In our case, we would be evaluating the cost of transforming one population into another.

The optimization of the Wasserstein calculation can be turned into an OT problem, based on the Sinkhorn algorithm and the entropic regularization technique (Altschuler et al., 2017; Chizat et al., 2018; Cuturi, 2013; Schmitzer, 2016). We have used the Python implementation of the package POT Python Optimal Transport library (https://github.com/rflamary/POT), which solves the entropic regularization OT problem and return the loss ($W1(μ,ν)$). We have used as metric cost matrix a Euclidean pairwise distance matrix and assumed that the cell populations correspond to uniform probability distributions defined in the latent space obtained after using Randomly.

To calculate the distances between populations, we constructed a matrix of Wasserstein distances among the epithelial populations described in Figure 1 and visualized it using a heatmap and a hierarchical clustering (Figure 3A). We also generated a tree-like visualization of all the information contained in the hierarchical clustering/heatmap using a neighbor joining algorithm (Schliep, 2011). The length of the branches in the tree is measured in units of the Wasserstein-1 distance (Figure 3B).

### Cross-species analysis

Request a detailed protocol

We performed a comparison between the epithelial populations in human and mouse based on OT and Wasserstein distance. To harmonize the human and mouse datasets, we first constructed a common latent space between the aggregated mouse data set and each of the three human samples. To that end, we first looked for the mouse orthologous genes, and then normalized mouse and human separately using $log2(1+TPM$). We filtered out any gene which has an average expression smaller than 0.1 for human or mouse, and merged the two corresponding human and mouse datasets. Finally, we used Randomly to generate the common latent space.

We used the Wasserstein distance to calculate the similitude among the clusters of points previously identified with the different mouse and human populations in Figures 1 and 4. We visualized this with a set of nested heatmaps (Figure 4D–F) to make explicit which populations have the minimum Wasserstein distance between each human population and mouse populations.

We then validated the accuracy of this strategy. The first validation test is that the conserved epithelial populations Basal and Lum P in human have a minimum in the Wasserstein distance with their mouse equivalents. A second test of the robustness is to compare cell types that are known to be well-conserved across species, such as immune cells. As with the conserved epithelial cell types, the human immune cell populations also have a minimum Wasserstein distance with respect to the corresponding mouse immune populations.

## References

1. 1
Near-linear time approximation algorithms for optimal transport via sinkhorn iteration
(2017)
Advances in Neural Information Processing Systems. pp. 1961–1971.
2. 2
3. 3
4. 4
Expression signature of the mouse prostate (2005)
Journal of Biological Chemistry 280:36442–36451.
https://doi.org/10.1074/jbc.M504945200
5. 5
6. 6
7. 7
8. 8
9. 9
10. 10
11. 11
12. 12
13. 13
14. 14
Development of the human prostate (2018)
Differentiation; Research in Biological Diversity 103:24–45.
https://doi.org/10.1016/j.diff.2018.08.005
15. 15
Sinkhorn distances: lightspeed computation of optimal transport
(2013)
Advances in Neural Information Processing Systems.
16. 16
17. 17
18. 18
19. 19
20. 20
21. 21
22. 22
23. 23
Anatomy and histology of the human and murine prostate (2018)
Cold Spring Harbor Perspectives in Medicine 8:a030346.
https://doi.org/10.1101/cshperspect.a030346
24. 24
25. 25
26. 26
27. 27
28. 28
29. 29
30. 30
31. 31
32. 32
33. 33
34. 34
35. 35
36. 36
Single-cell RNA-Seq analysis identifies a putative epithelial stem cell population in human primary prostate cells in monolayer and organoid culture conditions
(2019)
American Journal of Clinical and Experimental Urology 7:123–138.
37. 37
38. 38
39. 39
40. 40
41. 41
42. 42
43. 43
44. 44
45. 45
46. 46
47. 47
48. 48
49. 49
50. 50
51. 51
52. 52
Topics in Optimal Transportation, 58
(2003)
American Mathematical Soc.
53. 53
54. 54
55. 55
56. 56
57. 57
58. 58
59. 59
Cells of origin for prostate Cancer (2019)
Advances in Experimental Medicine and Biology 1210:67–86.
https://doi.org/10.1007/978-3-030-32656-2_4
60. 60
61. 61
62. 62

## Decision letter

1. Nima Sharifi
Reviewing Editor; Cleveland Clinic, United States
2. Richard M White
Senior Editor; Memorial Sloan Kettering Cancer Center, United States
3. Nima Sharifi
Reviewer; Cleveland Clinic, United States
4. Mark Rubin
Reviewer
5. Justin D Lathia
Reviewer; Cleveland Clinic, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

Your article identifies distinct luminal epithelial and periurethral populations of the adult mouse prostate using a single-cell sequencing approach. Furthermore, you identify the human prostate counterparts to these distinct populations of mouse prostate cells, which together advances the our understanding of the relationship between human and mouse prostate populations.

Decision letter after peer review:

Thank you for submitting your article "A single-cell atlas of the mouse and human prostate reveals heterogeneity and conservation of epithelial progenitors" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Nima Sharifi as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Richard White as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Mark Rubin (Reviewer #2); Justin D Lathia (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (http://www.asadoresteatre.com/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.

Summary:

This study is a well-done single-cell comparative atlas of the mouse and human prostate. Their studies were from two whole mouse prostates, 4 samples corresponding to each of the lobes and 3 independent sets of human prostate for comparison. Altogether, they identify lobe-specific luminal epithelial populations (LumA, LumD, LumL, and LumV) in the distal region and the proximally-enriched luminal population (LumP) that is not specific to any lobe and a periurethral population (PrU) that has both basal and luminal features. Organoid studies suggest that LumP and PrU cells are multipotent progenitors. Comparisons to the human prostate suggest that the mouse lateral prostate is most like the human peripheral zone – from which the majority of prostate cancers arise. Altogether, this is generally a well done study. The nature of the study is fundamental, the paper is generally well written and the studies themselves generally well-described. All three reviewers concur about the importance of this body of work.

Essential revisions:

Most of the comments pertain to optimizing the clarity of presentation and data accessibility. Please see the specific reviewer comments below.

1) Title. A title referencing the work as an atlas should make the atlas more publicly accessible and reproducible than an expression matrix. Either remove atlas from the title/text or consider making this group of work more accessible to the community so that it can be utilized as an atlas. One way to do this would be to place these data into a shiny app or some other sort of online single cell viewing portal (UCSC cell browser, or the broad institute single cell portal). This is especially important because you use Randomly and many people will place this data directly into other pipelines such as Seurat and the clustering will be different so people may not be able to query your exact populations without further information from your group. Following this the entirety of the code should be published with this manuscript since a newer method clustering is used and there is no single package that this data is being ran through.

2 Subsection “Distinct luminal epithelial populations in the mouse prostate” – please refer how the samples were dissociated for single cell seq here as this is important for future studies.

3) Subsection “Distinct luminal epithelial populations in the mouse prostate” – The Randomly package is explained and referenced but then in the Materials and methods section there are almost 3 pages about Randomly and the equations used. I believe it is enough to say at which steps you implemented the use of this published package and then move on from here. Additionally, please briefly mention that batch effect correction was done and how. I know this is in the Materials and methods but in general one should be able to read in 2-3 sentences the sequencing depth, method of data alignment, batch correction, use of randomly, and then downstream analysis. Extra detail about each of these steps and the exact settings used if not default settings can be added into the Materials and methods.

4) Subsection “Distinct luminal epithelial populations in the mouse prostate” – – it's my belief that these tSNE plots should be UMAPs to show your clusters better. For example, Figure 1A has intermixing of luminal A and D but viewing this as a tSNE I don't know if these clusters are mixed because they are similar and there is a population of LumD that appear more similar to LumA. There is also a PrU cell that appears in the center of the plot away from the rest of the PrU cells. I think these should be re-graphed as UMAP plots for the whole paper. At the very least they should be in the supplement as UMAPs but I suspect they will represent your data better and you may want to swap them once they are graphed.

5) The description and biology of the cellular constituents of the mouse prostate and comparison to human anatomy are well done. In human, the normal prostate anatomic origin of prostate cancer is well worked out. This group has pioneer several different mouse prostate cancer models. Given the profiling in the current paper, can they comment or show data on the cellular origin of mouse prostate cancer? The genetic models may have divergent origins and may be multifocal; however, one wonders if they bear some resemblance to LumP or PrU cells, at least more so compared to the other cell and anatomic types.

6) Subsection “Spatial localization and morphology of epithelial populations” – "suitable antibodies" is vague and needs to be expanded on what makes them suitable.

7) Subsection “Spatial localization and morphology of epithelial populations” – "specificity" for their markers also should be expanded on, without knockdown and knockout mice you can't prove the antibody is specific. It is simply enough to say that we utilized and stained with these antibodies and demonstrated that there was little or no background on the secondary only controls.

8).Subsection “Functional analysis of epithelial populations” – Figure 3C is unclear what these flow plots actually represent? What was the starting population or flow and what gates preceded these representative panels. Include this in the text and figure legend.

9) Discussion, first paragraph – function properties are noted and it would be nice to see differential gene expression showed for the groups and the significantly expressed differences between groups should be published as supplementary tables.

10) Subsection “Data availability”. Utilizing new methods of single cell analysis and calling the data an atlas and identifying new cell types necessitates that this data be available in a better format than GEO. The entirety of the code along with metadata showing which cells are in which clusters need to be made available. An added bonus would be uploading it to a portal so people can query their own genes with the cell clusters you identified but at the very minimum the above criteria should be met.

11) Referencing/novelty – There are several single cell studies in this general area, but are not referenced, and they should be (PMIDs: 31317052, 30566875, 29233929). It is acknowledged that this study is clear different, but the reader should be made aware of the current state of the field.

https://doi.org/10.7554/eLife.59465.sa1

## Author response

Essential revisions:

Most of the comments pertain to optimizing the clarity of presentation and data accessibility. Please see the specific reviewer comments below.

1) Title. A title referencing the work as an atlas should make the atlas more publicly accessible and reproducible than an expression matrix. Either remove atlas from the title/text or consider making this group of work more accessible to the community so that it can be utilized as an atlas. One way to do this would be to place these data into a shiny app or some other sort of online single cell viewing portal (UCSC cell browser, or the broad institute single cell portal). This is especially important because you use Randomly and many people will place this data directly into other pipelines such as Seurat and the clustering will be different so people may not be able to query your exact populations without further information from your group. Following this the entirety of the code should be published with this manuscript since a newer method clustering is used and there is no single package that this data is being ran through.

We thank the reviewers for this valuable comment. Since we believe that our study indeed constitutes an “atlas”, we agree that it is appropriate for us to deposit the single-cell RNA-seq data in a suitable on-line portal for ease-of-access. Consequently, we have placed our datasets in the Broad Institute Single Cell Portal, which can be accessed through the following temporary links: https://singlecell.broadinstitute.org/single_cell/study/SCP1080/anterior#study-visualize https://singlecell.broadinstitute.org/single_cell/study/SCP1081/dorsal#study-visualize

At present, these datasets can still considered to be in a “beta” format, but we will ensure that they are in a final form and are made available for public access prior to publication. We have noted that the datasets can be accessed at the Broad Institute Single Cell Portal in the subsections “Single-cell RNA-sequencing” and “Data availability”.

With respect to code availability, all of the code used in this study is publicly available and cited appropriately in the Materials and methods. While we did not use a single package to process our data, we have revised the Materials and methods description in the subsection “Outline of analytical pipeline” to outline which packages were used in sequence, and have provided links for access.

2) Subsection “Distinct luminal epithelial populations in the mouse prostate” – please refer how the samples were dissociated for single cell seq here as this is important for future studies.

In the subsection “Distinct luminal epithelial populations in the mouse prostate”, we have added text describing the dissociation protocol, and refer to the Materials and methods for details of the methodology.

3) Subsection “Distinct luminal epithelial populations in the mouse prostate” – The Randomly package is explained and referenced but then in the Materials and methods section there are almost 3 pages about Randomly and the equations used. I believe it is enough to say at which steps you implemented the use of this published package and then move on from here. Additionally, please briefly mention that batch effect correction was done and how. I know this is in the Materials and methods but in general one should be able to read in 2-3 sentences the sequencing depth, method of data alignment, batch correction, use of randomly, and then downstream analysis. Extra detail about each of these steps and the exact settings used if not default settings can be added into the Materials and methods.

In response to this critique, we have added a brief description of the analytical pipeline in the subsection “Distinct luminal epithelial populations in the mouse prostate”, and refer to the Materials and methods for details. As noted in critique #1 above, we have also simplified the description in the Materials and methods to clarify the workflow. However, we have retained most of the text on the Randomly algorithm and the accompanying Figure 1—figure supplement 2 since we believe that these descriptions are essential for understanding the advantages of our computational approach.

4) Subsection “Distinct luminal epithelial populations in the mouse prostate” – it's my belief that these tSNE plots should be UMAPs to show your clusters better. For example, Figure 1A has intermixing of luminal A and D but viewing this as a tSNE I don't know if these clusters are mixed because they are similar and there is a population of LumD that appear more similar to LumA. There is also a PrU cell that appears in the center of the plot away from the rest of the PrU cells. I think these should be re-graphed as UMAP plots for the whole paper. At the very least they should be in the supplement as UMAPs but I suspect they will represent your data better and you may want to swap them once they are graphed.

In response to this request, we have generated UMAP plots that correspond to each of the tSNE plots shown in the main figures. These UMAP plots are now shown in new supplementary figures (Figure 1—figure supplement 3 and Figure 4—figure supplement 1). We believe that these UMAP plots are complementary to the tSNE plots shown in the main figures, and that providing both sets of plots may be advantageous to some readers. With respect to Figure 1A, we note that LumA and LumD populations are indeed closely related and slightly intermingled, and remain so in the UMAP representation in the new Figure 1—figure supplement 3A.

5) The description and biology of the cellular constituents of the mouse prostate and comparison to human anatomy are well done. In human, the normal prostate anatomic origin of prostate cancer is well worked out. This group has pioneer several different mouse prostate cancer models. Given the profiling in the current paper, can they comment or show data on the cellular origin of mouse prostate cancer? The genetic models may have divergent origins and may be multifocal; however, one wonders if they bear some resemblance to LumP or PrU cells, at least more so compared to the other cell and anatomic types.

We agree with the reviewers that this is an important topic, and have added additional sentences to the last paragraph of the Discussion to state that both proximal and distal luminal cells can likely serve as cells of origin for prostate cancer. However, a comprehensive analysis of cell types of origin for prostate cancer lies well beyond the scope of the present study.

6) Subsection “Spatial localization and morphology of epithelial populations” – "suitable antibodies" is vague and needs to be expanded on what makes them suitable.

We agree that the previous description was vague, and have now added text to the subsection “Spatial localization and morphology of epithelial populations”, to describe how antibodies were selected for study.

7) Subsection “Spatial localization and morphology of epithelial populations” – "specificity" for their markers also should be expanded on, without knockdown and knockout mice you can't prove the antibody is specific. It is simply enough to say that we utilized and stained with these antibodies and demonstrated that there was little or no background on the secondary only controls.

As noted in our response to critique #6 above, we have now clarified the description of how antibodies were selected in the subsection “Spatial localization and morphology of epithelial populations”.

8) Subsection “Functional analysis of epithelial populations” – Figure 3C is unclear what these flow plots actually represent? What was the starting population or flow and what gates preceded these representative panels. Include this in the text and figure legend.

We agree with the reviewers that our previous description of flow cytometry was inadequate. Consequently, we have added the new Figure 3—figure supplement 1A to show the complete flow sorting strategy, have updated the panels in Figure 3C, and have revised the text in the subsection “Functional analysis of epithelial populations” accordingly. In addition, we now provide a control re-sorting experiment in Figure 3— figure supplement 1B to demonstrate the purity of the cell populations obtained by our sorting strategy.

9) Discussion, first paragraph – function properties are noted and it would be nice to see differential gene expression showed for the groups and the significantly expressed differences between groups should be published as supplementary tables.

In response to this critique, we have included the new Figure 1—figure supplement 4 to show an expanded dot plot for genes that are differentially expressed between the different epithelial populations, based on our single-cell RNA-seq data, and have cited this new figure in the subsection “Spatial localization and morphology of epithelial populations”. We believe that this dot plot is more useful for the reader than additional supplementary tables, since differential expression can be assessed at a glance in Figure 1—figure supplement 4.

10) Subsection “Data availability”. Utilizing new methods of single cell analysis and calling the data an atlas and identifying new cell types necessitates that this data be available in a better format than GEO. The entirety of the code along with metadata showing which cells are in which clusters need to be made available. An added bonus would be uploading it to a portal so people can query their own genes with the cell clusters you identified but at the very minimum the above criteria should be met.

As described in our response to critique #1, we have made all of our datasets available on the Broad Institute single-cell web portal. Importantly, this web portal will allow readers to query our datasets with their genes of interest. Also, as stated above, all of the code utilized in this study is publicly available. We have generated metadata for each dataset, which will be provided in GSE150692.

11) Referencing/novelty – There are several single cell studies in this general area, but are not referenced, and they should be (PMIDs: 31317052, 30566875, 29233929). It is acknowledged that this study is clear different, but the reader should be made aware of the current state of the field.

In response to this critique, we have expanded the Discussion to discuss and cite PMID 31317052 and 30566875, but not PMID 29233929, which is a study of LNCaP prostate cancer cells that we believe is not relevant for our work. We have also cited and described in lines 260-264 the new paper from the Gao laboratory (Guo et al., 2020), which was published recently in Nature Genetics.

https://doi.org/10.7554/eLife.59465.sa2

## Article and author information

### Author details

1. #### Laura Crowley

1. Department of Medicine, Columbia University Irving Medical Center, New York, United States
2. Department of Genetics and Development, Columbia University Irving Medical Center, New York, United States
3. Department of Urology, Columbia University Irving Medical Center, New York, United States
4. Department of Systems Biology, Columbia University Irving Medical Center, New York, United States
5. Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, United States
##### Contribution
Conceptualization, Funding acquisition, Investigation, Writing - original draft, Writing - review and editing, Conceptualization (data interpretation and presentation); Investigation (single-cell analysis); Investigation (urogenital anatomy); Investigation (mouse prostate analysis); Investigation (organoid assays); Investigation (renal graft assays); Investigation (statistical analysis)
##### Contributed equally with
Francesco Cambuli and Luis Aparicio
##### Competing interests
No competing interests declared
2. #### Francesco Cambuli

1. Department of Medicine, Columbia University Irving Medical Center, New York, United States
2. Department of Genetics and Development, Columbia University Irving Medical Center, New York, United States
3. Department of Urology, Columbia University Irving Medical Center, New York, United States
4. Department of Systems Biology, Columbia University Irving Medical Center, New York, United States
5. Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, United States
##### Contribution
Conceptualization, Funding acquisition, Investigation, Methodology, Writing - original draft, Writing - review and editing, Conceptualization (data interpretation and presentation); Methodology (flow cytometry); Investigation (single-cell analysis); Investigation (urogenital anatomy); Investigation (mouse prostate analysis); Investigation (flow cytometry); Investigation (organoid assays); Investigation (human prostate analysis)
##### Contributed equally with
Laura Crowley and Luis Aparicio
##### Competing interests
No competing interests declared
3. #### Luis Aparicio

1. Department of Systems Biology, Columbia University Irving Medical Center, New York, United States
2. Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, United States
3. Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, United States
##### Contribution
Conceptualization, Investigation, Methodology, Writing - original draft, Writing - review and editing, Conceptualization (data interpretation and presentation); Investigation (single-cell analysis); Investigation (computational analysis)
##### Contributed equally with
Laura Crowley and Francesco Cambuli
##### Competing interests
No competing interests declared
4. #### Maho Shibata

1. Department of Medicine, Columbia University Irving Medical Center, New York, United States
2. Department of Genetics and Development, Columbia University Irving Medical Center, New York, United States
3. Department of Urology, Columbia University Irving Medical Center, New York, United States
4. Department of Systems Biology, Columbia University Irving Medical Center, New York, United States
5. Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, United States
Department of Anatomy and Cell Biology, School of Medicine and Health Sciences, The George Washington University Cancer Center, The George Washington University, Washington, DC, United States
##### Contribution
Funding acquisition, Investigation, Writing - review and editing, Investigation (single-cell analysis)
##### Competing interests
No competing interests declared
5. #### Brian D Robinson

Department of Pathology and Laboratory Medicine, Weill Medical College of Cornell University, New York, United States
Resources
##### Competing interests
No competing interests declared
6. #### Shouhong Xuan

1. Department of Medicine, Columbia University Irving Medical Center, New York, United States
2. Department of Genetics and Development, Columbia University Irving Medical Center, New York, United States
3. Department of Urology, Columbia University Irving Medical Center, New York, United States
4. Department of Systems Biology, Columbia University Irving Medical Center, New York, United States
5. Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, United States
##### Contribution
Investigation, Writing - review and editing, Investigation (mouse prostate analysis); Investigation (renal graft assays)
##### Competing interests
No competing interests declared
7. #### Weiping Li

1. Department of Medicine, Columbia University Irving Medical Center, New York, United States
2. Department of Genetics and Development, Columbia University Irving Medical Center, New York, United States
3. Department of Urology, Columbia University Irving Medical Center, New York, United States
4. Department of Systems Biology, Columbia University Irving Medical Center, New York, United States
5. Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, United States
##### Contribution
Investigation, Investigation (mouse prostate analysis)
##### Competing interests
No competing interests declared
8. #### Hanina Hibshoosh

1. Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, United States
2. Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, United States
Resources
##### Competing interests
No competing interests declared
9. #### Massimo Loda

Department of Pathology and Laboratory Medicine, Weill Medical College of Cornell University, New York, United States
Resources
##### Competing interests
No competing interests declared

1. Department of Systems Biology, Columbia University Irving Medical Center, New York, United States
2. Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, United States
3. Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, United States
##### Contribution
Conceptualization, Supervision, Funding acquisition, Writing - review and editing, Conceptualization (study design)
##### For correspondence
rr2579@cumc.columbia.edu
##### Competing interests
No competing interests declared
11. #### Michael M Shen

1. Department of Medicine, Columbia University Irving Medical Center, New York, United States
2. Department of Genetics and Development, Columbia University Irving Medical Center, New York, United States
3. Department of Urology, Columbia University Irving Medical Center, New York, United States
4. Department of Systems Biology, Columbia University Irving Medical Center, New York, United States
5. Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, United States
##### Contribution
Conceptualization, Supervision, Funding acquisition, Writing - original draft, Writing - review and editing, Conceptualization (study design); Conceptualization (data interpretation and presentation)
##### For correspondence
mshen@columbia.edu
##### Competing interests
No competing interests declared

### Funding

#### National Cancer Institute (R01CA238005)

• Michael M Shen

#### National Cancer Institute (U54CA193313)

• Michael M Shen

#### National Cancer Institute (P50CA211024)

• Massimo Loda
• Michael M Shen

• Maho Shibata

#### T.J. Martell Foundation

• Michael M Shen

#### U.S. Department of Defense (Prostate Cancer Research Program W81XWH-18-1-0424)

• Francesco Cambuli

#### National Science Foundation (DGE-16-44869)

• Laura Crowley

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

### Acknowledgements

We thank Reuben Akabas, Sarah Bergren, Mykola Bordyuh, Eva Leung, Bo Li, Maximilian Marhold, Agnieszka Pastula, Luis Pina, Roxanne Toivanen, Zixuan Wang, and Sven Wenske for their assistance with this project, and Cory Abate-Shen and Jia Li for comments on the manuscript. We thank Erin Bush and Peter Sims for assistance with single-cell sequencing in the Columbia Genomics and High Throughput Screening Shared Resource of the Herbert Irving Comprehensive Cancer Center, as well as the resources of the Cancer Center Flow Core Facility, which are supported in part by the Cancer Center Support Grant P30CA013696. We also thank Michael Kissner and Daniel Troast for assistance with flow cytometry in the Columbia Stem Cell Initiative Flow Cytometry Core, which is supported in part by S10OD026845. For assistance with electron microscopy, we thank Alice Liang, Chris Petzold, and Joseph Sall at the New York University Langone Health DART Microscopy Lab, which is partially funded by NYU Cancer Center Support Grant P30CA016087 and by S10OD019974. These studies were supported by grants from the NIH R01CA238005 (MMS), U54CA193313 (RR and MMS), P50CA211024 (ML and MMS), K99CA194287 (MS), by the TJ Martell Foundation (MMS), and by fellowships from the Department of Defense Prostate Cancer Research Program (W81XWH-18-1-0424; FC) and the National Science Foundation (DGE-16-44869;?LC).

### Ethics

Human subjects: Human prostate tissue specimens were obtained from patients undergoing cystoprostatectomy for bladder cancer or radical prostatectomy at Columbia University Irving Medical Center or at Weill Cornell Medicine. Patients gave informed consent under an Institutional Review Board-approved protocol (AAAN8850).

Animal experimentation: Animal studies were conducted according to protocols (AC-AABE0556, AC-AABG0564, AC-AABE5557) approved by the Columbia University Irving Medical Center (CUIMC) Institutional Animal Care and Use Committee (IACUC).

### Senior Editor

1. Richard M White, Memorial Sloan Kettering Cancer Center, United States

### Reviewing Editor

1. Nima Sharifi, Cleveland Clinic, United States

### Reviewers

1. Nima Sharifi, Cleveland Clinic, United States
2. Mark Rubin
3. Justin D Lathia, Cleveland Clinic, United States

### Publication history

2. Accepted: September 10, 2020
3. Accepted Manuscript published: September 11, 2020 (version 1)
4. Version of Record published: October 1, 2020 (version 2)

? 2020, Crowley et al.

## Metrics

• 1,495
Page views
• 269
• 2
Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

A two-part list of links to download the article, or parts of the article, in various formats.

### Open citations (links to open the citations from this article in various online reference manager services)

1. Cancer Biology

# eIF4E S209 phosphorylation licenses myc- and stress-driven oncogenesis

Hang Ruan et al.
Research Article Updated

To better understand a role of eIF4E S209 in oncogenic translation, we generated EIF4ES209A/+ heterozygous knockin (4EKI) HCT 116 human colorectal cancer (CRC) cells. 4EKI had little impact on total eIF4E levels, cap binding or global translation, but markedly reduced HCT 116 cell growth in spheroids and mice, and CRC organoid growth. 4EKI strongly inhibited Myc and ATF4 translation, the integrated stress response (ISR)-dependent glutamine metabolic signature, AKT activation and proliferation in vivo. 4EKI inhibited polyposis in ApcMin/+ mice by suppressing Myc protein and AKT activation. Furthermore, p-eIF4E was highly elevated in CRC precursor lesions in mouse and human. p-eIF4E cooperated with mutant KRAS to promote Myc and ISR-dependent glutamine addiction in various CRC cell lines, characterized by increased cell death, transcriptomic heterogeneity and immune suppression upon deprivation. These findings demonstrate a critical role of eIF4E S209-dependent translation in Myc and stress-driven oncogenesis and as a potential therapeutic vulnerability.

1. Cancer Biology

# Inhibiting IRE1α-endonuclease activity decreases tumor burden in a mouse model for hepatocellular carcinoma

Nata?a Pavlovi? et al.
Research Article Updated

Hepatocellular carcinoma (HCC) is a liver tumor that usually arises in patients with cirrhosis. Hepatic stellate cells are key players in the progression of HCC, as they create a fibrotic micro-environment and produce growth factors and cytokines that enhance tumor cell proliferation and migration. We assessed the role of endoplasmic reticulum (ER) stress in the cross-talk between stellate cells and HCC cells. Mice with a fibrotic HCC were treated with the IRE1α-inhibitor 4μ8C, which reduced tumor burden and collagen deposition. By co-culturing HCC-cells with stellate cells, we found that HCC-cells activate IREα in stellate cells, thereby contributing to their activation. Inhibiting IRE1α blocked stellate cell activation, which then decreased proliferation and migration of tumor cells in different in vitro 2D and 3D co-cultures. In addition, we also observed cell-line-specific direct effects of inhibiting IRE1α in tumor cells.