# The birth of a bacterial tRNA gene by large-scale, tandem duplication events

1. Department of Evolutionary Theory, Max Planck Institute for Evolutionary Biology, Germany
2. Asia Pacific Center for Theoretical Physics, Republic of Korea
Research Article

## Abstract

Organisms differ in the types and numbers of tRNA genes that they carry. While the evolutionary mechanisms behind tRNA gene set evolution have been investigated theoretically and computationally, direct observations of tRNA gene set evolution remain rare. Here, we report the evolution of a tRNA gene set in laboratory populations of the bacterium Pseudomonas fluorescens SBW25. The growth defect caused by deleting the single-copy tRNA gene, serCGA, is rapidly compensated by large-scale (45–290 kb) duplications in the chromosome. Each duplication encompasses a second, compensatory tRNA gene (serTGA) and is associated with a rise in tRNA-Ser(UGA) in the mature tRNA pool. We postulate that tRNA-Ser(CGA) elimination increases the translational demand for tRNA-Ser(UGA), a pressure relieved by increasing serTGA copy number. This work demonstrates that tRNA gene sets can evolve through duplication of existing tRNA genes, a phenomenon that may contribute to the presence of multiple, identical tRNA gene copies within genomes.

## Introduction

Even though tRNAs perform the same canonical function in all organisms – decoding 61 sense codons into 20 amino acids – tRNA gene sets vary considerably across the tree of life (Fujishima and Kanai, 2014; Marck and Grosjean, 2002). Two aspects in which they vary are the types of tRNAs encoded and the number of gene copies encoding each type. Bacterial tRNA complements typically contain 28–46 types of tRNA, encoded by 28–120 genes (Chan and Lowe, 2016). Elucidating the factors contributing to, and the molecular mechanisms behind, the evolution of these variations has been of long-standing interest in biology. There is general agreement that bacterial tRNA gene sets are, in conjunction with the rest of the translational machinery, shaped by selection for rapid and accurate protein synthesis (‘translational efficiency’; reviewed in Gingold and Pilpel, 2011). Efficient translation is an important determinant of bacterial growth rate, with more efficient protein production enabling faster growth and division (Kurland, 1996; Kurland and Ehrenberg, 1987). tRNAs mainly contribute to translational efficiency during elongation, the stage of translation where codons are sequentially matched to aminoacylated (charged) tRNAs (reviewed in Gingold and Pilpel, 2011; Rodnina, 2018). Codon-tRNA matching occurs by a trial and error process; tRNAs – in the form of ternary complexes (Bensch et al., 1991) – are stochastically sampled from the available pool. The speed with which a matching tRNA is selected depends on the absolute and relative concentration of tRNAs that match the codon; codons matched by more abundant tRNAs are expected, on average, to be translated more quickly than those matched by rarer tRNAs (reviewed in Plotkin and Kudla, 2011). Given that the formation of codon-tRNA matches by stochastic sampling is the rate-limiting step of elongation (Varenne et al., 1984), any factors affecting the matching process are likely to influence the evolution of bacterial tRNA gene sets. Examples include variations in codon-tRNA matching patterns and codon bias. Both of these are discussed in more detail below.

Codon-tRNA matching patterns are complex; some tRNAs match, and hence translate, more than one synonymous codon (Crick, 1966; Ikemura, 1981). This expanded translational capacity is the result of relaxed base pairing between the first base of the tRNA anticodon (tRNA position 34) and third base of the codon (codon position 3). In this binding position, a number of non-standard, ‘wobble’ pairings are permitted (reviewed in Agris et al., 2018). Most wobble pairings involve post-transcriptionally, enzymatically modified bases in the anticodon stem-and-loop region of tRNAs (Boccaletto et al., 2018; Machnicka et al., 2016). These post-transcriptional modifications affect the binding capacity and/or accuracy of a large fraction of bacterial tRNAs (Bj?rk and Hagervall, 2014; Manickam et al., 2016). Hence, the set of post-transcriptional modification pathways active within a bacterial species affects the codon–tRNA matching pattern and, in turn, is expected to influence tRNA gene set composition. Indeed, various post-transcriptional modification pathways have been shown to co-vary with tRNA repertoires (Diwan and Agashe, 2018; Grosjean et al., 2010; Novoa et al., 2012).

Codon bias refers to the preferential use of some synonymous codons over others. A number of hypotheses exist regarding the evolutionary origins and consequences of preferred codons (reviewed in Novoa and Ribas de Pouplana, 2012; Plotkin and Kudla, 2011), one of which is the optimization of codon–tRNA matching during elongation (Berg and Kurland, 1997; Bulmer, 1991; Bulmer, 1987; Higgs and Ran, 2008; Rocha, 2004). There are several lines of support for this hypothesis. Firstly, different codons have been demonstrated to be translated at different rates in yeast, with more frequent codons generally being translated more quickly (Gardin et al., 2014). Secondly, codon and tRNA abundances have been observed to co-vary across many bacterial genomes (reviewed in Ikemura, 1985), particularly under conditions where rapid translation is required (e.g., during rapid growth [Dong et al., 1996; Emilsson and Kurland, 1990], in highly expressed genes [Ikemura, 1981], or among bacteria with faster growth rates [Sharp et al., 2010]). Thirdly, a number of studies have demonstrated an increase in protein expression when codon–tRNA co-variation is strengthened, either by optimizing synonymous codon use (S?rensen et al., 1989; Zhou et al., 2004) or by the addition of exogenous tRNAs (Gu et al., 2004; Misra and Reeves, 1985).

As outlined above, two factors expected to affect tRNA gene set evolution are the presence of tRNA post-transcriptional modification pathways and codon bias; post-transcriptional modifications heavily influence codon–tRNA matching patterns (affecting the?tRNA types that?are encoded), while codon bias dictates the translational demand for individual tRNA types (influencing tRNA abundances and hence tRNA gene copy number). Overall, theoretical and computational studies are consistent with the optimization of translational efficiency by the streamlining of tRNA gene sets; bacterial growth rate correlates with fewer tRNA types encoded by more gene copies (Ran and Higgs, 2010; Rocha, 2004).

In addition to factors influencing the evolution of tRNA gene sets, the mechanisms behind their evolution are an area of interest. Hypothetically, tRNA gene sets can evolve by several different mechanisms. Surplus tRNA genes may be lost (by deletion), while tRNA genes may be acquired from external sources (by horizontal gene transfer), or from within the genome (by duplication events). Additionally, the identity of existing tRNA genes may be altered by the acquisition of anticodon mutations (anticodon switching). Thus far, most evidence for the above routes of tRNA gene set evolution is indirect: phylogenetic analyses provide evidence of the flexibility of bacterial tRNA gene sets by loss, gain (both by horizontal gene transfer and duplication), and anticodon switch events (Diwan and Agashe, 2018; McDonald et al., 2015; Tremblay-Savard et al., 2015; Wald and Margalit, 2014; Withers et al., 2006). An anticodon switch event has also been directly observed in laboratory yeast; Saccharomyces cerevisiae populations in which the gene encoding tRNA-Arg(CCU) had been removed were repeatedly rescued by a C→T mutation in one of eleven gene copies encoding tRNA-Arg(UCU) (Yona et al., 2013). While the aforementioned study demonstrates the power of experimental evolution to provide insight into the evolution of tRNA gene sets, there remains a shortage of empirical studies directly investigating the evolution of bacterial tRNA gene sets and translation.

To address this, we (i) engineer a suboptimal bacterial tRNA gene set, (ii) compensate the defect using experimental evolution, and (iii) determine the genetic and molecular bases of compensation. More specifically, we delete the single-copy tRNA gene, serCGA, from the bacterium Pseudomonas fluorescens SBW25. We compensate the resulting growth defect during a 13-day serial transfer evolution experiment and show that the genetic basis of compensation is large-scale (45–290 kb), tandem duplications encompassing a second tRNA gene (serTGA). Using a bacterial adaptation of YAMAT-seq (a method of mature tRNA pool deep-sequencing originally developed for use in human cell lines [Shigematsu et al., 2017]), we demonstrate that each duplication event is accompanied by an increase in tRNA-Ser(UGA) in the mature tRNA pool. Finally, we develop a model that combines our experimental results with the predicted effects of codon–tRNA matching patterns and codon bias, to provide a molecular explanation of how the observed tRNA pool changes may affect translation and growth.

## Results

### The P. fluorescens SBW25 tRNA gene set

Isolated from the leaf of a sugar beet plant, P. fluorescens SBW25 is a non-pathogenic bacterium that is frequently used as a model system in evolutionary biology. The SBW25 genome (Silby et al., 2009) is predicted by GtRNAdb 2.0 (Chan and Lowe, 2016) to contain 67 tRNA genes (Figure 1A; Supplementary file 1). The RNA product of one of these genes (cysGCA-2) is predicted to form a secondary structure that deviates significantly from the cloverleaf structure typical of canonical tRNAs (Chan and Lowe, 2019). Hence, the SBW25 tRNA gene set consists of 66 canonical tRNA genes. These encode 39 different tRNA types, 14 of which are encoded by multiple (between two and five) gene copies. Of these 14 types,?12 are encoded by gene copies that are identical in sequence; only tRNA-Asn(GUU) and tRNA-fMet(CAU) are encoded by multiple gene copies with different sequences.

Figure 1 with 1 supplement see all

The 39 different tRNA types identified in SBW25 must together be capable of translating all 61 sense codons. To investigate SBW25 codon–tRNA matching patterns, a combination of current wobble rules (reviewed in Agris et al., 2018) and tRNA post-transcriptional modification prediction tools (Boccaletto et al., 2018; Machnicka et al., 2016; Panwar and Raghava, 2014) was applied. Evidence for the translation of multiple synonymous codons was found for 28 of 39 tRNA types (Supplementary file 1). The proposed SBW25 codon–tRNA matching patterns indicate that the 39 tRNA types consist of a core set of 33 essential types (thatwhich together are theoretically capable of translating all 61 codons) and six non-essential types (whose cognate codons are also?predicted to be translated by an essential tRNA type; Figure 1A; Supplementary file 1). The six non-essential types are candidates for suboptimal tRNA gene set construction; their apparent functional redundancy suggests that they can be eliminated, while their retention indicates that elimination may be detrimental.

With the aim of constructing a suboptimal tRNA gene set, we focussed our attention on serCGA, a single-copy gene encoding the non-essential tRNA type, tRNA-Ser(CGA) (Figure 1A). According to the predicted codon–tRNA matching pattern, the cognate codon of tRNA-Ser(CGA), UCG, can also be translated by the essential tRNA type, tRNA-Ser(UGA) (Figure 1B; Supplementary file 1). In several Gram-negative bacteria, tRNA-Ser(UGA) is post-transcriptionally modified at U34 to 5-methoxycarbonylmethoxyuridine (mcmo5U34; Boccaletto et al., 2018), resulting in the expansion of translational capacity from codon UCA to include UCG and UCU (Takai et al., 1999a; Takai, 1996; see Figure 1—figure supplement 1C). The mcmo5U34 modification is performed by the CmoA/B/M enzymatic pathway (Bj?rk and Hagervall, 2014; Sakai et al., 2016). Homologues of these enzymes are present in SBW25; protein BLAST searches of Escherichia coli MG1655 CmoA, CmoB, and CmoM against the SBW25 proteome give significant hits to Pflu1067, Pflu1066, and Pflu0633, respectively (BLASTp e-values?<1e-50; Altschul et al., 1990).

CmoA/B/M-mediated expansion of tRNA-Ser(UGA) translational capacity to include codon UCG is expected to rescue a serCGA deletion mutant. Such a rescue event would require all UCG (and UCA) codons to be translated by tRNA-Ser(UGA). Given that UCG is a relatively high use codon in SBW25 – accounting for 22.5% of serine and 1.31% of all codons (Figure 1B) – serCGA deletion is expected to considerably increase translational demand for tRNA-Ser(UGA). Hence, while a serCGA deletion mutant may survive, significant translation and growth defects are anticipated.

### Deletion of serCGA limits rapid growth

To test the prediction that serCGA deletion results in a viable strain with a growth defect, the entire 90 bp, single-copy serCGA gene was removed from P. fluorescens SBW25 by two-step allelic exchange (see Supplementary file 2 for construction details). This process changed the tRNA gene set from 66 tRNA genes of 39 types to 65 tRNA genes of 38 types. The engineering process was performed two independent times, yielding biological replicate strains ΔserCGA-1 and ΔserCGA-2. A third round of allelic exchange gave rise to an engineering control strain, SBW25-eWT. Successful deletion of serCGA demonstrates that it is not essential for survival; at least one other tRNA can translate codon UCG. serCGA deletion results in an immediately obvious growth defect on King’s medium B (KB), a rich medium that supports rapid growth and hence rapid translation (King et al., 1954). Deletion of serCGA results in visibly smaller colonies on KB agar (Figure 2A). Compared with SBW25, ΔserCGA-1 shows a reduction maximum growth rate in liquid KB (two-sample t-test p=1.18×10?6, Figure 2B and C) and elongated cells during growth in liquid KB (Figure 2—figure supplement 1A). Further, both independent serCGA deletion mutants lose when grown in direct, 1:1 competition with the neutrally marked SBW25-lacZ in liquid KB (one sample t-tests p<0.0001; Figure 2D). The observed growth defect is much less pronounced in minimal media, which supports slower growth and translation; SBW25 and ΔserCGA-1 colonies are similar sizes on M9 agar (Figure 2E), and no negative effects on growth or cell morphology were detected in liquid M9 (Figure 2F and G; Figure 2—figure supplement 1B). However, direct 1:1 competitions between ΔserCGA and SBW25-lacZ indicate a slight negative effect of serCGA deletion in liquid M9 (one sample t-tests p=0.06882 and 0.00908; Figure 2H).

Figure 2 with 1 supplement see all

The results in this section demonstrate that serCGA deletion leads to a growth defect that is more pronounced in KB than M9. The generation of a viable, but suboptimal, tRNA gene set upon serCGA deletion from SBW25 is consistent with the predictions of the previous section: that serCGA is not essential for translation, but it contributes to translational speed. It should, however, be noted that serCGA deletion may increase intracellular serine levels, which could lead to toxic effects on growth. Indeed, significant increases in intracellular serine have been shown to affect the growth and division of E. coli cells growing in rich, serine-containing media (Kriner and Subramaniam, 2020; Zhang et al., 2010; Zhang and Newman, 2008).

### The growth defect is repeatedly and rapidly compensated during experimental evolution

In order to?investigate whether and how the growth defect exhibited by the serCGA deletion mutant can be compensated for genetically, a serial transfer evolution experiment was performed. This experiment consisted of eight independent lineages: W1–W4 were control lines, each founded by a wild type strain (W1 and W2 by SBW25, W3 and W4 by SBW25-eWT), while M1–M4 were founded by the serCGA deletion mutant (M1 and M2 by ΔserCGA-1, M3 and M4 by ΔserCGA-2). Each lineage was started from a single colony and maintained in liquid KB for 15 days, with daily transfer of 1% to fresh medium. Samples of each population were frozen daily.

After 13 days (~90 generations), all four mutant lineages (M1, M2, M3, and M4) showed visibly improved growth. Plating of day 13 populations on KB agar revealed that many colonies from lineages M1–M4 were larger than those of the founding serCGA deletion mutant (Figure 3A). Notably, lineage M2 showed two phenotypically distinct types of large colonies: a phenotypically standard type and an opaque type. The opaque type closely resembles the previously reported switcher phenotype, in which on–off switching of colanic acid–based capsules generates opaque-translucent colony bistability (Beaumont et al., 2009; Gallie et al., 2019; Gallie et al., 2015; Remigi et al., 2019). No obvious change was observed in the size of colonies derived from day 13 of the four wild-type lineages (Figure 3A).

Figure 3

Five colonies were isolated from the day 13 mutant lineages for further analysis. These included one standard-looking, large colony from each mutant lineage (these isolates are hereafter referred to as M1-L, M2-L, M3-L, and M4-L) and a second large, opaque colony from lineage M2 (hereafter M2-Lop). Two representative colonies were isolated from day 13 of different wild-type lineages (hereafter W1-L and W3-L). Growth analyses in liquid KB showed improved growth profiles in all five isolates from the mutant lineages (Figure 3B–D). In line with the observed improvement, each of the five mutant lineage isolates outcompeted the serCGA deletion mutant in direct, 1:1 competition (one sample t-tests p<0.01; Figure 3E). Indeed, no fitness difference was detected in 1:1 competitions between two mutant lineage isolates and neutrally?marked SBW25-lacZ, providing evidence for high levels of compensation in these genotypes (M1-L and M4-L; one sample t-tests p>0.05; Figure 3E). The other three mutant lineage isolates were outcompeted by SBW25-lacZ, indicating partial compensation (M2-L, M2-Lop, and M3-L; one sample t-tests p<0.01; Figure 3E). No changes were observed in the growth or fitness of W1-L, the control isolate from day 13 of wild-type lineage 1 (Figure 3A–E).

The results in this section demonstrate that the growth defect caused by the deletion of serCGA was repeatedly and rapidly compensated, to varying degrees, in isolates from each of the four mutant lineages on day 13 of the evolution experiment.

### Genetic basis of compensation is large duplications spanning serTGA

To determine the genetic basis of ΔserCGA compensation, Illumina whole genome sequencing was performed on the seven day 13 isolates from the previous section: two control isolates from two wild type lineages (W1-L?and W3-L) and five isolates from four mutant lineages (M1-L, M2-L, M2-Lop, M3-L, and?M4-L). In each of the five mutant lineage isolates a large, direct, tandem duplication was identified at around 4.16 Mb in the SBW25 chromosome (Figure 4A–B, Figure 4—figure supplement 1, Supplementary file 3). No evidence of any such duplications was found in either of the wild type control isolates. In addition to the large duplications, one synonymous point mutation was identified in one mutant lineage: in the carbohydrate metabolism gene edd of M2-L, codon 17 is changed from CGC to CGA (both encoding arginine; Supplementary file 3). This mutation was not identified in any other isolate, including the second isolate from the M2 lineage (M2-Lop), and is not considered likely to contribute to the compensatory phenotype of M2-L.

Figure 4 with 2 supplements see all

A combination of computational analyses, PCR, and Sanger sequencing was used to determine the precise region of duplication in four of five isolates. The duplications range in size from 45 kb (in M1-L) to 290 kb (in M4-L), and occur between 4.05 Mb and 4.34 Mb of the chromosome (Table 1). The precise location of the duplication in the fifth isolate (M3-L) could not be determined due to highly repetitive flanking DNA.

Table 1

Next, we sought to identify the region(s) within the duplications responsible for the observed gain in fitness. Closer examination revealed a?~45 kb segment that is duplicated in all five strains (4,119,923–4,164,966). This segment is predicted to contain 45 genes, including 44 protein-coding genes (none of which is obviously linked to translation, or serine transport/metabolism), and one tRNA gene: serTGA (Figure 4A, Supplementary file 4). Close manual inspection revealed no evidence of any point mutations in any copy of the serTGA gene or promoter sequence, in any of the evolved isolates (Supplementary file 5), meaning that each duplication strain contains a second, wild-type copy of serTGA. In addition, isolate M4-L carries additional copies of four other tRNA genes (argTCT, hisGTG, leuTAA, and?hisGTG; Supplementary file 4). Notably, this experiment has revealed three different tRNA gene sets that each provides a similar level of fitness in KB (see Figure 3E): SBW25 encodes 66 canonical tRNA genes of 39 tRNA types, four of five compensated isolates carry 66 tRNA genes of 38 types, and the fifth compensated isolate (M4-L) carries 70 tRNA genes of 38 types.

Given that the duplicated serTGA gene encodes tRNA-Ser(UGA), the tRNA type that can theoretically perform the function of tRNA-Ser(CGA) (see Figure 1—figure supplement 1C), it is a logical candidate for the underlying cause of compensation. However, there are also 44 protein-coding genes in the shared duplication segment, any of which could contribute to the compensatory effect. We therefore tested whether a plasmid-based increase in serTGA expression can compensate for serCGA loss. To this end, the serCGA and serTGA genes were individually amplified from SBW25, and each ligated into the expression vector pSXn (giving pSXn-CGA and pSXn-TGA; Supplementary file 2). This placed the expression of the tRNA gene under the control of an isopropyl-?-D-1-thiogalactopyranoside (IPTG)-inducible tac promoter (de Boer et al., 1983; Owen and Ackerley, 2011).

Each plasmid construct was inserted into SBW25, ΔserCGA-1, and ΔserCGA-2, and the growth of the resulting strains was analysed in the absence of the inducer (to achieve lower-level, leaky expression of the tRNA gene). Expression of either serCGA or serTGA was shown to improve the growth of the serCGA deletion mutants in rich medium; addition of pSXn-CGA or pSXn-TGA increases the maximum growth rate of ΔserCGA, while addition of empty pSXn does not (Figure 4C?and?D; two-sample t-tests). Contrastingly, expression of neither tRNA improved growth of SBW25 (Figure 4C), with serCGA expression actually leading to a decrease in SBW25 maximum growth rate (Figure 4D; one-sided two-sample t-test p=0.000158). While it should be noted that tRNA-Ser(UGA) levels resulting from pSXn-based expression are likely to exceed those resulting from an additional chromosomal copy of serTGA, the result that pSXn-based serTGA expression specifically improves the growth rate of ΔserCGA demonstrates that serTGA can provide a degree of compensation for serCGA loss. Other genes in the shared 45 kb fragment may nevertheless contribute to compensation, through unidentified mechanisms.

Thus far, our results provide strong evidence that the growth defect caused by serCGA deletion is repeatedly and rapidly compensated by one of several large-scale duplications encompassing serTGA.

### Large-scale duplications arise quickly and are heterogeneous

The duplication-carrying strains were isolated from all mutant lineages on day 13, demonstrating that the large-scale duplications occur repeatedly and rapidly. To more closely investigate the rapidity with which the duplications arose, a PCR was performed to identify the first time point at which the emergent M1-L duplication junction, M1junct1 (see Supplementary file 2), could be amplified from lineage M1 (see Figure 4B). The lineage M1 population samples frozen daily during the evolution experiment were revived and used as templates for the PCR. The M1-L duplication junction was first visible by PCR on day 3 and grew stronger as the experiment progressed (Figure 4E). Similar PCRs for the remaining duplication strains detected the presence of the relevant duplication junctions on day 2 (M2-Lop), day 4 (M3-L), and day 5 (M2-L?and M4-L) (Figure 4—figure supplement 2).

A notable feature of the duplication fragments is that they are heterogeneous. That is, each of the five compensated isolates carries a unique duplication fragment with distinct endpoints (Table 1). Further, there is evidence of within-lineage heterogeneity; two distinct duplication fragments were identified in isolates from a single lineage (M2-L?and M2-Lop; Table 1). To investigate within-lineage heterogeneity more closely, additional large colonies were isolated from day 13 of each mutant lineage and tested by PCR for the presence of the duplication junction(s) previously identified in the lineage. The results provide evidence for genotypic heterogeneity in compensated isolates from three of the four mutant lineages (M2, M3, and M4; Figure 4—figure supplement 2). Three differently?sized PCR products were detected among the four isolates from lineage M2, demonstrating the presence of at least three distinct duplication fragments in the day 13 population. A mixture of compensated genotypes was also detected in lineages M3 and M4, with duplication junctions M3junct1 and M4junct1 amplifying in only three (of four) and two (of four) isolates, respectively. No PCR products were detected for the remaining isolates in these lineages, indicating that these three isolates either carry a duplication junction that was not tested for or compensate by a different mechanism.

The results in this section demonstrate that (i) strains carrying duplication fragments arise early within the mutant lineages of the evolution experiment (within 2–5 days or?~7–35 generations) and (ii) a degree of heterogeneity exists in duplication fragments, both between and within mutant lineages. These observations suggest that a mixture of duplication strains – and, by extension, tRNA gene sets – arise and compete within each mutant lineage.

### Duplication events increase the proportion of tRNA-Ser(UGA) in the mature tRNA pool

Next, we sought to quantify the effect of serCGA deletion and subsequent serTGA duplication on the mature tRNA pool of SBW25. To this end, YAMAT-seq – an established method of deep-sequencing mature tRNA pools in human cells (Shigematsu et al., 2017) and plants (Warren et al., 2020) – was adapted for use in P. fluorescens SBW25. The YAMAT-seq procedure quantifies charged and uncharged mature tRNAs; it does not measure pre-tRNAs, or tRNA fragments. Briefly, the YAMAT-seq procedure involves (i) isolation of total RNA from exponentially growing cells, (ii) removal of amino acids from the charged fraction of mature tRNAs, rendering all (or, most) mature tRNAs uncharged, (iii) ligation of Y-shaped, DNA/RNA hybrid adapters to the conserved, exposed ends of the mature, uncharged tRNAs, (iv) reverse transcription and amplification of adapter-tRNA complexes, (v) gel purification of the PCR products, (vi) high throughput sequencing, and (vii) computational and statistical analyses.

YAMAT-seq was performed on three replicates of nine strains: wild type (SBW25), the two independent serCGA deletion mutants (ΔserCGA-1, ΔserCGA-2), and six isolates from day 13 of the evolution experiment (W1-L, M1-L, M2-L, M2-Lop, M3-L, and?M4-L). High throughput sequencing of the reverse-transcribed tRNA pools resulted in an average of 1,177,340 raw reads per sample. More than 99.99% of the raw reads fell within the range of lengths expected for tRNA-containing reads (80–150 bp), indicating good adapter-binding specificity. The?>80 bp reads for each sample were aligned to a set of 42 reference tRNA sequences, consisting of all unique tRNA gene sequences predicted in the SBW25 chromosome (including cysGCA-2; Supplementary file 6; Chan and Lowe, 2016). At the conclusion of the alignment process, an average of 1,050,749 reads per sample were aligned to the reference sequences (~89% read retention; Supplementary file 7). Within each sample, the reference sequences with the highest and lowest (above zero) read counts consistently varied by a factor of?~10,000. For example, in sample 1, 109,963 reads aligned to Gly-GCC and 13 to Ile2-CAU (Supplementary file 7).

All samples showed reads aligned to 40 or 41 (of 42) reference sequences. The reference sequences without reads were Ser-CGA (in ΔserCGA and derivatives, as expected) and Cys-GCA-2 (in all 27 samples). Together with the prediction of a non-standard secondary structure for cysGCA-2 (Chan and Lowe, 2019; Chan and Lowe, 2016), our inability to detect the Cys-GCA-2 sequence in any sample strongly indicates that Cys-GCA-2 does not contribute to translation. Overall, the YAMAT-seq results support the tRNA gene set predicted for SBW25: all 39 predicted types of mature tRNAs were detectable, including the 33 essential and the six non-essential tRNA types (see Figure 1A, Supplementary file 1).

While the YAMAT-seq results provide a useful overview of the relative abundances of tRNAs in a mature tRNA pool, within-strain comparisons of different tRNA types should be interpreted cautiously. As outlined by Shigematsu et al., 2017, variations in tRNA structural components and post-transcriptional modifications can adversely affect the relative efficiency of the reverse transcription reaction for some tRNA types, reducing their apparent proportions. In our results, three tRNA types consistently align a very low proportion of reads (<0.0001): Phe-GAA, Glu-UUC, and Ile2-CAU. A much higher proportion of reads was expected in particular for Phe-GAA and Glu-UUC, both of which are the sole tRNA types responsible for decoding all synonymous codons for their respective amino acids (accounting for 3.66% and 5.43% of genome wide codons, respectively; Chan and Lowe, 2016). Given their low read numbers, and non-central role in our experiment, these three tRNA types were removed from downstream statistical analyses.

The strength of the YAMAT-seq procedure lies in comparing changes in mature tRNA levels across genotypes. Any issues with efficiencies are expected to remain relatively constant across the strains in this experiment, allowing changes in relative tRNA type abundances to be detected. To this end, DESeq2 (Love et al., 2014) was used to compare normalized expression levels of 36 mature tRNA types – all SBW25 tRNA types except for Phe-GAA, Glu-UUC, and Ile2-CAU – in pairs of strains. Firstly, the effect of deleting serCGA was investigated by comparing tRNA sequences from each of the two independent serCGA deletion strains with those in SBW25 (Figure 5A). The absence of tRNA-Ser(CGA) from both deletion mutants demonstrates that (i) tRNA-Ser(CGA) is encoded solely by the deleted serCGA gene, and (ii) under the conditions tested, tRNA-Ser(CGA) is indeed a non-essential tRNA type in SBW25 (i.e., can be eliminated without causing death). The serCGA deletion mutants also showed consistently lower levels of tRNA-Thr(CGU) (DESeq2 adjusted p<0.001), a result that may reflect a close metabolic relationship between threonine and serine (Sawers, 1998). No significant differences were detected between the two deletion mutants (DESeq2 adjusted p>0.1).

Figure 5 with 1 supplement see all

Next, the effect of evolution on the serCGA deletion mutant was investigated by pairwise comparisons between each day 13 isolate and its corresponding ancestor (Figure 5B). Importantly, no differences in tRNA pools were detected in the wild type control lineage (W1-L versus SBW25; DESeq2 adjusted p>0.1). Contrastingly, a single consistent, statistically significant difference was observed across the five mutant lineage isolates: the level of tRNA-Ser(UGA) was 2.06- to 2.60-fold higher than in the ΔserCGA ancestor (DESeq2 adjusted p<0.0001). A number of other tRNA types co-vary with the rise in tRNA-Ser(UGA) (Figure 5B). While none of these differences is statistically significant in all strains, they are consistently in the same direction (i.e., increase or decrease). Thus, while the main effect of serCGA deletion and serTGA duplication is the loss of tRNA-Ser(CGA) and elevation of tRNA-Ser(UGA) respectively, other more subtle effects are likely to exist. Finally, pairwise comparisons between the duplication strains reveal some tRNA pool differences between the different duplication strains (Figure 5—figure supplement 1). Interestingly, no differences were detected between strain M4-L and any of the other duplication strains (DESeq2 adjusted p>0.3), indicating that the four additional tRNA genes duplicated in M4-L do not contribute to the mature tRNA pool. It is possible that the duplication junction in M4-L – which lies 112 bp upstream of the duplicated argTCT-hisGTG-leuTAG-hisGTG tRNA genes – truncates a promoter, leading to little or no expression of the duplicated copies. This result highlights that tRNA gene copy number does not always correlate with mature tRNA level.

The YAMAT-seq data shows the major effects of our engineering and evolution on the mature tRNA pool of SBW25 (Figure 5C). In the wild type strain, the proportion of tRNA-Ser(CGA) is around 2.5-fold higher than that of tRNA-Ser(UGA) (0.015 and 0.0059, respectively). The dominant effect of serCGA deletion is elimination of tRNA-Ser(CGA), with the proportions of the other tRNA types, including tRNA-Ser(UGA), remaining relatively stable. The subsequent large-scale duplications spanning serTGA generate an approximately twofold increase in the relative abundance of tRNA-Ser(UGA).

### A model for how elevation of tRNA-Ser(UGA) increases translational efficiency

Thus far, our results show that the growth defect caused by tRNA-Ser(CGA) elimination can be compensated by large-scale duplication events that serve to increase the proportion tRNA-Ser(UGA) in the mature tRNA pool. In this final section, we develop a model to provide a molecular explanation of how elevating tRNA-Ser(UGA) levels may compensate for tRNA-Ser(CGA) loss.

As described in the introduction, tRNA pool composition is an important determinant of translational speed. During elongation, the codon occupying the ribosomal A site is matched to a corresponding tRNA by stochastically sampling from the available pool of ternary complexes. Ternary complexes consist of a tRNA, an elongation factor (EF-Tu), and GTP (Bensch et al., 1991). Given that EF-Tu is a highly abundant protein that binds all correctly charged tRNA types with approximately equal affinity (Louie et al., 1984; Schrader et al., 2011), and uncharged tRNAs with several fold lower affinity (Nissen et al., 1996; Shulman et al., 1974), the relative levels of each ternary complex are expected to reflect mature, charged tRNA proportions. Overall, the average time taken to match any given codon is dependent on the availability of the corresponding charged tRNA type(s); codons matched by a higher proportion of tRNAs will, on average, be translated more quickly than those matched by rarer tRNAs (Varenne et al., 1984). To illustrate, consider the serine codons and seryl-tRNAs from our experiments; codon UCA is translated by tRNA-Ser(UGA), while codon UCG can theoretically be translated by tRNA-Ser(CGA) or tRNA-Ser(UGA) (see Figure 1B). When both tRNA types are present, UCG is matched by a higher proportion of tRNAs than UCA and hence is expected to be translated more quickly, on average.

The above logic can be extended to provide relative numerical estimates of codon translation times, where codon-tRNA matching patterns and tRNA proportions are known. The average time required to translate any particular codon ($τcodon$) can be approximated by the inverse proportion of matching, charged tRNAs in the pool (Equation 1; where $ptRNA$ is the proportion of matching tRNAs).

(1) ${\tau }_{\text{codon}}=\text{?}\frac{1}{{p}_{\text{tRNA}}}$

Accordingly, the average time taken to translate UCA can be estimated by the inverse proportion of tRNA-Ser(UGA) in the tRNA pool (Equation 2a), while UCG translation time can be estimated as the inverse proportion of tRNA-Ser(UGA) plus tRNA-Ser(CGA) (Equation 2b). For simplicity, these equations assume that tRNA-Ser(CGA) and tRNA-Ser(UGA) translate UCG with equal efficiency (see Discussion).

(2a) ${\tau }_{\text{UCA}}=\text{?}\frac{1}{{p}_{\text{UGA}}}$
(2b) ${\tau }_{\text{UCG}}=\text{?}\frac{1}{\left({p}_{\text{CGA}}+\text{?}{p}_{\text{UGA}}\right)}$

If the proportions of mature tRNA-Ser(CGA) and tRNA-Ser(UGA) measured during YAMAT-seq are substituted into Equations 2a and 2b, relative measures of UCA and UCG translation times can be obtained in various genetic backgrounds (Table 2; see Supplementary file 7). According to these calculations, serCGA deletion increases the time taken to translate UCG by a factor of four (mean UCG translation times of 49.4 and 207 in wild type and serCGA deletion tRNA gene sets, respectively). Subsequent serTGA duplication elevates the proportion of tRNA-Ser(UGA), partially restoring UCG translation time (mean UCG translation times of 207, 86.9, and 49.4 in the serCGA deletion strains, wild types, and serTGA duplication strains, respectively).

Table 2

The cumulative impact of changing UCG translation speeds on translation and, ultimately, growth logically depends on how frequently the UCG codon is used. In SBW25, UCG is a relatively high use codon, occurring approximately 4.5 times more frequently than UCA (see Figure 1B; Chan and Lowe, 2016). To account for UCG codon bias, we estimated the average time required by each strain to translate an mRNA that reflects the relative UCG/UCA codon usage of SBW25. Using the same principle as in Equation 1, Equation 2a and b, the average time taken to translate the mRNA ($τmRNA$) is calculated as the sum of the time taken to translate one UCA codon plus the time taken to translate 4.5 UCG codons (Equation 3).

(3) ${\tau }_{\text{mRNA}}=1\left(\frac{1}{{p}_{\text{UGA}}}\right)+4.5\left(\frac{1}{\left({p}_{\text{CGA}}+{p}_{\text{UGA}}\right)}\right)$

The estimated average time required for the three tRNA gene complements to translate the mRNA can be calculated using the tRNA proportions measured during YAMAT-seq (Table 2). Using this method, on average a serCGA deletion mutant is expected to require approximately three times longer to translate the mRNA than the wild type (mean translation times of 1135 and 397, respectively). serTGA duplication is estimated to restore average translation time to near-wild type levels (mean translation times of 478 and 397, respectively).

Proportions of tRNA-Ser(UGA)a and tRNA-Ser(CGA)b, as measured by YAMAT-seq for each strain (see Supplementary file 7), were substituted into Equation 2a to estimate UCA translation time ($τUCA$), Equation 2b to estimate UCG translation time $τUCG$, and Equation 3 to estimate translation time of an artificial mRNA representing the relative codon use of UCA and UCG in SBW25 ($τmRNA$). Each calculation is performed for the nine strains listed, which can be separated into three tRNA gene sets. Mean tRNA proportions and translation times are provided below the last strain in each tRNA gene set. tRNA proportions are given to two significant figures (s.f.) and calculated translation times to three s.f.

It should be noted that a limitation of the above model is that it assumes that all mature tRNAs are charged, while the YAMAT-seq proportions include both charged and uncharged mature tRNAs. The degree to which mature E. coli tRNAs are charged has been shown to vary with tRNA type and growth medium (Avcilar-Kucukgoze et al., 2016; Dittmar et al., 2005), with seryl-tRNAs demonstrating particularly low charging levels during exponential growth in rich medium (Avcilar-Kucukgoze et al., 2016). However, despite these differences, both previous studies report consistency of within-family tRNA charging levels. For example, during exponential growth in LB, all four E. coli seryl-tRNAs show a charging rate of?~10% (Avcilar-Kucukgoze et al., 2016). If similar, consistently low charging levels exist for SBW25 seryl-tRNAs during YAMAT-seq, our general conclusions are expected to hold given that the model uses only seryl-tRNA proportions to estimate elongation times.

Overall, the predictions of the model are consistent with our experimental results: the growth defect caused by serCGA deletion is compensated to near-wild type levels by the large-scale duplications encompassing serTGA (see Figure 3). Together, our results are consistent with the hypothesis that tRNA-Ser(CGA) elimination exerts increased translational demand on tRNA-Ser(UGA), and that this pressure is at least partially relieved by elevating tRNA-Ser(UGA) through increased serTGA copy number.

## Discussion

The evolutionary and molecular mechanisms by which different tRNA gene sets emerge have been of consistent, long-standing interest (periodically reviewed in Gingold and Pilpel, 2011; Ikemura, 1985; Rak et al., 2018). A multitude of theoretical studies have focused on various aspects of tRNA gene set evolution, highlighting roles for post-transcriptional modifications and codon bias (Bulmer, 1987; Higgs and Ran, 2008; Ikemura, 1981; Novoa et al., 2012; Ran and Higgs, 2010; Rocha, 2004; Sharp et al., 2010). Phylogenetic analyses have provided evidence of bacterial tRNA gene set evolution by four main routes: (i) tRNA gene loss through deletion events, (ii) tRNA gene acquisition through horizontal gene transfer, (iii) tRNA gene acquisition by within-genome duplication events, and (iv) tRNA gene changes as a result of anticodon switching (Diwan and Agashe, 2018; Marck and Grosjean, 2002; McDonald et al., 2015; Tremblay-Savard et al., 2015; Wald and Margalit, 2014; Withers et al., 2006). In this work, we provide direct, empirical evidence for one of these routes: tRNA gene acquisition by within-genome duplication events. Loss of the single-copy tRNA gene serCGA was compensated by large-scale, tandem duplications that increase the copy number of tRNA gene serTGA, leading to elevation of tRNA-Ser(UGA) in the mature tRNA pool.

### Retention of serCGA in P. fluorescens SBW25 wild type

The observation that increasing the proportion of mature tRNA-Ser(UGA) can compensate for tRNA-Ser(CGA) loss can be explained on the molecular level: we hypothesize that elimination of tRNA-Ser(CGA) places translational strain on tRNA-Ser(UGA) at UCG codons, and that this pressure can be at least partially relieved by elevating tRNA-Ser(UGA) levels. While this hypothesis is consistent with our experimental results, it raises the question of why P. fluorescens SBW25 might retain a copy of serCGA in the natural, plant environment; given that selection for translational efficiency favours fewer tRNA types encoded by more tRNA gene copies (Ran and Higgs, 2010; Rocha, 2004), one might expect serCGA elimination in favour of multiple serTGA gene copies. However, serCGA is frequently retained in bacterial genomes; a study of 319 bacteria from different genera indicates a serCGA retention rate of?~70%, with retention correlating with higher UCG usage (Wald and Margalit, 2014). What is the advantage of retaining tRNA-Ser(CGA) over simply encoding higher levels of tRNA-Ser(UGA)?

One possible explanation for serCGA retention is that, while both tRNA-Ser(CGA) and tRNA-Ser(UGA) can translate UCG codons, tRNA-Ser(CGA) may do so with greater efficiency (i.e., more quickly and/or more accurately). Indeed, changing the anticodon of E. coli tRNA-Ser(UGA) from UGA to CGA has been shown to increase the efficiency of UCG translational in vitro (Takai et al., 1999b). Further, any difference in translational efficiencies may depend on environmental conditions; temperature, acidity, and ion concentration all alter the stability of RNA base pairings (Nikolova and Al-Hashimi, 2010; Serra et al., 2002). Overall more efficient translation of UCG by tRNA-Ser(CGA) in some environments could conceivably offset the cost of serCGA retention, particularly in bacteria with high UCG usage.

The possibility that restoration of serCGA may further increase the fitness of the serTGA duplication strains could be investigated by continuing the evolution experiment past day 13. This is because, in addition to increasing fitness, the duplication of serTGA has provided a possible route by which serCGA could be regained, and thus the original P. fluorescens SBW25 tRNA gene set restored. Specifically, one of the two copies of serTGA could acquire a T→C transition at tRNA position 34, changing the gene from serTGA to serCGA (i.e., an anticodon switch event). In order for such a mutation to spread through the population, it must encode a functional tRNA. This requires the new, hypothetical tRNA-Ser(CGA) to be recognized by seryl-tRNA ligase (SerRS), the enzyme that adds serine to all types of serine-carrying tRNAs in the cell (for a review of tRNA ligase function, see Ibba and Soll, 2000). Recognition of seryl-tRNAs by SerRS depends not on the tRNA sequence or the anticodon, but rather on the characteristic three-dimensional shape of seryl-tRNAs (Lenhard et al., 1999). Therefore, even though serTGA and the original serCGA encode tRNAs with very different sequences (see Figure 1—figure supplement 1A), it is plausible that the new, hypothetical serCGA could form a functional tRNA (see Figure 1—figure supplement 1B). Indeed, a UGA→CGA anticodon switch alters the translational capacity of E. coli tRNA-Ser(UGA) from codon UCA to UCG in vitro (Takai et al., 1999a; Takai et al., 1999b). Notably, none of the five genome sequenced, mutant lineage, day 13 isolates shows evidence of anticodon switch events in either copy of serTGA (see Supplementary file 5); whether anticodon switching occurs across a longer evolutionary time scale remains to be seen.

### Origin of large-scale duplications and new tRNA gene copies

Large-scale, tandem duplication events similar to those seen in this work are a well-documented adaptive solution to various phenotypic challenges in phage, bacteria, and yeast (reviewed in Anderson and Roth, 1977; Elliott et al., 2013; Reams and Roth, 2015). Extensive work has shown that large-scale tandem duplications occur at extremely high rates in bacteria; for example, in an unselected overnight culture, around 10% of Salmonella cells reportedly carry a duplication of some sort, with 0.005–3% carrying a duplication of a particular locus (Anderson and Roth, 1981; Anderson and Roth, 1977; Reams et al., 2010). These rates are orders of magnitude higher than those typically reported for single nucleotide polymorphisms (Westra et al., 2017) and are consistent with the early detection of duplication fragments in our evolution experiment: strains carrying large-scale duplications were isolated from every mutant lineage by day 13 (~90 generations), and the duplication fragments they contained were first detected in the relevant population between days 2 and 5 (Figure 4E; Figure 4—figure supplement 2).

Large-scale duplications arise through unusual exchange of DNA between two separate parts of the bacterial chromosome, with the separating distance determining the size of the duplication (reviewed in Anderson and Roth, 1977; Elliott et al., 2013; Reams and Roth, 2015). A variety of mechanisms have been reported to underpin duplication formation, including (i) RecA-mediated, unequal recombination, (ii) RecA independent unequal recombination, and (iii) errant topoisomerase or gyrase activity (Reams et al., 2014; Reams and Roth, 2015; Shyamala et al., 1990). The first of these, RecA-mediated unequal recombination, occurs between direct repeats some distance apart (e.g., rRNA operons?and rhs genes), with longer repeats generally leading to higher rates of recombination (Anderson and Roth, 1981). Two of the five mutant lineage isolates in this work (M2-L, M3-L) have endpoints in?~1.5 kb direct, imperfect repeats at 4.12 Mb and 4.31 Mb of the SBW25 chromosome, suggesting that the?~192 kb duplication fragment they contain arose by RecA-mediated, unequal recombination between these regions. The duplication fragments in the other three mutant lineage isolates (M1-L, M2-Lop, and?M4-L) show no obvious signs of homology between their endpoints (Table 1), indicating an alternative mechanistic origin. Overall, the diversity in duplication fragment endpoints – location and degree of homology – in this study are indicative of a range of mechanistic origins.

While duplication formation does not necessarily require sequence homology, duplications that occur at the highest rates typically result from unequal recombination between long (>200 bp) repeats (Anderson and Roth, 1981). The P. fluorescens SBW25 genome contains many of these types of repeats dispersed around the chromosome, including five nearly identical rRNA operons and three highly similar rhs genes (Silby et al., 2009). In addition, there are hundreds of smaller repeats throughout the genome, including REPINs and tRNA genes (Bertels and Rainey, 2011; Silby et al., 2009). Errant recombination between any of these repeated sequences could plausibly generate large-scale duplications, meaning that almost any region of the SBW25 chromosome – and therefore many tRNA genes – could conceivably be duplicated via homologous recombination. It has previously been noted that the region surrounding the SBW25 replication terminus appears to be more susceptible to evolutionary change than the rest of the chromosome (Silby et al., 2009). This variable region extends approximately 1.4 Mb?on either side of the terminus, engulfing 28 tRNA genes (including serTGA; see Figure 1B and Supplementary file 4). It seems probable that, while the copy number of many tRNA genes could feasibly be elevated by large-scale duplications, the tRNA genes surrounding the SBW25 replication terminus may be more prone to evolutionary change by duplication.

Given that dispersed repeats and large-scale duplications are a widespread feature of bacterial genomes (reviewed in Brazda et al., 2020; Reams and Roth, 2015), similar within-genome duplication may be capable of generating changes in tRNA gene copy number in many bacteria. Presumably, different duplication fragments – and therefore tRNA genes – arise at varying rates (depending on the presence and length of direct repeats in the surrounding area) and have varying degrees of evolutionary success (depending on fragment size and the dosage effects of genes in the fragment).

### Evolutionary fate of the large-scale duplications and new tRNA gene copies

Large-scale duplications in bacterial genomes are typically unstable (reviewed in Reams et al., 2010; Reams and Roth, 2015). That is, in addition to occurring at high rates, they are also lost – without a trace – at high rates. Duplications are lost in?~1% of cells per generation in Salmonella cultures, in the absence of selection (Anderson and Roth, 1981). Due to their combined ease of gain and loss, it has been suggested that duplications serve as fleeting evolutionary solutions to transient phenotypic challenges (Sonti and Roth, 1989); they occur at high frequency without adversely affecting the long-term structure and integrity of the genome. The reported instability of large-scale duplications raises questions about the long-term fate of the duplications seen during this experiment. That is, if the serial transfer experiment was continued past day 13, what would happen to the duplication strains and the tRNA genes that they contain?

Extension of the evolution experiment could have several outcomes. One is that duplication fragments encompassing serTGA may continue to arise and be lost, with duplication strains eventually reaching a stable level within the population (Reams et al., 2010). In this scenario, while individual large-scale duplications would continually rise and fall, they would remain the dominant evolutionary solution in the population. A second possibility is that progressively smaller (and inherently more stable) duplication fragments may arise, perhaps eventually resulting in a duplication fragment comprised only of serTGA and its promoter. Such smaller duplication fragments may arise either from a serCGA deletion mutant (by rarer duplication events between more proximate DNA sequences) or from a duplication strain (by remodelling of the existing duplication fragment). Once a smaller, more stable fragment is dominant in the population, one copy of serTGA could conceivably change to serCGA through an anticodon switch event (see earlier discussion). A third possible outcome of continuing the evolution experiment is that, over time, a more stable mutation may arise, independently of the duplication fragments. Possible examples include (i) point mutation(s) in the promoter of serTGA, leading to increased serTGA expression without requiring additional gene copies, (ii) mutation(s) that extend the translational capacity of seryl-tRNAs to translate (or, to better translate) UCG codons, and (iii) synonymous point mutation(s) in highly expressed UCG codon(s), lowering the translational demand for tRNA-Ser(UGA). Should any of these more stable mutations arise, they would be expected to displace large-scale duplications – and in some cases, the additional serTGA copy – as the dominant evolutionary strategy in the population.

### Concluding remarks

The elimination of one tRNA type from P. fluorescens SBW25 was readily counteracted by large-scale duplication events that increased the gene copy number of a second, compensatory tRNA type. Together, our results provide a direct observation of the evolution of a bacterial tRNA gene set by gene duplication, and lend empirical support for the optimization of translation by codon–tRNA matching.

## Materials and methods

Key resources table

### Strains, growth conditions, and oligonucleotides

Request a detailed protocol

Full lists of strains, plasmids, and oligonucleotides used are provided in Supplementary file 2. The serCGA deletion was constructed twice, independently; ΔserCGA-1 and ΔserCGA-2 are biological replicates. Unless otherwise stated, P. fluorescens SBW25 cultures were grown in King’s Medium B (KB; King et al., 1954) for?~16 hr at 28°C with shaking. E. coli strains were grown in Lysogeny broth (LB) for 16–18 hr at 37°C with shaking.

### Growth curves

Request a detailed protocol

Strains were streaked from glycerol stocks on KB, M9, or LB+Gm (20 μg ml?1) plates. After 48 hr incubation, six or seven colonies per strain (numbers of replicates based on previous work: Gallie et al., 2015; Lindsey et al., 2013) were grown in 200 μl of liquid KB, M9, or LB+Gm (20 μg ml?1) in a 96-well plate. Two microlitres of each culture were transferred to a fresh 198 μl of medium in a new 96-well plate, sealed with a plastic lid or a breathable rayon film (VWR), and grown at 28°C in a BioTek Epoch two plate reader. Absorbance at 600 nm (OD600) of each well was measured at 5 min intervals, with 5 s of 3 mm orbital shaking before each read. Medium control wells were used to standardize other wells. The mean absorbance and standard error of the replicates at every time point were used to draw the growth curves in Figures 2, 3, and 4. Maximum growth rate and lag time were calculated using a sliding window of nine data points during the exponential growth window of the curve (Gen5 software from BioTek; see also source data files 1, 3, and 4).

### Fitness assays

Request a detailed protocol

For each competition, six replicates were performed in three separate blocks. All competitions within a block were performed in parallel. The number of replicates is based on previous work (Beaumont et al., 2009; Gallie et al., 2019). Single colonies of each competitor were grown independently in shaken KB. Competition tubes were inoculated with?~5?×?106 cells of each competitor and incubated at 28°C (shaking, 24 hr). Competitor frequencies were determined by plating on KB agar or LB+X-gal (60 μg ml?1) agar at 0 and 72 hr. Competing genotypes were readily distinguished by their distinctive morphologies (on KB agar) or colour (neutrally?marked SBW25-lacZ forms blue colonies on LB+X-gal; Zhang and Rainey, 2007). Relative fitness was expressed as the ratio of Malthusian parameters (Lenski, 1991) in Figures 2 and 3. Deviation of relative fitness from one was determined by two-tailed, parametric one-sample t-tests (see also source data file 2).

### Evolution experiment

Request a detailed protocol

SBW25 (wild type), SBW25-eWT (engineering control), and the two independent tRNA-Ser(CGA) deletion mutants (ΔserCGA-1 and ΔserCGA-2) were streaked from glycerol stocks onto KB agar and grown at 28°C for 48 hr. Two colonies from every strain were picked.?Each of the eight chosen colonies became the founder of one evolutionary lineage. This resulted?in four independent?wild?type lineages (W1–W4) and four mutant lineages (M1–M4).A medium control lineage was also included. The numbers of parallel lineages were chosen based on the available laboratory resources. Each colony was inoculated into 4 ml KB in a 13 ml plastic tube and incubated overnight at 28°C (shaking). Each grown culture (day 0) was vortexed for 1 min, and 100 μl?was used to inoculate 10 ml KB in a 50 ml Falcon tube (28°C, shaking, 24 hr). Every 24 hr thereafter, 1% of each culture was transferred to a fresh 10 ml KB in a 50 ml Falcon tube, and a sample of the population frozen at ?80°C. The experiment was continued until day 15. Populations were periodically dilution plated on KB agar to check for changes in colony size.

### Genome sequencing

Request a detailed protocol

Seven isolates were purified and stored from day 13 of the evolution experiment (W1-L, W3-L, M1-L, M2-L, M2-Lop, M3-L, M4-L). Genomic DNA was isolated from 0.5 ml overnight culture of each using a Qiagen DNeasy Blood and Tissue Kit. DNA quality was checked by agarose gel electrophoresis. Whole genome sequencing was performed by the sequencing facility at the Max Planck Institute for Evolutionary Biology (Ploen, Germany). Paired-end, 150 bp reads were generated with an Illumina NextSeq 550 Output v2.5 kit. Raw reads are available at NCBI sequence read archive (SRA accession number: PRJNA558233; International Nucleotide Sequence Database Collaboration et al., 2011). A minimum of 4.5 million raw reads per strain were aligned to the SBW25 genome sequence (NCBI genome reference sequence NC_012660.1; Silby et al., 2009) using breseq (Deatherage and Barrick, 2014b) and Geneious (v11.1.4). A minimum mean coverage of 94.7 reads per base pair was obtained. A full list of mutation predictions is provided in Supplementary file 3.

### Identification of duplication junctions

Request a detailed protocol

The duplication junctions in M1-L, M2-L, M2-Lop, M3-L, and M4-L were identified using a combination of analysis of whole genome sequencing data and laboratory-based techniques. The raw reads obtained from whole genome sequencing of each isolate were aligned to the SBW25 genome sequence (Silby et al., 2009) using breseq (Barrick et al., 2014; Deatherage et al., 2014a; Deatherage and Barrick, 2014b) and Geneious (v11.1.4). Coverage analyses were performed in Geneious, and coverage plots generated in R (v3.6.0) (Figure 4—figure supplement 1). Manual inspection of the Geneious alignment in coverage shift regions led to predicted junctions in all isolates except M3-L. Each predicted junction was checked by alignment to (i) raw reads and (ii) previously unused sequences (using Geneious). Junction sequences were confirmed by PCR and Sanger sequencing (for primer details, see Supplementary file 2).

### Historical junction PCRs

Request a detailed protocol

Glycerol stock scrapings of the frozen daily populations, or large colony isolates, from each mutant lineage were grown in liquid KB. Washed cells were used as PCR templates, alongside positive and negative controls (see Supplementary file 2 for primer details). The PCR products were run on a 1% agarose gel against a 1 kb DNA ladder at 100 volts for 90 min. Gels were stained with SYBR Safe and photographed under UV illumination. In order to better detect faint PCR products in the earlier days of the evolution experiment, the colours in each photograph were inverted using Preview (v11.0) (Figure 4E and Figure 4—figure supplement 2).

### Expression of tRNA genes from the pSXn plasmid

Request a detailed protocol

Wild type copies of the serCGA and serTGA genes were individually ligated into the expression vector, pSXn, to give pSXn-CGA and pSXn-TGA. The pSXn vector contains an IPTG-inducible tac promoter (de Boer et al., 1983; Owen and Ackerley, 2011). Together with the empty vector, the two constructs were separately placed into the SBW25, ΔserCGA-1, and ΔserCGA-2 backgrounds. This was achieved by transformation of the vector constructs into chemically competent cells (Gallie et al., 2015). The growth profiles of the nine resulting genotypes were obtained in liquid LB+Gm (20 μg ml?1), in six replicates of six (see Growth Curves methods, Figure 4C?and?D, and source data file 4). No IPTG was added at any stage, in order to achieve lower-level, leaky expression of the tRNA gene from the uninduced tac promoter.

### YAMAT-seq procedure

Request a detailed protocol

YAMAT-seq (Shigematsu et al., 2017) is adapted in this work for use in P. fluorescens SBW25. Three independent replicates (based on replicate numbers reported in Shigematsu et al., 2017) of nine strains (i.e., 27 samples) were grown to mid-exponential phase in 250 ml flasks containing 20 ml KB. Total RNA was isolated from 1.5 ml aliquots (TRIzol Max Bacterial RNA isolation kit). For each sample, 10 μg of total RNA was subjected to tRNA deacylation treatment – incubation in 20 mM Tris-HCl (pH 9.0) for 40 min at 37°C. Each deacylated RNA sample was desalted and concentrated by ethanol precipitation. Y-shaped, DNA/RNA hybrid adapters (Eurofins; Shigematsu et al., 2017) were ligated to the conserved, exposed 5'-NCCA-3' and 3'-inorganic phosphate-5' ends of uncharged tRNAs using T4 RNA ligase 2. Ligation products were reverse transcribed into cDNA using SuperScript III reverse transcriptase and amplified by 11 rounds of PCR with Phusion. One of the 27 sample-specific indices listed in Supplementary file 7 was added to each of the 27 reactions. The quality and quantity of each PCR product were checked using an Agilent DNA 7500 kit on a Bioanalyzer, and samples combined in equimolar amounts into one tube. The mixture was run on a 5% native polyacrylaminde gel, and the bands between 180 and 250 bp excised. DNA was extracted in deionized water overnight, and agarose removed by centrifugation through filter paper. The final product was sequenced at the Max Planck Institute for Evolutionary Biology (Ploen, Germany). Single-end, 150 bp reads were generated with an Illumina NextSeq 550 Output v2.5 kit. YAMAT-seq data is available at NCBI Gene Expression Omnibus (GEO accession number GSE144791) (Edgar et al., 2002).

### Analysis of YAMAT-seq data

Request a detailed protocol

### Statistical tests

Request a detailed protocol

Parametric two-tailed two-sample t-tests were performed to detect differences in maximum growth rate (Vmax) and lag time in growth curves (Figures 2C, G, 3C, D, and 4D, source data files 1, 3, and 4) in cases where all assumptions were satisfied. Where equal variance or normality assumptions were violated, non-parametric Welch two-sample t-tests and Mann–Whitney–Wilcoxon rank sum tests were used, respectively (see Figure 3C and D, source data file 3). Parametric one-tailed one-sample t-tests were used to detect deviations of relative fitness values from one in competition assays (see Figures 2D, H and 3E, source data file 2). DESeq2 adjusted (for multiple comparisons) p-values were used to detect differences in tRNA expression during YAMAT-seq (Figure 5A and B, Figure 5—figure supplement 1, Figure 5—source data 1). Analyses were performed in R (v3.6.0). Significance levels: ns?=?not significant (p>0.05), *0.05?< p < 0.001, **0.01?< p < 0.001, ***p<0.001.

## References

1. 1
2. 2
Basic local alignment search tool (1990)
Journal of Molecular Biology 215:403–410.
https://doi.org/10.1016/S0022-2836(05)80360-2
3. 3
4. 4
5. 5
6. 6
7. 7
8. 8
9. 9
10. 10
11. 11
12. 12
13. 13
14. 14
15. 15
16. 16
17. 17
The selection-mutation-drift theory of synonymous Codon usage
(1991)
Genetics 129:897–907.
18. 18
19. 19
20. 20
21. 21
22. 22
23. 23
24. 24
25. 25
26. 26
27. 27
28. 28
29. 29
30. 30
31. 31
32. 32
33. 33
34. 34
35. 35
36. 36
37. 37
38. 38
Aminoacyl-tRNA synthesis (2000)
Annual Review of Biochemistry 69:617–650.
https://doi.org/10.1146/annurev.biochem.69.1.617
39. 39
40. 40
41. 41
Nucleic Acids Research 39:D19–D21.
https://doi.org/10.1093/nar/gkq1019
42. 42
43. 43
44. 44
Limitations of translational accuracy
(1996)
In: FC Neidhardt, editors. Escherichia coli and Salmonella - Cellular and Molecular Biology. ASM Press. pp. 979–1004.
45. 45
Growth-optimizing accuracy of gene expression (1987)
Annual Review of Biophysics and Biophysical Chemistry 16:291–317.
https://doi.org/10.1146/annurev.bb.16.060187.001451
46. 46
47. 47
48. 48
49. 49
Relative affinities of all Escherichia coli aminoacyl-tRNAs for elongation factor Tu-GTP
(1984)
The Journal of Biological Chemistry 259:5010–5016.
50. 50
51. 51
52. 52
53. 53
54. 54
55. 55
56. 56
57. 57
58. 58
59. 59
60. 60
61. 61
62. 62
63. 63
A language and environment for statistical computing (2013)
R Foundation for Statistical Computing, Vienna, Austria.
64. 64
65. 65
66. 66
67. 67
68. 68
69. 69
Mechanisms of gene duplication and amplification (2015)
Cold Spring Harbor Perspectives in Biology 7:a016592.
https://doi.org/10.1101/cshperspect.a016592
70. 70
71. 71
72. 72
73. 73
Translation in prokaryotes (2018)
Cold Spring Harbor Perspectives in Biology 10:a032664.
https://doi.org/10.1101/cshperspect.a032664
74. 74
75. 75
76. 76
77. 77
78. 78
Forces that influence the evolution of Codon bias (2010)
Philosophical Transactions of the Royal Society B: Biological Sciences 365:1203–1212.
https://doi.org/10.1098/rstb.2009.0305
79. 79
80. 80
81. 81
82. 82
83. 83
Role of gene duplications in the adaptation of Salmonella typhimurium to growth on limiting carbon sources
(1989)
Genetics 123:19–28.
84. 84
85. 85
86. 86
87. 87
88. 88
89. 89
90. 90
91. 91
92. 92
93. 93
94. 94
95. 95
96. 96
97. 97
98. 98

## Decision letter

1. Vaughn S Cooper
Reviewing Editor; University of Pittsburgh, United States
2. Patricia J Wittkopp
Senior Editor; University of Michigan, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This study shows that deletion of a non-essential single-copy tRNA gene in Pseudomonas alters the cellular tRNA pool and reduces fitness, especially when conditions enable rapid growth. During experimental evolution in the laboratory, they find that the tRNA deletion can be compensated by repeated, large duplications of a part of the genome, which include a near cognate tRNA gene. This work demonstrates effects of tRNA gene redundancy on fitness and the means by which genomes can rapidly compensate for the loss of redundancy.

Decision letter after peer review:

Thank you for submitting your article "The birth of a bacterial tRNA gene" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Patricia Wittkopp as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

As the editors have judged that your manuscript is of interest, but as described below that additional experiments are required before it is published, we would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (http://www.asadoresteatre.com/articles/57162). First, because many researchers have temporarily lost access to the labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is "in revision at eLife". Please let us know if you would like to pursue this option. (If your work is more suitable for medRxiv, you will need to post the preprint yourself, as the mechanisms for us to do so are still in development.)

Summary:

This well-written manuscript demonstrates that deletion of a non-essential single-copy tRNA gene in Pseudomonas fluorescens (ser-tRNA CGA) alters the cellular tRNA pool and reduces fitness. During experimental evolution in the laboratory, the authors find that the tRNA deletion is compensated by repeated, large duplications of a part of the genome, which include a near cognate tRNA gene (tRNA TGA). The duplications are associated with increased tRNA TGA expression and increased fitness. The authors suggest that this is a novel evolutionary response to overcome translational inefficiency. These results are framed by a simple model of translation dynamics to understand the initial fitness effects of the gene deletion, and the observed evolutionary response. Overall, the reported experimental work is well done and presents interesting results. The manuscript is also clearly written. However, all reviewers raised questions about the presented mathematical/verbal model and agree that the novelty and breadth of the findings are overstated. Evolution via gene duplication is a well-known phenomenon in evolutionary biology, has been observed in many experimental evolution studies, and is the most likely outcome of the experimental design. The repeated observation of duplications of a region containing tRNA-TGA as well as other tRNAs is a worthwhile finding but the generality of this result for our broader understanding of the evolution of tRNAs requires further exploration.

Essential revisions:

1) Please consider more of the literature on tRNA pool evolution and clarify how this study represents a significant advance. There has been much discussion of gene loss and duplication as key features (e.g. Withers et al., 2006; Wald and Margalit 2014; Tremblay-Savard et al., 2015), and the resulting evolutionary flexibility of tRNA gene sets (e.g. Ikemura, 1985; Rocha, 2004; Higgs and Ran, 2008; Diwan et al., 2018). This paper provides experimental support for these ideas, arising from the deletion of a single bacterial tRNA gene. This is a valuable result, being the first such demonstration in bacteria. However (contrary to the projection in the manuscript), this is not an unexpected result, and is not sufficient to generalize broadly. The reported adaptation of YAMAT-seq to measure bacterial tRNAs is very useful. Prior models of tRNA gene set evolution have demonstrated the importance of codon usage bias for translation rate (e.g. Bulmer, 1991; Berg et al., 1997; Higgs and Ran, 2008).

2) The initial model is overly simplistic and ignores much of the advances made in our understanding from prior work showing the importance of codon usage (see references above). The model does not explicitly include links between tRNA set and translation rate, and between translation rate and fitness; these are instead left as verbal arguments. Equation 1 is odd, because it suggests that codon B is translated by both alpha and beta (whereas only one tRNA can decode a codon at a time) and perhaps only works when alpha is limiting. The meaning of Equation 3 is unclear. The calculated translation times (Equation 4) should probably be clearly discussed as relative (not absolute) times. Finally, none of the model predictions are novel (see references above). Citing and discussing prior work may be sufficient to clearly set up the basic premise here (instead of the model), allowing a deeper focus on the experimental work.

3) We are not convinced that a key assumption of the model is reasonable: " the rate of translation by an anticodon-codon pair is determined solely by the proportion of the anticodon in the tRNA pool". I would think that the rate-limiting step in the translation of a codon is determined by the stochastic search for the cognate ternary complex (aminoacyl-tRNA+EF-Tu and GTP) to the A-site (Varenne et al., 1984). I cannot see that this is directly related to the proportion in the tRNA pool, but mainly to the concentration of each cognate ternary complex at steady-state. Reducing the concentration drastically by deleting a tRNA gene is likely to be limiting for growth, but this can be compensated for by increasing the concentration by a duplication. If the authors assume that competition with non-cognate and near-cognate ternary complexes are of major importance for the rate of translation of a codon this should be clearly stated. The authors find that tRNA proportions vary almost 100,000-fold; does this mean that translation rates are also expected to vary 100,000-fold and is there any experimental support for this? Please provide a clear explanation for why proportion of the anticodon in the tRNA pool is expected to be rate-limiting, supported by proper references.

4) A number of factors relevant to understand translation rates and tRNA gene evolution are not discussed in sufficient depth, and as early in the manuscript as is necessary. For instance, codon usage doesn't feature in the Results section until much later. So the experimental results seem puzzling until it is clear that the non-essential tRNA gene actually recognizes a codon that is very abundant in the genome. In the Discussion section (subsection “Retention of serCGA in P. fluorescens SBW25 wild type”), selection due to codon bias should be considered as a 4th hypothesis for the retention of tRNA(CGA) (perhaps in combination with hypotheses 1 or 3). Prior work also shows that tRNA modifications can alter the accuracy and efficiency of translation (Grosjean et al., 2010; Bjork and Hagervall, 2014; Manickam et al., 2015). These details are mentioned in passing, but deserve more prominence because they are really critical to set expectations and interpret the results. The focal tRNA species are expected to be modified by cmoAB; if the Pseudomonas strain used here has this modification system, it could explain the observed results: modified tRNA(UGA) can compensate tRNA(CGA) function, whereas the other near cognate tRNAs cannot (it is unclear whether G-U wobble works when G is in the codon and U is in the anticodon). I suggest that the details about codon usage, gene copy numbers of all cognate and near-cognate tRNAs, and relevant modification systems should be presented and clearly discussed at the outset.

5) We agree that the increase in tRNA(UGA) levels probably drove the large duplications observed during evolution. However, there are some points of concern here.

a) Given that the deletions were large, it would be useful to be able to estimate the contribution of tRNA(UGA) to increased fitness. Does deleting the duplicated tRNA(UGA) in evolved isolates reduce fitness, and by how much? Related to this, it was not clear whether there were any other mutations in the evolved lines, and their identity; e.g. were there any promoter mutations in the native copy of the tRNA(UGA) gene?

b) What is the level of overexpression of the tRNA gene on the plasmid (Figure 4)? If this is much more than the 2-fold increase due to gene duplication, it means that we do not know if a 2-fold increase is sufficient to increase fitness. On a related note, I could not see information on the sensitivity of the tRNA-seq method (I might have missed this); this is necessary to know how much confidence to place in the measured fold change values.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "The birth of a bacterial tRNA gene by large-scale, tandem duplication events" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Patricia Wittkopp as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (http://www.asadoresteatre.com/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.

Summary:

The reviewers are mostly satisfied with the response to the prior set of reviews and appreciate the well-written presentation. A few points remain to be addressed to clarify assumptions and discuss caveats of your conclusions.

Revisions:

1) Please specify your assumption that all tRNAs are fully charged, maybe with reference that this is not always the case and description of what YAMAT-Seq measures. We are not quite satisfied with the response to Essential revision point 3 where the authors were asked to explain and justify the assumptions made to be able to say that the translation rate of a codon is determined solely by the proportion of anticodon in the tRNA pool. In the revised version the envisaged translation system is explained more clearly and key assumptions, such that EF-Tu binds all types of tRNA with equal affinity, are explicitly stated. However, the process of charging of tRNAs by aminoacyl-tRNA synthetases/tRNA-ligases and assumptions about charging levels of tRNAs remains incompletely considered. The authors only refer to mature tRNAs in the text, but I am not sure if this includes both charged (aa-tRNA) and uncharged tRNAs. This leads to a number of questions about experimental data, assumptions and what is included in the model:

a) Does YAMAT-seq measure both charged and uncharged tRNAs?

b) Do you assume that all tRNAs are fully charged? This might be reasonable in minimal media (for example Kimberly A Dittmar, Michael A S?rensen, Johan Elf, M?ns Ehrenberg, Tao Pan. Selective charging of tRNA isoacceptors induced by amino-acid starvation. EMBO Rep 2005 Feb;6(2):151-7 and references therein).

The focus on serine is potentially problematic. Serine is one of the most toxic amino acids and it is possible that the reduction in growth rate in the deletion mutants are mainly due to this toxicity rather than a reduced translation rate, which would provide an alternative explanation for why the effect on growth is much smaller in minimal media. The proteose peptone 3 used in KB medium is high in serine (about 12% of total amino acids) suggesting that this might cause toxicity due to the inability of L-serine-deaminase to degrade excess serine after a reduction in ser-tRNA concentration. Addressing this in the final manuscript would make readers aware of this issue, even while pointing out that this explanation may be insufficient because adding a plasmid with the amplified tRNA increases fitness.

2) Related to this point, in rich media it has been seen that charging levels for serine tRNAs can be very low at below 10% although serine concentration is high (Avcilar-Kucukgoze et al., 2016). This work is new to this reviewer, but this might be of interest to the authors and readers of the article.

https://doi.org/10.7554/eLife.57947.sa1

## Author response

Essential revisions:

1) Please consider more of the literature on tRNA pool evolution and clarify how this study represents a significant advance. There has been much discussion of gene loss and duplication as key features (e.g. Withers et al., 2006; Wald and Margalit, 2014; Tremblay-Savard et al., 2015), and the resulting evolutionary flexibility of tRNA gene sets (e.g. Ikemura, 1985; Rocha, 2004; Higgs and Ran, 2008; Diwan et al., 2018). This paper provides experimental support for these ideas, arising from the deletion of a single bacterial tRNA gene. This is a valuable result, being the first such demonstration in bacteria. However (contrary to the projection in the manuscript), this is not an unexpected result, and is not sufficient to generalize broadly. The reported adaptation of YAMAT-seq to measure bacterial tRNAs is very useful. Prior models of tRNA gene set evolution have demonstrated the importance of codon usage bias for translation rate (e.g. Bulmer, 1991; Berg et al., 1997; Higgs and Ran, 2008).

To address this comment, we have re-written substantial portions of the manuscript, including the Abstract, Introduction, and Discussion. Specifically:

1) The Introduction has been re-written to include a more comprehensive review of the literature on tRNA gene set evolution. It now includes a deeper analysis of factors that influence tRNA gene set evolution (including post-transcriptional modification pathways, and codon use; and the mechanisms of evolutionary change. The suggested references have been included, and we thank the reviewers for these useful suggestions.

2) The Abstract, final paragraph of the Introduction, and the final paragraph of the Discussion have been re-written to clarify the advances made in our study. In our opinion, the main advance provided by our work is the direct observation of the evolution of a tRNA gene set by the mechanism of gene duplication (Introduction). In this way, our work provides empirical support for a number of phylogenetic and computational studies that postulate gene duplication as a mechanism of bacterial tRNA gene set evolution (Tremblay-Savard et al., 2015; Wald and Margalit, 2014; Withers et al., 2006). We agree that our results alone are not sufficient to generalize about the prevalence of duplication events in tRNA gene set evolution, and have included this in our Discussion. A second advance provided by our work is the adaptation of YAMAT-seq for use in bacteria (Introduction). We are pleased that the reviewers think this is useful, and hope that it will aid others in their research.

2) The initial model is overly simplistic and ignores much of the advances made in our understanding from prior work showing the importance of codon usage (see references above). The model does not explicitly include links between tRNA set and translation rate, and between translation rate and fitness; these are instead left as verbal arguments. Equation 1 is odd, because it suggests that codon B is translated by both alpha and beta (whereas only one tRNA can decode a codon at a time) and perhaps only works when alpha is limiting. The meaning of Equation 3 is unclear. The calculated translation times (Equation 4) should probably be clearly discussed as relative (not absolute) times. Finally, none of the model predictions are novel (see references above). Citing and discussing prior work may be sufficient to clearly set up the basic premise here (instead of the model), allowing a deeper focus on the experimental work.

The response to Essential revision point 3 (below) also contains information relevant to this point.

From the reviewers’ comments, we can see that our explanation of the initial model was unclear. For example, we did not intend to imply that a single codon is translated by more than one tRNA molecule at any given time; rather, in cases where a codon can be translated by more than one type of tRNA, the proportions of these tRNAs were added together to estimate the average time required to randomly sample a matching tRNA from the available pool (see also the summary provided in response to Essential revision 3). The initial model was indeed simple. It was intended as a null model demonstrating one simple point: translational speed is optimized by eliminating tRNA redundancy (in the absence of complicating factors). Hence, in cases where surplus tRNAs are retained (such as serCGA in P. fluorescens SBW25), additional factors should be considered. An example factor – different translational efficiencies between codon-tRNA pairings – was provided by the extended model.

However, on reflection, we agree with the reviewers that the manuscript would benefit from reorganization to allow more focus on the main novelty of our study: the experimental work. As suggested, we have removed the initial modelling section, instead discussing more prior work in the Introduction (including work on the effects of codon use, links between tRNA set and translation rate, and links between translation rate and growth/fitness). The results now begin with a new section, “The P. fluorescens SBW25 tRNA gene set”. This section covers the structure of the tRNA gene set, with emphasis on the seryl-tRNAs and codons that feature later in our work. A (re-written) modelling section appears at the end of the Results section, with the aim of providing a cohesive molecular explanation of the observed results.

3) We are not convinced that a key assumption of the model is reasonable: " the rate of translation by an anticodon-codon pair is determined solely by the proportion of the anticodon in the tRNA pool". I would think that the rate-limiting step in the translation of a codon is determined by the stochastic search for the cognate ternary complex (aminoacyl-tRNA+EF-Tu and GTP) to the A-site (Varenne et al., 1984). I cannot see that this is directly related to the proportion in the tRNA pool, but mainly to the concentration of each cognate ternary complex at steady-state. Reducing the concentration drastically by deleting a tRNA gene is likely to be limiting for growth, but this can be compensated for by increasing the concentration by a duplication. If the authors assume that competition with non-cognate and near-cognate ternary complexes are of major importance for the rate of translation of a codon this should be clearly stated. The authors find that tRNA proportions vary almost 100,000-fold; does this mean that translation rates are also expected to vary 100,000-fold and is there any experimental support for this? Please provide a clear explanation for why proportion of the anticodon in the tRNA pool is expected to be rate-limiting, supported by proper references.

These comments, together with those in Essential revision point 2, lead us to conclude that there is a miscommunication about how we envisage the translational system and effects of the observed mutations. To clarify our position we have written the following summary (including definitions, citations, and details of where the relevant information can now be found in the manuscript):

Summary

Ternary complexes versus the mature tRNA pool.

Prokaryotic translation consists of three main stages: initiation, elongation, and termination (reviewed in Rodnina, 2018). The rate limiting step of elongation is the stochastic search for a ternary complex to match the codon occupying the ribosomal A site (Varenne et al., 1984). Ternary complexes consist of a mature tRNA, elongation factor EF-Tu, and GTP (Bensch et al., 1991). EF-Tu binds all types of mature tRNA with approximately equal affinity (Louie et al., 1984; Ott et al., 1990). Hence the composition of the ternary complex pool is expected to reflect that of the mature tRNA pool; proportions of mature tRNAs provide a barometer for ternary complex availability, and ultimately elongation speed. This information is now included in the Introduction and Results.

Competition between cognate and near-cognate tRNAs.

A codon-tRNA match is considered “cognate” when the first two bases of the codon form Watson-Crick base pairs with the tRNA anticodon and the third codon base forms either a Watson-Crick or wobble base pair (Plant et al., 2007). Hence, in our system, both tRNA-Ser(CGA) and tRNA-Ser(UGA) are considered to be cognate matches for codon UCG (and tRNA-Ser(UGA) is also a cognate match for codons UCA and UCU). This means that, while a single UCG codon is translated by a single tRNA molecule, stochastic sampling of either a tRNA-Ser(CGA) or tRNA-Ser(UGA) ternary complex is expected to result in translation. This is why the proportions of tRNA-Ser(CGA) and tRNA-Ser(UGA) are added together to estimate the average time required to match a UCG codon (Equation 2B) (this point may also help to clarify some parts of Essential revision point 2).

For simplicity, our model assumes that the two tRNA types translate UCG with equal efficiency. The possibility that UCG is translated more efficiently by tRNA-Ser(CGA) than tRNA-Ser(UGA) provides one reason why duplication of serTGA does not fully restore wild type fitness, and may contribute to the retention of serCGA in the wild type strain (as opposed to 2+ copies of serTGA) (Discussion). We have not referred to tRNA-Set(CGA) and tRNA-Ser(UGA) as being “in competition” for translating UCG because, in our view, they are better described as synergistic: the translational demand imposed by UCG codons is, presumably, shared between the two types. We have included a new section at the beginning of the Results (“The P. fluorescens SBW25 tRNA gene set”) in which we outline the codons, tRNAs, and matching patterns that are important for understanding the manuscript. This new section, together with the extended Introduction, aims to clarify these issues from the outset.

The effects of our mutations on translation.

As outlined above, UCG codons are expected to be translated by tRNA-Ser(CGA) or tRNA-Ser(UGA) in wild type P. fluorescens SBW25. Deletion of serCGA eliminates tRNA-Ser(CGA), presumably resulting in all UCG codons being translated by tRNA-Ser(UGA). Hence, in a serCGA deletion mutant, a lower proportion of ternary complexes match UCG, leading to an increase in the average time needed to stochastically sample a matching ternary complex (and ultimately slowing translation/growth). As the reviewers correctly surmise, the translational and selective pressure exerted on tRNA-Ser(UGA) by serCGA deletion can be relieved by elevating tRNA-Ser(UGA) levels, through increasing serTGA gene copy number. We have included this explanation in the final Results section and Discussion.

We appreciate that the original manuscript did not sufficiently lay out the relationship between UCG codons and tRNAs. We hope that the new organization and sections of the manuscript clarify these issues, and that the results of our experiments are now more intuitive.

The reviewers bring up some very good points regarding the YAMAT-seq measures. As they point out, the within-strain mature tRNA proportions measured by YAMAT-seq vary about 100,000 fold. Even after removing the three lowest tRNA proportions (which are thought to be affected by post-transcriptional modifications impeding the reverse transcription reaction), the variation in tRNA proportions exceeds reported variations in codon translation rates (Gardin et al., 2014). This highlights that, while the YAMAT-seq results provide a useful overview of the relative amounts of tRNAs in a mature tRNA pool, within-strain comparisons should be treated with caution. The real strength of the YAMAT-seq data lies in detecting differences in tRNA proportions across samples. In our case, this means detected the effects of serCGA deletion and subsequent serTGA duplication on the mature tRNA pool of SBW25. Hence, as the reviewers also point out in Essential revision point 2, interpretations of the YAMAT-seq data should focus on relative proportions (and, hence, relative translation times) across strains. We have altered the YAMAT-seq results and model section to reflect these comments.

4) A number of factors relevant to understand translation rates and tRNA gene evolution are not discussed in sufficient depth, and as early in the manuscript as is necessary. For instance, codon usage doesn't feature in the Results section until much later. So the experimental results seem puzzling until it is clear that the non-essential tRNA gene actually recognizes a codon that is very abundant in the genome. In the Discussion section (subsection “Retention of serCGA in P. fluorescens SBW25 wild type”), selection due to codon bias should be considered as a 4th hypothesis for the retention of tRNA(CGA) (perhaps in combination with hypotheses 1 or 3). Prior work also shows that tRNA modifications can alter the accuracy and efficiency of translation (Grosjean et al., 2010; Bjork and Hagervall, 2014; Manickam et al., 2015). These details are mentioned in passing, but deserve more prominence because they are really critical to set expectations and interpret the results. The focal tRNA species are expected to be modified by cmoAB; if the Pseudomonas strain used here has this modification system, it could explain the observed results: modified tRNA(UGA) can compensate tRNA(CGA) function, whereas the other near cognate tRNAs cannot (it is unclear whether G-U wobble works when G is in the codon and U is in the anticodon). I suggest that the details about codon usage, gene copy numbers of all cognate and near-cognate tRNAs, and relevant modification systems should be presented and clearly discussed at the outset.

We agree with the reviewers, and have changed the layout of the manuscript to ensure early and clear introduction of the suggest concepts and their relevance to our experimental system. Specifically:

1) We have outlined the role of codon-tRNA matching patterns (and, hence, post-transcriptional modification enzymes) and codon use in the Introduction.

2) We have written a new first Results section (“The P. fluorescens SBW25 tRNA gene set”) outlining the key elements of our experimental system, and the relationships between them (e.g., seryl tRNA genes, serine codon use, codon matching patterns). We would like to thank the reviewers for the particularly helpful suggestions regarding the Cmo-mediated modification of tRNA-Ser(UGA), of which we were not previously aware.

5) We agree that the increase in tRNA(UGA) levels probably drove the large duplications observed during evolution. However, there are some points of concern here.

a) Given that the deletions were large, it would be useful to be able to estimate the contribution of tRNA(UGA) to increased fitness. Does deleting the duplicated tRNA(UGA) in evolved isolates reduce fitness, and by how much? Related to this, it was not clear whether there were any other mutations in the evolved lines, and their identity; e.g. were there any promoter mutations in the native copy of the tRNA(UGA) gene?

While we agree that deleting one serTGA copy would be a nice addition to our experiments, this particular genetic manipulation is unlikely to be successful. This is largely due to the instability of large-scale duplication fragments; similar fragments are lost at rates of 1 % in overnight culture (Anderson and Roth, 1977; Reams and Roth, 2015), and ongoing investigations indicate that our duplication fragments are no exception (unpublished data). Duplication fragment loss typically occurs via RecA-mediated, homologous recombination (Reams and Roth, 2015) – the same process that would hypothetically be harnessed during the genetic engineering process to replace a serTGA copy with a ~1 kb deletion fragment. Successfully targeting the ~100 bp serTGA without affecting the remainder of the ~45 kb duplication fragment is highly unlikely. Even in the event of successful manipulation, the desired engineered strain would be expected to quickly lose the remainder of the duplication fragment during growth (given that loss of duplication fragments occurs at high rates, and no selective advantage is expected in the absence of the second serTGA copy).

With the above in mind, we concentrated our resources on demonstrating that an increase in serTGA expression can indeed compensate for serCGA loss. Expression of serTGA from the pSXn plasmid improves growth of the serCGA deletion mutant and, importantly, does not affect the growth of the wild type strain (Figure 3C). This result is central to our work, because it provides empirical evidence that codon UCG actually can be translated by tRNA-Ser(UGA) (Figure 1B), and hence can compensate for tRNA-Ser(CGA) loss. However, we do agree with the reviewers that other genes within the 45 kb duplication fragment could still contribute via unknown mechanisms to the compensatory effect. To reflect this, we have amended the relevant Results section: “…the result that pSXn-based serTGA expression specifically improves the growth rate of ?serCGA demonstrates that serTGA can provide a degree of compensation for serCGA loss. Other genes in the shared 45 kb fragment may nevertheless contribute to compensation, via unidentified mechanisms.”.

Regarding mutations other than the large-scale duplications: only one additional putative mutation was identified among the mutant isolates: M2-L contains a synonymous mutation in carbohydrate metabolism gene edd (in addition to carrying a 192 kb duplication fragment). This mutation is not expected to have major effects on our translational system because (i) it is a synonymous change to an arginine (i.e. not a serine) codon, resulting in no change to the Edd protein sequence, and (ii) it is not found in any other isolate, including M2-Lop that was isolated from the same population at the same time point. This information is now explicitly included in the Results. A number of intergenic differences in homopolymeric tracts and other repeats were called in all/most isolates (mutant and wild type); these have also been previously called in other, unrelated evolution experiments (i.e., they are either mistakes in the published SBW25 genome sequence, or are present in our stored wild type strain) (Beaumont et al., 2009; Gallie et al., 2019). A comprehensive list of all differences called by breseq (Deatherage and Barrick, 2014) and Geneious (v11.1.4), and accompanying descriptions, is in Supplementary file 3.

b) What is the level of overexpression of the tRNA gene on the plasmid (Figure 4)? If this is much more than the 2-fold increase due to gene duplication, it means that we do not know if a 2-fold increase is sufficient to increase fitness. On a related note, I could not see information on the sensitivity of the tRNA-seq method (I might have missed this); this is necessary to know how much confidence to place in the measured fold change values.

Regarding the pSXn plasmid: we do not know the exact level of expression from the pSXn plasmid. We can say that the inserted tRNA gene is placed under IPTG-inducible expression from a strong (tac) promoter (de Boer et al., 1983). We do not add IPTG at any stage of our experiments, meaning that we expect only low level, leaky expression from the repressed promoter (Owen and Ackerley, 2011). However, as the reviewers point out, this expression is likely to be higher than a two-fold increase. We have made this explicit in the interpretation of our results: “While it should be noted that tRNA-Ser(UGA) levels resulting from pSXn-based expression are likely to exceed those resulting from an additional chromosomal copy of serTGA, the result that pSXn-based serTGA expression specifically improves the growth rate of ?serCGA demonstrates that serTGA can provide a degree of compensation for serCGA loss.”. We have also included more details on the plasmid, and construction process (new Materials and methods section “Expression of tRNA genes from the pSXn plasmid”).

To assist with interpreting the strength of the YAMAT-seq assay, we have provided more details (e.g., raw read numbers, the numbers of reads aligning to each tRNA types) within the main text (Results) and Materials and methods. Here, we would like to highlight the considerable weight of the YAMAT-seq data. Firstly, this is a large data set that is consistent across replicates; each of the nine strains were assayed in three biological replicates, and no major differences were seen between tRNA proportions in among replicates (Supplementary file 7). Secondly, a robust statistical analysis that corrects for multiple comparisons was employed to detect differences in relative tRNA expression levels between strains (DESeq2; Anders and Huber, 2010; Love et al., 2014). Importantly, DESeq2 control comparisons revealed no significant differences in tRNA expression levels (i.e., no differences were detected between SBW25 and the Day 13 wild type lineage isolate, or between the two biological replicates of the serCGA deletion strain; Figures 5A, 5B). Thirdly, the differences detected were consistent: serCGA deletion results in (permanent) loss of tRNA-Ser(CGA), and a rise in tRNA-Ser(UGA) was the only consistently significant difference upon duplication of serTGA. Overall, we have confidence in the differences detected in tRNA expression levels between strains. To what extent smaller changes are not detected as significant, we cannot currently say.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Revisions:

1) Please specify your assumption that all tRNAs are fully charged, maybe with reference that this is not always the case and description of what YAMAT-Seq measures. We are not quite satisfied with the response to Essential revision point 3 where the authors were asked to explain and justify the assumptions made to be able to say that the translation rate of a codon is determined solely by the proportion of anticodon in the tRNA pool. In the revised version the envisaged translation system is explained more clearly and key assumptions, such that EF-Tu binds all types of tRNA with equal affinity, are explicitly stated. However, the process of charging of tRNAs by aminoacyl-tRNA synthetases/tRNA-ligases and assumptions about charging levels of tRNAs remains incompletely considered. The authors only refer to mature tRNAs in the text, but I am not sure if this includes both charged (aa-tRNA) and uncharged tRNAs. This leads to a number of questions about experimental data, assumptions and what is included in the model:

a) Does YAMAT-seq measure both charged and uncharged tRNAs?

The YAMAT-seq procedure measures both charged and uncharged mature tRNAs. Specifically, the YAMAT-seq procedure is as follows (Shigematsu et al., 2017; Warren et al., 2020):

1) Total RNA is isolated from growing cells. This includes mRNA, rRNA, and small RNAs such as tRNAs (including pre-tRNAs, mature uncharged tRNAs, and mature charged tRNAs…).

2) Total RNA is treated with Tris-HCl (pH 9.0), removing the amino acids from charged tRNAs. The end result is that (most) mature tRNAs are now uncharged.

3) Y-shaped, DNA/RNA hybrid adapters (Author response image 1A) and T4 RNA ligase 2 are added. The adapters ligate specifically to nucleic acid molecules with a single-stranded, exposed 5'?NCCA?3' end in close proximity with a 3'?inorganic phosphate-5' (i.e., mature, uncharged tRNAs) (Author response image 1B). They do not bind efficiently to (for example) mature charged tRNAs, pre-tRNAs, tRNAs with damaged 5'?NCCA?3' or 3'?Pi?5' ends, or tRNA fragments.

4) Adapter-tRNA complexes are reverse transcribed (RT primer; Illumina), products PCR-amplified (F and R primers; Illumina) and purified from a native polyacrylamide gel.

5) The pool of DNA molecules is deep sequenced with Illumina.

Author response image 1

To clarify the above, a specific statement about charged and uncharged mature tRNAs has been included in the YAMAT-seq results, and some details have been added to the YAMAT-seq method section.

b) Do you assume that all tRNAs are fully charged? This might be reasonable in minimal media (for example Kimberly A Dittmar, Michael A S?rensen, Johan Elf, M?ns Ehrenberg, Tao Pan. Selective charging of tRNA isoacceptors induced by amino-acid starvation. EMBO Rep 2005 Feb;6(2):151-7 and references therein).

The reviewers make a very good point; our model uses our YAMAT-seq data – which measures both charged and uncharged mature tRNAs (see point 1a above) – while assuming that all mature tRNAs are charged.

Charged versus uncharged mature tRNA proportions have been systematically measured for all E. coli tRNA species in two different studies (Avcilar-Kucukgoze et al., 2016; Dittmar et al., 2005). These studies collectively report that in nutrient poor media, all tRNA species are fairly uniformly charged at ~60-90 %. However, during exponential growth in rich medium (LB), charging levels are reported to vary considerably between tRNA species (Avcilar-Kucukgoze et al., 2016). In particular, the four seryl-tRNAs were reported to show very low charging levels (~10 %) under these conditions (see their Figure 1A).

We find this result somewhat surprising. Uncharged tRNAs interact with RelA, a ribosome-associated protein that brings uncharged tRNAs to the ribosomal A site (Winther et al., 2018). In this respect, uncharged tRNAs compete with charged tRNAs (in the form of ternary complexes) for occupation of the ribosomal A site; uncharged tRNAs slow elongation or, at higher levels, activate the stringent response (Goldman and Jakubowski, 1990). The stringent response involves a global shift in gene expression, ultimately resulting in slower growth and division (reviewed in Hauryliuk et al., 2015). In light of the need for rapid growth and division, it seems counterintuitive that uncharged tRNAs would exist in high proportions during exponential growth in rich medium. While Avcilar-Kucukgoze et al. do not discuss the low seryl-tRNA charging levels with respect to the stringent response, we note that E. coli MC4100 – the strain in which the majority of their measurements were made – carries the relA1 mutation, which effectively eliminates uncharged tRNA-based activation of the strignent response (Metzger et al., 1989).

Clearly, a charging rate of only 10 % would violate our assumption that all mature seryl-tRNAs are charged. Reassuringly however, under rapid growth conditions all within-family tRNA types (i.e., those carrying the same amino acid) show similar charging levels under all conditions measured so far (Avcilar-Kucukgoze et al., 2016; Dittmar et al., 2005). If similar, consistently low charging levels exist for SBW25 seryl-tRNAs during YAMAT-seq, our general conclusions are expected to hold given that the model uses only seryl-tRNA proportions (which, if flawed, are presumably consistently so) to estimate elongation times.

We have added an explicit statement of our assumption and its limitations that all tRNAs are charged in the model section of the Results, and outlined why this is a limitation of the model.

The focus on serine is potentially problematic. Serine is one of the most toxic amino acids and it is possible that the reduction in growth rate in the deletion mutants are mainly due to this toxicity rather than a reduced translation rate, which would provide an alternative explanation for why the effect on growth is much smaller in minimal media. The proteose peptone 3 used in KB medium is high in serine (about 12% of total amino acids) suggesting that this might cause toxicity due to the inability of L-serine-deaminase to degrade excess serine after a reduction in ser-tRNA concentration. Addressing this in the final manuscript would make readers aware of this issue, even while pointing out that this explanation may be insufficient because adding a plasmid with the amplified tRNA increases fitness.

If we have understood correctly, the reviewers are suggesting that serCGA deletion may cause an increase in free intracellular serine, the toxicity of which could become problematic in the presence of high amounts of extracellular serine. This could provide an alternative explanation for why the ?serCGA growth defect is seen in KB (serine rich) and not M9 (serine poor) medium. The toxicity of serine is an aspect of our work that we had not fully considered, and we thank the reviewers for the suggestion.

On consideration, we think that while serine toxicity may contribute to the growth defect in KB medium, it is unlikely to account for the entire effect. This is due to three reasons:

1) Excess serine is highly toxic and thus tightly regulated (Avcilar-Kucukgoze et al., 2016; Kriner and Subramaniam, 2019). Studies of the three L-serine deaminases in E. coli show Michaelis constants (Km) in the millimolar range (Burman et al., 2004; Cicchillo et al., 2004), meaning that these L-serine deaminases have a high affinity for L-serine. This is consistent with L-serine deaminases playing a role in preventing serine excess (Zhang and Newman, 2008). Indeed, serine is depleted from rich medium much more rapidly than any other amino acid during bacterial growth (Zhang et al., 2010).

Like E. coli, P. fluorescens SBW25 is predicted to encode three L-serine deaminases (sdaA1/pflu1035, tdcG/pflu4898, sdaA2/pflu5679), and a transmembrane L-serine transporter (sdaC/pflu1034). Hence, SBW25 seems adequately equipped to deal with the (presumably relatively small) amounts of excess serine that may be produced upon the elimination of tRNA-Ser(CGA), a tRNA that accounts for ~1.5 % of the SBW25 tRNA pool (see Supplementary file 7). Notably, none of the L-serine deaminases or the serine transporter are found within any of the duplication fragments (see Supplementary file 4).

2) While serine is indeed toxic for E. coli, it is possible to delete all three genes encoding L-serine deaminases and grow the triple mutant in serine rich medium (Zhang and Newman, 2008). Cells of the triple mutant grown in this manner are deformed during the early stages of growth, indicating effects of serine accumulation on cell growth and division (Zhang et al., 2010; Zhang and Newman, 2008). The misshapen cells have been proposed to result from problematic cell wall synthesis, possibly through the mis-incorporation of serine in the place of alanine in peptidoglycan (Parveen and Reddy, 2017).

We have grown P. fluorescens SBW25 and our serCGA deletion mutants in serine rich (KB) and poor (M9) media, and compared cell phenotypes at various stages of growth (see new Figure 2—figure supplement 1). We have seen some intriguing cell phenotypes that we plan to investigate further – namely, very elongated long cells in the ?serCGA mutants during growth in KB. However, these phenotypes differ from those reported by Zhang et al. for high-level serine toxicity in E. coli cells (Zhang and Newman, 2008).

3) Finally, recent experiments in our laboratory have shown broadly similar results to those reported here with a second, non-seryl tRNA type. Reduction of this second (glycyl) tRNA type leads to a fitness defect that is compensated by large, tRNA-encompassing duplications in a distinct region of the SBW25 chromosome. While the new work is preliminary and ongoing, it strongly indicates that our results are not limited to seryl-tRNAs.

To summarize: due to the combination of the three reasons above, we think it unlikely that serine toxicity is the primary cause of the ?serCGA fitness defect in KB medium. However, serine toxicity may of course still contribute to the observed phenotypes. To reflect this development, we have included a figure of the cell morphologies (Figure 2—figure supplement 1), and a new paragraph in the relevant Results section.

2) Related to this point, in rich media it has been seen that charging levels for serine tRNAs can be very low at below 10% although serine concentration is high (Avcilar-Kucukgoze et al., 2016). This work is new to this reviewer, but this might be of interest to the authors and readers of the article.

Thank you for bringing this article to our attention! We have discussed the work in relation to our YAMAT-seq results (see point 1b above).

References

Burman JD, Harris RL, Hauton KA, Lawson DM, Sawers RG. 2004. The iron-sulfur cluster in the L-serine dehydratase TdcG from Escherichia coli is required for enzyme activity. FEBS Lett576:442–444. doi:10.1016/j.febslet.2004.09.058

Cicchillo RM, Baker MA, Schnitzer EJ, Newman EB, Krebs C, Booker SJ. 2004. Escherichia coli L-serine deaminase requires a [4Fe-4S] cluster in catalysis. J Biol Chem279:32418–32425. doi:10.1074/jbc.M404381200

Dittmar KA, S?rensen MA, Elf J, Ehrenberg M, Pan T. 2005. Selective charging of tRNA isoacceptors induced by amino-acid starvation. EMBO Rep6:151–157. doi:10.1038/sj.embor.7400341

Goldman E, Jakubowski H. 1990. Uncharged tRNA, protein synthesis, and the bacterial stringent response. Mol Microbiol4:2035–2040. doi:10.1111/j.1365-2958.1990.tb00563.x

Hauryliuk V, Atkinson GC, Murakami KS, Tenson T, Gerdes K. 2015. Recent functional insights into the role of (p)ppGpp in bacterial physiology. Nat Rev Microbiol13:298–309. doi:10.1038/nrmicro3448

Metzger S, Schreiber G, Aizenman E, Cashel M, Glaser G. 1989. Characterization of the relA1 mutation and a comparison of relA1 with new relA null alleles in Escherichia coli.J Biol Chem264:21146–21152.

Ott G, Schiesswohl M, Kiesewetter S, F?rster C, Arnold L, Erdmann VA, Sprinzl M. 1990. Ternary complexes of Escherichia coli aminoacyl-tRNAs with the elongation factor Tu and GTP: Thermodynamic and structural studies. BBA – Gene Struct Expr1050:222–225. doi:10.1016/0167-4781(90)90170-7

Parveen S, Reddy M. 2017. Identification of YfiH (PgeF) as a factor contributing to the maintenance of bacterial peptidoglycan composition. Mol Microbiol105:705–720. doi:10.1111/mmi.13730

Plant EP, Nguyen P, Russ JR, Pittman YR, Nguyen T, Quesinberry JT, Kinzy TG, Dinman JD. 2007. Differentiating between near- and non-cognate codons in Saccharomyces cerevisiae. PLoS One2:e517. doi:10.1371/journal.pone.0000517

Varenne S, Buc J, Lloubes R, Lazdunski C. 1984. Translation is a non-uniform process. Effect of tRNA availability on the rate of elongation of nascent polypeptide chains. J Mol Biol180:549–576. doi:10.1016/0022-2836(84)90027-5

Winther KS, Roghanian M, Gerdes K. 2018. Activation of the stringent response by loading of RelA-tRNA complexes at the ribosomal A-site. Mol Cell70:95–105. doi:10.1016/j.molcel.2018.02.033

https://doi.org/10.7554/eLife.57947.sa2

## Article and author information

### Author details

1. #### G?k?e B Ayan

Department of Evolutionary Theory, Max Planck Institute for Evolutionary Biology, Pl?n, Germany
##### Contribution
Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - review and editing
##### Competing interests
No competing interests declared
2. #### Hye Jin Park

1. Department of Evolutionary Theory, Max Planck Institute for Evolutionary Biology, Pl?n, Germany
2. Asia Pacific Center for Theoretical Physics, Pohang, Republic of Korea
##### Contribution
Formal analysis, Visualization, Methodology, Writing - review and editing
##### Competing interests
No competing interests declared
3. #### Jenna Gallie

Department of Evolutionary Theory, Max Planck Institute for Evolutionary Biology, Pl?n, Germany
##### Contribution
Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
##### For correspondence
gallie@evolbio.mpg.de
##### Competing interests
No competing interests declared

### Funding

#### Max Planck Society

• G?k?e B Ayan
• Hye Jin Park
• Jenna Gallie

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

### Acknowledgements

This work was supported by the Max Planck Society (all authors). The authors thank Gunda Dechow-Seligmann for technical assistance and Sven Künzel for assistance with troubleshooting the YAMAT-seq protocol. The?authors?also thank Frederic Bertels for discussions during development of the YAMAT-seq analysis, and the gift of the pSXn plasmid. The?authors thank the anonymous reviewers, Frederic Bertels, Anuradha Mukherjee, Devika Bhave, and Bilal Haider, for their?helpful comments on the manuscript.

### Senior Editor

1. Patricia J Wittkopp, University of Michigan, United States

### Reviewing Editor

1. Vaughn S Cooper, University of Pittsburgh, United States

### Publication history

2. Accepted: October 29, 2020
3. Accepted Manuscript published: October 30, 2020 (version 1)
4. Version of Record published: November 12, 2020 (version 2)

? 2020, Ayan et al.

## Metrics

• 749
Page views
• 129
• 0
Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

A two-part list of links to download the article, or parts of the article, in various formats.

### Open citations (links to open the citations from this article in various online reference manager services)

1. Evolutionary Biology
2. Microbiology and Infectious Disease

# Dynamically evolving novel overlapping gene as a factor in the SARS-CoV-2 pandemic

Chase W Nelson et al.
Research Article Updated

Understanding the emergence of novel viruses requires an accurate and comprehensive annotation of their genomes. Overlapping genes (OLGs) are common in viruses and have been associated with pandemics but are still widely overlooked. We identify and characterize ORF3d, a novel OLG in SARS-CoV-2 that is also present in Guangxi pangolin-CoVs but not other closely related pangolin-CoVs or bat-CoVs. We then document evidence of ORF3d translation, characterize its protein sequence, and conduct an evolutionary analysis at three levels: between taxa (21 members of Severe acute respiratory syndrome-related coronavirus), between human hosts (3978 SARS-CoV-2 consensus sequences), and within human hosts (401 deeply sequenced SARS-CoV-2 samples). ORF3d has been independently identified and shown to elicit a strong antibody response in COVID-19 patients. However, it has been misclassified as the unrelated gene ORF3b, leading to confusion. Our results liken ORF3d to other accessory genes in emerging viruses and highlight the importance of OLGs.

1. Evolutionary Biology
2. Genetics and Genomics

# Innovation of heterochromatin functions drives rapid evolution of essential ZAD-ZNF genes in Drosophila

Bhavatharini Kasinathan et al.
Research Article

Contrary to dogma, evolutionarily young and dynamic genes can encode essential functions. We find that evolutionarily dynamic ZAD-ZNF genes, which encode the most abundant class of insect transcription factors, are more likely to encode essential functions in Drosophila melanogaster than ancient, conserved ZAD-ZNF genes. We focus on the Nicknack ZAD-ZNF gene, which is evolutionarily young, poorly retained in Drosophila species, and evolves under strong positive selection. Yet we find that it is necessary for larval development in D. melanogaster. We show that Nicknack encodes a heterochromatin-localizing protein like its paralog Oddjob, also an evolutionarily dynamic yet essential ZAD-ZNF gene. We find that the divergent D. simulans Nicknack protein can still localize to D. melanogaster heterochromatin and rescue viability of female but not male Nicknack-null D. melanogaster. Our findings suggest that innovation for rapidly changing heterochromatin functions might generally explain the essentiality of many evolutionarily dynamic ZAD-ZNF genes in insects.