# Investigating the trade-off between folding and function in a multidomain Y-family DNA polymerase

1. Department of Chemistry, State University of New York at Stony Brook, United States
2. Department of Biomedical Sciences, College of Medicine, Florida State University, United States
Research Article

## Abstract

The way in which multidomain proteins fold has been a puzzling question for decades. Until now, the mechanisms and functions of domain interactions involved in multidomain protein folding have been obscure. Here, we develop structure-based models to investigate the folding and DNA-binding processes of the multidomain Y-family DNA polymerase IV (DPO4). We uncover shifts in the folding mechanism among ordered domain-wise folding, backtracking folding, and cooperative folding, modulated by interdomain interactions. These lead to ‘U-shaped’ DPO4 folding kinetics. We characterize the effects of interdomain flexibility on the promotion of DPO4–DNA (un)binding, which probably contributes to the ability of DPO4 to bypass DNA lesions, which is a known biological role of Y-family polymerases. We suggest that the native topology of DPO4 leads to a trade-off between fast, stable folding and tight functional DNA binding. Our approach provides an effective way to quantitatively correlate the roles of protein interactions in conformational dynamics at the multidomain level.

## Introduction

Our understanding of protein folding has been deepened by intensive experimental, theoretical, and computational studies focused on single-domain proteins or isolated domains of multidomain proteins (Jackson, 1998). However, it is widely recognized that throughout all three kingdoms of life, proteins occur predominately in multidomain forms (Apic et al., 2001; Ekman et al., 2005). As their name indicates, multidomain proteins consist of more than one structural building unit, or domain (Teichmann et al., 1999). Domains themselves have a strong tendency to fold (Murzin et al., 1995), but although there is high structural modularity in a multidomain protein (Han et al., 2007), the folding of a multidomain protein usually takes a more complex form than a simple sum of folding of individual domains (Levy, 2017). The key component in the folding of a multidomain protein is the interaction of domain interfaces or linkers, which have been found to play nonuniversal roles in modulating folding stability (Fast et al., 2009; Bhaskara et al., 2013; Vishwanath et al., 2018), cooperativity (Batey et al., 2005) and kinetics (Osváth et al., 2005; Batey et al., 2006; Batey and Clarke, 2006).

Efficient folding of a multidomain protein is vital not only for providing structural scaffolds for biological function (Vogel et al., 2004), but also for preventing misfolding (Strucksberg et al., 2007). Multidomain proteins, which often possess significant domain interfaces, are more prone to aggregation during folding processes than single-domain proteins (Han et al., 2007; Borgia et al., 2011). It has been suggested that in vivo, multidomain proteins can undergo co-translational folding (Fedorov and Baldwin, 1997), where each domain folds sequentially one-by-one during protein synthesis from the ribosome (Netzer and Hartl, 1997; Frydman et al., 1999). Likewise, a ‘divide-and-conquer’ scenario has been proposed for in vitro multidomain protein folding, where all domains fold independently, followed by coalescence of neighbors (Wang et al., 2012a). In both of these folding scenarios, independent domain folding plays an essential role and is deemed to drive the global folding. At the same time, the role of domain coupling in multidomain protein folding appears to be important, but its complexity means that a definitive conclusion cannot easily be drawn (Batey et al., 2008). A recent computational study of a two-domain serpin elucidated the critical role of the functional binding-related reactive center loop (RCL) in the folding of the protein to distinct structures (Giri Rao and Gosavi, 2018). Folding of the serpin to the metastable active structure, where the RCL is present as an intradomain segment, is faster than folding to the stable latent structure, where the RCL is involved in extensive interactions between domains. Other work using a similar model, however, indicated that removal of interdomain interactions had little effect on the folding cooperativity of a three-domain adenylate kinase (Giri Rao and Gosavi, 2014). Using statistical mechanical models, Sasai and co-workers investigated a variety of multidomain proteins and their circular permutants. Their results showed that domain connectivity and interactions in multidomain proteins determine folding pathways, cooperativity, and kinetics (Itoh and Sasai, 2008; Inanami et al., 2014). However, at present, a unified perspective on the role of interdomain interactions is still missing. Addressing this issue is an important avenue in studies of multidomain protein folding.

From a structural perspective, the effects of neighboring domains in terms of interdomain interactions are fundamental for generating and stabilizing the correct multidomain folds (Jones et al., 2000; Bhaskara and Srinivasan, 2011). On the other hand, overwhelming interdomain interactions may distort domains from the structurally folded units, reducing the efficiency that comes with domain-wise folding. To achieve a ‘speed–stability’ balance, a multidomain protein may optimize the strength of interdomain interactions to simultaneously guarantee efficient folding through the ‘divide-and-conquer’ folding mechanism and successful formation of functional structures with the aid of stabilization from the domain interface. Usually, the relatively weak interdomain interactions trigger domain motions in multidomain proteins, making a pivotal contribution to protein function (Bennett and Huber, 1984; Schulz, 1991; Miyashita et al., 2003). As is now widely recognized, the native folds of proteins may exhibit a certain degree of frustration in favor of functional state switching (Ferreiro et al., 2007; Ferreiro et al., 2014; Whitford and Onuchic, 2015). Therefore, the energetics of interdomain interactions in multidomain proteins may be evolutionarily optimized for making the trade-off between fast, stable folding and efficient, tight substrate binding (Bigman and Levy, 2020). However, it is still unclear how a multidomain protein manages the intricate balance among its interactions to allow simultaneous folding and function. Here, we aim to answer this fundamental question through a computational study of the folding and DNA-binding processes of Sulfolobus solfataricus DNA polymerase IV (DPO4), a prototype Y-family DNA polymerase.

Akin to the other Y-family polymerases, DPO4 consists of a polymerase core with a right-handed architecture, including finger (F), palm (P), and thumb (T) domains, as well as a little figure (LF) domain that is connected to the polymerase core by a flexible linker (Figure 1A; Ling et al., 2001). Structural analysis has shown that many more intradomain contacts than interdomain contacts are present in the apo form of DPO4 (Wong et al., 2008; Appendix 2—table 1). Thermal unfolding experiments have indicated the existence of one intermediate state (Sherrer et al., 2012), which is probably formed by the unfolding of the linker interactions with the domains. These features imply that the four domains of DPO4, though differing in size and topology, are prone to fold independently. Binding of DPO4 to DNA is an essential step in nucleotide incorporation (Fiala and Suo, 2004; Wong et al., 2008). Structural comparisons of DPO4 in apo form and DNA binary form have revealed that significant rotation and translation of the LF domain occur during DNA binding, while the domain structures remain unchanged (Wong et al., 2008). This ‘open-to-closed’ conformational transition in DPO4 is pivotal to the formation of a high-affinity DPO4–DNA complex prior to nucleotide binding and incorporation (Fiala and Suo, 2004; Sherrer et al., 2009). Previous experimental and simulation studies have suggested that the linker plays an important role in this conformational transition (Xing et al., 2009; Sherrer et al., 2012; Chu et al., 2014). Nevertheless, a thorough investigation of the roles played by DPO4 domain interactions in the modulation of DNA binding and protein conformational dynamics is still lacking. More importantly, it remains unclear how the folding and DNA binding of DPO4 can be optimized by the interactions in DPO4.

Figure 1 with 1 supplement see all

Here, we use structure-based models (SBMs) with a comprehensive procedure for parameterizing the strengths of intra- and interdomain interactions to investigate the folding and DNA-binding processes of DPO4. We find a monotonic increase in folding stability led by a decrease in the interdomain interaction strengths for all intermediate states during folding but not the folded states. This further underlines the importance of interdomain connections in maintaining the correct fold. Interestingly, we find that strengthening the interdomain interactions can result in an increase in folding cooperativity and the chances of backtracking (Gosavi et al., 2006). The interplay among folding stability, backtracking, and cooperativity leads to a ‘U-shaped’ interdomain interaction-dependent kinetics. Finally, we quantitatively characterize the role of a flexible domain interface in accelerating the fast DNA (un)binding to DPO4, which probably promotes DNA lesion bypass during DNA synthesis undertaken by a Y-family enzyme. Our results suggest that the topology and interactions of DPO4 have been optimized to achieve its fast folding and tight DNA binding, plausibly by evolutionary pressure. Our systematic investigation of the interactions in a multidomain protein provides a mechanistic understanding of the relationship between protein folding and function at the multidomain level and offers useful guidance for multidomain protein engineering.

## Results

### Effects of interplay between intra- and interdomain interactions on DPO4 folding thermodynamics

SBMs are simplified models based on energy landscape theory (Bryngelson et al., 1995). They include only the interactions in the protein native structure and have proven successful in capturing protein folding mechanisms (Clementi et al., 2000). The essential assumption made by SBMs is that native contacts should determine the protein folding mechanism, which has been confirmed by all-atom simulations (Best et al., 2013). Further systematic comparisons between coarse-grained SBMs and all-atom simulations have also shown high consistency in extensive predictions for the folding of proteins with diverse native topologies (Hu et al., 2017). These features, together with the reliability and computational affordability of SBMs, indicate that they are attractive models for investigating protein folding, especially in the case of complex large proteins. Therefore, in this study, we use a coarse-grained SBM to simulate the folding of DPO4.

To improve the sampling, we use replica-exchange molecular dynamics (REMD) to explore the thermodynamics of DPO4 folding (Sugita and Okamoto, 1999). We apply the default parameter in the SBM, which sets the same strength for all the native contacts ($ρ=ρ0=1.0$; details are presented in Materials and methods). We project the folding free energy landscape onto the fraction of native contacts, $Q$, which has been recognized as a good reaction coordinate for describing folding of the single-domain proteins with two-state manner by means of SBMs (Best and Hummer, 2005; Cho et al., 2006). In addition, there are previous studies using $Q$ to describe the folding of various multidomain proteins (Li et al., 2012; Giri Rao and Gosavi, 2014; Inanami et al., 2014; Tanaka et al., 2015; Giri Rao and Gosavi, 2018), supporting the validity of $Q$ as a reaction coordinate for describing DPO4 folding. However, more precise and detailed description of the multidomain protein folding process may require the involvement of more reaction coordinates. Here, we use the interdomain and total contact $Q$ ($Q?(Inter)$ and $Q?(Total)$) to describe DPO4 folding (Figure 1A). The 2D free energy landscape reveals a complex DPO4 folding process with multiple intermediate states. The existence of these intermediate states in DPO4 folding is in good agreement with the results of temperature-induced unfolding experiments, which revealed more than one transition during the unfolding of DPO4 (Sherrer et al., 2012).

To see how DPO4 folds, we analyze the structures of DPO4 in the (meta)stable states indicated by free energy landscape minima from contact maps (Figure 1—figure supplement 1). This enables us to gain insight into domain and interface formation in DPO4 during folding. We see that $U$ and $N$ are the completely unfolded and fully folded states, with little and fully formed native contacts in DPO4, respectively; $I1$ is an intermediate folding state in which DPO4 has only an unfolded F domain; DPO4 in the $I2$ state further unfolds the T domain from $I1$; in its $I3$ and $I3′$ states, DPO4 possesses a similar $Q?(Total)$ but has a different $Q?(Inter)$. From the contact maps, we find that there is only one folded domain in DPO4 in the $I3$ or $I3′$ state: the LF domain in $I3$ and the P domain in $I3′$. Folding of the P domain can partly stabilize the formation of P–T and P–F interfaces within the DPO4 polymerase core (Silvian et al., 2001; Trincao et al., 2001), leading to an increase in $Q?(Inter)$. By contrast, folding of the LF domain does not trigger any interdomain formation, probably because the LF domain is separated by the flexible linker, far from the DPO4 core (Boudsocq et al., 2001; Ling et al., 2001). Overall, DPO4 folding complies with a ‘divide-and-conquer’ framework (Wang et al., 2012a). Such a folding mechanism obtained from the SBM with default parameter $ρ0$ (the same native contact strength) is probably a consequence of the DPO4 topology (Baker, 2000), which exhibits many more intradomain contacts than interdomain ones (Appendix 2—table 1).

To see how a change in the balance between intra- and interdomain interactions in DPO4 may influence folding, we modulate the relative strength of these interactions in the SBM through the ratio $ρ=?Inter/?Intra$, where $?Intra$ and $?Inter$ are the strengths of the native contacts for intra- and interdomain interactions, respectively. In practice, this is implemented by changing only $?Inter$ while keeping $?Intra$ and the other parameters to the default as they are in a homogeneously weighted SBM (Clementi et al., 2000). Using a reweighting method (Cao et al., 2016; Li et al., 2018), we quantify the free energy landscapes for DPO4 folding by different values ρ. In Figure 1B, we see that changing ρ within a moderate range (0.7–1.3) does not alter the multistate characteristics of DPO4 folding, since the values of $Q?(Total)$ for all the (meta)stable states remain almost the same as those at $ρ0$. Further decreasing or increasing ρ can distort the free energy landscape from that at $ρ0$. In general, this involves an alteration of the DPO4 folding mechanism when the strength of interdomain interactions deviates significantly from the default value.

Change in interactions in proteins, for example, by means of the mutations, pH, and denaturants, can affect folding stability. Likewise, modulation of the strength of the intra- and interdomain interactions may change the stability of different states during DPO4 folding. In Figure 1C, we can see that all the intermediate states of DPO4 exhibit monotonic decreases in folding stability as the interdomain interaction strengths increase ($Δ?Δ?F$ increases as ρ increases), although with different magnitudes. Since DPO4 in its intermediate states forms more intradomain contacts than interdomain ones, increasing the weight of interdomain interactions in the SBM is expected to weaken the folding stability in the intermediate states. DPO4 in the $I2$ state possesses two large-sized P and LF domains, which are distal structural units and do not have interactions in between (Appendix 2—table 1). Strengthening the interdomain interactions relatively weakens the intradomain interactions, thus leading to destabilization of the $I2$ state, which exhibits the most significant decrease among the intermediates as ρ increases. DPO4 in the $I1$ state is stabilized by both intra- and interdomain interactions (Figure 1—figure supplement 1). Thus, increasing the interdomain interaction strength has less of a destabilization effect than that on the $I3$ state, where only one domain in DPO4 is folded. An interesting nonmonotonic ρ-dependent behavior is observed in the $N$ state, where weakening the interdomain interactions (decreasing ρ below ～0.85) can instead lead to a decrease in folding stability. This is probably because the structures of DPO4 in the $N$ state are maintained by cooperation between intra- and interdomain interactions. Increasing the relative weight of intradomain interactions in the SBM (decreasing ρ) cannot always increase the stability of the $N$ state, since the fragile interdomain interactions may not be able to form domain interfaces within the DPO4 native structure.

### Effects of interplay between intra- and interdomain interactions on the DPO4 folding mechanism

The 2D free energy landscape for DPO4 folding with the default SBM parameter $ρ0$ indicates that there are at least two potential folding pathways that go separately from $I3$ and $I3′$ to $I2$ and then to $I1$ (Figure 1A). Increasing and decreasing the strength of the interdomain interactions appear to enhance one of these two pathways (Figure 2—figure supplement 1). Very weak ($ρ=0.5$) or very strong ($ρ=1.5$) interdomain interactions can shift DPO4 folding completely to one pathway (Figure 2A and Figure 2B), resonating with the findings from the 1D free energy landscape that a very high or low ρ can change the DPO4 folding mechanism. We further calculate the averaged $Q?(Inter)$ and $Q?(Total)$ and interestingly find that there are two regions exhibiting an increase followed by a decrease in $Q?(Inter)$ as $Q?(Total)$ increases (Figure 2C). This observation may be a sign of folding ‘backtracking,’ which involves the formation, breaking, and refolding of a subset of native contacts as the protein proceeds from the unfolded to the folded state (Gosavi et al., 2006). As the REMD simulations have broken the kinetic connectivity of the folding trajectory, we perform additional kinetic simulations at constant temperature to assess the backtracking rigorously. Although a temperature ($0.96?Tf$) lower than the folding temperature is used in the kinetic simulations with the aim of generating a sufficient number of folding events, we still observe the backtracking within many of the individual folding trajectories (Figure 2D, Figure 2—figure supplement 2, and Figure 2—figure supplement 3). Further analysis of the number of backtracking shows that backtracking is most probable when the interdomain interaction strength is higher than the default value in the SBM (Figure 2—figure supplement 4).

Figure 2 with 7 supplements see all

To address the origins of backtracking, we extract all the DPO4 structures where $Q?(Inter)$ shows a local maximum versus $Q?(Total)$ in the two backtracking regions and perform a structural analysis by calculating $Q$ for each domain in DPO4. We find that DPO4 in backtracking region I ($Q?(Total)=0.15$–0.35) probably has a folded T domain, while the other domains remain unfolded (bottom left in Figure 2E and Figure 2F). Since region I is located between the U and $I3$ states, we deduce that backtracking is caused mainly by folding and subsequent unfolding of the T domain. On the other hand, DPO4 in backtracking region II ($Q?(Total)=0.4$–0.7) has mainly either folded T and LF domains (top left in Figure 2E and Figure 2F) or a folded core formed from the F, T, and P domains (bottom right in Figure 2E and Figure 2F) during the transition from the $I3$ or $I3′$ state to the $I2$ and $I1$ states. The bimodal distribution of DPO4 structures in region II indicates that backtracking exists within both of the two folding pathways. Therefore, we suggest that the backtracking in DPO4 folding is led by unstable and fast domain folding and unfolding of the small-sized F and T domains (Figure 2—figure supplement 5).

Folding cooperativity measures how synchronous residues in a protein form native-like configurations during folding. It dictates the folding mechanisms of the single-domain proteins in two-state ‘all-or-none’ folding, folding with intermediates, and downhill folding (Mu?oz et al., 2016). Here, we apply a thermodynamic coupling index (TCI) to quantify the folding cooperativity of the domains and interfaces in DPO4. TCI is a measure of the similarity between a pair of intra- and interdomain melting curves during DPO4 unfolding (Sadqi et al., 2006; Sborgi et al., 2015). A large (small) TCI leads to a high (low) degree of synchronous folding between the domains/interfaces and thus a high (low) folding cooperativity (TCI is defined in Materials and methods). In Figure 2G, we can see that there is high cooperative folding in the DPO4 core. This is consistent with the experimental evidence that the conserved DPO4 core is a stable structural unit that unfolds cooperatively (Sherrer et al., 2012). When the interdomain interaction is weakened, DPO4 folding cooperativity decreases (Figure 2H). This trend is confirmed both when all domains/interfaces are taken into account and when only domains are considered. We also perform independent REMD folding simulations for individual domains of DPO4. This corresponds to the extreme case in which the interdomain interactions of the four domains of DPO4 are completely removed, reminiscent of the case $ρ=0$. We observe an increase in the mean TCI (MTCI) as ρ increases from 0 to 0.5, indicating that the presence of interdomain interactions enhances cooperative folding among the four domains of DPO4. Interestingly, the relation between ρ and MTCI is not entirely monotonic, since MTCI decreases slightly after ρ increases beyond ～1.0. This may be due to the following two reasons. (1) Increasing ρ tends to separate the folding curves of the unstable F domain and its associated F–P domain from the other parts of DPO4 (Appendix 3—figure 7). Removing the unfolding curves of the F domain and F–P interface when calculating MTCI can lead to a significant increase in its value as well as in the value of ρ at which MTCI reaches its maximum (Appendix 3—figure 8). (2) Increasing ρ decreases the relative intradomain contributions in an SBM and may have different magnitudes of destabilization of the folding of different domains. Such a decrease in MTCI with increasing ρ can be found in applications to independent folding of individual domains with a reweighting method (Appendix 3—figure 9). Overall, we find that a moderate increase in interdomain interaction strength can significantly increase the folding cooperativity of DPO4 (for values of ρ in the range 0.5–1.0).

### Effects of interplay between intra- and interdomain interactions on DPO4 folding kinetics

Previous studies have suggested that the topology of a protein’s native structure is an important determinant of its folding rate (Plaxco et al., 1998; Koga and Takada, 2001). Interaction heterogeneity, which originates from the amino acid sequence, has also been recognized as affecting folding kinetics (Islam et al., 2004; Calloni et al., 2003; Szczepaniak et al., 2019). To see how the energetic factor in terms of intra- and interdomain interactions in DPO4 affects the folding rate, we perform 100 independent kinetic simulations starting from different unfolded configurations at room temperature $Tr$ for each ρ. In practice, we use the time at which Q reaches 0.75, termed as the first passage time (FPT) to represent the folding rate. We observe a ‘U-shaped’ ρ-dependent mean FPT (MFPT) behavior for DPO4 global folding kinetics (Figure 3A). The value of ρ at which DPO4 global folding is fastest is 0.8, which is lower than the default parameter $ρ0$. There are distinct effects of ρ on the kinetics of domain and interface folding. Intuitively, weakening interdomain interactions (decreasing ρ) should promote domain folding (Figure 3B), since the increased folding stability of individual domains resulting from a relatively strengthening of intradomain interactions should accelerate the folding process (De Sancho et al., 2009; Garbuzynskiy et al., 2013). On the other hand, interdomain folding exhibits ‘U-shaped’ behavior with ρ, similar to global DPO4 folding. When interdomain interactions are weak (ρ in the range 0.5–0.9), the formation of interfaces between domains in DPO4 consumes more time than domain folding and thus is the rate-limiting step. Increasing ρ beyond 0.9 slows down interdomain formation. During DPO4 folding, domain interfaces use the folded domains as templates to proceed further. When ρ is high (≥1.0), corresponding to the case of weak intradomain interactions, domain folding is slow and may become the rate-limiting step (Figure 3B). Therefore, we investigate switching of the bottleneck for DPO4 folding between interface formation and domain folding through modulation of the intra- and interdomain interaction strengths.

Figure 3 with 3 supplements see all

In contrast to domain folding, we find that formation of domain interfaces and the linker show different ρ-dependent behaviors (Figure 3C). The formation of the T–LF interface and the linker are significantly accelerated when ρ increases from 0.5 to 0.8. These two interdomain structures are critical to DPO4–DNA functional binding, where a large-scale ‘open-to-closed’ conformational transition with rearrangement of the T–LF interface and the linker has been observed between apo-DPO4 and DNA-binding DPO4 (Wong et al., 2008). The kinetic result implies that the DNA-binding dynamics of DPO4 may be facilitated at $ρ≤0.8$, where folding of DPO4 to the stable apo form is slowed down.

The folding order of elements in a protein dictates the kinetic folding pathway. Experimental determination of the domain folding order in DPO4 is challenging (Sherrer et al., 2012). To measure the folding order of the domains/interfaces and establish its connection to the interactions in DPO4, we calculate the probability of folding domain/interface $I$ in step $k$, termed the folding order probability $OPkI$, for different values of ρ (Figure 3D–F). We find a high chance of folding the domains prior to forming the domain interface when ρ is very small ($ρ=0.5$, Figure 3D). In particular, domain folding at $ρ=0.5$ approximately follows a deterministic route with T → F → LF → P. Increasing ρ leads to more dispersed distributions of $OPkI$ (Figure 3E and F and Figure 3—figure supplement 1). The degree of dispersion of the $OPkI$ distribution is quantified by calculating its standard deviation $σOPk$ (Figure 3G). We observe a significant decrease in $σOPk$ with increasing ρ, in particular when $ρ≤0.9$. This indicates that the orders of folding and formation of the domains and interfaces become less deterministic as ρ increases, leading to more diverse folding pathways.

We also investigate the effects of temperature on folding kinetics by analyzing kinetic simulations performed at a temperature $0.96?Tf$, which is the optimal temperature for growth of Sulfolobus solfataricus (Figure 2). We find that most of the results obtained at this relatively high temperature, such as the ‘U-shaped’ ρ-dependent folding time and the more (less) deterministic folding order for lower (higher) ρ, are similar to those at room temperature $Tr$. Interestingly, we observe a plateau in MFPT for the range of ρ from 1.2 to 1.5. This is probably due to the complex ρ-dependent behaviors of intra- and interdomain folding kinetics when ρ is high, although the optimum value of ρ for achieving the fastest folding is still 0.8, which is less than the default parameter of the SBM. In addition, the folding order is not the same as that at $Tr$. The LF and P domains probably accomplish folding within the first two steps of DPO4 folding at $0.96?Tf$ when ρ is low (Figure 3). This may contribute to the elimination of backtracking when domains in DPO4 have a strong tendency to fold spontaneously (low ρ).

### Effects of interplay between intra- and interdomain interactions on the DPO4–DNA binding function

As a DNA polymerase (Ohmori et al., 2001), DPO4 synthesizes DNA molecules by assembling nucleotides. An essential step in the action of DPO4 is its DNA-binding process. We investigate the effects of the interactions in DPO4 on its function in terms of DNA binding. To describe DPO4–DNA binding, we construct a ‘double-basin’ SBM by adding to the original SBM the F–LF interdomain contacts in DPO4 and the DPO4–DNA native contacts identified in the crystal of the DPO4–DNA binary structure using a similar protocol proposed previously (Whitford et al., 2007; Wang et al., 2012b; Chu et al., 2014). The ‘double-basin’ SBM takes into account the effects of DNA binding and aims to capture the large-scale ‘open-to-closed’ conformational transition in DPO4 during DNA binding. Here, we assess the efficiency and effectiveness of the DPO4–DNA binding process in terms of thermodynamic stability and kinetic rates.

We use umbrella sampling techniques to calculate the free energy landscape of DPO4–DNA binding for values of ρ ranging from 0.5 to 1.5 (details are presented in Materials and methods and Appendix 1). We find an increase in binding affinity with strengthening interdomain interactions (increasing ρ), and the value at $ρ=0.7$ (simulated $Kd$ = 2.24 nM) is approximately equal to the experimental measurement of 3–10 nM (Figure 4A, details of $Kd$ calculation are presented in Appendix 1) (Fiala and Suo, 2004; Maxwell and Suo, 2012; Raper et al., 2016). This indicates that the interdomain interactions help to stabilize the DPO4–DNA complex. For all ρ, we observe a similar multistate DPO4–DNA binding process, which we divide into four binding states (Figure 4B): the completely unbound states (US) where DPO4 and DNA are widely separated ($dRMS>10.0$?nm), the encounter complexes (EC) where DPO4 initiates DNA binding ($2.5?nm?nm, with only transient native contacts formed), the intermediate binding states (IS) with $dRMS～2.5$?nm, and the bound states (BS) with $dRMS～0.1$?nm. The stabilities of both the BS and IS compared with the US are enhanced by increasing the strength of the interdomain interactions, as evidenced by the free energy landscapes. Besides, we find that during binding, DPO4 itself can have three different forms: apo-DPO4, DNA binary DPO4, and intermediate DPO4 (Figure 4—figure supplement 1). DPO4 in the last form has broken the LF interdomain interactions with an extended linker, serving as a conformational intermediate between apo-DPO4 and binary DPO4. In the IS, DPO4 can interconvert between the apo and intermediate forms. Increasing ρ leads to a higher population of the apo form (Figure 4—figure supplement 1). In the BS, DPO4 is in binary form, although large conformational fluctuations can be observed (Figure 4—figure supplement 1). The structural characteristics of DPO4 in the IS and BS indicate that a large-scale conformational transition of DPO4 from apo and intermediate forms to binary form should occur during binding from the IS to BS.?

Figure 4 with 5 supplements see all

To see how the interactions in DPO4 modulate the kinetics of DNA binding, we focus on the individual transitions between neighboring states during the binding process (Figure 4B). The transition from the US to the EC can be described approximately as a free diffusion of DPO4 to DNA molecules in the bulk. The diffusion coefficient thus controls the kinetics of this process. We perform simulations of free DPO4 in an infinite box for different values of ρ and then calculate the corresponding diffusion coefficient D (Figure 4C and Figure 4—figure supplement 2). Although for low ρ, DPO4 exhibits excessive conformational flexibility (Figure 4—figure supplement 3), which is assumed to increase the hydrodynamic radius of the chain (Huang and Liu, 2009), we find little effect of the structural fluctuations on D. Our results indicate that the free spatial diffusion of a folded large multidomain protein is barely affected by interfacial domain formation. Therefore, the capture rate of DNA by DPO4 in the first step of the binding process should be similar for different interaction strengths of DPO4, leading to ρ-independent observations.

Spatial proximity between DPO4 and DNA does not guarantee a successful transition from the EC to the IS, since DPO4 can dissociate from DNA through thermodynamic fluctuations. To investigate how the interactions of DPO4 influence the transition from the EC to the IS, we perform 200 independent kinetic simulations for different values of ρ. Each simulation starts from random configurations of DPO4 and DNA in the EC and ends when DPO4 binds with the DNA as the IS or when DPO4 completely dissociates from the DNA as the US. We calculate the rates associated with the EC, $KEvo$ and $KDis$, using a kinetic framework proposed previously (Huang and Liu, 2009). The encounter time, which is defined as $KDis/KEvo+1$, measures the time for DPO4 to achieve one successful transition from the EC to the IS. We find that the encounter time increases significantly as ρ increases from 0.5 to 0.7 and then becomes more or less constant for $ρ>0.7$ (Figure 4D). This indicates that the weakly formed and flexible domain interface of DPO4 facilitates the DNA-binding process. We further find a strong correlation between the degree of conformational fluctuation of DPO4, quantified by the root-mean-square fluctuation RMSF, and DNA-binding encounter times. These results confirm the roles of domain interfacial fluctuations of DPO4 led by low ρ in promoting DNA binding.

Direct simulations of the transitions between the IS and the BS are expected to be computationally demanding owing to the high barriers between these two states. Instead, we calculate the kinetic rates by performing so-called frequency-adaptive metadynamics simulations (Wang et al., 2018), for which the computational expense is significantly lower (details are presented in Materials and methods and Appendix 1). We find that the transition times calculated from the metadynamics simulations are strongly correlated with the barrier heights measured by the umbrella sampling simulations for the transitions between the IS and the BS (Figure 4—figure supplements 4 and 5). The consistency between thermodynamics and kinetics resonates with the findings of previous work, where a quantitative relationship between the barrier heights and binding rates in both ordered and disordered protein-binding processes has been established (Cao et al., 2016). We find a monotonic increase in the kinetic times for both of the transitions between the IS and the BS as ρ increases (Figure 4E). In particular, significant deceleration of the transition from the IS to the BS is found as the interdomain interactions in DPO4 become stronger. By contrast, the effect on the transition rates from the BS to the IS led by ρ appear to be minor. We find a strong ρ-dependent conformational distribution of DPO4 in the IS, where DPO4 in apo and intermediate forms is dominant for high and low ρ, respectively. The population of DPO4 in the apo form in the IS hinders the conformational dynamics of transformation of DPO4 to the DNA binary form and thus is disfavored in the binding transition to the BS. On the other hand, DPO4 is almost entirely in the DNA binary form in the BS (Figure 4—figure supplement 1). Escape from the BS should have little dependence on the conformational dynamics of DPO4. Therefore, the transition rates between the IS and the BS are dependent on DPO4 conformational dynamics, which is modulated by its inherent interactions. It is worth noting that when $ρ≥0.8$, the IS state, rather than the BS state, becomes the most stable in DPO4–DNA binding (Figure 4—figure supplement 4), and the transition time from the IS to the BS is much longer than that from the BS to the IS. Although there are quantitative discrepancies between the thermodynamic and kinetic results (Figure 4—figure supplement 5), our results lead to the conclusion that to achieve and maintain the conformation that underpins its biological function, DPO4 has to avoid strong interdomain interactions, even when they are purely native.

## Discussion

Thermal denaturation experiments revealed that unfolding of truncated DPO4 mutants, such as the DPO4 core and LF domains, proceeds via cooperative processes (Sherrer et al., 2009). Stopped-flow F?rster resonance energy transfer (FRET) studies monitoring intradomain (Raper and Suo, 2016) and interdomain (Xu et al., 2009; Maxwell et al., 2014) conformational dynamics of DPO4 during DNA binding as well as nucleotide binding and incorporation revealed weak and strong interactions between and within each domain of DPO4, respectively. Using only topological information, an SBM with homogeneously weighted native contacts predicted a ‘divide-and-conquer’ domain-wise folding scenario for DPO4 folding (Wang et al., 2012a;?Chu et al., 2020). These results suggested that domains in DPO4 can fold without any aid from other domains. Previous studies showed that many multidomain proteins have stable domains that can fold independently (Scott et al., 2002; Steward et al., 2002; Robertsson et al., 2005). From a structural perspective, these proteins exhibit a lack of densely packed domain interfaces, so their interdomain interactions should be minimal (Han et al., 2007). Nevertheless, the relatively weak interdomain interactions may still play an important role in the multidomain protein folding process, since many proteins with independently folding domains are found not to fold in a ‘sum of the parts’ manner (Levy, 2017). Here, we have investigated the effects of the interplay between intra- and interdomain interactions in DPO4 on the thermodynamics and kinetics of folding. The incorporation of strength heterogeneity into the intra- and interdomain contact interactions in an SBM has enabled a quantitative investigation into how DPO4 can modulate its inherent interactions to maximize folding efficiency.

We have characterized the critical role of interdomain interactions in controlling the continuum shift of the DPO4 folding mechanism in noncooperative unstable folding, fast folding, and highly cooperative ‘all-or-none’ folding (Figure 5, left panel). Folding of multidomain proteins resembles the binding of proteins (domains) to form complexes. Two extreme cases can be outlined. One is the docking of rigid domains, and the other is the concomitant folding and binding of domains, reminiscent of binding-coupled folding in intrinsically disordered proteins (IDPs) (Sugase et al., 2007; Dyson and Wright, 2002). Which of these two mechanisms is involved in multidomain protein folding should depend on the interplay between the folding tendency of domains and the binding strength between domains (Levy et al., 2004; Levy et al., 2005). For DPO4, we have found that a decrease in the interdomain interaction strength from that in the default SBM to a value of $ρ=0.8$ can lead to the fastest DPO4 folding rate. In the presence of weak interdomain interactions, the stability of the on-pathway folding intermediate states is enhanced. This can promote a deterministic folding order of the domains in DPO4 to help eliminate backtracking, which usually acts as a kinetic trap during folding (Capraro et al., 2008; Gosavi et al., 2008). However, if interdomain interactions are very weak, the folded states are destabilized, resulting in failed DPO4 folding. By contrast, strong interdomain interactions tend to couple domain folding and interface formation to fold the DPO4 as a whole. The ‘all-or-none’ DPO4 folding shows high cooperativity that disfavors the formation of intermediate states, so the folding order of DPO4 domains is not clearly defined. Besides, the domains in DPO4 in the presence of strong interdomain interactions are destabilized by the relatively weak intradomain interactions, which induces more backtracking. Collectively, the fast folding of DPO4 requires weak interdomain interactions, so that DPO4 can efficiently use the ‘divide-and-conquer’ strategy to fold via modest cooperativity and limited backtracking.

Figure 5

By investigating DPO4–DNA binding, we have addressed the roles of weak interdomain interactions of DPO4 in facilitating DNA binding. Our results show that weak interdomain interactions can induce massive conformational flexibility of DPO4 in favor of anchoring DPO4 to DNA (Figure 5, right panel). The result is reminiscent of the ‘fly-casting’ effect in the binding process (Shoemaker et al., 2000), where the roles of conformational disorder can be appreciated (Huang and Liu, 2009; Levy et al., 2007; Umezawa et al., 2016; Chu and Mu?oz, 2017). Structural analysis has revealed that DPO4 in the IS is not in the DNA binary form (BS in Figure 4B) and thus is functionally inactive. At the same time, DPO4 is almost correctly located at the primer–template junction of the DNA substrate. Therefore, the transition from the IS to the BS is mainly related to the large-scale ‘open-to-closed’ DPO4 conformational transition. Experimentally, the existence of switching from the nonproductive to the productive DPO4–DNA complex has also been observed by single-molecule fluorescence resonant energy transfer (FRET) (Brenlla et al., 2014; Raper et al., 2016) and has been proposed to be essential in completing the DPO4–DNA binding process (Raper et al., 2018). Here, we have found that a decrease in interdomain interaction strength can significantly accelerate the transition from the IS to the BS, thus favoring subsequent DNA replication. On the other hand, an effect of weak interdomain interactions in facilitating the transition from the BS to the IS has also been observed, but tends to be minor. As a prototype Y-family DNA polymerase, DPO4 is capable of catalyzing translesion synthesis (TLS) across various DNA lesions (Ohmori et al., 2001; Sale et al., 2012). Weak interdomain interactions can accelerate both of the transitions between the functionally inactive IS and the active BS. Thus, they promote the function of DPO4 as a polymerase for bypassing DNA damage during TLS, which would otherwise stall DNA synthesis in vivo by high-fidelity DNA polymerases (Kunkel and Bebenek, 2000). It is worth noting that in reality, both very weak and very strong interdomain interactions are not favored in DPO4–DNA binding. Very weak interdomain interactions lead to high conformational flexibility in DPO4 (Figure 4—figure supplement 3), resulting in very-low-affinity binding of DNA, which has been widely observed in IDPs (Wright and Dyson, 1999; Dunker et al., 2001). On the other hand, very strong interdomain interactions quench the conformational dynamics of DPO4, thereby slowing the transitions during the DNA-binding process and eventually populating the binding complex at the nonproductive IS rather than the productive BS.

We see that the default SBM with homogeneously weighted intra- and interdomain interactions can lead to many consistencies in describing the DPO4 folding and DNA–binding processes with experiments. The intermediate state caused by stable individual domain folding during DPO4 folding from the default SBM simulations resonates with the experimentally observed unfolding intermediate (Sherrer et al., 2012). In addition, simulations with the default SBM have described a multistate process for DPO4–DNA binding, in good agreement with the results of previous experiments, where the complexity of the processes, including multistep DNA binding and multistate DPO4 conformational dynamics, was revealed (Maxwell et al., 2014; Raper and Suo, 2016; Lee et al., 2017). These features are manifestations that the native topologies of DPO4 and the DPO4–DNA complex are major elements in determining the mechanisms of folding (Clementi et al., 2000) and binding (Levy et al., 2004), respectively. However, there is also evidence indicating that SBM with considering only the topological factors, may not accurately capture the DPO4 conformational dynamics. The SBM simulations with various parameterizations on interdomain interaction strength show that DPO4 folding is not the most efficient when the strengths of intra- and interdomain interactions are in the same weights. Furthermore, the default SBM has led to a significantly enhanced affinity of DPO4–DNA binding and stabilized more on the functionally inactive state IS rather than the functionally active state BS, in contradiction with the experiments. These facts imply that the default SBM fails to result in a fast-folding process of DPO4 and generate the correct functional DPO4–DNA binding. Considering both the DPO4 folding and the DPO4–DNA binding results, we have suggested that the weak interdomain interactions in DPO4 are the key to the trade-off between DPO4 folding and function. The value of ρ for achieving fast folding and efficient DNA (un)binding appears to be below 1.0. By assuming that DPO4 has structurally evolved in nature to optimize its folding and function, we suggest that the SBM with default parameter $ρ0$ overweights the strength of interdomain interactions in its potential. This suggestion seems to be reasonable since within the domains of DPO4, there are a large proportion of conserved hydrophobic residues (Ling et al., 2001), which should form stronger interactions inside domains than the ones between domains. We anticipate that the SBM can be further improved by incorporating the energetic heterogeneity of native contacts (Cho et al., 2009) or by benchmarking available experimental observations and all-atom simulations (Ganguly and Chen, 2011; Jackson et al., 2015).

One interesting observation from our simulations is that DPO4 folds with backtracking, which has been found to be related to the fast and unstable folding followed by the unfolding of the F and T domains. Structural analysis has revealed that the F and T domains are significantly smaller than the corresponding domains in high-fidelity DNA polymerases (Ling et al., 2001). Thus, the interactions between these two domains and DNA are significantly reduced, providing an explanation of the ability of DPO4 to accommodate and to bypass various DNA lesions (Ling et al., 2003; Vaisman et al., 2005). Here, we have suggested that backtracking can help DPO4 to perform TLS by using the structural fluctuations of the domains. This can be undertaken by DPO4 adjusting its binding conformation through opening the active site when it encounters a bulky DNA lesion, and such an adjustment may enhance the binding of the damaged DNA to the DPO4 (Mizukami et al., 2006). For example, DPO4 binds DNA containing benzo[a]pyrene-deoxyguanosine and allows the bulky lesion to be flipped/looped out of the DNA helix into a structural gap between the F and LF domains (Bauer et al., 2007). In addition, the structure of DPO4 with DNA containing 8-(deoxyguanosine-N2-yl)-1-aminopyrene (dG1,8) reveals that the dG moiety of the bulky lesion projects into the cleft between the F and LF domains of DPO4 (Vyas et al., 2015). These structural characteristics, differing from those of DPO4 binding to undamaged DNA, provide the dynamic basis for TLS and are probably favored by the fluctuating F and T domain conformations when binding to DNA with backtracking. However, we also note that when ρ is below 1.0 (weak interdomain interactions), the backtracking is not very populated, indicating limited fluctuations in the F and T domains. This may help to maintain the conformational rigidity of the F domain in the DPO4–DNA binding complex, which has been deemed to contribute to the low-fidelity DNA polymerization of DPO4 (Wong et al., 2008). Therefore, we speculate that DPO4 may optimize the extent of backtracking represented by the conformational fluctuations in the F and T domains to promote the binding of damaged DNA.

Our results have the important implication that DPO4 has naturally evolved to favor simultaneously folding and function. For folding, the high structural modularity in DPO4 has led to the high thermodynamic modularity that allows an efficient ‘divide-and-conquer’ mechanism by significantly reducing the number of configurations that need to be searched during folding (Wang et al., 2012a). Furthermore, the fastest folding of DPO4 is achieved when the interdomain interactions are weaker than they if they were determined simply by topology. This corresponds to the fact that the evolutionary pressure has acted on the DPO4 sequence to place more hydrophobic residues inside domains rather than on the domain interfaces (Ling et al., 2001). For DNA binding, weak interdomain interactions in DPO4 promote functional conformational transition through domain movement and facilitate all transitions throughout the DNA-binding process in favor of DNA replication and lesion bypass. Therefore, the natural evolution of DPO4 requires optimization of its protein sequence to form a structure composed of independently folded domains with hydrophobic cores to handle the folding and DNA-binding processes.

Our theoretical predictions can be potentially assessed by targeted biophysical experiments. In these, it should be straightforward to change the interactions in DPO4 via site-directed mutations and determine their effects on DPO4 folding and DNA binding. Although the positively charged linker has been targeted as one of the widely investigated mutation sites in DPO4 (Sherrer et al., 2012), we argue that ionic interactions can be nonspecific and long-ranged, which significantly changes the direct binding interactions with DNA (Ling et al., 2001). These features would lead to difficulties in delineating the effects of changes in internal interactions on the DPO4 folding and binding processes. In this context, more attractive mutations in DPO4 would be those that disrupt the hydrophobic interactions within the domains to mimic the effect of the weakening of intradomain interactions (i.e. relatively strengthening of interdomain interactions) in theoretical studies. Subsequently, the well-developed experimental approaches for DPO4 can be used to investigate the kinetics and mechanism of DPO4 folding, DNA binding, and nucleotide binding and incorporation. For instances, the melting circular dichroism (CD) spectroscopy used in our previous study for monitoring the temperature-dependent melting of wild-type DPO4 and its various truncation mutants can be easily adapted to examine the alterations of the folding cooperativity and folding order of DPO4’s domains by using site-directed mutagenesis and protein engineering (Sherrer et al., 2012). We expect to see a more cooperative folding of DPO4 with more synchronous melting curves of different DPO4 truncation mutants through weakening the intradomain interactions. In addition, the previously designed stopped-flow and single-molecule FRET experiments revealed a dynamical conformational equilibrium of DPO4 during DNA binding (Xu et al., 2009; Maxwell et al., 2012; Raper and Suo, 2016; Raper et al., 2016). These FRET-based techniques have further measured the kinetic rates of DPO4 interconversion between different states during DNA binding (Raper et al., 2018). The mutations designated to destabilize the intradomain interaction in DPO4 can potentially promote the effective interdomain interaction. Thus, according to our simulations, the conformational dynamics of DPO4 on DNA is expected to slow down because of shifting the equilibrium toward the catalytically incompetent complex. These expectations can be verified through future FRET experiments. Furthermore, the structural determination of DPO4 in complex with a damaged DNA substrate and an incoming nucleotide (Ling et al., 2001; Vaisman et al., 2005; Ling et al., 2004b; Bauer et al., 2007; Ling et al., 2004a; Vyas et al., 2015) can provide insights into the effects of backtracking caused by weakening intradomain interactions via mutations, which disrupt the structures of the ternary complexes.

Our modeling and simulations are applicable to various Y-family DNA polymerases, and we expect similar findings to be obtained for these. This expectation is based on the fact that all Y-family polymerases share structurally conserved architecture and sequence homology (Ling et al., 2001; Silvian et al., 2001; Trincao et al., 2001; Uljon et al., 2004). For polymerases in other families, the results of applying our approach may be substantially different, because these polymerases possess significantly different structural architectures to Y-family polymerases. For example, the F domains in high-fidelity DNA polymerases are much larger than those in the Y-family enzymes (Ling et al., 2001). A large-scale conformational change in the F domains of replicative DNA polymerases upon nucleotide binding is thought to be responsible for their high-fidelity DNA synthesis (Swan et al., 2009; Johnson, 2010; Prindle et al., 2013; B?benek and Ziuzia-Graczyk, 2018). However, such a change is not observed with Y-family DNA polymerases. These features imply that differences in topology of multidomain polymerases lead to different folding scenarios and different biological functions. We anticipate that models with a wide range of parameters can be applied when investigating other multidomain proteins and can provide a promising way to characterize the trade-off between folding and function. Our results can offer useful guidance for protein design and engineering at the multidomain level.

## Materials and methods

Key resources table

We used an SBM to investigate the folding and DNA binding of DPO4 with molecular dynamics simulations. In SBMs, which are based on funneled energy landscape theory (Bryngelson et al., 1995), it is assumed that it is the native topology of the protein/complex that determines the folding (Clementi et al., 2000) and binding mechanisms (Levy et al., 2004). SBMs can be verified by comparing simulation results with experiment measurements in terms of identifying the intermediate states, $?$ values, folding pathways, etc. (Clementi et al., 2000). Here, we applied a coarse-grained SBM that used one $Cα$ bead to represent one amino acid in DPO4 and three beads to represent the sugar, base, and phosphate groups of one nucleotide in DNA. For the DPO4 folding, we used a plain SBM potential $VSBMapoDPO4$ based on the crystal structure of apo-DPO4 (PDB: 2RDI) (Wong et al., 2008). $VSBMapoDPO4$ is made up of the bonded interactions, including bond stretching, angle bending, and dihedral rotation, as well as the nonbonded interactions (Clementi et al., 2000). To see the effects of intra- and interdomain native interactions on the DPO4 folding and DNA-binding processes, we further introduced a prefactor $?$ in front of the intra- and interdomain nonbonded terms to modulate the strength of the corresponding interaction. Thus, $VSBMapoDPO4$ can be expressed as follows:

$\begin{array}{ll}{V}_{\mathrm{S}\mathrm{B}\mathrm{M}}^{\mathrm{a}\mathrm{p}\mathrm{o}\mathrm{D}\mathrm{P}\mathrm{O}4}& ={V}_{\mathrm{S}\mathrm{B}\mathrm{M}}^{\mathrm{B}\mathrm{o}\mathrm{n}\mathrm{d}\mathrm{e}\mathrm{d}}+{V}_{\mathrm{S}\mathrm{B}\mathrm{M}}^{\mathrm{N}\mathrm{o}\mathrm{n}\mathrm{b}\mathrm{o}\mathrm{n}\mathrm{d}\mathrm{e}\mathrm{d}}\\ & ={V}_{\mathrm{S}\mathrm{B}\mathrm{M}}^{\mathrm{B}\mathrm{o}\mathrm{n}\mathrm{d}\mathrm{e}\mathrm{d}}+{?}_{\mathrm{I}\mathrm{n}\mathrm{t}\mathrm{r}\mathrm{a}}{V}_{\mathrm{S}\mathrm{B}\mathrm{M}}^{\mathrm{I}\mathrm{n}\mathrm{t}\mathrm{r}\mathrm{a}}+{?}_{\mathrm{I}\mathrm{n}\mathrm{t}\mathrm{e}\mathrm{r}}{V}_{\mathrm{S}\mathrm{B}\mathrm{M}}^{\mathrm{I}\mathrm{n}\mathrm{t}\mathrm{e}\mathrm{r}}+{?}_{\mathrm{L}\mathrm{i}\mathrm{n}\mathrm{k}\mathrm{e}\mathrm{r}}{V}_{\mathrm{S}\mathrm{B}\mathrm{M}}^{\mathrm{L}\mathrm{i}\mathrm{n}\mathrm{k}\mathrm{e}\mathrm{r}},\end{array}$

where $VSBMIntra$, $VSBMInter$, and $VSBMLinker$ are the nonbonded potentials for intradomain, interdomain, and linker interactions, respectively. We used a ratio $ρ=?Inter/?Intra$ to control the relative weight of the intra- and interdomain interactions. Since the linker is found to interact extensively with the other four domains in DPO4 (Table S1), we here grouped the interactions of linker into the interdomain interactions, so $?Linker≡?Inter$ was used throughout our simulations.

To explore DPO4 folding, we used replica-exchanged molecular dynamics (REMD) (Sugita and Okamoto, 1999) with the default parameters, where all prefactors $?$ are equal to 1.0 ($ρ=ρ0=1.0$). Reduced units, except for the length unit (nanometers, nm), were used throughout the simulations. The weighted histogram analysis method (WHAM) (Kumar et al., 1992) was then applied to calculate the thermodynamics of DPO4 folding, including heat capacity curves, free energy landscapes, and domain/interface melting curves, among other things. The melting curve was further fitted by a sigmoidal function that provides the (un)folding probability along the temperature $PI?(T)$ (Figure 2—figure supplement 6), where I indicates the individual domain/interface. $PI?(T)$ was then used to calculate $TCI$ with the expression

${\mathrm{TCI}}^{I,J}=-\mathrm{ln}??|\left[{P}^{I}?\left(T\right)-{P}^{J}?\left(T\right)\right]|?,$

and the mean $TCI$ ($MTCI$) was calculated as

$\mathrm{M}\mathrm{T}\mathrm{C}\mathrm{I}=?\mathrm{ln}\phantom{\rule{negativethinmathspace}{0ex}}\left\{\frac{1}{{N}^{I,J}}\sum _{I,J}?|\left[{P}^{I}\left(T\right)?{P}^{J}\left(T\right)\right]|?\right\},$

where $NI,J$ is the summed number of pairs I,J (Sadqi et al., 2006). A large (small) TCI corresponds to high (low) synchronous folding of the domains/interfaces and thus a high (low) folding cooperativity of DPO4.

We used a reweighting method based on the principles of statistical mechanics to efficiently calculate the thermodynamics of DPO4 folding at other values of ρ from the REMD simulations performed at $ρ0$ (Cao et al., 2016; Li et al., 2018). The algorithm was implemented as follows. The probability of one state having potential E and reaction coordinate $r$ with parameters $ρ0$ and ρ at temperature T can be written as

$\begin{array}{ll}p\left(E\left({\rho }_{0}\right),r\right)& =n\left(r\right)\mathrm{exp}\phantom{\rule{negativethinmathspace}{0ex}}\left[?\frac{E\left({\rho }_{0}\right)}{kT}\right],\\ p\left(E\left(\rho \right),r\right)& =n\left(r\right)\mathrm{exp}\phantom{\rule{negativethinmathspace}{0ex}}\left[?\frac{E\left(\rho \right)}{kT}\right],\end{array}$

where $n?(r)$ is the density of states. $n?(r)$ is intrinsic to the system, and thus is independent of ρ (Chu et al., 2013). Therefore, $p?(E?(ρ),r)$ can be calculated by reweighting $p?(E?(ρ0),r)$ as follows:

$p?\left(E?\left(\rho \right),r\right)=p?\left(E?\left({\rho }_{0}\right),r\right)?\mathrm{exp}?\left[-\frac{E?\left(\rho \right)-E?\left({\rho }_{0}\right)}{k?T}\right].$

Since $p?(E?(ρ0),r)$ has been calculated by analyzing the REMD simulations at $ρ0$ with WHAM, we used the above equation to calculate $p?(E?(ρ),r)$. Eventually, the equilibrium properties for different values of ρ (free energy landscapes and averaged $Q?(Inter)$ along with $Q?(Total)$, $TCI$, etc.) can be obtained. The reweighting method has been proven to be effective and accurate in characterizing many other protein dynamics processes, including many-body interactions in protein folding (Ejtehadi et al., 2004), multidomain protein folding (Li et al., 2018), and protein–protein binding (Cao et al., 2016). These processes often require elaborate calibration of parameters in the SBM potentials, so they are always computationally expensive. To verify the results from the reweighting method, we also performed the REMD simulations for four different values of ρ (0.5, 0.8, 1.2, and 1.5) in $VSBMapoDPO4$. The high degree of consistency between the results of these two approaches confirms the reliability of the reweighting method (Appendix 3—figure 2).

To identify the backtracking and calculate the time for DPO4 folding, we performed additional kinetic simulations that ran at constant temperature. For comparing the kinetic results obtained at different ρ, it is essential that the kinetic simulations be performed under identical environmental conditions, that?is, at identical temperatures. Since $Tf$ also changes with ρ, we shifted the folding temperature $Tfρ$ at the parameter ρ to that at $ρ0$ ($Tfρ0$). This was done by inserting the ratio $Tfρ0/Tfρ$ in front of $VSBMapoDPO4$. The rationale for this is based on the fact that the solvent effects in SBMs are linearly dependent on temperature. We examined the folding temperature with the rescaled $VSBMapoDPO4$ for four different values of ρ (0.5, 0.8, 1.2, and 1.5) by REMD simulations and confirmed the validity of our implementation (Appendix 3—figure 3). Furthermore, as the DPO4 folding at $Tf$ is too slow to be represented by the kinetic simulations, we performed the simulations at two temperatures lower than $Tf$. These were the optimal temperature $Tp$ for the growth of Sulfolobus solfataricus and the room temperature $Tr$ of the experiments. The simulation temperatures can be approximately deduced from the linear relation

${T}_{p,r}?\left(\mathrm{sim}\right)\approx {T}_{p,r}?\left(\mathrm{exp}\right)×\frac{{T}_{f}?\left(\mathrm{sim}\right)}{{T}_{f}?\left(\mathrm{exp}\right)},$

where $Tf?(exp)$ and $Tf?(sim)$ are the DPO4 folding temperatures in experiments (369?K) and simulations (1.13, in energy units). $Tf?(sim)$ was characterized as the temperature where the heat capacity curve shows a prominent peak (Appendix 3—figure 1). With the experimental temperatures $Tp?(exp)=353$?K and $Tr?(exp)=300$?K, we obtain the corresponding temperatures in the simulations: $Tp?(sim)=0.96?Tf?(sim)=1.08$ and $Tr?(sim)=0.81?Tf?(sim)=0.92$.

For DPO4–DNA binding, we used a short DNA segment that is present in the binary DPO4–DNA PDB crystal structure (PDB: 2RDJ) (Wong et al., 2008). As DPO4 exhibits a large-scale ‘open-to-closed’ transition from apo to DNA binary structures (Wong et al., 2008), we built a ‘double-basin’ SBM for DPO4 by adding the specific contacts within DPO4 at the binary structure as the potential $VSBMbinaryDPO4$(Chu et al., 2014). These contacts are found to be entirely located at the interface between the F and LF domains. It is worth noting that we did not use this ‘double-basin’ SBM for investigating DPO4 folding, where the potential of the SBM ($VSBMapoDPO4$) has only a ‘single basin’ in the apo-DPO4 structure. This is because (1) in the absence of DNA, DPO4 is mostly in apo form, and the transition rate for DPO4 from apo form to DNA binary form is very slow (Raper and Suo, 2016; Lee et al., 2017), so DPO4 is prone to fold to its apo form without DNA, and (2) the contacts formed at the F–LF interface can be regarded as a consequence of DNA binding (Raper and Suo, 2016), so the formation of these contacts reflects the effect of DNA binding. On the other hand, DNA was kept frozen throughout the simulations. Therefore, the potential for DPO4–DNA binding is expressed as

${V}_{\mathrm{S}\mathrm{B}\mathrm{M}}^{\mathrm{s}\mathrm{y}\mathrm{s}\mathrm{t}\mathrm{e}\mathrm{m}}={V}_{\mathrm{S}\mathrm{B}\mathrm{M}}^{\mathrm{a}\mathrm{p}\mathrm{o}\mathrm{D}\mathrm{P}\mathrm{O}4}+{V}_{\mathrm{S}\mathrm{B}\mathrm{M}}^{\mathrm{b}\mathrm{i}\mathrm{n}\mathrm{a}\mathrm{r}\mathrm{y}\mathrm{D}\mathrm{P}\mathrm{O}4}+{V}_{\mathrm{S}\mathrm{B}\mathrm{M}}^{\mathrm{D}\mathrm{P}\mathrm{O}4?\mathrm{D}\mathrm{N}\mathrm{A}}+{V}_{\mathrm{E}\mathrm{l}\mathrm{c}}^{\mathrm{D}\mathrm{P}\mathrm{O}4?\mathrm{D}\mathrm{N}\mathrm{A}},$

where $VSBMbinaryDPO4$ is the nonbonded potential (native contact) within DPO4 existing only in the DPO4–DNA binary structure, $VSBMDPO4-DNA$ is the native contact potential between DPO4 and DNA, and $VElcDPO4-DNA$ is the electrostatic potential. $VSBMbinaryDPO4$ and $VSBMDPO4-DNA$ are SBM terms and provide the driving forces for the formation of the functional DPO4–DNA complex (Levy et al., 2007). $VElcDPO4-DNA$ is mostly non-native, except when there is a native contact formed by two oppositely charged beads. Electrostatic interactions are known to play important roles in fast 3D diffusion (Raper and Suo, 2016) and in facilitated 1D search on DNA (von Hippel and Berg, 1989).

Only Arg and Lys in DPO4 were modeled to carry one positive charge, while and Asp and Glu in DPO4 and the phosphate pseudo-bead in DNA were modeled to carry one negative charge. In practice, we used the Debye–Hückel (DH) model, which considers the ion screening effect of a solvent to describe electrostatic interactions (Azia and Levy, 2009). The DH model was scaled to the strength at which two oppositely charged beads located at 0.5 nm would form the same strength as the native contact (1.0). This energy balance between the electrostatic and native contact interactions in SBM was previously suggested (Azia and Levy, 2009) and subsequently applied to a wide range of systems (Chu et al., 2012). Furthermore, we also used the rescaled $VSBMapoDPO4$ that led to the same DPO4 folding at room temperature with different values of ρ for investigating DPO4–DNA binding. Finally, we performed the DPO4–DNA binding simulations at room temperature. Details of the models can be found in SI and our previous work (Chu et al., 2012).

We used umbrella sampling simulations to calculate the free energy landscape of DPO4–DNA binding for different values of ρ. For a protein folding SBM, the fraction of native contacts Q is a typical reaction coordinate (Cho et al., 2006) and therefore is usually used for applying the biasing potential. For DPO4–DNA binding, the fraction of interchain native contacts $QDNA$ is an intuitively obvious candidate for performing umbrella sampling. However, as noted elsewhere (Cao et al., 2016; Chu and Wang, 2019), $QDNA$ cannot be used to distinguish among different unbound states, which all have $QDNA=0$. This would lead to great difficulty in accelerating the diffusion stage of binding and thereafter establishing the free energy for the unbound states. Instead, we chose the distance root-mean-square deviation of native contacts between DPO4 and DNA ($dRMS$) to perform the umbrella sampling simulations (Chu and Wang, 2019). $dRMS$ is given by

${d}_{\mathrm{RMS}}=\sqrt{\frac{1}{N}?\sum _{i,j}{\left({r}_{i?j}-{\sigma }_{i?j}\right)}^{2}},$

where N is the number of summed native contacts, $ri?j$ is the distance between pseudo-beads in DPO4 and DNA forming native contacts, and $σi?j$ is the value of the corresponding distance at the native structure. $dRMS$ has a minimum value of 0 nm for the native DPO4–DNA binary structure and deviates from 0 nm as unbinding proceeds.

DPO4–DNA binding was described by the three different processes shown in Figure 4B. The calculations of kinetics were done by performing three additional simulations: free diffusion of DPO4, the transition from the EC to the IS, and the transition between the IS and BS. The free diffusion of DPO4 was characterized by the simulations of free DPO4 for different values of ρ. The simulations for estimating the encounter times at the EC-to-IS transition started from different EC complex configurations (defined by forming one native binding contact between DPO4 and DNA) and ran until the arrival of the IS or the complete dissociation of DPO4 from DNA.

Since a high barrier is detected between the IS and BS (Figure 4A), especially when ρ is high, the corresponding transitions are expected to be very slow. This makes the direct transition between the IS and BS inaccessible by the kinetic simulation. We used a kinetic rate calculation framework based on enhanced sampling simulations (Tiwary and Parrinello, 2013; Salvalaglio et al., 2014). Specifically, we applied frequency-adaptive metadynamics simulations to investigate the transition between the IS and BS (Wang et al., 2018). We calculated the transition times for different values of ρ and found strong correlations with the kinetics inferred from the thermodynamic free energy landscapes (Figure 4—figure supplements 4 and 5). The result showed the consistency between the thermodynamic and kinetic simulations (Wang et al., 2017; Cao et al., 2016). Further details of the frequency-adaptive metadynamics simulations can be found in SI Appendix 1.

## Appendix 1

### Models and simulation protocols

The potential of SBM for DPO4 folding ($VSBMapoDNA$) is made up of the bonded ($VSBMBonded$) and non-bonded ($VSBMNonbonded$) potentials. Both are strongly biasing to the native structure of apo-DPO4 (Wong et al., 2008). The bonded term $VSBMBonded$ has the following expression (Clementi et al., 2000):

${V}_{\mathrm{SBM}}^{\mathrm{apoDPO4}}=\sum _{\mathrm{bonds}}{K}_{r}?{\left(r-{r}_{0}\right)}^{2}+\sum _{\mathrm{angles}}{K}_{\theta }?{\left(\theta -{\theta }_{0}\right)}^{2}+\sum _{\mathrm{dihedrals}}{K}_{?}^{\left(n\right)}?\left[1+c?o?s?\left(n×\left(?-{?}_{0}\right)\right)\right]$

, where the parameters $Kr$, $Kθ$ and $K?$?control the strengths of bond stretching, angle bending and torsional rotation interactions,?respectively. $r$, θ and $?$ are the bond length, angle, and dihedral angle, with a subscript zero representing the values adopted in the native structure. The non-bonded potential $VSBMNonbonded$ is further subdivided into intra-, interdomain, and linker potentials based on the participating residue i and j (Appendix 2—table 1):

${V}_{\mathrm{SBM}}^{\mathrm{Nonbonded}}={?}_{\mathrm{Intra}}?{V}_{\mathrm{SBM}}^{\mathrm{Intra}}+{?}_{\mathrm{Inter}}?{V}_{\mathrm{SBM}}^{\mathrm{Inter}}+{?}_{\mathrm{Linker}}?{V}_{\mathrm{SBM}}^{\mathrm{Linker}}$

, where the prefactor $?$ controls the weight of each interaction term. The non-bonded potential has a Lennard-Jones-like native contact term and a purely repulsive non-native contact term, expressed as follows:

${V}_{\mathrm{SBM}}^{\left(\mathrm{Intra},\mathrm{Inter},\mathrm{Linker}\right)}=\sum _{i

, where $?i?j$ and $?P?P$ controls the weight of native and non-native contact interactions. For native contacts, $σi?j$ is the distance between beads i and j in the native structure. For non-native contacts, $σP?P$ is equal to the diameter of the $Cα$ bead and the associated interactions provide the excluded volume repulsion between the beads in DPO4. The native contact map was generated by the Contacts of Structural Unit (CSU) software (Sobolev et al., 1999).

Length is in the unit of nm and the others are in reduced units for all calculations, so $Kr=10000.0$, $Kθ=20.0$, $K?(1)=1.0$, $K?(3)=0.5$, $?ij=1.0$, $?PP=1.0$ and $σPP=0.4$ nm. We note in the crystal structure of apo-DPO4, there are missing residues 33–39 in the F domain. This usually implies excessive flexibility. Thus, we weakened the strength of bonded interactions within this region to 0.01 and also removed the native contacts involved by this segment. We changed the value of ρ, which is equal to $?Inter/?intra$, to modulate the interplay between the intra- and interdomain interactions. Practically, we set $?Intra=1.0$ and we changed the value of $?Inter≡?Linker$ from 0.5 to 1.5, to change ρ. The default parameters of SBMs have $?Intra=?Intra=1.0$, so $ρ0=1.0$.

The potential of SBM for DPO4–DNA binding is made up of SBM potential of apo-DPO4 ($VSBMapoDPO4$), SBM potentials of the specific native contact of binary DPO4–DNA ($VSBMbinaryDPO4$ and $VSBMDPO4-DNA$) and electrostatic potential between DPO4 and DNA ($VElcDPO4-DNA$). The electrostatic interaction was described by the Debye-Hückel (DH) model:

${V}_{\mathrm{Elc}}^{\mathrm{DPO4}-\mathrm{DNA}}={K}_{\mathrm{coulomb}}?B?\left(\kappa \right)?\frac{{q}_{\mathrm{i}}?{q}_{\mathrm{j}}?e?x?p?\left(-\kappa ?{r}_{\mathrm{ij}}\right)}{{?}_{\mathrm{r}}?{r}_{\mathrm{ij}}}$

, where $B?(κ)$ is the salt-dependent coefficient; qi and qj are the charges of the coarse-grained beads $i$ and $j$, respectively. $?r$ is dielectric constant and was set to 80 throughout the simulations. $κ-1$ is the Debye screening length (in nm?1), which is determined by the salt concentration ($CSalt$, in molar units) with relation: $κ≈3.2CSalt/C0$. $C0$ is the reference molar concentration with $C0=1$ M. Practically, we set $CSalt=0.15$ M and we further rescaled the strengths of the DH model, so that the two oppositely charged beads located at 0.5 nm would form the same strength of native contact (1.0). More details on electrostatic potential can be found here (Azia and Levy, 2009).

Thermodynamic simulations on DPO4 folding were performed at $ρ0$ with the REMD approach (Sugita and Okamoto, 1999). We used 28 replicas with different temperatures ranging from 1.00 to 1.35 and concentrating around folding temperature to ensure sufficient sampling. The neighboring replicas attempted to exchange at every 1000 MD steps following the Metropolis criterion. Each replica ran for $1×109$ MD steps long. We observed reasonable exchanging probabilities between neighboring replicas, which are all higher than 0.2, indicating a good sampling. Finally, WHAM was used to collect all the replicas and calculate the thermodynamics of the DPO4 folding (Kumar et al., 1992).

Folding of individual domains of DPO4 (the F, P, T, and LF domains) was done by extracting the intradomain SBM potential from $VSBMapoDPO4$. For each domain, the SBM potential for folding $VSBMDomain$ has the bonded and non-bonded potentials that have the same expression and parameters within $VSBMapoDPO4$. It is worth noting that the P domain is made up of two sequentially separated segments (residues 1–10 and residues 77–166). We practically added a 5-residue long segment that has all glycines, linking the residue 10 to 77 (Appendix 3—figure 10). Such a long segment only has bond stretching and non-bonded repulsive potential, which has little effect on folding of the P domain. REMD simulations were performed for all the four domains independently.

Kinetic simulations for the DPO4 folding were performed for different ρ under the constant temperatures $Tp$ and $Tr$, respectively. For each ρ, we set up 100 independent simulations with a maximum length of $4×108$ MD steps at $Tp$ and $0.2×108$ MD steps at $Tr$, starting from different unfolded DPO4 conformations. The individual simulations were terminated when DPO4 accomplished folding, or the maximum MD steps was reached. We found that DPO4 can accomplish folding in most of the simulations (more than 91 out of 100 simulations for all ρ.). Finally, the first passage time (FPT) was collected by the criterion that the corresponding Q firstly exceeds 0.75, to represent the folding rate.

We used umbrella sampling to quantify the free energy landscape of DPO4–DNA binding. $dRMS$ of the DPO4–DNA binding native contact was applied as the reaction coordinate and 151 windows range from 0.0 nm to 15.0?nm were practically used. The simulation at each window was performed for $0.2×108$ MD steps. After the first round of umbrella sampling simulation, the probability distributions of $dRMS$ between 3.2 nm and 3.3 nm have little overlap when ρ is large. We therefore added 10 more windows in the range of $3.2～3.3n?m$ to ensure the sufficiency of sampling. Finally, we performed four same sets of umbrella sampling simulations (except that different initial configurations were used for different sets of umbrellas sampling). All the data were collected and analyzed by the WHAM (Kumar et al., 1992).

We calculated the theoretical binding affinity $Kd$ from the binding free energy landscape. We simplified the DPO4–DNA recognition to a two-state association and dissociation process:

$\left[\mathrm{D}\mathrm{P}\mathrm{O}4\right]+\left[\mathrm{D}\mathrm{N}\mathrm{A}\right]\text{?}?\text{?}\left[\mathrm{C}\mathrm{o}\mathrm{m}\mathrm{p}\mathrm{l}\mathrm{e}\mathrm{x}\right]$

, where the 'Complex’ is the (meta)stable binding complex that contains both the BS and IS. The binding affinity $Kd$ can be calculated by:

${K}_{\mathrm{d}}=\frac{\left[\mathrm{DPO4}\right]?\left[\mathrm{DNA}\right]}{\left[\mathrm{Complex}\right]}=\frac{{\left(1-{P}_{\mathrm{b}}\right)}^{2}}{{P}_{\mathrm{b}}}?{\left[\mathrm{DPO4}\right]}_{0}$

, where $Pb$ is the probability of the binding complex and $[DPO4]0$ is the total concentration of DPO4 in the system. In our simulation, we used a periodic cubic box with length of 20 nm, so

${\left[\mathrm{DPO4}\right]}_{0}=\frac{1×{10}^{-3}}{{N}_{\mathrm{A}}×{\left(20×{10}^{-9}\right)}^{3}}?m?o?l/L=0.21×{10}^{-3}?m?o?l/L$

, where $NA$ is the Avogadro constant. $Pb$ can be calculated from the binding free energy landscape with the following expression:

${P}_{\mathrm{b}}=\frac{{\int }_{{d}_{\mathrm{rms}}=0?n?m}^{{d}_{\mathrm{rms}}^{b/u}}e?x?p?\left[-F?\left({d}_{\mathrm{RMS}}\right)/k?{T}_{\mathrm{r}}\right]}{{\int }_{{d}_{\mathrm{rms}}=0?n?m}^{{d}_{\mathrm{rms}}=15?n?m}e?x?p?\left[-F?\left({d}_{\mathrm{RMS}}\right)/k?{T}_{\mathrm{r}}\right]}$

, where $drmsb/u$ separates the binding complex and the dissociative states. In practice, different $drmsb/u$ can lead to different $Kd$. We calculated $Kd$ changing with $drmsb/u$ at different ρ values (Appendix 3—figure 12). We found that $Kd$ does not change significantly within a range of different $drmsb/u$ (3.0–3.5 nm) for separating the IS and BS at $ρ=0.7$, when the theoretical $Kd$ approximates to the experimental $Kd$. We finally set $drmsb/u=3.25$ nm and obtained theoretical $Kd=2.24$ nM, which is close to the experimental $Kd$ of 3–10 nM (Fiala and Suo, 2004; Sherrer et al., 2009).

To calculate the diffusion coefficient of free DPO4, we performed 10 independent simulations starting from folded states of DPO4 for each ρ. Every simulation ran for $1×108$ MD steps. The free diffusion coefficient $D$ of DPO4 was calculated by applying the command '$g?_?m?s?d$' implemented in Gromacs (4.5.7) (Hess et al., 2008).

To calculate the encounter times for the transition from EC to IS, we performed 200 independent simulations starting from the DPO4–DNA configurations at EC, where only one native contact between DPO4 and DNA was formed. The simulation ran until the arrival of IS or complete dissociation of DPO4 from DNA.

To calculate the kinetics between the IS and BS, we used the frequency adaptive metadynamics approach (Wang et al., 2018). Metadynamics has been shown to be not only a powerful enhanced sampling method to efficiently explore the complex free energy landscape (Laio and Parrinello, 2002), but also a reliable approach to estimate the kinetic rate for a basin-to-basin transition (Tiwary and Parrinello, 2013). The acceleration factor by metadynamics was found to be:

$\alpha ?\left({\tau }_{\mathrm{sim}}\right)=?{e}^{{V}_{\mathrm{bias}}?\left(t\right)/k?T}?$

, where $Vbias?(t)$ is the biasing potential and the average is over a metadynamics simulation run up until the simulation time $τsim$. The kinetic calculation by metadynamics simulation is accurate if adding the bias is sufficiently infrequent so that the barrier region is not affected by the biasing potentials (Tiwary and Parrinello, 2013). To overcome the computational expenses laid by the infrequent bias adding, we applied an efficient approach, that?is, frequency adaptive metadynamics simulation (Wang et al., 2018), where a strategy of the fast filling up the basin followed by the infrequent bias near the barrier was proposed.

Practically, we used $dRMS$ as the collective variable to add the biasing potential in metadynamics simulations. The height of the Gaussian biasing potential was set to 1.0 for the transition from BS to IS and changed to 2.0 for the transition from the IS and BS when $ρ≥1.0$. The well-tempered metadynamics simulation was applied with bias factors varying from 2.0 to 100.0 (increasing with ρ) (Barducci et al., 2008). The initial deposition time for adding a bias is set to 1000 MD steps and increased to the maximum of $2×105$ MD steps, leading to the infrequent metadynamics. The details of the frequency adaptive metadynamics simulation can be found here (Wang et al., 2018).

We performed 100 independent frequency adaptive metadynamics simulations for the forward and backward transitions between the IS and BS for each ρ. A simulation for the transition from the IS to BS was deemed to a successful transition event when $dRMS$ is smaller than 0.1 nm, while a simulation for the transition from the BS to IS was deemed to a successful transition even when $dRMS$ is larger than 2.5 nm. The calculation of the transition time $τtrans$ was verified using a Kolmogorov–Smirnov (KS) test (Salvalaglio et al., 2014) to examine whether the cumulative distribution of the transition time obeys a Poisson distribution. We found all the p-values were higher than 0.05, confirming the reliability of the calculation (Salvalaglio et al., 2014).

We also used free energy landscape to estimate the times for the transitions between the IS and BS based on the energy landscape theory (Socci et al., 1996). The calculation procedure of $τtrans*$ is described as follows. In principle, $τtrans*$ can be calculated from the double integral of the free energy landscape projected onto the reaction coordinate (here we used $dRMS$) with the expression (Socci et al., 1996; Chahine et al., 2007):

${\tau }_{\mathrm{trans}}^{*}={\int }_{{d}_{\mathrm{RMS}}}^{{d}_{\mathrm{RMS}}?\left(\mathrm{End}\right)}d?\left({d}_{\mathrm{RMS}}\right)?{\int }_{{d}_{\mathrm{RMS}}?\left(\mathrm{Start}\right)}^{{d}_{\mathrm{RMS}}}d?\left({d}_{\mathrm{RMS}}^{\prime }\right)?\frac{e?x?p?\left[\left(F?\left({d}_{\mathrm{RMS}}\right)-F?\left({d}_{\mathrm{RMS}}^{\prime }\right)\right)/k?{T}_{\mathrm{r}}\right]}{D?\left({d}_{\mathrm{RMS}}\right)}$

where $D?(dRMS)$ is the position-dependent diffusion coefficient. By approximately setting $D?(dRMS)$ as a constant, we calculated the kinetic stability from the free energy landscape for different values of ρ. In practice, we set the thresholds of $dRMS$ for the transition from the IS to BS with $dRMS?(Start)=3.5n?m$?nm and $dRMS?(End)=0.1n?m$?nm?and for the transition from the BS to IS with $dRMS?(Start)=0.0n?m$?nm?and $dRMS?(End)=2.5n?m$?nm.

All the simulations were performed by Gromacs (4.5.7) (Hess et al., 2008) with PLUMED plugin (2.5.0) Tribello et al., 2014 used for implementing umbrella sampling and metadynamics simulations. Langevin dynamics was used with a friction coefficient 1.0τ?1, where τ is the reduced time unit. We applied constraints for all the bonds through LINCS algorithm (Hess et al., 1997), ensuring a 0.001τ time step without any simulation instability. The non-bonded interactions were cut-off at 3.0 nm. We followed a standard protocol suggested by SMOG tools to build the input files for the SBM simulations with Gromacs (Noel et al., 2010).

## Appendix 2

Appendix 2—table 1

## Appendix 3

Appendix 3—figure 1
Appendix 3—figure 2
Appendix 3—figure 3
Appendix 3—figure 4
Appendix 3—figure 5
Appendix 3—figure 6
Appendix 3—figure 7
Appendix 3—figure 8
Appendix 3—figure 9
Appendix 3—figure 10
Appendix 3—figure 11
Appendix 3—figure 12
Appendix 3—figure 13

## References

1. 1
2. 2
3. 3
4. 4
5. 5
6. 6
7. 7
8. 8
9. 9
10. 10
11. 11
12. 12
13. 13
14. 14
15. 15
16. 16
Proteins: molecules defined by their trade-offs (2020)
Current Opinion in Structural Biology 60:50–56.
https://doi.org/10.1016/j.sbi.2019.11.005
17. 17
18. 18
19. 19
20. 20
21. 21
22. 22
23. 23
24. 24
25. 25
26. 26
27. 27
28. 28
29. 29
30. 30
31. 31
32. 32
33. 33
34. 34
Protein folding rates and stability: how much is there beyond size? (2009)
Journal of the American Chemical Society 131:2074–2075.
https://doi.org/10.1021/ja808843h
35. 35
Intrinsically disordered protein (2001)
Journal of Molecular Graphics and Modelling 19:26–59.
https://doi.org/10.1016/S1093-3263(00)00138-8
36. 36
37. 37
38. 38
39. 39
40. 40
Cotranslational protein folding (1997)
Journal of Biological Chemistry 272:32715–32718.
https://doi.org/10.1074/jbc.272.52.32715
41. 41
42. 42
Frustration in biomolecules (2014)
Quarterly Reviews of Biophysics 47:285–363.
https://doi.org/10.1017/S0033583514000092
43. 43
44. 44
45. 45
46. 46
47. 47
48. 48
49. 49
50. 50
51. 51
The folding and evolution of multidomain proteins (2007)
Nature Reviews Molecular Cell Biology 8:319–330.
https://doi.org/10.1038/nrm2144
52. 52
53. 53
54. 54
55. 55
56. 56
57. 57
58. 58
59. 59
60. 60
61. 61
The kinetic and chemical mechanism of high-fidelity DNA polymerases (2010)
Biochimica Et Biophysica Acta (BBA) - Proteins and Proteomics 1804:1041–1048.
https://doi.org/10.1016/j.bbapap.2010.01.006
62. 62
63. 63
64. 64
65. 65
DNA replication fidelity (2000)
Annual Review of Biochemistry 69:497–529.
https://doi.org/10.1146/annurev.biochem.69.1.497
66. 66
67. 67
68. 68
69. 69
70. 70
71. 71
72. 72
73. 73
74. 74
75. 75
76. 76
77. 77
78. 78
79. 79
80. 80
81. 81
82. 82
83. 83
Limited cooperativity in protein folding (2016)
Current Opinion in Structural Biology 36:58–66.
https://doi.org/10.1016/j.sbi.2015.12.001
84. 84
85. 85
86. 86
87. 87
88. 88
89. 89
90. 90
91. 91
92. 92
93. 93
94. 94
Folding of spectrin’s sh3 domain in the presence of spectrin repeats
(2005)
Cellular & Molecular Biology Letters 10:595.
95. 95
96. 96
97. 97
98. 98
99. 99
100. 100
101. 101
Domain motions in proteins (1991)
Current Opinion in Structural Biology 1:883–888.
https://doi.org/10.1016/0959-440X(91)90082-5
102. 102
103. 103
104. 104
105. 105
106. 106
107. 107
108. 108
109. 109
110. 110
Reversible and irreversible unfolding of multi-domain proteins (2007)
Biochimica Et Biophysica Acta (BBA) - Proteins and Proteomics 1774:1591–1603.
https://doi.org/10.1016/j.bbapap.2007.09.005
111. 111
112. 112
113. 113
114. 114
115. 115
116. 116
Current Opinion in Structural Biology 9:390–399.
https://doi.org/10.1016/S0959-440X(99)80053-0
117. 117
118. 118
PLUMED 2: new feathers for an old bird (2014)
Computer Physics Communications 185:604–613.
https://doi.org/10.1016/j.cpc.2013.09.018
119. 119
120. 120
121. 121
Intrinsic disorder accelerates dissociation rather than association (2016)
Proteins: Structure, Function, and Bioinformatics 84:1124–1133.
https://doi.org/10.1002/prot.25057
122. 122
123. 123
124. 124
125. 125
Facilitated target location in biological systems
(1989)
The Journal of Biological Chemistry 264:675–678.
126. 126
127. 127
128. 128
129. 129
130. 130
131. 131
132. 132
133. 133
134. 134
135. 135
136. 136

## Decision letter

1. Yibing Shan
Reviewing Editor; DE Shaw Research, United States
2. Cynthia Wolberger
Senior Editor; Johns Hopkins University School of Medicine, United States
3. Yong Wang
Reviewer; University of Copenhagen, Denmark

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

Using a Go ?-like theoretical model Chu et al. explored the trade-off between strong intra- and inter-domain interactions of DPO4, a Y-family DNA polymerase, and showed that the system reflects a balance between expedient folding of individual domains and a stable inter-domain arrangement that also allows conformational flexibility required for the polymerase's DNA binding. The work represents an early theoretic analysis of folding of multi-domain proteins and their substrate binding.

Decision letter after peer review:

[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]

Thank you for submitting your work entitled "From "divide-and-conquer" to "speed-stability": A trade-off between folding and function in a multi-domain protein" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. The reviewers have opted to remain anonymous.

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife. While recognizing the merits and sophistication of this work as, some of the reviewers are concerned that it is, as is presented in the manuscript, difficult for the more biologically oriented readership both in terms of style and in terms of substance. These reviewers also feel that publishing this work in a more biophysics-centric journal may do it more justice than eLife. At the same time, we would be willing to reconsider our decision if the manuscript is substantially revised to motivate the discussions by more biologically relevant questions, to make it more readable for eLife readership, and to give a better description of the underlying models and the simulations.

Reviewer #1:

This work used a Go model combined with enhances sampling simulations to investigate the folding of DPO4, a multi-domain protein that binds DNA. The results suggest that the relative strength of the inter- and intra-domain interactions determines the folding process, stability of the protein, and its DNA binding kinetics and affinity.

One important weakness of this manuscript, which makes this reviewer's assessment difficult, is its readability. From the main text, it is not very clear to a general reader, even one knowledgeable to protein modelling, what is the underlying physical model and what were the specific simulations. Moreover, the conclusions are presented in a somewhat convoluted way, alien to biologist. At a high level many of the conclusions are intuitive, and perhaps even obvious, e.g. strong inter-domain interactions hinder folding at domain level and leads to occasional unfolding of the domains, thus the backtracking. Overall, I am concerned whether this paper is suitable for eLife.

Reviewer #2:

In this manuscript, the authors use structure-based models to simulate multi-domain protein folding, in order to explore the relationship between inter- and intra-domain interactions during protein folding. Folding of multi-domain proteins is extremely challenging, and has only been seriously studied with simulations in the last few years, which makes the study timely. However, there are some serious issues with the manuscript that need to be addressed before the manuscript could be suitable for publication.

1) The use of English is rather poor. As a result, there are many passages that I cannot understand, making it difficult/impossible to assess the scientific quality of the full study. It may be necessary to consult with a professional scientific writer. Also, there are many statements for which the precision of the phrasing should be improved, such as claims of "perfect" or "proof".

2) As a motivation for the study, the manuscript states that "Until now, the mechanisms and functions of domain interactions on multi-domain protein folding have not been reported yet." However, this is not a fair statement. Specifically, the recent study of Rao and Gosavi, 2018, investigated multi-domain folding with a very similar model.

3) The authors use a re-weighting scheme to study the influence of changes in contact strengths. If this is simply free-energy perturbation, then a single equation could be given, along with a few sentence description. Since all results are derived from free-energy calculations based on re-weighting, there should be a clear description of the method, precisely as employed, in order to fully interpret the results.

4) There are many claims of backtracking, but the figures are not convincing. Specifically, Figure 2C shows that the average number of formed contacts is non-monotonic. However, this could simply arise from there being parallel pathways for folding, and the apparent backtracking could be an artifact of the projection onto Q. Since the simulations are based on REMD simulations, it is not obvious how one could distinguish between pathways, since full folding events are not observed. Demonstrating backtracking requires some form of pathway identification.

5) The Results state "In practice, for all different p, we set the free energy of the state that has the minimally formed total native contacts (Q(Total~ 0:08) to be 0." It is not clear why it is necessary to make this assumption. Couldn't the non-monotonic stability of the native ensemble be an artifact, if this assumption were not valid? As the contact strengths change, if the chosen point were to become less stable, it would shift the entire curve in Figure 1B. If the effect were sufficiently large, then perhaps the native ensemble would not exhibit the U-shape stability.

Reviewer #3:

This manuscript describes molecular dynamics simulations with the so-called Gō-like models that aim to investigate the folding and binding mechanism of a multi-domain protein modulated by inter-domain interactions. Specifically, the study focuses on determining the protein folding and DNA-binding processes of a DNA polymerase IV (DPO4). They observed an optimal DPO4 folding kinetics could be reached as inter-domain interactions were modestly weakened and showed the interesting competition between the fast, stable folding and efficient DNA-binding. This is an excellent study that is well-constructed, precisely executed and nicely presented.

Technically, they built a single-basin Gō model for DPO4 folding and double-basin Gō model for DPO4 binding with a frozen short DNA fragment. They used the classical enhanced sampling techniques, including parallel temperature replica exchanged MD and umbrella sampling, to accelerate the sampling and obtain the free energy profiles for folding and binding process, separately. They played with different strengths of inter-domain interactions ????????er so as to modulate the ratio ?? of ???????????? to ???????????? (the strengths of intra-domain interactions) to investigate the effects of the balance between inter-domain interactions and intra-domain interactions. To avoid extensive sampling of the free energy landscapes in different models, they employed a reweighting method to estimate the folding thermodynamics at a wide range of ?? by only performing REMD simulation with the standard model (at ??0=1.0). I appreciated the thoroughness and the technical sophistication to solidify the strength of the results. I have some questions and a few minor suggestions that they could consider to further strengthen the work.

Given that they aimed to explore the trade-off between folding and function for the same protein, it seems reasonable to investigate both the folding and binding process under the same energy landscape framework. Any reasons for not using a uniform double-basin Gō model for DPO4 in both folding and binding simulations?

They introduced the Debye-Hückel potential to describe the electrostatic interactions between DPO4 and DNA. It is not clear to me that if they also introduced DH potential to describe the intra-DPO4 interactions at the same time. If not, I am a bit concerned this might occur: a few positively charged residues in DPO4 bind at the same time with one negatively charged DNA bead just because these residues in DPO4 cannot feel the charges of others. Please make sure this situation didn't occur in the simulations.

And they also used specific native contacts to model the DPO4-DNA attractive interactions. So, in this binding model they used a hybrid specific LJ potential and a non-specific DH potential to model the DPO4-DNA binding process. This is of course not how real physics works in nature. Would some of the observations in this work be dependent on the choice?

There are at least four free parameters on the interaction strengths in the DPO4-DNA binding models (Materials and methods). It is not clear how the strength of the DH term was determined. And would the change of ????????er break the balance with other interactions? And could this impact the conclusions? Please comment on this.

They performed kinetic simulations of DPO4 folding at the pesudo room temperature ????sim which was identified by rescaling with the ratio of simulated Tf to experimental Tf. Changing the Hamiltonian parameters could change the thermodynamic properties, as they already recognized that "???? also changes with ??". So ????sim may also change with different ??. It seems the kinetic simulations were performed at the corresponding ????sim recalibrated by the temperature shift caused by ?? change. But it is not clear if they did the same in the DPO4-DNA binding kinetic simulations. Please clarify it.

In addition, they compared the DPO4-DNA binding affinities at different ?? with experimental Kd which. But again, the simulated Tf and Tr may shift due to the change of ??. So, does it make sense to compare the ?? or T-dependent affinities with the experimental Kd which was measured at a fixed temperature?

3) They stated that "Since direct simulations on the transition between the IS and BS are computationally impractical due to the high barrier between these two states, we instead tried to infer the kinetic rates from the barrier heights for different ??." But they actually didn't show the inferred kinetic rates in this manuscript, but instead just show the barrier heights in Figure 4E. I understand that they used the barrier height as a proxy of the transition rate based on the Arrhenius equation with a uniform pre-exponential factor. To release the dependence on this assumption and further strengthen the work, they could consider using other enhanced sampling methods with relatively low computational cost, such as frequency-adaptive metadynamics (Wang et al., 2018) and weighted ensemble simulation (Annu Rev Biophys. 2017;46:43-57) etc., to obtain the transition rates.

4) They stated that "We found a monotonic increase of barrier height for both two transitions between the IS and BS as ?? increases (Figure 4E)." Without error estimations, it is hard to judge if the trend for BS→IS barrier increase with ?? is significant or just within the errors. I would strongly suggest they do error estimations and include error bars in the free energy profiles.

5) There is one experimental author involved in this manuscript, so I read this work as an experimental/simulation collaboration, in which the simulations provide valuable predictions for experimental tests and validations. Besides the comparison with experimental Kd, it will strengthen the work by more comparisons. I understand that it is always non-trivial to combine and compare experiments and simulations, but I will appreciate if they could discuss and suggest the possibilities that could be tested by further experiments.

6) Could the conclusions in this manuscript be extended for other multi-domain proteins? Or how general are the conclusions?

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Investigating the trade-off between folding and function in a multidomain Y-family DNA polymerase" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Cynthia Wolberger as the Senior Editor.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (http://www.asadoresteatre.com/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.

Summary:

Using a Go ?-like theoretical model Chu et al. explored the trade-off between strong intra- and inter-domain interactions of DPO4, a Y-family DNA polymerase, and showed that the system reflects a balance between expedient folding of individual domains and a stable inter-domain arrangement that also allows conformational flexibility required for the polymerase's DNA binding. The work represents an early theoretic analysis of folding of multi-domain proteins and their substrate binding.

Revisions:

In comparison to the original submission, the reviewers agree that manuscript has been substantially improved in terms of both presentation and substance. The reviewers suggested several relatively minor revisions:

Results paragraph four indicates that the ratio of inter and intra-domain contacts was varied. How this was implemented was not clear from the text. For example, one could vary one weight, or the other, or one could modulate both, while keeping some other quantity (e.g. total stabilizing energy) constant. Without this clearly defined, it is difficult fully appreciate the potential significance of any trends based on this ratio.

The authors should expand the discussion on the potential experimental consequence of their conclusions. For example, what signature might single molecule force spectroscopy observe as an implication. What other experiments can be conducted in this regard.

The manuscript repeatedly describes effects, relative to the "default" parameterization of a structure-based model. It is not clear what significance the default parameters may have. That is, are the default parameters considered to be an accurate approximation to systems in the cell? If so, how, and to what extent? Perhaps the initial parameterization is far from appropriate for the current application, in which case variations in the ratio of different interactions may correspond to a regime that is not biologically relevant. It is important that the authors present these trends in terms of the physical insights into a biological process.

The authors may further revise their figures and make them even more intuitive. For example, rho could be labelled as relative strength of inter vs. interactions on the figure, so could MTCI. The labelling of Figure 2G is not very clear, and similar issues exist for some other figure panels.

https://doi.org/10.7554/eLife.60434.sa1

## Author response

[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]

While recognizing the merits and sophistication of this work as, some of the reviewers are concerned that it is, as is presented in the manuscript, difficult for the more biologically oriented readership both in terms of style and in terms of substance. These reviewers also feel that publishing this work in a more biophysics-centric journal may do it more justice than eLife.

We agree that our original manuscript was not well organized and written, as it may lead to confusion or is difficult to be understood from the biological perspective. To take this issue seriously, we have made significant changes in the manuscript by following the reviewers’ constructive critiques and useful suggestions. Furthermore, we performed additional simulations to address the reviewers’ concerns. In the end, we feel that all changes we have made in the revised manuscript are important, not only to fully respond to the critiques of yours and reviewers’, but also to make our manuscript more readable to the broad readership of eLife. Therefore, we sincerely thank both you and reviewers for the excellent critiques. Following is the point-to-point response to the reviewers’ comments.

Reviewer #1:

[…]

One important weakness of this manuscript, which makes this reviewer's assessment difficult, is its readability. From the main text, it is not very clear to a general reader, even one knowledgeable to protein modelling, what is the underlying physical model and what were the specific simulations. Moreover, the conclusions are presented in a somewhat convoluted way, alien to biologist. At a high level many of the conclusions are intuitive, and perhaps even obvious, e.g. strong inter-domain interactions hinder folding at domain level and leads to occasional unfolding of the domains, thus the backtracking. Overall, I am concerned whether this paper is suitable for eLife.

We thank reviewer #1 for pointing out the weakness of the original manuscript, which has poor readability, considering the broad readership of eLife. Accordingly, we have significantly revised the manuscript to address reviewer #1’s concerns. A summary of the revision made to address reviewer #1’s concern is described below: (1) A concise but informative description of the models and simulations was added in the main text for ease of reading. (2) We enriched the motivation and discussion of the model parameterization process, which now has evident connections to the biological meanings. (3) We extensively rewrote the Discussion and Conclusions section that is now easily understandable to the readers with a biological background. (4) the rewritten Introduction section includes a clear motivation of the study (this was also suggested by reviewer #2), while the Discussion and Conclusions section now has a clear understanding of the results produced by the simulations. With these four improvements, we aim to demonstrate that our study, which was rigorously performed and thoroughly compared to previous work, experimental results and even obvious intuition, has a clear aspect of novelty. We conducted a quantitative investigation through molecular simulation, for the first time, on the interplay between the intra- and interdomain interactions in modulating the folding and functional binding processes for a multidomain protein. The quantitative results have allowed us to gain an unprecedented understanding of how a multidomain protein handles the folding and substrate binding through its native topology. To summarize the central theme of our work, we have made an important discovery that a minimally frustrated landscape for folding is optimized for the functional binding process at a multidomain protein level. Our findings can be deemed as a demonstration that the action of evolutionary pressures works effectively at the multidomain protein level to produce a native structure for fast folding and efficient function. Last but not least, our theoretical modeling strategies, simulation approaches, analysis tools/methods, and conclusions can be applied to other multidomain proteins.

In conclusion, we believe that the changes we have made in our revised manuscript fully addressed reviewer #1’s concern. We hope he/she will be satisfied with the revised manuscript.

Reviewer #2:

In this manuscript, the authors use structure-based models to simulate multi-domain protein folding, in order to explore the relationship between inter- and intra-domain interactions during protein folding. Folding of multi-domain proteins is extremely challenging, and has only been seriously studied with simulations in the last few years, which makes the study timely. However, there are some serious issues with the manuscript that need to be addressed before the manuscript could be suitable for publication.

As reviewer #2 indicates, one of the primary aims of our work is to explore the effects of the inter- and intra-domain interaction in a multidomain protein on the folding process. This was undertaken by the structure-based models (SBMs), which are based on the protein’s native structure and are advantageous to have an easily tunable parameterization process associated with clear experimental correspondence. We appreciate reviewer #2’s view that our work on multidomain protein folding is timely and technically feasible by the SBM-based molecular dynamics (MD) simulation. This was exactly our motivation for pursuing a theoretical and quantitative investigation of protein native structure in balancing the folding and function at a multidomain level. At the same time, the experimental approach is still limited on this aspect.

Reviewer #2 indicates that he/she has some concerns about the manuscript presentations, results, and conclusions in our original manuscript. We have taken reviewer #2’s comments very seriously and revised our manuscript accordingly. We believe that we have fully addressed reviewer #2’s concerns in the revised manuscript, which shows a more clear and solid study. We thank reviewer #2 for his/her comments and suggestions to improve our work. Our point-to-point responses are described below:

1) The use of English is rather poor. As a result, there are many passages that I cannot understand, making it difficult/impossible to assess the scientific quality of the full study. It may be necessary to consult with a professional scientific writer. Also, there are many statements for which the precision of the phrasing should be improved, such as claims of "perfect" or "proof".

We apologize for not being able to use the English properly, as it has led to difficulties for reading and hindered the understanding of our results (This was also implied by reviewer #1). Per the suggestion, we have asked a professional scientific editing agency for help in polishing and improving our manuscript. Along with the language editing service, we have also rewritten the unclear statements in our original manuscript in order to eliminate any confusion.

2) As a motivation for the study, the manuscript states that "Until now, the mechanisms and functions of domain interactions on multi-domain protein folding have not been reported yet." However, this is not a fair statement. Specifically, the recent study of Rao and Gosavi, 2018, investigated multi-domain folding with a very similar model.

We thank reviewer #2 for pointing out the study by Rao and Gosavi on investigating the folding of the serpin, a two-domain protein. By applying two SBMs that were separately built from two structures of the serpin, Rao and Gosavi found that folding of the metastable active structure is easier and faster than that of the stable latent structure. This is very inspiring and further manifests the validity of SBMs in studying the protein dynamics at the multidomain level.

While appreciating the study by Rao and Gosavi, herein we have used a theoretical modeling strategy with a parameterization of the SBM to quantitatively characterize the essential roles of the interplay between the intra- and interdomain interactions in both the folding and functional binding of DPO4. Our study, as presented in the manuscript, has shown a clear picture of changing in DPO4 folding mechanism modulated by the interdomain interactions. The DPO4 folding mechanism has been found to influence the folding kinetics critically, thus determining the folding pathways reflected by the backtracking and folding rates (This new result has been demonstrated by the additional kinetic simulations). Furthermore, the simulations on the DPO4-DNA binding process have led to an interpretation of how the DPO4 folding mechanism may influence the function (demonstrated by the thermodynamic and kinetic simulations). The detailed SBM parameterization enables us to make a quantitative connection of the DPO4 folding to its binding, thus providing a glimpse of how a multidomain protein handles the stable folding and efficient function at the same time.

In the revision, we modified the motivation of our study in the Introduction section and respectfully cited relevant work published by others. Please see the Introduction section.

“… A recent computational study of a two-domain serpin elucidated the critical role of the functional binding-related reactive center loop (RCL) in the folding of the protein to distinct structures (Rao and Gosavi, 2018). […] Addressing this issue is an important avenue in studies of multidomain protein folding.”

3) The authors use a re-weighting scheme to study the influence of changes in contact strengths. If this is simply free-energy perturbation, then a single equation could be given, along with a few sentence descriptions. Since all results are derived from free-energy calculations based on re-weighting, there should be a clear description of the method, precisely as employed, in order to fully interpret the results.

We agree with reviewer #2’s view. We have added more descriptions of the reweighting method and the explicit forms of equations that would address reviewer #2’s concern. We believe that this would also make the readers understand the technique more easily. Please see the Materials and methods section:

“We used a reweighting method based on the principles of statistical mechanics to efficiently calculate the thermodynamics of DPO4 folding at other values of p from the REMD simulations performed at p0 (Cao et al., 2016; Li et al., 2018). […] The high degree of consistency between the results of these two approaches confirms the reliability of the reweighting method (Appendix 3—figure 2).”

4) There are many claims of backtracking, but the figures are not convincing. Specifically, Figure 2C shows that the average number of formed contacts is non-monotonic. However, this could simply arise from there being parallel pathways for folding, and the apparent backtracking could be an artifact of the projection onto Q. Since the simulations are based on REMD simulations, it is not obvious how one could distinguish between pathways, since full folding events are not observed. Demonstrating backtracking requires some form of pathway identification.

This is indeed a good comment, which has urged us to pursue a rigorous analysis of the DPO4 folding process from the kinetic perspective, in particular on the backtracking. Backtracking is defined as the process with the formation, breaking, and refolding of a subset of native contacts as the protein proceeds from the unfolded to the folded state (Gosavi et al., 2006). As indicated by reviewer #2, the rigorous identification of backtracking should rely on the kinetically undisrupted trajectories, which are not available in the REMD simulations. Therefore, we have performed additional kinetic simulations under constant temperature with different ??, to verify the observation of the backtracking.

The temperature for kinetic simulations was set below the folding temperature Tf of DPO4. As DPO4 folds very slowly at Tf, the full folding is difficult to realize at Tf during the constant temperature simulation by affordable computational efforts. With different ??, we have performed 100 independent simulations at 0.96Tf, starting from different unfolded DPO4 configurations. Each simulation ran until one successful folding event was observed or for a maximum length of 4×108??, where ?? is the reduced time unit. We found that DPO4 successfully folds at least in 91 out of 100 simulations with different ??, leading to sufficient statistics for the pathway analysis.

We still used the same quantity, which is the averaged Q(Inter) along with Q(Total), to describe the backtracking, as it was regarded as a standard criterion to determine the backtracking from the kinetic simulations (Gosavi et al., 2006; Gosavi et al., 2008). However, here the quantity was calculated from the individual folding trajectory rather than the combination of all the trajectories to avoid the possible mixture of the multiple folding pathways. Still, we observed an increase followed by a decrease of Q(Inter) along with Q(Total) in many individual folding trajectories during DPO4 folding as a sign of the backtracking (Figure 2D).

We further examined one trajectory and illustrated the backtracking in the DPO4 folding (Figure 2—figure supplement 3). In Figure 2—figure supplement 3, we found that the backtracking is due to the folding and unfolding of the F and T domains. These two domains during the backtracking initially fold and then unfold because of their weak folding stabilities and fast folding rates. These two domains require other domains as templates to achieve stable folding. The folding of the F and T domain can partly form the relevant interdomain interfaces, leading to an increase of Q(Inter), while the subsequent unfolding of the F and T domains breaks the formation of the relevant interdomain interfaces, leading to a decrease of Q(Inter). This observation in one trajectory provides a structural explanation of the backtracking detected in terms of Q(Inter) and Q(Total) (Figure 2—figure supplement 3D and E).

We have identified the backtracking in all the individual simulations through the way proposed in Figure 2—figure supplement 3C, then collected all the DPO4 structures in the backtracking from the simulations. Finally, we analyzed the intra-domain structures (Figure 2E and F). From the 1D free energy landscape projected onto Q(Total) (Figure 1), we can see that the backtracking mainly occurs between the U and I3 (I3) in Region I and the I3 (I3) and I2 or I1 in Region II. Along with the structural characteristics of DPO4 in the U (unfolded states), I3 (one folded LF domain), I3 (one folded P domain), I2 (folded P and LF domains) and I1 (folded P, T and LF domains), we can attribute the backtracking to the transient, fast folding and unfolding of the F and T domains during the DPO4 folding.

The strength of the interdomain interaction in DPO4 has clear impacts on the backtracking during the DPO4 folding. The most probable backtracking at the region I and region II were observed with ?? at 1.3 and 1.1, respectively.

Since the folding of DPO4 at Tf is difficult to access, we have used a lower temperature at 0.96Tf. Experimentally, differing conditions can lead to different folding pathways. To see the impacts of temperatures on the backtracking during DPO4 folding, we estimated the populations of the backtracking based on the two temperatures that were used in our simulations (0.96Tf and Tr). We can see that from Figure 2—figure supplement 4, decreasing the temperature from 0.96Tf to Tr (0.81Tf) decreases the number of backtracking during DPO4 folding. Considering a higher temperature can lead to more instabilities of the F and T domains in DPO4, we speculate that the backtracking at Tf is more significant than at the lower temperatures (0.96Tf and Tr).

Please see the new Figure 2 and the Results section.

“We further calculate the averaged Q(Inter) and Q(Total) and interestingly find that there are two regions exhibiting an increase followed by a decrease in Q(Inter) as Q(Total) increases (Figure 2C). […] Therefore, we suggest that the backtracking in DPO4 folding is led by unstable and fast domain folding and unfolding of the small-sized F and T domains (Figure 2—figure supplement 5).”

5) The Results state "In practice, for all different p, we set the free energy of the state that has the minimally formed total native contacts (Q(Total~ 0:08) to be 0." It is not clear why it is necessary to make this assumption. Couldn't the non-monotonic stability of the native ensemble be an artifact, if this assumption were not valid? As the contact strengths change, if the chosen point were to become less stable, it would shift the entire curve in Figure 1B. If the effect were sufficiently large, then perhaps the native ensemble would not exhibit the U-shape stability.

We made such an assumption based on the intuition that a protein with the simple SBM potential at the completely unfolded states (Q(Total)=0.0) has no native contact, so the free energy should be entirely contributed by the local interaction energetic term and the entropic term. As these two terms are the same with different ??, so the free energy at Q(Total)=0.0 can be set as the same zero points for different ??.

However, as noted by reviewer #2, this assumption may not be fully guaranteed, as DPO4 never reaches the point Q(Total)=0.0 during the simulations (the lowest value of Q(Total) detected in the simulations is ~0.08). Besides, the sampling at the lowest Q(Total) may be insufficient. This may lead to an imprecise determination of free energy at that point (Q(Total)~0.08).

To rigorously investigate the changes in thermodynamics led by changing the strengths of the interdomain interactions (changing r), we have used the changes of the thermodynamic stabilities of the different folding states with respect to the change from ??0 to ?? in the new analysis. The stability of the protein folding state ???" is defined as the free energy difference between the folded or folding intermediate states and unfolded states with the expression ???" = ??" ? ??#, where the superscript U indicates the unfolded states, the superscript S indicates any of the folding states (I3-I1 and N) and F is the free energy. The folding stability, which does not require a zero energy reference point as we set in the previous analysis, also has a clear experimental correspondence, which can be obtained explicitly (e. g., by Differential Scanning Calorimetry, CD melting spectroscopy, etc.). Therefore, the change of stability at the folding state S led by changing ??, is calculated as ????(??)\$ = ???(??)" ? ???(??!)". We have calculated ????(??)\$ and have plotted a new figure to replace the old Figure 1C. From Figure 1C, we see that most of the conclusions in the original manuscript have been preserved (The only exception is that the relevant discussion on the unfolded states, as the stability of the unfolded state is always 0 with different ??.). It is possibly due to the fact that the unfolded states of DPO4 with a large absence of non-local interactions are minorly affected by changing ??.

Reviewer #3:

[…]

Given that they aimed to explore the trade-off between folding and function for the same protein, it seems reasonable to investigate both the folding and binding process under the same energy landscape framework. Any reasons for not using a uniform double-basin Gō model for DPO4 in both folding and binding simulations?

This is indeed a good question. Crystal structures have revealed that DPO4 is in its apo-form when DNA is not present (Wong et al., 2008). After binding to DNA, DPO4 undergoes a large-scale conformational transition that is mainly related to the rotation of the LF domain to embrace the DNA molecule associated with breaking the T-LF interface and forming the F-LF interdomain interactions (Wong et al., 2008). There are two reasons why we have used two different models for the DPO4 folding and DNA-binding processes based on the experimental evidence. (1) The crystal structures of DPO4 have shown that the isolated DPO4 is in a stable apo-form (A-form), distinct from the DNA-DPO4 binary form (B-form). As temperature increases, DPO4 was found to slightly destabilize its A-form and likely fall into a conformational equilibrium between the A-form, intermediate (I-form), and B-form (Lee et al., 2017). However, further experiments indicated that the isolated DPO4 has a small B-form population and exhibits a very low kinetic rate for the transition from the A- to B-form when DNA is not present (Raper and Suo, 2016). Moreover, there is a disordered loop in the F domain in apo-DPO4. The disorder-to-order transition on this loop can only be observed in the DPO4-DNA binary complex (Wong et al., 2008), thus, was suggested as a hallmark for the DNA binding process (Raper and Suo, 2016). The combined experimental features indicate that apo DPO4 is primarily populated in the A-form with a certain degree of conformational flexibility, while the conformational transition of DPO4 from the A-form to the B-form can only be achieved through DNA binding. (2) Building of the double-basin SBM corresponds to the addition of the F-LF interdomain native contacts from the B-form DPO4, which is induced by DNA binding, to the single-basin SBM. Regarding this, the F-LF interdomain native contacts in the double-basin SBM reflect the effects of the DNA-binding rather than of the DPO4 topology. Therefore, we consider the formation of the F-LF interface as a consequence of DNA-binding, so the relevant native contacts should be present in the double-basin SBM in DPO4-DNA binding rather than the single-basin SBM, which focuses on folding of DPO4 into the A-form without DNA binding. Please see the Materials and methods section:

“For DPO4-DNA binding, we used a short DNA segment that is present in the binary DPO4-DNA PDB crystal structure (PDB: 2RDJ) (Wong et al., 2008). […] This is because (1) in the absence of DNA, DPO4 is mostly in apo form, and the transition rate for DPO4 from apo form to DNA binary form is very slow (Raper and Suo, 2016; Lee et al., 2017), so DPO4 is prone to fold to its apo form without DNA, and (2) the contacts formed at the F-LF interface can be regarded as a consequence of DNA binding (Raper and Suo, 2016), so the formation of these contacts reflects the effect of DNA binding.”

They introduced the Debye-Hückel potential to describe the electrostatic interactions between DPO4 and DNA. It is not clear to me that if they also introduced DH potential to describe the intra-DPO4 interactions at the same time. If not, I am a bit concerned this might occur: a few positively charged residues in DPO4 bind at the same time with one negatively charged DNA bead just because these residues in DPO4 cannot feel the charges of others. Please make sure this situation didn't occur in the simulations.

In our study, we aimed to quantify the effects of the intra- and interdomain interactions on DPO4 folding. The interactions in SBMs are purely native, so they can be easily distinguished and modulated in terms of the intra- and interdomain interactions. This has enabled us to make a quantitative investigation on the effects of changing the intra- and interdomain interaction strengths in DPO4 on the folding process. As the electrostatic interactions are known to be long-ranged, and mostly non-specific/non-native, adding the electrostatic interactions to SBMs would inevitably increase the complexity of the analysis and further hinder us from drawing a definite conclusion. It is noteworthy that the previous studies using the plain SBMs without electrostatic interactions were capable of capturing many characteristics of DPO4 folding produced by the experiments (Wang et al. 2012; J. Chem. Theory Comput. (2020) 16, 2, 1319-1332). Therefore, we approximately ignored the electrostatic interaction in DPO4 folding and did not apply the Debye-Hückel (DH) model to the intra-DPO4 interactions in the simulations. On the other hand, the electrostatic interactions are crucial for the protein-DNA binding processes (performing the fast 3D diffusion close to the diffusion limit, forming the non-specific DPO4DNA binding complex and the functional DPO4-DNA binding complex, etc.), thus were included in the DPO4-DNA binding models.

Regarding the concern raised by reviewer #3, we think that reviewer #3 wants to confirm that there should not be the same charged beads in DPO4 accumulating in a limited space. Spatial accumulation of the same charged beads is unrealistic as the same charged beads would naturally form repulsive electrostatic interactions to push others away. However, the repulsive interactions between the same charged beads are missing in the intra-DPO4 model, and such an effect may be absent. To assess this situation, we have calculated the number of contacts formed by the same charged beads in DPO4 during the umbrella sampling simulations for DNA binding. In Appendix 3—figure 13, the contacts between the same charged DPO4 residues are further classified into the two-body and three-body types. We can see that the two-body (pairwise) contacts were formed with a very low chance indicated by the average and maximum contact numbers. At the same time, DPO4 was almost devoid of the three-body contacts formed by the same charged residues. This features no abnormally high number of same charges accumulating in a limited spatial space. Hence, the intraDPO4 model without electrostatic interactions is appropriate for taking care of this issue, probably due to the nature of native interactions employed in this study.

And they also used specific native contacts to model the DPO4-DNA attractive interactions. So, in this binding model they used a hybrid specific LJ potential and a non-specific DH potential to model the DPO4-DNA binding process. This is of course not how real physics works in nature. Would some of the observations in this work be dependent on the choice?

Specific native contact interactions between DPO4 and DNA are the driving forces in SBM to form the functional binding complex. SBMs by taking account into only native contact interactions have been proved very successful in describing not only the protein folding (Clementi et al., 2000) but also the protein binding (Levy, Wolynes and Onuchic, 2004) processes. The energy landscape theory claims that protein folding and binding occur on the minimally frustrated landscapes (Bryngelson et al., 1995). The non-native interactions have mainly been removed, so the native contact in principle, determines the mechanism of the molecular process. In particular, SBMs have also been used for studying protein-DNA recognition (Levy, Onuchic and Wolynes, 2007; J. Am. Chem. Soc. (2009) 131, 15084?15085) and have provided extensive predictions that are in line with the experiments (e. g., Proc. Natl. Acad. Sci. U.S.A (2011) 108, 17957-17962; Chu and Munoz, 2017). These features are the justifications of using SBMs with specific native contacts between DPO4 and DNA to investigate the binding process.

On the other hand, it is well known that electrostatic interactions are crucial in protein-DNA binding. It was found that the rate of DPO4 anchoring with the DNA is very high close to the diffusion limit, due to the facilitation made by electrostatic interactions (Raper and Suo, 2016). After forming the encounter complex, electrostatic interactions are responsible for promoting the search along with the DNA molecules in terms of 1D sliding (von Hippel and Berg, 1989). For DPO4 with a short DNA segment present, the 1D sliding has been simplified to a short-term 1D translocation process on the DNA (Chu et al., 2014). These 3D and 1D processes of DPO4 binding to DNA are controlled by the nonspecific electrostatic interactions, which should be explicitly considered and have been described by the DH model in SBM used in our study.

As mentioned by reviewer #3, there is a mixture of interactions for the oppositely charged beads that also form the native contacts. These two different types of interactions have different roles in DPO4-DNA recognition, determined by their corresponding interaction characteristics. The native contact interaction term is described by the Lennard-Jones (LJ) potential, which is short-ranged. This term is responsible for the transition from the non-specific search model to specific DNA binding complex, thus weakening or removing the native contact might fail the functional binding transition process (Chu and Munoz, 2017). On the other hand, the electrostatic interaction term is long-ranged. When DPO4 is far from the DNA native binding sites, electrostatic interaction, in principle, acts as the driving force for DPO4 to perform either the 3D diffusion or 1D translocation. Weakening or removing the electrostatic interaction formed by the native contact pairs (termed as the specific electrostatic interaction) would decrease the searching efficiency of the DPO4-DNA recognition.

Intuitively, the presence of the two different interactions formed by the oppositely charged native contact pairs seems unrealistic. In general, the specific protein-DNA binding complex is mainly stabilized by the complementary charges located at the binding interface (Proc. Natl. Acad. Sci. U.S.A (2011) 108, 17957-17962). However, the specific DNA binding site, compared to the random non-specific DNA binding sequence, should have an additional effect (cooperative interaction generated from the recognition sequence) to stabilize the specific protein-DNA binding complex as a termination of the non-specific search (Nat. Commun. (2020) 11, 540). For simplicity, we did not distinguish the specific and non-specific DNA sequences in SBM, so the electrostatic interactions from the different DNA sequences are the same. The interactions from the specific binding sites that are responsible for the binding cooperativity (the capability for switching from the non-specific to specific binding complex) have been included and described as the native contacts in SBM. However, the type of interaction is electrostatic in nature. In other words, the native contacts account for the stabilization effects originated from the specific DNA binding sequences.

Therefore, we would argue that taking into considerations both the short-ranged native contact and long-ranged electrostatic interactions are very important in building a model that is essential to describe the non-specific DNA search process and specific DNA recognition process. Please see the Materials and methods section:

“VbinaryDPO4SBM and VDPO4-DNASBM are SBM terms and provide the driving forces for the formation of the functional DPO4-DNA complex (Levy et al., 2007). VDPO4-DNAElc is mostly non-native, except when there is a native contact formed by two oppositely charged beads. […] Finally, we performed the DPO4-DNA binding simulations at room temperature. Details of the models can be found in SI and our previous work (Chu et al., 2012, 2014).”

There are at least four free parameters on the interaction strengths in the DPO4-DNA binding models (Materials and methods). It is not clear how the strength of the DH term was determined. And would the change of eInter break the balance with other interactions? And could this impact the conclusions? Please comment on this.

We have used the default parameters for the first three SBM potentials (VapoDPO4SBM, VbinaryDPO4SBM and VDPO4-DNASBM), except when the strengths of the intra- and interdomain contacts in VSBMapoDPO4 are modulated with changing ??. The default strength of the specific native contacts from the DPO4-DNA crystal structure is set to 1.0 (in reduced energy unit).

The strength of the DH term is set to be comparable to the LJ contact as suggested by Azia and Levy (J. Mol. Biol. (2009) 393, 527-542). In practice, we have rescaled the pre-factor of the DH term so that the oppositely charged pairs located at 0.5 nm have electrostatic interaction energy of -1.0, which is equal to the native contact strength. This electrostatic and contact strength balance has been widely used in the SBM simulations (e. g., Chu et al 2012; 2014; 2019).

Here, we aimed to see how the interaction of the intra-DPO4 affects the DNA binding process. We have changed the strengths of the intra- and interdomain interactions in DPO4 (VSBMapoDPO4), while the last three potentials, which are related to DNA binding, remain unchanged. In addition, VSBMapoDPO4 has been further rescaled to maintain the same folding temperature of the isolated DPO4 with different ??. Therefore, the overall balance between the four terms in the potential will not change. However, as mentioned by reviewer #3, for individual contacts, the change of ?? will break the previous balance among the native contact within DPO4, DPO4-DNA native contact and electrostatic interaction at the default parameter (??0=1.0). Nevertheless, this is exactly what we want to see: how the particular folding interactions in DPO4 affect DPO4-DNA binding (e. g. how a particular mutation in DPO4 changes the folding and also the DNA binding). We are also aware of the extreme situations where a big change of the intra- and interdomain interaction may significantly change interaction balance at the individual contact level. This will lead to a serious issue as the interaction balance between the folding and binding interaction is entirely broken, leading to an unrealistic model and then questionable results. In this study, we have applied a moderate change of ?? from 0.5 to 1.5, so the balance should have been reasonably maintained at the individual contact level. Please see the Materials and methods section. This has also been included in the above response.

They performed kinetic simulations of DPO4 folding at the pesudo room temperature Tr(sim) which was identified by rescaling with the ratio of simulated Tf to experimental Tf. Changing the Hamiltonian parameters could change the thermodynamic properties, as they already recognized that "Tf also changes with r". So Tr(sim) may also change with different ??. It seems the kinetic simulations were performed at the corresponding Tr(sim) recalibrated by the temperature shift caused by ?? change. But it is not clear if they did the same in the DPO4-DNA binding kinetic simulations. Please clarify it. In addition, they compared the DPO4-DNA binding affinities at different with experimental Kd which. But again the simulated Tf and Tr may shift due to the change of ??. So does it make sense to compare the ?? or T-dependent affinities with the experimental Kd which was measured at a fixed temperature?

We appreciate reviewer #3 for the useful comments regarding the simulation temperature. It is well known that the temperature is a critical factor in SBMs, so we have paid particular attention to setting the simulation temperature in order to make our results reliable and also comparable to the experiments. In general, the temperature in the simulations with SBM, which is an energy landscape-based physical model, does not directly connect to the real one. This is different from the temperature in MD simulations using a general empirical force field (e. g., Amber, CHARMM and OPLS/AA etc.), of which the parameters are determined by the quantum mechanics or experiment measures. Nevertheless, the temperature in the SBM simulation can be approximately connected to the real one based on consistent observations with the experiments, such as the folding temperature and B-factor (Jackson et al., 2015). Here, we have used the folding temperature to establish the connections between the simulations and the experiments.

As also noted by reviewer #3, we have observed that Tf changes with ??. To remove the effect of ?? on the temperature, we have assigned a ratio of Tf??0/Tf?? as the pre-factor of VSBMapoDPO4 based on the assumption that the temperature is linearly dependent on the energy in SBM. Therefore, after this implementation, the temperature scaling of SBMs with different ?? will be the same. We examined the folding temperature under the rescaled VSBMapoDPO4 at four different ??, and confirmed the validity of such implementation (Appendix 3—figure 3). In the end, all the SBMs with different ?? will generate the same folding temperature Tf, thus have the same temperature scaling (the same room temperature Tr and others). The temperature effects on the kinetics (both folding and DNA-binding) led by different ?? does not exist.

In order to compare with experiments, all the folding kinetic simulations were performed at the experimental temperatures. The experimental temperature in simulations has been estimated using the linear relation:

${\mathrm{??}}_{\mathrm{??}}\left(\mathrm{\text{sim}}\right)\approx \phantom{\rule{0.222em}{0ex}}{\mathrm{??}}_{\mathrm{??}}\phantom{\rule{0.222em}{0ex}}\left(exp\right)\frac{{\mathrm{??}}_{\mathrm{??}}\phantom{\rule{0.222em}{0ex}}\left(\mathrm{??}\mathrm{??}\mathrm{??}\right)}{{\mathrm{??}}_{\mathrm{??}}\phantom{\rule{0.222em}{0ex}}\left(\mathrm{??}\mathrm{??}\mathrm{??}\right)}$

Since Tf(sim) are the same for different ??, Tr(sim) will also be the same for different ??. This allowed us to perform the folding of DPO4 with different ?? at the same temperature (corresponding to the same environmental condition). For DPO4-DNA binding, we have also used the rescaled VSBMapoDPO4 in building the DPO4-DNA binding SBM. We have performed the DPO4-DNA binding simulation at the room temperature, where the experimental Kd was measured. As the folding temperatures of DPO4 for different ?? has been shifted to the same by the rescaled VSBMapoDPO4, Tr(sim) are also the same for different ??. This has also led to the same external environmental conditions (fixed temperature) for simulating the DPO4-DNA binding process.

Please see the Materials and methods section. We have done several modifications to make the description clear.

“To identify the backtracking and calculate the time for DPO4 folding, we performed additional kinetic simulations that ran at constant temperature. […] With the experimental temperatures Tp(exp) = 353 K and Tr(exp) = 300 K, we obtain the corresponding temperatures in the simulations: Tp(sim)=0.96 Tf (sim)=1.08 and Tr(sim)=0.81 Tf(sim)=0.92.”

3) They stated that "Since direct simulations on the transition between the IS and BS are computationally impractical due to the high barrier between these two states, we instead tried to infer the kinetic rates from the barrier heights for different ??." But they actually didn't show the inferred kinetic rates in this manuscript, but instead just show the barrier heights in Figure 4E. I understand that they used the barrier height as a proxy of the transition rate based on the Arrhenius equation with a uniform pre-exponential factor. To release the dependence on this assumption and further strengthen the work, they could consider using other enhanced sampling methods with relatively low computational cost, such as frequency-adaptive metadynamics (Wang et al., 2018) and weighted ensemble simulation [Annu Rev Biophys. 2017;46:43-57] etc., to obtain the transition rates.

Reviewer #3 suggested us calculating the “real” kinetic rates rather than inferring the kinetic information from the thermodynamic barrier. He/She also suggested two interesting papers that are helpful to calculate the kinetics from the enhanced sampling simulations. We appreciate his/her comments, which have inspired us to study the rare-event kinetics using reasonable computation resources during the revision. As per suggestion, we have performed additional frequency adaptive metadynamics simulations for DPO4-DNA binding with different ??. We focused on the transition processes between the specific BS and non-specific IS, and then calculated the kinetic times between the IS and BS through the methodologies developed previously (Tiwary and Parrinello, 2013; Salvalaglio, Tiwary and Parrinello, 2014; Wanf et al., 2018).

From Figure 4E, we see an apparent increase of transition time from the IS to BS as ?? increases (interdomain interaction strengthens) and a slight increase of transition time from the BS to IS as ?? increases. This observation of the ??-dependent kinetic trend is similar to that inferred from the thermodynamic barrier.

To see whether the kinetics and thermodynamics are closely linked, we compared the kinetic transition times with the corresponding barrier heights. In Figure 4—figure supplement 4B and C, we found strong correlations between the thermodynamic barriers and kinetic rates for both of the transitions between the IS and BS. This observation has demonstrated that thermodynamics and kinetics can be used to infer each other when it is difficult to measure one of the two in practice (Cao, Huang and Liu, 2016; Wang, Martins and Lindorr-Larrson, 2017).

The further quantitative link between thermodynamics and kinetics was investigated by comparing the thermodynamic stability obtained from the Umbrella Sampling simulations and kinetic stability, which is calculated by the logarithmic ratio between the forward and backward rates from the metadynamics simulations (Wang, Martins and Lindorr-Larrson, 2017). From Figure 4—figure supplement 5A, although we observed a strong correlation between these two quantities with different ??, the quantitative link seems only modest (an apparent deviation from the y = x reference line). The significant deviation was observed at the large free energy differences. It may be due to the sampling issue in the thermodynamics simulations when the system shows very distinct bistable populations at the IS and BS separated by a very high free energy barrier, or/and the precision of the kinetics calculations when the transition events are very slow.

We note that there may be another reason that has led to the inaccuracy. From the free energy profiles, we see that the IS has a wider distribution than the BS. This is more obvious when ?? is small. The dwell time at the basin should also take into account the distribution rather than simply using only the lowest free energy points. In principle, the transition time ttrans* can be obtained from the double integral of the free energy profiles projected on the reaction coordinate dRMS (Socci, Onuchic and Wolynes, 1996; Chahine et al., 2007):

$T{\text{trans}}^{?}=\text{?}{\int }_{{d}_{\text{RMS}}}^{{d}_{\text{RM}{S}^{\left(End\right)}}}d\text{?}\left({d}_{\text{RMS}}\right)\text{?}{\int }_{{d}_{{\text{RMS}}^{\left(Start\right)}}}^{{d}_{\text{RMS}}}d\text{?}\left({d}_{\text{RMS}}^{\prime }\right)\text{?}\frac{\mathrm{exp}?\left(F\left({d}_{\text{RMS}}\right)?F\left({d}_{\text{RMS}}^{\prime }\right)\right)\text{?}/k{T}_{r}}{d\text{?}\left({d}_{\text{RMS}}\right)}$

, where D(dRMS) is the position-dependent diffusion coefficient. By approximately setting D(dRMS) to a constant, we calculated the kinetic stability from the free energy profile written as ln(tIS->BS*/ tBS->IS*), which was subsequently compared to the “real” kinetic stability obtained from the Metadynamics simulations (Figure 4—figure supplement 5B). In order to be consistent with the Metadynamics simulation, where the transition event was defined as when the binding reached a threshold of dRMS. We set dRMS(Start)=0.0nm and dRMS(End)=2.5nm for the transition from the BS to IS and dRMS(Start)=3.5nm and dRMS(End)=0.1nm for the transition from the IS to BS. We see an improvement on the quantitative consistency between these two kinetic stabilities (Figure 4—figure supplement 5B, smaller magnitudes of the slope towards 1 and the intercept towards 0 than those in Figure 4—figure supplement 5A), which were independently calculated from the two different simulations, from that compared with the barrier height (Figure 4—figure supplement 5A). Nevertheless, there is still an element to consider, which we may attribute to the fair treatment for the constant D(dRMS). D(dRMS) has been found to be position-dependent in the binding-coupled-folding process of an intrinsically disordered protein (Chu et al., 2019), though the effect is minor (a cause of ~2kT free energy shift was observed at the binding complex in our previous study with SBMs).

“Direct simulations of the transitions between the IS and the BS are expected to be computationally demanding owing to the high barriers between these two states. […] Further details of the frequency-adaptive metadynamics simulations can be found in SI Appendix 1.”

4) They stated that "We found a monotonic increase of barrier height for both two transitions between the IS and BS as ?? increases (Figure 4E)." Without error estimations, it is hard to judge if the trend for BS→IS barrier increase with ?? is significant or just within the errors. I would strongly suggest they do error estimations and include error bars in the free energy profiles.

We appreciate this suggestion. In order to perform the error analysis, we have run additional three sets of umbrella sampling simulations (there are now a total of four sets of umbrella sampling simulations) with the same simulation parameters except for using different initial conditions (different initial DPO4-DNA configurations and velocities). These combined four independent Umbrella Sampling simulations allowed us to perform the error analysis. As shown in the new figures, all the conclusions made in the original manuscript are preserved. See Figure 4A and Figure 4—figure supplement 4A.

5) There is one experimental author involved in this manuscript, so I read this work as an experimental/simulation collaboration, in which the simulations provide valuable predictions for experimental tests and validations. Besides the comparison with experimental Kd, it will strengthen the work by more comparisons. I understand that it is always non-trivial to combine and compare experiments and simulations, but I will appreciate if they could discuss and suggest the possibilities that could be tested by further experiments.

We thank the reviewer for this comment. We incorporated one paragraph in the Discussion and Conclusions section of the revised manuscript which describes how our theoretical findings may impact the future experimental studies. Please see the Discussion and Conclusions section.

“Our theoretical predictions can be potentially assessed by targeted biophysical experiments. […] The well-developed experimental approaches for DPO4, such as melting circular dichroism (CD) spectroscopy (Sherrer et al., 2012), tryptophan fluorescence (Wong et al., 2008), real-time FRET (Xu et al., 2009; Maxwell et al., 2014; Raper and Suo, 2016), and structural determinations (Ling et al., 2001; Vaisman et al., 2005; Ling et al., 2004b,a; Vyas et al., 2015), can subsequently be used to investigate the kinetics and mechanism of DPO4 folding, DNA binding, and nucleotide binding and incorporation.”

6) Could the conclusions in this manuscript be extended for other multi-domain proteins? Or how general are the conclusions?

This is a good suggestion from the reviewer and we thereby added several sentences in the Discussion and Conclusions section. Please see the Discussion and Conclusions section.

“Our modeling and simulations are applicable to various Y-family DNA polymerases, and we expect similar 1ndings to be obtained for these. [...] Our results can offer useful guidance for protein design and engineering at the multidomain level.”

[Editors’ note: what follows is the authors’ response to the second round of review.]

Revisions:

In comparison to the original submission, the reviewers agree that manuscript has been substantially improved in terms of both presentation and substance. The reviewers suggested several relatively minor revisions:

Results paragraph four indicates that the ratio of inter and intra-domain contacts was varied. How this was implemented was not clear from the text. For example, one could vary one weight, or the other, or one could modulate both, while keeping some other quantity (e.g. total stabilizing energy) constant. Without this clearly defined, it is difficult fully appreciate the potential significance of any trends based on this ratio.

We thank the reviewers for pointing out this issue. In this study, we modulated the ratio (ρ) of the interdomain strength versus intradomain strength in structure-based models (SBMs) by changing only the strength of interdomain contact (εInter) while keeping the other parameters in SBMs the same as they are in default. Since the default value of intradomain contact strength in SBMs is 1.0 (εIntra=1.0), here we can simply have ρ=εInterIntraInter. For kinetic simulations, we further rescaled the total potential of the SBM (VSBMapoDPO4) by multiplying a ratio of Tfρ0/Tfρ to ensure that the DPO4 folding temperature with different ρ remains the same as the one in default (ρ0=1.0) (Please see details in the “Materials and methods” section).

We have added the descriptions of how we practically implemented the modulation of ρ in SBMs in the “Results” section:

“In practice, this is implemented by changing only εInter while keeping εIntra and the other parameters to the default as they are in a homogeneously weighted SBM…”

The authors should expand the discussion on the potential experimental consequence of their conclusions. For example, what signature might single molecule force spectroscopy observe as an implication. What other experiments can be conducted in this regard.

This is a good suggestion. In our previous manuscript, we made a preliminary discussion regarding the potential assessments that can be made in the future experiments to verify our simulation results. Also agreed with this comment, we are now aware that the previous discussion is incomplete and needs further extension with more details. Therefore, we have incorporated several sentences in the discussion of the revised manuscript describing how our theoretical findings may impact future experimental work on this issue.

“Subsequently, the well-developed experimental approaches for DPO4 can be used to investigate the kinetics and mechanism of DPO4 folding, DNA binding, and nucleotide binding and incorporation. […] Furthermore, the structural determination of DPO4 in complex with a damaged DNA substrate and an incoming nucleotide (Ling et al., 2001; Vaisman et al., 2005; Ling et al., 2004b; Bauer et al., 2007; Ling et al., 2004a; Vyas et al., 2015) can provide insights into the effects of backtracking caused by weakening intradomain interactions via mutations, which disrupt the structures of the ternary complexes.”

The manuscript repeatedly describes effects, relative to the "default" parameterization of a structure-based model. It is not clear what significance the default parameters may have. That is, are the default parameters considered to be an accurate approximation to systems in the cell? If so, how, and to what extent? Perhaps the initial parameterization is far from appropriate for the current application, in which case variations in the ratio of different interactions may correspond to a regime that is not biologically relevant. It is important that the authors present these trends in terms of the physical insights into a biological process.

This is a useful comment. In our simulations, we see that the SBM using default parameters (r=r0=1.0) captures many of the experimental observations on DPO4 folding and DNA binding. The consistency is mainly reflected in the following two aspects. (1) The existence of the folding intermediates: Both our current simulations and our previous experiments identified the intermediate states formed during DPO4 (un)folding, which indicates asynchronous (un)folding of individual domains in DPO4 (Sherrer et al., 2012). Structural analysis reveals that there are much more native contacts formed by intradomain than interdomain in DPO4 (Appendix 2—table 1), so the individual domain folding is strongly favored by the default, topology-based SBM, and it is minorly affected by the presence of the other domains. Such a decoupling of the domain folding is encoded in DPO4 native topology, which is the key element for forming the folding intermediates. (2) The complex conformational dynamics of DPO4 during DNA binding: Increasing experimental evidence suggested that DPO4-DNA recognition is a complex three-state binding process accompanied by the conformational transition occurring in DPO4 (Brenlla et al., 2014 ; Raper et al., 2016). Using the default SBM, we obtained an experimentally consistent picture of dynamical DPO4-DNA binding and further characterized a conformational equilibrium shift in DPO4 as DNA binding proceeds. These two features manifest that the mechanisms of the folding and the DNA-binding of DPO4 are largely determined by the native topologies of the protein and the complex, reminiscent of what has been observed in the single-domain protein folding (e. g., Clementi et al., 2000) and the protein-protein binding (e. g., Levy et al., 2004).

However, the in-depth analysis of the results from the default SBM and the measurements from the experiments remains elusive, and sometimes the comparisons lead to conflicting observations. For folding, due to the lack of precision determination on DPO4 (un)folding kinetics, whether DPO4 in reality has further optimized its internal topology-based interactions through assigning the heterogeneity into the intradomain and interdomain interactions to accelerate folding is unknown. Our simulation results suggest that weakening interdomain interactions in the default, homogenously weighted SBM can accelerate DPO4 folding in favor of the “divide-and-conquer” scenario. There are rich hydrophobic residues inside the domains in DPO4 (Ling et al., 2001), which should naturally increase the weight of intradomain interactions in the default SBM. Thus, the deduction from our simulations is seemingly true that the default SBM may have neglected the extra contributions of the intradomain hydrophobic interactions in DPO4 by over-weighting the interdomain interactions to slow down the folding. For binding, there are two prominent inconsistencies between the default SBM and experiments. One is the binding affinity, which has been significantly enhanced by the default SBM. The default SBM quenches conformational flexibility in the DPO4-DNA complex because of the relatively strong interdomain interactions in DPO4, leading to an enhanced binding affinity. The other is the aberrantly stable IS, which corresponds to functionally inactive complex. The default SBM promotes the formation of the IS rather than the functionally active BS, so the subsequent DNA replication process is not favored. Our results show that the weak interdomain interactions in DPO4 not only stabilize the BS, but also facilitate the transition between the active and inactive complex, in favor of DNA binding.

Therefore, we see that the default SBM can in part describe the folding and DNA-binding of DPO4, as both of these two processes are driven by the native topologies of the protein and complex, compatible with the principle of minimal frustration. However, the accurate descriptions require further and detailed parameterizations on the SBM. Our simulations, with a focus on modulating the interdomain interaction strength in DPO4, underlines the importance of weakening interdomain interactions in the default SBM in increasing the folding speed and generating an experimentally consistent picture of the DNA-binding process. We have added several sentences regarding this comment in the “Discussion and conclusions” section:

“We see that the default SBM with homogeneously weighted intra- and interdomain interactions can lead to many consistencies in describing the DPO4 folding and DPO4-DNA binding processes with experiments. […] Considering both the DPO4 folding and the DPO4-DNA binding results, we have suggested that the weak interdomain interactions in DPO4 are the key to the trade-off between DPO4 folding and function.”

The authors may further revise their figures and make them even more intuitive. For example, rho could be labelled as relative strength of inter vs. interactions on the figure, so could MTCI. The labelling of Figure 2G is not very clear, and similar issues exist for some other figure panels.

We have followed this suggestion and modified our figures related throughout the revised manuscript.

https://doi.org/10.7554/eLife.60434.sa2

## Article and author information

### Author details

1. #### Xiakun Chu

Department of Chemistry, State University of New York at Stony Brook, New York, United States
##### Contribution
Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
##### Competing interests
No competing interests declared
2. #### Zucai Suo

Department of Biomedical Sciences, College of Medicine, Florida State University, Tallahassee, United States
##### Contribution
Conceptualization, Funding acquisition, Writing - original draft, Project administration, Writing - review and editing
##### Competing interests
No competing interests declared
3. #### Jin Wang

Department of Chemistry, State University of New York at Stony Brook, New York, United States
##### Contribution
Supervision, Conceptualization, Resources, Visualization, Funding acquisition, Investigation, Writing - original draft, Project administration, Writing - review and editing
##### For correspondence
jin.wang.1@stonybrook.edu
##### Competing interests
No competing interests declared

### Funding

#### National Institute of General Medical Sciences (R01GM124177)

• Zucai Suo
• Jin Wang

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

### Acknowledgements

XC thank Dr. Yongqi Huang for helpful discussion of the reweighting method. The authors thank Stony Brook Research Computing and Cyberinfrastructure, and the Institute for Advanced Computational Science at Stony Brook University for access to the high-performance SeaWulf computing system, which was made possible by a \$1.4M National Science Foundation grant (#1531492)

### Senior Editor

1. Cynthia Wolberger, Johns Hopkins University School of Medicine, United States

### Reviewing Editor

1. Yibing Shan, DE Shaw Research, United States

### Reviewer

1. Yong Wang, University of Copenhagen, Denmark

### Publication history

2. Accepted: October 16, 2020
3. Accepted Manuscript published: October 20, 2020 (version 1)
4. Version of Record published: November 4, 2020 (version 2)

? 2020, Chu et al.

## Metrics

• 343
Page views
• 69
• 0
Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

A two-part list of links to download the article, or parts of the article, in various formats.

### Open citations (links to open the citations from this article in various online reference manager services)

1. Biochemistry and Chemical Biology
2. Computational and Systems Biology

# Spatially compartmentalized phase regulation of a Ca2+-cAMP-PKA oscillatory circuit

Brian Tenner et al.
Research Article

Signaling networks are spatiotemporally organized to sense diverse inputs, process information, and carry out specific cellular tasks. In β cells, Ca2+, cyclic adenosine monophosphate (cAMP), and Protein Kinase A (PKA) exist in an oscillatory circuit characterized by a high degree of feedback. Here, we describe a mode of regulation within this circuit involving a spatial dependence of the relative phase between cAMP, PKA, and Ca2+. We show that in mouse MIN6 β cells, nanodomain clustering of Ca2+-sensitive adenylyl cyclases (ACs) drives oscillations of local cAMP levels to be precisely in-phase with Ca2+ oscillations, whereas Ca2+-sensitive phosphodiesterases maintain out-of-phase oscillations outside of the nanodomain. Disruption of this precise phase relationship perturbs Ca2+ oscillations, suggesting the relative phase within an oscillatory circuit can encode specific functional information. This work unveils a novel mechanism of cAMP compartmentation utilized for localized tuning of an oscillatory circuit and has broad implications for the spatiotemporal regulation of signaling networks.

1. Biochemistry and Chemical Biology
2. Computational and Systems Biology