# FLEXIQuant-LF to quantify protein modification extent in label-free proteomics data

1. F.M. Kirby Neurobiology Center, Boston Children’s Hospital, United States
2. Department of Neurology, Harvard Medical School, United States
3. Department of Pathology, Boston Children’s Hospital, United States
4. Bioinformatics Unit (MF1), Robert Koch Institute, Germany
5. Department of Medical Biotechnology, Institute of Biotechnology, Technische Universit?t Berlin, Germany
6. Department of Pathology, Harvard Medical School, United States
7. Data Analytics and Computational Statistics, Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Germany
8. Precision Vaccines Program, Boston Children’s Hospital, United States
Tools and Resources
Cite this article as: eLife 2020;9:e58783

## Abstract

Improvements in LC-MS/MS methods and technology have enabled the identification of thousands of modified peptides in a single experiment. However, protein regulation by post-translational modifications (PTMs) is not binary, making methods to quantify the modification extent crucial to understanding the role of PTMs. Here, we introduce FLEXIQuant-LF, a software tool for large-scale identification of differentially modified peptides and quantification of their modification extent without knowledge of the types of modifications involved. We developed FLEXIQuant-LF using label-free quantification of unmodified peptides and robust linear regression to quantify the modification extent of peptides. As proof of concept, we applied FLEXIQuant-LF to data-independent-acquisition (DIA) data of the anaphase promoting complex/cyclosome (APC/C) during mitosis. The unbiased FLEXIQuant-LF approach to assess the modification extent in quantitative proteomics data provides a better understanding of the function and regulation of PTMs. The software is available at https://github.com/SteenOmicsLab/FLEXIQuantLF.

## Introduction

Most cellular processes are regulated by post-translational modifications (PTMs). Given the sensitivity of current mass spectrometric analyses, even non-functional basal PTMs can be identified by mass spectrometry. Thus, an understanding of the stoichiometry and changes thereof are crucial to understanding function. Numerous bioinformatics tools that enable unbiased and large-scale identification of modified peptides and localization of the modification site such as MSFragger (Kong et al., 2017), MetaMorpheus (Solntsev et al., 2018), or TagGraph (Devabhaktuni et al., 2019)?have been developed. With steadily increasing qualitative information about the identity of PTMs (Grimsrud et al., 2010;?Choudhary et al., 2014; Kim et al., 2011; Melo-Braga et al., 2014; Udeshi et al., 2013), tools to investigate the modification quantity, that?is the proportion of the protein carrying PTMs, become increasingly important. Thus far, analysis of the PTM extent has mainly focused on quantifying the modified peptides, for?example using PTM enrichment methods (Pinkse et al., 2004; Rush et al., 2005; Zhou et al., 2011;?Lundby et al., 2012; Lundby et al., 2013), heavy isotope-labeled synthetic peptides (Gerber et al., 2003) or by enzymatically removing (Yamagata et al., 2002; Wu et al., 2011) PTMs. These approaches require prior knowledge of the PTMs and have been primarily applied to the phosphoproteome.

The development of FLEXIQuant (Full-Length-Expressed Stable Isotope-labeled Proteins for Quantification) (Singh et al., 2009; Singh et al., 2012a), for the first time, enabled the quantification of the extent of modification across a whole protein without prior knowledge of modification identities using an indirect approach of quantifying only the unmodified peptides. FLEXIQuant relies on the principle that the total number of molecules of a given protein is conserved in a sample, thus the number of molecules of each unique peptide derived from that protein will be equal. If a peptide is modified chemically by a PTM this would reduce the abundance of its unmodified cognate in the peptide pool. In FLEXIQuant, this idea is realized by analyzing the peptide pool of the sample with an added unmodified heavy isotope-labeled full-length reference protein. The extent of modification is calculated by comparing the quantities of labeled reference peptides derived from the standard to the unlabeled cognate peptides from the sample, that is the unlabeled endogenous protein. The advantage of this approach is the precise and accurate quantification of the degree of abundance reduction due to a modification as defined by Wold, 1981 or amino acid substitutions of each quantified unmodified peptide and the ability to calculate the absolute concentration. FLEXIQuant and derivatives thereof have been applied to study for example Tiki1 modification in head formation in frogs (Zhang et al., 2012), to gain insights into the phosphorylation dynamics of GSK3β-dependent phosphorylation of DCX (Singh et al., 2012b), the cell-cycle-dependent phosphorylation of KifC1 (Singh et al., 2014)?or to investigate PTM of Tau in Alzheimer’s disease (Mair et al., 2016). However, the limitation of this method is the requirement of a purified, unmodified, labeled reference protein and thus, it is labor intensive and has largely been used to investigate individual proteins. Recently, a PTM analysis pipeline without the need for an internal reference protein has been published (Bagwan et al., 2018). Although this is encouraging, the quantification relies on isobaric labeling approaches which are expensive to conduct due to the high cost of the labeling reagents and is therefore not widely applicable, especially not for large-scale experiments. This highlights the need for bioinformatic tools suitable for the analysis of label-free experiments.

Here, we introduce FLEXIQuant-LF as an unbiased, label-free computational tool to indirectly detect modified peptides and to quantify the degree of modification based solely on the unmodified peptide species building upon the FLEXIQuant (Singh et al., 2009) idea developed in our lab. We developed this approach to identify and elucidate differential protein modification extent in commonly analyzed time series or case-control studies. In these studies, a timepoint or control group is used as reference to enable a FLEXIQuant-like modification analysis. The key requirement of FLEXIQuant-LF is the unbiased, comprehensive and highly reproducible quantification of peptides, therefore, we use a data-independent acquisition (DIA) strategy (namely, SWATH [Gillet et al., 2012]). Here, we demonstrate the utility of FLEXIQuant-LF by applying our method to interrogate the peptide-resolved modification dynamics of the anaphase-promoting complex/cyclosome (APC/C) during mitosis (Singh et al., 2009; Steen et al., 2008).

## Materials and methods

Key resources table

### FLEXIQuant-LF algorithm and benchmarking

Request a detailed protocol

FLEXIQuant-LF indirectly identifies modified peptides by means of pinpointing unmodified peptides whose intensities strongly deviate from those of a reference sample using robust linear regression. Based on the distance of each peptide to the regression line, Relative Modification (RM) scores are calculated that, in analogy to the light to heavy ratio of the original FLEXIQuant method, correspond to the degree of modification (Figure 1B).

Figure 1

More specifically, the FLEXIQuant-LF algorithm trains a linear regression model for each sample in the input file using the random sample consensus (RANSAC) algorithm (Fischler and Bolles, 1981) to identify outliers iteratively and fit the model only based on the inliers, that is based on stably and consistently quantified unmodified peptides. For this, peptide intensities of each sample, as defined by the user as a single row in the input data file, are used as dependent variable (yi), peptide intensities of a user defined reference sample as independent variable (xi) and no intercept is fitted (Figure 1B2):

(1) $y\left(i\right)=\beta {x}_{i}+?$

In Formula 1, the slope of the regression line is described by β, whereas?ε represents an error term. To increase reproducibility and enhance the fit of the linear regression model to the data the algorithm is executed 30 times and the best model, based on r2 scores, is selected. Subsequently, vertical distances between each peptide intensity and the obtained regression line are determined, that is intensity differences between measured and expected peptide intensities. To correct for sample-specific differences in individual protein abundance and peptide-specific differences in intensity, distances are normalized by dividing them by the expected intensity, that is the reference intensity of the peptide from the independent variable multiplied by the slope of the regression line. FLEXIQuant-LF raw scores are then calculated by subtracting the normalized distances from 1 (Figure 1B3). For each sample, peptides with a raw score above three standard deviations from the median (MAD) of all raw scores are classified as outliers and excluded from the subsequent RM score (Relative Modification score) calculation. RM scores are calculated by dividing each inlier raw score by the median of the three highest raw scores (after removing outliers) for each sample, corresponding to one minus the extent of modification as defined by FLEXIQuant (Figure 1B4). A detailed derivation of the method can be found in our repository on GitHub: https://github.com/SteenOmicsLab/FLEXIQuantLF (copy archived at swh:1:rev:4ea3945f86ba477227c89e9ced75fc23751355ac;?Kahnert and Steen Lab, 2020).

The FLEXIQuant-LF method was implemented in Python 3.7.3 using Scikit-learn 0.21.2 (Pedregosa, 2011), NumPy 1.16.4 (van der Walt et al., 2011), Pandas 0.24.2 (McKinney, 2010), SciPy 1.2.1 (Jones, 2001), Matplotlib 3.1.0 (Hunter, 2007), Seaborn 0.9.0 (Waskom, 2017). Parameters of the RANSAC algorithm are set as FLEXIQuant-LF defaults as follows: max_trials?=?1000, base_estimator?=?linear_model.LinearRegression, min_samples?=?0.5, stop_probability?=?1, loss=‘squared_loss’, residual_threshold?=?MAD(sample intensities)2. The Fit_intercept parameter of the linear regression algorithm is set to False. The graphical user interface was built using QT Designer 5.13.0 (Espoo, 2019), QT 5.9.7 (The QT Company, 2019), and PyQT 5.13.2 (Riverbank Computing Limited, 2019). PyInstaller 3.6 (PyInstaller Development Team, 2019) was used to create the executable file. The command line interface version was built utilizing Click 7.0 (The Pallets Projects, 2020).

To benchmark FLEXIQuant-LF and identify differentially modified peptides in the time course experiment, peptide intensities of the proteins APC1, APC4, APC5, APC7, APC10, APC15, APC16, CDC16, CDC20, CDC23, CDC26, and CDC27 from the well-studied APC/C were extracted. Peptides used for quantification had to be unmodified, or only modified with oxidation on methionine and/or carbamidomethylation on cysteine residues. Furthermore, they had to be quantifiable in all LC/MS runs. Median intensities of all replicates of each time point were calculated. Time point 0 (S phase) was defined as reference time point for FLEXIQuant-LF application using the default values described above. Peptides were then classified in three categories based on their RM scores at time point 10 hr: (1) RM score?<0.5: peptide is likely differentially modified (magenta bars), (2) 0.5?≤?RM score<0.6: peptide is possibly differentially modified (blue bars), and (3) RM score?≥0.6: peptide is likely not differentially modified (green bars) (Figure 1B5).

### Reproducibility and quality assessment

Request a detailed protocol

The reproducibility of the developed approach was tested using a large, inhouse MRM data set consisting of 81 samples of which 26 were reference group samples. We ran FLEXIQuant-LF 1000 times on the same protein and analyzed the resulting slopes to determine the number and frequency of different outcomes. To further test the quality of the classification and quantification accuracy and independent publicly available, DIA dataset was downloaded from the PRIDE repository (PXD005573) (Bruderer et al., 2017). Search results (file 'R Fig1 HeLa 30kMS1_Report.xls') were filtered to remove all proteins that were identified with any modification. Additionally, only peptides uniquely matching a single protein were kept. To establish a ground truth, one-third of the remaining proteins were chosen randomly for further analysis. For each protein, four peptides were selected at random and a reduction factor between 0.0 and 1.0 was generated randomly from a uniform distribution to reduce the intensity of each of these peptides for each of the three replicates. For both, the unchanged and?>1400 in?silico modified peptides, FLEXIQuant-LF was applied with default parameters. Additionally, as parameter for the reference group all three replicates were selected. Resulting RM scores of in?silico modified peptides were compared to the RM scores of unchanged peptides and the RM score difference calculated as percentage of change between the two. The resulting values were then correlated with the expected change factors. Sensitivity and precision were also evaluated for the three classification categories. Furthermore, total RM score errors as defined by the difference between the expected change factor and the calculated RM score changes were compared to the number of peptides measured per protein to assess robustness of the FLEXIQuant-LF approach.

### Application improvement using a ‘superprotein’ approach

Request a detailed protocol

FLEXIQuant-LF results improve the more peptides are available to determine the linear regression. Two proteins (APC15, APC16) of APC/C are small and thus resulted in fewer than five quantified peptides in our experiments. To extend our analysis of APC/C to these proteins with an insufficient number of quantified peptides, we approached APC/C as a ‘superprotein’ drawing on the well-studied equal stoichiometry of APC/C components in the complex (with the exception of CDC20, which was omitted from this ‘superprotein’ approach, as it is known to change in abundance during M-phase (see results)). Therefore, all reliably quantified peptides were analyzed together as if they?were derived from a pan-APC/C ‘superprotein’ following the steps described above for single proteins up to raw score calculation. Peptides were then split according to the proteins they originate from and RM scores were subsequently calculated for each protein separately to improve quantification accuracy.

### Cell line

Request a detailed protocol

STR authenticated HeLa S3 cells were purchased from ATCC and assayed negative for mycoplasma contamination. HeLa S3 cells were grown in DMEM media (Invitrogen, Carlsbad, CA) supplemented with 10% FBS (Invitrogen), 2 mM L-glutamine (Invitrogen), 100 μg/ml penicillin and streptomycin mix (Invitrogen).

### Sample generation and co-IP

Request a detailed protocol

For S phase, cells were treated with 2 mM thymidine (Sigma, St. Louis, MO) at 80% confluency for 20 hr and cultured for 8 hr in media without thymidine. For M phase, thymidine-treated cells were cultured for 3 hr in fresh media and treated with 100 ng/ml nocodazole (Sigma) and were collected at 4 hr, 8 hr, and 10 hr time points. Cells were washed with PBS and lysed in 1x RIPA lysis buffer (Santa Cruz, Dallas, TX) supplemented with 1 M DTT (Sigma), 10% glycerol, protease and phosphatase inhibitors (HALT, Thermo Fisher Scientific, Waltham, MA) using a bead beater homogenizer (Precellys, Rockville, MD). Cell lysate volumes of 300 μl with 2–3 mg protein were precleared with 20 μl Affi-Prep Protein A beads (Biorad, Hercules, CA) and 20 μg mouse IgG (Santa Cruz) for 2 hr at 4°C with end-on-end rotation. Affi-Prep Protein A beads were washed twice with six volumes of buffer A (10 mM phosphate buffer, pH 7.4, 120 mM NaCl, 2.7 mM KCl, 0.1% Triton-X), twice with six volumes of buffer B (10 mM phosphate buffer, pH 7.4, 120 mM NaCl, 300 mM KCl, 0.1% Triton-X) and then once more with six volumes of buffer A. Freshly prepared beads (20 μl) conjugated with 20 μg CDC27 antibody (AF3.1, Santa Cruz) were incubated with precleared lysates for 2 hr at 4°C with end-over-end rotation. As control, the experiment was repeated using beads without conjugated antibody (controls). The beads were washed as described above. The immunoprecipitate was eluted from beads by boiling in 1x Laemmli buffer with β-mercaptoethanol.

### Sample preparation/digestion

Request a detailed protocol

The samples were separated on a 4–12% SDS-PAGE gel (Invitrogen) in 1x MES buffer (Invitrogen) and the excised band was subjected to in-gel digestion. In brief, proteins were reduced with DTT, alkylated with iodoacetamide and digested with trypsin. After extraction of the peptides with acetonitrile, peptides were dried down and resuspended in a buffer containing 5% formic acid/5% acetonitrile/90% water. HRM calibration peptides (Biognosys, Schlieren, Switzerland) were added to the samples prior to analysis according to manufacturer instructions.

### Spectral library generation

Request a detailed protocol

To generate a spectral library, we analyzed each sample once using a standard data-dependent acquisition (DDA) method on a TripleTOF 5600 mass spectrometer (Sciex, Framingham, MA) coupled online to a nanoLC system (Sciex/Eksigent, Dublin, CA), equipped with an LC-chip system (cHiPLC nanoflex, Eksigent, trapping column: Nano cHiPLC trap column 200 μm x 0.5 mm Reprosil C18 3 μm 120 ?, analytical column: Nano cHiPLC column 75 μm x 15 cm Reprosil C18 3 μm 120 ?). Peptides were separated by a linear gradient from 95% buffer A (0.2% FA in water)/5% buffer B (0.2% FA in ACN) to 70% buffer A/30% buffer B within 90 min. The mass spectrometer was operated in data-dependent TOP50 mode with the following settings: MS1 mass range 300–1700 Th with 250 ms accumulation time; MS2 mass range 100–1700 Th with 50 ms accumulation time and following MS2 selection criteria: UNIT resolution, intensity threshold 100 cts; charge states 2–5. Dynamic exclusion was set to 17 s.

The DDA data were searched with MaxQuant (v1.5.2.8) (Cox and Mann, 2008) against the human UNIPROT database (only reviewed entries, downloaded on October 31, 2014) using the .WIFF files without additional file conversion. The protein sequence database was appended with common laboratory contaminants (cRAP, version 2012.01.01) and the iRT fusion protein sequence (Escher et al., 2012) (Biognosys) resulting in 20,296 entries. The following settings were applied: trypsin with up to two missed cleavages; mass tolerances set to 0.1 Da for the first search and 0.01 for the main search. Oxidation of M (+15.995 Da), phosphorylation of S, T, Y (+79.966 Da) and acetylation of N-TERM (+42.011 Da) were selected as dynamic modifications and carbamidomethylation of C (+57.021 Da) was selected as a static modification. FDR was set to 1% at both the peptide and protein level. The default settings were used for all other search parameters. Finally, a spectral library based on the MaxQuant search results was generated in Spectronaut 7.0 (Biognosys) using the following settings: Q value cut-off of 0.01, and a minimum of 3 and a maximum of 6 fragment ions.

To improve PTM identification output, the DDA data were also searched in Protein Pilot (v4.5.1, Sciex) using the same database described above and the following settings: sample type – identification, cys alkylation – iodoacetamide, digestion – trypsin, instrument – TripleTOF 5600, special factors – phosphorylation emphasis and gel-based ID, ID focus – biological modifications, search effort – thorough ID. For the ProteinPilot searches, it is not necessary to select the instrument mass accuracy, number of missed cleavages or the PTMs. The search results were filtered by a peptide FDR of 1%.

### SWATH sample acquisition

Request a detailed protocol

The SWATH data were acquired using the same mass spectrometer (TripleTOF 5600, Sciex) and LC-setup as described for the DDA samples. The mass spectrometer was operated in SWATH mode covering the mass range from 400 to 1000 Th with 75 windows (8 Th width with 1 Da overlap). The accumulation time was set to 40 ms. Additionally, an MS1 scan was acquired in the mass range from 400 to 1000 Th with an accumulation time of 250 ms, resulting in a cycle time of 3.3 s. The samples were acquired in triplicates.

### SWATH data analysis

Request a detailed protocol

All SWATH data were directly analyzed in Spectronaut 7.0 (Biognosys) (Bruderer et al., 2015) without any file conversion using the previously generated spectral library. The following settings were applied in Spectronaut 7.0: peak detection – dynamic iRT, correction factor 1; dynamic score refinement and MS1 scoring – enabled; interference correction and cross run normalization (total peak area) – enabled. Peptides were grouped according to the protein grouping by MaxQuant during generation of the spectral library. Spiked-in HRM peptides were used in the analysis for retention time and m/z calibration. The m/z tolerance was in the range of 12 ppm and the median extraction window was 4 min. All results were filtered by a Q value of 0.05 (equivalent to an FDR of 5% at the peptide level). All other settings were set to default. Protein intensities were calculated by summing the peptide peak areas (sum of fragment ion peak areas as calculated by Spectronaut) from the Spectronaut output file.

## Results

The objective of our study was to establish a workflow to elucidate peptide-resolved modification dynamics in an unbiased manner without the need for heavy isotope-labeled reference proteins or peptides. Our strategy focused on profiling unmodified peptides, as any variation in the extent of modification of a peptide results in the reduction of the amount/intensity of the remaining unmodified peptide species. This consideration also applies to modifications on or close to proteolytic cleavage sites that interfere with the proteolytic cleavage. In such cases, the affected proteolytic cleavage yield shows a reduction which is equivalent to the modification extent. This interference with the proteolytic cleavage results in a proteolytic missed cleavage peptide, which in turn leads to a change in the amount/intensity of the two unmodified, fully cleaved peptide species. Thus, variation in the amount of an unmodified peptide can be used to identify and quantify peptides whose modification state changes relative to a reference point in a time series or in a dose-dependent manner. The idea of focusing on the quantity of the unmodified peptide to identify and quantify modified peptides was implemented in the original FLEXIQuant approach, which uses stable isotope-labeled reference samples (Singh et al., 2009; Singh et al., 2012a). Here, we introduce a new, label-free (LF) implementation of this approach: FLEXIQuant-LF.

FLEXIQuant-LF enables unbiased indirect identification of differentially abundant peptides resulting from PTM in label-free mass-spectrometry data. Besides identifying differentially modified peptides the tool can also quantify the extent of these modifications. Our tool only requires raw peptide intensities as input data making it widely applicable. FLEXIQuant-LF determines differentially abundant unmodified peptides using RANSAC-based robust linear regression in combination with a sophisticated normalization and score calculation procedure (Figure 1B). The RANSAC algorithm iteratively determines outliers in the data and fits the regression line only to observations not classified as outliers (inliers). This ensures that the regression line is fitted to the most robustly quantified peptides and facilitates the quantification of the modification extent based on the distance to the regression line.

We benchmarked FLEXIQuant-LF by analyzing the anaphase promoting complex/cyclosome (APC/C) during mitotic arrest. Application of FLEXIQuant-LF to this complex for proof of concept had two advantages: (1) excellent enrichment protocols for isolation of the complex by targeting the cell division cycle protein 27 (CDC27) are well established (Singh et al., 2009) and (2) the cell cycle-dependent phosphorylation dynamics of APC/C have been qualitatively and quantitatively described (Singh et al., 2009; Steen et al., 2008). We studied the APC/C after treating thymidine-synchronized HeLa cells with nocodazole for 4 hr, 8 hr, and 10 hr to induce prometaphase arrest. The APC/C isolated from S-phase cells served as a control (workflow in Figure 1A).

The software is open source and available in two versions: an easy-to-use graphical user interface (GUI) version as well as a command line interface (CLI) version to allow for integration into bioinformatics analysis pipelines.

### LC/MS analysis of the APC/C

Prior to analysis of the peptide-resolved modification dynamics of the APC/C, the complex was purified from each time point separately by co-immunoprecipitation (co-IP) using an anti-CDC27 antibody (workflow in Figure 1A). The protein digests of the co-immunoprecipitates from the different cell cycle stages were analyzed in triplicate using an LC-SWATH method. The SWATH dataset was analyzed using a project-specific spectral library comprising 13,724 peptides representing 2041 proteins (details in Material and Methods). Two out of 12 technical replicates were excluded from the analysis due to low number of identified/quantified proteins (one each from the 4 hr and 10 hr time points). As the APC/C is a well-characterized protein complex, we focused on the bona fide members of this protein complex, namely APC1, APC4, APC5, APC7, APC10, APC15, APC 16, CDC16, CDC20, CDC23, CDC26, and CDC27 (raw intensities of all peptides can be found in Supplementary file 1). For further FLEXIQuant-LF analysis, only unmodified peptides, and peptides with methionine oxidation or carbamidomethylation on cysteine were considered. Furthermore, any peptide had to be reliably quantified across all runs. In total, 248 out of 289 APC/C component-derived peptides (86%) met those criteria. Across these 248 quantified peptides, the median Pearson correlation coefficient between replicates was 0.98 (range: 0.96 to 0.99) highlighting excellent quantification.

### Benchmarking FLEXIQuant-LF with CDC27 and APC5

For the benchmarking of FLEXIQuant-LF, we focused on CDC27 and APC5. Cell cycle-dependent modification dynamics have been extensively studied for both proteins (Singh et al., 2009; Steen et al., 2008). These pre-existing published data enabled us to validate our peptide classifications.

We quantified 33 peptides corresponding to CDC27 (sequence coverage: 55%), of which nine peptides were classified as likely differentially modified within the time series (Figure 2A, magenta bars) by our FLEXIQuant-LF method. Four peptides were classified as possibly differentially modified (Figure 2A, blue bars). One peptide (31LYAEVHSEEALFLLATCamYYR50) was classified as outlier based on its raw scores and removed before RM score calculation.

Figure 2

We first checked for modified peptide species within our DDA dataset. In total, we identified six CDC27-derived phosphorylated peptides in the DDA data (Figure 2A, orange stars). The FLEXIQuant-LF analysis classified the unmodified versions of these six phosphopeptides as follows: five likely differentially modified peptides and one possibly differentially modified peptide. For the other four peptides identified as likely differentially modified as well as for two out of the three remaining peptides classified as possibly differentially modified by FLEXIQuant-LF, phosphorylations, and/or ubiquitinations were described (Steen et al., 2008) or deposited online at http://www.phosphosite.org (Hornbeck et al., 2015) (yellow stars). In summary, FLEXIQuant-LF demonstrated an excellent performance in detecting potentially modified peptides. We found evidence of modification for all peptides identified as likely modified in our study and for three out of four possibly differentially modified peptides.

In addition to the detection of potentially modified peptides, we designed FLEXIQuant-LF to also quantify the degree of differential modification. Relative to time point 0 hr (S-phase) in our time series, the N-terminal region of CDC27 (amino acids 170–441) was increasingly modified after 4 hr (27% to 54%, average: 36%) and reached its maximum modification extent after 10 hr (between 55% and 81%, average: 65%) (Figure 2A). The peptide spanning amino acids 442 to 456 (442ISTITPQIQAFNLQK456) also showed an increasing degree of modification but to a lower extent (up to 48% modified after 10 hr).

For APC5, two peptides (497FPPNSQHAQLWMLCamDQK513 and 35IAVLVLLNEMSR46) out of 37 quantified peptides (sequence coverage: 59%) were classified as outliers based on their raw scores and were excluded from further analysis. Out of the 35 remaining peptides, four were classified as likely differentially modified across the time points (Figure 2B, magenta bars), while five peptides were considered as possibly differentially modified (Figure 2B, blue bars). We identified in our DDA data the phosphorylated cognates for two of the four likely modified peptides (Figure 2B, orange stars). We found evidence of modifications within the literature for the other two peptides identified as likely differentially modified in our FLEXIQuant-LF analysis as well as for four of the five peptides classified as possibly differentially modified (yellow stars)?(Hornbeck et al., 2004). Interestingly, we did not find evidence for modification for one possibly differentially modified peptide (99LMAEGELK106). This could be interpreted either as a false positive classification or that this peptide has a hitherto undescribed modification on the methionine, glutamic acid or lysine residues.

Overall, these findings are in excellent agreement with the original label-based FLEXIQuant analysis of CDC27 and APC5 (Singh et al., 2009), which identified the same protein regions as being modified to a comparable extent. It was noted that compared to the original FLEXIQuant study the extent of modification was lower in our novel FLEXIQuant-LF analysis. A highly probable explanation for this phenomenon could be that there was a basal modification level of up to 20% for some peptides in S-phase, as shown with FLEXIQuant (Singh et al., 2009) which determines the absolute extent of modification. Thus, we emphasize that the FLEXIQuant-LF approach provides the extent of modification relative to the given reference time point.

### The peptide-resolved modification dynamics of other APC/C complex components

FLEXIQuant-LF has an advantage over the conventional FLEXIQuant implementation in that the analysis can easily be performed on all proteins within a sample (provided a minimum of five peptides was quantified to allow for linear regression). Here, we extended our analysis to all other APC/C proteins of which sufficient peptides were reliably quantified in our data, namely APC1 (Figure 3) and APC4, APC7, APC10, CDC16, CDC20, and CDC23 (Figure 3—figure supplement 1). By applying FLEXIQuant-LF, we classified between zero (APC4, APC10) and seven peptides (APC1) per protein as likely differentially modified and between zero (APC4, APC10, CDC20) and four peptides (APC1) per protein as possibly differentially modified (summary in Table 1 and Table 2).

Figure 3 with 2 supplements see all
Table 1
Table 2

From APC1, we quantified 54 peptides (sequence coverage: 39%) in total, of which four peptides were classified as outlier based on their raw scores and were excluded from further analysis (930LVVWMTNVGFTLR942, 943DLETLPFGIALPIR956, 1546TGGEMNYGFHLAHHMALGLLFLGGGR1571 and 1618LLVPVDVDTNTPCamYALLEVTYK1639). Out of the remaining 50 peptides, seven peptides were classified as likely differentially modified and four peptides as possibly differentially modified (Figure 3, magenta and blue bars, respectively). We identified phosphorylated peptides in the DDA data for five of the seven likely and for three out of four possibly differentially modified peptides (Figure 3, orange stars).

We also classified the peptide 311GDSPVTSPFQNYSSIHQSR330 as likely differentially modified by our FLEXIQuant-LF analysis, although no modified cognate peptides were identified in the DDA data. However, multiple modifications, that is phosphorylations on S313, T316, S317, Y322 and S324, have been reported in previous studies (Hornbeck et al., 2015) (indicated by a yellow star in Figure 3). For one likely (1070LLQLCamQR1076) and one possibly differentially modified peptide (1077TMALPVGR1084), we could not find published evidence of PTMs within the sequence of those two peptides. Interestingly, these are consecutive peptides with the latter one starting with a threonine residue (AA 1077). A hitherto undiscovered phosphorylation at this threonine residue could lead to a highly increased missed cleavage which would be consistent with the observed intensity reduction of both peptides.

A more detailed analysis of the peptide-resolved modification dynamics observed in APC1 identified two regions within the protein that appear to have different modification kinetics (Figure 3). Peptides in the N-terminal part ranging from residues 285–368 (four peptides), showed a degree of modification of 29–31% after 4 hr, 63–68% after 8 hr and between 70% and 71% after 10 hr. In contrast, the peptides in a more C-terminal domain spanning residues 680–739 (three peptides) showed slower modification kinetics and a lower degree of modification at 10 hr. After 4 hr, 17–22% change in modification status was observed. The extent of modification changed between 39% and 57% after 8 hr, and 49–64% after 10 hr. These data demonstrate that tools to study the peptide-resolved modification dynamics in a quantitative manner will greatly improve the current understanding of biological processes.

### Testing and validating reproducibility of FLEXIQuant-LF

We tested the reproducibility of FLEXIQuant-LF by analyzing a large inhouse MRM data set consisting of 81 samples of which 26 were reference group samples. Over 1000 runs FLEXIQuant-LF yielded ambiguous results for 6 out of 81 samples when using one RANSAC initiation. In all six cases, two distinct outcomes were observed. To improve the reproducibility, we initiated the RANSAC algorithm multiple times within a single run (1, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100) and selected the best model (based on r2 score). This was again tested by running each set of initiations 1000 times and analyzing the slopes of the resulting best models. Initiating RANSAC 10 times resulted in six ambiguous samples, 20 initiations in 5, 30 in 2 and 40 in 3. Between 50 and 100 initiations, the number of ambiguous samples was oscillating between 1 and 2 (Figure 4A). Furthermore, as illustrated by means of sample 30, the frequency of different outcomes within a sample decreased with increasing number of RANSAC initiations (Figure 4B). With one initiation the suboptimal outcome prevailed (88%), whereas with 10 initiations the optimal outcome was predominant (67.2%). Using 30 initiations the fraction of suboptimal outcomes was 2.2% and further decreased to 0.5% with 50 initiations.

Figure 4

Overall, using 30 RANSAC initiations and selecting the best model resulted in 44 suboptimal outcomes in 81,000 runs of FLEXIQuant-LF (0.05%). These results confirm the benefits of multiple RANSAC initiations and demonstrate a very high reproducibility of the developed approach. Based on these finding, the default number of RANSAC initiations in FLEXIQuant-LF was set to 30 but can be changed by the user.

We also tested the quality of FLEXIQuant-LF classification and quantification on an independent DIA dataset (Bruderer et al., 2017). After stringent filtering 122 proteins without detected modifications and with unique and unambiguous associations to peptides were selected for in?silico modification, that is intensities of single peptides were reduced to simulate the presence of modified peptide species. For each protein, four peptides were randomly chosen, and their intensity adjusted by a random change factor for each replicate resulting in 1574 total FLEXIQuant-LF runs (Supplementary file 3). RM score changes showed a high correlation with the in-silico intensity change factors (Pearson’s r?=?0.98, Figure 4C). The overall near perfect correlation of the in-silico and RM score changes was also reflected in the classification results. The classes of likely not differentially modified and likely differentially modified peptides also showed near optimal performance with sensitivity and precision values at 97.5% and 99.1%, and 100.0% and 96.1%, respectively, while the class of possibly differentially modified peptides with hard-to-define borders showed reduced performance with a sensitivity of 77.2% and precision of 88.6% (Figure 4D). This decrease in accuracy likely reflects more on the definition criteria of the middle group than on the performance of the method. Additionally, of interest was the robustness of FLEXIQuant-LF results in relation to the number of peptides available for each protein. Overall, quantification errors were low with a mean error of 0.020 (SD?=?0.058,<0.1 for 94.7% of cases, Figure 4E). Quantification errors in proteins with fewer peptides (n?=?5–7: μ?=?0.027, SD?=?0.072) did not deviate from proteins with more peptides (n?=?8–10: μ?=?0.010, SD?=?0.020; n?=?11–13: μ?=?0.013, SD?=?0.026; n?=?14–17: μ?=?0.006, SD?=?0.012) and were within a very small error margin of modification extent of?<5% (Figure 4F). Outliers are rare (<5%) and can be attributed to the random selection and simulation effects of the assessment approach.

### Analyzing multiple proteins as a ‘superprotein’

For the two smallest APC/C subunits, namely APC15 and APC16, the number of reliably quantified peptides were 1 and 3, respectively, that is?below the described threshold of five quantifiable peptides, making the above described FLEXIQuant-LF analysis nonapplicable. However, if proteins are known to have invariable expression/abundance profiles over the considered time period or between different conditions of interest, as for?example observed for the core components of the APC/C, we hypothesized that the peptides of multiple proteins can be analyzed together by treating this set of proteins with invariable abundance as one ‘superprotein’. This would expand the FLEXIQuant-LF approach to smaller proteins and allow for the identification and quantification of differential modification of proteins with less than five quantified peptides that could not be analyzed by the normal FLEXIQuant-LF implementation otherwise.

To test this ‘superprotein’ approach, we applied FLEXIQuant-LF to analyze all reliably quantified APC/C proteins in our data in a combined fashion as a ‘superprotein’, namely APC1, APC4, APC5, APC7, APC10, APC15, APC16, CDC16, CDC23, and CDC27. CDC20 was excluded from this ‘superprotein’ approach as it is known that this APC/C regulator displays changing affinity to the APC/C during the M-phase of the cell cycle, that is its abundance within the APC/C changes during the investigated time period (Fang et al., 1999). We validated the ‘superprotein’ strategy by comparing its results to the results of the FLEXIQuant-LF analysis of the individual proteins for all proteins that could be analyzed with both strategies. The two approaches resulted in identical classification of peptides as well as near identical quantification of the modification extent (mean RM score difference?=?2.45E-10, r2?=?1), demonstrating the applicability of the ‘superprotein’ strategy (Supplementary file 2).

The FLEXIQuant-LF analysis of APC15 and APC16 using the ‘superprotein’ approach revealed that all four peptides associated with these two small proteins not amenable to the conventional FLEXIQuant-LF approach are likely not differentially modified (Supplementary file 2 and Figure 3—figure supplement 2A and B). These results are in accordance with the literature where, to the best of our knowledge, no cell cycle-dependent differential modifications in APC15 and APC16 have been described for the detected protein domains (Hornbeck et al., 2015).

## Discussion

Our FLEXIQuant-LF is based on robust linear regression, as implemented in the RANSAC algorithm. In general, linear regression requires a certain number of data points to yield meaningful results and becomes more accurate with increasing number of data points. Thus, FLEXIQuant-LF becomes more powerful with increasing numbers of unique peptides that can be identified for the proteins of interest. We set the minimum number of unique peptides required for a FLEXIQuant-LF analysis to five, although a larger number of peptides will provide more robust results. Addressing this limitation, we introduce the ‘superprotein’ approach.

In the presented proof-of-concept study, we applied FLEXIQuant-LF to a time series experiment. However, FLEXIQuant-LF can also be applied to study protein modification differences between different experimental conditions or cohorts in general, e.g. comparing diseased vs. healthy individuals or testing the effect of different drugs. As FLEXIQuant-LF only determines the extent of differential modification of a peptide relative to the reference based on the unmodified counterparts, quantification of the absolute abundance of the modification still requires an isotopically labeled internal standard. However, using a reference time point or condition has the advantage of highlighting modifications that exhibit dynamic regulation in the context of interest. Furthermore, FLEXIQuant-LF can be used to repeatedly analyze the same data set with varying reference samples, providing complementary perspectives on the peptide-resolved modification dynamics. This consideration is particularly important for the current implementation of FLEXIQuant-LF which only quantifies modifications extents that are increasing relative to the reference sample. This is due to our normalization method used for RM score calculation, which excludes peptides with decreased modification. FLEXIQuant-LF reports these peptides in the ‘_removed_peptides.csv’ output file and raw score information can be retrieved from the ‘_raw_score.csv’ output file. We reasoned that the reference sample is normally the least modified protein. In case this assumption does not hold, a second?FLEXIQuant-LF can be carried out after selecting another sample as the reference.

While FLEXIQuant-LF, similar to the initial FLEXIQuant approach, indirectly identifies peptides that are differentially modified, identifying the specific type and site of modification is not the objective of this analysis. The detailed characterization and localization of the modifications require either more in-depth data analysis or follow-up experiments. Nevertheless, this analysis method can be easily applied in any affinity purification or global proteomics experiment and the results can further be used to identify proteins to be synthesized as reference standards, if more in-depth experiments are needed. In the original FLEXIQuant approach, changes in the modification extent had to be above 20% in order to be reliably detected. For our proof of concept study, we chose 40% which is consistent with stoichiometries commonly observed in cell cycle-focused or signaling pathway activation-related large-scale phosphoproteomics studies, which reported serine and threonine phosphorylation extents exceeding 75% and tyrosine phosphorylation extents above 50% (Sharma et al., 2014). However, our FLEXIQuant-LF approach enables the selection of a minimum change in modification extent deemed to be interesting and/or compatible with the precision of the dataset; clearly the usual trade-off considerations between specificity and sensitivity apply.

Even though APC/C is a moderately abundant complex within the cell, we opted for affinity enrichment to ensure a high number of unique peptides per protein of the complex components are measured. For highly abundant protein classes such as the ribosome and proteasome (Beck et al., 2011), a direct FLEXIQuant-LF analysis on the whole cell lysate is most likely a viable option. However, we expect that future technological developments in LC/MS instrumentation and data analysis approaches will make less abundant proteins and protein complexes such as signaling proteins/complexes amenable to FLEXIQuant-LF analysis without further enrichment of the proteins/protein complexes of interest.

A premise of the FLEXIQuant-LF method is, that the modification state of the majority of peptides in a protein does not change between experimental conditions or over the time period considered in a study. This is likely to hold true for most proteins. However, in some cases with more modified than unmodified peptides as in Tau for example (Mair et al., 2016), FLEXIQuant-LF might not be applicable.

The combination of affinity purification of the protein complex and DIA analysis facilitated the precise and reproducible peptide quantification. Currently, DIA is the most suitable acquisition method due to its precise peptide quantification capabilities with fewer missing values. However, the FLEXIQuant-LF approach is also applicable to other label-free quantification methods.

### Conclusions

Here, we presented FLEXIQuant-LF, a computational label-free implementation of the original FLEXIQuant method using a DIA dataset of the APC/C during mitotic arrest as a proof of concept. FLEXIQuant-LF is a widely applicable analysis workflow building on the previously published FLEXIQuant idea of monitoring the unmodified peptides and changes to their signal intensities to indirectly detect modification events and to quantify the degree of modifications. The use of a reference timepoint or reference sample within a study instead of labeled reference proteins allows FLEXIQuant-LF to be universally applicable post hoc to any protein robustly identified across the sample series without the need for the addition of heavy isotope-labeled protein standard(s).

The application of FLEXIQuant-LF to a DIA dataset of the APC/C isolated from HeLa cells at various time points during a mitotic arrest, allowed us to recapitulate the peptide-resolved modification dynamics previously observed with our original heavy isotope-labeled internal standard-based FLEXIQuant method. Using FLEXIQuant-LF, we easily extended our analysis to other APC/C subunits present in the sample set and delivered a comprehensive report on the peptide-resolved modification dynamics of the core components of APC/C following nocodazole-induced mitotic arrest. Furthermore, we introduced a ‘superprotein’ approach which allowed us to include proteins, which provided insufficient numbers of unique peptides to be analyzed individually; this is a limitation that applies in particular to small proteins, but can also be encountered in very acidic or hydrophobic proteins with few proteolytic cleavage sites. This ‘superprotein’ strategy enabled us to extract more information from data and gain additional insights about peptides/proteins that would have been disregarded otherwise.

FLEXIQuant-LF is unbiased toward the type of modification and relies on identification of the unmodified counterpart of a modified peptide to detect and quantify modification events, irrespective of the modification’s nature, that is?the types of modifications involved remain unknown. The valuable information of modification event detection and modification extent quantification can subsequently be used to study rare PTMs and to potentially discover novel PTMs using more targeted approaches in future experiments. Therefore, FLEXIQuant-LF can pave the way to a better general understanding of peptide-resolved PTM dynamics of proteins. We expect that FLEXIQuant-LF will enable researchers to revisit their data and identify novel modification-driven features of health and disease that could be used as targets for treatment.

### Data and software availability

Data are available via ProteomeXchange with identifier PXD018411.

FLEXIQuant-LF is available with graphical user interface (GUI) as standalone executable EXE file. Additionally, a command line interface (CLI) version is available as PY file. Both versions as well as the Python source code can be downloaded via GitHub: https://github.com/SteenOmicsLab/FLEXIQuantLF.

## References

1. 1
2. 2
3. 3
4. 4
5. 5
6. 6
7. 7
8. 8
9. 9
QT Designer, version 5.13.0 (2019)
The QT Company.
10. 10
Control of mitotic transitions by the anaphase–promoting complex (1999)
Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 354:1583–1590.
https://doi.org/10.1098/rstb.1999.0502
11. 11
Random sample consensus (1981)
Communications of the ACM 24:381–395.
https://doi.org/10.1145/358669.358692
12. 12
13. 13
14. 14
Phosphoproteomics for the masses (2010)
ACS Chemical Biology 5:105–119.
https://doi.org/10.1021/cb900277e
15. 15
16. 16
17. 17
Matplotlib: a 2D graphics environment (2007)
Computing in Science & Engineering 9:90–95.
https://doi.org/10.1109/MCSE.2007.55
18. 18
SciPy: Open Source Scientific Tools for Python
(2001)
SciPy: Open Source Scientific Tools for Python.
19. 19
20. 20
21. 21
22. 22
23. 23
24. 24
25. 25
Data structures for statistical computing in Python
(2010)
Proceedings of the 9th Python in Science Conference. pp. 51–56.
26. 26
27. 27
Scikit-learn: machine learning in Python
(2011)
Journal of Machine Learning Research 12:2825–2830.
28. 28
29. 29
PyInstaller, version 3.6 (2019)
PyInstaller.
30. 30
31. 31
32. 32
33. 33
34. 34
35. 35
36. 36
37. 37
38. 38
39. 39
Click 7.0 (2020)
Click 7.0.
40. 40
QT 5.9.7 (2019)
QT 5.9.7.
41. 41
42. 42
43. 43
Mwaskom/Seaborn: V0.8.1
(2017)
Mwaskom/Seaborn: V0.8.1.
44. 44
45. 45
46. 46
47. 47
48. 48

## Decision letter

1. Akhilesh Pandey
Reviewing Editor; Mayo Clinic, United States
2. Philip A Cole
Senior Editor; Harvard Medical School, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

While large-scale identification of post-translational modifications by mass spectrometry has become trivial, determining the extent of these modifications remains a challenge. FLEXIQuant-LF permits an assessment of the extent of PTMs without the use of heavy labeled standards and should be especially useful in large-scale experiments involving cellular dynamics or perturbation experiments.

Decision letter after peer review:

Thank you for submitting your article "FLEXIQuant-LF: Robust Regression to Quantify Protein Modification Extent in Label-Free Proteomics Data" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Philip Cole as the Senior Editor. The reviewers have opted to remain anonymous. The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

The manuscript describes a method for quantifying modifications of peptides through analysis of intensity of unmodified peptides. The authors highlight that it is capable of large-scale identification and quantification of differentially modified peptides without any prior knowledge of the type of modification. The method was applied to DIA data from an experiment designed to study APC/C complex to benchmark and to identify differentially modified peptides in a time course setting. The obvious advantage of this method is that it does not rely on heavy isotope-labeled proteins/peptides. The manuscript describes a novel method, which is likely to be impactful for the field and could be a good fit as an eLife Tools and Resources paper. However, the way the manuscript is currently written is quite confusing in several places. The authors are advised to submit a revised manuscript that addresses the following issues:

Essential revisions:

1) Modifying the text and the Materials and methods section to emphasize that changes are measured in unmodified peptides and that the modification is not directly measured at all. Perhaps, the authors could consider including a graphic to illustrate this.

2) The authors assume that the number of molecules of each unique peptide derived from a protein will be equal. Because in routine proteomic analyses, this is not always true possibly due to the different degree of digestions at different proteolytic sites. The authors should address this issue.

3) Since many peptides often contain multiple potential sites of modifications (e.g. several phosphorylation sites on the same peptide), it is not possible to accurately measure the phosphorylation dynamics for each site. The authors should modify their claims and discuss this important caveat.

4) Although the authors state that FLEXIQuant-LF allows quantification of the modification extent without prior knowledge of the type of modification, they should explicitly state that after this analysis, the exact type of modification is still large unknown.

5) The authors should carry out an artificial positive control experiment using synthetic peptides to provide better understanding to readers of how well the algorithm works.

6) The authors should apply FLEXIQuant-LF to one of publicly available label-free phosphoproteome datasets with known answers regarding the extent of phosphorylation? If the authors can do this and obtain similar conclusions, then it can indeed be presented as a powerful strategy that can be applied in many studies.

https://doi.org/10.7554/eLife.58783.sa1

## Author response

Essential revisions:

1) Modifying the text and the Materials and methods section to emphasize that changes are measured in unmodified peptides and that the modification is not directly measured at all. Perhaps, the authors could consider including a graphic to illustrate this.

We thank the reviewer for the comment and the suggestion to include an illustration to clarify the methodology. We have adjusted the descriptions throughout the text and the Materials and methods section to clarify that FLEXIQuant-LF, like FLEXIQuant, measures the modification extent indirectly by observing the abundance changes of the unmodified peptide counterparts. We also expanded Figure 1B to illustrate better that the unmodified peptide abundances are used to measure the modification extent.

Changes in manuscript:

The development of FLEXIQuant (Full-Length-Expressed Stable Isotope-labeled Proteins for Quantification) (Singh et al., 2009; Singh et al., 2012 ), for the first time, enabled the quantification of the extent of modification across a whole protein without prior knowledge of modification identities using an indirect approach of quantifying only the unmodified peptides. FLEXIQuant relies on the principle that the total number of molecules of a given protein is conserved in a sample, thus the number of molecules of each unique peptide derived from that protein will be equal. If a peptide is modified chemically by a PTM this would reduce the abundance of its unmodified cognate in the peptide pool. […] The advantage of this approach is the precise and accurate quantification of the degree of abundance reduction due to a modification as defined by Wold (Wold, 1981) or amino acid substitutions of each quantified unmodified peptide and the ability to calculate the absolute concentration.”

Changes in manuscript:

“Here, we introduce FLEXIQuant-LF as an unbiased, label-free computational tool to indirectly detect modified peptides and to quantify the degree of modification based solely on the unmodified peptide species building upon the FLEXIQuant (Singh et al., 2009) idea developed in our lab. We developed this approach to identify and elucidate differential protein modification extent in commonly analyzed time series or case-control studies.”

Changes in manuscript:

“FLEXIQuant-LF indirectly identifies modified peptides by means of pinpointing unmodified peptides whose intensities strongly deviate from those of a reference sample using robust linear regression. […] More specifically, the FLEXIQuant-LF algorithm trains a linear regression model for each sample in the input file using the random sample consensus (RANSAC) algorithm (Fischler et al., 1981) to identify outliers iteratively and fit the model only based on the inliers, i.e. based on stably and consistently quantified unmodified peptides.”

Changes in manuscript:

“To increase reproducibility and enhance the fit of the linear regression model to the data the algorithm is executed 30 times and the best model, based on r2 scores, is selected (Figure 4A and B). […] RM scores are calculated by dividing each inlier raw score by the median of the three highest raw scores (after removing outliers) for each sample, corresponding to one minus the extent of modification as defined by FLEXIQuant (Figure 1B4).”

Changes in manuscript:

“Peptides were then classified in three categories based on their RM scores at time point 10h: (1) RM score < 0.5: peptide is likely differentially modified (magenta bars), (2) 0.5 ≤ RM score < 0.6: peptide is possibly differentially modified (blue bars) and (3) RM score ≥ 0.6: peptide is likely not differentially modified (green bars) (Figure 1B5).”

Changes in manuscript:

“The objective of our study was to establish a workflow to elucidate peptide-resolved modification dynamics in an unbiased manner without the need for heavy isotope-labeled reference proteins or peptides. […] This interference with the proteolytic cleavage results in a proteolytic missed cleavage peptide, which in turn leads to a change in the amount/intensity of the two unmodified, fully cleaved peptide species.”

Changes in manuscript:

“FLEXIQuant-LF enables unbiased indirect identification of differentially abundant peptides resulting from post-translational modification in label-free mass-spectrometry data. […] This ensures that the regression line is fitted to the most robustly quantified peptides and facilitates the quantification of the modification extent based on the distance to the regression line.”

Changes in manuscript:

“However, FLEXIQuant-LF can also be applied to study protein modification differences between different experimental conditions or cohorts in general, e.g. comparing diseased vs. healthy individuals or testing the effect of different drugs. As FLEXIQuant-LF only determines the extent of differential modification of a peptide relative to the reference based on the unmodified counterparts, quantification of the absolute abundance of the modification still requires an isotopically labeled internal standard.”

“While FLEXIQuant-LF, similar to the initial FLEXIQuant approach, indirectly identifies peptides that are differentially modified, identifying the specific type and site of modification is not the objective of this analysis.”

Changes in manuscript:

“FLEXIQuant-LF is a widely applicable analysis workflow building on the previously published FLEXIQuant idea of monitoring the unmodified peptides and changes to their signal intensities to indirectly detect modification events and to quantify the degree of modifications. The use of a reference timepoint or reference sample within a study instead of labeled reference proteins allows FLEXIQuant-LF to be universally applicable post hoc to any protein robustly identified across the sample series without the need for the addition of heavy isotope-labeled protein standard(s).”

Changes in manuscript:

“Figure 1: Workflow and FLEXIQuant-LF Concept. […] (B1) Firstly, intensities of unmodified peptides are used to indirectly identify and quantify the modification extent of a protein using the following steps. ”

2) The authors assume that the number of molecules of each unique peptide derived from a protein will be equal. Because in routine proteomic analyses, this is not always true possibly due to the different degree of digestions at different proteolytic sites. The authors should address this issue.

We thank the reviewer for the comment. We agree that different degrees of digestion at proteolytic cleavage sites could theoretically pose an issue when comparing the total amount of peptides within a protein. However, it is our experience that similar samples digested within a batch on the same day show very similar proteolysis yields. Furthermore, under non-limiting digestions conditions, the digestion yields reach a plateau, i.e. even longer digestion times will not majorly affect the ratio of fully cleaved vs. missed cleaved. Both these assumptions are also made in almost all targeted LC/MS assays which add a heavy proteotypic peptide as internal standard to the sample prior to digestion in order to absolutely quantify the protein of interest. In summary, as long as non-limiting digestion conditions are ensured, we strongly believe that the risk of vastly different degrees of digestions can be managed, especially when it is ensured that the reference sample and the samples of interest are digested in the same batch.

In addition to the biochemical aspect, FLEXIQuant-LF is sufficiently robust to add another ‘layer of protection’ against different degrees of digestion by using robust RANSAC regression line fitting: Differences in reproducibility due to e.g. variable degree of digestion does not affect the robust RANSAC linear regression fitting and will not result in biased results. This notion is underscored by the fact that we observed several pairs of fully cleaved peptides and peptides with missed cleavages – both show very similar ratios relative to the reference sample.

Of note: This treatment of proteolytic cleavages as modifications is consistent with the 1981 definition of modification by Wold (Wold, 1981) which defines it as affecting peptide bonds, carboxyl- and amino-termini as well as individual amino acid side chains. In our manuscript we extend this definition by changes from the canonical sequence, i.e. mutations.

To clarify these points, we have adapted the text to reflect proteolytic cleavages more generally and highlighted changes as abundance changes due to modification throughout the manuscript and included the definition of modification as used in FLEXIQuant-LF.

Changes in manuscript see our response to Essential revisions point 1.

3) Since many peptides often contain multiple potential sites of modifications (e.g. several phosphorylation sites on the same peptide), it is not possible to accurately measure the phosphorylation dynamics for each site. The authors should modify their claims and discuss this important caveat.

We thank the reviewer for the comment and hope to clarify this point in the following. We agree that it is hardly possible to accurately measure individual modification dynamics of each site when several phosphorylation events occur within the same peptide. However, we want to emphasize here that FLEXIQuant-LF, as well as the experimental version FLEXIQuant, do not to measure and identify types of modifications nor their sites but solely highlight the presence of modifications and measure the extent thereof through indirect means, i.e. by measuring the unmodified peptide. Therefore, FLEXIQuant-LF does not measure individual modification dynamics per site but does so on a peptide level. We hope the changes we made following the first comment implicitly clarify this point in the manuscript further (see Essential revisions point 1). We also added the term ‘peptide-resolved’ to modification dynamics to clarify this point further.

Changes in manuscript:

“Here, we demonstrate the utility of FLEXIQuant-LF by applying our method to interrogate the peptide-resolved modification dynamics of the anaphase-promoting complex/cyclosome (APC/C) during mitosis (Singh et al., 2009; Steen et al., 2008) ”

Changes in manuscript:

“The objective of our study was to establish a workflow to elucidate peptide-resolved modification dynamics in an unbiased manner without the need for heavy isotope-labeled reference proteins or peptides.”

Changes in manuscript:

“Prior to analysis of the peptide-resolved modification dynamics of the APC/C, the complex was purified from each time point separately by co-immunoprecipitation (co-IP) using an anti-CDC27 antibody (workflow in Figure 1A).”

Changes in manuscript:

“The peptide-resolved modification dynamics of other APC/C complex components”

Changes in manuscript:

“A more detailed analysis of the peptide-resolved modification dynamics observed in APC1 identified two regions within the protein that appear to have different modification kinetics (Figure 3). […] These data demonstrate that tools to study the peptide-resolved modification dynamics in a quantitative manner will greatly improve the current understanding of biological processes.”

Changes in manuscript:

“Furthermore, FLEXIQuant-LF can be used to repeatedly analyze the same data set with varying reference samples, providing complementary perspectives on the peptide-resolved modification dynamics.”

Changes in manuscript:

“The application of FLEXIQuant-LF to a DIA dataset of the APC/C isolated from Hela cells arrested at various time points during a mitotic arrest, allowed us to recapitulate the peptide-resolved modification dynamics previously observed with our original heavy isotope-labeled internal standard-based FLEXIQuant method. Using FLEXIQuant-LF, we easily extended our analysis to other APC/C subunits present in the sample set and delivered a comprehensive report on the peptide-resolved modification dynamics of the core components of APC/C following nocodazole-induced mitotic arrest.”

Changes in manuscript:

“Therefore, FLEXIQuant-LF can pave the way to a better general understanding of peptide-resolved post-translational modification dynamics of proteins.”

4) Although the authors state that FLEXIQuant-LF allows quantification of the modification extent without prior knowledge of the type of modification, they should explicitly state that after this analysis, the exact type of modification is still large unknown.

We thank the reviewer for the comment. Although we never claim to identify the exact type of modification, we agree that our wording in the manuscript does not explicitly state this fact. We have added clarifications in the Abstract and conclusion of the manuscript to further highlight this point.

Changes in manuscript:

“Here, we introduce FLEXIQuant-LF, a software tool for large-scale identification of differentially modified peptides and quantification of their modification extent without knowledge of the types of modifications involved.”

Changes in manuscript:

“FLEXIQuant-LF is unbiased towards the type of modification and relies on identification of the unmodified counterpart of a modified peptide to detect and quantify modification events, irrespective of the modification’s nature, i.e. the types of modifications involved remain unknown. The valuable information of modification event detection and modification extent quantification can subsequently be used to study rare PTMs and to potentially discover novel PTMs using more targeted approaches in future experiments.”

5) The authors should carry out an artificial positive control experiment using synthetic peptides to provide better understanding to readers of how well the algorithm works.

We thank the reviewer for the suggestion. The use of synthetic peptides with and without modifications is challenging to use as a positive control as FLEXIQuant-LF relies on other peptides of the protein. Spiking in synthetic peptides with modifications changes the overall abundance of this peptide compared with the remaining peptides of a protein and therefore would not be suitable as a positive control. Similarly, using synthetic proteins with modifications the purity and overall changes of modifications provides a suboptimal positive control experiment. To address the point raised by the reviewer we have added additional analyses of an independent DIA experiment with replicates by Bruderer et al. (Bruderer et al., 2017, publicly available data downloaded from the PRIDE repository, PXD005573) and changed randomly based on a uniform distribution the intensities of peptides by multiplying by a factor between 0.0 and 1.0 representing a modification extent of 100% to 0%, respectively, and tested whether FLEXIQuant-LF could recover the expected change. We assessed this both qualitatively, i.e. classifying the peptide as likely not differentially modified, possibly differentially modified, or likely differentially modified, and quantitatively, i.e. whether the RM-score difference between in-silico changed and unchanged peptides represents the factor. This procedure was repeated on 122 proteins with multiple peptides per protein resulting in a total of 484 peptides being tested across replicates. Classification of peptides as likely not differentially modified and likely differentially modified were identified with sensitivity and precision of >96%. Due to the hard-to-define border of the narrow group of possibly differentially modified peptides situated in the grey zone between the clean cut groups of likely and likely not differentially modified peptides results are reduced in sensitivity of 72.2% and precision of 88.6%. This decrease in accuracy likely reflects more on the definition criteria (boundaries of RM-score > 0.5 and RM-score < 0.6) than on the method itself. Additionally, the RM score also shows a high correlation with the expected in-silico modification changes (Pearson’s r=0.98). We have added two panels to Figure 4 highlighting the results and added a paragraph in the Materials and methods and Results section describing the additional analyses.

Changes in manuscript:

“Reproducibility and quality assessment

[...] To further test the quality of the classification and quantification accuracy an independent publicly available DIA dataset was downloaded from the PRIDE repository (PXD005573) (Bruderer et al., 2015). […] Furthermore, total RM score errors as defined by the difference between the expected change factor and the calculated RM score changes were compared to the number of peptides measured per protein to assess robustness of the FLEXIQuant-LF approach.”

Changes in manuscript:

“Testing and Validating Reproducibility of FLEXIQuant-LF

[…] We also tested the quality of FLEXIQuant-LF classification and quantification on an independent DIA dataset (Bruderer et al., 2015). [...] RM score changes showed a high correlation with the in-silico intensity change factors (Pearson's r=0.98, Figure 4C). [...] Overall, quantification errors were low with a mean error of 0.020 (SD=0.058, <0.1 for 94.7% of cases, Figure 4E). […] Outliers are rare (<5%) and can be attributed to the random selection and simulation effects of the assessment approach.”

Changes in manuscript:

“Figure 4. FLEXIQuant-LF reproducibility and error estimation. […] Overall quantification errors are very low and further improve with the number of peptides used for FLEXIQuant-LF analysis.”

6) The authors should apply FLEXIQuant-LF to one of publicly available label-free phosphoproteome datasets with known answers regarding the extent of phosphorylation? If the authors can do this and obtain similar conclusions, then it can indeed be presented as a powerful strategy that can be applied in many studies.

We thank the reviewer for the comment. FLEXIQuant-LF is a label-free based method that identifies and quantifies the modification extent of peptides within a protein in an indirect manner by measuring the abundance change in unmodified peptides. By design this method is agnostic of the modification type and stoichiometry of different modification sites within a peptide. As the reviewer had pointed out previously, we have clarified these aspects of our strategy (see Essential revisions point 1). Unfortunately, label-free phosphoproteome data, which are based on phosphopeptide-enrichment, do not provide the basis on which FLEXIQuant-LF operates. Even combining label-free global data and label-free phosphoproteome data suffer from the presence of additional modifications that are inherently included in the agnostic FLEXIQuant-LF method but are neglected by enrichment methodologies. We appreciate the reviewer’s comment to boost the power of our strategy but have to highlight, that no published dataset is available to date to follow the suggestion by the reviewer as too many confounding factors, i.e. additional modifications, enrichment protocols for specific modifications only, exist.

https://doi.org/10.7554/eLife.58783.sa2

## Article and author information

### Author details

1. #### Christoph N Schlaffner

1. F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Boston, United States
2. Department of Neurology, Harvard Medical School, Boston, United States
##### Contribution
Conceptualization, Data curation, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing - review and editing
##### Contributed equally with
Konstantin Kahnert and Jan Muntel
##### Competing interests
No competing interests declared
2. #### Konstantin Kahnert

1. Department of Pathology, Boston Children’s Hospital, Boston, United States
2. Bioinformatics Unit (MF1), Robert Koch Institute, Berlin, Germany
3. Department of Medical Biotechnology, Institute of Biotechnology, Technische Universit?t Berlin, Berlin, Germany
##### Contribution
Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft
##### Contributed equally with
Christoph N Schlaffner and Jan Muntel
##### Competing interests
No competing interests declared
3. #### Jan Muntel

1. Department of Pathology, Boston Children’s Hospital, Boston, United States
2. Department of Pathology, Harvard Medical School, Boston, United States
##### Present address
Biognosys AG, Schlieren, Switzerland
##### Contribution
Investigation, Visualization, Writing - original draft
##### Contributed equally with
Christoph N Schlaffner and Konstantin Kahnert
##### Competing interests
No competing interests declared
4. #### Ruchi Chauhan

F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Boston, United States
##### Present address
Grousbeck Gene Therapy Center, Gene Transfer Vector Core, Massachusetts Eye and Ear Infirmary/Harvard Medical School, Boston, United States
##### Contribution
Investigation
##### Competing interests
No competing interests declared
5. #### Bernhard Y Renard

1. Bioinformatics Unit (MF1), Robert Koch Institute, Berlin, Germany
2. Data Analytics and Computational Statistics, Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Potsdam, Germany
##### Contribution
Conceptualization, Supervision, Funding acquisition, Validation, Methodology, Writing - review and editing
##### Competing interests
No competing interests declared
6. #### Judith A Steen

1. F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Boston, United States
2. Department of Neurology, Harvard Medical School, Boston, United States
##### Contribution
Conceptualization, Resources, Funding acquisition
##### Competing interests
No competing interests declared
7. #### Hanno Steen

1. Department of Pathology, Boston Children’s Hospital, Boston, United States
2. Department of Pathology, Harvard Medical School, Boston, United States
3. Precision Vaccines Program, Boston Children’s Hospital, Boston, United States
##### Contribution
Conceptualization, Resources, Supervision, Funding acquisition, Validation, Methodology, Project administration, Writing - review and editing
##### For correspondence
Hanno.Steen@childrens.harvard.edu
##### Competing interests
No competing interests declared

### Funding

• Hanno Steen

• Hanno Steen

• Hanno Steen

• Hanno Steen

• Hanno Steen

• Hanno Steen

#### National Institutes of Health (R01GM112007)

• Judith A Steen

#### Deutsche Forschungsgemeinschaft (RE3474/2-2)

• Bernhard Renard

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

### Acknowledgements

We thank Benoit Fatou and Mukesh Kumar for providing test data during development and testing of FLEXIQuant-LF. Furthermore, we thank Kyle Higgins and Patrick van Zalm for testing the FLEXIQuant-LF software.

### Senior Editor

1. Philip A Cole, Harvard Medical School, United States

### Reviewing Editor

1. Akhilesh Pandey, Mayo Clinic, United States

### Publication history

1. Received: May 11, 2020
2. Accepted: November 23, 2020
3. Version of Record published: December 7, 2020 (version 1)

### Copyright

? 2020, Schlaffner et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

## Metrics

• 305
Page views
• 45
Downloads
• 0
Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

## Download links

A two-part list of links to download the article, or parts of the article, in various formats.

### Open citations (links to open the citations from this article in various online reference manager services)

1. Further reading

### Further reading

1. Biochemistry and Chemical Biology
2. Microbiology and Infectious Disease

# Structure of dual-BON domain protein DolP identifies phospholipid binding as a new mechanism for protein localization

Jack Alfred Bryant et al.
Research Article

The Gram-negative outer membrane envelops the bacterium and functions as a permeability barrier against antibiotics, detergents and environmental stresses. Some virulence factors serve to maintain the integrity of the outer membrane, including DolP (formerly YraP) a protein of unresolved structure and function. Here we reveal DolP is a lipoprotein functionally conserved among Gram-negative bacteria and that loss of DolP increases membrane fluidity. We present the NMR solution structure for Escherichia coli DolP, which is composed of two BON domains that form an interconnected opposing pair. The C-terminal BON domain binds anionic phospholipids through an extensive membrane:protein interface. This interaction is essential for DolP function and is required for sub-cellular localization of the protein to the cell division site, providing evidence of subcellular localization of these phospholipids within the outer membrane. The structure of DolP provides a new target for developing therapies that disrupt the integrity of the bacterial cell envelope.

1. Biochemistry and Chemical Biology
2. Microbiology and Infectious Disease

# Cross-talk between individual phenol-soluble modulins in Staphylococcus aureus biofilm enables rapid and efficient amyloid formation

Masihuz Zaman, Maria Andreasen
Research Article Updated

The infective ability of the opportunistic pathogen Staphylococcus aureus, recognized as the most frequent cause of biofilm-associated infections, is associated with biofilm-mediated resistance to host immune response. Phenol-soluble modulins (PSM) comprise the structural scaffold of S. aureus biofilms through self-assembly into functional amyloids, but the role of individual PSMs during biofilm formation remains poorly understood and the molecular pathways of PSM self-assembly are yet to be identified. Here we demonstrate high degree of cooperation between individual PSMs during functional amyloid formation. PSMα3 initiates the aggregation, forming unstable aggregates capable of seeding other PSMs resulting in stable amyloid structures. Using chemical kinetics we dissect the molecular mechanism of aggregation of individual PSMs showing that PSMα1, PSMα3 and PSMβ1 display secondary nucleation whereas PSMβ2 aggregates through primary nucleation and elongation. Our findings suggest that various PSMs have evolved to ensure fast and efficient biofilm formation through cooperation between individual peptides.