Population stratification continues to bias the results of genome-wide association studies (GWAS). When these results are used to construct polygenic scores, even subtle biases can cumulatively lead to large errors. To study the effect of residual stratification, we simulated GWAS under realistic models of demographic history. We show that when population structure is recent, it cannot be corrected using principal components of common variants because they are uninformative about recent history. Consequently, polygenic scores are biased in that they recapitulate environmental structure. Principal components calculated from rare variants or identity-by-descent segments can correct this stratification for some types of environmental effects. While family-based studies are immune to stratification, the hybrid approach of ascertaining variants in GWAS but re-estimating effect sizes in siblings reduces but does not eliminate stratification. We show that the effect of population stratification depends not only on allele frequencies and environmental structure but also on demographic history.
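The mechanism behind this bias can be illustrated with a toy example. The sketch below is not the simulation used in the study: it assumes two discrete subpopulations, a single variant with no true effect, and an environmental shift that differs between the subpopulations. All names and parameter values (`simulate`, `freqs`, `env_shift`, and so on) are hypothetical. Centering within the true population label stands in for a covariate that captures the structure perfectly, which the abstract notes common-variant principal components fail to do when structure is recent.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def center_within(values, groups):
    """Subtract each group's mean, removing any group-level shift."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def simulate(n_per_pop=5000, freqs=(0.2, 0.8), env_shift=1.0, seed=1):
    """Toy data: genotype has NO causal effect on the phenotype.

    Assumed setup (illustrative only): population 1 has a higher allele
    frequency and also a higher environmental mean than population 0.
    """
    rng = random.Random(seed)
    genotypes, phenotypes, pops = [], [], []
    for pop, p in enumerate(freqs):
        for _ in range(n_per_pop):
            # Diploid genotype: number of copies of the allele (0, 1 or 2).
            g = int(rng.random() < p) + int(rng.random() < p)
            # Phenotype depends only on environment plus noise.
            y = env_shift * pop + rng.gauss(0.0, 1.0)
            genotypes.append(g)
            phenotypes.append(y)
            pops.append(pop)
    return genotypes, phenotypes, pops


g, y, pop = simulate()

# Naive association: confounded by the environmental difference.
naive_beta = slope(g, y)

# Adjusted association: regress after removing population means from both
# genotype and phenotype (equivalent to including the label as a covariate).
adjusted_beta = slope(center_within(g, pop), center_within(y, pop))

print(f"naive effect estimate:    {naive_beta:.3f}")
print(f"adjusted effect estimate: {adjusted_beta:.3f}")
```

In this toy setting the naive estimate is clearly non-zero even though the variant has no effect, while the adjusted estimate is close to zero. The adjustment only works because the covariate matches the true structure, which is the condition that fails when the available principal components are uninformative about recent history.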
- Iain Mathieson
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
- George H Perry, Pennsylvania State University, United States
- Received: July 29, 2020
- Accepted: November 16, 2020
- Accepted Manuscript published: November 17, 2020 (version 1)
© 2020, Zaidi & Mathieson
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
According to Antonovsky, sense of coherence (SoC) is the origin of health. The link between SoC and risk of cancer has, however, rarely been assessed. We performed a cohort study of 46,436 women from the Karolinska Mammography Project for Risk Prediction of Breast Cancer (Karma). Participants answered a SoC-13 questionnaire at recruitment to Karma and were subsequently followed up for incident breast cancer. Multivariate Cox models were used to assess the hazard ratios (HRs) of breast cancer in relation to SoC. We identified 771 incident cases of breast cancer during follow-up (median time: 5.2 years). No association was found between SoC, either as a categorical (strong vs. weak SoC, HR: 1.08, 95% CI: 0.90–1.29) or continuous (HR: 1.08; 95% CI: 1.00–1.17 per standard deviation increase of SoC) variable, and risk of breast cancer. In summary, we found little evidence to support an association between SoC and risk of breast cancer.
Given a lifetime risk of ~90% by the ninth decade of life, it is unknown if there are true controls for hypertension in epidemiological and genetic studies. Here, we compared Bayesian logistic and time-to-event approaches to modeling hypertension. The median age at hypertension was approximately a decade earlier in African Americans than in European Americans or Mexican Americans. The probability of being free of hypertension at 85 years of age in African Americans was less than half that in European Americans or Mexican Americans. In all groups, baseline hazard rates increased until nearly 60 years of age and then decreased but did not reach zero. Taken together, modeling of the baseline hazard function of hypertension suggests that there are no true controls and that controls in logistic regression are cases with a late age of onset.
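The link between the baseline hazard and the probability of remaining free of hypertension can be written out explicitly. What follows is the standard survival-analysis identity, not the specific model fitted in the study; the symbols $h(t)$ (baseline hazard at age $t$) and $S(t)$ (probability of being hypertension-free at age $t$) are notation assumed here for illustration.

```latex
S(t) = \exp\!\left(-\int_0^{t} h(u)\,du\right),
\qquad
\text{lifetime risk by age } t = 1 - S(t)
```

With a lifetime risk of roughly 90%, $S(t) \approx 0.10$ at that age. Because $h(t)$ stays above zero at all ages, $S(t)$ keeps falling, so a person who is unaffected at a given age is still at risk and not a permanent control.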