DBSLMM is software implementing the Deterministic Bayesian Sparse Linear Mixed Model (DBSLMM). It can be used to construct polygenic scores (PGS). It fits a linear mixed model using summary statistics, an LD matrix, and LD block information. It is computationally efficient and accurate for biobank-scale GWAS data and uses freely available open-source numerical libraries.
\[y=X_{l} \beta _{l}+X_{s} \beta _{s}+ \epsilon \] where \(X_{l}\) is the \(n\) by \(m_{l}\) genotype matrix for the \(m_{l}\) selected likely large-effect SNPs; \(\beta _{l}\) is an \(m_{l}\)-vector of the corresponding effect sizes; \(X_{s}\) is the \(n\) by \(m_{s}\) genotype matrix for the \(m_{s}=m-m_{l}\) remaining likely small-effect SNPs; and \(\beta _{s}\) is an \(m_{s}\)-vector of the corresponding effect sizes.
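The decomposition above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not DBSLMM's fitting procedure: the helper names (`split_columns`, `matvec`, `simulate_y`), the choice of large-effect columns, the Gaussian toy genotypes, and all effect sizes are hypothetical and exist purely to show how the two genotype blocks combine into \(y\).

```python
import random

def split_columns(X, large_idx):
    """Split an n x m genotype matrix into large-effect and small-effect column blocks."""
    large = sorted(large_idx)
    large_set = set(large)
    small = [j for j in range(len(X[0])) if j not in large_set]
    X_l = [[row[j] for j in large] for row in X]
    X_s = [[row[j] for j in small] for row in X]
    return X_l, X_s

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def simulate_y(X_l, beta_l, X_s, beta_s, noise_sd, rng):
    """Return y = X_l beta_l + X_s beta_s + epsilon with Gaussian noise."""
    large_part = matvec(X_l, beta_l)
    small_part = matvec(X_s, beta_s)
    return [l + s + rng.gauss(0.0, noise_sd) for l, s in zip(large_part, small_part)]

rng = random.Random(0)
n, m = 5, 6  # toy sizes (assumption)
# Toy standardized genotypes drawn from a normal distribution (assumption).
X = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
# Columns 1 and 4 are arbitrarily treated as large-effect SNPs (assumption).
X_l, X_s = split_columns(X, large_idx={1, 4})
beta_l = [0.5, -0.4]                                   # m_l = 2 large effects
beta_s = [rng.gauss(0.0, 0.01) for _ in range(m - 2)]  # m_s = m - m_l small effects
y = simulate_y(X_l, beta_l, X_s, beta_s, noise_sd=1.0, rng=rng)
```

Here `X_l` has \(m_{l}=2\) columns and `X_s` has \(m_{s}=4\) columns, matching \(m_{s}=m-m_{l}\).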
External validation is another tool included in DBSLMM. It can be used to construct PGS from external summary statistics and a reference panel. It is flexible: the PGS can be constructed for each chromosome one by one or for all 22 chromosomes together.
\[R=cor(\tilde{y},\hat{\tilde{y}})=\frac{cov(\tilde{y},\hat{\tilde{y}})}{\sqrt{var(\tilde{y})var(\hat{\tilde{y}})}}=\frac{\tilde{z}^T\hat{\beta}}{\sqrt{\hat{\beta}^T\Sigma\hat{\beta}}}\] where \(\tilde{z}\) is the vector of z-scores from the external summary statistics, \(\hat{\beta}\) is the vector of effect sizes estimated by DBSLMM, and \(\Sigma\) is the LD matrix of the reference panel.
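As a worked illustration of the last expression, the sketch below evaluates \(\tilde{z}^T\hat{\beta}/\sqrt{\hat{\beta}^T\Sigma\hat{\beta}}\) on toy inputs. It is not the code of the validation tool: the function name `external_r` and the input values are hypothetical, and the inputs are assumed to already be on the scale for which the formula holds as written.

```python
import math

def external_r(z_tilde, beta_hat, sigma):
    """Evaluate z~^T beta^ / sqrt(beta^^T Sigma beta^) for plain Python lists."""
    m = len(beta_hat)
    if len(z_tilde) != m or len(sigma) != m or any(len(row) != m for row in sigma):
        raise ValueError("z_tilde, beta_hat and sigma must have matching dimensions")
    # Numerator: inner product of external z-scores and estimated effects.
    numerator = sum(z * b for z, b in zip(z_tilde, beta_hat))
    # Denominator: square root of the quadratic form beta^T Sigma beta.
    sigma_beta = [sum(s * b for s, b in zip(row, beta_hat)) for row in sigma]
    quad_form = sum(b * sb for b, sb in zip(beta_hat, sigma_beta))
    if quad_form <= 0:
        raise ValueError("beta^T Sigma beta must be positive")
    return numerator / math.sqrt(quad_form)

# Toy inputs for two SNPs (all values are illustrative assumptions).
z_tilde = [0.3, 0.1]
beta_hat = [0.2, 0.1]
sigma = [[1.0, 0.5],
         [0.5, 1.0]]
r = external_r(z_tilde, beta_hat, sigma)
```

For these toy values both the numerator and the quadratic form equal 0.07, so `r` equals \(\sqrt{0.07}\approx 0.265\).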
In order to install DBSLMM and VALID, you should clone this repository via the command
git clone https://github.com/biostat0903/DBSLMM.git
Sheng Yang, Xiang Zhou (2019). Accurate and Scalable Construction of Polygenic Scores in Large Biobank Data Sets. bioRxiv.