MA-FOCUS: Fine-mapping TWAS associations across multiple ancestry groups
MA-FOCUS is a multi-ancestry version of FOCUS. For each ancestry, it takes as input:
- GWAS summary statistics
- reference LD
- eQTL weight database.
Given these data, MA-FOCUS can fine-map in a tissue-agnostic or tissue-prioritized approach.
Compared to the original FOCUS approach, MA-FOCUS:
- takes the required data for multiple ancestries.
- uses the number of genes in the genomic risk region (from GENCODE) to set the prior probability that a gene is causal (you can still specify a fixed probability instead).
- has a better prior estimate of the variance of gene expression effects on complex traits.
The basic command for multi-ancestry fine-mapping is
focus finemap SUMSTATS_POP1:SUMSTATS_POP2 PLINK_REFLD_POP1:PLINK_REFLD_POP2 WEIGHT_DB_POP1:WEIGHT_DB_POP2 --locations FINE_MAPPED_REGION
where SUMSTATS is the GWAS summary statistics file, PLINK_REFLD is the path to PLINK-formatted genotype data for computing reference LD, and WEIGHT_DB is the path to a FOCUS weight database. FINE_MAPPED_REGION is the path to the independent genomic (risk) regions to fine-map (we have generated some files for your use; see the wiki Home page). Help on all options and functionality can be listed by entering
focus finemap --help
To add more populations, append their files to each argument, separated by a colon (:).
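The colon-joined arguments can be built programmatically when there are many populations. The following is a minimal sketch, not part of MA-FOCUS itself: `join_by_colon` is a hypothetical helper, and the `LDL_EAS.sumstats.gz` file name is a placeholder for a third population. It assumes the populations are listed in the same order in every argument, as the POP1:POP2 pattern above suggests.

```shell
# Hypothetical helper: join any number of per-population file names with ':'.
# The function body runs in a subshell so the IFS change does not leak out.
join_by_colon() (
  IFS=':'
  printf '%s\n' "$*"
)

# Placeholder three-population example; the EAS file name is illustrative only.
sumstats=$(join_by_colon LDL_EUR.sumstats.gz LDL_AFR.sumstats.gz LDL_EAS.sumstats.gz)
echo "$sumstats"
```

The same helper can be applied to the reference LD and weight database arguments so that all three keep a consistent population order.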
For example, the following command performs tissue-agnostic multi-ancestry fine-mapping on chromosome 1 for the GWAS summary data LDL_EUR.sumstats.gz and LDL_AFR.sumstats.gz, using the 1000G.EUR.QC.1 and 1000G.AFR.QC.1 reference genotypes, the gtex_v7_EUR.db and gtex_v7_AFR.db eQTL weights, and the risk regions 37:EUR-AFR generated by LDetect on GRCh37 for European and African ancestries:
focus finemap LDL_EUR.sumstats.gz:LDL_AFR.sumstats.gz 1000G.EUR.QC.1:1000G.AFR.QC.1 gtex_v7_EUR.db:gtex_v7_AFR.db --locations 37:EUR-AFR --chr 1 --out LDL_mafocus.chr1
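To cover all autosomes, the chromosome 1 command can be repeated per chromosome. The sketch below is a dry run that only prints the commands. It assumes the reference genotypes are split per chromosome and named 1000G.<POP>.QC.<CHR>, which is inferred from the chromosome 1 file names above rather than confirmed; `print_focus_cmd` is a hypothetical helper.

```shell
# Dry run: print one fine-mapping command per autosome without executing it.
# Assumption: reference files follow the pattern 1000G.<POP>.QC.<CHR>.
print_focus_cmd() {
  # $1: chromosome number
  printf 'focus finemap %s %s %s --locations 37:EUR-AFR --chr %s --out %s\n' \
    "LDL_EUR.sumstats.gz:LDL_AFR.sumstats.gz" \
    "1000G.EUR.QC.$1:1000G.AFR.QC.$1" \
    "gtex_v7_EUR.db:gtex_v7_AFR.db" \
    "$1" "LDL_mafocus.chr$1"
}

c=1
while [ "$c" -le 22 ]; do
  print_focus_cmd "$c"
  c=$((c + 1))
done
```

Once the printed commands look right, they can be piped to `sh` or submitted as separate jobs.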
To take the tissue-prioritized approach, add the flag --tissue TISSUE.
MA-FOCUS can generate a figure for each region and each ancestry that shows the predicted expression correlation, TWAS summary statistics, and PIP for each gene. To do this, add the --plot flag. The usage is similar to single-ancestry FOCUS.
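As an illustration, the two flags can be combined with the chromosome 1 example. This is a dry-run sketch that only prints the command: the tissue name `liver` is a hypothetical value (valid names depend on the weight database you use), and `build_tissue_cmd` is a hypothetical helper.

```shell
# Dry run: print the chromosome 1 command with tissue prioritization and plots.
# TISSUE is a placeholder; use a tissue name present in your weight database.
TISSUE="liver"  # hypothetical tissue name

build_tissue_cmd() {
  # $1: tissue name passed to --tissue
  printf 'focus finemap %s %s %s --locations 37:EUR-AFR --chr 1 --tissue %s --plot --out %s\n' \
    "LDL_EUR.sumstats.gz:LDL_AFR.sumstats.gz" \
    "1000G.EUR.QC.1:1000G.AFR.QC.1" \
    "gtex_v7_EUR.db:gtex_v7_AFR.db" \
    "$1" "LDL_mafocus.chr1"
}

build_tissue_cmd "$TISSUE"
```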
The output of multi-ancestry fine-mapping is a table. Using a two-population MA-FOCUS run as an example, its columns are:
Column | Description |
---|---|
block | independent genomic region chrom:start-chrom:stop |
ens_gene_id | Ensembl gene ID |
ens_tx_id | Ensembl transcript ID |
mol_name | Name of the gene/linc/pseudogene |
tissue | Tissue the original expression was measured in |
ref_name | Name of the QTL reference panel |
type | Type of molecular feature (gene, lncRNA, lincRNA, pseudogene) |
chrom | Chromosome |
tx_start | Transcription start site |
tx_stop | Transcription stop site |
block_genes | Number of genes in the region, used to set the prior probability that a gene is causal |
inference_pop1 | Inference procedure for model (e.g., LASSO, BSLMM) for the first population |
inference_pop2 | Inference procedure for model (e.g., LASSO, BSLMM) for the second population |
inter_z_pop1 | Intercept of Z scores when regressing out average tagged pleiotropic associations for the first population; None if intercept = False |
inter_z_pop2 | Intercept of Z scores when regressing out average tagged pleiotropic associations for the second population; None if intercept = False |
cv.R2_pop1 | Cross-validation predictive R-squared for the first population |
cv.R2_pop2 | Cross-validation predictive R-squared for the second population |
cv.R2.pval_pop1 | P-value of the cross-validation R-squared for the first population |
cv.R2.pval_pop2 | P-value of the cross-validation R-squared for the second population |
twas_z_pop1 | Marginal TWAS Z score for the first population |
twas_z_pop2 | Marginal TWAS Z score for the second population |
pip_pop1 | Marginal posterior inclusion probability for the first population |
pip_pop2 | Marginal posterior inclusion probability for the second population |
in_cred_set_pop1 | Flag indicating whether or not the model is included in the credible set for the first population |
in_cred_set_pop2 | Flag indicating whether or not the model is included in the credible set for the second population |
ldregion_pop1 | LD regions from the reference genome for the first population |
ldregion_pop2 | LD regions from the reference genome for the second population |
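
A table with these columns can be filtered from the command line. The sketch below keeps rows whose value in a named column meets a minimum, for example a PIP cutoff on pip_pop1. It assumes the output is a tab-separated file with a header row; `filter_by_min` is a hypothetical helper, and the toy rows and the 0.8 cutoff are made up for illustration.

```shell
# Hypothetical helper: print the header plus rows where COLUMN >= THRESHOLD.
# Usage: filter_by_min FILE COLUMN THRESHOLD
# Assumption: FILE is tab-separated with a header row.
filter_by_min() {
  awk -F '\t' -v col="$2" -v min="$3" '
    NR == 1 {
      # Locate the requested column by its header name.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print; next
    }
    ($idx + 0) >= (min + 0) { print }
  ' "$1"
}

# Toy table with made-up values, only to show the call.
tmp=$(mktemp)
printf 'mol_name\tpip_pop1\tpip_pop2\n' > "$tmp"
printf 'GENE_A\t0.95\t0.90\n' >> "$tmp"
printf 'GENE_B\t0.10\t0.85\n' >> "$tmp"
filter_by_min "$tmp" pip_pop1 0.8
rm -f "$tmp"
```

Because the column is looked up by name, the same call works on the full table regardless of column order.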
We recommend using reference LD from LDSC.
We recommend using a multi-tissue weight database built from multiple eQTL reference panels, available here. It combines GTEx v7 weights from PrediXcan with METSIM, NTR, YFS, and CMC weights from the FUSION software into a single usable database for FOCUS.