MA FOCUS: Fine mapping TWAS associations across multiple ancestry groups

MA-FOCUS is a multi-ancestry version of FOCUS. For each ancestry, it takes as input:

GWAS summary statistics
reference LD
eQTL weight database.

Given these data, MA-FOCUS can fine-map using either a tissue-agnostic or a tissue-prioritized approach.

Compared to the original FOCUS approach, MA-FOCUS:

takes the required data for multiple ancestries.
uses the number of genes in the genomic risk region (from GENCODE) to set the prior probability for a gene to be causal (you can still specify a fixed probability instead).
has an improved prior estimate of the variance of gene expression effects on complex traits.
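
As a rough illustration of the gene-count prior described above, the sketch below assumes the prior for each gene is simply one over the number of genes in the region. That formula is an assumption made here for illustration, and `gene_count_prior` is a hypothetical helper, not part of FOCUS.

```python
def gene_count_prior(n_genes_in_region: int) -> float:
    """Prior probability that any one gene in the region is causal.

    Assumption (not confirmed by this page): the prior is 1 / gene count.
    """
    if n_genes_in_region < 1:
        raise ValueError("a region must contain at least one gene")
    return 1.0 / n_genes_in_region


# Hypothetical gene counts for two risk regions.
for n_genes in (4, 20):
    print(n_genes, gene_count_prior(n_genes))
```

Under this assumption, regions with many genes assign a smaller prior to each individual gene than regions with few genes.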

The basic command for multi-ancestry fine-mapping is

focus finemap SUMSTATS_POP1:SUMSTATS_POP2 PLINK_REFLD_POP1:PLINK_REFLD_POP2 WEIGHT_DB_POP1:WEIGHT_DB_POP2 --locations FINE_MAPPED_REGION

where SUMSTATS is the GWAS summary statistics file, PLINK_REFLD is the path to PLINK-formatted genotype data for computing reference LD, and WEIGHT_DB is the path to a FOCUS weight database, each given once per population. FINE_MAPPED_REGION is the path to a file of independent genomic regions (we provide some pre-generated files; see the wiki Home page). To list all options and functionality, enter

focus finemap --help

To add more populations, append the corresponding file to each argument, separated by a colon (:).
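
The colon-joining convention can be sketched in Python as follows. `build_finemap_args` is a hypothetical helper and the file names are placeholders. The sketch uses only the positional arguments and the --locations flag shown above, and it only assembles the argument list without running FOCUS.

```python
from typing import List, Sequence


def build_finemap_args(
    sumstats: Sequence[str],
    ref_ld: Sequence[str],
    weight_dbs: Sequence[str],
    locations: str,
) -> List[str]:
    """Assemble a `focus finemap` argument list for any number of populations."""
    n_pops = len(sumstats)
    if n_pops == 0 or len(ref_ld) != n_pops or len(weight_dbs) != n_pops:
        raise ValueError("provide one sumstats, LD, and weight file per population")
    return [
        "focus",
        "finemap",
        ":".join(sumstats),    # one GWAS file per population
        ":".join(ref_ld),      # reference LD, in the same population order
        ":".join(weight_dbs),  # weight databases, in the same population order
        "--locations",
        locations,
    ]


# Placeholder file names for three populations.
args = build_finemap_args(
    ["POP1.sumstats.gz", "POP2.sumstats.gz", "POP3.sumstats.gz"],
    ["REFLD_POP1", "REFLD_POP2", "REFLD_POP3"],
    ["WEIGHTS_POP1.db", "WEIGHTS_POP2.db", "WEIGHTS_POP3.db"],
    "FINE_MAPPED_REGION",
)
print(" ".join(args))
```

The sketch assumes that populations must be listed in the same order in all three arguments, which is why it rejects mismatched lengths.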

For example, the following command performs tissue-agnostic multi-ancestry fine-mapping on chromosome 1. It uses the GWAS summary data LDL_EUR.sumstats.gz and LDL_AFR.sumstats.gz, the reference genotypes 1000G.EUR.QC.1 and 1000G.AFR.QC.1, the eQTL weights gtex_v7_EUR.db and gtex_v7_AFR.db, and the risk regions 37:EUR-AFR generated by LDetect on GRCh37 for European and African ancestries:

focus finemap LDL_EUR.sumstats.gz:LDL_AFR.sumstats.gz 1000G.EUR.QC.1:1000G.AFR.QC.1 gtex_v7_EUR.db:gtex_v7_AFR.db --locations 37:EUR-AFR --chr 1 --out LDL_mafocus.chr1

To use the tissue-prioritized approach, add the flag --tissue TISSUE.

MA-FOCUS can generate a figure for each region and each ancestry that shows the predicted expression correlation, the TWAS summary statistics, and the PIP for each gene. To do this, add the --plot flag. The figures resemble those produced by single-ancestry FOCUS.

The output of multi-ancestry fine-mapping is a table. Using two-population MA-FOCUS as an example, its columns are:

Column	Description
block	independent genomic region chrom:start-chrom:stop
ens_gene_id	Ensembl gene ID
ens_tx_id	Ensembl transcript ID
mol_name	Name of the gene/linc/pseudogene
tissue	Tissue the original expression was measured in
ref_name	Name of the QTL reference panel
type	Type of molecular feature (gene, lncRNA, lincRNA, pseudogene)
chrom	Chromosome
tx_start	Transcription start site
tx_stop	Transcription stop site
block_genes	Number of genes in the region, used to set the prior probability for a gene to be causal
inference_pop1	Inference procedure for model (e.g., LASSO, BSLMM) for the first population
inference_pop2	Inference procedure for model (e.g., LASSO, BSLMM) for the second population
inter_z_pop1	Intercept of Z scores when regressing out average tagged pleiotropic associations for the first population; None if intercept = False
inter_z_pop2	Intercept of Z scores when regressing out average tagged pleiotropic associations for the second population; None if intercept = False
cv.R2_pop1	Cross-validation predictive R-squared for the first population
cv.R2_pop2	Cross-validation predictive R-squared for the second population
cv.R2.pval_pop1	P-value of the cross-validation R-squared for the first population
cv.R2.pval_pop2	P-value of the cross-validation R-squared for the second population
twas_z_pop1	Marginal TWAS Z score for the first population
twas_z_pop2	Marginal TWAS Z score for the second population
pip_pop1	Marginal posterior inclusion probability for the first population
pip_pop2	Marginal posterior inclusion probability for the second population
in_cred_set_pop1	Flag indicating whether the model is included in the credible set for the first population
in_cred_set_pop2	Flag indicating whether the model is included in the credible set for the second population
ldregion_pop1	LD regions from the reference genome for the first population
ldregion_pop2	LD regions from the reference genome for the second population
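
To illustrate how this table might be used downstream, the sketch below lists genes that fall in the credible set for both populations. It assumes the output is tab-separated and that the credible-set flags are written as 1/0 or True/False. The rows are made up for illustration and contain only a few of the columns above, and `shared_credible_genes` is a hypothetical helper.

```python
import csv
import io

# Made-up rows for illustration only; real output contains all columns listed above.
EXAMPLE_TSV = (
    "ens_gene_id\tmol_name\tpip_pop1\tpip_pop2\tin_cred_set_pop1\tin_cred_set_pop2\n"
    "GENE_ID_A\tGENE_A\t0.91\t0.85\t1\t1\n"
    "GENE_ID_B\tGENE_B\t0.40\t0.02\t1\t0\n"
    "GENE_ID_C\tGENE_C\t0.01\t0.03\t0\t0\n"
)

# Assumed encodings of a positive flag.
TRUTHY = {"1", "true"}


def shared_credible_genes(handle):
    """Return mol_name for rows flagged in the credible set of both populations."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["mol_name"]
        for row in reader
        if row["in_cred_set_pop1"].strip().lower() in TRUTHY
        and row["in_cred_set_pop2"].strip().lower() in TRUTHY
    ]


shared = shared_credible_genes(io.StringIO(EXAMPLE_TSV))
print(shared)
```

To apply this to a real result, pass an open file handle for the output table in place of the in-memory example.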

We recommend using reference LD from LDSC.

We recommend using a multi-tissue weight database built from multiple eQTL reference panels. This combines GTEx v7 weights from PrediXcan with METSIM, NTR, YFS, and CMC weights from the FUSION software into a single database usable by FOCUS.
