
MA-FOCUS: Fine-mapping TWAS associations across multiple ancestry groups

Zeyun edited this page Feb 8, 2022 · 4 revisions

MA-FOCUS is a multi-ancestry version of FOCUS. For each ancestry, it takes as input:

  1. GWAS summary statistics
  2. reference LD
  3. eQTL weight database.

Given these data, MA-FOCUS can fine-map using either a tissue-agnostic or a tissue-prioritized approach.

Compared to the original FOCUS approach, MA-FOCUS:

  1. takes the required input data for multiple ancestries.
  2. uses the number of genes in the genomic risk region, taken from GENCODE, to set the prior probability for a gene to be causal (you can still specify a fixed probability instead).
  3. has a better prior estimate of the variance of gene expression effects on complex traits.
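
To make the second point concrete, here is a minimal sketch of a gene-count-based prior. It assumes the prior is one over the number of genes in the region, which is our reading of the description above rather than a statement of the exact MA-FOCUS implementation. The helper name gene_prior and its arguments are hypothetical.

```python
def gene_prior(n_genes_in_region, fixed_prior=None):
    """Prior probability that a given gene in the region is causal (sketch).

    Assumption: without a fixed prior, every gene in the region is treated
    as equally likely a priori, so the prior is 1 / (number of genes).
    """
    if fixed_prior is not None:
        # User-specified fixed probability takes precedence.
        if not 0.0 < fixed_prior < 1.0:
            raise ValueError("fixed_prior must lie strictly between 0 and 1")
        return fixed_prior
    if n_genes_in_region < 1:
        raise ValueError("a region must contain at least one gene")
    return 1.0 / n_genes_in_region


print(gene_prior(20))         # gene-count-based prior for a 20-gene region
print(gene_prior(20, 0.001))  # fixed prior overrides the gene count
```

Under this assumption, gene-dense regions get a smaller per-gene prior than gene-sparse ones, whereas a fixed prior treats all regions alike.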

The basic command for multi-ancestry fine-mapping is:

focus finemap SUMSTATS_POP1:SUMSTATS_POP2 PLINK_REFLD_POP1:PLINK_REFLD_POP2 WEIGHT_DB_POP1:WEIGHT_DB_POP2 --locations RISK_REGION

where SUMSTATS is the path to the GWAS summary statistics file, PLINK_REFLD is the path to PLINK-formatted genotype data for computing reference LD, and WEIGHT_DB is the path to a FOCUS weight database, each given once per population. The --locations argument (RISK_REGION) gives the independent genomic regions to fine-map; we have generated some region files for your use (see the wiki Home page). Help on all options and functionality can be listed by entering:

focus finemap --help

To add more populations, append their files to each argument, separated by :.
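
As an illustration of the colon convention, the sketch below assembles the command line for three populations. The helper names join_populations and finemap_argv are hypothetical, the _POP3 placeholders simply extend the pattern above, and the check that a path contains no : is our own assumption (a colon inside a path would be ambiguous). The script only builds and prints the command; it does not run focus.

```python
import shlex


def join_populations(per_population_paths):
    """Join one path per population into a single colon-separated argument."""
    for path in per_population_paths:
        if ":" in path:
            # Assumption: ':' is reserved as the population separator.
            raise ValueError(f"path must not contain ':': {path!r}")
    return ":".join(per_population_paths)


def finemap_argv(sumstats, ref_ld, weight_dbs, locations):
    """Build the argument list for `focus finemap` across populations."""
    if not (len(sumstats) == len(ref_ld) == len(weight_dbs)):
        raise ValueError("each input needs exactly one entry per population")
    return [
        "focus", "finemap",
        join_populations(sumstats),
        join_populations(ref_ld),
        join_populations(weight_dbs),
        "--locations", locations,
    ]


argv = finemap_argv(
    ["SUMSTATS_POP1", "SUMSTATS_POP2", "SUMSTATS_POP3"],
    ["PLINK_REFLD_POP1", "PLINK_REFLD_POP2", "PLINK_REFLD_POP3"],
    ["WEIGHT_DB_POP1", "WEIGHT_DB_POP2", "WEIGHT_DB_POP3"],
    "RISK_REGION",
)
print(shlex.join(argv))
```

Keeping the populations in the same order in all three arguments matters, since each position is what ties a set of summary statistics to its reference LD and weight database.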

For example, the following command performs tissue-agnostic multi-ancestry fine-mapping on chromosome 1 for European and African ancestries. It uses the GWAS summary data LDL_EUR.sumstats.gz and LDL_AFR.sumstats.gz, the reference genotypes 1000G.EUR.QC.1 and 1000G.AFR.QC.1, the eQTL weights gtex_v7_EUR.db and gtex_v7_AFR.db, and the risk regions 37:EUR-AFR, which were generated by LDetect on GRCh37:

focus finemap LDL_EUR.sumstats.gz:LDL_AFR.sumstats.gz 1000G.EUR.QC.1:1000G.AFR.QC.1 gtex_v7_EUR.db:gtex_v7_AFR.db --locations 37:EUR-AFR --chr 1 --out LDL_mafocus.chr1
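
If you want to repeat this for every autosome, a sketch like the following generates one command per chromosome. It rests on two assumptions that the example above does not confirm: that the trailing .1 in 1000G.EUR.QC.1 is the chromosome number (so the reference genotypes are split per chromosome), and that chromosomes 1 to 22 are the ones of interest. The function name finemap_cmd is hypothetical. The commands are only printed; pass each list to subprocess.run yourself if you want to execute them.

```python
import shlex

# Inputs copied from the example above; one entry per ancestry, same order in every list.
SUMSTATS = ["LDL_EUR.sumstats.gz", "LDL_AFR.sumstats.gz"]
# Assumption: reference genotypes are per-chromosome PLINK prefixes ending in the chromosome number.
REF_LD_PREFIXES = ["1000G.EUR.QC.", "1000G.AFR.QC."]
WEIGHT_DBS = ["gtex_v7_EUR.db", "gtex_v7_AFR.db"]
LOCATIONS = "37:EUR-AFR"


def finemap_cmd(chrom):
    """Build the `focus finemap` argument list for a single chromosome."""
    ref_ld = [f"{prefix}{chrom}" for prefix in REF_LD_PREFIXES]
    return [
        "focus", "finemap",
        ":".join(SUMSTATS),
        ":".join(ref_ld),
        ":".join(WEIGHT_DBS),
        "--locations", LOCATIONS,
        "--chr", str(chrom),
        "--out", f"LDL_mafocus.chr{chrom}",
    ]


# Assumption: autosomes 1..22.
commands = [finemap_cmd(chrom) for chrom in range(1, 23)]
for cmd in commands:
    print(shlex.join(cmd))
```

The first printed line reproduces the chromosome 1 command shown above; the rest differ only in the chromosome number.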

To use the tissue-prioritized approach, add the flag --tissue TISSUE.

MA-FOCUS can generate a figure for each region and each ancestry showing the predicted expression correlation, TWAS summary statistics, and PIP for each gene. To do this, add the --plot flag. Usage is the same as in single-ancestry FOCUS.

The output of the multi-ancestry finemap operation is a table. Using two-population MA-FOCUS as an example, its columns are:

Column              Description
block               Independent genomic region, given as chrom:start-chrom:stop
ens_gene_id         Ensembl gene ID
ens_tx_id           Ensembl transcript ID
mol_name            Name of the gene/linc/pseudogene
tissue              Tissue the original expression was measured in
ref_name            Name of the QTL reference panel
type                Type of molecular feature (gene, lncRNA, lincRNA, pseudogene)
chrom               Chromosome
tx_start            Transcription start site
tx_stop             Transcription stop site
block_genes         Number of genes in the region, used to set the prior probability for a gene to be causal
inference_pop1      Inference procedure for the model (e.g., LASSO, BSLMM) for the first population
inference_pop2      Inference procedure for the model (e.g., LASSO, BSLMM) for the second population
inter_z_pop1        Intercept of the Z-scores when regressing out average tagged pleiotropic associations for the first population; None if intercept = False
inter_z_pop2        Intercept of the Z-scores when regressing out average tagged pleiotropic associations for the second population; None if intercept = False
cv.R2_pop1          Cross-validation predictive R-squared for the first population
cv.R2_pop2          Cross-validation predictive R-squared for the second population
cv.R2.pval_pop1     P-value of the cross-validation R-squared for the first population
cv.R2.pval_pop2     P-value of the cross-validation R-squared for the second population
twas_z_pop1         Marginal TWAS Z-score for the first population
twas_z_pop2         Marginal TWAS Z-score for the second population
pip_pop1            Marginal posterior inclusion probability for the first population
pip_pop2            Marginal posterior inclusion probability for the second population
in_cred_set_pop1    Flag indicating whether the model is included in the credible set for the first population
in_cred_set_pop2    Flag indicating whether the model is included in the credible set for the second population
ldregion_pop1       LD region from the reference genome for the first population
ldregion_pop2       LD region from the reference genome for the second population
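
As a sketch of how you might post-process this table, the snippet below keeps the genes that fall in the credible set of every population. Several things here are assumptions rather than documented behavior: that the table is tab-separated with a header row using the column names above, and that the in_cred_set flags are written as 1/0 or True/False. The helper credible_genes is hypothetical, and the two sample rows are made up purely for illustration.

```python
import csv
import io

# Made-up sample with a subset of the columns above; values are illustrative only.
SAMPLE = (
    "block\tmol_name\ttwas_z_pop1\ttwas_z_pop2\tpip_pop1\tpip_pop2\t"
    "in_cred_set_pop1\tin_cred_set_pop2\n"
    "1:100-1:200\tGENE_A\t5.1\t4.2\t0.91\t0.85\t1\t1\n"
    "1:100-1:200\tGENE_B\t1.3\t0.7\t0.04\t0.10\t0\t1\n"
)


def credible_genes(handle, pops=("pop1", "pop2")):
    """Return the rows that are in the credible set of every listed population."""

    def flagged(value):
        # Assumption: flags are encoded as 1/0 (possibly 1.0) or True/False.
        return value.strip().lower() in {"1", "1.0", "true"}

    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row for row in reader
        if all(flagged(row[f"in_cred_set_{pop}"]) for pop in pops)
    ]


hits = credible_genes(io.StringIO(SAMPLE))
for row in hits:
    print(row["mol_name"], row["pip_pop1"], row["pip_pop2"])
```

To read a real result, replace io.StringIO(SAMPLE) with an open file handle for your output table, and extend pops if you ran more than two populations.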

We recommend using reference LD from LDSC.

We recommend using the multi-tissue, multi-reference-panel eQTL weight database available here. It combines GTEx v7 weights from PrediXcan with METSIM, NTR, YFS, and CMC weights from the FUSION software into a single database usable by FOCUS.