Direct assignment of parser arguments #218

Open
j-andrews7 opened this issue Jan 7, 2025 · 3 comments
Labels
enhancement feature-request rmspc The R package of MSPC distributed via Bioconductor.

Comments

@j-andrews7

I am rather confused about how MSPC handles the p-values in the input BED files, particularly those from MACS, since the MACS score column is actually -log10(qvalue)*10, as specified in their docs (emphasis mine):

  1. score - Indicates how dark the peak will be displayed in the browser (0-1000). Thus, it’s for the purpose of displaying on genome browser. In MACS3 callpeak output, we use the -log10qvalue*10. However, it may happen when the value in this column goes above 1000, and cause trouble while loading it in genome browsers. In this case, use the following awk command to fix: awk -F'\t' '{ if ($5 > 1000) $5=1000; OFS="\t"; print }' peak.narrowPeak

I don't think this is a MACS3 change; I believe this has been the default for a while now (perhaps always).
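To make the mismatch concrete, here is a minimal sketch of how the score column relates to the other narrowPeak columns. The record is made up for illustration (not from a real MACS run), the helper names are hypothetical, and the column layout assumed is the standard narrowPeak one (column 5 score, column 8 -log10(pvalue), column 9 -log10(qvalue)).

```python
# Illustrative narrowPeak record (made-up values, not real MACS output).
# Columns: chrom, start, end, name, score, strand,
#          signalValue, -log10(pvalue), -log10(qvalue), summit offset
line = "chr1\t1000\t1500\tpeak_1\t152\t.\t8.3\t18.7\t15.2\t210"


def parse_narrowpeak(record):
    """Pull out the three columns relevant to this discussion."""
    fields = record.rstrip("\n").split("\t")
    return {
        "score": int(fields[4]),          # column 5: -log10(qvalue) * 10
        "neg_log10_p": float(fields[7]),  # column 8: -log10(pvalue)
        "neg_log10_q": float(fields[8]),  # column 9: -log10(qvalue)
    }


def score_to_qvalue(score):
    """Invert score = -log10(qvalue) * 10 to recover an approximate q-value."""
    return 10 ** (-score / 10)


rec = parse_narrowpeak(line)
p_value = 10 ** (-rec["neg_log10_p"])         # p-value from column 8
q_from_score = score_to_qvalue(rec["score"])  # q-value implied by column 5
```

Reading column 5 as if it held a p-value would therefore use a rescaled q-value rather than the p-value stored in column 8.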

I have looked at the parser configuration options, but the parser appears to expect a different layout than what MACS provides.

The -log10(p-value) is in the 8th column of the typical MACS narrowPeak output. Would it be possible to make the parser argument(s) direct parameters in the rmspc R package rather than using a JSON file? It'd make things simpler. Maybe just have it take a named list?
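A rough sketch of the idea, written in Python for illustration rather than R: a thin wrapper takes named arguments and writes the JSON file on the caller's behalf. The function name `write_parser_config` is hypothetical, and the `Value` key (taken here to be the zero-based index of the p-value column) is an assumption that should be checked against the MSPC parser configuration docs.

```python
import json
import os
import tempfile


def write_parser_config(path, **options):
    """Write the given named options to `path` as a JSON parser configuration."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(options, handle, indent=2)
    return path


# ASSUMPTION: "Value" is the key naming the zero-based column that holds the
# p-value; index 7 is the 8th column of a MACS narrowPeak file.
# Verify the key name and indexing against the MSPC documentation before use.
config_path = write_parser_config(
    os.path.join(tempfile.mkdtemp(), "macs_parser.json"),
    Value=7,
)
```

The resulting path could then be handed to whatever argument currently accepts the parser configuration file, so the caller never has to edit JSON by hand.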

The vignette is confusing, as it's clearly using MACS files, but I don't know if it's appropriate given what the score values actually are (or if those files were adjusted/parsed upstream).

@VJalili VJalili added rmspc The R package of MSPC distributed via Bioconductor. enhancement labels Jan 8, 2025
@VJalili
Member

VJalili commented Jan 8, 2025

Thanks for bringing that to our attention! MSPC is set to read values from the score column of the BED format for broader compatibility. As you correctly mentioned, that does not match the p-value column of MACS outputs, so users need to adjust it through the parser configuration.

MSPC does not currently provide an option to pass parser configuration directly in the R package or as CLI args, and as you mentioned, a JSON file is the only option. I agree the approach you suggest would make the invocation simpler. Meanwhile, please let us know if using JSON is blocking you.

@j-andrews7
Author

I was hoping to try this in the scope of a package for group-wise super enhancer calling to derive a robust consensus peak set, as I've found noisy peak calling one of the biggest confounders in the process. This seemed a more elegant solution than arbitrary requirements of recurrence and such, though I have a few other methods to try as well.

It's not a blocker; it's easy enough to provide the JSON file in the package and feed it to MSPC.

Thanks for the response.

@VJalili
Member

VJalili commented Jan 9, 2025

I am glad you found MSPC helpful, and please let me know if you have any other questions/concerns.
