data.indiv_proba.csv and probability threshold #174

Open
sashajenner opened this issue Oct 11, 2024 · 1 comment

@sashajenner

First of all, is this a mistake in the README.md?
"The output file data.indiv_proba.csv contains the probability of modification for each read"
Should this instead be "for each position"?
Similarly, should this
"probability_modified: The probability that a given read is modified"
instead be
"probability_modified: The probability that a given position is modified"

Next, what is the modified probability threshold value for RNA004?

Finally, what is the thresholding algorithm? I.e. I imagine that it is something like:
If p > thresh: modified
Else: not modified

@yuukiiwa
Collaborator

Hi @sashajenner,

To clarify, data.indiv_proba.csv contains the read-level probability of modification at each position for each read, so you should see multiple reads for each position. data.site_proba.csv aggregates all the read-level probabilities to produce a site-level probability.

The mod_ratio in data.site_proba.csv is defined as the proportion of reads at each site whose read-level probability is above DEFAULT_READ_THRESHOLD = 0.033379376.

Thanks!

Best wishes,
Yuk Kei
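
For readers who want to check the mod_ratio definition in the reply above against their own output, here is a minimal sketch that recomputes it from read-level rows. It is not the project's own code. The column names (transcript_id, transcript_position, read_index, probability_modified) and the example rows are assumptions for illustration, and the strict `>` comparison is one reading of "above" in the reply.

```python
import csv
import io
from collections import defaultdict

# Threshold value quoted in the reply above.
DEFAULT_READ_THRESHOLD = 0.033379376


def mod_ratio_per_site(rows, threshold=DEFAULT_READ_THRESHOLD):
    """Return {(transcript_id, position): fraction of reads above threshold}.

    `rows` is an iterable of dicts; the column names used here are assumed.
    """
    above = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        site = (row["transcript_id"], int(row["transcript_position"]))
        total[site] += 1
        # Strict comparison: a read counts as modified only if p > threshold.
        if float(row["probability_modified"]) > threshold:
            above[site] += 1
    return {site: above[site] / total[site] for site in total}


# Made-up read-level rows standing in for data.indiv_proba.csv.
example = io.StringIO(
    "transcript_id,transcript_position,read_index,probability_modified\n"
    "tx1,100,0,0.90\n"
    "tx1,100,1,0.01\n"
    "tx1,100,2,0.50\n"
    "tx1,200,0,0.02\n"
)
ratios = mod_ratio_per_site(csv.DictReader(example))
```

In this toy input, two of the three reads at position 100 exceed the threshold, and the single read at position 200 does not. To run it on real output, pass `csv.DictReader(open(path))` instead of the in-memory example, after confirming the header of your file matches the assumed column names.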
