data.indiv_proba.csv and probability threshold #174

sashajenner · 2024-10-11T03:48:13Z

First of all, is this a mistake in the README.md?
"The output file data.indiv_proba.csv contains the probability of modification for each read"
Should this instead be "for each position"?
Similarly, should this
"probability_modified: The probability that a given read is modified"
instead be
"probability_modified: The probability that a given position is modified"

Next, what is the modified probability threshold value for RNA004?

Finally, what is the thresholding algorithm? I imagine it is something like:
If p > thresh: modified
Else: not modified

yuukiiwa · 2024-10-21T01:06:58Z

Hi @sashajenner,

To clarify, data.indiv_proba.csv contains the read-level probability of modification at each position for each read (so you should see multiple reads for each position), while data.site_proba.csv aggregates all the read-level probabilities to produce a site-level probability.

The mod_ratio in data.site_proba.csv is defined, for each site, as the proportion of reads whose read-level probability is above DEFAULT_READ_THRESHOLD = 0.033379376.
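
As a rough illustration of that definition, here is a minimal sketch in plain Python. It is not the tool's actual code: the function name `mod_ratio_per_site`, the `(transcript_id, position, probability_modified)` row layout, and the sample rows are all hypothetical, and treating "above" as a strict `>` is an assumption.

```python
from collections import defaultdict

# Threshold value quoted in this reply.
DEFAULT_READ_THRESHOLD = 0.033379376


def mod_ratio_per_site(read_rows, threshold=DEFAULT_READ_THRESHOLD):
    """Return {(transcript_id, position): fraction of reads above threshold}.

    Hypothetical helper: each row is (transcript_id, position,
    probability_modified), i.e. one read-level probability at one site.
    """
    total = defaultdict(int)  # reads seen per site
    above = defaultdict(int)  # reads with probability > threshold per site
    for transcript_id, position, probability_modified in read_rows:
        site = (transcript_id, position)
        total[site] += 1
        # Assumption: "above" means strictly greater than the threshold.
        if probability_modified > threshold:
            above[site] += 1
    return {site: above[site] / total[site] for site in total}


# Made-up read-level rows, for illustration only.
rows = [
    ("tx1", 100, 0.90),
    ("tx1", 100, 0.01),
    ("tx1", 100, 0.05),
    ("tx1", 100, 0.02),
    ("tx2", 7, 0.001),
]
ratios = mod_ratio_per_site(rows)
```

With these made-up rows, two of the four reads at `("tx1", 100)` exceed the threshold, so its ratio is 0.5, and the single read at `("tx2", 7)` does not, so its ratio is 0.0.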

Thanks!

Best wishes,
Yuk Kei
