Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the way to exclude regions in hmmratac and fix the incorrect keep-duplicate option #689

Merged
merged 1 commit into from
Feb 13, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 23 additions & 8 deletions ChangeLog
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
2024-11-30 Tao Liu <vladimir.liu@gmail.com>
MACS 3.0.3b1
2025-02-12 Tao Liu <vladimir.liu@gmail.com>
MACS 3.0.3

* Features added

Expand All @@ -19,20 +19,35 @@
source codes can be more compatible to Python programming tools
such as `flake8`. During rewritting, we cleaned the source codes
even more, and removed unnecessary dependencies during
compilation.
compilation. We will continue to do more code cleaning in the
future.

3) We changed the behavior on the usage of 'blacklist' regions in
`hmmratac`. We will remove the aligned fragments located in the
'blacklist' regions before the EM step to estimate fragment
lengths distributions and HMM step to learn and predict nucleosome
states. The reason is discussed in #680. To implement this
feature, we added the `exclude` functions to PETrackI and
PETrackII.

* Bug fixed

1) Fix issues in big-endian system in `Parser.py` codes. Enable
big-endian support in `BAM.py` codes for accessig certain
alignment records that overlap with givin genomic
coordinates using BAM/BAI files.
1) `hmmratac` option `--keep-duplicate` had opposite effect
previously as indicated by the name and description. It has been
renamed as `--remove-dup` to reflect the actual
behavior. `hmmratac` will not remove duplicated fragments unless
this option is set.

2) `hmmratac`: wrong class name used while saving digested signals
in BedGraph files. Multiple other issues related to output
filenames. #682

3) `predictd` and `filterdup`: wrong variable name used while
3) Fix issues in big-endian system in `Parser.py` codes. Enable
big-endian support in `BAM.py` codes for accessig certain
alignment records that overlap with givin genomic
coordinates using BAM/BAI files.

4) `predictd` and `filterdup`: wrong variable name used while
reading multiple pe/frag files.

* Doc
Expand Down
29 changes: 18 additions & 11 deletions MACS3/Commands/hmmratac_cmd.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Time-stamp: <2025-02-05 12:38:55 Tao Liu>
# Time-stamp: <2025-02-12 14:30:02 Tao Liu>

"""Description: Main HMMR command

Expand Down Expand Up @@ -118,9 +118,17 @@ def run(args):
# remember to finalize the petrack
petrack.finalize()

options.info(f"# Read {petrack.total} fragments.")

# filter duplicates if needed
if options.misc_keep_duplicates:
if options.misc_remove_duplicates:
options.info("# Removing duplicated fragments...")
# by default, we keep all duplicates
before_total = petrack.total
petrack.filter_dup(maxnum=1)
removed_n = before_total - petrack.total
options.info(f"# We removed {removed_n} duplicated fragments.")
options.info(f"# There are {petrack.total} fragments left.")

# read in blacklisted if option entered
if options.blacklist:
Expand All @@ -139,6 +147,14 @@ def run(args):
blacklist_regions = Regions()
blacklist_regions.init_from_PeakIO(blacklist)

# remove fragments aligned in the blacklisted regions

before_total = petrack.total
petrack.exclude(blacklist_regions)
removed_n = before_total - petrack.total
options.info(f"# We removed {removed_n} fragments overlapping with blacklisted regions.")
options.info(f"# There are {petrack.total} fragments left.")

#############################################
# 2. EM
#############################################
Expand Down Expand Up @@ -287,10 +303,6 @@ def run(args):
training_regions.expand(options.hmm_training_flanking)
training_regions.merge_overlap()

# remove peaks overlapping with blacklisted regions
if options.blacklist:
training_regions.exclude(blacklist_regions)
options.info(f"# after removing those overlapping with provided blacklisted regions, we have {training_regions.total} left")
if options.save_train:
fhd = open(training_region_bedfile, "w")
training_regions.write_to_bed(fhd)
Expand Down Expand Up @@ -409,11 +421,6 @@ def run(args):
candidate_regions.merge_overlap()
options.info(f"# after expanding and merging, we have {candidate_regions.total} candidate regions")

# remove peaks overlapping with blacklisted regions
if options.blacklist:
candidate_regions.exclude(blacklist_regions)
options.info(f"# after removing those overlapping with provided blacklisted regions, we have {candidate_regions.total} left")

# extract signals
options.info("# Extract signals in candidate regions and decode with HMM")
# we will do the extraction and prediction in a step of 10000 regions by default
Expand Down
Loading
Loading