Releases: gbouras13/hybracter
v0.11.2
v0.11.1
v0.11.0
- Replaces kmc with lrge when using `--auto`. lrge is a much faster tool designed specifically for estimating genome size from long reads, and it is robust. Thanks @mbhall88!
  - If your input has more than 10000 long reads (it probably should!), lrge will run with default settings. If it has fewer, lrge will run a (slightly) more computationally expensive all-vs-all mode with all input reads. In practice, if you have low read counts, you should probably treat downstream analysis (including lrge and hybracter) with caution anyway.
  - According to the preprint (and my less exhaustive testing), lrge is more accurate and much faster than kmc, but I would still be careful using it on data with quality lower than Q15.
  - Nothing else changes - the estimated chromosome size used by Hybracter will still be 80% of the estimate, as it needs to account for plasmids.
- Adds `r1041_e82_400bps_bacterial_methylation` as an option for `--medakaModel` thanks to this issue.
  - Note this won't work if you run `hybracter` on a Mac (as medaka v2 is not available).
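
The read-count rule and the 80% adjustment described for v0.11.0 can be sketched in Python. This is only an illustration, not Hybracter's actual code: the function names and the mode labels are hypothetical, and treating exactly 10000 reads as the all-vs-all case is an assumption. Only the 10000-read threshold and the 80% factor come from the notes above.

```python
# Hypothetical sketch of the v0.11.0 --auto logic described above.
READ_COUNT_THRESHOLD = 10000  # from the release notes
CHROMOSOME_FRACTION = 0.8     # 80% of the estimate, leaving room for plasmids


def choose_lrge_mode(n_reads: int) -> str:
    """Pick the lrge mode from the number of input long reads.

    Assumption: exactly 10000 reads falls into the all-vs-all case.
    """
    if n_reads > READ_COUNT_THRESHOLD:
        return "default"
    return "all-vs-all"


def chromosome_size_from_estimate(genome_size_estimate: int) -> int:
    """Scale a whole-genome size estimate down to a chromosome size."""
    return round(genome_size_estimate * CHROMOSOME_FRACTION)
```

For example, a sample with 50000 reads would use the default mode, and a 5 Mbp genome size estimate would give a 4 Mbp estimated chromosome size.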
v0.10.1
- Adds retry functionality for dnaapler with 1 thread if there is an error - for some genomes on some systems, it was observed that using the default resources (8 threads, 16GB RAM) will lead to an error in dnaapler.
- Thanks @richardstoeckl for implementing this
- Thanks to some feedback from @oschwengers, modify documentation to warn users that for low quality long read sets (e.g. R9 FAST/HAC or sub Q15 reads), `--auto` is not recommended. It can tend to overestimate the chromosome size as more erroneous 21-mers will be counted by kmc than expected. Please specify a chromosome size for this type of data going forward.
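
The "retry with 1 thread" idea from the dnaapler change above can be sketched generically in Python. This is a minimal sketch of the pattern, not how Hybracter implements it: `run_with_thread_fallback` and `flaky_step` are hypothetical names, and the failing step is simulated rather than a real dnaapler call.

```python
from typing import Callable


def run_with_thread_fallback(step: Callable[[int], str], threads: int = 8) -> str:
    """Run `step` with `threads`; if it fails, retry once with a single thread."""
    try:
        return step(threads)
    except RuntimeError:
        if threads == 1:
            # Already single-threaded, so there is nothing left to fall back to.
            raise
        return step(1)


def flaky_step(threads: int) -> str:
    """Simulated step that only succeeds when run with one thread."""
    if threads > 1:
        raise RuntimeError("simulated multi-thread failure")
    return f"ok with {threads} thread"


result = run_with_thread_fallback(flaky_step, threads=8)
```

Here the first attempt with 8 threads raises, and the fallback call with 1 thread returns normally.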
v0.10.0
- Updates Medaka to v2.0.1, implementing the `--bacteria` option by default.
  - This is based on the recommendations of Ryan Wick here, who found it improved assemblies due to (likely) enhanced methylation error correction.
  - If you still want to specify a Medaka model, the flag `--medaka_override` has been added. You need to include this along with your model via `--medakaModel`. This is most likely useful for older R9 data.
- Adds `--extra_params_flye` parameter if you want to specify extra commands for the Flye assembly step, thanks @pdobbler #101
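
The interaction between the `--bacteria` default, `--medaka_override` and `--medakaModel` described above can be sketched as a small decision function. This is a hedged illustration only: `medaka_model_args` is a hypothetical helper, and the way Hybracter actually builds its Medaka command is not shown in these notes. The sketch assumes Medaka's `-m` option for selecting a model; `r941_min_sup_g507` is used purely as an example of an older R9 model name.

```python
from typing import List, Optional


def medaka_model_args(
    medaka_model: Optional[str] = None, medaka_override: bool = False
) -> List[str]:
    """Return the model-related arguments for a Medaka call (hypothetical helper)."""
    if medaka_override:
        if medaka_model is None:
            # The notes say the override must be given along with a model.
            raise ValueError("--medaka_override requires a model via --medakaModel")
        return ["-m", medaka_model]
    # Without the override, the v0.10.0 default is the --bacteria option.
    return ["--bacteria"]


default_args = medaka_model_args()
override_args = medaka_model_args("r941_min_sup_g507", medaka_override=True)
```

In this sketch a model passed without the override is ignored, mirroring the note that both flags need to be given together.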
v0.9.1
- Small change to the `plassembler.yaml` config and plassembler rules preventing installation bugs
- Unicycler v0.5.1 is now installed in a much simpler fashion via Bioconda, thanks @npbhavya. Installation should be a lot less fragile now
  - The crappy workaround was needed because the Unicycler conda package for MacOS was not built (or was broken) for v0.5.0 - thanks @mencian @tcezard for fixing v0.5.1 bioconda/bioconda-recipes#49602
v0.9.0
`--auto` for automatic estimation of chromosome size
- Thanks to an issue and code from @richardstoeckl, Hybracter can now estimate the chromosome size for each sample if you pass `--auto`.
- The implementation uses kmc. Specifically, Hybracter uses kmc to count the number of unique 21-mers that appear at least 10 times in your long-read FASTQ file. This is because, for a given assembly of length L and a k-mer size of k, the total number of possible unique k-mers is (L - k) + 1, and if L >> k, this count suffices as an estimate of total assembly size.
- The estimated chromosome size used by Hybracter will actually be 80% of the number of 21-mers found at least 10 times, as it needs to account for plasmids.
- If you aren't sure whether you have enough data for assembly (i.e. coverage lower than 20x), be careful using `--auto`, because the actual assembly size will tend to be larger than the number of unique 21-mers found at least 10 times. Therefore, the estimated chromosome size will almost certainly be an underestimate and may lead to Hybracter considering your assembly "complete" when in fact it isn't.
- If you use `--auto`, you do not need to specify the chromosome length in the input. This means you don't need `-c` with `long-single` or `hybrid-single`, and in the input csv sample sheet, you do not need a column with chromosome length.
e.g. for `hybracter long` you only need 2 columns with sample name and long-read FASTQ file path:
s_aureus_sample1,sample1_long_read.fastq.gz
p_aeruginosa_sample2,sample2_long_read.fastq.gz
and for `hybracter hybrid` you only need 4 columns with sample name, long-read FASTQ, and R1 and R2 short-read FASTQ file paths:
s_aureus_sample1,sample1_long_read.fastq.gz,sample1_SR_R1.fastq.gz,sample1_SR_R2.fastq.gz
p_aeruginosa_sample2,sample2_long_read.fastq.gz,sample2_SR_R1.fastq.gz,sample2_SR_R2.fastq.gz
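
The k-mer counting idea behind `--auto` in v0.9.0 can be illustrated with a small pure-Python sketch. It is not the kmc implementation and is far too slow for real data; the function names are hypothetical, and unlike a real k-mer counter it does not merge a k-mer with its reverse complement. It only shows why counting the unique 21-mers seen at least 10 times approximates (L - k) + 1, and how the 80% factor is then applied.

```python
import random
from collections import Counter
from typing import Iterable


def count_solid_kmers(reads: Iterable[str], k: int = 21, min_count: int = 10) -> int:
    """Count distinct k-mers that occur at least `min_count` times across reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return sum(1 for c in counts.values() if c >= min_count)


def estimated_chromosome_size(solid_kmers: int, fraction: float = 0.8) -> int:
    """Apply the 80% factor that leaves room for plasmids."""
    return round(solid_kmers * fraction)


# Toy data: a random 500 bp "genome" covered 10 times by error-free full-length reads.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(500))
reads = [genome] * 10

solid = count_solid_kmers(reads)
chromosome_size = estimated_chromosome_size(solid)
```

With only 9 copies of the toy genome, no 21-mer reaches the threshold of 10 and the estimate drops to zero, which is the low-coverage underestimate the notes warn about.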
Other changes
- Hybracter v0.9.0 will automatically support the reorientation of archaeal chromosomes (thanks @richardstoeckl) to begin with the cog1474 Orc1/cdc6 gene.
- `--datadir` can now also accept 2 paths separated by a comma, if you have long reads and short reads in separate directories e.g. `--datadir "long_read_dir,short_read_dir"` (#76).
- `--min_depth` parameter added. Hybracter will error out if your QC'd long reads have a coverage lower than `min_depth` for a sample (#89).
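
The two-path `--datadir` value and the `--min_depth` check can be sketched as follows. This is an illustration under stated assumptions, not Hybracter's code: `split_datadir` and `check_min_depth` are hypothetical helpers, and coverage is assumed here to be total QC'd long-read bases divided by chromosome size, which the notes do not spell out.

```python
from typing import Tuple


def split_datadir(datadir: str) -> Tuple[str, str]:
    """Return (long_read_dir, short_read_dir) from a --datadir value."""
    parts = [p.strip() for p in datadir.split(",")]
    if len(parts) == 1:
        # A single path is used for both long and short reads.
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError("--datadir accepts one path or two comma-separated paths")


def check_min_depth(total_bases: int, chromosome_size: int, min_depth: float) -> float:
    """Return the estimated depth, or raise if it is below `min_depth`.

    Assumption: depth = total QC'd long-read bases / chromosome size.
    """
    depth = total_bases / chromosome_size
    if depth < min_depth:
        raise ValueError(f"depth {depth:.1f}x is below min_depth {min_depth}x")
    return depth


long_dir, short_dir = split_datadir("long_read_dir,short_read_dir")
depth = check_min_depth(total_bases=100_000_000, chromosome_size=5_000_000, min_depth=10)
```

In this toy example, 100 Mbp of reads over a 5 Mbp chromosome gives 20x depth, so a `min_depth` of 10 passes while a `min_depth` of 30 would raise.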
v0.8.0
- Adds `--datadir` that removes the need to add full paths in sample sheet (thanks @oschwengers)
- Update medaka to v1.12.1 to support the newest models (#84)
  - New default medaka model is `r1041_e82_400bps_sup_v5.0.0`
- Adds `--mac` flag if you are running Hybracter on MacOS - it is now recommended to run Hybracter on Linux if you want the latest Medaka models.
  - This is because ONT do not support bioconda install anymore and the latest version (v1.12.1) from pip doesn't work on Mac
  - `--mac` will install and run Medaka v1.8.0 as in previous versions and use `r1041_e82_400bps_sup_v4.2.0` as default
0.7.3
- Enforce spades>=v3.15.2 in the `plassembler.yaml` environment
  - For some reason, the environment on Linux was being solved for v3.14.1, which was causing an error with Unicycler within Plassembler for some samples (rrwick/Unicycler#318)