Fixes...

pfeiferd · Feb 3, 2025 · 63da0b1
1 parent a635152

Showing 4 changed files with 10 additions and 14 deletions.
diff --git a/README.md b/README.md
@@ -169,9 +169,8 @@ The meaning of the columns is as follows:
 | `normalized kmers`      | *k*-mer counts from column `kmers` but normalized with respect to the total number of *k*-mers per fastq file and the number of specific *k*-mers for the tax id in the database. The value allows for a less biased comparison of *k*-mer counts across fastq files and across species. It is computed as `normalizedKMersFactor` * `kmers` */ k<sub>f</sub> * u / u<sub>t</sub>*, where *k<sub>f</sub>* is the total number of *k*-mers in the fastq file, *u* is the total number of *k*-mers in the database and *u<sub>t</sub>* is the number of specific *k*-mers for the tax id in the database. `normalizedKMersFactor` is a configuration property; its default is 1000000000 (see also Section [Configuration parameters](#configuration-parameters)). |
 | `exp. unique kmers`      | The number of expected unique *k*-mers, which is *u<sub>t</sub> * (1 - (1 - 1/u<sub>t</sub>)*<sup>`kmers`</sup>), where *u<sub>t</sub>* is the number of specific *k*-mers for the tax id in the database.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
 | `unique kmers / exp.`      | The ratio `unique kmers` / `exp. unique kmers` for the tax id. This should be close to 1 for a consistent match of *k*-mers. ([This paper](https://arxiv.org/pdf/1602.05822.pdf) discusses the corresponding background distribution (of `unique kmers`).)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
-| `quality prediction`      | Computed as  `normalized kmers` * `unique kmers / exp.`. It combines the normalized counts of *k*-mers with the valued consistency between *k*-mers and unique *k*-mers.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
-| `max kmer counts` | The frequencies of the most frequent unique *k*-mers which are specific to the tax id's genome in descending order separated by `;`. This column is experimental and only added when the configuration property `matchWithKMerCounts` is set to `true`. The number of frequencies is determined via `maxKMerResCounts` (see also Section [Configuration parameters](#configuration-parameters)).                                                                                                                                                                                                                                                                                                                                                                 |
 | `reads >= 1 kmer` | Reads with at least one *k*-mer of the respective tax id. |
+| `max kmer counts` | The frequencies of the most frequent unique *k*-mers which are specific to the tax id's genome in descending order separated by `;`. This column is experimental and only added when the configuration property `matchWithKMerCounts` is set to `true`. The number of frequencies is determined via `maxKMerResCounts` (see also Section [Configuration parameters](#configuration-parameters)).                                                                                                                                                                                                                                                                                                                                                                 |
 The frequencies from `max kmer counts` can be used to build frequency graphs for *k*-mers as shown below. The frequency graphs help to further assess the validity of analysis results.
 
 <p align="center">

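The README formulas for `normalized kmers`, `exp. unique kmers` and `unique kmers / exp.` can be checked with a small calculation. The following is a minimal sketch, not Genestrip's implementation: the class and method names are hypothetical, and `normalizedKMersFactor` is fixed to its documented default of 1000000000. It evaluates *(1 - 1/u<sub>t</sub>)*<sup>`kmers`</sup> via `Math.log1p` and `Math.expm1` to avoid precision loss when *u<sub>t</sub>* is large.

```java
/** Hypothetical helper reproducing the README formulas; not part of Genestrip. */
public class KMerStatsSketch {
    // Assumed to equal the documented default of normalizedKMersFactor.
    static final double NORMALIZED_KMERS_FACTOR = 1_000_000_000d;

    /**
     * normalizedKMersFactor * kmers / k_f * u / u_t
     * kmers: k-mer count for the tax id, kf: total k-mers in the fastq file,
     * u: total k-mers in the database, ut: specific k-mers for the tax id in the database.
     */
    static double normalizedKMers(long kmers, long kf, long u, long ut) {
        return NORMALIZED_KMERS_FACTOR * kmers / kf * u / ut;
    }

    /** u_t * (1 - (1 - 1/u_t)^kmers), computed in a numerically stable way. */
    static double expectedUniqueKMers(long kmers, long ut) {
        if (kmers == 0) {
            return 0; // no k-mers observed, so no unique k-mers are expected
        }
        // (1 - 1/u_t)^kmers = exp(kmers * log(1 - 1/u_t)); expm1 keeps precision near 0.
        return -ut * Math.expm1(kmers * Math.log1p(-1.0 / ut));
    }

    /** The ratio `unique kmers` / `exp. unique kmers`; close to 1 for a consistent match. */
    static double uniqueOverExpected(long uniqueKMers, long kmers, long ut) {
        return uniqueKMers / expectedUniqueKMers(kmers, ut);
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not taken from a real run.
        long kmers = 1_000, ut = 1_000, kf = 10_000_000, u = 500_000_000;
        long unique = 630;
        System.out.println("normalized kmers: " + normalizedKMers(kmers, kf, u, ut));
        System.out.println("exp. unique kmers: " + expectedUniqueKMers(kmers, ut));
        System.out.println("unique kmers / exp.: " + uniqueOverExpected(unique, kmers, ut));
    }
}
```

With *u<sub>t</sub>* = 1000 and 1000 counted *k*-mers, about 632 unique *k*-mers are expected, so 630 observed unique *k*-mers give a ratio close to 1.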
diff --git a/src/main/java/org/metagene/genestrip/GSConfigKey.java b/src/main/java/org/metagene/genestrip/GSConfigKey.java
@@ -146,12 +146,13 @@ public enum GSConfigKey implements ConfigKey {
 			+ "Using the bloom filter tends to shorten matching time, if the most part of the reads cannot be classified because they contain *no* *k*-mers from the database. "
 			+ "Otherwise, using the bloom filter might increase matching time by up to 30%. It also requires more main memory.")
 	USE_BLOOM_FILTER_FOR_MATCH("useBloomFilterForMatch", new BooleanConfigParamInfo(true), GSGoalKey.MATCH, GSGoalKey.MATCHLR),
-	@MDDescription("The absolute or relative maximum number of *k*-mers that do not have to be in the database for a read to be classified. "
+	@MDDescription("The absolute or relative maximum number of *k*-mers that do not need to be in the database for a read to be classified (read error count). "
 			+ "If the number is above `maxReadTaxErrorCount`, then the read will not be classified. "
 			+ "Otherwise the read will be classified in the same way as [done by Kraken](https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-3-r46/figures/1). "
 			+ "If `maxReadTaxErrorCount` is >= 1, then it is interpreted as an absolute number of *k*-mers. "
-			+ "Otherwise (and so, if >= 0 and < 1), it is interpreted as the ratio between the *k*-mers not in the database and all *k*-mers of the read.")
-	MAX_READ_TAX_ERROR_COUNT("maxReadTaxErrorCount", new DoubleConfigParamInfo(0, Double.MAX_VALUE, 0.5),
+			+ "Otherwise (and so, if >= 0 and < 1), it is interpreted as the ratio between the *k*-mers not in the database and all *k*-mers of the read. "
+			+ "If `maxReadTaxErrorCount` < 0, then the read error count is disregarded, which means that even a single matching *k*-mer will lead to the read's classification.")
+	MAX_READ_TAX_ERROR_COUNT("maxReadTaxErrorCount", new DoubleConfigParamInfo(-1, Double.MAX_VALUE, -1),
 			GSGoalKey.MATCH, GSGoalKey.MATCHLR),
 	@MDDescription("If > 0, the corresponding number of frequencies of the most frequent *k*-mers per tax id will be reported.")
 	MAX_KMER_RES_COUNTS("maxKMerResCounts", new IntConfigParamInfo(0, 65536, 0), GSGoalKey.MATCH, GSGoalKey.MATCHLR),
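The updated description of `maxReadTaxErrorCount` defines three regimes: negative (disregarded), at least 1 (absolute count) and between 0 and 1 (ratio). Below is a minimal sketch of that threshold test, assuming a read is rejected when its number of *k*-mers not in the database is strictly above the limit. The helper name is hypothetical, and the Kraken-style classification applied to reads that pass the test (including the requirement of at least one matching *k*-mer) is not modeled.

```java
/** Hypothetical sketch of the maxReadTaxErrorCount rule; not Genestrip's code. */
public class ReadErrorCountSketch {
    /**
     * Returns true if the read must NOT be classified because too many of its
     * k-mers are missing from the database.
     *
     * @param kmersNotInDb         k-mers of the read that are not in the database
     * @param totalKMers           all k-mers of the read
     * @param maxReadTaxErrorCount value of the configuration property
     */
    static boolean exceedsReadErrorCount(int kmersNotInDb, int totalKMers, double maxReadTaxErrorCount) {
        if (maxReadTaxErrorCount < 0) {
            // Negative (the new default -1): the read error count is disregarded.
            return false;
        }
        if (maxReadTaxErrorCount >= 1) {
            // Interpreted as an absolute number of k-mers.
            return kmersNotInDb > maxReadTaxErrorCount;
        }
        // 0 <= value < 1: interpreted as the ratio of missing k-mers to all k-mers of the read.
        if (totalKMers == 0) {
            return false; // no k-mers at all, so there is no ratio to compare
        }
        return (double) kmersNotInDb / totalKMers > maxReadTaxErrorCount;
    }

    public static void main(String[] args) {
        // A read with 100 k-mers, 40 of which are not in the database.
        System.out.println(exceedsReadErrorCount(40, 100, -1));  // false: disregarded
        System.out.println(exceedsReadErrorCount(40, 100, 30));  // true: 40 > 30
        System.out.println(exceedsReadErrorCount(40, 100, 0.5)); // false: 0.4 is not above 0.5
    }
}
```

This also explains why the lower bound in `DoubleConfigParamInfo` moves from 0 to -1 in the same commit: once the range check in `ConfigParamInfo` is fixed, a lower bound of 0 would reject the new default of -1.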

diff --git a/src/main/java/org/metagene/genestrip/make/ConfigParamInfo.java b/src/main/java/org/metagene/genestrip/make/ConfigParamInfo.java
@@ -87,7 +87,7 @@ protected Integer fromString(String s) {
 		public boolean isValueInRange(Object value) {
 			if (value instanceof Integer) {
 				Integer i = (Integer) value;
-				return i >= min || i <= max;
+				return i >= min && i <= max;
 			}
 			return false;
 		}
@@ -126,7 +126,7 @@ protected Long fromString(String s) {
 		public boolean isValueInRange(Object value) {
 			if (value instanceof Long) {
 				Long i = (Long) value;
-				return i >= min || i <= max;
+				return i >= min && i <= max;
 			}
 			return false;
 		}
@@ -165,7 +165,7 @@ protected Double fromString(String s) {
 		public boolean isValueInRange(Object value) {
 			if (value instanceof Double) {
 				Double d = (Double) value;
-				return d >= min || d <= max;
+				return d >= min && d <= max;
 			}
 			return false;
 		}
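The switch from `||` to `&&` matters because, whenever `min <= max`, every value satisfies at least one of `i >= min` and `i <= max`, so the old check accepted any value. The following self-contained sketch compares both versions for `int`, using the bounds 0 and 65536 from the `maxKMerResCounts` declaration (assuming the first two constructor arguments of `IntConfigParamInfo` are the minimum and maximum). The class and method names are hypothetical.

```java
/** Compares the old and the fixed range check from ConfigParamInfo (sketch). */
public class RangeCheckSketch {
    // Old logic: true whenever min <= max, regardless of the value.
    static boolean oldIsValueInRange(int value, int min, int max) {
        return value >= min || value <= max;
    }

    // Fixed logic: the value must lie within [min, max].
    static boolean fixedIsValueInRange(int value, int min, int max) {
        return value >= min && value <= max;
    }

    public static void main(String[] args) {
        int min = 0, max = 65536;
        int[] samples = {-1, 0, 42, 65536, 70000};
        for (int v : samples) {
            System.out.printf("%6d old=%b fixed=%b%n", v,
                    oldIsValueInRange(v, min, max), fixedIsValueInRange(v, min, max));
        }
    }
}
```

With the old check every sample is reported as in range; with the fixed check, -1 and 70000 are rejected. The `Long` and `Double` variants behave the same way.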

diff --git a/src/main/java/org/metagene/genestrip/match/ResultReporter.java b/src/main/java/org/metagene/genestrip/match/ResultReporter.java
@@ -129,7 +129,7 @@ public void printMatchResult(MatchingResult res, PrintStream out, Database wrapp
 		out.print(
 				"name;rank;taxid;reads;kmers from reads;kmers;unique kmers;contigs;average contig length;max contig length;max contig desc.;");
 		if (estimator != null) {
-			out.print("db coverage;normalized kmers;exp. unique kmers;unique kmers / exp.;quality prediction;");
+			out.print("db coverage;normalized kmers;exp. unique kmers;unique kmers / exp.;");
 		}
 		out.print("normalized reads; reads >= 1 kmer; normalized reads >= 1 kmer; reads >= 1 kmer bps; avg read >= 1 kmer len; normalized reads * avg len;");
 		if (res.isWithMaxKMerCounts()) {
@@ -204,11 +204,7 @@ public void printMatchResult(MatchingResult res, PrintStream out, Database wrapp
 						out.print(DF.format(expUnique));
 						out.print(';');
 
-						double cScore1 = stats.getUniqueKMers() / expUnique;
-						out.print(DF.format(cScore1));
-						out.print(';');
-
-						out.print(DF.format(normalizedKMers * cScore1));
+						out.print(DF.format(stats.getUniqueKMers() / expUnique));
 						out.print(';');
 
 						/*