DOC: expand readme to include result of app_help()
GavinHuttley committed Aug 25, 2024
1 parent 1335266 commit d0c2e0d
Showing 1 changed file with 184 additions and 9 deletions.
193 changes: 184 additions & 9 deletions README.md
@@ -38,7 +38,7 @@ Commands:

### `dvgt prep`: Preparing the sequence data

The sequences need to be processed before running the `max` command. This is done with the `prep` command.
Convert sequence data into a more efficient format for the diversity assessment. This must be done before running either the `nmost` or `max` commands.

#### Usage:

@@ -62,7 +62,7 @@ Options:
-s, --seqdir PATH directory containing sequence files [required]
-sf, --suffix TEXT sequence file suffix [default: fa]
-o, --outpath PATH location to write processed seqs [required]
-p, --parallel run in parallel
-np, --numprocs INTEGER number of processes [default: 1]
-F, --force_overwrite Overwrite existing file if it exists
-m, --moltype [dna|rna] Molecular type of sequences, defaults to DNA
[default: dna]
@@ -72,11 +72,120 @@ Options:
```
<!-- [[[end]]] -->

### `dvgt nmost`: Select the n-most divergent sequences

We recommend using `nmost` for large datasets.

> **Note**
> A fuller explanation is coming soon!
#### Command line usage:

<!-- [[[cog
import cog
from divergent.cli import main
from click.testing import CliRunner
runner = CliRunner()
result = runner.invoke(main, ["nmost", "--help"])
help = result.output.replace("Usage: main", "Usage: dvgt")
cog.out(
"```\n{}\n```".format(help)
)
]]] -->
```
Usage: dvgt nmost [OPTIONS]
Identify n seqs that maximise average delta JSD
Options:
-s, --seqfile PATH path to .dvtgseqs file [required]
-o, --outpath PATH the input string will be cast to Path instance
-n, --number INTEGER number of seqs in divergent set [required]
-k INTEGER k-mer size [default: 6]
-i, --include TEXT seqnames to include in divergent set
-np, --numprocs INTEGER number of processes [default: 1]
-L, --limit INTEGER number of sequences to process
-v, --verbose is an integer indicating number of cl occurrences
[default: 0]
--help Show this message and exit.
```
<!-- [[[end]]] -->

#### As a cogent3 plugin:

The `nmost` command is also available as a [cogent3 app](https://cogent3.org/doc/app/index.html), `dvgt_select_nmost`. The output of `cogent3.app_help("dvgt_select_nmost")` is shown below.

<!-- [[[cog
import cog
import contextlib
import io
from cogent3 import app_help
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
app_help("dvgt_select_nmost")
cog.out(
"```\n{}\n```".format(buffer.getvalue())
)
]]] -->
```
Overview
--------
selects the n-most divergent seqs from a sequence collection
Options for making the app
--------------------------
dvgt_select_nmost_app = get_app(
'dvgt_select_nmost',
n=3,
moltype='dna',
include=None,
k=6,
seed=None,
)
Parameters
----------
n
the number of divergent sequences
moltype
molecular type of the sequences
k
k-mer size
include
sequence names to include in the final result
seed
random number seed
Notes
-----
If called with an alignment, the ungapped sequences are used.
The order of the sequences is randomised. If include is not None, the
named sequences are added to the final result.
Input type
----------
Alignment, SequenceCollection, ArrayAlignment
Output type
-----------
Alignment, SequenceCollection, ArrayAlignment
```
<!-- [[[end]]] -->
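
As a rough illustration of the plugin usage, the sketch below wraps the app in a small helper. It assumes `cogent3` and this package are both installed; the helper name `select_nmost` and its input dictionary are hypothetical, while the `get_app` keyword arguments are taken from the `app_help()` output above.

```python
def select_nmost(seqs_by_name, n=3, k=6, seed=None):
    """Return the names of the sequences chosen by the dvgt_select_nmost app.

    Hypothetical helper: assumes cogent3 and the plugin that registers
    'dvgt_select_nmost' are installed.
    """
    # imported lazily so the helper can be defined without the dependencies
    from cogent3 import get_app, make_unaligned_seqs
    from cogent3.app.composable import NotCompleted

    # build an unaligned sequence collection from a {name: sequence} dict
    seqs = make_unaligned_seqs(seqs_by_name, moltype="dna")

    # keyword arguments mirror those listed by app_help()
    app = get_app("dvgt_select_nmost", n=n, moltype="dna", k=k, seed=seed)
    result = app(seqs)

    # cogent3 apps return a NotCompleted object instead of raising on failure
    if isinstance(result, NotCompleted):
        raise RuntimeError(f"selection failed: {result}")
    return list(result.names)
```

Calling `select_nmost(my_seqs, n=5, seed=1)` with a dictionary of sequence name to DNA string should return five names. Setting `seed` should make the randomised sequence order reproducible.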

### `dvgt max`: Maximise average delta JSD

Once the sequence data has been prepared using `dvgt prep`, the `max` command can be used to identify the sequences that maximise the Jensen-Shannon divergence. The k-mer frequencies of the sequences are used to determine the Jensen-Shannon divergence.
The result of the `max` command is typically a set that is modestly more divergent than that from `nmost`.
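
To make the role of k-mer frequencies concrete, here is a minimal sketch of the standard, equally weighted Jensen-Shannon divergence computed from overlapping k-mer frequencies. It is not the package's implementation: the function names are hypothetical, and the weighting, log base and the "delta JSD" statistic that `dvgt` actually uses are not shown here.

```python
from collections import Counter
from math import log2


def kmer_freqs(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of each overlapping k-mer in seq."""
    counts = Counter(seq[i : i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def entropy(freqs: dict[str, float]) -> float:
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in freqs.values() if p > 0)


def jsd(freq_sets: list[dict[str, float]]) -> float:
    """Entropy of the mean distribution minus the mean of the entropies."""
    n = len(freq_sets)
    mean: dict[str, float] = {}
    for freqs in freq_sets:
        for kmer, p in freqs.items():
            mean[kmer] = mean.get(kmer, 0.0) + p / n
    return entropy(mean) - sum(entropy(f) for f in freq_sets) / n


# sequences sharing no k-mers are maximally divergent (1 bit for two sequences)
divergent = jsd([kmer_freqs("AAAA", 2), kmer_freqs("CCCC", 2)])
# identical sequences have zero divergence
identical = jsd([kmer_freqs("ACGTACGT", 2), kmer_freqs("ACGTACGT", 2)])
```

A set of sequences with very different k-mer compositions gives a larger value, which is the quantity a divergent-set search tries to push up.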

#### Usage:
> **Note**
> A fuller explanation is coming soon!
#### Command line usage:

<!-- [[[cog
import cog
@@ -99,7 +208,9 @@ Options:
-o, --outpath PATH the input string will be cast to Path instance
-z, --min_size INTEGER minimum size of divergent set [default: 7]
-zp, --max_size INTEGER maximum size of divergent set
-k INTEGER k-mer size [default: 3]
-k INTEGER k-mer size [default: 6]
-st, --stat [stdev|cov] statistic to maximise [default: stdev]
-i, --include TEXT seqnames to include in divergent set
-np, --numprocs INTEGER number of processes [default: 1]
-L, --limit INTEGER number of sequences to process
-T, --test_run reduce number of paths and size of query seqs
@@ -110,11 +221,75 @@ Options:
```
<!-- [[[end]]] -->

## Running the tests

#### As a cogent3 plugin:

The `max` command is also available as a [cogent3 app](https://cogent3.org/doc/app/index.html), `dvgt_select_max`. The output of `cogent3.app_help("dvgt_select_max")` is shown below.

<!-- [[[cog
import cog
import contextlib
import io
from cogent3 import app_help
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
app_help("dvgt_select_max")
cog.out(
"```\n{}\n```".format(buffer.getvalue())
)
]]] -->
```
$ pytest -n auto
```
Overview
--------
selects the maximally divergent seqs from a sequence collection
Options for making the app
--------------------------
dvgt_select_max_app = get_app(
'dvgt_select_max',
min_size=3,
max_size=10,
stat='stdev',
moltype='dna',
include=None,
k=6,
seed=None,
)
Parameters
----------
min_size
minimum size of the divergent set
max_size
the maximum size if the divergent set
stat
statistic for maximising the set, either mean_delta_jsd, mean_jsd, total_jsd
moltype
molecular type of the sequences
include
sequence names to include in the final result
k
k-mer size
seed
random number seed
This runs in parallel, greatly speeding things up.
Notes
-----
If called with an alignment, the ungapped sequences are used.
The order of the sequences is randomised. If include is not None, the
named sequences are added to the final result.
Input type
----------
Alignment, SequenceCollection, ArrayAlignment
Output type
-----------
Alignment, SequenceCollection, ArrayAlignment
```
<!-- [[[end]]] -->
