Commit
Merge pull request #43 from GavinHuttley/main
DOC: simplify the README
GavinHuttley authored Aug 29, 2024
2 parents d4dc97a + 5d54162 commit f3a72cf
Showing 1 changed file with 32 additions and 49 deletions.
81 changes: 32 additions & 49 deletions README.md
@@ -4,45 +4,16 @@
[![CodeQL](https://github.com/HuttleyLab/DiverseSeq/actions/workflows/codeql.yml/badge.svg)](https://github.com/HuttleyLab/DiverseSeq/actions/workflows/codeql.yml)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

# DiverseSeq identifies the most diverse biological sequences from a collection
# `diverse_seq` identifies the most diverse biological sequences from a collection

`diverse_seq` provides tools for selecting a representative subset of sequences from a larger collection. It is an alignment-free method which scales linearly with the number of sequences. It identifies the subset of sequences that maximize diversity as measured using Jensen-Shannon divergence. `DiverseSeq` provides a command-line tool (`dvs`) and plugins to the Cogent3 app system (prefixed by `dvs_`) allowing users to embed code in their own scripts. The command-line tools can be run in parallel.
`diverse_seq` provides tools for selecting a representative subset of sequences from a larger collection. It is an alignment-free method which scales linearly with the number of sequences. It identifies the subset of sequences that maximize diversity as measured using Jensen-Shannon divergence. `diverse_seq` provides a command-line tool (`dvs`) and plugins to the Cogent3 app system (prefixed by `dvs_`) allowing users to embed code in their own scripts. The command-line tools can be run in parallel.
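
As an illustration of the diversity measure named above, the following is a minimal sketch of Jensen-Shannon divergence computed over k-mer frequency distributions. It is not the `diverse_seq` implementation: the function names `kmer_freqs`, `entropy` and `jsd`, and the choice of `k=2`, are assumptions made for this example only.

```python
from collections import Counter, defaultdict
from math import log2


def kmer_freqs(seq: str, k: int = 2) -> dict[str, float]:
    """Relative frequencies of overlapping k-mers in seq (hypothetical helper)."""
    counts = Counter(seq[i : i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def entropy(freqs: dict[str, float]) -> float:
    """Shannon entropy in bits of a frequency distribution."""
    return -sum(p * log2(p) for p in freqs.values() if p > 0)


def jsd(dists: list[dict[str, float]]) -> float:
    """Jensen-Shannon divergence of equally weighted distributions.

    Computed as the entropy of the mean distribution minus the mean
    of the individual entropies.
    """
    n = len(dists)
    mean: dict[str, float] = defaultdict(float)
    for dist in dists:
        for kmer, p in dist.items():
            mean[kmer] += p / n
    return entropy(mean) - sum(entropy(d) for d in dists) / n


if __name__ == "__main__":
    a = kmer_freqs("AAAAAA")
    b = kmer_freqs("AAAAAT")
    c = kmer_freqs("CCCCCC")
    print(jsd([a, b]))  # similar sequences give a small value
    print(jsd([a, c]))  # sequences sharing no k-mers give the two-way maximum
```

Because only k-mer counts are needed, no alignment is required, which is what makes this style of measure usable on unaligned collections.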

## The available commands

<!-- [[[cog
import cog
from diverse_seq.cli import main
from click.testing import CliRunner
runner = CliRunner()
result = runner.invoke(main, ["--help"])
help = result.output.replace("Usage: main", "Usage: dvs")
cog.out(
"```\n{}\n```".format(help)
)
]]] -->
```
Usage: dvs [OPTIONS] COMMAND [ARGS]...
dvs -- alignment free detection of the most diverse sequences using JSD
Options:
--version Show the version and exit.
--help Show this message and exit.
Commands:
prep Writes processed sequences to a <HDF5 file>.dvseqs.
max Identify the seqs that maximise average delta JSD
nmost Identify n seqs that maximise average delta JSD
```
<!-- [[[end]]] -->

### `dvs prep`: Preparing the sequence data
### `dvs prep`: preparing the sequence data

Convert sequence data into a more efficient format for the diversity assessment. This must be done before running either the `nmost` or `max` commands.

#### Usage:
<details>
<summary>CLI options for dvs prep</summary>

<!-- [[[cog
import cog
@@ -75,14 +46,17 @@ Options:
```
<!-- [[[end]]] -->

### `dvs nmost`: Select the n-most diverse sequences
</details>

### `dvs nmost`: select the n-most diverse sequences

We recommend using `nmost` for large datasets.
Selects the n sequences that maximise the total JSD. We recommend using `nmost` for large datasets.
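
To make the idea of choosing n sequences that maximise total JSD concrete, here is a naive greedy sketch. It is an assumption for illustration and not the algorithm used by `dvs nmost`: it rescans every remaining candidate on each step, so it does not scale linearly. The names `greedy_nmost`, `kmer_freqs`, `entropy` and `jsd`, and the default `k=2`, are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def kmer_freqs(seq: str, k: int = 2) -> dict[str, float]:
    """Relative frequencies of overlapping k-mers in seq."""
    counts = Counter(seq[i : i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def entropy(freqs: dict[str, float]) -> float:
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in freqs.values() if p > 0)


def jsd(dists: list[dict[str, float]]) -> float:
    """Entropy of the mean distribution minus the mean entropy."""
    n = len(dists)
    mean: dict[str, float] = defaultdict(float)
    for dist in dists:
        for kmer, p in dist.items():
            mean[kmer] += p / n
    return entropy(mean) - sum(entropy(d) for d in dists) / n


def greedy_nmost(seqs: dict[str, str], n: int, k: int = 2) -> list[str]:
    """Greedily pick n sequence names, each step adding the candidate
    that gives the largest JSD for the selected set (illustrative only)."""
    if not 1 <= n <= len(seqs):
        raise ValueError("n must be between 1 and the number of sequences")
    freqs = {name: kmer_freqs(seq, k) for name, seq in seqs.items()}
    remaining = sorted(freqs)
    chosen = [remaining.pop(0)]  # arbitrary but deterministic seed
    while len(chosen) < n:
        best = max(
            remaining,
            key=lambda name: jsd([freqs[c] for c in chosen] + [freqs[name]]),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    seqs = {"a": "AAAAAA", "b": "AAAAAT", "c": "CCCCCC", "d": "GGGGGG"}
    print(greedy_nmost(seqs, n=3))
```

In this toy input, sequence `b` is nearly identical to `a`, so a selection of three favours the sequences that share no k-mers with one another.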

> **Note**
> A fuller explanation is coming soon!
#### Command line usage:
<details>
<summary>Options for command line dvs nmost</summary>

<!-- [[[cog
import cog
@@ -116,7 +90,10 @@ Options:
```
<!-- [[[end]]] -->

#### As a cogent3 plugin:
</details>

<details>
<summary>Options for cogent3 app dvs_select_nmost</summary>

The `dvs nmost` is also available as the [cogent3 app](https://cogent3.org/doc/app/index.html) `dvs_select_nmost`. The result of using `cogent3.app_help("dvs_select_nmost")` is shown below.

@@ -173,23 +150,26 @@ named sequences are added to the final result.
Input type
----------
ArrayAlignment, Alignment, SequenceCollection
SequenceCollection, Alignment, ArrayAlignment
Output type
-----------
ArrayAlignment, Alignment, SequenceCollection
SequenceCollection, Alignment, ArrayAlignment
```
<!-- [[[end]]] -->
</details>


### `dvs max`: Maximise average delta JSD
### `dvs max`: maximise variance in the selected sequences

The result of the `max` command is typically a set that are modestly more diverse than that fron `nmost`.
The result of the `max` command is typically a set that are modestly more diverse than that from `nmost`.

> **Note**
> A fuller explanation is coming soon!
#### Command line usage:
<details>
<summary>Options for command line dvs max</summary>

<!-- [[[cog
import cog
@@ -226,10 +206,12 @@ Options:
```
<!-- [[[end]]] -->

</details>

#### As a cogent3 plugin:
<details>
<summary>Options for cogent3 app dvs_select_max</summary>

The `dvs max` is also available as the [cogent3 app](https://cogent3.org/doc/app/index.html) `dvs_select_max`. The result of using `cogent3.app_help("dvs_select_max")` is shown below.
The `dvs max` is also available as the [cogent3 app](https://cogent3.org/doc/app/index.html) `dvs_select_max`.

<!-- [[[cog
import cog
@@ -256,8 +238,8 @@ Options for making the app
--------------------------
dvs_select_max_app = get_app(
'dvs_select_max',
min_size=3,
max_size=10,
min_size=5,
max_size=30,
stat='stdev',
moltype='dna',
include=None,
@@ -291,11 +273,12 @@ named sequences are added to the final result.
Input type
----------
ArrayAlignment, Alignment, SequenceCollection
SequenceCollection, Alignment, ArrayAlignment
Output type
-----------
ArrayAlignment, Alignment, SequenceCollection
SequenceCollection, Alignment, ArrayAlignment
```
<!-- [[[end]]] -->
</details>
