Merge main #96

Merged
merged 21 commits into from
Jan 12, 2025
Commits
6f2b904
ENH: dvs_nmost and dvs_max return same type as input
GavinHuttley Nov 22, 2024
f82b231
DEV: bump version to 2024.11.22a1
GavinHuttley Nov 22, 2024
fd56640
Merge pull request #84 from GavinHuttley/main
GavinHuttley Nov 22, 2024
9bc7649
Bump astral-sh/setup-uv from 3 to 4
dependabot[bot] Nov 25, 2024
48f9522
Bump ruff from 0.7.3 to 0.8.1
dependabot[bot] Dec 2, 2024
7bf374f
Merge pull request #87 from HuttleyLab/dependabot/pip/ruff-0.8.1
GavinHuttley Dec 2, 2024
2ef9c52
Merge pull request #85 from HuttleyLab/dependabot/github_actions/astr…
GavinHuttley Dec 2, 2024
6359c57
Bump ruff from 0.8.1 to 0.8.2
dependabot[bot] Dec 9, 2024
d0e9399
Merge pull request #88 from HuttleyLab/dependabot/pip/ruff-0.8.2
GavinHuttley Dec 10, 2024
feae226
Bump ruff from 0.8.2 to 0.8.3
dependabot[bot] Dec 16, 2024
7054664
Merge pull request #89 from HuttleyLab/dependabot/pip/ruff-0.8.3
GavinHuttley Dec 22, 2024
df2c93c
Bump astral-sh/setup-uv from 4 to 5
dependabot[bot] Dec 23, 2024
e3c72ef
Bump ruff from 0.8.3 to 0.8.4
dependabot[bot] Dec 23, 2024
e510c88
DEV: address rename of piqtree2 to piqtree
GavinHuttley Dec 26, 2024
769fc8b
Merge pull request #91 from HuttleyLab/dependabot/pip/ruff-0.8.4
GavinHuttley Dec 26, 2024
f14571c
Merge pull request #92 from GavinHuttley/main
GavinHuttley Dec 26, 2024
4d5fb73
Merge pull request #90 from HuttleyLab/dependabot/github_actions/astr…
GavinHuttley Dec 26, 2024
a22bace
REL: bumped version to 2024.12.26a1
GavinHuttley Dec 26, 2024
6f615e6
Merge pull request #93 from GavinHuttley/main
GavinHuttley Dec 26, 2024
3bdfeb1
DOC: updated bib to include doi
GavinHuttley Jan 12, 2025
bcc24b4
Merge pull request #95 from GavinHuttley/JOSS
GavinHuttley Jan 12, 2025
2 changes: 1 addition & 1 deletion .github/workflows/ci-cogent3-dev.yml
@@ -21,7 +21,7 @@ jobs:
python-version: "3.12"

- name: Install uv
- uses: astral-sh/setup-uv@v3
+ uses: astral-sh/setup-uv@v5
with:
enable-cache: true

287 changes: 175 additions & 112 deletions paper/paper.bib
@@ -1,176 +1,239 @@

@article{parks.2018.natbiotechnol,
author = {Parks, Donovan H. and Chuvochina, Maria and Waite, David W. and Rinke, Christian and Skarshewski, Adam and Chaumeil, Pierre-Alain and Hugenholtz, Philip},
journal = {Nature Biotechnology},
month = nov,
number = {10},
pages = {996--1004},
title = {A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life},
volume = {36},
year = {2018}}
@article{balaban.2019.plosone,
author = {Balaban, Metin and Moshiri, Niema and Mai, Uyen and Jia, Xingfan and Mirarab, Siavash},
doi = {10.1371/journal.pone.0221068},
editor = {Bozdag, Serdar},
issn = {1932-6203},
journal = {PLOS ONE},
language = {en},
month = aug,
number = {8},
pages = {e0221068},
shorttitle = {{TreeCluster}},
title = {{TreeCluster}: {Clustering} biological sequences using phylogenetic trees},
url = {https://dx.plos.org/10.1371/journal.pone.0221068},
urldate = {2024-08-14},
volume = {14},
year = {2019}}

@book{sokal.1995,
address = {New York},
author = {Sokal, R R and Rohlf, F J},
edition = {3},
publisher = {W. H. Freeman and Company},
title = {Biometry},
year = {1995}}
@article{widmann.2006.molcellproteomics,
author = {Widmann, Jeremy and Hamady, Micah and Knight, Rob},
doi = {10.1074/mcp.T600022-MCP200},
journal = {Mol Cell Proteomics},
language = {eng},
number = {8},
pages = {1520--1532},
pmid = {16769708},
title = {{DivergentSet}, a tool for picking non-redundant sequences from large sequence collections.},
volume = {5},
year = {2006}}

@article{schneider.1990.nucleicacidsres,
author = {Schneider, T D and Stephens, R M},
doi = {10.1093/NAR/18.20.6097},
issn = {0305-1048},
journal = {Nucleic acids research},
month = oct,
number = {20},
pages = {6097--100},
pmid = {2172928},
title = {Sequence logos: a new way to display consensus sequences.},
url = {http://www.ncbi.nlm.nih.gov/pubmed/2172928},
urldate = {2016-11-08},
volume = {18},
year = {1990}}

@article{lin.1991.ieeetrans.inf.theory,
author = {Lin, J.},
doi = {10.1109/18.61115},
issn = {1557-9654},
journal = {IEEE Transactions on Information Theory},
month = jan,
number = {1},
pages = {145--151},
title = {Divergence measures based on the {Shannon} entropy},
url = {https://ieeexplore.ieee.org/document/61115},
urldate = {2024-08-21},
volume = {37},
year = {1991}}

@article{widmann.2006.molcellproteomics,
author = {Widmann, Jeremy and Hamady, Micah and Knight, Rob},
journal = {Mol Cell Proteomics},
number = {8},
pages = {1520--1532},
title = {{DivergentSet}, a tool for picking non-redundant sequences from large sequence collections.},
volume = {5},
year = {2006}}
@article{lake.1994.procnatlacadsciua,
author = {Lake, J A},
journal = {Proc Natl Acad Sci U S A},
language = {eng},
number = {4},
pages = {1455--1459},
pmid = {8108430},
title = {Reconstructing evolutionary trees from {DNA} and protein sequences: paralinear distances.},
url = {https://pubmed.ncbi.nlm.nih.gov/8108430},
volume = {91},
year = {1994}}

@article{harrison.2024.nucleicacidsresearch,
author = {Harrison, Peter W and Amode, M Ridwan and Austine-Orimoloye, Olanrewaju and Azov, Andrey G and Barba, Matthieu and Barnes, If and Becker, Arne and Bennett, Ruth and Berry, Andrew and Bhai, Jyothish and Bhurji, Simarpreet Kaur and Boddu, Sanjay and Branco Lins, Paulo R and Brooks, Lucy and Ramaraju, Shashank Budhanuru and Campbell, Lahcen I and Martinez, Manuel Carbajo and Charkhchi, Mehrnaz and Chougule, Kapeel and Cockburn, Alexander and Davidson, Claire and De Silva, Nishadi H and Dodiya, Kamalkumar and Donaldson, Sarah and El Houdaigui, Bilal and Naboulsi, Tamara El and Fatima, Reham and Giron, Carlos Garcia and Genez, Thiago and Grigoriadis, Dionysios and Ghattaoraya, Gurpreet S and Martinez, Jose Gonzalez and Gurbich, Tatiana A and Hardy, Matthew and Hollis, Zoe and Hourlier, Thibaut and Hunt, Toby and Kay, Mike and Kaykala, Vinay and Le, Tuan and Lemos, Diana and Lodha, Disha and Marques-Coelho, Diego and Maslen, Gareth and Merino, Gabriela Alejandra and Mirabueno, Louisse Paola and Mushtaq, Aleena and Hossain, Syed Nakib and Ogeh, Denye N and Sakthivel, Manoj Pandian and Parker, Anne and Perry, Malcolm and Pili{\v z}ota, Ivana and Poppleton, Daniel and Prosovetskaia, Irina and Raj, Shriya and P{\'e}rez-Silva, Jos{\'e} G and Salam, Ahamed Imran Abdul and Saraf, Shradha and Saraiva-Agostinho, Nuno and Sheppard, Dan and Sinha, Swati and Sipos, Botond and Sitnik, Vasily and Stark, William and Steed, Emily and Suner, Marie-Marthe and Surapaneni, Likhitha and Sutinen, Ky{\"o}sti and Tricomi, Francesca Floriana and Urbina-G{\'o}mez, David and Veidenberg, Andres and Walsh, Thomas A and Ware, Doreen and Wass, Elizabeth and Willhoft, Natalie L and Allen, Jamie and Alvarez-Jarreta, Jorge and Chakiachvili, Marc and Flint, Bethany and Giorgetti, Stefano and Haggerty, Leanne and Ilsley, Garth R and Keatley, Jon and Loveland, Jane E and Moore, Benjamin and Mudge, Jonathan M and Naamati, Guy and Tate, John and Trevanion, Stephen J and Winterbottom, Andrea and 
Frankish, Adam and Hunt, Sarah E and Cunningham, Fiona and Dyer, Sarah and Finn, Robert D and Martin, Fergal J and Yates, Andrew D},
doi = {10.1093/nar/gkad1049},
issn = {0305-1048},
journal = {Nucleic Acids Research},
month = jan,
number = {D1},
pages = {D891--D899},
title = {Ensembl 2024},
url = {https://doi.org/10.1093/nar/gkad1049},
urldate = {2024-08-22},
volume = {52},
year = {2024}}

@inproceedings{numba,
address = {New York, NY, USA},
author = {Lam, Siu Kwan and Pitrou, Antoine and Seibert, Stanley},
booktitle = {Proceedings of the {Second} {Workshop} on the {LLVM} {Compiler} {Infrastructure} in {HPC}},
month = nov,
pages = {1--6},
publisher = {Association for Computing Machinery},
series = {{LLVM} '15},
title = {Numba: a {LLVM}-based {Python} {JIT} compiler},
year = {2015}}

@article{zhu.2019.nat.commun,
author = {Zhu, Qiyun and Mai, Uyen and Pfeiffer, Wayne and Janssen, Stefan and Asnicar, Francesco and Sanders, Jon G. and Belda-Ferre, Pedro and Al-Ghalith, Gabriel A. and Kopylova, Evguenia and McDonald, Daniel and Kosciolek, Tomasz and Yin, John B. and Huang, Shi and Salam, Nimaichand and Jiao, Jian-Yu and Wu, Zijun and Xu, Zhenjiang Z. and Cantrell, Kalen and Yang, Yimeng and Sayyari, Erfan and Rabiee, Maryam and Morton, James T. and Podell, Sheila and Knights, Dan and Li, Wen-Jun and Huttenhower, Curtis and Segata, Nicola and Smarr, Larry and Mirarab, Siavash and Knight, Rob},
doi = {10.1038/s41467-019-13443-4},
journal = {Nature Communications},
number = {1},
pages = {5477},
pmid = {31792218},
title = {Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains {Bacteria} and {Archaea}},
url = {https://doi.org/10.1038/s41467-019-13443-4},
volume = {10},
year = {2019}}

@book{sokal.1995,
address = {New York},
author = {Sokal, R R and Rohlf, F J},
edition = {3},
publisher = {W. H. Freeman and Company},
title = {Biometry},
year = {1995}}

@article{knight.2007.genomebiol,
author = {Knight, R and Maxwell, P and Birmingham, A and Carnes, J and Caporaso, J G and Easton, B C and Eaton, M and Hamady, M and Lindsay, H and Liu, Z and Lozupone, C and McDonald, D and Robeson, M and Sammut, R and Smit, S and Wakefield, M J and Widmann, J and Wikman, S and Wilson, S and Ying, H and Huttley, G A},
doi = {10.1186/gb-2007-8-8-r171},
journal = {Genome Biol},
language = {ENG},
number = {8},
pages = {R171},
pmid = {17708774},
title = {{PyCogent}: a toolkit for making sense from sequence.},
url = {https://www.ncbi.nlm.nih.gov/pubmed/17708774},
volume = {8},
year = {2007}}

@article{lake.1994.procnatlacadsciua,
author = {Lake, J A},
journal = {Proc Natl Acad Sci U S A},
number = {4},
pages = {1455--1459},
title = {Reconstructing evolutionary trees from {DNA} and protein sequences: paralinear distances.},
volume = {91},
year = {1994}}

@article{cleveland.1979.j.am.stat.assoc,
author = {Cleveland, William S.},
doi = {10.1080/01621459.1979.10481038},
issn = {0162-1459},
journal = {Journal of the American Statistical Association},
month = dec,
number = {368},
pages = {829--836},
title = {Robust {Locally} {Weighted} {Regression} and {Smoothing} {Scatterplots}},
url = {https://www.tandfonline.com/doi/abs/10.1080/01621459.1979.10481038},
urldate = {2024-08-26},
volume = {74},
year = {1979}}

@article{schneider.1990.nucleicacidsres,
author = {Schneider, T D and Stephens, R M},
journal = {Nucleic acids research},
month = oct,
number = {20},
pages = {6097--100},
title = {Sequence logos: a new way to display consensus sequences.},
volume = {18},
year = {1990}}

@article{ondov.2016.mash,
title={Mash: fast genome and metagenome distance estimation using MinHash},
author={Ondov, Brian D and Treangen, Todd J and Melsted, P{\'a}ll and Mallonee, Adam B and Bergman, Nicholas H and Koren, Sergey and Phillippy, Adam M},
journal={Genome biology},
volume={17},
pages={1--14},
year={2016},
publisher={Springer}
}


@article{minh.2020.iq,
title={IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era},
author={Minh, Bui Quang and Schmidt, Heiko A and Chernomor, Olga and Schrempf, Dominik and Woodhams, Michael D and Von Haeseler, Arndt and Lanfear, Robert},
journal={Molecular biology and evolution},
volume={37},
number={5},
pages={1530--1534},
year={2020},
publisher={Oxford University Press}
}

@article{murtagh.2012.algorithms,
title={Algorithms for hierarchical clustering: an overview},
author={Murtagh, Fionn and Contreras, Pedro},
journal={Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery},
volume={2},
number={1},
pages={86--97},
year={2012},
publisher={Wiley Online Library}
}

@article{tavare.1986.some,
title={Some probabilistic and statistical problems on the analysis of DNA sequence.},
author={Tavar{\'e}, Simon},
journal={Lecture of Mathematics for Life Science},
volume={17},
pages={57},
year={1986}
}
@article{saitou.1987.mol.biol.evol,
author = {Saitou, N and Nei, M},
journal = {Mol. Biol. Evol.},
number = {4},
pages = {406--425},
title = {The neighbor-joining method: a new method for reconstructing phylogenetic trees},
url = {https://www.ncbi.nlm.nih.gov/pubmed/3447015},
volume = {4},
year = {1987}}

@article{choi.2017.ismej,
author = {Choi, Jinlyung and Yang, Fan and Stepanauskas, Ramunas and Cardenas, Erick and Garoutte, Aaron and Williams, Ryan and Flater, Jared and Tiedje, James M. and Hofmockel, Kirsten S. and Gelder, Brian and Howe, Adina},
doi = {10.1038/ismej.2016.168},
issn = {1751-7370},
journal = {The ISME journal},
language = {eng},
month = apr,
number = {4},
pages = {829--834},
pmcid = {PMC5364351},
pmid = {27935589},
title = {Strategies to improve reference databases for soil microbiomes},
volume = {11},
year = {2017}}

@article{saitou.1987.mol.biol.evol,
author = {Saitou, N and Nei, M},
journal = {Mol. Biol. Evol.},
number = {4},
pages = {406--425},
title = {The neighbor-joining method: a new method for reconstructing phylogenetic trees},
volume = {4},
year = {1987}}
@article{parks.2018.natbiotechnol,
author = {Parks, Donovan H. and Chuvochina, Maria and Waite, David W. and Rinke, Christian and Skarshewski, Adam and Chaumeil, Pierre-Alain and Hugenholtz, Philip},
doi = {10.1038/nbt.4229},
issn = {1546-1696},
journal = {Nature Biotechnology},
language = {en},
month = nov,
number = {10},
pages = {996--1004},
title = {A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life},
url = {https://www.nature.com/articles/nbt.4229},
urldate = {2024-08-27},
volume = {36},
year = {2018}}

@article{balaban.2019.plosone,
author = {Balaban, Metin and Moshiri, Niema and Mai, Uyen and Jia, Xingfan and Mirarab, Siavash},
journal = {PLOS ONE},
month = aug,
number = {8},
pages = {e0221068},
title = {{TreeCluster}: {Clustering} biological sequences using phylogenetic trees},
volume = {14},
year = {2019}}
@inproceedings{numba,
address = {New York, NY, USA},
author = {Lam, Siu Kwan and Pitrou, Antoine and Seibert, Stanley},
booktitle = {Proceedings of the {Second} {Workshop} on the {LLVM} {Compiler} {Infrastructure} in {HPC}},
doi = {10.1145/2833157.2833162},
isbn = {978-1-4503-4005-2},
month = nov,
pages = {1--6},
publisher = {Association for Computing Machinery},
series = {{LLVM} '15},
shorttitle = {Numba},
title = {Numba: a {LLVM}-based {Python} {JIT} compiler},
url = {https://dl.acm.org/doi/10.1145/2833157.2833162},
urldate = {2024-09-22},
year = {2015}}

@article{minh.2020.iq,
author = {Minh, Bui Quang and Schmidt, Heiko A and Chernomor, Olga and Schrempf, Dominik and Woodhams, Michael D and von Haeseler, Arndt and Lanfear, Robert},
doi = {10.1093/molbev/msaa015},
issn = {0737-4038},
journal = {Molecular Biology and Evolution},
month = may,
number = {5},
pages = {1530--1534},
shorttitle = {{IQ}-{TREE} 2},
title = {{IQ}-{TREE} 2: {New} {Models} and {Efficient} {Methods} for {Phylogenetic} {Inference} in the {Genomic} {Era}},
url = {https://doi.org/10.1093/molbev/msaa015},
urldate = {2024-10-16},
volume = {37},
year = {2020}}

@article{murtagh.2012.algorithms,
author = {Murtagh, Fionn and Contreras, Pedro},
doi = {10.1002/widm.53},
issn = {1942-4795},
journal = {WIREs Data Mining and Knowledge Discovery},
language = {en},
number = {1},
pages = {86--97},
shorttitle = {Algorithms for hierarchical clustering},
title = {Algorithms for hierarchical clustering: an overview},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/widm.53},
urldate = {2025-01-12},
volume = {2},
year = {2012}}

@article{ondov.2016.mash,
author = {Ondov, Brian D. and Treangen, Todd J. and Melsted, P{\'a}ll and Mallonee, Adam B. and Bergman, Nicholas H. and Koren, Sergey and Phillippy, Adam M.},
doi = {10.1186/s13059-016-0997-x},
issn = {1474-760X},
journal = {Genome Biology},
month = dec,
number = {1},
pages = {132},
title = {Mash: fast genome and metagenome distance estimation using {MinHash}},
url = {http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0997-x},
urldate = {2017-07-04},
volume = {17},
year = {2016}}

@article{tavare.1986.some,
author = {Tavare, S},
journal = {Lec. Math. Life Sci.},
pages = {57--86},
title = {Some probabilistic and statistical problems in the analysis of {DNA} sequences.},
volume = {17},
year = {1986}}
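The bib update adds `doi` fields, and several citation keys appear twice in this diff view. A quick audit of the resulting file could look like the sketch below. `split_entries` and `audit_bib` are hypothetical helpers (not part of diverse_seq), and the regex assumes every entry opens with `@type{key,` and that no field value contains a similar `@...{...,` pattern.

```python
import re

# Start of a BibTeX entry, e.g. "@article{minh.2020.iq," -> type "article", key "minh.2020.iq"
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# A "doi = ..." field at the start of a line (not a DOI embedded in a url field)
DOI_FIELD = re.compile(r"^\s*doi\s*=", re.MULTILINE | re.IGNORECASE)


def split_entries(bib_text: str) -> dict[str, list[str]]:
    """Map each citation key to the field text of every entry that uses it."""
    entries: dict[str, list[str]] = {}
    starts = list(ENTRY_START.finditer(bib_text))
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bib_text)
        entries.setdefault(match.group(2), []).append(bib_text[match.end():end])
    return entries


def audit_bib(bib_text: str) -> tuple[list[str], list[str]]:
    """Return (keys defined more than once, keys with no doi field in any copy)."""
    entries = split_entries(bib_text)
    duplicated = sorted(key for key, bodies in entries.items() if len(bodies) > 1)
    missing_doi = sorted(
        key
        for key, bodies in entries.items()
        if not any(DOI_FIELD.search(body) for body in bodies)
    )
    return duplicated, missing_doi


sample = """
@book{sokal.1995,
    title = {Biometry},
    year = {1995}}
@article{minh.2020.iq,
    doi = {10.1093/molbev/msaa015},
    year = {2020}}
@article{minh.2020.iq,
    year = {2020}}
"""
print(audit_bib(sample))  # (['minh.2020.iq'], ['sokal.1995'])
```

Books such as `sokal.1995` legitimately have no DOI, so the second list is a prompt for review rather than a hard failure.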
6 changes: 3 additions & 3 deletions pyproject.toml
@@ -50,7 +50,7 @@ test = [
"pytest",
"pytest-cov",
"pytest-xdist",
- "ruff==0.7.3",
+ "ruff==0.8.4",
]
dev = [
"cogapp",
@@ -60,7 +60,7 @@ dev = [
"pytest",
"pytest-cov",
"pytest-xdist",
- "ruff==0.7.3",
+ "ruff==0.8.4",
]
doc = ["click",
"ipykernel",
@@ -75,7 +75,7 @@ doc = ["click",
"nbformat",
"nbsphinx",
"numpydoc",
- "piqtree2",
+ "piqtree",
"pandas",
"plotly",
"requests",
2 changes: 1 addition & 1 deletion src/diverse_seq/__init__.py
@@ -4,4 +4,4 @@
# found by h5py
import hdf5plugin # noqa

- __version__ = "2024.11.8a3"
+ __version__ = "2024.12.26a1"
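The release commits in this PR mention calendar-style versions (`2024.11.22a1`, `2024.12.26a1`). Plain string comparison orders these incorrectly once the day field changes digit count. The sketch below assumes the `YEAR.MONTH.DAY[aN|bN|rcN]` shape seen here; `version_key` is a hypothetical helper, and `packaging.version.Version` is the general-purpose PEP 440 tool for this job.

```python
import re

# Assumed shape: YEAR.MONTH.DAY, optionally followed by aN, bN or rcN.
CALVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?$")
PRE_RANK = {"a": 0, "b": 1, "rc": 2}


def version_key(version: str) -> tuple[int, int, int, int, int]:
    """Numeric sort key for a calendar version string."""
    match = CALVER.match(version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    year, month, day, pre, pre_num = match.groups()
    # A final release (no tag) sorts after every pre-release with the same date.
    rank = PRE_RANK[pre] if pre else len(PRE_RANK)
    return (int(year), int(month), int(day), rank, int(pre_num or 0))


releases = ["2024.12.26a1", "2024.11.22a1", "2024.11.8a3"]
print(sorted(releases))                   # ['2024.11.22a1', '2024.11.8a3', '2024.12.26a1']
print(sorted(releases, key=version_key))  # ['2024.11.8a3', '2024.11.22a1', '2024.12.26a1']
```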