Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Searching OMIM disease by name (partial match) #49

Merged
merged 1 commit into from
Mar 20, 2024
Merged

Searching OMIM disease by name (partial match) #49

merged 1 commit into from
Mar 20, 2024

Conversation

apraga
Copy link
Contributor

@apraga apraga commented Mar 11, 2024

Hi,

From the documentation, I could not find how to look for an HPO entity by name (like get_hpo_object in pyhpo ?). Here's a first stab at it. As it's my first commit to a rust project, I could not run the test but would like your advice regarding this feature.

Thanks !

@anergictcell
Copy link
Owner

Thanks for the suggestion, this makes total sense to add to hpo. I'll try to help a bit with a few comments and remarks:

  1. In the PR you mention searching HPO by name, but you're actually searching for OmimDiseases. Just a minor comment, but I wasn't sure if that was intentional or not.
  2. In the docstring you mention name / symbol. I prefer to use name for disease or even substring for the disease_by_name function. symbol refers to the name of genes. I am a bit pedantic about the naming :)
  3. The diseases_by_name function could also return an Iterator instead. That way it does not need to allocate new menory, which would be more efficient. You could go the easy way and simply return animpl Iterator<Item = &OmimDisease>, but I acutally would prefer a dedicated struct. Something like:
struct DiseaseIter<'a, F> {
    inner: Filter<Values<'a, OmimDiseaseId, OmimDisease>, F>
}

impl<'a, F: FnMut(&&'a OmimDisease) -> bool + 'a> Iterator for DiseaseIter<'a, F> {
    type Item = &'a OmimDisease;
    fn next(&mut self) -> Option<Self::Item> {
        self.inner.next()
    }
}

impl Ontology {
// ...
    pub fn diseases_by_name<'a>(&'a self, symbol: &'a str) -> DiseaseIter<impl FnMut(&&'a OmimDisease) -> bool + 'a> {
        DiseaseIter{ inner: self.omim_diseases.values().filter(move |disease| disease.name() == symbol)}
    }
// ...
}

(I would actually prefer this even more strict and add the impl FnMut(.... part into the struct itself, but couldn't figure out how to do this right now. However, that's really not important, just a minor nitpick)

Your disease_by_name function is great the way it is!

As for testing during development, I use an LSP (rust-analyzer) in my IDE that helps me to make sure I don't make any typing errors while coding.
Then I always run

# Automatically fix formatting of the source code
cargo fmt

# Show suggestions how to improve my code
cargo clippy

# Run all tests
cargo test

Feel free to ask more questions, I'm happy to help whenever I can. And thanks for helping to make hpo better!

@apraga
Copy link
Contributor Author

apraga commented Mar 12, 2024

Thanks a lot for your thoughtful and very details answer. I've edited the commit accordingly. However, I was unable to make the following tests works as the result order does not seems to be deterministic.

    #[test]
    fn disease_by_name() {
        let ontology = Ontology::from_binary("tests/example.hpo").unwrap();
        let cystinosis = ontology.disease_by_name("Cystinosis").unwrap();
        assert_eq!(cystinosis.name(), "Cystinosis, nephropathic");
    }

    #[test]
    fn diseases_by_name() {
        let ontology = Ontology::from_binary("tests/example.hpo").unwrap();
        let cystinosis: Vec<_> = ontology
            .diseases_by_name("Cystinosis")
            .map(|d| d.name())
            .collect();
        let cystinosis_ref = [
            "Cystinosis, late-onset juvenile or adolescent nephropathic",
            "Cystinosis, adult nonnephropathic",
            "Cystinosis, nephropathic",
        ];
        assert_eq!(cystinosis, cystinosis_ref);
    }
}

Actually, I also want to be able to search terms by name too, hence the PR name :)

Copy link
Owner

@anergictcell anergictcell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, the changes look very good, thanks a lot. I entered a few inline comments and have some minor suggestions in addition:

  • Can you rebase onto the current main branch. Some of the previous changes (e.g. version updates) are appearing in your commit but are already present in the main branch.

  • Please add some unit test cases as well, not only the doctests.

  • I suggest to add the following tetst cases for both functions:

    • query that does not match any diseases
    • query that matches exactly one disease
    • query that matches several diseases. You rightfully mention that the order changes, so you could only check for the number of returned items or check the the returned disease is within a set of your expected results.

src/ontology.rs Outdated Show resolved Hide resolved
src/ontology.rs Outdated Show resolved Hide resolved
src/ontology.rs Outdated Show resolved Hide resolved
@apraga apraga changed the title Searching HPO by name (partial match) Searching OMIM disease by name (partial match) Mar 19, 2024
One function to get only the first match, the second returns an
iterator for all matches, following @anergetictcell suggestions
in #49.
@apraga
Copy link
Contributor Author

apraga commented Mar 19, 2024

The code has been changed according to your suggestions with 3 test cases for both function. Thanks for your help.

@anergictcell anergictcell merged commit b1c08ff into anergictcell:main Mar 20, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants