Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when multiple copies of a part are detected in directory inventory #253

Open
jakebeal opened this issue Feb 25, 2022 · 7 comments
Open
Assignees
Labels
automation GitHub action automation work bug Something isn't working good first issue Good for newcomers

Comments

@jakebeal
Copy link
Contributor

Currently, if there are multiple copies of a part found during the directory inventory, one will be arbitrarily selected to use. Instead, we should cause this to be an error, so that it can fail fast and be corrected.

Example of this problem: pSB1C5 in both FASTA and GenBank format causing a change in #252

@jakebeal jakebeal added bug Something isn't working good first issue Good for newcomers automation GitHub action automation work labels Feb 25, 2022
@plouisross plouisross self-assigned this Jun 25, 2022
@plouisross
Copy link
Collaborator

Comment

@plouisross plouisross assigned plouisross and unassigned plouisross Jun 25, 2022
@jguillen15
Copy link

Hi Jake!
@plouisross and I are trying to tackle this issue, but we have a few questions:
-We don’t know if we are getting the context right, we understand as if there is a script that goes through each directory doing an inventory of parts, if it finds the part referenced in different formats(e.g. FASTA, GenBank) it selects one arbitrarily. Is this correct?
-Assuming the previous to be true, which script is the one in charge of doing the inventory and arbitrarily selecting the parts?
-We don´t clearly see the relation with issue #252 , could you please explain further?
Thanks!

@jakebeal
Copy link
Contributor Author

The script that does this scripts/scriptutils/part_retrieval.py. I believe this problem is in the PackageInventory class, where it gives a warning rather than an error when it finds a duplicate:

logging.warning(f'Inventory found duplicate of part {key}')

There might turn out to be other part of the code that contribute to the failure as well, however. The thing to do will be to make a test that fails due to this issue, then change the code to fix it.

Regarding pull request #252: this issue caused an unexpected change when the automation ran on the pull request, which is what caused us to notice the issue and file it here.

@jguillen15
Copy link

Hi Jake!
@plouisross and I continued working on this issue and we have two more questions about it:

-For creating the test we need a directory that contains a duplicate of a part, but we are not sure which directory we can take from the distribution that presents this particularity, in part because we are not sure what multiple copies of a part should look like.

-In the process of finding a directory that contains a duplicate part, we found something that confuses us a bit and changed the previous concept of a duplicate part we had. Originally we understood it as if it was a part saved in different formats(e.g. FASTA, GenBank) within the same directory, but we saw some commentaries in the CRISPR-Cas folder that might indicate multiple copies of a part is something that can be read in a single file. Could you please clarify this to us?

Thanks.

@jakebeal
Copy link
Contributor Author

The CRISPR-Cas folder in the main branch currently has an example of the problem: iGEM_raw_imports.fasta contains pSB1C5, and so does pSB1C5.gb.

If you also copied the pSB1C5.gb file to another file named pSB1C5-copy.gb, that would be two copies in the same format.

If you changed this line in iGEM_raw_imports.fasta:

to instead say: "> pSB1C5", then there would be two copies of pSB1C5 with different sequences in a single file.

@jguillen15
Copy link

jguillen15 commented Aug 26, 2022

Hi Jake!
We were making the test to reproduce the error and we have 3 new questions:

  1. When importing the part_retrieval.py file in our test we had an issue: "ImportError: attempted relative import with no known parent package". The way we solved it was removing the dot present in the imports, in line 17,18 of part_retrieval.py and in line 10 of package_specification.py.
    Could this affect some other functionality? What is the purpose of that dot in the imports?

  2. In two out of the three different ways in which a duplicate could occur, described above in the previous comment, the duplicate was catched and the warning was present in the log, however, in the first case, in which iGEM_raw_imports.fasta contains pSB1C5, and so does pSB1C5.gb, the warning is not triggered.
    The reason might be that the uri assigned for the pSB1C5 in the iGEM_raw_imports.fasta is http://parts.igem.org/pSB1C5, meanwhile the uri assigned to pSB1C5.gb is https://github.com/iGEM-Engineering/iGEM-distribution/CRISPR-Cas/pSB1C5, so when the comparison is made they don't match and therefore it is not identified as a duplicate.

  3. The way we fixed the duplicate to cause an error was to change the logging.warning() to logging.error(). Line 97
    This will not stop the code from executing or take any other action, is this ok?

@jakebeal
Copy link
Contributor Author

  1. This likely means you don't have paths set up in the way the project specifies. Can you run scripts/test/test_import.py correctly?
  2. Yes, but the two end up with the same alias, which means they can get remapped to the same ID during the collate_packages.py stage. We need to check for that case in the inventory stage as well.
  3. Changing from a warning to an error is good. We do need to have the script terminate with an exception, however, in order to make sure that we end up with a red X and not a green checkmark. Nobody will look at the log file unless it's a red X.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automation GitHub action automation work bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants