Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to build out unit cells from CIF files #11

Merged
merged 102 commits into from
Dec 20, 2024
Merged

Conversation

janbridley
Copy link
Collaborator

Description

A primary use case of this package is building out crystal structures. CIF files store this data as Wyckoff (symmetry irreducible) sites and symmetry operations. The extract_atomic_positions function applies every symmetry operation to every Wyckoff site and filters out duplicate points (after wrapping the coordinates back into the box).

Caveats

As a text format, CIF makes no guarantees about the precision of numerical data. Files with 3 or fewer decimal places may result in sites that are effectively overlapping but are not pruned by the uniqueness check. This is the case for gemmi and ASE as well, although this function gets the correct number of atoms more frequently than gemmi. With enough decimal places, all of these packages get the same result.

The uniqueness check

We perform two (array-set) operations to filter unique points: one on the fractional coordinates, and one on the coordinates in real space. The fractional coordinate check removes about 96% of duplicate points (based on an example 1000 files from the AFlow database). The second check matches another ~1% of coordinates, for a total of 97% accuracy. Using a distance based check (as in gemmi and ase) yields similar statistics but more cases where too many points are removed. In my opinion this is worse, as it's far more difficult to add in unmatched points than it is to manually remove near-duplicates. Either way, it seems like Parsnip is at least as effective as established alternatives for this method.

Types of Changes

  • Documentation update
  • Bug fix
  • New feature
  • Breaking change1

1The change breaks (or has the potential to break) existing functionality and should be merged into the breaking branch

Checklist:

  • I am familiar with the Development Guidelines
  • The changes introduced by this pull request are covered by existing or newly introduced tests.
  • I have updated the changelog and added my name to the credits.

Copy link

@DomFijan DomFijan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments for minor improvements. Looks good overall.

Comment on lines +248 to +250
_, unique_indices, unique_counts = np.unique(
pos.round(n_decimal_places), return_index=True, return_counts=True, axis=0
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use tolerance instead of rounding here? This is OK, but rounding doesn't give you precise control over the tolerance you wish to apply....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will come up with a solution for this - it's a shame it's not implemented in Numpy general, as this is a problem I run into frequently.

@janbridley janbridley merged commit 844578f into main Dec 20, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants