-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to build out unit cells from CIF files #11
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments for minor improvements. Looks good overall.
_, unique_indices, unique_counts = np.unique( | ||
pos.round(n_decimal_places), return_index=True, return_counts=True, axis=0 | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to use tolerance instead of rounding here? This is OK, but rounding doesn't give you precise control over the tolerance you wish to apply....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will come up with a solution for this - it's a shame it's not implemented in Numpy general, as this is a problem I run into frequently.
fad06db
to
d50b3ee
Compare
Description
A primary use case of this package is building out crystal structures. CIF files store this data as Wyckoff (symmetry irreducible) sites and symmetry operations. The
extract_atomic_positions
function applies every symmetry operation to every Wyckoff site and filters out duplicate points (after wrapping the coordinates back into the box).Caveats
As a text format, CIF makes no guarantees about the precision of numerical data. Files with 3 or fewer decimal places may result in sites that are effectively overlapping but are not pruned by the uniqueness check. This is the case for gemmi and ASE as well, although this function gets the correct number of atoms more frequently than gemmi. With enough decimal places, all of these packages get the same result.
The uniqueness check
We perform two (array-set) operations to filter unique points: one on the fractional coordinates, and one on the coordinates in real space. The fractional coordinate check removes about 96% of duplicate points (based on an example 1000 files from the AFlow database). The second check matches another ~1% of coordinates, for a total of 97% accuracy. Using a distance based check (as in gemmi and ase) yields similar statistics but more cases where too many points are removed. In my opinion this is worse, as it's far more difficult to add in unmatched points than it is to manually remove near-duplicates. Either way, it seems like Parsnip is at least as effective as established alternatives for this method.
Types of Changes
1The change breaks (or has the potential to break) existing functionality and should be merged into the
breaking
branchChecklist: