Add CIF value reader (#4)
* Add _str2num and _deg2rad _utils

* Add cif file keys list to sample data

* Add key_value_pairs reader and cell_params reader to parse

* Add tests for key reader

* Add tests for new utils

* Reorder test_key_reader

* Improve documentation for regex

* Add warnings and tests to read_key_value_pairs

* Restore trailing spaces to downloaded CIF files

* Properly track keys containing "-"

* Improved tests for key value pair reader

* Add key-value tests for INTENTIONALLY_BAD_CIF.cif

* Fix docs

* Enable top of page button

* Update brand primary colors

* Improve docs for parse.py

* Add __future__.annotations imports to relevant files

* Fix typo

* Separate _errors from _templates

* Clean up docstring return types

* Add PDB cif to test suite

* Fix test in test_key_reader

* Clean up patterns.py and add remove_nondelimiting_whitespace

* Update table_reader to use remove_nondelimiting_whitespace

* Allow value reader to read mmCIF files

* Update test_table_reader.py

* Remove separate mmCIF reader

* Add docs for patterns module

* Fix cast_to_float default value

* Update docs

* Add documentation for __call__

* Update regex_filter param documentation

* Fix typo

* Remove unneeded comment

* Fix default values in docs

* Fix typo

* Minor doc fix

* Fix typo

* Remove duplicate Introduction from index

* Remove duplicate entries from toc

* Add source for PDB cif

* Add mmCIF flag to read_cell_params

* Add quickstart.rst

* Fix comment in quickstart

* Remove unnecessary line in quickstart

* Fix image path in README.rst

* Update regex documentation

* Fix CI

* Documentation fix

* Documentation fix for regex filter

* Comment fixes

* Fix #8

* Fix typo in _parsed_line_generator docs

Co-authored-by: Kelly Wang <47036428+klywang@users.noreply.github.com>

* Typo fix

* Move tip block comment

* Untrack cif files from end-of-file-fixer

* Add missing key to CifData namedtuple

* Remove __future__ annotations

* Remove type | type
janbridley authored May 22, 2024
1 parent ee5894e commit dfd640c
Showing 25 changed files with 2,068 additions and 185 deletions.
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -6,7 +6,9 @@ repos:
rev: 'v4.4.0'
hooks:
- id: end-of-file-fixer
exclude: tests/sample_data
- id: trailing-whitespace
exclude: tests/sample_data
- id: check-builtin-literals
- id: check-executables-have-shebangs
- id: check-json
16 changes: 8 additions & 8 deletions README.rst
@@ -1,13 +1,15 @@
.. _header:
.. _images:

.. image:: _static/parsnip_header_dark.svg
.. image:: doc/source/_static/parsnip_header_dark.svg
:width: 600
:class: only-light

.. image:: _static/parsnip_header_light.svg
.. image:: doc/source/_static/parsnip_header_light.svg
:width: 600
:class: only-dark

.. _header:

..
TODO: set up Readthedocs, PyPI, and conda-forge
@@ -27,12 +29,10 @@

**parsnip** is a minimal Python library for parsing `CIF <https://www.iucr.org/resources/cif>`_ files. While its primary focus is on simplicity and portability, performance-oriented design choices are made where possible.

The ``parsnip.parse`` module handles standard CIF files (including those under the `CIF 1.1 <https://www.iucr.org/resources/cif/spec/version1.1>`_ and `CIF 2.0 <https://www.iucr.org/resources/cif/cif2>`_ standards). It includes a table reader for `loop\_`-delimited tables as well as a key-value pair reader. Provide a filename and a list of keys to either of these functions and you're all set to read start parsing CIF files!


.. TODO: reintroduce this text when the parsemm module is updated
``parsnip.parsemm`` handles `mmCIF <https://www.iucr.org/resources/cif/dictionaries/cif_mm>` files.
.. _parse:

The ``parsnip.parse`` module handles standard CIF files (including those under the `CIF 1.1 <https://www.iucr.org/resources/cif/spec/version1.1>`_ and `CIF 2.0 <https://www.iucr.org/resources/cif/cif2>`_ standards), as well as many features from the `mmCIF <https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/beginner’s-guide-to-pdb-structures-and-the-pdbx-mmcif-format>`_ format.
The package includes a table reader for `loop\_`-delimited tables as well as a key-value pair reader. Provide a filename and a list of keys to either of these functions and you're all set to start parsing CIF and mmCIF files!

.. _installing:

8 changes: 6 additions & 2 deletions doc/source/conf.py
@@ -21,6 +21,7 @@
"sphinx.ext.autodoc",
"sphinx.ext.autosummary",
"sphinx.ext.intersphinx",
"sphinx.ext.napoleon",
"autodocsumm",
]

@@ -36,6 +37,7 @@
"show-inheritance": True,
"autosummary": True,
}
autodoc_typehints = "description"

pygments_style = "friendly"
pygments_dark_style = "native"
@@ -50,12 +52,14 @@
"light_logo": "parsnip_header_dark.svg",
"dark_logo": "parsnip_header_light.svg",
"dark_css_variables": {
"color-brand-primary": "#5187b2",
"color-brand-primary": "#4AA092",
"color-brand-content": "#5187b2",
},
"light_css_variables": {
"color-brand-primary": "#406a8c",
"color-brand-primary": "#005A50",
"color-brand-content": "#406a8c",
},
"top_of_page_button": "edit",
"source_edit_link": "https://github.com/glotzerlab/parsnip",
}
html_favicon = "_static/parsnip_logo_favicon.svg"
27 changes: 27 additions & 0 deletions doc/source/example_file.cif
@@ -0,0 +1,27 @@
data_cif_file

_journal_year 1999
_journal_page_first 0
_journal_page_last 123

_chemical_name_mineral 'Copper FCC'
_chemical_formula_sum 'Cu'

_cell_length_a 3.6
_cell_length_b 3.6
_cell_length_c 3.6
_cell_angle_alpha 90.0
_cell_angle_beta 90.0
_cell_angle_gamma 90.0


loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_type_symbol
_atom_site_Wyckoff_label
Cu1 0.0000000000 0.0000000000 0.0000000000 Cu a

_symmetry_space_group_name_H-M 'Fm-3m'
20 changes: 11 additions & 9 deletions doc/source/index.rst
@@ -1,11 +1,19 @@
.. image:: _static/parsnip_header_dark.svg
:width: 600
:class: only-light

.. image:: _static/parsnip_header_light.svg
:width: 600
:class: only-dark

.. include:: ../../README.rst
:start-after: .. _header:


.. toctree::
:maxdepth: 2
:caption: Getting Started

introduction
installation
quickstart

@@ -15,22 +23,16 @@
:caption: API

package-parse
package-patterns


.. toctree::
:maxdepth: 1
:caption: Reference

genindex
modindex
development
changelog
credits
license


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
6 changes: 0 additions & 6 deletions doc/source/introduction.rst

This file was deleted.

8 changes: 8 additions & 0 deletions doc/source/package-patterns.rst
@@ -0,0 +1,8 @@
Patterns Module
==============================

.. rubric:: Overview

.. automodule:: parsnip.patterns
:members:
:special-members:
115 changes: 115 additions & 0 deletions doc/source/quickstart.rst
@@ -2,3 +2,118 @@

Quickstart Tutorial
===================

Once you have :ref:`installed <installation>` **parsnip**, most workflows involve reading a CIF file.
Let's assume we have the file my_file.cif in the current directory, and these are its contents:

.. literalinclude:: example_file.cif

Reading Keys
^^^^^^^^^^^^


Now, let's extract the key-value pairs:

.. code-block:: python

    from parsnip import parse
    filename = "my_file.cif"
    pairs = parse.read_key_value_pairs(filename)
    print(pairs)
    ... {
    ... '_journal_year': '1999',
    ... '_journal_page_first': '0',
    ... '_journal_page_last': '123',
    ... '_chemical_name_mineral': "'Copper FCC'",
    ... '_chemical_formula_sum': "'Cu'",
    ... '_cell_length_a': '3.6',
    ... '_cell_length_b': '3.6',
    ... '_cell_length_c': '3.6',
    ... '_cell_angle_alpha': '90.0',
    ... '_cell_angle_beta': '90.0',
    ... '_cell_angle_gamma': '90.0',
    ... '_symmetry_space_group_name_H-M': 'Fm-3m'
    ... }

By default, ``read_key_value_pairs`` reads every key. To read only numeric data values, set
``only_read_numerics`` to ``True``. To take a subset, provide a tuple of strings to the ``keys`` argument.

.. code-block:: python

    # Only read the numeric data values
    pairs = parse.read_key_value_pairs(filename, only_read_numerics=True)
    print(pairs)
    ... {
    ... '_journal_year': 1999,
    ... '_journal_page_first': 0,
    ... '_journal_page_last': 123,
    ... '_cell_length_a': 3.6,
    ... '_cell_length_b': 3.6,
    ... '_cell_length_c': 3.6,
    ... '_cell_angle_alpha': 90.0,
    ... '_cell_angle_beta': 90.0,
    ... '_cell_angle_gamma': 90.0
    ... }

    # Read only these keys (the commas matter: without them, Python
    # concatenates the adjacent string literals into a single string)
    keys = (
        "_journal_year",
        "_journal_page_first",
        "_journal_page_last",
    )
    pairs = parse.read_key_value_pairs(filename, keys=keys)
    print(pairs)
    ... {
    ... '_journal_year': '1999',
    ... '_journal_page_first': '0',
    ... '_journal_page_last': '123',
    ... }

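
To make the key-value step above concrete, here is a minimal standalone sketch of how such a reader could work. It is not parsnip's implementation: the `KEY_VALUE` pattern and the `sketch_read_pairs` helper are hypothetical names introduced for illustration, and the sketch assumes each pair sits on a single line. It covers the default "read every key" behaviour and the `keys` subset described above.

```python
import re

# A shortened copy of the example file, inlined so the sketch is self-contained.
CIF_TEXT = """\
data_cif_file

_journal_year 1999
_journal_page_first 0
_chemical_name_mineral 'Copper FCC'
_cell_length_a 3.6
_symmetry_space_group_name_H-M 'Fm-3m'
"""

# Hypothetical pattern (an assumption, not parsnip's regex): a data name that
# starts with "_" at the beginning of a line, then whitespace, then the value.
# The character class includes "-" so keys such as "..._H-M" are kept whole.
KEY_VALUE = re.compile(r"^(_[\w-]+)[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def sketch_read_pairs(text, keys=None):
    """Return a dict of key-value pairs, optionally restricted to `keys`."""
    pairs = dict(KEY_VALUE.findall(text))
    if keys is not None:
        pairs = {key: pairs[key] for key in keys if key in pairs}
    return pairs


all_pairs = sketch_read_pairs(CIF_TEXT)
subset = sketch_read_pairs(CIF_TEXT, keys=("_journal_year", "_cell_length_a"))
print(subset)
```

Values come back as strings, and this sketch leaves quoted values quoted. Converting numeric strings would be a separate step.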
Reading Tables
^^^^^^^^^^^^^^

Now, let's read a table. To do this, we need a list of keys:

.. code-block:: python

    keys = (
        "_atom_site_label",
        "_atom_site_fract_x",
        "_atom_site_fract_y",
        "_atom_site_fract_z",
        "_atom_site_type_symbol",
        "_atom_site_Wyckoff_label"
    )
    table = parse.read_table(filename, keys=keys)
    print(table)
    ... array([['Cu1',
    ... '0.0000000000(0)',
    ... '0.0000000000(0)',
    ... '0.0000000000(0)',
    ... 'Cu',
    ... 'a']],
    ... dtype='<U12')

Now, maybe we don't need the atom site or Wyckoff labels. Let's select just the numeric values and export them as floats:

.. code-block:: python

    keys = (
        "_atom_site_fract_x",
        "_atom_site_fract_y",
        "_atom_site_fract_z",
    )
    table = parse.read_table(filename, keys=keys, cast_to_float=True)
    print(table)
    ... array([[0., 0., 0.]], dtype=float32)

The ``cast_to_float`` argument automatically converts numeric data types and removes tolerance and precision markers for us.
Extracting the fractional coordinates of a unit cell is a pretty common operation, so we have a convenience function that does this as well.

.. code-block:: python

    table = parse.read_fractional_positions(filename)
    print(table)
    ... array([[0., 0., 0.]], dtype=float32)
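
The quickstart says that `cast_to_float` removes tolerance and precision markers, such as the trailing `(0)` in `0.0000000000(0)`. The sketch below shows one way that step could look in isolation. It is an illustration under stated assumptions, not parsnip's code: `strip_uncertainty` is a hypothetical helper, and it only handles a parenthesised run of digits at the end of the value.

```python
import re

# Hypothetical pattern: a parenthesised run of digits at the end of the string,
# which is how CIF files conventionally write a standard uncertainty.
_UNCERTAINTY = re.compile(r"\(\d+\)$")


def strip_uncertainty(value: str) -> float:
    """Drop a trailing "(digits)" marker, then convert the rest to a float."""
    return float(_UNCERTAINTY.sub("", value))


row = ["0.0000000000(0)", "0.2500(3)", "0.5"]
coords = [strip_uncertainty(entry) for entry in row]
print(coords)
```

Values without a marker pass through unchanged, so the same helper can be applied to every numeric column.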
14 changes: 14 additions & 0 deletions parsnip/_errors.py
@@ -0,0 +1,14 @@
class ParseWarning(Warning):
    def __init__(self, message):
        self.message = message

    def __str__(self):
        return repr(self.message)


class ParseError(RuntimeError):
    def __init__(self, message):
        self.message = message

    def __str__(self):
        return repr(self.message)
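
As a rough illustration of how a warning class like this can be used, the sketch below copies `ParseWarning` from the hunk above and emits it through the standard `warnings` module. The `lookup` helper and its message text are hypothetical and are not taken from the parsnip source.

```python
import warnings


class ParseWarning(Warning):
    """Copied from the hunk above so the sketch is self-contained."""

    def __init__(self, message):
        self.message = message

    def __str__(self):
        return repr(self.message)


def lookup(pairs, key):
    """Hypothetical helper: warn and return None when a key is absent."""
    if key not in pairs:
        warnings.warn(ParseWarning(f"Key {key} not found."), stacklevel=2)
        return None
    return pairs[key]


# Record warnings instead of printing them so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = lookup({"_cell_length_a": "3.6"}, "_cell_length_b")

print(result, caught[0].category.__name__, str(caught[0].message))
```

Because `__str__` returns `repr(self.message)`, the rendered warning text includes the surrounding quotes.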
17 changes: 7 additions & 10 deletions parsnip/_utils.py
@@ -1,14 +1,11 @@
-class ParseWarning(Warning):
-    def __init__(self, message):
-        self.message = message
+import numpy as np

-    def __str__(self):
-        return repr(self.message)

+def _str2num(val: str):
+    """Convert a string value to an integer if possible, or a float otherwise."""
+    return float(val) if "." in val else int(val)

-class ParseError(RuntimeError):
-    def __init__(self, message):
-        self.message = message

-    def __str__(self):
-        return repr(self.message)
+def _deg2rad(val: float):
+    """Convert a value in degrees to one in radians."""
+    return val * np.pi / 180
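
A short sketch of how the two new helpers behave, for readers skimming the diff. The function bodies are copied from the hunk above, except that `math.pi` stands in for `np.pi` so the example needs only the standard library. The exponent-notation case at the end is an observation about the code as written, not documented parsnip behaviour.

```python
import math


def _str2num(val: str):
    """Convert a string value to an integer if possible, or a float otherwise."""
    return float(val) if "." in val else int(val)


def _deg2rad(val: float):
    """Convert a value in degrees to one in radians (math.pi replaces np.pi here)."""
    return val * math.pi / 180


year = _str2num("1999")       # no "." in the string, so int() is used
length = _str2num("3.6")      # contains ".", so float() is used
right_angle = _deg2rad(90.0)  # approximately pi / 2

# A value in exponent notation without a "." is routed to int(), which rejects it.
try:
    _str2num("1e3")
    exponent_rejected = False
except ValueError:
    exponent_rejected = True

print(year, length, right_angle, exponent_rejected)
```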