Add CIF value reader (#4)

* Add _str2num and _deg2rad _utils
* Add cif file keys list to sample data
* Add key_value_pairs reader and cell_params reader to parse
* Add tests for key reader
* Add tests for new utils
* Reorder test_key_reader
* Improve documentation for regex
* Add warnings and tests to read_key_value_pairs
* Restore trailing spaces to downloaded CIF files
* Properly track keys containing "-"
* Improved tests for key value pair reader
* Add key-value tests for INTENTIONALLY_BAD_CIF.cif
* Fix docs
* Enable top of page button
* Update brand primary colors
* Improve docs for parse.py
* Add __future__.annotations imports to relevant files
* Fix typo
* Separate _errors from _templates
* Clean up docstring return types
* Add PDB cif to test suite
* Fix test in test_key_reader
* Clean up patterns.py and add remove_nondelimiting_whitespace
* Update table_reader to use remove_nondelimiting_whitespace
* Allow value reader to read mmCIF files
* Update test_table_reader.py
* Remove separate mmCIF reader
* Add docs for patterns module
* Fix cast_to_float default value
* Update docs
* Add documentation for __call__
* Update regex_filter param documentation
* Fix typo
* Remove unneeded comment
* Fix default values in docs
* Fix typo
* Minor doc fix
* Fix typo
* Remove duplicate Introduction from index
* Remove duplicate entries from toc
* Add source for PDB cif
* Add mmCIF flag to read_cell_params
* Add quickstart.rst
* Fix comment in quickstart
* Remove unnecessary line in quickstart
* Fix image path in README.rst
* Update regex documentation
* Fix CI
* Documentation fix
* Documentation fix for regex filter
* Comment fixes
* Fix #8
* Fix typo in _parsed_line_generator docs

Co-authored-by: Kelly Wang <47036428+klywang@users.noreply.github.com>

* Typo fix
* Move tip block comment
* Untrack cif files from end-of-file-fixer
* Add missing key to CifData namedtuple
* Remove __future__ annotations
* Remove type | type
glotzerlab · May 22, 2024 · dfd640c
1 parent ee5894e
commit dfd640c

Showing 25 changed files with 2,068 additions and 185 deletions.
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -6,7 +6,9 @@ repos:
     rev: 'v4.4.0'
     hooks:
       - id: end-of-file-fixer
+        exclude: tests/sample_data
       - id: trailing-whitespace
+        exclude: tests/sample_data
       - id: check-builtin-literals
       - id: check-executables-have-shebangs
       - id: check-json

diff --git a/README.rst b/README.rst
@@ -1,13 +1,15 @@
-.. _header:
+.. _images:
 
-.. image:: _static/parsnip_header_dark.svg
+.. image:: doc/source/_static/parsnip_header_dark.svg
   :width: 600
   :class: only-light
 
-.. image:: _static/parsnip_header_light.svg
+.. image:: doc/source/_static/parsnip_header_light.svg
   :width: 600
   :class: only-dark
 
+.. _header:
+
 ..
   TODO: set up Readthedocs, PyPI, and conda-forge
 
@@ -27,12 +29,10 @@
 
 **parsnip** is a minimal Python library for parsing `CIF <https://www.iucr.org/resources/cif>`_ files. While its primary focus is on simplicity and portability, performance-oriented design choices are made where possible.
 
-The ``parsnip.parse`` module handles standard CIF files (including those under the `CIF 1.1 <https://www.iucr.org/resources/cif/spec/version1.1>`_ and `CIF 2.0 <https://www.iucr.org/resources/cif/cif2>`_ standards). It includes a table reader for `loop\_`-delimited tables as well as a key-value pair reader. Provide a filename and a list of keys to either of these functions and you're all set to read start parsing CIF files!
-
-
-.. TODO: reintroduce this text when the parsemm module is updated
-  ``parsnip.parsemm`` handles `mmCIF <https://www.iucr.org/resources/cif/dictionaries/cif_mm>` files.
+.. _parse:
 
+The ``parsnip.parse`` module handles standard CIF files (including those under the `CIF 1.1 <https://www.iucr.org/resources/cif/spec/version1.1>`_ and `CIF 2.0 <https://www.iucr.org/resources/cif/cif2>`_ standards), as well as many features from the `mmCIF <https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/beginner’s-guide-to-pdb-structures-and-the-pdbx-mmcif-format>`_ format.
+The package includes a table reader for `loop\_`-delimited tables as well as a key-value pair reader. Provide a filename and a list of keys to either of these functions and you're all set to start parsing CIF and mmCIF files!
 
 .. _installing:
 

diff --git a/doc/source/conf.py b/doc/source/conf.py
@@ -21,6 +21,7 @@
     "sphinx.ext.autodoc",
     "sphinx.ext.autosummary",
     "sphinx.ext.intersphinx",
+    "sphinx.ext.napoleon",
     "autodocsumm",
 ]
 
@@ -36,6 +37,7 @@
     "show-inheritance": True,
     "autosummary": True,
 }
+autodoc_typehints = "description"
 
 pygments_style = "friendly"
 pygments_dark_style = "native"
@@ -50,12 +52,14 @@
     "light_logo": "parsnip_header_dark.svg",
     "dark_logo": "parsnip_header_light.svg",
     "dark_css_variables": {
-        "color-brand-primary": "#5187b2",
+        "color-brand-primary": "#4AA092",
         "color-brand-content": "#5187b2",
     },
     "light_css_variables": {
-        "color-brand-primary": "#406a8c",
+        "color-brand-primary": "#005A50",
         "color-brand-content": "#406a8c",
     },
+    "top_of_page_button": "edit",
+    "source_edit_link": "https://github.com/glotzerlab/parsnip",
 }
 html_favicon = "_static/parsnip_logo_favicon.svg"
diff --git a/doc/source/example_file.cif b/doc/source/example_file.cif
@@ -0,0 +1,27 @@
+data_cif_file
+
+_journal_year 1999
+_journal_page_first 0
+_journal_page_last 123
+
+_chemical_name_mineral 'Copper FCC'
+_chemical_formula_sum 'Cu'
+
+_cell_length_a     3.6
+_cell_length_b     3.6
+_cell_length_c     3.6
+_cell_angle_alpha  90.0
+_cell_angle_beta   90.0
+_cell_angle_gamma  90.0
+
+
+loop_
+_atom_site_label
+_atom_site_fract_x
+_atom_site_fract_y
+_atom_site_fract_z
+_atom_site_type_symbol
+_atom_site_Wyckoff_label
+Cu1 0.0000000000 0.0000000000 0.0000000000  Cu a
+
+_symmetry_space_group_name_H-M  'Fm-3m'
diff --git a/doc/source/index.rst b/doc/source/index.rst
@@ -1,11 +1,19 @@
+.. image:: _static/parsnip_header_dark.svg
+  :width: 600
+  :class: only-light
+
+.. image:: _static/parsnip_header_light.svg
+  :width: 600
+  :class: only-dark
+
 .. include:: ../../README.rst
+  :start-after: .. _header:
 
 
 .. toctree::
    :maxdepth: 2
    :caption: Getting Started
 
-   introduction
    installation
    quickstart
 
@@ -15,22 +23,16 @@
    :caption: API
 
    package-parse
+   package-patterns
 
 
 .. toctree::
    :maxdepth: 1
    :caption: Reference
 
    genindex
+   modindex
    development
    changelog
    credits
    license
-
-
-Indices and tables
-==================
-
-* :ref:`genindex`
-* :ref:`modindex`
-* :ref:`search`
diff --git a/doc/source/introduction.rst b/doc/source/introduction.rst
diff --git a/doc/source/package-patterns.rst b/doc/source/package-patterns.rst
@@ -0,0 +1,8 @@
+Patterns Module
+==============================
+
+.. rubric:: Overview
+
+.. automodule:: parsnip.patterns
+   :members:
+   :special-members:
diff --git a/doc/source/quickstart.rst b/doc/source/quickstart.rst
@@ -2,3 +2,118 @@
 
 Quickstart Tutorial
 ===================
+
+Once you have :ref:`installed <installation>` **parsnip**, most workflows involve reading a CIF file.
+Let's assume we have the file ``my_file.cif`` in the current directory, with the following contents:
+
+.. literalinclude:: example_file.cif
+
+Reading Keys
+^^^^^^^^^^^^
+
+
+Now, let's extract the key-value pairs:
+
+.. code-block:: python
+
+    from parsnip import parse
+    filename = "my_file.cif"
+    pairs = parse.read_key_value_pairs(filename)
+    print(pairs)
+    ...    {
+    ...      '_journal_year': '1999',
+    ...      '_journal_page_first': '0',
+    ...      '_journal_page_last': '123',
+    ...      '_chemical_name_mineral': "'Copper FCC'",
+    ...      '_chemical_formula_sum': "'Cu'",
+    ...      '_cell_length_a': '3.6',
+    ...      '_cell_length_b': '3.6',
+    ...      '_cell_length_c': '3.6',
+    ...      '_cell_angle_alpha': '90.0',
+    ...      '_cell_angle_beta': '90.0',
+    ...      '_cell_angle_gamma': '90.0',
+    ...      '_symmetry_space_group_name_H-M': 'Fm-3m'
+    ...    }
+
+By default, ``read_key_value_pairs`` reads every key. To read only numeric data values, set
+``only_read_numerics`` to ``True``. To read a subset of keys, pass a tuple of strings to the ``keys`` argument.
+
+.. code-block:: python
+
+    # Only read the numeric data values
+    pairs = parse.read_key_value_pairs(filename, only_read_numerics=True)
+    print(pairs)
+    ...    {
+    ...      '_journal_year': 1999,
+    ...      '_journal_page_first': 0,
+    ...      '_journal_page_last': 123,
+    ...      '_cell_length_a': 3.6,
+    ...      '_cell_length_b': 3.6,
+    ...      '_cell_length_c': 3.6,
+    ...      '_cell_angle_alpha': 90.0,
+    ...      '_cell_angle_beta': 90.0,
+    ...      '_cell_angle_gamma': 90.0
+    ...    }
+
+    # Read only these keys
+    keys = (
+      "_journal_year",
+      "_journal_page_first",
+      "_journal_page_last",
+    )
+    pairs = parse.read_key_value_pairs(filename, keys=keys)
+    print(pairs)
+    ...    {
+    ...      '_journal_year': '1999',
+    ...      '_journal_page_first': '0',
+    ...      '_journal_page_last': '123',
+    ...    }
+
+Reading Tables
+^^^^^^^^^^^^^^
+
+Now, let's read a table. To do this, we need a list of keys:
+
+.. code-block:: python
+
+    keys = (
+      "_atom_site_label",
+      "_atom_site_fract_x",
+      "_atom_site_fract_y",
+      "_atom_site_fract_z",
+      "_atom_site_type_symbol",
+      "_atom_site_Wyckoff_label"
+    )
+    table = parse.read_table(filename, keys=keys)
+    print(table)
+    ...    array([['Cu1',
+    ...            '0.0000000000',
+    ...            '0.0000000000',
+    ...            '0.0000000000',
+    ...            'Cu',
+    ...            'a']],
+    ...            dtype='<U12')
+
+
+Now, maybe we don't need the atom site or Wyckoff labels - let's select just the numeric values and export them as floats:
+
+.. code-block:: python
+
+    keys = (
+      "_atom_site_fract_x",
+      "_atom_site_fract_y",
+      "_atom_site_fract_z",
+    )
+    table = parse.read_table(filename, keys=keys, cast_to_float=True)
+    print(table)
+    ...    array([[0., 0., 0.]], dtype=float32)
+
+The ``cast_to_float`` argument converts numeric values to floats and strips tolerance and precision markers for us.
+Extracting the fractional coordinates of a unit cell is a common operation, so a convenience function is provided for it as well.
+
+.. code-block:: python
+
+
+    table = parse.read_fractional_positions(filename)
+    print(table)
+    ...    array([[0., 0., 0.]], dtype=float32)
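The key-value reading described in the quickstart can be approximated with the standard library alone. The following is a minimal sketch, not parsnip's implementation: the `PAIR` pattern, the `read_pairs_sketch` helper, and the shortened `CIF_TEXT` sample are all hypothetical and only illustrate the idea of matching a `_key value` line while skipping `loop_` column headers, which have no value on the same line.

```python
import re

# Hypothetical pattern (an assumption, not parsnip's regex): a key that starts
# with "_" and may contain word characters, "." or "-", followed by whitespace
# and a value that runs to the end of the line.
PAIR = re.compile(r"^(_[\w.-]+)[ \t]+(\S.*?)\s*$")

# Shortened version of the example file from the quickstart.
CIF_TEXT = """\
data_cif_file

_journal_year 1999
_cell_length_a     3.6
_chemical_name_mineral 'Copper FCC'

loop_
_atom_site_label
_atom_site_fract_x
Cu1 0.0000000000

_symmetry_space_group_name_H-M  'Fm-3m'
"""


def read_pairs_sketch(text):
    """Collect key-value pairs, leaving every value as an unprocessed string."""
    pairs = {}
    for line in text.splitlines():
        match = PAIR.match(line)
        if match:
            key, value = match.groups()
            pairs[key] = value
    return pairs


pairs = read_pairs_sketch(CIF_TEXT)
print(pairs)
```

Loop column headers such as `_atom_site_label` are skipped because nothing follows them on the line, and keys containing `-` are matched because the hyphen is part of the key's character class.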
diff --git a/parsnip/_errors.py b/parsnip/_errors.py
@@ -0,0 +1,14 @@
+class ParseWarning(Warning):
+    def __init__(self, message):
+        self.message = message
+
+    def __str__(self):
+        return repr(self.message)
+
+
+class ParseError(RuntimeError):
+    def __init__(self, message):
+        self.message = message
+
+    def __str__(self):
+        return repr(self.message)
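As a rough illustration of how a warning class like this one might be used, the sketch below redefines `ParseWarning` locally so it runs on its own, then emits and captures it with the standard `warnings` module. The `check_key_sketch` helper and its underscore rule are hypothetical and are not taken from parsnip.

```python
import warnings


class ParseWarning(Warning):
    """Local copy of the class from the diff, so this sketch is self-contained."""

    def __init__(self, message):
        self.message = message

    def __str__(self):
        return repr(self.message)


def check_key_sketch(key):
    """Hypothetical helper: warn, rather than raise, on a suspicious key."""
    if not key.startswith("_"):
        warnings.warn(
            ParseWarning(f"Key {key} does not start with an underscore."),
            stacklevel=2,
        )
        return False
    return True


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = check_key_sketch("cell_length_a")

print(ok, [str(w.message) for w in caught])
```

Because `__str__` returns `repr(self.message)`, the text of the captured warning includes the surrounding quotes.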
diff --git a/parsnip/_utils.py b/parsnip/_utils.py
@@ -1,14 +1,11 @@
-class ParseWarning(Warning):
-    def __init__(self, message):
-        self.message = message
+import numpy as np
 
-    def __str__(self):
-        return repr(self.message)
 
+def _str2num(val: str):
+    """Convert a string value to an integer if possible, or a float otherwise."""
+    return float(val) if "." in val else int(val)
 
-class ParseError(RuntimeError):
-    def __init__(self, message):
-        self.message = message
 
-    def __str__(self):
-        return repr(self.message)
+def _deg2rad(val: float):
+    """Convert a value in degrees to one in radians."""
+    return val * np.pi / 180
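The two helpers can be exercised in isolation. The sketch below copies their logic under hypothetical names (`str2num_sketch`, `deg2rad_sketch`) and uses `math.pi` in place of `np.pi` so it needs only the standard library. It also probes two inputs that this logic rejects: a number in scientific notation without a decimal point, and a value that still carries a parenthesized precision marker.

```python
import math


def str2num_sketch(val: str):
    """Same logic as _str2num in the diff: float if "." is present, else int."""
    return float(val) if "." in val else int(val)


def deg2rad_sketch(val: float):
    """Same logic as _deg2rad, with math.pi substituted for np.pi."""
    return val * math.pi / 180


print(str2num_sketch("1999"))
print(str2num_sketch("3.6"))
print(deg2rad_sketch(90.0))

# Inputs that raise ValueError with this logic: "1e3" has no "." and so is
# passed to int(), and "0.0000000000(0)" is not a valid float literal.
bad_inputs = []
for candidate in ("1e3", "0.0000000000(0)"):
    try:
        str2num_sketch(candidate)
    except ValueError:
        bad_inputs.append(candidate)

print(bad_inputs)
```

This suggests that values should be stripped of precision markers before they reach a converter of this kind.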