deploy: 0d20f89

SamsungDS · Oct 16, 2024 · c9590b8 · c9590b8
commit c9590b8
Showing 53 changed files with 6,999 additions and 0 deletions.
diff --git a/.buildinfo b/.buildinfo
@@ -0,0 +1,4 @@
+# Sphinx build info version 1
+# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
+config: 4cd19021c6d2d6d014e58268b3091e7f
+tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/.nojekyll b/.nojekyll
diff --git a/_images/web_ss_1.png b/_images/web_ss_1.png
diff --git a/_images/web_ss_2.png b/_images/web_ss_2.png
diff --git a/_images/web_ss_3.png b/_images/web_ss_3.png
diff --git a/_sources/index.rst.txt b/_sources/index.rst.txt
@@ -0,0 +1,63 @@
+.. _sec-welcome:
+
+Welcome to NVMe-Spex's documentation!
+=====================================
+
+.. toctree::
+   :maxdepth: 1
+   :hidden:
+
+   what_is_spex.rst
+   setup/index.rst
+   user_guide/stages.rst
+   user_guide/using_spex.rst
+   user_guide/dev.rst
+
+
+Welcome to the documentation for **Spex**, a tool for extracting information
+on data-structures in the NVMe specification documents.
+
+To read more about what **Spex** does, see :ref:`sec-what-is-spex`.
+For help on setting up **Spex** on your system, see :ref:`sec-setup`.
+
+
+To use nvme-spex directly, you can run it from Docker. To set up Docker on
+Windows, see the installation guide for `Docker Desktop
+<https://docs.docker.com/desktop/install/windows-install/>`_.
+
+.. code-block:: shell
+
+      docker run --rm -v ~/Documents/specs/:/specs ghcr.io/samsungds/nvme-spex-webserver:latest run -s --output=/specs/output /specs/nvme_base.docx 
+
+In this example, the output of the run is available at ``~/Documents/specs/output``.
+
+It is also possible to lint the docx specification by using the web application.
+The web application can be started with the following commands:
+
+.. code-block:: shell
+
+       docker pull ghcr.io/samsungds/nvme-spex-webserver:latest
+       docker run --rm -p 8000:8000 ghcr.io/samsungds/nvme-spex-webserver:latest webserver
+
+Once the Docker container is running, the web application can be
+accessed in the browser at `http://localhost:8000 <http://localhost:8000>`_.
+
+
+The web application will show the following user interface:
+
+.. image:: images/web_ss_1.png
+  :width: 100%
+  :alt: Alternative text
+
+Upload the specification .docx or .html file and press the submit button.
+
+.. image:: images/web_ss_2.png
+  :width: 100%
+  :alt: Alternative text
+
+After processing is done, the web application will show
+the following report:
+
+.. image:: images/web_ss_3.png
+  :width: 100%
+  :alt: Alternative text
diff --git a/_sources/setup/index.rst.txt b/_sources/setup/index.rst.txt
@@ -0,0 +1,30 @@
+.. _sec-setup:
+
+Setting up Spex
+===============
+
+.. toctree::
+   :maxdepth: 1
+   :hidden:
+
+   nix.rst
+   manual.rst
+
+**Spex** has various dependencies. You can use :ref:`sec-setup-nix` to set up the reference
+environment, which is actively used in development and tested in CI.
+
+Otherwise, you can install Spex in the traditional way, see :ref:`sec-setup-manual`.
+
+.. note::
+    **A note on Spex' requirements**
+
+    Please note that these dependencies may change, and others may be added.
+    The *only* exhaustive description of dependencies is the ``flake.nix`` file.
+
+    Please understand that dependencies are chosen and/or upgraded
+    to make development easier, increase software robustness or provide
+    additional features.
+    Dependencies will not be dropped, nor will code be rewritten to support
+    old software or conservative Linux distributions.
+
+    You can use Nix to run on such platforms.
diff --git a/_sources/setup/manual.rst.txt b/_sources/setup/manual.rst.txt
@@ -0,0 +1,43 @@
+.. _sec-setup-manual:
+
+Setting up Spex manually
+========================
+
+.. note::
+    In doubt as to which method to use when setting up Spex? See :ref:`sec-setup`.
+
+
+.. warning::
+    Spex reserves the right to update dependencies if it
+    helps development or enables new features or better performance.
+
+    If you choose the manual route, it is up to *you* to update your system
+    accordingly.
+
+**Spex** is implemented in Python and distributed via :pypi:`PyPI <>`, and is thus
+installable via ``pip`` / ``pipx``::
+
+  pipx install nvme-spex
+
+And then run it::
+
+  spex --help
+
+For this to run, the following runtime requirements must be met:
+
+1. Python >= 3.11
+    * **spex** relies on Python features for types introduced in Python 3.11
+
+2. Python packages
+    * For details, have a look at the **Spex** flake (``setup.cfg``)
+
+3. C libraries used by the Python packages
+    * Specifically, **Spex** uses the Python package ``lxml`` which in turn
+      requires the ``libxml2`` C library to be present on the system.
+    * This may change and more dependencies may be added, see the ``flake.nix`` file for full details.
+
+
+.. note::
+    The setup of the above requirements is specific to the environment that you are
+    using. The only supported environment is the reference environment, managed by
+    :ref:`sec-setup-nix`.
diff --git a/_sources/setup/nix.rst.txt b/_sources/setup/nix.rst.txt
@@ -0,0 +1,84 @@
+.. _sec-setup-nix:
+
+Setting up Spex with Nix
+=========================
+
+Development is done in an environment managed by ``nix`` which recreates the
+*exact* same environment as is used for running, developing and testing
+**Spex**. To create the environment, you need to install ``nix``.
+
+Using the Nix environment
+-------------------------
+If you have not yet installed nix, see :ref:`sec-setup-nix-install` below.
+
+Run Spex (development)
+~~~~~~~~~~~~~~~~~~~~~~
+To run Spex in a development context, where unversioned local changes are taken into account:
+
+Enter the development environment::
+
+  nix develop .#
+
+Run spex::
+
+  spex
+
+.. note::
+  The ``spex`` command is actually an alias defined in the shell described by ``flake.nix``. The alias sets
+  the ``PYTHONPATH`` variable to put the ``./src`` directory on ``sys.path``, where modules are searched for, and
+  uses the ``-m`` flag to execute the ``spex`` module.
+
+  This means ``spex`` uses the local source code files, and any changes made to the source will be reflected
+  next time you run ``spex``. However, it also *requires* you to be in the project root (or ``src``) directory.
+
+  For details on how executable modules and the ``python -m <module>`` command work, see
+  `Python docs on __main__.py in Python Packages <https://docs.python.org/3/library/__main__.html#main-py-in-python-packages>`_.
+
+Now run ``spex -h`` (or just ``spex`` without arguments) to see which arguments you can provide.
+For information on using Spex, see :ref:`sec-using-spex`.
+
+Run the Spex program
+~~~~~~~~~~~~~~~~~~~~
+You can run spex, even without cloning the repository, like so::
+
+  nix run github:SamsungDS/Spex#spex
+
+.. note::
+  This is not for development use; it will not reflect any local changes made to the source code.
+
+.. _sec-setup-nix-install:
+
+Install Nix
+-----------
+Skip this section if you have already installed Nix.
+
+Linux & macOS
+~~~~~~~~~~~~~
+
+On Linux and macOS, run the following to install nix::
+
+  curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install
+
+
+Windows (WSL)
+~~~~~~~~~~~~~
+Windows can use Nix through the Windows Subsystem for Linux (WSL) environment.
+
+First install WSL. Open a command-prompt and type::
+
+  wsl --install
+
+You may have to reboot the machine afterwards.
+
+Then install an Ubuntu WSL VM::
+
+  wsl --install -d ubuntu
+
+
+Then, from *within the WSL environment* (type ``wsl`` in command-prompt to enter), install Nix in *single-user mode*::
+
+  sh <(curl -L https://nixos.org/nix/install) --no-daemon
+
+
+Finally, close your command prompt(s) and start a new one. Nix should now be installed and ready for use!
+
diff --git a/_sources/user_guide/dev.rst.txt b/_sources/user_guide/dev.rst.txt
@@ -0,0 +1,134 @@
+.. _sec-guide-dev:
+
+Development Guide
+=================
+
+The following are some quick notes to help you get started with the **Spex** codebase.
+
+Overview
+--------
+
+1. Extract table-like figures and document metadata - create stage 1 document
+2. Using document metadata, select appropriate **DocumentParser**
+3. Use **DocumentParser** to iterate over all figures
+4. For each figure: extract its contents (fields/values) using an **Extractor**.
+5. Save to stage 2 document
+
+Terminology
+-----------
+**DocumentParser**
+    A class responsible for driving the parsing of a document. It defines a method to iterate
+    over figures, defines a list of default **Extractors** to try when extracting the contents
+    of a figure and allows explicitly overriding/defining a specific extractor to use for a figure.
+    Finally, a **DocumentParser** also allows you to override the label (field name) and brief
+    (one-line description) of any field of any figure - as a way to enhance or improve the
+    generated output.
+
+**Extractor**
+    An extractor is responsible for *extracting* the raw contents of a figure (an HTML table),
+    making sense of it (i.e. identifying field names, ranges, etc.) and transforming it into
+    one of the standard **entity** types which **Spex** operates with, namely **value** and
+    **struct**.
+
+**Entity**
+    The stage 2 document (JSON) will contain a list of entities, each corresponding either
+    to a top-level figure, or some figure found nested within another figure.
+    An entity can presently be of two types: **value**, which describes something typically
+    rendered in C as an enum, or a (bit|byte) **struct**. A struct entity has one or more
+    fields with clearly defined ordering and sizes, as defined by their ranges. A struct with
+    bit ranges could be represented as a bit field in C, or as an array of values with macros
+    to extract each "field". A struct with byte ranges cleanly maps to a packed struct in C.
+
+**Stage 0/1/2 Document**
+    See :ref:`sec-guide-stages`.
+
+**Quirks**
+    Ideally all figures across all documents should be possible to parse using the same set of
+    standard extractors. We expect to work toward this in the future, but for now, we may need
+    to implement custom processing for specific figures. To do this, we create a custom
+    **DocumentParser** class and define overrides for specific figures within the document.
+    We then install this **DocumentParser**, containing the relevant overrides, into the
+    central **quirks map**, where **Spex** looks for a **DocumentParser** based on the
+    ``(title, revision)`` key extracted from the document's page header.
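+
+As a rough illustration of the bit-range **struct** described under **Entity**
+above, the sketch below pulls one field out of a raw integer, much like the C
+macros mentioned there would. The ``extract_bits`` helper and its inclusive
+``(high, low)`` range convention are assumptions made for this example; they
+are not taken from the **Spex** code.
+

```python
def extract_bits(value: int, high: int, low: int) -> int:
    """Return the field occupying bits high..low (inclusive) of value.

    Hypothetical helper for illustration only; not part of Spex.
    """
    if low < 0 or high < low:
        raise ValueError("expected 0 <= low <= high")
    width = high - low + 1
    # Shift the field down to bit 0, then mask off everything above it.
    return (value >> low) & ((1 << width) - 1)


# Example: a field occupying bits 5:2 of a one-byte value.
raw = 0b1011_0100
field = extract_bits(raw, high=5, low=2)
```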
+
+(Detailed) Overview
+-------------------
+
+1. Create stage 1 document
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+This stage is skipped if the input document is already a stage 1 document (HTML).
+
+Otherwise, the original full specification is opened and all table-like figures
+are extracted from the docx file and their form is translated into HTML and stored
+into a stage 1 document.
+
+Additionally, the document title and revision, visible in the page header on each
+page, are extracted and embedded into the document. **Spex** uses these to uniquely
+identify the document, which allows us to define a custom **DocumentParser**
+for the document, if need be.
+
+2. Select appropriate **DocumentParser**
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+**Spex** determines the correct **DocumentParser** to use by forming a unique
+key from the document title and revision, and looking it up in a "quirks"
+dictionary.
+The quirks dictionary is defined in ``spex/jsonspec/quirks/__init__.py``.
+Specialized document parsers are defined under ``spex/jsonspec/quirks``.
+
+If no entry is found, the default **DocumentParser** is used. Note that a specific
+**DocumentParser** is only necessary if we need to override which
+extractor is applied to one or more figures, manually provide labels for
+one or more fields or otherwise override parsing.
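+
+The lookup described above can be sketched roughly as follows. This is a
+minimal illustration, not the actual contents of
+``spex/jsonspec/quirks/__init__.py``: the names ``QUIRKS``, ``select_parser``,
+``DefaultParserSketch`` and ``ExampleSpecParser``, as well as the example title
+and revision, are all hypothetical.
+

```python
from typing import Dict, Tuple, Type


class DefaultParserSketch:
    """Hypothetical stand-in for the default document parser."""


class ExampleSpecParser(DefaultParserSketch):
    """Hypothetical specialized parser that would carry figure overrides."""


# Quirks map keyed by (title, revision), as read from the page header.
# The entry below is made up for the example.
QUIRKS: Dict[Tuple[str, str], Type[DefaultParserSketch]] = {
    ("Example Specification", "1.0"): ExampleSpecParser,
}


def select_parser(title: str, revision: str) -> Type[DefaultParserSketch]:
    # Fall back to the default parser when no quirk is registered.
    return QUIRKS.get((title, revision), DefaultParserSketch)
```
+
+Keying on the ``(title, revision)`` pair keeps overrides scoped to exactly one
+revision of one document, so a quirk for one revision does not leak into another.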
+
+3. Iterate over all figures
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The **DocumentParser** class (``spex/jsonspec/document.py``) provides the 
+``DocumentParser.parse`` method, which iterates over all top-level figures
+in order, calling ``DocumentParser._on_parse_fig()`` on each figure found.
+
+The ``_on_parse_fig()`` method then finds an appropriate **Extractor** to
+apply (see below) which, among other things, may encounter additional
+*nested* figures.
+It is the responsibility of the **Extractor** to invoke a callback
+(which points to ``DocumentParser._on_parse_fig()``) for each nested
+figure it encounters.
+This allows an **Extractor** to facilitate recursively parsing all relevant
+nested figures, while ignoring irrelevant tables in other columns, for instance.
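+
+The callback arrangement above can be sketched with plain generators. This is
+an illustration under stated assumptions, not the real implementation: figures
+are modelled as dicts with an ``id`` and an optional ``nested`` list, and the
+function names ``extract``, ``on_parse_fig`` and ``parse`` are hypothetical
+simplifications of the methods named in the text.
+

```python
from typing import Callable, Iterable, Iterator

# A figure is modelled as a plain dict: {"id": str, "nested": [figure, ...]}.
# This shape is an assumption made for the example.
OnFigure = Callable[[dict], Iterator[str]]


def extract(fig: dict, on_nested: OnFigure) -> Iterator[str]:
    """Hypothetical extractor: yield this figure, then hand over nested ones."""
    yield fig["id"]
    for child in fig.get("nested", []):
        # The extractor decides which nested figures are relevant and
        # passes each one to the callback supplied by the parser.
        yield from on_nested(child)


def on_parse_fig(fig: dict) -> Iterator[str]:
    """Parser-side hook: apply the extractor, passing itself as the callback."""
    yield from extract(fig, on_parse_fig)


def parse(figures: Iterable[dict]) -> Iterator[str]:
    """Visit all top-level figures in order."""
    for fig in figures:
        yield from on_parse_fig(fig)


doc = [
    {"id": "1", "nested": [{"id": "1_0"}, {"id": "1_4", "nested": [{"id": "1_4_2"}]}]},
    {"id": "2"},
]
visited = list(parse(doc))
```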
+
+4. Apply an appropriate **Extractor** to extract figure contents
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+For each figure encountered, ``DocumentParser._on_parse_fig()`` is called.
+This method first finds a suitable **Extractor** to use (or errors out), then
+applies it, obtaining a generator of **Entities**, which it yields to its caller in turn.
+
+Finding the **Extractor** to use
+""""""""""""""""""""""""""""""""
+The process to determine the **Extractor** to use is the following:
+
+First look in ``DocumentParser.fig_extractor_overrides`` for an entry
+whose key matches the ID of this figure.
+This option is only used in case the figure requires special processing.
+In that case, we may provide a special extractor, usually a specialization
+of the bits, bytes or value extractor, to handle processing.
+These are typically defined alongside the specialized **DocumentParser**
+in the ``spex/jsonspec/extractors/quirks`` package.
+
+.. note::
+    **How are figures assigned an ID?**
+    For top-level figures, the ID is the number of the figure itself. For
+    nested figures, the **Extractor** used should construct a unique ID from
+    the parent ID and some unique data from the row, typically the bit/byte
+    offset or value from a value table.
+
+
+In most cases, there is no specific override for a figure, and so the default
+extractors, as specified by ``DocumentParser.extractors``, are tried, in order.
+These are presently ``BytesTableExtractor``, ``ValueTableExtractor`` and
+``BitsTableExtractor``, all defined in ``spex/jsonspec/extractors``.
+
+For these extractors, the method tries each in turn, calling
+the ``Extractor.can_apply()`` method on each and passing it the
+columns of the table. It is from the column names alone that an
+extractor decides whether it is applicable or not.
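+
+A minimal sketch of this selection loop is shown below. The classes are
+hypothetical stand-ins, not the real ``BytesTableExtractor``,
+``ValueTableExtractor`` and ``BitsTableExtractor``; in particular, the column
+names each one looks for and the ``LookupError`` raised when nothing matches
+are assumptions made for the example.
+

```python
from typing import List, Sequence, Type


class ExtractorSketch:
    """Hypothetical stand-in for an extractor base class."""

    # Column names a table must contain for this extractor to apply.
    required: frozenset = frozenset()

    @classmethod
    def can_apply(cls, columns: Sequence[str]) -> bool:
        # Decide purely from the column names, as described above.
        names = {c.strip().lower() for c in columns}
        return cls.required <= names


class BytesSketch(ExtractorSketch):
    required = frozenset({"bytes"})  # assumed column name


class ValueSketch(ExtractorSketch):
    required = frozenset({"value"})  # assumed column name


class BitsSketch(ExtractorSketch):
    required = frozenset({"bits"})  # assumed column name


# Tried in order, mirroring the bytes, value, bits order given in the text.
DEFAULT_EXTRACTORS: List[Type[ExtractorSketch]] = [BytesSketch, ValueSketch, BitsSketch]


def find_extractor(
    columns: Sequence[str],
    extractors: Sequence[Type[ExtractorSketch]] = DEFAULT_EXTRACTORS,
) -> Type[ExtractorSketch]:
    for extractor in extractors:
        if extractor.can_apply(columns):
            return extractor
    # No extractor matched: error out rather than guess.
    raise LookupError(f"no extractor applies to columns {list(columns)!r}")
```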
+
+
diff --git a/_sources/user_guide/stages.rst.txt b/_sources/user_guide/stages.rst.txt
@@ -0,0 +1,35 @@
+.. _sec-guide-stages:
+
+Stages
+======
+
+Spex is deliberately designed to break processing into a series of
+discrete and individually verifiable stages.
+
+Stage artifacts
+~~~~~~~~~~~~~~~
+
+Each stage generates one or more files as its output. These files are referred
+to by their stage elsewhere in the documentation. Each stage's output is presently
+a file of a different file type, but this may change in the future, as
+additional processing steps may be added. Thus it is better to refer to a
+document by its stage than by its file type.
+
+* **Stage 0** - the NVMe specification document (in docx)
+* **Stage 1** - the HTML file containing an extract of all table-like figures from stage 0.
+* **Stage 2** - the JSON document representing all data-structures found in the document. These are either enum-like pairs of names and assigned codes/values, or struct-like data-structures, where each field is described by its range and field name.
+
+Why organize the program into stages?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Breaking the program into distinct stages is beneficial to development and
+testing/verifying the output produced. However, it also has an important benefit to
+users: the ability to opt-out at any point of processing.
+
+Each step of processing makes the output more predictable and easier to process, but
+it also throws away information. For example, in processing the docx document to
+HTML, Spex throws away every graphic and all text between the tables.
+
+In producing the stage 2 output, the data-structure model, Spex discards supplementary
+text, figure headers, table section headers and so on.
+Each stage further refines the data, but discards other data in the process.