deploy: 0d20f89
github-actions[bot] committed Oct 16, 2024
0 parents commit c9590b8
Showing 53 changed files with 6,999 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .buildinfo
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 4cd19021c6d2d6d014e58268b3091e7f
tags: 645f666f9bcd5a90fca523b33c5a78b7
Empty file added .nojekyll
Empty file.
Binary file added _images/web_ss_1.png
Binary file added _images/web_ss_2.png
Binary file added _images/web_ss_3.png
63 changes: 63 additions & 0 deletions _sources/index.rst.txt
@@ -0,0 +1,63 @@
.. _sec-welcome:

Welcome to NVMe-Spex's documentation!
=====================================

.. toctree::
:maxdepth: 1
:hidden:

what_is_spex.rst
setup/index.rst
user_guide/stages.rst
user_guide/using_spex.rst
user_guide/dev.rst


Welcome to the documentation for **Spex**, a tool for extracting information
on data-structures in the NVMe specification documents.

To read more about what **Spex** does, see :ref:`sec-what-is-spex`.
For help on setting up **Spex** on your system, see :ref:`sec-setup`.


To use nvme-spex directly, you can run it from Docker. For setting up Docker on
Windows, refer to the `Docker Desktop
<https://docs.docker.com/desktop/install/windows-install/>`_ installation guide.

.. code-block:: shell

   docker run --rm -v ~/Documents/specs/:/specs ghcr.io/samsungds/nvme-spex-webserver:latest run -s --output=/specs/output /specs/nvme_base.docx

In this example, the output of the run will be available at ``~/Documents/specs/output``.

It is also possible to lint the docx specification by using the web application.
The web application can be started with the following commands:

.. code-block:: shell

   docker pull ghcr.io/samsungds/nvme-spex-webserver:latest
   docker run --rm -p 8000:8000 ghcr.io/samsungds/nvme-spex-webserver:latest webserver

When the Docker container is running successfully, the web application can be
accessed in the browser at `http://localhost:8000 <http://localhost:8000>`_.


The web application will show the following user interface:

.. image:: images/web_ss_1.png
:width: 100%
:alt: Alternative text

Upload the specification .docx or .html file and press the submit button.

.. image:: images/web_ss_2.png
:width: 100%
:alt: Alternative text

Once processing is done, the web application shows
the following report:

.. image:: images/web_ss_3.png
:width: 100%
:alt: Alternative text
30 changes: 30 additions & 0 deletions _sources/setup/index.rst.txt
@@ -0,0 +1,30 @@
.. _sec-setup:

Setting up Spex
===============

.. toctree::
:maxdepth: 1
:hidden:

nix.rst
manual.rst

**Spex** has various dependencies. You can use :ref:`sec-setup-nix` to set up the reference
environment, which is actively used in development and tested in CI.

Otherwise, you can install Spex in the traditional way, see :ref:`sec-setup-manual`.

.. note::
**A note on Spex' requirements**

Please note that these dependencies may change, and others may be added.
The *only* exhaustive description of dependencies is the ``flake.nix`` file.

Please understand that dependencies are chosen and/or upgraded
to make development easier, increase software robustness or provide
additional features.
Dependencies will not be dropped, nor will code be rewritten to support
old software or conservative Linux distributions.

You can use Nix to run on such platforms.
43 changes: 43 additions & 0 deletions _sources/setup/manual.rst.txt
@@ -0,0 +1,43 @@
.. _sec-setup-manual:

Setting up Spex manually
========================

.. note::
In doubt as to which method to use when setting up Spex? See :ref:`sec-setup`.


.. warning::
Spex reserves the right to update dependencies if it
helps development or enables new features or better performance.

If you choose the manual route, it is up to *you* to update your system
accordingly.

**Spex** is implemented in Python and distributed via :pypi:`PyPI <>`, and is thus
installable via ``pip`` / ``pipx``::

    pipx install nvme-spex

And then run it::

    spex --help

For this to run, the following runtime requirements must be met:

1. Python >= 3.11

   * **Spex** relies on type-related Python features introduced in Python 3.11

2. Python packages

   * For details, have a look at the **Spex** package configuration (``setup.cfg``)

3. C libraries used by the Python packages

   * Specifically, **Spex** uses the Python package ``lxml``, which in turn
     requires the ``libxml2`` C library to be present on the system.
   * This may change and more dependencies may be added, see the ``flake.nix`` file for full details.


.. note::
The setup of the above requirements is specific to the environment that you are
using. The only supported environment is the reference environment, managed by
:ref:`sec-setup-nix`.
84 changes: 84 additions & 0 deletions _sources/setup/nix.rst.txt
@@ -0,0 +1,84 @@
.. _sec-setup-nix:

Setting up Spex with Nix
=========================

Development is done in an environment managed by ``nix`` which recreates the
*exact* same environment as is used for running, developing and testing
**Spex**. To create the environment, you need to install ``nix``.

Using the Nix environment
-------------------------
If you have not yet installed nix, see :ref:`sec-setup-nix-install` below.

Run Spex (development)
~~~~~~~~~~~~~~~~~~~~~~
To run Spex in a development context, where unversioned local changes are taken into account:

Enter the development environment::

    nix develop .#

Run spex::

    spex

.. note::
   The ``spex`` command is actually an alias defined in the shell described by ``flake.nix``. The alias modifies
   the ``PYTHONPATH`` variable to put the ``./src`` directory on ``sys.path``, where modules are searched, and
   uses the ``-m`` flag to execute the ``spex`` module.

   This means ``spex`` uses the local source code files, and any changes made to the source will be reflected
   next time you run ``spex``. However, it also *requires* you to stand in the project root (or ``src``) directory.

   For details on how executable modules work and the ``python -m <module>`` command, see
   `Python docs on __main__.py in Python Packages <https://docs.python.org/3/library/__main__.html#main-py-in-python-packages>`_.
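
As a rough illustration of the mechanism described in the note (this is not the actual alias from ``flake.nix``), the sketch below creates a throwaway package in a temporary ``src`` directory, puts that directory on ``PYTHONPATH`` and runs the package with ``python -m``. The package name ``demo_pkg`` and both helper functions are hypothetical and only stand in for ``spex``.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_module_from_src(src_dir: Path, module: str) -> str:
    """Run `python -m <module>` with src_dir prepended to PYTHONPATH."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    # Prepend src_dir so it ends up on sys.path of the child interpreter.
    env["PYTHONPATH"] = (
        str(src_dir) if not existing else f"{src_dir}{os.pathsep}{existing}"
    )
    result = subprocess.run(
        [sys.executable, "-m", module],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def demo() -> str:
    """Build a tiny executable package and run it from its source directory."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        pkg = src / "demo_pkg"  # hypothetical package, stands in for spex
        pkg.mkdir(parents=True)
        (pkg / "__init__.py").write_text("")
        # __main__.py is what `python -m demo_pkg` executes.
        (pkg / "__main__.py").write_text("print('running from local source')\n")
        return run_module_from_src(src, "demo_pkg")


if __name__ == "__main__":
    print(demo())
```

Because the module is resolved from the source directory each time, edits to the files there take effect on the next run, which is the behaviour the note describes for the development shell.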

Now run ``spex -h`` (or just ``spex`` without arguments) to see which arguments you can provide.
For information on using Spex, see :ref:`sec-using-spex`.

Run the Spex program
~~~~~~~~~~~~~~~~~~~~
You can run spex, even without cloning the repository, like so::

    nix run github:SamsungDS/Spex#spex

.. note::
   This is not intended for development use: it will not reflect any local changes made to the source code.

.. _sec-setup-nix-install:

Install Nix
-----------
Skip this section if you have already installed Nix.

Linux & macOS
~~~~~~~~~~~~~

On Linux and macOS, run the following to install nix::

    curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install


Windows (WSL)
~~~~~~~~~~~~~
Windows can use Nix through the Windows Subsystem for Linux (WSL) environment.

First install WSL. Open a command prompt and type::

    wsl --install

You may have to reboot the machine afterwards.

Then install an Ubuntu WSL VM::

    wsl --install -d ubuntu


Then, from *within the WSL environment* (type ``wsl`` in the command prompt to enter), install Nix in *single-user mode*::

    sh <(curl -L https://nixos.org/nix/install) --no-daemon


Finally, close your command prompt(s) and start a new one. Nix should now be installed and ready for use!

134 changes: 134 additions & 0 deletions _sources/user_guide/dev.rst.txt
@@ -0,0 +1,134 @@
.. _sec-guide-dev:

Development Guide
=================

The following are some quick notes to help you get started with the **Spex** codebase.

Overview
--------

1. Extract table-like figures and document metadata to create the stage 1 document.
2. Using document metadata, select the appropriate **DocumentParser**.
3. Use the **DocumentParser** to iterate over all figures.
4. For each figure, extract its contents (fields/values) using an **Extractor**.
5. Save the result to the stage 2 document.

Terminology
-----------
**DocumentParser**
A class responsible for driving the parsing of a document. It defines a method to iterate
over figures, defines a list of default **Extractors** to try when extracting the contents
of a figure and allows explicitly overriding/defining a specific extractor to use for a figure.
Finally, a **DocumentParser** also allows you to override the label (field name) and brief
(one-line description) of any field of any figure - as a way to enhance or improve the
generated output.

**Extractor**
An extractor is responsible for *extracting* the raw contents of a figure (an HTML table),
making sense of it (i.e. identifying field names, ranges etc) and transforming it into
one of the standard **entity** types which **Spex** operates with, namely **value** and
**struct**.

**Entity**
The stage 2 document (JSON) will contain a list of entities, each corresponding either
to a top-level figure, or some figure found nested within another figure.
An entity can presently be of two types: **value**, which describes something typically
rendered in C as an enum, or a (bit|byte) **struct**. A struct entity has one or more
fields with clearly defined ordering and sizes, as defined by their ranges. A struct with
bit ranges could be represented as a bit field in C, or as an array of values with macros
to extract each "field". A struct with byte ranges cleanly maps to a packed struct in C.
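
To make the bit-range idea concrete, here is a minimal sketch of how fields described by inclusive bit ranges could be pulled out of a raw integer, much like the C macros mentioned above. The ``BitField`` class, the helper functions and the sample layout are all hypothetical and do not reflect the schema Spex actually emits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitField:
    """A field occupying bits high..low (both inclusive) of a raw value."""

    label: str
    high: int
    low: int

    @property
    def width(self) -> int:
        return self.high - self.low + 1


def extract_field(raw: int, field: BitField) -> int:
    """Shift the field down to bit 0 and mask off everything above it."""
    mask = (1 << field.width) - 1
    return (raw >> field.low) & mask


def decode(raw: int, fields: list[BitField]) -> dict[str, int]:
    """Decode every field of a struct-like layout from one raw value."""
    return {f.label: extract_field(raw, f) for f in fields}


# Made-up 8-bit layout, for illustration only.
EXAMPLE_FIELDS = [BitField("kind", 7, 4), BitField("flags", 3, 0)]

if __name__ == "__main__":
    print(decode(0xA5, EXAMPLE_FIELDS))
```

The ranges carry both ordering and size, which is why a struct entity can be turned into either a bit field or a set of shift-and-mask accessors.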

**Stage 0/1/2 Document**
See :ref:`sec-guide-stages`.

**Quirks**
Ideally, it should be possible to parse all figures across all documents using the same set of
standard extractors. We expect to work toward this in the future, but for now, we may need
to implement custom processing for specific figures. To do this, we create a custom
**DocumentParser** class and define overrides for specific figures within the document.
We then install this **DocumentParser**, containing the relevant overrides, into the
central **quirks map**, where **Spex** looks for a **DocumentParser** based on the
``(title, revision)`` key extracted from the document's page header.

(Detailed) Overview
-------------------

1. Create stage 1 document
~~~~~~~~~~~~~~~~~~~~~~~~~~
This stage is skipped if the input document is already a stage 1 document (HTML).

Otherwise, the original full specification is opened and all table-like figures
are extracted from the docx file and their form is translated into HTML and stored
into a stage 1 document.

Additionally, the document title and revision, visible in the page header on each
page, are extracted and embedded into the document. **Spex** uses this to uniquely
identify the document, which allows us to define a custom **DocumentParser**
for the document, if need be.
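
The skipping behaviour can be sketched as follows. This is only an illustration: the suffix-to-stage mapping follows the stage list in the documentation (docx, HTML, JSON), which the documentation notes may change, and the function names are hypothetical rather than part of Spex.

```python
from pathlib import Path

# Assumed mapping from file suffix to the stage that file represents.
STAGE_BY_SUFFIX = {".docx": 0, ".html": 1, ".json": 2}


def input_stage(path: str) -> int:
    """Return the stage number of an input document, based on its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return STAGE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported input type: {suffix!r}") from None


def stages_to_run(path: str, target: int = 2) -> list[int]:
    """List the stages that still have to be produced to reach `target`."""
    return list(range(input_stage(path) + 1, target + 1))


if __name__ == "__main__":
    print(stages_to_run("nvme_base.docx"))
    print(stages_to_run("nvme_base.html"))
```

With a docx input both stage 1 and stage 2 are produced, whereas an HTML input goes straight to stage 2.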

2. Select appropriate **DocumentParser**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Spex** determines the correct **DocumentParser** to use by forming a unique
key from the document title and revision, and looking it up in a "quirks"
dictionary.
The quirks dictionary is defined in ``spex/jsonspec/quirks/__init__.py``.
Specialized document parsers are defined under ``spex/jsonspec/quirks``.

If the key is not found, the default **DocumentParser** is used. Note that a specific
**DocumentParser** is only necessary if we need to override which
extractor is applied to one or more figures, manually provide labels for
one or more fields or otherwise override parsing.
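
The lookup with a default fallback might look roughly like the sketch below. The classes are empty stand-ins, and the ``QUIRKS`` name, the example key and ``select_parser`` are assumptions made for illustration; the real dictionary lives in ``spex/jsonspec/quirks/__init__.py`` and may be shaped differently.

```python
class DocumentParser:
    """Stand-in for the default document parser."""


class ExampleSpecParser(DocumentParser):
    """Hypothetical parser carrying overrides for one specific document."""


# Hypothetical quirks map keyed by (title, revision).
QUIRKS: dict[tuple[str, str], type[DocumentParser]] = {
    ("Example Specification", "1.0"): ExampleSpecParser,
}


def select_parser(title: str, revision: str) -> type[DocumentParser]:
    """Pick the specialized parser if one is registered, else the default."""
    return QUIRKS.get((title, revision), DocumentParser)


if __name__ == "__main__":
    print(select_parser("Example Specification", "1.0").__name__)
    print(select_parser("Some Other Document", "2.0").__name__)
```

Keying on both title and revision means a quirk written for one revision does not silently apply to a later one.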

3. Iterate over all figures
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The **DocumentParser** class (``spex/jsonspec/document.py``) provides the
``DocumentParser.parse`` method, which iterates over all top-level figures
in order, calling ``DocumentParser._on_parse_fig()`` on each figure found.

The ``_on_parse_fig()`` method then finds an appropriate **Extractor** to
apply (see below) which, among other things, may encounter additional
*nested* figures.
It is the responsibility of the **Extractor** to invoke a callback
(which points to ``DocumentParser._on_parse_fig()``) for each nested
figure it encounters.
This allows an **Extractor** to facilitate recursively parsing all relevant
nested figures, while ignoring irrelevant tables in other columns, for instance.
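
The callback arrangement can be sketched with plain functions. This is a toy model under stated assumptions: figures are represented as dictionaries with hypothetical ``id`` and ``nested`` keys, entities are plain strings, and the function names only loosely mirror ``DocumentParser.parse`` and ``DocumentParser._on_parse_fig()``.

```python
from typing import Callable, Iterator

# Hypothetical figure shape: {"id": str, "nested": [figure, ...]}
Figure = dict
OnNested = Callable[[Figure], Iterator[str]]


def extract(fig: Figure, on_nested: OnNested) -> Iterator[str]:
    """Toy extractor: emit one entity, then hand nested figures to the callback."""
    yield f"entity:{fig['id']}"
    for child in fig.get("nested", []):
        # The extractor decides which nested figures are relevant.
        yield from on_nested(child)


def on_parse_fig(fig: Figure) -> Iterator[str]:
    """Apply the extractor, passing itself as the nested-figure callback."""
    yield from extract(fig, on_parse_fig)


def parse(figures: list[Figure]) -> Iterator[str]:
    """Iterate over top-level figures in order."""
    for fig in figures:
        yield from on_parse_fig(fig)


if __name__ == "__main__":
    doc = [{"id": "1", "nested": [{"id": "1_0"}]}, {"id": "2"}]
    print(list(parse(doc)))
```

Since the extractor owns the decision to invoke the callback, it can skip tables that are not real nested figures without the parser needing to know about them.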

4. Apply an appropriate **Extractor** to extract figure contents
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For each figure encountered, ``DocumentParser._on_parse_fig()`` is called.
This method first finds a suitable **Extractor** to use (or errors out), then
applies it, getting a generator of **Entities**, which it yields to its caller in turn.

Finding the **Extractor** to use
""""""""""""""""""""""""""""""""
The process to determine the **Extractor** to use is the following:

First, look in ``DocumentParser.fig_extractor_overrides`` for an entry
whose key matches the ID of this figure.
This option is only used in case the figure requires special processing.
In that case, we may provide a special extractor, usually a specialization
of the bits, bytes or value extractor, to handle processing.
These are typically defined alongside the specialized **DocumentParser**
in the ``spex/jsonspec/extractors/quirks`` package.

.. note::
**How are figures assigned an ID?**
For top-level figures, the ID is the number of the figure itself. For
nested figures, the **Extractor** used should construct a unique ID from
the parent ID and some unique data from the row, typically the bit/byte
offset or value from a value table.
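
A minimal sketch of such an ID scheme is shown below. The underscore separator and the function name are assumptions made for illustration; the actual format used by the Spex extractors may differ.

```python
def nested_figure_id(parent_id: str, row_key: str) -> str:
    """Combine the parent figure ID with unique row data (e.g. a bit/byte offset)."""
    return f"{parent_id}_{row_key}"


def nested_ids(parent_id: str, row_keys: list[str]) -> list[str]:
    """Build IDs for all nested figures of one parent and check they are unique."""
    ids = [nested_figure_id(parent_id, key) for key in row_keys]
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate nested figure IDs under {parent_id!r}")
    return ids


if __name__ == "__main__":
    print(nested_ids("12", ["7:4", "3:0"]))
```

Uniqueness follows from the row data being unique within the parent, which is why an offset or a table value is a natural choice.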


In most cases, there is no specific override for a figure, and so the default
extractors, as specified by ``DocumentParser.extractors`` are tried, in order.
These are presently ``BytesTableExtractor``, ``ValueTableExtractor`` and
``BitsTableExtractor``, all defined in ``spex/jsonspec/extractors``.

In case of these extractors, the method tries each in turn, calling
the ``Extractor.can_apply()`` method on each, providing the extractor the
columns of the table. It is from the column names alone that an
extractor decides whether it is applicable or not.
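
The selection order described above can be sketched as follows. The class names come from this guide, but everything else is an assumption: the required column names are invented, ``can_apply`` is modelled as a simple subset check on normalized column names, and ``find_extractor`` is a hypothetical helper rather than the actual method.

```python
class Extractor:
    """Stand-in base class; required_columns is a hypothetical attribute."""

    required_columns: frozenset[str] = frozenset()

    @classmethod
    def can_apply(cls, columns: list[str]) -> bool:
        # Decide applicability from the column names alone.
        normalized = {c.strip().lower() for c in columns}
        return cls.required_columns <= normalized


class BytesTableExtractor(Extractor):
    required_columns = frozenset({"bytes", "description"})  # assumed columns


class ValueTableExtractor(Extractor):
    required_columns = frozenset({"value", "definition"})  # assumed columns


class BitsTableExtractor(Extractor):
    required_columns = frozenset({"bits", "description"})  # assumed columns


DEFAULT_EXTRACTORS = [BytesTableExtractor, ValueTableExtractor, BitsTableExtractor]


def find_extractor(
    fig_id: str,
    columns: list[str],
    overrides: dict[str, type[Extractor]],
) -> type[Extractor]:
    """Prefer a per-figure override, then try the defaults in order."""
    if fig_id in overrides:
        return overrides[fig_id]
    for extractor in DEFAULT_EXTRACTORS:
        if extractor.can_apply(columns):
            return extractor
    raise LookupError(f"no extractor applies to figure {fig_id!r}")


if __name__ == "__main__":
    print(find_extractor("5", ["Bits", "Description"], {}).__name__)
```

Checking overrides first keeps special cases out of the default extractors, so the defaults can stay generic.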


35 changes: 35 additions & 0 deletions _sources/user_guide/stages.rst.txt
@@ -0,0 +1,35 @@
.. _sec-guide-stages:

Stages
======

Spex is deliberately designed to break processing into a series of
discrete and individually verifiable stages.

Stage artifacts
~~~~~~~~~~~~~~~

Each stage generates one or more files as its output. These files are referred
to by their stage elsewhere in the documentation. Each stage's output is presently
a file of a different file type, but this may change in the future, as
additional processing steps may be added. Thus it is better to refer to a
document by its stage than by its file type.

* **Stage 0** - the NVMe specification document (in docx)
* **Stage 1** - the HTML file containing an extract of all table-like figures from stage 0.
* **Stage 2** - the JSON document representing all data-structures found in the document. These are either enum-like pairs of names and assigned codes/values, or struct-like data-structures, where each field is described by its range and field name.

Why organize the program into stages?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Breaking the program into distinct stages is beneficial to development and
testing/verifying the output produced. However, it also has an important benefit to
users: the ability to opt-out at any point of processing.

Each step of processing makes the output more predictable and easy to process, but
it also throws away information. For example, in processing the docx document to
HTML, Spex throws away every graphic and all text between the tables.

In producing the stage 2 output, the data-structure model, Spex discards supplementary
text, figure headers, table section headers and so on.
Each stage further refines the data, but discards other data in the process.