.. _sec-welcome:

Welcome to NVMe-Spex's documentation!
=====================================

.. toctree::
   :maxdepth: 1
   :hidden:

   what_is_spex.rst
   setup/index.rst
   user_guide/stages.rst
   user_guide/using_spex.rst
   user_guide/dev.rst


Welcome to the documentation for **Spex**, a tool for extracting information
about the data structures defined in the NVMe specification documents.

To read more about what **Spex** does, see :ref:`sec-what-is-spex`.
For help on setting up **Spex** on your system, see :ref:`sec-setup`.

You can also run nvme-spex directly from Docker. For setting up Docker on
Windows, see the `Docker Desktop
<https://docs.docker.com/desktop/install/windows-install/>`_ installation guide.

.. code-block:: shell

   docker run --rm -v ~/Documents/specs/:/specs ghcr.io/samsungds/nvme-spex-webserver:latest run -s --output=/specs/output /specs/nvme_base.docx

In this example, the output of the run will be available in ``~/Documents/specs/output``.

You can also lint a docx specification using the web application.
Start the web application with the following commands:

.. code-block:: shell

   docker pull ghcr.io/samsungds/nvme-spex-webserver:latest
   docker run --rm -p 8000:8000 ghcr.io/samsungds/nvme-spex-webserver:latest webserver

Once the Docker container is running, the web application can be
accessed in the browser at `http://localhost:8000 <http://localhost:8000>`_.

The web application shows the following user interface:

.. image:: images/web_ss_1.png
   :width: 100%
   :alt: Spex web application start page

Upload the specification .docx or .html file and press the submit button.

.. image:: images/web_ss_2.png
   :width: 100%
   :alt: Spex web application while uploading a specification

After processing is done, the web application shows
the following report:

.. image:: images/web_ss_3.png
   :width: 100%
   :alt: Spex web application report

.. _sec-setup:

Setting up Spex
===============

.. toctree::
   :maxdepth: 1
   :hidden:

   nix.rst
   manual.rst

**Spex** has various dependencies. You can use :ref:`sec-setup-nix` to set up the reference
environment, which is actively used in development and tested in CI.

Otherwise, you can install Spex in the traditional way; see :ref:`sec-setup-manual`.

.. note::
   **A note on Spex's requirements**

   Please note that these dependencies may change, and others may be added.
   The *only* exhaustive description of the dependencies is the ``flake.nix`` file.

   Please understand that dependencies are chosen and/or upgraded
   to make development easier, increase software robustness or provide
   additional features.
   Dependencies will not be dropped, nor will code be rewritten, to support
   old software or conservative Linux distributions.

   You can use Nix to run Spex on such platforms.

.. _sec-setup-manual:

Setting up Spex manually
========================

.. note::
   In doubt as to which method to use when setting up Spex? See :ref:`sec-setup`.


.. warning::
   Spex reserves the right to update dependencies if it
   helps development or enables new features or better performance.

   If you choose the manual route, it is up to *you* to update your system
   accordingly.

**Spex** is implemented in Python and distributed via :pypi:`PyPI <>`, and is thus
installable via ``pip`` / ``pipx``::

    pipx install nvme-spex

Then run it::

    spex --help

For this to work, the following runtime requirements must be met:

1. Python >= 3.11

   * **Spex** relies on typing features introduced in Python 3.11.

2. Python packages

   * For details, see the package requirements listed in ``setup.cfg``.

3. C libraries used by the Python packages

   * Specifically, **Spex** uses the Python package ``lxml``, which in turn
     requires the ``libxml2`` C library to be present on the system.
   * This may change and more dependencies may be added; see the ``flake.nix`` file for full details.

.. note::
   The setup of the above requirements is specific to the environment that you are
   using. The only supported environment is the reference environment, managed by
   :ref:`sec-setup-nix`.
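The requirements above can be checked from Python before running **Spex**. The following is a minimal sketch that is not part of **Spex**: ``missing_requirements`` is a hypothetical helper, and it only checks the interpreter version and that ``lxml.etree`` can be imported. If ``lxml`` was built against a shared ``libxml2`` that cannot be found, that import fails as well, so requirement 3 is covered only indirectly.

.. code-block:: python

   import importlib
   import sys


   def missing_requirements(min_python=(3, 11), modules=("lxml.etree",)):
       """Return a list of problems; an empty list means every check passed.

       Hypothetical helper, not part of Spex.
       """
       problems = []
       if sys.version_info < min_python:
           wanted = ".".join(map(str, min_python))
           found = ".".join(map(str, sys.version_info[:3]))
           problems.append(f"Python >= {wanted} is required, found {found}")
       for name in modules:
           try:
               # Importing lxml.etree also loads the libxml2 library it links to.
               importlib.import_module(name)
           except ImportError as exc:
               problems.append(f"cannot import {name!r}: {exc}")
       return problems

Calling ``missing_requirements()`` with its defaults returns an empty list when both checks pass. The version and module list are parameters so the check can be adjusted as the dependencies in ``flake.nix`` change.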

.. _sec-setup-nix:

Setting up Spex with Nix
=========================

Development is done in an environment managed by ``nix``, which recreates the
*exact* same environment used for running, developing and testing
**Spex**. To create the environment, you need to install ``nix``.

Using the Nix environment
-------------------------
If you have not yet installed Nix, see :ref:`sec-setup-nix-install` below.

Run Spex (development)
~~~~~~~~~~~~~~~~~~~~~~
To run Spex in a development context, where unversioned local changes are taken into account,
first enter the development environment::

    nix develop .#

Then run Spex::

    spex

.. note::
   The ``spex`` command is actually an alias defined in the shell described by ``flake.nix``. It sets
   the ``PYTHONPATH`` variable to put the ``./src`` directory on ``sys.path`` (where Python searches for
   modules) and uses the ``-m`` flag to execute the ``spex`` module.

   This means ``spex`` uses the local source code files, and any changes made to the source will be reflected
   the next time you run ``spex``. However, it also *requires* you to run it from the project root (or ``src``) directory.

   For details on how executable modules and the ``python -m <module>`` command work, see
   `Python docs on __main__.py in Python Packages <https://docs.python.org/3/library/__main__.html#main-py-in-python-packages>`_.

Now run ``spex -h`` (or just ``spex`` without arguments) to see which arguments you can provide.
For information on using Spex, see :ref:`sec-using-spex`.
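The mechanism described in the note about the ``spex`` alias roughly corresponds to the following Python sketch. The actual alias lives in ``flake.nix`` and is not reproduced here; ``dev_env`` and ``run_spex_from_source`` are hypothetical helpers that only illustrate the ``PYTHONPATH`` plus ``python -m spex`` combination.

.. code-block:: python

   from __future__ import annotations

   import os
   import subprocess
   import sys
   from pathlib import Path


   def dev_env(project_root: str, base_env: dict[str, str] | None = None) -> dict[str, str]:
       """Copy of the environment with <project_root>/src prepended to PYTHONPATH."""
       env = dict(os.environ if base_env is None else base_env)
       src = str(Path(project_root) / "src")
       existing = env.get("PYTHONPATH")
       env["PYTHONPATH"] = src + os.pathsep + existing if existing else src
       return env


   def run_spex_from_source(args: list[str], project_root: str = ".") -> int:
       """Roughly `PYTHONPATH=./src python -m spex <args>`; returns the exit code."""
       root = str(Path(project_root).resolve())
       cmd = [sys.executable, "-m", "spex", *args]
       return subprocess.run(cmd, env=dev_env(root), cwd=root).returncode

Passing an explicit ``project_root`` removes the need to stand in the project root, which the relative ``./src`` path in the alias requires.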

Run the Spex program
~~~~~~~~~~~~~~~~~~~~
You can run Spex without cloning the repository, like so::

    nix run github:SamsungDS/Spex#spex

.. note::
   This is not intended for development use; it will not reflect any local changes made to the source code.

.. _sec-setup-nix-install:

Install Nix
-----------
Skip this section if you have already installed Nix.

Linux & macOS
~~~~~~~~~~~~~

On Linux and macOS, run the following to install Nix::

    curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install


Windows (WSL)
~~~~~~~~~~~~~
Windows can use Nix through the Windows Subsystem for Linux (WSL) environment.

First install WSL. Open a command prompt and type::

    wsl --install

You may have to reboot the machine afterwards.

Then install an Ubuntu WSL VM::

    wsl --install -d ubuntu


Then, from *within the WSL environment* (type ``wsl`` in the command prompt to enter it), install Nix in *single-user mode*::

    sh <(curl -L https://nixos.org/nix/install) --no-daemon


Finally, close your command prompt(s) and open a new one. Nix should now be installed and ready for use!

.. _sec-guide-dev:

Development Guide
=================

The following are some quick notes to help you get started with the **Spex** codebase.

Overview
--------

1. Extract table-like figures and document metadata to create the stage 1 document.
2. Using the document metadata, select the appropriate **DocumentParser**.
3. Use the **DocumentParser** to iterate over all figures.
4. For each figure, extract its contents (fields/values) using an **Extractor**.
5. Save the result to the stage 2 document.
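The five steps above can be sketched as a single function. This is an illustration only: ``Stage1Doc``, ``run_pipeline`` and the ``"entities"`` key of the output are hypothetical names, and the actual conversion and parsing are passed in as callables rather than taken from the **Spex** codebase.

.. code-block:: python

   from __future__ import annotations

   import json
   from dataclasses import dataclass, field
   from typing import Callable, Iterable


   @dataclass
   class Stage1Doc:
       """Hypothetical stand-in for the stage 1 HTML document."""
       title: str
       revision: str
       figures: list[dict] = field(default_factory=list)


   Parser = Callable[[Stage1Doc], Iterable[dict]]


   def run_pipeline(
       source: str | Stage1Doc,
       to_stage1: Callable[[str], Stage1Doc],
       select_parser: Callable[[str, str], Parser],
   ) -> str:
       # 1. Build the stage 1 document unless we were given one already.
       doc = source if isinstance(source, Stage1Doc) else to_stage1(source)
       # 2. Choose a DocumentParser from the document metadata.
       parse = select_parser(doc.title, doc.revision)
       # 3 + 4. The parser walks the figures and runs an Extractor on each one.
       entities = list(parse(doc))
       # 5. Serialize as the stage 2 document (the "entities" key is an assumption).
       return json.dumps({"entities": entities}, indent=2)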

Terminology
-----------
**DocumentParser**
   A class responsible for driving the parsing of a document. It defines a method to iterate
   over figures, defines a list of default **Extractors** to try when extracting the contents
   of a figure, and allows explicitly overriding/defining a specific extractor to use for a figure.
   Finally, a **DocumentParser** also allows you to override the label (field name) and brief
   (one-line description) of any field of any figure, as a way to enhance or improve the
   generated output.

**Extractor**
   An extractor is responsible for *extracting* the raw contents of a figure (an HTML table),
   making sense of it (i.e. identifying field names, ranges, etc.) and transforming it into
   one of the standard **entity** types which **Spex** operates with, namely **value** and
   **struct**.

**Entity**
   The stage 2 document (JSON) will contain a list of entities, each corresponding either
   to a top-level figure or to some figure nested within another figure.
   An entity can presently be of two types: a **value**, which describes something typically
   rendered in C as an enum, or a bit or byte **struct**. A struct entity has one or more
   fields with clearly defined ordering and sizes, as defined by their ranges. A struct with
   bit ranges could be represented as a bit field in C, or as an array of values with macros
   to extract each "field". A struct with byte ranges maps cleanly to a packed struct in C.

**Stage 0/1/2 Document**
   See :ref:`sec-guide-stages`.

**Quirks**
   Ideally, all figures across all documents could be parsed using the same set of
   standard extractors. We expect to work toward this in the future, but for now, we may need
   to implement custom processing for specific figures. To do this, we create a custom
   **DocumentParser** class and define overrides for specific figures within the document.
   We then install this **DocumentParser**, containing the relevant overrides, into the
   central **quirks map**, where **Spex** looks for a **DocumentParser** based on the
   ``(title, revision)`` key extracted from the document's page header.
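To make the **value** and **struct** entity types concrete, here is a minimal model. The class and attribute names (``Field``, ``StructEntity``, ``ValueEntity``, ``gaps``) are assumptions for illustration, not the schema of the stage 2 JSON. The ``gaps`` helper shows one reason why ordering and ranges matter: before a byte struct can be emitted as a packed C struct, every uncovered byte range needs an explicit reserved member.

.. code-block:: python

   from __future__ import annotations

   from dataclasses import dataclass


   @dataclass(frozen=True)
   class Field:
       label: str  # field name
       start: int  # first bit/byte of the range (inclusive)
       end: int    # last bit/byte of the range (inclusive)

       @property
       def width(self) -> int:
           return self.end - self.start + 1


   @dataclass
   class StructEntity:
       fig_id: str
       unit: str  # "bits" or "bytes"
       fields: list[Field]

       def ordered(self) -> list[Field]:
           return sorted(self.fields, key=lambda f: f.start)

       def gaps(self) -> list[tuple[int, int]]:
           """Inclusive ranges before the last field that no field covers."""
           out, pos = [], 0
           for f in self.ordered():
               if f.start > pos:
                   out.append((pos, f.start - 1))
               pos = max(pos, f.end + 1)
           return out


   @dataclass
   class ValueEntity:
       fig_id: str
       values: dict[int, str]  # code -> name, typically rendered as a C enum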

(Detailed) Overview
-------------------

1. Create stage 1 document
~~~~~~~~~~~~~~~~~~~~~~~~~~
This stage is skipped if the input document is already a stage 1 document (HTML).

Otherwise, the original full specification is opened, all table-like figures
are extracted from the docx file, and their layout is translated into HTML and stored
in a stage 1 document.

Additionally, the document title and revision, visible in the header of each
page, are extracted and embedded into the document. **Spex** uses these to uniquely
identify the document, which allows us to define a custom **DocumentParser**
for the document if need be.
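A .docx file is a ZIP archive whose main text lives in ``word/document.xml``, with tables stored as ``w:tbl`` elements. The sketch below shows the basic idea of turning those tables into HTML using only the standard library. It is not how **Spex** does it (**Spex** uses ``lxml``), and ``tables_to_html`` and ``docx_tables_to_html`` are hypothetical names. Page headers, where the title and revision appear, are stored in separate parts of the archive and are not handled here.

.. code-block:: python

   import html
   import xml.etree.ElementTree as ET
   import zipfile

   # WordprocessingML namespace used by word/document.xml.
   W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


   def cell_text(tc: ET.Element) -> str:
       # Concatenate every text run in the cell. Formatting, merged cells and
       # nested tables are simply flattened; a real converter must do better.
       return "".join(t.text or "" for t in tc.iter(f"{W}t"))


   def tables_to_html(document_xml: bytes) -> str:
       body = ET.fromstring(document_xml).find(f"{W}body")
       parts = []
       for tbl in body.findall(f"{W}tbl"):  # top-level tables only
           parts.append("<table>")
           for tr in tbl.findall(f"{W}tr"):
               cells = "".join(
                   f"<td>{html.escape(cell_text(tc))}</td>" for tc in tr.findall(f"{W}tc")
               )
               parts.append(f"<tr>{cells}</tr>")
           parts.append("</table>")
       return "\n".join(parts)


   def docx_tables_to_html(path: str) -> str:
       with zipfile.ZipFile(path) as zf:
           return tables_to_html(zf.read("word/document.xml"))

Paragraphs between the tables are dropped, which mirrors the information loss described for stage 1 in :ref:`sec-guide-stages`.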

2. Select appropriate **DocumentParser**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Spex** determines the correct **DocumentParser** to use by forming a unique
key from the document title and revision and looking it up in a "quirks"
dictionary.
The quirks dictionary is defined in ``spex/jsonspec/quirks/__init__.py``.
Specialized document parsers are defined under ``spex/jsonspec/quirks``.

If no entry matches the key, the default **DocumentParser** is used. Note that a specific
**DocumentParser** is only necessary if we need to override which
extractor is applied to one or more figures, manually provide labels for
one or more fields, or otherwise override parsing.
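The lookup described above amounts to a dictionary keyed by ``(title, revision)`` with a fallback to the default parser. In this sketch, ``QUIRKS``, ``select_parser`` and ``NvmBase20Parser`` are hypothetical, and the example key is illustrative rather than an entry taken from ``spex/jsonspec/quirks/__init__.py``.

.. code-block:: python

   from __future__ import annotations


   class DocumentParser:
       """Stand-in for the default parser."""


   class NvmBase20Parser(DocumentParser):
       """Hypothetical specialized parser that would carry figure overrides."""


   QUIRKS: dict[tuple[str, str], type[DocumentParser]] = {
       ("NVM Express Base Specification", "2.0"): NvmBase20Parser,  # illustrative key
   }


   def select_parser(title: str, revision: str) -> type[DocumentParser]:
       # Strip stray whitespace from the page header text (an assumption), then
       # fall back to the default parser when no quirk is registered.
       return QUIRKS.get((title.strip(), revision.strip()), DocumentParser)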

3. Iterate over all figures
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The **DocumentParser** class (``spex/jsonspec/document.py``) provides the
``DocumentParser.parse`` method, which iterates over all top-level figures
in order, calling ``DocumentParser._on_parse_fig()`` on each figure found.

The ``_on_parse_fig()`` method then finds an appropriate **Extractor** to
apply (see below), which, among other things, may encounter additional
*nested* figures.
It is the responsibility of the **Extractor** to invoke a callback
(which points to ``DocumentParser._on_parse_fig()``) for each nested
figure it encounters.
This allows an **Extractor** to recursively parse all relevant
nested figures while ignoring, for instance, irrelevant tables in other columns.
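The callback arrangement can be sketched as follows. The ``parse`` and ``_on_parse_fig`` names follow the text above; everything else (the dict-based ``Figure``, the ``extract`` function and the entity shape) is a hypothetical simplification. The point is that the parser hands its own bound ``_on_parse_fig`` method to the extractor, which calls it for each nested figure it considers relevant.

.. code-block:: python

   from __future__ import annotations

   from typing import Callable, Iterator

   Figure = dict  # hypothetical: {"id": str, "nested": [Figure, ...]}
   Callback = Callable[[Figure], Iterator[dict]]


   def extract(fig: Figure, on_nested: Callback) -> Iterator[dict]:
       """Hypothetical extractor: emit one entity, then hand nested figures back."""
       yield {"id": fig["id"]}
       for child in fig.get("nested", []):
           yield from on_nested(child)


   class DocumentParser:
       def __init__(self, figures: list[Figure]):
           self.figures = figures

       def parse(self) -> Iterator[dict]:
           for fig in self.figures:  # top-level figures, in document order
               yield from self._on_parse_fig(fig)

       def _on_parse_fig(self, fig: Figure) -> Iterator[dict]:
           # Passing the bound method lets the extractor recurse via the parser.
           yield from extract(fig, self._on_parse_fig)

This extractor follows every nested figure; a real extractor would be selective, for instance only following tables found in a particular column.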

4. Apply an appropriate **Extractor** to extract figure contents
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For each figure encountered, ``DocumentParser._on_parse_fig()`` is called.
This method first finds a suitable **Extractor** to use (or errors out), then
applies it, obtaining a generator of **Entities**, which it yields to its caller in turn.

Finding the **Extractor** to use
""""""""""""""""""""""""""""""""
The process to determine the **Extractor** to use is the following:

First, look in ``DocumentParser.fig_extractor_overrides`` for an entry
whose key matches the ID of this figure.
This option is only used when the figure requires special processing.
In that case, we may provide a special extractor, usually a specialization
of the bits, bytes or value extractor, to handle processing.
These are typically defined alongside the specialized **DocumentParser**
in the ``spex/jsonspec/extractors/quirks`` package.

.. note::
   **How are figures assigned an ID?**

   For top-level figures, the ID is the number of the figure itself. For
   nested figures, the **Extractor** used should construct a unique ID from
   the parent ID and some unique data from the row, typically the bit/byte
   offset or the value from a value table.
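The note only requires nested IDs to be unique and derived from the parent ID plus row data. A minimal sketch of such a scheme follows; the ``-`` separator, the ``:`` to ``_`` substitution and the function name ``nested_figure_id`` are assumptions, not the format **Spex** uses.

.. code-block:: python

   def nested_figure_id(parent_id: str, row_key: str) -> str:
       """Derive an ID for a figure nested in one row of figure ``parent_id``.

       ``row_key`` is the row's distinguishing data, e.g. a bit/byte range such
       as "07:04" or a value such as "1h". Hypothetical scheme.
       """
       key = "".join(row_key.split())  # drop all whitespace
       if not key:
           raise ValueError(f"row in figure {parent_id} has no key to build an ID from")
       return f"{parent_id}-{key.replace(':', '_')}"

Applied at each level, deeper nesting simply extends the parent ID: a value ``1h`` nested under bytes ``07:04`` of figure ``42`` becomes ``42-07_04-1h``.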

In most cases, there is no specific override for a figure, so the default
extractors, as specified by ``DocumentParser.extractors``, are tried in order.
These are presently ``BytesTableExtractor``, ``ValueTableExtractor`` and
``BitsTableExtractor``, all defined in ``spex/jsonspec/extractors``.

For these default extractors, the method tries each in turn, calling
the ``Extractor.can_apply()`` method on each and providing the extractor with the
columns of the table. An extractor decides whether it is applicable from
the column names alone.
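The selection order described above (explicit override first, then the first default extractor whose ``can_apply()`` accepts the columns) can be sketched as below. The extractor class names, ``fig_extractor_overrides`` and ``extractors`` come from the text; the column sets each extractor requires and the ``find_extractor`` method are assumptions made for illustration.

.. code-block:: python

   from __future__ import annotations

   from typing import Sequence


   class Extractor:
       required: frozenset[str] = frozenset()  # assumed column names

       @classmethod
       def can_apply(cls, columns: Sequence[str]) -> bool:
           # Decide from the (normalized) column names alone.
           return cls.required <= {c.strip().lower() for c in columns}


   class BytesTableExtractor(Extractor):
       required = frozenset({"bytes", "description"})


   class ValueTableExtractor(Extractor):
       required = frozenset({"value", "definition"})


   class BitsTableExtractor(Extractor):
       required = frozenset({"bits", "description"})


   class DocumentParser:
       extractors = [BytesTableExtractor, ValueTableExtractor, BitsTableExtractor]
       # Quirk parsers would replace this with their own figure-ID -> extractor map.
       fig_extractor_overrides: dict[str, type[Extractor]] = {}

       def find_extractor(self, fig_id: str, columns: Sequence[str]) -> type[Extractor]:
           if fig_id in self.fig_extractor_overrides:  # explicit override wins
               return self.fig_extractor_overrides[fig_id]
           for ext in self.extractors:  # otherwise, first default that applies
               if ext.can_apply(columns):
                   return ext
           raise ValueError(f"no extractor applies to figure {fig_id}: {list(columns)}")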

.. _sec-guide-stages:

Stages
======

Spex is deliberately designed to break processing into a series of
discrete and individually verifiable stages.

Stage artifacts
~~~~~~~~~~~~~~~

Each stage generates one or more files as its output. These files are referred
to by their stage elsewhere in the documentation. Each stage's output is presently
a file of a different type, but this may change in the future as
additional processing steps are added. Thus, it is better to refer to a
document by its stage than by its file type.

* **Stage 0** - the NVMe specification document (in docx format).
* **Stage 1** - the HTML file containing an extract of all table-like figures from stage 0.
* **Stage 2** - the JSON document representing all data structures found in the document. These are either enum-like pairs of names and assigned codes/values, or struct-like data structures, where each field is described by its range and field name.
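Because each stage has its own artifact, processing can start from any of them. The following is a minimal sketch, assuming the stage can be recognized from the file types listed above (which, as noted, may change); ``input_stage`` and ``stages_to_run`` are hypothetical helpers, not part of **Spex**.

.. code-block:: python

   from pathlib import Path

   # Assumed mapping from the file types listed above to their stage.
   STAGE_BY_SUFFIX = {".docx": 0, ".html": 1, ".json": 2}


   def input_stage(path: str) -> int:
       suffix = Path(path).suffix.lower()
       try:
           return STAGE_BY_SUFFIX[suffix]
       except KeyError:
           raise ValueError(f"unrecognized input type: {path}") from None


   def stages_to_run(path: str, final_stage: int = 2) -> list[int]:
       """Stages still needed to get from the input file to ``final_stage``."""
       return list(range(input_stage(path) + 1, final_stage + 1))

For example, an HTML input only needs stage 2, which matches the rule that stage 1 is skipped when the input is already a stage 1 document.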

Why organize the program into stages?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Breaking the program into distinct stages benefits development and the
testing/verification of the output produced. However, it also has an important benefit for
users: the ability to opt out at any point of processing.

Each processing step makes the output more predictable and easier to process, but
it also throws away information. For example, when converting the docx document to
HTML, Spex discards all graphics and any text between the tables.

In producing the stage 2 output, the data-structure model, Spex discards supplementary
text, figure headers, table section headers and so on.
Each stage further refines the data, but discards other data in the process.