
Added new doc site
jafioti committed Apr 24, 2024
1 parent 1424c40 commit 5b8e922
Showing 20 changed files with 298 additions and 42 deletions.
19 changes: 0 additions & 19 deletions docs/02 GraphTensor API.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/05 Serialization.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/06 Writing Compilers.md

This file was deleted.

4 changes: 4 additions & 0 deletions docs/README.md
@@ -0,0 +1,4 @@
```
npm i -g mintlify
mintlify dev
```
48 changes: 48 additions & 0 deletions docs/blog/4-24-2024.mdx
@@ -0,0 +1,48 @@
---
title: 'Luminal: Efficient ML in Rust through graph compilation'
description: 'A new approach to ML'
---
![](https://raw.githubusercontent.com/jafioti/luminal/main/dag.jpeg)

**Luminal is a deep learning library that uses composable compilers to achieve high performance.**

Current ML libraries tend to be large and complex because they try to map high-level operations directly onto low-level handwritten kernels, and focus on eager execution. Libraries like PyTorch contain hundreds of thousands of lines of code, making it nearly impossible for a single programmer to understand it all, let alone do a large refactor.

But does it need to be so complex? ML models tend to be static dataflow graphs made up of a few simple operators. This allows us to have a dirt-simple core that supports only a few primitive operations, and use them to build up complex neural networks. We can then write compilers that modify the graph after we build it, swapping in more efficient ops depending on which backend we're running on.

Luminal takes this approach to the extreme, supporting only 12 primitive operations (primops):
- **Unary** - Log2, Exp2, Sin, Sqrt, Recip
- **Binary** - Add, Mul, Mod, LessThan
- **Other** - SumReduce, MaxReduce, Contiguous

Every complex operation boils down to these primitive operations, so when you do `a - b` for instance, `add(a, mul(b, -1))` gets written to the graph. Or when you do `a.matmul(b)`, what actually gets put on the graph is `sum_reduce(mul(reshape(a), reshape(b)))`.
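
To make the decomposition concrete, here's a tiny, self-contained sketch. This is purely illustrative and is **not** Luminal's actual IR; the `PrimOp` enum and `subtract` function below are made up for the example, showing only how a higher-level op can lower into the primitive set:
```rust
// Illustrative toy IR, not Luminal's real graph types.
#[derive(Debug)]
enum PrimOp {
    Add(Box<PrimOp>, Box<PrimOp>),
    Mul(Box<PrimOp>, Box<PrimOp>),
    Const(f32),
    Input(&'static str),
}

// There is no Subtract primop: a - b lowers to add(a, mul(b, -1)).
fn subtract(a: PrimOp, b: PrimOp) -> PrimOp {
    PrimOp::Add(
        Box::new(a),
        Box::new(PrimOp::Mul(Box::new(b), Box::new(PrimOp::Const(-1.0)))),
    )
}

fn main() {
    let graph = subtract(PrimOp::Input("a"), PrimOp::Input("b"));
    println!("{graph:?}"); // Add(Input("a"), Mul(Input("b"), Const(-1.0)))
}
```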

Once the graph is built, iterative compiler passes can modify it to replace primops with more efficient ops, depending on the device it's running on. On Nvidia cards, for instance, efficient Cuda kernels are written on the fly to replace these ops, and specialized cublas kernels are swapped in for supported operations.

This approach leads to a simple library, and performance is only limited by the creativity of the compiler programmer, not the model programmer.

Luminal has a number of other neat features; check out the repo [here](https://github.com/jafioti/luminal).

## Welcome

There are two ways to build API documentation: [OpenAPI](https://mintlify.com/docs/api-playground/openapi/setup) and [MDX components](https://mintlify.com/docs/api-playground/mdx/configuration). For the starter kit, we are using the following OpenAPI specification.

<Card
title="Plant Store Endpoints"
icon="leaf"
href="https://github.com/mintlify/starter/blob/main/api-reference/openapi.json"
>
View the OpenAPI specification file
</Card>

## Authentication

All API endpoints are authenticated using Bearer tokens and picked up from the specification file.

```json
"security": [
{
"bearerAuth": []
}
]
```
4 changes: 4 additions & 0 deletions docs/blog/endpoint/create.mdx
@@ -0,0 +1,4 @@
---
title: 'Create Plant'
openapi: 'POST /plants'
---
4 changes: 4 additions & 0 deletions docs/blog/endpoint/delete.mdx
@@ -0,0 +1,4 @@
---
title: 'Delete Plant'
openapi: 'DELETE /plants/{id}'
---
4 changes: 4 additions & 0 deletions docs/blog/endpoint/get.mdx
@@ -0,0 +1,4 @@
---
title: 'Get Plants'
openapi: 'GET /plants'
---
24 changes: 18 additions & 6 deletions docs/CONTRIBUTING.md → docs/developers/introduction.mdx
@@ -1,10 +1,22 @@
# Contributing to luminal
![image](https://raw.githubusercontent.com/jafioti/luminal/main/resources/dag.jpeg)
---
title: Developing Luminal
description: 'Building the future of ML.'
icon: 'hand-wave'
---

Please take a look at the [issues](https://github.com/jafioti/luminal/issues) and [roadmap](https://github.com/users/jafioti/projects/1) to see what's targeted for upcoming releases. Contributions for those features are preferred and will be reviewed and merged very rapidly. Other contributions are welcome, but please note luminal is and always will be a fairly minimal library.
<img
className="block dark:hidden rounded-xl"
src="/images/abstract_light.jpg"
alt="Hero Light"
/>
<img
className="hidden dark:block rounded-xl"
src="/images/abstract.jpg"
alt="Hero Dark"
/>

The core design of luminal is heavily predicated on extensibility. Compilers allow for immense complexity to be removed from the core library and added with third-party compilers. For instance, datatypes and devices are typically first class primitives. In luminal, they're compilers and the core has no idea about them. This is the general trend we'll stick to: core remains brutally simple, and everything that can be externalized to a compiler will be.
Please take a look at the [issues](https://github.com/jafioti/luminal/issues) and [roadmap](https://github.com/users/jafioti/projects/1) to see what's targeted for upcoming releases. Contributions for those features are preferred and will be reviewed and merged very rapidly. Other contributions are welcome, but please note Luminal is and always will be a fairly minimal library.

We will be adding training support soon, and as you guessed, it will entirely reside in a compiler. Just define the model's graph, run the output through an optimizer, and then run the `AutogradCompiler` before any other compilers. Boom, we've got training, and the core of the library has no idea! (aside from some quality-of-life APIs)
The core design of Luminal is heavily predicated on extensibility. Compilers allow for immense complexity to be removed from the core library and added with third-party compilers. For instance, datatypes and devices are typically first class primitives. In Luminal, they're compilers and the core has no idea about them. This is the general trend we'll stick to: core remains brutally simple, and everything that can be externalized to a compiler will be.

PRs that remove complexity are always welcome, but note that line count is often a bad proxy for complexity. Ideally the entire luminal core should be a few thousand lines of code, but anything remotely resembling code golf is not allowed.
PRs that remove complexity are always welcome, but note that line count is often a bad proxy for complexity. Ideally the entire Luminal core should be a few thousand lines of code, but anything remotely resembling code golf is not allowed.
21 changes: 12 additions & 9 deletions docs/04 Compilers.md → docs/docs/compilers.mdx
@@ -1,27 +1,30 @@
# Compilers
---
title: Compilers
description: 'Core transformations of the computation graph.'
icon: 'microchip'
---

So now we have our graph all set up. We did our forward passes through the model, so now what? Do we run it?

We could! But it wouldn't be very fast. Right now your graph is full of **primops**, which are the simplest set of primitive operations in luminal. One of the key tenets of luminal is a small primop set, which makes it easy to add new backends and write compilers. But another consequence of a small primop set is that even simple operations usually end up creating quite a few graph operations, and even small neural networks can end up with hundreds or thousands of primops, which are slow to run directly. So it's time to compile the graph!

Compilers are structs that implement the `Compiler` trait, which simply specifies a single function:
We use a loose definition of a compiler. Compilers are structs that implement the `Compiler` trait, which simply specifies a single function:
```rust
pub trait Compiler {
type Output = ();
/// Run a compilation pass
fn compile<T: ToIdsMut>(&self, graph: &mut Graph, remap: T);
fn compile<T: ToIdsMut>(&self, graph: &mut Graph, remap: T) -> Self::Output;
}
```
So all a compiler does is take a mutable reference to the graph and something called remap (beyond the scope of this introduction), and do something to the graph. That something is compilation, usually in the form of finding patterns of nodes and replacing them with other nodes. For instance, there's no Subtract operation in the primops, so subtractions are implemented as `add(a, mul(b, -1))`. We can have a compiler that looks for that pattern of nodes and directly replaces it with a `Subtract` operation. We'll look at how to do this in the [Writing Compilers](https://github.com/jafioti/luminal/blob/main/docs/06%20Writing%20Compilers.md) section.
So all a compiler does is take a mutable reference to the graph and something called remap (beyond the scope of this introduction), and do something to the graph. That something is compilation, usually in the form of finding patterns of nodes and replacing them with other nodes. For instance, there's no Subtract operation in the primops, so subtractions are implemented as `add(a, mul(b, -1))`. We can have a compiler that looks for that pattern of nodes and directly replaces it with a `Subtract` operation. We'll look at how to do this in the [Writing Compilers](/developers/compilers) section.
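
As a rough sketch of the shape such a compiler takes (this is **not** the actual `SubtractionCompiler` source; the real one uses Luminal's graph-selector utilities, which are only hinted at in comments here), an implementation of the trait above might look like:
```rust
// Hedged skeleton only: the pattern-matching and rewiring steps are elided.
pub struct SubtractionCompiler;

impl Compiler for SubtractionCompiler {
    type Output = ();
    fn compile<T: ToIdsMut>(&self, _graph: &mut Graph, _remap: T) {
        // 1. Search the graph for the add(a, mul(b, -1)) node pattern.
        // 2. Insert a single Subtract node wired to a and b.
        // 3. Re-point downstream consumers and remove the old nodes.
    }
}
```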

All you need to know for now is that we can use this compiler on the graph by doing:
```rust
cx.compile(SubtractionCompiler::default());
```
Now the graph will have the old mul + add pattern removed and Subtract ops placed in. There are plenty of different compilers for different purposes. Some of the popular ones:
- GenericCompiler - A handful of hardware-agnostic optimizations like [CSE](https://en.wikipedia.org/wiki/Common_subexpression_elimination) to be run before any hardware-specific compilers.
- CudaCompiler<T> - The full stack of cuda compilers to convert a graph to a cuda-specialized graph with T as the datatype (either f32 or f16). Imported from luminal_cuda
- MetalCompiler<T> - Same as CudaCompiler. Imported from luminal_metal
- CudaCompiler\<T\> - The full stack of cuda compilers to convert a graph to a cuda-specialized graph with T as the datatype (either f32 or f16). Imported from luminal_cuda.
- MetalCompiler\<T\> - Same as CudaCompiler. Imported from luminal_metal.
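
For example, a typical flow is to run the generic passes and then a device-specific stack. A hedged sketch, following the same `cx.compile(...)` call shown above (whether `::default()` constructs these exact compiler types, and how they compose, may vary by Luminal version):
```rust
// Hedged sketch: hardware-agnostic passes first, then a device-specific stack.
// `f16` is assumed to come from the `half` crate re-exported by Luminal.
cx.compile(GenericCompiler::default());
cx.compile(MetalCompiler::<f16>::default()); // from luminal_metal, macOS only
```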

Compilers are entirely separate from luminal, so they can be fully implemented by third party crates. For instance, everything specific to Cuda is contained in luminal_cuda.

[Now let's look into how to load weights from a file.](https://github.com/jafioti/luminal/blob/main/docs/05%20Serialization.md)
Compilers are entirely separate from luminal, so they can be fully implemented by third party crates. For instance, everything specific to Cuda is contained in luminal_cuda.
27 changes: 25 additions & 2 deletions docs/01 Introduction.md → docs/docs/graphtensor.mdx
@@ -1,5 +1,10 @@
# Luminal Introduction
---
title: GraphTensor API
description: 'The high-level interface for writing ML code, checked at compile time.'
icon: 'webhook'
---

## Familiarizing ourselves
Let's get up to speed with how to use luminal, and how it works internally.

First we'll take a look at what the simplest program will look like:
@@ -35,4 +40,22 @@ Then we set the data for these tensors. But if `GraphTensor` doesn't hold data,

Alright, that was a lot, but now we've touched on all the main aspects of running a model in luminal.

[Let's take a look at each piece in more depth.](https://github.com/jafioti/luminal/blob/main/docs/02%20GraphTensor%20API.md)
## GraphTensors

We're working with pretty complicated graphs to build our computation on, but we don't want to manually place all the nodes ourselves! So how can we build these static graphs in a nice, familiar way? GraphTensors!

Essentially GraphTensors are pointers to a specific node on the graph, as well as some metadata about the output of that node, such as its shape. We can make a new GraphTensor by doing:
```rust
let mut cx = Graph::new(); // We need a graph to build!
let a: GraphTensor<R1<3>> = cx.tensor(); // Here we create a new node on the graph and get a GraphTensor back, pointing to it.
```
Notice the type of `a`: `GraphTensor<R1<3>>`. So what's that generic all about? It's the shape! We make tensor shapes part of the type, so they're tracked at compile time! In this case, the shape is rank 1 with 3 elements, or in other words, a 3-dimensional vector. (Side note: `R1<N>` is a typedef of `(Const<N>,)`.) It should be impossible to accidentally get a runtime shape mismatch.
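
To illustrate the compile-time checking with a hypothetical example (reusing the `cx` graph from above; the variable names are ours), mismatched shapes simply don't type-check:
```rust
let v: GraphTensor<R1<3>> = cx.tensor();
let w: GraphTensor<R1<4>> = cx.tensor();
// let bad = v + w; // would not compile: GraphTensor<R1<3>> and GraphTensor<R1<4>>
//                  // are different types, so the mismatch is caught before runtime
```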

Now we can use `a` as you would in a library like PyTorch, performing linear algebra:
```rust
let b = a.exp().sqrt();
let c = b + a;
```
We just placed some ops on the graph! It doesn't look like it because you don't need to think about the graph while writing ML code.

Next we'll see how GraphTensors are used to build whole neural networks.
67 changes: 67 additions & 0 deletions docs/docs/introduction.mdx
@@ -0,0 +1,67 @@
---
title: Introduction
description: 'Welcome to a new way to do ML.'
icon: 'hand-wave'
---

<img
className="block dark:hidden rounded-xl"
src="/images/abstract_light.jpg"
alt="Hero Light"
/>
<img
className="hidden dark:block rounded-xl"
src="/images/abstract.jpg"
alt="Hero Dark"
/>

Luminal is a new machine learning framework focused on **speed**, **simplicity** and **composability**. We take a new approach to ML by focusing on static graphs and leaning heavily on compilers.

## Contents

Navigate around the Luminal docs.

<CardGroup cols={2}>
<Card
title="Quickstart"
icon="bolt"
href="/docs/quickstart"
>
Get up and running with ML models in a flash.
</Card>
<Card
title="Why Luminal"
icon="lightbulb"
href="/docs/why"
>
Dive into why Luminal was created and the design philosophy behind it.
</Card>
<Card
title="GraphTensor API"
icon="webhook"
href="/docs/graphtensor"
>
High-level interface for building models.
</Card>
<Card
title="Modules"
icon="shapes"
href="/docs/modules"
>
Composable building blocks of complex neural networks.
</Card>
<Card
title="Compilers"
icon="microchip"
href="/docs/compilers"
>
Core transformations of the computation graph.
</Card>
<Card
title="Developers"
icon="code"
href="/docs/developers"
>
Resources for contributors and future development.
</Card>
</CardGroup>
11 changes: 7 additions & 4 deletions docs/03 Modules.md → docs/docs/modules.mdx
@@ -1,4 +1,9 @@
# NN Modules
---
title: Modules
description: 'Composable building blocks of complex neural networks.'
icon: 'shapes'
---

Like any good DL library, we organize our networks into `Module`s. Here is the module trait:
```rust
/// A module with a forward pass
@@ -26,6 +31,4 @@ impl<const A: usize, const B: usize> Module<GraphTensor<R1<A>>> for Linear<A, B>
```
Here we see a single weight matrix of size AxB as the internal state. We've written a single forward function that takes input vectors of shape (A,) and matmuls them by our weight matrix to get an output of shape (B,).
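
Since the middle of the diff above is collapsed, here's a hedged sketch of what a minimal `Linear` along these lines could look like. The field name, the `R2` shape alias, and the `Module` trait shape (`type Output`, `forward`) are assumptions for illustration, not the exact Luminal source:
```rust
// Hedged sketch of a minimal Linear module; not the exact Luminal source.
pub struct Linear<const A: usize, const B: usize> {
    weight: GraphTensor<R2<A, B>>, // single AxB weight matrix as internal state
}

impl<const A: usize, const B: usize> Module<GraphTensor<R1<A>>> for Linear<A, B> {
    type Output = GraphTensor<R1<B>>;

    fn forward(&self, input: GraphTensor<R1<A>>) -> Self::Output {
        // (A,) matmul (A, B) -> (B,); the op is only recorded on the graph here.
        input.matmul(self.weight)
    }
}
```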

Now all of these ops are recorded on the graph, to be compiled and run later on.

[So how does this compilation work? Let's find out!](https://github.com/jafioti/luminal/blob/main/docs/04%20Compilers.md)
Now all of these ops are recorded on the graph, to be compiled and run later on.
39 changes: 39 additions & 0 deletions docs/docs/quickstart.mdx
@@ -0,0 +1,39 @@
---
title: 'Quickstart'
description: 'Start running ML models in minutes.'
icon: 'bolt'
---

## Clone the repo

Clone the codebase locally by running the following:
```bash
git clone https://github.com/jafioti/luminal
cd luminal
```

## Hello World

Simple examples demonstrate how a library works without diving in too deep. Run your first Luminal code like so:
```bash
cd ./examples
cargo run --release
```
Great! You've run your first Luminal model!

## Run Llama 3

Run the following to start generating text with Llama 3 8B:
```bash
cd ./examples/llama
# Download the model
bash ./setup/setup.sh
# Run the model
cargo run --release --features metal # MacOS (Recommended)
cargo run --release --features cuda # Nvidia
cargo run --release # CPU
```

<Warning>
Luminal currently isn't well optimized for CPU usage, so running large models like Llama 3 on CPU isn't recommended.
</Warning>
