
Added new doc site
jafioti committed Apr 24, 2024
1 parent 1424c40 commit 5b8e922
Showing 20 changed files with 298 additions and 42 deletions.
19 changes: 0 additions & 19 deletions docs/02 GraphTensor API.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/05 Serialization.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/06 Writing Compilers.md

This file was deleted.

4 changes: 4 additions & 0 deletions docs/README.md
@@ -0,0 +1,4 @@
```
npm i -g mintlify
mintlify dev
```
48 changes: 48 additions & 0 deletions docs/blog/4-24-2024.mdx
@@ -0,0 +1,48 @@
---
title: 'Luminal: Efficient ML in Rust through graph compilation'
description: 'A new approach to ML'
---
![](https://raw.githubusercontent.com/jafioti/luminal/main/dag.jpeg)

**Luminal is a deep learning library that uses composable compilers to achieve high performance.**

Current ML libraries tend to be large and complex because they try to map high-level operations directly onto low-level handwritten kernels, and focus on eager execution. Libraries like PyTorch contain hundreds of thousands of lines of code, making it nearly impossible for a single programmer to understand it all, let alone do a large refactor.

But does it need to be so complex? ML models tend to be static dataflow graphs made up of a few simple operators. This allows us to have a dirt-simple core that supports only a few primitive operations, and use them to build up complex neural networks. We can then write compilers that modify the graph after we build it, swapping in more efficient ops depending on which backend we're running on.

Luminal takes this approach to the extreme, supporting only 12 primitive operations (primops):
- **Unary** - Log2, Exp2, Sin, Sqrt, Recip
- **Binary** - Add, Mul, Mod, LessThan
- **Other** - SumReduce, MaxReduce, Contiguous

Every complex operation boils down to these primitive operations, so when you do `a - b` for instance, `add(a, mul(b, -1))` gets written to the graph. Or when you do `a.matmul(b)`, what actually gets put on the graph is `sum_reduce(mul(reshape(a), reshape(b)))`.
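
To make the decomposition concrete, here's a tiny, self-contained sketch. This is purely illustrative and is **not** Luminal's actual IR; the `PrimOp` enum and `subtract` function below are made up for the example, showing only how a higher-level op can lower into the primitive set:
```rust
// Illustrative toy IR, not Luminal's real graph types.
#[derive(Debug)]
enum PrimOp {
    Add(Box<PrimOp>, Box<PrimOp>),
    Mul(Box<PrimOp>, Box<PrimOp>),
    Const(f32),
    Input(&'static str),
}

// There is no Subtract primop: a - b lowers to add(a, mul(b, -1)).
fn subtract(a: PrimOp, b: PrimOp) -> PrimOp {
    PrimOp::Add(
        Box::new(a),
        Box::new(PrimOp::Mul(Box::new(b), Box::new(PrimOp::Const(-1.0)))),
    )
}

fn main() {
    let graph = subtract(PrimOp::Input("a"), PrimOp::Input("b"));
    println!("{graph:?}"); // Add(Input("a"), Mul(Input("b"), Const(-1.0)))
}
```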

Once the graph is built, iterative compiler passes can modify it to replace primops with more efficient ops, depending on the device it's running on. On Nvidia cards, for instance, efficient Cuda kernels are written on the fly to replace these ops, and specialized cublas kernels are swapped in for supported operations.

This approach leads to a simple library, and performance is only limited by the creativity of the compiler programmer, not the model programmer.

Luminal has a number of other neat features; check out the repo [here](https://github.com/jafioti/luminal).

## Welcome

There are two ways to build API documentation: [OpenAPI](https://mintlify.com/docs/api-playground/openapi/setup) and [MDX components](https://mintlify.com/docs/api-playground/mdx/configuration). For the starter kit, we are using the following OpenAPI specification.

<Card
title="Plant Store Endpoints"
icon="leaf"
href="https://github.com/mintlify/starter/blob/main/api-reference/openapi.json"
>
View the OpenAPI specification file
</Card>

## Authentication

All API endpoints are authenticated using Bearer tokens and picked up from the specification file.

```json
"security": [
{
"bearerAuth": []
}
]
```
4 changes: 4 additions & 0 deletions docs/blog/endpoint/create.mdx
@@ -0,0 +1,4 @@
---
title: 'Create Plant'
openapi: 'POST /plants'
---
4 changes: 4 additions & 0 deletions docs/blog/endpoint/delete.mdx
@@ -0,0 +1,4 @@
---
title: 'Delete Plant'
openapi: 'DELETE /plants/{id}'
---
4 changes: 4 additions & 0 deletions docs/blog/endpoint/get.mdx
@@ -0,0 +1,4 @@
---
title: 'Get Plants'
openapi: 'GET /plants'
---
24 changes: 18 additions & 6 deletions docs/CONTRIBUTING.md → docs/developers/introduction.mdx
@@ -1,10 +1,22 @@
# Contributing to luminal
![image](https://raw.githubusercontent.com/jafioti/luminal/main/resources/dag.jpeg)
---
title: Developing Luminal
description: 'Building the future of ML.'
icon: 'hand-wave'
---

Please take a look at the [issues](https://github.com/jafioti/luminal/issues) and [roadmap](https://github.com/users/jafioti/projects/1) to see what's targeted for upcoming releases. Contributions for those features are preferred and will be reviewed and merged very rapidly. Other contributions are welcome, but please note luminal is and always will be a fairly minimal library.
<img
className="block dark:hidden rounded-xl"
src="/images/abstract_light.jpg"
alt="Hero Light"
/>
<img
className="hidden dark:block rounded-xl"
src="/images/abstract.jpg"
alt="Hero Dark"
/>

The core design of luminal is heavily predicated on extensibility. Compilers allow for immense complexity to be removed from the core library and added with third-party compilers. For instance, datatypes and devices are typically first class primitives. In luminal, they're compilers and the core has no idea about them. This is the general trend we'll stick to: core remains brutally simple, and everything that can be externalized to a compiler will be.
Please take a look at the [issues](https://github.com/jafioti/luminal/issues) and [roadmap](https://github.com/users/jafioti/projects/1) to see what's targeted for upcoming releases. Contributions for those features are preferred and will be reviewed and merged very rapidly. Other contributions are welcome, but please note Luminal is and always will be a fairly minimal library.

We will be adding training support soon, and as you guessed, it will entirely reside in a compiler. Just define the model's graph, run the output through an optimizer, and then run the `AutogradCompiler` before any other compilers. Boom, we've got training, and the core of the library has no idea! (aside from some quality-of-life APIs)
The core design of Luminal is heavily predicated on extensibility. Compilers allow for immense complexity to be removed from the core library and added with third-party compilers. For instance, datatypes and devices are typically first class primitives. In Luminal, they're compilers and the core has no idea about them. This is the general trend we'll stick to: core remains brutally simple, and everything that can be externalized to a compiler will be.

PRs that remove complexity are always welcome, but note that line count is often a bad proxy for complexity. Ideally the entire luminal core should be a few thousand lines of code, but anything remotely resembling code golf is not allowed.
PRs that remove complexity are always welcome, but note that line count is often a bad proxy for complexity. Ideally the entire Luminal core should be a few thousand lines of code, but anything remotely resembling code golf is not allowed.
21 changes: 12 additions & 9 deletions docs/04 Compilers.md → docs/docs/compilers.mdx
@@ -1,27 +1,30 @@
# Compilers
---
title: Compilers
description: 'Core transformations of the computation graph.'
icon: 'microchip'
---

So now we have our graph all set up. We did our forward passes through the model, so now what? Do we run it?

We could! But it wouldn't be very fast. Right now your graph is full of **primops**, which are the simplest set of primitive operations in luminal. One of the key tenets of luminal is a small primop set, which makes it easy to add new backends and write compilers. But another consequence of a small primop set is that even simple operations usually end up creating quite a few graph operations, and even small neural networks can end up with hundreds or thousands of primops, which are slow to run directly. So it's time to compile the graph!

Compilers are structs that implement the `Compiler` trait, which simply specifies a single function:
We use a loose definition of a compiler. Compilers are structs that implement the `Compiler` trait, which simply specifies a single function:
```rust
pub trait Compiler {
type Output = ();
/// Run a compilation pass
fn compile<T: ToIdsMut>(&self, graph: &mut Graph, remap: T);
fn compile<T: ToIdsMut>(&self, graph: &mut Graph, remap: T) -> Self::Output;
}
```
So all a compiler does is take a mutable reference to the graph and something called remap (beyond the scope of this introduction), and do something to the graph. That something is compilation, usually in the form of finding patterns of nodes and replacing them with other nodes. For instance, there's no Subtract operation in the primops, so subtractions are implemented as `add(a, mul(b, -1))`. We can have a compiler that looks for that pattern of nodes and directly replaces it with a `Subtract` operation. We'll look at how to do this in the [Writing Compilers](https://github.com/jafioti/luminal/blob/main/docs/06%20Writing%20Compilers.md) section.
So all a compiler does is take a mutable reference to the graph and something called remap (beyond the scope of this introduction), and do something to the graph. That something is compilation, usually in the form of finding patterns of nodes and replacing them with other nodes. For instance, there's no Subtract operation in the primops, so subtractions are implemented as `add(a, mul(b, -1))`. We can have a compiler that looks for that pattern of nodes and directly replaces it with a `Subtract` operation. We'll look at how to do this in the [Writing Compilers](/developers/compilers) section.
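
As a rough sketch of the shape such a compiler takes (this is **not** the actual `SubtractionCompiler` source; the real one uses Luminal's graph-selector utilities, which are only hinted at in comments here), an implementation of the trait above might look like:
```rust
// Hedged skeleton only: the pattern-matching and rewiring steps are elided.
pub struct SubtractionCompiler;

impl Compiler for SubtractionCompiler {
    type Output = ();
    fn compile<T: ToIdsMut>(&self, _graph: &mut Graph, _remap: T) {
        // 1. Search the graph for the add(a, mul(b, -1)) node pattern.
        // 2. Insert a single Subtract node wired to a and b.
        // 3. Re-point downstream consumers and remove the old nodes.
    }
}
```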

All you need to know for now is that we can use this compiler on the graph by doing:
```rust
cx.compile(SubtractionCompiler::default());
```
Now the graph will have the old mul + add pattern removed and Subtract ops placed in. There are plenty of different compilers for different purposes. Some of the popular ones:
- GenericCompiler - A handful of hardware-agnostic optimizations like [CSE](https://en.wikipedia.org/wiki/Common_subexpression_elimination) to be run before any hardware-specific compilers.
- CudaCompiler<T> - The full stack of cuda compilers to convert a graph to a cuda-specialized graph with T as the datatype (either f32 or f16). Imported from luminal_cuda
- MetalCompiler<T> - Same as CudaCompiler. Imported from luminal_metal
- CudaCompiler\<T\> - The full stack of cuda compilers to convert a graph to a cuda-specialized graph with T as the datatype (either f32 or f16). Imported from luminal_cuda.
- MetalCompiler\<T\> - Same as CudaCompiler. Imported from luminal_metal.
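
For example, a typical flow is to run the generic passes and then a device-specific stack. A hedged sketch, following the same `cx.compile(...)` call shown above (whether `::default()` constructs these exact compiler types, and how they compose, may vary by Luminal version):
```rust
// Hedged sketch: hardware-agnostic passes first, then a device-specific stack.
// `f16` is assumed to come from the `half` crate re-exported by Luminal.
cx.compile(GenericCompiler::default());
cx.compile(MetalCompiler::<f16>::default()); // from luminal_metal, macOS only
```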

Compilers are entirely separate from luminal, so they can be fully implemented by third party crates. For instance, everything specific to Cuda is contained in luminal_cuda.

[Now let's look into how to load weights from a file.](https://github.com/jafioti/luminal/blob/main/docs/05%20Serialization.md)
Compilers are entirely separate from luminal, so they can be fully implemented by third party crates. For instance, everything specific to Cuda is contained in luminal_cuda.
27 changes: 25 additions & 2 deletions docs/01 Introduction.md → docs/docs/graphtensor.mdx
@@ -1,5 +1,10 @@
# Luminal Introduction
---
title: GraphTensor API
description: 'The high-level interface for writing ML code, checked at compile time.'
icon: 'webhook'
---

## Familiarizing ourselves
Let's get up to speed with how to use luminal, and how it works internally.

First we'll take a look at what the simplest program will look like:
@@ -35,4 +40,22 @@ Then we set the data for these tensors. But if `GraphTensor` doesn't hold data,

Alright, that was a lot, but now we've touched on all the main aspects of running a model in luminal.

[Let's take a look at each piece in more depth.](https://github.com/jafioti/luminal/blob/main/docs/02%20GraphTensor%20API.md)
## GraphTensors

We're working with pretty complicated graphs to build our computation on, but we don't want to manually place all the nodes ourselves! So how can we build these static graphs in a nice, familiar way? GraphTensors!

Essentially GraphTensors are pointers to a specific node on the graph, as well as some metadata about the output of that node, such as its shape. We can make a new GraphTensor by doing:
```rust
let mut cx = Graph::new(); // We need a graph to build!
let a: GraphTensor<R1<3>> = cx.tensor(); // Here we create a new node on the graph and get a GraphTensor back, pointing to it.
```
Notice the type of `a`: `GraphTensor<R1<3>>`. So what's that generic all about? It's the shape! We make tensor shapes part of the type, so they're tracked at compile time! In this case, the shape is rank 1 with 3 elements, or in other words, a 3-dimensional vector. (Side note: `R1<N>` is a typedef of `(Const<N>,)`.) It should be impossible to accidentally get a runtime shape mismatch.
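
To illustrate the compile-time checking with a hypothetical example (reusing the `cx` graph from above; the variable names are ours), mismatched shapes simply don't type-check:
```rust
let v: GraphTensor<R1<3>> = cx.tensor();
let w: GraphTensor<R1<4>> = cx.tensor();
// let bad = v + w; // would not compile: GraphTensor<R1<3>> and GraphTensor<R1<4>>
//                  // are different types, so the mismatch is caught before runtime
```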

Now we can use `a` as you would in a library like PyTorch, performing linear algebra:
```rust
let b = a.exp().sqrt();
let c = b + a;
```
We just placed some ops on the graph! It doesn't look like it because you don't need to think about the graph while writing ML code.

Next we'll see how GraphTensors are used to build whole neural networks.
67 changes: 67 additions & 0 deletions docs/docs/introduction.mdx
@@ -0,0 +1,67 @@
---
title: Introduction
description: 'Welcome to a new way to do ML.'
icon: 'hand-wave'
---

<img
className="block dark:hidden rounded-xl"
src="/images/abstract_light.jpg"
alt="Hero Light"
/>
<img
className="hidden dark:block rounded-xl"
src="/images/abstract.jpg"
alt="Hero Dark"
/>

Luminal is a new machine learning framework focused on **speed**, **simplicity** and **composability**. We take a new approach to ML by focusing on static graphs and leaning heavily on compilers.

## Contents

Navigate around the Luminal docs.

<CardGroup cols={2}>
<Card
title="Quickstart"
icon="bolt"
href="/docs/quickstart"
>
Get up and running with ML models in a flash.
</Card>
<Card
title="Why Luminal"
icon="lightbulb"
href="/docs/why"
>
Dive into why Luminal was created and the design philosophy behind it.
</Card>
<Card
title="GraphTensor API"
icon="webhook"
href="/docs/graphtensor"
>
High-level interface for building models.
</Card>
<Card
title="Modules"
icon="shapes"
href="/docs/modules"
>
Composable building blocks of complex neural networks.
</Card>
<Card
title="Compilers"
icon="microchip"
href="/docs/compilers"
>
Core transformations of the computation graph.
</Card>
<Card
title="Developers"
icon="code"
href="/docs/developers"
>
Resources for contributors and future development.
</Card>
</CardGroup>
11 changes: 7 additions & 4 deletions docs/03 Modules.md → docs/docs/modules.mdx
@@ -1,4 +1,9 @@
# NN Modules
---
title: Modules
description: 'Composable building blocks of complex neural networks.'
icon: 'shapes'
---

Like any good DL library, we organize our networks into `Module`s. Here is the module trait:
```rust
/// A module with a forward pass
@@ -26,6 +31,4 @@ impl<const A: usize, const B: usize> Module<GraphTensor<R1<A>>> for Linear<A, B>
```
Here we see a single weight matrix of size AxB as the internal state. We've written a single forward function that takes input vectors of shape (A,) and matmuls them by our weight matrix to get an output of shape (B,).
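
Since the middle of the diff above is collapsed, here's a hedged sketch of what a minimal `Linear` along these lines could look like. The field name, the `R2` shape alias, and the `Module` trait shape (`type Output`, `forward`) are assumptions for illustration, not the exact Luminal source:
```rust
// Hedged sketch of a minimal Linear module; not the exact Luminal source.
pub struct Linear<const A: usize, const B: usize> {
    weight: GraphTensor<R2<A, B>>, // single AxB weight matrix as internal state
}

impl<const A: usize, const B: usize> Module<GraphTensor<R1<A>>> for Linear<A, B> {
    type Output = GraphTensor<R1<B>>;

    fn forward(&self, input: GraphTensor<R1<A>>) -> Self::Output {
        // (A,) matmul (A, B) -> (B,); the op is only recorded on the graph here.
        input.matmul(self.weight)
    }
}
```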

Now all of these ops are recorded on the graph, to be compiled and run later on.

[So how does this compilation work? Let's find out!](https://github.com/jafioti/luminal/blob/main/docs/04%20Compilers.md)
Now all of these ops are recorded on the graph, to be compiled and run later on.
39 changes: 39 additions & 0 deletions docs/docs/quickstart.mdx
@@ -0,0 +1,39 @@
---
title: 'Quickstart'
description: 'Start running ML models in minutes.'
icon: 'bolt'
---

## Clone the repo

Clone the codebase locally by running the following:
```bash
git clone https://github.com/jafioti/luminal
cd luminal
```

## Hello World

Simple examples demonstrate how a library works without diving in too deep. Run your first Luminal code like so:
```bash
cd ./examples
cargo run --release
```
Great! You've run your first Luminal model!

## Run Llama 3

Run the following to start generating text with Llama 3 8B:
```bash
cd ./examples/llama
# Download the model
bash ./setup/setup.sh
# Run the model
cargo run --release --features metal # MacOS (Recommended)
cargo run --release --features cuda # Nvidia
cargo run --release # CPU
```

<Warning>
Luminal currently isn't well optimized for CPU usage, so running large models like Llama 3 on CPU isn't recommended.
</Warning>
