```
npm i -g mintlify
mintlify dev
```
---
title: 'Luminal: Efficient ML in Rust through graph compilation'
description: 'A new approach to ML'
---

**Luminal is a deep learning library that uses composable compilers to achieve high performance.**

Current ML libraries tend to be large and complex because they try to map high-level operations directly onto low-level handwritten kernels while focusing on eager execution. Libraries like PyTorch contain hundreds of thousands of lines of code, making it nearly impossible for a single programmer to understand it all, let alone do a large refactor.

But does it need to be so complex? ML models tend to be static dataflow graphs made up of a few simple operators. This allows us to have a dirt-simple core that supports only a few primitive operations and to use them to build up complex neural networks. We can then write compilers that modify the graph after we build it, swapping more efficient ops back in depending on which backend we're running on.

Luminal takes this approach to the extreme, supporting only 11 primitive operations (primops):
- **Unary** - Log2, Exp2, Sin, Sqrt, Recip
- **Binary** - Add, Mul, Mod, LessThan
- **Other** - SumReduce, MaxReduce, Contiguous

Every complex operation boils down to these primitive operations, so when you do `a - b`, for instance, `add(a, mul(b, -1))` gets written to the graph. Or when you do `a.matmul(b)`, what actually gets put on the graph is `sum_reduce(mul(reshape(a), reshape(b)))`.

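To see this concretely, here's a small numeric sketch in plain Rust (not Luminal's actual API; `matmul_via_primops` is a hypothetical helper) showing that a matrix multiply really is just a broadcasted elementwise multiply followed by a sum-reduction over the shared axis:

```rust
// Toy numeric check: matmul expressed as mul + sum_reduce over the k axis.
fn matmul_via_primops(a: [[f32; 2]; 2], b: [[f32; 2]; 2]) -> [[f32; 2]; 2] {
    let mut out = [[0.0f32; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            // "reshape" pairs a[i][k] with b[k][j], "mul" multiplies them
            // elementwise, and "sum_reduce" sums over the shared k axis.
            out[i][j] = (0..2).map(|k| a[i][k] * b[k][j]).sum();
        }
    }
    out
}

fn main() {
    let a = [[1.0, 2.0], [3.0, 4.0]];
    let b = [[5.0, 6.0], [7.0, 8.0]];
    println!("{:?}", matmul_via_primops(a, b)); // [[19.0, 22.0], [43.0, 50.0]]
}
```

The nested loops stand in for the element pairing that Luminal's reshapes set up; the point is only that `mul` plus `sum_reduce` suffice to express `matmul`.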
Once the graph is built, iterative compiler passes can modify it to replace primops with more efficient ops, depending on the device it's running on. On Nvidia cards, for instance, efficient Cuda kernels are written on the fly to replace these ops, and specialized cuBLAS kernels are swapped in for supported operations.

This approach leads to a simple library, and performance is limited only by the creativity of the compiler programmer, not the model programmer.

Luminal has a number of other neat features; check out the repo [here](https://github.com/jafioti/luminal).

## Welcome

There are two ways to build API documentation: [OpenAPI](https://mintlify.com/docs/api-playground/openapi/setup) and [MDX components](https://mintlify.com/docs/api-playground/mdx/configuration). For the starter kit, we are using the following OpenAPI specification.

<Card
  title="Plant Store Endpoints"
  icon="leaf"
  href="https://github.com/mintlify/starter/blob/main/api-reference/openapi.json"
>
  View the OpenAPI specification file
</Card>

## Authentication

All API endpoints are authenticated using Bearer tokens defined in the specification file:

```json
"security": [
  {
    "bearerAuth": []
  }
]
```
---
title: 'Create Plant'
openapi: 'POST /plants'
---
---
title: 'Delete Plant'
openapi: 'DELETE /plants/{id}'
---
---
title: 'Get Plants'
openapi: 'GET /plants'
---
---
title: Developing Luminal
description: 'Building the future of ML.'
icon: 'hand-wave'
---

<img
  className="block dark:hidden rounded-xl"
  src="/images/abstract_light.jpg"
  alt="Hero Light"
/>
<img
  className="hidden dark:block rounded-xl"
  src="/images/abstract.jpg"
  alt="Hero Dark"
/>

Please take a look at the [issues](https://github.com/jafioti/luminal/issues) and [roadmap](https://github.com/users/jafioti/projects/1) to see what's targeted for upcoming releases. Contributions for those features are preferred and will be reviewed and merged very rapidly. Other contributions are welcome, but please note Luminal is and always will be a fairly minimal library.

The core design of Luminal is heavily predicated on extensibility. Compilers allow immense complexity to be removed from the core library and added back through third-party compilers. For instance, datatypes and devices are typically first-class primitives; in Luminal, they're compilers, and the core has no idea about them. This is the general trend we'll stick to: the core remains brutally simple, and everything that can be externalized to a compiler will be.

We will be adding training support soon, and as you may have guessed, it will reside entirely in a compiler. Just define the model's graph, run the output through an optimizer, and run the `AutogradCompiler` before any other compilers. Boom, we get training, and the core of the library has no idea! (Aside from some quality-of-life APIs.)

PRs that remove complexity are always welcome, but note that line count is often a bad proxy for complexity. Ideally the entire Luminal core should be a few thousand lines of code, but anything remotely resembling code golf is not allowed.
---
title: Compilers
description: 'Core transformations of the computation graph.'
icon: 'microchip'
---

So now we have our graph all set up. We did our forward passes through the model, so now what? Do we run it?

We could! But it wouldn't be very fast. Right now your graph is full of **primops**, the simplest set of primitive operations in Luminal. One of the key tenets of Luminal is a small primop set, which makes it easy to add new backends and write compilers. But another consequence of a small primop set is that even simple operations usually create quite a few nodes, so even small neural networks can end up with hundreds or thousands of primops, which are slow to run directly. So it's time to compile the graph!

We use a loose definition of a compiler. Compilers are structs that implement the `Compiler` trait, which specifies a single function:
```rust
pub trait Compiler {
    type Output = ();
    /// Run a compilation pass
    fn compile<T: ToIdsMut>(&self, graph: &mut Graph, remap: T) -> Self::Output;
}
```
So all a compiler does is take a mutable reference to the graph, plus something called remap (beyond the scope of this introduction), and do something to that graph. That something is compilation, usually in the form of finding patterns of nodes and replacing them with other nodes. For instance, there's no Subtract operation in the primops, so subtractions are implemented as `add(a, mul(b, -1))`. We can have a compiler that looks for that pattern of nodes and directly replaces it with a `Subtract` operation. We'll look at how to do this in the [Writing Compilers](/developers/compilers) section.
All you need to know for now is that we can use this compiler on the graph by doing:
```rust
cx.compile(SubtractionCompiler::default());
```
Now the graph will have the old mul + add pattern removed and Subtract ops swapped in. There are plenty of different compilers for different purposes. Some of the popular ones:
- GenericCompiler - A handful of hardware-agnostic optimizations like [CSE](https://en.wikipedia.org/wiki/Common_subexpression_elimination) to be run before any hardware-specific compilers.
- CudaCompiler\<T\> - The full stack of Cuda compilers to convert a graph to a Cuda-specialized graph with T as the datatype (either f32 or f16). Imported from luminal_cuda.
- MetalCompiler\<T\> - Same as CudaCompiler, but for Metal. Imported from luminal_metal.
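To make the pattern-replacement idea concrete, here's a self-contained toy rewrite in plain Rust. `Expr` and `fuse_subtract` are hypothetical stand-ins, not Luminal's real `Graph` or `Compiler` types, but the find-a-pattern-and-swap shape is the same:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Const(f32),
    Var(&'static str),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>), // the "compiled" op we rewrite into
}

// Walk the tree, replacing add(a, mul(b, -1)) with sub(a, b).
fn fuse_subtract(e: Expr) -> Expr {
    match e {
        Expr::Add(a, b) => {
            let a = fuse_subtract(*a);
            let b = fuse_subtract(*b);
            if let Expr::Mul(x, c) = b.clone() {
                if *c == Expr::Const(-1.0) {
                    return Expr::Sub(Box::new(a), x);
                }
            }
            Expr::Add(Box::new(a), Box::new(b))
        }
        Expr::Mul(a, b) => Expr::Mul(Box::new(fuse_subtract(*a)), Box::new(fuse_subtract(*b))),
        Expr::Sub(a, b) => Expr::Sub(Box::new(fuse_subtract(*a)), Box::new(fuse_subtract(*b))),
        other => other,
    }
}

fn main() {
    // a + (b * -1)  becomes  a - b
    let g = Expr::Add(
        Box::new(Expr::Var("a")),
        Box::new(Expr::Mul(Box::new(Expr::Var("b")), Box::new(Expr::Const(-1.0)))),
    );
    println!("{:?}", fuse_subtract(g));
}
```

A real pass works on a graph rather than a tree and has to respect shapes and sharing, but the core move is the same: match a subpattern of primops, splice in a fused op.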
Compilers are entirely separate from Luminal's core, so they can be fully implemented by third-party crates. For instance, everything specific to Cuda is contained in luminal_cuda.
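For a flavor of the kind of hardware-agnostic pass GenericCompiler runs, here's a minimal CSE sketch over a toy node list (illustrative only; `cse` and its node representation are not Luminal's actual implementation):

```rust
use std::collections::HashMap;

// CSE over (op, inputs) nodes in topological order: identical nodes with
// identical (remapped) inputs collapse to the first occurrence.
fn cse(nodes: &[(&str, Vec<usize>)]) -> Vec<usize> {
    let mut canonical: HashMap<(String, Vec<usize>), usize> = HashMap::new();
    let mut remap = Vec::with_capacity(nodes.len());
    for (op, inputs) in nodes {
        // Rewrite this node's inputs through the remap built so far.
        let inputs: Vec<usize> = inputs.iter().map(|&i| remap[i]).collect();
        let key = (op.to_string(), inputs);
        let id = *canonical.entry(key).or_insert(remap.len());
        remap.push(id);
    }
    remap
}

fn main() {
    // Node 0: x, node 1: exp2(x), node 2: exp2(x) again, node 3: add(1, 2)
    let graph = vec![
        ("x", vec![]),
        ("exp2", vec![0]),
        ("exp2", vec![0]), // duplicate of node 1
        ("add", vec![1, 2]),
    ];
    println!("{:?}", cse(&graph)); // [0, 1, 1, 3]
}
```

The duplicate `exp2` collapses into the first one, and the downstream `add` is remapped to read it twice, so the redundant node never has to be executed.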
---
title: Introduction
description: 'Welcome to a new way to do ML.'
icon: 'hand-wave'
---

<img
  className="block dark:hidden rounded-xl"
  src="/images/abstract_light.jpg"
  alt="Hero Light"
/>
<img
  className="hidden dark:block rounded-xl"
  src="/images/abstract.jpg"
  alt="Hero Dark"
/>

Luminal is a new machine learning framework focused on **speed**, **simplicity**, and **composability**. We take a new approach to ML by focusing on static graphs and leaning heavily on compilers.

## Contents

Navigate around the Luminal docs.

<CardGroup cols={2}>
  <Card
    title="Quickstart"
    icon="bolt"
    href="/docs/quickstart"
  >
    Get ML models up and running in a flash.
  </Card>
  <Card
    title="Why Luminal"
    icon="lightbulb"
    href="/docs/why"
  >
    Dive into why Luminal was created and the design philosophy behind it.
  </Card>
  <Card
    title="GraphTensor API"
    icon="webhook"
    href="/docs/graphtensor"
  >
    High-level interface for building models.
  </Card>
  <Card
    title="Modules"
    icon="shapes"
    href="/docs/modules"
  >
    Composable building blocks of complex neural networks.
  </Card>
  <Card
    title="Compilers"
    icon="microchip"
    href="/docs/compilers"
  >
    Core transformations of the computation graph.
  </Card>
  <Card
    title="Developers"
    icon="code"
    href="/docs/developers"
  >
    Resources for contributors and future development.
  </Card>
</CardGroup>
---
title: 'Quickstart'
description: 'Start running ML models in minutes.'
icon: 'bolt'
---

## Clone the repo

Clone the codebase locally by running the following:
```bash
git clone https://github.com/jafioti/luminal
cd luminal
```

## Hello World

Simple examples demonstrate how a library works without diving in too deep. Run your first Luminal code like so:
```bash
cd ./examples
cargo run --release
```
Great! You've run your first Luminal model!

## Run Llama 3

Run the following to start generating text with Llama 3 8B:
```bash
cd ./examples/llama
# Download the model
bash ./setup/setup.sh
# Run the model
cargo run --release --features metal # macOS (recommended)
cargo run --release --features cuda  # Nvidia
cargo run --release                  # CPU
```

<Warning>
  Luminal currently isn't well optimized for CPU usage, so running large models like Llama 3 on CPU isn't recommended.
</Warning>