Skip to content

Commit

Permalink
docs(hydro_lang): initial website docs on core Hydro concepts
Browse files Browse the repository at this point in the history
  • Loading branch information
shadaj committed Jan 23, 2025
1 parent 496ac2b commit 1846301
Show file tree
Hide file tree
Showing 22 changed files with 148 additions and 12 deletions.
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 16 additions & 0 deletions docs/docs/hydro/correctness.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
---
sidebar_position: 3
---

# Safety and Correctness
Just like Rust's type system helps you avoid memory safety bugs, Hydro helps you ensure **distributed safety**. Hydro's type systems helps you avoid many kinds of distributed systems bugs, including:
- Non-determinism due to message delays (which reorder arrival) or retries (which result in duplicates)
- See [Live Collections / Consistency and Safety](./live-collections/consistency.md)
- Using mismatched serialization and deserialization formats across services
- See [Locations and Networking](./locations/index.md)
- Misusing node identifiers across logically independent clusters of machines
- See [Locations / Clusters](./locations/clusters.md)
- Relying on non-determinstic clocks for batching events
- See [Ticks and Timestamps / Batching and Emitting Streams](./ticks-timestamps/batching-and-emitting.md)

These safety guarantees are surfaced through the Rust type system, so you can catch these bugs at compile time rather than in production. And when it is necessary to bypass these checks for advanced distributed logic, you can use the same `unsafe` keyword as in Rust as an escape hatch.
8 changes: 8 additions & 0 deletions docs/docs/hydro/live-collections/_category_.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"label": "Live Collections",
"position": 4,
"link": {
"type": "doc",
"id": "hydro/live-collections/index"
}
}
27 changes: 27 additions & 0 deletions docs/docs/hydro/live-collections/bounded-unbounded.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
---
sidebar_position: 0
---

# Bounded and Unbounded Types
Although live collections can be continually updated, some collection types also support **termination**, after which no additional changes can be made. For example, a live collection created by reading integers from an in-memory `Vec` will become terminated once all the elements of the `Vec` have been loaded. But other live collections, such as one being updated by the network, may never become terminated.

In Hydro, certain APIs are restricted to only work on collections that are **guaranteed to terminate** (**bounded** collections). All live collections in Hydro have a type parameter (typically named `B`), which tracks whether the collection is bounded (has the type `Bounded`) or unbounded (has the type `Unbounded`). These types are used in the signature of many Hydro APIs to ensure that the API is only called on the appropriate type of collection.

## Converting Boundedness
In some cases, you may need to convert between bounded and unbounded collections. Converting from a bounded collection **to an unbounded collection** is always allowed and safe, since it relaxes the guarantees on the collection. This can be done by calling `.into()` on the collection.

```rust,no_run
# use hydro_lang::*;
# use dfir_rs::futures::StreamExt;
fn my_unbounded_transformation<'a, L: Location<'a>>(stream: Stream<usize, L, Unbounded>) -> Stream<usize, L, Unbounded> {
stream.map(q!(|x| x + 1))
}
# let flow = FlowBuilder::new();
# let process = flow.process::<()>();
# let tick = process.tick();
# let numbers = process.source_iter(q!(vec![1, 2, 3, 4]));
# let batch = unsafe { numbers.timestamped(&tick).tick_batch() };
// assume batch is a bounded collection
let unbounded = my_unbounded_transformation(batch.into());
```
Original file line number Diff line number Diff line change
@@ -1,22 +1,17 @@
---
sidebar_position: 3
sidebar_position: 1
---

# Consistency and Safety
A key feature of Hydro is its integration with the Rust type system to highlight possible sources of inconsistent distributed behavior due to sources of non-determinism such as batching, timeouts, and message reordering. In this section, we'll walk through the consistency guarantees in Hydro and how to use the **`unsafe`** keyword as an escape hatch when introducing sources of non-determinism.
# Eventual Determinism
Most programs are strong guarantees on **determinism**, the property that when provided the same inputs, the outputs of the program are always the same. Even when the inputs and outputs are live collections, we can focus on the _eventual_ state of the collection (as if we froze the input and waited until the output stops changing).

:::info

Our consistency and safety model is based on the POPL'25 paper [Flo: A Semantic Foundation for Progressive Stream Processing](https://arxiv.org/abs/2411.08274), which covers the formal details and proofs underlying this system.

:::

## Eventual Determinism
Hydro provides strong guarantees on **determinism**, the property that when provided the same inputs, the outputs of the program are always the same. Even when the inputs and outputs are streaming, we can use this property by looking at the **aggregate collection** (i.e. the result of collecting the elements of the stream into a finite collection). This makes it easy to build composable blocks of code without having to worry about runtime behavior such as batching or network delays.

Because Hydro programs can involve network delay, we guarantee **eventual determinism**: given a set of streaming inputs which have arrived, the outputs of the program (which continuously change as inputs arrive) will **eventually** have the same _aggregate_ value.

Again, by focusing on the _aggregate_ value rather than individual outputs, Hydro programs can involve concepts such as retractions (for incremental computation) while still guaranteeing determinism because the _resolved_ output (after processing retractions) will eventually be the same.
Hydro thus guarantees **eventual determinism**: given a set of streaming inputs which have arrived, the outputs of the program will **eventually** have the same _final_ value. This makes it easy to build composable blocks of code without having to worry about runtime behavior such as batching or network delays.

:::note

Expand Down Expand Up @@ -54,9 +49,9 @@ use std::fmt::Debug;
use std::time::Duration;

/// ...
///
///
/// # Safety
/// This function will non-deterministically print elements
/// This function will non-deterministically print elements
/// from the stream according to a timer.
unsafe fn print_samples<T: Debug, L>(
stream: Stream<T, Process<L>, Unbounded>
Expand Down
4 changes: 4 additions & 0 deletions docs/docs/hydro/live-collections/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Live Collections
Traditional programs (like those in Rust) typically manipulate **collections** of data elements, such as those stored in a `Vec` or `HashMap`. These collections are **fixed** in the sense that any transformations applied to them such as `map` are immediately executed on a snapshot of the collection. This means that the output will not be updated when the input collection is modified.

In Hydro, programs instead work with **live collections** which are expected to dynamically change over time as new elements are added or removed (in response to API requests, streaming ingestion, etc). Applying a transformation like `map` to a live collection results in another live collection that will dynamically change over time.
6 changes: 6 additions & 0 deletions docs/docs/hydro/live-collections/singletons-optionals.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
sidebar_position: 3
---

# Singletons and Optionals
TODO
6 changes: 6 additions & 0 deletions docs/docs/hydro/live-collections/streams.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
sidebar_position: 2
---

# Streams
TODO
8 changes: 8 additions & 0 deletions docs/docs/hydro/locations/_category_.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"label": "Locations and Networking",
"position": 5,
"link": {
"type": "doc",
"id": "hydro/locations/index"
}
}
6 changes: 6 additions & 0 deletions docs/docs/hydro/locations/clusters.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
sidebar_position: 1
---

# Clusters
TODO
6 changes: 6 additions & 0 deletions docs/docs/hydro/locations/external-clients.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
sidebar_position: 2
---

# External Clients
TODO
2 changes: 2 additions & 0 deletions docs/docs/hydro/locations/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
# Locations and Networking
TODO
6 changes: 6 additions & 0 deletions docs/docs/hydro/locations/processes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
sidebar_position: 0
---

# Processes
TODO
2 changes: 1 addition & 1 deletion docs/docs/hydro/stageleft.mdx
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
title: Stageleft
sidebar_position: 4
sidebar_position: 7
---

import StageleftDocs from '../../../stageleft/README.md'
Expand Down
8 changes: 8 additions & 0 deletions docs/docs/hydro/ticks-timestamps/_category_.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"label": "Ticks and Timestamps",
"position": 6,
"link": {
"type": "doc",
"id": "hydro/ticks-timestamps/index"
}
}
6 changes: 6 additions & 0 deletions docs/docs/hydro/ticks-timestamps/atomicity.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
sidebar_position: 3
---

# Atomicity with Timestamps
TODO
6 changes: 6 additions & 0 deletions docs/docs/hydro/ticks-timestamps/batching-and-emitting.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
sidebar_position: 1
---

# Batching and Emitting Streams
TODO
6 changes: 6 additions & 0 deletions docs/docs/hydro/ticks-timestamps/execution-model.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
sidebar_position: 0
---

# The Tick Execution Model
TODO
2 changes: 2 additions & 0 deletions docs/docs/hydro/ticks-timestamps/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
# Ticks and Timestamps
TODO
6 changes: 6 additions & 0 deletions docs/docs/hydro/ticks-timestamps/stateful-loops.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
sidebar_position: 2
---

# Stateful Loops
TODO
10 changes: 10 additions & 0 deletions hydro_lang/src/stream.rs
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,16 @@ pub struct Stream<T, L, B, Order = TotalOrder> {
_phantom: PhantomData<(T, L, B, Order)>,
}

impl<'a, T, L: Location<'a>, O> From<Stream<T, L, Bounded, O>> for Stream<T, L, Unbounded, O> {
fn from(stream: Stream<T, L, Bounded, O>) -> Stream<T, L, Unbounded, O> {
Stream {
location: stream.location,
ir_node: stream.ir_node,
_phantom: PhantomData,
}
}
}

impl<'a, T, L: Location<'a>, B> From<Stream<T, L, B, TotalOrder>> for Stream<T, L, B, NoOrder> {
fn from(stream: Stream<T, L, B, TotalOrder>) -> Stream<T, L, B, NoOrder> {
Stream {
Expand Down
1 change: 1 addition & 0 deletions hydro_test/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -31,3 +31,4 @@ hydro_lang = { path = "../hydro_lang", version = "^0.11.0", features = [ "deploy
futures = "0.3.0"
async-ssh2-lite = { version = "0.5.0", features = ["vendored-openssl"] }
dfir_macro = { path = "../dfir_macro", version = "^0.11.0" }
tokio-test = "0.4.4"

0 comments on commit 1846301

Please sign in to comment.