update the examples for SDK version 1.2.0
leightonw-cerebras committed Jul 11, 2024
1 parent 66a1eb6 commit b7edf77
Showing 185 changed files with 9,872 additions and 2,034 deletions.
8 changes: 4 additions & 4 deletions README.rst
@@ -1,7 +1,7 @@
CSL Examples
============

-Register for access to the Cerebras SDK `here <https://www.cerebras.net/homepage-landing/developers/sdk-request/>`_.
+Register to access the Cerebras SDK `here <https://www.cerebras.net/homepage-landing/developers/sdk-request/>`_.
Documentation for the SDK can be found `here <https://sdk.cerebras.net>`_.

This repository contains examples of CSL code. Each example has the following
Expand Down Expand Up @@ -35,7 +35,7 @@ This is the place to start.
There are 10 tutorials which teach basic CSL language features and
``SdkRuntime`` host runtime features by building up an increasingly
complex code to compute a GEMV.
-There are an additional 12 tutorial examples which illustrate
+There are an additional 15 tutorial examples which illustrate
specific language features,
and 3 tutorial examples which build an increasingly complex
pipelined computation.
Expand Down Expand Up @@ -95,9 +95,9 @@ Branches

For each release of the SDK, there is a corresponding release tag in this
repository which contains a version of the CSL examples which are compatible
-with that SDK release. For example, the tag ``rel-sdk-1.1.0`` in this
+with that SDK release. For example, the tag ``rel-sdk-1.2.0`` in this
repository contains a version of the CSL examples which will work (compile and
-simulate) with the SDK 1.1.0 release. The ``master`` branch is identical to the
+simulate) with the SDK 1.2.0 release. The ``master`` branch is identical to the
newest release.

Full backward compatibility of the SDK is not guaranteed.
19 changes: 19 additions & 0 deletions RELEASE-NOTES.rst
@@ -4,6 +4,25 @@ Release Notes
The following are the release notes for the CSL Examples repository,
``csl-examples``.

Version 1.2.0
-------------

- The examples are improved and updated to comply with the SDK version 1.2.0.

- All tutorial example programs have been updated to support WSE-3.

- Two new example programs for switches, demonstrating use of the
``<control>`` library, have been added.

- A new example program demonstrating the ``<simprint>`` library has been
added.

- ``wide-multiplication``, ``residual``, ``mandelbrot``,
``gemv-collectives_2d``, ``gemv-checkerboard-pattern``,
``gemm-collectives_2d``, ``stencil-3d-7pts``, ``bicgstab``,
``conjugateGradient``, ``preconditionedConjugateGradient``, and
``powerMethod`` programs have been updated to support WSE-3.

Version 1.1.0
-------------

66 changes: 66 additions & 0 deletions benchmarks/25-pt-stencil/README.rst
@@ -0,0 +1,66 @@
25-Point Stencil
================

The stencil code is a time-marching app, requiring the following three inputs:

- scalar ``iterations``: number of time steps
- tensor ``vp``: velocity field
- tensor ``source``: source term

and producing the following three outputs:

- maximum and minimum values of the vector field at the last time step, two f32 per PE
- timestamps of the time marching, three uint32 per PE
- vector field ``z`` at the last time step, ``zdim`` f32 per PE

The stencil code uses 21 colors and task IDs for communication patterns,
and ``SdkRuntime`` reserves 6 colors,
so only 4 colors are left for ``streaming`` H2D/D2H transfers
and some entrypoints for control flow.
We use one color (color 0) to launch kernel functions
and one entrypoint (color 2) to trigger the time marching.
The ``copy`` mode of memcpy is used for two inputs and two outputs.

First, after the simulator (or WSE) has been launched,
we send the input tensors ``vp`` and ``source`` to the device via ``copy`` mode.

Second, we launch time marching with the argument ``iterations``.

In this example, we have two kernel launches.
One performs time marching after ``vp`` and ``source`` are received,
and the other prepares the output data ``zValues``.
The former has the function symbol ``f_activate_comp``
and the latter has the function symbol ``f_prepare_zout``.
``SdkRuntime.launch()`` triggers a host-callable function:
its first argument is the function symbol ``f_activate_comp``,
and its second argument is ``iterations``,
which ``f_activate_comp`` receives as an argument.

At the end of time marching, ``f_checkpoint()`` (in ``task.csl``)
records the maximum and minimum values
of the vector field and the timing info into an array ``d2h_buf_f32``.
The host calls ``memcpy_d2h()`` to receive the data in ``d2h_buf_f32``.

To receive the vector field of the last time step,
the function ``f_prepare_zout()`` is called by ``SdkRuntime.launch()``
to copy this data into a temporary array ``zout``,
because the result may be in either ``zValues[0, :]`` or ``zValues[1, :]``.

The last operation, ``memcpy_d2h()``, sends the array ``zout`` back to the host.

When ``f_activate_comp`` is launched, it triggers the entrypoint ``f_comp()``
to start the time-marching and to record the starting time.

At the end of time marching, the function ``epilog()`` checks
``iterationCount``.
If it reaches the given ``iterations``, ``epilog()`` triggers the entrypoint
``CHECKPOINT`` to prepare the data for the first ``memcpy_d2h()``.

The function ``f_checkpoint()`` calls ``unblock_cmd_stream()`` to process the
next operation, which is the first ``memcpy_d2h()``.
Without ``unblock_cmd_stream()``, the program stalls because the
``memcpy_d2h()`` is never scheduled.

The function ``f_prepare_zout()`` prepares the vector field into ``zout``.
It also calls ``unblock_cmd_stream()`` to process the next operation, which is
the second ``memcpy_d2h()``.
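
The ordering described above can be illustrated with a toy model of the command stream. This is only a sketch under stated assumptions: the class ``ToyCommandStream`` and its methods are hypothetical and are not part of ``SdkRuntime``; the model simply assumes that host operations are processed in order and that a launch blocks the stream until the device-side function calls ``unblock_cmd_stream()``.

```python
from collections import deque


class ToyCommandStream:
    """Hypothetical toy model, not the SdkRuntime implementation."""

    def __init__(self):
        self.pending = deque()  # operations issued by the host, in order
        self.blocked = False    # True while a launched function has not unblocked
        self.log = []           # operations that were actually processed

    def enqueue(self, op):
        self.pending.append(op)
        self._drain()

    def unblock_cmd_stream(self):
        # Models the device-side call made by f_checkpoint() / f_prepare_zout().
        self.blocked = False
        self._drain()

    def _drain(self):
        # Process queued operations until a launch blocks the stream.
        while self.pending and not self.blocked:
            op = self.pending.popleft()
            self.log.append(op)
            if op.startswith("launch"):
                self.blocked = True


stream = ToyCommandStream()
for op in [
    "memcpy_h2d vp",
    "memcpy_h2d source",
    "launch f_activate_comp",
    "memcpy_d2h d2h_buf_f32",
    "launch f_prepare_zout",
    "memcpy_d2h zout",
]:
    stream.enqueue(op)

# Without any unblock, the stream stalls right after the first launch.
stalled_log = list(stream.log)

stream.unblock_cmd_stream()  # stands in for f_checkpoint()
stream.unblock_cmd_stream()  # stands in for f_prepare_zout()
print(stalled_log)
print(stream.log)
```

In this model, the first ``memcpy_d2h()`` is processed only after the first unblock, which mirrors the stall described above when ``unblock_cmd_stream()`` is omitted.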
112 changes: 112 additions & 0 deletions benchmarks/25-pt-stencil/cmd_parser.py
@@ -0,0 +1,112 @@
# Copyright 2024 Cerebras Systems.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is not a real test, but a module that gets imported in other tests.

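"""Parse the command line for the 25-point stencil benchmark.
--name <str>            the test name
--zDim <int>            size of the Z dimension
--size <int>            size of the domain in x and y dims
--skip-compile          skip compilation of the code from python
--skip-run              skip run of the code from python
--iterations <int>      number of timesteps to simulate
--dx <int>              dx value (impacting the boundary)
--fabric_width <int>    width of the fabric we are compiling for
--fabric_height <int>   height of the fabric we are compiling for
--cmaddr <str>          IP:port for CS system
--debug                 enable debug output
--width-west-buf <int>  width of west buffer
--width-east-buf <int>  width of east buffer
--n_channels <int>      number of memcpy channels
"""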


import argparse


SIZE = 10
ZDIM = 10
ITERATIONS = 10
DX = 20


def parse_args():
  parser = argparse.ArgumentParser()

  parser.add_argument('--name', help='the test name')
  parser.add_argument(
      '--zDim', type=int, help='size of the Z dimension', default=ZDIM
  )
  parser.add_argument(
      '--size', type=int, help='size of the domain in x and y dims', default=SIZE
  )

  parser.add_argument(
      '--skip-compile', action="store_true",
      help='Skip compilation of the code from python'
  )

  parser.add_argument(
      '--skip-run', action="store_true",
      help='Skip run of the code from python'
  )

  parser.add_argument(
      '--iterations',
      type=int,
      help='number of timesteps to simulate',
      default=ITERATIONS
  )

  parser.add_argument(
      '--dx',
      type=int,
      help='dx value (impacting the boundary)', default=DX
  )

  parser.add_argument(
      '--fabric_width',
      type=int,
      help='Width of the fabric we are compiling for',
  )

  parser.add_argument(
      '--fabric_height',
      type=int,
      help='Height of the fabric we are compiling for',
  )

  parser.add_argument('--cmaddr', help='IP:port for CS system')

  parser.add_argument(
      "--debug",
      help="show A, x, and b", action="store_true"
  )

  parser.add_argument(
      "--width-west-buf",
      default=0, type=int,
      help="width of west buffer")
  parser.add_argument(
      "--width-east-buf",
      default=0, type=int,
      help="width of east buffer")
  parser.add_argument(
      "--n_channels",
      default=1, type=int,
      help="Number of memcpy \"channels\" (LVDS/streamers for both input and output) to use \
      when memcpy support is compiled with this program. If this argument is not present, \
      or is 0, then the previous single-LVDS version is compiled.")

  args = parser.parse_args()

  return args
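
As a usage sketch for the parser above: the snippet below rebuilds a trimmed-down parser (the helper name ``build_parser`` is hypothetical and only a few of the flags are reproduced) and feeds it the same arguments that ``commands.sh`` passes to ``run.py``. It mainly shows how ``argparse`` maps dashed flags such as ``--skip-compile`` to attribute names with underscores, and that flags without a default, such as ``--fabric_width``, come back as ``None``.

```python
import argparse


def build_parser():
    # Hypothetical, trimmed-down stand-in for parse_args() in cmd_parser.py.
    parser = argparse.ArgumentParser()
    parser.add_argument('--name', help='the test name')
    parser.add_argument('--iterations', type=int, default=10)
    parser.add_argument('--dx', type=int, default=20)
    parser.add_argument('--skip-compile', action='store_true')
    parser.add_argument('--width-west-buf', type=int, default=0)
    parser.add_argument('--fabric_width', type=int)
    return parser


# Same arguments as the cs_python invocation in commands.sh.
args = build_parser().parse_args(
    ['--name', 'out', '--iterations=10', '--dx=20', '--skip-compile'])

print(args.name, args.iterations, args.dx)
print(args.skip_compile)    # dashes in the flag become underscores
print(args.width_west_buf)  # default value, since the flag was not passed
print(args.fabric_width)    # no default given, so argparse stores None
```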
10 changes: 10 additions & 0 deletions benchmarks/25-pt-stencil/commands.sh
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

set -e

cslc ./layout.csl --arch=wse2 --fabric-dims=17,12 --fabric-offsets=4,1 \
-o=out_code --params=width:10,height:10,zDim:10,sourceLength:10,dx:20 \
--params=srcX:0,srcY:0,srcZ:0 --verbose --memcpy --channels=1 \
--width-west-buf=0 --width-east-buf=0
cs_python run.py --name out \
--iterations=10 --dx=20 --skip-compile
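
The values ``--fabric-dims=17,12`` and ``--fabric-offsets=4,1`` in the command above can be related to ``width:10,height:10`` with a small calculation. The sketch below is an assumption, not something stated in this commit: it supposes that the memcpy infrastructure occupies three columns plus the west buffer to the west of the kernel, two columns plus the east buffer to the east, and that one extra column or row of margin is needed on each side. The function ``min_fabric_dims`` is a hypothetical helper; it is shown because it reproduces the numbers used in ``commands.sh``.

```python
def min_fabric_dims(width, height, width_west_buf=0, width_east_buf=0,
                    fabric_offset_x=1, fabric_offset_y=1):
    """Hypothetical helper: assumed layout of the kernel plus memcpy columns."""
    # Assumption: 3 memcpy columns and the west buffer sit west of the kernel.
    core_offset_x = fabric_offset_x + 3 + width_west_buf
    core_offset_y = fabric_offset_y
    # Assumption: 2 memcpy columns, the east buffer, and 1 margin column to the east.
    min_width = core_offset_x + width + 2 + 1 + width_east_buf
    # Assumption: 1 margin row below the kernel.
    min_height = core_offset_y + height + 1
    return (min_width, min_height), (core_offset_x, core_offset_y)


dims, offsets = min_fabric_dims(width=10, height=10)
print(dims)     # (17, 12), matching --fabric-dims=17,12
print(offsets)  # (4, 1), matching --fabric-offsets=4,1
```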
107 changes: 107 additions & 0 deletions benchmarks/25-pt-stencil/consts.csl
@@ -0,0 +1,107 @@
// Copyright 2024 Cerebras Systems.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

param pattern: u16;
param paddedZDim: u16;

const math = @import_module("<math>");
// We need to allocate space for not just the (padded) size of the problem (in
// the Z dimension), but also space for ghost cells.
const zBufferSize = paddedZDim + 2 * (pattern - 1);

fn initBuffer() [2, zBufferSize]f32 {
  return @zeros([2, zBufferSize]f32);
}

// Minimig - main.c:15-23, target_3d.c:23, and target_3d.c:30
fn computeMinimigConsts(dx: u16) [9]f32 {
  @comptime_assert(pattern == 5);
  const dx2:f32 = @as(f32, dx * dx);
  const c0:f32 = -205.0 / 72.0 / dx2;
  const c1:f32 = 8.0 / 5.0 / dx2;
  const c2:f32 = -1.0 / 5.0 / dx2;
  const c3:f32 = 8.0 / 315.0 / dx2;
  const c4:f32 = -1.0 / 560.0 / dx2;

  return [9]f32 {
    c4,
    c3,
    c2,
    c1,
    c0 * 3.0,
    c1,
    c2,
    c3,
    c4,
  };
}

// `computeMinimigConsts()` computes constants in both the positive as well as
// negative direction of the X, Y, and Z dimensions. However, for any given
// axis, our implementation splits communication and computation into two, one
// for the positive direction and another for the negative direction. This
// function extracts the first half of the constants, and optionally includes
// the center element.
fn fetchFirstHalfConsts(consts: [2 * pattern - 1]f32, self: bool) [pattern]f32 {
  var idx: u16 = 0;
  var result = @zeros([pattern]f32);

  if (!self) {
    idx += 1;
  }

  while (idx < pattern) : (idx += 1) {
    result[idx] = consts[pattern - idx - 1];
  }

  return result;
}

fn fetchSecondHalfConsts(consts: [2 * pattern - 1]f32, self: bool) [pattern]f32 {
  var idx: u16 = 0;
  var result = @zeros([pattern]f32);

  if (!self) {
    idx += 1;
  }

  while (idx < pattern) : (idx += 1) {
    result[idx] = consts[pattern + idx - 1];
  }

  return result;
}

// The sequence in which each PE receives wavelets from its neighbors depends
// on the relative placement of the PE within each group of `pattern` PEs. This
// function reorders the constants to match the sequence of source PE IDs so
// that we multiply the incoming data with the right constants.
fn permuteConsts(pattId: u16, originalConsts: [pattern]f32) [pattern]f32 {
  const start = pattId;
  var result = @zeros([pattern]f32);

  var idx: u16 = 0;
  while (idx < pattern) : (idx += 1) {
    var value: f32 = 0.0;
    if (start < idx) {
      value = originalConsts[(start + pattern) - idx];
    } else {
      value = originalConsts[start - idx];
    }

    result[idx] = value;
  }

  return result;
}
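
To make the indexing in ``consts.csl`` easier to follow, here is a plain-Python mirror of the helper functions above. It is a sketch for illustration only: the snake_case names are hypothetical, ``pattern`` is passed as an ordinary argument instead of a compile-time ``param``, and Python floats are double precision, so values will differ slightly from the ``f32`` results on the device.

```python
def compute_minimig_consts(dx):
    # Mirrors computeMinimigConsts() for pattern == 5.
    dx2 = float(dx * dx)
    c0 = -205.0 / 72.0 / dx2
    c1 = 8.0 / 5.0 / dx2
    c2 = -1.0 / 5.0 / dx2
    c3 = 8.0 / 315.0 / dx2
    c4 = -1.0 / 560.0 / dx2
    return [c4, c3, c2, c1, c0 * 3.0, c1, c2, c3, c4]


def fetch_first_half(consts, pattern, include_self):
    # Walks from the center element toward index 0 of consts.
    result = [0.0] * pattern
    for idx in range(0 if include_self else 1, pattern):
        result[idx] = consts[pattern - idx - 1]
    return result


def fetch_second_half(consts, pattern, include_self):
    # Walks from the center element toward the last index of consts.
    result = [0.0] * pattern
    for idx in range(0 if include_self else 1, pattern):
        result[idx] = consts[pattern + idx - 1]
    return result


def permute_consts(patt_id, original, pattern):
    # The two branches in permuteConsts() amount to (patt_id - idx) mod pattern.
    return [original[(patt_id - idx) % pattern] for idx in range(pattern)]


pattern = 5
consts = compute_minimig_consts(20)
first = fetch_first_half(consts, pattern, include_self=True)
second = fetch_second_half(consts, pattern, include_self=False)
permuted = permute_consts(2, [0, 1, 2, 3, 4], pattern)
print(first)
print(second)
print(permuted)  # [2, 1, 0, 4, 3]
```

Because the nine constants are symmetric about the center, the two halves differ only in whether slot 0 holds the center coefficient (``self`` is true) or stays zero (``self`` is false).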