Commit b7edf77 (parent 66a1eb6)

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

update the examples for SDK version 1.2.0

Showing 185 changed files with 9,872 additions and 2,034 deletions.
@@ -0,0 +1,66 @@
25-Point Stencil
================

The stencil code is a time-marching app that requires the following three inputs:

- scalar ``iterations``: number of time steps
- tensor ``vp``: velocity field
- tensor ``source``: source term

and produces the following three outputs:

- maximum and minimum values of the vector field at the last time step, two f32 per PE
- timestamps of the time marching, three uint32 per PE
- vector field ``z`` at the last time step, ``zdim`` f32 per PE

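The per-PE counts above translate directly into host buffer sizes. The following is a minimal sketch that assumes a 10 × 10 grid of PEs and ``zdim`` = 10, matching the compile script's ``width:10,height:10,zDim:10``. The helper ``output_buffer_sizes`` is hypothetical and not part of the SDK.

```python
def output_buffer_sizes(width: int, height: int, zdim: int) -> dict:
    """Element counts the host receives per output (hypothetical helper)."""
    num_pes = width * height
    return {
        "minmax_f32": 2 * num_pes,      # max and min of the field, per PE
        "timestamps_u32": 3 * num_pes,  # three uint32 timestamps, per PE
        "z_f32": zdim * num_pes,        # last time step of z, per PE
    }


sizes = output_buffer_sizes(width=10, height=10, zdim=10)
print(sizes)  # {'minmax_f32': 200, 'timestamps_u32': 300, 'z_f32': 1000}
```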
The stencil code uses 21 colors and task IDs for communication patterns,
and ``SdkRuntime`` reserves 6 colors,
so only 4 colors are left for ``streaming`` H2D/D2H transfers
and for entrypoints used in control flow.
We use one color (color 0) to launch kernel functions
and one entrypoint (color 2) to trigger the time marching.
The ``copy`` mode of memcpy is used for the two input tensors
and for the two device-to-host transfers.

First, after the simulator (or WSE) has been launched,
we send the input tensors ``vp`` and ``source`` to the device in ``copy`` mode.

Second, we launch the time marching with the argument ``iterations``.

In this example, there are two kernel launches.
One performs the time marching after ``vp`` and ``source`` have been received,
and the other prepares the output data from ``zValues``.
The former has the function symbol ``f_activate_comp``
and the latter has the function symbol ``f_prepare_zout``.
Here ``SdkRuntime.launch()`` triggers a host-callable function:
its first argument is the function symbol, such as ``f_activate_comp``,
and its second argument is ``iterations``,
which ``f_activate_comp`` receives as its own argument.

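One way to picture the launch mechanism is as a lookup from a function symbol to a host-callable function, with the remaining launch arguments forwarded to that function. The plain-Python model below illustrates this idea. ``FakeDevice`` and its methods are hypothetical stand-ins, not the ``SdkRuntime`` API.

```python
class FakeDevice:
    """Hypothetical model of a device exposing host-callable functions by symbol."""

    def __init__(self):
        self.iterations = None
        self.log = []
        # Map function symbols to device-side handlers.
        self.host_callable = {
            "f_activate_comp": self.f_activate_comp,
            "f_prepare_zout": self.f_prepare_zout,
        }

    def f_activate_comp(self, iterations):
        # The launch argument arrives as a parameter of the host-callable function.
        self.iterations = iterations
        self.log.append(f"f_activate_comp({iterations})")

    def f_prepare_zout(self):
        self.log.append("f_prepare_zout()")

    def launch(self, symbol, *args):
        if symbol not in self.host_callable:
            raise KeyError(f"unknown function symbol: {symbol}")
        self.host_callable[symbol](*args)


dev = FakeDevice()
dev.launch("f_activate_comp", 10)
dev.launch("f_prepare_zout")
print(dev.log)  # ['f_activate_comp(10)', 'f_prepare_zout()']
```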
At the end of time marching, ``f_checkpoint()`` in ``task.csl``
records the maximum and minimum values
of the vector field and the timing info into an array ``d2h_buf_f32``.
The host calls ``memcpy_d2h()`` to receive the data in ``d2h_buf_f32``.

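Because ``d2h_buf_f32`` carries both floating-point extrema and integer timing data, the host has to reinterpret part of each 32-bit word. The sketch below decodes such a buffer with the standard ``struct`` module. The per-PE layout used here (two f32 followed by three uint32, little-endian) is an assumption for illustration only. The actual ordering is defined in ``task.csl`` and the host script, which are not shown here.

```python
import struct

# Assumed per-PE layout (hypothetical): max, min as f32, then three uint32
# timestamps, all packed as consecutive 32-bit little-endian words.
RECORD_FMT = "<2f3I"
RECORD_SIZE = struct.calcsize(RECORD_FMT)  # 20 bytes


def decode_records(buf: bytes):
    """Decode a D2H buffer into one (max, min, timestamps) tuple per PE."""
    if len(buf) % RECORD_SIZE:
        raise ValueError("buffer length is not a multiple of the record size")
    out = []
    for zmax, zmin, t0, t1, t2 in struct.iter_unpack(RECORD_FMT, buf):
        out.append((zmax, zmin, (t0, t1, t2)))
    return out


# Fake buffer for two PEs.
buf = struct.pack(RECORD_FMT, 1.5, -0.25, 10, 20, 30) + struct.pack(RECORD_FMT, 2.0, -1.0, 11, 21, 31)
print(decode_records(buf))  # [(1.5, -0.25, (10, 20, 30)), (2.0, -1.0, (11, 21, 31))]
```

Decoding the raw bytes directly avoids round-tripping integer bit patterns through Python floats. That round trip can silently alter patterns that happen to encode NaN values.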
To receive the vector field of the last time step,
the host calls ``SdkRuntime.launch()`` to run ``f_prepare_zout()``,
which copies this data into a temporary array ``zout``.
The copy is needed because the result is in either ``zValues[0, :]`` or ``zValues[1, :]``.

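A common reason for a result to live in one of two rows is ping-pong (double) buffering: each time step reads one row and writes the other. This is an assumption here, not something the text confirms. The pure-Python sketch below uses a hypothetical ``step`` update to show that the row holding the final result then depends on the parity of the iteration count. Copying that row into a separate output gives the host a single fixed location to read.

```python
def run_ping_pong(initial, iterations, step):
    """Alternate between two rows; return (buffers, index of the row with the result)."""
    buffers = [list(initial), [0.0] * len(initial)]
    src = 0
    for _ in range(iterations):
        dst = 1 - src
        buffers[dst] = step(buffers[src])
        src = dst
    return buffers, src


def step(z):
    # Hypothetical update: double every value.
    return [2.0 * v for v in z]


for iters in (3, 4):
    bufs, idx = run_ping_pong([1.0, 2.0], iters, step)
    zout = list(bufs[idx])  # copy the valid row into a contiguous output
    print(iters, idx, zout)
# 3 1 [8.0, 16.0]
# 4 0 [16.0, 32.0]
```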
The last operation, ``memcpy_d2h()``, sends the array ``zout`` back to the host.

When ``f_activate_comp`` is launched, it triggers the entrypoint ``f_comp()``
to start the time marching and to record the starting time.

After each time step, the function ``epilog()`` checks
``iterationCount``.
Once it reaches the given ``iterations``, ``epilog()`` triggers the entrypoint
``CHECKPOINT`` to prepare the data for the first ``memcpy_d2h()``.

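The control flow runs from ``f_comp()``, through repeated time steps and ``epilog()``, to the ``CHECKPOINT`` entrypoint. It can be modeled as a small task queue in which each task activates the next one. This is a plain-Python sketch, not CSL. The class ``TimeMarcher`` is hypothetical, and the assumption that ``CHECKPOINT`` runs ``f_checkpoint()`` follows the description above.

```python
from collections import deque


class TimeMarcher:
    """Hypothetical model of f_comp -> time steps -> epilog -> CHECKPOINT."""

    def __init__(self, iterations):
        self.iterations = iterations
        self.iteration_count = 0
        self.events = []
        self.ready = deque()  # tasks activated but not yet run

    def activate(self, task):
        self.ready.append(task)

    def run(self):
        while self.ready:
            self.ready.popleft()()

    def f_comp(self):
        self.events.append("start timer")
        self.activate(self.time_step)

    def time_step(self):
        self.iteration_count += 1  # the stencil update itself is omitted
        self.activate(self.epilog)

    def epilog(self):
        # '>=' rather than '==' so a zero iteration count cannot loop forever.
        if self.iteration_count >= self.iterations:
            self.activate(self.f_checkpoint)
        else:
            self.activate(self.time_step)

    def f_checkpoint(self):
        self.events.append(f"checkpoint after {self.iteration_count} steps")


tm = TimeMarcher(iterations=10)
tm.activate(tm.f_comp)
tm.run()
print(tm.events)  # ['start timer', 'checkpoint after 10 steps']
```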
The function ``f_checkpoint()`` calls ``unblock_cmd_stream()`` to process the
next operation, which is the first ``memcpy_d2h()``.
Without ``unblock_cmd_stream()``, the program stalls because the
``memcpy_d2h()`` is never scheduled.

The function ``f_prepare_zout()`` copies the vector field into ``zout``.
It also calls ``unblock_cmd_stream()`` to process the next operation, which is
the second ``memcpy_d2h()``.
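The stall can be reproduced with a toy command queue in which a launch blocks all later host commands until the launched device code signals that the stream may continue. In the plain-Python sketch below, ``CommandStream`` and ``make_handlers`` are hypothetical. Only the order of host operations is taken from the text above.

```python
class CommandStream:
    """Hypothetical model: a launch blocks the stream until the device unblocks it."""

    def __init__(self, commands):
        self.pending = list(commands)
        self.blocked = False
        self.executed = []

    def unblock_cmd_stream(self):
        self.blocked = False

    def drain(self, device_handlers):
        """Run commands until the queue is empty or the stream stays blocked."""
        while self.pending and not self.blocked:
            cmd = self.pending.pop(0)
            self.executed.append(cmd)
            if cmd.startswith("launch "):
                self.blocked = True
                # The launched function runs on the device and may unblock the stream.
                device_handlers[cmd[len("launch "):]](self)
        return not self.pending  # True if every command was scheduled


HOST_COMMANDS = [
    "memcpy_h2d vp",
    "memcpy_h2d source",
    "launch f_activate_comp",
    "memcpy_d2h d2h_buf_f32",
    "launch f_prepare_zout",
    "memcpy_d2h zout",
]


def make_handlers(call_unblock):
    def f_activate_comp(stream):
        # Time marching runs; f_checkpoint() fires at the end and may unblock.
        if call_unblock:
            stream.unblock_cmd_stream()

    def f_prepare_zout(stream):
        if call_unblock:
            stream.unblock_cmd_stream()

    return {"f_activate_comp": f_activate_comp, "f_prepare_zout": f_prepare_zout}


ok = CommandStream(HOST_COMMANDS)
print(ok.drain(make_handlers(True)))  # True
stalled = CommandStream(HOST_COMMANDS)
print(stalled.drain(make_handlers(False)), stalled.pending[0])  # False memcpy_d2h d2h_buf_f32
```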
@@ -0,0 +1,112 @@
# Copyright 2024 Cerebras Systems.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is not a real test, but a module that gets imported in other tests.

"""Parse the command line for the 25-point stencil example.
  --name <str>            the test name
  --zDim <int>            size of the Z dimension
  --size <int>            size of the domain in the X and Y dimensions
  --skip-compile          skip compilation of the code from Python
  --skip-run              skip running the code from Python
  --iterations <int>      number of time steps to simulate
  --dx <int>              dx value (impacting the boundary)
  --fabric_width <int>    width of the fabric to compile for
  --fabric_height <int>   height of the fabric to compile for
  --cmaddr <IP:port>      address of the CS system
  --debug                 enable debug output
  --width-west-buf <int>  width of the west buffer
  --width-east-buf <int>  width of the east buffer
  --n_channels <int>      number of memcpy channels
"""


import argparse


SIZE = 10
ZDIM = 10
ITERATIONS = 10
DX = 20


def parse_args():
  parser = argparse.ArgumentParser()

  parser.add_argument('--name', help='the test name')
  parser.add_argument(
      '--zDim', type=int, help='size of the Z dimension', default=ZDIM
  )
  parser.add_argument(
      '--size', type=int, help='size of the domain in x and y dims', default=SIZE
  )

  parser.add_argument(
      '--skip-compile', action="store_true",
      help='Skip compilation of the code from python'
  )

  parser.add_argument(
      '--skip-run', action="store_true",
      help='Skip run of the code from python'
  )

  parser.add_argument(
      '--iterations',
      type=int,
      help='number of timesteps to simulate',
      default=ITERATIONS
  )

  parser.add_argument(
      '--dx',
      type=int,
      help='dx value (impacting the boundary)', default=DX
  )

  parser.add_argument(
      '--fabric_width',
      type=int,
      help='Width of the fabric we are compiling for',
  )

  parser.add_argument(
      '--fabric_height',
      type=int,
      help='Height of the fabric we are compiling for',
  )

  parser.add_argument('--cmaddr', help='IP:port for CS system')

  parser.add_argument(
      "--debug",
      help="show A, x, and b", action="store_true"
  )

  parser.add_argument(
      "--width-west-buf",
      default=0, type=int,
      help="width of west buffer")
  parser.add_argument(
      "--width-east-buf",
      default=0, type=int,
      help="width of east buffer")
  # Adjacent string literals are concatenated without the stray indentation
  # that a backslash continuation inside the string would embed.
  parser.add_argument(
      "--n_channels",
      default=1, type=int,
      help="Number of memcpy \"channels\" (LVDS/streamers for both input and output) to use "
           "when memcpy support is compiled with this program. If this argument is not present, "
           "or is 0, then the previous single-LVDS version is compiled.")

  args = parser.parse_args()

  return args
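As a usage sketch, the flags that the compile-and-run script passes to ``run.py`` can be fed to a parser configured like the one above. Only a subset of the arguments is reproduced here, so this is not the module itself. ``argparse`` replaces dashes with underscores in attribute names but keeps the original case, so ``--skip-compile`` becomes ``args.skip_compile`` while ``--zDim`` stays ``args.zDim``.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--name', help='the test name')
parser.add_argument('--zDim', type=int, default=10)
parser.add_argument('--size', type=int, default=10)
parser.add_argument('--skip-compile', action='store_true')
parser.add_argument('--iterations', type=int, default=10)
parser.add_argument('--dx', type=int, default=20)
parser.add_argument('--width-west-buf', type=int, default=0)

# Same flags as the cs_python command in the shell script.
args = parser.parse_args(['--name', 'out', '--iterations=10', '--dx=20', '--skip-compile'])
print(args.name, args.iterations, args.dx, args.skip_compile, args.zDim, args.width_west_buf)
# out 10 20 True 10 0
```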
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

set -e

cslc ./layout.csl --arch=wse2 --fabric-dims=17,12 --fabric-offsets=4,1 \
  -o=out_code --params=width:10,height:10,zDim:10,sourceLength:10,dx:20 \
  --params=srcX:0,srcY:0,srcZ:0 --verbose --memcpy --channels=1 \
  --width-west-buf=0 --width-east-buf=0
cs_python run.py --name out \
  --iterations=10 --dx=20 --skip-compile
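The fabric dimensions in this script line up with the problem size: 17 = 10 + 7 and 12 = 10 + 2, with the kernel placed at offset (4, 1). The helper below encodes that relationship for the zero-width west/east buffers used here. Treat it as an assumption inferred from these numbers, not a documented rule; it does not model nonzero buffer widths. ``min_fabric_dims`` and ``cslc_fabric_flags`` are hypothetical names.

```python
def min_fabric_dims(width: int, height: int) -> tuple:
    """Hypothetical helper: fabric size implied by the script for zero-width buffers."""
    # Margins inferred from --fabric-dims=17,12 with a 10 x 10 kernel.
    return width + 7, height + 2


def cslc_fabric_flags(width: int, height: int, offsets=(4, 1)) -> str:
    """Format the fabric-related cslc flags for a width x height kernel."""
    fw, fh = min_fabric_dims(width, height)
    return f"--fabric-dims={fw},{fh} --fabric-offsets={offsets[0]},{offsets[1]}"


print(cslc_fabric_flags(10, 10))  # --fabric-dims=17,12 --fabric-offsets=4,1
```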
@@ -0,0 +1,107 @@
// Copyright 2024 Cerebras Systems.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

param pattern: u16;
param paddedZDim: u16;

const math = @import_module("<math>");
// We need to allocate space for not just the (padded) size of the problem (in
// the Z dimension), but also space for ghost cells.
const zBufferSize = paddedZDim + 2 * (pattern - 1);

fn initBuffer() [2, zBufferSize]f32 {
  return @zeros([2, zBufferSize]f32);
}

// Minimig - main.c:15-23, target_3d.c:23, and target_3d.c:30
fn computeMinimigConsts(dx: u16) [9]f32 {
  @comptime_assert(pattern == 5);
  const dx2:f32 = @as(f32, dx * dx);
  const c0:f32 = -205.0 / 72.0 / dx2;
  const c1:f32 = 8.0 / 5.0 / dx2;
  const c2:f32 = -1.0 / 5.0 / dx2;
  const c3:f32 = 8.0 / 315.0 / dx2;
  const c4:f32 = -1.0 / 560.0 / dx2;

  return [9]f32 {
    c4,
    c3,
    c2,
    c1,
    c0 * 3.0,
    c1,
    c2,
    c3,
    c4,
  };
}

// `computeMinimigConsts()` computes constants in both the positive as well as
// negative direction of the X, Y, and Z dimensions. However, for any given
// axis, our implementation splits communication and computation into two, one
// for the positive direction and another for the negative direction. This
// function extracts the first half of the constants, and optionally includes
// the center element.
fn fetchFirstHalfConsts(consts: [2 * pattern - 1]f32, self: bool) [pattern]f32 {
  var idx: u16 = 0;
  var result = @zeros([pattern]f32);

  if (!self) {
    idx += 1;
  }

  while (idx < pattern) : (idx += 1) {
    result[idx] = consts[pattern - idx - 1];
  }

  return result;
}

fn fetchSecondHalfConsts(consts: [2 * pattern - 1]f32, self: bool) [pattern]f32 {
  var idx: u16 = 0;
  var result = @zeros([pattern]f32);

  if (!self) {
    idx += 1;
  }

  while (idx < pattern) : (idx += 1) {
    result[idx] = consts[pattern + idx - 1];
  }

  return result;
}

// The sequence in which each PE receives wavelets from its neighbors depends
// on the relative placement of the PE within each group of `pattern` PEs. This
// function reorders the constants to match the sequence of source PE IDs so
// that we multiply the incoming data with the right constants.
fn permuteConsts(pattId: u16, originalConsts: [pattern]f32) [pattern]f32 {
  const start = pattId;
  var result = @zeros([pattern]f32);

  var idx: u16 = 0;
  while (idx < pattern) : (idx += 1) {
    var value: f32 = 0.0;
    if (start < idx) {
      value = originalConsts[(start + pattern) - idx];
    } else {
      value = originalConsts[start - idx];
    }

    result[idx] = value;
  }

  return result;
}
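The five constants in ``computeMinimigConsts()`` are the standard eighth-order central-difference weights for a second derivative, each divided by dx². The center weight is multiplied by 3 because the X, Y, and Z second derivatives share one center point, which gives 1 + 3 × 8 = 25 points. The Python sketch below is an independent check, not part of the example. It rebuilds the 1-D weights with exact fractions and verifies two properties: the weights sum to zero, and they differentiate x² exactly. ``minimig_weights_1d`` is a hypothetical name.

```python
from fractions import Fraction as F


def minimig_weights_1d(dx: int):
    """Nine 1-D weights [c4, c3, c2, c1, c0, c1, c2, c3, c4] scaled by 1/dx^2."""
    dx2 = F(dx * dx)
    c0 = F(-205, 72) / dx2
    c1 = F(8, 5) / dx2
    c2 = F(-1, 5) / dx2
    c3 = F(8, 315) / dx2
    c4 = F(-1, 560) / dx2
    return [c4, c3, c2, c1, c0, c1, c2, c3, c4]


dx = 20
w = minimig_weights_1d(dx)
offsets = range(-4, 5)
# A constant function has zero second derivative.
print(sum(w))  # 0
# f(x) = x^2 has second derivative 2 everywhere; check at x = 0.
print(sum(wk * (k * dx) ** 2 for wk, k in zip(w, offsets)))  # 2
```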
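To see how the CSL helpers index into the nine constants, here is a direct Python transcription of ``fetchFirstHalfConsts()``, ``fetchSecondHalfConsts()``, and ``permuteConsts()`` for ``pattern = 5``. It is an illustration, not shipped code. The input uses placeholder values 10 to 18, so each output entry shows which input position it came from (value minus 10), and an unfilled slot stays 0. ``permuteConsts()`` turns out to be a cyclic rotation: entry ``idx`` takes ``originalConsts[(pattId - idx) mod pattern]``.

```python
PATTERN = 5


def fetch_first_half(consts, include_self):
    result = [0] * PATTERN
    idx = 0 if include_self else 1
    while idx < PATTERN:
        result[idx] = consts[PATTERN - idx - 1]  # walk from the center toward index 0
        idx += 1
    return result


def fetch_second_half(consts, include_self):
    result = [0] * PATTERN
    idx = 0 if include_self else 1
    while idx < PATTERN:
        result[idx] = consts[PATTERN + idx - 1]  # walk from the center toward the end
        idx += 1
    return result


def permute(patt_id, original):
    result = [0] * PATTERN
    for idx in range(PATTERN):
        if patt_id < idx:
            result[idx] = original[(patt_id + PATTERN) - idx]
        else:
            result[idx] = original[patt_id - idx]
    return result


consts = list(range(10, 10 + 2 * PATTERN - 1))  # placeholder values 10..18
print(fetch_first_half(consts, True))    # [14, 13, 12, 11, 10]
print(fetch_second_half(consts, False))  # [0, 15, 16, 17, 18]
print(permute(2, [10, 11, 12, 13, 14]))  # [12, 11, 10, 14, 13]
```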