.. _devexternal:

Interfacing With External Code and Libraries
############################################

Existing host and CUDA code can interoperate seamlessly with MatX, both by using MatX primitives in existing code
and by transferring MatX data into other libraries. Integrating MatX into existing code is a common use case that
allows developers to incrementally port code to MatX without having to rewrite everything at once.

This guide is not intended for developers who wish to extend MatX; see :ref:`devguide` for the MatX developer guide.


Passing Existing Pointers to MatX
---------------------------------

To use MatX in existing code, the pointers (whether host or device) are passed into the `make_tensor` call as the
first parameter:

.. code-block:: cpp

   // Existing code
   float *my_data_ptr;
   cudaMalloc((void**)&my_data_ptr, 100 * sizeof(float)); // Treated as a 10x10 float matrix in the code
   foo(my_data_ptr); // Call existing function that uses my_data_ptr
   // Work with my_data_ptr on the device

   // End of existing code. Convert to MatX tensor
   auto matx_tensor = matx::make_tensor<float>(my_data_ptr, {10, 10});
   // MatX functions

In the code above the developer has an existing device pointer that they used in their CUDA code. It's common in existing
CUDA code to see linear allocations like the one above, even though the developer treats the data as a higher-dimension tensor
elsewhere in the code. In this example `my_data_ptr` is a linear allocation holding 100 floats, but the user later treats it as
a 10x10 matrix.

Since MatX needs to know the shape of the tensor when it's created, we explicitly pass the `{10, 10}` shape into the
`make_tensor` call.

By default MatX will not take ownership of the pointer; the user is responsible for freeing the memory when they are done with it.
This is true of all `make_tensor` calls that take an existing pointer as an argument, since the user typically has their own
memory management outside of MatX. The last parameter of each of these `make_tensor` calls is a boolean named `owning` that tells
MatX to take ownership; it defaults to *false*. By setting `owning` to *true*, MatX will free the memory when the tensor goes out
of scope. By default it uses its own allocator, but users can pass in their own PMR-compatible allocator if they wish. For more
information see :ref:`creating`.
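
As a minimal sketch of the two ownership modes (the pointer names are illustrative; the `owning` flag is the last parameter, as
described above):

.. code-block:: cpp

   float *d_a, *d_b;
   cudaMalloc((void**)&d_a, 100 * sizeof(float));
   cudaMalloc((void**)&d_b, 100 * sizeof(float));

   // Non-owning (the default): the caller must cudaFree(d_a) when done
   auto viewed = matx::make_tensor<float>(d_a, {10, 10});

   // Owning: MatX frees d_b when the tensor is destroyed
   auto owned = matx::make_tensor<float>(d_b, {10, 10}, true);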

Passing MatX Operators to External Code/Libraries
-------------------------------------------------

MatX operators can be passed to external code or libraries in two ways: by object or by pointer. Passing MatX operators by object is
the preferred way when possible. Doing so maintains all of the internal information and state that is contained in the operator and
reduces the chances of errors.

Sometimes code cannot be modified to allow passing by object. This is common when working with libraries that have an API that
cannot be changed easily, or if the overhead of passing by value is too large. MatX also allows developers to extract the pointer
from a MatX operator and pass it to external code by using the `Data()` method of a tensor. Note that unlike the "pass-by-object" method,
this method only works for tensors, since general operators do not have a data pointer.

Care must be taken when passing either operators or pointers to existing code to avoid bugs:

* The data is only valid for the lifetime of the tensor. If the tensor goes out of scope, the data backing the tensor is invalid. For
  example, if a CUDA kernel is launched asynchronously with a tensor as a parameter and the tensor goes out of scope while the kernel
  runs, the results are undefined (see the sketch after this list).
* The *kind* of the pointer must be known to the external code. For example, if the tensor was created in device memory, the external
  code must access it only where device memory is accessible.
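
A hypothetical sketch of the first pitfall (`consume_kernel`, `bad_launch`, and the shapes are illustrative only):

.. code-block:: cpp

   template <typename Op>
   __global__ void consume_kernel(Op op)
   {
     // Reads the data backing the operator while the kernel runs
     auto val = op(0, 0);
   }

   void bad_launch(float *external_ptr)
   {
     // Non-owning tensor wrapping memory that is managed elsewhere
     auto tmp = matx::make_tensor<float>(external_ptr, {10, 10});
     typename matx::detail::base_type_t<decltype(tmp)> tmp_base = tmp;
     consume_kernel<<<1, 1>>>(tmp_base);   // Asynchronous launch
     // If external_ptr is freed before the kernel finishes, the kernel
     // reads invalid memory and the results are undefined
   }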

If the external code supports the *DLPack* standard, the tensor's `ToDLPack()` method can be used instead to get a `DLManagedTensor` object.
This method is much safer since all shape and ownership information can be transferred.
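
As a sketch (assuming `ToDLPack()` returns a pointer to a `DLManagedTensor` and `consume_dlpack` is a hypothetical external
consumer):

.. code-block:: cpp

   // Hypothetical external API that understands DLPack
   void consume_dlpack(DLManagedTensor *dl);

   auto t = matx::make_tensor<float>({10, 10});
   auto dl = t.ToDLPack();
   // Shape, strides, and a deleter travel with the object; the consumer
   // calls dl->deleter(dl) when it no longer needs the memory
   consume_dlpack(dl);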


Passing By Object
=================

Passing by object makes all of the object's metadata available inside of an external function. Since operator types can be very complex, it's
always recommended to pass the operator as a template parameter rather than specifying the type of the operator:

.. code-block:: cpp

   template <typename Op>
   void foo(Op &op)
   {
     // Do something with the operator
     auto val = op(9, 1);
   }

   template <typename Op>
   __global__ void foo_kernel(Op op)
   {
     // Do something with the operator
     auto val = op(9, 1);
   }

   // Create a MatX operator
   auto t1 = matx::make_tensor<float>({10, 10});
   auto t2 = matx::make_tensor<float>({10, 10});
   auto o1 = (t1 + t2) * 2.0f;

   foo(o1);

   typename matx::detail::base_type_t<decltype(o1)> o1_base = o1;
   foo_kernel<<<1,1>>>(o1_base);

The first function `foo` is a host function that takes a MatX operator by reference through a template parameter, while `foo_kernel` is
a CUDA kernel that takes the operator by value. When passing an operator to a CUDA kernel it should always be passed by value
unless the operator's memory is accessible on the device. The template parameter allows the user to pass any operator that
adheres to the operator interface. This is a powerful concept that reduces the need for code changes if the type
of the operator changes. For example, changing the `o1` statement to `t1 - t2` would change the type of the operator, but using
templates allows the same code in `foo` to work without changes.
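
For example (`o2` below is illustrative):

.. code-block:: cpp

   auto o2 = t1 - t2;   // A different operator type than o1
   foo(o2);             // The same templated foo works unchanged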

For more information about the *operator interface*, see :ref:`concepts`.

Inside of both `foo` and `foo_kernel` all functions in the *operator interface* are available. `op(9, 1)` will return the value
at the 10th row and 2nd column of the operator (0-based indexing). Using `operator()` inside of the operator handles all the
indexing logic for the operator's shape and strides.

The last part to mention in the code is the declaration of `o1_base`. Some operator types in MatX, such as a `tensor_t`, cannot
be passed directly to a CUDA kernel due to internal types that cannot be used on the device. The `base_type_t` type trait will
convert the operator to a type that can be used on the device if needed, or it will return the same type if it's already usable
on the device.

Passing By Pointer
==================

In the code above `t1` and `t2` could have their pointers extracted, but `o1` could not. For that reason, passing by raw pointer
works only with tensors and not other operators.

.. code-block:: cpp

   #include <matx.h>

   // Existing function
   void foo(float *data);

   // Create a MatX tensor in managed memory
   auto t1 = matx::make_tensor<float>({10, 10});

   // MatX processing code

   // Existing code
   foo(t1.Data());

The above example shows an existing function `foo` taking a pointer from the MatX tensor `t1`. Since only a pointer is available,
none of the metadata in the operator (shape, strides, etc.) is available inside of the function, and the user must ensure the
pointer is used correctly.
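
If the external API can be extended, a common pattern is to pass the shape alongside the pointer (a sketch; `foo2` is
hypothetical):

.. code-block:: cpp

   // Hypothetical variant of foo that also receives the shape
   void foo2(float *data, matx::index_t rows, matx::index_t cols);

   foo2(t1.Data(), t1.Size(0), t1.Size(1));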
