The RawC and GGML backends do not currently support static inference. Both backends rely on dynamic execution even when all inputs to an operation are constant. Static inference would allow such operations to be pre-computed at compile time, avoiding that work at runtime.
Motivation
When constant inputs are supplied, this feature would eliminate the need to execute the affected operations dynamically, improving efficiency and reducing runtime overhead.
Proposed Solution
1. RawC Backend
Develop Python wrapper functions to execute supported operations directly on the RawC backend when static inputs are supplied.
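A wrapper of this kind might look like the following minimal sketch. It assumes a RawC kernel with the C signature `void add(const float *left, const float *right, float *out, size_t n)`; that signature, the name `static_add`, and the stand-in kernel `c_add` are hypothetical and used only for illustration. In a real setup `c_add` would be loaded from the compiled backend library rather than built from a Python callback.

```python
import ctypes

# Assumed (hypothetical) C signature of a RawC kernel:
#   void add(const float *left, const float *right, float *out, size_t n);
ADD_PROTO = ctypes.CFUNCTYPE(
    None,
    ctypes.POINTER(ctypes.c_float),
    ctypes.POINTER(ctypes.c_float),
    ctypes.POINTER(ctypes.c_float),
    ctypes.c_size_t,
)


def _add_impl(left, right, out, n):
    # Stand-in body so the sketch runs without a compiled library.
    for i in range(n):
        out[i] = left[i] + right[i]


# Stand-in for a function pointer obtained from the backend's shared library.
c_add = ADD_PROTO(_add_impl)


def static_add(left, right):
    """Run the kernel once on constant inputs and return a Python list."""
    if len(left) != len(right):
        raise ValueError("inputs must have the same length")
    n = len(left)
    FloatArray = ctypes.c_float * n
    c_left = FloatArray(*left)    # Python values -> C array
    c_right = FloatArray(*right)
    c_out = FloatArray()          # zero-initialised output buffer
    c_add(c_left, c_right, c_out, n)
    return list(c_out)            # C array -> Python values
```

Calling `static_add([1.0, 2.0], [3.0, 4.5])` performs the computation once, so the result can be stored as a constant instead of being recomputed on every run.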
2. GGML Backend
Utilize RawC backend operations as the basis for computations.
Convert GGML arrays to C arrays before passing them to functions, and convert the results back to GGML arrays.
In GGML code generation, bypass tensor creation and graph marking for statically inferred keys by directly assigning these keys to the output.
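The code-generation step above can be sketched roughly as follows. This is only an illustration of the branching, not the project's generator: the function `emit_output_lines` and the emitted names `create_tensor_<key>`, `mark_in_graph`, and `static_<key>` are hypothetical placeholders and do not refer to actual GGML calls.

```python
def emit_output_lines(output_keys, static_keys):
    """Return generated C-like lines for each output key.

    Keys in `static_keys` are assumed to have been pre-computed already,
    so they are assigned directly to the output; all other keys go
    through tensor creation and graph marking.
    """
    lines = []
    for key in output_keys:
        if key in static_keys:
            # Statically inferred: skip tensor creation and graph marking.
            lines.append(f"outputs->{key} = static_{key};")
        else:
            # Dynamic: hypothetical placeholders for the usual flow.
            lines.append(f"outputs->{key} = create_tensor_{key}(ctx);")
            lines.append(f"mark_in_graph(graph, outputs->{key});")
    return lines
```

For example, `emit_output_lines(["out", "aux"], {"aux"})` produces the two-line dynamic flow for `out` and a single direct assignment for `aux`.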
Alternatives Considered
An alternative approach for the GGML backend would involve creating a separate dynamic library to manage the GGML flow for tensor operations. However, this method would require context and memory buffer allocation for each static inference, potentially offsetting the performance benefits.
Additional Context
emrecakmakyurdu changed the title from "[FEATURE] Add Static Inference Support for RawC and GGML" to "[FEATURE] Static Inference Support for RawC and GGML" on Mar 9, 2025