
fixed grammatical errors on page 102
level2fast committed Nov 18, 2023
1 parent dbeace0 commit 093eea3
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion dft.tex
@@ -430,7 +430,7 @@ \section{\gls{dft} optimization}
Derive a formula for the access pattern of the 1D array $S'$, given as input the row number $i$ and column number $j$ of the array $S$. That is, how do we index into the 1D array $S'$ to access element $S(i,j)$ of the 2D array $S$?
\end{exercise}

To increase performance further we can apply techniques that are very similar to the matrix-vector multiply. Previously, we observed that increasing performance of matrix-vector multiply required partitioning the \lstinline|M[][]| array. Unfortunately, representing the $S$ matrix using the $S'$ means that there is no longer an effective way to partition $S'$ to increase the amount of data that we can read on each clock cycle. Every odd row and column of $S$ includes every element of $S'$. As a result, there is no way to partition the values of $S'$ like were able to do with $S$. The only way to increase the number of read ports from the memory that stores $S'$ is to replicate the storage. Fortunately, unlike with a memory that must be read and written, it is relatively easy to replicate the storage for an array that is only read. In fact, \VHLS will perform this optimization automatically when instantiates a \gls{rom} for an array which is initialized and then never modified. One advantage of this capability is that we can simply move the $sin()$ and $cos()$ calls into an array initialization. In most cases, if this code is at the beginning of a function and only initializes the array, then \VHLS is able to optimize away the trigonometric computation entirely and compute the contents of the ROM automatically.
To increase performance further, we can apply techniques very similar to those used for the matrix-vector multiply. Previously, we observed that increasing the performance of matrix-vector multiply required partitioning the \lstinline|M[][]| array. Unfortunately, representing the $S$ matrix using $S'$ means that there is no longer an effective way to partition $S'$ to increase the amount of data that we can read on each clock cycle. Every odd row and column of $S$ includes every element of $S'$. As a result, there is no way to partition the values of $S'$ as we were able to do with $S$. The only way to increase the number of read ports from the memory that stores $S'$ is to replicate the storage. Fortunately, unlike with a memory that must be both read and written, it is relatively easy to replicate the storage for an array that is only read. In fact, \VHLS performs this optimization automatically when it instantiates a \gls{rom} for an array that is initialized and then never modified. One advantage of this capability is that we can simply move the $sin()$ and $cos()$ calls into an array initialization. In most cases, if this code is at the beginning of a function and only initializes the array, then \VHLS is able to optimize away the trigonometric computation entirely and compute the contents of the \gls{rom} automatically.
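
As a concrete illustration, the listing below sketches this structure. It is a minimal example under stated assumptions, not a reference implementation: the transform length \lstinline|SIZE|, the \lstinline|float| data type, and the table names \lstinline|cos_table| and \lstinline|sin_table| are illustrative choices, and the index $(i \cdot j) \bmod N$ (where $N$ is the transform length) is one possible access pattern that follows from the periodicity of the \gls{dft} coefficients (compare with the exercise above). The $sin()$ and $cos()$ calls appear only in the initialization loop at the top of the function, so \VHLS can precompute the table contents and implement each array as a \gls{rom}.

\begin{lstlisting}
#include <math.h>

#define SIZE 256 /* illustrative transform length */

void dft(float sample_real[SIZE], float sample_imag[SIZE]) {
  /* Coefficient tables: written once here and only read afterwards,
     so the initialization can be evaluated at synthesis time and each
     array implemented as a ROM. */
  float cos_table[SIZE], sin_table[SIZE];
init_tables:
  for (int k = 0; k < SIZE; k++) {
    cos_table[k] = cosf(2.0f * 3.14159265f * k / SIZE);
    sin_table[k] = sinf(2.0f * 3.14159265f * k / SIZE);
  }

  float temp_real[SIZE], temp_imag[SIZE];
dft_outer:
  for (int i = 0; i < SIZE; i++) {
    float acc_real = 0.0f, acc_imag = 0.0f;
dft_inner:
    for (int j = 0; j < SIZE; j++) {
      /* S(i,j) depends only on (i*j) mod SIZE, so the 1D tables
         stand in for the full 2D S matrix. */
      int idx = (i * j) % SIZE;
      acc_real += sample_real[j] * cos_table[idx]
                + sample_imag[j] * sin_table[idx];
      acc_imag += sample_imag[j] * cos_table[idx]
                - sample_real[j] * sin_table[idx];
    }
    temp_real[i] = acc_real;
    temp_imag[i] = acc_imag;
  }
write_back:
  for (int i = 0; i < SIZE; i++) {
    sample_real[i] = temp_real[i];
    sample_imag[i] = temp_imag[i];
  }
}
\end{lstlisting}

Because \lstinline|cos_table| and \lstinline|sin_table| are never written after initialization, duplicating them to obtain additional read ports is straightforward, as discussed above.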

\begin{exercise}
Devise an architecture that utilizes $S'$ -- the 1D version of the $S$ matrix. How does this affect the required storage space? Does this change the logic utilization compared to an implementation using the 2D $S$ matrix?
