- CUDA kernel launch configurations have been tuned to improve performance and now allow for high resolution in the vertical direction. PR #1969; issue #1854 closed.
- DSS was refactored, and machine-precision changes can be expected. PR #1958.
- Added hyperbolic tangent stretching. PR #1930.
- Support for matrix fields on spectral and point spaces was added. PR #1884.
- Support for 3-component DSS transform was added. PR #1693.
- Support for column-wise "accumulate"/"reduce" operations was added. PR #1903. These abstractions will allow us to group, parallelize, and optimize more column-wise work on the GPU.
- A new macro, `Fields.@rprint_diff`, was added, which recursively prints differences between two `FieldVector`s (of the same type). PR #1886.
- Julia 1.11 fixes. PR #1883.
- `Nh` has been added to the type parameter space, which allows us to more flexibly write performant backend kernels (PR #1894). This was leveraged in PR #1898, and may result in slightly more performant kernels.
- Various performance tweaks (PRs #1840, #1837, #1843, #1839).
- CPU/GPU kernels are now determined by dispatching, instead of specializing, which should (hopefully) have generally fixed GPU dispatching issues (PR #1863).
- Matrix multiplication kernels have been improved (PR #1880).
- Support for the following methods has been deprecated (PR #1821):
  - `IntervalTopology(::Mesh)` in favor of `IntervalTopology(::ClimaComms.AbstractDevice, ::Mesh)`
  - `FaceFiniteDifferenceSpace(::Mesh)` in favor of `FaceFiniteDifferenceSpace(::ClimaComms.AbstractDevice, ::Mesh)`
  - `CenterFiniteDifferenceSpace(::Mesh)` in favor of `CenterFiniteDifferenceSpace(::ClimaComms.AbstractDevice, ::Mesh)`
  - `FiniteDifferenceGrid(::Mesh)` in favor of `FiniteDifferenceGrid(::ClimaComms.AbstractDevice, ::Mesh)`
- GPU dispatching with `copyto!` and `fill!` has been fixed. PR #1802.
- Added `FieldMatrixWithSolver`, a wrapper that helps define implicit Jacobians. PR #1788.
- Added `array2field(::Field)` and `field2array(::Field)` convenience functions, to help facilitate use with RRTMGP. PR #1768.
- `Nv` is now a type parameter in DataLayouts that have vertical levels. As a result, users can use `DataLayouts.nlevels(::AbstractData)` to obtain a compile-time constant for the number of vertical levels.
- Added `interpolate(field, target_hcoords, target_zcoord)` convenience function so that the `Remapper` does not have to be explicitly constructed. PR #1764.
- `run_field_matrix_solver!` was fixed for column spaces, and tests were added to ensure it doesn't break in the future. PR #1750.
- We're now using local memory (MArrays) in `band_matrix_solve!`, which has improved performance. PR #1735.
- We've specialized some cases in `run_field_matrix_solver!`, which results in more efficient kernels being launched. PR #1732.
- We've reduced memory reads in `band_matrix_solve!` for tridiagonal systems, improving its performance. PR #1731.
- We've added NVTX annotations in ClimaCore functions, so that we have a more granular trace of performance. PRs #1726, #1723.
- Extended `adapt_structure` for all operator and boundary condition types. Also use `unrolled_map` in `multiply_matrix_at_index` to avoid the recursive inference limit when compiling nested matrix operations. PR #1684.
- `Remapper`s can now process multiple `Field`s at the same time if created with some `buffer_length > 1`. PR #1669. Machine-precision differences are expected. This change is breaking because remappers now return the same array type as the input field.
- We inlined the `multiple_field_solve` kernels, which should improve performance. PR #1715.
- We added support for MultiBroadcastFusion, which allows users to fuse similar space point-wise broadcast expressions via `Fields.@fused_direct`. PR #1641.
- We fixed some fieldvector broadcasting on Julia 1.9. PR #1658.
- We fixed an inference failure with matrix field broadcasting. PR #1683.
- We now always inline for all ClimaCore kernels. PR #1647. This can result in more brittle inference (due to compiler heuristics). Technically, this is not a breaking change, but some code changes may be needed in practice.
- Fixed array allocation for interpolation on CPU. PR #1643.
- Fixed an edge case in interpolation that led to incorrect vertical interpolation. PR #1640.
- Fixed `interpolate!` for MPI runs. PR #1642.
- Support for many deprecated methods has been dropped. PR #1632.
- Slight performance improvement by replacing `rdiv` with `rmul`. PR #1496. Machine-precision differences are expected.
- Rewritten `distributed_remapping`. The new `distributed_remapping` is non-allocating and up to 1000x faster (on GPUs). It no longer supports the `physical_z` argument (this option is still available in `Remapping.interpolate_column`). A new `interpolate!` function is available for remapping in-place. The new preferred way to define a `Remapper` is `Remapper(space, target_hcoords, target_zcoords)` (instead of `Remapper(target_hcoords, target_zcoords, space)`). PR #1630.