Merge branch 'develop' into ic-mm-comms-overlap

OrderN · Jan 10, 2024 · 4a71353
2 parents 4d77876 + 4939b8b
commit 4a71353
Showing 26 changed files with 516,533 additions and 206 deletions.
diff --git a/benchmarks/matrix_multiply/Conquest_input b/benchmarks/matrix_multiply/Conquest_input
@@ -1,6 +1,7 @@
 AtomMove.TypeOfRun static
 IO.Coordinates coords.dat
 IO.Iprint 0
+IO.WriteOutToFile F
 Grid.GridCutoff  200
 DM.SolutionMethod ordern
 DM.L_range 16.0

diff --git a/benchmarks/matrix_multiply/README.md b/benchmarks/matrix_multiply/README.md
@@ -7,4 +7,8 @@ The additional coordinate files `si_XYZ.xtl` can be used to test weak scaling an
 would work well for increasing the number of nodes: `si_222.xtl` is the same as `coords.dat`
 and has 64 atoms. This means it would run well on anywhere from 2MPI/4OpenMP to 8MPI/1OpenMP.
 With the rest of the `xtl` files, we double the number of atoms each time, and would need
-to double the number of processes.
+to double the number of processes.
+
+We now have systems from 64 atoms (222) to 262,144 atoms (323232). These scale from 8 to
+32,768 MPI processes (1 OpenMP thread) at 8 atoms per process, or from 1 to 4,096 MPI
+processes at 64 atoms per process.
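
The process counts quoted in the README addition follow from simple division. The sketch below reproduces them; it assumes 8 atoms per unit cell (inferred from `si_222.xtl` holding 64 atoms), and the function names are hypothetical, not part of CONQUEST.

```python
# Assumption: 8 atoms per unit cell, inferred from si_222.xtl having 64 atoms.
ATOMS_PER_CELL = 8


def atoms_in_supercell(nx: int, ny: int, nz: int) -> int:
    """Number of atoms in an nx x ny x nz supercell (e.g. si_222 -> 2, 2, 2)."""
    return ATOMS_PER_CELL * nx * ny * nz


def mpi_processes(n_atoms: int, atoms_per_process: int) -> int:
    """MPI processes needed to keep a fixed number of atoms on each process."""
    if n_atoms % atoms_per_process != 0:
        raise ValueError("atoms do not divide evenly over processes")
    return n_atoms // atoms_per_process


smallest = atoms_in_supercell(2, 2, 2)      # si_222.xtl
largest = atoms_in_supercell(32, 32, 32)    # si_323232.xtl

for atoms_per_process in (8, 64):
    low = mpi_processes(smallest, atoms_per_process)
    high = mpi_processes(largest, atoms_per_process)
    print(f"{atoms_per_process} atoms/process: {low} to {high} MPI processes")
```

Each doubling of the atom count doubles the process count at fixed atoms per process, which is the weak-scaling pattern the README describes.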
diff --git a/benchmarks/matrix_multiply/si_161616.xtl b/benchmarks/matrix_multiply/si_161616.xtl
diff --git a/benchmarks/matrix_multiply/si_16168.xtl b/benchmarks/matrix_multiply/si_16168.xtl
diff --git a/benchmarks/matrix_multiply/si_1688.xtl b/benchmarks/matrix_multiply/si_1688.xtl
diff --git a/benchmarks/matrix_multiply/si_321616.xtl b/benchmarks/matrix_multiply/si_321616.xtl
diff --git a/benchmarks/matrix_multiply/si_323216.xtl b/benchmarks/matrix_multiply/si_323216.xtl
diff --git a/benchmarks/matrix_multiply/si_323232.xtl b/benchmarks/matrix_multiply/si_323232.xtl
diff --git a/benchmarks/water_64mols/Conquest_input b/benchmarks/water_64mols/Conquest_input
@@ -2,6 +2,7 @@ IO.Title    Water static test, DZ, GridCutoff=50Ha
 IO.Coordinates    H2O_coord.in
 IO.FractionalAtomicCoords F
 IO.Iprint 1
+IO.WriteOutToFile F
 General.DistanceUnits Angstrom
 IO.WriteOutToFile F
 

diff --git a/docs/groundstate.rst b/docs/groundstate.rst
@@ -157,6 +157,40 @@ where ``Diag.MPOrder`` specifies the order of the Methfessel-Paxton
 expansion.  It is recommended to start with the lowest order and
 increase gradually, testing the effects.
 
+Go to :ref:`top <groundstate>`.
+
+.. _gs_pad:
+
+Padding Hamiltonian matrix by setting block size
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+With the default setting, the size of the Hamiltonian and overlap matrices
+is determined by the total number of support functions.
+This number can be prime, in which case diagonalisation can be very slow
+because the matrix is difficult to divide into small blocks.
+
+Padding changes the size of the Hamiltonian matrix to improve
+the efficiency of the diagonalisation. To set the block size
+of the matrix, specify the following two variables.
+
+ ::
+
+  Diag.BlockSizeR       20
+  Diag.BlockSizeC       20
+
+Note that these two numbers should be the same when padding
+(and when using ELPA, which will be introduced to CONQUEST soon).
+We suggest an appropriate value between 20 and 200, but
+this should be tested.
+
+The option for padding was introduced after v1.2. To disable
+it, set the following variable.
+
+ ::
+
+  Diag.PaddingHmatrix              F 
+
+
 Go to :ref:`top <groundstate>`.
 
 .. _gs_on:
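
To illustrate why padding helps when the matrix dimension is prime, the sketch below rounds a dimension up to the next multiple of the block size. This is an assumption about what padding amounts to; the documentation above does not state the exact rule CONQUEST uses, and `padded_size` is a hypothetical helper.

```python
def padded_size(n_support_functions: int, block_size: int) -> int:
    """Round the matrix dimension up to the next multiple of block_size.

    Assumption: padding extends the matrix to a whole number of blocks.
    The rule actually used by the code is not given in the docs.
    """
    if n_support_functions <= 0 or block_size <= 0:
        raise ValueError("sizes must be positive")
    n_blocks = -(-n_support_functions // block_size)  # ceiling division
    return n_blocks * block_size


n = 1009          # a prime matrix dimension: no non-trivial block size divides it
block = 20        # the value used for Diag.BlockSizeR and Diag.BlockSizeC above
padded = padded_size(n, block)
print(f"{n} -> {padded} ({padded - n} padded rows/columns, {padded // block} blocks)")
```

With these numbers, 11 extra rows and columns give a matrix that divides into 51 equal blocks per dimension.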

diff --git a/docs/input_tags.rst b/docs/input_tags.rst
@@ -657,29 +657,27 @@ Diag.GammaCentred (*boolean*)
 
     *default*: F
 
-Diag.ProcRows (*integer*)
-
-    *default*:
+Diag.PaddingHmatrix (*boolean*)
+    After v1.2, padding was introduced to allow an optimal block size
+    for the Hamiltonian and overlap matrices (see below).
+    Setting this to F disables padding.
 
-Diag.ProcCols (*integer*)
-
-    *default*:
+    *default*: T
 
 Diag.BlockSizeR (*integer*)
+    Block size for rows (See next).
 
-    *default*:
+    *default*: Determined automatically
 
 Diag.BlockSizeC (*integer*)
     R ... rows, C ... columns
-    These are ScaLAPACK parameters, and can be set heuristically by the code. Blocks
-    are sub-divisions of matrices, used to divide up the matrices between processors.
+    These are ScaLAPACK parameters, and can be set heuristically by the code. 
+    Blocks are sub-divisions of matrices, used to divide up the matrices between processors.
     The block sizes need to be factors of the square matrix size
     (i.e. :math:`\sum_{\mathrm{atoms}}\mathrm{NSF(atom)}`). A value of 64 is considered
-    optimal by the ScaLAPACK user’s guide. The rows and columns need to multiply
-    together to be less than or equal to the number of processors. If ProcRows
-    :math:`\times` ProcCols :math:`<` number of processors, some processors will be left idle.
+    optimal by the ScaLAPACK user’s guide. 
 
-    *default*:
+    *default*: Determined automatically
 
 Diag.MPShift[X/Y/Z] (*real*)
     Specifies the shift *s* of k-points along the x(y,z) axis, in fractional
@@ -742,7 +740,10 @@ Diag.ProcRows (*integer*)
 
 Diag.ProcCols (*integer*)
     Number of columns in the processor grid for SCALAPACK within each k-point
-    processor group 
+    processor group.  The rows and columns need to multiply
+    together to be less than or equal to the number of processors. If ProcRows
+    :math:`\times` ProcCols :math:`<` number of processors, some processors will be left idle.
+
 
     *default*: Determined automatically
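
The constraint on `Diag.ProcRows` and `Diag.ProcCols` can be checked with a few lines. The sketch below counts idle processes for a given grid and picks a near-square grid that uses every process. Both helpers are hypothetical, and the heuristic CONQUEST applies when the values are determined automatically is not described here.

```python
import math


def idle_processes(n_procs: int, proc_rows: int, proc_cols: int) -> int:
    """Processes left idle by a proc_rows x proc_cols grid on n_procs processes."""
    used = proc_rows * proc_cols
    if used > n_procs:
        raise ValueError("ProcRows x ProcCols exceeds the number of processes")
    return n_procs - used


def near_square_grid(n_procs: int) -> tuple[int, int]:
    """Hypothetical choice: the most nearly square grid that uses every process."""
    rows = math.isqrt(n_procs)
    while n_procs % rows != 0:
        rows -= 1
    return rows, n_procs // rows


print(idle_processes(16, 3, 4))   # a 3 x 4 grid on 16 processes
print(near_square_grid(12))
print(near_square_grid(7))        # a prime count only admits a 1 x 7 grid
```

Here `n_procs` stands for the number of processes in one k-point processor group, since the grid is defined within each group.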
 
@@ -827,12 +828,28 @@ AtomMove.OutputFreq (*integer*)
 
     *default*: 50
 
-AtomMove.WriteXSF *(boolean*)
+AtomMove.WriteXSF (*boolean*)
     Write atomic coordinates to ``trajectory.xsf`` for ``AtomMove.TypeOfRun = md`` or ``cg``,
-    every ``AtomMove.OutputFreq`` steps
+    every ``AtomMove.XsfFreq`` steps
 
     *default*: T
 
+AtomMove.XsfFreq (*integer*)
+    Frequency of output of atomic coordinates to ``trajectory.xsf``
+
+    *default*: same as ``AtomMove.OutputFreq``
+
+AtomMove.WriteXYZ (*boolean*)
+    Write atomic coordinates to ``trajectory.xyz`` for ``AtomMove.TypeOfRun = md``,
+    every ``AtomMove.XyzFreq`` steps
+
+    *default*: T
+
+AtomMove.XyzFreq (*integer*)
+    Frequency of output of atomic coordinates to ``trajectory.xyz``
+
+    *default*: same as ``AtomMove.OutputFreq``
+
 AtomMove.TestForces (*boolean*)
     Flag for testing forces with comparison of analytic and numerical calculations.
     Can produce *large* amounts of output