This repository is for large-scale simulation of Gaussian boson sampling, capable of simulating the largest quantum supremacy experiments to date. We provide a supercomputer GPU implementation and a CPU-only implementation for personal computers.
We tested our supercomputer GPU implementation on the Polaris supercomputer at Argonne Leadership Computing Facility. It uses exactly
We tested our CPU implementation on an M1 MacBook Pro. For the most difficult experiment to match the two-point correlation (Jiuzhang2-P65-2, see explanation in italic below), we obtain the MPS after approximately 3 hours. The bond dimension of 2000 is chosen to match the two-point correlation slope of the experiment, although the quality of higher-order correlations decreases.
The bond dimensions required to match the two-point correlation slopes of all other experiments, including Jiuzhang2-P65-1,3,4,5, Jiuzhang3, and Borealis, are all smaller. This may be due to two factors:
- Increasing the squeezing parameter increases the experimental noise, hence reducing the two-point correlation slopes.
- Increasing the squeezing parameter increases the displacement channel strength. In general, the trace distance between the approximate state and the ideal state is smaller after both go through a noise channel such as the displacement channel, hence reducing the bond dimension requirement.
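This is the contractivity of the trace distance under quantum channels: for any channel $\Lambda$ (such as the random displacement channel) and any two states $\rho$ and $\sigma$,

$$D\big(\Lambda(\rho), \Lambda(\sigma)\big) \le D(\rho, \sigma),$$

so sending both the approximate MPS state and the ideal state through the same noise channel can only bring them closer in trace distance.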
The majority of the program is written in Python and requires the following Python packages:
- numpy
- scipy
- tqdm
- requests (for downloading data from Xanadu in `Xanadu_download.py`)
- pandas (for loading the squeezing parameters from the xlsx file of Jiuzhang3 in `make_cov.py`)
- cvxpy and cvxopt (for obtaining the quantum-classical covariance matrix decomposition in `get_decomposition.py`)
- strawberryfields (data analysis only)
- thewalrus (data analysis only)
For GPU implementations, we need:
- cupy
- torch (for GPU matrix exponential, only used in sampling)
- mpi4py
- filelock
We recommend Anaconda for Python virtual environment management, and it is best to start with a clean environment for this project.
For threshold detector Torontonian analysis of Jiuzhang samples, a Julia program `compute_torontonian.jl` is used (https://arxiv.org/abs/2207.10058). Julia can also be installed using Anaconda within the same virtual environment.
If GPU implementations are used, please ensure that cupy is installed properly. Normally, you should not have to install cudatoolkit beforehand, as the default method of installing cupy from conda
conda install -c conda-forge cupy
should take care of installing the appropriate version of cudatoolkit. For more details, see https://docs.cupy.dev/en/stable/install.html.
A custom CUDA kernel is used for generating the MPS, and cupy has to compile the kernel. If a compilation error occurs stating that permission is denied to access a directory, this is because cupy does not have permission to access the default temporary file directory. We can simply change this directory by setting the environment variable `CUPY_CACHE_DIR` (recommended) or `TMPDIR` (system-wide) to a directory with sufficient permissions.
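For example, a minimal sketch (the path is a placeholder) that redirects the cache from within Python before any kernel is compiled:

```python
# Minimal sketch: point cupy's kernel cache to a writable directory.
# The path below is a placeholder; any directory with write access works.
import os
os.environ["CUPY_CACHE_DIR"] = "/path/with/write/access/cupy_cache"

import cupy as cp  # import after setting the variable, before any kernel compilation
```

Equivalently, you can export `CUPY_CACHE_DIR` in the job script before launching Python.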
When an error occurs, the rank that experiences the error will abort, while other MPI ranks may not. If other ranks are waiting for communication from that rank to complete, the program will not exit and will wait indefinitely. This wastes a significant amount of allocated compute time, so we choose to perform an all-rank abort whenever a single rank experiences an error. The drawback of this approach is that the error stack cannot be properly traced, which makes debugging hard. For production, we recommend using the all-rank abort error handling so that accidental failed simulations do not waste time. During debugging and testing, we recommend turning off this behavior. In all GPU implementations, the python scripts have a line of code that changes Python's default error handling to the all-rank abort behavior:
sys.excepthook = mpiabort_excepthook
You can comment out such lines to turn off this behavior.
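For reference, such a hook can be written as in the following minimal sketch (assuming mpi4py; the hook in our scripts may differ in details such as logging):

```python
# Minimal sketch of an all-rank abort excepthook (assumes mpi4py).
import sys
import traceback
from mpi4py import MPI

def mpiabort_excepthook(exc_type, exc_value, exc_traceback):
    # Print the local traceback first, then abort every rank so that no rank
    # is left waiting for communication from the failed rank.
    traceback.print_exception(exc_type, exc_value, exc_traceback)
    MPI.COMM_WORLD.Abort(1)

sys.excepthook = mpiabort_excepthook
```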
The current implementation does not scale to 1000 GPUs or ranks on the Polaris system at Argonne Leadership Computing Facility. This may be an issue with mpi4py, the MPICH implementation on Polaris, or our program, and has not been tested on other systems. We welcome pull requests that fix this within our program, or that add instructions for addressing this issue to the Readme file.
MPI communication with cupy arrays has unexpected behaviors, so we use numpy buffers exclusively for inter-rank communication. This has not been tested on other systems, so the source of the issue is unknown.
Although our simulation code base does not require the strawberryfields or thewalrus libraries, benchmarking may need them. However, we experienced some conflicts with other libraries, particularly symbol conflicts with numpy for certain linear algebra backends such as Intel MKL. If such issues occur, we recommend creating a separate environment for data analysis with the two libraries installed.
We use PyTorch in our sampling algorithm for matrix exponentiation on GPUs. This is needed when creating the displacement operator from the ladder operators. The benefit of using GPUs is less significant when the bond dimension is large, or when the number of samples to generate is small. It can be removed and substituted with CPU matrix exponentiation from `scipy.linalg.expm`.
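As an illustration, a CPU substitute could look like the following sketch; the cutoff `dd` and displacement amplitude `alpha` are placeholders, and the actual sampling code organizes this differently:

```python
# Minimal sketch: build a displacement operator with a CPU matrix exponential.
# `dd` and `alpha` are illustrative placeholders.
import numpy as np
from scipy.linalg import expm

dd = 10             # local Hilbert space cutoff after displacement
alpha = 0.5 + 0.2j  # displacement amplitude

# Truncated annihilation operator: a|n> = sqrt(n)|n-1>.
a = np.diag(np.sqrt(np.arange(1, dd)), k=1)

# D(alpha) = exp(alpha a^dag - conj(alpha) a), computed on the CPU.
D = expm(alpha * a.conj().T - np.conj(alpha) * a)
```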
Most operations are tensor operations and can potentially benefit from Tensor Cores on Nvidia GPUs, which offer significant performance gains. We experimented with enabling Tensor Cores by setting the environment variable `CUPY_TF32=1`, but no significant changes were observed. One limitation is `cp.einsum` (`cupy.einsum`), which may not be utilizing Tensor Cores. Further investigation is needed to determine if improvements are possible.
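One direction for further investigation is to express contractions as batched matrix multiplications, which are more likely to dispatch to Tensor-Core GEMM kernels; the sketch below only illustrates the rewrite with placeholder shapes, and we have not verified that it improves performance:

```python
# Speculative sketch: the same contraction via einsum and via batched matmul.
# Shapes are placeholders.
import cupy as cp

A = cp.random.rand(64, 128, 256, dtype=cp.float32)
B = cp.random.rand(64, 256, 32, dtype=cp.float32)

C_einsum = cp.einsum('bij,bjk->bik', A, B)  # may fall back to a generic kernel
C_matmul = cp.matmul(A, B)                  # batched GEMM, TF32-eligible when CUPY_TF32=1

print(float(cp.abs(C_einsum - C_matmul).max()))  # agreement check
```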
We prepare the variables specifying the location of the files and which experiment to run here. In the example given, `rootdir` is the location where all experiment configuration data is saved.
# Trailing slashes are needed at the end
rootdir='/grand/projects/QuantumSupremacy/'
experiment='ustc_larger_5/'
dir="$rootdir""$experiment"
# Specify which file saves all python outputs.
# Error outputs are not redirected to outfile.
# Can be specified with job scheduler (if applicable)
outfile='output.out'
Later, when we specify the output file, we use `> $outfile` at the end of a command to create the file and write python printed outputs to the file, or use `>> $outfile` to append.
For downloading Borealis (Xanadu) data, execute the following python script:
python -u Xanadu_download.py --dir $rootdir >> $outfile
This will download data to the corresponding figure folders under `rootdir` (data used to create different figures). To choose which data to download, edit the `files_list` variable in `Xanadu_download.py`. Afterwards, we would like to rename the folders appropriately by executing the following commands (or rename them using a GUI file manager):
cd $rootdir # Go to root directory
# Rename folders for clarity and consistency
[ -d "fig2" ] && mv fig2 Borealis_M16
[ -d "fig3a" ] && mv fig3a Borealis_M216_low
[ -d "fig3b" ] && mv fig3b Borealis_M72
[ -d "fig4" ] && mv fig4 Borealis_M216_high
[ -d "figS15" ] && mv figS15 Borealis_M288
cd - # Return to the last directory
For downloading Jiuzhang2.0 data, go to http://quantum.ustc.edu.cn/web/node/951. If you would like to download it using the terminal, execute the following:
wget -P $rootdir \
http://quantum.ustc.edu.cn/web/sites/default/files/2021-07/GBS144%20Data.part1_.rar
wget -P $rootdir \
http://quantum.ustc.edu.cn/web/sites/default/files/2021-07/GBS144%20Data.part2_.rar
wget -P $rootdir \
http://quantum.ustc.edu.cn/web/sites/default/files/2021-07/GBS144%20Data.part3_.rar
wget -P $rootdir \
http://quantum.ustc.edu.cn/web/sites/default/files/2021-07/GBS144%20Data.part4_.rar
You can similarly decompress the downloaded archives in the terminal:
# Unrar your files. `x` tells it to extract all parts.
unrar x 'GBS144 Data.part1_.rar'
If you do not have `sudo` privileges to install the `unrar` command, you can use a container following this tutorial: https://vsoch.github.io/lessons/unrar-python/ (their python tutorial does not work as it still depends on the `unrar` executable). For example, if your system provides `singularity` for running containers, you can execute the following:
# Load singularity. Most Linux servers have the `module` command to manage modules.
# If singularity is not available, try docker.
module load singularity
# Getting the container that can run unrar
singularity pull --name rar.simg shub://singularityhub/rar
# If you are using a cluster, start an interactive session job.
# Using the image on a login node may not work
# The example is shown for slurm, but you may have a different scheduler
# and different arguments may be required as well
srun --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash
# Binding directory to container so that it can find the files
singularity shell -B <directory/that/has/your/rar/files> rar.simg
# Unrar your files. `x` tells it to extract all parts.
unrar x 'GBS144 Data.part1_.rar'
# Exit singularity container
exit
# Exit interactive job
exit
If nothing works, just download to your own computer and extract there.
You should then rename the data folders to `Jiuzhang2_P<P>_<id>` accordingly; you may use a GUI file manager or commands similar to those used for the Xanadu data:
mv waist=65um/0.15 Jiuzhang2_P65_1
mv waist=65um/0.3 Jiuzhang2_P65_2
mv waist=65um/0.6 Jiuzhang2_P65_3
mv waist=65um/1.0 Jiuzhang2_P65_4
mv waist=65um/1.65 Jiuzhang2_P65_5
mv waist=125um/0.5 Jiuzhang2_P125_1
mv waist=125um/1.412 Jiuzhang2_P125_2
cp waist=65um/matrix.mtx Jiuzhang2_P65_1/matrix.mtx
cp waist=65um/matrix.mtx Jiuzhang2_P65_2/matrix.mtx
cp waist=65um/matrix.mtx Jiuzhang2_P65_3/matrix.mtx
cp waist=65um/matrix.mtx Jiuzhang2_P65_4/matrix.mtx
cp waist=65um/matrix.mtx Jiuzhang2_P65_5/matrix.mtx
cp waist=125um/matrix.mtx Jiuzhang2_P125_1/matrix.mtx
cp waist=125um/matrix.mtx Jiuzhang2_P125_2/matrix.mtx
The first step is to generate the covariance matrix, saved as `cov.npy`. Run the python script:
python -u make_cov.py --dir $dir >> $outfile
The next step is to decompose the covariance matrix of the experiment into a pure quantum part and a random displacement part. The quantum part results in the covariance matrix saved as `sq_cov.npy`.
python -u get_decomposition.py --dir $dir >> $outfile
# Local Hilbert space dimension for constructing the MPS.
# Usually small because the actual squeezed photon number is small
d=4
# Local Hilbert space dimension for sampling from MPS after displacement.
# Larger because more photons are present in a mode after displacement.
dd=10
chi=100 # Bond dimension
# Number of samples to save in a single numpy file per site.
# This is to prevent losing data when an error occurs,
# and is useful for rescuing partial progress.
N=1000
# Number of samples to produce in parallel.
# If set too large, results in GPU out-of-memory error.
n=100
# Number of iterations to produce N samples.
# In total, produce iter * N samples.
iter=10
The GPU supercomputer implementation uses exactly
Ensure that local scratch disk space is enabled. Otherwise, partial progress cannot be saved before sampling.
If no local scratch disk space is available, you need to change the python files `distributed_kron.py` and `distributed_MPS.py` to save partial progress to permanent storage, and to save with different filenames for different ranks. You will also need to change `distributed_MPS.py` and `distributed_sampling.py` to read the rank-specific partial results produced by the previous program. Alternatively, you can implement a combined program that keeps everything in volatile memory instead of on disk, but some GPU and CPU memory freeing and garbage collection operations might be needed.
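For instance, the rank-specific saving could follow a pattern like this sketch (the directory and file naming are placeholders, not what the scripts currently use):

```python
# Minimal sketch: save rank-specific partial results to permanent storage.
# The directory and file names are placeholders.
import os
import numpy as np
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()
permanent_dir = '/path/to/permanent/storage'  # placeholder

def save_partial(array, site):
    # One file per rank and per site, so ranks never overwrite each other and
    # the next program can locate the partial results it needs.
    np.save(os.path.join(permanent_dir, f'partial_site{site}_rank{rank}.npy'), array)
```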
Tensors are not saved; they are only used for sampling. Only samples are saved to permanent storage, with the samples for different modes saved to individual files.
# Input appropriate job submission specifications. Ignored here
# Directory for local scratch space. Different systems may have different names.
# Not all systems have local scratch space and the code may need to be modified.
ls='/local/scratch/'
gpn=4 # Number of GPUs per node. Polaris has 4.
now=$(date +"%T")
echo "Start time : $now"
# Preparation of getting the MPS
# mpiexec parameters may be different depending on your MPI implementation
# (MPICH, Open MPI, etc. See instructions of your cluster.)
mpiexec -n $NTOTRANKS --ppn $NRANKS_PER_NODE --depth=$NDEPTH --cpu-bind depth \
--env OMP_NUM_THREADS=$NTHREADS -env OMP_PLACES=threads \
python -u distributed_kron.py --d $d --chi $chi \
--gpn $gpn --dir $dir --ls $ls >> $outfile
# Getting the MPS
mpiexec -n $NTOTRANKS --ppn $NRANKS_PER_NODE --depth=$NDEPTH --cpu-bind depth \
--env OMP_NUM_THREADS=$NTHREADS -env OMP_PLACES=threads \
python -u distributed_MPS.py --d $d --chi $chi \
--gpn $gpn --dir $dir --ls $ls >> $outfile
now=$(date +"%T")
echo "Finishing MPS time : $now"
mpiexec -n $NTOTRANKS --ppn $NRANKS_PER_NODE --depth=$NDEPTH --cpu-bind depth \
--env OMP_NUM_THREADS=$NTHREADS -env OMP_PLACES=threads \
python -u distributed_sampling.py --N $N --n $n --iter $iter --d $d \
--dd $dd --chi $chi --gpn $gpn --dir $dir --ls $ls >> $outfile
now=$(date +"%T")
echo "Finishing sampling time : $now"
python kron_cpu.py --d $d --chi $chi --dir $rootdir
python MPS_cpu.py --d $d --chi $chi --dir $rootdir
python sampling_cpu.py --N $N --n $n --iter $iter --d $d --dd $dd --chi $chi --dir $rootdir
The analysis code is located in the `analysis` folder:
cd analysis
- `XEB_photon_click` (`XEB_photon_number`): compute XEBs of samples.
- `Correlation_photon_click` (`correlation_photon_number`): compute two-point correlation functions of photon clicks (photon numbers); see the sketch after this list.
- `Tot_probs_photon_number` (`Tot_probs_photon_click`): get an approximation of the probabilities of obtaining N photons (N clicks) for a given covariance matrix, which is needed for XEB normalization.
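As an illustration of the two-point click correlation quantity (not necessarily the exact estimator used in the analysis scripts):

```python
# Minimal sketch: two-point click correlations from binary samples.
# `samples` is a (num_samples, num_modes) array of 0/1 clicks; the data here
# is a random placeholder.
import numpy as np

samples = np.random.randint(0, 2, size=(1000, 144))
mean = samples.mean(axis=0)
# C_ij = <x_i x_j> - <x_i><x_j>
C = samples.T @ samples / samples.shape[0] - np.outer(mean, mean)
```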
For Jiuzhang data analysis, we compute Torontonians of samples from different photon click sectors. The program saves `tors_{N}.npy` for the N-click sector. The code from https://github.com/polyquantique/torontonian-julia is needed. We clone the repository minimally:
git clone --depth=1 --branch=master https://github.com/polyquantique/torontonian-julia.git
rm -rf ./torontonian-julia/.git
We then run the program:
julia compute_torontonian.jl $dir <file/name/for/npz/samples>
where `<file/name/for/npz/samples>` should be only the file name and not include the directory (the directory is handled by the `dir` argument).
To load Jiuzhang2 and Jiuzhang3 data, use `Jiuzhang2_load_samples.py` and `Jiuzhang3_load_samples.py` to convert the provided binary files into numpy files.
This work combines original work by Minzhao Liu and Changhun Oh with work derived from the thewalrus library by Xanadu. The original work by Minzhao Liu and Changhun Oh is licensed under the MIT License. The portion derived from thewalrus is in `make_cov.py`, where it is indicated as such. Portions from thewalrus are licensed under the Apache License Version 2.0.