This repository has been archived by the owner on Dec 8, 2024. It is now read-only.

Pull request for CQML extension (2nd trial) #115

Open
wants to merge 89 commits into
base: develop
89 commits
c0b9742
Fchl main (#37)
andersx Mar 1, 2018
b74b173
Updated clang->gcc in macos installation instructions, hattip geoff h…
andersx Mar 1, 2018
4f8e5ad
Updated autodeployment to GH pages and PyPI
andersx Mar 2, 2018
181a009
Develop (#38)
andersx Mar 2, 2018
a9aa729
Updated .travis.yml again
andersx Mar 2, 2018
3bbaa45
Updated .travis.yml again2
andersx Mar 2, 2018
34f1bf7
Updated .travis.yml again3
andersx Mar 2, 2018
15e45d7
Updated version number
andersx Mar 2, 2018
e225620
Merge branch 'master' into develop
andersx Mar 2, 2018
38c023a
Develop (#40)
andersx Mar 2, 2018
9e99168
Fchl doc (#41)
andersx Mar 2, 2018
caeaa6d
Fchl doc (#42)
andersx Mar 2, 2018
4f0c6c0
Fchl doc (#44)
andersx Mar 5, 2018
d817ca5
Fchl doc (#45)
andersx Mar 5, 2018
b48ce5b
pulled master into develop
andersx Mar 6, 2018
e15587b
added custom alchemy to FCHL and fixed parsing bug in compound class
andersx Mar 6, 2018
c0298d8
bobfix (#46)
larsbratholm Mar 6, 2018
4799500
preparing merge of tests
larsbratholm Jul 3, 2018
6c7d5b8
Merged tests
larsbratholm Jul 3, 2018
adb6b11
updated docs
larsbratholm Jul 3, 2018
bb764cb
added examples
larsbratholm Jul 3, 2018
bc3acbe
Preparing for qml high level interface merge
larsbratholm Jul 3, 2018
45de1bd
high level interface merge
larsbratholm Jul 3, 2018
0910e6c
Fixed test that failed with py2
larsbratholm Jul 3, 2018
a3899cd
Another py2 test fix
larsbratholm Jul 3, 2018
d007b8e
Added dependencies in travis
larsbratholm Jul 3, 2018
b0f5d40
Added dependencies in travis
larsbratholm Jul 3, 2018
49b475b
Merge remote-tracking branch 'origin/qmldev' into develop
larsbratholm Jul 3, 2018
ffaa6d3
Added dependencies in travis
larsbratholm Jul 3, 2018
d400fdf
Added dependencies in travis
larsbratholm Jul 3, 2018
7c20394
Updated examples to match restructure. The qmlcode/tutorials will fai…
larsbratholm Jul 3, 2018
6ed040e
Updated author list
larsbratholm Jul 4, 2018
8f328e2
Updated gitignore
larsbratholm Jul 4, 2018
0395f7e
Updated authors and licence on tests
larsbratholm Jul 4, 2018
bc887d6
Merge pull request #50 from larsbratholm/qmldev
andersx Jul 5, 2018
be6c322
ignoring pycharm files
SilviaAmAm Jul 16, 2018
475bd0e
modifications so that tensorflow is no longer a requirement
SilviaAmAm Jul 16, 2018
a56c87b
Made a test that checks that the save/load functions in MRMP work in …
SilviaAmAm Jul 16, 2018
e04ec95
Modified aglaia so that it inherits from BaseEstimator and now has ge…
SilviaAmAm Jul 17, 2018
8c008da
Added a test that will fail until I have fixed the get_params function
SilviaAmAm Jul 18, 2018
9c99313
updated readme, removed redundant readme, changed mkldiscover message…
charnley Jul 25, 2018
62b834e
Merge remote-tracking branch 'upstream/develop' into develop
SilviaAmAm Jul 25, 2018
02fe513
Remade existing QML examples in a notebook
SilviaAmAm Jul 26, 2018
9967f90
Finished adding Aglaia examples into the documentation
SilviaAmAm Jul 26, 2018
093ab49
Modified the example to be consisted between "descriptor" and "repres…
SilviaAmAm Jul 26, 2018
170843b
Modified aglaia so that "representation" is used instead of "descriptor"
SilviaAmAm Jul 26, 2018
4ef7404
Modified all examples so that they work with "representation" instead…
SilviaAmAm Jul 26, 2018
8b316c4
Atom centered symmetry functions (#64)
larsbratholm Jul 26, 2018
da30585
Added nbsphinx
SilviaAmAm Jul 26, 2018
c4c26a7
Merge remote-tracking branch 'upstream/develop' into develop
SilviaAmAm Jul 26, 2018
05333ee
Modified travis.yml to get pandoc
SilviaAmAm Jul 26, 2018
158149a
Changed to apt install pandoc instead of pip install
SilviaAmAm Jul 26, 2018
8f126ac
actually removed pip install
SilviaAmAm Jul 26, 2018
740aca5
Merge pull request #66 from SilviaAmAm/develop
andersx Jul 28, 2018
28871ca
Atom centered symmetry functions (#64) (#65)
larsbratholm Jul 28, 2018
3ce8af3
Closes #62 (#67)
larsbratholm Jul 28, 2018
842701b
Minor docs changes (#69)
berquist Aug 2, 2018
1d1bfdd
MPI driver prototype (#72)
ferchault Aug 7, 2018
6b777fe
Bug fix to acsf fortran code (#77)
larsbratholm Aug 13, 2018
cf4b9bc
Directory structure (#71)
andersx Aug 15, 2018
c93619e
Fchl merge (#79)
andersx Aug 24, 2018
1108f41
Symmetric Kernels (#81)
larsbratholm Sep 9, 2018
df37c92
Added python interface to symmetric vector kernels (#83)
larsbratholm Sep 10, 2018
8a43e06
Qmlearn (#82)
larsbratholm Sep 10, 2018
b634784
Removed pesky u-umlaut (#89)
larsbratholm Sep 16, 2018
6a489df
Temporary fix for issue #90 (#91)
Sep 28, 2018
fc59122
Develop (#94)
Oct 4, 2018
b285baf
This commit adds the multi-fidelity learning approach called CQML,
Nov 1, 2018
d9f7cc8
Made omp less memory intensive in acsf, including a fix to #90 (#96)
larsbratholm Nov 13, 2018
e958a74
Omp reduction bugfix (#95)
larsbratholm Nov 13, 2018
4306b01
Changed linear fit to lasso regression (#93)
larsbratholm Nov 13, 2018
5da66ee
continue->cycle (#92)
larsbratholm Nov 13, 2018
cad7081
Minor bugfixes, including #86 (#88)
larsbratholm Nov 13, 2018
f4a1514
Develop (#98)
Dec 12, 2018
937f2bc
Fixed error and warnings caused by latest numpy 1.16.0
andersx Jan 19, 2019
cd6e10a
Fixed non-pythonic character causing errors in Python2
andersx Jan 19, 2019
9e40ec0
Merge pull request #99 from andersx/python2_bugs
andersx Jan 19, 2019
25c8a71
Kernel PCA (#101)
andersx Jan 22, 2019
caf65f0
Add ability to create compounds from file-like objects (#103)
WardLT Feb 5, 2019
9d4e407
Tolerance fix (#104)
Feb 5, 2019
ff910f4
Fixed extra space before -lpthread flag (#105)
Feb 11, 2019
c576539
Remove ase and dataprovider (#108)
andersx Mar 2, 2019
2665f63
Removed all f-code requiring omp_lib from FCHL code (#111)
andersx May 21, 2019
954cdd6
MRMP changes (#112)
Jun 26, 2019
3ed7379
* merging latest version of development branch into
Jul 22, 2019
5a5f0e2
* updated CQML implementation to be compatible with latest
Jul 22, 2019
2632a7b
* updated equation reference in CQML implementation
Jul 22, 2019
bf1960a
* Removed data files for CQML example and added README.cqml, which
Jul 29, 2019
d002291
Merge remote-tracking branch 'upstream/develop' into dev_cqml
Jul 29, 2019
2 changes: 2 additions & 0 deletions README.rst
@@ -19,6 +19,8 @@ Current list of contributors:
- Alexandre Tkatchenko (University of Luxembourg)
- Klaus-Robert Muller (Technische Universitat Berlin/Korea University)
- \O. Anatole von Lilienfeld (University of Basel)
- Peter Zaspel (University of Basel)
- Helmut Harbrecht (University of Basel)

1) Citing QML:
--------------
8 changes: 8 additions & 0 deletions examples/README.cqml
@@ -0,0 +1,8 @@
To be able to use the examples for the CQML approach, please first download
the file "DataSet_cqml.tgz" from

https://doi.org/10.6084/m9.figshare.9130973

and decompress it in the "examples" folder by

tar xvzf DataSet_cqml.tgz
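
The decompression step above can also be done from Python. The following is a minimal sketch, assuming Python 3 and that "DataSet_cqml.tgz" has already been downloaded by hand; the helper name `extract_dataset` is hypothetical and not part of qml. It only uses the standard-library `tarfile` module.

```python
import tarfile
from pathlib import Path


def extract_dataset(archive, target_dir="."):
    """Unpack a gzip-compressed tar archive into target_dir.

    Hypothetical helper (not part of qml). Returns the list of member
    names found in the archive. Only use it on archives you trust.
    """
    archive = Path(archive)
    if not archive.is_file():
        raise FileNotFoundError("%s not found; download it first" % archive)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=target_dir)
    return names
```

Calling `extract_dataset("DataSet_cqml.tgz", "examples")` from the repository root would be the equivalent of running the tar command inside the "examples" folder.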
273 changes: 273 additions & 0 deletions examples/cqml2d_CI9.py
@@ -0,0 +1,273 @@
# MIT License
#
# Copyright (c) 2018 Peter Zaspel, Helmut Harbrecht
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function

from math import *
import os
import numpy as np
import sys
#import matplotlib.pyplot as plt

import qml
from qml.models.cqml2d import *


trials = 20


def write_data(filename, costs, errors):
    # write a header line followed by one "cost error" pair per line
    with open(filename, 'w') as outfile:
        outfile.write("N error\n")
        for i in range(len(errors)):
            outfile.write("%d %e\n" % (costs[i], errors[i]))

# this test compares cqml2d with different numbers of levels (PM7 + DFT + G4MP2 vs. DFT + G4MP2 vs. G4MP2) for the CI9 data set
# with Coulomb matrix representation, cf. Figure 6 in the paper
# "Boosting quantum machine learning models with multi-level combination technique: Pople diagrams revisited"
def test_cqml2d_coulomb():

    #########################
    # loading multilevel data
    #########################

    # getting structure for config of multilevel data
    ml_data_cfg = multilevel_data_config()

    # setting data files per level for data which is to be learned
    ml_data_cfg.data_files = ["DataSet_cqml2d_CI9/Y1.dat","DataSet_cqml2d_CI9/Y2.dat","DataSet_cqml2d_CI9/Y3.dat"]

    # setting directories per level for the xyz files
    ml_data_cfg.xyz_directories = ["DataSet_cqml2d_CI9/db/ci6k_PM7","DataSet_cqml2d_CI9/db/ci6k_B3LYP631G2DFP","DataSet_cqml2d_CI9/db/ci6k_G4MP2"]

    # choose representation
    ml_data_cfg.representation_type = "coulomb_matrix"

    # choose the number of compounds that shall be loaded
    ml_data_cfg.N_load = 6095

    # choose the number of levels to load
    ml_data_cfg.level_count_load = 3

    # load the multilevel data
    ml_data = load_multilevel_data(ml_data_cfg)

    # artificially imposing costs for the levels (in the future this shall be delivered with the data)
    ml_data.level_costs = [0,0,1]

    ###############################
    # testing combination technique
    ###############################

    # getting structure for the configuration of the combination technique test
    ct_cfg = cqml2d_config()

    # choose the maximum resolution level (i.e. number of learning samples = 2^max_resolution_level) for which the convergence shall be checked
    ct_cfg.max_resolution_level = 12 # 9

    # set the scaling parameters for each level (this is the global level as in the ml_data structure)
    ct_cfg.scalings = [400,400,400]

    # set the regularization parameter for each level (this is the global level as in the ml_data structure)
    ct_cfg.regularizations = [10**(-10),10**(-10),10**(-10)]

    # set the jump size between each individual level
    ct_cfg.level_jump_size = [1]

    # set the (global!) level on which the error shall be evaluated (starting from 0!)
    ct_cfg.error_level = 2


    # set the current (global!) base level from which the combination technique shall be started (starting from 0)
    # here: start on level 0
    ct_cfg.base_level = 0
    # set the current (local!) number of levels for which the combination technique shall be computed
    # here: use all levels
    ct_cfg.level_count = 3

    # => this is the full combination technique on all levels

    # create a figure
    # f = plt.figure()

    # do the computation with averaging over "trials" trials
    (errors, costs) = compute_cqml2d_convergence(ml_data_cfg, ml_data, ct_cfg, trials)
    # plot the result
    # plt.loglog(costs,errors, label="PM7+DFT+G4MP2")
    write_data('DataSet_cqml2d_CI9/DataSet_cqml2d_CI9_PM7_DFT_G4MP2_coulomb.csv',costs,errors)

    # here: start on level 1
    ct_cfg.base_level = 1
    # here: use only DFT+G4MP2
    ct_cfg.level_count = 2

    # => this corresponds to a modified version of Delta-ML

    # compute
    (errors, costs) = compute_cqml2d_convergence(ml_data_cfg, ml_data, ct_cfg, trials)
    # plot
    # plt.loglog(costs,errors, label="DFT+G4MP2")
    write_data('DataSet_cqml2d_CI9/DataSet_cqml2d_CI9_DFT_G4MP2_coulomb.csv',costs,errors)

    # here: start on level 2
    ct_cfg.base_level = 2
    # here: use only one level
    ct_cfg.level_count = 1

    # => this corresponds to standard learning on level 2

    # compute
    (errors, costs) = compute_cqml2d_convergence(ml_data_cfg, ml_data, ct_cfg, trials)
    # plot
    # plt.loglog(costs,errors, label="G4MP2")
    write_data('DataSet_cqml2d_CI9/DataSet_cqml2d_CI9_G4MP2_coulomb.csv',costs,errors)

    # generate a legend for the plot
    # plt.legend()
    # add axis labels and a title
    # plt.xlabel("number of most expensive molecules")
    # plt.ylabel("MAE [kcal/mol] wrt. G4MP2 solution")
    # plt.title("CQML: Plot wrt. number of most expensive molecules")

    # plt.show()



# this test compares cqml2d with different numbers of levels (PM7 + DFT + G4MP2 vs. DFT + G4MP2 vs. G4MP2) for the CI9 data set
# with SLATM representation, cf. Figure 6 in the paper
# "Boosting quantum machine learning models with multi-level combination technique: Pople diagrams revisited"
def test_cqml2d_slatm():

    #########################
    # loading multilevel data
    #########################

    # getting structure for config of multilevel data
    ml_data_cfg = multilevel_data_config()

    # setting data files per level for data which is to be learned
    ml_data_cfg.data_files = ["DataSet_cqml2d_CI9/Y1.dat","DataSet_cqml2d_CI9/Y2.dat","DataSet_cqml2d_CI9/Y3.dat"]

    # setting directories per level for the xyz files
    ml_data_cfg.xyz_directories = ["DataSet_cqml2d_CI9/db/ci6k_PM7","DataSet_cqml2d_CI9/db/ci6k_B3LYP631G2DFP","DataSet_cqml2d_CI9/db/ci6k_G4MP2"]

    # choose representation
    ml_data_cfg.representation_type = "slatm"

    # choose the number of compounds that shall be loaded
    ml_data_cfg.N_load = 6095

    # choose the number of levels to load
    ml_data_cfg.level_count_load = 3

    # load the multilevel data
    ml_data = load_multilevel_data(ml_data_cfg)

    # artificially imposing costs for the levels (in the future this shall be delivered with the data)
    ml_data.level_costs = [0,0,1]

    ###############################
    # testing combination technique
    ###############################

    # getting structure for the configuration of the combination technique test
    ct_cfg = cqml2d_config()

    # choose the maximum resolution level (i.e. number of learning samples = 2^max_resolution_level) for which the convergence shall be checked
    ct_cfg.max_resolution_level = 12 # 9

    # set the scaling parameters for each level (this is the global level as in the ml_data structure)
    ct_cfg.scalings = [400,400,400]

    # set the regularization parameter for each level (this is the global level as in the ml_data structure)
    ct_cfg.regularizations = [10**(-10),10**(-10),10**(-10)]

    # set the jump size between each individual level
    ct_cfg.level_jump_size = [1]

    # set the (global!) level on which the error shall be evaluated (starting from 0!)
    ct_cfg.error_level = 2


    # set the current (global!) base level from which the combination technique shall be started (starting from 0)
    # here: start on level 0
    ct_cfg.base_level = 0
    # set the current (local!) number of levels for which the combination technique shall be computed
    # here: use all levels
    ct_cfg.level_count = 3

    # => this is the full combination technique on all levels

    # create a figure
    # f = plt.figure()

    # do the computation with averaging over "trials" trials
    (errors, costs) = compute_cqml2d_convergence(ml_data_cfg, ml_data, ct_cfg, trials)
    # plot the result
    # plt.loglog(costs,errors, label="PM7+DFT+G4MP2")
    write_data('DataSet_cqml2d_CI9/DataSet_cqml2d_CI9_PM7_DFT_G4MP2_slatm.csv',costs,errors)

    # here: start on level 1
    ct_cfg.base_level = 1
    # here: use only DFT+G4MP2
    ct_cfg.level_count = 2

    # => this corresponds to a modified version of Delta-ML

    # compute
    (errors, costs) = compute_cqml2d_convergence(ml_data_cfg, ml_data, ct_cfg, trials)
    # plot
    # plt.loglog(costs,errors, label="DFT+G4MP2")
    write_data('DataSet_cqml2d_CI9/DataSet_cqml2d_CI9_DFT_G4MP2_slatm.csv',costs,errors)

    # here: start on level 2
    ct_cfg.base_level = 2
    # here: use only one level
    ct_cfg.level_count = 1

    # => this corresponds to standard learning on level 2

    # compute
    (errors, costs) = compute_cqml2d_convergence(ml_data_cfg, ml_data, ct_cfg, trials)
    # plot
    # plt.loglog(costs,errors, label="G4MP2")
    write_data('DataSet_cqml2d_CI9/DataSet_cqml2d_CI9_G4MP2_slatm.csv',costs,errors)

    # generate a legend for the plot
    # plt.legend()
    # add axis labels and a title
    # plt.xlabel("number of most expensive molecules")
    # plt.ylabel("MAE [kcal/mol] wrt. G4MP2 solution")
    # plt.title("CQML: Plot wrt. number of most expensive molecules")

    # plt.show()



def main():

    test_cqml2d_coulomb()
    test_cqml2d_slatm()

if __name__ == '__main__':
    main()
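
Since the plotting calls in the example are commented out, the convergence data only ends up in the files written by `write_data` (a header line "N error" followed by space-separated "cost error" pairs). A minimal sketch of how such a file could be read back and summarised is given below. The helpers `read_convergence` and `loglog_slope` are hypothetical names and not part of qml, and the slope estimate assumes all costs and errors are strictly positive.

```python
import math


def read_convergence(filename):
    """Read a file in the layout produced by write_data.

    Hypothetical helper: expects a header line ("N error") followed by
    whitespace-separated "cost error" pairs, one pair per line.
    """
    costs, errors = [], []
    with open(filename) as infile:
        next(infile)  # skip the "N error" header
        for line in infile:
            if not line.strip():
                continue  # tolerate blank lines
            cost, error = line.split()
            costs.append(int(cost))
            errors.append(float(error))
    return costs, errors


def loglog_slope(costs, errors):
    """Least-squares slope of log(error) against log(cost).

    Assumes every cost and error is strictly positive.
    """
    xs = [math.log(c) for c in costs]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den
```

The slope is the quantity one would read off the commented-out `plt.loglog` curves: a more negative value means the error falls faster as more of the most expensive training samples are used.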