This directory contains all scripts required to generate molecular representations, including wall-time measurements.
To generate molecular representations for an entire dataset, use the dataset_submit.sh script.
Arguments must be provided in the following order: dataset name, representation type, and path to the directory containing the XYZ files.
For the representation types used in the main text, see here.
bash dataset_submit.sh TM-GSspinPlus SPAHM_e ../data/TM-GSspinPlus/0-xyz
This command creates a job file named:
generate_{rep}_{dataset}.job
To generate representations for only 10 molecules (useful for testing and debugging), add the debug argument:
bash dataset_submit.sh TM-GSspinPlus SPAHM_e ../data/TM-GSspinPlus/0-xyz debug
This creates a job file named generate_{rep}_{dataset}_debug.job.
If you are running on a SLURM-based system, submit the job using sbatch. For local execution, simply run the job file with bash.
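The naming and submission rules above can be sketched in a few lines of Python. This is only an illustration: `job_file_name` and `launch_command` are hypothetical helpers that are not part of this directory, and the sketch assumes the `{rep}` and `{dataset}` placeholders are filled with the command-line arguments verbatim.

```python
import shutil


def job_file_name(rep, dataset, debug=False):
    """Build the job file name following the pattern described above.

    Assumption: the placeholders are replaced by the arguments as typed.
    """
    suffix = "_debug" if debug else ""
    return f"generate_{rep}_{dataset}{suffix}.job"


def launch_command(job_file):
    """Use sbatch when it is on PATH (SLURM system), otherwise plain bash."""
    launcher = "sbatch" if shutil.which("sbatch") else "bash"
    return [launcher, job_file]


if __name__ == "__main__":
    job = job_file_name("SPAHM_e", "TM-GSspinPlus", debug=True)
    # Only prints the command; pass the list to subprocess.run to execute it.
    print(" ".join(launch_command(job)))
```

Checking for `sbatch` on the PATH is a simple heuristic for detecting a SLURM system; on a cluster login node without SLURM client tools it falls back to local execution.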
To generate representations for a predefined subset of molecules, use subset_submit.sh.
In addition to the dataset, representation type, and XYZ directory, provide a text file containing refcodes for the subset.
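Before submitting, it can help to check which XYZ files a refcode list would actually select. The following is a minimal sketch under two assumptions that are not confirmed by the scripts here: the text file holds one refcode per line, and each geometry is stored as `<refcode>.xyz` in the XYZ directory. The helpers `read_refcodes` and `select_xyz` are hypothetical.

```python
from pathlib import Path


def read_refcodes(path):
    """Read refcodes from a text file (assumed: one per line, blanks ignored)."""
    return [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]


def select_xyz(xyz_dir, refcodes):
    """Split refcodes into XYZ files that exist and refcodes with no file.

    Assumption: files are named <refcode>.xyz inside xyz_dir.
    """
    xyz_dir = Path(xyz_dir)
    found, missing = [], []
    for code in refcodes:
        candidate = xyz_dir / f"{code}.xyz"
        if candidate.is_file():
            found.append(candidate)
        else:
            missing.append(code)
    return found, missing
```

A non-empty `missing` list usually points to a typo in the refcode file or a mismatch between the subset and the XYZ directory. The subset job itself is created with: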
bash subset_submit.sh TM-GSspinPlus SPAHM_e ../data/TM-GSspinPlus/0-xyz \
TM-GSspinPlus/subset_refcodes/TM-GSspinPlus_sub_0.txt
For further details, see:
- dataset_submit.sh
- subset_submit.sh
- Python scripts in each dataset directory:
  - generate_rep_spahm_qstack.py
  - generate_rep_slatm_fchl_qml2.py
  - generate_rep_soap_featomic.py
- Example output files in examples_output/*.out
Execution timings are reported in the last line of each output file.
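To gather the timings from several runs at once, the last line of each output file can be collected with a short script. This is a sketch rather than part of the provided tooling: `last_nonempty_line` and `collect_timings` are hypothetical helpers, the timing line is returned as raw text because its exact format is not specified here, and trailing blank lines are skipped as a precaution.

```python
from pathlib import Path


def last_nonempty_line(path):
    """Return the last non-blank line of a text file, or '' if there is none."""
    lines = [ln for ln in Path(path).read_text().splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def collect_timings(output_dir="examples_output"):
    """Map each *.out file name in output_dir to its last non-blank line."""
    output_dir = Path(output_dir)
    if not output_dir.is_dir():
        return {}
    return {p.name: last_nonempty_line(p) for p in sorted(output_dir.glob("*.out"))}


if __name__ == "__main__":
    for name, line in collect_timings().items():
        print(f"{name}: {line}")
```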
The job generates representation arrays (.npy files) in the corresponding dataset directory (for example, see TM-GSspinPlus).
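A quick way to sanity-check a generated array is to load it with NumPy and look at its shape and dtype. The sketch below assumes the file holds a plain numeric array (object arrays would need `allow_pickle=True`, which is not assumed here); `describe_representation` is a hypothetical helper, and the path is supplied on the command line.

```python
import sys

import numpy as np


def describe_representation(path):
    """Load a .npy file and report its shape and dtype."""
    arr = np.load(path)
    return {"shape": arr.shape, "dtype": str(arr.dtype)}


if __name__ == "__main__":
    # Usage: python describe_rep.py path/to/representation.npy
    if len(sys.argv) > 1:
        print(describe_representation(sys.argv[1]))
```

For a per-molecule representation, the first dimension would typically equal the number of molecules processed, so a debug run should give 10 along that axis; this is an expectation, not something the scripts guarantee.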
For representations used to predict spin-splitting energies, comment out the lines that include high-spin geometries (or high-spin states).
See the Python scripts in OctaKulik for details.
To measure kernel computation times for Gaussian (G) or Laplacian (L) kernels, use kernel_submit.sh.
Provide the Python script time_kernel_qstack.py, dataset name, representation type, path to the .npy representation file, and kernel type.
bash kernel_submit.sh time_kernel_qstack.py TM-GSspinPlus SPAHM_e \
TM-GSspinPlus/SPAHM_e-TM-GSspinPlus-subset.npy L
This command generates a job file named:
kernel_{kernel}_{rep}_{dataset}.job
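For orientation, the kind of measurement involved can be sketched with plain NumPy. This is not the implementation in time_kernel_qstack.py: the kernel definitions below, exp(-gamma * ||x - y||_2^2) for G and exp(-gamma * ||x - y||_1) for L, and the `gamma` parameter are assumptions chosen for illustration, and the random matrix stands in for a real representation array.

```python
import time

import numpy as np


def gaussian_kernel(X, Y, gamma):
    """Assumed form: K_ij = exp(-gamma * ||x_i - y_j||_2^2)."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    np.maximum(sq, 0.0, out=sq)  # clamp small negative values from round-off
    return np.exp(-gamma * sq)


def laplacian_kernel(X, Y, gamma):
    """Assumed form: K_ij = exp(-gamma * ||x_i - y_j||_1)."""
    l1 = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=2)
    return np.exp(-gamma * l1)


def time_kernel(X, kernel_type, gamma=1e-3):
    """Compute the kernel of X with itself and return (K, wall time in seconds)."""
    kernels = {"G": gaussian_kernel, "L": laplacian_kernel}
    start = time.perf_counter()
    K = kernels[kernel_type](X, X, gamma)
    return K, time.perf_counter() - start


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))  # stand-in for a loaded representation
    for kernel_type in ("G", "L"):
        K, elapsed = time_kernel(X, kernel_type)
        print(f"{kernel_type}: shape={K.shape}, time={elapsed:.4f} s")
```

The Laplacian version materializes an n × n × d difference tensor, which is simple but memory-hungry; for large datasets a blocked or library implementation is the better choice.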
For details, see:
- kernel_submit.sh
- time_kernel_qstack.py
- Example timing outputs in examples_output/*.out
Execution timings are reported in the last line of the corresponding output file.