This documentation is not complete yet. Please contact us if you have any questions!
Throughout this documentation, we assume that you are somewhat familiar with the theoretical background of the scanning transmission electron microscope (STEM). We also assume that you have some knowledge of the UNIX/Linux command line and of parallel computing. STEMsalabim is currently not intended to be run on a desktop computer. Running it there is possible and works, but the program is primarily designed for highly parallelized multi-computer environments.
We took great care to make STEMsalabim easy to install. You can find instructions at Installing STEMsalabim. However, if you run into technical problems, you should first seek help from an administrator of your computer cluster.
Structure of a simulation
The essence of STEMsalabim is to model the interaction of a focused electron beam with a collection of atoms, typically in the form of a crystalline sample. Given the necessary input files, the simulation crunches numbers for some time, after which all of the calculated results can be found in the output file. Please refer to Executing STEMsalabim for notes on how to start a simulation.
All information about the specimen is listed in the crystal file (see Crystal file format), which is one of the two required input files for STEMsalabim. It contains each atom’s species (element), coordinates, and mean square displacement as it enters the Debye-Waller factor.
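For orientation, the textbook isotropic Debye-Waller factor connects an atom's mean square displacement \(\langle u^2\rangle\) to the attenuation of its scattering at \(s = \sin\theta/\lambda\). This is only the standard convention and not necessarily the one STEMsalabim uses internally. Whether the crystal file expects \(\langle u^2\rangle\) per Cartesian direction or summed over all three, and in which units, is an assumption you should check in Crystal file format.

```latex
% Standard isotropic Debye-Waller factor (textbook convention, assumed here)
%   <u^2> : mean square displacement along one direction
%   s     : \sin\theta / \lambda
T(s) = \exp\!\left(-B\,s^{2}\right), \qquad B = 8\pi^{2}\,\langle u^{2}\rangle
```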
In addition, every simulation requires a parameter file (see Parameter files), the second required input file. It contains information about the microscope, the detector, and all other simulation parameters, written in a specific syntax.
The complete output of a STEMsalabim simulation is written to a NetCDF file. NetCDF is a binary, hierarchical file format for scientific data based on HDF5. NetCDF/HDF5 allows us to compress the output data and store it in a machine-readable, organized format while still only having to deal with a single output file.
You can read more about the output file structure at Output file format.
Hybrid parallelization model
STEMsalabim simulations can be parallelized both via POSIX threads and via the Message Passing Interface (MPI). A typical simulation uses both schemes at the same time. MPI handles communication between the computing nodes, and threads handle parallelization within a node, exploiting the usual multi-CPU/multi-core structure.
A high-performance computing cluster is an array of many (identical) computing nodes. Highly parallelized software typically uses more than one of these nodes for its computations. Usually no memory is shared between the nodes, so all information required to manage the parallel computation must be explicitly communicated between the processes on the different machines. The de facto standard for this is MPI.
Let us assume a simulation that runs on \(M\) computers, each of which spawns \(N\) threads.
There is a single, special master thread (thread 0 of the MPI process with rank 0) that orchestrates the simulation, distributes work packages, and does all the I/O. This avoids race conditions and prevents waiting times for the worker processes. All other \((M\times N)-1\) threads participate in the simulation.
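The bookkeeping above can be sketched in a few lines of Python. The helper `assign_roles` and the `ThreadRole` tuple are hypothetical names used only for illustration. They label each (rank, thread) pair and confirm that an \(M\times N\) run has exactly one master and \((M\times N)-1\) workers.

```python
from typing import List, NamedTuple


class ThreadRole(NamedTuple):
    rank: int    # MPI rank of the process
    thread: int  # thread index within that process
    role: str    # "master" or "worker"


def assign_roles(m: int, n: int) -> List[ThreadRole]:
    """Label every thread of an M-process x N-thread run (hypothetical helper)."""
    roles = []
    for rank in range(m):
        for thread in range(n):
            # Thread 0 of rank 0 is the single master; every other thread computes.
            role = "master" if (rank, thread) == (0, 0) else "worker"
            roles.append(ThreadRole(rank, thread, role))
    return roles


roles = assign_roles(m=4, n=8)
workers = [r for r in roles if r.role == "worker"]
print(len(roles), len(workers))  # 32 31
```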
A typical STEMsalabim simulation is composed of many independent multi-slice simulations that differ only in the position of the scanning probe. Hence, parallelization is done at the level of these multi-slice simulations, with each thread performing them independently of the other threads. To reduce the number of MPI messages, only the main thread of each of the \(M\) MPI processes communicates with the master thread. The master thread sends a work package containing some number of probe pixels to an MPI process. That process carries out all the calculations in parallel on its \(N\) threads and sends back the results when it is finished. After receiving the results, the master thread sends another work package to that MPI process, and this continues until no work is left. Meanwhile, the worker threads of the MPI process with rank 0 also work on emptying the work queue.
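The following single-process Python sketch mimics the master/worker scheme described above. It models the control flow only and is not STEMsalabim's implementation. MPI processes are emulated by threads, MPI messages by `queue.Queue` objects, and `multislice` is a hypothetical placeholder for one multi-slice run. Because of Python's global interpreter lock, the sketch illustrates the message pattern rather than real speed-up. With \(M\times N = 1\) there would be no worker thread at all, so the sketch rejects that case. How STEMsalabim itself handles a single thread is not covered here.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def multislice(pixel):
    """Hypothetical stand-in for one independent multi-slice run at a probe position."""
    x, y = pixel
    return x * 1000 + y


def remote_rank(rank, n_threads, inbox, outbox):
    """Main thread of an MPI rank > 0: request work, compute it on N threads, report back."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        outbox.put((rank, None))                  # first request, no results yet
        while True:
            package = inbox.get()
            if package is None:                   # master says: no work left
                return
            results = dict(zip(package, pool.map(multislice, package)))
            outbox.put((rank, results))           # results double as the next request


def run(m, n, pixels, package_size):
    """Simulate M ranks x N threads; thread 0 of rank 0 acts as the master."""
    if m * n < 2:
        raise ValueError("need at least one worker thread besides the master")
    work = queue.Queue()                          # work packages of probe pixels
    for i in range(0, len(pixels), package_size):
        work.put(pixels[i:i + package_size])
    outbox = queue.Queue()                        # stands in for MPI messages to the master
    inboxes = {r: queue.Queue() for r in range(1, m)}
    results, lock = {}, threading.Lock()

    def local_worker():
        # The N-1 worker threads of rank 0 drain the queue directly (shared memory).
        while True:
            try:
                package = work.get_nowait()
            except queue.Empty:
                return
            done = {p: multislice(p) for p in package}
            with lock:
                results.update(done)

    threads = [threading.Thread(target=remote_rank, args=(r, n, inboxes[r], outbox))
               for r in range(1, m)]
    threads += [threading.Thread(target=local_worker) for _ in range(n - 1)]
    for t in threads:
        t.start()

    active = m - 1                                # remote ranks still waiting for work
    while active:
        rank, partial = outbox.get()
        if partial is not None:
            with lock:
                results.update(partial)
        try:
            inboxes[rank].put(work.get_nowait())  # hand out the next package
        except queue.Empty:
            inboxes[rank].put(None)               # nothing left: tell this rank to stop
            active -= 1
    for t in threads:
        t.join()
    return results


pixels = [(x, y) for x in range(6) for y in range(6)]
out = run(m=3, n=4, pixels=pixels, package_size=5)
print(len(out))  # 36
```

Bundling several pixels into one package, instead of sending them one by one, also keeps the number of master/worker messages small. Here `package_size` is a hypothetical knob for that trade-off.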
Within one MPI process, the threads can share their memory. Most of the memory is consumed by the weak phase objects of the slices in the multi-slice simulation, which don’t change during the actual simulation. Sharing them among threads therefore greatly reduces memory usage compared to MPI-only parallelization. You should therefore always aim for hybrid parallelization!
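A back-of-the-envelope estimate illustrates why. Suppose each slice's phase object is stored as an \(n_x\times n_y\) grid of single-precision complex numbers (8 bytes each). This layout, the function `phase_object_memory`, and the grid sizes below are hypothetical assumptions for illustration, and all other memory use is ignored. With MPI-only parallelization, each of the \(M\times N\) processes keeps its own copy. With hybrid parallelization, only the \(M\) processes do.

```python
GIB = 2**30  # bytes per GiB


def phase_object_memory(m, n, nx, ny, n_slices, bytes_per_value=8, hybrid=True):
    """Total bytes used by the slice phase objects across the whole run.

    Hypothetical model: one nx x ny complex64 grid per slice, one copy per
    MPI process (hybrid) or one copy per single-threaded process (MPI only).
    """
    per_copy = nx * ny * n_slices * bytes_per_value
    copies = m if hybrid else m * n
    return per_copy * copies


hybrid = phase_object_memory(4, 16, 1024, 1024, 100) / GIB
mpi_only = phase_object_memory(4, 16, 1024, 1024, 100, hybrid=False) / GIB
print(hybrid, mpi_only)  # 3.125 50.0
```

With \(M=4\) nodes and \(N=16\) threads, the phase objects in this model occupy 3.125 GiB in total with hybrid parallelization but 50 GiB with MPI only, a factor of \(N\).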