PaScal Analyzer
The Parallel Scalability Analyzer (PaScal Analyzer) is a profiling tool designed to measure and compare executions of parallel applications across different configurations. Its main goal is to help developers understand the scalability potential of their programs by observing how computational resources are utilized under varying conditions.
PaScal Analyzer stands out for its low intrusiveness: it measures the behavior and scalability of the application without significantly impacting performance. After execution, the Analyzer organizes the collected data (such as execution time, power consumption, and performance counters) for later visual analysis with PaScal Viewer.
The tool supports both automatic and manual instrumentation:
- Automatic instrumentation allows developers to run the tool without modifying the code, making the profiling process faster and more convenient.
- Manual instrumentation involves explicitly marking regions of interest in the source code, giving developers precise control over which parts of the execution are monitored.
Key Features
- Reads and executes multiple configuration definitions of a parallel program.
- Measures and collects performance data automatically.
- Collects data in a non-intrusive manner.
- Organizes information for visual exploration of scalability trends.
- Enables developers to identify performance bottlenecks and optimize resource usage.
Requirements
- Linux operating system
- For profiling:
  - Applications written in C and C++
  - OpenMP or PThreads (MPI support planned)
  - Profiling binary: GNU C Compiler (GCC)
Installation
PaScal Analyzer needs to be installed on the machine where the parallel program will run. A Linux x86_64 binary version can be quickly installed with the following commands:
wget -c https://gitlab.com/lappsufrn/pascal-releases/-/archive/master/pascal-releases-master.zip
unzip pascal-releases-master.zip
cd pascal-releases-master/
source env.sh
To see help options:
pascalanalyzer -h
Instrumenting Source Code
Manual instrumentation is available through the functions pascal_start(id) and pascal_stop(id), defined in include/pascalops.h.
For manual instrumentation of PThreads or OpenMP code:
- Include the pascalops.h header file in your source file.
- Use pascal_start(region_id) to mark the beginning of a region.
- Use pascal_stop(region_id) to mark the end of the region.
Example
#include "pascalops.h"
// ...existing code...
int main()
{
    // ...existing code...
    pascal_start(1);
    #pragma omp parallel
    {
        // ...parallel region code...
    }
    pascal_stop(1);
    // ...existing code...
}
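When a program has more than one region of interest, each region can presumably be given its own id. The following is a minimal sketch of that idea, not the tool's documented usage: the function fill_and_sum, the USE_PASCALOPS macro, and the fallback stubs (including their int parameter type and the two counters) are hypothetical and exist only so the sketch builds without the library installed.

```c
#ifdef USE_PASCALOPS
#include "pascalops.h"   /* real header shipped with PaScal Analyzer */
#else
/* Hypothetical stand-ins so the sketch compiles without the library.
   The int parameter type is an assumption, not taken from pascalops.h. */
static int regions_started = 0;
static int regions_stopped = 0;
static void pascal_start(int id) { (void)id; regions_started++; }
static void pascal_stop(int id)  { (void)id; regions_stopped++; }
#endif

/* Fills data[0..n-1] inside region 1, then sums it inside region 2. */
double fill_and_sum(double *data, int n)
{
    double sum = 0.0;

    pascal_start(1);                 /* region 1: initialization */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        data[i] = (double)i;
    pascal_stop(1);

    pascal_start(2);                 /* region 2: reduction */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += data[i];
    pascal_stop(2);

    return sum;
}
```

Calling fill_and_sum(buffer, 100) from main with a 100-element buffer returns the sum of 0 through 99. Each pascal_start call is paired with a pascal_stop that uses the same id, mirroring the single-region example above.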
To perform automatic instrumentation, simply run pascalanalyzer with the optional argument -t {aut,AUT} / --inst {aut,AUT}.
Compiling Instrumented Code
- After installation, the pascalops library should be visible to GCC.
- Simply add the flag -lmpascalops when compiling manually instrumented sources.
gcc myapp.c -fopenmp -lmpascalops -o myapp
Flags
Below is the list of available flags and options for pascalanalyzer:
usage: pascalanalyzer [-h] [-c CORES] [-f FREQUENCIES] [-i INPUTS] [-o OUTPUT]
[-r RPTS] [-g] [-t {aut,AUT,man,MAN}] [-a LEVEL] [--mpi RUNTIME] [--ragt TYPE] [-v VERBOSE] [--dcrs] [--dhpt] [--domp] [--dout] [--govr GOVERNOR] [--idtm TIME]
[--fgpe EVENT] [--fgps EVENT] [--rple {sysfs,perf}] [--rpls {sysfs,scontrol,perf}] [--ipmi SERVER USER PASSWORD] [--modl NPTS MODE] [--prcs] [--lpcs] [--imnt PATH EXTENSIONS] [application]
Script to run application collecting information while executing a program with different configurations of cores, frequencies, and input arguments.
positional arguments:
application Application name to run
optional arguments:
-h, --help show this help message and exit
-c CORES, --cors CORES
List of cores numbers to be used. Ex: 1,2,4. For scalability analysis, organize the number of cores from the smallest to the largest and make sure the ratio between adjacent core numbers is consistent with the ratios for the -i argument.
-f FREQUENCIES, --frqs FREQUENCIES
List of frequencies (KHz). Ex: 2000000,2100000
-i INPUTS, --ipts INPUTS
Input arguments to be used separated by commas. For scalability analysis, organize the input arguments from the smallest problem size to the largest and make sure the ratio between adjacent problem sizes is consistent with the ratios for the -c argument.
-o OUTPUT, --outp OUTPUT
Output file name
-r RPTS, --rpts RPTS Number of repetitions for a specific run. (Default: 1)
-g, --gapr Include as regions the gaps among regions already identified
-t {aut,AUT,man,MAN}, --inst {aut,AUT,man,MAN}
Instrumentation type to identify code regions (auto or manual). When not used, the tool will not attempt to identify inner regions.
-a LEVEL, --ragl LEVEL
Set the regions hierarchy level to aggregate measures. 0 = No level.
--mpi RUNTIME Enable mpi and set the runtimer
--ragt TYPE Region aggregation type. The 'acc' type means to use an accumulated time
variable that reduces memory consumption and calculation time.
-v VERBOSE, --verb VERBOSE
verbosity level. 0 = No verbose
--dcrs Disable cores
--dhpt Enable hyperthread (disabled by default)
--domp Set OMP_NUM_THREADS variable with -c parameter values. (enabled by default)
--dout Discard the output
--govr GOVERNOR Set the cpu governor
--idtm TIME Idle time between runs. (Default: 0)
--fgpe EVENT Collect performance counters
--fgps EVENT Sample performance counters
--rple {sysfs,perf} Enable rapl energy measurements
--rpls {sysfs,scontrol,perf}
Enable rapl sample energy measurements
--ipmi SERVER USER PASSWORD
Enable ipmi measurements
--modl NPTS MODE Run pascal with random configurations to create a model
--prcs Track cpu cores
--lpcs List performance counters
--imnt PATH EXTENSIONS
Instrument code with pascal_start() and pascal_stop() around OpenMP parallel regions
Examples
Here are some examples of how to use pascalanalyzer:
# Example 1: Automatic instrumentation with specific cores and inputs
pascalanalyzer ./myapp --inst aut --idtm 5 --cors 1:4 --ipts 10,20,30,40 --verb INFO
# Example 2: Manual instrumentation with gap regions, repetitions, and an output file
pascalanalyzer ./myapp -t man -g -r 5 -c 1,32 -i 10,320 -v 2 -o myoutput.json
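The help text for -c and -i asks that the ratios between adjacent core counts match the ratios between adjacent problem sizes. The sketch below checks that condition before a run. It reads "consistent" as "equal" and assumes the inputs are numeric problem sizes, neither of which the help text states outright. The function name ratios_consistent is hypothetical and not part of pascalanalyzer.

```c
#include <stddef.h>

/* Returns 1 when both lists are positive, the core counts are strictly
   ascending, and every adjacent pair satisfies
       cores[i+1] / cores[i] == sizes[i+1] / sizes[i].
   Returns 0 otherwise. Cross-multiplication avoids floating-point division. */
int ratios_consistent(const long *cores, const long *sizes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (cores[i] <= 0 || sizes[i] <= 0)
            return 0;                      /* values must be positive */
    }
    for (size_t i = 0; i + 1 < n; i++) {
        if (cores[i + 1] <= cores[i])
            return 0;                      /* must go smallest to largest */
        if (cores[i + 1] * sizes[i] != cores[i] * sizes[i + 1])
            return 0;                      /* adjacent ratios differ */
    }
    return 1;
}
```

With the lists from Example 2 (cores 1,32 and inputs 10,320) both adjacent ratios are 32, so the check passes. Cores 1,2,4 with inputs 10,20,30 fail, because the last step doubles the cores but scales the input by only 1.5.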