The simplest OpenMP demo is given in openmp_demo.c:
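#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

void Hello(void);  /* thread function */

int main(int argc, char* argv[]) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <number of threads>\n", argv[0]);
        return 1;
    }
    /* number of threads, taken from the command line */
    int thread_count = strtol(argv[1], NULL, 10);
#   pragma omp parallel num_threads(thread_count)
    Hello();  /* executed by every thread in the team */
    return 0;
}

void Hello(void) {
    int my_rank = omp_get_thread_num();
    int thread_count = omp_get_num_threads();
    printf("Hello from thread %d of %d\n", my_rank, thread_count);
}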
OpenMP directives are written as #pragma omp lines. The statement
# pragma omp parallel num_threads(thread_count)
starts thread_count threads to execute the structured block that follows it (here, the call to Hello()). Specifically, the current thread acts as the master thread and forks thread_count - 1 slave threads to execute the same block; the master and the slaves together are called a team. Note that num_threads is a request: the runtime may start fewer threads. There is an implicit barrier at the end of the block, so the master waits until every thread in the team has returned from Hello() before it continues.
As shown, the functions omp_get_thread_num() and omp_get_num_threads() return the rank of the calling thread and the number of threads in the team, respectively.
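The team, the two query functions and the implicit barrier can be seen together in a small sketch. The helper record_team and the bound MAX_TEAM are hypothetical names made up for illustration. Each thread writes its own rank into a separate slot of a shared array, and because of the barrier at the end of the parallel block, the master can safely read every slot afterwards. The _OPENMP guards (explained just below in these notes) let it also build without OpenMP, in which case it behaves as a one-thread team.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_TEAM 64  /* hypothetical upper bound on the team size */

/* Hypothetical helper: each thread stores its rank in seen[rank] and the
   master (rank 0) records the team size. Thanks to the implicit barrier
   at the end of the parallel block, every slot has been written by the
   time the function returns. Returns the team size. */
int record_team(int requested, int seen[MAX_TEAM]) {
    int team_size = 1;
    for (int i = 0; i < MAX_TEAM; i++)
        seen[i] = -1;                  /* -1 marks an unused slot */
    if (requested < 1) requested = 1;
    if (requested > MAX_TEAM) requested = MAX_TEAM;

#   pragma omp parallel num_threads(requested)
    {
#ifdef _OPENMP
        int my_rank = omp_get_thread_num();
        int thread_count = omp_get_num_threads();
#else
        int my_rank = 0;               /* serial fallback: one thread */
        int thread_count = 1;
#endif
        seen[my_rank] = my_rank;       /* distinct slot per thread: no race */
        if (my_rank == 0)
            team_size = thread_count;  /* only the master writes this */
    }
    /* implicit barrier: all threads have finished the block here */
    return team_size;
}
```

The returned size may be smaller than requested, which is why the result is checked against the actual team size rather than the requested count.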
To stay compatible with compilers that do not support OpenMP, we can use the macro _OPENMP for conditional compilation. For example, we can modify the demo program as in openmp_demo_comp.c:
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

void Hello(void);

int main(int argc, char* argv[]) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <number of threads>\n", argv[0]);
        return 1;
    }
    int thread_count = strtol(argv[1], NULL, 10);
#   pragma omp parallel num_threads(thread_count)
    Hello();
    return 0;
}

void Hello(void) {
#ifdef _OPENMP
    int my_rank = omp_get_thread_num();
    int thread_count = omp_get_num_threads();
#else
    /* compiled without OpenMP: behave as a single thread */
    int my_rank = 0;
    int thread_count = 1;
#endif
    printf("Hello from thread %d of %d\n", my_rank, thread_count);
}
To compile the C program with OpenMP:
gcc -g -Wall -fopenmp -o hello hello.c
There are two ways to specify the number of threads: passing it as a command-line argument, or taking it from an environment variable. Passing it as an argument:
./hello 4
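For the environment-variable approach, the parallel directive simply omits the num_threads clause, and the team size is then normally taken from the standard OMP_NUM_THREADS variable. The sketch below is a hedged illustration; default_team_size is a hypothetical helper name, and it reports 1 when built without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns the size of the team created by a
   parallel directive that has no num_threads clause. With OpenMP this
   normally comes from OMP_NUM_THREADS, or from an implementation-defined
   default (often the number of cores) when the variable is unset. */
int default_team_size(void) {
    int team_size = 1;  /* value reported when compiled without OpenMP */
#   pragma omp parallel
    {
#ifdef _OPENMP
        if (omp_get_thread_num() == 0)   /* only the master writes */
            team_size = omp_get_num_threads();
#endif
    }
    return team_size;
}
```

If a main function printed default_team_size() and the program were built as a hypothetical ./hello_env, running OMP_NUM_THREADS=4 ./hello_env would typically report 4. The variable is read when the OpenMP runtime starts, so it has to be set before the program is launched.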
Inside the OpenMP implementation of the trapezoidal rule, openmp_trapezoidal.c, the update of the global sum is protected by a critical section. The pointer global_result_p points to the global sum, which may be modified by several threads simultaneously, so we use #pragma omp critical to enforce mutual exclusion.
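A minimal sketch of such a critical section follows. It is not the original openmp_trapezoidal.c: the integrand f(x) = x*x is an assumption for illustration, Trap and Trap_critical are hypothetical names, and it assumes n is evenly divisible by the team size. Each thread integrates its own block of trapezoids, then adds its partial result through global_result_p inside the critical section.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical integrand, chosen for illustration. */
static double f(double x) { return x * x; }

/* Each thread integrates its own block of n / thread_count trapezoids
   (assumes n is evenly divisible by the team size), then adds its
   partial result to the shared sum inside a critical section. */
static void Trap(double a, double b, int n, double* global_result_p) {
#ifdef _OPENMP
    int my_rank = omp_get_thread_num();
    int thread_count = omp_get_num_threads();
#else
    int my_rank = 0;
    int thread_count = 1;
#endif
    double h = (b - a) / n;
    int local_n = n / thread_count;
    double local_a = a + my_rank * local_n * h;
    double local_b = local_a + local_n * h;

    double my_result = (f(local_a) + f(local_b)) / 2.0;
    for (int i = 1; i <= local_n - 1; i++)
        my_result += f(local_a + i * h);
    my_result *= h;

#   pragma omp critical
    *global_result_p += my_result;   /* one thread at a time */
}

/* global_result is declared before the parallel block, so it is shared. */
double Trap_critical(double a, double b, int n, int thread_count) {
    double global_result = 0.0;
#   pragma omp parallel num_threads(thread_count)
    Trap(a, b, n, &global_result);
    return global_result;
}
```

With a = 0, b = 1 and n = 1024 the result should be close to 1/3. Only the single += is serialized; the local integration runs in parallel.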
There are two kinds of variables: shared variables, which can be accessed by all the threads in the team, and private variables, which can only be accessed by the thread that owns them. By default, variables declared before the parallel block are shared, while variables declared inside the block are private.
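The default sharing rules can be seen in a small sketch, where count_with_private is a hypothetical function. total is declared before the block, so all threads share it and the update needs protection. my_count and the loop variable i are declared inside the block, so each thread gets its own copy and no protection is needed.

```c
/* Hypothetical example of the default data-sharing rules. */
int count_with_private(int thread_count, int per_thread) {
    int total = 0;                          /* declared before: shared */
#   pragma omp parallel num_threads(thread_count)
    {
        int my_count = 0;                   /* declared inside: private */
        for (int i = 0; i < per_thread; i++)
            my_count++;                     /* no race: private copy */
#       pragma omp critical
        total += my_count;                  /* shared: needs protection */
    }
    return total;                           /* per_thread * team size */
}
```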
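OpenMP provides the reduction clause as a shortcut for reductions, that is, combining the threads' partial results with a binary operator. Assuming Local_trap has the prototype double Local_trap(double a, double b, int n) and returns the calling thread's part of the integral:
double global_result = 0.0;
# pragma omp parallel num_threads(thread_count) \
    reduction(+: global_result)
global_result += Local_trap(a, b, n);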
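This is equivalent to the following manual version using critical. The call to Local_trap is made outside the critical section, so the threads still compute their partial results in parallel and only the final addition is serialized:
double global_result = 0.0;
# pragma omp parallel num_threads(thread_count)
{
    double my_result = 0.0;  /* private */
    my_result += Local_trap(a, b, n);
#   pragma omp critical
    global_result += my_result;
}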
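The equivalence can be checked with a sketch that needs no integrand. Each thread contributes rank + 1, so a team of t threads should produce t(t+1)/2 either way. The function names and the rank/size helpers are hypothetical, with serial fallbacks for builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helpers with serial fallbacks. */
static int rank_or_zero(void) {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}
static int team_or_one(void) {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

/* reduction version: each thread's private copy of total starts at 0
   and the copies are added into the shared total at the end. */
long rank_sum_reduction(int thread_count, int* size_out) {
    long total = 0;
#   pragma omp parallel num_threads(thread_count) reduction(+: total)
    {
        total += rank_or_zero() + 1;
        if (rank_or_zero() == 0) *size_out = team_or_one();
    }
    return total;
}

/* Manual version: private partial result plus a critical section. */
long rank_sum_critical(int thread_count, int* size_out) {
    long total = 0;
#   pragma omp parallel num_threads(thread_count)
    {
        long my_result = rank_or_zero() + 1;
#       pragma omp critical
        total += my_result;
        if (rank_or_zero() == 0) *size_out = team_or_one();
    }
    return total;
}
```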
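With a reduction clause, each thread's private copy of the reduction variable is initialized to the identity value of the operator:

| Operator | Identity value |
|---|---|
| + | 0 |
| * | 1 |
| - | 0 |
| & | ~0 (111…111, all bits set) |
| \| | 0 |
| ^ | 0 |
| && | 1 |
| \|\| | 0 |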
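As an illustration of a non-arithmetic operator, here is a sketch of a reduction with &&. It uses the parallel for directive, and all_positive is a hypothetical function. Every thread's copy of all_pos starts at 1, the identity for &&, so a thread that sees only positive elements leaves it unchanged.

```c
/* Hypothetical example: returns 1 if every element of A is positive,
   0 otherwise. Private copies of all_pos start at 1 (the && identity)
   and are combined with && at the end of the loop. */
int all_positive(const int A[], int n, int thread_count) {
    int all_pos = 1;
#   pragma omp parallel for num_threads(thread_count) reduction(&&: all_pos)
    for (int i = 0; i < n; i++)
        all_pos = all_pos && (A[i] > 0);
    return all_pos;
}
```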
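The parallel for directive divides the iterations of the for loop that immediately follows it among the threads of the team. The loop variable i is made private to each thread automatically, while approx is updated by every iteration and so needs a reduction clause. For example, the serial trapezoidal rule can be parallelized as:
h = (b - a)/n;
approx = (f(a) + f(b))/2.0;
# pragma omp parallel for num_threads(thread_count) \
    reduction(+: approx)
for (i = 1; i <= n-1; i++)
    approx += f(a + i*h);
approx = h * approx;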
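Wrapped into a self-contained function, the loop above becomes the sketch below. The integrand f(x) = x*x and the name Trap_parallel_for are assumptions for illustration. Note that the reduction also folds in the value approx held before the loop, (f(a) + f(b))/2.

```c
/* Hypothetical integrand, chosen for illustration. */
static double f(double x) { return x * x; }

/* Trapezoidal rule with n trapezoids on [a, b]: iterations are divided
   among the threads, and the private partial sums of approx are added
   to its initial value by the + reduction. */
double Trap_parallel_for(double a, double b, int n, int thread_count) {
    double h = (b - a) / n;
    double approx = (f(a) + f(b)) / 2.0;
#   pragma omp parallel for num_threads(thread_count) \
        reduction(+: approx)
    for (int i = 1; i <= n - 1; i++)
        approx += f(a + i * h);
    return h * approx;
}
```

Unlike the critical-section version, this one does not require n to be divisible by the number of threads, because OpenMP handles the division of iterations.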
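Only a limited form of for loop can be parallelized with parallel for: roughly for (index = start; index < end; index += incr), where the test may also use <=, > or >=, and the update may also be written as index++, ++index, index--, --index, index -= incr or index = index + incr. In addition:
- The variable index must have integer or pointer type (for example, it can't be a float).
- The expressions start, end and incr must have compatible types.
- The expressions start, end and incr must not change during execution of the loop.
- During execution of the loop, index can only be modified by the increment expression in the for statement.
OpenMP does not check for dependences between iterations. The programmer must avoid introducing loop-carried dependences in a parallel for loop, since the iterations may run in any order on different threads.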
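The difference between an independent loop and a loop-carried dependence is shown in the sketch below. Both functions are hypothetical examples. In squares, each iteration writes only its own element, so parallel for is safe. In fibonacci, iteration i reads values written by iterations i-1 and i-2, so adding parallel for would still compile but could give wrong results. That loop is therefore left serial.

```c
/* Safe to parallelize: iteration i writes a[i] only and reads nothing
   written by another iteration. */
void squares(long a[], int n, int thread_count) {
#   pragma omp parallel for num_threads(thread_count)
    for (int i = 0; i < n; i++)
        a[i] = (long)i * i;
}

/* Loop-carried dependence: fibo[i] depends on fibo[i-1] and fibo[i-2],
   which are written by earlier iterations, so this loop stays serial. */
void fibonacci(long fibo[], int n) {
    if (n > 0) fibo[0] = 1;
    if (n > 1) fibo[1] = 1;
    for (int i = 2; i < n; i++)
        fibo[i] = fibo[i - 1] + fibo[i - 2];
}
```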
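For example, the following linear search cannot be parallelized with parallel for. The return inside the loop lets it exit early, so the number of iterations cannot be determined from the for statement alone, and OpenMP does not allow jumping out of the parallel loop:
int Linear_search(int key, int A[], int n) {
    int i;
    /* NOT valid OpenMP; thread_count is assumed to be global */
#   pragma omp parallel for num_threads(thread_count)
    for (i = 0; i < n; i++)
        if (A[i] == key) return i;
    return -1;  /* key not in list */
}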
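One way to make the search legal is to drop the early exit and let every iteration run, keeping the smallest matching index with a min reduction (available in C since OpenMP 3.1). This is a sketch of that alternative technique, not the original Linear_search. Linear_search_par is a hypothetical name, and it gives up the early-exit benefit of the serial search.

```c
#include <limits.h>

/* Hypothetical parallel-friendly variant: no early exit; a min reduction
   keeps the smallest index whose element equals key. Private copies of
   first start at INT_MAX (the identity for min). Returns -1 if absent. */
int Linear_search_par(int key, const int A[], int n, int thread_count) {
    int first = INT_MAX;
#   pragma omp parallel for num_threads(thread_count) reduction(min: first)
    for (int i = 0; i < n; i++)
        if (A[i] == key && i < first)
            first = i;
    return first == INT_MAX ? -1 : first;
}
```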
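OpenMP also provides the atomic directive, which turns a single assignment statement into a critical section and can often be implemented more efficiently than critical:
# pragma omp atomic
x += y++;
Here only the update of x is protected; the increment of y is not.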
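A typical use of atomic is a shared counter. The sketch below is a hypothetical example that counts multiples of 3. Because the protected statement is a single update of count, atomic is enough and no full critical section is needed.

```c
/* Hypothetical example: counts the multiples of 3 in [0, n). The update
   of the shared counter is one statement, so atomic suffices. */
long count_multiples_of_3(int n, int thread_count) {
    long count = 0;   /* declared before the block: shared */
#   pragma omp parallel for num_threads(thread_count)
    for (int i = 0; i < n; i++) {
        if (i % 3 == 0) {
#           pragma omp atomic
            count += 1;
        }
    }
    return count;
}
```

A reduction(+: count) clause would work here too; atomic is shown because it also covers updates that happen only occasionally inside larger loop bodies.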
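To time an OpenMP program, use double omp_get_wtime(void). It returns the elapsed wall-clock time in seconds since an arbitrary point in the past, so a section of code is timed by taking the difference of two calls.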
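A minimal timing sketch follows. now_seconds and time_parallel_sum are hypothetical names, and the workload (summing n ones) is made up for illustration. Without OpenMP it falls back to clock(), which measures processor time rather than wall-clock time, so the two builds are not directly comparable.

```c
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
#endif

/* Hypothetical helper: wall-clock seconds with OpenMP, processor time
   via clock() otherwise. Only differences between calls are meaningful. */
static double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Times a parallel sum of n ones; the sum is stored in *result and the
   elapsed time in seconds is returned. */
double time_parallel_sum(long n, int thread_count, double* result) {
    double sum = 0.0;
    double start = now_seconds();
#   pragma omp parallel for num_threads(thread_count) reduction(+: sum)
    for (long i = 0; i < n; i++)
        sum += 1.0;
    double finish = now_seconds();
    *result = sum;
    return finish - start;
}
```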