[Pw_forum] again on OpenMPI 1.3.3
Carlo Nervi
carlo.nervi at unito.it
Wed Oct 7 14:30:11 CEST 2009
Hi again,
I am sorry to bother the whole community with my compilation problems, which
are a little OT, but they are quite unusual. Something is certainly wrong on
my machine (Gentoo Linux on a dual Xeon 5345), but I cannot guess what.
After many tests and recompilations I found that pw.x runs perfectly with
"mpirun -np 2", but fails with "mpirun -np 8". The error is
"MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with
errorcode 0".
I really cannot understand this.
Could anyone give me a hint?
I successfully compiled the "pingpong" code below with mpif90 (a wrapper
around ifort). In this case too, mpirun -np 2 works, but with -np 4, 6 or 8
the program crashes with the following message:
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: (nil)
[ 0] /lib/libpthread.so.0 [0x2b48da46b400]
[ 1] /lib/libc.so.6(fputs+0x1e) [0x2b48da6d9d0e]
[ 2] ./pingpong(main+0x206) [0x402536]
[ 3] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b48da6965e4]
[ 4] ./pingpong [0x4022b9]
*** End of error message ***
I suspected that ssh does not propagate the environment variables (so
the libraries cannot be found), but it runs on 2 CPUs!
Any help would be greatly appreciated.
Carlo
----------------
/* pingpong - measure effective bandwidth and latency */
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>   /* malloc, free, atoi, exit */
#include <string.h>   /* strcpy, strcat */
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#include <errno.h>
#define MAXSIZE (1024*1024)
#define MINSIZE (0)
#define REPEAT 50
#define INCSIZE (2)
#define INCOP *=
#define CALIBRATION_LOOPS 100
#define TAG_PING 1
#define TAG_PONG 2
/* define DETAIL if you want to create histograms by measuring
the latency of each single ping-pong transfer */
#if 0
#define DETAIL
#include "getus.h"
#endif
#ifdef linux
#define longlong_t long long
#endif
char *buffer;
char *exename;
int min_size, max_size, inc_size, repeats;
int myrank, mysize;
static FILE* fpGlobal = NULL;
static FILE* fpDetail = NULL;
void ping (int to, int from);
void pong (int to);
int main(int argc, char **argv) {
    MPI_Status status;
    int first;
    char fname[128];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &mysize);
    if (myrank == 0) {
        strcpy(fname, argv[0]);
        strcat(fname,".dat");
        fpGlobal = fopen(fname,"w");
    }
    exename = argv[0];
    if(mysize % 2 != 0) {
        printf ("pingpong must be used with an even number of processes.\n");
        MPI_Finalize();
        exit (1);
    }
    /* set run parameters */
    if (argc != 4) {
        printf ("usage: pingpong min_size max_size repeats\n");
        printf ("using default values for this run\n");
        min_size = MINSIZE;
        max_size = MAXSIZE;
        inc_size = INCSIZE;
        repeats = REPEAT;
    } else {
        min_size = atoi( argv[1] );
        max_size = atoi( argv[2] );
        inc_size = INCSIZE;
        repeats = atoi( argv[3] );
    }
    buffer = (char *)malloc (max_size);
    /* find ping and pong processes */
    if ( myrank < mysize/2 ) {
        if (myrank % 2 == 0)
            ping( myrank + mysize/2, myrank);
        else
            pong( myrank + mysize/2 );
    } else {
        first = (mysize/2) % 2;
        if (myrank % 2 == first)
            pong( myrank - mysize/2 );
        else
            ping( myrank - mysize/2, myrank );
    }
    if (myrank == 0) {
        fclose(fpGlobal);
    }
    free (buffer);
    MPI_Finalize();
}
void ping( int to, int from ) {
    MPI_Status status;
    double starttime, totaltime;
    double getticks_overhead;
#ifdef DETAIL
    longlong_t hr_start, hr_end;
    longlong_t calibration = 0;
    longlong_t *timings;
#endif
    char fname[128];
    char bytes[128];
    int i, j;
    int firstrun = 1;
#ifdef DETAIL
    fprintf (stderr, "Calibrating...");
    for (i = 0; i < CALIBRATION_LOOPS; i++) {
        GETTICKS(&hr_start);
        GETTICKS(&hr_end);
        calibration += hr_end - hr_start;
    }
    getticks_overhead = ((double)calibration)/(CALIBRATION_LOOPS);
    fprintf (stderr, "gethrtime() overhead is %6.3f\n", getticks_overhead);
    timings = (longlong_t *)malloc (repeats*sizeof(longlong_t));
#endif
    printf("pingpong from %d to %d\n\n", from, to);
    fprintf(fpGlobal, "# msgsize[byte] repeats bandwidth[MB/s] latency[us]\n");
    fflush(stdout);
    for( i = min_size; i <= max_size; i INCOP inc_size) {
        if ((!firstrun) && (i == 0)) {
            i++;
            if (i > max_size)
                break;
        }
--
------------------------------------------------------
Carlo Nervi carlo.nervi at unito.it Tel:+39 011 6707507/8
Fax: +39 011 6707855 - Dipartimento di Chimica IFM
via P. Giuria 7, 10125 Torino, Italy
http://lem.ch.unito.it/