charm - Re: [charm] Is AMPI support MPI_Waitall?

  • From: Gengbin Zheng <zhenggb AT gmail.com>
  • To: Zhang Kai <zhangk1985 AT gmail.com>
  • Cc: Phil Miller <mille121 AT illinois.edu>, charm AT cs.uiuc.edu
  • Subject: Re: [charm] Is AMPI support MPI_Waitall?
  • Date: Fri, 12 Feb 2010 09:00:12 -0600
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>


I was asking because I could not reproduce your problem. I ran it on ranger a few times, and it worked for me. This is an example job output on the ranger cluster:

./charmrun +p128 ./t 32768 30 +vp256 +tcharm_stacksize 32768000
Warning> Randomization of stack pointer is turned on in kernel, thread migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it, or try run with '+isomalloc_sync'.
Charm++: scheduler running in netpoll mode.
Charm++> cpu topology info is being gathered.
Charm++> Running on 8 unique compute nodes (16-way SMP).
Computed 32768 particles in 3.081355 seconds
TACC: Cleaning up after job: 1245471
TACC: Done.


AMPI runs each MPI rank in a user-level thread, and +tcharm_stacksize sets the maximum stack size for each of those threads. Since this program declares large arrays on the stack, it is important to specify a large enough stack size, but the 32MB we are using here should be more than enough.
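To make that concrete, here is a minimal hypothetical sketch (not the actual nbodypipe_c source; the function names and buffer layout are made up for illustration) of the kind of declarations that require a large per-thread stack, together with a heap-based alternative that avoids the need for a big +tcharm_stacksize:

#include <stdlib.h>

#define MAX_PARTICLES 32768   /* value used in this thread */

/* With automatic (stack) buffers like these, each AMPI user-level thread
 * needs several MB of stack: 32768 * 4 * sizeof(double) is about 1 MB per
 * buffer, so +tcharm_stacksize must comfortably exceed their total. */
void step_with_stack_buffers(void)
{
    double particles[MAX_PARTICLES * 4];  /* ~1 MB on the thread's stack */
    double sendbuf[MAX_PARTICLES * 4];    /* ~1 MB */
    double recvbuf[MAX_PARTICLES * 4];    /* ~1 MB */
    (void)particles; (void)sendbuf; (void)recvbuf;  /* compute step omitted */
}

/* Allocating the buffers on the heap removes the large-stack requirement. */
void step_with_heap_buffers(void)
{
    double *particles = malloc(MAX_PARTICLES * 4 * sizeof(double));
    double *sendbuf   = malloc(MAX_PARTICLES * 4 * sizeof(double));
    double *recvbuf   = malloc(MAX_PARTICLES * 4 * sizeof(double));
    /* compute step omitted */
    free(particles); free(sendbuf); free(recvbuf);
}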

In your runs, do you even see the first lines of Charm++ output?
If you don't see any output, it might be a startup problem. Try adding ++scalable-start to your command line.
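For example, combining the flags already used in this thread, the full command would look something like:

./charmrun +p128 ./pgm 32768 30 +vp256 +tcharm_stacksize 32768000 ++nodelist charm.hosts ++scalable-start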

Gengbin


2010/2/12 Zhang Kai <zhangk1985 AT gmail.com>
Of course the value of MAX_P is larger than 256.

Actually, I set MAX_PARTICLES to 32768 and MAX_P to 512. I don't think there is any problem in the program because I can run it with MPICH2.

The program also failed when I used the +p128 +vp128 options.

By the way, I am not sure about the meaning of +tcharm_stacksize.

Does it mean the size of the communication buffer between VPs?


On Feb 12, 2010, at 2:35 AM, Gengbin Zheng <zhenggb AT gmail.com> wrote:


What are the values of MAX_PARTICLES and MAX_P in your program?

I suppose your command line is something like:

./charmrun +p128 ./pgm 32768 30  +vp256 +tcharm_stacksize 32768000 

so you are actually running on 256 virtual AMPI processes.

Gengbin

2010/2/9 Zhang Kai <zhangk1985 AT gmail.com>

All I modified was the values of MAX_PARTICLES and MAX_P defined before main(), so that it can run at a larger scale.

Besides that, I didn't modify any other line.

Zhang Kai




2010/2/10 Phil Miller <mille121 AT illinois.edu>

I'll investigate this on a local cluster. Are you working with the
sample code you originally linked, or a modified version? If the
sample code works, but your modification doesn't, please send us the
modified version so we can isolate the bug.

Phil

2010/2/9 Zhang Kai <zhangk1985 AT gmail.com>:
> hi Phil:
>
> Sorry to disturb you again.
>
> I ran into a new problem when running this nbody program on a cluster that
> contains 16 nodes and 128 physical processors in total.
>
> I have successfully run it with MPICH2. And I can also build it with AMPI.
>
> However, when I run it with this command:
> ./charmrun +p128 ./pgm 32768 30  +vp256 +tcharm_stacksize 32768000
> ++nodelist charm.hosts
>
> I found it doesn't work.
>
> Using the ++verbose option, I found that all the nodes listed in the nodefile
> were connected and pgm processes were created on all of them.
> But then there was no response from them.
>
> Sometimes it runs successfully when I use +p64 instead of +p128, but
> sometimes it does not.
>
> I wonder if there is another bug, or whether I have just misconfigured AMPI.
>
> Looking forward to your reply.
>
> Best wishes!
>
> Zhang Kai
> Department of Computer Science & Technology
> Tsinghua University
>
>
>
>
> 2010/1/30 Phil Miller <mille121 AT illinois.edu>
>>
>> The attached patch is generated from and applies cleanly to charm-6.1.3.
>>
>> If you're familiar with the 'git' version control system, our
>> repository can be cloned from git://charm.cs.uiuc.edu/charm.git
>>
>> If you don't mind my asking, what work are you doing with AMPI? We'd
>> be very interested to hear about your experiences.
>>
>> Phil
>>
>> On Fri, Jan 29, 2010 at 08:03, Phil Miller <mille121 AT illinois.edu> wrote:
>> > 2010/1/29 张凯 <zhangk1985 AT gmail.com>:
>> >> I am using charm-6.1.3.  Can that patch fix the problem on charm-6.1.3?
>> >>
>> >> When I apply the patch to ampi.C using the command "patch
>> >> src/libs/ck-libs/ampi/ampi.C AMPI_Cart_shift.patch", it reports:
>> >>
>> >> patching file src/libs/ck-libs/ampi/ampi.C
>> >> Hunk #1 succeeded at 5158 (offset -185 lines).
>> >> Hunk #2 FAILED at 5198.
>> >> 1 out of 2 hunks FAILED -- saving rejects to file
>> >> src/libs/ck-libs/ampi/ampi.C.rej
>> >>
>> >> After I tried to read the patch file and fix it myself, I found that there
>> >> may be a lot of differences between the released version and the
>> >> development version of the charm source code.
>> >>
>> >> Could you help me build a patch for charm-6.1.3? Or tell me where I can
>> >> find a stable development version that I can patch myself?
>> >
>> > I'll send you the patch against 6.1.3 shortly, when I'm in the office.
>> >
>> > Phil
>> >
>> >
>> >>
>> >> Zhang Kai
>> >>
>> >> 2010/1/29 Phil Miller <mille121 AT illinois.edu>
>> >>>
>> >>> The bug has been fixed in the development version of charm. If you use
>> >>> a pre-built development binary, the fix will be in tonight's autobuild
>> >>> for whatever platform you use. If you're building it from development
>> >>> source yourself, the patch is attached. If you are using the released
>> >>> Charm 6.1.x, we can port that fix over for you if you're not
>> >>> comfortable doing so yourself.
>> >>>
>> >>> Phil
>> >>>
>> >>> 2010/1/28 张凯 <zhangk1985 AT gmail.com>:
>> >>> >
>> >>> > I think you have reproduced the problem I was running into.
>> >>> >
>> >>> > Thanks for your reply.
>> >>> >
>> >>> > Best regards.
>> >>> >
>> >>> > Zhang Kai
>> >>> >
>> >>> > 2010/1/29 Phil Miller <mille121 AT illinois.edu>
>> >>> >>
>> >>> >> On Thu, Jan 28, 2010 at 07:50, 张凯 <zhangk1985 AT gmail.com> wrote:
>> >>> >> > hi:
>> >>> >> >
>> >>> >> > I am a beginner with AMPI and am trying to run an MPI program with it,
>> >>> >> > but I found a small problem.
>> >>> >> >
>> >>> >> > Here
>> >>> >> > (http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples/advmsg/nbodypipe_c.htm)
>> >>> >> > you can find an example of an MPI program. I have successfully built and
>> >>> >> > run it using both MPICH and Intel MPI.
>> >>> >> >
>> >>> >> > However, when I ran it with AMPI, I found that the program blocked in
>> >>> >> > MPI_Waitall and never returned (see the sketch of this exchange pattern
>> >>> >> > after the quoted thread below).
>> >>> >> >
>> >>> >> > I just ran it with the ++local +p2 +vp2 options. Did I miss any other
>> >>> >> > options, or misconfigure AMPI?
>> >>> >>
>> >>> >> I'm seeing the same effect as you describe on a net-linux-x86_64 build
>> >>> >> of AMPI from the latest charm sources. We'll look into this and get
>> >>> >> back to you.
>> >>> >>
>> >>> >> For reference, the attached code (with added prints) produces the
>> >>> >> following:
>> >>> >>
>> >>> >> $ ./charmrun nbp +vp 4 20 +p4
>> >>> >> Charm++: scheduler running in netpoll mode.
>> >>> >> Charm++> cpu topology info is being gathered.
>> >>> >> Charm++> Running on 1 unique compute nodes (8-way SMP).
>> >>> >> Iteration 9
>> >>> >> Iteration 9:0 a
>> >>> >> Iteration 9
>> >>> >> Iteration 9:0 a
>> >>> >> Iteration 9
>> >>> >> Iteration 9:0 a
>> >>> >> Iteration 9
>> >>> >> Iteration 9:0 a
>> >>> >> Iteration 9:0 b
>> >>> >> Iteration 9:0 b
>> >>> >> Iteration 9:0 b
>> >>> >> Iteration 9:0 b
>> >>> >
>> >>> >
>> >>
>> >>
>> >
>
>
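To illustrate the MPI_Waitall hang described in the quoted messages above: the nbody pipeline posts nonblocking sends and receives to its neighbors and then waits on all of them at once. The following is a minimal, hypothetical sketch of that communication pattern (illustrative code only, not the actual nbodypipe_c source); a hang in MPI_Waitall typically means that some matching send or receive never completed on another rank.

#include <mpi.h>
#include <stdio.h>

/* Minimal ring-pipeline exchange: each rank posts a nonblocking receive from
 * its left neighbor and a nonblocking send to its right neighbor, then blocks
 * in MPI_Waitall until both requests complete. */
int main(int argc, char **argv)
{
    int rank, size;
    MPI_Request reqs[2];
    double sendbuf[1024], recvbuf[1024];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i < 1024; i++) sendbuf[i] = rank;   /* dummy payload */

    int right = (rank + 1) % size;
    int left  = (rank + size - 1) % size;

    MPI_Irecv(recvbuf, 1024, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, 1024, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);          /* the call that hangs */

    if (rank == 0) printf("pipeline exchange completed\n");
    MPI_Finalize();
    return 0;
}

The same source builds with a regular MPI wrapper such as mpicc, or with AMPI's ampicc, in which case each rank runs as a user-level thread as discussed above.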


_______________________________________________
charm mailing list
charm AT cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/charm