
Re: [charm] Is AMPI support MPI_Waitall?


  • From: Phil Miller <mille121 AT illinois.edu>
  • To: Zhang Kai <zhangk1985 AT gmail.com>
  • Cc: charm AT cs.uiuc.edu
  • Subject: Re: [charm] Is AMPI support MPI_Waitall?
  • Date: Tue, 2 Mar 2010 14:54:13 -0600
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

Have you gotten the chance to try this again? Did Gengbin's suggestion
of using ++scalable-start work?

Phil

2010/2/12 Zhang Kai
<zhangk1985 AT gmail.com>:
>
> I didn't get any output in my failed runs.
>
> I am on vacation and I will try it after I come back to the office.
>
> Thanks for your reply. And happy Chinese New Year!
>
> Zhang Kai
>
>
> On Feb 12, 2010 at 11:00 PM, Gengbin Zheng
> <zhenggb AT gmail.com> wrote:
>>
>> I was asking because I could not reproduce your problem. I ran it on Ranger a
>> few times, and it worked for me. This is an example job output on the Ranger
>> cluster:
>>
>> ./charmrun +p128 ./t 32768 30 +vp256 +tcharm_stacksize 32768000
>> Warning> Randomization of stack pointer is turned on in kernel, thread
>> migration may not work! Run 'echo 0 > /proc/sys/kernel/randomize_va_space'
>> as root to disable it, or try run with '+isomalloc_sync'.
>> Charm++: scheduler running in netpoll mode.
>> Charm++> cpu topology info is being gathered.
>> Charm++> Running on 8 unique compute nodes (16-way SMP).
>> Computed 32768 particles in 3.081355 seconds
>> TACC: Cleaning up after job: 1245471
>> TACC: Done.
>>
>>
>> AMPI runs each MPI rank in a user-level thread; +tcharm_stacksize sets the
>> maximum size of the stack for each thread. Since this program declares large
>> arrays on the stack, it is important to specify a large enough stack size.
>> But I think the 32 MB we are using here should be more than enough.
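For context, a minimal sketch of the kind of program being described; the sizes and names are illustrative assumptions, not copied from the actual nbodypipe source. It shows why each AMPI thread's stack must be sized for the automatic arrays declared in main():

    #include <stdio.h>
    #include <mpi.h>

    /* Hypothetical size, chosen only to mirror the numbers discussed above. */
    #define MAX_PARTICLES 32768

    int main(int argc, char *argv[])
    {
        /* Automatic arrays live on the rank's stack. Under AMPI each rank is a
           user-level thread, so its stack must be large enough to hold them;
           hence +tcharm_stacksize 32768000 (about 32 MB) on the command line. */
        double particles[4 * MAX_PARTICLES];   /* ~1 MB */
        double sendbuf[4 * MAX_PARTICLES];     /* ~1 MB */
        double recvbuf[4 * MAX_PARTICLES];     /* ~1 MB */

        MPI_Init(&argc, &argv);
        particles[0] = sendbuf[0] = recvbuf[0] = 0.0;  /* keep the arrays "used" */
        printf("stack taken by these arrays alone: %zu bytes\n",
               sizeof particles + sizeof sendbuf + sizeof recvbuf);
        MPI_Finalize();
        return 0;
    }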
>>
>> In your runs, do you even see the first lines of Charm++ output?
>> If you don't see any output, it might be a startup problem. Try adding
>> ++scalable-start to your command line.
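For reference, that option would simply be added to the launcher part of the same command line; a hypothetical example based on the invocation quoted earlier in this thread:

    ./charmrun +p128 ++scalable-start ++nodelist charm.hosts ./pgm 32768 30 +vp256 +tcharm_stacksize 32768000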
>>
>> Gengbin
>>
>>
>> 2010/2/12 Zhang Kai
>> <zhangk1985 AT gmail.com>
>>>
>>> Of course the value of MAX_P is larger than 256.
>>>
>>> Actually, I set MAX_PARTICLES to 32768 and MAX_P to 512. I don't think
>>> there is any problem in the program because I can run it with MPICH2.
>>>
>>> The program also failed when I used the +p128 +vp128 options.
>>>
>>> By the way, I am not sure about the meaning of +tcharm_stacksize.
>>>
>>> Does that mean the communication buffer between VPs?
>>>
>>>
>>> On Feb 12, 2010 at 2:35 AM, Gengbin Zheng
>>> <zhenggb AT gmail.com> wrote:
>>>>
>>>> What are the values of MAX_PARTICLES and MAX_P in your program?
>>>>
>>>> I suppose your command line is something like:
>>>>
>>>> ./charmrun +p128 ./pgm 32768 30  +vp256 +tcharm_stacksize 32768000
>>>>
>>>> so you are actually running on 256 virtual AMPI processes.
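Put another way: with +p128 physical processors and +vp256 virtual processors, each physical processor hosts 256 / 128 = 2 AMPI ranks, each a user-level thread with its own stack of the size set by +tcharm_stacksize.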
>>>>
>>>> Gengbin
>>>>
>>>> 2010/2/9 Zhang Kai
>>>> <zhangk1985 AT gmail.com>
>>>>>
>>>>> What I modified is just the values of MAX_PARTICLES and MAX_P defined
>>>>> before main() so that it can run at a larger scale.
>>>>>
>>>>> Besides this, I didn't modify any line.
>>>>>
>>>>> Zhang Kai
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2010/2/10 Phil Miller
>>>>> <mille121 AT illinois.edu>
>>>>>>
>>>>>> I'll investigate this on a local cluster. Are you working with the
>>>>>> sample code you originally linked, or a modified version? If the
>>>>>> sample code works, but your modification doesn't, please send us the
>>>>>> modified version so we can isolate the bug.
>>>>>>
>>>>>> Phil
>>>>>>
>>>>>> 2010/2/9 Zhang Kai
>>>>>> <zhangk1985 AT gmail.com>:
>>>>>> > hi Phil:
>>>>>> >
>>>>>> > Sorry to disturb you again.
>>>>>> >
>>>>>> > I met a new problem when I ran this nbody program on a cluster which
>>>>>> > contains 16 nodes and 128 physical processors in total.
>>>>>> >
>>>>>> > I have successfully run it with MPICH2. And I can also build it with
>>>>>> > AMPI.
>>>>>> >
>>>>>> > However, when I ran it with this command:
>>>>>> > ./charmrun +p128 ./pgm 32768 30  +vp256 +tcharm_stacksize 32768000
>>>>>> > ++nodelist charm.hosts
>>>>>> >
>>>>>> > I found that it did not work.
>>>>>> >
>>>>>> > By using the ++verbose option, I found that all nodes I listed in the
>>>>>> > nodefile had been connected and all pgm processes had been created on
>>>>>> > all nodes. But then there was no response from them.
>>>>>> >
>>>>>> > Sometimes it runs successfully when I use +p64 instead of +p128, but
>>>>>> > sometimes it does not.
>>>>>> >
>>>>>> > I wonder if there is another bug or if I just misconfigured AMPI.
>>>>>> >
>>>>>> > Looking forward to your reply.
>>>>>> >
>>>>>> > Best wishes!
>>>>>> >
>>>>>> > Zhang Kai
>>>>>> > Department of Computer Science & Technology
>>>>>> > Tsinghua University
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > 2010/1/30 Phil Miller
>>>>>> > <mille121 AT illinois.edu>
>>>>>> >>
>>>>>> >> The attached patch is generated from and applies cleanly to
>>>>>> >> charm-6.1.3.
>>>>>> >>
>>>>>> >> If you're familiar with the 'git' version control system, our
>>>>>> >> repository can be cloned from git://charm.cs.uiuc.edu/charm.git
>>>>>> >>
>>>>>> >> If you don't mind my asking, what work are you doing with AMPI?
>>>>>> >> We'd
>>>>>> >> be very interested to hear about your experiences.
>>>>>> >>
>>>>>> >> Phil
>>>>>> >>
>>>>>> >> On Fri, Jan 29, 2010 at 08:03, Phil Miller
>>>>>> >> <mille121 AT illinois.edu>
>>>>>> >> wrote:
>>>>>> >> > 2010/1/29 Zhang Kai
>>>>>> >> > <zhangk1985 AT gmail.com>:
>>>>>> >> >> I am using charm-6.1.3.  Can that patch fix the problem on
>>>>>> >> >> charm-6.1.3?
>>>>>> >> >>
>>>>>> >> >> When I apply it to ampi.C using the command "patch
>>>>>> >> >> src/libs/ck-libs/ampi/ampi.C  AMPI_Cart_shift.patch", it reports:
>>>>>> >> >>
>>>>>> >> >> patching file src/libs/ck-libs/ampi/ampi.C
>>>>>> >> >> Hunk #1 succeeded at 5158 (offset -185 lines).
>>>>>> >> >> Hunk #2 FAILED at 5198.
>>>>>> >> >> 1 out of 2 hunks FAILED -- saving rejects to file
>>>>>> >> >> src/libs/ck-libs/ampi/ampi.C.rej
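As an aside on the invocation above: a patch produced from a git tree usually carries a/ and b/ path prefixes, so the common way to apply it (a sketch, assuming that format) is from the top of the charm directory with a strip level of one, checking afterwards for reject files:

    cd charm-6.1.3
    patch -p1 < AMPI_Cart_shift.patch    # or: git apply AMPI_Cart_shift.patch
    find . -name '*.rej'                 # leftover .rej files mean hunks still failed

That would not help here, though: the failed hunk reflects real divergence between the development source and the 6.1.3 release, as discussed below.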
>>>>>> >> >>
>>>>>> >> >> After I tried to read the patch file and fix it myself, I found
>>>>>> >> >> that there may be a lot of differences between the released
>>>>>> >> >> version and the development version of the charm source code.
>>>>>> >> >>
>>>>>> >> >> Could you help me build a patch for charm-6.1.3? Or tell me where
>>>>>> >> >> I can find a stable development version which I can patch myself?
>>>>>> >> >
>>>>>> >> > I'll send you the patch against 6.1.3 shortly, when I'm in the
>>>>>> >> > office.
>>>>>> >> >
>>>>>> >> > Phil
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >>
>>>>>> >> >> Zhang Kai
>>>>>> >> >>
>>>>>> >> >> 2010/1/29 Phil Miller
>>>>>> >> >> <mille121 AT illinois.edu>
>>>>>> >> >>>
>>>>>> >> >>> The bug has been fixed in the development version of charm. If
>>>>>> >> >>> you use
>>>>>> >> >>> a pre-built development binary, the fix will be in tonight's
>>>>>> >> >>> autobuild
>>>>>> >> >>> for whatever platform you use. If you're building it from
>>>>>> >> >>> development
>>>>>> >> >>> source yourself, the patch is attached. If you are using the
>>>>>> >> >>> released
>>>>>> >> >>> Charm 6.1.x, we can port that fix over for you if you're not
>>>>>> >> >>> comfortable doing so yourself.
>>>>>> >> >>>
>>>>>> >> >>> Phil
>>>>>> >> >>>
>>>>>> >> >>> 2010/1/28 Zhang Kai
>>>>>> >> >>> <zhangk1985 AT gmail.com>:
>>>>>> >> >>> >
>>>>>> >> >>> > I think you got the problem I suffered.
>>>>>> >> >>> >
>>>>>> >> >>> > Thanks for your reply.
>>>>>> >> >>> >
>>>>>> >> >>> > Best regards.
>>>>>> >> >>> >
>>>>>> >> >>> > Zhang Kai
>>>>>> >> >>> >
>>>>>> >> >>> > 2010/1/29 Phil Miller
>>>>>> >> >>> > <mille121 AT illinois.edu>
>>>>>> >> >>> >>
>>>>>> >> >>> >> On Thu, Jan 28, 2010 at 07:50, Zhang Kai
>>>>>> >> >>> >> <zhangk1985 AT gmail.com>
>>>>>> >> >>> >> wrote:
>>>>>> >> >>> >> > hi:
>>>>>> >> >>> >> >
>>>>>> >> >>> >> > I am a beginner with AMPI and am trying to run an MPI
>>>>>> >> >>> >> > program using it. But I found a little problem.
>>>>>> >> >>> >> >
>>>>>> >> >>> >> > Here
>>>>>> >> >>> >> > (http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples/advmsg/nbodypipe_c.htm)
>>>>>> >> >>> >> > you can find an example of an MPI program. I have successfully
>>>>>> >> >>> >> > built and run it using both MPICH and Intel MPI.
>>>>>> >> >>> >> >
>>>>>> >> >>> >> > However, when I ran it with AMPI, I found that the program
>>>>>> >> >>> >> > was blocked in the MPI_Waitall function and never returned.
>>>>>> >> >>> >> >
>>>>>> >> >>> >> > I just ran it with the ++local +p2 +vp2 options. Did I miss
>>>>>> >> >>> >> > other options, or misconfigure AMPI?
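For context, the communication in that example is a ring pipeline of nonblocking sends and receives completed by MPI_Waitall. A self-contained sketch of that pattern follows; the names and counts are illustrative assumptions, not copied from the linked nbodypipe_c.htm source:

    #include <stdio.h>
    #include <mpi.h>

    #define MAX_PARTICLES 4096          /* illustrative size */

    static double sendbuf[4 * MAX_PARTICLES];
    static double recvbuf[4 * MAX_PARTICLES];

    int main(int argc, char *argv[])
    {
        int rank, size, pipe;
        MPI_Request request[2];
        MPI_Status  status[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int left  = (rank + size - 1) % size;
        int right = (rank + 1) % size;

        /* Each pipeline stage posts a nonblocking ring exchange, overlaps it
           with local force computation, then completes both requests. */
        for (pipe = 0; pipe < size - 1; pipe++) {
            MPI_Irecv(recvbuf, 4 * MAX_PARTICLES, MPI_DOUBLE, left,  0,
                      MPI_COMM_WORLD, &request[0]);
            MPI_Isend(sendbuf, 4 * MAX_PARTICLES, MPI_DOUBLE, right, 0,
                      MPI_COMM_WORLD, &request[1]);
            /* ... compute against the locally held particles here ... */
            MPI_Waitall(2, request, status);   /* the call reported to hang */
        }

        if (rank == 0) printf("pipeline finished\n");
        MPI_Finalize();
        return 0;
    }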
>>>>>> >> >>> >>
>>>>>> >> >>> >> I'm seeing the same effect as you describe on a
>>>>>> >> >>> >> net-linux-x86_64
>>>>>> >> >>> >> build
>>>>>> >> >>> >> of AMPI from the latest charm sources. We'll look into this
>>>>>> >> >>> >> and get
>>>>>> >> >>> >> back to you.
>>>>>> >> >>> >>
>>>>>> >> >>> >> For reference, the attached code (with added prints)
>>>>>> >> >>> >> produces the
>>>>>> >> >>> >> following:
>>>>>> >> >>> >>
>>>>>> >> >>> >> $ ./charmrun nbp +vp 4 20 +p4
>>>>>> >> >>> >> Charm++: scheduler running in netpoll mode.
>>>>>> >> >>> >> Charm++> cpu topology info is being gathered.
>>>>>> >> >>> >> Charm++> Running on 1 unique compute nodes (8-way SMP).
>>>>>> >> >>> >> Iteration 9
>>>>>> >> >>> >> Iteration 9:0 a
>>>>>> >> >>> >> Iteration 9
>>>>>> >> >>> >> Iteration 9:0 a
>>>>>> >> >>> >> Iteration 9
>>>>>> >> >>> >> Iteration 9:0 a
>>>>>> >> >>> >> Iteration 9
>>>>>> >> >>> >> Iteration 9:0 a
>>>>>> >> >>> >> Iteration 9:0 b
>>>>>> >> >>> >> Iteration 9:0 b
>>>>>> >> >>> >> Iteration 9:0 b
>>>>>> >> >>> >> Iteration 9:0 b
>>>>>> >> >>> >
>>>>>> >> >>> >
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >
>>>>>> >
>>>>>> >
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> charm mailing list
>>>>> charm AT cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/charm
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>




