
charm - Re: [charm] Issues trying to run mpi-coexist example

  • From: Phil Miller <mille121 AT illinois.edu>
  • To: Steve Petruzza <spetruzza AT sci.utah.edu>
  • Cc: Sam White <white67 AT illinois.edu>, Jozsef Bakosi <jbakosi AT gmail.com>, charm <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] Issues trying to run mpi-coexist example
  • Date: Wed, 22 Jun 2016 09:36:13 -0500

Actually, looking closer, that's already there. So that won't be it.

You could edit that flag in src/arch/mpi-darwin-x86_64/conv-mach.sh to
set a higher version (e.g. 10.11) and see if it's happier.
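
For example, a one-liner for that edit might be (a sketch, assuming the
flag appears literally as -mmacosx-version-min=10.7 in that file; check
the file afterwards to confirm it changed):

sed -i '' 's/-mmacosx-version-min=10.7/-mmacosx-version-min=10.11/' \
    src/arch/mpi-darwin-x86_64/conv-mach.sh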

Also, when debugging the build, please drop the -j16 parallel build
argument, so that we can get a non-mangled error output.
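
That is, rerunning something like:
./build charm++ mpi-darwin-x86_64 --with-production --enable-tracing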

On Wed, Jun 22, 2016 at 9:34 AM, Phil Miller
<mille121 AT illinois.edu>
wrote:
> Please try adding " -mmacosx-version-min=10.7 " to the end of your
> command line. If that works, then it would confirm that we need to
> update the mpi-darwin build to match the other Mac OS builds.
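>
> With the build line you showed, that would be roughly (a sketch, just
> appending the flag to your existing command):
> ./build charm++ mpi-darwin-x86_64 -j16 --with-production --enable-tracing -mmacosx-version-min=10.7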
>
> On Wed, Jun 22, 2016 at 9:28 AM, Steve Petruzza
> <spetruzza AT sci.utah.edu>
> wrote:
>> Thank you Sam and Jozsef,
>>
>> On the XC40 I am already using gni-crayxc-smp, so yes that issue is
>> definitely related.
>>
>> On Mac if I try with:
>> ./build charm++ mpi-darwin-x86_64 -j16 --with-production --enable-tracing
>>
>> I get errors like the following (repeated and interleaved across the
>> parallel build output):
>> clang: error: invalid deployment target for -stdlib=libc++ (requires OS X 10.7 or later)
>>
>> But I am on OS X 10.11. Should I set a different target?
>>
>> Thank you,
>> Steve
>>
>>
>>
>>
>>
>> On 22 Jun 2016, at 17:14, Sam White
>> <white67 AT illinois.edu>
>> wrote:
>>
>> MPI interoperation is currently only supported on MPI, PAMILRTS, and GNI
>> builds of Charm++. We will look into those hangs, but at least the second
>> one appears to be a known issue with running interop in SMP mode that we
>> are
>> working on (https://charm.cs.illinois.edu/redmine/issues/903).
>>
>> For your Mac, I would recommend building mpi-darwin-x86_64, and on the XC40
>> I would recommend gni-crayxc. The PAMI_CLIENTS environment variable is only
>> applicable to PAMILRTS builds; you shouldn't need any extra flags for GNI.
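>>
>> Concretely, that would be something like (build lines sketched from the
>> syntax used elsewhere in this thread; add --with-production and other
>> options as you prefer):
>> ./build charm++ mpi-darwin-x86_64   # on the Mac
>> ./build charm++ gni-crayxc          # on the XC40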
>>
>> -Sam
>>
>> On Wed, Jun 22, 2016 at 9:12 AM, Jozsef Bakosi
>> <jbakosi AT gmail.com>
>> wrote:
>>>
>>> Hi Steve,
>>>
>>> Charm++ developers please correct me if I'm wrong...
>>>
>>> To interoperate with MPI codes or libraries you need to build Charm++ on
>>> top of the MPI backend. On Mac I do that with:
>>>
>>> $ build charm++ mpi-darwin-x86_64
>>>
>>> On linux, I do
>>>
>>> $ build charm++ mpi-linux-x86_64 mpicxx
>>>
>>> Jozsef
>>>
>>> On Wed, Jun 22, 2016 at 5:13 AM, Steve Petruzza
>>> <spetruzza AT sci.utah.edu>
>>> wrote:
>>>>
>>>> Hi,
>>>> I am trying to use MPI interoperability with Charm++, and I am
>>>> starting with the mpi-coexist example.
>>>>
>>>> I tried to build on my Mac (openmpi + multicore-darwin-x86_64-clang or
>>>> netlrts-darwin-x86_64-smp-clang) but I cannot use:
>>>> CharmLibInit(MPI_Comm userComm, int argc, char **argv);
>>>> because CMK_CONVERSE_MPI is set to 0 in mpi-interoperate.h
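>>>>
>>>> For reference, what I am trying to do looks roughly like this (a sketch
>>>> following the mpi-coexist example; I assume CharmLibExit comes from the
>>>> same header):
>>>>
>>>> #include <mpi.h>
>>>> #include "mpi-interoperate.h"
>>>>
>>>> int main(int argc, char **argv) {
>>>>   int rank;
>>>>   MPI_Init(&argc, &argv);
>>>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>   MPI_Comm newComm;                    /* communicator handed over to Charm++ */
>>>>   MPI_Comm_split(MPI_COMM_WORLD, 0, rank, &newComm);
>>>>   CharmLibInit(newComm, argc, argv);   /* the overload that needs CMK_CONVERSE_MPI */
>>>>   /* ... run the Charm++ library work, as multirun does ... */
>>>>   CharmLibExit();
>>>>   MPI_Finalize();
>>>>   return 0;
>>>> }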
>>>>
>>>> So I just tried to use the other CharmLibInit, passing 0 as userComm,
>>>> but the call just crashes:
>>>> mpirun -np 4 ./multirun
>>>> ——————————————
>>>> [Steve:72383] *** Process received signal ***
>>>> [Steve:72383] Signal: Segmentation fault: 11 (11)
>>>> [Steve:72383] Signal code: Address not mapped (1)
>>>> [Steve:72383] Failing at address: 0x0
>>>> [Steve:72383] [ 0] 0 libsystem_platform.dylib
>>>> 0x00007fff8b9b652a _sigtramp + 26
>>>> [Steve:72383] [ 1] 0 ???
>>>> 0x0000000000000000 0x0 + 0
>>>> [Steve:72383] [ 2] [Steve:72384] *** Process received signal ***
>>>> [Steve:72384] Signal: Segmentation fault: 11 (11)
>>>> [Steve:72384] Signal code: Address not mapped (1)
>>>> [Steve:72384] Failing at address: 0x0
>>>> [Steve:72384] [ 0] 0 libsystem_platform.dylib
>>>> 0x00007fff8b9b652a _sigtramp + 26
>>>> [Steve:72384] [ 1] 0 ???
>>>> 0x0000000000000000 0x0 + 0
>>>> [Steve:72384] [ 2] [Steve:72385] *** Process received signal ***
>>>> [Steve:72385] Signal: Segmentation fault: 11 (11)
>>>> [Steve:72385] Signal code: Address not mapped (1)
>>>> [Steve:72385] Failing at address: 0x0
>>>> [Steve:72385] [ 0] 0 libsystem_platform.dylib
>>>> 0x00007fff8b9b652a _sigtramp + 26
>>>> [Steve:72385] [ 1] 0 ???
>>>> 0x0000000000000000 0x0 + 0
>>>> [Steve:72385] [ 2] [Steve:72382] *** Process received signal ***
>>>> [Steve:72382] Signal: Segmentation fault: 11 (11)
>>>> [Steve:72382] Signal code: Address not mapped (1)
>>>> [Steve:72382] Failing at address: 0x0
>>>> [Steve:72382] [ 0] 0 libsystem_platform.dylib
>>>> 0x00007fff8b9b652a _sigtramp + 26
>>>> [Steve:72382] [ 1] 0 ???
>>>> 0x0000000000000000 0x0 + 0
>>>> [Steve:72382] [ 2] 0 multirun
>>>> 0x0000000100118106 CmiAbortHelper + 38
>>>> [Steve:72382] [ 3] 0 multirun
>>>> 0x00000001001147a0 CmiSyncBroadcastAllFn + 0
>>>> [Steve:72382] [ 4] 0 multirun
>>>> 0x0000000100118106 CmiAbortHelper + 38
>>>> [Steve:72385] [ 3] 0 multirun
>>>> 0x00000001001147a0 CmiSyncBroadcastAllFn + 0
>>>> [Steve:72385] [ 4] 0 multirun
>>>> 0x0000000100111fd4 CharmLibInit + 36
>>>> [Steve:72385] [ 5] 0 multirun
>>>> 0x0000000100111fd4 CharmLibInit + 36
>>>> [Steve:72382] [ 5] 0 multirun
>>>> 0x0000000100118106 CmiAbortHelper + 38
>>>> [Steve:72383] [ 3] 0 multirun
>>>> 0x00000001001147a0 CmiSyncBroadcastAllFn + 0
>>>> [Steve:72383] [ 4] 0 multirun
>>>> 0x0000000100111fd4 CharmLibInit + 36
>>>> [Steve:72383] [ 5] 0 multirun
>>>> 0x0000000100001633 main + 147
>>>> [Steve:72383] [ 6] 0 multirun
>>>> 0x0000000100001574 start + 52
>>>> [Steve:72383] *** End of error message ***
>>>> 0 multirun 0x0000000100118106 CmiAbortHelper
>>>> + 38
>>>> [Steve:72384] [ 3] 0 multirun
>>>> 0x00000001001147a0 CmiSyncBroadcastAllFn + 0
>>>> [Steve:72384] [ 4] 0 multirun
>>>> 0x0000000100111fd4 CharmLibInit + 36
>>>> [Steve:72384] [ 5] 0 multirun
>>>> 0x0000000100001633 main + 147
>>>> [Steve:72384] [ 6] 0 multirun
>>>> 0x0000000100001633 main + 147
>>>> [Steve:72385] [ 6] 0 multirun
>>>> 0x0000000100001574 start + 52
>>>> [Steve:72385] *** End of error message ***
>>>> 0 multirun 0x0000000100001633 main + 147
>>>> [Steve:72382] [ 6] 0 multirun
>>>> 0x0000000100001574 start + 52
>>>> [Steve:72382] *** End of error message ***
>>>> 0 multirun 0x0000000100001574 start + 52
>>>> [Steve:72384] *** End of error message ***
>>>> ——————————————
>>>>
>>>> I guess this has to do with the build. How can I build charm++ and this
>>>> example on Mac in order to use the correct CharmLibInit with MPI_Comm?
>>>>
>>>> Anyway, I tried the same on a Cray XC40 node (where the build correctly
>>>> provides CharmLibInit with MPI_Comm), but:
>>>> If I run:
>>>> srun -N 1 -n 16 --hint=nomultithread --ntasks-per-socket=16 ./multirun
>>>> ——————————————
>>>> Charm++> Running on Gemini (GNI) with 16 processes
>>>> Charm++> static SMSG
>>>> Charm++> SMSG memory: 79.0KB
>>>> Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0
>>>> means no limit)
>>>> Charm++> memory pool registered memory limit: 200000MB, send limit:
>>>> 100000MB
>>>> Charm++> only comm thread send/recv messages
>>>> Charm++> Cray TLB page size: 2048K
>>>> Charm++> Running in SMP mode: numNodes 16, 1 worker threads per process
>>>> Charm++> The comm. thread both sends and receives messages
>>>> Converse/Charm++ Commit ID: v6.7.0-202-g95e5ac0
>>>> Warning> using Isomalloc in SMP mode, you may need to run with
>>>> '+isomalloc_sync'.
>>>> CharmLB> Load balancer assumes all CPUs are same.
>>>> Charm++> Running on 1 unique compute nodes (64-way SMP).
>>>> ——————————————
>>>>
>>>> Here it hangs forever.
>>>>
>>>> Then if I run:
>>>> srun -N 1 -n 16 --hint=nomultithread --ntasks-per-socket=16
>>>> ./multirun_time
>>>> ——————————————
>>>> Charm++> Running on Gemini (GNI) with 16 processes
>>>> Charm++> static SMSG
>>>> Charm++> SMSG memory: 79.0KB
>>>> Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0
>>>> means no limit)
>>>> Charm++> memory pool registered memory limit: 200000MB, send limit:
>>>> 100000MB
>>>> Charm++> only comm thread send/recv messages
>>>> Charm++> Cray TLB page size: 2048K
>>>> Charm++> Running in SMP mode: numNodes 16, 1 worker threads per process
>>>> Charm++> The comm. thread both sends and receives messages
>>>> Converse/Charm++ Commit ID: v6.7.0-202-g95e5ac0
>>>> Warning> using Isomalloc in SMP mode, you may need to run with
>>>> '+isomalloc_sync'.
>>>> CharmLB> Load balancer assumes all CPUs are same.
>>>> Charm++> Running on 1 unique compute nodes (64-way SMP).
>>>> Running Hi on 16 processors for 10 elements
>>>> Hi[1] from element 0
>>>> Hi[2] from element 1
>>>> Hi[3] from element 2
>>>> Hi[4] from element 3
>>>> Hi[5] from element 4
>>>> Hi[6] from element 5
>>>> Hi[7] from element 6
>>>> Hi[8] from element 7
>>>> Hi[9] from element 8
>>>> Hi[10] from element 9
>>>> ——————————————
>>>>
>>>> Also here it hangs forever.
>>>>
>>>> Is there any parameter or flag I should add? (I already tried -envs
>>>> PAMI_CLIENTS=MPI,Converse without success.)
>>>>
>>>> Thank you,
>>>> Steve
>>>
>>>
>>
>>


