Re: [charm] Issues trying to run mpi-coexist example


  • From: Phil Miller <mille121 AT illinois.edu>
  • To: Steve Petruzza <spetruzza AT sci.utah.edu>
  • Cc: Sam White <white67 AT illinois.edu>, Jozsef Bakosi <jbakosi AT gmail.com>, charm <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] Issues trying to run mpi-coexist example
  • Date: Wed, 22 Jun 2016 09:34:30 -0500

Please try adding " -mmacosx-version-min=10.7 " to the end of your
command line. If that works, then it would confirm that we need to
update the mpi-darwin build to match the other Mac OS builds.
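
That is, something along these lines (reusing the build line from your
message below):

./build charm++ mpi-darwin-x86_64 -j16 --with-production --enable-tracing -mmacosx-version-min=10.7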

On Wed, Jun 22, 2016 at 9:28 AM, Steve Petruzza
<spetruzza AT sci.utah.edu>
wrote:
> Thank you Sam and Jozsef,
>
> On the XC40 I am already using gni-crayxc-smp, so yes that issue is
> definitely related.
>
> On Mac if I try with:
> ./build charm++ mpi-darwin-x86_64 -j16 --with-production --enable-tracing
>
> I get errors like:
> clang: error: invalid deployment target for -stdlib=libc++ (requires OS X 10.7 or later)
> (the same error is repeated for each parallel compile job, interleaved)
>
> But I am on OS X 10.11. Should I set a different target?
>
> Thank you,
> Steve
>
>
>
>
>
> On 22 Jun 2016, at 17:14, Sam White
> <white67 AT illinois.edu>
> wrote:
>
> MPI interoperation is currently only supported on MPI, PAMILRTS, and GNI
> builds of Charm++. We will look into those hangs, but at least the second
> one appears to be a known issue with running interop in SMP mode that we are
> working on (https://charm.cs.illinois.edu/redmine/issues/903).
>
> For your Mac, I would recommend building mpi-darwin-x86_64, and on the XC40 I
> would recommend gni-crayxc. The PAMI_CLIENTS environment variable is only
> applicable to PAMILRTS builds; you shouldn't need any extra flags for GNI.
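>
> For example (just a sketch, adjust the options to your setup;
> --with-production is optional):
>
> ./build charm++ mpi-darwin-x86_64 --with-production
> ./build charm++ gni-crayxc --with-production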
>
> -Sam
>
> On Wed, Jun 22, 2016 at 9:12 AM, Jozsef Bakosi
> <jbakosi AT gmail.com>
> wrote:
>>
>> Hi Steve,
>>
>> Charm++ developers please correct me if I'm wrong...
>>
>> To interoperate with MPI codes or libraries you need to build Charm++ on
>> top of the MPI backend. On Mac I do that with:
>>
>> $ build charm++ mpi-darwin-x86_64
>>
>> On linux, I do
>>
>> $ build charm++ mpi-linux-x86_64 mpicxx
>>
>> Jozsef
>>
>> On Wed, Jun 22, 2016 at 5:13 AM, Steve Petruzza
>> <spetruzza AT sci.utah.edu>
>> wrote:
>>>
>>> Hi,
>>> I am trying to use the MPI interoperability features of Charm++, and I am
>>> starting with the mpi-coexist example.
>>>
>>> I tried to build on my Mac (openmpi + multicore-darwin-x86_64-clang or
>>> netlrts-darwin-x86_64-smp-clang), but I cannot use:
>>> CharmLibInit(MPI_Comm userComm, int argc, char **argv);
>>> because CMK_CONVERSE_MPI is set to 0 in mpi-interoperate.h
>>>
>>> So I just tried to use the other CharmLibInit, passing 0 as userComm,
>>> but the call just crashes:
>>> mpirun -np 4 ./multirun
>>> ——————————————
>>> [Steve:72383] *** Process received signal ***
>>> [Steve:72383] Signal: Segmentation fault: 11 (11)
>>> [Steve:72383] Signal code: Address not mapped (1)
>>> [Steve:72383] Failing at address: 0x0
>>> [Steve:72383] [ 0] 0 libsystem_platform.dylib 0x00007fff8b9b652a _sigtramp + 26
>>> [Steve:72383] [ 1] 0 ???                      0x0000000000000000 0x0 + 0
>>> [Steve:72383] [ 2] 0 multirun                 0x0000000100118106 CmiAbortHelper + 38
>>> [Steve:72383] [ 3] 0 multirun                 0x00000001001147a0 CmiSyncBroadcastAllFn + 0
>>> [Steve:72383] [ 4] 0 multirun                 0x0000000100111fd4 CharmLibInit + 36
>>> [Steve:72383] [ 5] 0 multirun                 0x0000000100001633 main + 147
>>> [Steve:72383] [ 6] 0 multirun                 0x0000000100001574 start + 52
>>> [Steve:72383] *** End of error message ***
>>> (ranks 72382, 72384, and 72385 print the same backtrace, interleaved with the above)
>>> ——————————————
>>>
>>> I guess this has to do with the build. How can I build charm++ and this
>>> example on Mac in order to use the correct CharmLibInit with MPI_Comm?
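>>>
>>> For reference, the calling pattern I am trying to follow is roughly this
>>> (a minimal sketch based on my reading of the interop API, not the exact
>>> example code; the communicator handling and the library call are
>>> placeholders, and I am assuming a matching CharmLibExit call):
>>>
>>> #include <mpi.h>
>>> #include "mpi-interoperate.h"  /* declares CharmLibInit (and CharmLibExit) */
>>>
>>> int main(int argc, char **argv) {
>>>   MPI_Init(&argc, &argv);
>>>
>>>   /* Give Charm++ its own communicator to run on. */
>>>   MPI_Comm charmComm;
>>>   MPI_Comm_dup(MPI_COMM_WORLD, &charmComm);
>>>
>>>   /* Only usable when the Charm++ build defines CMK_CONVERSE_MPI,
>>>      i.e. an MPI (or GNI/PAMILRTS) machine layer. */
>>>   CharmLibInit(charmComm, argc, argv);
>>>
>>>   /* ... invoke the example's Charm++ library entry points here ... */
>>>
>>>   CharmLibExit();
>>>   MPI_Finalize();
>>>   return 0;
>>> }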
>>>
>>> Anyway, I tried the same thing on a Cray XC40 node (there the build is
>>> correct and I can use the CharmLibInit that takes an MPI_Comm), but:
>>> If I run:
>>> srun -N 1 -n 16 --hint=nomultithread --ntasks-per-socket=16 ./multirun
>>> ——————————————
>>> Charm++> Running on Gemini (GNI) with 16 processes
>>> Charm++> static SMSG
>>> Charm++> SMSG memory: 79.0KB
>>> Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0
>>> means no limit)
>>> Charm++> memory pool registered memory limit: 200000MB, send limit:
>>> 100000MB
>>> Charm++> only comm thread send/recv messages
>>> Charm++> Cray TLB page size: 2048K
>>> Charm++> Running in SMP mode: numNodes 16, 1 worker threads per process
>>> Charm++> The comm. thread both sends and receives messages
>>> Converse/Charm++ Commit ID: v6.7.0-202-g95e5ac0
>>> Warning> using Isomalloc in SMP mode, you may need to run with
>>> '+isomalloc_sync'.
>>> CharmLB> Load balancer assumes all CPUs are same.
>>> Charm++> Running on 1 unique compute nodes (64-way SMP).
>>> ——————————————
>>>
>>> Here it hangs forever.
>>>
>>> Then if I run:
>>> srun -N 1 -n 16 --hint=nomultithread --ntasks-per-socket=16
>>> ./multirun_time
>>> ——————————————
>>> Charm++> Running on Gemini (GNI) with 16 processes
>>> Charm++> static SMSG
>>> Charm++> SMSG memory: 79.0KB
>>> Charm++> memory pool init block size: 8MB, total memory pool limit 0MB (0
>>> means no limit)
>>> Charm++> memory pool registered memory limit: 200000MB, send limit:
>>> 100000MB
>>> Charm++> only comm thread send/recv messages
>>> Charm++> Cray TLB page size: 2048K
>>> Charm++> Running in SMP mode: numNodes 16, 1 worker threads per process
>>> Charm++> The comm. thread both sends and receives messages
>>> Converse/Charm++ Commit ID: v6.7.0-202-g95e5ac0
>>> Warning> using Isomalloc in SMP mode, you may need to run with
>>> '+isomalloc_sync'.
>>> CharmLB> Load balancer assumes all CPUs are same.
>>> Charm++> Running on 1 unique compute nodes (64-way SMP).
>>> Running Hi on 16 processors for 10 elements
>>> Hi[1] from element 0
>>> Hi[2] from element 1
>>> Hi[3] from element 2
>>> Hi[4] from element 3
>>> Hi[5] from element 4
>>> Hi[6] from element 5
>>> Hi[7] from element 6
>>> Hi[8] from element 7
>>> Hi[9] from element 8
>>> Hi[10] from element 9
>>> ——————————————
>>>
>>> Here, too, it hangs forever.
>>>
>>> Is there any parameter or flag I should add? (I already tried -envs
>>> PAMI_CLIENTS=MPI,Converse without success)
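>>>
>>> Following the Isomalloc warning in the output above, one variant I could
>>> still try is appending the runtime flag after the binary, e.g.:
>>>
>>> srun -N 1 -n 16 --hint=nomultithread --ntasks-per-socket=16 ./multirun +isomalloc_sync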
>>>
>>> Thank you,
>>> Steve
>>
>>
>
>


