
charm - Re: [charm] how to suspend/resume a chare ?

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system

  • From: Christian Perez <christian.perez AT inria.fr>
  • To: gzheng AT illinois.edu
  • Cc: Filippo Gioachin <gioachin AT uiuc.edu>, "kale AT illinois.edu" <kale AT illinois.edu>, "charm AT cs.uiuc.edu" <charm AT cs.uiuc.edu>, "Kale, Laxmikant V" <kale AT cs.uiuc.edu>, Gengbin Zheng <zhenggb AT gmail.com>
  • Subject: Re: [charm] how to suspend/resume a chare ?
  • Date: Fri, 28 May 2010 11:03:39 +0200
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>
  • Organization: INRIA/LIP

Here is a simplified test program that fails when line 31 is used, while the version using line 30 instead works (with one process).
After compiling, the first run sometimes works, so it may be a race condition.
Running it with 2 processes seems to be OK on my machine.

Christian

On 05/28/2010 04:03 AM, Gengbin Zheng wrote:
I wrote a simple test program passing either thishandle, or CProxy, it
seems to work both ways.
I think it would be best if you could write a simple standalone program
that reproduces your problem and send it to us for investigation.

Gengbin

On Thu, May 27, 2010 at 8:51 AM, Christian Perez
<christian.perez AT inria.fr>
wrote:
But why does it work if I use the parameter 'psi'?

If I remove the 'sync', there is a synchronization issue :(

Christian

On 05/26/2010 09:54 PM, Gengbin Zheng wrote:
Remove [sync] from your .ci file. Calling a [sync] method blocks the caller,
which in your case is not a threaded entry method.

Gengbin

On Wed, May 26, 2010 at 10:29 AM, Christian Perez
<christian.perez AT inria.fr>
wrote:

On 05/26/2010 05:00 PM, Gengbin Zheng wrote:

On Wed, May 26, 2010 at 6:37 AM, Christian Perez
<christian.perez AT inria.fr>
wrote:


On 05/25/2010 07:22 PM, Gengbin Zheng wrote:


Is the caller of connect() a thread (I mean is connect() a threaded
entry function)?



It is not a threaded Charm++ method: when I tried to turn it into a
[threaded] entry method, I got an error when invoking a JNI method from
this chare. I shall investigate it later.


Another way of doing this without using threads is to send component_B's
proxy to component_A, and let component_A send its own proxy directly to
component_B.



That is the alternative solution: the good news is that it works(*);
the not-so-good news is that it makes the mapping between components
and Charm++ objects a little more complex.

(*) I need to use a hack that seems weird to me:

Chare component_A : SimpleInterface {
  void set_s(CProxy_SetSimpleInterface& pssi, CProxy_SimpleInterface& psi) {
    pssi.set_si(psi);
  }

works, while the following version (with psi changed to thishandle)
does not work:


Chare component_A : SimpleInterface {
  void set_s(CProxy_SetSimpleInterface& pssi, CProxy_SimpleInterface& psi) {
    pssi.set_si(thishandle);
  }

psi is obtained from a call to CProxy_component_A::ckNew.

Question: how can I create a Proxy from within a chare?

I tried

pssi.set_si(CProxy_SimpleInterface(thishandle));

but it fails with the same error:

[0] Assertion "n<len" failed in file cklists.h line 221.
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason:
[0] Stack Traceback:
[0:0] CmiAbort+0x89 [0x506ff2]
[0:1] __cmi_assert+0x4b [0x514a0c]
[0:2] _ZN5CkVecIPvEixEm+0x36 [0x49c7a8]
[0:3] /opt/usr/stow/ULCMi/libexec/Charm-launcher [0x4966f9]
[0:4] _Z15_processHandlerPvP11CkCoreState+0x30d [0x49712e]
[0:5] CmiHandleMessage+0x84 [0x50f5f1]
[0:6] CsdSchedulePoll+0x96 [0x50fb5f]
[0:7] CsdScheduler+0x23 [0x50f8bf]
[0:8] CthStandinCode+0xe [0x50fc5a]
[0:9] CthStartThread+0x59 [0x4830b7]
[0:10] /lib/libc.so.6 [0x7f98d9aaf370]
Fatal error on PE 0>

Thanks for your help!



It is hard to spot errors by eye in code fragments.
What is in set_si?


I can reproduce this bug with the following code, using only one process.

ci file:
chare SingleProvider : SimpleInterface {
  ...
  entry [sync] void provider_set_s(CProxy_SetSimpleInterface& pssi, CProxy_SimpleInterface& psi, int n, char name[n]);
  ...
}

C file:

class SingleProvider : virtual public CBase_SingleProvider {
  ...

  void provider_set_s(CProxy_SetSimpleInterface& pssi, CProxy_SimpleInterface& psi, int n, char* name) {
    pssi.ulcmi_set_si(CProxy_SimpleInterface(thishandle), n, name);
  }


It also fails if I use thishandle instead of
CProxy_SimpleInterface(thishandle).
It works fine if I use psi (which has been created with
CProxy_SingleProvider::ckNew).

Can you build your program with -g and, when it crashes again, get a
stack trace, like:

Here is the output within gdb:

[0] Assertion "n<len" failed in file cklists.h line 221.
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason:
[0] Stack Traceback:
[0:0] CmiAbort+0x89 [0x506f02]
[0:1] __cmi_assert+0x4b [0x51491c]
[0:2] _ZN5CkVecIPvEixEm+0x36 [0x49c6b8]
[0:3] /opt/usr/stow/ULCMi/libexec/Charm-launcher [0x496609]
[0:4] _Z15_processHandlerPvP11CkCoreState+0x30d [0x49703e]
[0:5] CmiHandleMessage+0x84 [0x50f501]
[0:6] CsdSchedulePoll+0x96 [0x50fa6f]
[0:7] CsdScheduler+0x23 [0x50f7cf]
[0:8] CthStandinCode+0xe [0x50fb6a]
[0:9] CthStartThread+0x59 [0x482fc7]
[0:10] /lib/libc.so.6 [0x7ffff5729370]
CHARM++ FATAL ERROR:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000506f31 in CmiAbort (message=0x53d785 "") at machine.c:612
612 *(int *)NULL = 0; /*Write to null, causing bus error*/
(gdb) bt
#0 0x0000000000506f31 in CmiAbort (message=0x53d785 "") at machine.c:612
#1 0x000000000051491c in __cmi_assert (expr=0x534e87 "n<len",
file=0x534e7d "cklists.h", line=221) at convcore.c:3399
#2 0x000000000049c6b8 in CkVec<void*>::operator[] (this=0x7ffff0067aa0,
n=140737226103192) at cklists.h:221
#3 0x0000000000496609 in _processForPlainChareMsg (ck=0x7ffff006c640,
env=0x7ffff007e3e8) at ck.C:844
#4 0x000000000049703e in _processHandler (converseMsg=0x7ffff007e3e8,
ck=0x7ffff006c640) at ck.C:1117
#5 0x000000000050f501 in CmiHandleMessage (msg=0x7ffff007e3e8)
at convcore.c:1393
#6 0x000000000050fa6f in CsdSchedulePoll () at convcore.c:1610
#7 0x000000000050f7cf in CsdScheduler (maxmsgs=0) at convcore.c:1491
#8 0x000000000050fb6a in CthStandinCode () at convcore.c:1674
#9 0x0000000000482fc7 in CthStartThread (fn1=0, fn2=5307228, arg1=0,
arg2=0)
at threads.c:1579
#10 0x00007ffff5729370 in ?? () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()



addr2line -e ./your_binary 0x4966f9



Can you send me the output of that for lines:

[0:2] _ZN5CkVecIPvEixEm+0x36 [0x49c7a8]
[0:3] /opt/usr/stow/ULCMi/libexec/Charm-launcher [0x4966f9]


addr2line -e /opt/usr/stow/ULCMi/libexec/Charm-launcher 0x49c6b8
/home/cperez/Research/charm-6.2.0/net-linux-x86_64-smp/tmp/cklists.h:222

addr2line -e /opt/usr/stow/ULCMi/libexec/Charm-launcher 0x496609
/home/cperez/Research/charm-6.2.0/net-linux-x86_64-smp/tmp/ck.C:844


BTW, have you ever run megatest (in charm/tests/charm++/megatest)
successfully? I just want to make sure that Charm++ threaded entry
methods indeed work on your system.



I ran it with up to "++p 16" and it works fine.
My system is a linux box (Intel(R) Core(TM)2 Duo CPU) running
debian/unstable (gcc (Debian 4.4.4-2) 4.4.4).

Thank you

Christian




Attachment: bug.tz2
Description: Binary data



