Charm++ parallel programming system

Re: [charm] PAMI_Context_advance throws an error after PAMI_Rput call

  • From: Sameer Kumar <sameermanepalli AT>
  • To: Nitin Bhat <nitin AT>
  • Cc: "charm AT" <charm AT>
  • Subject: Re: [charm] PAMI_Context_advance throws an error after PAMI_Rput call
  • Date: Thu, 14 Sep 2017 05:15:23 -0400

This looks like a user error that triggers a crash in the memcpy over shmem. Can you verify that the buffer addresses are correct? Also verify the calls to PAMI_Memregion_create.
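
One way to act on the address check suggested above is a small bounds test run just before each transfer is posted. The following is only a minimal sketch, not part of PAMI or Charm++: `region_info_t`, `transfer_in_bounds`, and `check_transfer` are hypothetical names, and the sketch assumes the application records the base address and byte count it handed to PAMI_Memregion_create for each buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bookkeeping record kept by the application; not a PAMI type. */
typedef struct {
    const void *base;  /* address that was registered */
    size_t      bytes; /* number of bytes that were registered */
} region_info_t;

/* Returns 1 if [offset, offset + len) lies inside the region, else 0.
 * Written so that offset + len cannot overflow. */
static int transfer_in_bounds(const region_info_t *r, size_t offset, size_t len)
{
    if (r == NULL || r->base == NULL)
        return 0;
    if (offset > r->bytes)
        return 0;
    return len <= r->bytes - offset;
}

/* Debug helper: log the range, then abort if it falls outside the region. */
static void check_transfer(const char *label, const region_info_t *r,
                           size_t offset, size_t len)
{
    fprintf(stderr, "%s: base=%p bytes=%zu offset=%zu len=%zu\n",
            label, r ? (void *)r->base : NULL, r ? r->bytes : (size_t)0,
            offset, len);
    assert(transfer_in_bounds(r, offset, len));
}
```

Calling `check_transfer` for both the local and the remote side before posting the put would show whether an offset or length runs past a registered region. It cannot detect a buffer that was freed or went out of scope before the transfer completed.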

Sent from my iPhone

On 13-Sep-2017, at 5:25 pm, Nitin Bhat <nitin AT> wrote:



I am getting an error while working with RDMA calls in the PAMI communication library on BG/Q Vesta, and I need help debugging it.


I get the error when I build charm with "./build charm++ pamilrts-bluegeneq --with-production -j16 -g". Surprisingly, the error does not reproduce when I build charm with -O0 optimization: "./build charm++ pamilrts-bluegeneq -j16 -O0 -g".


Specifically, the crash is at PAMI_Context_advance, which is called sometime after calling PAMI_Rput. I have attached the job output and the stack trace that I obtained from bgqstack. 


I see that the error occurs after the completion function executes (done_fn that I pass to the PAMI_Rput call). 

Additionally, the stack trace reveals that the error occurs at /bgsys/source/srcV1R2M4.29840/comm/sys/buildtools/pami/p2p/protocols/rput/PutRdma.h:149, which is a call to the complete_simple() method, and the line which shows the error is 

put->simple.done_fn (context, put->simple.cookie, PAMI_SUCCESS);


But I'm not sure why the invocation of the done function is throwing an error. I am not doing anything in the done_fn other than printing "completion fn beg" and "completion fn end", as seen in the job output.


Interestingly, things work just fine and I don’t see any crash at PAMI_Context_advance when my program uses PAMI_Rget (instead of PAMI_Rput).


Any pointers for debugging this error? Are there any restrictions for making calls to PAMI_Context_advance after Rput/Rget calls?



Nitin Bhat

Software Engineer, 

Charmworks Inc.

