Re: [charm] [ppl] multi-thread in taub


  • From: Gengbin Zheng <zhenggb AT gmail.com>
  • To: Fernando Stump <fernando.stump AT gmail.com>
  • Cc: Phil Miller <mille121 AT illinois.edu>, Charm Mailing List <charm AT cs.illinois.edu>
  • Subject: Re: [charm] [ppl] multi-thread in taub
  • Date: Fri, 21 Oct 2011 23:57:28 -0500
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

I think that in general the second option is better, since it does not rely
on pthreads, which generally have much larger context-switch overhead
than our other Converse thread implementation.
However, it only works on x86-based Linux. That said, the first option
might be more portable, as long as TLS and pthreads are supported.

Gengbin
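To make the two options concrete, here is a minimal sketch of a per-thread static declared with __thread, together with the two build lines described above. The flags (-thread pthreads and -tlsglobals) are the ones from this thread; the file name, variable name, and exact charmc invocation are illustrative assumptions only.

    // pgm.C -- illustrative sketch; assumes a Charm++/Converse program.
    //
    // Option 1: back Converse threads with pthreads (portable wherever
    //           TLS and pthreads are supported):
    //   charmc -thread pthreads -o pgm pgm.C
    //
    // Option 2: swap the TLS segment at user-level context switches
    //           (x86 Linux only):
    //   charmc -tlsglobals -o pgm pgm.C

    #include <cstdio>

    static __thread int call_count = 0;   // each thread sees its own copy

    void driver_step() {
        ++call_count;                      // updates only this thread's counter
        std::printf("driver_step calls on this thread: %d\n", call_count);
    }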


On Fri, Oct 21, 2011 at 7:04 PM, Fernando Stump
<fernando.stump AT gmail.com>
wrote:
> Hi Gengbin,
>
>
> Are there pros and cons to each option? How do I choose?
>
> Thanks
> Fernando
> On Oct 21, 2011, at 4:56 PM, Gengbin Zheng wrote:
>
>> Basically yes, you can add __thread to your variables.
>> However, since Converse threads are user-level threads, not the
>> pthreads that TLS is designed for, it won't work automatically. You
>> have two options (after adding __thread):
>>
>> 1. Have Converse threads implemented as pthreads; to do that, link with
>>    -thread pthreads
>>
>> 2. Use a special implementation of Converse threads that swaps the TLS
>> segment when these user-level threads context-switch; to do that,
>> compile and link your program with the charmc flag:
>>    -tlsglobals
>>
>>
>> Gengbin
>>
>> On Fri, Oct 21, 2011 at 12:02 PM, Fernando Stump
>> <fernando.stump AT gmail.com>
>> wrote:
>>> Hi,
>>> I have another question related to multi-threading. Here it is:
>>> My problem: my original serial code uses a lot of static variables and
>>> static member functions. To preserve the semantics of the code, I would
>>> like these static variables to be local to each thread (i.e., if I create
>>> a static variable inside driver(), each driver will see its own static
>>> variable).
>>> Solutions:
>>> Aaron proposed this solution:
>>> You're right, this is a potential source of errors: all drivers on a
>>> processor will share any static members. One workaround, if you need
>>> to keep the static members, is to replace your static property_set
>>> with a static map<int, property_set> which has an entry for each chare
>>> on the processor. So, instead of accessing set_.whatever(), you would
>>> instead access set_[CkMyPE()].whatever(). This has potentially bad
>>> cache behavior, but it does allow you to retain the static variables
>>> and still get correct results.
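A minimal sketch of the static-map workaround described above, assuming a hypothetical property_set type and an integer key per driver; in the quoted suggestion the key comes from CkMyPE() inside a Charm++ translation unit.

    #include <map>

    struct property_set {                  // hypothetical placeholder type
        double youngs_modulus = 0.0;
    };

    class Material {
    public:
        // One property_set entry per driver/chare instead of a single
        // shared static; the quoted suggestion keys this by CkMyPE().
        static property_set& set_for(int key) {
            static std::map<int, property_set> sets_;
            return sets_[key];             // creates the entry on first use
        }
    };

    // Usage: instead of set_.whatever(), a driver identified by `key` calls
    //   Material::set_for(key).youngs_modulus = 210e9;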
>>> While googling I found another potential solution, and I would like to
>>> hear your opinions.
>>> Solution: use the __thread storage-class specifier, as in
>>>
>>> static __thread char *p;
>>>
>>> Here is the documentation of this specifier in GCC:
>>> http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Thread_002dLocal.html
>>> Is there a chance it will work? I do not understand very well how
>>> ParFUM/Charm++ creates threads or assigns Chares/Driver() to threads.
>>> Thanks
>>> Fernando
>>>
>>>
>>>
>>> On Oct 7, 2011, at 2:40 PM, Phil Miller wrote:
>>>
>>> On Fri, Oct 7, 2011 at 14:22, Fernando Stump
>>> <fernando.stump AT gmail.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> I'm running the ParFUMized version of my code on the Taub cluster at
>>> UIUC. Each node contains 12 processors. I'm running on one node, with
>>> the option +p2, but I have the feeling that the code is running on a
>>> single processor. My clue is that this is related to this "warning":
>>>
>>> Charm++> Running on MPI version: 2.2 multi-thread support: 0 (max supported: -1)
>>>
>>> This is a detail of the underlying MPI implementation. It doesn't mean
>>> Charm++ is running on only 1 thread.
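As an aside on that output line: it reports the thread-support level of the underlying MPI library. A standalone MPI program can query the same information directly; a minimal sketch, assuming an MPI compiler wrapper such as mpicxx:

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        int provided = 0;
        // Request full multi-threading; the library reports what it actually provides.
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        std::printf("provided thread level: %d (MPI_THREAD_MULTIPLE is %d)\n",
                    provided, MPI_THREAD_MULTIPLE);
        MPI_Finalize();
        return 0;
    }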
>>>
>>> My question is:
>>>
>>> Where is the issue? Is it in how MPI was compiled, how Charm++ was
>>> compiled, or how I call charmrun?
>>>
>>> Here is the full call:
>>>
>>> [fstump2@taubh2 io]$ ../yafeq/build/debug/yafeq/charmrun ../yafeq/build/debug/yafeq/pfem.out +p2
>>>
>>> Running on 2 processors:  ../yafeq/build/debug/yafeq/pfem.out
>>>
>>> charmrun>  /usr/bin/setarch x86_64 -R  mpirun -np 2 ../yafeq/build/debug/yafeq/pfem.out
>>>
>>> Charm++> Running on MPI version: 2.2 multi-thread support: 0 (max supported: -1)
>>>
>>> Charm++> Running on 1 unique compute nodes (12-way SMP).
>>>
>>> Charm++> Cpu topology info:
>>>
>>> PE to node map: 0 0
>>>
>>> Node to PE map:
>>>
>>> Chip #0: 0 1
>>>
>>> Charm++> cpu topology info is gathered in 0.003 seconds.
>>>
>>> This output seems to indicate that things are working correctly. You
>>> got two cores, 0 and 1, on chip 0 of node 0. Did some other indication
>>> lead you to the conclusion that only one core was doing work?
>>>
>>> Phil
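One way to double-check that both PEs are doing work is a tiny Charm++ test that prints from every PE, run with charmrun +p2 as above. A minimal sketch; the module, file, and class names are illustrative assumptions:

    // check.ci
    mainmodule check {
      mainchare Main {
        entry Main(CkArgMsg* m);
      };
      group PeReport {
        entry PeReport();
      };
    };

    // check.C
    #include "check.decl.h"

    // Group: one element per PE; each constructor prints from its own PE.
    class PeReport : public CBase_PeReport {
    public:
      PeReport() {
        CkPrintf("PE %d of %d is alive\n", CkMyPe(), CkNumPes());
      }
    };

    class Main : public CBase_Main {
    public:
      Main(CkArgMsg* m) {
        delete m;
        CProxy_PeReport::ckNew();                   // one group element per PE
        CkStartQD(CkCallback(CkCallback::ckExit));  // exit once all work is done
      }
    };

    #include "check.def.h"

    // Build and run (illustrative):
    //   charmc check.ci && charmc -c check.C && charmc -o check check.o
    //   ./charmrun ./check +p2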
>>>
>>>
>>> _______________________________________________
>>> charm mailing list
>>> charm AT cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/charm
>>>
>>> _______________________________________________
>>> ppl mailing list
>>> ppl AT cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/ppl
>>>
>>>
>
>




