
charm - Re: [charm] disable hwloc?

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system




  • From: Evan Ramos <evan AT hpccharm.com>
  • To: Jozsef Bakosi <jbakosi AT lanl.gov>
  • Cc: charm <charm AT lists.cs.illinois.edu>
  • Subject: Re: [charm] disable hwloc?
  • Date: Tue, 8 May 2018 11:47:45 -0500

Hi Jozsef,

I'm glad the fix worked. I am now in the process of pushing it to
Charm++, as well as requesting to upstream it to hwloc:

https://charm.cs.illinois.edu/gerrit/4145

https://github.com/open-mpi/hwloc/pull/311

Regards,
--
Evan A. Ramos
Software Engineer
Charmworks, Inc.


On Tue, May 8, 2018 at 8:56 AM, Jozsef Bakosi
<jbakosi AT lanl.gov>
wrote:
> Hi Evan,
>
> The fix below appears to work. Here is the output of running a Charm++
> executable.
>
> ============================
> $ export HWLOC_COMPONENTS_VERBOSE=1
> $ Main/meshconv
> Registered cpu discovery component `no_os' with priority 40 (statically build)
> Registered global discovery component `xml' with priority 30 (statically build)
> Registered global discovery component `synthetic' with priority 30 (statically build)
> Registered global discovery component `custom' with priority 30 (statically build)
> Registered cpu discovery component `linux' with priority 50 (statically build)
> Registered cpu discovery component `x86' with priority 45 (statically build)
> Enabling cpu discovery component `linux'
> Enabling cpu discovery component `x86'
> Enabling cpu discovery component `no_os'
> Excluding global discovery component `xml', conflicts with excludes 0x2
> Excluding global discovery component `synthetic', conflicts with excludes 0x2
> Excluding global discovery component `custom', conflicts with excludes 0x2
> Final list of enabled discovery components: linux,x86,no_os
> Disabling cpu discovery component `linux'
> Disabling cpu discovery component `x86'
> Disabling cpu discovery component `no_os'
> Registered cpu discovery component `no_os' with priority 40 (statically build)
> Registered global discovery component `xml' with priority 30 (statically build)
> Registered global discovery component `synthetic' with priority 30 (statically build)
> Registered global discovery component `custom' with priority 30 (statically build)
> Registered cpu discovery component `linux' with priority 50 (statically build)
> Registered misc discovery component `linuxpci' with priority 19 (statically build)
> Registered cpu discovery component `x86' with priority 45 (statically build)
> Enabling cpu discovery component `linux'
> Enabling cpu discovery component `x86'
> Enabling cpu discovery component `no_os'
> Excluding global discovery component `xml', conflicts with excludes 0x2
> Excluding global discovery component `synthetic', conflicts with excludes 0x2
> Excluding global discovery component `custom', conflicts with excludes 0x2
> Enabling misc discovery component `linuxpci'
> Final list of enabled discovery components: linux,x86,no_os,linuxpci
> Charm++> Running on MPI version: 3.0
> Charm++> level of thread support used: MPI_THREAD_SINGLE (desired: MPI_THREAD_SINGLE)
> Charm++> Running in non-SMP mode: numPes 1
> Converse/Charm++ Commit ID:
> Warning> Randomization of virtual memory (ASLR) is turned on in the kernel,
> thread migration may not work! Run 'echo 0 >
> /proc/sys/kernel/randomize_va_space' as root to disable it, or try running
> with '+isomalloc_sync'.
> CharmLB> Load balancer assumes all CPUs are same.
> Charm++> Running on 1 unique compute nodes (36-way SMP).
> Charm++> cpu topology info is gathered in 0.000 seconds.
>
> meshconv Command-line Parameters:
> -h, --help Display one-liner help on all command-line arguments
> -H, --helpkw string Display verbose help on a single keyword
> -i, --input string Specify the input file
> -o, --output string Specify the output file
> -r, --reorder string Reorder mesh nodes
> -v, --verbose Select verbose screen output
> [Partition 0][Node 0] End of program
> Disabling cpu discovery component `linux'
> Disabling cpu discovery component `x86'
> Disabling cpu discovery component `no_os'
> Disabling misc discovery component `linuxpci'
> ============================
>
> Will you commit this soon so I can pull it from the GitHub mirror's 'charm'
> branch?
>
> Thanks a lot,
> Jozsef
>
> On 05.07.2018 14:41, Evan Ramos wrote:
>> Hi Jozsef,
>>
>> Please try the following:
>>
>> 1. Adding this to hwloc/include/hwloc/rename.h in place of the
>> one-line change I suggested previously, and recompiling Charm++:
>>
>> #define hwloc_aix_component HWLOC_NAME(aix_component)
>> #define hwloc_bgq_component HWLOC_NAME(bgq_component)
>> #define hwloc_cuda_component HWLOC_NAME(cuda_component)
>> #define hwloc_custom_component HWLOC_NAME(custom_component)
>> #define hwloc_darwin_component HWLOC_NAME(darwin_component)
>> #define hwloc_fake_component HWLOC_NAME(fake_component)
>> #define hwloc_freebsd_component HWLOC_NAME(freebsd_component)
>> #define hwloc_gl_component HWLOC_NAME(gl_component)
>> #define hwloc_hpux_component HWLOC_NAME(hpux_component)
>> #define hwloc_linux_component HWLOC_NAME(linux_component)
>> #define hwloc_linuxpci_component HWLOC_NAME(linuxpci_component)
>> #define hwloc_netbsd_component HWLOC_NAME(netbsd_component)
>> #define hwloc_noos_component HWLOC_NAME(noos_component)
>> #define hwloc_nvml_component HWLOC_NAME(nvml_component)
>> #define hwloc_opencl_component HWLOC_NAME(opencl_component)
>> #define hwloc_osf_component HWLOC_NAME(osf_component)
>> #define hwloc_pci_component HWLOC_NAME(pci_component)
>> #define hwloc_solaris_component HWLOC_NAME(solaris_component)
>> #define hwloc_synthetic_component HWLOC_NAME(synthetic_component)
>> #define hwloc_windows_component HWLOC_NAME(windows_component)
>> #define hwloc_x86_component HWLOC_NAME(x86_component)
>> #define hwloc_xml_libxml_component HWLOC_NAME(xml_libxml_component)
>> #define hwloc_xml_nolibxml_component HWLOC_NAME(xml_nolibxml_component)
>>
>> 2. Running `export HWLOC_COMPONENTS_VERBOSE=1` before your binary.
>>
>> Regards,
>> --
>> Evan A. Ramos
>> Software Engineer
>> Charmworks, Inc.
>>
>>
>> On Wed, May 2, 2018 at 12:31 AM, Jozsef Bakosi
>> <jbakosi AT lanl.gov>
>> wrote:
>> > Hi Evan,
>> >
>> > Applying the patch below allows linking fine, but I get a segfault at
>> > runtime (running in serial):
>> >
>> > (gdb) run
>> > Starting program: /home/quinoa/quinoa/build/Main/meshconv Main/meshconv
>> > [New LWP 18798]
>> >
>> > Thread 1 "meshconv" received signal SIGSEGV, Segmentation fault.
>> > 0x00007ffff79b4c10 in opal_hwloc191_hwloc_components_init ()
>> > (gdb) where
>> > #0 0x00007ffff79b4c10 in opal_hwloc191_hwloc_components_init ()
>> > #1 0x00007ffff79a0077 in opal_hwloc191_hwloc_topology_init ()
>> > #2 0x00007ffff7983174 in opal_hwloc_base_get_topology ()
>> > #3 0x00007ffff774b686 in ompi_mpi_init ()
>> > #4 0x00007ffff7760660 in PMPI_Init_thread ()
>> > #5 0x00007ffff76beeee in LrtsInit (argc=0x7fffffffebdc,
>> > argv=0x7fffffffebd0, numNodes=0x7ffff7feb398 <_Cmi_numnodes>,
>> > myNodeID=0x7ffff7feb2e0 <_Cmi_mynode>) at machine.c:1440
>> > #6 0x00007ffff76bd130 in ConverseInit (argc=2, argv=0x7fffffffeca8,
>> > fn=0x7ffff75c4b0b <_initCharm(int, char**)>, usched=0, initret=0) at
>> > machine-common-core.c:1286
>> > #7 0x00007ffff75c2a11 in main (argc=2, argv=0x7fffffffeca8) at main.C:9
>> >
>> > Thanks for looking into this,
>> > Jozsef
>> >
>> > On 05.01.2018 18:07, Evan Ramos wrote:
>> >> It is not possible to disable hwloc, since we rely on it to query
>> >> hardware topology and set affinities. We also cannot rely on whatever
>> >> version may be linked into OpenMPI due to potential mismatches with
>> >> our code. However, it looks like this issue may have a simple fix.
>> >> Could you test this change:
>> >>
>> >>
>> >> diff --git a/contrib/hwloc/include/hwloc/rename.h
>> >> b/contrib/hwloc/include/hwloc/rename.h
>> >> index 9a0c5fae5..39660f4d3 100644
>> >> --- a/contrib/hwloc/include/hwloc/rename.h
>> >> +++ b/contrib/hwloc/include/hwloc/rename.h
>> >> @@ -489,6 +489,8 @@ extern "C" {
>> >> #define hwloc_component_type_t HWLOC_NAME(component_type_t)
>> >> #define hwloc_component HWLOC_NAME(component)
>> >>
>> >> +#define hwloc_linux_component HWLOC_NAME(linux_component)
>> >> +
>> >> #define hwloc_plugin_check_namespace HWLOC_NAME(plugin_check_namespace)
>> >>
>> >> #define hwloc_insert_object_by_cpuset
>> >> HWLOC_NAME(insert_object_by_cpuset)
>> >>
>> >>
>> >> If this resolves the issue, I will fix it in our tree and report it
>> >> upstream so that this commit can be partially reverted:
>> >> https://github.com/open-mpi/hwloc/commit/93abf09fee121c55b99f578d62e3ea21decdfbed
>> >>
>> >> Regards,
>> >> --
>> >> Evan A. Ramos
>> >> Software Engineer
>> >> Charmworks, Inc.
>> >>
>> >>
>> >> On Tue, May 1, 2018 at 10:56 AM, Jozsef Bakosi
>> >> <jbakosi AT lanl.gov>
>> >> wrote:
>> >> > Hi folks,
>> >> >
>> >> > Is it possible to disable hwloc in Charm++? I'm getting:
>> >> >
>> >> > /opt/openmpi/lib/libopen-pal.a(topology-linux.o):(.data.rel.ro.local+0x40):
>> >> > multiple definition of `hwloc_linux_component'
>> >> > <charm-install-dir>/charm/bin/../lib/libhwloc_embedded.a(topology-linux.o):(.data.rel.ro.local+0x0):
>> >> > first defined here
>> >> >
>> >> > Thanks,
>> >> > Jozsef


