
charm - [charm] Fwd: Question about running BigNetSim with "+wth4"

charm AT lists.cs.illinois.edu

Subject: Charm++ parallel programming system




  • From: Xuehan Xu <xxhdx1985126 AT gmail.com>
  • To: charm AT cs.uiuc.edu
  • Subject: [charm] Fwd: Question about running BigNetSim with "+wth4"
  • Date: Sun, 16 Oct 2011 15:11:48 +0800
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/charm>
  • List-id: CHARM parallel programming system <charm.cs.uiuc.edu>

I made the following modification to the file "netconfig":

NUM_NODES 8
DIMENSIONS 2 2 2

and the output of BigNetSim became the following:
[couple@node70 BlueGene]$ ../tmp/charmrun +p1 ../tmp/bigsimulator 1 0 ++remote-shell ssh
Charmrun> started all node programs in 1.174 seconds.
Converse/Charm++ Commit ID: v6.3.0-626-gf074431
Charm++> scheduler running in netpoll mode.
Charm++> Running on 1 unique compute nodes (2-way SMP).
Charm++> cpu topology info is gathered in 0.000 seconds.
================= Simulation Configuration =================
Number of physical PEs: 1
POSE mode: Parallel
Network model: BlueGene
Command line: /home/couple/NewCharm/BigNetSim/trunk/BlueGene/../tmp/bigsimulator 1 0
Timing factor: 1.000000e+08 (i.e., 1 GVT tick = 10 ns)
cpufactor: 1.000000
bgTrace summary: totalBGProcs=32 X=2 Y=2 Z=2 #CommThreads=1 #WorkerThreads=4 #PEs=1 LogVersion=6
Simulation mode: trace driven
Simulation network mode: full contention
Initializing POSE... 
POSE initialization complete.
Using Inactivity Detection for termination.
Network parameters:
   Max packet size: 256
   File window size: 0
   Debug print level: 0
   Window load threshold: 0
   Intra node latency: 0.500000 us
   Intra node bandwidth: 1.000000 GB/s
   Number of buffers per port in each switch: 12
   Switch buffer size: 1024
   Channel bandwidth: 1.000000
   Channel delay: 0
   Link stats collection interval: 1000000 GVT ticks
   Link stats on: no
   Message stats on: no
   Adaptive routing on: yes
   Header size: 16 bytes
   Processor send overhead: 0 GVT ticks
   Processor receive overhead: 0 GVT ticks
   Number of simulated nodes: 8
============================================================
Info> invoking startup task from proc 0 ...
Info> Starting at the beginning of the simulation
Info> Running to the end of the simulation
WARNING: TASK NOT FOUND src:0 msg:4 on:24
WARNING: TASK NOT FOUND src:0 msg:4 on:24
WARNING: TASK NOT FOUND src:0 msg:4 on:26
WARNING: TASK NOT FOUND src:0 msg:4 on:26
WARNING: TASK NOT FOUND src:0 msg:4 on:27
WARNING: TASK NOT FOUND src:0 msg:4 on:27
WARNING: TASK NOT FOUND src:0 msg:4 on:28
WARNING: TASK NOT FOUND src:0 msg:4 on:28
WARNING: TASK NOT FOUND src:0 msg:4 on:30
WARNING: TASK NOT FOUND src:0 msg:4 on:30
WARNING: TASK NOT FOUND src:0 msg:4 on:31
WARNING: TASK NOT FOUND src:0 msg:4 on:31
WARNING: TASK NOT FOUND src:0 msg:4 on:8
WARNING: TASK NOT FOUND src:0 msg:4 on:8
WARNING: TASK NOT FOUND src:0 msg:4 on:10
WARNING: TASK NOT FOUND src:0 msg:4 on:10

.........

273407 0 Something wrong src 1 dst 0 msgid 1
273407 0 message was not stored in advance
273543 0 Something wrong src 1 dst 0 msgid 1
273543 0 message was not stored in advance
273679 0 Something wrong src 1 dst 0 msgid 1
273679 0 message was not stored in advance
272863 0 Something wrong src 2 dst 0 msgid 2
272863 0 message was not stored in advance
273135 0 Something wrong src 2 dst 0 msgid 2
273135 0 message was not stored in advance
273271 0 Something wrong src 2 dst 0 msgid 2
273271 0 message was not stored in advance
273951 0 Something wrong src 3 dst 0 msgid 3
273951 0 message was not stored in advance
274087 0 Something wrong src 3 dst 0 msgid 3
274087 0 message was not stored in advance
274495 0 Something wrong src 3 dst 0 msgid 3
274495 0 message was not stored in advance
274223 0 Something wrong src 4 dst 0 msgid 4
274223 0 message was not stored in advance
274359 0 Something wrong src 4 dst 0 msgid 4
274359 0 message was not stored in advance
274903 0 Something wrong src 4 dst 0 msgid 4
274903 0 message was not stored in advance
274767 0 Something wrong src 5 dst 0 msgid 5
274767 0 message was not stored in advance
275583 0 Something wrong src 5 dst 0 msgid 5
275583 0 message was not stored in advance
275719 0 Something wrong src 5 dst 0 msgid 5
275719 0 message was not stored in advance
275175 0 Something wrong src 6 dst 0 msgid 6
275175 0 message was not stored in advance
275311 0 Something wrong src 6 dst 0 msgid 6
275311 0 message was not stored in advance
276263 0 Something wrong src 6 dst 0 msgid 6
276263 0 message was not stored in advance
275855 0 Something wrong src 7 dst 0 msgid 7
275855 0 message was not stored in advance
275991 0 Something wrong src 7 dst 0 msgid 7
275991 0 message was not stored in advance
276127 0 Something wrong src 7 dst 0 msgid 7
276127 0 message was not stored in advance
281472 0 Something wrong src 4 dst 0 msgid 4
281472 0 message was not stored in advance
282024 0 Something wrong src 4 dst 0 msgid 4
282024 0 message was not stored in advance
282208 0 Something wrong src 5 dst 0 msgid 5
282208 0 message was not stored in advance
282576 0 Something wrong src 5 dst 0 msgid 5
282576 0 message was not stored in advance
283128 0 Something wrong src 6 dst 0 msgid 6
283128 0 message was not stored in advance
283312 0 Something wrong src 6 dst 0 msgid 6
283312 0 message was not stored in advance
283864 0 Something wrong src 7 dst 0 msgid 7
283864 0 message was not stored in advance
284048 0 Something wrong src 7 dst 0 msgid 7
284048 0 message was not stored in advance
280920 0 Something wrong src 3 dst 0 msgid 3
280920 0 message was not stored in advance
281104 0 Something wrong src 3 dst 0 msgid 3
281104 0 message was not stored in advance
281656 0 Something wrong src 3 dst 0 msgid 3
281656 0 message was not stored in advance
279632 0 Something wrong src 1 dst 0 msgid 1
279632 0 message was not stored in advance
280184 0 Something wrong src 1 dst 0 msgid 1
280184 0 message was not stored in advance
280552 0 Something wrong src 1 dst 0 msgid 1
280552 0 message was not stored in advance
279816 0 Something wrong src 2 dst 0 msgid 2
279816 0 message was not stored in advance
280368 0 Something wrong src 2 dst 0 msgid 2
280368 0 message was not stored in advance
280736 0 Something wrong src 2 dst 0 msgid 2
280736 0 message was not stored in advance
Simulation inactive at time: 2776774
Final GVT = 2776774
282392 0 Something wrong src 4 dst 0 msgid 4
282392 0 message was not stored in advance
282760 0 Something wrong src 5 dst 0 msgid 5
282760 0 message was not stored in advance
283680 0 Something wrong src 6 dst 0 msgid 6
283680 0 message was not stored in advance
284232 0 Something wrong src 7 dst 0 msgid 7
284232 0 message was not stored in advance
Final basic stats: Commits: 8685  Rollbacks: 3
Final basic stats: GVT iterations: 2237
1 PE Simulation finished at 1.101722.

     Is this caused by a misconfiguration in my netconfig?
---------- Forwarded message ----------
From: Xuehan Xu <xxhdx1985126 AT gmail.com>
Date: 16 October 2011 14:45
Subject: Question about running BigNetSim with "+wth4"
To: charm AT cs.uiuc.edu


Dear Sirs:
      I tried to simulate the Cjacobi3D program with the parameter "+wth4", but an assertion error occurred when running BigNetSim.
      I used the emulator to run the program with "+wth4" as follows:
./charmrun +p1 ./jacobi 4 4 2 +x2 +y2 +z2 +wth4 ++remote-shell ssh +bglog
      Then I moved the traces to BigNetSim/trunk/BlueGene/ and ran BigNetSim:

[couple@node70 BlueGene]$ ../tmp/charmrun +p1 ../tmp/bigsimulator 1 0 ++remote-shell ssh
Charmrun> started all node programs in 2.110 seconds.
Converse/Charm++ Commit ID: v6.3.0-626-gf074431
Charm++> scheduler running in netpoll mode.
Charm++> Running on 1 unique compute nodes (2-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
================= Simulation Configuration =================
Number of physical PEs: 1
POSE mode: Parallel
Network model: BlueGene
Command line: /home/couple/NewCharm/BigNetSim/trunk/BlueGene/../tmp/bigsimulator 1 0
Timing factor: 1.000000e+08 (i.e., 1 GVT tick = 10 ns)
cpufactor: 1.000000
bgTrace summary: totalBGProcs=32 X=2 Y=2 Z=2 #CommThreads=1 #WorkerThreads=4 #PEs=1 LogVersion=6
Simulation mode: trace driven
Simulation network mode: full contention
Initializing POSE... 
POSE initialization complete.
Using Inactivity Detection for termination.
Network parameters:
   Max packet size: 256
   File window size: 0
   Debug print level: 0
   Window load threshold: 0
   Intra node latency: 0.500000 us
   Intra node bandwidth: 1.000000 GB/s
   Number of buffers per port in each switch: 12
   Switch buffer size: 1024
   Channel bandwidth: 1.000000
   Channel delay: 0
   Link stats collection interval: 1000000 GVT ticks
   Link stats on: no
   Message stats on: no
   Adaptive routing on: yes
   Header size: 16 bytes
   Processor send overhead: 0 GVT ticks
   Processor receive overhead: 0 GVT ticks
   Number of simulated nodes: 8
============================================================
Info> invoking startup task from proc 0 ...
Info> Starting at the beginning of the simulation
Info> Running to the end of the simulation
[0] Assertion "inPort == numP" failed in file modDirectionOrderedNDTorus.C line 29.
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason:
[0] Stack Traceback:
  [0:0] CmiAbort+0x75  [0x82b7130]
  [0:1] __cmi_assert+0x3c  [0x82bfdb7]
  [0:2] _ZN26modDirectionOrderedNDTorus11selectRouteEiiiP8TopologyP6PacketRSt3mapIiiSt4lessIiESaISt4pairIKiiEEESC_Pt+0x187  [0x818b56d]
  [0:3] _ZN10SwitchBase10recvPacketEP6Packet+0x5d1  [0x8160ce9]
  [0:4] _ZN12state_Switch10recvPacketEP6Packet+0x24  [0x8161134]
  [0:5] _ZN6Switch9ResolveFnEiPv+0xf1  [0x815f2b9]
  [0:6] _ZN6adapt44StepEv+0x34b  [0x81dc453]
  [0:7] _ZN3sim4StepEv+0xc0  [0x81d7f22]
  [0:8] _ZN6Switch10recvPacketEP6Packet+0x1e3  [0x81610cb]
  [0:9] _ZN14CkIndex_Switch23_call_recvPacket_PacketEPvP6Switch+0x18  [0x815d28c]
  [0:10] CkDeliverMessageFree+0x44  [0x822bc6d]
  [0:11] _ZN14CkLocRec_local11invokeEntryEP12CkMigratablePvib+0x13a  [0x8243246]
  [0:12] _ZN14CkLocRec_local7deliverEP14CkArrayMessage11CkDeliver_ti+0x1bc  [0x82434da]
  [0:13] _ZN8CkLocMgr7deliverEP9CkMessage11CkDeliver_ti+0x266  [0x8244980]
  [0:14] _ZN8CkLocMgr13deliverInlineEP9CkMessage+0x28  [0x8230d30]
  [0:15]   [0x822d7ac]
  [0:16] _Z15_processHandlerPvP11CkCoreState+0x1af  [0x822d961]
  [0:17] CmiHandleMessage+0x3c  [0x82bd27e]
  [0:18] CsdScheduleForever+0x6b  [0x82bd463]
  [0:19] CsdScheduler+0x11  [0x82bd3d5]
  [0:20]   [0x82bb8f5]
  [0:21] ConverseInit+0x342  [0x82bbe04]
  [0:22] main+0x44  [0x8234d5b]
  [0:23] __libc_start_main+0xe6  [0x670cc6]
  [0:24]   [0x815afc1]
Fatal error on PE 0>

     How should I deal with it? Thank you, sir:-)




