Speed profiling using TimingInfo and TimingDeltas classes

From Yade

TimingInfo class provides per-engine information about the time it was running and how many times it was executed. TimingDeltas provide for much finer (but intrusive) way of checkpointing code and getting detailed information at the level of parts of functions.

Usage

Timing information collection has to be turned on

<source lang="cpp"> TimingInfo::enabled=True; // c++</source> <source lang="python"> O.timingEnabled=True # python </source>

Note that timing has some overhead. It is a few percent for plain TimingInfo (engine-level), It can be much more for TimingDeltas.

After running the simulation, you can get some simple statistics: <source lang="python"> import yade.timing timing.stats() </source>

Name                                                    Count                 Time            Rel. time
-------------------------------------------------------------------------------------------------------
PhysicalActionContainerReseter                      400               9380μs                0.01%      
BoundingVolumeMetaEngine                            400            1162938μs                1.79%      
PersistentSAPCollider                               400            9664181μs               14.90%      
InteractionGeometryMetaEngine                       400           15401862μs               23.75%      
InteractionPhysicsMetaEngine                        400            9709880μs               14.97%      
ConstitutiveLawDispatcher                           400           26975622μs               41.59%      
"damper"                                            400            1871791μs                2.89%      
"strainer"                                          400              21561μs                0.03%      
"plotDataCollector"                                 101              40705μs                0.06%      
"damageChecker"                                       6               1967μs                0.00%      
TOTAL                                                             64859892μs              100.00%      

If I want to see what happend inside the ConstitutiveLaw I use, I will add TimingDeltas to the c++ source (see below) to get:

Name                                                    Count                 Time            Rel. time
-------------------------------------------------------------------------------------------------------
PhysicalActionContainerReseter                      400               9449μs                0.01%      
BoundingVolumeMetaEngine                            400            1171770μs                1.15%      
PersistentSAPCollider                               400            9433093μs                9.24%      
InteractionGeometryMetaEngine                       400           15177607μs               14.87%      
InteractionPhysicsMetaEngine                        400            9518738μs                9.33%      
ConstitutiveLawDispatcher                           400           64810867μs               63.49%      
  ef2_Spheres_Brefcom_BrefcomLaw                                                                       
    setup                                           4926145            7649131μs               15.25%  
    geom                                            4926145           23216292μs               46.28%  
    material                                        4926145            8595686μs               17.14%  
    rest                                            4926145           10700007μs               21.33%  
    TOTAL                                                             50161117μs              100.00%  
"damper"                                            400            1866816μs                1.83%      
"strainer"                                          400              21589μs                0.02%      
"plotDataCollector"                                 160              64284μs                0.06%      
"damageChecker"                                       9               3272μs                0.00%      
TOTAL                                                            102077490μs              100.00%      
  • MetaEngine and ParallelEngine can have subordinated slaves/functors, in which case the table shows those as well in a tree-like fashion.
  • TimingDeltas show in subordination to their containing Engine/EngineUnit. Here, the "ef2_Spheres_Brefcom_BrefcomLaw" has 4 checkpoints named "setup", "geom", "material" and "rest" (see below on details).
  1. Constitutive law takes over 60% now. This is because it containes TimingDeltas, which are very expensive, since they are run 5e6 times (and it is just a few minutes of simulation)!!
  2. The proportions of checkpoints inside ef2_Spheres_Brefcom_BrefcomLaw are however realistic. I am for example told that almost half of the time (relative to ef2_Spheres_Brefcom_BrefcomLaw) is spent in the "geom" part, only 17% in the material law itself.
  3. I can compare total time for ef2_Spheres_Brefcom_BrefcomLaw::go and time that is spent in ConstitutiveDispatcher in total: 50.2sec vs. 64.8sec. The dispatcher therefore has overhead of about 10% of total simulation time (102sec) to do its job.

TimingInfo

Each engine and EngineUnit contains TimingInfo object. Every time the engine is activated, nanosecond-resolution is requested from the system. When the engine returns, the durations is recorded to the engine's timingInfo and execCount is incremented. This can be queried using <source lang="python"> someEngine.execTime # in nanoseconds someEngine.execCount </source> To reset all timing data, say <source lang="python"> yade.timing.reset() # traverses recursively all engines and functors and resets counts and times to 0 </source>

TimingDeltas

TimingDeltas is an object that can be optionally held by any Engine or EngineUnit. Using special functions, labeled checkpoints are marked in the source code. As checkpoint is passed, the time since the last checkpoint is increased and execCount is incremented. Typically, you will use this to see which part of your funcion take the most time during execution.

First, I have to create the timingDeltas object before I use it, for example in the Engine's (or EngineUnit's) constructor: <source lang="cpp"> class ef2_Spheres_Brefcom_BrefcomLaw: public ConstitutiveLaw{

  public:
     ef2_Spheres_Brefcom_BrefcomLaw(): { timingDeltas=shared_ptr<TimingDeltas>(new TimingDeltas); }
  // ...

}; </source> Then I will add checkpoints to the function I want to measure: <source lang="cpp"> void ef2_Spheres_Brefcom_BrefcomLaw::go(shared_ptr<InteractionGeometry>& _geom, shared_ptr<InteractionPhysics>& _phys, Interaction* I, MetaBody* rootBody){

  timingDeltas->start(); // the point at which the first timing starts
  // prepare some variables etc here
  timingDeltas->checkpoint("setup");
  // find geometrical data (deformations) here
  timingDeltas->checkpoint("geom");
  // compute forces here
  timingDeltas->checkpoint("material");
  // apply forces, cleanup here
  timingDeltas->checkpoint("rest");

} </source>

See TimingDeltas class for more information. This class can be also accessed from python: <source lang="python"> deltas=someEngine.timingDeltas deltas[0][0] # 0th checkpoint name deltas[0][1] # 0th checkpoint time in nanoseconds deltas[0][2] # 0th checkpoint execution count deltas[1][0] # ...

yade.timing.reset() # resets also all timingDeltas </source>