Speed profiling using TimingInfo and TimingDeltas classes
From Yade
TimingInfo class provides per-engine information about the time it was running and how many times it was executed. TimingDeltas provide for much finer (but intrusive) way of checkpointing code and getting detailed information at the level of parts of functions.
Usage
Timing information collection has to be turned on
<source lang="cpp"> TimingInfo::enabled=True; // c++</source> <source lang="python"> O.timingEnabled=True # python </source>
Note that timing has some overhead. It is a few percent for plain TimingInfo (engine-level), It can be much more for TimingDeltas.
After running the simulation, you can get some simple statistics: <source lang="python"> import yade.timing timing.stats() </source>
Name Count Time Rel. time ------------------------------------------------------------------------------------------------------- PhysicalActionContainerReseter 400 9380μs 0.01% BoundingVolumeMetaEngine 400 1162938μs 1.79% PersistentSAPCollider 400 9664181μs 14.90% InteractionGeometryMetaEngine 400 15401862μs 23.75% InteractionPhysicsMetaEngine 400 9709880μs 14.97% ConstitutiveLawDispatcher 400 26975622μs 41.59% "damper" 400 1871791μs 2.89% "strainer" 400 21561μs 0.03% "plotDataCollector" 101 40705μs 0.06% "damageChecker" 6 1967μs 0.00% TOTAL 64859892μs 100.00%
If I want to see what happend inside the ConstitutiveLaw I use, I will add TimingDeltas to the c++ source (see below) to get:
Name Count Time Rel. time ------------------------------------------------------------------------------------------------------- PhysicalActionContainerReseter 400 9449μs 0.01% BoundingVolumeMetaEngine 400 1171770μs 1.15% PersistentSAPCollider 400 9433093μs 9.24% InteractionGeometryMetaEngine 400 15177607μs 14.87% InteractionPhysicsMetaEngine 400 9518738μs 9.33% ConstitutiveLawDispatcher 400 64810867μs 63.49% ef2_Spheres_Brefcom_BrefcomLaw setup 4926145 7649131μs 15.25% geom 4926145 23216292μs 46.28% material 4926145 8595686μs 17.14% rest 4926145 10700007μs 21.33% TOTAL 50161117μs 100.00% "damper" 400 1866816μs 1.83% "strainer" 400 21589μs 0.02% "plotDataCollector" 160 64284μs 0.06% "damageChecker" 9 3272μs 0.00% TOTAL 102077490μs 100.00%
- MetaEngine and ParallelEngine can have subordinated slaves/functors, in which case the table shows those as well in a tree-like fashion.
- TimingDeltas show in subordination to their containing Engine/EngineUnit. Here, the "ef2_Spheres_Brefcom_BrefcomLaw" has 4 checkpoints named "setup", "geom", "material" and "rest" (see below on details).
- Constitutive law takes over 60% now. This is because it containes TimingDeltas, which are very expensive, since they are run 5e6 times (and it is just a few minutes of simulation)!!
- The proportions of checkpoints inside ef2_Spheres_Brefcom_BrefcomLaw are however realistic. I am for example told that almost half of the time (relative to ef2_Spheres_Brefcom_BrefcomLaw) is spent in the "geom" part, only 17% in the material law itself.
- I can compare total time for ef2_Spheres_Brefcom_BrefcomLaw::go and time that is spent in ConstitutiveDispatcher in total: 50.2sec vs. 64.8sec. The dispatcher therefore has overhead of about 10% of total simulation time (102sec) to do its job.
TimingInfo
Each engine and EngineUnit contains TimingInfo object. Every time the engine is activated, nanosecond-resolution is requested from the system. When the engine returns, the durations is recorded to the engine's timingInfo and execCount is incremented. This can be queried using <source lang="python"> someEngine.execTime # in nanoseconds someEngine.execCount </source> To reset all timing data, say <source lang="python"> yade.timing.reset() # traverses recursively all engines and functors and resets counts and times to 0 </source>
TimingDeltas
TimingDeltas is an object that can be optionally held by any Engine or EngineUnit. Using special functions, labeled checkpoints are marked in the source code. As checkpoint is passed, the time since the last checkpoint is increased and execCount is incremented. Typically, you will use this to see which part of your funcion take the most time during execution.
First, I have to create the timingDeltas object before I use it, for example in the Engine's (or EngineUnit's) constructor: <source lang="cpp"> class ef2_Spheres_Brefcom_BrefcomLaw: public ConstitutiveLaw{
public: ef2_Spheres_Brefcom_BrefcomLaw(): { timingDeltas=shared_ptr<TimingDeltas>(new TimingDeltas); } // ...
}; </source> Then I will add checkpoints to the function I want to measure: <source lang="cpp"> void ef2_Spheres_Brefcom_BrefcomLaw::go(shared_ptr<InteractionGeometry>& _geom, shared_ptr<InteractionPhysics>& _phys, Interaction* I, MetaBody* rootBody){
timingDeltas->start(); // the point at which the first timing starts // prepare some variables etc here timingDeltas->checkpoint("setup"); // find geometrical data (deformations) here timingDeltas->checkpoint("geom"); // compute forces here timingDeltas->checkpoint("material"); // apply forces, cleanup here timingDeltas->checkpoint("rest");
} </source>
See TimingDeltas class for more information. This class can be also accessed from python: <source lang="python"> deltas=someEngine.timingDeltas deltas[0][0] # 0th checkpoint name deltas[0][1] # 0th checkpoint time in nanoseconds deltas[0][2] # 0th checkpoint execution count deltas[1][0] # ...
yade.timing.reset() # resets also all timingDeltas </source>