Insertion Sort Collider Stride

From Yade

Strategy description

  • Bounding boxes of all bodies are statically enlarged by the quantity InsertionSortCollider::sweepLength (must be set by the user); if negative (default), no striding. BoundingVolumeMetaEngine must be present in O.engines; it is deactivated by InsertionSortCollider, since we run it from within the collider only when it is needed.
  • InsertionSortCollider::stride (interval) is computed based on this fixed length and a measure provided by NewtonsDampedLaw: NewtonsDampedLaw::maxVelocitySq. InsertionSortCollider schedules the next run (this is based on virtual time rather than iteration number; this way, changing timestep is handled gracefully). maxVelocitySq is multiplied by InsertionSortCollider::sweepFactor (default 1.05, user-settable) before being used, to handle oscillation. NewtonsDampedLaw must also be present in O.engines.
  • InsertionSortCollider::isActivated monitors (at every step) maxVelocitySq; if it gets larger such that it would be possible for the body with this velocity to go out of the bounding box, the collider is woken up prematurely.

Parameter calibration

Based on experiments with scripts/test/collider-stride-triax.py, best values for the two parameters seem to be:

  • sweepFactor can be kept at the default value, i.e. 1.05.
  • sweepLength about 0.05*typical_sphere_radius



    • The lower limit is that if sweepLength is too small, no striding will be performed, except for almost-static simulations.
    • The upper limit is number of spurious potential interactions, which slows down the rest. For normal TriaxialTest, the ratio O.interactions.countReal()/len(O.interactions)==0.56, drops to about 0.5 for sweepLength=0.05*radius and to 0.43 for 0.1*radius.

Results

  • scripts/test/collider-stride-triax.py with 8k spheres (it is expected, but not yet measured, that the difference will be much larger with many particles
    • 43s with the normal setup (table for steps 2000-2500 )
Name                                                    Count                 Time            Rel. time  
-------------------------------------------------------------------------------------------------------  
PhysicalActionContainerReseter                      500             209626us                2.78%        
BoundingVolumeMetaEngine                              0                  0us                0.00%        
InsertionSortCollider                               500            1600184us               21.23%        
InteractionDispatchers                              500            3763965us               49.93%        
GlobalStiffnessTimeStepper                           10              68153us                0.90%        
TriaxialCompressionEngine                           500             515486us                6.84%        
NewtonsDampedLaw                                    500            1381668us               18.33%        
TOTAL                                                              7539084us              100.00%  
Number of interactions: 37543 (real ratio: 0.56964)
  • 37s with sweepLength=0.05*radius (collider taking <1% in the later stages) (table for steps 2000-2500 )
Name                                                    Count                 Time            Rel. time  
-------------------------------------------------------------------------------------------------------  
PhysicalActionContainerReseter                      500             208016us                3.49%        
BoundingVolumeMetaEngine                              0                  0us                0.00%        
InsertionSortCollider                                14              80656us                1.35%        
InteractionDispatchers                              500            3737658us               62.73%        
GlobalStiffnessTimeStepper                           10              74176us                1.24%        
TriaxialCompressionEngine                           500             538253us                9.03%        
NewtonsDampedLaw                                    500            1320030us               22.15%        
TOTAL                                                              5958792us              100.00%        
Number of interactions: 42427 (real ratio: 0.502958)

(InsertionSortCollider was run only 14* during 500 steps)

Current limitations

  • angularVelocity is not taken into account. This is OK for spheres, but not OK for facets, for instance. However, normally facets are non-dynamic, and presumably the fastest bodies will be some spheres.
  • The maxVelocitySq monitoring might get fooled by a sudden peak and fall back, it seems; it would be better to update the scheduledRun in isActivated if maxVelocitySq changes (increases, but also decreases, which is not handled now). This would estimate the integral. See isActivated code for the formulas.
  • Enlarges bboxes of all bodies the same. It would be possible to ask for maxVelocity per-body, but in that case a single body getting too fast would make the collider run again and this could happen often. Also, per-body characteristics change the bbox size often, thereby creating lot of artificial inversions in the sorting code, which slows down the most.
  • One body going crazy makes the whole simulation slower. It could be somehow possible to separate bodies into groups based on their |velocity| and enlarge the bbox most for the fastest ones etc. It would however incur some overhead.

Enhancement ideas: velocity bins

  • Instead of having one maxVelocitySq for all bodies, bodies would be put to a (user-settable) number of bins based on their |velocity|. The overall maximum maxVelocitySq would be the maximum for the 0th bih, maxima of other bins would be maxVelocitySq_n=maxVelocitySq_0/((binFactor^2)^n). (binFactor^2) because it is squares of velocities; binFactor~10, which would give bins for max. speeds 20,2,0.2 etc. Each of them would have the sweepDistance smaller by 10x, which should save substantial amount of spurious interactions.
  • NewtonsDampedLaw would hold an instance of VelocityBin class; putting bodies to bins would be triggered from InsertionSortCollider (by a method call), since it requires a loop over bodies. There is vector<short> bodyBins, which has bin numbers indexed by bodyIds. Also, there is vector<Real> maxVelocitySq, which holds maximum for the respective bin.
  • NewtonsDampedLaw would at every step:
  1. reset the maxVelocitySq vector
  2. update maxVelocitySq_n as it traverses bodies, depending on the bin in which the respecive body is. (this will require separate instances of maxima vectors for all threads when parallelized and getting the overall maximum after the loop)
  • InsertionSortCollider would
  1. At every step in isActivated: add maxVelocitySq_n*dt to dist_n and check whether this velocity integral hasn't gotten over the maximum distance for the bin, i.e. sweepDist/(binFactor^n). If so, activate. If not, nothing to do.
  2. At the full run once in a while (a) call NewtonsDampedLaw::assignBins() which will recompute maxima for all bins (based on the global maximum... -- is that OK?) (b) update stride etc.

I think I get the idea. If I understand, this approach would be especially usefull for simulations where some bodies are very dynamic while others are almost static. In such case, it sounds a really good idea. I don't want to refrain your enthusiastic implementation spree, however I'm wondering what will be the cost of all this in cpu time compared to what will be gained for typical dense packing simulations, where bodies have velocities of the same order of magnitude (I guess). Thank you for the graphs. :) bchareyre 22:00, 14 July 2009 (UTC)

Comments

    • running on 50k spheres, steps 1000-3500 (during first 1000 there is so much motion that striding is not active), I get those results (clearly there is much more benefit as the packing slows down) eudoxos 08:16, 11 July 2009 (UTC)
plain  350s total:  103s (74%, 500 runs), 101s (61%, 500 runs), 79s (47%, 500 runs), 65s (32%, 500 runs)
stride 291s total:   96s (71%, 374 runs),  82s (49%, 156 runs), 62s (27%,  84 runs), 51s ( 8%,  34 runs) # with .1*rSphere
      • So, if I understand correctly, the collider is not the bottleneck in terms of speed now, even for large numbers of spheres. Do you agree? bchareyre 08:43, 11 July 2009 (UTC)
      • More or less; BUT it just allows the number of spheres to grow more and as soon as you get to 1e6 particles (since it will be doable), the problem will be there again (OK, seems not so much realistic for now) ~~~~
  • I'm curious to know how the distributions of {v(i)}, {v(i)-v(j)}, and {sum(v(i)-v(j) on interval} look like. It might be possible to define the probability that the distance 2*sweepLength between two bodies is closed during the interval, as a function of the number of spheres and maxVelocitySq (I have to check what maxVelocitySq is exactly). In that case, you could set a probabilistic value intervalP, corresponding to the probability of "missing" a contact below 10e-10. In some cases intervalP could be significantly larger than interval (especially dense and slow packings). The only "juge de paix" at the end is stability and impact on macroscopic response.bchareyre 22:37, 10 July 2009 (UTC)
    • (note that the x-axis range changes a lot):
      Triax-velocity-histogram.png
import pylab
	pylab.hist([sqrt(sum([v**2 for v in b.phys['velocity']])) for b in O.bodies])
	pylab.title('step %d; max speed %g'%(O.iter,sqrt(newton['maxVelocitySq'])))
	pylab.grid()
	pylab.savefig('%s-%05d.png'%(loadFrom,O.iter),dpi=40)
BTW for my own code, the histogram is still much less dispersed, e.g. max. speed 0.47 and (visually) all 25k spheres are in the 0-0.02 bin! This will benefit hugely from the binning as explained above (decrease number of potential interactions). ~~~~