Insertion Sort Collider Stride
From Yade
Strategy description
 Bounding boxes of all bodies are statically enlarged by the quantity InsertionSortCollider::sweepLength (must be set by the user); if negative (the default), no striding takes place. BoundingVolumeMetaEngine must be present in O.engines; it is deactivated by InsertionSortCollider, which runs it from within the collider only when needed.
 InsertionSortCollider::stride (the interval between full runs) is computed from this fixed length and a measure provided by NewtonsDampedLaw: NewtonsDampedLaw::maxVelocitySq. InsertionSortCollider schedules its next run in terms of virtual time rather than iteration number; this way, changing the timestep is handled gracefully. maxVelocitySq is multiplied by InsertionSortCollider::sweepFactor (default 1.05, user-settable) before being used, to handle oscillations. NewtonsDampedLaw must also be present in O.engines.
 InsertionSortCollider::isActivated monitors maxVelocitySq at every step; if it grows so much that the body with this velocity could move out of its enlarged bounding box, the collider is woken up prematurely.
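The scheduling logic described above can be sketched as follows. This is a hypothetical illustration, not the actual C++ implementation; names such as StrideScheduler, scheduled_run and virtual_time are invented for the example, while sweepLength, sweepFactor and maxVelocitySq mirror the parameters named in the text.

```python
from math import sqrt

class StrideScheduler:
    """Illustrative sketch of the collider's stride scheduling."""
    def __init__(self, sweep_length, sweep_factor=1.05):
        self.sweep_length = sweep_length   # fixed bbox enlargement (user-set)
        self.sweep_factor = sweep_factor   # safety margin on measured velocity
        self.scheduled_run = 0.0           # virtual time of the next full run

    def schedule(self, virtual_time, max_velocity_sq):
        """After a full collider run: derive the stride from the fixed
        sweep length and the current maximum velocity."""
        v = self.sweep_factor * sqrt(max_velocity_sq)
        if self.sweep_length <= 0 or v == 0:
            self.scheduled_run = virtual_time  # no striding: run every step
        else:
            self.scheduled_run = virtual_time + self.sweep_length / v

    def is_activated(self, virtual_time, max_velocity_sq):
        """Checked every step: wake up when the schedule expires, or
        prematurely when the fastest body could leave its enlarged bbox
        before the scheduled run."""
        if virtual_time >= self.scheduled_run:
            return True
        remaining = self.scheduled_run - virtual_time
        v = self.sweep_factor * sqrt(max_velocity_sq)
        return v * remaining > self.sweep_length
```

For example, with sweep_length=0.05 and a maximum speed of 1, the next run is scheduled roughly 0.05/1.05 time units ahead; if the maximum speed then doubles, is_activated fires before that deadline.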
Parameter calibration
Based on experiments with scripts/test/colliderstridetriax.py, best values for the two parameters seem to be:
 sweepFactor can be kept at its default value, i.e. 1.05.
 sweepLength should be about 0.05*typical_sphere_radius.
 The lower limit: if sweepLength is too small, no striding will be performed at all, except in almost-static simulations.
 The upper limit: the number of spurious potential interactions grows, which slows down everything else. For a normal TriaxialTest, the ratio O.interactions.countReal()/len(O.interactions)==0.56 drops to about 0.5 for sweepLength=0.05*radius and to 0.43 for 0.1*radius.
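The calibration above boils down to two simple quantities; here is a small, hypothetical helper version of them (the function names are invented, only the 0.05 coefficient and the ratio expression come from the text):

```python
def suggested_sweep_length(typical_radius, coeff=0.05):
    """sweepLength ~ 0.05*typical_sphere_radius; larger coefficients
    create more spurious potential interactions, smaller ones prevent
    striding from kicking in at all."""
    return coeff * typical_radius

def real_ratio(n_real, n_potential):
    """Fraction of potential interactions that are real; in a live Yade
    session this is O.interactions.countReal()/len(O.interactions)."""
    return n_real / n_potential
```

A lower real_ratio means the enlarged bounding boxes produce more spurious pairs, which is exactly the cost that bounds sweepLength from above.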
Results
 scripts/test/colliderstridetriax.py with 8k spheres (it is expected, but not yet measured, that the difference will be much larger with more particles)
 43s with the normal setup (table for steps 2000-2500)
 Name                            Count  Time       Rel. time
 PhysicalActionContainerReseter    500   209626us    2.78%
 BoundingVolumeMetaEngine            0        0us    0.00%
 InsertionSortCollider             500  1600184us   21.23%
 InteractionDispatchers            500  3763965us   49.93%
 GlobalStiffnessTimeStepper         10    68153us    0.90%
 TriaxialCompressionEngine         500   515486us    6.84%
 NewtonsDampedLaw                  500  1381668us   18.33%
 TOTAL                                  7539084us  100.00%
 Number of interactions: 37543 (real ratio: 0.56964)
 37s with sweepLength=0.05*radius (the collider taking <1% in the later stages) (table for steps 2000-2500)
 Name                            Count  Time       Rel. time
 PhysicalActionContainerReseter    500   208016us    3.49%
 BoundingVolumeMetaEngine            0        0us    0.00%
 InsertionSortCollider              14    80656us    1.35%
 InteractionDispatchers            500  3737658us   62.73%
 GlobalStiffnessTimeStepper         10    74176us    1.24%
 TriaxialCompressionEngine         500   538253us    9.03%
 NewtonsDampedLaw                  500  1320030us   22.15%
 TOTAL                                  5958792us  100.00%
 Number of interactions: 42427 (real ratio: 0.502958)
(InsertionSortCollider was run only 14× during the 500 steps.)
Current limitations
 angularVelocity is not taken into account. This is fine for spheres, but not for facets, for instance. However, facets are normally non-dynamic, and presumably the fastest bodies will be spheres.
 The maxVelocitySq monitoring can apparently be fooled by a sudden peak followed by a fall; it would be better to update scheduledRun in isActivated whenever maxVelocitySq changes (increases, but also decreases, which is not handled now). This would amount to estimating the velocity integral. See the isActivated code for the formulas.
 Bounding boxes of all bodies are enlarged by the same amount. It would be possible to track maxVelocity per body, but then a single body getting too fast would trigger a new collider run, and this could happen often. Moreover, per-body characteristics change the bbox sizes frequently, creating many artificial inversions in the sorting code, which is what slows it down the most.
 One body going crazy slows down the whole simulation. It might be possible to separate bodies into groups based on their velocity and enlarge the bboxes most for the fastest ones, etc.; it would however incur some overhead.
Enhancement ideas: velocity bins
 Instead of one maxVelocitySq for all bodies, bodies would be put into a (user-settable) number of bins based on their velocity. The overall maximum maxVelocitySq would be the maximum for the 0th bin; the maxima of the other bins would be maxVelocitySq_n=maxVelocitySq_0/((binFactor^2)^n). The (binFactor^2) appears because these are squares of velocities; binFactor~10 would give bins for maximum speeds of e.g. 20, 2, 0.2, etc. Each successive bin would have its sweep distance smaller by 10×, which should avoid a substantial number of spurious interactions.
 NewtonsDampedLaw would hold an instance of a VelocityBin class; putting bodies into bins would be triggered from InsertionSortCollider (by a method call), since it requires a loop over bodies. There is vector<short> bodyBins, holding bin numbers indexed by bodyIds, and vector<Real> maxVelocitySq, holding the maximum for each bin.
 NewtonsDampedLaw would at every step:
 reset the maxVelocitySq vector
 update maxVelocitySq_n as it traverses bodies, depending on the bin the respective body is in (when parallelized, this will require separate instances of the maxima vectors for all threads, with the overall maxima gathered after the loop)
 InsertionSortCollider would
 At every step, in isActivated: add maxVelocitySq_n*dt to dist_n and check whether this velocity integral has exceeded the maximum distance for the bin, i.e. sweepDist/(binFactor^n). If so, activate; if not, there is nothing to do.
 At the full run, once in a while: (a) call NewtonsDampedLaw::assignBins(), which will recompute the maxima for all bins (based on the global maximum... is that OK?); (b) update the stride etc.
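The proposed bookkeeping could be sketched like this. Everything below is a hypothetical illustration of the proposal, not existing Yade code: the class and method names loosely follow the text (bodyBins, maxVelocitySq, assignBins, sweepDist, binFactor), and the per-step integral uses sqrt(maxVelocitySq_n)*dt so that it has the dimension of a distance.

```python
from math import sqrt

class VelocityBins:
    """Sketch of the proposed velocity bins (assumed design, not Yade code)."""
    def __init__(self, n_bins, bin_factor=10.0, sweep_dist=0.05):
        self.n_bins = n_bins
        self.bin_factor = bin_factor
        self.sweep_dist = sweep_dist           # sweep distance of bin 0
        self.bodyBins = []                     # bin number, indexed by body id
        self.maxVelocitySq = [0.0] * n_bins    # per-bin maximum of |v|^2
        self.dist = [0.0] * n_bins             # per-bin velocity integral

    def assign_bins(self, velocities_sq, global_max_sq):
        """Full run (assignBins): put each body into the deepest bin whose
        threshold maxVelocitySq_0/((binFactor^2)^n) still covers it."""
        self.bodyBins = []
        for v2 in velocities_sq:
            n = 0
            while (n + 1 < self.n_bins and
                   v2 <= global_max_sq / (self.bin_factor**2)**(n + 1)):
                n += 1
            self.bodyBins.append(n)
        self.maxVelocitySq = [global_max_sq / (self.bin_factor**2)**n
                              for n in range(self.n_bins)]
        self.dist = [0.0] * self.n_bins        # integrals restart after a run

    def update_max(self, body_id, v2):
        """Every step (in NewtonsDampedLaw): track per-bin maxima."""
        n = self.bodyBins[body_id]
        if v2 > self.maxVelocitySq[n]:
            self.maxVelocitySq[n] = v2

    def needs_run(self, dt):
        """Every step (in isActivated): integrate the per-bin maximum speed
        and compare with the bin's sweep distance sweepDist/binFactor^n."""
        for n in range(self.n_bins):
            self.dist[n] += sqrt(self.maxVelocitySq[n]) * dt
            if self.dist[n] > self.sweep_dist / self.bin_factor**n:
                return True
        return False
```

With binFactor=10, a body at |v|^2=400 lands in bin 0, one at 4 in bin 1, one at 0.04 in bin 2, so the slow bodies get bboxes enlarged 100× less than the fastest ones.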
I think I get the idea. If I understand, this approach would be especially useful for simulations where some bodies are very dynamic while others are almost static. In such a case, it sounds like a really good idea. I don't want to restrain your enthusiastic implementation spree, however I'm wondering what the cost of all this in CPU time will be, compared to what will be gained for typical dense packing simulations, where bodies have velocities of the same order of magnitude (I guess). Thank you for the graphs. :) bchareyre 22:00, 14 July 2009 (UTC)
Comments
 running on 50k spheres, steps 1000-3500 (during the first 1000 there is so much motion that striding is not active), I get these results (clearly there is much more benefit as the packing slows down) eudoxos 08:16, 11 July 2009 (UTC)
 plain:  350s total; 103s (74%, 500 runs), 101s (61%, 500 runs), 79s (47%, 500 runs), 65s (32%, 500 runs)
 stride: 291s total;  96s (71%, 374 runs),  82s (49%, 156 runs), 62s (27%,  84 runs), 51s ( 8%,  34 runs)   # with .1*rSphere


 So, if I understand correctly, the collider is not the bottleneck in terms of speed now, even for large numbers of spheres. Do you agree? bchareyre 08:43, 11 July 2009 (UTC)
 More or less; BUT it just allows the number of spheres to grow further, and as soon as you get to 1e6 particles (since that will become doable), the problem will be there again (OK, that does not seem so realistic for now). ~~~~

 I'm curious to know what the distributions of {v(i)}, {v(i)-v(j)}, and {sum of (v(i)-v(j)) over the interval} look like. It might be possible to define the probability that the distance 2*sweepLength between two bodies is closed during the interval, as a function of the number of spheres and maxVelocitySq (I have to check what maxVelocitySq is exactly). In that case, you could set a probabilistic value intervalP, corresponding to a probability of "missing" a contact below 10e-10. In some cases intervalP could be significantly larger than interval (especially for dense and slow packings). The only real arbiter in the end is stability and the impact on the macroscopic response. bchareyre 22:37, 10 July 2009 (UTC)
 (note that the x-axis range changes a lot):
from math import sqrt
import pylab
# histogram of body speeds; 'newton' and 'loadFrom' come from the surrounding script
pylab.hist([sqrt(sum([v**2 for v in b.phys['velocity']])) for b in O.bodies])
pylab.title('step %d; max speed %g'%(O.iter,sqrt(newton['maxVelocitySq'])))
pylab.grid()
pylab.savefig('%s%05d.png'%(loadFrom,O.iter),dpi=40)
 BTW for my own code, the histogram is still much less dispersed: e.g. maximum speed 0.47 and (visually) all 25k spheres in the 0-0.02 bin! This will benefit hugely from the binning explained above (it will decrease the number of potential interactions). ~~~~