In this chapter, we mathematically describe general features of explicit DEM simulations, with some reference to the Yade implementation of these algorithms. They are given roughly in the order in which they appear in a simulation; first, two particles might establish a new interaction, which consists of

- detecting collision between particles;
- creating a new interaction and determining its properties (such as stiffness); they are either precomputed or derived from properties of both particles.

Then, for already existing interactions, the following is performed:

- strain evaluation;
- stress computation based on strains;
- force application to particles in interaction.

This simplified description serves only to give meaning to the ordering of sections within this chapter. A more detailed description of this *simulation loop* is given later.

In this chapter we refer to kinematic variables of the contacts as “strains”, although at this scale it is also common to speak of “displacements”. Which term is more appropriate depends on the conceptual model one is starting from, and therefore it cannot be decided independently of specific problems. The reader familiar with displacements can mentally replace normal strain and shear strain by normal displacement and shear displacement, respectively, without altering the meaning of what follows.

Exact computation of collision configuration between two particles can be relatively expensive (for instance between Sphere and Facet). Taking a general pair of bodies $i$ and $j$ and their “exact” (in the sense of precision admissible by numerical implementation) spatial predicates (called Shape in Yade) represented by point sets $P_i$, $P_j$, the detection generally proceeds in 2 passes:

1. Fast collision detection using approximate predicates $\tilde{P}_i$ and $\tilde{P}_j$; they are pre-constructed in such a way as to abstract away individual features of $P_i$ and $P_j$ and satisfy the condition

   $$\forall \vec{x} \in \mathbb{R}^3: \vec{x} \in P_i \Rightarrow \vec{x} \in \tilde{P}_i \tag{1}$$

   (likewise for $P_j$). The approximate predicate is called “bounding volume” (Bound in Yade) since it bounds any particle’s volume from outside (by virtue of the implication). It follows that $(P_i \cap P_j) \neq \emptyset \Rightarrow (\tilde{P}_i \cap \tilde{P}_j) \neq \emptyset$ and, by applying *modus tollens*,

   $$(\tilde{P}_i \cap \tilde{P}_j) = \emptyset \Rightarrow (P_i \cap P_j) = \emptyset \tag{2}$$

   which is a candidate exclusion rule in the proper sense.

2. By filtering away impossible collisions in (2), more expensive, exact collision detection algorithms can be run on possible interactions, filtering out remaining spurious couples $(\tilde{P}_i \cap \tilde{P}_j) \neq \emptyset \wedge (P_i \cap P_j) = \emptyset$. These algorithms operate on $P_i$ and $P_j$ and have to be able to handle all possible combinations of shape types.

It is only the first step we are concerned with here.

Collision evaluation algorithms have been the subject of extensive research in fields such as robotics, computer graphics and simulations. They can be roughly divided in two groups:

- Hierarchical algorithms recursively subdivide space and restrict the number of approximate checks in the first pass, knowing that lower-level bounding volumes can intersect only if they are part of the same higher-level bounding volume. Hierarchy elements are bounding volumes of different kinds: octrees [Jung1997], bounding spheres [Hubbard1996], k-DOPs [Klosowski1998].
- Flat algorithms work directly with bounding volumes without grouping them in hierarchies first; let us only mention two kinds commonly used in particle simulations:

  - Sweep and prune algorithms operate on axis-aligned bounding boxes, which overlap if and only if they overlap along all axes. These algorithms have roughly $O(n \log n)$ complexity, where $n$ is the number of particles, as long as they exploit *temporal coherence* of the simulation.
  - Grid algorithms represent continuous space by a finite set of regularly spaced points, leading to very fast neighbor search; they can reach $O(n)$ complexity [Munjiza1998], and recent research suggests ways to overcome one of the major drawbacks of this method, which is the necessity to adjust grid cell size to the largest particle in the simulation ([Munjiza2006], the “multistep” extension).

*Temporal coherence* expresses the fact that motion of particles in simulation is not arbitrary but governed by physical laws. This knowledge can be exploited to optimize performance.

Numerical stability of integrating motion equations dictates an upper limit on $\Delta t$ (sect. *Stability considerations*) and, by consequence, on displacement of particles during one step. This consideration is taken into account in [Munjiza2006], implying that no particle may move further than to a neighboring grid cell during one step, which allows the $O(n)$ complexity; it is also explored in the periodic variant of the sweep and prune algorithm described below.

On a finer level, it is common to enlarge the $\tilde{P}_i$ predicates in such a way that they satisfy the (1) condition during *several* timesteps; the first collision detection pass might then be run only once every several steps (with a stride), speeding up the simulation considerably. The original publication of this optimization by Verlet [Verlet1967] used an enlarged list of neighbors, giving this technique the name *Verlet list*. In general cases, however, where neighbor lists are not necessarily used, the term *Verlet distance* is employed.

Let us describe in detail the sweep and prune algorithm used for collision detection in Yade (class InsertionSortCollider). Axis-aligned bounding boxes (Aabb) are used as $\tilde{P}_i$; each Aabb is given by its lower and upper corner $\in \mathbb{R}^3$ (in the following, $\tilde{P}_i^{x0}$, $\tilde{P}_i^{x1}$ are minimum/maximum coordinates of $\tilde{P}_i$ along the $x$-axis and so on). Construction of Aabb from various particle Shapes (such as Sphere, Facet, Wall) is straightforward, handled by appropriate classes deriving from BoundFunctor (Bo1_Sphere_Aabb, Bo1_Facet_Aabb, …).

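Presence of overlap of two Aabbs can be determined from the conjunction of separate overlaps of intervals along each axis (fig-sweep-and-prune):

$$(\tilde{P}_i \cap \tilde{P}_j) \neq \emptyset \Leftrightarrow \bigwedge_{w \in \{x,y,z\}} \left[ \left(\left(\tilde{P}_i^{w0}, \tilde{P}_i^{w1}\right)\right) \cap \left(\left(\tilde{P}_j^{w0}, \tilde{P}_j^{w1}\right)\right) \neq \emptyset \right]$$

where $((a, b))$ denotes the interval in $\mathbb{R}$.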
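The two passes can be illustrated with a minimal sketch for spheres. The helper names (`sphere_aabb`, `aabb_overlap`, `spheres_overlap`) are hypothetical and are not Yade's API; closed intervals are assumed for the per-axis test.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (lower corner, upper corner)


def sphere_aabb(center: Vec3, radius: float) -> Box:
    """Bounding box that encloses the sphere from outside."""
    lower = tuple(c - radius for c in center)
    upper = tuple(c + radius for c in center)
    return lower, upper


def aabb_overlap(a: Box, b: Box) -> bool:
    """First pass: boxes overlap iff their intervals overlap on every axis."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return all(a_lo[w] <= b_hi[w] and b_lo[w] <= a_hi[w] for w in range(3))


def spheres_overlap(c1: Vec3, r1: float, c2: Vec3, r2: float) -> bool:
    """Second pass: exact test for two spheres."""
    dist_sq = sum((p - q) ** 2 for p, q in zip(c1, c2))
    return dist_sq <= (r1 + r2) ** 2


# Two unit spheres placed diagonally: the boxes overlap, the spheres do not.
c1, c2 = (0.0, 0.0, 0.0), (1.5, 1.5, 0.0)
candidate = aabb_overlap(sphere_aabb(c1, 1.0), sphere_aabb(c2, 1.0))
exact = spheres_overlap(c1, 1.0, c2, 1.0)
```

Here the pair survives the first pass as a candidate and is rejected only by the exact test, which is the kind of spurious couple the second pass exists to filter out.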

The collider keeps 3 separate lists (arrays) $L_w$, one for each axis $w \in \{x, y, z\}$,

$$L_w = \bigcup_i \left\{ \tilde{P}_i^{w0}, \tilde{P}_i^{w1} \right\}$$

where $i$ traverses all particles. The $L_w$ arrays (sorted sets) contain respective coordinates of minimum and maximum corners for each Aabb (we call these coordinates *bounds* in the following); besides the bound, each list element further carries an `id` referring to the particle it belongs to, and a flag indicating whether it is a lower or an upper bound.

In the initial step, all lists are sorted (using quicksort, average $O(n \log n)$) and one axis is used to create initial interactions: the range between lower and upper bound for each body is traversed, while bounds in-between indicate potential Aabb overlaps which must be checked on the remaining axes as well.

At each successive step, lists are already pre-sorted. Inversions occur where a particle’s coordinate has just crossed another particle’s coordinate; this number is limited by numerical stability of simulation and its physical meaning (giving spatio-temporal coherence to the algorithm). The insertion sort algorithm swaps neighboring elements if they are inverted, and has complexity between $O(n)$ and $O(n^2)$, for pre-sorted and unsorted lists respectively. For our purposes, we need only to handle inversions, which by nature of the sort algorithm are detected inside the sort loop. An inversion might signify:

- Beginning of overlap along the current axis, if an upper bound inverts (swaps) with a lower bound (i.e. the upper bound with a higher coordinate was out of order in coming before the lower bound with a lower coordinate). Overlap along the other 2 axes is checked and if there is overlap along all axes, a new potential interaction is created.
- End of overlap along the current axis, if a lower bound inverts (swaps) with an upper bound. If there is only a potential interaction between the two particles in question, it is deleted.
- Nothing, if both bounds are upper or both lower.

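Let us show the sort algorithm on a sample sequence of numbers:

$$\| \; 3 \mid 7 \mid 2 \mid 4 \; \|$$

Elements are traversed from left to right; each of them keeps inverting (swapping) with neighbors to the left, moving left itself, until either of the following conditions is satisfied:

- ($\leq$) the sorting order with the left neighbor is correct, or
- ($\|$) the element is at the beginning of the sequence.

We start at the leftmost element (the current element is marked $\boxed{\,\cdot\,}$):

$$\| \; \boxed{3} \mid 7 \mid 2 \mid 4 \; \|$$

It obviously immediately satisfies ($\|$), and we move to the next element:

$$\| \; 3 \mid \boxed{7} \mid 2 \mid 4 \; \|$$

Condition ($\leq$) holds, therefore we move to the right. The $2$ is not in order (violating ($\leq$)) and two inversions take place; after that, ($\|$) holds:

$$\| \; 3 \mid 7 \mid \boxed{2} \mid 4 \; \| \;\rightarrow\; \| \; 3 \mid \boxed{2} \mid 7 \mid 4 \; \| \;\rightarrow\; \| \; \boxed{2} \mid 3 \mid 7 \mid 4 \; \|$$

The last element $4$ first violates ($\leq$), but satisfies it after one inversion:

$$\| \; 2 \mid 3 \mid 7 \mid \boxed{4} \; \| \;\rightarrow\; \| \; 2 \mid 3 \mid \boxed{4} \mid 7 \; \|$$

All elements having been traversed, the sequence is now sorted.

It is obvious that if the initial sequence were sorted, elements would only have to be traversed without any inversion to handle (that happens in $O(n)$ time).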

For each inversion during the sort in simulation, the function that investigates change in Aabb overlap is invoked, creating or deleting interactions.
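The sort loop with an inversion hook can be sketched as follows. This is an illustration of the idea for a single axis only, not the InsertionSortCollider code; the tuple layout `(coordinate, id, is_lower)` and the names `insertion_sort_bounds` and `record` are assumptions made for the example.

```python
from typing import Callable, List, Tuple

# One list element: (coordinate, particle id, True for a lower bound)
Bound = Tuple[float, int, bool]


def insertion_sort_bounds(
    bounds: List[Bound], on_inversion: Callable[[Bound, Bound], None]
) -> int:
    """Sort bounds in place by coordinate; report every swap of neighbors.

    on_inversion(moved_left, moved_right) is called after each swap.
    Returns the number of inversions handled.
    """
    inversions = 0
    for i in range(1, len(bounds)):
        j = i
        # Move the current element left until its left neighbor is in order
        # or the beginning of the list is reached.
        while j > 0 and bounds[j - 1][0] > bounds[j][0]:
            bounds[j - 1], bounds[j] = bounds[j], bounds[j - 1]
            on_inversion(bounds[j - 1], bounds[j])
            inversions += 1
            j -= 1
    return inversions


events = []


def record(moved_left: Bound, moved_right: Bound) -> None:
    """Classify an inversion along one axis (other axes are not checked here)."""
    if moved_left[1] == moved_right[1]:
        return  # both bounds belong to the same particle
    pair = tuple(sorted((moved_left[1], moved_right[1])))
    if moved_left[2] and not moved_right[2]:
        events.append(("overlap begins", pair))
    elif not moved_left[2] and moved_right[2]:
        events.append(("overlap ends", pair))
    # two lower or two upper bounds: nothing to do


# Particle 0 spans [0, 1]; particle 1 moved from [2, 3] to [0.5, 1.5],
# so the list that was sorted in the previous step now has one inversion.
bounds = [(0.0, 0, True), (1.0, 0, False), (0.5, 1, True), (1.5, 1, False)]
n_inversions = insertion_sort_bounds(bounds, record)
```

In a full collider the "overlap begins" event would still have to be confirmed on the two remaining axes before a potential interaction is created.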

The periodic variant of the sort algorithm is described in *Periodic insertion sort algorithm*, along with other periodic-boundary related topics.

As noted above, [Verlet1967] explored the possibility of running the collision detection only sparsely by enlarging predicates $\tilde{P}_i$.

In Yade, this is achieved by enlarging the Aabb of particles by a fixed relative length (or Verlet’s distance) $\Delta L$ in all dimensions (InsertionSortCollider.sweepLength). Suppose the collider last ran at step $m$ and the current step is $n$. NewtonIntegrator tracks the cumulative distance traversed by each particle between $m$ and $n$ by comparing the current position with the reference position from time $m$ (Bound::refPos),

$$L_{mn} = |\vec{X}^n - \vec{X}^m| \tag{3}$$

triggering the collider re-run as soon as one particle gives:

$$L_{mn} > \Delta L \tag{4}$$

InsertionSortCollider.targetInterv is used to adjust $\Delta L$ independently for each particle. Larger $\Delta L$ will be assigned to the fastest ones, so that all particles would ideally reach the edge of their bounds after this “target” number of iterations. Results of using Verlet distance depend highly on the nature of the simulation and the choice of InsertionSortCollider.targetInterv. Adjusting the sizes independently for each particle is especially efficient if some parts of a problem have high-speed particles while others are not moving. If that is not the case, no significant gain should be expected as compared to targetInterv=0 (assigning the same $\Delta L$ to all particles).

The number of particles and the number of available threads are also to be considered when choosing an appropriate Verlet’s distance. A larger distance will result in less time spent in the collider (which runs single-threaded) and more time in computing interactions (multi-threaded). Typically, large $\Delta L$ will be used for large simulations with more than $10^5$ particles on multi-core computers. On the other hand, simulations with fewer than $10^4$ particles on a single processor will probably benefit from smaller $\Delta L$. User benchmarks may be found on Yade’s wiki (see e.g. https://yade-dem.org/wiki/Colliders_performace).
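The re-run criterion can be sketched as a simple check of each particle's displacement against the enlargement length. The function name `collider_must_rerun` and the use of one common `sweep_length` for all particles are assumptions for illustration; this is not how NewtonIntegrator is actually coded.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def collider_must_rerun(
    ref_positions: Sequence[Vec3], positions: Sequence[Vec3], sweep_length: float
) -> bool:
    """True as soon as one particle has moved farther than sweep_length
    from the position it had when the collider last ran."""
    return any(
        math.dist(ref, cur) > sweep_length
        for ref, cur in zip(ref_positions, positions)
    )


# Reference positions stored at the last collider run, and two later states.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
slow = [(0.05, 0.0, 0.0), (1.0, 0.05, 0.0)]  # everybody still inside the margin
fast = [(0.05, 0.0, 0.0), (1.0, 0.2, 0.0)]   # second particle left its margin
```

With `sweep_length = 0.1`, the `slow` state lets the collider be skipped while the `fast` state forces a re-run.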

Collision detection described above is only approximate. Exact collision detection depends on the geometry of individual particles and is handled separately. In Yade terminology, the Collider creates only *potential* interactions; potential interactions are evaluated exactly using specialized algorithms for collision of two spheres or other combinations. Exact collision detection must be run at every timestep since it is at every step that particles can change their mutual position (the collider is only run sometimes if the Verlet distance optimization is in use). Some exact collision detection algorithms are described in *Kinematic variables*; in Yade, they are implemented in classes deriving from IGeomFunctor (prefixed with `Ig2`).

Besides detection of geometrical overlap (which corresponds to IGeom in Yade), there are also non-geometrical properties of the interaction to be determined (IPhys). In Yade, they are computed for every new interaction by calling a functor deriving from IPhysFunctor (prefixed with `Ip2`) which accepts the given combination of Material types of both particles.

Basic DEM interaction defines two stiffnesses: normal stiffness $K_N$ and shear (tangent) stiffness $K_T$. It is desirable that $K_N$ be related to a fictitious Young’s modulus of the particles’ material, while $K_T$ is typically determined as a given fraction of the computed $K_N$. The $K_T/K_N$ ratio determines the macroscopic Poisson’s ratio of the arrangement, which can be shown by dimensional analysis: an elastic continuum has two parameters ($E$ and $\nu$) and the basic DEM model also has 2 parameters with the same dimensions, $K_N$ and $K_T/K_N$; the macroscopic Poisson’s ratio is therefore determined solely by $K_T/K_N$, and the macroscopic Young’s modulus is then proportional to $K_N$ and affected by $K_T/K_N$.

Naturally, such analysis is highly simplifying and does not account for particle radius distribution, packing configuration and other possible parameters such as the interaction radius $R_I$ introduced later.

The algorithm commonly used in Yade computes normal interaction stiffness as stiffness of two springs in serial configuration with lengths equal to the sphere radii (fig-spheres-contact-stiffness).

Let us define distance $l = l_1 + l_2$, where $l_i$ are distances between the contact point and sphere centers, which are initially (roughly speaking) equal to sphere radii. Change of distance between the sphere centers $\Delta l$ is distributed onto deformations of both spheres $\Delta l = \Delta l_1 + \Delta l_2$ proportionally to their compliances. Displacement change $\Delta l_i$ generates force $F_i = K_i \Delta l_i$, where $K_i$ assures proportionality and has physical meaning and dimension of stiffness; $K_i$ is related to the sphere material modulus $E_i$ and some length $\tilde{l}_i$ proportional to $r_i$.

The most used class computing interaction properties, Ip2_FrictMat_FrictMat_FrictPhys, uses $\tilde{l}_i = 2 r_i$.

Some formulations define an equivalent cross-section $A_{eq}$, which in that case appears in the $\tilde{l}_i$ term as $K_i = E_i \tilde{l}_i = E_i A_{eq} / l_i$. Such is the case for the concrete model (Ip2_CpmMat_CpmMat_CpmPhys), where $A_{eq}$ is derived from $\min(r_1, r_2)$.
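The serial-spring idea can be made concrete with a short sketch. Each sphere is treated as a spring of stiffness $K_i = E_i \tilde{l}_i$ and the two are combined in series, $1/K_N = 1/K_1 + 1/K_2$. The function name `contact_stiffness`, the default `length_factor` of 2 (i.e. $\tilde{l}_i = 2 r_i$) and the `shear_ratio` value are assumptions for illustration, not values taken from Yade's source.

```python
def contact_stiffness(E1, r1, E2, r2, length_factor=2.0, shear_ratio=0.5):
    """Return (K_N, K_T) for two spheres modeled as two springs in series.

    length_factor: assumed proportionality between the spring length and
                   the sphere radius (hypothetical default).
    shear_ratio:   assumed K_T / K_N ratio (hypothetical default).
    """
    k1 = E1 * length_factor * r1  # stiffness attributed to sphere 1
    k2 = E2 * length_factor * r2  # stiffness attributed to sphere 2
    k_n = k1 * k2 / (k1 + k2)     # serial configuration: 1/K_N = 1/K_1 + 1/K_2
    k_t = shear_ratio * k_n       # shear stiffness as a fraction of K_N
    return k_n, k_t


# Two identical spheres: the contact is half as stiff as each single "spring".
k_n_equal, k_t_equal = contact_stiffness(1e7, 0.1, 1e7, 0.1)
# A three times stiffer partner raises K_N, but never above the softer spring.
k_n_mixed, _ = contact_stiffness(1e7, 0.1, 3e7, 0.1)
```

The serial combination is always smaller than either individual stiffness, so the softer sphere dominates the contact.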

For reasons given above, no pretense about equality of particle-level and macroscopic modulus should be made. Some formulations, such as [Hentz2003], introduce parameters to match them numerically. This is not appropriate, in our opinion, since it binds those values to particular features of the sphere arrangement that was used for calibration.

Non-elastic parameters differ for various material models. Usually, though, they are averaged from the particles’ material properties, if it makes sense. For instance, Ip2_CpmMat_CpmMat_CpmPhys averages most quantities, while Ip2_FrictMat_FrictMat_FrictPhys computes the internal friction angle as $\varphi = \min(\varphi_1, \varphi_2)$ to avoid friction with bodies that are frictionless.

In the general case, mutual configuration of two particles has 6 degrees of freedom (DoFs) just like a beam in 3D space: both particles have 6 DoFs each, but the interaction itself is free to move and rotate in space (with both spheres), having 6 DoFs itself; then $12 - 6 = 6$. They are shown at fig-spheres-dofs.

We will only describe normal and shear components of the relative movement in the following, leaving torsion and bending aside. The reason is that most constitutive laws for contacts do not use the latter two.

Let us consider two spheres with *initial* centers $\bar{\vec{C}}_1$, $\bar{\vec{C}}_2$ and radii $r_1$, $r_2$ that enter into contact. The order of spheres within the contact is arbitrary and has no influence on the behavior. Then we define lengths

$$d_0 = |\bar{\vec{C}}_2 - \bar{\vec{C}}_1|, \qquad d_1 = r_1 + \frac{d_0 - r_1 - r_2}{2}, \qquad d_2 = d_0 - d_1.$$

These quantities are *constant* throughout the life of the interaction and are computed only once when the interaction is established. The distance $d_0$ is the *reference distance* and is used for the conversion of absolute displacements to dimensionless strain, for instance. It is also the distance where (for usual contact laws) there is neither repulsive nor attractive force between the spheres, whence the name *equilibrium distance*.

Distances $d_1$ and $d_2$ define reduced (or expanded) radii of spheres; geometrical radii $r_1$ and $r_2$ are used only for collision detection and may not be the same as $d_1$ and $d_2$, as shown in fig. fig-sphere-sphere. This difference is exploited in cases where the average number of contacts between spheres should be increased, e.g. to influence the response in compression or to stabilize the packing. In such case, interactions will be created also for spheres that do not geometrically overlap, based on the *interaction radius* $R_I$, a dimensionless parameter determining “non-locality” of contact detection. For $R_I = 1$, only spheres that touch are considered in contact; the general condition reads

$$d_0 \leq R_I (r_1 + r_2). \tag{5}$$

The value of $R_I$ directly influences the average number of interactions per sphere (percolation), which for some models is necessary in order to achieve realistic results. In such cases, Aabb (or $\tilde{P}_i$ predicates in general) must be enlarged accordingly (Bo1_Sphere_Aabb.aabbEnlargeFactor).
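A minimal sketch of how the constant contact lengths might be computed when an interaction is established, including the interaction-radius test. The function name `initial_contact_lengths` is hypothetical, and the split of the gap (or overlap) into equal halves between both spheres is an assumption chosen so that the contact point falls in the middle of the overlap zone.

```python
import math


def initial_contact_lengths(c1, r1, c2, r2, interaction_radius=1.0):
    """Return (d0, d1, d2) for a new contact, or None if the spheres are
    too far apart for the given dimensionless interaction radius."""
    d0 = math.dist(c1, c2)  # reference (equilibrium) distance
    if d0 > interaction_radius * (r1 + r2):
        return None  # no interaction is created
    # Assumed split: each sphere takes half of the initial gap or overlap.
    d1 = r1 + (d0 - r1 - r2) / 2
    d2 = d0 - d1
    return d0, d1, d2


# Unit spheres with a gap of 0.2: no contact unless the radius is enlarged.
no_contact = initial_contact_lengths((0, 0, 0), 1.0, (2.2, 0, 0), 1.0)
non_local = initial_contact_lengths((0, 0, 0), 1.0, (2.2, 0, 0), 1.0,
                                    interaction_radius=1.2)
# Overlapping spheres of different size: the reduced radii differ from r1, r2.
overlapping = initial_contact_lengths((0, 0, 0), 1.0, (2.8, 0, 0), 2.0)
```

Note that `d1 + d2` always equals `d0`, so the two reduced radii meet exactly at the contact point.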

Some constitutive laws are formulated with strains and stresses (Law2_ScGeom_CpmPhys_Cpm, the concrete model described later, for instance); in that case, an equivalent cross-section of the contact $A_{eq}$ must be introduced for the sake of dimensionality. The exact definition is rather arbitrary; the CPM model (Ip2_CpmMat_CpmMat_CpmPhys) uses the relation

$$A_{eq} = \pi \min(r_1, r_2)^2 \tag{6}$$

which will be used to convert stresses to forces, if the constitutive law used is formulated in terms of stresses and strains. Note that values other than $\pi$ can be used; this will merely scale macroscopic packing stiffness; it is only for the intuitive notion of a truss-like element between the particle centers that we choose $\pi$, representing the circle area. Besides that, a function other than $\min(r_1, r_2)$ can be used, although the result should depend linearly on $r_1$ and $r_2$ so that the equation gives consistent results if the particle dimensions are scaled.

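The following state variables are updated as spheres undergo motion during the simulation (as $\vec{C}_1$ and $\vec{C}_2$ change):

$$\vec{n} = \frac{\vec{C}_2 - \vec{C}_1}{|\vec{C}_2 - \vec{C}_1|} \tag{7}$$

and

$$\vec{C} = \vec{C}_1 + \left(d_1 - \frac{d_0 - |\vec{C}_2 - \vec{C}_1|}{2}\right)\vec{n}. \tag{8}$$

The contact point $\vec{C}$ is always in the middle of the spheres’ overlap zone (even if the overlap is negative, when it is in the middle of the empty space between the spheres). The *contact plane* is always perpendicular to the contact plane normal $\vec{n}$ and passes through $\vec{C}$.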

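Normal displacement and strain can be defined as

$$u_N = |\vec{C}_2 - \vec{C}_1| - d_0, \qquad \varepsilon_N = \frac{u_N}{d_0} = \frac{|\vec{C}_2 - \vec{C}_1|}{d_0} - 1.$$

Since $\vec{u}_N$ is always aligned with $\vec{n}$, it can be stored as a scalar value $u_N$ multiplied by $\vec{n}$ if necessary.

For massively compressive simulations, it might be beneficial to use the logarithmic strain, such that the strain tends to $-\infty$ (rather than $-1$) as centers of both spheres approach. Otherwise, repulsive force would remain finite and the spheres could penetrate through each other. Therefore, we can adjust the definition of normal strain as follows:

$$\varepsilon_N = \begin{cases} \log\left(\dfrac{|\vec{C}_2 - \vec{C}_1|}{d_0}\right) & \text{if } |\vec{C}_2 - \vec{C}_1| < d_0, \\[2ex] \dfrac{|\vec{C}_2 - \vec{C}_1|}{d_0} - 1 & \text{otherwise.} \end{cases}$$

Such a definition, however, has the disadvantage of effectively increasing rigidity (up to infinity) of contacts, requiring $\Delta t$ to be adjusted, lest the simulation become unstable. Such dynamic adjustment is possible using a stiffness-based time-stepper (GlobalStiffnessTimeStepper in Yade).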
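The difference between the linear and the logarithmic normal strain is easy to see numerically. The sketch below assumes the linear strain is the relative change of center distance with respect to the reference distance, and that the logarithmic form is used only in compression; the function name `normal_strain` is hypothetical.

```python
import math


def normal_strain(c1, c2, d0, logarithmic=False):
    """Normal strain of a contact with reference distance d0.

    With logarithmic=True, compression uses log(d/d0), which tends to
    minus infinity as the centers approach (d must stay positive).
    """
    d = math.dist(c1, c2)
    if logarithmic and d < d0:
        return math.log(d / d0)
    return d / d0 - 1.0


# Centers pushed to half of the reference distance.
linear = normal_strain((0, 0, 0), (1.0, 0, 0), d0=2.0)
log_strain = normal_strain((0, 0, 0), (1.0, 0, 0), d0=2.0, logarithmic=True)
# In tension both definitions coincide.
tension = normal_strain((0, 0, 0), (3.0, 0, 0), d0=2.0, logarithmic=True)
```

At half the reference distance the linear strain is $-0.5$ while the logarithmic one is already $\log 0.5 \approx -0.69$, which illustrates the stiffening in compression mentioned above.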

In order to keep $\vec{u}_T$ consistent (e.g. $\vec{u}_T$ must be constant if two spheres retain mutually constant configuration but move arbitrarily in space), either $\vec{u}_T$ must track the spheres’ spatial motion or it must (somehow) rely on sphere-local data exclusively.

Geometrical meaning of shear strain is shown in fig-shear-2d.

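The classical incremental algorithm is widely used in DEM codes and is described frequently ([Luding2008], [Alonso2004]). Yade implements this algorithm in the ScGeom class. At each step, shear displacement $\vec{u}_T$ is updated; the update increment can be decomposed in 2 parts: motion of the interaction (i.e. $\vec{C}$ and $\vec{n}$) in global space and mutual motion of spheres.

1. The contact moves due to changes of the spheres’ positions $\vec{C}_1$ and $\vec{C}_2$, which updates current $\vec{C}^\circ$ and $\vec{n}^\circ$ as per (8) and (7). $\vec{u}_T^-$ lies in the contact plane of the previous step (it is perpendicular to $\vec{n}^-$) and must be updated so that $\vec{u}_T^- + (\Delta\vec{u}_T) = \vec{u}_T^\circ \perp \vec{n}^\circ$; this is done by perpendicular projection to the plane first (which might decrease $|\vec{u}_T|$) and adding what corresponds to spatial rotation of the interaction instead:

   $$(\Delta\vec{u}_T)_1 = -\vec{u}_T^- \times (\vec{n}^- \times \vec{n}^\circ), \qquad (\Delta\vec{u}_T)_2 = -\vec{u}_T^- \times \left(\frac{\Delta t}{2}\, \vec{n}^\circ \cdot (\vec{\omega}_1^\ominus + \vec{\omega}_2^\ominus)\right) \vec{n}^\circ$$

2. Mutual movement of spheres, using only its part perpendicular to $\vec{n}^\circ$; $\vec{v}_{12}$ denotes mutual velocity of spheres at the contact point:

   $$\begin{aligned} \vec{v}_{12} &= \left(\vec{v}_2^\ominus + \vec{\omega}_2^\ominus \times (-d_2 \vec{n}^\circ)\right) - \left(\vec{v}_1^\ominus + \vec{\omega}_1^\ominus \times (d_1 \vec{n}^\circ)\right) \\ \vec{v}_{12}^\perp &= \vec{v}_{12} - (\vec{n}^\circ \cdot \vec{v}_{12})\,\vec{n}^\circ \\ (\Delta\vec{u}_T)_3 &= -\Delta t\, \vec{v}_{12}^\perp \end{aligned}$$

Finally, we compute

$$\vec{u}_T^\circ = \vec{u}_T^- + (\Delta\vec{u}_T)_1 + (\Delta\vec{u}_T)_2 + (\Delta\vec{u}_T)_3.$$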

The kinematic variables of an interaction are used to determine the forces acting on both spheres via a constitutive law. In DEM generally, some constitutive laws are expressed using strains and stresses while others prefer displacement/force formulation. The law described here falls in the latter category.

The constitutive law presented here is the most common in DEM, originally proposed by Cundall. While the kinematic variables are described in the previous section regardless of the contact model, the force evaluation depends on the nature of the material being modeled. The constitutive law presented here is the simplest non-cohesive elastic-frictional contact model, which Yade implements in Law2_ScGeom_FrictPhys_CundallStrack (all constitutive laws derive from base class LawFunctor).

When a new contact is established (discussed in *Engines*), it has its properties (IPhys) computed from the Materials associated with both particles. In the simple case of frictional material FrictMat, Ip2_FrictMat_FrictMat_FrictPhys creates a new FrictPhys instance, which defines normal stiffness $K_N$, shear stiffness $K_T$ and friction angle $\varphi$.

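At each step, given normal and shear displacements $u_N$, $\vec{u}_T$, normal and shear forces are computed (if $u_N > 0$, the contact is deleted without generating any forces):

$$\vec{F}_N = K_N u_N \vec{n}, \qquad \vec{F}_T^t = K_T \vec{u}_T$$

where $\vec{F}_N$ is the normal force and $\vec{F}_T^t$ is the trial shear force. A simple non-associated stress return algorithm is applied to compute the final shear force

$$\vec{F}_T = \begin{cases} \vec{F}_T^t \dfrac{|\vec{F}_N| \tan\varphi}{|\vec{F}_T^t|} & \text{if } |\vec{F}_T^t| > |\vec{F}_N| \tan\varphi, \\[2ex] \vec{F}_T^t & \text{otherwise.} \end{cases}$$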

The total force $\vec{F} = \vec{F}_N + \vec{F}_T$ is then applied to both particles; each particle accumulates forces and torques acting on it in the course of each step. Because the computed force acts at the contact point $\vec{C}$, which is different from the spheres’ centers, the torque generated by $\vec{F}$ must also be considered.
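The force evaluation with the frictional limit can be sketched in a few lines. This is an illustration of an elastic-frictional law of the kind described above, not the Law2_ScGeom_FrictPhys_CundallStrack source; the function name `elastic_frictional_force` and the tuple-based vectors are assumptions, and the application of the force and torque to the two particles is deliberately left out.

```python
import math


def elastic_frictional_force(u_n, u_t, normal, k_n, k_t, friction_angle):
    """Return (F_N, F_T) as 3-tuples, or None if the contact is open.

    u_n:    scalar normal displacement (negative in compression)
    u_t:    shear displacement vector lying in the contact plane
    normal: unit contact normal
    """
    if u_n > 0:
        return None  # spheres separated: contact is deleted, no force
    f_n = tuple(k_n * u_n * n for n in normal)
    f_t = tuple(k_t * u for u in u_t)  # trial shear force
    f_n_norm = math.sqrt(sum(c * c for c in f_n))
    f_t_norm = math.sqrt(sum(c * c for c in f_t))
    f_t_max = f_n_norm * math.tan(friction_angle)
    if f_t_norm > f_t_max:
        # Sliding: scale the trial force back onto the friction limit.
        f_t = tuple(c * f_t_max / f_t_norm for c in f_t)
    return f_n, f_t


phi = math.atan(0.5)  # friction coefficient 0.5
sticking = elastic_frictional_force(-0.01, (0.0, 0.001, 0.0), (1.0, 0.0, 0.0),
                                    1e6, 5e5, phi)
sliding = elastic_frictional_force(-0.01, (0.0, 0.02, 0.0), (1.0, 0.0, 0.0),
                                   1e6, 5e5, phi)
opened = elastic_frictional_force(0.01, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                                  1e6, 5e5, phi)
```

In the sliding case the trial shear force of magnitude $10^4$ is cut down to $|\vec{F}_N| \tan\varphi = 5 \cdot 10^3$, keeping its direction.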

Each particle accumulates generalized forces (forces and torques) from the contacts in which it participates. These generalized forces are then used to integrate motion equations for each particle separately; therefore, we omit $i$ indices denoting the $i$-th particle in this section.

The customary leapfrog scheme (also known as the Verlet scheme) is used, with some adjustments for rotation of non-spherical particles, as explained below. The “leapfrog” name comes from the fact that even derivatives of position/orientation are known at on-step points, whereas odd derivatives are known at mid-step points. Let us recall that we use $a^-$, $a^\circ$, $a^+$ for on-step values of $a$ at $t - \Delta t$, $t$ and $t + \Delta t$ respectively; and $a^\ominus$, $a^\oplus$ for mid-step values of $a$ at $t - \Delta t/2$, $t + \Delta t/2$.

Described integration algorithms are implemented in the NewtonIntegrator class in Yade.

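Integrating motion consists in using current acceleration $\ddot{\vec{u}}^\circ$ on a particle to update its position from the current value $\vec{u}^\circ$ to its value at the next timestep $\vec{u}^+$. Computation of acceleration, knowing current forces $\vec{F}$ acting on the particle in question and its mass $m$, is simply

$$\ddot{\vec{u}}^\circ = \vec{F}/m.$$

Using the 2nd order finite difference with step $\Delta t$, we obtain

$$\ddot{\vec{u}}^\circ \cong \frac{\vec{u}^- - 2\vec{u}^\circ + \vec{u}^+}{\Delta t^2}$$

from which we express

$$\vec{u}^+ = 2\vec{u}^\circ - \vec{u}^- + \ddot{\vec{u}}^\circ \Delta t^2 = \vec{u}^\circ + \Delta t \underbrace{\left(\frac{\vec{u}^\circ - \vec{u}^-}{\Delta t} + \ddot{\vec{u}}^\circ \Delta t\right)}_{(\dagger)}.$$

Typically, $\vec{u}^-$ is no longer known (only $\vec{u}^\circ$ is); we notice, however, that

$$\dot{\vec{u}}^\ominus \simeq \frac{\vec{u}^\circ - \vec{u}^-}{\Delta t},$$

i.e. the mean velocity during the previous step, which is known. Plugging this approximation into the $(\dagger)$ term, we also notice that mean velocity during the current step can be approximated as

$$\dot{\vec{u}}^\oplus \simeq \dot{\vec{u}}^\ominus + \ddot{\vec{u}}^\circ \Delta t,$$

which is $(\dagger)$; we arrive finally at

$$\vec{u}^+ = \vec{u}^\circ + \Delta t \left(\dot{\vec{u}}^\ominus + \ddot{\vec{u}}^\circ \Delta t\right).$$

The algorithm can then be written down by first computing current mean velocity $\dot{\vec{u}}^\oplus$, which we need to store for the next step (just as we use its old value $\dot{\vec{u}}^\ominus$ now), then computing the position for the next time step $\vec{u}^+$:

$$\begin{aligned} \dot{\vec{u}}^\oplus &= \dot{\vec{u}}^\ominus + \ddot{\vec{u}}^\circ \Delta t \\ \vec{u}^+ &= \vec{u}^\circ + \dot{\vec{u}}^\oplus \Delta t. \end{aligned} \tag{9}$$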

Positions are known at times $i \Delta t$ (if $\Delta t$ is constant) while velocities are known at $i \Delta t + \frac{\Delta t}{2}$. The fact that they interleave (jump over each other) in such a way gave rise to the colloquial name “leapfrog” scheme.
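The position update can be tried out on a one-dimensional harmonic oscillator. The sketch below is a scalar illustration of the leapfrog step (store the mid-step velocity, then advance the position), not the NewtonIntegrator code; `leapfrog_step` and `simulate_oscillator` are hypothetical names and the initial mid-step velocity is simply assumed to be zero.

```python
def leapfrog_step(u, v_mid, force, mass, dt):
    """Advance one step: returns (next position, next mid-step velocity)."""
    accel = force / mass                # current on-step acceleration
    v_mid_next = v_mid + accel * dt     # mid-step velocity of the current step
    u_next = u + v_mid_next * dt        # on-step position at the next step
    return u_next, v_mid_next


def simulate_oscillator(stiffness, mass, u0, dt, steps):
    """Integrate m*u'' = -K*u starting at rest from displacement u0."""
    u, v_mid = u0, 0.0
    history = [u]
    for _ in range(steps):
        u, v_mid = leapfrog_step(u, v_mid, -stiffness * u, mass, dt)
        history.append(u)
    return history


u1, v1 = leapfrog_step(1.0, 0.0, -1.0, 1.0, 0.1)
history = simulate_oscillator(stiffness=1.0, mass=1.0, u0=1.0, dt=0.1, steps=1000)
```

With a timestep well below the stability limit, the amplitude stays bounded over many periods instead of drifting, which is the main practical appeal of the scheme.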

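Updating particle orientation $q^\circ$ proceeds in an analogous way to position update. First, we compute current angular acceleration $\dot{\vec{\omega}}^\circ$ from known current torque $\vec{T}$. For spherical particles where the inertia tensor $\mathbf{I}$ is diagonal in any orientation (therefore also in current global orientation), satisfying $\mathbf{I}_{11} = \mathbf{I}_{22} = \mathbf{I}_{33}$, we can write

$$\dot{\vec{\omega}}^\circ_i = \frac{\vec{T}_i}{\mathbf{I}_{11}}.$$

We use the same approximation scheme, obtaining an equation analogous to (9)

$$\vec{\omega}^\oplus = \vec{\omega}^\ominus + \Delta t\, \dot{\vec{\omega}}^\circ.$$

The quaternion $\Delta q$ representing rotation vector $\vec{\omega}^\oplus \Delta t$ is constructed, i.e. such that its rotation angle is $|\vec{\omega}^\oplus| \Delta t$ and its rotation axis is $\vec{\omega}^\oplus / |\vec{\omega}^\oplus|$.

Finally, we compute the next orientation $q^+$ by rotation composition

$$q^+ = \Delta q\, q^\circ.$$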

Integrating rotation of aspherical particles is considerably more complicated than their position, as their local reference frame is not inertial. Rotation of a rigid body in the local frame, where the inertia matrix $\mathbf{I}$ is diagonal, is described in the continuous form by Euler’s equations ($i \in \{1, 2, 3\}$ and $i$, $j$, $k$ are subsequent indices):

$$\vec{T}_i = \mathbf{I}_{ii} \dot{\vec{\omega}}_i + (\mathbf{I}_{kk} - \mathbf{I}_{jj}) \vec{\omega}_j \vec{\omega}_k.$$

Due to the presence of the current values of both $\vec{\omega}$ and $\dot{\vec{\omega}}$, they cannot be solved using the standard leapfrog algorithm (that was the case for translational motion and also for the spherical bodies’ rotation, where this equation reduced to $\vec{T} = \mathbf{I} \dot{\vec{\omega}}$).

The algorithm presented here is described by [Allen1989] (pg. 84–89) and was designed by Fincham for molecular dynamics problems; it is based on extending the leapfrog algorithm by mid-step/on-step estimators of quantities known at on-step/mid-step points in the basic formulation. Although it has received criticism and more precise algorithms are known ([Omelyan1999], [Neto2006], [Johnson2008]), this one is currently implemented in Yade for its relative simplicity.

Each body has its local coordinate system based on the principal axes of inertia for that body. We use $\tilde{\bullet}$ to denote vectors in local coordinates. The orientation of the local system is given by the current particle’s orientation $q^\circ$ as a quaternion; this quaternion can be expressed as the (current) rotation matrix $\mathbf{A}$. Therefore, every vector $\vec{a}$ is transformed as $\tilde{\vec{a}} = q \vec{a} q^* = \mathbf{A} \vec{a}$. Since $\mathbf{A}$ is a rotation (orthogonal) matrix, the inverse rotation is $\mathbf{A}^{-1} = \mathbf{A}^T$.

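For the given particle in question, we know

- $\tilde{\mathbf{I}}^\circ$, the (constant) inertia matrix; diagonal, since in local, principal coordinates,
- $\vec{T}^\circ$, the external torque,
- $q^\circ$, the current orientation (and its equivalent rotation matrix $\mathbf{A}$),
- $\vec{\omega}^\ominus$, the mid-step angular velocity,
- $\vec{L}^\ominus$, the mid-step angular momentum; this is an auxiliary variable that must be tracked in addition for use in this algorithm. It will be zero in the initial step.

Our goal is to compute new values of the latter three, that is $\vec{L}^\oplus$, $q^+$, $\vec{\omega}^\oplus$. We first estimate current angular momentum and compute current local angular velocity:

$$\begin{aligned} \vec{L}^\circ &= \vec{L}^\ominus + \vec{T}^\circ \frac{\Delta t}{2}, & \tilde{\vec{L}}^\circ &= \mathbf{A} \vec{L}^\circ, \\ \vec{L}^\oplus &= \vec{L}^\ominus + \vec{T}^\circ \Delta t, & \tilde{\vec{L}}^\oplus &= \mathbf{A} \vec{L}^\oplus, \\ \tilde{\vec{\omega}}^\circ &= \left(\tilde{\mathbf{I}}^\circ\right)^{-1} \tilde{\vec{L}}^\circ, & \tilde{\vec{\omega}}^\oplus &= \left(\tilde{\mathbf{I}}^\circ\right)^{-1} \tilde{\vec{L}}^\oplus. \end{aligned}$$

Then we compute $\dot{q}^\circ$, using $q^\circ$ and $\tilde{\vec{\omega}}^\circ$:

$$\begin{pmatrix} \dot{q}^\circ_w \\ \dot{q}^\circ_x \\ \dot{q}^\circ_y \\ \dot{q}^\circ_z \end{pmatrix} = \frac{1}{2} \begin{pmatrix} q^\circ_w & -q^\circ_x & -q^\circ_y & -q^\circ_z \\ q^\circ_x & q^\circ_w & -q^\circ_z & q^\circ_y \\ q^\circ_y & q^\circ_z & q^\circ_w & -q^\circ_x \\ q^\circ_z & -q^\circ_y & q^\circ_x & q^\circ_w \end{pmatrix} \begin{pmatrix} 0 \\ \tilde{\omega}^\circ_x \\ \tilde{\omega}^\circ_y \\ \tilde{\omega}^\circ_z \end{pmatrix}, \qquad q^\oplus = q^\circ + \dot{q}^\circ \frac{\Delta t}{2}. \tag{10}$$

We evaluate $\dot{q}^\oplus$ from $q^\oplus$ and $\tilde{\vec{\omega}}^\oplus$ in the same way as in (10) but shifted by $\Delta t/2$ ahead. Then we can finally compute the desired values

$$q^+ = q^\circ + \dot{q}^\oplus \Delta t, \qquad \vec{\omega}^\oplus = \mathbf{A}^{-1} \tilde{\vec{\omega}}^\oplus.$$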

DEM simulations frequently make use of rigid aggregates of particles, called *clumps*, to model complex shapes [Price2007]; they are typically composed of many spheres. Dynamic properties of clumps are computed from the properties of their members:

- For non-overlapping clump members the clump’s mass $m_c$ is summed over members, and the inertia tensor $\mathbf{I}_c$ is computed using the parallel axes theorem: $\mathbf{I}_c = \sum_i (m_i d_i^2 + \mathbf{I}_i)$, where $m_i$ is the mass of clump member $i$, $d_i$ is the distance from the center of clump member $i$ to the clump’s centroid and $\mathbf{I}_i$ is the inertia tensor of the clump member $i$.
- For overlapping clump members the clump’s mass $m_c$ is summed over cells using a regular grid spacing inside the axis-aligned bounding box (Aabb) of the clump, and the inertia tensor is computed using the parallel axes theorem: $\mathbf{I}_c = \sum_j (m_j d_j^2 + \mathbf{I}_j)$, where $m_j$ is the mass of cell $j$, $d_j$ is the distance from the cell center to the clump’s centroid and $\mathbf{I}_j$ is the inertia tensor of the cell $j$.

Local axes are oriented such that they are principal and inertia tensor is diagonal and clump’s orientation is changed to compensate rotation of the local system, as to not change the clump members’ positions in global space. Initial positions and orientations of all clump members in local coordinate system are stored.

In Yade (class Clump), clump members behave as stand-alone particles during simulation for purposes of collision detection and contact resolution, except that they have no contacts created among themselves within one clump. It is at the stage of motion integration that they are treated specially. Instead of integrating each of them separately, forces/torques on those particles $\vec{F}_i$, $\vec{T}_i$ are converted to forces/torques on the clump itself. Let us denote by $\vec{r}_i$ the relative position of each particle with regard to the clump’s centroid, in global orientation. Then the total force and torque on the clump are

$$\vec{F}_c = \sum_i \vec{F}_i, \qquad \vec{T}_c = \sum_i \left(\vec{r}_i \times \vec{F}_i + \vec{T}_i\right).$$

Motion of the clump is then integrated, using aspherical rotation integration. Afterwards, clump members are displaced in global space, to keep their initial positions and orientations in the clump’s local coordinate system. In such a way, relative positions of clump members are always the same, resulting in the behavior of a rigid aggregate.
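The conversion of member forces and torques to the clump can be sketched as follows. The function `clump_force_torque` and the plain-tuple data layout are hypothetical; the sketch only assumes that each member force contributes its moment about the clump centroid in addition to the member's own torque.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def clump_force_torque(centroid, members):
    """Sum member loads onto the clump.

    members: iterable of (position, force, torque) in global coordinates.
    Returns (total force, total torque about the centroid).
    """
    total_f = [0.0, 0.0, 0.0]
    total_t = [0.0, 0.0, 0.0]
    for position, force, torque in members:
        r = tuple(p - c for p, c in zip(position, centroid))
        moment = cross(r, force)  # torque of the member force about the centroid
        for w in range(3):
            total_f[w] += force[w]
            total_t[w] += moment[w] + torque[w]
    return tuple(total_f), tuple(total_t)


# A force couple on two members plus a small torque on the second one.
members = [
    ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)),
    ((-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, 0.5)),
]
clump_f, clump_t = clump_force_torque((0.0, 0.0, 0.0), members)
```

The couple cancels in the force sum but adds up in the torque, so the clump in this example rotates without translating.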

In simulations of quasi-static phenomena, it is desirable to dissipate kinetic energy of particles. Since most constitutive laws (including Law2_ScGeom_FrictPhys_CundallStrack shown above, *Contact model (example)*) do not include velocity-based damping (such as the one in [Addetta2001]), it is possible to use artificial numerical damping. The formulation is described in [Pfc3dManual30], although our version is slightly adapted. The basic idea is to decrease forces which increase the particle velocities and vice versa by $(\Delta F)_d$, comparing the current acceleration sense and particle velocity sense. This is done by component, which makes the damping scheme clearly non-physical, as it is not invariant with respect to coordinate system rotation; on the other hand, it is very easy to compute. Cundall proposed the form (we omit particle indices $i$ since it applies to all of them separately):

$$\frac{(\Delta \vec{F})_{dw}}{\vec{F}_w} = -\lambda_d \operatorname{sgn}\left(\vec{F}_w \dot{\vec{u}}_w^\ominus\right), \qquad w \in \{x, y, z\}$$

where $\lambda_d$ is the damping coefficient. This formulation has several advantages [Hentz2003]:

- it acts on forces (accelerations), not constraining uniform motion;
- it is independent of eigenfrequencies of particles, they will be all damped equally;
- it needs only the dimensionless parameter $\lambda_d$, which does not have to be scaled.

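In Yade, we use the adapted form

$$\frac{(\Delta \vec{F})_{dw}}{\vec{F}_w} = -\lambda_d \operatorname{sgn}\, \vec{F}_w \left(\dot{\vec{u}}_w^\ominus + \frac{\ddot{\vec{u}}_w^\circ \Delta t}{2}\right), \tag{11}$$

where we replaced the previous mid-step velocity $\dot{\vec{u}}^\ominus$ by its on-step estimate in parentheses. This is to avoid locked-in forces that appear if the velocity changes its sign due to force application at each step, i.e. when the particle in question oscillates around the position of equilibrium with a $2 \Delta t$ period.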

In Yade, damping (11) is implemented in the NewtonIntegrator engine; the damping coefficient is NewtonIntegrator.damping.
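A per-component sketch of such sign-based damping follows. It assumes the damped force is the original force reduced by the fraction `damping` when force and estimated on-step velocity have the same sign, and increased by it otherwise; the function names are hypothetical and this is not the NewtonIntegrator implementation.

```python
def damped_component(force, v_mid_prev, accel, dt, damping):
    """Damp one force component using the on-step velocity estimate."""
    v_on_step = v_mid_prev + accel * dt / 2  # estimate of current velocity
    power = force * v_on_step
    if power > 0:
        return force * (1.0 - damping)  # force speeds the particle up: reduce it
    if power < 0:
        return force * (1.0 + damping)  # force slows the particle down: boost it
    return force


def damped_force(force, v_mid_prev, accel, dt, damping):
    """Apply the damping separately to x, y and z components."""
    return tuple(
        damped_component(f, v, a, dt, damping)
        for f, v, a in zip(force, v_mid_prev, accel)
    )


result = damped_force(
    force=(10.0, 10.0, 0.0),
    v_mid_prev=(1.0, -1.0, 3.0),
    accel=(0.0, 0.0, 0.0),
    dt=0.1,
    damping=0.2,
)
```

Because each component is treated on its own, rotating the coordinate system changes the result, which is the non-physical aspect mentioned above.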

In order to ensure stability for the explicit integration scheme, an upper limit is imposed on $\Delta t$:

$$\Delta t_{cr} = \frac{2}{\omega_{\max}} \tag{12}$$

where $\omega_{\max}$ is the highest eigenfrequency within the system.

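A single 1D mass-spring system with mass $m$ and stiffness $K$ is governed by the equation

$$m \ddot{x} = -K x$$

where $x$ is displacement from the mean (equilibrium) position. The solution of harmonic oscillation is $x(t) = A \cos(\omega t + \varphi)$, where phase $\varphi$ and amplitude $A$ are determined by initial conditions. The angular frequency

$$\omega^{(1)} = \sqrt{\frac{K}{m}} \tag{13}$$

does not depend on initial conditions. Since there is one single mass, $\omega_{\max}^{(1)} = \omega^{(1)}$. Plugging (13) into (12), we obtain

$$\Delta t_{cr}^{(1)} = \frac{2}{\omega_{\max}^{(1)}} = 2 \sqrt{\frac{m}{K}}$$

for a single oscillator.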

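In a general mass-spring system, the highest frequency occurs if two connected masses $m_i$, $m_j$ are in opposite motion; let us suppose they have equal velocities (which is conservative) and they are connected by a spring with stiffness $K_i$: displacement $\Delta x_i$ of $m_i$ will be accompanied by $\Delta x_j = -\Delta x_i$ of $m_j$, giving $\Delta F_i = -K_i (\Delta x_i - (-\Delta x_i)) = -2 K_i \Delta x_i$. That results in apparent stiffness $K_i^{(2)} = 2 K_i$, giving maximum eigenfrequency of the whole system

$$\omega_{\max} = \max_i \sqrt{\frac{K_i^{(2)}}{m_i}}.$$

The overall critical timestep is then

$$\Delta t_{cr} = \frac{2}{\omega_{\max}} = \min_i 2 \sqrt{\frac{m_i}{K_i^{(2)}}} = \min_i 2 \sqrt{\frac{m_i}{2 K_i}} = \min_i \sqrt{2} \sqrt{\frac{m_i}{K_i}}. \tag{14}$$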
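As a numerical illustration of this estimate, the sketch below evaluates $\sqrt{2}\sqrt{m_i/K_i}$ for every mass and takes the minimum. The function name `critical_timestep` is hypothetical, and one representative stiffness per mass is assumed to be given.

```python
import math


def critical_timestep(masses, stiffnesses):
    """Smallest per-mass limit sqrt(2) * sqrt(m / K), assuming each mass is
    loaded by a spring whose apparent stiffness is doubled by opposite
    motion of its neighbor."""
    return min(math.sqrt(2.0 * m / k) for m, k in zip(masses, stiffnesses))


# The light, stiffly connected mass dictates the step for the whole system.
dt_cr = critical_timestep(masses=[2.0, 1.0], stiffnesses=[1.0, 4.0])
```

The first mass alone would allow a step of 2, but the second one limits it to about 0.71; in practice a safety factor below 1 is commonly applied on top of such an estimate.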

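This equation can be used for all 6 degrees of freedom (DOF) in translation and rotation, by considering generalized mass and stiffness matrices $\mathbf{M}$ and $\mathbf{K}$, and replacing fractions $\frac{m_i}{K_i}$ by eigenvalues of $\mathbf{M} \cdot \mathbf{K}^{-1}$. The critical timestep is then associated to the eigenmode with the highest frequency:

$$\Delta t_{cr} = \min_k \Delta t_{cr\,k}, \qquad k \in \{1, \dots, 6\}. \tag{15}$$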

In DEM simulations, per-particle stiffness $K_{ij}$ is determined from the stiffnesses of contacts in which it participates. Suppose each contact has normal stiffness $K_{Nk}$, shear stiffness $K_{Tk}$ and is oriented by normal $\vec{n}_k$. A translational stiffness matrix $K_{ij}$ can be defined as the sum of contributions of all contacts in which it participates (indices $k$), as [Chareyre2005]

$$K_{ij} = \sum_k \left[ (K_{Nk} - K_{Tk})\, n_{ki}\, n_{kj} + K_{Tk}\, \delta_{ij} \right] \tag{16}$$

with $i$ and $j \in \{x, y, z\}$, where $n_{ki}$ is the $i$-th component of $\vec{n}_k$ and $\delta_{ij}$ is the Kronecker delta. Equations (15) and (16) determine $\Delta t_{cr}$ in a simulation. A similar approach generalized to all 6 DOFs is implemented by the GlobalStiffnessTimeStepper engine in Yade. The derivation of generalized stiffness including rotational terms is very similar and can be found in [AboulHosn2016].

Note that for reasons of computational efficiency, eigenvalues of the stiffness matrices are not computed. They are only approximated, assuming that DOFs are uncoupled and using the diagonal terms of $\mathbf{K} \cdot \mathbf{M}^{-1}$. These give good approximations in typical mechanical systems.

There is one important condition, namely that $\omega_{\max} > 0$: if there are no contacts between particles and $\omega_{\max} = 0$, we would obtain the value $\Delta t_{cr} = \infty$. While formally correct, this value is numerically erroneous: we were silently supposing that stiffness remains constant during each timestep, which is not true if contacts are created as particles collide. In case of no contact, therefore, stiffness must be pre-estimated based on future interactions, as shown in the next section.

Estimating timestep in absence of interactions is based on the connection between interaction stiffnesses and the particle’s properties. Note that in this section, symbols $E$ and $\rho$ refer exceptionally to Young’s modulus and density of *particles*, not of macroscopic arrangement.

In Yade, particles have associated Material which defines density $\rho$ (Material.density), and also may define (in ElastMat and derived classes) particle’s “Young’s modulus” $E$ (ElastMat.young). $\rho$ is used when particle’s mass $m$ is initially computed from its $\rho$, while $E$ is taken into account when creating new interaction between particles, affecting stiffness $K_N$. Knowing $m$ and $K_N$, we can estimate (16) for each particle; we obviously neglect

- number of interactions per particle $N_i$; for a “reasonable” radius distribution, however, there is a geometrically imposed upper limit (12 for a packing of spheres with equal radii, for instance);
- the exact relationship between the particles’ rigidities $E_i$, $E_j$, supposing only that $K_N$ is somehow proportional to them.

By defining $E$ and $\rho$, particles have continuum-like quantities. Explicit integration schemes for continuum equations impose a critical timestep based on sonic speed $\sqrt{E/\rho}$; the elastic wave must not propagate farther than the minimum distance of integration points $l_{\min}$ during one step. Since $E$, $\rho$ are parameters of the elastic continuum and $l_{\min}$ is fixed beforehand, we obtain

$$\Delta t_{\rm cr}^{(c)}=l_{\min}\sqrt{\frac{\rho}{E}}.$$

For our purposes, we define $E$ and $\rho$ for each particle separately; $l_{\min}$ can be replaced by the sphere’s radius $R_i$; technically, $l_{\min}=2R_i$ could be used, but because of possible interactions of spheres and facets (which have zero thickness), we consider $l_{\min}=R_i$ instead. Then

$$\Delta t_{\rm cr}^{(p)}=\min_i R_i\sqrt{\frac{\rho_i}{E_i}}.$$

This algorithm is implemented in the utils.PWaveTimeStep function.
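The estimate takes the minimum, over all spheres, of the radius multiplied by the inverse sonic speed $\sqrt{\rho/E}$. The sketch below shows that computation in plain Python; it is not the code of utils.PWaveTimeStep, and the function name, the tuple layout and the material values are assumptions made for illustration.

```python
import math

def p_wave_timestep(spheres):
    """Return min_i R_i*sqrt(rho_i/E_i) over (radius, density, young) triples."""
    return min(r * math.sqrt(rho / young) for r, rho, young in spheres)

spheres = [
    (0.010, 2500.0, 1.0e9),  # radius [m], density [kg/m^3], Young's modulus [Pa]
    (0.005, 2500.0, 1.0e9),
]
dt_p = p_wave_timestep(spheres)  # governed by the smaller sphere
```

With identical materials the smallest sphere always governs, which is why a few very small particles can dominate the cost of a whole simulation.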

Let us compare this result to (14); this necessitates making several simplifying hypotheses:

- all particles are spherical and have the same radius $R$;
- the sphere’s material has the same $E$ and $\rho$;
- the average number of contacts per sphere is $N$;
- the contacts have sufficiently uniform spatial distribution around each particle;
- the ratio $\xi=K_T/K_N$ is constant for all interactions;
- contact stiffness $K_N$ is computed from $E$ using a formula of the form

  $$K_N=E\pi'R',$$

  (17)

  where $\pi'$ is some constant depending on the algorithm in use (its value differs, for example, between the concrete particle model (Ip2_CpmMat_CpmMat_CpmPhys) and the classical DEM model (Ip2_FrictMat_FrictMat_FrictPhys) as implemented in Yade) and $R'$ is half-distance between spheres in contact, equal to $R$ for the case of interaction radius $R_I=1$. If $R_I=1$ (and $R'\equiv R$ by consequence), all interactions will have the same stiffness $K_N$. In other cases, we will consider $K_N$ as the average stiffness computed from average $R'$ (see below).

As all particles have the same parameters, we drop the $i$ index in the following formulas.

We try to express the average per-particle stiffness from (16). It is a sum over all interactions where $K_N$ and $\xi$ are scalars that will not rotate with interaction, while $n_w$ is the $w$-th component of unit interaction normal $n$. Since we supposed uniform spatial distribution, we can replace $n_w^2$ by its average value $\overline{n_w^2}$. Recognizing components of $n$ as direction cosines, the average value of $n_w^2$ is $1/3$. We find the average value by integrating over all possible orientations, which are uniformly distributed in space:

$$\overline{n_w^2}=\frac{1}{4\pi}\int_0^{2\pi}\!\int_0^{\pi}\cos^2\theta\,\sin\theta\,{\rm d}\theta\,{\rm d}\varphi=\frac{1}{3}.$$

Moreover, since all directions are equal, we can write the per-body stiffness as $K=K_{ww}$ for all $w\in\{x,y,z\}$. We obtain

$$K=\sum K_N\left((1-\xi)\frac{1}{3}+\xi\right)=\sum K_N\frac{1+2\xi}{3}$$

and can put constant terms (everything) in front of the summation. $\sum 1$ equals the number of contacts per sphere, i.e. $N$. Arriving at

$$K=N K_N\frac{1+2\xi}{3},$$

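we substitute $K$ into (14) using (17), with $R'=R$ and $m=\frac{4}{3}\pi R^3\rho$:

$$\Delta t_{\rm cr}=\sqrt{2}\sqrt{\frac{m}{K}}=\sqrt{2}\sqrt{\frac{\frac{4}{3}\pi R^3\rho}{N E\pi' R\,\frac{1+2\xi}{3}}}=\underbrace{R\sqrt{\frac{\rho}{E}}}_{\Delta t_{\rm cr}^{(p)}}\;2\sqrt{2}\sqrt{\frac{\pi/\pi'}{N(1+2\xi)}}.$$

The ratio of timestep $\Delta t_{\rm cr}^{(p)}$ predicted by the p-wave velocity and numerically stable timestep $\Delta t_{\rm cr}$ is the inverse value of the last (dimensionless) term:

$$\frac{\Delta t_{\rm cr}^{(p)}}{\Delta t_{\rm cr}}=\frac{1}{2\sqrt{2}}\sqrt{\frac{N(1+2\xi)}{\pi/\pi'}}.$$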

Actual values of this ratio depend on characteristics of packing $N$, the ratio $K_T/K_N=\xi$ and the way of computing contact stiffness from particle rigidity. Let us show it for two models in Yade:

- Concrete particle model
computes contact stiffness from the equivalent area first (6),

is the initial contact length, which will be, for interaction radius (5) , on average larger than . For , we can roughly estimate , getting

where by comparison with (17).

Interaction radius leads to average interactions per sphere for dense packing of spheres with the same radius . is calibrated to match the desired macroscopic Poisson’s ratio .

Finally, we obtain the ratio

showing significant overestimation by the p-wave algorithm.

- Non-cohesive dry friction model
is the basic model proposed by Cundall explained in *Contact model (example)*. Supposing almost-constant sphere radius and rather dense packing, each sphere will have interactions on average (that corresponds to maximally dense packing of spheres with a constant radius). If we use the Ip2_FrictMat_FrictMat_FrictPhys class, we have , as ; we again use (for lack of a more significant value). In this case, we obtain the result, which again overestimates the numerical critical timestep.

To conclude, the p-wave timestep gives an estimate proportional to the real $\Delta t_{\rm cr}$, but in the cases shown, the value of about should be used to guarantee stable simulation.

Let us note here that numerical stability of motion integration is not the only constraint. In systems where particles move at relatively high velocities, position change during one timestep can lead to non-elastic irreversible effects such as damage. The $\Delta t$ needed for a reasonable result can then be lower than $\Delta t_{\rm cr}$. We have no rigorously derived rules for such cases.

While most DEM simulations happen in $\mathbb{R}^3$ space, it is frequently useful to avoid boundary effects by using periodic space instead. In order to satisfy periodicity conditions, periodic space is created by repetition of a parallelepiped-shaped cell. In Yade, periodic space is implemented in the Cell class. The geometry of the cell in the reference coordinate system is defined by three edges of the parallelepiped. The corresponding base vectors are stored in the columns of matrix $H$ (Cell.hSize).

The initial $H$ can be explicitly defined as a 3x3 matrix at the beginning of the simulation. There are no restrictions on the possible shapes: any parallelepiped is accepted as the initial cell. If the base vectors are axis-aligned, defining only their sizes can be more convenient than defining the full matrix; in that case it is enough to define the norms of columns in $H$ (see Cell.size).

After the definition of the initial cell’s geometry, $H$ should generally not be modified by direct assignment. Instead, its deformation rate will be defined via the velocity gradient Cell.velGrad described below. It is the only variable that lets the deformation of the period be correctly accounted for in constitutive laws and Newton integrator (NewtonIntegrator).

The deformation of the cell over time is defined via a tensor $\nabla v$ representing the gradient of a homogeneous velocity field (Cell.velGrad). This gradient represents arbitrary combinations of rotations and stretches. It can be imposed externally or updated by boundary controllers (see PeriTriaxController or Peri3dController) in order to reach target strain values or to maintain some prescribed stress.

The velocity gradient is integrated automatically over time, and the cumulated transformation is reflected in the transformation matrix $F$ (Cell.trsf) and the current shape of the cell $H$. The per-step transformation update reads (it is similar for $H$), with $I$ the identity matrix and superscripts $\circ$ and $+$ denoting the current and the next step:

$$F^{+}=(I+\nabla v\,\Delta t)\,F^{\circ}.$$

$F$ can be set back to identity at any point in simulations, in order to define the current state as reference for strains definition in boundary controllers. It will have no effect on $H$.
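A minimal sketch of this integration, assuming a first-order update in which the transformation is multiplied by the identity plus the velocity gradient times the timestep at every step. The helper names are hypothetical and plain nested lists stand in for the matrix types used by Yade.

```python
def mat_mul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def step_transformation(trsf, vel_grad, dt):
    """Return (I + vel_grad*dt) * trsf, a first-order update over one step."""
    increment = [[(1.0 if i == j else 0.0) + vel_grad[i][j] * dt
                  for j in range(3)] for i in range(3)]
    return mat_mul(increment, trsf)

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Simple shear: x-velocity grows linearly with y at rate 0.5 per second.
vel_grad = [[0.0, 0.5, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

trsf = identity
for _ in range(4):  # four steps of 0.25 s, i.e. one second in total
    trsf = step_transformation(trsf, vel_grad, dt=0.25)
```

After one second of simple shear the off-diagonal term equals the imposed shear strain; resetting the matrix to identity at that point would only move the strain reference.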

Along with the automatic integration of cell transformation, there is an option to homothetically displace all particles so that $\nabla v$ is applied over the whole simulation (enabled via Cell.homoDeform). This avoids all boundary effects coming from change of the velocity gradient.

In usual implementations, particle positions are forced to be inside the cell by wrapping their positions if they get over the boundary (so that they appear on the other side). As we wanted to avoid abrupt changes of position (it would make particle’s velocity inconsistent with step displacement change), a different method was chosen.

Pass 1 collision detection (based on sweep and prune algorithm, sect. *Sweep and prune*) operates on axis-aligned bounding boxes (Aabb) of particles. During the collision detection phase, bounds of all Aabb’s are wrapped inside the cell in the first step. At subsequent runs, every bound remembers by how many cells it was initially shifted from coordinate given by the Aabb and uses this offset repeatedly as it is being updated from Aabb during particle’s motion. Bounds are sorted using the periodic insertion sort algorithm (sect. *Periodic insertion sort algorithm*), which tracks periodic cell boundary .

Upon inversion of two Aabb’s, their collision along all three axes is checked, wrapping real coordinates inside the cell for that purpose.

This algorithm detects collisions as if all particles were inside the cell but without the need of constructing “ghost particles” (to represent periodic image of a particle which enters the cell from the other side) or changing the particle’s positions.

It is required by the implementation (and partly by the algorithm itself) that particles do not span more than half of the current cell size along any axis; the reason is that otherwise two (or more) contacts between both particles could appear, on each side. Since Yade identifies contacts by Body.id of both bodies, they would not be distinguishable.

In presence of shear, the sweep-and-prune collider could not sort bounds independently along three axes: collision along one axis depends on the mutual position of particles on another axis. Therefore, bounding boxes *are expressed in transformed coordinates* which are perpendicular in the sense of collision detection. This requires some extra computation: the Aabb of a sphere in transformed coordinates will no longer be a cube, but a cuboid, as the sphere itself will appear as an ellipsoid after transformation. Inversely, the sphere in simulation space will have a parallelepiped bounding “box”, which is a cuboid around the ellipsoid in transformed axes (the Aabb has axes aligned with transformed cell basis). This is shown in fig. fig-cell-shear-aabb.

The restriction of a single particle not spanning more than half of the transformed axis becomes stringent as Aabb is enlarged due to shear. Considering Aabb of a sphere with radius in the cell where , , but , the -span of the Aabb will be multiplied by . For the infinite shear , which can be desirable to simulate, we have . Fortunately, this limitation can be easily circumvented by realizing the quasi-identity of all periodic cells which, if repeated in space, create the same grid with their corners: the periodic cell can be flipped, keeping all particle interactions intact, as shown in fig. fig-cell-flip. It only necessitates adjusting the Interaction.cellDist of interactions and re-initialization of the collider (`Collider::invalidatePersistentData`). Cell flipping is implemented in the utils.flipCell function.

This algorithm is implemented in InsertionSortCollider and is used whenever simulation is periodic (Omega.isPeriodic); individual BoundFunctor’s are responsible for computing sheared Aabb’s; currently it is implemented for spheres and facets (in Bo1_Sphere_Aabb and Bo1_Facet_Aabb respectively).

When the collider detects approximate contact (on the Aabb level) and the contact does not yet exist, it creates *potential* contact, which is subsequently checked by exact collision algorithms (depending on the combination of Shapes). Since particles can interact over many periodic cells (recall we never change their positions in simulation space), the collider embeds the relative cell coordinate of particles in the interaction itself (Interaction.cellDist) as an *integer* vector $c$. Multiplying current cell size $s$ by $c$ component-wise, we obtain particle offset $\Delta x$ in aperiodic $\mathbb{R}^3$; this value is passed (from InteractionLoop) to the functor computing exact collision (IGeomFunctor), which adds it to the position of the particle Interaction.id2.

By storing the integral offset $c$, $\Delta x$ automatically updates as cell parameters change.
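The following sketch illustrates why an integer cell distance is convenient: the real-space offset is recomputed from the current cell size each time it is needed. It assumes an axis-aligned cell, so that the component-wise product applies, and the function names are hypothetical rather than Yade functions.

```python
def periodic_offset(cell_size, cell_dist):
    """Component-wise product of cell size and integer cell distance."""
    return tuple(s * c for s, c in zip(cell_size, cell_dist))

def shifted_position(position, cell_size, cell_dist):
    """Position of the second particle as seen by the exact contact check."""
    offset = periodic_offset(cell_size, cell_dist)
    return tuple(x + dx for x, dx in zip(position, offset))

cell_dist = (1, 0, -2)  # stored once, as integers
before = shifted_position((0.5, 0.5, 0.5), (10.0, 10.0, 10.0), cell_dist)
# After the cell has been stretched along x, the same integers give a new offset.
after = shifted_position((0.5, 0.5, 0.5), (12.0, 10.0, 10.0), cell_dist)
```

Only the cell size changed between the two calls; the stored integers stayed the same, yet the shifted position followed the deformation of the cell.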

The extension of sweep and prune algorithm (described in *Sweep and prune*) to periodic boundary conditions is non-trivial. Its cornerstone is a periodic variant of the insertion sort algorithm, which involves keeping track of the “period” of each boundary; e.g. taking period , then (subscript indicating period). Doing so efficiently (without shuffling data in memory around as bound wraps from one period to another) requires moving period boundary rather than bounds themselves and making the comparison work transparently at the edge of the container.

This algorithm was also extended to handle non-orthogonal periodic Cell boundaries by working in transformed rather than Cartesian coordinates; this modifies computation of Aabb from Cartesian coordinates in which bodies are positioned (treated in detail in *Approximate collision detection*).

The sort algorithm is tracking Aabb extrema along all axes. At the collider’s initialization, each value is assigned an integral period, i.e. its distance from the cell’s interior expressed in the cell’s dimension along its respective axis, and is wrapped to a value inside the cell. We put the period number in subscript.
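The assignment of a period and the wrapping of a coordinate can be sketched as follows. The sign convention (period counted as the number of cell sizes subtracted from the real coordinate) and the function names are assumptions made for this illustration.

```python
import math

def wrap_with_period(x, size):
    """Return (wrapped coordinate in [0, size), integral period)."""
    period = math.floor(x / size)
    return x - period * size, period

def unwrap(wrapped, period, size):
    """Recover the real coordinate from the wrapped one and its period."""
    return wrapped + period * size

wrapped, period = wrap_with_period(-4.0, 10.0)  # one cell below the interior
restored = unwrap(wrapped, period, 10.0)
```

Keeping the period alongside the wrapped value means no information is lost by wrapping, so the real coordinate can always be recovered for the exact collision check.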

Let us give an example of coordinate sequence along axis (in a real case, the number of elements would be even, as there is maximum and minimum value couple for each particle; this demonstration only shows the sorting algorithm, however.)

with cell -size . The value then means that the real coordinate of this extremum is , i.e. . The symbol denotes the periodic cell boundary.

Sorting starts from the first element in the cell, i.e. right of , and inverts elements as in the aperiodic variant. The rules are, however, more complicated due to the presence of the boundary :

() | stop inverting if neighbors are ordered; |

() | current element left of is below 0 (lower period boundary); in this case, decrement element’s period, decrease its coordinate by and move right; |

() | current element right of is above (upper period boundary); increment element’s period, increase its coordinate by and move left; |

() | inversion across must subtract from the left coordinate during comparison. If the elements are not in order, they are swapped, but they must have their periods changed as they traverse . Apply () if necessary; |

() | if after () the element that is now right of has , decrease its coordinate by and decrement its period. Do not move . |

In the first step, () is applied, and inversion with happens; then we stop because of ():

We move to next element ; first, we apply (), then invert until ():

The next element is ; we satisfy (), therefore instead of comparing , we must do ; we adjust periods when swapping over and apply (), turning into ; then we keep inverting, until ():

We move (wrapping around) to , which is ordered:

and so is the last element

The DEM computation using an explicit integration scheme demands a relatively high number of steps during simulation, compared to implicit schemes. The total computation time $t_{\rm sim}$ of simulation spanning $T$ seconds (of simulated time), containing $N$ particles in volume $V$ depends on:

- linearly, the number of steps $i=T/(s_t\Delta t_{\rm cr})$, where $s_t$ is timestep safety factor; $\Delta t_{\rm cr}$ can be estimated by p-wave velocity using $E$ and $\rho$ (sect. *Estimation of $\Delta t_{\rm cr}$ by wave propagation speed*) as $\Delta t_{\rm cr}^{(p)}=r\sqrt{\rho/E}$. Therefore

  $$i=\frac{T}{s_t r}\sqrt{\frac{E}{\rho}}.$$

- the number of particles $N$; for fixed value of simulated domain volume $V$ and particle radius $r$

  $$N=p\frac{V}{\frac{4}{3}\pi r^3},$$

  where $p$ is the packing fraction, roughly for dense irregular packings of spheres of similar radius.

  The dependency is not strictly linear (which would be the best case), as some algorithms do not scale linearly; a case in point is the sweep and prune collision detection algorithm introduced in sect. *Sweep and prune*, with scaling roughly $\mathcal{O}(N\log N)$.

  The number of interactions scales with $N$, as long as packing characteristics are the same.

- the number of computational cores $n_{\rm cpu}$; in the ideal case, the dependency would be inverse-linear were all algorithms parallelized (in Yade, collision detection is not).

Let us suppose linear scaling. Additionally, let us suppose that the material to be simulated ($E$, $\rho$) and the simulation setup ($V$, $T$) are given in advance. Finally, dimensionless constants $s_t$, $p$ and $n_{\rm cpu}$ will have a fixed value. This leaves us with one last degree of freedom, $r$. We may write

$$t_{\rm sim}\propto iN\frac{1}{n_{\rm cpu}}=\frac{T}{s_t r}\sqrt{\frac{E}{\rho}}\;\frac{pV}{\frac{4}{3}\pi r^3}\;\frac{1}{n_{\rm cpu}}\propto\frac{1}{r}\,\frac{1}{r^3}=\frac{1}{r^4}.$$

This (rather trivial) result is essential to realize DEM scaling; if we want to have finer results, refining the “mesh” by halving $r$, the computation time will grow $2^4=16$ times.
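The scaling argument can be checked with a short sketch. It assumes, as in the text, that the number of steps is inversely proportional to the particle radius, that the number of particles is inversely proportional to its cube, and that everything else is held fixed; the function name is hypothetical.

```python
def relative_cost(r_reference, r_new):
    """Ratio of computation times when only the particle radius changes.

    Steps scale as 1/r and particle count as 1/r**3, hence (r_ref/r_new)**4.
    """
    steps_ratio = r_reference / r_new
    particles_ratio = (r_reference / r_new) ** 3
    return steps_ratio * particles_ratio

halved = relative_cost(1.0, 0.5)  # refining the "mesh" by halving the radius
```

The same function also shows the benefit of coarsening: doubling the radius divides the estimated cost by sixteen.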

For very crude estimates, one can use a known simulation to obtain a machine “constant”

$$c=\frac{t_{\rm sim}}{Ni}$$

with the meaning of time per particle and per timestep (in the order of for current machines). $c$ will be only useful if simulation characteristics are similar and non-linearities in scaling do not have major influence, i.e. $N$ should be in the same order of magnitude as in the reference case.
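As a rough illustration of such an estimate, the sketch below derives the time per particle and per timestep from a reference run and reuses it for a planned run. All numbers are hypothetical, and the function names are not part of Yade.

```python
def machine_constant(t_sim, n_particles, n_steps):
    """Wall-clock time per particle and per timestep of a reference run."""
    return t_sim / (n_particles * n_steps)

def estimate_time(c, n_particles, n_steps):
    """Crude wall-clock estimate, assuming linear scaling in both factors."""
    return c * n_particles * n_steps

# Hypothetical reference run: 200 s for 10,000 particles and 20,000 steps.
c = machine_constant(t_sim=200.0, n_particles=10_000, n_steps=20_000)
# Planned run with twice the particles and twice the steps.
estimate = estimate_time(c, n_particles=20_000, n_steps=40_000)
```

The estimate is four times the reference time, which is only trustworthy while the particle count stays within the same order of magnitude as in the reference run.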

It is naturally expected that running the same simulation several times will give exactly the same results: although the computation is done with finite precision, round-off errors would be deterministically the same at every run. While this is true for *single-threaded* computation where exact order of all operations is given by the simulation itself, it is not true anymore in *multi-threaded* computation which is described in detail in later sections.

The straightforward manner of parallel processing in explicit DEM is given by the possibility of treating interactions in arbitrary order. Strain and stress are evaluated for each interaction independently, but forces from interactions have to be summed up. If summation order is also arbitrary (in Yade, forces are accumulated for each thread in the order interactions are processed, then summed together), then the results can be slightly different. For instance

```
(1/10.)+(1/13.)+(1/17.)=0.23574660633484162
(1/17.)+(1/13.)+(1/10.)=0.23574660633484165
```

As forces generated by interactions are assigned to bodies in quasi-random order, the total force on a body can be different between single-threaded and multi-threaded computations, but also between different runs of multi-threaded computation with exactly the same parameters. Exact thread scheduling by the kernel is not predictable since it depends on asynchronous events (hardware interrupts) and other unrelated tasks running on the system; and it is thread scheduling that ultimately determines summation order of force contributions from interactions.
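The dependence on summation order is a general property of floating-point addition and can be made far more visible with contributions of very different magnitudes. The values below are hypothetical and chosen only to exaggerate the effect; the accumulation loop is a plain stand-in for per-thread force accumulation, not Yade code.

```python
def accumulate(values):
    """Sum values left to right, as a simple accumulator would."""
    total = 0.0
    for v in values:
        total += v
    return total

# The small term is absorbed by the large one before the large terms cancel.
in_order = accumulate([1.0e16, 1.0, -1.0e16])
# Here the large terms cancel first, so the small term survives.
reordered = accumulate([1.0e16, -1.0e16, 1.0])
```

The two results differ by the whole small contribution, although the same three numbers were added in both cases.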

The effect of summation order can be significantly amplified by the usage of a *discontinuous* damping function in NewtonIntegrator given in (11) as

If the $\operatorname{sgn}$ argument is close to zero, then the least significant finite-precision artifact can determine whether the equation (relative increment of the force) is $+\lambda_d$ or $-\lambda_d$, where $\lambda_d$ is the damping coefficient. Given commonly used values of $\lambda_d$, it means that such an artifact propagates from the least significant place to the most significant one at once.
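The amplification can be illustrated with a sketch that assumes the damping term has the form of a coefficient multiplied by the sign of its argument. The function name and the coefficient value are hypothetical, and the exact argument used by NewtonIntegrator is not reproduced here.

```python
import math

def damping_increment(argument, damping):
    """Relative force increment -damping*sgn(argument); zero argument gives 0."""
    if argument == 0.0:
        return 0.0
    return -math.copysign(damping, argument)

damping = 0.2  # hypothetical coefficient
# Two arguments that differ only by round-off-sized noise around zero.
slightly_positive = damping_increment(+1.0e-17, damping)
slightly_negative = damping_increment(-1.0e-17, damping)
```

A difference of order $10^{-17}$ in the argument thus changes the result by twice the damping coefficient, which is the discontinuity that makes the integrator sensitive to summation order.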