petar: a high-performance N-body code for modelling massive collisional stellar systems

ABSTRACT

The numerical simulations of massive collisional stellar systems, such as globular clusters (GCs), are very time consuming. Until now, only a few realistic million-body simulations of GCs with a small fraction of binaries (⁠|$5{{\ \rm per\ cent}}$|⁠) have been performed by using the nbody6++gpu code. Such models took half a year computational time on a Graphic Processing Unit (GPU)-based supercomputer. In this work, we develop a new N-body code, petar, by combining the methods of Barnes–Hut tree, Hermite integrator and slow-down algorithmic regularization. The code can accurately handle an arbitrary fraction of multiple systems (e.g. binaries and triples) while keeping a high performance by using the hybrid parallelization methods with mpi, openmp, simd instructions and GPU. A few benchmarks indicate that petar and nbody6++gpu have a very good agreement on the long-term evolution of the global structure, binary orbits and escapers. On a highly configured GPU desktop computer, the performance of a million-body simulation with all stars in binaries by using petar is 11 times faster than that of nbody6++gpu. Moreover, on the Cray XC50 supercomputer, petar well scales when number of cores increase. The 10 million-body problem, which covers the region of ultracompact dwarfs and nuclear star clusters, becomes possible to be solved.

methods: numerical, software: simulations, globular clusters: general

1 INTRODUCTION

The realistic models of collisional stellar systems where stars and binaries can have frequent close interactions, such as globular clusters (GCs), have been a long-term challenge due to the time-consuming calculations. By developing a hybrid-parallel direct N-body code nbody6++gpu (Wang et al. 2015), the million-body models of GCs have become possible (Dragon models; Wang et al. 2016). This success is based on the breakthrough of both hardware and software developments. Especially, the Gravity Pipe (GRAPE; Makino et al. 2003) and the Graphic Processing Unit (GPU) provide great platforms for performing highly efficient parallelized simulation codes (Gaburov, Harfst & Portegies Zwart 2009).

However, the Dragon models have relatively lower densities compared to the typical GCs in the Galaxy. It is difficult to simulate high-density systems because the half-mass relaxation time-scale (T_rh) is much shorter, while the calculation cost scales in a way of O(N³)/T_rh. Even so, the Dragon models took about half a year computing to reach one T_rh. Besides, the binary fraction is also small (⁠|$5{{\ \rm per\ cent}}$|⁠) compared to that in the observed GCs. This is because the orbital integration of binaries is not parallelized in nbody6++gpu, thus the performance cannot well scale with multiple CPU cores when the binary fraction is large. Therefore, the dense models with many binaries are still challenging. It is also not practicable to perform a large number of models to cover the parameter spaces of different initial conditions. All these limit the applications of the direct N-body methods for studying GCs.

On the other hand, the ultracompact dwarfs (UCDs) attract researchers’ attention to understand their formation and evolution, because they belong to a family crossing the region of GCs and dwarf galaxies (e.g. Hilker et al. 1999; Drinkwater et al. 2000; Phillipps et al. 2001). The discovery of supermassive black hole (SMBH) in a UCD (Seth et al. 2014) brings a new question that how SMBHs and UCDs co-evolve. In the view of stellar dynamics, UCDs with N ∼ 10⁷–10⁸ are in the transiting region between the collisional and collisionless systems. Thus, a proper model of UCDs also requires a correct treatment of collisional dynamics.

The galactic nuclear star clusters (NSCs) are very massive and dense. The collisional effect is also important. They can be a few magnitude more massive than GCs but the system size is similar. The formation scenarios of NSCs and the co-evolution among NSCs, SMBHs, and host galaxies are still not fully understood. The star-by-star simulations including SMBH are useful to study such kind of problems. The currently most dense model of NSCs with a proper treatment of collisional effect was done by using the nbody6++gpu code (Panamarev et al. 2019), but the number of stars is still far from the realistic case.

To solve the challenge of realistically simulating these dense and massive systems, a new approach of numerical tool that can overcome the bottlenecks (performance and small fraction of binaries) is necessary. The major difficulty is to solve the multiple time-scale issue. In this work, we describe how our new code, petar, can overcome this challenge.

In Section 2, we introduce the multiple time-scale issue. In Section 3, we summarize the algorithms used in the direct N-body methods, especially those in the nbody6(++gpu) code. The approach of particle-tree (PT) with individual time-steps is shortly described in Section 4. Then, we introduce the idea of Hamiltonian splitting in Section 5. The detailed description of petar is in Section 6. After that, we provide the benchmarks of petar in Section 7. Finally, we make conclusions and discuss the future work in Section 8.

2 MULTIPLE TIME-SCALE ISSUE

Long-term dynamical evolution of star clusters involves multiple physical processes with very different time-scales. Three important ones are:

T_bin: period of binaries (order of day at the minimum).
T_cr: time for a star crossing the cluster (order of Myr for GCs).
T_r: two-body relaxation time of the system (order of Gyr for GCs).

The N-body method needs to be possible to handle these three time-scale regions.

2.1 Binary period

Binaries contain very large mechanical energy compared to that of the host system. They play an important role in controlling the global evolution via few-body interactions. Since the gravitationally bound system has a negative heat capacity, the temperature of the cluster centre becomes hotter as energy is transferred outwards. Such process is unstable and finally the core collapse happens and the central density increases significantly (e.g. Binney & Tremaine 1987; Spitzer 1987). Binaries are considered as the major heating source that can prevent an infinite core collapse. After a close interaction between a tight binary and its neighbours in the centre, energy is added the system to support the core while the binary becomes more compact and finally merges or escapes from the system. Thus, such process influences both the global evolution and the statistical properties of binaries. This feature makes the star cluster an efficient environment to form exotic objects such as blue stragglers, X-ray binaries, and gravitational wave progenitors. Therefore, to obtain realistic models of star clusters, few-body interactions, and dynamics of binaries must be correctly treated.

However, binaries also bring the major challenge for the N-body simulations. T_bin can cover a very wide region in star clusters. The minimum of T_bin is determined by the stellar structure of binary components. Typically it is a few days. There is no upper limit of T_bin, but in a cluster environment wide binaries are easily disrupted by perturbations. Due to the Heggie–Hill law (Heggie 1975; Hills 1975), the boundary between the wide and the tight can be estimated as

$$\begin{eqnarray*} a_{\mathrm{h/s}}\approx \frac{G m_{\mathrm{1}}m_{\mathrm{2}}}{\langle m \rangle \sigma ^2}, \end{eqnarray*}$$

(1)

where G is the gravitational constant; m₁ and m₂ are the masses of the two components, respectively; 〈m〉 is the locally averaged stellar mass, and σ is the local velocity dispersion. After close encounters, wide binaries with the semimajor axis a > a_h/s become wider and while tight binaries become tighter. Thus, binaries with a < a_h/s can stay in clusters for a long time before merger or escaping.

We can roughly estimate T_bin at the boundary of a_h/s. In equation (1), σ² can be replaced by GM/r for a system in virial equilibrium, where M is the total mass and r is the size of the system. Thus, a_h/s ≈ r/N if all stars have a similar mass. In an open cluster, r ∼ 1 pc and N ∼ 10³, binaries close to the boundary have T_bin in the order of 10³ yr. In a dense GC, the boundary T_bin is in the order of years. Thus, T_bin has a range of 3–6 orders of magnitudes.

To properly follow the orbital motion, the time-step size of integration should be much less than T_bin. If the orbit has a fast change at the pericentre due to a high eccentricity (e), the time-step should be very small to catch the pericentre motion. This is also the case for a hyperbolic encounter. Thus, binaries and close encounters are the most time-consuming part in the N-body simulation of star clusters.

2.2 Crossing time

T_cr represents the time-scale of the orbital motion of a single star in the system. Thus, the time-step of integration for a star should be much shorter than its T_cr to obtain a sufficient accuracy. In a star cluster, T_cr can be estimated by

$$\begin{eqnarray*} T_{\mathrm{cr}}\approx \sqrt{\frac{1}{G\rho }}, \end{eqnarray*}$$

(2)

where ρ is the local mean density. After core collapse, the density contrast between the centre and the half-mass radius of star clusters can be an order of 10⁴ (e.g. Binney & Tremaine 1987; Wang et al. 2016). Thus, T_cr varies about 100 times from the centre to the halo. Even without binaries, the individual-time-step method is necessary to efficiently handle such a large range.

2.3 Relaxation time

Two-body relaxation is one important physical process that determines the long-term behaviour of the N-body system. The phenomenons of core collapse, mass segregation, and escaping of stars all depend on it.

The relation between T_r and T_cr can be described as (e.g. Binney & Tremaine 1987)

$$\begin{eqnarray*} T_{\mathrm{r}}\approx \frac{0.1 N}{\ln \Lambda } T_{\mathrm{cr}}, \end{eqnarray*}$$

(3)

where ln Λ is Comlumb logarithm. The factor N in equation (3) indicates that for a global cluster with million stars, the ratio between T_r and T_cr is very large. For a single-mass system in virial equilibrium, the averaged T_r measured at the half-mass radius of the system (R_h) can be estimated by (Spitzer 1987)

$$\begin{eqnarray*} T_{\mathrm{rh}}\approx 0.138 \frac{N^{1/2} R_{\mathrm{h}}^{3/2}}{\langle m \rangle ^{1/2} G^{1/2} \ln \Lambda }, \end{eqnarray*}$$

(4)

Typically, GCs in the Galaxy have T_rh in an order of Gyr and already passed a few T_rh.

To study the long-term evolution of star clusters, the numerical simulations need to cover at least one T_rh. However, the time resolution of the integration should be less than T_bin and T_cr. As the maximum of T_rh/T_bin ∼ 10¹¹, the classical integrators using individual-time-step methods (e.g. fourth-order Hermite with block time-steps) is not practically possible to handle such expensive calculations. Therefore, the sophisticated N-body codes (e.g. nbody6(++gpu)) apply special algorithms to reduce the computational operations.

3 DIRECT N-BODY METHOD

We provide a short review of the algorithms used in the direct N-body code, especially nbody6(++gpu) (Aarseth 2003; Nitadori & Aarseth 2012; Wang et al. 2015), that are designed to deal with the multiple time-scale issue. A part of the algorithms are also implemented in the petar code.

3.1 Individual time-steps

The performance of force interaction calculation in direct N-body simulations are usually considered as O(N²) due to the pair interaction between all particles (stars). However, this is only the case when the interaction between all particles are needed. When the multiple time-scale issue exists, sophisticated N-body codes for simulation star clusters like nbody6(++gpu) use individual time-steps for each particle. Thus particles with different T_cr can use suitable integration steps to avoid expensive O(N²) calculations every step. In nbody6(++gpu), the fourth-order Hermite integrator with the block-time-step method is used. The block-time-step method normalizes the step size to be an integer power of 0.5, so that the implementation of multiple-core parallelization becomes possible. The performance of interaction calculation per step is O(N〈N_act〉), where 〈N_act〉 is the averaged active particle number that need the update of forces at one step. Makino & Hut (1988) found that the total number of pair interactions depends on O(N^7/3)/T_cr if the system has a power-law density distribution with power-index α < 24/11. For α > 24/11, the scaling relation depends on α. Using equation (3), the scaling becomes O(N^10/3/ln Λ)/T_r for α < 24/11. Thus, as N increases, the computational cost grows rapidly.

3.2 AC neighbour scheme

To reduce the computation cost when N is large, Ahmad & Cohen (1973) introduced the (AC) neighbour scheme, where the force on a particle is split into the short-distance (neighbour) part and long-distance (regular) part. As the long-distance force changes smoothly, it can be updated with a larger time-step (regular step) compared to that of the neighbour force. Between two regular steps, the long-distance force is estimated by a second-order prediction. Usually, the number of neighbour particles is a small fraction of total N (the order of 10–10²), thus the total number of pair interactions is significantly reduced. Especially, when small neighbour steps are required to handle close encounters and binaries, only 10–10² force evaluations are needed per step while the regular step can be much larger. Thus, the frequent O(N) calculation is avoided. The speed gained by the AC scheme is roughly proportional to N^1/4 without binaries (Makino & Hut 1988; Makino & Aarseth 1992).

If no short-period binaries exist, individual time-steps combined with the AC neighbour scheme is an efficient method for simulating the long-term evolution of star clusters. However, it is still not sufficient to handle the very large time-scale gap caused by the short-period binaries.

3.3 Binary integrator

Since short-period binaries are very compact, the perturbation from neighbour particles is usually very weak unless a close encounter happens. Based on this feature, nbody6(++gpu) codes do not evolve such binaries and treat them as single (centre-of-the-mass) particles, until the perturbation becomes strong enough. These frozen binaries are named as ‘isolated binaries’ in the codes. Thus, only the internal motion of strongly perturbed binaries are actually integrated. Besides, the time-steps to integrate the internal motion are much smaller than the neighbour steps. To avoid large number of pair interactions, only the force from nearby perturbers is included. The typical number of such perturbers for a binary is less than 5. Therefore, the computational cost is reduced significantly.

The major perturber selection criterion is based on the strength of tidal force (Aarseth 2003, equation 8.58):

$$\begin{eqnarray*} r_{\mathrm{p}} \lt \left[\frac{2m_{\mathrm{p}}}{M\gamma _{\mathrm{min}}}\right]^{1/3}R, \end{eqnarray*}$$

(5)

where r_p is the distance between the perturber and the closest component in the binary; m_p is the mass of the pertuber and M is the mass of the binary; R is the apo-centre distance; and γ_min is a free coefficient.

However, this criterion may ignore the impact from the whole system because it checks only the individual neighbours, but not the cumulative effect from the group of particles with the similar distances and directions. For example, the central region of a GC can contain 10⁵ M_⊙. If we consider its centre-of-mass m_p ≈ 10⁵ M_⊙, the corresponding criterion r_{p, G} is 300 times larger than the case (r_{p, s}) of a normal star with 1 M_⊙. But by using this criterion, most stars outside r_{p, s} are excluded. For the relative wide binaries in the outside region of a GC, this cumulative effect may be significant and cannot be ignored. This is one potential problem when only close perturbers are selected.

On the other hand, to handle the highly eccentric binaries and close encounters, the Kustaanheimo & Stiefel (1965, KS) regularization and ‘Algorithmic Regularization’ (AR; Mikkola & Tanikawa 1999) are used. The regularization method avoids the singularity of Newtonian force when two particles get very close, thus no small time-steps are required while the accuracy of integration is high enough.

3.4 Slow-down algorithm

Mikkola & Aarseth (1996) developed the slow-down (SD) method that can significant reduce the number of integration steps for weakly perturbed binaries. The key idea is to modify the Hamiltonian of a system with a binary as

$$\begin{eqnarray*} H_\mathrm{sd}= \frac{1}{\kappa } H_{\mathrm{b}}+ (H - H_{\mathrm{b}}), \end{eqnarray*}$$

(6)

where H_sd is the new Hamiltonian and H_b is the Hamiltonian of the binary components. The κ is a scaling factor that slows down the motion of the binary that the effective period becomes T_bin/κ. Thus, the number of integration steps for this binary is also reduced by κ times. It is shown in Mikkola & Aarseth (1996) that the secular motion of binary can be correctly reproduced while the orbital phase information is lost.

The nbody6(++gpu) codes apply the SD method together with the KS regularization. However, since most weakly perturbed binaries are treated as isolated binaries, and a strict limit of κ (≤10) is used, the performance improvement is not significant.

3.5 Parallelization

With the algorithms described above, the total computational cost is still significant (roughly O(N³)/T_rh) for the direct N-body method. Thus, the multiple-core parallelization is necessary to reduce the wall clock time of the computation. Spurzem (1999) and Hemsendorf, Khalisi, Omarov & Spurzem (2003) implemented the MPI parallelization for calculating the long-distant and neighbour force. Then, Nitadori & Aarseth (2012) implemented the hybrid parallelization methods (nbody6-gpu) including the openmp, GPU (CUDA), and SIMD instructions (SSE, AVX) for long-distant and neighbour integration. Wang et al. (2015) combined these two and optimize the code (nbody6++gpu) in order to perform large N simulations on supercomputers.

3.6 Bottleneck

Although nbody6++gpu can handle the million-body simulations of GCs (Wang et al. 2015), there are several bottlenecks of the code that limit the future improvement. First, the large memory space is required to save the data. For each particle, a neighbour list with few hundreds 4-bytes integers needs to be saved. Moreover, each MPI process keeps the complete copy of particle data. Thus the maximum N is limited by the maximum memory size per mpi process. For example, if the maximum neighbour number is 500, one million particles require 2 GB memory space to save the neighbour lists. Including the particle data and many others, the actual memory cost is significant. Therefore, the code cannot be used to simulate systems with a very large N even if the computing resource has a large number of computing nodes.

Secondly, the parallelization of integrating internal motions of binaries (KS regularization) is difficult. One may think that this should not be since each binary can be evolved almost independently. In reality, several parts that required a large number of operations are not possible to be parallelized. Especially, each time when the integrator for a binary needs to be switched between the KS regularization and the Hermite method, the initialization of the force requires a O(N) pair interaction. In addition, each particle has to update its neighbour list with the total memory access of O(N〈N_b〉). If switching is frequent, such cost is very large. Unfortunately, there are always wide binaries that are close to the switching conditions in a star clusters if the primordial binary fraction is significant.

Moreover, the code is initially not designed for parallelization. During the KS integration, many shared global variables (mostly for binary stellar evolution) are modified and conditional interruptions are frequently used. Thus, the shared memory parallelization method like openmp is difficult to implement and does not scale with number of threads. The thread safety is also not guaranteed. On the other hand, the floating-point operation per memory access is low due to a few number of perturbers, thus the distributed memory parallel method (mpi parallelization) also does not scale (Wang et al. 2015).

Thirdly, the code mixes the N-body integration and the stellar evolution in a complex way. It is not easy to separate the two parts. Therefore, it is not flexible to replace the implementations of stellar-evolution models. It is also challenging to maintain the code and include new features.

4 PARTICLE-TREE WITH INDIVIDUAL TIME-STEPS

The force contributed by the distant particles are much weaker than that of neighbours. Thus in the individual time-step method, even the time resolution is low for the weakly interacted particles, the accuracy is sufficient. We can consider it as an approximation on time. There are another type of N-body algorithms using an approximation on space, such as the Barnes–Hut tree (PT; Barnes & Hut 1986), the particle mesh (PM; Hockney & Eastwood 1988), and the fast multiple method (FMM; Greengard & Rokhlin 1987). Compared to the cost of O(N²) in the direct force calculation of all particles, such methods only requires O(Nlog N) (PT and PM) or O(N) (FMM). However, these methods need a shared time-step. For example, the second-order leapfrog integrator is used in the PT method. Therefore, it is difficult to use it for star clusters because of the multiple time-scale issue.

McMillan & Aarseth (1993) started the first effort to overcome this bottleneck by introducing a high-order predictor-corrector integrator with individual time-steps in the PT method. They also implemented the KS regularization to deal with binaries. Fukushige & Kawai (2016) implemented the parellelization support on the GRAPE-9 system for this method and showed that it is much faster than the Hermite integrator for million-body simulations.

The PT method approximates the long-distant force. However, the weak encounters from distant particles are important in the relaxation process. Thus, it is necessary to ensure that the PT method can correctly reproduce the relaxation. Hernquist (1987) found that the accuracy of the relaxation process in the PT method depends on the opening angle (θ). When θ < 1.0, the measured T_rh of the PT method is consistent with that of the direct N-body method.

5 HYBRID METHODS WITH HAMILTONIAN SPLITTING

Recently, hybrid numerical simulation methods become popular to solve the multiple time-scale issues. The key idea is based on the Hamiltonian splitting. If the Hamiltonian, the system can be decomposed to two parts as

$$\begin{eqnarray*} H = H_{\mathrm{L}}+ H_{\mathrm{S}} \end{eqnarray*}$$

(7)

The equation of motion can be described as

$$\begin{eqnarray*} \frac{\mathrm{d}\boldsymbol {w}}{\mathrm{d}t} = \lbrace \boldsymbol {w}, H\rbrace = \lbrace \boldsymbol {w}, H_{\mathrm{L}}\rbrace + \lbrace \boldsymbol {w}, H_{\mathrm{S}}\rbrace . \end{eqnarray*}$$

(8)

where {} is Poisson bracket. We define the differential operator |$\mathcal {L}\equiv \lbrace\ , H_{\mathrm{L}}\rbrace$| and |$\mathcal {S}\equiv \lbrace\ , H_{\mathrm{S}}\rbrace$|⁠. Using matrix exponential, the symplectic mapping from t to t + Δt can be written as

$$\begin{eqnarray*} \boldsymbol {w}(t+\Delta t) = e^{\Delta t(\mathcal {L}+\mathcal {S})}\boldsymbol {w}(t). \end{eqnarray*}$$

(9)

If the two parts have analytic solutions, the symplectic integrator can be constructed. The second-order symplectic integrator is given by

$$\begin{eqnarray*} \boldsymbol {w}(t+\Delta t) = e^{\Delta t\mathcal {L}/2}e^{\Delta t\mathcal {S}}e^{\Delta t\mathcal {L}/2}\boldsymbol {w}(t) + O(\Delta t^3). \end{eqnarray*}$$

(10)

If H_L and H_S are the potential and kinetic energy, respectively, equation (10) represents the leapfrog method with the order of kick-drift-kick.

The potential and kinetic energy are not the only combination. Wisdom & Holman (1991) first introduced the mixed variable symplectic (MVS) method to evolve the planetary system, where H_L and H_S represent the Kepler motion and interactions between planets, respectively. Then, several combinations of H_L and H_S are introduced to simulate different type of systems (e.g. Hockney & Eastwood 1988; Xu 1995; Chambers 1999; Fujii et al. 2007; Oshino, Funato & Makino 2011).

5.1 Particle–particle particle tree method

For particle-based N-body systems, one possible way of Hamiltonian splitting is using H_L and H_S to represent the long- and short-range interactions, respectively. Since H_L dominates the computation while the contribute to the pair interaction is less than H_S, the approximated methods can be used with a large fixed time-step, which provides a sufficient accuracy and a small computational cost. On the other hand, more accurate methods with smaller and individual time-steps can be applied for H_S. These hybrid methods are used for several combinations, such as PM +PP (P³M; Hockney & Eastwood 1988), PM +PT (Xu 1995), and PT +PP (P³T; Oshino et al. 2011), where PP represents the direct N-body (particle–particle) method.

The P³T method introduced by Oshino et al. (2011, Fig. 1) is specially designed to simulate the collisional systems which have the multiple time-scale issue (without binaries). The Hamiltonian splitting of this method is via a changeover function W(r_ij):

$$\begin{eqnarray*} H_{\mathrm{S}}&=& \sum _{i=1}^N\frac{p_i^2}{2 m_i} - \sum _{i\lt j}^N \frac{G m_i m_j}{r_{ij}} W(r_{ij}) \nonumber \\ H_{\mathrm{L}}&=& \sum _{=1}^N \frac{G m_i m_j}{r_{ij}} [1-W(r_{ij})] . \end{eqnarray*}$$

(11)

The purpose of W(r_ij) is to result in a smooth transition when two particles pass the boundary of long- and short-range interactions.

An illustration showing how the P3T method deal with the long- and short-range interactions for a 2D particle system. The upper panel shows the structure of the Barnes–Hut tree. The neighbour particles inside a distance criterion are collected as individual clusters. An example for one (red) particle is shown as blue points. Two lines with an opening angle, θ = 0.35°, starting from this particle are also shown. If a tree cell (like the pink box) is inside θ, its superparticle (the centre-of-the-mass with multipole expansions) is used to obtain the long-range acceleration. The short-range acceleration for this particle (shown in the bottom panel) is calculated by the high-accuracy PP method.

Figure 1.

An illustration showing how the P³T method deal with the long- and short-range interactions for a 2D particle system. The upper panel shows the structure of the Barnes–Hut tree. The neighbour particles inside a distance criterion are collected as individual clusters. An example for one (red) particle is shown as blue points. Two lines with an opening angle, θ = 0.35°, starting from this particle are also shown. If a tree cell (like the pink box) is inside θ, its superparticle (the centre-of-the-mass with multipole expansions) is used to obtain the long-range acceleration. The short-range acceleration for this particle (shown in the bottom panel) is calculated by the high-accuracy PP method.

Open in new tab Download slide

Iwasawa, Portegies Zwart & Makino (2015) developed a GPU-parallelized P³T code and compared its performance with the Hermite method. They showed that the new scheme can be 10 times faster. This high performance encourages us to advance in this direction by combining the binary solver into the P³T method in order to properly handle the short-time interval close interactions.

6 HYBRID N-BODY CODE: petar

We introduce our new hybrid N-body code, petar, which combines the P³T method (Oshino et al. 2011; Iwasawa et al. 2015, 2016, 2017) and the slow-down time-transformed symplectic integrator (SDAR; Wang, Nitadori & Makino 2020). The framework of pentacle is the base of the code (Iwasawa et al. 2017). The parallelization framework for developing particle simulation codes (fdps; Iwasawa et al. 2016, 2020) are used to deal with the PT construction and long-range force calculation. The sdar code, which combines the fourth-order Hermite and the SDAR integrators.¹ is used for the short-range interaction. Fig. 2 show how petar works for one cycle of integration. It can be summaries as:

Decompose domains: distribute particles to different mpi processes.
Search neighbours and clustering: construct particle tree, search neighbours for each particle and gather all nearby particles into individual clusters (Section 6.3).
Find groups and create artificial particles: in each cluster, find subsystems (groups) and if necessary, create artificial particles for each group (Sections 6.2 and 6.4).
Calculate long-range force and kick velocities (P³T-kick): construct particle tree that includes artificial particles, calculate the long-range interaction and kick the velocities of all particles.
Integrate motions in each clusters (P³T-drift): in each cluster, use the Hermite and SDAR method to integrate the motions controlled by the short-range interaction.

Figure 2.

The schedule of one kick-drift-kick cycle of petar. In order to have a clear view, the 2D particle system is illustrated here. From left to right: (1) domain decomposition splits particles in space and distributes them to different mpi processes. This step is done every few cycles. (2) PT is constructed for searching neighbours and clustering. Individual clusters are marked as different colours. (3) For individual clusters, groups of multiple systems (binaries, triples ...; marked as red points) are detected and artificial particles (tidal-tensor and orbit-sampling/pseudo-particles; blue points) are added to particle systems. Artificial particles are represented by four points along a rectangle. (4) PT is constructed with artificial particles and used to calculate the long-range force. Then velocities of particles are kicked. (5) For individual clusters, particle positions, and velocities are integrated by the Hermite and SDAR methods (drift).

Open in new tab Download slide

The kick-drift-kick mode (equation 10) is used, thus in the final step, (i)–(iv) are executed once more. The first and the last P³T-kick take the half of Δt_L.

6.1 Mass-dependent changeover function

In pentacle, the seventh-order polynomial type of changeover function K(r_ij) (the derivative of W(r_ij) with respect to r_ij) is implemented (Iwasawa et al. 2017). This K(r_ij) ensures that all terms of derivatives of force used in the fourth-order Hermite integrator have a smooth changes at the boundary of the changeover range. However, there are two limits. First, the changeover function for potential W(r_ij) contains the term of log (r_ij), which is computational expensive. Secondly, the changeover range is fixed for all particles, i.e. r_in and r_out are constant. However, in star clusters, the mass spectrum has a wide range where the ratio between the maximum and minimum mass can be as large as 10⁴. If a very massive object like SMBH exists, the ratio can be 10⁸. For the same distance, the force from massive objects are larger than that of the low-mass objects. Thus the fixed changeover range cannot properly handle the systems with a wide mass spectrum.

We introduce a mass-dependent changeover function, where each particle has an individual changeover range (r_{in, i}, r_{out, i}). The cubic root of the particle mass is used as the coefficient to determine the boundary:

$$\begin{eqnarray*} r_{\mathrm{in},i}&= & \mathrm{max} \left(1, \frac{m_i}{\langle m \rangle } \right)^{\frac{1}{3}}r_{\mathrm{in,ref}}, \nonumber\\ r_{\mathrm{out},i}&=& \mathrm{max} \left(1, \frac{m_i}{\langle m \rangle } \right)^{\frac{1}{3}}r_{\mathrm{out,ref}}, \end{eqnarray*}$$

(12)

where r_{in, ref} and r_{out, ref} are the reference of a fixed changeover range, and 〈m〉 is the average mass of the system. The minimum mass factor is 1.0 so that low-mass particles can avoid too small changeover radii.

If two particles i and j has a separation r_ij < r_{in, ij}, the perturbation from a distant particle k versus the internal force between the two particles can be estimated as

$$\begin{eqnarray*} f_{\mathrm{p}}(r_{\mathrm{cm},k}) = \frac{m_{k}}{m_{i}+m_{j}} \left(\frac{r_{ij}}{r_{\mathrm{cm},k}} \right)^3, \end{eqnarray*}$$

(13)

where r_{cm, k} is the distance between the centre-of-the-mass of the pair i and j and the perturber k. If m_k > m_i + m_j, the changeover radii between the pair and the perturber are determined by m_k. Equation (13) indicates that

$$\begin{eqnarray*} f_{\mathrm{p}}(r_{\mathrm{in},k}) = \mathrm{min} \left(1, \frac{\langle m \rangle }{m_{i}+m_{j}} \right)\left(\frac{r_{ij}}{r_{\mathrm{in,ref}}} \right)^3. \end{eqnarray*}$$

(14)

Thus, f_p at the changeover boundary is independent of m_k. Therefore, equation (12) is sufficient to handle the tidal perturbation from massive objects.

On the other hand, to avoid logarithmic function, we use the eighth-order polynomial function as the changeover function for potential:

$$\begin{eqnarray*} W(x) = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\Lambda (1-2 x) &\quad (x\le 0)\\ \Lambda (1-2 x) - 1 + f(x) &\quad (0 \lt x \lt 1)\\ 0 & \quad (x\ge 1) \end{array}\right. \end{eqnarray*}$$

(15)

where

$$\begin{eqnarray*} f(x) & =& 1 + \Lambda x^5 \left(14 - 28 x + 20 x^2 -5 x^3 \right) \nonumber\\ x & =& \frac{r_{ij}-r_{\mathrm{in},ij}}{r_{\mathrm{out},ij}-r_{\mathrm{in},ij}} \nonumber\\ \Lambda & =& \frac{r_{\mathrm{out},ij}-r_{\mathrm{in},ij}}{r_{\mathrm{out},ij}+r_{\mathrm{in},ij}} \nonumber\\ r_{\mathrm{in},ij}&=& \max (r_{\mathrm{in},i},r_{\mathrm{in},j}) \nonumber\\ r_{\mathrm{out},ij}&=& \max (r_{\mathrm{out},i}, r_{\mathrm{out},j}). \end{eqnarray*}$$

(16)

We use x instead of r_ij in the formulae. The changeover function for force has the form:

$$\begin{eqnarray*} K(x) = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}1 & \quad (x \le 0)\\ (x - 1)^4 \big(1 + 4x + 10 x^2&\\ \qquad\quad + \,20 x^3 + 35 \Lambda x^4 \big) & \quad (0 \lt x \lt 1)\\ 0 & \quad (x \ge 1) \end{array}\right. \end{eqnarray*}$$

(17)

For 0 < x < 1, W(x) and K(x) are related to f(x) by

$$\begin{eqnarray*} W(x) & =& f(x) - f(1) \frac{r_{ij}}{r_{\mathrm{out},ij}} \nonumber\\ K(x) &=& f(x) - \left(x + \frac{r_{\mathrm{in},ij}}{r_{\mathrm{out},ij}-r_{\mathrm{in},ij}}\right) f^{(1)}(x), \end{eqnarray*}$$

(18)

where the number in superscript ‘()’ indicates the times of derivative with respect to x. The second term in the expression of W(x) is an offset ensuring that the potential becomes zero at r_{out, ij}. At the boundary (x = 0, 1), the changeover functions have values:

$$\begin{eqnarray*} \begin{array}{rr} W(1) =0, & \\ K(0) =1, &\quad K(1) =0, \\ K^{(1)}(0) = 0, & \quad K^{(1)}(1)=0, \\ K^{(2)}(0) = 0, & \quad K^{(2)}(1)=0, \\ K^{(3)}(0) = 0, & \quad K^{(3)}(1)=0.\\ \end{array} \end{eqnarray*}$$

(19)

The potential and force of H_S reduces to zero after x ≥ 1. All derivatives of K(x) are zero at the boundary. These ensure that the higher order (up to 3) derivatives of force used in the fourth-order Hermite integrator have smooth curves at r_{in, ij} and r_{out, ij}. Fig. 3 shows the examples of these functions with r_{out, ij} = 10 and r_{in, ij} = 1.

The shapes of the changeover functions for potential, W(x), and for force, K(x), and its nth time derivatives, K(n)(x). Here, rin, ij = 1 and rout, ij = 10.

Figure 3.

The shapes of the changeover functions for potential, W(x), and for force, K(x), and its nth time derivatives, K⁽ⁿ⁾(x). Here, r_{in, ij} = 1 and r_{out, ij} = 10.

Open in new tab Download slide

Since r_{in, ij} and r_{out, ij} take the maximum values from the changeover radii of i and j particles, the strong force from the massive particles are always preferentially included in H_S for a high accuracy. Besides, for both particles, the changeover range is identical, thus the force is symmetric.

6.1.1 Varying changeover radii

To ensure that the P³T method is symplectic, once the changeover radii of all particles are determined, they should keep unchanged during the integration. However, in real star clusters, masses of stars evolve due to the stellar-wind-driven mass loss or the mass transfer and mergers of two stars. Besides, binary can form, disrupt and change members. Thus, after a certain time, the changeover radii of one star or binary may not be suitable anymore and need to be recalculated by using the new mass. This breaks the symplectic properties of the integrator. To minimize the effect, the modification of changeover radii and masses can only be done after a complete leapfrog step.

6.2 Slow-down time-transformed symplectic integrator

Short-period binaries are challenging not only due to the time-consuming integration, but also because of the large cumulative numerical errors after many orbits. The symplectic integrator can conserve Hamiltonian and angular momentum for the long-term evolution. However, it requires a constant integration step, thus very small time-steps have to be used for highly eccentric Kepler orbits and close encounters. One way to avoid small steps is to use the extended phase-space Hamiltonian,

$$\begin{eqnarray*} \Gamma (\boldsymbol {W}) = g(\boldsymbol {W}) \left[H(\boldsymbol {w},t) - H(\boldsymbol {w}(0),0)\right], \end{eqnarray*}$$

(20)

where |$H(\boldsymbol {w},t)$| is the standard Hamiltonian, |$g(\boldsymbol {W})$| is time-transformation function, and |$\boldsymbol {W}$| is the extended phase-space vector that contains |$\boldsymbol {w}$| and new pair of the coordinate, t, and the corresponding conjugate momentum p_t. By introducing the new differential variable, s, the equation of motion can be described as

$$\begin{eqnarray*} \frac{\mathrm{d}\boldsymbol {W}}{\mathrm{d}s} = \lbrace \boldsymbol {W}, \Gamma (\boldsymbol {W}) \rbrace \end{eqnarray*}$$

(21)

Thus, the time- and integration steps are decoupled via the time transformation function. For eccentric orbits, time-steps can vary based on the requirement of accuracy and efficiency while ds keeps constant. In order to use explicit symplectic method, |$g(\boldsymbol {W})$| should be designed to make |$\Gamma (\boldsymbol {W})$| separable like equation (7). Mikkola & Tanikawa (1999) and Preto & Tremaine (1999) provided such a solution by using

$$\begin{eqnarray*} g(\boldsymbol {W}) = \frac{f(T(\boldsymbol {P})) - f(-U(\boldsymbol {R}))}{T(\boldsymbol {P}) + U(\boldsymbol {R})} , \end{eqnarray*}$$

(22)

where |$T(\boldsymbol {P})$| and |$U(\boldsymbol {R})$| are kinetic and potential energy in the extended phase space. When f(x) = log (x) with a leapfrog integrator in the drift-kick-drift mode, the Kepler orbit can be integrated very accurately with only round-off errors in positions and velocities and a phase error of time. In Mikkola & Tanikawa (1999), this is named ‘AR’. The AR method can well solve the issue of the long-term cumulative errors.

On the other hand, the SD method described in Section 3.4 can reduce the total integration steps for weakly perturbed binaries. Wang et al. (2020) combined the SD and AR methods and developed the SDAR algorithm to efficiently and accurately integrate the few-body systems. In their work, κ is calculated by the perturbation and time-scale criteria. We set the tree time-step, C_sΔt_L, as the maximum time-scale criteria where C_s is a coefficient larger than one. In such case, if κ of a weakly perturbed binary reaches the maximum value, Δt_L is small enough to resolve the orbit of the binary in order to provide the correct P³T-kick. With the SD method, the actual integration steps of binaries for a given physical time interval are decoupled from the real T_bin but depends on the perturbation and Δt_L. Since most short-period binaries are weakly perturbed in a star cluster and their T_bin ≪ Δt_L, the total number of integration steps are significantly reduced by a few thousand times. Thus, the SDAR method is the major algorithm in the petar code to solve the multiple time-scale issue.

In the P³T-drift step, we use the SDAR method for compact groups of particles and the Hermite integrator for integrating the motions of singles and centre-of-the-mass of groups. The particle groups are determined by a distant criterion, r_ij < r_{g, ij}, where

$$\begin{eqnarray*} r_{\mathrm{g},ij}= \theta r_{\mathrm{in},ij}, \end{eqnarray*}$$

(23)

and θ is the opening angle of the PT method. This is not a strict criterion. We use equation (23) so that any pair of members in a group are always inside their inner boundaries of changeover ranges. Thus, the SDAR method only need to deal with the Newtonian force. This avoids the complexity of using the SDAR method and changeover functions together. Besides, if one particle receives a long-range interaction from the group, its members are inside the angle θ viewing from this particle.

6.3 Clustering

In Section 3.6, we show that the switch between the regularization method and the Hermite integrator is expensive with an O(N〈N_b〉) memory access and an O(N) force calculation in nbody6(++gpu). This issue is general for the hybrid methods that use neighbour list and need force calculation for the centre-of-the-masses of groups. In petar, we use the clustering scheme to avoid such expensive switching. This scheme is originally implemented in the pentacle code (Iwasawa et al. 2017) with a uniform neighbour radius. Here, we describe the idea and introduce the improved algorithm based on the orbit-dependent neighbour criterion.

After searching short-range interacting neighbours and before Hermite integration, particles are collected together into different clusters (Fig. 2). The clustering scheme ensures that any member in one cluster have all its neighbours inside the same cluster. In such case, particles outside this cluster only provide the long-range interaction to the members. Thus, during the integration of the short-range interactions (P³T-drift), each cluster is isolated to others and can be integrated in parallel. This feature leads to a great advantage: the switch between the different integration methods in one cluster only affect the neighbour lists and forces of the local members. Since the typical number of members per cluster is a small fraction of the total number of particles, the computational cost of switching is much less.

On the other hand, when mpi parallelization is used, sometimes one cluster may contain members crossing multiple mpi processes (like the cluster, OMP:1, shown in Fig. 2). In such case, one mpi process is chosen to be the host for the cluster and others send particle data to it.

6.3.1 Orbit-dependent neighbour criterion

The number of members in clusters determine the performance of P³T-drift. Thus, it is important to choose a proper neighbour searching criterion. In star clusters with a mass spectrum, we cannot apply the uniform neighbour radius as in pentacle. Instead, we determine the individual neighbour searching radius, r_{nb, i}, based on r_{out, i} and the velocity. First, for each particle, r_{nb, i} must be longer than r_{out, i}. However, we cannot set these two radii the same because during P³T-drift, particles that are initially not inside the short-range interaction region can move closer and penetrate the boundary. Therefore, r_{nb, i} should be long enough to capture such potential neighbours. One safe way is to include the velocity information that

$$\begin{eqnarray*} r_{\mathrm{nb},i}= r_{\mathrm{out},i}+ C_{\mathrm{r}}|\boldsymbol {v}_i| \Delta t_{\mathrm{L}}, \end{eqnarray*}$$

(24)

where |$\boldsymbol {v}_i$| is the particle velocity and C_r is a free coefficient (we use 3.0 for safety).

However, this criterion is independent on the direction of velocity. If the particle velocity is large, r_{nb, i} is significantly long that a huge cluster can form. Unfortunately, high-velocity particles are commonly generated via few-body interactions in star clusters. Thus, we need to reduce the neighbour numbers for these particles. For a high-velocity particle, only neighbours along its path are important and most particles inside r_{nb, i} are not real neighbours. To avoid including these unnecessary neighbours, a 3D neighbour searching criterion which depends on the direction of velocity is needed. However, such criterion is not computationally efficient and is not supported by the current version of fdps. To solve this issue, we apply a two-stage method:

Obtain the neighbour candidates by applying the spherically symmetric neighbour searching using equation (24).
Select true neighbours if the candidate j has the Kepler orbital pericentre separation, r_{p, ij} < C_vr_{nb, i}, where C_v is a free coefficient (e.g. 1.5).

The first step has the calculation cost of O(Nlog N) by using the PT method. Since a proper Δt_L leads to a large fraction of particles with no neighbour candidates, the cost of evaluations of Kepler orbital pericentre at the second step is not expensive. Thus, this method is efficient to deal with the problem of high-velocity particles.

6.4 Artificial particle algorithm for weak perturbation

Both the changeover function and Δt_L influence the performance and accuracy of the simulations. We can understand this by analysing a situation where a binary receives the long-range perturbation force. Although the long-range force is much weaker compared to the internal force of the binary, it is still important to ensure that Δt_L < T_bin. Otherwise a random phase of binary is chosen to evaluate the long-range force, which does not represent the correct perturbation. This is the same for the counter force. If the binary is very massive, this error can be significant.

However, keeping Δt_L < T_bin is difficult since T_bin can be very small. In Section 6.2, we show that the SD method can artificially increase T_bin, which helps to avoid too small Δt_L. But only weakly perturbed binaries can have large enough κ. When a tight binary has close neighbours, the effective T_bin can be much smaller than Δt_L. This can frequently happen in star clusters.

To solve this issue, we introduce the ‘artificial particle algorithm’. In this algorithm, instead of calculating the long-range force once and giving a large velocity kick per Δt_L, we can construct the local potential (tidal-tensor) field near the binary and use it to calculate a smooth evolution of the long-range perturbation every AR step. The tidal-tensor field can be obtained by measuring the long-range forces of a group of artificial particles near the binary. On the other hand, another group of artificial particles along or near the orbit of the binary can be used to to represent the correct orbit-averaged counterforce.

This algorithm increases the total number of particles, thus the number of long-range interactions becomes more. However, this additional cost can be easily reduced by increasing the number of computing cores. If we use small Δt_L, there is no such simple solution. Besides, adding artificial particles is easy to implement. This is especially convenient for using the fdps library and the accelerators such as SIMD and GPU.

6.4.1 Tidal tensor

Here, we describe the algorithm to obtain the local tidal-tensor field near the binary. Based on a 3D Taylor expansion, the acceleration of a particle at an arbitrary position near a fixed centre can be evaluated by:

$$\begin{eqnarray*} \boldsymbol {A}_i(\boldsymbol {r}) &=& \boldsymbol {A}_i(\boldsymbol {r}_{\mathrm{cm}}) + \boldsymbol {A}_{ij} \cdot (\boldsymbol {r}-\boldsymbol {r}_{\mathrm{cm}})_{j} \nonumber\\ &&+ \,\frac{1}{2!} (\boldsymbol {r}-\boldsymbol {r}_{\mathrm{cm}})_{j}^T \cdot \boldsymbol {A}_{ijk} \cdot (\boldsymbol {r}-\boldsymbol {r}_{\mathrm{cm}})_{k} + ... \nonumber\\ A_{ij} &=& \frac{\partial A_{i}}{\partial r_{j}}\nonumber\\ A_{ijk} &=& \frac{\partial ^2 A_{i}}{\partial r_{j}\partial r_{k}} \end{eqnarray*}$$

(25)

where A_ij and A_ijk are individual components of the tensors, |$\boldsymbol{A}_{ij}$| and |$\boldsymbol {A}_{ijk}$|⁠, respectively; and ‘·’ represents matrix multiplication. In the P³T method, the numerical long-range forces are constant within one Δt_L, so should be the tensor field. Thus, we only need to measure the tensors once per Δt_L. Then, using equation (25), the long-range perturbation on an arbitrary orbital phase of the binary can be evaluated during the P³T-drift. For the gravitational field, |$\boldsymbol {A}_{ij}$| and |$\boldsymbol {A}_{ijk}$| are symmetric tensors. The number of elements of the first three orders are 3 (⁠|$\boldsymbol {A}_i(\boldsymbol {r}_{\rm {cm}})$|⁠), 6 (⁠|$\boldsymbol {A}_{ij}$|⁠), and 10 (⁠|$\boldsymbol {A}_{ijk}$|⁠), respectively. Thus, the second-order method has totally 9 elements and the third order has 19.

To obtain these tensor elements, we can create measure points (zero-mass artificial particles) near the centre-of-the-mass of the binary. These artificial particles obtain the long-range interactions during P³T-kick. Using the 3D accelerations of one measure point in equation (25), we can get three independent linear equations of the tensor elements. At least three measure points are needed to obtain the unique values of tensors up to the second order. The third-order case requires seven points.

The acceleration of the centre-of-the-mass can be used to directly measure the zero-order acceleration, |$\boldsymbol {A}(\boldsymbol {r}_{\mathrm{cm}})$|⁠. We collect other components of the tensors in 1D vectors for the second- (2nd) and third- (3rd) order methods:

$$\begin{eqnarray*} \begin{array}{lccccccc}\boldsymbol{T}_{i} (2{\mathrm{nd}}) = \big[ &\,\,\,\, A_{xx} &\,\,\,\, A_{xy} &\,\,\,\, A_{xz} &\,\,\,\, A_{yy} &\,\,\,\, A_{yz} &\,\,\,\, A_{zz} &\,\,\,\, \big]\\ \boldsymbol{T}_{i} (3{\mathrm{rd}}) = \big[ &\,\,\,\, A_{xx} &\,\,\,\, A_{xy} &\,\,\,\, A_{xz} &\,\,\,\, A_{yy} &\,\,\,\, A_{yz} &\,\,\,\, A_{zz} &\,\,\,\,\\ &\,\,\,\, A_{xxx} &\,\,\,\, A_{xxy} &\,\,\,\, A_{xxz} &\,\,\,\, A_{xyy} &\,\,\,\, A_{xyz} &\,\,\,\, A_{xzz}&\,\,\,\,\\ &\,\,\,\, A_{yyy} &\,\,\,\, A_{yyz} &\,\,\,\, A_{yzz} &\,\,\,\, A_{zzz} &\,\,\,\, &\,\,\,\, &\,\,\,\, \big]. \end{array} \end{eqnarray*}$$

(26)

The accelerations of measure points excluding |$\boldsymbol {A}(\boldsymbol {r}_{\mathrm{cm}})$| can be also collected as a 1D vector:

$$\begin{eqnarray*} \begin{array}{lccc}\boldsymbol {A^{\prime }}_{\mathrm{j}} = &\quad \big[ A^{\prime }_{\mathrm{x}}(\boldsymbol {r}_{\mathrm{1}}) &\quad A^{\prime }_{\mathrm{y}}(\boldsymbol {r}_{\mathrm{1}}) &\quad A^{\prime }_{\mathrm{z}}(\boldsymbol {r}_{\mathrm{1}})\\ &\quad A^{\prime }_{\mathrm{x}}(\boldsymbol {r}_{\mathrm{2}}) &\quad A^{\prime }_{\mathrm{y}}(\boldsymbol {r}_{\mathrm{2}}) &\quad A^{\prime }_{\mathrm{z}}(\boldsymbol {r}_{\mathrm{2}})\\ &\quad ... &\quad &\quad \big], \end{array} \end{eqnarray*}$$

(27)

where |$\boldsymbol {A}^{\prime }(\boldsymbol {r})=\boldsymbol {A}(\boldsymbol {r})-\boldsymbol {A}(\boldsymbol {r}_{\mathrm{cm}})$| and the suffixes, 1,2,3,..., are the indices of points.

Based on equation (25), |$\boldsymbol {T_{\mathrm{i}}}$| and |$\boldsymbol {A}_{\mathrm{j}}$| can be described by a linear mapping:

$$\begin{eqnarray*} \boldsymbol {M}_{ij} \boldsymbol {T}_{\mathrm{i}} = \boldsymbol {A}_{\mathrm{j}} \end{eqnarray*}$$

(28)

Once the generalized inverse matrix, |$\boldsymbol {M}_{ij}^{-1}$|⁠, is obtained, |$\boldsymbol {T}_{\mathrm{i}}$| can be easily calculated once |$\boldsymbol {A}_{\mathrm{j}}$| are measured.

In principle 2 points excluding the centre-of-the-mass are enough for the second-order method. However, we can only obtain the 2D information in a plane. Thus we use 4 points locating at the corners of a regular tetrahedron (see the upper panel in Fig. 4). In the third-order case, the corners of a regular octahedron can provide 6 points. However, in such case, we find the rank of |$\boldsymbol {M}_{ij}$| is not full, so that |$\boldsymbol {M}_{ij}^{-1}$| cannot be constructed. Thus, we use 8 points locating at the corners of a cube instead (the lower panel in Fig. 4). Although two additional points are needed for both two methods, we obtain the benefit that the condition numbers of the matrices (the maximum singular value versus the minimum) are small: 2 and 12.7 for the second- and the third-order methods, respectively. This means that the relative error of measurement inherited from |$\boldsymbol {A}_{\mathrm{j}}$| can be maximally enlarged by a factor of 2 or 12.7 in |$\boldsymbol {M}_{ij}^{-1}$|⁠. The exact values of the elements in |$\boldsymbol {M}_{ij}^{-1}$| can also be obtained easily. Table 1 provides the complete formulae to evaluate |$\boldsymbol {T}_{\mathrm{i}}$| and to calculate the acceleration at any |$\boldsymbol {r}$|⁠. The corresponding coordinates of the measure points are also provided.

Figure 4.

The illustration of the spatial distribution of the artificial particles in the tidal-tensor, orbit-sampling, and pseudo-particle multipole methods. Open pentagons at the corners and the centre are tidal-tensor measure points. Filled stars along the orbits of two binary components are pseudo-particles or orbit-sampling particles used for evaluating the orbit-averaged long-range counterforce to distant particles. Notice here two combinations are shown, it is possible to combine different orders of tidal tensors with either pseudo-particle multipole or orbit-sampling.

Open in new tab Download slide

Table 1.

Open in new tab

The second- (upper block) and third-order (lower block) tidal tensor methods. Each block contains three parts: (1) the formulae to calculate the tensor coefficients, |$\boldsymbol {A}_{ij}$| and |$\boldsymbol {A}_{ijk}$|⁠, where the centre-of-the-mass acceleration is subtracted in |$\boldsymbol {A}^{\prime }(\boldsymbol {r}_{\mathrm{i}})$|⁠; (2) the formulae to calculate the acceleration |$\boldsymbol {A}(\boldsymbol {r})$| for a given position, where |$\boldsymbol {r}^{\prime }$| (x^′, y^′, z^′) is the coordinate referring to the centre-of-the-mass (⁠|$\boldsymbol {r}-\boldsymbol {r}_{\mathrm{cm}}$|⁠); (3) the coordinates of the measure points, where d_c is the half-length of the edge of the tetrahedron or the cube.

Second-order tidal tensor with four measure points at corners of a regular tetrahedron
A_xx	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_2)\big]$\|
A_xy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$\|
A_xz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$\|
A_yy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_4)\big]$\|
A_yz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$\|
A_zz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$\|
\|$A_{x}(\boldsymbol {r})$\|	=	\|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xx}x^{\prime } + A_{xy}y^{\prime } + A_{xz}z^{\prime }$\|
\|$A_{y}(\boldsymbol {r})$\|	=	\|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xy}x^{\prime } + A_{yy}y^{\prime } + A_{yz}z^{\prime }$\|
\|$A_{z}(\boldsymbol {r})$\|	=	\|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xz}x^{\prime } + A_{yz}y^{\prime } + A_{zz}z^{\prime }$\|
\|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$\|	=	\|$\left [ 1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ -1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad 1 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad -1 \quad - \frac{\sqrt{2}}{2} \right ]$\|
Third-order tidal tensor with eight measure points at corners of a cube
A_xx	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)\big]$\|
A_xy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_xz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_6)$\|
		\|$+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_yy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)\big]$\|
A_yz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_6)$\|
		\|$+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_zz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_xxx	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$\|
A_xxy	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_xxz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$\|
		\|$+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_xyy	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_xyz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_2)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_xzz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$\|
A_yyy	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_yyz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$\|
		\|$- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_yzz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_zzz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_5)$\|
		\|$+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
\|$A_{x}(\boldsymbol {r})$\|	=	\|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxx}x^{\prime 2} + 2A_{xxy}x^{\prime }y^{\prime } + 2A_{xxz}x^{\prime }z^{\prime } + A_{xx}x^{\prime } + A_{xyy}y^{\prime 2} + 2A_{xyz}y^{\prime }z^{\prime } + A_{xy}y^{\prime } + A_{xzz}z^{\prime 2} + A_{xz}z^{\prime }$\|
\|$A_{y}(\boldsymbol {r})$\|	=	\|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxy}x^{\prime 2} + 2A_{xyy}x^{\prime }y^{\prime } + 2A_{xyz}x^{\prime }z^{\prime } + A_{xy}x^{\prime } + A_{yyy}y^{\prime 2} + 2A_{yyz}y^{\prime }z^{\prime } + A_{yy}y^{\prime } + A_{yzz}z^{\prime 2} + A_{yz}z^{\prime }$\|
\|$A_{z}(\boldsymbol {r})$\|	=	\|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxz}x^{\prime 2} + 2A_{xyz}x^{\prime }y^{\prime } + 2A_{xzz}x^{\prime }z^{\prime } + A_{xz}x^{\prime } + A_{yyz}y^{\prime 2} + 2A_{yzz}y^{\prime }z^{\prime } + A_{yz}y^{\prime } + A_{zzz}z^{\prime 2} + A_{zz}z^{\prime }$\|
\|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$\|	=	\|$\left [\,1\quad 0 \quad -1\right ] \quad \left [\,0\quad 1 \quad -1\right ]\quad \left [\,-1\quad 0 \quad -1\right ]\quad \left [\,0\quad -1 \quad -1\right ]\quad \left [\,1\quad 0 \quad 1\right ]\quad \left [\,0\quad 1 \quad 1\right ]\quad \left [\,-1\quad 0 \quad 1\right ]\quad \left [\,0\quad -1 \quad 1\right ]$\|

Second-order tidal tensor with four measure points at corners of a regular tetrahedron
A_xx	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_2)\big]$\|
A_xy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$\|
A_xz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$\|
A_yy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_4)\big]$\|
A_yz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$\|
A_zz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$\|
\|$A_{x}(\boldsymbol {r})$\|	=	\|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xx}x^{\prime } + A_{xy}y^{\prime } + A_{xz}z^{\prime }$\|
\|$A_{y}(\boldsymbol {r})$\|	=	\|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xy}x^{\prime } + A_{yy}y^{\prime } + A_{yz}z^{\prime }$\|
\|$A_{z}(\boldsymbol {r})$\|	=	\|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xz}x^{\prime } + A_{yz}y^{\prime } + A_{zz}z^{\prime }$\|
\|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$\|	=	\|$\left [ 1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ -1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad 1 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad -1 \quad - \frac{\sqrt{2}}{2} \right ]$\|
Third-order tidal tensor with eight measure points at corners of a cube
A_xx	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)\big]$\|
A_xy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_xz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_6)$\|
		\|$+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_yy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)\big]$\|
A_yz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_6)$\|
		\|$+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_zz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_xxx	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$\|
A_xxy	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_xxz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$\|
		\|$+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_xyy	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_xyz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_2)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_xzz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$\|
A_yyy	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_yyz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$\|
		\|$- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_yzz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_zzz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_5)$\|
		\|$+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
\|$A_{x}(\boldsymbol {r})$\|	=	\|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxx}x^{\prime 2} + 2A_{xxy}x^{\prime }y^{\prime } + 2A_{xxz}x^{\prime }z^{\prime } + A_{xx}x^{\prime } + A_{xyy}y^{\prime 2} + 2A_{xyz}y^{\prime }z^{\prime } + A_{xy}y^{\prime } + A_{xzz}z^{\prime 2} + A_{xz}z^{\prime }$\|
\|$A_{y}(\boldsymbol {r})$\|	=	\|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxy}x^{\prime 2} + 2A_{xyy}x^{\prime }y^{\prime } + 2A_{xyz}x^{\prime }z^{\prime } + A_{xy}x^{\prime } + A_{yyy}y^{\prime 2} + 2A_{yyz}y^{\prime }z^{\prime } + A_{yy}y^{\prime } + A_{yzz}z^{\prime 2} + A_{yz}z^{\prime }$\|
\|$A_{z}(\boldsymbol {r})$\|	=	\|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxz}x^{\prime 2} + 2A_{xyz}x^{\prime }y^{\prime } + 2A_{xzz}x^{\prime }z^{\prime } + A_{xz}x^{\prime } + A_{yyz}y^{\prime 2} + 2A_{yzz}y^{\prime }z^{\prime } + A_{yz}y^{\prime } + A_{zzz}z^{\prime 2} + A_{zz}z^{\prime }$\|
\|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$\|	=	\|$\left [\,1\quad 0 \quad -1\right ] \quad \left [\,0\quad 1 \quad -1\right ]\quad \left [\,-1\quad 0 \quad -1\right ]\quad \left [\,0\quad -1 \quad -1\right ]\quad \left [\,1\quad 0 \quad 1\right ]\quad \left [\,0\quad 1 \quad 1\right ]\quad \left [\,-1\quad 0 \quad 1\right ]\quad \left [\,0\quad -1 \quad 1\right ]$\|

Table 1.

Open in new tab

Second-order tidal tensor with four measure points at corners of a regular tetrahedron
A_xx	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_2)\big]$\|
A_xy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$\|
A_xz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$\|
A_yy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_4)\big]$\|
A_yz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$\|
A_zz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$\|
\|$A_{x}(\boldsymbol {r})$\|	=	\|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xx}x^{\prime } + A_{xy}y^{\prime } + A_{xz}z^{\prime }$\|
\|$A_{y}(\boldsymbol {r})$\|	=	\|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xy}x^{\prime } + A_{yy}y^{\prime } + A_{yz}z^{\prime }$\|
\|$A_{z}(\boldsymbol {r})$\|	=	\|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xz}x^{\prime } + A_{yz}y^{\prime } + A_{zz}z^{\prime }$\|
\|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$\|	=	\|$\left [ 1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ -1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad 1 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad -1 \quad - \frac{\sqrt{2}}{2} \right ]$\|
Third-order tidal tensor with eight measure points at corners of a cube
A_xx	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)\big]$\|
A_xy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_xz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_6)$\|
		\|$+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_yy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)\big]$\|
A_yz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_6)$\|
		\|$+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_zz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_xxx	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$\|
A_xxy	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_xxz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$\|
		\|$+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_xyy	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_xyz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_2)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_xzz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$\|
A_yyy	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_yyz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$\|
		\|$- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_yzz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_zzz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_5)$\|
		\|$+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
\|$A_{x}(\boldsymbol {r})$\|	=	\|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxx}x^{\prime 2} + 2A_{xxy}x^{\prime }y^{\prime } + 2A_{xxz}x^{\prime }z^{\prime } + A_{xx}x^{\prime } + A_{xyy}y^{\prime 2} + 2A_{xyz}y^{\prime }z^{\prime } + A_{xy}y^{\prime } + A_{xzz}z^{\prime 2} + A_{xz}z^{\prime }$\|
\|$A_{y}(\boldsymbol {r})$\|	=	\|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxy}x^{\prime 2} + 2A_{xyy}x^{\prime }y^{\prime } + 2A_{xyz}x^{\prime }z^{\prime } + A_{xy}x^{\prime } + A_{yyy}y^{\prime 2} + 2A_{yyz}y^{\prime }z^{\prime } + A_{yy}y^{\prime } + A_{yzz}z^{\prime 2} + A_{yz}z^{\prime }$\|
\|$A_{z}(\boldsymbol {r})$\|	=	\|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxz}x^{\prime 2} + 2A_{xyz}x^{\prime }y^{\prime } + 2A_{xzz}x^{\prime }z^{\prime } + A_{xz}x^{\prime } + A_{yyz}y^{\prime 2} + 2A_{yzz}y^{\prime }z^{\prime } + A_{yz}y^{\prime } + A_{zzz}z^{\prime 2} + A_{zz}z^{\prime }$\|
\|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$\|	=	\|$\left [\,1\quad 0 \quad -1\right ] \quad \left [\,0\quad 1 \quad -1\right ]\quad \left [\,-1\quad 0 \quad -1\right ]\quad \left [\,0\quad -1 \quad -1\right ]\quad \left [\,1\quad 0 \quad 1\right ]\quad \left [\,0\quad 1 \quad 1\right ]\quad \left [\,-1\quad 0 \quad 1\right ]\quad \left [\,0\quad -1 \quad 1\right ]$\|

Second-order tidal tensor with four measure points at corners of a regular tetrahedron
A_xx	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_2)\big]$\|
A_xy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$\|
A_xz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$\|
A_yy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_4)\big]$\|
A_yz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$\|
A_zz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$\|
\|$A_{x}(\boldsymbol {r})$\|	=	\|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xx}x^{\prime } + A_{xy}y^{\prime } + A_{xz}z^{\prime }$\|
\|$A_{y}(\boldsymbol {r})$\|	=	\|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xy}x^{\prime } + A_{yy}y^{\prime } + A_{yz}z^{\prime }$\|
\|$A_{z}(\boldsymbol {r})$\|	=	\|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xz}x^{\prime } + A_{yz}y^{\prime } + A_{zz}z^{\prime }$\|
\|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$\|	=	\|$\left [ 1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ -1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad 1 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad -1 \quad - \frac{\sqrt{2}}{2} \right ]$\|
Third-order tidal tensor with eight measure points at corners of a cube
A_xx	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)\big]$\|
A_xy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_xz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_6)$\|
		\|$+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_yy	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)\big]$\|
A_yz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_6)$\|
		\|$+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_zz	=	\|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_xxx	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$\|
A_xxy	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_xxz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$\|
		\|$+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_xyy	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_xyz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_2)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$\|
A_xzz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$\|
A_yyy	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_yyz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$\|
		\|$- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_yzz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
A_zzz	=	\|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_5)$\|
		\|$+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$\|
\|$A_{x}(\boldsymbol {r})$\|	=	\|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxx}x^{\prime 2} + 2A_{xxy}x^{\prime }y^{\prime } + 2A_{xxz}x^{\prime }z^{\prime } + A_{xx}x^{\prime } + A_{xyy}y^{\prime 2} + 2A_{xyz}y^{\prime }z^{\prime } + A_{xy}y^{\prime } + A_{xzz}z^{\prime 2} + A_{xz}z^{\prime }$\|
\|$A_{y}(\boldsymbol {r})$\|	=	\|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxy}x^{\prime 2} + 2A_{xyy}x^{\prime }y^{\prime } + 2A_{xyz}x^{\prime }z^{\prime } + A_{xy}x^{\prime } + A_{yyy}y^{\prime 2} + 2A_{yyz}y^{\prime }z^{\prime } + A_{yy}y^{\prime } + A_{yzz}z^{\prime 2} + A_{yz}z^{\prime }$\|
\|$A_{z}(\boldsymbol {r})$\|	=	\|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxz}x^{\prime 2} + 2A_{xyz}x^{\prime }y^{\prime } + 2A_{xzz}x^{\prime }z^{\prime } + A_{xz}x^{\prime } + A_{yyz}y^{\prime 2} + 2A_{yzz}y^{\prime }z^{\prime } + A_{yz}y^{\prime } + A_{zzz}z^{\prime 2} + A_{zz}z^{\prime }$\|
\|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$\|	=	\|$\left [\,1\quad 0 \quad -1\right ] \quad \left [\,0\quad 1 \quad -1\right ]\quad \left [\,-1\quad 0 \quad -1\right ]\quad \left [\,0\quad -1 \quad -1\right ]\quad \left [\,1\quad 0 \quad 1\right ]\quad \left [\,0\quad 1 \quad 1\right ]\quad \left [\,-1\quad 0 \quad 1\right ]\quad \left [\,0\quad -1 \quad 1\right ]$\|

Since all measure points can only obtain the long-range forces every P³T-kick step, if the binary forms in the middle of P³T-drift, it is not possible to construct the tidal-tensor field immediately. Besides, if the binary disrupt, the tidal-tensor filed also cannot provide the correct perturbation once the two components leave far away. This is the limitation of the tidal-tensor method. However, the purpose of the tidal-tensor method is to ensure the long-term cumulative effect of long-range perturbation is correctly treated. Thus, a short interval error within one Δt_L is not very serious.

6.4.2 Counterforce

6.4.2.1 Orbit-sampling method

The tidal-tensor method introduced above provide the correct perturbation to the internal motion of the binary, it is also necessary to ensure that the perturbers can obtain the consistent counterforce. If the perturbers obtain the forces from the two components of the binary at P³T-kick, only one random phase of the binary is used to evaluate the interaction. This cannot provide the correct orbit-averaged force from binaries. To solve this issue, another group of artificial particles can be created by sampling the binary orbit with an equal eccentricity anomaly interval (Fig. 4). For example, if the eccentric anomaly interval, |$\Delta \mathcal {E} = \pi /4$|⁠, 16 particles are created along the two orbits of binary components. The masses of these particles are weighted by the interval of mean anomaly:

$$\begin{eqnarray*} m_{\mathrm{orb},i,k} &= & m_{i} \frac{\Delta \mathcal {M}_{k}}{2\pi }\nonumber\\ \Delta \mathcal {M}_{k} &= & \Delta \mathcal {E} - e \left[ \sin \left(\mathcal {E}_{k} + \frac{\Delta \mathcal {E}}{2} \right) - \sin \left(\mathcal {E}_{k} - \frac{\Delta \mathcal {E}}{2} \right) \right] \nonumber\\ &=& \Delta \mathcal {E} - 2 e \cos \left(\mathcal {E}_{k} \right) \sin \left(\frac{\Delta \mathcal {E}}{2} \right). \end{eqnarray*}$$

(29)

where |$\mathcal {M}_{k}$| and |$\mathcal {E}_{k}$| are the mean anomaly and the eccentric anomaly at the point of particle k, respectively. In this case, the particle mass approximately represents the orbit-average duration of the two-components at each |$\Delta \mathcal {E}$|⁠.

Pseudo-particle multipole method:

In the orbit-sampling method, at least eight sample particles are needed to reasonably represent the orbits of binaries. It is rather expensive since the number of sample particles per binary is large. Kawai & Makino (2001) introduce the pseudo-particle multipole method that the quadrupole moment of a particle group can be represented by only three pseudo-particles. The quadrupole tensor of N particles can be described by

$$\begin{eqnarray*} \mathcal {A} = \sum _{i=1}^{N} {m_i \boldsymbol {r}_i \otimes \boldsymbol {r}_i}, \end{eqnarray*}$$

(30)

where ⊗ is tensor production. The corresponding traceless form is

$$\begin{eqnarray*} \mathcal {A^{\prime }} = \frac{3}{2} \mathcal {A} - \frac{1}{2} T_{\mathrm{r}}(\mathcal {A}) = \frac{3}{2} \sum _{i=1}^{N} {m_i \boldsymbol {r}_i \otimes \boldsymbol {r}_i} - \frac{1}{2} \sum _{i=1}^{N} {m_i \boldsymbol {r}_i^2}, \end{eqnarray*}$$

(31)

where the term with |$\boldsymbol {r}_i^2$| is subtracted from the the diagonal elements of the matrix by |$\boldsymbol {r}_i \otimes \boldsymbol {r}_i$|⁠.

The orbit average of the binary motion can be treated as a continue distribution of mass along the orbits of two components. Thus, we can also derive the analytic formulae of its quadrupole moment. In the coordinate systems of the binary orbital plane where the three Delaunay’s elements are zero, the relative position vector has the form depending on |$\mathcal {E}$| as

$$\begin{eqnarray*} \Delta \boldsymbol {r} = \left[{\begin{array}{*{10}c}a \left(\cos {\mathcal {E}} - e \right) & a \sqrt{1 - e^{2}} \sin {\mathcal {E}} & 0\end{array}}\right]. \end{eqnarray*}$$

(32)

The two component position vectors have a relation to Δr by

$$\begin{eqnarray*} \boldsymbol {r}_1 &=& -\frac{m_2}{m_1+m_2} \Delta \boldsymbol {r}\nonumber\\ \boldsymbol {r}_2 &=& \frac{m_1}{m_1+m_2} \Delta \boldsymbol {r}, \end{eqnarray*}$$

(33)

where m₁ and m₂ are masses of two components. Put equations (32) and (33) into equation (30), we can obtain |$\mathcal {A}$| of the binary as a function of |$\mathcal {E}$|⁠. When the two components pass one full orbit, |$\mathcal {E}$| changes from 0 to 2π and t changes from 0 to P (period). The orbital average of the quadrupole moment should integrate one period of |$\mathcal {A}(\mathcal {E})$|⁠:

$$\begin{eqnarray*} \langle \mathcal {A} \rangle = \frac{1}{P} \int _0^{P} \mathcal {A}(\mathcal {E}) \mathrm{ d}t. \end{eqnarray*}$$

(34)

The differentials of t and |$\mathcal {E}$| has the relation:

$$\begin{eqnarray*} \mathrm{d}t = \frac{P}{2 \pi }(1 - e \cos {\mathcal {E}}) \mathrm{d}\mathcal {E}. \end{eqnarray*}$$

(35)

Replace dt by |$\mathrm{d}\mathcal {E}$| and do the integration, we can obtain the final form:

$$\begin{eqnarray*} \langle \mathcal {A} \rangle = \mu a^2 \left[{\begin{array}{ccc}2 e^{2} + \frac{1}{2} & \quad 0 & \quad 0\\ 0 & \quad \frac{1}{2} - \frac{e^{2}}{2} & \quad 0\\ 0 & \quad 0 & \quad 0\end{array}}\right], \end{eqnarray*}$$

(36)

where μ is reduced mass, m₁m₂/(m₁ + m₂).

We choose the coordinate system where |$\langle \mathcal {A} \rangle$| becomes traceless in order to use the pseudo-particle multipole method:

$$\begin{eqnarray*} \langle \mathcal {A^{\prime }} \rangle &= &\frac{3}{2}\langle \mathcal {A} \rangle - \frac{1}{2} T_{\mathrm{r}}(\langle \mathcal {A} \rangle) \nonumber\\ &= & \frac{1}{4} \mu a^2 \left[{\begin{array}{*{10}c}9 e^{2} + 1 & \quad 0 & \quad 0\\ 0 & \quad 1 - 6 e^{2} & \quad 0\\ 0 & \quad 0 & \quad - 3 e^{2} - 2\end{array}}\right] \end{eqnarray*}$$

(37)

Using equations (3), (5), and (6) in Kawai & Makino (2001), the three pseudo-particles with the equal mass of (m₁ + m₂)/3 are distributed at:

$$\begin{eqnarray*} \boldsymbol {r}_{\mathrm{p,1}} &= & a\sqrt{\frac{\mu }{m_{1} + m_{2}}} \left[{\begin{array}{*{10}c}0 & \sqrt{1-e^{2}} & 0\end{array}}\right]\nonumber\\ \boldsymbol {r}_{\mathrm{p,2}} &= & a\sqrt{\frac{\mu }{m_{1} + m_{2}}} \left[{\begin{array}{*{10}c}\sqrt{3 e^{2} + \frac{3}{4}} & - \sqrt{\frac{1-e^{2}}{4}} & 0\end{array}}\right]\nonumber\\ \boldsymbol {r}_{\mathrm{p,3}} &= & a\sqrt{\frac{\mu }{m_{1} + m_{2}}} \left[{\begin{array}{*{10}c}-\sqrt{3 e^{2} + \frac{3}{4}} & - \sqrt{\frac{1-e^{2}}{4}} & 0\end{array}}\right]. \end{eqnarray*}$$

(38)

These positions refer to the rest frame of the binary orbital plane. To obtain the correct direction of the orbital plane in the original frame, we need to multiply equation (38) by the rotational matrix based on the Delaunay’s elements.

6.4.3 Test

To confirm that the artificial particle algorithm can provide the correct perturbation and counterforce, we test a triple system with the initial condition listed in Table 2. We use three methods to integrate the motion of the system. Orbits integrated by the accurate SDAR method is used as a reference (names as the SDAR-REF model). Models using the P³T method with no artificial particles (no-TT), second- (TT-2), and third-order (TT-3) tidal-tensor methods are compared. The perturber is outside the changeover region of the binary. Thus the outer orbit is integrated by the leapfrog method. The inner binary is integrated by the SDAR method. Δt_L = 0.00390625 for the P³T method. The ratio between the binary period (T_{bin, in}) and Δt_L is about 0.51. Thus in the no-TT model, the long-range force is evaluated once every two T_{bin, in}. We also add a model of a binary with the initial condition the same as that of the outer binary in Table 2 (named as the B-out model). The leapfrog integrator with the same step size is used. The pseudo-particle multipole method is used for both TT-2 and TT-3 methods.

Table 2.

Open in new tab

The initial condition of the hierarchical triple system for testing the artificial particle algorithm. m_p and m_s are the masses of the primary and the secondary of the inner and outer binaries. The values are shown in the scale-free unit with the gravitational constant, G = 1. a is semimajor axis. e is eccentricity. |$\mathcal {I}$|⁠, ϕ, and ψ are Delaunay’s elements. |$\mathcal {E}$| is eccentric anomaly. T_bin is period.

	m_p	m_s	a	e	\|$\mathcal {I}$\|	ϕ	ψ	\|$\mathcal {E}$\|	T_bin
In	0.00900	0.00100	0.900	0.900	1.500	0.100	0.200	3.14	1.97 × 10⁻³
Out	1.00	0.01	1.500	0.0100	0.100	0.100	0.100	1.50	11.5

	m_p	m_s	a	e	\|$\mathcal {I}$\|	ϕ	ψ	\|$\mathcal {E}$\|	T_bin
In	0.00900	0.00100	0.900	0.900	1.500	0.100	0.200	3.14	1.97 × 10⁻³
Out	1.00	0.01	1.500	0.0100	0.100	0.100	0.100	1.50	11.5

Table 2.

Open in new tab

	m_p	m_s	a	e	\|$\mathcal {I}$\|	ϕ	ψ	\|$\mathcal {E}$\|	T_bin
In	0.00900	0.00100	0.900	0.900	1.500	0.100	0.200	3.14	1.97 × 10⁻³
Out	1.00	0.01	1.500	0.0100	0.100	0.100	0.100	1.50	11.5

	m_p	m_s	a	e	\|$\mathcal {I}$\|	ϕ	ψ	\|$\mathcal {E}$\|	T_bin
In	0.00900	0.00100	0.900	0.900	1.500	0.100	0.200	3.14	1.97 × 10⁻³
Out	1.00	0.01	1.500	0.0100	0.100	0.100	0.100	1.50	11.5

The evolution of orbital elements are show in Fig. 5. By selecting the Cartesian coordinate system (x–y–z), the three Delaunay’s elements (angles) are (e.g. Wang et al. 2020):

|$\mathcal {I}$|⁠: inclination.
ϕ: longitude of the ascending node.
ψ: argument of periapsis.

Except ψ_in, both TT-2 and TT-3 models agree well with the SDAR-REF model on the evolution of inner orbital elements while the no-TT model does not. This suggests that the tidal-tensor method indeed provides a better result for the secular motions of the inner binary.

Figure 5.

The evolution of orbital parameters of the inner and outer binaries for the triple system using different integration methods. For a and time, the scale-free units are used. The black colour represents the accurate result using the SDAR method as a reference. The green and blue colours represent the TT and no-TT models, respectively. The purple colour represents the B-out model. For each panel, we apply the scientific notation in the plotting style of y-axis: the actual values of y-axis are calculated by y_tick × scale + y_offset, where y_tick is the value shown along the the y-axis, scale is the first value shown above the y-axis (scale = 1 in default) and y_offset is the second value following the symbol ‘+’ (y_offset = 0 in default).

Open in new tab Download slide

On the other hand, the TT-3 model also provides a correct evolution of a_out, ϕ_out, and i_out. But TT-2 model cannot reproduce the correct oscillation of the outer orbit and the evolution overlaps with the B-out model. This suggests that the second-order tidal-tensor method cannot properly reproduce the secular motion of the outer orbit. The no-TT model disagree with all others.

However, all three models show large differences (oscillation) of e_out and ψ_out compared with those of the SDAR-REF model, but agree well with those of the B-out models. The B-out model is a simple binary motion, thus e_out and ψ_out should not evolve in reality. This indicates that the artificial oscillation is caused by the inaccuracy of the leapfrog method for the outer orbit.

This result suggests that the third-order tidal-tensor algorithm is a good choice to represent a reasonable secular motions of both inner and outer binaries. The low accuracy of the leapfrog method results in a relative error of 10⁻⁶ in the evolution of e_out but the averaged value can converge to the correct one.

How the counterforces are calculated does not affect the orbital motions as shown in Fig. 5. The results are identical within the resolution thus we does not show. But the linear momentum conservation is sensitive to it. In Fig. 6, the evolution of the x-component in the centre-of-the-mass velocity of the triple is shown. Four models with different ways to calculate the counter forces are compared. As the number of sample particles decrease, the error (oscillation) is more obvious. The PM method provide a similar level of error as eight sample particles. Thus, it is a more efficient choice for the counter force if the high-order momentum is not important.

Figure 6.

The evolution of the x-component of the centre-of-the-mass velocity of the triple system. The OS-cm model uses only the centre-of-the-mass of the inner binary to calculate the force to the perturber. The OS-4 and OS-8 models use orbit-sampling methods with 4 and 8 sample particles. The PM model uses three pseudo-particles.

Open in new tab Download slide

6.5 Parallelization algorithm

Fig. 2 shows how the hybrid parallelization is implemented. The domain decomposition, exchanging particles between mpi processes, PT construction, and long-range force calculations are handled by fdps. The mpi and openmp are used together in fdps with a well-controlled load balance. The clustering (Section 6.3) is also parallelized by using mpi and openmp methods (Iwasawa et al. 2017). The long-range force calculation can be accelerated by both the SIMD instruction set of X86 architecture and the NVIDIA GPU using the CUDA programming environment (Iwasawa et al. 2020). The AVX, AVX2, and AVX-512 instructions are used for the SIMD accelerated implementation.

7 BENCHMARK

We carry out benchmarks to compare the long-term evolution of star clusters with and without binaries, and show the computing performances by using the petar and nbody6++gpu codes. A scaling test is also performed on the Cray XC50 supercomputer.

7.1 Comparison with nbody6++gpu

7.1.1 Performance on GPU-based desktop

We compare the performances of simulating star clusters with different numbers of particles (N = 1000, 10 000, 100 000, and 1000 000) by using the petar and nbody6++gpu codes. The name of models are listed in Table 3. The initial mass function (IMF) of Kroupa (2001) ranging from 0.08 to 40 M_⊙ is applied. The Plummer model is used to generate the positions and velocities. The system is in virial equilibrium and the virial radius is 1.0 in the Hénon (1971) unit (hereafter named as NB unit). No tidal field and no stellar evolution are used in order to have a well-controlled comparison of the dynamics. For each N, we have two models with and without primordial binaries. In models with binaries, we use the period and eccentrmaricity distributions from Kroupa (1995a, b). Initially all particles are in binaries. The period distribution of this model has the form:

$$\begin{eqnarray*} \mathcal {F}(T_{\mathrm{bin}}) = 2.5 \frac{\log _{10}{T_{\mathrm{bin}}} - 1}{45 + (\log _{10}{T_{\mathrm{bin}}} - 1)^2}, \end{eqnarray*}$$

(39)

where the maximum and minimum of log₁₀T_bin are truncated at 8.43 and 1.00 (in the unit of days), respectively. The eccentricity follows the thermal distribution. The orbits of some tight binaries are adjusted to avoid stellar collision (pre-main-sequence eigenevolution; Kroupa 1995b). Thus, a wide range of T_bin and eccentricities are covered, which is very suitable for testing the code. We assume that the initial half-mass radii of all models are 1.0 pc and the periods are scaled to be in the NB time unit.

Table 3.

Open in new tab

The optimized sets of input parameters that influence the performance of nbody6++gpu and petar codes. N is the total number of particles; N_b is the total number of binaries; R_KS and Δt_KS are the criterion to switch on the KS regularization in nbody6++gpu; and Δt_L and r_{out, ref} are the tree time-step and the outer boundary reference of the changeover function for petar. NB units are used for distance and time.

Model	N (10³)	N_{b (}10³)	R_KS (10⁻³)	Δt_KS (10⁻⁵)	Δt_L	r_{out, ref} (10⁻²)
N1k	1	0	1	3.2	1/128	3.22490
N10k	10	0	0.464	1	1/256	1.61101
N100k	100	0	0.215	0.3	1/512	0.801104
N1m	1	0	0.1	0.1	1/1024	0.399470
N1kb	1	0.5	1	3.2	1/256	1.62651
N10kb	10	5	0.764	1	1/512	0.802076
N100kb	100	50	0.515	0.3	1/1024	0.400471
N1mkb	1000	500	0.4	0.1	1/2048	0.199039
The shared parameters	nbody6++gpu		\|$\eta _{\mathrm{r}} = \eta _{\mathrm{i}} = 0.1\sqrt{2}$\|⁠; N_{b, p} = 50
	petar		η = 0.1; θ = 0.3, 0.5; \|$\dfrac{r_{\mathrm{out,ref}}}{r_{\mathrm{in,ref}}} = 10.0$\|⁠; third tidal-tensor; PM

Model	N (10³)	N_{b (}10³)	R_KS (10⁻³)	Δt_KS (10⁻⁵)	Δt_L	r_{out, ref} (10⁻²)
N1k	1	0	1	3.2	1/128	3.22490
N10k	10	0	0.464	1	1/256	1.61101
N100k	100	0	0.215	0.3	1/512	0.801104
N1m	1	0	0.1	0.1	1/1024	0.399470
N1kb	1	0.5	1	3.2	1/256	1.62651
N10kb	10	5	0.764	1	1/512	0.802076
N100kb	100	50	0.515	0.3	1/1024	0.400471
N1mkb	1000	500	0.4	0.1	1/2048	0.199039
The shared parameters	nbody6++gpu		\|$\eta _{\mathrm{r}} = \eta _{\mathrm{i}} = 0.1\sqrt{2}$\|⁠; N_{b, p} = 50
	petar		η = 0.1; θ = 0.3, 0.5; \|$\dfrac{r_{\mathrm{out,ref}}}{r_{\mathrm{in,ref}}} = 10.0$\|⁠; third tidal-tensor; PM

Table 3.

Open in new tab

Model	N (10³)	N_{b (}10³)	R_KS (10⁻³)	Δt_KS (10⁻⁵)	Δt_L	r_{out, ref} (10⁻²)
N1k	1	0	1	3.2	1/128	3.22490
N10k	10	0	0.464	1	1/256	1.61101
N100k	100	0	0.215	0.3	1/512	0.801104
N1m	1	0	0.1	0.1	1/1024	0.399470
N1kb	1	0.5	1	3.2	1/256	1.62651
N10kb	10	5	0.764	1	1/512	0.802076
N100kb	100	50	0.515	0.3	1/1024	0.400471
N1mkb	1000	500	0.4	0.1	1/2048	0.199039
The shared parameters	nbody6++gpu		\|$\eta _{\mathrm{r}} = \eta _{\mathrm{i}} = 0.1\sqrt{2}$\|⁠; N_{b, p} = 50
	petar		η = 0.1; θ = 0.3, 0.5; \|$\dfrac{r_{\mathrm{out,ref}}}{r_{\mathrm{in,ref}}} = 10.0$\|⁠; third tidal-tensor; PM

Model	N (10³)	N_{b (}10³)	R_KS (10⁻³)	Δt_KS (10⁻⁵)	Δt_L	r_{out, ref} (10⁻²)
N1k	1	0	1	3.2	1/128	3.22490
N10k	10	0	0.464	1	1/256	1.61101
N100k	100	0	0.215	0.3	1/512	0.801104
N1m	1	0	0.1	0.1	1/1024	0.399470
N1kb	1	0.5	1	3.2	1/256	1.62651
N10kb	10	5	0.764	1	1/512	0.802076
N100kb	100	50	0.515	0.3	1/1024	0.400471
N1mkb	1000	500	0.4	0.1	1/2048	0.199039
The shared parameters	nbody6++gpu		\|$\eta _{\mathrm{r}} = \eta _{\mathrm{i}} = 0.1\sqrt{2}$\|⁠; N_{b, p} = 50
	petar		η = 0.1; θ = 0.3, 0.5; \|$\dfrac{r_{\mathrm{out,ref}}}{r_{\mathrm{in,ref}}} = 10.0$\|⁠; third tidal-tensor; PM

The simulations are performed on a GPU-based desktop computer. The computer is equipped with one AMD RYZEN 3970X CPU (3.7 GHz) which includes 32 physical cores, one NVIDIA RTX 2080Ti GPU and 4-channel DDR4-3200 SDRAM memories. Both the codes use the hybrid parallelization methods containing mpi,openmp, AVX2, and GPU (CUDA). For small numbers of particles, using all CPU cores causes the issue of load balance and overshooting of communication. Thus, the number of cores for each simulations are adjusted to obtain the best performance.

One important point is that the performance of the two codes are sensitive to different inputting parameters. We adjust the parameters for each model in order to optimize the performance. The important ones are listed in Table 3.

For the Hermite integrator of both two codes, the time-step is calculated by (Aarseth 2003; Oshino et al. 2011)

$$\begin{eqnarray*} \Delta t_{\mathrm{H},i}= \mathrm{min}\left(\eta \sqrt{\tfrac{\sqrt{\left|\boldsymbol {A}^{(0)}_i\right|^2 + A^2_{0}} \left|\boldsymbol {A}^{(2)}_i\right| + \left|\boldsymbol {A}^{(1)}_i\right|^2}{\left|\boldsymbol {A}^{(0)}_i\right| \left|\boldsymbol {A}^{(3)}_i\right| + \left|\boldsymbol {A}^{(2)}_i\right|^2} }, \Delta t_{\mathrm{H,max}}\right), \end{eqnarray*}$$

(40)

where |$\boldsymbol {A}^{(j)}_i$| is the acceleration of a particle (i) and its j-order time derivatives, A₀ is the constant coefficient for safety, and Δt_{H, max} is input parameter. In nbody6++gpu, A₀ is calculated only for particles having no neighbours by assuming an artificial particle locating at the centre of the system with the mass of 0.01〈m〉. In petar, |$A_{0} = 0.1 \langle m \rangle / r_{\mathrm{nb},i}^2$| where 〈m〉 is the local-averaged mass of particles. Since both codes use the block-time-step method, the calculated Δt_{H, i} is adjusted to the integer power of 0.5. In nbody6++gpu, η_i for neighbours and η_r for distance particles are set to |$0.1\sqrt{2}$|⁠.

Another three major parameters in nbody6++gpu that determine the performance are the expected number of neighbours, N_{b, p}, the separation of a neighbour pair, R_KS, and the Hermite time-step to trigger on KS regularization, Δt_KS. In our models, N_{b, p} = 50. One important tip is that the KS criterion not only influences the performance, but also has a big impact on the accuracy of integrating the internal motions of binaries. A strict criterion can avoid a too frequent switch of KS but increase the error of integrating orbits (especially for eccentricities) of wide binaries. The choices of R_KS and Δt_KS are shown in Table 3.

In petar, the Hermite time-step coefficient does not significantly influences the performance. We use the value of 0.1. The major impact to the performance comes from θ, Δt_L and the changeover function. We compare θ of 0.3 and 0.5 with quadrupole moment of the PT force for all models. The energy error of the P³T method depends on the combination of Δt_L and the changeover function (Iwasawa et al. 2015). When Δt_L is given, we determine the reference of the outer changeover boundary as:

$$\begin{eqnarray*} r_{\mathrm{out,ref}}= 10 \Delta t_{\mathrm{L}}\sigma _{\mathrm{1D}} \end{eqnarray*}$$

(41)

where σ_1D is the 1D velocity dispersion of the system. We fix r_{out, ref}/r_{in, ref} to 10.0. The third-order tidal-tensor and pseudo-particle multipole methods are used.

By checking a group of Δt_L, we can find a balanced combination of the parameters to obtain the best performance. The computational cost of the long-range force and kick velocity per P³T-kick step is roughly constant. Thus, the wall clock time of one NB time unit (T_W) is anticorrelated with Δt_L. In contrast, when Δt_L is reduced, the sizes of clusters for short-range interactions are smaller due to a shorter r_{out, ref}. This gives a better load-balance and a less computational cost in the P³T-drift and the clustering. Therefore, one balance Δt_L can be found to achieve the best performance. In petar, we first estimate r_{out, ref} by

$$\begin{eqnarray*} r_{\mathrm{out,ref}}= 0.1 \frac{G M}{3 N^{1/3} \sigma _{\mathrm{1D}}^2}, \end{eqnarray*}$$

(42)

where M is the total mass of the system and G is gravitational constant. Then using equation (41), we obtain the first guess of Δt_L and check the best value around it. One example of this check process for the N10k model is shown in Fig. 7. When Δt_L increases, T_W(P³T-kick) and T_W(clustering) decrease while T_W(P³T-drift) increases. The balanced Δt_L = 1/256. Above this value, T_W(P³T-drift) increases significantly because a very large cluster with 2000 members forms due to a set of large neighbour radii. This completely kills the load balance of paralellization and the benefit of the P³T method.

Figure 7.

The wall clock times per NB time unit (T_W) depending on Δt_L for the N10k model performed by using petar. Colours represent the different parts of the computing (total, P³T-kick, P³T-drift, and clustering). The minimum value of the total T_W indicates the balanced choice of Δt_L for the best performance.

Open in new tab Download slide

The results of T_W and the relative energy error (|dE/E|) per NB time unit are shown in Fig. 8. For each model, the maximum 10 NB time-steps are simulated to reduce fluctuation. In both codes, the models with binaries sometimes have a very large T_W in a short time interval due to the events of few-body interaction and the existence of semistable (perturbed) few-body systems. In our analysis, we remove these data since they do not represent the normal performance of the codes. Besides, we remove the first one steps in some models performed by nbody6++gpu if their T_W are very different from that of other steps. In the long-term simulations, the averaged T_W may be larger if hierarchical systems frequently form. This is easier to happen when N is small, because the boundary of wide binaries (equation 1) is large thus more space is available to form hierarchical systems.

Figure 8.

The comparison of wall clock time, T_W (upper panel), and relative energy error, |dE/E| (lower panel), for models listed in Table 3. For the data of petar, ‘-T03’ and ‘-T05’ indicate the opening angles of 0.3 and 0.5, respectively. The values of T_W in sec are also printed near the data points. T_W takes the average value of a few steps to avoid fluctuation. |dE/E| of each step (maximum 10 steps) is shown. The left- and right-hand panels show the models without and with binaries, respectively.

Open in new tab Download slide

For models without binaries, the two codes have a similar performance for N of 1000 and 10 000. The differences appears when N becomes large. T_W of petar well follows the scaling line of Nlog N. However, the result of nbody6++gpu scale differently for small and large N. It is expected that T_W is proportional to N^7/4, which is estimated by including the AC neighbour scheme. Only when N is large, the result follows N^7/4, while when N is small, it follows Nlog N. The reason is probably due to the scaling of parallelization. When N is too small, the parellelization of multi cores does not help to improve the performance. As N increases, the computational cost of direct N-body method significantly increase, thus the parallelization efficiency (floating point operations versus peak performance of CPU and GPU) also increases. Once the efficiency reaches the maximum, the scaling begins to follow N^7/4. In million-body case, petar provides a five times faster performance than that of nbody6++gpu.

The significantly difference of the performance is shown in the models with large number of primordial binaries (⁠|$100{{\ \rm per\ cent}}$|⁠). For N of 1000, the two codes have a similar performance, while the difference starts from N = 10000. petar code gives a much faster performance compared to nbody6++gpu. In million-body case, the performance difference is 12 times. As discussed in Section 3.6, nbody6++gpu does not parallize the KS regularzation and the switching of KS is very expensive. This significantly reduces the performance. In contrast, the behaviour of petar is much better. Especially, the actual computing time of million-body model with full of binaries by using petar is even faster than the million-body model without binaries performed by nbody6++gpu. This result show the great advantage of petar for large N-body models with many binaries.

The relative energy errors, dE/E, are also compared in the bottom panels of Fig. 8. The fluctuation is large for both codes. In the models with no binaries, nbody6++gpu gives a systematically better energy conservation for N < 10⁵. The small θ help to reduce the errors while the performance does not significantly change. The wide range of mass spectrum is one important reason for the larger error. With equal masses, the error is much smaller (not shown here). The major contribution of the error in the models performed by petar comes from the P³T-kick, due to the approximation of the long-range force.

For a particle group, the changeover radii of equation (12) takes care of the mass-dependent long-range tidal perturbation. However, the mass-dependent error still exists in the centre-of-the-mass motion of the group due to the low accuracy of the leapfrog method (see Section 6.4.3). Especially, the gravitational focusing enhances the error of the massive objects if the changeover radii is not sufficient large. To avoid this, we need an (m_i/〈m〉)^1/2 dependent changeover radii, which can cause the formation of large clusters and the load balance can become bad. One possible solution is to implement the nested openmp parallelization for the Hermite and SDAR integrations inside the clusters. This cannot fully solve the load-balance issue but may provide a better performance than reducing Δt_L or θ in some conditions, especially when a very massive object like SMBH exists.

Nevertheless, both the codes show an increasing cumulative errors of a comparable level in the long-term simulation, while the dynamical behaviour seems to be consistent (Section 7.1.2). If the global trend is not bad, even the energy error suddenly increases in one step, it only indicates that one specific event (in most cases this is a few-body interaction) is not well treated, but the global evolution is still statistically correct.

In the models with binaries, both codes have a similar level of errors. However, we should notice that the definition of error with binaries are different in nbody6++gpu and petar. In nbody6++gpu, the total energy of the systems are included in the energy conservation check, thus the energy error is completely dominated by the binary with the highest binding energy. In petar, we follow the definition of SD energy used in the sdar code (Wang et al. 2020), where the energy of binaries are scaled down by the SD factor. Thus, the energy error reflects more about the global behaviour of the system.

7.1.2 Long-term evolution

Lagrangian and core radii:

We check the long-term evolution including the post-core-collapse stage of a star cluster without binaries (N250k model). Initially, the cluster contains 250 000 stars with the IMF of Kroupa (2001) ranging from 0.08 to 40 M_⊙. For nbody6++gpu, the initial N_{b, p} = 200, R_KS = 1.0 × 10⁻⁵, and Δt_KS = 1.0 × 10⁻⁷, while the values are adjusted in the middle of the simulation. For petar, Δt_L = 1/1024, r_{out, ref} = 0.00398365 and θ = 0.5.

The evolution of the Lagrangian and core radii of the N250k model is shown in Fig. 9. We can see that the results of the two codes well overlap each others. The core collapse finishes around 300 NB time unit. Both codes give the same core-collapse time and the evolution of the core radius. This result indicates that petar can provide the same long-term evolution of the global density profile as the direct N-body method.

$The evolution of the Lagrangian and core radii (N250k model) by using petar (black solid curves) and nbody6++gpu (red dashed curves). From the bottom to the top, the curves represent the core radius, $10{{\ \rm per\ cent}}$, $30{{\ \rm per\ cent}}$, $50{{\ \rm per\ cent}},\, 70{{\ \rm per\ cent}}$ Lagrangian radii, respectively.$

Figure 9.

The evolution of the Lagrangian and core radii (N250k model) by using petar (black solid curves) and nbody6++gpu (red dashed curves). From the bottom to the top, the curves represent the core radius, |$10{{\ \rm per\ cent}}$|⁠, |$30{{\ \rm per\ cent}}$|⁠, |$50{{\ \rm per\ cent}},\, 70{{\ \rm per\ cent}}$| Lagrangian radii, respectively.

Open in new tab Download slide

Escapers:

We compare the properties of escapers for the N250k model in Fig. 10. Two codes provide consistent time and mass distribution of escapers. We can also clearly identify the low-mass escapers caused by the relaxation-driven evaporation and the high-mass escapers ejected by strong few-body interactions in the core. The latter appears after the core collapse. The two components of binary escapers have similar masses and are the most massive objects in the cluster. Both codes provide a similar number of binary escapers (6 and 7) with very similar masses and consistent distribution of a and e. This result indicates that petar and nbody6++gpu well agree on the properties of escapers for the model without primordial binaries.

Figure 10.

The properties of escapers for the N250k model performed by using petar and nbody6++gpu. The upper panel: time and mass distribution of single and binary escapers. Triangles indicate the two components of each binary escaper. The mass is scaled to the unit of solar mass. The lower panel: semimajor axis (a) and eccentricity (e) of binary escapers. The areas of markers are proportional to the masses of binaries.

Open in new tab Download slide

Performance and energy error:

In Fig. 11, the long-term behaviour of the performance and the energy error of the N250k model are compared. At the beginning, petar provides a twice faster performance than that of nbody6++gpu. After core collapse, the performance of petar slows down due to a larger cost of P³T-drift (yellow curves). This is expected since a dense core results in a larger size of particle cluster and smaller Hermite time-steps. There are a few sharp peaks, which are due to an expensive calculation of a few-body interaction. Such behaviour can happen when a tight multiple system (e.g. triple, quadruple) forms and stay for a while before a disruption by close encounters. Sometimes, the criterion for switching on and off the SDAR method or the initial integration step size are not well adjusted to the specific condition. Thus, a larger error or an expensive calculation can appear. There is probably no uniform way that can well handle all type of few-body interactions. This is the same for both nbody6++gpu and petar. The performance of nbody6++gpu becomes better in the late phase of the evolution.

TW of total, P3T-kick and P3T-drift (upper panel) and dE/E (lower panel) versus the physical evolution time for the N250k model. The label, (S) and (C), in the legend of dE/E indicate ‘per time’ and ‘cumulative’, respectively.

Figure 11.

T_W of total, P³T-kick and P³T-drift (upper panel) and dE/E (lower panel) versus the physical evolution time for the N250k model. The label, (S) and (C), in the legend of dE/E indicate ‘per time’ and ‘cumulative’, respectively.

Open in new tab Download slide

The relative energy error dE/E per step (S) of petar has a larger fluctuation while the cumulative errors (C) converge around zero until t = 1000 NB unit. Large fluctuations appear after the core collapse when few-body interactions start to eject massive objects out of the core. The appearance of large dE/E at one step is often associated with the formation of a large particle cluster including a few hundreds members. This is caused by a high-velocity particle which has a large r_{nb, i}. In such case, the particle moves a long distance during one Δt_L, thus the time-step of the long-range interaction is too large. If this particle is also massive, a large dE/E can appear. The way to solve this issue is to reduce Δt_L. In our simulation, to have consistent parameters for measuring the performance, Δt_L is not modified. In an application of an astrophysical study, after core collapse, it is better to adjust the Δt_L and r_{out, ref} based on the new state of the system.

The behaviour of dE/E by using nbody6++gpu is better but large jumps of dE/E sometimes appear and result in a large cumulative error. From the information record of the simulations, significant time-step jumps happen for some particles when the big error appears. It can be caused by the unsuitable criterion to switch on KS regularization, that a strong close encounter is not caught by the KS method. However, it is not easy to design a general criterion to handle all kind of case while keeping a good performance. In such case, the usual way to avoid such large error is to restart the simulation by reducing η_r and η_i or enlarging the KS criterion, R_KS, and Δt_KS. For a test purpose, we do not restart the simulation and allow such large dE/E(S) in our models.

Although the error of 10⁻³ is not small, the behaviour of the long-term dynamical evolution is not very sensitive to it, as shown in Fig. 9. This suggests that we should not use the energy conservation as a strict judgement for the quality of simulations, but should focus on what the physical processes we really care and whether they are treated properly. If the interested objects cause a big energy error, we need to properly validate the result with better energy conservation. On the other hand, sometimes energy conservation can be misleading when we compare the simulations done by using different approximations. For instance, if all binaries in the clusters are treated as isolated and do not exchange energy with other stars, the energy conservation can be much better since the difficulty of the few-body interactions is avoided, but the results are completely wrong. Another example is to use the softening length, which gives a wrong behaviour of close encounters but result in a much better energy conservation.

7.1.3 Binaries

We perform another simulation of star clusters with primordial binaries (the N100kb model) in order to compare the behaviour of the dynamical evolution of binaries. The initial controlling parameters are the same as shown in Table 3. For petar, we choose the opening angle as θ = 0.3 for a better accuracy. The evolution of Lagrangian and core radii are shown in Fig. 12. Similar to the result of N250k, the two results agree with each other very well.

Figure 12.

The evolution of the Lagrangian and core radii for N100kb model. The plotting style is similar to Fig. 9.

Open in new tab Download slide

The initial and final distribution (after 475 NB time unit) of semimajor axis, a, and eccentricity, e, are compared in Fig. 13. There is a forbidden region of a and e in the initial distribution based on the pre-main-sequence eigenevolution model of Kroupa (1995b). Since stars have physical sizes, the pericentres of binaries cannot be lower than a threshold, otherwise the binaries collide. In our simulations, particles are treated as point masses. Thus, after interactions, a part of binaries, especially the tight binaries which are easily perturbed, can enter the forbidden region. The two histograms show that the two codes provide very similar final distributions of a and e.

$Distributions of the semimajor axis a and eccentricity e for the N100kb model performed by using petar and nbody6++gpu. Each point in the central panel indicate one binary. We randomly select $20{{\ \rm per\ cent}}$ binaries to show in order to avoid too many data points. The top and right-hand panels show the histograms of a and e, respectively. The black, green, and red colours indicate the initial condition, the results of petar and nbody6++gpu, respectively.$

Figure 13.

Distributions of the semimajor axis a and eccentricity e for the N100kb model performed by using petar and nbody6++gpu. Each point in the central panel indicate one binary. We randomly select |$20{{\ \rm per\ cent}}$| binaries to show in order to avoid too many data points. The top and right-hand panels show the histograms of a and e, respectively. The black, green, and red colours indicate the initial condition, the results of petar and nbody6++gpu, respectively.

Open in new tab Download slide

We also compare the properties of escapers in Figs 14 and 15. Fig. 14 show the time and mass distribution of single and binary escapers. The semimajor axis (a) versus eccentricity (e) and component mass ratio (m₂/m₁) versus binary mass (m₁ + m₂) are shown in Fig. 15. The two codes well agree with each other on all properties compared here. After 474 NB time unit, the numbers of single and binary escapers are 689 and 110 in the case of petar, and 733 and 137 in the case of nbody6++gpu, respectively. nbody6++gpu produce a slightly more escapers, but generally they agree well considering the statistical fluctuation. Therefore, petar and nbody6++gpu have a consistent treatment on the long-term dynamical evolution of binaries in star clusters.

Figure 14.

Distributions of the times and masses for single (dot) and binary (triangle) escapers in the N100kb model performed by using petar and nbody6++gpu. In the top and right-hand histograms, single and binary escapers are represented by the step and bar styles, respectively.

Open in new tab Download slide

Figure 15.

Upper: the distributions of semimajor axis, a, and eccentricity, e for binary escapers in N100kb models. The areas of markers are proportional to the binary masses. Lower: the distributions of the components’ mass ratios and binaries’ masses (m₁ + m₂). The areas of markers are proportional to the logarithmic of a with an offset.

Open in new tab Download slide

The performance and the energy error of the codes are shown in Fig. 16. Initially, petar is about 4 times faster than nbody6++gpu due to a more efficient treatment of primordial binaries. The performance is stable till the end except for a few peaks due to the formation of stable multiple systems. The performance of nbody6++gpu becomes better in the late time due to the disruption of wide binaries. There are also several peaks where the performance significantly drops due to the appearances of stable multiple systems. At the end, the performance of petar is about 1–2 times faster than that of nbody6++gpu. Notice that here we compare a model of 10⁵ particles. Due to the time-consuming calculation in nbody6++gpu, we have not compared a long-term evolution of million-body systems with large primordial binaries. Based on the result of Fig. 8, we expect that the performance difference is more obvious in the case of simulating massive GCs.

Figure 16.

T_W (upper panel) and dE/E (lower panel) versus the physical evolution time for the N100kb model. The plotting style is similar to Fig. 11.

Open in new tab Download slide

7.2 Scaling on supercomputer

We test the performance of petar depending on N and the number of cores (N_core) on the Cray XC50 supercomputer. Each computing node has two of Intel Xeon Gold 6148 processors (Skylake), that is 40 cores. The maximum of N_core = 960. No GPU devices are available. The mpi and openmp are used together. For N_core > 10, the number of openmp threads are fixed to be 10 while the number of mpi processes is N_core/10. For N_core ≤ 10, we only use openmp. The AVX-512 acceleration is used.

The initial properties of the star clusters except N are the same as that of the single and binary models used in Section 7.1.1. The naming style of the models follows the style of N10k, N1m, N10kb, etc. The configuration of input parameters follows Table 3. Since N_core is large, we set the maximum of N to be 10 million. The N10m and N10mb models have Δt_L = 1/2048 and 1/4096 and r_{out, ref} ≈ 0.00199303 and 0.000996758, respectively. We choose the θ of 0.5.

The result is shown in Fig 17. The scaling of the N10k and N10kb model becomes flat when N_core > 5. The thresholds are about 160 and 640 for the case of N = 10⁵ and 10⁶, respectively. The N10m and N10mb models can well scale up to N_core = 960. The absolute values of T_W at the thresholds for million-body cases are about 63 and 121 s for single and binary models, respectively. The binary models roughly cost twice T_W of that in single models, because for a given N, Δt_L in the binary model is half of that in the single model.

Figure 17.

The performance test of petar on the Cray XC50 supercomputer. The two panels show T_W of single and binary models versus the number of CPU cores (N_core), respectively. Colours indicate different N. The values T_W in sec are shown above the data points. The dashed lines indicate the ideal scaling (∝1/N_core). The vertical line (N_core = 10) indicates the boundary where the simulations use only OpenMP in the left and use both mpi and openmp in the right.

Open in new tab Download slide

The million-body Dragon models with only |$5{{\ \rm per\ cent}}$| primordial binaries took about 3000 s per NB time unit on a GPU-based supercomputer (Wang et al. 2016). Thus, T_W of the N10mb model are even much less than that of the Dragon models. Notice that we cannot directly compare the absolute T_W as the hardware are very different. Besides, the long-term behaviour can be different from the initial case. But we can still obviously see the significant advance of the performance by using petar on a modern supercomputer even without GPU. Although in our test, the stellar evolution is not included, it should not significantly change the result as the computational complexity of that part is only O(N). It is expected that petar may even be possible to solve the 10-million-body problem.

8 CONCLUSION AND FUTURE WORK

In this work, we have detailed described the new N-body code, petar. It is designed to simulate gravitationally bound collisional stellar systems with many subsystems. The code is publicly available on GitHub (see footnote 1). The hybrid integrator is implemented, where the long-range interaction is calculated by using the Barnes–Hut PT method and short-range (neighbour) interactions are handled by the fourth-order Hermite with block time steps and the SDAR methods. This hybrid method not only provides a calculation cost of O(Nlog N) for the long-range interaction compared to O(N²) in the direct N-body method, but also can accurately follow the long-term evolution of perturbed binaries and hierarchical systems.

We introduce the mass-dependent changeover function for handling the systems with a wide range of particle masses (Section 6.1). The clustering method based on the implementation in pentacle is improved by using mass and velocity dependent searching radii. The orbit-dependent neighbour criterion is introduced to handle the high-velocity particles (Section 6.3). In order to accurately follow the long-range perturbation to tight binaries when T_bin < Δt_L, the artificial particle algorithm is developed (Section 6.4).

For a high performance on multiple-core computers, the code is implemented by using of fdps, which provides a well-optimized mpi and openmp parallelization of PT part for the tree construction and long-range force calculations. The neighbour searching and long-range force calculation in P³T-kick are optimized by using of SIMD instructions (AVX, AVX2, AVX-512) and GPU acceleration (CUDA). The sdar library is used to perform the P³T-drift (short-range interactions). This part is parallelized by using of openmp in each mpi processes.

A series of simulations are performed by using petar and nbody6++gpu in order to compare the performance and to validate whether petar can properly follow the long-term evolution of star clusters. On a highly configured GPU-based desktop, the performance of petar follows the scaling of O(Nlog N) and is faster than nbody6++gpu (Fig. 8). Especially for million-body systems with a large fraction of primordial binaries, petar can give an 11 times faster performance. Notice that the performance of petar is very sensitive to Δt_L (Fig. 7). The best value is chosen in our test.

The test of petar on the Cray XC50 supercomputer show a good scaling depending on the number of CPU cores for N ≥ 10⁶ (Fig. 17). Million-body simulations with |$100{{\ \rm per\ cent}}$| primordial binaries only take about 121 s wall clock time per NB time unit. The 10 million models with and without (⁠|$100{{\ \rm per\ cent}}$|⁠) primordial binaries take 1344 and 1106 s, which is even faster than that of the Dragon models with only |$5{{\ \rm per\ cent}}$| binaries. Such significant improvement is due to the benefit of using new algorithms (P³T with SDAR), well optimized parallelization from fdps and the advance of hardware.

We also compare the long-term behaviour of the performance and relative energy error in Figs 11 and 16. In the case without primordial binaries, after core collapse, petar becomes slower due to the increasing of central density. This is possibly improved if Δt_L and r_{out, ref} are readjusted during or after the core collapse.

Sometimes, the computation becomes very slow during a short physical time interval of the models due to the formation of a specific type of stable hierarchical systems. Such behaviours exist for both petar and nbody6++gpu in the models with primordial binaries (N100kb). Once it happens, the benefit of parallel computing is lost due to a bad load balance. However, how to efficiently and accurately solve such systems is a long existing question. In the future work, the code will be improved to well handle a part of them. But it is difficult to find a universal solution for all cases. Because of this, the actual computing time of a long-term simulation may be longer than the prediction estimated from a few initial steps if such systems frequently form.

The results of N250k and N100kb models indicate that for both star clusters with and without primordial binaries, petar can provide a good agreement on the long-term behaviours of Lagrangian and core radii (including the post-core-collapse evolution; Fig. 9 and 12), the properties of single and binary escapers (Figs 14 and 15) and the evolution of binary orbits (Fig. 13). Thus, petar is accurate enough to handle the realistic models of star clusters with many multiple systems.

The API to the stellar evolution package based on the framework of sse and bse (Hurley, Pols & Tout 2000; Hurley, Tout & Pols 2002) are implemented. The interface is designed in a way that switching between different versions of the sse and bse is straightforward. Moreover, petar is also implemented as a module in a hydro dynamics code, asura-bridge (private comm. with Michiko Fujii), using the Bridge method (Fujii et al. 2007). Thus, this hybrid code is possible to be used for studying the formation of star clusters, where few-body dynamics including close encounters, formation and evolution of binaries and hierarchical systems can be treated accurately. Besides, the application programming interface (API) to amuse (Portegies Zwart et al. 2013) is under developing. The complete petar module in amuse will allow us to use the Bridge method to combine modules like hydrodynamic codes for a wide range of studies and to work with different single and binary stellar evolution packages. On the other hand, the current version of petar only includes the pure Newtonian gravitational pair interaction. In the future, the general relativity effect for compact object binaries including BHs and NSs will be implemented.

ACKNOWLEDGEMENTS

We thank Sverre Aarseth for the helpful suggestions on the code development. LW thanks the financial support from JSPS International Research Fellow (School of Science, The University of Tokyo) and the support from Alexander von Humboldt Foundation (The University of Bonn). Numerical computations were in part carried out on Cray XC50 at Center for Computational Astrophysics, National Astronomical Observatory of Japan.

DATA AVAILABILITY

The (benchmark) data underlying this article were generated by using the petar code on the desktop computer of the corresponding author and the supercomputer, Cray XC50 at Center for Computational Astrophysics (CfCA), National Astronomical Observatory of Japan. The data underlying this article will be shared on reasonable request to the corresponding author. The code, petar, introduced in this article is publicly available in GitHub at https://github.com/lwang-astro/PeTar, under the MIT license.

Footnotes

Notice that sdar and SDAR (with different font styles) are the names of the code and the algorithm, respectively.

REFERENCES

Aarseth

S. J.

2003

Gravitational N-Body Simulations

Cambridge Univ. Press

Cambridge

Month:	Total Views:
July 2020	15
August 2020	23
September 2020	20
October 2020	33
November 2020	10
December 2020	10
January 2021	7
February 2021	9
March 2021	8
April 2021	5
May 2021	8
June 2021	3
July 2021	4
August 2021	5
September 2021	9
October 2021	15
November 2021	11
December 2021	16
January 2022	13
February 2022	3
March 2022	9
April 2022	18
May 2022	16
June 2022	10
July 2022	6
August 2022	8
September 2022	4
October 2022	9
November 2022	12
December 2022	10
January 2023	5
February 2023	17
March 2023	9
April 2023	14
May 2023	14
June 2023	14
July 2023	18
August 2023	6
September 2023	37
October 2023	39

Article Contents

ABSTRACT

1 INTRODUCTION

2 MULTIPLE TIME-SCALE ISSUE

2.1 Binary period

2.2 Crossing time

2.3 Relaxation time

3 DIRECT N-BODY METHOD

3.1 Individual time-steps

3.2 AC neighbour scheme

3.3 Binary integrator

3.4 Slow-down algorithm

3.5 Parallelization

3.6 Bottleneck

4 PARTICLE-TREE WITH INDIVIDUAL TIME-STEPS

5 HYBRID METHODS WITH HAMILTONIAN SPLITTING

5.1 Particle–particle particle tree method

6 HYBRID N-BODY CODE: petar

6.1 Mass-dependent changeover function

6.1.1 Varying changeover radii

6.2 Slow-down time-transformed symplectic integrator

6.3 Clustering

6.3.1 Orbit-dependent neighbour criterion

6.4 Artificial particle algorithm for weak perturbation

6.4.1 Tidal tensor

6.4.2 Counterforce

6.4.2.1 Orbit-sampling method

Pseudo-particle multipole method:

6.4.3 Test

6.5 Parallelization algorithm

7 BENCHMARK

7.1 Comparison with nbody6++gpu

7.1.1 Performance on GPU-based desktop

7.1.2 Long-term evolution

Lagrangian and core radii:

Escapers:

Performance and energy error:

7.1.3 Binaries

7.2 Scaling on supercomputer

8 CONCLUSION AND FUTURE WORK

ACKNOWLEDGEMENTS

DATA AVAILABILITY

Footnotes

REFERENCES

Citations

Views

Altmetric

Email alerts

Astrophysics Data System

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only