ABSTRACT

The numerical simulations of massive collisional stellar systems, such as globular clusters (GCs), are very time consuming. Until now, only a few realistic million-body simulations of GCs with a small fraction of binaries (⁠|$5{{\ \rm per\ cent}}$|⁠) have been performed by using the nbody6++gpu code. Such models took half a year computational time on a Graphic Processing Unit (GPU)-based supercomputer. In this work, we develop a new N-body code, petar, by combining the methods of Barnes–Hut tree, Hermite integrator and slow-down algorithmic regularization. The code can accurately handle an arbitrary fraction of multiple systems (e.g. binaries and triples) while keeping a high performance by using the hybrid parallelization methods with mpi, openmp, simd instructions and GPU. A few benchmarks indicate that petar and nbody6++gpu have a very good agreement on the long-term evolution of the global structure, binary orbits and escapers. On a highly configured GPU desktop computer, the performance of a million-body simulation with all stars in binaries by using petar is 11 times faster than that of nbody6++gpu. Moreover, on the Cray XC50 supercomputer, petar well scales when number of cores increase. The 10 million-body problem, which covers the region of ultracompact dwarfs and nuclear star clusters, becomes possible to be solved.

1 INTRODUCTION

The realistic models of collisional stellar systems where stars and binaries can have frequent close interactions, such as globular clusters (GCs), have been a long-term challenge due to the time-consuming calculations. By developing a hybrid-parallel direct N-body code nbody6++gpu (Wang et al. 2015), the million-body models of GCs have become possible (Dragon models; Wang et al. 2016). This success is based on the breakthrough of both hardware and software developments. Especially, the Gravity Pipe (GRAPE; Makino et al. 2003) and the Graphic Processing Unit (GPU) provide great platforms for performing highly efficient parallelized simulation codes (Gaburov, Harfst & Portegies Zwart 2009).

However, the Dragon models have relatively lower densities compared to the typical GCs in the Galaxy. It is difficult to simulate high-density systems because the half-mass relaxation time-scale (Trh) is much shorter, while the calculation cost scales in a way of O(N3)/Trh. Even so, the Dragon models took about half a year computing to reach one Trh. Besides, the binary fraction is also small (⁠|$5{{\ \rm per\ cent}}$|⁠) compared to that in the observed GCs. This is because the orbital integration of binaries is not parallelized in nbody6++gpu, thus the performance cannot well scale with multiple CPU cores when the binary fraction is large. Therefore, the dense models with many binaries are still challenging. It is also not practicable to perform a large number of models to cover the parameter spaces of different initial conditions. All these limit the applications of the direct N-body methods for studying GCs.

On the other hand, the ultracompact dwarfs (UCDs) attract researchers’ attention to understand their formation and evolution, because they belong to a family crossing the region of GCs and dwarf galaxies (e.g. Hilker et al. 1999; Drinkwater et al. 2000; Phillipps et al. 2001). The discovery of supermassive black hole (SMBH) in a UCD (Seth et al. 2014) brings a new question that how SMBHs and UCDs co-evolve. In the view of stellar dynamics, UCDs with N ∼ 107–108 are in the transiting region between the collisional and collisionless systems. Thus, a proper model of UCDs also requires a correct treatment of collisional dynamics.

The galactic nuclear star clusters (NSCs) are very massive and dense. The collisional effect is also important. They can be a few magnitude more massive than GCs but the system size is similar. The formation scenarios of NSCs and the co-evolution among NSCs, SMBHs, and host galaxies are still not fully understood. The star-by-star simulations including SMBH are useful to study such kind of problems. The currently most dense model of NSCs with a proper treatment of collisional effect was done by using the nbody6++gpu code (Panamarev et al. 2019), but the number of stars is still far from the realistic case.

To solve the challenge of realistically simulating these dense and massive systems, a new approach of numerical tool that can overcome the bottlenecks (performance and small fraction of binaries) is necessary. The major difficulty is to solve the multiple time-scale issue. In this work, we describe how our new code, petar, can overcome this challenge.

In Section 2, we introduce the multiple time-scale issue. In Section 3, we summarize the algorithms used in the direct N-body methods, especially those in the nbody6(++gpu) code. The approach of particle-tree (PT) with individual time-steps is shortly described in Section 4. Then, we introduce the idea of Hamiltonian splitting in Section 5. The detailed description of petar is in Section 6. After that, we provide the benchmarks of petar in Section 7. Finally, we make conclusions and discuss the future work in Section 8.

2 MULTIPLE TIME-SCALE ISSUE

Long-term dynamical evolution of star clusters involves multiple physical processes with very different time-scales. Three important ones are:

  • Tbin: period of binaries (order of day at the minimum).

  • Tcr: time for a star crossing the cluster (order of Myr for GCs).

  • Tr: two-body relaxation time of the system (order of Gyr for GCs).

The N-body method needs to be possible to handle these three time-scale regions.

2.1 Binary period

Binaries contain very large mechanical energy compared to that of the host system. They play an important role in controlling the global evolution via few-body interactions. Since the gravitationally bound system has a negative heat capacity, the temperature of the cluster centre becomes hotter as energy is transferred outwards. Such process is unstable and finally the core collapse happens and the central density increases significantly (e.g. Binney & Tremaine 1987; Spitzer 1987). Binaries are considered as the major heating source that can prevent an infinite core collapse. After a close interaction between a tight binary and its neighbours in the centre, energy is added the system to support the core while the binary becomes more compact and finally merges or escapes from the system. Thus, such process influences both the global evolution and the statistical properties of binaries. This feature makes the star cluster an efficient environment to form exotic objects such as blue stragglers, X-ray binaries, and gravitational wave progenitors. Therefore, to obtain realistic models of star clusters, few-body interactions, and dynamics of binaries must be correctly treated.

However, binaries also bring the major challenge for the N-body simulations. Tbin can cover a very wide region in star clusters. The minimum of Tbin is determined by the stellar structure of binary components. Typically it is a few days. There is no upper limit of Tbin, but in a cluster environment wide binaries are easily disrupted by perturbations. Due to the Heggie–Hill law (Heggie 1975; Hills 1975), the boundary between the wide and the tight can be estimated as
$$\begin{eqnarray*} a_{\mathrm{h/s}}\approx \frac{G m_{\mathrm{1}}m_{\mathrm{2}}}{\langle m \rangle \sigma ^2}, \end{eqnarray*}$$
(1)
where G is the gravitational constant; m1 and m2 are the masses of the two components, respectively; 〈m〉 is the locally averaged stellar mass, and σ is the local velocity dispersion. After close encounters, wide binaries with the semimajor axis a > ah/s become wider and while tight binaries become tighter. Thus, binaries with a < ah/s can stay in clusters for a long time before merger or escaping.

We can roughly estimate Tbin at the boundary of ah/s. In equation (1), σ2 can be replaced by GM/r for a system in virial equilibrium, where M is the total mass and r is the size of the system. Thus, ah/sr/N if all stars have a similar mass. In an open cluster, r ∼ 1 pc and N ∼ 103, binaries close to the boundary have Tbin in the order of 103 yr. In a dense GC, the boundary Tbin is in the order of years. Thus, Tbin has a range of 3–6 orders of magnitudes.

To properly follow the orbital motion, the time-step size of integration should be much less than Tbin. If the orbit has a fast change at the pericentre due to a high eccentricity (e), the time-step should be very small to catch the pericentre motion. This is also the case for a hyperbolic encounter. Thus, binaries and close encounters are the most time-consuming part in the N-body simulation of star clusters.

2.2 Crossing time

Tcr represents the time-scale of the orbital motion of a single star in the system. Thus, the time-step of integration for a star should be much shorter than its Tcr to obtain a sufficient accuracy. In a star cluster, Tcr can be estimated by
$$\begin{eqnarray*} T_{\mathrm{cr}}\approx \sqrt{\frac{1}{G\rho }}, \end{eqnarray*}$$
(2)
where ρ is the local mean density. After core collapse, the density contrast between the centre and the half-mass radius of star clusters can be an order of 104 (e.g. Binney & Tremaine 1987; Wang et al. 2016). Thus, Tcr varies about 100 times from the centre to the halo. Even without binaries, the individual-time-step method is necessary to efficiently handle such a large range.

2.3 Relaxation time

Two-body relaxation is one important physical process that determines the long-term behaviour of the N-body system. The phenomenons of core collapse, mass segregation, and escaping of stars all depend on it.

The relation between Tr and Tcr can be described as (e.g. Binney & Tremaine 1987)
$$\begin{eqnarray*} T_{\mathrm{r}}\approx \frac{0.1 N}{\ln \Lambda } T_{\mathrm{cr}}, \end{eqnarray*}$$
(3)
where ln Λ is Comlumb logarithm. The factor N in equation (3) indicates that for a global cluster with million stars, the ratio between Tr and Tcr is very large. For a single-mass system in virial equilibrium, the averaged Tr measured at the half-mass radius of the system (Rh) can be estimated by (Spitzer 1987)
$$\begin{eqnarray*} T_{\mathrm{rh}}\approx 0.138 \frac{N^{1/2} R_{\mathrm{h}}^{3/2}}{\langle m \rangle ^{1/2} G^{1/2} \ln \Lambda }, \end{eqnarray*}$$
(4)
Typically, GCs in the Galaxy have Trh in an order of Gyr and already passed a few Trh.

To study the long-term evolution of star clusters, the numerical simulations need to cover at least one Trh. However, the time resolution of the integration should be less than Tbin and Tcr. As the maximum of Trh/Tbin ∼ 1011, the classical integrators using individual-time-step methods (e.g. fourth-order Hermite with block time-steps) is not practically possible to handle such expensive calculations. Therefore, the sophisticated N-body codes (e.g. nbody6(++gpu)) apply special algorithms to reduce the computational operations.

3 DIRECT N-BODY METHOD

We provide a short review of the algorithms used in the direct N-body code, especially nbody6(++gpu) (Aarseth 2003; Nitadori & Aarseth 2012; Wang et al. 2015), that are designed to deal with the multiple time-scale issue. A part of the algorithms are also implemented in the petar code.

3.1 Individual time-steps

The performance of force interaction calculation in direct N-body simulations are usually considered as O(N2) due to the pair interaction between all particles (stars). However, this is only the case when the interaction between all particles are needed. When the multiple time-scale issue exists, sophisticated N-body codes for simulation star clusters like nbody6(++gpu) use individual time-steps for each particle. Thus particles with different Tcr can use suitable integration steps to avoid expensive O(N2) calculations every step. In nbody6(++gpu), the fourth-order Hermite integrator with the block-time-step method is used. The block-time-step method normalizes the step size to be an integer power of 0.5, so that the implementation of multiple-core parallelization becomes possible. The performance of interaction calculation per step is O(NNact〉), where 〈Nact〉 is the averaged active particle number that need the update of forces at one step. Makino & Hut (1988) found that the total number of pair interactions depends on O(N7/3)/Tcr if the system has a power-law density distribution with power-index α < 24/11. For α > 24/11, the scaling relation depends on α. Using equation (3), the scaling becomes O(N10/3/ln Λ)/Tr for α < 24/11. Thus, as N increases, the computational cost grows rapidly.

3.2 AC neighbour scheme

To reduce the computation cost when N is large, Ahmad & Cohen (1973) introduced the (AC) neighbour scheme, where the force on a particle is split into the short-distance (neighbour) part and long-distance (regular) part. As the long-distance force changes smoothly, it can be updated with a larger time-step (regular step) compared to that of the neighbour force. Between two regular steps, the long-distance force is estimated by a second-order prediction. Usually, the number of neighbour particles is a small fraction of total N (the order of 10–102), thus the total number of pair interactions is significantly reduced. Especially, when small neighbour steps are required to handle close encounters and binaries, only 10–102 force evaluations are needed per step while the regular step can be much larger. Thus, the frequent O(N) calculation is avoided. The speed gained by the AC scheme is roughly proportional to N1/4 without binaries (Makino & Hut 1988; Makino & Aarseth 1992).

If no short-period binaries exist, individual time-steps combined with the AC neighbour scheme is an efficient method for simulating the long-term evolution of star clusters. However, it is still not sufficient to handle the very large time-scale gap caused by the short-period binaries.

3.3 Binary integrator

Since short-period binaries are very compact, the perturbation from neighbour particles is usually very weak unless a close encounter happens. Based on this feature, nbody6(++gpu) codes do not evolve such binaries and treat them as single (centre-of-the-mass) particles, until the perturbation becomes strong enough. These frozen binaries are named as ‘isolated binaries’ in the codes. Thus, only the internal motion of strongly perturbed binaries are actually integrated. Besides, the time-steps to integrate the internal motion are much smaller than the neighbour steps. To avoid large number of pair interactions, only the force from nearby perturbers is included. The typical number of such perturbers for a binary is less than 5. Therefore, the computational cost is reduced significantly.

The major perturber selection criterion is based on the strength of tidal force (Aarseth 2003, equation 8.58):
$$\begin{eqnarray*} r_{\mathrm{p}} \lt \left[\frac{2m_{\mathrm{p}}}{M\gamma _{\mathrm{min}}}\right]^{1/3}R, \end{eqnarray*}$$
(5)
where rp is the distance between the perturber and the closest component in the binary; mp is the mass of the pertuber and M is the mass of the binary; R is the apo-centre distance; and γmin is a free coefficient.

However, this criterion may ignore the impact from the whole system because it checks only the individual neighbours, but not the cumulative effect from the group of particles with the similar distances and directions. For example, the central region of a GC can contain 105 M. If we consider its centre-of-mass mp ≈ 105 M, the corresponding criterion rp, G is 300 times larger than the case (rp, s) of a normal star with 1 M. But by using this criterion, most stars outside rp, s are excluded. For the relative wide binaries in the outside region of a GC, this cumulative effect may be significant and cannot be ignored. This is one potential problem when only close perturbers are selected.

On the other hand, to handle the highly eccentric binaries and close encounters, the Kustaanheimo & Stiefel (1965, KS) regularization and ‘Algorithmic Regularization’ (AR; Mikkola & Tanikawa 1999) are used. The regularization method avoids the singularity of Newtonian force when two particles get very close, thus no small time-steps are required while the accuracy of integration is high enough.

3.4 Slow-down algorithm

Mikkola & Aarseth (1996) developed the slow-down (SD) method that can significant reduce the number of integration steps for weakly perturbed binaries. The key idea is to modify the Hamiltonian of a system with a binary as
$$\begin{eqnarray*} H_\mathrm{sd}= \frac{1}{\kappa } H_{\mathrm{b}}+ (H - H_{\mathrm{b}}), \end{eqnarray*}$$
(6)
where Hsd is the new Hamiltonian and Hb is the Hamiltonian of the binary components. The κ is a scaling factor that slows down the motion of the binary that the effective period becomes Tbin/κ. Thus, the number of integration steps for this binary is also reduced by κ times. It is shown in Mikkola & Aarseth (1996) that the secular motion of binary can be correctly reproduced while the orbital phase information is lost.

The nbody6(++gpu) codes apply the SD method together with the KS regularization. However, since most weakly perturbed binaries are treated as isolated binaries, and a strict limit of κ (≤10) is used, the performance improvement is not significant.

3.5 Parallelization

With the algorithms described above, the total computational cost is still significant (roughly O(N3)/Trh) for the direct N-body method. Thus, the multiple-core parallelization is necessary to reduce the wall clock time of the computation. Spurzem (1999) and Hemsendorf, Khalisi, Omarov & Spurzem (2003) implemented the MPI parallelization for calculating the long-distant and neighbour force. Then, Nitadori & Aarseth (2012) implemented the hybrid parallelization methods (nbody6-gpu) including the openmp, GPU (CUDA), and SIMD instructions (SSE, AVX) for long-distant and neighbour integration. Wang et al. (2015) combined these two and optimize the code (nbody6++gpu) in order to perform large N simulations on supercomputers.

3.6 Bottleneck

Although nbody6++gpu can handle the million-body simulations of GCs (Wang et al. 2015), there are several bottlenecks of the code that limit the future improvement. First, the large memory space is required to save the data. For each particle, a neighbour list with few hundreds 4-bytes integers needs to be saved. Moreover, each MPI process keeps the complete copy of particle data. Thus the maximum N is limited by the maximum memory size per mpi process. For example, if the maximum neighbour number is 500, one million particles require 2 GB memory space to save the neighbour lists. Including the particle data and many others, the actual memory cost is significant. Therefore, the code cannot be used to simulate systems with a very large N even if the computing resource has a large number of computing nodes.

Secondly, the parallelization of integrating internal motions of binaries (KS regularization) is difficult. One may think that this should not be since each binary can be evolved almost independently. In reality, several parts that required a large number of operations are not possible to be parallelized. Especially, each time when the integrator for a binary needs to be switched between the KS regularization and the Hermite method, the initialization of the force requires a O(N) pair interaction. In addition, each particle has to update its neighbour list with the total memory access of O(NNb〉). If switching is frequent, such cost is very large. Unfortunately, there are always wide binaries that are close to the switching conditions in a star clusters if the primordial binary fraction is significant.

Moreover, the code is initially not designed for parallelization. During the KS integration, many shared global variables (mostly for binary stellar evolution) are modified and conditional interruptions are frequently used. Thus, the shared memory parallelization method like openmp is difficult to implement and does not scale with number of threads. The thread safety is also not guaranteed. On the other hand, the floating-point operation per memory access is low due to a few number of perturbers, thus the distributed memory parallel method (mpi parallelization) also does not scale (Wang et al. 2015).

Thirdly, the code mixes the N-body integration and the stellar evolution in a complex way. It is not easy to separate the two parts. Therefore, it is not flexible to replace the implementations of stellar-evolution models. It is also challenging to maintain the code and include new features.

4 PARTICLE-TREE WITH INDIVIDUAL TIME-STEPS

The force contributed by the distant particles are much weaker than that of neighbours. Thus in the individual time-step method, even the time resolution is low for the weakly interacted particles, the accuracy is sufficient. We can consider it as an approximation on time. There are another type of N-body algorithms using an approximation on space, such as the Barnes–Hut tree (PT; Barnes & Hut 1986), the particle mesh (PM; Hockney & Eastwood 1988), and the fast multiple method (FMM; Greengard & Rokhlin 1987). Compared to the cost of O(N2) in the direct force calculation of all particles, such methods only requires O(Nlog N) (PT and PM) or O(N) (FMM). However, these methods need a shared time-step. For example, the second-order leapfrog integrator is used in the PT method. Therefore, it is difficult to use it for star clusters because of the multiple time-scale issue.

McMillan & Aarseth (1993) started the first effort to overcome this bottleneck by introducing a high-order predictor-corrector integrator with individual time-steps in the PT method. They also implemented the KS regularization to deal with binaries. Fukushige & Kawai (2016) implemented the parellelization support on the GRAPE-9 system for this method and showed that it is much faster than the Hermite integrator for million-body simulations.

The PT method approximates the long-distant force. However, the weak encounters from distant particles are important in the relaxation process. Thus, it is necessary to ensure that the PT method can correctly reproduce the relaxation. Hernquist (1987) found that the accuracy of the relaxation process in the PT method depends on the opening angle (θ). When θ < 1.0, the measured Trh of the PT method is consistent with that of the direct N-body method.

5 HYBRID METHODS WITH HAMILTONIAN SPLITTING

Recently, hybrid numerical simulation methods become popular to solve the multiple time-scale issues. The key idea is based on the Hamiltonian splitting. If the Hamiltonian, the system can be decomposed to two parts as
$$\begin{eqnarray*} H = H_{\mathrm{L}}+ H_{\mathrm{S}} \end{eqnarray*}$$
(7)
The equation of motion can be described as
$$\begin{eqnarray*} \frac{\mathrm{d}\boldsymbol {w}}{\mathrm{d}t} = \lbrace \boldsymbol {w}, H\rbrace = \lbrace \boldsymbol {w}, H_{\mathrm{L}}\rbrace + \lbrace \boldsymbol {w}, H_{\mathrm{S}}\rbrace . \end{eqnarray*}$$
(8)
where {} is Poisson bracket. We define the differential operator |$\mathcal {L}\equiv \lbrace\ , H_{\mathrm{L}}\rbrace$| and |$\mathcal {S}\equiv \lbrace\ , H_{\mathrm{S}}\rbrace$|⁠. Using matrix exponential, the symplectic mapping from t to t + Δt can be written as
$$\begin{eqnarray*} \boldsymbol {w}(t+\Delta t) = e^{\Delta t(\mathcal {L}+\mathcal {S})}\boldsymbol {w}(t). \end{eqnarray*}$$
(9)
If the two parts have analytic solutions, the symplectic integrator can be constructed. The second-order symplectic integrator is given by
$$\begin{eqnarray*} \boldsymbol {w}(t+\Delta t) = e^{\Delta t\mathcal {L}/2}e^{\Delta t\mathcal {S}}e^{\Delta t\mathcal {L}/2}\boldsymbol {w}(t) + O(\Delta t^3). \end{eqnarray*}$$
(10)
If HL and HS are the potential and kinetic energy, respectively, equation (10) represents the leapfrog method with the order of kick-drift-kick.

The potential and kinetic energy are not the only combination. Wisdom & Holman (1991) first introduced the mixed variable symplectic (MVS) method to evolve the planetary system, where HL and HS represent the Kepler motion and interactions between planets, respectively. Then, several combinations of HL and HS are introduced to simulate different type of systems (e.g. Hockney & Eastwood 1988; Xu 1995; Chambers 1999; Fujii et al. 2007; Oshino, Funato & Makino 2011).

5.1 Particle–particle particle tree method

For particle-based N-body systems, one possible way of Hamiltonian splitting is using HL and HS to represent the long- and short-range interactions, respectively. Since HL dominates the computation while the contribute to the pair interaction is less than HS, the approximated methods can be used with a large fixed time-step, which provides a sufficient accuracy and a small computational cost. On the other hand, more accurate methods with smaller and individual time-steps can be applied for HS. These hybrid methods are used for several combinations, such as PM +PP (P3M; Hockney & Eastwood 1988), PM +PT (Xu 1995), and PT +PP (P3T; Oshino et al. 2011), where PP represents the direct N-body (particle–particle) method.

The P3T method introduced by Oshino et al. (2011, Fig. 1) is specially designed to simulate the collisional systems which have the multiple time-scale issue (without binaries). The Hamiltonian splitting of this method is via a changeover function W(rij):
$$\begin{eqnarray*} H_{\mathrm{S}}&=& \sum _{i=1}^N\frac{p_i^2}{2 m_i} - \sum _{i\lt j}^N \frac{G m_i m_j}{r_{ij}} W(r_{ij}) \nonumber \\ H_{\mathrm{L}}&=& \sum _{=1}^N \frac{G m_i m_j}{r_{ij}} [1-W(r_{ij})] . \end{eqnarray*}$$
(11)
The purpose of W(rij) is to result in a smooth transition when two particles pass the boundary of long- and short-range interactions.
An illustration showing how the P3T method deal with the long- and short-range interactions for a 2D particle system. The upper panel shows the structure of the Barnes–Hut tree. The neighbour particles inside a distance criterion are collected as individual clusters. An example for one (red) particle is shown as blue points. Two lines with an opening angle, θ = 0.35°, starting from this particle are also shown. If a tree cell (like the pink box) is inside θ, its superparticle (the centre-of-the-mass with multipole expansions) is used to obtain the long-range acceleration. The short-range acceleration for this particle (shown in the bottom panel) is calculated by the high-accuracy PP method.
Figure 1.

An illustration showing how the P3T method deal with the long- and short-range interactions for a 2D particle system. The upper panel shows the structure of the Barnes–Hut tree. The neighbour particles inside a distance criterion are collected as individual clusters. An example for one (red) particle is shown as blue points. Two lines with an opening angle, θ = 0.35°, starting from this particle are also shown. If a tree cell (like the pink box) is inside θ, its superparticle (the centre-of-the-mass with multipole expansions) is used to obtain the long-range acceleration. The short-range acceleration for this particle (shown in the bottom panel) is calculated by the high-accuracy PP method.

Iwasawa, Portegies Zwart & Makino (2015) developed a GPU-parallelized P3T code and compared its performance with the Hermite method. They showed that the new scheme can be 10 times faster. This high performance encourages us to advance in this direction by combining the binary solver into the P3T method in order to properly handle the short-time interval close interactions.

6 HYBRID N-BODY CODE: petar

We introduce our new hybrid N-body code, petar, which combines the P3T method (Oshino et al. 2011; Iwasawa et al. 2015, 2016, 2017) and the slow-down time-transformed symplectic integrator (SDAR; Wang, Nitadori & Makino 2020). The framework of pentacle is the base of the code (Iwasawa et al. 2017). The parallelization framework for developing particle simulation codes (fdps; Iwasawa et al. 2016, 2020) are used to deal with the PT construction and long-range force calculation. The sdar code, which combines the fourth-order Hermite and the SDAR integrators.1 is used for the short-range interaction. Fig. 2 show how petar works for one cycle of integration. It can be summaries as:

  • Decompose domains: distribute particles to different mpi processes.

  • Search neighbours and clustering: construct particle tree, search neighbours for each particle and gather all nearby particles into individual clusters (Section 6.3).

  • Find groups and create artificial particles: in each cluster, find subsystems (groups) and if necessary, create artificial particles for each group (Sections 6.2 and 6.4).

  • Calculate long-range force and kick velocities (P3T-kick): construct particle tree that includes artificial particles, calculate the long-range interaction and kick the velocities of all particles.

  • Integrate motions in each clusters (P3T-drift): in each cluster, use the Hermite and SDAR method to integrate the motions controlled by the short-range interaction.

The schedule of one kick-drift-kick cycle of petar. In order to have a clear view, the 2D particle system is illustrated here. From left to right: (1) domain decomposition splits particles in space and distributes them to different mpi processes. This step is done every few cycles. (2) PT is constructed for searching neighbours and clustering. Individual clusters are marked as different colours. (3) For individual clusters, groups of multiple systems (binaries, triples ...; marked as red points) are detected and artificial particles (tidal-tensor and orbit-sampling/pseudo-particles; blue points) are added to particle systems. Artificial particles are represented by four points along a rectangle. (4) PT is constructed with artificial particles and used to calculate the long-range force. Then velocities of particles are kicked. (5) For individual clusters, particle positions, and velocities are integrated by the Hermite and SDAR methods (drift).
Figure 2.

The schedule of one kick-drift-kick cycle of petar. In order to have a clear view, the 2D particle system is illustrated here. From left to right: (1) domain decomposition splits particles in space and distributes them to different mpi processes. This step is done every few cycles. (2) PT is constructed for searching neighbours and clustering. Individual clusters are marked as different colours. (3) For individual clusters, groups of multiple systems (binaries, triples ...; marked as red points) are detected and artificial particles (tidal-tensor and orbit-sampling/pseudo-particles; blue points) are added to particle systems. Artificial particles are represented by four points along a rectangle. (4) PT is constructed with artificial particles and used to calculate the long-range force. Then velocities of particles are kicked. (5) For individual clusters, particle positions, and velocities are integrated by the Hermite and SDAR methods (drift).

The kick-drift-kick mode (equation 10) is used, thus in the final step, (i)–(iv) are executed once more. The first and the last P3T-kick take the half of ΔtL.

6.1 Mass-dependent changeover function

In pentacle, the seventh-order polynomial type of changeover function K(rij) (the derivative of W(rij) with respect to rij) is implemented (Iwasawa et al. 2017). This K(rij) ensures that all terms of derivatives of force used in the fourth-order Hermite integrator have a smooth changes at the boundary of the changeover range. However, there are two limits. First, the changeover function for potential W(rij) contains the term of log (rij), which is computational expensive. Secondly, the changeover range is fixed for all particles, i.e. rin and rout are constant. However, in star clusters, the mass spectrum has a wide range where the ratio between the maximum and minimum mass can be as large as 104. If a very massive object like SMBH exists, the ratio can be 108. For the same distance, the force from massive objects are larger than that of the low-mass objects. Thus the fixed changeover range cannot properly handle the systems with a wide mass spectrum.

We introduce a mass-dependent changeover function, where each particle has an individual changeover range (rin, i, rout, i). The cubic root of the particle mass is used as the coefficient to determine the boundary:
$$\begin{eqnarray*} r_{\mathrm{in},i}&= & \mathrm{max} \left(1, \frac{m_i}{\langle m \rangle } \right)^{\frac{1}{3}}r_{\mathrm{in,ref}}, \nonumber\\ r_{\mathrm{out},i}&=& \mathrm{max} \left(1, \frac{m_i}{\langle m \rangle } \right)^{\frac{1}{3}}r_{\mathrm{out,ref}}, \end{eqnarray*}$$
(12)
where rin, ref and rout, ref are the reference of a fixed changeover range, and 〈m〉 is the average mass of the system. The minimum mass factor is 1.0 so that low-mass particles can avoid too small changeover radii.
If two particles i and j has a separation rij < rin, ij, the perturbation from a distant particle k versus the internal force between the two particles can be estimated as
$$\begin{eqnarray*} f_{\mathrm{p}}(r_{\mathrm{cm},k}) = \frac{m_{k}}{m_{i}+m_{j}} \left(\frac{r_{ij}}{r_{\mathrm{cm},k}} \right)^3, \end{eqnarray*}$$
(13)
where rcm, k is the distance between the centre-of-the-mass of the pair i and j and the perturber k. If mk > mi + mj, the changeover radii between the pair and the perturber are determined by mk. Equation (13) indicates that
$$\begin{eqnarray*} f_{\mathrm{p}}(r_{\mathrm{in},k}) = \mathrm{min} \left(1, \frac{\langle m \rangle }{m_{i}+m_{j}} \right)\left(\frac{r_{ij}}{r_{\mathrm{in,ref}}} \right)^3. \end{eqnarray*}$$
(14)
Thus, fp at the changeover boundary is independent of mk. Therefore, equation (12) is sufficient to handle the tidal perturbation from massive objects.
On the other hand, to avoid logarithmic function, we use the eighth-order polynomial function as the changeover function for potential:
$$\begin{eqnarray*} W(x) = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\Lambda (1-2 x) &\quad (x\le 0)\\ \Lambda (1-2 x) - 1 + f(x) &\quad (0 \lt x \lt 1)\\ 0 & \quad (x\ge 1) \end{array}\right. \end{eqnarray*}$$
(15)
where
$$\begin{eqnarray*} f(x) & =& 1 + \Lambda x^5 \left(14 - 28 x + 20 x^2 -5 x^3 \right) \nonumber\\ x & =& \frac{r_{ij}-r_{\mathrm{in},ij}}{r_{\mathrm{out},ij}-r_{\mathrm{in},ij}} \nonumber\\ \Lambda & =& \frac{r_{\mathrm{out},ij}-r_{\mathrm{in},ij}}{r_{\mathrm{out},ij}+r_{\mathrm{in},ij}} \nonumber\\ r_{\mathrm{in},ij}&=& \max (r_{\mathrm{in},i},r_{\mathrm{in},j}) \nonumber\\ r_{\mathrm{out},ij}&=& \max (r_{\mathrm{out},i}, r_{\mathrm{out},j}). \end{eqnarray*}$$
(16)
We use x instead of rij in the formulae. The changeover function for force has the form:
$$\begin{eqnarray*} K(x) = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}1 & \quad (x \le 0)\\ (x - 1)^4 \big(1 + 4x + 10 x^2&\\ \qquad\quad + \,20 x^3 + 35 \Lambda x^4 \big) & \quad (0 \lt x \lt 1)\\ 0 & \quad (x \ge 1) \end{array}\right. \end{eqnarray*}$$
(17)
For 0 < x < 1, W(x) and K(x) are related to f(x) by
$$\begin{eqnarray*} W(x) & =& f(x) - f(1) \frac{r_{ij}}{r_{\mathrm{out},ij}} \nonumber\\ K(x) &=& f(x) - \left(x + \frac{r_{\mathrm{in},ij}}{r_{\mathrm{out},ij}-r_{\mathrm{in},ij}}\right) f^{(1)}(x), \end{eqnarray*}$$
(18)
where the number in superscript ‘()’ indicates the times of derivative with respect to x. The second term in the expression of W(x) is an offset ensuring that the potential becomes zero at rout, ij. At the boundary (x = 0, 1), the changeover functions have values:
$$\begin{eqnarray*} \begin{array}{rr} W(1) =0, & \\ K(0) =1, &\quad K(1) =0, \\ K^{(1)}(0) = 0, & \quad K^{(1)}(1)=0, \\ K^{(2)}(0) = 0, & \quad K^{(2)}(1)=0, \\ K^{(3)}(0) = 0, & \quad K^{(3)}(1)=0.\\ \end{array} \end{eqnarray*}$$
(19)
The potential and force of HS reduces to zero after x ≥ 1. All derivatives of K(x) are zero at the boundary. These ensure that the higher order (up to 3) derivatives of force used in the fourth-order Hermite integrator have smooth curves at rin, ij and rout, ij. Fig. 3 shows the examples of these functions with rout, ij = 10 and rin, ij = 1.
The shapes of the changeover functions for potential, W(x), and for force, K(x), and its nth time derivatives, K(n)(x). Here, rin, ij = 1 and rout, ij = 10.
Figure 3.

The shapes of the changeover functions for potential, W(x), and for force, K(x), and its nth time derivatives, K(n)(x). Here, rin, ij = 1 and rout, ij = 10.

Since rin, ij and rout, ij take the maximum values from the changeover radii of i and j particles, the strong force from the massive particles are always preferentially included in HS for a high accuracy. Besides, for both particles, the changeover range is identical, thus the force is symmetric.

6.1.1 Varying changeover radii

To ensure that the P3T method is symplectic, once the changeover radii of all particles are determined, they should keep unchanged during the integration. However, in real star clusters, masses of stars evolve due to the stellar-wind-driven mass loss or the mass transfer and mergers of two stars. Besides, binary can form, disrupt and change members. Thus, after a certain time, the changeover radii of one star or binary may not be suitable anymore and need to be recalculated by using the new mass. This breaks the symplectic properties of the integrator. To minimize the effect, the modification of changeover radii and masses can only be done after a complete leapfrog step.

6.2 Slow-down time-transformed symplectic integrator

Short-period binaries are challenging not only due to the time-consuming integration, but also because of the large cumulative numerical errors after many orbits. The symplectic integrator can conserve Hamiltonian and angular momentum for the long-term evolution. However, it requires a constant integration step, thus very small time-steps have to be used for highly eccentric Kepler orbits and close encounters. One way to avoid small steps is to use the extended phase-space Hamiltonian,
$$\begin{eqnarray*} \Gamma (\boldsymbol {W}) = g(\boldsymbol {W}) \left[H(\boldsymbol {w},t) - H(\boldsymbol {w}(0),0)\right], \end{eqnarray*}$$
(20)
where |$H(\boldsymbol {w},t)$| is the standard Hamiltonian, |$g(\boldsymbol {W})$| is time-transformation function, and |$\boldsymbol {W}$| is the extended phase-space vector that contains |$\boldsymbol {w}$| and new pair of the coordinate, t, and the corresponding conjugate momentum pt. By introducing the new differential variable, s, the equation of motion can be described as
$$\begin{eqnarray*} \frac{\mathrm{d}\boldsymbol {W}}{\mathrm{d}s} = \lbrace \boldsymbol {W}, \Gamma (\boldsymbol {W}) \rbrace \end{eqnarray*}$$
(21)
Thus, the time- and integration steps are decoupled via the time transformation function. For eccentric orbits, time-steps can vary based on the requirement of accuracy and efficiency while ds keeps constant. In order to use explicit symplectic method, |$g(\boldsymbol {W})$| should be designed to make |$\Gamma (\boldsymbol {W})$| separable like equation (7). Mikkola & Tanikawa (1999) and Preto & Tremaine (1999) provided such a solution by using
$$\begin{eqnarray*} g(\boldsymbol {W}) = \frac{f(T(\boldsymbol {P})) - f(-U(\boldsymbol {R}))}{T(\boldsymbol {P}) + U(\boldsymbol {R})} , \end{eqnarray*}$$
(22)
where |$T(\boldsymbol {P})$| and |$U(\boldsymbol {R})$| are kinetic and potential energy in the extended phase space. When f(x) = log (x) with a leapfrog integrator in the drift-kick-drift mode, the Kepler orbit can be integrated very accurately with only round-off errors in positions and velocities and a phase error of time. In Mikkola & Tanikawa (1999), this is named ‘AR’. The AR method can well solve the issue of the long-term cumulative errors.

On the other hand, the SD method described in Section 3.4 can reduce the total integration steps for weakly perturbed binaries. Wang et al. (2020) combined the SD and AR methods and developed the SDAR algorithm to efficiently and accurately integrate the few-body systems. In their work, κ is calculated by the perturbation and time-scale criteria. We set the tree time-step, CsΔtL, as the maximum time-scale criteria where Cs is a coefficient larger than one. In such case, if κ of a weakly perturbed binary reaches the maximum value, ΔtL is small enough to resolve the orbit of the binary in order to provide the correct P3T-kick. With the SD method, the actual integration steps of binaries for a given physical time interval are decoupled from the real Tbin but depends on the perturbation and ΔtL. Since most short-period binaries are weakly perturbed in a star cluster and their Tbin ≪ ΔtL, the total number of integration steps are significantly reduced by a few thousand times. Thus, the SDAR method is the major algorithm in the petar code to solve the multiple time-scale issue.

In the P3T-drift step, we use the SDAR method for compact groups of particles and the Hermite integrator for integrating the motions of singles and centre-of-the-mass of groups. The particle groups are determined by a distant criterion, rij < rg, ij, where
$$\begin{eqnarray*} r_{\mathrm{g},ij}= \theta r_{\mathrm{in},ij}, \end{eqnarray*}$$
(23)
and θ is the opening angle of the PT method. This is not a strict criterion. We use equation (23) so that any pair of members in a group are always inside their inner boundaries of changeover ranges. Thus, the SDAR method only need to deal with the Newtonian force. This avoids the complexity of using the SDAR method and changeover functions together. Besides, if one particle receives a long-range interaction from the group, its members are inside the angle θ viewing from this particle.

6.3 Clustering

In Section 3.6, we show that the switch between the regularization method and the Hermite integrator is expensive with an O(NNb〉) memory access and an O(N) force calculation in nbody6(++gpu). This issue is general for the hybrid methods that use neighbour list and need force calculation for the centre-of-the-masses of groups. In petar, we use the clustering scheme to avoid such expensive switching. This scheme is originally implemented in the pentacle code (Iwasawa et al. 2017) with a uniform neighbour radius. Here, we describe the idea and introduce the improved algorithm based on the orbit-dependent neighbour criterion.

After searching short-range interacting neighbours and before Hermite integration, particles are collected together into different clusters (Fig. 2). The clustering scheme ensures that any member in one cluster have all its neighbours inside the same cluster. In such case, particles outside this cluster only provide the long-range interaction to the members. Thus, during the integration of the short-range interactions (P3T-drift), each cluster is isolated to others and can be integrated in parallel. This feature leads to a great advantage: the switch between the different integration methods in one cluster only affect the neighbour lists and forces of the local members. Since the typical number of members per cluster is a small fraction of the total number of particles, the computational cost of switching is much less.

On the other hand, when mpi parallelization is used, sometimes one cluster may contain members crossing multiple mpi processes (like the cluster, OMP:1, shown in Fig. 2). In such case, one mpi process is chosen to be the host for the cluster and others send particle data to it.

6.3.1 Orbit-dependent neighbour criterion

The number of members in clusters determine the performance of P3T-drift. Thus, it is important to choose a proper neighbour searching criterion. In star clusters with a mass spectrum, we cannot apply the uniform neighbour radius as in pentacle. Instead, we determine the individual neighbour searching radius, rnb, i, based on rout, i and the velocity. First, for each particle, rnb, i must be longer than rout, i. However, we cannot set these two radii the same because during P3T-drift, particles that are initially not inside the short-range interaction region can move closer and penetrate the boundary. Therefore, rnb, i should be long enough to capture such potential neighbours. One safe way is to include the velocity information that
$$\begin{eqnarray*} r_{\mathrm{nb},i}= r_{\mathrm{out},i}+ C_{\mathrm{r}}|\boldsymbol {v}_i| \Delta t_{\mathrm{L}}, \end{eqnarray*}$$
(24)
where |$\boldsymbol {v}_i$| is the particle velocity and Cr is a free coefficient (we use 3.0 for safety).

However, this criterion is independent on the direction of velocity. If the particle velocity is large, rnb, i is significantly long that a huge cluster can form. Unfortunately, high-velocity particles are commonly generated via few-body interactions in star clusters. Thus, we need to reduce the neighbour numbers for these particles. For a high-velocity particle, only neighbours along its path are important and most particles inside rnb, i are not real neighbours. To avoid including these unnecessary neighbours, a 3D neighbour searching criterion which depends on the direction of velocity is needed. However, such criterion is not computationally efficient and is not supported by the current version of fdps. To solve this issue, we apply a two-stage method:

  • Obtain the neighbour candidates by applying the spherically symmetric neighbour searching using equation (24).

  • Select true neighbours if the candidate j has the Kepler orbital pericentre separation, rp, ij < Cvrnb, i, where Cv is a free coefficient (e.g. 1.5).

The first step has the calculation cost of O(Nlog N) by using the PT method. Since a proper ΔtL leads to a large fraction of particles with no neighbour candidates, the cost of evaluations of Kepler orbital pericentre at the second step is not expensive. Thus, this method is efficient to deal with the problem of high-velocity particles.

6.4 Artificial particle algorithm for weak perturbation

Both the changeover function and ΔtL influence the performance and accuracy of the simulations. We can understand this by analysing a situation where a binary receives the long-range perturbation force. Although the long-range force is much weaker compared to the internal force of the binary, it is still important to ensure that ΔtL < Tbin. Otherwise a random phase of binary is chosen to evaluate the long-range force, which does not represent the correct perturbation. This is the same for the counter force. If the binary is very massive, this error can be significant.

However, keeping ΔtL < Tbin is difficult since Tbin can be very small. In Section 6.2, we show that the SD method can artificially increase Tbin, which helps to avoid too small ΔtL. But only weakly perturbed binaries can have large enough κ. When a tight binary has close neighbours, the effective Tbin can be much smaller than ΔtL. This can frequently happen in star clusters.

To solve this issue, we introduce the ‘artificial particle algorithm’. In this algorithm, instead of calculating the long-range force once and giving a large velocity kick per ΔtL, we can construct the local potential (tidal-tensor) field near the binary and use it to calculate a smooth evolution of the long-range perturbation every AR step. The tidal-tensor field can be obtained by measuring the long-range forces of a group of artificial particles near the binary. On the other hand, another group of artificial particles along or near the orbit of the binary can be used to to represent the correct orbit-averaged counterforce.

This algorithm increases the total number of particles, thus the number of long-range interactions becomes more. However, this additional cost can be easily reduced by increasing the number of computing cores. If we use small ΔtL, there is no such simple solution. Besides, adding artificial particles is easy to implement. This is especially convenient for using the fdps library and the accelerators such as SIMD and GPU.

6.4.1 Tidal tensor

Here, we describe the algorithm to obtain the local tidal-tensor field near the binary. Based on a 3D Taylor expansion, the acceleration of a particle at an arbitrary position near a fixed centre can be evaluated by:
$$\begin{eqnarray*} \boldsymbol {A}_i(\boldsymbol {r}) &=& \boldsymbol {A}_i(\boldsymbol {r}_{\mathrm{cm}}) + \boldsymbol {A}_{ij} \cdot (\boldsymbol {r}-\boldsymbol {r}_{\mathrm{cm}})_{j} \nonumber\\ &&+ \,\frac{1}{2!} (\boldsymbol {r}-\boldsymbol {r}_{\mathrm{cm}})_{j}^T \cdot \boldsymbol {A}_{ijk} \cdot (\boldsymbol {r}-\boldsymbol {r}_{\mathrm{cm}})_{k} + ... \nonumber\\ A_{ij} &=& \frac{\partial A_{i}}{\partial r_{j}}\nonumber\\ A_{ijk} &=& \frac{\partial ^2 A_{i}}{\partial r_{j}\partial r_{k}} \end{eqnarray*}$$
(25)
where Aij and Aijk are individual components of the tensors, |$\boldsymbol{A}_{ij}$| and |$\boldsymbol {A}_{ijk}$|⁠, respectively; and ‘·’ represents matrix multiplication. In the P3T method, the numerical long-range forces are constant within one ΔtL, so should be the tensor field. Thus, we only need to measure the tensors once per ΔtL. Then, using equation (25), the long-range perturbation on an arbitrary orbital phase of the binary can be evaluated during the P3T-drift. For the gravitational field, |$\boldsymbol {A}_{ij}$| and |$\boldsymbol {A}_{ijk}$| are symmetric tensors. The number of elements of the first three orders are 3 (⁠|$\boldsymbol {A}_i(\boldsymbol {r}_{\rm {cm}})$|⁠), 6 (⁠|$\boldsymbol {A}_{ij}$|⁠), and 10 (⁠|$\boldsymbol {A}_{ijk}$|⁠), respectively. Thus, the second-order method has totally 9 elements and the third order has 19.

To obtain these tensor elements, we can create measure points (zero-mass artificial particles) near the centre-of-the-mass of the binary. These artificial particles obtain the long-range interactions during P3T-kick. Using the 3D accelerations of one measure point in equation (25), we can get three independent linear equations of the tensor elements. At least three measure points are needed to obtain the unique values of tensors up to the second order. The third-order case requires seven points.

The acceleration of the centre-of-the-mass can be used to directly measure the zero-order acceleration, |$\boldsymbol {A}(\boldsymbol {r}_{\mathrm{cm}})$|⁠. We collect other components of the tensors in 1D vectors for the second- (2nd) and third- (3rd) order methods:
$$\begin{eqnarray*} \begin{array}{lccccccc}\boldsymbol{T}_{i} (2{\mathrm{nd}}) = \big[ &\,\,\,\, A_{xx} &\,\,\,\, A_{xy} &\,\,\,\, A_{xz} &\,\,\,\, A_{yy} &\,\,\,\, A_{yz} &\,\,\,\, A_{zz} &\,\,\,\, \big]\\ \boldsymbol{T}_{i} (3{\mathrm{rd}}) = \big[ &\,\,\,\, A_{xx} &\,\,\,\, A_{xy} &\,\,\,\, A_{xz} &\,\,\,\, A_{yy} &\,\,\,\, A_{yz} &\,\,\,\, A_{zz} &\,\,\,\,\\ &\,\,\,\, A_{xxx} &\,\,\,\, A_{xxy} &\,\,\,\, A_{xxz} &\,\,\,\, A_{xyy} &\,\,\,\, A_{xyz} &\,\,\,\, A_{xzz}&\,\,\,\,\\ &\,\,\,\, A_{yyy} &\,\,\,\, A_{yyz} &\,\,\,\, A_{yzz} &\,\,\,\, A_{zzz} &\,\,\,\, &\,\,\,\, &\,\,\,\, \big]. \end{array} \end{eqnarray*}$$
(26)
The accelerations of measure points excluding |$\boldsymbol {A}(\boldsymbol {r}_{\mathrm{cm}})$| can be also collected as a 1D vector:
$$\begin{eqnarray*} \begin{array}{lccc}\boldsymbol {A^{\prime }}_{\mathrm{j}} = &\quad \big[ A^{\prime }_{\mathrm{x}}(\boldsymbol {r}_{\mathrm{1}}) &\quad A^{\prime }_{\mathrm{y}}(\boldsymbol {r}_{\mathrm{1}}) &\quad A^{\prime }_{\mathrm{z}}(\boldsymbol {r}_{\mathrm{1}})\\ &\quad A^{\prime }_{\mathrm{x}}(\boldsymbol {r}_{\mathrm{2}}) &\quad A^{\prime }_{\mathrm{y}}(\boldsymbol {r}_{\mathrm{2}}) &\quad A^{\prime }_{\mathrm{z}}(\boldsymbol {r}_{\mathrm{2}})\\ &\quad ... &\quad &\quad \big], \end{array} \end{eqnarray*}$$
(27)
where |$\boldsymbol {A}^{\prime }(\boldsymbol {r})=\boldsymbol {A}(\boldsymbol {r})-\boldsymbol {A}(\boldsymbol {r}_{\mathrm{cm}})$| and the suffixes, 1,2,3,..., are the indices of points.
Based on equation (25), |$\boldsymbol {T_{\mathrm{i}}}$| and |$\boldsymbol {A}_{\mathrm{j}}$| can be described by a linear mapping:
$$\begin{eqnarray*} \boldsymbol {M}_{ij} \boldsymbol {T}_{\mathrm{i}} = \boldsymbol {A}_{\mathrm{j}} \end{eqnarray*}$$
(28)
Once the generalized inverse matrix, |$\boldsymbol {M}_{ij}^{-1}$|⁠, is obtained, |$\boldsymbol {T}_{\mathrm{i}}$| can be easily calculated once |$\boldsymbol {A}_{\mathrm{j}}$| are measured.

In principle 2 points excluding the centre-of-the-mass are enough for the second-order method. However, we can only obtain the 2D information in a plane. Thus we use 4 points locating at the corners of a regular tetrahedron (see the upper panel in Fig. 4). In the third-order case, the corners of a regular octahedron can provide 6 points. However, in such case, we find the rank of |$\boldsymbol {M}_{ij}$| is not full, so that |$\boldsymbol {M}_{ij}^{-1}$| cannot be constructed. Thus, we use 8 points locating at the corners of a cube instead (the lower panel in Fig. 4). Although two additional points are needed for both two methods, we obtain the benefit that the condition numbers of the matrices (the maximum singular value versus the minimum) are small: 2 and 12.7 for the second- and the third-order methods, respectively. This means that the relative error of measurement inherited from |$\boldsymbol {A}_{\mathrm{j}}$| can be maximally enlarged by a factor of 2 or 12.7 in |$\boldsymbol {M}_{ij}^{-1}$|⁠. The exact values of the elements in |$\boldsymbol {M}_{ij}^{-1}$| can also be obtained easily. Table 1 provides the complete formulae to evaluate |$\boldsymbol {T}_{\mathrm{i}}$| and to calculate the acceleration at any |$\boldsymbol {r}$|⁠. The corresponding coordinates of the measure points are also provided.

The illustration of the spatial distribution of the artificial particles in the tidal-tensor, orbit-sampling, and pseudo-particle multipole methods. Open pentagons at the corners and the centre are tidal-tensor measure points. Filled stars along the orbits of two binary components are pseudo-particles or orbit-sampling particles used for evaluating the orbit-averaged long-range counterforce to distant particles. Notice here two combinations are shown, it is possible to combine different orders of tidal tensors with either pseudo-particle multipole or orbit-sampling.
Figure 4.

The illustration of the spatial distribution of the artificial particles in the tidal-tensor, orbit-sampling, and pseudo-particle multipole methods. Open pentagons at the corners and the centre are tidal-tensor measure points. Filled stars along the orbits of two binary components are pseudo-particles or orbit-sampling particles used for evaluating the orbit-averaged long-range counterforce to distant particles. Notice here two combinations are shown, it is possible to combine different orders of tidal tensors with either pseudo-particle multipole or orbit-sampling.

Table 1.

The second- (upper block) and third-order (lower block) tidal tensor methods. Each block contains three parts: (1) the formulae to calculate the tensor coefficients, |$\boldsymbol {A}_{ij}$| and |$\boldsymbol {A}_{ijk}$|⁠, where the centre-of-the-mass acceleration is subtracted in |$\boldsymbol {A}^{\prime }(\boldsymbol {r}_{\mathrm{i}})$|⁠; (2) the formulae to calculate the acceleration |$\boldsymbol {A}(\boldsymbol {r})$| for a given position, where |$\boldsymbol {r}^{\prime }$| (x, y, z) is the coordinate referring to the centre-of-the-mass (⁠|$\boldsymbol {r}-\boldsymbol {r}_{\mathrm{cm}}$|⁠); (3) the coordinates of the measure points, where dc is the half-length of the edge of the tetrahedron or the cube.

Second-order tidal tensor with four measure points at corners of a regular tetrahedron
Axx=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_2)\big]$|
Axy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$|
Axz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$|
Ayy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_4)\big]$|
Ayz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$|
Azz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$|
|$A_{x}(\boldsymbol {r})$|=|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xx}x^{\prime } + A_{xy}y^{\prime } + A_{xz}z^{\prime }$|
|$A_{y}(\boldsymbol {r})$|=|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xy}x^{\prime } + A_{yy}y^{\prime } + A_{yz}z^{\prime }$|
|$A_{z}(\boldsymbol {r})$|=|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xz}x^{\prime } + A_{yz}y^{\prime } + A_{zz}z^{\prime }$|
|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$|=|$\left [ 1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ -1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad 1 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad -1 \quad - \frac{\sqrt{2}}{2} \right ]$|
Third-order tidal tensor with eight measure points at corners of a cube
Axx=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)\big]$|
Axy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Axz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_6)$|
|$+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Ayy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)\big]$|
Ayz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_6)$|
|$+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Azz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Axxx=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$|
Axxy=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Axxz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$|
|$+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Axyy=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Axyz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_2)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Axzz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$|
Ayyy=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Ayyz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$|
|$- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Ayzz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Azzz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_5)$|
|$+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
|$A_{x}(\boldsymbol {r})$|=|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxx}x^{\prime 2} + 2A_{xxy}x^{\prime }y^{\prime } + 2A_{xxz}x^{\prime }z^{\prime } + A_{xx}x^{\prime } + A_{xyy}y^{\prime 2} + 2A_{xyz}y^{\prime }z^{\prime } + A_{xy}y^{\prime } + A_{xzz}z^{\prime 2} + A_{xz}z^{\prime }$|
|$A_{y}(\boldsymbol {r})$|=|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxy}x^{\prime 2} + 2A_{xyy}x^{\prime }y^{\prime } + 2A_{xyz}x^{\prime }z^{\prime } + A_{xy}x^{\prime } + A_{yyy}y^{\prime 2} + 2A_{yyz}y^{\prime }z^{\prime } + A_{yy}y^{\prime } + A_{yzz}z^{\prime 2} + A_{yz}z^{\prime }$|
|$A_{z}(\boldsymbol {r})$|=|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxz}x^{\prime 2} + 2A_{xyz}x^{\prime }y^{\prime } + 2A_{xzz}x^{\prime }z^{\prime } + A_{xz}x^{\prime } + A_{yyz}y^{\prime 2} + 2A_{yzz}y^{\prime }z^{\prime } + A_{yz}y^{\prime } + A_{zzz}z^{\prime 2} + A_{zz}z^{\prime }$|
|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$|=|$\left [\,1\quad 0 \quad -1\right ] \quad \left [\,0\quad 1 \quad -1\right ]\quad \left [\,-1\quad 0 \quad -1\right ]\quad \left [\,0\quad -1 \quad -1\right ]\quad \left [\,1\quad 0 \quad 1\right ]\quad \left [\,0\quad 1 \quad 1\right ]\quad \left [\,-1\quad 0 \quad 1\right ]\quad \left [\,0\quad -1 \quad 1\right ]$|
Second-order tidal tensor with four measure points at corners of a regular tetrahedron
Axx=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_2)\big]$|
Axy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$|
Axz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$|
Ayy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_4)\big]$|
Ayz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$|
Azz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$|
|$A_{x}(\boldsymbol {r})$|=|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xx}x^{\prime } + A_{xy}y^{\prime } + A_{xz}z^{\prime }$|
|$A_{y}(\boldsymbol {r})$|=|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xy}x^{\prime } + A_{yy}y^{\prime } + A_{yz}z^{\prime }$|
|$A_{z}(\boldsymbol {r})$|=|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xz}x^{\prime } + A_{yz}y^{\prime } + A_{zz}z^{\prime }$|
|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$|=|$\left [ 1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ -1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad 1 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad -1 \quad - \frac{\sqrt{2}}{2} \right ]$|
Third-order tidal tensor with eight measure points at corners of a cube
Axx=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)\big]$|
Axy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Axz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_6)$|
|$+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Ayy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)\big]$|
Ayz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_6)$|
|$+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Azz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Axxx=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$|
Axxy=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Axxz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$|
|$+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Axyy=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Axyz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_2)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Axzz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$|
Ayyy=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Ayyz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$|
|$- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Ayzz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Azzz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_5)$|
|$+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
|$A_{x}(\boldsymbol {r})$|=|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxx}x^{\prime 2} + 2A_{xxy}x^{\prime }y^{\prime } + 2A_{xxz}x^{\prime }z^{\prime } + A_{xx}x^{\prime } + A_{xyy}y^{\prime 2} + 2A_{xyz}y^{\prime }z^{\prime } + A_{xy}y^{\prime } + A_{xzz}z^{\prime 2} + A_{xz}z^{\prime }$|
|$A_{y}(\boldsymbol {r})$|=|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxy}x^{\prime 2} + 2A_{xyy}x^{\prime }y^{\prime } + 2A_{xyz}x^{\prime }z^{\prime } + A_{xy}x^{\prime } + A_{yyy}y^{\prime 2} + 2A_{yyz}y^{\prime }z^{\prime } + A_{yy}y^{\prime } + A_{yzz}z^{\prime 2} + A_{yz}z^{\prime }$|
|$A_{z}(\boldsymbol {r})$|=|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxz}x^{\prime 2} + 2A_{xyz}x^{\prime }y^{\prime } + 2A_{xzz}x^{\prime }z^{\prime } + A_{xz}x^{\prime } + A_{yyz}y^{\prime 2} + 2A_{yzz}y^{\prime }z^{\prime } + A_{yz}y^{\prime } + A_{zzz}z^{\prime 2} + A_{zz}z^{\prime }$|
|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$|=|$\left [\,1\quad 0 \quad -1\right ] \quad \left [\,0\quad 1 \quad -1\right ]\quad \left [\,-1\quad 0 \quad -1\right ]\quad \left [\,0\quad -1 \quad -1\right ]\quad \left [\,1\quad 0 \quad 1\right ]\quad \left [\,0\quad 1 \quad 1\right ]\quad \left [\,-1\quad 0 \quad 1\right ]\quad \left [\,0\quad -1 \quad 1\right ]$|
Table 1.

The second- (upper block) and third-order (lower block) tidal tensor methods. Each block contains three parts: (1) the formulae to calculate the tensor coefficients, |$\boldsymbol {A}_{ij}$| and |$\boldsymbol {A}_{ijk}$|⁠, where the centre-of-the-mass acceleration is subtracted in |$\boldsymbol {A}^{\prime }(\boldsymbol {r}_{\mathrm{i}})$|⁠; (2) the formulae to calculate the acceleration |$\boldsymbol {A}(\boldsymbol {r})$| for a given position, where |$\boldsymbol {r}^{\prime }$| (x, y, z) is the coordinate referring to the centre-of-the-mass (⁠|$\boldsymbol {r}-\boldsymbol {r}_{\mathrm{cm}}$|⁠); (3) the coordinates of the measure points, where dc is the half-length of the edge of the tetrahedron or the cube.

Second-order tidal tensor with four measure points at corners of a regular tetrahedron
Axx=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_2)\big]$|
Axy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$|
Axz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$|
Ayy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_4)\big]$|
Ayz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$|
Azz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$|
|$A_{x}(\boldsymbol {r})$|=|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xx}x^{\prime } + A_{xy}y^{\prime } + A_{xz}z^{\prime }$|
|$A_{y}(\boldsymbol {r})$|=|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xy}x^{\prime } + A_{yy}y^{\prime } + A_{yz}z^{\prime }$|
|$A_{z}(\boldsymbol {r})$|=|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xz}x^{\prime } + A_{yz}y^{\prime } + A_{zz}z^{\prime }$|
|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$|=|$\left [ 1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ -1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad 1 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad -1 \quad - \frac{\sqrt{2}}{2} \right ]$|
Third-order tidal tensor with eight measure points at corners of a cube
Axx=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)\big]$|
Axy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Axz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_6)$|
|$+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Ayy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)\big]$|
Ayz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_6)$|
|$+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Azz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Axxx=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$|
Axxy=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Axxz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$|
|$+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Axyy=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Axyz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_2)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Axzz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$|
Ayyy=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Ayyz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$|
|$- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Ayzz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Azzz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_5)$|
|$+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
|$A_{x}(\boldsymbol {r})$|=|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxx}x^{\prime 2} + 2A_{xxy}x^{\prime }y^{\prime } + 2A_{xxz}x^{\prime }z^{\prime } + A_{xx}x^{\prime } + A_{xyy}y^{\prime 2} + 2A_{xyz}y^{\prime }z^{\prime } + A_{xy}y^{\prime } + A_{xzz}z^{\prime 2} + A_{xz}z^{\prime }$|
|$A_{y}(\boldsymbol {r})$|=|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxy}x^{\prime 2} + 2A_{xyy}x^{\prime }y^{\prime } + 2A_{xyz}x^{\prime }z^{\prime } + A_{xy}x^{\prime } + A_{yyy}y^{\prime 2} + 2A_{yyz}y^{\prime }z^{\prime } + A_{yy}y^{\prime } + A_{yzz}z^{\prime 2} + A_{yz}z^{\prime }$|
|$A_{z}(\boldsymbol {r})$|=|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxz}x^{\prime 2} + 2A_{xyz}x^{\prime }y^{\prime } + 2A_{xzz}x^{\prime }z^{\prime } + A_{xz}x^{\prime } + A_{yyz}y^{\prime 2} + 2A_{yzz}y^{\prime }z^{\prime } + A_{yz}y^{\prime } + A_{zzz}z^{\prime 2} + A_{zz}z^{\prime }$|
|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$|=|$\left [\,1\quad 0 \quad -1\right ] \quad \left [\,0\quad 1 \quad -1\right ]\quad \left [\,-1\quad 0 \quad -1\right ]\quad \left [\,0\quad -1 \quad -1\right ]\quad \left [\,1\quad 0 \quad 1\right ]\quad \left [\,0\quad 1 \quad 1\right ]\quad \left [\,-1\quad 0 \quad 1\right ]\quad \left [\,0\quad -1 \quad 1\right ]$|
Second-order tidal tensor with four measure points at corners of a regular tetrahedron
Axx=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{2} A^{\prime }_{x}(\boldsymbol {r}_2)\big]$|
Axy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$|
Axz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{x}(\boldsymbol {r}_4)\big]$|
Ayy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{2} A^{\prime }_{y}(\boldsymbol {r}_4)\big]$|
Ayz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{8} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$|
Azz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{\sqrt{2}}{4} A^{\prime }_{z}(\boldsymbol {r}_4)\big]$|
|$A_{x}(\boldsymbol {r})$|=|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xx}x^{\prime } + A_{xy}y^{\prime } + A_{xz}z^{\prime }$|
|$A_{y}(\boldsymbol {r})$|=|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xy}x^{\prime } + A_{yy}y^{\prime } + A_{yz}z^{\prime }$|
|$A_{z}(\boldsymbol {r})$|=|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xz}x^{\prime } + A_{yz}y^{\prime } + A_{zz}z^{\prime }$|
|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$|=|$\left [ 1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ -1\quad 0 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad 1 \quad - \frac{\sqrt{2}}{2} \right ]\quad\left [ 0\quad -1 \quad - \frac{\sqrt{2}}{2} \right ]$|
Third-order tidal tensor with eight measure points at corners of a cube
Axx=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)\big]$|
Axy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{8} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Axz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_6)$|
|$+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Ayy=|$\frac{1}{d_{\mathrm{c}}} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)\big]$|
Ayz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_6)$|
|$+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{12} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{12} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Azz=|$\frac{1}{d_{\mathrm{c}}} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Axxx=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$|
Axxy=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_3)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Axxz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_2)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$|
|$+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{9}{80} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{1}{80} A^{\prime }_{y}(\boldsymbol {r}_8)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Axyy=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_2)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{4} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Axyz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_1)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_2)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_3)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_4)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_5)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_6)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_7)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_8)\big]$|
Axzz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)\big]$|
Ayyy=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_4)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{4} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Ayyz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_1)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_1)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_3)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_3)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_5)$|
|$- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_5)+ \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{80} A^{\prime }_{x}(\boldsymbol {r}_7)- \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_7)- \frac{9}{80} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{40} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Ayzz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ - \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)- \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
Azzz=|$\frac{1}{d_{\mathrm{c}}^2} \big[ + \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_1)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_1)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_2)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_2)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_3)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_3)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_4)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_4)- \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_5)$|
|$+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_5)- \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_6)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_6)+ \frac{1}{16} A^{\prime }_{x}(\boldsymbol {r}_7)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_7)+ \frac{1}{16} A^{\prime }_{y}(\boldsymbol {r}_8)+ \frac{1}{8} A^{\prime }_{z}(\boldsymbol {r}_8)\big]$|
|$A_{x}(\boldsymbol {r})$|=|$A_{x}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxx}x^{\prime 2} + 2A_{xxy}x^{\prime }y^{\prime } + 2A_{xxz}x^{\prime }z^{\prime } + A_{xx}x^{\prime } + A_{xyy}y^{\prime 2} + 2A_{xyz}y^{\prime }z^{\prime } + A_{xy}y^{\prime } + A_{xzz}z^{\prime 2} + A_{xz}z^{\prime }$|
|$A_{y}(\boldsymbol {r})$|=|$A_{y}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxy}x^{\prime 2} + 2A_{xyy}x^{\prime }y^{\prime } + 2A_{xyz}x^{\prime }z^{\prime } + A_{xy}x^{\prime } + A_{yyy}y^{\prime 2} + 2A_{yyz}y^{\prime }z^{\prime } + A_{yy}y^{\prime } + A_{yzz}z^{\prime 2} + A_{yz}z^{\prime }$|
|$A_{z}(\boldsymbol {r})$|=|$A_{z}(\boldsymbol {r}_{\mathrm{cm}}) + A_{xxz}x^{\prime 2} + 2A_{xyz}x^{\prime }y^{\prime } + 2A_{xzz}x^{\prime }z^{\prime } + A_{xz}x^{\prime } + A_{yyz}y^{\prime 2} + 2A_{yzz}y^{\prime }z^{\prime } + A_{yz}y^{\prime } + A_{zzz}z^{\prime 2} + A_{zz}z^{\prime }$|
|$\boldsymbol {r}^{\prime }_{\mathrm{i}}/d_{\mathrm{c}}$|=|$\left [\,1\quad 0 \quad -1\right ] \quad \left [\,0\quad 1 \quad -1\right ]\quad \left [\,-1\quad 0 \quad -1\right ]\quad \left [\,0\quad -1 \quad -1\right ]\quad \left [\,1\quad 0 \quad 1\right ]\quad \left [\,0\quad 1 \quad 1\right ]\quad \left [\,-1\quad 0 \quad 1\right ]\quad \left [\,0\quad -1 \quad 1\right ]$|

Since all measure points can only obtain the long-range forces every P3T-kick step, if the binary forms in the middle of P3T-drift, it is not possible to construct the tidal-tensor field immediately. Besides, if the binary disrupt, the tidal-tensor filed also cannot provide the correct perturbation once the two components leave far away. This is the limitation of the tidal-tensor method. However, the purpose of the tidal-tensor method is to ensure the long-term cumulative effect of long-range perturbation is correctly treated. Thus, a short interval error within one ΔtL is not very serious.

6.4.2 Counterforce

6.4.2.1 Orbit-sampling method
The tidal-tensor method introduced above provide the correct perturbation to the internal motion of the binary, it is also necessary to ensure that the perturbers can obtain the consistent counterforce. If the perturbers obtain the forces from the two components of the binary at P3T-kick, only one random phase of the binary is used to evaluate the interaction. This cannot provide the correct orbit-averaged force from binaries. To solve this issue, another group of artificial particles can be created by sampling the binary orbit with an equal eccentricity anomaly interval (Fig. 4). For example, if the eccentric anomaly interval, |$\Delta \mathcal {E} = \pi /4$|⁠, 16 particles are created along the two orbits of binary components. The masses of these particles are weighted by the interval of mean anomaly:
$$\begin{eqnarray*} m_{\mathrm{orb},i,k} &= & m_{i} \frac{\Delta \mathcal {M}_{k}}{2\pi }\nonumber\\ \Delta \mathcal {M}_{k} &= & \Delta \mathcal {E} - e \left[ \sin \left(\mathcal {E}_{k} + \frac{\Delta \mathcal {E}}{2} \right) - \sin \left(\mathcal {E}_{k} - \frac{\Delta \mathcal {E}}{2} \right) \right] \nonumber\\ &=& \Delta \mathcal {E} - 2 e \cos \left(\mathcal {E}_{k} \right) \sin \left(\frac{\Delta \mathcal {E}}{2} \right). \end{eqnarray*}$$
(29)
where |$\mathcal {M}_{k}$| and |$\mathcal {E}_{k}$| are the mean anomaly and the eccentric anomaly at the point of particle k, respectively. In this case, the particle mass approximately represents the orbit-average duration of the two-components at each |$\Delta \mathcal {E}$|⁠.
Pseudo-particle multipole method:
In the orbit-sampling method, at least eight sample particles are needed to reasonably represent the orbits of binaries. It is rather expensive since the number of sample particles per binary is large. Kawai & Makino (2001) introduce the pseudo-particle multipole method that the quadrupole moment of a particle group can be represented by only three pseudo-particles. The quadrupole tensor of N particles can be described by
$$\begin{eqnarray*} \mathcal {A} = \sum _{i=1}^{N} {m_i \boldsymbol {r}_i \otimes \boldsymbol {r}_i}, \end{eqnarray*}$$
(30)
where ⊗ is tensor production. The corresponding traceless form is
$$\begin{eqnarray*} \mathcal {A^{\prime }} = \frac{3}{2} \mathcal {A} - \frac{1}{2} T_{\mathrm{r}}(\mathcal {A}) = \frac{3}{2} \sum _{i=1}^{N} {m_i \boldsymbol {r}_i \otimes \boldsymbol {r}_i} - \frac{1}{2} \sum _{i=1}^{N} {m_i \boldsymbol {r}_i^2}, \end{eqnarray*}$$
(31)
where the term with |$\boldsymbol {r}_i^2$| is subtracted from the the diagonal elements of the matrix by |$\boldsymbol {r}_i \otimes \boldsymbol {r}_i$|⁠.
The orbit average of the binary motion can be treated as a continue distribution of mass along the orbits of two components. Thus, we can also derive the analytic formulae of its quadrupole moment. In the coordinate systems of the binary orbital plane where the three Delaunay’s elements are zero, the relative position vector has the form depending on |$\mathcal {E}$| as
$$\begin{eqnarray*} \Delta \boldsymbol {r} = \left[{\begin{array}{*{10}c}a \left(\cos {\mathcal {E}} - e \right) & a \sqrt{1 - e^{2}} \sin {\mathcal {E}} & 0\end{array}}\right]. \end{eqnarray*}$$
(32)
The two component position vectors have a relation to Δr by
$$\begin{eqnarray*} \boldsymbol {r}_1 &=& -\frac{m_2}{m_1+m_2} \Delta \boldsymbol {r}\nonumber\\ \boldsymbol {r}_2 &=& \frac{m_1}{m_1+m_2} \Delta \boldsymbol {r}, \end{eqnarray*}$$
(33)
where m1 and m2 are masses of two components. Put equations (32) and (33) into equation (30), we can obtain |$\mathcal {A}$| of the binary as a function of |$\mathcal {E}$|⁠. When the two components pass one full orbit, |$\mathcal {E}$| changes from 0 to 2π and t changes from 0 to P (period). The orbital average of the quadrupole moment should integrate one period of |$\mathcal {A}(\mathcal {E})$|⁠:
$$\begin{eqnarray*} \langle \mathcal {A} \rangle = \frac{1}{P} \int _0^{P} \mathcal {A}(\mathcal {E}) \mathrm{ d}t. \end{eqnarray*}$$
(34)
The differentials of t and |$\mathcal {E}$| has the relation:
$$\begin{eqnarray*} \mathrm{d}t = \frac{P}{2 \pi }(1 - e \cos {\mathcal {E}}) \mathrm{d}\mathcal {E}. \end{eqnarray*}$$
(35)
Replace dt by |$\mathrm{d}\mathcal {E}$| and do the integration, we can obtain the final form:
$$\begin{eqnarray*} \langle \mathcal {A} \rangle = \mu a^2 \left[{\begin{array}{ccc}2 e^{2} + \frac{1}{2} & \quad 0 & \quad 0\\ 0 & \quad \frac{1}{2} - \frac{e^{2}}{2} & \quad 0\\ 0 & \quad 0 & \quad 0\end{array}}\right], \end{eqnarray*}$$
(36)
where μ is reduced mass, m1m2/(m1 + m2).
We choose the coordinate system where |$\langle \mathcal {A} \rangle$| becomes traceless in order to use the pseudo-particle multipole method:
$$\begin{eqnarray*} \langle \mathcal {A^{\prime }} \rangle &= &\frac{3}{2}\langle \mathcal {A} \rangle - \frac{1}{2} T_{\mathrm{r}}(\langle \mathcal {A} \rangle) \nonumber\\ &= & \frac{1}{4} \mu a^2 \left[{\begin{array}{*{10}c}9 e^{2} + 1 & \quad 0 & \quad 0\\ 0 & \quad 1 - 6 e^{2} & \quad 0\\ 0 & \quad 0 & \quad - 3 e^{2} - 2\end{array}}\right] \end{eqnarray*}$$
(37)
Using equations (3), (5), and (6) in Kawai & Makino (2001), the three pseudo-particles with the equal mass of (m1 + m2)/3 are distributed at:
$$\begin{eqnarray*} \boldsymbol {r}_{\mathrm{p,1}} &= & a\sqrt{\frac{\mu }{m_{1} + m_{2}}} \left[{\begin{array}{*{10}c}0 & \sqrt{1-e^{2}} & 0\end{array}}\right]\nonumber\\ \boldsymbol {r}_{\mathrm{p,2}} &= & a\sqrt{\frac{\mu }{m_{1} + m_{2}}} \left[{\begin{array}{*{10}c}\sqrt{3 e^{2} + \frac{3}{4}} & - \sqrt{\frac{1-e^{2}}{4}} & 0\end{array}}\right]\nonumber\\ \boldsymbol {r}_{\mathrm{p,3}} &= & a\sqrt{\frac{\mu }{m_{1} + m_{2}}} \left[{\begin{array}{*{10}c}-\sqrt{3 e^{2} + \frac{3}{4}} & - \sqrt{\frac{1-e^{2}}{4}} & 0\end{array}}\right]. \end{eqnarray*}$$
(38)
These positions refer to the rest frame of the binary orbital plane. To obtain the correct direction of the orbital plane in the original frame, we need to multiply equation (38) by the rotational matrix based on the Delaunay’s elements.

6.4.3 Test

To confirm that the artificial particle algorithm can provide the correct perturbation and counterforce, we test a triple system with the initial condition listed in Table 2. We use three methods to integrate the motion of the system. Orbits integrated by the accurate SDAR method is used as a reference (names as the SDAR-REF model). Models using the P3T method with no artificial particles (no-TT), second- (TT-2), and third-order (TT-3) tidal-tensor methods are compared. The perturber is outside the changeover region of the binary. Thus the outer orbit is integrated by the leapfrog method. The inner binary is integrated by the SDAR method. ΔtL = 0.00390625 for the P3T method. The ratio between the binary period (Tbin, in) and ΔtL is about 0.51. Thus in the no-TT model, the long-range force is evaluated once every two Tbin, in. We also add a model of a binary with the initial condition the same as that of the outer binary in Table 2 (named as the B-out model). The leapfrog integrator with the same step size is used. The pseudo-particle multipole method is used for both TT-2 and TT-3 methods.

Table 2.

The initial condition of the hierarchical triple system for testing the artificial particle algorithm. mp and ms are the masses of the primary and the secondary of the inner and outer binaries. The values are shown in the scale-free unit with the gravitational constant, G = 1. a is semimajor axis. e is eccentricity. |$\mathcal {I}$|⁠, ϕ, and ψ are Delaunay’s elements. |$\mathcal {E}$| is eccentric anomaly. Tbin is period.

mpmsae|$\mathcal {I}$|ϕψ|$\mathcal {E}$|Tbin
In0.009000.001000.9000.9001.5000.1000.2003.141.97 × 10−3
Out1.000.011.5000.01000.1000.1000.1001.5011.5
mpmsae|$\mathcal {I}$|ϕψ|$\mathcal {E}$|Tbin
In0.009000.001000.9000.9001.5000.1000.2003.141.97 × 10−3
Out1.000.011.5000.01000.1000.1000.1001.5011.5
Table 2.

The initial condition of the hierarchical triple system for testing the artificial particle algorithm. mp and ms are the masses of the primary and the secondary of the inner and outer binaries. The values are shown in the scale-free unit with the gravitational constant, G = 1. a is semimajor axis. e is eccentricity. |$\mathcal {I}$|⁠, ϕ, and ψ are Delaunay’s elements. |$\mathcal {E}$| is eccentric anomaly. Tbin is period.

mpmsae|$\mathcal {I}$|ϕψ|$\mathcal {E}$|Tbin
In0.009000.001000.9000.9001.5000.1000.2003.141.97 × 10−3
Out1.000.011.5000.01000.1000.1000.1001.5011.5
mpmsae|$\mathcal {I}$|ϕψ|$\mathcal {E}$|Tbin
In0.009000.001000.9000.9001.5000.1000.2003.141.97 × 10−3
Out1.000.011.5000.01000.1000.1000.1001.5011.5

The evolution of orbital elements are show in Fig. 5. By selecting the Cartesian coordinate system (xy–z), the three Delaunay’s elements (angles) are (e.g. Wang et al. 2020):

  • |$\mathcal {I}$|⁠: inclination.

  • ϕ: longitude of the ascending node.

  • ψ: argument of periapsis.

Except ψin, both TT-2 and TT-3 models agree well with the SDAR-REF model on the evolution of inner orbital elements while the no-TT model does not. This suggests that the tidal-tensor method indeed provides a better result for the secular motions of the inner binary.

The evolution of orbital parameters of the inner and outer binaries for the triple system using different integration methods. For a and time, the scale-free units are used. The black colour represents the accurate result using the SDAR method as a reference. The green and blue colours represent the TT and no-TT models, respectively. The purple colour represents the B-out model. For each panel, we apply the scientific notation in the plotting style of y-axis: the actual values of y-axis are calculated by ytick × scale + yoffset, where ytick is the value shown along the the y-axis, scale is the first value shown above the y-axis (scale = 1 in default) and yoffset is the second value following the symbol ‘+’ (yoffset = 0 in default).
Figure 5.

The evolution of orbital parameters of the inner and outer binaries for the triple system using different integration methods. For a and time, the scale-free units are used. The black colour represents the accurate result using the SDAR method as a reference. The green and blue colours represent the TT and no-TT models, respectively. The purple colour represents the B-out model. For each panel, we apply the scientific notation in the plotting style of y-axis: the actual values of y-axis are calculated by ytick × scale + yoffset, where ytick is the value shown along the the y-axis, scale is the first value shown above the y-axis (scale = 1 in default) and yoffset is the second value following the symbol ‘+’ (yoffset = 0 in default).

On the other hand, the TT-3 model also provides a correct evolution of aout, ϕout, and iout. But TT-2 model cannot reproduce the correct oscillation of the outer orbit and the evolution overlaps with the B-out model. This suggests that the second-order tidal-tensor method cannot properly reproduce the secular motion of the outer orbit. The no-TT model disagree with all others.

However, all three models show large differences (oscillation) of eout and ψout compared with those of the SDAR-REF model, but agree well with those of the B-out models. The B-out model is a simple binary motion, thus eout and ψout should not evolve in reality. This indicates that the artificial oscillation is caused by the inaccuracy of the leapfrog method for the outer orbit.

This result suggests that the third-order tidal-tensor algorithm is a good choice to represent a reasonable secular motions of both inner and outer binaries. The low accuracy of the leapfrog method results in a relative error of 10−6 in the evolution of eout but the averaged value can converge to the correct one.

How the counterforces are calculated does not affect the orbital motions as shown in Fig. 5. The results are identical within the resolution thus we does not show. But the linear momentum conservation is sensitive to it. In Fig. 6, the evolution of the x-component in the centre-of-the-mass velocity of the triple is shown. Four models with different ways to calculate the counter forces are compared. As the number of sample particles decrease, the error (oscillation) is more obvious. The PM method provide a similar level of error as eight sample particles. Thus, it is a more efficient choice for the counter force if the high-order momentum is not important.

The evolution of the x-component of the centre-of-the-mass velocity of the triple system. The OS-cm model uses only the centre-of-the-mass of the inner binary to calculate the force to the perturber. The OS-4 and OS-8 models use orbit-sampling methods with 4 and 8 sample particles. The PM model uses three pseudo-particles.
Figure 6.

The evolution of the x-component of the centre-of-the-mass velocity of the triple system. The OS-cm model uses only the centre-of-the-mass of the inner binary to calculate the force to the perturber. The OS-4 and OS-8 models use orbit-sampling methods with 4 and 8 sample particles. The PM model uses three pseudo-particles.

6.5 Parallelization algorithm

Fig. 2 shows how the hybrid parallelization is implemented. The domain decomposition, exchanging particles between mpi processes, PT construction, and long-range force calculations are handled by fdps. The mpi and openmp are used together in fdps with a well-controlled load balance. The clustering (Section 6.3) is also parallelized by using mpi and openmp methods (Iwasawa et al. 2017). The long-range force calculation can be accelerated by both the SIMD instruction set of X86 architecture and the NVIDIA GPU using the CUDA programming environment (Iwasawa et al. 2020). The AVX, AVX2, and AVX-512 instructions are used for the SIMD accelerated implementation.

7 BENCHMARK

We carry out benchmarks to compare the long-term evolution of star clusters with and without binaries, and show the computing performances by using the petar and nbody6++gpu codes. A scaling test is also performed on the Cray XC50 supercomputer.

7.1 Comparison with nbody6++gpu

7.1.1 Performance on GPU-based desktop

We compare the performances of simulating star clusters with different numbers of particles (N = 1000, 10 000, 100 000, and 1000 000) by using the petar and nbody6++gpu codes. The name of models are listed in Table 3. The initial mass function (IMF) of Kroupa (2001) ranging from 0.08 to 40 M is applied. The Plummer model is used to generate the positions and velocities. The system is in virial equilibrium and the virial radius is 1.0 in the Hénon (1971) unit (hereafter named as NB unit). No tidal field and no stellar evolution are used in order to have a well-controlled comparison of the dynamics. For each N, we have two models with and without primordial binaries. In models with binaries, we use the period and eccentrmaricity distributions from Kroupa (1995a, b). Initially all particles are in binaries. The period distribution of this model has the form:
$$\begin{eqnarray*} \mathcal {F}(T_{\mathrm{bin}}) = 2.5 \frac{\log _{10}{T_{\mathrm{bin}}} - 1}{45 + (\log _{10}{T_{\mathrm{bin}}} - 1)^2}, \end{eqnarray*}$$
(39)
where the maximum and minimum of log10Tbin are truncated at 8.43 and 1.00 (in the unit of days), respectively. The eccentricity follows the thermal distribution. The orbits of some tight binaries are adjusted to avoid stellar collision (pre-main-sequence eigenevolution; Kroupa 1995b). Thus, a wide range of Tbin and eccentricities are covered, which is very suitable for testing the code. We assume that the initial half-mass radii of all models are 1.0 pc and the periods are scaled to be in the NB time unit.
Table 3.

The optimized sets of input parameters that influence the performance of nbody6++gpu and petar codes. N is the total number of particles; Nb is the total number of binaries; RKS and ΔtKS are the criterion to switch on the KS regularization in nbody6++gpu; and ΔtL and rout, ref are the tree time-step and the outer boundary reference of the changeover function for petar. NB units are used for distance and time.

ModelN (103)Nb (103)RKS (10−3)ΔtKS (10−5)ΔtLrout, ref (10−2)
N1k1013.21/1283.22490
N10k1000.46411/2561.61101
N100k10000.2150.31/5120.801104
N1m100.10.11/10240.399470
N1kb10.513.21/2561.62651
N10kb1050.76411/5120.802076
N100kb100500.5150.31/10240.400471
N1mkb10005000.40.11/20480.199039
The shared parametersnbody6++gpu|$\eta _{\mathrm{r}} = \eta _{\mathrm{i}} = 0.1\sqrt{2}$|⁠; Nb, p = 50
petar η = 0.1; θ = 0.3, 0.5; |$\dfrac{r_{\mathrm{out,ref}}}{r_{\mathrm{in,ref}}} = 10.0$|⁠; third tidal-tensor; PM
ModelN (103)Nb (103)RKS (10−3)ΔtKS (10−5)ΔtLrout, ref (10−2)
N1k1013.21/1283.22490
N10k1000.46411/2561.61101
N100k10000.2150.31/5120.801104
N1m100.10.11/10240.399470
N1kb10.513.21/2561.62651
N10kb1050.76411/5120.802076
N100kb100500.5150.31/10240.400471
N1mkb10005000.40.11/20480.199039
The shared parametersnbody6++gpu|$\eta _{\mathrm{r}} = \eta _{\mathrm{i}} = 0.1\sqrt{2}$|⁠; Nb, p = 50
petar η = 0.1; θ = 0.3, 0.5; |$\dfrac{r_{\mathrm{out,ref}}}{r_{\mathrm{in,ref}}} = 10.0$|⁠; third tidal-tensor; PM
Table 3.

The optimized sets of input parameters that influence the performance of nbody6++gpu and petar codes. N is the total number of particles; Nb is the total number of binaries; RKS and ΔtKS are the criterion to switch on the KS regularization in nbody6++gpu; and ΔtL and rout, ref are the tree time-step and the outer boundary reference of the changeover function for petar. NB units are used for distance and time.

ModelN (103)Nb (103)RKS (10−3)ΔtKS (10−5)ΔtLrout, ref (10−2)
N1k1013.21/1283.22490
N10k1000.46411/2561.61101
N100k10000.2150.31/5120.801104
N1m100.10.11/10240.399470
N1kb10.513.21/2561.62651
N10kb1050.76411/5120.802076
N100kb100500.5150.31/10240.400471
N1mkb10005000.40.11/20480.199039
The shared parametersnbody6++gpu|$\eta _{\mathrm{r}} = \eta _{\mathrm{i}} = 0.1\sqrt{2}$|⁠; Nb, p = 50
petar η = 0.1; θ = 0.3, 0.5; |$\dfrac{r_{\mathrm{out,ref}}}{r_{\mathrm{in,ref}}} = 10.0$|⁠; third tidal-tensor; PM
ModelN (103)Nb (103)RKS (10−3)ΔtKS (10−5)ΔtLrout, ref (10−2)
N1k1013.21/1283.22490
N10k1000.46411/2561.61101
N100k10000.2150.31/5120.801104
N1m100.10.11/10240.399470
N1kb10.513.21/2561.62651
N10kb1050.76411/5120.802076
N100kb100500.5150.31/10240.400471
N1mkb10005000.40.11/20480.199039
The shared parametersnbody6++gpu|$\eta _{\mathrm{r}} = \eta _{\mathrm{i}} = 0.1\sqrt{2}$|⁠; Nb, p = 50
petar η = 0.1; θ = 0.3, 0.5; |$\dfrac{r_{\mathrm{out,ref}}}{r_{\mathrm{in,ref}}} = 10.0$|⁠; third tidal-tensor; PM

The simulations are performed on a GPU-based desktop computer. The computer is equipped with one AMD RYZEN 3970X CPU (3.7 GHz) which includes 32 physical cores, one NVIDIA RTX 2080Ti GPU and 4-channel DDR4-3200 SDRAM memories. Both the codes use the hybrid parallelization methods containing mpi,openmp, AVX2, and GPU (CUDA). For small numbers of particles, using all CPU cores causes the issue of load balance and overshooting of communication. Thus, the number of cores for each simulations are adjusted to obtain the best performance.

One important point is that the performance of the two codes are sensitive to different inputting parameters. We adjust the parameters for each model in order to optimize the performance. The important ones are listed in Table 3.

For the Hermite integrator of both two codes, the time-step is calculated by (Aarseth 2003; Oshino et al. 2011)
$$\begin{eqnarray*} \Delta t_{\mathrm{H},i}= \mathrm{min}\left(\eta \sqrt{\tfrac{\sqrt{\left|\boldsymbol {A}^{(0)}_i\right|^2 + A^2_{0}} \left|\boldsymbol {A}^{(2)}_i\right| + \left|\boldsymbol {A}^{(1)}_i\right|^2}{\left|\boldsymbol {A}^{(0)}_i\right| \left|\boldsymbol {A}^{(3)}_i\right| + \left|\boldsymbol {A}^{(2)}_i\right|^2} }, \Delta t_{\mathrm{H,max}}\right), \end{eqnarray*}$$
(40)
where |$\boldsymbol {A}^{(j)}_i$| is the acceleration of a particle (i) and its j-order time derivatives, A0 is the constant coefficient for safety, and ΔtH, max is input parameter. In nbody6++gpu, A0 is calculated only for particles having no neighbours by assuming an artificial particle locating at the centre of the system with the mass of 0.01〈m〉. In petar, |$A_{0} = 0.1 \langle m \rangle / r_{\mathrm{nb},i}^2$| where 〈m〉 is the local-averaged mass of particles. Since both codes use the block-time-step method, the calculated ΔtH, i is adjusted to the integer power of 0.5. In nbody6++gpu, ηi for neighbours and ηr for distance particles are set to |$0.1\sqrt{2}$|⁠.

Another three major parameters in nbody6++gpu that determine the performance are the expected number of neighbours, Nb, p, the separation of a neighbour pair, RKS, and the Hermite time-step to trigger on KS regularization, ΔtKS. In our models, Nb, p = 50. One important tip is that the KS criterion not only influences the performance, but also has a big impact on the accuracy of integrating the internal motions of binaries. A strict criterion can avoid a too frequent switch of KS but increase the error of integrating orbits (especially for eccentricities) of wide binaries. The choices of RKS and ΔtKS are shown in Table 3.

In petar, the Hermite time-step coefficient does not significantly influences the performance. We use the value of 0.1. The major impact to the performance comes from θ, ΔtL and the changeover function. We compare θ of 0.3 and 0.5 with quadrupole moment of the PT force for all models. The energy error of the P3T method depends on the combination of ΔtL and the changeover function (Iwasawa et al. 2015). When ΔtL is given, we determine the reference of the outer changeover boundary as:
$$\begin{eqnarray*} r_{\mathrm{out,ref}}= 10 \Delta t_{\mathrm{L}}\sigma _{\mathrm{1D}} \end{eqnarray*}$$
(41)
where σ1D is the 1D velocity dispersion of the system. We fix rout, ref/rin, ref to 10.0. The third-order tidal-tensor and pseudo-particle multipole methods are used.
By checking a group of ΔtL, we can find a balanced combination of the parameters to obtain the best performance. The computational cost of the long-range force and kick velocity per P3T-kick step is roughly constant. Thus, the wall clock time of one NB time unit (TW) is anticorrelated with ΔtL. In contrast, when ΔtL is reduced, the sizes of clusters for short-range interactions are smaller due to a shorter rout, ref. This gives a better load-balance and a less computational cost in the P3T-drift and the clustering. Therefore, one balance ΔtL can be found to achieve the best performance. In petar, we first estimate rout, ref by
$$\begin{eqnarray*} r_{\mathrm{out,ref}}= 0.1 \frac{G M}{3 N^{1/3} \sigma _{\mathrm{1D}}^2}, \end{eqnarray*}$$
(42)
where M is the total mass of the system and G is gravitational constant. Then using equation (41), we obtain the first guess of ΔtL and check the best value around it. One example of this check process for the N10k model is shown in Fig. 7. When ΔtL increases, TW(P3T-kick) and TW(clustering) decrease while TW(P3T-drift) increases. The balanced ΔtL = 1/256. Above this value, TW(P3T-drift) increases significantly because a very large cluster with 2000 members forms due to a set of large neighbour radii. This completely kills the load balance of paralellization and the benefit of the P3T method.
The wall clock times per NB time unit (TW) depending on ΔtL for the N10k model performed by using petar. Colours represent the different parts of the computing (total, P3T-kick, P3T-drift, and clustering). The minimum value of the total TW indicates the balanced choice of ΔtL for the best performance.
Figure 7.

The wall clock times per NB time unit (TW) depending on ΔtL for the N10k model performed by using petar. Colours represent the different parts of the computing (total, P3T-kick, P3T-drift, and clustering). The minimum value of the total TW indicates the balanced choice of ΔtL for the best performance.

The results of TW and the relative energy error (|dE/E|) per NB time unit are shown in Fig. 8. For each model, the maximum 10 NB time-steps are simulated to reduce fluctuation. In both codes, the models with binaries sometimes have a very large TW in a short time interval due to the events of few-body interaction and the existence of semistable (perturbed) few-body systems. In our analysis, we remove these data since they do not represent the normal performance of the codes. Besides, we remove the first one steps in some models performed by nbody6++gpu if their TW are very different from that of other steps. In the long-term simulations, the averaged TW may be larger if hierarchical systems frequently form. This is easier to happen when N is small, because the boundary of wide binaries (equation 1) is large thus more space is available to form hierarchical systems.

The comparison of wall clock time, TW (upper panel), and relative energy error, |dE/E| (lower panel), for models listed in Table 3. For the data of petar, ‘-T03’ and ‘-T05’ indicate the opening angles of 0.3 and 0.5, respectively. The values of TW in sec are also printed near the data points. TW takes the average value of a few steps to avoid fluctuation. |dE/E| of each step (maximum 10 steps) is shown. The left- and right-hand panels show the models without and with binaries, respectively.
Figure 8.

The comparison of wall clock time, TW (upper panel), and relative energy error, |dE/E| (lower panel), for models listed in Table 3. For the data of petar, ‘-T03’ and ‘-T05’ indicate the opening angles of 0.3 and 0.5, respectively. The values of TW in sec are also printed near the data points. TW takes the average value of a few steps to avoid fluctuation. |dE/E| of each step (maximum 10 steps) is shown. The left- and right-hand panels show the models without and with binaries, respectively.

For models without binaries, the two codes have a similar performance for N of 1000 and 10 000. The differences appears when N becomes large. TW of petar well follows the scaling line of Nlog N. However, the result of nbody6++gpu scale differently for small and large N. It is expected that TW is proportional to N7/4, which is estimated by including the AC neighbour scheme. Only when N is large, the result follows N7/4, while when N is small, it follows Nlog N. The reason is probably due to the scaling of parallelization. When N is too small, the parellelization of multi cores does not help to improve the performance. As N increases, the computational cost of direct N-body method significantly increase, thus the parallelization efficiency (floating point operations versus peak performance of CPU and GPU) also increases. Once the efficiency reaches the maximum, the scaling begins to follow N7/4. In million-body case, petar provides a five times faster performance than that of nbody6++gpu.

The significantly difference of the performance is shown in the models with large number of primordial binaries (⁠|$100{{\ \rm per\ cent}}$|⁠). For N of 1000, the two codes have a similar performance, while the difference starts from N = 10000. petar code gives a much faster performance compared to nbody6++gpu. In million-body case, the performance difference is 12 times. As discussed in Section 3.6, nbody6++gpu does not parallize the KS regularzation and the switching of KS is very expensive. This significantly reduces the performance. In contrast, the behaviour of petar is much better. Especially, the actual computing time of million-body model with full of binaries by using petar is even faster than the million-body model without binaries performed by nbody6++gpu. This result show the great advantage of petar for large N-body models with many binaries.

The relative energy errors, dE/E, are also compared in the bottom panels of Fig. 8. The fluctuation is large for both codes. In the models with no binaries, nbody6++gpu gives a systematically better energy conservation for N < 105. The small θ help to reduce the errors while the performance does not significantly change. The wide range of mass spectrum is one important reason for the larger error. With equal masses, the error is much smaller (not shown here). The major contribution of the error in the models performed by petar comes from the P3T-kick, due to the approximation of the long-range force.

For a particle group, the changeover radii of equation (12) takes care of the mass-dependent long-range tidal perturbation. However, the mass-dependent error still exists in the centre-of-the-mass motion of the group due to the low accuracy of the leapfrog method (see Section 6.4.3). Especially, the gravitational focusing enhances the error of the massive objects if the changeover radii is not sufficient large. To avoid this, we need an (mi/〈m〉)1/2 dependent changeover radii, which can cause the formation of large clusters and the load balance can become bad. One possible solution is to implement the nested openmp parallelization for the Hermite and SDAR integrations inside the clusters. This cannot fully solve the load-balance issue but may provide a better performance than reducing ΔtL or θ in some conditions, especially when a very massive object like SMBH exists.

Nevertheless, both the codes show an increasing cumulative errors of a comparable level in the long-term simulation, while the dynamical behaviour seems to be consistent (Section 7.1.2). If the global trend is not bad, even the energy error suddenly increases in one step, it only indicates that one specific event (in most cases this is a few-body interaction) is not well treated, but the global evolution is still statistically correct.

In the models with binaries, both codes have a similar level of errors. However, we should notice that the definition of error with binaries are different in nbody6++gpu and petar. In nbody6++gpu, the total energy of the systems are included in the energy conservation check, thus the energy error is completely dominated by the binary with the highest binding energy. In petar, we follow the definition of SD energy used in the sdar code (Wang et al. 2020), where the energy of binaries are scaled down by the SD factor. Thus, the energy error reflects more about the global behaviour of the system.

7.1.2 Long-term evolution

Lagrangian and core radii:

We check the long-term evolution including the post-core-collapse stage of a star cluster without binaries (N250k model). Initially, the cluster contains 250 000 stars with the IMF of Kroupa (2001) ranging from 0.08 to 40 M. For nbody6++gpu, the initial Nb, p = 200, RKS = 1.0 × 10−5, and ΔtKS = 1.0 × 10−7, while the values are adjusted in the middle of the simulation. For petar, ΔtL = 1/1024, rout, ref = 0.00398365 and θ = 0.5.

The evolution of the Lagrangian and core radii of the N250k model is shown in Fig. 9. We can see that the results of the two codes well overlap each others. The core collapse finishes around 300 NB time unit. Both codes give the same core-collapse time and the evolution of the core radius. This result indicates that petar can provide the same long-term evolution of the global density profile as the direct N-body method.

The evolution of the Lagrangian and core radii (N250k model) by using petar (black solid curves) and nbody6++gpu (red dashed curves). From the bottom to the top, the curves represent the core radius, $10{{\ \rm per\ cent}}$, $30{{\ \rm per\ cent}}$, $50{{\ \rm per\ cent}},\, 70{{\ \rm per\ cent}}$ Lagrangian radii, respectively.
Figure 9.

The evolution of the Lagrangian and core radii (N250k model) by using petar (black solid curves) and nbody6++gpu (red dashed curves). From the bottom to the top, the curves represent the core radius, |$10{{\ \rm per\ cent}}$|⁠, |$30{{\ \rm per\ cent}}$|⁠, |$50{{\ \rm per\ cent}},\, 70{{\ \rm per\ cent}}$| Lagrangian radii, respectively.

Escapers:

We compare the properties of escapers for the N250k model in Fig. 10. Two codes provide consistent time and mass distribution of escapers. We can also clearly identify the low-mass escapers caused by the relaxation-driven evaporation and the high-mass escapers ejected by strong few-body interactions in the core. The latter appears after the core collapse. The two components of binary escapers have similar masses and are the most massive objects in the cluster. Both codes provide a similar number of binary escapers (6 and 7) with very similar masses and consistent distribution of a and e. This result indicates that petar and nbody6++gpu well agree on the properties of escapers for the model without primordial binaries.

The properties of escapers for the N250k model performed by using petar and nbody6++gpu. The upper panel: time and mass distribution of single and binary escapers. Triangles indicate the two components of each binary escaper. The mass is scaled to the unit of solar mass. The lower panel: semimajor axis (a) and eccentricity (e) of binary escapers. The areas of markers are proportional to the masses of binaries.
Figure 10.

The properties of escapers for the N250k model performed by using petar and nbody6++gpu. The upper panel: time and mass distribution of single and binary escapers. Triangles indicate the two components of each binary escaper. The mass is scaled to the unit of solar mass. The lower panel: semimajor axis (a) and eccentricity (e) of binary escapers. The areas of markers are proportional to the masses of binaries.

Performance and energy error:

In Fig. 11, the long-term behaviour of the performance and the energy error of the N250k model are compared. At the beginning, petar provides a twice faster performance than that of nbody6++gpu. After core collapse, the performance of petar slows down due to a larger cost of P3T-drift (yellow curves). This is expected since a dense core results in a larger size of particle cluster and smaller Hermite time-steps. There are a few sharp peaks, which are due to an expensive calculation of a few-body interaction. Such behaviour can happen when a tight multiple system (e.g. triple, quadruple) forms and stay for a while before a disruption by close encounters. Sometimes, the criterion for switching on and off the SDAR method or the initial integration step size are not well adjusted to the specific condition. Thus, a larger error or an expensive calculation can appear. There is probably no uniform way that can well handle all type of few-body interactions. This is the same for both nbody6++gpu and petar. The performance of nbody6++gpu becomes better in the late phase of the evolution.

TW of total, P3T-kick and P3T-drift (upper panel) and dE/E (lower panel) versus the physical evolution time for the N250k model. The label, (S) and (C), in the legend of dE/E indicate ‘per time’ and ‘cumulative’, respectively.
Figure 11.

TW of total, P3T-kick and P3T-drift (upper panel) and dE/E (lower panel) versus the physical evolution time for the N250k model. The label, (S) and (C), in the legend of dE/E indicate ‘per time’ and ‘cumulative’, respectively.

The relative energy error dE/E per step (S) of petar has a larger fluctuation while the cumulative errors (C) converge around zero until t = 1000 NB unit. Large fluctuations appear after the core collapse when few-body interactions start to eject massive objects out of the core. The appearance of large dE/E at one step is often associated with the formation of a large particle cluster including a few hundreds members. This is caused by a high-velocity particle which has a large rnb, i. In such case, the particle moves a long distance during one ΔtL, thus the time-step of the long-range interaction is too large. If this particle is also massive, a large dE/E can appear. The way to solve this issue is to reduce ΔtL. In our simulation, to have consistent parameters for measuring the performance, ΔtL is not modified. In an application of an astrophysical study, after core collapse, it is better to adjust the ΔtL and rout, ref based on the new state of the system.

The behaviour of dE/E by using nbody6++gpu is better but large jumps of dE/E sometimes appear and result in a large cumulative error. From the information record of the simulations, significant time-step jumps happen for some particles when the big error appears. It can be caused by the unsuitable criterion to switch on KS regularization, that a strong close encounter is not caught by the KS method. However, it is not easy to design a general criterion to handle all kind of case while keeping a good performance. In such case, the usual way to avoid such large error is to restart the simulation by reducing ηr and ηi or enlarging the KS criterion, RKS, and ΔtKS. For a test purpose, we do not restart the simulation and allow such large dE/E(S) in our models.

Although the error of 10−3 is not small, the behaviour of the long-term dynamical evolution is not very sensitive to it, as shown in Fig. 9. This suggests that we should not use the energy conservation as a strict judgement for the quality of simulations, but should focus on what the physical processes we really care and whether they are treated properly. If the interested objects cause a big energy error, we need to properly validate the result with better energy conservation. On the other hand, sometimes energy conservation can be misleading when we compare the simulations done by using different approximations. For instance, if all binaries in the clusters are treated as isolated and do not exchange energy with other stars, the energy conservation can be much better since the difficulty of the few-body interactions is avoided, but the results are completely wrong. Another example is to use the softening length, which gives a wrong behaviour of close encounters but result in a much better energy conservation.

7.1.3 Binaries

We perform another simulation of star clusters with primordial binaries (the N100kb model) in order to compare the behaviour of the dynamical evolution of binaries. The initial controlling parameters are the same as shown in Table 3. For petar, we choose the opening angle as θ = 0.3 for a better accuracy. The evolution of Lagrangian and core radii are shown in Fig. 12. Similar to the result of N250k, the two results agree with each other very well.

The evolution of the Lagrangian and core radii for N100kb model. The plotting style is similar to Fig. 9.
Figure 12.

The evolution of the Lagrangian and core radii for N100kb model. The plotting style is similar to Fig. 9.

The initial and final distribution (after 475 NB time unit) of semimajor axis, a, and eccentricity, e, are compared in Fig. 13. There is a forbidden region of a and e in the initial distribution based on the pre-main-sequence eigenevolution model of Kroupa (1995b). Since stars have physical sizes, the pericentres of binaries cannot be lower than a threshold, otherwise the binaries collide. In our simulations, particles are treated as point masses. Thus, after interactions, a part of binaries, especially the tight binaries which are easily perturbed, can enter the forbidden region. The two histograms show that the two codes provide very similar final distributions of a and e.

Distributions of the semimajor axis a and eccentricity e for the N100kb model performed by using petar and nbody6++gpu. Each point in the central panel indicate one binary. We randomly select $20{{\ \rm per\ cent}}$ binaries to show in order to avoid too many data points. The top and right-hand panels show the histograms of a and e, respectively. The black, green, and red colours indicate the initial condition, the results of petar and nbody6++gpu, respectively.
Figure 13.

Distributions of the semimajor axis a and eccentricity e for the N100kb model performed by using petar and nbody6++gpu. Each point in the central panel indicate one binary. We randomly select |$20{{\ \rm per\ cent}}$| binaries to show in order to avoid too many data points. The top and right-hand panels show the histograms of a and e, respectively. The black, green, and red colours indicate the initial condition, the results of petar and nbody6++gpu, respectively.

We also compare the properties of escapers in Figs 14 and 15. Fig. 14 show the time and mass distribution of single and binary escapers. The semimajor axis (a) versus eccentricity (e) and component mass ratio (m2/m1) versus binary mass (m1 + m2) are shown in Fig. 15. The two codes well agree with each other on all properties compared here. After 474 NB time unit, the numbers of single and binary escapers are 689 and 110 in the case of petar, and 733 and 137 in the case of nbody6++gpu, respectively. nbody6++gpu produce a slightly more escapers, but generally they agree well considering the statistical fluctuation. Therefore, petar and nbody6++gpu have a consistent treatment on the long-term dynamical evolution of binaries in star clusters.

Distributions of the times and masses for single (dot) and binary (triangle) escapers in the N100kb model performed by using petar and nbody6++gpu. In the top and right-hand histograms, single and binary escapers are represented by the step and bar styles, respectively.
Figure 14.

Distributions of the times and masses for single (dot) and binary (triangle) escapers in the N100kb model performed by using petar and nbody6++gpu. In the top and right-hand histograms, single and binary escapers are represented by the step and bar styles, respectively.

Upper: the distributions of semimajor axis, a, and eccentricity, e for binary escapers in N100kb models. The areas of markers are proportional to the binary masses. Lower: the distributions of the components’ mass ratios and binaries’ masses (m1 + m2). The areas of markers are proportional to the logarithmic of a with an offset.
Figure 15.

Upper: the distributions of semimajor axis, a, and eccentricity, e for binary escapers in N100kb models. The areas of markers are proportional to the binary masses. Lower: the distributions of the components’ mass ratios and binaries’ masses (m1 + m2). The areas of markers are proportional to the logarithmic of a with an offset.

The performance and the energy error of the codes are shown in Fig. 16. Initially, petar is about 4 times faster than nbody6++gpu due to a more efficient treatment of primordial binaries. The performance is stable till the end except for a few peaks due to the formation of stable multiple systems. The performance of nbody6++gpu becomes better in the late time due to the disruption of wide binaries. There are also several peaks where the performance significantly drops due to the appearances of stable multiple systems. At the end, the performance of petar is about 1–2 times faster than that of nbody6++gpu. Notice that here we compare a model of 105 particles. Due to the time-consuming calculation in nbody6++gpu, we have not compared a long-term evolution of million-body systems with large primordial binaries. Based on the result of Fig. 8, we expect that the performance difference is more obvious in the case of simulating massive GCs.

TW (upper panel) and dE/E (lower panel) versus the physical evolution time for the N100kb model. The plotting style is similar to Fig. 11.
Figure 16.

TW (upper panel) and dE/E (lower panel) versus the physical evolution time for the N100kb model. The plotting style is similar to Fig. 11.

7.2 Scaling on supercomputer

We test the performance of petar depending on N and the number of cores (Ncore) on the Cray XC50 supercomputer. Each computing node has two of Intel Xeon Gold 6148 processors (Skylake), that is 40 cores. The maximum of Ncore = 960. No GPU devices are available. The mpi and openmp are used together. For Ncore > 10, the number of openmp threads are fixed to be 10 while the number of mpi processes is Ncore/10. For Ncore ≤ 10, we only use openmp. The AVX-512 acceleration is used.

The initial properties of the star clusters except N are the same as that of the single and binary models used in Section 7.1.1. The naming style of the models follows the style of N10k, N1m, N10kb, etc. The configuration of input parameters follows Table 3. Since Ncore is large, we set the maximum of N to be 10 million. The N10m and N10mb models have ΔtL = 1/2048 and 1/4096 and rout, ref ≈ 0.00199303 and 0.000996758, respectively. We choose the θ of 0.5.

The result is shown in Fig 17. The scaling of the N10k and N10kb model becomes flat when Ncore > 5. The thresholds are about 160 and 640 for the case of N = 105 and 106, respectively. The N10m and N10mb models can well scale up to Ncore = 960. The absolute values of TW at the thresholds for million-body cases are about 63 and 121 s for single and binary models, respectively. The binary models roughly cost twice TW of that in single models, because for a given N, ΔtL in the binary model is half of that in the single model.

The performance test of petar on the Cray XC50 supercomputer. The two panels show TW of single and binary models versus the number of CPU cores (Ncore), respectively. Colours indicate different N. The values TW in sec are shown above the data points. The dashed lines indicate the ideal scaling (∝1/Ncore). The vertical line (Ncore = 10) indicates the boundary where the simulations use only OpenMP in the left and use both mpi and openmp in the right.
Figure 17.

The performance test of petar on the Cray XC50 supercomputer. The two panels show TW of single and binary models versus the number of CPU cores (Ncore), respectively. Colours indicate different N. The values TW in sec are shown above the data points. The dashed lines indicate the ideal scaling (∝1/Ncore). The vertical line (Ncore = 10) indicates the boundary where the simulations use only OpenMP in the left and use both mpi and openmp in the right.

The million-body Dragon models with only |$5{{\ \rm per\ cent}}$| primordial binaries took about 3000 s per NB time unit on a GPU-based supercomputer (Wang et al. 2016). Thus, TW of the N10mb model are even much less than that of the Dragon models. Notice that we cannot directly compare the absolute TW as the hardware are very different. Besides, the long-term behaviour can be different from the initial case. But we can still obviously see the significant advance of the performance by using petar on a modern supercomputer even without GPU. Although in our test, the stellar evolution is not included, it should not significantly change the result as the computational complexity of that part is only O(N). It is expected that petar may even be possible to solve the 10-million-body problem.

8 CONCLUSION AND FUTURE WORK

In this work, we have detailed described the new N-body code, petar. It is designed to simulate gravitationally bound collisional stellar systems with many subsystems. The code is publicly available on GitHub (see footnote 1). The hybrid integrator is implemented, where the long-range interaction is calculated by using the Barnes–Hut PT method and short-range (neighbour) interactions are handled by the fourth-order Hermite with block time steps and the SDAR methods. This hybrid method not only provides a calculation cost of O(Nlog N) for the long-range interaction compared to O(N2) in the direct N-body method, but also can accurately follow the long-term evolution of perturbed binaries and hierarchical systems.

We introduce the mass-dependent changeover function for handling the systems with a wide range of particle masses (Section 6.1). The clustering method based on the implementation in pentacle is improved by using mass and velocity dependent searching radii. The orbit-dependent neighbour criterion is introduced to handle the high-velocity particles (Section 6.3). In order to accurately follow the long-range perturbation to tight binaries when Tbin < ΔtL, the artificial particle algorithm is developed (Section 6.4).

For a high performance on multiple-core computers, the code is implemented by using of fdps, which provides a well-optimized mpi and openmp parallelization of PT part for the tree construction and long-range force calculations. The neighbour searching and long-range force calculation in P3T-kick are optimized by using of SIMD instructions (AVX, AVX2, AVX-512) and GPU acceleration (CUDA). The sdar library is used to perform the P3T-drift (short-range interactions). This part is parallelized by using of openmp in each mpi processes.

A series of simulations are performed by using petar and nbody6++gpu in order to compare the performance and to validate whether petar can properly follow the long-term evolution of star clusters. On a highly configured GPU-based desktop, the performance of petar follows the scaling of O(Nlog N) and is faster than nbody6++gpu (Fig. 8). Especially for million-body systems with a large fraction of primordial binaries, petar can give an 11 times faster performance. Notice that the performance of petar is very sensitive to ΔtL (Fig. 7). The best value is chosen in our test.

The test of petar on the Cray XC50 supercomputer show a good scaling depending on the number of CPU cores for N ≥ 106 (Fig. 17). Million-body simulations with |$100{{\ \rm per\ cent}}$| primordial binaries only take about 121 s wall clock time per NB time unit. The 10 million models with and without (⁠|$100{{\ \rm per\ cent}}$|⁠) primordial binaries take 1344 and 1106 s, which is even faster than that of the Dragon models with only |$5{{\ \rm per\ cent}}$| binaries. Such significant improvement is due to the benefit of using new algorithms (P3T with SDAR), well optimized parallelization from fdps and the advance of hardware.

We also compare the long-term behaviour of the performance and relative energy error in Figs 11 and 16. In the case without primordial binaries, after core collapse, petar becomes slower due to the increasing of central density. This is possibly improved if ΔtL and rout, ref are readjusted during or after the core collapse.

Sometimes, the computation becomes very slow during a short physical time interval of the models due to the formation of a specific type of stable hierarchical systems. Such behaviours exist for both petar and nbody6++gpu in the models with primordial binaries (N100kb). Once it happens, the benefit of parallel computing is lost due to a bad load balance. However, how to efficiently and accurately solve such systems is a long existing question. In the future work, the code will be improved to well handle a part of them. But it is difficult to find a universal solution for all cases. Because of this, the actual computing time of a long-term simulation may be longer than the prediction estimated from a few initial steps if such systems frequently form.

The results of N250k and N100kb models indicate that for both star clusters with and without primordial binaries, petar can provide a good agreement on the long-term behaviours of Lagrangian and core radii (including the post-core-collapse evolution; Fig. 9 and 12), the properties of single and binary escapers (Figs 14 and 15) and the evolution of binary orbits (Fig. 13). Thus, petar is accurate enough to handle the realistic models of star clusters with many multiple systems.

The API to the stellar evolution package based on the framework of sse and bse (Hurley, Pols & Tout 2000; Hurley, Tout & Pols 2002) are implemented. The interface is designed in a way that switching between different versions of the sse and bse is straightforward. Moreover, petar is also implemented as a module in a hydro dynamics code, asura-bridge (private comm. with Michiko Fujii), using the Bridge method (Fujii et al. 2007). Thus, this hybrid code is possible to be used for studying the formation of star clusters, where few-body dynamics including close encounters, formation and evolution of binaries and hierarchical systems can be treated accurately. Besides, the application programming interface (API) to amuse (Portegies Zwart et al. 2013) is under developing. The complete petar module in amuse will allow us to use the Bridge method to combine modules like hydrodynamic codes for a wide range of studies and to work with different single and binary stellar evolution packages. On the other hand, the current version of petar only includes the pure Newtonian gravitational pair interaction. In the future, the general relativity effect for compact object binaries including BHs and NSs will be implemented.

ACKNOWLEDGEMENTS

We thank Sverre Aarseth for the helpful suggestions on the code development. LW thanks the financial support from JSPS International Research Fellow (School of Science, The University of Tokyo) and the support from Alexander von Humboldt Foundation (The University of Bonn). Numerical computations were in part carried out on Cray XC50 at Center for Computational Astrophysics, National Astronomical Observatory of Japan.

DATA AVAILABILITY

The (benchmark) data underlying this article were generated by using the petar code on the desktop computer of the corresponding author and the supercomputer, Cray XC50 at Center for Computational Astrophysics (CfCA), National Astronomical Observatory of Japan. The data underlying this article will be shared on reasonable request to the corresponding author. The code, petar, introduced in this article is publicly available in GitHub at https://github.com/lwang-astro/PeTar, under the MIT license.

Footnotes

1

Notice that sdar and SDAR (with different font styles) are the names of the code and the algorithm, respectively.

REFERENCES

Aarseth
S. J.
,
2003
,
Gravitational N-Body Simulations
.
Cambridge Univ. Press
,
Cambridge

Ahmad
A.
,
Cohen
L.
,
1973
,
J. Comput. Phys.
,
12
,
389

Barnes
J.
,
Hut
P.
,
1986
,
Nature
,
324
,
446

Binney
J.
,
Tremaine
S.
,
1987
,
Galactic Dynamics
,
Princeton Univ. Press
,
Princeton, NJ

Chambers
J. E.
,
1999
,
MNRAS
,
304
,
793

Drinkwater
M. J.
,
Jones
J. B.
,
Gregg
M. D.
,
Phillipps
S.
,
2000
,
PASA
,
17
,
227

Fujii
M.
,
Iwasawa
M.
,
Funato
Y.
,
Makino
J.
,
2007
,
PASJ
,
59
,
1095

Fukushige
T.
,
Kawai
A.
,
2016
,
PASJ
,
68
,
30

Gaburov
E.
,
Harfst
S.
,
Portegies Zwart
S.
,
2009
,
New Astron.
,
14
,
630

Greengard
L.
,
Rokhlin
V.
,
1987
,
J. Comput. Phys.
,
73
,
325

Heggie
D. C.
,
1975
,
MNRAS
,
173
,
729

Hemsendorf
M.
,
Khalisi
E.
,
Omarov
C. T.
,
Spurzem
R.
,
2003
,
High Performance Computing in Science and Engineering, Vol. 71
.
Springer Verlag
,
New Yok
, p.
388

Hénon
M.
,
1971
,
Ap&SS
,
13
,
284

Hernquist
L.
,
1987
,
ApJS
,
64
,
715

Hilker
M.
,
Infante
L.
,
Vieira
G.
,
Kissler-Patig
M.
,
Richtler
T.
,
1999
,
A&AS
,
134
,
75

Hills
J. G.
,
1975
,
AJ
,
80
,
809

Hockney
R. W.
,
Eastwood
J. W.
,
1988
,
Computer Simulation Using Particles
.
CRC Press
,
Boca Raton, FL

Hurley
J. R.
,
Pols
O. R.
,
Tout
C. A.
,
2000
,
MNRAS
,
315
,
543

Hurley
J. R.
,
Tout
C. A.
,
Pols
O. R.
,
2002
,
MNRAS
,
329
,
897

Iwasawa
M.
,
Portegies Zwart
S.
,
Makino
J.
,
2015
,
Comput. Astron. Cosmol.
,
2
,
6

Iwasawa
M.
,
Tanikawa
A.
,
Hosono
N.
,
Nitadori
K.
,
Muranushi
T.
,
Makino
J.
,
2016
,
PASJ
,
68
,
54

Iwasawa
M.
,
Oshino
S.
,
Fujii
M. S.
,
Hori
Y.
,
2017
,
PASJ
,
69
,
81

Iwasawa
M.
,
Namekata
D.
,
Nitadori
K.
,
Nomura
K.
,
Wang
L.
,
Tsubouchi
M.
,
Makino
J.
,
2020
,
PASJ
,
72
,
13

Kawai
A.
,
Makino
J.
,
2001
,
ApJ
,
550
,
L143

Kroupa
P.
,
1995a
,
MNRAS
,
277
,
1491

Kroupa
P.
,
1995b
,
MNRAS
,
277
,
1507

Kroupa
P.
,
2001
,
MNRAS
,
322
,
231

Kustaanheimo
P.
,
Stiefel
E.
,
1965
,
J. Reine Angew. Math.
,
218
,
204

Makino
J.
,
Aarseth
S. J.
,
1992
,
PASJ
,
44
,
141

Makino
J.
,
Hut
P.
,
1988
,
ApJS
,
68
,
833

Makino
J.
,
Fukushige
T.
,
Koga
M.
,
Namura
K.
,
2003
,
PASJ
,
55
,
1163

McMillan
S. L. W.
,
Aarseth
S. J.
,
1993
,
ApJ
,
414
,
200

Mikkola
S.
,
Aarseth
S. J.
,
1996
,
Celest. Mech. Dyn. Astron.
,
64
,
197

Mikkola
S.
,
Tanikawa
K.
,
1999
,
MNRAS
,
310
,
745

Nitadori
K.
,
Aarseth
S. J.
,
2012
,
MNRAS
,
424
,
545

Oshino
S.
,
Funato
Y.
,
Makino
J.
,
2011
,
PASJ
,
63
,
881

Panamarev
T.
,
Just
A.
,
Spurzem
R.
,
Berczik
P.
,
Wang
L.
,
Arca Sedda
M.
,
2019
,
MNRAS
,
484
,
3279

Phillipps
S.
,
Drinkwater
M. J.
,
Gregg
M. D.
,
Jones
J. B.
,
2001
,
ApJ
,
560
,
201

Portegies Zwart
S.
,
McMillan
S. L. W.
,
van Elteren
E.
,
Pelupessy
I.
,
de Vries
N.
,
2013
,
Comput. Phys. Commun.
,
184
,
456

Preto
M.
,
Tremaine
S.
,
1999
,
AJ
,
118
,
2532

Seth
A. C.
et al. ,
2014
,
Nature
,
513
,
398

Spitzer
L.
,
1987
,
Dynamical Evolution of Globular Clusters
.
Princeton Univ. Press
,
Princeton, NJ

Spurzem
R.
,
1999
,
J. Comput. Appl. Math.
,
109
,
407

Wang
L.
,
Spurzem
R.
,
Aarseth
S.
,
Nitadori
K.
,
Berczik
P.
,
Kouwenhoven
M. B. N.
,
Naab
T.
,
2015
,
MNRAS
,
450
,
4070

Wang
L.
et al. ,
2016
,
MNRAS
,
458
,
1450

Wang
L.
,
Nitadori
K.
,
Makino
J.
,
2020
,
MNRAS
,
493
,
3398

Wisdom
J.
,
Holman
M.
,
1991
,
AJ
,
102
,
1528

Xu
G.
,
1995
,
ApJS
,
98
,
355

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)