Dissertation

Abstract

This paper analyses the modelling of mass-based n-body gravity with a specific lens focused on its potential application in real-time simulations such as video games.

Through the literature review, a history of physical models is provided before looking at Newtonian Mechanics and General Relativity specifically. Newtonian Mechanics is identified as the better model for the specified use case. The N-body Problem is briefly considered, concluding that when planetary objects are sufficiently small and sufficiently distant from one another, the problem can be effectively reduced to a 2-body problem, allowing for the creation of stable orbits, which is likely to be important for application in video games. Next, the current use of gravity in video games is discussed, finding a general homogeneity in the industry’s use of gravity. The only notable mass-based implementation found was in Outer Wilds, which bypassed the N-body Problem completely. Finally, the literature review considers optimisation techniques, since the number of computations needed for n-body simulations grows quadratically with the number of objects. This found the Barnes-Hut Tree-Particle method and threading to be the most promising techniques.

An artefact was then created to implement Newtonian mechanics for gravity calculations with 2011 objects in the world. Before optimising, this had significant lag but through the Tree-Particle method and threading, this was brought down to a stable 60 frames per second across multiple thresholds of accuracy.

Literature Review

Introduction

Over the course of this literature review, the researcher will explore the evolution of physical models for gravity as well as their implementation in video games to later inform an artefact that will be made to test a 2D implementation and optimisation of physics-based gravity in a solar-system model. The review will begin by looking at the historical context leading to Newton’s Gravitational Model before looking in more depth at that model and subsequently exploring the understanding of gravity in the Theory of General Relativity. Following this, specific research will be done into the multiple body problem, before looking at relevant optimisation techniques.

History of Physical Models

The origin of science-based cosmological models can be traced back to Pythagorean philosophy from the 6th century BC (largely attributed to Philolaus (Zhmud, 2012)) which taught of a round Earth which spun on an axis with all of the heavenly bodies orbiting a central fire (notably not the Sun) on concentric spheres (Chow, 2024). This theory helped pave the way for later rational models even though it was not particularly evidence based. There was no evidence of the central fire; its presence was simply assumed due to its spiritual implications. This led to the model’s eventual replacement in the 4th century BC by that of Aristotle.

In the 4th century BC, Aristotle established evidence-based arguments as to why the Earth is spherical. In his book, On The Heavens, he noted that the Earth’s shadow on the moon was always round, which would only be caused by a spherical Earth unless it was a disc with the Sun directly under the centre. Another argument was due to the position of the North Star being higher in the sky towards the north of the Earth. Aristotle also held that the Earth was a fixed point which the rest of the heavens orbited around, although he had no scientific evidence for this. Ptolemy, in the 2nd century AD, later built on this idea to put forward a more concrete geocentric model. This model put 8 spheres around the Earth carrying the moon, the five known planets with the Sun between Venus and Mars, and finally the fixed stars on the outermost layer. The system required the planets to move about smaller circles attached to their spheres to account for their perceived paths in the sky and, more problematically, the path put forward for the moon would make it twice as close to the Earth at some moments in time compared to others, which would suggest that at times it should appear twice as big as at others. This was a flaw that Ptolemy recognised, but due in part to the model’s acceptance by the Catholic Church, the model remained largely unchallenged until Copernicus in the 16th century (Hawking, 2016).

Copernicus put forward the simpler idea that the planets including Earth went through circular orbits around a stationary Sun although this theory was not taken seriously (in part due to the predicted orbits not properly lining up with the observable evidence) until almost a century later when Kepler and Galileo started to support the theory (Hawking, 2016).

Kepler was one of the first astronomers to attempt to move towards physical reason over geometric reason. Until this point, many models had observed the eccentricity of orbits; in other words, it was known that the supposed circular orbits were not centred around the focus of the orbit but instead slightly offset. This meant that under the Copernican system, the planets would at some points in their orbits be closer to the Sun. For all of the planets but the Earth, this meant that they moved faster as they were closer to the Sun. Kepler reasoned that since the Sun appeared to be the cause of the planetary movement, the Earth must also move faster in its orbit as it approached the Sun. With the help of Brahe’s meticulous observations, he mapped the orbit of the Earth about the Sun and stumbled upon his law of areas (the line swept out between a planet and the sun always forms an equal area in equal time periods (Wilson, 1972)) as a shortcut. This worked at the near and far points of the orbit, but for the rest of it, the areas were wrong. However, assuming that his law was true, he realised that the areas would be correct if the orbit was elliptical with the Sun being a focus (Gingerich, 2009).

In 1609, Galileo learned of a device using two lenses which brought distant images closer. When he discovered how it worked, he began working to improve its magnification, producing a far more capable telescope. Using this, he discovered three main things: Jupiter had four moons which orbited around it, which suggested that the Earth did not have to be the focus of the orbits of all the heavenly bodies (Hawking, 2016); Venus goes through phases similar to the Moon, suggesting that it orbited around the Sun instead of the Earth (Chow, 2024); the surface of the moon was covered in mountains, craters, and seas, making it remarkably Earth-like, not just a smooth, crystalline ball (Gingerich, 2009). None of this proved that the Earth also orbited the Sun, but it did suggest that the Earth was less singular than previously thought when compared to the observable satellites. This gave less reason for the Earth to be believed to be unique in its lack of movement too, which was important given the religious landscape. A non-geocentric model was incredibly hard to support due to persecution by both the Catholic and Protestant churches at the time.

Together, Galileo and Kepler had managed to promote the idea of a heliocentric model with elliptical orbits, though they could not explain what kept the planets in these orbits. The last element required to bridge the gap to Newton is the physical principles and geometry of Descartes, as these were theories that Newton was raised on (Brackenridge, 1996). In the 17th century, Descartes published his Principia philosophiae in which he details two laws of motion. Firstly, he observed that the effort required to bring an object to motion from rest is equivalent to the effort required to bring an object from motion to rest. This led him to describe motion as a relative property where under no external force an object will remain in its state, whether that be at rest or in motion. Secondly, now that undisturbed motion has been defined to continue, he clarifies that the motion must be in a straight line, thus curved motion must be under a force (Baigrie, 1992). These laws together defined inertial mechanics and would later be adapted by Newton into his first law of motion; they also reveal that there must be a force acting upon the planets to keep them in their orbits.

Newtonian Gravity

Newton’s unique perspective and breakthrough came from ignoring the Platonic drive to find circularity and the Cartesian challenge of looking for mechanical cause (Brackenridge, 1996). Since Descartes viewed force as a property of a body which let it interact with another, he was held up looking for vortices of matter to push the planets around their orbits (Baigrie, 1992). Newton only concerned himself with looking for a mathematical description of the force required to hold a planet in its orbit, thus allowing him to develop a theory of universal gravitation. Another misstep that was obscuring the discovery of gravity was the idea of a centrifugal force which pushed outwards on objects being spun in a circle. It was only when Hooke asked Newton to comment on some of his hypotheses that Newton was introduced to the idea of breaking an orbit down into a tangential inertia and a centripetal force pulling towards the focus of the orbit. What Hooke lacked was the focus on Kepler’s law of areas to bridge the gap to proving the idea (Cohen, 1981).

Newton’s cosmological ideas are primarily put forward in his Philosophiae Naturalis Principia Mathematica where he first provides a proof for Kepler’s law of areas. The first part of the proof describes a line split in equal segments representing inertial movement in equal periods of time. At a distance h above the line lies point P. Any triangle formed between a segment of the line and P will have the same area since each base is equal in length and h is constant. Where Kepler stumbled onto the law of areas seemingly at random, Newton shows with this simple idea an inherent connection between inertia and the law (Cohen, 1981). The proof then goes on to add instantaneous forces at each interval and geometrically shows that the triangles formed are still equal in area. At the limit as the time interval approaches zero, this path becomes a smooth orbit, thus proving that a centripetal force causes a curve which follows the law of areas. Going forward, Newton proves the inverse: that a curve which follows the law of areas must suggest a centripetal force. Newton ultimately proved that an inertial orbit following the law of areas must have a force that is inversely proportional to the square of the distance between the bodies’ centres of mass.

Newton’s full formulation of the law of gravitation is that the force of gravity between two bodies is proportional to the product of their masses and inversely proportional to the square of the distance between their centres of mass (Whittaker, 1989). The gravitational constant connecting these values, known as $G$, is approximately $6.673 \times 10^{-11}\ \mathrm{N\,m^{2}\,kg^{-2}}$. The full equation is given below.

$F = G \frac{m_{1} m_{2}}{r^{2}}$

This is a remarkably simple equation for such an all-encompassing concept, making it a very compelling choice to use in a simulation. All of the mathematics involved uses functions that, on most modern computers, can be dealt with at constant latency (Fog, 2025).
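As a minimal sketch of how the equation above might be evaluated in code, the C++ fragment below computes the magnitude of the force and resolves it into a 2D vector. The names `kG`, `Vec2`, `gravitationalForce`, and `forceOnFirst` are illustrative assumptions and are not taken from the artefact.

```cpp
#include <cmath>

// Gravitational constant in N m^2 kg^-2, using the value quoted in the text.
constexpr double kG = 6.673e-11;

// Hypothetical 2D vector type used only for this illustration.
struct Vec2 {
    double x;
    double y;
};

// Magnitude of the attraction between two point masses (kg) whose centres
// of mass are r metres apart: F = G * m1 * m2 / r^2. The caller must ensure r > 0.
double gravitationalForce(double m1, double m2, double r) {
    return kG * m1 * m2 / (r * r);
}

// Force vector acting on body 1, directed towards body 2.
// The two positions must differ, otherwise r is zero.
Vec2 forceOnFirst(double m1, Vec2 p1, double m2, Vec2 p2) {
    const double dx = p2.x - p1.x;
    const double dy = p2.y - p1.y;
    const double r = std::sqrt(dx * dx + dy * dy);
    const double f = gravitationalForce(m1, m2, r);
    return {f * dx / r, f * dy / r};  // scale the unit direction by the magnitude
}
```

The only operations involved are multiplication, division, and one square root, which is what makes the model attractive for per-frame evaluation.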

Relativity

Einstein’s theory of Special Relativity, published in 1905, discards the Newtonian ideas of absolute space and time, replacing them with relativistic spacetime. This theory would later serve as the basis for general relativity and thus immediately creates a barrier-to-entry for making simulations of physics-based gravity using it. This is because computer programs inherently have a form of absolute time in the way that they run. Thus, in order to gain any benefits of relativity, this major drawback would have to be overcome (Choquet-Bruhat, 2023).

Einstein found that Newtonian mechanics violated Special Relativity due to the force of gravity propagating at infinite speed in order for gravity to take instantaneous effect. This led him to develop the theory of General Relativity as a way to describe the effects of gravity without violating Special Relativity. Under General Relativity, gravity is not a force, but a result of the curvature of spacetime (Perkowitz, 2025).

One of the most notable predictions of General Relativity is the curvature of light rays. Under general relativity, an equivalence between mass and energy is drawn, represented by the famous equation: $E = m c^{2}$ where $E$ is energy, $m$ is mass, and $c$ is the speed of light. This equivalence specifically predicts a gravitational effect on zero-mass particles like photons due to the relation between an object’s mass and the force of gravity affecting it. This suggests that light should be deflected when passing by high mass objects (Choquet-Bruhat, 2023).

The force of gravity under general relativity is given by the equation:

$F_{f} (r) = - \frac{G M m}{r^{2}} + \frac{L^{2}}{m r^{3}} - \frac{3 G M L^{2}}{m c^{2} r^{4}}$

where $r$ is the distance between the two bodies, $L$ is the angular momentum, $G$ is the gravitational constant, $c$ is the speed of light, $m$ is the mass of the orbiting body, and $M$ is the mass of the larger body. The first term is essentially the same as the Newtonian equation for gravitational force.
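To give a sense of scale for the third term, its size relative to the first (Newtonian) term can be estimated. The working below is a rough sketch which assumes a near-circular orbit, so that $L \approx m v r$ where $v$ is the orbital speed. That approximation, and the figures of roughly $2.98 \times 10^{4}$ m/s for the Earth's orbital speed and $3.00 \times 10^{8}$ m/s for $c$, are assumptions introduced here for illustration.

$\frac{3 G M L^{2} / (m c^{2} r^{4})}{G M m / r^{2}} = \frac{3 L^{2}}{m^{2} c^{2} r^{2}} \approx \frac{3 v^{2}}{c^{2}} \approx 3 \left( \frac{2.98 \times 10^{4}}{3.00 \times 10^{8}} \right)^{2} \approx 3 \times 10^{-8}$

Under these assumptions, the relativistic correction for an Earth-like orbit is on the order of a few parts in a hundred million of the Newtonian term.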

Comparison

The predictions of Newtonian Mechanics are similar to those of General Relativity and give relatively accurate results for acceleration of objects moving slowly compared to the speed of light (Bekenstein, 1991). This means that in a setting like a simulation of a solar system, it is likely that Newtonian Mechanics will provide a quick and accurate solution where Relativity may lack efficiency for little benefit.

One more major difference between Relativity and Newtonian Mechanics is the difference between deflection of light under each model. Although Newtonian Mechanics can predict the deflection of light, by substituting Newton’s second law into the force equation under Newtonian Gravity to cancel the mass of the light and calculate an acceleration to deflect the light by, the result it predicts is half the result predicted by Relativity (Dyson, Eddington and Davidson, 1920). More recently however, this result has been disputed, suggesting that the Newtonian prediction was actually more accurate than the prediction under relativity unless a term is added which then makes Relativity only marginally more accurate (Xiaochun, 2022). Ultimately deflection of light is a very minor adjustment for an expensive set of calculations and will likely be irrelevant to a simulation of the kind this research is to support.

Due to the similarity in accuracy between the two models and the much lighter computational load of Newtonian Mechanics, this model will be taken to be the better one when informing the implementation of mass-based physics in video games.

Multiple-body Problem

The multiple-body problem is the name typically given to the problem of predicting the motion of multiple bodies (represented as particles) whose initial state is known, and which act only under the gravitational forces between them. The most trivial version of this is the two-body problem. Since Newton’s second law in respect to a body’s position and time is a second order differential equation, the two-body problem can be solved by adding and subtracting the equations for each body. An exact solution is given in which both bodies always remain on the same plane and the centre of mass between them is motionless (Peale, 2023).

When adding a third body or more, the problem becomes chaotic as there are too many unknowns to form a general solution. Various specific solutions have been developed however, such as the restricted three-body problem in which one of the bodies is treated as massless essentially expressing the problem as a two-body problem with an extra body. This is useful for cases such as the Sun, Earth and the Moon in which the Moon can be treated as massless since, relative to the other two, the Moon’s mass is small (Barrow-Green, 1993).

It has also been noted that at the scale of known solar systems with one star, in practice, the multiple body problem can be reduced down to a series of two-body problems with respect to the central star and each planet due to the immense distances between each planet. This gives a high degree of precision over a long period of time, although in a game setting, it could reduce the opportunities for emergent behaviour. For example, if there were a way to knock planets out of their orbits, the system losing that planet would not experience extreme side effects when the planets are distant enough to begin with which may be less fun from a player perspective (Peale, 2023).

Gravity in Games

The history of gravity in video games can be traced back as far as Tennis for Two (1958) as it reflects a tennis match played from the side. The addition of gravity to the game and others like it helped to add a level of relatability and interactivity for players despite the simplistic graphics (Wardrip-Fruin, 2019). Wardrip-Fruin points out that, aside from Pong (1972), many early games succeeded due to their implementation of gravity, with a major component in Pong’s success being its flipped perspective which invites the player to assume a gravity. This is reflected still in the gaming landscape today, in which the majority of video games implement gravity in some way, with the largest set of notable exceptions being in top-down 2D games.

Despite this, even today, most video games use implementations of gravity no more complex than those used in the 50s and 60s. These are implementations in which non-static objects are attracted in a set direction (generally directly downward like in Tennis for Two) or towards a specific point (such as the central star in Spacewar! (1962)). An example of a modern game that uses both of these classic methods is Lovers in a Dangerous Spacetime (2015): players control characters who are attracted downward towards the bottom of their spaceship; the spaceship is in turn attracted towards the centre of planets although only when within a specific range. This game also has an enemy that walks along the surface of planets, however as many of the planets generate no intrinsic gravity, this is implemented by using raycasts to hold the enemy next to the terrain with some additional logic to rotate it naturally (Winkels, 2014). This is a common flaw of this type of implementation of gravity: due to its over-simplified nature, many exceptions are needed instead of building from the same gravitational foundation.

Other games opt for a more complex approach such as Super Mario Galaxy (2007). Yoshiaki Koizumi, who was also the director of the Mario 128 (2000) technical demonstration for the GameCube, wanted to expand the idea from Mario 128, which took place on a flying saucer-shaped platform, into a game where Mario could walk freely around a spherical platform (Iwata, 2007). The solution the developers came up with to implement gravity in this way utilised a kind of container which would tell the player character how to react to gravity depending on which container he is within. These “gravity fields” came in a variety of different types including: spherical; parallel; cubic; toroid; conical; cylindrical; etc. (Jasper, 2020). Spherical and parallel fields represent the traditional two forms of gravity with spherical pulling inward towards a central point like in Spacewar! and parallel pulling in a set direction like in Tennis for Two. These two can also with varying degrees of success be recreated with mass-based gravity implementations as spherical gravity is the typical result of using mass-based gravity and on a large enough scale can appear parallel like in the real world. As a practical use in a game however, this form of parallel gravity could only really be used with a large star or black hole at the edge of a simulated universe or something similar, thus being less versatile than Super Mario Galaxy’s implementation. The only other of these fields that could potentially be recreated with mass-based gravity would be toroid gravity, although it would operate slightly differently as the gravity on the inner side of the torus would be weaker than the outer side due to the mass of the torus above. Overall, Super Mario Galaxy, and its sequel Super Mario Galaxy 2 (2010) are space games that opt for creativity over realism in their implementation of gravity. 
This makes them examples of a type of game where mass-based gravity would not be applicable but also highlights some of the interesting differences that mass-based gravity could cause such as gravity on toroid planets.

Outer Wilds (2020) is a video game set in space which uses Newtonian mechanics as the basis for its system of gravity and cosmology. The entire solar system is kept in orbit around the sun in real-time by its gravity with space being represented with no intrinsic friction or gravity, contrary to the way that many games approach space. This makes the game’s spaceflight less intuitive to players as simply ceasing acceleration does not cause a deceleration. It is therefore more realistic and dangerous. Due to the instability that gravity-based orbits and some of the other systems introduce to the simulation, it is only run for twenty minutes before resetting. This prevents the simulation from either devolving into chaos or reaching an unintended state of equilibrium (Beachum, 2013). A key difference between Newtonian mechanics and the implementation of gravity in Outer Wilds, is that most of the effects of gravity calculate the force with an inverse linear relationship to the distance instead of an inverse square relationship, with exceptions being from the sun or a specific moon which use inverse square. Practically, this means that the falloff for the force of gravity in most cases is considerably lower, giving more consistent gravitational forces at the scale of the game. The gravity from the sun is still inverse square as this allows for the elliptical orbits of the planets which otherwise would not orbit correctly. For the moon, this simply means that the player is less affected by its gravity when standing on the nearby planet due to the increased falloff (xen-42, 2023).
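The difference between the two falloff behaviours described above can be illustrated with a small sketch. The helper below is hypothetical and is not taken from Outer Wilds; `mu` stands in for the product of the gravitational constant and the source's mass, and a game using inverse-linear falloff would presumably tune this constant separately, since the units differ between the two cases.

```cpp
#include <cmath>

// Hypothetical helper: acceleration towards a source of strength mu at
// distance r (r > 0), with a configurable falloff exponent.
// exponent = 2 gives the Newtonian inverse-square relationship;
// exponent = 1 gives the inverse-linear relationship.
double falloffAcceleration(double mu, double r, double exponent) {
    return mu / std::pow(r, exponent);
}
```

With an exponent of 1, doubling the distance halves the acceleration, whereas with an exponent of 2 it is quartered, which is why the inverse-linear form gives more consistent forces over the short distances of a game world.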

The biggest change that Outer Wilds makes from real Newtonian mechanics is that it opts to have each planet ignore the gravitational effects of other planets, thereby completely circumventing the issue of the multiple-body problem and the instability that would cause. This makes planetary orbits more stable and also reduces the performance impact of each planet, allowing the planets to be closer to one another and still have orbits that remain stable over the twenty-minute cycle of the game (Torres, 2024). Overall, Outer Wilds provides a good starting point for how more realistic simulations of gravity can be used in video games, but it still uses many shortcuts and workarounds to provide a fun and playable world.

Optimisation

There is little literature about the optimisation of cosmological models for video games. As such the majority of this section of the review will focus on more general physical optimisations with suggestions as to how they could be changed to instead work for gravitational models.

The first technique to discuss in physical optimisations will be broad-phase collision checks. The fundamental idea behind these is to use simpler and less accurate collision checks to decide what is worth spending more time on for more accurate checks (Szauer, 2017). There are a variety of techniques involved in the broad phase which can generally be broken down into simplification of shapes, and spatial and temporal reasoning. Shape simplification is unlikely to be helpful in the case of gravity optimisations as gravity already has uniform effects in all directions. Temporal reasoning is also unlikely to be useful as it is mainly useful for static objects, which in a space simulation with Newtonian gravity, should not exist. The final option, spatial reasoning, is about narrowing down which objects could collide with each other through means such as axis separation (Cao and Wang, 2023). A modification of this idea could be used to decide which objects are light enough or far enough away to cause a negligible impact on the acceleration of other objects.

A notable form of spatial reasoning is binary space partitioning (BSP), particularly in the form of quadtrees or k-d trees. Binary space partitioning is the process of creating a binary tree by repeatedly subdividing a space into two smaller sets. The end result is a tree which can be very useful for sorting and searching data (Fuchs, Bruce and Naylor, 1980). Quadtrees are a type of BSP tree which split the data down the middle of the defined region on each 2-dimensional axis, creating 4 child branches with varying quantities of nodes on each branch. These do a good job at splitting up objects and are easy to modify at runtime, but have the disadvantage of being unbalanced, meaning that some nodes will take longer to reach due to being much more deeply embedded in the tree whilst some branches are completely empty. K-d trees are another form of BSP tree which instead find the median node and create branches either side of it in the specified dimension, cycling through each dimension. This method of partitioning the nodes results in a balanced tree which is generally better for searching through, but has the added complexity of finding the median node to sort around. A common optimisation for implementations with frequently changing data is to take a set number of randomly selected nodes and find the median of that set instead of the whole list. This still gives a fairly balanced tree most of the time but is less precise (Hristov, 2025).
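The median-split construction of a k-d tree described above can be sketched as follows. This is an illustrative example only: the `Point` and `KdNode` types and the `buildKdTree` and `height` functions are hypothetical names, and the sketch uses the exact median rather than the sampled median mentioned in the text.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

struct Point {
    double x;
    double y;
};

struct KdNode {
    Point point;
    std::unique_ptr<KdNode> left;
    std::unique_ptr<KdNode> right;
};

// Builds a balanced 2-d tree over points[first, last), splitting on x at
// even depths and on y at odd depths. std::nth_element partially sorts the
// range so that the median sits at index mid, with smaller coordinates
// before it and larger ones after it.
std::unique_ptr<KdNode> buildKdTree(std::vector<Point>& points,
                                    std::size_t first, std::size_t last,
                                    int depth = 0) {
    if (first >= last) return nullptr;
    const std::size_t mid = first + (last - first) / 2;
    const bool splitOnX = (depth % 2 == 0);
    std::nth_element(points.begin() + first, points.begin() + mid,
                     points.begin() + last,
                     [splitOnX](const Point& a, const Point& b) {
                         return splitOnX ? a.x < b.x : a.y < b.y;
                     });
    auto node = std::make_unique<KdNode>();
    node->point = points[mid];
    node->left = buildKdTree(points, first, mid, depth + 1);
    node->right = buildKdTree(points, mid + 1, last, depth + 1);
    return node;
}

// Height of the tree, useful for checking that it is balanced.
int height(const KdNode* node) {
    if (node == nullptr) return 0;
    return 1 + std::max(height(node->left.get()), height(node->right.get()));
}
```

Because each level splits its range in half, seven points produce a tree of height three, whereas a quadtree over the same points could be considerably deeper if they were clustered.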

Threading is another common method for optimisation. Threading involves passing parts of the codebase onto separate threads to create concurrent code with the end goal of parallelism, that is, for the CPU to be able to execute code simultaneously across multiple cores. Threading is highly applicable to various kinds of optimisation problems as it only requires a codebase which has operations that do not rely on each other to be processed (Reinders, 2007). It works particularly well in conjunction with BSP, as a number of regions within a BSP tree can be split onto separate threads to be processed concurrently. Often this can eliminate the possibility of race conditions (issues which are caused by multiple threads attempting to read or modify the same data at the same time due to their asynchronous nature) if the regions contain all of the data needed for their own calculations.
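One way to picture the independence requirement is a simple range split, sketched below with the standard `std::thread` class. The `parallelFor` helper is a hypothetical name introduced for illustration and does not reflect the artefact's threading code. Each thread is handed its own contiguous slice of indices, so provided the work function only writes to elements in its own slice, no two threads modify the same data.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Splits the index range [0, count) into contiguous chunks and runs
// work(begin, end) for each chunk on its own thread, then waits for all
// threads to finish. The work function must only write to data belonging
// to its own index range for this to be free of race conditions.
void parallelFor(std::size_t count, unsigned threadCount,
                 const std::function<void(std::size_t, std::size_t)>& work) {
    threadCount = std::max(1u, threadCount);
    const std::size_t chunk = (count + threadCount - 1) / threadCount;  // round up
    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < count; begin += chunk) {
        const std::size_t end = std::min(count, begin + chunk);
        threads.emplace_back(work, begin, end);
    }
    for (std::thread& t : threads) {
        t.join();
    }
}
```

In a gravity simulation, the shared input (positions and masses from the previous frame) could be read by every thread while each thread writes only the accelerations for its own slice of objects.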

N-body simulations on supercomputers have been carried out for decades leading to a wide variety of optimisations designed for supercomputers. Some of these optimisations may be applicable to the consumer market although most consumers will be unable to access computers with 32 processors like the supercomputers from the 90s (Xu, 1994) let alone the millions of processors typically used for n-body simulations now (Liu et al., 2025).

If the simulation is performed with each particle strictly affecting each other particle (known as Particle-Particle or PP), then the computational complexity is $O(N^{2})$. This will scale very quickly in terms of performance impact, leading researchers to come up with different methods to optimise the simulation at the cost of marginal simulation inaccuracy (Liu et al., 2025). The simplest of these methods is the Tree method which uses a quadtree or oct-tree to divide the bodies into boxes. At close distances, the particle interactions are calculated directly, whilst at further distances boxes are treated as particles with their position at their centre of mass instead of each particle being used. This reduces the computational complexity to $O(N \log N)$ (Barnes and Hut, 1986). Further methods exist such as the Particle-Mesh method which is less accurate at lower distances. This then led to the creation of the combined Particle-Particle-Particle-Mesh (P³M) method which over short distances uses PP for increased accuracy, and the Tree Particle-Mesh (TPM) method which similarly uses the Tree method at shorter distances. These methods are all generally more efficient than the Tree method but still have complexity $O(N \log N)$ (Xu, 1994).
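The step of treating a distant box as a single particle at its centre of mass can be sketched as below. The `Body` type and `centreOfMass` function are hypothetical names used for illustration; a tree implementation would typically cache this result on each box rather than recompute it for every query.

```cpp
#include <vector>

struct Body {
    double mass;
    double x;
    double y;
};

// Collapses a group of bodies into one pseudo-particle that carries the
// group's total mass, positioned at the mass-weighted average position.
// An empty or massless group yields a zero-mass body at the origin.
Body centreOfMass(const std::vector<Body>& group) {
    Body total{0.0, 0.0, 0.0};
    for (const Body& b : group) {
        total.mass += b.mass;
        total.x += b.mass * b.x;
        total.y += b.mass * b.y;
    }
    if (total.mass > 0.0) {
        total.x /= total.mass;
        total.y /= total.mass;
    }
    return total;
}
```

For example, a body of mass 1 at x = 0 and a body of mass 3 at x = 4 combine into a pseudo-particle of mass 4 at x = 3, so a distant object needs one force calculation instead of two.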

The Tree method seems the most sensible to test for application in games as it is the most similar to existing physical optimisations which are commonly used in the industry. Therefore, it could reasonably piggyback off collision optimisations which are likely to be included anyway due to the computational complexity of collisions, which are important to the majority of games. The Barnes-Hut method specifically uses what is essentially a broad-phase pruning method based on the ratio of a branch’s size to its distance. This could be configured to be more aggressive in games as it is not necessary for them to be as accurate as astrophysics simulations.
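The pruning test can be expressed in a single comparison, sketched below. The function name `canApproximate` and the parameter name `theta` for the accuracy threshold are assumptions made for this illustration.

```cpp
// Returns true when a branch is small enough, relative to its distance from
// the object being updated, to be treated as a single particle.
// branchSize is the side length of the branch's region and distance is the
// separation between the object and the branch's centre of mass.
// theta = 0 never approximates (equivalent to Particle-Particle);
// larger values of theta approximate more aggressively.
// Written as a multiplication so that a distance of zero is handled safely.
bool canApproximate(double branchSize, double distance, double theta) {
    return branchSize < theta * distance;
}
```

Raising the threshold is the knob a game could turn to trade accuracy for frame time, since more branches then pass the test and fewer individual objects are visited.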

Overall, this research suggests that the best techniques to apply to the problem are threading due to its broad use in optimising various problems, and the Tree-Particle method which combines a form of binary space partitioning with broad phase pruning.

Conclusion

Overall, the literature suggests that the best physical model for real-time simulation of multiple large, gravitational bodies would be Newtonian mechanics as it is sufficiently accurate to model the orbits of planets over a reasonable period of time whilst being computationally less complex than general relativity. The multiple-body problem is an issue which could introduce incalculable instabilities into orbits, causing models to collapse, which would lead to challenges in the design of games. This can be largely avoided by keeping the orbits of massive bodies far apart, leaving only bodies of very small mass to orbit as solitary moons, fulfilling the criteria to simplify to a two-body problem for each planet and the sun or a restricted three-body problem with a moon. For cases that do not obey these conditions, the only source of gravity for the planets can be set to be the star of a system, as is the case with Outer Wilds. Finally, optimisation of these kinds of models within real-time games is something of a gap in the literature, but promising techniques include threading and binary space partitioning using the Tree-Particle method.

Research Methodologies

Primary research was undertaken mainly to address RQ3, as real-time simulation on consumer-grade technology was a gap in the literature. Primary testing allows the generation of data about how Newtonian gravity could be implemented in a video game with reasonable frame rates. Based on the results from the literature review, it was decided that the artefact would be based on Newtonian Mechanics to inform gravity calculations. This means that instead of providing a constant value for gravity along the y-axis, every frame the gravity would need to be recalculated with every object in the scene. The artefact would then also implement the Barnes-Hut Tree-Particle (TP) method of optimisation for the n-body gravity calculations before threading them, as informed by the literature review.

The artefact was created in C++ and implemented acceleration based on Newton’s law of gravitation. Each object would calculate its acceleration on each frame by initially looping through a vector of every other object in the scene and using the equation

$a = G \frac{m_{2}}{r^{2}}$

where $a$ is the resultant partial acceleration, $G$ is the gravitational constant, $m_{2}$ is the mass of the other object, and $r$ is the distance between the two objects. The result of each of these calculations would then be used to calculate the total acceleration for each object. This is the Particle-Particle method (PP) of n-body calculation. After the acceleration is calculated, it is applied to modify the object’s velocity and its position. To test that this worked, 11 initial objects were added to the scene: a sun and ten planets. Six of these planets had masses, positions, and velocities based on the planets from Mercury to Saturn and the sun was based on the Sun. Four additional planets were added between the planets based on Mars and Jupiter due to their distance apart. This all ran well with stable orbits when tested running for multiple hours. Then 2000 small asteroids were added to create lag. Profiling data was collected using Optick to ascertain the most costly functions and display the frame times. Unsurprisingly, the gravity calculations were shown to take up the majority of the processing time meaning that optimisations should target this. Following the research gathered during the literature review, a Tree-Particle (TP) optimisation using a quad tree as its basis was chosen as the next optimisation to implement.
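The Particle-Particle loop described above can be sketched as follows. This is not the artefact's code: the `Body` type, the `stepParticleParticle` name, and the choice to update velocity before position within the same step are assumptions made for illustration, and the guard for coincident objects is an addition to avoid dividing by zero.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kG = 6.673e-11;  // gravitational constant, N m^2 kg^-2

struct Body {
    double mass;
    double x, y;    // position
    double vx, vy;  // velocity
};

// One Particle-Particle step: every body sums a = G * m2 / r^2 over every
// other body, then all velocities and positions are advanced by dt seconds.
// Accelerations are accumulated first so that every body sees the same
// positions from the start of the step.
void stepParticleParticle(std::vector<Body>& bodies, double dt) {
    std::vector<double> ax(bodies.size(), 0.0);
    std::vector<double> ay(bodies.size(), 0.0);
    for (std::size_t i = 0; i < bodies.size(); ++i) {
        for (std::size_t j = 0; j < bodies.size(); ++j) {
            if (i == j) continue;
            const double dx = bodies[j].x - bodies[i].x;
            const double dy = bodies[j].y - bodies[i].y;
            const double r2 = dx * dx + dy * dy;
            if (r2 == 0.0) continue;  // coincident bodies: skip rather than divide by zero
            const double r = std::sqrt(r2);
            const double a = kG * bodies[j].mass / r2;  // partial acceleration magnitude
            ax[i] += a * dx / r;
            ay[i] += a * dy / r;
        }
    }
    for (std::size_t i = 0; i < bodies.size(); ++i) {
        bodies[i].vx += ax[i] * dt;
        bodies[i].vy += ay[i] * dt;
        bodies[i].x += bodies[i].vx * dt;
        bodies[i].y += bodies[i].vy * dt;
    }
}
```

The nested loop is where the $O(N^{2})$ cost comes from: with 2011 objects, each step evaluates roughly four million pairwise interactions.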

To begin this optimisation, a quad tree implementation was constructed where each branch stored a vector of the objects within its bounds. Vectors were used as they contain dynamically sized memory which is stored contiguously to improve cache usage. The basic structure of the quad tree is constructed from the main function, which calls a constructor for the root node, passing its position, size, and the vector of all of the objects within the scene. Then on its first update loop (controlled with a Boolean flag that records whether the branch has been built), it recursively builds the tree: at each stage, each of the branch’s objects is tested to see whether it would fit completely within one of the quadrants of the region. If any would, a child branch is constructed for each quadrant that has objects in it, and each child is then built in the same way.
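The "fits completely into a quadrant" test can be sketched as below. The `Region` type, the quadrant numbering, and the treatment of each object as a circle with a radius are assumptions for illustration; the artefact's own bounds representation is not specified in the text.

```cpp
#include <cassert>

// Hypothetical square region described by its centre and half-width.
struct Region { double cx, cy, half; };

// Child quadrant i of a region: bit 0 selects +x, bit 1 selects +y.
Region childRegion(const Region& r, int i) {
    assert(r.half > 0.0);
    const double q = r.half / 2.0;
    return { r.cx + ((i & 1) ? q : -q), r.cy + ((i & 2) ? q : -q), q };
}

// True when a circle (x, y, radius) lies completely inside the region.
bool fitsInside(const Region& r, double x, double y, double radius) {
    return x - radius >= r.cx - r.half && x + radius <= r.cx + r.half &&
           y - radius >= r.cy - r.half && y + radius <= r.cy + r.half;
}

// Index of the quadrant that fully contains the object, or -1 if it
// straddles a boundary and must therefore stay on the current branch.
int quadrantFor(const Region& r, double x, double y, double radius) {
    for (int i = 0; i < 4; ++i) {
        if (fitsInside(childRegion(r, i), x, y, radius)) return i;
    }
    return -1;
}
```

An object for which `quadrantFor` returns -1 stays in the current branch's vector, which is why branches above the leaves can hold objects of their own.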

After this first build, update is called again. It first calls update on each of its children so that the lowest-level branches are updated first. Lifespan and max lifespan values are used to test whether branches are still in use: each frame that a branch is empty, its lifespan is decreased, and if it reaches 0 the branch is pruned. If the branch stops being empty, its max lifespan is doubled and its lifespan is reset, which keeps frequently used nodes alive longer and reduces repeated allocation and deallocation of memory. The next step in updating the tree is to loop through each branch’s objects and check whether they are still within the bounds of the node. If so, an attempt is made to move the object deeper into the tree; otherwise it stays on the same branch. If not, the object is moved up the tree until it is contained, at which point an attempt is made to move it down the tree as before. An object that is outside the bounds of the whole tree is left on the root. This does cause performance issues as more objects are ejected from the system and accumulate on the root; however, it demonstrates that the early performance gain does not come from deleting ejected objects. A production implementation should either delete these ejected objects or handle them in another way that suits the design of the game.
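The lifespan rule for pruning branches can be sketched as a small state machine. The starting value of 8 frames, the `wasEmpty` flag, and the names used here are assumptions for illustration only.

```cpp
#include <cassert>

// Hypothetical per-branch lifetime bookkeeping.
struct BranchLife {
    int maxLifespan = 8;    // assumed starting value, in frames
    int lifespan = 8;
    bool wasEmpty = false;  // true while the countdown is running

    // Call once per frame; returns true when the branch should be pruned.
    bool update(bool isEmpty) {
        assert(maxLifespan > 0);
        if (isEmpty) {
            wasEmpty = true;
            return --lifespan <= 0;
        }
        if (wasEmpty) {
            // The branch came back into use: keep it alive longer next time.
            maxLifespan *= 2;
            wasEmpty = false;
        }
        lifespan = maxLifespan;
        return false;
    }
};
```

A practical implementation would probably cap `maxLifespan`, since unbounded doubling eventually overflows; the text does not say whether the artefact does so.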

Finally, update calls the update function of each object on the branch. This object update is modified to remove the gravity acceleration calculation, which is instead implemented as described below.

This quad tree was then used to implement the Barnes-Hut algorithm (Barnes and Hut, 1986) as a Tree-Particle optimisation. The Barnes-Hut method was chosen as it was shown to reduce the computational complexity from $O(N^{2})$ to $O(N \log N)$ and to be more similar to existing game optimisations than other methods which achieve the same computational efficiency. For each object, this meant starting at the top of the tree and comparing the ratio $s / d$ to a parameter $θ$, where $s$ is the branch's width and $d$ is the distance from the object to the centre of mass of the branch. $θ$ is a tunable parameter which was tested at several values: the lower it is, the more precise the calculations should be and therefore the less optimised. If $s / d$ is greater than $θ$, each object on that branch is used to calculate gravity with the original object as usual and each child branch is tested in the same way. Otherwise, the whole branch is treated as a single object with a position at the centre of mass of all of its child nodes and objects and a combined mass of all of them. After implementing the Barnes-Hut algorithm, profiling data still showed the gravity calculations to be the primary bottleneck, although this technique vastly improved frame times.
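The traversal described above can be sketched as a recursive function. The `Node` layout, the names, and the use of a zero-distance check to skip the object's own contribution are assumptions for illustration. The sketch also assumes that each node's centre of mass and total mass have already been computed.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <memory>
#include <vector>

struct Body { double x, y, mass; };

// Hypothetical quad tree node with precomputed mass summary.
struct Node {
    double width = 0.0;             // s: width of the branch
    double comX = 0.0, comY = 0.0;  // centre of mass of everything at or below this node
    double totalMass = 0.0;
    std::vector<Body> bodies;       // objects held directly on this branch
    std::array<std::unique_ptr<Node>, 4> children;
};

struct Accel { double ax = 0.0, ay = 0.0; };

// Add the pull of a point mass at (srcX, srcY) on a target at (x, y).
void addPull(Accel& out, double x, double y,
             double srcX, double srcY, double srcMass, double G) {
    const double dx = srcX - x, dy = srcY - y;
    const double r2 = dx * dx + dy * dy;
    if (r2 == 0.0) return;  // same position: treated as the target itself
    const double r = std::sqrt(r2);
    const double a = G * srcMass / r2;
    out.ax += a * dx / r;
    out.ay += a * dy / r;
}

// Barnes-Hut walk: approximate a branch when s / d <= theta, otherwise open it.
void accumulate(const Node& node, double x, double y,
                double theta, double G, Accel& out) {
    assert(theta >= 0.0);
    if (node.totalMass == 0.0) return;
    const double d = std::hypot(node.comX - x, node.comY - y);
    if (d > 0.0 && node.width / d <= theta) {
        addPull(out, x, y, node.comX, node.comY, node.totalMass, G);
        return;
    }
    for (const Body& b : node.bodies) addPull(out, x, y, b.x, b.y, b.mass, G);
    for (const auto& child : node.children) {
        if (child) accumulate(*child, x, y, theta, G, out);
    }
}
```

For two unit masses at distances 9 and 11 from the target, the approximated branch gives $2 / 10^{2} = 0.02$ while the exact sum gives $1/81 + 1/121 \approx 0.0206$, which illustrates the accuracy traded away at higher $θ$.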

The next optimisation was threading. In the artefact this was done by separating the update loop of the quad tree into three sections. The first handled the logic for recalculating where each object should be situated on the quad tree, the second contained the acceleration calculations due to gravity, and the final section updated each object to apply its acceleration to its velocity and that in turn to its position. Doing this decoupled the latter two sections such that none of the information required to calculate the acceleration would be altered by the acceleration calculations, and likewise for the object updates. This meant that both sections could easily be threaded, if necessary, without risk of race conditions. Since the gravity calculations were taking the most processing time, this was the section that was threaded. Logic was added to check whether the node was the root of the tree. If so, instead of simply calling its children’s gravity functions, they were set off on separate threads. Two additional variations were created: one additionally threaded the grandchildren of the root by launching threads from the initial child nodes as well, and the other threaded only the grandchildren. These use up to 20 and 16 threads respectively. Each of these three variations was then tested at each of the thresholds used previously to analyse the effect of threading differing amounts at different thresholds.
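The root-level variation can be sketched with `std::thread` as below. This is an assumed structure, not the artefact's code: the `Node` type and function names are hypothetical, and the `processed` field stands in for the real gravity calculation so that the example stays self-contained.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical node; `processed` is a placeholder for per-object results.
struct Node {
    std::array<std::unique_ptr<Node>, 4> children;
    int objects = 0;
    int processed = 0;
};

// Serial pass: each call writes only to the node it visits, so subtrees
// handled on different threads never touch the same data.
void updateAccelerations(Node& node) {
    node.processed = node.objects;  // stand-in for the gravity calculation
    for (auto& child : node.children) {
        if (child) updateAccelerations(*child);
    }
}

// Root-only threading: one thread per existing child of the root (up to 4).
void updateAccelerationsFromRoot(Node& root) {
    std::vector<std::thread> workers;
    for (auto& child : root.children) {
        if (child) workers.emplace_back(updateAccelerations, std::ref(*child));
    }
    root.processed = root.objects;  // the root's own objects run on the calling thread
    for (std::thread& t : workers) {
        assert(t.joinable());
        t.join();
    }
}
```

Joining every worker before returning ensures that the later object-update section never starts while accelerations are still being written. Creating threads every frame has a cost of its own, so a thread pool is a common alternative, although the text does not say which approach the artefact takes.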

Results and Findings

Introduction

For each optimisation technique (including the control), three types of results will be discussed:

Qualitative data regarding how the artefact visibly appears.

Data taken from Optick, which provides a breakdown of how long the tracked functions take in a single profiled frame (the last frame profiled). This will primarily focus on the percentage of the frame time taken by each function, to show which functions are bottlenecks for the program.

Frame times to the nearest millisecond as recorded by Optick. Roughly 100 frames will be profiled from close to the start of the simulation. Of these frames, the last 40 will be recorded. This should give the program time to settle as the first frames may have additional overhead, especially in setting up quad trees which won’t typically affect performance later on. These later frames should be more indicative of how the program will appear at runtime.

Control

Implementing Newtonian Mechanics to simulate gravity provided an extensible framework: simply adding an object with a specified mass, velocity, and position to the vector would cause the object to begin orbiting the other objects in the scene. If these values were just right, the new object would be placed in a stable orbit about the other objects in the scene, otherwise over time it would be ejected from the system. Although the system used n-body simulation, the orbits of these planets were stable. Beyond these, the 2000 asteroids in the model caused significant visual lag in the control. This lag did not cause the planetary orbits to be unstable even running for multiple hours.

Table 1 shows that at this stage, the gravity calculations take up around 96.8% of the frame time. This places it as the primary bottleneck for performance as the vast majority of frame time is spent at this stage.

Table 1. Percentage of processing time used by various functions for the control

Table 2 shows the average, spread, and bounds of the frame times, using multiple measures of each. For the control there is a high level of consistency between the measures of average. As the mean is slightly higher than the median and the mode, this indicates a slight positive skew, meaning that outliers fall above the average. This can be seen in the bounds of the data: the minimum frame time was only 2 ms below the lower quartile, whereas the maximum was 9 ms above the upper quartile. The interquartile range and standard deviation are both low relative to the scale of the data, being over 2 orders of magnitude smaller than the averages. This shows that the frame times were consistent overall. The average frame times here convert to a little over 3 frames per second, which is roughly a twentieth of the desired frame rate.
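The summary measures used in this section can be computed from a list of frame times as sketched below. The function names are hypothetical, the standard deviation shown is the population form (the text does not say which form was used), and any sample values passed in are illustrative rather than the recorded data.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of the frame times (milliseconds).
double mean(const std::vector<double>& v) {
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// Median; takes a copy so the caller's ordering is preserved.
double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

// Population standard deviation (an assumption; a sample form divides by n - 1).
double stddev(const std::vector<double>& v) {
    const double m = mean(v);
    double sumSquares = 0.0;
    for (double x : v) sumSquares += (x - m) * (x - m);
    return std::sqrt(sumSquares / static_cast<double>(v.size()));
}

// Convert a frame time in milliseconds to frames per second.
double toFps(double milliseconds) {
    assert(milliseconds > 0.0);
    return 1000.0 / milliseconds;
}
```

A mean above the median, as reported for the control, is the signature of the slight positive skew described above: a few slow frames pull the mean upwards while leaving the median unchanged.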

Tree-particle Optimisation

The Tree-Particle method introduces a threshold value $θ$. If the ratio between the size of a branch and the distance from the branch’s centre of mass to the object for which gravity is being calculated is lower than this value, the whole branch is treated as a single object; otherwise each child branch is considered and the acceleration due to any objects on the branch is calculated using the Particle-Particle (PP) method. Therefore, at lower values of $θ$, there is a higher accuracy of simulation but lower performance, as more of the calculations use PP.

This difference can be seen in figures 1, 2 and 3 as the pattern of asteroid scattering around the white planet is marginally different: gravity seems to hold some asteroids closer, ejecting them quickly one by one at the 0.7 threshold, whilst at 1 and 1.5 there appears to be a slow ejection that happens to a group of asteroids all at once. The simulation is also visibly running at a much higher frame rate with this optimisation when compared to the control (note that the gifs shown below are not representative of the true frame rate as they are limited to 30 fps).

Fig. 1. Asteroid scattering around a planet at the 0.7 threshold

Fig. 2. Asteroid scattering around a planet at the 1 threshold

Fig. 3. Asteroid scattering around a planet at the 1.5 threshold

Table 3 reveals that although the calculation of acceleration due to gravity still takes up most of the frame time for TP, at about 62.9%, maintaining the tree now also takes a significant amount of frame time: the insert function takes 9.8% of the frame and the quad tree’s update takes another 15.8%. Although the insert and calculate acceleration functions are called from within the update function, the percentage shown for update excludes their time because they are profiled separately, which better represents where the processing time is actually being used. The insert function is listed separately from update as it accounts for a significant part of that processing time and could therefore become a target for further optimisation. Together these three functions account for about 89% of the total frame time (62.9% + 9.8% + 15.8% = 88.5%), compared with the roughly 96.8% spent on gravity calculations in the control, and part of that 89% is the overhead of using a quad tree rather than the gravity calculation itself.

Table 3. Percentage of processing time used by various functions for TP at threshold 0.7

The breakdowns for the other two thresholds reveal much the same thing so are not included in this section, though they can be found in appendix 2.

Table 4 shows the average, spread, and bounds of the frame times as for the control, this time for TP at all three thresholds. As with the control, there is a high level of consistency between the measures of average for each threshold. All three thresholds again show a slight positive skew. Relative to the scale of the data, the interquartile range and standard deviation are not as low as with the control. This indicates that the frame times were slightly less consistent overall, with the optimisation at threshold 1 being almost twice as consistent as the others based on its standard deviation and range, or 4 times as consistent based on its IQR. The average frame times at threshold 0.7 are roughly 42 to 44 frames per second, which is 14 times better than the control for the most accurate version that was tested. At threshold 1, the frame rate is about 58 to 59 frames per second, almost 20 times better than the control. At threshold 1.5, the frame rate is roughly 79 to 83 frames per second, which is 26 to 27 times better than the control.

Fig. 4 shows the frame times across 40 frames of the TP optimisations against the control on a line graph to visually display the significance of the improvement that TP provides. Fig. 5 shows the same information without the control so that the differences between the thresholds can be better appreciated. From this, it is clear that higher thresholds result in a direct performance improvement, with only slight crossover between thresholds 1 and 1.5 on the first frame profiled.

Threading

The three different threading implementations are as follows:

The tree root executes the update accelerations event of its 4 child branches (if they exist) on threads. This will be referred to as the 4-thread method for conciseness going forward.
Each of the 4 children of the root node executes the update accelerations event of each of its own 4 children on threads where possible. As this results in a maximum of 16 threads being used, this will be referred to as the 16-thread method.
Both of the above methods are combined, generating up to 20 threads. This will be known as the 20-thread method.

Since figure 5 suggests a strong inverse relationship between the threshold and frame times, this section will look at the different threading techniques on a per threshold basis instead of a per thread implementation basis. It is assumed that each implementation of threading will have a better performance with a higher threshold as before.

The visible performance of each threading technique is very similar to that of the Tree-Particle method at the equivalent threshold so each will not be reported in further depth.

After adding threading, the quad tree update is split into three functions: one to update the accelerations, one to update the tree itself, and one to call the planet updates. Although these together make up what was previously the update function, the time reported for the update accelerations function is comparable to that previously reported for the calculate acceleration function. This is because limitations with Optick make it difficult to profile calculate acceleration directly when it is not on the main thread. This is important to know when viewing the results in tables 5, 6, 8, 9, 11, and 12.

Threading at Threshold 0.7

Tables 5 and 6 show the function breakdowns for the 4-thread and 20-thread methods (note that the 16-thread method is not shown as it was found to be the worst method across all three thresholds). These tables reveal a continuing trend that sees the acceleration calculations becoming a lower percentage of the total frame time as the program gets optimised to higher frame rates (in this case 20-thread has a higher frame rate as shown in table 7). Notably, most of the functions take the same proportion of the frame to within a percent, except for the update accelerations function. This shows that the actual improvement between them lies in that function, with the other functions all taking slightly more of the total frame time as a result.

Table 5. Percentage of processing time used by various functions for 4-thread at threshold 0.7

Table 6. Percentage of processing time used by various functions for 20-thread at threshold 0.7

Table 7 compares the averages, spreads, and bounds of the frame times for each threading implementation at the 0.7 threshold. At this threshold, 16-thread is shown to have both the highest average frame time across all three measures of average, and the largest spread across all three measures of spread. The best method at this threshold is shown to be 20-thread. It has the lowest average frame time across all three measures and is more consistent based on both its range and standard deviation, although the interquartile range shows 4-thread to be more consistent. Like all of the data so far, these three implementations have slight positive skew at this threshold. All three of these are more optimised than just the Tree-Particle method at this threshold, although 16-thread is not much of an improvement. The average frame rate of 4-thread is between 61 and 63 frames per second. The average for 16-thread is between 48 and 50 frames per second. 20-thread’s average frame rate is between 65 and 67 frames per second.

These three optimisations are plotted alongside TP on a line graph in figure 6. This shows how each improves upon TP with 16-thread briefly fluctuating above TP, and 4-thread briefly fluctuating above 16 and below 20 at its extremes.

Threading at Threshold 1

Tables 8 and 9 show the function breakdowns for 4 and 20 threads respectively. This time the 4-thread update accelerations function takes up a lower percentage of the frame time as it is more optimal (as shown in table 10). The other three functions in the quad tree class together take up a higher proportion of the frame time than the acceleration calculations.

Table 8. Percentage of processing time used by various functions for 4-thread at threshold 1

Table 9. Percentage of processing time used by various functions for 20-thread at threshold 1

Table 10 shows that again, the 16-thread implementation is the worst of the threading optimisations, and this time performs even worse than unthreaded TP. 4-thread is this time more optimal, although 20-thread is still the most consistent. Whilst 16 and 20 threads both still have slight positive skew, 4-thread this time has slight negative skew. 4-thread operates at about 77 to 79 frames per second on average. 16-thread achieves between 52 and 53 frames per second. 20-thread operates at around 70 to 71 frames per second.

Figure 7 compares the three techniques with TP on a graph, showing that 16-thread is now the worst including TP, whilst 4-thread is the best. There is very little crossover between lines.

Threading at Threshold 1.5

Tables 11 and 12 show the function breakdowns of 4-thread and 20-thread at threshold 1.5. As with threshold 1, 4-thread has the lower proportion being taken up by update accelerations, which reflects its more optimal frame time as shown by table 13. Both implementations now spend more time in total elsewhere in the quad tree class than on the update accelerations function.

Table 11. Percentage of processing time used by various functions for 4-thread at threshold 1.5

Table 12. Percentage of processing time used by various functions for 20-thread at threshold 1.5

Table 13 shows that again, the 16-thread implementation is the worst of the threading optimisations and again performs even worse than unthreaded TP. 4-thread is again more optimal, and this time very marginally more consistent than 20-thread. All three are back to having slight positive skew. 4-thread has an average frame rate of between 96 and 100 frames per second. 16-thread runs at around 71 frames per second. 20-thread operates between 87 and 91 frames per second on average.

Figure 8 shows the three methods on a line graph with TP. TP, 4-thread, and 20-thread are very close together the whole time, whilst 16-thread is slightly more separate.

Discussion and Analysis

Physical Model (Objective 1)

An analysis of the literature found Newtonian Mechanics to be the best physical model based on the aims of this paper, as it was found to be almost as accurate as General Relativity for simulating gravity over large periods of time whilst being much simpler both computationally and conceptually. This let it strike the right balance between providing additional realism and scalability over Classical Mechanics, whilst being simple enough to conceivably run in real time.

An artefact was created which then implemented Newtonian mechanics as informed by the literature. This was overall simple to implement as the most complicated part was looping through a vector of objects and applying the following equation (a more detailed breakdown of this equation is in the research methodologies section):

$a = G \frac{m_{2}}{r^{2}}$

This implementation worked well with a small number of objects as it made the creation of a solar system with a sun and ten planets trivial. The only limiting factors to adding more objects at this point were finding a stable orbit, if required, and the processing time for the gravity equations due to their $O(N^{2})$ complexity. Finding stable orbits was easily solved for this implementation by using values for mass, position, and initial velocity based on the Sun and six of the planets in the Solar System (Mercury to Saturn). As the gap between the planets based on Mars and Jupiter was noticeably larger than the others, it was then easy to create four additional planets between them with stable orbits by choosing arbitrary values for mass, position, and velocity between those of the planets based on Mars and Jupiter. The problem of performance leads into objective 2.

Optimisation Techniques (Objective 2)

The other main result taken from the literature review was the set of promising optimisation methods to attempt in the artefact. Two main techniques were highlighted:

The Barnes-Hut Tree-Particle method to specifically target optimisation of the n-body interactions. This was selected as it has been shown to take n-body interactions from a complexity of $O(N^{2})$ to a complexity of $O(N \log N)$ (Barnes and Hut, 1986).
Threading which was found to be a useful generalised optimisation tool which can work across a variety of situations by parallelising logic (Reinders, 2007).

Tree-Particle

The Tree-Particle method was shown to provide a performance boost of between 14 and 27 times depending on the threshold used. This supports the prior research which suggested that it was a good technique for optimising n-body problems. This technique demonstrated an impressive performance across three differing degrees of accuracy.

Threading

The increase in performance due to the threading techniques was less impressive when measured against pure TP, the baseline that they were all built upon. That being said, the optimisation was, in most cases, still noticeable.

Threading the children of the tree root always provided a performance benefit over equivalent techniques which didn’t thread them. This can be seen when comparing 4-thread to unthreaded TP showing that at all three thresholds, 4-thread outperformed unthreaded. This can also be seen when comparing 20-thread which threads the children to 16-thread which doesn’t.

Threading the grandchildren of the root, on the other hand, was shown to only provide a performance benefit compared to the equivalent technique which doesn’t, when at the 0.7 threshold. 16-thread only outperformed unthreaded at this threshold and similarly, 20-thread outperformed 4-thread at only this threshold. The performance loss can also be shown to be greater at 1.5 compared to 1. This can be assumed to extrapolate, with 20-thread becoming the best optimisation of those tested by a higher margin as the threshold gets lower. A likely cause for this is that at lower thresholds, the TP method is expected to perform the more accurate PP calculations further down the tree which makes the threads of greater benefit compared to their overhead.

Following this reasoning, it can also be assumed that at a high enough threshold, threading at all in the way done in the artefact will result in a performance loss (more research would need to be done to confirm this however). The opposite of this can also be assumed: that at lower thresholds more consecutive levels of branches could be threaded to improve performance. Due to the way that threading operates across cores however, this is likely to reach an upper limit based on hardware limitations. Since consumer PCs typically have between 4 and 8 cores, it is likely that threading another level would provide a negative performance impact as it would need 64 threads for the additional level alone.
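As a worked count of the threads involved, assuming every branch down to the threaded depth has all four children, the number of threads launched at depth $k$ below the root (written here as $T_{k}$, a symbol introduced only for this illustration) is

$T_{k} = 4^{k}$

so $T_{1} = 4$ (the 4-thread method), $T_{2} = 16$ (the 16-thread method), and $T_{1} + T_{2} = 20$ (the 20-thread method). Threading one further level would add $T_{3} = 64$ threads, giving $T_{1} + T_{2} + T_{3} = 84$ if all three levels were threaded, which is an order of magnitude more than the 4 to 8 cores typical of consumer PCs.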

This way, threading can be shown to be a more nuanced optimisation which may not always be the best choice, but if implemented correctly, it can still result in a performance benefit.

The percentage breakdowns of frame times show that, once threading is applied, the time spent updating the quad tree and inserting elements into it reaches a significant proportion of the total frame time. Further optimisations could therefore target these functions as well as the gravity calculations. Threading in particular could be explored to parallelise some of this work, although there would be more risk of race conditions because changes to the tree depend more heavily on being made in a certain order.

Frame Rates (Objective 3)

Control

The control’s frame rate of roughly 3 frames per second displays the need for the optimisation techniques as it is unlikely that any player would be willing to play a game which runs so slowly. The industry standard that most players will be used to is 60 frames per second in AAA games and typically either 60 or 30 frames per second in indie games so 3 frames per second would be a huge downgrade.

Tree-Particle

The Tree-Particle optimisation at threshold 1.5 is already consistently over the target of 60 frames per second with the frame rate dropping to a minimum of 59 frames per second on only the longest frame. It should be noted that the frame rates for each of these optimisation techniques are not recorded precisely, only showing integer amounts of milliseconds. In this case, it could mean that the longest recorded frame which was reported at 17 ms is actually below the approximate 16.67 ms that would indicate the frame rate never dropping below 60 fps in the profiled time. Depending on the rounding that is used, the inverse of this could be true and the frame that was recorded at 16 ms could have been floored from above 16.67 ms making the frame rate marginally less consistently above 60 fps.
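To make the size of this uncertainty concrete, the following is a worked bound rather than a measured result. If Optick rounds to the nearest millisecond (an assumption), a frame reported as 17 ms has a true frame time $t$ with $16.5 \le t < 17.5$, so its frame rate $f = 1000 / t$ satisfies

$57.1 \approx \frac{1000}{17.5} < f \le \frac{1000}{16.5} \approx 60.6$

If the value is instead floored (also an assumption), a frame reported as 16 ms has $16 \le t < 17$, giving

$58.8 \approx \frac{1000}{17} < f \le \frac{1000}{16} = 62.5$

In both cases the interval straddles 60 fps, so integer-millisecond data cannot settle whether such a frame met the target.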

The frame rate for threshold 1 is marginally below 60 fps on average. This means that although it would likely be acceptable for a commercial game, it does not meet the stricter requirement of objective 3 and therefore more optimisation is necessary.

Threshold 0.7 gives a frame rate that would be acceptable in an indie game but would likely cause complaints if featured in a commercial game and is well below the target for the paper which again means that further optimisation is required.

Threading

At threshold 0.7

At this threshold, the threading technique with the highest frame rate was 20-thread which only at its highest frame time dropped below 60 frames per second. This frame time as mentioned above, fell into the margin of error which could have still been at or above 60 frames per second. This means that even at the threshold that prioritises accuracy over performance the most out of those tested, a method has been found to give a consistent 60 frames per second.

4-thread also achieved a consistent 60 frames per second with only frame times beyond the upper quartile being slower than this. At this threshold 20-thread is still better and leaves more space for further bottlenecks added by other game mechanics, but 4-thread may run better on PCs with fewer cores (although further research would be necessary to prove this).

16-thread, despite being more optimised than the TP method, does not reach the target set forth by objective 3. It also utilises more threads than 4-thread, which makes it likely to run worse on lower-end PCs and ultimately unnecessary for any purpose.

At threshold 1

4-thread overtakes 20-thread for the highest frame rate at this threshold, although both are comfortably above the target and never dropped below 60 frames per second in the profiled time. 20-thread and 16-thread are unnecessary at this threshold as both perform worse than the equivalent optimisations which don’t thread the grandchildren of the tree root and, as they use more threads, they are likely to perform worse on lower-spec hardware.

At threshold 1.5

At a threshold of 1.5, the 4-thread optimisation method once again outperforms the competition, as it did at the threshold of 1. 16-thread once again fails to beat unthreaded TP. Notably at this threshold, 4-thread performs over 1.5 times faster than the target, leaving plenty of time for other features of a game to execute without dropping below 60 frames per second.

Comparison

Since each threshold is a distinct level of accuracy, the best at each threshold is notable. Through this lens, the best optimisations are:

20-thread at threshold 0.7. This likely extends to even more accurate thresholds. 4-thread is still notable here as it still reaches a consistent 60 frames per second and is likely to outperform 20-thread on hardware with fewer cores.
4-thread at thresholds 1 and 1.5. This is likely to extend to less precise thresholds, although it may eventually be beaten by unthreaded TP as less of the tree gets traversed.

Limitations

The methodology of this paper did not include player experience surveys which means that the answer to “should games be made this way” is still left unanswered. This further leads into the optimisation techniques where due to a lack of survey, it is unclear which threshold for the Tree-Particle method is accurate enough.

The method of data collection was imprecise. No good way was found to obtain results from the same range of frames during each run of the simulation. Further research could use a bespoke method to record data to a CSV file. This could also provide a higher precision in frame times, as data recorded by Optick was displayed only to integer values. Using an external profiler could additionally cause lag and may lead to incorrect values being recorded. The methodology could also be improved by recording frame times from multiple runs of the simulation and across multiple systems.
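A bespoke recorder of the kind suggested above could be sketched with `std::chrono` as follows. The class name, its interface, and the CSV column names are hypothetical; the sketch only shows that sub-millisecond frame times can be captured in-process and exported as text.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-process frame timer that keeps fractional milliseconds.
class FrameTimer {
public:
    using Clock = std::chrono::steady_clock;

    void beginFrame() {
        start_ = Clock::now();
        started_ = true;
    }

    void endFrame() {
        assert(started_);  // beginFrame must be called first
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start_;
        samples_.push_back(elapsed.count());
        started_ = false;
    }

    const std::vector<double>& samples() const { return samples_; }

    // Render the recorded samples as CSV text, one frame per row.
    std::string csv() const {
        std::ostringstream out;
        out << "frame,milliseconds\n";
        for (std::size_t i = 0; i < samples_.size(); ++i) {
            out << i << ',' << samples_[i] << '\n';
        }
        return out.str();
    }

private:
    Clock::time_point start_;
    bool started_ = false;
    std::vector<double> samples_;
};
```

Calling `beginFrame` and `endFrame` around the main loop body and writing `csv()` to a file at shutdown would allow the same frame range to be compared across runs, since frames are indexed from the start of the simulation.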

The testing performed overall was useful as a starting point but limited: further research should test more methods over more simulations to get results with a higher degree of confidence across different scenarios such as testing on PCs at the lower end of commercial specifications.