‘Ground-Up’ Solution to the Brachistochrone Problem

Abstract

Existing treatments of the brachistochrone problem often appeal to concepts such as Fermat’s principle and energy conservation, which – though physically correct – can seem arbitrary from a purely mathematical perspective. As such, this paper provides a ‘ground-up’ investigation, arriving at those methods organically from only elementary mechanics (considering forces) in a manner approachable by high school students. The motion of a falling object is first examined along a single straight segment – analysis which is extended to multiple connected segments to establish a least-time condition. Generalising this condition to an arbitrary number of segments implies a continuous formulation and a corresponding differential form. Solving this equation produces the brachistochrone, which is shown to be a cycloid. Comparison to a purely geometric method is then used to verify consistency with the analytical result, offering an alternative perspective on the problem. Though both approaches ultimately lack the completeness offered by a variational method, they still produce the known shape of the brachistochrone – the cycloid.

Introduction

Context

The brachistochrone problem asks:

What is the curve that allows an object to fall between two points, separated by some fixed horizontal and vertical distance, in the shortest time?

The Swiss mathematician Johann Bernoulli posed and solved this problem in 1696. Using an argument based on Fermat’s principle (that light always takes a time-minimising path) and subsequently an optical analogy, he showed that the brachistochrone was, in fact, cycloidal1.

As with the original (classical) brachistochrone problem considered by Bernoulli, some assumptions will be made for the sake of simplification. These are that the ball:

  • Is a point mass with radius zero, ‘falling’ or ‘sliding’ along the curve.
  • Falls through a uniform gravitational field in the absence of dissipative forces.
  • Begins at rest, or has initial velocity equal to zero.
Fig 1 | The brachistochrone is the time-minimising curve between two points

Variational Formulation

While more elementary treatments exist2,3,4,5,6,7, a variational formulation is a particularly concise way to formalise the classical brachistochrone problem, which likely explains its prevalence in the modern literature8,9,10,11,12,13,14. For the purposes of this paper, assume without loss of generality that the ball begins its motion at the point (0, 0), falling along the curve y(x) to reach the point (a, b), where y(a) = b.

Thus, the brachistochrone curve is the curve y(x) which minimises the total falling time (T). An expression for T can be found by first considering the total distance (L) the ball travels along the curve:

    \[L = \int_0^a \sqrt{1 + \left(\frac{dy}{dx}\right)^2} \, dx\]

Since

    \[\frac{ds}{dx} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\]

Where s is the distance travelled by the ball along the curve at the point (x, y(x)), or the distance travelled thus far by the ball. By the definition of velocity (v):

    \[v(t) = \frac{ds}{dt} \Rightarrow \frac{dt}{ds} = \frac{1}{v(s)}\]

Where t is time since the ball starts falling. Hence, for the total falling time T:

    \[T = \int_0^T dt = \int_0^L \frac{1}{v(s)} \, ds\]

Conservation of mechanical energy states (note the negative sign on y(x), since the falling occurs in the fourth quadrant):

    \[\frac{1}{2} m v(x)^2 = -mgy(x) \Rightarrow v(x) = \sqrt{-2gy(x)}\]

Where m is the mass of the ball. This allows T to be rewritten entirely in terms of y and x:

    \[T = \int_0^L \frac{1}{v(s)} \, ds = \int_0^a \frac{1}{\sqrt{-2gy(x)}} \sqrt{1 + \left(\frac{dy}{dx}\right)^2} \, dx\]

Or, assuming an invertible y(x):

(1.2.1)   \begin{equation*}T = \int_0^a \frac{1}{\sqrt{-2gy(x)}} \sqrt{1 + \left(\frac{dy}{dx}\right)^2} \, dx = \int_0^b \frac{1}{\sqrt{-2gy}} \sqrt{1 + \left(\frac{dx}{dy}\right)^2} \, dy\end{equation*}

Hence, the brachistochrone problem can be reduced to finding the curve y(x) which minimises the functional T(y) while fulfilling the boundary conditions y(0) = 0 and y(a) = b. This can be done through the machinery of variational calculus, though that is likely inaccessible to high school students and therefore beyond the scope of this paper.

Further, both the variational approach and Bernoulli’s original solution (and subsequent solutions inspired by it) exploit physics principles such as conservation of mechanical energy15,16,5,9 and Snell’s law1,17. While physically true, these assumptions can appear arbitrary from a purely mathematical perspective. As such, this paper seeks a more “ground-up” derivation, in which these results emerge naturally from only the most basic mechanics of the problem, using methods approachable by high school students.

Initial Investigation

One Segment / Basic Mechanics

This paper aims to find the brachistochrone curve from only the basic mechanics of falling. The simplest path that the ball can take between two points is a straight line. Hence, consider the points A and B connected by the line \overline{AB} (see Fig 2a).

Fig 2a | A ball descending down a single line segment
Fig 2b | The weight vector \vec{F}_g and its tangential component \vec{F}_T

Finding the ball’s time of descent is straightforward. To begin, the ball is acted upon by the weight force \vec{F}_g, the magnitude of which is proportional to the mass m of the ball:

    \[\vec{F}_g = \begin{pmatrix} 0 \\ -mg \end{pmatrix} \Rightarrow F_g = mg\]

Where F_g is the magnitude of the weight force and g is the acceleration due to gravity. Note that the magnitude of a vector quantity (e.g. \vec{F}_g) will be represented in this paper by its symbol without the vector arrow (e.g. F_g). We then find (as per Fig. 2b):

    \[F_\tau = mg \cos(\theta)\]

Since the normal component of \vec{F}_g is cancelled out by reaction forces provided by the ramp (or equivalently, since the reaction forces do no work along the tangent direction for frictionless sliding), \vec{F}_\tau is the net force acting on the ball. Hence, by the second law of motion, the magnitude of acceleration (a) is given by:

(2.1.1)   \begin{equation*}a = \frac{F_\tau}{m} = g \cos(\theta)\end{equation*}

The direction of acceleration does not change, so only its magnitude is relevant. Hence, integrating twice with respect to elapsed time (t):

(1)   \begin{align*} v &= at = g \cos(\theta) t \\ s &= \frac{1}{2} at^2 = \frac{1}{2} g \cos(\theta) t^2 \end{align*}

Both integration constants come to zero, as the magnitudes of velocity (v) and displacement (s) are zero before the ball starts moving. Hence, the total time of descent (T) is given by:

(2.1.2)   \begin{equation*}T = \sqrt{\frac{2s}{g \cos(\theta)}} = \sqrt{\frac{2\overline{AB}}{g \cos(\theta)}}\end{equation*}

For example, if the ball falls from A(0,0) to B(1,-1), then we have:

    \[T = \sqrt{\frac{2\sqrt{2}}{9.81 \times \cos 45^\circ}} \approx 0.639 \text{ seconds}\]
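The calculation above can be reproduced with a minimal Python sketch of (2.1.2). The function name `segment_time` and the value g = 9.81 m/s² are assumptions made for illustration; the angle is measured against the vertical, as in Fig. 2b.

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2


def segment_time(ax, ay, bx, by, g=G):
    """Descent time from rest along the straight segment A -> B, as per (2.1.2)."""
    width = bx - ax
    height = ay - by  # vertical drop, positive when B lies below A
    length = math.hypot(width, height)
    cos_theta = height / length  # theta is measured against the vertical
    return math.sqrt(2 * length / (g * cos_theta))


# The worked example: A(0, 0) to B(1, -1)
print(round(segment_time(0, 0, 1, -1), 3))
```

The sketch assumes B lies strictly below A; a horizontal segment would give cos(θ) = 0 and no acceleration from rest.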

Two Segments

A natural next step is to investigate a slightly more complex path, like one formed by two connected segments. To that end, add a third point C below and to the right of B, forming two segments \overline{AB} and \overline{BC}.

Fig 3 | The ball falling down two segments

From now on, when considering multiple connected segments, \theta_n represents the angle formed against the vertical by the nth such segment. Similarly, s_n refers to the length of the nth segment, t_n the time of descent along it. t_1 has been found in the previous section, as per (2.1.2):

    \[t_1 = \sqrt{\frac{2s_1}{g \cos(\theta_1)}}\]

The ball arrives at the second segment with a non-zero velocity (call this v_1). Hence, if we ignore the first segment and define t = 0 as the time at which the ball arrives at the second segment, then the equations of motion are:

(2)   \begin{align*} a &= g \cos(\theta_2) \\ v &= g \cos(\theta_2) t + v_1 \\ s &= \frac{1}{2} g \cos(\theta_2) t^2 + v_1 t \end{align*}

Where:

(2.2.1)   \begin{equation*}v_1 = g \cos(\theta_1) t_1 = \sqrt{2g \cos(\theta_1) s_1}\end{equation*}

Next, when the ball reaches the end of the second segment (at time t = t_2):

    \[s(t_2) = s_2 = \frac{1}{2} g \cos(\theta_2) t_2^2 + v_1 t_2\]

Which implies a quadratic in t_2:

(3)   \begin{align*} &\frac{1}{2} g \cos(\theta_2) t_2^2 + v_1 t_2 - s_2 = 0 \\ &\Rightarrow t_2 = \frac{-v_1 \pm \sqrt{v_1^2 + 2g \cos(\theta_2) s_2}}{g \cos(\theta_2)} \\ &= \frac{-\sqrt{2g \cos(\theta_1) s_1} \pm \sqrt{2g \cos(\theta_1) s_1 + 2g \cos(\theta_2) s_2}}{g \cos(\theta_2)} \end{align*}

Since \sqrt{2g \cos(\theta_1) s_1 + 2g \cos(\theta_2) s_2} > \sqrt{2g \cos(\theta_1) s_1}, there is one negative and one positive solution for t_2. Rejecting the negative solution, we find that:

(2.2.2)   \begin{equation*}T = t_1 + t_2 = \sqrt{\frac{2s_1}{g \cos(\theta_1)}} + \frac{-\sqrt{2g \cos(\theta_1) s_1} + \sqrt{2g \cos(\theta_1) s_1 + 2g \cos(\theta_2) s_2}}{g \cos(\theta_2)}\end{equation*}

Consider a numerical example similar to the previous one. Let A be at (0, 0), C be at (1, -1), and B be at (0.3, -0.7). As such, \theta_1 = \tan^{-1}(0.3/0.7) \approx 23.2^\circ, \theta_2 \approx 66.8^\circ and s_1 = s_2 = \sqrt{0.3^2 + 0.7^2} \approx 0.762. In this case, we have T \approx 0.598s, which is a slightly shorter descent time than along the single line segment connecting (0, 0) and (1, -1).
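The two-segment calculation can be checked with a short Python sketch that solves the quadratic in t_2 directly. The function name `two_segment_time` and g = 9.81 m/s² are assumptions; B is taken at (0.3, -0.7), the position consistent with the angles and segment lengths quoted in the example.

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2


def two_segment_time(a, b, c, g=G):
    """Total descent time A -> B -> C from rest, following (2.1.2) and (2.2.2)."""
    s1 = math.dist(a, b)
    s2 = math.dist(b, c)
    cos1 = (a[1] - b[1]) / s1  # angles are measured against the vertical
    cos2 = (b[1] - c[1]) / s2
    t1 = math.sqrt(2 * s1 / (g * cos1))
    v1 = g * cos1 * t1  # speed on arrival at B, as per (2.2.1)
    # Positive root of (1/2) g cos(theta_2) t^2 + v1 t - s2 = 0
    t2 = (-v1 + math.sqrt(v1**2 + 2 * g * cos2 * s2)) / (g * cos2)
    return t1 + t2


print(round(two_segment_time((0, 0), (0.3, -0.7), (1, -1)), 3))
```

As in the text, the sketch assumes the ball keeps its speed when it changes direction at B.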

Speed Across Multiple Segments

As can be seen, finding time of descent becomes considerably more complex with the addition of even one more segment, since the ball arrives at the second segment with non-zero speed. Observe, however, that (by 2.2.1):

    \[\cos(\theta_1) = \frac{h_1}{s_1} \Rightarrow s_1 = \frac{h_1}{\cos(\theta_1)} \quad \therefore v_1 = \sqrt{2g \cos(\theta_1) s_1} = \sqrt{2gh_1}\]

Where h_1 is the vertical height of the first segment (see Fig. 4). Hence, v_1 depends only on h_1, which the reader may already recognise to be equivalent to mechanical energy conservation (assuming no friction or rolling). Proving this result for any number of segments would remove the need for individually calculating the initial velocity at each junction point.

Hence, consider a ball falling along a series of connected line segments, currently falling along the nth segment. Let the height through which it has fallen be h_1 + h_2 + h_3 \ldots + h_n, where h_n is the vertical height of the nth segment.

Fig 4 | A scenario where the ball is at the end of the fifth segment. In this case, n = 5.

It can be shown that the speed v_n of the ball at the end of the nth segment is dependent only on the height through which it has already fallen. More precisely:

(2.3.1)   \begin{equation*}v_n = \sqrt{2g(h_1 + h_2 + h_3 \ldots + h_n)}\end{equation*}

Proof:

The statement (2.3.1) can be proven by induction. The base case n = 1 has already been demonstrated:

    \[v_1 = \sqrt{2gh_1}\]

Next, assume for induction that the case n = k is true. For compactness, write h_1 + h_2 + h_3 \ldots + h_k = \Sigma_k and thus v_k = \sqrt{2g\Sigma_k}. We now wish to prove the validity of the next case (n = k + 1). Again, the acceleration a is given by:

(2.3.2)   \begin{equation*}a = g \cos(\theta_{k+1})\end{equation*}

As in the previous section, to find speed and distance travelled, it is most convenient to define t = 0 as being the time when the ball arrives at the segment being considered (in this case the k + 1th segment). Integrating acceleration with respect to t:

(5)   \begin{align*} v_{k+1} &= g \cos(\theta_{k+1}) t + \sqrt{2g\Sigma_k} \\ s &= \frac{1}{2} g \cos(\theta_{k+1}) t^2 + \sqrt{2g\Sigma_k} \, t \end{align*}

Note that, by the n = k case, the ball has an initial velocity v_k = \sqrt{2g\Sigma_k}, i.e. v(0) = \sqrt{2g\Sigma_k}, while s(0) = 0. The length s_{k+1} of the k + 1th segment is:

(6)   \begin{align*} s(t_{k+1}) &= \frac{h_{k+1}}{\cos(\theta_{k+1})} = \frac{1}{2} g \cos(\theta_{k+1}) t_{k+1}^2 + \sqrt{2g\Sigma_k} \, t_{k+1} \\ &\Rightarrow \frac{1}{2} g \cos(\theta_{k+1}) t_{k+1}^2 + \sqrt{2g\Sigma_k} \, t_{k+1} - \frac{h_{k+1}}{\cos(\theta_{k+1})} = 0 \end{align*}

Which, as before, is a quadratic in t_{k+1}. Solving for t_{k+1}:

    \[t_{k+1} = \frac{-\sqrt{2g\Sigma_k} + \sqrt{2g\Sigma_k + 2gh_{k+1}}}{g \cos(\theta_{k+1})}\]

The negative root has been rejected. Substituting this back into the expression for v_{k+1} found above (by integrating (2.3.2)) yields:

    \[v_{k+1} = \sqrt{2g(\Sigma_k + h_{k+1})} = \sqrt{2g(h_1 + h_2 \ldots + h_k + h_{k+1})}\]

Recalling that \Sigma_k = h_1 + h_2 \ldots + h_k, we find that the statement is also true for n = k + 1. Hence, the statement is true for n = k + 1 so long as it is true for n = k. Because (2.3.1) was valid for the base case n = 1, by the principle of induction, it is true for all positive integers n.

Since h_n has not been given a value, it could, for the sake of argument, be equal to any positive real number. Hence, the statement is true continuously across every segment, not just at the endpoints. To reflect this, v_n can be replaced by v (representing instantaneous velocity):

(2.3.3)   \begin{equation*}v = \sqrt{2g(h_1 + h_2 \ldots + h_n)}\end{equation*}
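The inductive result can also be checked numerically. The following is a minimal sketch, with hypothetical function names and arbitrary example segments, which steps through each segment using constant-acceleration kinematics and compares the resulting speeds with (2.3.1).

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2


def speeds_by_kinematics(segments, g=G):
    """Speed at the end of each (width, height) segment, found segment by segment."""
    v = 0.0
    speeds = []
    for width, height in segments:
        length = math.hypot(width, height)
        accel = g * height / length  # g cos(theta), theta against the vertical
        # Positive root of (1/2) a t^2 + v t - length = 0
        t = (-v + math.sqrt(v**2 + 2 * accel * length)) / accel
        v = v + accel * t
        speeds.append(v)
    return speeds


def speeds_by_height(segments, g=G):
    """Speed at the end of each segment from the height fallen, as per (2.3.1)."""
    fallen = 0.0
    speeds = []
    for _, height in segments:
        fallen += height
        speeds.append(math.sqrt(2 * g * fallen))
    return speeds


# Arbitrary example widths and heights (not taken from the paper)
example = [(0.2, 0.5), (0.4, 0.3), (0.9, 0.1), (0.1, 0.6)]
for v_kin, v_height in zip(speeds_by_kinematics(example), speeds_by_height(example)):
    print(v_kin, v_height)
```

Both columns should agree to within floating-point error, regardless of the widths chosen.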

Average Speed

Let the average speed of the ball falling along the nth segment be \bar{v}_n = s_n/t_n. By corollary to (2.3.3), \bar{v}_n is dependent only on the heights fallen: \Sigma_n and \Sigma_{n-1} (with \Sigma_{n-1} = h_1 + h_2 \ldots + h_{n-1}).

Proof:

Again, let t = 0 be the time at which the ball reaches the segment in question. The acceleration across that segment is still a = g\cos(\theta_n), so integrating twice with respect to t yields v and s:

(7)   \begin{align*} v &= g\cos(\theta_n)t + v_{n-1} \\ s &= \frac{g}{2} \cos(\theta_n)t^2 + v_{n-1} t \end{align*}

Note that v(0) = v_{n-1} and s(0) = 0. Next, since t_n is the time taken to traverse s_n:

    \[v_n = v(t_n) = g\cos(\theta_n)t_n + v_{n-1}\]

s_n is given by:

    \[s_n = s(t_n) = \frac{g}{2} \cos(\theta_n)t_n^2 + v_{n-1} t_n\]

And hence, by definition:

    \[\bar{v}_n = \frac{\text{distance}}{\text{time}} = \frac{s_n}{t_n} = \frac{g}{2} \cos(\theta_n) t_n + v_{n-1} = \frac{g\cos(\theta_n) t_n + 2v_{n-1}}{2} = \frac{v_{n-1} + v_n}{2}\]

By (2.3.3):

(2.4.1)   \begin{equation*}\therefore \bar{v}_n = \frac{v_{n-1} + v_n}{2} = \frac{\sqrt{2g(\Sigma_n - h_n)} + \sqrt{2g\Sigma_n}}{2}\end{equation*}

This result will be used in the next section.

Deriving A Least-Time Property

Minimizing Descent Time

Revisiting the numerical example with A(0, 0), B(0.3, -0.7) and C(1, -1), (2.3.3) and (2.4.1) make finding descent time significantly simpler. This is because \bar{v}_n = \text{distance}/\text{time} = s_n/t_n produces the same answer without needing to solve a quadratic:

    \[T = t_1 + t_2 = \frac{s_1}{\bar{v}_1} + \frac{s_2}{\bar{v}_2} = \frac{2 \times 0.762}{\sqrt{2g(0.7)}} + \frac{2 \times 0.762}{\sqrt{2g(0.7)} + \sqrt{2g}} \approx 0.598\text{s}\]
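The same shortcut extends to any number of connected segments. Below is a minimal sketch (the name `polyline_time` is hypothetical) that sums s_n / \bar{v}_n over a list of points, using (2.3.3) for the speed at each junction and (2.4.1) for the average speed. It assumes the first segment descends, so that its average speed is non-zero.

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2


def polyline_time(points, g=G):
    """Descent time from rest along the straight segments joining `points`."""
    y_start = points[0][1]
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        v_in = math.sqrt(2 * g * (y_start - y1))   # speed entering the segment, (2.3.3)
        v_out = math.sqrt(2 * g * (y_start - y2))  # speed leaving the segment, (2.3.3)
        v_avg = (v_in + v_out) / 2                 # average speed, (2.4.1)
        total += math.hypot(x2 - x1, y2 - y1) / v_avg
    return total


print(round(polyline_time([(0, 0), (0.3, -0.7), (1, -1)]), 3))  # two segments
print(round(polyline_time([(0, 0), (1, -1)]), 3))               # single segment
```

Because no quadratic needs to be solved, the cost of evaluating a path grows only linearly with the number of segments.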

This observation allows us to minimise travel time by optimising the placement of B. Let the vertical and horizontal distances between A and C be H and W respectively. If the positions of A and C are fixed – and by extension, if H and W are constant – what are the values of h_1 and w_1 (see Fig. 5) which minimise the descent time?

Fig 5 | The two-segment path with relevant widths and heights

By Pythagoras, the lengths of the segments \overline{AB} and \overline{BC} are:

    \[\overline{AB} = \sqrt{w_1^2 + h_1^2} \qquad \overline{BC} = \sqrt{(W - w_1)^2 + (H - h_1)^2}\]

The total travel time is simply the sum of the travel times across \overline{AB} and \overline{BC}:

    \[T = t_1 + t_2 = \frac{\sqrt{w_1^2 + h_1^2}}{\bar{v}_1} + \frac{\sqrt{(W - w_1)^2 + (H - h_1)^2}}{\bar{v}_2}\]

Since H is fixed, we know by (2.4.1) that the average speeds depend only on h_1. Therefore, if h_1 is held constant, then T depends on w_1 alone. As such, we can examine how the travel time is minimised by changing w_1 alone. Differentiating T with respect to w_1:

    \[\frac{dT}{dw_1} = \frac{w_1}{\bar{v}_1 \sqrt{w_1^2 + h_1^2}} - \frac{W - w_1}{\bar{v}_2 \sqrt{(W - w_1)^2 + (H - h_1)^2}} = 0\]

Hence, since \frac{dT}{dw_1} = 0 at minima and maxima, the following condition must be upheld:

(3.1.2)   \begin{equation*}\frac{w_1}{\bar{v}_1 \sqrt{w_1^2 + h_1^2}} = \frac{W - w_1}{\bar{v}_2 \sqrt{(W - w_1)^2 + (H - h_1)^2}}\end{equation*}

It can be shown that T is indeed minimised by analysing the second derivative. Factoring out the constant 1/\bar{v}_1 and 1/\bar{v}_2 terms and then differentiating once more with respect to w_1:

    \[\frac{d^2T}{dw_1^2} = \frac{1}{\bar{v}_1} \times \frac{d}{dw_1} \left(\frac{w_1}{\sqrt{w_1^2 + h_1^2}}\right)- \frac{1}{\bar{v}_2} \times \frac{d}{dw_1} \left(\frac{W - w_1}{\sqrt{(W - w_1)^2 + (H - h_1)^2}}\right)\]

Applying the quotient rule yields:

    \[\frac{d^2T}{dw_1^2} = \frac{h_1^2}{\bar{v}_1 (w_1^2 + h_1^2)^{3/2}} + \frac{(H - h_1)^2}{\bar{v}_2 [(W - w_1)^2 + (H - h_1)^2]^{3/2}}\]

Each term in \frac{d^2T}{dw_1^2} is a ratio of positive constants and squared distances, hence \frac{d^2T}{dw_1^2} is strictly positive for all values of w_1. Therefore, T(w_1) is concave upwards for all values of w_1, meaning that its only stationary point is a minimum which occurs when:

    \[\frac{w_1}{\bar{v}_1 \sqrt{w_1^2 + h_1^2}} = \frac{W - w_1}{\bar{v}_2 \sqrt{(W - w_1)^2 + (H - h_1)^2}}\]
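The stationary condition can be checked numerically. Since T(w_1) is concave upwards, a ternary search over 0 \leq w_1 \leq W should converge on its single minimum. The sketch below uses hypothetical function names and the example values H = W = 1, h_1 = 0.7; it then compares the two sides of (3.1.2).

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2


def travel_time(w1, h1, W, H, g=G):
    """Total time over AB and BC, using the average speeds from (2.4.1)."""
    v1 = math.sqrt(2 * g * h1)  # speed at B
    v2 = math.sqrt(2 * g * H)   # speed at C
    vbar1 = v1 / 2              # the ball starts from rest at A
    vbar2 = (v1 + v2) / 2
    return math.hypot(w1, h1) / vbar1 + math.hypot(W - w1, H - h1) / vbar2


def best_w1(h1, W, H, iterations=200):
    """Ternary search for the w1 that minimises the convex function T(w1)."""
    lo, hi = 0.0, W
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if travel_time(m1, h1, W, H) < travel_time(m2, h1, W, H):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def condition_sides(w1, h1, W, H, g=G):
    """Left- and right-hand sides of (3.1.2)."""
    v1 = math.sqrt(2 * g * h1)
    v2 = math.sqrt(2 * g * H)
    left = w1 / ((v1 / 2) * math.hypot(w1, h1))
    right = (W - w1) / (((v1 + v2) / 2) * math.hypot(W - w1, H - h1))
    return left, right


w_opt = best_w1(0.7, 1.0, 1.0)
print(w_opt, travel_time(w_opt, 0.7, 1.0, 1.0))
print(condition_sides(w_opt, 0.7, 1.0, 1.0))
```

The two values returned by `condition_sides` should agree closely at the optimum, and the optimised time should be no larger than the 0.598 s found for w_1 = 0.3.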

Snell’s Law

We have found that descent time is minimised when \frac{w_1}{\bar{v}_1 \sqrt{w_1^2 + h_1^2}} = \frac{W - w_1}{\bar{v}_2 \sqrt{(W - w_1)^2 + (H - h_1)^2}}, which is an unwieldy result. Since \sin = \text{opposite}/\text{hypotenuse}, this can be rewritten as:

    \[\frac{w_1}{\bar{v}_1 \sqrt{w_1^2 + h_1^2}} = \frac{\sin(\theta_1)}{\bar{v}_1} = \frac{W - w_1}{\bar{v}_2 \sqrt{(W - w_1)^2 + (H - h_1)^2}} = \frac{\sin(\theta_2)}{\bar{v}_2}\]

Next, it has been shown that there is only one value of w_1 which minimises T(w_1). As a result, for constant h_1, point B is ‘fixed’. In that case, therefore, the angles \theta_1 and \theta_2 are constant (as are \frac{\sin(\theta_1)}{\bar{v}_1} and \frac{\sin(\theta_2)}{\bar{v}_2} by extension). Let k represent a constant term (k in fact scales the radius of the cycloid), meaning the time-minimising condition can be rewritten as:

(3.2.1)   \begin{equation*}\frac{\sin(\theta_1)}{\bar{v}_1} = \frac{\sin(\theta_2)}{\bar{v}_2} = k\end{equation*}

This result is well-known in physics as Snell’s law, which describes the path that light will take as it moves through different media. Light travels through different materials at different (but constant) velocities and will always take the least-time path, so the motion of light rays (as first noticed by Bernoulli in his optical-mechanical analogy18,19 and subsequently re-applied by others20) is directly applicable to this optimisation problem.

Generalising To n Segments

The time-minimising property has so far been proven directly only for two segments (\overline{AB} and \overline{BC}), but it can be generalised to two points connected by n segments. In other words, for an object falling down n connected segments which each have a fixed vertical height, the condition which minimises the total travel time T = t_1 + t_2 \ldots + t_n is:

(3.3.1)   \begin{equation*}\frac{\sin(\theta_1)}{\bar{v}_1} = \frac{\sin(\theta_2)}{\bar{v}_2} = \cdots = \frac{\sin(\theta_n)}{\bar{v}_n} = k\end{equation*}

Proof:

By (2.3.3), the vertical heights h_1, h_2 \ldots h_n being fixed means that the speeds v_1, v_2 \ldots v_n at the junctions are also fixed, as are the average speeds \bar{v}_1, \bar{v}_2 \ldots \bar{v}_n. Therefore, varying the width w_n of the nth segment will only affect the travel time across that particular segment. This is key.

As such, assume that the segments form the time-minimising path:

  • If w_1 and w_2 are such that \frac{\sin(\theta_1)}{\bar{v}_1} \neq \frac{\sin(\theta_2)}{\bar{v}_2}, then t_1 + t_2 can be decreased while all other descent times stay the same, which would decrease the total travel time T. Contradiction. Hence, \frac{\sin(\theta_1)}{\bar{v}_1} = \frac{\sin(\theta_2)}{\bar{v}_2}.
  • If w_2 and w_3 are such that \frac{\sin(\theta_2)}{\bar{v}_2} \neq \frac{\sin(\theta_3)}{\bar{v}_3}, then t_2 + t_3 can be decreased, which would decrease the value of T. Contradiction. Hence, \frac{\sin(\theta_2)}{\bar{v}_2} = \frac{\sin(\theta_3)}{\bar{v}_3}. \ldots
  • If w_{n-1} and w_n are such that \frac{\sin(\theta_{n-1})}{\bar{v}_{n-1}} \neq \frac{\sin(\theta_n)}{\bar{v}_n}, then t_{n-1} + t_n can be decreased, which would decrease the value of T. Contradiction. Hence, \frac{\sin(\theta_{n-1})}{\bar{v}_{n-1}} = \frac{\sin(\theta_n)}{\bar{v}_n}.

Hence, by transitivity, we arrive at the generalisation:

    \[\frac{\sin(\theta_1)}{\bar{v}_1} = \frac{\sin(\theta_2)}{\bar{v}_2} \ldots = \frac{\sin(\theta_n)}{\bar{v}_n} = k\]
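The n-segment condition can be turned into a constructive sketch. Assuming, for illustration only, that all n segments have equal height H/n, each width follows from \sin(\theta_i) = k\bar{v}_i, and k can be found by bisection so that the widths sum to W. The function name `snell_path_time` and the choice of equal heights are assumptions, not part of the derivation above.

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2


def snell_path_time(n, W, H, g=G):
    """Return (k, T) for n equal-height segments obeying (3.3.1) and spanning W by H."""
    h = H / n
    # Average speed on each segment, as per (2.4.1)
    vbars = [
        (math.sqrt(2 * g * h * i) + math.sqrt(2 * g * h * (i + 1))) / 2
        for i in range(n)
    ]

    def total_width(k):
        width = 0.0
        for vbar in vbars:
            s = k * vbar  # sin(theta_i)
            if s >= 1.0:
                return math.inf
            width += h * s / math.sqrt(1 - s * s)  # w_i = h tan(theta_i)
        return width

    # total_width increases with k, so bisect until the widths sum to W
    lo, hi = 0.0, 1.0 / max(vbars)
    for _ in range(100):
        mid = (lo + hi) / 2
        if total_width(mid) < W:
            lo = mid
        else:
            hi = mid
    k = lo
    # Each segment has length h / cos(theta_i) and is crossed at average speed vbar_i
    time = sum(h / math.sqrt(1 - (k * vbar) ** 2) / vbar for vbar in vbars)
    return k, time


for n in (1, 2, 10, 50):
    print(n, snell_path_time(n, 1.0, 1.0)[1])
```

With n = 1 the sketch reduces to the single straight segment, and the printed times should decrease as n grows.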

Finding The Brachistochrone

‘Smooth’ Curve

So far, the time-minimising condition has been found for a mass travelling down n connected line segments. Such a path is not smooth, as the mass experiences an abrupt change in direction at the boundary between each segment.

However, since speed depends only on height fallen, it can be argued that the brachistochrone is a smooth curve, or at least approximates one. Considering two points (A and B), observe that:

  • The descent time between A and B (call this t_{AB}) can be decreased by adding an intermediate point (C) between A and B, so long as the placement of C fulfils (3.3.1). This can be done without affecting any other descent time because the relative heights of the other points are unchanged (and thus the average speed along them too).
  • Similarly, t_{AC} can be decreased by adding another point (D) between A and C, such that the placement of \overline{AD} and \overline{DC} fulfils (3.3.1).
  • Then, t_{AD} can be decreased by adding another point between A and D, and so on.

Fig 7a | Increasing the number of points decreases total travel time

More intermediate points can be added until the length of each connecting segment approaches zero. This path (approaching ‘infinitely’ many points) would intuitively be the brachistochrone, since no more points can be added to decrease travel time further.

Strictly speaking, this argument is only heuristic, and should not be construed as a rigorous proof of the brachistochrone’s smoothness (since ‘smoothness’ has a specific definition within real analysis). As noted in the introduction, a fully rigorous solution to the brachistochrone problem would require variational calculus, which is beyond the scope of this paper.

However, this argument demonstrates the need for the length of each segment to approach zero, which suggests (without proving) that the limiting path is a smooth curve. In addition, it acts as a useful intuition for how the time-minimising curve may be found via integration.

Fig 7b | The ball falling down two segments

The Brachistochrone Integral

Consider an infinitesimally short segment ds on this ‘smooth’ time-minimising curve, which forms an angle \theta against the y-axis. As per Fig. 8, a mass falling along ds would travel vertically and horizontally by the distances -dy and dx respectively (note the negative sign on -dy because the ball is falling downwards).

Fig 8 | An infinitesimally short segment (ds) along a smooth curve

Since ds is infinitesimally short, the change in height across it tends towards zero, so the ball’s average speed becomes equal to its instantaneous speed. By (2.4.1):

    \[\bar{v}_n = \lim_{h_n \to 0} \frac{\sqrt{2g(\Sigma_n - h_n)} + \sqrt{2g\Sigma_n}}{2} = \sqrt{2g\Sigma_n} = v\]

Note that \bar{v}_n = v exactly, since acceleration becomes constant in the limit ds \to 0. Hence, since (3.3.1) is fulfilled instantaneously across the brachistochrone, rewrite it as:

(4.2.1)   \begin{equation*}\frac{\sin(\theta)}{v} = k \Leftrightarrow \sin(\theta) = kv\end{equation*}

By Pythagoras, \sin(\theta) can be rewritten:

    \[\frac{dx}{\sqrt{dx^2 + (-dy)^2}} = kv\]

Rearranging algebraically, this can be interpreted as a differential equation:

(8)   \begin{align*} \frac{dx^2}{dx^2 + (-dy)^2} &= k^2 v^2 \\ \Rightarrow dx^2 &= \frac{k^2 v^2}{1 - k^2 v^2} (-dy)^2 \end{align*}

(4.2.2)   \begin{equation*}\Rightarrow \frac{dx}{dy} = -\sqrt{\frac{k^2 v^2}{1 - k^2 v^2}}\end{equation*}

Recalling (2.3.3), the velocity v of the ball depends only on the height through which it has already fallen. Again, since the ball begins at the point (0, 0), the vertical distance fallen is simply the negative y-coordinate, or -y. By (2.3.3):

    \[v = \sqrt{2g(h_1 + h_2 \ldots + h_n)} = \sqrt{2g(-y)}\]

(4.2.3)   \begin{equation*}v = \sqrt{-2gy}\end{equation*}

Which is equivalent to conservation of mechanical energy (assuming no friction or rolling). This can be substituted back into (4.2.2):

(4.2.4)   \begin{equation*}\frac{dx}{dy} = -\sqrt{\frac{k^2 v^2}{1 - k^2 v^2}} = -\sqrt{\frac{-2k^2 gy}{1 + 2k^2 gy}} = -\sqrt{\frac{-Cy}{1 + Cy}}\end{equation*}

The constant C = 2k^2 g has been introduced for convenience. This derivative form, however, does not provide very much insight into the curve it describes. As such, we integrate with respect to y to find the Cartesian form.

    \[x = \int -\sqrt{\frac{-Cy}{1 + Cy}} \, dy = -\int \sqrt{\frac{-Cy}{1 + Cy}} \, dy\]

Evaluating this integral is tedious. First, the integrand may be manipulated with the substitution:

    \[u = \sqrt{\frac{-Cy}{Cy + 1}} \Rightarrow y = -\frac{u^2}{C(u^2 + 1)} \Rightarrow dy = -\frac{2u}{C(u^2 + 1)^2} \, du\]

Hence, changing the variable of integration from y to u:

(9)   \begin{align*} x &= -\int u \, dy = -\int u \left(-\frac{2u}{C(u^2 + 1)^2}\right) du = \frac{2}{C} \int \frac{u^2}{(u^2 + 1)^2} \, du \\ &= \frac{2}{C} \int \left(\frac{1}{u^2 + 1} - \frac{1}{(u^2 + 1)^2}\right) du = \frac{2}{C} \left(\arctan(u) - \int \frac{1}{(u^2 + 1)^2} \, du\right) \end{align*}

The remaining integral \int \frac{1}{(u^2 + 1)^2} \, du is best evaluated using a trigonometric substitution:

    \[z = \arctan(u) \Rightarrow du = \sec^2(z) \, dz\]

Which yields:

(10)   \begin{align*}\int \frac{1}{(u^2 + 1)^2} \, du &= \int \frac{\sec^2(z)}{(\tan^2(z) + 1)^2} \, dz = \int \cos^2(z) \, dz \\ &= \frac{1}{2} \sin(z)\cos(z) + \frac{z}{2}\end{align*}

The z substitution can be undone by noting that:

    \[\sin(\arctan(u)) = \frac{u}{\sqrt{u^2 + 1}} \qquad \text{and} \qquad \cos(\arctan(u)) = \frac{1}{\sqrt{u^2 + 1}}\]

Fig 9 | A right triangle with side lengths u, 1 and \sqrt{u^2 + 1}.

These relations can be proven efficiently by constructing a right triangle with side lengths u, 1 and \sqrt{u^2 + 1}. From Fig. 9, observe that:

    \[\sin(\arctan(u)) = \sin(\varphi) = \frac{u}{\sqrt{u^2 + 1}}\]


    \[\qquad \text{and} \qquad \cos(\arctan(u)) = \cos(\varphi) = \frac{1}{\sqrt{u^2 + 1}}\]

As such, undoing the z substitution yields:

    \[\frac{1}{2} \sin(z)\cos(z) + \frac{z}{2} = \frac{u}{2(u^2 + 1)} + \frac{1}{2} \arctan(u)\]

Finally, putting everything together and undoing the u substitution:

    \begin{align*} x &= \frac{2}{C} \left(\arctan(u) - \int \frac{1}{(u^2 + 1)^2} \, du\right) = \frac{1}{C} \arctan(u) - \frac{1}{C} \times \frac{u}{(u^2 + 1)} \\ &\Rightarrow x = \frac{1}{C} \arctan\left(\sqrt{\frac{-Cy}{Cy + 1}}\right) - \frac{1}{C} \times \frac{\left(\sqrt{\frac{-Cy}{Cy + 1}}\right)}{\left(1 - \frac{Cy}{Cy + 1}\right)} \\ &= \frac{1}{C} \arctan\left(\sqrt{\frac{-Cy}{Cy + 1}}\right) - \frac{1}{C} \times \frac{\left(\sqrt{\frac{-Cy}{Cy + 1}}\right)}{\left(\frac{1}{Cy + 1}\right)} \end{align*}

The curve begins at (0, 0). Hence, x(0) = 0 and the constant of integration comes to zero:

(4.2.5)   \begin{equation*}x = \frac{1}{C} \arctan\left(\sqrt{\frac{-Cy}{1 + Cy}}\right) - \frac{1}{C} (Cy + 1) \sqrt{\frac{-Cy}{1 + Cy}}\end{equation*}

This is the explicit solution to (4.2.2). Since it obeys the ‘instantaneous’ least-time condition (4.2.1), it must represent the least-time curve, i.e. the brachistochrone.
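As a sanity check on the integration, the closed form (4.2.5) can be differentiated numerically and compared with the right-hand side of (4.2.4). The sketch below assumes C = 1 purely for illustration and uses a central finite difference; the function names are hypothetical.

```python
import math

C = 1.0  # assumed value of the constant C = 2 k^2 g


def x_closed_form(y, c=C):
    """The Cartesian form (4.2.5), valid for -1/c < y <= 0."""
    u = math.sqrt(-c * y / (1 + c * y))
    return math.atan(u) / c - (c * y + 1) * u / c


def slope(y, c=C):
    """The derivative dx/dy given by (4.2.4)."""
    return -math.sqrt(-c * y / (1 + c * y))


STEP = 1e-6  # finite-difference step
for y in (-0.1, -0.5, -0.9):
    numeric = (x_closed_form(y + STEP) - x_closed_form(y - STEP)) / (2 * STEP)
    print(y, numeric, slope(y))
```

The two printed derivatives should agree to several decimal places at each sampled height.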

Parametric Representation

The arctangent term in (4.2.5) makes algebraic manipulation impractically complex. As such, representation using parametric equations is more elegant. Let \tan(\beta) = \sqrt{\frac{-Cy}{1 + Cy}}, which would eliminate the nested square root via substitution (\beta is a parameterising variable whose geometric meaning will be demonstrated later). Making y the subject:

    \[\tan^2(\beta) = \frac{-Cy}{1 + Cy} \Rightarrow -\tan^2(\beta) = Cy(\tan^2(\beta) + 1)\]


    \[\Rightarrow y = -\frac{1}{C} \times \frac{\tan^2(\beta)}{\tan^2(\beta) + 1}\]

Noting that \tan(\beta) \equiv \frac{\sin(\beta)}{\cos(\beta)}:

    \[y = -\frac{1}{C} \times \frac{\tan^2(\beta)}{\tan^2(\beta) + 1} = -\frac{1}{C} \times \frac{\left(\frac{\sin^2(\beta)}{\cos^2(\beta)}\right)}{\left(\frac{\sin^2(\beta) + \cos^2(\beta)}{\cos^2(\beta)}\right)} = -\frac{1}{C} \sin^2(\beta)\]

(4.3.1)   \begin{equation*}\therefore y = -\frac{1}{2C} + \frac{1}{2C} \cos(2\beta)\end{equation*}

Similarly, rearranging to make x the subject:

(4.3.2)   \begin{equation*}\therefore x = \frac{1}{C} \beta - \frac{1}{2C} \sin(2\beta)\end{equation*}

Finally, introducing the constant R = \frac{1}{2C} = \frac{1}{4gk^2} and changing the parameterising variable to \alpha = 2\beta yields a simpler set of equations than (4.3.1) and (4.3.2):

(4.3.3)   \begin{equation*}\begin{split} x &= R\alpha - R\sin(\alpha) \\ y &= -R + R\cos(\alpha) \end{split}\end{equation*}

These are the standard parametric equations for the inverted cycloid traced by a circle of radius R whose centre rolls along the line y = -R.
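It can be checked numerically that these parametric equations satisfy the least-time condition (4.2.1) with k = 1/(2\sqrt{gR}), which is a rearrangement of R = 1/(4gk^2). The sketch below assumes R = 1 and g = 9.81 m/s² for illustration, and the function name `snell_ratio` is hypothetical.

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2
R = 1.0   # assumed generating-circle radius


def snell_ratio(alpha, r=R, g=G):
    """sin(theta) / v at parameter alpha on the curve x = r(a - sin a), y = -r + r cos a."""
    dx = r * (1 - math.cos(alpha))       # dx / d(alpha)
    dy = -r * math.sin(alpha)            # dy / d(alpha)
    sin_theta = dx / math.hypot(dx, dy)  # theta is measured against the vertical
    y = -r + r * math.cos(alpha)
    v = math.sqrt(-2 * g * y)            # speed from (4.2.3)
    return sin_theta / v


k_expected = 1 / (2 * math.sqrt(G * R))  # from R = 1 / (4 g k^2)
for alpha in (0.5, 1.0, 2.0, 3.0):
    print(alpha, snell_ratio(alpha), k_expected)
```

The ratio should be the same at every sampled value of \alpha, which is the continuous form of Snell's law.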

As shown below in Fig. 10, observe how the brachistochrone starts steep, before flattening out. This can be understood intuitively as arising from a tradeoff between acceleration (requiring a steep gradient), and then covering horizontal distance (requiring a shallow / flat gradient).

Fig 10 | (4.3.3) graphed on Desmos, R = 1 and 0 \leq \alpha \leq 2\pi

The Cycloid

(4.3.3) is the set of parametric equations describing the brachistochrone, found using integration. These equations in fact describe a cycloid – the locus of a point on the circumference of a circle which rolls without slipping at a constant speed.

Fig 11 | The motion of a point on the circumference of a rolling circle, which sweeps out an inverted cycloid.

Proof:

Consider a generating circle with radius R, rolling with a constant speed in the positive x-direction (rightwards). Let P be the point on its circumference which begins at the origin (0, 0), such that the path traced out by P is a cycloid. Suppose then that the circle rolls for an arbitrary amount of time, with the centre of the circle travelling through the length \overline{CC_1} and \alpha being the (counterclockwise) angle through which P has turned from rest.

Fig 12 | The point P on the circumference of a generating circle

Let A and B be points on the x-axis and \overline{AC_1} respectively, such that \overline{AC_1} \perp \overline{OA} and \overline{PB} \perp \overline{AC_1}. Since \overline{PC_1} = \overline{AC_1} = R, the y-position of P is given by:

    \[y = -\overline{AC_1} + \overline{BC_1} = -R + R\cos(\alpha)\]

The length \overline{CC_1} (and by extension P’s x-position) is not readily apparent. However, rolling without slipping means that the circle’s centre must travel a distance equal to the arclength swept out by P. In other words, \overline{CC_1} = R\alpha. Hence:

    \[x = \overline{CC_1} - \overline{PB} = R\alpha - R\sin(\alpha)\]

Fig. 12 depicts an acute \alpha. For the sake of completeness, however, the same result may be obtained for any \alpha \geq \pi/2, with the process being repeated almost verbatim.

Fig 13 | Cases where the angle (\alpha) swept through by P is greater than \pi/2

    \begin{align*}
    &\text{If } 0 \leq \alpha \leq \tfrac{\pi}{2}: & x &= R\alpha - R\sin(\alpha), & y &= -R + R\cos(\alpha) \quad \text{(as before)} \\
    &\text{If } \tfrac{\pi}{2} \leq \alpha \leq \pi: & x &= R\alpha - R\cos\left(\alpha - \tfrac{\pi}{2}\right) = R\alpha - R\sin(\alpha), & y &= -R - R\sin\left(\alpha - \tfrac{\pi}{2}\right) = -R + R\cos(\alpha) \\
    &\text{If } \pi \leq \alpha \leq \tfrac{3\pi}{2}: & x &= R\alpha + R\sin(\alpha - \pi) = R\alpha - R\sin(\alpha), & y &= -R - R\cos(\alpha - \pi) = -R + R\cos(\alpha) \\
    &\text{If } \tfrac{3\pi}{2} \leq \alpha \leq 2\pi: & x &= R\alpha + R\cos\left(\alpha - \tfrac{3\pi}{2}\right) = R\alpha - R\sin(\alpha), & y &= -R + R\sin\left(\alpha - \tfrac{3\pi}{2}\right) = -R + R\cos(\alpha)
    \end{align*}

Thus, for a generating circle with radius R, the position of P is given by

    \[x = R\alpha - R\sin(\alpha)\]

    \[y = -R + R\cos(\alpha)\]

which exactly matches the parametric equations derived in (4.3.3). The shape of the brachistochrone, therefore, is that of an inverted cycloid.
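The quadrant-by-quadrant expressions in the proof can also be checked numerically. The following is a minimal sketch in Python, not part of the original derivation: the function names `cycloid_position`, `piecewise_position` and `max_mismatch` are hypothetical, and the piecewise branches simply transcribe the four cases of the proof so that they can be compared with the single parametric form.

```python
import math

def cycloid_position(R, alpha):
    """Single parametric form: x = R*a - R*sin(a), y = -R + R*cos(a)."""
    return R * alpha - R * math.sin(alpha), -R + R * math.cos(alpha)

def piecewise_position(R, alpha):
    """The four quadrant cases of the proof, transcribed as written."""
    half_pi = math.pi / 2
    if alpha <= half_pi:
        return R * alpha - R * math.sin(alpha), -R + R * math.cos(alpha)
    if alpha <= math.pi:
        return (R * alpha - R * math.cos(alpha - half_pi),
                -R - R * math.sin(alpha - half_pi))
    if alpha <= 3 * half_pi:
        return (R * alpha + R * math.sin(alpha - math.pi),
                -R - R * math.cos(alpha - math.pi))
    return (R * alpha + R * math.cos(alpha - 3 * half_pi),
            -R + R * math.sin(alpha - 3 * half_pi))

def max_mismatch(R=1.0, samples=1000):
    """Largest coordinate difference between the two forms on [0, 2*pi]."""
    worst = 0.0
    for i in range(samples + 1):
        alpha = 2 * math.pi * i / samples
        x1, y1 = cycloid_position(R, alpha)
        x2, y2 = piecewise_position(R, alpha)
        worst = max(worst, abs(x1 - x2), abs(y1 - y2))
    return worst

print(max_mismatch())  # expected to be at the level of rounding error
```

A mismatch at the level of floating-point rounding is consistent with the four cases collapsing to one pair of equations, although it is of course not a substitute for the trigonometric identities used in the proof.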

Boundary Conditions / Completeness

Though important for completeness, fully accounting for the boundary conditions (that is, finding how the generating circle radius R depends on the start and endpoints) is beyond the scope of this paper. For example, forcing the cycloid to pass through the earlier points (0, 0) and (1,-1) yields:

    \begin{equation*}\begin{split} 0 &= R\alpha - R\sin(\alpha) = -R + R\cos(\alpha) \\ 1 &= R\alpha - R\sin(\alpha) = R - R\cos(\alpha) \end{split} \quad \Rightarrow\end{equation*}


(11)   \begin{equation*}\begin{split} \alpha &= 0 \\ \alpha - \sin(\alpha) &= 1 - \cos(\alpha) \end{split}\end{equation*}

The equation \alpha - \sin(\alpha) = 1 - \cos(\alpha) is transcendental, while \alpha = 0 simply means the brachistochrone starts from the ‘cusp’ of the cycloid. Hence, in the case where only the start and end points are known (as Bernoulli’s original problem statement suggests), R cannot be found purely algebraically21 (except when the end point is at the same height as the starting point). Rigorously confirming that this value of R is unique (and actually time-minimising) is similarly beyond the scope of this paper, requiring complex variational methods22,23,24.

Nonetheless, suitable values of R can still be found numerically, demonstrating the brachistochrone property of the cycloid. For example, considering curves which begin at (0,0) and end at (1,-1):

Table 1 | *T can be calculated from (1.2.1)                 
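One way such a numerical search might be carried out is sketched below in Python. This is an illustration under stated assumptions, not the procedure used to produce Table 1. The names `solve_end_angle` and `cycloid_fit` are hypothetical, g is assumed to be 9.81 m/s², and the descent time along the cycloid is taken from the standard closed form T = \alpha\sqrt{R/g} for a start from rest at the cusp, rather than from (1.2.1). The straight-ramp time is included only for comparison.

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2

def solve_end_angle(x_end, drop, tol=1e-12):
    """Bisection for the end angle alpha in (0, 2*pi) satisfying
    drop*(alpha - sin(alpha)) = x_end*(1 - cos(alpha))."""
    def f(a):
        return drop * (a - math.sin(a)) - x_end * (1.0 - math.cos(a))
    lo, hi = 1e-3, 2.0 * math.pi  # f(lo) < 0 < f(hi) for typical endpoints
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cycloid_fit(x_end=1.0, drop=1.0):
    """Return (alpha, R, time on cycloid, time on straight ramp)."""
    alpha = solve_end_angle(x_end, drop)
    R = drop / (1.0 - math.cos(alpha))      # from -y = R*(1 - cos(alpha))
    t_cycloid = alpha * math.sqrt(R / G)    # assumed closed form, see lead-in
    t_line = math.sqrt(2.0 * (x_end**2 + drop**2) / (G * drop))
    return alpha, R, t_cycloid, t_line

alpha, R, t_cycloid, t_line = cycloid_fit()
print(f"alpha = {alpha:.4f}, R = {R:.4f}")
print(f"cycloid: {t_cycloid:.4f} s, straight ramp: {t_line:.4f} s")
```

For the endpoints (0,0) and (1,-1) this gives \alpha of roughly 2.41 and R of roughly 0.57, with the cycloid time coming out shorter than that of the straight ramp between the same points.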

A Geometric Approach (Comparison)

Setup and Motivation

In the previous section, a cycloidal path was shown to instantaneously fulfil the least-time condition \frac{\sin(\theta)}{v} = k identified in (4.2.1). However, the approach taken — forming and then solving a differential equation — was highly mechanical. It can in fact be confirmed more elegantly that the cycloid obeys \frac{\sin(\theta)}{v} = k at every point through a purely geometric method7.

As before, consider some point P on the cycloid, which lies on the circumference of its generating circle, currently centred on C_1 (the cycloid is not shown in Fig. 14a to prevent clutter). Construct a line \overline{AB} such that it forms the vertical diameter of the generating circle, with \overline{AB} \perp \overline{OA}. By Thales’ theorem, \triangle APB is a right triangle, with A\widehat{P}B = 90^\circ.

Next, extend \overline{PB} until it reaches \overline{OA} at T and let \overline{XP} be the vertical height fallen by P. Since \overline{XQ} \parallel \overline{AB}, corresponding angles give X\widehat{P}T = A\widehat{B}T, while X\widehat{Q}T = A\widehat{B}T by alternate angles. Noting further that the angle at the centre is twice the angle at the circumference:

    \[A\widehat{C}_1P = \alpha = 2X\widehat{P}T = 2A\widehat{B}T = 2\beta\]

That is, \alpha = 2\beta. The key to this approach, then, is that if \overline{TB} is the tangent to the cycloid at P, then \beta = \theta, which in turn allows the condition \frac{\sin(\theta)}{v} = k to be confirmed by analysing the geometry of the triangle \triangle TAP.

Tangency Of \overline{TB}

Hayes7 notes that the tangent to the cycloid always passes through the bottom of the generating circle; that is (as per Fig. 14a), \overline{TB} is the tangent to the cycloid at P.

Proof:

Recalling (4.3.3), the position of P is given by:

    \[x = R\alpha - R\sin(\alpha)\]


    \[y = -R + R\cos(\alpha)\]

Hence, by the chain rule, the gradient of the cycloid at P is:

    \[\frac{dy}{dx} = \frac{dy}{d\alpha} \times \frac{d\alpha}{dx} = \frac{-R\sin(\alpha)}{R - R\cos(\alpha)} = \frac{-\sin(\alpha)}{1 - \cos(\alpha)} = \frac{-\sin(2\theta)}{1 - \cos(2\theta)} = \text{RHS}_1\]

Observe that if \overline{TB} is the tangent to the cycloid, then:

    \[\frac{dy}{dx} = \frac{\text{Rise}}{\text{Run}} = -\frac{\overline{AB}}{\overline{TA}} = -\tan(A\widehat{T}B) = -\tan(90^\circ - \theta) = \text{RHS}_2\]

Simplifying \text{RHS}_1 and \text{RHS}_2 further:

(12)   \begin{equation*}\begin{split} \text{RHS}_1 &= \frac{-2 \sin(\theta) \cos(\theta)}{1 - (1 - 2 \sin^2(\theta))} = -\frac{\cos(\theta)}{\sin(\theta)} = -\cot(\theta) \\ \text{RHS}_2 &= -\frac{\sin(90^\circ - \theta)}{\cos(90^\circ - \theta)} = -\frac{\cos(\theta)}{\sin(\theta)} = -\cot(\theta) \end{split} \end{equation*}

    \[\text{RHS}_1 = \text{RHS}_2 \Rightarrow \frac{dy}{dx} = -\tan(A\widehat{T}B)\]

Therefore, \overline{TB} is the tangent to the cycloid at P, meaning \beta is in fact the same angle as \theta.
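The tangency claim can also be spot-checked numerically. In the Python sketch below, which is an illustration rather than part of the proof, B is assumed to sit at (R\alpha, -2R), the bottom of the generating circle implied by the construction above. The function names are hypothetical.

```python
import math

def tangent_slope(alpha):
    """dy/dx of the cycloid from the parametric equations (R cancels)."""
    return -math.sin(alpha) / (1.0 - math.cos(alpha))

def slope_through_bottom(R, alpha):
    """Slope of the line from P to B, the lowest point of the circle."""
    px = R * alpha - R * math.sin(alpha)
    py = -R + R * math.cos(alpha)
    bx, by = R * alpha, -2.0 * R  # assumed position of B
    return (by - py) / (bx - px)

def max_slope_gap(R=1.0, samples=200):
    """Largest difference between the two slopes for 0.1 <= alpha <= 3.0."""
    worst = 0.0
    for i in range(samples + 1):
        alpha = 0.1 + 2.9 * i / samples
        gap = abs(tangent_slope(alpha) - slope_through_bottom(R, alpha))
        worst = max(worst, gap)
    return worst

print(max_slope_gap())  # expected to be at the level of rounding error
```

The range of \alpha is kept away from 0 and \pi only to avoid dividing by very small numbers; it is a choice made for this sketch, not a restriction of the result.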

Fulfilling The Least-Time Condition

Since \beta = \theta, showing the cycloid’s fulfilment of the condition \frac{\sin(\theta)}{v} = k can be done through basic angle-chasing and geometry.

Fig 15 | The triangle \triangle TAP

Because \triangle TAP is right-angled at P (since A\widehat{P}B = 90^\circ and T, P and B are collinear) and X lies on \overline{TA}, we have:

    \[X\widehat{A}P = 180^\circ - (A\widehat{T}P + 90^\circ) = 180^\circ - (90^\circ - \theta + 90^\circ) = \theta\]

Hence, since \overline{XP} = -y is the vertical height fallen by the ball:

    \[\sin(X\widehat{A}P) = \sin(\theta) = \frac{\overline{XP}}{\overline{AP}} \Rightarrow \overline{AP} = \frac{\overline{XP}}{\sin(\theta)} = \frac{-y}{\sin(\theta)}\]

Similarly, for A\widehat{B}T:

    \[\sin(A\widehat{B}T) = \sin(\theta) = \frac{\overline{AP}}{\overline{AB}} = \frac{-y}{2R\sin(\theta)} \Rightarrow \frac{\sin^2(\theta)}{-y} = \frac{1}{2R}\]

Recalling (4.2.3) and undoing the substitutions made in the previous section yields:

    \[\frac{\sin^2(\theta)}{-y} = \frac{2g\sin^2(\theta)}{v^2} = \frac{1}{2R} \Rightarrow \frac{\sin^2(\theta)}{v^2} = \frac{1}{4gR}\]

    \[\therefore \frac{\sin(\theta)}{v} = \frac{1}{2\sqrt{gR}} = k\]

By the geometry of the tangent and generating circle, therefore, the least-time condition identified in (4.2.1) is satisfied at every point along the inverted cycloid — which has been demonstrated without resorting to tedious integration.
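As a numerical cross-check of this result, the ratio \frac{\sin(\theta)}{v} can be evaluated at several points along the cycloid. The Python sketch below is an illustration only: it assumes g = 9.81 m/s², takes \theta as the angle between the tangent and the vertical, uses v^2 = -2gy for the speed, and the names `snell_ratio` and `least_time_constant` are hypothetical.

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2

def snell_ratio(R, alpha):
    """sin(theta)/v at the point of the cycloid with parameter alpha."""
    # Tangent direction from the parametric equations.
    dx = R * (1.0 - math.cos(alpha))
    dy = -R * math.sin(alpha)
    sin_theta = dx / math.hypot(dx, dy)  # theta measured from the vertical
    y = -R + R * math.cos(alpha)
    v = math.sqrt(-2.0 * G * y)          # speed after falling a height -y
    return sin_theta / v

def least_time_constant(R):
    """The constant k = 1 / (2*sqrt(g*R)) obtained geometrically."""
    return 1.0 / (2.0 * math.sqrt(G * R))

R = 0.573  # illustrative radius
for alpha in (0.5, 1.0, 2.0, 3.0):
    print(alpha, snell_ratio(R, alpha), least_time_constant(R))
```

Each printed ratio should agree with k to within rounding error, whichever value of \alpha is chosen.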

Conclusion

Initial investigation of motion along one and two line segments provided the insights necessary to minimise descent time along n connected lines. In the continuous limit, this condition implied a differential equation, the solution of which describes an inverted cycloid. Geometric analysis7 confirmed this property of the cycloid, and had the advantage of requiring less tedious algebra and the assumption of ‘smoothness’. Without the machinery of variational calculus, however, neither approach can incorporate the initial and final boundary conditions.

Therefore, for a point mass, the curve which minimises the time of descent between two points is an inverted cycloid, which can be represented parametrically by the equations

    \[x = R\alpha - R\sin(\alpha)\]


    \[y = -R + R\cos(\alpha)\]

which have been derived in an entirely ‘ground-up’ manner, starting with basic mechanics and proceeding using methods (an inductive proof, contradiction, simple integration, and angle-chasing in the geometric approach) which should be entirely familiar to high school students.

Important Notation

References

  1. J. Bernoulli. Solutio problematis a se in Actis 1696, propositi, de invenienda linea brachystochrona. Acta Eruditorum. pg. 206–211, 1697.
  2. M. A. Lerma. A simple derivation of the equation for the brachistochrone curve. Northwestern University, 2023.
  3. G. Lawlor. A new minimization proof for the brachistochrone. The American Mathematical Monthly. Vol 103, pg. 242–249, 1996.
  4. H. Erlichson. Johann Bernoulli’s brachistochrone solution using Fermat’s principle of least time. European Journal of Physics. Vol 20, pg. 299–304, 1999.
  5. R. T. Boute. The brachistochrone problem solved geometrically: a very elementary approach. Mathematics Magazine. Vol 85, pg. 193–199, 2012.
  6. G. Brookfield. Yet another elementary solution of the brachistochrone problem. Mathematics Magazine. Vol 83, pg. 104–110, 2010.
  7. S. Hayes. The Brachistochrone problem. M500 Society. Vol 291, pg. 1–7, 2019.
  8. A. Tan, A. K. Chilvery, M. Dokhanian. Dynamical variables in brachistochrone problem. Lat. Am. J. Phys. Educ. Vol 6, pg. 196–199, 2012.
  9. Y. Nishiyama. The brachistochrone curve: the problem of quickest descent. International Journal of Pure and Applied Mathematics. Vol 82, pg. 409–419, 2013.
  10. S. Mertens, S. Mingramm. Brachistochrones with loose ends. European Journal of Physics. Vol 29, pg. 795–802, 2008.
  11. S. R. Bistafa. Euler’s navigation variational problem. Euleriana. Vol 2, pg. 131–142, 2022.
  12. G. L. Silva, M. A. D. Pereira. Infinite horizon problems in the calculus of variations: the role of transformations with an application to the brachistochrone problem. Set-Valued and Variational Analysis. Vol 27, pg. 201–224, 2018.
  13. C. Criado, N. Álamo. Solving the brachistochrone and other variational problems with soap films. American Journal of Physics. Vol 78, pg. 1400–1405, 2010.
  14. S. Gómez-Aíza, R. W. Gómez, V. Marquina. A simplified approach to the brachistochrone problem. European Journal of Physics. Vol 27, pg. 1091–1098, 2006.
  15. M. A. Lerma. A simple derivation of the equation for the brachistochrone curve. Northwestern University, 2023.
  16. H. Erlichson. Johann Bernoulli’s brachistochrone solution using Fermat’s principle of least time. European Journal of Physics. Vol 20, pg. 299–304, 1999.
  17. G. Lawlor. A new minimization proof for the brachistochrone. The American Mathematical Monthly. Vol 103, pg. 242–249, 1996.
  18. H. W. Broer. Bernoulli’s light ray solution of the brachistochrone problem through Hamilton’s eyes. International Journal of Bifurcation and Chaos. Vol 24, 2014.
  19. H. J. Sussmann, J. C. Willems. Contemporary trends in nonlinear geometric control theory and its applications. pg. 113–166, 2002.
  20. K. Kim, J.-H. Ee, K. Kim, U.-R. Kim, J. Lee. Mechanical Snell’s law. Journal of the Korean Physical Society. Vol 76, pg. 281–290, 2020.
  21. M. G. Katz, D. M. Schaps, S. Shnider. Almost equal: the method of adequality from Diophantus to Fermat and beyond. Perspectives on Science. Vol 21, pg. 283–314, 2012.
  22. P. G. Ciarlet, C. Mardare. On the brachistochrone problem. Communications in Mathematical Analysis and Applications. Vol 1, pg. 213–240, 2022.
  23. J. L. Troutman. Variational calculus and optimal control: Optimization with elementary convexity, 2nd ed. pg. 66–68, 1996.
  24. P. Kosmol. Bemerkungen zur brachistochrone. Abhandlungen aus dem Mathematischen Seminar der Universität Hamburg. Vol 54, pg. 91–94, 1984.
