Swing-up control of double-inverted pendulum systems

This article presents a new swing-up control approach for a double-inverted pendulum on a trolley. The dynamic model of the double-inverted pendulum is derived and linearized using two approaches: first, the traditional Taylor series approach and, second, partial linearization. A state feedback control algorithm is implemented based on the model linearized by the Taylor series. Furthermore, a method for swinging the pendulums up from rest to the inverted position (swing-up) is presented. The swing-up function is designed and implemented using the partially linearized model. The swing-up procedure combines feedforward and feedback controllers to transfer the pendulums from the downward to the upward position, and a time-variant controller gain is used during the swing-up. The performance of these algorithms is shown through simulations.


Introduction
Non-linear, unstable, or underactuated systems are generally very difficult to control. They are therefore considered a challenge and are the subject of many technical reports, student projects, and academic papers. The goal of such work is not always to build a practically useful device but rather to develop or improve control algorithms, either in simulation or experimentally, and to verify their general applicability. The double-inverted pendulum model consists of several parts. A cart moves along a linear track, and two pendulums are connected to the carriage and to each other by rotational joints (Fig. 1). The input to the system is the torque of a DC motor, which is transmitted by a toothed belt as a force to the carriage. The system has one challenging equilibrium position in which it can be stabilized, the upper (inverse) position (see Fig. 1a-c). Three further configurations of the double-inverted pendulum, namely upper-lower, lower-upper, and lower, are also equilibrium positions. The lower equilibrium is always reached by the system (with the damping element) in finite time; control is needed to maintain the upper-lower, lower-upper, and inverse configurations in real conditions. In this paper, stabilization of the double-inverse pendulum is understood as regulation that maintains its inverse position.
The next task of the control program is the so-called swing-up. This term refers to a movement of the carriage that brings the system from the lower equilibrium position to the inverse position. The situation is indicated in Fig. 1c. The primary control requirements are the robustness and repeatability of the swing-up. Several published articles address the swing-up problem of a double-inverted pendulum with different methods. However, in the vast majority of cases, these are only simulations that have not been experimentally verified on a real system.
A mathematical model of this system was derived based on Lagrangian mechanics; the dynamic model was discretized, and the Laguerre series was then used within model predictive control to obtain the control signal for the system (Qian et al., 2011). An RNA genetic algorithm with fuzzy logic has been used to control the pendulum system, where the RNA genetic algorithm finds optimal membership functions that improve the performance of the fuzzy logic controller (Sun et al., 2015). A technical report compares the linear quadratic regulator, the state-dependent Riccati equation (SDRE), and the use of neural networks (NNs) and concludes that NNs have a limited capability to improve the SDRE performance (Bogdanov et al., 2004). Adaptive sliding-mode control in combination with a fuzzy neural network has also been used to control a double-inverted pendulum: the fuzzy neural network is designed as the system controller, and the adaptive sliding mode is designed to handle the disturbance problem (Mon and Lin, 2014).
In the previous literature, the dynamic model of the double-inverted pendulum system is linearized around an operating point to design a linear controller. An alternative to this linearization is partial feedback linearization, which splits the generalized coordinates into a controlled part, whose dimension is given by the number of inputs, and the remaining variables, which constitute the zero dynamics (Hedrick and Girard, 2010; Neusser and Valášek, 2013). This form of linearization is used to design the swing-up function of the double-inverted pendulum (Hedrick and Girard, 2010). A more detailed description of the method can be found in a practical swing-up control example (Neusser and Valášek, 2013). A swing-up control of the double-inverted system has been proposed that separates the pendulums and controls each one distinctly (Henmi et al., 2014). Non-linear model predictive control has been used to build a control algorithm for the swing-up motion (Jaiwat and Ohtsuka, 2014). A method for controlling the energy of the system with partial linearization has been presented, utilizing passivity-based control (Zhong and Rock, 2001). The solution of a boundary value problem with free parameters has been used to generate the control approach (Graichen et al., 2007).
The motivation of this work was mainly to demonstrate the theoretical swing-up control procedures on a model of a double-inverted pendulum. Indeed, there are not many publications that deal with the application of the proposed algorithms on a real mechanism. The paper first lays the theoretical basis for the parts that follow, consisting of an explanation of the terms used and a survey of the studied area in terms of stabilization and swing-up. It also includes the derivation of the equations of motion and the presentation of the existing double-pendulum model. Furthermore, this study applies a linear quadratic controller and a Kalman filter and, lastly, implements the swing-up function. The boundary value problem is solved to generate the trajectory for the swing-up motion of the pendulums. The partial linearization method is used to linearize the system dynamically during the swing-up process. Dynamic input-output decoupling is used to keep the system in a stable position after the pendulums reach the upper unstable position (Qian et al., 2011).
The rest of the paper is organized as follows: in the following section, the mathematical model is derived; next, the stabilization control is presented; after that, the swing-up procedure is detailed; later, the simulation of the work is addressed, and the article conclusions are shown in the last section.

Mathematical modeling
Figure 2 schematically shows the double-inverted pendulum model. The meaning of the individual variables is evident from the figure. The system has 3 degrees of freedom: the sliding motion of the carriage and the rotational motions of the two pendulums. The vector of state variables consists of the displacement of the carriage and the rotations of the pendulums, followed by the velocity of the carriage and the angular velocities of the pendulums:
$x = [\theta_0, \theta_1, \theta_2, \dot{\theta}_0, \dot{\theta}_1, \dot{\theta}_2]^{T}$. (1)
For input $u$, obtaining a mathematical model amounts to finding an equation of motion of the form $\dot{x} = f(x, u)$. Lagrange's equations of the second kind were used for this purpose. The mathematical model of the double-inverse pendulum can then be expressed as the sum of or difference between the partial derivatives of the mechanical energies with respect to the respective independent variables and their derivatives. Here, $E_K$ and $E_P$ are the kinetic and the potential energy, $D$ denotes the dissipative component, $W_e$ denotes the work of external forces, and $P_e$ denotes their power. For the individual bodies, the kinetic, potential, and dissipative energies and the external forces can be expressed (Bogdanov, 2004).
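For orientation, Lagrange's equations of the second kind with a dissipative term are commonly written as shown below. This is a textbook form given here as an assumption: the sign convention and the use of the power $P_e$ (rather than the work $W_e$) on the right-hand side may differ from the authors' own equation. The coordinates $\theta_j$, $j = 0, 1, 2$, denote the cart displacement and the two pendulum angles.

```latex
\[
\frac{d}{dt}\left(\frac{\partial E_K}{\partial \dot{\theta}_j}\right)
- \frac{\partial E_K}{\partial \theta_j}
+ \frac{\partial E_P}{\partial \theta_j}
+ \frac{\partial D}{\partial \dot{\theta}_j}
= \frac{\partial P_e}{\partial \dot{\theta}_j},
\qquad j = 0, 1, 2 .
\]
```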
The velocities of the centers of gravity of the pendulums, which enter the kinetic energy, can be obtained from the time derivative of their position vectors.
The potential energy and the dissipative component are expressed for each body in the same way. In Eqs. (4)-(8), $\theta_j$ represents the motion variable, $m_j$ is the mass, and $b_j$ is the viscous friction constant, where $j = 0, 1, 2$ stands for the cart and the first and second pendulum, respectively. $I_0$ is the inertia of the pulley, and $l_1$ and $l_2$ are the lengths of the pendulums.
The only input to the system is the moment $\mu$ on the motor shaft. However, for better understanding, the equations were derived for the force $F$ on the carriage. The two quantities are directly proportional, $F = \mu / R_0$, where $R_0$ is the radius of the pulley. The power of the external force is then $P_e = F \dot{\theta}_0$. Substituting the above equations into Eq. (2) gives the resulting equation of motion, which can be written in standard matrix form in terms of the vector $q = [\theta_0, \theta_1, \theta_2]^T$; the matrices themselves are not listed here for brevity. The equation of motion in the form of Eq. (5) can also be rewritten as a first-order state equation built from block matrices, in which 0 represents the zero matrix and $I$ represents the unit matrix of the corresponding dimensions.
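To make the structure of these two forms concrete, the following sketch writes them out under an assumed layout of the standard matrix form. The input vector $H = [1, 0, 0]^T$ and the exact grouping of terms are assumptions made for illustration; only the names $M$, $B$, $K$, $q$, $F$, 0, and $I$ come from the text.

```latex
% Assumed standard matrix form; H = [1, 0, 0]^T is a hypothetical input vector
\[
M(q)\,\ddot{q} + B(q,\dot{q})\,\dot{q} + K(q) = H\,F,
\qquad q = [\theta_0,\ \theta_1,\ \theta_2]^T
\]
% Equivalent first-order form with x = [q^T, \dot{q}^T]^T
\[
\dot{x} =
\begin{bmatrix} 0 & I \\ 0 & -M^{-1}B \end{bmatrix} x
+ \begin{bmatrix} 0 \\ -M^{-1}K \end{bmatrix}
+ \begin{bmatrix} 0 \\ M^{-1}H \end{bmatrix} F
\]
```

The second line follows from solving the first for $\ddot{q}$, which is possible because the mass matrix $M(q)$ is invertible.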
To design a linear controller, the non-linear equation of motion must first be linearized. It is common to approximate it around the nominal operating point $(\theta_n, u_n)$ by the first-order terms of the Taylor series (Franklin et al., 2002). The resulting equation of motion takes the state-space form $\dot{x} = A x + B u$, where the state and input are understood as deviations from the operating point.

Stabilization control of the system
In this section, attention is paid to the possibilities of stabilizing the double-inverted pendulum. The LQR and feedback linearization are described in detail as the theoretical basis for the subsequent swing-up design.
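The linear designs discussed in this section start from the matrices $A$ and $B$ of the Taylor-linearized model. As a minimal sketch of how such matrices can be obtained numerically, the code below approximates the Jacobians by central finite differences. The function `pendulum` is a hypothetical stand-in (a damped single pendulum with assumed parameters), not the double-pendulum model of this article, which was linearized symbolically.

```python
import math

def pendulum(x, u, g=9.81, l=0.5, b=0.1):
    """Hypothetical stand-in dynamics: damped single pendulum,
    angle measured from the upright position, input u as angular acceleration."""
    theta, omega = x
    return [omega, (g / l) * math.sin(theta) - b * omega + u]

def linearize(f, x0, u0, eps=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du at (x0, u0)."""
    n = len(x0)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x0), list(x0)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp, u0), f(xm, u0)
        for i in range(n):
            A[i][j] = (fp[i] - fm[i]) / (2 * eps)
    fp, fm = f(x0, u0 + eps), f(x0, u0 - eps)
    B = [(fp[i] - fm[i]) / (2 * eps) for i in range(n)]
    return A, B

# Linearize around the upright equilibrium with zero input
A, B = linearize(pendulum, [0.0, 0.0], 0.0)
```

For this stand-in, the second row of `A` should approach `[g / l, -b]`, which is the analytic first-order Taylor term.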

Linear quadratic regulator (LQR)
The objective of this method is to design the optimal control of a linear system by weighing the magnitude of the control intervention $u$ against the deviation of the states $x$ from the zero (desired) value over time. In practice, the objective is to minimize the cost function
$J = \int_0^{\infty} \left( x^T Q_c x + u^T R_c u \right) dt$ (6)
with the state feedback law $u = -K_c x$. For discrete systems, the integral in Eq. (6) is replaced by a summation. The matrices $Q_c$ and $R_c$ are the weights of the states and inputs, respectively (Franklin et al., 2002). $K_c$ is the control matrix that is subsequently used for stabilization and is given by the solution of the associated Riccati equation. LQR is a widely used and well-documented method. Implementation-wise, the simplest modification of LQR for non-linear systems is the state-dependent Riccati equation (SDRE). The principle of that method is to linearize the model around the current state at each time instant and then compute the optimal control matrix, which places greater demands on computational power.

Feedback linearization
In contrast to the usual linear function approximation, feedback linearization does not neglect the non-linear terms and works even outside the vicinity of the operating point. Two principles are used in combination in this work: input-state and input-output linearization.
The necessary algebraic operations are generally not trivial. For the sake of scope, the system is restricted to the single-input, single-output (SISO) case described by $\dot{x} = f(x) + g(x) u$. Input-state linearization consists of finding a suitable transformation of the states, $z = T(x)$, and of the input $u$ into a new input $v$ such that the transformed system is linear, $\dot{z} = A z + B v$. The role of regulation is thereby transferred to the control of this linear system, for example by the state feedback $v = -K z$. The linearization conditions, a detailed description, and examples are given in Hedrick and Girard (2010) and Graichen et al. (2007).
We assume that the output $y$ of the considered system is given by the function $y = h(x)$. The principle of input-output linearization is to find the direct dependence of the output on the input by successively differentiating $h$ with respect to time, $\dot{y} = L_f h(x) + L_g h(x)\,u$, where $L_f h = (\partial h / \partial x) f$ denotes the Lie derivative, and so on until, for the $k$th derivative, the input term is non-zero: $y^{(k)} = L_f^k h(x) + L_g L_f^{k-1} h(x)\,u$ (Aguiar, 2011). The new input is then prescribed as $u = (v - L_f^k h(x)) / (L_g L_f^{k-1} h(x))$, where $v$ is equal to $y^{(k)}$. The transformed systems prescribed by Eqs. (22, 23) are completely linear and can therefore be controlled by linear methods (LQR, pole placement). A schematic of the closed-loop control is shown in Fig. 3. For systems with $n$ inputs and $m$ outputs (MIMO), where $n = m$, the procedure is more complicated (we refer to static input-output decoupling), but the basic principles remain the same (Hedrick and Girard, 2010; Henson and Seborg, 1997) and lead to time-invariant feedback control.
A Kalman filter is also used in this work. It is a very useful tool that extracts the best possible estimate of all the states of the system (even if not all of them are measured) from imperfect knowledge of the model and inaccurate measurements (Nise, 2020). It is a stochastic observer: it assumes that the quantities are random with a Gaussian distribution and works with their mean and variance (uncertainty). Tuning of the algorithm consists of an appropriate choice of the matrices $Q_o$ and $R_o$, which introduce the covariances of the process and measurement noise into the calculation. At each step, the state estimate and the covariance matrix $P_o$ are updated; the latter carries information about the uncertainty of the states and their correlation with each other.
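The predict and correct cycle of a Kalman filter can be illustrated on a scalar system. The sketch below is a deliberately reduced example, assuming a constant true state and a linear measurement; the values of `Q_o`, `R_o`, and the noise level are illustrative assumptions, and the extended (non-linear) filter used for the pendulum is not reproduced here.

```python
import random

def kalman_step(x_est, P_o, y, a=1.0, c=1.0, Q_o=1e-5, R_o=1e-2):
    """One cycle for x[k+1] = a*x[k] + w, y[k] = c*x[k] + v (scalar case)."""
    # Prediction: propagate the estimate and its variance through the model
    x_pred = a * x_est
    P_pred = a * P_o * a + Q_o
    # Correction: blend prediction and measurement using the Kalman gain
    gain = P_pred * c / (c * P_pred * c + R_o)
    x_new = x_pred + gain * (y - c * x_pred)
    P_new = (1.0 - gain * c) * P_pred
    return x_new, P_new

random.seed(0)                 # fixed seed so the run is repeatable
x_true = 1.0                   # assumed constant true state
x_est, P_o = 0.0, 1.0          # poor initial guess with large uncertainty
for _ in range(200):
    y = x_true + random.gauss(0.0, 0.1)   # noisy measurement, std 0.1
    x_est, P_o = kalman_step(x_est, P_o, y)
```

A larger `Q_o` makes the filter trust the measurements more, while a larger `R_o` makes it trust the model more; this is the tuning trade-off described above.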

Swing-up control
The key to planning a swing-up is finding a suitable trajectory for the carriage and pendulums so that the positions and velocities of the carriage and pendulums are zero when the swing-up is complete. Whatever the method of obtaining the swing-up trajectory (or the necessary input for the swing-up), the control can be divided structurally into feedforward (forward) and feedback (backward) control. Feedforward control can only be used if the behavior of the double-inverted pendulum is well known. A previously calculated input is applied to the system, and a certain output is expected. However, the double-inverse pendulum is extremely sensitive to initial conditions. Even a small deviation from the calculated trajectory can cause a failed swing-up. A possible solution is to add feedback control. If $u_f$ denotes the input given by the forward control and $u_b$ that given by the backward control, the resulting input is their sum, $u = u_f + u_b$ (Fig. 4).

Boundary value problem (BVP) with free parameters
Finding a trajectory for the swing-up function is a boundary value problem. The equation of motion (1) is a system of ordinary differential equations (ODEs), and the boundary conditions (24) prescribe the state at the start, $t = 0$, and at the end, $t = T$, of the swing-up, where $t$ is the time and $T$ is the duration of the swing-up. With 6 equations and 12 conditions, the associated boundary value problem is overdetermined. In order to solve it, Qian et al. (2011) propose defining the cart trajectory $Y(t)$ by the cosine series (13) and thus adding the necessary number of free parameters $p = [p_1, p_2, p_3, p_4]$ to the equation of motion.
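A cosine series with four free parameters can be sketched as follows. The exact form of Eq. (13) is not reproduced in this text, so the series below is an assumption: it was chosen because, with $a_0 = -(p_1 + p_3)$ and $a_1 = -(p_2 + p_4)$, it vanishes at both $t = 0$ and $t = T$. The parameter values in the usage line are arbitrary.

```python
import math

def cosine_series(t, T, p):
    """Assumed setup function:
    Y(t, p) = a0 + a1*cos(pi*t/T) + sum_i p_i*cos((i+1)*pi*t/T), i = 1..4."""
    p1, p2, p3, p4 = p
    a0 = -(p1 + p3)   # fixed by the end conditions Y(0, p) = Y(T, p) = 0
    a1 = -(p2 + p4)
    y = a0 + a1 * math.cos(math.pi * t / T)
    for i, p_i in enumerate(p, start=1):
        y += p_i * math.cos((i + 1) * math.pi * t / T)
    return y

T = 1.85                          # swing-up duration in seconds
p = [0.3, -1.2, 0.7, 0.4]         # arbitrary illustrative parameters
y_start = cosine_series(0.0, T, p)
y_end = cosine_series(T, T, p)
```

Both `y_start` and `y_end` are zero up to rounding for any choice of `p`, so the four parameters remain free for the boundary value solver.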
The terms $a_0 = -(p_1 + p_3)$ and $a_1 = -(p_2 + p_4)$ are obtained by setting $Y(0, p) = Y(T, p) = 0$, a condition that arises when Eq. (13) is differentiated with respect to time. The boundary value problem can then be solved by a suitable numerical method. If the deviation of the actual state values from the nominal trajectory $\theta^*$ and input $u^*$ is sufficiently small, the system can be described by a linear time-dependent state equation (Zhong and Rock, 2001), $\Delta\dot{\theta} = A(t)\,\Delta\theta + B(t)\,\Delta u$ with $\Delta\theta = \theta - \theta^*$ and $\Delta u = u - u^*$, where the matrices $A$ and $B$ are given by the linearization of Eq. (12) (see "Feedback linearization" section) along the nominal trajectory and input. For such a system, a linear controller $\Delta u = -K(t)\,\Delta\theta$ is designed at each time instant, where the control matrix $K$ is computed in advance. Its calculation therefore does not burden the processor on which the swing-up and subsequent stabilization program runs. The resulting forward and backward control actions are $u_f = u^*(t)$ and $u_b = -K(t)\,(\theta - \theta^*(t))$.
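Because the gains are computed in advance, the run-time work reduces to a table lookup and one inner product. The sketch below illustrates this under assumptions: the gain table `gains`, its sample time `dt`, and the function names are hypothetical, and linear interpolation between samples is used.

```python
def interp_gain(t, dt, gains):
    """Linearly interpolate a precomputed gain table sampled every dt seconds.
    `gains` is a list of gain rows and needs at least two entries."""
    idx = max(min(int(t / dt), len(gains) - 2), 0)
    w = min(max(t / dt - idx, 0.0), 1.0)   # clamp to the table range
    return [(1 - w) * k0 + w * k1 for k0, k1 in zip(gains[idx], gains[idx + 1])]

def control(t, x, x_nom, u_nom, dt, gains):
    """Sum of forward action u_f = u_nom and backward action u_b = -K(t)(x - x_nom)."""
    K = interp_gain(t, dt, gains)
    u_b = -sum(k * (xi - xn) for k, xi, xn in zip(K, x, x_nom))
    return u_nom + u_b

# Hypothetical two-state gain table with a 10 ms sample time
gains = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
K_mid = interp_gain(0.005, 0.01, gains)
u = control(0.005, [0.1, 0.0], [0.0, 0.0], 0.5, 0.01, gains)
```

When the state lies exactly on the nominal trajectory, the backward term vanishes and the function returns the nominal input unchanged.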

Partial linearization and feedforward
By studying the natural motion of a system (without control), some useful knowledge can be gained. In order to calculate a suitable swing-up trajectory, the natural frequencies of the system must be found. This approach builds on the results of feedback linearization (FBL).
For the sake of partial linearization, the vector of generalized coordinates $q$ of the double-inverse pendulum is divided into two parts: $q_x$ denotes the controlled part (the carriage), and $q_\theta$ denotes the rest with zero dynamics (the pendulums). With this decomposition, Eq. (5) is split into one equation for $q_x$ (Eq. 29) and one for $q_\theta$ (Eq. 30), and the matrices $M$, $B$, and $K$ are partitioned into corresponding blocks such as $M_{xx}$, $M_{x\theta}$, $M_{\theta x}$, and $M_{\theta\theta}$. In the partitioned matrices, 0 represents the zero vector, and the numbers in parentheses indicate the dimension of each block. From Eq. (30), $\ddot{q}_\theta$ is expressed and substituted into Eq. (29). If the output of the system is $y = q_x$, then feedback linearization of the resulting equation gives Eq. (32), in which, to simplify the notation, $W = -M_{x\theta} M_{\theta\theta}^{-1}$.
The principle further consists of finding an input $v$ for which the boundary conditions in Eq. (24) hold. In general, it is given by the sum of two parts. The first makes the cart move a certain distance from the initial state at time $T$. However, since the end position of the cart is supposed to be identical to the initial position, i.e., $\theta_0(0) = \theta_0(T) = 0$, this component is zero in the present case. The second component does not change the final position of the cart, but due to the coupling via the zero dynamics, it causes the pendulums to swing; its form is taken from Hedrick and Girard (2010).
https://doi.org/10.5194/ms-15-47-2024 Mech. Sci., 15, 47-54, 2024
The amplitudes $A_i$ are related to the change in the potential energy of the system. However, their exact values, as well as the phase shifts $\varphi_i$, result from an optimization process that enforces the boundary conditions for the $q_\theta$ coordinates as well. A corresponding relation involving natural numbers $K_i$ is given by Neusser and Valášek (2013).
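To make the role of $W$ concrete, the elimination step of the partial linearization can be written out as below. This is a sketch under assumptions: the vectors $n_x$ and $n_\theta$ are shorthand introduced here for the partitioned damping and stiffness terms ($B\dot{q} + K$), and the force $F$ is assumed to act only on the $q_x$ equation.

```latex
% Assumed partitioned equations of motion (n_x, n_theta are shorthand introduced here)
\[
M_{xx}\,\ddot{q}_x + M_{x\theta}\,\ddot{q}_\theta + n_x = F, \qquad
M_{\theta x}\,\ddot{q}_x + M_{\theta\theta}\,\ddot{q}_\theta + n_\theta = 0
\]
% Solve the second equation for the pendulum accelerations
\[
\ddot{q}_\theta = -M_{\theta\theta}^{-1}\left(M_{\theta x}\,\ddot{q}_x + n_\theta\right)
\]
% Substitute into the first equation, using W = -M_{x\theta} M_{\theta\theta}^{-1}
\[
\left(M_{xx} + W M_{\theta x}\right)\ddot{q}_x + n_x + W n_\theta = F
\]
% Linearizing input: this choice of F yields \ddot{q}_x = v
\[
F = \left(M_{xx} + W M_{\theta x}\right) v + n_x + W n_\theta
\quad\Longrightarrow\quad \ddot{q}_x = v
\]
```

With this choice, the cart behaves as a double integrator driven by $v$, while the pendulums obey $M_{\theta\theta}\,\ddot{q}_\theta + n_\theta = -M_{\theta x}\,v$, which is the coupling through which $v$ makes them swing.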

Simulations
The procedure for calculating the linearized model and the gains of the controller and observer is shown in Fig. 5. To avoid errors as far as possible, after deriving Eq. (5), all further manipulations were performed using the Symbolic Math Toolbox in MATLAB. In this way, the equation of motion (12) and its linearized form, given by the state-space matrices $A$ and $B$, were obtained. These are complex relations with a considerable number of terms. The system was further simulated in the Simulink environment with the ode4 (Runge-Kutta) solver and a constant time step of 0.001 s. The verification of the equation of motion (12) was successful: it faithfully describes the model under consideration, which is one of the basic prerequisites for the design of a successful control algorithm. The parameters used in the simulation are shown in Table 1.
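The fixed-step scheme behind a fourth-order Runge-Kutta solver can be sketched in a few lines. The code below is a plain Python illustration, not the Simulink implementation; it uses a harmonic oscillator as a hypothetical test system because its exact solution is known, together with the 0.001 s step size quoted above.

```python
import math

def rk4_step(f, t, x, h):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k1)])
    k3 = f(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k2)])
    k4 = f(t + h, [xi + h * ki for xi, ki in zip(x, k3)])
    return [xi + h / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def oscillator(t, x):
    """Hypothetical test system x'' = -x written in first-order form."""
    return [x[1], -x[0]]

h = 0.001                      # constant time step in seconds
x = [1.0, 0.0]                 # initial position and velocity
for k in range(1000):          # integrate from t = 0 to t = 1 s
    x = rk4_step(oscillator, k * h, x, h)

# Deviation from the exact solution x(t) = cos(t) at t = 1 s
err = abs(x[0] - math.cos(1.0))
```

Comparing against a known solution in this way is one simple check that the integrator and step size are adequate before the full model is simulated.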

Stabilization control simulation
In Sect. 3.1 of this article, the method of obtaining the linear equations of the system described by the matrices $A$ and $B$ was explained. The optimal control gains $K$ were calculated based on this model and the given weight matrices $Q_c$ and $R_c$. An extended Kalman filter was used as an observer to perform the correction
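The calculation of optimal gains from a linear model and weight matrices, as mentioned in this subsection, can be illustrated with a small discrete-time example. The sketch below iterates the discrete Riccati equation for a single-input system. It uses a discretized double integrator with assumed weights as a stand-in; the pendulum matrices and the authors' MATLAB routine are not reproduced, and the helper names are hypothetical.

```python
def mat_mul(X, Y):
    """Product of two matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def dlqr_single_input(A, B, Q, r, iters=3000):
    """Iterate P = Q + A'PA - A'PB (r + B'PB)^-1 B'PA and return the gain row K."""
    n = len(A)
    At, Bt = transpose(A), transpose(B)
    P = [row[:] for row in Q]
    K = [0.0] * n
    for _ in range(iters):
        PA, PB = mat_mul(P, A), mat_mul(P, B)
        s = r + mat_mul(Bt, PB)[0][0]            # scalar r + B'PB
        K = [v / s for v in mat_mul(Bt, PA)[0]]  # K = (r + B'PB)^-1 B'PA
        AtPA, AtPB = mat_mul(At, PA), mat_mul(At, PB)
        P = [[Q[i][j] + AtPA[i][j] - AtPB[i][0] * K[j] for j in range(n)]
             for i in range(n)]
    return K

def simulate(A, B, K, x0, steps):
    """Closed-loop response x[k+1] = A x[k] + B u[k] with u = -K x."""
    x = list(x0)
    for _ in range(steps):
        u = -sum(k * xi for k, xi in zip(K, x))
        x = [sum(a * xi for a, xi in zip(row, x)) + b[0] * u
             for row, b in zip(A, B)]
    return x

dt = 0.01
A = [[1.0, dt], [0.0, 1.0]]          # discretized double integrator (stand-in)
B = [[dt * dt / 2], [dt]]
Q_c = [[1.0, 0.0], [0.0, 1.0]]       # assumed state weights
R_c = 1.0                            # assumed input weight
K = dlqr_single_input(A, B, Q_c, R_c)
x_final = simulate(A, B, K, [1.0, 0.0], 3000)
```

Increasing the entries of `Q_c` penalizes state deviations more strongly and yields larger gains, whereas increasing `R_c` penalizes the control effort and yields smaller gains.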

Swing-up control simulation
In this section, the simulation results for the swing-up trajectories and the subsequent application of feedforward and feedback control are addressed. The MATLAB numerical solver bvp4c was used to find suitable swing-up trajectories according to the method described in Sect. 4.1. For the double-pendulum model in which the input is the acceleration of the trolley, the boundary conditions (24) were considered, except for the end position of the cart and its velocity, for which the boundary values are guaranteed by choosing the input as the cosine series (13). A time-dependent linear controller was designed along the nominal swing-up trajectory (with a time step of 10 ms and subsequent linear interpolation between the calculated points). It turned out that, over a certain time interval, the controller gains undergo abrupt changes in magnitude and sign. Since the linear controller was derived from a double-pendulum model in which the input is directly the acceleration of the carriage, its control action also has the dimension of acceleration. Therefore, it has to be converted to a force based on the current states according to Eq. (32). In view of this and the fact that the system is often at the limit of controllability or loses it completely at these moments, it was decided to disable the feedback control for a short time.
In Fig. 7, the behavior of the states during the swing-up is shown. The control approach followed the derived trajectory and moved the system from the downward to the upward position in a short amount of time. All the states of the system are stable and follow the desired values. The actual and the estimated states of the system converge, which supports the effectiveness of the dynamic input-output decoupling approach. The abrupt disengagement and engagement of the feedback control caused a disproportionately large impulse response, with the eventual consequence of a loss of stability and a failed swing-up. Therefore, around the critical interval, the gains $K_c$ are linearly decreased or increased, as presented in Fig. 8. For the simulation, it is useful to set the duration of the linear region to 0.1 s. If the pendulum rotation approaches the inverse position during the swing-up, the control is switched to LQG stabilization.
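The linear fading of the gains around the critical interval can be expressed as a scaling factor applied to $K_c$. The sketch below is one possible reading of that scheme: the interval bounds `t_off` and `t_on` are hypothetical values, and placing the ramps just outside the interval is an assumption; only the 0.1 s ramp duration comes from the text.

```python
def fade_factor(t, t_off, t_on, ramp=0.1):
    """Scale in [0, 1] applied to the feedback gains:
    1 far from the critical interval [t_off, t_on], 0 inside it,
    with linear ramps of length `ramp` on both sides."""
    if t <= t_off - ramp or t >= t_on + ramp:
        return 1.0
    if t < t_off:
        return (t_off - t) / ramp      # fade out before the interval
    if t <= t_on:
        return 0.0                     # feedback disabled
    return (t - t_on) / ramp           # fade back in after the interval

# Hypothetical critical interval from 1.0 s to 1.2 s
samples = [fade_factor(t, 1.0, 1.2) for t in (0.0, 0.95, 1.1, 1.25, 2.0)]
```

The backward control action is then multiplied by this factor, so the feedback is removed and restored gradually instead of being switched abruptly.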
Trajectories with a duration of 1.85 s were selected for simulation purposes. The progress of such a swing-up is illustrated by the video sequence at the following link: https://youtu.be/zr0fzmVc9Ao (last access: 18 January 2024). In Fig. 9, the process of swing-up control is illustrated by following the pendulums during the motion. The animation shows that the swing-up process was completed successfully with the designed controller.

Conclusions
A model of the double-inverted pendulum was obtained, including assumed values for all necessary parameters. Based on this model, a robust and long-lasting stabilization of both pendulums in the inverted position was implemented using the LQR method. The system states are estimated by a non-linear extended Kalman filter.
By solving the boundary value problem with a suitable numerical solver, nominal trajectories for the swing-up function have been obtained. The required cart acceleration was implemented by a controller with 2 degrees of freedom: a forward PID control of the cart speed and a feedback LQR controller, whose gains were calculated off-line along the nominal trajectories and input. The planned swing-up was achieved despite the chaotic behavior of the real system. The algorithm proved its worth in simulation; a complete swing-up was realized several times.

Figure 2. Schematic of the double-inverted pendulum model.

Figure 3. Block diagram of regulation with input-output feedback linearization.

Figure 4. Block diagram of feedforward and feedback control for swing-up.

Figure 5. Simulation procedure in MATLAB and Simulink.

Figure 6. Stabilization control of the system using LQR controller.

Figure 8. Controller gain for swing-up control action.

Figure 9. Simplified graphic display of the swing-up process.

Table 1. Parameter values used in the simulation.