YADPF: Yet another Dynamic Programming Function A generic implementation of deterministic backward dynamic programming in MATLAB Auralius ManurungME - Universitas Pertamina Version 2021Table of Contents 1. Bertsekas' Two-Oven Problem 2. A Moving Mass (Double Integrator) 3. Optimal Storage Strategy (Single Integrator Problem) 4. Lotka-Volterra Fishery 5. Dubin's Car 6. Finding the Shortest Path on a Terrain 7. Sutton's Mountain Car 8. Two-Tank Problem 9. Piecewise Hanging Mass-Spring System 10. Time-Optimal van der Pol Equation with an Input 11. Time-Optimal Stabilization of an F8 Aircraft 12. References Preface This current version of the notes is not yet complete. Please email me at auralius.manurung@ieee.org with any corrections or comments. 1. Bertsekas' Two-Oven Problem1A certain material is passed through a sequence of N ovens as shown in the figure below:
Drawing 1:The schematic of the two-oven problem.
Where:x0 : initial temperature of the materialxi : temperature of the material at the exit of oven i.ui: prevailing temperature of oven i. The system's model is described as follows:xk+1=(1-a)xk+auk,k=0,1where a is a given scalar from the interval (0,1) In order to keep the the final temperature x2=T, the total incurring cost function is as follows:J=r(x2-T)2+u02+u12where r is any positive real number that will bring x2 close to T. Before we can apply dynamic programming algorithm to solve the two-oven problem above, we need to first formulate the problem in (1) and (2) into a more formal form, which is as follows:JN=r(xN-T)2Jk=JN+j=kN-1uj2,N=2,k=0,,N-1J2*=minx2r(x2-T)2J1*=minu1[J2*+u12]J0*=minu0,u1[J2+u12+u02]=minu0[J1*+u02] The following computational grids are used for the dynamic programming algorithm implementation:x1,x2Xk(xk)={0,1,,1000}ukUk(uk)={0,1,,1000} Here, we take r=100, a=0.7, T=300, and x0=30. The optimization problem in (3) to (6) can actually be solved analytically. However, we will solve it numerically by using dynamic programming algorithm in YAPDF software package. The MATLAB implementation is shown in the code listing below. Besides the main function, we need to provide three other functions: state_update_fn, stage_cost_fn, and terminal_cost_fn. These three functions describe system dynamics, stage cost, and terminal cost, respectively. We must make sure that only element-wise operation occurs in these functions, especially for functions that work differently for matrices, such as: multiplication, power, exp, etc.
1% Initial and terminal states2x0 = 30;3xf = 300;4 5% Setup the states and the inputs6X = 0 : 0.1 : 1000;7U = 0 : 0.1 : 1000;8 9% Build the structure10dpf.states = {X};11dpf.inputs = {U};12dpf.T_ocp = 1;13dpf.T_dyn = 1;14dpf.n_horizon = 3;15dpf.state_update_fn = @state_update_fn;16dpf.stage_cost_fn = @stage_cost_fn;17dpf.terminal_cost_fn = @terminal_cost_fn;18 19% Initiate and run the solver, do forwar tracing and plot the results20dpf = yadpf_solve(dpf);21dpf = yadpf_trace(dpf, x0);22yadpf_plot(dpf, '-');23 24% Reachability plot25yadpf_rplot(dpf, xf, 10);26 27%% The state update function28function X = state_update_fn(X, U, ~)29a = 0.7;30X{1} = (1 - a).*X{1} + a * U{1};31end32 33%% The stage cost function34function J = stage_cost_fn(X, U, k, ~)35J = U{1}.^2;36end37 38%% The terminal cost function39function J = terminal_cost_fn(X)40xf = 300; % Desired terminal state41r = 1000; % Weighting factor42J = r * (X{1}-xf).^2;43end
The outputs of the above code listing are shown in the following Figure 1 and Figure 2. Notice that in YADPF, there are two sampling period: T_ocp and T_dyn. T_ocp is the sampling period for the optimal control problem (OCP) which defines the temporal discretization of the input. T_dyn is the sampling period for the dynamic simulation. T_dyn must be greater than or equal to T_ocp.
Figure 1:The optimal state and input variable for the two-oven problem
Figure 2:Backward reachability plot of the two-oven problem that shows all possible state evolution to reach a certain terminal condition.
2. A Moving Mass (Double Integrator)Given the following system:
Drawing 2:A mass-damper system is exposed to an external force.
Find an optimum set of f that brings m to xf within Tf seconds. The desired terminal velocity is v(Tf)=0 and the constrain for f is -1f(t)1. The equation of motion of the system described in the figure above is:a(t)=f(t)m-bmv(t) Rewriting (7) in its state space form:[v(t)a(t)]=[010-bm][x(t)v(t)]+[01m]g(t) Discretizing (8) gives us:[vk+1ak+1]=[1𝛥t01-bm𝛥t][vkak]+[0𝛥tm]fkWe define the following objective functions:Minimization of the inputJ1=k=0N-1fk2Minimization of the terminal position errorJ2=(xf-xN)2Minimization of the terminal velocityJ3=(vf-vN)2where N is the horizon length. The total objective function can then be written as:J=𝛼1J1+𝛼2J2+𝛼3J3where 𝛼1, 𝛼2, and 𝛼3 are any positive real numbers. Finally, the formulated problem is expressed as follows:minfkk=0,,N-1(𝛼1k=0N-1fk2+𝛼2(xf-xN)2+𝛼3(vf-vN)2)such that:[vk+1ak+1]=[1𝛥t01-bm𝛥t][vkak]+[0𝛥tm]fk An asterisk symbol represents an optimal value. To use dynamic programming to solve (14), all states and inputs must be discretized first. Thus, we add more information to the problem and rewrite it as follows:minfk(𝛼1k=0N-1f(k)2+𝛼2(xf-x(N))2+𝛼3(vf-v(N))2)such that:[vk+1ak+1]=[1𝛥t01-bm𝛥t][vkak]+[0𝛥tm]fkfk=Fk{-4,-3.9,-3.8,,3.8,3.9,4}xk=Xk{0,0.001,0.002,,1}vk=Vk{0,0.001,0.002,,1} The following code listing shows the implementation of backward dynamic programming by using the YADPF.
1% Setup the states and the inputs2X = 0 : 0.001 : 1; % Position3V = 0 : 0.001 : 1; % Velocity4F = -4 : 0.1 : 4; % Applied force5 6% Setup the horizon7Tf = 1; % 1 second8T_ocp = 0.1; % Temporal discretization step9t = 0 : T_ocp : Tf;10 11dpf.states = {X, V};12dpf.inputs = {F};13dpf.T_ocp = T_ocp;14dpf.T_dyn = 0.01; % Time step for the dynamic simulation 15dpf.n_horizon = length(t);16dpf.state_update_fn = @state_update_fn;17dpf.stage_cost_fn = @stage_cost_fn;18dpf.terminal_cost_fn = @terminal_cost_fn;19 20% Initiate and run the solver, do forward tracing for the given initial 21% condition and plot the results22dpf = yadpf_solve(dpf);23dpf = yadpf_trace(dpf, [0 0]); % Initial state: [0 0]24yadpf_plot(dpf, '-');25 26% Optional: draw the reachability plot27yadpf_rplot(dpf, [0.5 0], 0.1);28 29%% The state unpdate funtion 30function X = state_update_fn(X, F, dt)31m = 1; % Mass32b = 0.1; % Damping coefficient33 34X{1} = X{1} + dt*X{2};35X{2} = X{2} - b/m*dt.*X{2} + dt/m.*F{1};36end37 38%% The stage cost function39function J = stage_cost_fn(X, F, k, dt)40J = dt*F{1}.^2;41end42 43%% The terminal cost function44function J = terminal_cost_fn(X)45xf = [0.5 0];46 47% Control gains48r1 = 1000;49r2 = 100;50 51J = r1*(X{1}-xf(1)).^2 + r2*(X{2}-xf(2)).^2;52end
The results are shown as follows:
Figure 3:The optimal states and input for the mass-damper optimal control problem.
3. Optimal Storage Strategy (Single Integrator Problem)This example is taken from
(Jonsson, 2010)
. The stock grow equation is defined as follows:
x.(t)=u(t),x(0)=0where x is the stock size and u is the production rate. The goal is to have at least A units of commodity in storage at some prescribed time tf. This goal should be obtained at minimum cost. If we let r be production cost growth rate and c storage cost per time unit, then the total cost becomes:J=0tf(ukert+cx(t))dtIn discrete form, the optimal control problem can be written as follows:minukk=0N-1(ukerkTs+cxk)TsStage cost+L(xN-A)2Terminal costsuch thatx(k+1)=u(k)Ts+ukxk=Xk{0,0.1,,500}uk=Uk{0,0.1,,10} Let us define r=0.1, c=10, Ts=1, A=200, and L=100. Additionally, notice that we add an additional terminal cost in (18) in order to meet the desired terminal state. Initially we set x0=0. Now, let us use the reusable dynamic programming algorithm function provided in YADPF.
1% Setup the states and the inputs2X = 0 : 0.1 : 500;3U = 0 : 0.1 : 10;4 5% Setup the horizon6T_ocp = 1; % Every 1 day7T_vect = 0 : T_ocp : 30; % Day-0 to day-308 9x0 = 0; % Initial state10xf = 200; % Terminal state11 12dpf.states = {X};13dpf.inputs = {U};14dpf.T_ocp = T_ocp;15dpf.T_dyn = T_ocp;16dpf.n_horizon = n_horizon;17dpf.state_update_fn = @state_update_fn;18dpf.stage_cost_fn = @stage_cost_fn;19dpf.terminal_cost_fn = @terminal_cost_fn;20 21% Initiate and run the solver, do forwar tracing and plot the results22dpf = yadpf_solve(dpf);23dpf = yadpf_trace(dpf, x0);24yadpf_plot(dpf, '-d');25 26% Reachability plot27yadpf_rplot(dpf, xf, 10);28 29%% State update function30function X = state_update_fn(X, U, dt)31X{1} = dt*U{1} + X{1}; 32end33 34%% Stage cost function35function J = stage_cost_fn(X, U, k, dt)36r = 0.1; % production cost growth rate37c = 10; % storage cost per time unit38J = dt*exp(r*k*dt)*U{1} + c*X{1}*dt;39end40 41%% Terminal cost function42function J = terminal_cost_fn(X)43xf = 200; 44L = 100;45J = L .* (xf-X{1}).^2;46end
Running the code listing above gives us the following results.
Figure 4:The optimization results for the optimal storage strategy problem.
When the state variable is part of the cost function, our problem becomes a time optimal control problem. This is because whenever the state variable is non-zero, the cost is increased. Thus, it makes sense to delay the production such that we meet the terminal state at the end of the horizon. Since it is a time-optimal control problem, we notice that there is a bang-bang action in the control input. The control input is switched to its allowable maximum value.
Figure 5:Backward reachability plot of a targeted terminal state.
4. Lotka-Volterra FisheryThis problem is taken from
(Sundstrom & Guzzella, 2009)
, with some modifications. Given the following discrete dynamic system:
xk+1=f(xk,uk)+xk,k=0,1,,N-1wheref(xk,uk)=Ts(2100(xk-xk21000)-uk)where N is the horizon length, xk is the amount of fishes in the lake and uk is the fishing rate. Time unit is in "days". The time horizon is set to 200 days with a sampling of Ts=0.5 days. The goal is to maximize the amount of fishes caught over a fixed period of time. By he 200th day, at least 750 fish are still in the lake. Thus, the following objective function is used:J=k=1N-1(-uk)Let us reformulate the problem above as follow:minuk{k=1N-1(-uk)}such that:xk+1=Ts(2100(xk-xk21000)-uk)+xkxk{0,1,,1000}uk{0,0.1,1,,10} MATLAB implementation of (22) by using YADPF is as follows.
% Initial and terminal statex0 = 250;xf = 750; % Setup the states and the inputsX = 0 : 0.1 : 1000;U = 0 : 1 : 10; % Setup the horizonTf = 200; % 200 daysT_ocp = 0.2; % Every half dayt = 0 : T_ocp : Tf; dpf.states = {X};dpf.inputs = {U};dpf.T_ocp = T_ocp;dpf.T_dyn = 0.01;dpf.n_horizon = length(t);dpf.state_update_fn = @state_update_fn;dpf.stage_cost_fn = @stage_cost_fn;dpf.terminal_cost_fn = @terminal_cost_fn; % Initiate and run the solver, do forward tracing and plot the resultsdpf = yadpf_solve(dpf);dpf = yadpf_trace(dpf, x0);yadpf_plot(dpf, '-'); % Draw the reachability plotyadpf_rplot(dpf, xf, 5); %% The state update functionfunction X = state_update_fn(X, U, dt)X{1} = X{1} + dt * (0.02 * (X{1}-X{1}.^2./ 1000) - U{1});end %% The stage cost functionfunction J = stage_cost_fn(X, U, k, dt)J = -dt .* U{1};end %% The terminal cost functionfunction J = terminal_cost_fn(X)xf = 750; % Terminal stater = 1000;J = r * (xf-X{1}).^2;end
The optimization results are shown in the following figures.
Figure 6:The optimal input and the resulting state values for the Lotka-Volterra fishery problem generated by the dynamic programming.
Figure 7:The reachable and unreachable states for the Lotka-Volterra fishery problem in order to reach a given terminal state.
5. Dubin's Car The kinematics of a car is described by the following differential equations:x.(t)=vcos(𝜃(t))y.(t)=vsin(𝜃(t))𝜃.(t)=𝜔(t)where: 𝜔 is the angular velocity of the car and |𝜔|v/R R is the lower bound for the turning radius (x,y) are the coordinates of the reference point of the car 𝜃 is the direction (heading) of the car and v is the constant speed of the car
Drawing 3:Schematic and parameters of a Dubins' car
Find the optimum path from a given initial point to a terminal point. The car must arrive at the terminal point at time tf. Thus, the objective function is defined as:J=k=0N-1𝜔k2+(xN-A)2+(yN-B)2+(𝜃N-C)2Discretizing (23) gives us:xk+1=Tsvcos(𝜃k)+xkyk+1=Tsvsin(𝜃k)+yk𝜃k+1=Ts𝜔k+𝜃k In the Dubin's car problem, typically we are looking for a time optimal solution. On the other hand, backward dynamic programming has a predefined fixed horizon. Thus, we must reach the terminal state before the final horizon and stop updating the state variables as soon as the terminal state is reached. However, in the Dubin's car problem, the car has constant speed and it can not stop. In order to implement the Dubin's car problem in MATLAB by using YAPDF, we formalize the problem as follows:min𝜔k𝛺kk=0N-1𝜔k2+(xN-A)2+(yN-B)2+(𝜃N-C)2such thatxk+1=Tsvcos(𝜃k)+xkyk+1=Tsvsin(𝜃k)+yk𝜃k+1=Ts𝜔k+𝜃kxkXkykYk𝜃k𝛩k𝜔𝛺k Taking a case where the car is the initially located at [0,0,0] and is moving to terminal as [0,0,-2𝜋]. we then define the following discrete world:Xk={-1,-0.09,,1}Yk={-1,-0.99,,0}𝛩k={-2𝜋,-2𝜋+0.01,,0}𝛺k={-vR,0,vR}Let us take v=1 and R=0.5. The MATLAB implementation of (26) with YADPF is as follows.
1global v xf;2 3% Define the Dubin's car parameters4R = 0.5; % lower bound for the turning radius5v = 1; % the constant speed of the car 6 7% Setup the states and the inputs8X = -1 : 0.01 : 1;9Y = -1 : 0.01 : 0;10THETA = -2*pi : 0.01 : 0;11OMEGA = [-v/R 0 v/R];12 13% Setup the horizon14T_ocp = 0.2; % Temporal discretization step15Tf = pi; % Theoritical completion time, otherwise the car will go around-and-around16t = 0 : T_ocp : Tf;17 18% Initial and terminal states19% We are going to make a full cricle of radius R20x0 = [0 0 0];21xf = [0 0 -2*pi];22 23% Build the structure24dpf.states = {X, Y, THETA};25dpf.inputs = {OMEGA};26dpf.T_ocp = T_ocp;27dpf.T_dyn = 0.01;28dpf.n_horizon = length(t);29dpf.state_update_fn = @state_update_fn;30dpf.stage_cost_fn = @stage_cost_fn;31dpf.terminal_cost_fn = @terminal_cost_fn;32 33% Initiate and run the solver, do forwar tracing and plot the results34dpf = yadpf_solve(dpf);35dpf = yadpf_trace(dpf, x0);36yadpf_plot(dpf, '-');37 38% Animation39visualize(dpf);40 41%% The state update function42function X = state_update_fn(X, U, dt)43global v;44 45% X{1} is x, X{2} is y, and X{3} is theta, and U is omega46X{1} = dt*v*cos(X{3})+X{1};47X{2} = dt*v*sin(X{3})+X{2};48X{3} = dt*v*U{1}+X{3};49end50 51%% The stage cost function52function J = stage_cost_fn(X, U, k, dt)53J = dt * U{1}.^2; % Minimize the input54end55 56%% The terminal cost function57function J = terminal_cost_fn(X)58global xf;59% Weighting factors60r = 1000;61 62% Calculate the cost63J = r*(X{1}-xf(1)).^2 + r*(X{2}-xf(2)).^2 + r*(X{3}-xf(3)).^2;64end65 66%% Animate the car's motion in XY-space67function visualize(dpf)68% Please rsee the provided repository.69end
The results are shown in Figure 8 and Figure 9 below.
Figure 8:Optimal state evolution of the Dubin's car which is located at [0,0,0] and moving to terminal as [0,0,-2𝜋].
Figure 9: Optimal path for the car, which is initially located at [0,0,0] and moving to terminal as [0,0,-2𝜋].
6. Finding the Shortest Path on a TerrainA car is positioned x0,y0. The motion of the car is entirely kinematic and is given as follows: xk+1=xk+ukyk+1=yk+vkwhere uk and vk are defined as follows:uk{-1,0,1}vk{-1,0,1}Thus, the car can only move one grid per stage.
Drawing 4:A kinematic cat that can only move one grit at a time
When applying input u and v, the following rules apply.When u=1,v=0, the car moves right.When u=-1,v=0, the car moves left.When u=0,v=1, the car moves up.When u=0,v=-1, the car moves down. The objective function here is to travel as close as possible to the target coordinate (xT,yT). The travel time should be as short as possible. During the travel, the car must avoid the climb trajectories. However, the descent trajectories does not benefit the car either. Thus, descent trajectories are not more preferable than flat trajectories. Mathematically, the objective function is defined as follows:J*=minuk{𝛼(xT-xN)2+𝛼(yT-yN)2+k+0N-1[uk2+vk2+𝛽𝛥hk(uk)]}where 𝛥hk(uk)=max(zk+1-zk,0), which represents the height difference when a certain uk is applied. However, when the height difference is negative, 𝛥h is set to 0. r is the control gain that regulates the terminal position of the car. As a result, we can write down the optimal control problem as follows:minuk{𝛼(xT-xN)2+𝛼(yT-yN)2+k+0N-1[uk2+vk2+𝛽𝛥hk(uk)]}such that:xk+1=xk+ukyk+1=yk+vkuk{-1,0,1}vk{-1,0,1}MATLAB implementation by using YADPF of (31) is as follows. First, a MAT file containing a square matrix is provided. Matrix element at row number m and column number n is the terrain height at coordinate (m,n). This height matrix is generated by another MATLAB function that can be found in this link2
global T terra_size; % Terrain1.mat, from 1 to 4225 (64x64)% Terrain2.mat, from 1 to 16641 (129x129)load ('Terrain1.mat');%load ('Terrain2.mat');terra_size = size(T); % Setup the states and the inputsX = 1 : 1 : length(T);Y = 1 : 1 : length(T);Ux = [-1 0 1];Uy = [-1 0 1]; % Prepare the structuredpf.states = {X, Y};dpf.inputs = {Ux Uy};dpf.T_ocp = 1;dpf.T_dyn = 1;dpf.n_horizon = 150;dpf.state_update_fn = @state_update_fn;dpf.stage_cost_fn = @stage_cost_fn;dpf.terminal_cost_fn = @terminal_cost_fn; % Initiate and run the solver, do forwar tracing and plot the resultsdpf = yadpf_solve(dpf);dpf = yadpf_trace(dpf, [1 1]); % Initial state: [1 1]yadpf_plot(dpf, '-'); % Additional plottingfigurehold onsurf(T');plot3(dpf.x_star{1}, dpf.x_star{2}, T(sub2ind(terra_size, ... dpf.x_star{1}, dpf.x_star{2})), '-r', 'LineWidth',2);view([-46.1 49.7]);xlabel('x')ylabel('y')axis equal %% The state update functionfunction X = state_update_fn(X, U, ~)X{1} = X{1} + U{1};X{2} = X{2} + U{2};end %% The stage cost functionfunction J = stage_cost_fn(X, U, k, ~)global terra_size; r = terra_size(1);c = terra_size(2); % First, check where the car's next position for a selected inputX_to = state_update_fn(X, U);X_to{1} = min(max(X_to{1},1),r);X_to{2} = min(max(X_to{2},1),c); % Next, compute the height difference of the current position to the next% positiondh = get_height_difference(X{1}, X{2}, X_to{1}, X_to{2}); % Finally, compute the costw = 2;J = U{1}.^2 + U{2}.^2 + w.*max(dh,0);end %% The terminal cost functionfunction J = terminal_cost_fn(X)% Weighting factorsk1 = 100;k2 = 100; % Final statesxf = [50 60]; J = k1*(X{1}-xf(1)).^2 + k2.*(X{2}-xf(2)).^2;end %% This function computes the height differece between two location % coordinatesfunction dh = get_height_difference(x_from, y_from, x_to, y_to)global T terra_size; r = terra_size(1);c = terra_size(2); to_id = sub2ind([r c], x_to, y_to);from_id = sub2ind([r c], x_from, y_from); dh = T(to_id) - T(from_id);end
It can be seen from the listing above that before we can compute the stage cost, we must first compute the height difference of the current position coordinate to the next position coordinate for every possible input combination. Thus, we add a new function: get_height_difference. The outputs from the code listing above are given in Figure 10 and Figure 11.
Figure 10:State evolution and control input sequence for a given initial position to a targeted terminal position
Figure 11:An example of an optimal path found by the dynamic programming algorithm
7. Sutton's Mountain CarThere are many variations of the mountain car problems. The mountain car problem that we will solve here is taken from a popular text book authored by Sutton and Barto
(Sutton & Barto, 2018)
. The problem is stated as follows (with necessary modifications):
Consider the task of driving an under-powered car up a steep mountain road. The difficulty is that gravity is stronger than the car’s engine, and even at full throttle the car cannot accelerate up the steep slope. The only solution is to first move away from the goal and up the opposite slope on the left. Then, by applying full throttle the car can build up enough inertia to carry it up the steep slope even though it is slowing down the whole way. This is a simple example of a continuous control task where things have to get worse in a sense (farther from the goal) before they can get better. Many control methodologies have great difficulties with tasks of this kind unless explicitly aided by a human designer.
Figure 12:The Sutton's mountain car problem
The reward in this problem is 1 on all time steps until the car moves past its goal position at the top of the mountain, which ends the episode. There are three possible actions: full throttle forward (+1), full throttle reverse (-1), and zero throttle (0). The car moves according to a simplified physics. Its position, xk, and velocity, x.k, are updated by: xk+1= bound [xk+x.k+1]x.k+1= bound [x.k+0.001Ak-0.0025cos(3xk)] where the bound operation enforces -1.2xk0.5 and -0.07x.k0.07. In addition, when xk reached the left bound, x.k was reset to zero. When it reached the right bound, the goal was reached and the episode was terminated. Each episode started from a random position x0[-0.6,-0.4] and zero velocity. To solve the problem above, we start by formalizing it as follows: minuk{uk2+(xN-0.5)2+x.N},k=0,2,,N-1such that:xk+1=xk+x.k+1x.k+1=x.k+0.001Ak-0.0025cos(3xk)xk{-1.2,-1.199,,0.5}x.k{-0.07,-0.069,,0.07} MATLAB implementation of (33) by using YADPF is as follows.
% Setup the states and the inputsP = -1.2 : 0.001 : 0.5;V = -0.07 : 0.0001 : 0.07;U = [-1 0 1]; dpf.states = {P V};dpf.inputs = {U};dpf.T_ocp = 1;dpf.T_dyn = 1;dpf.n_horizon = 140;dpf.state_update_fn = @state_update_fn;dpf.stage_cost_fn = @stage_cost_fn;dpf.terminal_cost_fn = @terminal_cost_fn; % Initiate and run the solver, do forwar tracing and plot the resultsdpf = yadpf_solve(dpf);dpf = yadpf_trace(dpf, [-0.5 0]); % Initial state: [-0.5 0]yadpf_plot(dpf, '-'); % Animate the mountatin carvisualize(dpf); % Draw the reachability plot, this may take quite some time% yadpf_rplot(dpf, [0.5 0], 0.1); %% The state update functionfunction X = state_update_fn(X, U, ~)X{2} = X{2} + 0.001*U{1} - 0.0025*cos(3*X{1});X{1} = X{1} + X{2}; % Hitting the left wall[r,c] = find(X{1}(:,:) <= -1.2);X{2}(r,c) = 0.001; % Inelastic wallend %% The stage cost function function J = stage_cost_fn(X, U, k, ~)xf = [0.5 0]; % Terminal state r1 = 1000;r2 = 1000;r3 = 5; J = r1*(X{1}-xf(1)).^2 + r2*(X{2}-xf(2)).^2 + r3*U{1}.^2;end %% The terminal cost function function J = terminal_cost_fn(X)xf = [0.5 0]; % Terminal state r1 = 1000;r2 = 1000; J = r1*(X{1}-xf(1)).^2 + r2*(X{2}-xf(2)).^2;end %% -----------------------------------------------------------------------function visualize(dpf)% Please see the complete codes in the provided repository.end
The results are presented in Figure 13 and Figure 14.Initially, we do not know the shortest possible time for the car to complete task. Thus, we guess the completion time by initially setting the horizon to 200. However, the car manage to finish the task before the final horizon. The results are shown in Figure 13 and Figure 14.
Figure 13:State evolution and control input sequence of the mountain car
Figure 14:Animation of the car's optimal movement
Figure 15:Reachability plot of the Sutton's mountain car's problem.
8. Two-Tank ProblemThe two tank problem is taken from
(Elbert et al., 2013)
. The task here is to fill both tanks within time tf. The input u1 commands total water flows. The input u2 commands the water flow distribution between the two tanks. The values for these inputs are from 0 to 1. When u2=0.5, same amount of water flows to both tanks. Both tanks have leaks that are proportional to the amount of water in the tank. The volume of the tanks (x1 and x2) are also from 0 to 1.
Drawing 5:Illustration of the two tank problem
The dynamics of the system is governed by the following equations.x.1(t)=-12x1(t)+u1(t)u2(t)x.2(t)=-12x2(t)+u1(t)(1-u2(t))Our goal is to fill the tank such that after two seconds, both tanks are half full. For this purpose, the cost function is given as follows.J=0tfu1(t)+0.1|u2(t)-0.5|dtwhere tf=2. we start by formalizing (34) and (35) into an optimal control problem as follows. minukk=0N-1{u1k+0.1|u2k-0.5|}such thatx1k+1=Ts(-12x1k+u1ku2k)+x1kx2k+1=Ts(-12x2k+u1k(1-u2k))+x2ku1kU1ku2kU2kx1kX1kx2kX2k
9. Piecewise Hanging Mass-Spring SystemOur task here is to find the equilibrium position of a mass-spring system hanging from two anchor points as shown in Drawing 6. This problem is similar to the problem that can be found in this MATLAB documentation page . However, we will address the problem by using the dynamic optimization method.
Drawing 6:Hanging piecewise linear spring due to the gravity
To turn the problem above into an dynamic optimization problem, let us consider a planar kinematic car which is initially located at anchor 1. The car takes two vectors as its input: ux and uy. The resulting motion of the car is given the by the sum of these two vectors. The car must arrive at anchor 2 within six stages (number of horizons: N+1=7). The dynamic of the car is governed by the following equations.xk+1=xk+uxkyk+1=yk+uykLet us take the location of mi as (xi,yi) and the normal length of spring-i as li. The potential energy caused by the spring-i can be written as:Ti=12ki((xi-xi-1)2+(yi-yi-1)2-li)2While the energy caused by the gravity can be written as:Vi=migxiThus, the total energy is considered as cost that must be minimized. It is described as follows:J=i=05(Ti+Vi)+R(x6-A)2+R(y6-B)2
Drawing 7:Analogous planar kinematic car to model the piecewise linear springs
MATLAB Cone MethodDPA-1Grind Size: 0.05DPA-2Grid Size: 0.03
Anchor 1[0,5][0,5][0,5]
Spring 1[0.8476,2.8835][0.65,2.9][0.81,2.85]
Spring 2[1.3675,2.1207][1.25,2.15][1.32,2.07]
Spring 3[2.2506,1.2789[2.15,1.3][2.16,1.17]
Spring 4[4.0897,2.3711][3.95,2.5][3.99,2.34]
Spring 5[4.7089,3.3774][4.65,3.45][4.59,3.39]
Anchor 2 / Spring 6[5,4][5,4][5,4]
10. Time-Optimal van der Pol Equation with an Input
11. Time-Optimal Stabilization of an F8 Aircraft
12. References
Bertsekas, D. P. (2000). Dynamic Programming and Optimal Control (2nd ed.). Athena Scientific.
Elbert, P., Ebbesen, S., & Guzzella, L. (2013). Implementation of dynamic programming for n-dimensional optimal control problems with final state constraints. IEEE Transactions on Control Systems Technology, 21(3), 924–931.
Jonsson, U. (2010). Optimal Control. Optimization and Systems Theory, KTH.
Miretti, F., Misul, D., & Spessa, E. (2021). DynaProg: Deterministic Dynamic Programming solver for finite horizon multi-stage decision problems. SoftwareX, 14, 100690.
Sundstrom, O., & Guzzella, L. (2009). A Generic Dynamic Programming Matlab Function. 18th IEEE International Conference on Control Applications, 7, 1625–1630.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). The MIT Press.
1https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-231-dynamic-programming-and-stochastic-control-fall-2015/lecture-notes/MIT6_231F15_Lec2.pdf
2http://knight.temple.edu/~lakaemper/courses/cis350_2004/assignments/assignment_02.htm