How to Learn Dynamic Programming: An Analysis
Used in mathematics, management science, computer science, economics, and bioinformatics, dynamic programming (DP) solves complex problems by decomposing the original problem into relatively simple subproblems. At AlgoMonster, we will share how to approach and solve dynamic programming problems.
What is dynamic programming?
In the early 1950s, the American mathematician Richard E. Bellman and others proposed the "principle of optimality." Exploiting the characteristics of a class of multi-stage decision problems, it transforms them into a series of interrelated single-stage decision problems, which are then solved one by one in stages, building up an optimal solution. This gave rise to a new method for solving optimization problems: dynamic programming.
Classification of dynamic programming
Dynamic programming models classify decision processes as discrete or continuous according to whether the time variable of the multi-stage decision process is discrete or continuous, and as deterministic or stochastic according to whether the evolution of the process is deterministic or random. This yields four decision-process models: discrete deterministic, discrete stochastic, continuous deterministic, and continuous stochastic.
Dynamic programming is not a specific algorithm but an algorithmic idea: to solve a given problem, we solve its different parts (i.e., subproblems) and then combine the subproblems' solutions to derive the solution to the original problem.
For this idea to be applicable, a problem must meet requirements in two respects: the relationship between the subproblems and the original problem, and the relationship among the subproblems themselves. These correspond to the optimal substructure and overlapping subproblems, respectively.
What is the optimal substructure in dynamic programming?
The optimal substructure specifies the relationship between the subproblem and the original problem.
Dynamic programming is about finding the optimal solution to a problem, i.e., picking the best one from among many candidate solutions. If we can decompose the problem into multiple subproblems, we recursively find the optimal solution of each subproblem and finally combine those optimal solutions through a specific mathematical rule to obtain the final result. In short, the optimal solutions of the subproblems determine the optimal solution of the problem.
Combining the subproblems' solutions to obtain the original problem's solution is the key to the feasibility of dynamic programming. In the solution, this combination is generally described by the state transition equation.
The example
Suppose the solution of the original problem is f(n), where we call f(n) the state. The state transition equation f(n) = f(n - 1) + f(n - 2) describes how the original problem combines its subproblems. There may be several choices at the original problem, and different choices may correspond to different subproblems or different combinations.
For example, n = 2k and n = 2k + 1 correspond to different choices at the original problem, each leading to different subproblems and combinations.
After finding the optimal substructure, we can derive the state transition equation for f(n), from which we can quickly write a recursive implementation of the problem.
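As a minimal sketch, the transition equation f(n) = f(n - 1) + f(n - 2) translates directly into recursion. The base cases f(0) = 0 and f(1) = 1 are assumed here, since the article does not state them:

```python
def f(n):
    # Base cases (assumed): f(0) = 0, f(1) = 1.
    if n <= 1:
        return n
    # State transition equation: f(n) = f(n - 1) + f(n - 2).
    return f(n - 1) + f(n - 2)
```

Each call spawns two more calls, so this direct recursion recomputes the same states exponentially many times, which is exactly the problem that memoization addresses.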
How to solve dynamic programming problems?
The core of solving a dynamic programming problem is finding the subproblem and its relationship to the original problem.
Once we find the subproblem and its relationship to the original problem, we can solve the subproblem recursively. However, because the subproblems overlap, direct recursion performs a lot of repeated calculation, which motivates memoized recursion. If the range of the subproblems can be determined in advance, we can build a table to store the answers to the subproblems.
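A minimal sketch of memoized recursion for the same assumed recurrence f(n) = f(n - 1) + f(n - 2): since the subproblem range 0..n is known in advance, a plain preallocated list serves as the table.

```python
def f_memo(n):
    # The subproblem range [0, n] is known up front, so preallocate a table.
    table = [None] * (n + 1)

    def go(k):
        if table[k] is not None:
            return table[k]  # answer already stored: no recomputation
        # Base cases k <= 1 (assumed), otherwise the transition equation.
        table[k] = k if k <= 1 else go(k - 1) + go(k - 2)
        return table[k]

    return go(n)
```

Each state is now computed exactly once, so the running time drops from exponential to linear in n.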
Example: Dynamic programming in a continuous state
Consider a problem in which several non-negative variables have a fixed sum c and we must maximize an objective z. Think about why dynamic programming applies to this problem.
First, because the sum of the three variables is fixed, choosing a value for one variable restricts the range of values the other two can take. That is, the decision affects the state.
Second, the three variables can be determined one at a time, one per step. That is, a stage division is possible.
Finally, it satisfies the Markov property. If we set the state as "fix the sum of the remaining variables at b," then the values of the variables already chosen have no effect on the choice of the remaining variables: the process has no aftereffect. This is what makes such problems solvable by dynamic programming.
The next step
The next step is to clarify the basic concepts of dynamic programming for this problem. The stage can be defined by the number of variables whose values have been chosen: choosing the first variable's value is stage 1, choosing the values of the first two variables is stage 2, and so on. The state at stage k can be defined as the sum of the first k variables.
The decision and the policy then naturally become the process of assigning values, and the indicator function is the expression for z. The optimal-value function is the maximum of the indicator function.
The sub-indicator (sub-optimal-value) function is the part of z that is determined once the variables of a given stage have been assigned. For example, the sub-indicator function of stage 2 is x1·x2^2.
In the first stage, the value of x1 is chosen. By the definition of the state, the reachable states of stage 1 are all possible values of x1, i.e., all values in [0, c], and the sub-indicator function is x1 itself.
The key work starts at the second stage, where both x1 and x2 have been assigned. The reachable states of stage 2 are all possible values of x1 + x2, again all of [0, c], and the sub-indicator function is x1·x2^2. We then have to find the optimal policy for each of these states.
When the stage-2 state is s2, we adjust the values of x1 and x2 (subject to x1 + x2 = s2) to maximize x1·x2^2. We can find this maximum by treating s2 as a constant and differentiating with respect to x2, which yields the optimal policy for state s2.
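Concretely, with x1 + x2 = s2 the stage-2 sub-indicator becomes (s2 - x2)·x2^2; setting its derivative 2·s2·x2 - 3·x2^2 to zero gives x2 = 2·s2/3. A small sketch with a brute-force sanity check:

```python
def best_stage2(s2):
    """Maximize x1 * x2**2 subject to x1 + x2 = s2, with x1, x2 >= 0.

    d/dx2 [(s2 - x2) * x2**2] = 2*s2*x2 - 3*x2**2 = 0  =>  x2 = 2*s2/3.
    """
    x2 = 2 * s2 / 3
    x1 = s2 - x2
    return x1, x2, x1 * x2 ** 2


# Sanity check against a coarse grid search for s2 = 3:
# the closed-form optimum should beat every grid point.
x1, x2, v = best_stage2(3.0)
grid_best = max((3.0 - t) * t * t for t in (i / 1000 for i in range(3001)))
```

For s2 = 3 the optimum is x1 = 1, x2 = 2, with value 1 · 2^2 = 4, which the grid search cannot exceed.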
In the third stage, the only reachable state is c, which means the stage-3 decision is also determined once stage 2 is settled: we already have x1 + x2 = s2 and s2 + x3 = s3 = c. Treating s3 as a constant (which it is) and differentiating with respect to s2 gives the optimal policy for state s3.
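Putting the stages together, here is a sketch of the full stagewise solution. The article never states z explicitly, so the objective z = x1·x2^2·x3 with x1 + x2 + x3 = c is an assumption, chosen to be consistent with the stage-2 sub-indicator x1·x2^2:

```python
def solve(c):
    """Stagewise optimization, ASSUMING z = x1 * x2**2 * x3 subject to
    x1 + x2 + x3 = c with all variables non-negative (the objective is
    an assumption; the source does not state z).

    Stage 2: for state s2 = x1 + x2, the best split is x1 = s2/3,
             x2 = 2*s2/3, with sub-optimum 4*s2**3/27.
    Stage 3: maximize (4*s2**3/27) * (c - s2) over s2 in [0, c];
             the derivative vanishes at s2 = 3*c/4.
    """
    s2 = 3 * c / 4
    x1, x2, x3 = s2 / 3, 2 * s2 / 3, c - s2
    return x1, x2, x3, x1 * x2 ** 2 * x3
```

For c = 4 this gives x1 = 1, x2 = 2, x3 = 1 and z = 4, matching the maximum c^4/64 that the AM-GM inequality gives for this product under a fixed sum.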