Unmanned aerial vehicle energy consumption optimization method based on reinforcement learning
1. An unmanned aerial vehicle energy consumption optimization method based on reinforcement learning is characterized by comprising the following steps:
S1: constructing a communication system model between the unmanned aerial vehicle and the ground sensor;
S2: calculating the flight energy consumption of the unmanned aerial vehicle traversing a single sensor on the basis of the system model;
S3: on the basis of the flight energy consumption of the unmanned aerial vehicle traversing a single sensor, calculating the overall energy consumption of the unmanned aerial vehicle traversing all sensors, wherein the energy consumption comprises flight energy consumption and communication energy consumption;
S4: obtaining the optimal traversal path of the unmanned aerial vehicle by using a Q-learning-based path selection algorithm for the unmanned aerial vehicle, so as to obtain the optimal energy consumption for the unmanned aerial vehicle to traverse all the sensors.
2. The method of claim 1, wherein the step S1 specifically includes:
suppose the altitude of ground sensor k is $h_k$, $k = 1, \dots, N$, where N is the total number of sensors; the maximum altitude of the ground sensors is $h_{\max} = \max\{h_1, h_2, \dots, h_k, \dots, h_N\}$, and the maximum height of the surface vegetation is $h_t$; in order to ensure the flight safety and communication quality of the unmanned aerial vehicle, the flight height $h_f$ of the unmanned aerial vehicle satisfies the following condition:
$h_f \ge h_{\max} + h_t$
the unmanned aerial vehicle is set to communicate with ground sensor k while hovering, the hover time being $t_h$; when the unmanned aerial vehicle hovers in the air, the altitude difference $H_k$ between the unmanned aerial vehicle and ground sensor k is:
$H_k = h_f - h_k$
with $s_k$ denoting the horizontal distance between the drone and sensor k, the distance $d_k$ between the drone and sensor k is expressed as:
$d_k = \sqrt{s_k^2 + H_k^2}$
at time t, with $\beta_k(t)$ denoting the channel coefficient between the drone and sensor k, the following condition is satisfied:
wherein one term represents the path loss due to large-scale fading, and a random complex variable represents the influence of small-scale fading on the received signal; because of occlusion by obstacles, both a line-of-sight (LoS) link and a non-line-of-sight (NLoS) link are considered, $\alpha_L$ and $\alpha_N$ being the corresponding path losses, and the path loss satisfies the following condition:
wherein $p_{k,\mathrm{LoS}}$ and $p_{k,\mathrm{NLoS}}$ respectively represent the line-of-sight (LoS) and non-line-of-sight (NLoS) probabilities between the drone and sensor k, and $p_{k,\mathrm{LoS}}$ satisfies:
wherein b and c are proportionality coefficients, and $p_{k,\mathrm{NLoS}}$ satisfies the following condition:
$p_{k,\mathrm{NLoS}} = 1 - p_{k,\mathrm{LoS}}$
when the drone and sensor k communicate, it is assumed that the drone and the sensor have the same communication device and the same transmission power $P_t$, and that the interference signal sent by the unmanned aerial vehicle is $x_s(t)$; the transmission rate between the drone and sensor k is expressed using the following equation:
wherein one term represents the sum of the white Gaussian noise $N_0$ of the receiver and the weak interference, another term is the residual loop interference channel, the expectation of $|x_s(t)|^2$ is the mean square value of the interference signal $x_s(t)$ sent by the unmanned aerial vehicle, B represents the bandwidth, and $t_h$ is the hover time of the unmanned aerial vehicle.
3. The method for optimizing energy consumption of unmanned aerial vehicle based on reinforcement learning of claim 2, wherein the step S2 specifically comprises: suppose that the maximum horizontal flying speed of the unmanned aerial vehicle is $V_{\max}$, the air resistance is f, the mass of the unmanned aerial vehicle is m, and $a_0$ denotes the acceleration; the altitude of the unmanned aerial vehicle is constant during the whole flight, and the flight can be divided into four phases: hovering, acceleration, constant speed and deceleration; first, the flight of the unmanned aerial vehicle from sensor k to sensor k+1 is analyzed, assuming that the horizontal speed of the unmanned aerial vehicle is 0 when it initially hovers above sensor k; the unmanned aerial vehicle completes the data sending and receiving tasks within time $t_h$ and remains in the hovering state throughout; then the unmanned aerial vehicle accelerates to the maximum speed and flies a certain distance at the constant maximum horizontal speed; finally, the unmanned aerial vehicle starts to decelerate and reaches sensor k+1 when its speed drops to 0, the deceleration process mirroring the acceleration process;
the energy consumed by the unmanned aerial vehicle while hovering for time $t_h$ is expressed as:
$E_h = P_h t_h$
wherein $P_h$ is the flying power of the unmanned aerial vehicle while hovering; assuming that the constant-speed flight time is $t_c$, the flight energy consumption in this time period is:
$E_c = P_h t_c + f t_c$
meanwhile, the energy consumption $E_{Ac}$ of the acceleration process and the energy consumption $E_{De}$ of the deceleration process satisfy the following formula:
the flight energy consumption $E_f$ between sensor k and sensor k+1 is then:
$E_f = E_h + E_{Ac} + E_c + E_{De}$.
4. The method for optimizing energy consumption of unmanned aerial vehicle based on reinforcement learning of claim 3, wherein the step S3 specifically comprises: placing N sensors at the centers of N grid cells, the unmanned aerial vehicle traversing all the sensors from the air, and assuming that the height of the unmanned aerial vehicle is fixed, with no ascent or descent during flight; in this scenario, the communication of the unmanned aerial vehicle is point-to-point communication, LoS-based data transmission is considered and non-LoS transmission is ignored, and the following formula is obtained according to the communication system model of the step S1:
wherein the energy consumption $E_{i,j}$ caused by the unmanned aerial vehicle flying from sensor i to sensor j comprises flight energy consumption and communication energy consumption, the communication energy consumption comprising transmission energy consumption $E_s$ and receiving energy consumption $E_r$; for the receiving energy consumption, the receiving power of the unmanned aerial vehicle is far less than its hovering power while it waits for the sensor to transmit, so the receiving energy consumption is ignored, i.e. $E_r \approx 0$; for the transmission energy consumption, the unmanned aerial vehicle hovering over sensor i sends data $Q_p$ to sensor i, which takes time $t_{i,i}$, and sends data $Q_c$ to the next sensor j, which takes time $t_{i,j}$; with the transmit power of the unmanned aerial vehicle denoted $P_s$, the transmission energy consumption is:
$E_s = P_s (t_{i,i} + t_{i,j})$
wherein $t_{i,i}$ and $t_{i,j}$ are respectively:
wherein $d_{i,i}$ and $d_{i,j}$ respectively represent the distances between the unmanned aerial vehicle and sensors i and j when the unmanned aerial vehicle hovers over sensor i, and $H_i$ and $H_j$ respectively represent the height differences between the unmanned aerial vehicle and sensors i and j; $E_{i,j}$ is expressed as:
$E_{i,j} = E_f + E_s$
then the overall energy consumption $E_{all}$ of the unmanned aerial vehicle traversing all the sensors is expressed as:
while the following conditions are met:
and each sensor only needs to transmit data to the unmanned aerial vehicle once, so its power consumption is not counted repeatedly.
5. The reinforcement learning-based energy consumption optimization method for unmanned aerial vehicles according to claim 4, wherein in step S4, the optimal traversal path of the unmanned aerial vehicle is obtained by using a path selection algorithm for unmanned aerial vehicles based on Q-learning, so as to obtain the optimal energy consumption of all sensors traversed by the unmanned aerial vehicle, the steps are as follows:
(1) define the state of the drone as $s = (x_s, y_s)$, where $(x_s, y_s)$ is the position coordinate of sensor i; define a Q table in which each row records a state s and the Q values of the different actions selectable in that state, an action being a move from the current sensor to the next sensor; at each step the unmanned aerial vehicle has two kinds of action to choose from: (i) randomly selecting one sensor from all sensors as the next sensor to reach; (ii) selecting the action with the maximum Q value in the current state, which determines the next sensor reached by the unmanned aerial vehicle; with w as the weight of the energy consumption $E_{i,j}$ generated by the drone traversing and communicating with one ground sensor, the following reward value function is defined, representing the reward value for the drone performing an action in state s:
$R_i = -w E_{i,j}$
(2) initialize the N ground sensors and the sensor number set $\Omega = \{1, 2, \dots, N\}$; initialize the values of w, ε, λ and γ, where γ is the attenuation coefficient, λ is the learning rate, $\gamma \in (0, 1)$, $\lambda \in (0, 1)$, and ε is a threshold; initialize the N×N energy matrix $E_{i,j}$ and the reward matrix $R_i$, $i, j \in \{1, 2, \dots, N\}$; initialize $Q \leftarrow 0_{N,N}$, where $0_{N,N}$ denotes the N×N zero matrix; initialize the state s of the unmanned aerial vehicle and set $\Omega' = \Omega$;
(3) let $Q_i[s, a]$ denote the Q value obtained when the unmanned aerial vehicle executes action a in state s, i.e. moves from one sensor i to another sensor i+1, so that the unmanned aerial vehicle reaches the next state $s' = (x_{s'}, y_{s'})$; generate a random number μ between 0 and 1; if μ < ε, action (i) above is performed, i.e. the next sensor to be reached by the drone is randomly selected from $\Omega' = \{1, 2, \dots, N\}$; otherwise, action (ii) above is performed, i.e. the action a' with the maximum Q value in state s' is selected, leading from sensor i+1 to the next sensor i+2; the Q value obtained in each iteration is stored in the Q table and updated using the following formulas:
$Q' = Q_i[s, a]$;
$Q' = Q' + \lambda (R_i[s, a] + \gamma \max Q_{i+1}[s', a'] - Q')$;
$Q_i[s, a] = Q'$;
wherein $R_i[s, a]$ represents the reward value for the drone in state s moving from the current sensor i to the next sensor i+1, and $\max Q_{i+1}[s', a']$ represents the maximum Q value of the subsequent state; (2) is executed cyclically while i < N;
(4) after the above process has been executed, an N×N Q table is obtained, in which the maximum value of each row represents the optimal choice; the path planning decision of the unmanned aerial vehicle for the given waypoints is obtained from the maximum Q value in each state, each $E_{i,j}$ along the path is calculated and summed, and the minimum energy consumption $\min E_{all}$ for the unmanned aerial vehicle to traverse all ground sensors is finally obtained.
Background
In recent years, with the development of 5G, communication systems are no longer limited to conventional terrestrial communication, and ground-to-air communication has become an actively developed part of communication networks. Unmanned aerial vehicles are well suited to large-scale communication because of their high mobility, and communicating with the ground from high altitude reduces much of the interference, so unmanned aerial vehicles have become an important component of ground-to-air communication networks. However, problems arise when a drone communicates with a large number of sensors in open field areas. The energy of the unmanned aerial vehicle is limited, so designing a reasonable flight route and reducing flight energy consumption is the key issue.
The present invention therefore considers a drone that communicates with a large number of sensors distributed on the ground in open field areas lacking infrastructure. The unmanned aerial vehicle receives the data collected by the sensors and at the same time sends some model parameter information to the sensors. The core of the problem is to optimize the flight route and data transmission strategy of the unmanned aerial vehicle so as to minimize its energy consumption while still completing the communication task. To this end, the invention studies a model of the overall communication system between the unmanned aerial vehicle and the sensors, derives the communication and flight energy consumption models of the unmanned aerial vehicle, studies a related path selection method, considers factors such as the flight speed of the unmanned aerial vehicle, geographic information and transmission rate, and analyzes the action space and state space of the unmanned aerial vehicle. On this basis, an unmanned aerial vehicle energy consumption optimization method based on reinforcement learning is provided.
Disclosure of Invention
The invention provides an unmanned aerial vehicle energy consumption optimization method based on reinforcement learning, which first formulates the flight strategy of the unmanned aerial vehicle and the quantities to be optimized. Then, starting from a reinforcement learning algorithm, a Q-learning-based path selection algorithm for the unmanned aerial vehicle is provided, which effectively reduces the flight and communication energy consumption of the unmanned aerial vehicle.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows: the invention provides an unmanned aerial vehicle energy consumption optimization method based on reinforcement learning, which comprises the following steps:
S1: constructing a communication system model between the unmanned aerial vehicle and the ground sensor;
S2: calculating the flight energy consumption of the unmanned aerial vehicle traversing a single sensor on the basis of the system model;
S3: on the basis of the flight energy consumption of the unmanned aerial vehicle traversing a single sensor, calculating the overall energy consumption of the unmanned aerial vehicle traversing all sensors, wherein the energy consumption comprises flight energy consumption and communication energy consumption;
S4: obtaining the optimal traversal path of the unmanned aerial vehicle by using a Q-learning-based path selection algorithm for the unmanned aerial vehicle, so as to obtain the optimal energy consumption for the unmanned aerial vehicle to traverse all the sensors.
Further, the step S1 specifically includes:
suppose the altitude of ground sensor k is $h_k$, $k = 1, \dots, N$, where N is the total number of sensors; the maximum altitude of the ground sensors is $h_{\max} = \max\{h_1, h_2, \dots, h_k, \dots, h_N\}$, and the maximum height of the surface vegetation is $h_t$; in order to ensure the flight safety and communication quality of the unmanned aerial vehicle, the flight height $h_f$ of the unmanned aerial vehicle satisfies the following condition:
$h_f \ge h_{\max} + h_t$
the unmanned aerial vehicle is set to communicate with ground sensor k while hovering, the hover time being $t_h$; when the unmanned aerial vehicle hovers in the air, the altitude difference $H_k$ between the unmanned aerial vehicle and ground sensor k is:
$H_k = h_f - h_k$
with $s_k$ denoting the horizontal distance between the drone and sensor k, the distance $d_k$ between the drone and sensor k is expressed as:
$d_k = \sqrt{s_k^2 + H_k^2}$
at time t, with $\beta_k(t)$ denoting the channel coefficient between the drone and sensor k, the following condition is satisfied:
wherein one term represents the path loss due to large-scale fading, and a random complex variable represents the influence of small-scale fading on the received signal; because of occlusion by obstacles, both a line-of-sight (LoS) link and a non-line-of-sight (NLoS) link are considered, $\alpha_L$ and $\alpha_N$ being the corresponding path losses, and the path loss satisfies the following condition:
wherein $p_{k,\mathrm{LoS}}$ and $p_{k,\mathrm{NLoS}}$ respectively represent the line-of-sight (LoS) and non-line-of-sight (NLoS) probabilities between the drone and sensor k, and $p_{k,\mathrm{LoS}}$ satisfies:
wherein b and c are proportionality coefficients, and $p_{k,\mathrm{NLoS}}$ satisfies the following condition:
$p_{k,\mathrm{NLoS}} = 1 - p_{k,\mathrm{LoS}}$
when the drone and sensor k communicate, it is assumed that the drone and the sensor have the same communication device and the same transmission power $P_t$, and that the interference signal sent by the unmanned aerial vehicle is $x_s(t)$; the transmission rate between the drone and sensor k is expressed using the following equation:
wherein one term represents the sum of the white Gaussian noise $N_0$ of the receiver and the weak interference, another term is the residual loop interference channel, the expectation of $|x_s(t)|^2$ is the mean square value of the interference signal $x_s(t)$ sent by the unmanned aerial vehicle, B represents the bandwidth, and $t_h$ is the hover time of the unmanned aerial vehicle.
Further, the step S2 specifically includes: suppose that the maximum horizontal flying speed of the unmanned aerial vehicle is $V_{\max}$, the air resistance is f, the mass of the unmanned aerial vehicle is m, and $a_0$ denotes the acceleration; the altitude of the unmanned aerial vehicle is constant during the whole flight, and the flight can be divided into four phases: hovering, acceleration, constant speed and deceleration; first, the flight of the unmanned aerial vehicle from sensor k to sensor k+1 is analyzed, assuming that the horizontal speed of the unmanned aerial vehicle is 0 when it initially hovers above sensor k; the unmanned aerial vehicle completes the data sending and receiving tasks within time $t_h$ and remains in the hovering state throughout; then the unmanned aerial vehicle accelerates to the maximum speed and flies a certain distance at the constant maximum horizontal speed; finally, the unmanned aerial vehicle starts to decelerate and reaches sensor k+1 when its speed drops to 0, the deceleration process mirroring the acceleration process;
the energy consumed by the unmanned aerial vehicle while hovering for time $t_h$ is expressed as:
$E_h = P_h t_h$
wherein $P_h$ is the flying power of the unmanned aerial vehicle while hovering; assuming that the constant-speed flight time is $t_c$, the flight energy consumption in this time period is:
$E_c = P_h t_c + f t_c$
meanwhile, the energy consumption $E_{Ac}$ of the acceleration process and the energy consumption $E_{De}$ of the deceleration process satisfy the following formula:
the flight energy consumption $E_f$ from sensor k to sensor k+1 is then:
$E_f = E_h + E_{Ac} + E_c + E_{De}$.
Further, the step S3 specifically includes: placing N sensors at the centers of N grid cells, the unmanned aerial vehicle traversing all the sensors from the air, and assuming that the height of the unmanned aerial vehicle is fixed, with no ascent or descent during flight; in this scenario, the communication of the unmanned aerial vehicle is point-to-point communication, LoS-based data transmission is considered and non-LoS transmission is ignored, and the following formula is obtained according to the communication system model of the step S1:
wherein the energy consumption $E_{i,j}$ caused by the unmanned aerial vehicle flying from sensor i to sensor j comprises flight energy consumption and communication energy consumption, the communication energy consumption comprising transmission energy consumption $E_s$ and receiving energy consumption $E_r$; for the transmission energy consumption, the unmanned aerial vehicle hovering over sensor i sends data $Q_p$ to sensor i, which takes time $t_{i,i}$, and sends data $Q_c$ to the next sensor j, which takes time $t_{i,j}$; with the transmit power of the unmanned aerial vehicle denoted $P_s$, the transmission energy consumption is:
$E_s = P_s (t_{i,i} + t_{i,j})$
wherein $t_{i,i}$ and $t_{i,j}$ are respectively:
wherein $d_{i,i}$ and $d_{i,j}$ respectively represent the distances between the unmanned aerial vehicle and sensors i and j when the unmanned aerial vehicle hovers over sensor i, and $H_i$ and $H_j$ respectively represent the height differences between the unmanned aerial vehicle and sensors i and j; $E_{i,j}$ is expressed as:
$E_{i,j} = E_f + E_s$
then the overall energy consumption $E_{all}$ of the unmanned aerial vehicle traversing all the sensors is expressed as:
while the following conditions are met:
and each sensor only needs to transmit data to the unmanned aerial vehicle once, so its power consumption is not counted repeatedly.
Further, the step S4 specifically includes: the energy consumption $E_{i,j}$ generated by the unmanned aerial vehicle traversing and communicating with one ground sensor has been derived above, and w is taken as the weight of this energy consumption. From the above analysis, the total energy consumption $E_{all}$ of the unmanned aerial vehicle traversing all ground sensors depends on the sum of the energy consumed for each sensor traversed. To obtain $\min E_{all}$, the Q-learning algorithm is used to find the traversal path of the unmanned aerial vehicle that minimizes the total energy consumption. Q-learning has three elements: state, action and reward. The agent (here, the drone) takes an action based on the current state and records the reward that is fed back, so that it can take a better action the next time it reaches the same state. Q is an action utility function used to evaluate how good it is to take a certain action in a specific state.
The specific steps for obtaining the optimal energy consumption traversal path of the unmanned aerial vehicle are as follows:
(1) define the state of the drone as $s = (x_s, y_s)$, where $(x_s, y_s)$ is the position coordinate of sensor i; define a Q table in which each row records a state s and the Q values of the different actions selectable in that state, an action being a move from the current sensor to the next sensor; at each step the unmanned aerial vehicle has two kinds of action to choose from: (i) randomly selecting one sensor from all sensors as the next sensor to reach; (ii) selecting the action with the maximum Q value in the current state, which determines the next sensor reached by the unmanned aerial vehicle; with w as the weight of the energy consumption $E_{i,j}$ generated by the drone traversing and communicating with one ground sensor, the following reward value function is defined, representing the reward value for the drone performing an action in state s:
$R_i = -w E_{i,j}$
(2) initialize the N ground sensors and the sensor number set $\Omega = \{1, 2, \dots, N\}$; initialize the values of w, ε, λ and γ, where γ is the attenuation coefficient, λ is the learning rate, $\gamma \in (0, 1)$, $\lambda \in (0, 1)$, and ε is a threshold; initialize the N×N energy matrix $E_{i,j}$ and the reward matrix $R_i$, $i, j \in \{1, 2, \dots, N\}$; initialize $Q \leftarrow 0_{N,N}$, where $0_{N,N}$ denotes the N×N zero matrix; initialize the state s of the unmanned aerial vehicle and set $\Omega' = \Omega$;
(3) let $Q_i[s, a]$ denote the Q value obtained when the unmanned aerial vehicle executes action a in state s, i.e. moves from one sensor i to another sensor i+1, so that the unmanned aerial vehicle reaches the next state $s' = (x_{s'}, y_{s'})$; generate a random number μ between 0 and 1; if μ < ε, action (i) above is performed, i.e. the next sensor to be reached by the drone is randomly selected from $\Omega' = \{1, 2, \dots, N\}$; otherwise, action (ii) above is performed, i.e. the action a' with the maximum Q value in state s' is selected, leading from sensor i+1 to the next sensor i+2; the Q value obtained in each iteration is stored in the Q table and updated using the following formulas:
$Q' = Q_i[s, a]$;
$Q' = Q' + \lambda (R_i[s, a] + \gamma \max Q_{i+1}[s', a'] - Q')$;
$Q_i[s, a] = Q'$;
wherein $R_i[s, a]$ represents the reward value for the drone in state s moving from the current sensor i to the next sensor i+1, and $\max Q_{i+1}[s', a']$ represents the maximum Q value of the subsequent state; (2) is executed cyclically while i < N;
(4) after the above process has been executed, an N×N Q table is obtained, in which the maximum value of each row represents the optimal choice; the path planning decision of the unmanned aerial vehicle for the given waypoints is obtained from the maximum Q value in each state, each $E_{i,j}$ along the path is calculated and summed, and the minimum energy consumption $\min E_{all}$ for the unmanned aerial vehicle to traverse all ground sensors is finally obtained.
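Steps (1) to (4) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `q_learning_path`, the default parameter values, the number of training episodes and the fixed random seed are all hypothetical. Two simplifying assumptions are made: the state is represented by the index of the current sensor rather than its coordinates, and sensors that have already been visited are removed from the candidate set so that each sensor is traversed once.

```python
import random


def q_learning_path(E, w=1.0, eps=0.2, lam=0.5, gamma=0.9,
                    episodes=2000, start=0, seed=0):
    """Learn a traversal order from an N x N energy matrix E (illustrative)."""
    rng = random.Random(seed)
    n = len(E)
    # Reward matrix R = -w * E, as in step (1).
    R = [[-w * E[i][j] for j in range(n)] for i in range(n)]
    # Q table initialised to the N x N zero matrix, as in step (2).
    Q = [[0.0] * n for _ in range(n)]

    for _ in range(episodes):
        s = start
        unvisited = set(range(n)) - {s}  # assumption: visit each sensor once
        while unvisited:
            choices = sorted(unvisited)
            if rng.random() < eps:
                # Action (i): pick the next sensor at random.
                a = rng.choice(choices)
            else:
                # Action (ii): pick the sensor with the largest Q value.
                a = max(choices, key=lambda j: Q[s][j])
            remaining = unvisited - {a}
            # Largest Q value available from the next state (0 at the end).
            future = max((Q[a][j] for j in remaining), default=0.0)
            # Update rule of step (3): Q' = Q' + lam * (R + gamma * max Q - Q').
            Q[s][a] += lam * (R[s][a] + gamma * future - Q[s][a])
            s, unvisited = a, remaining

    # Step (4): follow the largest Q value in each state and sum the energy.
    path = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        nxt = max(sorted(unvisited), key=lambda j: Q[path[-1]][j])
        path.append(nxt)
        unvisited.remove(nxt)
    energy = sum(E[i][j] for i, j in zip(path, path[1:]))
    return path, energy, Q
```

For example, `q_learning_path([[0, 1, 5], [1, 0, 1], [5, 1, 0]])` returns the visiting order together with its summed energy and the learned Q table. Because the state here does not record which sensors have already been visited, the learned path is a heuristic and is not guaranteed to be the global minimum on larger instances.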
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial effects:
The invention provides an unmanned aerial vehicle energy consumption optimization method based on reinforcement learning, which can effectively select an optimal path, thereby reducing the overall power consumption of the unmanned aerial vehicle when flying and communicating with the sensors.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of the arrangement relationship between the UAVs and the sensors;
FIG. 3 is a sensor profile;
FIG. 4 is a diagram of training iterations;
FIG. 5 is an algorithm path diagram.
Detailed Description
In order to facilitate understanding of those skilled in the art, the present invention will be further described with reference to the following examples and drawings, which are not intended to limit the present invention.
Referring to fig. 1, the invention provides an unmanned aerial vehicle energy consumption optimization method based on reinforcement learning, which includes the following steps:
S1: constructing a communication system model between the unmanned aerial vehicle and the ground sensor;
S2: calculating the flight energy consumption of the unmanned aerial vehicle traversing a single sensor on the basis of the system model;
S3: on the basis of the flight energy consumption of the unmanned aerial vehicle traversing a single sensor, calculating the overall energy consumption of the unmanned aerial vehicle traversing all sensors, wherein the energy consumption comprises flight energy consumption and communication energy consumption;
S4: obtaining the optimal traversal path of the unmanned aerial vehicle by using a Q-learning-based path selection algorithm for the unmanned aerial vehicle, so as to obtain the optimal energy consumption for the unmanned aerial vehicle to traverse all the sensors.
The step S1 specifically includes:
suppose the altitude of ground sensor k is $h_k$, $k = 1, \dots, N$, where N is the total number of sensors; the maximum altitude of the ground sensors is $h_{\max} = \max\{h_1, h_2, \dots, h_k, \dots, h_N\}$, and the maximum height of the surface vegetation is $h_t$; in order to ensure the flight safety and communication quality of the unmanned aerial vehicle, the flight height $h_f$ of the unmanned aerial vehicle satisfies the following condition:
$h_f \ge h_{\max} + h_t$
the unmanned aerial vehicle is set to communicate with ground sensor k while hovering, the hover time being $t_h$; when the unmanned aerial vehicle hovers in the air, the altitude difference $H_k$ between the unmanned aerial vehicle and ground sensor k is:
$H_k = h_f - h_k$
with $s_k$ denoting the horizontal distance between the drone and sensor k, the distance $d_k$ between the drone and sensor k is expressed as:
$d_k = \sqrt{s_k^2 + H_k^2}$
at time t, with $\beta_k(t)$ denoting the channel coefficient between the drone and sensor k, the following condition is satisfied:
wherein one term represents the path loss due to large-scale fading, and a random complex variable represents the influence of small-scale fading on the received signal; because of occlusion by obstacles, both a line-of-sight (LoS) link and a non-line-of-sight (NLoS) link are considered, $\alpha_L$ and $\alpha_N$ being the corresponding path losses, and the path loss satisfies the following condition:
wherein $p_{k,\mathrm{LoS}}$ and $p_{k,\mathrm{NLoS}}$ respectively represent the line-of-sight (LoS) and non-line-of-sight (NLoS) probabilities between the drone and sensor k, and $p_{k,\mathrm{LoS}}$ satisfies:
wherein b and c are proportionality coefficients, and $p_{k,\mathrm{NLoS}}$ satisfies the following condition:
$p_{k,\mathrm{NLoS}} = 1 - p_{k,\mathrm{LoS}}$
when the drone and sensor k communicate, it is assumed that the drone and the sensor have the same communication device and the same transmission power $P_t$, and that the interference signal sent by the unmanned aerial vehicle is $x_s(t)$; the transmission rate between the drone and sensor k is expressed using the following equation:
wherein one term represents the sum of the white Gaussian noise $N_0$ of the receiver and the weak interference, another term is the residual loop interference channel, the expectation of $|x_s(t)|^2$ is the mean square value of the interference signal $x_s(t)$ sent by the unmanned aerial vehicle, B represents the bandwidth, and $t_h$ is the hover time of the unmanned aerial vehicle.
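As an illustration only, the geometric relations of step S1 can be sketched in Python. The function names are hypothetical. The slant distance is computed from the horizontal distance and the height difference with the Pythagorean relation, which is an assumption about the distance formula; the LoS probability is taken as an input because its expression in terms of b and c is not reproduced in the text.

```python
import math


def min_flight_height(sensor_altitudes, vegetation_height):
    """Smallest flight height h_f that satisfies h_f >= h_max + h_t."""
    return max(sensor_altitudes) + vegetation_height


def height_difference(h_f, h_k):
    """Altitude difference H_k = h_f - h_k between the drone and sensor k."""
    return h_f - h_k


def slant_distance(s_k, H_k):
    """Drone-to-sensor distance from horizontal distance s_k and height H_k.

    Assumption: straight-line (Pythagorean) distance.
    """
    return math.hypot(s_k, H_k)


def nlos_probability(p_los):
    """NLoS probability p_NLoS = 1 - p_LoS for a given LoS probability."""
    if not 0.0 <= p_los <= 1.0:
        raise ValueError("p_los must lie in [0, 1]")
    return 1.0 - p_los
```

With sensor altitudes of 10, 25 and 18 m and 5 m of vegetation, `min_flight_height([10, 25, 18], 5)` gives a flight height of 30 m, so the height difference to the first sensor is 20 m.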
The step S2 specifically includes: suppose that the maximum horizontal flying speed of the unmanned aerial vehicle is $V_{\max}$, the air resistance is f, the mass of the unmanned aerial vehicle is m, and $a_0$ denotes the acceleration; the altitude of the unmanned aerial vehicle is constant during the whole flight, and the flight can be divided into four phases: hovering, acceleration, constant speed and deceleration; first, the flight of the unmanned aerial vehicle from sensor k to sensor k+1 is analyzed, assuming that the horizontal speed of the unmanned aerial vehicle is 0 when it initially hovers above sensor k; the unmanned aerial vehicle completes the data sending and receiving tasks within time $t_h$ and remains in the hovering state throughout; then the unmanned aerial vehicle accelerates to the maximum speed and flies a certain distance at the constant maximum horizontal speed; finally, the unmanned aerial vehicle starts to decelerate and reaches sensor k+1 when its speed drops to 0, the deceleration process mirroring the acceleration process;
the energy consumed by the unmanned aerial vehicle while hovering for time $t_h$ is expressed as:
$E_h = P_h t_h$
wherein $P_h$ is the flying power of the unmanned aerial vehicle while hovering; assuming that the constant-speed flight time is $t_c$, the flight energy consumption in this time period is:
$E_c = P_h t_c + f t_c$
meanwhile, the energy consumption $E_{Ac}$ of the acceleration process and the energy consumption $E_{De}$ of the deceleration process satisfy the following formula:
the flight energy consumption $E_f$ from sensor k to sensor k+1 is then:
$E_f = E_h + E_{Ac} + E_c + E_{De}$.
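The flight phases of step S2 can be sketched in Python as follows. The function names are hypothetical. The phase durations assume a constant acceleration $a_0$ from rest up to $V_{\max}$ and a symmetric deceleration, which is one reading of the description. The acceleration, constant-speed and deceleration energies are passed in as inputs rather than computed, since their formulas are not fully reproduced in the text.

```python
def leg_times(distance, v_max, a0):
    """Return (t_acc, t_cruise, t_dec) for one sensor-to-sensor leg.

    Assumption: constant acceleration a0 from rest to v_max, cruise at
    v_max, then a deceleration phase that mirrors the acceleration phase.
    """
    t_acc = v_max / a0
    d_acc = 0.5 * a0 * t_acc ** 2  # distance covered while accelerating
    if distance < 2 * d_acc:
        raise ValueError("leg is too short to reach v_max")
    t_cruise = (distance - 2 * d_acc) / v_max
    return t_acc, t_cruise, t_acc


def hover_energy(p_h, t_h):
    """Hovering energy E_h = P_h * t_h."""
    return p_h * t_h


def flight_energy(e_h, e_ac, e_c, e_de):
    """Flight energy of one leg, E_f = E_h + E_Ac + E_c + E_De."""
    return e_h + e_ac + e_c + e_de
```

For a 100 m leg with a maximum speed of 10 m/s and an acceleration of 2 m/s², the acceleration, cruise and deceleration phases each last 5 s under these assumptions.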
The step S3 specifically includes: placing the N sensors at the centers of N grid cells, with the unmanned aerial vehicle traversing all the sensors from the air, and assuming that the height of the unmanned aerial vehicle is fixed so that it neither rises nor falls during flight. In this scenario the communication of the unmanned aerial vehicle is point-to-point; only LoS-based data transmission is considered and non-LoS transmission is ignored. The following formula is obtained according to the communication system model of step S1:
wherein the energy consumption $E_{i,j}$ caused by the unmanned aerial vehicle flying from sensor $i$ to sensor $j$ includes flight energy consumption and communication energy consumption, the communication energy consumption including the transmission energy consumption $E_s$ and the receiving energy consumption $E_r$. For the transmission energy consumption, the unmanned aerial vehicle, hovering above sensor $i$, sends data $Q_p$ to sensor $i$ in time $t_{i,i}$ and sends data $Q_c$ to the next sensor $j$ in time $t_{i,j}$; letting the transmit power of the unmanned aerial vehicle be $P_s$, the transmission energy consumption is:
$E_s = P_s (t_{i,i} + t_{i,j})$
wherein $t_{i,i}$ and $t_{i,j}$ are respectively:
wherein $d_{i,i}$ and $d_{i,j}$ respectively represent the distances between the unmanned aerial vehicle and sensors $i$ and $j$ when the unmanned aerial vehicle hovers above sensor $i$, and $H_i$ and $H_j$ respectively represent the height differences between the unmanned aerial vehicle and sensors $i$ and $j$; $E_{i,j}$ is expressed as:
$E_{i,j} = E_f + E_s$
the overall energy consumption $E_{all}$ of the unmanned aerial vehicle traversing all the sensors is then expressed as:
while the following conditions are met:
and each sensor only needs to transmit data to the unmanned aerial vehicle once, so its power consumption is not counted repeatedly.
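As a hedged illustration of step S3, the following Python sketch computes the transmission energy and the per-pair energy from the expressions given in the text, and accumulates a total over a traversal path. The expression for $E_{all}$ is not reproduced in this section, so the summation over consecutive sensor pairs of an open path is an assumption, as are the function names and the requirement that the path list each sensor exactly once.

```python
def transmit_energy(p_s, t_ii, t_ij):
    """Transmission energy E_s = P_s * (t_ii + t_ij)."""
    return p_s * (t_ii + t_ij)


def pair_energy(e_f, e_s):
    """Energy for flying from sensor i to sensor j: E_ij = E_f + E_s."""
    return e_f + e_s


def total_energy(path, energy):
    """Sum E_ij over consecutive sensors of a path (assumed form of E_all).

    path   -- list of sensor indices, each appearing exactly once
    energy -- N x N nested list where energy[i][j] holds E_ij
    """
    if len(set(path)) != len(path):
        raise ValueError("each sensor must appear exactly once in the path")
    return sum(energy[i][j] for i, j in zip(path, path[1:]))
```

For example, with a hypothetical matrix `[[0, 1, 4], [1, 0, 2], [4, 2, 0]]`, the path `[0, 1, 2]` costs `1 + 2 = 3`.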
The step S4 specifically includes: having derived the energy consumption $E_{i,j}$ generated by the unmanned aerial vehicle traversing one ground sensor and communicating with it, $w$ is taken as the weight of this energy consumption. From the above analysis, the total energy consumption $E_{all}$ of the unmanned aerial vehicle traversing all ground sensors depends on the sum of the energy consumed for each sensor traversed; to find $\min E_{all}$, the optimal traversal path of the unmanned aerial vehicle, i.e. the path that minimizes the total energy consumption, is obtained by using a Q-learning algorithm. Q-learning has three elements: state, action and reward. The agent (here, the unmanned aerial vehicle) takes an action based on the current state and records the reward that is fed back, so that it can take a better action the next time it reaches the same state. Q is an action utility function used to evaluate the quality of an action taken in a specific state.
The specific steps for obtaining the energy-optimal traversal path of the unmanned aerial vehicle are as follows:
(1) define the state of the unmanned aerial vehicle as $s = (x_s, y_s)$, where $(x_s, y_s)$ represents the position coordinates of sensor $i$; define a Q table in which each row records a state $s$ and the Q values of the different actions selectable in that state, an action being a move from the current sensor to the next sensor. At each step the unmanned aerial vehicle has two kinds of action to choose from: (i) randomly selecting one sensor from all sensors as the next sensor to reach; (ii) selecting the action with the maximum Q value in the current state, which gives the next sensor reached by the unmanned aerial vehicle. Taking $w$ as the weight of the energy consumption $E_{i,j}$ generated by the unmanned aerial vehicle traversing one ground sensor and communicating with it, the following reward value function is defined, which represents the reward value of the unmanned aerial vehicle performing an action in state $s$:
$R_i = -w E_{i,j}$
(2) initialize the N ground sensors and the sensor number set $\Omega = \{1, 2, ..., N\}$; initialize the values of $w$, $\epsilon$, $\lambda$ and $\gamma$, where $\gamma$ is the attenuation coefficient, $\lambda$ is the learning rate, $\gamma \in (0, 1)$, $\lambda \in (0, 1)$, and $\epsilon$ is a threshold; initialize the $N \times N$ energy matrix $E_{i,j}$ and the reward matrix $R_i$, with $i, j \in \{1, 2, ..., N\}$; initialize $Q \leftarrow 0_{N,N}$, where $0_{N,N}$ denotes the $N \times N$ zero matrix; and initialize the state $s$ of the unmanned aerial vehicle, with $\Omega' = \Omega$;
(3) let $Q_i[s,a]$ denote the Q value obtained when the unmanned aerial vehicle, in state $s$, executes action $a$, i.e. moves from sensor $i$ to the next sensor $i+1$ and thereby reaches the next state $s' = (x_{s'}, y_{s'})$. Generate a random number $\mu$ between 0 and 1; if $\mu < \epsilon$, perform action (i) above, that is, randomly select the next sensor to be reached by the unmanned aerial vehicle from $\Omega' = \{1, 2, ..., N\}$; otherwise, perform action (ii) above, that is, select the action $a'$ with the maximum Q value in state $s'$, i.e. the move from the last sensor $i+1$ to the next sensor $i+2$. Store the Q value obtained in each iteration in the Q table, and update the Q value by using the following formulas:
$Q' = Q_i[s,a]$;
$Q' = Q' + \lambda (R_i[s,a] + \gamma \max Q_{i+1}[s',a'] - Q')$;
$Q_i[s,a] = Q'$;
wherein $R_i[s,a]$ represents the reward value for the move of the unmanned aerial vehicle in state $s$ from the current sensor $i$ to the next sensor $i+1$, and $\max Q_{i+1}[s',a']$ represents the maximum Q value of the subsequent state; (2) is executed cyclically while $i < N$;
(4) after the above process has been executed, an $N \times N$ Q table is obtained, in which the maximum value of each row represents the optimal selection; the path planning decision of the unmanned aerial vehicle for the given waypoints is obtained from the maximum Q value in each state, each $E_{i,j}$ along this path is calculated and summed, and the minimum energy consumption $\min E_{all}$ for the unmanned aerial vehicle to traverse all ground sensors is finally obtained.
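Steps (1) to (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the state is reduced to the index of the current sensor, both the random and the greedy choice are restricted to sensors not yet visited so that every episode yields a valid traversal, the number of episodes and the value of `gamma` are hypothetical, and the function name `q_learning_path` is introduced only for this example. The defaults `lam=0.1`, `w=1.0` and `eps=0.88` follow the values reported for the example embodiment.

```python
import random


def q_learning_path(energy, w=1.0, lam=0.1, gamma=0.9, eps=0.88,
                    episodes=2000, start=0, seed=0):
    """Tabular Q-learning over an N x N energy matrix (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(energy)
    q = [[0.0] * n for _ in range(n)]           # Q <- 0_{N,N}

    for _ in range(episodes):
        s = start
        unvisited = set(range(n)) - {s}         # assumption: no revisits
        while unvisited:
            choices = sorted(unvisited)
            if rng.random() < eps:
                a = rng.choice(choices)         # action (i): random sensor
            else:
                a = max(choices, key=lambda j: q[s][j])  # action (ii): max Q
            reward = -w * energy[s][a]          # R = -w * E_ij
            unvisited.remove(a)
            # Maximum Q value of the subsequent state (0 at the last sensor).
            future = max((q[a][j] for j in unvisited), default=0.0)
            q[s][a] += lam * (reward + gamma * future - q[s][a])
            s = a

    # Step (4): read the path off the Q table by following the largest entry.
    path = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        s = path[-1]
        a = max(sorted(unvisited), key=lambda j: q[s][j])
        path.append(a)
        unvisited.remove(a)
    total = sum(energy[i][j] for i, j in zip(path, path[1:]))
    return path, total, q
```

Because the Q table is indexed only by the current sensor and not by the set of sensors already visited, the learned path is a heuristic and is not guaranteed to be the global minimum of the summed energy.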
In the example, a 2 km by 2 km area is selected and divided into a 10 by 10 grid of blocks, each 200 m wide. The areas where we need to collect data occupy only 48 of these blocks, and we place each sensor at the center of its block, as shown in fig. 3.
The distance between each pair of sensors is calculated and recorded in the matrix D. From historical data, we obtain the amount of data each sensor needs to collect, which is stored in the matrix $Q_o$. Let ω = 1, H = 120 m, B = 1 MHz and η = 50 dB; we assume that the communication power of the unmanned aerial vehicle and the sensor is $P_s$ = 5 W, and the flight power of the unmanned aerial vehicle is $P_h$ = 80 W. After loading all the data, we adjust the parameters of Q-learning until the algorithm runs and converges, and finally obtain the optimal path. The final settings are learning rate λ = 0.1, w = 1, and search coefficient ε = 0.88. The training results are shown in fig. 4, and the final path produced by the algorithm is shown in fig. 5.
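The grid layout and the distance matrix D of the example can be sketched as below. The 200 m cell width and 10 by 10 grid come from the example; which 48 blocks are occupied is shown only in fig. 3, so the cell list passed to `distance_matrix` is a hypothetical input, and the function names are introduced for illustration.

```python
import math

CELL = 200.0   # block width in metres (from the example)
GRID = 10      # 10 x 10 blocks covering 2 km x 2 km


def cell_center(row, col, cell=CELL):
    """Return the (x, y) coordinates in metres of the center of a grid block."""
    return (col * cell + cell / 2, row * cell + cell / 2)


def distance_matrix(cells):
    """Build the matrix D of horizontal distances between sensor positions.

    cells -- list of (row, col) block indices holding a sensor; the actual
             48 occupied blocks are not listed in the text (assumption).
    """
    centers = [cell_center(r, c) for r, c in cells]
    return [[math.dist(p, q) for q in centers] for p in centers]
```

For instance, sensors in the hypothetical blocks `(0, 0)` and `(0, 1)` are 200 m apart, and the resulting matrix is symmetric with a zero diagonal.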
While embodiments of the present invention have been described above, the present invention is not limited to the specific embodiments and applications described above, which are intended to be illustrative, instructive, and not limiting. Those skilled in the art, having the benefit of this disclosure, may effect numerous modifications thereto without departing from the scope of the invention as defined by the appended claims.