Traffic characteristic prediction method and system applicable to long-time prediction based on control information
1. A traffic characteristic prediction method suitable for long-time prediction based on control information is characterized by comprising the following steps:
(1) the data preprocessing module acquires a first traffic characteristic, a second traffic characteristic and a third traffic characteristic and processes the first traffic characteristic, the second traffic characteristic and the third traffic characteristic into the input of the space-time data embedding module; wherein the first traffic characteristic is characteristic data with time variation generated by traffic operation; the second traffic characteristic is characteristic data with short-time invariance of the environment where the traffic runs; the traffic third characteristic is characteristic data with time variation generated by controlling traffic operation;
(2) the space-time data embedding module respectively sends the traffic first characteristic, the traffic second characteristic and the traffic third characteristic processed in the step (1) into a neural network to obtain space embedding, time embedding and control embedding, and the space embedding, the time embedding and the control embedding are combined into comprehensive embedding;
(3) encoding: processing the comprehensive embedding obtained in step (2) by using an encoder based on the space-time attention mechanism module;
(4) the data conversion module converts the encoded comprehensive embedding into the input of a decoder by using a transfer attention mechanism;
(5) decoding: processing the output of the data conversion module in step (4) by using a decoder of the space-time attention mechanism module to obtain the finally predicted traffic characteristics.
2. The traffic feature prediction method applicable to long-term prediction based on control information according to claim 1, characterized in that: the step (1) specifically comprises the following steps:
(1.1) constructing a connection graph by using the traffic second characteristic data; taking road network structure data as an example of the second traffic characteristic, the topology of the urban road network is converted into a weighted directed connection graph $G = (V, E, A)$, where $V$ is a finite set of nodes representing the road segments in the actual road network, with $|V| = N$, that is, the number of road segments in the actual road network is $N$; $E$ is a set of edges representing the connectivity among the road segments in the actual road network, with the direction of traffic flow between road segments taken as the direction of the edges; $A \in \mathbb{R}^{N \times N}$ is a weighted adjacency matrix in which $A_{v_i,v_j}$ represents the weight from node $v_i$ to node $v_j$; specifically, the weights of the adjacency matrix are calculated by a Gaussian weight model, and the density of the adjacency matrix is controlled by a threshold:
$$A_{v_i,v_j} = \begin{cases} \exp\left(-\dfrac{d_{v_i,v_j}^{2}}{\sigma^{2}}\right), & \text{if } \exp\left(-\dfrac{d_{v_i,v_j}^{2}}{\sigma^{2}}\right) \ge \epsilon \\ 0, & \text{otherwise} \end{cases}$$
wherein $d_{v_i,v_j}$ is the distance from node $v_i$ to node $v_j$, approximated by half of the sum of the lengths of the road segments represented by node $v_i$ and node $v_j$; $\sigma$ is the standard deviation of all distance values, and $\epsilon$ is a threshold used for controlling the sparsity of the adjacency matrix;
(1.2) normalizing the traffic first characteristic data;
(1.3) matching the traffic third characteristic data to the road sections, converting it into a period index and a green signal ratio index, discretizing the indexes, and processing the discretized data into one-hot codes.
3. The traffic feature prediction method applicable to long-term prediction based on control information according to claim 1, characterized in that: the step (2) is specifically as follows:
(2.1) generating the spatial embedding, specifically: learning a vector representation of each vertex of the connection graph constructed in step (1.1) by using any one of DeepWalk, node2vec and GraphSAGE; and feeding the vectors into a two-layer fully-connected neural network to obtain the spatial embedding, represented as $e^{S}_{v_i} \in \mathbb{R}^{D}$, where $v_i \in V$;
(2.2) generating the time embedding, specifically: encoding each time corresponding to the traffic speed data in step (1.2) into a vector; the day of the week and the time of day of each time step are encoded as vectors in $\mathbb{R}^{7}$ and $\mathbb{R}^{T}$ respectively and spliced into a vector in $\mathbb{R}^{7+T}$; the vector is fed into a neural network with two or more layers and converted into a $D$-dimensional vector, namely the time embedding, represented as $e^{T}_{t_j} \in \mathbb{R}^{D}$, where $t_j = t_1, \ldots, t_P, \ldots, t_{P+Q}$, $P$ represents the number of input historical time steps, and $Q$ represents the number of time steps to be predicted;
(2.3) generating the control embedding, specifically: encoding the discretized period index and the discretized green signal ratio index obtained in step (1.3) as vectors in $\mathbb{R}^{10}$ respectively and splicing them into a vector in $\mathbb{R}^{20}$; collecting the vector corresponding to each control data and feeding it into a neural network with two or more layers to obtain the control embedding, represented as $e^{C}_{v_i,t_j} \in \mathbb{R}^{D}$;
(2.4) generating the comprehensive embedding, specifically: combining the spatial embedding, the time embedding and the control embedding into the comprehensive embedding; for node $v_i$ at time step $t_j$, the comprehensive embedding is defined as $e_{v_i,t_j} = e^{S}_{v_i} + e^{T}_{t_j} + e^{C}_{v_i,t_j}$ or $e_{v_i,t_j} = \alpha e^{S}_{v_i} + \beta e^{T}_{t_j} + \gamma e^{C}_{v_i,t_j}$, where $\alpha$, $\beta$ and $\gamma$ are trainable weights; the comprehensive embedding of $N$ nodes over $P+Q$ time steps is thus expressed as $E \in \mathbb{R}^{(P+Q) \times N \times D}$; the comprehensive embedding contains temporal, spatial and control information at the same time.
4. The traffic feature prediction method applicable to long-term prediction based on control information according to claim 1, characterized in that: when the comprehensive embedding obtained in step (2) is encoded by the space-time attention mechanism module in step (3), before entering the encoder, the normalized speed data $X \in \mathbb{R}^{P \times N \times C}$ of step (1.2) is converted into $H^{(0)} \in \mathbb{R}^{P \times N \times D}$ through a fully-connected layer; $H^{(0)}$ then passes through an encoder consisting of $L$ layers of space-time attention mechanism modules to obtain the output $H^{(L)} \in \mathbb{R}^{P \times N \times D}$; the space-time attention mechanism module is formed by fusing a time attention mechanism and a spatial attention mechanism with a gated fuser; the input of the $l$-th layer space-time attention mechanism module is denoted as $H^{(l-1)}$, in which the hidden state of node $v_i$ at time step $t_j$ is represented as $h^{(l-1)}_{v_i,t_j}$; the outputs of the spatial attention mechanism and the time attention mechanism in the $l$-th layer space-time attention mechanism module are respectively expressed as $hs^{(l)}_{v_i,t_j}$ and $ht^{(l)}_{v_i,t_j}$.
5. The traffic feature prediction method applicable to long-term prediction based on control information according to claim 4, characterized in that: the spatial attention mechanism is used for adaptively capturing the relation among the traffic characteristics of different road sections in the road network, and its key point is that different weights are dynamically assigned to different nodes at different time steps; wherein for node $v_i$ at time step $t_j$, the weighted sum over all nodes is calculated as:
$$hs^{(l)}_{v_i,t_j} = \sum_{v \in V} \rho_{v_i,v} \cdot h^{(l-1)}_{v,t_j}$$
where $V$ represents the set of all nodes, and $\rho_{v_i,v}$ is the attention score representing the importance of node $v$ to node $v_i$, the sum of which is 1, i.e. $\sum_{v \in V} \rho_{v_i,v} = 1$; $h^{(l-1)}_{v_i,t_j}$ is the input of node $v_i$ at time step $t_j$ in the $l$-th layer space-time attention mechanism module, and $hs^{(l)}_{v_i,t_j}$ is the output of the spatial attention mechanism for node $v_i$ at time step $t_j$ in the $l$-th layer space-time attention mechanism module.
6. The traffic feature prediction method applicable to long-term prediction based on control information according to claim 5, characterized in that: the attention score is calculated as follows: at a particular time step, the current traffic state and the road network structure simultaneously affect the relationship between the sensors; the traffic characteristics, the graph structure and the control information are considered together to learn the attention score; specifically, the hidden state is concatenated with the comprehensive embedding, and the scaled dot-product method is applied to calculate the correlation between node $v_i$ and node $v$:
$$s_{v_i,v} = \frac{\left\langle h^{(l-1)}_{v_i,t_j} \,\|\, e_{v_i,t_j},\; h^{(l-1)}_{v,t_j} \,\|\, e_{v,t_j} \right\rangle}{\sqrt{2D}}$$
wherein $e_{v_i,t_j}$ denotes the comprehensive embedding of node $v_i$ at time step $t_j$, $\|$ represents the splicing operation, $\langle \cdot,\cdot \rangle$ represents the inner product, and $2D$ is the dimension of $h^{(l-1)}_{v_i,t_j} \,\|\, e_{v_i,t_j}$; $s_{v_i,v}$ is then normalized with the softmax activation function:
$$\rho_{v_i,v} = \frac{\exp(s_{v_i,v})}{\sum_{v_r \in V} \exp(s_{v_i,v_r})}$$
in particular, in order to make the learning process more stable, the spatial attention mechanism is extended to a multi-head attention mechanism; namely, $K$ parallel attention mechanisms are set, with $K$ different sets of learnable equations:
$$s^{(k)}_{v_i,v} = \frac{\left\langle f^{(k)}_{s,1}\!\left(h^{(l-1)}_{v_i,t_j} \,\|\, e_{v_i,t_j}\right),\; f^{(k)}_{s,2}\!\left(h^{(l-1)}_{v,t_j} \,\|\, e_{v,t_j}\right) \right\rangle}{\sqrt{d}}$$
$$\rho^{(k)}_{v_i,v} = \frac{\exp\!\left(s^{(k)}_{v_i,v}\right)}{\sum_{v_r \in V} \exp\!\left(s^{(k)}_{v_i,v_r}\right)}$$
$$hs^{(l)}_{v_i,t_j} = \Big\|_{k=1}^{K} \left\{ \sum_{v \in V} \rho^{(k)}_{v_i,v} \cdot f^{(k)}_{s,3}\!\left(h^{(l-1)}_{v,t_j}\right) \right\}$$
wherein $f^{(k)}_{s,1}$, $f^{(k)}_{s,2}$ and $f^{(k)}_{s,3}$ represent three different nonlinear equations of the $k$-th head of the spatial attention mechanism, each of which outputs a vector of dimension $d = D/K$; the nonlinear equation is of the form:
$$f(x) = \mathrm{ReLU}(xW + b)$$
where $W$ and $b$ are trainable parameters, and ReLU is the activation function.
7. The traffic feature prediction method applicable to long-term prediction based on control information according to claim 4, characterized in that: the time attention mechanism is used for adaptively modeling the nonlinear relation between different time steps of the same node; the time correlation changes continuously between different time steps and is influenced by factors such as the traffic state, the related time and the control state; therefore, the comprehensive embedding containing these three kinds of information is combined with the hidden state, and a multi-head attention mechanism is applied to calculate the time attention score; wherein for node $v_i$, the correlation between time step $t_j$ and time step $t$ is defined as follows:
$$u^{(k)}_{t_j,t} = \frac{\left\langle f^{(k)}_{t,1}\!\left(h^{(l-1)}_{v_i,t_j} \,\|\, e_{v_i,t_j}\right),\; f^{(k)}_{t,2}\!\left(h^{(l-1)}_{v_i,t} \,\|\, e_{v_i,t}\right) \right\rangle}{\sqrt{d}}$$
$$\mu^{(k)}_{t_j,t} = \frac{\exp\!\left(u^{(k)}_{t_j,t}\right)}{\sum_{t_r \in N_{t_j}} \exp\!\left(u^{(k)}_{t_j,t_r}\right)}$$
wherein $u^{(k)}_{t_j,t}$ represents the correlation between time step $t_j$ and time step $t$ in the $k$-th head of the time attention mechanism, and $\mu^{(k)}_{t_j,t}$ represents the attention score of the importance of time step $t$ to time step $t_j$ in the $k$-th head; $f^{(k)}_{t,1}$ and $f^{(k)}_{t,2}$ represent two different learnable nonlinear equations in the $k$-th head of the time attention mechanism, of the same form as in spatial attention; $e_{v_i,t}$ is the comprehensive embedding of node $v_i$ at time step $t$, and $d = D/K$; $N_{t_j}$ represents the set of all time steps prior to time step $t_j$; after the attention score is obtained, the hidden state of vertex $v_i$ at time step $t_j$ is updated according to:
$$ht^{(l)}_{v_i,t_j} = \Big\|_{k=1}^{K} \left\{ \sum_{t \in N_{t_j}} \mu^{(k)}_{t_j,t} \cdot f^{(k)}_{t,3}\!\left(h^{(l-1)}_{v_i,t}\right) \right\}$$
wherein $f^{(k)}_{t,3}$ represents a nonlinear equation in the $k$-th head of the time attention mechanism, of the same form as in spatial attention; the learnable parameters in the above three equations are shared among all nodes and time steps and are computed in parallel.
8. The traffic feature prediction method applicable to long-term prediction based on control information according to claim 4, characterized in that: the gated fuser adaptively fuses the temporal and spatial representations, or the temporal, spatial and control representations; in the $l$-th layer space-time attention mechanism module, the outputs of the time and spatial attention mechanisms are respectively expressed as $H_T^{(l)}$ and $H_S^{(l)}$, and are fused as follows:
$$H^{(l)} = z \odot H_S^{(l)} + (1 - z) \odot H_T^{(l)}$$
$$z = \sigma\!\left(H_S^{(l)} W_{z,1} + H_T^{(l)} W_{z,2} + b_z\right)$$
wherein $W_{z,1} \in \mathbb{R}^{D \times D}$, $W_{z,2} \in \mathbb{R}^{D \times D}$ and $b_z \in \mathbb{R}^{D}$ are learnable parameters, $\odot$ indicates element-wise multiplication, $\sigma(\cdot)$ indicates the sigmoid activation function, and $z$ indicates the gate; $H^{(l)}$ is the output of the $l$-th layer space-time attention mechanism module; the gated fuser can adaptively control the weight of the spatial and temporal dependencies for each vertex and each time step.
9. The traffic feature prediction method applicable to long-term prediction based on control information according to claim 1, characterized in that: the data conversion module of step (4) models the direct relationship between each future time step and each historical time step, so as to convert the encoded traffic characteristics into a future representation used as the input of the decoder; specifically, the encoded features $H^{(L)} \in \mathbb{R}^{P \times N \times D}$ are converted to generate a future sequence representation $H^{(L+1)} \in \mathbb{R}^{Q \times N \times D}$; for each node $v_i$, the relationship between a predicted time step $t_j$ ($t_j = t_{P+1}, \ldots, t_{P+Q}$) and a historical time step $t$ ($t = t_1, \ldots, t_P$) is measured by the comprehensive embedding:
$$\lambda^{(k)}_{t_j,t} = \frac{\left\langle f^{(k)}_{tr,1}\!\left(e_{v_i,t_j}\right),\; f^{(k)}_{tr,2}\!\left(e_{v_i,t}\right) \right\rangle}{\sqrt{d}}$$
$$\eta^{(k)}_{t_j,t} = \frac{\exp\!\left(\lambda^{(k)}_{t_j,t}\right)}{\sum_{t_r = t_1}^{t_P} \exp\!\left(\lambda^{(k)}_{t_j,t_r}\right)}$$
wherein $\lambda^{(k)}_{t_j,t}$ represents the correlation between the predicted time step $t_j$ and the historical time step $t$ in the $k$-th head of the transfer attention mechanism, and $\eta^{(k)}_{t_j,t}$ represents the attention score of the importance of the historical time step $t$ to the predicted time step $t_j$ in the $k$-th head; $f^{(k)}_{tr,1}$ and $f^{(k)}_{tr,2}$ represent two different learnable nonlinear equations in the $k$-th head, of the same form as in spatial attention; $e_{v_i,t}$ is the comprehensive embedding of node $v_i$ at time step $t$, and $d = D/K$; the attention score $\eta^{(k)}_{t_j,t}$ is then used to adaptively select the relevant characteristics of the $P$ historical time steps, and the encoded traffic characteristics are converted into the input of the decoder:
$$h^{(l)}_{v_i,t_j} = \Big\|_{k=1}^{K} \left\{ \sum_{t = t_1}^{t_P} \eta^{(k)}_{t_j,t} \cdot f^{(k)}_{tr,3}\!\left(h^{(l-1)}_{v_i,t}\right) \right\}$$
wherein $f^{(k)}_{tr,3}$ represents a nonlinear equation in the $k$-th head of the transfer attention mechanism, of the same form as in spatial attention; $h^{(l-1)}_{v_i,t}$ is the input of node $v_i$ at historical time step $t$, and $h^{(l)}_{v_i,t_j}$ is the vector representation output for node $v_i$ at the predicted time step $t_j$ after conversion; in the above three formulas, the trainable parameters are shared among all nodes and time steps and can be computed in parallel.
10. The traffic feature prediction method applicable to long-term prediction based on control information according to claim 1, characterized in that: the output of the data conversion module processed in step (5) is $H^{(L+1)} \in \mathbb{R}^{Q \times N \times D}$; the decoder of the space-time attention mechanism module comprises $L$ layers of space-time attention mechanism modules and outputs $H^{(2L+1)} \in \mathbb{R}^{Q \times N \times D}$; finally, a fully-connected layer outputs the predicted values for the $Q$ future time steps, $\hat{Y} \in \mathbb{R}^{Q \times N \times C}$.
11. A traffic characteristic prediction system based on control information and suitable for long-time prediction, characterized by comprising a data preprocessing module, a space-time data embedding module, a space-time attention mechanism module and a data conversion module; the data preprocessing module is used for acquiring a first traffic characteristic, a second traffic characteristic and a third traffic characteristic and processing them into the input of the space-time data embedding module; the space-time data embedding module sends the first traffic characteristic, the second traffic characteristic and the third traffic characteristic into a neural network respectively to obtain a spatial embedding, a time embedding and a control embedding, which are combined into a comprehensive embedding; the data conversion module is used for converting the encoded comprehensive embedding into the input of a decoder; the space-time attention mechanism module comprises an encoder and a decoder, wherein the encoder is used for encoding the comprehensive embedding output by the space-time data embedding module, and the decoder is used for decoding the output of the data conversion module to obtain the finally predicted traffic characteristics.
Background
Traffic characteristic prediction is a classic time series prediction problem, in which future traffic conditions are predicted from observed historical traffic conditions. The accuracy of traffic characteristic prediction affects applications such as traffic signal control and traffic navigation. Accurate traffic prediction can increase the effectiveness of traffic decisions and thereby help reduce traffic congestion. However, existing traffic characteristic prediction methods still have shortcomings, specifically: 1) their performance on long-time traffic characteristic prediction is unsatisfactory, because traffic characteristics have complex space-time relationships and the error of long-time prediction is amplified at each step; 2) they predict mainly from the perspective of time correlation and space correlation: existing research constructs the relation between the predicted traffic characteristics and the historical traffic characteristics, or the traffic characteristics of nearby regions, from the perspective of time correlation or space correlation, and ignores the influence of control information on the predicted values of the traffic characteristics. To effectively solve these problems, it is necessary to design a traffic characteristic prediction method and system based on control information and suitable for long-time prediction.
Disclosure of Invention
The invention aims to overcome the above defects and provides a traffic characteristic prediction method and system based on control information and suitable for long-time prediction, in which control information is introduced into traffic characteristic prediction and the traffic speed, road network structure and traffic control data are processed separately; the data are sent into a neural network respectively to obtain a spatial embedding, a time embedding and a control embedding, which are combined into a comprehensive embedding; and the predicted traffic speed value is finally obtained through an encoder based on the space-time attention mechanism module, a data conversion module and a decoder based on the space-time attention mechanism module. The invention improves the effectiveness of speed prediction for a signal-controlled road network, can better model the dynamic spatial correlation and the nonlinear temporal correlation, and at the same time alleviates the error propagation effect by avoiding error accumulation, thereby improving long-time traffic prediction performance and making the method suitable for long-time traffic prediction.
The invention achieves the aim through the following technical scheme: a traffic characteristic prediction method suitable for long-time prediction based on control information comprises the following steps:
(1) the data preprocessing module acquires a first traffic characteristic, a second traffic characteristic and a third traffic characteristic and processes the first traffic characteristic, the second traffic characteristic and the third traffic characteristic into the input of the space-time data embedding module; the first traffic characteristic is characteristic data with time variation generated by traffic operation, including but not limited to: traffic speed, flow, occupancy, queuing length, congestion index; the second traffic characteristic is characteristic data with short-time invariance of the environment where the traffic runs, and the characteristic data comprises but is not limited to: road network structure, POI distribution, traffic infrastructure distribution; the third characteristic of the traffic is characteristic data with time variation generated by controlling traffic operation, including but not limited to: traffic control data, traffic guidance data, traffic restriction data;
(2) the space-time data embedding module respectively sends the traffic first characteristic, the traffic second characteristic and the traffic third characteristic processed in the step (1) into a neural network to obtain space embedding, time embedding and control embedding, and the space embedding, the time embedding and the control embedding are combined into comprehensive embedding;
(3) encoding: processing the comprehensive embedding obtained in step (2) by using an encoder based on the space-time attention mechanism module;
(4) the data conversion module converts the encoded comprehensive embedding into the input of a decoder by using a transfer attention mechanism;
(5) decoding: processing the output of the data conversion module in step (4) by using a decoder of the space-time attention mechanism module to obtain the finally predicted traffic characteristics.
Preferably, the step (1) specifically includes the steps of:
(1.1) constructing a connection graph by using the traffic second characteristic data; taking road network structure data as an example of the second traffic characteristic, the topology of the urban road network is converted into a weighted directed connection graph $G = (V, E, A)$, where $V$ is a finite set of nodes representing the road segments in the actual road network, with $|V| = N$, that is, the number of road segments in the actual road network is $N$; $E$ is a set of edges representing the connectivity among the road segments in the actual road network, with the direction of traffic flow between road segments taken as the direction of the edges; $A \in \mathbb{R}^{N \times N}$ is a weighted adjacency matrix in which $A_{v_i,v_j}$ represents the weight from node $v_i$ to node $v_j$; specifically, the weights of the adjacency matrix are calculated by a Gaussian weight model, and the density of the adjacency matrix is controlled by a threshold:
$$A_{v_i,v_j} = \begin{cases} \exp\left(-\dfrac{d_{v_i,v_j}^{2}}{\sigma^{2}}\right), & \text{if } \exp\left(-\dfrac{d_{v_i,v_j}^{2}}{\sigma^{2}}\right) \ge \epsilon \\ 0, & \text{otherwise} \end{cases}$$
wherein $d_{v_i,v_j}$ is the distance from node $v_i$ to node $v_j$, approximated by half of the sum of the lengths of the road segments represented by node $v_i$ and node $v_j$; $\sigma$ is the standard deviation of all distance values, and $\epsilon$ is a threshold used for controlling the sparsity of the adjacency matrix and is set to 0.1;
(1.2) normalizing the traffic first characteristic data; taking traffic speed data as an example of the traffic first characteristic data, the specific processing flow is as follows: the original traffic speed data at time $t$ is represented as $X_t \in \mathbb{R}^{N \times C}$, where $N$ is the number of nodes and $C$ is the number of node characteristics; here $C$ is taken as 1, including only the speed characteristic of the road section; when $C$ is 2, the characteristics are road section speed and road section flow; when $C$ is 3, the characteristics are road section speed, road section flow and road section density; finally, $X_t$ is normalized by a Z-score method and a max-min method;
(1.3) matching the traffic third characteristic data to the road sections, converting it into a period index and a green signal ratio index, discretizing the indexes, and processing them into one-hot codes; taking control data as an example of the third traffic characteristic, the control data refers to the period and green signal ratio data of the road, and matching the control data to the road sections specifically comprises: in combination with the road network structure data, taking the period of the downstream intersection of a road section as the period of the road section, and taking the green signal ratio of the phase in which vehicles on the road section can enter the intersection as the green signal ratio of the road section; then calculating the period index as the ratio of the road section period to the historical maximum period of the road section, and calculating the green signal ratio index as the ratio of the road section green signal ratio to the historical maximum green signal ratio; and discretizing the period index and the green signal ratio index according to the corresponding table.
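The preprocessing in steps (1.1) to (1.3) can be illustrated with a minimal sketch. Several points are assumptions made only for illustration and are not stated in the text: the function names are hypothetical, the population standard deviation is used for $\sigma$, only the Z-score part of the normalization is shown, and the indexes are discretized into ten equal-width bins because the correspondence table itself is not reproduced here.

```python
import math
from statistics import pstdev


def gaussian_adjacency(dist, eps=0.1):
    """Thresholded Gaussian weights; dist[i][j] is a distance, or None if i -> j is not an edge."""
    values = [d for row in dist for d in row if d is not None]
    sigma = pstdev(values)  # assumption: population standard deviation of all distances
    if sigma == 0:
        raise ValueError("all distances are equal, so sigma is zero")
    n = len(dist)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if dist[i][j] is None:
                continue
            w = math.exp(-(dist[i][j] ** 2) / sigma ** 2)
            adj[i][j] = w if w >= eps else 0.0  # weights below the threshold are dropped
    return adj


def z_score(values):
    """Z-score normalization of one feature series."""
    mean = sum(values) / len(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


def control_one_hot(period, max_period, split, max_split, bins=10):
    """One-hot encode the period index and green signal ratio index, spliced into 2 * bins values."""
    def bucket(index):
        # assumption: equal-width bins on (0, 1]; the patent uses a table instead
        return min(int(index * bins), bins - 1)

    vec = [0.0] * (2 * bins)
    vec[bucket(period / max_period)] = 1.0
    vec[bins + bucket(split / max_split)] = 1.0
    return vec
```

For example, two edges of length 100 and 300 give $\sigma = 100$, so the first edge keeps the weight $e^{-1}$ while the second falls below $\epsilon = 0.1$ and is set to 0.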
Preferably, the step (2) is specifically as follows:
(2.1) generating the spatial embedding, specifically: learning a vector representation of each vertex of the connection graph constructed in step (1.1) by using any one of DeepWalk, node2vec and GraphSAGE; and feeding the vectors into a two-layer fully-connected neural network to obtain the spatial embedding, represented as $e^{S}_{v_i} \in \mathbb{R}^{D}$, where $v_i \in V$;
(2.2) generating the time embedding, specifically: encoding each time corresponding to the traffic speed data in step (1.2) into a vector; the day of the week and the time of day of each time step are encoded as vectors in $\mathbb{R}^{7}$ and $\mathbb{R}^{T}$ respectively and spliced into a vector in $\mathbb{R}^{7+T}$, wherein $T$ may be 24 when counted in hours or 1440 when counted in minutes; the vector is fed into a neural network with two or more layers, such as a fully-connected neural network, a recurrent neural network or a deep belief network, and converted into a $D$-dimensional vector, namely the time embedding, represented as $e^{T}_{t_j} \in \mathbb{R}^{D}$, where $t_j = t_1, \ldots, t_P, \ldots, t_{P+Q}$, $P$ represents the number of input historical time steps, and $Q$ represents the number of time steps to be predicted;
(2.3) generating the control embedding, specifically: encoding the discretized period index and the discretized green signal ratio index obtained in step (1.3) as vectors in $\mathbb{R}^{10}$ respectively and splicing them into a vector in $\mathbb{R}^{20}$; collecting the vector corresponding to each control data and feeding it into a neural network with two or more layers, such as a fully-connected neural network, a recurrent neural network or a deep belief network, to obtain the control embedding, represented as $e^{C}_{v_i,t_j} \in \mathbb{R}^{D}$;
(2.4) generating the comprehensive embedding, specifically: combining the spatial embedding, the time embedding and the control embedding into the comprehensive embedding; for node $v_i$ at time step $t_j$, the comprehensive embedding is defined as $e_{v_i,t_j} = e^{S}_{v_i} + e^{T}_{t_j} + e^{C}_{v_i,t_j}$ or $e_{v_i,t_j} = \alpha e^{S}_{v_i} + \beta e^{T}_{t_j} + \gamma e^{C}_{v_i,t_j}$, where $\alpha$, $\beta$ and $\gamma$ are trainable weights; the comprehensive embedding of $N$ nodes over $P+Q$ time steps is thus expressed as $E \in \mathbb{R}^{(P+Q) \times N \times D}$; the comprehensive embedding contains temporal, spatial and control information at the same time.
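As an illustration of steps (2.2) and (2.4), the following minimal sketch shows the raw time encoding in $\mathbb{R}^{7+T}$ and the combination of three already-computed $D$-dimensional embeddings. The function names are hypothetical, one-hot encoding of the day and time slot is an assumption (the text only fixes the vector sizes), and the neural networks that map each raw vector to $D$ dimensions are omitted.

```python
def time_one_hot(day_of_week, slot, slots_per_day=24):
    """Encode a time step as a spliced vector of length 7 + T.

    day_of_week is 0..6 and slot is 0..T-1 (T = 24 for hours, 1440 for minutes).
    One-hot encoding is an assumption made for this sketch.
    """
    vec = [0.0] * (7 + slots_per_day)
    vec[day_of_week] = 1.0
    vec[7 + slot] = 1.0
    return vec


def comprehensive_embedding(e_s, e_t, e_c, alpha=1.0, beta=1.0, gamma=1.0):
    """Combine spatial, time and control embeddings of equal length D.

    With the default weights this is the plain sum; other values give the
    weighted variant, where alpha, beta and gamma would be trained.
    """
    return [alpha * s + beta * t + gamma * c for s, t, c in zip(e_s, e_t, e_c)]


# A Wednesday (index 2) at 13:00 with hourly slots gives a vector of length 31.
stamp = time_one_hot(2, 13)
combined = comprehensive_embedding([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
```

Because the three embeddings share the dimension $D$, the combination is element-wise and the result can be stored for all $N$ nodes and $P+Q$ time steps as a single array.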
Preferably, when the comprehensive embedding obtained in step (2) is encoded by the space-time attention mechanism module in step (3), before entering the encoder, the normalized speed data $X \in \mathbb{R}^{P \times N \times C}$ of step (1.2) is converted into $H^{(0)} \in \mathbb{R}^{P \times N \times D}$ through a fully-connected layer; $H^{(0)}$ then passes through an encoder consisting of $L$ layers of space-time attention mechanism modules to obtain the output $H^{(L)} \in \mathbb{R}^{P \times N \times D}$; the space-time attention mechanism module is formed by fusing a time attention mechanism and a spatial attention mechanism with a gated fuser; the input of the $l$-th layer space-time attention mechanism module is denoted as $H^{(l-1)}$, in which the hidden state of node $v_i$ at time step $t_j$ is represented as $h^{(l-1)}_{v_i,t_j}$; the outputs of the spatial attention mechanism and the time attention mechanism in the $l$-th layer space-time attention mechanism module are respectively expressed as $hs^{(l)}_{v_i,t_j}$ and $ht^{(l)}_{v_i,t_j}$.
Preferably, the spatial attention mechanism is used for adaptively capturing the relation among the traffic characteristics of different road sections in the road network, and its core is to dynamically assign different weights to different nodes at different time steps; wherein for node $v_i$ at time step $t_j$, the weighted sum over all nodes is calculated as:
$$hs^{(l)}_{v_i,t_j} = \sum_{v \in V} \rho_{v_i,v} \cdot h^{(l-1)}_{v,t_j}$$
where $V$ represents the set of all nodes, and $\rho_{v_i,v}$ is the attention score representing the importance of node $v$ to node $v_i$, the sum of which is 1, i.e. $\sum_{v \in V} \rho_{v_i,v} = 1$; $h^{(l-1)}_{v_i,t_j}$ is the input of node $v_i$ at time step $t_j$ in the $l$-th layer space-time attention mechanism module, and $hs^{(l)}_{v_i,t_j}$ is the output of the spatial attention mechanism for node $v_i$ at time step $t_j$ in the $l$-th layer space-time attention mechanism module.
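Preferably, the attention score is calculated as follows: at a particular time step, the current traffic state and the road network structure simultaneously affect the relationship between the sensors; the traffic characteristics, the graph structure and the control information are considered together to learn the attention score; specifically, the hidden state is concatenated with the comprehensive embedding, and the scaled dot-product method is applied to calculate the correlation between node $v_i$ and node $v$:
$$s_{v_i,v} = \frac{\left\langle h^{(l-1)}_{v_i,t_j} \,\|\, e_{v_i,t_j},\; h^{(l-1)}_{v,t_j} \,\|\, e_{v,t_j} \right\rangle}{\sqrt{2D}}$$
wherein $e_{v_i,t_j}$ denotes the comprehensive embedding of node $v_i$ at time step $t_j$, $\|$ represents the splicing operation, $\langle \cdot,\cdot \rangle$ represents the inner product, and $2D$ is the dimension of $h^{(l-1)}_{v_i,t_j} \,\|\, e_{v_i,t_j}$; $s_{v_i,v}$ is then normalized with the softmax activation function:
$$\rho_{v_i,v} = \frac{\exp(s_{v_i,v})}{\sum_{v_r \in V} \exp(s_{v_i,v_r})}$$
in particular, in order to make the learning process more stable, the spatial attention mechanism is extended to a multi-head attention mechanism; namely, $K$ parallel attention mechanisms are set, with $K$ different sets of learnable equations:
$$s^{(k)}_{v_i,v} = \frac{\left\langle f^{(k)}_{s,1}\!\left(h^{(l-1)}_{v_i,t_j} \,\|\, e_{v_i,t_j}\right),\; f^{(k)}_{s,2}\!\left(h^{(l-1)}_{v,t_j} \,\|\, e_{v,t_j}\right) \right\rangle}{\sqrt{d}}$$
$$\rho^{(k)}_{v_i,v} = \frac{\exp\!\left(s^{(k)}_{v_i,v}\right)}{\sum_{v_r \in V} \exp\!\left(s^{(k)}_{v_i,v_r}\right)}$$
$$hs^{(l)}_{v_i,t_j} = \Big\|_{k=1}^{K} \left\{ \sum_{v \in V} \rho^{(k)}_{v_i,v} \cdot f^{(k)}_{s,3}\!\left(h^{(l-1)}_{v,t_j}\right) \right\}$$
wherein $f^{(k)}_{s,1}$, $f^{(k)}_{s,2}$ and $f^{(k)}_{s,3}$ represent three different nonlinear equations of the $k$-th head of the spatial attention mechanism, each of which outputs a vector of dimension $d = D/K$; the nonlinear equation is of the form:
$$f(x) = \mathrm{ReLU}(xW + b)$$
where $W$ and $b$ are trainable parameters, and ReLU is the activation function.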
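The single-head form of the spatial attention can be sketched as follows for one time step. This is a simplified illustration, not the full mechanism: the function name is hypothetical, the learnable projections $f(x) = \mathrm{ReLU}(xW + b)$ and the $K$ heads are omitted (the spliced vectors are used directly), and the inputs are plain Python lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def spatial_attention(h, e):
    """Single-head spatial attention at one time step.

    h[v] is the hidden state and e[v] the comprehensive embedding of node v,
    both of length D. Returns (outputs, weights), where weights[i][v] is the
    attention score of node v for node i.
    """
    n = len(h)
    spliced = [h[v] + e[v] for v in range(n)]   # list concatenation, length 2D
    scale = math.sqrt(len(spliced[0]))          # sqrt(2D)
    outputs, weights = [], []
    for i in range(n):
        scores = [
            sum(a * b for a, b in zip(spliced[i], spliced[v])) / scale
            for v in range(n)
        ]
        w = softmax(scores)
        weights.append(w)
        # weighted sum of the hidden states of all nodes
        outputs.append(
            [sum(w[v] * h[v][d] for v in range(n)) for d in range(len(h[0]))]
        )
    return outputs, weights
```

Each row of `weights` sums to 1, matching the normalization by softmax, and two nodes with identical states and embeddings attend to each other equally.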
Preferably, the time attention mechanism is used for adaptively modeling the nonlinear relation between different time steps of the same node; the time correlation changes continuously between different time steps and is influenced at the same time by factors such as the traffic state, the related time and the control state; therefore, the comprehensive embedding containing these three kinds of information is combined with the hidden state, and a multi-head attention mechanism is applied to calculate the time attention score; wherein for node $v_i$, the correlation between time step $t_j$ and time step $t$ is defined as follows:
$$u^{(k)}_{t_j,t} = \frac{\left\langle f^{(k)}_{t,1}\!\left(h^{(l-1)}_{v_i,t_j} \,\|\, e_{v_i,t_j}\right),\; f^{(k)}_{t,2}\!\left(h^{(l-1)}_{v_i,t} \,\|\, e_{v_i,t}\right) \right\rangle}{\sqrt{d}}$$
$$\mu^{(k)}_{t_j,t} = \frac{\exp\!\left(u^{(k)}_{t_j,t}\right)}{\sum_{t_r \in N_{t_j}} \exp\!\left(u^{(k)}_{t_j,t_r}\right)}$$
wherein $u^{(k)}_{t_j,t}$ represents the correlation between time step $t_j$ and time step $t$ in the $k$-th head of the time attention mechanism, and $\mu^{(k)}_{t_j,t}$ represents the attention score of the importance of time step $t$ to time step $t_j$ in the $k$-th head; $f^{(k)}_{t,1}$ and $f^{(k)}_{t,2}$ represent two different learnable nonlinear equations in the $k$-th head of the time attention mechanism, of the same form as in the spatial attention mechanism; $e_{v_i,t}$ is the comprehensive embedding of node $v_i$ at time step $t$, and $d = D/K$; $N_{t_j}$ represents the set of all time steps prior to time step $t_j$; after the attention score is obtained, the hidden state of vertex $v_i$ at time step $t_j$ is updated according to:
$$ht^{(l)}_{v_i,t_j} = \Big\|_{k=1}^{K} \left\{ \sum_{t \in N_{t_j}} \mu^{(k)}_{t_j,t} \cdot f^{(k)}_{t,3}\!\left(h^{(l-1)}_{v_i,t}\right) \right\}$$
wherein $f^{(k)}_{t,3}$ represents a nonlinear equation in the $k$-th head of the time attention mechanism, of the same form as in spatial attention; the learnable parameters in the above three equations are shared among all nodes and time steps and are computed in parallel.
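A single-head sketch of the time attention for one node is given below. It rests on assumptions that go beyond the text: the function name is hypothetical, the learnable projections and the $K$ heads are omitted, and the set of attended time steps is taken to include the current step $t_j$ together with the earlier ones, so that the first time step has something to attend to.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def temporal_attention(h, e):
    """Single-head time attention for one node.

    h[t] is the hidden state and e[t] the comprehensive embedding of the node
    at time step t, both of length D. Step j attends to steps 0..j
    (assumption: the current step is included). Returns (outputs, weights).
    """
    steps = len(h)
    spliced = [h[t] + e[t] for t in range(steps)]  # list concatenation, length 2D
    scale = math.sqrt(len(spliced[0]))
    outputs, weights = [], []
    for j in range(steps):
        scores = [
            sum(a * b for a, b in zip(spliced[j], spliced[t])) / scale
            for t in range(j + 1)
        ]
        w = softmax(scores)
        weights.append(w)
        # weighted sum of the node's hidden states over the attended steps
        outputs.append(
            [sum(w[t] * h[t][d] for t in range(j + 1)) for d in range(len(h[0]))]
        )
    return outputs, weights
```

Restricting the scores to earlier steps is what keeps the update causal: the representation at $t_j$ never depends on later time steps.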
Preferably, the gated fuser adaptively fuses the temporal and spatial representations, or the temporal, spatial and control representations; in the $l$-th layer space-time attention mechanism module, the outputs of the time and spatial attention mechanisms are respectively expressed as $H_T^{(l)}$ and $H_S^{(l)}$, and are fused as follows:
$$H^{(l)} = z \odot H_S^{(l)} + (1 - z) \odot H_T^{(l)}$$
$$z = \sigma\!\left(H_S^{(l)} W_{z,1} + H_T^{(l)} W_{z,2} + b_z\right)$$
wherein $W_{z,1} \in \mathbb{R}^{D \times D}$, $W_{z,2} \in \mathbb{R}^{D \times D}$ and $b_z \in \mathbb{R}^{D}$ are learnable parameters, $\odot$ indicates element-wise multiplication, $\sigma(\cdot)$ indicates the sigmoid activation function, and $z$ indicates the gate; $H^{(l)}$ is the output of the $l$-th layer space-time attention mechanism module; the gated fuser can adaptively control the weight of the spatial and temporal dependencies for each vertex and each time step.
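The gated fusion can be sketched for a single node and time step as below. The function name and the list-of-lists layout of the weight matrices are assumptions for illustration; in a real model the parameters would be trained and the operation applied to the whole tensor at once.

```python
import math


def gated_fusion(hs, ht, w1, w2, b):
    """Fuse a spatial vector hs and a temporal vector ht of length D.

    w1 and w2 are D x D matrices given as lists of rows, b is a bias of
    length D. Computes z = sigmoid(hs W1 + ht W2 + b) and returns
    z * hs + (1 - z) * ht element-wise.
    """
    dim = len(hs)
    fused = []
    for d in range(dim):
        pre = (
            sum(hs[i] * w1[i][d] for i in range(dim))
            + sum(ht[i] * w2[i][d] for i in range(dim))
            + b[d]
        )
        z = 1.0 / (1.0 + math.exp(-pre))  # gate value in (0, 1)
        fused.append(z * hs[d] + (1.0 - z) * ht[d])
    return fused


zeros = [[0.0, 0.0], [0.0, 0.0]]
# With zero weights and bias the gate is 0.5, so the result is the average.
averaged = gated_fusion([2.0, 4.0], [0.0, 0.0], zeros, zeros, [0.0, 0.0])
```

A large positive bias pushes the gate towards 1 and the output towards the spatial representation, while a large negative bias favours the temporal one.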
Preferably, the data conversion module of step (4) models the direct relationship between each future time step and each historical time step, so as to convert the encoded traffic characteristics into a future representation used as the input of the decoder; specifically, the encoded features $H^{(L)} \in \mathbb{R}^{P \times N \times D}$ are converted to generate a future sequence representation $H^{(L+1)} \in \mathbb{R}^{Q \times N \times D}$; for each node $v_i$, the relationship between a predicted time step $t_j$ ($t_j = t_{P+1}, \ldots, t_{P+Q}$) and a historical time step $t$ ($t = t_1, \ldots, t_P$) is measured by the comprehensive embedding:
$$\lambda^{(k)}_{t_j,t} = \frac{\left\langle f^{(k)}_{tr,1}\!\left(e_{v_i,t_j}\right),\; f^{(k)}_{tr,2}\!\left(e_{v_i,t}\right) \right\rangle}{\sqrt{d}}$$
$$\eta^{(k)}_{t_j,t} = \frac{\exp\!\left(\lambda^{(k)}_{t_j,t}\right)}{\sum_{t_r = t_1}^{t_P} \exp\!\left(\lambda^{(k)}_{t_j,t_r}\right)}$$
wherein $\lambda^{(k)}_{t_j,t}$ represents the correlation between the predicted time step $t_j$ and the historical time step $t$ in the $k$-th head of the transfer attention mechanism, and $\eta^{(k)}_{t_j,t}$ represents the attention score of the importance of the historical time step $t$ to the predicted time step $t_j$ in the $k$-th head; $f^{(k)}_{tr,1}$ and $f^{(k)}_{tr,2}$ represent two different learnable nonlinear equations in the $k$-th head, of the same form as in spatial attention; $e_{v_i,t}$ is the comprehensive embedding of node $v_i$ at time step $t$, and $d = D/K$; the attention score $\eta^{(k)}_{t_j,t}$ is then used to adaptively select the relevant characteristics of the $P$ historical time steps, and the encoded traffic characteristics are converted into the input of the decoder:
$$h^{(l)}_{v_i,t_j} = \Big\|_{k=1}^{K} \left\{ \sum_{t = t_1}^{t_P} \eta^{(k)}_{t_j,t} \cdot f^{(k)}_{tr,3}\!\left(h^{(l-1)}_{v_i,t}\right) \right\}$$
wherein $f^{(k)}_{tr,3}$ represents a nonlinear equation in the $k$-th head of the transfer attention mechanism, of the same form as in spatial attention; $h^{(l-1)}_{v_i,t}$ is the input of node $v_i$ at historical time step $t$, and $h^{(l)}_{v_i,t_j}$ is the vector representation output for node $v_i$ at the predicted time step $t_j$ after conversion; in the above three formulas, the trainable parameters are shared among all nodes and time steps and can be computed in parallel.
Preferably, in step (5) the output of the data conversion module is $H^{(L+1)} \in R^{Q \times N \times D}$; the decoder of the space-time attention mechanism module comprises L layers of space-time attention mechanism modules and outputs $H^{(2L+1)} \in R^{Q \times N \times D}$; finally, a fully connected layer outputs the predicted values for the next Q time steps, $\hat{Y} \in R^{Q \times N \times C}$.
A traffic characteristic prediction system suitable for long-time prediction based on control information comprises a data preprocessing module, a space-time data embedding module, a space-time attention mechanism module and a data conversion module; the data preprocessing module is used for acquiring a first traffic characteristic, a second traffic characteristic and a third traffic characteristic and processing them into the input of the space-time data embedding module; the space-time data embedding module feeds the first traffic characteristic, the second traffic characteristic and the third traffic characteristic into neural networks to obtain the spatial embedding, the temporal embedding and the control embedding, and combines the three into a comprehensive embedding; the data conversion module converts the encoded comprehensive embedding into the input of the decoder; the space-time attention mechanism module comprises an encoder and a decoder, wherein the encoder encodes the comprehensive embedding output by the space-time data embedding module, and the decoder decodes the output of the data conversion module to obtain the finally predicted traffic characteristics.
The beneficial effects of the invention are: (1) the invention introduces traffic control information and, after specific processing, combines it with traffic speed and road network structure information, which effectively captures the influence of control information on traffic characteristics and improves the effectiveness of speed prediction on a controlled road network; (2) the invention provides a space-time attention mechanism that better models dynamic spatial correlation and nonlinear temporal correlation, and designs a gated fuser to adaptively fuse the information extracted by the spatial and temporal attention mechanisms; (3) the invention designs a data conversion module that transfers the historical traffic characteristics to a future representation and models the direct relationship between historical and future time steps, which reduces error accumulation, alleviates the error propagation effect, improves long-term prediction performance, and makes the method suitable for long-time traffic prediction.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a schematic flow diagram of the method of the present invention;
FIG. 3 is a schematic diagram of a spatial attention mechanism capturing relationships between nodes over time in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of a temporal attention mechanism capturing relationships between nodes as a function of time in accordance with an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to specific examples, but the scope of the invention is not limited thereto:
Example: as shown in FIG. 1, a traffic characteristic prediction system suitable for long-time prediction based on control information includes a data preprocessing module, a space-time data embedding module, a space-time attention mechanism module and a data conversion module; the data preprocessing module is used for acquiring a first traffic characteristic, a second traffic characteristic and a third traffic characteristic and processing them into the input of the space-time data embedding module; the space-time data embedding module feeds the first traffic characteristic, the second traffic characteristic and the third traffic characteristic into neural networks to obtain the spatial embedding, the temporal embedding and the control embedding, and combines the three into a comprehensive embedding; the data conversion module converts the encoded comprehensive embedding into the input of the decoder; the space-time attention mechanism module comprises an encoder and a decoder, wherein the encoder encodes the comprehensive embedding output by the space-time data embedding module, and the decoder decodes the output of the data conversion module to obtain the finally predicted traffic characteristics.
As shown in FIG. 2, a traffic characteristic prediction method suitable for long-time prediction based on control information includes the following steps:
S1, data preprocessing module: acquire the first traffic characteristic (time-varying characteristic data generated by traffic operation, including but not limited to traffic speed, flow, occupancy, queue length and congestion index), the second traffic characteristic (characteristic data of the environment in which traffic operates that does not change in the short term, including but not limited to road network structure, POI distribution and traffic infrastructure distribution) and the third traffic characteristic (time-varying characteristic data generated by controlling traffic operation, including but not limited to traffic control data, traffic guidance data and traffic restriction data), and process them into the input of the space-time data embedding module. Specifically:
S1.1, construct a connection graph from the second traffic characteristic data. The second traffic characteristic is described here taking the road network structure as an example: the road network structure data includes, but is not limited to, the connection relationships between intersections in the actual road network and the lengths of the road segments connecting the intersections, and can be obtained from map providers and government agencies. Data samples are shown in Table 1 below:
TABLE 1
Further, a connection graph is constructed from the road network structure data. The topology of the urban road network is converted into a weighted directed graph G = (V, E, A), where V is the finite set of nodes representing the road segments of the actual road network, with |V| = N, i.e. the actual road network contains N road segments; E is the set of edges representing the connectivity between road segments, and the direction in which traffic can flow between two segments is taken as the direction of the edge; $A \in R^{N \times N}$ is the weighted adjacency matrix, in which $A_{v_i,v_j}$ is the weight from node $v_i$ to node $v_j$. Specifically, the weights of the adjacency matrix are calculated with a Gaussian weight model, and a threshold is used to control the density of the adjacency matrix:
$$ A_{v_i,v_j} = \begin{cases} \exp\left(-\dfrac{d_{v_i,v_j}^{2}}{\sigma^{2}}\right), & \exp\left(-\dfrac{d_{v_i,v_j}^{2}}{\sigma^{2}}\right) \ge \varepsilon \\ 0, & \text{otherwise} \end{cases} $$
where $d_{v_i,v_j}$ is the distance from node $v_i$ to node $v_j$, approximated by half of the sum of the lengths of the road segments represented by $v_i$ and $v_j$; $\sigma$ is the standard deviation of all distance values; and $\varepsilon$ is the threshold that controls the sparsity of the adjacency matrix, set to 0.1.
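The thresholded Gaussian weighting described above can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: the function name `gaussian_adjacency`, the input layout (a list of segment lengths plus a list of directed edges) and the sample lengths are assumptions, and the kernel form $\exp(-d^2/\sigma^2)$ is the commonly used one consistent with the definitions of $\sigma$ and $\varepsilon$ in the text.

```python
import math
from statistics import pstdev

def gaussian_adjacency(lengths, edges, eps=0.1):
    """Build a thresholded Gaussian-kernel adjacency matrix.

    lengths: road-segment lengths, one per node (hypothetical input layout).
    edges:   (i, j) pairs meaning traffic can flow from segment i to segment j.
    """
    n = len(lengths)
    # Distance between connected segments: half the sum of their lengths.
    dist = {(i, j): 0.5 * (lengths[i] + lengths[j]) for i, j in edges}
    sigma = pstdev(list(dist.values()))  # standard deviation of all distances
    if sigma == 0:
        raise ValueError("all distances are equal; sigma is zero")
    adj = [[0.0] * n for _ in range(n)]
    for (i, j), d in dist.items():
        w = math.exp(-(d * d) / (sigma * sigma))
        if w >= eps:  # weights below the threshold are dropped (sparsity)
            adj[i][j] = w
    return adj

# Four segments in a chain 0 -> 1 -> 2 -> 3 (lengths are made-up values).
lengths = [100.0, 100.0, 200.0, 1800.0]
edges = [(0, 1), (1, 2), (2, 3)]
A = gaussian_adjacency(lengths, edges)
```

In this toy network the long link between segments 2 and 3 falls below the threshold and is removed, while the reverse directions stay at zero because the graph is directed.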
S1.2, normalize the first traffic characteristic data. Traffic speed is taken as an example: the traffic speed data refers to road segment speed data acquired by mobile detection or cross-section detection technology, and includes, but is not limited to, the road segment number, the time and the speed. Data samples are shown in Table 2 below:
| Road segment number | Time | Speed |
| --- | --- | --- |
| 14L020980T0 | 2021-03-12 09:02:00 | 30 |
| 14L020979T0 | 2021-03-12 09:03:00 | 40 |
| 14L020978T0 | 2021-03-12 09:04:00 | 35 |
| 14L020977T0 | 2021-03-12 09:05:00 | 36 |
TABLE 2
The specific processing is as follows: the raw traffic data at time t is represented as $X_t \in R^{N \times C}$, where N is the number of nodes and C is the number of features per node: C = 1 when only the segment speed is included (as here), C = 2 for segment speed and segment flow, and C = 3 for segment speed, segment flow and segment density. $X_t$ is then normalized using the Z-score method or the max-min method.
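The two normalization options named above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `zscore` and `min_max` are hypothetical, and the four sample speeds of Table 2 are treated as a single series purely to have numbers to work with.

```python
from statistics import mean, pstdev

def zscore(values):
    """Z-score normalization: subtract the mean, divide by the std deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values], (mu, sd)

def min_max(values):
    """Max-min normalization: rescale linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], (lo, hi)

speeds = [30.0, 40.0, 35.0, 36.0]  # speed column of Table 2
z, (mu, sd) = zscore(speeds)
m, (lo, hi) = min_max(speeds)
```

Both helpers return the statistics they used, since the same values are needed later to map predicted values back to the original speed scale.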
S1.3, match the third traffic characteristic data to the road segments, convert it into a cycle index and a green ratio index, and then discretize these indices. The third traffic characteristic is illustrated taking control data as an example: the control data refers to data such as the signal cycle and the green ratio of an intersection. Data samples are shown in Table 3:
| Intersection number | Time | Cycle | Phase | Green ratio (%) |
| --- | --- | --- | --- | --- |
| 8ebc4b70778540 | 2021-03-12 09:02:00 | 180 | A | 40 |
| 8ebc4b70778540 | 2021-03-12 09:03:00 | 180 | A | 38 |
| 8ebc4b70778540 | 2021-03-12 09:04:00 | 180 | A | 38 |
| 8ebc4b70778540 | 2021-03-12 09:05:00 | 180 | A | 37 |
TABLE 3
The control data is matched to the road segments as follows: using the road network structure data, the cycle of the intersection downstream of a road segment is taken as the cycle of that segment, and the green ratio of the phase that allows vehicles on the segment to enter the intersection is taken as the green ratio of the segment. The cycle index is then calculated as the ratio of the segment cycle to the historical maximum cycle of the segment, and the green ratio index as the ratio of the segment green ratio to the historical maximum green ratio. The data obtained are shown in Table 4 below:
TABLE 4
The cycle index and the green ratio index are then discretized according to a lookup table, shown in Table 5 below:
TABLE 5
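The index calculation and discretization of S1.3 can be sketched as below. The contents of Table 5 are not reproduced in this text, so the binning is an assumption: ten equal-width bins over [0, 1], labelled by the values 0.5, 1.5, ..., 9.5 that appear in the later one-hot example. The function names and the historical maxima (200 and 40) are hypothetical.

```python
def control_indices(cycle, green_ratio, max_cycle, max_green_ratio):
    """Cycle index and green ratio index as ratios to the historical maxima."""
    return cycle / max_cycle, green_ratio / max_green_ratio

def discretize(index, n_bins=10):
    """Map an index in [0, 1] to a bin label 0.5, 1.5, ..., n_bins - 0.5.

    ASSUMPTION: equal-width bins; Table 5 of the patent may use other bounds.
    """
    k = min(int(index * n_bins), n_bins - 1)  # index == 1.0 goes to the last bin
    return k + 0.5

# Second row of Table 3 with made-up historical maxima.
ci, gi = control_indices(cycle=180, green_ratio=38, max_cycle=200, max_green_ratio=40)
cycle_level = discretize(ci)
green_level = discretize(gi)
```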
S2, space-time data embedding module: the first, second and third traffic characteristics processed in the previous step are fed into neural networks to obtain the spatial embedding, the temporal embedding and the control embedding, which are finally combined into the comprehensive embedding. Specifically:
S2.1, generate the spatial embedding. Specifically: on the connection graph constructed in S1.1, vector representations of the vertices are learned with a method such as DeepWalk, node2vec or GraphSAGE. These vectors are then fed into a two-layer fully connected neural network to obtain the spatial embedding, denoted $e^{S}_{v_i} \in R^{D}$.
S2.2, generate the temporal embedding. Specifically: each time stamp of the traffic speed data in S1.2 is encoded into a vector. The day of the week and the time step within the day are one-hot encoded as vectors in $R^{7}$ and $R^{T}$ (T may be 24 when the time is taken in hours, or 1440 when it is taken in minutes), and the two are concatenated into a vector in $R^{7+T}$. For example, 2021-03-12 09:02:00 is a Friday, so the day of the week is encoded as the 7-dimensional vector [0000100]; with 24 hourly steps per day, the time of day is encoded as the 24-dimensional vector [000000000100000000000000]; the two are concatenated into the 31-dimensional vector [0000100000000000100000000000000]. This vector is then fed into a neural network with two or more layers, such as a fully connected neural network, a recurrent neural network or a deep belief network, and converted into a D-dimensional vector, which is the temporal embedding, denoted $e^{T}_{t_j} \in R^{D}$ with $t_j = t_1, \ldots, t_P, \ldots, t_{P+Q}$, where P is the number of historical input time steps and Q is the number of time steps to be predicted.
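The time encoding of S2.2 can be reproduced with the standard library. This is a minimal sketch: the function names `one_hot` and `encode_time` and the `steps_per_day` parameter are illustrative choices, and the neural network that maps the 31-dimensional code to the D-dimensional embedding is left out.

```python
from datetime import datetime

def one_hot(index, size):
    """Return a list of `size` zeros with a single one at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

def encode_time(timestamp, steps_per_day=24):
    """Concatenate a 7-dim day-of-week code with a T-dim time-of-day code."""
    t = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    minutes = t.hour * 60 + t.minute
    slot = minutes * steps_per_day // 1440  # T = 24 -> hour, T = 1440 -> minute
    # datetime.weekday() counts Monday as 0, so Friday is index 4.
    return one_hot(t.weekday(), 7) + one_hot(slot, steps_per_day)

vec = encode_time("2021-03-12 09:02:00")
```

For the time stamp used in the text, the ones land at position 4 (Friday) and position 16 (7 day-of-week slots plus hour 9), matching the 31-dimensional vector given in the example.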
S2.3, generate the control embedding. Specifically: the discretized cycle index and green ratio index obtained in S1.3 are each one-hot encoded as vectors in $R^{10}$ and concatenated into a vector in $R^{20}$. For example, if the discretized cycle index takes the values (0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5), the corresponding codes are, in order, [0000000000], [1000000000], [0100000000], [0010000000], [0001000000], [0000100000], [0000010000], [0000001000], [0000000100], [0000000010]. If the code obtained for the cycle index is [1000000000] and the code obtained for the green ratio index is [0000100000], the concatenated vector is [10000000000000100000]. These vectors (one per control data record) are then fed into a neural network with two or more layers, such as a fully connected neural network, a recurrent neural network or a deep belief network, to obtain the control embedding, denoted $e^{C}_{v_i,t_j} \in R^{D}$.
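The control encoding can be sketched in the same way. The code follows the mapping listed in the text, where the lowest level 0.5 maps to the all-zero vector and each higher level sets one position; the function names are hypothetical, and the network that turns the 20-dimensional code into the D-dimensional embedding is omitted.

```python
def encode_level(value, size=10):
    """Encode a discretized index 0.5, 1.5, ..., 9.5 as a `size`-dim 0/1 vector.

    Mapping as listed in the text: 0.5 -> all zeros, 1.5 -> a one in the
    first position, ..., 9.5 -> a one in the ninth position.
    """
    level = int(value - 0.5)  # 0.5 -> 0, 1.5 -> 1, ..., 9.5 -> 9
    vec = [0] * size
    if level > 0:
        vec[level - 1] = 1
    return vec

def encode_control(cycle_index, green_ratio_index):
    """Concatenate the cycle code and the green ratio code into 20 dimensions."""
    return encode_level(cycle_index) + encode_level(green_ratio_index)

# Cycle code [1000000000] and green ratio code [0000100000] from the example.
vec = encode_control(1.5, 5.5)
```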
S2.4, generate the comprehensive embedding. Specifically: the spatial embedding, the temporal embedding and the control embedding are combined into the comprehensive embedding. For node $v_i$ at time step $t_j$, the comprehensive embedding is defined as $e_{v_i,t_j} = e^{S}_{v_i} + e^{T}_{t_j} + e^{C}_{v_i,t_j}$ or $e_{v_i,t_j} = \alpha\, e^{S}_{v_i} + \beta\, e^{T}_{t_j} + \gamma\, e^{C}_{v_i,t_j}$, where $\alpha$, $\beta$ and $\gamma$ are trainable weights. The comprehensive embedding of N nodes over P + Q time steps is therefore denoted $E \in R^{(P+Q) \times N \times D}$. The comprehensive embedding contains temporal, spatial and control information at the same time.
S3, encoder based on the space-time attention mechanism module: the comprehensive embedding obtained in the previous step is encoded by the space-time attention mechanism module. Specifically: before entering the encoder, the normalized speed data $X \in R^{P \times N \times C}$ from S1.2 is converted into $H^{(0)} \in R^{P \times N \times D}$ by a fully connected layer. $H^{(0)}$ then passes through an encoder consisting of L layers of space-time attention mechanism modules, which outputs $H^{(L)} \in R^{P \times N \times D}$.
The space-time attention mechanism module is formed by fusing a temporal attention mechanism and a spatial attention mechanism through a gated fuser. The input of the l-th layer space-time attention mechanism module is denoted $H^{(l-1)}$, in which the hidden state of node $v_i$ at time step $t_j$ is denoted $h^{(l-1)}_{v_i,t_j}$. The outputs of the spatial attention mechanism and the temporal attention mechanism in the l-th layer are denoted $hs^{(l)}_{v_i,t_j}$ and $ht^{(l)}_{v_i,t_j}$ respectively.
The spatial attention mechanism captures spatial correlation. Spatial correlation means that the traffic state of one road (or road segment) is influenced by other roads (or road segments); this influence is highly dynamic and changes over time. To model this property, the invention designs a spatial attention mechanism that adaptively captures the relationships between the traffic characteristics of different road segments in the road network. Its core idea is to dynamically assign different weights to different nodes at different time steps, as shown in FIG. 3.
For node $v_i$ at time step $t_j$, the weighted sum over all nodes is calculated as:
$$ hs^{(l)}_{v_i,t_j} = \sum_{v \in V} \alpha_{v_i,v} \cdot h^{(l-1)}_{v,t_j} $$
where V is the set of all nodes; $\alpha_{v_i,v}$ is the attention score expressing the importance of node $v$ to node $v_i$, and the scores sum to 1, i.e. $\sum_{v \in V} \alpha_{v_i,v} = 1$; $h^{(l-1)}_{v,t_j}$ is the input of node $v$ at time step $t_j$ in the l-th layer space-time attention mechanism module; and $hs^{(l)}_{v_i,t_j}$ is the output of node $v_i$ at time step $t_j$ after the spatial attention mechanism of the l-th layer.
Further, the attention score is calculated as follows. At a given time step, the current traffic state and the road network structure jointly influence the relationships between sensors. For example, congestion on one road may affect the traffic state of its neighboring roads. Motivated by this observation, the attention score is learned by taking the traffic characteristics, the graph structure and the control information into account together. Specifically, the comprehensive embedding is concatenated with the hidden state, and the scaled dot product is used to calculate the correlation between node $v_i$ and node $v$:
$$ s_{v_i,v} = \frac{\left\langle h^{(l-1)}_{v_i,t_j} \,\Vert\, e_{v_i,t_j},\ h^{(l-1)}_{v,t_j} \,\Vert\, e_{v,t_j} \right\rangle}{\sqrt{2D}} $$
where $e_{v_i,t_j}$ is the comprehensive embedding of node $v_i$ at time step $t_j$, $\Vert$ denotes the concatenation operation, $\langle \cdot, \cdot \rangle$ denotes the inner product, and 2D is the dimension of $h^{(l-1)}_{v_i,t_j} \,\Vert\, e_{v_i,t_j}$. $s_{v_i,v}$ is then normalized with the softmax function:
$$ \alpha_{v_i,v} = \frac{\exp\left(s_{v_i,v}\right)}{\sum_{v_r \in V} \exp\left(s_{v_i,v_r}\right)} $$
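In particular, to make the learning process more stable, the spatial attention mechanism is extended to a multi-head attention mechanism. That is, K parallel attention heads are used, each with its own set of learnable functions:
$$ s^{(k)}_{v_i,v} = \frac{\left\langle f^{(k)}_{s,1}\left(h^{(l-1)}_{v_i,t_j} \,\Vert\, e_{v_i,t_j}\right),\ f^{(k)}_{s,2}\left(h^{(l-1)}_{v,t_j} \,\Vert\, e_{v,t_j}\right) \right\rangle}{\sqrt{d}} $$
$$ \alpha^{(k)}_{v_i,v} = \frac{\exp\left(s^{(k)}_{v_i,v}\right)}{\sum_{v_r \in V} \exp\left(s^{(k)}_{v_i,v_r}\right)} $$
$$ hs^{(l)}_{v_i,t_j} = \Vert_{k=1}^{K} \left\{ \sum_{v \in V} \alpha^{(k)}_{v_i,v} \cdot f^{(k)}_{s,3}\left(h^{(l-1)}_{v,t_j}\right) \right\} $$
where $f^{(k)}_{s,1}(\cdot)$, $f^{(k)}_{s,2}(\cdot)$ and $f^{(k)}_{s,3}(\cdot)$ are three different nonlinear functions of the k-th spatial attention head, each producing a vector of dimension d = D/K. The nonlinear functions have the form:
$$ f(x) = \mathrm{ReLU}(xW + b) $$
where W and b are trainable parameters and ReLU is the activation function.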
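The scaled dot-product scoring and softmax weighting of the spatial attention mechanism can be sketched in plain Python. This is a single-head illustration under simplifying assumptions: the learnable functions f are replaced by the identity, so there are no trainable parameters, and the toy hidden states and embeddings are made up.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [x / total for x in exps]

def spatial_attention(h, e):
    """Single-head spatial attention at one time step.

    h[v]: hidden state of node v (length D).
    e[v]: comprehensive embedding of node v (length D).
    Returns the updated hidden states and the attention matrix.
    """
    n, dim = len(h), len(h[0])
    cat = [h[v] + e[v] for v in range(n)]  # concatenation h || e, length 2D
    out, att = [], []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(cat[i], cat[v])) / math.sqrt(2 * dim)
                  for v in range(n)]
        alpha = softmax(scores)  # attention scores of node i, summing to 1
        att.append(alpha)
        out.append([sum(alpha[v] * h[v][k] for v in range(n)) for k in range(dim)])
    return out, att

h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
e = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
out, att = spatial_attention(h, e)
```

A multi-head version would run K such computations on d = D/K dimensional projections and concatenate the results, as in the equations above.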
Further, the temporal attention mechanism adaptively models the nonlinear relationships between different time steps of the same node. The temporal correlation varies between different time steps and is influenced by the traffic conditions, the corresponding time and the control conditions. The invention therefore combines the hidden state with the comprehensive embedding, which contains these three kinds of information, and applies a multi-head attention mechanism to calculate the temporal attention score, as shown in FIG. 4.
For node $v_i$, the correlation between time step $t_j$ and time step $t$ is defined as follows:
$$ u^{(k)}_{t_j,t} = \frac{\left\langle f^{(k)}_{t,1}\left(h^{(l-1)}_{v_i,t_j} \,\Vert\, e_{v_i,t_j}\right),\ f^{(k)}_{t,2}\left(h^{(l-1)}_{v_i,t} \,\Vert\, e_{v_i,t}\right) \right\rangle}{\sqrt{d}} $$
$$ \beta^{(k)}_{t_j,t} = \frac{\exp\left(u^{(k)}_{t_j,t}\right)}{\sum_{t_r \in N_{t_j}} \exp\left(u^{(k)}_{t_j,t_r}\right)} $$
where $u^{(k)}_{t_j,t}$ denotes the correlation between time step $t_j$ and time step $t$ in the k-th temporal attention head, and $\beta^{(k)}_{t_j,t}$ denotes the attention score expressing the importance of time step $t$ to time step $t_j$ in the k-th attention head. $f^{(k)}_{t,1}(\cdot)$ and $f^{(k)}_{t,2}(\cdot)$ are two different learnable nonlinear functions in the k-th temporal attention head, of the same form as in the spatial attention mechanism above. $N_{t_j}$ denotes the set of all time steps before $t_j$, i.e. only causal relationships with time steps earlier than the target step are considered. Once the attention score $\beta^{(k)}_{t_j,t}$ is obtained, the hidden state of vertex $v_i$ at time step $t_j$ is updated according to:
$$ ht^{(l)}_{v_i,t_j} = \Vert_{k=1}^{K} \left\{ \sum_{t \in N_{t_j}} \beta^{(k)}_{t_j,t} \cdot f^{(k)}_{t,3}\left(h^{(l-1)}_{v_i,t}\right) \right\} $$
where $f^{(k)}_{t,3}(\cdot)$ denotes a nonlinear function in the k-th temporal attention head, of the same form as in the spatial attention mechanism above. The learnable parameters in the above three equations are shared across all nodes and time steps and are computed in parallel.
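The causal restriction of the temporal attention mechanism can be illustrated with a single-head sketch for one node. Two assumptions are made here and are not taken from the text: the learnable functions f are replaced by the identity, and each target step is allowed to attend to itself as well as to earlier steps so that the first time step has a non-empty attention set.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [x / total for x in exps]

def temporal_attention(h, e):
    """Single-head causal temporal attention for one node.

    h[t]: hidden state at time step t (length D).
    e[t]: comprehensive embedding at time step t (length D).
    """
    steps, dim = len(h), len(h[0])
    cat = [h[t] + e[t] for t in range(steps)]  # concatenation h || e
    out, att = [], []
    for j in range(steps):
        allowed = range(j + 1)  # ASSUMPTION: steps up to and including j
        scores = [sum(a * b for a, b in zip(cat[j], cat[t])) / math.sqrt(2 * dim)
                  for t in allowed]
        beta = softmax(scores)
        att.append(beta + [0.0] * (steps - j - 1))  # later steps get zero weight
        out.append([sum(beta[t] * h[t][k] for t in allowed) for k in range(dim)])
    return out, att

h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
e = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
out, att = temporal_attention(h, e)
```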
Further, the gated fuser adaptively fuses the temporal and spatial representations, or the temporal, spatial and control representations. The outputs of the spatial and temporal attention mechanisms in the l-th layer are denoted $H_S^{(l)}$ and $H_T^{(l)}$ respectively, and they are fused as follows:
$$ H^{(l)} = z \odot H_S^{(l)} + (1 - z) \odot H_T^{(l)} $$
$$ z = \sigma\left(H_S^{(l)} W_{z,1} + H_T^{(l)} W_{z,2} + b_z\right) $$
where $W_{z,1} \in R^{D \times D}$, $W_{z,2} \in R^{D \times D}$ and $b_z \in R^{D}$ are learnable parameters, $\odot$ denotes element-wise multiplication, $\sigma(\cdot)$ denotes the sigmoid activation function, and z is the gate. $H^{(l)}$ is the output of the l-th layer space-time attention mechanism module. The gated fuser adaptively controls the weights of the spatial and temporal dependencies for each vertex and each time step.
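The gating can be sketched for a single vertex and time step. The parameter values below are hand-picked for illustration rather than learned, the helper names are hypothetical, and the assignment of z to the spatial branch follows the usual form of such gates, which the text does not spell out.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(x, w):
    """Row vector x (length D) times matrix w (D x D)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def gated_fusion(hs, ht, w1, w2, b):
    """Fuse spatial output hs and temporal output ht with a sigmoid gate z."""
    a, c = matvec(hs, w1), matvec(ht, w2)
    z = [sigmoid(a[j] + c[j] + b[j]) for j in range(len(b))]
    fused = [z[j] * hs[j] + (1.0 - z[j]) * ht[j] for j in range(len(b))]
    return fused, z

zeros = [[0.0, 0.0], [0.0, 0.0]]
hs, ht = [2.0, 4.0], [0.0, 0.0]
balanced, z_balanced = gated_fusion(hs, ht, zeros, zeros, [0.0, 0.0])
spatial_only, z_open = gated_fusion(hs, ht, zeros, zeros, [50.0, 50.0])
```

With all parameters at zero the gate sits at 0.5 and the result is the average of the two branches; a large positive bias opens the gate fully towards the spatial branch.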
S4, data conversion module: convert the encoded comprehensive embedding into the input of the decoder. Specifically, the conversion is performed with a transfer attention mechanism. To reduce error propagation across prediction time steps in long-time prediction, the invention adds a data conversion module between the encoder and the decoder. It models the direct relationship between each future time step and each historical time step, and converts the encoded traffic characteristics into a future representation that serves as the input of the decoder. Specifically, the encoded features $H^{(L)} \in R^{P \times N \times D}$ are converted into the future sequence representation $H^{(L+1)} \in R^{Q \times N \times D}$. For each node $v_i$, the relationship between a predicted time step $t_j$ ($t_j = t_{P+1}, \ldots, t_{P+Q}$) and a historical time step $t$ ($t = t_1, \ldots, t_P$) is measured through the comprehensive embedding:
$$ \lambda^{(k)}_{t_j,t} = \frac{\left\langle f^{(k)}_{tr,1}(e_{v_i,t_j}),\ f^{(k)}_{tr,2}(e_{v_i,t}) \right\rangle}{\sqrt{d}} $$
$$ \gamma^{(k)}_{t_j,t} = \frac{\exp\left(\lambda^{(k)}_{t_j,t}\right)}{\sum_{t_r = t_1}^{t_P} \exp\left(\lambda^{(k)}_{t_j,t_r}\right)} $$
where $\lambda^{(k)}_{t_j,t}$ denotes the correlation between the predicted time step $t_j$ and the historical time step $t$ in the k-th attention head, and $\gamma^{(k)}_{t_j,t}$ denotes the attention score expressing the importance of the historical time step $t$ to the predicted time step $t_j$ in the k-th attention head. $f^{(k)}_{tr,1}(\cdot)$ and $f^{(k)}_{tr,2}(\cdot)$ are two different learnable nonlinear functions in the k-th transfer attention head, of the same form as in the spatial attention mechanism above. The attention score $\gamma^{(k)}_{t_j,t}$ is then used to adaptively select the relevant features of the P historical time steps and convert the encoded traffic characteristics into the input of the decoder:
$$ h^{(l)}_{v_i,t_j} = \Vert_{k=1}^{K} \left\{ \sum_{t = t_1}^{t_P} \gamma^{(k)}_{t_j,t} \cdot f^{(k)}_{tr,3}\left(h^{(l-1)}_{v_i,t}\right) \right\} $$
where $f^{(k)}_{tr,3}(\cdot)$ denotes a nonlinear function in the k-th transfer attention head, of the same form as in the spatial attention mechanism above. $h^{(l-1)}_{v_i,t}$ is the input of node $v_i$ at historical time step $t$ in layer l, and $h^{(l)}_{v_i,t_j}$ is the converted output vector of node $v_i$ at predicted time step $t_j$. In the above three equations, the trainable parameters are shared across all nodes and time steps and can be computed in parallel.
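The conversion from P historical steps to Q future steps can be sketched for one node. As in the earlier sketches, this is a single-head illustration in which the learnable functions are replaced by the identity; the function name and the toy embeddings are assumptions. The point it shows is that every future step is computed directly from all historical steps, with no dependence on previously predicted steps.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [x / total for x in exps]

def transform_attention(h_hist, e_hist, e_future):
    """Map P encoded historical states of one node to Q future states.

    h_hist[t]:   encoded hidden state at historical step t (P entries).
    e_hist[t]:   comprehensive embedding at historical step t.
    e_future[j]: comprehensive embedding at future step j (Q entries).
    """
    dim = len(e_hist[0])
    out, att = [], []
    for ej in e_future:
        scores = [sum(a * b for a, b in zip(ej, et)) / math.sqrt(dim)
                  for et in e_hist]
        gamma = softmax(scores)  # weights over the P historical steps
        att.append(gamma)
        out.append([sum(g * h[k] for g, h in zip(gamma, h_hist))
                    for k in range(len(h_hist[0]))])
    return out, att

h_hist = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]    # P = 3
e_hist = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
e_future = [[3.0, 0.0], [0.0, 3.0]]              # Q = 2
out, att = transform_attention(h_hist, e_hist, e_future)
```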
S5, decoder based on the space-time attention mechanism module: the output of the data conversion module from the previous step is decoded by the space-time attention mechanism module to obtain the finally predicted traffic characteristics. The output of the data conversion module is $H^{(L+1)} \in R^{Q \times N \times D}$. The decoder comprises L layers of space-time attention mechanism modules and outputs $H^{(2L+1)} \in R^{Q \times N \times D}$. Finally, a fully connected layer outputs the predicted values for the next Q time steps, $\hat{Y} \in R^{Q \times N \times C}$.
In conclusion, the invention introduces control information into traffic characteristic prediction and processes the traffic speed, road network structure and traffic control data separately; the processed data are fed into neural networks to obtain the spatial embedding, the temporal embedding and the control embedding, which are combined into the comprehensive embedding; the predicted traffic speed is finally obtained through the encoder based on the space-time attention mechanism module, the data conversion module and the decoder based on the space-time attention mechanism module. The beneficial effects are mainly as follows:
(I) Existing studies generally model the relationship between the predicted traffic characteristics and the historical traffic characteristics or the traffic characteristics of nearby areas from the perspective of temporal or spatial correlation, and neglect the influence of control information on the predicted values. The invention introduces traffic control information and, after specific processing, combines it with traffic speed and road network structure information, which effectively captures the influence of control information on traffic characteristics and improves the effectiveness of speed prediction on a controlled road network. In a road network, a signalized intersection controls the passage of vehicles. Without signal control, vehicles pass through the intersection freely; with signal control, vehicles may be prevented from entering the next road segment directly, and changes in the green time affect the correlation of speed, flow and other characteristics between upstream and downstream segments. The control information is introduced as an embedding used in calculating the temporal and spatial attention scores, and describes this influence as a supplement to the spatial and temporal information.
(II) A space-time attention mechanism is provided that better models dynamic spatial correlation and nonlinear temporal correlation, and a gated fuser is designed to adaptively fuse the information extracted by the spatial and temporal attention mechanisms.
(III) A data conversion module is designed that transfers the historical traffic characteristics to a future representation and models the direct relationship between historical and future time steps, which reduces error accumulation and alleviates the error propagation effect, thereby improving long-term prediction performance and making the method suitable for long-time traffic prediction.
While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.