Image-based traffic jam recognition method, device and equipment

Document No. 8434 · Published 2021-09-17

1. An image-based traffic congestion identification method is characterized by comprising the following steps:

inputting video data acquired during vehicle driving into a pre-trained, end-to-end road condition identification model, wherein the road condition identification model comprises an image feature extraction network, and the image feature extraction network is used for extracting background features of the image information in the video data;

and obtaining, through processing by the road condition identification model, a road condition classification result for identifying whether traffic congestion exists.

2. The image-based traffic congestion identification method according to claim 1, wherein the processing via the road condition identification model comprises:

respectively extracting time information and the background feature from the video data, wherein the time information comprises time-interval features of adjacent video frames;

performing time-sequence feature coding after the background feature and the time-interval feature are superimposed;

and carrying out road condition classification according to the time-sequence feature coding result.

3. The image-based traffic congestion identification method according to claim 1, wherein the image feature extraction network is trained by using a self-supervision mechanism, and specifically comprises:

randomly constructing a forward sequence video sample and a reverse sequence video sample by using a road condition video training sample containing continuous video frames, and generating label values corresponding to the forward sequence video sample and the reverse sequence video sample;

inputting the forward-sequence video sample and/or the reverse-sequence video sample into the image feature extraction network to obtain image information;

and classifying the input road condition video training samples into forward-sequence videos or reverse-sequence videos according to the image information and the label values, and transmitting the classification loss back to the image feature extraction network for iteration.

4. The image-based traffic congestion identification method according to any one of claims 1 to 3, wherein the training mode of the road condition identification model comprises:

dynamically resampling the original training data to obtain resampled data with a data distribution rule opposite to that of the original training data;

respectively extracting image information of original training data and resampled data, and performing feature fusion according to a preset weight proportion;

and updating the parameters of the road condition recognition model based on the fused image information to finish training.

5. An image-based traffic congestion identification device, comprising:

a road condition video data input module, which is used for inputting video data acquired during vehicle driving into a pre-trained, end-to-end road condition identification model, wherein the road condition identification model comprises an image feature extraction network, and the image feature extraction network is used for extracting background features of the image information in the video data;

and the road condition classification result acquisition module is used for acquiring a road condition classification result for identifying whether traffic is congested or not through the road condition identification model processing.

6. The image-based traffic congestion identification apparatus according to claim 5, wherein the road condition identification model comprises:

the multi-dimensional feature extraction unit is used for respectively extracting time information and the background features in the video data, wherein the time information comprises time interval features of adjacent video frames;

the reinforced coding unit is used for carrying out time sequence characteristic coding after the background characteristic and the time interval characteristic are superposed;

and the video classification unit is used for classifying the road conditions according to the time sequence characteristic coding result.

7. The image-based traffic congestion recognition apparatus according to claim 5, further comprising a self-supervised training module for training the image feature extraction network;

the self-supervision training module specifically comprises:

the forward sequence and reverse sequence video sample construction unit is used for constructing a forward sequence video sample and a reverse sequence video sample at random by utilizing a road condition video training sample containing continuous video frames and generating label values corresponding to the forward sequence video sample and the reverse sequence video sample;

the sample image information extraction unit is used for inputting the forward sequence video sample and/or the reverse sequence video sample into the image feature extraction network to obtain image information;

and the video sequence classification unit is used for classifying the input road condition video training samples into forward-sequence videos or reverse-sequence videos according to the image information and the label values, and transmitting the classification loss back to the image feature extraction network for iteration.

8. The image-based traffic congestion recognition device according to any one of claims 5 to 7, further comprising a joint training module for training the road condition recognition model;

the combined training module specifically comprises:

the dynamic resampling unit is used for dynamically resampling the original training data to obtain resampled data with a data distribution rule opposite to that of the original training data;

the image information acquisition unit is used for respectively extracting image information of the original training data and the resampled data and carrying out feature fusion according to a preset weight proportion;

and the model parameter learning unit is used for updating the parameters of the road condition recognition model based on the fused image information to finish training.

9. An electronic device, comprising:

one or more processors, a memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions which, when executed by the device, cause the device to perform the image-based traffic congestion identification method of any one of claims 1-4.

10. A computer data storage medium having a computer program stored therein, which when run on a computer causes the computer to execute the image-based traffic congestion identification method of any one of claims 1 to 4.

Background

Traffic congestion refers to the phenomenon in which, during a certain period, increased traffic demand makes the total traffic flow through a road section or intersection exceed its traffic capacity, so that part of the traffic flow stagnates on the road and vehicles cannot pass smoothly.

The problem of traffic congestion on urban roads is becoming more severe and has become one of the major factors restricting healthy and sustainable urban development. For a driver, predicting road conditions and switching to another route in time is likewise crucial to escaping congestion. Accordingly, machine-based road congestion identification schemes, such as (but not limited to) image-vision-based traffic congestion identification systems, have been developed by combining network and computer technologies.

At present, traffic congestion identification schemes based on image analysis mainly judge whether a road is congested, slow, or clear from real-time road condition footage captured by vehicle-mounted visual recording equipment such as dashboard cameras. Specifically, image-level identification and classification of road conditions can be performed with technologies such as 3D-CNN or CNN + LSTM combined with a given video classification strategy. Existing classification strategies, following conventional technical logic and implementation habits, mainly learn the semantic features of the image foreground; that is, they judge whether a road is congested chiefly by processing the vehicle information in the image. Practical analysis shows, however, that image foreground information is prone to bias road condition judgment: differences in features such as the number of vehicles or the distance between front and rear vehicles may not be obvious across different road conditions, so the accuracy of congestion judgment based on foreground information under this conventional approach may be low.

Disclosure of Invention

In view of the foregoing, the present invention aims to provide an image-based traffic congestion identification method, apparatus and device, and accordingly a computer data storage medium and a computer program product, so as to solve the problem of low accuracy when a machine identifies road conditions from images.

The technical scheme adopted by the invention is as follows:

in a first aspect, the present invention provides an image-based traffic congestion identification method, including:

inputting video data acquired during vehicle driving into a pre-trained, end-to-end road condition identification model, wherein the road condition identification model comprises an image feature extraction network, and the image feature extraction network is used for extracting background features of the image information in the video data;

and obtaining, through processing by the road condition identification model, a road condition classification result for identifying whether traffic congestion exists.

In at least one possible implementation manner, the processing via the road condition recognition model includes:

respectively extracting time information and the background feature from the video data, wherein the time information comprises time-interval features of adjacent video frames;

performing time-sequence feature coding after the background feature and the time-interval feature are superimposed;

and carrying out road condition classification according to the time-sequence feature coding result.

In at least one possible implementation manner, the image feature extraction network is trained by using a self-supervision mechanism, and specifically includes:

randomly constructing a forward sequence video sample and a reverse sequence video sample by using a road condition video training sample containing continuous video frames, and generating label values corresponding to the forward sequence video sample and the reverse sequence video sample;

inputting the forward-sequence video sample and/or the reverse-sequence video sample into the image feature extraction network to obtain image information;

and classifying the input road condition video training samples into forward-sequence videos or reverse-sequence videos according to the image information and the label values, and transmitting the classification loss back to the image feature extraction network for iteration.

In at least one possible implementation manner, the training manner of the road condition recognition model includes:

dynamically resampling the original training data to obtain resampled data with a data distribution rule opposite to that of the original training data;

respectively extracting image information of original training data and resampled data, and performing feature fusion according to a preset weight proportion;

and updating the parameters of the road condition recognition model based on the fused image information to finish training.

In a second aspect, the present invention provides an image-based traffic congestion identification apparatus, comprising:

a road condition video data input module, which is used for inputting video data acquired during vehicle driving into a pre-trained, end-to-end road condition identification model, wherein the road condition identification model comprises an image feature extraction network, and the image feature extraction network is used for extracting background features of the image information in the video data;

and the road condition classification result acquisition module is used for acquiring a road condition classification result for identifying whether traffic is congested or not through the road condition identification model processing.

In at least one possible implementation manner, the traffic identification model includes:

the multi-dimensional feature extraction unit is used for respectively extracting time information and the background features in the video data, wherein the time information comprises time interval features of adjacent video frames;

the reinforced coding unit is used for carrying out time sequence characteristic coding after the background characteristic and the time interval characteristic are superposed;

and the video classification unit is used for classifying the road conditions according to the time sequence characteristic coding result.

In at least one possible implementation manner, the apparatus further includes a self-supervised training module for training the image feature extraction network;

the self-supervision training module specifically comprises:

the forward sequence and reverse sequence video sample construction unit is used for constructing a forward sequence video sample and a reverse sequence video sample at random by utilizing a road condition video training sample containing continuous video frames and generating label values corresponding to the forward sequence video sample and the reverse sequence video sample;

the sample image information extraction unit is used for inputting the forward sequence video sample and/or the reverse sequence video sample into the image feature extraction network to obtain image information;

and the video sequence classification unit is used for classifying the input road condition video training samples into forward-sequence videos or reverse-sequence videos according to the image information and the label values, and transmitting the classification loss back to the image feature extraction network for iteration.

In at least one possible implementation manner, the apparatus further includes a joint training module for training the road condition recognition model;

the combined training module specifically comprises:

the dynamic resampling unit is used for dynamically resampling the original training data to obtain resampled data with a data distribution rule opposite to that of the original training data;

the image information acquisition unit is used for respectively extracting image information of the original training data and the resampled data and carrying out feature fusion according to a preset weight proportion;

and the model parameter learning unit is used for updating the parameters of the road condition recognition model based on the fused image information to complete training.

In a third aspect, the present invention provides an electronic device, comprising:

one or more processors, a memory (which may employ a non-volatile storage medium), and one or more computer programs stored in the memory, the one or more computer programs comprising instructions which, when executed by the device, cause the device to perform the method of the first aspect or any possible implementation of the first aspect.

In a fourth aspect, the present invention provides a computer data storage medium having a computer program stored thereon, which, when run on a computer, causes the computer to perform at least the method as described in the first aspect or any of its possible implementations.

In a fifth aspect, the present invention also provides a computer program product for performing at least the method of the first aspect or any of its possible implementations, when the computer program product is executed by a computer.

In at least one possible implementation manner of the fifth aspect, the program involved in the product may be stored wholly or partly in a memory packaged with the processor, or wholly or partly in a storage medium not packaged with the processor.

The core idea of the invention is to build an end-to-end road condition classification model that analyzes the image background features of road traffic video data collected while the vehicle is running. Because congestion is identified from the background information in the video rather than from image foreground information, the real road condition of the road the vehicle is currently on can be obtained accurately, route planning can be updated in real time, and a technical contribution to relieving traffic congestion is made.

Further, to let the end-to-end road condition classification model efficiently and reliably attend to the image background information of the input video data at low human cost, some preferred embodiments of the invention exploit the time-sequence characteristics of video data to conveniently construct forward- and reverse-sequence samples, and train the model to perform specific feature coding of the images with a self-supervision mechanism; that is, the model learns the video background information without additional manual labeling.

Further, other preferred embodiments of the invention consider not only the effect of image features on road condition identification but also fully combine the time-sequence characteristics of video shot while driving, supplementing the association among multi-dimensional information factors and thereby improving the classification performance of the model.

Further, road traffic video scenes exhibit a long-tail data distribution, and unbalanced sample distribution can cause severe overfitting during training and degrade the model's road condition classification performance. To avoid this, some preferred embodiments of the invention construct, through dynamic resampling, resampled data whose distribution rule is opposite to that of the original road traffic video, and fuse the image information of the original data and the resampled data to obtain a training set for jointly training the end-to-end road condition classification model. Classes with many samples and classes with few samples are thus considered simultaneously, which improves the robustness of the model and substantially increases road condition classification accuracy.
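As a minimal sketch of one way to realize this inverse-distribution resampling: the patent does not specify the exact scheme, so inverse class-frequency weighting is assumed here, which makes the per-class probability mass of the resampled stream uniform (rare classes are drawn far more often than in the original data).

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Give each sample a draw probability proportional to the inverse of
    its class frequency, so the resampled distribution is 'opposite' to the
    original one: every class receives equal total probability mass.
    This is one common realization; the patent leaves the scheme open."""
    counts = Counter(labels)
    raw = [1.0 / counts[y] for y in labels]   # rare classes get large weights
    total = sum(raw)
    return [w / total for w in raw]           # normalize to a distribution

# usage: 3 "smooth" clips vs 1 "jam" clip -> each class gets 0.5 total mass
weights = inverse_frequency_weights(["smooth", "smooth", "smooth", "jam"])
```

These weights could then drive a weighted sampler that produces the resampled stream fused with the original data during joint training.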

Drawings

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described with reference to the accompanying drawings, in which:

fig. 1 is a flowchart of an image-based traffic congestion identification method according to an embodiment of the present invention;

fig. 2 is a flowchart of an image feature extraction network training method according to an embodiment of the present invention;

fig. 3 is a schematic diagram of an image-based traffic congestion identification apparatus according to an embodiment of the present invention;

fig. 4 is a schematic diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.

Before the specific embodiments of the present invention are presented, its design rationale is explained. Practical analysis shows that, owing to the complexity of real traffic scenes and the randomness of video data acquired while driving, road congestion has no necessary relationship with the number of vehicles or the inter-vehicle distances that appear in the video; the foreground semantic features of road condition images are therefore not the key factor.

For image-based road condition analysis, it is necessary to consider how to infer the vehicle's moving speed from the images. The present invention therefore contemplates that traffic congestion can be identified reliably and accurately by observing the change of the background environment relative to the vehicle as presented in the video. This idea, however, challenges existing driving-video classification schemes, because the currently adopted approaches generally treat foreground targets as the main elements of image semantic features, and such image features cannot reliably support optimal road condition classification.

In view of this, the present invention provides an embodiment of at least one of the following image-based traffic congestion identification methods, as shown in fig. 1, which may specifically include:

step S1, inputting video data acquired in the vehicle driving process into a pre-trained end-to-end-based road condition recognition model, wherein the road condition recognition model comprises an image feature extraction network for extracting background features of image information in the video data;

step S2, obtaining a road condition classification result for identifying whether traffic is congested or not through the road condition identification model processing.

It can be understood that the above method embodiment describes the processing logic of the road condition identification model at inference time. The key point is the end-to-end modeling idea proposed by the invention: one of the main differences from the prior art is that road congestion is identified by fully attending to and using the background information of video data acquired while driving, thereby avoiding the misjudgments and identification difficulties that can arise from foreground information (such as images of the vehicles themselves).

Specifically, the invention proposes that the criterion for evaluating road congestion may be the change of the background relative to the vehicle across adjacent video frames (caused, for example, by time, displacement, or speed). This requires the image information used by the end-to-end road condition identification model to focus on image background features rather than foreground information; for example, the model may learn the relative positions and relative displacements of background reference objects between consecutive frames.

In practice, various means can realize this concept and make the model favor the background information in video frames when extracting image features, for example by labeling background features in the training set during the training stage. Alternatively, some preferred embodiments of the invention adopt a more convenient, efficient and low-cost training method, namely the image feature extraction network training method shown in fig. 2, which mainly includes the following steps:

step S10, constructing a forward sequence video sample and a reverse sequence video sample at random by using road condition video training samples containing continuous video frames, and generating label values corresponding to the forward sequence video sample and the reverse sequence video sample;

step S20, inputting the positive sequence video sample and/or the negative sequence video sample into the image feature extraction network to obtain image information;

and step S30, classifying the input road condition video training sample into a forward sequence video or a reverse sequence video according to the image information and the label value, and transmitting the classification loss back to the image feature extraction network for iteration.

The idea of this preferred embodiment is as follows. Video data has an inherent temporal order, so forward video data sets and corresponding reverse video data sets can be constructed simply and quickly from the traffic video data itself, and the background change in the video images discussed above is fully reflected as the video frames progress one by one in forward or reverse order. The invention therefore proposes a self-supervision mechanism to strengthen the image feature extraction network's learning of the background information in the video: the supervision is embodied in the construction of the forward- and reverse-sequence videos, whose order labels are obtained naturally without any additional annotation. In other words, given a segment of normally shot driving-record video, the reverse-order video can be obtained in a manner similar to reverse playback, and the corresponding labels are generated automatically.
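The forward/reverse sample construction with automatically generated labels can be sketched as follows; the label convention (1 = forward, 0 = reversed) and the random reversal probability are assumptions for illustration, not specified by the text.

```python
import random

def make_order_sample(frames, p_reverse=0.5, rng=None):
    """Build one self-supervised training sample as described above:
    either the original (forward) frame sequence or its reversal, with
    an automatically generated label. No manual annotation is needed --
    the label falls out of the construction itself."""
    rng = rng or random.Random()
    if rng.random() < p_reverse:
        return list(reversed(frames)), 0   # reverse-sequence sample
    return list(frames), 1                 # forward-sequence sample

# usage: frames can be any per-frame representation (paths, arrays, ...)
clip = ["f0", "f1", "f2", "f3"]
sample, label = make_order_sample(clip, p_reverse=1.0)  # forced reversal
```

A binary classifier on top of the image feature extraction network is then trained against these labels, and its loss is backpropagated into the network.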

The idea of the embodiment of fig. 2 is that when the image information produced by the image feature extraction network is used to classify whether an input video sample is a forward-sequence or reverse-sequence video, the result of each classification is compared with the label value, and the deviation is fed back to the image feature extraction network for parameter iteration. Producing accurate forward/reverse classifications necessarily forces the network to pay more attention to the background information in the video images, so the network is trained to mainly extract the background features of the image information in the input video data.

Building on the above embodiments, the invention also considers that the input video, and especially the video frames extracted from it, should not be constrained too much, so the time interval between frames may not be fixed. The invention therefore proposes, in some embodiments, to encode the time information of adjacent frames: on the one hand, no heavy constraints need be imposed on the input video data; on the other hand, the road condition identification model is prevented from classifying according to single-image features alone. In practice, the model extracts the time information and the background features from the video data respectively, where the time information includes the time-interval features of adjacent video frames; more preferably, the background features and the time-interval features are superimposed and then subjected to time-sequence feature coding, after which a more accurate road condition classification result is obtained from the coding result.
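The superposition step just described can be sketched as follows; the interval quantization and the lookup table are assumptions standing in for a learned Embedding layer, which the text names but does not detail.

```python
import numpy as np

def fuse_time_and_background(bg_feats, frame_times, emb_table):
    """Superimpose a time-interval embedding on the background features.
    bg_feats: B x C x T background features; frame_times: B x T timestamps
    in seconds; emb_table: hypothetical stand-in for a learned Embedding
    layer, one C-dim vector per quantized interval bucket."""
    # interval of each frame relative to its predecessor (first interval = 0)
    intervals = np.diff(frame_times, axis=1, prepend=frame_times[:, :1])
    idx = np.clip(intervals.astype(int), 0, emb_table.shape[0] - 1)  # crude quantization
    time_emb = emb_table[idx]                              # B x T x C lookup
    return bg_feats + np.transpose(time_emb, (0, 2, 1))    # superposition: B x C x T
```

Because the embedding is simply added to the B x C x T background features, no fixed frame rate is required of the input video.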

For ease of understanding, an example is given here. In practice, consecutive video frames are first fed to the image feature extraction network, which may use, but is not limited to, resnet34. The background feature dimension output by the network is B × C × T, where B is the batch size fed to the network, C is the number of channels, and T is the time-sequence length corresponding to the number of frames extracted from one video. For example, if 4 frames are extracted from each video, the data dimension actually input is 4B × 3 × H × W; the network outputs a 4B × C × 1 × 1 feature map, which is then transformed into a B × C × T feature map with time-sequence length T = 4.
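The dimension bookkeeping above can be sketched as follows; `fake_backbone` is a shape-only stand-in for resnet34 (4B × 3 × H × W in, 4B × C × 1 × 1 out), so only the reshaping logic is meaningful.

```python
import numpy as np

B, C, T, H, W = 2, 8, 4, 16, 16          # T = 4 frames extracted per clip

# the T frames of each clip are stacked along the batch axis: 4B x 3 x H x W
frames = np.random.rand(B * T, 3, H, W)

def fake_backbone(x, out_channels=C):
    """Stand-in for resnet34 that only reproduces the shapes: it pools
    each frame to an out_channels-dim vector, i.e. 4B x C x 1 x 1."""
    pooled = x.mean(axis=(2, 3))                      # 4B x 3 global pool
    proj = np.ones((3, out_channels)) / 3.0           # hypothetical projection
    return (pooled @ proj)[:, :, None, None]          # 4B x C x 1 x 1

feat = fake_backbone(frames)                          # 4B x C x 1 x 1
feat = feat.reshape(B, T, C).transpose(0, 2, 1)       # regroup to B x C x T
```

The final transpose puts the frame index last, giving the B × C × T layout the text describes.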

While resnet34 extracts the video information, the time information of the relevant video frames may also be encoded, for example through an Embedding layer, yielding a time-interval feature of dimension B × C × T. The time-interval features and the background features are then fused so that the image features contain time-dimension information. Further, since classifying with only image and time information risks missing part of the information, the fused features can be encoded again with richer context so that the subsequent classification result is more accurate. The invention therefore applies time-sequence feature coding to the fused features, which can be implemented with a multi-head Transformer structure. For example, with 8 heads, the B × C × 4 feature matrix is reshaped into B × (C/8) × 4 × 8; three structurally identical matrices, named K, Q and V, are obtained through three 1 × 1 convolutions; multiplying Q and K yields a 4 × 4 attention matrix per head, over which a softmax is applied to obtain the feature weights in that dimension; finally, multiplying by V gives the B × (C/8) × 4 × 8 output.
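A minimal NumPy sketch of this multi-head coding step follows; the randomly initialized projection matrices stand in for the learned 1 × 1 convolutions, and the scaling by sqrt(d) is a standard assumption the text does not state.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_timestep_encoding(x, heads=8, seed=0):
    """Multi-head time-sequence coding over x: B x C x T. Each head works
    on C/heads channels; the attention weights form a T x T softmax map
    per head (the 4 x 4 map in the text when T = 4)."""
    rng = np.random.default_rng(seed)
    B, C, T = x.shape
    d = C // heads
    xh = x.reshape(B, heads, d, T)                    # split channels per head
    Wk, Wq, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    K = np.einsum('ij,bhjt->bhit', Wk, xh)            # 1x1-conv-style projections
    Q = np.einsum('ij,bhjt->bhit', Wq, xh)
    V = np.einsum('ij,bhjt->bhit', Wv, xh)
    attn = softmax(np.einsum('bhdt,bhds->bhts', Q, K) / np.sqrt(d))  # B x h x T x T
    out = np.einsum('bhts,bhds->bhdt', attn, V)       # weighted sum over time
    return out.reshape(B, C, T)                       # concatenate heads back
```

Because every time step attends to every other, the encoding runs in parallel over the sequence, which is the source of the training-speed advantage over CNN + LSTM noted below.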

Finally, after the time-sequence feature coding, the output features of dimension B × (C/8) × 4 × 8 are converted back to B × C × 4, the probability of each road condition category is output through a subsequent fully connected layer, and the category with the maximum probability is taken as the road condition category of the input video data (such as heavy congestion, light congestion, slow driving, or smooth driving).
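The classification head described above can be sketched as follows; the category names are taken from the example in the text, while the parameters `W` and `b` are illustrative stand-ins for the learned fully connected layer.

```python
import numpy as np

# example category set from the text; the real label space may differ
CLASSES = ["heavy congestion", "light congestion", "slow driving", "smooth driving"]

def classify_road_condition(encoded, W, b):
    """Flatten the B x C x T encoded features, apply a fully connected
    layer (W, b stand in for learned parameters), and take the most
    probable road condition category per clip."""
    B = encoded.shape[0]
    logits = encoded.reshape(B, -1) @ W + b           # B x num_classes
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)          # per-category probabilities
    return [CLASSES[i] for i in probs.argmax(axis=1)], probs
```

The argmax over the softmax output implements "take the category with the maximum probability value" from the text.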

It should be noted that the principles and workings of architectures such as resnet34, Embedding and Transformer may follow existing mature technology and are not repeated here. It should also be noted that, compared with the conventional CNN + LSTM processing architecture, the above topology supports parallelized training, so the preferred example greatly reduces training and inference time and improves model-building efficiency.

Finally, it should be added that, in the scenario of the present invention, the video data samples collected in the real world for model training suffer from a problem: in natural scenes, smooth road conditions occur far more often than slow or congested ones, causing severe category imbalance in the collected data. Most real data samples are videos of smooth roads, relatively few show congestion, and even fewer show slow traffic, so the overall sample distribution is unbalanced and exhibits a long-tail distribution. In long-tail data, a very few categories contain most of the samples, yet the minority categories are equally important. Under such a distribution, most of the data in a batch belongs to the majority categories, which dominate the loss function, so the training parameters are biased toward those categories. This causes severe overfitting with respect to the minority samples: in the testing stage the minority categories are hard to recall, the majority categories are prone to false positives, and the final road condition classification performance degrades.

In view of this application scenario, how to train the road condition recognition model without overfitting is an important further consideration of the present invention.

Specifically, there are generally two approaches to the data imbalance problem. The first is data/feature enhancement: subsampling the over-represented data to artificially balance the samples, or adding dropout-like operations to the network to avoid severe overfitting. The second is designing a new loss function, such as focal loss, to constrain the gap between the loss values of positive and negative samples. However, both schemes can only alleviate the unbalanced long-tail distribution and cannot solve the problem fundamentally; that is, the existing schemes for unbalanced sample distribution do not essentially solve the recall of minority samples and do not strengthen the network's ability to learn from minority sample data.
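As one concrete instance of the loss-based mitigation mentioned above, a minimal multi-class focal loss can be sketched as follows. The function name and the default values of γ and α are common choices from the literature, not values specified by the text:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0, alpha: float = 0.25) -> torch.Tensor:
    """Focal loss: cross-entropy down-weighted for easy, well-classified samples."""
    # Per-sample cross-entropy, no reduction yet
    ce = F.cross_entropy(logits, targets, reduction="none")
    # Probability the model assigns to the true class (p_t = exp(-CE))
    p_t = torch.exp(-ce)
    # (1 - p_t)^gamma shrinks the loss of easy samples toward zero
    return (alpha * (1.0 - p_t) ** gamma * ce).mean()
```

Easy majority-class samples (high p_t) contribute almost nothing, so gradients are dominated less by the over-represented smooth-road category.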

Based on this, the present invention in some preferred embodiments proposes a joint training concept to solve the long-tail data distribution problem. Generally speaking, the concept can be implemented as follows: first, the original training data is dynamically resampled to obtain resampled data whose distribution rule is opposite to that of the original training data; next, image information is extracted from the original training data and the resampled data respectively and fused according to a preset weight proportion; finally, the parameters of the road condition recognition model are updated based on the fused image information to complete the training.

Specifically, in this embodiment, during the model training phase the original video data is dynamically resampled to obtain resampled data whose distribution rule is completely opposite to the (long-tail) distribution of the original data. Then, the two kinds of data with opposite distribution rules are fed into the image feature extraction network to obtain the semantic features, i.e., the image information, of the consecutive frame images. The image semantic features obtained from the two data sources are then combined by the following weighted formula to obtain the fused image information Z:

Z = α · F(f_c; θ) + (1 − α) · F(f_r; θ)

where f_c is the original data, f_r is the dynamically resampled data, θ denotes the parameters of the image feature extraction network, and F(·; θ) denotes the features extracted by that network. The weight coefficient α may be initialized to 1 and decreases as the number of iterations increases, while the coefficient (1 − α) correspondingly increases; denoting the number of iterations by i, α equals 1 when i is 0 and gradually approaches 0 as the number of iterations grows.

The joint training strategy provided by the present invention dynamically adjusts the weights as the number of iterations grows, so that the road condition recognition model learns different data distribution characteristics and the generalization ability of the model is improved. Specifically, in the initial stage of training, the original data carries a sufficiently large weight, the loss function is contributed mainly by the original data, and the model develops toward fitting the categories with many samples. In the later stage of training, the dynamically resampled data carries the larger weight, the loss function is then contributed mainly by the resampled data, and the model develops toward fitting the categories with few samples. Because this joint training takes place within the same model framework, the model can be robust to both the categories with many samples and those with few; the overfitting of some categories caused by uneven sample distribution is well resolved, and the accuracy of road condition classification can be further improved.
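The joint training weighting can be sketched as below. The quadratic decay schedule for α is an assumption chosen only to satisfy the stated constraints (α = 1 at iteration 0, α approaching 0 later); the text does not give the exact formula, and the function names are illustrative:

```python
import torch

def cumulative_alpha(i: int, total_iters: int) -> float:
    """Weight for the original-data branch: 1 at i == 0, decaying to 0.

    The quadratic form is an assumed schedule; any monotone decay from
    1 to 0 over the training run satisfies the constraints in the text.
    """
    return 1.0 - (i / total_iters) ** 2

def fuse_features(feat_orig: torch.Tensor, feat_resampled: torch.Tensor,
                  alpha: float) -> torch.Tensor:
    """Z = alpha * F(f_c; theta) + (1 - alpha) * F(f_r; theta)."""
    return alpha * feat_orig + (1.0 - alpha) * feat_resampled
```

Early in training the original (long-tail) batch dominates Z; late in training the reversed-distribution batch dominates, steering the same network toward the minority categories.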

In summary, the idea of the present invention is to construct an end-to-end road condition classification model, perform image background feature analysis on the road traffic video data collected during vehicle driving, and identify whether traffic congestion occurs based on the background information in the video rather than the image foreground information. The real road condition of the road where the vehicle is currently located can thus be obtained accurately, and the route planning can be updated in real time, making a productive technical contribution to alleviating traffic congestion.

Further, in order to make the end-to-end road condition classification model efficiently and reliably attend to the image background information in the input video data at a low labor cost, the present invention proposes in some preferred embodiments to exploit the time-series characteristics of the video data: forward-order and reverse-order samples are conveniently constructed, and a self-supervision mechanism trains the road condition classification model to perform specific feature coding on the images; that is, the model learns the video background information without additional manual annotation.
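The forward/reverse sample construction described here can be sketched in a few lines. The function name and the label convention (1 = forward order, 0 = reverse order) are illustrative assumptions; the point is that the labels come for free from the temporal order, with no manual annotation:

```python
import random
import torch

def make_order_sample(clip: torch.Tensor):
    """Randomly keep or reverse the frame order of a clip.

    clip: (T, C, H, W) tensor of consecutive video frames.
    Returns (sample, label) with label 1 for forward order and
    0 for reversed order -- a self-supervised pretext task.
    """
    if random.random() < 0.5:
        return clip, 1
    # torch.flip along dim 0 reverses the temporal order of the frames
    return torch.flip(clip, dims=[0]), 0
```

Training the image feature extraction network to distinguish the two orders forces it to encode background motion cues, as the self-supervision mechanism in the text intends.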

Further, in other preferred embodiments of the present invention, not only is the effect of image features on road condition identification considered, but the time-series characteristics of the video shot during vehicle driving are also fully combined, supplementing the association among multi-dimensional information factors, so that the classification performance of the model can be improved.

Further, in view of the long-tail distribution of data in road traffic video scenes, and in order to avoid the severe overfitting during training caused by unbalanced sample distribution and the resulting drop in road condition classification performance, the present invention proposes in some preferred embodiments to construct, through dynamic resampling, resampled data whose distribution rule is opposite to that of the original road traffic videos, and to fuse the image information of the original data and the resampled data. The resulting training set is used to jointly train the end-to-end road condition classification model, so that categories with many samples and categories with few samples are considered simultaneously, the robustness of the model is better improved, and the road condition classification accuracy is greatly increased.

Corresponding to the above embodiments and preferred solutions, the present invention further provides an embodiment of an image-based traffic congestion identification apparatus, as shown in fig. 3, which may specifically include the following components:

the road condition video data input module 1 is used for inputting video data acquired during vehicle driving into a pre-trained end-to-end road condition identification model, wherein the road condition identification model includes an image feature extraction network, and the image feature extraction network is used for extracting background features of the image information in the video data;

and the road condition classification result acquisition module 2 is used for acquiring a road condition classification result for identifying whether traffic is congested or not through the road condition identification model processing.

In at least one possible implementation manner, the traffic identification model includes:

the multi-dimensional feature extraction unit is used for respectively extracting time information and the background features in the video data, wherein the time information comprises time interval features of adjacent video frames;

the reinforced coding unit is used for carrying out time sequence characteristic coding after the background characteristic and the time interval characteristic are superposed;

and the video classification unit is used for classifying the road conditions according to the time sequence characteristic coding result.

In at least one possible implementation manner, the apparatus further includes an auto-supervised training module for training the image feature extraction network;

the self-supervision training module specifically comprises:

the forward sequence and reverse sequence video sample construction unit is used for constructing a forward sequence video sample and a reverse sequence video sample at random by utilizing a road condition video training sample containing continuous video frames and generating label values corresponding to the forward sequence video sample and the reverse sequence video sample;

the sample image information extraction unit is used for inputting the forward sequence video sample and/or the reverse sequence video sample into the image feature extraction network to obtain image information;

and the video sequence classification unit is used for classifying the input road condition video training samples into forward-sequence videos or reverse-sequence videos according to the image information and the label values, and transmitting the classification loss back to the image feature extraction network for iteration.

In at least one possible implementation manner, the apparatus further includes a joint training module for training the road condition recognition model;

the combined training module specifically comprises:

the dynamic resampling unit is used for dynamically resampling the original training data to obtain resampled data with a data distribution rule opposite to that of the original training data;

the image information acquisition unit is used for respectively extracting image information of the original training data and the resampled data and performing feature fusion according to a preset weight proportion; it may be noted that, in actual operation, the image information acquisition unit mentioned herein is not limited to the aforementioned image feature extraction network.

And the model parameter learning unit is used for updating and training the parameters of the road condition recognition model based on the fused image information.

It should be understood that the division of each component in the image-based traffic congestion identification apparatus shown in fig. 3 is merely a logical division, and the actual implementation may be wholly or partially integrated into a physical entity or physically separated. And these components may all be implemented in software invoked by a processing element; or may be implemented entirely in hardware; and part of the components can be realized in the form of calling by the processing element in software, and part of the components can be realized in the form of hardware. For example, a certain module may be a separate processing element, or may be integrated into a certain chip of the electronic device. Other components are implemented similarly. In addition, all or part of the components can be integrated together or can be independently realized. In implementation, each step of the above method or each component above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.

For example, the above components may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), one or more Field Programmable Gate Arrays (FPGAs), etc. For another example, these components may be integrated together and implemented in the form of a System-On-a-Chip (SOC).

In view of the foregoing examples and preferred embodiments thereof, those skilled in the art will appreciate that, in practice, the technical idea underlying the present invention may be applied in a variety of embodiments, schematically illustrated by the following carriers:

(1) an electronic device is provided. The device may specifically include: one or more processors, memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions, which when executed by the apparatus, cause the apparatus to perform the steps/functions of the foregoing embodiments or an equivalent implementation.

The electronic device may specifically be an electronic device related to a computer, such as but not limited to various interactive terminals and electronic products, for example, a vehicle-mounted intelligent terminal, a driving recorder, a navigation device, a background server of a vehicle networking, and the like.

Fig. 4 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention, and specifically, the electronic device 900 includes a processor 910 and a memory 930. Wherein, the processor 910 and the memory 930 can communicate with each other and transmit control and/or data signals through the internal connection path, the memory 930 is used for storing computer programs, and the processor 910 is used for calling and running the computer programs from the memory 930. The processor 910 and the memory 930 may be combined into a single processing device, or more generally, separate components, and the processor 910 is configured to execute the program code stored in the memory 930 to implement the functions described above. In particular implementations, the memory 930 may be integrated with the processor 910 or may be separate from the processor 910.

In addition, to further enhance the functionality of the electronic device 900, the device 900 may further include one or more of an input unit 960, a display unit 970, an audio circuit 980, a camera 990, a sensor 901, and the like, which may further include a speaker 982, a microphone 984, and the like. The display unit 970 may include a display screen, among others.

Further, the apparatus 900 may also include a power supply 950 for providing power to various devices or circuits within the apparatus 900.

It should be understood that the operation and/or function of the various components of the apparatus 900 can be referred to in the foregoing description with respect to the method, system, etc., and the detailed description is omitted here as appropriate to avoid repetition.

It should be understood that the processor 910 in the electronic device 900 shown in fig. 4 may be a system on chip SOC, and the processor 910 may include a Central Processing Unit (CPU), and may further include other types of processors, such as: an image Processing Unit (GPU), etc., which will be described in detail later.

In summary, various portions of the processors or processing units within the processor 910 may cooperate to implement the foregoing method flows, and corresponding software programs for the various portions of the processors or processing units may be stored in the memory 930.

(2) A computer data storage medium having stored thereon a computer program or the above apparatus which, when executed, causes a computer to perform the steps/functions of the preceding embodiments or equivalent implementations.

In several embodiments provided by the present invention, any of the functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-accessible data storage medium. Based on this understanding, the aspects of the present invention that substantially contribute to the prior art may be embodied in the form of a software product, as described below.

In particular, it should be noted that the storage medium may refer to a server or a similar computer device, and specifically, the aforementioned computer program or the aforementioned apparatus is stored in a storage device in the server or the similar computer device.

(3) A computer program product (which may include the above apparatus), when running on a terminal device, causes the terminal device to execute the image-based traffic congestion identification method of the foregoing embodiment or an equivalent embodiment.

From the above description of the embodiments, it is clear to those skilled in the art that all or part of the steps of the above implementation method can be implemented by software plus a necessary general hardware platform. With this understanding, the above-described computer program product may include, but is not limited to, an APP.

In the foregoing, the device/terminal may be a computer device, and the hardware structure of the computer device may further specifically include: at least one processor, at least one communication interface, at least one memory, and at least one communication bus; the processor, the communication interface and the memory can all communicate with one another through the communication bus. The processor may be a Central Processing Unit (CPU), a microcontroller, or a Digital Signal Processor (DSP), and may further include a GPU, an embedded Neural-network Processing Unit (NPU), and an Image Signal Processor (ISP); it may further include an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The processor may run one or more software programs, which may be stored in a storage medium such as the memory. The aforementioned memory/storage medium may include non-volatile memories such as non-removable magnetic disks, U-disks, removable hard disks and optical disks, as well as Read-Only Memory (ROM), Random Access Memory (RAM), etc.

In the embodiments of the present invention, "at least one" means one or more, "a plurality" means two or more. "and/or" describes the association relationship of the associated objects, and means that there may be three relationships, for example, a and/or B, and may mean that a exists alone, a and B exist simultaneously, and B exists alone. Wherein A and B can be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" and similar expressions refer to any combination of these items, including any combination of singular or plural items. For example, at least one of a, b, and c may represent: a, b, c, a and b, a and c, b and c or a and b and c, wherein a, b and c can be single or multiple.

Those of skill in the art will appreciate that the various modules, elements, and method steps described in the embodiments disclosed in this specification can be implemented as electronic hardware, combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

And, modules, units, etc. described herein as separate components may or may not be physically separate, i.e., may be located in one place, or may be distributed across multiple places, e.g., nodes of a system network. Some or all of the modules and units can be selected according to actual needs to achieve the purpose of the above-mentioned embodiment. Can be understood and carried out by those skilled in the art without inventive effort.

The structure, features and effects of the present invention have been described in detail with reference to the embodiments shown in the drawings, but the above embodiments are merely preferred embodiments of the present invention, and it should be understood that technical features related to the above embodiments and preferred modes thereof can be reasonably combined and configured into various equivalent schemes by those skilled in the art without departing from and changing the design idea and technical effects of the present invention; therefore, the invention is not limited to the embodiments shown in the drawings, and all the modifications and equivalent embodiments that can be made according to the idea of the invention are within the scope of the invention as long as they are not beyond the spirit of the description and the drawings.
