Method and device for identifying the environment surrounding a vehicle, and related equipment
1. A method for identifying an environment around a vehicle, comprising:
acquiring an environment image of the vehicle's surroundings, and processing the environment image with a preset recognition network to obtain a plurality of intermediate feature maps;
determining a target feature map according to the intermediate feature maps;
and performing convolution processing on the target feature map to obtain a classification result, and determining environment information included in the environment image according to the classification result.
2. The method of claim 1, wherein the preset recognition network comprises a plurality of convolution blocks connected in a chain structure, each convolution block comprising at least one convolution layer;
the processing the environment image with a preset recognition network to obtain a plurality of intermediate feature maps comprises:
processing the environment image with a first convolution block in the preset recognition network, and outputting an intermediate feature map;
inputting the intermediate feature map into a next convolution block according to the chain structure, the next convolution block processing the input intermediate feature map and outputting another intermediate feature map;
and continuing to perform the step of inputting the intermediate feature map into the next convolution block according to the chain structure until the last convolution block in the preset recognition network outputs its intermediate feature map.
3. The method of claim 2, wherein determining a target feature map according to the intermediate feature maps comprises:
determining a plurality of detection feature maps among the intermediate feature maps, and performing convolution processing on each detection feature map with a preset convolution layer to obtain a plurality of feature maps to be merged;
and merging the feature maps to be merged to obtain the target feature map.
4. The method of claim 3, wherein merging the plurality of feature maps to be merged comprises:
adjusting the feature maps to be merged according to a preset rule to obtain feature maps conforming to a preset size;
and superimposing the feature maps conforming to the preset size to obtain the target feature map.
5. The method of claim 3, wherein the number of convolution blocks is 7;
determining a plurality of detection feature maps among the intermediate feature maps comprises:
determining the intermediate feature maps output by the fourth, fifth, and seventh convolution blocks in the chain structure as the detection feature maps.
6. The method of claim 3, wherein the preset convolution layer comprises 245 convolution kernels of size 1 × 1.
7. The method according to any one of claims 1 to 6, wherein performing convolution processing on the target feature map to obtain a classification result comprises:
processing the target feature map with a fully connected layer, and outputting a classification result corresponding to each pixel of the target feature map.
8. The method of claim 7, wherein the classification result includes at least one of the following:
obstacle type information, position information of the obstacle in the original image, and a confidence;
the fully connected layer comprises N convolution kernels, including convolution kernels for determining the obstacle type information, convolution kernels for identifying the position information, and a convolution kernel for the confidence.
9. The method of claim 8, wherein determining environment information included in the environment image according to the classification result comprises:
determining the environment information included in the environment image according to the confidence included in the classification result.
10. The method of claim 5, wherein:
the size of the environment image is 320 × 320 × 3;
the size of the intermediate feature map output by the first convolution block is 160 × 160 × 16;
the size of the intermediate feature map output by the second convolution block is 80 × 80 × 24;
the size of the intermediate feature map output by the third convolution block is 40 × 40 × 36;
the size of the intermediate feature map output by the fourth convolution block is 20 × 20 × 60;
the size of the intermediate feature map output by the fifth convolution block is 10 × 10 × 108;
the size of the intermediate feature map output by the sixth convolution block is 5 × 5 × 204;
the size of the intermediate feature map output by the seventh convolution block is 1 × 1 × 396.
11. The method of any one of claims 1 to 6 or claim 10, wherein acquiring the environment image around the vehicle comprises:
acquiring an image captured by an image acquisition device of the vehicle, and preprocessing the image to obtain the environment image.
12. The method of claim 11, wherein preprocessing the image comprises:
cropping and/or compressing the image to obtain the environment image, whose size is consistent with the required size.
13. An apparatus for identifying an environment around a vehicle, comprising:
a feature extraction module, configured to acquire an environment image of the vehicle's surroundings and process the environment image with a preset recognition network to obtain a plurality of intermediate feature maps;
a determining module, configured to determine a target feature map according to the intermediate feature maps;
and a classification module, configured to perform convolution processing on the target feature map to obtain a classification result, and determine environment information included in the environment image according to the classification result.
14. The apparatus of claim 13, wherein the preset recognition network comprises a plurality of convolution blocks connected in a chain structure, each convolution block comprising at least one convolution layer;
the feature extraction module is specifically configured to:
process the environment image with a first convolution block in the preset recognition network, and output an intermediate feature map;
input the intermediate feature map into a next convolution block according to the chain structure, the next convolution block processing the input intermediate feature map and outputting another intermediate feature map;
and continue to perform the step of inputting the intermediate feature map into the next convolution block according to the chain structure until the last convolution block in the preset recognition network outputs its intermediate feature map.
15. The apparatus of claim 13, wherein the determining module comprises:
a convolution unit, configured to determine a plurality of detection feature maps among the intermediate feature maps and perform convolution processing on each detection feature map with a preset convolution layer to obtain a plurality of feature maps to be merged;
and a merging unit, configured to merge the plurality of feature maps to be merged to obtain the target feature map.
16. The apparatus according to claim 15, wherein the merging unit is specifically configured to:
adjust the feature maps to be merged according to a preset rule to obtain feature maps conforming to a preset size;
and superimpose the feature maps conforming to the preset size to obtain the target feature map.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-12.
18. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-12.
19. A driving recorder, comprising an image acquisition device and a processing device;
the image acquisition device sends the acquired image to the processing device;
the processing device performs the method of any of claims 1-12 based on the received image.
20. A navigation system, characterized in that the navigation system is applied to a vehicle;
the navigation system is connected to an image acquisition device arranged on the vehicle;
and the navigation system receives an image captured by the image acquisition device and performs the method of any one of claims 1-12 based on the received image.
Background
At present, driver-assistance technology has matured and is deployed in many vehicles. Forward-collision warning is one of the more widely used driver-assistance functions in everyday driving scenarios, and it can run on small vehicle-mounted devices such as driving recorders, vehicle-mounted boxes, and car navigation systems.
Forward-collision warning requires environment detection: the environment around the vehicle can be identified either with traditional image features plus a classifier, or with end-to-end detection based on deep learning.
However, the traditional scheme, while computationally cheap, has low detection accuracy, whereas the deep-learning scheme is accurate but computationally heavy and demands considerable computing power.
Since small vehicle-mounted devices are generally equipped with low-compute chips, the more accurate environment recognition methods are unsuitable for them. Providing a method for recognizing the environment around a vehicle that is both accurate and computationally cheap is therefore a technical problem to be urgently solved by those skilled in the art.
Disclosure of Invention
The application provides a method and a device for identifying the environment around a vehicle, and related equipment, so as to provide a recognition scheme that yields accurate results at low computational cost.
A first aspect of the present application provides a method for identifying an environment around a vehicle, including:
acquiring an environment image of the vehicle's surroundings, and processing the environment image with a preset recognition network to obtain a plurality of intermediate feature maps;
determining a target feature map according to the intermediate feature maps;
and performing convolution processing on the target feature map to obtain a classification result, and determining environment information included in the environment image according to the classification result.
A second aspect of the present application provides an apparatus for identifying an environment around a vehicle, including:
a feature extraction module, configured to acquire an environment image of the vehicle's surroundings and process the environment image with a preset recognition network to obtain a plurality of intermediate feature maps;
a determining module, configured to determine a target feature map according to the intermediate feature maps;
and a classification module, configured to perform convolution processing on the target feature map to obtain a classification result and determine environment information included in the environment image according to the classification result.
A third aspect of the present application provides an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of identifying an environment surrounding a vehicle according to the first aspect.
A fourth aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method for identifying an environment around a vehicle according to the first aspect.
A fifth aspect of the present application provides a driving recorder, comprising an image acquisition device and a processing device;
the image acquisition device sends the acquired image to the processing device;
and the processing device performs the method for identifying the environment around a vehicle according to the first aspect based on the received image.
A sixth aspect of the present application provides a navigation system applied to a vehicle;
the navigation system is connected to an image acquisition device arranged on the vehicle;
and the navigation system receives an image captured by the image acquisition device and performs the method for identifying the environment around a vehicle according to the first aspect based on the received image.
One embodiment in the above application has the following advantages or benefits:
the application provides a method, a device and related equipment for identifying the surrounding environment of a vehicle, which comprise the following steps: acquiring an environment image around a vehicle, and processing the environment image by using a preset identification network to obtain a plurality of intermediate characteristic maps; determining a target feature map according to the intermediate feature map; and performing convolution processing on the target characteristic graph to obtain a classification result, and determining environmental information included in the environmental image according to the classification result. According to the scheme, the environment image is identified by the aid of the preset identification network, accurate identification results can be obtained, the intermediate features are fully utilized in the identification process, the process calculation amount for identifying the environment image is small, the environment information result obtained by the feature identification mode is accurate, and the scheme with the small calculation amount and the accurate identification results can be provided.
Other effects of the above-described alternatives are described below in conjunction with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram of an application scenario according to an exemplary embodiment of the present application;
FIG. 2 is a flowchart of a method for identifying the environment around a vehicle according to an exemplary embodiment of the present application;
FIG. 3 is a flowchart of a method for identifying the environment around a vehicle according to another exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of the convolution blocks according to an exemplary embodiment of the present application;
FIG. 5 is a flowchart of recognizing an environment image according to an exemplary embodiment of the present application;
FIG. 6 is a block diagram of an apparatus for identifying the environment around a vehicle according to an exemplary embodiment of the present application;
FIG. 7 is a block diagram of an apparatus for identifying the environment around a vehicle according to another exemplary embodiment of the present application;
FIG. 8 is a block diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these should be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are likewise omitted for clarity and conciseness.
At present, many small vehicle-mounted devices can carry a simple driver-assistance function; for example, forward-collision warning technology can be built into a small vehicle-mounted device so that it provides an early-warning function.
The premise of forward-collision warning is the ability to identify the environment around the vehicle. There are two approaches to environment detection: one based on traditional image features plus a classifier, such as HOG + SVM, and one based on end-to-end deep-learning detection, such as Faster R-CNN and YOLO. Both approaches output rectangular bounding boxes for obstacles.
The traditional image scheme requires little computation but has low detection accuracy, and its detection rate is especially poor in complex scenes such as heavy fog, rain, or night. The deep-learning scheme has a high detection rate and strong generalization, and is less affected by complex scenes, but it requires much more computation.
Since small vehicle-mounted devices are equipped with low-compute chips, the more accurate environment recognition methods are unsuitable for them.
In the vehicle-surroundings recognition scheme provided by this application, the environment image is recognized with a preset recognition network: the intermediate features the network produces while processing the environment image are fully exploited, a target feature is obtained from those intermediate features, and the target feature is then processed to obtain the environment information included in the environment image. Because the intermediate features are fully utilized, identifying the environment image requires little computation, while the feature-based recognition keeps the resulting environment information accurate; the scheme thus combines low computational cost with accurate recognition.
FIG. 1 is a schematic diagram of an application scenario according to an exemplary embodiment of the present application.
As shown in FIG. 1, a vehicle-mounted device 11 may be provided in the vehicle; it may be installed when the vehicle leaves the factory or mounted after purchase by the user. The vehicle-mounted device 11 may be, for example, an on-board navigation unit, a driving recorder, or the like.
The vehicle-mounted device 11 may be connected to an image acquisition device 12. In one embodiment, the image acquisition device 12 may be integrated with the vehicle-mounted device 11; for example, when the vehicle-mounted device is a driving recorder, a camera may be provided on the recorder itself. In another embodiment, the image acquisition device 12 may be separate from the vehicle-mounted device 11; for example, the vehicle-mounted device 11 may be placed in the cab while the image acquisition device 12 is mounted at the front windshield, the two being connected by wire or wirelessly.
Only one image acquisition device 12 is shown schematically in the figure, but several image acquisition devices 12 may be connected to the vehicle-mounted device 11 as needed. For example, cameras facing different directions can be provided so that richer images of the vehicle's surroundings are acquired.
The image acquisition device 12 may capture images of the vehicle's surroundings and transmit them to the vehicle-mounted device 11, which processes them to determine the environment information around the vehicle.
FIG. 2 is a flowchart of a method for identifying the environment around a vehicle according to an exemplary embodiment of the present application.
As shown in FIG. 2, the present application provides a method for identifying an environment around a vehicle, including:
step 201, obtaining an environment image around a vehicle, and processing the environment image by using a preset identification network to obtain a plurality of intermediate characteristic maps.
The method provided by this embodiment may be executed by an electronic device with computing capability, for example a vehicle-mounted device such as the vehicle-mounted device 11 shown in FIG. 1.
Specifically, the vehicle-mounted device may acquire an environment image of the vehicle's surroundings; the environment image may be an image captured by an image acquisition device, or an image obtained by processing such a captured image.
The image acquisition device may be arranged on the vehicle, such as the image acquisition device 12 shown in FIG. 1, so that it can capture the environment outside the vehicle. The image acquisition device can send the captured image to the vehicle-mounted device, which then processes it.
In one embodiment, the vehicle-mounted device may use the received image directly as the environment image and process it with the method provided by this embodiment. In another embodiment, the vehicle-mounted device may first preprocess the image, for example by compressing or cropping it, to obtain the environment image, and then process the preprocessed image with the method provided by this embodiment.
A preset recognition network may be deployed on the vehicle-mounted device; this network may be obtained by training in advance. For example, a large number of environment images are collected in advance and annotated with environment information, and the network is trained with these annotated images.
Specifically, the preset recognition network may perform convolution calculation on the environment image, thereby extracting features included in the environment image.
Further, the preset recognition network may include a plurality of convolution blocks: a first convolution block performs convolution calculations on the environment image to obtain an intermediate feature map and passes it to the next convolution block; the next convolution block convolves that intermediate feature map to obtain another intermediate feature map; and the intermediate feature maps continue to propagate backward in this way, so that each convolution block outputs a corresponding intermediate feature map.
In practical applications, each convolution block may include at least one convolution layer, which performs the convolution calculations on the input to that block.
Step 202: determine a target feature map according to the intermediate feature maps.
Some or all of the intermediate feature maps can be selected and processed to obtain the target feature map.
For example, the selected intermediate feature maps may be resampled so that their sizes agree and then superimposed to obtain the target feature map. As another example, intermediate convolution layers may be provided in the preset recognition network; these layers convolve the intermediate feature maps, and the resulting features are then superimposed to obtain the target feature map.
That is, besides the convolution blocks used to extract the intermediate feature maps, the preset recognition network may also include intermediate convolution layers for convolving them. A single intermediate convolution layer may be provided, or one per selected intermediate feature map; for example, if the target feature map is determined from 3 intermediate feature maps, 3 corresponding intermediate convolution layers may be provided.
Specifically, a detection head may also be provided in the preset recognition network. The detection head receives the feature maps processed by the intermediate convolution layers, resamples them, and superimposes the resampled feature maps to obtain the target feature map.
Step 203: perform convolution processing on the target feature map to obtain a classification result, and determine the environment information included in the environment image according to the classification result.
Specifically, after the target feature map is obtained, the convolution calculation may be performed on the target feature map, so as to obtain a classification result.
Furthermore, a classification convolution layer can be provided in the detection head of the preset recognition network; this layer performs the convolution calculation on the target feature map to produce the classification result.
In practical applications, the classification result may include several kinds of data, such as the obstacle type, the position of the obstacle in the environment image, and the confidence of the classification. Corresponding convolution kernels can be provided for each kind of data: for example, if the obstacle types to be identified include vehicles and pedestrians, convolution kernels corresponding to those two types can be provided; if the position of the obstacle in the environment image must be detected, convolution kernels for determining the position can be provided, as can a convolution kernel for determining the confidence.
When the target feature map is convolved with the kernels of the classification convolution layer, each convolved region of the target feature map corresponds to one classification result; for example, with 1 × 1 kernels, each pixel of the target feature map corresponds to one classification result. A classification result may include obstacle type information, the obstacle position, and a confidence. A classification result can be output for each obstacle type: one group of results is output for the hypothesis that the obstacle at a pixel is a person, and another group for the hypothesis that it is a vehicle.
Specifically, the final classification result may be selected according to the confidences: for example, among the several classification results corresponding to one pixel, the one with the highest confidence is taken as the recognition result for that pixel. The environment information is then determined by combining the recognition results of all pixels of the target feature map.
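By way of illustration only, the following is a minimal Python/PyTorch sketch of this per-pixel confidence selection; the tensor layout, the threshold, and the function name are illustrative assumptions rather than part of the described scheme:

```python
import torch

def decode_per_pixel(confidences: torch.Tensor, boxes: torch.Tensor,
                     threshold: float = 0.5):
    """Pick, for every pixel, the obstacle hypothesis with the highest
    confidence, keeping it only if it clears a threshold (assumed).
    confidences: (num_types, H, W); boxes: (num_types, 4, H, W)."""
    best_conf, best_type = confidences.max(dim=0)   # per-pixel winning hypothesis
    keep = best_conf > threshold                    # confidence gate
    ys, xs = keep.nonzero(as_tuple=True)
    results = []
    for y, x in zip(ys.tolist(), xs.tolist()):
        t = best_type[y, x].item()
        results.append({
            "type": t,                              # index of the obstacle type
            "confidence": best_conf[y, x].item(),
            "box": boxes[t, :, y, x].tolist(),      # position info for that type
        })
    return results
```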
Further, the vehicle-mounted device can issue a forward-collision warning based on the recognition result; for example, it can raise an alarm when a vehicle or pedestrian is recognized close to the vehicle.
The method of this embodiment is performed by a device configured with it; such a device is typically implemented in hardware and/or software.
The method for identifying the environment around a vehicle provided by this embodiment includes: acquiring an environment image of the vehicle's surroundings and processing it with a preset recognition network to obtain a plurality of intermediate feature maps; determining a target feature map according to the intermediate feature maps; and performing convolution processing on the target feature map to obtain a classification result, from which the environment information included in the environment image is determined. Because the method makes full use of the intermediate features, identifying the environment image requires little computation, while the feature-based recognition keeps the environment information accurate; the method thus combines low computational cost with accurate recognition.
FIG. 3 is a flowchart of a method for identifying the environment around a vehicle according to another exemplary embodiment of the present application.
As shown in FIG. 3, the present application provides a method for identifying an environment around a vehicle, including:
step 301, acquiring a shot image of an image acquisition device of a vehicle, and preprocessing the image to obtain an environment image.
The method provided by the embodiment may be executed by an electronic device with computing capability, for example, an on-board device, such as the on-board device 11 shown in fig. 1.
Specifically, the vehicle may be provided with an image capturing device, such as a camera. The image acquisition device can be connected with the vehicle-mounted equipment and sends acquired images to the vehicle-mounted equipment.
Further, the vehicle-mounted device can preprocess the received image, so as to obtain an environment image with a size meeting the preset requirement of identifying network input data. For example, if the data size input to the predetermined recognition network should be 320 × 3, the vehicle-mounted device may process the received image to obtain a size of 320 × 3. Of these, 3 are 3 channels, namely three channels of RGB (red, green and blue). The size of the image is 320 x 320 for each channel. 320 x 320 refers to the pixel size.
In practical application, the size of the image collected by the image collecting device is related to the parameters of the image collecting device, and the image possibly collected does not conform to the required size.
The image may be cropped and/or compressed to obtain an environment image with a size consistent with the required size.
In such an embodiment, the problem of inaccurate identification due to non-uniform image size of the input network architecture can be avoided.
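For illustration, a minimal preprocessing sketch consistent with this embodiment (crop and/or compress down to the 320 × 320 × 3 input); the use of PIL/torchvision and the center-crop policy are assumptions, since the embodiment only requires that the output match the network's input size:

```python
import torch
from PIL import Image
from torchvision import transforms

def make_environment_image(path: str) -> torch.Tensor:
    """Crop and compress a captured frame into the 320 x 320 x 3
    environment image expected by the preset recognition network."""
    image = Image.open(path).convert("RGB")   # captured frame, arbitrary size
    side = min(image.size)                    # crop to a square first
    pipeline = transforms.Compose([
        transforms.CenterCrop(side),          # cropping step
        transforms.Resize((320, 320)),        # compression step
        transforms.ToTensor(),                # -> (3, 320, 320) float tensor
    ])
    return pipeline(image).unsqueeze(0)       # add batch dim: (1, 3, 320, 320)
```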
Step 302: process the environment image with the first convolution block in the preset recognition network, and output an intermediate feature map.
Specifically, the preset recognition network comprises a plurality of convolution blocks connected in a chain structure, and each convolution block includes at least one convolution layer.
FIG. 4 is a schematic diagram of the convolution blocks according to an exemplary embodiment of the present application.
As shown in FIG. 4, the preset recognition network may include a plurality of convolution blocks connected one after another in a chain structure. The first convolution block 41 processes the image input to the network and outputs an intermediate feature map, which is input into the next convolution block 42; block 42 processes its input and passes its intermediate feature map on to the next convolution block 43, and so on, until the last convolution block 4n outputs its intermediate feature map.
Further, after the environment image is input into the preset recognition network, the first convolution block processes it to obtain an intermediate feature map; specifically, the convolution layer(s) in the first convolution block convolve the environment image. For example, the block may contain a single convolution layer, in which case convolving the environment image with that layer yields the intermediate feature map. Alternatively, the block may contain two convolution layers: the first layer processes the environment image and outputs features, and the second layer convolves those features to obtain the intermediate feature map.
Step 303: input the intermediate feature map into the next convolution block according to the chain structure; the next convolution block processes the input intermediate feature map and outputs another intermediate feature map.
After the first convolution block outputs its intermediate feature map, that map may be input into the next convolution block as input data, and the next convolution block performs convolution calculations on it.
This convolution block likewise includes at least one convolution layer, which convolves the input intermediate feature map and outputs another intermediate feature map.
Specifically, each intermediate feature map continues to be passed to the following convolution block until the last convolution block in the preset recognition network outputs its intermediate feature map. For example, if the preset recognition network includes n convolution blocks, the environment image is input into convolution block 1, which convolves it and outputs an intermediate feature map to convolution block 2; block 2 convolves its input and outputs an intermediate feature map to block 3; and so on, until block n outputs the final intermediate feature map.
In such an embodiment, multiple intermediate features can be extracted by the multiple convolution blocks, so that the features included in the environment image are extracted across multiple scales.
Step 304: determine a plurality of detection feature maps among the intermediate feature maps, and perform convolution processing on each detection feature map with a preset convolution layer to obtain a plurality of feature maps to be merged.
Further, all or part of the intermediate feature maps can be used as detection feature maps, from which the target feature map is determined.
In practical applications, the preset recognition network may include 7 convolution blocks. The intermediate feature maps output by the fourth, fifth, and seventh convolution blocks in the chain structure may be determined as the detection feature maps.
The required size of the environment image may be 320 × 320 × 3, so the image input to the first convolution block has size 320 × 320 × 3.
After convolving the input 320 × 320 × 3 environment image, the first convolution block can output an intermediate feature map of size 160 × 160 × 16, which is input into the second convolution block. For example, the first convolution block may include a convolution layer with 16 kernels of size 3 × 3 and a stride of 2.
After convolving the input 160 × 160 × 16 map, the second convolution block can output an intermediate feature map of size 80 × 80 × 24, which is input into the third convolution block.
After convolving the input 80 × 80 × 24 map, the third convolution block can output an intermediate feature map of size 40 × 40 × 36, which is input into the fourth convolution block.
After convolving the input 40 × 40 × 36 map, the fourth convolution block can output an intermediate feature map of size 20 × 20 × 60, which is input into the fifth convolution block.
After convolving the input 20 × 20 × 60 map, the fifth convolution block can output an intermediate feature map of size 10 × 10 × 108, which is input into the sixth convolution block.
After convolving the input 10 × 10 × 108 map, the sixth convolution block can output an intermediate feature map of size 5 × 5 × 204, which is input into the seventh convolution block.
After convolving the input 5 × 5 × 204 map, the seventh convolution block can output an intermediate feature map of size 1 × 1 × 396.
In such an embodiment, the computation required by the network structure is small, so the method provided by this embodiment can run on small vehicle-mounted devices.
The detection feature maps are then the 20 × 20 × 60 map output by the fourth convolution block, the 10 × 10 × 108 map output by the fifth convolution block, and the 1 × 1 × 396 map output by the seventh convolution block.
In this embodiment, the features used to identify the environment image are extracted by seven convolution blocks; this lightweight network structure keeps the computation low.
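By way of illustration, a sketch of a backbone matching the sizes stated above, assuming each of the first six blocks is a single stride-2 3 × 3 convolution and the seventh reduces 5 × 5 to 1 × 1 with a stride-5 5 × 5 convolution (the actual layer composition of each block is not fixed by this embodiment):

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, stride=2):
    """One convolution block: a single stride-2 3x3 conv layer here,
    though the embodiment allows more than one layer per block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.ReLU(inplace=True),
    )

class SevenBlockBackbone(nn.Module):
    """Chain of 7 convolution blocks: 320->160->80->40->20->10->5->1
    spatially, with channel counts 16, 24, 36, 60, 108, 204, 396."""
    def __init__(self):
        super().__init__()
        channels = [3, 16, 24, 36, 60, 108, 204]
        self.blocks = nn.ModuleList(
            conv_block(c_in, c_out)
            for c_in, c_out in zip(channels[:-1], channels[1:])
        )
        # final 5x5 -> 1x1 reduction (kernel/stride choice is an assumption)
        self.blocks.append(nn.Sequential(
            nn.Conv2d(204, 396, kernel_size=5, stride=5),
            nn.ReLU(inplace=True),
        ))

    def forward(self, x):
        detection_maps = []
        for i, block in enumerate(self.blocks, start=1):
            x = block(x)                # chain structure: output feeds next block
            if i in (4, 5, 7):          # fourth, fifth, seventh blocks
                detection_maps.append(x)
        return detection_maps

# PyTorch uses (N, C, H, W): the text's 20 x 20 x 60 is (1, 60, 20, 20) here.
maps = SevenBlockBackbone()(torch.zeros(1, 3, 320, 320))
print([tuple(m.shape) for m in maps])  # [(1,60,20,20), (1,108,10,10), (1,396,1,1)]
```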
In practical applications, preset convolution layers can further be provided, which convolve the detection feature maps to obtain the corresponding feature maps to be merged. The preset convolution layers may be part of the preset recognition network, connected after the convolution blocks that output the detection feature maps.
In one embodiment, a single preset convolution layer may be provided and used to convolve each detection feature map in turn; in another embodiment, one preset convolution layer may be provided per detection feature map, and each detection feature map is convolved with its corresponding layer.
The preset convolution layer includes 245 convolution kernels of size 1 × 1, so convolving a detection feature map with it yields a 245-channel feature map to be merged. For example, convolving the 20 × 20 × 60 detection feature map yields a 20 × 20 × 245 feature map to be merged; convolving the 10 × 10 × 108 detection feature map yields a 10 × 10 × 245 feature map to be merged; and convolving the 1 × 1 × 396 detection feature map yields a 1 × 1 × 245 feature map to be merged.
In this embodiment, processing the detection feature maps with the preset convolution layers extracts features to be merged with the same channel dimension from detection feature maps of different sizes, which makes the subsequent merging possible.
Step 305: merge the multiple feature maps to be merged to obtain the target feature map.
After the plurality of feature maps to be merged are obtained, they are merged to obtain the target feature map.
Specifically, because the convolution layers differ between convolution blocks, the number and size of their kernels may also differ; the intermediate feature maps output by the blocks may therefore differ in size, and accordingly the feature maps to be merged derived from them may differ in size as well.
Further, in order to merge them, the feature maps to be merged may be processed according to a preset rule so that each conforms to a preset size; the same-sized feature maps are then superimposed to obtain the target feature map. For example, a feature map to be merged may be upsampled to the preset size.
The preset rule may include, for example, interpolation-based upsampling, or replication of the feature values in the feature map.
In practical applications, the size of the largest feature map to be merged may be taken as the preset size; the other feature maps to be merged are processed according to the preset rule, and the feature maps conforming to the preset size are superimposed to obtain the target feature map.
In such an embodiment, the features to be merged can be brought to a uniform size so that they can be superimposed.
When the feature maps to be merged have sizes 20 × 20 × 245, 10 × 10 × 245, and 1 × 1 × 245, the size 20 × 20 × 245 may be taken as the preset size, and the 10 × 10 × 245 and 1 × 1 × 245 maps are then processed into 20 × 20 × 245 maps. For example, the 10 × 10 × 245 feature map may be upsampled by interpolation into a 20 × 20 × 245 map. For the 1 × 1 × 245 feature map, the feature values may be replicated: the map can be regarded as 245 feature maps of size 1 × 1, and each 1 × 1 feature is copied to fill a 20 × 20 grid, yielding a 20 × 20 × 245 feature map.
Specifically, the processed feature maps to be merged have the same size and can be superimposed to obtain the target feature map; for example, the three 20 × 20 × 245 feature maps are superimposed to obtain the target feature map.
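A minimal sketch of this merging step follows; the 245-kernel 1 × 1 convolutions, the interpolation upsampling of the 10 × 10 map, and the replication of the 1 × 1 map follow the embodiment, while reading "superposition" as element-wise addition is an assumption (channel-wise concatenation would be an alternative reading):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One preset convolution layer per detection feature map: 245 kernels of 1x1.
reduce4 = nn.Conv2d(60, 245, kernel_size=1)    # for the 20x20x60 map
reduce5 = nn.Conv2d(108, 245, kernel_size=1)   # for the 10x10x108 map
reduce7 = nn.Conv2d(396, 245, kernel_size=1)   # for the 1x1x396 map

def merge(det4, det5, det7):
    """det4: (1,60,20,20), det5: (1,108,10,10), det7: (1,396,1,1)."""
    m4 = reduce4(det4)                                        # (1,245,20,20)
    m5 = F.interpolate(reduce5(det5), size=(20, 20),
                       mode="bilinear", align_corners=False)  # interpolation upsampling
    m7 = reduce7(det7).expand(-1, -1, 20, 20)                 # replicate each 1x1 value
    return m4 + m5 + m7                                       # superimpose -> target map

target = merge(torch.zeros(1, 60, 20, 20),
               torch.zeros(1, 108, 10, 10),
               torch.zeros(1, 396, 1, 1))
print(tuple(target.shape))  # (1, 245, 20, 20)
```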
In this implementation, intermediate feature maps can be selected as needed and the target feature map determined from them, so that the extracted intermediate features are fully utilized and a more accurate recognition result can be obtained.
Step 306: process the target feature map with the fully connected layer, and output a classification result corresponding to each pixel of the target feature map.
Furthermore, a fully connected layer can be provided in the preset recognition network and used to process the target feature map and output the classification result.
In practical applications, 1 × 1 convolution kernels can be provided in the fully connected layer and used to perform a point-by-point convolution on the target feature map, yielding a classification result for each pixel.
The classification result may include obstacle type information, position information of the obstacle in the original image, and a confidence, and a corresponding convolution kernel may be provided for each kind of information. For example, if there are 3 obstacle types (pedestrian, road barrier, and vehicle) and the position information of an obstacle in the original image comprises 4 values (abscissa, ordinate, length, and width), then (3 + 4 + 1) = 8 convolution kernels are needed, each kernel identifying one piece of information.
In this embodiment, providing a corresponding convolution kernel for each kind of information improves the accuracy of the recognition result.
Specifically, a recognition result may be output for each obstacle type. For example, for a pixel (m, n), 3 classification results may be output: the position information and confidence of the obstacle in the original image under the hypothesis that it is a pedestrian, under the hypothesis that it is a road barrier, and under the hypothesis that it is a vehicle.
Further, the recognized obstacle type may be encoded as 1 or 0. For example, with the three obstacle types pedestrian, road barrier, and vehicle represented by c1, c2, and c3 respectively, (c1 = 1, c2 = 0, c3 = 0) indicates that the recognition result for a pixel is a pedestrian, (c1 = 0, c2 = 1, c3 = 0) indicates a road barrier, and (c1 = 0, c2 = 0, c3 = 1) indicates a vehicle. In this case, three groups of results can be output for each pixel, each group containing 8 values.
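For illustration, a sketch of such a fully connected layer realized as N = (3 + 4 + 1) = 8 convolution kernels of 1 × 1 applied to the 20 × 20 × 245 target feature map; the channel ordering and the sigmoid on the confidence channel are assumptions:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Fully connected layer as N = 3+4+1 = 8 kernels of 1x1:
    3 obstacle-type channels (c1, c2, c3), 4 position channels
    (abscissa, ordinate, length, width), and 1 confidence channel,
    giving one classification result per pixel."""
    def __init__(self, in_channels: int = 245, num_types: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, num_types + 4 + 1, kernel_size=1)
        self.num_types = num_types

    def forward(self, target_map):
        out = self.conv(target_map)                       # (B, 8, H, W)
        types = out[:, :self.num_types]                   # c1, c2, c3 scores
        box = out[:, self.num_types:self.num_types + 4]   # position in original image
        conf = torch.sigmoid(out[:, -1:])                 # confidence in [0, 1] (assumed)
        return types, box, conf

types, box, conf = ClassificationHead()(torch.zeros(1, 245, 20, 20))
print(types.shape, box.shape, conf.shape)  # per-pixel classification results
```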
In this embodiment, the target feature map is determined from the plurality of intermediate feature maps and then convolved, so the classification result makes full use of the multiple intermediate feature maps.
Step 307: determine the environment information included in the environment image according to the confidence included in the classification result.
In practical applications, the confidences included in the classification results can be used to determine the information included in the environment image; for example, the result with the highest confidence is taken as the final recognition result for each pixel, and the final recognition results of all pixels are combined to determine the environment information included in the environment image.
The vehicle-mounted device can issue a forward-collision warning based on the recognition result; for example, it can raise an alarm when a vehicle or pedestrian is recognized close to the vehicle.
In this embodiment, the final recognition result is screened out of the multiple classification results according to confidence, which yields more accurate environment information.
FIG. 5 is a flowchart of recognizing an environment image according to an exemplary embodiment of the present application.
As shown in FIG. 5, after the environment image 51 is input into the preset recognition network, convolution block 52 processes it and outputs an intermediate feature map to the next convolution block 53; block 53 continues the processing, obtains an intermediate feature map, and propagates it backward, until the last convolution block 58 outputs its intermediate feature map.
The intermediate feature maps output by convolution blocks 55, 56, and 58 each pass through a preset convolution layer, yielding three feature maps to be merged.
Then, in the detection head 59, the three feature maps to be merged are adjusted to the same size and superimposed to obtain the target feature map. The fully connected layer may also be arranged in the detection head; the target feature map is convolved with the N 1 × 1 convolution kernels of the fully connected layer to obtain the final classification result.
FIG. 6 is a block diagram of an apparatus for identifying the environment around a vehicle according to an exemplary embodiment of the present application.
As shown in FIG. 6, the present application provides an apparatus for identifying an environment around a vehicle, comprising:
a feature extraction module 61, configured to acquire an environment image of the vehicle's surroundings and process the environment image with a preset recognition network to obtain a plurality of intermediate feature maps;
a determining module 62, configured to determine a target feature map according to the intermediate feature maps;
and a classification module 63, configured to perform convolution processing on the target feature map to obtain a classification result and determine the environment information included in the environment image according to the classification result.
The apparatus for identifying the environment around a vehicle provided by the application includes: a feature extraction module, configured to acquire an environment image of the vehicle's surroundings and process it with a preset recognition network to obtain a plurality of intermediate feature maps; a determining module, configured to determine a target feature map according to the intermediate feature maps; and a classification module, configured to perform convolution processing on the target feature map to obtain a classification result and determine the environment information included in the environment image according to that result. Because the apparatus makes full use of the intermediate features, identifying the environment image requires little computation, while the feature-based recognition keeps the environment information accurate; the apparatus thus combines low computational cost with accurate recognition.
The specific principle and implementation of the apparatus provided in this embodiment are similar to those of the embodiment shown in FIG. 2 and are not repeated here.
FIG. 7 is a block diagram of an apparatus for identifying the environment around a vehicle according to another exemplary embodiment of the present application.
As shown in FIG. 7, in the apparatus for identifying the environment around a vehicle provided by the present application, on the basis of the above embodiment, the preset recognition network comprises a plurality of convolution blocks connected in a chain structure, each convolution block comprising at least one convolution layer;
the feature extraction module 61 is specifically configured to:
process the environment image with a first convolution block in the preset recognition network, and output an intermediate feature map;
input the intermediate feature map into a next convolution block according to the chain structure, the next convolution block processing the input intermediate feature map and outputting another intermediate feature map;
and continue to perform the step of inputting the intermediate feature map into the next convolution block according to the chain structure until the last convolution block in the preset recognition network outputs its intermediate feature map.
The determining module 62 includes:
a convolution unit 621, configured to determine a plurality of detection feature maps among the intermediate feature maps and perform convolution processing on each detection feature map with a preset convolution layer to obtain a plurality of feature maps to be merged;
and a merging unit 622, configured to merge the plurality of feature maps to be merged to obtain the target feature map.
Optionally, the merging unit 622 is specifically configured to:
adjust the feature maps to be merged according to a preset rule to obtain feature maps conforming to a preset size;
and superimpose the feature maps conforming to the preset size to obtain the target feature map.
Optionally, the number of convolution blocks is 7;
the convolution unit 621 is specifically configured to:
determine the intermediate feature maps output by the fourth, fifth, and seventh convolution blocks in the chain structure as the detection feature maps.
Optionally, the preset convolution layer includes 245 convolution kernels of size 1 × 1.
Optionally, the classification module 63 is specifically configured to:
process the target feature map with a fully connected layer, and output a classification result corresponding to each pixel of the target feature map.
Optionally, the classification result includes at least one of the following:
obstacle type information, position information of the obstacle in the original image, and a confidence;
the fully connected layer comprises N convolution kernels, including convolution kernels for determining the obstacle type information, convolution kernels for identifying the position information, and a convolution kernel for the confidence.
Optionally, the classification module 63 is specifically configured to:
determine the environment information included in the environment image according to the confidence included in the classification result.
Optionally, the size of the environment image is 320 × 320 × 3;
the size of the intermediate feature map output by the first convolution block is 160 × 160 × 16;
the size of the intermediate feature map output by the second convolution block is 80 × 80 × 24;
the size of the intermediate feature map output by the third convolution block is 40 × 40 × 36;
the size of the intermediate feature map output by the fourth convolution block is 20 × 20 × 60;
the size of the intermediate feature map output by the fifth convolution block is 10 × 10 × 108;
the size of the intermediate feature map output by the sixth convolution block is 5 × 5 × 204;
the size of the intermediate feature map output by the seventh convolution block is 1 × 1 × 396.
Optionally, the feature extraction module includes an image preprocessing unit 611, configured to:
acquire an image captured by an image acquisition device of the vehicle, and preprocess the image to obtain the environment image.
Optionally, the image preprocessing unit 611 is specifically configured to:
crop and/or compress the image to obtain the environment image, whose size is consistent with the required size.
The specific principle and implementation of the apparatus provided in this embodiment are similar to those of the embodiment shown in FIG. 3 and are not repeated here.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
FIG. 8 is a block diagram of an electronic device according to an embodiment of the application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only and are not meant to limit the implementations of the present application described and/or claimed herein.
As shown in FIG. 8, the electronic device includes: one or more processors 801, a memory 802, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected by different buses and may be mounted on a common motherboard or in other manners as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). One processor 801 is taken as an example in FIG. 8.
The memory 802 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of identification of a vehicle surroundings provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the method of identifying a vehicle surroundings provided by the present application.
The memory 802, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the method of identifying the vehicle surroundings in the embodiment of the present application (e.g., the feature extraction module 61, the determination module 62, and the classification module 63 shown in fig. 6). The processor 801 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 802, that is, implements the method of identifying the vehicle surroundings in the above-described method embodiments.
The memory 802 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 802 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 802 optionally includes memory located remotely from the processor 801, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or in other manners; in fig. 8, connection by a bus is taken as an example.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 804 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The application also provides a vehicle event data recorder, which comprises an image acquisition device and a processing device;
the image acquisition device sends the acquired image to the processing device;
the processing device executes any one of the vehicle surroundings recognition methods shown in fig. 2 or 3 based on the received image.
The image acquisition device may be, for example, a camera, and the image acquisition device and the processing device may be integrated together or separately disposed. The two may be connected by wire or wirelessly, for example via Bluetooth.
The application also provides a navigation system, which is applied to a vehicle;
the navigation system is connected with an image acquisition device arranged on the vehicle;
the navigation system receives an image captured by the image acquisition device and performs any one of the vehicle surroundings recognition methods shown in fig. 2 or 3 based on the received image.
The image acquisition device may be a camera arranged on the vehicle; the camera may be installed when the vehicle leaves the factory, or may be added later as required. The image acquisition device may also be an auxiliary device of other equipment, such as a camera of a vehicle event data recorder, or a camera providing a reversing-image function.
The image acquisition device and the navigation system can be connected in a wired or wireless mode.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; no limitation is imposed herein as long as the desired result of the technical solution of the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.