Synthetic aperture radar image layover region extraction method and storage medium


1. A synthetic aperture radar image layover region extraction method, characterized by comprising the following steps:

acquiring a synthetic aperture radar (SAR) image, together with the coherence coefficient map and interferometric phase map corresponding to the SAR image;

extracting one image channel from each of the SAR image, the corresponding coherence coefficient map and the interferometric phase map, and combining the three extracted channels according to a preset rule to obtain a composite image;

dividing the composite image into a plurality of sub-samples, and performing feature extraction on the plurality of sub-samples to obtain a feature map;

performing feature re-extraction based on the feature map to obtain refined features;

fusing the feature map and the refined features to obtain a fused feature map;

and extracting the layover region from the fused feature map by using a preset sliding-window stitching strategy.

2. The synthetic aperture radar image layover region extraction method according to claim 1, wherein performing feature extraction on the plurality of sub-samples to obtain a feature map comprises:

performing feature extraction on the plurality of sub-samples by using an improved backbone network to obtain a first-level feature, a second-level feature, a third-level feature and a fourth-level feature with progressively increasing channel numbers;

wherein the improved backbone network is obtained by replacing the ordinary two-dimensional convolutions in the three blocks after the pooling layer of the residual network ResNet_101 with pyramid convolutions of different scales and depths, and replacing the last convolution layer of ResNet_101 with a convolution cascade module of a preset specification, the kernel size of the pyramid convolution increasing progressively while the convolution depth decreases progressively.

3. The synthetic aperture radar image layover region extraction method according to claim 2, wherein performing feature re-extraction based on the feature map to obtain refined features comprises:

inputting the fourth-level features into a multi-receptive-field parallel attention pooling layer to obtain image-level features and the context information of each receptive field; and

performing feature extraction on the fourth-level features through a preset target semantic pooling branch to obtain pooled context information;

determining a first fused feature from the image-level features, the context information of each receptive field, and the pooled context information;

wherein the multi-receptive-field parallel attention pooling layer comprises four parallel convolutions with different dilation rates and a global average pooling.

4. The synthetic aperture radar image layover region extraction method according to claim 3, wherein inputting the fourth-level features into the multi-receptive-field parallel attention pooling layer to obtain the image-level features and the context information of each receptive field comprises:

inputting the fourth-level features into four parallel convolutional layers with different dilation rates to obtain four feature map outputs;

inputting each of the four feature map outputs into the channel-spatial attention module of the corresponding branch, so as to extract the channel information and spatial information of that feature map output through the channel-spatial attention module;

and determining the context information of each receptive field from the four feature map outputs and the corresponding channel information and spatial information.

5. The synthetic aperture radar image layover region extraction method according to claim 4, wherein extracting, through the channel-spatial attention module, the channel information and spatial information of the corresponding feature map output comprises:

compressing the feature map output of the corresponding branch through the average pooling and maximum pooling of the channel-spatial attention module;

inputting the compressed feature maps into a designated shared network for processing, adjusting the processed result through a first Sigmoid function, and mapping the adjusted result into a one-dimensional channel attention map, wherein the designated shared network comprises a multi-layer perceptron with a hidden layer;

multiplying the one-dimensional channel attention map with the input feature map to obtain a feature map in the channel dimension; learning target features through maximum pooling and average pooling of the feature map in the channel dimension and merging the pooled results; adjusting the merged result through a convolution operation and a second Sigmoid function; and mapping the adjusted result into a two-dimensional spatial attention map, thereby extracting the channel information and spatial information of the corresponding feature map output.

6. The synthetic aperture radar image layover region extraction method according to claim 5, wherein performing feature extraction on the fourth-level features through a preset target semantic pooling branch to obtain pooled context information comprises:

calculating the similarity between a single pixel and all pixels in the fourth-level feature to obtain a semantic mapping relation between the target semantics and each pixel;

aggregating all pixels on the semantic mapping relation, and calculating the label of the current single pixel;

and repeating the label calculation for all pixels in the fourth-level feature to obtain the pooled context information.

7. The synthetic aperture radar image layover region extraction method according to claim 6, wherein performing feature re-extraction based on the feature map to obtain refined features further comprises:

performing multi-modal cyclic fusion (MCF) on the second-level features and the third-level features;

and performing edge refinement on the feature map obtained by the MCF fusion to obtain a second fused feature.

8. The synthetic aperture radar image layover region extraction method according to claim 7, wherein fusing the feature map and the refined features to obtain a fused feature map comprises:

concatenating and fusing the first fused feature and the second fused feature to obtain a third fused feature;

and connecting the third fused feature with the first-level feature and performing feature refinement to obtain the fused feature map.

9. The synthetic aperture radar image layover region extraction method according to any one of claims 1 to 8, wherein extracting the layover region from the fused feature map by using a preset sliding-window stitching strategy comprises:

cropping the fused feature map at a preset stride with a sliding window of a preset size to obtain a plurality of cropped samples;

classifying the cropped samples to obtain a classification score map for each cropped sample;

and extracting the layover region of the SAR image according to the classification score maps of the cropped samples.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, carries out the steps of the synthetic aperture radar image layover region extraction method according to any one of claims 1 to 9.

Background

Synthetic aperture radar (SAR) is a high-resolution imaging radar that offers all-weather, day-and-night operation and high precision. Synthetic aperture radar interferometry (InSAR) is an active surface deformation monitoring technology and an important application in the SAR field.

As InSAR technology matures, it is applied ever more widely in geological disaster monitoring, which places higher demands on its monitoring precision and monitoring data.

Landslides and other geological disasters occur frequently in mountainous areas with large topographic relief, where the slope and back-slope angles are steep, so the layover phenomenon easily arises during SAR image data acquisition. Layover refers to the echo from a mountain top being received by the radar earlier than the echo from the foot of the mountain, so that in the slant-range image the top appears before the bottom; the phenomenon is therefore also called top-bottom inversion. Layover severely damages the continuity of the interferometric phase, preventing the InSAR data processing chain from filtering and unwrapping accurately. Effective detection and extraction of layover regions is therefore of great significance for obtaining accurate monitoring results when InSAR is applied to geological disaster monitoring.

Effective layover detection helps InSAR play a greater role in the geological disaster field. In existing studies, layover has been detected with mathematical geometric models; identified using the spectral-shift properties of SAR images; and extracted by fusing the phase map with a digital elevation model (DEM) through maximum-likelihood estimation. Layover detection based on the autocorrelation matrix of the interferometric signal has also been proposed, but its computational principle is complex, its computation time long, its systematic error large, and it depends on information-theoretic source-number estimation criteria.

Disclosure of Invention

Embodiments of the present invention provide a synthetic aperture radar image layover region extraction method and a storage medium for realizing automatic extraction of layover regions from high-resolution SAR images.

An embodiment of the present invention provides a synthetic aperture radar image layover region extraction method, which comprises the following steps:

acquiring a synthetic aperture radar (SAR) image, together with the coherence coefficient map and interferometric phase map corresponding to the SAR image;

extracting one image channel from each of the SAR image, the corresponding coherence coefficient map and the interferometric phase map, and combining the three extracted channels according to a preset rule to obtain a composite image;

dividing the composite image into a plurality of sub-samples, and performing feature extraction on the plurality of sub-samples to obtain a feature map;

performing feature re-extraction based on the feature map to obtain refined features;

fusing the feature map and the refined features to obtain a fused feature map;

and extracting the layover region from the fused feature map by using a preset sliding-window stitching strategy.

In an example, performing feature extraction on the plurality of sub-samples to obtain a feature map includes:

performing feature extraction on the plurality of sub-samples by using an improved backbone network to obtain a first-level feature, a second-level feature, a third-level feature and a fourth-level feature with progressively increasing channel numbers;

wherein the improved backbone network is obtained by replacing the ordinary two-dimensional convolutions in the three blocks after the pooling layer of the residual network ResNet_101 with pyramid convolutions of different scales and depths, and replacing the last convolution layer of ResNet_101 with a convolution cascade module of a preset specification, the kernel size of the pyramid convolution increasing progressively while the convolution depth decreases progressively.

In one example, performing feature re-extraction based on the feature map to obtain refined features includes:

inputting the fourth-level features into a multi-receptive-field parallel attention pooling layer to obtain image-level features and the context information of each receptive field; and

performing feature extraction on the fourth-level features through a preset target semantic pooling branch to obtain pooled context information;

determining a first fused feature from the image-level features, the context information of each receptive field, and the pooled context information;

wherein the multi-receptive-field parallel attention pooling layer comprises four parallel convolutions with different dilation rates and a global average pooling.

In an example, inputting the fourth-level features into the multi-receptive-field parallel attention pooling layer to obtain the image-level features and the context information of each receptive field includes:

inputting the fourth-level features into four parallel convolutional layers with different dilation rates to obtain four feature map outputs;

inputting each of the four feature map outputs into the channel-spatial attention module of the corresponding branch, so as to extract the channel information and spatial information of that feature map output through the channel-spatial attention module;

and determining the context information of each receptive field from the four feature map outputs and the corresponding channel information and spatial information.

In one example, extracting, through the channel-spatial attention module, the channel information and spatial information of the corresponding feature map output includes:

compressing the feature map output of the corresponding branch through the average pooling and maximum pooling of the channel-spatial attention module;

inputting the compressed feature maps into a designated shared network for processing, adjusting the processed result through a first Sigmoid function, and mapping the adjusted result into a one-dimensional channel attention map, wherein the designated shared network comprises a multi-layer perceptron with a hidden layer;

multiplying the one-dimensional channel attention map with the input feature map to obtain a feature map in the channel dimension; learning target features through maximum pooling and average pooling of the feature map in the channel dimension and merging the pooled results; adjusting the merged result through a convolution operation and a second Sigmoid function; and mapping the adjusted result into a two-dimensional spatial attention map, thereby extracting the channel information and spatial information of the corresponding feature map output.

In an example, performing feature extraction on the fourth-level features through a preset target semantic pooling branch to obtain pooled context information includes:

calculating the similarity between a single pixel and all pixels in the fourth-level feature to obtain a semantic mapping relation between the target semantics and each pixel;

aggregating all pixels on the semantic mapping relation, and calculating the label of the current single pixel;

and repeating the label calculation for all pixels in the fourth-level feature to obtain the pooled context information.

In an example, performing feature re-extraction based on the feature map to obtain refined features further includes:

performing multi-modal cyclic fusion (MCF) on the second-level features and the third-level features;

and performing edge refinement on the feature map obtained by the MCF fusion to obtain a second fused feature.

In one example, fusing the feature map and the refined features to obtain a fused feature map includes:

concatenating and fusing the first fused feature and the second fused feature to obtain a third fused feature;

and connecting the third fused feature with the first-level feature and performing feature refinement to obtain the fused feature map.

In an example, extracting the layover region from the fused feature map by using a preset sliding-window stitching strategy includes:

cropping the fused feature map at a preset stride with a sliding window of a preset size to obtain a plurality of cropped samples;

classifying the cropped samples to obtain a classification score map for each cropped sample;

and extracting the layover region of the SAR image according to the classification score maps of the cropped samples.

An embodiment of the present invention further provides a computer-readable storage medium that stores a computer program which, when executed by a processor, implements the steps of the foregoing synthetic aperture radar image layover region extraction method.

By compositing the SAR image with its coherence and phase maps, dividing the composite into sub-samples, fusing the feature maps extracted from the sub-samples with the refined features to obtain a fused feature map, and extracting the layover region from the fused feature map with a preset sliding-window stitching strategy, the method and medium realize automatic extraction of layover regions from high-resolution SAR images, achieving a positive technical effect.

The foregoing is only an overview of the technical solutions of the present invention. Embodiments of the invention are described below so that its technical means, as well as the above and other objects, features and advantages, can be understood more clearly.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a basic flow diagram of an embodiment of the present invention;

FIG. 2 is a block diagram of an embodiment of the present invention;

FIG. 3 is a schematic view of a layover example according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the improved backbone network structure according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the pyramid convolution according to an embodiment of the present invention;

FIG. 6 shows the channel-spatial attention module according to an embodiment of the present invention;

FIG. 7 is a block diagram of the target semantic pooling module according to an embodiment of the present invention;

FIG. 8 is a diagram illustrating the semantic embedding branch according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of the stitching strategy according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Deep learning has developed very rapidly in the direction of semantic segmentation. Semantic segmentation is a deep learning method that learns features at the level of image pixels so as to divide an image into different classes. Layover detection requires extracting all layover features, which is consistent with the idea of semantic segmentation. At present most researchers apply traditional detection methods to SAR image layover detection, but deep learning is better suited to distinguishing the implicit features of SAR images, so combining mainstream deep learning algorithms with high-resolution SAR layover extraction is meaningful. Prior work has combined deep convolutional networks with object-based image analysis, building a segmentation dataset from optical images, interferometric coherence data and a digital elevation model (DEM) to classify and extract layover. A convolutional neural network for interferometric semantic segmentation (CNN-ISS) and a convolutional long short-term memory network (CLSTM-ISS) have been used for classification and extraction of layover regions. A phase estimation method assisted by interferogram segmentation uses an improved fully convolutional network (FCN) to extract layover regions within interferograms. The deep learning network MDDA for high-resolution SAR target extraction achieves high-precision extraction, but places high demands on the dataset and needs long training time; other deep learning methods extract targets efficiently but are better suited to small-sample datasets.

An embodiment of the present invention provides a synthetic aperture radar image layover region extraction method, as shown in FIG. 1, comprising:

s101, acquiring a Synthetic Aperture Radar (SAR) image, and a coherent coefficient diagram and an interference phase diagram corresponding to the SAR image;

s102, respectively extracting one image channel of the SAR image, the corresponding correlation coefficient image and the interference phase image, and synthesizing the three extracted image channels according to a preset rule to obtain a synthesized image.

As shown in FIG. 2, in this embodiment a channel is first extracted from each of the SAR image and its corresponding coherence coefficient map and interferometric phase map, and a three-channel composite image is generated according to a preset rule, for example in r, g, b order, so that layover information in the sample is easier to identify. In FIG. 3, (a), (b) and (c) are the SAR image and its corresponding coherence coefficient map and interferometric phase map, respectively, and (d) is the three-channel composite image. The layover region is visibly easier to distinguish from the background in the fused image.
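The channel composition step can be illustrated with a short sketch. This is a minimal example, assuming the SAR amplitude, coherence coefficient map and interferometric phase are already available as single-channel float arrays of equal size; the function names and the min-max normalization are illustrative assumptions, not part of the patent.

```python
# Minimal sketch: stack SAR amplitude, coherence and phase as r, g, b channels.
import numpy as np

def normalize(band: np.ndarray) -> np.ndarray:
    """Scale a single band to [0, 1] so the three channels are comparable."""
    lo, hi = band.min(), band.max()
    return (band - lo) / (hi - lo + 1e-12)

def compose_rgb(sar: np.ndarray, coherence: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Combine the three single-channel maps into one composite image."""
    assert sar.shape == coherence.shape == phase.shape
    rgb = np.stack([normalize(sar), normalize(coherence), normalize(phase)], axis=-1)
    return (rgb * 255).astype(np.uint8)
```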

S103, dividing the composite image into a plurality of sub-samples, and performing feature extraction on the plurality of sub-samples to obtain a feature map;

s104, performing feature re-extraction based on the feature map to obtain refined features;

as shown in fig. 2, the encoding module includes a modified backbone network PyresNet _101, a Multi-scale spatial attention pyramid (MCSP), and a Semantic Embedding Branch (SEB). In specific implementation, the synthesized image may be subjected to sliding window cutting into small samples for training, feature extraction is performed on a plurality of the sub-samples through an improved backbone network PyresNet _101, and then feature re-extraction is performed through a Multi-scale spatial attention pyramid (MCSP) and a Semantic Embedding Branch (SEB) based on a feature map, so as to obtain refined features.

S105, fusing the feature map and the refined features to obtain a fused feature map;

and S106, extracting the layover region from the fused feature map by using a preset sliding-window stitching strategy.

The method completes the feature fusion to obtain a fused feature map and fully extracts the layover region through an improved sliding-window stitching strategy, realizing layover detection in high-resolution SAR images.

This SAR image layover region extraction method realizes fast, automatic extraction of layover regions from high-resolution SAR images.

In an example, performing feature extraction on the plurality of sub-samples to obtain a feature map includes:

performing feature extraction on the plurality of sub-samples by using an improved backbone network to obtain a first-level feature, a second-level feature, a third-level feature and a fourth-level feature with progressively increasing channel numbers;

wherein the improved backbone network is obtained by replacing the ordinary two-dimensional convolutions in the three blocks after the pooling layer of the residual network ResNet_101 with pyramid convolutions of different scales and depths, and replacing the last convolution layer of ResNet_101 with a convolution cascade module of a preset specification, the kernel size of the pyramid convolution increasing progressively while the convolution depth decreases progressively.

In this example an improved backbone network, PyResNet_101, is proposed: ResNet_101 equipped with pyramid convolution and dilated convolution. ResNet's skip connections and residual optimization speed up training and improve model accuracy, making it very suitable for building semantic segmentation networks. Pyramid convolution comprises convolution kernels of different scales and depths, so multi-scale features of the target can be extracted; dilated convolution lets the subsequent convolution layers keep a larger feature map size, preserving more semantic information while also retaining more detail such as edges, which alleviates the loss of detail caused by the network's pooling operations.

FIG. 4 and FIG. 5 show the modified parts of ResNet_101 in this example. The ordinary two-dimensional convolutions in the three blocks after the ResNet_101 pooling layer are replaced with pyramid convolutions of different scales and depths, whose kernel size increases while the convolution depth decreases, so that the convolution layers can parse the input in parallel at multiple scales and capture more detail, which helps improve the recognition performance of the network. In some examples, a cascade of single 3×3 convolutions with dilation rates of 2, 4 and 8 replaces the last convolution layer; this deepens the network and enriches the semantic information of the feature map while downsampling the original image only 16×, enlarging the receptive field while maintaining feature resolution, so that the features finally output by ResNet_101 retain more semantic information together with more detail.
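A minimal PyTorch sketch of these two modifications follows. The kernel sizes and per-branch depths of the pyramid convolution, and the class names, are illustrative assumptions; only the 3×3 kernels and dilation rates 2, 4 and 8 of the cascade are fixed by the text.

```python
# Sketch of a pyramid convolution (kernels grow, per-branch depth shrinks)
# and the cascaded dilated 3x3 convolutions that replace the last layer.
import torch
import torch.nn as nn

class PyramidConv(nn.Module):
    """Parallel branches with increasing kernel size and decreasing depth."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        depths = [out_ch // 2, out_ch // 4, out_ch // 8, out_ch // 8]  # assumed split
        kernels = [3, 5, 7, 9]
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, d, k, padding=k // 2, bias=False)
            for d, k in zip(depths, kernels)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # concatenate the multi-scale branch outputs along the channel axis
        return torch.cat([b(x) for b in self.branches], dim=1)

class DilatedCascade(nn.Module):
    """Three cascaded 3x3 convolutions with dilation rates 2, 4 and 8."""
    def __init__(self, channels: int):
        super().__init__()
        self.cascade = nn.Sequential(*[
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r, bias=False)
            for r in (2, 4, 8)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.cascade(x)
```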

In one example, performing feature re-extraction based on the feature map to obtain refined features includes:

inputting the fourth-level features into a multi-receptive-field parallel attention pooling layer to obtain image-level features and the context information of each receptive field; and

performing feature extraction on the fourth-level features through a preset target semantic pooling branch to obtain pooled context information;

determining a first fused feature from the image-level features, the context information of each receptive field, and the pooled context information;

wherein the multi-receptive-field parallel attention pooling layer comprises four parallel convolutions with different dilation rates and a global average pooling.

As shown in FIG. 2, a multi-scale spatial attention pyramid (MCSP) is introduced in this example. The MCSP comprises two parts: a multi-receptive-field parallel attention pooling layer and an OCP branch. Because the fourth-level feature map output by the improved backbone contains 2048 channels and rich semantic information, it is fed into the multi-receptive-field parallel attention pooling layer and the OCP branch simultaneously. An exemplary multi-receptive-field parallel attention pooling layer comprises, built in parallel, a 1×1 convolution with dilation rate 1, three 3×3 convolutions with dilation rates 6, 12 and 18, and a global average pooling (GAP). The four dilated convolutions with different dilation rates effectively capture image context from different receptive fields; the global average pooling downsamples the features to extract image-level features, whose spatial size is then restored by bilinear-interpolation upsampling so that the feature map contains more global information; and the OCP forms another branch that explicitly enhances object information and extracts feature context more flexibly.
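A minimal sketch of the parallel pooling layer follows; it is structurally similar to ASPP, assuming 2048 input channels (per the text) and an illustrative 256-channel output per branch. The per-branch CSAM attention described below is omitted here and sketched separately.

```python
# Sketch: 1x1 conv, three dilated 3x3 convs (rates 6/12/18) and a GAP branch
# restored to the input size by bilinear upsampling, concatenated at the end.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelAttentionPooling(nn.Module):
    def __init__(self, in_ch: int = 2048, out_ch: int = 256):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.dilated = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
            for r in (6, 12, 18)
        )
        self.gap_proj = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        outs = [self.branch1(x)] + [conv(x) for conv in self.dilated]
        # image-level features: global average pool, project, restore size
        gap = F.adaptive_avg_pool2d(x, 1)
        outs.append(F.interpolate(self.gap_proj(gap), size=(h, w),
                                  mode="bilinear", align_corners=False))
        return torch.cat(outs, dim=1)
```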

In an example, inputting the fourth-level features into the multi-receptive-field parallel attention pooling layer to obtain the image-level features and the context information of each receptive field includes:

inputting the fourth-level features into four parallel convolutional layers with different dilation rates to obtain four feature map outputs;

inputting each of the four feature map outputs into the channel-spatial attention module of the corresponding branch, so as to extract the channel information and spatial information of that feature map output through the channel-spatial attention module;

and determining the context information of each receptive field from the four feature map outputs and the corresponding channel information and spatial information.

FIG. 6 shows the channel-spatial attention module (CSAM) used in this example.

Attention modules play an important role in deep learning networks. On the basis of channel attention and spatial attention, the channel-spatial residual attention module introduces a residual branch. CSAM first compresses the feature map in the spatial dimension through average pooling and maximum pooling, feeds the results through a shared network consisting of a multi-layer perceptron (MLP) with one hidden layer, readjusts them through a Sigmoid function, and maps them into a one-dimensional channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1}$. The channel attention map is multiplied with the input feature map to obtain the feature map in the channel dimension; specific features are then learned through maximum pooling and average pooling, the two pooled maps are concatenated, adjusted through a convolution operation and a Sigmoid function, and finally mapped into a two-dimensional spatial attention map $M_s \in \mathbb{R}^{1 \times H \times W}$, which fulfils the function of extracting useful channel and spatial information. For an input feature map $X \in \mathbb{R}^{C \times H \times W}$, the channel attention map and the spatial attention map are computed as

$$M_c(X) = \sigma\big(W_1(W_0(\mathrm{AvgPool}(X))) + W_1(W_0(\mathrm{MaxPool}(X)))\big),$$

$$M_s(X) = \sigma\big(f([\mathrm{AvgPool}(X);\,\mathrm{MaxPool}(X)])\big),$$

where $\sigma$ denotes the Sigmoid activation function, $W_0 \in \mathbb{R}^{C/r \times C}$ and $W_1 \in \mathbb{R}^{C \times C/r}$ are the MLP weights shared by the two inputs ($r$ is the reduction ratio, and $W_0$ is followed by a ReLU activation), $f$ denotes a convolution operation, $[\,\cdot\,;\,\cdot\,]$ denotes concatenation, and $\otimes$ in FIG. 6 denotes pixel-by-pixel multiplication.

The inputs X of the residual branches formed by CSAM come from the multi-scale feature extraction branches formed by the four dilated convolutions. Applying channel attention and spatial attention jointly to the multi-scale feature maps accounts both for the importance of pixels in different channels and for the importance of pixels at different positions within the same channel, enriching the information of the multi-scale features and achieving feature re-screening; the multi-scale feature map output by each dilated convolution branch and the re-screened features are then weighted through the residual connection to balance the features.
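A minimal sketch of a CBAM-style CSAM with the residual weighting described above. The 7×7 spatial kernel and the reduction ratio r = 16 are common defaults assumed here; they are not given in the text.

```python
# Sketch: channel attention (shared MLP over avg/max descriptors), spatial
# attention (conv over concatenated avg/max maps), and a residual connection.
import torch
import torch.nn as nn

class CSAM(nn.Module):
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        # shared MLP with one hidden layer, implemented as 1x1 convolutions
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // r, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # channel attention: M_c in R^{C x 1 x 1}
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        mc = torch.sigmoid(avg + mx)
        xc = x * mc                                # channel-refined features
        # spatial attention: M_s in R^{1 x H x W}
        s = torch.cat([xc.mean(dim=1, keepdim=True),
                       xc.amax(dim=1, keepdim=True)], dim=1)
        ms = torch.sigmoid(self.spatial(s))
        return x + xc * ms                         # residual weighting
```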

In one example, extracting, through the channel-spatial attention module, the channel information and spatial information of the corresponding feature map output includes:

compressing the feature map output of the corresponding branch through the average pooling and maximum pooling of the channel-spatial attention module;

inputting the compressed feature maps into a designated shared network for processing, adjusting the processed result through a first Sigmoid function, and mapping the adjusted result into a one-dimensional channel attention map, wherein the designated shared network comprises a multi-layer perceptron with a hidden layer;

multiplying the one-dimensional channel attention map with the input feature map to obtain a feature map in the channel dimension; learning target features through maximum pooling and average pooling of the feature map in the channel dimension and merging the pooled results; adjusting the merged result through a convolution operation and a second Sigmoid function; and mapping the adjusted result into a two-dimensional spatial attention map, thereby extracting the channel information and spatial information of the corresponding feature map output.

In an example, performing feature extraction on the fourth-level features through a preset target semantic pooling branch to obtain pooled context information includes:

calculating the similarity between a single pixel and all pixels in the fourth-level feature to obtain a semantic mapping relation between the target semantics and each pixel;

aggregating all pixels on the semantic mapping relation, and calculating the label of the current single pixel;

and repeating the label calculation for all pixels in the fourth-level feature to obtain the pooled context information.

This example illustrates the other branch of the MCSP: target semantic pooling (OCP, object-conditional pooling). Target semantic pooling obtains the label of a pixel belonging to an object by using the information of the set of pixels belonging to the same object; instead of predicting pixel by pixel, it performs semantic segmentation after aggregating similar pixels.

As shown in FIG. 7, the input X is the last-layer feature map of the backbone network, P is the corresponding position feature map, W is the semantic map generated by the attention mechanism, and C is the final label output of the OCP. 1) The similarity between a single pixel and all pixels is computed to obtain the mapping between the target semantics and each pixel, denoted $w_p$:

$$w_{p,i} = \frac{1}{Z_p} \exp\big(f_q(x_p)^{\top} f_k(x_i)\big), \qquad Z_p = \sum_{i} \exp\big(f_q(x_p)^{\top} f_k(x_i)\big);$$

2) all pixels on the target semantic map are aggregated to compute the label of the current pixel:

$$c_p = \sum_{i} w_{p,i}\, f_v(x_i),$$

where $x_p$ and $x_i$ are the feature vectors of pixels $p$ and $i$, the normalization term $Z_p$ is the sum of all similarities, $f_q(\cdot)$ and $f_k(\cdot)$ denote the query and key transform functions, and $f_v(\cdot)$ is the value transform function.

Because of the local receptive field of the convolution operation, pixels with the same label may have different features, and such differences cause intra-class inconsistency that harms recognition accuracy. Target semantic pooling therefore uses an attention mechanism to establish correlations between features, adaptively integrating similar features at any scale from a global perspective. Compared with extracting context with a fixed spatial structure that mixes object and background information, the OCP in this example focuses on the relationships between objects in the global view, explicitly enhances object information, and is more flexible.
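The attention aggregation behind OCP can be sketched as follows, using 1×1 convolutions as the query, key and value transforms and a softmax over all pixels, matching the $w_{p,i}$ and $c_p$ formulas above; the class name and channel sizes are illustrative assumptions.

```python
# Sketch: pixel-wise attention pooling with query/key/value 1x1 convolutions.
import torch
import torch.nn as nn

class TargetSemanticPooling(nn.Module):
    def __init__(self, channels: int, key_ch: int = 256):
        super().__init__()
        self.f_q = nn.Conv2d(channels, key_ch, 1)    # query transform f_q
        self.f_k = nn.Conv2d(channels, key_ch, 1)    # key transform f_k
        self.f_v = nn.Conv2d(channels, channels, 1)  # value transform f_v

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.f_q(x).flatten(2).transpose(1, 2)   # (b, hw, key_ch)
        k = self.f_k(x).flatten(2)                   # (b, key_ch, hw)
        v = self.f_v(x).flatten(2).transpose(1, 2)   # (b, hw, c)
        # w_{p,i} = softmax_i( f_q(x_p)^T f_k(x_i) )
        attn = torch.softmax(q @ k, dim=-1)          # (b, hw, hw)
        # c_p = sum_i w_{p,i} f_v(x_i)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out
```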

In an example, performing feature re-extraction based on the feature map to obtain refined features further includes:

performing multi-modal cyclic fusion (MCF) on the second-level features and the third-level features;

and performing edge refinement on the feature map obtained by the MCF fusion to obtain a second fused feature.

This example describes the semantic embedding branch (SEB).

The semantic embedding branch comprises an MCF feature fusion module and a GBR edge refinement module. The MCF module fuses the feature maps output by the backbone stages PyRes2 and PyRes3, i.e., the second-level and third-level features, so that pixel continuity is stronger during subsequent upsampling and the restored pixel values are closer to those before the feature map was downsampled. The fused feature map is then edge-refined by the GBR module, further improving the edge extraction capability of the encoding block from global information.

Embedding high-level semantics introduces more semantic information into the low-level features, addressing the problem that low-level features carry too little semantic information for restoration. The GCN and BR units in the edge refinement module effectively address pixel classification and localization in semantic segmentation. FIG. 8 gives the detailed structure of the semantic embedding branch.

The MCF upsamples the high-level feature map, which contains more semantic information, to the resolution of the low-level feature map by bilinear interpolation, and then fuses the features by pixel-wise multiplication, so that the low-level features carry more semantic information. The fused features output by the MCF are input to the global edge refinement module (GBR), which comprises GCN and BR units. The GCN builds a large-scale convolution from symmetric, separate kernels 1×k + k×1 and k×1 + 1×k, with k = 7 and no ReLU activation between them; compared with using a k×k kernel directly, this markedly reduces the parameter count and computation, improves the network's ability to handle feature maps of different resolutions, captures global information better, and effectively improves classification. To reduce the misclassification of object boundary pixels and further improve localization near target boundaries, the GCN output is balanced by a residual module BR built from small convolution kernels, raising edge activation values and achieving edge refinement.
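A minimal sketch of the GCN and BR units described above: two symmetric separable-kernel paths (1×k + k×1 and k×1 + 1×k, k = 7, no activation between the two convolutions of a path) summed together, followed by a small residual refinement block; the channel widths inside BR are assumptions.

```python
# Sketch of the GCN (large separable kernels) and BR (boundary refinement).
import torch
import torch.nn as nn

class GCN(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 7):
        super().__init__()
        p = k // 2
        self.path_a = nn.Sequential(                 # 1xk followed by kx1
            nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, p)),
            nn.Conv2d(out_ch, out_ch, (k, 1), padding=(p, 0)),
        )
        self.path_b = nn.Sequential(                 # kx1 followed by 1xk
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(p, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, p)),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.path_a(x) + self.path_b(x)

class BR(nn.Module):
    """Boundary refinement: a small residual block of 3x3 convolutions."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)   # residual raises edge activation values
```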

In one example, fusing the feature map and the refined features to obtain a fused feature map includes:

concatenating and fusing the first fused feature and the second fused feature to obtain a third fused feature;

and connecting the third fused feature with the first-level feature and performing feature refinement to obtain the fused feature map.

As shown in FIG. 2, the decoding part takes three inputs: the high-level features output by the MCSP, the low-level features output by the backbone network, and the fused features output by the SEB semantic embedding branch. The MCSP output is upsampled 2× and concatenated with the SEB output, so that the high-level and mid-level features are fused after a single upsampling; this preserves both the semantic and the detail information of the image and reduces the error caused by directly restoring the spatial dimensions of high-level features. The result then passes through a 1×1 convolution for dimensionality reduction and 2× bilinear upsampling, and is connected with low-level features of the same spatial resolution from the backbone; since the low-level features contain many channels, a 1×1 convolution is again used to reduce the channel count and avoid unnecessary channel computation. After connection, the features are refined by a 3×3 convolution layer, upsampled 4× by bilinear interpolation, and input to the global edge refinement module GBR for further boundary refinement, finally extracting the layover region.
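The decoder data flow can be sketched as follows; the channel widths (256 and 48) and class names are illustrative assumptions, and the GBR boundary refinement sketched earlier would follow the final upsampling.

```python
# Sketch of the three-input decoder: MCSP high-level features, SEB fused
# features, and backbone low-level features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    def __init__(self, high_ch: int, seb_ch: int, low_ch: int, n_classes: int = 2):
        super().__init__()
        self.reduce_mid = nn.Conv2d(high_ch + seb_ch, 256, 1, bias=False)
        self.reduce_low = nn.Conv2d(low_ch, 48, 1, bias=False)  # shrink channels
        self.refine = nn.Conv2d(256 + 48, 256, 3, padding=1, bias=False)
        self.classify = nn.Conv2d(256, n_classes, 1)

    def forward(self, high, seb, low):
        # fuse high-level (MCSP) and mid-level (SEB) features after upsampling
        high = F.interpolate(high, size=seb.shape[-2:], mode="bilinear",
                             align_corners=False)
        mid = self.reduce_mid(torch.cat([high, seb], dim=1))
        # upsample and connect with channel-reduced low-level features
        mid = F.interpolate(mid, size=low.shape[-2:], mode="bilinear",
                            align_corners=False)
        x = self.refine(torch.cat([mid, self.reduce_low(low)], dim=1))
        # 4x bilinear upsampling; GBR boundary refinement would follow here
        x = F.interpolate(x, scale_factor=4, mode="bilinear", align_corners=False)
        return self.classify(x)
```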

In an example, extracting the layover region from the fused feature map by using a preset sliding-window stitching strategy includes:

cropping the fused feature map at a preset stride with a sliding window of a preset size to obtain a plurality of cropped samples;

classifying the cropped samples to obtain a classification score map for each cropped sample;

and extracting the layover region of the SAR image according to the classification score maps of the cropped samples.

In this example, the model produced by network training is used to classify the high-resolution SAR image. To obtain better results, a sliding-window stitching detection strategy is adopted, cutting the high-resolution SAR image into small image samples of a fixed size for training.

Cutting an image usually destroys target integrity, and when the cut patches are tested and stitched directly, the classification result is discontinuous at the boundary of two adjacent windows, causing detection errors in edge regions. The sliding-window stitching method shown in FIG. 9 is therefore used to stitch adjacent images. Concretely, during layover extraction with the trained model, the high-resolution SAR image is cropped with a sliding window of a preset size, for example 512×512, with the stride set to 412 (windows S11, S12, S21, ..., Smn in FIG. 9), so that two adjacent windows overlap by 100 pixels. After both windows have been tested, the classification result of the overlapping region is generated by averaging the neighboring windows. This stitching method yields a score map with minimal classification error at window boundaries.
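A minimal sketch of this stitching strategy, assuming `model` is a callable that returns a per-class score map for a patch; the 512-pixel window and 412-pixel stride follow the text, while the border handling and function names are assumptions.

```python
# Sketch: sliding-window inference with averaged scores in overlap regions.
import numpy as np

def _positions(total: int, win: int, stride: int):
    if total <= win:
        return [0]
    pos = list(range(0, total - win + 1, stride))
    if pos[-1] != total - win:
        pos.append(total - win)   # ensure the image border is covered
    return pos

def sliding_window_predict(model, image, n_classes, win=512, stride=412):
    """image: (H, W, C); model returns (n_classes, h, w) scores per patch."""
    h, w = image.shape[:2]
    scores = np.zeros((n_classes, h, w), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.float32)
    for y in _positions(h, win, stride):
        for x in _positions(w, win, stride):
            y1, x1 = min(y + win, h), min(x + win, w)
            scores[:, y:y1, x:x1] += model(image[y:y1, x:x1])
            counts[y:y1, x:x1] += 1.0
    scores /= np.maximum(counts, 1.0)   # average scores where windows overlap
    return scores.argmax(axis=0)        # final per-pixel class map
```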

The improved network realizes fast, automatic extraction of SAR image layover regions. The network is also lightweight: layer construction time is greatly shortened, and both network training time and image test time are reduced. The MCSP lets the network learn global features and flexibly encode effective features across multiple scales; the SEB fuses mid-level backbone features, avoiding the feature misalignment caused by directly fusing high-level and low-level features; and the global edge refinement module GBR allows edge information to be decoded and extracted completely. Considering extraction accuracy, dataset training time and image test time together, the method extracts layover from the data more effectively.

An embodiment of the present invention further provides a computer-readable storage medium that stores a computer program which, when executed by a processor, implements the steps of the foregoing synthetic aperture radar image layover region extraction method.

It should be noted that, in this document, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
