Domain-adaptive semantic segmentation method for remote sensing images
1. A remote sensing image domain adaptive semantic segmentation method is characterized by comprising the following steps:
step 1), respectively determining a source domain data set and a target domain data set, and performing semantic label processing on the source domain data set to obtain a corresponding real semantic label map;
step 2), training a Deeplab-v2 semantic segmentation model on a source domain data set;
step 3), constructing a remote sensing image domain adaptive semantic segmentation model by using a Deeplab-v2 semantic segmentation model;
step 4), inputting the remote sensing image in the target domain data set into the remote sensing image domain adaptive semantic segmentation model in the step 3) for prediction to obtain a semantic segmentation prediction map of the target domain data set;
in the step 3), the remote sensing image domain adaptive semantic segmentation model comprises a semantic segmentation model S and a combined discriminator D, wherein the semantic segmentation model S comprises a feature extractor F, a category attention module CCA and a classifier C which are sequentially connected; the combined discriminator D comprises a global discriminator D_G and a category discriminator D_C arranged in parallel, the global discriminator D_G and the category discriminator D_C are both connected to the feature extractor F, the category discriminator D_C is connected to the category attention module CCA, and the global discriminator D_G serves as an output.
2. The remote-sensing image domain adaptive semantic segmentation method according to claim 1, wherein in the step 3), the step of constructing the remote-sensing image domain adaptive semantic segmentation model comprises:
step 3.1), respectively initializing the parameters of the feature extractor F and the classifier C with the parameters of the Deeplab-v2 semantic segmentation model from the step 2);
step 3.2), training a semantic segmentation model S on the source domain data set, and updating network parameters of the semantic segmentation model S;
step 3.3), updating the parameters of the feature extractor F based on the target domain data set;
step 3.4), updating the network parameters of the combined discriminator D based on the source domain data set and the target domain data set;
step 3.5), repeating the steps 3.2)-3.4) until a converged remote sensing image domain adaptive semantic segmentation model is obtained, and storing the parameters of the converged remote sensing image domain adaptive semantic segmentation model.
3. The remote sensing image domain adaptive semantic segmentation method according to claim 2, wherein the step 3.2) comprises the steps of:
step 3.2.1), inputting the remote sensing images in the source domain data set into the feature extractor F, and extracting the high-level features f_s of the remote sensing images in the source domain data set;
step 3.2.2), inputting the high-level features f_s into the category discriminator D_C to obtain the category domain label f_cs of the source domain;
step 3.2.3), inputting the high-level features f_s and the category domain label f_cs into the category attention module CCA simultaneously to obtain the concatenated features of the source domain remote sensing image;
step 3.2.4), inputting the concatenated features obtained in the step 3.2.3) into the classifier C for pixel-by-pixel classification, and up-sampling the classification result to obtain a semantic label prediction map of the same size as the input source domain image;
step 3.2.5), calculating the error between the semantic label prediction map obtained in the step 3.2.4) and the real semantic label map in the source domain data set using the cross entropy loss function, back-propagating the calculated error, and updating the network parameters of the semantic segmentation model S; wherein the cross entropy loss function, expression (1), is:
L_seg = -(1/M) Σ_{k=1}^{M} y^(k) log ŷ^(k)    (1)
in the expression (1), M represents the number of samples, y^(k) represents the true semantic label value of the kth sample, ŷ^(k) represents the predicted label value of the kth sample, and L_seg represents the loss value.
4. The remote sensing image domain adaptive semantic segmentation method according to claim 3, wherein the step 3.3) comprises the steps of:
step 3.3.1), inputting the remote sensing images in the target domain data set into the feature extractor F, and extracting the high-level features f_t of the remote sensing images in the target domain data set;
step 3.3.2), inputting the high-level features f_t into the global discriminator D_G to obtain the global domain label f_gt, and inputting the high-level features f_t into the category discriminator D_C to obtain the category domain label f_ct;
step 3.3.3), calculating the global adversarial loss from the global domain label f_gt and the source domain label 0 using the first binary cross entropy loss function, expression (3), and calculating the class-level adversarial loss from the category domain label f_ct and the source domain label 0 using the second binary cross entropy loss function, expression (4); performing a weighted summation of the global adversarial loss and the class-level adversarial loss to obtain a first total adversarial loss, back-propagating this loss, and updating the network parameters of the feature extractor F; wherein the first total adversarial loss function, expression (2), is:
L_adv(X_T) = λ_adv_g · L_adv_g(X_T) + λ_adv_c · L_adv_c(X_T)    (2)
in the expression (2), L_adv_g(X_T) and L_adv_c(X_T) represent the global adversarial loss and the class-level adversarial loss, respectively, λ_adv_g and λ_adv_c represent the weights of the global adversarial loss and the class-level adversarial loss, respectively, and X_T represents an image of the target domain;
L_adv_g(X_T), expression (3), is:
L_adv_g(X_T) = -E_{x~P_T(x)}[log D_g(F(X_T))]    (3)
L_adv_c(X_T), expression (4), is:
L_adv_c(X_T) = -Σ_{i=1}^{N} E_{x~P_T(x)}[log D_c^i(F(X_T))]    (4)
in the expressions (3) and (4), P_T(x) represents the data distribution of the target domain data set, x~P_T(x) indicates that the remote sensing images in the target domain data set obey the distribution P_T(x), E_{x~P_T(x)} represents the expectation over x~P_T(x), F(X_T) represents the target domain features extracted by the feature extractor F, D_g(F(X_T)) represents the global discriminator output for the target domain image, D_c^i(F(X_T)) represents the class-level discriminator output for the target domain image for the ith class, and N represents the number of classes.
5. The remote sensing image domain adaptive semantic segmentation method according to claim 4, wherein the specific process of the step 3.4) is as follows:
inputting the high-level features f_s extracted in the step 3.2.1) and the high-level features f_t extracted in the step 3.3.1) into the combined discriminator D respectively, and outputting the global domain labels f_gs, f_gt and the category domain labels f_cs, f_ct through the combined discriminator D; calculating the global adversarial loss L_adv_g(X_S, X_T) from the output global domain labels f_gs, f_gt with the source domain label 0 and the target domain label 1 using the third binary cross entropy loss function, expression (6), and calculating the class-level adversarial loss L_adv_c(X_S, X_T) from the output category domain labels f_cs, f_ct with the source domain label 0 and the target domain label 1 using the fourth binary cross entropy loss function, expression (7); performing a weighted summation of the global adversarial loss L_adv_g(X_S, X_T) and the class-level adversarial loss L_adv_c(X_S, X_T) to obtain the second total adversarial loss of expression (5), back-propagating the second total adversarial loss, and updating the network parameters of the combined discriminator D; wherein the second total adversarial loss function, expression (5), is:
L_adv(X_S, Y_S, X_T) = λ_adv_g · L_adv_g(X_S, X_T) + λ_adv_c · L_adv_c(X_S, X_T)    (5)
in the expression (5), L_adv(X_S, Y_S, X_T) represents the second total adversarial loss value, X_S represents an image of the source domain, X_T represents an image of the target domain, L_adv_g(X_S, X_T) and L_adv_c(X_S, X_T) represent the global adversarial loss and the class-level adversarial loss, respectively, and λ_adv_g and λ_adv_c represent the weight of the global adversarial loss and the weight of the class-level adversarial loss, respectively;
L_adv_g(X_S, X_T), expression (6), is:
L_adv_g(X_S, X_T) = -E_{x~P_S(x)}[log D_g(F(X_S))] - E_{x~P_T(x)}[log(1 - D_g(F(X_T)))]    (6)
L_adv_c(X_S, X_T), expression (7), is:
L_adv_c(X_S, X_T) = -Σ_{i=1}^{N} (E_{x~P_S(x)}[log D_c^i(F(X_S))] + E_{x~P_T(x)}[log(1 - D_c^i(F(X_T)))])    (7)
in the expressions (6) and (7), P_S(x) represents the data distribution of the source domain data set, P_T(x) represents the data distribution of the target domain data set, x~P_S(x) indicates that the remote sensing images in the source domain data set obey the distribution P_S(x), x~P_T(x) indicates that the remote sensing images in the target domain data set obey the distribution P_T(x), E_{x~P_S(x)} represents the expectation over x~P_S(x), E_{x~P_T(x)} represents the expectation over x~P_T(x), F(X_S) represents the source domain features extracted by the feature extractor F, F(X_T) represents the target domain features extracted by the feature extractor F, D_g(F(X_S)) and D_g(F(X_T)) represent the global discriminator outputs for the source domain image and the target domain image, respectively, D_c(F(X_S)) and D_c(F(X_T)) represent the class-level discriminator outputs for the source domain image and the target domain image, respectively, D_c^i(F(X_S)) and D_c^i(F(X_T)) represent the class-level discriminator outputs for the source domain image and the target domain image for the ith class, and N represents the number of classes.
6. The remote sensing image domain adaptive semantic segmentation method according to claim 5, wherein the feature extractor F is the convolutional feature extractor ResNet-101.
7. The remote sensing image domain adaptive semantic segmentation method according to claim 6, wherein the convergence difference of the converged remote sensing image domain adaptive semantic segmentation model obtained in the step 3.5) is 0.05-0.15.
8. The remote sensing image domain adaptive semantic segmentation method according to any one of claims 1-7, wherein the step 2) comprises:
step 2.1), inputting the remote sensing images in the source domain data set into Deeplab-v2 to obtain a pixel-by-pixel prediction result;
step 2.2), calculating the error between the prediction result obtained in the step 2.1) and the real semantic label map using the cross entropy loss function, expression (1), back-propagating the calculated error, and updating the Deeplab-v2 parameters;
step 2.3), repeating the steps 2.1)-2.2), obtaining a converged Deeplab-v2 semantic segmentation model, and storing the parameters of the converged Deeplab-v2 semantic segmentation model.
9. The remote sensing image domain adaptive semantic segmentation method according to claim 8, wherein the converged Deeplab-v2 semantic segmentation model obtained in the step 2.3) has a convergence difference of 0.05-0.15.
10. The remote sensing image domain adaptive semantic segmentation method according to claim 9, wherein the step 1) further comprises cropping the images in the target domain data set and the images in the source domain data set with the real semantic label map at an inverse ratio of the resolutions to obtain corresponding image blocks.
Background
The development of remote sensing technology has produced an increasing number of high-resolution remote sensing images (HRSI). Semantic segmentation is an important task in HRSI analysis, which aims to assign a specific semantic class to each pixel; different semantic classes have different features and attributes (such as color, intensity and texture), while the same semantic class has similar features and attributes. Semantic segmentation of HRSI plays an important role in applications such as urban traffic management and planning, precision agriculture, and disaster prediction. In recent years, deep convolutional neural networks (DCNN) have shown outstanding performance in feature representation. Therefore, DCNN-based semantic segmentation methods, such as FCN, SegNet, UNet, PSPNet and Deeplab, are widely applied to pixel-by-pixel classification of high-resolution remote sensing images and have made good progress. However, the deep semantic segmentation models constructed by these methods have limited transferability: when a deep semantic segmentation model trained on a specific labeled remote sensing data set (the source domain) is used to predict another unlabeled remote sensing data set (the target domain) with a large distribution difference, the prediction performance of the model degrades significantly.
To address the above problem of domain distribution differences between the source data set and the target data set, domain adaptation techniques have been proposed. Domain adaptation is a branch of transfer learning that exploits knowledge learned from labeled source domain data to perform new tasks on an unlabeled target domain. In recent years, domain adaptation methods have been applied to semantic segmentation tasks. Hoffman et al. align the source and target domains in the feature space at both the global and local levels. The curriculum domain adaptation method learns the global label distribution of images and the local label distributions of superpixels to minimize the domain gap in semantic segmentation. AdaptSegNet improves semantic segmentation performance by aligning the output spaces of the source and target domains with a multi-level adversarial network. Luo et al. use a class-level adversarial network to enhance local semantic consistency. These domain adaptation methods align the source and target domains at the feature level; another family of domain adaptation methods, driven by image-to-image translation work, aligns the two domains at the pixel and feature levels. Such an approach is generally composed of two independent sub-networks, an image-to-image translation sub-network and a semantic segmentation sub-network: before training the semantic segmentation model, the source domain images are mapped into the target domain style using image translation techniques to reduce the differences between the domains. DCAN visually converts source domain images into the target domain and then performs alignment at the feature level. Li et al. introduced a bidirectional learning framework that alternately trains the image translation and segmentation adaptation models to narrow the domain gap.
Although the domain adaptation methods described above achieve good performance in cross-domain semantic segmentation, they were proposed for natural image data sets. Since HRSI differ greatly from natural images in shooting angle, spatial complexity, image resolution, etc., directly applying these methods to semantic segmentation of HRSI gives unsatisfactory results. To address this challenge, Benjdira et al. proposed an HRSI cross-domain semantic segmentation algorithm based on generative adversarial networks (GANs). The algorithm first converts source domain images into target domain images using a GAN model. The translated images are then used to fine-tune the semantic segmentation model trained on the source domain. However, the performance of semantic segmentation is limited by the quality of the image translation, and when the translation fails, the accuracy of semantic segmentation also degrades. Furthermore, image-to-image translation can only make the source domain images resemble the target domain images in image style (such as color distribution and texture features); it can hardly reduce the differences in context information and category representation between the images.
Most existing domain adaptation methods suffer from the following problems in the domain adaptation process: 1) most of them pursue only the consistency of the global distribution and ignore the differences in local joint distributions, which causes negative transfer and makes transfer difficult; 2) existing domain adaptation semantic segmentation methods treat all content of an image identically during domain adaptation, whereas, owing to the influences of spatial resolution, appearance distribution, object size and scene context information, different regions and categories in an image exhibit different degrees of discrepancy when domain knowledge is transferred. Therefore, existing domain adaptation methods cannot meet the requirements of the cross-domain semantic segmentation task for HRSI.
In summary, there is an urgent need for a remote sensing image domain adaptive semantic segmentation method that solves the problems of negative transfer, transfer difficulty, and differing degrees of discrepancy during domain knowledge transfer that afflict existing domain adaptation methods.
Disclosure of Invention
The invention aims to provide a remote sensing image domain adaptive semantic segmentation method, which comprises the following specific technical scheme:
a remote sensing image domain adaptive semantic segmentation method comprises the following steps:
step 1), respectively determining a source domain data set and a target domain data set, and performing semantic label processing on the source domain data set to obtain a corresponding real semantic label map;
step 2), training a Deeplab-v2 semantic segmentation model on a source domain data set;
step 3), constructing a remote sensing image domain adaptive semantic segmentation model by using a Deeplab-v2 semantic segmentation model;
step 4), inputting the remote sensing image in the target domain data set into the remote sensing image domain adaptive semantic segmentation model in the step 3) for prediction to obtain a semantic segmentation prediction map of the target domain data set;
in step 3), the remote sensing image domain adaptive semantic segmentation model comprises a semantic segmentation model S and a combined discriminator D; the semantic segmentation model S comprises a feature extractor F, a category attention module CCA and a classifier C which are sequentially connected; the combined discriminator D comprises a global discriminator D_G and a category discriminator D_C arranged in parallel, the global discriminator D_G and the category discriminator D_C are both connected to the feature extractor F, the category discriminator D_C is connected to the category attention module CCA, and the global discriminator D_G serves as an output.
Preferably, in step 3), the step of constructing the remote sensing image domain adaptive semantic segmentation model includes:
step 3.1), respectively initializing the parameters of the feature extractor F and the classifier C with the parameters of the Deeplab-v2 semantic segmentation model from the step 2);
step 3.2), training a semantic segmentation model S on the source domain data set, and updating network parameters of the semantic segmentation model S;
step 3.3), updating the parameters of the feature extractor F based on the target domain data set;
step 3.4), updating the network parameters of the combined discriminator D based on the source domain data set and the target domain data set;
step 3.5), repeating the steps 3.2)-3.4) until a converged remote sensing image domain adaptive semantic segmentation model is obtained, and storing the parameters of the converged remote sensing image domain adaptive semantic segmentation model.
Preferably, said step 3.2) comprises the steps of:
step 3.2.1), inputting the remote sensing images in the source domain data set into the feature extractor F, and extracting the high-level features f_s of the remote sensing images in the source domain data set;
step 3.2.2), inputting the high-level features f_s into the category discriminator D_C to obtain the category domain label f_cs of the source domain;
step 3.2.3), inputting the high-level features f_s and the category domain label f_cs into the category attention module CCA simultaneously to obtain the concatenated features of the source domain remote sensing image;
step 3.2.4), inputting the concatenated features obtained in the step 3.2.3) into the classifier C for pixel-by-pixel classification, and up-sampling the classification result to obtain a semantic label prediction map of the same size as the input source domain image;
step 3.2.5), calculating the error between the semantic label prediction map obtained in the step 3.2.4) and the real semantic label map in the source domain data set using the cross entropy loss function, back-propagating the calculated error, and updating the network parameters of the semantic segmentation model S; wherein the cross entropy loss function, expression (1), is:
L_seg = -(1/M) Σ_{k=1}^{M} y^(k) log ŷ^(k)    (1)
in the expression (1), M represents the number of samples, y^(k) represents the true semantic label value of the kth sample, ŷ^(k) represents the predicted label value of the kth sample, and L_seg represents the loss value.
Preferably, said step 3.3) comprises the steps of:
step 3.3.1), inputting the remote sensing images in the target domain data set into the feature extractor F, and extracting the high-level features f_t of the remote sensing images in the target domain data set;
step 3.3.2), inputting the high-level features f_t into the global discriminator D_G to obtain the global domain label f_gt, and inputting the high-level features f_t into the category discriminator D_C to obtain the category domain label f_ct;
step 3.3.3), calculating the global adversarial loss from the global domain label f_gt and the source domain label 0 using the first binary cross entropy loss function, expression (3), and calculating the class-level adversarial loss from the category domain label f_ct and the source domain label 0 using the second binary cross entropy loss function, expression (4); performing a weighted summation of the global adversarial loss and the class-level adversarial loss to obtain a first total adversarial loss, back-propagating this loss, and updating the network parameters of the feature extractor F; wherein the first total adversarial loss function, expression (2), is:
L_adv(X_T) = λ_adv_g · L_adv_g(X_T) + λ_adv_c · L_adv_c(X_T)    (2)
in the expression (2), L_adv_g(X_T) and L_adv_c(X_T) represent the global adversarial loss and the class-level adversarial loss, respectively, λ_adv_g and λ_adv_c represent the weights of the global adversarial loss and the class-level adversarial loss, respectively, and X_T represents an image of the target domain;
L_adv_g(X_T), expression (3), is:
L_adv_g(X_T) = -E_{x~P_T(x)}[log D_g(F(X_T))]    (3)
L_adv_c(X_T), expression (4), is:
L_adv_c(X_T) = -Σ_{i=1}^{N} E_{x~P_T(x)}[log D_c^i(F(X_T))]    (4)
in the expressions (3) and (4), P_T(x) represents the data distribution of the target domain data set, x~P_T(x) indicates that the remote sensing images in the target domain data set obey the distribution P_T(x), E_{x~P_T(x)} represents the expectation over x~P_T(x), F(X_T) represents the target domain features extracted by the feature extractor F, D_g(F(X_T)) represents the global discriminator output for the target domain image, D_c^i(F(X_T)) represents the class-level discriminator output for the target domain image for the ith class, and N represents the number of classes.
Preferably, the specific process of step 3.4) is as follows:
inputting the high-level features f_s extracted in the step 3.2.1) and the high-level features f_t extracted in the step 3.3.1) into the combined discriminator D respectively, and outputting the global domain labels f_gs, f_gt and the category domain labels f_cs, f_ct through the combined discriminator D; calculating the global adversarial loss L_adv_g(X_S, X_T) from the output global domain labels f_gs, f_gt with the source domain label 0 and the target domain label 1 using the third binary cross entropy loss function, expression (6), and calculating the class-level adversarial loss L_adv_c(X_S, X_T) from the output category domain labels f_cs, f_ct with the source domain label 0 and the target domain label 1 using the fourth binary cross entropy loss function, expression (7); performing a weighted summation of the global adversarial loss L_adv_g(X_S, X_T) and the class-level adversarial loss L_adv_c(X_S, X_T) to obtain the second total adversarial loss of expression (5), back-propagating the second total adversarial loss, and updating the network parameters of the combined discriminator D; wherein the second total adversarial loss function, expression (5), is:
L_adv(X_S, Y_S, X_T) = λ_adv_g · L_adv_g(X_S, X_T) + λ_adv_c · L_adv_c(X_S, X_T)    (5)
in the expression (5), L_adv(X_S, Y_S, X_T) represents the second total adversarial loss value, X_S represents an image of the source domain, X_T represents an image of the target domain, L_adv_g(X_S, X_T) and L_adv_c(X_S, X_T) represent the global adversarial loss and the class-level adversarial loss, respectively, and λ_adv_g and λ_adv_c represent the weight of the global adversarial loss and the weight of the class-level adversarial loss, respectively;
L_adv_g(X_S, X_T), expression (6), is:
L_adv_g(X_S, X_T) = -E_{x~P_S(x)}[log D_g(F(X_S))] - E_{x~P_T(x)}[log(1 - D_g(F(X_T)))]    (6)
L_adv_c(X_S, X_T), expression (7), is:
L_adv_c(X_S, X_T) = -Σ_{i=1}^{N} (E_{x~P_S(x)}[log D_c^i(F(X_S))] + E_{x~P_T(x)}[log(1 - D_c^i(F(X_T)))])    (7)
in the expressions (6) and (7), P_S(x) represents the data distribution of the source domain data set, P_T(x) represents the data distribution of the target domain data set, x~P_S(x) indicates that the remote sensing images in the source domain data set obey the distribution P_S(x), x~P_T(x) indicates that the remote sensing images in the target domain data set obey the distribution P_T(x), E_{x~P_S(x)} represents the expectation over x~P_S(x), E_{x~P_T(x)} represents the expectation over x~P_T(x), F(X_S) represents the source domain features extracted by the feature extractor F, F(X_T) represents the target domain features extracted by the feature extractor F, D_g(F(X_S)) and D_g(F(X_T)) represent the global discriminator outputs for the source domain image and the target domain image, respectively, D_c(F(X_S)) and D_c(F(X_T)) represent the class-level discriminator outputs for the source domain image and the target domain image, respectively, D_c^i(F(X_S)) and D_c^i(F(X_T)) represent the class-level discriminator outputs for the source domain image and the target domain image for the ith class, and N represents the number of classes.
Preferably, the feature extractor F is the convolutional feature extractor ResNet-101.
Preferably, the convergence difference value of the converged remote sensing image domain adaptive semantic segmentation model obtained in the step 3.5) is 0.05-0.15.
Preferably, the step 2) includes:
step 2.1), inputting the remote sensing images in the source domain data set into Deeplab-v2 to obtain a pixel-by-pixel prediction result;
step 2.2), calculating the error between the prediction result obtained in the step 2.1) and the real semantic label map using the cross entropy loss function, expression (1), back-propagating the calculated error, and updating the Deeplab-v2 parameters;
step 2.3), repeating the steps 2.1)-2.2), obtaining a converged Deeplab-v2 semantic segmentation model, and storing the parameters of the converged Deeplab-v2 semantic segmentation model.
Preferably, the convergence difference of the converged Deeplab-v2 semantic segmentation model obtained in step 2.3) is 0.05-0.15.
Preferably, the step 1) further includes cropping the images in the target domain data set and the images in the source domain data set with the real semantic label map at an inverse ratio of the resolutions to obtain corresponding image blocks.
For convenience of description, the source domain label is defined as 0 and the target domain label as 1.
The technical scheme of the invention has the following beneficial effects:
the invention relates to a remote sensing image domain adaptive semantic segmentation method in which a combined discriminator D is constructed from a global discriminator D_G and a category discriminator D_C arranged in parallel; the combined discriminator D promotes the consistency of local joint distributions while pursuing global distribution alignment, thereby improving the recognition performance of the semantic segmentation model on the target domain data set. The invention also comprises a category attention module CCA, which can adaptively strengthen the attention paid to misaligned categories and regions in the remote sensing images of the source domain data set according to the class-level certainty estimates of the combined discriminator D, reduce the attention paid to aligned categories and regions in the image, and improve the performance of the classifier C on the target data set. The remote sensing image domain adaptive semantic segmentation method of the invention improves the accuracy of cross-domain semantic segmentation and solves the problems of negative transfer, transfer difficulty, and differing degrees of discrepancy during domain knowledge transfer that afflict existing domain adaptation methods.
In addition to the objects, features and advantages described above, other objects, features and advantages of the present invention are also provided. The present invention will be described in further detail below with reference to the drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of a method for adaptive semantic segmentation of a remote sensing image domain according to embodiment 1 of the present invention;
FIG. 2 is a diagram of a remote sensing image domain adaptive semantic segmentation model network architecture in embodiment 1;
fig. 3 is a network configuration diagram of a category attention module CCA in embodiment 1;
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
Example 1:
the source domain data set and the target domain data set selected in this embodiment 1 are respectively the Potsdam data set and the Vaihingen data set, from the high-resolution remote sensing image data sets downloaded from the website of the International Society for Photogrammetry and Remote Sensing (ISPRS).
Referring to fig. 1, a remote sensing image domain adaptive semantic segmentation method includes the following steps:
step 1), respectively determining a source domain data set (specifically the Potsdam data set) and a target domain data set (specifically the Vaihingen data set) according to actual needs, performing semantic label processing on the source domain data set to obtain the corresponding real semantic label maps, and performing no semantic label processing on the target domain data set; the step 1) further comprises cropping the images in the target domain data set and the images with real semantic label maps in the source domain data set at the inverse resolution ratio of 5:9 to obtain corresponding image blocks, which are respectively used as the target domain data set and the source domain data set in the subsequent steps, wherein the images in the target domain data set are cropped into image blocks of size 512 and the images with real semantic label maps in the source domain data set are cropped into image blocks of size 960 (a cropping sketch follows this list);
step 2), training a Deeplab-v2 semantic segmentation model on a source domain data set;
step 3), constructing a remote sensing image domain adaptive semantic segmentation model by using a Deeplab-v2 semantic segmentation model;
and 4) inputting the remote sensing image in the target domain data set into the remote sensing image domain adaptive semantic segmentation model in the step 3) for prediction to obtain a semantic segmentation prediction map of the target domain data set.
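A minimal sketch of the cropping in step 1) is given below. The non-overlapping tiling, the placeholder image sizes, and the function name crop_to_blocks are assumptions for illustration; only the block sizes of 512 and 960 pixels are taken from the embodiment.

```python
import numpy as np

def crop_to_blocks(image: np.ndarray, block: int) -> list:
    """Split an H x W x C image into non-overlapping block x block tiles,
    discarding incomplete tiles at the right and bottom borders."""
    h, w = image.shape[:2]
    return [image[r:r + block, c:c + block]
            for r in range(0, h - block + 1, block)
            for c in range(0, w - block + 1, block)]

# Block sizes in inverse proportion to the ground resolutions
# (Potsdam 5 cm/pixel vs. Vaihingen 9 cm/pixel), per the embodiment;
# the 6000x6000 and 2500x2000 image sizes are placeholders.
source_blocks = crop_to_blocks(np.zeros((6000, 6000, 3), np.uint8), 960)
target_blocks = crop_to_blocks(np.zeros((2500, 2000, 3), np.uint8), 512)
```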
In the step 3), referring to fig. 2, the remote sensing image domain adaptive semantic segmentation model comprises a semantic segmentation model S and a combined discriminator D, wherein the semantic segmentation model S comprises a feature extractor F, a category attention module CCA and a classifier C which are sequentially connected; the combined discriminator D comprises a global discriminator D_G and a category discriminator D_C arranged in parallel, the input ends of both are connected to the feature extractor F, and the category discriminator D_C is connected to the category attention module CCA; the global discriminator D_G outputs the global domain labels f_gt and f_gs after processing the data input from the feature extractor F, and the category discriminator D_C outputs the category domain labels f_ct and f_cs after processing the data input from the feature extractor F. The shared parameters shown in fig. 2 are specifically the parameters of the feature extractor F in table 2.
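As one concrete reading of the architecture in fig. 2, the sketch below wires F, D_G, D_C and C together in PyTorch. The six-class count, the channel widths, the discriminator head design, and the use of a plain torchvision ResNet-101 (without the atrous convolution of the embodiment) are all assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

N_CLASSES = 6  # six ISPRS categories; an assumption for illustration

class FeatureExtractor(nn.Module):
    """F: ResNet-101 truncated before its classification head."""
    def __init__(self):
        super().__init__()
        backbone = resnet101(weights=None)
        self.body = nn.Sequential(*list(backbone.children())[:-2])

    def forward(self, x):            # B x 3 x H x W -> B x 2048 x h x w
        return self.body(x)

def disc_head(in_ch: int, out_ch: int) -> nn.Module:
    """Small fully convolutional head shared by D_G and D_C."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 256, 3, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(256, out_ch, 3, padding=1))

class GlobalDiscriminator(nn.Module):   # D_G: one domain score map
    def __init__(self, in_ch=2048):
        super().__init__()
        self.head = disc_head(in_ch, 1)

    def forward(self, f):
        return self.head(f)             # B x 1 x h x w logits

class ClassDiscriminator(nn.Module):    # D_C: one domain score map per class
    def __init__(self, in_ch=2048, n_classes=N_CLASSES):
        super().__init__()
        self.head = disc_head(in_ch, n_classes)

    def forward(self, f):
        return self.head(f)             # B x N x h x w logits

class Classifier(nn.Module):            # C: pixel-wise classification of the
    def __init__(self, in_ch=2048 + N_CLASSES, n_classes=N_CLASSES):
        super().__init__()              # concatenated CCA features
        self.head = nn.Conv2d(in_ch, n_classes, 1)

    def forward(self, feat):
        return self.head(feat)
```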
In the step 3), the step of constructing the remote sensing image domain adaptive semantic segmentation model comprises the following steps:
step 3.1), respectively initializing the parameters of the feature extractor F and the classifier C with the parameters of the Deeplab-v2 semantic segmentation model from the step 2);
step 3.2), training a semantic segmentation model S on the source domain data set, and updating network parameters of the semantic segmentation model S;
step 3.3), updating the parameters of the feature extractor F based on the target domain data set;
step 3.4), updating the network parameters of the combined discriminator D based on the source domain data set and the target domain data set;
step 3.5), repeating the steps 3.2)-3.4) until a converged remote sensing image domain adaptive semantic segmentation model is obtained, and storing the parameters of the converged remote sensing image domain adaptive semantic segmentation model, which are shown in the table 2; a runnable skeleton of this alternating schedule is sketched below.
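The alternation of steps 3.2)-3.4) amounts to a standard adversarial training schedule. The minimal skeleton below uses tiny stand-in modules so that it runs; the optimizers, learning rates, the omission of CCA, and the fusion of D_G and D_C into one module are assumptions. The BCE targets follow expressions (3) and (6), under which D is pushed toward 1 on source-like features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-ins so the skeleton runs; fuller modules are sketched elsewhere.
F_net = nn.Conv2d(3, 8, 3, padding=1)   # feature extractor F
C_net = nn.Conv2d(8, 6, 1)              # classifier C (CCA omitted here)
D_net = nn.Conv2d(8, 7, 1)              # combined D: 1 global + 6 class maps

opt_S = torch.optim.SGD(list(F_net.parameters()) + list(C_net.parameters()), lr=2.5e-4)
opt_D = torch.optim.Adam(D_net.parameters(), lr=1e-4)

def set_requires_grad(net: nn.Module, flag: bool) -> None:
    for p in net.parameters():
        p.requires_grad = flag

for it in range(2):                     # repeat 3.2)-3.4) until convergence
    x_s, x_t = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
    y_s = torch.randint(0, 6, (1, 64, 64))

    # 3.2) supervised update of the segmentation model S on the source domain
    set_requires_grad(D_net, False)
    loss_seg = F.cross_entropy(C_net(F_net(x_s)), y_s)
    opt_S.zero_grad(); loss_seg.backward(); opt_S.step()

    # 3.3) adversarial update of F on the target domain: per expression (3),
    # push D's outputs on target features toward the source side
    d_t = D_net(F_net(x_t))
    loss_adv = F.binary_cross_entropy_with_logits(d_t, torch.ones_like(d_t))
    opt_S.zero_grad(); loss_adv.backward(); opt_S.step()

    # 3.4) update D to separate the two domains, per expression (6)
    set_requires_grad(D_net, True)
    d_s = D_net(F_net(x_s).detach())
    d_t = D_net(F_net(x_t).detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_s, torch.ones_like(d_s)) +
              F.binary_cross_entropy_with_logits(d_t, torch.zeros_like(d_t)))
    opt_D.zero_grad(); loss_d.backward(); opt_D.step()
```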
The step 3.2) comprises the following steps:
step 3.2.1), inputting the remote sensing images in the source domain data set into the feature extractor F, and extracting the high-level features f_s of the remote sensing images in the source domain data set;
step 3.2.2), inputting the high-level features f_s into the category discriminator D_C to obtain the category domain label f_cs of the source domain;
step 3.2.3), inputting the high-level features f_s and the category domain label f_cs into the category attention module CCA simultaneously to obtain the concatenated features of the source domain remote sensing image; referring to fig. 3, the category attention module CCA operates as follows: the module first applies two convolution operations (with 1×1 convolution kernels) to the input high-level features f_s to obtain a feature map X' and a feature map X''; the transformed category domain label f_cs is matrix-multiplied with the feature map X', and a category affinity attention map is obtained through a softmax layer; this map is transposed and matrix-multiplied with the feature map X'' to obtain the attention feature map; finally, the attention feature map is concatenated with the input high-level features f_s to obtain the concatenated features; each pixel in the concatenated features carries both the class-certainty feature map and the high-level features f_s, so the concatenated features let the classifier C selectively focus on aligned and misaligned regions and categories based on the affinity attention, thereby improving its classification performance on the target domain data set (see the sketch after this list);
step 3.2.4), inputting the concatenated features obtained in the step 3.2.3) into the classifier C for pixel-by-pixel classification, and up-sampling the classification result to obtain a semantic label prediction map of the same size as the input source domain image;
step 3.2.5), calculating the error between the semantic label prediction map obtained in the step 3.2.4) and the real semantic label map in the source domain data set using the cross entropy loss function, back-propagating the calculated error, and updating the network parameters of the semantic segmentation model S, the network parameters of the semantic segmentation model S being shown in Table 2; the cross entropy loss function, expression (1), is:
L_seg = -(1/M) Σ_{k=1}^{M} y^(k) log ŷ^(k)    (1)
in the expression (1), M represents the number of samples, y^(k) represents the true semantic label value of the kth sample, ŷ^(k) represents the predicted label value of the kth sample, and L_seg represents the loss value.
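The shapes below are one self-consistent reading of the CCA description in step 3.2.3) (the patent does not fix the channel widths, so mid_ch and the placement of the softmax are assumptions), followed by the pixel-wise classification, up-sampling, and cross entropy loss of steps 3.2.4)-3.2.5). The demo tensors stand in for f_s and f_cs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CCA(nn.Module):
    """Category attention module of step 3.2.3), one possible reading."""
    def __init__(self, in_ch=2048, mid_ch=256):
        super().__init__()
        self.proj1 = nn.Conv2d(in_ch, mid_ch, 1)   # 1x1 conv -> X'
        self.proj2 = nn.Conv2d(in_ch, mid_ch, 1)   # 1x1 conv -> X''

    def forward(self, f, f_c):
        b, _, h, w = f.shape
        x1 = self.proj1(f).flatten(2)              # B x C' x HW
        x2 = self.proj2(f).flatten(2)              # B x C' x HW
        fc = f_c.flatten(2)                        # B x N x HW (D_C output)
        # category affinity attention map via a softmax layer
        affinity = torch.softmax(torch.bmm(fc, x1.transpose(1, 2)), dim=-1)  # B x N x C'
        attn = torch.bmm(affinity, x2).view(b, -1, h, w)   # attention feature map
        return torch.cat([attn, f], dim=1)         # concatenated features

f_s  = torch.randn(2, 2048, 32, 32)   # high-level source features from F
f_cs = torch.randn(2, 6, 32, 32)      # category domain label from D_C
feat = CCA()(f_s, f_cs)               # B x (6 + 2048) x 32 x 32

# steps 3.2.4)-3.2.5): pixel-by-pixel classification, up-sampling to the
# input size, and the cross entropy loss of expression (1)
classifier = nn.Conv2d(2048 + 6, 6, 1)
logits = F.interpolate(classifier(feat), size=(256, 256),
                       mode='bilinear', align_corners=False)
label_s = torch.randint(0, 6, (2, 256, 256))       # real semantic label map
loss_seg = F.cross_entropy(logits, label_s)
```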
The step 3.3) comprises the following steps:
step 3.3.1), inputting the remote sensing images in the target domain data set into the feature extractor F, and extracting the high-level features f_t of the remote sensing images in the target domain data set;
step 3.3.2), inputting the high-level features f_t into the global discriminator D_G to obtain the global domain label f_gt, and inputting the high-level features f_t into the category discriminator D_C to obtain the category domain label f_ct;
step 3.3.3), calculating the global adversarial loss from the global domain label f_gt and the source domain label 0 using the first binary cross entropy loss function, expression (3), and calculating the class-level adversarial loss from the category domain label f_ct and the source domain label 0 using the second binary cross entropy loss function, expression (4); performing a weighted summation of the global adversarial loss and the class-level adversarial loss to obtain a first total adversarial loss, back-propagating this loss, and updating the network parameters of the feature extractor F, the network parameters of the feature extractor F being shown in Table 2; the first total adversarial loss function, expression (2), is:
L_adv(X_T) = λ_adv_g · L_adv_g(X_T) + λ_adv_c · L_adv_c(X_T)    (2)
in the expression (2), L_adv_g(X_T) and L_adv_c(X_T) represent the global adversarial loss and the class-level adversarial loss, respectively, λ_adv_g and λ_adv_c represent the weights of the global adversarial loss and the class-level adversarial loss, respectively, and X_T represents an image of the target domain;
L_adv_g(X_T), expression (3), is:
L_adv_g(X_T) = -E_{x~P_T(x)}[log D_g(F(X_T))]    (3)
L_adv_c(X_T), expression (4), is:
L_adv_c(X_T) = -Σ_{i=1}^{N} E_{x~P_T(x)}[log D_c^i(F(X_T))]    (4)
in the expressions (3) and (4), P_T(x) represents the data distribution of the target domain data set, x~P_T(x) indicates that the remote sensing images in the target domain data set obey the distribution P_T(x), E_{x~P_T(x)} represents the expectation over x~P_T(x), F(X_T) represents the target domain features extracted by the feature extractor F, D_g(F(X_T)) represents the global discriminator output for the target domain image, D_c^i(F(X_T)) represents the class-level discriminator output for the target domain image for the ith class, and N represents the number of classes.
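A sketch of the first total adversarial loss of expressions (2)-(4), computed from the discriminator logits on target features; the argument names, the per-channel reduction, and the default weights λ_adv_g = λ_adv_c = 1 are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def feature_adv_loss(d_g_t: torch.Tensor, d_c_t: torch.Tensor,
                     lam_g: float = 1.0, lam_c: float = 1.0) -> torch.Tensor:
    """Expression (2): weighted sum of the global term (3) and the
    class-level term (4). d_g_t: B x 1 x h x w logits of D_G on target
    features; d_c_t: B x N x h x w logits of D_C on target features."""
    # (3): L_adv_g(X_T) = -E[log D_g(F(X_T))]
    l_g = F.binary_cross_entropy_with_logits(d_g_t, torch.ones_like(d_g_t))
    # (4): L_adv_c(X_T) = -sum_i E[log D_c^i(F(X_T))], i.e. the same BCE
    # averaged per class channel and summed over the N channels
    l_c = F.binary_cross_entropy_with_logits(
        d_c_t, torch.ones_like(d_c_t), reduction='none').mean(dim=(0, 2, 3)).sum()
    return lam_g * l_g + lam_c * l_c

loss = feature_adv_loss(torch.randn(2, 1, 32, 32), torch.randn(2, 6, 32, 32))
print(float(loss))
```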
The specific process of the step 3.4) is as follows:
inputting the high-level features f_s extracted in the step 3.2.1) and the high-level features f_t extracted in the step 3.3.1) into the combined discriminator D respectively, and outputting the global domain labels f_gs, f_gt and the category domain labels f_cs, f_ct through the combined discriminator D; calculating the global adversarial loss L_adv_g(X_S, X_T) from the output global domain labels f_gs, f_gt with the source domain label 0 and the target domain label 1 using the third binary cross entropy loss function, expression (6), and calculating the class-level adversarial loss L_adv_c(X_S, X_T) from the output category domain labels f_cs, f_ct with the source domain label 0 and the target domain label 1 using the fourth binary cross entropy loss function, expression (7); performing a weighted summation of the global adversarial loss L_adv_g(X_S, X_T) and the class-level adversarial loss L_adv_c(X_S, X_T) to obtain the second total adversarial loss of expression (5), back-propagating the second total adversarial loss, and updating the network parameters of the combined discriminator D; the network parameters of the combined discriminator D are shown in table 2, and the second total adversarial loss function, expression (5), is:
L_adv(X_S, Y_S, X_T) = λ_adv_g · L_adv_g(X_S, X_T) + λ_adv_c · L_adv_c(X_S, X_T)    (5)
in the expression (5), L_adv(X_S, Y_S, X_T) represents the second total adversarial loss value, X_S represents an image of the source domain, X_T represents an image of the target domain, L_adv_g(X_S, X_T) and L_adv_c(X_S, X_T) represent the global adversarial loss and the class-level adversarial loss, respectively, and λ_adv_g and λ_adv_c represent the weight of the global adversarial loss and the weight of the class-level adversarial loss, respectively;
L_adv_g(X_S, X_T), expression (6), is:
L_adv_g(X_S, X_T) = -E_{x~P_S(x)}[log D_g(F(X_S))] - E_{x~P_T(x)}[log(1 - D_g(F(X_T)))]    (6)
L_adv_c(X_S, X_T), expression (7), is:
L_adv_c(X_S, X_T) = -Σ_{i=1}^{N} (E_{x~P_S(x)}[log D_c^i(F(X_S))] + E_{x~P_T(x)}[log(1 - D_c^i(F(X_T)))])    (7)
in the expressions (6) and (7), P_S(x) represents the data distribution of the source domain data set, P_T(x) represents the data distribution of the target domain data set, x~P_S(x) indicates that the remote sensing images in the source domain data set obey the distribution P_S(x), x~P_T(x) indicates that the remote sensing images in the target domain data set obey the distribution P_T(x), E_{x~P_S(x)} represents the expectation over x~P_S(x), E_{x~P_T(x)} represents the expectation over x~P_T(x), F(X_S) represents the source domain features extracted by the feature extractor F, F(X_T) represents the target domain features extracted by the feature extractor F, D_g(F(X_S)) and D_g(F(X_T)) represent the global discriminator outputs for the source domain image and the target domain image, respectively, D_c(F(X_S)) and D_c(F(X_T)) represent the class-level discriminator outputs for the source domain image and the target domain image, respectively, D_c^i(F(X_S)) and D_c^i(F(X_T)) represent the class-level discriminator outputs for the source domain image and the target domain image for the ith class, and N represents the number of classes.
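A sketch of the second total adversarial loss of expressions (5)-(7) for the combined discriminator update; the argument names, the per-channel reduction, and the default weights are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def disc_loss(d_g_s, d_g_t, d_c_s, d_c_t, lam_g=1.0, lam_c=1.0):
    """Expression (5): the second total adversarial loss. Per (6)-(7), the
    combined discriminator D is trained toward 1 on source features and 0
    on target features, both globally (D_G) and per class channel (D_C)."""
    bce = F.binary_cross_entropy_with_logits
    # (6): -E_S[log D_g(F(X_S))] - E_T[log(1 - D_g(F(X_T)))]
    l_g = bce(d_g_s, torch.ones_like(d_g_s)) + bce(d_g_t, torch.zeros_like(d_g_t))
    # (7): the same two-sided BCE per class channel, summed over the N classes
    per_ch = (bce(d_c_s, torch.ones_like(d_c_s), reduction='none') +
              bce(d_c_t, torch.zeros_like(d_c_t), reduction='none'))
    l_c = per_ch.mean(dim=(0, 2, 3)).sum()
    return lam_g * l_g + lam_c * l_c

loss = disc_loss(torch.randn(2, 1, 32, 32), torch.randn(2, 1, 32, 32),
                 torch.randn(2, 6, 32, 32), torch.randn(2, 6, 32, 32))
print(float(loss))
```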
The feature extractor F is the convolutional feature extractor ResNet-101, which uses atrous (dilated) convolution to extract the high-level features f_s of the remote sensing images in the source domain data set and the high-level features f_t of the remote sensing images in the target domain data set.
In the step 3.5), the steps 3.2)-3.4) are repeated until the loss values L_seg, L_adv(X_T) and L_adv(X_S, Y_S, X_T) decrease to a minimum (the minimum approaching zero) and oscillate stably; the convergence difference of the converged remote sensing image domain adaptive semantic segmentation model thus obtained is 0.1.
The step 2) comprises the following steps:
step 2.1), inputting the remote sensing images in the source domain data set into Deeplab-v2 to obtain a pixel-by-pixel prediction result;
step 2.2), calculating the error between the prediction result obtained in the step 2.1) and the real semantic label map using the cross entropy loss function, expression (1), back-propagating the calculated error, and updating the Deeplab-v2 parameters, which are shown in Table 3;
step 2.3), repeating the steps 2.1)-2.2) until the loss value L_seg decreases to a minimum (the minimum approaching zero) and oscillates stably, obtaining a converged Deeplab-v2 semantic segmentation model, and saving the parameters of the converged Deeplab-v2 semantic segmentation model, which are shown in Table 3.
The convergence difference of the Deeplab-v2 semantic segmentation model obtained in step 2.3) was 0.1.
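A minimal sketch of the pretraining loop of steps 2.1)-2.3); the stand-in network, the optimizer settings, and the file name are placeholders (any per-pixel classifier with N output channels fits the same loop).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# stand-in for Deeplab-v2: any per-pixel classifier with N output channels
deeplab_v2 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                           nn.Conv2d(64, 6, 1))
opt = torch.optim.SGD(deeplab_v2.parameters(), lr=2.5e-4, momentum=0.9)

for step in range(2):   # repeat 2.1)-2.2) until L_seg oscillates stably
    x_s = torch.randn(4, 3, 128, 128)          # source-domain images
    y_s = torch.randint(0, 6, (4, 128, 128))   # real semantic label maps
    loss = F.cross_entropy(deeplab_v2(x_s), y_s)   # 2.1)-2.2), expression (1)
    opt.zero_grad(); loss.backward(); opt.step()

torch.save(deeplab_v2.state_dict(), "deeplab_v2_source.pth")   # 2.3)
```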
Comparative example 1:
the difference from example 1 is: step 3) is omitted, the remote sensing image in the target domain data set is input into the Deeplab-v2 semantic segmentation model constructed in the step 2) for prediction, and a semantic segmentation prediction map of the target domain data set is obtained.
Comparative example 2:
the difference from example 1 is: omitting the category discriminator DCAnd a class attention module CCA.
Comparative example 3:
the difference from example 1 is: the class attention module CCA is omitted.
Comparative example 4:
the MCD-DA domain adaptive semantic segmentation model is adopted, and the remote sensing images in the target domain data set are input into the MCD-DA domain adaptive semantic segmentation model for prediction to obtain a semantic segmentation prediction map of the target domain data set.
Comparative example 5:
the ADVENT domain adaptive semantic segmentation model is adopted, and the remote sensing images in the target domain data set are input into the ADVENT domain adaptive semantic segmentation model for prediction to obtain a semantic segmentation prediction map of the target domain data set.
Comparative example 6:
the Benjdira's domain adaptive semantic segmentation model is adopted, and the remote sensing images in the target domain data set are input into the Benjdira's domain adaptive semantic segmentation model for prediction to obtain a semantic segmentation prediction map of the target domain data set.
Comparative example 7:
the AdaptSegNet domain adaptive semantic segmentation model is adopted, and the remote sensing images in the target domain data set are input into the AdaptSegNet domain adaptive semantic segmentation model for prediction to obtain a semantic segmentation prediction map of the target domain data set.
Comparative example 8:
the CLAN domain adaptive semantic segmentation model is adopted, and the remote sensing images in the target domain data set are input into the CLAN domain adaptive semantic segmentation model for prediction to obtain a semantic segmentation prediction map of the target domain data set.
The domain adaptation results of example 1 and comparative examples 1-8 for transfer from the Potsdam data set to the Vaihingen data set are detailed in Table 1; the data in Table 1 are given as the F1 score, OA, MA and mIoU. The expression (8) for the F1 score is specifically:
F1 = 2 × Precision × Recall / (Precision + Recall)    (8)
in the expression (8), Precision = n_ii / Σ_j n_ji and Recall = n_ii / Σ_j n_ij;
in the expression (8), F1 represents the F1 score, Precision represents the precision, Recall represents the recall, n_ii represents the number of pixels of the ith class predicted as the ith class, n_ij represents the number of pixels of the ith class predicted as the jth class, n_ji represents the number of pixels of the jth class predicted as the ith class, i indicates the ith class, and j indicates the jth class.
OA represents the overall accuracy; its expression (9) is specifically:
OA = Σ_i n_ii / Σ_i Σ_j n_ij    (9)
in the expression (9), n_ii represents the number of pixels correctly predicted for the ith class, and n_ij represents the number of pixels of the ith class predicted as the jth class.
MA represents the average accuracy; its expression (10) is specifically:
MA = (1/n_cl) Σ_i (n_ii / Σ_j n_ji)    (10)
in the expression (10), n_ii represents the number of pixels correctly predicted for the ith class, n_ji represents the number of pixels of the jth class predicted as the ith class, and n_cl refers to the number of classes in the data set.
mIoU represents the mean intersection-over-union; its expression (11) is specifically:
mIoU = (1/n_cl) Σ_i n_ii / (Σ_j n_ij + Σ_j n_ji - n_ii)    (11)
in the expression (11), n_ii represents the number of pixels correctly predicted for the ith class, n_ij represents the number of pixels of the ith class predicted as the jth class, n_ji represents the number of pixels of the jth class predicted as the ith class, and n_cl refers to the number of classes in the data set.
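All four metrics follow directly from a confusion matrix with rows as true classes and columns as predicted classes. The sketch below implements expressions (8)-(11); the MA of expression (10) is computed exactly as printed, over Σ_j n_ji.

```python
import numpy as np

def confusion_matrix(y_true: np.ndarray, y_pred: np.ndarray, n_cl: int) -> np.ndarray:
    """n[i, j] = number of pixels of true class i predicted as class j."""
    idx = n_cl * y_true.reshape(-1) + y_pred.reshape(-1)
    return np.bincount(idx, minlength=n_cl * n_cl).reshape(n_cl, n_cl)

def scores(cm: np.ndarray):
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)        # n_ii / sum_j n_ji
    recall = tp / cm.sum(axis=1)           # n_ii / sum_j n_ij
    f1 = 2 * precision * recall / (precision + recall)            # (8)
    oa = tp.sum() / cm.sum()                                      # (9)
    ma = precision.mean()                                         # (10), as printed
    miou = (tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp)).mean()   # (11)
    return f1, oa, ma, miou

cm = confusion_matrix(np.random.randint(0, 6, (512, 512)),
                      np.random.randint(0, 6, (512, 512)), 6)
f1, oa, ma, miou = scores(cm)
```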
Table 1. Domain adaptation results of example 1 and comparative examples 1-8 for transfer from the Potsdam data set to the Vaihingen data set
TABLE 2 parameters of the remote sensing image domain adaptive semantic segmentation model
TABLE 3 network parameters of Deeplab-v2
As shown by the data in Table 1, compared with comparative examples 1-8 (of which comparative examples 4-8 are existing domain adaptive semantic segmentation models), example 1 of the present invention obtained the highest values of OA, MA and mIoU, reaching 73.62%, 63.03% and 45.91%, respectively, demonstrating that example 1 achieves the best cross-domain semantic segmentation performance. Furthermore, the F1 scores of example 1 on the impervious surface, building, low vegetation, car and clutter classes were 80.30%, 84.24%, 65.59%, 40.57% and 28.85%, respectively, achieving the best accuracy and demonstrating the effectiveness of the class-level alignment of example 1.
Compared with comparative example 1, the domain adaptive semantic segmentation model of comparative example 2, constructed with the global discriminator D_G, improves OA, MA and mIoU by 18.48%, 6.22% and 12.60%, respectively, because the global discriminator D_G can align the feature spaces of the source domain data set and the target domain data set through adversarial learning and eliminate their distribution difference in the feature space, thereby improving the accuracy of cross-domain semantic segmentation.
Compared with comparative example 2, the domain adaptive semantic segmentation model of comparative example 3, constructed with the combined discriminator D, improves OA, MA and mIoU by 4.88%, 5.74% and 4.84%, respectively; comparative example 3 also resolves the negative transfer on the tree class caused by comparative example 2. This is because comparative example 2 neglects the consistency of local class semantics during global feature alignment, whereas comparative example 3 promotes the consistency of local semantics from a category perspective through the category discriminator D_C.
Compared with comparative example 3, the remote sensing image domain adaptive semantic segmentation model constructed in example 1 improves OA, MA and mIoU by 1.41%, 2.89% and 2.56%, respectively, and greatly improves the segmentation performance on hard-to-align classes (such as low vegetation, tree and clutter). This is because the category attention module CCA can adaptively pay more attention to misaligned regions and categories through the parameter self-learning of table 2, while reducing the attention paid to aligned regions and categories, thereby improving the segmentation performance of the remote sensing image domain adaptive semantic segmentation model on hard-to-align categories and regions and raising the overall segmentation accuracy.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.