Image restoration method based on multi-frequency sub-band probabilistic inference model


1. An image restoration method based on a multi-frequency sub-band probabilistic inference model is characterized in that: the method comprises a training stage and an actual measurement stage;

in the training stage, a parallel structure consisting of an inference network and a generation network is adopted, and the generation network carries out the image restoration task with the assistance of the inference network:

the inference network is used to estimate the latent-variable distribution of the real image during training; its specific operation is as follows (a code sketch is given after step four):

step one: for a real image $I_{gt}$, iteratively decompose the real image into 4 sub-band images using the discrete wavelet transform (DWT), where the low-frequency sub-band is $x^{gt}_{L}$ (the LL band) and the high-frequency sub-bands are $x^{gt}_{H}$ (the LH, HL and HH bands);

step two: input the high-frequency and low-frequency sub-bands of the real image obtained in step one into different encoders to obtain the latent variables of the high- and low-frequency sub-bands, $z^{gt}_{H}$ and $z^{gt}_{L}$;

step three: concatenate the high- and low-frequency latent variables obtained in step two and input them into a decoder to obtain a reconstructed image;

step four: input the real image and the reconstructed image obtained in step three into a discriminator network $D_{infer}$, and iteratively adjust the parameters of the generator $G_{infer}$ through the loss function until the loss function converges and the parameters of $G_{infer}$ reach their optimal values, at which point training stops;
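Below is a minimal PyTorch-style sketch of steps one to four of the inference network: a single-level Haar DWT, two frequency-specific encoders that produce Gaussian latents, and a small decoder that reconstructs the image from the concatenated latents. The Haar wavelet, channel widths, layer counts and latent size are assumptions made for illustration; the text does not fix these choices.

```python
import torch
import torch.nn as nn

def haar_dwt(x):
    """Single-level Haar DWT of a (B, C, H, W) tensor -> LL, LH, HL, HH sub-bands."""
    a, b = x[:, :, 0::2, 0::2], x[:, :, 0::2, 1::2]   # top-left, top-right pixels
    c, d = x[:, :, 1::2, 0::2], x[:, :, 1::2, 1::2]   # bottom-left, bottom-right pixels
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

class SubbandEncoder(nn.Module):
    """Encodes one group of sub-bands into a Gaussian latent (mu, logvar) and a sample z."""
    def __init__(self, in_ch, z_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 2 * z_ch, 3, padding=1),        # -> [mu | logvar]
        )

    def forward(self, x):
        mu, logvar = self.net(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterisation trick
        return z, mu, logvar

class InferenceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_low = SubbandEncoder(in_ch=3)    # LL band of an RGB image (step two)
        self.enc_high = SubbandEncoder(in_ch=9)   # LH, HL, HH bands concatenated (step two)
        self.dec = nn.Sequential(                 # decoder on concatenated latents (step three)
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, i_gt):
        ll, lh, hl, hh = haar_dwt(i_gt)                                   # step one
        z_l, mu_l, logvar_l = self.enc_low(ll)
        z_h, mu_h, logvar_h = self.enc_high(torch.cat([lh, hl, hh], 1))
        i_infer = self.dec(torch.cat([z_l, z_h], dim=1))
        return i_infer, (mu_l, logvar_l), (mu_h, logvar_h)
```

The reconstructed image and the real image can then be fed to the discriminator as in step four.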

the inference network loss function is specifically as follows:

first, a reconstruction loss function is constructed; the loss is defined as the L1 distance between the prediction result $I_{infer}$ and the real image:

$$\mathcal{L}^{infer}_{rec} = \big\| I_{infer} - I_{gt} \big\|_1, \qquad I_{infer} = G_{infer}\big(x^{gt}_{L}, x^{gt}_{H}\big)$$

where $I_{infer}$ denotes the image generated by the inference network, $I_{gt}$ denotes the real image, $x^{gt}_{L}$ and $x^{gt}_{H}$ are the low- and high-frequency sub-bands of the real image, and $G_{infer}$ denotes the generator of the inference network;

then, an adversarial loss function is constructed in the inference network; this loss drives the features of the real image in the discriminator network closer to the features of the reconstructed image, as follows:

where $D_{infer}$ denotes the discriminator of the inference network;

next, the multi-frequency sub-band probabilistic inference model is used to obtain the KL divergence $\mathcal{L}^{infer}_{kl}$ of the inference network. The KL term follows from the variational lower bounds of the inference network:

$$\log p(x_L) \ge \mathbb{E}_{q_\psi(z_L \mid x^{gt}_{L})}\big[\log p(x_L \mid z_L)\big] - \mathrm{KL}\big(q_\psi(z_L \mid x^{gt}_{L}) \,\big\|\, p(z_L)\big)$$

$$\log p(x_H) \ge \mathbb{E}_{q_\psi(z_H \mid x^{gt}_{H})}\big[\log p(x_H \mid z_H)\big] - \mathrm{KL}\big(q_\psi(z_H \mid x^{gt}_{H}) \,\big\|\, p(z_H)\big)$$

where $x_L$ and $x_H$ denote the low- and high-frequency sub-bands of the generative model, $\log p(x_L)$ represents the distribution of the low-frequency sub-bands in the generative model and $\log p(x_H)$ the distribution of the high-frequency sub-bands; the distribution of the latent variables that control the generative model is constrained to a standard normal distribution adapted to the number of pixels $n$ in the missing region; $z_L$ and $z_H$ denote the latent variables of the generative model; $q_\psi(\cdot \mid \cdot)$ is the posterior importance-sampling function, i.e. the distribution of the high- and low-frequency latent variables obtained by encoding the high- and low-frequency sub-bands $x^{gt}_{H}$ and $x^{gt}_{L}$ of the real image; $\mathbb{E}_{q_\psi(z_L \mid x^{gt}_{L})}$ is the expectation of the generated image's low-frequency sub-band distribution under the latent variables obtained by encoding the low-frequency sub-band of the real image, and $\mathbb{E}_{q_\psi(z_H \mid x^{gt}_{H})}$ is the corresponding expectation for the high-frequency sub-band.

Based on the two lower bounds above, the prior is adjusted according to the number of pixels $n$ in the missing part of the image and defined as a Gaussian function $\mathcal{N}(0, \sigma^{2}_{n}\mathbf{I})$; the KL divergence $\mathcal{L}^{infer}_{kl}$ of the inference network, used to minimize the difference between the latent distributions of the real image and of the generative model, is then:

$$\mathcal{L}^{infer}_{kl} = \mathrm{KL}\big(q_\psi(z_L \mid x^{gt}_{L}) \,\big\|\, \mathcal{N}(0, \sigma^{2}_{n}\mathbf{I})\big) + \mathrm{KL}\big(q_\psi(z_H \mid x^{gt}_{H}) \,\big\|\, \mathcal{N}(0, \sigma^{2}_{n}\mathbf{I})\big)$$
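A sketch of the KL term above, assuming diagonal Gaussian posteriors parameterised by mean and log-variance. The precise way the prior variance depends on the number of missing pixels $n$ is not given in the text, so the scaling below is a placeholder assumption.

```python
import math
import torch

def kl_to_prior(mu, logvar, prior_var=1.0):
    """KL( N(mu, exp(logvar)) || N(0, prior_var*I) ), summed over latent dims, mean over batch."""
    kl = 0.5 * (math.log(prior_var) - logvar + (logvar.exp() + mu.pow(2)) / prior_var - 1.0)
    return kl.flatten(1).sum(dim=1).mean()

def infer_kl_loss(mu_l, logvar_l, mu_h, logvar_h, n_missing, n_total):
    # Placeholder prior adjustment: tie the prior variance to the fraction of missing
    # pixels (assumption; the text only states that the prior depends on n).
    prior_var = max(n_missing / n_total, 1e-3)
    return kl_to_prior(mu_l, logvar_l, prior_var) + kl_to_prior(mu_h, logvar_h, prior_var)
```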

finally, the complete loss function of the inference network is obtained:

$$\mathcal{L}^{infer} = \lambda_{rec}\,\mathcal{L}^{infer}_{rec} + \lambda_{adv}\,\mathcal{L}^{infer}_{adv} + \lambda_{kl}\,\mathcal{L}^{infer}_{kl}$$

where the $\lambda$ terms are weight coefficients;

the specific operation of the generation network during training is as follows (a code sketch is given after step four):

step one: for a damaged image $I_m$, iteratively decompose the damaged image into 4 sub-band images using the discrete wavelet transform, where the low-frequency sub-band is $x^{m}_{L}$ (the LL band) and the high-frequency sub-bands are $x^{m}_{H}$ (the LH, HL and HH bands);

step two: input the multi-frequency sub-band representation of the damaged image obtained in step one into the U-net encoder to obtain the latent variables of the high- and low-frequency sub-bands, $z^{m}_{H}$ and $z^{m}_{L}$;

step three: input the latent variables of the high- and low-frequency sub-bands, $z^{m}_{H}$ and $z^{m}_{L}$, obtained in step two into the U-net decoder to obtain a multi-frequency sub-band representation of the generated image, and apply the inverse wavelet transform to this representation to obtain the repaired image;

step four: input the real image and the repaired image obtained in step three into a discriminator network $D_{gen}$, and iteratively adjust the parameters of the generator $G_{gen}$ through the loss function until the loss function converges and the parameters of $G_{gen}$ are optimal, at which point training stops;
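A minimal sketch of steps one to three of the generation network: Haar DWT of the damaged image, a toy U-net-style network over the concatenated sub-bands, and the inverse Haar DWT back to the pixel domain. Latent-variable sampling is omitted for brevity, and all widths and depths are illustrative assumptions.

```python
import torch
import torch.nn as nn

def haar_dwt(x):
    a, b = x[:, :, 0::2, 0::2], x[:, :, 0::2, 1::2]
    c, d = x[:, :, 1::2, 0::2], x[:, :, 1::2, 1::2]
    return (a + b + c + d) / 2, (a + b - c - d) / 2, (a - b + c - d) / 2, (a - b - c + d) / 2

def haar_idwt(ll, lh, hl, hh):
    """Exact inverse of haar_dwt above."""
    a = (ll + lh + hl + hh) / 2
    b = (ll + lh - hl - hh) / 2
    c = (ll - lh + hl - hh) / 2
    d = (ll - lh - hl + hh) / 2
    out = ll.new_zeros(ll.size(0), ll.size(1), ll.size(2) * 2, ll.size(3) * 2)
    out[:, :, 0::2, 0::2], out[:, :, 0::2, 1::2] = a, b
    out[:, :, 1::2, 0::2], out[:, :, 1::2, 1::2] = c, d
    return out

class WaveletUNet(nn.Module):
    """Toy U-net mapping the 12 corrupted sub-band channels (4 bands x RGB) to repaired ones."""
    def __init__(self, ch=12):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(ch, 64, 4, 2, 1), nn.LeakyReLU(0.2))
        self.down2 = nn.Sequential(nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2))
        self.up1 = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, 1, 1), nn.ReLU())
        self.up2 = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(128, ch, 3, 1, 1))

    def forward(self, x):
        d1 = self.down1(x)
        d2 = self.down2(d1)
        u1 = self.up1(d2)
        return self.up2(torch.cat([u1, d1], dim=1))     # skip connection

def repair(gen_net, i_m):
    ll, lh, hl, hh = haar_dwt(i_m)                      # step one
    bands = gen_net(torch.cat([ll, lh, hl, hh], dim=1)) # step two (latents implicit here)
    ll_o, lh_o, hl_o, hh_o = bands.chunk(4, dim=1)
    return haar_idwt(ll_o, lh_o, hl_o, hh_o)            # step three: IDWT -> repaired image
```

For example, `repair(WaveletUNet(), corrupted)` works on a (B, 3, H, W) input whose height and width are divisible by 8.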

the loss function of the generation network during training is as follows:

first, the reconstruction loss $\mathcal{L}^{gen}_{rec}$ of the generation network is constructed; specifically, the normalized L1 distance between the repaired image and the real image is used as the reconstruction loss to constrain the contour structure of the result, with the repaired image given by

$$I_{out} = \mathrm{IDWT}\big[G_{gen}\big(x^{m}_{L}, x^{m}_{H}\big)\big]$$

where $I_{out}$ denotes the repair result of the generation network, $\mathrm{IDWT}[\cdot]$ denotes the inverse discrete wavelet transform, $x^{m}_{L}$ and $x^{m}_{H}$ are the multi-frequency representation of the damaged image, and $G_{gen}$ denotes the generator of the generation network;

then, an adversarial loss function $\mathcal{L}^{gen}_{adv}$ is constructed to bring the features of the real image and of the repaired image closer inside the discriminator, where $D_{gen}$ denotes the discriminator of the generation network;

next, the KL divergence $\mathcal{L}^{gen}_{kl}$ of the generation network is constructed using the multi-frequency sub-band probabilistic inference model; it is obtained from the variational lower bounds of the generation network, which are as follows:

where $p_\theta(\cdot)$ is the likelihood function, $q_\psi(\cdot)$ is the posterior importance-sampling function and $p_\phi(\cdot)$ is the conditional prior; $q_\psi(z^{m}_{L} \mid x^{m}_{L})$ and $q_\psi(z^{m}_{H} \mid x^{m}_{H})$ denote the distributions of the low- and high-frequency latent variables obtained by encoding the low- and high-frequency sub-bands $x^{m}_{L}$ and $x^{m}_{H}$ of the damaged image; the first expectation is that of the generated image's low-frequency sub-band distribution, controlled jointly by the latent variables of the real image's low-frequency sub-band and the low-frequency sub-band of the missing image, and the second expectation is the corresponding term for the high-frequency sub-band; $p_\theta(x_L \mid \cdot)$ and $p_\theta(x_H \mid \cdot)$ refer to the low- and high-frequency sub-band distributions of the generative model under the given conditions; and $\theta$, $\psi$, $\phi$ are the deep-network parameters of the corresponding functions;

based on the lower bounds above, the consistency between the corresponding pairs of high- and low-frequency latent-variable distributions is regularized using the KL divergence, and the KL divergence of the generation network is obtained as follows:

then, a texture loss function $\mathcal{L}_{tex}$ is constructed to maintain the consistency of content and style between the generated image and the real image; the texture loss is defined over Gram matrices of deep features, where $\Phi$ denotes the high-level feature space extracted by a VGG-16 network pre-trained on ImageNet and Gram denotes the Gram matrix operation;
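A sketch of the texture loss above: Gram matrices of VGG-16 features of the repaired and real images are compared. The specific VGG-16 layers and the distance used between Gram matrices are not stated in the text, so the choices below (relu1_2, relu2_2, relu3_3 and an L1 distance) are assumptions.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

_vgg = models.vgg16(pretrained=True).features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)                 # VGG-16 acts only as a fixed feature extractor

_LAYERS = {3, 8, 15}                        # relu1_2, relu2_2, relu3_3 (assumed choice)

def gram_matrix(feat):
    b, c, h, w = feat.shape
    f = feat.flatten(2)                     # (B, C, H*W)
    return f @ f.transpose(1, 2) / (c * h * w)

def texture_loss(i_out, i_gt):
    loss, x, y = 0.0, i_out, i_gt
    for idx, layer in enumerate(_vgg):
        x, y = layer(x), layer(y)
        if idx in _LAYERS:
            loss = loss + F.l1_loss(gram_matrix(x), gram_matrix(y))
    return loss
```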

finally, the overall loss function of the generation network is constructed as:

$$\mathcal{L}^{gen} = \lambda_{rec}\,\mathcal{L}^{gen}_{rec} + \lambda_{adv}\,\mathcal{L}^{gen}_{adv} + \lambda_{kl}\,\mathcal{L}^{gen}_{kl} + \lambda_{tex}\,\mathcal{L}_{tex}$$

where the $\lambda$ terms are weight coefficients;

in the actual measurement stage, a generation network is used to obtain a repaired image, and the steps in the actual measurement stage are as follows:

step one: for a damaged image $I_m$, iteratively decompose the damaged image into 4 sub-band images using the discrete wavelet transform, where the low-frequency sub-band is $x^{m}_{L}$ (the LL band) and the high-frequency sub-bands are $x^{m}_{H}$ (the LH, HL and HH bands);

step two: input the multi-frequency sub-band representation of the damaged image obtained in step one into the U-net encoder to obtain the latent variables of the high- and low-frequency sub-bands, $z^{m}_{H}$ and $z^{m}_{L}$;

step three: input the latent variables of the high- and low-frequency sub-bands, $z^{m}_{H}$ and $z^{m}_{L}$, obtained in step two into the U-net decoder to obtain a multi-frequency sub-band representation of the generated image, and apply the inverse wavelet transform to this representation to obtain the repaired image, which is highly realistic and has clear texture details.

Background art:

image inpainting is a basic task in multimedia applications and computer vision, with the goal of generating alternative global semantic structures and local detail textures for missing regions, and ultimately producing visually realistic results. It is widely applied in the multimedia fields of image editing, restoration, synthesis and the like. The conventional image block-based image inpainting method is to search and copy the best matching image block from a known area to the missing area. The traditional image restoration method has a good processing effect on static textures, but has a limited processing effect on textures of complex or non-repetitive structures such as human faces and the like, and is not suitable for capturing high-level semantic information.

In recent years, learning-based approaches have modeled image inpainting as a conditional generation problem. Pathak et al. first trained a deep neural network with an adversarial loss function to predict missing regions, which helps capture the edges and global structure of large missing regions. Ishikawa et al. improved on this by combining global and local adversarial loss functions to produce finer textures. By extracting and propagating deep features through convolutional neural networks, these methods better overcome the shortcomings of traditional image restoration algorithms and obtain visually realistic and plausible results. However, because they treat the structure and texture information of the input image indiscriminately, over-smoothed edges or texture artifacts often occur.

To address this problem, Liu et al. proposed a two-stage network that recovers the coarse structure of the missing region in the first stage and generates the final result in the second stage using the reconstruction from the first stage. However, the second-stage network depends heavily on the correctness of the structure reconstructed by the first-stage network, and two-stage training also brings an additional computational burden. Meanwhile, the data distributions of the low-frequency and high-frequency features of the input image are completely different; if the feature distributions of different frequencies are processed indiscriminately, the reconstruction of structure or the generation of texture may be misled.

In summary, existing image restoration algorithms often cannot reconstruct a reasonable structure and fine textures at the same time, which limits their applicability.

Disclosure of Invention

In order to solve the problem that conventional image restoration methods cannot simultaneously reconstruct a reasonable structure and fine textures, the invention provides a high-quality image restoration method.

The invention discloses an image restoration method based on a multi-frequency sub-band probabilistic inference model and proposes a dual-path parallel network, consisting of an inference network and a generation network, built on the multi-frequency probabilistic inference model. The method first decomposes the input image into low-frequency and high-frequency sub-bands in the wavelet domain, which allows the feature distributions of different frequencies to be extracted more accurately and without mutual interference. The low- and high-frequency features of the real image estimated by the inference network are then encoded to obtain the latent-variable distribution of the real image's multi-frequency features. Likewise, the low- and high-frequency sub-bands of the damaged image are obtained by wavelet transform and encoded to obtain the latent-variable distribution of its multi-frequency features. The probabilistic inference model is used to estimate the latent variables of the damaged image so that their distribution moves closer to that of the real image; these latent variables then generate the corresponding multi-frequency information, the missing region is filled, and a final visually realistic result is produced.

The method comprises two stages, a training stage and an actual measurement stage, which are described in detail below.


Advantageous effects

Compared with the prior art, the method provided by the invention approaches the image restoration problem from the perspective of predicting low-frequency semantic structure and high-frequency detail texture through a multi-frequency sub-band probabilistic inference model, and obtains the multi-frequency information of the missing region by estimating its multi-frequency feature distribution. The beneficial effects are as follows: the model can synthesize a clear image structure as well as generate fine textures in the missing region, and it clearly outperforms state-of-the-art methods.

Description of the drawings:

FIG. 1 is a diagram of a framework of image restoration technology based on a multi-frequency sub-band probabilistic inference model;

FIG. 2 is an exemplary graph of a repair result on a face data set;

FIG. 3 compares the visual results of different algorithms repairing the central area;

FIG. 4 compares the visual results of different algorithms repairing random areas;

Detailed Description

In the framework diagram of FIG. 1, the image restoration work is divided into two parallel network paths: a generation network and an inference network. The generation network takes a damaged image as input, decomposes it into multi-frequency sub-bands through the discrete wavelet transform, and produces the corresponding high- and low-frequency latent variables after the sub-bands pass through an encoder. The inference network takes a real image as input and produces the corresponding high- and low-frequency latent variables with its encoder; the latent variables of the real image are used to regularize the latent variables of the damaged image so that the repaired result is closer to the real image.

The following is a detailed description in terms of both training and prediction phases.

The inference network is used for estimating latent variable distribution of the real image in the training process, and the specific operation of the inference network is as follows:

step one: for a real image $I_{gt}$, iteratively decompose the real image into 4 sub-band images using the discrete wavelet transform, where the low-frequency sub-band is $x^{gt}_{L}$ (the LL band) and the high-frequency sub-bands are $x^{gt}_{H}$ (the LH, HL and HH bands);

step two: input the multi-frequency sub-bands of the real image obtained in step one into the encoders to obtain the latent variables of the high- and low-frequency sub-bands, $z^{gt}_{H}$ and $z^{gt}_{L}$. In particular, we use different encoders to infer the distributions of the different frequency features separately, which focuses each encoder on extracting the information carried by its own frequency bands. The low-frequency pipeline takes the low-frequency sub-band as input and predicts contextual semantic information; its structure of conventional convolution blocks and residual blocks is sensitive to low-frequency information such as color and coarse structure. In contrast, the high-frequency pipeline restores high-frequency details from the high-frequency sub-bands; structurally, residual blocks are used to capture and transmit edge and texture features of the high-frequency domain.
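A sketch of the two frequency-specific encoder pipelines described in step two: the low-frequency branch mixes plain convolution blocks with residual blocks, while the high-frequency branch relies mainly on residual blocks to carry edges and textures. Block counts and channel widths are illustrative assumptions.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)     # residual connection

low_freq_encoder = nn.Sequential(   # LL band (3 channels for an RGB image)
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
    ResBlock(64),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
    ResBlock(128),
)

high_freq_encoder = nn.Sequential(  # LH, HL, HH bands concatenated (9 channels)
    nn.Conv2d(9, 64, 4, stride=2, padding=1),
    ResBlock(64), ResBlock(64),
    nn.Conv2d(64, 128, 4, stride=2, padding=1),
    ResBlock(128), ResBlock(128),
)
```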

Step three: concatenate the high- and low-frequency latent variables obtained in step two and input them into a decoder to obtain a reconstructed image. The decoder combines residual blocks and convolution blocks to recover the features.

Step four: input the real image and the reconstructed image obtained in step three into the discriminator network, and iteratively adjust the generator parameters through the loss function; when the loss function converges and the generator parameters reach their optimal values, training stops. The discriminator uses a plain convolution-block structure to discriminate the images.
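A sketch of the plain convolution-block discriminator mentioned in step four. Depth, widths, normalisation layers and the patch-level output head are unspecified in the text and are assumptions here.

```python
import torch.nn as nn

class InferDiscriminator(nn.Module):
    """Stack of convolution blocks followed by a patch-level real/fake score map."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.head = nn.Conv2d(256, 1, 3, padding=1)

    def forward(self, x):
        return self.head(self.features(x))
```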

The loss function is as follows:

process G for generating images by inference networkinferThe training process of (a) can be written as:

wherein, IinferRepresenting the images generated in the inference network,respectively, the high and low frequency subbands of the real image. First, a reconstruction loss function is defined as a prediction result IinferDistance L1 from real image:

Meanwhile, the generation result of the inference network is required to minimize the L2 distance to the average feature of the real data, so that the features of the real image and of the reconstructed image become closer inside the discriminator network; the adversarial loss function of the inference network is defined accordingly, where $D_{infer}$ denotes the discriminator of the inference network.
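A sketch of the adversarial term just described: the L2 distance between the discriminator's average features on real data and on reconstructions is minimised. Reading out intermediate features via a `features` attribute matches the discriminator sketch above, but is an assumption about the implementation.

```python
import torch

def infer_adv_loss(d_infer, i_gt, i_infer):
    feat_real = d_infer.features(i_gt).mean(dim=0)     # average feature of the real data
    feat_fake = d_infer.features(i_infer).mean(dim=0)  # average feature of the reconstruction
    return torch.mean((feat_real - feat_fake) ** 2)    # squared L2 distance
```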

Use of a multi-frequency sub-band probabilistic inference model in an inference network:

in a variational auto-encoder (VAE), it is considered that the generation of an image is controlled by latent variables, and therefore it is desirable to obtain a latent variable distribution of a class of images by training the images. We assume that the latent variables are normally distributed, so we use probabilistic inference models in the inference network to hopefully make the latent variable distribution of the image as close to the standard normal distribution as possible.

The derivation of the multi-frequency subband probability model in the inference network is as follows:

Assuming that the low-frequency and high-frequency sub-bands are independent of each other, the variational lower bounds in the inference network follow the variational auto-encoder (VAE) formulation:

$$\log p(x_L) \ge \mathbb{E}_{q_\psi(z_L \mid x^{gt}_{L})}\big[\log p(x_L \mid z_L)\big] - \mathrm{KL}\big(q_\psi(z_L \mid x^{gt}_{L}) \,\big\|\, p(z_L)\big)$$

$$\log p(x_H) \ge \mathbb{E}_{q_\psi(z_H \mid x^{gt}_{H})}\big[\log p(x_H \mid z_H)\big] - \mathrm{KL}\big(q_\psi(z_H \mid x^{gt}_{H}) \,\big\|\, p(z_H)\big)$$

Here $x_L$ and $x_H$ denote all the images we want to obtain, i.e. the images whose generation the latent variables can control; we call this the generative model. $\log p(x_L)$ represents the distribution of the low-frequency sub-bands of the desired generative model, and $\log p(x_H)$ the distribution of its high-frequency sub-bands. We assume that the distribution of the latent variables controlling the generative model is a normal distribution adapted to the number of pixels $n$ in the missing region, and $z_L$ and $z_H$ denote the latent variables of the generative model. The KL divergence is used to minimize the difference between the two distributions of the real image and the generative model.

Based on the two lower bounds above, the prior can be adjusted according to the number of pixels $n$ in the missing part of the image and defined as a Gaussian function; the KL divergence $\mathcal{L}^{infer}_{kl}$ of the inference network is then

$$\mathcal{L}^{infer}_{kl} = \mathrm{KL}\big(q_\psi(z_L \mid x^{gt}_{L}) \,\big\|\, \mathcal{N}(0, \sigma^{2}_{n}\mathbf{I})\big) + \mathrm{KL}\big(q_\psi(z_H \mid x^{gt}_{H}) \,\big\|\, \mathcal{N}(0, \sigma^{2}_{n}\mathbf{I})\big)$$

where $\sigma^{2}_{n}$ is the prior variance determined by $n$.

The inference network is jointly trained using the following loss function:

$$\mathcal{L}^{infer} = \lambda_{rec}\,\mathcal{L}^{infer}_{rec} + \lambda_{adv}\,\mathcal{L}^{infer}_{adv} + \lambda_{kl}\,\mathcal{L}^{infer}_{kl}$$

where the $\lambda$ terms are weight coefficients.

The specific operation of the generation network during training is as follows:

step one: for a damaged image $I_m$, iteratively decompose the damaged image into 4 sub-band images using the discrete wavelet transform, where the low-frequency sub-band is $x^{m}_{L}$ (the LL band) and the high-frequency sub-bands are $x^{m}_{H}$ (the LH, HL and HH bands);

step two: input the multi-frequency sub-band representation of the damaged image obtained in step one into the U-net encoder to obtain the latent variables of the high- and low-frequency sub-bands, $z^{m}_{H}$ and $z^{m}_{L}$;

step three: input the latent variables of the high- and low-frequency sub-bands, $z^{m}_{H}$ and $z^{m}_{L}$, obtained in step two into the U-net decoder to obtain a multi-frequency sub-band representation of the generated image, and apply the inverse wavelet transform to this representation to obtain the repaired image.

Step four: input the real image and the repaired image obtained in step three into the discriminator network, and iteratively adjust the generator parameters through the loss function; when the loss function converges and the generator parameters are optimal, training stops. The discriminator of the generation network uses a least-squares generative adversarial network (LSGAN) discriminator structure.
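A sketch of the least-squares GAN objectives referenced above, with the standard LSGAN targets of 1 for real and 0 for fake samples; `d_gen` is assumed to return a real-valued score map.

```python
import torch

def lsgan_d_loss(d_gen, i_gt, i_out):
    real = d_gen(i_gt)
    fake = d_gen(i_out.detach())                       # do not backprop into the generator
    return 0.5 * (torch.mean((real - 1) ** 2) + torch.mean(fake ** 2))

def lsgan_g_loss(d_gen, i_out):
    return 0.5 * torch.mean((d_gen(i_out) - 1) ** 2)   # generator pushes fakes towards "real"
```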

The loss function of the generation network during training is as follows:

Under the constraint of the inference network, the generation network carries out the image repair task. The training process of the generator $G_{gen}$ of the generation network can be written as:

$$I_{out} = \mathrm{IDWT}\big[G_{gen}\big(x^{m}_{L}, x^{m}_{H}\big)\big]$$

where $I_{out}$ denotes the repair result of the generation network, $\mathrm{IDWT}[\cdot]$ denotes the inverse discrete wavelet transform, and $x^{m}_{L}$, $x^{m}_{H}$ are the multi-frequency representation of the damaged image.

First, the model uses the normalized L1 distance as the reconstruction loss function $\mathcal{L}^{gen}_{rec}$ to constrain the contour structure of the result; the reconstruction loss is defined as the normalized L1 distance between the repaired image $I_{out}$ and the real image $I_{gt}$.
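A sketch of the reconstruction loss just described. Interpreting "normalized L1 distance" as an L1 distance averaged over the pixels of the missing region is an assumption; the text does not define the normalisation explicitly.

```python
import torch

def gen_rec_loss(i_out, i_gt, mask):
    """mask: 1 inside the missing region, 0 elsewhere (assumed convention)."""
    n_missing = mask.sum().clamp(min=1.0)
    return (torch.abs(i_out - i_gt) * mask).sum() / n_missing
```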

to make the features of the true image and the reconstructed image closer in the discriminator, we use the antagonism constraint, the antagonism lossThe definition is as follows:

use of a multi-frequency sub-band probabilistic inference model in generating a network:

In the generation network, the probabilistic inference model is used so that the restored damaged image matches the real image as closely as possible. Because the latent variables control the generation of the image, what we need to do is make the latent-variable distribution of the damaged image as close as possible to that of the real image. We therefore minimize the KL divergence between the latent variables obtained in the inference network and those obtained in the generation network, which brings the latent-variable distribution of the damaged image closer to that of the real image and ultimately achieves the goal of repairing the damaged image.

The derivation of the multi-frequency sub-band probabilistic inference model in the generation network is as follows:

In the generation network, the high- and low-frequency sub-bands of the damaged image are taken as conditions, and according to the conditional variational auto-encoder (CVAE), the variational lower bounds in the generation network are as follows:

where $p_\theta(\cdot)$ is the likelihood function, $q_\psi(\cdot)$ is the posterior importance-sampling function, $p_\phi(\cdot \mid \cdot)$ is the conditional prior, and $\theta$, $\psi$, $\phi$ are the deep-network parameters of the corresponding functions. As the network parameters change, the sum of the variational lower bounds over all training data is maximized jointly with the total log-likelihood of the observed training examples, i.e. the KL divergence between the latent-variable distributions of the damaged image and the real image reaches its minimum.

First, the latent feature spaces of the same frequency are closely related, the latent variables of the high- and low-frequency sub-bands are mutually independent, and the latent-variable distribution of the generative model is assumed to be closer to the real image than to the damaged image; the two lower bounds above can therefore be updated accordingly:

Based on the updated lower bounds, the consistency between the corresponding pairs of high- and low-frequency latent-variable distributions is regularized using the KL divergence, and the KL divergence of the generation network is obtained as follows:
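A sketch of the generation-network KL term: for each frequency band, the latent distribution inferred from the damaged image is pulled towards the latent distribution that the inference network derives from the real image. Both posteriors are taken to be diagonal Gaussians; which distribution plays the role of q and which of p is an assumption about the implementation.

```python
import torch

def kl_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, exp(logvar_q)) || N(mu_p, exp(logvar_p)) ) for diagonal Gaussians."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.flatten(1).sum(dim=1).mean()

def gen_kl_loss(post_damaged, post_real):
    """post_*: dict band -> (mu, logvar), with bands 'low' and 'high' (assumed layout)."""
    return sum(kl_gaussians(*post_damaged[k], *post_real[k]) for k in ("low", "high"))
```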

Finally, in order to keep the content and style of the generated image consistent with the real image, a VGG-16 network pre-trained on ImageNet is used to extract a high-level feature space, over which the texture loss $\mathcal{L}_{tex}$ is defined, where $\Phi$ represents the extracted feature representation and Gram represents the Gram matrix operation.

Considering the reconstruction loss, adversarial loss, KL divergence and texture loss, the overall loss function of the generation network is defined as:

$$\mathcal{L}^{gen} = \lambda_{rec}\,\mathcal{L}^{gen}_{rec} + \lambda_{adv}\,\mathcal{L}^{gen}_{adv} + \lambda_{kl}\,\mathcal{L}^{gen}_{kl} + \lambda_{tex}\,\mathcal{L}_{tex}$$

where the $\lambda$ terms are weight coefficients.

In the actual measurement stage, the repaired image is obtained using the generation network; the steps of the actual measurement stage are as follows:

step one: for a damaged image $I_m$, iteratively decompose the damaged image into 4 sub-band images using the discrete wavelet transform, where the low-frequency sub-band is $x^{m}_{L}$ (the LL band) and the high-frequency sub-bands are $x^{m}_{H}$ (the LH, HL and HH bands);

step two: input the multi-frequency sub-band representation of the damaged image obtained in step one into the U-net encoder to obtain the latent variables of the high- and low-frequency sub-bands, $z^{m}_{H}$ and $z^{m}_{L}$;

step three: input the latent variables of the high- and low-frequency sub-bands, $z^{m}_{H}$ and $z^{m}_{L}$, obtained in step two into the U-net decoder to obtain a multi-frequency sub-band representation of the generated image, and apply the inverse wavelet transform to this representation to obtain the repaired image, which is highly realistic and has clear texture details.
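A hypothetical measurement-stage usage snippet: only the trained generation network is needed at test time. The checkpoint name, the `gen_net` module and the image-loading helper are placeholders, not part of the patent.

```python
import torch

gen_net = torch.load("gen_net.pth")          # trained generation network (placeholder path)
gen_net.eval()

i_m = load_corrupted_image()                 # (1, 3, H, W) tensor in [0, 1]; placeholder helper
with torch.no_grad():
    i_out = gen_net(i_m)                     # DWT -> U-net -> inverse DWT happen inside the network
```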

Evaluation of image quality:

FIG. 3 compares the visual results of the present invention with CE, CA, PICNet and Shift-Net on images repaired in the central region. As can be seen from Table 1 and FIG. 3, Context Encoders (CE) produce distorted structures and blurry results, particularly in highly structured images. Contextual Attention (CA) is an effective semantic repair method, but its results show structural disorder and color distortion. PICNet aims to produce a wide variety of realistic images, but sometimes produces repetitive and structurally distorted results. Shift-Net obtains a higher peak signal-to-noise ratio (PSNR) and its repair results are subjectively better, but the contour edges in its visual results are blurry and some texture detail is lost. All of these results are affected by distortions, which suggests that these methods struggle to balance the generation of texture and structure. Compared with them, our model handles these problems better, producing more intuitive and realistic results subjectively and obtaining the best peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) objectively.

TABLE 1 Objective quality comparison of different algorithms (center missing region)

FIG. 4 compares the visual results of the present invention with EdgeConnect, StructureFlow and GatedConv on images with irregular missing regions. From Table 2 and FIG. 4 it can be seen that EdgeConnect is limited when handling some complex, large and irregular missing regions, producing many meaningless textures and distorted structures. StructureFlow is effective for filling irregular holes, but still inevitably produces over-smoothed results in some areas and requires additional inputs to be computed. GatedConv recovers complex structures poorly, and its results are inconsistent with the surrounding environment. The main reason for these shortcomings is that these methods do not take into account the interplay between the low and high frequencies of the input image. Compared with them, the method based on the probabilistic inference model handles these problems better and can simultaneously generate reasonable structures and rich texture details.

TABLE 2 Objective quality comparison of different algorithms (irregular missing regions)
