No-reference quality evaluation method for night images


1. A no-reference quality evaluation method for night images is characterized by comprising the following steps:

step (1), natural scene statistics NSS feature extraction;

image distortion is captured using NSS: the NSS features are derived from the local binary pattern (LBP) map and the mean-subtracted contrast-normalized (MSCN) coefficients of the image, and the LBP and MSCN features respectively measure the structural change and the loss of naturalness of the image when distortion occurs;

step (2), extracting human visual perception features;

studying the perceptual characteristics of the human brain from the perspective of human visual perception to characterize image quality; under the internal generative model (IGM), the human brain generates a corresponding representation of a visual scene and performs visual perception on it; thus, the perceptual quality of an image is closely related to this brain characterization process and to the differences between the image and its brain-characterized version;

approximating IGM in the human brain using sparse representations to perceive external image signals;

step (3), extracting semantic features;

pre-training a deep neural network for image semantic extraction on the ImageNet data set, and then obtaining the high-level semantic information of the image directly from the trained network; specifically, an image I to be evaluated is input into the pre-trained network, and the activation before the last softmax layer of the network is taken, yielding a 1000-dimensional vector that represents the high-level semantic information of the image;

step (4), calculating image quality;

after the NSS features, human visual perception features and semantic features are extracted, the extracted features are concatenated into a composite feature vector that represents the overall quality of the image; then, support vector regression (SVR) is trained on the composite feature vectors and the subjective mean opinion scores (MOS) provided by the database, yielding an image quality score prediction model; the image quality is then predicted with the trained image quality score prediction model.

2. The nighttime image-oriented no-reference quality evaluation method according to claim 1, wherein the specific method of step (1) is as follows:

first, LBP defines a local structure operator; when the original image suffers external distortion, the LBP values of the image change; LBP acts on a local image block: let g_c denote the central pixel of the block and g_p the neighboring pixels circularly symmetric about g_c; according to the rotation-invariant uniformity of the pattern around the central pixel, the LBP code is calculated as:

LBP_{P,R}^{riu2} = \sum_{p=0}^{P-1} s(g_p - g_c), if U(LBP_{P,R}) \le 2; otherwise LBP_{P,R}^{riu2} = P + 1 (1)

where P and R represent the number and radius of the neighboring pixels, and s(·) is a step function defined as:

s(t) = 1, if t \ge 0; s(t) = 0, otherwise (2)

and U(·) is a function that measures uniformity by counting the number of spatial bitwise transitions in a pattern; it is defined as:

U(LBP_{P,R}) = |s(g_{P-1} - g_c) - s(g_0 - g_c)| + \sum_{p=1}^{P-1} |s(g_p - g_c) - s(g_{p-1} - g_c)| (3)

then, the LBP codes are calculated pixel by pixel to generate a 0-1 LBP map M:

M(x, y) = 1, if LBP_{P,R}^{riu2}(x, y) = l; M(x, y) = 0, otherwise (4)

wherein x and y are the pixel indexes, and l is a constant used to select the corresponding LBP code;

then, the LBP map is summarized to extract quality-aware features: the mean of the obtained LBP map is taken as one dimension of the NSS feature vector, so that changes in image quality are tracked through changes in the LBP mean;

the second type of NSS feature is derived from the MSCN coefficients, which for the input image I can be calculated as:

\hat{I}(x, y) = (I(x, y) - \mu(x, y)) / (\sigma(x, y) + C) (5)

where x and y are the pixel coordinates, \hat{I}(x, y) denotes the MSCN coefficient at (x, y), \mu(x, y) and \sigma(x, y) are the mean and standard deviation of the local image block centered at (x, y), and C is a small constant that keeps the denominator stable; the distribution of the MSCN coefficients is modeled with a zero-mean generalized Gaussian distribution (GGD):

f(x; \alpha, \beta) = \frac{\alpha}{2\beta \Gamma(1/\alpha)} \exp(-(|x|/\beta)^{\alpha}) (6)

where \Gamma(\cdot) is the gamma function, given by:

\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t} dt, \quad z > 0 (7)

wherein \alpha and \beta are the parameters of the GGD, which can be estimated accurately by a moment-matching method; since the GGD model parameters \alpha and \beta are highly responsive to image distortion, they are used as quality features to capture the introduced distortion and are added to the NSS feature vector;

further calculating the products of adjacent MSCN coefficients along four directions, namely the horizontal, vertical, main diagonal and secondary diagonal directions, and modeling them with a zero-mode asymmetric generalized Gaussian distribution (AGGD):

f(x; \gamma, \beta_l, \beta_r) = \frac{\gamma}{(\beta_l + \beta_r) \Gamma(1/\gamma)} \exp(-(-x/\beta_l)^{\gamma}) for x < 0; \frac{\gamma}{(\beta_l + \beta_r) \Gamma(1/\gamma)} \exp(-(x/\beta_r)^{\gamma}) for x \ge 0 (8)

the mean of this distribution is defined as:

\eta = (\beta_r - \beta_l) \Gamma(2/\gamma) / \Gamma(1/\gamma) (9)

likewise, the parameters (\gamma, \beta_l, \beta_r, \eta) of the AGGD model are introduced into the NSS feature vector of the image I.

3. The method for evaluating the quality of nighttime images without reference according to claim 2, wherein the specific method of step (2) is as follows:

approximating IGM in the human brain using sparse representations to perceive external image signals;

extracting image blocks x_k from a given image I, where k is the block index denoting the k-th image block; the sparse representation of x_k over an overcomplete dictionary D amounts to finding a sparse vector \alpha_k whose elements are mostly zero or close to zero, which can be expressed as:

\alpha_k = \arg\min_{\alpha} \|x_k - D\alpha\|_2^2 + \lambda \|\alpha\|_p (10)

wherein the first term represents fidelity and the second term the sparsity constraint; \|\cdot\|_p is the l_p norm, and \lambda is a constant that adjusts the weights of the two parts; solving the above equation yields the sparse vector \alpha_k representing x_k; because the sparse vectors encode the important information the brain perceives from the visual input, perceptual features are extracted from them:

they are summarized by their standard deviations; let \sigma_k be the standard deviation of \alpha_k; the mean of the standard deviations of all sparse vectors is taken as one dimension of the perceptual feature vector of the image I:

\bar{\sigma} = \frac{1}{N} \sum_{k=1}^{N} \sigma_k (11)

wherein N is the total number of sparse vectors of the image I;

after sparse representation, the input image I yields a brain-predicted version I', and the difference information is quantified from two angles: the prediction residual and the structural difference;

first, the prediction residual refers to the direct difference between I and I', which is specifically defined as:

PR(x,y)=I(x,y)-I′(x,y) (12)

where x and y are pixel coordinates and PR is the prediction residual; to extract perceptual features, the moment and entropy features of PR are used to summarize this random variable and make explicit its relation to human visual perception; specifically, the mean m_{PR}, standard deviation \sigma_{PR}, skewness s_{PR}, kurtosis k_{PR} and entropy e_{PR} of PR are calculated to represent the feature information of the image and are added to the perceptual feature vector; assuming \varepsilon(\cdot) is the mean operator, the perceptual features can be calculated as:

m_{PR} = \varepsilon(PR) (13)

\sigma_{PR} = \sqrt{\varepsilon((PR - m_{PR})^2)} (14)

s_{PR} = \varepsilon((PR - m_{PR})^3) / \sigma_{PR}^3 (15)

k_{PR} = \varepsilon((PR - m_{PR})^4) / \sigma_{PR}^4 (16)

e_{PR} = -\sum_i p_i \log p_i (17)

wherein p_i is the probability density of the i-th gray level in PR;

because the distribution of PR is very sensitive to changes in image quality, a GGD is fitted to the PR distribution, and the best-fit parameters are taken as dimensions of the human visual perception feature vector of the image I;

the structural similarity between I and I' is measured using the quality index SSIM:

SSIM(x, y) = \frac{(2\mu_I \mu_{I'} + C_1)(2\sigma_{II'} + C_2)}{(\mu_I^2 + \mu_{I'}^2 + C_1)(\sigma_I^2 + \sigma_{I'}^2 + C_2)} (18)

wherein \mu_I and \mu_{I'} are the mean intensities of I and I', \sigma_I and \sigma_{I'} are their standard deviations, \sigma_{II'} is their correlation coefficient, and C_1, C_2 are constants that avoid instability; calculating the structural similarity value pixel by pixel yields a pixel-level structural similarity map, denoted SS; each SSIM value in SS measures the degree of similarity between two co-located pixels, is at most 1, and equals 1 only when the two compared signals are identical; the structural dissimilarity is therefore defined as the distance between the SSIM value and 1:

SD(x,y)=1-SS(x,y) (19)

wherein x and y are pixel coordinates, and SD is the structural dissimilarity map between I and I';

to characterize the quality of I, the moment and entropy features of SD are selected as a set of dimensions of the perceptual quality features, i.e., m_{SD}, \sigma_{SD}, s_{SD}, k_{SD} and e_{SD}; the SD distribution is fitted with a Weibull function to extract quality-aware features, defined as:

f(x; \lambda, v) = \frac{v}{\lambda} (x/\lambda)^{v-1} \exp(-(x/\lambda)^{v}), \quad x \ge 0 (20)

wherein the Weibull parameters \lambda and v are also introduced into the human visual perception feature vector of the image I.

4. The nighttime image-oriented no-reference quality evaluation method according to claim 3, wherein the specific method in step (4) is as follows:

given a training set \Omega, an image I_i is selected from it, and its NSS features f_1, human visual perception features f_2 and semantic features f_3 are extracted and concatenated into a feature vector F_i = [f_1, f_2, f_3]; suppose the image I_i has a MOS value q_i; the final quality prediction model can then be expressed as:

M = SVR_{train}(\{(F_i, q_i)\}, I_i \in \Omega) (21)

wherein M is the required image quality score prediction model;

finally, during image quality prediction, the composite feature vector F_j of a new image I_j to be evaluated is first extracted, and then the trained image quality score prediction model M predicts its quality, which can be expressed as:

Q = M(F_j) (22)

wherein Q is the predicted quality score of the image I_j.

Background

With the proliferation of consumer photography, consumers place ever higher demands on the ability to capture images in nighttime environments, and image quality directly affects the quality of the consumer experience. Images shot at night under weak light exhibit low contrast, blurred details, reduced visibility and the like, lowering overall visual quality. It is therefore imperative for consumer photography and image processing systems to design an image quality assessment (IQA) indicator that can predict and help improve nighttime image quality. In addition, many real-time image processing algorithms and image-driven applications, such as visual monitoring and autonomous driving, are strongly influenced by the quality of the input image, and a well-designed IQA method can effectively benchmark and optimize the performance of these algorithms and systems.

Since nighttime images are usually viewed by people, the natural way to judge their quality is subjective evaluation by observers. However, subjective quality assessment is time-consuming and labor-intensive, and it can hardly meet real-time requirements. In contrast, objective image quality assessment (IQA) by an efficient computational model is more attractive for the quality evaluation task. According to the accessibility of the original image, existing objective IQA methods can be divided into full-reference (FR), reduced-reference (RR) and no-reference (NR) methods. FR IQA computes image quality with full reference to the original image; RR IQA extracts partial information from the original image for quality evaluation. FR and RR IQA can achieve high prediction performance when the original image is fully or partially available. However, when the original image is missing or unavailable, both FR and RR IQA become inapplicable. In this case, image quality can be measured with NR IQA / BIQA (blind IQA), which needs no information from a reference original image. For nighttime images, the target image is usually captured directly by the camera, and there is no original image to reference. We therefore focus here on BIQA, which is closer to practice.

In terms of the overall research landscape, nighttime images have been studied in fields such as enhancement and restoration, but their quality evaluation has received less attention. To our knowledge, BNBT is the first and, so far, only work dedicated to nighttime images: it proposes a blind nighttime image quality assessment method (BNBT) based on brightness and texture features and also builds a nighttime image database (NNID). It extracts luminance features from superpixel segments of the nighttime image and texture features from the gray-level co-occurrence matrix, and then combines the brightness and texture features as the input of an SVR to obtain the quality score of the nighttime image. Although the final evaluation results are valid, they are not ideal, for two reasons: first, brightness and texture are relatively simple, low-level image features and cannot capture enough information; second, image quality is ultimately the result of human visual perception, so characterizing it further from the perspective of human visual perception can enrich the description.

Against this background, we propose a new evaluation method based on low-, mid- and high-level image information, namely natural scene statistics (NSS), human-brain visual perception features and semantic information.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a nighttime image-oriented no-reference quality evaluation method.

This work addresses the no-reference evaluation problem for nighttime images. Building on a deep exploration of natural scene statistics (NSS), human-brain visual perception features and high-level semantic information, it characterizes image quality from low-, mid- and high-level feature information and provides a novel BIQA method for nighttime images. First, NSS is widely considered to capture low-level image feature information well, which motivates us to use it for blind quality assessment; specifically, we explore new NSS features from the local binary pattern (LBP) map and the mean-subtracted contrast-normalized (MSCN) coefficients of the image. Second, since image quality is ultimately a result of human visual perception, we express the mid-level feature information of an image from the perspective of the human visual system (HVS); in particular, building on free-energy studies, we use sparse representation to approximate the process by which the internal generative model (IGM) in the human brain perceives external image signals. Third, the high-level semantic features of images often carry very rich content information; considerable research exists on DNN models for image semantic extraction, such as VGG, SqueezeNet, GoogleNet and ResNet, and the semantic information of the image is extracted from the intermediate activations of these deep networks.

Finally, integrating the above studies, we design a set of quality-aware features to comprehensively characterize nighttime image quality. Specifically, we use a powerful SVR to integrate all quality-aware features into a corresponding quality score and conduct extensive experiments on a representative nighttime image database (NNID). Experiments show that the proposed method outperforms state-of-the-art BIQA methods in evaluating nighttime image quality. For convenience of reference, we call the proposed method the Blind Night Image Quality Index (BNQI).

A nighttime image-oriented no-reference quality evaluation method comprises the following steps:

step (1), natural scene statistics NSS feature extraction;

image distortion is captured using NSS: the NSS features are derived from the local binary pattern (LBP) map and the mean-subtracted contrast-normalized (MSCN) coefficients of the image, and the LBP and MSCN features respectively measure the structural change and the loss of naturalness of the image when distortion occurs.

Step (2), extracting human visual perception features;

The perceptual characteristics of the human brain are studied from the perspective of human visual perception to characterize image quality. Under the IGM, the human brain generates a corresponding representation of the visual scene and performs visual perception on it. The perceived quality of the image is therefore closely related to this brain characterization process and to the differences between the image and its brain-characterized version.

The IGM in the human brain is approximated using sparse representations to perceive external image signals.

Step (3), extracting semantic features;

A deep neural network for image semantic extraction is pre-trained on the ImageNet data set, and the high-level semantic information of the image is then obtained directly from the trained network. Specifically, an image I to be evaluated is input into the pre-trained network, and the activation before the last softmax layer is taken, yielding a 1000-dimensional vector that represents the high-level semantic information of the image.

Step (4), calculating image quality;

After the NSS features, human visual perception features and semantic features are extracted, they are concatenated into a composite feature vector that represents the overall quality of the image. Then an SVR is trained on the composite feature vectors and the subjective mean opinion scores (MOS) provided by the database, yielding an image quality score prediction model. Image quality is then predicted with the trained model.

The specific method of step (1) is as follows:

First, LBP defines a local structure operator. When the original image suffers external distortion, the LBP values of the image change. LBP acts on a local image block: let g_c denote the central pixel of the block and g_p the neighboring pixels circularly symmetric about g_c. According to the rotation-invariant uniformity of the pattern around the central pixel, the LBP code is calculated as:

LBP_{P,R}^{riu2} = \sum_{p=0}^{P-1} s(g_p - g_c), if U(LBP_{P,R}) \le 2; otherwise LBP_{P,R}^{riu2} = P + 1 (1)

where P and R represent the number and radius of the neighboring pixels, and s(·) is a step function defined as:

s(t) = 1, if t \ge 0; s(t) = 0, otherwise (2)

and U(·) is a function that measures uniformity by counting the number of spatial bitwise transitions in a pattern; it is defined as:

U(LBP_{P,R}) = |s(g_{P-1} - g_c) - s(g_0 - g_c)| + \sum_{p=1}^{P-1} |s(g_p - g_c) - s(g_{p-1} - g_c)| (3)

Then, the LBP codes are calculated pixel by pixel to generate a 0-1 LBP map M:

M(x, y) = 1, if LBP_{P,R}^{riu2}(x, y) = l; M(x, y) = 0, otherwise (4)

where x and y are the pixel indexes, and l is a constant used to select the corresponding LBP code.

The LBP map is then summarized to extract quality-aware features: the mean of the obtained LBP map is taken as one dimension of the NSS feature vector, so that changes in image quality are tracked through changes in the LBP mean.

The second type of NSS feature is derived from the MSCN coefficients, which for the input image I can be calculated as:

\hat{I}(x, y) = (I(x, y) - \mu(x, y)) / (\sigma(x, y) + C) (5)

where x and y are the pixel coordinates, \hat{I}(x, y) denotes the MSCN coefficient at (x, y), \mu(x, y) and \sigma(x, y) are the mean and standard deviation of the local image block centered at (x, y), and C is a small constant that keeps the denominator stable. The distribution of the MSCN coefficients is modeled with a zero-mean generalized Gaussian distribution (GGD):

f(x; \alpha, \beta) = \frac{\alpha}{2\beta \Gamma(1/\alpha)} \exp(-(|x|/\beta)^{\alpha}) (6)

where \Gamma(\cdot) is the gamma function, given by:

\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t} dt, \quad z > 0 (7)

where \alpha and \beta are the parameters of the GGD, which can be estimated accurately by a moment-matching method. Since the GGD model parameters \alpha and \beta are highly responsive to image distortion, they are used as quality features to capture the introduced distortion and are added to the NSS feature vector.

The products of adjacent MSCN coefficients are further calculated along four directions, i.e., horizontal, vertical, main diagonal and secondary diagonal, and modeled with a zero-mode asymmetric generalized Gaussian distribution (AGGD):

f(x; \gamma, \beta_l, \beta_r) = \frac{\gamma}{(\beta_l + \beta_r) \Gamma(1/\gamma)} \exp(-(-x/\beta_l)^{\gamma}) for x < 0; \frac{\gamma}{(\beta_l + \beta_r) \Gamma(1/\gamma)} \exp(-(x/\beta_r)^{\gamma}) for x \ge 0 (8)

The mean of this distribution is defined as:

\eta = (\beta_r - \beta_l) \Gamma(2/\gamma) / \Gamma(1/\gamma) (9)

also, the parameters (gamma, beta) of the AGGD modell,βrη) is also introduced into the NSS feature vector of the image I.

The specific method of step (2) is as follows:

the IGM in the human brain is approximated using sparse representations to perceive external image signals.

Image blocks x_k are extracted from a given image I, where k is the block index denoting the k-th image block. The sparse representation of x_k over an overcomplete dictionary D amounts to finding a sparse vector \alpha_k whose elements are mostly zero or close to zero, which can be expressed as:

\alpha_k = \arg\min_{\alpha} \|x_k - D\alpha\|_2^2 + \lambda \|\alpha\|_p (10)

where the first term represents fidelity and the second term the sparsity constraint; \|\cdot\|_p is the l_p norm, and \lambda is a constant that adjusts the weights of the two parts. Solving the above equation yields the sparse vector \alpha_k representing x_k. Because the sparse vectors encode the important information the brain perceives from the visual input, perceptual features are extracted from them.

They are summarized by their standard deviations. Let \sigma_k be the standard deviation of \alpha_k; the mean of the standard deviations of all sparse vectors is taken as one dimension of the perceptual feature vector of the image I:

\bar{\sigma} = \frac{1}{N} \sum_{k=1}^{N} \sigma_k (11)

where N is the total number of sparse vectors of the image I.

After sparse representation, the input image I yields a brain-predicted version I', and the difference information is quantified from two angles: the prediction residual and the structural difference.

First, the prediction residual refers to the direct difference between I and I', which is specifically defined as:

PR(x,y)=I(x,y)-I′(x,y) (12)

where x and y are pixel coordinates and PR is the prediction residual. To extract perceptual features, the moment and entropy features of PR are used to summarize this random variable and make explicit its relation to human visual perception. Specifically, the mean m_{PR}, standard deviation \sigma_{PR}, skewness s_{PR}, kurtosis k_{PR} and entropy e_{PR} of PR are calculated to represent the feature information of the image and are added to the perceptual feature vector. Assuming \varepsilon(\cdot) is the mean operator, the perceptual features can be calculated as:

m_{PR} = \varepsilon(PR) (13)

\sigma_{PR} = \sqrt{\varepsilon((PR - m_{PR})^2)} (14)

s_{PR} = \varepsilon((PR - m_{PR})^3) / \sigma_{PR}^3 (15)

k_{PR} = \varepsilon((PR - m_{PR})^4) / \sigma_{PR}^4 (16)

e_{PR} = -\sum_i p_i \log p_i (17)

where p_i is the probability density of the i-th gray level in PR.

Since the distribution of PR is very sensitive to changes in image quality, a GGD is fitted to the PR distribution, and the best-fit parameters are taken as dimensions of the human visual perception feature vector of the image I.

The structural similarity between I and I' is measured using the quality index SSIM:

SSIM(x, y) = \frac{(2\mu_I \mu_{I'} + C_1)(2\sigma_{II'} + C_2)}{(\mu_I^2 + \mu_{I'}^2 + C_1)(\sigma_I^2 + \sigma_{I'}^2 + C_2)} (18)

where \mu_I and \mu_{I'} are the mean intensities of I and I', \sigma_I and \sigma_{I'} are their standard deviations, \sigma_{II'} is their correlation coefficient, and C_1, C_2 are constants that avoid instability. Calculating the structural similarity value pixel by pixel yields a pixel-level structural similarity map, denoted SS. Each SSIM value in SS measures the degree of similarity between two co-located pixels, is at most 1, and equals 1 only when the two compared signals are identical. The structural dissimilarity is therefore defined as the distance between the SSIM value and 1:

SD(x,y)=1-SS(x,y) (19)

where x and y are pixel coordinates, and SD is the structural dissimilarity map between I and I'.

To characterize the quality of I, the moment and entropy features of SD are selected as a set of dimensions of the perceptual quality features, i.e., m_{SD}, \sigma_{SD}, s_{SD}, k_{SD} and e_{SD}. The SD distribution is fitted with a Weibull function to extract quality-aware features, defined as:

f(x; \lambda, v) = \frac{v}{\lambda} (x/\lambda)^{v-1} \exp(-(x/\lambda)^{v}), \quad x \ge 0 (20)

where the Weibull parameters \lambda and v are also introduced into the human visual perception feature vector of the image I.

The specific method of step (4) is as follows:

Given a training set \Omega, an image I_i is selected from it, and its NSS features f_1, human visual perception features f_2 and semantic features f_3 are extracted and concatenated into a feature vector F_i = [f_1, f_2, f_3]. Suppose the image I_i has a MOS value q_i; the final quality prediction model can then be expressed as:

M = SVR_{train}(\{(F_i, q_i)\}, I_i \in \Omega) (21)

where M is the image quality score prediction model we need.

Finally, during image quality prediction, the composite feature vector F_j of a new image I_j to be evaluated is first extracted, and then the trained image quality score prediction model M predicts its quality, which can be expressed as:

Q = M(F_j) (22)

where Q is the predicted quality score of the image I_j.

The invention has the following beneficial effects:

the invention designs a new evaluation model aiming at the non-reference quality evaluation problem of the night image. Specifically, on the basis of research on Natural Scene Statistics (NSS), human brain visual perception features and high-level semantic information, a new group of quality perception features are designed to represent image quality from the perspective of three layers, namely low, medium and high layers, of image information. Compared with other advanced nighttime image evaluation work, the overall evaluation performance of the method is better.

Drawings

FIG. 1 is a flow chart of a method according to an embodiment of the present invention;

FIG. 2 shows the extraction of semantic information from an input image using SqueezeNet.

Detailed Description

The method of the invention is further described below with reference to the accompanying drawings and examples.

As shown in fig. 1, a method for evaluating quality of nighttime images without reference specifically includes the following steps:

step (1), natural scene statistics NSS feature extraction;

Image distortion is captured using NSS: the NSS features are derived from the local binary pattern (LBP) map and the mean-subtracted contrast-normalized (MSCN) coefficients of the image. The LBP and MSCN features respectively measure the structural change and the loss of naturalness of the image when distortion occurs, and both are highly indicative of image quality.

First, LBP defines a local structure operator that has been successfully applied to various computer vision tasks such as face recognition and texture classification. Its key property is that when the original image suffers external distortion, the LBP values of the image change. LBP acts on a local image block: let g_c denote the central pixel of the block and g_p the neighboring pixels circularly symmetric about g_c. According to the rotation-invariant uniformity of the pattern around the central pixel, the LBP code is calculated as:

LBP_{P,R}^{riu2} = \sum_{p=0}^{P-1} s(g_p - g_c), if U(LBP_{P,R}) \le 2; otherwise LBP_{P,R}^{riu2} = P + 1 (1)

where P and R represent the number and radius of the neighboring pixels, and s(·) is a step function defined as:

s(t) = 1, if t \ge 0; s(t) = 0, otherwise (2)

and U(·) is a function that measures uniformity by counting the number of spatial bitwise transitions in a pattern; it is defined as:

U(LBP_{P,R}) = |s(g_{P-1} - g_c) - s(g_0 - g_c)| + \sum_{p=1}^{P-1} |s(g_p - g_c) - s(g_{p-1} - g_c)| (3)

Then, the LBP codes are calculated pixel by pixel to generate a 0-1 LBP map M:

M(x, y) = 1, if LBP_{P,R}^{riu2}(x, y) = l; M(x, y) = 0, otherwise (4)

where x and y are the pixel indexes, and l is a constant used to select the corresponding LBP code.

The LBP map is then summarized to extract quality-aware features: the mean of the obtained LBP map is taken as one dimension of the NSS feature vector, so that changes in image quality are tracked through changes in the LBP mean.
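As an illustration of this LBP feature (a minimal sketch of ours, not part of the original disclosure): the rotation-invariant uniform coding of equations (1)-(3) is available in scikit-image as the "uniform" method, and the feature is simply the mean of the resulting map; the function name lbp_mean_feature is our own.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_mean_feature(gray_image: np.ndarray, P: int = 8, R: float = 1.0) -> float:
    """Mean of the rotation-invariant uniform LBP map (one NSS feature dimension)."""
    # method='uniform' implements the riu2 coding of equation (1): the sum of
    # s(g_p - g_c) for uniform patterns (U <= 2), and P + 1 for all others.
    lbp_map = local_binary_pattern(gray_image, P, R, method="uniform")
    return float(lbp_map.mean())
```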

The second NSS feature we extract to characterize image quality comes from the MSCN coefficients. Previous research shows that the MSCN coefficients of natural images closely follow a unit Gaussian distribution, and that this regularity is easily destroyed by introduced distortion. The degree of change in the distribution can therefore represent the change in image quality. Specifically, the MSCN coefficients of the input image I can be calculated as:

\hat{I}(x, y) = (I(x, y) - \mu(x, y)) / (\sigma(x, y) + C) (5)

where x and y are the pixel coordinates, \hat{I}(x, y) denotes the MSCN coefficient at (x, y), \mu(x, y) and \sigma(x, y) are the mean and standard deviation of the local image block centered at (x, y), and C is a small constant that keeps the denominator stable. The distribution of the MSCN coefficients is modeled with a zero-mean generalized Gaussian distribution (GGD):

f(x; \alpha, \beta) = \frac{\alpha}{2\beta \Gamma(1/\alpha)} \exp(-(|x|/\beta)^{\alpha}) (6)

where \Gamma(\cdot) is the gamma function, given by:

\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t} dt, \quad z > 0 (7)

where \alpha and \beta are the parameters of the GGD, which can be estimated accurately by a moment-matching method. Since the GGD model parameters \alpha and \beta are highly responsive to image distortion, they are used as quality features to capture the introduced distortion and are added to the NSS feature vector.
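The MSCN and GGD features above could be computed as in the following sketch (ours, for illustration); the Gaussian weighting window (sigma = 7/6) and the stabilizing constant C = 1 are conventional BRISQUE-style choices assumed here, not values fixed by this method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import gamma as gamma_fn

def mscn_coefficients(image: np.ndarray, sigma: float = 7 / 6, C: float = 1.0) -> np.ndarray:
    """Equation (5): locally mean-subtracted, contrast-normalized coefficients."""
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)                  # local mean mu(x, y)
    var = gaussian_filter(image * image, sigma) - mu * mu
    std = np.sqrt(np.maximum(var, 0))                   # local std sigma(x, y)
    return (image - mu) / (std + C)

def fit_ggd(x: np.ndarray) -> tuple:
    """Moment-matching estimate of the GGD parameters (alpha, beta) of eq. (6)."""
    x = x.ravel()
    rho = np.mean(x * x) / (np.mean(np.abs(x)) ** 2 + 1e-12)
    alphas = np.arange(0.1, 10.0, 0.001)
    # Theoretical ratio E[x^2] / E[|x|]^2 for a GGD with shape alpha.
    ratio = gamma_fn(1 / alphas) * gamma_fn(3 / alphas) / gamma_fn(2 / alphas) ** 2
    alpha = alphas[np.argmin(np.abs(ratio - rho))]
    beta = np.sqrt(np.mean(x * x) * gamma_fn(1 / alpha) / gamma_fn(3 / alpha))
    return alpha, beta
```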

In addition, the products of pairs of adjacent MSCN coefficients can also effectively characterize image quality. We therefore further calculate the products of adjacent MSCN coefficients along four directions, i.e., horizontal, vertical, main diagonal and secondary diagonal. These results are modeled with a zero-mode asymmetric generalized Gaussian distribution (AGGD):

f(x; \gamma, \beta_l, \beta_r) = \frac{\gamma}{(\beta_l + \beta_r) \Gamma(1/\gamma)} \exp(-(-x/\beta_l)^{\gamma}) for x < 0; \frac{\gamma}{(\beta_l + \beta_r) \Gamma(1/\gamma)} \exp(-(x/\beta_r)^{\gamma}) for x \ge 0 (8)

The mean of this distribution is defined as:

\eta = (\beta_r - \beta_l) \Gamma(2/\gamma) / \Gamma(1/\gamma) (9)

also, the parameters (gamma, beta) of the AGGD modell,βrη) is also introduced into the NSS feature vector of the image I.

Step (2), extracting human visual perception features;

in addition to image quality assessment using NSS, we also investigated the perceptual features of the human brain from the perspective of human visual perception to characterize image quality.

In brain theory and neuroscience, the free-energy principle combines several brain theories with physical knowledge to explain human behavior, perception and learning. It indicates that the human brain's perception or understanding of image input is an active inference process governed by an internal generative model (IGM). More specifically, under the IGM the human brain produces corresponding representations of the visual scene and performs visual perception. The perceived quality of the image is therefore closely related to this brain characterization process and to the differences between the image and its brain-characterized version.

Therefore, given the strong neurobiological support for this view and its established practice in IQA, we likewise approximate the IGM in the human brain using sparse representation to perceive external image signals.

In particular, the basic unit for sparsely representing an image is usually an image block, so image blocks x_k are first extracted from a given image I, where k is the block index denoting the k-th image block. The sparse representation of x_k over an overcomplete dictionary D amounts to finding a sparse vector \alpha_k whose elements are mostly zero or close to zero, which can be expressed as:

\alpha_k = \arg\min_{\alpha} \|x_k - D\alpha\|_2^2 + \lambda \|\alpha\|_p (10)

where the first term represents fidelity and the second term the sparsity constraint; \|\cdot\|_p is the l_p norm (here we take p = 0, i.e., the l_0 norm), and \lambda is a constant that adjusts the weights of the two parts. Solving the above equation yields the sparse vector \alpha_k representing x_k. Because the sparse vectors encode the important information the brain perceives from the visual input, perceptual features are extracted from them.

Here we focus on the dispersion of the coefficients and summarize it with the standard deviation. Let \sigma_k be the standard deviation of \alpha_k; the mean of the standard deviations of all sparse vectors is taken as one dimension of the perceptual feature vector of the image I:

\bar{\sigma} = \frac{1}{N} \sum_{k=1}^{N} \sigma_k (11)

where N is the total number of sparse vectors of the image I.
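The sparse-coding feature of equations (10)-(11) could be sketched as follows, assuming an overcomplete dictionary learned offline (for example with scikit-learn's MiniBatchDictionaryLearning); the patch size, the number of sampled patches and the sparsity level are illustrative choices of ours.

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import sparse_encode

def sparse_std_feature(image: np.ndarray, dictionary: np.ndarray,
                       patch_size=(8, 8), n_nonzero: int = 5) -> float:
    """Mean of the per-patch sparse-code standard deviations, as in eq. (11)."""
    patches = extract_patches_2d(image, patch_size, max_patches=2000,
                                 random_state=0)
    X = patches.reshape(len(patches), -1).astype(np.float64)
    # OMP yields an l0-constrained code alpha_k, matching the p = 0 choice above.
    alphas = sparse_encode(X, dictionary, algorithm="omp",
                           n_nonzero_coefs=n_nonzero)
    sigma_k = alphas.std(axis=1)      # standard deviation of each alpha_k
    return float(sigma_k.mean())
```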

After sparse representation, the input image I yields a brain-predicted version I', and research shows that the difference between I and I' characterizes the perceptual quality of the image I well. We therefore pursue this further: specifically, we quantify the difference information from two angles, the prediction residual and the structural difference.

First, the prediction residual is the direct difference between I and I'; it is a widely used measure of the difference between two signals and is specifically defined as:

PR(x,y)=I(x,y)-I′(x,y) (12)

where x and y are pixel coordinates and PR is the prediction residual. To extract perceptual features, the moment and entropy features of PR are used to summarize this random variable and make explicit its relation to human visual perception. Specifically, the mean m_{PR}, standard deviation \sigma_{PR}, skewness s_{PR}, kurtosis k_{PR} and entropy e_{PR} of PR are calculated to represent the feature information of the image and are added to the perceptual feature vector. Assuming \varepsilon(\cdot) is the mean operator, the perceptual features can be calculated as:

m_{PR} = \varepsilon(PR) (13)

\sigma_{PR} = \sqrt{\varepsilon((PR - m_{PR})^2)} (14)

s_{PR} = \varepsilon((PR - m_{PR})^3) / \sigma_{PR}^3 (15)

k_{PR} = \varepsilon((PR - m_{PR})^4) / \sigma_{PR}^4 (16)

e_{PR} = -\sum_i p_i \log p_i (17)

where p_i is the probability density of the i-th gray level in PR.

Since the distribution of PR is very sensitive to changes in image quality, a GGD is fitted to the PR distribution, and the best-fit parameters are taken as dimensions of the perceptual feature vector of the image I.
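A minimal sketch of the prediction-residual features of equations (12)-(17): the entropy is computed from a 256-bin histogram, and the GGD best-fit parameters reuse the fit_ggd helper from the MSCN sketch above; residual_features is our illustrative name.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def residual_features(I: np.ndarray, I_pred: np.ndarray) -> list:
    """Moments, entropy and GGD fit of the prediction residual PR = I - I'."""
    pr = (I.astype(np.float64) - I_pred.astype(np.float64)).ravel()
    hist, _ = np.histogram(pr, bins=256)
    p = hist / hist.sum()                               # p_i of equation (17)
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    alpha, beta = fit_ggd(pr)          # GGD best-fit parameters (see above)
    return [pr.mean(), pr.std(), skew(pr), kurtosis(pr), entropy, alpha, beta]
```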

Second, since the human brain is highly perceptually sensitive to structural information in the image, we also measure the structural difference between the image I and its brain-represented version I'. To this end, we first use the quality index SSIM to measure the structural similarity between I and I':

SSIM(x, y) = \frac{(2\mu_I \mu_{I'} + C_1)(2\sigma_{II'} + C_2)}{(\mu_I^2 + \mu_{I'}^2 + C_1)(\sigma_I^2 + \sigma_{I'}^2 + C_2)} (18)

where \mu_I and \mu_{I'} are the mean intensities of I and I', \sigma_I and \sigma_{I'} are their standard deviations, \sigma_{II'} is their correlation coefficient, and C_1, C_2 are constants that avoid instability. Calculating the structural similarity value pixel by pixel yields a pixel-level structural similarity map, denoted SS. Each SSIM value in SS measures the degree of similarity between two co-located pixels, is at most 1, and equals 1 only when the two compared signals are identical. The structural dissimilarity is therefore defined as the distance between the SSIM value and 1:

SD(x,y)=1-SS(x,y) (19)

where x and y are pixel coordinates, and SD is the structural dissimilarity map between I and I'.

To characterize the quality of I, the moment and entropy features of SD are selected as a set of dimensions of the perceptual quality features, i.e., m_{SD}, \sigma_{SD}, s_{SD}, k_{SD} and e_{SD}. We also examined the SD distribution and found that it too captures image quality changes, so we fit the SD distribution with a Weibull function to extract quality-aware features, defined as:

f(x; \lambda, v) = \frac{v}{\lambda} (x/\lambda)^{v-1} \exp(-(x/\lambda)^{v}), \quad x \ge 0 (20)

where the Weibull parameters \lambda and v are also introduced into the perceptual feature vector of the image I.
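The structural-difference features of equations (18)-(20) might be computed as follows, using scikit-image's SSIM with full=True to obtain the pixel-level map SS and scipy's weibull_min (location fixed at 0) as the two-parameter Weibull fit; data_range=255 assumes 8-bit input, and a small offset keeps the fit away from exact zeros.

```python
import numpy as np
from scipy.stats import skew, kurtosis, weibull_min
from skimage.metrics import structural_similarity

def structural_difference_features(I: np.ndarray, I_pred: np.ndarray) -> list:
    """Moments, entropy and Weibull fit of the dissimilarity map SD = 1 - SS."""
    _, ss_map = structural_similarity(I, I_pred, full=True, data_range=255)
    sd = (1.0 - ss_map).ravel()                         # equation (19)
    hist, _ = np.histogram(sd, bins=256)
    p = hist / hist.sum()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    v, _, lam = weibull_min.fit(sd + 1e-6, floc=0)      # shape v, scale lambda
    return [sd.mean(), sd.std(), skew(sd), kurtosis(sd), entropy, lam, v]
```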

Step (3), extracting semantic features;

The features extracted so far do not consider the semantic information of the image, yet high-level semantic information can help reflect image quality intuitively, so we introduce features related to semantic information. Evidently, extracting semantic information from high-quality images is much easier, whereas low-quality images hinder the extraction of useful semantic information. To this end, a deep neural network for image semantic extraction is pre-trained on the ImageNet data set, and the high-level semantic information of the image is then obtained directly from the trained network. Specifically, an image I to be evaluated is input into the pre-trained network, and the activation before the last softmax layer is taken, yielding a 1000-dimensional vector that represents the high-level semantic information of the image. Any deep neural network that extracts image semantic information well can be adopted, such as VGG, SqueezeNet, GoogleNet or ResNet.

Here, the SqueezeNet network, which offers good extraction performance with a small model, is selected to extract semantic information; the specific process is shown in fig. 2. A centrally located 227 × 227 image block is extracted from the input image and fed into the neural network, and the output of the 'global avgpool' layer is taken to represent the semantic information of the image.
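A sketch of this extraction with torchvision's SqueezeNet (our illustration): the forward pass of squeezenet1_1 ends with a 1000-way convolution followed by global average pooling and no softmax, so the model output is exactly the 1000-dimensional pre-softmax vector described above; the normalization constants are the standard ImageNet values.

```python
import torch
from torchvision import models, transforms

model = models.squeezenet1_1(weights=models.SqueezeNet1_1_Weights.IMAGENET1K_V1)
model.eval()

preprocess = transforms.Compose([
    transforms.CenterCrop(227),                  # central 227 x 227 block
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def semantic_features(pil_image) -> torch.Tensor:
    """1000-D semantic vector: the activation before the (absent) softmax."""
    x = preprocess(pil_image).unsqueeze(0)       # 1 x 3 x 227 x 227
    with torch.no_grad():
        return model(x).squeeze(0)
```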

Step (4), calculating image quality;

After the NSS features, human visual perception features and semantic features are extracted, they are concatenated into a composite feature vector that represents the overall quality of the image. Then an SVR is trained on the composite feature vectors and the subjective mean opinion scores (MOS) provided by the database, yielding an image quality score prediction model. Image quality is then predicted with the trained model.

The NSS features, the human visual perception features and the semantic features are all multi-dimensional vectors, with each extracted parameter value serving as one dimension; the three vectors are finally concatenated into one large feature vector (their dimensions are simply stacked into a single vector).

Specifically, given a training set \Omega, an image I_i is selected from it, and its NSS features f_1, human visual perception features f_2 and semantic features f_3 are extracted and concatenated into a feature vector F_i = [f_1, f_2, f_3]. Suppose the image I_i has a MOS value q_i; the final quality prediction model can then be expressed as:

M = SVR_{train}(\{(F_i, q_i)\}, I_i \in \Omega) (21)

where M is the image quality score prediction model we need.

Finally, during image quality prediction, the composite feature vector F_j of a new image I_j to be evaluated is first extracted, and then the trained image quality score prediction model M predicts its quality, which can be expressed as:

Q = M(F_j) (22)

where Q is the predicted quality score of the image I_j.
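Finally, the training and prediction of equations (21)-(22) could be realized with scikit-learn's SVR, as in the sketch below; the feature standardization and the RBF hyperparameters are conventional assumptions of ours, not values given in the disclosure.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def train_quality_model(F_train: np.ndarray, mos_train: np.ndarray):
    """Equation (21): fit the model M on composite features and MOS values."""
    M = make_pipeline(StandardScaler(),
                      SVR(kernel="rbf", C=100.0, gamma="scale"))
    M.fit(F_train, mos_train)                    # rows of F_train: [f1, f2, f3]
    return M

# Equation (22): quality score Q of a new image I_j with feature vector F_j.
# Q = M.predict(F_j.reshape(1, -1))[0]
```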
