Neural network model training method and device for image processing
1. A neural network model training method for image processing, comprising:
performing data enhancement on a source image to be processed, and adding background images with the same transparent color and different styles to obtain an enhanced data set;
extracting an edge position of a capture object for each image in the enhanced dataset using an edge detection algorithm;
calculating boundary region information corresponding to the edge position by using a morphological processing method, and intercepting capture object information of each image in the enhanced data set according to the boundary region information to form a training data set;
and inputting the training data set into a convolutional neural network model for training to obtain an image classification model.
2. The method of claim 1, wherein the data enhancement comprises at least one of an affine transformation, a color transformation, Gaussian noise, or a blurring process.
3. The method of claim 2, wherein the affine transformation comprises at least one of a rotation transformation, a scaling transformation, a reflection transformation, or a shear transformation, the scaling transformation comprising an isotropic scaling transformation or a non-isotropic scaling transformation.
4. The method of claim 1, wherein the edge detection algorithm comprises an edge detection algorithm based on a trained holistically nested edge detection (HED) network model.
5. The method of claim 1, wherein the morphological processing method comprises an erosion operation, a dilation operation, an opening operation, or a closing operation.
6. The method of claim 1, wherein, after the step of intercepting capture object information of each image in the enhanced data set according to the boundary region information, the method further comprises:
setting the background area outside the boundary region of each image to white.
7. The method of claim 1, wherein the model training is applied to a cloud data center, the method further comprising:
cutting the image classification model into a first partial model and a second partial model according to a preset model cutting point;
and deploying the first partial model and the second partial model to a mobile terminal and an edge server, respectively.
8. The method of claim 7, wherein the step of deploying the first partial model and the second partial model to the mobile terminal and the edge server, respectively, further comprises:
performing compression coding on the first partial model and the second partial model, respectively.
9. The method of claim 1, wherein the convolutional neural network model comprises a lightweight network model.
10. The method of claim 9, wherein the lightweight network model comprises a MobileNet-v2 network model, the MobileNet-v2 network model comprising an input layer, a plurality of feature extraction layers, and a fully connected layer.
11. The method of claim 10, wherein the input layer scales each image in the training data set to a preset size.
12. The method of claim 10, wherein the MobileNet-v2 network model is refined by:
using the parameters of the feature extraction layers in a pre-trained MobileNet-v2 network model as initialization parameters of the image classification model, and fine-tuning the image classification model according to an actual image processing service data set; and
at the fully connected layer, modifying the number of output categories to a preset number of image categories, and initializing the parameters of the fully connected layer using an Xavier method.
13. The method of claim 10, wherein the MobileNet-v2 network model quantizes weights in a post-training quantization manner.
14. The method of claim 13, wherein the post-training quantization manner comprises:
after the convolutional neural network model is trained, quantizing the trained weights and activation functions from 32-bit floating point numbers to 8-bit integers; or
quantizing both the trained weights and the activation functions from 32-bit floating point numbers to 16-bit floating point numbers.
15. A neural network model training apparatus for image processing, comprising:
a data enhancement module for performing data enhancement on a source image to be processed, and adding background images with the same transparent color and different styles to obtain an enhanced data set;
an edge detection module for extracting an edge position of a capture object of each image in the enhanced data set using an edge detection algorithm;
a morphological processing module for calculating boundary region information corresponding to the edge position by using a morphological processing method, and intercepting the capture object information of each image in the enhanced data set according to the boundary region information to form a training data set;
and a model training module for inputting the training data set into a convolutional neural network model for training to obtain an image classification model.
16. The apparatus of claim 15, wherein the model training module further comprises:
a model cutting unit for cutting the image classification model into a first partial model and a second partial model according to a preset model cutting point, and deploying the first partial model and the second partial model to the mobile terminal and the edge server, respectively; and
a model compression unit for performing compression coding on the first partial model and the second partial model, respectively.
17. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-15.
18. A computer-readable storage medium storing computer-executable instructions for implementing the method of any one of claims 1 to 15 when executed.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 15.
Background
The explosion of the mobile internet has made it possible to provide users with a variety of services on portable devices, and the application platforms of technologies such as image processing have gradually expanded from the PC side to the mobile side. Sufficient computational, time, and storage resources are critical to ensuring model performance. Therefore, an image processing model deployed at the mobile terminal needs to occupy little space, achieve high accuracy, and offer good real-time performance.
Typical deep convolutional networks have huge numbers of parameters and place high demands on computing resources, making them unsuitable for a single terminal device (such as a camera or a mobile phone). For example, 50% of all model parameters in AlexNet are concentrated in the first fully connected layer, and 90% of the model parameters of VGG belong to the fully connected layers. With the development of the mobile internet, using a neural network on a portable device requires reducing the computation and parameter counts of the model, and lightweight neural networks represented by the MobileNet series have emerged.
In implementing the disclosed concept, the inventors found at least the following problems in the related art: the current mainstream image classification models struggle to meet the service requirements of mobile-terminal image processing. On one hand, convolutional neural network models such as ResNet-50 have higher accuracy but tend to require more resources; on the other hand, although lightweight neural networks such as MobileNet-v2 greatly reduce resource consumption, their accuracy cannot meet the service requirements. Therefore, how to balance model performance (e.g., accuracy) against resource occupation (e.g., storage, computation, or time resources) is an urgent problem.
Disclosure of Invention
In view of the above, the present disclosure provides a neural network model training method, apparatus, device, and medium for image processing.
One aspect of the present disclosure provides a neural network model training method for image processing, including: performing data enhancement on a source image to be processed, and adding background images with the same transparent color and different styles to obtain an enhanced data set; extracting an edge position of the capture object for each image in the enhanced dataset using an edge detection algorithm; calculating boundary region information corresponding to the edge position by using a morphological processing method, and intercepting capture object information of each image in the enhanced data set according to the boundary region information to form a training data set; and inputting the training data set into a convolutional neural network model for training to obtain an image classification model.
According to an embodiment of the present disclosure, the data enhancement includes at least one of affine transformation, color transformation, Gaussian noise, or blurring processing.
According to an embodiment of the present disclosure, the affine transformation includes at least one of a rotation transformation, a scaling transformation, a reflection transformation, or a shear transformation, the scaling transformation including an isotropic scaling transformation or a non-isotropic scaling transformation.
According to an embodiment of the present disclosure, the edge detection algorithm includes an edge detection algorithm based on a trained holistically nested edge detection (HED) network model.
According to an embodiment of the present disclosure, the morphological processing method includes an erosion operation, a dilation operation, an opening operation, or a closing operation.
According to an embodiment of the present disclosure, after the step of intercepting the capture object information of each image in the enhanced data set according to the boundary region information, the method further includes: setting the background area outside the boundary region of each image to white.
According to the embodiment of the disclosure, the model training is applied to the cloud data center, and the method further comprises: cutting the image classification model into a first partial model and a second partial model according to a preset model cutting point; and deploying the first partial model and the second partial model at the mobile terminal and the edge server, respectively.
According to the embodiment of the present disclosure, after the step of deploying the first partial model and the second partial model at the mobile terminal and the edge server, respectively, the method further includes: performing compression coding on the first partial model and the second partial model, respectively.
According to an embodiment of the present disclosure, the convolutional neural network model comprises a lightweight network model.
According to an embodiment of the present disclosure, the lightweight network model includes a MobileNet-v2 network model, the MobileNet-v2 network model including an input layer, a plurality of feature extraction layers, and a fully connected layer.
According to an embodiment of the present disclosure, the input layer scales each image in the training data set to a preset size.
According to an embodiment of the present disclosure, the MobileNet-v2 network model is improved as follows: the parameters of the feature extraction layers in a pre-trained MobileNet-v2 network model are used as initialization parameters of the image classification model, and the image classification model is fine-tuned according to an actual image processing service data set; and at the fully connected layer, the number of output categories is modified to the preset number of image categories, and the parameters of the fully connected layer are initialized using an Xavier method.
According to the embodiment of the disclosure, the MobileNet-v2 network model quantizes the weights in a post-training quantization manner.
According to an embodiment of the present disclosure, the post-training quantization manner includes: after the training of the convolutional neural network model is finished, quantizing the trained weights and activation functions from 32-bit floating point numbers to 8-bit integers; or quantizing both the trained weights and activation functions from 32-bit floating point numbers to 16-bit floating point numbers.
Another aspect of the present disclosure provides a neural network model training apparatus for image processing, including: a data enhancement module for performing data enhancement on a source image to be processed, and adding background images with the same transparent color and different styles to obtain an enhanced data set; an edge detection module for extracting an edge position of a capture object of each image in the enhanced data set using an edge detection algorithm; a morphological processing module for calculating boundary region information corresponding to the edge position by using a morphological processing method, and intercepting the capture object information of each image in the enhanced data set according to the boundary region information to form a training data set; and a model training module for inputting the training data set into the convolutional neural network model for training to obtain an image classification model.
According to an embodiment of the present disclosure, the model training module further comprises: a model cutting unit for cutting the image classification model into a first partial model and a second partial model according to a preset model cutting point, and deploying the first partial model and the second partial model to the mobile terminal and the edge server, respectively; and a model compression unit for performing compression coding on the first partial model and the second partial model, respectively.
Another aspect of the present disclosure provides an electronic device including: one or more processors; a storage device to store one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above neural network model training method for image processing.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the above neural network model training method for image processing when the instructions are executed.
Another aspect of the present disclosure provides a computer program product comprising a computer program which, when executed, implements the above neural network model training method for image processing.
Compared with the prior art, the neural network model training method, the device, the equipment and the medium for image processing provided by the disclosure have at least the following beneficial effects:
(1) the existing convolutional neural network model is improved so that the computation, storage, and wireless resources occupied by the model are saved while the image processing capability is ensured, and the lightweight network model can be conveniently and efficiently deployed under resource-limited conditions;
(2) the multi-level image preprocessing of data enhancement, edge detection, and morphological processing guarantees the classification accuracy of the image classification model.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates a system architecture of a neural network model training method and apparatus for image processing, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a neural network model training method for image processing, in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates an operational flow diagram of a neural network model training method for image processing, in accordance with an embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of various operations of an affine transformation according to an embodiment of the present disclosure;
FIG. 5 schematically shows a block diagram of bounding region information resulting from morphological processing according to an embodiment of the disclosure;
FIG. 6 schematically illustrates the structure of an improved MobileNet-v2 network model according to an embodiment of the disclosure;
FIG. 7 schematically illustrates a flow diagram of a neural network model training method for image processing, according to another embodiment of the present disclosure;
FIG. 8 schematically illustrates a block diagram of a neural network model training apparatus for image processing, in accordance with an embodiment of the present disclosure;
FIG. 9 schematically illustrates a block diagram of a model training module according to another embodiment of the present disclosure; and
FIG. 10 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.
In the technical solution of the disclosure, the acquisition, storage, and application of the personal information of the users involved all comply with relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
The disclosure provides a neural network model training method for image processing, which can be applied to the technical field of artificial intelligence. The method comprises the following steps: performing data enhancement on a source image to be processed, and adding background images with the same transparent color and different styles to obtain an enhanced data set; extracting an edge position of the capture object for each image in the enhanced dataset using an edge detection algorithm; calculating boundary region information corresponding to the edge position by using a morphological processing method, and intercepting capture object information of each image in the enhanced data set according to the boundary region information to form a training data set; and inputting the training data set into a convolutional neural network model for training to obtain an image classification model.
Fig. 1 schematically illustrates a system architecture 100 of a neural network model training method and apparatus for image processing according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture according to this embodiment may include a terminal device 101, an edge server 102, and a cloud data center 103. The terminal device 101 and the edge server 102 may communicate with each other through a communication link, and the edge server 102 and the cloud data center 103 may also communicate with each other through a communication link. The communication link may include various connection types, such as wired and/or wireless communication links, and so forth.
Terminal device 101 may be any of a variety of electronic devices that support web browsing, including but not limited to smart phones, tablets, laptop computers, desktop computers, and the like. In particular, the terminal device 101 may comprise a mobile end device, and more particularly, the mobile end device may comprise a mobile internet of things device. A mobile internet of things device is an internet of things terminal based on a mobile handheld device, such as an autonomous vehicle, a smart home appliance, a smart access control system, a video terminal device, and the like.
A user may use terminal device 101 to interact with edge server 102 over a communication link to receive or send messages or the like. Similarly, the edge server 102 may also interact with the cloud data center 103 via a communication link to receive or send messages, and the like. Various messaging client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, and/or social platform software, etc. (by way of example only) may be installed on terminal device 101.
The cloud data center 103 may be a data center, a cloud computing center, or a super computing center, has powerful computing and storage capabilities, and may provide various management or services, such as a background management server (for example only) that provides support for a website browsed by a user using the terminal device 101. The backend management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a web page, information, or data obtained or generated according to the user request) to the terminal device 101 through the edge server 102.
It should be noted that the neural network model training method for image processing provided in the embodiments of the present disclosure may be generally performed by the cloud data center 103. Accordingly, the neural network model training device for image processing provided by the embodiment of the present disclosure may be generally disposed in the cloud data center 103.
Specifically, the image processing related to the embodiment of the present disclosure may include a mobile-side foreign currency recognition service, a mobile-side garbage classification service, or a digital recognition service. For example, the mobile-end foreign currency identification service is used for accurately, quickly and automatically identifying the currency type, the amount and the real-time exchange rate of foreign currencies; the mobile-end garbage classification service can automatically identify recyclable garbage, unrecyclable garbage, kitchen garbage or harmful garbage for citizens; the digital identification service can automatically identify the bank card number or the express bill number.
It should be understood that the number of end devices, edge servers, and cloud data centers in fig. 1 is merely illustrative. According to implementation needs, any number of terminal devices, edge servers and cloud data centers can be provided.
Fig. 2 schematically shows a flow diagram of a neural network model training method for image processing according to an embodiment of the present disclosure. FIG. 3 schematically illustrates an operational flow diagram of a neural network model training method for image processing, in accordance with an embodiment of the present disclosure.
The method shown in fig. 2 will be described in detail with reference to fig. 3. In the embodiment of the present disclosure, the neural network model training method for image processing may include operations S201 to S204.
In operation S201, a source image to be processed is subjected to data enhancement, and the same transparent color and background images of different styles are added to obtain an enhanced data set.
In operation S202, an edge position of a capture object of each image in the enhanced data set is extracted using an edge detection algorithm.
In operation S203, boundary region information corresponding to the edge position is calculated using a morphological processing method, and capture object information of each image in the enhanced data set is intercepted according to the boundary region information to form a training data set.
In operation S204, the training data set is input into the convolutional neural network model for training, so as to obtain an image classification model.
Through the embodiment of the disclosure, the multi-level image preprocessing of data enhancement, edge detection, and morphological processing ensures the classification accuracy of the image classification model. Moreover, the method can reduce the computing and storage resource requirements of the convolutional neural network model while still ensuring image processing performance.
To prevent overfitting and to simulate the real image processing requests or search conditions of mobile end users, the present disclosure employs data enhancement to expand the size of the data set. In the disclosed embodiment, the data enhancement may include at least one of affine transformation, color transformation, Gaussian noise, or blurring processing. After the data enhancement operations, the same transparent color and background images of different styles are uniformly added to the expanded data set, which facilitates the subsequent edge position extraction.
In particular, the affine transformation may include at least one of a rotation transformation, a scaling transformation, a reflection transformation, or a shear transformation, the scaling transformation including an isotropic scaling transformation or a non-isotropic scaling transformation. The affine transformation also includes combinations of these transformations applied any number of times in any order.
For ease of understanding, let the input image matrix be represented by P, the affine transformation matrix by R, and the output matrix by Q; let the rotation angle be θ, the horizontal and vertical translation displacements be t_x and t_y, respectively, and the scaling factor be s. The affine transformation process can then be characterized as follows:
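The formula itself is not reproduced in this text; a standard homogeneous-coordinate form consistent with the symbols just defined (a reconstruction, not the verbatim original) is:

$$
Q = R\,P, \qquad
R = \begin{bmatrix}
s\cos\theta & -s\sin\theta & t_x \\
s\sin\theta & s\cos\theta & t_y \\
0 & 0 & 1
\end{bmatrix},
$$

so that each homogeneous pixel coordinate $[x,\ y,\ 1]^{T}$ of $P$ is mapped to $R\,[x,\ y,\ 1]^{T}$ in $Q$.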
FIG. 4 schematically illustrates a block diagram of various operations of an affine transformation according to an embodiment of the present disclosure.
As shown in fig. 4, the translation and rotation transformations are conventional image transformations. The scaling transformation can be further divided into isotropic and non-isotropic scaling: when each axis has the same scaling factor, the scaling is isotropic; otherwise, it is non-isotropic. The shear transformation translates all points proportionally in some specified direction.
In addition, the color transformation refers to the transformation of image brightness, saturation and contrast, and is implemented by processing RGB three-channel data of an input image. Gaussian noise refers to randomly superimposing noise data satisfying gaussian distribution on an input image. The blurring process is implemented by using a low-pass filter.
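As a minimal sketch of these enhancement operations (assuming OpenCV and NumPy; the function name and parameter values are illustrative, not taken from the disclosure):

```python
import cv2
import numpy as np

def augment(image):
    """Apply the data enhancement operations described above to one source image."""
    h, w = image.shape[:2]

    # Affine transformation: rotation by theta with isotropic scaling s,
    # plus translation (tx, ty), matching the matrix R characterized above.
    theta, s, tx, ty = 15, 0.9, 10, -5  # illustrative values
    R = cv2.getRotationMatrix2D((w / 2, h / 2), theta, s)
    R[:, 2] += (tx, ty)
    affine = cv2.warpAffine(image, R, (w, h))

    # Color transformation: adjust brightness/contrast on the RGB channel data.
    color = cv2.convertScaleAbs(affine, alpha=1.2, beta=10)

    # Gaussian noise: randomly superimpose noise satisfying a Gaussian distribution.
    noise = np.random.normal(0, 8, color.shape).astype(np.float32)
    noisy = np.clip(color.astype(np.float32) + noise, 0, 255).astype(np.uint8)

    # Blurring process: implemented with a low-pass (Gaussian) filter.
    return cv2.GaussianBlur(noisy, (5, 5), 0)
```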
After background images of different styles are added to the expanded data set, edge detection is then performed to calculate the edge coordinates of the useful content in each image, that is, to extract the objects to be classified from cluttered backgrounds.
In the embodiment of the present disclosure, the edge detection algorithm includes an edge detection algorithm based on a trained Holistically-Nested Edge Detection (HED) network model. It can be understood that classical edge detection can be implemented with Canny-style operators, such as the cv2.Canny and cv2.findContours functions in OpenCV, but many threshold parameters need to be set, and such methods cannot work reliably in real mobile-end image processing scenes. The present disclosure constructs the edge detector from a trained HED network model, which works reliably in general scenes.
Through the embodiment of the disclosure, for the detection of edges and object boundaries in the image processing field, the trained HED network model realizes end-to-end training and prediction based on the whole image while completing multi-scale feature learning.
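For contrast, a minimal sketch of the classical OpenCV baseline named above (OpenCV 4 API; the two threshold values are exactly the kind of hand-tuned parameters the trained HED detector avoids):

```python
import cv2

def classical_edges(image, low=50, high=150):
    """Canny edge map plus external contours; thresholds must be tuned per scene."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return edges, contours
```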
To further refine the boundary position of the capture object in each image of the data set, a morphological processing method can be used to make the edge contour of the capture object clearer.
In the disclosed embodiment, the morphological processing method includes an erosion operation, a dilation operation, an opening operation, or a closing operation.
The erosion operation shrinks or thins the target object in the input image, with the thinning controlled by a structuring element. The dilation operation thickens or lengthens the target object in the input image, with the degree of thickening or lengthening controlled by the set of structuring elements.
The opening and closing operations are different combinations of the erosion and dilation operations: an opening operation is an erosion followed by a dilation, while a closing operation is a dilation followed by an erosion.
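A minimal sketch of these four operations with OpenCV (edge_map is assumed to be the binary edge image from the previous step; the 5 × 5 structuring element is illustrative):

```python
import cv2
import numpy as np

def refine_edges(edge_map):
    """Morphological refinement of a binary edge image."""
    kernel = np.ones((5, 5), np.uint8)  # structuring element
    eroded = cv2.erode(edge_map, kernel)                           # thin the object
    dilated = cv2.dilate(edge_map, kernel)                         # thicken the object
    opened = cv2.morphologyEx(edge_map, cv2.MORPH_OPEN, kernel)    # erosion, then dilation
    closed = cv2.morphologyEx(edge_map, cv2.MORPH_CLOSE, kernel)   # dilation, then erosion
    return closed  # a closed map yields a solid boundary region, as in FIG. 5
```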
Fig. 5 schematically shows a structure diagram of boundary region information obtained by morphological processing according to an embodiment of the present disclosure.
As shown in fig. 5, the capture object in the image may be, for example, a banknote, and the boundary region where the banknote is located is the white part in the figure. The boundary region is clear. Therefore, through the multi-level image preprocessing of data enhancement, edge detection, and morphological processing, the embodiment of the disclosure can accurately locate the frame of the capture object against a cluttered background, ensuring the image preprocessing quality.
Further, after the step of intercepting the capture object information of each image in the enhanced data set according to the boundary region information, the method further comprises: setting the background area outside the boundary region of each image to white. A uniform background outside the boundary region where the capture object lies makes it convenient to feed the images uniformly into the subsequent convolutional neural network model for image classification.
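A sketch of this interception-and-whitening step (the helper name and its mask argument are hypothetical; mask is assumed to be the binary boundary region of FIG. 5, white where the capture object lies):

```python
import cv2
import numpy as np

def crop_capture_object(image, mask):
    """Crop to the boundary region and whiten everything outside it."""
    x, y, w, h = cv2.boundingRect(mask)        # boundary region information
    result = np.full_like(image, 255)          # uniform white background
    region = mask[y:y + h, x:x + w].astype(bool)
    result[y:y + h, x:x + w][region] = image[y:y + h, x:x + w][region]
    return result[y:y + h, x:x + w]            # intercepted training sample
```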
In an embodiment of the disclosure, the convolutional neural network model comprises a lightweight network model.
It can be understood that the computing power and storage space of a mobile terminal are often limited, and excessive time consumption will seriously degrade the user experience, so directly deploying a convolutional neural network on a resource-limited mobile device often cannot meet business requirements. The present disclosure performs model training on the cloud side with a lightweight network model, which saves the computation, storage, and wireless resources occupied by the model while ensuring image processing capability, and facilitates efficient deployment of the lightweight network model under resource-limited conditions.
Further, to better balance model footprint and image processing capability, the lightweight network model can include a MobileNet-v2 network model, which includes an input layer, a plurality of feature extraction layers, and a fully connected layer.
Wherein the input layer scales each image in the training data set to a preset size. To accommodate the image processing requirements of the MobileNet-v2 network model, the predetermined size may be 224 × 224 × 3.
The plurality of feature extraction layers include convolution layers, pooling layers, activation functions, batch normalization, and other operations, and complete the mapping of the original data to a hidden-layer feature space. The fully connected layer plays the role of a classifier, mapping the distributed feature representation learned by the feature extraction layers to the sample label space. In actual use, the fully connected layer may be implemented by a convolution operation with a 1 × 1 convolution kernel.
In the embodiment of the disclosure, the MobileNet-v2 network model is improved in the following way:
the parameters of the feature extraction layers in a pre-trained MobileNet-v2 network model are used as initialization parameters of the image classification model, and the image classification model is fine-tuned according to an actual image processing service data set; and
at the fully connected layer, the number of output categories is modified to the preset number of image categories, and the parameters of the fully connected layer are initialized using the Xavier method.
It will be appreciated that the Xavier method keeps the input and output distributions as similar as possible, which avoids the output values of the ReLU activation functions in subsequent network layers tending towards 0.
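A minimal sketch of this initialization-and-fine-tuning scheme (the disclosure names no framework; the torchvision MobileNet-v2 layout, the category count K, and the omitted training loop are all assumptions):

```python
import torch.nn as nn
from torchvision import models

K = 10  # preset number of image categories (placeholder)

# Feature extraction layers initialized from a pre-trained MobileNet-v2.
model = models.mobilenet_v2(pretrained=True)

# Replace the classification head so it outputs K categories, and
# initialize its parameters with the Xavier method.
in_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_features, K)
nn.init.xavier_uniform_(model.classifier[1].weight)
nn.init.zeros_(model.classifier[1].bias)

# The model is then fine-tuned on the actual service data set (loop omitted).
```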
Fig. 6 schematically shows a block diagram of an improved MobileNet-v2 network model according to an embodiment of the present disclosure.
Combining Table 1 and fig. 6, Table 1 is a structural table of the improved MobileNet-v2 network model, whose main features are the inverted residual and the linear bottleneck. The inverted residual is mainly used to enhance the extraction of image features to improve accuracy, and the linear bottleneck is mainly used to avoid the information loss of nonlinear functions.
To keep the values of the ReLU activation function well-resolved even at the low precision of mobile devices, the activation range of the ReLU function is limited. Experiments show that capping the ReLU output at 6 works best, and the capped function is renamed ReLU6.
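In code form, ReLU6 is a one-line clamp (a sketch, not the disclosure's implementation):

```python
import torch

def relu6(x: torch.Tensor) -> torch.Tensor:
    # ReLU with its activation range capped at 6: min(max(x, 0), 6)
    return torch.clamp(x, min=0.0, max=6.0)
```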
In Table 1, t is the expansion factor within the residual block; c is the output dimension, which also gives the number of convolution kernels; s is the convolution stride, which determines the downsampling factor; and n indicates that the operation in the current row is applied repeatedly n times.
TABLE 1

Input          Operator       t    c      n    s
224² × 3       Conv2d         -    32     1    2
112² × 32      Bottleneck     1    16     1    1
112² × 16      Bottleneck     6    24     2    2
56² × 24       Bottleneck     6    32     3    2
28² × 32       Bottleneck     6    64     4    2
14² × 64       Bottleneck     6    96     3    1
14² × 96       Bottleneck     6    160    3    2
7² × 160       Bottleneck     6    320    1    1
7² × 320       Conv2d 1×1     6    1280   1    1
7² × 1280      Avgpool 7×7    -    -      1    -
1 × 1 × 1280   Conv2d 1×1     -    K      -    -
With reference to Table 1 and fig. 6, the output classification dimension K of the improved MobileNet-v2 network model is the preset number of image categories.
Network quantization refers to representing the network's weights discretely as low-precision numerical values, compressing the original network by reducing the number of bits required to represent each weight. In the embodiment of the disclosure, the MobileNet-v2 network model quantizes the weights in a post-training quantization manner.
Specifically, the post-training quantization manner includes: after the training of the convolutional neural network model is finished, quantizing the trained weights and activation functions from 32-bit floating point numbers to 8-bit integers; or quantizing both the trained weights and activation functions from 32-bit floating point numbers to 16-bit floating point numbers.
In the MobileNet-v2 network model, the weight parameters and the activation function output values are 32-bit floating point numbers by default. The present disclosure adopts post-training quantization: floating point numbers are still used during training, and after training is finished the weights are quantized, with the weights and activation functions converted from 32-bit floating point numbers to 8-bit integers or 16-bit floating point numbers.
With embodiments of the present disclosure, the image classification model is improved on the basis of the lightweight MobileNet-v2 network model, and quantization compression is performed after training is completed to reduce the number of parameters and the computational complexity.
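As one concrete post-training quantization toolchain (the disclosure names no framework; TensorFlow Lite is an assumption here, and trained_model and representative_data_gen are placeholders for the trained Keras model and a calibration-data generator):

```python
import tensorflow as tf

# Assumed: trained_model is a tf.keras.Model; representative_data_gen
# is a generator yielding sample input batches for calibration.

# Variant 1: quantize weights and activations from 32-bit floats to 8-bit
# integers; the representative data set calibrates the activation ranges.
conv_int8 = tf.lite.TFLiteConverter.from_keras_model(trained_model)
conv_int8.optimizations = [tf.lite.Optimize.DEFAULT]
conv_int8.representative_dataset = representative_data_gen
model_int8 = conv_int8.convert()

# Variant 2: quantize weights and activations to 16-bit floats instead.
conv_fp16 = tf.lite.TFLiteConverter.from_keras_model(trained_model)
conv_fp16.optimizations = [tf.lite.Optimize.DEFAULT]
conv_fp16.target_spec.supported_types = [tf.float16]
model_fp16 = conv_fp16.convert()
```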
Fig. 7 schematically illustrates a flow chart of a neural network model training method for image processing according to another embodiment of the present disclosure.
As shown in fig. 7, for brevity, features that are the same as those of the method shown in fig. 2 are not repeated, and only features that are different from those of the foregoing embodiment are described below, that is, in another embodiment of the present disclosure, operations S710 to S720 may be further added to the method shown in fig. 2.
In operation S710, the image classification model is cut into a first partial model and a second partial model according to a preset model cutting point.
In operation S720, the first partial model and the second partial model are deployed at the mobile terminal and the edge server, respectively.
Further, the terminal equipment comprises mobile end equipment, and the mobile end equipment comprises mobile internet of things equipment.
Through another embodiment of the present disclosure, available resources in a three-level hierarchical network architecture of a terminal device, an edge server and a cloud data center are fully utilized, and computation, storage and wireless resources occupied by a model are saved while service quality is ensured, so that efficient deployment of the model on a mobile terminal is realized under a condition of resource limitation.
Further, in another embodiment of the present disclosure, after the step of deploying the first partial model and the second partial model in the mobile terminal and the edge server respectively, the method may further include: and respectively carrying out compression coding on the first partial model and the second partial model. Therefore, after compression coding, the first partial model and the second partial model are respectively deployed on the mobile terminal and the edge server, and the convolutional neural network model is further optimized.
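A minimal sketch of splitting at a preset cut point, again assuming the torchvision MobileNet-v2 layout (cut_point is a placeholder for the disclosure's preset model cutting point; compression coding of the two parts would follow before deployment):

```python
import torch.nn as nn
from torchvision import models

def split_model(model, cut_point: int):
    """Cut a MobileNet-v2-style model into a mobile-side part and an edge-side part."""
    layers = list(model.features.children())
    part1 = nn.Sequential(*layers[:cut_point])       # deployed on the mobile terminal
    part2 = nn.Sequential(*layers[cut_point:],       # deployed on the edge server
                          nn.AdaptiveAvgPool2d(1),
                          nn.Flatten(),
                          model.classifier)
    return part1, part2

part1, part2 = split_model(models.mobilenet_v2(), cut_point=7)
```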
Based on the neural network model training method for image processing, the disclosure also provides a neural network model training device for image processing. The apparatus will be described in detail below with reference to fig. 8 and 9.
Fig. 8 schematically shows a block diagram of a neural network model training apparatus for image processing according to an embodiment of the present disclosure.
As shown in fig. 8, the neural network model training apparatus 800 for image processing may include a data enhancement module 810, an edge detection module 820, a morphology processing module 830, and a model training module 840.
The data enhancement module 810 is used for performing data enhancement on the source image to be processed, and adding background images with the same transparent color and different styles to obtain an enhanced data set.
The edge detection module 820 is used for extracting the edge position of the capture object of each image in the enhanced data set using an edge detection algorithm.
The morphology processing module 830 is used for calculating boundary region information corresponding to the edge position by using a morphological processing method, and intercepting the capture object information of each image in the enhanced data set according to the boundary region information to form a training data set.
The model training module 840 is used for inputting the training data set into the convolutional neural network model for training to obtain an image classification model.
Further, in another embodiment of the present disclosure, the model training module 840 may include a model cutting unit 910 and a model compression unit 920.
The model cutting unit 910 is configured to cut the image classification model into a first partial model and a second partial model according to a preset model cutting point, and deploy the first partial model and the second partial model to the mobile terminal and the edge server, respectively.
The model compression unit 920 is configured to perform compression coding on the first partial model and the second partial model, respectively.
It should be noted that the apparatus part in the embodiment of the present disclosure corresponds to the method part in the embodiment of the present disclosure, and the description of the neural network model training apparatus part for image processing specifically refers to the neural network model training method part for image processing, and is not described herein again.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any plurality of the data enhancement module 810, the edge detection module 820, the morphology processing module 830, the model training module 840, the model cutting unit 910, and the model compression unit 920 may be combined in one module/unit/sub-unit to be implemented, or any one of the modules/units/sub-units may be split into a plurality of modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present disclosure, at least one of the data enhancement module 810, the edge detection module 820, the morphology processing module 830, the model training module 840, the model cutting unit 910, and the model compression unit 920 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware, and firmware, or in any suitable combination of any of them. Alternatively, at least one of the data enhancement module 810, the edge detection module 820, the morphology processing module 830, the model training module 840, the model cutting unit 910 and the model compression unit 920 may be at least partially implemented as a computer program module, which when executed, may perform a corresponding function.
FIG. 10 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure. The electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 10, the electronic device 1000 includes a processor 1010, a computer-readable storage medium 1020. The electronic device 1000 may perform a neural network model training method for image processing according to an embodiment of the present disclosure.
In particular, processor 1010 may include, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 1010 may also include on-board memory for caching purposes. Processor 1010 may be a single processing unit or multiple processing units for performing different acts of a method flow according to embodiments of the disclosure.
Computer-readable storage media 1020, for example, may be non-volatile computer-readable storage media, specific examples including, but not limited to: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and so on.
The computer-readable storage medium 1020 may comprise a computer program 1021, which computer program 1021 may comprise code/computer-executable instructions that, when executed by the processor 1010, cause the processor 1010 to perform a method according to an embodiment of the disclosure, or any variant thereof.
The computer program 1021 may be configured with computer program code, for example, comprising computer program modules. For example, in an example embodiment, the code in computer program 1021 may include one or more program modules, including, for example, module 1021A, module 1021B, and so on. It should be noted that the division and number of modules are not fixed, and those skilled in the art may use suitable program modules or program module combinations according to actual situations; when these program modules are executed by the processor 1010, the processor 1010 may execute the method according to the embodiment of the present disclosure or any variation thereof.
According to an embodiment of the present disclosure, at least one of the data enhancement module 810, the edge detection module 820, the morphology processing module 830, the model training module 840, the model cutting unit 910, and the model compression unit 920 may be implemented as a computer program module as described with reference to fig. 10, which, when executed by the processor 1010, may implement the respective operations described above.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer readable storage medium carries one or more programs which, when executed, implement a neural network model training method for image processing according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the neural network model training method for image processing provided by the embodiment of the disclosure.
In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Such programming languages include, but are not limited to, Java, C++, Python, the C language, and the like. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or combinations of features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or combinations are not expressly recited in the present disclosure. In particular, various combinations and/or combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.