Image processing method, image processing apparatus, terminal, and readable storage medium
1. An image processing method, comprising:
inputting an obtained original image into a segmentation model to obtain at least one frame of first image and at least one frame of second image, wherein the original image comprises a portrait, the first image is a portrait segmentation image comprising a portrait area and a first background area, the second image comprises at least one of a part segmentation image and an associated object segmentation image, the part segmentation image comprises a human body part area and a second background area, and the associated object segmentation image comprises an associated object area and a third background area; and
performing image processing on the original image according to the first image and/or the at least one frame of second image to obtain a target image.
2. The image processing method according to claim 1, characterized in that the image processing method further comprises:
performing first preprocessing on the original image to obtain a preprocessed image;
wherein the inputting the obtained original image into a segmentation model to obtain at least one frame of first image and at least one frame of second image comprises:
inputting the preprocessed image into the segmentation model to obtain the at least one frame of first image and the at least one frame of second image.
3. The image processing method according to claim 1, wherein one frame of the second image includes a part segmentation image, a human body part region in the part segmentation image is a first hair region, the portrait region includes a second hair region, and the first hair region in the part segmentation image is different from the second hair region in the portrait segmentation image;
wherein the performing image processing on the original image according to the first image and/or the at least one frame of second image to obtain a target image comprises:
fusing a first hair region in the part segmentation image into a second hair region in the portrait segmentation image to obtain a third image; and
performing image processing on the original image according to the third image to obtain the target image.
4. The image processing method according to claim 1, wherein one frame of the second image comprises a part segmentation image in which the human body part region is a face region,
wherein the performing image processing on the original image according to the first image and/or the at least one frame of second image to obtain a target image comprises:
acquiring a face region in the original image according to the face region in the part segmentation image; and
performing image processing on the face region in the original image to obtain the target image.
5. The image processing method according to claim 1, wherein one frame of the second image comprises an associated object segmentation image,
wherein the performing image processing on the original image according to the first image and/or the at least one frame of second image to obtain a target image comprises:
acquiring an associated object area in the original image according to the associated object area in the associated object segmentation image; and
performing image processing on the associated object area in the original image to obtain the target image.
6. The image processing method according to claim 1, wherein one frame of the second image comprises a part segmentation image, the human body part region in the part segmentation image is a first hair region, the portrait region comprises a second hair region, and the first hair region in the part segmentation image is different from the second hair region in the portrait segmentation image, the image processing method further comprising:
identifying a hair region and a non-hair region of the portrait in the original image input into the segmentation model;
performing gray processing on the original image to obtain a gray image of the original image, wherein the gray image of the original image comprises a first area, a second area and a third area, the first area is a part of the region corresponding to the hair region in the original image whose gray value is greater than a first threshold value, the second area is a part of the region corresponding to the hair region in the original image whose gray value is greater than a second threshold value, and the third area is a part of the region corresponding to the non-hair region in the original image whose gray value is greater than a third threshold value;
acquiring the first image according to the first area and the third area of the gray image of the original image, and acquiring the part segmentation image whose human body part area is the hair region according to the second area.
7. The image processing method according to claim 1, characterized in that the image processing method further comprises:
acquiring a sample image set, wherein the sample image set comprises a plurality of sample images, and each sample image comprises a portrait;
performing second preprocessing on each sample image to obtain a first mask image and at least one frame of second mask image, wherein the first mask image comprises a portrait area and a background area, and the second mask image comprises a human body part area and a background area and/or comprises an associated object area and a background area;
inputting the sample image into an initial model to obtain a first training image and at least one frame of second training image, wherein the first training image comprises a portrait area and a background area, and the second training image comprises a human body part area and a background area and/or comprises an associated object area and a background area;
calculating a value of a total loss function of the initial model according to the first mask image, the second mask image, the first training image and the second training image;
performing iterative training on the initial model according to the value of the total loss function to obtain the segmentation model.
8. The image processing method according to claim 7, wherein the calculating the value of the total loss function of the initial model according to the first mask image, the second mask image, the first training image, and the second training image comprises:
calculating a value of a first loss function according to the first training image and the first mask image;
calculating a value of a second loss function according to the second training image and the corresponding second mask image;
calculating the value of the total loss function according to the value of the first loss function, the value of the second loss function, a first weight corresponding to the first loss function, and a second weight corresponding to the second loss function.
9. The image processing method according to claim 7,
wherein when the human body part region in the second mask image is a hair region, the performing second preprocessing on each sample image to obtain a first mask image and at least one frame of second mask image comprises:
identifying a hair region and a non-hair region of the portrait in the sample image;
performing gray processing on the sample image to obtain a gray image, wherein the gray image comprises a first area, a second area and a third area, the first area is a part of the region corresponding to the hair region in the sample image whose gray value is greater than a first threshold value, the second area is a part of the region corresponding to the hair region in the sample image whose gray value is greater than a second threshold value, and the third area is a part of the region corresponding to the non-hair region in the sample image whose gray value is greater than a third threshold value;
acquiring the first mask image according to the first area and the third area of the gray image, and acquiring the second mask image comprising the hair region according to the second area.
10. An image processing apparatus characterized by comprising:
an image input module configured to input an obtained original image into a segmentation model to obtain at least one frame of first image and at least one frame of second image, wherein the original image comprises a portrait, the first image is a portrait segmentation image comprising a portrait area and a first background area, the second image comprises at least one of a part segmentation image and an associated object segmentation image, the part segmentation image comprises a human body part area and a second background area, and the associated object segmentation image comprises an associated object area and a third background area; and
a processing module configured to perform image processing on the original image according to the first image and/or the at least one frame of second image to obtain a target image.
11. A terminal, comprising:
a housing; and
one or more processors combined with the housing, the one or more processors being configured to perform the image processing method according to any one of claims 1 to 9.
12. A non-transitory computer-readable storage medium containing a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the image processing method of any one of claims 1 to 9.
Background
At present, a portrait region in an image is usually obtained by semantic segmentation or by matting. Semantic segmentation mainly performs multi-class segmentation, and the details of the segmented image are rough; matting mainly enhances some edge details and adds a transition zone. However, either method can only obtain a single portrait area. When image processing such as background blurring and face beautifying is performed on an image containing a portrait, different segmentation requirements usually exist for different parts of the human body, which is difficult to achieve with a conventional portrait segmentation model.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, a terminal and a non-volatile computer readable storage medium.
The embodiment of the application provides an image processing method. The image processing method comprises the steps of inputting an obtained original image into a segmentation model to obtain at least one frame of first image and at least one frame of second image, wherein the original image comprises a portrait, the first image is a portrait segmentation image comprising a portrait area and a first background area, the second image comprises at least one of a part segmentation image and an associated object segmentation image, the part segmentation image comprises a human body part area and a second background area, and the associated object segmentation image comprises an associated object area and a third background area; and according to the first image and/or the second image of at least one frame, carrying out image processing on the original image to obtain a target image.
The embodiment of the application also provides an image processing device. The image processing device comprises an image input module and a processing module. The image input module is used for inputting the acquired original image into a segmentation model so as to acquire at least one frame of first image and at least one frame of second image, wherein the original image comprises a portrait, the first image is a portrait segmentation image comprising a portrait area and a first background area, the second image comprises at least one of a part segmentation image and an associated object segmentation image, the part segmentation image comprises a human body part area and a second background area, and the associated object segmentation image comprises an associated object area and a third background area. The processing module is used for performing image processing on the original image according to the first image and/or the second image of at least one frame to obtain a target image.
The embodiment of the application also provides a terminal. The terminal includes a housing and one or more processors. One or more of the processors are associated with the housing. The one or more processors are configured to input an acquired original image into a segmentation model to acquire at least one frame of first image and at least one frame of second image, where the original image includes a portrait, the first image is a portrait segmentation image including a portrait region and a first background region, the second image includes at least one of a part segmentation image and an associated object segmentation image, the part segmentation image includes a human body part region and a second background region, and the associated object segmentation image includes an associated object region and a third background region; and according to the first image and/or the second image of at least one frame, carrying out image processing on the original image to obtain a target image.
The embodiment of the application also provides a nonvolatile computer readable storage medium containing the computer program. The computer program, when executed by a processor, causes the processor to perform an image processing method described below. The image processing method comprises the steps of inputting an obtained original image into a segmentation model to obtain at least one frame of first image and at least one frame of second image, wherein the original image comprises a portrait, the first image is a portrait segmentation image comprising a portrait area and a first background area, the second image comprises at least one of a part segmentation image and a related object segmentation image, the part segmentation image comprises a human body part area and a second background area, and the related object segmentation image comprises a related object area and a third background area; and according to the first image and/or the second image of at least one frame, carrying out image processing on the original image to obtain a target image.
In the image processing method, the image processing apparatus, the terminal and the non-volatile computer-readable storage medium according to the embodiments of the present application, the original image including the portrait is preprocessed and then input into the segmentation model, and the segmentation model can simultaneously output the portrait segmentation image, the part segmentation images of different parts of the portrait and/or the related object segmentation images of the related object. Compared with acquiring a single portrait region, the plurality of segmentation images allow the portrait region, the human body part region and/or the related object region to be acquired, which facilitates subsequent image processing of the original image according to the portrait segmentation image, the part segmentation image and/or the related object segmentation image.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart diagram of an image processing method in some embodiments of the present application;
FIG. 2 is a schematic diagram of an image processing apparatus according to some embodiments of the present disclosure;
FIG. 3 is a block diagram of a terminal in some embodiments of the present application;
FIGS. 4 and 5 are schematic flow charts of image processing methods according to some embodiments of the present disclosure;
FIG. 6 is a schematic diagram of an original image captured as a horizontal shot in some embodiments of the present application;
FIG. 7 is a schematic diagram of an original image taken in a portrait mode in some embodiments of the present application;
FIG. 8 is a schematic illustration of a rotation of a horizontal shot to a vertical shot in certain embodiments of the present application;
FIGS. 9-10 are schematic diagrams of segmentation models segmenting a pre-processed image in some embodiments of the present application;
FIG. 11 is a schematic flow chart diagram of an image processing method in some embodiments of the present application;
FIG. 12 is a schematic diagram illustrating the acquisition of a third image based on a first image and a second image according to some embodiments of the present disclosure;
FIGS. 13-16 are schematic flow charts of image processing methods according to certain embodiments of the present disclosure;
FIG. 17 is a schematic illustration of a first mask image and a second mask image captured in some embodiments of the present application;
FIG. 18 is a schematic flow chart diagram of an image processing method in some embodiments of the present application;
FIG. 19 is a schematic diagram illustrating the acquisition of a first mask image and a second mask image according to some embodiments of the present disclosure;
FIGS. 20-21 are schematic diagrams of an initial model segmentation sample image in some embodiments of the present application;
FIGS. 22 and 23 are schematic flow charts of image processing methods according to some embodiments of the present disclosure;
FIG. 24 is a schematic diagram illustrating the acquisition of a first image and a second image according to some embodiments of the present disclosure;
FIG. 25 is a schematic diagram of an interaction between a non-volatile computer-readable storage medium and a processor in some embodiments of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of explaining the embodiments of the present application, and are not to be construed as limiting the embodiments of the present application.
Referring to fig. 1, an embodiment of the present application provides an image processing method. The image processing method comprises the following steps:
02: inputting the obtained original image into a segmentation model to obtain at least one frame of first image and at least one frame of second image, wherein the original image comprises a portrait, the first image is a portrait segmentation image comprising a portrait area and a first background area, the second image comprises at least one of a part segmentation image and an associated object segmentation image, the part segmentation image comprises a human body part area and a second background area, and the associated object segmentation image comprises an associated object area and a third background area; and
03: performing image processing on the original image according to the first image and/or the at least one frame of second image to obtain a target image.
Referring to fig. 1 and fig. 2, an image processing apparatus 10 is further provided in the present embodiment. The image processing apparatus 10 includes an input module 12 and a processing module 13. Step 02 of the image processing method can be implemented by the input module 12, and step 03 can be implemented by the processing module 13. That is to say, the input module 12 is configured to input the acquired original image into the segmentation model to acquire at least one frame of first image and at least one frame of second image, where the original image includes a portrait, the first image is a portrait segmentation image including a portrait region and a first background region, the second image includes at least one of a part segmentation image and an associated object segmentation image, the part segmentation image includes a human body part region and a second background region, and the associated object segmentation image includes an associated object region and a third background region; the processing module 13 is configured to perform image processing on the original image according to the first image and/or the at least one frame of second image to obtain a target image.
Referring to fig. 1 and 3, the present application further provides a terminal 100, where the terminal 100 includes a housing 20 and one or more processors 30, and the one or more processors 30 are combined with the housing 20. Steps 02 and 03 of the above-described image processing method may also be implemented by the one or more processors 30. That is, the one or more processors 30 are configured to input the acquired original image into the segmentation model to acquire at least one frame of first image and at least one frame of second image, where the original image includes a portrait, the first image is a portrait segmentation image including a portrait region and a first background region, the second image includes at least one of a part segmentation image and an associated object segmentation image, the part segmentation image includes a human body part region and a second background region, and the associated object segmentation image includes an associated object region and a third background region; and perform image processing on the original image according to the first image and/or the at least one frame of second image to obtain a target image. It should be noted that the terminal 100 may be a mobile phone, a camera, a notebook computer, or an intelligent wearable device, and the following embodiments only use the terminal 100 as a mobile phone for description.
In any of the above embodiments, after the acquired original image is input into the segmentation model, the acquired second images may all be part segmentation images; alternatively, the second images may all be related object segmentation images; alternatively, the plurality of frames of second images may include both part segmentation images and related object segmentation images. If the second images include multiple frames of part segmentation images, the human body part region in each frame of part segmentation image may be a different human body part (for example, the second images include two frames of part segmentation images, where the human body part region in one frame of part segmentation image is a hair region, and the human body part region in the other frame of part segmentation image is a face region).
In the image processing method, the image processing apparatus 10 and the terminal 100 in the embodiment of the application, the original image including the portrait is preprocessed and then input into the segmentation model, and the segmentation model can simultaneously output the portrait segmentation image, the part segmentation images of different parts of the portrait and/or the related object segmentation images of the related objects. Compared with acquiring a single portrait area, the plurality of segmentation images allow the portrait area, the human body part area and/or the related object area to be acquired, which facilitates the image processing of the original image according to the portrait segmentation image, the part segmentation image and/or the related object segmentation image.
In an example, the terminal 100 or the image processing apparatus 10 may further include an imaging module 40, and the terminal 100 or the image processing apparatus 10 may acquire an image of a person through the imaging module 40 to obtain an original image; in another example, the terminal 100 or the image processing apparatus 10 may further include a memory 50, the memory 50 stores an original image containing a portrait in advance, and the processor 30 of the terminal 100 may obtain the original image from the memory 50; in still another example, the terminal 100 or the image processing apparatus 10 may also acquire an original image including a portrait through input of a user. The specific method for acquiring the original image is not limited herein, and the acquired original image needs to contain a portrait.
The image input into the segmentation model needs to meet the requirements of the segmentation model on the input image; that is, the segmentation model may place certain requirements on the properties of the input image, and only an input image meeting these requirements can be correctly processed by the segmentation model. Therefore, after the original image containing the portrait is acquired, the original image needs to be processed before being input into the segmentation model. Referring to fig. 4, in some embodiments, the image processing method further includes:
01: performing first preprocessing on an original image to obtain a preprocessed image;
step 02: inputting the acquired original image into a segmentation model to obtain at least one frame of first image and at least one frame of second image, includes:
021: the preprocessed image is input to a segmentation model to obtain at least one frame of first image and at least one frame of second image.
Referring to fig. 2, in some embodiments, the image processing apparatus 10 further includes a first preprocessing module 11. Step 01 can be implemented by the first preprocessing module 11, and step 021 can be implemented by the input module 12. That is, the first preprocessing module 11 is further configured to perform first preprocessing on the original image to obtain a preprocessed image. The input module 12 is further configured to input the preprocessed image into the segmentation model to obtain at least one frame of the first image and at least one frame of the second image.
Referring to fig. 3, in some embodiments, step 01 and step 021 can be implemented by one or more processors 30. That is, the processor 30 is further configured to perform a first pre-processing on the original image to obtain a pre-processed image; and inputting the preprocessed image into the segmentation model to obtain at least one frame of first image and at least one frame of second image.
The original image is subjected to the first preprocessing to obtain the preprocessed image, so that the preprocessed image can meet the requirements of the segmentation model on the input image. In this way, after the preprocessed image is input into the segmentation model, the segmentation model can correctly process the preprocessed image. Specifically, referring to fig. 1 and 5, in some embodiments, step 01: performing first preprocessing on the original image to obtain a preprocessed image, includes:
011: detecting whether the original image is a horizontally shot image or a vertically shot image;
012: if the original image is a horizontal shot image, rotating the original image to change the original image into a vertical shot image; performing normalization processing on the rotated original image to obtain a preprocessed image;
013: and if the original image is a vertical shot image, performing normalization processing on the original image to obtain a preprocessed image.
Referring to fig. 2, in some embodiments, step 011, step 012 and step 013 can be performed by the first preprocessing module 11. That is, the first preprocessing module 11 is further configured to detect whether the original image is a horizontally shot image or a vertically shot image; if the original image is a horizontal shot image, rotating the original image to change the original image into a vertical shot image; performing normalization processing on the rotated original image to obtain a preprocessed image; and if the original image is a vertical shot image, performing normalization processing on the original image to obtain a preprocessed image.
Referring to fig. 3, in some embodiments, step 011, step 012, and step 013 can be performed by one or more processors 30. That is, the processor 30 is also used to detect whether the original image is horizontally shot or vertically shot; if the original image is a horizontal shot image, rotating the original image to change the original image into a vertical shot image; performing normalization processing on the rotated original image to obtain a preprocessed image; and if the original image is a vertical shot image, performing normalization processing on the original image to obtain a preprocessed image.
Illustratively, after an original image containing a portrait is acquired, whether the original image is a horizontal shot image or a vertical shot image is detected. If the terminal 100 or the image processing apparatus 10 is in the horizontal shooting mode when the current original image is shot, the current original image is a horizontal shot image; if the terminal 100 or the image processing apparatus 10 is in the vertical shooting mode when the current original image is shot, the current original image is a vertical shot image. In some embodiments, whether the original image is a horizontal shot image or a vertical shot image can be detected through the width and the height of the original image. For example, the terminal 100 captures a current original image. As shown in fig. 6 and 7, the terminal 100 includes a first side 101 and a second side 102 adjacent to each other, and the length of the first side 101 is longer than that of the second side 102. The first side 101 is the long side of the terminal 100 and the second side 102 is the wide side of the terminal 100. The width w and the height h of the original image are acquired, wherein the length of the side of the original image parallel to the long side of the terminal 100 is the height h of the original image, and the length of the side of the original image parallel to the wide side of the terminal 100 is the width w of the original image. If the width w of the original image is greater than the height h (as shown in fig. 6), the original image is a horizontal shot image; if the width w of the original image is smaller than the height h (as shown in fig. 7), the original image is a vertical shot image.
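A minimal sketch of this orientation check, assuming OpenCV is used to read the image (the function name below is chosen only for illustration), is as follows:

    import cv2

    def is_horizontal_shot(image_path: str) -> bool:
        """Return True when the image is a horizontal shot, i.e. its width w
        is greater than its height h."""
        image = cv2.imread(image_path)   # array of shape (h, w, 3)
        h, w = image.shape[:2]
        return w > h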
Referring to fig. 8, when the current image is determined to be a horizontal shot image, the original image is rotated to become a vertical shot image, and normalization processing is performed on the rotated original image to obtain a preprocessed image, which is beneficial for the segmentation model to correctly process the preprocessed image. For example, in one example, normalization may be performed by dividing the pixel values of all pixels in the rotated original image by 255; in another example, normalization may also be performed by subtracting 127.5 from the pixel values of all pixels in the rotated original image and dividing the result by 127.5, which is not limited herein. Further, in some embodiments, after the horizontal shot image serving as the original image is rotated into a vertical shot image, the rotated original image is scaled to a preset size, and then the scaled original image is normalized. The preset size is the size of the input image required by the segmentation model.
When the current image is determined to be a vertical shot image, normalization processing is directly performed on the original image to obtain a preprocessed image, so that the segmentation model can correctly process the preprocessed image. The specific way of normalization is the same as the above-mentioned specific way of normalization of the rotated original image, and is not described herein again. Of course, in some embodiments, the vertical shot image as the original image may be scaled to a preset size, and then the scaled original image may be normalized.
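A minimal sketch of the first preprocessing described above, assuming OpenCV and NumPy and an illustrative preset size (the actual preset size depends on the segmentation model), is as follows:

    import cv2
    import numpy as np

    # Illustrative preset size; the real value is whatever input size the
    # segmentation model requires.
    PRESET_WIDTH, PRESET_HEIGHT = 512, 768

    def first_preprocess(original: np.ndarray) -> np.ndarray:
        """Rotate a horizontal shot into a vertical shot, scale the image to
        the preset size, and normalize the pixel values."""
        h, w = original.shape[:2]
        if w > h:  # horizontal shot: rotate it into a vertical shot
            original = cv2.rotate(original, cv2.ROTATE_90_CLOCKWISE)
        scaled = cv2.resize(original, (PRESET_WIDTH, PRESET_HEIGHT))
        # Normalization scheme 1: divide all pixel values by 255.
        preprocessed = scaled.astype(np.float32) / 255.0
        # Normalization scheme 2 (alternative): (pixel - 127.5) / 127.5
        # preprocessed = (scaled.astype(np.float32) - 127.5) / 127.5
        return preprocessed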
Referring to fig. 9, the segmentation model 200 may include an encoder 201, a first decoder 202, and at least one second decoder 203. The encoder 201 can perform convolution and pooling on the preprocessed image input into the segmentation model 200 a plurality of times to obtain a feature map, where the feature map includes portrait feature information and at least one of human body part feature information and related object feature information. The first decoder 202 is configured to obtain a first image including a portrait area according to the portrait feature information in the feature map and output the first image, and the second decoder 203 is configured to obtain a second image including a corresponding human body part area according to the human body part feature information in the feature map and output the second image; and/or to obtain a second image containing the related object region according to the related object feature information in the feature map and output the second image.
In some embodiments, the second images obtained after the preprocessed image is input into the segmentation model correspond to the second decoders 203 and to the human body part feature information and the related object feature information included in the feature map obtained by the encoder 201. For example, in one example, the segmentation model 200 can output three second images, wherein one second image is a first human body part segmentation image, one second image is a second human body part segmentation image, and the other second image is a related object segmentation image. In this case, the feature map acquired by the encoder 201 further includes at least first human body part feature information, second human body part feature information, and related object feature information. The segmentation model 200 includes three second decoders 203, wherein one second decoder 203 can acquire the second image serving as the first human body part segmentation image according to the first human body part feature information in the feature map; another second decoder 203 can acquire the second image serving as the second human body part segmentation image according to the second human body part feature information in the feature map; and the remaining second decoder 203 can acquire the second image serving as the related object segmentation image according to the related object feature information in the feature map.
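A minimal sketch of such a shared-encoder, multi-decoder structure, written in PyTorch with illustrative layer sizes and names (the application does not limit the network to this particular structure), is as follows:

    import torch
    import torch.nn as nn

    class SegmentationModel(nn.Module):
        """Shared encoder with one first decoder (portrait mask) and several
        second decoders (one mask per human body part / related object)."""

        def __init__(self, num_second_decoders: int = 3):
            super().__init__()
            # Encoder 201: repeated convolution and pooling producing the feature map.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )

            def make_decoder() -> nn.Sequential:
                # Decoder: upsample the feature map back to a one-channel mask.
                return nn.Sequential(
                    nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
                    nn.ConvTranspose2d(32, 1, 2, stride=2), nn.Sigmoid(),
                )

            self.first_decoder = make_decoder()          # first decoder 202
            self.second_decoders = nn.ModuleList(        # second decoders 203
                [make_decoder() for _ in range(num_second_decoders)])

        def forward(self, x: torch.Tensor):
            feature_map = self.encoder(x)
            first_image = self.first_decoder(feature_map)
            second_images = [d(feature_map) for d in self.second_decoders]
            return first_image, second_images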
It should be noted that the portrait area includes the whole area of the portrait in the image, including but not limited to the face, hair, limbs, trunk, and the like, but does not include the related object area. The human body part region is the region of one part of the portrait in the image; for example, the human body part region may be the face region, the hair region, the eye region, or the hand region of the portrait in the image. The related object is an object associated with the portrait in the image; the related object may be an object carried by the portrait in the image, such as a handheld object, a backpack on the back, or a waist pack on the waist, and the related object area is the area where the related object is located, for example the area where a handheld object, a backpack, or a waist pack is located. In addition, the first image is a portrait segmentation image including a portrait area and a first background area. For example, as shown in fig. 10, the white area in the first image represents the portrait area and the black area represents the first background area; that is, the regions of the first image other than the portrait region are the first background region. The second image includes at least one of a part segmentation image and a related object segmentation image. The white area in the part segmentation image represents the human body part area, and the black area represents the second background area; that is, the regions other than the human body part region in the part segmentation image are the second background region. The white area in the related object segmentation image represents the related object area, and the black area represents the third background area; that is, the regions other than the related object region in the related object segmentation image are the third background region.
After the at least one frame of first image and the at least one frame of second image are acquired, the original image may be subjected to image processing according to the first image and/or the second image to acquire a target image.
In some embodiments, the original image may be image processed to obtain the target image based on the first image including the portrait area. Specifically, in some embodiments, the processor 30 (or the processing module 13) acquires the portrait area in the original image according to the portrait area in the first image (i.e., the white area portion in the first image of fig. 10). Illustratively, in one example, if the original image is scaled when the first preprocessing is performed, the first image is enlarged to the same size as the size of the original image; if the original image is rotated during the first preprocessing, the first image is reversely rotated. For example, if the original image is rotated by 90 ° to the left when the first preprocessing is performed, the first image is rotated by 90 ° to the right so that the first image corresponds to the original image. After the first image corresponds to the original image, the position in the original image corresponding to the portrait area in the first image (i.e. the white area in the first image in fig. 10) is the portrait area in the original image. Of course, the portrait area in the original image may also be obtained according to the portrait area in the first image in other manners, which is not limited herein. After acquiring the portrait area of the original image, the processor 30 (or the processing module 13) may perform background blurring on an area of the original image except the portrait area; alternatively, the processor 30 (or the processing module 13) may perform beautification processing on the portrait area in the original image; still alternatively, the processor 30 (or the processing module 13) may also extract the portrait area in the original image from the original image and place the extracted portrait area in another frame of image to generate a new image containing the portrait in the original image. The number of output first images may be one frame or a plurality of frames. When outputting the plurality of frames of first images, the user can freely select one frame of first image from the plurality of frames of first images to process the original image so as to obtain the target image. Therefore, the obtained target image can meet the requirements of users. Or, in some embodiments, the multiple frames of first images may also be fused to obtain a fused first image, and the original image is processed according to the fused first image to obtain the target image, which is not limited herein.
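A minimal sketch of the background blurring option described above, assuming OpenCV and NumPy, a 0/255 portrait mask, and an illustrative blur kernel size, is as follows:

    import cv2
    import numpy as np

    def blur_background(original: np.ndarray, first_image: np.ndarray,
                        rotated_in_preprocess: bool) -> np.ndarray:
        """Make the first image correspond to the original image (reverse
        rotation and resize), then blur everything outside the portrait area."""
        mask = first_image
        if rotated_in_preprocess:
            # The original image was rotated 90 degrees to the left before
            # segmentation, so rotate the mask 90 degrees to the right.
            mask = cv2.rotate(mask, cv2.ROTATE_90_CLOCKWISE)
        mask = cv2.resize(mask, (original.shape[1], original.shape[0]))
        blurred = cv2.GaussianBlur(original, (31, 31), 0)
        portrait = mask > 127            # white (255) pixels form the portrait area
        return np.where(portrait[..., None], original, blurred)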
Referring to fig. 1 and 11, in some embodiments, the original image may be processed according to the first image and the second image to obtain the target image. For example, in some embodiments, one frame of the second image includes a part segmentation image, the human body part region in the part segmentation image is a first hair region H1 (shown in the second image in fig. 12), the portrait region includes a second hair region H2 (shown in the first image in fig. 12), and the first hair region H1 in the part segmentation image is different from the second hair region H2 in the portrait segmentation image. Step 03: performing image processing on the original image according to the first image and/or the at least one frame of second image to obtain a target image, includes:
031: fusing a first hair region in the part segmentation image into a second hair region in the portrait segmentation image to obtain a third image; and
032: performing image processing on the original image according to the third image to obtain a target image.
Referring to fig. 2, in some embodiments, the steps 031 and 032 can be implemented by the processing module 13. That is, the processing module 13 is configured to fuse the first hair region H1 (shown in the second image in fig. 12) in the part segmentation image into the second hair region H2 (shown in the first image in fig. 12) in the portrait segmentation image to obtain the third image; and perform image processing on the original image according to the third image to obtain the target image.
Referring to fig. 3, in some embodiments, the steps 031 and 032 can be implemented by one or more processors 30. That is, the processor 30 is further configured to fuse the first hair region H1 (shown in the second image in fig. 12) in the part segmentation image into the second hair region H2 (shown in the first image in fig. 12) in the portrait segmentation image to obtain the third image; and perform image processing on the original image according to the third image to obtain the target image.
Specifically, referring to fig. 12, in some embodiments, one frame of the second image includes a part segmentation image, and the human body part area in the part segmentation image is the first hair region H1. In this case, the feature map acquired by the encoder 201 in the segmentation model 200 includes hair feature information, and one second decoder 203 is used to acquire the part segmentation image including the hair region according to the hair feature information. The portrait area in the first image also includes a second hair region H2, and the first hair region H1 in the part segmentation image is different from the second hair region H2 in the portrait segmentation image. Further, in some embodiments, the hairs of the second hair region H2 in the portrait region of the first image converge more inward (toward the face) than the hairs of the first hair region H1 in the part segmentation image; conversely, the hairs of the first hair region H1 in the part segmentation image spread more outward (toward the second background region) than the hairs of the second hair region H2 in the portrait region of the first image.
After the first image and the part segmentation image including the first hair region H1 are obtained, the first hair region H1 in the part segmentation image is fused into the second hair region H2 in the portrait segmentation image to obtain the third image. As shown in fig. 12, the portrait area in the first image is a white area, and the other area in the first image is a black area (the first background area); it is understood that the pixel value of the pixels in the white area is 255 and the pixel value in the black area is 0. That is, in the first image, the pixel value of the pixel points located in the portrait area is 255, and the pixel value of the pixel points located outside the portrait area is 0. Similarly, in the part segmentation image including the first hair region H1, the pixel values of the pixels located in the first hair region H1 are 255, and the pixel values of the pixels located in the regions other than the first hair region H1 are 0.
Referring to fig. 12, in some embodiments, the pixel values of the pixels in the part segmentation image including the first hair region H1 are obtained in sequence. When the obtained pixel value of a certain pixel is 255, that is, the pixel is located in the first hair region H1 in the part segmentation image, the pixel value of the pixel located at the same position in the first image is obtained. If the pixel value of the pixel point located at the same position in the first image is 255, it indicates that the pixel point at the corresponding position in the first image is located in the portrait area, and at this time the pixel value of the next pixel point in the part segmentation image containing the first hair region H1 is continuously obtained; if the pixel value of the pixel point located at the same position in the first image is 0, it indicates that the pixel point at the corresponding position in the first image is not located in the portrait region, the pixel value of the pixel point at the corresponding position in the first image is updated to 255, and then the pixel value of the next pixel point in the part segmentation image containing the first hair region H1 is continuously obtained, until all the pixel points in the part segmentation image containing the first hair region H1 are traversed, and the updated first image is used as the third image. The third image includes the fused portrait area and the first background area. Since the portrait area in the third image is obtained by fusing the first hair region H1 in the part segmentation image into the second hair region H2 in the portrait segmentation image, the hairs of the part of the hair region close to the human body in the portrait area of the third image converge inward, and the hairs of the part far away from the human body spread outward, which is beneficial to extracting the hair region more finely. That is, the hair region in the third image can fit the actual hair region in the original image more closely, so that the accuracy of the portrait area in the third image is improved.
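The pixel-by-pixel traversal above sets a pixel of the updated first image to 255 whenever either mask is 255, so it can be sketched compactly as a pixel-wise maximum of the two masks (assuming both masks have the same size and contain only the values 0 and 255):

    import numpy as np

    def fuse_hair(first_image: np.ndarray, hair_part_image: np.ndarray) -> np.ndarray:
        """Fuse the first hair region H1 of the part segmentation image into the
        portrait area of the first image to obtain the third image."""
        assert first_image.shape == hair_part_image.shape
        return np.maximum(first_image, hair_part_image)  # 255 wherever either mask is 255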
After acquiring the third image, processor 30 performs image processing on the original image according to the third image to acquire a target image. Specifically, in some embodiments, the processor 30 (or the processing module 13) acquires the portrait area in the original image according to the portrait area in the third image (i.e., the white area portion in the third image), and the specific acquisition implementation is the same as that of the embodiment that acquires the portrait area in the original image according to the portrait area in the first image, which is not repeated herein. Compared with the method for acquiring the portrait area in the original image only according to the first image, the method for acquiring the portrait area in the original image has the advantages that the edge of the portrait area in the original image is more attached to the actual edge of the portrait in the original image, and accuracy of acquiring the portrait area in the original image is improved. Similarly, after acquiring the portrait area of the original image, the processor 30 (or the processing module 13) may perform background blurring processing on the area of the original image except for the portrait area; alternatively, the processor 30 (or the processing module 13) may perform beautification processing on the portrait area in the original image; still alternatively, the processor 30 (or the processing module 13) may also extract the portrait area in the original image from the original image and place the extracted portrait area in another frame of image to generate a new image containing the portrait in the original image.
It should be noted that, in some embodiments, the processor 30 (or the processing module 13) may further perform connected component analysis on the portrait area in the first image, and remove an area smaller than a certain threshold, so as to avoid false detection of a non-portrait area in the original image as the portrait area.
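A minimal sketch of such a connected component analysis, assuming OpenCV and an illustrative area threshold, is as follows:

    import cv2
    import numpy as np

    def remove_small_components(mask: np.ndarray, min_area: int = 500) -> np.ndarray:
        """Keep only connected components of the portrait mask whose area is at
        least min_area pixels, to avoid false detection of small non-portrait areas."""
        num, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
        cleaned = np.zeros_like(mask)
        for i in range(1, num):                      # label 0 is the background
            if stats[i, cv2.CC_STAT_AREA] >= min_area:
                cleaned[labels == i] = 255
        return cleaned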
Referring to fig. 1 and 13, in some embodiments, the original image may be image-processed according to the second image to obtain the target image. Illustratively, in some embodiments, one frame of the second image includes a part segmentation image, and the human body part region in the part segmentation image is a face region. Step 03: performing image processing on the original image according to the first image and/or the at least one frame of second image to obtain a target image, includes:
033: acquiring the face region in the original image according to the face region in the part segmentation image; and
034: performing image processing on the face region in the original image to obtain a target image.
Referring to fig. 2, in some embodiments, the steps 033 and 034 may be implemented by the processing module 13. That is, the processing module 13 is further configured to acquire the face region in the original image according to the face region in the part segmentation image; and perform image processing on the face region in the original image to obtain a target image.
Referring to fig. 3, in some embodiments, the steps 033 and 034 may be implemented by one or more processors 30. That is, the processor 30 is further configured to acquire the face region in the original image according to the face region in the part segmentation image; and perform image processing on the face region in the original image to obtain a target image.
Specifically, in some embodiments, one frame of the second image includes a part segmentation image, and as shown in the second image from top to bottom in fig. 10, the human body part region in the part segmentation image is a face region. In this case, the feature map acquired by the encoder 201 in the segmentation model 200 includes face feature information, and one second decoder 203 is used to acquire the part segmentation image including the face region according to the face feature information. The processor 30 (or the processing module 13) acquires the face region in the original image according to the part segmentation image including the face region (i.e., the white region portion in the part segmentation image of the face region in fig. 10). Illustratively, in one example, if the original image is scaled during the first preprocessing, the part segmentation image including the face region is enlarged to the same size as the original image; if the original image is rotated during the first preprocessing, the part segmentation image including the face region is rotated in the reverse direction. For example, when the original image is rotated by 90° to the left in the first preprocessing, the part segmentation image including the face region is rotated by 90° to the right so that it corresponds to the original image. After the part segmentation image containing the face region corresponds to the original image, the position in the original image corresponding to the face region in the part segmentation image (i.e., the white region in the part segmentation image of the face region in fig. 10) is the face region in the original image. Of course, the face region in the original image may also be obtained according to the face region in the part segmentation image in other manners, which is not limited herein. In this way, the face region in the original image can be obtained directly, which facilitates subsequent processing of the face in the original image. After acquiring the face region of the original image, the processor 30 (or the processing module 13) may perform image processing on the face in the original image to obtain the target image. For example, the processor 30 (or the processing module 13) may perform a beautifying operation on the face, or the processor 30 (or the processing module 13) may extract the face region in the original image from the original image and place it in another frame of image to generate a new image containing the face of the original image. It should be noted that, in some embodiments, the processor 30 (or the processing module 13) performs an erosion operation to eliminate possible edge false detections, such as hair; that is, to avoid falsely detecting a non-face region in the original image as the face region.
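A minimal sketch of the erosion operation mentioned above, assuming OpenCV and an illustrative kernel size, is as follows:

    import cv2
    import numpy as np

    def erode_face_mask(face_mask: np.ndarray, kernel_size: int = 5) -> np.ndarray:
        """Slightly shrink the face mask to strip possible edge false detections
        such as stray hair pixels."""
        kernel = np.ones((kernel_size, kernel_size), np.uint8)
        return cv2.erode(face_mask, kernel, iterations=1)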
Referring to fig. 1 and 14, in some embodiments, one frame of the second image includes a related object segmentation image, and step 03: performing image processing on the original image according to the first image and/or the at least one frame of second image to obtain a target image, includes:
035: acquiring the related object area in the original image according to the related object area in the related object segmentation image; and
036: performing image processing on the related object area in the original image to obtain a target image.
Referring to fig. 2, in some embodiments, the above steps 035 and 036 can be implemented by the processing module 13. That is, the processing module 13 is further configured to acquire the related object area in the original image according to the related object area in the related object segmentation image; and perform image processing on the related object area in the original image to obtain a target image.
Referring to fig. 3, in some embodiments, the above steps 035 and 036 can be implemented by one or more processors 30. That is, the processor 30 is further configured to acquire the related object area in the original image according to the related object area in the related object segmentation image; and perform image processing on the related object area in the original image to obtain a target image.
Specifically, in some embodiments, one frame of the second image includes the related object segmentation image, as shown in the third second image from top to bottom in fig. 10. In this case, the feature map acquired by the encoder 201 in the segmentation model 200 includes related object feature information, and one second decoder 203 is used to acquire the related object segmentation image including the related object area according to the related object feature information. The processor 30 (or the processing module 13) acquires the related object area in the original image according to the related object segmentation image including the related object area (i.e., the white area portion in the related object segmentation image). In one example, if the original image is scaled during the first preprocessing, the related object segmentation image including the related object region is enlarged to the same size as the original image; if the original image is rotated during the first preprocessing, the related object segmentation image including the related object region is rotated in the reverse direction. For example, when the original image is rotated by 90° to the left in the first preprocessing, the related object segmentation image including the related object region is rotated by 90° to the right so that it corresponds to the original image. After the related object segmentation image including the related object region corresponds to the original image, the position in the original image corresponding to the related object region in the related object segmentation image (i.e., the white region in the related object segmentation image of fig. 10) is the related object region in the original image. Of course, the related object region in the original image may also be obtained according to the related object region in the related object segmentation image in other manners, which is not limited herein. After the processor 30 acquires the related object region, the processor 30 performs image processing on the related object region in the original image to obtain the target image. Likewise, in some embodiments, the processor 30 (or the processing module 13) may blur the related object area in the original image; alternatively, the processor 30 (or the processing module 13) may extract the related object area in the original image from the original image and place it in another frame of image to generate a new image containing the related object of the original image; alternatively, the processor 30 (or the processing module 13) may replace or cover the related object area in the original image with another object to generate a new image including the portrait of the original image and the replacing or covering object.
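A minimal sketch of extracting the related object area and placing it in another frame of image, assuming OpenCV and NumPy, a 0/255 mask, and a target image of the same size as the original image, is as follows:

    import cv2
    import numpy as np

    def paste_related_object(original: np.ndarray, object_mask: np.ndarray,
                             target: np.ndarray) -> np.ndarray:
        """Copy the related object area of the original image onto another frame
        of image using the related object mask (hard-edged compositing)."""
        mask = cv2.resize(object_mask, (original.shape[1], original.shape[0]))
        keep = mask > 127                # white (255) pixels form the related object area
        result = target.copy()
        result[keep] = original[keep]
        return result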
Referring to fig. 15, in some embodiments, the image processing method may further include:
04: and inputting a plurality of sample images in the sample image set into the initial model for training so as to obtain a segmentation model.
Referring to fig. 2, in some embodiments, the image processing apparatus 10 may further include a training module 14, and step 04 may be implemented by the training module 14. That is, the training module 14 is configured to input a plurality of sample images in the sample image set to the initial model for training to obtain the segmentation model.
Referring to fig. 3, in some embodiments, step 04 may also be implemented by one or more processors 30. That is, the one or more processors 30 are further configured to input a plurality of sample images in the sample image set to the initial model for training to obtain the segmentation model.
In some embodiments, in order to enable the images output by the segmentation model to achieve the expected effect, the initial model needs to be set in advance, a plurality of sample images in the sample image set are input to the initial model, and the initial model undergoes a large amount of training to obtain the segmentation model. In one example, step 04 may be performed while the user is using the image processing apparatus 10 or the terminal 100; in this way, the segmentation model in use can be highly adaptive to the current use scenario, and the accuracy of the image processing can be ensured. In another example, step 04 may be performed before the user uses the image processing apparatus 10 or the terminal 100, and the obtained segmentation model may be stored in the memory 50 in advance and called directly from the memory 50 while the user is using the image processing apparatus 10 or the terminal 100; using the segmentation model in this way reduces the computation required to obtain the segmentation model during use and improves the processing efficiency of the whole image processing.
Specifically, referring to fig. 15 and 16, in some embodiments, step 04: inputting a plurality of sample images in a sample image set into an initial model for training to obtain a segmentation model, wherein the method comprises the following steps:
041: acquiring a sample image set, wherein the sample image set comprises a plurality of sample images, and each sample image comprises a portrait;
042: performing second preprocessing on each sample image to obtain a first mask image and at least one frame of second mask image, wherein the first mask image comprises a portrait area and a background area, and the second mask image comprises a human body part area and a background area and/or comprises a related object area and a background area;
043: inputting the sample image into an initial model to obtain a first training image and at least one frame of second training image, wherein the first training image comprises a portrait area and a background area, and the second training image comprises a human body part area and a background area and/or comprises a related object area and a background area;
044: calculating the value of the total loss function of the initial model according to the first mask image, the second mask image, the first training image and the second training image;
045: and performing iterative training on the initial model according to the value of the total loss function to obtain a segmentation model.
Referring to fig. 2, step 041, step 042, step 043, step 044 and step 045 may be implemented by the training module 14. That is, the training module 14 is configured to obtain a sample image set, where the sample image set includes a plurality of sample images, and each sample image includes a portrait; perform second preprocessing on each sample image to obtain a first mask image and at least one frame of second mask image, wherein the first mask image comprises a portrait area and a background area, and the second mask image comprises a human body part area and a background area and/or comprises a related object area and a background area; input the sample image into the initial model to obtain a first training image and at least one frame of second training image, wherein the first training image comprises a portrait area and a background area, and the second training image comprises a human body part area and a background area and/or comprises a related object area and a background area; calculate the value of the total loss function of the initial model according to the first mask image, the second mask image, the first training image and the second training image; and perform iterative training on the initial model according to the value of the total loss function to obtain the segmentation model.
Referring to fig. 3, step 041, step 042, step 043, step 044 and step 045 may be implemented by one or more processors 30. That is, the one or more processors 30 are further configured to obtain a sample image set, the sample image set including a plurality of sample images, each sample image including a portrait; perform second preprocessing on each sample image to obtain a first mask image and at least one frame of second mask image, wherein the first mask image comprises a portrait area and a background area, and the second mask image comprises a human body part area and a background area and/or comprises a related object area and a background area; input the sample image into the initial model to obtain a first training image and at least one frame of second training image, wherein the first training image comprises a portrait area and a background area, and the second training image comprises a human body part area and a background area and/or comprises a related object area and a background area; calculate the value of the total loss function of the initial model according to the first mask image, the second mask image, the first training image and the second training image; and perform iterative training on the initial model according to the value of the total loss function to obtain the segmentation model.
Specifically, the processor 30 (or the training module 14) acquires a sample image set, where the sample image set includes a plurality of sample images, and each sample image includes a portrait. Second preprocessing is performed on each sample image to obtain a first mask image and at least one frame of second mask image. As shown in fig. 17, the first mask image includes a portrait area (the white area in the first mask image represents the portrait area) and a background area (the black area in the first mask image represents the background area), and the second mask image includes a human body part area (the white area in the second mask image represents the human body part area) and a background area (the black area in the second mask image represents the background area); and/or the second mask image includes a related object area (the white area in the second mask image represents the related object area) and a background area (the black area in the second mask image represents the background area).
In one example, each frame of the second mask image includes a human body part region and a background region; or, in another example, each frame of the second mask image includes a related object region and a background region; alternatively, in another example, some of the second mask images include a human body part region and a background region, and the others include a related object region and a background region. More specifically, at least part of the plurality of frames of second mask images need to correspond to the second images, so that the trained segmentation model can output the second images required by the user. For example, if, after the preprocessed image is input into the segmentation model, the obtained second images include one frame of part segmentation image containing a hair region, one frame of part segmentation image containing a face region, and one frame of related object segmentation image containing a related object region, then the second mask images also include at least one mask image containing a hair region, one mask image containing a face region, and one mask image containing a related object region. In addition, each sample image is labeled with a portrait region, a plurality of human body part regions (e.g., a hair region, a face region, a trunk region, etc.), and a related object region. The processor 30 (or the training module 14) can obtain the corresponding mask image according to the labeled region. Taking the first mask image as an example, the processor 30 (or the training module 14) sequentially traverses the pixels in the sample image; if the current pixel is located in the labeled portrait region of the sample image, the pixel value of the pixel at the corresponding position in the first mask image is set to 255 (that is, that pixel in the first mask image is set to white), and if the current pixel is not located in the labeled portrait region of the sample image, the pixel value of the pixel at the corresponding position in the first mask image is set to 0 (that is, that pixel in the first mask image is set to black).
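In one non-limiting example, the generation of a mask image from an annotated region may be sketched as follows (Python with NumPy; the function name and the use of boolean annotation arrays are illustrative assumptions):

    import numpy as np

    def region_to_mask(height, width, labeled_region):
        # labeled_region: boolean array of shape (height, width), True where the
        # pixel belongs to the annotated region (e.g. the portrait region).
        # Returns an 8-bit mask image: 255 (white) inside the region, 0 (black) outside.
        mask = np.zeros((height, width), dtype=np.uint8)
        mask[labeled_region] = 255
        return mask

    # Hypothetical usage: one first mask (portrait) and several second masks.
    # portrait_mask = region_to_mask(h, w, portrait_annotation)
    # hair_mask     = region_to_mask(h, w, hair_annotation)
    # face_mask     = region_to_mask(h, w, face_annotation)
    # related_mask  = region_to_mask(h, w, related_object_annotation)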
Further, referring to fig. 16 and 18, in some embodiments, when the human body part region in the second mask image is a hair region, that is, when one frame of the second mask image includes a human body part region and that region is a hair region (the first second mask image from top to bottom in fig. 17), step 042: performing second preprocessing on each sample image to obtain a first mask image and at least one frame of second mask image, includes:
0421: identifying a hair region in the sample image and a non-hair region in the portrait;
0422: performing gray processing on the sample image to obtain a gray image, wherein the gray image comprises a first region, a second region and a third region, the first region is the part of the gray image corresponding to the hair region of the sample image in which the gray values are greater than a first threshold, the second region is the part of the gray image corresponding to the hair region of the sample image in which the gray values are greater than a second threshold, and the third region is the part of the gray image corresponding to the non-hair region of the sample image in which the gray values are greater than a third threshold;
0423: acquiring a first mask image according to the first region and the third region of the gray image, and acquiring a second mask image including a hair region according to the second region.
Referring to fig. 2, in some embodiments, steps 0421, 0422 and 0423 may be implemented by training module 14. That is, the training module 14 is also used to identify hair regions in the sample image and non-hair regions in the portrait; performing gray processing on the sample image to obtain a gray image, wherein the gray image comprises a first area, a second area and a third area, the first area is a part of the sample image, corresponding to the hair area, of which the gray value is greater than a first threshold value, the second area is a part of the sample image, corresponding to the hair area, of which the gray value is greater than a second threshold value, and the third area is a part of the sample image, corresponding to the non-hair area, of which the gray value is greater than a third threshold value; and acquiring a first mask image according to the first area and the third area of the gray scale image, and acquiring a second mask image comprising a hair area according to the second area.
Referring to fig. 3, in some embodiments, steps 0421, 0422 and 0423 may be implemented by one or more processors 30. That is, the processor 30 is also used to identify hair regions in the sample image and non-hair regions in the portrait; performing gray processing on the sample image to obtain a gray image, wherein the gray image comprises a first area, a second area and a third area, the first area is a part of the sample image, corresponding to the hair area, of which the gray value is greater than a first threshold value, the second area is a part of the sample image, corresponding to the hair area, of which the gray value is greater than a second threshold value, and the third area is a part of the sample image, corresponding to the non-hair area, of which the gray value is greater than a third threshold value; and acquiring a first mask image according to the first area and the third area of the gray scale image, and acquiring a second mask image comprising a hair area according to the second area.
Referring to fig. 19, when the second mask image includes a human body part region and that region is a hair region, in some embodiments, the hair region A in the sample image and the non-hair region B in the portrait are identified according to the labels in the sample image. The processor 30 (or the training module 14) performs gray processing on the sample image to obtain a gray image, where the gray image includes a first region a1, a second region a2, and a third region b1; the first region a1 is the portion of the gray image corresponding to the hair region A in which the gray values are greater than the first threshold, the second region a2 is the portion corresponding to the hair region A in which the gray values are greater than the second threshold, and the third region b1 is the portion corresponding to the non-hair region B in which the gray values are greater than the third threshold. Specifically, after obtaining the gray image, the processor 30 (or the training module 14) sequentially traverses the gray values of all pixel points in the gray image: if the pixel point at the corresponding position in the sample image is located in the hair region A and the gray value of the current pixel point is greater than the first threshold, the current pixel point belongs to the first region a1; if the pixel point at the corresponding position in the sample image is located in the hair region A and the gray value of the current pixel point is greater than the second threshold, the current pixel point belongs to the second region a2; if the pixel point at the corresponding position in the sample image is located in the non-hair region B of the portrait and the gray value of the current pixel point is greater than the third threshold, the current pixel point belongs to the third region b1.
After the first region a1, the second region a2, and the third region b1 are acquired, a first mask image including a portrait region is acquired from the first region a1 and the third region b1, and a second mask image including a hair region is acquired from the second region a2. For example, the processor 30 (or the training module 14) sequentially traverses all the pixel points in the gray image; if the current pixel point is located in the first region a1 or the third region b1, the pixel value of the pixel point at the corresponding position in the first mask image is set to 255 (that is, that pixel point in the first mask image is set to white); if the current pixel point is not located in the first region a1 or the third region b1, the pixel value of the pixel point at the corresponding position in the first mask image is set to 0 (that is, that pixel point in the first mask image is set to black). In this way, the first mask image including the portrait region is obtained from the first region a1 and the third region b1. Similarly, the processor 30 (or the training module 14) sequentially traverses all the pixel points in the gray image; if the current pixel point is located in the second region a2, the pixel value of the pixel point at the corresponding position in the second mask image is set to 255 (that is, that pixel point in the second mask image is set to white); if the current pixel point is not located in the second region a2, the pixel value of the pixel point at the corresponding position in the second mask image is set to 0 (that is, that pixel point in the second mask image is set to black). In this way, the second mask image including the hair region is obtained from the second region a2.
Because the pixel points in the hair region of the sample image whose gray values lie between the second threshold and the first threshold are not in the hair region of the portrait in the first mask image, the hair in the hair region of the portrait region in the first mask image converges inwards; and because the pixel points in the hair region of the sample image whose gray values are greater than the second threshold (which is smaller than the first threshold) are all in the hair region of the second mask image, the hair region of the second mask image expands outwards. This helps the trained segmentation model obtain a first image with an inwardly converged hair region and a second image with an outwardly expanded hair region. It should be noted that the first threshold is greater than the second threshold. In some embodiments, the second threshold may be equal to the third threshold. In particular, in one example, the first threshold may be 190, and the second threshold and the third threshold may both be 127.5.
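In one non-limiting example, steps 0421 to 0423 may be sketched as follows (Python with OpenCV and NumPy; the function name, the boolean annotation arrays and the default threshold values taken from the example above are illustrative assumptions):

    import cv2
    import numpy as np

    def build_masks(sample_bgr, hair_region, non_hair_region,
                    first_threshold=190, second_threshold=127.5, third_threshold=127.5):
        # hair_region / non_hair_region: boolean arrays marking the annotated hair
        # region A and the non-hair portrait region B of the sample image.
        gray = cv2.cvtColor(sample_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
        region_a1 = hair_region & (gray > first_threshold)       # first region a1
        region_a2 = hair_region & (gray > second_threshold)      # second region a2
        region_b1 = non_hair_region & (gray > third_threshold)   # third region b1
        # First mask (portrait): white where a1 or b1; second mask (hair): white where a2.
        first_mask = np.where(region_a1 | region_b1, 255, 0).astype(np.uint8)
        second_mask = np.where(region_a2, 255, 0).astype(np.uint8)
        return first_mask, second_mask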
After the second preprocessing of each frame of sample image, that is, after the first mask image and the second mask image of each frame of sample image are obtained, the sample images are input to the initial model to obtain a first training image and at least one frame of second training image. The first training image comprises a portrait region (the white part in the first training image represents the portrait region) and a background region (the black part in the first training image represents the background region). The second training image comprises at least one of a part segmentation training image and a related object segmentation training image, wherein the part segmentation training image comprises a human body part region (the white part in the part segmentation training image represents the human body part region) and a background region (the black part in the part segmentation training image represents the background region), and the related object segmentation training image comprises a related object region (the white part in the related object segmentation training image represents the related object region) and a background region (the black part in the related object segmentation training image represents the background region). In some embodiments, in order to enable the sample image input into the initial model to meet the requirements of the input image of the initial model, first preprocessing is performed on the sample image (including detecting whether the sample image is a horizontally shot image or a vertically shot image; if the sample image is a horizontally shot image, it is rotated into a vertically shot image and normalization is then performed on the rotated sample image, and if the sample image is a vertically shot image, normalization is performed on it directly), and the sample image subjected to the first preprocessing is then input into the initial model. The specific implementation of the first preprocessing on the sample image is the same as the specific implementation of the first preprocessing on the original image in the foregoing embodiments, and details are not repeated here.
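In one non-limiting example, the first preprocessing of the sample image may be sketched as follows (Python with OpenCV and NumPy; the width-versus-height orientation check and the division by 255 as normalization are illustrative assumptions, since the embodiments do not fix these details):

    import cv2
    import numpy as np

    def first_preprocess(image_bgr):
        # Rotate a horizontally shot (landscape) image into a vertically shot
        # (portrait) image, then normalize pixel values to [0, 1].
        h, w = image_bgr.shape[:2]
        if w > h:  # assumed check for a horizontally shot image: wider than tall
            image_bgr = cv2.rotate(image_bgr, cv2.ROTATE_90_COUNTERCLOCKWISE)
        return image_bgr.astype(np.float32) / 255.0  # assumed normalization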
Referring to fig. 20 and 21, the initial model 300 may include an initial encoder 301, a first initial decoder 302, and at least one second initial decoder 303. The initial encoder 301 is configured to perform convolution and pooling on the input sample image multiple times to obtain a feature map, where the feature map includes portrait feature information and at least one of human body part feature information and related object feature information. The first initial decoder 302 is configured to obtain a first training image including a portrait region according to the portrait feature information in the feature map and output the first training image, and the second initial decoder 303 is configured to obtain a second training image containing the corresponding human body part region according to the human body part feature information in the feature map and output the second training image; and/or to obtain a second training image containing the related object region according to the related object feature information in the feature map and output the second training image.
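In one non-limiting example, the structure of the initial model 300 may be sketched as follows (Python with PyTorch; the number of encoder stages, the channel widths and the decoder design are illustrative assumptions, not a definition of the actual network):

    import torch
    import torch.nn as nn

    class ConvBlock(nn.Sequential):
        # One encoder stage: two 3x3 convolutions followed by 2x2 max pooling.
        def __init__(self, c_in, c_out):
            super().__init__(
                nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2))

    class Decoder(nn.Module):
        # Upsamples the shared feature map back to input resolution and predicts one mask.
        def __init__(self, c_in, scale):
            super().__init__()
            self.up = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)
            self.head = nn.Conv2d(c_in, 1, 1)  # single-channel probability map
        def forward(self, feat):
            return torch.sigmoid(self.head(self.up(feat)))

    class InitialModel(nn.Module):
        # Shared initial encoder 301, first initial decoder 302 (portrait) and
        # several second initial decoders 303 (e.g. hair, face, related object).
        def __init__(self, num_second_decoders=3):
            super().__init__()
            self.encoder = nn.Sequential(ConvBlock(3, 32), ConvBlock(32, 64), ConvBlock(64, 128))
            self.first_decoder = Decoder(128, scale=8)
            self.second_decoders = nn.ModuleList(
                [Decoder(128, scale=8) for _ in range(num_second_decoders)])
        def forward(self, x):
            feat = self.encoder(x)                              # shared feature map
            first = self.first_decoder(feat)                    # first training image (portrait)
            seconds = [d(feat) for d in self.second_decoders]   # second training images
            return first, seconds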
After the first training image and the second training image are obtained, the processor 30 (or the training module 14) calculates a value of a total loss function of the initial model according to the first mask image, the second mask image, the first training image, and the second training image. Specifically, referring to fig. 16 and 22, in some embodiments, step 044: calculating the value of the total loss function of the initial model according to the first mask image, the second mask image, the first training image and the second training image, wherein the value of the total loss function of the initial model comprises the following steps:
0441: calculating a value of a first loss function according to the first training image and the first mask image;
0442: calculating a value of a second loss function according to the second training image and the corresponding second mask image;
0443: the value of the total loss function is calculated from the value of the first loss function, the value of the second loss function, a first weight corresponding to the first loss function, and a second weight corresponding to the second loss function.
Referring to fig. 2, in some embodiments, steps 0441, 0442, and 0443 may be implemented by training module 14. That is, the training module 14 is further configured to calculate a value of the first loss function according to the first training image and the first mask image; calculating a value of a second loss function according to the second training image and the corresponding second mask image; and calculating the value of the total loss function according to the value of the first loss function, the value of the second loss function, a first weight corresponding to the first loss function and a second weight corresponding to the second loss function.
Referring to fig. 3, in some embodiments, steps 0441, 0442 and 0443 may be implemented by one or more processors 30. That is, the one or more processors 30 are also configured to calculate a value of a first loss function from the first training image and the first mask image; calculating a value of a second loss function according to the second training image and the corresponding second mask image; and calculating the value of the total loss function according to the value of the first loss function, the value of the second loss function, a first weight corresponding to the first loss function and a second weight corresponding to the second loss function.
Specifically, after the first training image and the second training image are acquired, the processor 30 (or the training module 14) calculates the value of a first loss function L1 according to the first training image and the first mask image, and calculates the value of a second loss function L2 according to the second training image and the corresponding second mask image. The following description takes an example in which the second training images include two frames of part segmentation training images and one frame of related object segmentation training image, where one part segmentation training image is a hair region segmentation training image and the other is a face region segmentation training image. In this case, the second mask images include a second mask image containing a hair region, a second mask image containing a face region, and a second mask image containing a related object region, and the value of the second loss function L2 includes the value of the hair region loss function L_hair, the value of the face region loss function L_face, and the value of the related object region loss function L_hand. The processor 30 (or the training module 14) calculates the value of the first loss function L1 according to the difference between the portrait region in the first training image and the portrait region in the first mask image; calculates the value of the hair region loss function L_hair according to the difference between the hair region in the hair region segmentation training image and the hair region in the second mask image containing the hair region; calculates the value of the face region loss function L_face according to the difference between the face region in the face region segmentation training image and the face region in the second mask image containing the face region; and calculates the value of the related object region loss function L_hand according to the difference between the related object region in the related object segmentation training image and the related object region in the second mask image containing the related object region.
After obtaining the value of the first loss function L1 and the value of the second loss function L2, the processor 30 (or the training module 14) calculates the value of the total loss function Loss according to the value of the first loss function L1, the value of the second loss function L2, a first weight α1 corresponding to the first loss function L1, and a second weight α2 corresponding to the second loss function L2. In some embodiments, the value of the total loss function Loss is equal to the sum of the product of the first loss function L1 and the corresponding first weight α1 and the products of the second loss functions L2 and the corresponding second weights α2. For example, continuing with the case in which the value of the second loss function L2 includes the value of the hair region loss function L_hair, the value of the face region loss function L_face, and the value of the related object region loss function L_hand: the second weight α2 includes the weight α_hair of the hair region, the weight α_face of the face region, and the weight α_hand of the related object region, where α_hair corresponds to the hair region loss function L_hair, α_face corresponds to the face region loss function L_face, and α_hand corresponds to the related object region loss function L_hand. The value of the total loss function Loss can then be calculated by the formula Loss = α1·L1 + α_hair·L_hair + α_face·L_face + α_hand·L_hand.
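In one non-limiting example, the calculation of the total loss function Loss may be sketched as follows (Python with PyTorch; binary cross-entropy is used here as the per-region loss purely for illustration, since the embodiments do not fix the form of L1 and L2):

    import torch.nn.functional as F

    def total_loss(first_pred, second_preds, first_mask, second_masks,
                   alpha1=1.0, alpha_hair=1.0, alpha_face=1.0, alpha_hand=1.0):
        # Loss = α1·L1 + α_hair·L_hair + α_face·L_face + α_hand·L_hand, with binary
        # cross-entropy as the per-region loss; masks are expected in [0, 1].
        l1 = F.binary_cross_entropy(first_pred, first_mask)
        l_hair = F.binary_cross_entropy(second_preds[0], second_masks[0])
        l_face = F.binary_cross_entropy(second_preds[1], second_masks[1])
        l_hand = F.binary_cross_entropy(second_preds[2], second_masks[2])
        return alpha1 * l1 + alpha_hair * l_hair + alpha_face * l_face + alpha_hand * l_hand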
After the value of the total loss function Loss of the initial model is obtained, the initial model can be iteratively trained according to the value of the total loss function Loss to obtain the final segmentation model. In some embodiments, the initial model may be iteratively trained with an Adam optimizer according to the total loss function Loss until the loss value of the output result of the initial model converges, and the model at that point is saved as the trained segmentation model. The Adam optimizer combines the advantages of the AdaGrad (adaptive gradient) and RMSProp optimization algorithms: it takes into account both the first moment estimate (the mean of the gradient) and the second moment estimate (the uncentered variance of the gradient) of the gradient to calculate the update step.
It should be noted that the termination condition of the iterative training may include: the number of iterations reaches a target number; or the total loss value of the output result of the initial model meets a set convergence condition. In one example, the convergence condition is to make the total loss value as small as possible; with an initial learning rate of 1e-3, a learning rate that decays with the cosine of the step number, and a batch_size of 8, training is considered to have converged after 16 epochs. Here, batch_size can be understood as the batch parameter, whose upper limit is the total number of samples in the training set; an epoch refers to one pass of training over the entire data set using all samples in the training set, so the value of epoch is the number of times the entire data set is cycled through, and 1 epoch equals one round of training with all samples in the training set. In another example, the total loss value Loss satisfying the set convergence condition may include: the total loss value Loss is less than a set threshold. Of course, the specific convergence condition is not limited herein.
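In one non-limiting example, the iterative training with the Adam optimizer, cosine learning-rate decay, batch_size 8 and 16 epochs may be sketched as follows (Python with PyTorch, reusing the InitialModel and total_loss sketches above; the assumed dataset yields, for each sample, the preprocessed image, the first mask and a list of second masks as float tensors in [0, 1]):

    from torch.optim import Adam
    from torch.optim.lr_scheduler import CosineAnnealingLR
    from torch.utils.data import DataLoader

    def train(model, dataset, epochs=16, batch_size=8, lr=1e-3):
        loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
        optimizer = Adam(model.parameters(), lr=lr)
        scheduler = CosineAnnealingLR(optimizer, T_max=epochs * len(loader))
        for _ in range(epochs):
            for image, first_mask, second_masks in loader:
                first_pred, second_preds = model(image)
                loss = total_loss(first_pred, second_preds, first_mask, second_masks)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                scheduler.step()  # cosine decay of the learning rate per step
        return model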
In some embodiments, the value of the total loss function Loss is equal to the sum of the product of the first loss function L1 and the corresponding first weight α1 and the products of the second loss functions L2 and the corresponding second weights α2. The total loss function Loss may be designed differently according to the specific requirements of each category; for example, in the above embodiment, the total loss is calculated by the formula Loss = α1·L1 + α_hair·L_hair + α_face·L_face + α_hand·L_hand. In one example, if higher accuracy is required for the face region, the weight α_face corresponding to the face region loss function L_face can be increased. In another example, assume that when the hair region loss function L_hair takes a first value, the hair region in the output second training image containing the hair region is relatively smooth, and that when L_hair takes a second value, the hair region in the output second training image containing the hair region is sparse. If a relatively smooth hair region is desired in the part segmentation image containing the hair region, the initial model is iteratively trained until the hair region loss function L_hair reaches the first value, that is, L_hair takes the first value when the segmentation model is obtained after the iterative training of the initial model is completed. Of course, the total loss function Loss may also be designed differently according to the specific requirements of each category, which is not illustrated exhaustively here.
In some embodiments, the trained segmentation model may be stored locally in the terminal 100 (or the image processing apparatus 10), for example, in the memory 50; the trained segmentation model may also be stored in a server communicatively connected to the terminal 100 (or the image processing apparatus 10), so that the storage space occupied on the terminal 100 (or the image processing apparatus 10) can be reduced and the operation efficiency of the terminal 100 (or the image processing apparatus 10) can be improved. Of course, in some embodiments, new training data may also be acquired periodically or aperiodically to train and update the segmentation model. For example, when a portrait image is segmented incorrectly, that image can be used as a sample image, annotated, and then used for training through the above training method, so that the accuracy of the portrait segmentation model can be improved.
It should be noted that, in some embodiments, the initial encoder 301 may first be placed in a single-output pre-trained segmentation model (a model containing only one decoder) for training; the specific training process is the same as the training process above (a sample image is input into the training segmentation model, a loss is obtained according to the difference between the annotation in the sample image and the result output by the training segmentation model, and the training segmentation model is iteratively updated according to the loss). After the initial encoder 301 completes training in the single-output pre-trained segmentation model, the trained initial encoder 301 is introduced into the initial model and trained together with the first initial decoder 302 and the second initial decoder 303 to obtain the segmentation model. Placing the initial encoder 301 into a single-output pre-trained segmentation model for training first, and then introducing the trained initial encoder 301 into the initial model for further training, reduces the training difficulty compared with directly training the initial model and helps improve the segmentation effect of the obtained segmentation model on the input image.
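In one non-limiting example, the two-stage scheme of pre-training the initial encoder 301 in a single-output model and then introducing it into the initial model may be sketched as follows (Python with PyTorch, reusing the ConvBlock, Decoder and InitialModel sketches above; the class name SingleOutputModel is an illustrative assumption):

    import torch
    import torch.nn as nn

    class SingleOutputModel(nn.Module):
        # Pre-training model: the shared encoder plus a single decoder.
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(ConvBlock(3, 32), ConvBlock(32, 64), ConvBlock(64, 128))
            self.decoder = Decoder(128, scale=8)
        def forward(self, x):
            return self.decoder(self.encoder(x))

    # Stage 1: train the single-output model (e.g. portrait segmentation only).
    pretrain_model = SingleOutputModel()
    # ... train pretrain_model with a single segmentation loss ...

    # Stage 2: copy the pre-trained encoder weights into the initial model, then
    # train the initial model (encoder plus first and second initial decoders) jointly.
    initial_model = InitialModel(num_second_decoders=3)
    initial_model.encoder.load_state_dict(pretrain_model.encoder.state_dict())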
In some embodiments, the segmentation model need not be the network model described in the above embodiments; instead, gray threshold processing may be directly performed on the original image input to the segmentation model to obtain at least one frame of the first image and at least one frame of the second image. Specifically, referring to fig. 23, in the case where one frame of the second image includes a part segmentation image, the human body part region in the part segmentation image is a first hair region H1 (shown in the second image in fig. 12), the portrait region includes a second hair region H2 (shown in the first image in fig. 12), and the first hair region H1 in the part segmentation image is different from the second hair region in the portrait segmentation image, the image processing method further includes:
022: identifying a hair region in the original image input to the segmentation model and a non-hair region in the portrait;
023: performing gray processing on the original image to obtain a gray image of the original image, wherein the gray image of the original image comprises a first region, a second region and a third region, the first region is the part of the gray image corresponding to the hair region of the original image in which the gray values are greater than a first threshold, the second region is the part of the gray image corresponding to the hair region of the original image in which the gray values are greater than a second threshold, and the third region is the part of the gray image corresponding to the non-hair region of the original image in which the gray values are greater than a third threshold;
024: and acquiring a first image according to the first area and the third area of the gray-scale image of the original image, and acquiring a part segmentation image of which the human body part area is hair according to the second area.
Referring to fig. 2, in some cases, step 022, step 023 and step 024 can be implemented by the input module 12. That is, the input module 12 is configured to identify a hair region in the original image of the input segmentation model and a non-hair region in the portrait; performing gray processing on the original image to obtain a gray image of the original image, wherein the gray image of the original image comprises a first area, a second area and a third area, the first area is a part of the original image, corresponding to a hair area, of which the gray value is greater than a first threshold value, the second area is a part of the original image, corresponding to the hair area, of which the gray value is greater than a second threshold value, and the third area is a part of the original image, corresponding to a non-hair area, of which the gray value is greater than a third threshold value; and acquiring a first image according to the first area and the third area of the gray scale image of the original image, and acquiring a part segmentation image of the human body part area as hair according to the second area.
Referring to fig. 3, in some embodiments, steps 022, 023, and 024 can be implemented by one or more processors 30, that is, the one or more processors 30 are further configured to identify hair regions in the original image of the input segmentation model and non-hair regions in the portrait; performing gray processing on the original image to obtain a gray image of the original image, wherein the gray image of the original image comprises a first area, a second area and a third area, the first area is a part of the original image, corresponding to a hair area, of which the gray value is greater than a first threshold value, the second area is a part of the original image, corresponding to the hair area, of which the gray value is greater than a second threshold value, and the third area is a part of the original image, corresponding to a non-hair area, of which the gray value is greater than a third threshold value; and acquiring a first image according to the first area and the third area of the gray-scale image of the original image, and acquiring a part segmentation image of which the human body part area is hair according to the second area.
Specifically, referring to fig. 24, in some embodiments, the processor 30 first identifies the hair region and the non-hair region of the portrait in the original image input to the segmentation model. It should be noted that, in some embodiments, the original image may be sequentially input into two encoder-decoder networks, where one encoder-decoder network can output a segmentation image containing the non-hair region of the portrait in the original image, and the other encoder-decoder network can output a segmentation image containing the hair region of the portrait in the original image. The non-hair region of the portrait in the original image is then obtained according to the segmentation image containing the non-hair region, and the hair region of the portrait in the original image is obtained according to the segmentation image containing the hair region. Alternatively, in some embodiments, since the color of the hair region in the portrait of the original image is different from the color of the non-hair region, the segmentation model can identify the hair region and the non-hair region in the portrait according to the color difference in the original image. Alternatively, in some embodiments, the hair region and the non-hair region of the portrait in the original image may be preliminarily marked by the user (for example, the user roughly frames the outlines of the hair region and the non-hair region in the portrait), and the segmentation model identifies the hair region and the non-hair region in the portrait according to the user's marking. Of course, the hair region and the non-hair region of the portrait in the original image may also be identified in other ways, which are not limited herein.
Gray processing is performed on the original image input to the segmentation model to obtain a gray image of the original image, where the gray image of the original image includes a first region a1, a second region a2 and a third region b1; the first region a1 is the portion corresponding to the hair region A of the original image in which the gray values are greater than the first threshold, the second region a2 is the portion corresponding to the hair region A of the original image in which the gray values are greater than the second threshold, and the third region b1 is the portion corresponding to the non-hair region B of the original image in which the gray values are greater than the third threshold. Specifically, after obtaining the gray image of the original image, the processor 30 (or the input module 12) sequentially traverses the gray values of all the pixel points in the gray image: if the pixel point at the corresponding position in the original image is located in the hair region A and the gray value of the current pixel point is greater than the first threshold, the current pixel point belongs to the first region a1; if the pixel point at the corresponding position in the original image is located in the hair region A and the gray value of the current pixel point is greater than the second threshold, the current pixel point belongs to the second region a2; if the pixel point at the corresponding position in the original image is located in the non-hair region B of the portrait and the gray value of the current pixel point is greater than the third threshold, the current pixel point belongs to the third region b1.
After the first region a1, the second region a2, and the third region b1 of the gray image of the original image are acquired, a first image including a portrait region is acquired from the first region a1 and the third region b1, and a part segmentation image in which the human body part is hair is acquired from the second region a2. For example, the processor 30 (or the input module 12) sequentially traverses all the pixel points in the gray image of the original image; if the current pixel point is located in the first region a1 or the third region b1, the pixel value of the pixel point at the corresponding position in the first image is set to 255 (that is, that pixel point in the first image is set to white); if the current pixel point is not located in the first region a1 or the third region b1, the pixel value of the pixel point at the corresponding position in the first image is set to 0 (that is, that pixel point in the first image is set to black). In this way, the first image including the portrait region is obtained from the first region a1 and the third region b1. Similarly, the processor 30 (or the input module 12) sequentially traverses all the pixel points in the gray image of the original image; if the current pixel point is located in the second region a2, the pixel value of the pixel point at the corresponding position in the second image is set to 255 (that is, that pixel point in the second image is set to white); if the current pixel point is not located in the second region a2, the pixel value of the pixel point at the corresponding position in the second image is set to 0 (that is, that pixel point in the second image is set to black). In this way, the second image including the hair region, that is, the part segmentation image in which the human body part region is hair, is obtained from the second region a2.
Because the pixel points in the hair region of the original image whose gray values lie between the second threshold and the first threshold are not in the hair region of the portrait in the first image, the hair in the hair region of the portrait region in the first image converges inwards; and because the pixel points in the hair region of the original image whose gray values are greater than the second threshold (which is smaller than the first threshold) are all in the hair region of the second image, the hair region of the second image expands outwards. It should be noted that the first threshold is greater than the second threshold. In some embodiments, the second threshold may be equal to the third threshold. In particular, in one example, the first threshold may be 190, and the second threshold and the third threshold may both be 127.5.
Referring to fig. 25, the present application also provides a non-transitory computer readable storage medium 400 containing a computer program 410. The computer program 410, when executed by the processor 60, causes the processor 60 to perform the image processing method of any one of the above embodiments.
Referring to fig. 1, for example, when the computer program 410 is executed by the processor 60, the processor 60 is caused to perform the methods in 01, 02, 03, 04, 011, 012, 013, 021, 022, 023, 024, 031, 032, 033, 034, 035, 036, 041, 042, 043, 044, 045, 0421, 0422, 0423, 0441, 0442, and 0443. For example, the following image processing method is performed:
02: inputting the obtained original image into a segmentation model to obtain at least one frame of first image and at least one frame of second image, wherein the original image comprises a portrait, the first image is a portrait segmentation image comprising a portrait area and a first background area, the second image comprises at least one of a part segmentation image and an associated object segmentation image, the part segmentation image comprises a human body part area and a second background area, and the associated object segmentation image comprises an associated object area and a third background area; and
03: and according to the first image and/or the at least one frame of second image, carrying out image processing on the original image to obtain a target image.
It should be noted that the processor 60 may be disposed in the terminal 100, that is, the processor 60 and the processor 30 are the same processor, and of course, the processor 60 may not be disposed in the terminal 100, that is, the processor 60 and the processor 30 are not the same processor, which is not limited herein.
In the description herein, reference to the description of the terms "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example" or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
Although embodiments of the present application have been shown and described above, it is to be understood that the above embodiments are exemplary and not to be construed as limiting the present application, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.