Image background replacement method and system, electronic device, and storage medium
1. An image background replacement method, comprising:
obtaining a foreground prediction probability of each pixel point in a first image;
selecting a point at which the foreground prediction probability is close to a feature value as an uncertain point;
fusing feature maps of the uncertain points at different scales to obtain a foreground map with a clear boundary and an alpha mask; and
synthesizing a background image to be replaced with the foreground map and the alpha mask to obtain a second image under a new background.
2. The image background replacement method according to claim 1, wherein the obtaining of the foreground prediction probability of each pixel point in the first image comprises:
acquiring the first image;
inputting the first image into a convolutional neural network to obtain a feature map of the image; and
inputting the feature map into a pyramid pooling module to obtain the foreground prediction probability of each pixel point.
3. The image background replacement method according to claim 2, wherein the inputting the feature map into a pyramid pooling module comprises:
constructing a multi-layer feature pyramid;
inputting the feature map into each pyramid layer for pooling at different scales; and
obtaining the foreground prediction probability of each pixel point after upsampling, fully connected layer, and activation function processing.
4. The image background replacement method according to claim 1, wherein the selecting a point at which the foreground prediction probability is close to the feature value as an uncertain point further comprises:
performing deconvolution processing to obtain a coarse segmentation feature map.
5. The image background replacement method according to claim 4, wherein the selecting the point at which the foreground prediction probability is close to the feature value as the uncertain point comprises:
upsampling the boundary of the coarse segmentation feature map using bilinear interpolation;
obtaining the foreground prediction probability of the boundary pixel points; and
selecting a point at which the probability is close to the feature value as the uncertain point.
6. The image background replacement method according to claim 5, wherein the foreground prediction probability is based on the formula of the matting algorithm:
I=αF+(1-α)B
wherein I is the image, F is the foreground of I, α is the foreground prediction probability of a pixel point, and B is the background of I.
7. The image background replacement method according to claim 6, wherein, in the selecting of the point at which the foreground prediction probability is close to a feature value as the uncertain point, the feature value is 0.5.
8. The image background replacement method according to claim 3, wherein the fusing the feature maps of the uncertain points at different scales to obtain a foreground map with a clear boundary and an alpha mask comprises:
inputting the feature maps into each pyramid layer for pooling at different scales to obtain feature maps at different scales; and
fusing the feature maps of the uncertain points at different scales to obtain a feature map with a clear boundary.
9. The image background replacement method according to claim 8, wherein the fusing the feature maps of the uncertain points at different scales further comprises:
decoding to obtain a foreground map with a clear boundary and an alpha mask, respectively.
10. The image background replacement method according to claim 9, wherein the decoding to obtain a foreground map with a clear boundary comprises:
performing bilinear-interpolation upsampling, convolution, BN, and ReLU processing to obtain a predicted feature map; and
performing bilinear-interpolation upsampling, convolution, BN, ReLU, mirror padding, and convolution on the predicted feature map and the feature map with a clear boundary to obtain the foreground map with a clear boundary.
11. The image background replacement method according to claim 9, wherein the decoding to obtain an alpha mask with a clear boundary comprises:
performing bilinear-interpolation upsampling, convolution, BN, and ReLU processing for decoding; and
performing mirror padding, convolution, and Tanh processing to obtain the alpha mask with a clear boundary.
12. The image background replacement method according to claim 1, wherein the synthesizing the background image to be replaced with the foreground map and the alpha mask comprises:
processing the alpha mask and the foreground map to obtain a foreground part of the second image;
processing the alpha mask and the background image to be replaced to obtain a background part of the second image; and
superimposing the foreground part and the background part to obtain the second image.
13. A video image background replacement method, comprising:
acquiring a first video image in a current video; and
performing background replacement on the first video image according to the image background replacement method of any one of claims 1 to 12 to obtain a second video image under a new background.
14. The method according to claim 13, wherein the obtaining the second video image under the new background comprises:
using the RTMP transport protocol for real-time video transmission.
15. An image background replacement system, comprising:
an image acquisition module configured to obtain a foreground prediction probability of each pixel point in a first image;
a selection module configured to select a point at which the foreground prediction probability is close to a feature value as an uncertain point;
a fusion module configured to fuse feature maps of the uncertain points at different scales to obtain a foreground map with a clear boundary and an alpha mask; and
a synthesis module configured to synthesize a background image to be replaced with the foreground map and the alpha mask to obtain a second image under a new background.
16. A video image background replacement system, comprising:
a video acquisition module configured to acquire a first video image in a current video; and
a background replacement module configured to perform background replacement on the first video image according to the image background replacement method of any one of claims 1 to 12 to obtain a second video image under a new background.
17. An electronic device, comprising:
a processor;
a memory storing a computer-executable program which, when executed by the processor, causes the processor to perform the image background replacement method of any one of claims 1 to 12 or the video image background replacement method of any one of claims 13 to 14.
18. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the image background replacement method according to any one of claims 1 to 12 or the video image background replacement method according to any one of claims 13 to 14.
19. A computer program product comprising a computer program which, when executed by a processor, implements the image background replacement method of any one of claims 1 to 12 or the video image background replacement method of any one of claims 13 to 14.
Background
Driven by the demands of epidemic prevention and control, remote video agent customer service has become an important banking business. Traditional video agent customer service is merely a simple video call; when a user feels that, for reasons of personal privacy, it is inconvenient for the agent to see his or her background, such a service can hardly meet the user's needs. Replacing the background of the user's video is therefore a needed improvement for bank video customer service, and a portrait segmentation algorithm can realize this function.
Current portrait segmentation algorithms often suffer from problems such as low segmentation precision and poor detail handling: the segmentation result is only a rough contour, and details at the portrait edge are easily over-segmented or under-segmented. An improved Background Matting algorithm is therefore considered for realizing portrait segmentation, and the segmented portrait is placed on a new background to achieve background replacement.
Disclosure of Invention
Technical problem to be solved
In view of the above problems, the present disclosure provides an image background replacement method and system, an electronic device, and a storage medium, which at least partially solve technical problems of conventional image background replacement such as low edge segmentation precision and poor detail handling.
(II) Technical scheme
One aspect of the present disclosure provides an image background replacement method, including: obtaining the foreground prediction probability of each pixel point in a first image; selecting a point at which the foreground prediction probability is close to a feature value as an uncertain point; fusing feature maps of the uncertain points at different scales to obtain a foreground map with a clear boundary and an alpha mask; and synthesizing a background image to be replaced with the foreground map and the alpha mask to obtain a second image under a new background.
Further, obtaining the foreground prediction probability of each pixel point in the first image includes: acquiring the first image; inputting the first image into a convolutional neural network to obtain a feature map of the image; and inputting the feature map into a pyramid pooling module to obtain the foreground prediction probability of each pixel point.
Further, inputting the feature map into the pyramid pooling module includes: constructing a multi-layer feature pyramid and inputting the feature map into each pyramid layer for pooling at different scales; and obtaining the foreground prediction probability of each pixel point after upsampling, fully connected layer, and activation function processing.
Further, selecting a point at which the foreground prediction probability is close to the feature value as an uncertain point further includes: performing deconvolution processing to obtain a coarse segmentation feature map.
Further, selecting a point at which the foreground prediction probability is close to the feature value as an uncertain point includes: upsampling the boundary of the coarse segmentation feature map using bilinear interpolation; obtaining the foreground prediction probability of the boundary pixel points; and selecting a point at which the probability is close to the feature value as the uncertain point.
Further, the foreground prediction probability is based on the formula of the matting algorithm:
I=αF+(1-α)B
wherein I is the image, F is the foreground of I, α is the foreground prediction probability of a pixel point, and B is the background of I.
Further, in selecting a point at which the foreground prediction probability is close to the feature value as an uncertain point, the feature value is 0.5.
Further, fusing the feature maps of the uncertain points at different scales to obtain a foreground map with a clear boundary and an alpha mask includes: inputting the feature maps into each pyramid layer for pooling at different scales to obtain feature maps at different scales; and fusing the feature maps of the uncertain points at different scales to obtain a feature map with a clear boundary.
Further, fusing the feature maps of the uncertain points at different scales further includes: decoding to obtain a foreground map with a clear boundary and an alpha mask, respectively.
Further, decoding to obtain a foreground map with a clear boundary includes: performing bilinear-interpolation upsampling, convolution, BN, and ReLU processing to obtain a predicted feature map; and performing bilinear-interpolation upsampling, convolution, BN, ReLU, mirror padding, and convolution on the predicted feature map and the feature map with a clear boundary to obtain the foreground map with a clear boundary.
Further, decoding to obtain an alpha mask with a clear boundary includes: performing bilinear-interpolation upsampling, convolution, BN, and ReLU processing for decoding; and performing mirror padding, convolution, and Tanh processing to obtain the alpha mask with a clear boundary.
Further, synthesizing the background image to be replaced with the foreground map and the alpha mask includes: processing the alpha mask and the foreground map to obtain a foreground part of a second image; processing the alpha mask and the background image to be replaced to obtain a background part of the second image; and superimposing the foreground part and the background part to obtain the second image.
Another aspect of the present disclosure provides a video image background replacement method, including: acquiring a first video image in a current video; and performing background replacement on the first video image according to the above image background replacement method to obtain a second video image under a new background.
Further, obtaining the second video image under the new background includes: using the RTMP transport protocol for real-time video transmission.
In still another aspect of the present disclosure, an image background replacement system is provided, including: an image acquisition module configured to obtain the foreground prediction probability of each pixel point in a first image; a selection module configured to select a point at which the foreground prediction probability is close to a feature value as an uncertain point; a fusion module configured to fuse feature maps of the uncertain points at different scales to obtain a foreground map with a clear boundary and an alpha mask; and a synthesis module configured to synthesize a background image to be replaced with the foreground map and the alpha mask to obtain a second image under a new background.
In another aspect, the present disclosure provides a video image background replacement system, including: a video acquisition module configured to acquire a first video image in a current video; and a background replacement module configured to perform background replacement on the first video image according to the above image background replacement method to obtain a second video image under a new background.
Yet another aspect of the present disclosure provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to perform the image background replacement method or the video image background replacement method as described above.
A further aspect of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the image background replacement method or the video image background replacement method as described above.
A further aspect of the disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements an image background replacement method or a video image background replacement method as described above.
(III) Advantageous effects
According to the image background replacement method and system, electronic device, and storage medium of the present disclosure, during foreground-background segmentation, edge contour processing is performed on the edge region between the foreground and the background, and upsampling and feature fusion are performed on the uncertain points of the edge contour. The features of the uncertain points therefore include both features of large targets and features of fine details, so that more detail is retained and a background-replaced image with higher segmentation precision is obtained. The same method can also be applied to video background replacement to obtain background-replaced video images with higher segmentation precision.
Drawings
For a more complete understanding of the present disclosure and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
fig. 1 schematically illustrates an application scenario diagram of an image background replacement method according to an embodiment of the present disclosure;
fig. 2 is a scene diagram schematically illustrating a specific application example of the image background replacement method according to the embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of an image background replacement method according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flowchart of a method for obtaining a foreground prediction probability of each pixel point in a first image according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow diagram of a method of inputting a feature map into a pyramid pooling module according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a block diagram of a pyramid pooling module according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow chart of a method for selecting a point where the foreground prediction probability is close to the feature value as an uncertain point according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow diagram of a method of fusing feature maps of uncertain points at different scales, in accordance with an embodiment of the disclosure;
FIG. 9 schematically illustrates a flow chart of a method of compositing with a foreground map and an alpha mask using a background image to be replaced, in accordance with an embodiment of the disclosure;
FIG. 10 schematically illustrates a flow chart of a complete method of image background replacement according to an embodiment of the present disclosure;
FIG. 11 schematically illustrates a flow diagram of edge contour processing according to an embodiment of the disclosure;
fig. 12 schematically shows a schematic diagram of an upsampling step according to an embodiment of the present disclosure;
fig. 13 schematically illustrates a schematic diagram of video background image replacement according to an embodiment of the present disclosure;
fig. 14 schematically shows a schematic diagram of real-time video transmission according to an embodiment of the present disclosure;
FIG. 15 schematically shows a flow diagram of real-time video transmission according to an embodiment of the disclosure;
FIG. 16 schematically illustrates a block diagram of an image background replacement system according to an embodiment of the present disclosure;
FIG. 17 schematically illustrates a block diagram of a video image background replacement system according to an embodiment of the present disclosure;
FIG. 18 schematically illustrates a block diagram of an electronic device suitable for implementing the above-described method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.
The embodiments of the present disclosure provide an image background replacement method and system, an electronic device, and a storage medium; in the image segmentation process, edge contour processing is performed on the edge part between the foreground and the background, so that more details are retained and a background-replaced image with higher segmentation precision is obtained.
Fig. 1 schematically illustrates an exemplary system architecture 100 that may be applied to the image background replacement method according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a camera function application, a photo function application, a web browser application, a search-type application, an instant messaging tool, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the image background replacement method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the system for the image background replacement method provided by the embodiment of the present disclosure may be generally disposed in the server 105. The image background replacement method provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the system for the image background replacement method provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a scene diagram of a specific application example of the image background replacement method according to the embodiment of the present disclosure.
The terminal device 101, 102, or 103 may transmit an image 201, stored locally or captured externally, to the server 105 via the network 104. The server 105 extracts its foreground image 202 and may replace the background of the image based on the extracted foreground, obtaining an image 203 with the replaced background, which is then transmitted through the network 104 to other network terminals for display or playback.
To improve the effect of edge processing in portrait segmentation, algorithms usually require the user to manually provide auxiliary input such as a scribble or a trimap (Trimap). The Background Matting algorithm uses a similar strategy, except that a background image entirely without the foreground is used as input. This is inconvenient in practice: on the one hand, manually shooting a background picture without the foreground is troublesome and greatly hinders real-time processing; on the other hand, shooting two pictures, with and without the foreground, under identical conditions is too harsh a requirement. Therefore, the present method applies edge optimization to Background Matting, removing the need in the traditional method to shoot an additional background picture.
Fig. 3 schematically shows a flow chart of an image background replacement method according to an embodiment of the present disclosure.
As shown in fig. 3, the image background replacement method includes:
in operation S1, a foreground prediction probability of each pixel point in the first image is obtained.
In this extraction manner, the foreground prediction probability of each pixel point in the first image must first be obtained: the closer a pixel point's probability is to 1, the more it is considered foreground, and the closer to 0, the more it is considered background. This simply divides the image's pixel points into foreground pixels and background pixels, but it easily leaves a band of background color at the foreground edge, which makes the edge contour of the image finally fused with the new background look unnatural and degrades the quality of the background-replaced image.
In operation S2, a point where the foreground prediction probability is close to the feature value is selected as an uncertain point.
After the foreground prediction probability of each pixel point is obtained, the pixel points of the foreground edge are selected so as to further refine the edge. The probability of a selected pixel point is neither close to 1 nor close to 0; instead, points whose probability is close to a certain feature value are taken as uncertain points.
In operation S3, feature maps of different scales of the uncertain point are fused to obtain a foreground map with clear boundaries and an alpha mask.
Before the foreground prediction probability of each pixel point is obtained, features at different scales are extracted for each point. After the points that cannot be determined as either foreground or background pixels are selected, that is, after the uncertain points are determined, the feature maps of the uncertain points at different scales are fused. These features include both features of large targets and features of fine details. The foreground map and the alpha mask are then predicted from the features of the uncertain points, so that the obtained foreground map and alpha mask have clearer boundaries, which guarantees the quality of the background-replaced image.
In operation S4, the background image to be replaced is synthesized with the foreground image and the alpha mask to obtain a second image in the new background.
Using the alpha mask, the pixel value of each foreground pixel point in the foreground map is taken as the pixel value at the corresponding position in the background-replaced image, and the pixel value at the position corresponding to each background pixel point in the background image to be replaced is taken as the pixel value at the corresponding position in the background-replaced image.
Fig. 4 schematically shows a flowchart of a method for obtaining a foreground prediction probability of each pixel point in a first image according to an embodiment of the present disclosure.
As shown in fig. 4, the method for obtaining the foreground prediction probability of each pixel point in the first image includes:
in operation S11, a first image is acquired.
First, a first image whose background is to be replaced is acquired. The first image is not particularly limited and may be a stored image or an image captured in real time.
In operation S12, the first image is input to a convolutional neural network to obtain a feature map of the image.
The first image is fed into a CNN to obtain a preliminary feature map with a relatively large uncertain region.
In operation S13, the feature map is input to the pyramid pooling module to obtain the foreground prediction probability of each pixel point.
A pyramid scene parsing network (PSPNet) is used: through its pyramid pooling module, context information aggregated over regions of different scales is exploited to parse the scene.
FIG. 5 schematically illustrates a flow diagram of a method of inputting a feature map into a pyramid pooling module according to an embodiment of the disclosure.
As shown in fig. 5, the method of inputting the feature map into the pyramid pooling module includes:
in operation S131, a multi-layered feature pyramid is constructed.
Referring to fig. 6, the pyramid pooling structure fuses features at four different scales, dividing the input feature map into 1×1, 2×2, 3×3, and 6×6 sub-regions.
In operation S132, the feature map is input into each pyramid layer for pooling at different scales.
Each sub-region is pooled, and the pooled feature maps containing position information are finally combined. Feature maps of different scales are output at the different levels of the pyramid pooling module; to keep the weight of the global features, a 1×1 convolution kernel is used after each pyramid level, reducing the channel dimension of the context features to 1/n of the original, where n is the number of pyramid levels.
In operation S133, the foreground prediction probability of each pixel point is obtained after upsampling, fully connected layer, and activation function processing.
The low-dimensional feature maps are then directly upsampled via bilinear interpolation to the same scale as the original feature map, and the feature maps of the different levels are concatenated into the final pyramid pooling global feature. The size of each level in the pyramid pooling structure can be modified and is related to the size of the feature map input into the pyramid pooling layer; the structure extracts features of the different sub-regions by using different pooling kernels. Finally, the foreground prediction probability of each pixel point is obtained after activation function processing.
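As an illustration of this pooling-and-fusion step, the following is a minimal PyTorch-style sketch of a pyramid pooling module with the 1×1, 2×2, 3×3, and 6×6 bins described above; the module name, channel counts, and BN/ReLU placement are illustrative assumptions rather than the exact implementation of the disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingModule(nn.Module):
    """Sketch: pool the input feature map over 1x1, 2x2, 3x3 and 6x6
    sub-region grids, reduce each pooled map to 1/n of the channels with
    a 1x1 convolution, upsample back bilinearly, and concatenate."""

    def __init__(self, in_channels: int, bins=(1, 2, 3, 6)):
        super().__init__()
        reduced = in_channels // len(bins)  # 1/n of the original channels
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),  # pool into a b x b sub-region grid
                nn.Conv2d(in_channels, reduced, kernel_size=1, bias=False),
                nn.BatchNorm2d(reduced),
                nn.ReLU(inplace=True),
            )
            for b in bins
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        # upsample each pooled map back to the input scale and concatenate
        pooled = [
            F.interpolate(stage(x), size=(h, w), mode="bilinear",
                          align_corners=False)
            for stage in self.stages
        ]
        return torch.cat([x] + pooled, dim=1)  # pyramid pooling global feature
```

For a 512-channel input, for example, this yields a concatenated feature with 512 + 4×128 = 1024 channels, from which a small prediction head (convolution, fully connected layer, and activation function) can produce the per-pixel foreground probability.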
On the basis of the above embodiment, selecting a point at which the foreground prediction probability is close to the feature value as an uncertain point further includes: performing deconvolution processing to obtain a coarse segmentation feature map.
After the activation function, the data is further input into a deconvolution layer for deconvolution processing (CONV), as shown in fig. 6, to obtain the coarse segmentation feature map.
Fig. 7 schematically shows a flowchart of a method for selecting a point with a foreground prediction probability close to a feature value as an uncertain point according to an embodiment of the disclosure.
As shown in fig. 7, the method for selecting a point with a foreground prediction probability close to the feature value as an uncertain point includes:
In operation S21, the boundary of the coarse segmentation feature map is upsampled using bilinear interpolation.
After the coarse segmentation feature map is obtained, the rough edge contour requires further processing. First, the boundary of the coarse segmentation feature map is upsampled using bilinear interpolation; if the pixel distribution in the image is regarded as a grid, the grid becomes denser and the resolution higher after this operation.
In operation S22, foreground prediction probabilities of the boundary pixels are obtained.
After the activation function processing, the foreground prediction probability of each pixel point has been obtained, so the foreground prediction probabilities of the boundary pixel points are available.
In operation S23, a point at which the probability is close to the feature value is selected as the uncertain point.
In the boundary region of the image segmentation, the closer a pixel point's predicted probability is to 1, the more it is considered foreground, and the closer to 0, the more it is considered background; therefore, pixel points whose probability is neither close to 1 nor close to 0 but close to a certain feature value are selected as uncertain points.
On the basis of the above embodiment, the foreground prediction probability is based on the formula of the matting algorithm:
I=αF+(1-α)B
wherein I is the image, F is the foreground of I, α is the foreground prediction probability of a pixel point, and B is the background of I.
This formula embodies the essence of portrait segmentation: once α is known, the foreground and the background can be separated from an image.
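As a brief numerical illustration (values chosen purely for this example): for a pixel with α = 0.8, foreground value F = 200, and background value B = 100, the observed value is I = 0.8×200 + 0.2×100 = 180, a blend dominated by the foreground. Conversely, once α is estimated for every pixel point, the image can be decomposed into its foreground and background components.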
On the basis of the above embodiment, in selecting a point at which the foreground prediction probability is close to the feature value as an uncertain point, the feature value is 0.5.
In image processing, points whose foreground prediction probability is close to 0.5 are mostly ignored, yet such uncertain points directly affect the boundary segmentation effect. Therefore, a number of points whose probability is close to 0.5 are selected on the denser grid; a pixel point whose foreground prediction probability is close to 0.5 is essentially regarded as lying on the edge of the foreground image.
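A minimal sketch of this selection rule follows, assuming a PyTorch probability map; the function name and the number of sampled points are hypothetical choices for illustration.

```python
import torch

def select_uncertain_points(prob: torch.Tensor, num_points: int = 1024):
    """Pick the pixels whose foreground prediction probability is closest
    to the feature value 0.5, i.e. the points least certainly foreground
    or background. `prob` has shape (H, W) with values in [0, 1]."""
    uncertainty = -(prob - 0.5).abs()      # higher score = closer to 0.5
    flat = uncertainty.flatten()
    k = min(num_points, flat.numel())
    _, idx = flat.topk(k)                  # indices of the most uncertain pixels
    w = prob.shape[1]
    ys, xs = idx // w, idx % w             # back to (row, col) coordinates
    return torch.stack([ys, xs], dim=1)    # (k, 2) uncertain point coordinates
```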
Fig. 8 schematically shows a flowchart of a method for fusing feature maps of different scales of an uncertain point to obtain a foreground map with a clear boundary and an alpha mask according to an embodiment of the present disclosure.
As shown in fig. 8, the method for fusing the feature maps of the uncertain points at different scales to obtain a foreground map with a clear boundary and an alpha mask includes:
in operation S1321, the feature maps are input into each layer of pyramid to perform pooling at different scales, so as to obtain feature maps at different scales.
In step S132, the feature maps have already been input into each pyramid layer for pooling at different scales; feature maps of the uncertain points at different scales are thereby obtained.
In operation S1322, feature maps of different scales of the uncertain point are fused to obtain a feature map with a clear boundary.
The feature maps of the uncertain points at different scales are fused to serve as the features of the uncertain points; these features include both features of large targets and features of fine details, so more boundary detail is retained.
On the basis of the above embodiment, fusing feature maps of different scales of the uncertain point further includes: and decoding to respectively obtain a foreground image with clear boundaries and an alpha mask.
The process before the edge contour processing can be regarded as an encoding process; here, a decoding process restores the image, yielding the foreground map and the alpha mask respectively.
On the basis of the above embodiment, decoding to obtain a foreground map with a clear boundary includes: performing bilinear-interpolation upsampling, convolution, BN, and ReLU processing to obtain a predicted feature map; and performing bilinear-interpolation upsampling, convolution, BN, ReLU, mirror padding, and convolution on the predicted feature map and the feature map with a clear boundary to obtain the foreground map with a clear boundary.
Foreground prediction branch: in this branch, decoding is first performed by a set of three residual-block decoders to obtain the corresponding feature map. The first part of the decoding takes the previously obtained feature map as input and produces the predicted feature map after a set of bilinear-interpolation upsampling, convolution, BN, and ReLU operations. The second part of the decoding uses the result of concatenating the predicted feature map with the original feature map, and obtains the foreground map after bilinear-interpolation upsampling, convolution, BN, ReLU, mirror padding, and convolution in sequence.
On the basis of the above embodiment, decoding to obtain an alpha mask with a clear boundary includes: performing bilinear-interpolation upsampling, convolution, BN, and ReLU processing for decoding; and performing mirror padding, convolution, and Tanh processing to obtain the alpha mask with a clear boundary.
α prediction branch: similar to the foreground prediction branch, it first decodes through a set of three residual-block decoders, then through two sets of bilinear-interpolation upsampling, convolution, BN, and ReLU operations, and finally obtains the predicted alpha matte after a set of mirror padding, convolution, and Tanh operations. Tanh is used because the value of each pixel of the alpha matte needs to lie between 0 and 1, a range to which the bounded Tanh output can be mapped.
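The following sketch mirrors the sequence of operations named above for the two branches; the channel counts, kernel sizes, and the explicit rescaling of the Tanh output to [0, 1] are illustrative assumptions rather than the exact implementation of the disclosure.

```python
import torch
import torch.nn as nn

def up_conv_bn_relu(cin: int, cout: int) -> nn.Sequential:
    """One decoding step: bilinear-interpolation upsampling, convolution,
    BN, ReLU."""
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        nn.Conv2d(cin, cout, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class AlphaHead(nn.Module):
    """Tail of the alpha prediction branch: mirror padding, convolution,
    Tanh, then a rescale so every pixel of the matte lies in [0, 1]."""

    def __init__(self, cin: int):
        super().__init__()
        self.pad = nn.ReflectionPad2d(1)       # mirror padding
        self.conv = nn.Conv2d(cin, 1, kernel_size=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha = torch.tanh(self.conv(self.pad(x)))
        return (alpha + 1.0) / 2.0             # map (-1, 1) onto (0, 1)
```

The foreground branch would similarly stack `up_conv_bn_relu` steps and finish with mirror padding and a convolution producing three output channels instead of the Tanh head.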
Fig. 9 schematically shows a flowchart of a method of synthesizing with a foreground image and an alpha mask using a background image to be replaced according to an embodiment of the present disclosure.
As shown in fig. 9, the method for synthesizing the background image to be replaced with the foreground image and the alpha mask includes:
in operation S41, the alpha mask and the foreground map are processed to obtain a foreground portion of the second image.
In operation S42, the α mask and the background image to be replaced are processed to obtain a background portion of the second image.
In operation S43, the foreground portion and the background portion are superimposed to obtain a second image.
The alpha mask is multiplied with the corresponding pixel values of the foreground map to obtain the foreground part of the fused image; (1-α), i.e. the complement of the alpha mask, is multiplied with the corresponding pixel values of the new background image to obtain the background part of the fused image; and the foreground part and the background part are superimposed to obtain the fused image.
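A minimal NumPy sketch of this compositing step follows; the array shapes and function name are assumptions for illustration.

```python
import numpy as np

def composite(foreground: np.ndarray, background: np.ndarray,
              alpha: np.ndarray) -> np.ndarray:
    """Blend per the matting formula I = alpha*F + (1-alpha)*B.
    `foreground` and `background` are (H, W, 3) uint8 images of the same
    size; `alpha` is an (H, W) matte with values in [0, 1]."""
    a = alpha[..., None].astype(np.float32)              # broadcast over channels
    fg_part = a * foreground.astype(np.float32)          # foreground part
    bg_part = (1.0 - a) * background.astype(np.float32)  # background part
    return np.clip(fg_part + bg_part, 0, 255).astype(np.uint8)
```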
The present disclosure also provides a video image background replacement method, including: acquiring a first video image in a current video; and performing background replacement on the first video image according to the above image background replacement method to obtain a second video image under a new background.
The above processing method can also be applied to video image background replacement: a video frame in the video is acquired and its background is replaced, obtaining a second video image under a new background.
On the basis of the above embodiment, obtaining the second video image under the new background includes: using the RTMP transport protocol for real-time video transmission.
To enable real-time video background replacement, the RTMP transport protocol is used for transmitting the video streams.
The video background replacement method of the present disclosure overcomes the shortcomings of traditional portrait segmentation algorithms and retains more portrait detail. It allows a user to conveniently replace an image background, or to replace his or her own background to protect personal privacy during video customer service, while better preserving the portrait and improving the user experience.
The steps of the method are further explained below, taking the replacement of a video background image as an example.
Please refer to fig. 10.
Step 1001, firstly, acquiring a video frame in the video;
step 1002, inputting the video frame image into a convolutional neural network to obtain a feature map of the image;
step 1003, inputting the feature map into a pyramid pooling module for pooling in different scales;
step 1004, obtaining the foreground prediction probability of each pixel point after upsampling, fully connected layer, and activation function processing;
and directly up-sampling the low-dimensional characteristic graph through bilinear interpolation to ensure that the low-dimensional characteristic graph has the same scale as the original characteristic graph. And then splicing the feature maps of different levels into a final pyramid pooling global feature. Wherein the size of each level in the pyramid pooling structure can be modified in relation to the size of the feature map input into the pyramid pooling layer. The structure can extract the characteristics of different sub-regions by adopting different pooling kernels. And finally, obtaining the foreground prediction probability of each pixel point after activating function processing.
Step 1005, inputting the result into the deconvolution layer to obtain a coarse segmentation feature map 1006;
the traditional semantic segmentation network is subjected to a series of convolution pooling. A feature map of a certain resolution is obtained. This feature map is typically 1/8, 1/16, 1/32, etc. of the original. The smaller the characteristic diagram, the larger the receptive field, and the more suitable for detecting large targets. The more suitable the reverse is for detecting small targets. By predicting the points on the feature map, there is a class label, knowing that a certain pixel belongs to a certain class. And then, restoring the image to the size of the original image by a certain up-sampling method, thereby obtaining the semantic segmentation result of the original image.
Step 1007, performing edge contour processing on the coarse segmentation feature map 1006 to obtain a feature map with a clear boundary;
In this embodiment, feature extraction is performed on the original image using DeepLab v3 to predict a coarse segmentation alpha mask, and a 14×14 feature map is extracted. The edge part of this segmentation result is poor. The original Background Matting algorithm optimizes the edge through a selection unit that selectively retains content from the original image, a background image without the foreground, and an alpha mask; although this achieves edge optimization, it requires a background image without the foreground as input. In the present disclosure, bilinear interpolation is instead used to upsample the previously predicted rough boundary: if the pixel distribution in the image is regarded as a grid, the grid becomes denser and the resolution higher, see fig. 12. In the boundary region, the closer a pixel point's predicted probability is to 1, the more it is considered foreground, and the closer to 0, the more it is considered background, while points with probability close to 0.5 are ignored; yet such uncertain points directly affect the boundary segmentation effect. Therefore, a number of points with probability close to 0.5 are selected on the denser grid. Since features at different scales have already been extracted for each point in DeepLab v3, the 1/4 and 1/16 feature maps of the original are fused as the features of an uncertain point; such features include both features for detecting large targets and features for detecting details.
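One way to gather and fuse the 1/4-scale and 1/16-scale features at the uncertain points is bilinear point sampling, sketched below in PyTorch; the use of `grid_sample` and the coordinate normalization convention are assumptions for illustration rather than the exact implementation of the disclosure.

```python
import torch
import torch.nn.functional as F

def point_features(feat_fine: torch.Tensor, feat_coarse: torch.Tensor,
                   points: torch.Tensor) -> torch.Tensor:
    """Sample the 1/4-scale and 1/16-scale feature maps at the given
    uncertain points and concatenate them, so each point carries both
    detail features and large-target features.

    feat_fine:   (N, C1, H/4,  W/4)  feature map
    feat_coarse: (N, C2, H/16, W/16) feature map
    points:      (N, P, 2) (x, y) coordinates normalized to [-1, 1]
    """
    grid = points.unsqueeze(2)  # (N, P, 1, 2), the layout grid_sample expects
    fine = F.grid_sample(feat_fine, grid, mode="bilinear",
                         align_corners=False).squeeze(3)    # (N, C1, P)
    coarse = F.grid_sample(feat_coarse, grid, mode="bilinear",
                           align_corners=False).squeeze(3)  # (N, C2, P)
    return torch.cat([fine, coarse], dim=1)  # (N, C1 + C2, P) fused features
```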
Step 1008, decoding to obtain a foreground map 1009 with a clear boundary and an alpha mask 1010, respectively;
the foreground and alpha mask are next predicted by the features of the uncertain points. The foreground prediction branch is decoded by a group of decoders of 3 residual blocks to obtain a corresponding characteristic diagram. The first part of decoding uses the feature map obtained before as input, and obtains the predicted feature map after a group of bilinear difference value upsampling, convolution, BN, ReLU operations. And the second part of decoding uses the result after splicing the predicted characteristic diagram and the original characteristic diagram, and obtains a foreground diagram after bilinear difference value upsampling, convolution, BN, ReLU and mirror surface Padding and convolution in sequence. Similarly, the α prediction branch is first decoded by a set of 3 residual block decoders, then decoded by two sets of bilinear difference values, convolution, BN, ReLU operations, and finally decoded by a set of mirror Padding, convolution and Tanh to obtain the final predicted alpha matte, the reason for using Tanh is that the value of each pixel of the alpha matte needs to be between 0 and 1.
Step 1011, synthesizing the background image to be replaced with the foreground map and the alpha mask to obtain a video image under a new background. For the above steps 1001 to 1011, please refer to fig. 13.
The edge contour processing of step 1007 specifically includes the following steps; please refer to fig. 11.
Step 1101, upsampling the boundary of the coarse segmentation feature map using bilinear interpolation;
step 1102, predicting the foreground probability of each pixel point on the boundary;
step 1103, selecting points with probability close to 0.5 as uncertain points;
step 1104, fusing the feature maps of the uncertain points at different scales in the pyramid pooling module;
and step 1105, obtaining a feature map with a clear boundary.
After the video image under the new background is obtained, real-time video transmission is performed, please refer to fig. 14 to 15.
The general flow of real-time video transmission is shown in fig. 14; the video is transmitted using the RTMP transport protocol, and the source station, i.e., the server, is built with Nginx. To realize real-time video background replacement, a processing-unit layer needs to be added on the server side; the overall flow is shown in fig. 15, where: (1) image acquisition: the front-end image acquisition unit captures the video through the camera, and the processed video is obtained and displayed on a screen; (2) server side: the video stream server completes receiving and sending of the video stream through the specified transport protocol; and (3) image processing: portrait segmentation and background replacement are realized.
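Under these assumptions, a minimal server-side processing loop might look as follows; the RTMP URLs, frame size, the hypothetical `replace_background` helper standing in for the segmentation-and-compositing steps above, and the use of OpenCV plus an FFmpeg subprocess for pushing the stream are illustrative choices, not the exact deployment of the disclosure.

```python
import subprocess
import cv2

PULL_URL = "rtmp://localhost/live/input"    # hypothetical source stream
PUSH_URL = "rtmp://localhost/live/output"   # hypothetical processed stream
W, H, FPS = 1280, 720, 25

cap = cv2.VideoCapture(PULL_URL)
# Feed raw BGR frames to FFmpeg, which encodes them and pushes over RTMP.
ffmpeg = subprocess.Popen(
    ["ffmpeg", "-y", "-f", "rawvideo", "-pix_fmt", "bgr24",
     "-s", f"{W}x{H}", "-r", str(FPS), "-i", "-",
     "-c:v", "libx264", "-preset", "ultrafast", "-f", "flv", PUSH_URL],
    stdin=subprocess.PIPE,
)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (W, H))
    frame = replace_background(frame)  # portrait segmentation + compositing
    ffmpeg.stdin.write(frame.tobytes())

cap.release()
ffmpeg.stdin.close()
ffmpeg.wait()
```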
FIG. 16 schematically illustrates a block diagram of an image background replacement system according to an embodiment of the present disclosure.
As shown in fig. 16, the image background replacement system 1600 includes: an image acquisition module 1610, a selection module 1620, a fusion module 1630, and a synthesis module 1640.
An image obtaining module 1610, configured to obtain a foreground prediction probability of each pixel point in the first image; according to an embodiment of the present disclosure, the image obtaining module 1610 may be configured to perform the step S1 described above with reference to fig. 3, for example, and is not described herein again.
A selecting module 1620, configured to select a point where the foreground prediction probability is close to the feature value as an uncertain point; according to an embodiment of the disclosure, the selecting module 1620 may be configured to perform the step S2 described above with reference to fig. 3, for example, and is not described herein again.
A fusion module 1630, configured to fuse feature maps of different scales of the uncertain point to obtain a foreground map with a clear boundary and an alpha mask; according to an embodiment of the disclosure, the fusion module 1630 may be used for executing the step S3 described above with reference to fig. 3, for example, and is not described herein again.
A synthesis module 1640, configured to synthesize the background image to be replaced with the foreground image and the alpha mask to obtain a second image in a new background; according to an embodiment of the present disclosure, the synthesis module 1640 may be used, for example, to perform the step S4 described above with reference to fig. 3, which is not described herein again.
Fig. 17 schematically shows a block diagram of a video image background replacement system according to an embodiment of the present disclosure.
As shown in fig. 17, the video image background replacement system 1700 includes: video acquisition module 1710, background replacement module 1720.
The video obtaining module 1710 is configured to obtain a first video image in a current video.
A background replacing module 1720, configured to perform background replacement on the first video image according to the foregoing image background replacing method, so as to obtain a second video image in a new background.
It should be noted that any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any of the image obtaining module 1610, the selecting module 1620, the fusing module 1630, the synthesizing module 1640, the video obtaining module 1710, and the background replacing module 1720 may be combined and implemented in one module, or any one of the modules may be divided into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the image capturing module 1610, the selecting module 1620, the fusing module 1630, the synthesizing module 1640, the video capturing module 1710, and the background replacing module 1720 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or implemented by a suitable combination of any several of them. Alternatively, at least one of the image capture module 1610, the selection module 1620, the fusion module 1630, the composition module 1640, or the video capture module 1710, the background replacement module 1720 may be at least partially implemented as a computer program module that, when executed, may perform corresponding functions.
The image background replacement method and system of the present disclosure can be used in fields such as financial technology, in particular remote video agent customer service, providing a method that realizes portrait segmentation based on an improved Background Matting algorithm and achieves background replacement by placing the segmented portrait on a new background.
Fig. 18 schematically shows a block diagram of an electronic device adapted to implement the above described method according to an embodiment of the present disclosure. The electronic device shown in fig. 18 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 18, the electronic apparatus 1800 described in this embodiment includes: a processor 1801, which may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 1802 or a program loaded from a storage portion 1808 into a Random Access Memory (RAM) 1803. The processor 1801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1801 may also include onboard memory for caching purposes. The processor 1801 may include a single processing unit or multiple processing units for performing the different actions of the method flows in accordance with embodiments of the present disclosure.
In the RAM 1803, various programs and data necessary for the operation of the system 1800 are stored. The processor 1801, ROM 1802, and RAM 1803 are connected to one another by a bus 1804. The processor 1801 performs various operations of the method flows according to embodiments of the present disclosure by executing programs in the ROM 1802 and/or the RAM 1803. Note that the programs may also be stored in one or more memories other than the ROM 1802 and RAM 1803. The processor 1801 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 1800 may also include an input/output (I/O) interface 1805, which is also connected to the bus 1804. The electronic device 1800 may also include one or more of the following components connected to the I/O interface 1805: an input portion 1806 including a keyboard, a mouse, and the like; an output portion 1807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 1808 including a hard disk and the like; and a communication portion 1809 including a network interface card such as a LAN card, a modem, or the like. The communication portion 1809 performs communication processing via a network such as the Internet. A drive 1810 is also connected to the I/O interface 1805 as needed. A removable medium 1811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1810 as necessary, so that a computer program read out therefrom is installed into the storage portion 1808 as necessary.
According to embodiments of the present disclosure, the method flows described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 1809, and/or installed from the removable medium 1811. The computer program, when executed by the processor 1801, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The embodiments of the present disclosure also provide a computer-readable storage medium, which may be included in the device/apparatus/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The above-described computer-readable storage medium carries one or more programs which, when executed, implement an image background replacement method according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 1802 and/or the RAM 1803 and/or one or more memories other than the ROM 1802 and the RAM 1803 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code causes the computer system to implement the image background replacement method provided by the embodiments of the present disclosure.
When executed by the processor 1801, the computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted in the form of a signal over a network medium, distributed, and downloaded and installed via the communication portion 1809, and/or installed from the removable medium 1811. The computer program containing the program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, and the like, or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, the program code for carrying out the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the C language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
It should be noted that each functional module in each embodiment of the present disclosure may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware, or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as a separate product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present disclosure, in essence or in the part that contributes to the prior art, may be embodied in whole or in part in the form of a software product.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in various ways, even if such combinations or integrations are not expressly recited in the present disclosure. In particular, various combinations and/or integrations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
While the disclosure has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents. Accordingly, the scope of the present disclosure should not be limited to the above-described embodiments, but should be defined not only by the appended claims, but also by equivalents thereof.