Method, apparatus and storage medium for recognizing landmark in panoramic image
1. A method of identifying landmarks in a panoramic image, comprising:
performing projective transformation on the panoramic image to generate a projection image;
performing semantic segmentation on the projection image to determine a landmark region and a road surface region;
correcting distortion in the landmark region to generate a corrected landmark region; and
identifying a landmark in the corrected landmark region.
2. The method of claim 1, wherein the projective transformation is an asteroid projective transformation.
3. The method of claim 2, wherein semantically segmenting the projection image to determine landmark regions and road surface regions comprises:
detecting a circular arc and a ray in the projection image; and
filtering the detected circular arcs and rays based on semantic characteristics of the circular arcs and the rays, and determining a landmark region and a road surface region in the projection image based on the filtered circular arcs and the filtered rays.
4. The method of claim 3, wherein filtering the detected arcs and rays based on their semantic characteristics and determining landmark regions and road surface regions in the projection image based on the filtered arcs and rays comprises:
filtering out, based on semantic characteristics of the arcs and rays, one or more of the following arcs and rays: arcs that are not centered on the projection center, rays that are not emitted from the projection center, arcs with a radius smaller than a threshold value, and isolated arcs; and
determining a closed area formed by the filtered circular arcs as a road surface area, and determining a sector area formed by the filtered rays and the circular arcs outside the closed area as a landmark area.
5. The method of claim 1 or 2, wherein correcting distortion in the landmark region to generate a corrected landmark region comprises:
projecting pixel points on each circular arc in the landmark region to corresponding straight lines; and
performing data compression on the pixel points projected to the straight lines to generate the corrected landmark region.
6. The method of claim 1, wherein identifying a landmark in the corrected landmark region comprises:
performing coarse landmark detection at the corrected landmark region according to the similarity of a plurality of known landmark images and the corrected landmark region to determine candidate landmark images from the plurality of known landmark images; and
performing feature matching on the corrected landmark region and each candidate landmark image, and identifying the landmark in the landmark region based on the feature matching result.
7. The method of claim 6, wherein feature matching the corrected landmark region with respective candidate landmark images and identifying landmarks in the landmark region based on feature matching results comprises:
scaling the corrected landmark region and the respective candidate landmark images to the same resolution;
extracting features from the scaled landmark region and each scaled candidate landmark image and matching the features to determine matching feature points between the corrected landmark region and each candidate landmark image; and
selecting a candidate landmark image with the highest matching degree with the corrected landmark region according to at least one of the number, the proportion, the distribution and the average characteristic of the matching feature points, and identifying the landmark in the landmark region as the landmark in the candidate landmark image with the highest matching degree.
8. An apparatus for recognizing landmarks in panoramic images, comprising:
a processor; and
a memory storing computer program instructions,
wherein the computer program instructions, when executed by the processor, cause the processor to perform the steps of:
performing projective transformation on the panoramic image to generate a projection image;
performing semantic segmentation on the projection image to determine a landmark region and a road surface region;
correcting distortion in the landmark region to generate a corrected landmark region; and
identifying a landmark in the corrected landmark region.
9. An apparatus for recognizing landmarks in panoramic images, comprising:
a projective transformation unit configured to projectively transform the panoramic image to generate a projection image;
an image segmentation unit configured to semantically segment the projection image to determine a landmark region and a road surface region;
a distortion correction unit configured to correct distortion in the landmark region to generate a corrected landmark region; and
a landmark identifying unit configured to identify a landmark in the corrected landmark region.
10. A computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed, implement the steps of:
performing projective transformation on the panoramic image to generate a projection image;
performing semantic segmentation on the projection image to determine a landmark region and a road surface region;
correcting distortion in the landmark region to generate a corrected landmark region; and
identifying a landmark in the corrected landmark region.
Background
In recent years, panoramic technology has been widely used in image processing. A panoramic image represents as much of the environment around the photographer as possible through wide-angle representation, in the form of drawings, photographs, videos, three-dimensional models, and the like. A panoramic image shot by a user may contain various landmarks such as scenic spots, restaurants, and sculptures, and identifying and labeling these landmarks can give the user a sense of real-scene presence and interactivity.
Most existing landmark recognition models are designed for ordinary images and perform well on them. However, because of the distortion present in panoramic images, these models produce large errors when they are applied to panoramic images. Therefore, current methods for identifying landmarks in panoramic images usually depend on manual judgment, that is, a person visually compares the photographed panoramic image with a large number of known landmarks in a database, and the identification efficiency and accuracy are relatively low.
Therefore, there is a need for a landmark identification technique that automatically identifies landmarks in panoramic images and can cope with the effects of distortion in panoramic images.
Disclosure of Invention
According to an aspect of the present disclosure, there is provided a method of recognizing a landmark in a panoramic image, including: performing projective transformation on the panoramic image to generate a projection image; performing semantic segmentation on the projection image to determine a landmark region and a road surface region; correcting distortion in the landmark region to generate a corrected landmark region; and identifying a landmark in the corrected landmark region.
According to another aspect of the present disclosure, there is provided an apparatus for recognizing a landmark in a panoramic image, including: a processor; and a memory storing computer program instructions, wherein the computer program instructions, when executed by the processor, cause the processor to perform the steps of: performing projective transformation on the panoramic image to generate a projection image; performing semantic segmentation on the projection image to determine a landmark region and a road surface region; correcting distortion in the landmark region to generate a corrected landmark region; and identifying a landmark in the corrected landmark region.
According to still another aspect of the present disclosure, there is provided an apparatus for recognizing a landmark in a panoramic image, including: a projective transformation unit configured to projectively transform the panoramic image to generate a projection image; an image segmentation unit configured to semantically segment the projection image to determine a landmark region and a road surface region; a distortion correction unit configured to correct distortion in the landmark region to generate a corrected landmark region; and a landmark identifying unit configured to identify a landmark in the corrected landmark region.
According to yet another aspect of the present disclosure, there is provided a computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed, implement the steps of: performing projective transformation on the panoramic image to generate a projection image; performing semantic segmentation on the projection image to determine a landmark region and a road surface region; correcting distortion in the landmark region to generate a corrected landmark region; and identifying a landmark in the corrected landmark region.
Drawings
These and/or other aspects and advantages of the present disclosure will become more apparent and more readily appreciated from the following detailed description of the embodiments of the present disclosure, taken in conjunction with the accompanying drawings, in which:
Fig. 1 illustrates an exemplary scene to which the technique of recognizing landmarks in a panoramic image of the embodiments of the present disclosure may be applied.
Fig. 2 shows a flowchart of a method of recognizing landmarks in a panoramic image, according to an embodiment of the present disclosure.
Fig. 3 shows a schematic representation of the principle of the asteroid projective transformation.
Fig. 4 shows a schematic diagram of projective transformation of a panoramic image with different projection perspectives.
Fig. 5 illustrates an exemplary method of semantically segmenting a projection image in a method of recognizing a landmark in a panoramic image according to an embodiment of the present disclosure.
Fig. 6 shows a schematic diagram of semantic segmentation of a projection image in a method of recognizing landmarks in a panoramic image according to an embodiment of the present disclosure.
Fig. 7 illustrates an exemplary method of correcting distortion in a landmark region in a method of recognizing landmarks in a panoramic image according to an embodiment of the present disclosure.
Fig. 8 shows a schematic diagram for correcting distortion in a landmark region in a method of recognizing landmarks in a panoramic image according to an embodiment of the present disclosure.
Fig. 9 is a schematic diagram illustrating a result of correcting distortion in a landmark region in the method of recognizing a landmark in a panoramic image according to an embodiment of the present disclosure.
Fig. 10 illustrates an exemplary method of recognizing a landmark in a corrected landmark region in the method of recognizing a landmark in a panoramic image according to an embodiment of the present disclosure.
Fig. 11 is a schematic diagram illustrating a method for recognizing a landmark in a panoramic image according to an embodiment of the present disclosure, in which a candidate landmark image is determined from a known landmark image.
Fig. 12 is a schematic diagram illustrating feature extraction and matching in a landmark region and a candidate landmark image in the method of recognizing a landmark in a panoramic image according to the embodiment of the present disclosure.
Fig. 13 shows a schematic hardware block diagram of an apparatus for recognizing landmarks in panoramic images according to an embodiment of the present disclosure.
Fig. 14 shows a schematic block diagram of an apparatus for recognizing landmarks in a panoramic image according to an embodiment of the present disclosure.
Detailed Description
An exemplary scene to which the technique of recognizing landmarks in a panoramic image of the embodiments of the present disclosure may be applied is first described with reference to fig. 1. As shown in the left diagram of fig. 1, a user can use a panoramic camera to photograph a street view that includes road objects such as roads and vehicles as well as landmarks such as buildings, and recognizing and labeling the landmarks in the street view can enrich the photographed content. For example, landmarks in the street view may be detected, the building landmark "Beijing Restaurant" in the photographed panoramic image may be identified, and the identified landmark may be labeled as shown in the right diagram of fig. 1, thereby making the image more interesting and interactive for the user. However, as mentioned above, existing recognition methods generate large errors due to the distortion effect in the panoramic image. Therefore, most existing methods for identifying landmarks in panoramic images still rely on manual judgment, and the identification efficiency and accuracy are relatively low.
In view of the above, in order to automatically recognize landmarks in a panoramic image without being affected by distortion in the panoramic image, the present application proposes a panoramic image landmark recognition technique based on image semantic segmentation and distortion removal, in which a panoramic image is subjected to projection conversion and semantic segmentation, a road surface region and a landmark region are segmented from a projected image, the landmark region is subjected to distortion removal correction processing, and recognition is performed in the landmark region from which distortion is removed, thereby improving efficiency and accuracy of landmark recognition in the panoramic image.
Here and hereinafter, for convenience of explanation, the technology of recognizing landmarks in a panoramic image is described by taking a panoramic image obtained by photographing a street view as an example, however, this is only an example and not a limitation of the present disclosure, and the landmark recognition technology of the present disclosure has a wide application scene, for example, it may be applied to various scenes such as panoramic roaming, robot panoramic vision, immersive fitness place, and the like. The following describes in detail a method and apparatus for recognizing landmarks in panoramic images according to the present disclosure, with reference to the accompanying drawings and examples.
Landmark identification method
Fig. 2 shows a flowchart of a method of recognizing landmarks in a panoramic image, according to an embodiment of the present disclosure. In the embodiments of the present disclosure, the panoramic image and the panoramic video may be used interchangeably, and the landmark identifying method of the present disclosure may be applied to one or more frames in a panoramic image or a piece of panoramic video photographed by a user to identify landmarks therein. In addition, in the embodiments of the present disclosure, a landmark refers to a building or an object having a landmark feature outside a road surface area, such as a tourist attraction, a restaurant, a movie theater, a sculpture, and the like. A method of recognizing landmarks in a panoramic image is described in detail below in conjunction with fig. 2.
As shown in fig. 2, in step S101, the panoramic image is projectively transformed to generate a projection image. In the embodiment of the present disclosure, the panoramic image may be projectively transformed by adopting various projection modes, for example, equidistant columnar projection, cube map projection, fish-eye projection, cylindrical projection, asteroid projection, and the like. For convenience of description, the projection transformation will be described below as an example of a manner of asteroid projection.
Fig. 3 shows a schematic representation of the principle of the asteroid projective transformation. The asteroid projective transformation of the panoramic image mainly comprises the following steps. First, the panoramic image is mapped onto a spherical surface according to the longitude and latitude expansion method, so that horizontal lines in the actual scene are mapped onto latitude lines and vertical lines in the actual scene are mapped onto longitude lines, as illustrated by the horizontal and vertical lines shown in fig. 3. Then, the coordinates on the spherical surface are projected from the projection point into a circle in the projection plane at a certain projection view angle (FOV), thereby obtaining a projection image. The detailed transformation of the asteroid projection is well known in the art and will not be described herein.
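The geometry described above can be sketched as follows. This is a minimal illustration only, assuming that the asteroid projection corresponds to the standard stereographic projection of a unit sphere from the north pole onto the plane tangent at the south pole; the function names and the equirectangular pixel layout are hypothetical and are not taken from the disclosure.

```python
import math

def sphere_to_plane(lat, lon):
    """Project a point of the unit sphere (latitude, longitude in radians)
    from the north pole onto the plane tangent at the south pole.
    Assumption: standard stereographic projection."""
    r = 2.0 * math.tan(math.pi / 4.0 + lat / 2.0)  # radius depends on latitude only
    return r * math.cos(lon), r * math.sin(lon)

def plane_to_sphere(x, y):
    """Inverse mapping: plane coordinates back to (latitude, longitude)."""
    r = math.hypot(x, y)
    lat = 2.0 * math.atan(r / 2.0) - math.pi / 2.0
    lon = math.atan2(y, x)
    return lat, lon

def plane_to_panorama_pixel(x, y, width, height):
    """Look up the source pixel in a longitude-latitude (equirectangular)
    panorama of size width x height. The pixel layout is an assumption."""
    lat, lon = plane_to_sphere(x, y)
    u = (lon + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - lat) / math.pi * height
    return u, v
```

Under these assumptions, all points of equal latitude land on a circle centered on the projection center, and all points of equal longitude land on a ray emitted from it, which matches the correspondence between horizontal lines and arcs and between vertical lines and rays.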
The inventor has noted that, when the asteroid projective transformation is performed on a panoramic image, the choice of projection point and projection view angle affects the accuracy of subsequent landmark identification. For example, the farther the projection point shown in fig. 3 is from the projection plane (the plane in which the south pole or the north pole is located) and the larger the projection view angle, the larger the proportion of sky pixels in the corresponding projection image. A larger proportion leaves more space for the region where the landmark is located, so that the features related to the landmark can be extracted more accurately and sufficiently. Fig. 4 shows a schematic diagram of projective transformation of a panoramic image using the same projection point but different projection view angles. The left side of fig. 4 shows the captured panoramic image, and the two images on the right side are projection images that both use the north pole as the projection point and project onto a projection plane tangent to the spherical surface at the south pole, which serves as the projection center, but that are obtained with different projection view angles. The effective area of the landmark in the first projection image on the right side of fig. 4 is small, and therefore the features extracted for the landmark may not be accurate enough. In contrast, the space where the landmark region is located in the second projection image on the right side of fig. 4 is relatively large, so that the features related to the landmark can be extracted more accurately and sufficiently, which facilitates the subsequent landmark identification. In the embodiment of the present disclosure, optionally, the projection point and the projection view angle of the projective transformation are determined according to the proportion of sky pixels in the projection image.
For example, the proportion of sky pixels in the projection images obtained with different projection points and projection view angles can be observed, and a suitable projection point and projection view angle can be determined accordingly, so that the road surface area is small and the space where the landmarks are located is large, allowing the features related to the landmarks to be extracted more accurately and sufficiently. For example, the projection point and projection view angle of the projective transformation may be selected such that the proportion of sky pixels in the entire projection image is greater than, for example, 50%.
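As an illustration of this selection rule, the following sketch picks a projection view angle from a set of candidates based on the sky-pixel proportion. It assumes that a boolean sky mask is already available for each candidate projection image (how the mask is obtained is outside this sketch), and the choice of returning the smallest qualifying angle is an assumption made here for illustration.

```python
def sky_ratio(sky_mask):
    """Fraction of pixels marked as sky in a 2D boolean mask (list of rows)."""
    total = sum(len(row) for row in sky_mask)
    sky = sum(1 for row in sky_mask for value in row if value)
    return sky / total if total else 0.0

def select_view_angle(masks_by_fov, min_ratio=0.5):
    """Return the smallest candidate view angle whose projection image has a
    sky-pixel proportion greater than min_ratio, or None if none qualifies.
    masks_by_fov maps a candidate view angle (degrees) to a sky mask
    (hypothetical input format)."""
    for fov in sorted(masks_by_fov):
        if sky_ratio(masks_by_fov[fov]) > min_ratio:
            return fov
    return None
```

The default of 0.5 mirrors the 50% example given above; other thresholds may be more suitable in practice.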
Returning to fig. 2, in step S102, the projection image is semantically segmented to determine a landmark region and a road surface region. As described above, the panoramic image may include various objects on the road surface, such as vehicles and road markers, as well as various architectural landmarks outside the road surface. In order to identify the landmarks accurately, it is desirable to reduce the influence of road surface objects on the detection result as much as possible, and therefore it is necessary to determine the landmark region and the road surface region. In the embodiment of the disclosure, semantic segmentation is adopted to perform visual processing on the projection image, and the real-scene meaning of each visual element in the projection image is further understood based on the high-level semantic characteristics of the visual element, so that accurate and efficient segmentation is performed. The semantic segmentation of a projection image obtained by the asteroid projective transformation will be described in detail below with reference to fig. 5 and 6.
Fig. 5 illustrates an exemplary method of semantically segmenting a projection image in a method of recognizing a landmark in a panoramic image according to an embodiment of the present disclosure. As shown in fig. 5, in step S1021, arcs and rays are detected in the projection image. As is known from the principle of the asteroid projective transformation described above, the circular arcs in the projection image represent horizontal lines in the actual scene, and the rays in the projection image represent vertical lines in the actual scene. Thus, the task of finding horizontal and vertical lines in the actual scene that may represent landmark contours is equivalent to finding arcs and rays in the projection image. The arcs and rays may be detected in the projection image using a conventional computer vision processing method, a deep learning method, and the like, which is not limited by the present disclosure.
In step S1022, the detected arcs and rays are filtered based on their semantic characteristics, and a landmark region and a road surface region are determined in the projection image based on the filtered arcs and rays. Since road surface lines, road beds, and the like in the road surface area are also represented as arcs and rays in the projection image, the arcs and rays detected in step S1021 may include noise arcs and rays that do not correspond to the landmark region and that may interfere with the subsequent landmark recognition result. It is therefore necessary to filter the arcs and rays, and to perform semantic segmentation, based on the semantic characteristics that they represent. In this step, as an example, arcs not centered on the projection center and rays not emitted from the projection center may be filtered out based on semantic characteristics of the arcs and the rays.
As described above, the horizontal line in the actual scene corresponds to the circular arc in the projected image and the vertical line in the actual scene corresponds to the ray in the projected image, so for the landmark out of the road surface area, the circular arc corresponding to the horizontal line should be centered on the projection center of the projection plane, and the ray corresponding to the vertical line should be emitted from the projection center of the projection plane (i.e., the extension line of the ray should pass through the projection center). According to the principle, the arc which does not take the projection center as the center of a circle can be filtered, and the ray which does not emit from the projection center can be filtered. For ease of understanding, an example of the above-described filtering process will be described below with reference to fig. 6.
Examples of detected arcs and rays are shown in the left diagram of fig. 6, with only a portion of the detected rays and arcs shown for clarity. The projection center is shown in the left diagram of fig. 6. A detected ray may be extended to check whether its extension passes through the projection center, which determines whether the ray is emitted from the projection center, and a detected arc may be checked to determine whether it is part of a circle centered at the projection center. In this way, arcs not centered at the projection center and rays not emitted from the projection center are filtered out. For example, the middle diagram of fig. 6 shows an exemplary result of the above-described filtering process, in which the arcs that are centered at the projection center and therefore not filtered out are indicated by dashed arcs.
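The center-based filtering rule can be sketched as follows. This is a minimal illustration under assumed data structures: each detected arc is a dictionary with a fitted circle center and radius, each detected ray is a pair of endpoints, and the pixel tolerance `tol` is a hypothetical parameter.

```python
import math

def distance_point_to_line(p, a, b):
    """Perpendicular distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / length

def filter_by_center(arcs, rays, center, tol=2.0):
    """Keep arcs whose fitted circle center lies within tol of the projection
    center, and rays whose extension passes within tol of it."""
    kept_arcs = [
        arc for arc in arcs
        if math.hypot(arc["center"][0] - center[0],
                      arc["center"][1] - center[1]) <= tol
    ]
    kept_rays = [
        ray for ray in rays
        if distance_point_to_line(center, ray[0], ray[1]) <= tol
    ]
    return kept_arcs, kept_rays
```

Using the distance to the extended line, rather than to the segment itself, reflects the statement that the extension line of a valid ray should pass through the projection center.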
In addition to the above filtering based on whether the arc and the ray share the same projection center, other filtering rules may be adopted in the embodiments of the present disclosure, so as to filter out unsuitable arcs and rays based on semantic characteristics of the detected ray and arc. For example, arcs with radii less than a certain threshold may optionally be filtered out. As shown in the left diagram of fig. 6, from the actual starting points of the two detected rays on the projection image (instead of the above-mentioned extended starting point), the arc corresponding to the landmark with the minimum radius threshold may be determined, and since the arc with a radius smaller than the threshold is unlikely to correspond to the landmark region but belongs to the road surface region, the arc with a radius smaller than the threshold may be filtered accordingly. In yet another example, isolated arcs in the projected image may be filtered accordingly, considering that arcs representing roads may be connected end-to-end with adjacent arcs to form a closed area, while corresponding noise arcs, such as road beds and sidewalks, often appear in the projected image as isolated arcs that do not have adjacent arcs and thus cannot form a closed area. It should be noted that in the embodiment of the present disclosure, one or more of the above filtering rules may be employed simultaneously or sequentially, so as to filter out noise arcs and rays that do not correspond to the landmark region based on semantic characteristics of arcs and rays, thereby reducing the influence of the road surface region on the landmark detection accuracy.
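The two optional rules described above, the radius threshold and the removal of isolated arcs, might be sketched as follows. The arc representation (a dictionary with a `radius` and a list of two endpoint coordinates under `ends`) and the endpoint tolerance are assumptions made for illustration.

```python
import math

def filter_small_arcs(arcs, min_radius):
    """Drop arcs whose radius is smaller than the threshold."""
    return [arc for arc in arcs if arc["radius"] >= min_radius]

def _touches(arc_a, arc_b, tol):
    """True if any endpoint of arc_a lies within tol of an endpoint of arc_b."""
    return any(
        math.hypot(p[0] - q[0], p[1] - q[1]) <= tol
        for p in arc_a["ends"]
        for q in arc_b["ends"]
    )

def filter_isolated_arcs(arcs, tol=1.0):
    """Drop arcs that have no adjacent arc, i.e. arcs none of whose endpoints
    meets an endpoint of any other arc."""
    return [
        arc for i, arc in enumerate(arcs)
        if any(_touches(arc, other, tol)
               for j, other in enumerate(arcs) if i != j)
    ]
```

The rules can be applied one after another, for example by passing the result of `filter_small_arcs` to `filter_isolated_arcs`.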
Next, the road surface region and the landmark region may be segmented based on the filtered arc and the ray. Specifically, a closed region formed by the filtered circular arc may be determined as the road surface region, and a sector region formed by the filtered ray and the circular arc outside the closed region may be determined as the landmark region. For example, as shown in the right diagram of fig. 6, a closed area formed by connecting the filtered adjacent arcs end to end may be determined as a road surface area, and a sector area formed by the filtered arcs and the rays outside the closed area may be determined as a landmark area, for example, a sector area formed by two rays, an arc with a minimum radius determined by an actual starting point of the rays, and an arc with a maximum radius determined by an end point of the rays may be determined as a landmark area.
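A sector-shaped landmark region of this kind can be represented by two ray angles and two radii. The following sketch tests whether a pixel lies inside such a sector; the parameterization (angles in radians measured counterclockwise from the positive x axis, sector swept from `angle_start` to `angle_end`) is an assumption for illustration.

```python
import math

def in_sector(point, center, angle_start, angle_end, r_min, r_max):
    """Return True if point lies in the annular sector bounded by the arcs of
    radius r_min and r_max and by the rays at angle_start and angle_end."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    r = math.hypot(dx, dy)
    if not (r_min <= r <= r_max):
        return False
    two_pi = 2.0 * math.pi
    theta = math.atan2(dy, dx) % two_pi
    span = (angle_end - angle_start) % two_pi
    # Offset of the pixel direction from the first ray, wrapped into [0, 2*pi)
    return (theta - angle_start) % two_pi <= span
```

Evaluating this predicate over all pixels of the projection image yields a mask of the landmark region; pixels inside the closed area formed by the retained arcs would instead be assigned to the road surface region.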
Returning to fig. 2, in step S103, distortion in the landmark region is corrected to generate a corrected landmark region. As described above, the distortion effect in the panoramic image causes a large error in landmark identification, and in the embodiment of the present disclosure, distortion removal correction processing is performed on the landmark region obtained after semantic segmentation, so as to avoid the influence on the accuracy of landmark identification due to panoramic distortion. The correction processing will be described in detail below with reference to fig. 7 to 9 by taking a landmark region obtained by semantic segmentation after asteroid projection transformation as an example.
Fig. 7 illustrates an exemplary method of correcting distortion in a landmark region in a method of recognizing landmarks in a panoramic image according to an embodiment of the present disclosure.
As shown in fig. 7, in step S1031, the pixel points on each arc in the landmark region are projected onto the corresponding straight lines. As described above, the circular arcs in the projected image correspond to the straight lines in the actual scene, and therefore, in step S1031, the pixel points on each circular arc in the landmark region are all projected onto the corresponding straight lines, so that the distortion introduced by the asteroid projection manner is eliminated. For example, as shown in fig. 8, for each circular arc in the determined landmark region, pixel points on the circular arc may be projected onto a corresponding straight line, so as to generate a trapezoidal landmark region from a fan-shaped landmark region.
In step S1032, data compression is performed on the pixel points projected onto the respective straight lines to generate the corrected landmark region. Specifically, as shown in fig. 8, each circular arc in the sector landmark region and the corresponding straight line in the trapezoidal landmark region both correspond to the same landmark and have the same length in the actual scene. Based on this principle, data compression may be performed on each straight line in the trapezoidal region obtained in step S1031, so that the number of pixel points on each straight line is the same. For example, as shown in fig. 8, 6 pixel points of a certain line segment on a straight line may be compressed into 2 data points, so as to ensure that each straight line after data compression contains the same number of data points, and thus a corrected rectangular landmark region may be obtained. Fig. 9 is a schematic diagram illustrating the result of performing the distortion correction process on a landmark in a panoramic image by the exemplary method shown in fig. 7 to generate a corrected landmark region, and it can be seen that the distortion effect is substantially eliminated for the landmark included in the corrected landmark region.
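Steps S1031 and S1032 can be sketched together as follows for a grayscale image stored as a list of rows. This is a simplified illustration, not the exact procedure of the disclosure: nearest-neighbor sampling along each arc and averaging within equal bins are assumptions, as are the function name and the choice of placing the outermost arc in the first output row.

```python
import math

def unwrap_sector(image, center, angle_start, angle_end, r_min, r_max, out_width):
    """Turn a sector-shaped landmark region into a rectangular one.
    Each integer radius between r_min and r_max gives one output row."""
    cx, cy = center
    height, width = len(image), len(image[0])
    sweep = angle_end - angle_start
    rows = []
    # Assumption: larger radii correspond to higher parts of the landmark,
    # so the outermost arc becomes the first (top) row.
    for r in range(r_max, r_min - 1, -1):
        # Step S1031 (sketch): read the pixels along the arc onto a straight
        # line; longer arcs yield more samples, giving a trapezoid overall.
        n = max(out_width, int(math.ceil(r * abs(sweep))))
        line = []
        for k in range(n):
            theta = angle_start + sweep * (k + 0.5) / n
            x = min(max(int(round(cx + r * math.cos(theta))), 0), width - 1)
            y = min(max(int(round(cy + r * math.sin(theta))), 0), height - 1)
            line.append(image[y][x])
        # Step S1032 (sketch): compress the line to out_width data points by
        # averaging consecutive groups of samples.
        row = []
        for j in range(out_width):
            lo = j * n // out_width
            hi = max(lo + 1, (j + 1) * n // out_width)
            row.append(sum(line[lo:hi]) / (hi - lo))
        rows.append(row)
    return rows
```

Because every output row has the same number of data points regardless of the arc length, the result is rectangular, which corresponds to the corrected landmark region described above.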
The above describes processing of performing semantic segmentation and correction after performing projection transformation on a panoramic image by taking the asteroid projection manner as an example, and it can be seen that after performing the asteroid projection transformation, a road surface region is concentrated near a projection center in a projection plane, so that the road surface region and a landmark region can be separated based on semantic characteristics, and the influence of the road surface region on a landmark identification result can be eliminated by an arc and ray filtering rule, so that the accuracy of landmark identification can be improved. For other projection modes, semantic segmentation processing and correction processing can be similarly performed on the basis of other image elements in the projection image in combination with the geometric characteristics of the image elements, so that the landmark region with distortion removed can be determined on the basis of high-level semantic characteristics. The embodiment of the present disclosure does not limit the projection conversion method and the specific correction processing method.
Returning to fig. 2, in step S104, landmarks are identified in the corrected landmark regions. In this step, landmark detection may be performed in the landmark region from which the distortion effect is removed, and the landmark that most matches the detected landmark is determined from among known landmarks in a pre-established database. The above-described landmark detection and matching process may be performed using various known models and methods to identify landmarks, which are not limited by this disclosure. As an example, the disclosed embodiments employ two stages of coarse landmark detection and fine landmark detection within a corrected landmark region for landmark identification. Specifically, the pre-established database contains a large number of known landmark images, and each landmark has a plurality of known images corresponding to the landmark but having large differences in background environment, shooting angle of view, lighting conditions, and the like, so that searching for an image corresponding to the shot landmark from among the large number of known landmark images is a great challenge to computer processing capability. The landmark identification mode of the two-stage processing can effectively save the computing resource when searching from the known landmarks. An exemplary method of identifying landmarks in the corrected landmark region is described in detail below in conjunction with fig. 10-12.
Fig. 10 illustrates an exemplary method of recognizing a landmark in a corrected landmark region in the method of recognizing a landmark in a panoramic image according to an embodiment of the present disclosure. As shown in fig. 10, in step S1041, rough landmark detection is performed according to the similarity between a plurality of known landmark images and the corrected landmark region to determine candidate landmark images from the plurality of known landmark images. In this step, conventional computer vision processing methods or deep learning methods may be used for landmark detection. For example, by comparing the similarity between the corrected landmark region and the known landmark images, or by obtaining a classification result for the corrected landmark region from a trained neural network, it may be determined whether a possible landmark is present in the corrected landmark region, and one or more candidate landmark images that are relatively similar to the landmark region with the panoramic distortion removed may be determined from the large number of known landmark images. As an example, a corrected landmark region may be detected using a trained Fast R-CNN model based on a pre-established database containing a large number of known landmark images, whether a landmark is contained in the landmark region may be determined based on the classification result of the model, and one or more candidate landmark images may be roughly determined from the database.
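As a non-limiting illustration of the similarity-based variant of rough landmark detection, the following minimal sketch ranks known landmark images by cosine similarity to the corrected landmark region. It is not the Fast R-CNN approach mentioned above; the global feature vectors, the image identifiers, the `min_similarity` threshold, and the `top_k` limit are all assumptions introduced for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def coarse_candidates(region_vec, known_images, min_similarity=0.5, top_k=3):
    """Return up to top_k (image_id, similarity) pairs, most similar first.

    known_images maps a hypothetical image id to a hypothetical global
    feature vector; entries below min_similarity are discarded.
    """
    scored = [(image_id, cosine_similarity(region_vec, vec))
              for image_id, vec in known_images.items()]
    kept = [item for item in scored if item[1] >= min_similarity]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


# Hypothetical database: two images of one landmark, one of another.
known = {
    "tower_a_day": [1.0, 0.0, 0.2],
    "tower_a_night": [0.9, 0.1, 0.3],
    "bridge_b": [0.0, 1.0, 0.0],
}
candidates = coarse_candidates([1.0, 0.0, 0.25], known)
print(candidates)
```

In this sketch an empty result would indicate that no possible landmark is present in the corrected landmark region, and the surviving entries play the role of the candidate landmark images passed to the fine detection stage.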
An example of determining candidate landmark images from known landmark images in a method of recognizing landmarks in a panoramic image according to an embodiment of the present disclosure is illustrated in fig. 11. In this example, the corrected landmark region and a plurality of known landmark images are provided as inputs to a trained Fast R-CNN model, and, based on the classification result of the model, one or more of the known landmark images may be determined as candidate landmark images that may correspond to the landmark captured in the panoramic image. Optionally, since the corrected landmark region may contain a part of the background environment, such as the sky and clouds, and may differ greatly from the known landmark images in the database in background environment, shooting angle, lighting conditions, and the like, the minimum bounding rectangle of the landmark (shown as the rectangle in fig. 11) may be further determined from the corrected landmark region and from the known landmark image, respectively, when the model is applied, so as to exclude the influence of differences in background factors on the landmark matching.
Returning to fig. 10, in step S1042, the corrected landmark region is feature-matched with each candidate landmark image, and landmarks in the landmark region are identified based on the feature matching result. In this step, after rough landmark detection is performed on the corrected landmark region, more accurate landmark detection can be performed within the range of the resulting candidate landmark image. Specifically, the corrected landmark regions may be matched one-to-one with the respective candidate landmark images, so that the landmark corresponding to the best matching candidate landmark image is taken as the recognized landmark. In this step, feature extraction and matching may be performed directly on the corrected landmark regions and the respective candidate landmark images using various suitable methods. In addition, considering that the corrected landmark region and each candidate landmark image obtained in step S1041 may have different resolutions, and a difference in such a scale may affect the matching result of the images, in this step, optionally, the corrected landmark region and each candidate landmark image may be first scaled to the same resolution, and then feature extraction and matching may be performed on the scaled corrected landmark region and each candidate landmark image. Further, in the case where the minimum bounding rectangular frame of the landmark is further determined from the corrected landmark region and the known landmark image, respectively, as described above, the scaling of the image may also be performed on the basis of the determined minimum bounding rectangular frame of the landmark.
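By way of example only, the optional scaling to a common resolution may be sketched as follows. Images are represented as nested lists of pixel values, nearest-neighbour sampling is used, and the target size is taken to be the smaller width and height of the pair; these choices, and the function names, are assumptions made for illustration and are not prescribed by the present disclosure.

```python
def resize_nearest(image, out_w, out_h):
    """Resize a 2D list of pixels using nearest-neighbour sampling."""
    in_h = len(image)
    in_w = len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def to_common_resolution(region, candidate):
    """Scale both images to the smaller width and height of the pair.

    Assumption: the aspect ratio is not preserved; a real implementation
    might instead pad or crop the landmark's bounding rectangle.
    """
    out_w = min(len(region[0]), len(candidate[0]))
    out_h = min(len(region), len(candidate))
    return (resize_nearest(region, out_w, out_h),
            resize_nearest(candidate, out_w, out_h))


region = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 image
candidate = [[0, 1], [2, 3]]                                 # 2x2 image
scaled_region, scaled_candidate = to_common_resolution(region, candidate)
print(scaled_region)
```

When the minimum bounding rectangles of the landmark have been determined, the same function could be applied to the cropped rectangles rather than to the full images.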
In the above-mentioned scheme of scaling the corrected landmark region and each candidate landmark image to the same resolution, one or more of operators such as DELF, SURF, SIFT, BRIEF, GIST, and VLAD may be used to extract various features in the scaled landmark region and each candidate landmark image that need to be matched, respectively, so as to determine matching feature points in the two images.
Fig. 12 is a schematic diagram illustrating an example of extracting features from a landmark region and a candidate landmark image and performing matching in the method of recognizing a landmark in a panoramic image according to an embodiment of the present disclosure. In this example, a trained DELF model is employed to determine matching feature points between the corrected landmark region and each candidate landmark image. Specifically, for the corrected landmark region to be matched and each candidate landmark image, the characteristic points on the two images can be obtained through the DELF model respectively, and meanwhile, the model can also determine the matched characteristic points between the two images. For example, as shown in the matching result in fig. 12, line segments are connected with the matching feature points on the two images as end points. It can be seen that for each candidate landmark image, the DELF model can derive its matching feature point condition with the corrected landmark region.
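To illustrate what a set of matching feature points may look like in data terms, the following minimal sketch pairs descriptors from two images by mutual nearest neighbour with a distance cutoff. It is a simplified stand-in and does not reproduce the DELF model; the descriptor values, the `max_distance` cutoff, and the function name are assumptions.

```python
import math


def mutual_matches(descs_a, descs_b, max_distance=0.5):
    """Return (i, j) pairs where descs_a[i] and descs_b[j] are each
    other's nearest neighbour and lie within max_distance."""
    if not descs_a or not descs_b:
        return []

    def nearest(query, pool):
        # Index of the descriptor in pool closest to query.
        return min(range(len(pool)), key=lambda k: math.dist(query, pool[k]))

    matches = []
    for i, desc in enumerate(descs_a):
        j = nearest(desc, descs_b)
        is_mutual = nearest(descs_b[j], descs_a) == i
        if is_mutual and math.dist(desc, descs_b[j]) <= max_distance:
            matches.append((i, j))
    return matches


# Hypothetical 2-D descriptors for a corrected region and one candidate.
region_descs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
candidate_descs = [[0.1, 0.0], [1.0, 0.9], [9.0, 9.0]]
matches = mutual_matches(region_descs, candidate_descs)
print(matches)
```

Each returned pair corresponds to one line segment of the kind drawn between the two images in fig. 12; the third descriptor is left unmatched because its nearest neighbour is too far away.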
After determining the matching feature points between the corrected landmark region and each candidate landmark image as above, the candidate landmark image with the highest matching degree with the corrected landmark region may be selected according to at least one of the number, proportion, distribution, and average of the matching feature points, and the landmark in the landmark region may be identified as the landmark in the candidate landmark image with the highest matching degree. In one example, the candidate landmark image with the largest number of matching feature points may be selected from among the candidate landmark images whose number of matching feature points is higher than a threshold, and the landmark in the corrected landmark region may be identified as the landmark in that candidate landmark image. In another example, the candidate landmark image with the highest proportion may be selected from among the candidate landmark images in which the proportion of matched feature points to total feature points is higher than a threshold, and the landmark in the corrected landmark region may be identified as the landmark in that candidate landmark image. In yet another example, in order to avoid a matching result in which the feature points are too dispersed, the candidate landmark image with the largest number of matching feature points and/or the highest proportion of matching feature points may be selected from among the candidate landmark images whose matching feature points are distributed more uniformly than a threshold, and the landmark in the corrected landmark region may be identified as the landmark in that candidate landmark image.
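The count-based and proportion-based selection rules above may be sketched, purely as an illustration, as follows. The `MatchResult` record, its field names, and the default thresholds are hypothetical; the distribution-based rule is omitted because the disclosure does not specify a uniformity measure.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    image_id: str   # hypothetical id of the candidate landmark image
    landmark: str   # known landmark shown in that image
    matched: int    # number of matched feature points
    total: int      # total feature points extracted from the candidate

    @property
    def ratio(self):
        """Proportion of matched feature points to total feature points."""
        return self.matched / self.total if self.total else 0.0


def select_best(results, min_matched=10, min_ratio=0.0, by="matched"):
    """Pick the best candidate among those above both thresholds.

    by="matched" ranks by number of matched points, by="ratio" by proportion.
    Returns None when no candidate passes the thresholds.
    """
    eligible = [r for r in results
                if r.matched > min_matched and r.ratio > min_ratio]
    if not eligible:
        return None
    return max(eligible, key=lambda r: getattr(r, by))


results = [
    MatchResult("a1", "Tower A", 42, 120),
    MatchResult("b1", "Bridge B", 8, 90),
    MatchResult("a2", "Tower A", 35, 70),
]
best_by_count = select_best(results)
best_by_ratio = select_best(results, by="ratio")
print(best_by_count.image_id, best_by_ratio.image_id)
```

The two rules can disagree, as in this example, which is one reason an embodiment may combine several of them.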
In still another example, the candidate images determined from the database may include a plurality of candidate images that correspond to the same known landmark. To average out the effect of differing shooting conditions and the like among images of the same known landmark in the database, the candidate landmark images corresponding to the same known landmark may be grouped, the average number of matched feature points and/or the average proportion of matched feature points of the candidate landmark images in each group may be calculated, and the best matching group, and thus the known landmark to which it corresponds, may be selected based on the averaged number and/or the averaged proportion of matched feature points.
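The grouping and averaging described above may be sketched as follows, as one possible reading only. The input is assumed to be a list of hypothetical (landmark name, matched feature point count) pairs, one per candidate landmark image.

```python
from collections import defaultdict


def best_landmark_by_group(results):
    """Group candidates by landmark and compare average match counts.

    results is a list of (landmark_name, matched_count) pairs.
    Returns (best_landmark, averages), or (None, {}) for empty input.
    """
    groups = defaultdict(list)
    for landmark, matched in results:
        groups[landmark].append(matched)
    if not groups:
        return None, {}
    averages = {name: sum(counts) / len(counts)
                for name, counts in groups.items()}
    best = max(averages, key=averages.get)
    return best, averages


results = [("Tower A", 42), ("Tower A", 30), ("Bridge B", 40)]
best, averages = best_landmark_by_group(results)
print(best, averages)
```

Here the single best image belongs to "Tower A", but its group average (36.0) is lower than that of "Bridge B" (40.0), showing how averaging can change the outcome relative to picking the single highest count.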
It is to be appreciated that any combination of one or more of the above examples may be employed in embodiments of the present disclosure to determine a best matching known landmark from the respective candidate landmark images based on a one-to-one matching result of the corrected landmark region and each candidate landmark image, thereby improving accuracy of landmark identification.
The method of recognizing a landmark in a panoramic image according to an embodiment of the present disclosure has been described above with reference to the accompanying drawings. In the method, the projection transformation is carried out on the panoramic image and the semantic segmentation is carried out to determine the road surface area and the landmark area, so that the influence of the road surface area on landmark identification can be reduced; and the distortion of the landmark region is eliminated, so that the identification error caused by panoramic distortion can be reduced; by determining the best matching known landmark through various feature matching rules in the landmark region with distortion eliminated, the efficiency and accuracy of landmark identification in the panoramic image can be further improved.
In addition, the method of recognizing a landmark in a panoramic image according to an embodiment of the present disclosure may be applied to various scenes such as panoramic roaming, robot panoramic vision, immersive fitness venues, and the like as described above, and after the landmark in the panoramic image is recognized according to the method, the name of the recognized landmark may be marked in the panoramic image, thereby enhancing interest and interactivity with the user.
Landmark identification device
According to another aspect of the present disclosure, an apparatus 1300 for recognizing a landmark in a panoramic image is described in detail below in conjunction with fig. 13.
Fig. 13 illustrates a hardware block diagram of an apparatus for recognizing landmarks in a panoramic image according to an embodiment of the present disclosure. As shown in fig. 13, the apparatus 1300 includes a processor U1301 and a memory U1302.
Processor U1301 may be any device with processing capability that can perform the functions of embodiments of the present disclosure, such as a general purpose processor, a Digital Signal Processor (DSP), an ASIC, a Field Programmable Gate Array (FPGA) or other Programmable Logic Device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein.
Memory U1302 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may also include other removable/non-removable, volatile/nonvolatile computer system memory, such as a hard drive, floppy disk, CD-ROM, DVD-ROM, or other optical storage media.
In this embodiment, computer program instructions are stored in memory U1302, and processor U1301 may execute the instructions stored in memory U1302. When executed by the processor, the computer program instructions cause the processor to perform the method of identifying landmarks in panoramic images of the disclosed embodiments. The method for recognizing landmarks in panoramic images is substantially the same as described above with respect to fig. 1-12, and thus, in order to avoid repetition, will not be described again.
According to yet another aspect of the present disclosure, an apparatus 1400 for recognizing landmarks in a panoramic image is provided, which is described in detail below in conjunction with fig. 14.
Fig. 14 illustrates a block diagram of a structure of an apparatus for recognizing landmarks in a panoramic image according to an embodiment of the present disclosure. As shown in fig. 14, the apparatus 1400 includes a projective transformation unit U1401, an image segmentation unit U1402, a distortion correction unit U1403, and a landmark identifying unit U1404. The various components may perform the various steps/functions of the method of recognizing landmarks in panoramic images described above in connection with fig. 1-12, respectively, and therefore, to avoid repetition, only a brief description of the apparatus is provided below, while a detailed description of the same details is omitted.
The projective transformation unit U1401 projectively transforms the panoramic image to generate a projection image. The panoramic image transformed by the projective transformation unit U1401 may be a panoramic image captured by a user, or one or more frames of a panoramic video, in which landmarks are to be identified. The projective transformation unit U1401 can perform projective transformation on the panoramic image by using various projection methods, for example, equirectangular projection, cube map projection, fisheye projection, cylindrical projection, asteroid projection, and the like. For convenience of description, the projective transformation is described below taking asteroid projection as an example. In the embodiment of the present disclosure, the projective transformation unit U1401 may determine the projection point and the projection angle of view of the projective transformation according to the proportion of sky pixels in the projection image, so that the road surface region is smaller and the region in which the landmark is located is larger, thereby extracting the features related to the landmark more accurately and sufficiently. For example, the projection point and projection angle of view of the projective transformation may be selected such that the proportion of sky pixels in the entire projection image is greater than, for example, 50%.
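As an illustrative sketch of the sky-proportion criterion, the following code measures the fraction of sky pixels in a label mask and selects the first set of projection parameters whose projection image exceeds the threshold. How the sky mask is obtained is not addressed here; the mask representation, the `fov_deg` parameter name, and the function names are assumptions.

```python
def sky_ratio(label_mask, sky_label=1):
    """Fraction of pixels in a 2D label mask that carry the sky label."""
    total = sum(len(row) for row in label_mask)
    if total == 0:
        return 0.0
    sky = sum(row.count(sky_label) for row in label_mask)
    return sky / total


def choose_projection(candidates, min_sky_ratio=0.5):
    """Return the first (params, ratio) whose sky ratio exceeds the threshold.

    candidates is a list of (params, label_mask) pairs, where label_mask is
    a hypothetical per-pixel labelling of the projection image produced with
    those parameters. Returns None when no candidate qualifies.
    """
    for params, mask in candidates:
        ratio = sky_ratio(mask)
        if ratio > min_sky_ratio:
            return params, ratio
    return None


candidates = [
    ({"fov_deg": 120}, [[1, 0, 0, 0], [0, 0, 0, 1]]),  # 2 of 8 pixels are sky
    ({"fov_deg": 150}, [[1, 1, 1, 0], [1, 1, 0, 0]]),  # 5 of 8 pixels are sky
]
chosen = choose_projection(candidates)
print(chosen)
```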
An image segmentation unit U1402 semantically segments the projection image to determine a landmark region and a road surface region. In order to accurately identify the landmarks and reduce the influence of the road surface object on the detection result as much as possible, the image segmentation unit U1402 may perform the visual processing on the projection image by adopting semantic segmentation, and further understand the meaning in the real scene based on the high-level semantic characteristics of each visual element in the projection image, thereby performing accurate and efficient segmentation. In this embodiment, taking the asteroid projection manner as an example, the image segmentation unit U1402 may detect arcs and rays in the projection image, and filter the detected arcs and rays based on semantic characteristics of the arcs and rays, for example, filter one or more of the following arcs and rays: circular arcs which do not take the projection center as the center of a circle, rays which do not emit from the projection center, circular arcs with the radius smaller than a threshold value and isolated circular arcs. Then, the image segmentation unit U1402 may determine a landmark region and a road surface region in the projection image based on the filtered circular arc and the ray, for example, determine a closed region constituted by the filtered circular arc as the road surface region, and determine a sector region constituted by the filtered ray and the circular arc outside the closed region as the landmark region.
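The arc and ray filtering rules may be sketched as follows, under stated assumptions. The `Arc` and `Ray` records, the pixel tolerance `tol` used to decide whether a point coincides with the projection center, and the `neighbors` count used to decide whether an arc is isolated are hypothetical; the disclosure does not define how isolation is measured. The sketch covers only the filtering, not the subsequent determination of the closed region and sector regions.

```python
import math
from dataclasses import dataclass


@dataclass
class Arc:
    center: tuple    # (x, y) center of the circle the arc lies on
    radius: float
    neighbors: int   # hypothetical count of adjoining arcs/rays; 0 = isolated


@dataclass
class Ray:
    origin: tuple    # (x, y) point the ray emanates from


def filter_primitives(arcs, rays, projection_center, min_radius, tol=2.0):
    """Drop arcs and rays that violate the semantic rules.

    Kept arcs are centered on the projection center (within tol pixels),
    have radius >= min_radius, and are not isolated. Kept rays emanate
    from the projection center (within tol pixels).
    """
    def at_center(point):
        return math.dist(point, projection_center) <= tol

    kept_arcs = [a for a in arcs
                 if at_center(a.center)
                 and a.radius >= min_radius
                 and a.neighbors > 0]
    kept_rays = [r for r in rays if at_center(r.origin)]
    return kept_arcs, kept_rays


arcs = [
    Arc((100, 100), 40, 2),  # kept
    Arc((100, 101), 5, 1),   # radius below threshold
    Arc((60, 100), 40, 2),   # not centered on the projection center
    Arc((100, 100), 70, 0),  # isolated
]
rays = [Ray((100, 100)), Ray((30, 40))]
kept_arcs, kept_rays = filter_primitives(arcs, rays, (100, 100), min_radius=10)
print(kept_arcs, kept_rays)
```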
A distortion correction unit U1403 corrects distortion in the landmark region to generate a corrected landmark region. As described above, the distortion effect in the panoramic image causes a large error in landmark identification, and in the embodiment of the present disclosure, distortion removal correction processing is performed on the landmark region obtained after semantic segmentation, so as to avoid the influence on the accuracy of landmark identification due to panoramic distortion. In this embodiment, continuing to use the asteroid projection manner as an example, the distortion correction unit U1403 may project the pixel points on each arc in the landmark region onto the corresponding straight line. Then, the distortion correction unit U1403 may perform data compression on the pixel points projected onto the respective straight lines to generate the corrected landmark regions.
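One possible reading of the correction step is a polar unwrapping of the sector, sketched below. Each arc of a given radius is sampled into one straight output row, and sampling every arc with the same number of columns compresses the longer outer arcs to the same width. The nearest-pixel sampling, the outer-arc-first row order, the fill value 0 for out-of-range samples, and the function name are assumptions and are not taken from the disclosure.

```python
import math


def unwrap_sector(image, center, r_min, r_max, theta_start, theta_end, out_w):
    """Map each arc of a sector onto one straight row of out_w pixels.

    image is a 2D list indexed as image[y][x]; center is (cx, cy); angles
    are in radians. Rows are emitted from the outermost arc inward.
    """
    cx, cy = center
    height, width = len(image), len(image[0])
    rows = []
    for r in range(r_max, r_min - 1, -1):
        row = []
        for k in range(out_w):
            # Evenly spaced angles along the arc, endpoints included.
            t = k / max(out_w - 1, 1)
            theta = theta_start + (theta_end - theta_start) * t
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            inside = 0 <= x < width and 0 <= y < height
            row.append(image[y][x] if inside else 0)
        rows.append(row)
    return rows


# Test image whose pixel value encodes its position as 10 * y + x.
image = [[10 * y + x for x in range(11)] for y in range(11)]
rows = unwrap_sector(image, (5, 5), 2, 4, 0.0, math.pi / 2, out_w=3)
print(rows)
```

Because the test image encodes pixel positions, the printed rows show directly which source pixel each output pixel was taken from.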
The landmark identifying unit U1404 identifies landmarks in the corrected landmark regions. In the present embodiment, the landmark identifying unit U1404 may perform landmark detection in the landmark region from which the distortion effect is removed, and determine the landmark that most matches the detected landmark from among known landmarks in a pre-established database. As an example, the landmark identifying unit U1404 may perform rough landmark detection according to the similarity of a plurality of known landmark images and the corrected landmark region to determine candidate landmark images from the plurality of known landmark images. Then, the landmark identifying unit U1404 may perform feature matching on the corrected landmark region with each candidate landmark image, and identify landmarks in the landmark region based on the feature matching result. The landmark identifying unit U1404 may select a candidate landmark image that matches the corrected landmark region with the highest degree according to at least one of the number, proportion, distribution, and average characteristics of the matching feature points, and identify a landmark in the landmark region as a landmark in the candidate landmark image with the highest degree of matching.
The apparatus for recognizing a landmark in a panoramic image according to the embodiments of the present disclosure has been described above with reference to the accompanying drawings. The apparatus determines a road surface region and a landmark region by performing projective transformation on the panoramic image and performing semantic segmentation, so that the influence of the road surface region on landmark recognition can be reduced; the distortion of the landmark region is eliminated, so that the identification error caused by panoramic distortion can be reduced; and by determining the best matching known landmark through various feature matching rules in the landmark region with distortion eliminated, the efficiency and accuracy of landmark identification in the panoramic image can be further improved.
In addition, the apparatus for recognizing a landmark in a panoramic image according to an embodiment of the present disclosure may be applied to various scenes such as panoramic roaming, robot panoramic vision, immersive fitness venues, and the like as described above, and after a landmark in a panoramic image is recognized by the apparatus, the name of the recognized landmark may be marked in the panoramic image, thereby enhancing interest and interactivity with the user.
Computer readable storage medium
The method/apparatus for recognizing landmarks in panoramic images according to the present disclosure may also be implemented by providing a computer program product containing program code implementing the method or apparatus, or by any storage medium storing such a computer program product.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, or configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the phrase "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
Also, as used herein, "or" as used in a list of items prefaced by "at least one of" indicates a disjunctive list, such that, for example, a list of "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (i.e., A and B and C). Furthermore, the word "exemplary" does not mean that the described example is preferred or better than other examples.
It is also noted that in the apparatus and methods of the present disclosure, the components or steps may be broken down and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
It will be understood by those of ordinary skill in the art that all or any portion of the methods and apparatus of the present disclosure may be implemented in any computing device (including processors, storage media, etc.) or network of computing devices, in hardware, firmware, software, or any combination thereof. The hardware may be implemented with a general purpose processor, a Digital Signal Processor (DSP), an ASIC, a Field Programmable Gate Array (FPGA) or other Programmable Logic Device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any commercially available processor, controller, microcontroller or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. The software may reside in any form of computer readable tangible storage medium. By way of example, and not limitation, such computer-readable tangible storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk, as used herein, includes Compact Disk (CD), laser disk, optical disk, Digital Versatile Disk (DVD), floppy disk, and Blu-ray disk.
Various changes, substitutions and alterations to the techniques described herein may be made without departing from the techniques of the teachings as defined by the appended claims. Moreover, the scope of the claims of the present disclosure is not limited to the particular aspects of the process, machine, manufacture, composition of matter, means, methods and acts described above. Processes, machines, manufacture, compositions of matter, means, methods, or acts, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or acts.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.