Archive merging method and apparatus, and electronic device

Document No.: 8559. Published: 2021-09-17.

1. A method for merging files, the method comprising the steps of:

detecting activity rule identity between every two files to be merged in a plurality of files to be merged, wherein the files to be merged comprise activity rule features used for calculating the activity rule identity and file features used for calculating the file similarity;

if the fact that the activity rule identification degree between two files to be merged in the plurality of files to be merged reaches a preset identification degree threshold value is detected, taking the two files to be merged as target files, and calculating the file similarity between the two target files;

and merging the target files with the file similarity reaching a preset similarity threshold.

2. The archive merging method as claimed in claim 1, wherein the step of detecting the activity rule identity between each two archives to be merged in the plurality of archives to be merged comprises:

extracting the activity rule characteristics of each file to be merged;

calculating the activity rule characteristic identity degree between every two activity rule characteristics;

and determining the activity rule identity degree between every two files to be merged in the plurality of files to be merged based on the activity rule feature identity degree between every two activity rule features.

3. The archive merging method according to claim 2, wherein the activity rule features include a plurality of activity rule feature values corresponding to a plurality of times, and the step of calculating the activity rule feature identity degree between every two activity rule features includes:

calculating the activity rule feature value identity degree between the two activity rule feature values at each same time in every two activity rule features;

and determining the activity rule feature identity degree between every two activity rule features according to at least one activity rule feature value identity degree in every two activity rule features.

4. The archive merging method according to claim 1, wherein the step of calculating the archive similarity between two target archives comprises:

extracting the file characteristics of the two target files;

calculating the similarity of the file characteristics between the two file characteristics;

and determining the file similarity of the two target files based on the file feature similarity between the two file features.

5. The file merging method according to claim 4, wherein the file characteristics comprise a plurality of file sub-characteristics; the step of calculating the archival feature similarity between two archival features includes:

calculating the file sub-feature similarity between each file sub-feature in the two file features according to a Cartesian product strategy;

and determining the file feature similarity between the two file features based on the calculated file sub-feature similarity in the two file features.

6. An archive merging apparatus, comprising:

a detection module, used for detecting the activity rule identity between every two files to be merged in a plurality of files to be merged, wherein the files to be merged comprise activity rule features used for calculating the activity rule identity and file features used for calculating the file similarity;

the calculation module is used for taking the two files to be merged as target files and calculating the file similarity between the two target files if the fact that the activity rule identification degree between the two files to be merged in the plurality of files to be merged reaches a preset identification degree threshold value is detected;

and the merging module is used for merging the target files with the file similarity reaching the preset similarity threshold.

7. The archive merging device of claim 6, wherein the detection module comprises:

the first extraction unit is used for extracting the activity rule characteristics of each file to be merged;

the first calculation unit is used for calculating the activity rule characteristic identity degree between every two activity rule characteristics;

and the first determining unit is used for determining the activity rule identification degree between every two files to be merged in the plurality of files to be merged based on the activity rule feature identification degree between every two activity rule features.

8. The archive merging apparatus according to claim 7, wherein the activity rule feature includes a plurality of activity rule feature values corresponding to a plurality of times, and the first calculation unit includes:

the first calculating subunit is used for calculating the activity rule feature value identity degree between the two activity rule feature values at each same time in every two activity rule features;

and the first determining subunit is used for determining the activity rule feature identity degree between every two activity rule features according to at least one activity rule feature value identity degree in every two activity rule features.

9. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, the processor implementing the steps in the archive merging method according to any of claims 1 to 5 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps in the archive merging method according to any one of claims 1 to 5.

Background

With the progress of society, population mobility has become more common, which makes managing people in large cities more difficult. Several departments or systems may establish a personnel archive for each person to facilitate management. One approach is to capture images of a person with a camera and create a personnel archive based on the captured images. To manage people more effectively, a separate personnel archive is generally established for each different person captured by the camera, and if a person is captured by the same camera again, the newly captured image is placed into that person's existing archive. However, because of occlusion, camera angle, lighting and similar problems in the real production environment, clustering-based archiving may create several archives for the same person. When this happens for many people, the amount of personnel archive data grows, and the number of archives becomes very large, for example in the billions, when handling large-scale personnel archives at the city level. This makes city-level archive management inconvenient, so the multiple archives of the same person need to be merged to reduce the number of personnel archives.

However, simply merging multiple archives of the same person based on the similarity between the archives is not ideal. When the archive similarity threshold is set high, few of the duplicate archives are merged; when it is set low, archives may be merged incorrectly. Existing archive merging methods therefore suffer from low merging efficiency and accuracy.

Disclosure of Invention

The embodiment of the invention provides a file merging method which can improve the efficiency and accuracy of file merging.

In a first aspect, an embodiment of the present invention provides a file merging method, where the method includes:

detecting activity rule identity between every two files to be merged in a plurality of files to be merged, wherein the files to be merged comprise activity rule features used for calculating the activity rule identity and file features used for calculating the file similarity;

if the fact that the activity rule identification degree between two files to be merged in the plurality of files to be merged reaches a preset identification degree threshold value is detected, taking the two files to be merged as target files, and calculating the file similarity between the two target files;

and merging the target files with the file similarity reaching a preset similarity threshold.

Optionally, the step of detecting the activity rule identity between every two files to be merged in the plurality of files to be merged includes:

extracting the activity rule characteristics of each file to be merged;

calculating the activity rule characteristic identity degree between every two activity rule characteristics;

and determining the activity rule identity degree between every two files to be merged in the plurality of files to be merged based on the activity rule feature identity degree between every two activity rule features.

Optionally, the activity rule features include a plurality of activity rule feature values corresponding to a plurality of times, and the step of calculating the activity rule feature identity degree between every two activity rule features includes:

calculating the activity rule feature value identity degree between the two activity rule feature values at each same time in every two activity rule features;

and determining the activity rule characteristic identity degree between every two activity rule characteristics according to at least one activity rule characteristic value identity degree in every two activity rule characteristics.

Optionally, the step of calculating the file similarity between the two target files includes:

extracting the file characteristics of the two target files;

calculating the similarity of the file characteristics between the two file characteristics;

and determining the file similarity of the two target files based on the file feature similarity between the two file features.

Optionally, the archive feature comprises a plurality of archive sub-features; the step of calculating the archival feature similarity between two archival features includes:

calculating the file sub-feature similarity between each file sub-feature in the two file features according to a Cartesian product strategy;

and determining the file feature similarity between the two file features based on the calculated file sub-feature similarity in the two file features.
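As a rough illustration of the Cartesian product strategy, the sketch below compares every sub-feature of one file with every sub-feature of the other and then aggregates the scores. The cosine measure, the use of the maximum as the aggregate, and the function names are assumptions made for illustration; the text does not specify them.

```python
from itertools import product
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def file_feature_similarity(sub_features_a, sub_features_b):
    """Score every sub-feature pair (Cartesian product), then aggregate.

    Taking the maximum is an assumption; a mean or a vote could be used.
    """
    scores = [cosine_similarity(u, v)
              for u, v in product(sub_features_a, sub_features_b)]
    return max(scores) if scores else 0.0
```

With m sub-features in one file and n in the other, this evaluates m × n pairs, so the cost grows with the product of the two counts.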

In a second aspect, an embodiment of the present invention further provides a file merging device, where the device includes:

a detection module, used for detecting the activity rule identity between every two files to be merged in a plurality of files to be merged, wherein the files to be merged comprise activity rule features used for calculating the activity rule identity and file features used for calculating the file similarity;

the calculation module is used for taking the two files to be merged as target files and calculating the file similarity between the two target files if the fact that the activity rule identification degree between the two files to be merged in the plurality of files to be merged reaches a preset identification degree threshold value is detected;

and the merging module is used for merging the target files with the file similarity reaching the preset similarity threshold.

Optionally, the detection module includes:

the first extraction unit is used for extracting the activity rule characteristics of each file to be merged;

the first calculation unit is used for calculating the activity rule characteristic identity degree between every two activity rule characteristics;

and the first determining unit is used for determining the activity rule identification degree between every two files to be merged in the plurality of files to be merged based on the activity rule feature identification degree between every two activity rule features.

Optionally, the activity rule feature includes a plurality of activity rule feature values corresponding to a plurality of times, and the first calculation unit includes:

the first calculating subunit is used for calculating the activity rule feature value identity degree between the two activity rule feature values at each same time in every two activity rule features;

and the first determining subunit is used for determining the activity rule feature identity degree between every two activity rule features according to at least one activity rule feature value identity degree in every two activity rule features.

In a third aspect, an embodiment of the present invention further provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps in the archive merging method provided by the foregoing embodiment.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the steps in the archive merging method provided in the foregoing embodiment.

In the embodiment of the present invention, the activity rule identity between every two files to be merged in a plurality of files to be merged is detected, where the files to be merged include activity rule features used for calculating the activity rule identity and file features used for calculating the file similarity; if the activity rule identity between two files to be merged reaches a preset identity threshold, the two files to be merged are taken as target files and the file similarity between the two target files is calculated; and the target files whose file similarity reaches a preset similarity threshold are merged. Merging is thus decided by two conditions, the activity rule identity and the file similarity of the two files to be merged, which addresses the unsatisfactory results of merging based on file similarity alone. Because the activity rule of the same person is basically consistent from day to day, two files that belong to the same person show basically the same activity rule. Calculating the file similarity only for files to be merged whose activity rule identity meets the preset identity threshold can therefore improve both the efficiency and the accuracy of file merging.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart illustrating a file merging method according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method provided in step 101 in the embodiment of FIG. 1;

FIG. 3 is a flow chart of one method provided in step 202 of the embodiment of FIG. 2;

FIG. 4 is a flow chart of one method provided by step 102 in the embodiment of FIG. 1;

FIG. 5 is a flow chart of one method provided in step 402 of the embodiment of FIG. 4;

FIG. 6 is a schematic structural diagram of a file merging device according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of one configuration provided by the detection module in the embodiment of FIG. 6;

FIG. 8 is a schematic diagram of a structure provided by the first computing unit in the embodiment of FIG. 7;

FIG. 9 is a schematic diagram of a structure provided by the computing module in the embodiment of FIG. 6;

FIG. 10 is a schematic diagram of a structure provided by the second computing unit in the embodiment of FIG. 9;

FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, fig. 1 is a flowchart of a file merging method according to an embodiment of the present invention, the method including the steps of:

Step 101, detecting activity rule identity between every two files to be merged in a plurality of files to be merged.

The archive to be merged comprises activity rule features used for calculating activity rule identity and archive features used for calculating archive similarity.

The files to be merged are a plurality of files that need to be merged. Each file corresponds to one person. If only the files of one person are to be merged, the files to be merged are multiple files of that person; if the files of several persons are to be merged, the files to be merged are multiple files of multiple persons. A file to be merged may be a picture file, that is, a file built from pictures; it may also be a text file, an attribute file, or the like. This embodiment is described mainly in terms of picture files. A picture here may be an image, a photograph, or the like.

The activity rule is the regular pattern of a person's activity track in each file to be merged. Sorting the pictures in each file to be merged by timestamp reflects this activity track pattern. Each file to be merged has its own activity rule, and when two files to be merged belong to the same person, the activity rules they reflect are basically consistent, because a person's activity rule is basically the same from day to day.

The activity rule identity is the degree of identity between the activity rules of the files to be merged. The activity rule features are features that can represent an activity rule, such as the activity place features of an activity track rule. The activity rule identity between activity rules can be calculated from the activity rule features.

The file similarity is the similarity between two files to be merged. The file features are features that can represent a file to be merged. For a picture file, for example, they may be the cover features of the file's cover; when the cover is a face image, the file features are face features, and the file similarity is the similarity between the two face features. For an attribute file, they may be the attribute features of each file attribute. A file attribute may be a person's fingerprint, DNA, or the like, or height, weight, body type, and so on. The attribute features may accordingly include the person's fingerprint features, DNA features, and so on.

Specifically, when a plurality of files to be merged are obtained, the identity between the activity rules of every two files to be merged needs to be calculated.

It should be noted that the plurality of archives to be merged may be stored in an archive database, in an archive cloud space, or in an archive system, and the number of archives to be merged grows over time. The archives to be merged may come from archives built by different systems or different departments. For example, a camera in one community management system captures a face image of a person, and the system establishes an archive for the person based on the captured face image and stores it in the corresponding archive database; a camera in another community management system also captures a face image of the same person, establishes a corresponding archive, and stores it in its own archive database. In this case, the person has an archive in the archive databases of both community management systems, that is, two archives that both belong to the same person.

Alternatively, after a camera captures a face image of a person and a personal picture archive is established, the same camera may capture the person's face image again. Because of occlusion, angle, lighting and similar problems in the actual production environment, the person may not be recognized as the same person, and a new picture archive is established and stored in the corresponding archive database, so that the person has two archives.

Thus, when the personnel archives of a plurality of community management systems need to be managed together, the archives in the archive databases of those systems are taken out for management, and the same person may have a plurality of archives. Because each community management system establishes archives for many people, managing the archives of a plurality of community management systems involves many people who each correspond to a plurality of archives. The plurality of archives to be merged in this example may therefore be the archives provided by the plurality of community management systems, and may be part or all of the archive data those systems provide. The archives to be merged may also come from different archiving terminals of the same system or the same department.

Step 102, if it is detected that the activity rule identification between two files to be merged in the plurality of files to be merged reaches a preset identification threshold, taking the two files to be merged as target files, and calculating the file similarity between the two target files.

The preset identity threshold is an identity threshold set in advance. It is the standard for judging whether the activity rules in two files to be merged are consistent, and can be set as required. The greater the activity rule identity of two files to be merged, the more consistent their activity rules. When the activity rule identity of two files to be merged meets the preset identity threshold, the activity rules of the two files are considered consistent, and the two files to be merged are taken as target files.

Taking two files to be merged as target files means that the file similarity between these two files to be merged needs to be calculated.

Specifically, after the activity rule identity between every two files to be merged in the plurality of files to be merged is detected, each detected activity rule identity is compared with the preset identity threshold. There may be one or more activity rule identity degrees, and each one is checked against the preset identity threshold. If an activity rule identity is greater than the preset identity threshold, it meets the threshold, that is, the activity rules of the two corresponding files to be merged are basically consistent; the two files to be merged are kept as target files, and the file similarity between them is calculated.

If an activity rule identity is less than or equal to the preset identity threshold, it does not meet the threshold, that is, the activity rules of the two corresponding files to be merged are inconsistent; the two files to be merged need not be kept as target files, and the file similarity between them need not be calculated.

It should be noted that when a plurality of the calculated activity rule identity degrees meet the preset identity threshold, a plurality of groups of target files are obtained correspondingly. The file similarity is calculated for each group of target files, so there are also a plurality of file similarities.
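The gating described in step 102 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: `activity_identity` is a hypothetical callable that returns the activity rule identity of two files, and the function name and file identifiers are assumptions.

```python
from itertools import combinations


def select_target_pairs(file_ids, activity_identity, identity_threshold):
    """Return the pairs of files whose activity rule identity passes the gate.

    activity_identity is a hypothetical callable taking two file ids and
    returning a score; how the score is computed is outside this sketch.
    """
    targets = []
    for a, b in combinations(file_ids, 2):
        # Strictly greater than the threshold, as described in the text.
        if activity_identity(a, b) > identity_threshold:
            targets.append((a, b))
    return targets
```

Only the returned pairs go on to the file similarity calculation, which is where the saving in computation comes from.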

Step 103, merging the two target files whose file similarity reaches the preset similarity threshold.

The preset similarity threshold is a similarity threshold set in advance. It is the condition for determining whether two files to be merged are files of the same person, and may be set as required. For example, with the similarity threshold set to 0.90, if the file similarity between two files to be merged is 0.95, which is greater than 0.90, the two files to be merged are considered to be files of the same person. Merging means combining two files to be merged that belong to the same person into one file.

Specifically, after the file similarity between two target files is calculated, it is compared with the preset similarity threshold. If the file similarity is greater than the preset similarity threshold, it meets the threshold, that is, the two files to be merged that serve as the two target files are files of the same person; they are merged into one file, so that the files belonging to the same person are combined into one.

If the file similarity is less than or equal to the preset similarity threshold, it does not meet the threshold, that is, the two files to be merged that serve as the two target files are not files of the same person, and they are not merged.

Furthermore, when more than two files among the plurality of files to be merged meet both the activity rule identity and file similarity conditions, these files can be merged together into one file.
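One way to sketch step 103, including the case where more than two files end up in the same merged file, is a union-find grouping over the target pairs. Treating merging as transitive (A with A1 and A1 with A2 implies one group) is an assumption here, as are the function names and the `file_similarity` callable.

```python
def merge_files(file_ids, target_pairs, file_similarity, similarity_threshold):
    """Group files linked by target pairs that pass the similarity threshold."""
    parent = {f: f for f in file_ids}

    def find(x):
        # Follow parent links to the group representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in target_pairs:
        # Strictly greater than the threshold, as described in the text.
        if file_similarity(a, b) > similarity_threshold:
            parent[find(a)] = find(b)

    groups = {}
    for f in file_ids:
        groups.setdefault(find(f), []).append(f)
    return list(groups.values())
```

Each returned list holds the files that would be combined into one file; a file that matched nothing stays in a group of its own.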

In the embodiment of the invention, the activity rule identity between every two files to be merged in a plurality of files to be merged is detected, where the files to be merged include activity rule features used for calculating the activity rule identity and file features used for calculating the file similarity; if the activity rule identity between two files to be merged reaches a preset identity threshold, the two files to be merged are taken as target files and the file similarity between the two target files is calculated; and the target files whose file similarity reaches a preset similarity threshold are merged. Merging is thus decided by two conditions, the activity rule identity and the file similarity of the two files to be merged, which addresses the unsatisfactory results of merging based on file similarity alone. Because the activity rule of the same person is basically consistent from day to day, two files that belong to the same person show basically the same activity rule. Calculating the file similarity only for files to be merged whose activity rule identity meets the preset identity threshold can therefore improve both the efficiency and the accuracy of file merging.

Referring to fig. 2, fig. 2 is a flow chart of a method provided in step 101 of the embodiment of fig. 1, wherein step 101 includes the steps of:

Step 201, extracting the activity rule features of each file to be merged.

Step 202, calculating the activity rule feature identity between every two activity rule features.

Step 203, obtaining the activity rule identity between every two files to be merged in the plurality of files to be merged based on the activity rule feature identity between every two activity rule features.

The activity rule feature identity is the degree of identity between the activity rule features of the activity rules in two files to be merged. It is the condition used for judging whether the two activity rules are consistent.

Specifically, the activity rule features of each of the plurality of files to be merged are extracted. The identity between the corresponding activity rule features of every two files to be merged is then calculated. The activity rule feature identity of two activity rule features is taken as the identity between the two activity rules, which gives the activity rule identity between the two files to be merged.
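Steps 201 to 203 can be sketched as a small pipeline. The two callables `extract_feature` and `feature_identity` are hypothetical placeholders for the extraction and comparison the embodiment performs; only the control flow is illustrated.

```python
from itertools import combinations


def detect_activity_identity(files, extract_feature, feature_identity):
    """Map each pair of file ids to its activity rule identity.

    files is a dict of {file id: file content}; both callables are assumed.
    """
    # Step 201: extract the activity rule feature of every file once.
    features = {fid: extract_feature(content) for fid, content in files.items()}
    # Steps 202 and 203: the feature identity of a pair is used directly
    # as the activity rule identity of the two files.
    return {(a, b): feature_identity(features[a], features[b])
            for a, b in combinations(features, 2)}
```

Extracting each feature once and reusing it across pairs avoids repeating the extraction for every comparison.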

Further, referring to fig. 3, fig. 3 is a flowchart of a method provided in step 202 in the embodiment of fig. 2, where the activity rule feature includes a plurality of activity rule feature values corresponding to a plurality of times, and step 202 includes:

Step 301, calculating the activity rule feature value identity between the two activity rule feature values at each same time in every two activity rule features.

Step 302, obtaining the activity rule feature identity between every two activity rule features according to at least one activity rule feature value identity in every two activity rule features.

The time may be set by year, month, day, or hour. The same time may be the same year, the same month, the same day, the same hour, and so on. An activity rule feature value is the value of the activity rule feature at a certain time. There may be one or more activity rule feature values. Correspondingly, there may be one or more activity rule feature value identity degrees; when two activity rule features have feature values at several same times, there are several activity rule feature value identity degrees.
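Steps 301 and 302 can be sketched with each activity rule feature held as a mapping from a date to that date's hourly values. The per-date measure (fraction of matching hourly slots) and the aggregate (mean over shared dates) are assumptions chosen for illustration; the text leaves both open at this point.

```python
def value_identity(values_a, values_b):
    """Fraction of hourly slots whose values match (assumed measure)."""
    if len(values_a) != len(values_b) or not values_a:
        return 0.0
    matches = sum(1 for x, y in zip(values_a, values_b) if x == y)
    return matches / len(values_a)


def feature_identity(feature_a, feature_b):
    """Compare two features given as {date: hourly values}."""
    shared_dates = sorted(set(feature_a) & set(feature_b))
    if not shared_dates:
        return 0.0
    # Step 301: one feature value identity per date present in both features.
    scores = [value_identity(feature_a[d], feature_b[d]) for d in shared_dates]
    # Step 302: aggregate the per-date scores; the mean is an assumption.
    return sum(scores) / len(scores)
```

Dates that appear in only one of the two features are ignored, which matches the requirement that values be compared at the same time.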

Specifically, take the activity rule feature values of files A, A1, B, and C on each date as an example, where each activity rule feature value is a 24-dimensional integer array whose elements correspond to the 24 hours of the day. The activity rule feature values of files A, A1, B, and C are as follows:

A (No. 1): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

A (No. 2): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

A (No. 3): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

A (No. 4): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

A (No. 5): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ].

A1 (No. 1): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

A1 (No. 3): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

A1 (No. 4): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

A1 (No. 5): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

A1 (No. 7): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ].

B (No. 1): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

B (No. 4): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

B (No. 6): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

B (No. 7): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

B (No. 11): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ].

C (No. 3): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

C (No. 5): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

C (No. 8): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

C (No. 12): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ];

C (No. 23): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ].

Here, A (No. 1), A (No. 2), and so on denote the activity rule features of file A on the 1st, the 2nd, and so on. The number of dates may of course be determined according to actual circumstances. A (No. 1): [1,2,3,4,5,6,7,8,9,10,11,12,13, … ] denotes the activity rule feature values for the 24 hours of the 1st in the activity rule feature of file A. The values in [1,2,3,4,5,6,7,8,9,10,11,12,13, … ] are only exemplary; the specific values depend on the actual activity rule feature values. When calculating the identity degree between activity rule feature values, the values in the array may be converted into binary numbers and the calculation performed on the binary numbers. For example, assume that the activity rule feature values of A (No. 1) are: [157 (10011101), 78 (01001110), 52 (00110100), 51 (00110011), … ].

It should be noted that, in this embodiment, the activity rule in each file to be merged is based on a camera mapping. Specifically, camera numbers and a camera code corresponding to each camera number may be set. The camera numbers are set as camera8, camera7, camera6, camera5, camera4, camera3, camera2, camera1, and the corresponding camera codes are set as 1, 1, 1, 1, 1, 1, 1, 1.

This setting can be represented by the integer 255, because the binary value of 255 is 11111111, where each bit represents one camera. In a specific implementation, a camera code is 1 when the person appears in that camera and 0 when the person does not. For example, if file A has an activity rule feature value of 157 (10011101), the person in file A appears in camera1, camera3, camera4, camera5, and camera8.
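The camera mapping can be illustrated with a minimal Python sketch. The function name `cameras_present` and the choice of the least significant bit for camera1 are assumptions made for illustration; the embodiment only states that each of the eight bits represents one camera, ordered camera8 to camera1.

```python
def cameras_present(feature_value: int, num_cameras: int = 8) -> list[int]:
    """Return the camera numbers whose bit is set in feature_value.

    Assumption: bit 0 (least significant) is camera1 and bit 7 is camera8,
    which matches the camera8 ... camera1 ordering read left to right.
    """
    return [i + 1 for i in range(num_cameras) if (feature_value >> i) & 1]


print(format(157, "08b"))    # 10011101
print(cameras_present(157))  # [1, 3, 4, 5, 8]
print(cameras_present(255))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Under this reading, one integer per hour is enough to record in which cameras the person appeared during that hour.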

Further, after the activity rule feature values of files A, A1, B, and C at each time are obtained, the dates of every two files are intersected so that only feature values at the same time are compared. Taking the intersection by time of file A and file A1 as an example:

A∩A1:

A: A (No. 1), A (No. 2), A (No. 3), A (No. 4), A (No. 5)

A1: A1 (No. 1), A1 (No. 3), A1 (No. 4), A1 (No. 5), A1 (No. 7)

Intersection: A (No. 1), A (No. 3), A (No. 4), A (No. 5) and A1 (No. 1), A1 (No. 3), A1 (No. 4), A1 (No. 5).
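The intersection by time can be sketched with ordinary set operations. The dictionary `dates_by_file` and the helper `common_dates` are hypothetical names; only the dates of files A and A1 from the example are used.

```python
# Dates (day of month) on which each file has an activity rule feature value.
dates_by_file = {
    "A": {1, 2, 3, 4, 5},
    "A1": {1, 3, 4, 5, 7},
}


def common_dates(file_x: str, file_y: str) -> list[int]:
    """Return the sorted dates on which both files have feature values."""
    return sorted(dates_by_file[file_x] & dates_by_file[file_y])


print(common_dates("A", "A1"))  # [1, 3, 4, 5]
```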

Still further, assume that the activity rule feature values of A (No. 1) are: [157 (10011101), 78 (01001110), 52 (00110100), 51 (00110011), … ].

Assume that the activity rule feature values of A1 (No. 1) are: [156 (10011100), 76 (01001100), 52 (00110100), 48 (00110000), … ].

Then A (No. 1)_A1 (No. 1) is the sum, over the 24 dimensions, of (number of bit positions with identical binary digits)/(total number of binary digits):

A (No. 1)_A1 (No. 1)

= 7/8 (157 (10011101) and 156 (10011100) agree in 7 bits)

+ 7/8 (78 (01001110) and 76 (01001100) agree in 7 bits)

+ 8/8 (52 (00110100) and 52 (00110100) agree in 8 bits)

+ 6/8 (51 (00110011) and 48 (00110000) agree in 6 bits)

+ … (24 dimensions in total)

= 19.8.

Finally, since 19.8/24 = 0.825, the activity rule feature value identity degree of A (No. 1)_A1 (No. 1) obtained by calculation is 0.825.
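The bitwise comparison can be sketched as follows. The function names are hypothetical, and only the four hourly values listed in the example are used, because the remaining 20 values are elided; the result for this four-value prefix therefore differs from the 0.825 obtained over all 24 dimensions. The fourth value of A1 is taken as 48, which is the decimal value of the binary string 00110000 given in the example.

```python
def bit_agreement(x: int, y: int, width: int = 8) -> int:
    """Count the bit positions (out of `width`) where x and y have the same digit."""
    mask = (1 << width) - 1
    differing = bin((x ^ y) & mask).count("1")  # XOR marks the differing bits
    return width - differing


def feature_value_identity(values_x, values_y, width: int = 8) -> float:
    """Average per-hour bit agreement ratio between two equally long arrays."""
    if len(values_x) != len(values_y):
        raise ValueError("both arrays must cover the same hours")
    total = sum(bit_agreement(x, y, width) / width
                for x, y in zip(values_x, values_y))
    return total / len(values_x)


a_day1 = [157, 78, 52, 51]   # first four hourly values of A (No. 1)
a1_day1 = [156, 76, 52, 48]  # first four hourly values of A1 (No. 1)

print([bit_agreement(x, y) for x, y in zip(a_day1, a1_day1)])  # [7, 7, 8, 6]
print(feature_value_identity(a_day1, a1_day1))                 # 0.875
```

With full 24-element arrays the same function divides the summed ratios by 24, as in the 19.8/24 step of the example.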

In the same way, the following are obtained:

A (No. 3)_A1 (No. 3) = 0.86;

A (No. 4)_A1 (No. 4) = 0.88;

A (No. 5)_A1 (No. 5) = 0.90.

The activity rule identity degree of files A_A1 is therefore: 0.825 + 0.86 + 0.88 + 0.90 = 3.465.

Similarly, the activity rule identity degrees of files A_B, A_C, A1_B, A1_C, and B_C are obtained, giving:

A_A1 = 3.465;

A_B = 1.25;

A_C = 0.105;

A1_B = 1.9;

A1_C = 1.2;

B_C = 2.8.

The calculated activity rule feature value identity degree is determined as the activity rule feature identity degree between the two activity rule features, which in turn gives the identity degree between the two activity rules and hence the activity rule identity degree between the two files to be merged.

If the preset identity degree threshold is 2, then among files A, A1, B, and C the activity rule identity degrees of A_A1 and B_C satisfy the preset identity degree threshold, which indicates that the activity rules of file A and file A1 are substantially consistent and that the activity rules of file B and file C are substantially consistent. In contrast, A_B = 1.25, A_C = 0.105, A1_B = 1.9, and A1_C = 1.2 do not satisfy the preset identity degree threshold, which indicates that the activity rules of these pairs are inconsistent.
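The aggregation and threshold check can be sketched as below. The variable names are hypothetical, the A_A1 score is the sum 0.825 + 0.86 + 0.88 + 0.90 from the example, and the comparison uses "greater than or equal to" on the assumption that "reaches the threshold" includes equality.

```python
# Per-date identity degrees of A and A1 on their common dates (from the example).
a_a1_per_date = {1: 0.825, 3: 0.86, 4: 0.88, 5: 0.90}
a_a1_identity = sum(a_a1_per_date.values())
print(round(a_a1_identity, 3))  # 3.465

# Activity rule identity degrees of every pair of files (from the example).
pair_identity = {
    "A_A1": a_a1_identity,
    "A_B": 1.25,
    "A_C": 0.105,
    "A1_B": 1.9,
    "A1_C": 1.2,
    "B_C": 2.8,
}

IDENTITY_THRESHOLD = 2  # preset identity degree threshold used in the example

target_pairs = [pair for pair, score in pair_identity.items()
                if score >= IDENTITY_THRESHOLD]
print(target_pairs)  # ['A_A1', 'B_C']
```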

In this embodiment, the identity degree of the activity rules of the persons corresponding to two files to be merged can be calculated from the activity rule features in the two files and the corresponding activity rule feature values at different times, so as to determine whether those activity rules are consistent. If they are consistent, the two files to be merged may be files of the same person, and the subsequent steps can be executed. If they are not consistent, the two files to be merged are not taken to be files of the same person, and the subsequent steps need not be executed.

Referring to fig. 4, fig. 4 is a flow chart of one method provided in step 102 of the embodiment of fig. 1.

Step 102 comprises the steps of:

Step 401, extracting the file features of the two target files.

Step 402, calculating the file feature similarity between the two file features.

Step 403, obtaining the file similarity of the two target files based on the file feature similarity between the two file features.

The file feature similarity is the similarity between the file features of the two target files. It serves as a condition for determining whether the two target files are files of the same person.

Specifically, after the activity rule identity degree between two files to be merged is determined to reach the preset identity degree threshold, the two files are taken as target files, and the file similarity between them needs to be calculated. To this end, the file features of the two target files are extracted and the file feature similarity between the two file features is calculated. Because the file features can represent the target files, the file feature similarity is taken as the file similarity between the two target files. For example, if the file feature is a file cover feature, the file cover features of the two target files are extracted and the similarity between the two file cover features is calculated, which gives the file similarity of the two target files.

Further, referring to fig. 5, fig. 5 is a flowchart of a method provided in step 402 in the embodiment of fig. 4. The file feature comprises a plurality of file sub-features, and step 402 comprises the steps of:

Step 501, calculating the file sub-feature similarity between the file sub-features of the two file features according to a Cartesian product strategy.

Step 502, obtaining the file feature similarity between two file features based on the similarity of each file sub-feature calculated from the two file features.

In mathematics, the Cartesian product (also called the direct product) of two sets X and Y, written X × Y, is the set of all possible ordered pairs whose first element is a member of X and whose second element is a member of Y. For example, if set A = {a, b} and set B = {0, 1, 2}, the Cartesian product A × B is {(a,0), (a,1), (a,2), (b,0), (b,1), (b,2)}. The Cartesian product strategy is a method of calculating the similarity between the file sub-features by pairing them through the Cartesian product.
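As a quick illustration of the definition, Python's standard `itertools.product` enumerates the ordered pairs of the example sets. The variable names are illustrative only.

```python
from itertools import product

set_a = ["a", "b"]
set_b = [0, 1, 2]

# Every ordered pair (first element from set_a, second element from set_b).
pairs = list(product(set_a, set_b))
print(pairs)
# [('a', 0), ('a', 1), ('a', 2), ('b', 0), ('b', 1), ('b', 2)]
```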

The file sub-features may be a plurality of features within each file feature. For example, when the file feature is a cover feature and the cover of a file to be merged has a plurality of cover representative pictures, each file sub-feature is the feature of one cover representative picture.

The file sub-feature similarity is the similarity between two file sub-features, and is a condition for determining whether they are the same file sub-feature. There may be one or more file sub-feature similarities. When each file to be merged has only one file sub-feature, only one file sub-feature similarity is obtained; when each file to be merged has a plurality of file sub-features, a plurality of file sub-feature similarities are obtained.

Specifically, the file sub-feature similarities between the file sub-features of the two target files are calculated by the Cartesian product strategy. The average of these file sub-feature similarities may then be calculated to obtain the file similarity between the two target files.

Illustratively, suppose file E has file sub-features e1, e2, and e3, and file F has file sub-features f1, f2, and f3. The similarity between file E and file F can then be calculated according to the Cartesian product strategy, for example:

e1_f1 = 0.91, e1_f2 = 0.94, e1_f3 = 0.93;

e2_f1 = 0.93, e2_f2 = 0.91, e2_f3 = 0.90;

e3_f1 = 0.95, e3_f2 = 0.90, e3_f3 = 0.96.

The average value is (0.91+0.94+0.93+0.93+0.91+0.90+0.95+0.90+0.96)/9 ≈ 0.926.
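The averaging over all sub-feature pairs can be sketched as follows. The function `file_feature_similarity` and its `sub_similarity` callback are hypothetical: the embodiment does not specify how two sub-features are compared, so the sketch simply looks up the nine example values.

```python
from itertools import product
from statistics import mean


def file_feature_similarity(subs_x, subs_y, sub_similarity) -> float:
    """Average the sub-feature similarity over the Cartesian product of the two lists."""
    return mean([sub_similarity(x, y) for x, y in product(subs_x, subs_y)])


# Example sub-feature similarities of files E and F, taken from the text.
example_scores = {
    ("e1", "f1"): 0.91, ("e1", "f2"): 0.94, ("e1", "f3"): 0.93,
    ("e2", "f1"): 0.93, ("e2", "f2"): 0.91, ("e2", "f3"): 0.90,
    ("e3", "f1"): 0.95, ("e3", "f2"): 0.90, ("e3", "f3"): 0.96,
}

similarity_e_f = file_feature_similarity(
    ["e1", "e2", "e3"],
    ["f1", "f2", "f3"],
    lambda x, y: example_scores[(x, y)],  # stand-in for a real comparison
)
print(round(similarity_e_f, 3))  # 0.926
```

In a real system the lookup would be replaced by whatever comparison is used for the sub-features, for example a distance between picture feature vectors.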

Specifically, suppose the file similarities of A_A1 and B_C obtained by the Cartesian product strategy are A_A1: 0.818 and B_C: 0.77. If the preset similarity threshold is set to 0.816, then A_A1 (0.818) is greater than the preset similarity threshold, so file A and file A1 are files of the same person, whereas B_C (0.77) is below the threshold, so file B and file C are not files of the same person.

In this embodiment, the file sub-feature similarity between the file sub-features corresponding to the file features in the two target files is calculated, so as to obtain the file feature similarity between the two file features, and further obtain the file similarity between the two target files. And then whether the two files to be merged are the files of the same person or not is judged by calculating the activity rule identification degree and the file similarity between the two files to be merged. If the files are files of the same person, the two files to be merged are merged into one file. Thereby improving the efficiency and accuracy of file merging.
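The two-stage decision described in this embodiment can be summarized in a short sketch. The function `pairs_to_merge` and its parameters are hypothetical names, and the data are the example values from the text (with A_A1 taken as the sum 3.465). The file similarity is passed as a function so that it is evaluated only for pairs that pass the activity rule check, which reflects the efficiency argument made above.

```python
def pairs_to_merge(pair_identity, file_similarity,
                   identity_threshold, similarity_threshold):
    """Return the pairs that pass both the activity rule check and the file similarity check."""
    # Stage 1: keep only pairs whose activity rule identity degree reaches the threshold.
    targets = [pair for pair, score in pair_identity.items()
               if score >= identity_threshold]
    # Stage 2: compute file similarity only for those target pairs.
    return [pair for pair in targets
            if file_similarity(pair) >= similarity_threshold]


pair_identity = {
    ("A", "A1"): 3.465, ("A", "B"): 1.25, ("A", "C"): 0.105,
    ("A1", "B"): 1.9, ("A1", "C"): 1.2, ("B", "C"): 2.8,
}
example_similarity = {("A", "A1"): 0.818, ("B", "C"): 0.77}

compared = []  # records which pairs actually needed a file similarity calculation


def file_similarity(pair):
    compared.append(pair)
    return example_similarity[pair]


merged = pairs_to_merge(pair_identity, file_similarity, 2, 0.816)
print(compared)  # [('A', 'A1'), ('B', 'C')]
print(merged)    # [('A', 'A1')]
```

Only two of the six pairs reach the second stage, and only A and A1 are merged.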

Referring to fig. 6, fig. 6 is a schematic structural diagram of a file merging device according to an embodiment of the present invention, where the file merging device 600 includes:

the detecting module 601 is configured to detect an activity rule identity between every two files to be merged in the multiple files to be merged, where the files to be merged include an activity rule feature for calculating the activity rule identity and a file feature for calculating a file similarity.

The calculating module 602 is configured to, if it is detected that the activity rule identity between two files to be merged in the plurality of files to be merged reaches a preset identity threshold, use the two files to be merged as target files, and calculate a file similarity between the two target files.

The merging module 603 is configured to merge the target files whose file similarity reaches a preset similarity threshold.

Referring to fig. 7, fig. 7 is a schematic structural diagram provided by the detection module in the embodiment of fig. 6, and the detection module 601 includes:

a first extracting unit 6011, configured to extract an activity rule feature of each file to be merged.

The first calculating unit 6012 is configured to calculate an activity rule feature identity between every two activity rule features.

The first determining unit 6013 is configured to determine, based on the activity rule feature identity degree between every two activity rule features, the activity rule identity degree between every two files to be merged in the plurality of files to be merged.

Referring to fig. 8, fig. 8 is a schematic structural diagram provided by the first computing unit in the embodiment of fig. 7, where the activity rule feature includes a plurality of activity rule feature values corresponding to a plurality of times, and the first computing unit 6012 includes:

a first calculating subunit 60121, configured to calculate an activity rule feature value identification degree between two activity rule feature values at every two same times in every two activity rule features.

The first determining subunit 60122 is configured to determine, according to at least one activity rule feature value identity degree in every two activity rule features, the activity rule feature identity degree between every two activity rule features.

Referring to fig. 9, fig. 9 is a schematic structural diagram provided by the computing module in the embodiment of fig. 6, and the computing module 602 includes:

the second extraction unit 6021 is configured to extract archive features of the two target archives.

A second calculating unit 6022, configured to calculate the archival feature similarity between the two archival features.

A second determining unit 6023, configured to determine the profile similarity of the two target profiles based on the profile feature similarity between the two profile features.

Referring to FIG. 10, FIG. 10 is a schematic diagram of a structure provided by the second computing unit in the embodiment of FIG. 9, in which the file feature includes a plurality of file sub-features; the second calculation unit 6022 includes:

a second calculating subunit 60221, configured to calculate a file sub-feature similarity between each of the two file features according to a cartesian product policy.

A second determining subunit 60222, configured to determine the similarity of the archival features between the two archival features based on the similarity of the respective archival sub-features calculated in the two archival features.

The file merging device 600 provided in the embodiment of the present invention can implement each implementation manner in the foregoing method embodiments and corresponding beneficial effects, and for avoiding repetition, details are not described here.

Referring to fig. 11, fig. 11 is a schematic structural diagram of an electronic device 700 according to an embodiment of the present invention. The electronic device 700 includes a memory 702, a processor 701, and a computer program stored on the memory 702 and executable on the processor 701. When the processor 701 executes the computer program, the steps of the file merging method provided by the above embodiments are implemented, and the processor 701 executes the following steps:

Detecting the activity rule identity degree between every two files to be merged in the plurality of files to be merged, wherein the files to be merged comprise the activity rule features used for calculating the activity rule identity degree and the file features used for calculating the file similarity.

And if the fact that the activity rule identification degree between two files to be merged in the plurality of files to be merged reaches a preset identification degree threshold value is detected, taking the two files to be merged as target files, and calculating the file similarity between the two target files.

And merging the target files with the file similarity reaching a preset similarity threshold.

Optionally, the step of detecting the activity rule identity between every two files to be merged in the multiple files to be merged executed by the processor 701 includes:

Extracting the activity rule features of each file to be merged.

And calculating the activity rule characteristic identity degree between every two activity rule characteristics.

And determining the activity rule identity degree between every two files to be merged in the plurality of files to be merged based on the activity rule feature identity degree between every two activity rule features.

Optionally, the activity rule features include a plurality of activity rule feature values corresponding to a plurality of times, and the step of calculating the activity rule feature identity degree between every two activity rule features, which is executed by the processor 701, includes:

Calculating the activity rule feature value identity degree between the two activity rule feature values at each same time in every two activity rule features.

Determining the activity rule feature identity degree between every two activity rule features according to at least one activity rule feature value identity degree in every two activity rule features.

Optionally, the step of calculating the profile similarity between two target profiles executed by the processor 701 includes:

Extracting the file features of the two target files.

And calculating the similarity of the file characteristics between the two file characteristics.

Based on the profile feature similarity between the two profile features, the profile similarity of the two target profiles is determined.

Optionally, the file feature comprises a plurality of file sub-features; the steps performed by processor 701 to calculate the archival feature similarity between two archival features include:

Calculating the file sub-feature similarity between the file sub-features of the two file features according to a Cartesian product strategy.

And determining the file feature similarity between the two file features based on the calculated file sub-feature similarity in the two file features.

It should be noted that the electronic device 700 may be an intelligent terminal, a mobile phone, a tablet computer, and the like of a department related to the archive.

The electronic device 700 provided in the embodiment of the present invention can implement each implementation manner in the above method embodiments and corresponding beneficial effects, and is not described herein again to avoid repetition.

The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by the processor 701, the computer program implements each process of the file merging method provided in the embodiment of the present invention, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above disclosure describes only preferred embodiments of the present invention and is not to be construed as limiting the scope of the claims of the present invention.
