Multi-dimensional case merging method, device, and storage medium
1. A multi-dimensional case merging method, characterized by comprising the following steps:
acquiring an origin case and a candidate case set, wherein the origin case comprises a first dimension set comprising a plurality of first dimensions, the candidate case set comprises a plurality of candidate cases, and each candidate case comprises a plurality of second dimensions;
selecting one candidate case from the candidate case set;
calculating, for each first dimension, a first similarity between the first dimension and the second dimension of the same type in the selected candidate case;
determining a second similarity between the origin case and the selected candidate case according to the first similarities and preset dimension weights;
determining a data set according to the second similarity, wherein the data set comprises first key-class data or second key-class data, and the second key-class data is of higher importance than the first key-class data;
merging the second key-class data with the origin case to obtain a merged case;
and taking the merged case as a new origin case, selecting a new candidate case from the candidate cases in the candidate case set that are not in the data set, and returning to the step of calculating the first similarity between each first dimension and the second dimension of the same type in the selected candidate case, iterating until a preset iteration condition is reached, to obtain a merged result.
2. The multi-dimensional case merging method according to claim 1, wherein each first dimension comprises at least one first element, each second dimension comprises at least one second element, and calculating the first similarity between each first dimension and the second dimension of the same type in the selected candidate case comprises:
obtaining a first data depth score of each first element in each first dimension and a second data depth score of each second element in each second dimension;
and determining the first similarity between each first dimension and the second dimension of the same type in the selected candidate case according to the first data depth scores of the first elements and the second data depth scores of the corresponding second elements.
3. The multi-dimensional case merging method according to claim 1, wherein the first dimensions include a what-victim dimension, a when dimension, a where dimension, a what-suspect dimension, a what-tool dimension, a what-means dimension, a what-cause dimension, a what-behavior dimension, a what-result dimension, and a what-state dimension.
4. The multi-dimensional case merging method according to claim 3, wherein determining the second similarity between the origin case and the selected candidate case according to the first similarities and the preset dimension weights comprises:
weighting the first similarities by the preset dimension weights to obtain the second similarity between the origin case and the selected candidate case, wherein, in the preset dimension weights, the what-behavior, when, what-tool, what-means, where, what-cause, what-result, what-state, what-suspect, and what-victim dimensions are weighted in descending order.
5. The multi-dimensional case merging method according to claim 1, wherein determining the data set according to the second similarity comprises:
when the second similarity is greater than or equal to a first threshold and less than a second threshold, determining the selected candidate case as first key-class data;
or,
when the second similarity is greater than or equal to the second threshold, determining the selected candidate case as second key-class data.
6. The multi-dimensional case merging method according to claim 1, wherein reaching the preset iteration condition comprises:
the preset iteration condition is reached when the total amount of key-class data in all data sets is greater than or equal to a third threshold;
or,
the preset iteration condition is reached when the number of iterations is greater than a fourth threshold;
or,
the preset iteration condition is reached when the total amount of key-class data in all current data sets equals the total amount of key-class data in the data sets obtained in the previous iteration.
7. The multi-dimensional case merging method according to any one of claims 1-6, further comprising:
displaying the origin case and each data set on a map, and displaying the association relationships between the second dimensions of each data set and the first dimensions of the origin case;
or,
displaying the origin case and each data set on a map and, in response to a trajectory determination operation, displaying the spatiotemporal movement trajectory of the merged result based on the when and where dimensions, wherein the first dimensions and the data sets each include a when dimension and a where dimension;
or,
displaying the origin case and each data set on a map and, in response to an element input operation, locating on the map the origin case or the data set corresponding to the input element.
8. A multi-dimensional case merging device, characterized by comprising:
an acquisition module, configured to acquire an origin case and a candidate case set, wherein the origin case comprises a first dimension set comprising a plurality of first dimensions, the candidate case set comprises a plurality of candidate cases, and each candidate case comprises a plurality of second dimensions;
a selection module, configured to select one candidate case from the candidate case set;
a first calculation module, configured to calculate, for each first dimension, a first similarity between the first dimension and the second dimension of the same type in the selected candidate case;
a first determining module, configured to determine a second similarity between the origin case and the selected candidate case according to the first similarities and preset dimension weights;
a second determining module, configured to determine a data set according to the second similarity, wherein the data set comprises at least one of first key-class data and second key-class data, and the second key-class data is of higher importance than the first key-class data;
a merging module, configured to merge the second key-class data with the origin case to obtain a merged case;
and an iteration module, configured to take the merged case as a new origin case, select a new candidate case from the candidate cases in the candidate case set that are not in the data set, and return to the step of calculating the first similarity between each first dimension and the second dimension of the same type in the selected candidate case, iterating until a preset iteration condition is reached, to obtain a merged result.
9. A multi-dimensional case merging device, characterized by comprising a processor and a memory;
the memory stores a program;
the processor executes the program to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the storage medium stores a program which, when executed by a processor, implements the method according to any one of claims 1-7.
Background
With economic and technological development, criminal activity has become increasingly rampant in recent years and is trending toward larger scale, group organization, and globalization. New types of theft, fraud, and other cases emerge endlessly. However, offenders tend to commit repeated offenses, which provides an investigative basis for linking a series of cases (serial and parallel case linkage). After years of development, police service systems now maintain case databases in which a large volume of alarm reports and case data has accumulated. In practice, however, deep analysis of whether different cases are associated, that is, whether several cases are independent or form a series, usually depends on experienced personnel querying individual keywords and then analyzing the retrieved cases to decide whether to merge them based on the keywords and other possible points of association.
Disclosure of Invention
In view of the above technical problems, an object of the present invention is to provide a multi-dimensional case merging method, device, and storage medium that improve analysis efficiency and ensure the validity of associated-case analysis.
The technical solution adopted by the invention is as follows:
A multi-dimensional case merging method comprises the following steps:
acquiring an origin case and a candidate case set, wherein the origin case comprises a first dimension set comprising a plurality of first dimensions, the candidate case set comprises a plurality of candidate cases, and each candidate case comprises a plurality of second dimensions;
selecting one candidate case from the candidate case set;
calculating, for each first dimension, a first similarity between the first dimension and the second dimension of the same type in the selected candidate case;
determining a second similarity between the origin case and the selected candidate case according to the first similarities and preset dimension weights;
determining a data set according to the second similarity, wherein the data set comprises first key-class data or second key-class data, and the second key-class data is of higher importance than the first key-class data;
merging the second key-class data with the origin case to obtain a merged case;
and taking the merged case as a new origin case, selecting a new candidate case from the candidate cases in the candidate case set that are not in the data set, and returning to the step of calculating the first similarity between each first dimension and the second dimension of the same type in the selected candidate case, iterating until a preset iteration condition is reached, to obtain a merged result.
Further, each first dimension includes at least one first element, each second dimension includes at least one second element, and calculating the first similarity between each first dimension and the second dimension of the same type in the selected candidate case includes:
obtaining a first data depth score of each first element in each first dimension and a second data depth score of each second element in each second dimension;
and determining the first similarity between each first dimension and the second dimension of the same type in the selected candidate case according to the first data depth scores of the first elements and the second data depth scores of the corresponding second elements.
Further, the first dimensions include a what-victim dimension, a when dimension, a where dimension, a what-suspect dimension, a what-tool dimension, a what-means dimension, a what-cause dimension, a what-behavior dimension, a what-result dimension, and a what-state dimension.
Further, determining the second similarity between the origin case and the selected candidate case according to the first similarities and the preset dimension weights includes:
weighting the first similarities by the preset dimension weights to obtain the second similarity between the origin case and the selected candidate case, wherein, in the preset dimension weights, the what-behavior, when, what-tool, what-means, where, what-cause, what-result, what-state, what-suspect, and what-victim dimensions are weighted in descending order.
Further, determining the data set according to the second similarity includes:
when the second similarity is greater than or equal to a first threshold and less than a second threshold, determining the selected candidate case as first key-class data;
or,
when the second similarity is greater than or equal to the second threshold, determining the selected candidate case as second key-class data.
Further, reaching the preset iteration condition includes:
the preset iteration condition is reached when the total amount of key-class data in all data sets is greater than or equal to a third threshold;
or,
the preset iteration condition is reached when the number of iterations is greater than a fourth threshold;
or,
the preset iteration condition is reached when the total amount of key-class data in all current data sets equals the total amount of key-class data in the data sets obtained in the previous iteration.
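The iteration described above can be sketched as follows. This is a minimal Python illustration under several assumptions that the text does not state: a case is modeled as a mapping from dimension name to a set of element values, "merging" is a per-dimension union of element sets, one iteration is one pass over all candidates not yet in a data set, and the second-similarity function is passed in as a parameter. The names `iterative_merge`, `merge_cases`, and `dimension_overlap` are hypothetical, and `dimension_overlap` is only a toy similarity for the usage example.

```python
from typing import Callable, Dict, List, Set, Tuple

# Assumed representation: dimension name -> set of element values.
Case = Dict[str, Set[str]]


def merge_cases(origin: Case, other: Case) -> Case:
    """Hypothetical merge rule: per-dimension union of element sets."""
    merged = {dim: set(vals) for dim, vals in origin.items()}
    for dim, vals in other.items():
        merged.setdefault(dim, set()).update(vals)
    return merged


def iterative_merge(
    origin: Case,
    candidates: List[Case],
    similarity: Callable[[Case, Case], float],
    t1: float,  # first threshold
    t2: float,  # second threshold (t2 > t1)
    t3: int,    # third threshold: cap on total key-class data
    t4: int,    # fourth threshold: cap on the iteration count
) -> Tuple[Case, List[int], List[int]]:
    first_key: List[int] = []   # indices classified as first key-class data
    second_key: List[int] = []  # indices classified as second key-class data
    iteration = 0
    while True:
        iteration += 1
        total_before = len(first_key) + len(second_key)
        in_data_set = set(first_key) | set(second_key)
        # One pass over the candidates that are not yet in a data set.
        for idx, cand in enumerate(candidates):
            if idx in in_data_set:
                continue
            s = similarity(origin, cand)  # second similarity
            if s >= t2:
                second_key.append(idx)
                origin = merge_cases(origin, cand)  # merged case becomes the new origin
            elif s >= t1:
                first_key.append(idx)
        total = len(first_key) + len(second_key)
        # Any one of the three preset iteration conditions ends the loop.
        if total >= t3 or iteration > t4 or total == total_before:
            break
    return origin, first_key, second_key


def dimension_overlap(a: Case, b: Case) -> float:
    """Toy similarity: share of a's non-empty dimensions that overlap b."""
    dims = [d for d in a if a[d]]
    if not dims:
        return 0.0
    hits = sum(1 for d in dims if a[d] & b.get(d, set()))
    return hits / len(dims)


# Usage example with hypothetical element values.
origin_case: Case = {"what_behavior": {"burglary"}, "where": {"gated community"}}
candidate_cases: List[Case] = [
    {"what_behavior": {"burglary"}, "where": {"gated community"}, "what_tool": {"QQ:123"}},
    {"what_behavior": {"fraud"}, "where": {"office"}},
]
merged, first, second = iterative_merge(
    origin_case, candidate_cases, dimension_overlap, t1=0.5, t2=0.8, t3=10, t4=5
)
print(second, first)  # [0] []
```

Treating an iteration as a full pass makes the third stopping condition meaningful: the loop ends as soon as a pass adds no new key-class data.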
Further, the method further comprises:
displaying the origin case and each data set on a map, and displaying the association relationships between the second dimensions of each data set and the first dimensions of the origin case;
or,
displaying the origin case and each data set on a map and, in response to a trajectory determination operation, displaying the spatiotemporal movement trajectory of the merged result based on the when and where dimensions, wherein the first dimensions and the data sets each include a when dimension and a where dimension;
or,
displaying the origin case and each data set on a map and, in response to an element input operation, locating on the map the origin case or the data set corresponding to the input element.
The invention also provides a multi-dimensional case merging device, comprising:
an acquisition module, configured to acquire an origin case and a candidate case set, wherein the origin case comprises a first dimension set comprising a plurality of first dimensions, the candidate case set comprises a plurality of candidate cases, and each candidate case comprises a plurality of second dimensions;
a selection module, configured to select one candidate case from the candidate case set;
a first calculation module, configured to calculate, for each first dimension, a first similarity between the first dimension and the second dimension of the same type in the selected candidate case;
a first determining module, configured to determine a second similarity between the origin case and the selected candidate case according to the first similarities and preset dimension weights;
a second determining module, configured to determine a data set according to the second similarity, wherein the data set comprises at least one of first key-class data and second key-class data, and the second key-class data is of higher importance than the first key-class data;
a merging module, configured to merge the second key-class data with the origin case to obtain a merged case;
and an iteration module, configured to take the merged case as a new origin case, select a new candidate case from the candidate cases in the candidate case set that are not in the data set, and return to the step of calculating the first similarity between each first dimension and the second dimension of the same type in the selected candidate case, iterating until a preset iteration condition is reached, to obtain a merged result.
The invention also provides a multi-dimensional case merging device, comprising a processor and a memory;
the memory stores a program;
the processor executes the program to implement the method.
The present invention also provides a computer-readable storage medium storing a program which, when executed by a processor, implements the method.
The beneficial effects of the invention are as follows. An origin case and a candidate case set are acquired; a first similarity is calculated between each first dimension and the second dimension of the same type in the selected candidate case; a second similarity between the origin case and the selected candidate case is determined from the first similarities and preset dimension weights; a data set is determined from the second similarity; and the second key-class data is merged with the origin case to obtain a merged case. Because the merge decision is made automatically from the similarity between the origin case and the candidate cases, the efficiency of associated-case analysis is improved. Further, the merged case is taken as a new origin case, a new candidate case is selected from the candidate cases not in the data set, and the similarity calculation is repeated until a preset iteration condition is reached. The final merged result therefore includes multiple candidate cases associated with the origin case, the analysis can run automatically even when there are many candidate cases, and the validity of associated-case analysis is ensured to a certain extent.
Drawings
FIG. 1 is a schematic flow chart of the steps of the multi-dimensional case merging method according to the present invention;
FIG. 2 is a diagram of a serial-parallel intelligent operation center according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a visual relationship graph according to an embodiment of the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
As shown in FIG. 1, this embodiment provides a multi-dimensional case merging method including steps S100-S700:
S100, acquiring an origin case and a candidate case set.
Specifically, the origin case is a case for which it needs to be determined whether other associated cases exist. The origin case comprises a first dimension set, which comprises a plurality of first dimensions. The candidate case set includes a plurality of candidate cases, each including a plurality of second dimensions. Optionally, the first dimensions include a what-victim dimension, a when dimension, a where dimension, a what-suspect dimension, a what-tool dimension, a what-means dimension, a what-cause dimension, a what-behavior dimension, a what-result dimension, and a what-state dimension, that is, the first dimension set includes ten kinds of first dimensions; likewise, the second dimensions include the same ten dimensions.
In an embodiment of the present invention, each first dimension includes at least one first element, for example:
in the what-victim dimension, the first elements include, but are not limited to, ID number, residential address, mobile phone number, gender, age group (teenager [age ≤ 17], youth [18 ≤ age ≤ 45], middle-aged [46 ≤ age ≤ 69], senior [age ≥ 70]), occupation (foreigner, person involved in pornography, person involved in drugs, person involved in gambling, Hong Kong or Macau resident, student, other occupation, etc.), place of origin, mobile phone, and email address;
in the when dimension, the first elements include, but are not limited to, incident date, incident time period (early morning [02:00-06:00], morning [06:00-12:00], noon [12:00-14:00], afternoon [14:00-17:00], evening [17:00-24:00], late night [24:00-03:00]), weekday, weekend, and holiday;
in the where dimension, the first elements include, but are not limited to, incident address, place type (garden residential community, villa, multi-storey building, high-rise building, urban village, old house, dormitory, campus dormitory, factory dormitory, etc.), jurisdiction, longitude and latitude coordinates (WGS-84 coordinate system), and information on nearby base stations and data collection equipment;
in the what-suspect dimension, the first elements include, but are not limited to, ID number, physical features, mobile phone number, name, nickname, gender, and age group (teenager [age ≤ 17], youth [18 ≤ age ≤ 45], middle-aged [46 ≤ age ≤ 69], senior [age ≥ 70]);
in the what-tool dimension, the first elements include, but are not limited to, bank account number, account holder name, QQ number, WeChat ID, Alipay account, related websites, collected mobile phone numbers, mobile phone serial number (SN), MAC address, and related vehicle information (license plate number, frame number, brand);
in the what-means dimension, the first elements include, but are not limited to, creating conditions, preparing tools, entering through a door, entering through a hole, brute force, armed with a firearm, armed with a weapon, violence, hijacking, lock-breaking theft, vehicle theft, impersonation, deception, concealed carrying, concealment, and other means;
in the what-cause dimension, the first elements include, but are not limited to, political motive, financial motive, reward motive, fear motive, and other motives;
in the what-behavior dimension, the first elements include, but are not limited to, disputes, public security cases, criminal cases, theft (burglary, pickpocketing, other theft), robbery, fraud (contact fraud, non-contact fraud), and drug-related cases;
in the what-result dimension, the first elements include, but are not limited to, forensic medical appraisal, minor injury, serious injury, death, no injury, economic loss, and unappraised;
in the what-state dimension, the first elements include, but are not limited to, in progress, case filed, under investigation, case solved, and case closed.
Similarly, each second dimension includes at least one second element; the second elements are of the same kinds as the first elements and are not described again.
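Several of the elements above, such as age group and weekday or weekend, are discretizations of raw values. The sketch below shows that bucketing, assuming integer ages and Python `datetime.date` values; `age_group` and `day_type` are hypothetical helper names. Holiday detection is omitted because it needs a holiday calendar that the text does not specify, and the time-period element is not bucketed because the listed periods overlap between 02:00 and 03:00, so an assignment rule would have to be assumed.

```python
from datetime import date


def age_group(age: int) -> str:
    """Bucket an age into the groups listed for the what-victim and what-suspect dimensions."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 17:
        return "teenager"
    if age <= 45:
        return "youth"
    if age <= 69:
        return "middle-aged"
    return "senior"


def day_type(d: date) -> str:
    """Weekday/weekend element of the when dimension (Monday=0 ... Sunday=6)."""
    return "weekend" if d.weekday() >= 5 else "weekday"


print(age_group(30), day_type(date(2024, 1, 6)))  # youth weekend
```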
It should be noted that, before the origin case and the candidate case set are acquired, case data may be collected and preprocessed to obtain them. Optionally, preprocessing may include data cleansing, transformation (normalization), and loading. Data cleansing may remove the corresponding elements from the candidate set according to an element blacklist.
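A minimal sketch of the blacklist-based cleansing and normalization step, assuming each case is a mapping from dimension name to a list of element strings. The blacklist contents and the normalization rule (collapsing whitespace and lower-casing) are assumptions for illustration only, since the text does not specify them; `clean_case` and `ELEMENT_BLACKLIST` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical blacklist of elements with no serial-parallel meaning.
ELEMENT_BLACKLIST: Set[str] = {"other", "etc.", "unknown", "uncertain"}


def normalize(value: str) -> str:
    """Assumed normalization: trim, collapse internal whitespace, lower-case."""
    return " ".join(value.split()).lower()


def clean_case(
    case: Dict[str, Iterable[str]], blacklist: Set[str] = ELEMENT_BLACKLIST
) -> Dict[str, List[str]]:
    """Normalize each element, then drop blacklisted, empty, and duplicate ones."""
    cleaned: Dict[str, List[str]] = {}
    for dim, values in case.items():
        kept: List[str] = []
        for v in values:
            n = normalize(v)
            if n and n not in blacklist and n not in kept:
                kept.append(n)
        cleaned[dim] = kept
    return cleaned


raw = {"what_means": ["Lock-breaking theft", " other ", "lock-breaking  theft"], "where": ["Unknown"]}
print(clean_case(raw))  # {'what_means': ['lock-breaking theft'], 'where': []}
```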
S200, selecting a candidate case from the candidate case set.
Optionally, the candidate cases in the candidate case set are numbered or arranged in sequence, and one candidate case is selected according to that numbering or order. For example, given an origin case set Y = {Y1, Y2, Y3, ..., Yn}, where Yn is the n-th origin case, suppose origin case Y1 is selected; given a candidate case set M = {M1, M2, M3, ..., Mm}, where Mm is the m-th candidate case, candidate case M1 is selected from the candidate case set.
S300, calculating the first similarity between each first dimension and the second dimension of the same type in the selected candidate case.
Specifically, the second dimension that is the same as a first dimension is the second dimension of the same type. For example, the first and second dimensions each include ten dimensions; when the first dimension is the what-victim dimension, the first similarity between it and the what-victim dimension among the second dimensions is calculated, that is, the first similarity between the what-victim dimension of origin case Y1 and the what-victim dimension of candidate case M1. Ten such calculations therefore yield the first similarity for every pair of same-type dimensions, giving ten first similarities in total.
Optionally, step S300 includes steps S310-S320:
S310, obtaining a first data depth score of each first element in each first dimension and a second data depth score of each second element in each second dimension.
Specifically, the first and second data depth scores are determined by element depth, because the elements of a dimension may stand in upper-level and lower-level relationships. Taking the where dimension as an example, if the incident location is a residential community, and more specifically a gated community, then the gated community is deeper data relative to the community, so its data depth score may be higher than that of the community. Based on criminals' activity patterns and analysis of their social circles, the same offender often repeatedly targets the same type of community; therefore, lower-level data nodes are given a higher depth weight during data processing. The data depth scores of the other dimensions are set on similar principles and may be determined according to actual needs; the embodiment of the present invention does not specifically limit them.
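The depth-score principle can be illustrated with a small where-dimension hierarchy. In this sketch the taxonomy (`PARENT`), the linear scoring rule in `depth_score`, and its `base` and `step` parameters are all hypothetical; the text only requires that a lower-level element, such as a gated community, score higher than its parent.

```python
from typing import Dict, Optional, Set

# Hypothetical where-dimension taxonomy: element -> parent (None for top level).
PARENT: Dict[str, Optional[str]] = {
    "residential community": None,
    "gated community": "residential community",
    "open community": "residential community",
}


def depth(element: str, parent: Dict[str, Optional[str]] = PARENT) -> int:
    """Depth of an element in the hierarchy; top-level elements have depth 1."""
    d = 0
    node: Optional[str] = element
    seen: Set[str] = set()
    while node is not None:
        if node in seen:
            raise ValueError(f"cycle detected at {node!r}")
        if node not in parent:
            raise KeyError(node)
        seen.add(node)
        d += 1
        node = parent[node]
    return d


def depth_score(element: str, base: float = 1.0, step: float = 0.5,
                parent: Dict[str, Optional[str]] = PARENT) -> float:
    """Assumed linear rule: each level below the top adds `step` to the score."""
    return base + step * (depth(element, parent) - 1)


print(depth_score("residential community"), depth_score("gated community"))  # 1.0 1.5
```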
S320, determining the first similarity between each first dimension and the second dimension of the same type in the selected candidate case according to the first data depth scores of the first elements and the second data depth scores of the corresponding second elements.
Specifically, the formula is:
where ssim(x, y) is the first similarity, and a value approaching 1 indicates that the elements of the two dimensions are more similar; x denotes a first dimension and y denotes the second dimension of the same type; x_i denotes the i-th first element in x and y_i denotes the i-th second element in y; n is the number of first (or second) elements; L_x denotes the first data depth score of a first element and L_y denotes the second data depth score of a second element. Optionally, x_i and y_i may be assigned preset numerical values, such as the rational numbers 1, 2, and so on, so that the result can be obtained by substitution. The formula gives the first similarity between one first dimension and the second dimension of the same type; repeating the calculation for each dimension yields ten first similarities, corresponding to the ten dimensions of origin case Y1 and candidate case M1. It should be noted that, in the embodiment of the present invention, the first and second elements have element breadth: each dimension may involve one or more elements. For example, the items involved in a burglary may include both a mobile phone and valuable jewelry, so multiple elements must be matched during data processing to improve the completeness and accuracy of similarity matching.
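Because the ssim(x, y) formula itself is not reproduced here, the sketch below uses a hypothetical stand-in, a depth-weighted match ratio, only to show how the symbols defined above (x_i, y_i, n, L_x, L_y) can combine into a score that approaches 1 as two dimensions become more similar. It is not the patent's formula. It assumes the elements of the two dimensions are aligned by index i and that each element carries its own depth score.

```python
from typing import Sequence


def dimension_similarity(
    x: Sequence[str], y: Sequence[str], lx: Sequence[float], ly: Sequence[float]
) -> float:
    """Hypothetical stand-in for ssim(x, y); NOT the formula of the document.

    x[i], y[i]:   i-th element of the first / second dimension (aligned by index)
    lx[i], ly[i]: data depth scores L_x and L_y of those elements
    Returns a value in [0, 1]; 1 when every element matches at equal depth.
    """
    n = len(x)
    if not (len(y) == len(lx) == len(ly) == n):
        raise ValueError("x, y, lx and ly must all have length n")
    num = 0.0
    den = 0.0
    for xi, yi, a, b in zip(x, y, lx, ly):
        den += max(a, b)        # depth mass at stake for element i
        if xi == yi:
            num += min(a, b)    # matched elements contribute their shared depth
    return num / den if den > 0 else 0.0


print(dimension_similarity(["QQ:123", "phone"], ["QQ:123", "laptop"], [2, 1], [2, 1]))
```

Weighting each match by depth means agreement on a deep, specific element (here the QQ number) moves the score more than agreement on a shallow one.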
It should be noted that the first and second elements in the embodiment of the present invention also exhibit element ambiguity and element noise. Element ambiguity refers to masking symbols that may be stored in case data: in most cases part of the information cannot be obtained accurately, and the data can only be recorded with omissions. For example, the suspect's email address in case M1 is recorded as a partially masked qq.com address beginning with "XY275" followed by "*" characters, while case M2 records a differently masked qq.com address beginning with "XY". During data processing, a string matching method splits the original element into several fragments at the "*" characters and compares them one by one according to matching rules; if they match, the suspect email elements of M1 and M2 are considered identical, and the overall data is weighted accordingly. Element noise refers to elements such as "other", "etc.", or uncertain values in the first and second elements; because they carry no serial-parallel meaning and may impair serial-parallel accuracy, they need to be removed in the preprocessing stage.
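The mask-aware comparison can be sketched as follows. Rather than the fragment-by-fragment rule described above, this illustration uses a swapped-in technique, glob-pattern intersection: two masked records are treated as compatible if at least one fully specified string is consistent with both, which also covers the case where both records are masked. It assumes each "*" stands for any string, possibly empty; the function name `masked_compatible` and the example addresses are hypothetical.

```python
from functools import lru_cache


def masked_compatible(a: str, b: str, mask: str = "*") -> bool:
    """True if some fully specified string is consistent with both masked records.

    Each mask character is treated as 'any string, possibly empty' (an assumption;
    a one-character-per-mask rule would be a stricter alternative).
    """

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        if i == len(a) and j == len(b):
            return True
        # A mask may stop absorbing characters at any point.
        if i < len(a) and a[i] == mask and match(i + 1, j):
            return True
        if j < len(b) and b[j] == mask and match(i, j + 1):
            return True
        if i < len(a) and j < len(b):
            ca, cb = a[i], b[j]
            if ca != mask and cb != mask:
                return ca == cb and match(i + 1, j + 1)  # literal must agree
            if ca == mask and cb != mask:
                return match(i, j + 1)  # a's mask absorbs one literal of b
            if cb == mask and ca != mask:
                return match(i + 1, j)  # b's mask absorbs one literal of a
        return False

    return match(0, 0)


print(masked_compatible("XY275***@qq.com", "XY***123@qq.com"))  # True
```

Memoizing on the index pair keeps the check at roughly len(a) × len(b) steps.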
In the embodiment of the invention, analysis of data from a large number of alarm and investigation cases shows that the clues obtainable for each element dimension (that is, the first and second dimensions) differ across crime types. Specifically, fraud alarms yield richer and more detailed suspect-dimension data, including channel information such as the fraudster's nickname, remittance bank account numbers, and contact details. For theft alarms, suspect-dimension data is generally hard to obtain, and the available data is concentrated in the what-tool and what-means dimensions, including the tools and methods used to commit the theft, fingerprints, and shoe prints. Therefore, to avoid missing any serial-parallel links, the calculation must cover all ten element dimensions.
S400, determining a second similarity of the original case and the selected candidate case according to the first similarity and the preset dimension weight.
Optionally, step S400 includes step S410:
S410, weighting the first similarity by the preset dimension weight to obtain the second similarity of the origin case and the selected candidate case.
In the embodiment of the invention, the preset dimension weights decrease in the following order: the what-behavior dimension, the when dimension, the what-tool dimension, the what-means dimension, the where dimension, the why dimension, the what-result dimension, the what-state dimension, the what-suspect dimension and the what-victim dimension. Denoting the weights of these dimensions by φ7, φ1, φ4, φ5, φ2, φ6, φ8, φ9, φ3 and φ10 respectively:
$$\varphi_7 > \varphi_1 > \varphi_4 > \varphi_5 > \varphi_2 > \varphi_6 > \varphi_8 > \varphi_9 > \varphi_3 > \varphi_{10}$$
$$\varphi_7 \ge 1,\qquad 1 \ge \varphi_1 > \varphi_4 > \varphi_5 > \varphi_2 > \varphi_6 > \varphi_8 > \varphi_9 > \varphi_3 > \varphi_{10} \ge 0,\qquad \varphi_1 + \varphi_4 + \varphi_5 + \varphi_2 + \varphi_6 + \varphi_8 + \varphi_9 + \varphi_3 + \varphi_{10} = 1$$
It should be noted that the preset dimension weights may be set according to actual requirements. In the embodiment of the invention, extensive practice on real data shows that linking cases whose crime type and behavior dimension differ has little serial-parallel value. For example, if a theft case and a robbery case also differ in the remaining 9 dimensions, their comprehensive similarity essentially approaches 0. The time and place dimensions reflect spatio-temporal mutual exclusivity: a suspect cannot commit multiple cases at the same moment, yet habitually commits cases in the same time period. The what-tool dimension covers items involved in the case, such as QQ numbers and bank card numbers; like a person's identity information, these are unique and help screen and solve cases, so this weight level is particularly important. The suspect dimension is weighted more heavily than the victim dimension because, when the same offender or criminal group victimizes different victims, the victim information contributes little to the result once it is clustered. The preset dimension weights are determined on this basis.
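The weight constraints above can be checked mechanically before a weight table is used. The sketch below is one possible check; the concrete weight values in `EXAMPLE_PHI` are illustrative only (the text gives no numbers), and the function name is hypothetical.

```python
import math

# Dimension indices follow the text: 1 when, 2 where, 3 what suspect, 4 what tool,
# 5 what means, 6 why, 7 what behavior, 8 what result, 9 what state, 10 what victim.
# phi_1 > phi_4 > phi_5 > phi_2 > phi_6 > phi_8 > phi_9 > phi_3 > phi_10 must hold.
DESCENDING = [1, 4, 5, 2, 6, 8, 9, 3, 10]


def weights_are_valid(phi: dict[int, float]) -> bool:
    """Check a weight table against the stated constraints on phi."""
    if phi[7] < 1:
        return False
    chain = [phi[m] for m in DESCENDING]
    if chain[0] > 1 or chain[-1] < 0:
        return False
    if any(a <= b for a, b in zip(chain, chain[1:])):  # strictly decreasing
        return False
    return math.isclose(sum(chain), 1.0)  # the nine non-behavior weights sum to 1


# Illustrative values only; not taken from the text.
EXAMPLE_PHI = {7: 1.0, 1: 0.25, 4: 0.20, 5: 0.15, 2: 0.12, 6: 0.10,
               8: 0.08, 9: 0.05, 3: 0.03, 10: 0.02}
print(weights_are_valid(EXAMPLE_PHI))  # True
```

`math.isclose` is used for the sum because the nine floating-point weights rarely add up to exactly 1.0.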
Specifically, the second similarity is calculated according to the formula:
where wsim(x, y) is the second similarity, ssim_m(x, y) is the m-th first similarity, calculated by the formula for ssim(x, y) given above, and φ_m is the weight of the m-th dimension. The second similarity represents the overall similarity between the origin case Y1 and the candidate case M1.
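Step S410 describes the second similarity as a weighting of the first similarities, but the formula itself is not reproduced in this text. The sketch below assumes the plain weighted sum wsim(x, y) = Σ φ_m · ssim_m(x, y); the weights are illustrative and `second_similarity` is a hypothetical name.

```python
# Illustrative weights (not from the text); keys are dimension indices m = 1..10.
PHI = {7: 1.0, 1: 0.25, 4: 0.20, 5: 0.15, 2: 0.12, 6: 0.10,
       8: 0.08, 9: 0.05, 3: 0.03, 10: 0.02}


def second_similarity(first_sims: dict[int, float], phi: dict[int, float]) -> float:
    """Assumed form: wsim(x, y) = sum over m of phi_m * ssim_m(x, y).

    `first_sims` maps dimension index m to ssim_m(x, y); dimensions without
    a first similarity are simply left out of the sum."""
    return sum(phi[m] * s for m, s in first_sims.items())


# Behavior matches fully, time half-matches, tool matches fully.
print(second_similarity({7: 1.0, 1: 0.5, 4: 1.0}, PHI))  # about 1.325
```

Under this reading, a full behavior match alone already contributes φ7 ≥ 1, which is consistent with the thresholds in S510 and S520 being at least 1.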
It should be noted that, as shown in fig. 2, data processing in the embodiment of the present invention is performed by a serial-parallel intelligent operation center. The center provides operation management services, such as acquiring the origin case and the candidate case set and calculating the first similarity and the second similarity, and uses distributed nodes as the computing basis of the modeled serial-parallel operation (i.e., the algorithm operation slices E), where E = {E1, E2, E3, …, En} and En denotes the n-th node. Each node receives processing task requests from external task initiators for internal management, performs data preprocessing, and broadcasts its own service state information to the whole network, including its data storage slices, number of idle tasks and hardware computing-resource load. The preprocessed data is loaded onto the operation service nodes by an ETL data exchange tool to await the initiation of operation tasks, and the data processed by the algorithm operation slices E is stored to generate a visual relationship map. In fig. 2, "sentinel" denotes the transmission of sentinel data.
S500, determining a data set according to the second similarity.
Specifically, the data set includes first key class data or second key class data, and the importance of the second key class data is higher than that of the first key class data.
Optionally, step S500 includes step S510 or step S520:
S510, when the second similarity is greater than or equal to the first threshold and less than the second threshold, determining the selected candidate case to be first key class data.
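The text states which service state each node broadcasts (storage slices, idle task count, hardware load) but not how the center chooses a node. Purely as an illustration, the sketch below assumes a hypothetical policy: among nodes that hold the required storage slice and have a free task slot, pick the least-loaded one. `NodeState`, `pick_node` and the load scale are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NodeState:
    """State a node broadcasts: storage slices, idle task slots, hardware load."""
    node_id: str
    storage_slices: set[str] = field(default_factory=set)
    idle_tasks: int = 0
    load: float = 0.0  # hypothetical scale: 0.0 = idle, 1.0 = saturated


def pick_node(states: list[NodeState], slice_id: str) -> Optional[str]:
    """Hypothetical policy: among nodes holding `slice_id` with a free task slot,
    choose the least-loaded one; return None if no node qualifies."""
    eligible = [s for s in states if slice_id in s.storage_slices and s.idle_tasks > 0]
    if not eligible:
        return None
    return min(eligible, key=lambda s: s.load).node_id


nodes = [NodeState("E1", {"A"}, 2, 0.7), NodeState("E2", {"A"}, 1, 0.3),
         NodeState("E3", {"B"}, 5, 0.1)]
print(pick_node(nodes, "A"))  # E2
```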
Specifically, let the first threshold be $U_1$ and the second threshold be $U_2$, with $1 \le U_1 < U_2$. When $U_1 \le wsim(x, y) < U_2$, the selected candidate case M1 is determined to be first key class data D1.
S520, when the second similarity is greater than or equal to the second threshold, determining the selected candidate case to be second key class data.
Specifically, when $wsim(x, y) \ge U_2$, the selected candidate case M1 is determined to be second key class data D2. A candidate case that is second key class data is considered to have serial-parallel value: it may have been committed by the same group, or it may be a linked case in which the same offender used a different form of crime, so the cases may be merged.
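The two-threshold rule of S510 and S520 maps each second similarity to one of three outcomes. A minimal sketch follows; the function name and the example thresholds U1 = 1.0 and U2 = 1.5 are illustrative assumptions.

```python
from typing import Optional


def classify(wsim: float, u1: float, u2: float) -> Optional[str]:
    """Map a second similarity to 'D2' (second key class data), 'D1' (first
    key class data) or None (not added to the data set)."""
    if not 1 <= u1 < u2:
        raise ValueError("thresholds must satisfy 1 <= U1 < U2")
    if wsim >= u2:
        return "D2"   # S520: wsim >= U2
    if wsim >= u1:
        return "D1"   # S510: U1 <= wsim < U2
    return None


# Example thresholds U1 = 1.0 and U2 = 1.5 are illustrative only.
print([classify(w, 1.0, 1.5) for w in (0.8, 1.2, 1.5)])  # [None, 'D1', 'D2']
```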
S600, merging the second key class data and the origin case to obtain a merged case.
Specifically, when the candidate case is second key class data, the candidate case is merged with the origin case. For example, when the candidate case M1 is second key class data D2, M1 and the origin case Y1 are merged to obtain a merged case B1. Merging amounts to adding each second element in a second dimension of M1 to the first dimension of the same type; a second element that is identical to an existing first element is not added again, so no duplicates arise.
Illustratively, suppose the offender in one case used a QQ number and the offender in another case used a bank card number, and the similarity calculation determines the other case to be second key class data. Then both the QQ number and the bank card number become meaningful serial-parallel elements for the next round, i.e., the bank card number and the QQ number are merged into the dimensions of a single case.
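The dimension-wise merge can be sketched with one set of elements per dimension, so that set union skips elements that already exist. This is a sketch, not the authors' implementation: the `Case` representation, the example element strings and the creation of dimensions missing from the origin are assumptions.

```python
Case = dict[str, set[str]]  # dimension name -> set of elements


def merge_cases(origin: Case, candidate: Case) -> Case:
    """Add every element of the candidate to the same dimension of the origin;
    set union skips elements that already exist. Returns a new case."""
    merged: Case = {dim: set(elems) for dim, elems in origin.items()}
    for dim, elems in candidate.items():
        # Assumption: a dimension missing from the origin is created.
        merged.setdefault(dim, set()).update(elems)
    return merged


y1 = {"what tool": {"QQ:10001"}, "where": {"District A"}}
m1 = {"what tool": {"QQ:10001", "card:6222-0001"}}
b1 = merge_cases(y1, m1)
print(sorted(b1["what tool"]))  # ['QQ:10001', 'card:6222-0001']
```

Returning a new case rather than mutating Y1 keeps the original origin case available for tracing, as the text later requires.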
S700, taking the merged case as a new origin case, selecting a new candidate case from the candidate cases in the candidate case set other than those in the data set, and returning to the step of calculating the first similarity between each first dimension and the same second dimension of the selected candidate case, iterating until a preset iteration condition is reached to obtain a merging result.
Specifically, the merged case B1 is used as the new origin case, and a new candidate case, for example M2, is selected from the candidate cases in the candidate case set M other than those in the data set (for example, from M2 to Mn). The process then returns to the step of calculating the first similarity between each first dimension and the same second dimension of the selected candidate case: B1 takes the place of Y1 and M2 takes the place of M1 in step S300, and step S300 is performed again to determine a new data set and a new merged case B2, which together form the result of one iteration. This repeats until the preset iteration condition is reached, yielding multiple new data sets and a new merged case Bn, i.e., the final merging result. Note that the merging result is the result of merging the origin case Y1 with all candidate cases in the candidate case set M that are determined to be second key class data.
The merging method disclosed in the embodiment of the invention adds a longitudinal serial-parallel deep-mining method: on top of horizontally calculating the similarity between other cases and the origin case Y1, it longitudinally adds clues from other cases that may have serial-parallel value into the origin data packet and performs the next round of serial-parallel deep mining, which improves the coverage and inclusiveness of the merging.
It can be understood that, after the iteration is finished, a new origin case Yn (for example, Y2) may be taken from the origin case set, and a new merging result obtained with the candidate case set M by the same steps; this repeats until the analysis of all origin cases Yn is complete, and is not described again here. The merging result of each origin case is stored in the data memory so that subsequent task processes can be traced, providing a practical analysis basis for case serial-parallel clues.
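The iteration of S300 to S700 can be sketched as a loop that keeps re-scoring the candidates that are not yet in a data set against the growing merged case. This is one possible reading, not the authors' code: it assumes one iteration is a pass over the remaining candidates, uses a simple stop rule (no new key class data, or `max_rounds` passes), and takes the similarity function as a parameter. `iterative_merge` and the toy `shared_elements` score are hypothetical.

```python
from typing import Callable

Case = dict[str, set[str]]  # dimension name -> set of elements


def iterative_merge(origin: Case, candidates: dict[str, Case],
                    score: Callable[[Case, Case], float],
                    u1: float, u2: float, max_rounds: int = 20):
    """Assumed reading of S300-S700: each round re-scores every candidate that is
    not yet in a data set against the current merged case."""
    merged: Case = {d: set(e) for d, e in origin.items()}
    d1: list[str] = []  # first key class data
    d2: list[str] = []  # second key class data (merged into the origin)
    for _ in range(max_rounds):
        changed = False
        for cid, cand in candidates.items():
            if cid in d1 or cid in d2:
                continue  # select only candidates outside the data set
            w = score(merged, cand)
            if w >= u2:
                d2.append(cid)
                for dim, elems in cand.items():  # merge; sets skip duplicates
                    merged.setdefault(dim, set()).update(elems)
                changed = True
            elif w >= u1:
                d1.append(cid)
                changed = True
        if not changed:
            break  # no new key class data in this round
    return merged, d1, d2


def shared_elements(a: Case, b: Case) -> float:
    """Toy score for illustration only: number of shared elements."""
    return float(sum(len(a.get(d, set()) & e) for d, e in b.items()))


origin = {"what tool": {"QQ1"}, "where": {"P1"}}
cands = {"M1": {"what tool": {"QQ1", "card9"}, "where": {"P1"}},
         "M2": {"what tool": {"card9"}, "where": {"P1"}},
         "M3": {"what tool": {"x"}}}
merged, d1, d2 = iterative_merge(origin, cands, shared_elements, 1, 2)
print(d2, d1)  # ['M1', 'M2'] []
```

M2 shares only one element with Y1, so it reaches U2 only because M1's bank card element "card9" was merged first; this is the longitudinal deep mining the text describes.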
Optionally, the reaching of the preset iteration condition in step S700 may include steps S710, S720, or S730:
S710, when the total amount of key class data in all the data sets is greater than or equal to a third threshold, the preset iteration condition is reached.
Specifically, the total amount of key class data in all the data sets refers to the total number of first key class data and second key class data in the data sets determined during the iterative process. This total is denoted $Sum_n$, where n is the number of iterations, so $Sum_n$ is the total number of key class data in all data sets after the n-th iteration. With $U_3$ as the third threshold, when $Sum_n \ge U_3$ the preset iteration condition is considered reached, and the final merging result is obtained.
S720, when the number of iterations is greater than a fourth threshold, the preset iteration condition is reached.
Specifically, when the number of iterations is greater than the fourth threshold $U_4$, the preset iteration condition is reached.
Note that $U_3$ and $U_4$ can be set according to the actual situation; for example, $U_3 = 100$ and $U_4 = 20$, i.e., the merging result should not exceed 100 cases, and the origin data packet (i.e., the new merged case Bn) should not undergo serial-parallel expansion more than 20 times. The third and fourth thresholds are set because, once these limits are exceeded, the final merging result may contain data anomalies, the serial-parallel result becomes meaningless, and the analysis data packet may grow abnormally large and overload the service. Illustratively, suppose the suspect in one case uses a QQ number and impersonates the short-message number "10086", and the suspect in another case uses a bank card number and also impersonates "10086". The origin data packet would then be deeply linked through the wrongly associated QQ number and bank card number, causing a large deviation in the number of linked cases; the element causing such a data anomaly must be added to a blacklist. In addition, during data preprocessing, the data to be loaded into the data operation slices En can be filtered against the blacklist to remove abnormal data; for newly discovered elements that cause data anomalies, the third and fourth thresholds act as circuit breakers during operation, and the elements are added to the blacklist after review.
S730, if the total amount of key class data in all the current data sets equals the total amount of key class data in the data sets obtained in the previous iteration, the preset iteration condition is reached.
Likewise, the total amount of key class data in all data sets refers to the total number of first key class data and second key class data in the data sets determined during the iterative process, denoted $Sum_n$, where n is the number of iterations and $Sum_n$ is the total after the n-th iteration. When $Sum_n = Sum_{n-1}$, where $Sum_{n-1}$ is the total amount of key class data in all the data sets after the (n-1)-th iteration, the preset iteration condition is considered reached and the final merging result is obtained.
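Filtering blacklisted elements out of a case before it is loaded for computation can be sketched as below. The function name and the element strings, such as the spoofed "SMS:10086" element, are hypothetical illustrations of the example in the text.

```python
Case = dict[str, set[str]]  # dimension name -> set of elements


def apply_blacklist(case: Case, blacklist: set[str]) -> Case:
    """Return a copy of `case` with every blacklisted element removed."""
    return {dim: {e for e in elems if e not in blacklist} for dim, elems in case.items()}


case = {"what tool": {"QQ:10001", "SMS:10086"}, "where": {"District A"}}
print(apply_blacklist(case, {"SMS:10086"}))
# {'what tool': {'QQ:10001'}, 'where': {'District A'}}
```

Removing the shared "10086" element before scoring prevents it from linking otherwise unrelated QQ numbers and bank card numbers.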
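The three alternative stop conditions can be expressed over the history of totals $Sum_1, \dots, Sum_n$. The sketch checks all three together; the text presents S710, S720 and S730 as alternatives, so a deployment might enable only one. The function name is hypothetical.

```python
def should_stop(sums: list[int], u3: int, u4: int) -> bool:
    """`sums[n - 1]` is Sum_n, the total key class data after iteration n.
    Returns True as soon as any of S710, S720 or S730 is satisfied."""
    n = len(sums)
    if n == 0:
        return False
    if sums[-1] >= u3:                    # S710: Sum_n >= U3
        return True
    if n > u4:                            # S720: iterations > U4
        return True
    if n >= 2 and sums[-1] == sums[-2]:   # S730: Sum_n == Sum_{n-1}
        return True
    return False


print(should_stop([3, 5, 5], u3=100, u4=20))  # True (S730)
```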
The multi-dimensional case merging method of the embodiment of the invention may further include step S810, S820 or S830, which provide a visual map service for the merging result. The whole case serial-parallel (merging) process is highly cross-correlated; while mining from B1 data to Bn multi-dimensional data, a flexible user interface and an exploratory mining experience are provided, and the layer-by-layer serial-parallel process is displayed as a visual relationship map. During the iterative serial-parallel process of the model, a multi-dimensional deep-mining serial-parallel method is built so that the user can drill down, filter, set early-warning values and so on in the interface. In addition, trajectory information is displayed with GIS technology, showing the case locations and the suspect's activity track as scattered points on a map. All information, including the origin cases, data sets and merged cases, supports export of the final result to Excel; specifically, for the serial-parallel result of each layer, each dimension of the original data is exported to a separate sheet tab, providing a data basis for subsequent case investigation.
Specifically, the method comprises the following steps:
S810, displaying the origin case and each data set on a map, and displaying the association between the second dimensions of each data set and the first dimensions of the origin case.
Specifically, with the origin case as the center of a circle, the merging result and the data set of each iteration are displayed layer by layer on the map. As shown in fig. 3, in the first and second layers the origin case Y1 is associated with the Mn cases through the similarity of the ten elements (i.e., the ten dimensions), so the application value and accuracy of the linked cases can be clearly assessed, and the association between the second dimensions of each data set and the first dimensions of the origin case is clearly shown.
Optionally, the associations mined by serial-parallel analysis (i.e., merging) at each layer can be presented page by page or as a process animation. Specifically, for the first serial-parallel layer, the {Y1, C1} cases with their related elements and brief alerts are shown, where Y1 is the origin case and C1 is the data set obtained in the first iteration. On the second drill-down, the second serial-parallel (i.e., merging) result is shown, including the {Y1, C1, C2} cases and their associated dimension elements, where C2 is the data set obtained in the second iteration. By analogy, the drill-down can proceed level by level until the serial-parallel analysis is finished. Results at different layers have different investigative significance: the deeper the drill-down, the faster (geometrically) the serial-parallel result grows and the lower the data accuracy, so the drill-down result of the appropriate level should be chosen according to actual operational needs.
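The cumulative layers shown during drill-down, {Y1, C1}, then {Y1, C1, C2}, and so on, can be assembled as follows. This is a minimal sketch; the function name and the case identifiers are illustrative.

```python
def drill_down_layers(origin_id: str, data_sets: list[list[str]]) -> list[list[str]]:
    """Layer k shows the origin plus the data sets C1..Ck of the first k iterations."""
    layers: list[list[str]] = []
    shown = [origin_id]
    for c in data_sets:
        shown = shown + c  # build a new list so earlier layers stay unchanged
        layers.append(shown)
    return layers


print(drill_down_layers("Y1", [["M1", "M4"], ["M7"]]))
# [['Y1', 'M1', 'M4'], ['Y1', 'M1', 'M4', 'M7']]
```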
S820, displaying the origin case and each data set on the map, and, in response to a trajectory operation, displaying the spatio-temporal movement trajectory of the merging result based on the time dimension and the place dimension.
Specifically, the first dimensions and the data sets both include a time dimension and a place dimension. The origin case and each data set are displayed on the map. In response to the user's trajectory operation, the system uses the place and time dimensions among the ten dimension elements of the cases, for example obtaining the GIS address in the WGS-84 coordinate system for the origin case and the cases in the data sets, and then displays on the page the spatio-temporal movement trajectory of the linked-case result (i.e., the merging result) ordered by the specific case time in the time dimension. This provides a new technical means for case investigation and improves analysis efficiency.
S830, displaying the origin case and each data set on the map, and, in response to an element input operation, locating on the map the origin case or the data set corresponding to the input element.
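Building the trajectory reduces to ordering the cases of the merging result by occurrence time and connecting their coordinates. The sketch below assumes each case has already been reduced to one time and one WGS-84 point; `CaseEvent`, `trajectory` and the sample coordinates are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple


class CaseEvent(NamedTuple):
    case_id: str
    occurred_at: datetime  # from the time dimension
    lon: float             # WGS-84 longitude, from the place dimension
    lat: float             # WGS-84 latitude


def trajectory(events: list[CaseEvent]) -> list[tuple[str, float, float]]:
    """Order merged-result cases by occurrence time to form a track polyline."""
    return [(e.case_id, e.lon, e.lat) for e in sorted(events, key=lambda e: e.occurred_at)]


events = [
    CaseEvent("M2", datetime(2021, 3, 5, 22, 0), 113.30, 23.12),
    CaseEvent("Y1", datetime(2021, 3, 1, 21, 30), 113.26, 23.13),
    CaseEvent("M1", datetime(2021, 3, 3, 23, 15), 113.28, 23.10),
]
print([cid for cid, _, _ in trajectory(events)])  # ['Y1', 'M1', 'M2']
```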
Specifically, a search screening text box may be provided on the display page for the user to input an element, for example, a first element or a second element in ten dimensions, and the system, in response to the user's element input operation, locates on the map to a position of an origin case or a data set corresponding to the input element, so that a case position in the relationship map, for example, a position of the origin case or the data set, may be quickly located on the map.
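Locating a case from a typed element is naturally served by an inverted index from element to case identifiers. A minimal sketch follows; the function names and sample data are assumptions, and the map positioning itself is out of scope.

```python
from collections import defaultdict

Case = dict[str, set[str]]  # dimension name -> set of elements


def build_element_index(cases: dict[str, Case]) -> dict[str, set[str]]:
    """Inverted index: element -> ids of the cases (origin or data set) containing it."""
    index = defaultdict(set)
    for case_id, dims in cases.items():
        for elems in dims.values():
            for e in elems:
                index[e].add(case_id)
    return dict(index)


def locate(index: dict[str, set[str]], element: str) -> list[str]:
    """Case ids to highlight on the map for the input element (sorted)."""
    return sorted(index.get(element, set()))


cases = {"Y1": {"what tool": {"QQ:10001"}},
         "M1": {"what tool": {"QQ:10001", "card:6222"}},
         "M2": {"where": {"District A"}}}
idx = build_element_index(cases)
print(locate(idx, "QQ:10001"))  # ['M1', 'Y1']
```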
To sum up, the embodiment of the invention collects and classifies the origin case and the candidate case set M, which contain alarm records, case information, record data and the like; extracts the ten-dimensional features of the origin case and of each candidate case in M; calculates their similarities to obtain a data matrix over the candidate case set M; and treats a result as a valid merging result once it satisfies the set threshold rules. Association mining is performed on the similarity matrix with a structured similarity comparison algorithm, and iterative deep mining finally forms a relationship-chain map of case clues.
The embodiment of the invention also provides a case merging device based on multiple dimensions, which comprises:
the acquisition module is used for acquiring an original case and a candidate case set; the original case comprises a first dimension set, the first dimension set comprises a plurality of first dimensions, the candidate case set comprises a plurality of candidate cases, and each candidate case comprises a plurality of second dimensions;
the selection module is used for selecting a candidate case from the candidate case set;
the first calculation module is used for respectively calculating the first similarity of each first dimension and each second dimension which is the same as the first dimension in the selected candidate cases;
the first determining module is used for determining a second similarity between the origin case and the selected candidate case according to the first similarity and the preset dimension weight;
the second determining module is used for determining the data set according to the second similarity; the data set comprises at least one of first key class data and second key class data, and the importance of the second key class data is higher than that of the first key class data;
the merging module is used for merging the second key class data and the origin case to obtain a merged case;
and the iteration module is used for taking the merged case as a new origin case, selecting a new candidate case from the candidate cases in the candidate case set other than those in the data set, returning to the step of calculating the first similarity between each first dimension and the same second dimension of the selected candidate case, and iterating until a preset iteration condition is reached to obtain a merging result.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
The embodiment of the invention also provides a case merging device based on multiple dimensions, and the device comprises a processor and a memory;
the memory is used for storing programs;
the processor is used for executing the program to implement the multi-dimensional case merging method of the embodiment of the invention. The device of the embodiment of the invention can perform multi-dimensional case merging. The device can be any intelligent terminal, such as a mobile phone, a tablet computer, a Personal Digital Assistant (PDA), a Point of Sales (POS) terminal, or a vehicle-mounted computer.
The contents in the above method embodiments are all applicable to the present apparatus embodiment, the functions specifically implemented by the present apparatus embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present apparatus embodiment are also the same as those achieved by the above method embodiments.
The embodiment of the present invention further provides a computer-readable storage medium, where a program is stored, and the program is executed by a processor to implement the case merging method based on multiple dimensions according to the foregoing embodiment of the present invention.
Embodiments of the present invention also provide a computer program product including instructions, which when run on a computer, cause the computer to execute the multi-dimensional based case merging method of the foregoing embodiments of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that only A is present, only B is present, or both A and B are present, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following" or a similar expression refers to any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form. Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application essentially, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing programs, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.