Replacement-Based Privacy Protection Method for Published Trajectories
1. A replacement-based privacy protection method for published trajectories, characterized by comprising the following steps:
step 1: determining the distance relationship between interchangeable trajectory points based on desensitization and validity requirements;
step 2: based on the distance relationship of step 1, connecting edges between trajectory points to establish a trajectory point relationship network;
step 3: establishing a k-core subnet of the trajectory point relationship network of step 2;
step 4: based on the k-core subnet of step 3, achieving k-anonymity of trajectory points by exchanging positions between adjacent nodes on the k-core subnet.
2. The replacement-based privacy protection method for published trajectories according to claim 1, wherein step 1 specifically comprises:
defining a sensitive area $A_s$, formed by the distance area model $RM_1$ of the spatio-temporal sensitivity threshold pair $\langle S_r, T_r\rangle$; points inside $A_s$ do not exchange positions with trajectory point $p$, and $p$ is desensitized only when it is moved out of its sensitive area;
defining an effective area $A_e$, formed by the distance area model $RM_2$ of the spatio-temporal effective threshold pair $\langle S_e, T_e\rangle$; points outside $A_e$ do not exchange positions with trajectory point $p$, and the validity of the published trajectory point is guaranteed only if $p$ is replaced to a position inside $A_e$;
the replacement target position of trajectory point $p$ must lie outside $A_s$ and inside $A_e$.
3. The replacement-based privacy protection method for published trajectories according to claim 2, wherein the edges between trajectory points in step 2 are connected as follows: when $A_e \subseteq A_s$, there is no replacement region that meets both the desensitization requirement and the usability requirement;
when $A_s \subset A_e$, an exchangeable region of $p$ is formed by removing the core $A_s$ from $A_e$, yielding the shell-shaped region SIR;
during trajectory point replacement, $p$ can only be exchanged with trajectory points located in its SIR.
4. The replacement-based privacy protection method for published trajectories according to claim 1, wherein the trajectory point relationship network of step 2 is established as follows: the trajectory points are stored in a node linked list, which serves as the storage structure of the trajectory point relationship network; each node of the linked list contains, in order, a pointer $P_{previous}$ to the previous node, a unique serial number NodeNum assigned to the node, the trajectory identifier TID of the node's owner, the spatio-temporal position $(t, x, y)$ of the trajectory point, a pointer $P_{next}$ to the next node, the degree NeighborNum of the node in the network, and a pointer $P_{neighbour}$ to the node's neighbor set;
the neighbor set of each node is stored as a binary search tree, which reduces the time complexity of the algorithm from $k^2$ to $\log(k)$;
each node of the neighbor tree accordingly contains, in order, a pointer $P_{father}$ to its parent node, the unique serial number NodeNum of the adjacent node, a pointer $P_{left}$ to its left child, a pointer $P_{right}$ to its right child, and a pointer $P_{neighbor}$ to the current neighbor's node in the main storage linked list.
5. The replacement-based privacy protection method for published trajectories according to claim 1, wherein establishing the k-core subnet in step 3 specifically comprises: by the theorem that, when adjacent nodes randomly exchange positions, the probability that an exchanged node is restored to its original position does not exceed 1/k if and only if the exchange takes place in a k-core network, k-anonymity of trajectory points in the published trajectory data is obtained; the exchange must therefore be confined to the k-core subnet of the trajectory point relationship network, which is obtained by repeatedly deleting all nodes whose degree is less than k.
6. The replacement-based privacy protection method for published trajectories according to claim 5, wherein step 4 obtains published trajectory data satisfying k-anonymity of trajectory points by exchanging the positions of adjacent nodes on the k-core subnet of the trajectory point relationship network, through the following steps:
step 4.1: establishing a degree-based binary search tree of nodes, and marking all nodes as active;
step 4.2: selecting the active node $p_{min}$ with the lowest degree in KTN, exchanging the position of $p_{min}$ with that of its adjacent node of minimum degree, and marking both nodes as exchanged;
step 4.3: after the exchange of step 4.2, deleting all neighbors of the two exchanged nodes except their common neighbors;
step 4.4: correcting the degrees of the two exchanged nodes of step 4.2 and of their original neighbor nodes, and updating the degree-based binary search tree;
step 4.5: after the update of step 4.4, if the active node of minimum degree has no neighbor node, marking it as frozen;
step 4.6: repeating steps 4.2 to 4.5 until no active node remains;
step 4.7: randomly inserting each frozen node into an adjacent trajectory, so that the k-anonymity of trajectory points in the published trajectory data is not compromised.
Background
With the widespread use of mobile communication devices and positioning technology, location-based services continue to emerge. Location service providers can collect large amounts of trajectory data from the content and geographic locations of service requests. Trajectory features can be applied in many areas, such as intelligent transportation and supply chain management, but trajectory owners may need to publish trajectories to untrusted third parties for analysis. If a third party can associate a particular trajectory with its subject, the subject's privacy may be compromised. Trajectories must therefore be processed before publication to protect privacy.
Classified by the operation performed on the trajectory data, privacy protection methods for trajectory publication fall into three types, corresponding to the three basic operations of a relational database: addition, deletion, and update.
Methods based on adding virtual data:
The main idea is to add virtual data, generated from the characteristics of the original data, to the original data. This raises the anonymity level of the mixed data without serious data loss, but increases the amount of data to be processed. Because trajectories are spatio-temporally correlated and multidimensional, the success rate of existing virtual-data methods in protecting users' trajectory privacy does not exceed 15%. Methods of this category are commonly used for privacy protection in location-based services.
Methods based on suppressing sensitive data:
These methods delete sensitive points before the trajectory data are published. Depending on the problem, sensitive points may be locations frequently visited by the user, locations with important semantic features, and so on. This class of methods is quasi-identifier aware (QID-aware) and places specific limits on the adversary's background knowledge. The difficulty lies in identifying, accurately and in advance, the sensitive points in the trajectory data and the adversary's background knowledge. Suppressing a large number of sensitive points loses a large amount of information and limits the usability of the published trajectory data.
Generalization-based methods:
A generalization method generates a published trajectory that represents the original trajectory. It preserves the important characteristics of the original trajectory data while preventing leakage of sensitive information or reducing the probability that an adversary identifies the trajectory subject. The published trajectory is not a new trajectory but represents a value range of the original one, so it can be regarded as a generalization of the original trajectory.
Trajectory k-anonymity is a typical generalization-based privacy protection method for trajectory publication and is implemented in two steps, clustering and reconstruction. In the first step, trajectories are grouped into clusters, each containing at least k trajectories. For the k trajectories in a cluster to meet the similarity requirement, the average distance over the $((nk)^2 - nk)/2$ point pairs of the k trajectories cannot exceed the cluster radius, where n is the average trajectory length. In the reconstruction step, the trajectories in each cluster are reconstructed as k indistinguishable trajectories. Reconstruction distorts the published trajectories, and the degree of distortion is positively correlated with the cluster radius. The cluster radius grows rapidly as n increases; this aggravation of data distortion with increasing data dimension is known as the curse of dimensionality. Another typical approach to reducing data distortion is to generalize trajectory points into regions to achieve desensitization. Trajectory point generalization overcomes this severe information distortion, but it does not quantify the level of privacy protection.
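For illustration, the pair count in the clustering step is simply the number of unordered pairs among the $nk$ points of a cluster; the values $n = 100$ and $k = 5$ below are chosen only as an example and do not come from the source:

```latex
\binom{nk}{2} = \frac{nk(nk-1)}{2} = \frac{(nk)^2 - nk}{2},
\qquad n = 100,\ k = 5:\quad \frac{500^2 - 500}{2} = 124\,750
```

The count grows quadratically in n, which is why the cluster radius needed to satisfy the similarity requirement rises quickly with trajectory length.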
Disclosure of Invention
The invention provides a replacement-based privacy protection method for published trajectories, which addresses the following problems: methods based on adding virtual data have a low success rate; methods based on suppressing sensitive data suppress a large number of sensitive points, losing a large amount of information and limiting the usability of the published trajectory data; methods based on trajectory generalization cause serious information distortion; and methods based on trajectory point generalization cannot quantify the level of privacy protection.
The invention is realized by the following technical scheme:
a replacement-based privacy protection method for published trajectories specifically comprises the following steps:
step 1: determining the distance relationship between interchangeable trajectory points based on desensitization and validity requirements;
step 2: based on the distance relationship of step 1, connecting edges between trajectory points to establish a trajectory point relationship network;
step 3: establishing a k-core subnet of the trajectory point relationship network of step 2;
step 4: based on the k-core subnet of step 3, achieving k-anonymity of trajectory points by exchanging positions between adjacent nodes on the k-core subnet.
Further, step 1 specifically comprises defining a sensitive area $A_s$, formed by the distance area model $RM_1$ of the spatio-temporal sensitivity threshold pair $\langle S_r, T_r\rangle$; points inside $A_s$ do not exchange positions with trajectory point $p$, and $p$ is desensitized only when it is moved out of its sensitive area;
defining an effective area $A_e$, formed by the distance area model $RM_2$ of the spatio-temporal effective threshold pair $\langle S_e, T_e\rangle$; points outside $A_e$ do not exchange positions with trajectory point $p$, and the validity of the published trajectory point is guaranteed only if $p$ is replaced to a position inside $A_e$;
the replacement target position of trajectory point $p$ must lie outside $A_s$ and inside $A_e$.
Further, the edges between trajectory points in step 2 are connected as follows: when $A_e \subseteq A_s$, there is no replacement region that meets both the desensitization requirement and the usability requirement;
when $A_s \subset A_e$, an exchangeable region of $p$ is formed by removing the core $A_s$ from $A_e$, yielding the shell-shaped region SIR;
during trajectory point replacement, $p$ can only be exchanged with trajectory points located in its SIR.
Further, the trajectory point relationship network of step 2 is established as follows: the trajectory points are stored in a node linked list, which serves as the storage structure of the trajectory point relationship network; each node of the linked list contains, in order, a pointer $P_{previous}$ to the previous node, a unique serial number NodeNum assigned to the node, the trajectory identifier TID of the node's owner, the spatio-temporal position $(t, x, y)$ of the trajectory point, a pointer $P_{next}$ to the next node, the degree NeighborNum of the node in the network, and a pointer $P_{neighbour}$ to the node's neighbor set;
the neighbor set of each node is stored as a binary search tree, which reduces the time complexity of the algorithm from $k^2$ to $\log(k)$;
each node of the neighbor tree accordingly contains, in order, a pointer $P_{father}$ to its parent node, the unique serial number NodeNum of the adjacent node, a pointer $P_{left}$ to its left child, a pointer $P_{right}$ to its right child, and a pointer $P_{neighbor}$ to the current neighbor's node in the main storage linked list.
Further, establishing the k-core subnet in step 3 specifically comprises: by the theorem that, when adjacent nodes randomly exchange positions, the probability that an exchanged node is restored to its original position does not exceed 1/k if and only if the exchange takes place in a k-core network, k-anonymity of trajectory points in the published trajectory data is obtained; the exchange must therefore be confined to the k-core subnet of the trajectory point relationship network, which is obtained by repeatedly deleting all nodes whose degree is less than k.
Further, step 4 obtains published trajectory data satisfying k-anonymity of trajectory points by exchanging the positions of adjacent nodes on the k-core subnet of the trajectory point relationship network, through the following steps:
step 4.1: establishing a degree-based binary search tree of nodes, and marking all nodes as active;
step 4.2: selecting the active node $p_{min}$ with the lowest degree in KTN, exchanging the position of $p_{min}$ with that of its adjacent node of minimum degree, and marking both nodes as exchanged;
step 4.3: after the exchange of step 4.2, deleting all neighbors of the two exchanged nodes except their common neighbors;
step 4.4: correcting the degrees of the two exchanged nodes of step 4.2 and of their original neighbor nodes, and updating the degree-based binary search tree;
step 4.5: after the update of step 4.4, if the active node of minimum degree has no neighbor node, marking it as frozen;
step 4.6: repeating steps 4.2 to 4.5 until no active node remains;
step 4.7: randomly inserting each frozen node into an adjacent trajectory, so that the k-anonymity of trajectory points in the published trajectory data is not compromised.
The beneficial effects of the invention are as follows:
the invention enables published trajectory data to meet both the publisher's privacy requirement and the analyst's usability requirement; compared with methods based on dummy trajectories and suppression and methods based on trajectory k-anonymity, it greatly reduces information distortion; the algorithm satisfies quantified requirements on both usefulness and desensitization, while quantifying the privacy level (k-anonymity of trajectory points).
Drawings
FIG. 1 is a schematic diagram of the method steps of the present invention.
FIG. 2 is a schematic diagram of the algorithm steps of the present invention.
FIG. 3 is a schematic diagram of the shell-shaped replaceable region of the present invention.
FIG. 4 is a schematic diagram of the node data structures in the trajectory point relationship network of the present invention, where (a) is the node linked list of the trajectory point relationship network and (b) is the data structure of nodes in the neighbor tree.
FIG. 5 is a schematic diagram of two trajectory point replacement schemes on a 2-core network according to the present invention, where (a) is the 2-core network awaiting replacement, (b) is the first replacement scheme, and (c) is the second replacement scheme.
FIG. 6 is a schematic diagram of a privacy attack on a published trajectory that satisfies k-anonymity of trajectory points and is generated by the algorithm of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in FIG. 1, a replacement-based privacy protection method for published trajectories specifically comprises the following steps:
step 1: determining the distance relationship between interchangeable trajectory points based on desensitization and validity requirements;
step 2: based on the distance relationship of step 1, connecting edges between trajectory points to establish a trajectory point relationship network;
step 3: establishing a k-core subnet of the trajectory point relationship network of step 2;
step 4: based on the k-core subnet of step 3, achieving k-anonymity of trajectory points by exchanging positions between adjacent nodes on the k-core subnet.
FIG. 2 depicts the steps of the method: (a) original trajectories, (b) trajectory point set, (c) trajectory point relationship network, (d) k-core subnet of the trajectory point relationship network, (e) exchange of trajectory point positions, and (f) trajectory reconstruction. In FIG. 2, space is represented by a single dimension on the abscissa, and k = 3.
Further, step 1 specifically comprises defining a sensitive area $A_s$, formed by the distance area model $RM_1$ of the spatio-temporal sensitivity threshold pair $\langle S_r, T_r\rangle$; points inside $A_s$ do not exchange positions with trajectory point $p$, and $p$ is desensitized only when it is moved out of its sensitive area;
defining an effective area $A_e$, formed by the distance area model $RM_2$ of the spatio-temporal effective threshold pair $\langle S_e, T_e\rangle$; points outside $A_e$ do not exchange positions with trajectory point $p$, and the validity of the published trajectory point is guaranteed only if $p$ is replaced to a position inside $A_e$; unless $p$ is exchanged with a trajectory point outside $A_e$, the distortion of $p$ does not affect the usability of the published trajectory;
the replacement target position of trajectory point $p$ must lie outside $A_s$ and inside $A_e$.
Further, the edges between trajectory points in step 2 are connected as follows: when $A_e \subseteq A_s$, there is no replacement region that meets both the desensitization requirement and the usability requirement;
when $A_s \subset A_e$, an exchangeable region of $p$ is formed by removing the core $A_s$ from $A_e$, yielding the shell-shaped region SIR;
during trajectory point replacement, $p$ can only be exchanged with trajectory points located in its SIR, as shown in FIG. 3.
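As a rough illustration of the SIR condition, the sketch below tests whether a candidate point q lies in the shell region of p. It assumes the decoupled distance model of inequality (3) in Example 2 for both $A_s$ and $A_e$; the names `TrackPoint`, `within` and `in_sir` are hypothetical and are not part of the method as described.

```python
import math
from typing import NamedTuple


class TrackPoint(NamedTuple):
    """A trajectory point: trajectory identifier plus spatio-temporal position."""
    tid: int
    t: float
    x: float
    y: float


def within(p: TrackPoint, q: TrackPoint, s_max: float, t_max: float) -> bool:
    """Assumed decoupled model (inequality (3)): q is in the region around p when
    both the Euclidean spatial distance and the time difference are within bounds."""
    s = math.hypot(p.x - q.x, p.y - q.y)
    return s <= s_max and abs(p.t - q.t) <= t_max


def in_sir(p: TrackPoint, q: TrackPoint,
           s_r: float, t_r: float, s_e: float, t_e: float) -> bool:
    """q is a valid replacement target for p if it lies inside the effective
    area A_e (usability) but outside the sensitive area A_s (desensitization)."""
    return within(p, q, s_e, t_e) and not within(p, q, s_r, t_r)
```

For example, with $\langle S_r, T_r\rangle = \langle 10, 2\rangle$ and $\langle S_e, T_e\rangle = \langle 60, 10\rangle$, a point 50 units away and 1 time unit later lies in the SIR of p, while a point 5 units away at the same time falls inside $A_s$ and is rejected.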
Further, the trajectory point relationship network of step 2 is established as follows: the trajectory points are stored in a node linked list, which serves as the storage structure of the trajectory point relationship network; each node of the linked list contains, in order, a pointer $P_{previous}$ to the previous node, a unique serial number NodeNum assigned to the node, the trajectory identifier TID of the node's owner, the spatio-temporal position $(t, x, y)$ of the trajectory point, a pointer $P_{next}$ to the next node, the degree NeighborNum of the node in the network, and a pointer $P_{neighbour}$ to the node's neighbor set;
the neighbor set of each node is stored as a binary search tree, which reduces the time complexity of the algorithm from $k^2$ to $\log(k)$;
each node of the neighbor tree accordingly contains, in order, a pointer $P_{father}$ to its parent node, the unique serial number NodeNum of the adjacent node, a pointer $P_{left}$ to its left child, a pointer $P_{right}$ to its right child, and a pointer $P_{neighbor}$ to the current neighbor's node in the main storage linked list, as shown in FIG. 4.
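The node layouts of FIG. 4 could be mirrored in Python roughly as follows. This is a minimal sketch, assuming integer NodeNum keys and a plain (unbalanced) binary search tree for brevity; a self-balancing tree would be needed to guarantee logarithmic lookups. The class and method names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(eq=False)
class NeighbourTreeNode:
    """Node of a neighbour tree: a binary search tree keyed by NodeNum."""
    node_num: int                                                          # NodeNum of the adjacent node
    neighbor: ListNode = field(repr=False)                                 # P_neighbor: entry in the main list
    father: Optional[NeighbourTreeNode] = field(default=None, repr=False)  # P_father
    left: Optional[NeighbourTreeNode] = field(default=None, repr=False)    # P_left
    right: Optional[NeighbourTreeNode] = field(default=None, repr=False)   # P_right


@dataclass(eq=False)
class ListNode:
    """Entry of the main node linked list of the trajectory point relationship network."""
    node_num: int                                                     # NodeNum: unique serial number
    tid: int                                                          # TID of the owning trajectory
    pos: Tuple[float, float, float]                                   # spatio-temporal position (t, x, y)
    previous: Optional[ListNode] = field(default=None, repr=False)    # P_previous
    next: Optional[ListNode] = field(default=None, repr=False)        # P_next
    neighbor_num: int = 0                                             # NeighborNum: degree
    neighbour: Optional[NeighbourTreeNode] = field(default=None, repr=False)  # P_neighbour (tree root)

    def add_neighbor(self, other: ListNode) -> bool:
        """Insert `other` into this node's neighbour tree (one direction only;
        the caller adds the reverse link). Returns False if already present."""
        parent, cur = None, self.neighbour
        while cur is not None:
            if other.node_num == cur.node_num:
                return False
            parent = cur
            cur = cur.left if other.node_num < cur.node_num else cur.right
        leaf = NeighbourTreeNode(other.node_num, other, father=parent)
        if parent is None:
            self.neighbour = leaf
        elif other.node_num < parent.node_num:
            parent.left = leaf
        else:
            parent.right = leaf
        self.neighbor_num += 1
        return True

    def has_neighbor(self, node_num: int) -> bool:
        """Search the neighbour tree by NodeNum (logarithmic only if balanced)."""
        cur = self.neighbour
        while cur is not None and cur.node_num != node_num:
            cur = cur.left if node_num < cur.node_num else cur.right
        return cur is not None
```

Keeping NeighborNum as a counter next to the tree root lets the degree be read in constant time, which is what the degree-ordered processing of step 4 relies on.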
Further, establishing the k-core subnet in step 3 specifically comprises: by the theorem that, when adjacent nodes randomly exchange positions, the probability that an exchanged node is restored to its original position does not exceed 1/k if and only if the exchange takes place in a k-core network, k-anonymity of trajectory points in the published trajectory data TD is obtained; the exchange must therefore be confined to the k-core subnet of the trajectory point relationship network TN, which is obtained by repeatedly deleting all nodes whose degree is less than k.
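The k-core subnet can be computed by the repeated deletion described above. The following is a sketch using a plain adjacency dictionary and a work queue rather than the linked-list and neighbour-tree structures of FIG. 4; `k_core` is a hypothetical helper name.

```python
from collections import deque
from typing import Dict, Hashable, Set


def k_core(adj: Dict[Hashable, Set[Hashable]], k: int) -> Dict[Hashable, Set[Hashable]]:
    """Return the k-core by repeatedly deleting nodes whose degree is below k."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}        # work on a copy of the network
    queue = deque(v for v, nbrs in g.items() if len(nbrs) < k)
    removed: Set[Hashable] = set()
    while queue:
        v = queue.popleft()
        if v in removed:
            continue                                     # a node may be queued more than once
        removed.add(v)
        for u in g[v]:
            g[u].discard(v)                              # deleting v lowers its neighbours' degrees
            if u not in removed and len(g[u]) < k:
                queue.append(u)                          # the deletion may cascade
        g[v].clear()
    return {v: nbrs for v, nbrs in g.items() if v not in removed}
```

For a triangle with one pendant node attached, the 2-core is the triangle, while the 3-core is empty because removing the first low-degree node cascades through the rest.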
The theorem is proved as follows:
Sufficiency: suppose $p_1$ and $p_2$ are two adjacent trajectory points in the k-core network $G$, where the degree of $p_1$ is $g_1$ ($g_1 \ge k$) and the degree of $p_2$ is $g_2$ ($g_2 \ge k$). Exchanging $p_1$ and $p_2$ forms new trajectory points $p_1^*$ and $p_2^*$, so that $g_1^* = g_2 \ge k$ and $g_2^* = g_1 \ge k$. Since $p_1^*$ may have been obtained by exchange with any one of its $g_1^*$ neighbors, the probability of inferring the correct original position of $p_1$ is $1/g_1^*$, with $1/g_1^* \le 1/k$; similarly, the probability of inferring the original position of $p_2$ is $1/g_2^*$, with $1/g_2^* \le 1/k$.
Necessity: let $p_1$ and $p_2$ be two adjacent points in the network $G$, where $p_1$ is the only node with degree less than k, the degree of $p_1$ is $g_1$ ($g_1 < k$), and the degree of $p_2$ is $g_2$ ($g_2 \ge k$). Exchanging the positions of $p_1$ and $p_2$ forms $p_1^*$ and $p_2^*$, so that $g_1^* = g_2 \ge k$ and $g_2^* = g_1 < k$. Since $p_2^*$ may have been derived from any one of the $g_2^*$ candidate positions of trajectory points in its neighborhood, the probability of inferring the original position of $p_2$ from $p_2^*$ is $1/g_2^*$, with $1/g_2^* > 1/k$.
Further, optimally selecting pairs of nodes to exchange on the network is a complex problem. In the 2-core network shown in FIG. 5, exchanging positions according to the first scheme leaves a single node that cannot be exchanged, whereas the second scheme yields a published trajectory satisfying 2-anonymity of trajectory points. To maximize the probability that nodes in the network are successfully replaced, a minimum-priority replacement algorithm is designed.
Step 4, which obtains published trajectory data satisfying k-anonymity of trajectory points by exchanging the positions of adjacent nodes on the k-core subnet of the trajectory point relationship network, is completed through the following steps:
step 4.1: establishing a degree-based binary search tree of nodes, and marking all nodes as active; processing low-degree nodes first improves their probability of successful replacement;
step 4.2: selecting the active node $p_{min}$ with the lowest degree in KTN, exchanging the position of $p_{min}$ with that of its adjacent node of minimum degree, and marking both nodes as exchanged;
step 4.3: after the exchange of step 4.2, deleting all neighbors of the two exchanged nodes except their common neighbors; the purpose is to increase the exchange opportunities of neighboring nodes;
step 4.4: correcting the degrees of the two exchanged nodes of step 4.2 and of their original neighbor nodes, and updating the degree-based binary search tree;
step 4.5: after the update of step 4.4, if the active node of minimum degree has no neighbor node, marking it as frozen;
step 4.6: repeating steps 4.2 to 4.5 until no active node remains;
step 4.7: randomly inserting each frozen node into an adjacent trajectory, so that the k-anonymity of trajectory points in the published trajectory data is not compromised.
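A compact sketch of steps 4.1 to 4.6 follows. It swaps in a binary heap with lazy deletion for the degree-based binary search tree, breaks degree ties by node id (an assumption), treats the current neighbour sets as the links that step 4.3 prunes, and leaves step 4.7 to the caller by returning the frozen nodes, since the insertion rule is not detailed here. The function and variable names are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple, TypeVar

P = TypeVar("P")  # position type, e.g. a (t, x, y) tuple


def min_priority_swap(adj: Dict[int, Set[int]],
                      pos: Dict[int, P]) -> Tuple[Dict[int, P], List[int]]:
    """Sketch of steps 4.1-4.6 on a k-core subnet.

    adj maps each node id to the set of its neighbours (symmetric);
    pos maps each node id to its original position.
    Returns the published position of every node and the frozen nodes.
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # current links; input untouched
    cur = dict(pos)                                # current (published) positions
    state = {v: "active" for v in g}               # step 4.1: all nodes active
    heap = [(len(nbrs), v) for v, nbrs in g.items()]
    heapq.heapify(heap)                            # heap stands in for the degree BST
    frozen: List[int] = []

    while heap:                                    # step 4.6: loop until none active
        deg, p = heapq.heappop(heap)
        if state[p] != "active" or deg != len(g[p]):
            continue                               # stale entry (lazy deletion)
        if deg == 0:                               # step 4.5: no neighbour left
            state[p] = "frozen"
            frozen.append(p)
            continue
        # Step 4.2: pair p with its minimum-degree neighbour (ties broken by id).
        q = min(g[p], key=lambda u: (len(g[u]), u))
        cur[p], cur[q] = cur[q], cur[p]
        state[p] = state[q] = "exchanged"
        # Step 4.3: keep only links to common neighbours of p and q
        # (this also removes the p-q link itself).
        common = g[p] & g[q]
        touched: Set[int] = set()
        for a in (p, q):
            for u in g[a] - common:
                g[u].discard(a)
                touched.add(u)
            g[a] = set(common)
        # Step 4.4: re-queue active nodes whose degree has changed.
        for u in touched:
            if state[u] == "active":
                heapq.heappush(heap, (len(g[u]), u))
    return cur, frozen
```

Because only links to common neighbours survive an exchange, every exchanged node in this sketch ends up at the original position of one of its neighbours in the input network, even when an active node is later paired with an already exchanged one. On a triangle, for instance, the second exchange produces a three-way rotation of positions.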
With the replacement-based privacy protection method for published trajectories, the trajectory points in the published trajectories satisfy the k-anonymity privacy requirement, so the privacy protection level is explicit and meets the privacy requirement. Meanwhile, the information distortion of the published trajectory data is limited to the usable range specified by the analyst, which guarantees its usability. FIG. 6 shows the privacy protection effect of 6-anonymity of trajectory points. Suppose an adversary knows the points $p_{i1}$ and $p_{i2}$ of trajectory $T_i$. Then at least 6 exchangeable positions can be found in the SIR of $p_{i1}$, so $p_{i1}$ can be associated with at least 6 candidate trajectories; let $TS_1$ denote this set of trajectories associated with $p_{i1}$. Similarly, the set $TS_2$ of trajectories associated with $p_{i2}$ can be obtained. $T_i$ necessarily lies in the intersection of these two sets, so the intersection contains at least one trajectory. In the extreme case, the intersection contains only one trajectory $T$, which is then the converted trajectory $T_i^*$ of $T_i$. At any other time $t_3$, trajectory $T_i$ must lie within the SIR of $p_{i3}$, and at least 6 trajectory points lie in that neighborhood, so the probability of determining $p_{i3}$ does not exceed 1/6.
The strategy based on exchanging trajectory point positions preserves the original positions in the published trajectories, and the strategy based on k-anonymity of trajectory points reduces suppression and distortion of the published trajectories. However, these advantages come at the cost of giving up trajectory k-anonymity: k-anonymity of trajectory points is a weaker privacy requirement. The method is therefore suited to trajectory data in which the identity of the trajectory subject need not be hidden but the sensitive positions of trajectory points must be.
Assume the trajectory data to be published contain N trajectories with average length n. The running time of the algorithm is spent in three sub-processes:
constructing the trajectory point relationship network: the algorithm compares the distance between each point and its neighbors within the time window, with time complexity $O(\Delta T \cdot n \cdot N^2)$;
in the trajectory point relationship network, recursively traversing the trajectory points that do not meet the k-core requirement and deleting them from their adjacent points, with time complexity $O(n \cdot N \cdot Trach \cdot \log k)$, where Trach is the maximum suppression rate of trajectory points;
exchanging trajectory point positions: the positions of adjacent nodes are traversed and exchanged, with time complexity $O(n \cdot N)$.
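Network construction, the first of these sub-processes, can be sketched as a sweep over points sorted by time in which each point is compared only with later points inside the time window $T_e$. The sketch assumes the decoupled model of inequality (3) for both areas and, as a further assumption not stated in the text, never links two points of the same trajectory; `build_network` is a hypothetical name.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[int, float, float, float]  # (tid, t, x, y)


def build_network(points: List[Point], s_r: float, t_r: float,
                  s_e: float, t_e: float) -> Dict[int, Set[int]]:
    """Link every pair of points whose offset lies in the SIR
    (inside A_e, outside A_s), scanning a sliding time window."""
    order = sorted(range(len(points)), key=lambda i: points[i][1])
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(points))}
    for a, i in enumerate(order):
        tid_i, t_i, x_i, y_i = points[i]
        for b in range(a + 1, len(order)):
            j = order[b]
            tid_j, t_j, x_j, y_j = points[j]
            dt = t_j - t_i                       # non-negative because of the sort
            if dt > t_e:
                break                            # all later points are outside the window
            if tid_j == tid_i:
                continue                         # assumption: no same-trajectory links
            ds = math.hypot(x_i - x_j, y_i - y_j)
            inside_ae = ds <= s_e                # dt <= t_e already holds here
            inside_as = ds <= s_r and dt <= t_r
            if inside_ae and not inside_as:      # j lies in the SIR of i
                adj[i].add(j)
                adj[j].add(i)
    return adj
```

The early `break` is what confines the comparisons to the time window; without it every point would be compared with every later point.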
The running time of the method is dominated by repeatedly detecting and deleting trajectory points that do not meet the k-core requirement, with time complexity $O(n \cdot N \cdot Trach \cdot \log k)$, where $n \cdot N$ is the number of trajectory points in TD and k is the anonymity threshold.
Example 2
In practical application, the trajectory data to be published, the SIR parameters ($S_{e0} = 50$; $\Delta S_e = 10$; $S_s = S_e \alpha$; $T_e = S_e/v$; $T_s = S_s/v$), and k are taken as inputs, where $S_{e0}$ is the initial value of the effective threshold of the SIR, $\Delta S_e$ is the increment of $S_e$ at each step, $S_s$ is the sensitivity threshold, and v is the average speed in the trajectory data. The ratio $S_s/S_e$ is denoted $\alpha$ and is set to 0, 0.25, 0.5, and 0.75 in turn. Processing all steps of the algorithm yields published trajectory data satisfying k-anonymity of trajectory points, in which both the desensitization requirement of the trajectory subjects ($A_s$, k) and the usability requirement of trajectory data users ($A_e$) are met. Several alternatives need to be explained:
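The threshold settings listed above can be enumerated as in the sketch below. The number of increments `steps` and the average speed `v` are hypothetical inputs, and `sir_settings` is an illustrative name.

```python
from itertools import product
from typing import Dict, List, Tuple


def sir_settings(v: float, steps: int, s_e0: float = 50.0, d_s_e: float = 10.0,
                 alphas: Tuple[float, ...] = (0.0, 0.25, 0.5, 0.75)
                 ) -> List[Dict[str, float]]:
    """Enumerate SIR thresholds: S_e grows by d_s_e per step, S_s = alpha * S_e,
    and the time thresholds follow from the average speed v."""
    settings = []
    for i, alpha in product(range(steps), alphas):
        s_e = s_e0 + i * d_s_e           # effective spatial threshold
        s_s = alpha * s_e                # sensitivity threshold
        settings.append({"alpha": alpha, "S_e": s_e, "T_e": s_e / v,
                         "S_s": s_s, "T_s": s_s / v})
    return settings
```

With $v = 5$ and two increments, for example, the last setting is $\alpha = 0.75$, $S_e = 60$, $T_e = 12$, $S_s = 45$, $T_s = 9$.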
Spatio-temporal distance relationship between trajectory points
This relationship depends on the specific application and can take various models, for example:
when $p_i$ is taken as the reference point, the position of a point $p_j$ satisfying the distance relationship of inequality (1) is limited to an elliptical region centered at $p_i$. If the spatial distance between trajectory points is measured by the Euclidean distance $s_{ij} = ((x_i - x_j)^2 + (y_i - y_j)^2)^{1/2}$, the distance relationship is represented by an ellipsoid centered at $p_i$. This distance region model is denoted $RM_1$.
Other spatio-temporal relationships can be chosen as the distance relationship between trajectory points according to the application requirements, for example:
$T_r s_{ij} + S_r t_{ij} \le S_r T_r$   (2)
In this distance relationship, $p_j$ is limited to a diamond-shaped region centered at $p_i$. If the spatial distance $s_{ij}$ between trajectory points is defined as the Manhattan distance $s_{ij} = |x_i - x_j| + |y_i - y_j|$, then in the three-dimensional space formed by t, x, and y the distance relationship is represented by an octahedron centered at $p_i$. This distance region model is denoted $RM_2$.
Alternatively, the two dimensions can be decoupled and the relationship simply defined as follows:
$s_{ij} \le S_r,\quad t_{ij} \le T_r$   (3)
In this case, $p_j$ is limited to a rectangular region centered at $p_i$. When the Euclidean distance $s_{ij} = ((x_i - x_j)^2 + (y_i - y_j)^2)^{1/2}$ is used as the spatial distance between trajectory points, the distance relationship is represented by a cylinder. This distance region model is denoted $RM_3$.
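Membership tests for the two fully specified models, (2) and (3), might look like the sketch below. Here $s_{ij}$ and $t_{ij}$ are taken as non-negative distances, so absolute differences are used; model (1) is omitted because its inequality is not given here. The function names are hypothetical.

```python
import math


def rm2_contains(dx: float, dy: float, dt: float, s_r: float, t_r: float) -> bool:
    """Inequality (2) with Manhattan spatial distance: an octahedron in (t, x, y)."""
    s = abs(dx) + abs(dy)
    return t_r * s + s_r * abs(dt) <= s_r * t_r


def rm3_contains(dx: float, dy: float, dt: float, s_r: float, t_r: float) -> bool:
    """Inequality (3) with Euclidean spatial distance: a cylinder in (t, x, y)."""
    return math.hypot(dx, dy) <= s_r and abs(dt) <= t_r
```

The coupled model (2) trades space against time, so a point at spatial offset (6, 7) with no time offset lies inside the cylinder of model (3) for $S_r = 10$, $T_r = 4$, but outside the octahedron of model (2).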
Improved trajectory point replacement algorithm
The method of the invention processes low-degree nodes first so as to pair and exchange as many nodes as possible. However, frequent searches and modifications in the degree-based binary search tree reduce efficiency, so the algorithm is simplified: in the minimum-priority replacement algorithm step [2], the position of each node is instead exchanged with a random neighbor. This greatly reduces the comparison and sorting of trajectory points, lowering the time complexity from $k^2$ to $\log(k)$. The simplified algorithm comprises the following steps:
step S1: detecting whether the current trajectory point is in the Active state and has more than 0 neighbors; if it does not meet this condition, its anonymization is complete;
step S2: if the condition is met, randomly selecting one neighbor node;
step S3: exchanging the positions of the current Active node and the selected neighbor node;
step S4: marking the current node and the selected neighbor node as Interchanged;
step S5: deleting the link between the current node and the selected neighbor node;
step S6: deleting the links between the current node and the selected neighbor node and all of their non-shared neighbors;
step S7: if a common neighbor has 0 neighbors and is in the Active state, moving it to the current node and marking it as Interchanged;
step S8: traversing all trajectory points and executing steps S1 to S7 to complete the replacement of trajectory points and achieve k-anonymity.
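Steps S1 to S6 and S8 can be sketched as below. Step S7 (folding an isolated common neighbour into the current exchange) is not modelled, nodes left without neighbours at S1 are simply returned unchanged, and the traversal order by node id and the optional seeded random generator are assumptions made for reproducibility. The function name is hypothetical.

```python
import random
from typing import Dict, List, Optional, Set, Tuple, TypeVar

P = TypeVar("P")  # position type, e.g. a (t, x, y) tuple


def random_neighbour_swap(adj: Dict[int, Set[int]], pos: Dict[int, P],
                          rng: Optional[random.Random] = None
                          ) -> Tuple[Dict[int, P], List[int]]:
    """Simplified replacement: each Active node swaps positions with a
    randomly chosen current neighbour (S7 is not modelled)."""
    if rng is None:
        rng = random.Random()
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # current links; input untouched
    cur = dict(pos)                                # current (published) positions
    active = set(g)
    unpaired: List[int] = []
    for p in sorted(g):                            # S8: traverse all trajectory points
        if p not in active:
            continue                               # already Interchanged
        if not g[p]:                               # S1: no neighbour left
            unpaired.append(p)
            active.discard(p)
            continue
        q = rng.choice(sorted(g[p]))               # S2: random neighbour
        cur[p], cur[q] = cur[q], cur[p]            # S3: exchange positions
        active -= {p, q}                           # S4: mark as Interchanged
        common = g[p] & g[q]
        for a in (p, q):                           # S5, S6: drop the p-q link and the
            for u in g[a] - common:                # links to non-shared neighbours
                g[u].discard(a)
            g[a] = set(common)
    return cur, unpaired
```

Passing a seeded generator, for example `random.Random(0)`, makes the chosen pairs reproducible across runs; as in the minimum-priority sketch, pruning to common neighbours keeps every exchanged node at the original position of one of its input-network neighbours.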