Semantic and space-time correlation based road network LBS interest point query privacy protection method
1. A road network LBS interest point query privacy protection method based on semantic and spatio-temporal association, characterized by comprising the following steps:
step A, a central anonymous server initializes the road network data, the interest point database and the historical query data, and calculates the historical query frequency of each interest point from the historical query data;
step B, a mobile user sends an LBS interest point query request to the central anonymous server;
step C, the central anonymous server acquires the historical query data for the current query time period, establishes a spatio-temporal association model to calculate the time association probability between query semantics and location semantics, and defines an association entropy to construct an optimal association table for the current time period;
step D, the central anonymous server generates an anonymous set using a false position generation method that combines the optimal association table, the historical query frequency of the interest point at which the user initiates the query, and the personalized privacy requirements of the user, and reorders the anonymous set;
step E, the central anonymous server sends the query content and the anonymous set to the LBS server;
step F, the LBS server performs the query according to the received anonymous set and query content, and sends the query results to the central anonymous server;
and step G, the central anonymous server filters the received result set according to the exact location of the user and returns the corresponding query result to the user.
2. The road network LBS interest point query privacy protection method based on semantic and spatio-temporal association according to claim 1, wherein the interest point query request qu initiated by the user in step B is represented by a quadruple <uid, t, uloc_t, qs_t>, wherein
uid is the unique identifier of the user requesting the LBS interest point query, t is the time at which the query is initiated, uloc_t is the location of the interest point at which the user initiates the query at time t, and qs_t is the semantics of the interest point queried by the user at time t; interest point semantics include hotels, restaurants, hospitals, schools, parks, bars and the like.
3. The road network LBS interest point query privacy protection method based on semantic and spatio-temporal association according to claim 1, wherein establishing the spatio-temporal association model in step C comprises:
step C1, the central anonymous server acquires the historical query data of the users;
step C2, the central anonymous server obtains the time period corresponding to the query request according to the time t of the query request, and constructs a spatio-temporal association sequence Ct from the historical locations uloc_t at which each user initiated query requests;
and step C3, the central anonymous server constructs a spatio-temporal association directed graph Gt according to the spatio-temporal association sequence Ct, and calculates the time association probability between the query semantics and the location semantics.
4. The road network LBS interest point query privacy protection method based on semantic and spatio-temporal association according to claim 3, wherein the spatio-temporal association sequence in step C2 is defined as follows: within a time period [Ta, Tb], let two adjacent queries initiated by any user us at times ta and tb be qus_ta = <usid, ta, usloc_ta, qs_ta> and qus_tb = <usid, tb, usloc_tb, qs_tb>, where Ta ≤ ta < tb ≤ Tb; if the semantics of the location usloc_tb at time tb equals the semantics qs_ta of the query at time ta, the semantics of the location usloc_ta at time ta is said to have a time association with the query semantics qs_ta initiated at that time within the time period [Ta, Tb]; the number of times this association occurs over the pairs of adjacent queries of all users is counted as Nf, and the association together with its count Nf is called a spatio-temporal association sequence of length 1 in the time period [Ta, Tb], where Tn denotes the time period [Ta, Tb] and Tn ∈ {0, 1, ..., 23}.
5. The road network LBS interest point query privacy protection method based on semantic and spatio-temporal association according to claim 3, wherein the spatio-temporal association directed graph of the current time period in step C3 is Gt = (V, E), which is constructed from the spatio-temporal association sequences C = {Ct_1, Ct_2, ..., Ct_|C|} and comprises a vertex set V and an edge set E; each vertex v ∈ V represents a semantic category, and the out-degree of each vertex is the sum of the numbers of time associations NF = {Nf_1, Nf_2, ..., Nf_|NF|} from the location semantics Sloc to the query semantics Qs = {qs_1, qs_2, ..., qs_|Qs|} in the spatio-temporal association sequences, the elements of NF and Qs being in one-to-one correspondence; each edge e ∈ E represents that the association from the location semantics Sloc to a specific query semantics qs_m (m ∈ {1, 2, ..., |Qs|}) occurs Nf_n (n ∈ {1, 2, ..., |NF|}) times in the spatio-temporal association sequences;
the time association probability between query semantics and location semantics in step C3 is calculated as follows: if the out-degree of vertex v is not 0, the time association probability Pt between the location semantics and a query semantics qs_m is the number of occurrences Nf_n on the corresponding edge divided by the out-degree of v, i.e. Pt = Nf_n / (Nf_1 + Nf_2 + ... + Nf_|NF|); otherwise, if the out-degree of v is 0, Pt = 0.
6. The road network LBS interest point query privacy protection method based on semantic and spatio-temporal associations according to claim 1, wherein the step C of constructing an optimal association table comprises the steps of:
step C1', the position semantics of the user is put into the optimal association table BSC;
step C2', according to the time association probability between query semantics and location semantics, selecting the semantic which, together with the semantics already in the optimal association table BSC, gives the maximum association entropy, and putting that semantic into the optimal association table BSC;
and step C3', repeating step C2' until the number of elements in the optimal association table BSC reaches the user-defined optimal association table length threshold dsem.
7. The road network LBS interest point query privacy protection method based on semantic and spatio-temporal association according to claim 6, wherein the association entropy in step C2' is used to describe the degree of indistinguishability between the location semantics and the query content semantics in the location set of the current query time period; the larger the entropy value, the more difficult it is for an attacker to filter the location set according to the time association between the query semantics and the location semantics;
given a location set Sloc = {loc_1, loc_2, ..., loc_|Sloc|} and query content semantics qs, let Pt_i denote the time association probability between the semantics of location loc_i in the location set and the query content semantics qs, so that the association probability set is {Pt_1, Pt_2, ..., Pt_|Sloc|}; the association entropy is calculated as the entropy of the normalized time association probabilities,
wherein the normalized value of Pt_i is obtained by normalizing the time association probability between the location semantics and the query content semantics over the association probability set.
8. The road network LBS interest point query privacy protection method based on semantic and spatio-temporal association according to claim 6, wherein the optimal association table length threshold dsem in step C3' is defined by the user and satisfies S ≤ dsem ≤ ST, where S is the minimum number of semantic categories that the false position anonymous set must contain under the personalized privacy requirements of the user, and ST is the total number of semantic categories contained in the whole road network space.
9. The road network LBS interest point query privacy protection method based on semantic and spatio-temporal association according to claim 1, wherein the personalized privacy requirement PR of the user in step D is PR = (K, L, S), where K is the number of false positions in the anonymous set, L indicates that the false positions in the anonymous set are distributed on at least L road sections, and S indicates that the false positions in the anonymous set cover at least S semantic categories.
10. The road network LBS interest point query privacy protection method based on semantic and spatio-temporal association according to claim 1, wherein the false position generation method used in step D comprises:
step E1, initializing a false position anonymous set CR, a current road section set CRL and a current semantic type set CRS;
step E2, adding the interest point location uloc_t of the user to the anonymous set CR, adding the location semantics of the user to the current semantic type set CRS, adding the road section on which the user is located to the current road section set CRL, and obtaining the historical query frequency P(uloc_t) of the interest point location;
step E3, checking whether the anonymous set CR of the current user satisfies the personalized privacy requirement PR = (K, L, S) of the user; if the personalized privacy requirement PR is not satisfied, executing step E4; otherwise, the personalized privacy requirement PR is satisfied, anonymization ends, and the anonymous set CR is returned;
step E4, expanding to the adjacent road sections by network expansion, and adding the adjacent road sections to the adjacent road section set NL;
step E5, for each adjacent road section nl_e ∈ NL (e = 1, 2, ..., |NL|), randomly selecting on nl_e an interest point poi_i whose historical query frequency lies in the interval [P(uloc_t) − δ, P(uloc_t) + δ] and which satisfies semantic diversity as a false position; wherein δ is a threshold set by the user according to the actual situation for judging the similarity of historical query frequencies, and Spoi_i is the semantics of the interest point poi_i; that is, on each road section an interest point with a similar historical query frequency but a different semantic category, whose semantics is in BSC, is randomly selected as a false position, and only one interest point is selected on each adjacent road section nl_e; adding Spoi_i to CRS, adding the road section nl_e on which the interest point lies to CRL, and removing nl_e from the adjacent road section set NL to be expanded; while selecting, the following two conditions are checked:
condition one: whether the current semantic type set CRS satisfies the number S of semantic categories in the personalized privacy requirement of the user;
condition two: whether the current road section set CRL satisfies the number L of road sections in the personalized privacy requirement of the user;
if either or both of the above conditions are satisfied, executing step E6; otherwise, returning to step E4 and step E5;
step E6, if condition one is satisfied and condition two is not, executing step E7; if condition two is satisfied and condition one is not, executing step E8; if both conditions are satisfied, executing step E9;
step E7, taking the remaining adjacent road sections nl_e in the adjacent road section set NL one by one (nl_e ∈ NL, e = el, el+1, el+2, ..., |NL|, el ∈ (1, |NL|]), and randomly selecting on each road section nl_e one interest point poi_j with a similar historical query frequency and semantics in BSC as a false position, until condition two is satisfied;
at the same time, if the semantics Spoi_j of the selected interest point is not yet in CRS, adding it to CRS, adding the road section nl_e on which the interest point lies to CRL, and removing that road section from the road section set NL to be expanded; if condition two is still not satisfied, continuing to expand the adjacent road sections and selecting interest points that meet the personalized privacy requirements of the user as false positions until condition two is satisfied; once condition two is satisfied, executing step E9;
step E8, selecting, on the road sections in the current road section set CRL, interest points poi_k that have a similar historical query frequency and have not yet been selected as false positions, wherein Spoi_k is the semantics of the interest point poi_k and Spoi_k belongs to BSC, until condition one is satisfied; if no interest point meeting the personalized privacy requirements of the user exists in the current road section set CRL, continuing to expand the adjacent road sections to search for interest points meeting the personalized privacy requirements of the user as false positions until condition one is satisfied, and then executing step E9;
step E9, if the number K of anonymous positions in the personalized privacy requirement of the user is satisfied, ending anonymization, reordering the anonymous set CR and returning it; otherwise, randomly selecting, in the current road section set CRL, interest points that have a similar historical query frequency, have semantics in BSC and have not yet been selected as false positions until the personalized privacy requirements of the user are met, and then reordering the anonymous set CR and returning it.
Background
The rapid development of the Internet has driven the growth of location-based services (LBS), among which point of interest (POI) query has become a widely used service.
However, while enjoying the convenience of the LBS interest point query service, users face the risk of location privacy disclosure. On the one hand, an attacker who steals the false position anonymous set submitted by the central anonymous server to the LBS provider and also learns the current query content of the user can filter out implausible false positions by using the time association between the semantics of the query content and the location semantics of the anonymous set, which greatly increases the probability of a successful attack. On the other hand, by analyzing the false position anonymous set using location semantic information together with background knowledge such as historical interest point query data and the road network, an attacker can infer private information such as the hobbies, living habits, occupation and even health of the user.
Existing location privacy protection methods rarely take the query content into account when protecting location information and rarely consider the location semantics contained in the anonymous set; many consider only free space and ignore the road network environment, which is more complex and closer to reality. Moreover, existing false position generation methods cannot resist semantic inference attacks and time association attacks at the same time: once an attacker obtains the false position anonymous set submitted by the central anonymous server to the LBS provider together with the current query content of the user, the location privacy of the user is at risk of disclosure.
Therefore, there is an urgent need for a road network LBS interest point query privacy protection method based on semantic and spatio-temporal association that can effectively resist semantic inference attacks and time association attacks, so as to solve the above technical problems.
Disclosure of Invention
The invention aims to provide a road network LBS interest point query privacy protection method based on semantic and spatio-temporal association, which can effectively resist semantic inference attacks and time association attacks and provides a high level of security and good protection.
In order to achieve the purpose, the invention provides a road network LBS interest point query privacy protection method based on semantic and space-time association, which comprises the following steps:
step A, a central anonymous server initializes the road network data, the interest point database and the historical query data, and calculates the historical query frequency of each interest point from the historical query data;
step B, a mobile user sends an LBS interest point query request to the central anonymous server;
step C, the central anonymous server acquires the historical query data for the current query time period, establishes a spatio-temporal association model to calculate the time association probability between query semantics and location semantics, and defines an association entropy to construct an optimal association table for the current time period;
step D, the central anonymous server generates an anonymous set using a false position generation method that combines the optimal association table, the historical query frequency of the interest point at which the user initiates the query, and the personalized privacy requirements of the user, and reorders the anonymous set;
step E, the central anonymous server sends the query content and the anonymous set to the LBS server;
step F, the LBS server performs the query according to the received anonymous set and query content, and sends the query results to the central anonymous server;
and step G, the central anonymous server filters the received result set according to the exact location of the user and returns the corresponding query result to the user.
Preferably, the interest point query request qu initiated by the user in step B is represented by a quadruple <uid, t, uloc_t, qs_t>, wherein
uid is the unique identifier of the user requesting the LBS interest point query, t is the time at which the query is initiated, uloc_t is the location of the interest point at which the user initiates the query at time t, and qs_t is the semantics of the interest point queried by the user at time t; interest point semantics include hotels, restaurants, hospitals, schools, parks, bars and the like.
Preferably, establishing the spatio-temporal association model in step C includes:
step C1, the central anonymous server acquires the historical query data of the users;
step C2, the central anonymous server obtains the time period corresponding to the query request according to the time t of the query request, and constructs a spatio-temporal association sequence Ct from the historical locations uloc_t at which each user initiated query requests;
and step C3, the central anonymous server constructs a spatio-temporal association directed graph Gt according to the spatio-temporal association sequence Ct, and calculates the time association probability between the query semantics and the location semantics.
Preferably, the spatio-temporal association sequence in step C2 is defined as follows: within a time period [Ta, Tb], let two adjacent queries initiated by any user us at times ta and tb be qus_ta = <usid, ta, usloc_ta, qs_ta> and qus_tb = <usid, tb, usloc_tb, qs_tb>, where Ta ≤ ta < tb ≤ Tb; if the semantics of the location usloc_tb at time tb equals the semantics qs_ta of the query at time ta, the semantics of the location usloc_ta at time ta is said to have a time association with the query semantics qs_ta initiated at that time within the time period [Ta, Tb]; the number of times this association occurs over the pairs of adjacent queries of all users is counted as Nf, and the association together with its count Nf is called a spatio-temporal association sequence of length 1 in the time period [Ta, Tb], where Tn denotes the time period [Ta, Tb] and Tn ∈ {0, 1, ..., 23}.
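By way of non-limiting illustration, the counting of time associations over adjacent queries may be sketched in Python as follows. The record type `Query`, the function name `association_counts` and the representation of time as fractional hours are assumptions made for this sketch only; they do not appear in the method as described.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Query(NamedTuple):
    uid: str       # user identifier
    t: float       # query time in fractional hours (assumed unit)
    loc_sem: str   # semantics of the location where the query was issued
    qs: str        # semantics of the queried interest point


def association_counts(history, period_start, period_end):
    """Return Nf for every (location semantics, query semantics) pair."""
    by_user = defaultdict(list)
    for q in history:
        if period_start <= q.t <= period_end:
            by_user[q.uid].append(q)

    counts = Counter()
    for queries in by_user.values():
        queries.sort(key=lambda q: q.t)
        # Walk over pairs of adjacent queries of the same user.
        for first, second in zip(queries, queries[1:]):
            # The association holds when the next location has the semantics
            # that the user queried for previously.
            if first.t < second.t and second.loc_sem == first.qs:
                counts[(first.loc_sem, first.qs)] += 1
    return counts
```

Each key of the returned counter corresponds to one association and its value to the count Nf for the chosen time period.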
Preferably, the spatio-temporal association directed graph of the current time period in step C3 is Gt = (V, E), which is constructed from the spatio-temporal association sequences C = {Ct_1, Ct_2, ..., Ct_|C|} and comprises a vertex set V and an edge set E; each vertex v ∈ V represents a semantic category, and the out-degree of each vertex is the sum of the numbers of time associations NF = {Nf_1, Nf_2, ..., Nf_|NF|} from the location semantics Sloc to the query semantics Qs = {qs_1, qs_2, ..., qs_|Qs|} in the spatio-temporal association sequences, the elements of NF and Qs being in one-to-one correspondence; each edge e ∈ E represents that the association from the location semantics Sloc to a specific query semantics qs_m (m ∈ {1, 2, ..., |Qs|}) occurs Nf_n (n ∈ {1, 2, ..., |NF|}) times in the spatio-temporal association sequences;
the time association probability between query semantics and location semantics in step C3 is calculated as follows: if the out-degree of vertex v is not 0, the time association probability Pt between the location semantics and a query semantics qs_m is the number of occurrences Nf_n on the corresponding edge divided by the out-degree of v, i.e. Pt = Nf_n / (Nf_1 + Nf_2 + ... + Nf_|NF|); otherwise, if the out-degree of v is 0, Pt = 0.
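As a hedged sketch of the calculation in step C3, the following Python function derives time association probabilities from edge counts. It assumes that the probability on an edge is the edge count divided by the out-degree of its source vertex, with 0 for a vertex of out-degree 0; the function name and the dictionary layout are illustrative choices rather than part of the described method.

```python
def association_probabilities(counts):
    """Map (loc_sem, qs) -> Nf / out-degree(loc_sem).

    counts: dict mapping (location semantics, query semantics) to Nf.
    """
    # Out-degree of a vertex = sum of Nf over all of its outgoing edges.
    out_degree = {}
    for (loc_sem, _qs), nf in counts.items():
        out_degree[loc_sem] = out_degree.get(loc_sem, 0) + nf

    probabilities = {}
    for (loc_sem, qs), nf in counts.items():
        degree = out_degree[loc_sem]
        # A vertex with out-degree 0 has no observed association: Pt = 0.
        probabilities[(loc_sem, qs)] = nf / degree if degree else 0.0
    return probabilities
```

With this reading, the probabilities on the outgoing edges of any vertex with non-zero out-degree sum to 1.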
Preferably, the constructing the optimal association table in step C includes:
step C1', the position semantics of the user is put into the optimal association table BSC;
step C2', according to the time association probability between query semantics and location semantics, selecting the semantic which, together with the semantics already in the optimal association table BSC, gives the maximum association entropy, and putting that semantic into the optimal association table BSC;
and step C3', repeating step C2' until the number of elements in the optimal association table BSC reaches the user-defined optimal association table length threshold dsem.
Preferably, the association entropy in step C2' is used to describe the degree of indistinguishability between the location semantics and the query content semantics in the location set of the current query time period; the larger the entropy value, the more difficult it is for an attacker to filter the location set according to the time association between the query semantics and the location semantics;
given a location set Sloc = {loc_1, loc_2, ..., loc_|Sloc|} and query content semantics qs, let Pt_i denote the time association probability between the semantics of location loc_i in the location set and the query content semantics qs, so that the association probability set is {Pt_1, Pt_2, ..., Pt_|Sloc|}; the association entropy is calculated as the entropy of the normalized time association probabilities,
wherein the normalized value of Pt_i is obtained by normalizing the time association probability between the location semantics and the query content semantics over the association probability set.
Preferably, the optimal association table length threshold dsem in step C3' is defined by the user and satisfies S ≤ dsem ≤ ST, where S is the minimum number of semantic categories that the false position anonymous set must contain under the personalized privacy requirements of the user, and ST is the total number of semantic categories contained in the whole road network space.
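The construction of the optimal association table in steps C1' to C3' may be sketched in Python as follows. This is an illustration under stated assumptions: the association entropy is taken to be the Shannon entropy (base 2) of the probabilities after dividing each by their sum, and `pt` is a hypothetical mapping from each candidate location semantics to its time association probability with the current query semantics. The function names are likewise illustrative.

```python
import math


def association_entropy(probs):
    """Shannon entropy (base 2, assumed) of the normalized probabilities."""
    total = sum(probs)
    if total == 0:
        return 0.0
    normalized = [p / total for p in probs if p > 0]
    return -sum(p * math.log2(p) for p in normalized)


def build_bsc(user_sem, candidates, pt, dsem):
    """Greedily grow the optimal association table BSC up to length dsem."""
    bsc = [user_sem]                                   # step C1'
    remaining = [s for s in candidates if s != user_sem]
    while len(bsc) < dsem and remaining:               # step C3'
        # Step C2': pick the semantic that maximizes the entropy of the table.
        best = max(
            remaining,
            key=lambda s: association_entropy(
                [pt.get(x, 0.0) for x in bsc + [s]]
            ),
        )
        bsc.append(best)
        remaining.remove(best)
    return bsc
```

Under these assumptions the table tends to collect semantics whose association probabilities are close to that of the user's own location semantics, since near-uniform probabilities give the highest entropy.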
Preferably, the personalized privacy requirement PR of the user in step D is PR = (K, L, S), where K is the number of false positions in the anonymous set, L indicates that the false positions in the anonymous set are distributed on at least L road sections, and S indicates that the false positions in the anonymous set cover at least S semantic categories.
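A check of an anonymous set against PR = (K, L, S) may be sketched in Python as follows. The tuple layout of the anonymous set entries and the names `PrivacyRequirement` and `satisfies` are assumptions of this sketch, as is the reading of K as a lower bound on the size of the set.

```python
from typing import NamedTuple


class PrivacyRequirement(NamedTuple):
    K: int  # number of positions required in the anonymous set
    L: int  # minimum number of distinct road sections
    S: int  # minimum number of distinct semantic categories


def satisfies(pr, anonymous_set):
    """anonymous_set: iterable of (poi_id, road_section_id, semantics)."""
    entries = list(anonymous_set)
    pois = {poi for poi, _, _ in entries}
    sections = {section for _, section, _ in entries}
    semantics = {sem for _, _, sem in entries}
    return (
        len(pois) >= pr.K
        and len(sections) >= pr.L
        and len(semantics) >= pr.S
    )
```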
Preferably, the false position generation method used in step D includes:
step E1, initializing a false position anonymous set CR, a current road section set CRL and a current semantic type set CRS;
step E2, adding the interest point location uloc_t of the user to the anonymous set CR, adding the location semantics of the user to the current semantic type set CRS, adding the road section on which the user is located to the current road section set CRL, and obtaining the historical query frequency P(uloc_t) of the interest point location;
step E3, checking whether the anonymous set CR of the current user satisfies the personalized privacy requirement PR = (K, L, S) of the user; if the personalized privacy requirement PR is not satisfied, executing step E4; otherwise, the personalized privacy requirement PR is satisfied, anonymization ends, and the anonymous set CR is returned;
step E4, expanding to the adjacent road sections by network expansion, and adding the adjacent road sections to the adjacent road section set NL;
step E5, for each adjacent road section nl_e ∈ NL (e = 1, 2, ..., |NL|), randomly selecting on nl_e an interest point poi_i whose historical query frequency lies in the interval [P(uloc_t) − δ, P(uloc_t) + δ] and which satisfies semantic diversity as a false position; wherein δ is a threshold set by the user according to the actual situation for judging the similarity of historical query frequencies, and Spoi_i is the semantics of the interest point poi_i; that is, on each road section an interest point with a similar historical query frequency but a different semantic category, whose semantics is in BSC, is randomly selected as a false position, and only one interest point is selected on each adjacent road section nl_e; adding Spoi_i to CRS, adding the road section nl_e on which the interest point lies to CRL, and removing nl_e from the adjacent road section set NL to be expanded; while selecting, the following two conditions are checked:
condition one: whether the current semantic type set CRS satisfies the number S of semantic categories in the personalized privacy requirement of the user;
condition two: whether the current road section set CRL satisfies the number L of road sections in the personalized privacy requirement of the user;
if either or both of the above conditions are satisfied, executing step E6; otherwise, returning to step E4 and step E5;
step E6, if condition one is satisfied and condition two is not, executing step E7; if condition two is satisfied and condition one is not, executing step E8; if both conditions are satisfied, executing step E9;
step E7, taking the remaining adjacent road sections nl_e in the adjacent road section set NL one by one (nl_e ∈ NL, e = el, el+1, el+2, ..., |NL|, el ∈ (1, |NL|]), and randomly selecting on each road section nl_e one interest point poi_j with a similar historical query frequency and semantics in BSC as a false position, until condition two is satisfied;
at the same time, if the semantics Spoi_j of the selected interest point is not yet in CRS, adding it to CRS, adding the road section nl_e on which the interest point lies to CRL, and removing that road section from the road section set NL to be expanded; if condition two is still not satisfied, continuing to expand the adjacent road sections and selecting interest points that meet the personalized privacy requirements of the user as false positions until condition two is satisfied; once condition two is satisfied, executing step E9;
step E8, selecting, on the road sections in the current road section set CRL, interest points poi_k that have a similar historical query frequency and have not yet been selected as false positions, wherein Spoi_k is the semantics of the interest point poi_k and Spoi_k belongs to BSC, until condition one is satisfied; if no interest point meeting the personalized privacy requirements of the user exists in the current road section set CRL, continuing to expand the adjacent road sections to search for interest points meeting the personalized privacy requirements of the user as false positions until condition one is satisfied, and then executing step E9;
step E9, if the number K of anonymous positions in the personalized privacy requirement of the user is satisfied, ending anonymization, reordering the anonymous set CR and returning it; otherwise, randomly selecting, in the current road section set CRL, interest points that have a similar historical query frequency, have semantics in BSC and have not yet been selected as false positions until the personalized privacy requirements of the user are met, and then reordering the anonymous set CR and returning it.
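Steps E1 to E9 may be illustrated, in deliberately simplified form, by the following Python sketch. It is not the procedure itself: the branching of steps E6 to E8 is folded into a single breadth-first network expansion, the final reordering is taken to be a random shuffle, and the result is best-effort if the road network cannot satisfy the requirement. The data layout (`pois`, `adjacency`) and the function name are assumptions of this sketch.

```python
import random
from collections import deque


def generate_anonymous_set(user_poi, pois, adjacency, bsc, K, L, S, delta, rng=None):
    """pois: {poi_id: (road_section, semantics, historical_frequency)};
    adjacency: {road_section: [adjacent road sections]}."""
    rng = rng or random.Random()
    u_section, u_sem, u_freq = pois[user_poi]
    cr, crl, crs = [user_poi], {u_section}, {u_sem}        # steps E1 and E2

    def candidates(section, need_new_semantics):
        return [
            p for p, (sec, sem, freq) in pois.items()
            if sec == section and p not in cr
            and abs(freq - u_freq) <= delta                # similar frequency
            and sem in bsc                                 # semantics in BSC
            and not (need_new_semantics and sem in crs)
        ]

    def add(poi):
        sec, sem, _ = pois[poi]
        cr.append(poi)
        crl.add(sec)
        crs.add(sem)

    # Steps E4 to E8 (simplified): expand outwards, selecting at most one
    # interest point on each newly reached road section.
    queue, seen = deque(adjacency.get(u_section, [])), {u_section}
    while queue and (len(crl) < L or len(crs) < S):
        section = queue.popleft()
        if section in seen:
            continue
        seen.add(section)
        queue.extend(adjacency.get(section, []))
        pool = candidates(section, len(crs) < S)
        if not pool and len(crl) < L:
            pool = candidates(section, False)
        if pool:
            add(rng.choice(pool))

    # Step E9 (simplified): top up to K from road sections already in CRL.
    while len(cr) < K:
        pool = [p for sec in sorted(crl) for p in candidates(sec, False)]
        if not pool:
            break
        add(rng.choice(pool))

    rng.shuffle(cr)  # reorder so that list position does not reveal the user
    return cr
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is convenient when comparing parameter choices such as delta.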
According to the above technical scheme, the method establishes a spatio-temporal association model from historical query data, calculates the time association probability between query semantics and location semantics, and defines an association entropy to construct the optimal association table for the current time period; based on the optimal association table, a network expansion method is used to generate for the user a false position anonymous set whose positions are distributed on different road sections and which resists semantic inference attacks and time association attacks, so that an attacker cannot obtain the location privacy information of the user and the location privacy of the user is protected during LBS interest point queries.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a road network LBS interest point query privacy protection method based on semantic and spatio-temporal association provided by the present invention;
FIG. 2 is a system architecture diagram of a road network LBS interest point query privacy protection method based on semantic and spatio-temporal association according to the present invention;
FIG. 3 is a schematic diagram of an embodiment of a road network LBS interest point query privacy protection method based on semantic and spatio-temporal association according to the present invention;
FIG. 4 is a time-space correlation directed graph of an embodiment of a road network LBS interest point query privacy protection method based on semantics and time-space correlation provided by the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
In the present invention, unless otherwise specified, the directional terms included in the terms merely represent the directions of the terms in a conventional use state or are colloquially known by those skilled in the art, and should not be construed as limiting the terms.
Referring to FIG. 1 and FIG. 2, the invention provides a road network LBS interest point query privacy protection method based on semantic and spatio-temporal association, comprising:
step A, a central anonymous server initializes the road network data, the interest point database and the historical query data, and calculates the historical query frequency of each interest point from the historical query data;
step B, a mobile user sends an LBS interest point query request to the central anonymous server;
wherein the interest point query request qu initiated by the user in step B is represented by a quadruple <uid, t, uloc_t, qs_t>; uid is the unique identifier of the user requesting the LBS interest point query, t is the time at which the query is initiated, uloc_t is the location of the interest point at which the user initiates the query at time t, and qs_t is the semantics of the interest point queried by the user at time t; interest point semantics include hotels, restaurants, hospitals, schools, parks, bars and the like.
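For illustration only, the query request quadruple may be represented by a small Python record such as the following. The class name `PoiQuery`, the coordinate tuple used for uloc_t and the helper `time_period`, which assumes one time period per hour of the day, are choices made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PoiQuery:
    uid: str        # unique identifier of the requesting user
    t: datetime     # time at which the query is initiated
    uloc_t: tuple   # (x, y) of the interest point where the user is at time t
    qs_t: str       # semantics of the queried interest point, e.g. "restaurant"

    def time_period(self) -> int:
        """Hour-of-day index in 0..23 (assumes hourly time periods)."""
        return self.t.hour
```

A request issued at 09:30 would, under this assumption, fall into time period 9.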
step C, the central anonymous server acquires the historical query data for the current query time period, establishes a spatio-temporal association model to calculate the time association probability between query semantics and location semantics, and defines an association entropy to construct an optimal association table for the current time period;
wherein establishing the spatio-temporal association model in step C comprises the following steps:
step C1, the central anonymous server acquires the historical query data of the users;
step C2, the central anonymous server obtains the time period corresponding to the query request according to the time t of the query request, and constructs a spatio-temporal association sequence Ct from the historical locations uloc_t at which each user initiated query requests;
wherein the spatio-temporal association sequence in step C2 is defined as follows: within a time period [Ta, Tb], let two adjacent queries initiated by any user us at times ta and tb be qus_ta = <usid, ta, usloc_ta, qs_ta> and qus_tb = <usid, tb, usloc_tb, qs_tb>, where Ta ≤ ta < tb ≤ Tb; if the semantics of the location usloc_tb at time tb equals the semantics qs_ta of the query at time ta, the semantics of the location usloc_ta at time ta is said to have a time association with the query semantics qs_ta initiated at that time within the time period [Ta, Tb]; the number of times this association occurs over the pairs of adjacent queries of all users is counted as Nf, and the association together with its count Nf is called a spatio-temporal association sequence of length 1 in the time period [Ta, Tb], where Tn denotes the time period [Ta, Tb] and Tn ∈ {0, 1, ..., 23}.
step C3, the central anonymous server constructs a spatio-temporal correlation directed graph Gt from the spatio-temporal correlation sequence Ct, and calculates the time association probability between the query semantics and the location semantics.
In step C3, the spatio-temporal correlation directed graph of the current time period is Gt = (V, E); it is built from the spatio-temporal correlation sequences C = {Ct_1, Ct_2, ..., Ct_|C|} and consists of a vertex set V and an edge set E; each vertex v ∈ V represents a semantic category, and the out-degree of a vertex is the sum $\sum_{n=1}^{|NF|} Nf_n$ of the time association counts NF = {Nf_1, Nf_2, ..., Nf_|NF|} from the location semantics $S_{loc}$ to the query semantics Qs = {qs_1, qs_2, ..., qs_|Qs|} in the spatio-temporal correlation sequences, where the elements of NF and Qs are in one-to-one correspondence; each edge e ∈ E represents the number of times Nf_n (n ∈ 1, 2, ..., |NF|) that the association from the location semantics $S_{loc}$ to a specific query semantics qs_m (m ∈ 1, 2, ..., |Qs|) occurs in the spatio-temporal correlation sequences;
The time association probability between query semantics and location semantics in step C3 is calculated as follows: if the out-degree of vertex v is not 0, i.e. $\sum_{n=1}^{|NF|} Nf_n \neq 0$, the time association probability between the location semantics $S_{loc}$ and the query semantics qs_m is $Pt(S_{loc} \to qs_m) = Nf_m / \sum_{n=1}^{|NF|} Nf_n$, where Nf_m is the count corresponding to qs_m; otherwise, if $\sum_{n=1}^{|NF|} Nf_n = 0$, Pt = 0.
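The out-degree rule of step C3 can be sketched in a few lines. The function names and the tuple layout (Tn, location semantics, query semantics, Nf) are assumptions made for illustration; the counts are sample values only.

```python
from collections import defaultdict


def association_probabilities(sequences):
    """Map (location semantics, query semantics) to Nf divided by the out-degree."""
    out_degree = defaultdict(int)
    for _tn, loc_sem, _qs, nf in sequences:
        out_degree[loc_sem] += nf  # out-degree = sum of all outgoing counts

    probs = {}
    for _tn, loc_sem, qs, nf in sequences:
        total = out_degree[loc_sem]
        probs[(loc_sem, qs)] = nf / total if total else 0.0
    return probs


def pt(probs, loc_sem, qs):
    """Time association probability; 0 when no such edge exists."""
    return probs.get((loc_sem, qs), 0.0)


# Sample sequences for period identifier 10 (illustrative counts).
sequences = [
    (10, "school", "restaurant", 10),
    (10, "school", "hotel", 8),
    (10, "school", "bar", 2),
    (10, "park", "restaurant", 9),
    (10, "park", "school", 6),
]
probs = association_probabilities(sequences)
print(pt(probs, "school", "restaurant"))  # 10 / (10 + 8 + 2)
print(pt(probs, "bar", "restaurant"))     # vertex without outgoing edges
```

A semantic category that never appears as a location semantics, such as "bar" here, has out-degree 0 and therefore probability 0 to every query semantics.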
In addition, constructing the optimal association table in step C comprises:
step C1', the location semantics of the user is put into the optimal association table BSC;
step C2', according to the time association probabilities between the query semantics and the location semantics, the candidate semantic that gives the BSC the maximum associated entropy is selected and put into the optimal association table BSC;
wherein, the associated entropy in the step C2' is used to describe the indistinguishable degree of the position semantics and the query content semantics in the position set of the current query time period; the larger the entropy value is, the more difficult it is for an attacker to filter the location set according to the temporal association between the query semantics and the location semantics;
given a location set Sloc = {loc_1, loc_2, ..., loc_|Sloc|} and a query content semantics qs, let $p_i = Pt(S_{loc_i} \to qs)$ denote the time association probability between the semantics of location loc_i in the location set and the query content semantics, so that the association probability set is P = {p_1, p_2, ..., p_|Sloc|}; the associated entropy is calculated as:
$E_{Sloc \to qs} = -\sum_{i=1}^{|Sloc|} \bar{p}_i \log_2 \bar{p}_i$
wherein $\bar{p}_i = p_i / \sum_{j=1}^{|Sloc|} p_j$ is the normalization of the time association probability between the location semantics and the query content semantics, and terms with $\bar{p}_i = 0$ contribute 0.
step C3', repeating step C2' until the number of elements in the optimal association table reaches the user-defined optimal association table length threshold dsem.
The optimal association table length threshold dsem in step C3' is defined by the user, wherein S ≤ dsem ≤ ST, S is the minimum number of semantic categories that the false location anonymous set must contain under the user's personalized privacy requirement, and ST is the number of all semantic categories contained in the whole road network space.
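Steps C1' to C3' amount to a greedy selection, which can be sketched as below. The sketch assumes a base-2 logarithm over normalized probabilities, with zero-probability terms contributing nothing. The function names and the probability values are illustrative assumptions, and the input dictionary is assumed to contain the user's own location semantics.

```python
import math


def association_entropy(probabilities):
    """Entropy (base 2) of the probabilities after normalizing them to sum to 1."""
    total = sum(probabilities)
    if total == 0:
        return 0.0
    entropy = 0.0
    for p in probabilities:
        if p > 0:  # zero-probability terms contribute nothing
            share = p / total
            entropy -= share * math.log2(share)
    return entropy


def build_bsc(user_semantic, pt_to_query, dsem):
    """Greedily grow the table, starting from the user's own location semantics."""
    bsc = [user_semantic]
    candidates = [s for s in pt_to_query if s != user_semantic]
    while len(bsc) < dsem and candidates:
        best = max(
            candidates,
            key=lambda s: association_entropy([pt_to_query[x] for x in bsc + [s]]),
        )
        bsc.append(best)
        candidates.remove(best)
    return bsc


# Illustrative probabilities Pt(semantic -> "restaurant").
pt_to_restaurant = {
    "school": 0.5, "park": 0.6, "hotel": 0.1, "community": 0.2,
    "bank": 8 / 15, "restaurant": 6 / 13, "hospital": 0.75, "bar": 0.0,
}
print(build_bsc("school", pt_to_restaurant, dsem=4))
```

The greedy choice favors semantics whose probability is close to those already in the table, since near-uniform normalized probabilities give the highest entropy.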
Step D, the central anonymous server generates an anonymous set by using a false position generation method, and reorders the anonymous set by combining the optimal association table, the historical query frequency of the interest point where the user initiates the query, and the personalized privacy requirement of the user;
wherein the personalized privacy requirement PR of the user in step D is PR = (K, L, S), where K is the number of locations that the anonymous set must contain, L is the minimum number of road segments over which the locations in the anonymous set must be distributed, and S is the minimum number of semantic categories that the locations in the anonymous set must cover.
Step E, the central anonymous server sends the query content and the anonymous set to the LBS server;
further, the method for generating the false position in step E includes:
step E1, initializing a false position anonymous set CR, a current road section set CRL and a current semantic type set CRS;
step E2, adding the location uloc_t of the interest point where the user is located to the anonymous set CR, adding the location semantics of the user to the current semantic type set CRS, adding the road segment of the user's location to the current road segment set CRL, and obtaining the historical query frequency P(uloc_t) of that interest point location;
step E3, checking whether the anonymous set CR of the current user satisfies the user's personalized privacy requirement PR = (K, L, S); if not, executing step E4; otherwise, ending anonymization and returning the anonymous set CR;
step E4, expanding the adjacent road sections in a network expansion mode, and adding the adjacent road sections into the adjacent road section set NL;
step E5, for each adjacent road segment nl_e ∈ NL (e = 1, 2, ..., |NL|), randomly selecting on nl_e an interest point poi_i whose historical query frequency lies in the interval [P(uloc_t) - δ, P(uloc_t) + δ] and which satisfies semantic diversity as a false position; wherein δ is a threshold set by the user according to the actual situation for judging the similarity of historical query frequencies, and $S_{poi_i}$ is the semantics of the interest point poi_i, with $S_{poi_i} \notin CRS$ and $S_{poi_i} \in BSC$; that is, on each road segment an interest point whose historical query frequency is similar, whose semantic category differs from those already selected, and whose semantics is in the BSC is randomly selected as a false position, and only one interest point is selected on each adjacent road segment nl_e; $S_{poi_i}$ is added to CRS, the road segment nl_e of the interest point is added to CRL, and nl_e is removed from the set NL of adjacent road segments to be expanded; the following two conditions are checked while selecting:
condition one: whether the current semantic type set CRS satisfies the number S of semantic types in the user's personalized privacy requirement;
condition two: whether the current road segment set CRL satisfies the number L of road segments in the user's personalized privacy requirement;
if either or both of the above conditions are satisfied, executing step E6; otherwise, returning to step E4 and step E5;
step E6, if condition one is satisfied and condition two is not, executing step E7; if condition two is satisfied and condition one is not, executing step E8; if both conditions are satisfied, executing step E9;
step E7, selecting one by one the remaining adjacent road segments nl_e in the adjacent road segment set NL (nl_e ∈ NL, e = el, el+1, el+2, ..., |NL|, el ∈ (1, |NL|]), and on each road segment nl_e randomly selecting one interest point poi_j whose historical query frequency is similar and whose semantics is in the BSC as a false position, until condition two is satisfied;
at the same time, if $S_{poi_j} \notin CRS$, it is added to CRS, and the road segment nl_e where the interest point is located is added to CRL and removed from the set NL of road segments to be expanded; if condition two is still not satisfied once the road segments in NL have been processed, the adjacent road segments continue to be expanded and interest points meeting the user's personalized privacy requirement are selected as false positions until condition two is satisfied; once condition two is satisfied, executing step E9;
step E8, selecting, on the road segments in the current road segment set CRL, interest points poi_k that have similar historical query frequency and have not been selected as false positions, wherein $S_{poi_k}$ is the semantics of the interest point poi_k with $S_{poi_k} \notin CRS$ and $S_{poi_k} \in BSC$, until condition one is satisfied; if no interest point meeting the user's personalized privacy requirement exists in the current road segment set CRL, the adjacent road segments continue to be expanded to search for interest points meeting the requirement as false positions until condition one is satisfied, and then step E9 is executed;
step E9, if the number of locations in the anonymous set satisfies the number K in the user's personalized privacy requirement, ending anonymization, reordering the anonymous set CR and returning it; otherwise, randomly selecting interest points on the road segments in the current road segment set CRL that have similar historical query frequency, whose semantics is in the BSC, and that have not been selected, as false positions until the user's personalized privacy requirement is met, then reordering the anonymous set CR and returning it.
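The per-segment selection used in steps E5, E7 and E8 and the requirement check of step E3 can be sketched as follows. This is a simplified illustration rather than the full E1 to E9 procedure; the `Poi` record, the function names, the `require_new_semantic` flag and the small floating-point tolerance are assumptions introduced here.

```python
import random
from typing import NamedTuple


class Poi(NamedTuple):
    name: str
    segment: str   # road segment identifier
    semantic: str  # semantic category
    freq: float    # historical query frequency


def satisfies(cr, crl, crs, k, l, s):
    """Step E3 check: K locations, at least L segments, at least S semantics."""
    return len(cr) >= k and len(crl) >= l and len(crs) >= s


def pick_false_position(pois, segment, user_freq, delta, bsc, crs, chosen,
                        rng, require_new_semantic=True):
    """Randomly pick one eligible interest point on a segment, or None."""
    candidates = [
        p for p in pois
        if p.segment == segment
        and p.name not in chosen
        and abs(p.freq - user_freq) <= delta + 1e-12  # similar query frequency
        and p.semantic in bsc                         # semantics in the BSC
        and (not require_new_semantic or p.semantic not in crs)
    ]
    return rng.choice(candidates) if candidates else None


# Illustrative data; names and values are sample inputs only.
pois = [
    Poi("C park", "3", "park", 0.028),
    Poi("E restaurant", "3", "restaurant", 0.030),
    Poi("A restaurant", "1", "restaurant", 0.015),
]
bsc = {"school", "bank", "restaurant", "park"}
rng = random.Random(0)
picked = pick_false_position(pois, "3", 0.03, 0.005, bsc, {"school"}, {"H school"}, rng)
print(picked.name if picked else None)
```

Setting `require_new_semantic=False` corresponds to the relaxed selection of steps E7 and E9, where semantic diversity is already satisfied and only the frequency and BSC membership still matter.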
Step F, the LBS server performs the query according to the received anonymous set and query content, and sends the query result to the central anonymous server;
Step G, the central anonymous server filters the received result set according to the accurate location of the user, and returns the corresponding query result to the user.
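The filtering in step G can be pictured with a very small sketch. It assumes, hypothetically, that the LBS server returns one result list per location in the anonymous set, keyed by location name; the function name and the data are illustrative only.

```python
def filter_results(results_by_location, true_location):
    """Keep only the results computed for the user's real location."""
    return results_by_location.get(true_location, [])


# Hypothetical result set returned for an anonymous set of two locations.
results = {
    "R bank": ["restaurant near R bank"],
    "H school": ["restaurant near H school"],
}
print(filter_results(results, "H school"))
```

Only the central anonymous server knows which key is the real location, so the LBS server cannot tell which of the returned lists is actually used.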
Therefore, the method can effectively prevent an attacker from deducing the position of the user by utilizing the time association relation between the query content semantics and the anonymous set position semantics, and can also effectively prevent semantic inference attack caused by too single semantics in the anonymous set; meanwhile, the problem of over-centralized false location distribution is avoided, and the protection degree of the location privacy of the user under the LBS interest point query service is enhanced.
One specific example of the invention is provided below to further illustrate the invention in detail:
Step A, the central anonymous server initializes the road network data, the semantic location database and the historical interest point query frequencies. As shown in fig. 3, road segment numbers are represented by 1 to 14, the start and end nodes of road segments are represented by I to VI (road network edges) and by circled numerals, and the positions on each road segment are represented by A to T. The semantic categories are schools, restaurants, hospitals, hotels, banks, communities, bars and parks.
Step B, user A sends a location-based service query request <Aid, 10:00, J school, restaurant> to the central anonymous server; the historical query frequency of the current J school is 0.03, the query frequency similarity threshold δ is 0.005, the user-defined dsem is 4, and the personalized privacy requirement of user A is PR = (6, 5, 4);
Step C, the central anonymous server receives the query of user A, acquires the historical query data for the 10:00-11:00 time period according to the query time 10:00, and constructs the spatio-temporal correlation sequences: <10, school → restaurant, 10>, <10, school → hotel, 8>, <10, school → bar, 2>, <10, park → restaurant, 9>, <10, park → school, 6>, <10, hotel → restaurant, 1>, <10, hotel → park, 8>, <10, hotel → bar, 1>, <10, community → restaurant, 2>, <10, community → school, 8>, <10, bank → restaurant, 8>, <10, bank → hotel, 4>, <10, bank → school, 3>, <10, restaurant → restaurant, 6>, <10, restaurant → park, 3>, <10, restaurant → school, 4>, <10, hospital → restaurant, 9>, <10, hospital → park, 1>, <10, hospital → school, 1>, <10, hospital → hospital, 1>. A spatio-temporal correlation directed graph is established from these sequences as shown in fig. 4, and the time association probability between each location semantics and the query semantics "restaurant" in the current query time period is calculated: school → restaurant: 10/20 = 0.5; park → restaurant: 9/15 = 0.6; hotel → restaurant: 1/10 = 0.1; community → restaurant: 2/10 = 0.2; bank → restaurant: 8/15 ≈ 0.53; restaurant → restaurant: 6/13 ≈ 0.46; hospital → restaurant: 9/12 = 0.75; bar → restaurant: 0.
Then, the central anonymous server first adds the semantic "school" of the interest point "J school" where the current user A is located to the optimal association table; the current optimal association table is {"school"}, and the candidate semantics are: park, hotel, community, bank, restaurant, hospital, bar. The associated entropy is then calculated (only one detailed calculation is given):
E{school, hotel}→restaurant ≈ 0.6500, E{school, community}→restaurant ≈ 0.8631, E{school, bank}→restaurant ≈ 0.9994, E{school, restaurant}→restaurant ≈ 0.9987, E{school, hospital}→restaurant ≈ 0.9706, E{school, bar}→restaurant = 0. The semantic with the largest entropy value is selected and added to the optimal association table: {"school", "bank"}; now |BSC| = 2 < dsem, the remaining candidate semantics are park, hotel, community, restaurant, hospital and bar, and the associated entropy is computed again (only one detailed calculation is listed here):
E{school, bank, hotel}→restaurant ≈ 1.3424, E{school, bank, community}→restaurant ≈ 1.4774, E{school, bank, restaurant}→restaurant ≈ 1.5825, E{school, bank, hospital}→restaurant ≈ 1.5604, E{school, bank, bar}→restaurant ≈ 0.9994. The semantic with the largest entropy value is selected and added to the optimal association table: {"school", "bank", "restaurant"}; now |BSC| = 3 < dsem, and the remaining candidate semantics are park, hotel, community, hospital and bar. The calculation is repeated, each time adding the semantic with the largest entropy value to the optimal association table, until the table length reaches the user-defined dsem; the resulting optimal association table is {"school", "bank", "restaurant", "park"}.
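As a hand check, the value for {school, hotel} can be reproduced from the sequence counts given in step C. The derivation below assumes that the associated entropy uses a base-2 logarithm over probabilities normalized to sum to 1, which agrees with the listed value 0.6500; the symbols $Pt$ and $\bar{p}$ are notation chosen for this illustration.

```latex
\begin{aligned}
Pt(\text{school} \to \text{restaurant}) &= \frac{10}{10 + 8 + 2} = 0.5, \qquad
Pt(\text{hotel} \to \text{restaurant}) = \frac{1}{1 + 8 + 1} = 0.1 \\
\bar{p}_{\text{school}} &= \frac{0.5}{0.5 + 0.1} = \frac{5}{6}, \qquad
\bar{p}_{\text{hotel}} = \frac{0.1}{0.5 + 0.1} = \frac{1}{6} \\
E_{\{\text{school},\,\text{hotel}\} \to \text{restaurant}}
  &= -\left(\tfrac{5}{6}\log_2\tfrac{5}{6} + \tfrac{1}{6}\log_2\tfrac{1}{6}\right)
   \approx 0.2192 + 0.4308 = 0.6500
\end{aligned}
```

The same steps with "bar" in place of "hotel" give an entropy of 0, because the only non-zero normalized probability is 1.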
Step E, the road segment where the user is located is added to the current road segment set CRL, the semantics of the interest point at which the query is initiated is added to the current semantic type set CRS, and that interest point is added to the anonymous set CR, i.e. CR = {"H school"}, CRL = {"6"}, CRS = {"school"}.
Anonymous positions are searched for by network expansion, giving the adjacent road segment set NL = {"3", "7", "9", "12"}; the historical query frequency of H school is 0.03, that of C park is 0.028, that of E restaurant is 0.03, that of N bank is 0.027, and that of P bank is 0.031. For each road segment in the set, an interest point on the segment whose semantics is in the optimal association table and whose historical query frequency lies in the interval [0.025, 0.035] is randomly selected; on road segment 3, one of C park and E restaurant is randomly selected as a false position. C park is added to the anonymous set CR and its road segment is removed from the adjacent road segment set, so that CR = {"H school", "C park"}, CRL = {"6", "3"}, CRS = {"school", "park"}, NL = {"7", "9", "12"}. At this time |CRS| = 2 < 4 and |CRL| = 2 < 5, so the user's personalized privacy requirement is not satisfied and the checks continue while selecting. Since the semantics of the interest points on road segments 7 and 9 are not in the optimal association table, road segments 7 and 9 are removed from the adjacent road segment set. P bank, randomly selected on road segment 12, is added to the anonymous set CR, i.e. CR = {"H school", "C park", "P bank"}, CRL = {"6", "3", "12"}, CRS = {"school", "park", "bank"}. At this time |CRS| = 3 < 4 and |CRL| = 3 < 5, and the user's personalized privacy requirement is still not satisfied.
NL is now empty, so the adjacent road segments continue to be expanded, and road segments 1, 2, 5, 8, 10, 11, 13 and 14 are added to the adjacent road segment set. The historical query frequency of A restaurant is 0.015, that of K restaurant is 0.032, that of F park is 0.022, and that of R bank is 0.029. A restaurant on road segment 1 satisfies semantic diversity but not the historical query frequency, so road segment 1 is excluded; road segment 2 is excluded because the semantic "hospital" is not in the optimal association table; on road segment 5, the semantics of F park is in the optimal association table but its historical query frequency is not satisfied, so road segment 5 is removed from NL. K restaurant on road segment 8 satisfies semantic diversity and its semantics is in the optimal association table, so it is added to the anonymous set CR, i.e. CR = {"H school", "C park", "P bank", "K restaurant"}, CRL = {"6", "3", "12", "8"}, CRS = {"school", "park", "bank", "restaurant"}, NL = {"10", "11", "13", "14"}. At this time |CRS| = 4 satisfies the semantic diversity of the user's personalized privacy requirement, but the number of road segments does not, so the search for false positions continues in NL. Since the semantics of L community on road segment 10 is not in the optimal association table, road segment 10 is removed from NL. R bank on road segment 11 is added to the anonymous set CR, giving CR = {"H school", "C park", "P bank", "K restaurant", "R bank"}, CRL = {"6", "3", "12", "8", "11"}, CRS = {"school", "park", "bank", "restaurant"}; both the semantic diversity and the road segment number of the user's privacy requirement are now satisfied.
However, |CR| = 5 < K, so interest points continue to be sought on the road segments in the current road segment set CRL; E restaurant on road segment 3 and N bank on road segment 12 satisfy the conditions, and one of them is randomly selected as a false position, i.e. CR = {"H school", "C park", "P bank", "K restaurant", "R bank", "E restaurant"}. The user's personalized privacy requirement is now met, and the anonymous set is reordered to {"R bank", "C park", "E restaurant", "H school", "K restaurant", "P bank"}.
The central anonymous server sends the query content "restaurant" and the anonymous set {"R bank", "C park", "E restaurant", "H school", "K restaurant", "P bank"} to the LBS server;
Step F, the LBS server performs the query according to the received anonymous set and query content, and sends the query result to the central anonymous server;
Step G, the central anonymous server filters the received result set according to the accurate location of the user and returns the corresponding result to the user, and the query is finished.
According to the technical scheme, the method comprises the steps of establishing a space-time correlation model by utilizing historical query data, calculating the correlation probability between query semantics and position semantics, defining a correlation entropy to construct an optimal correlation table of the current time period, and generating a group of false position anonymity sets which are distributed on different road sections and can resist semantic inference attacks and time correlation attacks for a user by adopting a network expansion method according to the optimal correlation table, so that an attacker cannot obtain position privacy information of the user, and the position privacy protection of the user during LBS interest point query is realized.
Therefore, the method provides a space-time association model for measuring the time association relation between the query semantics and the position semantics. And constructing a space-time correlation sequence through historical query data, establishing a space-time correlation directed graph according to the space-time correlation sequence, and calculating time correlation probability by using the out-degree condition.
Meanwhile, an associated entropy is defined to calculate the indistinguishable degree of the location semantics and the query semantics in the location set. Wherein the larger the entropy value, the harder it is for an attacker to filter false locations based on temporal correlation analysis of the location semantics and query semantics of the location set.
In addition, a false position generation method meeting the personalized privacy requirements of the user is also provided. By formalizing the personalized privacy requirements of the users, a group of false position anonymous sets which are distributed on different road sections and can resist semantic inference attacks and time correlation attacks are generated for the users by using the proposed false position generation method.
The preferred embodiments of the present invention have been described in detail with reference to the accompanying drawings, however, the present invention is not limited to the specific details of the above embodiments, and various simple modifications can be made to the technical solution of the present invention within the technical idea of the present invention, and these simple modifications are within the protective scope of the present invention.
It should be noted that the technical features described in the above embodiments can be combined in any suitable manner provided there is no contradiction; to avoid unnecessary repetition, the possible combinations are not described separately.
In addition, any combination of the various embodiments of the present invention is also possible, and the same should be considered as the disclosure of the present invention as long as it does not depart from the spirit of the present invention.