Two-party secure comparison method for private data based on a trusted third party
1. A two-party secure comparison method for private data based on a trusted third party, characterized by comprising the following steps:
using a trusted third party to randomly select two random numbers as masks for the inputs of the two computing parties, and sending the two random numbers to the two computing parties respectively;
while randomly selecting the two random numbers, the trusted third party generates, for each of the two computing parties, an operation key for the comparison operation;
each computing party generates a masked input from its own input and sends it to the other computing party;
and each computing party performs the comparison operation on the masked inputs using its own operation key to obtain its own corresponding comparison result.
2. The two-party secure comparison method for private data based on a trusted third party as claimed in claim 1, wherein the trusted third party computes the operation keys for the comparison operation according to the algorithm KeyGen.
3. The method according to claim 1, wherein, when the trusted third party selects the two random numbers, the CTR/ECB encryption mode of AES is used to accelerate random number generation.
4. The method of claim 3, wherein the trusted third party transmits the AES encryption key via the Diffie-Hellman key exchange protocol.
Background
In the era of big data and networking, protecting the privacy of sensitive data has become a prominent problem in urgent need of a solution. In particular, with the privacy-protection laws issued at home and abroad in recent years, important projects involving sensitive data have stalled because key data lack privacy protection. To let data circulate without being exposed (usable while remaining invisible), privacy computing plays an important role as a main tool in environments that require privacy protection, such as blockchain and federated learning.
In common privacy computing, operators for two-party privacy-preserving computation based on a trusted third party, such as secure two-party arithmetic (addition, subtraction, multiplication and division) and comparison operations, form the basis on which privacy computing is built. However, because of the computation overhead and network overhead of existing implementations, their efficiency is insufficient when they are applied to large-scale data operations.
One prior-art solution for privacy-preserving comparison uses secret sharing: the inputs are bit-decomposed and subtracted bit by bit, and the result is determined by whether the subtraction produces a borrow. To explain the overall scheme, assume that the two unsigned integers are x and y. The less-than comparison of these two unsigned integers can then be briefly divided into the following steps:
1) bit-decompose the two inputs, i.e. $x = x_1 x_2 \dots x_l$ and $y = y_1 y_2 \dots y_l$, where both unsigned integers are assumed to be $l$ bits long;
2) subtract the two bit-decomposed unsigned integers; the borrow flag $z$ of the subtraction is the final result.
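The two steps above can be sketched in plaintext as follows. This is only an illustration of the borrow logic, not a secure protocol; the function name `less_than_by_borrow` is hypothetical. In the secret-shared setting every AND in the loop would cost an interaction between the two parties.

```python
def less_than_by_borrow(x: int, y: int, l: int = 64) -> int:
    """Return the final borrow flag of the l-bit subtraction x - y.

    For unsigned l-bit integers this flag equals 1 exactly when x < y.
    """
    borrow = 0
    # Walk from the least significant bit upward, propagating the borrow.
    for i in range(l):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        # Full-subtractor borrow-out: (NOT xi AND yi) OR (NOT (xi XOR yi) AND borrow)
        borrow = ((1 - xi) & yi) | ((1 - (xi ^ yi)) & borrow)
    return borrow
```

Each loop iteration uses two AND operations, which is why the number of rounds grows with the bit length when the bits are secret-shared.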
The privacy protection of these sub-operations is built entirely on secret sharing, whose essence is that each input $x$ is decomposed into two random numbers that are distributed to the two parties, i.e. $x = [x]_0 + [x]_1$, where $[x]_0$ and $[x]_1$ denote the secret shares of $x$ held by $P_0$ and $P_1$ respectively, and $P_0$ and $P_1$ denote the two parties participating in the privacy computation. Subtraction under secret sharing is simple: each party locally subtracts its own shares of $x$ and $y$. Multiplication under secret sharing requires a trusted third party to generate random multiplication pairs and requires one interaction between the two parties.
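A minimal sketch of these three operations, assuming arithmetic modulo $2^l$ and assuming that the "random multiplication pairs" are Beaver-style triples $(a, b, ab)$ dealt by the trusted third party. All function names are hypothetical, and both parties are simulated in one process.

```python
import secrets

L_BITS = 64
MOD = 1 << L_BITS  # assumption: shares live in the ring of integers mod 2^l

def share(x):
    """Split x into two additive shares with x = x0 + x1 (mod 2^l)."""
    x0 = secrets.randbelow(MOD)
    return x0, (x - x0) % MOD

def reconstruct(s0, s1):
    return (s0 + s1) % MOD

def sub_local(x_share, y_share):
    """Subtraction needs no communication: each party subtracts its own shares."""
    return (x_share - y_share) % MOD

def mul_with_triple(x_sh, y_sh):
    """Multiply shared values using one dealt triple and one opening round."""
    # Trusted third party: deal shares of a random triple (a, b, a*b).
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % MOD)
    # One interaction: the parties open e = x - a and d = y - b.
    e = reconstruct(*[(x_sh[i] - a_sh[i]) % MOD for i in range(2)])
    d = reconstruct(*[(y_sh[i] - b_sh[i]) % MOD for i in range(2)])
    # Local recombination; only party 0 adds the public term e*d.
    z = [(c_sh[i] + e * b_sh[i] + d * a_sh[i]) % MOD for i in range(2)]
    z[0] = (z[0] + e * d) % MOD
    return tuple(z)
```

The single opening of `e` and `d` is the one interaction per multiplication mentioned above.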
The efficiency bottleneck of this less-than comparison lies in two aspects. First, every bitwise AND operation involved in the bit decomposition of the two inputs requires one round of communication between the two parties (a bitwise AND is in effect a multiplication whose operands lie in the binary field). Second, subtracting the two bit-decomposed numbers also takes multiple rounds, since the bitwise subtraction itself decomposes into $l$ AND operations. In other words, $P_0$ and $P_1$ need at least one round of interaction for each AND operation, so the complexity of bit decomposition and bitwise subtraction determines the number of communication rounds between $P_0$ and $P_1$. Assuming $l = 64$, i.e. an ordinary 64-bit integer, one comparison requires at least 64 rounds of communication.
However, as the demand for Internet services grows, privacy protection faces the challenge of processing and analyzing private data, especially where such data must be processed in near real time. This comparison method involves heavy computation, takes a long time and consumes a large amount of network bandwidth, so it cannot process huge data sets and respond in real time. Because such general-purpose operators make up a large share of big-data workloads, the resulting network overhead is unacceptable.
The second prior-art scheme uses a garbled circuit: the logic circuit is encoded to obtain an encrypted (garbled) circuit. The main steps are broadly similar to those of secret sharing, but because the computation is not carried out on the original plaintext circuit, the encoding party must encode each bit with a 128-bit random number to achieve encryption. In addition, before evaluating the garbled circuit, the evaluator must obtain the random encodings corresponding to its own inputs, which incurs the overhead of oblivious transfer; this overhead can be reduced with the help of a trusted third party.
In general, the garbled circuit scheme lets both computing parties obtain the final result with one round of communication. Its network overhead consists of the oblivious transfer overhead, at least $128l$ bits, and the garbled circuit overhead, at least $256l$ bits. For 64-bit integers, the garbled circuit scheme therefore has to transfer at least 3 KB of data to complete one comparison. For large-scale comparison operations this network overhead is too high, so the network traffic becomes an important performance bottleneck for big-data operations.
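The 3 KB figure follows directly from the two lower bounds with $l = 64$, taking 1 KB as 1024 bytes:

```latex
\[
128l + 256l = 384l
\quad\Longrightarrow\quad
384 \times 64 = 24576 \text{ bits} = 3072 \text{ bytes} = 3 \text{ KB}.
\]
```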
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a two-party secure comparison method for private data based on a trusted third party, which improves on the prior art in both the number of communication rounds and the communication traffic, thereby minimizing the number of network communication rounds and further reducing the network traffic.
The purpose of the invention can be realized by the following technical scheme:
a two-party secure comparison method for private data based on a trusted third party comprises the following steps:
using a trusted third party to randomly select two random numbers as masks for the inputs of the two computing parties, and sending the two random numbers to the two computing parties respectively;
while randomly selecting the two random numbers, the trusted third party generates, for each of the two computing parties, an operation key for the comparison operation;
each computing party generates a masked input from its own input and sends it to the other computing party;
and each computing party performs the comparison operation on the masked inputs using its own operation key to obtain its own corresponding comparison result.
Further, the trusted third party computes the operation keys for the comparison operation for both parties according to the algorithm KeyGen.
Further, when the trusted third party selects the two random numbers, the CTR/ECB encryption mode of AES is used to accelerate random number generation.
Further, the trusted third party transmits the AES encryption key via the Diffie-Hellman key exchange protocol.
Compared with the prior art, the two-party secure comparison method for private data based on a trusted third party has at least the following beneficial effects:
1) the method improves on the existing schemes in both the number of communication rounds and the communication traffic, minimizing the number of network communication rounds and further reducing the network traffic; compared with the at least $128l + 256l$ bits of a garbled circuit, the traffic is reduced by nearly 50%; compared with the secret sharing approach, the number of communication rounds is reduced by more than 90%.
2) The invention constructs a special data structure and method: the data structure is tree-based, and its internal operations involve only simple addition, subtraction and XOR; in addition, random number generation is accelerated with a dedicated hardware instruction set. This further improves the computation efficiency and reduces the communication traffic.
Drawings
Fig. 1 is a schematic flowchart of a private data two-party security comparison method based on a trusted third party in an embodiment.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Examples
In order to facilitate a better understanding of the present application, the technical parameter terms related to the present embodiment will be briefly described below.
$P_0$ and $P_1$: the two parties participating in the privacy computation.
The random number generator has an input of a seed s with a length of in bits and an output of a random number with out bits.
An input x of length in bits is converted to an output of length out bits.
λ: representing the security parameters of the system.
$\oplus$: the exclusive-or (XOR) operation on two bit strings or two integers.
The flow of the complete technical scheme of the two-party secure comparison method for private data based on a trusted third party is shown in Fig. 1. For uniformity of description, this embodiment takes the less-than operation (i.e. $x < y$) as an example; the equivalence relations for the other operations are shown in the following table:
TABLE 1 Operational equivalence relations
Original relation | Equivalent relation
x > y | y < x
x ≤ y | 1 - (y < x)
x ≥ y | 1 - (x < y)
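The relations in Table 1 can be sketched as thin wrappers around a single less-than primitive. Here `lt` is a plaintext stand-in for the secure LT operation, and the wrapper names are hypothetical.

```python
def lt(x: int, y: int) -> int:
    """Plaintext stand-in for the secure less-than operation LT(x, y)."""
    return int(x < y)

def gt(x: int, y: int) -> int:
    return lt(y, x)          # x > y   is  y < x

def le(x: int, y: int) -> int:
    return 1 - lt(y, x)      # x <= y  is  1 - (y < x)

def ge(x: int, y: int) -> int:
    return 1 - lt(x, y)      # x >= y  is  1 - (x < y)
```

Only one secure primitive therefore has to be implemented; the other three comparisons cost at most one extra local subtraction on the result.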
The specific procedure of the operation LT(x, y) is as follows, where this example uses $b \in \{0,1\}$ to denote one of the two parties (correspondingly, $1-b$ denotes the other):
Step one: initial data of a target user are acquired through data acquisition equipment and divided into first grouped data and second grouped data, which are randomly shared; the two groups of data belong to the computing parties $P_0$ and $P_1$ respectively.
The trusted third party randomly selects two random numbers $r_0$ and $r_1$ of length $\lambda$ bits as the input masks of $P_0$ and $P_1$, and sends the two numbers to the two parties respectively (note: $P_b$ cannot obtain the random number $r_{1-b}$).
Step two: at the same time as the step above, the trusted third party generates, according to the algorithm KeyGen, the operation keys $k_0$ and $k_1$ of the less-than operation LT for the two parties; each party uses its key to compute the less-than operation LT (the algorithm KeyGen is introduced later).
Step three: each party generates its masked input from its own input and sends it to the other party.
Step four: each party computes its corresponding result, $z_0$ or $z_1$, of the less-than operation LT from the masked inputs and its own operation key, where $z_0 + z_1 = (x < y)$ (the algorithm LT is described later).
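The masking in steps one and three can be sketched as follows, assuming that masked values are reduced modulo $2^l$ and that the masked difference is $w = x + r_0 - y - r_1$ as used by the algorithm LT. Key generation and the LT evaluation are left out; all names are hypothetical.

```python
import secrets

LAMBDA = 128          # mask length in bits
L_BITS = 64           # assumption: inputs are l-bit unsigned integers
MOD = 1 << L_BITS

def ttp_masks():
    """Trusted third party: draw r0 for P0 and r1 for P1."""
    return secrets.randbits(LAMBDA), secrets.randbits(LAMBDA)

def mask_input(value: int, r: int) -> int:
    """A party hides its input by adding its own mask (mod 2^l)."""
    return (value + r) % MOD

def masked_difference(masked_x: int, masked_y: int) -> int:
    """After the exchange, both parties hold w = x + r0 - y - r1 (mod 2^l)."""
    return (masked_x - masked_y) % MOD
```

Each party sends exactly one masked value of $l$ bits, which is the only traffic between the two computing parties.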
The main content of the algorithm KeyGen is as follows: the trusted third party computes $(k_0, k_1) = \mathrm{KeyGen}(\alpha, 1)$, where $\alpha = r_0 - r_1$ denotes the difference between the two masks. The specific steps of the algorithm are:
1: let $\alpha = \alpha_0 \alpha_1 \dots \alpha_{l-1}$ be the bit representation of $\alpha$, where $\alpha_0$ is the most significant bit of $\alpha$;
2: let s, t, and cw be three empty lists;
3: generate two random numbers of length $\lambda$ bits and add them to the list s;
4: add the two numbers 0 and 1 to the list t, in that order;
5: let $v_\alpha = 0$ be an integer of length $l$ bits, representing the correction value of each layer;
6: for $i$ from 0 to $l-1$, repeat steps 7 to 15;
7: let […] and […], where […] respectively denote random numbers of length $\lambda$ bits, […] likewise denote random numbers of length $\lambda$ bits, and […] respectively denote random bits;
8: compute […], where $s_{cw}$ is a bit string of length $\lambda$ bits;
9: let […], where $v_{cw}$ is an integer of length $l$ bits;
10: let […], where $v_\alpha$ is an integer of length $l$ bits;
11: compute […] and […], where […] denote two bits;
12: add […] to the list cw;
13: let $t_0 = t[-2]$ and $t_1 = t[-1]$, i.e. $t_0$ and $t_1$ denote the second-to-last and last bits of the list t so far;
14: add […] and […] to the list s;
15: add […] and […] to the list t;
16: add […] to the list cw, where $s[-2]$ and $s[-1]$ denote the second-to-last and last bit strings of the list s so far;
17: return $k_0 = (s[0], cw)$ and $k_1 = (s[1], cw)$, where $s[0]$ and $s[1]$ denote the first and second bit strings of the list s, and $k_0$ and $k_1$ denote the keys to be sent to $P_0$ and $P_1$ respectively.
The general idea of the algorithm KeyGen is that each bit of $\alpha$ corresponds to one layer (one iteration) of the for loop. For the $i$-th layer, $s_{cw}$ corresponds to the random numbers that the upper-layer seeds generate for the two child nodes on the $1-\alpha_{i-1}$ side, and […] is a supplementary term for the control-bit list t. At the $i$-th layer, two terms are added to s and two terms are added to t:
a. the two terms added to s are two random numbers of the form […]; the XOR of these two terms equals the XOR of the 4 random numbers generated at this layer;
b. the two terms added to t are two random bits of the form […]; the XOR of these two terms is 1.
The meaning of the above will be clearly understood in conjunction with the less than algorithm LT:
Less-than algorithm LT: each computing party $P_b$ ($b \in \{0,1\}$) separately computes $z_b = \mathrm{LT}(k_b, w)$, where $w = x + r_0 - y - r_1$. The specific steps of the algorithm are:
1: let $w = w_0 w_1 \dots w_{l-1}$ be the bit representation of $w$, where $w_0$ is the most significant bit of $w$;
2: let s and t be two empty lists;
3: parse $k_b = (s[b], cw)$ and add its first term $s[b]$ to the list s;
4: add $b$ to the list t;
5: for $i$ from 0 to $l-1$, repeat steps 6 to 10;
6: let $s_{cw} = cw[4i]$, […], where these denote the values of items $4i$ to $4i+3$ of the list cw respectively;
7: let […], where $s[-1]$ denotes the last bit string of the list s so far, $s_0, s_1, v_0, v_1$ each denote a random bit string of length $\lambda$ bits, and $t_0, t_1$ each denote a random bit;
8: let […], where $t[-1]$ denotes the last bit of the list t so far: if this value is 0, […]; otherwise, […]; the XOR here is applied component-wise to the values at corresponding positions inside the two brackets;
9: compute […], where $t[-1]$ denotes the last bit of the list t so far;
10: add […] to the list s and add […] to the list t;
11: return […], where $s[-1]$ denotes the last bit string of the list s so far, $t[-1]$ denotes the last bit of the list t so far, and $cw[-1]$ denotes the last item of the list cw.
The general idea of the algorithm LT is as follows. At each layer, if the current input bit satisfies $w_i = \alpha_i$, the two random seeds $s_0, s_1$ that the two computing parties generate in step 7 of LT equal the two values added to the list s at the current layer of KeyGen, and the two control bits $t_0, t_1$ likewise equal the two values added to the list t at the current layer of KeyGen. Otherwise, the two parties generate identical seeds $s_0 = s_1$ and identical control bits $t_0 = t_1$; from then on the difference computed so far is preserved, because all subsequent computation is based on the same seed and control bit. That is, once the inputs have diverged, the comparison is decided by the first unequal bit. In summary, when $x - y > r_0 - r_1$ the results of the two parties cancel to 0; otherwise $z_0 + z_1 = 1$.
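The structure of KeyGen and LT described above (a binary tree of seeds, one correction word per layer, control bits that stay different only while $w$ follows $\alpha$) matches the general shape of a distributed comparison function. The following is a minimal sketch under that assumption, not the exact formulas of this embodiment: the per-layer computations follow the standard distributed comparison function construction, SHA-256 stands in for the AES-based generator, each correction word is stored as one tuple instead of four list items, and the names `prg`, `convert`, `keygen` and `lt_eval` are hypothetical.

```python
import hashlib
import secrets

LAM_BYTES = 16            # lambda = 128 bits
L_BITS = 64               # bit length l of w and alpha
MOD = 1 << L_BITS

def prg(seed):
    """Expand a seed into (child seed, value, control bit) for branch 0 and 1."""
    children = []
    for side in (b"0", b"1"):
        d = hashlib.sha256(seed + side).digest()
        children.append((d[:LAM_BYTES], int.from_bytes(d[16:24], "big") % MOD, d[31] & 1))
    return children

def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

def convert(seed):
    """Map a lambda-bit string to an l-bit integer."""
    return int.from_bytes(seed, "big") % MOD

def keygen(alpha):
    """Trusted third party: keys whose outputs add up to 1 iff w < alpha."""
    s = [secrets.token_bytes(LAM_BYTES), secrets.token_bytes(LAM_BYTES)]
    roots, t, cw, v_alpha = tuple(s), [0, 1], [], 0
    for i in range(L_BITS):
        a = (alpha >> (L_BITS - 1 - i)) & 1      # most significant bit first
        g0, g1 = prg(s[0]), prg(s[1])
        keep, lose = a, 1 - a
        sign = -1 if t[1] else 1
        s_cw = xor(g0[lose][0], g1[lose][0])
        v_cw = sign * (g1[lose][1] - g0[lose][1] - v_alpha)
        if lose == 0:                            # leaving alpha's path to the left means w < alpha
            v_cw += sign
        v_cw %= MOD
        v_alpha = (v_alpha - g1[keep][1] + g0[keep][1] + sign * v_cw) % MOD
        t_cw = (g0[0][2] ^ g1[0][2] ^ a ^ 1, g0[1][2] ^ g1[1][2] ^ a)
        cw.append((s_cw, v_cw, t_cw))
        gs = (g0, g1)
        new_s = [xor(gs[b][keep][0], s_cw) if t[b] else gs[b][keep][0] for b in (0, 1)]
        new_t = [gs[b][keep][2] ^ (t[b] & t_cw[keep]) for b in (0, 1)]
        s, t = new_s, new_t
    sign = -1 if t[1] else 1
    last = (sign * (convert(s[1]) - convert(s[0]) - v_alpha)) % MOD
    return (roots[0], cw, last), (roots[1], cw, last)

def lt_eval(b, key, w):
    """Party b: compute its share z_b from its key and the public value w."""
    s, cw, last = key
    t, v, sign = b, 0, (-1 if b else 1)
    for i in range(L_BITS):
        w_i = (w >> (L_BITS - 1 - i)) & 1
        s_cw, v_cw, t_cw = cw[i]
        s_new, v_new, t_new = prg(s)[w_i]
        v = (v + sign * (v_new + t * v_cw)) % MOD
        if t:                                    # apply the correction word
            s_new, t_new = xor(s_new, s_cw), t_new ^ t_cw[w_i]
        s, t = s_new, t_new
    return (v + sign * (convert(s) + t * last)) % MOD
```

The sketch only demonstrates that $z_0 + z_1 \bmod 2^l$ equals 1 exactly when $w < \alpha$; it does not address how wrap-around of $w$ and $\alpha$ modulo $2^l$ is handled.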
The following factors are considered in the implementation of the embodiment:
1) The security parameter is set to $\lambda = 128$ to meet a moderate security requirement. Unlike conventional random number generators, the method here generates random numbers by encrypting the seed s with AES in CTR or ECB mode in order to improve performance (AES is a symmetric block cipher, and CTR and ECB are two of its modes that are efficient to implement). AES can further be implemented with the hardware AES-NI instruction set to improve efficiency even more. The CTR/ECB encryption modes of AES are common in the prior art and are not described in detail here.
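The counter-mode idea can be sketched as follows: the seed acts as the key, and successive counter values are run through a fixed keyed function to produce output blocks. To keep the sketch dependent on the standard library only, truncated SHA-256 stands in for the AES block cipher; this substitution and the name `expand_seed` are assumptions for illustration, not the AES-NI implementation described above.

```python
import hashlib

BLOCK_BYTES = 16  # AES block size; lambda = 128 bits

def expand_seed(seed: bytes, out_bytes: int) -> bytes:
    """Counter-mode expansion: block_j = F(seed, j), concatenated and truncated."""
    blocks = []
    counter = 0
    while BLOCK_BYTES * len(blocks) < out_bytes:
        ctr = counter.to_bytes(BLOCK_BYTES, "big")
        # Stand-in for AES_seed(ctr): SHA-256 of seed || counter, cut to one block.
        blocks.append(hashlib.sha256(seed + ctr).digest()[:BLOCK_BYTES])
        counter += 1
    return b"".join(blocks)[:out_bytes]
```

Because every block depends only on the seed and its counter, the blocks can be computed independently, which is what makes this mode fast on hardware with AES acceleration.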
2) The two computing parties must agree on the AES encryption key before the protocol starts. This step can transmit the AES encryption key through the Diffie-Hellman secure key exchange protocol. In a concrete implementation, the two parties first select a finite field $F_p$ and jointly negotiate a generator $g$ of that field; the two parties then choose secret random numbers $a$ and $b$ respectively, compute $g^a$ and $g^b$, and send these to each other; finally both parties compute $g^{ab}$ and use it as the key to encrypt the randomly generated AES key, completing the key exchange.
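A toy sketch of this exchange, with parameters chosen only for illustration: the prime $p = 2^{127} - 1$ and base $g = 3$ are far too small for real use, and XOR with a hash of $g^{ab}$ stands in for "encrypting the AES key with $g^{ab}$ as the key". The function names are hypothetical.

```python
import hashlib
import secrets

P = 2**127 - 1   # a Mersenne prime; toy-sized, for illustration only
G = 3            # base chosen for illustration

def keypair():
    """Pick a secret exponent and compute the public value g^secret mod p."""
    secret = secrets.randbelow(P - 2) + 1
    return secret, pow(G, secret, P)

def shared_pad(my_secret: int, their_public: int) -> bytes:
    """Both sides derive the same 16-byte pad from g^(ab) mod p."""
    shared = pow(their_public, my_secret, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()[:16]

def wrap(aes_key: bytes, pad: bytes) -> bytes:
    """XOR stand-in for encrypting (or decrypting) the AES key under the pad."""
    return bytes(k ^ p for k, p in zip(aes_key, pad))
```

For example, one side can draw `aes_key = secrets.token_bytes(16)`, send `wrap(aes_key, pad)`, and the other side recovers it by applying `wrap` again with its own copy of the pad.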
3) For a complete LT instance, because the random numbers $r_0, r_1$ are embedded into the algorithm at the KeyGen stage, the whole KeyGen + LT instance is not reusable; this is mainly a security consideration.
The overall number of communication rounds of this technical scheme can be seen from the flowchart in Fig. 1: both computing parties receive the random number and the key sent by the trusted third party, for a traffic of $128 + (128 + 2 + l)\,l + l$ bits. After that, the traffic between the two computing parties is only $l$ bits.
Compared with the at least $128l + 256l$ bits of a garbled circuit, the traffic is reduced by nearly 50%, while the number of communication rounds is the same as for a garbled circuit. Compared with the secret sharing approach, the number of communication rounds is reduced by more than 90%. In summary, the technical scheme achieves a clear improvement over the existing schemes and works well in practice. In addition, the traffic can be reduced further by applying differential compression to sorted arrays during transmission.
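These figures can be checked numerically for $l = 64$ from the formulas given in the text. The function names are hypothetical, and the round counts (64 for secret sharing, 1 for this scheme) are the lower bounds stated earlier.

```python
def key_traffic_bits(l: int, lam: int = 128) -> int:
    """Bits received from the trusted third party: lam + (lam + 2 + l) * l + l."""
    return lam + (lam + 2 + l) * l + l

def garbled_traffic_bits(l: int, lam: int = 128) -> int:
    """Lower bound for a garbled circuit: lam * l (transfer) + 2 * lam * l (circuit)."""
    return lam * l + 2 * lam * l

l = 64
ours = key_traffic_bits(l)              # 12608 bits
garbled = garbled_traffic_bits(l)       # 24576 bits
traffic_reduction = 1 - ours / garbled  # about 0.487, i.e. nearly 50%
round_reduction = 1 - 1 / 64            # about 0.984, i.e. more than 90%
```

The online exchange between the two computing parties adds only $l = 64$ bits on top of the key material.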
The technical key point of the invention is the construction of a special data structure and method: the data structure is tree-based, and its internal operations involve only simple addition, subtraction and XOR. In addition, random number generation is accelerated with a dedicated hardware instruction set. This LT generation technique is therefore currently the most effective method.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and those skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.