E-commerce network abnormal user detection method and system
1. A method for detecting abnormal users in an e-commerce network, characterized by comprising the following steps:
S10, preprocessing the collected e-commerce network data;
S20, extracting the spatial structure information of the e-commerce network data preprocessed in step S10, constructing a heterogeneous information network, and converting it into a user-device bipartite graph;
S30, constructing an e-commerce network abnormal user detection model from an autoencoder and support vector data description (SVDD), based on the user-device bipartite graph obtained in step S20;
S40, training the e-commerce network abnormal user detection model constructed in step S30 iteratively and determining the optimal parameters of the model; and
S50, outputting the abnormal user detection results using the e-commerce network abnormal user detection model constructed in step S30 and the optimal model parameters determined in step S40, and handling the abnormal users.
2. The method for detecting abnormal users in an e-commerce network according to claim 1, wherein preprocessing the collected e-commerce network data in step S10 specifically comprises the following steps:
S11, cleaning samples with missing values from the collected e-commerce network data, and supplementing the samples by randomly sampling the original data set; and
S12, correcting the samples.
3. The method for detecting abnormal users in an e-commerce network according to claim 1, wherein step S20 specifically comprises the following steps:
S21, abstracting the e-commerce network data preprocessed in step S10 into a heterogeneous information network and converting it into a user-device bipartite graph $G(X, Y, E)$, where $X = \{x_1, x_2, \dots, x_M\}$ denotes the set of M users, $x_m$ being the m-th user, $m \in [1, M]$; $Y = \{y_1, y_2, \dots, y_N\}$ denotes the set of N devices, $y_n$ being the n-th device, $n \in [1, N]$; and $E = \{e_{mn}\}_{m=1,\dots,M;\, n=1,\dots,N}$ denotes the set of users' login behaviors on the different devices, where $e_{mn}$ indicates whether user $x_m$ has logged in on device $y_n$: $e_{mn} = 1$ if so, and $e_{mn} = 0$ otherwise; and
S22, constructing the user-device bipartite graph structure, expressed as $S = [s_1, s_2, \dots, s_M]^T$, where $s_m = [e_{m1}, e_{m2}, \dots, e_{mN}]$, $m \in [1, M]$.
4. The method for detecting abnormal users in an e-commerce network according to claim 1, wherein the e-commerce network abnormal user detection model constructed in step S30 comprises three parts: an encoder, a decoder, and a detector.
5. The method for detecting abnormal users in an e-commerce network according to claim 1, wherein step S30 specifically comprises the following steps:
S31, the encoder encodes the user-device bipartite graph structure S into a set Z of user low-dimensional representations in the hypersphere latent space; the encoding process is formalized as equation (1):
$Z = \mathrm{ReLU}(WS + b)$ (1)
where $Z = [z_1, z_2, \dots, z_M]^T$ is the set of user low-dimensional representations of the bipartite graph structure S in the hypersphere latent space, $z_m$ is the low-dimensional representation corresponding to $s_m$ in the hypersphere space, W and b are the encoding weight and bias, respectively, and the ReLU activation function is defined by equation (2):
$\mathrm{ReLU}(x) = \max(0, x)$ (2)
S32, the decoder reconstructs the user low-dimensional representation set Z into a bipartite graph structure $\hat{S}$; the decoding process is formalized as equation (3):
where $\hat{S}$ is the reconstructed bipartite graph structure, and W and b are the decoding weight and bias, respectively, which are the same as the encoding weight and bias;
S33, the detector applies support vector data description to the user low-dimensional representation set Z in the hypersphere latent space, and the center c of the hypersphere latent space is calculated by equation (4):
the Euclidean distance between each user low-dimensional representation and the center c is calculated by equation (5):
$d_m = \lVert z_m - c \rVert_2$ (5)
where $d_m$ is the Euclidean distance between the user low-dimensional representation $z_m$ and the center c, and the set of distances between all user low-dimensional representations and the center is defined as $D = \{d_1, d_2, \dots, d_M\}$;
S34, the distribution of the set D is examined using the 3σ criterion: if $x \sim N(\mu, \sigma^2)$, then
$P\{|x - \mu| < \sigma\} = 0.6826$ (6)
$P\{|x - \mu| < 2\sigma\} = 0.9545$ (7)
$P\{|x - \mu| < 3\sigma\} = 0.9973$ (8)
where x is a normally distributed variable, σ is the standard deviation, and μ is the mean; as equation (8) shows, the probability that x falls outside the interval $(\mu - 3\sigma, \mu + 3\sigma)$ is less than 0.003; and
S35, following the 3σ criterion, the detector computes σ and μ of the set D, removes every $d_m$ lying outside the interval $(\mu - 3\sigma, \mu + 3\sigma)$, and takes the maximum of the remaining values as the hypersphere radius r; finally, the Euclidean distance between each user low-dimensional representation and the center is compared with r: if it is greater than r, the user is an abnormal user; otherwise, the user is a normal user.
6. The method for detecting abnormal users in an e-commerce network according to claim 1, wherein step S40 specifically comprises the following steps:
S41, according to the device aggregation characteristic, the device similarity between users is calculated using equation (9):
where $i, j \in [1, M]$, $sim\_d_{ij}$ is the device similarity between users $x_i$ and $x_j$, $N_i$ is the set of devices logged in by user $x_i$, and $N_j$ is the set of devices logged in by user $x_j$;
according to the activity aggregation characteristic, a day is divided equally into 24 time periods, the number of times $T_p$, $p \in [0, 23]$, that each user logs in to devices in each period is counted, and the login behavior of each user is described as $t_i = [T_0, T_1, \dots, T_{23}]$; the activity similarity between users is calculated by equation (10):
where $sim\_t_{ij}$ is the activity similarity between users $x_i$ and $x_j$, $t_i$ is the login behavior of user $x_i$, and $t_j$ is the login behavior of user $x_j$;
based on equations (9) and (10), the behavior similarity between users in the original space is calculated by equation (11):
$sim_{ij} = sim\_d_{ij} \times sim\_t_{ij}$ (11)
where $sim_{ij}$ is the behavior similarity between users $x_i$ and $x_j$;
S42, the behavior difference between user low-dimensional representations is calculated by equation (12):
$dis_{ij} = \lVert z_i - z_j \rVert_2$ (12)
where $dis_{ij}$ is the Euclidean distance between the user low-dimensional representations $z_i$ and $z_j$;
further, the behavior similarity between user low-dimensional representations is calculated by equation (13):
where the quantity defined by equation (13) is the behavior similarity between the user low-dimensional representations $z_i$ and $z_j$;
S43, establishing the joint objective function shown in equation (14) for the e-commerce network abnormal user detection model constructed in step S30:
$L = L_{rec} + \alpha (L_{sim} + L_{svdd})$ (14)
where α is a hyper-parameter with value range (0, 1); $L_{rec}$ is the reconstruction error, which measures the difference between the original input S and the reconstructed output $\hat{S}$ and is calculated by equation (15):
$L_{sim}$ is the behavior similarity difference, which measures the change in behavior similarity between pairs of users before and after encoding and is calculated by equation (16):
$L_{svdd}$ is the hypersphere constraint, which serves as the classification boundary between normal and abnormal users and is calculated by equation (17):
S44, initializing the e-commerce network abnormal user detection model of step S30: initializing the autoencoder parameters W and b, and setting the hypersphere space dimension dim, the number of iterations epoch, the batch size, and the learning rate;
steps S45 to S49 are executed iteratively until the set number of iterations is reached, which completes the training of the e-commerce network abnormal user detection model and yields the optimal model parameters:
S45, taking the user-device bipartite graph structure S obtained in step S22 as input, and encoding it with the encoder according to equation (1) to obtain the user low-dimensional representation set Z;
S46, decoding the user low-dimensional representation set Z with the decoder according to equation (3) to obtain $\hat{S}$, which completes forward propagation;
S47, calculating the behavior similarity between users according to equation (11), and the behavior similarity between user low-dimensional representations according to equation (13);
S48, completing back propagation by optimizing the joint objective function L of equation (14) with stochastic gradient descent, thereby updating the weight W and bias b of the autoencoder; and
S49, the detector performs anomaly detection on the user low-dimensional representation set Z according to steps S33 to S35.
7. The method for detecting abnormal users in an e-commerce network according to claim 1, wherein step S50 specifically comprises the following steps:
S51, obtaining the optimal parameters of the e-commerce network abnormal user detection model by iteratively executing the training process of steps S45 to S49, and taking the anomaly detection result obtained with the optimal parameters as the final detection result; and
S52, outputting the abnormal user detection result to the staff responsible for user security management on the e-commerce platform, so as to improve the efficiency and reliability of abnormal user detection and to enable targeted handling of abnormal users according to the degree of harm and the risk they pose.
8. A system for detecting abnormal users in an e-commerce network, characterized in that the system comprises a computer processor, a memory, an e-commerce network data preprocessing unit, an e-commerce network abnormal user detection model training unit, and an e-commerce network abnormal user detection result output unit; the data preprocessing unit preprocesses the acquired e-commerce network data and loads it into the computer memory; the model training unit constructs the e-commerce network abnormal user detection model from the e-commerce network data produced by the data preprocessing unit and determines the optimal values of the model parameters through iterative calculation; and the result output unit outputs the e-commerce network abnormal user detection results to relevant staff or researchers for related tasks such as abnormal user detection and network security monitoring on each e-commerce platform.
Background
With the continuing spread and development of the Internet, many unscrupulous merchants control large numbers of user accounts to carry out fraudulent activities such as fake reviews and malicious order brushing on major e-commerce platforms, inducing customers to buy defective products and seriously harming their interests. To eliminate the negative effects of such abnormal users, the invention provides a method and a system for detecting abnormal users in an e-commerce network that can detect them accurately.
Disclosure of Invention
The invention provides a method and a system for detecting abnormal users in an e-commerce network, which can effectively and reliably detect abnormal users in an e-commerce platform network.
To achieve this objective, the invention adopts the following technical solution:
By analyzing how abnormal users arise in e-commerce networks, the invention focuses on users' login activities on different devices and identifies two behavioral characteristics of abnormal users: device aggregation and activity aggregation. Based on these two characteristics, the invention provides a method for detecting abnormal users in an e-commerce network comprising three main stages: step S10 preprocesses the e-commerce network data; steps S20 to S40 construct and optimize the e-commerce network abnormal user detection model; and step S50 outputs and handles the abnormal user detection results.
A method for detecting abnormal users in an e-commerce network comprises the following specific steps:
S10, preprocessing the collected e-commerce network data to reduce the influence of noisy data on the detection result;
S20, extracting the spatial structure information of the e-commerce network data preprocessed in step S10, constructing a heterogeneous information network, and converting it into a user-device bipartite graph;
S30, constructing an e-commerce network abnormal user detection model from an autoencoder and support vector data description (SVDD), based on the user-device bipartite graph obtained in step S20;
S40, training the e-commerce network abnormal user detection model constructed in step S30 iteratively and determining the optimal parameters of the model;
S50, outputting the abnormal user detection results using the e-commerce network abnormal user detection model constructed in step S30 and the optimal model parameters determined in step S40, and handling the abnormal users.
Further, preprocessing the collected e-commerce network data in step S10 specifically comprises the following steps:
S11, cleaning samples with missing values from the collected e-commerce network data, and supplementing the samples by randomly sampling the original data set;
S12, correcting the samples to reduce the influence that the randomness of sampling may have on the detection result.
Further, step S20 specifically comprises the following steps:
S21, abstracting the e-commerce network data preprocessed in step S10 into a heterogeneous information network and converting it into a user-device bipartite graph $G(X, Y, E)$, where $X = \{x_1, x_2, \dots, x_M\}$ denotes the set of M users, $x_m$ being the m-th user, $m \in [1, M]$; $Y = \{y_1, y_2, \dots, y_N\}$ denotes the set of N devices, $y_n$ being the n-th device, $n \in [1, N]$; and $E = \{e_{mn}\}_{m=1,\dots,M;\, n=1,\dots,N}$ denotes the set of users' login behaviors on the different devices, where $e_{mn}$ indicates whether user $x_m$ has logged in on device $y_n$: $e_{mn} = 1$ if so, and $e_{mn} = 0$ otherwise;
S22, constructing the user-device bipartite graph structure, expressed as $S = [s_1, s_2, \dots, s_M]^T$, where $s_m = [e_{m1}, e_{m2}, \dots, e_{mN}]$, $m \in [1, M]$.
Further, the e-commerce network abnormal user detection model constructed in step S30 comprises three parts: an encoder, a decoder, and a detector.
Step S30 specifically comprises the following steps:
S31, the encoder encodes the user-device bipartite graph structure S into a set Z of user low-dimensional representations in the hypersphere latent space; the encoding process is formalized as equation (1):
$Z = \mathrm{ReLU}(WS + b)$ (1)
where $Z = [z_1, z_2, \dots, z_m, \dots, z_M]^T$ is the set of user low-dimensional representations of the bipartite graph structure S in the hypersphere latent space, $z_m$ is the low-dimensional representation corresponding to $s_m$ in the hypersphere space, and W and b are the encoding weight and bias, respectively; the encoder uses the ReLU activation function, defined by equation (2):
$\mathrm{ReLU}(x) = \max(0, x)$ (2)
S32, the decoder reconstructs the user low-dimensional representation set Z into a bipartite graph structure $\hat{S}$; the decoding process is formalized as equation (3):
where $\hat{S}$ is the reconstructed bipartite graph structure, and W and b are the decoding weight and bias, respectively, which are the same as the encoding weight and bias; the decoder also uses the ReLU activation function;
S33, the detector applies support vector data description to the user low-dimensional representation set Z in the hypersphere latent space, and the center c of the hypersphere latent space is calculated by equation (4):
the Euclidean distance between each user low-dimensional representation and the center c is calculated by equation (5):
$d_m = \lVert z_m - c \rVert_2$ (5)
where $d_m$ is the Euclidean distance between the user low-dimensional representation $z_m$ and the center c, and the set of distances between all user low-dimensional representations and the center is defined as $D = \{d_1, d_2, \dots, d_M\}$;
S34, the distribution of the set D is examined using the 3σ criterion in order to find a suitable hypersphere radius r: if $x \sim N(\mu, \sigma^2)$, then
$P\{|x - \mu| < \sigma\} = 0.6826$ (6)
$P\{|x - \mu| < 2\sigma\} = 0.9545$ (7)
$P\{|x - \mu| < 3\sigma\} = 0.9973$ (8)
where x is a normally distributed variable, σ is the standard deviation, and μ is the mean; as equation (8) shows, the probability that x falls outside the interval $(\mu - 3\sigma, \mu + 3\sigma)$ is less than 0.003, which is generally regarded as very low;
S35, following the 3σ criterion, the detector computes σ and μ of the set D, removes every $d_m$ lying outside the interval $(\mu - 3\sigma, \mu + 3\sigma)$, and takes the maximum of the remaining values as the radius r, which ensures that most users are represented inside the hypersphere latent space; finally, the Euclidean distance between each user low-dimensional representation and the center is compared with r: if the distance for a user is greater than r, that user is an abnormal user; otherwise, the user is a normal user.
Further, step S40 specifically comprises the following steps:
S41, the behavior similarity between users in the original space is computed from the two behavioral characteristics of abnormal users: device aggregation and activity aggregation.
Under device aggregation, abnormal users share devices to a large extent; in the bipartite graph, multiple abnormal users are connected to the same devices, so their device similarity is high, whereas normal users behave independently and their similarity is low overall. The device similarity between users is calculated using equation (9):
where $i, j \in [1, M]$, $sim\_d_{ij}$ is the device similarity between users $x_i$ and $x_j$, $N_i$ is the set of devices logged in by user $x_i$, and $N_j$ is the set of devices logged in by user $x_j$;
under activity aggregation, groups of abnormal users show bursts of concentrated activity in certain periods of the day. The method therefore divides a day equally into 24 time periods, counts the number of times $T_p$, $p \in [0, 23]$, that each user logs in to devices in each period, and describes the login behavior of each user as $t_i = [T_0, T_1, \dots, T_{23}]$; the activity similarity between users is calculated by equation (10):
where $sim\_t_{ij}$ is the activity similarity between users $x_i$ and $x_j$, $t_i$ is the login behavior of user $x_i$, and $t_j$ is the login behavior of user $x_j$;
based on equations (9) and (10), the behavior similarity between users in the original space is calculated by equation (11):
$sim_{ij} = sim\_d_{ij} \times sim\_t_{ij}$ (11)
where $sim_{ij}$ is the behavior similarity between users $x_i$ and $x_j$;
S42, the behavior difference between user low-dimensional representations is calculated by equation (12):
$dis_{ij} = \lVert z_i - z_j \rVert_2$ (12)
where $dis_{ij}$ is the Euclidean distance between the user low-dimensional representations $z_i$ and $z_j$.
Further, the behavior similarity between user low-dimensional representations is calculated by equation (13):
where the quantity defined by equation (13) is the behavior similarity between the user low-dimensional representations $z_i$ and $z_j$;
S43, establishing the joint objective function shown in equation (14) for the e-commerce network abnormal user detection model constructed in step S30:
$L = L_{rec} + \alpha (L_{sim} + L_{svdd})$ (14)
where α is a hyper-parameter with value range (0, 1); $L_{rec}$ is the reconstruction error, which measures the difference between the original input S and the reconstructed output $\hat{S}$ and is calculated by equation (15):
$L_{sim}$ is the behavior similarity difference, which measures the change in behavior similarity between pairs of users before and after encoding and is calculated by equation (16):
$L_{svdd}$ is the hypersphere constraint, which serves as the classification boundary between normal and abnormal users and is calculated by equation (17):
S44, initializing the e-commerce network abnormal user detection model of step S30: initializing the autoencoder parameters W and b, and setting the hypersphere space dimension dim, the number of iterations epoch, the batch size, and the learning rate;
steps S45 to S49 are executed iteratively until the set number of iterations is reached, which completes the training of the e-commerce network abnormal user detection model and yields the optimal model parameters:
S45, taking the user-device bipartite graph structure S obtained in step S22 as input, and encoding it with the encoder according to equation (1) to obtain the user low-dimensional representation set Z;
S46, decoding the user low-dimensional representation set Z with the decoder according to equation (3) to obtain $\hat{S}$, which completes forward propagation;
S47, calculating the behavior similarity between users according to equation (11), and the behavior similarity between user low-dimensional representations according to equation (13);
S48, completing back propagation by optimizing the joint objective function L of equation (14) with stochastic gradient descent, thereby updating the weight W and bias b of the autoencoder;
S49, the detector performs anomaly detection on the user low-dimensional representation set Z according to steps S33 to S35.
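The iteration of steps S44 to S49 can be sketched in NumPy with hand-derived gradients. This is an illustration under several assumptions, none of which comes from the text: the encoder is applied row-wise with W of shape (dim, N); the decoder reuses W with a separate bias `b_dec`; the center c is fixed from the initial forward pass (a Deep SVDD convention); the latent similarity is $\exp(-dis_{ij})$, with a small constant `EPS` added under the square root to keep it differentiable; and equations (15) to (17) take the squared-error forms noted in the comments. The names `forward`, `loss_and_grads`, and `train` are hypothetical, and `sim` is the M × M matrix of original-space similarities from equation (11).

```python
import numpy as np

EPS = 1e-8  # keeps dis_ij differentiable when z_i == z_j (assumption)


def forward(params, S):
    """Encoder (1) row-wise and tied-weight decoder (assumed form of (3))."""
    H = S @ params["W"].T + params["b"]
    Z = np.maximum(H, 0.0)
    G = Z @ params["W"] + params["b_dec"]
    return H, Z, G, np.maximum(G, 0.0)


def loss_and_grads(params, S, sim, c, alpha):
    """Joint loss (14) on a batch and its gradients w.r.t. W, b and b_dec."""
    W = params["W"]
    M = S.shape[0]
    H, Z, G, S_hat = forward(params, S)
    mask = 1.0 - np.eye(M)                      # skip i == j pairs
    n_pairs = max(mask.sum(), 1.0)              # avoids 0/0 for a batch of one user
    diff = Z[:, None, :] - Z[None, :, :]
    dis = np.sqrt((diff ** 2).sum(axis=-1) + EPS)           # (12), smoothed
    sim_hat = np.exp(-dis)                                  # assumed (13)
    L_rec = np.sum((S_hat - S) ** 2) / M                    # assumed (15)
    L_sim = np.sum(mask * (sim_hat - sim) ** 2) / n_pairs   # assumed (16)
    L_svdd = np.sum((Z - c) ** 2) / M                       # assumed (17)
    loss = L_rec + alpha * (L_sim + L_svdd)                 # (14)

    # Backward pass: decoder first, then the latent-space terms, then the encoder
    dG = 2.0 * (S_hat - S) / M * (G > 0)
    dW = Z.T @ dG                               # W as used by the decoder
    db_dec = dG.sum(axis=0)
    dZ = dG @ W.T
    K = alpha * 2.0 * mask * (sim_hat - sim) * (-sim_hat) / n_pairs  # dL/d dis_ij
    A = (K + K.T) / dis                         # d dis_ij / d z_i = (z_i - z_j) / dis_ij
    dZ += A.sum(axis=1)[:, None] * Z - A @ Z
    dZ += alpha * 2.0 * (Z - c) / M
    dH = dZ * (H > 0)
    dW += dH.T @ S                              # W as used by the encoder
    return loss, {"W": dW, "b": dH.sum(axis=0), "b_dec": db_dec}


def train(S, sim, dim=4, epochs=100, batch_size=4, lr=0.05, alpha=0.5, seed=0):
    """S44-S49: mini-batch stochastic gradient descent on the joint loss."""
    rng = np.random.default_rng(seed)
    M, N = S.shape
    params = {"W": rng.normal(scale=0.1, size=(dim, N)),
              "b": np.zeros(dim), "b_dec": np.zeros(N)}
    c = forward(params, S)[1].mean(axis=0)      # fixed center (assumption)
    history = []                                # full-data loss at the start of each epoch
    for _ in range(epochs):
        history.append(loss_and_grads(params, S, sim, c, alpha)[0])
        order = rng.permutation(M)
        for start in range(0, M, batch_size):
            idx = order[start:start + batch_size]
            _, grads = loss_and_grads(params, S[idx], sim[np.ix_(idx, idx)], c, alpha)
            for key in params:
                params[key] -= lr * grads[key]
    return params, c, history
```

After training, the detector of steps S33 to S35 would be applied to the final representations `forward(params, S)[1]`. Hand-written gradients keep the sketch dependency-free; an automatic-differentiation framework would be the more practical choice for real data.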
Further, step S50 specifically comprises the following steps:
S51, obtaining the optimal parameters of the e-commerce network abnormal user detection model by iteratively executing the training process of steps S45 to S49, and taking the anomaly detection result obtained with the optimal parameters as the final detection result;
S52, outputting the abnormal user detection result to the staff responsible for user security management on the e-commerce platform, so as to improve the efficiency and reliability of abnormal user detection and to enable targeted handling of abnormal users according to the degree of harm and the risk they pose.
The invention also provides a system for detecting abnormal users in an e-commerce network, comprising a computer processor, a memory, an e-commerce network data preprocessing unit, an e-commerce network abnormal user detection model training unit, and an e-commerce network abnormal user detection result output unit. The data preprocessing unit executes step S10, preprocessing the acquired e-commerce network data and loading it into the computer memory; the model training unit executes steps S20 to S40 on the data produced by the data preprocessing unit, constructing the e-commerce network abnormal user detection model and determining the optimal values of its parameters through iterative calculation; and the result output unit executes step S50, outputting the e-commerce network abnormal user detection results to relevant staff or researchers for related tasks such as abnormal user detection and network security monitoring on each e-commerce platform.
Compared with the prior art, the invention has the following advantages:
1. By constructing a heterogeneous information network and converting it into a user-device bipartite graph, the detection method of the invention preserves users' behavioral characteristics and also effectively represents the spatial structural relationship between the two entity types, users and devices, which helps produce abnormal user detection results that are more robust and interpretable.
2. The detection method of the invention builds the e-commerce network abnormal user detection model from an autoencoder and support vector data description, which gives the model a degree of self-supervised learning ability: it can automatically generate supervisory information for anomaly detection, which effectively improves the detection performance of the model.
Drawings
Fig. 1 is a structural diagram of the e-commerce network abnormal user detection model of step S30 according to the invention;
Fig. 2 is a structural diagram of the e-commerce network abnormal user detection system according to the invention;
Fig. 3 is a flowchart of the method for detecting abnormal users in an e-commerce network according to the invention.
Detailed Description
To further explain the technical solution of the invention, the invention is described below with reference to the drawings and an embodiment.
The method for detecting abnormal users in an e-commerce network is implemented as a computer program, and a specific embodiment of the technical solution is described in detail following the flow shown in Fig. 3. In this embodiment, abnormal user detection is performed on a random sample drawn from a daily execution log of the Amazon e-commerce platform. The log records the user ID, device ID, login time, and other fields; the sample contains M = 236 users, N = 275 devices, and 5000 records.
The embodiment mainly comprises the following key steps:
S10, preprocessing the collected e-commerce network data to reduce the influence of noisy data on the detection result, which specifically comprises the following steps:
S11, cleaning samples with missing values from the collected e-commerce network data, and supplementing the samples by randomly sampling the original data set;
S12, correcting the samples to reduce the influence that the randomness of sampling may have on the detection result.
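Step S11 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the record schema (`user_id`, `device_id`, `login_time`), the helper names, and the reading of "supplementing by random sampling" as drawing complete records, with replacement, from the original data set until the sample size is restored are all assumptions. The correction of step S12 is not specified in the text and is not modeled.

```python
import random

# Hypothetical schema of one login-log record
FIELDS = ("user_id", "device_id", "login_time")


def is_complete(record, fields=FIELDS):
    """True if none of the required fields is missing (None or absent)."""
    return all(record.get(f) is not None for f in fields)


def clean_and_supplement(sample, population, fields=FIELDS, seed=0):
    """Drop incomplete records, then refill the sample from the original data set.

    Assumption: replacements are complete records drawn uniformly at random,
    with replacement, from `population` (the original data set).
    """
    rng = random.Random(seed)
    kept = [r for r in sample if is_complete(r, fields)]
    n_dropped = len(sample) - len(kept)
    pool = [r for r in population if is_complete(r, fields)]
    if n_dropped and not pool:
        raise ValueError("original data set contains no complete records")
    kept.extend(rng.choices(pool, k=n_dropped))
    return kept
```

The sample keeps its original size, so later steps see the same number of records as were drawn.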
S20, extracting the spatial structure information of the e-commerce network data obtained in step S10, constructing a heterogeneous information network, and converting it into a user-device bipartite graph, which specifically comprises the following steps:
S21, abstracting the e-commerce network data preprocessed in step S10 into a heterogeneous information network and converting it into a user-device bipartite graph $G(X, Y, E)$, where $X = \{x_1, x_2, \dots, x_M\}$ denotes the set of M users, $x_m$ being the m-th user, $m \in [1, M]$; $Y = \{y_1, y_2, \dots, y_N\}$ denotes the set of N devices, $y_n$ being the n-th device, $n \in [1, N]$; and $E = \{e_{mn}\}_{m=1,\dots,M;\, n=1,\dots,N}$ denotes the set of users' login behaviors on the different devices, where $e_{mn}$ indicates whether user $x_m$ has logged in on device $y_n$: $e_{mn} = 1$ if so, and $e_{mn} = 0$ otherwise;
S22, constructing the user-device bipartite graph structure, expressed as $S = [s_1, s_2, \dots, s_M]^T$, where $s_m = [e_{m1}, e_{m2}, \dots, e_{mN}]$, $m \in [1, M]$.
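The construction of $e_{mn}$ and S in steps S21 and S22 can be illustrated with a short sketch that turns (user, device) login pairs into the binary matrix S, one row $s_m$ per user and one column per device. The input format and the function name are hypothetical; repeated logins on the same device still give $e_{mn} = 1$, as the definition requires.

```python
import numpy as np


def build_bipartite_matrix(logins):
    """Build S = [s_1, ..., s_M]^T from (user_id, device_id) login pairs."""
    logins = list(logins)  # allow any iterable, including generators
    users = sorted({u for u, _ in logins})
    devices = sorted({d for _, d in logins})
    row = {u: m for m, u in enumerate(users)}
    col = {d: n for n, d in enumerate(devices)}
    S = np.zeros((len(users), len(devices)))
    for u, d in logins:
        S[row[u], col[d]] = 1.0  # e_mn = 1: user x_m logged in on device y_n
    return S, users, devices


S, users, devices = build_bipartite_matrix(
    [("u1", "d1"), ("u1", "d2"), ("u2", "d2"), ("u2", "d2")]
)
print(S)  # rows: u1, u2; columns: d1, d2
```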
S30, constructing an e-commerce network abnormal user detection model from an autoencoder and support vector data description (SVDD), based on the user-device bipartite graph obtained in step S20. The model comprises an encoder, a decoder, and a detector; its overall structure is shown in Fig. 1. This step specifically comprises the following steps:
S31, the encoder encodes the user-device bipartite graph structure S into a set Z of user low-dimensional representations in the hypersphere latent space; the encoding process is formalized as equation (1):
$Z = \mathrm{ReLU}(WS + b)$ (1)
where $Z = [z_1, z_2, \dots, z_M]^T$ is the set of user low-dimensional representations of the bipartite graph structure S in the hypersphere latent space, $z_m$ is the low-dimensional representation corresponding to $s_m$ in the hypersphere space, and W and b are the encoding weight and bias, respectively; the encoder uses the ReLU activation function, defined by equation (2):
$\mathrm{ReLU}(x) = \max(0, x)$ (2)
S32, the decoder reconstructs the user low-dimensional representation set Z into a bipartite graph structure $\hat{S}$; the decoding process is formalized as equation (3):
where $\hat{S}$ is the reconstructed bipartite graph structure, and W and b are the decoding weight and bias, respectively, which are the same as the encoding weight and bias; the decoder also uses the ReLU activation function;
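A minimal NumPy sketch of the encoder and decoder follows. Two points are assumptions rather than statements from the text: equation (1) is applied row-wise, so each $z_m = \mathrm{ReLU}(W s_m + b)$ with W of shape (dim, N); and, because the decoder shares the encoding weight, it reuses W in transposed orientation with its own bias `b_dec` of length N, since a bias of length dim cannot be added to an N-dimensional output. Equation (3) is not reproduced in the text, so the decoder form here is illustrative only.

```python
import numpy as np


def relu(x):
    """Equation (2): ReLU(x) = max(0, x), element-wise."""
    return np.maximum(0.0, x)


def encode(S, W, b):
    """Equation (1) applied to every row s_m of S; returns Z with shape (M, dim)."""
    return relu(S @ W.T + b)


def decode(Z, W, b_dec):
    """Tied-weight decoder (assumed form of equation (3)); returns S_hat, shape (M, N)."""
    return relu(Z @ W + b_dec)


rng = np.random.default_rng(0)
S = (rng.random((4, 6)) > 0.5).astype(float)  # toy bipartite matrix: M=4 users, N=6 devices
W = rng.normal(scale=0.1, size=(3, 6))        # dim = 3
Z = encode(S, W, np.zeros(3))
S_hat = decode(Z, W, np.zeros(6))
```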
S33, the detector applies support vector data description to the user low-dimensional representation set Z in the hypersphere latent space, and the center c of the hypersphere latent space is calculated by equation (4):
the Euclidean distance between each user low-dimensional representation and the center c is calculated by equation (5):
$d_m = \lVert z_m - c \rVert_2$ (5)
where $d_m$ is the Euclidean distance between the user low-dimensional representation $z_m$ and the center c, and the set of distances between all user low-dimensional representations and the center is defined as $D = \{d_1, d_2, \dots, d_M\}$;
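Equation (4) for the center c is not reproduced in the text. The sketch below assumes, as is common in Deep SVDD-style methods, that c is the mean of the user low-dimensional representations; the distances follow the Euclidean definition of equation (5). The function names are hypothetical.

```python
import numpy as np


def hypersphere_center(Z):
    """Assumed form of equation (4): the mean of all user representations z_m."""
    return Z.mean(axis=0)


def distances_to_center(Z, c):
    """Equation (5): d_m = ||z_m - c||_2 for every user; returns D as an array."""
    return np.linalg.norm(Z - c, axis=1)


Z = np.array([[0.0, 0.0], [2.0, 0.0]])
c = hypersphere_center(Z)       # [1.0, 0.0]
D = distances_to_center(Z, c)   # [1.0, 1.0]
```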
S34, the distribution of the set D is examined using the 3σ criterion in order to find a suitable hypersphere radius r: if $x \sim N(\mu, \sigma^2)$, then
$P\{|x - \mu| < \sigma\} = 0.6826$ (6)
$P\{|x - \mu| < 2\sigma\} = 0.9545$ (7)
$P\{|x - \mu| < 3\sigma\} = 0.9973$ (8)
where x is a normally distributed variable, σ is the standard deviation, and μ is the mean; as equation (8) shows, the probability that x falls outside the interval $(\mu - 3\sigma, \mu + 3\sigma)$ is less than 0.003, which is generally regarded as very low;
S35, following the 3σ criterion, the detector computes σ and μ of the set D, removes every $d_m$ lying outside the interval $(\mu - 3\sigma, \mu + 3\sigma)$, and takes the maximum of the remaining values as the radius r, which ensures that most users are represented inside the hypersphere latent space; finally, the Euclidean distance between each user low-dimensional representation and the center is compared with r: if the distance for a user is greater than r, that user is an abnormal user; otherwise, the user is a normal user.
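Steps S34 and S35 can be sketched as follows. The use of the population standard deviation (NumPy's default `ddof=0`) and the fallback when σ = 0 (all distances equal, so the open interval is empty) are assumptions not stated in the text; the function names are hypothetical.

```python
import numpy as np


def three_sigma_radius(D):
    """Step S35: drop d_m outside (mu - 3 sigma, mu + 3 sigma), return the max of the rest."""
    D = np.asarray(D, dtype=float)
    mu, sigma = D.mean(), D.std()  # population std (ddof=0), an assumption
    kept = D[(D > mu - 3 * sigma) & (D < mu + 3 * sigma)]
    if kept.size == 0:  # sigma == 0: every distance equals mu
        kept = D
    return kept.max()


def detect(D):
    """Label a user abnormal when its distance to the center exceeds r."""
    r = three_sigma_radius(D)
    return np.asarray(D) > r, r


D = np.array([1.0] * 20 + [10.0])
labels, r = detect(D)   # r == 1.0; only the last user is flagged
```

In this toy case μ ≈ 1.43 and σ ≈ 1.92, so the upper bound μ + 3σ ≈ 7.18 excludes the distance 10.0, and the radius falls back to the largest remaining distance.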
S40, training the e-commerce network abnormal user detection model constructed in step S30 iteratively and determining the optimal parameters of the model, which specifically comprises the following steps:
S41, the behavior similarity between users in the original space is computed from the two behavioral characteristics of abnormal users: device aggregation and activity aggregation.
Under device aggregation, abnormal users share devices to a large extent; in the bipartite graph, multiple abnormal users are connected to the same devices, so their device similarity is high, whereas normal users behave independently and their similarity is low overall. The device similarity between users is calculated using equation (9):
where $i, j \in [1, M]$, $sim\_d_{ij}$ is the device similarity between users $x_i$ and $x_j$, $N_i$ is the set of devices logged in by user $x_i$, and $N_j$ is the set of devices logged in by user $x_j$;
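Equation (9) is not reproduced in the text. Because it is defined over the device sets $N_i$ and $N_j$, a natural candidate is the Jaccard coefficient; the sketch below uses that form purely as an assumption, with a hypothetical function name.

```python
def device_similarity(N_i, N_j):
    """Assumed form of equation (9): Jaccard |N_i ∩ N_j| / |N_i ∪ N_j|."""
    N_i, N_j = set(N_i), set(N_j)
    union = N_i | N_j
    if not union:
        return 0.0  # neither user has logged in on any device
    return len(N_i & N_j) / len(union)


print(device_similarity({"d1", "d2"}, {"d2", "d3"}))  # 1/3: one shared device out of three
```

Users who log in on exactly the same devices score 1, and users with disjoint device sets score 0, matching the intuition that shared devices signal device aggregation.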
According to activity aggregation, groups of abnormal users tend to burst into concentrated activity within a particular period of the day. The method divides a day into 24 equal periods, counts the number of times $T_p$, $p \in [0, 23]$, that each user logs into devices in each period, and describes the login behavior of each user as $t_i = [T_0, T_1, \ldots, T_{23}]$. The activity similarity between users is calculated by equation (10):
wherein $\mathrm{sim\_t}_{ij}$ is the activity similarity between users $x_i$ and $x_j$, $t_i$ represents the login behavior of user $x_i$, and $t_j$ represents the login behavior of user $x_j$;
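Equation (10) is likewise not reproduced here. The sketch below builds the 24-slot login vectors $t_i$ from (user index, hour) login events and then assumes cosine similarity between them; both the event format and the cosine choice are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np


def login_profiles(events, num_users):
    """Build t_i = [T_0, ..., T_23] from (user_index, hour) login events."""
    t = np.zeros((num_users, 24))
    for m, hour in events:
        t[m, hour] += 1  # T_p: number of logins of user m in hourly slot p
    return t


def activity_similarity(t):
    """Pairwise activity similarity, ASSUMING cosine similarity for equation (10)."""
    t = np.asarray(t, dtype=float)
    norms = np.linalg.norm(t, axis=1)
    dots = t @ t.T
    denom = np.outer(norms, norms)
    # Users with no logins at all get similarity 0 instead of 0/0.
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)
```

Users who are active in the same hours score close to 1 regardless of how many logins they make, while users active in disjoint hours score 0.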
According to equations (9) and (10), the behavioral similarity between users in the original space is calculated by equation (11):
$$\mathrm{sim}_{ij} = \mathrm{sim\_d}_{ij} \times \mathrm{sim\_t}_{ij} \qquad (11)$$
wherein $\mathrm{sim}_{ij}$ is the behavioral similarity between users $x_i$ and $x_j$;
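Equation (11) is an element-wise product, so over all user pairs it can be sketched as the Hadamard product of the two similarity matrices (the function name is hypothetical):

```python
import numpy as np


def behavior_similarity(sim_d, sim_t):
    """Equation (11) for all user pairs: element-wise product of the matrices."""
    sim_d = np.asarray(sim_d, dtype=float)
    sim_t = np.asarray(sim_t, dtype=float)
    if sim_d.shape != sim_t.shape:
        raise ValueError("similarity matrices must have the same shape")
    return sim_d * sim_t
```

If both similarities lie in [0, 1], the product is large only when two users both share devices and are active in the same hours, so either characteristic alone is not enough to make a pair look similar.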
S42, the behavioral difference between the low-dimensional user representations is calculated by equation (12):
$$dis_{ij} = \lVert z_i - z_j \rVert_2 \qquad (12)$$
wherein $dis_{ij}$ is the Euclidean distance between the low-dimensional user representations $z_i$ and $z_j$.
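The Euclidean distance $dis_{ij}$ for all pairs of users can be sketched with NumPy broadcasting, assuming the low-dimensional representations are stacked as the rows of a matrix `Z` (the function name is hypothetical):

```python
import numpy as np


def pairwise_distances(Z):
    """dis_ij = ||z_i - z_j||_2 for every pair of rows of Z."""
    Z = np.asarray(Z, dtype=float)
    diff = Z[:, None, :] - Z[None, :, :]  # shape (M, M, dim)
    return np.sqrt((diff ** 2).sum(axis=-1))
```

The result is a symmetric M × M matrix with zeros on the diagonal; for large M, the (M, M, dim) intermediate array is the memory bottleneck.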
Further, the behavioral similarity between the low-dimensional user representations is calculated by equation (13):
wherein the quantity defined by equation (13) is the behavioral similarity between the low-dimensional user representations $z_i$ and $z_j$;
S43, establishing the joint objective function shown in equation (14) for the e-commerce network abnormal user detection model constructed in step S30:
$$L = L_{rec} + \alpha (L_{sim} + L_{svdd}) \qquad (14)$$
wherein α is a hyper-parameter with value range (0, 1), and $L_{rec}$ is the reconstruction error, which measures the difference between the original input S and the reconstructed output and is calculated by equation (15):
$L_{sim}$ is the behavioral similarity difference, which measures how much the behavioral similarity between two users changes before and after encoding, and is calculated by equation (16):
$L_{svdd}$ is the hypersphere constraint, which serves as the classification boundary between normal users and abnormal users, and is calculated by equation (17):
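The joint objective of equation (14) can be sketched as below. Since equations (15) to (17) are not reproduced here, the three terms use assumed forms: a per-user squared reconstruction error, the mean squared difference between the similarity matrices before and after encoding, and the mean squared distance of the low-dimensional representations to a hypersphere center `center`. These are illustrative stand-ins, not the exact formulas of the method.

```python
import numpy as np


def joint_loss(S, S_rec, sim, sim_low, Z, center, alpha):
    """Equation (14): L = L_rec + alpha * (L_sim + L_svdd).

    The forms of L_rec, L_sim and L_svdd below are ASSUMED stand-ins for
    equations (15)-(17).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    S, S_rec = np.asarray(S, float), np.asarray(S_rec, float)
    sim, sim_low = np.asarray(sim, float), np.asarray(sim_low, float)
    Z, center = np.asarray(Z, float), np.asarray(center, float)
    l_rec = np.mean(np.sum((S - S_rec) ** 2, axis=1))    # assumed (15)
    l_sim = np.mean((sim - sim_low) ** 2)                 # assumed (16)
    l_svdd = np.mean(np.sum((Z - center) ** 2, axis=1))   # assumed (17)
    return float(l_rec + alpha * (l_sim + l_svdd))
```

Because α multiplies both $L_{sim}$ and $L_{svdd}$, a small α lets reconstruction dominate, while a larger α pulls the representations toward the center and toward the original similarity structure.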
S44, initializing the e-commerce network abnormal user detection model of step S30: initializing the autoencoder parameters W and b, and setting the dimension dim of the hypersphere latent space, the number of iterations epoch, the batch size and the learning rate;
Steps S45 to S49 are then executed iteratively until the set number of iterations is reached, which completes the training of the e-commerce network abnormal user detection model and yields its optimal parameters:
S45, taking the user-equipment bipartite graph structure S obtained in step S22 as input, and encoding it with the encoder of equation (1) to obtain the set Z of low-dimensional user representations;
S46, decoding the set Z with the decoder of equation (3) to obtain the reconstructed output, which completes forward propagation;
S47, calculating the behavioral similarity between users according to equation (11), and the behavioral similarity between the low-dimensional user representations according to equation (13);
S48, completing back propagation by optimizing the joint objective function L of equation (14) with stochastic gradient descent, thereby updating the weights W and biases b of the autoencoder;
S49, the detector performs anomaly detection on the set Z of low-dimensional user representations according to steps S33 to S35.
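The loop of S44 to S48 can be illustrated with a deliberately simplified sketch: a single linear encoder layer and a single linear decoder layer (the actual encoder and decoder of equations (1) and (3) are not reproduced here), trained by mini-batch stochastic gradient descent on $L_{rec} + \alpha L_{svdd}$ with hand-derived gradients. The $L_{sim}$ term is omitted to keep the gradients short, the hypersphere center is assumed to stay fixed at the mean of the initial encodings, and all names and default values are hypothetical.

```python
import numpy as np


def train(S, dim=2, epochs=200, batch_size=4, lr=0.01, alpha=0.5, seed=0):
    """Simplified S44-S48: linear autoencoder + SVDD term, mini-batch SGD."""
    rng = np.random.default_rng(seed)
    S = np.asarray(S, dtype=float)
    M, N = S.shape
    # S44: initialize weights W and biases b (small random init, an assumption).
    W1, b1 = rng.normal(0.0, 0.1, (N, dim)), np.zeros(dim)
    W2, b2 = rng.normal(0.0, 0.1, (dim, N)), np.zeros(N)
    # Hypersphere center, ASSUMED fixed at the mean of the initial encodings.
    c = (S @ W1 + b1).mean(axis=0)

    def loss(X):
        Z = X @ W1 + b1
        R = Z @ W2 + b2
        return (np.mean(np.sum((R - X) ** 2, axis=1))
                + alpha * np.mean(np.sum((Z - c) ** 2, axis=1)))

    history = [loss(S)]
    n_batches = max(1, M // batch_size)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(M), n_batches):
            X, B = S[idx], len(idx)
            Z = X @ W1 + b1                     # S45: encode
            R = Z @ W2 + b2                     # S46: decode (forward pass)
            gR = 2.0 * (R - X) / B              # dL_rec / dR
            gZ = gR @ W2.T + alpha * 2.0 * (Z - c) / B
            # S48: gradient step on weights W and biases b
            W2 -= lr * (Z.T @ gR)
            b2 -= lr * gR.sum(axis=0)
            W1 -= lr * (X.T @ gZ)
            b1 -= lr * gZ.sum(axis=0)
        history.append(loss(S))
    return S @ W1 + b1, c, history
```

After training, the returned `Z` and `c` are what the detector of S49 would consume: distances from each row of `Z` to `c`, followed by the 3σ radius selection of S35.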
S50, outputting the abnormal user detection result by using the e-commerce network abnormal user detection model constructed in step S30 with the optimal parameters determined in step S40, and processing the abnormal users, which specifically comprises the following steps:
S51, obtaining the optimal parameters of the e-commerce network abnormal user detection model by iteratively executing the training process of steps S45 to S49, and taking the anomaly detection result obtained with these optimal parameters as the final detection result;
S52, outputting the abnormal user detection result to the personnel responsible for user security management on the e-commerce platform, which improves the efficiency and reliability of abnormal user detection and allows abnormal users to be handled in a targeted manner according to their degree of harm and risk impact.
Evaluating the technical effect:
In order to verify the effectiveness and advancement of the technical scheme provided by the invention, the invention is compared with several classical anomaly detection methods: K-nearest neighbors (KNN), isolation forest (IF), one-class support vector machine (OCSVM), local outlier factor (LOF) and principal component analysis (PCA). The F1-measure and AUC averaged over 20 experiments are taken as the evaluation indexes, and the comparison results are shown in Table 1:
The results in Table 1 show that, compared with these classical anomaly detection methods, the technical scheme of the invention achieves better results in detecting abnormal users of the e-commerce network.
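The two evaluation indexes can be computed as sketched below; this is a generic illustration, not the evaluation code behind Table 1. AUC is computed in its rank (Mann-Whitney) form, i.e. the probability that a randomly chosen abnormal user receives a higher anomaly score than a randomly chosen normal user, with ties counted as one half. Averaging over 20 experiments is then a plain mean of the per-run values. The function names are hypothetical.

```python
import numpy as np


def f1_measure(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), with abnormal users as the positive class."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return 0.0 if tp == 0 else float(2 * tp / (2 * tp + fp + fn))


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    y_true = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y_true], s[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need both abnormal and normal users")
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))
```

For the detector of S35, a natural anomaly score is the distance of each user's representation to the hypersphere center, and the binary prediction is whether that distance exceeds r.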
As shown in Fig. 2, a system for detecting abnormal users of an e-commerce network comprises a computer processor and a memory, an e-commerce network data preprocessing unit, an e-commerce network abnormal user detection model training unit, and an e-commerce network abnormal user detection result output unit. The e-commerce network data preprocessing unit executes step S10, preprocessing the collected e-commerce network data and loading the preprocessed data into the memory of the computer; the e-commerce network abnormal user detection model training unit executes steps S20 to S40 on the data produced by the preprocessing unit, constructs the e-commerce network abnormal user detection model, and determines the optimal values of the model parameters through iterative calculation. The e-commerce network abnormal user detection result output unit executes step S50 and outputs the detection result to the relevant staff or researchers for tasks such as abnormal user detection and network security monitoring on each e-commerce platform.
It should be noted that variations and modifications can be made by those skilled in the art without departing from the principle of the present invention, and these should also be construed as falling within the scope of the present invention.