Internal user behavior detection method based on BiGAN and OTSU

Document No.: 7427  Publication date: 2021-09-17

1. An internal user behavior detection method based on BiGAN and OTSU, characterized by comprising the following steps:

acquiring raw user behavior log data, extracting from the raw log data, separately for each user, behavior features over a specified unit of time, counting the frequency features of those behaviors, and performing data processing on the extracted features;

dividing the preprocessed data into training data and data to be detected, training a BiGAN network on the training data, and obtaining a normal behavior model from the training result;

and, based on the trained normal behavior model, calculating a reconstruction error and a discriminator error for the user's data to be detected to obtain an anomaly score, and then applying the OTSU algorithm to select a threshold automatically to obtain a detection result.

2. The internal user behavior detection method based on BiGAN and OTSU according to claim 1, wherein the behavior features extracted over the specified unit of time, whose frequencies are counted, specifically comprise:

the number of system logins per unit time, within or outside working hours, and the number of removable-device connections per unit time, within or outside working hours.

3. The internal user behavior detection method based on BiGAN and OTSU according to claim 1, wherein the step of performing data processing on the extracted features comprises:

normalizing the extracted features.

4. The internal user behavior detection method based on BiGAN and OTSU according to claim 1, wherein

the BiGAN network consists of an encoder, a generator and a discriminator.

5. The internal user behavior detection method based on BiGAN and OTSU according to claim 1, wherein the criteria by which the OTSU algorithm automatically selects the threshold are:

the image is divided into a target and a background according to its gray-level properties;

the larger the difference between the background and the target, the larger the inter-class variance between the two parts;

when part of the background or the target is misclassified, the difference between the two parts becomes smaller.

6. The internal user behavior detection method based on BiGAN and OTSU according to claim 4, wherein, in the step of dividing the preprocessed data into training data and data to be detected, training a BiGAN network on the training data, and obtaining a normal behavior model from the training result:

only the user's normal behavior data is selected as the training input of the BiGAN network, so that after training the generator of the BiGAN network can only generate data similar to normal behavior data, and the discriminator discriminates whether input data is normal behavior data.

Background

At present, security incidents caused by malicious operations of internal users occur frequently. Most internal users have access rights to the system, know the vulnerabilities of the internal network and hold core data, so insider attacks often cause more serious losses than external attacks. Internal user behavior detection has therefore attracted growing attention from researchers at home and abroad. Some research results have been obtained in this field, but shortcomings and areas for improvement remain.

In internal user behavior detection, user behavior data is small in volume, unbalanced between positive and negative samples, and time-dependent. These characteristics lower the accuracy and raise the false alarm rate of traditional anomaly detection methods.

Disclosure of Invention

The invention aims to provide an internal user behavior detection method based on BiGAN and OTSU, in order to solve the technical problems that traditional anomaly detection methods in the prior art have low accuracy and a high false alarm rate in internal user behavior detection, and that selection of the anomaly threshold depends heavily on prior knowledge.

In order to achieve the above object, the method for detecting internal user behavior based on BiGAN and OTSU of the present invention comprises the following steps:

acquiring raw user behavior log data, extracting from the raw log data, separately for each user, behavior features over a specified unit of time, counting the operation frequency of those behaviors, and performing data processing on the extracted features;

dividing the preprocessed data into training data and data to be detected, training a BiGAN network on the training data, and obtaining a normal behavior model from the training result;

and, based on the trained normal behavior model, calculating a reconstruction error and a discriminator error for the user's data to be detected to obtain an anomaly score, and then applying the OTSU algorithm to select a threshold automatically to obtain a detection result.

The behavior features extracted over the specified unit of time, whose frequencies are counted, specifically comprise:

the number of system logins per unit time, within or outside working hours, the number of removable-device connections per unit time, within or outside working hours, and the like.

The step of performing data processing on the extracted features comprises:

normalizing the extracted features.

The BiGAN network consists of an encoder, a generator and a discriminator.

The criteria by which the OTSU algorithm automatically selects the threshold are as follows:

the image is divided into a target and a background according to its gray-level properties;

the larger the difference between the background and the target, the larger the inter-class variance between the two parts;

when part of the background or the target is misclassified, the difference between the two parts becomes smaller.

In the step of dividing the preprocessed data into training data and data to be detected, training a BiGAN network on the training data, and obtaining a normal behavior model from the training result:

only the user's normal behavior data is selected as the training input of the BiGAN network, so that after training the generator of the BiGAN network can only generate data similar to normal behavior data, and the discriminator discriminates whether input data is normal behavior data.

The internal user behavior detection method based on BiGAN and OTSU provided by the invention addresses the small volume of user behavior data and the unbalanced distribution of positive and negative samples. Drawing on the strong performance of the BiGAN network in small-sample data generation, it provides a BiGAN-based user behavior detection model and, to account for the time-series nature of user behavior data, uses a GRU as the generator of the GAN network. To address the difficulty of selecting an anomaly threshold and its dependence on prior knowledge in traditional anomaly detection methods, the method draws on the efficiency and accuracy of the OTSU algorithm in image segmentation and uses OTSU to select the threshold automatically in user behavior anomaly detection. The accuracy of anomaly detection in internal user behavior detection is thereby improved and the false alarm rate reduced.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of steps of the internal user behavior detection method based on BiGAN and OTSU of the present invention.

Fig. 2 is a module framework diagram of the internal user behavior detection method based on BiGAN and OTSU of the present invention.

Fig. 3 is a schematic diagram of the BiGAN of the present invention.

Fig. 4 is an analysis diagram of the experimental results of the present invention.

Fig. 5 is a flow chart of building the user normal behavior model of the present invention.

Fig. 6 is a flow chart of the user behavior detection of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar function. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting it.

Referring to Fig. 1 to Fig. 6, the present invention provides an internal user behavior detection method based on BiGAN and OTSU, including the following steps:

S1: acquiring raw user behavior log data, extracting from the raw log data, separately for each user, behavior features over a specified unit of time, counting the frequency features of those behaviors, and performing data processing on the extracted features;

S2: dividing the preprocessed data into training data and data to be detected, training a BiGAN network on the training data, and obtaining a normal behavior model from the training result;

S3: based on the trained normal behavior model, calculating a reconstruction error and a discriminator error for the user's data to be detected to obtain an anomaly score, and then applying the OTSU algorithm to select a threshold automatically to obtain a detection result.

In this embodiment, the behavior features extracted over the specified unit of time, whose frequencies are counted, specifically comprise:

the number of system logins per unit time, within or outside working hours, the number of removable-device connections per unit time, within or outside working hours, and the like.

The step of performing data processing on the extracted features comprises:

normalizing the extracted features.

The BiGAN network consists of an encoder E, a generator G and a discriminator D.

The criteria by which the OTSU algorithm automatically selects the threshold are as follows:

the image is divided into a target and a background according to its gray-level properties;

the larger the difference between the background and the target, the larger the inter-class variance between the two parts;

when part of the background or the target is misclassified, the difference between the two parts becomes smaller;

the segmentation between the target and the background is best when the inter-class variance is maximal, i.e. the threshold at that point is the optimal threshold.

In the step of dividing the preprocessed data into training data and data to be detected, training a BiGAN network on the training data, and obtaining a normal behavior model from the training result:

only the user's normal behavior data is selected as the training input of the BiGAN network, so that after training the generator G of the BiGAN network can only generate data similar to normal behavior data, and the discriminator D discriminates whether input data is normal behavior data.

The encoder E: extracts the latent variables of the real data x, giving E(x);

The generator G: generates data G(z) from random noise z;

The discriminator D: whereas the original GAN network discriminates real from fake only in the data space (x versus G(z)), BiGAN discriminates jointly over the data space and the latent space, that is, between the pairs (x, E(x)) and (G(z), z), where E(x) is the encoder output and G(z) is the generator output;

Objective optimization function of the BiGAN network:

$$\min_{G,E}\max_{D} V(D,G,E) = \mathbb{E}_{x\sim p_X}\big[\mathbb{E}_{z\sim p_E(\cdot|x)}[\log D(x,z)]\big] + \mathbb{E}_{z\sim p_Z}\big[\mathbb{E}_{x\sim p_G(\cdot|z)}[\log(1-D(x,z))]\big]$$

where x represents the original data, z represents the random latent variable, and $p_G(x|z)p_Z(z)$ and $p_E(z|x)p_X(x)$ are the joint probability distributions of the generator and the encoder, respectively.
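To make the two expectation terms concrete, the following is a minimal sketch that estimates V from samples. The discriminator outputs are hypothetical numbers chosen for illustration, not values from the invention, and the function name `bigan_value` is an assumption.

```python
import math

def bigan_value(d_encoder_pairs, d_generator_pairs):
    """Monte Carlo estimate of V(D, G, E).

    d_encoder_pairs:   discriminator outputs D(x, E(x)) on encoder pairs, each in (0, 1)
    d_generator_pairs: discriminator outputs D(G(z), z) on generator pairs, each in (0, 1)
    """
    # First term: average of log D(x, z) over pairs produced by the encoder.
    encoder_term = sum(math.log(d) for d in d_encoder_pairs) / len(d_encoder_pairs)
    # Second term: average of log(1 - D(x, z)) over pairs produced by the generator.
    generator_term = sum(math.log(1.0 - d) for d in d_generator_pairs) / len(d_generator_pairs)
    return encoder_term + generator_term

# Hypothetical outputs: a discriminator that separates the two kinds of pairs well...
v_confident = bigan_value([0.9, 0.8], [0.1, 0.2])
# ...and one that cannot tell them apart (outputs 0.5 everywhere).
v_fooled = bigan_value([0.5, 0.5], [0.5, 0.5])
```

The discriminator tries to increase V, while the generator and encoder try to drive it down toward the value reached when D outputs 0.5 for every pair.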

The principle of the OTSU algorithm:

Let:

$w_0$: the proportion of foreground points;

$w_1$: the proportion of background points, $w_1 = 1 - w_0$;

$u_0$: the mean gray level of the foreground;

$u_1$: the mean gray level of the background;

$u$: the global mean gray level, $u = w_0 u_0 + w_1 u_1$;

Then $g$, the inter-class variance, is $g = w_0 (u_0 - u)^2 + w_1 (u_1 - u)^2$.

The objective function g reflects the difference between the foreground and the background: the larger g is, the larger that difference. The threshold t at which g is maximal is the optimal threshold.
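A short derivation, which is not part of the original text, shows why g measures the separation of the two classes. Substituting the definition of u into g leaves only the class weights and the gap between the class means:

```latex
\begin{aligned}
u_0 - u &= u_0 - (w_0 u_0 + w_1 u_1) = w_1 (u_0 - u_1), \\
u_1 - u &= u_1 - (w_0 u_0 + w_1 u_1) = -w_0 (u_0 - u_1), \\
g &= w_0 w_1^2 (u_0 - u_1)^2 + w_1 w_0^2 (u_0 - u_1)^2
   = w_0 w_1 (u_0 - u_1)^2 \quad \text{since } w_0 + w_1 = 1 .
\end{aligned}
```

In this form g grows with the distance between the two class means and shrinks when one class becomes very small.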

The internal user behavior detection method based on BiGAN and OTSU mainly comprises three parts: a data processing module, a user normal behavior model building module and a user behavior detection module. Its framework is shown in Fig. 2.

Data processing module

Data processing mainly handles the raw logs. For each user, the invention computes frequency features of the data over a specified unit of time, such as the number of system logins per unit time within or outside working hours and the number of removable-device connections per unit time within or outside working hours, and uses them as features describing the user's behavior pattern. Finally, the extracted features are normalized.
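As a minimal sketch of the counting and normalization described above: it assumes a hypothetical event layout of (user, ISO timestamp, activity) tuples, the 08:00 to 18:00 working hours used in the embodiment, and min-max scaling. The text says only that the features are normalized, not which method is used, so the scaling choice is an assumption.

```python
from collections import Counter
from datetime import datetime

WORK_START, WORK_END = 8, 18  # working hours 08:00-18:00 (end hour treated as off-hours here)

def daily_counts(events):
    """Count events per (user, day, feature), split into working and off hours.

    events: iterable of (user, ISO timestamp string, activity) tuples (hypothetical layout)
    """
    counts = Counter()
    for user, stamp, activity in events:
        t = datetime.fromisoformat(stamp)
        period = "work" if WORK_START <= t.hour < WORK_END else "off"
        counts[(user, t.date().isoformat(), f"{activity}_{period}")] += 1
    return counts

def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative events for one user; the values are made up.
events = [
    ("u1", "2021-01-04T09:15:00", "logon"),
    ("u1", "2021-01-04T22:40:00", "logon"),
    ("u1", "2021-01-04T23:05:00", "logon"),
    ("u1", "2021-01-05T10:00:00", "device"),
]
counts = daily_counts(events)
scaled = min_max([1, 2, 4])
```

Each (user, day) then yields one feature vector of counts, and `min_max` would be applied per feature column across days.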

User normal behavior model building module

The module selects the BiGAN network, which performs well in small-sample data generation, as the model for constructing the user's normal behavior pattern. Compared with the ordinary GAN network, the BiGAN network adds an encoder: it retains the original GAN's ability to generate data from random noise, while the encoder effectively extracts features of the data, which improves the quality of the generated data.

User behavior detection module

Based on the trained normal behavior model, the module calculates the reconstruction error and the discriminator error of the data to be detected to obtain an anomaly score, and finally applies the OTSU algorithm to select a threshold automatically, yielding the detection result.

Compared with the ordinary GAN network, BiGAN adds an encoder: it retains the original GAN's ability to generate data from random noise, while the encoder effectively extracts features of the data, which improves the quality of the generated data. The structure of BiGAN is shown in Fig. 3:

the BiGAN network consists of an encoder E, a generator G and a discriminator D.

With the added encoder, BiGAN can learn deep representations of the data, which the original GAN cannot. In the original GAN, D receives an input sample, and its learned hidden-layer representation is taken as the feature representation for related tasks; but when G generates realistic data, D can only predict whether the generated data is real and cannot learn meaningful intermediate representations. Moreover, in the anomaly detection stage the basic GAN network must iteratively search for the optimal z that gives the lowest reconstruction error, so detection is inefficient, whereas BiGAN can use the trained encoder E directly in place of that search. The invention therefore uses BiGAN as the user behavior detection model and uses a GRU as the BiGAN generator, which addresses the difficulty conventional GAN networks have in processing time-series data.
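The efficiency argument can be illustrated with a toy example. The sketch below assumes a hypothetical scalar generator G(z) = 2z + 1 with its exact inverse as the encoder; neither comes from the invention, and a real GRU generator would not have a closed-form inverse. It contrasts the iterative latent search a plain GAN would need with a single encoder call.

```python
def generator(z):
    """Toy generator (assumption): G(z) = 2z + 1."""
    return 2.0 * z + 1.0

def encoder(x):
    """Toy encoder (assumption): the exact inverse of the toy generator."""
    return (x - 1.0) / 2.0

def search_latent(x, steps=50, lr=0.1):
    """Plain-GAN style inference: gradient descent on the squared error (x - G(z))^2."""
    z = 0.0
    for _ in range(steps):
        grad = -4.0 * (x - generator(z))  # derivative of (x - (2z + 1))^2 with respect to z
        z -= lr * grad
    return z

x = 5.0
z_iterative = search_latent(x)  # many generator evaluations per sample
z_direct = encoder(x)           # one encoder evaluation per sample
```

Both routes arrive at the same latent value here, but the encoder needs one forward pass where the search needs one generator evaluation per step for every sample to be scored.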

The OTSU algorithm: the maximum inter-class variance method (OTSU) is an adaptive method for determining a threshold. Its main idea is to divide an image into a target and a background according to the gray-level properties of the image. The larger the difference between the background and the target, the larger the inter-class variance between the two parts; when part of the background or the target is misclassified, the difference between the two parts becomes smaller. Therefore, the segmentation of target and background is best when the inter-class variance is maximal, and the threshold at that point is the optimal threshold.

The evaluation metrics are defined as follows:

True Positives, TP: predicted as positive samples, actually positive samples

False Positives, FP: predicted as positive samples, actually negative samples

True Negatives, TN: predicted as negative samples, actually negative samples

False Negatives, FN: predicted as negative samples, actually positive samples

Accuracy = (TP + TN)/(TP + TN + FP + FN)

Precision (P) = TP/(TP + FP)

False alarm rate (FPR) = FP/(FP + TN)

Recall (R) = TP/(TP + FN)

F1 (F-Measure, comprehensive evaluation index) = (2 × P × R)/(P + R)
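The metric definitions translate directly into code. The following sketch uses made-up confusion counts purely for illustration, not results from the experiment, and assumes every denominator is non-zero.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute the evaluation metrics defined above from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    false_alarm_rate = fp / (fp + tn)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "fpr": false_alarm_rate,
        "recall": recall,
        "f1": f1,
    }

# Illustrative counts only: 50 abnormal and 950 normal samples.
metrics = detection_metrics(tp=40, fp=10, tn=940, fn=10)
```

With heavily unbalanced data such as this, accuracy stays high even when some anomalies are missed, which is why recall and the false alarm rate are reported alongside it.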

The comparison focuses on accuracy, recall and false alarm rate. The experiment compares DBN+OCSVM, GRU+attention and BiGAN+OTSU, and the results are shown in Fig. 4. The comparison shows that the proposed BiGAN+OTSU is almost the same as GRU+attention in accuracy and is superior to both DBN+OCSVM and GRU+attention in recall and false alarm rate, thereby achieving a high detection rate for abnormal behaviors and a lower false alarm rate in user behavior anomaly detection.

The specific embodiment is as follows:

Data processing

The experiment uses version r5.2 of the CMU-CERT internal user behavior dataset, which is widely used in the field of insider threat detection, as the experimental data source. The CERT dataset is an insider threat test dataset released by the CERT division of Carnegie Mellon University. It simulates three main types of insider attack behavior (insider fraud, system sabotage and information theft) together with a large amount of normal behavior data. The dataset consists of a number of files that log employees' behavior in the organization. The files logon.csv, http.csv, email.csv, device.csv and psychometric.csv record logins and logouts, website visits, emails, file copies to removable disks, the times at which removable disks are connected and disconnected, and employees' scores on psychometric tests; an LDAP file records users' positions, departments, work periods and projects. The experiment uses the logon.csv, http.csv, email.csv, device.csv and file.csv files.

First, the user's working hours are set to 08:00 to 18:00. Behaviors are then counted with one day as the unit length, and the number of operations of each type of behavior per unit time is calculated, such as the number of system logins per unit time within or outside working hours, the number of removable-device connections per unit time within or outside working hours, and whether a dangerous website was visited, giving 16 features in total. In particular, when only a logon or only a logoff is recorded on a given day, that day's behavior is treated as continuous with the previous day's, and the behavior occurring before the logoff is merged with the previous day's behavior for calculation. The extracted behavior features are shown in the following table:

User behavior detection

Internal user abnormal behavior detection based on BiGAN and OTSU is divided into a normal behavior model construction stage and a detection stage.

Normal behavior model construction stage:

Because user behavior data is small in volume and unbalanced in sample distribution (abnormal data is scarce), and a traditional neural network usually needs a large amount of data for training, the BiGAN network, which performs well in small-sample data generation, is selected here as the model for constructing the user's normal behavior pattern. The flow of building the user normal behavior model is shown in Fig. 5.

Detection stage:

As shown in Fig. 6, the trained normal behavior model can only generate normal behavior data, so it cannot reconstruct abnormal data well. The anomaly score A is therefore used as the basis for judging anomalies, and finally the OTSU algorithm is applied to select the anomaly threshold automatically, yielding the detection result.

Let the data to be detected be x and its anomaly score be A(x), where A(x) is calculated as follows:

$$A(x) = \alpha L(D) + (1 - \alpha) L(G)$$

1) L(G): the reconstruction error, which uses the generative model to measure the difference between x and real data, i.e. the error at the data-space level. It is calculated as:

$$L(G) = \lVert x - G(E(x)) \rVert_1$$

E(x) denotes the output of the encoder E for the data x, G(E(x)) denotes the data generated by the generator from E(x), and L(G) denotes the error between x and G(E(x)). Because the difference between a user's normal and abnormal behavior data is small, the error is calculated using cosine similarity, which is sensitive to changes in the data.

2) L(D): the discriminator error, which measures the difference between x and real samples from the discriminator's point of view. The error is computed on the intermediate-layer representation of the discriminator, i.e. at the hidden-space level. It is calculated as:

$$L(D) = \lVert f_D(x, E(x)) - f_D(G(E(x)), E(x)) \rVert_1$$

$f_D$ denotes the intermediate-layer representation of the discriminator, and L(D) denotes the error between (x, E(x)) and (G(E(x)), E(x)); cosine similarity is likewise used as the error calculation method.

3) A(x): the anomaly score, calculated as the weighted sum of L(G) and L(D), where α is the weight.
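The score can be sketched as follows. The code follows the L1-norm formulas as written (the cosine-similarity variant mentioned in the text is not shown), and the encoder, generator and discriminator feature functions are hypothetical toy stand-ins, not the trained GRU-based networks. The weight 0.5 is likewise an assumed value.

```python
def l1_distance(a, b):
    """L1 norm of the difference between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))

def anomaly_score(x, encoder, generator, disc_features, alpha=0.5):
    """A(x) = alpha * L(D) + (1 - alpha) * L(G), with both errors as L1 norms."""
    z = encoder(x)
    x_rec = generator(z)
    l_g = l1_distance(x, x_rec)                                      # reconstruction error
    l_d = l1_distance(disc_features(x, z), disc_features(x_rec, z))  # discriminator feature error
    return alpha * l_d + (1.0 - alpha) * l_g

# Toy stand-ins (assumptions): the "model" can only reproduce vectors whose
# components are all equal, which plays the role of normal behavior here.
toy_encoder = lambda x: [sum(x) / len(x)]
toy_generator = lambda z: [z[0]] * 3
toy_disc_features = lambda x, z: list(x) + list(z)

score_normal = anomaly_score([0.5, 0.5, 0.5], toy_encoder, toy_generator, toy_disc_features)
score_abnormal = anomaly_score([0.5, 0.5, 2.0], toy_encoder, toy_generator, toy_disc_features)
```

The vector the toy model can reproduce scores zero, while the one it cannot reproduce receives a clearly larger score, which is the property the threshold step relies on.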

After the anomaly scores are calculated by BiGAN, the OTSU algorithm is used to perform threshold segmentation on them. The OTSU-based anomaly threshold segmentation algorithm comprises the following steps:

inputting the anomaly scores $s_1, s_2, \dots, s_n$ and initializing the candidate threshold to 0;

increasing the candidate threshold by 1 in each iteration and calculating the inter-class variance g at each candidate threshold;

assigning to t the candidate threshold at which the inter-class variance g is maximal, t being the optimal threshold finally returned by the algorithm.
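The steps above can be sketched as a threshold sweep over the anomaly scores. Because the scores are continuous rather than integer gray levels, this sketch assumes the candidate thresholds lie on an evenly spaced grid between the smallest and largest score, so that "adding 1" means moving to the next grid point. The grid size of 256 and the sample scores are assumptions for illustration.

```python
def otsu_threshold(scores, steps=256):
    """Return the candidate threshold that maximizes the inter-class variance g."""
    lo, hi = min(scores), max(scores)
    n = len(scores)
    best_t, best_g = lo, -1.0
    for k in range(steps):
        t = lo + (hi - lo) * k / steps        # k-th candidate threshold on the grid
        above = [s for s in scores if s > t]  # foreground: suspected anomalies
        below = [s for s in scores if s <= t] # background: normal behavior
        if not above or not below:
            continue                          # skip splits with an empty class
        w0 = len(above) / n
        w1 = 1.0 - w0
        u0 = sum(above) / len(above)
        u1 = sum(below) / len(below)
        u = w0 * u0 + w1 * u1
        g = w0 * (u0 - u) ** 2 + w1 * (u1 - u) ** 2
        if g > best_g:                        # keep the first threshold reaching the maximum
            best_g, best_t = g, t
    return best_t

# Made-up anomaly scores: four low (normal-looking) and two high.
scores = [0.10, 0.12, 0.11, 0.13, 0.90, 0.95]
t = otsu_threshold(scores)
flags = [s > t for s in scores]  # True marks behavior judged abnormal
```

Every threshold in the gap between the two groups gives the same split and the same g, so the sweep returns the first one it meets; any of them would flag the same samples.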

Based on the anomaly scores calculated by BiGAN and the anomaly threshold calculated by OTSU, internal user behavior detection based on BiGAN and OTSU is carried out in the following steps:

inputting the behavior data to be detected, $D = (d_1, d_2, \dots, d_n)$;

calculating the anomaly scores $\{s_1, s_2, \dots, s_n\}$ of the data to be detected with BiGAN;

calculating the anomaly threshold t from the obtained anomaly scores with OTSU;

judging whether each anomaly score is greater than the anomaly threshold: if so, the behavior is abnormal; otherwise, it is normal.

In summary, experiments show that the internal user behavior detection method based on BiGAN and the OTSU algorithm can generate normal behavior data from a small number of data samples, enables the BiGAN network to process time-series data, and selects the threshold automatically using the OTSU algorithm. The results show that the method improves the accuracy and recall of user behavior detection and has a low false alarm rate.

While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
