Blockchain network anomaly detection method
1. A blockchain network anomaly detection method, comprising the following steps:
the first slave node is responsible for collecting a real-time data set from the gateway;
the second slave node preprocesses the real-time data set;
the main detection node is responsible for detecting the data in the real-time data set;
storing the data set at a block node;
the step in which the first slave node collects the real-time data set from the gateway comprises the following steps:
the first slave node formats the real-time data set, the processed real-time data set is stored in a temporary storage area, and the first slave node sends the real-time data set stored in the temporary storage area to a second slave node.
2. The blockchain network anomaly detection method of claim 1, wherein:
a data acquisition layer is used by the first slave node to acquire the real-time data set from the gateway; the data acquisition is implemented as follows:
the data acquisition layer uses the pcap_loop() function to parse a Jpacket object from the gateway and obtains the raw data of a data packet from the Jpacket object; it stores the captured data packet and generates a basic feature object containing the data packet; it then judges whether the data features of the data packet include the basic features, namely the source/destination IP address, the service and the protocol; if the data packet does not contain the basic features, the data acquisition layer filters it out; if the data packet contains the basic features, the data acquisition layer packages it into a .cdv file, stores the file in the temporary storage area, and sends the .cdv file to the data preprocessing layer;
the data preprocessing layer receives the .cdv file; the data preprocessing layer carries out the preprocessing performed by the second slave node; the second slave node receives the formatted data set from the first slave node and stores it in a Topic queue; the second slave node performs a preprocessing operation on the data set, a data feature set being built into the second slave node; the preprocessing operation performs feature analysis on the data in the formatted data set; data whose data features belong to the data feature set are assigned to a first data set, and data whose data features do not belong to the data feature set are assigned to a second data set; the second slave node sends the first data set and the second data set separately to the main detection node;
and a Euclidean space is built in the data preprocessing layer, wherein the data features contained in the data feature set comprise discrete features, continuous features and high-dimensional features; if the data features of the data belong to the data feature set, the data preprocessing layer assigns the data to the first data set.
Background
The development of the internet has made data security problems increasingly prominent, so a data detection system suited to large data volumes needs to be established. With data as the focus of detection, anomaly detection can be treated as finding anomalous data within the data; combining detection technology with data mining technology effectively reduces the false alarm rate and improves detection efficiency.
For example, CN105591836A and CN109542772A disclose an anomaly detection method based on data flow analysis. That method targets the new BPEL software paradigm and takes into account language characteristics that traditional software does not have, but the technique can only identify a single category of data. CN101459554A discloses a method and apparatus for detecting data streams that combines main-stream characteristic information with auxiliary-stream characteristic information, so that data streams carrying no obvious characteristic information can be detected, but the total volume of data streams handled is excessively large.
The present invention is made to solve problems common to traditional network anomaly detection in this field, which is limited by data storage and processing capacity and suffers from low accuracy, a high false alarm rate and the like.
Disclosure of Invention
The invention aims to improve the detection capability and speed of a network anomaly detection model. To address the shortcomings of current traditional network anomaly detection, which is limited by data storage and processing capacity and suffers from low accuracy and a high false alarm rate, the invention provides the following:
a blockchain network anomaly detection method, the method comprising:
the first slave node is responsible for collecting a real-time data set from the gateway;
the second slave node preprocesses the real-time data set;
the main detection node is responsible for detecting the data in the real-time data set;
storing the data set at a block node;
the step in which the first slave node collects the real-time data set from the gateway comprises the following steps:
the first slave node formats the real-time data set, the processed real-time data set is stored in a temporary storage area, and the first slave node sends the real-time data set stored in the temporary storage area to a second slave node.
A data acquisition layer is used by the first slave node to acquire the real-time data set from the gateway; the data acquisition is implemented as follows:
the data acquisition layer uses the pcap_loop() function to parse a Jpacket object from the gateway and obtains the raw data of a data packet from the Jpacket object; it stores the captured data packet and generates a basic feature object containing the data packet; it then judges whether the data features of the data packet include the basic features, namely the source/destination IP address, the service and the protocol; if the data packet does not contain the basic features, the data acquisition layer filters it out; if the data packet contains the basic features, the data acquisition layer packages it into a .cdv file, stores the file in the temporary storage area, and sends the .cdv file to the data preprocessing layer;
the data preprocessing layer receives the .cdv file; the data preprocessing layer carries out the preprocessing performed by the second slave node; the second slave node receives the formatted data set from the first slave node and stores it in a Topic queue; the second slave node performs a preprocessing operation on the data set, a data feature set being built into the second slave node; the preprocessing operation performs feature analysis on the data in the formatted data set; data whose data features belong to the data feature set are assigned to a first data set, and data whose data features do not belong to the data feature set are assigned to a second data set; the second slave node sends the first data set and the second data set separately to the main detection node;
and a Euclidean space is built in the data preprocessing layer, wherein the data features contained in the data feature set comprise discrete features, continuous features and high-dimensional features; if the data features of the data belong to the data feature set, the data preprocessing layer assigns the data to the first data set.
The beneficial effects obtained by the invention are as follows:
1. A data analysis layer performs cluster analysis on data streams with and without anomalous points; a new detection model is established during the cluster analysis, and an algorithm for quickly removing isolated points is designed to improve detection efficiency.
2. By adopting real-time analysis, anomalous data behavior can be detected accurately and quickly even when large volumes of data are inspected.
3. The feature values of the data stream are stored using blockchain storage, which increases the feature-value storage capacity of the data detection system and thereby its detection capability.
4. By adopting two data classification methods, two types of data streams can be detected, which improves detection diversity.
Drawings
The invention will be further understood from the following description in conjunction with the accompanying drawings. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the embodiments. Like reference numerals designate corresponding parts throughout the different views.
FIG. 1 is a flow chart of the detection method of the present invention.
FIG. 2 is a schematic structural view of a data acquisition layer of the present invention.
FIG. 3 is a schematic diagram of the structure of the data preprocessing layer of the present invention.
Detailed Description
In order to make the objects and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the following embodiments; it should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. Other systems, methods, and/or features of the present embodiments will become apparent to those skilled in the art upon review of the following detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims. Additional features of the disclosed embodiments are described in, and will be apparent from, the detailed description that follows.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by the terms "upper" and "lower" and "left" and "right" etc., it is only for convenience of description and simplification of the description based on the orientation or positional relationship shown in the drawings, but it is not indicated or implied that the device or assembly referred to must have a specific orientation.
The present embodiment provides a blockchain network detection method applied to a detection system, wherein the system includes a data acquisition layer, a data preprocessing layer, a data analysis layer and a data storage layer; the data acquisition layer is in data connection with the data preprocessing layer, the data preprocessing layer with the data analysis layer, and the data analysis layer with the data storage layer; the layers are independent of one another, data formats are converted to a standard form between layers, and data calls between layers use standardized interfaces.
It is noted that the method may also be applied to similar systems, and the system described here is not intended to be limiting.
The block chain network detection method comprises the following steps:
the first slave node is responsible for collecting a real-time data set from the gateway;
the second slave node preprocesses the real-time data set;
the main detection node is responsible for detecting the data in the real-time data set;
storing the data set at a block node;
the step in which the first slave node collects the real-time data set from the gateway comprises the following steps:
the first slave node formats the real-time data set, the processed real-time data set is stored in a temporary storage area, and the first slave node sends the real-time data set stored in the temporary storage area to a second slave node;
a data acquisition layer is used by the first slave node to acquire the real-time data set from the gateway; the data acquisition is implemented as follows:
the data acquisition layer uses the pcap_loop() function to parse a Jpacket object from the gateway and obtains the raw data of a data packet from the Jpacket object; it stores the captured data packet and generates a basic feature object containing the data packet; it then judges whether the data features of the data packet include the basic features, namely the source/destination IP address, the service and the protocol; if the data packet does not contain the basic features, the data acquisition layer filters it out; if the data packet contains the basic features, the data acquisition layer packages it into a .cdv file, stores the file in the temporary storage area, and sends the .cdv file to the data preprocessing layer;
the data preprocessing layer receives the .cdv file; the data preprocessing layer carries out the preprocessing performed by the second slave node; the second slave node receives the formatted data set from the first slave node and stores it in a Topic queue; the second slave node performs a preprocessing operation on the data set, a data feature set being built into the second slave node; the preprocessing operation performs feature analysis on the data in the formatted data set; data whose data features belong to the data feature set are assigned to a first data set, and data whose data features do not belong to the data feature set are assigned to a second data set; the second slave node sends the first data set and the second data set separately to the main detection node;
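The filtering rule in the acquisition step can be illustrated with a minimal Python sketch. It assumes each captured packet has already been reduced to a dictionary; the field names `src_ip`, `dst_ip`, `service` and `protocol` and the helper names are hypothetical and are not taken from the description. Capture itself and .cdv packaging are not shown.

```python
# Hypothetical field names standing in for the "basic features" of a packet.
REQUIRED_FEATURES = ("src_ip", "dst_ip", "service", "protocol")


def has_basic_features(packet):
    """Return True when every basic feature is present and non-empty."""
    return all(packet.get(name) for name in REQUIRED_FEATURES)


def filter_packets(packets):
    """Keep only the packets that carry all basic features."""
    return [p for p in packets if has_basic_features(p)]


captured = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "service": "http", "protocol": "tcp"},
    {"src_ip": "10.0.0.3", "dst_ip": None, "service": "dns", "protocol": "udp"},
    {"src_ip": "10.0.0.4", "service": "ftp", "protocol": "tcp"},
]

# Only the first packet has all four basic features; the others are filtered out.
kept = filter_packets(captured)
```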
and a Euclidean space is built in the data preprocessing layer, wherein the data features contained in the data feature set comprise discrete features, continuous features and high-dimensional features; if the data features of the data belong to the data feature set, the data preprocessing layer assigns the data to the first data set, and the data preprocessing layer preprocesses the first data set as follows:
1. digitizing the discrete data: the data in the .cdv file are extracted, and the data preprocessing layer performs feature recognition on the data according to the data features; the discrete data types comprise back, land and nmap; the discrete attributes of the discrete data are projected into the Euclidean space, where each discrete attribute has its own spatial position; the Euclidean distance between the spatial positions of the data in the Euclidean space is then calculated; the discrete attributes are encoded using binary coding;
2. standardizing and normalizing the continuous data: the continuous feature data in the .cdv file are standardized and normalized, part of the continuous feature data being selected as sample data; the processing steps are as follows:
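As an illustration of step 1, the following Python sketch encodes a discrete value as a binary vector and measures the Euclidean distance between two encoded values. It assumes that "binary coding" means one-hot encoding over the three listed types; the description does not specify the exact coding scheme, and the function names are hypothetical.

```python
import math

# The three discrete data types named in the description.
DISCRETE_TYPES = ["back", "land", "nmap"]


def binary_encode(value, categories=DISCRETE_TYPES):
    """One-hot encode a discrete value (assumed reading of 'binary coding')."""
    if value not in categories:
        raise ValueError(f"unknown discrete value: {value!r}")
    return [1 if value == c else 0 for c in categories]


def euclidean_distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


back = binary_encode("back")   # [1, 0, 0]
nmap = binary_encode("nmap")   # [0, 0, 1]
distance = euclidean_distance(back, nmap)
```

With one-hot encoding every pair of distinct values is the same distance apart, so the encoding imposes no artificial ordering on the discrete types.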
standardizing the continuous attributes according to formula (1);
$$B = \frac{v_i - r}{\sigma} \tag{1}$$
wherein B represents the standardized continuous attribute of the continuous data; σ represents the standard deviation of the sample data, calculated by formula (2); $v_i$ represents the i-th attribute of the continuous data, where i is the attribute index and takes positive integer values, the number of attributes being chosen as 10; r represents the average value of the attributes, obtained by formula (3);
$$\sigma = \sqrt{\frac{1}{10}\sum_{i=1}^{10}\left(v_i - r\right)^2} \tag{2}$$
wherein 10 represents the number of said attributes;
$$r = \frac{1}{10}\sum_{i=1}^{10} v_i \tag{3}$$
wherein 10 represents the number of said attributes;
normalizing the numerical values: the data standardized in the preceding step are normalized by mapping their value range into the interval [0,1], the normalization being calculated by formula (4);
$$C'_i = \frac{C_i - C_{\min}}{C_{\max} - C_{\min}} \tag{4}$$
wherein $C'_i$ is the normalized value of $C_i$ and lies in [0,1]; $C_{\min}$ is the minimum value of $C_i$; $C_{\max}$ is the maximum value of $C_i$; $C_i$ is the standardized data value;
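The two operations of step 2 can be sketched in Python as follows. This is a minimal sketch and assumes the population form of the standard deviation (dividing by the number of attributes) and min-max scaling for the mapping into [0,1]; the function names and the sample values are hypothetical.

```python
import math


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sigma == 0:
        # All values identical: nothing to scale.
        return [0.0] * n
    return [(v - mean) / sigma for v in values]


def min_max_normalize(values):
    """Map values linearly into the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Ten illustrative attribute values, matching the attribute count of 10.
attributes = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, 3.0, 7.0]
standardized = standardize(attributes)
normalized = min_max_normalize(standardized)
```

After standardization the values have zero mean; after normalization the smallest value maps to 0 and the largest to 1.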
3. performing principal component analysis for dimensionality reduction: the data with high-dimensional features in the .cdv file are reduced in dimension by the following steps:
1) arranging the high-dimensional feature data by columns into a matrix 1 with n rows and m columns, wherein each row of the matrix 1 represents an attribute field;
2) zero-averaging each row of the matrix 1, that is, subtracting the row's average value from each element of the row;
3) solving a covariance matrix of the matrix 1;
4) solving the eigenvalues of the covariance matrix and the eigenvector corresponding to each eigenvalue;
5) arranging the eigenvectors as rows, from top to bottom in descending order of their corresponding eigenvalues, to form a matrix 2, and taking the first k rows to form a matrix 3, wherein k is any positive integer not exceeding the number of rows or the number of columns of the matrix 2;
6) multiplying the matrix 3 by the matrix 1 to obtain the low-dimensional data, that is, the dimension-reduced form of the data with high-dimensional features in the .cdv file;
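Steps 1) to 6) can be sketched with NumPy as below. The sketch assumes that the eigenvectors are ordered by descending eigenvalue and that the covariance matrix is computed by dividing by the number of columns m; neither detail is stated in the description. The function name `pca_reduce` and the sample matrix are hypothetical.

```python
import numpy as np


def pca_reduce(matrix1, k):
    """Reduce an n-by-m matrix (rows are attribute fields) to k rows."""
    x = np.asarray(matrix1, dtype=float)
    # Step 2: zero-average each row.
    centered = x - x.mean(axis=1, keepdims=True)
    # Step 3: covariance matrix of the rows (n by n), assuming division by m.
    m = centered.shape[1]
    cov = centered @ centered.T / m
    # Step 4: eigenvalues and eigenvectors of the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 5: eigenvectors as rows, largest eigenvalue first (matrix 2),
    # then the first k rows (matrix 3).
    order = np.argsort(eigvals)[::-1]
    matrix2 = eigvecs[:, order].T
    matrix3 = matrix2[:k]
    # Step 6: project the zero-averaged data onto the k leading components.
    return matrix3 @ centered


# Three attribute fields observed over four samples.
data = np.array([
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 3.0, 6.0, 2.0],
])
reduced = pca_reduce(data, k=2)  # shape (2, 4)
```

`np.linalg.eigh` is used because the covariance matrix is symmetric; it returns eigenvalues in ascending order, hence the explicit reordering.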
if the data features of the data do not belong to any feature element in the data feature set, the data preprocessing layer assigns the data to the second data set, and preprocesses the second data set as follows:
if the data features of the second data set belong to one of five category labels, namely Normal, Dos, Probe, U2R and R2L, the category labels are represented by the positive integers 1 to 5 respectively; the remaining data, which do not belong to any of the five category labels, are represented by 0;
after preprocessing, the data preprocessing layer vectorizes the data features of the first data set and the second data set, the vectorization being implemented through the VectorAssembler class;
after the vectorization of the data features is complete, the data preprocessing layer sends the first data set and the second data set to the data analysis layer.
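The label coding for the second data set can be sketched as a simple lookup. The sketch assumes the integers 1 to 5 are assigned in the order in which the labels are listed; the names `CATEGORY_CODES` and `encode_label` are hypothetical.

```python
# Assumed assignment: labels take the codes 1 to 5 in their listed order.
CATEGORY_CODES = {"Normal": 1, "Dos": 2, "Probe": 3, "U2R": 4, "R2L": 5}


def encode_label(label):
    """Return the integer code for a category label, or 0 if it is not one of the five."""
    return CATEGORY_CODES.get(label, 0)


labels = ["Normal", "R2L", "unknown", "Dos"]
codes = [encode_label(label) for label in labels]
```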
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Although the invention has been described above with reference to various embodiments, it should be understood that many changes and modifications may be made without departing from the scope of the invention. That is, the methods, systems, and devices discussed above are examples. Various configurations may omit, substitute, or add various procedures or components as appropriate. For example, in alternative configurations, the methods may be performed in an order different than that described, and/or various components may be added, omitted, and/or combined. Moreover, features described with respect to certain configurations may be combined in various other configurations, as different aspects and elements of the configurations may be combined in a similar manner. Further, elements therein may be updated as technology evolves, i.e., many elements are examples and do not limit the scope of the disclosure or claims.
Specific details are given in the description to provide a thorough understanding of the exemplary configurations including implementations. However, configurations may be practiced without these specific details, for example, well-known circuits, processes, algorithms, structures, and techniques have been shown without unnecessary detail in order to avoid obscuring the configurations. This description provides example configurations only, and does not limit the scope, applicability, or configuration of the claims. Rather, the foregoing description of the configurations will provide those skilled in the art with an enabling description for implementing the described techniques. Various changes may be made in the function and arrangement of elements without departing from the spirit or scope of the disclosure.
In conclusion, it is intended that the foregoing detailed description be regarded as illustrative rather than limiting, and that it be understood that these examples are illustrative only and are not intended to limit the scope of the invention. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.