XGBoost model-based method and device for detecting abnormal electricity consumption behavior of users

Document No.: 9168; published: 2021-09-17

1. A method for detecting abnormal electricity consumption behavior of a user based on an XGBoost model, characterized by comprising the following steps:

in response to acquiring original user data, preprocessing the original user data at an edge computing node to obtain optimized electricity consumption data, wherein the original user data comprise historical electricity consumption data of the user and abnormal electricity consumption records of terminal equipment;

in response to acquiring the optimized electricity consumption data, training a training model to generate an XGBoost detection model;

performing parameter optimization on the XGBoost detection model based on an improved genetic algorithm to determine an optimal hyper-parameter combination of the XGBoost detection model, wherein the improved genetic algorithm comprises an exponentially decaying crossover probability and an exponentially decaying mutation probability, and the expression of the exponentially decaying crossover probability is:

[equation not preserved]

where the symbols denote, in order, the crossover probability; the average fitness of the population; the larger fitness of the two individuals to be crossed; constants; the decay coefficient; and the current genetic iteration number;

the expression of the exponentially decaying mutation probability is:

[equation not preserved]

where the symbols denote, in order, the mutation probability; the fitness of the individual to be mutated; and constants;

the improved genetic algorithm includes constructing a fitness function, whose expression is:

[equation not preserved]

where the symbols denote, in order, the fitness function; the weight coefficient; the precision; the recall; and the number of hyper-parameters;

and inputting data to be detected into the XGBoost detection model, and judging, based on the optimal hyper-parameter combination, whether a given item of optimized electricity consumption data is abnormal.

2. The XGBoost model-based method for detecting abnormal electricity consumption behavior of a user according to claim 1, wherein the data preprocessing comprises data cleaning, missing value processing and data dimensionality reduction.

3. The XGBoost model-based method for detecting abnormal electricity consumption behavior of a user according to claim 1, wherein the precision is calculated as:

$$P = \frac{TP}{TP + FP}$$

where $TP$ is the number of users with abnormal electricity consumption that are correctly detected, and $FP$ is the number of users incorrectly detected as having abnormal electricity consumption.

4. The XGBoost model-based method for detecting abnormal electricity consumption behavior of a user according to claim 1, wherein the recall is calculated as:

$$R = \frac{TP}{TP + FN}$$

where $TP$ is the number of users with abnormal electricity consumption that are correctly detected, and $FN$ is the number of users with abnormal electricity consumption that are mistakenly judged to be normal.

5. A device for detecting abnormal electricity consumption behavior of a user based on an XGBoost model, characterized by comprising:

a processing module configured to, in response to acquiring original user data, preprocess the original user data at an edge computing node to obtain optimized electricity consumption data, wherein the original user data comprise historical electricity consumption data of the user and abnormal electricity consumption records of terminal equipment;

a training module configured to, in response to acquiring the optimized electricity consumption data, train a training model to generate an XGBoost detection model;

an optimization module configured to perform parameter optimization on the XGBoost detection model based on an improved genetic algorithm to determine an optimal hyper-parameter combination of the XGBoost detection model, wherein the improved genetic algorithm comprises an exponentially decaying crossover probability and an exponentially decaying mutation probability, and the expression of the exponentially decaying crossover probability is:

[equation not preserved]

where the symbols denote, in order, the crossover probability; the average fitness of the population; the larger fitness of the two individuals to be crossed; constants; the decay coefficient; and the current genetic iteration number;

the expression of the exponentially decaying mutation probability is:

[equation not preserved]

where the symbols denote, in order, the mutation probability; the fitness of the individual to be mutated; and constants;

the improved genetic algorithm includes constructing a fitness function, whose expression is:

[equation not preserved]

where the symbols denote, in order, the fitness function; the weight coefficient; the precision; the recall; and the number of hyper-parameters;

and a judging module configured to input data to be detected into the XGBoost detection model and to judge, based on the optimal hyper-parameter combination, whether a given item of optimized electricity consumption data is abnormal.

6. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any of claims 1 to 4.

7. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 4.

Background

With rapid economic development, users' demand for electricity keeps growing. Abnormal electricity consumption behavior by users increases the non-technical losses of the power grid and raises the operating costs of power companies. Traditionally, such behavior is detected by field personnel who periodically patrol lines and check electricity meters, and through reports from users. These measures depend heavily on people, require a large investment in labor, and are slow and inefficient.

At present, research on detecting abnormal electricity consumption behavior falls mainly into two categories: state-based methods and artificial-intelligence-based methods. State-based analysis detects anomalies by comparing, in real time, changes in large volumes of distribution-network data such as power, voltage and current. Artificial-intelligence-based detection first extracts, through data analysis, indicators that reflect abnormal electricity consumption behavior, and then uses an artificial intelligence method to learn the mapping between those indicators and the detection result, thereby constructing the detection model.

Related art 1: the hardware-based method for detecting abnormal electricity consumption behavior uses external monitoring devices, such as a complex detection system made up of cameras, sensors and networking equipment, to monitor in real time whether power supply equipment has been damaged and whether electricity consumption is normal. This method has high equipment costs, the hardware is easily disturbed by external factors such as weather, maintenance is difficult, and abnormal behavior such as software-based or remotely controlled electricity theft is hard to identify.

Related art 2: the state-based method detects a user's electricity consumption behavior mainly by comparing and analyzing the user's electricity consumption information. For example, daily electricity consumption and daily line-loss data are processed in batches and analyzed for correlation to identify electricity theft by users in a transformer area; alternatively, synchronous line loss, power load, daily electricity consumption, current, active power and other electrical quantities are compared and analyzed together to detect abnormal behavior accurately. In actual operation of the distribution network, however, the user side produces a large amount of diverse electricity consumption data and abnormal behavior takes many forms, so state-based detection requires a long detection time.

Related art 3: the artificial-intelligence-based method trains a detection model on a large amount of electricity consumption data so that, given a user's electricity consumption data, it can quickly identify whether the behavior is abnormal. For example, normal user electricity consumption data are used as training samples, an autoencoder network learns the data characteristics, the input data are reconstructed to calculate a detection threshold, and on this basis a model is built that identifies abnormal behavior by comparing the reconstruction error with the threshold. This method still has considerable room for improvement in the selection of evaluation indicators, training time, and detection efficiency.

However, the above methods have the following problems:

1. Monitoring power supply equipment in real time with cameras, sensors and similar devices is costly and requires a large amount of expensive hardware. In addition, software-based and remotely controlled electricity theft involve no person on site, so cameras and similar devices can hardly identify them or raise an alarm.

2. Monitoring electrical quantities in real time cannot process the large volume of actual distribution-network data in a timely manner and takes a long time; because abnormal electricity consumption behavior is diverse, misjudgment is also likely.

3. Existing methods for identifying abnormal electricity consumption behavior often adopt complex artificial intelligence algorithms to improve identification accuracy, which occupies more computing resources and lengthens computation time.

4. Traditional identification methods based on abnormal electricity consumption patterns require the detection device to upload all of a user's electricity consumption data, which an identification module in the control center then processes. The computing load on the control center is heavy, the user's electricity consumption data can easily be stolen, and the user's electricity consumption privacy is hard to protect.

In summary, there is a need for a method and a device for detecting abnormal electricity consumption behavior of users that optimize the detection model so as to improve detection accuracy.

Disclosure of Invention

The invention provides a method and a device for detecting abnormal electricity consumption behavior of a user based on an XGBoost model, which solve at least one of the above technical problems.

In a first aspect, the invention provides a method for detecting abnormal electricity consumption behavior of a user based on an XGBoost model, comprising: in response to acquiring original user data, preprocessing the original user data at an edge computing node to obtain optimized electricity consumption data, wherein the original user data comprise historical electricity consumption data of the user and abnormal electricity consumption records of terminal equipment; in response to acquiring the optimized electricity consumption data, training a training model to generate an XGBoost detection model; performing parameter optimization on the XGBoost detection model based on an improved genetic algorithm to determine an optimal hyper-parameter combination of the XGBoost detection model, wherein the improved genetic algorithm comprises an exponentially decaying crossover probability and an exponentially decaying mutation probability; the expression of the exponentially decaying crossover probability is [equation not preserved], where the symbols denote, in order, the crossover probability, the average fitness of the population, the larger fitness of the two individuals to be crossed, constants, the decay coefficient, and the current genetic iteration number; the expression of the exponentially decaying mutation probability is [equation not preserved], where the symbols denote, in order, the mutation probability, the fitness of the individual to be mutated, and constants; the improved genetic algorithm includes constructing a fitness function whose expression is [equation not preserved], where the symbols denote, in order, the fitness function, the weight coefficient, the precision, the recall, and the number of hyper-parameters; and inputting data to be detected into the XGBoost detection model, and judging, based on the optimal hyper-parameter combination, whether a given item of optimized electricity consumption data is abnormal.

In a second aspect, the present invention provides an XGBoost model-based device for detecting abnormal electricity consumption behavior of a user, comprising: a processing module configured to, in response to acquiring original user data, preprocess the original user data at an edge computing node to obtain optimized electricity consumption data, wherein the original user data comprise historical electricity consumption data of the user and abnormal electricity consumption records of terminal equipment; a training module configured to, in response to acquiring the optimized electricity consumption data, train a training model to generate an XGBoost detection model; an optimization module configured to perform parameter optimization on the XGBoost detection model based on an improved genetic algorithm to determine an optimal hyper-parameter combination of the XGBoost detection model, wherein the improved genetic algorithm comprises an exponentially decaying crossover probability and an exponentially decaying mutation probability; the expression of the exponentially decaying crossover probability is [equation not preserved], where the symbols denote, in order, the crossover probability, the average fitness of the population, the larger fitness of the two individuals to be crossed, constants, the decay coefficient, and the current genetic iteration number; the expression of the exponentially decaying mutation probability is [equation not preserved], where the symbols denote, in order, the mutation probability, the fitness of the individual to be mutated, and constants; the improved genetic algorithm includes constructing a fitness function whose expression is [equation not preserved], where the symbols denote, in order, the fitness function, the weight coefficient, the precision, the recall, and the number of hyper-parameters; and a judging module configured to input data to be detected into the XGBoost detection model and to judge, based on the optimal hyper-parameter combination, whether a given item of optimized electricity consumption data is abnormal.

In a third aspect, an electronic device is provided, comprising: at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to execute the steps of the XGBoost model-based method for detecting abnormal electricity consumption behavior of a user according to any embodiment of the present invention.

In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, the computer program including program instructions, which, when executed by a computer, cause the computer to execute the steps of the XGBoost model-based abnormal electricity consumption behavior detection method according to any embodiment of the present invention.

The method and device for detecting abnormal electricity consumption behavior of a user based on the XGBoost model adopt a detection system built on edge computing nodes. Key electricity consumption data containing user privacy are stored, analyzed and processed locally, and only abnormal electricity consumption conditions are uploaded, which effectively protects user privacy. An XGBoost algorithm whose hyper-parameters are optimized by the improved genetic algorithm is used to detect abnormal electricity consumption behavior of end users, so that detection accuracy and speed are improved while the user's electricity consumption privacy is protected.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a method for detecting abnormal electricity consumption behavior of a user based on an XGBoost model according to an embodiment of the present invention;

Fig. 2 is a flowchart of another XGBoost model-based method for detecting abnormal electricity consumption behavior of a user according to an embodiment of the present invention;

Fig. 3 is a block diagram of a device for detecting abnormal electricity consumption behavior of a user based on an XGBoost model according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, a flowchart of a method for detecting abnormal electricity consumption behavior of a user based on an XGBoost model according to the present application is shown.

As shown in Fig. 1, the XGBoost model-based method for detecting abnormal electricity consumption behavior of a user specifically includes: Step S101, in response to acquiring original user data, preprocessing the original user data at an edge computing node to obtain optimized electricity consumption data, wherein the original user data comprise historical electricity consumption data of the user and abnormal electricity consumption records of terminal equipment.

In this embodiment, the data preprocessing includes data cleaning, missing value processing, and data dimensionality reduction. The missing value processing fills in missing values by expectation-maximization interpolation. The data dimensionality reduction includes the following steps: standardizing the user data so that each feature has mean 0 and variance 1; calculating the covariance matrix, its eigenvalues, and the eigenvector corresponding to each eigenvalue; sorting the eigenvalues by magnitude, selecting the m largest, and taking the corresponding eigenvectors as row vectors to form an eigenvector matrix; and transforming the user data into the new space spanned by the m eigenvectors. Training samples are constructed from the original data collected by the terminal using this preprocessing, which makes the data more complete and greatly reduces the volume of training data, shortening training time and improving the detection accuracy of the model. Reducing the dimensionality by principal component analysis generates new features that retain the main information content of the original data, which shortens computation time and improves the detection capability of the model.

Step S102, in response to acquiring the optimized electricity consumption data, training a training model to generate an XGBoost detection model.

Step S103, performing parameter optimization on the XGBoost detection model based on an improved genetic algorithm to determine an optimal hyper-parameter combination of the XGBoost detection model, wherein the improved genetic algorithm includes constructing a fitness function, whose expression is:

[equation not preserved]

where the symbols denote, in order, the fitness function; the weight coefficient; the precision; the recall; and the number of hyper-parameters.
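The precision and recall that enter the fitness function can be computed from a labelled validation set. The following is a minimal sketch, assuming binary labels in which 1 marks a user with abnormal electricity consumption; the function name and the sample labels are illustrative, and the way the patent combines the two metrics with the weight coefficient and the number of hyper-parameters is not reproduced here.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 marks an abnormal user."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # abnormal, flagged
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # normal, flagged
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # abnormal, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative labels only: 4 abnormal users, of which 2 are detected, plus 1 false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Returning 0.0 when a denominator is zero is a design choice of this sketch, so that a candidate hyper-parameter set that flags nobody is scored poorly rather than raising an error.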

In this embodiment, the hyper-parameters of the XGBoost detection model are optimized using exponentially decaying crossover and mutation probabilities. The expression of the exponentially decaying crossover probability is: [equation not preserved], where the symbols denote, in order, the crossover probability; the average fitness of the population; the larger fitness of the two individuals to be crossed; constants; the decay coefficient; and the current genetic iteration number. The expression of the exponentially decaying mutation probability is: [equation not preserved], where the symbols denote, in order, the mutation probability; the fitness of the individual to be mutated; and constants. Because the crossover and mutation probabilities are high in the early stage of iterative optimization and low in the later stage, the optimization efficiency of the algorithm improves. With the two control parameters chosen appropriately, the genetic algorithm can more readily find the optimal hyper-parameter combination of the XGBoost detection model and can escape local optima in its search for the global optimum.
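To make the qualitative behavior concrete (high probabilities early, low probabilities late), the sketch below uses a purely hypothetical piecewise form built only from the quantities named above: an individual's fitness, the population average fitness, constants, a decay coefficient, and the iteration number. It is an assumed illustration, not the patent's expression, and the names `k1` to `k4` and `decay` are invented for this example.

```python
import math


def crossover_probability(f_larger, f_avg, k1, k2, decay, t):
    """Hypothetical form: constant k1 (below-average pair) or k2, damped by exp(-decay * t)."""
    k = k1 if f_larger < f_avg else k2  # assumption: k1 > k2, so weak pairs cross more often
    return k * math.exp(-decay * t)


def mutation_probability(f_individual, f_avg, k3, k4, decay, t):
    """Hypothetical form: constant k3 (below-average individual) or k4, damped by exp(-decay * t)."""
    k = k3 if f_individual < f_avg else k4  # assumption: k3 > k4
    return k * math.exp(-decay * t)


# Illustrative values only: the same pair at the first and at the hundredth iteration.
early = crossover_probability(f_larger=0.55, f_avg=0.60, k1=0.9, k2=0.6, decay=0.02, t=0)
late = crossover_probability(f_larger=0.55, f_avg=0.60, k1=0.9, k2=0.6, decay=0.02, t=100)
```

Under these assumptions the probability starts at the chosen constant and shrinks monotonically with the iteration number, which matches the early-exploration, late-refinement behavior described in the text.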

Step S104, inputting data to be detected into the XGBoost detection model, and judging, based on the optimal hyper-parameter combination, whether a given item of optimized electricity consumption data is abnormal.

In this embodiment, a genetic algorithm that uses the fitness function constructed above optimizes the hyper-parameters of the XGBoost model. This mitigates, as far as possible, the adverse effect that an excessive number of hyper-parameters has on the XGBoost model; taking the number of hyper-parameters into account improves the accuracy of the optimization. With the XGBoost-based model for detecting abnormal electricity consumption behavior of end users, users with abnormal electricity consumption can be identified quickly and accurately, so that the power supply side can stop losses in time.

In conclusion, the method of this embodiment adopts a detection system built on edge computing nodes. Key electricity consumption data containing user privacy are stored, analyzed and processed locally, and only abnormal electricity consumption conditions are uploaded, which effectively protects user privacy. The XGBoost algorithm, with hyper-parameters optimized by the improved genetic algorithm, detects abnormal electricity consumption behavior of end users, so that detection accuracy and speed are improved while the user's electricity consumption privacy is protected.

Referring to fig. 2, a flowchart of another XGBoost model-based abnormal electricity consumption behavior detection method for a user according to the present application is shown.

As shown in fig. 2, the XGBoost model-based method for detecting abnormal electricity consumption behavior of a user includes the following steps:

Step 1: Data acquisition by the edge-node detection device

The detection device acquires the original electricity consumption data of power users in the distribution network system from the electricity consumption acquisition system and the energy management system. The original data comprise the users' basic electricity consumption information, terminal alarm information, and electricity theft information for users in the area.

Step 2: The edge-node computing module preprocesses the data

(2.1) Data cleaning: data cleaning removes redundant and irrelevant data from the original data in order to smooth out data noise. Non-resident users such as public utilities generally do not exhibit abnormal electricity consumption behavior, so their electricity consumption data can be deleted.

(2.2) Missing value processing: data recorded by the electricity consumption acquisition system may be partially lost because of acquisition equipment faults, transmission packet loss and similar causes. If the samples with missing values are simply ignored, the error in the daily loss rate grows, which reduces the accuracy of the detection model. To avoid the influence of missing values, they are filled in by expectation-maximization (EM) interpolation. The method has two steps: first, given the observed data, the conditional expectation of the missing values is obtained and used to interpolate the missing data; second, in the maximization step, the complete data set obtained after interpolation is used to compute maximum likelihood estimates of the parameters.

The first step: compute the expectation used in iteration t+1:

$$Q(\theta, \theta^{(t)}) = \mathbb{E}_{Z \mid X, \theta^{(t)}}\left[\log P(X, Z \mid \theta)\right] \tag{1}$$

The second step: given $\theta^{(t)}$ from the previous iteration, find the value $\theta^{(t+1)}$ such that the following holds:

$$\theta^{(t+1)} = \arg\max_{\theta} Q(\theta, \theta^{(t)}) \tag{2}$$

where the observed data $X$ are incomplete data containing missing values, $Z$ denotes the implicit data not observed in the observed data, and $X$ and $Z$ taken together are called the complete data. The function $Q(\theta, \theta^{(t)})$ is called the Q function; it is the expectation of the complete-data log-likelihood $\log P(X, Z \mid \theta)$ with respect to the conditional probability distribution $P(Z \mid X, \theta^{(t)})$ of the implicit data $Z$ given the observed data $X$ and the current parameters $\theta^{(t)}$. $\theta^{(t)}$ is the estimate of the model parameters after iteration t, $\theta^{(t+1)}$ is the estimate after iteration t+1, $P(X, Z \mid \theta)$ is the joint probability distribution of $X$ and $Z$, and $P(Z \mid X, \theta^{(t)})$ is the conditional probability distribution of the hidden variable $Z$ given the observed data $X$ and the current parameter estimate $\theta^{(t)}$.

The first and second steps are repeated until $\lVert \theta^{(t+1)} - \theta^{(t)} \rVert$ is sufficiently small.
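The two steps can be sketched on a small case. The example below is a minimal illustration under assumptions that are not in the source: two features per sample are modelled as a bivariate normal distribution, the first feature `xs` is always observed, and missing entries of the second feature `ys` are marked with `None`. The function name, the tolerance, and the sample values are hypothetical.

```python
def em_impute(xs, ys, tol=1e-9, max_iter=500):
    """EM imputation of missing ys (None) under an assumed bivariate normal model."""
    n = len(xs)
    observed = [y for y in ys if y is not None]
    # xs is fully observed, so its mean and variance are fixed maximum likelihood estimates.
    mu_x = sum(xs) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n
    # Initial parameter estimates use the observed ys only.
    mu_y = sum(observed) / len(observed)
    s_yy = sum((y - mu_y) ** 2 for y in observed) / len(observed)
    s_xy = 0.0
    for _ in range(max_iter):
        # E-step: conditional expectations of y and y^2 for the missing entries.
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        e_y = [y if y is not None else mu_y + beta * (x - mu_x) for x, y in zip(xs, ys)]
        e_y2 = [ey ** 2 + (resid_var if y is None else 0.0) for ey, y in zip(e_y, ys)]
        # M-step: maximum likelihood update of the parameters from the completed data.
        new_mu_y = sum(e_y) / n
        new_s_xy = sum(x * ey for x, ey in zip(xs, e_y)) / n - mu_x * new_mu_y
        new_s_yy = sum(e_y2) / n - new_mu_y ** 2
        change = abs(new_mu_y - mu_y) + abs(new_s_xy - s_xy) + abs(new_s_yy - s_yy)
        mu_y, s_xy, s_yy = new_mu_y, new_s_xy, new_s_yy
        if change < tol:  # stop once the parameter estimates barely move
            break
    beta = s_xy / s_xx
    return [y if y is not None else mu_y + beta * (x - mu_x) for x, y in zip(xs, ys)]


# Illustrative data: the observed pairs satisfy y = 2x, and two ys are missing.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, None, 8.0, None, 12.0]
imputed = em_impute(xs, ys)
```

The loop stops when the change in the parameter estimates falls below `tol`, which mirrors the stopping rule stated above; observed entries are returned unchanged and only the `None` entries are filled.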

(2.3) Data dimensionality reduction: the original data contain many features, which can lead to the curse of dimensionality. Some features are irrelevant to detecting abnormal electricity consumption, and some are highly correlated with one another; both hinder the training and use of the detection model. Principal component analysis is therefore applied to the electricity consumption data, so that closely related features are combined into as few new features as possible and the new features are pairwise uncorrelated; fewer feature indicators can then represent the important information in the original data. Principal component analysis reduces the input n-dimensional data to k dimensions. The specific procedure is: first, standardize the original data so that each feature has mean 0 and variance 1; second, compute the covariance matrix Cov, together with its eigenvalues and the corresponding eigenvectors; then sort the eigenvalues by magnitude, select the k largest, and take the corresponding eigenvectors as row vectors to form the eigenvector matrix P; finally, transform the data into the new space spanned by the k eigenvectors.
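The four steps of the procedure map directly onto a few array operations. The following is a minimal sketch assuming NumPy is available; the function name `pca_reduce`, the synthetic data, and the handling of constant features are assumptions of this example rather than details from the source.

```python
import numpy as np


def pca_reduce(data, k):
    """Project rows of `data` (samples x features) onto the top-k principal components."""
    std = data.std(axis=0)
    std[std == 0] = 1.0                       # leave constant features unscaled
    z = (data - data.mean(axis=0)) / std      # step 1: mean 0, variance 1 per feature
    cov = np.cov(z, rowvar=False)             # step 2: covariance matrix Cov
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # step 3: indices of the k largest eigenvalues
    p = eigvecs[:, order].T                   # matrix P with eigenvectors as rows (k x n)
    return z @ p.T, eigvals[order]            # step 4: data in the new k-dimensional space


# Synthetic data: the second feature is almost a multiple of the first, so 3 features carry
# roughly 2 dimensions of information.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
data = np.column_stack([base[:, 0], 2 * base[:, 0] + 0.01 * base[:, 1], base[:, 1]])
reduced, top_eigvals = pca_reduce(data, k=2)
```

`np.linalg.eigh` is used because the covariance matrix is symmetric; the projected columns are uncorrelated with one another, which is the pairwise-uncorrelated property the text relies on.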

Step 3: The edge-node discrimination module constructs the XGBoost-based model for detecting abnormal user electricity consumption behavior

(3.1) Model input

The preprocessed sample data set is divided into a training set and a test set in the ratio 8:2. The XGBoost model is trained on the training set, and the test set is used as the input data for evaluating model performance.
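The 8:2 division can be sketched with the standard library alone. Shuffling before the split and the fixed seed are assumptions of this example (the source does not say how samples are assigned to the two sets), and the function name is illustrative.

```python
import random


def split_train_test(samples, train_ratio=0.8, seed=42):
    """Shuffle a copy of `samples` and split it into training and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)     # seeded so the split is reproducible
    cut = round(len(shuffled) * train_ratio)  # 8:2 split point
    return shuffled[:cut], shuffled[cut:]


# Illustrative stand-in for the preprocessed sample set: 100 sample identifiers.
samples = list(range(100))
train_set, test_set = split_train_test(samples)
```

For a data set in which abnormal users are rare, a stratified split that preserves the class ratio in both sets may be preferable; that refinement is not shown here.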

(3.2) Construction of the boosted tree

A boosted tree is an ensemble method. The XGBoost algorithm accumulates trees based on a data set D, training one tree in each iteration and using a CART regression tree as the sub-tree model. The set of regression trees is expressed as:

$$\mathcal{F} = \left\{ f(x) = w_{q(x)} \right\}, \quad q: \mathbb{R}^{m} \to \{1, \dots, T\}, \quad w \in \mathbb{R}^{T} \tag{3}$$

where $q$ is the structure function of the tree (it takes an input $x$ and outputs the index of a leaf node), that is, it maps an input $x$ to a particular leaf node; $m$ is the dimension of the observed data $X$; $T$ is the number of leaf nodes of a tree; $w$ is a one-dimensional vector of length $T$; $w_{q(x)}$ is the weight of the leaf node selected by $q$ (it takes a leaf-node index and outputs the weight of that leaf node); and $f$ represents a CART tree.

When the training of K trees is completed, the predicted value of the XGBoost model is expressed as:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F} \qquad (4)$$

In the formula, $K$ is the number of trees; $f_k$ is a function in the function space $\mathcal{F}$; $\hat{y}_i$ is the predicted value of the i-th sample; $x_i$ is the i-th input sample; and $\mathcal{F}$ is the set of all possible CART regression trees.

The trees are trained additively: in each round the existing model is kept unchanged and a new function is added to it. One function corresponds to one tree, and the newly generated tree fits the residual of the previous prediction. The iterative process is shown in formula (5).

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i) \qquad (5)$$

In the formula, $\hat{y}_i^{(t)}$ is the model prediction in round t, $\hat{y}_i^{(t-1)}$ is the retained model prediction of the first t-1 rounds, and $f_t(x_i)$ is the function newly added in round t.
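The additive update of formula (5) can be illustrated with a deliberately simplified sketch. It assumes squared-error loss, one-dimensional input with at least two distinct values, and depth-1 trees (stumps); it omits the regularization and second-order terms that XGBoost itself uses, and the names `fit_stump` and `boost` are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Fit the single-threshold regression stump with the lowest squared error."""
    best = None
    for thr in np.unique(x)[:-1]:          # candidate thresholds between values
        left = x <= thr
        lv, rv = residual[left].mean(), residual[~left].mean()
        err = ((residual - np.where(left, lv, rv)) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda z: np.where(z <= thr, lv, rv)

def boost(x, y, rounds=5):
    """Keep the previous model fixed and add one new function per round."""
    pred = np.zeros_like(y, dtype=float)
    trees = []
    for _ in range(rounds):
        f_t = fit_stump(x, y - pred)       # the new tree fits the current residual
        trees.append(f_t)
        pred = pred + f_t(x)               # formula (5)
    return trees, pred
```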

(3.3) Regularized objective function

The objective function of XGBoost is shown in formulas (6) and (7):

$$Obj = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \qquad (6)$$

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \qquad (7)$$

In the formulas, $K$ is the number of trees; $l(y_i, \hat{y}_i)$ is the error function of the i-th sample; $\sum_i l(y_i, \hat{y}_i)$ is the total training error, which measures the deviation between the predicted values and the true values; $\Omega(f_k)$ is the regularization term of the k-th tree; and $\sum_k \Omega(f_k)$ is the total regularization term of the K trees, which measures the complexity of the model and prevents over-fitting during training. $w_j$ is the weight of the j-th leaf node of the tree, and the parameters $\gamma$ and $\lambda$ control the number of leaf nodes and the leaf node weights, respectively.

For the model, training consists of finding the parameter combination that minimizes the objective function. A second-order Taylor expansion is applied to the total training error, so that the final objective function depends only on the first and second derivatives of the error function at each data point, as shown in formula (8).

$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2} (H_j + \lambda) w_j^2 \right] + \gamma T \qquad (8)$$

$$G_j = \sum_{i \in I_j} g_i, \quad g_i = \partial_{\hat{y}_i^{(t-1)}}\, l(y_i, \hat{y}_i^{(t-1)}) \qquad (9)$$

$$H_j = \sum_{i \in I_j} h_i, \quad h_i = \partial^2_{\hat{y}_i^{(t-1)}}\, l(y_i, \hat{y}_i^{(t-1)}) \qquad (10)$$

In the formulas, $Obj^{(t)}$ is the loss function after t iterations of the XGBoost algorithm, $I_j$ is the set of samples in leaf node j, $g_i$ and $h_i$ are the first-order and second-order gradient statistics of the training error respectively, $G_j$ is the sum of the first-order gradients within leaf node j, and $H_j$ is the sum of the second-order gradients within leaf node j.

With the structure function $q$ of the tree fixed, solving $\partial Obj^{(t)} / \partial w_j = 0$ gives the optimal weight of each leaf node and the corresponding objective value.

$$w_j^{*} = -\frac{G_j}{H_j + \lambda} \qquad (11)$$

$$Obj^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \qquad (12)$$

In the formulas, $w_j^{*}$ is the optimal weight of the j-th leaf node, and $Obj^{*}$ is the final objective function value after simplification.
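As a worked illustration of the leaf statistics, the optimal leaf weight, and the simplified objective value, the sketch below assumes a binary logistic loss, for which the first and second derivatives are p - y and p(1 - p). The loss choice and all function names are assumptions made for the example; the text does not fix a particular error function.

```python
import numpy as np

def logistic_grad_hess(y, raw_pred):
    """First- and second-order gradients of the logistic loss (assumed loss)."""
    p = 1.0 / (1.0 + np.exp(-raw_pred))
    return p - y, p * (1.0 - p)

def leaf_sums(g, h, leaf_index, n_leaves):
    """Sum g and h over the samples that fall into each leaf."""
    G = np.bincount(leaf_index, weights=g, minlength=n_leaves)
    H = np.bincount(leaf_index, weights=h, minlength=n_leaves)
    return G, H

def optimal_leaf_weights(G, H, lam):
    """Optimal weight of each leaf: -G / (H + lambda)."""
    return -G / (H + lam)

def structure_score(G, H, lam, gamma):
    """Simplified objective: -1/2 * sum(G^2 / (H + lambda)) + gamma * T."""
    return -0.5 * np.sum(G ** 2 / (H + lam)) + gamma * len(G)
```

For two samples with labels 1 and 0, a raw prediction of 0 for both, and one sample per leaf, the gradients are -0.5 and 0.5 and both second-order terms are 0.25; with lambda = 1 the optimal leaf weights are 0.4 and -0.4.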

(3.4) Node splitting algorithm

Starting from the root node, the XGBoost algorithm uses a greedy algorithm to split one node at a time: it computes the gain of each candidate split and selects the split with the maximum gain. $I_L$ and $I_R$ are the sample sets on the left and right sides of the split point respectively, and the information gain is computed from the XGBoost loss function:

$$Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma \qquad (14)$$

In the formula, $G_L, H_L$ and $G_R, H_R$ are the gradient sums over $I_L$ and $I_R$, and the three terms in brackets are the scores of the left sub-tree, the right sub-tree, and the unsplit node, respectively. When Gain is less than 0, the split is abandoned.
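The gain of a single candidate split can be computed directly from the four gradient sums. The sketch below is illustrative only; the function name `split_gain` and the example numbers are hypothetical.

```python
def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """Gain of splitting a node into a left and a right child."""
    def score(G, H):
        return G * G / (H + lam)
    left = score(G_L, H_L)                   # left sub-tree
    right = score(G_R, H_R)                  # right sub-tree
    parent = score(G_L + G_R, H_L + H_R)     # node left unsplit
    return 0.5 * (left + right - parent) - gamma

# A split is kept only when its gain is not negative.
gain = split_gain(-2.0, 1.0, 2.0, 1.0, lam=1.0, gamma=0.0)
keep_split = gain >= 0
```

Raising gamma makes the same split less attractive: with gamma = 3 the gain of this example becomes negative and the split would be abandoned.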

Step 4: Perform parameter optimization on the XGBoost detection model using a genetic algorithm to determine the optimal hyper-parameter combination of the model.

The method adopts the genetic algorithm to carry out parameter optimization on the XGboost user abnormal electricity consumption behavior detection model, so that the XGboost detection model can have more accurate detection capability under the optimal parameter combination.

(4.1) Hyper-parameter encoding

According to parameter tuning experience with the XGBoost model, four hyper-parameters mainly affect the detection performance of the model: the number n of base classifiers, the learning rate, the maximum tree depth max_depth, and the minimum leaf node weight. These four hyper-parameters are treated as the variables of an individual solved by the genetic algorithm. Since the genetic algorithm operates on symbol strings representing individuals, the four hyper-parameters are represented as unsigned binary integers.

The number n of base classifiers is an integer between 1 and 100; the learning rate is 1/10 of an integer between 0 and 10; the maximum depth max_depth of the tree is an integer between 3 and 10; and the minimum leaf node weight is an integer between 1 and 10. They are represented by 7-bit, 4-bit, 3-bit, and 4-bit unsigned binary integers respectively, which are concatenated into an 18-bit unsigned binary number that forms the genotype of an individual and represents a feasible solution. For example, the genotype x = 0110101|0100|010|0110 corresponds to the phenotype x = [53, 4, 2, 6]. After decoding, the hyper-parameter values represented by this phenotype are n = 53, learning rate = 4/10 = 0.4, max_depth = 3 + 2 = 5, and minimum leaf node weight = 6. The phenotype and genotype of an individual are thus converted into each other by the encoding and decoding procedures.
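The encoding and decoding procedures can be sketched as follows. The dictionary keys are borrowed from common XGBoost parameter names as an assumption about how the four values would be passed to the model, and the sketch does not handle bit patterns that decode to values outside the stated ranges, since the text does not say how those are treated.

```python
FIELD_BITS = (7, 4, 3, 4)  # n, learning-rate integer, depth offset, min leaf weight

def encode(n, lr_tenths, depth_offset, min_leaf_weight):
    """Concatenate the four fields into an 18-bit genotype with '|' separators."""
    fields = (n, lr_tenths, depth_offset, min_leaf_weight)
    return "|".join(format(v, f"0{w}b") for v, w in zip(fields, FIELD_BITS))

def decode(genotype):
    """Turn an 18-bit genotype into hyper-parameter values."""
    bits = genotype.replace("|", "")
    if len(bits) != sum(FIELD_BITS):
        raise ValueError("genotype must contain exactly 18 bits")
    values, pos = [], 0
    for width in FIELD_BITS:
        values.append(int(bits[pos:pos + width], 2))
        pos += width
    n, lr_tenths, depth_offset, min_leaf_weight = values
    return {
        "n_estimators": n,                    # assumed parameter name
        "learning_rate": lr_tenths / 10,      # integer scaled by 1/10
        "max_depth": 3 + depth_offset,        # offset from the minimum depth 3
        "min_child_weight": min_leaf_weight,  # assumed parameter name
    }

params = decode("0110101|0100|010|0110")
```

Decoding the genotype from the example above yields 53 base classifiers, a learning rate of 0.4, a maximum depth of 5, and a minimum leaf node weight of 6.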

(4.2) Initializing the population

The genetic algorithm performs evolutionary operations on a population. Before evolution starts, the population data representing the initial search points must be initialized. Based on the individual length used in the hyper-parameter encoding, the population size is chosen in the range 100 to 300 and the number of generations is set to 100. Here the population size is taken as 200, i.e., the population consists of 200 individuals, each generated randomly.
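Random initialization of the population can be sketched as below, with each individual drawn as an 18-bit string. The function name `init_population` and the use of a seed are assumptions for reproducibility of the example.

```python
import random

def init_population(size=200, length=18, seed=None):
    """Generate `size` random bit-string individuals of the given length."""
    rng = random.Random(seed)
    return ["".join(rng.choice("01") for _ in range(length)) for _ in range(size)]

population = init_population(seed=42)
```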

(4.3) Construction of the fitness function

Fitness indicates how good or bad an individual (a candidate solution) is. Each individual is evaluated by the fitness function; individuals with high fitness are selected to take part in the genetic operations, and individuals with low fitness are eliminated. Because the genetic algorithm is used to solve for the optimal parameter combination of the XGBoost detection model, the fitness function is chosen according to whether it helps improve the model's ability to detect abnormal electricity consumption. Evaluation indexes for a well-performing XGBoost model include accuracy, F1 score, and the area under the ROC curve (AUC). Since the F1 score takes both the precision and the recall of the detection model into account, the score of formula (17) is chosen as the fitness function, which is expressed as:

$$precision = \frac{TP}{TP + FP} \qquad (15)$$

$$recall = \frac{TP}{TP + FN} \qquad (16)$$

(17)

In the formulas, precision is the precision rate; recall is the recall rate; TP is the number of abnormal electricity users that are correctly detected; FP is the number of non-abnormal electricity users that are wrongly detected as abnormal; FN is the number of abnormal electricity users that are wrongly detected as non-abnormal; and k is the number of optimized hyper-parameters. The fitness function of formula (17) additionally uses a weight coefficient.
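The components that enter the fitness function can be computed from the confusion counts as sketched below. The sketch stops at precision, recall, and the standard F1 score; it does not reproduce formula (17), whose weight coefficient and dependence on the number of hyper-parameters are not recoverable here. Labels are assumed to be 1 for abnormal users and 0 for non-abnormal users.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = abnormal, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Three abnormal users, two detected; one normal user wrongly flagged.
precision, recall, f1 = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```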

(4.4) Designing genetic operators through selection, crossover, and mutation operations to continuously update the population

The chromosome population is updated by genetic operators built from three operations: selection, crossover, and mutation. The number of generations set when initializing the population is 100; each generation produces different next-generation individuals, and the selection operation passes individuals with higher fitness in the current population on to the next generation according to the roulette-wheel rule. Suppose k individuals randomly generated by the hyper-parameter encoding form a population, representing k different hyper-parameter combinations. Each combination is fed into the XGBoost model for training, and the fitness value of each individual is computed with the fitness function constructed in step (4.3). Denote the fitness values of the k combinations by $f_1, f_2, \dots, f_k$. The values $f_i$ are laid out on a wheel, with the area of each sector proportional to the value; the larger an individual's sector, the greater its probability of being selected when the wheel is spun. Concretely, the relative fitness of each individual, $p_i = f_i / \sum_{j=1}^{k} f_j$, is computed; each probability value $p_i$ occupies a region of the interval [0, 1], and the probabilities of all individuals sum to 1. Then k random numbers between 0 and 1 are generated, and the number of times each individual is selected is determined by the probability region in which each random number falls.
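Roulette-wheel selection as described can be sketched with cumulative probabilities and a binary search. The function name `roulette_select` is hypothetical, and the sketch assumes non-negative fitness values with a positive sum.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def roulette_select(fitness, seed=None):
    """Return how many times each individual is selected in k spins."""
    rng = random.Random(seed)
    total = sum(fitness)
    # Cumulative relative fitness marks the right edge of each region on [0, 1].
    cumulative = list(accumulate(f / total for f in fitness))
    counts = [0] * len(fitness)
    for _ in range(len(fitness)):          # k random numbers for k individuals
        r = rng.random()
        # Find the region containing r; the min() guards against rounding error.
        i = min(bisect_right(cumulative, r), len(fitness) - 1)
        counts[i] += 1
    return counts

counts = roulette_select([0.2, 0.5, 0.9, 0.4], seed=0)
```

An individual with zero fitness owns an empty region and is never selected, while fitter individuals tend to be selected more than once.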

Individuals with higher fitness are passed on to the next generation with high probability by the selection operation, after which the crossover and mutation operations are carried out. Both crossover and mutation generate new individuals. The crossover probability Pc and the mutation probability Pm are two key control parameters that affect the performance and convergence of the genetic algorithm. To improve optimization efficiency, larger Pc and Pm are used in the early stage of iterative optimization and smaller Pc and Pm in the later stage. With well-designed values of these two control parameters, the genetic algorithm can find the optimal hyper-parameter combination of the XGBoost detection model and can escape local optima to find the global optimum. As shown in formulas (18) and (19), the invention improves the genetic algorithm by adopting exponentially decaying crossover and mutation probabilities.

(18)

(19)

In formulas (18) and (19), the quantities involved are the average fitness of the population, the larger fitness of the two individuals to be crossed, the fitness of the individual to be mutated, several constants, the attenuation coefficient, which is typically set to 0.5, and the current genetic iteration number n.

When the fitness of an individual is higher than the average fitness of the population, the values of Pc and Pm are adjusted dynamically; when the fitness of an individual is lower than the average fitness of the population, Pc and Pm are given larger fixed values.

(4.5) Judging the termination condition

The whole algorithm process ends when the genetic algorithm reaches one of the following cases:

when the new individual fitness value produced by iteration is not significantly improved;

when the algorithm is carried out to reach the preset iteration times.

If the termination condition is not met, the algorithm returns to step (4.3) to recalculate the fitness of the individuals in the population and then performs the genetic operations again. When the termination condition is met, the best hyper-parameter combination of the XGBoost detection model is output.
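A termination check covering the two listed conditions might look like the sketch below. The text does not quantify "not significantly improved", so the `patience` window and the tolerance `tol` are hypothetical parameters introduced only for illustration.

```python
def should_stop(best_history, max_generations=100, patience=10, tol=1e-4):
    """Decide whether to stop, given the best fitness of each past generation."""
    generation = len(best_history)
    # Condition: the preset number of iterations has been reached.
    if generation >= max_generations:
        return True
    # Condition: no significant improvement over the last `patience` generations.
    if generation > patience:
        improvement = best_history[-1] - best_history[-1 - patience]
        if improvement < tol:
            return True
    return False
```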

Step 5: The edge node uploads the abnormal electricity consumption judgment result of the user to the data management center, and actions such as raising an alarm or cutting off power are taken for users whose judgment result is an abnormal electricity consumption state.

Step 6: Model evaluation and establishment of an online detection model for abnormal user electricity consumption

The XGBoost detection model with the optimal hyper-parameter combination output in step 4 is tested on the test set divided in the model input of step 3. The results show that the comprehensive evaluation indexes of the XGBoost detection model, namely accuracy, F1 score, and AUC, are remarkably improved. The performance of the model on the test set demonstrates the effectiveness of the XGBoost model with genetic-algorithm hyper-parameter optimization in detecting abnormal user electricity consumption.

Data acquired online are preprocessed as in step 2 and input into the trained detection model to obtain the model detection result, and whether abnormal electricity consumption has occurred is judged against a given threshold.

The above scheme can achieve the following technical effects:

1. A detection system for abnormal user electricity consumption behavior is established based on edge computing. Key electricity consumption data containing user privacy are stored, analyzed, and processed locally, and only abnormal electricity consumption conditions are uploaded, which effectively protects user privacy.

2. The designed detection system preprocesses the electricity data at the user-side edge node and can share the computational load of the data processing center. At finer time granularity, the edge node processes the electricity data locally and uploads only a small amount of key information, which reduces the amount of data communicated and lowers the bandwidth and communication requirements.

3. The XGboost model is adopted to detect the abnormal electricity consumption behavior of the terminal user, and the XGboost model has better learning performance. The abnormal user detection capability is greatly improved, and when the terminal user has abnormal power utilization conditions, accurate detection can be rapidly made.

4. The XGBoost algorithm supports parallelism at feature granularity. Before training, XGBoost sorts the values of the features in advance and stores them in a block structure that is reused in later iterations, which greatly reduces the amount of computation and shortens the time the model needs to detect abnormal electricity consumption behaviors.

5. To overcome the shortcomings of traditional optimization methods such as cross validation and grid search, the method adopts a genetic algorithm to optimize the parameters of the XGBoost detection model. Several XGBoost hyper-parameters are optimized simultaneously, and an abnormal electricity consumption detection model with excellent performance is finally obtained, improving the detection accuracy.

Referring to fig. 3, a block diagram of a device for detecting abnormal electricity consumption behavior of a user based on an XGBoost model according to the present application is shown.

As shown in fig. 3, the apparatus 200 for detecting abnormal electricity consumption behavior of a user includes a processing module 210, a training module 220, an optimizing module 230, and a determining module 240.

The processing module 210 is configured to, in response to acquired user raw data, perform data preprocessing on the user raw data based on an edge computing node to obtain optimized electricity consumption data, where the user raw data include user historical electricity consumption data and terminal equipment abnormal electricity consumption records. The training module 220 is configured to, in response to the acquired optimized electricity consumption data, train a training model and generate an XGBoost detection model. The optimization module 230 is configured to perform parameter optimization on the XGBoost detection model based on an improved genetic algorithm to determine the optimal hyper-parameter combination of the XGBoost detection model, where the improved genetic algorithm includes exponentially decaying crossover and mutation probabilities. The expression of the exponentially decaying crossover probability involves the crossover probability, the average fitness of the population, the larger fitness of the two individuals to be crossed, constants, the attenuation coefficient, and the current genetic iteration number; the expression of the exponentially decaying mutation probability involves the mutation probability, the fitness of the individual to be mutated, and constants. The steps of the improved genetic algorithm include constructing a fitness function whose expression involves a weight coefficient, the precision, the recall, and the number of hyper-parameters. The judging module 240 is configured to input data to be detected into the XGBoost detection model and to judge, based on the optimal hyper-parameter combination, whether a given item of optimized electricity consumption data is abnormal.

It should be understood that the modules depicted in fig. 3 correspond to various steps in the method described with reference to fig. 1. Thus, the operations and features described above for the method and the corresponding technical effects are also applicable to the modules in fig. 3, and are not described again here.

In other embodiments, an embodiment of the present invention further provides a non-volatile computer storage medium, where a computer-executable instruction is stored in the computer storage medium, and the computer-executable instruction may execute the method for detecting abnormal power consumption by a user in any of the above method embodiments;

as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:

responding to the acquired user original data, and performing data preprocessing on the user original data based on an edge computing node to enable the user original data to be optimized power utilization data, wherein the user original data comprise user historical power utilization data and terminal equipment abnormal power utilization records;

responding to the obtained optimized electricity consumption data, training a training model and generating an XGboost detection model;

performing parameter optimization on the XGboost detection model based on an improved genetic algorithm to determine the optimal hyper-parameter combination of the XGboost detection model;

and inputting the data to be detected into the XGboost detection model, and judging whether certain optimized electricity consumption data is abnormal or not based on the optimal hyper-parameter combination.

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 4, the electronic device includes: one or more processors 310 and a memory 320, one processor 310 being illustrated in fig. 4. The electronic device may further include: an input device 330 and an output device 340. The processor 310, the memory 320, the input device 330, and the output device 340 may be connected by a bus or other means, such as the bus connection in fig. 4. The memory 320 is a non-volatile computer-readable storage medium as described above. The processor 310 executes various functional applications and data processing of the server by running the nonvolatile software programs, instructions and modules stored in the memory 320, that is, the method for detecting abnormal power consumption behavior of the user in the embodiment of the method is implemented. The input device 330 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the user abnormal electricity usage behavior detection apparatus. The output device 340 may include a display device such as a display screen.

The product can execute the method provided by the embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiment of the present invention.

As an embodiment, the electronic device is applied to a device for detecting abnormal electricity consumption behavior of a user, and is used for a client, and the device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to:

responding to the acquired user original data, and performing data preprocessing on the user original data based on an edge computing node to enable the user original data to be optimized power utilization data, wherein the user original data comprise user historical power utilization data and terminal equipment abnormal power utilization records;

responding to the obtained optimized electricity consumption data, training a training model and generating an XGboost detection model;

performing parameter optimization on the XGboost detection model based on an improved genetic algorithm to determine the optimal hyper-parameter combination of the XGboost detection model;

and inputting the data to be detected into the XGboost detection model, and judging whether certain optimized electricity consumption data is abnormal or not based on the optimal hyper-parameter combination.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
