Non-intrusive load identification method based on XGBoost and Stacking model fusion
1. A non-intrusive load identification method based on XGBoost and Stacking model fusion, characterized in that: a plurality of XGBoost-based load identification base models (meta-models) are established and combined to form the base model layer of a Stacking fusion model, and a further XGBoost model is connected after that layer to fuse the base models, so that together they form a comprehensive identification system that improves the identification accuracy of the algorithm; the method specifically comprises the following steps:
Step1: processing the unbalanced samples, to address the loss of identification accuracy caused by imbalance between load samples of different types;
Step2: selecting features that are representative and generalize well;
Step3: establishing a plurality of XGBoost-based classification models with different parameters to construct the base model layer of the Stacking fusion model, and arranging another XGBoost model after the base model layer as the final classification model of the ensemble, thereby establishing a non-intrusive load identification system that fuses XGBoost with the Stacking model.
2. The method as claimed in claim 1, wherein the data is acquired and preprocessed to obtain information on the electrical equipment of users within the area, characterized in that:
residential electricity consumption data of a certain area is collected and preprocessed; when the classes in the samples are unbalanced, training fits mainly the majority classes, which reduces the identification accuracy of the algorithm; the minority-class samples are therefore oversampled with the KMeans SMOTE algorithm, that is, KMeans clustering is applied before SMOTE oversampling: in the clustering step, the samples are clustered into k groups using k-means; in the filtering step, the clusters to be oversampled are selected, retaining those with a high proportion of minority-class samples; the number of synthetic samples is then allocated, with more samples assigned to clusters in which the minority samples are sparsely distributed; finally, in the oversampling step, SMOTE is applied within each selected cluster to reach the target ratio of minority to majority instances.
3. The method as claimed in claim 2, wherein selecting features that are representative and generalize well is one of the keys to accurate load identification by the model, characterized in that:
the typical steady-state current, the odd harmonics of power, and the down-sampled current are selected as features; the typical steady-state current is calculated as follows: the complete steady-state current is segmented into cycles starting at a rising zero crossing of the current, and the values at corresponding index points in all cycles are summed and averaged to obtain the typical steady-state current data; the odd harmonics of power are calculated as follows: the current and voltage vectors are multiplied, a Fourier transform is applied, and the first 11 orders of odd harmonics are taken; the down-sampled current is calculated as follows: one data point is extracted for every D original points (that is, after every D-1 intervening points) as the sum of the preceding D data points, which reduces the dimensionality of the original current/voltage and the amount of computation, where D is the cycle length divided by the down-sampling target length.
4. The non-intrusive load identification method based on XGBoost and Stacking model fusion as claimed in claim 3, characterized in that:
the Stacking ensemble model is formed by fusing a plurality of different XGBoost models through ensemble learning, so as to improve the accuracy of non-intrusive load identification; to strengthen the learning capability of the model while limiting the complexity of the ensemble, the Stacking fusion system is designed with 2 layers; XGBoost models with different hyperparameters serve as the base models of the first layer, and another XGBoost model is placed in the second layer as the final classification model of the ensemble, so as to maximize the classification performance of XGBoost.
Background
As modern power systems grow increasingly complex, new energy systems built on interaction between the power grid and power users are gradually being developed with the aim of building smart grids. Non-intrusive load monitoring plays an important role as a means of improving the power demand side. As one of the key components of non-intrusive load monitoring, load identification is of great significance for guiding users' electricity use, improving electrical safety, and retiring aged electrical equipment.
Based on a non-intrusive model and aimed at the low load identification accuracy of a single model, the invention provides a non-intrusive load identification method based on XGBoost and Stacking model fusion.
Disclosure of Invention
The main purpose of the invention is to provide a non-intrusive load identification method based on XGBoost and Stacking model fusion.
The method comprises the following steps:
Step1: processing the unbalanced samples, to address the loss of identification accuracy caused by imbalance between load samples of different types;
Step2: selecting features that are representative and generalize well;
Step3: establishing a plurality of classification models with different parameters based on extreme gradient boosting (XGBoost) to construct the base model layer of the Stacking fusion model, and arranging another XGBoost model after the base model layer as the final classification model of the ensemble, thereby establishing a non-intrusive load identification system that fuses XGBoost with the Stacking model.
The invention discloses a non-intrusive load identification method based on XGBoost and Stacking model fusion. First, to address the loss of identification accuracy caused by imbalance between load samples of different types, the minority-class samples are oversampled. Next, features that are representative and generalize well are selected, which helps accurate load identification. A plurality of classification models with different parameters are then established based on extreme gradient boosting (XGBoost) to construct the base model layer of the Stacking fusion model, and another XGBoost model is placed after the base model layer as the final classification model of the ensemble, thereby establishing a non-intrusive load identification system that fuses XGBoost with the Stacking model. The technical effect is as follows: compared with a single model, the method integrates a plurality of XGBoost models through Stacking model fusion to form a comprehensive identification system, which improves the identification accuracy and has a certain application value.
Drawings
To help the reader understand the embodiments of this patent more clearly, the drawings referred to in the detailed description are briefly described below:
FIG. 1 is a schematic diagram of the Stacking ensemble learning method implemented by the invention.
FIG. 2 is a structural diagram of the non-intrusive load identification method based on XGBoost and Stacking model fusion implemented by the invention.
Detailed Description
The main purpose of the invention is to provide a non-intrusive load identification method based on XGBoost and Stacking model fusion.
The method comprises the following steps:
Step1: processing the unbalanced samples, to address the loss of identification accuracy caused by imbalance between load samples of different types;
Step2: selecting features that are representative and generalize well;
Step3: establishing a plurality of classification models with different parameters based on extreme gradient boosting (XGBoost) to construct the base model layer of the Stacking fusion model, and arranging another XGBoost model after the base model layer as the final classification model of the ensemble, thereby establishing a non-intrusive load identification system that fuses XGBoost with the Stacking model.
The specific steps of Step1 for processing the unbalanced samples are as follows:
Residential electricity consumption data of a certain area is collected and preprocessed. When the classes in the samples are unbalanced, training fits mainly the majority classes, which reduces the identification accuracy of the algorithm. The minority-class samples are therefore oversampled with the KMeans SMOTE algorithm, that is, KMeans clustering is applied before SMOTE oversampling. In the clustering step, the samples are clustered into k groups using k-means. In the filtering step, the clusters to be oversampled are selected, retaining those with a high proportion of minority-class samples. The number of synthetic samples is then allocated, with more samples assigned to clusters in which the minority samples are sparsely distributed. Finally, in the oversampling step, SMOTE is applied within each selected cluster to reach the target ratio of minority to majority instances.
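The clustering, filtering, allocation, and oversampling steps above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the implementation used by the invention: the function names (kmeans, smote_point, kmeans_smote), the farthest-first initialisation, the min_share threshold of 0.5, the sparsity measure (mean pairwise minority distance raised to the number of features, divided by the minority count), and the target of matching the largest class are all hypothetical choices made for the example, and only a single minority class is oversampled.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-first initialisation (an assumption made
    here to keep the sketch deterministic); returns one cluster index per point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties out
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def smote_point(minority_points, n_neighbors, rng):
    """SMOTE step: interpolate between a random minority point and one of its
    nearest minority neighbours inside the same cluster."""
    base = rng.choice(minority_points)
    others = sorted((p for p in minority_points if p is not base),
                    key=lambda p: math.dist(base, p))
    neighbour = rng.choice(others[:n_neighbors])
    gap = rng.random()
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))


def kmeans_smote(X, y, minority, k=2, n_neighbors=5, min_share=0.5, seed=0):
    """Oversample one minority class; X is a list of tuples, y a list of labels."""
    rng = random.Random(seed)
    labels = kmeans(X, k)
    counts = Counter(y)
    n_new = max(counts.values()) - counts[minority]  # assumed target: match the largest class

    # Filtering: keep clusters in which the minority class holds a high share.
    selected = []  # (sparsity, minority points) per retained cluster
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        mins = [X[i] for i in members if y[i] == minority]
        if len(mins) < 2 or len(mins) / len(members) <= min_share:
            continue
        pair_dists = [math.dist(a, b) for i, a in enumerate(mins) for b in mins[i + 1:]]
        mean_dist = sum(pair_dists) / len(pair_dists)
        selected.append((mean_dist ** len(X[0]) / len(mins), mins))
    if not selected:
        raise ValueError("no cluster is dominated by the minority class")

    # Allocation: sparser clusters receive more synthetic samples.
    total = sum(sparsity for sparsity, _ in selected)
    X_new = []
    for idx, (sparsity, mins) in enumerate(selected):
        remaining = n_new - len(X_new)
        share = n_new * sparsity / total if total > 0 else n_new / len(selected)
        count = remaining if idx == len(selected) - 1 else min(round(share), remaining)
        # Oversampling: apply SMOTE inside the selected cluster only.
        X_new += [smote_point(mins, n_neighbors, rng) for _ in range(count)]
    return X + X_new, y + [minority] * len(X_new)
```

Because synthetic points are interpolated only between minority samples of the same cluster, they stay inside regions the minority class already occupies. For real data, a maintained implementation such as the KMeansSMOTE class in the imbalanced-learn package is likely a better starting point than this sketch.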
The specific steps of Step2 for extracting the load features are as follows:
The typical steady-state current, the odd harmonics of power, and the down-sampled current are selected as candidate features.
The typical steady-state current is calculated as follows: the complete steady-state current is segmented into cycles starting at a rising zero crossing of the current, and the values at corresponding index points in all cycles are summed and averaged to obtain the typical steady-state current data.
The odd harmonics of power are calculated as follows: the current and voltage vectors are multiplied, a Fourier transform is applied, and the first 11 orders of odd harmonics are taken.
The down-sampled current is calculated as follows: one data point is extracted for every D original points (that is, after every D-1 intervening points) as the sum of the preceding D data points, which reduces the dimensionality of the original current/voltage and the amount of computation, where D is the cycle length divided by the down-sampling target length.
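The three feature calculations above can be illustrated with a short sketch. It is written under several assumptions that the text does not fix: the function names are hypothetical, the number of samples per cycle is known in advance, the power signal passed to the Fourier transform spans exactly one fundamental cycle (so DFT bin k is the k-th harmonic), amplitudes are scaled by 2/N, and "the first 11 orders" is read as the odd orders 1, 3, ..., 11 (the orders argument can be changed if the first eleven odd harmonics are intended instead).

```python
import cmath
import math


def typical_steady_state_current(current, samples_per_cycle):
    """Average whole cycles point by point, starting at the first rising zero crossing."""
    start = next(
        (i for i in range(1, len(current)) if current[i - 1] < 0 <= current[i]),
        None,
    )
    if start is None:
        raise ValueError("no rising zero crossing found")
    n_cycles = (len(current) - start) // samples_per_cycle
    if n_cycles == 0:
        raise ValueError("less than one full cycle after the zero crossing")
    cycles = [
        current[start + c * samples_per_cycle: start + (c + 1) * samples_per_cycle]
        for c in range(n_cycles)
    ]
    # zip(*cycles) groups the samples that share the same index within a cycle.
    return [sum(column) / n_cycles for column in zip(*cycles)]


def power_odd_harmonics(current, voltage, orders=(1, 3, 5, 7, 9, 11)):
    """Amplitudes of selected harmonic orders of p[n] = i[n] * v[n].

    Assumes both inputs cover exactly one fundamental cycle.
    """
    power = [i * v for i, v in zip(current, voltage)]
    n = len(power)
    amplitudes = []
    for k in orders:
        # Direct evaluation of DFT bin k (a plain DFT is enough for a few bins).
        coeff = sum(p * cmath.exp(-2j * math.pi * k * t / n) for t, p in enumerate(power))
        amplitudes.append(2 * abs(coeff) / n)
    return amplitudes


def downsample_by_block_sum(signal, target_length):
    """Emit one point per block of D samples as the sum of that block,
    with D = len(signal) // target_length."""
    d = len(signal) // target_length
    return [sum(signal[j * d: (j + 1) * d]) for j in range(target_length)]
```

For example, downsample_by_block_sum(list(range(8)), 4) uses D = 2 and returns [1, 5, 9, 13]. Summing rather than averaging follows the wording of the text; dividing each block sum by D would give the block mean instead.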
Step3 applies the non-intrusive load identification method based on XGBoost and Stacking model fusion, with the following specific steps:
Step3.1: dividing the original data set into a training set and a test set;
Step3.2: selecting k models, which may be the same or different, to build the base models of the first layer (the meta-model layer);
Step3.3: constructing a new training set from the N groups of classification results that the base model layer outputs for the training set, and using the new training set to train the second-layer identification model;
Step3.4: constructing a new test set from the N groups of classification results that the base model layer outputs for the test set, and feeding the new test set to the second-layer model to obtain the final identification result.
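The two-layer procedure above can be sketched generically. The sketch follows the steps literally: first-layer models are fitted on the training set, their predictions on the training and test sets become the new training and test sets, and the second-layer model is trained on the former and applied to the latter. To keep the example self-contained, two toy classifiers stand in for the XGBoost models; CentroidClassifier, VoteTableClassifier, and stack_fit_predict are hypothetical names, and the feature-column choice plays the role of "different parameters". Any objects exposing fit(X, y) and predict(X), such as XGBoost classifiers with different hyperparameters, could presumably be passed in their place.

```python
import math
from collections import Counter, defaultdict


class CentroidClassifier:
    """Toy stand-in for a first-layer model: nearest class centroid on chosen columns."""

    def __init__(self, columns):
        self.columns = columns

    def _pick(self, row):
        return [row[c] for c in self.columns]

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(self._pick(row))
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        return [
            min(self.centroids,
                key=lambda lab: math.dist(self._pick(row), self.centroids[lab]))
            for row in X
        ]


class VoteTableClassifier:
    """Toy stand-in for the second-layer model: for each pattern of first-layer
    predictions, remember the most frequent true label."""

    def fit(self, X, y):
        table = defaultdict(Counter)
        for row, label in zip(X, y):
            table[tuple(row)][label] += 1
        self.table = {pattern: counts.most_common(1)[0][0]
                      for pattern, counts in table.items()}
        return self

    def predict(self, X):
        # Unseen patterns fall back to a majority vote over the first-layer outputs.
        return [
            self.table.get(tuple(row), Counter(row).most_common(1)[0][0])
            for row in X
        ]


def stack_fit_predict(base_models, final_model, X_train, y_train, X_test):
    """Two-layer stacking: first-layer predictions become second-layer features."""
    for model in base_models:
        model.fit(X_train, y_train)
    # One column of predicted labels per first-layer model.
    new_train = list(zip(*(m.predict(X_train) for m in base_models)))
    new_test = list(zip(*(m.predict(X_test) for m in base_models)))
    final_model.fit(new_train, y_train)
    return final_model.predict(new_test)
```

Note that predicting on the same samples the first layer was trained on can let the second layer overfit; a common refinement, not stated in the text, is to build the new training set from out-of-fold predictions obtained by cross-validation.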
This patent has a degree of generality: equivalent transformations of the technical scheme made within the scope of the technical idea of the invention, and its direct or indirect application in other related technical fields, all fall within the scope of patent protection of the invention.