Transformer abnormal data detection method and device, computer equipment and storage medium

Document No.: 8550 Publication date: 2021-09-17

1. A transformer abnormal data detection method is characterized by comprising the following steps:

acquiring state data of the transformer;

screening the state data and extracting features from it to obtain a feature data set;

performing cluster analysis on the feature data set to obtain a clustering result;

and identifying abnormal data based on the clustering result to obtain and output a transformer abnormal data detection result.

2. The transformer abnormal data detection method according to claim 1, wherein the screening and feature extraction of the state data to obtain a feature data set comprises:

screening the state data by using a mutual information algorithm to obtain a data matrix;

and extracting features from the data matrix based on a principal component analysis algorithm to obtain a feature data set.

3. The transformer abnormal data detection method according to claim 2, wherein the extracting features of the data matrix based on a principal component analysis algorithm to obtain a feature data set comprises:

standardizing the data matrix, and obtaining a sample covariance matrix based on the standardized result;

performing eigenvalue decomposition on the covariance matrix to obtain eigenvalues;

calculating the cumulative variance contribution rate of each eigenvalue, and determining an initial principal component according to the cumulative variance contribution rate to obtain an initial principal component matrix;

calculating a multiple correlation coefficient between the original features in the data matrix and the initial principal component matrix, and determining a supplementary principal component according to the multiple correlation coefficient;

and obtaining a feature data set according to the initial principal component and the supplementary principal component.

4. The transformer abnormal data detection method according to claim 1, wherein the performing cluster analysis on the characteristic data set to obtain a cluster result comprises:

and carrying out clustering analysis on the characteristic data set by using a MapReduce parallelization clustering algorithm to obtain a clustering result.

5. The method for detecting the abnormal data of the transformer according to claim 4, wherein the clustering analysis is performed on the characteristic data set by using a MapReduce parallelization clustering algorithm to obtain a clustering result, and the method comprises the following steps:

physically partitioning the feature data set to obtain a plurality of data sets, and sending each data set to a corresponding Map function node for key-value pair conversion;

in the Map stage, calculating, based on a clustering algorithm, the distance between each data point in each data set and the preset cluster centers, assigning each data point to the closest cluster to obtain new clusters, and sending the new clusters to the Reduce stage;

and in the Reduce stage, calculating the cluster centers of the new clusters based on the clustering algorithm until the objective function converges, to obtain a clustering result.

6. The transformer abnormal data detection method according to claim 4 or 5, wherein the clustering algorithm is a K-means algorithm.

7. The method for detecting the abnormal data of the transformer according to claim 1, wherein after the abnormal data is judged based on the cluster analysis result to obtain and output the detection result of the abnormal data of the transformer, the method further comprises:

and outputting early warning information when abnormal data exists.

8. An abnormal data detection device for a transformer, comprising:

the acquisition module is used for acquiring state data of the transformer;

the extraction module is used for screening and extracting the characteristics of the state data to obtain a characteristic data set;

the clustering module is used for carrying out clustering analysis on the characteristic data set to obtain a clustering result;

and the judging module is used for judging the abnormal data based on the clustering result to obtain and output a transformer abnormal data detection result.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.

Background

The intelligent substation is used as a physical foundation of the intelligent power grid, is used as an information acquisition and command execution unit of an advanced dispatching center and is an important component of the intelligent power grid. The intelligent substation takes the primary and secondary equipment of the substation as digital objects, takes a high-speed network communication platform as a basis, realizes information sharing and interoperation inside and outside the substation by standardizing digital information, and realizes automatic functions such as measurement monitoring, control protection, information management and the like on the basis of network data. As an important component of the intelligent substation, the quality of the state data of the transformer directly affects the accuracy of state evaluation and decision making of the intelligent substation, so that abnormal data detection of the transformer is necessary.

According to the traditional transformer abnormal data detection method, the running state data of the transformer are collected, each state data is compared with the corresponding threshold range on the basis of the preset threshold range, and when the state data is not in the corresponding threshold range, the abnormal data is judged. That is, all the state data need to be compared and analyzed one by one, and then abnormality judgment is performed according to the comparison result. Due to the fact that the data volume of the transformer operation state data is large, the traditional transformer abnormal data detection method has the defect of low working efficiency.

Disclosure of Invention

In view of the above, it is necessary to provide a transformer abnormal data detection method, a transformer abnormal data detection device, a computer device, and a storage medium with high work efficiency.

A transformer abnormal data detection method comprises the following steps:

acquiring state data of the transformer;

screening the state data and extracting features from it to obtain a feature data set;

performing cluster analysis on the feature data set to obtain a clustering result;

and identifying abnormal data based on the clustering result to obtain and output a transformer abnormal data detection result.

In one embodiment, the screening and feature extraction of the state data to obtain a feature data set includes:

screening the state data by using a mutual information algorithm to obtain a data matrix;

and extracting features from the data matrix based on a principal component analysis algorithm to obtain a feature data set.

In one embodiment, the extracting features of the data matrix based on the principal component analysis algorithm to obtain a feature data set includes:

standardizing the data matrix, and obtaining a sample covariance matrix based on the standardized result;

performing eigenvalue decomposition on the covariance matrix to obtain eigenvalues;

calculating the cumulative variance contribution rate of each eigenvalue, and determining an initial principal component according to the cumulative variance contribution rate to obtain an initial principal component matrix;

calculating a multiple correlation coefficient between the original features in the data matrix and the initial principal component matrix, and determining a supplementary principal component according to the multiple correlation coefficient;

and obtaining a feature data set according to the initial principal component and the supplementary principal component.

In one embodiment, the performing a cluster analysis on the feature data set to obtain a clustering result includes:

and carrying out clustering analysis on the characteristic data set by using a MapReduce parallelization clustering algorithm to obtain a clustering result.

In one embodiment, the performing a clustering analysis on the feature data set by using a MapReduce parallelization clustering algorithm to obtain a clustering result includes:

physically partitioning the feature data set to obtain a plurality of data sets, and sending each data set to a corresponding Map function node for key-value pair conversion;

in the Map stage, calculating, based on a clustering algorithm, the distance between each data point in each data set and the preset cluster centers, assigning each data point to the closest cluster to obtain new clusters, and sending the new clusters to the Reduce stage;

and in the Reduce stage, calculating the cluster centers of the new clusters based on the clustering algorithm until the objective function converges, to obtain a clustering result.

In one embodiment, the clustering algorithm is a K-means algorithm.
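The Map and Reduce stages described above can be illustrated with a minimal single-process Python sketch. It is not the implementation of this application: the function names (`map_phase`, `reduce_phase`, `mapreduce_kmeans`), the number of splits, and the use of the largest center shift as a stand-in for convergence of the objective function are all assumptions made for illustration. The Map step here also emits per-cluster partial sums rather than raw points, which is a common combiner-style simplification and not something the text specifies.

```python
import numpy as np

def map_phase(split, centers):
    """Map: assign each point of one split to its nearest center and
    emit (cluster_index, (partial_sum, count)) key-value pairs."""
    # Squared Euclidean distance from every point to every center.
    d2 = ((split[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    pairs = []
    for k in np.unique(labels):
        members = split[labels == k]
        pairs.append((int(k), (members.sum(axis=0), len(members))))
    return pairs

def reduce_phase(pairs, centers):
    """Reduce: merge the partial sums per key and recompute each center."""
    new_centers = centers.copy()
    sums, counts = {}, {}
    for k, (s, c) in pairs:
        sums[k] = sums.get(k, 0) + s
        counts[k] = counts.get(k, 0) + c
    for k in sums:
        new_centers[k] = sums[k] / counts[k]
    return new_centers

def mapreduce_kmeans(data, centers, n_splits=4, tol=1e-6, max_iter=100):
    """Iterate Map and Reduce until the centers stop moving (assumed
    proxy for convergence of the K-means objective function)."""
    centers = np.asarray(centers, dtype=float)
    splits = np.array_split(np.asarray(data, dtype=float), n_splits)
    for _ in range(max_iter):
        pairs = [p for split in splits for p in map_phase(split, centers)]
        new_centers = reduce_phase(pairs, centers)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return centers
```

In a real MapReduce deployment each split would be processed on a separate node; here the splits are simply iterated in a loop to keep the sketch self-contained.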

In one embodiment, after the determining abnormal data based on the cluster analysis result to obtain and output the detection result of the abnormal data of the transformer, the method further includes:

and outputting early warning information when abnormal data exists.

An abnormal data detection device for a transformer, comprising:

the acquisition module is used for acquiring state data of the transformer;

the extraction module is used for screening and extracting the characteristics of the state data to obtain a characteristic data set;

the clustering module is used for carrying out clustering analysis on the characteristic data set to obtain a clustering result;

and the judging module is used for judging the abnormal data based on the clustering result to obtain and output a transformer abnormal data detection result.

In one embodiment, the extraction module comprises:

the screening unit is used for screening the state data by using a mutual information algorithm to obtain a data matrix;

and the characteristic extraction unit is used for extracting the characteristics of the data matrix based on a principal component analysis algorithm to obtain a characteristic data set.

In one embodiment, the feature extraction unit is specifically configured to:

standardizing the data matrix, and obtaining a sample covariance matrix based on the standardized result;

performing eigenvalue decomposition on the covariance matrix to obtain eigenvalues;

calculating the cumulative variance contribution rate of each eigenvalue, and determining an initial principal component according to the cumulative variance contribution rate to obtain an initial principal component matrix;

calculating a multiple correlation coefficient between the original features in the data matrix and the initial principal component matrix, and determining a supplementary principal component according to the multiple correlation coefficient;

and obtaining a feature data set according to the initial principal component and the supplementary principal component.

In one embodiment, the clustering module is specifically configured to: and carrying out clustering analysis on the characteristic data set by using a MapReduce parallelization clustering algorithm to obtain a clustering result.

In one embodiment, the clustering module comprises:

the segmentation unit is used for physically partitioning the feature data set to obtain a plurality of data sets, and sending each data set to a corresponding Map function node for key-value pair conversion;

the data point marking unit is used for calculating, in the Map stage and based on a clustering algorithm, the distance between each data point in each data set and the preset cluster centers, assigning each data point to the closest cluster to obtain new clusters, and sending the new clusters to the Reduce stage;

and the clustering result generating unit is used for calculating, in the Reduce stage, the cluster centers of the new clusters based on the clustering algorithm until the objective function converges, to obtain a clustering result.

In one embodiment, the clustering algorithm is a K-means algorithm.

In one embodiment, the apparatus for detecting abnormal data of a transformer further includes:

and the early warning module is used for outputting early warning information when abnormal data exists.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

acquiring state data of the transformer;

screening the state data and extracting features from it to obtain a feature data set;

performing cluster analysis on the feature data set to obtain a clustering result;

and identifying abnormal data based on the clustering result to obtain and output a transformer abnormal data detection result.

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

acquiring state data of the transformer;

screening the state data and extracting features from it to obtain a feature data set;

performing cluster analysis on the feature data set to obtain a clustering result;

and identifying abnormal data based on the clustering result to obtain and output a transformer abnormal data detection result.

According to the transformer abnormal data detection method, the transformer abnormal data detection device, the computer equipment and the storage medium, the state data of the transformer is firstly screened and the characteristics are extracted to obtain the characteristic data set, which is equivalent to dimension reduction processing on the state data, so that the data volume of the subsequent processing process can be reduced, and the work efficiency is improved.

Drawings

FIG. 1 is a flow chart of a method for detecting abnormal data of a transformer according to an embodiment;

FIG. 2 is a schematic diagram illustrating types of data anomalies in an embodiment;

FIG. 3 is a flowchart of a method for detecting abnormal data of a transformer according to another embodiment;

FIG. 4 is a flow diagram illustrating the screening and feature extraction of status data to obtain a feature data set according to an embodiment;

FIG. 5 is a flowchart illustrating feature extraction of a data matrix based on a principal component analysis algorithm to obtain a feature data set according to an embodiment;

FIG. 6 is a flow chart illustrating clustering performed on feature data sets to obtain clustering results in one embodiment;

FIG. 7 is a block diagram of an apparatus for detecting abnormal data of a transformer according to an embodiment;

FIG. 8 is a block diagram of an apparatus for detecting abnormal data of a transformer according to another embodiment;

FIG. 9 is a block diagram of the components of the extraction module in one embodiment;

FIG. 10 is a block diagram illustrating the components of the clustering module in one embodiment;

FIG. 11 is a block diagram that illustrates the components of the computer device in one embodiment.

Detailed Description

To facilitate an understanding of the present application, the present application will now be described more fully with reference to the accompanying drawings. Embodiments of the present application are set forth in the accompanying drawings. This application may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.

As intelligent substation construction continues to advance, the volume of data generated and stored keeps growing. Taking the operating state data of a transformer as an example, during data acquisition and transmission, problems such as sensor failure, gas circuit and electrical circuit faults, peak point deviation, device aging, and human error may cause anomalies such as data distortion, abrupt data changes, and isolated noise in the existing online monitoring system. Abnormal transformer data disturbs the stability of the data and degrades the accuracy and convergence rate of system state evaluation; it can also cause software such as system topology analysis, safety analysis, and reactive power optimization to run frequently, which greatly increases energy consumption. Moreover, because substation operation and maintenance personnel perform in-network scheduling according to the state data of the transformer, abnormal data inevitably affects the correctness of their judgments and decisions. They may even be unable to assess the true operating condition of the transformer, which can lead to a substation fire and a major safety accident.

Based on this, the application provides a transformer abnormal data detection method in a first aspect. In one embodiment, as shown in fig. 1, the method for detecting abnormal data of a transformer includes steps S200 to S800.

Step S200: acquiring state data of the transformer.

The state data of the transformer comprises sampling data of operating parameters of the transformer such as winding temperature, top layer oil temperature, gas content in oil, partial discharge and the like. The status data may be represented as time series sampled values of the respective operating parameter.

Step S400: screening the state data and extracting features from it to obtain a feature data set.

The characteristic data set is a data set comprising a plurality of characteristic parameters and state data corresponding to each characteristic parameter. Specifically, the state data is screened and feature extracted, so that feature parameters of the transformer can be obtained, and a feature data set containing state data corresponding to each feature parameter is further obtained. It should be noted that the obtained feature parameters are different according to different screening and feature extraction algorithms. For example, the operation parameters can be sorted according to the influence degree of the operation parameters of the transformer on the operation state of the transformer, then screening is performed, and N operation parameters in the top of the sorting are extracted as characteristic parameters to obtain a characteristic data set; and any one operation parameter can be used as a tag variable, the rest operation parameters are used as comparison variables, the correlation degree mining is carried out on the time series sampling values of the comparison variables and the tag variable, and the operation parameter with the correlation degree higher than a preset threshold value is extracted as a characteristic parameter.

Step S600: performing cluster analysis on the feature data set to obtain a clustering result.

Clustering is a process of classifying data into different classes or clusters, so that objects in the same cluster have great similarity, and objects in different clusters have great dissimilarity. Cluster analysis is a method of simplifying data through data modeling. The clustering method can be a systematic clustering method, a decomposition method, an addition method, a dynamic clustering method, ordered sample clustering, overlapped clustering, fuzzy clustering and the like.

Specifically, cluster analysis of the feature data set may proceed as follows: calculate the distance between each data point in the feature data set and the preset cluster centers, assign the points to new clusters, and calculate the cluster center of each new cluster to obtain the clustering result. It should be noted that the preset cluster centers can be calculated with a preset clustering algorithm from historical state data acquired while the transformer operates normally. The number of preset cluster centers is determined by the number of clusters formed after the historical state data are clustered. Further, the preset clustering algorithm may be, for example, a K-means algorithm.

Step S800: identifying abnormal data based on the clustering result to obtain and output a transformer abnormal data detection result.

Specifically, based on the clustering result, it can be determined whether the distance between each data point in the feature data set and the cluster center of its new cluster is greater than a preset distance threshold. When, among the data points corresponding to the same feature parameter, a data point lies farther from the center of its cluster than the preset distance threshold, that is, when an isolated point or outlier appears in the clustering result, that data point is judged to be abnormal data, and a corresponding transformer abnormal data detection result is output. The preset distance threshold can be determined from the data characteristics and operating characteristics of the transformer in combination with expert opinion. Further, the content of the transformer abnormal data detection result is not unique: for example, it may include only text indicating whether abnormal data exists, or it may also include the specific values of the abnormal data and the corresponding feature parameters. The output target of the detection result is likewise not unique and may be, for example, a memory, a display, or a terminal. The terminal includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
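The distance-threshold judgment in step S800 can be sketched as follows. This is a minimal illustration under stated assumptions: Euclidean distance is assumed, and the function name `flag_anomalies` and its arguments are hypothetical rather than taken from this application.

```python
import numpy as np

def flag_anomalies(points, centers, threshold):
    """Return each point's nearest-cluster label and a boolean mask that
    is True where the distance to that cluster center exceeds threshold."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Euclidean distance (assumed metric) from every point to every center.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    nearest = dists.min(axis=1)
    return labels, nearest > threshold
```

Points for which the mask is True correspond to the isolated points or outliers described above; how the threshold is chosen is left to the expert judgment the text mentions.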

In addition, in one embodiment, after the abnormal data is determined, the abnormal condition of the transformer data can be classified by combining the state data of the operation parameters corresponding to the abnormal data, and the classified result and the abnormal data monitoring result are output together.

Specifically, the main causes of abnormal transformer data include: data are not acquired synchronously; equipment in the system fails unexpectedly during data measurement or transmission; and the data measurement or transmission system fails unexpectedly because of interference from external environmental factors. On this basis, data anomalies can be classified according to the state data corresponding to the abnormal data. As shown in fig. 2, there are six types of data anomaly in total: data loss, data invariance, isolated noise, short-time variation, high noise value, and data mutation. When the state data contains blank values, the anomaly is classified as data loss; when the change in the state data is almost zero, it is classified as data invariance; when individual data points are separated from the rest or change abruptly, it is classified as isolated noise; when the state data changes greatly over a short period and then returns to normal, it is classified as short-time variation; when multiple measured values in the state data do not belong to any normal data cluster, it is classified as high noise value; and when the trend of the state data corresponding to a feature parameter is abnormal, it is classified as data mutation. Classifying the transformer data anomalies and outputting the classification result together with the abnormal data monitoring result makes it convenient for substation operation and maintenance personnel to carry out targeted rechecking and maintenance according to the anomaly class.

According to the transformer abnormal data detection method, the state data of the transformer is firstly screened and the characteristics are extracted to obtain the characteristic data set, which is equivalent to dimension reduction processing of the state data, so that the data volume of the subsequent processing process can be reduced, and the working efficiency is improved.

In one embodiment, as shown in fig. 3, after step S800, step S900 is further included: and outputting early warning information when abnormal data exists.

Wherein, the content and the output object of the early warning information are not unique. For example, the warning information may be text information describing the abnormal data, or may be prompt information combining sound, light, or sound and light. The output object of the early warning information can be a memory, a display or a terminal. Specifically, when abnormal data exists, the early warning information is output, operation and maintenance personnel can conveniently acquire the abnormal information in time to perform related processing, and the safety of the power system is further improved.

In one embodiment, as shown in fig. 4, step S400 includes step S420 and step S440.

Step S420: screening the state data by using a mutual information algorithm to obtain a data matrix.

Mutual information measures the degree of association between two random variables. Unlike the correlation coefficient, which can only capture the linear correlation between two random variables, mutual information can capture any statistical dependency between them. The larger the mutual information value, the more information the two variables share and the higher their degree of association.

The specific process of screening the state data by using a mutual information algorithm to obtain a data matrix is as follows:

First, a transformer state data matrix is generated from the transformer state data and normalized so that all data fall within the range [0, 1]. The normalization formula is:

$$R^{*} = \frac{R - R_{\min}}{R_{\max} - R_{\min}}$$

where $R$ is a sampled value before normalization, $R^{*}$ is the sampled value after normalization, $R_{\min}$ is the minimum of the sampled values for the same operating parameter, and $R_{\max}$ is the maximum of the sampled values for the same operating parameter.
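The normalization above can be sketched in a few lines of Python. The layout (rows are time samples, columns are operating parameters) and the handling of a constant column are assumptions for illustration; the text does not specify either.

```python
import numpy as np

def min_max_normalize(samples):
    """Scale each column (one operating parameter) into [0, 1]."""
    samples = np.asarray(samples, dtype=float)
    r_min = samples.min(axis=0)
    r_max = samples.max(axis=0)
    span = r_max - r_min
    # Assumption: a constant column is mapped to 0 to avoid dividing by zero.
    span_safe = np.where(span == 0, 1.0, span)
    return (samples - r_min) / span_safe
```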

Next, the state data corresponding to any one operating parameter is extracted from the state data matrix as the tag variable, and the state data corresponding to the remaining operating parameters are taken as judgment variables. One tag variable and one judgment variable are selected as a pair of random variables, denoted U and V, and their mutual information is calculated. Specifically, when U and V are discrete random variables, the mutual information of U and V is:

$$I(U, V) = \sum_{u \in U} \sum_{v \in V} p(u, v) \log \frac{p(u, v)}{p(u)\, p(v)}$$

where the mutual information value $I(U, V)$ lies in the range [0, 1], $p(u, v)$ is the joint probability distribution of U and V, and $p(u)$ and $p(v)$ are the marginal probability distributions of U and V, respectively.

When U and V are continuous random variables, the mutual information of U and V is:

$$I(U, V) = \int_{V} \int_{U} p(u, v) \log \frac{p(u, v)}{p(u)\, p(v)} \, du \, dv$$

The mutual information values between each judgment variable and each tag variable are calculated with the formulas above. Finally, according to how each mutual information value compares with a preset mutual information threshold, the operating parameters in the state data are screened to obtain the corresponding data matrix. The preset mutual information threshold can be determined from the data characteristics and operating characteristics of the transformer in combination with expert opinion.
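One way to estimate the discrete mutual information from sampled data is to bin the two series and apply the summation formula to the resulting histogram. The sketch below does this; the equal-width binning, the default bin count, and the function name `mutual_information` are assumptions. It returns the raw value in nats, so any rescaling into the [0, 1] range mentioned in the text is not shown.

```python
import numpy as np

def mutual_information(u, v, bins=10):
    """Histogram estimate of I(U, V) for two equally long sample series."""
    joint, _, _ = np.histogram2d(u, v, bins=bins)
    p_uv = joint / joint.sum()                 # joint distribution p(u, v)
    p_u = p_uv.sum(axis=1, keepdims=True)      # marginal p(u), shape (bins, 1)
    p_v = p_uv.sum(axis=0, keepdims=True)      # marginal p(v), shape (1, bins)
    mask = p_uv > 0                            # skip empty cells (0 * log 0 = 0)
    ratio = p_uv[mask] / (p_u @ p_v)[mask]
    return float((p_uv[mask] * np.log(ratio)).sum())
```

Two identical binary series give log 2, and two independent ones give 0, which matches the interpretation that a larger value means more shared information.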

Further, there may be one preset mutual information threshold or several. For example, a single threshold δ may be set, and the mutual information value I(U, V) between a judgment variable and the tag variable is compared with δ: if I(U, V) < δ, the variable is removed; otherwise it is retained, and the retained variables constitute the data matrix Z. Alternatively, several preset mutual information thresholds may be set to divide the range of mutual information values into intervals; the degree of correlation between a judgment variable and the tag variable is then obtained from the interval in which the mutual information value lies, and the variables are screened accordingly. For example, if the mutual information thresholds are set to 1/3 and 2/3, then the correlation is weak when I(U, V) ∈ [0, 1/3], medium when I(U, V) ∈ [1/3, 2/3], and strong when I(U, V) ∈ [2/3, 1]. Judgment variables with strong correlation are retained, those with weak correlation are removed, and those with medium correlation undergo secondary screening based on the mutual information algorithm until no medium-correlation variables remain. The variables finally retained constitute the data matrix Z.
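The two screening variants can be sketched as follows, starting from mutual information values that have already been computed. The function names and the example variable names are hypothetical. Because the intervals in the text share their endpoints, the sketch assigns a boundary value to the higher band, which is an assumption.

```python
def screen_variables(mi_by_variable, delta):
    """Single threshold: keep variables whose mutual information with the
    tag variable is not below delta."""
    return [name for name, mi in mi_by_variable.items() if mi >= delta]

def grade_correlation(mi, low=1/3, high=2/3):
    """Several thresholds: grade the correlation as weak, medium or strong.
    Boundary values fall into the higher band (assumed tie-breaking)."""
    if mi < low:
        return "weak"
    if mi < high:
        return "medium"
    return "strong"

# Hypothetical mutual information values for three judgment variables.
values = {"oil_temp": 0.8, "load": 0.2, "h2_content": 0.5}
kept = screen_variables(values, delta=0.5)
grades = {name: grade_correlation(mi) for name, mi in values.items()}
```

The secondary screening of medium-correlation variables is not shown, since the text does not detail how it is carried out.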

Step S440: performing feature extraction on the data matrix based on a principal component analysis algorithm to obtain a feature data set.

The principal component analysis is an analysis process for recombining a plurality of original variables with certain correlation into a group of new independent comprehensive variables to replace the original variables. Specifically, feature extraction is performed on the data matrix based on a principal component analysis algorithm, and variables possibly having correlation in the data matrix Z are converted into a group of linear uncorrelated variables through orthogonal transformation to obtain principal components, so that a corresponding feature data set is obtained.

In the above embodiment, the mutual information algorithm is used to perform primary screening on the state data, and then the principal component analysis algorithm is used to perform secondary screening to obtain the final feature data set, so that part of variables with low association degree can be removed, the data amount of subsequent cluster analysis is reduced, the cluster analysis efficiency is improved, and the overall detection speed is improved.

In one embodiment, as shown in fig. 5, step S440 includes steps S441 through S445.

Step S441: standardizing the data matrix, and obtaining a sample covariance matrix based on the standardized result.

Specifically, let the data matrix Z contain n samples and m features in total, written as $Z = \{Z_1, Z_2, \ldots, Z_n\}^{T}$, where $Z_i = \{z_{i1}, z_{i2}, \ldots, z_{im}\}$ and $z_{ij}$ is the jth feature of the ith sample. Standardizing the data matrix Z yields the corresponding standard data matrix $X = \{X_1, X_2, \ldots, X_n\}^{T}$, where $X_i = \{x_{i1}, x_{i2}, \ldots, x_{im}\}$ and $x_{ij}$ is the jth feature of the standardized ith sample. The standardization formula is:

$$x_{ij} = \frac{z_{ij} - \bar{z}_j}{s_j}$$

where $x_{ij}$ is the jth feature of the standardized ith sample, $\bar{z}_j$ is the sample mean of the jth feature in the original samples, and $s_j$ is the standard deviation of the jth feature in the original samples.

The sample covariance matrix S can be obtained from the standard data matrix X:

$$S = \frac{1}{n-1} X^T X$$
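As a rough sketch of step S441, the standardization and the sample covariance can be computed with the standard library alone. The function names, the toy matrix, and the choice of the divisor n - 1 for both the standard deviation and the covariance are assumptions made for illustration.

```python
import math

def standardize(Z):
    """Z-score each column of Z (a list of n rows with m features each)."""
    n, m = len(Z), len(Z[0])
    means = [sum(row[j] for row in Z) / n for j in range(m)]
    # Sample standard deviation (divisor n - 1); assumes no column is constant.
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in Z) / (n - 1))
        for j in range(m)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in Z]

def sample_covariance(X):
    """S = X^T X / (n - 1) for a matrix X whose columns are already centered."""
    n, m = len(X), len(X[0])
    return [
        [sum(row[a] * row[b] for row in X) / (n - 1) for b in range(m)]
        for a in range(m)
    ]

# Toy data (assumed): two perfectly correlated features on different scales.
Z = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X = standardize(Z)
S = sample_covariance(X)
print(S)
```

Because X is standardized, S here is also the correlation matrix of the original features, so its diagonal entries are 1.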

Step S442: perform eigenvalue decomposition on the covariance matrix to obtain eigenvalues.

The covariance matrix is a real symmetric matrix, and one of its main properties is that it can be orthogonally diagonalized, so it can be decomposed into eigenvectors and eigenvalues. Solving for the eigenvectors and eigenvalues of the covariance matrix is equivalent to fitting the straight line that preserves the maximum variance: each eigenvector gives a direction, the corresponding eigenvalue gives the variance along that direction, and the axes of largest variance indicate the directions in which the data varies most. On this basis, solving for the eigenvalues of the covariance matrix is equivalent to diagonalizing it, i.e., all off-diagonal elements are 0 and the eigenvalues are arranged on the diagonal from top to bottom in descending order, as follows.

Let $p_i$ (i = 1, 2, …, k) be a set of orthonormal basis vectors. After X is projected onto this basis to form a new data set T, the features of T are pairwise uncorrelated, and the first k principal components extracted should contain most of the information of the data matrix Z, i.e.:

$$t_i = X p_i, \quad i = 1, 2, \dots, k$$

where $p_i$ (i = 1, 2, …, k) is the eigenvector corresponding to the eigenvalue $\lambda_i$ (i = 1, 2, …, k) of the covariance matrix S, with $\lambda_1 > \lambda_2 > \dots > \lambda_k$, and the features $t_i \in R^n$ (i = 1, 2, …, k) of the new data set T are the new features obtained after the transformation.
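To make the eigenvector and projection step concrete, the sketch below finds only the largest eigenvalue and its eigenvector by power iteration and then projects the data onto it. Power iteration is a swapped-in technique chosen for brevity; the text does not say which eigen-solver is used. The function names and the toy matrices are hypothetical, and the sketch assumes the start vector is not orthogonal to the dominant eigenvector.

```python
import math

def dominant_eigenpair(S, iters=200):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix S (power iteration)."""
    m = len(S)
    v = [1.0] + [0.0] * (m - 1)  # assumed start vector
    for _ in range(iters):
        w = [sum(S[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Sv = [sum(S[a][b] * v[b] for b in range(m)) for a in range(m)]
    lam = sum(v[a] * Sv[a] for a in range(m))  # Rayleigh quotient
    return lam, v

def project(X, p):
    """Scores t = X p of every row of X along the direction p."""
    return [sum(x * pj for x, pj in zip(row, p)) for row in X]

# Toy symmetric matrix (assumed) with eigenvalues 3 and 1.
S = [[2.0, 1.0], [1.0, 2.0]]
lam, p1 = dominant_eigenpair(S)
t1 = project([[1.0, 1.0], [-1.0, -1.0]], p1)
print(lam, p1, t1)
```

A full decomposition would repeat this for the remaining eigenvectors or call a dedicated symmetric eigen-solver.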

Step S443: calculate the cumulative variance contribution rate of the eigenvalues, and determine the initial principal components according to the cumulative variance contribution rate to obtain an initial principal component matrix.

The cumulative variance contribution rate is:

$$\mathrm{CPV} = \frac{\sum_{i=1}^{l} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \times 100\%$$

where CPV is the cumulative contribution rate of the l principal components $t_1, t_2, \dots, t_l$, with l ≤ k.

The magnitude of the CPV value indicates how well $t_1, t_2, \dots, t_l$ summarize the m original variables, and thus reflects the accuracy of the principal component model. Further, the value of l at which the CPV reaches or exceeds 85% is taken as the number of initial principal components to obtain the initial principal component matrix, so that the accuracy of the principal component model meets the standard of the principal component analysis algorithm.
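The selection of l can be sketched as follows. The function name and the example eigenvalues are hypothetical, and the sketch assumes that the smallest l reaching the threshold is the one wanted and that the rate is taken over the sum of all supplied eigenvalues.

```python
def choose_component_count(eigenvalues, threshold=0.85):
    """Return (l, cpv): the smallest l whose cumulative variance contribution reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for l, lam in enumerate(ordered, start=1):
        running += lam
        if running / total >= threshold:
            return l, running / total
    return len(ordered), 1.0

# Assumed eigenvalues: the first covers 62.5%, the first two cover 87.5%.
l, cpv = choose_component_count([2.5, 1.0, 0.3, 0.2])
print(l, cpv)  # 2 0.875
```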

Step S444: calculate the multiple correlation coefficient between the original features in the data matrix and the initial principal component matrix, and determine the supplementary principal components according to the multiple correlation coefficient.

The multiple correlation coefficient (also rendered as the complex correlation coefficient) is an index that measures the degree of linear correlation between one dependent variable and a group of independent variables. The larger the multiple correlation coefficient, the closer the linear relationship between the variables.

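Specifically, the multiple correlation coefficient MCC between an original feature s in the data matrix S and $t_1, t_2, \dots, t_l$ may be expressed as:

$$\mathrm{MCC} = \operatorname{corr}(s, \hat{s}), \quad \hat{s} = \beta_0 + \beta_1 t_1 + \dots + \beta_l t_l$$

where $\beta_0, \beta_1, \dots, \beta_l$ are linear regression coefficients.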
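A hedged sketch of the multiple correlation coefficient follows. It relies on an assumption that holds for principal component scores: when the regressors $t_1, \dots, t_l$ are mutually uncorrelated, the squared multiple correlation equals the sum of the squared simple correlations, so no explicit regression is needed. This shortcut and the function names are illustrative and not taken from the text; with correlated regressors a least-squares fit would be required instead.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def multiple_correlation(s, components):
    """MCC of s on mutually uncorrelated components (assumption of this sketch)."""
    r2 = sum(pearson(s, t) ** 2 for t in components)
    return math.sqrt(min(r2, 1.0))  # clamp tiny floating-point overshoot

# Toy component scores (assumed), centered and uncorrelated with each other.
t1 = [1.0, -1.0, 1.0, -1.0]
t2 = [1.0, 1.0, -1.0, -1.0]
s_explained = [2.0, 0.0, 0.0, -2.0]    # equals t1 + t2
s_unrelated = [1.0, -1.0, -1.0, 1.0]   # orthogonal to both
print(multiple_correlation(s_explained, [t1, t2]))
print(multiple_correlation(s_unrelated, [t1, t2]))
```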

For the features in the data matrix S with serial numbers from l to m, the multiple correlation coefficient MCC between each original feature s and the initial principal component matrix, together with the average multiple correlation coefficient, is calculated, finally yielding an (m - l)-dimensional array mcc. The entries of mcc are checked one by one from the 1st to the (m - l)th, and the number of features at which the multiple correlation coefficient first exceeds a preset value is selected and denoted h, which gives the supplementary principal components. Further, in one embodiment, the preset value is set to 85% so that the accuracy meets the standard of the principal component analysis algorithm.

Step S445: obtain a feature data set according to the initial principal components and the supplementary principal components.

Specifically, the final principal components are determined from the initial principal components and the supplementary principal components obtained in the above steps, and the data matrix Z is then projected onto an (l + h)-dimensional subspace, yielding a feature data set of l + h dimensions.

In this embodiment, an improved principal component analysis algorithm is provided. It ensures both a strong correlation between the original data and the principal components and a high cumulative variance contribution rate, which helps improve the accuracy of the principal component analysis.

In one embodiment, step S600 includes: performing cluster analysis on the feature data set using a MapReduce parallelized clustering algorithm to obtain a clustering result.

MapReduce is a programming model for distributed systems, used for parallel computation over large-scale data sets. The MapReduce distributed computing framework mainly comprises two processing stages: the Map stage and the Reduce stage. The Map function in the Map stage and the Reduce function in the Reduce stage are defined by the user as required. The Map function processes the input data set and produces intermediate outputs, which the Reduce function then combines to derive the final result. Specifically, the MapReduce parallelized clustering algorithm runs the clustering algorithm on the MapReduce distributed computing framework, with data input and output as key/value pairs. In this way, the advantages of the clustering algorithm are retained, the problem of insufficient computing memory in big data anomaly detection is addressed, the computing speed is increased, and the security and reliability of the data are ensured.

In one embodiment, the clustering algorithm is a K-means algorithm.

The K-means algorithm is an iterative cluster analysis algorithm. The data is to be divided into K groups: K objects are randomly selected as initial cluster centers, the distance between each object and each cluster center is calculated, and each object is assigned to the nearest cluster center. A cluster center together with the objects assigned to it represents a cluster. Each time a sample is assigned, the cluster center is recalculated from the objects currently in the cluster. This process is repeated until a termination condition is met. The termination condition may be that no (or a minimum number of) objects are reassigned to different clusters, that no (or a minimum number of) cluster centers change, or that the sum of squared errors reaches a local minimum.

Specifically, let the clustering sample set be $Y = \{y_1, y_2, \dots, y_n\}$. Given the number of groups q (q ≤ n), the original data is divided into q categories denoted $\varepsilon_i$ (i = 1, 2, …, q), and q cluster centers $\delta_1, \delta_2, \dots, \delta_q$ are selected from the n data objects. Each cluster center is the arithmetic mean of the data objects in the same category:

$$\delta_i = \frac{1}{N_i} \sum_{y \in \varepsilon_i} y$$

where $N_i$ is the number of data objects in category $\varepsilon_i$.

The other data objects are assigned according to their similarity (distance) to the cluster centers, which can be calculated, for example, using the Euclidean distance formula:

$$d(y_j, \delta_i) = \lVert y_j - \delta_i \rVert_2$$

Each of these data objects is assigned to its closest cluster, a new cluster center (the mean of all objects in the cluster) is calculated, and the process is repeated until the standard measure function converges.

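In one embodiment, the standard measure function J is:

$$J = \sum_{i=1}^{q} \sum_{y \in \varepsilon_i} \lVert y - \delta_i \rVert^2$$

where J is the sum of the squared deviations of all data objects in the clustering sample set from their respective cluster centers.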
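The K-means procedure described above (random initial centers, nearest-center assignment by Euclidean distance, mean update, and the measure J) can be sketched in plain Python. This is a minimal single-machine illustration; the function name, the fixed random seed, and the stopping rule "centers no longer change" are assumptions.

```python
import math
import random

def kmeans(points, q, max_iter=100, seed=0):
    """Cluster tuples in `points` into q groups; returns (centers, clusters, J)."""
    rng = random.Random(seed)
    centers = rng.sample(points, q)  # q data objects as initial centers
    for _ in range(max_iter):
        clusters = [[] for _ in range(q)]
        for p in points:
            nearest = min(range(q), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # New center = arithmetic mean of the cluster; keep the old one if empty.
        new_centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    J = sum(
        math.dist(p, centers[i]) ** 2
        for i, c in enumerate(clusters)
        for p in c
    )
    return centers, clusters, J

# Toy data (assumed): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, clusters, J = kmeans(points, q=2)
print(sorted(centers), J)
```

On this toy data every choice of two initial centers leads to the same final partition, so the result does not depend on the seed; in general K-means only reaches a local minimum of J.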

In this embodiment, the K-means algorithm is used. Because the algorithm is simple, the efficiency of the cluster analysis is improved, which in turn improves the overall efficiency of the transformer abnormal data detection method.

In one embodiment, as shown in fig. 6, step S600 includes steps S610 to S630.

Step S610: physically split the feature data set to obtain a plurality of data sets, and send each data set to a corresponding Map function node for key-value pair conversion.

Specifically, the feature data set may be physically split according to a preset length to obtain a plurality of data sets; each data set is then randomly assigned to a node in the Map cluster, where key-value pair conversion is performed.

Step S620: in the Map stage, calculate the distance between the data points in each data set and the preset cluster centers based on a clustering algorithm, mark each data point to the closest cluster to obtain new clusters, and send the new clusters to the Reduce stage.

The preset cluster centers can be calculated, based on a preset clustering algorithm, from historical state data collected while the transformer operates normally. The preset clustering algorithm may be, for example, the K-means algorithm. To keep the algorithm consistent, the clustering algorithm in this step must be the same as the preset clustering algorithm. Taking the K-means algorithm as an example, in the Map stage the Euclidean distance between the data points in each data set and the preset cluster centers is calculated based on the K-means algorithm, each data point is marked to the closest cluster to obtain new clusters, and the new clusters are sent to the Reduce stage.

Step S630: in the Reduce stage, calculate the cluster centers of the new clusters based on the clustering algorithm until the objective function converges, to obtain a clustering result.

It can be understood that the clustering algorithm in step S630 is also the same as the preset clustering algorithm mentioned above. Taking the K-means algorithm as an example, the mean of all data points in each new cluster is calculated to obtain a new cluster center, and this is repeated until the objective function converges, giving the clustering result.
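Steps S610 to S630 can be imitated in a single process to show the data flow. The sketch below only simulates the Map and Reduce stages with ordinary functions and does not use a real MapReduce framework; the function names, the chunk length, and the stopping rule "centers no longer change" are assumptions.

```python
import math
from collections import defaultdict

def split(dataset, length):
    """S610: physically split the data set into chunks of at most `length` points."""
    return [dataset[i:i + length] for i in range(0, len(dataset), length)]

def map_phase(chunk, centers):
    """S620: emit (cluster_id, point) pairs, keyed by the nearest center."""
    return [
        (min(range(len(centers)), key=lambda c: math.dist(p, centers[c])), p)
        for p in chunk
    ]

def reduce_phase(pairs, centers):
    """S630: group points by key and recompute each center as the group mean."""
    groups = defaultdict(list)
    for key, point in pairs:
        groups[key].append(point)
    return [
        tuple(sum(x) / len(groups[i]) for x in zip(*groups[i]))
        if groups[i] else centers[i]
        for i in range(len(centers))
    ]

def mapreduce_kmeans(dataset, centers, chunk_len=2, max_iter=50):
    """Alternate Map and Reduce until the centers stop changing (max_iter >= 1)."""
    for _ in range(max_iter):
        pairs = [
            kv
            for chunk in split(dataset, chunk_len)
            for kv in map_phase(chunk, centers)
        ]
        new_centers = reduce_phase(pairs, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, pairs

# Toy data and preset centers (assumed).
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, pairs = mapreduce_kmeans(data, [(0.0, 0.0), (10.0, 10.0)])
print(centers, [key for key, _ in pairs])
```

In a real deployment each chunk would be handled by a separate Map node and the grouping by key would be done by the framework's shuffle step.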

In this embodiment, a specific process is provided for clustering the extracted feature values with the MapReduce parallelized clustering algorithm. The advantages of the clustering algorithm are thereby retained, the problem of insufficient computing memory in big data anomaly detection is addressed, the operation speed is increased, and the security and reliability of the data are ensured.

It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in each flowchart may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In a second aspect of the present application, a device for detecting abnormal data of a transformer is provided. As shown in fig. 7, the transformer abnormal data detecting apparatus includes: an obtaining module 200, configured to obtain state data of a transformer; an extraction module 400, configured to perform screening and feature extraction on the state data to obtain a feature data set; a clustering module 600, configured to perform clustering analysis on the feature data set to obtain a clustering result; and the judging module 800 is configured to judge the abnormal data based on the clustering result, obtain a transformer abnormal data detection result, and output the transformer abnormal data detection result.

In one embodiment, as shown in fig. 8, the transformer abnormal data detection device further includes an early warning module 900, configured to output early warning information when abnormal data exists.

In one embodiment, as shown in FIG. 9, the extraction module 400 includes: a screening unit 410, configured to screen the state data by using a mutual information algorithm to obtain a data matrix; and the feature extraction unit 420 is configured to perform feature extraction on the data matrix based on a principal component analysis algorithm to obtain a feature data set.

In one embodiment, the feature extraction unit 420 is specifically configured to: carrying out standardization processing on the data matrix, and obtaining a sample covariance matrix based on a standardization processing result; performing eigenvalue decomposition on the covariance matrix to obtain an eigenvalue; calculating the cumulative variance contribution rate of each characteristic value, and determining an initial principal component according to the cumulative variance contribution rate to obtain an initial principal component matrix; calculating a complex correlation coefficient of the original features in the data matrix and the initial principal component matrix, and determining a supplementary principal component according to the complex correlation coefficient; and obtaining a characteristic data set according to the initial principal component and the supplementary principal component.

In one embodiment, the clustering module 600 is specifically configured to: and performing clustering analysis on the characteristic data set by using a MapReduce parallelization clustering algorithm to obtain a clustering result.

In one embodiment, as shown in FIG. 10, the clustering module 600 includes: a dividing unit 610, configured to physically split the feature data set to obtain a plurality of data sets and send each data set to a corresponding Map function node for key-value pair conversion; a data point marking unit 620, configured to calculate, in the Map stage, the distance between the data points in each data set and the preset cluster centers based on a clustering algorithm, mark each data point to the closest cluster to obtain new clusters, and send the new clusters to the Reduce stage; and a clustering result generating unit 630, configured to calculate, in the Reduce stage, the cluster centers of the new clusters based on the clustering algorithm until the objective function converges, so as to obtain a clustering result.

In one embodiment, the clustering algorithm is a K-means algorithm.

For specific limitations of the transformer abnormal data detection device, reference may be made to the above limitations of the transformer abnormal data detection method, which are not repeated here. Each module in the transformer abnormal data detection device may be implemented wholly or partly by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing status data of the transformer. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a transformer anomaly data detection method.

Those skilled in the art will appreciate that the architecture shown in fig. 11 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, which includes a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to realize the steps of the transformer abnormal data detection method.

In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of the transformer anomaly data detection method as described above.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above examples express only several embodiments of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
