Data mining method for detecting gas theft using fractal calculation

Document No.: 7799. Published: 2021-09-17.

1. A data mining method for detecting gas theft using fractal calculation, characterized in that the method comprises the following steps:

step S1: extracting the data of a plurality of gas meters and preprocessing all the data to form a data set S to be screened, each element of the data set S carrying the meter number of one gas meter;

step S2: traversing each element of the data set S, performing linear correlation processing on the data of each gas meter according to its meter number, and calculating a fractal curve;

step S3: performing hierarchical clustering on the fractal curves to obtain, as isolated points, the gas meters exhibiting gas theft behavior.

2. The data mining method for detecting gas theft using fractal calculation according to claim 1, characterized in that the step of extracting the data of a plurality of gas meters and preprocessing all the data comprises:

removing, from the data of the plurality of gas meters, the data with insufficient valid data and the unstable data, to form the data set S to be screened.

3. The data mining method for detecting gas theft using fractal calculation according to claim 1, characterized in that the step of performing linear correlation processing on the data of each gas meter according to its meter number comprises:

for each gas meter, querying the last reading of each day according to the meter number to form a reading data set R1 of the gas meter, each element of the reading data set R1 having the format [current time, reading, daily gas usage];

removing from the reading data set R1 the elements whose daily gas usage is zero;

querying the reading start time of the gas meter and using it to convert each element of the reading data set R1 into the format [number of days between the current time and the reading start time, reading], thereby forming a reading data set R2:

R2 = {[x1, y1], [x2, y2], ..., [xj, yj], ..., [xn, yn]};

where xj is the number of days between time j and the reading start time, and yj is the reading at time j;

calculating the linear correlation coefficient ρ of the set X and the set Y from the reading data set R2:

$$\rho = \frac{\frac{1}{n}\sum_{j=1}^{n}(x_j-\bar{x})(y_j-\bar{y})}{\sigma_X\,\sigma_Y}$$

where $\bar{x}$ is the average of the set X = {x1, x2, ..., xn}, $\bar{y}$ is the average of the set Y = {y1, y2, ..., yn}, $\sigma_X$ is the standard deviation of the set X, and $\sigma_Y$ is the standard deviation of the set Y.

4. The data mining method for detecting gas theft using fractal calculation according to claim 3, characterized in that the step of calculating the fractal curve comprises:

setting a threshold t and removing from the data set S the gas meters whose linear correlation coefficient ρ > t, so as to remove the data in which time and gas meter reading are strongly correlated;

calculating the fractal dimension for the gas meters whose linear correlation coefficient ρ ≤ t: setting a set of time intervals F = {f, 2f, 3f, ..., kf, ..., mf}, where f is the time interval and kf is the date of the k-th time interval, and calculating the gas usage curve length v of the gas meter with each element of the set F;

if the gas meter comprises n straight line segments at the date of any time interval, the gas usage curve length v on the time scale of that time interval is:

$$v = \sum_{i=2}^{n+1}\sqrt{(t_i-t_{i-1})^2+(r_i-r_{i-1})^2}$$

where $t_i$ is the number of days of any one time interval, $t_{i-1}$ is the number of days of the preceding time interval, $r_i$ is the reading on the day of that time interval, and $r_{i-1}$ is the reading on the day of the preceding time interval;

taking the logarithm ln(f) of each time interval f and the logarithm ln(v) of the corresponding gas usage curve length v, and plotting ln(v) against ln(f) to obtain the f-v fractal curve.

5. The data mining method for detecting gas theft using fractal calculation according to claim 4, characterized in that the step of performing hierarchical clustering on the fractal curves to obtain, as isolated points, the gas meters exhibiting gas theft behavior comprises:

denoting the fractal curve of the i-th gas meter by $v_i$, so that the N gas meters form a set V = {v1, v2, ..., vN}, and standardizing the set V:

$$p_i = \frac{v_i-\min(V)}{\max(V)-\min(V)}$$

where max(V) is the maximum value in the set V and min(V) is the minimum value in the set V;

forming a set P = {p} from the standardized fractal curves of all the gas meters;

performing hierarchical clustering on the set P to find the isolated points.

Background

The widespread use of gas has brought great convenience to industry and daily life, but gas theft in its many forms remains persistent. It causes heavy economic losses for gas companies and creates hidden dangers to public safety. Because gas theft is concealed and its methods keep changing, traditional approaches such as inspections and spot checks struggle to detect it promptly and efficiently. For example, in the northern heating season gas consumption is high, and the economic loss caused by theft is measured in tens of millions. How to use the existing historical data of gas meters to uncover gas theft is therefore a problem of considerable practical value.

Disclosure of Invention

The invention aims to use the existing historical data of gas meters to uncover gas theft based on fractal dimension analysis, and provides a data mining method for detecting gas theft using fractal calculation.

In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:

A data mining method for detecting gas theft using fractal calculation, characterized in that the method comprises the following steps:

step S1: extracting the data of a plurality of gas meters and preprocessing all the data to form a data set S to be screened, each element of the data set S carrying the meter number of one gas meter;

step S2: traversing each element of the data set S, performing linear correlation processing on the data of each gas meter according to its meter number, and calculating a fractal curve;

step S3: performing hierarchical clustering on the fractal curves to obtain, as isolated points, the gas meters exhibiting gas theft behavior.

The step of extracting the data of a plurality of gas meters and preprocessing all the data comprises:

removing, from the data of the plurality of gas meters, the data with insufficient valid data and the unstable data, to form the data set S to be screened.

The step of performing linear correlation processing on the data of each gas meter according to its meter number comprises:

for each gas meter, querying the last reading of each day according to the meter number to form a reading data set R1 of the gas meter, each element of the reading data set R1 having the format [current time, reading, daily gas usage];

removing from the reading data set R1 the elements whose daily gas usage is zero;

querying the reading start time of the gas meter and using it to convert each element of the reading data set R1 into the format [number of days between the current time and the reading start time, reading], thereby forming a reading data set R2:

R2 = {[x1, y1], [x2, y2], ..., [xj, yj], ..., [xn, yn]};

where xj is the number of days between time j and the reading start time, and yj is the reading at time j;

calculating the linear correlation coefficient ρ of the set X and the set Y from the reading data set R2:

$$\rho = \frac{\frac{1}{n}\sum_{j=1}^{n}(x_j-\bar{x})(y_j-\bar{y})}{\sigma_X\,\sigma_Y}$$

where $\bar{x}$ is the average of the set X = {x1, x2, ..., xn}, $\bar{y}$ is the average of the set Y = {y1, y2, ..., yn}, $\sigma_X$ is the standard deviation of the set X, and $\sigma_Y$ is the standard deviation of the set Y.

The step of calculating the fractal curve comprises:

setting a threshold t and removing from the data set S the gas meters whose linear correlation coefficient ρ > t, so as to remove the data in which time and gas meter reading are strongly correlated;

calculating the fractal dimension for the gas meters whose linear correlation coefficient ρ ≤ t: setting a set of time intervals F = {f, 2f, 3f, ..., kf, ..., mf}, where f is the time interval and kf is the date of the k-th time interval, and calculating the gas usage curve length v of the gas meter with each element of the set F;

if the gas meter comprises n straight line segments at the date of any time interval, the gas usage curve length v on the time scale of that time interval is:

$$v = \sum_{i=2}^{n+1}\sqrt{(t_i-t_{i-1})^2+(r_i-r_{i-1})^2}$$

where $t_i$ is the number of days of any one time interval, $t_{i-1}$ is the number of days of the preceding time interval, $r_i$ is the reading on the day of that time interval, and $r_{i-1}$ is the reading on the day of the preceding time interval;

taking the logarithm ln(f) of each time interval f and the logarithm ln(v) of the corresponding gas usage curve length v, and plotting ln(v) against ln(f) to obtain the f-v fractal curve.

The step of performing hierarchical clustering on the fractal curves to obtain, as isolated points, the gas meters exhibiting gas theft behavior comprises:

denoting the fractal curve of the i-th gas meter by $v_i$, so that the N gas meters form a set V = {v1, v2, ..., vN}, and standardizing the set V:

$$p_i = \frac{v_i-\min(V)}{\max(V)-\min(V)}$$

where max(V) is the maximum value in the set V and min(V) is the minimum value in the set V;

forming a set P = {p} from the standardized fractal curves of all the gas meters;

performing hierarchical clustering on the set P to find the isolated points.

The beneficial effects of the invention are as follows:

The fractal-dimension-based principle for monitoring gas theft is to take the time interval as the measuring ruler and examine the fractal curve of the gas consumption K line. On this basis, gas users with unusual fractal characteristics are located by hierarchical clustering based on Euclidean distance, thereby uncovering the users who steal gas.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.

FIG. 1 is a flow chart of a data mining method of the present invention;

FIG. 2 is a schematic view of a gas consumption K line of a gas meter having only one gas consumption mode according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a gas consumption K line containing straight line segments that can be linearly approximated, according to an embodiment of the present invention;

FIG. 4(a) is a schematic diagram of converting the linearly approximable straight line segments of the gas consumption K line into time scales, according to an embodiment of the present invention;

FIG. 4(b) is a schematic diagram of calculating the gas usage curve length v on the time scales expressed as time intervals, according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of measuring the degree of bending of a coastline curve, according to an embodiment;

FIG. 6 is a schematic diagram of an f-v fractal curve according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of hierarchical clustering according to an embodiment of the present invention;

FIG. 8(a) is a schematic diagram of one case of removing an element whose gas usage is zero, according to an embodiment of the present invention;

FIG. 8(b) is a schematic diagram of another case of removing an element whose gas usage is zero, according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Also, in the description of the present invention, the terms "first", "second", and the like are used for distinguishing between descriptions and not necessarily for describing a relative importance or implying any actual relationship or order between such entities or operations.

Example:

The invention is realized by the following technical scheme. As shown in FIG. 1, a data mining method for detecting gas theft using fractal calculation specifically comprises the following steps:

Step S1: extracting the data of a plurality of gas meters and preprocessing all the data to form a data set S to be screened, each element of the data set S carrying the meter number of one gas meter.

The data with insufficient valid data and the unstable data are removed from the data of the plurality of gas meters to form the data set S to be screened, in which each element carries the meter number of one gas meter.

Insufficient valid data is mainly caused by data loss due to communication failures. For example, a project may require at least 60 records in the analysis period; if those 60 records contain more than two linear segments (see FIG. 3) and the longest straight line segment holds at least 30 valid records, the data can be considered valid. The criterion for valid data is chosen according to the actual situation.

From the readings of one day, a gas meter yields a daily gas consumption K line (candlestick) with an upper shadow (the maximum reading of the day), a lower shadow (the minimum reading of the day), a bottom line (the first reading of the day) and a top line (the last reading of the day). If the first reading is less than the last reading, the data is valid.

Data instability: (1) because the counter wheel of a gas meter increases monotonically, upper and lower shadows should not occur; (2) likewise, the first reading of a day should never exceed the last reading. A stability index can therefore be obtained for each gas meter by counting its upper and lower shadows, and a threshold (for example, 20%) is set to remove the gas meters with low data stability.
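The stability check can be sketched as follows. This is a minimal illustration, not the method's definitive implementation: the tuple layout `(first, last, lowest, highest)`, the function name `instability`, and the reading of the index as the fraction of anomalous days are all assumptions, and the sample readings are made up.

```python
def instability(days):
    """Fraction of days whose candlestick shows a shadow or a falling reading.

    Each day is a tuple (first, last, lowest, highest) of meter readings.
    The tuple layout and the ratio definition are assumptions for illustration.
    """
    if not days:
        raise ValueError("at least one day of readings is required")
    bad = 0
    for first, last, lowest, highest in days:
        has_upper_shadow = highest > max(first, last)
        has_lower_shadow = lowest < min(first, last)
        if has_upper_shadow or has_lower_shadow or first > last:
            bad += 1
    return bad / len(days)

# Made-up readings: day 2 has an upper shadow, day 3 has first > last.
days = [
    (100, 105, 100, 105),
    (105, 108, 105, 110),
    (108, 107, 107, 108),
    (107, 109, 107, 109),
]
ratio = instability(days)
keep_meter = ratio <= 0.20  # 20% is the example threshold mentioned above
print(ratio, keep_meter)
```

A meter whose ratio exceeds the chosen threshold would be dropped before the data set S is formed.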

Step S2: traversing each element of the data set S, performing linear correlation processing on the data of each gas meter according to its meter number, and calculating a fractal curve.

For each gas meter, the last reading of each day is queried according to the meter number to form a reading data set R1 of the gas meter. Each element of the reading data set R1 has the format [current time, reading, daily gas usage].

Assuming that there are N gas meters, the last reading of each day of the first gas meter is queried by its meter number to form that meter's reading data set R1. Each reading data set R1 holds the data of one gas meter, so N gas meters form N reading data sets R1.

Taking the first gas meter as an example, each element of its reading data set R1 has the format [current time, reading, daily gas usage]. For example, if the current time is June 6, 2020, the last reading on that day is 100, and the daily gas usage is 5, then the element of R1 is [2020/06/06, 100, 5]. If on the next day, June 7, 2020, the last reading is 103 and the daily gas usage is 3, then the reading data set is R1 = {[2020/06/06, 100, 5], [2020/06/07, 103, 3]}.

If the reading data set R1 contains elements whose daily gas usage is zero, those elements are removed. In this embodiment, a daily gas usage of zero may arise in the following cases, given as examples:

Referring to FIG. 8(a), the L2 segment reads 0 but the L3 segment returns to normal. Because L3 and L1 lie approximately on the same line, L2 is regarded as an accident, for example reading interference or a reading failure caused by low battery; once the interference ends, the reading returns to normal in L3. The data of the L2 segment must be removed, otherwise it would distort the subsequent calculation of the correlation coefficient.

Referring to FIG. 8(b), the L2 segment reads 0 and the L3 and L1 segments do not lie on the same line. Here L2 is regarded as a period of genuinely zero gas usage in normal operation, but the data of the L2 segment is still removed so that the correlation coefficient can be calculated properly.

A similar case is a gap in the readings, such as the one between 2015-07-02 and 2015-07-28 in FIG. 2; the elements of such a period also need to be removed.

The reading start time of the gas meter is queried and used to convert each element of the reading data set R1 into the format [number of days between the current time and the reading start time, reading], forming a reading data set R2:

R2 = {[x1, y1], [x2, y2], ..., [xj, yj], ..., [xn, yn]}, where xj is the number of days between time j and the reading start time, and yj is the reading at time j.

Because the initial reading of a gas meter may not be 0, the reading start time of the gas meter is queried. For example, if the reading start time of the first gas meter is June 1, 2020, then at the current time of June 6, 2020 the converted element is [5, 100]. As time passes, the reading data set R2 = {[x1, y1], [x2, y2], ..., [xj, yj], ..., [xn, yn]} is formed, where x1 corresponds to June 6, x2 corresponds to June 7, and xj is the number of days between time j and the reading start time, so that for consecutive days xj - x(j-1) = 1. yj is the meter reading at time j, not the gas usage accumulated from the start time to time j, since the reading at the start time may not be 0.

By way of example:

the reading start time of the first gas meter is June 1, 2020, and on June 6, 2020 the reading data set is R2 = {[5, 100]};

on June 7, 2020 the reading data set is R2 = {[5, 100], [6, 103]};

on June 8, 2020 the reading data set is R2 = {[5, 100], [6, 103], [7, 105]};

and so on.
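The conversion from R1 to R2 can be sketched in a few lines. This is only an illustration: the function name `to_r2` and the use of `datetime.date` objects are assumptions, and the daily usage of 2 on the third day is inferred from the readings 103 and 105 rather than stated in the text.

```python
from datetime import date

def to_r2(r1, start):
    """Convert R1 elements [date, reading, daily_usage] into
    R2 elements [days_since_start, reading].

    Elements whose daily usage is zero are dropped before the conversion.
    """
    r2 = []
    for day, reading, usage in r1:
        if usage == 0:
            continue  # zero-usage days are removed from R1
        r2.append([(day - start).days, reading])
    return r2

# Readings from the worked example; the reading start time is June 1, 2020.
r1 = [
    [date(2020, 6, 6), 100, 5],
    [date(2020, 6, 7), 103, 3],
    [date(2020, 6, 8), 105, 2],  # usage inferred as 105 - 103
]
r2 = to_r2(r1, date(2020, 6, 1))
print(r2)  # [[5, 100], [6, 103], [7, 105]]
```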

From the reading data set R2 = {[x1, y1], [x2, y2], ..., [xj, yj], ..., [xn, yn]}, a set X = {x1, x2, ..., xj, ..., xn} and a set Y = {y1, y2, ..., yj, ..., yn} are obtained. The linear correlation coefficient ρ of the set X and the set Y is then calculated:

$$\rho = \frac{\frac{1}{n}\sum_{j=1}^{n}(x_j-\bar{x})(y_j-\bar{y})}{\sigma_X\,\sigma_Y}$$

where $\bar{x}$ is the average of the set X = {x1, x2, ..., xn}, $\bar{y}$ is the average of the set Y = {y1, y2, ..., yn}, $\sigma_X$ is the standard deviation of the set X, and $\sigma_Y$ is the standard deviation of the set Y.
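As an illustration, the linear correlation coefficient can be computed from the averages and standard deviations of X and Y as below. The sketch assumes the standard Pearson form with population (divide by n) statistics; the function name `pearson` is hypothetical, and the sample values are the three R2 elements of the example.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences,
    using population (1/n) covariance and standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (std_x * std_y)

# R2 = {[5, 100], [6, 103], [7, 105]} from the example above.
xs = [5, 6, 7]
ys = [100, 103, 105]
rho = pearson(xs, ys)
print(round(rho, 4))
```

A value of ρ close to 1, as here, means the readings grow almost linearly with time.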

The fractal curve is then calculated as follows.

Each gas meter has its own reading data set R2, so a linear correlation coefficient ρ can be calculated for each gas meter, and N gas meters yield N linear correlation coefficients ρ. A threshold t is set, and the gas meters whose linear correlation coefficient ρ > t are removed from the data set S, so as to remove the data in which time and gas meter reading are strongly correlated.
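The threshold screening amounts to a simple filter. In the sketch below the meter numbers, the coefficients and the threshold value 0.95 are all made-up illustrations; the text does not prescribe a value for t.

```python
def select_candidates(rho_by_meter, t):
    """Keep the meters whose correlation coefficient does not exceed t.

    Meters with rho > t are treated as strongly linear and removed.
    """
    return {meter: rho for meter, rho in rho_by_meter.items() if rho <= t}

# Hypothetical meter numbers and coefficients.
rho_by_meter = {"M001": 0.998, "M002": 0.91, "M003": 0.97, "M004": 0.62}
candidates = select_candidates(rho_by_meter, t=0.95)
print(sorted(candidates))  # ['M002', 'M004']
```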

For a gas user with only one gas usage pattern (see FIG. 2), the gas consumption K line of the gas meter is a monotonically increasing curve that can be linearly approximated. Gas theft takes various forms, such as tapping the pipeline to bypass the metering device, or impairing the accuracy of the metering device by introducing foreign matter into the meter. In every case, however, the gas consumption K line is significantly lower on some time scale, and the linear relationship between gas consumption and time is broken. Meanwhile (see FIG. 3), the period in which the theft occurs forms a linear segment on a certain time scale, so the part and the whole are self-similar, which is the basis for fractal dimension analysis.

Fractal dimension can be used to measure complex shapes. It supplements the integer dimensions of Euclidean space and was given a precise definition by Hausdorff, hence the name Hausdorff dimension. For any geometric object F, define:

$$H_\delta^s(F) = \inf\left\{\sum_i |u_i|^s : \{u_i\}\ \text{is a}\ \delta\text{-cover of}\ F\right\}$$

where $\{u_i\}$ is a $\delta$-cover of F and $|u_i|$ is the length of each covering set. Letting $\delta \to 0$ gives $H^s(F) = \lim_{\delta \to 0} H_\delta^s(F)$, and the Hausdorff dimension is determined by the following infimum:

$$\dim_H F = \inf\{\, s \ge 0 : H^s(F) = 0 \,\}$$

Referring to FIG. 5, fractal dimension measures how intricately a curve bends. In short, it is a measure of dimension for complex geometric objects: from the viewpoint of fractal dimension, a point has dimension 0, a straight line has dimension 1, and a complex coastline has a non-integer dimension between 1 and 2.

A monotonically increasing curve that can be linearly approximated shows a strong correlation between time and gas meter reading and indicates that the gas user is not stealing gas. Therefore, to eliminate such strongly correlated data, a threshold t is set and the gas meters whose linear correlation coefficient ρ > t are removed from the data set S. FIG. 4(a) shows the linearly approximable straight line segments of the gas consumption K line of FIG. 3 converted into time scales. Time scale 1 runs from the initial point (t1, r1) to the end point (t3, r3); time scale 2 consists of the two straight line segments that meet at the point (t2, r2), namely from (t1, r1) to (t2, r2) and from (t2, r2) to (t3, r3).

To describe the calculation of the fractal dimension for the gas meters with linear correlation coefficient ρ ≤ t, the time scales converted in FIG. 4(a) are redrawn in FIG. 4(b). A set of time intervals F = {f, 2f, 3f, ..., kf, ..., mf} is set, where f is the time interval and kf is the date of the k-th time interval, and the gas usage curve length v of the gas meter is calculated with each element of the set F. In FIG. 4(b), the point (t1, r1) corresponds to the first time interval f, the point (t2, r2) to the second time interval 2f, and the point (t3, r3) to the third time interval 3f (the figure is only schematic, and the spacing on the abscissa may be unequal).

As an example, if the time interval f is 2 days, then the set of time intervals is F = {2, 4, 6, ..., 128}, and each element of F is used to calculate the gas usage curve length v. Taking F = {2, 4} as an example (see FIG. 4(b)), the point (t1, r1) is day 2, the point (t2, r2) is day 4, and the point (t3, r3) is day 6.

For time scale 1, which covers the entire time range of the readings with a single segment, the gas usage curve length v1 is:

$$v_1 = \sqrt{(t_3-t_1)^2+(r_3-r_1)^2}$$

For time scale 2, which consists of two straight line segments, the gas usage curve length v2 is:

$$v_2 = \sqrt{(t_2-t_1)^2+(r_2-r_1)^2}+\sqrt{(t_3-t_2)^2+(r_3-r_2)^2}$$

Therefore, if the gas meter comprises n straight line segments at the date of any time interval, the gas usage curve length v on the time scale of that time interval is:

$$v = \sum_{i=2}^{n+1}\sqrt{(t_i-t_{i-1})^2+(r_i-r_{i-1})^2}$$

where $t_i$ is the number of days of any one time interval, $t_{i-1}$ is the number of days of the preceding time interval, $r_i$ is the reading on the day of that time interval, and $r_{i-1}$ is the reading on the day of the preceding time interval. For example, when n = 2 the point (t2, r2) joins 2 straight line segments; for the first of them, $t_i$ is the number of days of the time interval at t2, which FIG. 4(b) shows to be 4 days, and $t_{i-1}$ is the number of days of the time interval at t1, which is 2 days, so $r_i$ is the reading on day 4 and $r_{i-1}$ is the reading on day 2.
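The curve length at two time scales can be illustrated with a short sketch. It assumes that v is the Euclidean length of the polyline through the sampled (day, reading) points, which is one natural reading of the straight line segments in FIG. 4; the function name `curve_length` and the reading values are made up for illustration.

```python
import math

def curve_length(points):
    """Sum of the straight-line segment lengths between consecutive
    (day, reading) points."""
    return sum(
        math.hypot(t1 - t0, r1 - r0)
        for (t0, r0), (t1, r1) in zip(points, points[1:])
    )

# Hypothetical readings on day 2, day 4 and day 6.
p1, p2, p3 = (2, 10.0), (4, 14.0), (6, 15.0)

v1 = curve_length([p1, p3])      # time scale 1: one segment from start to end
v2 = curve_length([p1, p2, p3])  # time scale 2: two segments meeting at p2
print(v1, v2)
```

By the triangle inequality v2 is never shorter than v1; the two are equal only when the three points are collinear, which is why a strictly linear meter shows no change in length across time scales.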

The logarithm ln(f) is then taken of each time interval f, and the logarithm ln(v) of the gas usage curve length v corresponding to each time interval f; plotting ln(v) against ln(f) gives the f-v fractal curve, as shown in FIG. 6.
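The construction of the f-v curve might be sketched as follows. The sampling rule is an assumption, since the text leaves it open: here each element of F is treated as a ruler size, the readings on days that are multiples of that size are kept, and the length of the polyline through them is measured. The names `polyline_length` and `fractal_curve` and the synthetic readings are hypothetical.

```python
import math

def polyline_length(points):
    """Euclidean length of the polyline through (day, reading) points."""
    return sum(
        math.hypot(t1 - t0, r1 - r0)
        for (t0, r0), (t1, r1) in zip(points, points[1:])
    )

def fractal_curve(points, intervals):
    """Return [(ln f, ln v)] for each interval f in `intervals`.

    Assumption: for ruler size f, only the readings on days that are
    multiples of f are kept. Intervals leaving fewer than two points
    are skipped because no length can be measured.
    """
    curve = []
    for f in intervals:
        sampled = [(t, r) for t, r in points if t % f == 0]
        if len(sampled) < 2:
            continue
        v = polyline_length(sampled)
        curve.append((math.log(f), math.log(v)))
    return curve

# Synthetic meter: 64 days of strictly linear usage, 2 units per day.
points = [(d, 2.0 * d) for d in range(1, 65)]
curve = fractal_curve(points, intervals=[2, 4, 8])
for ln_f, ln_v in curve:
    print(round(ln_f, 3), round(ln_v, 3))
```

The resulting list of (ln f, ln v) pairs is what would be plotted as the fractal curve of one meter.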

Step S3: performing hierarchical clustering on the fractal curves to obtain, as isolated points, the gas meters exhibiting gas theft behavior.

The fractal curve of the i-th gas meter is denoted by $v_i$, so that the N gas meters form a set V = {v1, v2, ..., vN}, and the set V is standardized:

$$p_i = \frac{v_i-\min(V)}{\max(V)-\min(V)}$$

where max(V) is the maximum value in the set V and min(V) is the minimum value in the set V.

The standardized fractal curves of all the gas meters form a set P = {p}.
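The standardization step can be illustrated with min-max scaling. The sketch assumes that each curve is a list of ln(v) values and that min(V) and max(V) are taken over all values of all curves; whether the extremes are global or per curve is not stated in the text, and the function name `normalize_curves` is hypothetical.

```python
def normalize_curves(curves):
    """Min-max scale every value of every curve into [0, 1],
    using the global minimum and maximum over all curves (an assumption)."""
    flat = [x for curve in curves for x in curve]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        raise ValueError("all values are identical; scaling is undefined")
    return [[(x - lo) / (hi - lo) for x in curve] for curve in curves]

# Two made-up curves, each a list of ln(v) values.
V = [[1.0, 3.0], [2.0, 5.0]]
P = normalize_curves(V)
print(P)  # [[0.0, 0.5], [0.25, 1.0]]
```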

Hierarchical clustering is performed on the set P, and the isolated points found are the gas meters exhibiting gas theft behavior. Referring to FIG. 7, gas meter 5 and gas meter 3 are the isolated points of the clustering. Hierarchical clustering expresses the correlation between the gas meters; in FIG. 7, with the dashed line as the boundary, gas meters 2, 4, 1 and 6 form one class and gas meters 5 and 3 form another. The position of the dashed line is chosen according to the situation.
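A minimal sketch of the clustering step is given below. It uses agglomerative clustering with Euclidean distance, as the description states, but the single-linkage rule, the cut distance of 0.2 (standing in for the dashed line of FIG. 7), and the curve values are assumptions made for illustration; the meter ids are arranged to mirror the grouping of FIG. 7.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length curves."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(curves, cut):
    """Agglomerative clustering that merges the two closest clusters until
    the closest pair is farther apart than `cut`.

    `curves` maps a meter id to its standardized curve.
    Returns a list of clusters, each a sorted list of meter ids.
    """
    clusters = [[meter] for meter in curves]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    euclidean(curves[a], curves[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cut:
            break  # remaining clusters are separated by the cut line
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Made-up standardized curves for six meters.
curves = {
    1: [0.10, 0.20, 0.30],
    2: [0.11, 0.21, 0.31],
    3: [0.80, 0.90, 0.95],
    4: [0.12, 0.19, 0.29],
    5: [0.82, 0.88, 0.97],
    6: [0.09, 0.22, 0.30],
}
clusters = single_linkage(curves, cut=0.2)
largest = max(clusters, key=len)
isolated = sorted(m for c in clusters if c is not largest for m in c)
print(clusters, isolated)
```

Here the meters outside the largest cluster are reported as isolated; in practice the cut distance and the linkage rule would be tuned to the data.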

Thus, it will be appreciated by those skilled in the art that while embodiments of the invention have been illustrated and described in detail herein, many other variations or modifications can be made which conform to the principles of the invention, as may be directly determined or derived from the disclosure herein, without departing from the spirit and scope of the invention. Accordingly, the scope of the invention should be understood and interpreted to cover all such other variations or modifications.
