Hazardous area equipment data compression method and system based on big data analysis
1. A hazardous area equipment data compression method based on big data analysis, characterized by comprising the following steps:
selecting the time series of a plurality of states of the hazardous area equipment over a period of time to form a state set, and processing the time series in the state set to obtain a correlation sequence;
obtaining a prediction sequence of the correlation sequence from historical state data, and calculating the difference between the correlation sequence and the prediction sequence to obtain a prediction error sequence;
obtaining the associated compression degree of each time series in the state set, wherein the associated compression degree is calculated as follows:
wherein the quantities in the formula are: the associated compression degree of a time series in the state set; the feature distribution density of that time series; exp(), an exponential function with the natural constant as its base; F, the number of feature segments of the time series; the length proportion of the f-th feature segment of the time series; and the sum of squares of the elements of the prediction error sequence at the positions corresponding to the f-th feature segment;
selecting the time series with the largest associated compression degree in the state set as the compression sequence corresponding to the state set;
and obtaining an optimal compression scheme according to the associated compression degrees of the compression sequences corresponding to different state sets.
2. The hazardous area equipment data compression method based on big data analysis according to claim 1, wherein processing the time series in the state set to obtain the correlation sequence comprises:
taking pairwise differences of the time series in the state set to obtain a plurality of difference sequences, and summing the element values at the same positions of the difference sequences to obtain the correlation sequence of the state set.
3. The hazardous area equipment data compression method based on big data analysis according to claim 1, wherein obtaining the prediction sequence of the correlation sequence from the historical state data comprises:
taking the current analysis period as a first period, and obtaining the prediction sequence of the correlation sequence of the first period from the correlation sequence of the period immediately preceding the first period.
4. The hazardous area equipment data compression method based on big data analysis according to claim 1, wherein the compressible amount of the compression sequence is obtained from the length of the compression sequence and the associated compression degree.
5. The hazardous area equipment data compression method based on big data analysis according to claim 1, wherein the compression method for the state set corresponding to the compression sequence comprises:
taking the sum of squares of the elements of the prediction error sequence at the positions corresponding to a feature segment of the compression sequence as a first coefficient, and taking the ratio of the length proportion of that feature segment within the time series to the first coefficient as the distribution probability of the feature segment; and selecting feature segments in turn according to the distribution probability, and deleting from the selected feature segment the element with the smallest absolute gradient, until the number of deleted elements reaches the compressible amount.
6. The hazardous area equipment data compression method based on big data analysis according to claim 1, wherein the time series of the original state data can be recovered from the compression result of the method by decompression:
obtaining the prediction sequence of the correlation sequence of the first period from the correlation sequence of the period immediately preceding the first period; taking pairwise differences of the time series in the state set to obtain an algebraic function of the compression sequence; constructing a target equation from the algebraic function and the prediction sequence and solving it to obtain a preliminary decompressed time series; and correcting the preliminary decompressed time series with the compression result to obtain the decompression result of the time series.
7. The hazardous area equipment data compression method based on big data analysis according to claim 1, wherein obtaining the optimal compression scheme according to the associated compression degrees of the compression sequences corresponding to different state sets comprises:
if the compression sequences corresponding to a plurality of state sets are the same, selecting the state set with the largest associated compression degree as the associated state set used when that compression sequence is compressed.
8. The hazardous area equipment data compression method based on big data analysis according to claim 1, wherein obtaining the optimal compression scheme according to the associated compression degrees of the compression sequences corresponding to different state sets comprises:
combining different state sets to obtain a plurality of combinations, the compression sequences corresponding to the state sets in a combination forming an alternative compression scheme, wherein a combination corresponding to an alternative compression scheme must satisfy the following requirements: the compression sequences corresponding to the state sets in the combination are all different, and the compression sequence of any state set is not contained in any other state set in the combination; and selecting the optimal compression scheme according to the compressible amount of each compression sequence in the alternative compression schemes.
9. A hazardous area equipment data compression system based on big data analysis, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program realizes the steps of the method according to any one of claims 1 to 8 when executed by the processor.
Background
Chemical enterprises generally operate a variety of production equipment, such as chemical reaction chambers, storage devices, and high-pressure and high-temperature devices. During production, the various state data of each device must be recorded in real time, for example the reaction temperature, reaction rate and reactant consumption rate of a chemical reaction chamber as measured by sensors, or the internal pressure, temperature and mechanical vibration amplitude of a storage device. These state data reflect the real-time state of each device. They can be used to analyze the production rate, product output and cost consumption of the enterprise, and they also indicate whether the production process is safe and whether an abnormality has occurred. The state data of each device must therefore be stored for big data analysis and statistics in support of safe production.
In a chemical enterprise the production lines are complex and many devices take part in production, so the volume of state data generated is large. Storing it occupies a large amount of storage space, and the data are therefore often compressed. Existing compression techniques generally either delete old data that have been stored for a long time or downsample the data to reduce its volume. These methods, however, compress each data stream in isolation. They do not consider whether the data themselves carry useful feature information, nor do they consider the correlation or interdependence between different data. As a result, too much feature information is lost after compression and the data cannot be recovered, which hinders subsequent operations such as big data analysis or data visualization.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a method and a system for compressing hazardous area device data based on big data analysis, wherein the adopted technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a hazardous area equipment data compression method based on big data analysis, comprising:
selecting the time series of a plurality of states of the hazardous area equipment over a period of time to form a state set, and processing the time series in the state set to obtain a correlation sequence;
obtaining a prediction sequence of the correlation sequence from historical state data, and calculating the difference between the correlation sequence and the prediction sequence to obtain a prediction error sequence;
obtaining the associated compression degree of each time series in the state set, wherein the associated compression degree is calculated as follows:
wherein the quantities in the formula are: the associated compression degree of a time series in the state set; the feature distribution density of that time series; exp(), an exponential function with the natural constant as its base; F, the number of feature segments of the time series; the length proportion of the f-th feature segment of the time series; and the sum of squares of the elements of the prediction error sequence at the positions corresponding to the f-th feature segment;
selecting the time series with the largest associated compression degree in the state set as the compression sequence corresponding to the state set;
and obtaining an optimal compression scheme according to the associated compression degrees of the compression sequences corresponding to different state sets.
Preferably, processing the time series in the state set to obtain the correlation sequence includes: taking pairwise differences of the time series in the state set to obtain a plurality of difference sequences, and summing the element values at the same positions of the difference sequences to obtain the correlation sequence of the state set.
Preferably, obtaining the prediction sequence of the correlation sequence from the historical state data includes: taking the current analysis period as a first period, and obtaining the prediction sequence of the correlation sequence of the first period from the correlation sequence of the period immediately preceding the first period.
Preferably, the compressible amount of the compression sequence is obtained from the length of the compression sequence and the associated compression degree.
Preferably, the compression method for the state set corresponding to the compression sequence includes: taking the sum of squares of the elements of the prediction error sequence at the positions corresponding to a feature segment of the compression sequence as a first coefficient, and taking the ratio of the length proportion of that feature segment within the time series to the first coefficient as the distribution probability of the feature segment; and selecting feature segments in turn according to the distribution probability, and deleting from the selected feature segment the element with the smallest absolute gradient, until the number of deleted elements reaches the compressible amount.
Preferably, the time series of the original state data can be recovered from the compression result of the method by decompression: obtaining the prediction sequence of the correlation sequence of the first period from the correlation sequence of the period immediately preceding the first period; taking pairwise differences of the time series in the state set to obtain an algebraic function of the compression sequence; constructing a target equation from the algebraic function and the prediction sequence and solving it to obtain a preliminary decompressed time series; and correcting the preliminary decompressed time series with the compression result to obtain the decompression result of the time series.
Preferably, obtaining the optimal compression scheme according to the associated compression degrees of the compression sequences corresponding to different state sets includes: if the compression sequences corresponding to a plurality of state sets are the same, selecting the state set with the largest associated compression degree as the associated state set used when that compression sequence is compressed.
Preferably, obtaining the optimal compression scheme according to the associated compression degrees of the compression sequences corresponding to different state sets includes: combining different state sets to obtain a plurality of combinations, the compression sequences corresponding to the state sets in a combination forming an alternative compression scheme, wherein a combination corresponding to an alternative compression scheme must satisfy the following requirements: the compression sequences corresponding to the state sets in the combination are all different, and the compression sequence of any state set is not contained in any other state set in the combination; and selecting the optimal compression scheme according to the compressible amount of each compression sequence in the alternative compression schemes.
In a second aspect, another embodiment of the invention provides a hazardous area equipment data compression system based on big data analysis.
A hazardous area equipment data compression system based on big data analysis comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the computer program realizes a hazardous area equipment data compression method based on big data analysis when being executed by the processor.
The invention has the following beneficial effects:
By establishing a time series prediction model, the method predicts and analyzes the feature distribution density of each sequence to obtain the degree to which the time series can be compressed; obtains the correlation sequences under different state combinations according to the compressibility of each sequence; calculates the associated compression degrees of the different state combinations; obtains an optimal data compression scheme by constructing a hidden Markov chain and using those associated compression degrees; and finally provides methods for compressing and decompressing the data. The compressed data retain important features as far as possible while preserving the correlations among different data.
Drawings
To illustrate the embodiments of the present invention, or the technical solutions and advantages of the prior art, more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a data compression method for hazardous area equipment based on big data analysis according to an embodiment of the present invention.
Detailed Description
To further explain the technical means adopted by the present invention to achieve its intended objects, and their effects, the hazardous area equipment data compression method and system based on big data analysis are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Production equipment in a hazardous production area generates a large amount of data for different states, all of which must be stored and therefore occupies a large amount of storage space. To solve this problem, the invention obtains the optimal compression scheme of each state set, and the optimal compression scheme of a combination of several state sets, according to the associated compression degrees and feature distribution densities of the time series of the different states. The aim is to compress the state data generated by equipment in the hazardous area while retaining, as far as possible, the feature information of the equipment state data and the correlations among the data. A specific scheme of the hazardous area equipment data compression method and system based on big data analysis is described in detail below with reference to the accompanying drawings.
Embodiment 1:
This embodiment provides a hazardous area equipment data compression method based on big data analysis.
The specific scenario addressed by the invention is the compression of historical device state data that have been stored in a database for a period of time. The state data are all the state data generated by a single device, including, for a chemical reaction chamber, the reaction temperature, the reaction rate, the reactant consumption rate and so on. In this embodiment the time period T is 24 hours long; that is, the time series of any state covers 24 hours, and its sampling interval is 1 minute.
Referring to Fig. 1, a flowchart of a hazardous area equipment data compression method based on big data analysis according to an embodiment of the present invention is shown. The method comprises the following steps:
selecting the time series of a plurality of states of the hazardous area equipment over a period of time to form a state set, and processing the time series in the state set to obtain a correlation sequence;
obtaining a prediction sequence of the correlation sequence from historical state data, and calculating the difference between the correlation sequence and the prediction sequence to obtain a prediction error sequence;
obtaining the associated compression degree of each time series in the state set, wherein the associated compression degree is calculated as follows:
wherein the quantities in the formula are: the associated compression degree of a time series in the state set; the feature distribution density of that time series; exp(), an exponential function with the natural constant as its base; F, the number of feature segments of the time series; the length proportion of the f-th feature segment of the time series; and the sum of squares of the elements of the prediction error sequence at the positions corresponding to the f-th feature segment;
selecting the time series with the largest associated compression degree in the state set as the compression sequence corresponding to the state set;
and obtaining an optimal compression scheme according to the associated compression degrees of the compression sequences corresponding to different state sets.
The specific implementation steps are as follows:
First, all state data of the hazardous area equipment are acquired and made dimensionless.
Specifically, all state data generated by a single device in the hazardous area over a period of time are acquired. There are N state time series, denoted $X_1, X_2, \ldots, X_N$. Taking $X_N$ as an example, the state time series is made dimensionless as follows:
(1) Obtain the hyperparameter $\mu_N$. The state data of state N are collected over the different time periods within one quarter and their mean is calculated; this mean is the hyperparameter $\mu_N$. The hyperparameter $\mu_N$ characterizes the expected value of state N of a single device in the hazardous area, and each element of the time series $X_N$ fluctuates around $\mu_N$.
(2) For any element $x$ of the time series $X_N$, let $x := (x - \mu_N)/\mu_N$, where $:=$ is the program assignment symbol. The term $x - \mu_N$ subtracts $\mu_N$ from each element of the original sequence and characterizes the size difference of each element relative to $\mu_N$. If this difference is regarded as the error between a random variable and its true value, the sequence formed by the differences can be regarded as a random process, that is, the values of a random variable at different moments. Dividing by $\mu_N$ removes the dimension, so that differences in dimension or magnitude between different states of the equipment do not affect the subsequent data analysis. All subsequent time series are made dimensionless in this way, and the objects compressed by the invention are the dimensionless data.
At this point, the dimensionless time series of the device to be compressed have been obtained.
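As a non-limiting illustration, the dimensionless processing above can be sketched as follows. The sketch assumes that the normalization takes the form (x - mu) / mu, with mu the mean of historical data for the same state; the function name `nondimensionalize` and the sample values are hypothetical and are not part of the disclosure.

```python
from statistics import fmean

def nondimensionalize(series, history):
    """Return (x - mu) / mu for each x, where mu is the mean of `history`.

    Assumption: the normalization has the form (x - mu) / mu.
    """
    mu = fmean(history)
    if mu == 0:
        raise ValueError("mean of historical data is zero; cannot divide by it")
    return [(x - mu) / mu for x in series]

# Hypothetical reaction temperatures and a hypothetical quarter of history.
temps = [98.0, 100.0, 102.0, 101.0]
quarter_history = [99.0, 100.0, 101.0, 100.0]
scaled = nondimensionalize(temps, quarter_history)
```

Because every state is divided by its own historical mean, states measured in different units (for example temperature and pressure) become comparable before their differences are taken.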
Second, the feature distribution density of each time series is obtained.
Specifically, morphological opening and closing operations are applied to the time series of each state to remove isolated noise data. The time series is then divided into segments with a watershed algorithm; each segment represents one change feature and is called a feature segment. The ratio of the number of feature segments to the length of the sequence is called the feature distribution density, and represents the number of features per unit length of the time series.
The feature distribution density of the time series is thus obtained.
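As a non-limiting illustration, the steps above can be sketched in plain Python. The window size of 3 and the segmentation rule are assumptions: instead of a full watershed implementation, the sketch splits the one-dimensional signal at its ridges (rise-to-fall turning points), which is a simplified stand-in for watershed catchment basins. All function names are hypothetical.

```python
def erode(x, k=3):
    """Grey-scale erosion: sliding-window minimum (window truncated at the ends)."""
    h = k // 2
    return [min(x[max(0, i - h): i + h + 1]) for i in range(len(x))]

def dilate(x, k=3):
    """Grey-scale dilation: sliding-window maximum (window truncated at the ends)."""
    h = k // 2
    return [max(x[max(0, i - h): i + h + 1]) for i in range(len(x))]

def smooth(x, k=3):
    opened = dilate(erode(x, k), k)      # opening removes narrow spikes
    return erode(dilate(opened, k), k)   # closing fills narrow dips

def split_at_ridges(x):
    """Return half-open (start, stop) segments, cut where a rise turns into a fall."""
    segments, start, rising = [], 0, False
    for i in range(1, len(x)):
        if x[i] > x[i - 1]:
            rising = True
        elif x[i] < x[i - 1] and rising:
            segments.append((start, i))
            start, rising = i, False
    segments.append((start, len(x)))
    return segments

def feature_density(x, k=3):
    """Feature distribution density: number of feature segments / sequence length."""
    segs = split_at_ridges(smooth(x, k))
    return len(segs) / len(x), segs

spike = [0, 0, 0, 5, 0, 0, 0, 0]                      # one isolated noise sample
cleaned = smooth(spike)
hills = [0, 1, 2, 3, 3, 3, 2, 1, 0, 0, 1, 2, 3, 3, 3, 2, 1, 0]
density, segs = feature_density(hills)
```

The isolated spike is flattened by the opening step, while the two broad hills survive and are split into three segments, so the density is 3 divided by the sequence length of 18.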
Third, the time series of a plurality of states of the hazardous area equipment over a period of time are selected to form a state set, and the time series in the state set are processed to obtain a correlation sequence; a prediction sequence of the correlation sequence is obtained from historical state data, and the difference between the correlation sequence and the prediction sequence is calculated to obtain a prediction error sequence; the associated compression degree of each time series in the state set is obtained from the feature distribution density of the time series, the lengths of its feature segments and the prediction error sequence; and the time series with the largest associated compression degree in the state set is selected as the compression sequence corresponding to the state set.
In particular, the compressible amount of a compression sequence is obtained from the length of the compression sequence and the associated compression degree. The sum of squares of the elements of the prediction error sequence at the positions corresponding to a feature segment of the compression sequence is taken as a first coefficient, and the ratio of the length proportion of that feature segment within the time series to the first coefficient is taken as the distribution probability of the feature segment. Feature segments are selected in turn according to the distribution probability, and the element with the smallest absolute gradient is deleted from the selected feature segment, until the number of deleted elements reaches the compressible amount.
Specifically, a single device in a hazardous area has multiple states, each with its own state time series. These time series are correlated with one another and jointly determine the state of the device, so data compression cannot be applied to a single state time series in isolation; the correlations and interactions between different states must be fully considered. The state data generated by the device within time period T are compressed; unless otherwise stated, all data below refer to data within time period T. The N states of the device form a plurality of state sets S, each representing a combination of different states. S takes $2^N - 1$ values, i.e. the value range of S is $\{S_1, S_2, \ldots, S_{2^N-1}\}$. The set $S_k$ is the k-th value of S and contains the time series of one or more states.
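As a non-limiting illustration, and assuming that the state sets range over every non-empty combination of the N states, the candidate state sets can be enumerated as follows. The function name `state_sets` and the integer state IDs are hypothetical.

```python
from itertools import combinations

def state_sets(state_ids):
    """Yield every non-empty combination of the given state IDs as a tuple."""
    for size in range(1, len(state_ids) + 1):
        yield from combinations(state_ids, size)

sets = list(state_sets([1, 2, 3]))   # three states give 2**3 - 1 = 7 state sets
```

The number of state sets grows exponentially with N, so a practical implementation would likely restrict the enumeration to the states of one device, as in this embodiment.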
The compression sequence of a state set is obtained as follows:
(1) Pairwise differences of the time series in the state set are taken to obtain a plurality of difference sequences, and the element values at the same positions of the difference sequences are summed to obtain the correlation sequence of the state set. The pairwise differences follow a set rule: the time series are numbered, and the lower-numbered time series is subtracted from the higher-numbered one. Take the state set $S_k$, the k-th value of S, as an example. Suppose it contains n states; within time period T, the state time series corresponding to the n states are $X_1, X_2, \ldots, X_n$. After the pairwise differences are taken, all difference sequences are summed to obtain the correlation sequence:
$$R_k = \sum_{X_i, X_j \in S_k,\; i < j} (X_j - X_i)$$
$R_k$ is the correlation sequence of the time series of all states in $S_k$. Here $X_j - X_i$ denotes the element-wise difference of the two sequences; if $S_k$ contains only one time series $X_1$, then $R_k = X_1$. The summation runs over every choice of two different sequences from $S_k$, and the condition $i < j$ expresses that each state sequence has a global ID number and that the difference is always the larger-numbered sequence minus the smaller-numbered one.
In particular, for the state set $S_k$, the time series with the smallest ID number is denoted $X_{\min}$, the time series with the largest ID number is denoted $X_{\max}$, and the time series whose ID number is the median is denoted $X_{\mathrm{mid}}$. For any time series $X$ in $S_k$, obtain all tuples $(X, X_a)$ and $(X_b, X)$, where the ID number of $X_a$ is smaller than that of $X$ and the ID number of $X_b$ is larger than that of $X$. If the number of the former tuples equals the number of the latter tuples, $X$ is called a null sequence: it takes part in the calculation of $R_k$ but has no influence on the result. All time series satisfying this condition are found, which gives all null sequences in $S_k$.
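As a non-limiting illustration, the correlation sequence and the null sequences can be sketched as follows. The dictionary layout (global ID mapped to a list of values) and the function names are hypothetical. The second function uses the observation that, in the sum of larger-ID minus smaller-ID differences, each series ends up with a net weight equal to the number of pairs in which it has the larger ID minus the number in which it has the smaller ID; a net weight of zero identifies a null sequence.

```python
from itertools import combinations

def correlation_sequence(series_by_id):
    """Sum of (larger-ID series minus smaller-ID series) over all pairs."""
    ids = sorted(series_by_id)
    if len(ids) == 1:                      # single series: it is its own correlation sequence
        return list(series_by_id[ids[0]])
    length = len(series_by_id[ids[0]])
    total = [0.0] * length
    for lo, hi in combinations(ids, 2):    # lo < hi because ids is sorted
        for t in range(length):
            total[t] += series_by_id[hi][t] - series_by_id[lo][t]
    return total

def pair_coefficients(ids):
    """Net weight of each series: (#pairs as larger ID) - (#pairs as smaller ID)."""
    ordered = sorted(ids)
    n = len(ordered)
    return {sid: 2 * rank - (n - 1) for rank, sid in enumerate(ordered)}

# Hypothetical dimensionless series keyed by global state ID.
series = {1: [1.0, 2.0, 3.0], 2: [2.0, 2.0, 2.0], 5: [4.0, 6.0, 8.0]}
corr = correlation_sequence(series)
coeffs = pair_coefficients(series)
null_ids = [sid for sid, c in coeffs.items() if c == 0]
```

With three series the net weights are -2, 0 and 2, so the series with the median ID cancels out of the sum, which matches the definition of a null sequence above.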
(2) The current analysis period is taken as a first period, and the prediction sequence of the correlation sequence of the first period is obtained from the correlation sequence of the period immediately preceding the first period. The stationarity of the correlation sequence $R_k$ of the state set $S_k$ is analyzed; when $R_k$ is stationary, an ARMA prediction model is established and the prediction error sequence between the predicted values and the true values is obtained. For any time series $X$ in $S_k$ other than the null sequences, the length proportion of each of its feature segments is obtained: the larger the proportion, the more of that feature can be compressed; the smaller the proportion, the less can be compressed. Let the length proportion of the f-th feature segment be $l_f$, and let the sum of squares of the elements of the prediction error sequence at the positions corresponding to this segment be $q_f$. The associated compression degree of the time series $X$ is then:
where $F$ is the number of feature segments of the time series $X$, $\rho$ is the feature distribution density of the sequence, and the exponential term is an exponential function of the feature distribution density of the sequence.
As the above formula shows, the lower the feature distribution density of a time series, the larger the length proportions of its feature segments, and the smaller the errors of the corresponding segments, the greater its associated compression degree. If the time series $X$ has the largest associated compression degree in the state set $S_k$, then $X$ is the compression sequence of $S_k$. A large associated compression degree for the compression sequence indicates that all the time series in $S_k$ are correlated and predictable, and hence that $S_k$ can be compressed. The time series in $S_k$ are not each compressed individually; associated compression of the time series in $S_k$ means compressing the time series $X$ with the largest associated compression degree in $S_k$, and $X$ is called the compression target of $S_k$. When the correlation sequence of $S_k$ is not stationary, the associated compression degree of $S_k$ is 0, and there is then no corresponding compression target.
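As a non-limiting illustration, the associated compression degree can be sketched as follows. The exact formula is not reproduced in this text, so the sketch assumes one form that agrees with the relationships stated above: exp(-density) multiplied by the sum over feature segments of (length proportion / error sum of squares). This assumed form, the small constant guarding against division by zero, the function name and the state names are all hypothetical.

```python
import math

def associated_compression_degree(segments, errors, density):
    """Assumed form: exp(-density) * sum_f (l_f / q_f).

    segments: half-open (start, stop) index pairs of the feature segments
    errors:   prediction error sequence of the state set
    density:  feature distribution density of the time series
    """
    n = len(errors)
    total = 0.0
    for start, stop in segments:
        share = (stop - start) / n                        # length proportion l_f
        sq_err = sum(e * e for e in errors[start:stop])   # error sum of squares q_f
        total += share / (sq_err + 1e-12)                 # epsilon is an assumption
    return math.exp(-density) * total

errors = [1.0, 1.0, 2.0, 2.0]     # hypothetical prediction error sequence
degrees = {
    "temperature": associated_compression_degree([(0, 2), (2, 4)], errors, 2 / 4),
    "pressure": associated_compression_degree([(0, 4)], errors, 1 / 4),
}
# The compression sequence is the series with the largest degree.
compression_sequence = max(degrees, key=degrees.get)
```

Under this assumed form, a series whose long segments coincide with small prediction errors scores highest, which is the behaviour described in the text.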
(3) The specific associated compression method is as follows. Let the associated compression degree of the state set $S_k$ be that of its compression sequence. For the target compression sequence $X$ in $S_k$, the compressible amount is obtained from the associated compression degree, a hyperparameter, and the length of $X$. If the compressible amount is not an integer, it is rounded down. First, the feature segments of the time series $X$ are obtained. For the f-th segment, the sum of squares $q_f$ of the elements of the prediction error sequence at the corresponding positions is calculated, together with the proportion $l_f$ of the length of the f-th feature segment to the length of $X$ and the ratio $p_f = l_f / q_f$, where $f = 1, 2, \ldots, F$. The values $p_f$ are normalized; the result is the probability distribution of the feature segments, with each probability corresponding to one feature segment. Second, a feature segment a is sampled from all feature segments according to this probability distribution, and the element with the smallest absolute gradient in a is deleted from a. Finally, feature segments are sampled repeatedly and elements are deleted from $X$ until the number of deleted elements reaches the compressible amount of the state set. The result finally obtained is $X'$, the compression result of the associated compression of the time series $X$ in the state set $S_k$.
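As a non-limiting illustration, the sampling and deletion procedure can be sketched as follows. The number of elements to delete is passed in directly, because the compressible-amount formula is not reproduced here. The gradient is computed once on the original series by finite differences, and a fixed random seed is used only to make the sketch repeatable; both choices, the function name and the sample data are assumptions.

```python
import random

def compress(series, segments, errors, n_delete, seed=0):
    """Delete n_delete elements; segments are half-open (start, stop) index pairs."""
    rng = random.Random(seed)
    n = len(series)
    # Unnormalised weight of each segment: length proportion / error sum of squares.
    weights = []
    for start, stop in segments:
        share = (stop - start) / n
        sq_err = sum(e * e for e in errors[start:stop])
        weights.append(share / (sq_err + 1e-12))    # epsilon is an assumption
    # Absolute gradient (central difference, one-sided at the ends).
    grad = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        grad.append(abs(series[hi] - series[lo]) / max(hi - lo, 1))
    remaining = [list(range(start, stop)) for start, stop in segments]
    deleted = set()
    while len(deleted) < n_delete:
        live = [k for k, idx in enumerate(remaining) if idx]
        if not live:
            break                                   # nothing left to delete
        # random.choices normalises the weights internally.
        k = rng.choices(live, weights=[weights[j] for j in live])[0]
        victim = min(remaining[k], key=lambda i: grad[i])
        remaining[k].remove(victim)
        deleted.add(victim)
    return [x for i, x in enumerate(series) if i not in deleted]

series = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 6.0, 10.0]
errors = [0.1] * 4 + [2.0] * 4
compressed = compress(series, [(0, 4), (4, 8)], errors, n_delete=3)
```

Segments that are long and well predicted are sampled more often, and within a segment the flattest points are removed first, so the elements that carry the least shape information are the ones discarded.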
In particular, the time series of the original state data can be recovered from the compression result of the method by decompression: the prediction sequence of the correlation sequence of the first period is obtained from the correlation sequence of the period immediately preceding the first period; pairwise differences of the time series in the state set are taken to obtain an algebraic function of the compression sequence; a target equation is constructed from the algebraic function and the prediction sequence and solved to obtain a preliminary decompressed time series; and the preliminary decompressed time series is corrected with the compression result to obtain the decompression result of the time series.
Specifically, this embodiment further includes a process of decompressing the compression result to obtain the original state time sequence. Assume that, within the time period T, the k-th sequence of the state set has been compressed, yielding a compression result. The purpose of decompression is to solve for the original time sequence from the compression result and from the other time sequences in the state set. For convenience of description, simplified notation is used below for the state set of the time period T, for the time sequence to be recovered, and for its compression result. The specific method is as follows:
(1) A time period T1 immediately preceding the time period T is obtained from the database; T1 and T have the same length. Within the time period T1, the state set is obtained, and the correlation sequence is obtained from all the time sequences in it. When the correlation sequence is a stationary sequence, an ARMA model is constructed, and the correlation sequence within the time period T is predicted according to the ARMA model.
(2) A target equation is constructed by setting an algebraic function of the unknown sequence equal to the predicted correlation sequence; in this function, the other time sequences of the state set are treated as known. Solving the equation gives a preliminary estimate of the original time sequence. Although the sequence has now been solved for, its compression result is used to correct it so that it is closer to the true values.
(3) The DTW algorithm is used to obtain the matching relationship between the elements of the compression result and the elements of the preliminary decompressed sequence. Specifically, let e be any element of the compression result, and let E be the set of elements of the preliminary sequence that the DTW algorithm matches to e. The elements of E are corrected using e as follows: all elements of E are multiplied by a coefficient to obtain a data set E1 whose mean equals e, and the elements of E in the preliminary sequence are then replaced with the elements of E1. Once e has traversed all elements of the compression result, the preliminary sequence has been corrected, and the corrected sequence is the decompression result.
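The DTW-based correction of step (3) can be sketched as follows. This is a minimal pure-Python illustration, not the implementation of the invention: the function names are hypothetical, the DTW uses the absolute difference as local cost, and leaving a matched set unchanged when its mean is zero is an assumption added to avoid division by zero.

```python
def dtw_path(a, b):
    """Return the DTW alignment of a and b as a list of (i, j) index pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path

def correct_with_compression_result(preliminary, compressed):
    """Rescale each matched set E so that its mean equals the matching element e."""
    corrected = list(preliminary)
    matches = {}
    for i, j in dtw_path(compressed, preliminary):
        matches.setdefault(i, []).append(j)
    for i, indices in matches.items():
        e = compressed[i]
        mean = sum(corrected[j] for j in indices) / len(indices)
        if mean == 0:
            continue  # assumption: no coefficient exists, leave E unchanged
        coefficient = e / mean
        # If an element was matched to more than one e, the later rescaling applies.
        for j in indices:
            corrected[j] *= coefficient
    return corrected

compressed = [1.0, 2.0]
preliminary = [1.2, 1.2, 2.4, 2.4]
decompressed = correct_with_compression_result(preliminary, compressed)
```

In this small example the first two preliminary values are matched to the first compressed element and the last two to the second, so each pair is rescaled until its mean agrees with the retained value.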
At this point, the optimal compression and decompression scheme for each state set has been obtained.
And finally, obtaining an optimal compression scheme according to the associated compression degree of the compression sequence corresponding to the combination of the different state sets.
In particular, the optimal compression scheme for a single compression sequence is as follows: if several state sets yield the same compression sequence, the state set with the maximum associated compression degree is selected as the associated state set used when that sequence is compressed. The optimal compression scheme for all state data of the device is as follows: different state sets are combined to obtain a plurality of combinations, and the compression sequences corresponding to the state sets in a combination form an alternative compression scheme. The combination corresponding to an alternative compression scheme must satisfy the following requirements: the compression sequences corresponding to the state sets in the combination are all different, and the compression sequence of any state set is not contained in any other state set of the combination. The optimal compression scheme is then selected according to the compressible amount of each compression sequence in the alternative compression schemes.
A single state set may reflect only one combination of states among all the states of the hazardous area device, and it may not give the best compression scheme; another state or combination of states may give a better one, that is, a compression method with a larger amount of data compression.
Specifically, the optimal compression scheme for all state data of the device is the scheme in which a plurality of state sets are combined so as to achieve the maximum amount of compression. The invention needs to obtain the compression scheme with the maximum data compression amount, and the specific method is as follows:
(1) A hidden Markov chain data structure is obtained, consisting of state nodes and observable nodes. The invention uses a state node to represent one value of the state set S, and an observable node to represent the time-series data to be compressed. For a given hidden Markov chain, the time-series data represented by each observable node is compressed using the state set represented by the corresponding state node. Once a suitable hidden Markov chain is obtained, the compression method and the decompression method of the data are obtained.
(2) A suitable hidden Markov chain is generated as follows. Suppose the t-th state node represents a state set, and the size of the node represents the degree of compression of the data in that set. If a time sequence is the compressible data in that set, then the observable node corresponding to the t-th state node represents that time sequence to be compressed.
(3) The set represented by the (t+1)-th state node must then satisfy three conditions. First, the size of the node cannot be 0; the goal is to ensure that the time-series data represented by the state node is compressible. Second, the represented set cannot contain any of the time sequences represented by the preceding observable nodes; the aim is that, when the data of an observable node is compressed, the state set participating in the compression contains no time sequence that has itself been compressed. Third, the observable node of the (t+1)-th state node is not contained in any of the state sets represented by the preceding state nodes; the purpose is that data to be compressed no longer takes part in the compression of other data.
(4) Different state sets are combined to obtain a plurality of combinations, and the compression sequences corresponding to the state sets in a combination form an alternative compression scheme. That is, values are randomly selected from S in turn as the initial nodes of hidden Markov chains, so that a plurality of hidden Markov chains is obtained, each corresponding to one data compression scheme. For each hidden Markov chain, the sum of the sizes of all state nodes on the chain is taken as the total compression amount of the alternative compression scheme.
(5) The hidden Markov chain with the maximum total compression amount is obtained, and the data is compressed according to it; that is, the time sequence represented by each observable node is compressed using the state set represented by the corresponding state node.
In this way, the compression scheme with the maximum compression amount in the time period T is obtained. It should be noted that when the data in the time period T is compressed, the data in the time period T1 cannot be compressed, because the data in the time period T1 is needed when the compressed data of the time period T is decompressed. The invention therefore performs data compression once every two time periods of a given length, and the implementer can specify the length of the time periods so as to control the compression amount of all data generated by the hazardous area equipment.
Thus, the optimal scheme of combined association compression of the dangerous area equipment state set is obtained.
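The selection of a combination of state sets with the maximum total compression amount can be sketched in Python as follows. This is an illustration only: it replaces the random generation of hidden Markov chains with an exhaustive enumeration that is practical only for a handful of state sets, and the tuple layout `(state_set, compression_sequence, compressible_amount)`, the function names, and the example state names and amounts are all hypothetical.

```python
from itertools import combinations

def compatible(a, b):
    """Check the pairwise requirement between two candidate state sets."""
    set_a, seq_a, _ = a
    set_b, seq_b, _ = b
    # Compression sequences must differ, and neither compression sequence
    # may take part in the other state set.
    return seq_a != seq_b and seq_a not in set_b and seq_b not in set_a

def best_scheme(candidates):
    """Return (total amount, chosen candidates) with the largest total amount."""
    usable = [c for c in candidates if c[2] > 0]  # amount 0 means not compressible
    best_total, best_combo = 0, []
    for size in range(1, len(usable) + 1):
        for combo in combinations(usable, size):
            if all(compatible(a, b) for a, b in combinations(combo, 2)):
                total = sum(c[2] for c in combo)
                if total > best_total:
                    best_total, best_combo = total, list(combo)
    return best_total, best_combo

# Hypothetical state sets, compression sequences and compressible amounts.
candidates = [
    ({"temperature", "rate"}, "temperature", 10),
    ({"rate", "consumption"}, "rate", 8),
    ({"consumption", "pressure"}, "consumption", 7),
    ({"pressure", "temperature"}, "pressure", 0),
]
total, chosen = best_scheme(candidates)
chosen_sequences = [seq for _, seq, _ in chosen]
```

Here the first and third candidates can be combined because neither compression sequence belongs to the other state set, whereas the second candidate conflicts with both.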
Specific example 2:
This embodiment provides a hazardous area equipment data compression system based on big data analysis.
The specific scenario addressed by the invention is as follows: historical device state data stored in a database over a period of time is compressed. The state data is all the state data generated by a single device, such as the reaction temperature, the reaction rate, and the consumption rate of reactants of a chemical reaction chamber. The time period length in this embodiment is 24 hours; that is, for any state, the state time sequence spans 24 hours with a sampling interval of 1 minute, giving 1440 samples per sequence.
The hazardous area equipment data compression system based on big data analysis comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when executed by the processor, the computer program implements the hazardous area equipment data compression method based on big data analysis.
It should be noted that the order of the above embodiments of the present invention is only for description and does not represent the relative merits of the embodiments. Specific embodiments have been described above; other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.