Method for determining influence factors of metallurgical coke quality prediction
1. A method for determining influence factors of metallurgical coke quality prediction, characterized by comprising the following steps:
step one: collecting data from the metallurgical coke production process, and cleaning the data;
step two: using an extreme gradient boosting tree method, successively training new trees, each fitted to the prediction residual of the preceding ensemble of trees, and, after training is finished, summing the scores of the leaf nodes into which a sample falls in each tree to obtain the predicted value of the sample; the objective function of the extreme gradient boosting tree is defined as:
$$\mathrm{Obj} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \quad f_k \in F$$
wherein: F is the function space represented by all decision trees; y_i and \hat{y}_i are the true value of a sample and the predicted value of the model, respectively; f_k is the k-th decision tree; the objective function has two parts: the former term is the model loss, namely the prediction error; the latter term is a regularization term that constrains the structure and behavior of the model.
2. The method for determining influence factors of metallurgical coke quality prediction according to claim 1, wherein in step two, the successive training of new trees by the extreme gradient boosting tree (EGBT) method to fit the prediction residual of the preceding ensemble of trees is carried out as EGBT-based correlation analysis, comprising the following steps:
step 2.1: initializing a weak learner, and setting the number of regression trees to M;
step 2.2: solving the approximate residual for the i-th base regression tree;
step 2.3: learning the i-th base regression tree from the approximate residual obtained in step 2.2, and determining the region division Rmj of the i-th base regression tree;
step 2.4: solving the final output value of each region Rmj;
step 2.5: updating the regression tree;
step 2.6: if i < M, repeating steps 2.2 to 2.5.
3. The method for determining influence factors of metallurgical coke quality prediction according to claim 2, wherein the data collected in step one comprise single-coal quality index data, blended-coal quality index data, coke quality index data, and coking process parameter data.
4. The method for determining influence factors of metallurgical coke quality prediction according to claim 1, wherein in step one, the data cleaning comprises removing from the data set outlier data that clearly exceed the actual value range, missing data, and atypical data.
5. The method for determining influence factors of metallurgical coke quality prediction according to claim 1, wherein in step one, after the data from the metallurgical coke production process are collected, data matching is performed across the blended-coal data, the coke data, and the coke oven process data.
Background
Metallurgical coke is the most important basic raw material in blast furnace smelting. It serves as the heat source, reducing agent, burden-column skeleton, and carburizing agent in blast furnace production, and is also the most important means of regulating the blast furnace process. With the progress of blast furnace smelting technology in recent years, in particular larger furnace volumes and the rapid development of high blast temperature and oxygen-enriched coal injection technologies, the role of coke as the skeleton of the burden column, ensuring gas and liquid permeability in the furnace, has become more prominent. The quality of metallurgical coke strongly influences the smelting process of the modern blast furnace and has become a key factor limiting stable, balanced, high-quality, and high-efficiency production of molten iron.
Predicting coke quality is an important part of controlling metallurgical coke quality in production. However, metallurgical coke production is complex and unstable; in particular, the properties of the single coals used as coking raw materials are extremely complex and the coking process cannot be readily simulated. These factors make accurate prediction of metallurgical coke quality difficult, and it has become a persistent problem in the coking industry. Many coking enterprises predict metallurgical coke quality using different methods and forms, but because of differences in metallurgical coke production, the universality of these predictions is poor.
Disclosure of Invention
The invention aims to provide a method for determining influence factors of coke quality prediction. The method collects a large amount of production data, uses the extreme gradient boosting tree, a big-data processing method, to calculate the correlations that affect the metallurgical coke quality indexes, and thereby determines the parameters that influence metallurgical coke quality.
Specifically, the method for determining the influence factors for predicting the quality of metallurgical coke comprises the following steps:
step one: collecting data from the metallurgical coke production process, and cleaning the data;
step two: using an extreme gradient boosting tree method, successively training new trees, each fitted to the prediction residual of the preceding ensemble of trees, and, after training is finished, summing the scores of the leaf nodes into which a sample falls in each tree to obtain the predicted value of the sample; the objective function of the extreme gradient boosting tree is defined as:
$$\mathrm{Obj} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \quad f_k \in F$$
wherein: F is the function space represented by all decision trees; y_i and \hat{y}_i are the true value of a sample and the predicted value of the model, respectively; f_k is the k-th decision tree (the corresponding equivalent function). The objective function has two parts: the former term is the model loss, namely the prediction error; the latter term is a regularization term that constrains the structure and behavior of the model.
Further, in step two, the successive training of new trees by the extreme gradient boosting tree method to fit the prediction residual of the preceding ensemble of trees is carried out as correlation analysis based on EGBT (extreme gradient boosting tree), which is an ensemble learning algorithm, and comprises the following steps:
step 2.1: initializing a weak learner, and setting the number of regression trees to M;
step 2.2: solving the approximate residual for the i-th base regression tree;
step 2.3: learning the i-th base regression tree from the approximate residual obtained in step 2.2, and determining the region division Rmj of the i-th base regression tree;
step 2.4: solving the final output value of each region Rmj;
step 2.5: updating the regression tree;
step 2.6: if i < M, repeating steps 2.2 to 2.5.
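For orientation, the sub-steps above match the textbook gradient boosting tree procedure, which can be written as follows. This is a sketch in the standard notation rather than the notation of the invention: the tree index is written m (the text above uses i), the sample index is i, and the symbols f_m, r_{mi}, c_{mj}, J_m and the generic loss L are assumptions introduced here; only M and R_{mj} appear in the source.

$$
\begin{aligned}
&\text{Initialization:} && f_0(x) = \arg\min_{c} \sum_{i} L(y_i, c) \\
&\text{Approximate residual:} && r_{mi} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f = f_{m-1}} \\
&\text{Region division:} && \text{fit a regression tree to } \{(x_i, r_{mi})\} \text{ to obtain leaf regions } R_{mj},\ j = 1, \dots, J_m \\
&\text{Region values:} && c_{mj} = \arg\min_{c} \sum_{x_i \in R_{mj}} L\bigl(y_i, f_{m-1}(x_i) + c\bigr) \\
&\text{Update:} && f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J_m} c_{mj}\, I(x \in R_{mj}), \quad m = 1, \dots, M
\end{aligned}
$$

For the squared-error loss L(y, f) = (y - f)^2 / 2, the approximate residual reduces to the ordinary residual r_{mi} = y_i - f_{m-1}(x_i), and c_{mj} is the mean residual of the samples falling in R_{mj}.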
The beneficial effects of the invention are as follows: by combining big-data theory with production practice, the invention provides a method for analyzing the factors that influence metallurgical coke quality prediction. It addresses a persistent problem in the coking industry, namely that the extreme complexity of the properties of the single coals used as coking raw materials and the fact that the coking process cannot be readily simulated make accurate prediction of metallurgical coke quality difficult and give coke quality prediction poor universality. The method fully combines quality prediction in the metallurgical coke production process with big-data applications, and improves the universality and accuracy of metallurgical coke quality prediction.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of a gradient boosting tree node.
Detailed Description
The following description of the embodiments of the present invention is provided with reference to the accompanying drawings:
As shown in FIGS. 1 and 2, the method collects a large amount of production data, uses the extreme gradient boosting tree, a big-data processing method, to calculate the correlations that affect the metallurgical coke quality indexes, and determines the parameters that influence metallurgical coke quality.
1. Data collation
A total of 135,000 records from the metallurgical coke production process were collected and extracted from the coking plant of an integrated iron and steel enterprise, comprising single-coal quality indexes, blended-coal quality indexes, coke quality indexes, and coking production process parameters. The records were matched across the blended-coal data, the coke data, and the coke oven process data. Preliminary data-cleaning rules were established from knowledge of the coke production process and the long-term experience of coking engineers, and the data were cleaned accordingly. Outlier data that clearly exceed the actual value range, missing data (records in which individual parameters were not recorded), and atypical data (data that deviate markedly from normal and occur rarely) were removed from the data set.
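The cleaning rules described above can be illustrated with a minimal sketch. The field names and value ranges below are hypothetical placeholders, not the rules used in the invention (the source does not list them); real limits would come from coking process knowledge. The sketch covers only missing values and values outside the plausible range; identifying rare atypical records would need additional, plant-specific rules.

```python
import math

# Hypothetical plausible ranges (percent); the source does not give the real limits.
VALID_RANGES = {
    "blend_ash_pct": (5.0, 15.0),
    "blend_volatile_pct": (15.0, 40.0),
    "coke_csr_pct": (40.0, 80.0),
}


def is_missing(value):
    """Treat None and NaN as 'parameter not recorded'."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_records(records, valid_ranges=VALID_RANGES):
    """Keep only records whose checked fields are present and within range."""
    kept = []
    for rec in records:
        ok = True
        for field, (low, high) in valid_ranges.items():
            value = rec.get(field)
            if is_missing(value) or not (low <= value <= high):
                ok = False  # missing or out-of-range value: reject the record
                break
        if ok:
            kept.append(rec)
    return kept


sample = [
    {"blend_ash_pct": 9.8, "blend_volatile_pct": 27.5, "coke_csr_pct": 66.0},
    {"blend_ash_pct": None, "blend_volatile_pct": 27.1, "coke_csr_pct": 65.2},
    {"blend_ash_pct": 9.6, "blend_volatile_pct": 27.9, "coke_csr_pct": 166.0},
]
cleaned = clean_records(sample)
print(len(cleaned), "of", len(sample), "records kept")
```

In this toy sample the second record is rejected for a missing value and the third for a CSR value outside the assumed range.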
2. Correlation analysis of impact on coke quality index
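Using the extreme gradient boosting tree method, new trees are trained successively, each fitted to the prediction residual of the preceding ensemble of trees; after training is finished, the scores of the leaf nodes into which a sample falls in each tree are summed to obtain the predicted value of the sample. The objective function of the extreme gradient boosting tree is defined as:
$$\mathrm{Obj} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \quad f_k \in F$$
where F is the function space represented by all decision trees, y_i and \hat{y}_i are the true value of a sample and the predicted value of the model, respectively, and f_k is the k-th decision tree. The former term is the model loss (prediction error) and the latter term is the regularization term.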
The EGBT-based correlation analysis method comprises the following steps:
step 1: initializing a weak learner, and setting the number of regression trees to M;
step 2: solving the approximate residual for the i-th base regression tree;
step 3: learning the i-th base regression tree from the approximate residual, and determining the region division Rmj of the i-th base regression tree;
step 4: solving the final output value of each region Rmj;
step 5: updating the regression tree;
step 6: if i < M, repeating steps 2 to 5.
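Steps 1 to 6 can be sketched in plain Python as follows. This is a simplified illustration, not the implementation of the invention: it assumes squared-error loss, uses one-split regression trees (stumps) as base learners, and omits the regularization term of the extreme gradient boosting objective. The function names, the learning rate, and the toy data are all assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Steps 3-4: find the single split that best fits the residuals.

    Returns (feature, threshold, left_value, right_value, gain), or None
    if no feature has two distinct values to split on.
    """
    n = len(xs)
    total = sum(residuals)
    best = None
    for feat in range(len(xs[0])):
        order = sorted(range(n), key=lambda i: xs[i][feat])
        left_sum = 0.0
        for pos in range(1, n):
            left_sum += residuals[order[pos - 1]]
            lo, hi = xs[order[pos - 1]][feat], xs[order[pos]][feat]
            if lo == hi:
                continue  # equal values cannot be separated by a threshold
            right_sum = total - left_sum
            # Reduction in squared error achieved by splitting at this point.
            gain = left_sum ** 2 / pos + right_sum ** 2 / (n - pos) - total ** 2 / n
            if best is None or gain > best[4]:
                # Each region's value is the mean residual of its samples.
                best = (feat, (lo + hi) / 2, left_sum / pos, right_sum / (n - pos), gain)
    return best


def fit_boosted_stumps(xs, ys, n_trees=30, rate=0.3):
    init = sum(ys) / len(ys)  # step 1: constant initial learner
    preds = [init] * len(ys)
    stumps = []
    for _ in range(n_trees):  # step 6: repeat until M trees are built
        residuals = [y - p for y, p in zip(ys, preds)]  # step 2
        stump = fit_stump(xs, residuals)  # steps 3-4
        if stump is None:
            break
        feat, thr, left, right, _ = stump
        stumps.append(stump)
        # Step 5: add the new tree's (shrunken) output to the ensemble.
        preds = [p + rate * (left if x[feat] <= thr else right)
                 for p, x in zip(preds, xs)]
    return init, rate, stumps


def predict(model, x):
    """Sum the initial value and the leaf score of every tree."""
    init, rate, stumps = model
    out = init
    for feat, thr, left, right, _ in stumps:
        out += rate * (left if x[feat] <= thr else right)
    return out


# Toy data: the target depends only on feature 0.
xs = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 7.0], [6.0, 2.0]]
ys = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
model = fit_boosted_stumps(xs, ys)
print(round(predict(model, [2.5, 4.0]), 2), round(predict(model, [5.5, 4.0]), 2))
```

Each stump also records the gain of its split, which is the kind of quantity a gain-based variable importance can later be accumulated from.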
The technical route of the method is shown in figure 1.
3. Coke quality index correlation analysis
For the correlation analysis, an extreme gradient boosting tree model is built from the coke oven coal blending indexes and production process data collected on site, and is used to predict the coke quality indexes (ash content, sulfur content, M10, M40, CRI, and CSR). After parameter optimization yields a model whose accuracy meets the requirement, the model is analyzed further: the relative importance of each variable is calculated, and the results are tallied and summarized to obtain an evaluation of the correlation between each feature variable and the target variable.
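The source does not state which importance measure is used. One common choice for tree ensembles is gain-based importance: the error reduction of every split is accumulated per feature and normalized so that the importances sum to 1. The sketch below assumes that measure; the feature names and gain values are made up for illustration.

```python
from collections import defaultdict


def relative_importance(splits):
    """Accumulate split gains per feature and normalize to fractions of the total.

    `splits` is an iterable of (feature_name, gain) pairs, one per tree split.
    The result is ordered from most to least important.
    """
    totals = defaultdict(float)
    for name, gain in splits:
        totals[name] += gain
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {name: value / grand_total for name, value in ranked}


# Hypothetical split records collected from a trained ensemble.
splits = [
    ("blend_volatile", 6.0),
    ("blend_ash", 3.0),
    ("blend_volatile", 2.0),
    ("coking_time", 1.0),
]
importance = relative_importance(splits)
for name, share in importance.items():
    print(f"{name}: {share:.3f}")
```

Repeating this for a model trained on each of the six coke quality indexes would give one importance ranking per index.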
The specific model inputs and outputs are shown in table 1:
TABLE 1
Standard data are selected from the collected data to train gradient boosting tree regression models, from which the feature importances for the six coke quality indexes are obtained.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.