Optimization method for reducing gasoline octane number loss based on random forest
1. The optimization method for reducing the octane number loss of gasoline based on the random forest is characterized by comprising the following steps:
step 1: acquiring original sample data of gasoline refining, preprocessing the data, and removing abnormal data and invalid data;
step 2: screening out key variables from the obtained original sample data by a dimension reduction method;
step 3: establishing a product octane number loss prediction model by using a random forest model, based on the data preprocessed in step 1 and the key variables selected in step 2, and performing model verification;
step 4: predicting the sulfur content and octane number loss of the product by using the SVR model and the random forest model according to the key variables screened in step 2, and adjusting and optimizing the operating variables of the samples with the sulfur content as a constraint condition, to obtain optimization schemes for reducing the octane number loss under different scenarios;
step 5: extracting the variation range and step size of the key variables of the original sample data, plotting with Python, and displaying how the predicted value changes when a single operating variable is adjusted.
2. The optimization method for reducing the octane number loss of gasoline based on random forests as claimed in claim 1, wherein the specific operation steps of the data preprocessing in the step 1 are as follows:
step 11: importing the collected samples into an Excel table, screening each variable with an amplitude-limiting method based on its maximum value, and eliminating samples that fall outside the range;
step 12: removing outliers using the Pauta criterion (3σ criterion);
step 13: replacing cells whose value is 0 with NA, and deleting columns in which more than 50% of the values are NA;
step 14: filling the remaining NA values with the mean value over the two-hour window.
3. A random forest based optimization method for reducing gasoline octane number loss according to claim 2, wherein the specific operation steps of step 2 include:
step 21: processing the missing data according to step 13 and step 14;
step 22: low variance filtering: first normalizing the original sample data of step 1, and then deleting the columns whose variance is less than 0.1;
step 23: calculating the Pearson correlation coefficients among the features of the processed sample data set, generating a correlation matrix, and retaining only one variable from each group of variables whose correlation exceeds 0.9, to obtain the preliminarily screened feature variables;
step 24: when the number of features screened in step 23 exceeds 300 dimensions, screening the features by a random forest method, in which the Gini index is used as the evaluation index of the contribution degree to obtain an importance score for each variable, and the importance score is normalized according to the following formula:
$$VIM_j^{(norm)} = \frac{VIM_j}{\sum_{i=1}^{c} VIM_i}$$
wherein GI denotes the Gini index; $VIM_j$ denotes the average change in node splitting impurity of the j-th feature over all decision trees of the random forest; $VIM_j^{(norm)}$ denotes the normalized importance score; and c denotes the number of features;
step 25: ranking the random forest variables by importance, and finally screening out a number of key variables.
4. A random forest based optimization method for reducing gasoline octane number loss according to claim 1, wherein the specific operation steps of step 3 include:
step 31: generating a decision tree model corresponding to the key variables, further generating a random forest model, finally obtaining a product octane number loss prediction model, and judging and outputting results through the decision tree in the random forest;
step 32: selecting the mean square error, the R-squared value, the mean absolute error and the root mean square error to measure the deviation between the predicted product octane number and the true value, thereby verifying the established model.
5. A random forest based optimization method for reducing gasoline octane number loss according to claim 1, wherein the specific operation steps of step 4 comprise:
step 41: setting a maximum value of the sulfur content and a minimum value of the reduction in octane number loss;
step 42: performing parameter optimization on the main operating variables of each sample, taking the set maximum sulfur content and the set minimum reduction in octane number loss as constraint conditions, and screening out the samples that satisfy the constraint conditions together with their operating conditions.
Background
Gasoline is the main fuel of small vehicles. The exhaust emitted by its combustion affects air quality, and as gasoline demand keeps growing, the environmental problems caused by incomplete combustion are becoming more and more serious.
The most intuitive index of the combustion performance of gasoline is the octane number, which also serves as the commercial grade of gasoline (such as 89#, 92# and 95#). However, the desulfurization and olefin reduction of modern catalytic cracking gasoline generally lower its octane number, and this causes great economic loss for enterprises: each 1-unit reduction in octane number corresponds to a loss of about 150 yuan/ton. For example, for a catalytic cracking gasoline refining unit of 1 million tons/year, reducing the RON loss by 0.3 unit yields an economic benefit of about 45 million yuan (150 yuan/ton × 0.3 × 1,000,000 tons).
Therefore, the key to clean gasoline is to reduce the sulfur and olefin content of gasoline while maintaining its octane number as far as possible. However, in the prior art, chemical processes are generally modeled by data correlation or by mechanistic modeling, in which the operating variables are taken to be linearly related; traditional data correlation models use relatively few variables, and mechanistic modeling places high demands on feedstock analysis and responds slowly to process optimization, so the results are not ideal. How to establish an octane number loss model for the gasoline refining process and use it for operation optimization is therefore a problem that urgently needs to be solved.
Disclosure of Invention
Aiming at the problems, the invention provides an optimization method for reducing the loss of the octane number of gasoline based on a random forest.
The technical solution for realizing the purpose of the invention is as follows:
an optimization method for reducing the octane number loss of gasoline based on random forests is characterized by comprising the following steps:
step 1: acquiring original sample data of gasoline refining, preprocessing the data, and removing abnormal data and invalid data;
step 2: screening out key variables from the obtained original sample data by a dimension reduction method;
step 3: establishing a product octane number loss prediction model by using a random forest model, based on the data preprocessed in step 1 and the key variables selected in step 2, and performing model verification;
step 4: predicting the sulfur content and octane number loss of the product by using the SVR model and the random forest model according to the key variables screened in step 2, and adjusting and optimizing the operating variables of the samples with the sulfur content as a constraint condition, to obtain optimization schemes for reducing the octane number loss under different scenarios;
step 5: extracting the variation range and step size of the key variables of the original sample data, plotting with Python, and displaying how the predicted value changes when a single operating variable is adjusted.
Further, the data preprocessing described in step 1 includes the following specific operation steps:
step 11: importing the collected samples into an Excel table, screening each variable with an amplitude-limiting method based on its maximum value, and eliminating samples that fall outside the range;
step 12: removing outliers using the Pauta criterion (3σ criterion);
step 13: replacing cells whose value is 0 with NA, and deleting columns in which more than 50% of the values are NA;
step 14: filling the remaining NA values with the mean value over the two-hour window.
Further, the specific operation steps of step 2 include:
step 21: processing the missing data according to step 13 and step 14;
step 22: low variance filtering: first normalizing the original sample data of step 1, and then deleting the columns whose variance is less than 0.1;
step 23: calculating the Pearson correlation coefficients among the features of the processed sample data set, generating a correlation matrix, and retaining only one variable from each group of variables whose correlation exceeds 0.9, to obtain the preliminarily screened feature variables;
step 24: when the number of features screened in step 23 exceeds 300 dimensions, screening the features by a random forest method, in which the Gini index is used as the evaluation index of the contribution degree to obtain an importance score for each variable, and the importance score is normalized according to the following formula:
$$VIM_j^{(norm)} = \frac{VIM_j}{\sum_{i=1}^{c} VIM_i}$$
wherein GI denotes the Gini index; $VIM_j$ denotes the average change in node splitting impurity of the j-th feature over all decision trees of the random forest; $VIM_j^{(norm)}$ denotes the normalized importance score; and c denotes the number of features;
step 25: ranking the random forest variables by importance, and finally screening out a number of key variables.
Further, the specific operation steps of step 3 include:
step 31: generating a decision tree model corresponding to the key variables, further generating a random forest model, finally obtaining a product octane number loss prediction model, and judging and outputting results through the decision tree in the random forest;
step 32: selecting the mean square error, the R-squared value, the mean absolute error and the root mean square error to measure the deviation between the predicted product octane number and the true value, thereby verifying the established model.
Further, the specific operation steps in step 4 include:
step 41: setting a maximum value of the sulfur content and a minimum value of the reduction in octane number loss;
step 42: performing parameter optimization on the main operating variables of each sample, taking the set maximum sulfur content and the set minimum reduction in octane number loss as constraint conditions, and screening out the samples that satisfy the constraint conditions together with their operating conditions.
Compared with the prior art, the method has the following beneficial effects:
firstly, the invention constructs a model for reducing octane number loss in the process of refining gasoline and provides an optimization method of operating variables to predict the octane number and the sulfur content of a product. On the basis of limiting the sulfur content, aiming at applicable models under different scenes, the operating conditions after the optimization of corresponding operating variables are given, so that the octane number loss is reduced by optimizing the operating variables.
Secondly, because of the complexity of the oil refining process and the diversity of equipment, the operating variables are numerous, highly nonlinear and strongly coupled with one another. The method adopts a random forest model, which can screen the variables statistically to find the key operating variables and can respond to and optimize on new production data in a timely manner, so its effect is clearly superior to that of the prior art.
Drawings
FIG. 1 is a schematic diagram of an SVM;
FIG. 2 is a fitting graph of the random forest model built on the variables screened by the random forest;
FIG. 3 is a schematic diagram of the effect of reducer temperature on predicted values;
FIG. 4 is a schematic diagram of an influence track of a hydrogen-oil ratio on a predicted value;
FIG. 5 is a schematic diagram of the trajectory of the influence of the flow of the D121 destabilizing tower on the predicted value;
FIG. 6 is a schematic diagram showing the influence of the temperature of the E-101A shell pass outlet pipe on the predicted value;
FIG. 7 is a schematic diagram illustrating the effect of differential pressure between the regenerator bottom and the regeneration receiver on the predicted value;
FIG. 8 is a graphical illustration of the trace of the effect of EH-102 heating element/B-beam temperature on predicted values;
FIG. 9 is a schematic diagram of the trace of the influence of the low-pressure hot nitrogen pressure on the predicted value;
FIG. 10 is a schematic diagram showing the trajectory of the D-125 liquid level effect on the predicted value;
FIG. 11 is a schematic diagram showing the influence of the 2# catalytic gasoline inlet flow on the predicted value;
FIG. 12 is a schematic diagram of the trajectory of the effect of the steady tower top pressure on the predicted value;
FIG. 13 is a schematic diagram of the trajectory of the effect of the left exhaust temperature of K-101A on the predicted value;
FIG. 14 is a schematic diagram of the trace of the effect of sulfur content in raw gasoline on predicted values;
FIG. 15 is a schematic diagram of an influence track of hydrogen flow rate of a hydrogen mixing point on a predicted value;
FIG. 16 is a schematic diagram of the influence track of the flow of the loosening wind at the lower part of D-107 on the predicted value;
FIG. 17 is a schematic diagram of the trajectory of the effect of the outlet temperature of the stabilized column bottom on the predicted value;
FIG. 18 is a graphical illustration of the effect of regenerator temperature on predicted values;
FIG. 19 is a graphical illustration of the effect of the R-102 spool valve on predicted values;
FIG. 20 is a schematic diagram of the influence track of S-ZORB.PT-1501.PV on the predicted values.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following further describes the technical solution of the present invention with reference to the drawings and the embodiments.
In order to solve the problems, the invention provides an optimization method for reducing the octane number loss of gasoline based on a random forest, which comprises the following steps:
step 1: data selection and preprocessing: acquiring original sample data of gasoline refining, preprocessing the data, and removing abnormal data and invalid data;
preferably, the original sample data used by the method are two representative samples, No. 285 and No. 313, randomly selected from the industrial data of a catalytic cracking gasoline refining and desulfurization unit of a petrochemical enterprise that has operated for 4 years; the 325 sample data are processed with Excel and Python according to the sample determination method of the prior art.
Step 2: finding modeling main variables: screening out key variables from the obtained original sample data by a dimension reduction method;
preferably, establishing the model for reducing octane number loss involves 7 feedstock properties, 2 spent adsorbent properties, 2 regenerated adsorbent properties and 2 product properties, together with another 354 operating variables, i.e. 367 variables in total. Therefore, based on the 325 sample data, the main modeling variables are screened out of the 367 variables by a dimension reduction method, so that they are as representative and as independent as possible.
Step 3: establishing an octane number (RON) loss prediction model: based on the data preprocessed in step 1 and the key variables selected in step 2, and drawing on existing data mining techniques, establishing a product octane number loss prediction model by using a random forest model, and performing model verification;
Step 4: optimization of the main variable operating scheme: on the premise that the sulfur content of the product is not more than 5 μg/g, using the random forest model established in step 3 to obtain the optimized operating variables of the main variables for those samples in the data whose octane number (RON) loss is reduced by more than 30%, thereby obtaining the corresponding operation scheme, wherein the properties of the raw material, the spent adsorbent (adsorbent to be regenerated) and the regenerated adsorbent are kept unchanged during optimization.
Step 5: visual display of the model: displaying graphically the change tracks of the gasoline octane number and sulfur content during the optimization and adjustment of the main operating variables;
to keep the industrial unit running smoothly, the optimized key variables of step 2 can only be adjusted gradually.
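The gradual, single-variable adjustment of step 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `sweep_variable`, the synthetic data and the step size are all hypothetical, and the resulting arrays can be passed to any plotting library to draw the trajectory.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def sweep_variable(model, base_sample, col, low, high, step):
    """Predict along a grid of values for one operating variable.

    All other variables are held at the values in `base_sample`.
    """
    grid = np.arange(low, high + step / 2, step)    # include the upper end
    samples = np.tile(base_sample, (len(grid), 1))  # one row per grid value
    samples[:, col] = grid
    return grid, model.predict(samples)


# Synthetic stand-in data (hypothetical): 3 operating variables, 200 samples.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 88.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 0.05, 200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Step variable 0 from 0.0 to 1.0 in increments of 0.1.
grid, trajectory = sweep_variable(model, X[0].copy(), col=0,
                                  low=0.0, high=1.0, step=0.1)
for value, pred in zip(grid, trajectory):
    print(f"variable = {value:.1f} -> predicted RON = {pred:.3f}")
```

Sweeping one variable while freezing the others mirrors the constraint that the unit can only be moved in small steps.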
The following examples further illustrate the aspects and effects of the present invention, but do not limit the present invention.
Examples
According to the steps, the embodiment establishes an octane number loss model, and obtains a scheme for reducing the octane number loss of the gasoline by adjusting parameters through the model:
1. step 1: data selection and processing:
Most variables in the acquired original data are normal, but the data of each unit have problems at some measurement points: some variables contain data for only part of the time period, and some variables are entirely or partly null. Because data quality directly affects the research results, the original data and the sample data are processed before use, as follows:
(1) establishing Excel table 1 to store the 325 sample data and processing its header by converting the two-dimensional index into a one-dimensional index, with the properties named according to the rule "xx property_xx", for example: raw material property_sulfur content, product property_octane number RON, regenerated adsorbent_coke, wt%, and so on; the column names in Attachment I are given in both English and Chinese, and because some Chinese names are missing, the English names are used uniformly;
(2) establishing Excel table 2 to store the original data of samples No. 285 and No. 313; creating in Excel table 2 a new sheet named "sample property", and copying the header of Excel table 1 into this sheet while keeping the source format; copying the data of the raw material, product, spent adsorbent and regenerated adsorbent in Excel table 2 to the corresponding positions of the sample property sheet; splitting the operating variable sheet into two sheets, sample 285 and sample 313, whose header is the second row of the operating variable sheet (time | S-ZORB.CAL_H2.PV | S-ZORB.PDI_2102.PV | …); and then processing the data with Python;
(3) importing the data of sample 285 and sample 313, screening with the amplitude-limiting method based on the maximum value of each variable in Excel table 1, and removing the samples outside the range; treating 0 as missing data, replacing it with NA, and deleting the columns whose values are all NA; filling the missing values with the mean value within two hours; since the data all lie within two hours, the mean is filled in directly, and the processed data are merged with Attachment I;
(4) removing outliers with the Pauta criterion (3σ criterion), which states: let the measured variable be measured with equal precision to obtain $x_1, x_2, \dots, x_n$; calculate the arithmetic mean $\bar{x}$ and the residuals $v_i = x_i - \bar{x}$ $(i = 1, 2, \dots, n)$, and calculate the standard error $\sigma$ according to the Bessel formula; if the residual $v_b$ $(1 \le b \le n)$ of a measured value $x_b$ satisfies $|v_b| = |x_b - \bar{x}| > 3\sigma$, then $x_b$ is considered a bad value containing a gross error and should be eliminated. The Bessel formula is:
$$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} v_i^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}$$
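The 3σ screening described in (4) can be sketched in Python as follows, taking the Bessel formula to be the sample standard deviation with n − 1 in the denominator. The function name `pauta_mask` and the readings are hypothetical and serve only as an illustration.

```python
import numpy as np


def pauta_mask(x):
    """Return a boolean mask that is True for values kept by the 3-sigma rule."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    residuals = x - mean                                     # v_i = x_i - mean
    sigma = np.sqrt((residuals ** 2).sum() / (len(x) - 1))   # Bessel formula
    return np.abs(residuals) <= 3 * sigma


# Hypothetical readings: 19 normal values and one gross error.
readings = np.array([10.0] * 19 + [50.0])
keep = pauta_mask(readings)
cleaned = readings[keep]
print(cleaned.size)  # 19 values remain after the bad value is removed
```

Note that with very few measurements a single bad value inflates σ so much that the rule cannot flag it, so the criterion is best applied to reasonably long series.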
2. step 2: finding the main variables of the modeling:
the invention uses a method of firstly reducing the dimension and then modeling, which is beneficial to neglecting secondary factors and discovering and analyzing the main variables and factors influencing the model, and the process is divided into: processing missing data, low variance filtering processing, correlation analysis, principal component analysis and random forest feature screening.
(1) Processing of missing data
Data analysis shows that the sample data are missing at random. The operating variables therefore need to be screened by dimension reduction, and the missing data are handled on the basis of the preprocessed data. There are generally three ways to handle missing data: deleting the data, imputing the data, and leaving it unprocessed. Imputation merely replaces unknown values with subjective estimates; common methods include mean imputation, interpolation, same-class mean imputation, maximum likelihood estimation and multiple imputation. The invention deletes the columns in which more than 50% of the values are missing, and fills the remaining missing data by mean imputation.
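A minimal pandas sketch of this missing-data handling is given below. It assumes that a value of 0 marks a missing reading and that the column mean stands in for the two-hour mean; the function name `clean_missing` and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_missing(df, max_missing=0.5):
    """Treat 0 as missing, drop sparse columns, mean-fill the rest."""
    df = df.replace(0, np.nan)                      # 0 marks a missing reading
    missing_share = df.isna().mean()                # fraction of NA per column
    df = df.loc[:, missing_share <= max_missing]    # drop columns above the limit
    return df.fillna(df.mean())                     # column-mean imputation


raw = pd.DataFrame({
    "temp": [420.0, 0.0, 422.0, 424.0],   # 25% missing: kept and filled
    "flow": [0.0, 0.0, 0.0, 3.5],         # 75% missing: dropped
    "pres": [2.4, 2.5, 2.6, 2.7],         # complete
})
clean = clean_missing(raw)
print(clean)
```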
(2) Low variance filtering screening
Low variance filtering is similar to the missing-value deletion method. It assumes that columns whose values vary very little carry little information, so all columns with small variance are removed. Because variance depends on the range of the data, the data must be normalized before this method is applied. The invention normalizes the data and deletes the columns whose variance is less than 0.1.
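The filter can be sketched as follows. The text does not state which normalization is used, so min-max scaling to [0, 1] and the population variance are assumptions here, as are the function name `low_variance_filter` and the column names.

```python
import pandas as pd


def low_variance_filter(df, threshold=0.1):
    """Min-max scale each column, then drop columns with variance < threshold."""
    span = (df.max() - df.min()).replace(0, 1.0)   # guard against constant columns
    scaled = (df - df.min()) / span
    variances = scaled.var(ddof=0)                 # population variance of scaled data
    return df.loc[:, variances >= threshold], variances


data = pd.DataFrame({
    "valve": [0.0, 10.0, 0.0, 10.0, 0.0, 10.0],   # swings across its full range
    "level": [5.0, 5.5, 5.5, 5.5, 5.5, 6.0],      # mostly sits at one value
})
kept, variances = low_variance_filter(data)
print(variances)
print(list(kept.columns))
```

Under min-max scaling the variance of a column is at most 0.25, so a threshold of 0.1 removes columns whose values cluster around a single level.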
(3) Relevance analysis data screening
Correlation analysis mainly studies the direction and closeness of the interdependence between variables. First, the Pearson correlation coefficients among the 359 features are calculated and a correlation matrix is generated; among variables whose correlation exceeds 0.9, only one is retained. This step screens out 153 variables, 5 of which are counted repeatedly, so 177 variables remain: 367 (total number of variables) − 8 (missing value screening) − 34 (low variance filtering) − 153 (correlation) + 5 (repeated) = 177.
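One common way to keep a single variable out of each highly correlated group is sketched below. Using the absolute value of the Pearson coefficient is an assumption, and the function name `drop_correlated` and the synthetic columns are hypothetical.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.9):
    """Keep one variable from each group whose pairwise |Pearson r| exceeds threshold."""
    corr = df.corr(method="pearson").abs()
    # Look only at the upper triangle so that each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


rng = np.random.default_rng(1)
a = rng.normal(size=100)
frame = pd.DataFrame({
    "a": a,
    "a_copy": 2.0 * a + 0.01 * rng.normal(size=100),  # nearly identical to "a"
    "b": rng.normal(size=100),                        # independent of "a"
})
reduced, dropped = drop_correlated(frame)
print(dropped)
```

The first variable of each correlated pair is kept and the later one is dropped, so the result depends on column order.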
(4) Random forest feature screening
When the number of the features of the data set exceeds 300 dimensions, the features which have large influence on the result need to be selected for modeling, and the method adopts random forest screening to carry out feature screening.
The random forest evaluates feature importance as follows: the contribution of each feature on each tree of the random forest is computed and averaged, and the contributions of different features are then compared. Evaluation indices of the contribution degree include the Gini index and the out-of-bag (OOB) error rate;
the random forest uses the Gini value as the criterion for splitting nodes. In the weighted random forest (WRF), the weights serve 2 purposes. The first is to weight the Gini value when selecting the split point, with the expression:
$\Delta i = i(N) - i(N_L) - i(N_R)$ (3)
wherein N denotes the node before splitting, $N_L$ and $N_R$ denote the left and right nodes after splitting, $W_i$ is the class weight of class-i samples, $n_i$ is the number of class-i samples in the node, and $\Delta i$ is the reduction in impurity; the larger its value, the better the split. The second is to use the class weights to determine the class label of a terminal node, with the expression:
$\mathrm{nodeclass} = \arg\max_i (n_i W_i), \quad i = 1, 2, \dots, C$ (4)
the importance score of a variable is denoted by VIM and the Gini value by GI. Assuming there are c features $X_1, X_2, \dots, X_c$, the Gini index score $VIM_j$ of each feature $X_j$ is calculated, i.e. the average change in node splitting impurity of the j-th feature over all decision trees of the random forest. The Gini index is calculated as:
$$GI_m = \sum_{k=1}^{K} p_{mk}\left(1 - p_{mk}\right) = 1 - \sum_{k=1}^{K} p_{mk}^2$$
where K is the number of classes and $p_{mk}$ is the proportion of class k in node m (the change in the Gini value is calculated node by node);
the importance of feature $X_j$ at node m, i.e. the change in the Gini index before and after node m branches, is:
$$VIM_{jm} = GI_m - GI_l - GI_r$$
wherein $GI_l$ and $GI_r$ are the Gini indices of the two new nodes after branching.
If the nodes at which feature $X_j$ appears in decision tree i form the set M, then the importance of $X_j$ in the i-th tree is:
$$VIM_{ij} = \sum_{m \in M} VIM_{jm}$$
assuming that the random forest has n trees in total, then:
$$VIM_j = \frac{1}{n}\sum_{i=1}^{n} VIM_{ij}$$
and normalizing the importance scores gives the final importance score:
$$VIM_j^{(norm)} = \frac{VIM_j}{\sum_{i=1}^{c} VIM_i}$$
importance of features returned by the random forest in sklearn:
TABLE 1 random forest operation variable feature importance
TABLE 2 random forest operation variable feature importance
The importance ranking of the random forest variables is shown in Table 1; 18 key variables are screened out. These variables meet industrial requirements, are close to the variables required in the traditional product octane number prediction formula, and are supported both theoretically and practically.
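The ranking step can be sketched with scikit-learn as below. This is an illustration on synthetic data, not the reported result: `rank_features`, the column names and the forest size are assumptions. For a regressor, scikit-learn's `feature_importances_` is based on the reduction of squared error at each split rather than the Gini index, and is already normalized to sum to 1.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_features(X, y, top_k, seed=0):
    """Fit a random forest and return the top_k features by impurity importance."""
    forest = RandomForestRegressor(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    scores = pd.Series(forest.feature_importances_, index=X.columns)
    return scores.sort_values(ascending=False).head(top_k)


# Synthetic stand-in data (hypothetical): only x0 and x1 drive the target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 6)), columns=[f"x{i}" for i in range(6)])
y = 5.0 * X["x0"] + 1.0 * X["x1"] + rng.normal(0.0, 0.1, 300)

top = rank_features(X, y, top_k=3)
print(top)
```

In the setting of the text, `top_k` would be 18 and `X` would hold the 177 variables left after correlation screening.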
3. Step 3: establishing an octane number loss prediction model:
after the original data have been processed and the variables screened by dimension reduction, two models, a support vector machine regression model and a random forest model, are used for prediction, and their final results are compared. Because the octane number loss equals the octane number of the raw material minus the octane number of the product, it is more accurate to predict the product octane number first and then calculate the loss indirectly. The variables screened by random forest importance are combined with the two models and compared to obtain the best combination of variables and model, since the choice of model is decisive for the final octane number loss prediction model. The comparison shows that the random forest model gives the better result.
(1) Support vector machine regression model (SVR)
SVM stands for support vector machine and is mainly applied to pattern recognition, classification and regression analysis. As shown in FIG. 1, the red and blue two-dimensional data points can evidently be separated by a straight line, which in the field of pattern recognition is called the linearly separable problem; the solid black line is the boundary, also called the "decision surface", and each decision surface corresponds to a linear classifier. The SVM may be represented as:
when the support vector machine is used for regression it is called the SVR model, and the basic setting is as follows: given training samples $\{(x_i, y_i)\}$, $i = 1, \dots, N$, where $x_i \in X = R^n$ and $y_i \in Y = R$, find a decision function on $R^n$:
f(x)=wx+b (11),
wherein w and b are the model parameters to be determined, which can be obtained by fitting f(x) to y.
To solve for w and b, the above problem is transformed into an optimization problem:
the constraint conditions are as follows:
equation (11) is usually not solved directly; instead, its dual problem is introduced:
the constraint conditions are as follows:
after obtaining $\alpha_i$, if $0 < \alpha_i < C$, then necessarily $\xi_i = 0$, and further:
finally, f(x) can be expressed as:
in the formula, $k(x, x_i)$ is the kernel function; the invention uses 4 kinds of kernel functions:
the linear kernel function formula is:
the polynomial kernel function formula is:
the gaussian kernel function formula is:
the Sigmoid kernel function formula is:
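In their commonly used forms, the four kernels are the linear kernel $x \cdot x_i$, the polynomial kernel $(\gamma\, x \cdot x_i + r)^d$, the Gaussian kernel $\exp(-\gamma \lVert x - x_i \rVert^2)$ and the Sigmoid kernel $\tanh(\gamma\, x \cdot x_i + r)$; the parameter names γ, r and d follow scikit-learn and are assumptions here. A minimal sketch comparing the four kernels with scikit-learn's `SVR` on synthetic data (the values of `C` and `epsilon` are likewise assumed):

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.svm import SVR

# Synthetic stand-in data (hypothetical): a linear target with small noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 0.05, 200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = SVR(kernel=kernel, C=10.0, epsilon=0.05)  # assumed hyperparameters
    model.fit(X_train, y_train)
    scores[kernel] = r2_score(y_test, model.predict(X_test))

for kernel, score in scores.items():
    print(f"{kernel:8s} R2 = {score:.3f}")
```

On real plant data the inputs would normally be standardized first, since all four kernels are sensitive to the scale of the variables.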
(2) random forest model
Decision tree + Bagging gives the random forest, a relatively new ensemble learning method in machine learning that is also called a nonlinear tree-based model. A random forest is a forest built in a random manner and belongs to the ensemble classification models. The decision trees that make up a random forest are built independently of one another.
After the random forest model is built, a new sample is input into the model and judged by each decision tree separately. For classification problems a voting method is used, i.e. the class with the most votes is output; for regression problems a simple averaging method is used, i.e. the regression results of the individual weak learners are arithmetically averaged to give the final model output. The random forest model has the following advantages:
a. It performs well on data sets, and because randomness is introduced, the random forest is not prone to overfitting.
b. Because randomness is introduced, the random forest has good noise resistance.
c. It can process data with many features, of either discrete or continuous type, and adapts well to different data sets.
d. It trains quickly, can detect interactions among features, and yields a variable importance ranking.
e. It can be processed in parallel, which greatly improves training speed on large samples.
f. It is relatively simple to implement.
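The simple averaging used for regression can be checked directly in scikit-learn, where the forest output is the arithmetic mean of the individual tree predictions. The sketch below uses synthetic data and an assumed forest size.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (hypothetical).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 4))
y = 3.0 * X[:, 0] + np.sin(6.0 * X[:, 1]) + rng.normal(0.0, 0.1, 200)

forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)

# Each fitted decision tree judges the new samples separately ...
X_new = rng.uniform(0.0, 1.0, size=(5, 4))
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])

# ... and the forest output is the arithmetic mean of those judgments.
manual_average = per_tree.mean(axis=0)
forest_output = forest.predict(X_new)
print(np.allclose(manual_average, forest_output))
```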
The following errors are selected as evaluation indices of the regression algorithm to measure the deviation between the predicted product octane number and the true value, where $y_i$ is the true value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the true values and n the number of samples:
MSE (mean square error): measures the deviation between the predicted product octane number and the actual value. The closer the MSE is to 0, the better the predictive ability of the model; the larger the MSE, the worse the predictive ability. The calculation formula is:
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
R2_score (R squared): normally lies in the interval [0, 1] (it becomes negative when the model predicts worse than the mean of the true values). R2_score = 1 indicates that the predictive ability of the model reaches its maximum; the smaller the value, the worse the predictive ability. The formula for R2_score is:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
MAE (mean absolute error): the average of the absolute errors, which reflects the actual size of the prediction errors well. The calculation formula is:
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
RMSE (root mean square error): the square root of the MSE, calculated as:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
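The four indices follow their standard definitions and can be computed with NumPy as in the sketch below. The function name `regression_metrics` and the four RON values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, R2, MAE and RMSE using their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(mse)
    r2 = 1.0 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MSE": mse, "R2": r2, "MAE": mae, "RMSE": rmse}


# Hypothetical product RON values: true vs. predicted.
metrics = regression_metrics([88.0, 89.0, 90.0, 91.0], [88.5, 88.5, 90.0, 92.0])
print(metrics)  # MSE = 0.375, R2 = 0.7, MAE = 0.5, RMSE is about 0.612
```

Here the squared errors sum to 1.5 and the total sum of squares is 5, which gives R2 = 1 − 1.5/5 = 0.7.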
(3) model result comparison and analysis
The problem of predicting the octane number loss is converted into the problem of predicting the product octane number, where the octane number loss equals the octane number of the raw material minus the octane number of the product. The following 2 models are established according to the indices in Table 1 and Table 2 above, respectively: a product octane number prediction model based on random forest (RF) variables and SVR, and a product octane number prediction model based on the random forest.
The evaluation indices of the models are compared in Tables 3 and 4.
TABLE 3 SVR + RF model score
TABLE 4 random forest model Scoring
As can be seen from Tables 3 and 4, the random forest model ranks ahead on all evaluation indexes; the R2 and MSE of the SVR + RF model are worse than those of the random forest model. Moreover, as can be seen from the random forest variables and the random forest model fitting graph in FIG. 2, the variables screened by the random forest better match industrial requirements and are close to the variables required in the traditional octane number prediction formula. Therefore, the variables screened by the random forest are finally selected to establish a random forest model for octane number (RON) loss prediction, for which the error between the predicted value and the actual value is small.
4. Step 4: optimization of the main operating variables:
(1) Operating variable optimization analysis
According to the industrial requirements for operating variable optimization, namely identifying the variables that matter most for reducing the octane number loss, the operating variables screened out by the random forest in step 2 are used with an SVR (support vector regression) model and a random forest model to predict the sulfur content and the octane number loss of the product. Taking the sulfur content as the constraint condition, the operating variables of each sample are adjusted on the premise that the sulfur content does not exceed 5 μg/g, and the operating condition optimization schemes of the samples whose octane number loss is reduced by more than 30% are screened out.
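The screening rule above combines a constraint (sulfur content not more than 5 μg/g) with a target (octane number loss reduced by more than 30%). The sketch below shows one way to express that check, assuming the loss is the raw material octane number minus the product octane number; the function names and the numeric example are hypothetical.

```python
SULFUR_LIMIT = 5.0          # ug/g, constraint stated in the text
MIN_LOSS_REDUCTION = 0.30   # 30 % threshold stated in the text


def loss_reduction(feed_ron, product_ron_before, product_ron_after):
    """Relative reduction of the octane loss after adjusting the variables."""
    loss_before = feed_ron - product_ron_before
    loss_after = feed_ron - product_ron_after
    if loss_before <= 0:
        raise ValueError("original octane loss must be positive")
    return (loss_before - loss_after) / loss_before


def is_accepted(feed_ron, product_ron_before, product_ron_after, sulfur_after):
    """True if the adjusted sample satisfies both screening conditions."""
    reduction = loss_reduction(feed_ron, product_ron_before, product_ron_after)
    return sulfur_after <= SULFUR_LIMIT and reduction > MIN_LOSS_REDUCTION


# Hypothetical sample: the loss drops from 1.25 to 0.75 RON units (40 %).
example_reduction = loss_reduction(90.0, 88.75, 89.25)
example_ok = is_accepted(90.0, 88.75, 89.25, sulfur_after=4.2)
example_rejected = is_accepted(90.0, 88.75, 89.25, sulfur_after=6.0)
print(example_reduction, example_ok, example_rejected)
```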
(2) Optimization of the main operating scheme (the two models are compared, and the random forest model gives results more in line with industrial requirements)
Firstly, the operation scheme is optimized with the support vector regression (SVR) model.
Because the SVR model uses a linear kernel function to predict the product octane number, the operating variables and the product octane number are in a linear relationship, so the operating variables can be adjusted and optimized by means of the correlation matrix. When an operating variable is positively correlated with the product octane number, the maximum value of its selection interval is chosen; when it is negatively correlated, the minimum value is chosen. The selection intervals are shown in Table 5.
TABLE 5 operating variable selection interval
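The linear-model rule can be sketched as follows, assuming the correlation is measured against the product octane number (a higher product octane number means a lower loss): take the upper bound of the interval for a positive correlation and the lower bound for a negative one. The variable names, sample values and intervals below are hypothetical placeholders, not the entries of Table 5.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def pick_bounds(samples, product_ron, intervals):
    """Choose the interval end that raises the product octane number.

    samples:   dict mapping variable name -> list of observed values
    intervals: dict mapping variable name -> (low, high)
    """
    chosen = {}
    for name, (low, high) in intervals.items():
        r = pearson(samples[name], product_ron)
        chosen[name] = high if r > 0 else low
    return chosen


# Hypothetical data for two placeholder variables.
ron = [88.0, 88.5, 89.0, 89.5]
observed = {"var_up": [1.0, 2.0, 3.0, 4.0], "var_down": [0.5, 0.4, 0.3, 0.2]}
bounds = {"var_up": (250.0, 300.0), "var_down": (0.2, 0.5)}
setpoints = pick_bounds(observed, ron, bounds)
print(setpoints)
```

Any setpoint chosen this way would still need to be checked against the sulfur content constraint before it is accepted.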
After optimization, 45 samples achieve an octane number (RON) loss reduction of more than 30% while keeping the sulfur content at no more than 5 μg/g. The values of the operating variables after adjustment and optimization are shown in Table 6.
TABLE 6 optimized values of SVR model operating variables
As can be seen by combining Table 5 and Table 6, in order to maximize the reduction of the octane number loss, the following are increased: the temperature of the reducer, the temperature of the E-101A shell pass outlet pipe, the pressure difference between the bottom of the regenerator and the regeneration receiver, the temperature of the EH-102 heating element/B bundle, the pressure of the low-pressure hot nitrogen gas, the flow rate of the No. 2 catalytic gasoline inlet device, the sulfur content of the raw gasoline and the R-102 bottom slide valve; and the following are decreased: the hydrogen-oil ratio, the flow rate of the D121 destabilizing tower, the D-125 liquid level, the pressure of the stabilizing tower top, the K-101A left exhaust temperature, the hydrogen flow rate of the hydrogen mixing point, the loosening air flow rate at the lower part of D-107, the outlet temperature of the stabilizing tower bottom, the temperature of.
Secondly, the operation scheme is optimized with the random forest model.
The random forest is a tree model and is therefore nonlinear, so optimizing the operating variables under the random forest model is a nonlinear optimization problem. For each variable, the value that gives the greatest improvement of the product octane number is selected as its final value, so that the octane number loss of the product is reduced and a local optimal solution is reached. The first 60 samples are selected for operating variable tuning, which yields 16 samples meeting the conditions; the operating variable values of each sample are shown in Table 7.
TABLE 7 optimized values of random forest model operating variables
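The per-variable selection described above can be read as a greedy, one-variable-at-a-time grid search that stops at a local optimum. The sketch below illustrates that reading only; `toy_ron` and `toy_sulfur` are hypothetical stand-ins for the trained octane number and sulfur content models, and the variable names and grids are placeholders.

```python
def grid(low, high, step):
    """Inclusive list of candidate values from low to high."""
    count = int(round((high - low) / step))
    return [low + i * step for i in range(count + 1)]


def tune_sample(sample, grids, predict_ron, predict_sulfur, sulfur_limit=5.0):
    """Greedy coordinate search: for each variable in turn, keep the value
    that gives the highest predicted product octane number while the
    predicted sulfur content stays within the limit."""
    best = dict(sample)
    best_ron = predict_ron(best)
    for name, candidates in grids.items():
        for value in candidates:
            trial = dict(best)
            trial[name] = value
            if predict_sulfur(trial) > sulfur_limit:
                continue  # violates the sulfur constraint
            ron = predict_ron(trial)
            if ron > best_ron:
                best, best_ron = trial, ron
    return best, best_ron


# Hypothetical nonlinear stand-ins for the trained models.
def toy_ron(x):
    return 89.0 - 0.01 * (x["var_a"] - 3.0) ** 2 + 0.1 * x["var_b"]


def toy_sulfur(x):
    return 2.0 + 1.0 * x["var_b"]


start = {"var_a": 0.0, "var_b": 0.0}
search_grids = {"var_a": grid(0.0, 5.0, 1.0), "var_b": grid(0.0, 4.0, 1.0)}
tuned, tuned_ron = tune_sample(start, search_grids, toy_ron, toy_sulfur)
print(tuned, tuned_ron)
```

Because each variable is fixed before the next one is searched, the result depends on the starting sample and on the variable order, which is consistent with the observation that each sample ends up with its own optimization scheme.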
It can be seen from Table 5 and Table 7 that the optimized operating variables differ from sample to sample, so no uniform optimization method can be given; different operating condition optimization schemes are adopted according to the raw material and adsorbent characteristics of each sample.
Finally, the two schemes are compared, and the results are shown in Table 8.
TABLE 8 scheme COMPARATIVE TABLE
As can be seen from Table 8, the nonlinear random forest model optimizes each sample individually, and its proportion of successfully optimized samples is nearly double that of the SVR model, so it is better targeted and achieves a better degree of optimization than the SVR model. Different schemes can therefore be selected according to the situation in practice: the SVR model is used for fast optimization when the sample size is large, and the random forest model is used for targeted optimization when the sample size is small, so as to minimize the octane number loss and better meet industrial requirements.
5. Step 5: visual display of the random forest model:
According to the industrial requirements, the changes of the gasoline octane number and the sulfur content during adjustment and optimization of the key variables are displayed visually, taking sample No. 133 of the 325 collected samples as an example. The variation range and step size of each operating variable of sample No. 133 are extracted, plots are drawn using Python, and the change of the predicted value when a single operating variable is adjusted is displayed. The model is the random forest model, and the variables are those selected by the random forest importance screening method.
(1) Model visualization display
TABLE 9 operating variable value Range
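The single-variable display can be sketched as a sweep: one operating variable is stepped through its range while all other variables of the sample are held fixed, and the model prediction is recorded at each step. In the sketch below the predictor, the variable names and the range are hypothetical; `plot_sweep` assumes matplotlib is installed and is shown only as one possible way to draw the two curves on a shared axis.

```python
def sweep_variable(sample, name, low, high, step, predict):
    """Vary one variable over [low, high] and collect model predictions."""
    count = int(round((high - low) / step))
    values = [low + i * step for i in range(count + 1)]
    predictions = []
    for value in values:
        trial = dict(sample)   # copy so the original sample is untouched
        trial[name] = value
        predictions.append(predict(trial))
    return values, predictions


def plot_sweep(values, ron, sulfur, name):
    """Draw predicted octane number and sulfur content against one variable."""
    import matplotlib.pyplot as plt  # imported lazily; optional dependency

    fig, ax_ron = plt.subplots()
    ax_ron.plot(values, ron, color="tab:blue")
    ax_ron.set_xlabel(name)
    ax_ron.set_ylabel("predicted octane number")
    ax_sulfur = ax_ron.twinx()       # second y-axis for the sulfur content
    ax_sulfur.plot(values, sulfur, color="tab:red")
    ax_sulfur.set_ylabel("predicted sulfur content (ug/g)")
    fig.tight_layout()
    return fig


# Hypothetical sample and stand-in predictor, for illustration only.
base_sample = {"var_x": 1.0, "var_y": 2.0}
toy_predict = lambda s: 2.0 * s["var_x"] + s["var_y"]
xs, preds = sweep_variable(base_sample, "var_x", 0.0, 2.0, 0.5, toy_predict)
print(xs, preds)
```

Repeating the sweep for each screened variable, once with the octane number model and once with the sulfur content model, gives one pair of curves per variable.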
As can be seen from FIG. 3, as the temperature of the reducer rises, both the sulfur content and the gasoline octane number tend to rise. The temperature of the reducer has a positive, linear relationship with the sulfur content and the gasoline octane number.
As can be seen from FIG. 4, as the hydrogen-oil ratio increases, both the sulfur content and the gasoline octane number tend to fall. The hydrogen-oil ratio has a negative, linear relationship with the sulfur content and the gasoline octane number.
As can be seen from FIG. 5, as the flow rate of the D121 destabilizing tower increases, the sulfur content first increases, then decreases, and then increases again, while the gasoline octane number first decreases and then increases. The flow rate of the D121 destabilizing tower has a nonlinear relationship with the sulfur content and the gasoline octane number.
As can be seen from FIG. 6, as the temperature of the E-101A shell pass outlet pipe rises, the sulfur content rises, while the gasoline octane number first falls and then rises. The temperature of the E-101A shell pass outlet pipe has a positive, linear relationship with the sulfur content and a nonlinear relationship with the gasoline octane number.
As can be seen from FIG. 7, as the pressure difference between the bottom of the regenerator and the regeneration receiver increases, the sulfur content first decreases and then increases, while the gasoline octane number first increases and then decreases. The pressure difference between the bottom of the regenerator and the regeneration receiver has a nonlinear relationship with the sulfur content and the gasoline octane number.
As can be seen in FIG. 8, as EH-102 heating element/B-beam temperature increases, the sulfur content tends to increase in steps and the gasoline octane number tends to increase first and then to plateau. EH-102 heating element/B-beam temperature is non-linear with respect to sulfur content and gasoline octane.
As can be seen from FIG. 9, as the pressure of the low-pressure hot nitrogen gas increases, the sulfur content increases and the gasoline octane number decreases. The pressure of the low-pressure hot nitrogen gas has a linear relationship with both: it is positively correlated with the sulfur content and negatively correlated with the gasoline octane number.
As can be seen from FIG. 10, as the D-125 liquid level increases, the sulfur content first decreases and then steadily increases, while the gasoline octane number decreases. The D-125 liquid level has a nonlinear relationship with the sulfur content and a linear relationship with the gasoline octane number.
As can be seen from FIG. 11, as the flow rate of the No. 2 catalytic gasoline inlet device increases, the sulfur content shows an overall increasing trend, while the gasoline octane number is first stable, then increases, and then becomes stable again. The flow rate of the No. 2 catalytic gasoline inlet device has a nonlinear relationship with the sulfur content and the gasoline octane number.
As can be seen from FIG. 12, as the pressure of the stabilizing tower top increases, both the sulfur content and the gasoline octane number decrease. The pressure of the stabilizing tower top has a negative, linear relationship with the sulfur content and the gasoline octane number.
As can be seen from FIG. 13, as the K-101A left exhaust temperature increases, the sulfur content first decreases and then steadily increases, while the gasoline octane number first decreases, then steadily increases, and finally decreases. The K-101A left exhaust temperature has a nonlinear relationship with the sulfur content and the gasoline octane number.
As can be seen from FIG. 14, as the sulfur content of the raw gasoline increases, the sulfur content of the product increases, while the gasoline octane number first increases, then decreases, and then becomes stable. The sulfur content of the raw gasoline is positively correlated with the sulfur content of the product and has a nonlinear relationship with the gasoline octane number.
As can be seen from FIG. 15, as the hydrogen flow rate of the hydrogen mixing point increases, the sulfur content gradually decreases and then increases, while the gasoline octane number fluctuates and decreases. The hydrogen flow rate of the hydrogen mixing point has a nonlinear relationship with the sulfur content and the gasoline octane number.
As can be seen from FIG. 16, as the loosening air flow rate at the lower part of D-107 increases, the sulfur content decreases stepwise and then levels off, while the gasoline octane number fluctuates and decreases. The loosening air flow rate at the lower part of D-107 has a nonlinear relationship with the sulfur content and the gasoline octane number.
As can be seen from FIG. 17, as the outlet temperature of the stabilizing tower bottom rises, both the sulfur content and the gasoline octane number first decline and then level off. The outlet temperature of the stabilizing tower bottom has a nonlinear relationship with the sulfur content and the gasoline octane number.
As can be seen from FIG. 18, as the regenerator temperature increases, the sulfur content first steadily decreases and then steadily increases, while the gasoline octane number rises steadily. The regenerator temperature has a nonlinear relationship with the sulfur content and the gasoline octane number.
As can be seen from FIG. 19, as the R-102 bottom slide valve increases, the sulfur content first increases, then decreases, and then steadily increases again, while the gasoline octane number rises stepwise. The R-102 bottom slide valve has a nonlinear relationship with the sulfur content and the gasoline octane number.
As can be seen from FIG. 20, as S-ZORB.PT-1501.PV rises, the sulfur content first steadily decreases and then rises, while the gasoline octane number decreases stepwise. S-ZORB.PT-1501.PV has a nonlinear relationship with the sulfur content and the gasoline octane number.
FIGS. 3-20 show the 18 corresponding change trajectories of the gasoline octane number and the sulfur content during optimization and adjustment. The processed data are fed to the random forest model: a value interval and a step size are set for each of the 18 variables, and the trained random forest model is used to predict and output the influence of the 18 variables on the octane number loss of the product. As FIGS. 3-20 show, the results exhibit the nonlinear characteristics of the random forest model and are consistent with the nonlinear influence of the variables on the result in industrial practice.
Those not described in detail in this specification are within the skill of the art. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes in the embodiments and modifications of the invention can be made, and equivalents of some features of the invention can be substituted, and any changes, equivalents, improvements and the like, which fall within the spirit and principle of the invention, are intended to be included within the scope of the invention.