LIBS quantitative analysis method for iron ore pulp based on mutual-information feature-screening PLS
1. A LIBS quantitative analysis method for iron ore pulp based on mutual-information feature-screening PLS, characterized by comprising the following steps:
off-line modeling: collecting raw laser-induced spectral data of ore pulp samples; performing mutual-information feature screening on each raw spectral feature and retaining the features with non-zero mutual information; establishing a spectral intensity-concentration PLS model from the features retained after screening, and determining the optimal number of principal components from the explained variance and mean squared error of the training set over repeated iterations; obtaining the optimal PLS model using the optimal number of principal components;
real-time detection: acquiring real-time spectral data of an on-site ore pulp sample with an on-site device, inputting the real-time spectral data into the optimized spectral intensity-concentration PLS model, and obtaining the concentration of a specified element in the ore pulp.
2. The LIBS quantitative analysis method for iron ore pulp based on mutual-information feature-screening PLS according to claim 1, wherein the raw laser-induced spectral data of the ore pulp samples are collected with a laser-induced breakdown spectrometer.
3. The LIBS quantitative analysis method for iron ore pulp based on mutual-information feature-screening PLS according to claim 1, wherein the off-line modeling comprises the steps of:
S1, data preprocessing and feature extraction: averaging the raw laser-induced breakdown spectra of the material under test and extracting spectral line features;
S2, data set division: dividing the collected ore pulp sample data into a training set and a test set; the training samples are used for modeling, and the test samples are used for evaluating the prediction accuracy of the final model;
S3, calculating the estimated mutual information between each feature column in the training set and the label;
S4, removing from the training set the a features whose mutual information is zero;
S5, retaining in the test set the same feature columns as the features remaining in the training set;
S6, performing PLS modeling on the remaining features of the training set while iterating over the number of principal components;
S7, determining the number of principal components from the explained variance and the mean squared error of the training set data, thereby optimizing the PLS model.
4. The method of claim 3, wherein, when the data set is divided, the test set data are uniformly distributed within the concentration range of the training set samples, so that the model performance is evaluated as fully as possible.
5. The LIBS quantitative analysis method for iron ore pulp based on mutual-information feature-screening PLS according to claim 1 or 3, wherein obtaining the estimated mutual information comprises:
calculating the mutual information between the characteristic spectral line X and the element concentration Y according to the following formula:
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
in the formula: p(x, y) is the probability that X = x and Y = y occur simultaneously, p(x) is the probability of X = x, p(y) is the probability of Y = y, X is a column of spectral line features, and Y is the concentration label.
6. The LIBS quantitative analysis method for iron ore pulp based on mutual-information feature-screening PLS according to claim 5, wherein the mutual information is estimated using a nearest neighbor method.
7. The method of claim 1, wherein the explained variance and the mean squared error of the training set are calculated, and when both fluctuate within a threshold, the current number of principal components is taken as the optimal value.
8. A LIBS quantitative analysis system for iron ore pulp based on mutual-information feature-screening PLS, characterized by comprising: a spectrum acquisition device, a processor and a memory; the spectrum acquisition device is used for acquiring raw laser-induced spectral data of ore pulp samples; the memory stores program modules, and the processor reads the program to execute the method steps of any one of claims 1 to 7 so as to predict the concentration of the specified element in the current ore pulp sample;
an off-line modeling program module: collecting raw laser-induced spectral data of ore pulp samples; performing mutual-information feature screening on each raw spectral feature and retaining the features with non-zero mutual information; establishing a spectral intensity-concentration PLS model from the features retained after screening, and determining the optimal number of principal components from the explained variance and mean squared error of the training set over repeated iterations; obtaining the optimal PLS model using the optimal number of principal components;
a real-time detection program module: acquiring real-time spectral data of an on-site ore pulp sample with an on-site device, inputting the data into the optimized spectral intensity-concentration PLS model, and obtaining the real-time result for the current ore pulp sample, namely the concentration of the specified element in the current ore pulp sample.
Background
Turning raw iron ore into steel involves a series of complex beneficiation processes. Among them, mineral flotation separates the target minerals from impurities according to differences in their physicochemical properties, so as to extract the target minerals from the raw ore pulp. Tailings are the final product of flotation; according to 2018 statistics, the iron tailings output in China was about 476 million tons. Analyzing the iron grade of the tailings helps to judge the performance of the whole flotation process and also plays an important role in protecting the environment and recycling tailings resources.
At present, domestic beneficiation plants determine ore pulp grade by chemical analysis, but chemical analysis takes a long time, its results lag behind the process, and it cannot provide on-line detection. In recent years, new techniques for on-line detection of ore pulp have emerged, such as X-ray fluorescence (XRF) analysis. XRF can provide real-time on-line detection, but the XRF analyzers used for on-line ore pulp analysis cannot detect elements with atomic number 20 or below, and X-rays are ionizing radiation and pose a potential hazard. Laser-induced breakdown spectroscopy (LIBS) is a newer detection technique, called a "future superstar" by the internationally renowned spectroscopist Winefordner. It offers simultaneous multi-element analysis, no need for sample preparation, little damage to the sample, rapid analysis and real-time detection; compared with other methods, LIBS is better suited to ore pulp detection.
When iron is analyzed in iron ore pulp, the pulp composition is complex and the self-absorption effect is relatively severe, so the traditional univariate quantitative analysis method cannot meet the requirements of quantitative iron analysis. Multivariate analysis methods are therefore usually adopted to correct for the self-absorption effect and the matrix effect. Partial least squares (PLS) regression is a multivariate statistical analysis algorithm that supports quantitative analysis when the independent variables are strongly correlated with one another, suppresses noise in the independent variables, and handles cases that are difficult for multiple linear regression. However, the spectral data acquired by LIBS contain a large amount of redundant information that is useless for composition analysis; modeling with full-spectrum data increases modeling complexity, so the resulting model has low accuracy and poor generalization ability.
In order to reduce modeling complexity, avoid excessive data dimensionality and reduce the interference of redundant information, the invention provides a laser-induced breakdown spectroscopy quantitative analysis method for iron ore pulp based on mutual-information feature-screening partial least squares.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to solve two problems: the data redundancy and overfitting that arise in PLS modeling when the dimensionality of the spectral data is too high, and the influence of the self-absorption effect and the matrix effect on composition analysis by laser-induced breakdown spectroscopy. Mutual-information feature screening is introduced into the processing of spectral data, and a model based on mutual-information feature-screening partial least squares is provided to improve the accuracy of quantitative analysis of iron in iron ore pulp tailings.
To this end, the invention adopts the following technical solution: a LIBS quantitative analysis method for iron ore pulp based on mutual-information feature-screening PLS, comprising the following steps:
off-line modeling: collecting raw laser-induced spectral data of ore pulp samples; performing mutual-information feature screening on each raw spectral feature and retaining the features with non-zero mutual information; establishing a spectral intensity-concentration PLS model from the features retained after screening, and determining the optimal number of principal components from the explained variance and mean squared error of the training set over repeated iterations; obtaining the optimal PLS model using the optimal number of principal components;
real-time detection: acquiring real-time spectral data of an on-site ore pulp sample with an on-site device, inputting the real-time spectral data into the optimized spectral intensity-concentration PLS model, and obtaining the concentration of a specified element in the ore pulp.
The raw laser-induced spectral data of the ore pulp samples are acquired with a laser-induced breakdown spectrometer.
The off-line modeling comprises the following steps:
S1, data preprocessing and feature extraction: averaging the raw laser-induced breakdown spectra of the material under test and extracting spectral line features;
S2, data set division: dividing the collected ore pulp sample data into a training set and a test set; the training samples are used for modeling, and the test samples are used for evaluating the prediction accuracy of the final model;
S3, calculating the estimated mutual information between each feature column in the training set and the label;
S4, removing from the training set the a features whose mutual information is zero;
S5, retaining in the test set the same feature columns as the features remaining in the training set;
S6, performing PLS modeling on the remaining features of the training set while iterating over the number of principal components;
S7, determining the number of principal components from the explained variance and the mean squared error of the training set data, thereby optimizing the PLS model.
When the data set is divided, the test set data are uniformly distributed within the concentration range of the training set samples, so that the model performance is evaluated as fully as possible.
Obtaining the estimated mutual information comprises:
calculating the mutual information between the characteristic spectral line X and the element concentration Y according to the following formula:
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
in the formula: p(x, y) is the probability that X = x and Y = y occur simultaneously, p(x) is the probability of X = x, p(y) is the probability of Y = y, X is a column of spectral line features, and Y is the concentration label.
The mutual information is estimated using the nearest neighbor method.
The explained variance and the mean squared error of the training set are calculated, and when both fluctuate within a threshold, the current number of principal components is taken as the optimal value.
A LIBS quantitative analysis system for iron ore pulp based on mutual-information feature-screening PLS comprises: a spectrum acquisition device, a processor and a memory; the spectrum acquisition device is used for acquiring raw laser-induced spectral data of ore pulp samples; the memory stores program modules, and the processor reads the program to execute the above method steps so as to predict the concentration of the specified element in the current ore pulp sample;
an off-line modeling program module: collecting raw laser-induced spectral data of ore pulp samples; performing mutual-information feature screening on each raw spectral feature and retaining the features with non-zero mutual information; establishing a spectral intensity-concentration PLS model from the features retained after screening, and determining the optimal number of principal components from the explained variance and mean squared error of the training set over repeated iterations; obtaining the optimal PLS model using the optimal number of principal components;
a real-time detection program module: acquiring real-time spectral data of an on-site ore pulp sample with an on-site device, inputting the data into the optimized spectral intensity-concentration PLS model, and obtaining the real-time result for the current ore pulp sample, namely the concentration of the specified element in the current ore pulp sample.
The invention has the following beneficial effects and advantages:
The invention performs LIBS quantitative analysis with a mutual-information partial least squares model, which reduces the modeling complexity caused by data redundancy and the influence of the self-absorption and matrix effects, improves the accuracy of ore pulp grade analysis, and can be applied in practice to on-site monitoring of ore pulp grade in a beneficiation plant.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 shows the relationship between the number of PLS principal components and the training set explained variance and mean squared error.
FIG. 3 shows the positions of the features retained by mutual-information screening.
FIG. 4 is a graph comparing predicted values with actual values for iron concentrate pulp.
Detailed Description
In order to make the above objects, features and advantages of the present invention more comprehensible, the technical solution of the present invention is further described below with reference to an example of LIBS grade analysis of tailings pulp. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; modifications may be made within the spirit and scope of the present invention as set forth in the appended claims.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Example: a laser-induced breakdown spectroscopy quantitative analysis method for iron ore pulp based on mutual-information feature-screening partial least squares. The flow chart is shown in FIG. 1, and the method specifically comprises the following steps:
(1) Data preprocessing. The raw laser-induced breakdown spectra of the material under test are averaged. A total of 40 samples were measured, with 10 spectra per sample, giving a 400 × 6116 spectral matrix. The 10 spectra acquired for each sample are averaged, so the 40 samples finally yield a 40 × 6116 spectral data matrix.
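The averaging in step (1) can be sketched as follows. This is a minimal illustration rather than the procedure actually used: it assumes that the 10 spectra of each sample occupy consecutive rows, and the function name `average_repeated_spectra` and the random stand-in data are hypothetical.

```python
import numpy as np


def average_repeated_spectra(raw, shots_per_sample):
    """Average consecutive groups of `shots_per_sample` rows into one row per sample."""
    raw = np.asarray(raw, dtype=float)
    n_rows, n_channels = raw.shape
    if n_rows % shots_per_sample != 0:
        raise ValueError("row count must be a multiple of shots_per_sample")
    n_samples = n_rows // shots_per_sample
    # Reshape to (sample, shot, channel) and average over the shot axis.
    return raw.reshape(n_samples, shots_per_sample, n_channels).mean(axis=1)


# Random stand-in for the 400 x 6116 raw matrix (not real LIBS data).
rng = np.random.default_rng(0)
raw_spectra = rng.random((400, 6116))
mean_spectra = average_repeated_spectra(raw_spectra, shots_per_sample=10)
print(mean_spectra.shape)  # (40, 6116)
```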
(2) Division into a training set and a test set. The total number of samples is N. N_train samples are selected as training samples and used for modeling, and N_test samples are used as test samples for evaluating the prediction accuracy of the final model. When the data are divided, the test samples are uniformly distributed within the concentration range of the training samples, so that the model performance is evaluated as fully as possible.
The total number of samples was 40; 10 samples were randomly selected as the test set, and the remaining 30 samples were used as the training set. When the training and test samples are divided, the concentrations of the test set are uniformly distributed within the concentration range of the training set, and the samples with the maximum and minimum concentrations are placed in the training set.
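One way to satisfy both constraints (extremes in the training set, test samples spread over the concentration range) is sketched below. Note that it replaces the random selection described above with a deterministic, evenly spaced selection over the sorted concentrations; that substitution, the function name `split_by_concentration` and the example concentrations are assumptions made for illustration.

```python
def split_by_concentration(concentrations, n_test):
    """Return (train_idx, test_idx). Test samples are spread evenly over the
    sorted concentration range; the minimum and maximum stay in training."""
    order = sorted(range(len(concentrations)), key=lambda i: concentrations[i])
    interior = order[1:-1]  # drop the lowest and highest concentration
    if not 0 < n_test <= len(interior):
        raise ValueError("n_test must be between 1 and len(concentrations) - 2")
    step = len(interior) / n_test
    # Take the middle element of each of n_test equal-width slices.
    test_idx = [interior[int((k + 0.5) * step)] for k in range(n_test)]
    test_set = set(test_idx)
    train_idx = [i for i in order if i not in test_set]
    return train_idx, test_idx


# Hypothetical concentrations for 40 samples, in scrambled order.
concentrations = [(7 * i) % 40 + 0.5 for i in range(40)]
train_idx, test_idx = split_by_concentration(concentrations, n_test=10)
print(len(train_idx), len(test_idx))  # 30 10
```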
(3) Calculating the estimated mutual information between each feature dimension in the training set and the label.
Specifically: owing to the matrix effect and the self-absorption effect, the relationship between the characteristic spectral lines and the element concentration includes nonlinear components in addition to the linear one. Mutual information can capture any kind of statistical dependence between variables. When two variables are independent of each other, their mutual information is zero; when they are dependent, their mutual information is positive. Feature screening based on mutual information is therefore used to reduce the data dimensionality and the interference of redundant information.
According to the definition of entropy, the mutual information of two discrete random variables X and Y can be written as:
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
in the formula: p(x, y) is the probability that X = x and Y = y occur simultaneously, p(x) is the probability of X = x, p(y) is the probability of Y = y, X is a column of spectral features, and Y is the concentration label.
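As a small worked illustration of the discrete definition, the sketch below estimates the probabilities from paired observations by counting and evaluates the sum directly. The function name and the toy data are hypothetical; the natural logarithm is used, so the result is in nats.

```python
from collections import Counter
from math import log


def discrete_mutual_information(xs, ys):
    """I(X;Y) in nats, with probabilities estimated by counting paired observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Independent variables give zero; fully dependent binary variables give log(2).
independent = discrete_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
dependent = discrete_mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
print(independent, dependent)
```

The two toy cases match the statement above: independence gives zero mutual information and dependence gives a positive value.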
As the definitions of information entropy and mutual information show, the probability distribution of the random variables must be known in advance. In real-world applications, however, the true probability distribution of the data is generally unknown, so the mutual information is approximated by nonparametric estimation of the probability density or the entropy. Here the mutual information is estimated with the nearest neighbor method of Alexander Kraskov et al., implemented mainly with the `mutual_info_regression` function in Python.
(4) Removing from the training set the a feature dimensions whose mutual information is 0. FIG. 3 shows the positions of the features retained by mutual-information screening; as can be seen from FIG. 3, the retained features belong to the elements silicon, iron, calcium and sodium;
(5) Retaining in the test set the same feature columns as the feature dimensions remaining in the training set;
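Steps (3) to (5) can be sketched with scikit-learn's `mutual_info_regression`, which implements a nearest-neighbor estimator and sets negative estimates to zero. The wrapper `screen_by_mutual_information`, the choice `n_neighbors=3` (the library default) and the synthetic data are assumptions for illustration, not the settings used in this example.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression


def screen_by_mutual_information(X_train, y_train, X_test, n_neighbors=3, random_state=0):
    """Keep only the columns whose estimated mutual information with y is non-zero."""
    # Nearest-neighbor MI estimate for each column against the concentration label.
    mi = mutual_info_regression(
        X_train, y_train, n_neighbors=n_neighbors, random_state=random_state
    )
    keep = mi > 0  # boolean mask computed on the training set only
    # The same mask is applied to the test set so both share identical columns.
    return X_train[:, keep], X_test[:, keep], keep, mi


# Synthetic stand-in data (hypothetical): 40 samples, 50 spectral channels.
rng = np.random.default_rng(0)
y_all = rng.uniform(10.0, 30.0, size=40)
X_all = rng.normal(size=(40, 50))
X_all[:, 0] = 2.0 * y_all + rng.normal(scale=0.1, size=40)  # channel tied to concentration
X_train, X_test, y_train = X_all[:30], X_all[30:], y_all[:30]

X_train_sel, X_test_sel, keep, mi = screen_by_mutual_information(X_train, y_train, X_test)
print(X_train.shape[1], "->", int(keep.sum()), "features kept")
```

Computing the mask on the training set alone and reusing it for the test set keeps the test data out of the feature selection.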
(6) Performing PLS modeling with the remaining features of the training set.
The conventional PLS model establishes the relation between the element grade (concentration) and the spectral data:
$$C = \sum_{i} \beta_i I_i$$
where C is the grade (concentration) value, $\beta_i$ is the regression coefficient, and $I_i$ is the intensity of the i-th characteristic spectral line.
(7) Determining the number of principal components of the PLS model by jointly considering the training set explained variance and mean squared error. The number of principal components is the dimensionality of the optimized PLS model. FIG. 2 shows the relationship between the number of PLS principal components and the training set explained variance and mean squared error. Specifically, the training set explained variance and mean squared error are calculated, and when both become stable, the current number of principal components is taken as the optimal value (in FIG. 2, the explained variance and the mean squared error converge when the number of principal components is 8), thereby optimizing the PLS model.
Result verification: FIG. 4 compares the predicted values with the true values for the training set and the test set obtained with the method. As can be seen from FIG. 4, both the fit and the test set accuracy are good.
TABLE 1
Table 1 compares the results of the method of the invention with those of full-spectrum PLS and of PLS with linear variable screening; the method of the invention achieves the lowest prediction error (mean absolute error, MAE) and the highest coefficient of determination $R^2$. By reducing the dimensionality of the high-dimensional spectral data and the influence of the self-absorption and matrix effects, the method clearly improves the quantitative analysis.
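For reference, the two metrics named above can be computed as in the following sketch. The numbers are made up to show the arithmetic and are not results from Table 1; the function names are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted concentrations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 12.0, 13.0, 16.0]
mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 1 + 0) / 4 = 0.5
r2 = r_squared(y_true, y_pred)             # 1 - 2 / 20 = 0.9
print(mae, r2)
```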
This example uses tailings pulp and is only a preferred embodiment; in a specific implementation, the analysis can be adapted to the application object, and the selected spectral lines and the number of principal components can be adjusted accordingly.
The above-described embodiments are intended to illustrate the present invention, but not to limit the present invention, and any modifications, equivalents, improvements, etc. made within the spirit of the present invention and the scope of the claims fall within the scope of the present invention.