High-precision sea surface temperature inversion method based on machine learning
1. A high-precision sea surface temperature inversion method based on machine learning is characterized by comprising the following steps:
step (1), compiling time-series measured sea surface temperature data;
step (2), preparing remote sensing image data covering a time period close to that of the measured sea surface temperature data;
step (3), establishing a criterion for judging whether the remote sensing image contains cloud;
step (4), correlating the data obtained in step (1) and step (2) in time, matching the remote sensing image data closest to the spatial position of each measured sea surface temperature, judging whether the data are cloudy according to step (3), and taking the matched band data, auxiliary data and cloud flag of the remote sensing image as the original features X';
step (5), matching the data of step (4) with the measured sea surface temperature data (Y) to form a machine learning training sample set;
step (6), establishing a feature expansion criterion based on current sea surface temperature inversion research, and expanding the original features X' to form new features X;
step (7), standardizing all features X;
step (8), constructing machine learning submodels: training on the data with machine learning regression models, namely a random forest regression model, a support vector machine regression model, an XGBoost regression model and an ANN model, as submodels, to obtain Level0 output data;
step (9), integrating the Level0 output with logistic regression as the ensemble model, taking the Level0 output data as input and the least squares algorithm as the fitting algorithm, to obtain Level1 output data;
step (10), constructing evaluation metrics for the inversion results of the machine learning model;
step (11), evaluating the model with the metrics of step (10), using pairs of output data from step (9) and the corresponding measured data.
2. The machine learning-based high-precision sea surface temperature inversion method according to claim 1, characterized in that in step (4), the band data, auxiliary data and cloud flag of the remote sensing image obtained by matching are taken as the original features X' as follows:
step 41, matching the remote sensing data to the measured data by time, the time information being contained in the file names of the remote sensing data;
step 42, extracting the remote sensing data centered on the spatial position of each measured sea surface temperature: a 3×3 window is established around the position of the measured data, and the mean value of the SST-related remote sensing data within the window is extracted;
step 43, checking whether the remote sensing data extracted in step 42 contain cloud according to the criterion of step (3); if cloud is present anywhere in the window, the label "YL = 1" is added to the extracted data, and cloud-free data receive the label "YL = 0"; the satellite zenith angle, satellite azimuth angle, solar zenith angle and solar azimuth angle are also matched, and the band data, auxiliary data and cloud labels of the remote sensing data are taken as the original features X'.
3. The machine learning-based high-precision sea surface temperature inversion method according to claim 1, wherein in the step (5), a machine learning training data set is formed by matching the data set obtained in the step (4) with the measured sea surface temperature data (Y), and is stored as a CSV format file.
4. The machine learning-based high-precision sea surface temperature inversion method according to claim 1, wherein in step (6), the new features X are formed as follows:
according to current sea surface temperature inversion research, the differences and ratios of the bands in the original features are all added as new features; the band-ratio and band-difference expansion features are given by:
$$\mathrm{feature}_1 = \frac{\mathrm{band}_i}{\mathrm{band}_j}$$
$$\mathrm{feature}_2 = \mathrm{band}_i - \mathrm{band}_j$$
where $\mathrm{band}_i$ denotes the i-th band and $\mathrm{band}_j$ denotes the j-th band.
5. The machine learning-based high-precision sea surface temperature inversion method according to claim 1, wherein in step (7), all features X are standardized as follows:
each feature is standardized by subtracting its mean and then dividing by its standard deviation, so that the standardized data have the mean (0) and standard deviation (1) of a standard normal distribution; the transformation function is:
$$x^{*} = \frac{x - \mu}{\sigma}$$
where μ denotes the mean and σ denotes the standard deviation.
6. The machine learning-based high-precision sea surface temperature inversion method according to claim 1, wherein in step (8), the machine learning submodels are constructed with machine learning regression models, specifically as follows:
step 81, the random forest regression uses CART regression trees, which search each feature and its split point by minimizing the mean squared error:
$$\min_{A,\,s}\left[\min_{c_1}\sum_{x_i \in D_1(A,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in D_2(A,s)} (y_i - c_2)^2\right]$$
a random forest regression model (RFR) is then built from these trees, where A denotes a feature, s denotes a cut point, $D_1(A,s)$ and $D_2(A,s)$ are the two sample subsets obtained by splitting feature A at s, and $c_1$ and $c_2$ are the mean values of $D_1$ and $D_2$;
step 82, selecting 80% of the sample data set for training and iterating over the key random forest parameters, the number of trees (n_estimators) and the maximum number of features (max_features), with the number of iterations set to 1000, to obtain the optimal model parameters and establish the optimal random forest model;
step 83, inverting all sample data with the optimal random forest model of step 82 to obtain the sea surface temperature sequence $P_1$ predicted by the random forest;
step 84, constructing a support vector machine regression model with scikit-learn based on the Gaussian kernel function:
$$K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^2\right)$$
where x and x' are feature vectors and γ > 0 is the kernel width parameter;
step 85, selecting 80% of the sample data set to train the support vector machine model, and inverting all sample data with the trained model to obtain the sea surface temperature sequence $P_2$ predicted by support vector machine regression;
step 86, constructing a multilayer neural network according to the number of features, the input parameters being the standardized features and the output parameter being the sea surface temperature; the network is expressed as:
$$y_m = \sum_{l=1}^{N_3} w_{4,ml}\, f\left(\sum_{k=1}^{N_2} w_{3,lk}\, f\left(\sum_{j=1}^{N_1} w_{2,kj}\, f\left(\sum_{i=1}^{N_0} w_{1,ji}\, x_i + b_{1,j}\right) + b_{2,k}\right) + b_{3,l}\right) + b_{4,m}$$
where $x_i$ (i = 1, …, $N_0$) denotes an element of the input layer; the inputs comprise the satellite zenith angle, satellite azimuth angle, solar zenith angle, solar azimuth angle, and the original and extended band features, 36 variables in total, so $N_0 = 36$ is the number of input elements; $N_1$, $N_2$ and $N_3$ are the numbers of neurons in the three hidden layers; $w_{1,ji}$, $w_{2,kj}$, $w_{3,lk}$ and $w_{4,ml}$ are the weights of the three hidden layers and the output layer; $b_{1,j}$, $b_{2,k}$, $b_{3,l}$ and $b_{4,m}$ are the biases of the three hidden layers and the output layer, the weights and biases being determined by the training algorithm; f is the hyperbolic tangent sigmoid function; $y_m$ denotes the m-th element of the output layer and refers to the remote sensing reflectance of the m-th band, m taking a maximum value of 10, corresponding to the 10 brightness temperature bands of Himawari satellite data;
step 87, constructing the multilayer neural network according to step 86, selecting 80% of the sample data for iterative training, and inverting all sample data with the optimal model to obtain the sea surface temperature sequence $P_3$ predicted by the multilayer neural network.
7. The machine learning-based high-precision sea surface temperature inversion method of claim 1, characterized in that in step (9), the Level1 output data are obtained as follows:
step 91, taking logistic regression as the ensemble model, expressed as:
$$\hat{Y} = \sum_{i=1}^{3} w_i P_i$$
where $P_i$ denotes the output values ($P_1$, $P_2$, $P_3$) of step (8) and $w_i$ denotes the corresponding weights;
step 92, using the least squares method with the criterion:
$$\mathrm{Error} = \sum_{j=1}^{n}\left(Y_j - \sum_{i=1}^{3} w_i P_{i,j}\right)^2$$
where $Y_j$ is the measured sea surface temperature of sample j and $P_{i,j}$ is the prediction of submodel i for that sample; the weights $w_i$ that minimize Error are taken as the solution of the above expression, thereby constructing the logistic regression ensemble model;
step 93, inverting the simulated values $\hat{Y}_j$ of all samples with the ensemble model constructed in step 92, where j ranges over the n samples of the sample data set.
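8. The machine learning-based high-precision sea surface temperature inversion method according to claim 1, characterized in that in step (10), evaluation metrics for the inversion results of the machine learning model are constructed: metrics for evaluating the overall performance of the model are built on the explained variance, mean absolute error, mean squared error and coefficient of determination used in machine learning, and scatter plots of the model-estimated sea surface temperature against the measured temperature serve as the evaluation for different temperature periods.
9. The machine learning-based high-precision sea surface temperature inversion method according to claim 8, wherein in step (10), the explained variance, mean absolute error, mean squared error and coefficient of determination are computed as:
$$\mathrm{EVS} = 1 - \frac{\operatorname{Var}(y - \hat{y})}{\operatorname{Var}(y)}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n} \lvert y_j - \hat{y}_j \rvert$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{j=1}^{n} (y_j - \hat{y}_j)^2$$
$$R^2 = 1 - \frac{\sum_{j=1}^{n} (y_j - \hat{y}_j)^2}{\sum_{j=1}^{n} (y_j - \bar{y})^2}$$
where $y_j$ is the measured and $\hat{y}_j$ the estimated sea surface temperature of sample j, $\bar{y}$ is the mean of the measured values, and n is the number of samples.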
Background
Sea Surface Temperature (SST) is a basic physical variable for characterizing the complex relationship between the ocean and the atmosphere; it strongly influences the environment and the sustainable management of fisheries, and provides important information for aquaculture activities and planning. In the existing scientific literature on SST, estimates from simple linear regression algorithms are easily affected by atmospheric and sea-area variations, which reduces inversion accuracy; some nonlinear algorithms can remove atmospheric and sea-area effects to a certain extent, but their inversion accuracy remains relatively low because of external uncertain factors. In addition, the sea temperature cannot be retrieved from cloud-containing remote sensing data because of cloud interference.
The most widely used inversion algorithm in the prior art is the nonlinear SST inversion algorithm, which can estimate SST with high accuracy but is designed for global sea temperature. It has the following characteristics: it cannot invert SST from cloud-containing data, so cloud-obscured data must be removed and SST is estimated only from clear-sky data; and it requires complex data preprocessing, the expression for estimating sea temperature being obtained by fitting.
For example, Chinese patent application 201510812028.X discloses an SST inversion method and system based on Landsat8 data. The method comprises the following steps: reading Landsat8 optical and thermal infrared remote sensing images and the corresponding atmospheric profile data; based on water body and seawater emissivity data and the spectral response function of the Landsat8 thermal infrared sensor TIRS, with the atmospheric profiles as the driver, simulating the top-of-atmosphere channel brightness temperatures of the two TIRS channels under various combinations of atmospheric conditions, sea surface temperatures, emissivities and so on; and dividing the obtained atmospheric column water vapor content into several intervals, constructing an SST inversion algorithm for each interval with a split-window algorithm, and then calculating and outputting the SST.
This SST inversion algorithm is built on a split-window algorithm; although it can remove atmospheric and sea-area effects to a certain extent, its inversion accuracy is relatively low because of external uncertain factors, and it cannot retrieve sea temperature from cloud-containing remote sensing data because it is susceptible to cloud interference.
Disclosure of Invention
Accordingly, to solve the above problems, the primary object of the present invention is to provide a high-precision SST inversion method based on machine learning that delivers consistent high-precision SST inversion under both cloudy and cloud-free conditions. It addresses the shortcomings of existing SST inversion algorithms: their model parameters are complicated to construct, their inversion accuracy is easily affected by spectral noise, and, in particular, they cannot estimate SST from cloud-containing remote sensing data.
Another object of the invention is to provide a high-precision SST inversion method based on machine learning whose inversion model effectively solves the above problems. The machine learning model built by stacked generalization reduces the effect of cloud on the spectral noise of different bands, and by matching the expanded brightness temperature band features with measured temperatures it inverts SST from cloud-containing remote sensing data, overcoming the inability of traditional algorithms to handle such data. In addition, the model requires no data preprocessing: the matched data are used directly for SST inversion, which removes complicated inversion steps and improves inversion accuracy.
A further object of the invention is to provide a high-precision SST inversion method based on machine learning that uses Himawari-8 data and performs high-precision SST inversion with machine learning algorithms and stacked generalization. This facilitates building a data set suitable for SST inversion by machine learning models; the feature expansion method removes part of the spectral noise and improves the accuracy of each individual learner, and stacked generalization removes SST inversion errors caused by cloud noise.
In order to achieve the purpose, the invention provides the following technical scheme:
a high-precision sea surface temperature inversion method based on machine learning comprises the following steps:
step (1), compiling time-series measured sea surface temperature data;
step (2), preparing remote sensing image data covering a time period close to that of the measured sea surface temperature data;
step (3), establishing a criterion for judging whether the remote sensing image contains cloud;
step (4), correlating the data obtained in step (1) and step (2) in time, matching the remote sensing image data closest to the spatial position of each measured sea surface temperature, judging whether the data are cloudy according to step (3), and taking the matched band data, auxiliary data and cloud flag of the remote sensing image as the original features X';
step (5), matching the data of step (4) with the measured sea surface temperature data (Y) to form a machine learning training sample set;
step (6), establishing a feature expansion criterion based on current sea surface temperature inversion research, and expanding the original features X' to form new features X;
step (7), standardizing all features X;
step (8), constructing machine learning submodels: training on the data with machine learning regression models, namely a random forest regression model, a support vector machine regression model, an XGBoost regression model and an ANN model, as submodels, to obtain Level0 output data;
step (9), integrating the Level0 output with logistic regression as the ensemble model, taking the Level0 output data as input and the least squares algorithm as the fitting algorithm, to obtain Level1 output data;
step (10), constructing evaluation metrics for the inversion results of the machine learning model;
step (11), evaluating the model with the metrics of step (10), using pairs of output data from step (9) and the corresponding measured data.
Compared with the prior art, the invention has the beneficial effects that:
according to the invention, SST can be estimated with high precision by extracting sample data and training and testing the model. By exploiting the strengths of machine learning, the degree to which external environmental factors influence the retrieval is resolved directly from the data; the machine learning model built by stacked generalization captures how strongly cloud, as detected in different bands, attenuates the remote sensing data; and by matching the attenuated brightness temperatures with the measured temperatures, SST is inverted from cloud-containing remote sensing data, which solves the problem that traditional algorithms cannot make estimates from such data. In addition, the model requires no data preprocessing: the matched data are used directly for SST inversion, which removes complicated inversion steps and improves inversion accuracy.
In addition, the invention uses Himawari-8 data and performs high-precision SST inversion with machine learning algorithms and stacked generalization. Himawari-8 data have a very short 10-minute temporal resolution and can be matched effectively with measured SST data, which facilitates building a data set suitable for SST inversion by machine learning models; the feature expansion method removes part of the spectral noise and improves the accuracy of each individual learner, and stacked generalization removes the SST inversion error caused by cloud noise.
Moreover, the invention avoids inverting SST by combining separate algorithms for cloudy and cloud-free conditions, and can serve as a consistent SST inversion algorithm.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
FIG. 2 is a scatter plot of the estimates for cloud-free data according to the present invention.
FIG. 3 is a scatter plot of the estimates for cloud-containing data according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is the implementation flow chart of the present invention: the model is trained on the constructed sample data, the trained machine learning submodels estimate the sea surface temperature, and the performance of the inversion model is finally evaluated with the constructed evaluation metrics.
The various implementation steps are described separately below to provide reference.
Step (1), compiling time-series measured sea surface temperature data;
acquiring measured SST data with time and location information from a meteorological website, arranging the records in time order, indexing them by this information, and storing them in TXT format;
step (2), preparing remote sensing image data covering a time period close to that of the measured sea surface temperature data;
using Himawari L1 remote sensing data in NC (NetCDF) format, the remote sensing data acquired within 30 min before or after each measurement obtained in step (1) are matched, and the data nearest in time are selected.
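The ±30 min nearest-time matching above, together with the file-name time stamps of step 41, can be sketched as follows. This is a minimal sketch under stated assumptions: the file-name pattern `NC_H08_YYYYMMDD_hhmm_...` and the helper names `time_from_filename` and `nearest_file` are hypothetical and should be adapted to the archive actually used.

```python
import re
from datetime import datetime, timedelta

# Assumed (hypothetical) file-name pattern "NC_H08_YYYYMMDD_hhmm_..."; adapt to the real archive.
_TIME_RE = re.compile(r"_(\d{8})_(\d{4})_")


def time_from_filename(name: str) -> datetime:
    """Return the observation time encoded in a Himawari file name (step 41)."""
    match = _TIME_RE.search(name)
    if match is None:
        raise ValueError(f"no YYYYMMDD_hhmm time stamp in {name!r}")
    return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M")


def nearest_file(measured_at: datetime, filenames, window=timedelta(minutes=30)):
    """Return the file closest in time to `measured_at`, or None when no file
    lies within +/- `window` of the measurement (step (2))."""
    best, best_gap = None, None
    for name in filenames:
        gap = abs(time_from_filename(name) - measured_at)
        if gap <= window and (best_gap is None or gap < best_gap):
            best, best_gap = name, gap
    return best
```

For example, with files stamped 03:00 and 03:10, a measurement at 03:08 is matched to the 03:10 file, while a measurement at 04:00 has no file within 30 min and yields None, so that record is skipped.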
Step (3), establishing a criterion for judging whether the remote sensing image contains cloud;
judging whether the remote sensing data contain cloud using the band centered at 1.6 μm and a threshold of 0.0215: data whose 1.6 μm value is larger than the threshold are judged cloud-covered, and data whose value is smaller are judged clear-sky;
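The cloud criterion of step (3) reduces to a per-pixel threshold test. This sketch assumes the 1.6 μm band has already been read into an array in the same units as the 0.0215 threshold; `cloud_mask` is a hypothetical helper name, and treating a value exactly equal to the threshold as clear sky is an assumption, since the text only covers the strictly larger and smaller cases.

```python
import numpy as np

CLOUD_THRESHOLD_1P6UM = 0.0215  # threshold stated in step (3)


def cloud_mask(band_1p6um, threshold=CLOUD_THRESHOLD_1P6UM):
    """Boolean mask: True (cloud) where the 1.6 um value exceeds the threshold,
    False (clear sky) otherwise; values equal to the threshold count as clear."""
    return np.asarray(band_1p6um, dtype=float) > threshold
```

For instance, `cloud_mask([0.010, 0.0215, 0.030])` gives `[False, False, True]`.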
step (4), correlating the data obtained in step (1) and step (2) in time, matching the remote sensing image data closest to the spatial position of each measured sea surface temperature, judging whether the data are cloudy according to step (3), and taking the matched band data, auxiliary data and cloud flag of the remote sensing image as the original features X', specifically as follows:
step 41, matching the remote sensing data to the measured data by time, the time information being contained in the file names of the remote sensing data;
step 42, extracting the remote sensing data centered on the spatial position of each measured sea surface temperature: a 3×3 window is established around the position of the measured data, and the mean value of the SST-related remote sensing data within the window is extracted;
step 43, checking whether the remote sensing data extracted in step 42 contain cloud according to the criterion of step (3); if cloud is present anywhere in the window, the label "YL = 1" is added to the extracted data, and cloud-free data receive the label "YL = 0"; the satellite zenith angle, satellite azimuth angle, solar zenith angle and solar azimuth angle are also matched, and the band data, auxiliary data and cloud labels of the remote sensing data are taken as the original features X'.
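Steps 42 and 43 can be sketched as below, assuming the measured position has already been converted to a (row, col) pixel index and that every band or angle grid is a 2-D array on the same grid. `extract_window` is a hypothetical helper and the band name `B13` in the example is illustrative only; the function averages each grid over the 3×3 window and sets YL = 1 if any pixel in the window exceeds the 1.6 μm threshold of step (3).

```python
import numpy as np


def extract_window(grids, band_1p6um, row, col, threshold=0.0215):
    """Step 42: mean of every grid over the 3x3 window centred on (row, col).
    Step 43: cloud label YL = 1 if any window pixel exceeds the 1.6 um threshold."""
    r0, r1, c0, c1 = row - 1, row + 2, col - 1, col + 2
    n_rows, n_cols = np.shape(band_1p6um)
    if r0 < 0 or c0 < 0 or r1 > n_rows or c1 > n_cols:
        raise IndexError("the 3x3 window extends beyond the image")
    # Band values and geometry angles are averaged the same way.
    features = {name: float(np.mean(np.asarray(grid)[r0:r1, c0:c1]))
                for name, grid in grids.items()}
    window = np.asarray(band_1p6um, dtype=float)[r0:r1, c0:c1]
    features["YL"] = int(np.any(window > threshold))
    return features
```

Flagging the whole window as cloudy when any single pixel is cloudy follows step 43 and errs on the side of labelling mixed windows as cloud.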
Step (5), matching the data of step (4) with the measured sea surface temperature data (Y) to form a machine learning training data set;
the data set obtained in step (4) is matched with the measured sea surface temperature data (Y) to form the machine learning training data set, which is stored as a CSV file.
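A minimal sketch of step (5) using only Python's standard `csv` module: each matched sample (a dict of X' features, such as the output of a window-extraction step) becomes one row, with the measured SST appended as the target column Y. The column name `SST` and both helper names are assumptions for illustration.

```python
import csv
import io


def training_csv_text(samples, sst_measured):
    """Serialize matched samples as CSV text: feature columns X', then measured SST (Y)."""
    if len(samples) != len(sst_measured):
        raise ValueError("each sample needs exactly one measured SST value")
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[*samples[0].keys(), "SST"])
    writer.writeheader()
    for feats, y in zip(samples, sst_measured):
        writer.writerow({**feats, "SST": y})
    return buf.getvalue()


def write_training_csv(path, samples, sst_measured):
    """Write the training set to a CSV file at `path`."""
    with open(path, "w", newline="") as fh:
        fh.write(training_csv_text(samples, sst_measured))
```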
Step (6), establishing a feature expansion criterion based on current sea surface temperature inversion research, and expanding the original features X' to form new features X;
according to current SST inversion research, the differences and ratios of the bands in the original features are all added as new features; the band-ratio and band-difference expansion features are given by expressions (1) and (2):
$$\mathrm{feature}_1 = \frac{\mathrm{band}_i}{\mathrm{band}_j} \tag{1}$$
$$\mathrm{feature}_2 = \mathrm{band}_i - \mathrm{band}_j \tag{2}$$
where $\mathrm{band}_i$ denotes the i-th band and $\mathrm{band}_j$ denotes the j-th band.
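The expansion of step (6) can be sketched as below. Which band pairs are expanded is an assumption: by default every unordered pair (i before j) is used, but with 10 bands that already yields 10 + 2 × 45 = 100 band features, far more than the 36 input variables mentioned in step 86, so the `pairs` argument lets a narrower selection be passed in. `expand_band_features` and the band names are hypothetical; brightness temperatures are positive, so the ratio is not guarded against division by zero.

```python
import itertools

import numpy as np


def expand_band_features(bands, pairs=None):
    """Return the original bands plus, for each band pair (i, j), the ratio
    band_i / band_j (expression (1)) and the difference band_i - band_j (expression (2))."""
    out = {name: np.asarray(values, dtype=float) for name, values in bands.items()}
    if pairs is None:
        pairs = itertools.combinations(list(bands), 2)  # every unordered pair, i before j
    for i, j in pairs:
        out[f"{i}/{j}"] = out[i] / out[j]
        out[f"{i}-{j}"] = out[i] - out[j]
    return out
```

With three bands and the default pairs, the result holds 3 original and 6 expanded features.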
Step (7), standardizing all features X;
each feature is standardized by subtracting its mean and then dividing by its standard deviation, so that the standardized data have the mean (0) and standard deviation (1) of a standard normal distribution; the transformation function is shown in expression (3), where μ denotes the mean and σ denotes the standard deviation:
$$x^{*} = \frac{x - \mu}{\sigma} \tag{3}$$
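Expression (3) in NumPy, as a minimal sketch. Fitting μ and σ on the training portion and reusing them for every other sample is an assumption (the text simply standardizes all features). A zero standard deviation, for example a YL column that is constant in the training data, is replaced by 1 so the result stays finite, which mirrors what scikit-learn's `StandardScaler` does; the helper names are hypothetical.

```python
import numpy as np


def fit_standardizer(X_train):
    """Per-feature mean (mu) and population standard deviation (sigma)."""
    X_train = np.asarray(X_train, dtype=float)
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma = np.where(sigma == 0.0, 1.0, sigma)  # keep constant features finite
    return mu, sigma


def standardize(X, mu, sigma):
    """Apply x* = (x - mu) / sigma column by column."""
    return (np.asarray(X, dtype=float) - mu) / sigma
```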
step (8), constructing machine learning submodels: training on the data with machine learning regression models, namely a random forest regression model, a support vector machine regression model, an XGBoost regression model and an ANN model, as submodels, to obtain Level0 output data;
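step 81, the random forest regression uses CART regression trees, which search each feature and its split point by minimizing the mean squared error (MSE), as in expression (4):
$$\min_{A,\,s}\left[\min_{c_1}\sum_{x_i \in D_1(A,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in D_2(A,s)} (y_i - c_2)^2\right] \tag{4}$$
a random forest regression model (RFR) is then built from these trees, where A denotes a feature, s denotes a cut point, $D_1(A,s)$ and $D_2(A,s)$ are the two sample subsets obtained by splitting feature A at s, and $c_1$ and $c_2$ are the mean values of $D_1$ and $D_2$;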
step 82, selecting 80% of the sample data set for training and iterating over the key random forest parameters, the number of trees (n_estimators) and the maximum number of features (max_features), with the number of iterations set to 1000, to obtain the optimal model parameters and establish the optimal random forest model;
step 83, inverting all sample data with the optimal random forest model of step 82 to obtain the sea surface temperature sequence $P_1$ predicted by the random forest;
Step 84, constructing a support vector machine regression model with scikit-learn based on the Gaussian kernel function of expression (5):
$$K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^2\right) \tag{5}$$
where x and x' are feature vectors and γ > 0 is the kernel width parameter;
step 85, selecting 80% of the sample data set to train the support vector machine model, and inverting all sample data with the trained model to obtain the sea surface temperature sequence $P_2$ predicted by support vector machine regression;
Step 86, constructing a multilayer neural network according to the number of features, the input parameters being the standardized features and the output parameter being the sea surface temperature; the network can be written as expression (6):
$$y_m = \sum_{l=1}^{N_3} w_{4,ml}\, f\left(\sum_{k=1}^{N_2} w_{3,lk}\, f\left(\sum_{j=1}^{N_1} w_{2,kj}\, f\left(\sum_{i=1}^{N_0} w_{1,ji}\, x_i + b_{1,j}\right) + b_{2,k}\right) + b_{3,l}\right) + b_{4,m} \tag{6}$$
where $x_i$ (i = 1, …, $N_0$) denotes an element of the input layer; the inputs comprise the satellite zenith angle, satellite azimuth angle, solar zenith angle, solar azimuth angle, and the original and extended band features, 36 variables in total, so $N_0 = 36$ is the number of input elements; $N_1$, $N_2$ and $N_3$ are the numbers of neurons in the three hidden layers; $w_{1,ji}$, $w_{2,kj}$, $w_{3,lk}$ and $w_{4,ml}$ are the weights of the three hidden layers and the output layer; $b_{1,j}$, $b_{2,k}$, $b_{3,l}$ and $b_{4,m}$ are the biases of the three hidden layers and the output layer. The weights and biases are determined by the training algorithm. In (6), f is the hyperbolic tangent sigmoid function, and $y_m$ denotes the m-th element of the output layer and refers to the remote sensing reflectance of the m-th band, m taking a maximum value of 10, corresponding to the 10 brightness temperature bands of Himawari satellite data.
Step 87, constructing the multilayer neural network according to step 86, selecting 80% of the sample data for iterative training, and inverting all sample data with the optimal model to obtain the sea surface temperature sequence $P_3$ predicted by the multilayer neural network.
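Steps 81 to 87 can be sketched with scikit-learn as below. The search space, the SVR hyperparameters (`C`, `epsilon`) and the hidden-layer sizes are assumptions, not values from the text; the 1000 iterations of step 82 would correspond to `n_iter=1000` if they refer to the random search, while a small default keeps the sketch fast. `MLPRegressor` uses tanh hidden layers and a linear output layer, and wrapping it in `TransformedTargetRegressor` to standardize the SST target is an added assumption that helps training when temperatures sit far from zero. The XGBoost submodel named in step (8) is not detailed in steps 81 to 87 and is omitted. Following the text, each model is trained on an 80% split but predicts all samples.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def level0_predictions(X, y, n_iter=10, hidden=(64, 32, 16), seed=0):
    """Train the submodels of steps 81-87 on an 80 % split and return them with
    the (n_samples, 3) matrix [P1, P2, P3] predicted for all samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_tr, _, y_tr, _ = train_test_split(X, y, train_size=0.8, random_state=seed)

    # Steps 81-83: random forest; n_estimators and max_features tuned by random search.
    rf_search = RandomizedSearchCV(
        RandomForestRegressor(random_state=seed),
        param_distributions={"n_estimators": [100, 300, 500, 1000],
                             "max_features": [0.3, 0.5, 0.8, 1.0]},
        n_iter=n_iter, scoring="neg_mean_squared_error", cv=3, random_state=seed,
    )
    rf = rf_search.fit(X_tr, y_tr).best_estimator_

    # Steps 84-85: support vector regression with the Gaussian (RBF) kernel.
    svr = SVR(kernel="rbf", C=10.0, gamma="scale", epsilon=0.1).fit(X_tr, y_tr)

    # Steps 86-87: three tanh hidden layers; MLPRegressor's output layer is linear.
    mlp = TransformedTargetRegressor(
        regressor=MLPRegressor(hidden_layer_sizes=hidden, activation="tanh",
                               max_iter=2000, random_state=seed),
        transformer=StandardScaler(),
    ).fit(X_tr, y_tr)

    models = (rf, svr, mlp)
    P = np.column_stack([model.predict(X) for model in models])
    return models, P
```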
Step (9), integrating the Level0 output with logistic regression as the ensemble model, taking the Level0 output data as input and the least squares algorithm as the fitting algorithm, to obtain Level1 output data;
step 91, taking logistic regression as the ensemble model, expressed as expression (7):
$$\hat{Y} = \sum_{i=1}^{3} w_i P_i \tag{7}$$
where $P_i$ denotes the output values ($P_1$, $P_2$, $P_3$) of step (8) and $w_i$ denotes the corresponding weights.
Step 92, using the least squares criterion of expression (8) to solve expression (7):
$$\mathrm{Error} = \sum_{j=1}^{n}\left(Y_j - \sum_{i=1}^{3} w_i P_{i,j}\right)^2 \tag{8}$$
where $Y_j$ is the measured sea surface temperature of sample j and $P_{i,j}$ is the prediction of submodel i for that sample; the weights $w_i$ that minimize Error define the logistic regression ensemble model;
step 93, inverting the simulated values $\hat{Y}_j$ of all samples with the ensemble model constructed in step 92, where j ranges over the n samples of the sample data set;
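Steps 91 to 93 amount to an ordinary least-squares fit of the weights in expression (7) to the Level0 matrix [P1, P2, P3]; in that form, despite the name "logistic regression" in the text, the weighted sum is a linear meta-model. The sketch below uses `numpy.linalg.lstsq` and, following expression (7), fits no intercept; the helper names are hypothetical.

```python
import numpy as np


def fit_stack_weights(P, y):
    """Step 92: weights w minimising sum_j (y_j - sum_i w_i P_ij)^2 (no intercept)."""
    w, *_ = np.linalg.lstsq(np.asarray(P, dtype=float), np.asarray(y, dtype=float), rcond=None)
    return w


def stack_predict(P, w):
    """Step 93: Level1 output as the weighted sum of the Level0 predictions."""
    return np.asarray(P, dtype=float) @ w
```

Because the Level0 predictions described here include samples the submodels were trained on, out-of-fold predictions (for example from scikit-learn's `cross_val_predict`) are a common safeguard against over-weighting the most overfit submodel; scikit-learn's `StackingRegressor` packages the same idea.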
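step (10), constructing evaluation metrics for the inversion results of the machine learning model;
evaluation metrics for the inversion results of the machine learning model are constructed: metrics for evaluating the overall performance of the model are built on the explained variance (9), mean absolute error (10), mean squared error (11) and coefficient of determination (12) used in machine learning, and scatter plots of the model-estimated sea surface temperature against the measured temperature serve as the evaluation for different temperature periods. The explained variance, mean absolute error, mean squared error and coefficient of determination are computed as:
$$\mathrm{EVS} = 1 - \frac{\operatorname{Var}(y - \hat{y})}{\operatorname{Var}(y)} \tag{9}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n} \lvert y_j - \hat{y}_j \rvert \tag{10}$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{j=1}^{n} (y_j - \hat{y}_j)^2 \tag{11}$$
$$R^2 = 1 - \frac{\sum_{j=1}^{n} (y_j - \hat{y}_j)^2}{\sum_{j=1}^{n} (y_j - \bar{y})^2} \tag{12}$$
where $y_j$ is the measured and $\hat{y}_j$ the estimated sea surface temperature of sample j, $\bar{y}$ is the mean of the measured values, and n is the number of samples.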
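The four metrics of step (10) can be computed directly with NumPy; `evaluation_metrics` is a hypothetical helper name. The results should agree with scikit-learn's `explained_variance_score`, `mean_absolute_error`, `mean_squared_error` and `r2_score`.

```python
import numpy as np


def evaluation_metrics(y_true, y_pred):
    """Explained variance, MAE, MSE and R^2 of estimated vs measured SST."""
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    resid = y - y_hat
    return {
        "EVS": float(1.0 - np.var(resid) / np.var(y)),
        "MAE": float(np.mean(np.abs(resid))),
        "MSE": float(np.mean(resid ** 2)),
        "R2": float(1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)),
    }
```

For y = [1, 2, 3, 4] and ŷ = [1, 2, 3, 5], this gives MAE = MSE = 0.25, R² = 0.8 and EVS = 0.85; EVS exceeds R² here because it ignores the mean bias of the residuals.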
Step (11), evaluating the model with the metrics of step (10) by comparing the output data of step (9) with the corresponding measured data.
The sea surface temperature estimated by the model in step (9) and the corresponding measured temperature are substituted into the metrics of step (10) to evaluate the estimation capability of the model.
The evaluation results are shown in FIG. 2 and FIG. 3; the model estimates accurately reflect the measured sea surface temperature.
In the invention, the machine learning model built by stacked generalization captures how strongly cloud, as detected in different bands, attenuates the remote sensing data; by matching the attenuated brightness temperatures with the measured temperatures, SST is inverted from cloud-containing remote sensing data, which solves the problem that traditional algorithms cannot make estimates from such data. In addition, the model requires no data preprocessing: the matched data are used directly for SST inversion, which removes complicated inversion steps and improves inversion accuracy.
In summary, the advantages of the present invention are as follows:
1. The constructed high-precision sea surface temperature inversion method can provide an alternative approach for global sea surface temperature inversion.
2. SST inversion from remote sensing data containing cloud noise is achieved with machine learning, overcoming the inability of traditional algorithms to make such estimates.
3. Stacked generalization is used to build a consistent SST inversion, avoiding the combination of separate algorithms for cloudy and cloud-free conditions, so the method can serve as a consistent SST inversion algorithm.
The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent substitution or change made by a person skilled in the art within the technical scope disclosed by the present invention, according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.