Electroencephalogram data reconstruction method based on a reinforcement learning auto-encoder
1. An electroencephalogram data reconstruction method based on a reinforcement learning auto-encoder, characterized by comprising the following steps:
S1, acquiring original data:
building a simulation platform with OpenViBE and designing a P300 signal stimulation paradigm that simulates walking along a road; selecting N young subjects of the same age group, having them walk along the simulated road, and collecting electroencephalogram data of length T under the stimulation of gold coins that suddenly appear during walking;
S2, data processing:
preprocessing the acquired original electroencephalogram data: removing noise interference with high-pass and low-pass filtering, removing electro-ocular interference with independent component analysis, and removing remaining artifacts to obtain electroencephalogram data for subsequent model training;
S3, constructing and training the EEG-RL-AE network model:
S3-1: initialization:
the EEG-RL-AE network model comprises an auto-encoder AE, a reinforcement learning auto-encoder RL-AE, and a data discriminator D; the parameters of the auto-encoder AE and the parameters of the reinforcement learning auto-encoder are initialized;
S3-2: training the auto-encoder:
inputting the electroencephalogram data as a training set, training the auto-encoder AE on the input data, and updating the parameters of the auto-encoder AE to obtain an improved parameter model;
S3-3: training the reinforcement learning auto-encoder:
inputting the electroencephalogram data as a training set, training the reinforcement learning auto-encoder on the input data, and updating its parameters to obtain an improved parameter model;
S3-4: feeding back the result:
inputting test data, obtaining outputs from both the auto-encoder and the reinforcement learning auto-encoder, comparing the similarity of each output with the input data using a comparator, and taking the output with the higher similarity as the final result;
S4, evaluating the result:
comparing the similarity between the reconstructed data and the original data to evaluate the reconstruction performance of the model.
2. The electroencephalogram data reconstruction method based on the reinforcement learning auto-encoder as claimed in claim 1, wherein step S3 is implemented as follows:
S3-1: initialization
the EEG-RL-AE model is composed of three modules, namely the auto-encoder AE, the reinforcement learning auto-encoder RL-AE, and the data discriminator D; the parameters μ_AE and μ_RL-AE of the two encoders are initialized;
the activation function of each layer of the auto-encoder AE and the reinforcement learning auto-encoder RL-AE is the ELU function, where x denotes a sample, as shown in formula (1):
ELU(x) = max(0, x) + min(0, α*(exp(x) - 1))   (1)
where α is a hyperparameter with a default value of 1.0;
the auto-encoder AE is trained using the mean square error loss MSELoss as the loss function, where x denotes the input data sample, y denotes the sample generated by the model, and i indexes each value in the sample matrix, as shown in formula (2):
Loss(x_i, y_i) = (x_i - y_i)^2   (2)
the activation function of the last layer of AE and RL-AE is Tanh, as shown in formula (3):
Tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))   (3)
in addition, the data discriminator D uses MSELoss to compare the similarity between the data generated by AE and RL-AE and the input data;
S3-2: training the auto-encoder
reading the electroencephalogram data from the folder and importing it into the auto-encoder; the following operations then begin:
encoding: first, features are extracted from the input data by a convolutional neural network; specifically, three one-dimensional convolutional layers are used, where the first two layers each use a convolution kernel sliding window of length 3 with stride 1 and padding 1, and the last layer uses a kernel of 4 with stride 1 and padding 1; this arrangement performs feature extraction and encoding;
decoding: low-level features are recovered from the feature vector by deconvolution layers; three one-dimensional deconvolution layers are used, where the first deconvolution layer has a kernel of 4 with stride 1 and padding 1, the middle deconvolution layer is followed by normalization and a ReLU activation, the remaining one-dimensional deconvolution layers use a kernel sliding window of length 3 with stride 1 and padding 1, and a Tanh activation follows the last deconvolution layer, yielding the finally generated EEG data;
after training is finished, the parameters of the auto-encoder model are saved;
S3-3: training the reinforcement learning auto-encoder
the algorithm used by the reinforcement learning auto-encoder is DDPG, which mainly comprises four network models:
the Actor online policy network: responsible for iteratively updating the policy network parameter θ, selecting the current action A according to the current state S, and interacting with the environment to generate the next state S' and the reward R;
the Actor target policy network: responsible for selecting the optimal next action A' based on the next state S' sampled from the experience replay pool; the network parameter θ' is periodically copied from θ;
the Critic online value network: responsible for iteratively updating the value network parameter w and, together with the discount factor γ, computing the current Q value Q(S, A, w); the target value Q* under ideal conditions is given by formula (4):
Q* = R + γ*Q'(S', A', w')   (4)
the Critic target value network: responsible for computing the target value Q*; the network parameter w' is periodically copied from w; in addition, DDPG adds randomness to the learning process to increase learning coverage: the model adds a noise vector N to the selected action A, i.e., the action A that finally interacts with the environment is given by formula (5):
A = π_θ(S) + N   (5)
where π denotes the currently selected policy, θ is the network parameter, and S is the current state; the loss function used by the DDPG algorithm is the mean square error;
the reward function R is improved: x is an input test data sample, α is a discount coefficient in (0, 1), L_t is the mean square error between the data obtained by the DDPG model at the current step and the original data, L_(t-1) is the mean square error between the data obtained by the DDPG model at the previous step and the original data, and L_AE is the mean square error between the data obtained from the test data directly through the auto-encoder and the original data; the formula is shown in (6):
R(x) = -(α*(L_t - L_(t-1)) + (1 - α)*(L_t - L_AE))   (6)
reading the electroencephalogram data from the folder and importing it into the reinforcement learning auto-encoder; the following operations then begin:
encoding: first, features are extracted from the input data by a convolutional neural network; specifically, three one-dimensional convolutional layers are used, where the first two layers each use a convolution kernel sliding window of length 3 with stride 1 and padding 1, and the last layer uses a kernel of 4 with stride 1 and padding 1; this arrangement performs feature extraction and encoding;
decoding: low-level features are recovered from the feature vector by deconvolution layers; three one-dimensional deconvolution layers are used, where the first deconvolution layer has a kernel of 4 with stride 1 and padding 1, the middle deconvolution layer is followed by normalization and a ReLU activation, the remaining one-dimensional deconvolution layers use a kernel sliding window of length 3 with stride 1 and padding 1, and a Tanh activation follows the last deconvolution layer, yielding the finally generated EEG data;
training the parameters of the reinforcement learning DDPG algorithm: after the encoding and decoding models are trained, the model parameters are saved; the data are imported into the encoding model to obtain a feature vector, the feature vector enters the reinforcement learning DDPG algorithm, which trains on the encoded vector and adds a new vector value to it; the resulting vector is then passed to the decoding model, and the generated EEG data are finally obtained;
S3-4: feedback of results
the operation of the discriminator D: loading the data generated by the models of S3-2 and S3-3 into the discriminator, comparing each with the input test data using the mean square error formula, and outputting the data with the smaller mean square error as the final EEG result;
a trained model is obtained through the above steps; a test data set is input into the trained model, generated EEG data are obtained through the auto-encoder and the reinforcement learning auto-encoder respectively, and the two outputs are then input into the discriminator D to obtain the final EEG data result.
Background
An electroencephalogram (EEG) is a recording of the spontaneous bioelectric potentials of the brain, captured from the scalp and amplified by precision electronic instruments. The EEG carries a large amount of information about the neuronal activity of the brain, from which a person's mental state can be inferred. Typically, electroencephalogram data are encoded and used in various models to classify emotion, gender, disease, and the like. However, little work exists on reconstructing the encoded data back into the original data. An auto-encoder (AE) is a model that encodes data and then decodes it back to the original data; however, the decoding usually incurs a loss of information.
Reinforcement learning is learning by an agent through trial and error: the agent interacts with the environment and is guided by the rewards it obtains, with the goal of maximizing its cumulative reward. It differs from supervised learning mainly in the reinforcement signal: the signal provided by the environment evaluates the action that was taken, rather than telling the learning system how to produce the correct action.
Based on these technical characteristics, reinforcement learning is combined with the auto-encoder to form a reinforcement learning auto-encoder model, giving the auto-encoder a better data reconstruction effect.
Disclosure of Invention
The invention aims to provide an electroencephalogram data reconstruction method based on a reinforcement learning auto-encoder; the technical scheme adopted to realize it is as follows:
S1, acquiring original data:
A simulation platform is built with OpenViBE, and a P300 signal stimulation paradigm simulating walking along a road is designed. N young subjects of the same age group are selected to walk along the simulated road, and electroencephalogram data of length T are collected under the stimulation of gold coins that suddenly appear during walking.
S2, data processing:
The acquired raw electroencephalogram data are preprocessed: noise interference is removed with high-pass and low-pass filtering, ocular interference is removed with Independent Component Analysis (ICA), and remaining artifacts are removed to obtain electroencephalogram data for subsequent model training.
S3, constructing and training the EEG-RL-AE network model:
S3-1: initialization:
The EEG-RL-AE network model comprises an auto-encoder AE, a reinforcement learning auto-encoder RL-AE, and a data discriminator D; the parameters of both encoders are initialized.
S3-2: training the auto-encoder:
The electroencephalogram data are input as a training set, the auto-encoder AE is trained on the input data, and its parameters are updated to obtain an improved parameter model.
S3-3: training the reinforcement learning auto-encoder:
The electroencephalogram data are input as a training set, the reinforcement learning auto-encoder is trained on the input data, and its parameters are updated to obtain an improved parameter model.
S3-4: feeding back the result:
The input test data are processed by the auto-encoder and the reinforcement learning auto-encoder to obtain their outputs; a comparator then compares the similarity between each output and the input data, and the output with the higher similarity is taken as the final result.
S4, evaluating the result:
The similarity between the reconstructed data and the original data is compared to evaluate the reconstruction performance of the model.
The invention has the beneficial effects that:
The invention trains through the reinforcement learning auto-encoder, and through continuous iteration the data reconstruction effect keeps improving. It addresses the poor data recovery performance of conventional machine learning and deep learning methods on electroencephalogram data. The invention uses a discriminator module to compare the data reconstructed by the reinforcement learning auto-encoder with the data reconstructed directly by the auto-encoder; the approach generalizes well and is applicable to any auto-encoder.
In summary, the method has a good effect on the reconstruction of electroencephalogram data, and is expected to have broad application prospects in practical brain-computer interaction.
Drawings
FIG. 1 is a flow chart of the present invention
FIG. 2 is a diagram of a network architecture employed by the present invention
FIG. 3 is a diagram of an algorithm structure for reinforcement learning employed in the present invention
Detailed Description
The present invention is further described below in conjunction with the accompanying drawings and examples, so that the advantages and features of the invention may be more readily understood by those skilled in the art and its scope more clearly defined.
The invention uses an auto-encoder (AE) as the data reconstruction method, updates the parameters of the auto-encoder with the reinforcement learning DDPG algorithm, and finally obtains a better data reconstruction result by comparing the reinforcement learning auto-encoder with an ordinary auto-encoder through a discriminator.
S1, acquiring original data:
A simulation platform is built with OpenViBE, and a P300 signal stimulation paradigm simulating walking along a road is designed. The data for this simulation system were recorded by a digital monitoring system (Brain Products GmbH, Germany) at a sampling frequency of 250 Hz; all 61 EEG channels were referenced to both earlobes, grounded to the FCz channel, and kept at an impedance below 10 kΩ.
Based on this system, N young subjects of the same age group are selected and electroencephalogram signals are collected. In the task, the subject sits on a chair wearing an electroencephalogram cap and walks along the simulated road using keys, following the prompts on a map. Gold coins appear during walking, providing the subject with an event-related potential stimulus. The location and time of appearance of the gold coins are random, and the task ends when the subject reaches the end point in the simulation system. The experiment contained multiple acquisition sessions (no fewer than 6).
S2, data preprocessing:
and preprocessing the acquired original data by using an EEGlab script. The method comprises the steps of algorithms such as notch filtering, high-pass filtering, low-pass filtering and the like, filtering the commercial power with the frequency of 50Hz and high-frequency noise, and removing artifacts. And then, dividing the data into a plurality of sequences with equal time T and storing the sequences in a file for subsequent model training.
S3, a corresponding network model, named EEG-RL-AE, is designed and trained according to the design ideas of the AE network and reinforcement learning, as shown in FIG. 2. The concrete implementation is as follows:
S3-1: initialization
The EEG-RL-AE model consists of three modules, namely the auto-encoder AE, the reinforcement learning auto-encoder RL-AE, and the data discriminator D; the parameters μ_AE and μ_RL-AE of the two encoders are initialized;
The activation function of each layer of the auto-encoder AE and the reinforcement learning auto-encoder RL-AE is the ELU function, where x denotes a sample, with the formula:
ELU(x) = max(0, x) + min(0, α*(exp(x) - 1))   (1)
where α is a hyperparameter with a default value of 1.0;
the auto-encoder AE is trained using the mean square error loss MSELoss as the loss function, where x denotes the input data sample, y denotes the sample generated by the model, and i indexes each value in the sample matrix, as shown in formula (2):
Loss(x_i, y_i) = (x_i - y_i)^2   (2)
the activation function of the last layer of AE and RL-AE is Tanh, as shown in formula (3):
Tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))   (3)
In addition, the data discriminator D uses MSELoss to compare the similarity between the data generated by AE and RL-AE and the input data.
S3-2: training the auto-encoder
The electroencephalogram data are read from the folder and imported into the auto-encoder. The following operations then begin:
Encoding: first, features are extracted from the input data by a convolutional neural network. Specifically, three one-dimensional convolutional layers are used: the first two layers each use a convolution kernel sliding window of length 3 with stride 1 and padding 1, and the last layer uses a kernel of 4 with stride 1 and padding 1; this arrangement performs feature extraction and encoding.
Decoding: low-level features are recovered from the feature vector by deconvolution layers. Three one-dimensional deconvolution layers are used: the first deconvolution layer has a kernel of 4 with stride 1 and padding 1, the middle deconvolution layer is followed by normalization and a ReLU activation, the remaining one-dimensional deconvolution layers use a kernel sliding window of length 3 with stride 1 and padding 1, and a Tanh activation follows the last deconvolution layer, yielding the finally generated EEG data.
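A minimal PyTorch sketch of this encoder-decoder arrangement is given below; the kernel sizes, strides and padding follow the description above, while the channel widths (61 → 32 → 16 → 8), the batch normalization choice and the exact placement of the ELU activations are assumptions.

```python
# Sketch of the auto-encoder AE: three 1-D convolutions for encoding and three
# 1-D deconvolutions for decoding; channel widths are assumed, while the layer
# hyperparameters (kernels 3/3/4, stride 1, padding 1) follow the text.
import torch
import torch.nn as nn

class EEGAutoEncoder(nn.Module):
    def __init__(self, in_channels: int = 61):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=3, stride=1, padding=1), nn.ELU(),
            nn.Conv1d(32, 16, kernel_size=3, stride=1, padding=1), nn.ELU(),
            nn.Conv1d(16, 8, kernel_size=4, stride=1, padding=1), nn.ELU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(8, 16, kernel_size=4, stride=1, padding=1), nn.ELU(),
            nn.ConvTranspose1d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm1d(32), nn.ReLU(),          # normalization + ReLU after the middle layer
            nn.ConvTranspose1d(32, in_channels, kernel_size=3, stride=1, padding=1),
            nn.Tanh(),                              # Tanh after the last deconvolution
        )

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))
```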
After training, the parameters of the auto-encoder model are saved.
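A sketch of the corresponding training loop is shown next; the optimizer, learning rate, number of epochs and file name are assumptions, while the loss is the MSELoss of formula (2) and the parameters are saved once training ends.

```python
# Training-loop sketch for the auto-encoder AE (hyperparameters assumed).
import torch

def train_autoencoder(model, loader, epochs: int = 100, lr: float = 1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.MSELoss()                  # mean square error of formula (2)
    for _ in range(epochs):
        for x in loader:                            # x: (batch, channels, samples)
            recon = model(x)
            loss = criterion(recon, x)              # compare reconstruction with the input
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    torch.save(model.state_dict(), "ae_params.pt")  # hypothetical file name
    return model
```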
S3-3: training the reinforcement learning auto-encoder
The algorithm used by the reinforcement learning auto-encoder is DDPG (Deep Deterministic Policy Gradient), as shown in FIG. 3.
The DDPG algorithm mainly comprises four network models:
The Actor online policy network: responsible for iteratively updating the policy network parameter θ, selecting the current action A according to the current state S, and interacting with the environment to generate the next state S' and the reward R.
The Actor target policy network: responsible for selecting the optimal next action A' based on the next state S' sampled from the experience replay pool; the network parameter θ' is periodically copied from θ.
The Critic online value network: responsible for iteratively updating the value network parameter w and, together with the discount factor γ, computing the current Q value Q(S, A, w). The target value Q* in the ideal state is given by formula (4):
Q* = R + γ*Q'(S', A', w')   (4)
The Critic target value network: responsible for computing the target value Q*. The network parameter w' is periodically copied from w.
In addition, DDPG adds randomness to the learning process to increase learning coverage: the model adds a noise vector N to the selected action A, i.e., the action A that finally interacts with the environment is given by formula (5):
A = π_θ(S) + N   (5)
where π denotes the currently selected policy, θ is the network parameter, and S is the current state. The loss function used by the DDPG algorithm is the mean square error.
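A condensed sketch of these four networks, the target value of formula (4), the exploration noise of formula (5) and the mean-square-error critic loss is given below; layer widths, learning rates, the soft-update rate tau and the state/action dimensions are assumptions, and here the state can be read as the encoded EEG feature vector and the action as the perturbation added to it.

```python
# DDPG sketch: Actor/Critic online networks, their target copies, exploration
# noise (formula (5)), target value (formula (4)) and MSE critic loss.
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, action_dim), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

LATENT_DIM = 64                                     # assumed flattened latent size
actor, critic = Actor(LATENT_DIM, LATENT_DIM), Critic(LATENT_DIM, LATENT_DIM)
actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma, tau, noise_std = 0.99, 0.005, 0.1            # assumed hyperparameters

def select_action(state):
    """Formula (5): A = pi_theta(S) + N, with Gaussian exploration noise N."""
    with torch.no_grad():
        a = actor(state)
        return a + noise_std * torch.randn_like(a)

def ddpg_update(s, a, r, s_next):
    """One update step; r is expected with shape (batch, 1)."""
    with torch.no_grad():                           # formula (4): Q* = R + gamma*Q'(S', A', w')
        q_target = r + gamma * critic_target(s_next, actor_target(s_next))
    critic_loss = nn.functional.mse_loss(critic(s, a), q_target)  # mean square error loss
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(s, actor(s)).mean()        # policy gradient through the critic
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Periodic (soft) copies theta -> theta' and w -> w'
    for net, target in ((actor, actor_target), (critic, critic_target)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```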
In the present invention, the reward function R is improved: x is the input test data sample, α is a discount coefficient in (0, 1), L_t is the mean square error between the data obtained by the DDPG model at the current step and the original data, L_(t-1) is the mean square error between the data obtained by the DDPG model at the previous step and the original data, and L_AE is the mean square error between the data obtained from the test data directly through the auto-encoder and the original data. The formula is shown in formula (6):
R(x) = -(α*(L_t - L_(t-1)) + (1 - α)*(L_t - L_AE))   (6)
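The reward of formula (6) can be written directly as a small function, as in the sketch below; the default value of α is only an assumption.

```python
# Reward of formula (6): improvement over the previous DDPG step, mixed with
# improvement over the plain auto-encoder baseline, weighted by alpha in (0, 1).
import torch

def reward(x, recon_t, recon_prev, recon_ae, alpha: float = 0.5):
    mse = torch.nn.functional.mse_loss
    l_t = mse(recon_t, x)        # L_t: current DDPG reconstruction vs. original
    l_prev = mse(recon_prev, x)  # L_(t-1): previous DDPG reconstruction vs. original
    l_ae = mse(recon_ae, x)      # L_AE: plain auto-encoder reconstruction vs. original
    return -(alpha * (l_t - l_prev) + (1.0 - alpha) * (l_t - l_ae))
```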
The electroencephalogram data are read from the folder and imported into the reinforcement learning auto-encoder. The following operations then begin:
Encoding: first, features are extracted from the input data by a convolutional neural network. Specifically, three one-dimensional convolutional layers are used: the first two layers each use a convolution kernel sliding window of length 3 with stride 1 and padding 1, and the last layer uses a kernel of 4 with stride 1 and padding 1; this arrangement performs feature extraction and encoding.
Decoding: low-level features are recovered from the feature vector by deconvolution layers. Three one-dimensional deconvolution layers are used: the first deconvolution layer has a kernel of 4 with stride 1 and padding 1, the middle deconvolution layer is followed by normalization and a ReLU activation, the remaining one-dimensional deconvolution layers use a kernel sliding window of length 3 with stride 1 and padding 1, and a Tanh activation follows the last deconvolution layer, yielding the finally generated EEG data.
Training the parameters of the reinforcement learning DDPG algorithm: after the encoding and decoding models are trained, the model parameters are saved. The data are imported into the encoding model to obtain a feature vector, the feature vector enters the reinforcement learning DDPG algorithm, which trains on the encoded vector and adds a new vector value to it; the resulting vector is then passed to the decoding model, and the generated EEG data are finally obtained.
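A sketch of one such interaction step is given below; it reuses the EEGAutoEncoder, select_action and reward sketches above, and assumes the DDPG state/action dimension equals the flattened latent size.

```python
# One RL-AE step: encode, perturb the latent vector with the DDPG action, decode,
# and score the result with the reward of formula (6). Shapes are illustrative.
def rl_ae_step(ae, x, recon_prev, recon_ae, alpha: float = 0.5):
    state = ae.encode(x)                     # latent feature tensor from the encoder
    latent = state.flatten(1)                # flattened latent as the DDPG state S
    action = select_action(latent)           # A = pi_theta(S) + N, formula (5)
    recon = ae.decoder((latent + action).view_as(state))  # decode the perturbed latent
    r = reward(x, recon, recon_prev, recon_ae, alpha)     # formula (6)
    return recon, r
```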
S3-4: feedback of results
The operation of the discriminator D: the data generated by the models of S3-2 and S3-3 are loaded into the discriminator and each is compared with the input test data through the mean square error formula; the data with the smaller mean square error are output as the final EEG result.
A trained model is obtained through the above steps. A test data set is input into the trained model, generated EEG data are obtained through the auto-encoder and the reinforcement learning auto-encoder respectively, and the two outputs are then input into the discriminator D to obtain the final EEG data result.
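A minimal sketch of the discriminator's selection rule, under the same assumptions as above, is:

```python
# Discriminator D: keep whichever reconstruction is closer to the input in MSE.
import torch

def discriminate(x, recon_ae, recon_rl_ae):
    mse = torch.nn.functional.mse_loss
    return recon_ae if mse(recon_ae, x) <= mse(recon_rl_ae, x) else recon_rl_ae
```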
S4, evaluating the result:
The reconstructed data and the original data are displayed in the same figure, and their similarity is compared point to point at the same time instants to evaluate the reconstruction performance of the model.