Dendritic neural network initialization method
1. A method for initializing a dendritic neural network comprises pruning dendrites of neurons, and is characterized by comprising the following steps:
step one, generating a training set D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)} from a data set by using a k-fold cross-validation method;
step two, generating a new data set T from the training set D by using a binarization method;
step three, generating a corresponding decision tree structure through an ID3 or C4.5 learning algorithm;
step four, merging the paths of the decision tree whose leaf nodes are marked as 1, and pruning the paths whose leaf nodes are marked as 0;
step five, determining the number of dendrite layers in the dendritic neural network according to the number of paths marked as 1 in the decision tree;
and step six, constructing a dendrite layer with the same classification function according to each path of the decision tree.
2. The method for initializing a dendritic neural network of claim 1, further comprising a classification method based on one-bit efficient coding, the classification method comprising the steps of:
step one, setting a probability prediction model h, h(x) = (h_1(x), h_2(x), ..., h_k(x)) ∈ [0, 1]^k, wherein h_i(x) is the probability that sample x belongs to class i, and the maximum probability classification function for model h is H(x) = arg max_{i ∈ {1, ..., k}} h_i(x);
and step two, outputting the class corresponding to the maximum model prediction probability by maximum probability classification, and using it as the classification prediction of the sample.
Background
An Artificial Neural Network (ANN) is a computational model inspired by the potential-generation mechanism by which natural neurons are inhibited and fired. ANNs have successfully solved many practical problems in the field of prediction and estimation; for example, the dendritic neural network (DNM) has been successfully applied to high-precision classification of breast cancer, liver disease and credit data, to financial time series, exchange-rate and passenger-arrival forecasting, and to prediction of the China house price index. Building a dendritic neuron model requires pruning the dendrites of the neurons to achieve the effect of information transmission and storage. Some work focuses on filter-level pruning, in which pruning is carried out according to the importance of neurons; this approach improves the performance of the network without changing the original network structure and has strong generalization capability. Another method for pruning artificial neural networks measures the neural complexity of the network and reduces networks of excessive complexity while keeping the learning behavior and the fitness, which is a great improvement over the most common magnitude-based pruning methods. A sequential learning algorithm for Radial Basis Function (RBF) networks, called the generalized growing and pruning algorithm (GGAP-RBF), performs its growing and pruning strategy based on the required learning accuracy and the importance of recently added neurons, but it has high complexity and is not well suited to practical application. A channel pruning method has also been proposed to accelerate very deep convolutional neural networks, using an iterative two-step algorithm in which each layer is effectively pruned through LASSO-regression-based channel selection and least-squares reconstruction; however, when the algorithm is generalized to the multi-layer and multi-branch situation, it may prune neurons with high contribution, so that the network accuracy is reduced.
The dendritic neural network (DNM) is a biomimetic network; the basic structure of a neuron includes dendrites, an axon, a soma and a nucleus. Accordingly, the dendritic neural network has four layers: the Synaptic, Dendritic, Membrane and Soma layers. The Synaptic layer receives an input signal and converts the linear signal into a neuron signal using a Sigmoid function. The Dendritic layer performs convergence processing on the output of the Synaptic layer. The Membrane layer enhances the output of the Dendritic layer and inputs the result to the Soma layer. The Soma layer uses another Sigmoid function to give the final result.
Referring to fig. 1, a DNM network architecture with 6 Dendrite layers and 9 inputs is shown. The input x_i is connected to the Dendritic layer through the Synaptic layer (with its four connection states), and the Membrane layer performs enhanced activation on the output of the Dendritic layer and transfers it to the Soma layer.
The Synaptic layer is an important component of information interaction between neurons. This layer converts linear signals into neuronal signals using a Sigmoid function. Synapses can be divided into inhibitory and excitatory synapses according to the potential changes caused by the received ions. The Synaptic layer formula is as follows:
wherein Y_ij represents the output of the synapse connecting the i-th input to the j-th dendrite branch and has a range of [0, 1]. k is a connection parameter, typically set to an integer between 1 and 10. When ω_ij and θ_ij take different values, four connection states are obtained, see fig. 2. The four connection states are described below:
1) Constant-0 connection (ω_ij < 0 < θ_ij or 0 < ω_ij < θ_ij): in this state, no matter what value x_i takes, the output Y_ij is always 0.
2) Positive connection (0 < θ_ij < ω_ij): the output is positively correlated with the input no matter how the input changes between 0 and 1.
3) Reverse connection (ω_ij < θ_ij < 0): the output is inversely related to the input no matter how the input changes between 0 and 1.
4) Constant-1 connection (θ_ij < ω_ij < 0 or θ_ij < 0 < ω_ij): in this state, no matter what value x_i takes, the output Y_ij is always 1.
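As an illustration only (not part of the claimed method), the following Python sketch maps a pair of connection parameters to one of the four connection states listed above, using the condition ω_ij < θ_ij < 0 for the reverse connection; the function name and the handling of boundary cases are assumptions made for this example.

```python
def connection_state(omega: float, theta: float) -> str:
    """Classify a synapse by the ordering of its connection parameters."""
    if (omega < 0 < theta) or (0 < omega < theta):
        return "constant-0"   # output is always 0, whatever x_i is
    if 0 < theta < omega:
        return "positive"     # output rises with the input
    if omega < theta < 0:
        return "reverse"      # output falls as the input rises
    if (theta < omega < 0) or (theta < 0 < omega):
        return "constant-1"   # output is always 1, whatever x_i is
    return "undetermined"     # boundary cases (e.g. equal or zero parameters)

print(connection_state(1.2, 0.4))   # -> "positive"
print(connection_state(0.8, -0.3))  # -> "constant-1"
```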
The function of the Dendrite layer is to multiply the synaptic signals on each branch so that the signals from the Synaptic layer produce a non-linear interaction. This operation is similar to the logical AND operation, and the Dendrite layer formula is as follows:
wherein Z_j is the output of the j-th Dendrite layer.
The Membrane layer collects the signals from each Dendrite layer. This layer performs convergence processing on the input of each branch and outputs the result to the next layer. Its action is very similar to a logical OR operation. The Membrane layer formula is shown below:
wherein, V is the output after the convergence processing of the Membrane layer.
The Soma layer performs a function similar to that of a somatic cell: when the output of the Soma layer exceeds a threshold, the neuron is fired. This process is represented by a Sigmoid function and is formulated as follows:
where O is the output of the Soma layer.
Since the DNM is a feed-forward model and all functions in the DNM are differentiable, the error back-propagation (BP) algorithm can be effectively used as its learning algorithm. The BP algorithm continuously adjusts θ_ij and ω_ij through the derivatives and the learning rate to reduce the difference between the actual output O and the desired output T. The squared error between O and T is defined as:
in DNM, E is minimized by continuously modifying the join parameters in the negative gradient direction during the iteration.
Dendritic neural networks suffer from several problems:
1) The number of Dendrite layers in the dendritic neural network is set without any principled basis. Too many layers reduce the training efficiency of the dendritic neural network, and too few layers may cause convergence problems.
2) Because the weights ω_ij and thresholds θ_ij of the dendritic neural network are randomly generated, the initial convergence speed is slow, or the network may converge to a local optimum.
3) When training the dendritic neural network, some of the dendrite layers may be pruned to reduce the complexity of the dendritic neural network, but this may result in reduced accuracy.
4) The dendritic neural network can effectively solve binary classification problems, but it cannot effectively solve multi-class problems because of the high complexity of the network.
Disclosure of Invention
The invention aims to provide a dendritic neural network initialization method which is high in pruning precision, high in convergence speed and good in generalization capability.
The technical scheme for realizing the aim is as follows:
A method for initializing a dendritic neural network comprises the pruning of neuron dendrites, and comprises the following steps:
Step one: a training set D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)} is generated from the dataset using a k-fold cross-validation method.
Step two: a new data set T is generated from the training set D using a binarization method.
Step three: the corresponding decision tree structure is generated by an ID3 or C4.5 learning algorithm.
Step four: paths of the decision tree whose leaf nodes are marked as 1 are merged, and paths whose leaf nodes are marked as 0 are pruned.
Step five: the number of dendrite layers in the dendritic neural network is determined according to the number of paths marked as 1 in the decision tree.
Step six: a dendrite layer with the same classification function is constructed according to each path of the decision tree.
Further, the method comprises a classification method based on one-bit effective coding, and the classification method comprises the following steps:
step one, setting a probability prediction model h, h(x) = (h_1(x), h_2(x), ..., h_k(x)) ∈ [0, 1]^k, wherein h_i(x) is the probability that sample x belongs to class i, and the maximum probability classification function for model h is H(x) = arg max_{i ∈ {1, ..., k}} h_i(x);
and step two, outputting the class corresponding to the maximum model prediction probability by maximum probability classification, and using it as the classification prediction of the sample.
The invention provides a Multi-Decision-Tree-based Dendritic Neural Network model (MDTDNM) on the basis of the existing methods for pruning the dendrites of neurons. The model has a neuron-pruning function: useless synapses and unnecessary dendrites are screened out through the Decision Tree (DT), forming a unique dendritic topology for a specific task. Furthermore, the invention introduces the One-Hot (one-bit effective) coding method, which can effectively solve the multi-classification problem. Firstly, without affecting performance, the MDTDNM prunes, through the decision tree, neurons that contribute little, thereby saving a large amount of computing resources; secondly, the model initializes the weights through the decision tree to form a unique dendritic topology for the specific task; finally, the MDTDNM can effectively handle the multi-classification problem by means of One-Hot coding. Simulation results show that the model reduces algorithm complexity, improves efficiency, and is superior to existing models in terms of accuracy and computational efficiency.
Drawings
FIG. 1 shows a DNM structure;
FIG. 2 is a diagram illustrating threshold-corresponding connection states;
FIG. 3 illustrates a decision tree based dendritic neural network model;
FIG. 4 is a diagram of MDTDNM multi-classification;
FIG. 5 is a graph comparing the convergence curves of Iris data sets;
FIG. 6 is a graph comparing the convergence curves of the Wine data sets;
FIG. 7 is a graph comparing the convergence curves of the Ecoli dataset.
Detailed Description
The present invention will be described in detail with reference to examples.
Decision Trees (DTs) are a basic classification and regression approach. The goal is to create a model that predicts the target variables by learning simple decision rules inferred from the data attributes, with the core idea being to follow a simple intuitive "divide and conquer" strategy. The key to decision tree learning is how to select the optimal partition attribute a, defined as:
in general, as the learning process progresses, the more samples contained in a desired branch node belong to their correct class number, the higher the "purity" of the node. Entropy is the most common measure of "purity" of a sample set, if the proportion of samples of type k in the sample set D is pk(k ═ 1, 2, 3., | Y |), then the entropy of information for D is defined as:
the smaller ρ (D), the higher the purity of D. Assume that the discrete attribute a has V possible values a1,a2,...,aV}. The sample value can be found where the v-th branch node contains all the attributes a in D, denoted as Dv. Obtaining the information entropy of D according to the formula (7), wherein the weight of the branch node is | D since different branch nodes contain different sample numbersvI.e., the larger the number of samples, the greater the impact of the branching node. Thus, the information gain can be calculated as:
in general, the greater the information gain, the higher the degree of "purity" of discrimination using the attribute a. Thus, the information gain is used to select the classification attributes of the decision tree. How to select a is the key to successfully learn the decision tree, and the following code shows the step of selecting the best classification attribute a.
It was found through research that the dendritic neural network and the DT are consistent in solving the classification problem. Starting from the decision tree initialization, the rules generated by the decision tree can be formed; they contain logical AND and logical OR operations, similar to the functions of the Dendritic and Membrane layers in the dendritic neural network. The correspondence between the DT and the dendritic neural network is shown in table 1.
TABLE 1 correspondence between decision trees and DNM
The invention relies on the correlation between the classification result of the base classifier generated by each decision tree and the instance label. The method mainly comprises the following operation steps:
Step 1: a training set D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)} is generated from the dataset using a k-fold cross-validation method.
Step 2: a new data set T is generated from the training set D using a binarization method.
Step 3: the corresponding decision tree structure is generated by the ID3 or C4.5 learning algorithm.
Step 4: paths of the decision tree whose leaf nodes are marked as 1 are merged, and paths whose leaf nodes are marked as 0 are pruned.
Step 5: the number of dendrite layers in the dendritic neural network is determined from the number of paths marked as 1 in the DT.
Step 6: a dendrite layer with the same classification function is constructed according to each path of the DT.
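A rough end-to-end sketch of Steps 1 to 5 is given below, using scikit-learn as a stand-in learner: criterion="entropy" gives an information-gain-based tree that only approximates ID3/C4.5, and the data set, the median-based binarization and the depth limit are illustrative assumptions rather than choices prescribed by the method.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Step 1: generate a training set D by k-fold cross validation (only the first fold is shown).
train_idx, _ = next(KFold(n_splits=5, shuffle=True, random_state=0).split(X))
X_train, y_train = X[train_idx], y[train_idx]

# Step 2: generate a binarized data set T from D (here: thresholding at each feature's median).
T = (X_train > np.median(X_train, axis=0)).astype(int)

# Step 3: generate the corresponding decision tree structure.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0).fit(T, y_train)

# Steps 4-5: each root-to-leaf path ending in a leaf predicted as class 1 would become one
# dendrite layer, so counting those leaves gives the number of dendrite layers.
is_leaf = tree.tree_.children_left == -1
leaf_class = tree.tree_.value[is_leaf].squeeze(axis=1).argmax(axis=1)
n_dendrite_layers = int((leaf_class == 1).sum())
print("number of dendrite layers:", n_dendrite_layers)
```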
For further understanding, the specific implementation steps of the decision tree pruning strategy are illustrated below:
Step 1: a corresponding decision tree structure is generated using a decision-tree-based learning algorithm, wherein the internal (non-leaf) nodes represent tests on the attributes and the external (leaf) nodes represent the test results. As shown in FIG. 3, 15 internal nodes over the attributes c1-c4 and 16 leaf nodes labeled 0 or 1 are processed.
Step 2: the M paths labeled 1 are retained through merging and pruning, denoted ψ = {σ_1, σ_2, ..., σ_M}. As shown in fig. 3, there are three paths whose leaf nodes are labeled 1; therefore, ψ = {(c1=0, c3=0, c2=0); (c1=0, c3=1); (c1=1, c2=0, c4=1)} is obtained. From this, the rules generated by the decision tree can be formed:
IF (c1=0 ∩ c3=0 ∩ c2=0) ∪ (c1=0 ∩ c3=1) ∪ (c1=1 ∩ c2=0 ∩ c4=1)
THEN Class = 1
the rules are similar to the function of the Dendritic and Membrane layers in a Dendritic neural network.
Step 3: the three paths of the decision tree are converted into three dendrite layers of the dendritic neural network. The attribute c1 is connected to the Dendrite layer through the Synaptic layer in the forward direction, c2 is connected to the Dendrite layer in the reverse direction, c3 is connected to the Dendrite layer as a constant-1 connection, and c4 is connected to the Dendrite layer in the forward direction.
Step 4: the dendritic neural network model based on the decision tree is finally formed.
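To make Step 3 concrete, the sketch below converts retained decision-tree paths into initial Synaptic-layer parameters. The mapping used (an attribute tested as 1 on a path gives a forward connection, tested as 0 gives a reverse connection, and an attribute absent from the path gives a constant-1 connection) and the particular ω/θ values are assumptions for illustration; they are merely chosen so that each pair satisfies the corresponding connection-state condition described earlier.

```python
import numpy as np

# Assumed (omega, theta) pairs, one per connection type, each satisfying the
# corresponding inequality from the four connection states described above.
CONNECTION_PARAMS = {
    "forward":    (1.0, 0.5),    # 0 < theta < omega
    "reverse":    (-1.0, -0.5),  # omega < theta < 0
    "constant-1": (1.0, -0.5),   # theta < 0 < omega
}

def init_dendrite_layers(paths, attributes):
    """Return initial (omega, theta) arrays of shape (n_attributes, n_paths)."""
    omega = np.zeros((len(attributes), len(paths)))
    theta = np.zeros((len(attributes), len(paths)))
    for j, path in enumerate(paths):             # one dendrite layer per retained path
        for i, attr in enumerate(attributes):
            if attr not in path:
                kind = "constant-1"               # attribute not tested on this path
            else:
                kind = "forward" if path[attr] == 1 else "reverse"
            omega[i, j], theta[i, j] = CONNECTION_PARAMS[kind]
    return omega, theta

# The three retained paths psi from the example above.
psi = [{"c1": 0, "c3": 0, "c2": 0}, {"c1": 0, "c3": 1}, {"c1": 1, "c2": 0, "c4": 1}]
omega, theta = init_dendrite_layers(psi, ["c1", "c2", "c3", "c4"])
print(omega.shape)   # (4, 3): four attributes, three dendrite layers
```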
A single dendritic neural network cannot achieve multiple classifications, and thus multiple dendritic neural networks are required to work in concert to perform multiple classifications. It has been found through research that if multiple dendritic neural networks are used for multi-classification, too many parameters need to be set, so that the networks cause slow convergence and poor classification.
The dendritic neuron model based on decision tree pruning can prune unnecessary branches on the premise of ensuring performance, thereby effectively avoiding excessive network parameters and reducing network complexity. Therefore, the model can effectively solve the multi-classification problem.
One-Hot encoding, also known as one-bit effective encoding, mainly uses an N-bit state register to encode N states, each state having its own register bit and only one bit being active at any time [30]. Similarly, in the k-class classification problem, the training data contains the label vector y ∈ {0, 1}^k; if the object belongs to the t-th class, the value of the t-th bit of the label vector is 1 and the other values are 0, which is consistent with the idea of One-Hot coding. Therefore, this embodiment encodes the sample labels based on the One-Hot encoding method. The following describes the core idea of multi-classification:
given a probabilistic predictive model h, h (x) ═ h1(x),h2(x),...,hk(x))∈[0,1]k. Where h (x) is the probability that sample x belongs to class i. The maximum probability classification function for model h is then:
the maximum probability classification outputs the classification corresponding to the maximum model prediction probability and uses it as the classification prediction of the sample. As shown in fig. 4, a network model is composed of three decision tree pruning-based tree-like neurons. And decomposing the three-classification problem into two-classification problem by each neuron, and finally using the class with the highest probability of each neural prediction as the class prediction of the sample.
The performance effect of the method for initializing the dendritic neural network of the above embodiment is verified by the UCI machine learning repository, and the specific process is as follows,
table 2 describes these data sets. At present, many recent studies use the database for performance verification, so the model proposed in this embodiment can be compared with the recent model.
TABLE 2 UCI data
The experimental design divided the samples of each data set into three sections, with 70% for training, 15% for validation and 15% for testing.
To evaluate the performance advantages of the present invention, the BPNN [25] and the M dendritic neural network [17] were chosen for comparative analysis. Performance was evaluated from four aspects: the Mean Square Error (MSE) curve, accuracy, the ROC curve (AUC), and the p-Value. In addition, to make the comparison fairer, the parameters of the three models were all equal, as shown in table 3. The four performance indicators are briefly described as follows:
1) Mean Square Error (MSE) curve: the MSE is the mean of the squared errors between the actual and predicted values. The smaller the MSE, the better the accuracy of the prediction model; it also reflects the convergence of the model.
2) Accuracy: the performance metric usually used to evaluate classification problems is classification accuracy, i.e., the proportion of correctly classified samples in the total number of samples of a given data set.
3) ROC curve: to compare the performance of the invention with the BPNN and the M dendritic neural network, one common method is to calculate the area under the ROC curve (AUC). The closer the AUC is to 1, the better the model performance.
4) p-Value: if the p-Value is less than 0.05, the performance of the model is significantly better than that of the other models.
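For reference, the snippet below shows one way the four indicators could be computed; the numbers are placeholders rather than the experimental results of Tables 4 to 9, and the Wilcoxon signed-rank test is only an assumed choice for obtaining the p-Value, since the statistical test is not specified here.

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 1])                 # placeholder ground-truth labels
y_prob = np.array([0.9, 0.2, 0.8, 0.6, 0.3, 0.7])     # placeholder predicted probabilities
y_pred = (y_prob >= 0.5).astype(int)

print("MSE:      ", mean_squared_error(y_true, y_prob))
print("Accuracy: ", accuracy_score(y_true, y_pred))
print("AUC:      ", roc_auc_score(y_true, y_prob))

# p-Value comparing two models' per-run accuracies (placeholder numbers).
acc_model_a = [0.95, 0.94, 0.96, 0.95, 0.93]
acc_model_b = [0.90, 0.91, 0.89, 0.92, 0.88]
print("p-Value:  ", wilcoxon(acc_model_a, acc_model_b).pvalue)
```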
TABLE 3 parameter settings
Six groups of data were selected from the UCI database for the experiment in this example. The convergence curves MSE for the three models are compared as shown in fig. 5 to 7. It can be seen that the MSE of the present invention converges at the fastest rate, while the MSE of BPNN and M dendrite neural networks converge at a slower rate. In addition, the MSE curve of the present invention is low, which means that it is close to the global optimum solution. The main reasons for the analysis are: according to the invention, the decision tree is used for pruning, so that the number of dendritic neurons can be reduced, and the initialization weight and the threshold of the neurons are optimized, thereby improving the training efficiency.
Six experiments were performed using different maximum iteration numbers (200, 500 and 1000), and the accuracy of the three models was compared in each case. Furthermore, for a fairer comparison, the numbers of dendrite layers (hidden layers) of the BPNN and the M dendritic neural network were kept almost equal. The results in tables 4 to 9 show that the invention achieves higher accuracy regardless of the number of iterations, and its performance maintains good stability and robustness for the same number of dendrite layers. This is because the performance of the M dendritic neural network and the BPNN depends heavily on the number of dendrite layers (hidden layers) and on the randomly generated initial values.
Tables 4 to 9 also show that the AUC values of the invention are higher than those of the other algorithms regardless of the number of iterations. This is because the parameters of the BPNN and the M dendritic neural network are randomly generated and their dendrite-layer (hidden-layer) settings have no principled basis, so their AUC values are low. Since the present invention is constructed on the basis of the DT, forming a unique dendritic topology for the specific task, its initial structure is already close to the global optimum.
TABLE 4 Glass data set Performance comparison
TABLE 5 Wisconsin Breast-Cancer dataset Performance comparison
TABLE 6 Iris dataset Performance comparison
TABLE 7 wire dataset Performance comparison
TABLE 8 Image segmentation dataset Performance comparison
TABLE 9 ECOLI dataset Performance comparison
From the results in the above table, it can be seen that the accuracy and convergence rate achieved by the present invention for the six data sets are higher than those achieved by the two comparative models. Thus, embodiments of the present invention have excellent classification results.
In summary, the embodiments of the present invention have the following improvements: 1) compared with the fully connected BPNN and M dendritic neural network, the invention can directly determine the number of dendrite layers through decision tree initialization, and it effectively solves the classification problem in terms of accuracy and convergence speed; 2) compared with the M dendritic neural network and the BPNN, the invention has better generalization capability, as verified by the test accuracy on different data sets; 3) the invention effectively solves the multi-classification problem according to the idea of One-Hot coding; 4) verification on six sets of experimental data shows that C4.5 improves the performance of the invention more than ID3 does. The invention can be applied to fields such as medical diagnosis and images; it reduces algorithm complexity, improves efficiency, saves a large amount of computing resources and obtains more accurate results.