Learning style identification method based on fusion label and stacked machine learning model
1. A learning style identification method based on a fusion label and a stacked machine learning model, characterized by comprising the following steps:
1) calculating the learning style of each learner separately from a Kolb learning style scale and from an online learning behavior survey scale, and taking the intersection of the two results to obtain the learners whose learning style is divided and those whose learning style is undivided;
2) clustering the divided and undivided learners, determining the learning style of each cluster, labeling the samples, and supplementing the undivided learning styles;
3) performing a correlation test between the learning style labels obtained in step 2) and the online learning behavior features collected by the online learning platform;
4) selecting training data and test data from the online learning behavior features that pass the correlation test, and training a stacked machine learning model to obtain a complete stacking model;
5) evaluating the overall performance of the trained stacking model, optimizing the stacking model, and predicting the learning style of actual learners with the optimized stacking model.
2. The learning style identification method based on the fusion label and the stacked machine learning model according to claim 1, characterized in that step 1) specifically comprises:
11) acquiring the learner's Kolb learning style scale and calculating the score of each learning style;
12) acquiring the learner's online learning behavior survey scale, calculating the score of each learning style, and taking the intersection with the result obtained from the learner's Kolb learning style scale, to obtain the set of learners whose learning style is successfully divided and the set of learners whose learning style is not.
3. The method of claim 2, wherein the learning style score is calculated by the following formula:
$$LS_k = \sum_{i=1}^{12} a_{i,j}$$
in the formula: $LS_k$ is the learning style score, $a_{i,j}$ is the score of the j-th option of the i-th question, option j being the one that corresponds to learning style k, and k denotes one of the learner's learning styles.
4. The learning style identification method based on the fusion label and the stacked machine learning model according to claim 1, wherein step 2) specifically comprises:
performing feature dimension reduction on the online learning behavior survey scale data; clustering the reduced data with K-Means++ to obtain four clusters; constructing an expert-labeling-based cluster meaning determination method, in which the centroid of each cluster is determined, the Euclidean distances from the other points to the centroid are calculated, and the sample points within the threshold number of points nearest to the centroid are selected by a centroid selection method; determining the learning style of these samples from their Kolb learning style scale data and online learning behavior scale data; labeling the samples; and supplementing the undivided learning styles.
5. The method of claim 4, wherein determining the learning style of the samples from the Kolb learning style scale data and the online learning behavior scale data of the sample points, labeling them, and supplementing the undivided learning styles specifically comprises:
if more than 50% of the selected sample points are determined to have the same learning style, taking that learning style as the intrinsic meaning of the cluster; if no learning style exceeds 50%, expanding the threshold by 1 until the learning style can be successfully divided; and finally supplementing the learning styles that could not be divided.
6. The learning style identification method based on the fusion label and the stacked machine learning model as claimed in claim 1, wherein in the step 3), correlation test is performed on the learning style label obtained in the step 2) and the online learning behavior feature collected by the online learning platform through Spearman correlation coefficient.
7. The method for recognizing learning style based on fusion label and stacked machine learning model as claimed in claim 1, wherein the step 4) of training with the stacked machine learning model comprises the following steps:
41) preprocessing behavior data generated by a learner in an online teaching platform;
42) constructing a stacked machine learning model by combining the learning style labeling result obtained in the step 2);
43) performing model training and parameter tuning on the constructed stacked machine learning model to obtain a complete stacking model.
8. The method of claim 7, wherein the stacked machine learning model is a two-layer fusion model; the first layer comprises four base classifiers, namely a random forest, a gradient boosting decision tree, a support vector machine and a multilayer perceptron, and takes the original training set and test set as input; the second layer is a logistic regression model, which takes the outputs of the first-layer base classifiers as input and is retrained with the training set to obtain the complete stacking model.
9. The method for learning style identification based on fusion label and stacked machine learning model of claim 7, further comprising a step of resampling samples by using SMOTE algorithm before step 43).
10. The method for recognizing learning style based on fusion label and stacked machine learning model as claimed in claim 1, wherein in step 5), the trained stacked model is evaluated for comprehensive performance by using accuracy, recall rate, precision, F1 score and area under curve, so as to optimize the stacked model.
Background
Online education removes the time and space constraints of traditional education and lets teachers and students communicate at any time and place, which makes it possible to realize the idea of "teaching students in accordance with their aptitude" proposed by Confucius. Scholars such as Dunn, Kolb, Felder and Keefe have long recognized that students differ in how they learn new knowledge; the differences include personality traits, knowledge level, learning ability and learning style. Learning style covers learning preferences and learning characteristics. Finding a learning style that suits a student can guide that student's learning, so automatic recognition of learning style is important for promoting personalized learning in online education.
The traditional way to identify learning style is to ask the learner to fill out a learning style scale. Although such methods are effective, they have several disadvantages. First, the designer of a traditional learning style scale cannot avoid subjective factors when writing the questions. Second, learners who lack a clear understanding of themselves may fill out the scale inaccurately, which directly makes the identified learning style inaccurate. Third, when several learning styles receive the same score, the scale cannot identify the learning style. Finally, a student's learning style changes dynamically, whereas scale-based identification is static. It is therefore desirable to identify the learner's learning style implicitly and dynamically from multi-source heterogeneous online education data, so as to solve the problems of traditional learning style identification (failure to identify, inaccurate identification, high subjectivity and static identification) and to provide methodological support for personalized learning in online education.
Scholars at home and abroad have studied learning style recognition along two routes: scale-based identification with custom rules, and automatic identification based on machine learning. Scale-based identification is the conventional choice when it is inconvenient to collect a learner's online learning behavior data. Automatic identification divides learners' learning styles by cleaning and integrating their online learning behavior data. Brahim Hmedna et al. designed a method that automatically identifies learning style from the learning behavior data generated by students in a MOOC; in essence, the model clusters the data to obtain learning style labels and then predicts the learning style with a classification algorithm. Because the method takes the clustering result, which is driven only by the distribution of the data, directly as the learner's learning style label, and the practical meaning of each cluster must be explained manually, its interpretability is insufficient. Chia-Cheng Hsu et al. proposed a neuro-fuzzy inference system (FIS) model for identifying the learning style of online learners; in essence, it identifies the learning style with custom rules through a single-hidden-layer neural network. Its experimental validation involved only 102 learners, so its results are not sufficiently convincing when learning styles are unevenly distributed. The EENN-PSO model proposed by Song Lai et al. achieves high accuracy in learning style identification, but the learning style labels are obtained from the NEO-FFI scale alone, and relying on a single scale is too subjective.
In addition, deep learning has recently been applied to learning style identification. Zhanhao et al. constructed a deep belief network for learning style detection; its highest recognition accuracy on a single learning style dimension (Vis/Vrb) was 0.89. However, deep learning needs a large amount of data to train the model, takes a long time, and cannot guarantee interpretability.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a learning style identification method based on a fusion label and a stacking machine learning model.
The purpose of the invention can be realized by the following technical scheme:
A learning style identification method based on a fusion label and a stacked machine learning model comprises the following steps:
S1: calculating the learning style of each learner separately from the Kolb learning style scale and from the online learning behavior survey scale, and taking the intersection of the two results to obtain the learners whose learning style is divided and those whose learning style is undivided.
S2: clustering the divided and undivided learners, determining the learning style of each cluster, labeling the samples, and supplementing the undivided learning styles.
S3: performing a correlation test between the learning style labels obtained in step S2 and the online learning behavior features collected by the online learning platform.
S4: selecting training data and test data from the online learning behavior features that pass the correlation test, and training a stacked machine learning model to obtain a complete stacking model.
S5: evaluating the overall performance of the trained stacking model, optimizing the stacking model, and predicting the learning style of actual learners with the optimized stacking model.
The specific steps of step S1 include:
11) acquiring the learner's Kolb learning style scale and calculating the score of each learning style;
12) acquiring the learner's online learning behavior survey scale, calculating the score of each learning style, and taking the intersection with the result obtained from the learner's Kolb learning style scale, to obtain the set of learners whose learning style is successfully divided and the set of learners whose learning style is not.
The calculation formula of the learning style score is as follows:
$$LS_k = \sum_{i=1}^{12} a_{i,j}$$
in the formula: $LS_k$ is the learning style score, $a_{i,j}$ is the score of the j-th option of the i-th question, option j being the one that corresponds to learning style k, and k denotes one of the learner's learning styles.
The specific content of step S2 is:
feature dimension reduction is performed on the online learning behavior survey scale data, and the reduced data is clustered with K-Means++ to obtain four clusters. An expert-labeling-based cluster meaning determination method is then constructed: the centroid of each cluster is determined, the Euclidean distances from the other points to the centroid are calculated, and the sample points within the threshold number of points nearest to the centroid are selected by a centroid selection method. The learning style of these samples is determined from their Kolb learning style scale data and online learning behavior scale data, the samples are labeled, and the undivided learning styles are supplemented.
Further, determining the learning style of the samples from the Kolb learning style scale data and the online learning behavior scale data of the sample points, labeling them, and supplementing the undivided learning styles specifically comprises:
if more than 50% of the selected sample points are determined to have the same learning style, that learning style is taken as the intrinsic meaning of the cluster; if no learning style exceeds 50%, the threshold is expanded by 1 until the learning style can be successfully divided; finally, the learning styles that could not be divided are supplemented.
In step S3, a correlation test is performed on the learning style label obtained in step S2 and the online learning behavior characteristics collected by the online learning platform through a Spearman correlation coefficient.
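As an illustration of the correlation test in step S3, the following is a minimal sketch of the Spearman correlation coefficient written with the standard library only. The function names `average_ranks` and `spearman` are hypothetical, and the sketch assumes that the learning style labels have been encoded as numbers before the test; the source does not state how the labels are encoded.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one input is constant
    return cov / (sx * sy)
```

A feature would be kept for training when the absolute value of its coefficient with the label is large enough; the cut-off itself is not given in the source.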
In step S4, the specific steps of training using the stacked machine learning model include:
41) preprocessing behavior data generated by a learner in an online teaching platform;
42) building a stack machine learning model by combining the learning style labeling result obtained in the step S2;
43) performing model training and parameter tuning on the constructed stacked machine learning model to obtain a complete stacking model.
The stacked machine learning model is a two-layer fusion model. The first layer comprises four base classifiers, namely a random forest, a gradient boosting decision tree, a support vector machine and a multilayer perceptron, and takes the original training set and test set as input. The second layer is a logistic regression model, which takes the outputs of the first-layer base classifiers as input and is retrained with the training set to obtain the complete stacking model.
Further, before step 43), the samples are resampled with the SMOTE algorithm.
In step S5, the overall performance of the trained stacking model is evaluated using accuracy, recall, precision, F1 score and area under the curve, and the stacking model is optimized accordingly.
Compared with the prior art, the learning style identification method based on the fusion label and the stacking machine learning model provided by the invention at least has the following beneficial effects:
1) the invention provides a method that identifies learning style dynamically: the learner's learning style label is obtained by processing the two scales filled in by the learner, and the learning style is then predicted from the data the student generates on the online teaching platform; the processed data is large in volume and its structure is a second-order tensor, which reduces the difficulty of model training and the time required for identification;
2) the invention adopts learning style clustering based on a fusion label: fusing the labeling results of two scales mitigates the high subjectivity of a single scale, and the Spearman correlation coefficient is used to analyze the correlation between the learning behavior features on the online learning platform and the learning style labels divided by LSDM, by the Kolb learning style scale and by the online learning behavior scale, so that the identification result is more objective and accurate;
3) the invention constructs a two-layer stacking model with a resampling technique, predicts the learning style by integrating the learner's learning behavior data on the online teaching platform, mitigates the poor prediction performance caused by class imbalance, and achieves higher identification accuracy than traditional machine learning methods;
4) the invention is based on machine learning models; the data involved is large in volume and structured as a second-order tensor, and all features used in the experiment are interpretable; that is, the deep learning models of the prior art are not used, which avoids the loss of interpretability that deep learning entails.
Drawings
FIG. 1 is a schematic flow chart of a learning style identification method based on fusion labels and a stacked machine learning model in an embodiment;
FIG. 2 is a schematic block diagram of a learning style identification method based on fusion labels and a stacked machine learning model according to an embodiment;
FIG. 3 is a general framework diagram of a learning style label division method constructed in the embodiment of the present invention;
FIG. 4 is a schematic diagram of a cluster meaning determination method constructed in the present invention in the embodiment;
FIG. 5 is a schematic diagram of an online learning behavior scale constructed by the present invention in an example;
FIG. 6 is a general framework diagram of a learning style prediction method constructed by the present invention in the embodiment.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Examples
The invention addresses the application field of learning style identification: with the method based on the fusion label and the stacked machine learning model, the learning style of a learner can be identified implicitly and dynamically during online learning. According to Kolb's learning style theory, learners' learning styles are divided into 4 types: diverging, converging, accommodating and assimilating. Because learning style changes with the learner's age, cognitive level, environment and other factors, the static, scale-based way of obtaining learning style is unreliable. In addition, the learning style labels of the collected samples cannot be guaranteed to be balanced; to preserve identification accuracy, a resampling technique is used to mitigate the sample imbalance while the stacked machine learning model identifies the learning style, so as to obtain a higher identification rate.
The basic flow of the learning style identification method based on the fusion label and the stacked machine learning model is shown in FIG. 1. The method has two main steps: obtaining the learner's learning style label, and dynamically identifying the learner's learning style. The details are as follows:
Step 1: Initially the learner's learning style is unknown. The labeling problem is solved with the learning style label division method provided by the invention (Learning Style Division Method, hereinafter LSDM), whose flow is shown in FIG. 3. In FIG. 3, several learners first fill in two scales to produce the corresponding scale data, and the learning style is then divided by a subjective method and an objective method. The subjective method is a learning style division method based on given rules; when several learning styles tie for the highest score, the rules cannot divide the learning style effectively, and the learning style under the subjective method is obtained by taking the intersection of the rule-based results of the two scales. The objective method is a learning style division method based on a clustering algorithm: the best clustering algorithm is selected according to the silhouette coefficient (Silhouette score), and the meaning of each resulting cluster is then determined by the expert-labeling-based method (EAM), which gives the learning style under the objective method. Finally, the division results of the subjective and objective methods are intersected to obtain the final learning style label. By fusing the rule-based labeling results of two scales with the labeling results of a clustering algorithm, LSDM reduces the subjectivity that arises when labels come from a single scale. The learner needs to fill in two scales: the Kolb learning style scale and the online learning behavior survey scale (as shown in FIG. 4). The design of the online learning behavior scale and the method of fusing the learning style labels are described in turn below.
1) Design idea of the online learning behavior scale
Because learning style and learning ability are closely related, the invention designs a new online learning behavior scale as a supplement to the Kolb learning style scale. The scale contains 27 questions in total. Each preset question corresponds to a different learning ability of the student, and these abilities reflect the student's implicit learning behaviors. Each question is answered on a 5-point Likert scale (ranging from strongly agree to strongly disagree). Four basic abilities underlie the Kolb learning styles: concrete experience (solution, imagination, execution), action and application (attention, intent, independence), thinking and observation (interpretation, will, feedback), and abstraction and summarization (abstraction, logic, practice). For the scale design to be sound, each learning ability is measured by at least 3 questions. The design follows the principle of "ensuring the quality of the scale while minimizing the number of questions"; the scale itself is shown in FIG. 4.
2) Method for fusing learning style labels
The Kolb learning style scale has 12 questions in total. For each question the learner ranks the four options by how well they match, and the score of each learning style is then calculated with the given rule. The learning style score $LS_k$ is calculated as follows:
$$LS_k = \sum_{i=1}^{12} a_{i,j}$$
where $a_{i,j}$ is the score of the j-th option of the i-th question, option j being the one that corresponds to learning style k, and k denotes one of the four learning styles. The score of each learning style is thus the sum of the scores of the corresponding options over the 12 questions, and the learning style with the highest score is identified as the learner's learning style. When several learning styles share the same highest score, however, this approach cannot divide the learning style, for example when the diverging and accommodating scores are both 33 and larger than the scores of the other two styles. Calculating the learning style scores therefore yields a set of learners whose style is successfully divided and a set of learners whose style is not. The online learning behavior scale is scored in the same way, and the results of the two scales are intersected: when both scales successfully divide the learner into the same learning style, that style is taken as the learner's learning style at that time. The remaining, unsuccessfully divided learners are supplemented by the clustering-based learning style label division method.
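The rule-based scoring and the intersection of the two scales can be sketched as follows. This is an illustrative sketch, not the scoring key of the actual scales: the names `STYLES`, `style_scores`, `divide` and `fuse` are hypothetical, and the mapping of option j to the j-th entry of `STYLES` is an assumption made only for the example.

```python
from typing import Dict, List, Optional

# Assumed option order: option j of every question maps to STYLES[j].
STYLES = ["diverging", "assimilating", "converging", "accommodating"]


def style_scores(answers: List[List[int]]) -> Dict[str, int]:
    """Sum a[i][j] over all questions i for the option j tied to each style."""
    return {style: sum(row[j] for row in answers) for j, style in enumerate(STYLES)}


def divide(scores: Dict[str, int]) -> Optional[str]:
    """Return the unique top-scoring style, or None when the top score is tied."""
    best = max(scores.values())
    winners = [style for style, value in scores.items() if value == best]
    return winners[0] if len(winners) == 1 else None


def fuse(kolb_style: Optional[str], behavior_style: Optional[str]) -> Optional[str]:
    """Intersection of the two scale results: keep a label only if both agree."""
    if kolb_style is not None and kolb_style == behavior_style:
        return kolb_style
    return None
```

A learner for whom `fuse` returns `None` belongs to the unsuccessfully divided set and is passed on to the clustering-based labeling.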
A large number of features can harm the clustering result, so before clustering a linear transformation is used to reduce the feature dimension of the online learning behavior scale data while preserving the original meaning of the data. Principal component analysis is chosen as the dimension reduction algorithm; preferably, the original scale data has 27 dimensions, and the clustering effect is best when it is reduced to 24 dimensions. The reduced data is clustered with K-Means++ to obtain four clusters, and an expert-labeling-based cluster meaning determination method (EAM) is then constructed, as shown in FIG. 5. The centroid is a value that the K-Means++ algorithm adjusts continuously during iteration, so the position of each cluster's centroid is known once clustering is complete. Suppose the data is divided into clusters $C = \{c_1, c_2, ..., c_m\}$, where m is the number of clusters; the goal of K-Means++ is to minimize the squared error E:
$$E = \sum_{n=1}^{m} \sum_{x \in c_n} \lVert x - u_n \rVert_2^2$$
where $u_n$ is the centroid of cluster $c_n$, $x$ is a sample in cluster $c_n$, and $\lVert \cdot \rVert_2$ denotes the Euclidean distance between two vectors. The centroid is calculated as follows:
$$u_n = \frac{1}{|c_n|} \sum_{x \in c_n} x$$
After the centroid of a cluster is obtained, the Euclidean distances from the remaining points to the centroid are calculated. An initial distance threshold of 5 is then set, that is, the 5 sample points in the cluster closest to the centroid are selected, and experts determine the learning style of each of these 5 sample points from its Kolb learning style scale data and online learning behavior scale data. If the experts determine that more than half of the selected samples have the same learning style, the intrinsic meaning of the cluster is determined to be that learning style; if this condition is not satisfied, the centroid selection method expands the threshold by 1 each time until the learning style can be successfully divided. This procedure is called EAM, see FIG. 5. Finally, the learning styles divided in this way supplement those that could not be divided by the rule-based approach.
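The expanding-threshold majority vote of EAM can be sketched as follows for a single cluster. The sketch assumes that the expert judgments are already available as one label per sample; the names `centroid` and `cluster_meaning` and the parameter `start` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Mean of the points of one (non-empty) cluster, per dimension."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def cluster_meaning(
    points: Sequence[Sequence[float]],
    expert_labels: Sequence[str],
    start: int = 5,
) -> Optional[str]:
    """Return the style held by more than half of the t samples nearest the
    centroid, starting at t = start and growing t by 1 until a majority exists.
    Returns None if no threshold up to the cluster size yields a majority."""
    c = centroid(points)
    # Indices of the samples sorted by Euclidean distance to the centroid.
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], c))
    for t in range(min(start, len(points)), len(points) + 1):
        label, count = Counter(expert_labels[i] for i in order[:t]).most_common(1)[0]
        if count > t / 2:
            return label
    return None
```

The strict comparison `count > t / 2` mirrors the "more than half" condition in the text; a tie at exactly half triggers another expansion.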
Step 2: Based on the learning style labels obtained in step 1, suitable features are constructed from the data collected by the online learning platform, a correlation test between the features and the labels is performed with the Spearman correlation coefficient, suitable training and test data are selected, and the stacked machine learning model is trained. The structure and training process of the model are shown in FIG. 6. In FIG. 6, the behavior data generated by learners on the online teaching platform is first preprocessed and combined with the learning style labeling results of LSDM, and SMOTE is used to address sample imbalance. The stacked machine learning model (hereinafter SMLM) has two layers, comprising 4 base classifiers and 1 regressor; FIG. 6 also shows its training method in detail. Stacking is a model fusion strategy in ensemble learning that improves overall performance by fusing several single models. SMLM is a two-layer fusion model. The first layer consists of four base classifiers: random forest (RF), gradient boosting decision tree (GBDT), support vector machine (SVM) and multilayer perceptron (MLP); its inputs are the original training set and test set. The logistic regression (LR) model in the second layer takes the outputs of the first-layer base classifiers as input and is retrained with the training set, which yields the complete stacking model. The whole process comprises data preprocessing, model training and parameter tuning, and model evaluation. Specifically:
1) data pre-processing
This part includes log processing, data normalization and resampling. First, the logs are the interaction records that students leave on the online teaching platform. The log records (annotated with behavior labels) are processed according to specified rules to obtain the students' online learning behavior features, and the log data is converted into statistics based on duration and count, which reduces the difficulty of training the stacking model. Then, because the online learning behavior features have different units and ranges, each feature is normalized by the following formula:
$$y^* = \frac{y_a - y_{mean}}{y_{std}}$$
where y is an online learning behavior feature vector, $y_a$ is a sample in y, $y^*$ is its normalized value, $y_{mean}$ is the mean of the samples $y_a$ in y, and $y_{std}$ is the standard deviation of the samples $y_a$ in y.
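A minimal sketch of this per-feature normalization, using only the standard library. The function name `standardize` is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the source does not say which is used.

```python
import statistics
from typing import List, Sequence


def standardize(feature: Sequence[float]) -> List[float]:
    """Subtract the feature mean and divide by the feature standard deviation."""
    mean = statistics.fmean(feature)
    std = statistics.pstdev(feature)  # population standard deviation (assumed)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in feature]
    return [(value - mean) / std for value in feature]
```

Each behavior feature (for example a duration or a count) would be passed through `standardize` separately, so that features measured in different units become comparable.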
In addition, whether to resample is decided according to the distribution of the learning styles obtained by LSDM. The SMOTE algorithm synthesizes new examples by analyzing the distribution of the minority-class samples, which reduces the overfitting caused by random oversampling, where samples are simply duplicated. Specifically, for each sample $y'$ in the minority class, the Euclidean distance from it to all other samples in the minority-class sample set is computed to obtain its n nearest neighbors; a randomly selected neighbor $\hat{y}'$ is then combined with the original sample to construct a new sample by the following formula:
$$y'_{new} = y' + rand(0,1) \times (\hat{y}' - y')$$
where $y'$ is an online learning behavior record sample, $\hat{y}'$ is the selected neighbor, $y'_{new}$ is the newly generated sample, and rand(0,1) is a random number between 0 and 1.
These synthetic samples are added to the training set of the original data set to mitigate the class imbalance problem. Finally, the data set is divided into a training set (70%) and a test set (30%).
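The synthesis of one SMOTE sample can be sketched as follows. This is a simplified illustration written with the standard library, not a replacement for a maintained SMOTE implementation; the function name `smote_sample` and its parameters are hypothetical, and the minority class is assumed to contain at least two samples.

```python
import math
import random
from typing import List, Sequence


def smote_sample(
    minority: Sequence[Sequence[float]],
    n_neighbors: int,
    rng: random.Random,
) -> List[float]:
    """Create one synthetic sample on the segment between a randomly chosen
    minority sample and one of its n nearest minority-class neighbors."""
    base = rng.choice(minority)
    # Nearest neighbors of the base sample by Euclidean distance.
    others = [p for p in minority if p is not base]
    neighbors = sorted(others, key=lambda p: math.dist(p, base))[:n_neighbors]
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # plays the role of rand(0,1)
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]
```

Calling `smote_sample` repeatedly until the minority class reaches the desired size gives the resampled training data; how many samples to synthesize per class is not specified in the source.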
2) Training and tuning of models
The model training process is visualized in FIG. 6. Specifically, the preprocessed data is first used to train the stacking model with a 5-fold cross-validation strategy: the model divides the input training set into five subsets $train_i$, $i \in [1,5]$. Each base classifier of the first layer is then handled as follows. In fold i of the 5-fold cross-validation, subset i serves as the validation set and the remaining four subsets j, $j \in [1,5] \land j \neq i$, serve as the training set. The classifier is trained on this data and predicts the validation set; the trained model also predicts the test set. Each base classifier of the first layer thus produces five different predictions. $P_{n1}$ is obtained by stacking the five validation-set predictions vertically, and $P_{n2}$ is obtained by averaging the five test-set predictions. Finally, $P_{n1}$ with its corresponding labels is used as the training set and $P_{n2}$ with its corresponding labels as the test set, and the logistic regression meta-classifier performs the final classification to give the final prediction result.
The parameters of the model are tuned one by one. For every single classifier other than the first-layer multilayer perceptron, the optimal parameters for the current data set are searched by combining coarse tuning with random search and fine tuning with grid search; the multilayer perceptron optimizes its network parameters through the back-propagation algorithm, so as to improve the prediction performance of the model.
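The coarse-then-fine search can be illustrated with a small sketch. The objective `validation_score` is a hypothetical stand-in for a cross-validated score, and the two parameters `depth` and `rate` are illustrative names, not parameters named in the text.

```python
import random


def validation_score(depth, rate):
    """Hypothetical stand-in for a cross-validated score of one classifier."""
    return 1.0 - 0.01 * (depth - 6) ** 2 - 4.0 * (rate - 0.1) ** 2


rng = random.Random(42)

# Coarse stage: random search over wide parameter ranges.
coarse_trials = [(rng.randint(1, 20), rng.uniform(0.01, 0.5)) for _ in range(30)]
coarse_depth, coarse_rate = max(coarse_trials, key=lambda p: validation_score(*p))

# Fine stage: exhaustive grid search in a small neighbourhood of the coarse optimum.
depth_grid = [d for d in range(coarse_depth - 2, coarse_depth + 3) if d >= 1]
rate_grid = [max(0.01, coarse_rate + step * 0.02) for step in range(-3, 4)]
fine_trials = [(d, r) for d in depth_grid for r in rate_grid]
best_depth, best_rate = max(fine_trials, key=lambda p: validation_score(*p))
```

The fine grid contains the coarse optimum itself, so the second stage can only keep or improve the score found by the first stage.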
3) Evaluation of the model
The trained stacking classification model is evaluated using accuracy, recall, precision, F1 score, and area under the curve (AUC).
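As a minimal sketch of how the first four metrics are computed for a multi-class problem, the following function derives accuracy and per-class precision, recall, and F1 score from true and predicted labels. The function name `evaluate` and the integer labels 0 to 3 are assumptions for illustration; AUC is omitted because it requires predicted probabilities rather than hard labels.

```python
def evaluate(y_true, y_pred):
    """Return overall accuracy and per-class (precision, recall, F1)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = (precision, recall, f1)
    return accuracy, per_class


# Hypothetical labels: 0..3 stand for the four learning styles.
y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 1, 1, 1, 2, 3]
accuracy, per_class = evaluate(y_true, y_pred)
```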
To demonstrate the effectiveness of the method of the present invention, this example carried out a practical experiment using the learning data of students of one university on an online teaching platform, covering students from 4 grades and 18 colleges. The data include students with as many different learning backgrounds as possible. Because students' learning styles change dynamically, the scale data and learning platform data collected in the experiment come from a continuous teaching period, so as to ensure the timeliness of the research. The online learning data of the students cover the period from 3/1/2020 to 7/1/2020. When the scale data were sorted, it was found that some students had filled in the scales repeatedly, some had not filled them in completely, and some could not be matched on the teaching platform. According to psychological studies, a person's first impression when filling out a scale is the valid one, so for students who filled in the scales repeatedly only the first submission is kept; in the latter two cases the students' information is deleted. The final number of valid students is 2056, and 4 learning styles are considered in the experiment: diverging (divergent), converging (concentrated), accommodating (compliant), and assimilating.
The learning style identification method based on the fusion label and the stacking machine learning model comprises the following specific steps:
I. Learning style labeling: the learning styles corresponding to the two scales are calculated using the two scoring rules, and their intersection is taken to obtain the divided and undivided learning styles; the online learning behavior scale data are then clustered to determine the learning styles and to supplement the undivided learning styles.
1.1, the two scales are issued to the students of the university to answer. After the data are collected, the learning style scores for the Kolb learning style scale are calculated using the following formula:
where $a_{i,j}$ represents the score of the j-th option of the i-th question, and k represents one of the four learning approaches. The score for each learning style is the sum of the scores of the corresponding options over the 12 questions, which yields a set of successfully divided learning styles and a set of unsuccessfully divided learning styles. For the online learning behavior scale, the learning style scores are calculated using the following formula:
where $w_i$ represents the weight of the i-th question set by the researcher. The learning style with the highest score is determined as the learning style of the student; this also yields a set of successfully divided learning styles and a set of unsuccessfully divided learning styles. Taking the intersection of the two results gives $LS_{know}$ and $LS_{unknow}$.
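The scoring and intersection logic of step 1.1 can be sketched as follows. Several points are assumptions of this sketch rather than statements from the text: the answer layout `answers[i][k]`, treating a tied maximum as an unsuccessful division, and placing a student in $LS_{know}$ only when both scales assign the same style. The weighted formula for the online learning behavior scale is not reproduced here, so its resulting styles are passed in directly.

```python
N_STYLES = 4  # styles are indexed 0..3 in this sketch


def kolb_scores(answers):
    """Sum, for each style k, the score of its option over all questions.

    answers[i][k] is the score given to the option of question i that
    corresponds to style k (an assumed layout).
    """
    return [sum(row[k] for row in answers) for k in range(N_STYLES)]


def assign_style(scores):
    """Return the index of the top-scoring style, or None when the maximum is tied."""
    top = max(scores)
    winners = [k for k, s in enumerate(scores) if s == top]
    return winners[0] if len(winners) == 1 else None


def fuse_labels(kolb_styles, behavior_styles):
    """Split students into LS_know (both scales agree) and LS_unknow (the rest)."""
    ls_know, ls_unknow = {}, []
    for student, style in kolb_styles.items():
        if style is not None and style == behavior_styles.get(student):
            ls_know[student] = style
        else:
            ls_unknow.append(student)
    return ls_know, ls_unknow


# Toy answers to the 12 Kolb questions for three hypothetical students.
kolb_answers = {
    "s1": [[4, 3, 2, 1]] * 12,
    "s2": [[1, 2, 3, 4]] * 6 + [[4, 3, 2, 1]] * 6,
    "s3": [[1, 4, 2, 3]] * 12,
}
kolb_styles = {s: assign_style(kolb_scores(a)) for s, a in kolb_answers.items()}
# Styles assumed to come from the online learning behavior scale.
behavior_styles = {"s1": 0, "s2": 2, "s3": 3}
ls_know, ls_unknow = fuse_labels(kolb_styles, behavior_styles)
```

In this toy run only the student on whom both scales agree receives a fused label; the other two remain unlabeled and would be handled by the clustering step.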
1.2, four dimensionality reduction methods, namely low variance filtering, Pearson correlation coefficient, factor analysis, and principal component analysis, are compared on the online learning behavior scale data; principal component analysis gives the best dimensionality reduction on this data set. Subsequently, the clustering effects of K-Means++, Birch, Agglomerative, and K-Means are compared, using the Silhouette index, the Calinski-Harabasz index, the Davies-Bouldin index, and the cluster balance index as evaluation indices. The cluster balance index measures the fluctuation of the cluster sizes, because imbalanced cluster sizes lead to imbalanced learning style categories; the specific formula is as follows:
$$CBI = \mathrm{std}\langle \mathrm{count}(c_1) \mid \dots \mid \mathrm{count}(c_4) \rangle$$
where CBI represents the cluster balance index, $\mathrm{std}\langle * \mid \dots \mid * \rangle$ represents the standard deviation of all values in the set, and $\mathrm{count}(c_k)$ represents the number of samples belonging to the same cluster $c_k$.
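A minimal sketch of the cluster balance index follows. The text does not say whether the population or the sample standard deviation is meant; the population form (`statistics.pstdev`) is assumed here, and the function name and toy label lists are illustrative.

```python
from collections import Counter
from statistics import pstdev


def cluster_balance_index(cluster_labels, n_clusters=4):
    """Standard deviation of the cluster sizes; lower means more balanced."""
    counts = Counter(cluster_labels)
    # count(c_k): number of samples assigned to cluster k (0 if the cluster is empty).
    sizes = [counts.get(k, 0) for k in range(n_clusters)]
    return pstdev(sizes)


balanced = [0, 1, 2, 3] * 25                      # 25 samples in each cluster
skewed = [0] * 70 + [1] * 10 + [2] * 10 + [3] * 10
cbi_balanced = cluster_balance_index(balanced)
cbi_skewed = cluster_balance_index(skewed)
```

A perfectly balanced clustering gives a CBI of zero, and the index grows as the cluster sizes diverge, so a lower value is preferable when comparing clustering algorithms.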
After comprehensive evaluation, K-Means++ has the best clustering effect and best satisfies the requirement that the inter-cluster distance be as large as possible and the intra-cluster distance as small as possible. EAM is then used to determine the meaning of the 4 clusters obtained by K-Means++, with 5 sample points around the centroid selected as the threshold, and the learning style described by each cluster is obtained through expert labeling. Finally, the learning style distribution of the 2056 students is obtained by taking the intersection again: accommodating (compliant) 32.93%, diverging (divergent) 19.94%, assimilating 10.51%, and converging (concentrated) 36.62%.
II. Dynamic prediction of learning style: first, the Spearman correlation coefficient is used to test the correlation between the learning style labels obtained in step one and the features constructed from the data collected on the online learning platform; then suitable training and test data are selected, the stacked machine learning model is trained, and a model with reasonable parameters is constructed.
2.1, first, the log records are processed according to specified rules (labeling behavior labels) to obtain the online learning behavior features of the students; at this point the online learning behavior log data are converted into statistical data based on duration and counts, which reduces the training difficulty of the stacking model. The Spearman correlation coefficient is then used to analyze the correlation between the learning behavior features on the online learning platform and the learning style labels produced by LSDM, by the Kolb learning style scale, and by the online learning behavior scale, respectively. That is, Spearman correlation analysis is performed between the learning behavior feature x and each label Y*: the label Y obtained by LSDM, the Kolb scale label Y2, and the online learning behavior scale label Y3.
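The conversion of log records into count-based and duration-based statistics can be sketched as below. The record layout `(student_id, behavior_label, duration_seconds)` and the behavior names are hypothetical; the actual log schema of the platform is not given in the text.

```python
from collections import defaultdict

# Hypothetical log records: (student_id, behavior_label, duration_seconds).
log_records = [
    ("s1", "watch_video", 300),
    ("s1", "watch_video", 120),
    ("s1", "join_discussion", 0),
    ("s2", "watch_video", 600),
]


def aggregate_features(records):
    """Turn raw log records into per-student count and total-duration features."""
    features = defaultdict(lambda: defaultdict(float))
    for student, behavior, duration in records:
        features[student][behavior + "_count"] += 1
        features[student][behavior + "_duration"] += duration
    # Convert nested defaultdicts into plain dictionaries.
    return {student: dict(values) for student, values in features.items()}


student_features = aggregate_features(log_records)
```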
Taking the learning behavior feature x and the label Y obtained by LSDM as an example, the Spearman correlation coefficient ρ is calculated as follows:
$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where $d_i$ represents the difference between the ranks of the i-th pair of values after the feature and the label have each been sorted separately, and n represents the number of samples. In the absence of tied ranks, ρ equals the Pearson correlation coefficient computed on the ranks of the two variables.
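The rank-difference form of the Spearman coefficient can be checked with a short sketch. The helper names and the toy data are assumptions; the closed-form expression is exact only when there are no tied ranks, so for a categorical label with many ties a tie-corrected computation would be preferable in practice.

```python
def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            result[i] = average_rank
        pos = end + 1
    return result


def spearman_rho(x, y):
    """Spearman coefficient from squared rank differences (no-ties formula)."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Toy data: a behaviour feature and a numeric score for five students.
feature = [10, 20, 30, 40, 50]
score = [1.0, 3.0, 2.0, 5.0, 4.0]
rho = spearman_rho(feature, score)
```

For the toy data the rank differences are 0, -1, 1, -1, 1, so the sum of squares is 4 and ρ = 1 - 24/120 = 0.8.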
The correlation results then show that the LSDM-based learning style classification method can obtain 4 significantly correlated indexes (the number is higher than the Kolb learning style scale and the online learning behavior scale), which indicates that the LSDM method is more reasonable in obtaining the learning style. Although correlation coefficients are valuable in identifying important indicators, the relationship between data is very complex. The correlation coefficient describing how two variables change together cannot directly determine the learning style of the student.
2.2, next comes the data preprocessing. Because the online learning behavior features have different structures, they are standardized using the following formula:
Further, because the learning styles obtained by LSDM are unevenly distributed, a resampling method is needed. After comparing the effects of 5 resampling techniques (random undersampling, cluster centroid sampling, random oversampling, Borderline-SMOTE, and SMOTE), SMOTE is selected as the resampling method best suited to these samples. Specifically, for each sample y' in the minority class, the Euclidean distance from y' to all other samples in the minority-class sample set is computed to obtain its k nearest neighbors; a randomly selected neighboring sample $\tilde{y}'$ is then used together with the original sample to construct a new sample according to the following formula:
$$y'_{new} = y' + \mathrm{rand}(0,1) \times (\tilde{y}' - y')$$
where $y'_{new}$ is the newly generated sample and rand(0,1) is a random number between 0 and 1. These synthetic data are added to the training set of the original data set to address the class imbalance problem. Finally, the data set is divided into a training set and a test set in a 70%/30% ratio.
2.3, next, a two-layer stacking machine learning fusion model is constructed. The first layer consists of four basic classifiers (random forest, gradient boosting decision tree, support vector machine, and multilayer perceptron), and the second layer is logistic regression. The inputs are the original training set and test set, and the outputs of the first-layer basic classifiers are added to the training set as inputs for retraining. Specifically, the preprocessed data are first trained in the stacking model using a 5-fold cross-validation strategy, which divides the input training set into five subsets $train_i$, $i \in [1,5]$. Each basic classifier of the first layer is then processed as follows. In the 5-fold cross-validation, fold i uses subset $train_i$ as the validation set and the remaining four subsets $train_j$, $j \in [1,5] \wedge j \neq i$, as the training set. The classifier is trained on this data, and the trained model is used to predict both the validation set and the test set. At this point, each basic classifier of the first layer has obtained five different predictions. $P_{n1}$ is then obtained by vertically stacking the five validation-set predictions, and $P_{n2}$ is obtained by averaging the five test-set predictions. Finally, $P_{n1}$ with its corresponding labels is used as the training set and $P_{n2}$ with its corresponding labels is used as the test set; the meta classifier, logistic regression, performs the final classification and yields the final prediction result.
2.4, subsequently, to further evaluate the performance of the proposed model, it is compared with the following baseline machine learning methods: k-nearest neighbor classification (KNN), Gaussian naive Bayes (Gaussian NB), Bernoulli naive Bayes (Bernoulli NB), entropy-based decision trees, Gini-based decision trees, support vector machines (SVM), random forests (RF), AdaBoost, stochastic gradient descent classifiers (SGD), Bagging with random forests (Bagging), extremely randomized trees (ET), gradient boosting decision trees (GBDT), voting methods, and multilayer perceptron (MLP). All of these methods are trained on the data processed by SMOTE. The basic classifiers used in the voting method are the same as the first-layer classifiers of the stacked machine learning model, with the same parameter settings. The remaining methods all use the corresponding default parameters in sklearn. The proposed stacked machine learning model has better prediction performance than the baseline machine learning methods.
2.5, the parameters of the model are tuned one by one. For every single classifier other than the first-layer multilayer perceptron, the optimal parameters for the current data set are searched by combining coarse tuning with random search and fine tuning with grid search; the multilayer perceptron optimizes its network parameters through the back-propagation algorithm, so as to improve the prediction performance of the model.
2.6, finally, the trained stacking classification model is comprehensively evaluated using accuracy, recall, precision, F1 score, and area under the curve (AUC) to obtain the optimal model, and online learning behavior data are input into the model for identification. The online learning behavior data may include the number of video views, the video watching duration, the number of discussion participations, the number of hand raises, and the like. Compared with traditional machine learning methods under the same conditions, the model of the invention produces results with higher accuracy.
In conclusion, the learning style prediction model constructed by the invention is based on 5 machine learning classification models and a classical clustering algorithm, all of which are machine learning algorithms that have been tested in different scenarios; the overall framework for learner learning style identification is built on the characteristics and advantages of these algorithms. The Kolb learning style scale used to obtain the learning style labels has also been used in previous research and has been shown to have high reliability. The invention therefore makes full use of existing research results and provides a learner learning style identification method based on the fusion label and the stacking machine learning model, addressing problems that previous research did not consider: the excessive subjectivity of a single scale, the inability of static identification to follow the dynamic change of learning styles, and insufficient identification accuracy. The method uses the characteristics of the stacking classification model to reconstruct the data and extract important features, thereby removing redundant information from the data, and trains on the data in a fused manner; it also uses the SMOTE technique to mitigate the loss of accuracy caused by imbalanced samples. Finally, the model obtains accurate recognition results for the 4 learning styles: the recognition accuracy for the diverging (divergent), converging (concentrated), accommodating (compliant), and assimilating types can reach 97.6%, 96.3%, 96.1%, and 95.4%, respectively, so the method has practical application prospects.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and those skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.