TextCNN-based method and system for detecting students' online learning behaviors
1. A TextCNN-based method for detecting students' online learning behaviors, characterized in that the detection method comprises the following steps:
step S1, logging in to an online classroom and crawling students' posts as learning behavior data;
step S2, preprocessing the obtained data to form preprocessed data;
step S3, pre-training a Skip-gram model with the preprocessed data;
step S4, training a TextCNN classification model;
step S5, identifying the online learning behaviors of the students under test with the TextCNN classification model;
and step S6, calculating the course comment activity index and each student's final score from the identification results.
2. The TextCNN-based method for detecting students' online learning behaviors as claimed in claim 1, wherein in step S1 the students' learning behavior data are crawled by combining Selenium with a browser driver, specifically: first, entering an account and password to reach the comment area of the online classroom in a logged-in state, and obtaining the total number of comment pages from the tag of the last comment; second, constructing the URL of each comment page from its page number and browsing that page; and finally, extracting the comments and the corresponding user ids on each comment page according to the tags, and storing them as learning behavior data.
3. The TextCNN-based method for detecting students' online learning behaviors as claimed in claim 1, wherein step S2 comprises the following steps:
step A1, removing symbols from the crawled data, specifically: converting symbols in the Unicode text into spaces;
step A2, performing word segmentation on the symbol-free data with the open-source jieba segmenter;
step A3, removing stop words from the segmented data, i.e., filtering stop words appropriately to avoid recognition bias, which comprises loading a stop word list, looking up each word in the list, and removing the stop words found; the stop word list comprises Chinese modal particles, auxiliary words and words without obvious practical meaning;
step A4, establishing a labeling function label(x) for labeling the comment data processed by the above steps, where x is the data to be labeled; selecting from the unlabeled data, as label words, several words that occur most frequently and can clearly be judged as related or unrelated to the course, and dividing the label words into course-related words and course-unrelated words; labeling the comments that contain label words: a comment containing a course-related word is labeled 1, i.e., a valid comment, and a comment containing a course-unrelated word is labeled 0, i.e., an invalid comment; repeating these steps until all comments are labeled; if no label words can be found for the last remaining comments, traversing them manually one by one and labeling them according to expert experience; 80% of the labeled data are used as the training set and 20% as the test set.
4. The TextCNN-based method for detecting students' online learning behaviors as claimed in claim 3, wherein: the number of label words selected from the unlabeled data, i.e., the most frequent words that can clearly be judged as related or unrelated to the course, is ten; the course-related words include "exam", "unit test" and "assignment"; the course-unrelated words include "untidy" and "eat"; the stop word list includes "yes" and "o";
in step A1, symbols are removed by character replacement based on the range of common Chinese characters, u'\u4e00' to u'\u9fff': characters outside this range are replaced with spaces.
5. The TextCNN-based method for detecting students' online learning behaviors as claimed in claim 1, wherein the Skip-gram model in step S3 comprises an input layer, a hidden layer and an output layer;
the input $w_c$ of the input layer is obtained by one-hot encoding a word and has size $V \times 1$, where V is the number of words in the vocabulary; in the one-hot code, the dimension representing the word is 1 and all other dimensions are 0;
the weight matrix $W_1$ of the hidden layer is a $d \times V$ matrix, so the hidden-layer output $v_c = W_1 w_c$, of size $d \times 1$, is obtained;
the weight matrix $W_2$ of the output layer is a $V \times d$ matrix; applying the softmax function to $W_2 v_c$ yields a $V \times 1$ probability vector whose entries give, for each word in the vocabulary, the probability that it appears in the context of the input word;
the Skip-gram model predicts the context words of the input word; pre-training yields the hidden-layer weight matrix $W_1$, which serves as the word vector lookup table required by the subsequent steps, i.e., the weights of the TextCNN embedding layer (used in transposed form, $W_1^{T}$).
6. The TextCNN-based method for detecting students' online learning behaviors as claimed in claim 1, wherein in step S4 the TextCNN model comprises the following layers:
B1, embedding layer: its input is the preprocessed comment data represented in one-hot form, of size $n \times V$, where n is the maximum number of words in a sentence; its weight is the transpose $W_1^{T}$ of the hidden-layer weight obtained by pre-training in step S3, of size $V \times d$; multiplying the input by this weight gives the embedding-layer output $x_{1:n}$, of size $n \times d$;
B2, convolutional layer: applies several convolution kernels of different sizes to $x_{1:n}$ to extract text features; the convolution operation of each kernel can be expressed as:
$$y_i = g\left(w \cdot x_{i:i+h-1} + b\right), \quad 1 \le i \le n-h+1 \qquad \text{(I)}$$
where $w \in \mathbb{R}^{hd}$ is the weight vector of a kernel covering h consecutive words, $x_{i:i+h-1}$ denotes rows i to $i+h-1$ of the embedding-layer output, b is the bias, g is a nonlinear activation function, and $y_i$ is the i-th element of the feature vector obtained by convolution; each kernel is applied to every possible window of the sentence, $\{x_{1:h}, x_{2:h+1}, \ldots, x_{n-h+1:n}\}$, to generate the feature vector $y = [y_1, y_2, \ldots, y_{n-h+1}]$;
B3, pooling layer: applies max pooling to the outputs of the convolutional layer, selecting from the feature vector $y = [y_1, y_2, \ldots, y_{n-h+1}]$ generated by each kernel the maximum value $y' = \max\{y\}$, and combining all the maxima into a new feature vector;
B4, fully connected layer: takes the fixed-length feature vector output by the pooling layer as input, generates the binary classification result with a softmax function, and uses dropout and weight decay to prevent overfitting.
7. The TextCNN-based method for detecting students' online learning behaviors as claimed in claim 1, wherein step S5 specifically comprises: acquiring the learning behavior data of the students under test, including user ids and comment data; preprocessing the acquired data, the preprocessing comprising symbol removal, word segmentation and stop word removal; feeding the preprocessed comment data into the classification model; and finally recording the detection result of each comment in the data table under the corresponding user id.
8. The TextCNN-based method for detecting students' online learning behaviors as claimed in claim 1, wherein step S6 specifically comprises the following steps:
step C1, counting the number of valid comments $e_i$ of each student i ($i \in \mathbb{N}^+$) in the data table; setting the maximum number of comments per student to m, the criterion being that a student whose comments reach this upper limit and are all valid obtains the full discussion score; defining the number of students enrolled in the course in the current term as n, and introducing a comment-area activity index A to measure students' participation in the course, expressed as:
$$A = \frac{1}{n}\sum_{i=1}^{n}\frac{e_i}{m}$$
that is, the mean over all students of the ratio of the number of valid comments to the comment upper limit, which serves as one of the indicators for measuring the construction quality of the MOOC.
step C2, denoting the end-of-term exam score, unit test score, viewing duration score and course discussion score of student i as $s_i^{(1)}$, $s_i^{(2)}$, $s_i^{(3)}$ and $s_i^{(4)}$, respectively, with weights $w_1$, $w_2$, $w_3$ and $w_4$ in the total score; the objective scores $s_i^{(1)}$ and $s_i^{(2)}$ are given directly by the online classroom, while the viewing duration score $s_i^{(3)}$ is the ratio of student i's viewing time $t_i$ to the total video duration $T_{\text{total}}$, converted to a 100-point scale:
$$s_i^{(3)} = \frac{t_i}{T_{\text{total}}} \times 100$$
similarly, the course discussion score $s_i^{(4)}$ is the ratio of the number of valid comments to the comment upper limit m, on the same scale:
$$s_i^{(4)} = \frac{e_i}{m} \times 100$$
step C3, the total course score $S_i$ of student i is:
$$S_i = w_1 s_i^{(1)} + w_2 s_i^{(2)} + w_3 s_i^{(3)} + w_4 s_i^{(4)}$$
where $w_1 + w_2 + w_3 + w_4 = 1$, i.e., the course score consists of these four parts only; since $s_i^{(1)}$ to $s_i^{(4)}$ are all on a 100-point scale, the final total score $S_i$ is also on a 100-point scale.
9. The TextCNN-based method for detecting students' online learning behaviors as claimed in claim 1, wherein the online classroom is an online classroom of the China University MOOC website.
10. A TextCNN-based system for detecting students' online learning behaviors, which uses the TextCNN-based method for detecting students' online learning behaviors, characterized in that the system comprises a preprocessing module, a model training module and a learning behavior detection module, specifically:
A. preprocessing module: used for crawling and preprocessing students' learning behavior data and pre-training the Skip-gram model, comprising a data crawling module, a data preprocessing module and a Skip-gram pre-training module; it first crawls students' posts as learning behavior data, then preprocesses the crawled data, next uses the preprocessed learning behavior data $w_c$ as the pre-training input of the Skip-gram model, and finally passes the transpose $W_1^{T}$ of the pre-trained Skip-gram hidden-layer weight, as the embedding-layer weight of the model training module, together with the preprocessed learning behavior data $w_c$, to the model training module;
B. model training module: used for training the TextCNN model; specifically, the embedding layer takes the transpose $W_1^{T}$ of the Skip-gram hidden-layer weight obtained by the preprocessing module as its weight and the students' learning behavior data $w_c$ as input for word vectorization, the convolutional layer performs convolution to extract text features, the pooling layer performs max pooling, the fully connected layer produces the binary classification result, and training yields the classification model;
C. learning behavior detection module: used for detecting students' learning behaviors to obtain the course comment activity index and each student's final score, comprising a data preprocessing module, a data classification module and a calculation module; it preprocesses the data under test, identifies student behaviors with the classification model obtained by the model training module, recomputes the course discussion score from the identification results, and calculates and outputs the course comment activity index and the final score $S_i$ of each student i ($i \in \mathbb{N}^+$).
Background
Online courses became the main way for students to continue their education during the epidemic, and many high-quality courses have been released by universities on China's MOOC platforms. Course assessment covers unit tests, end-of-term exams, video viewing duration and course discussion. Course discussion is intended to develop students' independent thinking, encourage them to ask questions actively and communicate with teachers, and help course teams improve the course. However, when the number of comments posted is the only criterion for the course discussion score, students post comments unrelated to the course just to earn points, and the discussion degenerates into a mere formality. Moreover, to keep teacher-student interaction efficient, teaching assistants must spend a great deal of time identifying and deleting invalid comments. To measure course discussion scores and students' course participation by the number of valid comments, encourage students to learn and think carefully, improve MOOC learning outcomes and reduce the teaching assistants' workload, a method and system for classifying MOOC students' comments is urgently needed to support MOOC construction under the normalized epidemic situation.
Text classification is a task in natural language processing in which a computer maps sentences or texts carrying information to one of several given categories or topics. Traditional machine-learning text classifiers, such as naive Bayes and support vector machines, usually extract word-frequency and bag-of-words features before training; these methods extract text features insufficiently and ignore the semantics, word order and contextual associations of the text. Deep-learning text classification has strong feature extraction and text representation capabilities and can achieve better results. Deep-learning text classification models are mainly based on context mechanisms, memory mechanisms, word vectors and the like.
Deep-learning text classification models based on a context mechanism. Lai et al. proposed the recurrent convolutional neural network (RCNN) to address the problems that traditional models ignore word order and context and are structurally complex. Unlike traditional models, it uses a bidirectional recurrent neural network, which reduces noise, captures the context of the text and retains a wider range of word order, and it adopts a pooling layer to capture the key information of the text. The method combines the strengths of recurrent and convolutional neural networks and improves the generalization ability of the model.
Deep-learning text classification models based on a memory mechanism. To allow information to pass to deeper layers when processing long text sequences, researchers proposed long short-term memory networks (LSTM) and gated recurrent units (GRU). In an LSTM, the forget gate decides which information is retained or discarded, the input gate processes the input information, which is accumulated with the memory cell of the previous time step to obtain the current memory cell, and the output gate controls how much of the current memory cell is exposed as the hidden state passed on to the next time step. The GRU is very similar to the LSTM but has a simpler structure with fewer parameters, converges more easily, and has only two gates, an update gate and a reset gate. However, the LSTM performs better on large-scale datasets.
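The gate interplay described for the LSTM can be made concrete with one time step in NumPy. This is a minimal sketch of the standard LSTM cell, assuming the common convention of stacking the four gate blocks in a single weight matrix; the function names and that layout are illustrative choices, not part of the invention.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step (standard formulation, illustrative only).

    x_t: input of size X; h_prev, c_prev: previous hidden state and memory
    cell of size H; W: (4H, H + X) matrix stacking the forget, input,
    candidate and output parameters; b: bias of size 4H.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:H])                 # forget gate: how much of c_prev to keep
    i = sigmoid(z[H:2 * H])             # input gate: how much new content to write
    c_tilde = np.tanh(z[2 * H:3 * H])   # candidate content from the current input
    o = sigmoid(z[3 * H:4 * H])         # output gate: how much memory to expose
    c_t = f * c_prev + i * c_tilde      # accumulate into the current memory cell
    h_t = o * np.tanh(c_t)              # hidden state handed to the next step
    return h_t, c_t


# Run a random 5-step sequence through the cell (H = 3 hidden units, X = 4 inputs).
rng = np.random.default_rng(0)
H, X = 3, 4
W, b = rng.normal(size=(4 * H, H + X)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, X)):
    h, c = lstm_step(x_t, h, c, W, b)
```

With all weights at zero every gate outputs 0.5, so the cell keeps exactly half of the previous memory; this is an easy sanity check of the accumulation rule.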
Deep-learning classification models based on word vectors. To overcome the assumption of the traditional vector space model that feature items are independent of each other, texts are represented as word vectors, much like image and speech data, so that both the similarity of words and their positions in the text are taken into account. The predictive word vector method Word2vec has attracted attention; it provides two model structures, CBOW and Skip-gram. The CBOW model predicts the current word from its context, while the Skip-gram model predicts the context words from the current word. Applying word vectors to a CNN gives a simple structure, good classification performance, higher speed and improved accuracy.
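The difference between the two Word2vec structures shows up in how training examples are cut from a sentence. The sketch below builds both kinds of example with a symmetric context window; the window size and the English tokens (standing in for segmented Chinese words) are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: Skip-gram predicts each context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """(context words, center) examples: CBOW predicts the center word from its context."""
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, center))
    return examples


tokens = ["unit", "test", "is", "hard"]
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1))
```

Skip-gram turns one sentence into many (word, neighbour) pairs, which is one reason it tends to learn useful vectors for rarer words from small corpora such as a single course's comment area.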
TextCNN is a deep classification model based on word vectors; it captures sentence features effectively, improves text classification accuracy and adapts well to different tasks.
Disclosure of Invention
The invention provides a TextCNN-based method for detecting students' online learning behaviors, which identifies valid and invalid comments in the comment area of an online classroom and designs a new method for evaluating the course discussion score, thereby providing an auxiliary evaluation tool for online learning platforms, correcting students' learning behaviors, cultivating their independent thinking, and promoting MOOC construction under the normalized epidemic situation.
The invention adopts the following technical scheme.
A TextCNN-based method and system for detecting students' online learning behaviors are provided, wherein the detection method comprises the following steps:
step S1, logging in to an online classroom and crawling students' posts as learning behavior data;
step S2, preprocessing the obtained data to form preprocessed data;
step S3, pre-training a Skip-gram model with the preprocessed data;
step S4, training a TextCNN classification model;
step S5, identifying the online learning behaviors of the students under test with the TextCNN classification model;
and step S6, calculating the course comment activity index and each student's final score from the identification results.
In step S1, the students' learning behavior data are crawled by combining Selenium with a browser driver, specifically: first, entering an account and password to reach the comment area of the online classroom in a logged-in state, and obtaining the total number of comment pages from the tag of the last comment; second, constructing the URL of each comment page from its page number and browsing that page; and finally, extracting the comments and the corresponding user ids on each comment page according to the tags, and storing them as learning behavior data.
Step S2 comprises the following steps:
step A1, removing symbols from the crawled data, specifically: converting symbols in the Unicode text into spaces;
step A2, performing word segmentation on the symbol-free data with the open-source jieba segmenter;
step A3, removing stop words from the segmented data, i.e., filtering stop words appropriately to avoid recognition bias, which comprises loading a stop word list, looking up each word in the list, and removing the stop words found; the stop word list comprises Chinese modal particles, auxiliary words and words without obvious practical meaning;
step A4, establishing a labeling function label(x) for labeling the comment data processed by the above steps, where x is the data to be labeled; selecting from the unlabeled data, as label words, several words that occur most frequently and can clearly be judged as related or unrelated to the course, and dividing the label words into course-related words and course-unrelated words; labeling the comments that contain label words: a comment containing a course-related word is labeled 1, i.e., a valid comment, and a comment containing a course-unrelated word is labeled 0, i.e., an invalid comment; repeating these steps until all comments are labeled; if no label words can be found for the last remaining comments, traversing them manually one by one and labeling them according to expert experience; 80% of the labeled data are used as the training set and 20% as the test set.
The number of label words selected from the unlabeled data, i.e., the most frequent words that can clearly be judged as related or unrelated to the course, is ten; the course-related words include "exam", "unit test", "assignment" and the like; the course-unrelated words include "untidy", "eat" and the like; the stop word list includes "yes", "o" and the like;
in step A1, symbols are removed by character replacement based on the range of common Chinese characters, u'\u4e00' to u'\u9fff': characters outside this range are replaced with spaces.
The Skip-gram model in step S3 comprises an input layer, a hidden layer and an output layer;
the input $w_c$ of the input layer is obtained by one-hot encoding a word and has size $V \times 1$, where V is the number of words in the vocabulary; in the one-hot code, the dimension representing the word is 1 and all other dimensions are 0;
the weight matrix $W_1$ of the hidden layer is a $d \times V$ matrix, so the hidden-layer output $v_c = W_1 w_c$, of size $d \times 1$, is obtained;
the weight matrix $W_2$ of the output layer is a $V \times d$ matrix; applying the softmax function to $W_2 v_c$ yields a $V \times 1$ probability vector whose entries give, for each word in the vocabulary, the probability that it appears in the context of the input word;
the Skip-gram model predicts the context words of the input word; pre-training yields the hidden-layer weight matrix $W_1$, which serves as the word vector lookup table required by the subsequent steps, i.e., the weights of the TextCNN embedding layer (used in transposed form, $W_1^{T}$).
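The forward pass above can be checked numerically. This NumPy sketch uses random weights in place of trained ones and only shows the shapes and why $W_1$ acts as a lookup table; training (in practice usually done with a library such as gensim's `Word2Vec` with `sg=1`, which uses negative sampling or hierarchical softmax rather than the full softmax shown) is not included.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())   # shift by the max for numerical stability
    return e / e.sum()


def skipgram_forward(word_index, W1, W2):
    """Forward pass of the Skip-gram network described in step S3.

    W1: d x V hidden-layer weight; W2: V x d output-layer weight.
    Returns the hidden output v_c (d x 1) and the V x 1 context probabilities.
    """
    V = W1.shape[1]
    w_c = np.zeros((V, 1))
    w_c[word_index, 0] = 1.0       # one-hot input word, V x 1
    v_c = W1 @ w_c                 # hidden output, d x 1
    p = softmax(W2 @ v_c)          # probability that each word is a context word
    return v_c, p


rng = np.random.default_rng(0)
V, d = 6, 3
W1, W2 = rng.normal(size=(d, V)), rng.normal(size=(V, d))
v_c, p = skipgram_forward(2, W1, W2)
# Multiplying by a one-hot vector just selects column 2 of W1, which is why
# the trained W1 can serve as a word vector lookup table.
print(np.allclose(v_c[:, 0], W1[:, 2]))   # True
```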
In step S4, the TextCNN model comprises the following layers:
B1, embedding layer: its input is the preprocessed comment data represented in one-hot form, of size $n \times V$, where n is the maximum number of words in a sentence; its weight is the transpose $W_1^{T}$ of the hidden-layer weight obtained by pre-training in step S3, of size $V \times d$; multiplying the input by this weight gives the embedding-layer output $x_{1:n}$, of size $n \times d$;
B2, convolutional layer: applies several convolution kernels of different sizes to $x_{1:n}$ to extract text features; the convolution operation of each kernel can be expressed as:
$$y_i = g\left(w \cdot x_{i:i+h-1} + b\right), \quad 1 \le i \le n-h+1 \qquad \text{(I)}$$
where $w \in \mathbb{R}^{hd}$ is the weight vector of a kernel covering h consecutive words, $x_{i:i+h-1}$ denotes rows i to $i+h-1$ of the embedding-layer output, b is the bias, g is a nonlinear activation function, and $y_i$ is the i-th element of the feature vector obtained by convolution; each kernel is applied to every possible window of the sentence, $\{x_{1:h}, x_{2:h+1}, \ldots, x_{n-h+1:n}\}$, to generate the feature vector $y = [y_1, y_2, \ldots, y_{n-h+1}]$;
B3, pooling layer: applies max pooling to the outputs of the convolutional layer, selecting from the feature vector $y = [y_1, y_2, \ldots, y_{n-h+1}]$ generated by each kernel the maximum value $y' = \max\{y\}$, and combining all the maxima into a new feature vector;
B4, fully connected layer: takes the fixed-length feature vector output by the pooling layer as input, generates the binary classification result with a softmax function, and uses dropout and weight decay to prevent overfitting.
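Layers B1 to B4 can be traced end to end in a short NumPy sketch of the inference-time forward pass. Assumptions, all illustrative: g is ReLU, each kernel height has a single filter (practical TextCNNs use many filters per height), index 0 is treated as a padding token, and the weights are random rather than trained.

```python
import numpy as np


def textcnn_forward(token_ids, W1, kernels, W_fc, b_fc):
    """Inference-time forward pass through layers B1 to B4 (no dropout).

    token_ids : n word indices of one comment, padded so n >= every kernel height
    W1        : d x V Skip-gram hidden-layer weight; the embedding weight is W1.T
    kernels   : list of (w, b, h) with w of shape (h, d) and scalar bias b
    W_fc, b_fc: fully connected weights of shape (2, len(kernels)) and (2,)
    """
    V, n = W1.shape[1], len(token_ids)
    one_hot = np.zeros((n, V))
    one_hot[np.arange(n), token_ids] = 1.0
    x = one_hot @ W1.T                       # B1: n x d output x_{1:n}

    pooled = []
    for w, b, h in kernels:
        # B2, formula (I) with g = ReLU: y_i = g(w . x_{i:i+h-1} + b)
        y = np.array([max(0.0, float(np.sum(w * x[i:i + h])) + b)
                      for i in range(n - h + 1)])
        pooled.append(y.max())               # B3: y' = max{y}
    features = np.array(pooled)              # fixed length: one value per kernel

    logits = W_fc @ features + b_fc          # B4: fully connected layer
    e = np.exp(logits - logits.max())
    return e / e.sum()                       # softmax over (invalid, valid)


rng = np.random.default_rng(0)
V, d = 8, 4
W1 = rng.normal(size=(d, V))                 # stands in for pre-trained Skip-gram weights
kernels = [(rng.normal(size=(h, d)), 0.0, h) for h in (2, 3, 4)]
W_fc, b_fc = rng.normal(size=(2, 3)), np.zeros(2)
probs = textcnn_forward([1, 5, 2, 7, 0, 0], W1, kernels, W_fc, b_fc)
```

Max pooling is what makes the feature vector length depend only on the number of kernels, not on the comment length. During training, dropout would be applied to the pooled features before B4, and weight decay added via the loss or the optimizer (for example, the `weight_decay` argument of PyTorch optimizers); both are omitted because they do not affect inference.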
Step S5 specifically comprises: acquiring the learning behavior data of the students under test, including user ids and comment data; preprocessing the acquired data, the preprocessing comprising symbol removal, word segmentation and stop word removal; feeding the preprocessed comment data into the classification model; and finally recording the detection result of each comment in the data table under the corresponding user id.
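The bookkeeping in step S5 amounts to grouping per-comment predictions by user id. In this minimal sketch, `preprocess` and `classify` are hypothetical callables standing in for steps A1 to A3 and the trained TextCNN; the toy run uses a whitespace tokenizer and a keyword rule instead.

```python
from collections import defaultdict


def record_results(records, preprocess, classify):
    """Step S5: run every comment through preprocessing and the classifier,
    and file the 1 (valid) / 0 (invalid) result under the commenter's user id.

    records   : iterable of (user_id, comment_text) pairs from the crawler
    preprocess: callable text -> tokens (symbol removal, segmentation, stop words)
    classify  : callable tokens -> 0 or 1, e.g. a trained TextCNN plus argmax
    """
    table = defaultdict(list)
    for user_id, text in records:
        table[user_id].append(classify(preprocess(text)))
    return dict(table)


# Toy stand-ins: whitespace tokenizer and a keyword rule instead of TextCNN.
records = [("u1", "the exam is hard"), ("u1", "want to eat"), ("u2", "unit exam")]
table = record_results(records, str.split, lambda toks: int("exam" in toks))
print(table)   # {'u1': [1, 0], 'u2': [1]}
```

Summing each user's list then gives the valid-comment count $e_i$ used in step S6.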
Step S6 specifically comprises the following steps:
step C1, counting the number of valid comments $e_i$ of each student i ($i \in \mathbb{N}^+$) in the data table; setting the maximum number of comments per student to m, the criterion being that a student whose comments reach this upper limit and are all valid obtains the full discussion score; defining the number of students enrolled in the course in the current term as n, and introducing a comment-area activity index A to measure students' participation in the course, expressed as:
$$A = \frac{1}{n}\sum_{i=1}^{n}\frac{e_i}{m}$$
that is, the mean over all students of the ratio of the number of valid comments to the comment upper limit, which serves as one of the indicators for measuring the construction quality of the MOOC.
step C2, denoting the end-of-term exam score, unit test score, viewing duration score and course discussion score of student i as $s_i^{(1)}$, $s_i^{(2)}$, $s_i^{(3)}$ and $s_i^{(4)}$, respectively, with weights $w_1$, $w_2$, $w_3$ and $w_4$ in the total score; the objective scores $s_i^{(1)}$ and $s_i^{(2)}$ are given directly by the online classroom, while the viewing duration score $s_i^{(3)}$ is the ratio of student i's viewing time $t_i$ to the total video duration $T_{\text{total}}$, converted to a 100-point scale:
$$s_i^{(3)} = \frac{t_i}{T_{\text{total}}} \times 100$$
similarly, the course discussion score $s_i^{(4)}$ is the ratio of the number of valid comments to the comment upper limit m, on the same scale:
$$s_i^{(4)} = \frac{e_i}{m} \times 100$$
step C3, the total course score $S_i$ of student i is:
$$S_i = w_1 s_i^{(1)} + w_2 s_i^{(2)} + w_3 s_i^{(3)} + w_4 s_i^{(4)}$$
where $w_1 + w_2 + w_3 + w_4 = 1$, i.e., the course score consists of these four parts only; since $s_i^{(1)}$ to $s_i^{(4)}$ are all on a 100-point scale, the final total score $S_i$ is also on a 100-point scale.
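The scoring in steps C1 to C3 reduces to a few ratios and a weighted sum, sketched below with a worked example. The weights (0.4, 0.2, 0.2, 0.2), the comment limit m = 10 and all student numbers are hypothetical. The clamp `min(e, m)` is an added assumption reflecting that m is an upper limit; drop it if the platform already stops counting at m.

```python
def capped(e, m):
    # Assumption: valid comments beyond the upper limit m earn nothing extra.
    return min(e, m)


def activity_index(valid_counts, m):
    """Step C1: mean of e_i / m over all n enrolled students."""
    n = len(valid_counts)
    return sum(capped(e, m) / m for e in valid_counts) / n


def viewing_score(t_i, total):
    """Step C2: viewing time over total video length, on a 100-point scale."""
    return t_i / total * 100


def discussion_score(e_i, m):
    """Step C2: valid comments over the upper limit m, on a 100-point scale."""
    return capped(e_i, m) / m * 100


def final_score(scores, weights):
    """Step C3: weighted sum of the four 100-point part scores."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


m = 10
valid_counts = [10, 5, 0]            # e_i for three hypothetical students
A = activity_index(valid_counts, m)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
s3 = viewing_score(30, 60)           # 50.0
s4 = discussion_score(5, m)          # 50.0
S = final_score([80, 90, s3, s4], [0.4, 0.2, 0.2, 0.2])  # 32 + 18 + 10 + 10 = 70
```

Because every part score is on a 100-point scale and the weights sum to 1, the result stays on a 100-point scale without further normalization.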
The online classroom is an online classroom of the China University MOOC website.
A TextCNN-based system for detecting students' online learning behaviors uses the TextCNN-based method for detecting students' online learning behaviors and is characterized in that it comprises a preprocessing module, a model training module and a learning behavior detection module, specifically:
A. preprocessing module: used for crawling and preprocessing students' learning behavior data and pre-training the Skip-gram model, comprising a data crawling module, a data preprocessing module and a Skip-gram pre-training module; it first crawls students' posts as learning behavior data, then preprocesses the crawled data, next uses the preprocessed learning behavior data $w_c$ as the pre-training input of the Skip-gram model, and finally passes the transpose $W_1^{T}$ of the pre-trained Skip-gram hidden-layer weight, as the embedding-layer weight of the model training module, together with the preprocessed learning behavior data $w_c$, to the model training module;
B. model training module: used for training the TextCNN model; specifically, the embedding layer takes the transpose $W_1^{T}$ of the Skip-gram hidden-layer weight obtained by the preprocessing module as its weight and the students' learning behavior data $w_c$ as input for word vectorization, the convolutional layer performs convolution to extract text features, the pooling layer performs max pooling, the fully connected layer produces the binary classification result, and training yields the classification model;
C. learning behavior detection module: used for detecting students' learning behaviors to obtain the course comment activity index and each student's final score, comprising a data preprocessing module, a data classification module and a calculation module; it preprocesses the data under test, identifies student behaviors with the classification model obtained by the model training module, recomputes the course discussion score from the identification results, and calculates and outputs the course comment activity index and the final score $S_i$ of each student i ($i \in \mathbb{N}^+$).
The beneficial effects of the invention are as follows: by identifying valid and invalid comments and designing a new method for evaluating the course discussion score, the invention provides an auxiliary evaluation tool for online learning platforms, corrects students' learning behaviors, cultivates their independent thinking, and promotes MOOC construction for online education under the normalized epidemic situation.
Drawings
The invention is described in further detail below with reference to the following figures and detailed description:
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic flow chart of data crawling and labeling in the scheme of the invention;
FIG. 3 is a simplified diagram of the TextCNN model training process of the invention;
FIG. 4 is a schematic structural diagram of the system for detecting students' online learning behaviors of the invention.
Detailed Description
As shown in the figures, the TextCNN-based method for detecting students' online learning behaviors comprises the following steps:
step S1, logging in to an online classroom and crawling students' posts as learning behavior data;
step S2, preprocessing the obtained data to form preprocessed data;
step S3, pre-training a Skip-gram model with the preprocessed data;
step S4, training a TextCNN classification model;
step S5, identifying the online learning behaviors of the students under test with the TextCNN classification model;
and step S6, calculating the course comment activity index and each student's final score from the identification results.
In step S1, the students' learning behavior data are crawled by combining Selenium with a browser driver, specifically: first, entering an account and password to reach the comment area of the online classroom in a logged-in state, and obtaining the total number of comment pages from the tag of the last comment; second, constructing the URL of each comment page from its page number and browsing that page; and finally, extracting the comments and the corresponding user ids on each comment page according to the tags, and storing them as learning behavior data.
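The page loop and tag-based extraction of step S1 can be sketched without a live browser. Everything site-specific here is hypothetical: the URL pattern, the `<div class="comment" data-user-id="...">` markup, and the helper names; the real MOOC page structure is not given in the text, and login and total-page detection are omitted. `fetch(url)` must return page HTML; with Selenium it could be `def fetch(url): driver.get(url); return driver.page_source`.

```python
from html.parser import HTMLParser


class CommentParser(HTMLParser):
    """Collects (user_id, text) pairs from markup assumed (hypothetically) to look
    like <div class="comment" data-user-id="...">text</div>, with no nested divs."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._user = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "comment":
            self._user = attrs.get("data-user-id")
            self._chunks = []

    def handle_data(self, data):
        if self._user is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._user is not None:
            self.records.append((self._user, "".join(self._chunks).strip()))
            self._user = None


def crawl_comments(fetch, url_template, total_pages):
    """Visit comment pages 1..total_pages and gather (user_id, comment) records.
    url_template is a hypothetical pattern containing "{page}"."""
    records = []
    for page in range(1, total_pages + 1):
        parser = CommentParser()
        parser.feed(fetch(url_template.format(page=page)))
        records.extend(parser.records)
    return records


# Offline stand-in for the browser: a dict of fake pages at placeholder URLs.
pages = {
    "https://example.com/comments?page=1": '<div class="comment" data-user-id="u1">When is the unit test?</div>',
    "https://example.com/comments?page=2": '<div class="comment" data-user-id="u2">Going to eat</div>',
}
records = crawl_comments(lambda url: pages[url], "https://example.com/comments?page={page}", 2)
```

Keeping the browser behind a `fetch` callable separates the Selenium session (login, waiting for dynamic content) from the parsing logic, which can then be tested on saved pages.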
The step S2 includes the following steps;
step A1, removing symbols from the crawled data, specifically: converting symbols under Unicode coding into spaces;
step A2, performing word segmentation processing on the data without symbols, namely using the ji of the open sourceeThe ba word segmentation device carries out word segmentation on the data with the symbols removed;
step A3, removing stop words from the data after word segmentation, namely properly filtering the stop words to avoid recognition deviation, wherein the method comprises the steps of loading a stop word list, inquiring the stop word list one by one, and then removing the stop words; the inactive word list comprises Chinese language and qi words, auxiliary words and words without obvious actual meaning;
step A4, defining a labeling function label(x) for labeling the comment data processed by the above steps, where x is the data to be labeled: from the unlabeled data, a number of words that occur most frequently and clearly indicate whether a comment is related to the course are selected as label words and divided into course-related words and course-unrelated words; comments containing label words are labeled, a comment containing course-related words being marked 1 (valid) and a comment containing unrelated words being marked 0 (invalid); these steps are repeated until all comments are labeled; if no label words can be found in the last remaining comments, these comments are traversed manually one by one and labeled based on expert experience; 80% of the labeled data is used as the training set and 20% as the test set.
The number of label words screened from the unlabeled data (the most frequent words that clearly indicate whether a comment is related to the course) is ten; course-related words include 'examination', 'unit test', 'homework' and the like; course-unrelated words include 'untidy', 'eat' and the like; the stop-word list includes 'yes', 'o' and the like;
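A minimal sketch of the labeling function of step A4 and the 80/20 split, assuming comments are already tokenized. The word sets reuse the examples given in the text (in English translation, whereas the real label words are Chinese tokens); giving course-related words priority when a comment contains both kinds, and the fixed random seed, are assumptions.

```python
import random
from typing import List, Optional, Tuple

# Illustrative label words taken from the examples in the text; the real
# lists hold the most frequent unambiguous words screened from the data.
COURSE_WORDS = {"examination", "unit test", "homework"}
UNRELATED_WORDS = {"untidy", "eat"}


def label(x: List[str]) -> Optional[int]:
    """Label a tokenized comment: 1 = valid (course-related), 0 = invalid.

    Returns None when no label word occurs, meaning the comment is left for
    manual expert labeling. Checking course words first is an assumption.
    """
    tokens = set(x)
    if tokens & COURSE_WORDS:
        return 1
    if tokens & UNRELATED_WORDS:
        return 0
    return None


def split_80_20(samples: list, seed: int = 0) -> Tuple[list, list]:
    """Shuffle labeled samples and split them 80% / 20% into train / test."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]
```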
In step A1, symbols are removed by a character-replacement method based on the range of common Chinese characters, u'\u4e00' to u'\u9fff'.
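Steps A1 to A3 can be sketched as follows. The sketch assumes that "symbols" means every character outside the common Chinese range u'\u4e00' to u'\u9fff' (so digits and Latin letters are replaced by spaces as well); `jieba.lcut` is jieba's list-returning segmentation call, and the optional `tokenize` parameter is a hypothetical hook that lets the function run without jieba installed.

```python
import re
from typing import Callable, Iterable, List, Optional

# Assumption: every run of characters outside the common CJK range
# u'\u4e00'-u'\u9fff' counts as "symbols" and becomes one space (step A1).
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def remove_symbols(text: str) -> str:
    return NON_CHINESE.sub(" ", text).strip()


def preprocess(text: str, stopwords: Iterable[str],
               tokenize: Optional[Callable[[str], List[str]]] = None) -> List[str]:
    """Steps A1-A3: symbol removal, word segmentation, stop-word filtering."""
    if tokenize is None:
        import jieba  # open-source segmenter named in step A2
        tokenize = jieba.lcut
    stop = set(stopwords)
    cleaned = remove_symbols(text)
    # Drop whitespace-only tokens left by the replacement, then stop words.
    return [w for w in tokenize(cleaned) if w.strip() and w not in stop]
```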
The Skip-gram model in the step S3 includes an input layer, a hidden layer, and an output layer;
The input $w_c$ of the input layer is obtained by one-hot encoding a word and has size V × 1, where V is the number of words in the vocabulary; in the one-hot code, the dimension representing the word is 1 and all others are 0;
the weight matrix $W_1$ of the hidden layer is a d × V matrix, so the hidden-layer output $v_c = W_1 w_c$ has size d × 1;
the weight matrix $W_2$ of the output layer is a V × d matrix; applying the softmax function to $W_2 v_c$ gives a V × 1 probability vector, where V is the number of words in the vocabulary, whose entries are the probabilities that each vocabulary word appears in the context of the input word;
the Skip-gram model predicts the context words of the input word; pre-training yields the hidden-layer weight matrix $W_1$, which serves as the word-vector lookup table, i.e., the weight of the TextCNN embedding layer required in the subsequent steps.
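The Skip-gram forward pass of step S3 can be checked numerically with a toy example. The sizes V = 5 and d = 3 and the random weights are arbitrary assumptions; the sketch only shows the shapes and that $W_1 w_c$ picks out column c of $W_1$, which is why the pre-trained $W_1$ can act as a word-vector lookup table. The training loop (gradient descent on the cross-entropy of the context words) is not shown.

```python
import numpy as np

V, d = 5, 3                      # toy vocabulary size and embedding dimension
rng = np.random.default_rng(0)
W1 = rng.normal(size=(d, V))     # hidden-layer weight, d x V
W2 = rng.normal(size=(V, d))     # output-layer weight, V x d


def one_hot(index: int, size: int) -> np.ndarray:
    """Column vector of shape (size, 1) with a 1 at `index`."""
    vec = np.zeros((size, 1))
    vec[index, 0] = 1.0
    return vec


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())      # subtract the max for numerical stability
    return e / e.sum()


def skip_gram_forward(c: int):
    w_c = one_hot(c, V)          # V x 1 input
    v_c = W1 @ w_c               # d x 1 hidden output
    probs = softmax(W2 @ v_c)    # V x 1 context-word probabilities
    return v_c, probs
```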
In step S4, the TextCNN model is divided into the following layers:
B1, embedding layer: the input of the embedding layer is the preprocessed comment data in one-hot form, of size n × V, where n is the maximum number of words in a sentence; its weight is the transpose $W_1^\top$ of the hidden-layer weight pre-trained in step S3, of size V × d; multiplying the n × V input by the V × d weight gives the embedding-layer output $x_{1:n}$, of size n × d;
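The embedding-layer product of B1 can be sketched with NumPy. The toy sizes and the random matrix are placeholders for the real pre-trained weight; the sketch shows that multiplying the one-hot input by $W_1^\top$ is equivalent to selecting one row of $W_1^\top$ per word, which is how embedding layers are usually implemented in practice.

```python
import numpy as np

V, d = 6, 3                              # toy vocabulary size and embedding dim
rng = np.random.default_rng(1)
W1 = rng.normal(size=(d, V))             # stands in for the pre-trained hidden-layer weight


def embed(token_ids):
    """Map n token ids to an n x d matrix via one-hot (n x V) @ W1.T (V x d)."""
    one_hot = np.zeros((len(token_ids), V))
    one_hot[np.arange(len(token_ids)), token_ids] = 1.0
    return one_hot @ W1.T
```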
B2, convolutional layer: several convolution kernels of different sizes are convolved with $x_{1:n}$ to extract text features; the convolution of each kernel can be expressed as:
$$y_i = g(w \cdot x_{i:i+h-1} + b), \quad 1 \le i \le n-h+1 \qquad \text{(Formula I)}$$
where h is the kernel height (the number of words in each window), $w \in \mathbb{R}^{hd}$ is the weight vector of the convolution kernel, $x_{i:i+h-1}$ denotes rows i to i+h-1 of the embedding-layer output, b is the bias, g is a nonlinear activation function, and $y_i$ is the i-th element of the feature vector obtained after convolution. Each convolution kernel is applied to every possible window of the sentence, $\{x_{1:h}, x_{2:h+1}, \ldots, x_{n-h+1:n}\}$, producing the feature vector $y = [y_1, y_2, \ldots, y_{n-h+1}]$;
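Formula I can be written out directly for a single kernel. This sketch stores the kernel as an h × d array, which is the vector $w \in \mathbb{R}^{hd}$ reshaped, and uses ReLU as the nonlinearity g; the text does not name g, so ReLU is an assumption.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def conv1d_text(x: np.ndarray, w: np.ndarray, b: float, g=relu) -> np.ndarray:
    """Formula I: y_i = g(w . x_{i:i+h-1} + b) for every window of h rows.

    x: n x d embedding output; w: h x d kernel (a vector in R^{hd}).
    Returns the feature vector y of length n - h + 1.
    """
    n = x.shape[0]
    h = w.shape[0]
    # Element-wise product then sum equals the dot product of the flattened arrays.
    return np.array([g(np.sum(w * x[i:i + h]) + b) for i in range(n - h + 1)])
```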
B3, pooling layer: max pooling is applied to the convolution-layer results: for the feature vector $y = [y_1, y_2, \ldots, y_{n-h+1}]$ generated by each convolution kernel, the maximum value $y' = \max\{y\}$ is selected, and all the maxima are combined into a new feature vector;
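The max-pooling step B3 reduces each kernel's feature vector to a single value. A minimal NumPy sketch, assuming the per-kernel feature vectors have already been computed:

```python
import numpy as np
from typing import List


def max_pool_concat(feature_maps: List[np.ndarray]) -> np.ndarray:
    """Take y' = max{y} from each kernel's feature vector and concatenate.

    Kernels of different heights h give vectors of different lengths
    n - h + 1, but each contributes exactly one value, so the result has
    one entry per kernel regardless of sentence length.
    """
    return np.array([np.max(y) for y in feature_maps])
```

This fixed length is what allows the following fully connected layer to accept sentences of any length.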
B4, fully connected layer: it takes the fixed-length feature vector output by the pooling layer as input, produces a binary classification result with a softmax function, and uses dropout and weight decay to prevent overfitting.
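A minimal NumPy sketch of layer B4 follows. The dropout rate 0.5 and the weight-decay coefficient 1e-4 are assumed values, since the text names the two strategies without parameters; dropout is shown in its common "inverted" form, and weight decay as an L2 penalty added to the cross-entropy loss.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def fc_forward(z, W, b, drop_p=0.5, training=False):
    """Fully connected layer producing 2-class probabilities.

    During training, inverted dropout zeroes each input feature with
    probability drop_p and rescales the survivors by 1 / (1 - drop_p).
    """
    if training and drop_p > 0:
        mask = (rng.random(z.shape) >= drop_p) / (1.0 - drop_p)
        z = z * mask
    return softmax(W @ z + b)


def loss_with_weight_decay(probs, target, W, weight_decay=1e-4):
    """Cross-entropy for the true class plus an L2 weight-decay penalty."""
    return -np.log(probs[target]) + 0.5 * weight_decay * np.sum(W ** 2)
```

At detection time (step S5) the layer would be called with `training=False`, so no features are dropped.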
In step S5, the learning behavior data of the students to be tested, including user ids and comment data, is acquired and preprocessed (symbol removal, word segmentation and stop-word removal); the preprocessed comment data is then fed to the classification model, and finally the detection result of each comment is recorded in the data table under the corresponding user id.
Step S6 specifically includes the following steps:
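Step S5 amounts to a loop that preprocesses each comment, classifies it and records the result under the author's user id. In this sketch `preprocess` and `classify` are hypothetical callables standing in for steps A1 to A3 and the trained TextCNN model.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def detect(records: Iterable[Tuple[str, str]],
           preprocess: Callable[[str], List[str]],
           classify: Callable[[List[str]], int]) -> Dict[str, List[int]]:
    """Classify each (user_id, comment) and record the results per user.

    `classify` is assumed to return 1 (valid) or 0 (invalid).
    """
    table: Dict[str, List[int]] = defaultdict(list)
    for user_id, comment in records:
        table[user_id].append(classify(preprocess(comment)))
    return dict(table)
```

The number of valid comments $e_i$ used in step S6 is then simply the sum of a user's list.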
step C1, counting in the data table the number of valid comments $e_i$ of each student $i$ ($i \in \mathbb{N}^+$); the maximum number of comments per student is set to m, the criterion being that a student whose comments reach this upper limit and are all valid obtains the full comment score; the number of students enrolled in the course in the current term is defined as n, and a comment-area activity index is introduced to measure students' participation in the course, expressed as follows:
that is, the average over all students of the ratio of each student's number of valid comments to the maximum number of comments is used as one of the indicators for measuring the quality of course construction.
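Following the description in step C1, the activity index can be written as $A = \frac{1}{n}\sum_{i=1}^{n} \frac{e_i}{m}$, where $A$ is an illustrative symbol for the index. A direct Python version follows; clipping $e_i$ at m is an assumption, and students without any comments must still be counted in n with $e_i = 0$.

```python
from typing import Sequence


def activity_index(valid_counts: Sequence[int], m: int) -> float:
    """Mean over the n enrolled students of e_i / m.

    Assumption: counts above the cap m are clipped to m, so every
    ratio lies in [0, 1] and so does the index.
    """
    n = len(valid_counts)
    return sum(min(e, m) / m for e in valid_counts) / n
```

For example, with m = 10 and valid-comment counts 10, 5, 0 and 20, the index is (1 + 0.5 + 0 + 1) / 4 = 0.625.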
Step C2, for each student $i$, four part scores are defined: the final-exam score, the unit-test score, the viewing-duration score and the course-discussion score, whose shares in the total score are $w_1$, $w_2$, $w_3$ and $w_4$ respectively. The final-exam and unit-test scores are objective scores given directly by the online classroom; the viewing-duration score is the ratio of student $i$'s viewing time $t_i$ to the total video duration $total$, converted to a percentage scale, as follows:
similarly, the course-discussion score is expressed as the ratio of student $i$'s number of valid comments to the maximum number of comments per student:
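From the descriptions in step C2, the two computed part scores can be written as $\frac{t_i}{total} \times 100$ (viewing duration) and $\frac{e_i}{m} \times 100$ (course discussion), both on a percentage scale. The sketch below implements them; the function names are illustrative, and clipping $e_i$ at m is an assumption.

```python
def viewing_score(t_i: float, total: float) -> float:
    """Viewing-duration score: share of the total video time watched, in percent."""
    return 100.0 * t_i / total


def discussion_score(e_i: int, m: int) -> float:
    """Course-discussion score: valid comments over the cap m, in percent.

    Clipping e_i at m is an assumption, consistent with full marks being
    awarded once the cap is reached with valid comments.
    """
    return 100.0 * min(e_i, m) / m
```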
step C3, the total course score of student $i$ is expressed as follows:
where the shares of the four parts sum to 1, i.e., the course score consists of these four parts only; since the four part scores are all on a percentage scale, the resulting total score is also on a percentage scale.
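Step C3 is a weighted sum, $\sum_{k=1}^{4} w_k s_k$, where $s_k$ is an illustrative symbol for the k-th percentage part score. A sketch that also enforces the constraint that the weights sum to 1:

```python
from typing import Sequence


def total_score(part_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the four percentage part scores.

    Assumed order: final exam, unit tests, viewing duration, discussion,
    matching the weights w1..w4, which must sum to 1.
    """
    if len(part_scores) != len(weights):
        raise ValueError("need one weight per part score")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(s * w for s, w in zip(part_scores, weights))
```

For instance, with assumed weights 0.4, 0.3, 0.1, 0.2 and part scores 80, 90, 25, 30, the total is 32 + 27 + 2.5 + 6 = 67.5.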
The online classroom is the online classroom of the Chinese University MOOC website.
A TextCNN-based student online learning behavior detection system, which adopts the TextCNN-based student online learning behavior detection method, characterized in that it comprises a preprocessing module, a model training module and a learning behavior detection module, specifically:
A. Preprocessing module: used for crawling and preprocessing student learning behavior data and for pre-training the Skip-gram model; it comprises a data crawling module, a data preprocessing module and a Skip-gram pre-training module. It first crawls the students' posts as learning behavior data, then preprocesses the crawled data, next uses the preprocessed student learning behavior data $w_c$ as the pre-training input of the Skip-gram model, and finally passes the transpose $W_1^\top$ of the pre-trained Skip-gram hidden-layer weight, as the embedding-layer weight of the model training module, together with the preprocessed student learning behavior data $w_c$, to the model training module;
B. Model training module: used for training the TextCNN model. The embedding layer takes the transpose $W_1^\top$ of the Skip-gram hidden-layer weight obtained by the preprocessing module as its weight and the student learning behavior data $w_c$ as input to produce word vectors; the convolutional layer performs convolution to extract text features, the pooling layer performs max pooling, and the fully connected layer produces the binary classification result; training the model yields the classification model;
C. Learning behavior detection module: used for detecting students' learning behaviors to obtain the course comment activity index and each student's final score; it comprises a data preprocessing module, a data classification module and a calculation module. It preprocesses the data to be detected, identifies student behaviors with the classification model obtained by the model training module, recomputes the course-discussion score from the identification result, and calculates and outputs the course comment activity index and the final score of each student $i$ ($i \in \mathbb{N}^+$).