Modeling method and system of traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus
1. A modeling method of a traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus is characterized by comprising the following steps:
formulating a traditional Chinese medicine clinical questionnaire table of systemic lupus erythematosus, and integrating traditional Chinese medicine four-diagnosis information of electronic medical records of systemic lupus erythematosus patients in hospitals and corresponding traditional Chinese medicine syndrome type diagnosis result information by utilizing the traditional Chinese medicine clinical questionnaire table of systemic lupus erythematosus;
preprocessing the traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information in the traditional Chinese medicine clinical questionnaire of the systemic lupus erythematosus to obtain modeling data;
carrying out data statistics on the traditional Chinese medicine syndrome of the modeling data by using SPSS software, calculating the proportion of different syndromes in all syndromes, and calculating the weight of each syndrome by using a nonlinear weighting function; and
training and testing a machine learning model by using the modeling data to obtain the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus.
2. The modeling method of the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus of claim 1, characterized in that: in the step of formulating the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus and using it to integrate the traditional Chinese medicine four-diagnosis information of the electronic medical records of systemic lupus erythematosus patients in hospitals and the corresponding traditional Chinese medicine syndrome type diagnosis result information, the questionnaire comprises 98 clinical manifestations of the patient, including bright red butterfly-shaped facial erythema, high fever, thirst, dizziness, arthralgia, myalgia, yellow urine, scanty urine, fatigue, alopecia, aphtha, bitter taste in the mouth, poor sleep, anorexia, red rash and the like.
3. The modeling method of the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus of claim 1, characterized in that the step of formulating the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus and using it to integrate the traditional Chinese medicine four-diagnosis information of the electronic medical records of systemic lupus erythematosus patients in hospitals and the corresponding traditional Chinese medicine syndrome type diagnosis result information specifically comprises: extracting the four-diagnosis information of the electronic medical records and the corresponding traditional Chinese medicine syndrome type diagnosis result information; synonymously matching the four-diagnosis information of the electronic medical records with the items of the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus; filling the extracted four-diagnosis information and syndrome type diagnosis result information into the questionnaire; and associating each unique patient number one-to-one with its four-diagnosis information and syndrome type diagnosis result information.
4. The modeling method of the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus of claim 1, which is characterized in that: the step of preprocessing the traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information in the traditional Chinese medicine clinical questionnaire of the systemic lupus erythematosus to obtain modeling data specifically comprises the following steps:
deleting the data of any single traditional Chinese medicine syndrome type that accounts for less than 3% of all traditional Chinese medicine syndrome type data;
carrying out 0-1 standardization processing on the traditional Chinese medicine four-diagnosis information data;
performing one-hot coding on the traditional Chinese medicine syndrome type diagnosis result information;
deleting samples in which the tongue condition data and the pulse condition data of the traditional Chinese medicine four-diagnosis information are missing, or in which the one-hot-coded traditional Chinese medicine syndrome type is missing; and
eliminating, by using Minitab software, values with abnormality greater than 0.5% from the data after missing-value processing.
5. The modeling method of the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus of claim 4, characterized in that the step of preprocessing the traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information in the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus to obtain modeling data further comprises: interpolating missing values of the tongue condition data and the pulse condition data of a sample, according to the traditional Chinese medicine syndrome type of the sample, with the mean values of the tongue condition data and the pulse condition data of the samples of that syndrome type.
6. The modeling method of the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus of claim 1, characterized in that the step of training and testing a machine learning model by using the modeling data to obtain the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus specifically comprises: training a machine learning model with 70% of the modeling data and testing the machine learning model with the remaining 30% of the modeling data to obtain the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus, wherein the machine learning model comprises at least two of a random forest model, a support vector machine model, an XGBoost model and a nonlinear-weighted XGBoost (NW-XGBoost) model.
7. A modeling system of a traditional Chinese medicine syndrome prediction model of systemic lupus erythematosus is characterized by comprising the following components:
the data acquisition module is used for extracting the traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information of the electronic medical record of the systemic lupus erythematosus patient in the hospital, and integrating the traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information into a traditional Chinese medicine clinical questionnaire;
the data processing module is used for preprocessing the traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information in the traditional Chinese medicine clinical questionnaire of the systemic lupus erythematosus to obtain modeling data;
the nonlinear weighting module is used for carrying out data statistics on the traditional Chinese medicine syndrome of the modeling data by using SPSS software, calculating the proportion of different syndromes in all syndromes, and calculating the weight of each syndrome by using a nonlinear weighting function; and
the model establishing module is used for training and testing a machine learning model by using the modeling data to obtain the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus.
8. The modeling system of the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus of claim 7, characterized in that: the traditional Chinese medicine clinical questionnaire of the systemic lupus erythematosus comprises 98 clinical manifestations of a patient, such as bright red butterfly-shaped erythema on the face, high fever, thirst, dizziness, arthralgia, myalgia, yellow urine, oliguria, fatigue, alopecia, aphtha of the mouth and tongue, bitter taste in the mouth, poor sleep, poor appetite, red rash and the like.
9. The modeling system of the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus of claim 7, characterized in that: the data preprocessing of the data processing module comprises the following steps: removing traditional Chinese medicine syndrome types that account for less than 3% of the data, standardizing, one-hot coding, deleting missing values and removing abnormal values.
10. The modeling system of the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus of claim 7, characterized in that: the machine learning model of the model establishing module comprises at least two of a random forest model, a support vector machine model, an XGBoost model and a nonlinear-weighted XGBoost (NW-XGBoost) model.
Background
Systemic lupus erythematosus is a chronic autoimmune disease that involves multiple organs throughout the body and has a protracted, relapsing course. Its clinical manifestations are complex, chiefly fever and facial butterfly erythema. In addition to lesions of the skin, mucous membranes and joints, it usually involves the viscera, particularly the heart, lungs, liver, kidneys and blood system, and ultimately causes multi-system damage. Modern medicine mostly treats it with glucocorticoids, non-steroidal anti-inflammatory drugs, immunosuppressants, antimalarial drugs, biological agents and the like, but because the course of the disease is long, long-term administration causes many adverse reactions, and toxic and side effects are unavoidable.
Traditional Chinese medicine has a definite curative effect on systemic lupus erythematosus: it can markedly reduce toxic and side effects, greatly improve the prognosis, and improve the quality of life of patients. Syndrome differentiation is the core of traditional Chinese medicine treatment and the basis for prescribing, so correct syndrome differentiation is very important for treating systemic lupus erythematosus. However, syndrome differentiation is a complicated process, and most physicians who are new to clinical practice find it difficult to differentiate syndromes accurately. Therefore, to solve the problem of discriminating the traditional Chinese medicine syndrome type of systemic lupus erythematosus and to improve the clinical diagnosis and treatment effect of traditional Chinese medicine on the disease, a traditional Chinese medicine syndrome type prediction model system for systemic lupus erythematosus needs to be developed to assist clinicians in diagnosing the syndrome type. In addition, traditional Chinese medicine syndrome modeling data are unevenly distributed: the majority classes are fully trained while the minority classes are not, so the model pays more attention to the classification performance on majority-class samples during learning, which impairs its prediction of syndrome types that occur relatively rarely.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a modeling method and a system of a traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus, which can improve the diagnosis rate of the traditional Chinese medicine syndrome type of systemic lupus erythematosus.
In order to solve the problems, the technical scheme of the invention is as follows:
a modeling method of a traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus comprises the following steps:
formulating a traditional Chinese medicine clinical questionnaire table of systemic lupus erythematosus, and integrating traditional Chinese medicine four-diagnosis information of electronic medical records of systemic lupus erythematosus patients in hospitals and corresponding traditional Chinese medicine syndrome type diagnosis result information by utilizing the traditional Chinese medicine clinical questionnaire table of systemic lupus erythematosus;
preprocessing the traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information in the traditional Chinese medicine clinical questionnaire of the systemic lupus erythematosus to obtain modeling data;
carrying out data statistics on the traditional Chinese medicine syndrome of the modeling data by using SPSS software, calculating the proportion of different syndromes in all syndromes, and calculating the weight of each syndrome by using a nonlinear weighting function; and
training and testing a machine learning model by using the modeling data to obtain the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus.
Optionally, in the step of formulating the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus and using it to integrate the traditional Chinese medicine four-diagnosis information of the electronic medical records of systemic lupus erythematosus patients in hospitals and the corresponding traditional Chinese medicine syndrome type diagnosis result information, the questionnaire comprises 98 clinical manifestations of the patient, including bright red butterfly-shaped facial erythema, high fever, thirst, dizziness, arthralgia, myalgia, yellow urine, scanty urine, fatigue, alopecia, aphtha, bitter taste in the mouth, poor sleep, anorexia, red rash and the like.
Optionally, the step of formulating the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus and using it to integrate the traditional Chinese medicine four-diagnosis information of the electronic medical records of systemic lupus erythematosus patients in hospitals and the corresponding traditional Chinese medicine syndrome type diagnosis result information specifically comprises: extracting the four-diagnosis information of the electronic medical records and the corresponding traditional Chinese medicine syndrome type diagnosis result information; synonymously matching the four-diagnosis information of the electronic medical records with the items of the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus; filling the extracted four-diagnosis information and syndrome type diagnosis result information into the questionnaire; and associating each unique patient number one-to-one with its four-diagnosis information and syndrome type diagnosis result information.
Optionally, the step of preprocessing the four-diagnosis information of traditional Chinese medicine and the corresponding traditional Chinese medicine syndrome type diagnosis result information in the traditional Chinese medicine clinical questionnaire for systemic lupus erythematosus to obtain modeling data specifically includes the following steps:
deleting the data of any single traditional Chinese medicine syndrome type that accounts for less than 3% of all traditional Chinese medicine syndrome type data;
carrying out 0-1 standardization processing on the traditional Chinese medicine four-diagnosis information data;
performing one-hot coding on the traditional Chinese medicine syndrome type diagnosis result information;
deleting samples in which the tongue condition data and the pulse condition data of the traditional Chinese medicine four-diagnosis information are missing, or in which the one-hot-coded traditional Chinese medicine syndrome type is missing; and
eliminating, by using Minitab software, values with abnormality greater than 0.5% from the data after missing-value processing.
Optionally, the step of preprocessing the traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information in the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus to obtain modeling data further includes: interpolating missing values of the tongue condition data and the pulse condition data of a sample, according to the traditional Chinese medicine syndrome type of the sample, with the mean values of the tongue condition data and the pulse condition data of the samples of that syndrome type.
Optionally, the step of training and testing a machine learning model by using the modeling data to obtain the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus specifically includes: training a machine learning model with 70% of the modeling data and testing the machine learning model with the remaining 30% of the modeling data to obtain the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus, wherein the machine learning model comprises at least two of a random forest model, a support vector machine model, an XGBoost model and a nonlinear-weighted XGBoost (NW-XGBoost) model.
Furthermore, the invention also provides a modeling system of the traditional Chinese medicine syndrome type prediction model of the systemic lupus erythematosus, and the system comprises:
the data acquisition module is used for extracting the traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information of the electronic medical record of the systemic lupus erythematosus patient in the hospital, and integrating the traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information into a traditional Chinese medicine clinical questionnaire;
the data processing module is used for preprocessing the traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information in the traditional Chinese medicine clinical questionnaire of the systemic lupus erythematosus to obtain modeling data;
the nonlinear weighting module is used for carrying out data statistics on the traditional Chinese medicine syndrome of the modeling data by using SPSS software, calculating the proportion of different syndromes in all syndromes, and calculating the weight of each syndrome by using a nonlinear weighting function; and
the model establishing module is used for training and testing a machine learning model by using the modeling data to obtain the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus.
Optionally, the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus comprises 98 clinical manifestations of bright red butterfly-shaped erythema of the face, hyperpyrexia, thirst, dizziness, arthralgia, myalgia, yellow urine, oliguria, fatigue, alopecia, aphtha of the mouth and tongue, bitter taste in the mouth, poor sleep, anorexia, red rash, and the like of the patient.
Optionally, the data preprocessing of the data processing module includes: removing traditional Chinese medicine syndrome types that account for less than 3% of the data, standardizing, one-hot coding, deleting missing values and removing abnormal values.
Optionally, the machine learning model of the model establishing module includes at least two of a random forest model, a support vector machine model, an XGBoost model and a nonlinear-weighted XGBoost (NW-XGBoost) model.
Compared with the prior art, the modeling method and system of the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus have the following advantages. The invention uses a large amount of existing domestic traditional Chinese medicine electronic medical record data on systemic lupus erythematosus to establish a traditional Chinese medicine syndrome type prediction model, which can improve the diagnosis rate of the traditional Chinese medicine syndrome type of systemic lupus erythematosus. Applied to the prediction of the traditional Chinese medicine syndrome type of systemic lupus erythematosus, the modeling method can assist clinical teaching and physicians' diagnosis and treatment, and can improve the diagnosis and treatment rate of students and physicians for systemic lupus erythematosus patients.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of a modeling method of a TCM syndrome type prediction model of SLE according to an embodiment of the present invention;
FIG. 2 is a further flow chart of the modeling method of the TCM syndrome type prediction model of SLE according to the embodiment of the present invention, showing the data preprocessing steps;
FIG. 3 is a block diagram of a modeling system of a TCM syndrome prediction model of SLE according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, and all of these fall within the scope of protection of the present invention.
Fig. 1 is a flow chart of a modeling method of a traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus provided in the embodiment of the present invention, and as shown in fig. 1, the method includes the following steps:
S1: formulating a traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus, and using the questionnaire to integrate the traditional Chinese medicine four-diagnosis information of the electronic medical records of systemic lupus erythematosus patients in hospitals and the corresponding traditional Chinese medicine syndrome type diagnosis result information;
Specifically, 98 clinical manifestations of systemic lupus erythematosus, such as bright red butterfly-shaped facial erythema, high fever, thirst, dizziness, arthralgia, myalgia, yellow urine, oliguria, fatigue, alopecia, aphtha, bitter taste in the mouth, poor sleep, anorexia, red rash and the like, are screened out through clinical literature analysis and retrospective analysis of various databases, the 2002 Guiding Principles for Clinical Research of New Chinese Medicines, and an expert questionnaire, and the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus is formulated from them.
The traditional Chinese medicine four-diagnosis information of the electronic medical records of systemic lupus erythematosus patients in hospitals and the corresponding traditional Chinese medicine syndrome type diagnosis result information are extracted and integrated, after which each unique patient number corresponds one-to-one with its four-diagnosis information and syndrome type diagnosis result information. The integration mainly comprises two operations. First, the four-diagnosis information of the electronic medical record is synonymously matched with the items of the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus; for example, expressions such as "bad appetite", "not eating" and "poor appetite" in the electronic medical record are matched with the "anorexia" item of the questionnaire. Second, the four-diagnosis information and the syndrome type diagnosis result information in the electronic medical record are extracted and filled into the questionnaire.
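The synonym matching and questionnaire filling described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the synonym table, the item names, the patient number format and the functions `build_lookup` and `fill_questionnaire` are all hypothetical, and a real questionnaire would hold 98 items.

```python
# Hypothetical synonym table: questionnaire item -> free-text variants in the records.
SYNONYMS = {
    "anorexia": ["bad appetite", "not eating", "poor appetite"],
    "fatigue": ["tired", "lack of strength"],  # illustrative variants only
}

def build_lookup(items, synonyms):
    """Map every questionnaire item and every known variant to its questionnaire item."""
    lookup = {item.lower(): item for item in items}
    for item, variants in synonyms.items():
        for variant in variants:
            lookup[variant.lower()] = item
    return lookup

def fill_questionnaire(patient_id, record_terms, items, lookup):
    """Return one questionnaire row (1 = present, 0 = absent) keyed by patient number."""
    row = {"patient_id": patient_id}
    row.update({item: 0 for item in items})
    for term in record_terms:
        item = lookup.get(term.strip().lower())
        if item is not None:
            row[item] = 1
    return row

ITEMS = ["anorexia", "fatigue", "thirst"]
LOOKUP = build_lookup(ITEMS, SYNONYMS)
row = fill_questionnaire("P001", ["Not eating", "thirst "], ITEMS, LOOKUP)
```

Terms that match no item or variant are ignored here; in practice they would more likely be collected for manual review so that the synonym table can be extended.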
S2: preprocessing the traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information in the traditional Chinese medicine clinical questionnaire of the systemic lupus erythematosus to obtain modeling data;
Specifically, because every table of the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus contains some noise data such as missing values and abnormal values, the data must be preprocessed to control their accuracy and completeness and to ensure the accuracy of the results. As shown in fig. 2, the data preprocessing includes the following steps:
S21: deleting the data of any single traditional Chinese medicine syndrome type that accounts for less than 3% of all traditional Chinese medicine syndrome type data;
Specifically, in this embodiment, after the data of the syndrome types accounting for less than 3% of the total are deleted, seven common traditional Chinese medicine syndrome types of systemic lupus erythematosus remain: syndrome of exuberance of heat-toxin, syndrome of yin deficiency of liver and kidney, syndrome of yang deficiency of spleen and kidney, syndrome of internal heat due to yin deficiency, syndrome of arthralgia due to wind-damp-heat, syndrome of deficiency of both qi and yin, and syndrome of deficiency of both qi and blood.
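Step S21 amounts to a frequency filter on the syndrome labels. The sketch below assumes each sample is a dictionary with a `"syndrome"` key; that record layout, the function name `drop_rare_syndromes` and the toy counts are illustrative assumptions.

```python
from collections import Counter

def drop_rare_syndromes(samples, min_share=0.03):
    """Keep samples whose syndrome type accounts for at least min_share of all samples."""
    counts = Counter(sample["syndrome"] for sample in samples)
    total = len(samples)
    kept = {name for name, n in counts.items() if n / total >= min_share}
    return [sample for sample in samples if sample["syndrome"] in kept]

# Toy data: 60% / 38% / 2% split across three syndrome labels.
samples = (
    [{"syndrome": "exuberance of heat-toxin"}] * 60
    + [{"syndrome": "yin deficiency of liver and kidney"}] * 38
    + [{"syndrome": "rare syndrome"}] * 2
)
filtered = drop_rare_syndromes(samples)
```

The shares are computed once on the original data, so removing a rare syndrome type does not push another type below the threshold in the same pass.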
S22: carrying out 0-1 standardization processing on the traditional Chinese medicine four-diagnosis information data;
Specifically, 0 represents absence and 1 represents presence; for example, if the patient has tinnitus, 1 is filled into the preprocessed data set, and if the patient has no tinnitus, 0 is filled in.
S23: performing one-hot coding on the traditional Chinese medicine syndrome type diagnosis result information;
Specifically, for example, the seven traditional Chinese medicine syndrome types of systemic lupus erythematosus in this embodiment are coded as follows: syndrome of exuberance of heat-toxin (1000000), syndrome of yin deficiency of liver and kidney (0100000), syndrome of yang deficiency of spleen and kidney (0010000), syndrome of internal heat due to yin deficiency (0001000), syndrome of arthralgia due to wind-damp-heat (0000100), syndrome of deficiency of both qi and yin (0000010), and syndrome of deficiency of both qi and blood (0000001).
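The coding above can be reproduced with a few lines of Python. The order of the list is taken from the embodiment; the shortened label strings and the function name `one_hot` are illustrative choices.

```python
# Syndrome types in the order used by the embodiment's codes.
SYNDROMES = [
    "exuberance of heat-toxin",
    "yin deficiency of liver and kidney",
    "yang deficiency of spleen and kidney",
    "internal heat due to yin deficiency",
    "wind-damp-heat arthralgia",
    "deficiency of both qi and yin",
    "deficiency of both qi and blood",
]

def one_hot(syndrome, classes=SYNDROMES):
    """Return a 0/1 list with a single 1 at the position of `syndrome`."""
    index = classes.index(syndrome)  # raises ValueError for an unknown label
    return [1 if i == index else 0 for i in range(len(classes))]

code = "".join(str(bit) for bit in one_hot("yang deficiency of spleen and kidney"))
```

Here `code` is the string form of the vector, matching the seven-digit codes listed for the embodiment.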
S24: deleting samples in which the tongue condition data and the pulse condition data of the traditional Chinese medicine four-diagnosis information are missing, or in which the one-hot-coded traditional Chinese medicine syndrome type is missing;
Specifically, the tongue condition and the pulse condition carry relatively large weight in the diagnosis of the traditional Chinese medicine syndrome type. Every patient has a tongue condition and a pulse condition, but clinicians sometimes omit them when writing medical records, so a certain proportion of records lack them. If both the tongue condition data and the pulse condition data are missing, the traditional Chinese medicine syndrome type cannot be judged accurately, so the data of patient numbers with both missing are deleted. In addition, some medical records do not contain a traditional Chinese medicine syndrome type diagnosis result; such data cannot be used for modeling or prediction and are also deleted.
In addition, for a sample in which only one of the tongue condition data and the pulse condition data is missing, the missing value can be interpolated, according to the traditional Chinese medicine syndrome type of the sample, with the mean value of that item among the samples of the same syndrome type. For example, if a sample with the syndrome of internal heat due to yin deficiency has pulse condition data but no tongue condition data, the "red tongue with little coating" column can be filled with 0.95, the mean value of that column among the samples with this syndrome.
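A minimal sketch of this per-syndrome mean interpolation follows. It assumes missing entries are stored as `None` and that each sample is a dictionary; the column name `red_tongue_little_coating`, the function name `impute_by_syndrome_mean` and the toy values are hypothetical.

```python
def impute_by_syndrome_mean(samples, columns):
    """Fill None entries in `columns` with the column mean within the same syndrome type."""
    # Collect the observed (non-missing) values per (syndrome, column) pair.
    observed = {}
    for sample in samples:
        for col in columns:
            if sample[col] is not None:
                observed.setdefault((sample["syndrome"], col), []).append(sample[col])
    filled = []
    for sample in samples:
        row = dict(sample)  # copy so the input is left untouched
        for col in columns:
            values = observed.get((sample["syndrome"], col))
            if row[col] is None and values:
                row[col] = sum(values) / len(values)
        filled.append(row)
    return filled

label = "internal heat due to yin deficiency"
samples = [
    {"syndrome": label, "red_tongue_little_coating": 1},
    {"syndrome": label, "red_tongue_little_coating": 1},
    {"syndrome": label, "red_tongue_little_coating": 1},
    {"syndrome": label, "red_tongue_little_coating": 0},
    {"syndrome": label, "red_tongue_little_coating": None},  # tongue data missing
]
filled = impute_by_syndrome_mean(samples, ["red_tongue_little_coating"])
```

A value stays `None` when no sample of the same syndrome type has that column observed, so such rows remain visible for the deletion step.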
S25: eliminating, by using Minitab software, values with abnormality greater than 0.5% from the data after missing-value processing.
S3: carrying out data statistics on the traditional Chinese medicine syndrome of the modeling data by using SPSS software, calculating the proportion of different syndromes in all syndromes, and calculating the weight of each syndrome by using a nonlinear weighting function;
Specifically, a heuristic function is used to weight the samples of different classes nonlinearly, so that the weight of a class is negatively correlated with its number of samples. The weights are calculated as follows. First, the sample proportion is calculated according to formula (1):
$$\upsilon_k = \frac{d_k}{D} \qquad (1)$$
where D is the total number of samples, d_k is the number of samples of the k-th class, and υ_k is the proportion of the k-th class in the total samples.
Then the nonlinear weighting function based on the sample proportion is calculated as shown in the following formula (2):
According to formula (2), the value range of the function is [0.5 + α/(1 + e), 0.5 + α/2]. In this embodiment, α is set to 1 according to the parameter optimization result, and the resulting weights of the syndrome of exuberance of heat-toxin, the syndrome of yin deficiency of liver and kidney, the syndrome of yang deficiency of spleen and kidney, the syndrome of internal heat due to yin deficiency, the syndrome of arthralgia due to wind-damp-heat, the syndrome of deficiency of both qi and yin, and the syndrome of deficiency of both qi and blood are, respectively: 0.984, 1.036, 1.106, 0.982, 1.023, 0.994 and 1.215.
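For illustration only, the sketch below computes the sample proportions and applies one weighting function whose values over proportions in [0, 1] span exactly the stated range [0.5 + α/(1 + e), 0.5 + α/2], namely 0.5 + α/(1 + e^υ). This functional form is an assumption inferred from that range and from the negative correlation between sample count and weight; it is not confirmed as formula (2) and is not claimed to reproduce the weights reported for the embodiment. The class counts and function names are hypothetical.

```python
import math

def sample_proportions(counts):
    """Proportion of each class in the total number of samples."""
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}

def nonlinear_weight(proportion, alpha=1.0):
    """Assumed form 0.5 + alpha / (1 + e**proportion); decreases as proportion grows."""
    return 0.5 + alpha / (1.0 + math.exp(proportion))

# Hypothetical class counts for three classes.
counts = {"A": 700, "B": 250, "C": 50}
weights = {k: nonlinear_weight(p) for k, p in sample_proportions(counts).items()}
```

With this form the rarest class receives the largest weight, and the weights stay in a narrow band rather than growing without bound as they would under inverse-frequency weighting. Per-class weights of this kind are typically passed to a classifier as per-sample weights during training.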
S4: training and testing a machine learning model by using the modeling data to obtain the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus.
Specifically, 70% of the modeling data is used to train the machine learning model and the remaining 30% is used to test it, thereby obtaining the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus, wherein the machine learning model comprises at least two of a random forest (RF) model, a support vector machine (SVM) model, an XGBoost model and a nonlinear-weighting-based XGBoost (NW-XGBoost) model. In this embodiment, the code of each of the four machine learning models reads the data, then trains and tests on it in turn. After all four have been run, the performance of the different models is compared using statistical indexes derived from the confusion matrix together with performance curves. The evaluation indexes include accuracy (ACC), balanced accuracy (BACC), F1-score and the Kappa coefficient; the performance curves include the receiver operating characteristic (ROC) curve and the precision-recall (PR) curve.
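The confusion-matrix indexes named above can be computed as in the following sketch. It assumes that BACC is the mean of the per-class recalls and that the F1-score is macro-averaged over classes; the text does not state which averaging is used, so both are assumptions. The function names and the toy labels are illustrative only.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def evaluate(matrix):
    """ACC, BACC (assumed: mean per-class recall), macro F1 (assumed)
    and Cohen's kappa, all derived from one confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    diag = [matrix[i][i] for i in range(n)]

    acc = sum(diag) / total
    recalls = [diag[i] / row_sums[i] if row_sums[i] else 0.0 for i in range(n)]
    precisions = [diag[i] / col_sums[i] if col_sums[i] else 0.0 for i in range(n)]
    f1s = [2 * p * r / (p + r) if (p + r) else 0.0
           for p, r in zip(precisions, recalls)]

    # Agreement expected by chance, from the row and column marginals.
    expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / (total * total)
    kappa = (acc - expected) / (1 - expected) if expected != 1 else 0.0

    return {"acc": acc, "bacc": sum(recalls) / n,
            "f1": sum(f1s) / n, "kappa": kappa}


# Toy labels for illustration; not data from the embodiment.
classes = ["A", "B", "C"]
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "A"]
scores = evaluate(confusion_matrix(y_true, y_pred, classes))
```

Reporting BACC and Kappa alongside ACC is useful when syndrome types are imbalanced, because plain accuracy can stay high even if minority classes are rarely predicted correctly.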
Table 1 below compares the evaluation indexes of the four machine learning classifiers. As can be seen from the table, for the RF model, ACC is 79.36%, BACC is 25.43, F1-score is 0.25, Kappa is 0.02, the AUC of the ROC curve is 0.886 and the AUC of the PR curve is 0.745. For the SVM model, ACC is 81.23%, BACC is 25.27, F1-score is 0.23, Kappa is 0.01, the AUC of the ROC curve is 0.873 and the AUC of the PR curve is 0.718. For the NW-XGBoost model, ACC is 84.56%, BACC is 28.56, F1-score is 0.28, Kappa is 0.08, the AUC of the ROC curve is 0.928 and the AUC of the PR curve is 0.834, so that it performs best on every evaluation index. For the XGBoost model, ACC is 83.76%, BACC is 27.42, F1-score is 0.26, Kappa is 0.07, the AUC of the ROC curve is 0.919 and the AUC of the PR curve is 0.826, each index being slightly lower than that of the NW-XGBoost model. The SVM model and the RF model perform similarly, each with its own strengths, but both are inferior to the NW-XGBoost model and the XGBoost model.
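TABLE 1
Model        ACC      BACC    F1-score  Kappa  AUC (ROC)  AUC (PR)
RF           79.36%   25.43   0.25      0.02   0.886      0.745
SVM          81.23%   25.27   0.23      0.01   0.873      0.718
XGBoost      83.76%   27.42   0.26      0.07   0.919      0.826
NW-XGBoost   84.56%   28.56   0.28      0.08   0.928      0.834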
Fig. 3 is a structural block diagram of a modeling system of a traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus according to an embodiment of the present invention. As shown in Fig. 3, the modeling system of the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus includes:
the data acquisition module 31: configured to extract the traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information from the electronic medical records of systemic lupus erythematosus patients in the hospital, and to integrate this information into the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus;
specifically, the traditional Chinese medicine clinical questionnaire of the systemic lupus erythematosus comprises 98 clinical manifestations of a patient, such as bright red butterfly-shaped erythema on the face, hyperpyrexia, thirst, dizziness, arthralgia, myalgia, yellow urine, oliguria, fatigue, alopecia, aphtha, bitter taste in the mouth, poor sleep, anorexia, red rash and the like.
The traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information are extracted from the electronic medical records of systemic lupus erythematosus patients in the hospital; after integration, each unique patient number corresponds one-to-one to that patient's four-diagnosis information and syndrome type diagnosis result information. The integration mainly comprises the following: performing synonym matching between the four-diagnosis information of the electronic medical record and the items of the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus, for example, expressions such as 'bad appetite' and 'not eating' in the electronic medical record are matched with the poor-appetite item 'na wei' in the questionnaire; and extracting the four-diagnosis information and the syndrome type diagnosis result information from the electronic medical record and filling them into the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus.
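The synonym matching and questionnaire filling described above might look like the following sketch. The synonym table, the three questionnaire items, the patient number `P001` and the 0/1 encoding of each item are assumptions made for illustration; the actual questionnaire has 98 items and its matching rules are not given in this text.

```python
# Hypothetical synonym table: free-text EMR expression -> questionnaire item.
SYNONYMS = {
    "bad appetite": "na wei",
    "not eating": "na wei",
}


def normalize_term(term, synonyms=SYNONYMS):
    """Map an EMR expression to its questionnaire item, if one is known."""
    key = term.strip().lower()
    return synonyms.get(key, key)


def fill_questionnaire(patient_id, emr_terms, questionnaire_items, syndrome):
    """Build one questionnaire record keyed by the unique patient number.
    Each item is encoded as 1 (present) or 0 (absent); this is an assumed encoding."""
    present = {normalize_term(t) for t in emr_terms}
    record = {"patient_id": patient_id, "syndrome": syndrome}
    for item in questionnaire_items:
        record[item] = 1 if item in present else 0
    return record


# Illustrative subset of items and a made-up patient record.
items = ["na wei", "fatigue", "alopecia"]
record = fill_questionnaire(
    "P001", ["Bad appetite", "fatigue"], items, "qi_blood_deficiency"
)
```

Keeping the patient number in every record preserves the one-to-one correspondence between a patient, the four-diagnosis information and the syndrome type diagnosis result.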
The data processing module 32: configured to preprocess the traditional Chinese medicine four-diagnosis information and the corresponding traditional Chinese medicine syndrome type diagnosis result information in the traditional Chinese medicine clinical questionnaire of systemic lupus erythematosus to obtain modeling data;
specifically, the data preprocessing performed by the data processing module 32 includes: removing traditional Chinese medicine syndrome types that account for less than 3% of the samples, standardizing, one-hot encoding, deleting missing values and removing abnormal values.
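Two of the preprocessing operations listed above, removing syndrome types below the 3% threshold and one-hot encoding, can be sketched as follows. The helper names, the record layout and the class counts are assumptions made for illustration, and the text does not say which fields are one-hot encoded; the syndrome label is used here only as an example.

```python
from collections import Counter


def drop_rare_syndromes(rows, label_key="syndrome", min_fraction=0.03):
    """Keep only rows whose syndrome type makes up at least
    `min_fraction` of all rows (3% by default)."""
    counts = Counter(row[label_key] for row in rows)
    total = len(rows)
    keep = {label for label, c in counts.items() if c / total >= min_fraction}
    return [row for row in rows if row[label_key] in keep]


def one_hot(value, categories):
    """Encode `value` as a 0/1 vector over the ordered `categories`."""
    return [1 if value == c else 0 for c in categories]


# Made-up counts: class "C" is 2% of the samples and is removed.
rows = [{"syndrome": "A"}] * 60 + [{"syndrome": "B"}] * 38 + [{"syndrome": "C"}] * 2
kept = drop_rare_syndromes(rows)
categories = sorted({row["syndrome"] for row in kept})
encoded = one_hot(kept[0]["syndrome"], categories)
```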
The nonlinear weighting module 33: configured to carry out data statistics on the traditional Chinese medicine syndrome types of the modeling data by using SPSS software, calculate the proportion of each syndrome type among all syndrome types, and calculate the weight of each syndrome type by using a nonlinear weighting function; and
the model building module 34: configured to train and test a machine learning model by using the modeling data to obtain the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus.
Specifically, 70% of the modeling data is used to train the machine learning model and the remaining 30% is used to test it, thereby obtaining the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus, wherein the machine learning model comprises at least two of a random forest model, a support vector machine model, an XGBoost model and a nonlinear-weighting-based XGBoost model.
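A 70%/30% split of the modeling data can be sketched as follows. The text does not say how the split is drawn; the sketch assumes a stratified random split, so that each syndrome type keeps roughly the same share in both parts, and uses a fixed seed for repeatability. The function name and the toy data are illustrative.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, train_fraction=0.7, seed=0):
    """Split (sample, label) pairs into train and test parts, class by class.
    Stratification is an assumption; the text only gives the 70/30 ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])

    train = [(samples[i], labels[i]) for i in train_idx]
    test = [(samples[i], labels[i]) for i in test_idx]
    return train, test


# Toy data: 100 samples in two classes of 70 and 30.
samples = list(range(100))
labels = ["A"] * 70 + ["B"] * 30
train, test = stratified_split(samples, labels)
```

The same training and test parts can then be passed to each candidate model, so that the evaluation indexes of the models are compared on identical data.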
Compared with the prior art, the modeling method and system of the traditional Chinese medicine syndrome type prediction model of systemic lupus erythematosus establish the prediction model from a large amount of existing domestic traditional Chinese medicine electronic medical record data of systemic lupus erythematosus, which can improve the diagnosis rate of the traditional Chinese medicine syndrome type of systemic lupus erythematosus. Applying the modeling method to the prediction of the traditional Chinese medicine syndrome type of systemic lupus erythematosus can also assist clinical teaching and doctors' diagnosis and treatment, and improve the diagnosis and treatment rate achieved by students and doctors for patients with systemic lupus erythematosus.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.