Coronary artery stenosis degree identification method based on multi-classifier fusion
1. A coronary artery stenosis degree identification method based on multi-classifier fusion is characterized by comprising the following steps:
step 1, constructing an image sample library;
step 2, carrying out denoising and segmentation binarization processing on a CT original sequence diagram extracted from cardiac CTA to obtain a coronary artery extraction diagram;
step 3, extracting features from the segmented image, namely three major categories of features of interest: texture features, gray-level features and geometric features;
step 4, dividing the 500 samples into a training group and a testing group at a ratio of 7:3 using a random index method, screening the three categories of radiomics features (texture, gray level and geometry) extracted in step 3 with a multi-class ReliefF feature weighting algorithm, applying cross validation to the random features, calculating the correlation between each feature and the prediction result, and removing the features with low correlation;
step 5, forming a feature set by the features of the texture, the gray scale and the geometry selected in the step 4, establishing a multi-classifier fusion prediction model, and selecting three classifiers of a Support Vector Machine (SVM), a random forest RF and an Extreme Learning Machine (ELM) which have good classification effects on medical images to perform fusion prediction on the degree of coronary lesion; and determining the weight of the 3 classifiers in the fusion classifier by adopting a weighting method, and judging as a normal sample when the stenosis degree is lower than 50%, and judging as a lesion sample when the stenosis degree is greater than 50%.
2. The method for identifying the stenosis degree of coronary artery based on multi-classifier fusion according to claim 1, wherein the step 1 is as follows:
collecting the information and images of patients who underwent both cardiac CTA and DSA examinations, as recorded in a hospital data system over three years; matching each CT image to its coronary stenosis index data; hiding the patients' basic information in the images; selecting 500 coronary CT images that meet the image quality requirements as the input samples; and labeling each sample with its class.
3. The method for identifying the stenosis degree of coronary artery based on multi-classifier fusion according to claim 2, wherein the step 2 is as follows:
step 2.1, sorting all pixels in the neighborhood of a Gaussian noise point in the original CT image by gray value, and taking the middle (median) gray value as the gray value of the noise point so as to reduce the image noise, the expression being:
$$g_{ij} = \underset{A}{\operatorname{med}}\,\{f_{ij}\}$$
wherein i, j are the coordinates of the pixel, $g_{ij}$ is the gray value assigned to the noise point, A is the neighborhood taken around the noise point, $\{f_{ij}\}$ is the data sequence of gray values in A, and med denotes the median operation;
the image quality of the CT image can be improved through denoising processing, and the denoised image can more clearly reflect coronary artery structure information in the CT image, so that the segmentation operation in the step 2.2 is facilitated;
step 2.2, letting R denote the whole image, dividing the whole denoised CT image R into c sub-regions that simultaneously satisfy conditions ① to ④:
① $\bigcup_{x=1}^{c} R_x = R$, where each $R_x$ is a connected sub-region;
② $R_x \cap R_y = \varnothing$ for any x, y with x ≠ y;
③ $P(R_x) = \text{True}$ for x = 1, 2, ..., c;
④ $P(R_x \cup R_y) = \text{False}$ for x ≠ y;
The coronary artery blood vessels with continuous regions are extracted from the original CT image through a region growing segmentation algorithm, namely a coronary artery extraction image is obtained.
4. The method for identifying the stenosis degree of coronary artery based on multi-classifier fusion according to claim 3, wherein the step 3 is as follows:
3.1, extracting the gray features of the mean value, the variance, the energy, the entropy, the kurtosis and the skewness of the coronary artery extraction image in the step 2 by adopting a gray histogram method;
3.2, constructing a gray level co-occurrence matrix, using a 5 × 5 sliding window to calculate the gray-level feature value at each pixel of the coronary artery extraction map of step 2, and extracting the texture features of the image;
3.3, extracting the geometric features of the coronary artery image with the Hu invariant moment method, based on the coronary artery extraction map obtained in step 2: first calculating the second-order and third-order central moments of the coronary artery image, then normalizing them to obtain a set of invariant moments, which describes the geometric shape features of the coronary artery extraction map.
5. The method for identifying the stenosis degree of coronary artery based on multi-classifier fusion according to claim 4, wherein the step 4 is as follows:
4.1, using the ReliefF feature weighting algorithm to select, from all the texture, gray-level and geometric radiomics features extracted in step 3, the d features with the highest correlation, and forming d feature subsets that contain, in sequence, from 1 to d features;
step 4.2, performing ten-fold cross validation: dividing the sample set into 10 subsets, using one subset as the test set and the other 9 as the training set each time, repeating 10 times, and taking the average recognition accuracy over the 10 runs as the result;
4.3, calculating the prediction error rate of each feature subset by the above process, and selecting the feature subset with the lowest prediction error rate as the input features of the multi-classifier fusion prediction model in step 5.
6. The method for identifying the stenosis degree of coronary artery based on multi-classifier fusion according to claim 5, wherein the ReliefF feature weighting algorithm in step 4.1 is as follows:
one sample S is randomly drawn from the training sample set each time; k nearest-neighbor samples $H_l$ are then found among the samples of the same class as S, and k nearest-neighbor samples $M_l$ among the samples of each different class; the weight of each of the texture, gray-level and geometric features extracted in step 3 is then updated, and features whose weight is smaller than a set threshold are eliminated, the feature weight being calculated as:
$$W(A) = W(A) - \sum_{l=1}^{k} \frac{\operatorname{diff}(A, S, H_l)}{mk} + \sum_{C \neq \operatorname{class}(S)} \left[ \frac{p(C)}{1 - p(\operatorname{class}(S))} \sum_{l=1}^{k} \frac{\operatorname{diff}(A, S, M_l(C))}{mk} \right]$$
in the above formula, m is the number of sampling iterations; k is the number of nearest-neighbor samples, with l = 1, 2, ..., k; $\operatorname{diff}(A, S, H_l)$ denotes the difference between sample S and sample $H_l$ on feature A; C is a sample class; p(C) is the proportion of samples of class C in the total number of samples; and p(class(S)) is the proportion of samples of the same class as S in the total number of samples.
7. The method for identifying the stenosis degree of coronary artery based on multi-classifier fusion according to claim 5, wherein the step 5 is as follows:
step 5.1, first passing the feature sample set screened in step 4 through three single classifiers, namely a Support Vector Machine (SVM), an Extreme Learning Machine (ELM) and a Random Forest (RF), to obtain each classifier's identification result for the degree of coronary artery stenosis, namely the 3 classification results obtained when each classifier classifies and predicts the samples to be identified; and calculating the weight of each single classifier in the final multi-classifier fusion prediction model according to its ability to classify correctly;
step 5.2, fusing the classification results of the support vector machine SVM, the extreme learning machine ELM and the random forest RF by adopting a majority weighted voting method, wherein when the output result of the classifier is +1, the classification result is represented as a normal classification, namely the stenosis degree is lower than 50%, and when the output result of the classifier is-1, the classification result is represented as a lesion classification, namely the stenosis degree is higher than 50%; and (3) multiplying the classification result of each classifier by the corresponding weight obtained in the step 5.1, and then adding the three products to be used as final output to obtain the classification result of the multi-classifier fusion prediction model, wherein the classification result is judged to be a normal classification when the addition result is a positive number, and is judged to be a lesion classification when the addition result is a negative number.
8. The method for identifying the stenosis degree of coronary artery according to claim 7, wherein the weight of each classifier in the step 5.1 is determined according to the classification accuracy, and the accuracy calculation formula of the classification model is as follows:
[…]
wherein a = 1, 2, 3 indexes the classifiers; n ∈ {Narrow, Non-narrow}; $e'_n$ is the cumulative number of times that […]; $e_n$ is the cumulative number of samples classified as normal or abnormal; $y_a \in \{+1, -1\}$ is the label of the training sample, and […] denotes the classification result of each model;
the weight $w_a$ of each model is calculated as: […], wherein […];
step 5.2, multiplying the result of each model by its corresponding weight and summing the products to obtain the final output: […]
when the output result is positive, the classification result is indicated as a normal classification, namely the stenosis degree is lower than 50%, and when the output result is negative, the classification result is indicated as a lesion classification, namely the stenosis degree is higher than 50%.
Background
In recent years, the incidence and fatality rates of cardiovascular and cerebrovascular diseases have risen to first place among all diseases. The coronary arteries lie on the surface of the heart and supply blood to the myocardium, so once a lesion occurs, its examination and treatment deserve particular attention. Coronary artery disease can be diagnosed from anatomical parameters (e.g., diameter stenosis) or from functional parameters associated with myocardial ischemia. In clinical practice, to diagnose coronary heart disease and determine a treatment plan, a patient usually first undergoes a cardiac CTA examination, and the doctor makes a preliminary diagnosis from the CT images. This diagnosis is subjective and can only grade the stenosis as mild, moderate or severe. To determine the stenosis rate more precisely, the patient must undergo coronary angiography (DSA), which provides the diagnostic gold standard on which the treatment plan is based. However, coronary angiography is an invasive examination that requires contrast agent injection and the insertion of a pressure guide wire, so it is traumatic and patients are prone to adverse reactions during and after the procedure. Noninvasive detection has therefore become a research hotspot: for patients, it avoids the pain and risk of an invasive procedure; for doctors, it can greatly improve diagnostic efficiency and accuracy.
Radiomics mines hidden information from two-dimensional CT images and builds associations between image features and diseases, so that lesions can be identified and predicted in vitro. It has produced important results in the diagnosis and treatment of many diseases, and medicine is developing in a more intelligent and convenient direction; diagnosing disease with such frontier techniques is also a focus of future research.
Existing methods for diagnosing lesions related to coronary stenosis include the in vivo method used in medical practice, coronary angiography with a pressure guide wire, and the later noninvasive methods based on fluid mechanics and on deep learning. These methods still have problems that need further study, mainly in the following two respects:
The fluid mechanics simulation and modeling process is complex: a patient-specific coronary artery model must be built for hemodynamic simulation, so each data set requires its own model. Blood flow in the human body is complex and some factors may be left out; for example, in transient simulation the pressure and velocity waveforms can be taken from the literature, but a more accurate individualized simulation requires in vitro or in vivo measurement. In addition, the images must be processed and reconstructed during modeling, which places higher demands on the processor, increases the computational complexity, and lengthens the computation time to as much as several hours.
Deep learning algorithms overfit easily: real medical data are generally hard to obtain, so the task usually falls in the small-sample learning regime, whereas a deep neural network needs a large number of image samples for training because of its complex model structure. An algorithm with strong expressive power concentrates on explaining the training data and thereby tends to sacrifice its ability to explain future data, i.e., the test data. Avoiding overfitting usually requires more data samples so that good performance is retained on new data sets, which makes such algorithms unsuitable when few image samples are available.
Therefore, building on machine learning, the invention uses small-sample classifiers suited to the scarcity of medical coronary CT images, avoiding the tendency of conventional deep learning algorithms to overfit. Specifically, coronary vessels are segmented from the CTA images and the stenosis is classified directly, without invasive examination, using a multi-classifier fusion diagnosis method.
Disclosure of Invention
The invention aims to provide a coronary artery stenosis degree identification method based on multi-classifier fusion, which realizes automatic classification and prejudgment on stenosis degree judgment and avoids the injury of invasive surgery to a patient.
The technical scheme adopted by the invention is that a coronary artery stenosis degree identification method based on multi-classifier fusion is implemented according to the following steps:
step 1, constructing an image sample library;
step 2, carrying out denoising and segmentation binarization processing on a CT original sequence diagram extracted from cardiac CTA to obtain a coronary artery extraction diagram;
step 3, extracting features from the segmented image, namely three major categories of features of interest: texture features, gray-level features and geometric features;
step 4, dividing the 500 samples into a training group and a testing group at a ratio of 7:3 using a random index method, screening the three categories of radiomics features (texture, gray level and geometry) extracted in step 3 with a multi-class ReliefF feature weighting algorithm, applying cross validation to the random features, calculating the correlation between each feature and the prediction result, and removing the features with low correlation;
step 5, forming a feature set by the features of the texture, the gray scale and the geometry selected in the step 4, establishing a multi-classifier fusion prediction model, and selecting three classifiers, namely a Support Vector Machine (SVM), a Random Forest (RF) and an Extreme Learning Machine (ELM), which have a good medical image classification effect to perform fusion prediction on the coronary lesion degree; and determining the weight of the 3 classifiers in the fusion classifier by adopting a weighting method, and judging as a normal sample when the stenosis degree is lower than 50%, and judging as a lesion sample when the stenosis degree is greater than 50%.
The present invention is also characterized in that,
the step 1 is as follows:
collecting the information and images of patients who underwent both cardiac CTA and DSA examinations, as recorded in a hospital data system over three years; matching each CT image to its coronary stenosis index data; hiding the patients' basic information in the images; selecting 500 coronary CT images that meet the image quality requirements as the input samples; and labeling each sample with its class.
The step 2 is as follows:
step 2.1, sorting all pixels in the neighborhood of a Gaussian noise point in the original CT image by gray value, and taking the middle (median) gray value as the gray value of the noise point so as to reduce the image noise, the expression being:
$$g_{ij} = \underset{A}{\operatorname{med}}\,\{f_{ij}\}$$
wherein i, j are the coordinates of the pixel, $g_{ij}$ is the gray value assigned to the noise point, A is the neighborhood taken around the noise point, $\{f_{ij}\}$ is the data sequence of gray values in A, and med denotes the median operation.
The image quality of the CT image can be improved through denoising processing, and meanwhile, the denoised image can more clearly embody the coronary artery structure information in the CT image, and the segmentation operation in the step 2.2 is facilitated to be promoted.
Step 2.2, letting R denote the whole image, dividing the whole denoised CT image R into c sub-regions that simultaneously satisfy conditions ① to ④:
① $\bigcup_{x=1}^{c} R_x = R$, where each $R_x$ is a connected sub-region;
② $R_x \cap R_y = \varnothing$ for any x, y with x ≠ y;
③ $P(R_x) = \text{True}$ for x = 1, 2, ..., c;
④ $P(R_x \cup R_y) = \text{False}$ for x ≠ y;
The coronary artery blood vessels with continuous regions are extracted from the original CT image through a region growing segmentation algorithm, namely a coronary artery extraction image is obtained.
The step 3 is as follows:
3.1, extracting the gray features of the mean value, the variance, the energy, the entropy, the kurtosis and the skewness of the coronary artery extraction image in the step 2 by adopting a gray histogram method;
3.2, constructing a gray level co-occurrence matrix, using a 5 × 5 sliding window to calculate the gray-level feature value at each pixel of the coronary artery extraction map of step 2, and extracting the texture features of the image;
3.3, extracting the geometric features of the coronary artery image with the Hu invariant moment method, based on the coronary artery extraction map obtained in step 2: first calculating the second-order and third-order central moments of the coronary artery image, then normalizing them to obtain a set of invariant moments, which describes the geometric shape features of the coronary artery extraction map.
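As an illustration of steps 3.1 and 3.2, the following is a minimal sketch in plain Python, not the patented implementation. It assumes the six gray-level features are computed from the normalized histogram, and that the texture features are taken from a co-occurrence matrix built for a single horizontal offset inside one 5 × 5 window. The function names (`histogram_features`, `glcm`, `glcm_contrast`, `glcm_energy`) and the choice of contrast and energy as texture descriptors are hypothetical.

```python
import math
from collections import Counter

def histogram_features(pixels):
    """Gray-level statistics computed from the normalized histogram."""
    n = len(pixels)
    prob = {g: count / n for g, count in Counter(pixels).items()}
    mean = sum(g * p for g, p in prob.items())
    variance = sum((g - mean) ** 2 * p for g, p in prob.items())
    std = math.sqrt(variance)

    def standardized_moment(order):
        if std == 0:
            return 0.0  # flat image: report 0 instead of dividing by zero
        return sum((g - mean) ** order * p for g, p in prob.items()) / std ** order

    return {
        "mean": mean,
        "variance": variance,
        "energy": sum(p * p for p in prob.values()),
        "entropy": -sum(p * math.log2(p) for p in prob.values()),
        "skewness": standardized_moment(3),
        "kurtosis": standardized_moment(4),
    }

def glcm(window, levels, offset=(0, 1)):
    """Normalized gray level co-occurrence matrix of one window."""
    dr, dc = offset
    rows, cols = len(window), len(window[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[window[r][c]][window[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]

def glcm_contrast(p):
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))

def glcm_energy(p):
    return sum(v * v for row in p for v in row)

gray = histogram_features([0, 0, 2, 2])
stripes = [[0, 1, 0, 1, 0] for _ in range(5)]  # one 5 x 5 window, two gray levels
p = glcm(stripes, levels=2)
texture = {"contrast": glcm_contrast(p), "energy": glcm_energy(p)}
```

In a full pipeline the window would slide over every pixel of the coronary artery extraction map, and the gray values would first be quantized to a small number of levels.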
The step 4 is as follows:
4.1, using the ReliefF feature weighting algorithm to select, from all the texture, gray-level and geometric radiomics features extracted in step 3, the d features with the highest correlation, and forming d feature subsets that contain, in sequence, from 1 to d features;
step 4.2, performing ten-fold cross validation: dividing the sample set into 10 subsets, using one subset as the test set and the other 9 as the training set each time, repeating 10 times, and taking the average recognition accuracy over the 10 runs as the result;
4.3, calculating the prediction error rate of each feature subset by the above process, and selecting the feature subset with the lowest prediction error rate as the input features of the multi-classifier fusion prediction model in step 5.
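The ten-fold cross validation of step 4.2 can be sketched as follows. This is an illustrative outline only: `k_fold_indices`, `cross_validate` and the `train_and_score` callback are hypothetical names, and the callback stands in for training a classifier on the training indices and returning its accuracy on the test indices.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Use each fold once as the test set and average the k scores."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k

def dummy_score(train_idx, test_idx):
    # Placeholder scorer: 1.0 when training and test indices do not overlap.
    return 1.0 if not set(train_idx) & set(test_idx) else 0.0

folds = k_fold_indices(500)
mean_score = cross_validate(500, dummy_score)
```

With 500 samples each fold holds 50 indices, so every run trains on 450 samples and tests on the remaining 50.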
The ReliefF feature weighting algorithm in step 4.1 is specifically as follows:
one sample S is randomly drawn from the training sample set each time; k nearest-neighbor samples $H_l$ are then found among the samples of the same class as S, and k nearest-neighbor samples $M_l$ among the samples of each different class; the weight of each of the texture, gray-level and geometric features extracted in step 3 is then updated, and features whose weight is smaller than a set threshold are eliminated, the feature weight being calculated as:
$$W(A) = W(A) - \sum_{l=1}^{k} \frac{\operatorname{diff}(A, S, H_l)}{mk} + \sum_{C \neq \operatorname{class}(S)} \left[ \frac{p(C)}{1 - p(\operatorname{class}(S))} \sum_{l=1}^{k} \frac{\operatorname{diff}(A, S, M_l(C))}{mk} \right]$$
in the above formula, m is the number of sampling iterations; k is the number of nearest-neighbor samples, with l = 1, 2, ..., k; $\operatorname{diff}(A, S, H_l)$ denotes the difference between sample S and sample $H_l$ on feature A; C is a sample class; p(C) is the proportion of samples of class C in the total number of samples; and p(class(S)) is the proportion of samples of the same class as S in the total number of samples.
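A compact sketch of a ReliefF-style weight update is given below for illustration. It assumes the standard ReliefF rule (near hits lower a feature's weight, near misses raise it, with misses scaled by p(C) / (1 − p(class(S)))), numeric features, and a range-normalized absolute difference as diff; the distance used to find neighbors is the sum of these differences. The function name `relieff` and the toy data are hypothetical.

```python
import random

def relieff(X, y, m, k, seed=0):
    """Return one weight per feature after m random sampling iterations."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    # Value range of each feature, used to scale diff() into [0, 1].
    spans = [(max(col) - min(col)) or 1.0 for col in zip(*X)]
    prior = {c: y.count(c) / n for c in set(y)}

    def diff(a, s, t):
        return abs(X[s][a] - X[t][a]) / spans[a]

    def nearest(s, candidates):
        def distance(t):
            return sum(diff(a, s, t) for a in range(n_features))
        return sorted(candidates, key=distance)[:k]

    W = [0.0] * n_features
    for _ in range(m):
        s = rng.randrange(n)
        # Near hits: closest samples of the same class as S.
        hits = nearest(s, [t for t in range(n) if t != s and y[t] == y[s]])
        for a in range(n_features):
            W[a] -= sum(diff(a, s, h) for h in hits) / (m * k)
        # Near misses: closest samples of every other class C.
        for c in prior:
            if c == y[s]:
                continue
            misses = nearest(s, [t for t in range(n) if y[t] == c])
            scale = prior[c] / (1 - prior[y[s]])
            for a in range(n_features):
                W[a] += scale * sum(diff(a, s, q) for q in misses) / (m * k)
    return W

# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.9, 0.4], [1.0, 0.8], [0.8, 0.6]]
y = [1, 1, 1, -1, -1, -1]
weights = relieff(X, y, m=20, k=2)
```

On this toy set the informative feature receives the larger weight, so a threshold placed between the two weights would keep feature 0 and eliminate feature 1.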
The step 5 is as follows:
step 5.1, firstly, the feature sample set screened out in the step 4 is respectively passed through three single classifiers of a Support Vector Machine (SVM), an Extreme Learning Machine (ELM) and a Random Forest (RF) to obtain the identification result of each classifier on the stenosis degree of the coronary artery, namely 3 categories obtained by the classification and prediction of the samples to be identified by each classifier, and the weight occupied by each single classifier in a final multi-classifier fusion prediction model is calculated according to the correct classification capability of each classifier;
step 5.2, fusing the classification results of the Support Vector Machine (SVM), the Extreme Learning Machine (ELM) and the Random Forest (RF) by adopting a majority weighted voting method, wherein when the output result of the classifier is +1, the classification result is represented as a normal classification, namely the stenosis degree is lower than 50%, and when the output result of the classifier is-1, the classification result is represented as a lesion classification, namely the stenosis degree is higher than 50%; and (3) multiplying the classification result of each classifier by the corresponding weight obtained in the step 5.1, and then adding the three products to be used as final output to obtain the classification result of the multi-classifier fusion prediction model, wherein the classification result is judged to be a normal classification when the addition result is a positive number, and is judged to be a lesion classification when the addition result is a negative number.
The weight occupied by each classifier in the step 5.1 is determined according to the classification accuracy, and the accuracy calculation formula of the classification model is as follows:
[…]
wherein a = 1, 2, 3 indexes the classifiers; n ∈ {Narrow, Non-narrow}; $e'_n$ is the cumulative number of times that […]; $e_n$ is the cumulative number of samples classified as normal or abnormal; $y_a \in \{+1, -1\}$ is the label of the training sample, and […] denotes the classification result of each model;
the weight $w_a$ of each model is calculated as: […], wherein […];
step 5.2, multiplying the result of each model by its corresponding weight and summing the products to obtain the final output: […]
when the output result is positive, the classification result is indicated as a normal classification, namely the stenosis degree is lower than 50%, and when the output result is negative, the classification result is indicated as a lesion classification, namely the stenosis degree is higher than 50%.
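The weighted voting of step 5.2 can be illustrated with the short sketch below. The weighting rule here is an assumption made for the example: each classifier's weight is taken as its accuracy divided by the sum of the three accuracies, which is not necessarily the weight formula of the invention. The function name `fuse` and the accuracy values are hypothetical.

```python
def fuse(predictions, accuracies):
    """Weighted vote over classifier outputs in {+1, -1}.

    Returns (label, score): +1 means normal (stenosis below 50%),
    -1 means lesion (stenosis above 50%).
    """
    total = sum(accuracies)
    # Assumed weighting: accuracy normalized so the weights sum to 1.
    weights = [acc / total for acc in accuracies]
    score = sum(w * p for w, p in zip(weights, predictions))
    # A score of exactly 0 is not covered by the text; it is treated as lesion here.
    label = 1 if score > 0 else -1
    return label, score

# Outputs of SVM, ELM and RF for one sample, with hypothetical accuracies.
label, score = fuse(predictions=[+1, -1, -1], accuracies=[0.90, 0.80, 0.70])
```

In this example the most accurate classifier votes normal but is outweighed by the other two, so the fused score is negative and the sample is judged a lesion.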
The coronary artery stenosis degree identification method based on multi-classifier fusion has the following advantages: the CTA (computed tomography angiography) and DSA (digital subtraction angiography) images and diagnosis reports of existing patients can be matched to each other, and a multi-classifier fusion prediction model can be built by machine learning directly from the cardiac CTA results, so that the gold-standard index of a patient's degree of coronary stenosis can be predicted and a treatment plan determined. Because classification and prediction are performed in vitro, the adverse reactions and trauma of invasive coronary angiography are avoided and the patient need not undergo a separate coronary angiography, which widens the applicability of coronary lesion diagnosis. In addition, fusing multiple classifiers combines the strengths of each classifier, giving good prediction accuracy and speed and improving the diagnostic efficiency of clinicians.
Drawings
FIG. 1 is a schematic diagram of a multi-classifier fusion-based prediction model framework according to the present invention;
FIG. 2 is a flow chart of the structure of the multi-classifier fusion-based prediction model according to the present invention;
FIG. 3 is a schematic diagram of a region growing segmentation algorithm employed in the present invention;
FIG. 4 is a flow chart of feature extraction according to the present invention;
FIG. 5 is a flow chart of feature screening according to the present invention;
FIG. 6 is a flow chart of various classifier implementations used in the present invention;
FIG. 7 is a schematic diagram of the construction of the fusion classifier network of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
With noninvasive identification of the degree of coronary stenosis as its starting point, the invention provides a machine learning method that automatically identifies the degree of disease with a fusion classifier, using two-dimensional CT images and comparing against the gold-standard index so as to improve clinical diagnostic efficiency. As shown in FIG. 1, the overall framework consists of six basic modules: sample library construction, image preprocessing, feature extraction, feature screening, fusion classifier model construction, and experimental verification. These can be grouped into two main stages, sample acquisition and modeling. In the sample acquisition stage, all processing of the training samples is completed; in the modeling stage, a machine learning model is built and the structure and parameters of the classifiers are determined. Finally, the effectiveness of the proposed method is verified and evaluated. It should be noted that the invention is described for this application but is not limited to it, and is also applicable to the diagnosis of other diseases.
The coronary artery stenosis degree identification method based on multi-classifier fusion of the invention is implemented, with reference to FIG. 1 and FIG. 2, according to the following steps:
step 1, constructing an image sample library;
the step 1 is as follows:
collecting the information and images of patients who underwent both cardiac CTA and DSA examinations, as recorded in a hospital data system over three years; matching each CT image to its coronary stenosis index data; hiding the patients' basic information in the images; selecting 500 coronary CT images that meet the image quality requirements as the input samples; and labeling each sample with its class.
Step 2, carrying out denoising and segmentation binarization processing on a CT original sequence diagram extracted from cardiac CTA to obtain a coronary artery extraction diagram;
the step 2 is as follows:
step 2.1, since the target object is a CT image and the noise introduced in such medical images is mainly Gaussian noise, median filtering is adopted, which achieves a good denoising effect on this noise: all pixels in the neighborhood of a Gaussian noise point in the original CT image are sorted by gray value, and the middle (median) gray value is taken as the gray value of the noise point so as to reduce the image noise, the expression being:
$$g_{ij} = \underset{A}{\operatorname{med}}\,\{f_{ij}\}$$
wherein i, j are the coordinates of the pixel, $g_{ij}$ is the gray value assigned to the noise point, A is the neighborhood taken around the noise point, $\{f_{ij}\}$ is the data sequence of gray values in A, and med denotes the median operation.
The image quality of the CT image can be improved through denoising processing, and meanwhile, the denoised image can more clearly embody the coronary artery structure information in the CT image, and the segmentation operation in the step 2.2 is facilitated to be promoted.
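The median filtering of step 2.1 can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a square neighborhood A of radius 1 (3 × 3) applied to every pixel, with border pixels using only the neighbors that fall inside the image; the function name `median_filter` and the sample image are hypothetical.

```python
from statistics import median

def median_filter(image, radius=1):
    """Replace each pixel with the median gray value of its neighborhood A."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Collect the gray values f_ij inside the (clipped) neighborhood.
            window = [
                image[r][c]
                for r in range(max(0, i - radius), min(rows, i + radius + 1))
                for c in range(max(0, j - radius), min(cols, j + radius + 1))
            ]
            out[i][j] = median(window)
    return out

noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
denoised = median_filter(noisy)
```

The isolated bright pixel in the center is replaced by the median of its neighborhood, while the uniform background is left unchanged.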
Step 2.2, because of the shape characteristics of blood vessels, the region to be extracted differs markedly from its surroundings, so the image is segmented with a region-growing algorithm, whose idea is to assign all pixels that share certain similar characteristics to the same region. First, for each region of the image to be segmented, a seed point is selected as the starting point of region growing. Pixels in the neighborhood of the seed whose characteristics are the same as or similar to those of the seed are merged into the seed's region according to a preset growth criterion. The newly merged pixels then serve as seeds and growth continues in the same way until the whole image has been traversed; when no remaining pixel satisfies the preset criterion and can be merged into a seed region, the region growing segmentation is finished.
The region growing segmentation algorithm segments connected regions with the same characteristics well and provides good boundary information. Letting R denote the whole image, the segmentation can be regarded as dividing the whole denoised CT image R into c sub-regions that simultaneously satisfy conditions ① to ④:
① $\bigcup_{x=1}^{c} R_x = R$, where each $R_x$ is a connected sub-region;
② $R_x \cap R_y = \varnothing$ for any x, y with x ≠ y;
③ $P(R_x) = \text{True}$ for x = 1, 2, ..., c;
④ $P(R_x \cup R_y) = \text{False}$ for x ≠ y;
The region growing segmentation algorithm is a process of aggregating pixels or sub-regions into a larger region according to a predefined criterion, and the coronary artery blood vessels with continuous regions are extracted from the original CT image through the region growing segmentation algorithm, i.e. a coronary artery extraction map is obtained.
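A minimal sketch of region growing from a single seed is shown below. It assumes 4-connectivity and a simple growth criterion, namely that a pixel joins the region when its gray value differs from the seed's gray value by at most a tolerance; the actual criterion used for coronary vessels may differ. The function name `region_grow` and the sample image are hypothetical.

```python
from collections import deque

def region_grow(image, seed, tolerance):
    """Grow a 4-connected region from `seed` and return it as a binary mask."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    mask = [[0] * cols for _ in range(rows)]
    mask[seed[0]][seed[1]] = 1
    queue = deque([seed])
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = i + di, j + dj
            if 0 <= r < rows and 0 <= c < cols and not mask[r][c]:
                # Growth criterion: gray value close to that of the seed.
                if abs(image[r][c] - seed_value) <= tolerance:
                    mask[r][c] = 1
                    queue.append((r, c))
    return mask

image = [
    [200, 210, 20, 30],
    [205, 20, 25, 30],
    [198, 202, 20, 215],
]
vessel_mask = region_grow(image, seed=(0, 0), tolerance=15)
```

The bright pixel in the bottom-right corner satisfies the gray-value criterion but is not connected to the seed region, so it stays outside the mask; the mask is therefore already the binarized extraction of one connected region.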
Step 3, extracting the features of the segmented image, and extracting three major features of interest, namely texture features, gray features and geometric features according to the characteristics of the medical image as shown in fig. 3;
the step 3 is as follows:
Step 3.1, extracting the gray features of the coronary artery extraction image obtained in step 2, namely mean, variance, energy, entropy, kurtosis and skewness, by the gray histogram method;
Step 3.2, constructing a gray level co-occurrence matrix, computing the gray-level feature value of each pixel of the coronary artery extraction image obtained in step 2 with a 5 × 5 sliding window, and extracting the texture features of the image;
Step 3.3, extracting the geometric features of the coronary artery extraction image obtained in step 2 by the Hu invariant moment method. In statistics, moments describe the distribution of a random variable; extended to images, if the gray values of the image are regarded as a density distribution function, moments can be used to extract image features. The Hu invariant moments characterize the geometry of an image region: the second-order and third-order central moments of the coronary image are calculated first and then normalized to obtain the group of invariant moments, which describes the shape of the coronary extraction image.
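As an illustration of steps 3.1 and 3.3, the sketch below computes the six gray-histogram statistics and the first two Hu invariants from normalized central moments. It is a sketch under assumptions: the function names are hypothetical, kurtosis is taken as the non-excess fourth standardized moment, entropy uses base-2 logarithms, the image is assumed non-constant, and only the two second-order invariants are shown (the full Hu set also uses third-order moments).

```python
import math
from collections import Counter

def histogram_features(pixels):
    """Gray-histogram statistics of a flat list of gray values."""
    n = len(pixels)
    probs = {g: cnt / n for g, cnt in Counter(pixels).items()}  # normalized histogram
    mean = sum(g * p for g, p in probs.items())
    var = sum((g - mean) ** 2 * p for g, p in probs.items())
    skew = sum((g - mean) ** 3 * p for g, p in probs.items()) / var ** 1.5
    kurt = sum((g - mean) ** 4 * p for g, p in probs.items()) / var ** 2
    energy = sum(p * p for p in probs.values())
    entropy = -sum(p * math.log2(p) for p in probs.values())
    return {"mean": mean, "variance": var, "skewness": skew,
            "kurtosis": kurt, "energy": energy, "entropy": entropy}

def hu_first_two(image):
    """First two Hu invariants of a 2-D image given as a list of rows."""
    def raw(p, q):
        return sum((x ** p) * (y ** q) * v
                   for y, row in enumerate(image) for x, v in enumerate(row))
    m00 = raw(0, 0)
    cx, cy = raw(1, 0) / m00, raw(0, 1) / m00  # centroid

    def eta(p, q):
        # Central moment, normalized by m00^(1 + (p + q) / 2).
        mu = sum(((x - cx) ** p) * ((y - cy) ** q) * v
                 for y, row in enumerate(image) for x, v in enumerate(row))
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

feats = histogram_features([0, 0, 1, 1])
shape_a = hu_first_two([[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
shape_b = hu_first_two([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]])
```

`shape_a` and `shape_b` describe the same two-pixel segment at different positions, so their invariants coincide, which is the property that makes these moments usable as geometric features.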
Step 4, the 500 samples are divided into a training group and a testing group at a ratio of 7:3 by a random index method; the three classes of radiomics features (texture, gray level and geometry) extracted in step 3 are screened with the multi-class ReliefF feature weighting algorithm; cross validation is adopted for the random features, the correlation between each feature and the prediction result is calculated, and the features with small correlation are removed;
the step 4 is as follows:
Step 4.1, ranking all texture, gray-level and geometric radiomics features extracted in step 3 with the ReliefF feature weighting algorithm and selecting the d features with the highest correlation to form d feature subsets, where the i-th subset (i = 1, ..., d) contains the top i ranked features;
Step 4.2, performing ten-fold cross validation: the sample set is divided into 10 subsets, each subset in turn serves as the test set while the other 9 serve as the training set, and the average recognition accuracy over the 10 runs is taken as the result;
Step 4.3, calculating the prediction error rate of each feature subset by the above procedure, and selecting the feature subset with the minimum prediction error rate as the input features of the multi-classifier fusion prediction model in step 5.
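The data handling in step 4 can be sketched as follows. The function names, the fixed random seed and the `error_rate` callback (standing in for the cross-validated error of a classifier trained on a given subset) are hypothetical; the text does not specify how the random index is generated.

```python
import random

def split_7_3(n_samples, seed=0):
    """Shuffle sample indices and split them 7:3 into training and testing groups."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = int(round(0.7 * n_samples))
    return idx[:cut], idx[cut:]

def ten_fold(indices, k=10):
    """Yield (train, test) index lists; each fold serves once as the test set."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def pick_best_subset(ranked_features, error_rate):
    """Build the nested subsets (top 1, top 2, ..., top d) and keep the one
    with the smallest error; `error_rate` is a hypothetical callback."""
    subsets = [ranked_features[:i] for i in range(1, len(ranked_features) + 1)]
    return min(subsets, key=error_rate)

train_idx, test_idx = split_7_3(500)
folds = list(ten_fold(train_idx))
# Toy error function whose minimum is reached with three features.
best = pick_best_subset(["f1", "f2", "f3", "f4", "f5"], lambda s: abs(len(s) - 3))
```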
The ReliefF feature weighting algorithm in step 4.1 is as follows:
As shown in the flowchart of fig. 4, features are the basis of machine learning, but redundancy and correlation among features reduce classification accuracy. In small-sample learning in particular, too many features not only increase model complexity but also reduce the generalization ability of the model to some extent. The features extracted in step 3 are therefore selected with the ReliefF feature weighting algorithm, which assigns a weight to each feature; features whose weight is below a set threshold are eliminated.
In each iteration a sample $S$ is drawn at random from the training sample set, and its $k$ nearest neighbors are found among the samples of the same class, denoted $H_l$, and among the samples of each different class $C$, denoted $M_l(C)$, $l = 1, \ldots, k$. The weight of each texture, gray-level and geometric feature extracted in step 3 is then updated, and features whose weight is below the set threshold are eliminated. The feature weight is calculated as follows:
$$W(A) = W(A) - \sum_{l=1}^{k} \frac{\mathrm{diff}(A, S, H_l)}{mk} + \sum_{C \neq \mathrm{class}(S)} \left[ \frac{p(C)}{1 - p(\mathrm{class}(S))} \sum_{l=1}^{k} \frac{\mathrm{diff}(A, S, M_l(C))}{mk} \right]$$
In the above equation, $m$ is the number of sampling iterations, $k$ is the number of nearest-neighbor samples, $l = 1, \ldots, k$, $\mathrm{diff}(A, S, H_l)$ denotes the difference between sample $S$ and sample $H_l$ on feature $A$, $C$ is a sample class, $p(C)$ is the proportion of samples of class $C$ in the total number of samples, and $p(\mathrm{class}(S))$ is the proportion of samples of the class of $S$ in the total number of samples.
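The weight update described above can be sketched in a few lines. This follows the standard ReliefF update and is only an illustration: the difference measure (absolute difference divided by the feature range), the L1 neighbor distance and the function name `relieff` are assumptions not stated in the text.

```python
import random

def relieff(X, y, m, k, seed=0):
    """Return one ReliefF weight per feature for samples X with labels y."""
    n, d = len(X), len(X[0])
    # Feature ranges used to scale differences into [0, 1] (assumed diff measure).
    span = [(max(r[a] for r in X) - min(r[a] for r in X)) or 1.0 for a in range(d)]

    def diff(a, i, j):
        return abs(X[i][a] - X[j][a]) / span[a]

    def dist(i, j):
        return sum(diff(a, i, j) for a in range(d))

    prior = {c: y.count(c) / n for c in set(y)}  # p(C)
    w = [0.0] * d
    rng = random.Random(seed)
    for _ in range(m):
        s = rng.randrange(n)  # randomly drawn sample S
        for c in prior:
            # k nearest neighbors of S within class c (hits if c == class(S)).
            near = sorted((j for j in range(n) if j != s and y[j] == c),
                          key=lambda j: dist(s, j))[:k]
            coef = -1.0 if c == y[s] else prior[c] / (1.0 - prior[y[s]])
            for j in near:
                for a in range(d):
                    w[a] += coef * diff(a, s, j) / (m * k)
    return w

# Feature 0 separates the classes; feature 1 is distributed identically in both.
X = [[0.0, 0.0], [0.1, 0.5], [0.2, 1.0], [0.8, 0.0], [0.9, 0.5], [1.0, 1.0]]
y = [1, 1, 1, -1, -1, -1]
weights = relieff(X, y, m=20, k=2)
```

The discriminative feature receives a clearly positive weight while the uninformative one does not, so a threshold on the weights removes the latter.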
Step 5, forming a feature set by the features of the texture, the gray scale and the geometry selected in the step 4, establishing a multi-classifier fusion prediction model, and selecting three classifiers, namely a Support Vector Machine (SVM), a Random Forest (RF) and an Extreme Learning Machine (ELM), which have a good medical image classification effect to perform fusion prediction on the coronary lesion degree; and determining the weight of the 3 classifiers in the fusion classifier by adopting a weighting method so as to ensure that the prediction effect is optimal, judging as a normal sample when the stenosis degree is lower than 50%, and judging as a lesion sample when the stenosis degree is higher than 50%.
The step 5 is as follows:
as shown in the topological structure of the classifier in fig. 7, a Support Vector Machine (SVM), an Extreme Learning Machine (ELM), and a Random Forest (RF) are selected to form a fusion classifier to classify a sample set, and the principles of each classifier are as follows:
As shown in fig. 6(a), the basic principle of the Support Vector Machine (SVM) is to find the optimal hyperplane that separates samples of different classes. Solving for it is equivalent to a convex quadratic programming problem: an objective function is defined and optimized subject to constraint conditions. The SVM avoids the curse of dimensionality, is robust and generalizes well. Its classification performance is influenced by several factors, of which two are key: 1) the error penalty parameter C; 2) the form of the kernel function and its parameter g. The error penalty parameter balances the confidence range against the empirical risk in the feature subspace so that the generalization ability of the learning machine is optimal. The radial basis kernel function is nonlinear, has few parameters and maps the original features into an infinite-dimensional space, so the application selects the radial basis kernel function as the kernel function of the support vector machine.
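For reference, the radial basis kernel and its parameter g can be written out directly. The sketch assumes the common form K(x, z) = exp(-g·||x - z||²); the function names and the candidate values in `param_grid` are hypothetical and only indicate that C and g are the two quantities to tune.

```python
import math

def rbf_kernel(x, z, g):
    """Radial basis kernel K(x, z) = exp(-g * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-g * sq_dist)

def gram_matrix(samples, g):
    """Kernel matrix over all sample pairs, as consumed by an SVM solver."""
    return [[rbf_kernel(a, b, g) for b in samples] for a in samples]

# Hypothetical candidate values for the penalty parameter C and kernel parameter g.
param_grid = [(C, g) for C in (0.1, 1.0, 10.0) for g in (0.01, 0.1, 1.0)]

K = gram_matrix([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], g=0.5)
```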
As shown in fig. 6(b), the basic structure of the extreme learning machine (ELM) is a single-hidden-layer neural network, which generalizes better and learns faster than a conventional BP neural network. In brief, the network structure of the ELM model is the same as that of a single-hidden-layer feedforward neural network (SLFN), but training no longer relies on the gradient-based trial and error (back propagation) of conventional neural networks: the input-layer weights and biases are assigned at random, and the output-layer weights are calculated by generalized inverse matrix theory. Training of the ELM is complete once the weights and biases of all network nodes have been obtained; when test data arrive, the network output is computed with the output-layer weights just obtained to complete the prediction. In the implementation, the inputs are the data set, the number of hidden-layer neurons and the activation function, and the output is the weight β; the input weights and hidden-layer biases are generated at random, and the hidden-layer output and the output-layer weights are then calculated.
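The ELM training procedure can be sketched with the standard library alone. Note the substitution: instead of the Moore-Penrose generalized inverse named in the text, this sketch solves the ridge-regularized normal equations (HᵀH + λI)β = Hᵀy, which approaches the generalized-inverse solution as λ tends to zero. The sigmoid activation, the uniform(-1, 1) initialization, the `ridge` value and all function names are assumptions.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def hidden(x, W, b):
    """Hidden-layer output for one sample (sigmoid activation, assumed)."""
    return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + bi)))
            for w, bi in zip(W, b)]

def train_elm(X, y, n_hidden, ridge=1e-3, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    # Random input weights and hidden biases; these are never trained.
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = [hidden(x, W, b) for x in X]
    # Output weights beta from (H^T H + ridge * I) beta = H^T y.
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(h[i] * t for h, t in zip(H, y)) for i in range(n_hidden)]
    return W, b, solve(A, rhs)

def predict_elm(x, model):
    W, b, beta = model
    out = sum(h * bt for h, bt in zip(hidden(x, W, b), beta))
    return 1 if out >= 0 else -1

X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
y = [1, 1, -1, -1]
model = train_elm(X, y, n_hidden=6)
preds = [predict_elm(x, model) for x in X]
```

Only the output weights are fitted, in a single linear solve, which is why training is fast compared with back propagation.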
As shown in fig. 6(c), the inputs of the random forest (RF) are the training data set and the number of sample subsets, and the output is the final strong classifier. It is widely applicable in machine learning and needs no complex parameter tuning. Normally only one tree can be built from one data set; with the idea of bootstrap aggregating (bagging), multiple interrelated data subsets are drawn from the same data set to build multiple sub-trees, and the final classification is determined by voting over the classification results of the decision trees.
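The two random-forest ingredients named above, drawing data subsets from one data set and voting over tree outputs, can be sketched as follows. Tree construction itself is omitted, and the function names and the fixed seed are hypothetical.

```python
import random
from collections import Counter

def bootstrap_subsets(n_samples, n_subsets, seed=0):
    """Draw `n_subsets` index lists by sampling with replacement (bagging)."""
    rng = random.Random(seed)
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_subsets)]

def majority_vote(tree_predictions):
    """Return the class predicted by most trees for one sample."""
    return Counter(tree_predictions).most_common(1)[0][0]

subsets = bootstrap_subsets(n_samples=10, n_subsets=5)
label = majority_vote([-1, -1, 1, -1, 1])
```

Using an odd number of trees avoids ties in a two-class vote; in a tie this sketch returns whichever class appears first in the list.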
Step 5.1, as shown in the fusion schematic diagram of fig. 7, the feature sample set screened in step 4 is first passed through the three single classifiers, namely the Support Vector Machine (SVM), the Extreme Learning Machine (ELM) and the Random Forest (RF), to obtain the recognition result of each classifier on the stenosis degree of the coronary artery, that is, the 3 class predictions produced by the classifiers for the sample to be recognized; the weight of each single classifier in the final multi-classifier fusion prediction model is calculated according to its classification accuracy;
Step 5.2, the classification results of the Support Vector Machine (SVM), the Extreme Learning Machine (ELM) and the Random Forest (RF) are fused by weighted majority voting. An output of +1 from a classifier represents the normal class, namely a stenosis degree lower than 50%, and an output of -1 represents the lesion class, namely a stenosis degree higher than 50%. The classification result of each classifier is multiplied by the corresponding weight obtained in step 5.1, and the three products are added to give the final output of the multi-classifier fusion prediction model: the sample is judged to be of the normal class when the sum is positive and of the lesion class when the sum is negative.
The weight occupied by each classifier in the step 5.1 is determined according to the classification accuracy, and the accuracy calculation formula of the classification model is as follows:
wherein a = 1, 2, 3; n ∈ {Narrow, Non-narrow}; e'_n is the cumulative number of times of [expression lost in extraction]; e_n is the cumulative number of times a sample is classified as normal or abnormal; y_a ∈ {+1, -1} is the label of the training sample, and [symbols lost in extraction] respectively represent the classification result of each model;
The weight w_a of each model is calculated as:
wherein,
Step 5.2, the result obtained by each model is multiplied by its corresponding weight and the products are added to obtain the final output result:
when the output result is positive, the classification result is indicated as a normal classification, namely the stenosis degree is lower than 50%, and when the output result is negative, the classification result is indicated as a lesion classification, namely the stenosis degree is higher than 50%.
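The fusion rule of step 5.2 can be sketched as below. The weighting shown, each weight proportional to the classifier's accuracy and normalized to sum to one, is an assumption made for illustration and is not confirmed by the text; treating a sum of exactly zero as the lesion class is likewise an assumption, since the text only covers positive and negative sums. The accuracy values in the example are invented.

```python
def accuracy_weights(accuracies):
    """Assumed weighting: weights proportional to accuracy, summing to 1."""
    total = sum(accuracies)
    return [a / total for a in accuracies]

def fuse(outputs, weights):
    """Weighted vote over classifier outputs in {+1, -1}.

    Returns (label, score): +1 = normal (stenosis < 50%), -1 = lesion.
    A score of exactly 0 is mapped to -1 here (assumption).
    """
    score = sum(w * o for w, o in zip(weights, outputs))
    return (1 if score > 0 else -1), score

# Illustrative accuracies for SVM, ELM and RF (made-up numbers).
weights = accuracy_weights([0.9, 0.8, 0.7])
label_a, score_a = fuse([+1, -1, -1], weights)   # two weaker classifiers outvote one
label_b, score_b = fuse([+1, +1, -1], weights)
label_c, score_c = fuse([+1, -1, -1], [0.6, 0.2, 0.2])  # one dominant classifier wins
```

The third call shows why the weights matter: a classifier with a sufficiently large weight can override the other two, which plain majority voting would not allow.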
The technical scheme adopted by the invention comprises two main components: an image processing stage and a classifier modeling stage. First, a database is acquired, the constructed image sample library is preprocessed, and a coronary artery segmentation map is extracted so that the subsequent classification learning targets the coronary arteries. Then a fusion classifier model is constructed, the topological structure of the classifier and its output mode are determined, the stenosis degree is identified and classified on the defined training samples, and the classification results are fused. Finally, classification prediction is carried out on the test set, and the accuracy, sensitivity, specificity, negative predictive value and positive predictive value of the predictions are analyzed with SPSS software. From the perspective of determining the degree of coronary stenosis non-invasively, the invention defines two label categories: stenosis of more than 50% and stenosis of less than 50%. Clinically, coronary heart disease is defined as a stenosis degree of more than 50%, so cases with stenosis above 50% need particular attention during classification so that a treatment plan can be made. The fusion classifier is one of the key components for identifying the degree and labeling its category: the model parameters are trained on the labeled training set, and identification and labeling are then carried out on the test set.
To obtain higher classification accuracy, the invention screens the features with an algorithm, since too many features cause overfitting. For the choice of single classifiers, three classifiers with good image classification performance are selected and combined. Because each single classifier has several hyper-parameters to optimize, combining multiple classifiers lets them complement one another and reduces the burden of parameter optimization, so that the classification accuracy achieves the effect that one plus one is greater than two. The weighted fusion algorithm gives a larger weight to the classifier with better classification performance, so the classification result is more credible.
The invention adopts SPSS software to analyze the accuracy, sensitivity, specificity, negative predictive value and positive predictive value of the prediction results, and the classification effect of the model is tested on the test set. In the designed technical scheme, the weight of each classifier in the fusion classifier of step 5 is assigned according to its prediction ability. The classification standard follows CAD-RADS, the latest international diagnostic standard for coronary artery stenosis: when the stenosis degree is less than 50%, observation and prevention are the main measures, and when it is more than 50%, treatment such as medication and surgery is considered.
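The five evaluation indices are standard functions of the confusion matrix. The text computes them in SPSS; the sketch below only spells out the definitions. It assumes that the lesion class (-1) is the positive class, and the function name and the example labels are hypothetical.

```python
def diagnostic_metrics(y_true, y_pred, positive=-1):
    """Accuracy, sensitivity, specificity, PPV and NPV for two-class labels.

    `positive` marks the disease class; the lesion label -1 is assumed here.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # lesions correctly detected
        "specificity": tn / (tn + fp),   # normals correctly cleared
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Made-up labels: 4 lesion samples followed by 6 normal samples.
y_true = [-1, -1, -1, -1, 1, 1, 1, 1, 1, 1]
y_pred = [-1, -1, -1, 1, 1, 1, 1, 1, -1, -1]
metrics = diagnostic_metrics(y_true, y_pred)
```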
By learning from the comparison between the CTA and DSA diagnosis results of previous patients, the invention classifies the stenosis degree directly from the cardiac CTA image, so that the stenosis degree corresponding to DSA is predicted automatically from the coronary CT image; that is, the gold-standard index of coronary lesions is accurately determined from the CTA image. A treatment plan can thus be proposed without invasive examination, which assists the doctor in reaching a diagnosis, improves working efficiency and greatly relieves the suffering of the patient, and is therefore of important clinical significance.