Classification method based on integrated classification model and related equipment
1. A classification method based on an integrated classification model is characterized by comprising the following steps:
constructing an integrated classification model, the integrated classification model comprising a base classifier layer, the base classifier layer comprising a base classifier;
training the integrated classification model by using a model training sample set;
obtaining current data to be classified; and
processing the current data to be classified through the trained integrated classification model to obtain a target classification result of the current data to be classified;
wherein constructing the integrated classification model comprises:
selecting k classifiers from N candidate classifiers to obtain at least two classifier combinations, wherein k is a positive integer greater than or equal to 2 and less than or equal to N, and N is a positive integer greater than or equal to 2;
determining a target classifier number index of each classifier combination based on the number of classifiers contained in each classifier combination;
determining a weighted number accuracy diversity metric index of each classifier combination according to the target classifier number index of each classifier combination;
determining a target classifier combination from the at least two classifier combinations according to the weighted number accuracy diversity metric index of each classifier combination; and
using the classifiers in the target classifier combination as the base classifiers in the base classifier layer of the integrated classification model.
2. The method of claim 1, wherein determining the target classifier number index for each classifier combination based on the number of classifiers included in each classifier combination comprises:
determining a classifier number of each classifier combination according to the number of classifiers contained in each classifier combination;
determining a maximum classifier number and a minimum classifier number from the classifier numbers of the at least two classifier combinations;
and determining the target classifier number index of each classifier combination according to the maximum classifier number, the minimum classifier number and the classifier number of each classifier combination.
3. The method of claim 1 or 2, wherein determining the weighted number accuracy diversity metric index for each classifier combination based on the target classifier number index for each classifier combination comprises:
obtaining a target accuracy index of each classifier combination;
obtaining a target diversity index of each classifier combination;
determining an accuracy weight, a diversity weight and a classifier number weight;
and determining the weighted number accuracy diversity metric index of each classifier combination according to the target accuracy index and the accuracy weight, the target diversity index and the diversity weight, and the target classifier number index and the classifier number weight of each classifier combination.
4. The method of claim 3, wherein obtaining the target accuracy index of each classifier combination comprises:
obtaining an accuracy measurement index of each classifier in each classifier combination;
obtaining an accuracy measurement index mean value of each classifier combination according to the accuracy measurement index of each classifier in each classifier combination;
determining a maximum accuracy measure index mean value and a minimum accuracy measure index mean value from the accuracy measure index mean values of the at least two classifier combinations;
and determining the target accuracy index of each classifier combination according to the maximum accuracy measurement index mean value, the minimum accuracy measurement index mean value and the accuracy measurement index mean value of each classifier combination.
5. The method of claim 4, wherein obtaining an accuracy metric for each classifier in each classifier combination comprises:
inputting a first sample in a first classifier training sample set into each classifier in each classifier combination, and obtaining a prediction label of the corresponding first sample output by each classifier in each classifier combination;
determining the true sample number, the false negative sample number, the false positive sample number and the true negative sample number of each classifier in each classifier combination according to the prediction label of the corresponding first sample output by each classifier in each classifier combination and the true label thereof;
determining a true sample rate for each classifier in each classifier combination according to the true sample number and the false negative sample number of each classifier in each classifier combination;
determining a false positive sample rate for each classifier in each classifier combination based on the false positive and true negative sample numbers for each classifier in each classifier combination;
determining a positive and negative sample separation degree index of each classifier in each classifier combination according to the maximum value of the difference between the true sample rate and the false positive sample rate of each classifier in each classifier combination;
and determining the accuracy measurement index of each classifier in each classifier combination according to the positive and negative sample separation degree index of each classifier in each classifier combination.
6. The method of claim 4, wherein obtaining an accuracy metric for each classifier in each classifier combination comprises:
inputting a second sample in a second classifier training sample set into each classifier in each classifier combination to obtain a prediction label of the corresponding second sample output by each classifier in each classifier combination;
determining the true sample number, the false negative sample number, the false positive sample number and the true negative sample number of each classifier in each classifier combination according to the prediction label of the corresponding second sample output by each classifier in each classifier combination and the true label thereof;
determining a true sample rate for each classifier in each classifier combination according to the true sample number and the false negative sample number of each classifier in each classifier combination;
determining a false positive sample rate for each classifier in each classifier combination based on the false positive and true negative sample numbers for each classifier in each classifier combination;
determining an area under the curve index of each classifier in each classifier combination according to the true sample rate and the false positive sample rate of each classifier in each classifier combination;
and determining the accuracy measurement index of each classifier in each classifier combination according to the area under the curve index of each classifier in each classifier combination.
7. The method of claim 4, wherein obtaining an accuracy metric for each classifier in each classifier combination comprises:
inputting a third sample in the third classifier training sample set into each classifier in each classifier combination, and obtaining a prediction label of the corresponding third sample output by each classifier in each classifier combination;
determining the true sample number and the false negative sample number of each classifier in each classifier combination according to the prediction label of the corresponding third sample output by each classifier in each classifier combination and the true label thereof;
determining a recall rate for each classifier in each classifier combination based on the true and false negative sample counts for each classifier in each classifier combination;
and determining an accuracy measure index of each classifier in each classifier combination according to the recall rate of each classifier in each classifier combination.
8. The method of claim 3, wherein obtaining a target diversity index for each classifier combination comprises:
obtaining diversity measurement indexes of every two classifiers in each classifier combination;
obtaining a diversity measurement index mean value of each classifier combination according to the diversity measurement indexes of every two classifiers in each classifier combination;
determining a maximum diversity measure mean value and a minimum diversity measure mean value from the diversity measure mean values of the at least two classifier combinations;
and determining the target diversity index of each classifier combination according to the maximum diversity measure index mean value, the minimum diversity measure index mean value and the diversity measure index mean value of each classifier combination.
9. The method of claim 8, wherein the at least two classifier combinations include a first classifier combination comprising a first classifier and a second classifier; obtaining the diversity measure of every two classifiers in each classifier combination comprises:
inputting fourth samples in a fourth classifier training sample set to the first classifier and the second classifier respectively to obtain prediction labels of the corresponding fourth samples output by the first classifier and the second classifier respectively;
obtaining, according to the prediction labels and the true labels of the corresponding fourth samples output by the first classifier and the second classifier respectively, the number of samples classified correctly by both the first classifier and the second classifier, the number of samples classified correctly by the first classifier and incorrectly by the second classifier, the number of samples classified incorrectly by the first classifier and correctly by the second classifier, and the number of samples classified incorrectly by both the first classifier and the second classifier;
and determining a diversity metric index of the first classifier and the second classifier in the first classifier combination according to the number of samples classified correctly by both classifiers, the number of samples classified correctly by the first classifier and incorrectly by the second classifier, the number of samples classified incorrectly by the first classifier and correctly by the second classifier, and the number of samples classified incorrectly by both classifiers.
10. The method of claim 9, wherein determining the diversity metric index for the first classifier and the second classifier in the first classifier combination according to the number of samples classified correctly by both classifiers, the number of samples classified correctly by the first classifier and incorrectly by the second classifier, the number of samples classified incorrectly by the first classifier and correctly by the second classifier, and the number of samples classified incorrectly by both classifiers comprises:
obtaining a correlation coefficient of the first classifier and the second classifier in the first classifier combination according to the number of samples classified correctly by both classifiers, the number of samples classified correctly by the first classifier and incorrectly by the second classifier, the number of samples classified incorrectly by the first classifier and correctly by the second classifier, and the number of samples classified incorrectly by both classifiers;
and obtaining the diversity metric index of the first classifier and the second classifier in the first classifier combination according to the correlation coefficient of the first classifier and the second classifier in the first classifier combination.
11. The method of claim 9, wherein determining the diversity metric index for the first classifier and the second classifier in the first classifier combination according to the number of samples classified correctly by both classifiers, the number of samples classified correctly by the first classifier and incorrectly by the second classifier, the number of samples classified incorrectly by the first classifier and correctly by the second classifier, and the number of samples classified incorrectly by both classifiers comprises:
obtaining a Q statistic for the first classifier and the second classifier in the first classifier combination according to the number of samples classified correctly by both classifiers, the number of samples classified correctly by the first classifier and incorrectly by the second classifier, the number of samples classified incorrectly by the first classifier and correctly by the second classifier, and the number of samples classified incorrectly by both classifiers;
and obtaining the diversity metric index of the first classifier and the second classifier in the first classifier combination according to the Q statistic of the first classifier and the second classifier in the first classifier combination.
12. The method of claim 9, wherein determining the diversity metric index for the first classifier and the second classifier in the first classifier combination according to the number of samples classified correctly by both classifiers, the number of samples classified correctly by the first classifier and incorrectly by the second classifier, the number of samples classified incorrectly by the first classifier and correctly by the second classifier, and the number of samples classified incorrectly by both classifiers comprises:
obtaining a kappa statistic for the first classifier and the second classifier in the first classifier combination according to the number of samples classified correctly by both classifiers, the number of samples classified correctly by the first classifier and incorrectly by the second classifier, the number of samples classified incorrectly by the first classifier and correctly by the second classifier, and the number of samples classified incorrectly by both classifiers;
and obtaining the diversity metric index of the first classifier and the second classifier in the first classifier combination according to the kappa statistic of the first classifier and the second classifier in the first classifier combination.
13. A classification apparatus based on an integrated classification model, comprising:
an integrated classification model construction unit, configured to construct the integrated classification model, where the integrated classification model includes a base classifier layer that includes a base classifier;
an integrated classification model training unit, configured to train the integrated classification model using a model training sample set;
a current data obtaining unit, configured to obtain current data to be classified; and
a target classification result obtaining unit, configured to process the current data to be classified through the trained integrated classification model, and obtain a target classification result of the current data to be classified;
wherein the integrated classification model construction unit includes:
a classifier combination obtaining unit configured to select k classifiers from the N classifiers to obtain at least two classifier combinations, where k is a positive integer greater than or equal to 2 and less than or equal to N, and N is a positive integer greater than or equal to 2;
a target classifier number index determining unit configured to determine a target classifier number index for each classifier combination based on the number of classifiers included in each classifier combination;
a weighted number accuracy diversity metric index determining unit, configured to determine the weighted number accuracy diversity metric index of each classifier combination according to the target classifier number index of each classifier combination;
a target classifier combination determining unit, configured to determine a target classifier combination from the at least two classifier combinations according to the weighted number accuracy diversity metric index of each classifier combination; and
a base classifier determining unit, configured to use the classifiers in the target classifier combination as the base classifiers in the base classifier layer of the integrated classification model.
14. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the method according to any one of claims 1 to 12.
15. An electronic device, comprising:
at least one processor;
a storage device configured to store at least one program that, when executed by the at least one processor, causes the at least one processor to implement the method of any one of claims 1 to 12.
Background
An integrated classification model (also called a Stacked Generalization or Stacking model) is an ensemble classification algorithm, and can also be regarded as a special combination strategy.
The selection of the base classifier in the integrated classification model greatly affects the performance of the integrated classification model. How to determine which classifiers are selected and how many classifiers are selected as base classifiers in the integrated classification model to construct a robust integrated classification model is a technical problem to be solved urgently.
Therefore, a new classification method and apparatus based on an integrated classification model, a computer-readable storage medium, and an electronic device are needed.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure.
Disclosure of Invention
The embodiment of the disclosure provides a classification method and device based on an integrated classification model, a computer-readable storage medium and an electronic device, which can solve the technical problem of constructing a robust integrated classification model in the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
The embodiment of the disclosure provides a classification method based on an integrated classification model, which comprises the following steps: constructing the integrated classification model, the integrated classification model comprising a base classifier layer, the base classifier layer comprising a base classifier; training the integrated classification model by using a model training sample set; obtaining current data to be classified; and processing the current data to be classified through the trained integrated classification model to obtain a target classification result of the current data to be classified.
Wherein constructing the integrated classification model comprises: selecting k classifiers from N candidate classifiers to obtain at least two classifier combinations, wherein k is a positive integer greater than or equal to 2 and less than or equal to N, and N is a positive integer greater than or equal to 2; determining a target classifier number index of each classifier combination based on the number of classifiers contained in each classifier combination; determining a weighted number accuracy diversity metric index of each classifier combination according to the target classifier number index of each classifier combination; determining a target classifier combination from the at least two classifier combinations according to the weighted number accuracy diversity metric index of each classifier combination; and using the classifiers in the target classifier combination as the base classifiers in the base classifier layer of the integrated classification model.
The embodiment of the present disclosure provides a classification device based on an integrated classification model, the device comprising: an integrated classification model construction unit, configured to construct the integrated classification model, where the integrated classification model includes a base classifier layer that includes a base classifier; an integrated classification model training unit, configured to train the integrated classification model using a model training sample set; a current data obtaining unit, configured to obtain current data to be classified; and a target classification result obtaining unit, configured to process the current data to be classified through the trained integrated classification model to obtain a target classification result of the current data to be classified.
Wherein the integrated classification model construction unit includes: a classifier combination obtaining unit, configured to select k classifiers from N candidate classifiers to obtain at least two classifier combinations, where k is a positive integer greater than or equal to 2 and less than or equal to N, and N is a positive integer greater than or equal to 2; a target classifier number index determining unit, configured to determine a target classifier number index for each classifier combination based on the number of classifiers included in each classifier combination; a weighted number accuracy diversity metric index determining unit, configured to determine the weighted number accuracy diversity metric index of each classifier combination according to the target classifier number index of each classifier combination; a target classifier combination determining unit, configured to determine a target classifier combination from the at least two classifier combinations according to the weighted number accuracy diversity metric index of each classifier combination; and a base classifier determining unit, configured to use the classifiers in the target classifier combination as the base classifiers in the base classifier layer of the integrated classification model.
The disclosed embodiments provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements an integrated classification model-based classification method as described in the above embodiments.
An embodiment of the present disclosure provides an electronic device, including: at least one processor; a storage device configured to store at least one program that, when executed by the at least one processor, causes the at least one processor to implement the integrated classification model-based classification method as described in the above embodiments.
In some embodiments of the present disclosure, in the process of constructing the integrated classification model, a plurality of classifier combinations are obtained by selecting different numbers of classifiers from a given plurality of candidate classifiers. A target classifier number index of each classifier combination is then determined according to the number of classifiers contained in that combination, a weighted number accuracy diversity metric index of each classifier combination is determined according to its target classifier number index, and an optimal classifier combination is selected from the plurality of classifier combinations as the target classifier combination according to the weighted number accuracy diversity metric index of each combination. Since the number of classifiers contained in each classifier combination is taken into account when determining the optimal classifier combination, the optimal number of classifiers can be determined from the given candidate classifiers according to the actual situation, so that the base classifier layer of the constructed integrated classification model has the optimal number of classifiers. When the method provided by the embodiment of the disclosure is applied to the construction of different integrated classification models, it can accommodate the selection of different numbers of base classifiers, providing a more universal selection strategy and yielding a more robust integrated classification model. When the trained integrated classification model is applied to a classification problem, the classification efficiency and performance of the integrated classification model can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
fig. 1 schematically shows a flow chart of an integrated classification model based classification method according to an embodiment of the present disclosure.
Fig. 2 schematically shows a flowchart of step S12 in fig. 1 in an exemplary embodiment.
Fig. 3 schematically shows a flowchart of step S13 in fig. 1 in an exemplary embodiment.
Fig. 4 schematically shows a flowchart of step S131 in fig. 3 in an exemplary embodiment.
Fig. 5 schematically shows a flowchart of step S132 in fig. 3 in an exemplary embodiment.
FIG. 6 schematically shows a flow diagram of an integrated classification model-based classification method according to an embodiment of the present disclosure.
FIG. 7 schematically shows a flow diagram of an integrated classification model-based classification method according to an embodiment of the present disclosure.
Fig. 8 schematically shows a schematic diagram of an integrated classification model-based classification method according to an embodiment of the present disclosure.
Fig. 9 schematically shows a schematic diagram of a classification method based on an integrated classification model according to an embodiment of the present disclosure.
FIG. 10 schematically shows a block diagram of an integrated classification model-based classification apparatus according to an embodiment of the present disclosure.
Fig. 11 is an exemplary scene diagram illustrating a classification method based on an integrated classification model to which an embodiment of the present disclosure may be applied.
FIG. 12 shows a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
The described features, structures, or characteristics of the disclosure may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The drawings are merely schematic illustrations of the present disclosure, in which the same reference numerals denote the same or similar parts, and thus, a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in at least one hardware module or integrated circuit, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and steps, nor do they necessarily have to be performed in the order described. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In this specification, the terms "a", "an", "the", "said" and "at least one" are used to indicate the presence of at least one element/component/etc.; the terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. other than the listed elements/components/etc.; the terms "first," "second," and "third," etc. are used merely as labels, and are not limiting on the number of their objects.
The following detailed description of exemplary embodiments of the disclosure refers to the accompanying drawings.
Based on the technical problems in the related art, the embodiments of the present disclosure provide a classification method based on an integrated classification model, so as to at least partially solve the above problems. The method provided by the embodiments of the present disclosure may be executed by any electronic device, for example, a server, or a terminal, or an interaction between a server and a terminal, which is not limited in the present disclosure.
Fig. 1 schematically shows a flow chart of an integrated classification model based classification method according to an embodiment of the present disclosure. As shown in fig. 1, the method provided by the embodiment of the present disclosure may include the following steps.
In step S1, the integrated classification model is constructed, the integrated classification model including a base classifier layer that includes base classifiers.
In the embodiment of the present disclosure, the integrated classification model may also be referred to as a Stacking model. It is a hierarchical model integration framework, which may include multiple layers (two or more); in the following description, two layers are taken as an example, but the present disclosure is not limited thereto.
The first layer comprises a plurality of (two or more) base classifiers and is called the base classifier layer (Base-Level). The base classifiers may be classifiers of different types, or classifiers of the same type with different parameters. The second layer is referred to as the secondary classifier layer (Meta-Level).
Referring to FIG. 1, the above step S1 may further include the following steps S11-S15 for determining the base classifiers in the base classifier layer of the integrated classification model.
In step S11, k classifiers are selected from the N classifiers to obtain at least two classifier combinations, where k is a positive integer greater than or equal to 2 and less than or equal to N, and N is a positive integer greater than or equal to 2.
Here, k classifiers are selected at a time from the N given candidate classifiers, so that at most C(N,2) + C(N,3) + … + C(N,N) = 2^N − N − 1 classifier combinations can be obtained.
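As an illustration of this enumeration step, the following minimal Python sketch lists all candidate classifier combinations for 2 ≤ k ≤ N; the classifier names are purely illustrative and not prescribed by the disclosure.

```python
# Minimal sketch of step S11: enumerate all combinations of k classifiers
# (2 <= k <= N) drawn from N candidate classifiers.
from itertools import combinations

def enumerate_combinations(classifiers):
    n = len(classifiers)
    combos = []
    for k in range(2, n + 1):
        # C(N, k) combinations for each value of k
        combos.extend(combinations(classifiers, k))
    return combos

# Example with N = 3 illustrative candidates:
combos = enumerate_combinations(["lr", "svm", "tree"])
print(len(combos))  # 2**3 - 3 - 1 = 4 combinations
```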
In step S12, a target classifier number index for each classifier combination is determined based on the number of classifiers included in each classifier combination.
In the embodiment of the present disclosure, the target classifier number index is an index for measuring the number of classifiers included in the classifier combination.
For example, among the classifier combinations described above, when k = 2, the number of classifiers contained in the corresponding classifier combinations is equal to 2; when k = 3, the number of classifiers contained in the corresponding classifier combinations is equal to 3; and so on, such that when k equals N, the number of classifiers contained in the corresponding classifier combination is equal to N.
In step S13, a weighted number accuracy diversity metric for each classifier combination is determined based on the target classifier number index for each classifier combination.
In step S14, a target classifier combination is determined from the at least two classifier combinations according to the weighted number accuracy diversity metric for each classifier combination.
In step S15, the classifiers in the target classifier combination are taken as the base classifiers in the base classifier layer of the integrated classification model.
In the embodiment of the disclosure, the number of classifiers contained in each classifier combination is taken into account when calculating the weighted number accuracy diversity metric index of each classifier combination, that is, the target classifier number index of each classifier combination is introduced, so that the resulting weighted number accuracy diversity metric index can accommodate the selection of different numbers of classifiers as base classifiers.
The overall design principle of the Stacking model is that the base classifiers should be "good and different", where "good" represents accuracy and "different" represents diversity. As can be seen from Table 1 below, when the accuracy and diversity of the base classifiers are the same but the number of base classifiers differs, the effect of the Stacking model may also be affected; in particular, when the number of samples in the model training sample set used for training the Stacking model is much larger than the number of base classifiers, a larger number of base classifiers generally allows the Stacking model to learn more. Therefore, by introducing the target classifier number index, the embodiment of the disclosure can accommodate the selection of different numbers of base classifiers.
TABLE 1 Classification results of the base classifiers and the Stacking model

                     Sample 1    Sample 2    Sample 3
Base classifier 1       √           √           ×
Base classifier 2       ×           √           √
Base classifier 3       √           ×           √
Stacking model          √           √           √

(√ = correct classification, × = incorrect classification)
In step S2, the integrated classification model is trained using a model training sample set.
In the embodiment of the disclosure, the input of the base classifier layer is the model training sample set, and the secondary classifier layer takes the outputs of the base classifiers in the first layer as features, adds them to the training sample set and retrains, so as to obtain a complete integrated classification model. This fusion can improve the prediction accuracy of the integrated classification model.
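As a minimal sketch of this two-layer structure, the following code assumes scikit-learn's StackingClassifier and an illustrative base classifier combination; the disclosure does not prescribe a particular library or these particular classifiers.

```python
# Two-layer Stacking sketch: the base classifier layer is trained on the
# model training sample set, and the secondary (meta) layer is retrained on
# the base classifiers' outputs appended to the original features.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # stand-in sample set

base_layer = [  # hypothetical target classifier combination
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC(probability=True, random_state=0)),
    ("tree", DecisionTreeClassifier(random_state=0)),
]
model = StackingClassifier(
    estimators=base_layer,
    final_estimator=LogisticRegression(max_iter=1000),  # secondary layer
    passthrough=True,  # feed original features alongside base outputs
)
model.fit(X, y)              # train the integrated classification model
print(model.predict(X[:5]))  # classify current data to be classified
```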
In step S3, current data to be classified is obtained.
In the embodiment of the present disclosure, the current data to be classified may be determined according to different classification scenarios, which is not limited by the present disclosure.
In step S4, the trained integrated classification model is used to process the current data to be classified, so as to obtain a target classification result of the current data to be classified.
In the embodiment of the present disclosure, the target classification result of the current data to be classified may be determined according to different classification scenarios, which is not limited by the present disclosure. The Stacking model relearns the output of the base classifier, so that the Stacking model can be guaranteed to have better and more robust classification performance.
The classification method based on the integrated classification model provided by the embodiment of the disclosure obtains a plurality of classifier combinations by selecting different numbers of classifiers from a given plurality of candidate classifiers in the process of constructing the integrated classification model. A target classifier number index of each classifier combination is then determined according to the number of classifiers contained in that combination, a weighted number accuracy diversity metric index of each classifier combination is determined according to its target classifier number index, and an optimal classifier combination is selected from the plurality of classifier combinations as the target classifier combination according to the weighted number accuracy diversity metric index of each combination. Since the number of classifiers contained in each classifier combination is taken into account when determining the optimal classifier combination, the optimal number of classifiers can be determined from the given candidate classifiers according to the actual situation, so that the base classifier layer of the constructed integrated classification model has the optimal number of classifiers. When the method provided by the embodiment of the disclosure is applied to the construction of different integrated classification models, it can accommodate the selection of different numbers of base classifiers, providing a more universal selection strategy and yielding a more robust integrated classification model. When the trained integrated classification model is applied to a classification problem, the classification efficiency and performance of the integrated classification model can be improved.
Fig. 2 schematically shows a flowchart of step S12 in fig. 1 in an exemplary embodiment. As shown in fig. 2, the step S12 may further include the following steps in the embodiment of the present disclosure.
In step S121, the number of classifiers for each classifier combination is determined based on the number of classifiers included in each classifier combination.
For example, among the classifier combinations described above, when k = 2, the number of classifiers contained in the corresponding classifier combinations is equal to 2, so the classifier number of those combinations is 2; when k = 3, the number of classifiers contained in the corresponding classifier combinations is equal to 3, so the classifier number of those combinations is 3; and by analogy, when k equals N, the number of classifiers contained in the corresponding classifier combination is equal to N, so the classifier number of that combination is N.
In step S122, a maximum classifier number and a minimum classifier number are determined from the classifier numbers of the at least two kinds of classifier combinations.
For example, for the classifier combinations described above, the classifier numbers k of all the combinations are arranged in descending or ascending order, and the maximum classifier number k_max and the minimum classifier number k_min can be determined from the ordered sequence.
In step S123, a target classifier number index of each classifier combination is determined according to the maximum classifier number, the minimum classifier number, and the classifier number of each classifier combination.
For example, the classifier number k of each classifier combination may be subjected to max-min normalization according to the following formula to obtain the normalized classifier number of each classifier combination, and the normalized classifier number is used as the target classifier number index k_scale of the corresponding classifier combination:

k_scale = (k − k_min) / (k_max − k_min)    (1)

where 2 ≤ k ≤ N.
In the embodiment of the disclosure, the number of classifiers of each classifier combination is normalized, and the normalized number of classifiers is used as the target classifier number index of each classifier combination, so that the accuracy and diversity measurement index of the weighted number of each classifier combination obtained according to the target classifier number index of each classifier combination is not influenced by dimensions, and the method has universality.
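A minimal sketch of this normalization, assuming the max-min form given in formula (1):

```python
def minmax_scale(value, vmin, vmax):
    # Max-min normalization; 0.0 is returned when all values coincide.
    return (value - vmin) / (vmax - vmin) if vmax != vmin else 0.0

def k_scale(k, k_min, k_max):
    # Target classifier number index of a combination containing k classifiers.
    return minmax_scale(k, k_min, k_max)

# For combinations drawn from N = 5 candidates, k ranges from 2 to 5:
print(k_scale(3, 2, 5))  # 0.333...
```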
Fig. 3 schematically shows a flowchart of step S13 in fig. 1 in an exemplary embodiment. As shown in fig. 3, the step S13 may further include the following steps in the embodiment of the present disclosure.
In step S131, a target accuracy index for each classifier combination is obtained.
In the embodiment of the present disclosure, the target accuracy index is an index for measuring the classification accuracy performance of the classifiers in the classifier combination.
In step S132, a target diversity index for each classifier combination is obtained.
In the embodiment of the present disclosure, the target diversity index is an index for measuring the diversity performance of the classifiers in the classifier combination.
In step S133, the accuracy weight, the diversity weight, and the classifier number weight are determined.
For example, the accuracy weight can be represented by α, which controls the degree of importance attached to the classification accuracy of the classifiers in a classifier combination; the diversity weight can be represented by β, which controls the degree of importance attached to the diversity of the classifiers in a classifier combination; and the classifier number weight can be represented by λ, which controls the degree of importance attached to the number of classifiers in a classifier combination.
Here, α + β + λ = 1, with 0 < α < 1, 0 < β < 1 and 0 < λ < 1.
When α > β, accuracy is more important than diversity, and the weighted number accuracy diversity metric index preferentially selects more accurate rather than more diverse base classifiers; when α < β, diversity is more important than accuracy, and more diverse rather than more accurate base classifiers are preferentially selected.
If accuracy, diversity and the number of classifiers are considered equally important, the target accuracy index, the target diversity index and the target classifier number index have the same weight, that is, α = β = λ = 1/3.
In step S134, a weighted number accuracy diversity metric index for each classifier combination is determined according to the target accuracy index Rig and the accuracy weight, the target diversity index Vari and the diversity weight, and the target classifier number index and the classifier number weight for each classifier combination.
The selection of the base classifiers in the Stacking model greatly influences the performance of the Stacking model. A robust integrated classification model should be not only accurate but also diverse. Therefore, when designing an integrated classification model, it is desirable to select the base classifiers based on both accuracy and diversity. The general principle is to make the base classifiers "good and different", i.e., the base classifiers need to satisfy both the accuracy factor and the diversity factor.
In order to balance the accuracy and diversity of the base classifiers, the embodiment of the disclosure provides a Weighted Number Accuracy Diversity metric index (WNAD), and the base classifiers of the Stacking model are selected according to the WNAD index. The larger the value of WNAD, the better the performance of the integrated classification model.
For example, the weighted number accuracy diversity metric index WNAD of each classifier combination may be calculated according to the following formula:

WNAD = 1 / (α / Rig + β / Vari + λ / k_scale)    (2)

In the embodiment of the disclosure, the base classifiers of the Stacking model can be selected according to the WNAD index. The weighted number accuracy diversity metric index WNAD of each classifier combination is defined as the weighted harmonic mean of the target accuracy index Rig, the target diversity index Vari and the target classifier number index k_scale of that combination.
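A minimal sketch of formula (2), assuming the weighted harmonic mean form described above (all three indexes must be positive for the harmonic mean to be defined):

```python
def wnad(rig, vari, k_scale, alpha=1/3, beta=1/3, lam=1/3):
    # Weighted harmonic mean of the target accuracy index Rig, the target
    # diversity index Vari and the target classifier number index k_scale.
    assert abs(alpha + beta + lam - 1.0) < 1e-9  # weights must sum to 1
    return 1.0 / (alpha / rig + beta / vari + lam / k_scale)

# Equal weights treat accuracy, diversity and classifier number alike:
print(wnad(rig=0.8, vari=0.6, k_scale=0.5))  # larger WNAD -> better combination
```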
Fig. 4 schematically shows a flowchart of step S131 in fig. 3 in an exemplary embodiment. As shown in fig. 4, the step S131 may further include the following steps in the embodiment of the present disclosure.
In step S1311, an accuracy metric index for each classifier in each classifier combination is obtained.
In an exemplary embodiment, obtaining the accuracy metric for each classifier in each classifier combination may include: inputting a first sample in a first classifier training sample set into each classifier in each classifier combination, and obtaining a prediction label of the corresponding first sample output by each classifier in each classifier combination; determining the true sample number, the false negative sample number, the false positive sample number and the true negative sample number of each classifier in each classifier combination according to the prediction label of the corresponding first sample output by each classifier in each classifier combination and the true label thereof; determining a true sample rate for each classifier in each classifier combination according to the true sample number and the false negative sample number of each classifier in each classifier combination; determining a false positive sample rate for each classifier in each classifier combination based on the false positive and true negative sample numbers for each classifier in each classifier combination; determining a positive and negative sample separation degree index of each classifier in each classifier combination according to the maximum value of the difference value of the real sample rate and the false positive sample rate of each classifier in each classifier combination; and determining the accuracy measurement index of each classifier in each classifier combination according to the positive and negative sample separation degree index of each classifier in each classifier combination.
In the embodiment of the present disclosure, the positive and negative sample separation degree index may also be referred to as the KS (Kolmogorov-Smirnov) value. The horizontal axis of the KS curve represents different classification thresholds, and the vertical axis shows the resulting curves of the true sample rate TPr (the proportion of samples whose true label is positive that the classifier predicts as positive) and the false positive sample rate FPr (the proportion of samples whose true label is negative that the classifier predicts as positive).
The KS value of each classifier can be calculated according to the following formula:

KS = max(TPr − FPr)    (3)
That is, the KS value is the maximum value of the difference between TPr and FPr, and can be used to measure the degree of separation between the predicted positive and negative samples. Within a certain range, the larger the KS value, the better the positive and negative samples are distinguished and the higher the discrimination of the model.
Taking binary classification as an example, the classifier can output the prediction probability of a first sample through a sigmoid function (S-shaped function), and a classification threshold is taken between 0 and 1. If the prediction probability is greater than or equal to the classification threshold, the prediction label of the first sample is a positive sample; if the prediction probability is less than the classification threshold, the prediction label of the first sample is a negative sample.
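A minimal sketch of formula (3) under these assumptions (binary labels, predicted probabilities, and an illustrative grid of classification thresholds):

```python
import numpy as np

def ks_value(y_true, y_score, thresholds=None):
    # KS = max over classification thresholds of (TPr - FPr), formula (3).
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)  # illustrative threshold grid
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos, neg = (y_true == 1), (y_true == 0)
    best = 0.0
    for t in thresholds:
        pred_pos = y_score >= t       # predicted label is positive
        tpr = pred_pos[pos].mean()    # TP / (TP + FN)
        fpr = pred_pos[neg].mean()    # FP / (FP + TN)
        best = max(best, tpr - fpr)
    return best

print(ks_value([1, 1, 0, 0], [0.9, 0.7, 0.4, 0.2]))  # 1.0: perfect separation
```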
The KS value lies in the range [0, 1] and may also be expressed as a percentage. Reference ranges for the KS value are shown in Table 2 below:
TABLE 2
In the embodiment of the present disclosure, a classification result confusion matrix may be used to assist in evaluating the classification performance of each constructed classifier. As shown in Table 3 below, the classification result confusion matrix relates the prediction label obtained by the classifier to the true label (the actual category of the first sample).
TABLE 3 Classification result confusion matrix

True label \ Prediction label    Positive sample         Negative sample
Positive sample                  True positive (TP)      False negative (FN)
Negative sample                  False positive (FP)     True negative (TN)
Based on the above classification result confusion matrix (see Table 3 for the binary classification case), the true sample rate TPr and the false positive sample rate FPr of each classifier are calculated as follows:

TPr = TP / (TP + FN)    (4)

FPr = FP / (FP + TN)    (5)
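A minimal sketch of Table 3 and formulas (4) and (5), assuming binary labels coded as 1 (positive) and 0 (negative):

```python
def confusion_counts(y_true, y_pred):
    # Entries of the binary classification result confusion matrix (Table 3).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn

tp, fn, fp, tn = confusion_counts([1, 1, 0, 0], [1, 0, 1, 0])
tpr = tp / (tp + fn)  # formula (4)
fpr = fp / (fp + tn)  # formula (5)
print(tpr, fpr)  # 0.5 0.5
```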
in an exemplary embodiment, obtaining the accuracy metric for each classifier in each classifier combination may include: inputting a second sample in a second classifier training sample set into each classifier in each classifier combination to obtain a prediction label of the corresponding second sample output by each classifier in each classifier combination; determining the true sample number, the false negative sample number, the false positive sample number and the true negative sample number of each classifier in each classifier combination according to the prediction label of the corresponding second sample output by each classifier in each classifier combination and the true label thereof; determining a true sample rate for each classifier in each classifier combination according to the true sample number and the false negative sample number of each classifier in each classifier combination; determining a false positive sample rate for each classifier in each classifier combination based on the false positive and true negative sample numbers for each classifier in each classifier combination; determining area indexes under curves of each classifier in each classifier combination according to the real sample rate and the false positive sample rate of each classifier in each classifier combination; and determining the accuracy measurement index of each classifier in each classifier combination according to the area under the curve index of each classifier in each classifier combination.
In other embodiments, the area under the curve index AUC (Area Under Curve) of each classifier can also be employed to determine the accuracy metric index of each classifier in each classifier combination. The AUC is the area under the ROC (receiver operating characteristic) curve and is used to measure the performance of a classifier; the larger the AUC value, the better the classification performance of the classifier.
The ROC curve shows how the true sample rate TPr and the false positive sample rate FPr vary with different classification thresholds, with TPr plotted on the ordinate and FPr on the abscissa. To allow better comparison between ROC curves, the AUC can be used to measure the performance of a classifier.
In practical application, the prediction probability of the second sample may be obtained based on a trained classifier, and then the prediction probability of the second sample is compared with a given classification threshold, if the prediction probability is greater than the given classification threshold, the prediction label of the second sample is a positive sample, otherwise, the prediction label is a negative sample. The value of the classification threshold is also different for different classification tasks.
Taking the binary classification shown in Table 3 as an example, the true sample rate TPr and the false positive sample rate FPr of each classifier can be calculated based on the classification result confusion matrix and formulas (4) and (5).
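As a minimal sketch, the AUC can be computed with scikit-learn (an assumed library choice, not prescribed by the disclosure):

```python
from sklearn.metrics import roc_auc_score

# y_score holds the predicted probabilities of the positive class.
auc = roc_auc_score([1, 1, 0, 0], [0.9, 0.7, 0.4, 0.2])
print(auc)  # 1.0: the closer to 1, the better the classifier
```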
In an exemplary embodiment, obtaining the accuracy metric for each classifier in each classifier combination may include: inputting a third sample in the third classifier training sample set into each classifier in each classifier combination, and obtaining a prediction label of the corresponding third sample output by each classifier in each classifier combination; determining the true sample number and the false negative sample number of each classifier in each classifier combination according to the prediction label of the corresponding third sample output by each classifier in each classifier combination and the true label thereof; determining a recall rate for each classifier in each classifier combination based on the true and false negative sample counts for each classifier in each classifier combination; and determining an accuracy measure index of each classifier in each classifier combination according to the recall rate of each classifier in each classifier combination.
In the embodiment of the present disclosure, the recall rate (Recall) is the proportion of true positive samples among all samples whose true label is positive. The higher the recall rate, the better the classifier recalls positive samples. Recall is an important index in some application scenarios, for example capturing "bad users": such a scenario focuses on recall, i.e., what proportion of all "bad users" the classifier can capture.
Also taking the classification result confusion matrix of the above binary classification as an example, the recall rate of each classifier can be calculated according to the following formula:

Recall = TP / (TP + FN)    (6)
it should be noted that the accuracy metric for measuring the classifier is not limited to the KS value, AUC metric and recall listed above. For example, the effectiveness of the classifier can also be measured by using the Accuracy index Acc (abbreviation of Accuracy), and the larger the Accuracy index is, the larger the proportion of the samples which represent that the classifier correctly classifies the total samples is.
For example, the accuracy index Acc may be calculated according to the following formula:

Acc = (TP + TN) / (TP + FN + FP + TN)    (7)
In the embodiment of the present disclosure, any one of the KS value, the AUC index, the recall rate and the accuracy index Acc of each classifier may be used as the accuracy metric index of that classifier, or any several of them may be combined, for example by weighted summation, to obtain the accuracy metric index of each classifier, which is not limited by the present disclosure.
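A minimal sketch of such a weighted combination; the helper, its default weights and the example values are hypothetical, since the disclosure leaves the combination open:

```python
def accuracy_metric(ks=None, auc=None, recall=None, acc=None,
                    weights=(0.25, 0.25, 0.25, 0.25)):
    # Weighted summation over whichever of KS, AUC, recall and Acc are given;
    # the weights of the unused indexes are dropped and renormalized.
    pairs = [(v, w) for v, w in zip((ks, auc, recall, acc), weights)
             if v is not None]
    total = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total

print(accuracy_metric(ks=0.4, auc=0.8, weights=(0.5, 0.5, 0.0, 0.0)))  # 0.6
```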
In step S1312, an accuracy metric mean value of each classifier combination is obtained according to the accuracy metric of each classifier in each classifier combination.
For example, for each of the classifier combinations described above, the mean of the accuracy metric indexes of all the classifiers in that combination is calculated and used as the accuracy metric index mean of the corresponding classifier combination.
For example, if the KS value of each classifier is selected as the accuracy metric index of each classifier, the KS values of all the classifiers in each classifier combination may be averaged to obtain the KS value mean KS_mean, which is used as the accuracy metric index mean of the corresponding classifier combination.
In step S1313, a maximum accuracy measure mean value and a minimum accuracy measure mean value are determined from the accuracy measure mean values of the at least two classifier combinations.
For example, the accuracy metric index mean of each of the classifier combinations described above can be obtained through calculation; these accuracy metric index means Rig_mean are arranged in ascending or descending order, and the maximum accuracy metric index mean Rig_mean-max and the minimum accuracy metric index mean Rig_mean-min are determined from the ordered sequence.
In step S1314, a target accuracy indicator for each classifier combination is determined according to the maximum accuracy metric mean, the minimum accuracy metric mean, and the accuracy metric mean for each classifier combination.
For example, the accuracy metric mean of each classifier combination can be max-min normalized according to the following formula, and the normalized accuracy metric mean of each classifier combination is obtained and used as the target accuracy index Rig of the corresponding classifier combination:

Rig = (Rig_mean − Rig_mean-min) / (Rig_mean-max − Rig_mean-min)
In the embodiment of the disclosure, the accuracy metric mean of each classifier combination is normalized, and the normalized accuracy metric mean is used as the target accuracy index of each classifier combination, so that the weighted number accuracy diversity metric index obtained from the target accuracy index of each classifier combination is not influenced by dimensions, which gives the method universality.
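For illustration, a small helper implementing the max-min normalization above (a sketch; the convention of mapping the degenerate equal-min-max case to 0 is an assumption):

```python
def max_min_normalize(value, v_min, v_max):
    # Max-min normalization: maps value into [0, 1].
    if v_max == v_min:  # degenerate case: all means are equal
        return 0.0
    return (value - v_min) / (v_max - v_min)

# Target accuracy index of one combination (names illustrative):
# rig = max_min_normalize(rig_mean, rig_mean_min, rig_mean_max)
```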
Fig. 5 schematically shows a flowchart of step S132 in fig. 3 in an exemplary embodiment. As shown in fig. 5, the step S132 in the embodiment of the present disclosure may further include the following steps.
In step S1321, a diversity measure for every two classifiers in each classifier combination is obtained.
In the embodiment of the present disclosure, the diversity metric Div of every two classifiers in each classifier combination is an index for measuring the diversity of the integrated classification model composed of different classifier combinations. Any one of, or a combination of several of (for example, a weighted average), the Q statistic, the correlation coefficient, the Kappa statistic, and the like may be used as the diversity metric of every two classifiers, as illustrated below.
In an exemplary embodiment, the at least two classifier combinations may include a first classifier combination, which may include a first classifier and a second classifier.
For example, the first classifier combination may be any one of the classifier combinations described above, and the first classifier and the second classifier may be any two classifiers in the first classifier combination. Here, how to calculate the diversity metric of every two classifiers in each classifier combination is illustrated by the first classifier and the second classifier in the first classifier combination; the diversity metric of every two classifiers in the other classifier combinations is calculated in a similar manner.
Obtaining the diversity metric of every two classifiers in each classifier combination may include: inputting fourth samples in a fourth classifier training sample set to the first classifier and the second classifier respectively, and obtaining the prediction labels of the corresponding fourth samples output by the first classifier and the second classifier respectively; according to the prediction labels and the true labels of the corresponding fourth samples, obtaining the number of samples classified correctly by both the first classifier and the second classifier, the number of samples classified correctly by the first classifier and incorrectly by the second classifier, the number of samples classified incorrectly by the first classifier and correctly by the second classifier, and the number of samples classified incorrectly by both the first classifier and the second classifier; and determining the diversity metric of the first classifier and the second classifier in the first classifier combination according to these four sample counts.
It should be noted that, in the embodiment of the present disclosure, the first classifier training sample set to the fourth classifier training sample set may be the same sample set used for training the classifier or may be different sample sets used for training the classifier, and the corresponding first sample to the fourth sample may be the same or may be different, which is not limited in this disclosure.
In the embodiment of the disclosure, the diversity metric of the first classifier and the second classifier may be calculated based on the two-classifier classification result table shown in Table 4 below (where classifier i of the two classifiers is taken as the first classifier and classifier j as the second classifier).

TABLE 4 Two-classifier classification result table

                           classifier j correct    classifier j incorrect
  classifier i correct              a                         b
  classifier i incorrect            c                         d

Where a is the number of samples classified correctly by both classifier i and classifier j, b is the number of samples classified correctly by classifier i and incorrectly by classifier j, c is the number of samples classified incorrectly by classifier i and correctly by classifier j, and d is the number of samples classified incorrectly by both classifier i and classifier j. For example, if there are M fourth samples, then a + b + c + d = M, where M is a positive integer greater than or equal to 1.
In an exemplary embodiment, determining the diversity metric of the first classifier and the second classifier in the first classifier combination according to the number of samples classified correctly by both classifiers, the number of samples classified correctly by the first classifier and incorrectly by the second classifier, the number of samples classified incorrectly by the first classifier and correctly by the second classifier, and the number of samples classified incorrectly by both classifiers may include: obtaining the correlation coefficient of the first classifier and the second classifier in the first classifier combination according to these four sample counts; and obtaining the diversity metric of the first classifier and the second classifier in the first classifier combination according to the correlation coefficient of the first classifier and the second classifier in the first classifier combination.
In the embodiment of the disclosure, if the correlation coefficient is used as the diversity metric of the first classifier and the second classifier, the correlation coefficient ρ_ij between the first classifier and the second classifier may be calculated according to the following formula:

ρ_ij = (a·d − b·c) / √((a + b)(c + d)(a + c)(b + d))

Where ρ_ij has a value range of [−1, 1]; if the first classifier and the second classifier are positively correlated, ρ_ij is positive, and negative otherwise. When the correlation coefficient is 0, the diversity is strongest.
In an exemplary embodiment, determining the diversity metric of the first classifier and the second classifier in the first classifier combination according to the number of samples classified correctly by both classifiers, the number of samples classified correctly by the first classifier and incorrectly by the second classifier, the number of samples classified incorrectly by the first classifier and correctly by the second classifier, and the number of samples classified incorrectly by both classifiers may include: obtaining the Q statistic of the first classifier and the second classifier in the first classifier combination according to these four sample counts; and obtaining the diversity metric of the first classifier and the second classifier in the first classifier combination according to the Q statistic of the first classifier and the second classifier in the first classifier combination.
In the embodiment of the disclosure, if the Q statistic (Q-statistic) is used as the diversity metric of the first classifier and the second classifier, the Q statistic Q_ij between the first classifier and the second classifier can be calculated according to the following formula:

Q_ij = (a·d − b·c) / (a·d + b·c)

Where Q_ij has a value range of [−1, 1]; if the first classifier and the second classifier are positively correlated, Q_ij is positive, and negative otherwise. When the Q statistic is 0, the diversity is strongest.
In an exemplary embodiment, determining the diversity metric of the first classifier and the second classifier in the first classifier combination according to the number of samples classified correctly by both classifiers, the number of samples classified correctly by the first classifier and incorrectly by the second classifier, the number of samples classified incorrectly by the first classifier and correctly by the second classifier, and the number of samples classified incorrectly by both classifiers may include: obtaining the Kappa statistic of the first classifier and the second classifier in the first classifier combination according to these four sample counts; and obtaining the diversity metric of the first classifier and the second classifier in the first classifier combination according to the Kappa statistic of the first classifier and the second classifier in the first classifier combination.
In the embodiment of the disclosure, if the Kappa statistic (Kappa-statistic) is used as the diversity metric of the first classifier and the second classifier, the Kappa statistic Kappa_ij between the first classifier and the second classifier can be calculated according to the following formula:

Kappa_ij = (p1 − p2) / (1 − p2)

where p1 = (a + d) / m is the observed probability that the two classifiers agree, p2 = ((a + b)(a + c) + (c + d)(b + d)) / m² is the probability that the two classifiers agree by chance, and

m = a + b + c + d (14)
If the first classifier and the second classifier return the same prediction labels for all the fourth samples, the value of the Kappa statistic is 1; if the first classifier and the second classifier agree only by chance, the value of the Kappa statistic is 0; the Kappa statistic is negative when the probability of agreement between the first classifier and the second classifier is even lower than chance. The smaller the Kappa statistic, the more diverse.
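To make the three pairwise diversity measures concrete, here is an illustrative Python sketch (assuming binary labels in {0, 1}; the helper names are hypothetical) that derives a, b, c, d of Table 4 from prediction arrays and computes the correlation coefficient, the Q statistic, and the Kappa statistic:

```python
import numpy as np

def pair_counts(y_true, pred_i, pred_j):
    # a: both correct, b: i correct / j wrong, c: i wrong / j correct, d: both wrong.
    ok_i = pred_i == y_true
    ok_j = pred_j == y_true
    a = int(np.sum(ok_i & ok_j))
    b = int(np.sum(ok_i & ~ok_j))
    c = int(np.sum(~ok_i & ok_j))
    d = int(np.sum(~ok_i & ~ok_j))
    return a, b, c, d

def correlation(a, b, c, d):
    denom = np.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

def q_statistic(a, b, c, d):
    return (a * d - b * c) / (a * d + b * c)

def kappa(a, b, c, d):
    m = a + b + c + d                                     # equation (14)
    p1 = (a + d) / m                                      # observed agreement
    p2 = ((a + b) * (a + c) + (c + d) * (b + d)) / m**2   # chance agreement
    return (p1 - p2) / (1 - p2)
```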
In step S1322, a diversity measure mean value of each classifier combination is obtained according to the diversity measures of each two classifiers in each classifier combination.
For example, averaging the diversity metrics Div of every two classifiers in each of the classifier combinations described above yields the diversity metric mean Div_mean of each classifier combination.
In step S1323, a maximum diversity measure mean and a minimum diversity measure mean are determined from the diversity measure means of the at least two classifier combinations.
For example, the diversity metric means Div_mean of the classifier combinations obtained above can be arranged in descending or ascending order to determine the maximum diversity metric mean Div_mean-max and the minimum diversity metric mean Div_mean-min.
In step S1324, a target diversity index of each classifier combination is determined according to the maximum diversity measure average, the minimum diversity measure average, and the diversity measure average of each classifier combination.
For example, the diversity metric mean Div_mean of each classifier combination may be max-min normalized according to the following formula to obtain the normalized diversity metric mean of each classifier combination, which is used as the target diversity index Vari of the corresponding classifier combination (the normalization is oriented so that a smaller diversity metric mean, which indicates greater diversity, maps to a larger target diversity index):

Vari = (Div_mean-max − Div_mean) / (Div_mean-max − Div_mean-min)
In the embodiment of the disclosure, by normalizing the diversity metric mean of each classifier combination and using the normalized diversity metric mean as the target diversity index of each classifier combination, the weighted number accuracy diversity metric index obtained from the target diversity index of each classifier combination is not influenced by dimensions, which gives the method universality.
It should be noted that the indexes measuring accuracy and diversity in the WNAD index may be chosen in various ways according to the practical application and are not limited to the above examples.
The method provided by the above embodiment is illustrated by the embodiment of fig. 6 and 7.
FIG. 6 schematically shows a flow diagram of an integrated classification model-based classification method according to an embodiment of the present disclosure.
Assuming that k (2 ≦ k ≦ N) classifiers are selected from the N classifiers as the base classifiers of the Stacking model, the flow of computing the WNAD index for the resulting classifier combinations is shown in FIG. 6.
In step S601, k classifiers are selected from the N classifiers to obtain a plurality of classifier combinations.
For example, C(N, 2) + C(N, 3) + … + C(N, N) = 2^N − N − 1 classifier combinations are obtained.
In step S602, a sample is input to each of k classifiers in each classifier combination, and a prediction probability of each classifier output is obtained respectively.
The samples are input into each classifier of each classifier combination, and each classifier can output the prediction probability of the corresponding sample respectively.
In step S603, the KS value mean of each classifier combination is calculated from the prediction probability output by each classifier.
Comparing the prediction probability output by each classifier with a set classification threshold determines the prediction label of the corresponding sample. According to the true label of the corresponding sample, the true sample number TP, the false negative sample number FN, the false positive sample number FP, and the true negative sample number TN are determined; the true sample rate and the false positive sample rate of each classifier are calculated to obtain the KS value of each classifier; and the KS values of all classifiers in the corresponding classifier combination are averaged to obtain the KS value mean KS_mean of each classifier combination.
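An illustrative sketch of the per-classifier KS computation described above (the helper name is hypothetical; binary labels in {0, 1} and predicted positive-class probabilities are assumed):

```python
import numpy as np

def ks_value(y_true, y_prob):
    # KS = max over thresholds of (true sample rate - false positive sample rate).
    pos = y_true == 1
    neg = y_true == 0
    ks = 0.0
    for t in np.unique(y_prob):
        pred_pos = y_prob >= t
        tpr = np.sum(pred_pos & pos) / np.sum(pos)
        fpr = np.sum(pred_pos & neg) / np.sum(neg)
        ks = max(ks, tpr - fpr)
    return ks

# KS mean of a combination: average ks_value over its classifiers.
```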
In step S604, the KS value mean of each classifier combination is normalized to obtain a normalized KS value mean of each classifier combination.
In the embodiment of the present disclosure, the maximum and minimum normalization may be performed on the KS value mean, but the present disclosure is not limited thereto.
For example, the KS value means KS_mean of the classifier combinations can be arranged in descending or ascending order to determine the maximum KS value mean and the minimum KS value mean, and the normalized KS value mean KS_mean-norm of each classifier combination can be obtained with reference to the max-min normalization formula described above.
In step S605, a classification threshold is adjusted, and a prediction label of each classifier for the corresponding sample is determined according to the prediction probability and the classification threshold output by each classifier.
In the embodiment of the disclosure, according to the prediction probability and the classification threshold of the corresponding sample output by each classifier, the prediction label of the corresponding sample can be determined. The classification threshold may be adjustable.
In step S606, a mean correlation coefficient value of each classifier combination is calculated according to the prediction label of each classifier for the corresponding sample.
According to the prediction label of each classifier and the true label of the corresponding sample, whether the classifier classifies the sample correctly can be determined, so that a classification result table similar to that of the first classifier and the second classifier in the above embodiment can be obtained. The correlation coefficient of every two classifiers can then be calculated from the classification result table of each two classifiers in each classifier combination, and averaging the correlation coefficients of every two classifiers in each classifier combination yields the correlation coefficient mean Corr_mean of each classifier combination.
In step S607, the correlation coefficient mean value of each classifier combination is normalized to obtain the normalized correlation coefficient mean value of each classifier combination.
In the embodiment of the present disclosure, the maximum and minimum normalization may be performed on the correlation coefficient mean, but the present disclosure is not limited thereto.
For example, the correlation coefficient means Corr_mean of the classifier combinations can be arranged in descending or ascending order to determine the maximum correlation coefficient mean and the minimum correlation coefficient mean, and the normalized correlation coefficient mean Corr_mean-norm of each classifier combination can be obtained with reference to the max-min normalization formula.
In step S608, the number k of classifiers for each classifier combination is determined.
In step S609, the number of classifiers for each classifier combination is normalized to obtain the normalized number of classifiers for each classifier combination.
In the embodiment of the present disclosure, the maximum and minimum normalization may be performed on the number k of classifiers, but the present disclosure is not limited thereto.
For example, the classifier numbers k of the classifier combinations can be arranged in descending or ascending order to determine the maximum classifier number and the minimum classifier number, and the normalized classifier number k_scale of each classifier combination can be obtained with reference to the max-min normalization formula.
In step S610, the WNAD index of each classifier combination is calculated from the normalized KS value mean, the normalized correlation coefficient mean, and the normalized classifier number of each classifier combination.
In the embodiment of the disclosure, the normalized KS value mean KS_mean-norm obtained by normalizing the KS value mean of each classifier combination is adopted as the target accuracy index Rig, the normalized correlation coefficient mean Corr_mean-norm obtained by normalizing the mean of the correlation coefficients of every two classifiers in each classifier combination is adopted as the target diversity index Vari, and the normalized classifier number k_scale obtained by normalizing the number of classifiers is adopted as the target classifier number index. The obtained WNAD index is shown by the following formula:

WNAD = α · Rig + β · Vari + λ · k_scale (16)
where α + β + λ = 1 and 0 < α, β, λ < 1.
If accuracy, diversity, and the number of classifiers are considered equally important, that is, the target accuracy index, the target diversity index, and the target classifier number index have the same weight α = β = λ = 1/3, equation (16) can be simplified to equation (17):

WNAD = (Rig + Vari + k_scale) / 3 (17)
according to the method provided by the embodiment of the disclosure, in the process of calculating the WNAD index, the target accuracy index, the target diversity index and the target classifier number index are subjected to normalization processing, so that the WNAD index is not influenced by dimensions and has universality.
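A minimal sketch of equations (16) and (17) (the function name is illustrative; the weights are assumed to be supplied by the caller):

```python
def wnad(rig, vari, k_scale, alpha=1/3, beta=1/3, lam=1/3):
    # Equation (16); with alpha = beta = lam = 1/3 it reduces to equation (17).
    assert abs(alpha + beta + lam - 1.0) < 1e-9
    return alpha * rig + beta * vari + lam * k_scale
```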
FIG. 7 schematically shows a flow diagram of an integrated classification model-based classification method according to an embodiment of the present disclosure. As shown in fig. 7, the method provided by the embodiment of the present disclosure may include the following steps.
In step S701, k classifiers are selected from the N classifiers to obtain a plurality of classifier combinations.
Assuming that k (2 ≦ k ≦ N) classifiers are to be selected from the N classifiers, there are 2^N − N − 1 classifier combinations in total; for each classifier combination, the operations are as follows.
In step S702, for each classifier in each classifier combination, a KS value of each classifier is calculated based on its prediction probability.
For each classifier, a KS value is calculated based on its prediction probabilities for the samples. For example, if one of the classifiers is a binary classifier and the prediction label y is 0 (e.g., representing a negative sample) or 1 (e.g., representing a positive sample), the prediction probability is the probability p that the label of a sample is predicted to be 1, with p ∈ [0, 1], such as p = 0.7.
In step S703, a KS value mean value of each classifier combination is calculated from the KS value of each classifier in each classifier combination.
And calculating the KS value mean value of the corresponding classifier combination.
In step S704, for each classifier in each classifier combination, a prediction label for each classifier is calculated based on its prediction probability and according to the classification threshold adjustment.
For each classifier, based on the value of the prediction probability p, the prediction label (1 or 0) is obtained according to the classification threshold adjustment. For example, if p = 0.7 and the set classification threshold is 0.5, the prediction label is 1 because 0.7 is greater than 0.5.
The above process involves classification threshold adjustment, that is, "rescaling", a basic strategy for class-imbalance learning in which the classification threshold is adjusted according to the proportion of positive and negative samples in the training set; here y' denotes the prediction probability of the classifier predicting a negative sample (a "bad" sample; correspondingly, a positive sample may also be referred to as a "good" sample).
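For illustration, a small sketch of threshold-based labeling; the rescaled threshold derived from the training-set class ratio is one common form of the rescaling strategy and is an assumption here, since the disclosure's exact formula is not reproduced above:

```python
def predict_label(p, threshold=0.5):
    # p: predicted probability of the positive class.
    return 1 if p >= threshold else 0

def rescaled_threshold(n_pos, n_neg):
    # Rescaling for class imbalance: predict positive when
    # p / (1 - p) > n_pos / n_neg, equivalent to p > n_pos / (n_pos + n_neg).
    return n_pos / (n_pos + n_neg)
```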
In step S705, based on the prediction labels, the correlation coefficients of every two classifiers in each classifier combination are calculated, and the mean value of the correlation coefficients of each classifier combination is obtained by averaging.
According to the prediction label and the true label of each sample, whether each classifier classifies each sample correctly can be known, so that the classification result table of every two classifiers can be obtained. Here it is assumed that the correlation coefficient is selected to measure diversity; experiments verify that the correlation coefficient is relatively robust in measuring diversity. However, other diversity measures may be selected according to the actual situation, such as the Q statistic or the Kappa statistic, or a weighted sum of the Q statistic, the Kappa statistic, and the correlation coefficient.
And averaging the correlation coefficients of every two classifiers in each classifier combination to obtain the mean value of the correlation coefficients of each classifier combination.
In step S706, the KS value mean, the correlation coefficient mean, and the number of classifiers of each classifier combination are respectively subjected to max-min normalization, and the normalized values are used respectively as the target accuracy index, the target diversity index, and the target classifier number index of each classifier combination.
And respectively carrying out maximum and minimum normalization on the KS value mean value, the correlation coefficient mean value and the number of classifiers of each classifier combination to serve as a target accuracy index, a target diversity index and a target classifier number index of each classifier combination.
In step S707, a weighted number accuracy diversity metric index for each classifier combination is calculated based on the target accuracy index, the target diversity index, and the target classifier number index for each classifier combination.
The WNAD index for each classifier combination is calculated according to the above equation (16) or (17).
In step S708, the weighted number accuracy diversity metric indexes of the classifier combinations are sorted in descending order, and the classifier combination with the largest weighted number accuracy diversity metric index is selected as the base classifiers in the base classifier layer of the integrated classification model.
The WNAD index values of the 2^N − N − 1 classifier combinations are sorted in descending or ascending order, and the classifier combination corresponding to the largest WNAD index, namely the optimal classifier combination of the Stacking model, is taken as the base classifier layer of the integrated classification model.
According to the method provided by the embodiment of the disclosure, on one hand, the number index of the target classifiers is introduced, so that the WNAD index is more suitable for selecting different numbers of base classifiers; on the other hand, the target accuracy index, the target diversity index and the target classifier number index are subjected to normalization processing, so that the WNAD index is not influenced by dimensions and has universality.
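Putting the steps of fig. 7 together, the following end-to-end sketch selects the base classifiers; it reuses the hypothetical helpers ks_value, pair_counts, correlation, max_min_normalize, and wnad from the sketches above, and it assumes that a lower pairwise correlation means greater diversity. It is an illustration under those assumptions, not the disclosure's implementation.

```python
from itertools import combinations
import numpy as np

def select_base_classifiers(classifiers, probs, y_true, threshold=0.5):
    # classifiers: list of names; probs: name -> predicted positive probabilities.
    combos = [c for k in range(2, len(classifiers) + 1)
              for c in combinations(classifiers, k)]
    ks_means, corr_means, sizes = [], [], []
    for combo in combos:
        ks_means.append(np.mean([ks_value(y_true, probs[c]) for c in combo]))
        preds = {c: (probs[c] >= threshold).astype(int) for c in combo}
        pair_corrs = [correlation(*pair_counts(y_true, preds[i], preds[j]))
                      for i, j in combinations(combo, 2)]
        corr_means.append(np.mean(pair_corrs))
        sizes.append(len(combo))
    scores = []
    for ks, corr, k in zip(ks_means, corr_means, sizes):
        rig = max_min_normalize(ks, min(ks_means), max(ks_means))
        # Assumed orientation: lower correlation -> larger diversity index.
        vari = max_min_normalize(-corr, -max(corr_means), -min(corr_means))
        k_scale = max_min_normalize(k, min(sizes), max(sizes))
        scores.append(wnad(rig, vari, k_scale))  # equal weights: equation (17)
    return combos[int(np.argmax(scores))]
```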
Fig. 8 schematically shows a schematic diagram of an integrated classification model-based classification method according to an embodiment of the present disclosure.
As shown in fig. 8, it is assumed that the total number N of classifiers is 7, and the 7 classifiers are LR (Logistic Regression classifier), DNN (Deep Neural Networks), RF (Random Forest), AdaBoost (an iterative algorithm in which different weak classifiers are trained on the same training set and then combined to form a stronger final classifier), GBDT (Gradient Boosting Decision Tree), XGBoost (eXtreme Gradient Boosting), and LightGBM (Light Gradient Boosting Machine). It should be noted that these 7 classifiers are only used for illustration; in practice, any kind of classifier may be provided as required.
Selecting k (2 ≦ k ≦ 7) classifiers from the 7 classifiers yields classifier combination 1, classifier combination 2, and so on up to classifier combination 120, i.e., 120 classifier combinations in total. From these 120 classifier combinations, an optimal classifier combination is selected.
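For completeness, the count of 120 follows from the number of ways to choose at least two of the seven classifiers:

C(7, 2) + C(7, 3) + … + C(7, 7) = 2^7 − C(7, 0) − C(7, 1) = 128 − 1 − 7 = 120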
According to the methods and formulas provided in the above embodiments, WNAD index 1 of classifier combination 1, WNAD index 2 of classifier combination 2, and so on up to WNAD index 120 of classifier combination 120 can be calculated.

The optimal classifier combination, namely the one with the largest WNAD index among WNAD index 1, WNAD index 2, ..., WNAD index 120, is then selected as the base classifier layer.
Assume that the top 5 classifier combinations by WNAD index among these 120 classifier combinations are as shown in Table 5 below:
TABLE 5
According to Table 5 above, based on the WNAD index, the selected optimal classifier combination (the one corresponding to the largest WNAD index) is LR + DNN + GBDT + XGBoost + LightGBM, and the Stacking model framework corresponding to this optimal classifier combination is shown in fig. 9.
As shown in fig. 9, the base classifier layer of the integrated classification model includes LR, DNN, GBDT, XGBoost, and LightGBM; the model training sample set is input to the base classifier layer to obtain prediction probabilities. These prediction probabilities are then normalized and input to the secondary classifier layer of the integrated classification model, which is here assumed to be LR (although the disclosure is not limited thereto), and the secondary classifier layer then outputs the final probability.
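As an illustration of this framework, a minimal sketch using scikit-learn's StackingClassifier is given below; the use of scikit-learn, XGBoost, and LightGBM, the MLPClassifier standing in for the DNN, and all hyperparameters are assumptions of this example rather than the disclosure's implementation. The normalization step of fig. 9 is omitted here, since scikit-learn feeds the base classifiers' probabilities directly to the meta-learner.

```python
from sklearn.ensemble import StackingClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

base_classifiers = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dnn", MLPClassifier(hidden_layer_sizes=(64, 32))),  # stand-in for DNN
    ("gbdt", GradientBoostingClassifier()),
    ("xgb", XGBClassifier()),
    ("lgbm", LGBMClassifier()),
]

# Secondary classifier layer (meta-learner): LR, as assumed in fig. 9.
stacking_model = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",  # base classifiers feed probabilities
)
# stacking_model.fit(X_train, y_train); stacking_model.predict_proba(X_new)
```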
The method provided by the embodiment of the disclosure provides a selection mode of the basis classifier of the Stacking model based on the WNAD index, and the WNAD index is suitable for selection of the basis classifiers with different numbers by introducing the number index of the target classifiers, so that the method has universality. The WNAD index provides a new universal strategy for selecting a base classifier in the Stacking integrated classification model, assists in quickly building the Stacking integrated classification model, improves the efficiency and performance of the Stacking model applied to classification problems, and is very suitable for all classification scenes. In addition, the WNAD index is subjected to normalization processing on the target accuracy index, the target diversity index and the target classifier number index, so that the WNAD index is not influenced by dimensions and has universality.
FIG. 10 schematically shows a block diagram of an integrated classification model-based classification apparatus according to an embodiment of the present disclosure. As shown in fig. 10, the classification apparatus 1000 based on an integrated classification model provided by the embodiment of the present disclosure may include an integrated classification model building unit 1010, an integrated classification model training unit 1020, a current data to be classified obtaining unit 1030, and a target classification result obtaining unit 1040.
In the embodiment of the present disclosure, the integrated classification model building unit 1010 may be configured to build the integrated classification model, and the integrated classification model may include a base classifier layer, and the base classifier layer may include a base classifier. An integrated classification model training unit 1020 may be used to train the integrated classification model using a set of model training samples. The current data to be classified obtaining unit 1030 may be configured to obtain current data to be classified. The target classification result obtaining unit 1040 may be configured to process the current data to be classified through the trained integrated classification model, and obtain a target classification result of the current data to be classified.
Wherein the integrated classification model building unit 1010 may include: a classifier combination obtaining unit 1011, a target classifier number index determining unit 1012, a weighted number accuracy diversity metric determining unit 1013, a target classifier combination determining unit 1014, and a base classifier determining unit 1015.
In the embodiment of the present disclosure, the classifier combination obtaining unit 1011 may be configured to select k classifiers from N classifiers, and obtain at least two classifier combinations, where k is a positive integer greater than or equal to 2 and less than or equal to N, and N is a positive integer greater than or equal to 2. The target classifier number index determination unit 1012 may be configured to determine a target classifier number index for each classifier combination based on the number of classifiers included in each classifier combination. The weighted number accuracy diversity metric determining unit 1013 may be configured to determine the weighted number accuracy diversity metric for each classifier combination according to the target classifier number index for each classifier combination. The target classifier combination determination unit 1014 may be configured to determine a target classifier combination from the at least two classifier combinations according to the weighted number accuracy diversity metric for each classifier combination. The base classifier determination unit 1015 may be configured to use the classifiers in the target classifier combination as the base classifiers in the base classifier layer of the integrated classification model.
In the process of constructing the integrated classification model, the classification device based on the integrated classification model provided by the embodiment of the disclosure can obtain a plurality of classifier combinations by selecting different numbers of classifiers from a given plurality of selectable classifiers. It can then determine the target classifier number index of each classifier combination according to the number of classifiers contained in each classifier combination, determine the weighted number accuracy diversity metric index of each classifier combination according to the target classifier number index, and select an optimal classifier combination from the plurality of classifier combinations as the target classifier combination according to the weighted number accuracy diversity metric index of each classifier combination. Because the number of classifiers contained in each classifier combination is taken into account in determining the optimal classifier combination, the finally determined optimal classifier combination can have the number of classifiers best suited to the actual situation, so that the base classifier layer of the constructed integrated classification model has the optimal number of classifiers. When applied to the construction of different integrated classification models, the method is suitable for selecting base classifiers of different numbers, provides a more universal selection strategy, and makes the constructed integrated classification model more robust. When the trained integrated classification model is applied to classification problems, the classification efficiency and performance of the integrated classification model can be improved.
In an exemplary embodiment, the target classifier number index determining unit 1012 may include: the classifier combination classifier number determining unit may be configured to determine the number of classifiers in each classifier combination according to the number of classifiers included in each classifier combination; the classifier number maximum value determining unit may be configured to determine a maximum classifier number and a minimum classifier number from the classifier numbers of the at least two kinds of classifier combinations; the classifier combination target classifier number index obtaining unit may be configured to determine a target classifier number index of each classifier combination according to the maximum classifier number, the minimum classifier number, and the classifier number of each classifier combination.
In an exemplary embodiment, the weighted number accuracy diversity metric determination unit 1013 may include: a target accuracy index obtaining unit, configured to obtain a target accuracy index for each classifier combination; a target diversity index obtaining unit operable to obtain a target diversity index for each classifier combination; the accuracy diversity number weight determining unit can be used for determining the accuracy weight, the diversity weight and the classifier number weight; and the weighted number accuracy diversity measurement index obtaining unit can be used for determining the weighted number accuracy diversity measurement index of each classifier combination according to the target accuracy index and the accuracy weight, the target diversity index and the diversity weight of each classifier combination, and the target classifier number index and the classifier number weight.
In an exemplary embodiment, the target accuracy index obtaining unit may include: an accuracy measure index obtaining unit, configured to obtain an accuracy measure index of each classifier in each classifier combination; the accuracy measure index mean value obtaining unit can be used for obtaining the accuracy measure index mean value of each classifier combination according to the accuracy measure index of each classifier in each classifier combination; the accuracy measure index mean value most-value determining unit can be used for determining a maximum accuracy measure index mean value and a minimum accuracy measure index mean value from the accuracy measure index mean values of the combination of the at least two classifiers; and the target accuracy index determining unit may be configured to determine the target accuracy index of each classifier combination according to the maximum accuracy measurement index mean, the minimum accuracy measurement index mean, and the accuracy measurement index mean of each classifier combination.
In an exemplary embodiment, the accuracy metric obtaining unit may include: a first sample prediction label obtaining unit, configured to input a first sample in a first classifier training sample set to each classifier in each classifier combination, and obtain a prediction label output by each classifier in each classifier combination and corresponding to the first sample; a first sample number determining unit, which can be used for determining the true sample number, the false negative sample number, the false positive sample number and the true negative sample number of each classifier in each classifier combination according to the prediction label of the corresponding first sample output by each classifier in each classifier combination and the true label thereof; a first true sample rate determining unit operable to determine a true sample rate of each classifier in each classifier combination from the true sample number and the false negative sample number of each classifier in each classifier combination; a first false positive sample rate determining unit operable to determine a false positive sample rate of each classifier in each classifier combination from the false positive sample number and the true negative sample number of each classifier in each classifier combination; the positive and negative sample separation degree index determining unit may be configured to determine a positive and negative sample separation degree index of each classifier in each classifier combination according to a maximum value of a difference between a true sample rate and a false positive sample rate of each classifier in each classifier combination; the first classifier accuracy measure index determining unit may be configured to determine the accuracy measure index of each classifier in each classifier combination according to the positive and negative sample separation degree index of each classifier in each classifier combination.
In an exemplary embodiment, the accuracy metric obtaining unit may include: a second sample prediction label obtaining unit, configured to input a second sample in the second classifier training sample set to each classifier in each classifier combination, and obtain a prediction label of the corresponding second sample output by each classifier in each classifier combination; a second sample number determination unit, which can be used for determining the true sample number, the false negative sample number, the false positive sample number and the true negative sample number of each classifier in each classifier combination according to the prediction label of the corresponding second sample output by each classifier in each classifier combination and the true label thereof; a second true sample rate determining unit operable to determine a true sample rate of each classifier in each classifier combination from the true sample number and the false negative sample number of each classifier in each classifier combination; a second false positive sample rate determination unit operable to determine a false positive sample rate of each classifier in each classifier combination from the false positive sample number and the true negative sample number of each classifier in each classifier combination; the under-curve area index determining unit may be configured to determine an under-curve area index of each classifier in each classifier combination according to the true sample rate and the false positive sample rate of each classifier in each classifier combination; the second classifier accuracy measure index determining unit may be configured to determine the accuracy measure index of each classifier in each classifier combination according to the area under the curve index of each classifier in each classifier combination.
In an exemplary embodiment, the accuracy metric obtaining unit may include: a third sample prediction label obtaining unit, configured to input a third sample in the third classifier training sample set to each classifier in each classifier combination, and obtain a prediction label of the corresponding third sample output by each classifier in each classifier combination; a third sample number determination unit, configured to determine a true sample number and a false negative sample number of each classifier in each classifier combination according to the prediction label of the corresponding third sample output by each classifier in each classifier combination and the true label thereof; a classifier recall ratio determination unit operable to determine a recall ratio of each classifier in each classifier combination according to the true sample number and the false negative sample number of each classifier in each classifier combination; the third classifier accuracy metric determining unit may be configured to determine the accuracy metric of each classifier in each classifier combination according to the recall rate of each classifier in each classifier combination.
In an exemplary embodiment, the target diversity index obtaining unit may include: a two-classifier diversity measure obtaining unit, configured to obtain a diversity measure for each two classifiers in each classifier combination; the classifier combination diversity measure index mean value obtaining unit can be used for obtaining the diversity measure index mean value of each classifier combination according to the diversity measure indexes of every two classifiers in each classifier combination; the classifier combination diversity measure index mean value most-value determining unit can be used for determining a maximum diversity measure index mean value and a minimum diversity measure index mean value from the diversity measure index mean values of the at least two kinds of classifier combinations; the classifier combination target diversity index determining unit may be configured to determine the target diversity index of each classifier combination according to the maximum diversity measure index mean, the minimum diversity measure index mean, and the diversity measure index mean of each classifier combination.
In an exemplary embodiment, the at least two classifier combinations may include a first classifier combination, which may include a first classifier and a second classifier. The two-classifier diversity measure obtaining unit may include: a fourth sample prediction label obtaining unit, configured to input fourth samples in a fourth classifier training sample set to the first classifier and the second classifier, respectively, and obtain prediction labels of corresponding fourth samples output by the first classifier and the second classifier, respectively; a two-classifier classification result table obtaining unit, configured to obtain, according to the prediction label and the true label of the fourth sample corresponding to the output of each of the first classifier and the second classifier, the number of samples that are classified correctly by the first classifier and the second classifier at the same time, the number of samples that are classified correctly by the first classifier and the second classifier is incorrect, the number of samples that are classified incorrectly by the first classifier and the second classifier, and the number of samples that are classified incorrectly by the first classifier and the second classifier; a two-classifier diversity measure determining unit, configured to determine a diversity measure for the first classifier and the second classifier in the first classifier combination according to the number of samples that the first classifier and the second classifier classify correctly, the number of samples that the first classifier and the second classifier classify incorrectly, and the number of samples that the first classifier and the second classifier equally classify incorrectly.
In an exemplary embodiment, the two-classifier diversity measure determining unit may include: a two-classifier correlation coefficient obtaining unit, configured to obtain correlation coefficients of the first classifier and the second classifier in the first classifier combination according to the number of samples classified by the first classifier and the second classifier at the same time, the number of samples classified by the first classifier correctly and the second classifier incorrectly, the number of samples classified by the first classifier incorrectly and the second classifier correctly, and the number of samples classified by the first classifier and the second classifier incorrectly; a correlation coefficient diversity measure obtaining unit, configured to obtain a diversity measure of the first classifier and the second classifier in the first classifier combination according to the correlation coefficients of the first classifier and the second classifier in the first classifier combination.
In an exemplary embodiment, the two-classifier diversity measure determining unit may include: a two-classifier Q-statistic obtaining unit operable to obtain Q-statistics of the first classifier and the second classifier in the first classifier combination based on the number of samples that are classified correctly by the first classifier and the second classifier at the same time, the number of samples that are classified correctly by the first classifier and are classified incorrectly by the second classifier, the number of samples that are classified incorrectly by the first classifier and are classified correctly by the second classifier, and the number of samples that are classified incorrectly by the first classifier and the second classifier at the same time; a Q statistic diversity measure obtaining unit, configured to obtain a diversity measure of the first classifier and the second classifier in the first classifier combination according to Q statistics of the first classifier and the second classifier in the first classifier combination.
In an exemplary embodiment, the two-classifier diversity measure determining unit may include: a two-classifier kappa statistic obtaining unit, configured to obtain kappa statistics of the first classifier and the second classifier in the first classifier combination according to the number of samples that are classified correctly by the first classifier and the second classifier simultaneously, the number of samples that are classified correctly by the first classifier and are classified incorrectly by the second classifier, the number of samples that are classified incorrectly by the first classifier and are classified correctly by the second classifier, and the number of samples that are classified incorrectly by the first classifier and the second classifier; a kappa statistic diversity measure obtaining unit, configured to obtain diversity measures of the first classifier and the second classifier in the first classifier combination according to the kappa statistics of the first classifier and the second classifier in the first classifier combination.
Other contents of the classification device based on the integrated classification model of the embodiment of the present disclosure can refer to the above-mentioned embodiment.
It should be noted that although several units of the device for action execution are mentioned in the above detailed description, this division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided and embodied by a plurality of units.
The scheme provided by the embodiments of the present application relates to artificial intelligence machine learning technology.
Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer simulates or implements human learning behaviors to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
The electronic device implementing the method or apparatus provided by the embodiments of the present disclosure may be various types of terminals or servers.
The server may be an independent server, a server cluster or a distributed system formed by a plurality of servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the disclosure is not limited thereto.
Fig. 11 is an exemplary scene diagram illustrating a classification method based on an integrated classification model to which an embodiment of the present disclosure may be applied.
Referring to fig. 11, the terminal 1120 is connected to the server 1110 through a network 1130, and the network 1130 may be a wide area network or a local area network, or a combination of both.
The terminal 1120 (running a client, such as an educational learning client, a search client, etc.) may be used to obtain the user's current data to be classified. After the terminal 1120 acquires the current data to be classified, the current data to be classified is sent to the server 1110 through the network 1130, the server 1110 calls the constructed and trained Stacking integrated classification model according to the current data to be classified, a target classification result corresponding to the current data to be classified is predicted, and the target classification result is fed back to the terminal 1120.
The Stacking integrated classification model constructed by the embodiment of the disclosure is very suitable for all classification scenes.
For example, in a credit risk control scenario, a risk control model is built to predict the risk/probability that a user applying for a loan will be overdue in the future. For this scenario, in order to improve the prediction capability beyond a single classifier, the Stacking integrated classification model constructed and trained in the above embodiments can be adopted: retraining on the output results of a plurality of base classifiers further improves the classification effect of the model.
For another example, the Stacking integrated classification model may be used in a classification scenario such as advertisement click through rate prediction.
It should be noted that the above application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in this respect. Rather, the embodiments of the present disclosure may be applied to any applicable classification scenario.
Referring now to FIG. 12, shown is a schematic diagram of an electronic device suitable for use in implementing embodiments of the present application. The electronic device shown in fig. 12 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
Referring to fig. 12, an electronic device provided in an embodiment of the present disclosure may include: a processor 1201, a communication interface 1202, a memory 1203, and a communication bus 1204.
Wherein the processor 1201, the communication interface 1202 and the memory 1203 are in communication with each other via a communication bus 1204.
Alternatively, the communication interface 1202 may be an interface of a communication module, such as an interface of a GSM (Global System for Mobile communications) module. The processor 1201 is used to execute a program, and the memory 1203 is used to store the program. The program may comprise a computer program including computer operating instructions; for example, the program may include a game client program.
The processor 1201 may be a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure.

The memory 1203 may include a Random Access Memory (RAM), and may also include a non-volatile memory, such as at least one disk memory.
Among them, the procedure can be specifically used for: constructing the ensemble classification model, the ensemble classification model comprising a base classifier layer, the base classifier layer comprising a base classifier; training the integrated classification model by using a model training sample set; obtaining current data to be classified; and processing the current data to be classified through the trained integrated classification model to obtain a target classification result of the current data to be classified. Wherein constructing the ensemble classification model comprises: selecting k classifiers from the N classifiers to obtain at least two classifier combinations, wherein k is a positive integer which is greater than or equal to 2 and less than or equal to N, and N is a positive integer which is greater than or equal to 2; determining the number index of the target classifiers of each classifier combination based on the number of the classifiers contained in each classifier combination; determining the accuracy and diversity measurement index of the weighted number of each classifier combination according to the number index of the target classifiers of each classifier combination; determining a target classifier combination from the at least two classifier combinations according to the weighted number accuracy diversity metric index of each classifier combination; using the classifiers in the target classifier combination as the base classifiers in the base classifier layer of the integrated classification model.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations of the embodiments described above.
It is to be understood that any number of elements in the drawings of the present disclosure are by way of example and not by way of limitation, and any nomenclature is used for differentiation only and not by way of limitation.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.