Merchant classification method

Document No.: 8628    Publication date: 2021-09-17

1. A merchant classification method comprises the following steps:

acquiring sales data of a merchant, wherein the sales data comprises training samples;

preprocessing the sales data;

performing a feature extraction process based on the pre-processed sales data, thereby obtaining a plurality of features;

performing a feature selection process on the plurality of features to select an important feature from the plurality of features;

training a classification model based on the important features of the training samples; and

classifying the merchant based on the classification model.

2. The merchant classification method according to claim 1, wherein the sales data includes at least one of: sales date, sales quantity, sales price, merchant name, commodity name.

3. The merchant classification method according to claim 1, wherein the sales data includes sales data for a predetermined period of time.

4. The merchant classification method according to claim 3, wherein the sales data includes a sales quantity and a commodity name, and

the preprocessing comprises:

calculating the total sales volume of each commodity of each merchant; and

selecting, according to the total sales volume, the sales data of the top K commodities by sales volume for each merchant, wherein K is a positive integer.

5. The merchant classification method according to claim 1, wherein the preprocessing includes a process of removing a dimension of the sales data.

6. The merchant classification method according to claim 5, wherein the preprocessing comprises a normalization process.

7. The merchant classification method according to claim 6, wherein the normalization process is performed using the following formula:

x’=(x-xmin)/(xmax-xmin),

wherein x is the sales data, xmin is the minimum value of the sales data, xmax is the maximum value of the sales data, and x' is the sales data after the normalization processing.

8. A merchant classifying device comprising:

a processor; and

a memory having a set of instructions stored therein which, when executed by the processor, cause the processor to:

acquiring sales data of a merchant, wherein the sales data comprises training samples;

preprocessing the sales data;

performing a feature extraction process based on the pre-processed sales data, thereby obtaining a plurality of features;

performing a feature selection process on the plurality of features to select an important feature from the plurality of features;

training a classification model based on the important features of the training samples; and

classifying the merchant based on the classification model.

9. A computer-readable non-transitory storage medium having instructions stored thereon which, when executed by a processor, cause the processor to:

acquiring sales data of a merchant, wherein the sales data comprises training samples;

preprocessing the sales data;

performing a feature extraction process based on the pre-processed sales data, thereby obtaining a plurality of features;

performing a feature selection process on the plurality of features to select an important feature from the plurality of features;

training a classification model based on the important features of the training samples; and

classifying the merchant based on the classification model.

10. A computer program comprising a series of instructions which, when executed by a processor, cause the processor to:

acquiring sales data of a merchant, wherein the sales data comprises training samples;

preprocessing the sales data;

performing a feature extraction process based on the pre-processed sales data, thereby obtaining a plurality of features;

performing a feature selection process on the plurality of features to select an important feature from the plurality of features;

training a classification model based on the important features of the training samples; and

classifying the merchant based on the classification model.

Background

The merchandise sold by a retailer is typically obtained from a manufacturer or an agent. The sales of goods at an individual retailer usually fluctuate within a relatively stable range. In some cases, however, sales may exceed the expected range, and problems such as stock-outs and lost sales can easily occur. By classifying retailers, merchants whose commodity sales fluctuate strongly can be identified in advance. By controlling the supply volume to these merchants, resource allocation can be optimized.

Disclosure of Invention

According to an aspect of the present disclosure, there is provided a merchant classification method including:

acquiring sales data of a merchant, wherein the sales data comprises training samples;

preprocessing the sales data;

performing a feature extraction process based on the pre-processed sales data, thereby obtaining a plurality of features;

performing a feature selection process on the plurality of features to select an important feature from the plurality of features;

training a classification model based on the important features of the training samples; and

classifying the merchant based on the classification model.

In some embodiments according to the disclosure, the sales data comprises at least one of: sales date, sales quantity, sales price, merchant name, commodity name.

In some embodiments according to the disclosure, the sales data comprises sales data for a predetermined period of time.

In some embodiments according to the present disclosure, the sales data comprises a sales quantity and a commodity name, and

the preprocessing comprises:

calculating the total sales volume of each commodity of each merchant; and

selecting, according to the total sales volume, the sales data of the top K commodities by sales volume for each merchant, wherein K is a positive integer.

In some embodiments according to the present disclosure, the preprocessing includes a process of removing a dimension of the sales data.

In some embodiments according to the present disclosure, the preprocessing comprises normalization processing.

In some embodiments according to the present disclosure, the normalization process is performed using the following formula:

x’=(x-xmin)/(xmax-xmin),

wherein x is the sales data, xmin is the minimum value of the sales data, xmax is the maximum value of the sales data, and x' is the sales data after the normalization processing.

In some embodiments according to the present disclosure, the features include time-series features.

In some embodiments according to the present disclosure, the time-series features comprise: period-class features, statistical-class features, frequency-domain features, nonlinear features, and linear features.

In some embodiments according to the present disclosure, the period-class features include: the number of peaks and the period of peak intervals.

In some embodiments according to the present disclosure, the statistical class features include: variance, standard deviation, mean, extremum, median, entropy, length of longest continuous subsequence greater than mean, length of longest continuous subsequence less than mean, repeatability of extremum, location where extremum last occurred, number of sales data greater than mean, and number of sales data less than mean.

In some embodiments according to the present disclosure, the frequency domain class features include: frequency, phase, spectral centroid, variance, skewness, coefficients of the fourier transform, and kurtosis of the absolute fourier transform spectrum.

In some embodiments according to the present disclosure, the nonlinear features comprise: Fisher-Pearson normalized moment coefficients, signal-to-noise ratios, Langevin model fitting coefficients, the continuous wavelet transform of a Ricker wavelet, and lagged autocorrelation coefficients.

In some embodiments according to the present disclosure, the linear features comprise: linear least squares regression coefficients and autoregressive model coefficients.

In some embodiments according to the present disclosure, the feature selection process comprises:

calculating the importance of each characteristic by using a random forest algorithm;

comparing the importance to a predetermined threshold;

selecting the feature having the importance greater than the threshold as the important feature.

In some embodiments according to the present disclosure, the merchant classification method further comprises:

determining a plurality of candidate thresholds;

calculating the value of an evaluation function of a coarse classification model at each of the candidate thresholds; and

selecting the candidate threshold corresponding to the maximum value of the evaluation function as the predetermined threshold.

In some embodiments according to the present disclosure, the evaluation function includes an accuracy rate (accuracy) and a macro F1 (macro-F1).

In some embodiments according to the present disclosure, the merchant classification method further comprises: training the coarse classification model based on all of the plurality of features of the training sample.

In some embodiments according to the present disclosure, the training of the classification model comprises:

a grid-search (grid-search) is used to determine hyper-parameters of the classification model.

In some embodiments according to the present disclosure, the classification model comprises: logistic regression models and support vector machine models.

In some embodiments according to the present disclosure, the method further comprises:

performing a class balancing process.

In some embodiments according to the disclosure, the class balancing process comprises:

performing oversampling or undersampling on the training samples.

In some embodiments according to the disclosure, the class balancing process comprises:

setting a weight coefficient for each training sample.

In some embodiments according to the present disclosure, the weight coefficient for each training sample is equal to the total number of training samples divided by the number of training samples in the class to which the training sample belongs.

According to another aspect of the present disclosure, there is provided a merchant classifying device including:

a processor; and

a memory having a set of instructions stored therein which, when executed by the processor, cause the processor to:

acquiring sales data of a merchant, wherein the sales data comprises training samples;

preprocessing the sales data;

performing a feature extraction process based on the pre-processed sales data, thereby obtaining a plurality of features;

performing a feature selection process on the plurality of features to select an important feature from the plurality of features;

training a classification model based on the important features of the training samples; and

classifying the merchant based on the classification model.

According to yet another aspect of the disclosure, there is provided a computer-readable non-transitory storage medium having instructions stored thereon which, when executed by a processor, cause the processor to:

acquiring sales data of a merchant, wherein the sales data comprises training samples;

preprocessing the sales data;

performing a feature extraction process based on the pre-processed sales data, thereby obtaining a plurality of features;

performing a feature selection process on the plurality of features to select an important feature from the plurality of features;

training a classification model based on the important features of the training samples; and

classifying the merchant based on the classification model.

According to yet another aspect of the disclosure, there is provided a computer program comprising a series of instructions which, when executed by a processor, cause the processor to:

acquiring sales data of a merchant, wherein the sales data comprises training samples;

preprocessing the sales data;

performing a feature extraction process based on the pre-processed sales data, thereby obtaining a plurality of features;

performing a feature selection process on the plurality of features to select an important feature from the plurality of features;

training a classification model based on the important features of the training samples; and

classifying the merchant based on the classification model.

Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.

The present disclosure may be understood more clearly and in accordance with the following detailed description, taken with reference to the accompanying drawings,

wherein:

FIG. 1 shows a flow diagram of a merchant categorization method according to an embodiment of the disclosure.

Fig. 2 shows a flow diagram of a feature selection process according to an embodiment of the present disclosure.

Fig. 3 shows a flow chart for determining the predetermined threshold value according to an embodiment of the present disclosure.

FIG. 4 illustrates a schematic diagram of a portion of a sales and inventory daily report, according to an embodiment of the present disclosure.

Fig. 5 shows information of merchant codes, commodity codes, sales numbers and the like corresponding to 86 screened cigarettes according to the embodiment of the disclosure.

FIG. 6 shows a schematic of sales data after normalization processing according to an embodiment of the disclosure.

FIG. 7 shows an example of the importance of various features computed by a random forest algorithm according to an embodiment of the present disclosure.

FIG. 8 illustrates a set of candidate thresholds and the accuracy rates and values of the macro F1 for each candidate threshold, according to an embodiment of the disclosure.

Fig. 9 illustrates candidate values of a hyperparameter of a support vector machine algorithm according to an embodiment of the present disclosure.

Fig. 10 shows the results of one fold of the SVM algorithm in a 5-fold cross-validation process according to an embodiment of the disclosure.

FIG. 11 illustrates a set of candidate values for a hyperparameter of a logistic regression algorithm, according to an embodiment of the disclosure.

Figure 12 shows the results of 1-fold in 5-fold cross-validation according to embodiments of the disclosure.

Figure 13 shows the results of 1-fold in 5-fold cross-validation according to embodiments of the present disclosure.

Figure 14 shows the results of 1-fold in 5-fold cross-validation according to embodiments of the present disclosure.

Fig. 15 shows tuning results of a logistic regression algorithm and an SVM algorithm according to an embodiment of the present disclosure.

Figure 16 shows the results of a city a tobacco retailer classification process according to an embodiment of the present disclosure.

Fig. 17 shows the results of a tobacco retailer classification process for city B, C, according to an embodiment of the disclosure.

FIG. 18 shows a block diagram of a computing device, according to an example embodiment of the present disclosure.

Note that in the embodiments described below, the same reference numerals are used in common between different drawings to denote the same portions or portions having the same functions, and a repetitive description thereof will be omitted. In this specification, like reference numerals and letters are used to designate like items, and therefore, once an item is defined in one drawing, further discussion thereof is not required in subsequent drawings.

For convenience of understanding, the positions, sizes, ranges, and the like of the respective structures shown in the drawings and the like do not sometimes indicate actual positions, sizes, ranges, and the like. Therefore, the disclosed invention is not limited to the positions, dimensions, ranges, etc., disclosed in the drawings and the like.

Detailed Description

Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

FIG. 1 shows a flow diagram of a merchant categorization method according to an embodiment of the disclosure. As shown in fig. 1, a merchant classification method according to an embodiment of the present disclosure includes the steps of:

acquiring sales data of a merchant, wherein the sales data comprises training samples (step 101);

preprocessing the sales data (step 102);

performing a feature extraction process based on the pre-processed sales data, thereby obtaining a plurality of features (step 103);

performing a feature selection process on the plurality of features to select an important feature from the plurality of features (step 104);

training a classification model based on the significant features of the training samples (step 105); and

the merchant is classified based on the classification model (step 106).

Hereinafter, each step will be described in detail with reference to examples.

In step 101, various sales data of the merchant may be obtained, for example, the sales data of the merchant may be recorded in a database of the server, and a sales date, a sales amount, a sales price, a merchant name, a commodity name, and the like of the commodity within a predetermined period (for example, the past year or years) may be obtained from the server database. Training samples may also be included in the sales data for machine learning purposes.

The obtained sales data is then preprocessed in step 102. For example, the sales data may be normalized to remove dimensions of the sales data. Equation (1) shows a normalization process according to an embodiment of the disclosure:

x’=(x-xmin)/(xmax-xmin) (1)

wherein x is the sales data, xmin is the minimum value of the sales data, xmax is the maximum value of the sales data, and x' is the sales data after the normalization processing.

Through the above normalization processing, the dimension of the sales data can be removed, and each value falls in the interval [0, 1]. Of course, the present disclosure is not so limited, and other suitable normalization methods may be employed. For example, where the sales data contain extremely large or small values, standardization may be used instead, so that the sales data are distributed with a mean of 0 and a variance of 1.
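The normalization of equation (1), and the standardization alternative mentioned for data with extreme values, can be sketched as follows. This is a minimal illustration, not the claimed implementation; the function names are chosen for exposition only:

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Rescale sales data into [0, 1] per equation (1): x' = (x - xmin) / (xmax - xmin)."""
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:  # constant series: avoid division by zero
        return np.zeros_like(x, dtype=float)
    return (x - x_min) / (x_max - x_min)

def standardize(x: np.ndarray) -> np.ndarray:
    """Alternative for data with extreme values: mean 0, variance 1."""
    return (x - x.mean()) / x.std()
```

Either transform removes the dimension (unit) of the sales data; min-max additionally bounds every value in [0, 1].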

In addition, the acquired sales data may also be screened. Attention is usually paid to commodities with large sales volumes, in order to avoid problems such as stock-outs and lost sales; commodities with small sales volumes can therefore be removed selectively. For example, the total sales volume of each commodity of each merchant may be calculated, the commodities ranked by the calculated total sales volume, and the sales data of the highest-ranking commodities selected. For instance, the sales data of the top K commodities by sales volume of each merchant may be selected, K being a positive integer.
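The top-K screening described above might look like the following sketch, assuming a table with hypothetical column names `merchant`, `commodity`, and `quantity` (the disclosure does not fix a schema):

```python
import pandas as pd

def top_k_commodities(sales: pd.DataFrame, k: int) -> pd.DataFrame:
    """Keep only the sales rows for each merchant's top-K commodities by total sales volume."""
    # Total sales volume of each commodity of each merchant.
    totals = (sales.groupby(["merchant", "commodity"])["quantity"]
                   .sum()
                   .reset_index(name="total"))
    # Top K commodities per merchant by that total.
    top = (totals.sort_values("total", ascending=False)
                 .groupby("merchant")
                 .head(k)[["merchant", "commodity"]])
    # Keep only the original rows belonging to those (merchant, commodity) pairs.
    return sales.merge(top, on=["merchant", "commodity"])
```

This keeps the raw daily rows (needed later for time-series features) while dropping low-volume commodities.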

Next, at step 103, a feature extraction process may be performed based on the pre-processed sales data, resulting in a plurality of features. In the feature extraction process, a variety of time series features may be extracted. These features may include:

period-like features such as the number of peaks, the period of the peak intervals, etc.;

statistical class features such as variance, standard deviation, mean, maximum, minimum, median, entropy, longest continuous subsequence length greater than mean, longest continuous subsequence length less than mean, repetition of extrema (e.g., with or without repetition of minimum and/or maximum), location where extrema (maximum/minimum) last occurred, number of sales data greater than mean, number of sales data less than mean, etc.;

frequency domain class features such as frequency, phase, spectral centroid, variance, skewness, coefficients of fourier transform, kurtosis of absolute fourier transform spectrum, etc.;

nonlinear features measuring the time series data, such as the Fisher-Pearson normalized moment coefficient G1, signal-to-noise ratio (SNR), Langevin model fitting coefficients, the continuous wavelet transform of a Ricker wavelet, lagged (lag) autocorrelation coefficients, etc.; and

linear features measuring the time series data, such as linear least squares regression coefficients, autoregressive model coefficients, and the like.

It is to be understood that the present disclosure is not limited to the various features listed above, and that suitable features may be selected as desired.
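As a rough illustration of step 103, a few of the statistical and frequency-domain features named above can be computed directly from a normalized sales series. The function below is a simplified sketch covering only a handful of the listed features, not the full extraction process:

```python
import numpy as np

def extract_features(series: np.ndarray) -> dict:
    """Compute a small subset of the statistical / frequency-domain features listed above."""
    mean = series.mean()
    above = series > mean
    # Length of the longest run of consecutive values above the mean:
    # encode the boolean mask as a '0'/'1' string and split on '0'.
    runs = "".join("1" if a else "0" for a in above).split("0")
    longest_above = max((len(r) for r in runs), default=0)
    # Magnitude spectrum via the real FFT; centroid over frequency-bin indices.
    spectrum = np.abs(np.fft.rfft(series))
    centroid = float((spectrum * np.arange(len(spectrum))).sum() / spectrum.sum())
    return {
        "mean": mean,
        "variance": series.var(),
        "std": series.std(),
        "median": float(np.median(series)),
        "count_above_mean": int(above.sum()),
        "longest_run_above_mean": longest_above,
        "spectral_centroid": centroid,
    }
```

In practice a dedicated feature-extraction library would generate hundreds of such features per series; the point here is only the shape of the computation.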

Next, at step 104, a feature selection process may be performed on the various features extracted in step 103 above to select important features from among the features.

The above feature extraction process constructs multi-dimensional time-series features and extracts a large number of them, but not all of these features are useful. Because so many features are extracted, selecting them by human experience would consume considerable manpower without achieving high accuracy; the features therefore need to be selected by an algorithm so as to screen out the important ones.

Fig. 2 shows a flow diagram of a feature selection process according to an embodiment of the present disclosure. As shown in fig. 2, the feature selection process may include:

step 201, calculating the importance of each feature by using a random forest algorithm;

step 202, comparing the importance with a predetermined threshold;

step 203, selecting the feature with the importance greater than the threshold as the important feature.

In step 201, importance values of the features can be obtained through a random forest algorithm. A random forest is a classifier that trains and predicts samples using multiple decision trees. The classifier was originally proposed by Leo Breiman and Adele Cutler. Random forest algorithms are widely adopted in the field of machine learning, and detailed description of the random forest algorithms is omitted in the present disclosure.
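A minimal sketch of step 201 using scikit-learn's random forest; the estimator settings (`n_estimators`, `random_state`) are illustrative assumptions, not values from the disclosure:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def feature_importances(X: np.ndarray, y: np.ndarray, names: list) -> dict:
    """Fit a random forest and return the impurity-based importance of each feature."""
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X, y)
    # feature_importances_ is normalized to sum to 1 across features.
    return dict(zip(names, forest.feature_importances_))
```

The returned importance values are what steps 202-203 compare against the predetermined threshold.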

In step 202 and step 203, it is necessary to filter the calculated importance values of the features, and select the features with the importance values larger than a predetermined threshold as the important features.

Fig. 3 shows a flow chart for determining the predetermined threshold value according to an embodiment of the present disclosure. As shown in fig. 3, the process of determining the predetermined threshold value may include the steps of:

in step 301, a plurality of candidate thresholds are determined. For example, a set of candidate thresholds may be manually set in advance.

Step 302, calculating the value of the evaluation function of the coarse classification model at each candidate threshold.

Step 303, selecting a candidate threshold corresponding to the maximum value of the evaluation function as the predetermined threshold.

In some embodiments according to the present disclosure, the evaluation function may be, for example, the accuracy rate (Accuracy) and/or the macro F1 (macro-F1). The accuracy rate refers to the ratio of the number of samples correctly classified by the classification model to the total number of samples in a given test data set.

For the binary classification problem, training samples can be divided, according to the combination of their true class and the class predicted by the classification model, into true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN); clearly, TP + FP + TN + FN equals the total number of training samples. The "confusion matrix" of the classification results is shown in Table 1.

TABLE 1

                     Predicted positive    Predicted negative
Actual positive      TP                    FN
Actual negative      FP                    TN

Based on the confusion matrix of the classification result, Precision (Precision) P and Recall (Recall) R can be defined as

P=TP/(TP+FP) (2)

R=TP/(TP+FN) (3)

Based on the precision P and recall R above, the F1 metric may be further defined

F1=2×P×R/(P+R) (4)

In the case of multiple binary confusion matrices, precision and recall may be considered across the matrices. One straightforward way is to calculate the precision and recall on each confusion matrix separately and then average them, obtaining the macro precision (macro_P), the macro recall (macro_R), and the corresponding macro F1 (macro-F1):

macro_P=(P1+P2+...+Pn)/n (5)

macro_R=(R1+R2+...+Rn)/n (6)

macro-F1=2×macro_P×macro_R/(macro_P+macro_R) (7)
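The macro-averaging just described can be illustrated with a small sketch that takes per-matrix counts and applies formulas (2)-(4), averaging precision and recall before combining them:

```python
def macro_f1(confusions):
    """confusions: list of (TP, FP, FN) tuples, one per binary confusion matrix.

    Computes per-matrix precision (2) and recall (3), averages them across
    matrices, then combines the averages as in the F1 definition (4)."""
    precisions = [tp / (tp + fp) for tp, fp, fn in confusions]
    recalls = [tp / (tp + fn) for tp, fp, fn in confusions]
    macro_p = sum(precisions) / len(precisions)
    macro_r = sum(recalls) / len(recalls)
    return 2 * macro_p * macro_r / (macro_p + macro_r)
```

For example, two matrices with (TP, FP, FN) = (8, 2, 2) and (5, 5, 5) give per-matrix precision/recall of 0.8/0.8 and 0.5/0.5, hence macro_P = macro_R = 0.65 and macro-F1 = 0.65.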

in addition, the classification model may be trained based on all features (without feature selection processing) using training samples to obtain the coarse classification model.

Given the accuracy and the macro F1 of the coarse classification model at each candidate threshold, the candidate threshold at which the evaluation function reaches its maximum may be selected as the predetermined threshold.
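The threshold-selection loop of steps 301-303 can be sketched as follows. The `evaluate` callback stands in for scoring the coarse classification model (accuracy or macro F1) on a given feature subset; it is a hypothetical interface, not part of the disclosure:

```python
def select_threshold(importances: dict, candidates, evaluate) -> float:
    """Return the candidate threshold maximizing the evaluation function.

    importances: feature name -> importance (e.g. from a random forest);
    candidates:  iterable of candidate thresholds;
    evaluate:    callable(feature_names) -> score of the coarse model."""
    best_threshold, best_score = None, float("-inf")
    for threshold in candidates:
        kept = [name for name, imp in importances.items() if imp > threshold]
        score = evaluate(kept)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold
```

The kept feature set shrinks as the threshold grows; the evaluation function identifies the point where discarding more features starts to hurt.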

Next, at step 105, a classification model may be trained based on the training samples and the important features selected at step 104. In this way, only features whose importance exceeds the predetermined threshold are used to train the classification model. The hyper-parameters of the classification model may be determined, for example, by searching candidate parameter values in a grid-search manner. The classification model may be, for example, a logistic regression model, a support vector machine model, or the like. These models are well known to those skilled in the art and are not described in detail in the present disclosure.
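The grid-search over hyper-parameters mentioned above might be sketched with scikit-learn as follows; the parameter grid and the 5-fold setting are illustrative assumptions, not the values used in the disclosure:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_svm(X, y):
    """Grid-search SVM hyper-parameters with 5-fold cross-validation."""
    param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}
    search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
    search.fit(X, y)
    return search.best_params_, search.best_estimator_
```

The same pattern applies to a logistic regression model by swapping the estimator and its parameter grid (e.g. regularization strength).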

Finally, at step 106, the merchant may be classified based on the classification model trained in step 105.

Further, some embodiments according to the present disclosure may also include a class balancing process. If the sample classes of the data set are extremely imbalanced, the training of the model may be affected, and with it the classification accuracy. Therefore, in some embodiments according to the present disclosure, class balancing may be performed. For example, the training samples may be oversampled or undersampled so that the sample classes are balanced in the training data set. In still other embodiments according to the present disclosure, a weight coefficient may be set for each training sample. For example, the weight coefficient of each training sample may be equal to the total number of training samples divided by the number of training samples in the class to which that sample belongs.
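The weight-coefficient rule just described (total sample count divided by the per-class count) can be sketched as:

```python
from collections import Counter

def class_weights(labels):
    """Weight per class: total number of samples divided by that class's count.

    Rare classes receive proportionally larger weights, which counteracts
    class imbalance during model training."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / n for cls, n in counts.items()}
```

With labels [0, 0, 0, 1], class 0 gets weight 4/3 and the rare class 1 gets weight 4, so each class contributes equally in aggregate.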

The merchant classification method according to an embodiment of the present disclosure is described in more detail below with reference to specific examples.

In coastal cities, fishermen regularly go out to sea to fish. A fishing trip may be as short as a few days to a week, or as long as several months or even half a year, so fishermen carry daily necessities when they set out. Fishermen who demand tobacco products are referred to as fishery users. Before going to sea, a fishery user purchases a certain quantity of tobacco products from a tobacco retailer to cover the trip; if the retailer does not stock enough of the required tobacco products, it has to restock from the tobacco company multiple times, and the goal of optimal resource allocation cannot be achieved. By attaching a fishery-demand label to tobacco retailers, a tobacco company can conveniently identify the retailers associated with fishery users and flexibly adjust the supply of tobacco products. In the existing tobacco industry, however, the labeling of tobacco retailers' fishery demand is carried out based on personal experience or rule-based methods, which depend heavily on human experience and a large amount of manual work, so that neither efficiency nor accuracy can be guaranteed.

By adopting the merchant classification method according to the embodiment of the disclosure, the tobacco retailer can be labeled, and whether the tobacco retailer is associated with the fishery users or not can be determined according to the sales data of the tobacco retailer.

First, field information such as the sales date, sales quantity, sales price, retailer name, and cigarette variety name can be obtained from the purchase-sale-inventory daily report of each tobacco retailer stored in the database. FIG. 4 illustrates a schematic diagram of a portion of such a daily report. As shown in fig. 4, the report includes field information such as the sales date, sales quantity, merchant code, merchant name, commodity code, and commodity name of each commodity. Sales data for the period from June 1 to November 30 are acquired from the daily report. Where the report contains no record for a given cigarette of a given tobacco retailer on a given day, the sales quantity of that cigarette for that day may be set to zero.

The obtained cigarette sales data are then preprocessed. During preprocessing, the cigarette sales data are first screened. For example, the total sales volume of each cigarette of each tobacco retailer may be calculated; the top 10 cigarettes by total sales volume of each tobacco retailer are then retained. Across all tobacco retailers, a total of 86 cigarette varieties were obtained. Fig. 5 shows the merchant codes, commodity codes, sales quantities, and other information corresponding to the 86 screened cigarettes. In fig. 5, the first column shows the merchant code; the second column (time) gives the i-th day counting from June 1, and since the data cover the period from June 1 to November 30, i ranges over [0, 182]; the following 86 columns show the sales of the 86 cigarettes for the corresponding merchant on the i-th day. Only a portion of the merchant codes and the corresponding portion of the cigarette sales are shown schematically in fig. 5.

The following processing will be based on sales data for these 86 cigarettes. For the sales data of these 86 kinds of cigarettes, normalization processing was performed according to the above formula (1), thereby removing dimensions. The sales data subjected to the normalization process is shown in fig. 6.

Next, a feature extraction process is performed based on the normalized sales data. For example, various time-series features of cigarette sales can be extracted, mainly including: period-class features (number of peaks, period of peak intervals, etc.); statistical-class features (variance, standard deviation, mean, maximum, minimum, median, entropy, length of the longest continuous subsequence greater than/less than the mean, presence or absence of repeated minima/maxima, position of the last occurrence of the maximum/minimum, number of values greater than/less than the mean, etc.); frequency-domain features (Fourier transform frequency, phase, spectral centroid, variance, skewness, Fourier coefficients, kurtosis of the absolute Fourier transform spectrum, etc.); nonlinear features measuring the time series data (Fisher-Pearson normalized moment coefficient G1, SNR, Langevin model fitting coefficients, continuous wavelet transform of a Ricker wavelet, lagged (lag) autocorrelation coefficients); and linear features measuring the time series data (linear least squares regression coefficients, autoregressive model coefficients).

Then, the importance of each feature is calculated using a random forest algorithm. Fig. 7 shows an example of feature importances calculated by the random forest algorithm; values for 25 features are listed. In each row, the character string denotes the feature and the numeric value that follows is its importance. The leading numeric part of the string is a cigarette code, followed by a statistical feature of that cigarette's data; for example, "agg_linear_trend_attr_intercept" in the first row is the intercept of a linear regression, and "chunk_len" and "mean" are the corresponding parameters of that regression.
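Random-forest feature importance can be obtained as in the following sketch; the synthetic data are an assumption for illustration (only feature 0 carries the class signal), standing in for the real cigarette feature matrix:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in: 200 samples, 5 features, label driven by feature 0 only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# feature_importances_ sums to 1; larger values indicate more important features
for name, imp in zip([f"feat_{i}" for i in range(5)], forest.feature_importances_):
    print(f"{name}: {imp:.4f}")
```

On this toy data the importance of `feat_0` dominates, mirroring how fig. 7 ranks the real features.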

To determine the predetermined threshold used in screening features, a set of candidate thresholds may be determined and the predetermined threshold selected from among them. Specifically, a classification model (e.g., a support vector machine model) is first trained on all of the unscreened features, so as to obtain a coarse classification model. The value of the evaluation function under each candidate threshold is then calculated using the coarse classification model. The evaluation function may be the accuracy and/or the macro F1.

FIG. 8 shows a set of candidate thresholds together with the accuracy and macro F1 value corresponding to each. As shown in fig. 8, the candidate threshold starts at 0 and gradually increases to 0.000459. When the candidate threshold equals 0.000059, the corresponding accuracy reaches its maximum of 0.8424; when it equals 0.000149, the corresponding macro F1 reaches its maximum of 0.3930. Therefore, if accuracy is taken as the evaluation function, 0.000059 may be selected as the predetermined threshold; if macro F1 is used as the evaluation function, 0.000149 may be selected. It should be understood that the present disclosure is not limited thereto; for example, in one exemplary embodiment the evaluation function may be defined as an average or weighted average of the macro F1 and the accuracy. In practical applications, accuracy and macro F1 measure different aspects of the classification model, and which index to use depends on actual requirements. If the application emphasizes the classification effect over all samples, accuracy should be used as the evaluation index and 0.000059 may be selected as the predetermined threshold; if the application emphasizes the per-category average result, macro F1 should be used as the evaluation index and 0.000149 may be selected.

After the predetermined threshold is determined, the features may be screened against it, and the features whose importance values exceed the threshold are selected as the important features.
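The threshold-selection procedure described above can be sketched as follows. Here `evaluate` stands in for scoring the coarse classification model (accuracy or macro F1) on the features retained at each candidate threshold; the toy evaluator and importance values are assumptions for illustration:

```python
def select_threshold(importances, candidate_thresholds, evaluate):
    """Return the candidate threshold (and its score) whose retained-feature
    subset gives the highest evaluation-function value. `evaluate` maps a
    tuple of selected feature indices to a score."""
    best_t, best_score = None, float("-inf")
    for t in candidate_thresholds:
        selected = tuple(i for i, imp in enumerate(importances) if imp > t)
        score = evaluate(selected)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score

# toy evaluator: pretends the model scores best when only features 0 and 1 remain
imps = [0.5, 0.3, 0.0002, 0.0001]
evaluate = lambda sel: 1.0 if sel == (0, 1) else 0.5
print(select_threshold(imps, [0.0, 0.0001, 0.001], evaluate))  # -> (0.001, 1.0)
```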

Next, algorithm parameter tuning is performed. In the parameter tuning process, the accuracy of the logistic regression algorithm and of the support vector machine (SVM) algorithm in predicting tobacco-retailer classes is compared, and the hyperparameters of each algorithm are adjusted. For example, fig. 9 shows candidate hyperparameter values for the support vector machine algorithm. The kernel has two candidate values, linear and rbf; the candidate values of the hyperparameter gamma range from 10 to 10^-4; and the candidate values of the hyperparameter C (the penalty factor) range from 0.25 to 512.

For the combinations of candidate hyperparameters shown in FIG. 9, macro F1 may be selected as the evaluation value, because tobacco retailers associated with fishery users belong to a minority category. Using 5-fold cross validation, the best parameters are found to be {'kernel': 'linear', 'C': 0.25}; in the experimental results under these optimal hyperparameters, the macro F1 equals 0.3561 and the accuracy equals 0.8424.
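The grid search with 5-fold cross validation and macro F1 scoring might look like the following sketch. The synthetic imbalanced 4-class data and the exact parameter grid are assumptions standing in for the real retailer features and fig. 9:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# synthetic imbalanced 4-class data (the minority class mimics fishery-related retailers)
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=4, weights=[0.7, 0.15, 0.1, 0.05],
                           random_state=0)

param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.25, 1, 4, 16, 64, 256, 512],
}
# scoring="f1_macro" selects hyperparameters by macro F1, as in the text
search = GridSearchCV(SVC(), param_grid, scoring="f1_macro", cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```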

FIG. 10 shows the result of the SVM algorithm on the first fold of the 5-fold cross-validation. As shown in fig. 10, the first-fold result displays the evaluation for each category as well as the overall result. Here precision, recall and f1-score denote the precision, recall and F1 indices of the test result, and support denotes the number of test samples; accuracy denotes the overall accuracy, macro avg denotes the corresponding macro_P, macro_R and macro_F1 indices, and weighted avg denotes the same indices computed as a weighted average over the class sample counts. The first-fold result shows the classification results for the 4 categories separately; in terms of the per-category F1 index, the "irrelevant" category performs best, followed by "closely relevant", while the "slightly relevant" category performs worst. In terms of overall results, in this first fold the accuracy reaches 0.86 and the macro F1 reaches 0.41.
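A per-fold report of this kind can be produced with scikit-learn's `classification_report`; the toy label lists below are invented for illustration and do not reproduce the numbers in fig. 10:

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

# toy fold: true vs. predicted labels for the four relevance categories
y_true = ["irrelevant"] * 6 + ["closely"] * 2 + ["slightly"] + ["related"]
y_pred = ["irrelevant"] * 6 + ["closely", "irrelevant", "irrelevant", "related"]

# prints per-class precision/recall/f1-score/support plus accuracy,
# macro avg and weighted avg rows, in the same layout as fig. 10
print(classification_report(y_true, y_pred, zero_division=0))
print("accuracy:", accuracy_score(y_true, y_pred))
print("macro F1:", round(f1_score(y_true, y_pred, average="macro",
                                  zero_division=0), 4))
```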

In addition to the SVM algorithm, a logistic regression algorithm may be used for classification. FIG. 11 shows a set of candidate hyperparameter values for the logistic regression algorithm, where max_iter ranges from 100 to 1000 and C ranges from 1 to 512. As with the SVM algorithm above, selecting macro F1 as the evaluation value and using 5-fold cross validation, the best set of hyperparameters is obtained as {'C': 512, 'max_iter': 100}. Under this optimal set, the macro F1 equals 0.3561 and the accuracy equals 0.8424. FIG. 12 shows the first-fold result of the 5-fold cross-validation; as shown in fig. 12, each index has the same meaning as in fig. 10. In terms of overall indices the logistic regression result, with accuracy reaching 0.86 and macro F1 reaching 0.41, is very close to that of the SVM.

Since tobacco retailers associated with fishery users are a minority, the training samples can also be class-balanced. In some embodiments according to the present disclosure, the class balancing process is performed by assigning a weight coefficient to each training sample, where the weight coefficient of each training sample equals the total number of training samples divided by the number of training samples in the class to which that sample belongs. In other embodiments according to the present disclosure, the training samples may instead be directly oversampled or undersampled.
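The weighting rule just described (total sample count divided by the size of the sample's class) can be sketched as follows; the label values are illustrative:

```python
from collections import Counter

def sample_weights(labels):
    """Weight each training sample by total_count / count(class of sample),
    so that small classes (e.g. retailers associated with fishery users)
    contribute more during training."""
    counts = Counter(labels)
    total = len(labels)
    return [total / counts[lab] for lab in labels]

labels = ["irrelevant"] * 6 + ["closely"] * 2 + ["slightly"] * 2
print(sample_weights(labels))
# majority-class samples get weight 10/6, minority-class samples get weight 5
```

Weights of this form can be passed, for example, to scikit-learn's `fit(..., sample_weight=...)` parameter.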

With class balancing applied, using similar candidate hyperparameter values, 5-fold cross validation and macro F1 as the evaluation value, the optimal hyperparameters of the SVM algorithm are {'kernel': 'linear', 'C': 0.25}; in the experimental results under these optimal hyperparameters, the macro F1 equals 0.3595 and the accuracy equals 0.8424. The first-fold result of the 5-fold cross-validation is shown in FIG. 13.

Similarly, the optimal hyperparameters of the logistic regression algorithm are {'C': 1, 'max_iter': 200}; in the experimental results under these optimal hyperparameters, the macro F1 equals 0.3917 and the accuracy equals 0.8451. The first-fold result of the 5-fold cross-validation is shown in FIG. 14.

Fig. 15 shows the tuning results of the logistic regression and SVM algorithms. As shown in fig. 15, without class balancing the logistic regression algorithm has the highest overall accuracy; in practical applications, that model should be selected when the overall result is to be considered. With class balancing, the logistic regression algorithm has the highest macro F1; in practical applications, if emphasis is placed on tobacco retailers associated with fishery users, that model should be selected to classify tobacco retailers and identify those associated with fishery users.

Based on the above method, tobacco retailers in three cities A, B and C were classified. The experimental results for city A are shown in fig. 16: the highest accuracy reaches 0.8478 under the logistic regression algorithm and 0.8424 under the SVM algorithm. Without class balancing, the logistic regression algorithm has the highest overall accuracy and should be selected when the overall result is to be considered; with class balancing, the logistic regression algorithm has the highest macro F1 and should be selected when the emphasis is on identifying tobacco retailers associated with fishery users.

The experimental results for cities B and C are shown in fig. 17. As can be seen from FIG. 17, the highest accuracy of the fishery-demand classification reaches 0.89 for city B and 0.74 for city C. Thus, according to the classification method of the present disclosure, tobacco retailers relevant to fishery users can be effectively identified.

Fig. 18 shows a block diagram of a computing device, which is one example of a hardware device applicable to aspects of the present disclosure, according to an example embodiment of the present disclosure.

With reference to fig. 18, computing device 700 will now be described. Computing device 700 may be any machine configured to implement processing and/or computing, and may be, but is not limited to, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a personal digital assistant, a smart phone, an in-vehicle computer, or any combination thereof. The various aforementioned apparatus/servers/client devices may be implemented in whole or at least in part by computing device 700 or similar devices or systems.

Computing device 700 may include components connected to or in communication with bus 702, possibly via one or more interfaces. For example, computing device 700 may include a bus 702, one or more processors 704, one or more input devices 706, and one or more output devices 708. The one or more processors 704 may be any type of processor and may include, but are not limited to, one or more general purpose processors and/or one or more special purpose processors (e.g., dedicated processing chips). Input device 706 may be any type of device capable of inputting information to a computing device and may include, but is not limited to, a mouse, a keyboard, a touch screen, a microphone, and/or a remote controller. Output device 708 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. Computing device 700 may also include or be connected with non-transitory storage device 710, which may be any storage device that is non-transitory and that enables data storage, and which may include, but is not limited to, disk drives, optical storage devices, solid-state memory, floppy disks, hard disks, magnetic tape, or any other magnetic medium, optical disks or any other optical medium, ROM (read only memory), RAM (random access memory), cache memory, and/or any memory chip or cartridge, and/or any other medium from which a computer can read data, instructions, and/or code. The non-transitory storage device 710 may be detached from the interface. The non-transitory storage device 710 may have data/instructions/code for implementing the above-described methods and steps. Computing device 700 may also include a communication device 712. 
The communication device 712 may be any type of device or system capable of communicating with internal apparatus and/or with a network and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset, such as a Bluetooth device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.

The bus 702 may include, but is not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.

Computing device 700 may also include a working memory 714, which may be any type of working memory capable of storing instructions and/or data that facilitate the operation of processor 704 and may include, but is not limited to, random access memory and/or read only memory devices.

Software components may be located in the working memory 714, including, but not limited to, an operating system 716, one or more application programs 718, drivers, and/or other data and code. Instructions for implementing the above-described methods and steps may be included in the one or more application programs 718, and the aforementioned modules/units/components of the various apparatus/server/client devices may be implemented by the processor 704 reading and executing the instructions of the one or more application programs 718.

It should also be appreciated that variations may be made in accordance with specific needs. For example, customized hardware might also be used and/or particular components might be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. In addition, connections to other computing devices, such as network input/output devices and the like, may be employed. For example, some or all of the disclosed methods and apparatus may be implemented, with logic and algorithms according to the present disclosure, by programming hardware (e.g., programmable logic circuitry including Field Programmable Gate Arrays (FPGAs) and/or Programmable Logic Arrays (PLAs)) in an assembly language or a hardware programming language (e.g., VERILOG, VHDL, C++).

In addition, the embodiment according to the present disclosure may further include the following technical solutions:

(1) a merchant classification method, comprising:

acquiring sales data of a merchant, wherein the sales data comprises training samples;

preprocessing the sales data;

performing a feature extraction process based on the pre-processed sales data, thereby obtaining a plurality of features;

performing a feature selection process on the plurality of features to select an important feature from the plurality of features;

training a classification model based on the important features of the training samples; and

classifying the merchant based on the classification model.

(2) The merchant classification method according to claim 1, wherein the sales data comprises at least one of: sales date, sales quantity, sales price, merchant name, commodity name.

(3) The merchant classifying method according to claim 1, wherein the sales data includes sales data for a predetermined period of time.

(4) The merchant classification method according to claim 3, wherein the sales data comprises: the number of sales and the name of the goods,

the preprocessing comprises the following steps:

calculating the total sales volume of each commodity of each merchant;

and selecting, according to the total sales, the sales data of the top K commodities by sales for each merchant, wherein K is a positive integer.

(5) The merchant classifying method according to claim 1, wherein the preprocessing includes a process of removing a dimension of the sales data.

(6) The merchant classification method according to claim 5, wherein the preprocessing comprises a normalization process.

(7) The merchant classification method according to claim 5, wherein the normalization process is performed using the following formula:

x' = (x - x_min) / (x_max - x_min),

wherein x is the sales data, x_min is the minimum value of the sales data, x_max is the maximum value of the sales data, and x' is the normalized sales data after the normalization process.

(8) The merchant classification method according to claim 1, wherein the features include timing features.

(9) The merchant classification method according to claim 8, wherein the timing features include: period class characteristics, statistical class characteristics, frequency domain class characteristics, nonlinear characteristics, and linear characteristics.

(10) The merchant classification method according to claim 9, wherein the cycle-like features include: number of peaks, period of peak intervals.

(11) The merchant classification method according to claim 9, wherein the statistical class features include: variance, standard deviation, mean, extremum, median, entropy, length of longest continuous subsequence greater than mean, length of longest continuous subsequence less than mean, repeatability of extremum, location where extremum last occurred, number of sales data greater than mean, and number of sales data less than mean.

(12) The merchant classification method according to claim 9, wherein the frequency domain class features comprise: frequency, phase, spectral centroid, variance, skewness, coefficients of the fourier transform, and kurtosis of the absolute fourier transform spectrum.

(13) The merchant classification method according to claim 9, wherein the non-linear features comprise: Fisher-Pearson normalized moment coefficients, signal-to-noise ratios, coefficients fitted to a Langevin model, continuous wavelet variations of a Ricker wavelet, autocorrelation coefficients of a lag.

(14) The merchant classification method according to claim 9, wherein the linear features comprise: linear least squares regression coefficients, auto-regression model coefficients.

(15) The merchant classification method according to claim 1, wherein the feature selection process includes:

calculating the importance of each characteristic by using a random forest algorithm;

comparing the importance to a predetermined threshold;

selecting the feature having the importance greater than the threshold as the important feature.

(16) The merchant classifying method according to claim 15, further comprising:

a plurality of candidate threshold values are determined,

calculating the values of the evaluation functions of the rough classification models under the candidate threshold values;

selecting a candidate threshold corresponding to a maximum value of the evaluation function as the predetermined threshold.

(17) The merchant classification method according to claim 16, wherein the evaluation function includes an accuracy rate and a macro F1 (macro-F1).

(18) The merchant classifying method according to claim 16, further comprising: training the coarse classification model based on all of the plurality of features of the training sample.

(19) The merchant classification method of claim 1, wherein the training a classification model comprises:

a grid-search (grid-search) is used to determine hyper-parameters of the classification model.

(20) The merchant classification method according to claim 1, wherein the classification model comprises: logistic regression models and support vector machine models.

(21) The merchant classification method according to claim 1, wherein the method further comprises:

performing category balancing processing.

(22) The merchant classification method according to claim 21, wherein the category balancing process comprises:

and performing oversampling processing or undersampling processing on the training samples.

(23) The merchant classification method according to claim 21, wherein the category balancing process comprises:

and setting a weight coefficient for the training sample.

(24) The merchant classification method of claim 23, wherein the weight coefficient for each training sample is equal to the total number of training samples divided by the number of training samples in the category to which the training sample belongs.

(25) A merchant classifying device comprising:

a processor; and

a memory having stored therein a set of instructions that,

when the processor executes the instructions, the processor is configured to:

acquiring sales data of a merchant, wherein the sales data comprises training samples;

preprocessing the sales data;

performing a feature extraction process based on the pre-processed sales data, thereby obtaining a plurality of features;

performing a feature selection process on the plurality of features to select an important feature from the plurality of features;

training a classification model based on the important features of the training samples; and

classifying the merchant based on the classification model.

(26) A computer-readable non-transitory storage medium having instructions stored thereon which, when executed by a processor, configure the processor to:

acquiring sales data of a merchant, wherein the sales data comprises training samples;

preprocessing the sales data;

performing a feature extraction process based on the pre-processed sales data, thereby obtaining a plurality of features;

performing a feature selection process on the plurality of features to select an important feature from the plurality of features;

training a classification model based on the important features of the training samples; and

classifying the merchant based on the classification model.

(27) A computer program comprising a series of instructions which, when executed by a processor, configure the processor to:

acquiring sales data of a merchant, wherein the sales data comprises training samples;

preprocessing the sales data;

performing a feature extraction process based on the pre-processed sales data, thereby obtaining a plurality of features;

performing a feature selection process on the plurality of features to select an important feature from the plurality of features;

training a classification model based on the important features of the training samples; and

classifying the merchant based on the classification model.

As used herein, the word "exemplary" means "serving as an example, instance, or illustration," and not as a "model" that is to be replicated accurately. Any implementation exemplarily described herein is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, the disclosure is not limited by any expressed or implied theory presented in the preceding technical field, background, brief summary or the detailed description.

As used herein, the term "substantially" is intended to encompass any minor variation resulting from design or manufacturing imperfections, device or component tolerances, environmental influences, and/or other factors. The word "substantially" also allows for differences from a perfect or ideal situation due to parasitic effects, noise, and other practical considerations that may exist in a practical implementation.

It will be further understood that the terms "comprises/comprising," "includes" and/or "including," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

In the present disclosure, the term "providing" is used broadly to encompass all ways of obtaining an object, and thus "providing an object" includes, but is not limited to, "purchasing," "preparing/manufacturing," "arranging/setting," "installing/assembling," and/or "ordering" the object, and the like.

Those skilled in the art will appreciate that the boundaries between the above described operations are merely illustrative. Multiple operations may be combined into a single operation, a single operation may be distributed among additional operations, and operations may be performed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments. However, other modifications, variations, and alternatives are also possible. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. The various embodiments disclosed herein may be combined in any combination without departing from the spirit and scope of the present disclosure. It will also be appreciated by those skilled in the art that various modifications may be made to the embodiments without departing from the scope and spirit of the disclosure. The scope of the present disclosure is defined by the appended claims.
