Method and system for improving multi-label classification accuracy
1. A method for improving the accuracy of multi-label classification, characterized by comprising the following steps:
S1, acquiring online information of each user to be classified, among the social media users to be classified, on a plurality of monitored self-media operation platforms;
S2, acquiring self-media operation data of each user to be classified according to the online information of that user on the plurality of monitored self-media operation platforms;
and S3, dividing the social media users to be classified into a plurality of label classifications based on the self-media operation data of the users to be classified.
2. The method of claim 1, wherein before S1, the method further comprises the following steps:
S0, determining a subject area;
determining, as monitored self-media operation platforms, a plurality of self-media operation platforms whose theme labels belong to the subject area;
and determining, as the users to be classified, individuals for whom the number of monitored self-media operation platforms used reaches a monitoring-number threshold.
3. The method of claim 1, wherein the online information in S1 comprises an online time;
S1 specifically comprises the following step:
for each user to be classified, acquiring the online times between the user to be classified and each monitored self-media operation platform within a monitoring history period;
S2 specifically comprises the following step:
for each user to be classified, generating the self-media operation data of the user according to the chronological order of the online times between the user and each monitored self-media operation platform and the position information of each monitored self-media operation platform.
4. The method of claim 1, wherein S3 specifically comprises the following step:
clustering the users to be classified among the social media users to be classified based on the self-media operation data of each user to be classified, to obtain a plurality of label classifications.
5. The method of claim 1, further comprising the following step after S3:
S4, for each label classification, classifying the users to be classified in the label classification based on their online frequency information with respect to each monitored self-media operation platform, to obtain at least one sub-classification corresponding to the label classification;
wherein the online frequency information comprises an online frequency or an online count.
6. The method of claim 5, wherein S4 specifically comprises the following steps:
S41, for each user to be classified, acquiring the online times between the user to be classified and each monitored self-media operation platform;
S42, for each monitored self-media operation platform, counting the online times between the user to be classified and that platform that fall within a monitoring history period, to obtain an online count;
and S43, clustering the users to be classified in the label classification based on their online counts with each monitored self-media operation platform, to obtain a plurality of sub-classifications.
7. The method of claim 6, wherein S43 specifically comprises the following steps:
S431, for each user to be classified in the label classification, establishing a corresponding frequency feature vector based on the online frequency information between that user and each monitored self-media operation platform;
and S432, clustering all users to be classified in the label classification based on their frequency feature vectors by using a monitored clustering algorithm.
8. The method for improving the accuracy of multi-label classification of claim 7, wherein before S431, the method further comprises the following step:
S430, performing interference-removal processing on the online counts between the users to be classified in the label classification and each monitored self-media operation platform;
S431 specifically comprises the following step:
establishing the frequency feature vector corresponding to each user to be classified in the label classification based on the result of the interference-removal processing.
9. The method of claim 8, wherein S430 specifically comprises the following steps:
taking the logarithm of the online count between each user to be classified in the label classification and each monitored self-media operation platform;
and, after the logarithm is taken, resetting to zero every log-transformed online count for a monitored self-media operation platform that is smaller than the monitoring threshold.
10. A system for improving multi-label classification accuracy, comprising:
an acquisition unit, configured to acquire online time information of each user to be classified, among the social media users to be classified, on a plurality of monitored self-media operation platforms;
a self-media operation computing unit, configured to acquire self-media operation data of each user to be classified according to the online time information of that user on the plurality of monitored self-media operation platforms;
and a classification unit, configured to divide the social media users to be classified into a plurality of label classifications based on the self-media operation data of the users to be classified.
Background
Social media, as a product of the internet era, has become an indispensable part of people's lives, and user accounts, as publishers and propagators of information, hold a large amount of valuable data. Identifying and classifying massive numbers of accounts in a targeted manner therefore reduces the manpower and time required to build a traditional account management system, while allowing real-time information and developments in a given field to be acquired more comprehensively and effectively.
The prior art generally determines a user's self-media operation data from the online information recorded at registration. However, a user may use multiple self-media platforms simultaneously after registering, so it is difficult to obtain complete self-media operation data. Existing methods for acquiring self-media operation data therefore suffer from the technical problem of poor comprehensiveness.
Disclosure of Invention
To solve the above technical problem, the present application provides a method and a system for improving the accuracy of multi-label classification. They determine a user's self-media operation data from the user's usage of self-media operation platforms, perform label classification on social media users based on that data, and mine the characteristics of the social media users from the classification result, thereby making information mining more comprehensive.
A method for improving the accuracy of multi-label classification comprises the following steps:
S1, acquiring online information of each user to be classified, among the social media users to be classified, on a plurality of monitored self-media operation platforms;
S2, acquiring self-media operation data of each user to be classified according to the online information of that user on the plurality of monitored self-media operation platforms;
and S3, dividing the social media users to be classified into a plurality of label classifications based on the self-media operation data of the users to be classified.
Preferably, before S1, the method further comprises the following steps:
S0, determining a subject area;
determining, as monitored self-media operation platforms, a plurality of self-media operation platforms whose theme labels belong to the subject area;
and determining, as the users to be classified, individuals for whom the number of monitored self-media operation platforms used reaches a monitoring-number threshold.
Further, the online information in S1 comprises an online time;
S1 specifically comprises the following step:
for each user to be classified, acquiring the online times between the user to be classified and each monitored self-media operation platform within a monitoring history period;
S2 specifically comprises the following step:
for each user to be classified, generating the self-media operation data of the user according to the chronological order of the online times between the user and each monitored self-media operation platform and the position information of each monitored self-media operation platform.
Preferably, S3 specifically comprises the following step:
clustering the users to be classified among the social media users to be classified based on the self-media operation data of each user to be classified, to obtain a plurality of label classifications.
Further, after S3, the method further comprises the following step:
S4, for each label classification, classifying the users to be classified in the label classification based on their online frequency information with respect to each monitored self-media operation platform, to obtain at least one sub-classification corresponding to the label classification;
wherein the online frequency information comprises an online frequency or an online count.
Further, S4 specifically comprises the following steps:
S41, for each user to be classified, acquiring the online times between the user to be classified and each monitored self-media operation platform;
S42, for each monitored self-media operation platform, counting the online times between the user to be classified and that platform that fall within a monitoring history period, to obtain an online count;
and S43, clustering the users to be classified in the label classification based on their online counts with each monitored self-media operation platform, to obtain a plurality of sub-classifications.
Preferably, S43 specifically comprises the following steps:
S431, for each user to be classified in the label classification, establishing a corresponding frequency feature vector based on the online frequency information between that user and each monitored self-media operation platform;
and S432, clustering all users to be classified in the label classification based on their frequency feature vectors by using a monitored clustering algorithm.
Preferably, before S431, the method further comprises the following step:
S430, performing interference-removal processing on the online counts between the users to be classified in the label classification and each monitored self-media operation platform;
S431 specifically comprises the following step:
establishing the frequency feature vector corresponding to each user to be classified in the label classification based on the result of the interference-removal processing.
Further, S430 specifically comprises the following steps:
taking the logarithm of the online count between each user to be classified in the label classification and each monitored self-media operation platform;
and, after the logarithm is taken, resetting to zero every log-transformed online count for a monitored self-media operation platform that is smaller than the monitoring threshold.
The invention also provides a system for improving multi-label classification accuracy, comprising:
an acquisition unit, configured to acquire online time information of each user to be classified, among the social media users to be classified, on a plurality of monitored self-media operation platforms;
a self-media operation computing unit, configured to acquire self-media operation data of each user to be classified according to the online time information of that user on the plurality of monitored self-media operation platforms;
and a classification unit, configured to divide the social media users to be classified into a plurality of label classifications based on the self-media operation data of the users to be classified.
According to the method and the system, the online information of each user to be classified, among the social media users to be classified, on a plurality of monitored self-media operation platforms is acquired to determine that user's self-media operation data. Because the self-media operation platforms can continuously capture users' online information, the self-media operation data of a user to be classified can be determined from the user's usage of the self-media operation platforms and the theme label of each platform. User-group characteristics can therefore be mined from more comprehensive online information, yielding more comprehensive and complete mining results.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
Example 1
A method for improving the accuracy of multi-label classification comprises the following steps:
S1, acquiring online information of each user to be classified, among the social media users to be classified, on a plurality of monitored self-media operation platforms;
S2, acquiring self-media operation data of each user to be classified according to the online information of that user on the plurality of monitored self-media operation platforms;
and S3, dividing the social media users to be classified into a plurality of label classifications based on the self-media operation data of the users to be classified.
Example 2
A method for improving the accuracy of multi-label classification comprises the following steps:
S0, determining a subject area;
determining, as monitored self-media operation platforms, a plurality of self-media operation platforms whose theme labels belong to the subject area;
determining, as the users to be classified, individuals for whom the number of monitored self-media operation platforms used reaches a monitoring-number threshold;
S1, for each user to be classified, acquiring the online times between the user to be classified and each monitored self-media operation platform within a monitoring history period;
S2, for each user to be classified, generating the self-media operation data of the user according to the chronological order of the online times between the user and each monitored self-media operation platform and the position information of each monitored self-media operation platform;
and S3, dividing the social media users to be classified into a plurality of label classifications based on the self-media operation data of the users to be classified.
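As an illustration only, S0 to S2 of this example might be sketched in Python as follows. The input shapes (theme labels per platform, platforms used per user, a list of (platform, timestamp) online events per user, and an (x, y) position per platform), the function names, and the reading of the monitoring-number threshold as a minimum number of monitored platforms used are all assumptions, not details given above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input shapes (not specified by the source):
#   platform_labels: platform id -> set of theme labels
#   usage:           user id -> set of platform ids the user has used
#   online_events:   list of (platform id, online timestamp) for one user
#   positions:       platform id -> (x, y) position information


def select_monitored_platforms(platform_labels: Dict[str, Set[str]],
                               subject_area: str) -> List[str]:
    """S0: keep platforms whose theme labels include the subject area."""
    return sorted(p for p, labels in platform_labels.items() if subject_area in labels)


def select_users(usage: Dict[str, Set[str]], monitored: Iterable[str],
                 threshold: int) -> List[str]:
    """S0: keep users who use at least `threshold` monitored platforms (assumed reading)."""
    monitored = set(monitored)
    return sorted(u for u, used in usage.items() if len(used & monitored) >= threshold)


@dataclass(frozen=True)
class OperationRecord:
    timestamp: float
    platform: str
    position: Tuple[float, float]


def build_operation_data(online_events: List[Tuple[str, float]],
                         positions: Dict[str, Tuple[float, float]],
                         window_start: float, window_end: float) -> List[OperationRecord]:
    """S1 + S2: keep events inside the monitoring history period on monitored
    platforms, order them by online time, and attach each platform's position."""
    in_window = [(p, t) for p, t in online_events
                 if window_start <= t <= window_end and p in positions]
    in_window.sort(key=lambda e: e[1])
    return [OperationRecord(t, p, positions[p]) for p, t in in_window]


if __name__ == "__main__":
    labels = {"A": {"finance"}, "B": {"finance", "tech"}, "C": {"sports"}}
    platforms = select_monitored_platforms(labels, "finance")
    print(platforms)  # ['A', 'B']
    print(select_users({"u1": {"A", "B"}, "u2": {"A", "C"}}, platforms, threshold=2))  # ['u1']
    data = build_operation_data([("B", 5.0), ("A", 2.0), ("C", 1.0)],
                                {"A": (0.0, 0.0), "B": (1.0, 1.0)}, 0.0, 10.0)
    print([r.platform for r in data])  # ['A', 'B']
```

Each resulting list of `OperationRecord` entries is the time-ordered sequence that S3 then divides into label classifications.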
Example 3
A method for improving the accuracy of multi-label classification comprises the following steps:
S0, determining a subject area;
determining, as monitored self-media operation platforms, a plurality of self-media operation platforms whose theme labels belong to the subject area;
determining, as the users to be classified, individuals for whom the number of monitored self-media operation platforms used reaches a monitoring-number threshold;
S1, for each user to be classified, acquiring the online times between the user to be classified and each monitored self-media operation platform within a monitoring history period;
S2, for each user to be classified, generating the self-media operation data of the user according to the chronological order of the online times between the user and each monitored self-media operation platform and the position information of each monitored self-media operation platform;
S3, clustering the users to be classified among the social media users to be classified based on the self-media operation data of each user to be classified, to obtain a plurality of label classifications;
and S4, for each label classification, classifying the users to be classified in the label classification based on their online frequency information with respect to each monitored self-media operation platform, to obtain at least one sub-classification corresponding to the label classification;
wherein the online frequency information comprises an online frequency or an online count.
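The text above does not name the clustering algorithm used in S3. As one possible sketch, assuming each user's self-media operation data is reduced to the time-ordered list of platform ids produced in S2 (position information is ignored here for brevity), users can be grouped by single-linkage clustering over a normalized edit (Levenshtein) distance. This technique is substituted for illustration only, and `max_distance` is a hypothetical tuning parameter.

```python
from itertools import combinations
from typing import Dict, List


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two platform-id sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def cluster_sequences(sequences: Dict[str, List[str]],
                      max_distance: float) -> List[List[str]]:
    """Single-linkage clustering: two users are linked when their normalized
    edit distance is <= max_distance (hypothetical parameter); linked users
    end up, transitively, in the same label classification."""
    users = sorted(sequences)
    parent = {u: u for u in users}  # union-find forest

    def find(u: str) -> str:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u, v in combinations(users, 2):
        a, b = sequences[u], sequences[v]
        longest = max(len(a), len(b)) or 1  # avoid dividing by zero
        if edit_distance(a, b) / longest <= max_distance:
            parent[find(u)] = find(v)

    groups: Dict[str, List[str]] = {}
    for u in users:
        groups.setdefault(find(u), []).append(u)
    return sorted(groups.values())


if __name__ == "__main__":
    seqs = {"u1": ["A", "B", "C"], "u2": ["A", "B", "D"], "u3": ["X", "Y"]}
    print(cluster_sequences(seqs, max_distance=0.4))  # [['u1', 'u2'], ['u3']]
```

Single linkage does not require the number of label classifications to be fixed in advance, which suits the open-ended "plurality of label classifications" in S3; a different distance or algorithm could equally be used.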
S4 specifically comprises the following steps:
S41, for each user to be classified, acquiring the online times between the user to be classified and each monitored self-media operation platform;
S42, for each monitored self-media operation platform, counting the online times between the user to be classified and that platform that fall within a monitoring history period, to obtain an online count;
and S43, clustering the users to be classified in the label classification based on their online counts with each monitored self-media operation platform, to obtain a plurality of sub-classifications.
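A minimal sketch of S41 and S42, assuming online events are (platform id, timestamp) pairs and the monitoring history period is a closed interval; the function names and input shapes are hypothetical. Dividing each count by the length of the period would give the online frequency instead of the online count.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[str, float]  # hypothetical shape: (platform id, online timestamp)


def online_counts(events: List[Event], platforms: List[str],
                  start: float, end: float) -> List[int]:
    """S41-S42: per monitored platform, count the online times that fall
    inside the monitoring history period [start, end]."""
    monitored = set(platforms)
    counts = Counter(p for p, t in events if p in monitored and start <= t <= end)
    return [counts[p] for p in platforms]  # Counter yields 0 for unseen platforms


def count_matrix(events_by_user: Dict[str, List[Event]], platforms: List[str],
                 start: float, end: float) -> Dict[str, List[int]]:
    """One row of online counts per user, columns in the order of `platforms`."""
    return {u: online_counts(ev, platforms, start, end)
            for u, ev in events_by_user.items()}


if __name__ == "__main__":
    events = {"u1": [("A", 1.0), ("A", 3.0), ("B", 4.0), ("A", 50.0)],
              "u2": [("C", 2.0)]}
    print(count_matrix(events, ["A", "B", "C"], 0.0, 10.0))
    # {'u1': [2, 1, 0], 'u2': [0, 0, 1]}
```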
S43 specifically comprises the following steps:
S431, for each user to be classified in the label classification, establishing a corresponding frequency feature vector based on the online frequency information between that user and each monitored self-media operation platform;
and S432, clustering all users to be classified in the label classification based on their frequency feature vectors by using a monitored clustering algorithm.
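For S431 and S432, the sketch below builds one frequency feature vector per user (the per-platform online counts from S42) and splits a label classification with plain Lloyd's k-means. K-means is substituted here for illustration only, since the text does not specify the clustering algorithm; the number of sub-classifications `k` and the deterministic first-k initialization are assumptions.

```python
import math
from typing import Dict, List


def kmeans(vectors: List[List[float]], k: int, iterations: int = 100) -> List[int]:
    """Plain Lloyd's k-means; returns a cluster index for each input vector.
    Initial centroids are the first k vectors (deterministic, for illustration)."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]
    assignment: List[int] = []
    for _ in range(iterations):
        # Assign every vector to its nearest centroid (Euclidean distance).
        new_assignment = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                          for v in vectors]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def sub_classify(count_rows: Dict[str, List[int]], k: int) -> List[List[str]]:
    """S431-S432 sketch: one frequency feature vector per user in a label
    classification, then k-means to split it into k sub-classifications."""
    users = sorted(count_rows)
    vectors = [[float(x) for x in count_rows[u]] for u in users]
    labels = kmeans(vectors, k)
    groups: Dict[int, List[str]] = {}
    for user, label in zip(users, labels):
        groups.setdefault(label, []).append(user)
    return sorted(groups.values())


if __name__ == "__main__":
    rows = {"u1": [10, 0], "u2": [9, 1], "u3": [0, 10], "u4": [1, 9]}
    print(sub_classify(rows, k=2))  # [['u1', 'u2'], ['u3', 'u4']]
```

A library implementation with a better initialization (for example k-means++) would usually be preferred in practice; the fixed initialization here only keeps the sketch reproducible.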
Example 4
On the basis of Example 3, before S431, the method further comprises the following step:
S430, performing interference-removal processing on the online counts between the users to be classified in the label classification and each monitored self-media operation platform;
S430 specifically comprises the following steps:
taking the logarithm of the online count between each user to be classified in the label classification and each monitored self-media operation platform;
and, after the logarithm is taken, resetting to zero every log-transformed online count for a monitored self-media operation platform that is smaller than the monitoring threshold.
S431 specifically comprises the following step:
establishing the frequency feature vector corresponding to each user to be classified in the label classification based on the result of the interference-removal processing.
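The interference-removal processing of S430 can be sketched as below. `math.log1p` (that is, log(1 + n)) is used as an assumption so that a zero count maps to zero; the text only states that the counts are logarithmized. `threshold` plays the role of the monitoring threshold, and its value is left to the caller; the function name is hypothetical.

```python
import math
from typing import Dict, List


def remove_interference(count_rows: Dict[str, List[int]],
                        threshold: float) -> Dict[str, List[float]]:
    """S430 sketch: log-transform every online count, then reset to zero any
    transformed value below `threshold` (the monitoring threshold).
    log1p, i.e. log(1 + n), is an assumption so that a zero count stays zero."""
    cleaned = {}
    for user, row in count_rows.items():
        logged = [math.log1p(n) for n in row]
        cleaned[user] = [x if x >= threshold else 0.0 for x in logged]
    return cleaned


if __name__ == "__main__":
    rows = {"u1": [0, 1, 99], "u2": [5, 0, 0]}
    print(remove_interference(rows, threshold=1.0))
```

The logarithm compresses very heavy users so they do not dominate the distance computations of S432, and the zeroing step discards sporadic visits to a platform that would otherwise act as noise in the frequency feature vector.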
Example 5
A system for improving the accuracy of multi-label classification is provided, comprising:
an acquisition unit, configured to acquire online time information of each user to be classified, among the social media users to be classified, on a plurality of monitored self-media operation platforms;
a self-media operation computing unit, configured to acquire self-media operation data of each user to be classified according to the online time information of that user on the plurality of monitored self-media operation platforms;
and a classification unit, configured to divide the social media users to be classified into a plurality of label classifications based on the self-media operation data of the users to be classified.
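The three units of this system could be wired together as in the following sketch, in which each unit wraps an injected callable so that different acquisition, computing, and clustering strategies can be plugged in. The class names, the callable interfaces, the event shape, and the trivial `group_identical_sequences` clustering used in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, float]  # hypothetical shape: (platform id, online timestamp)


@dataclass
class AcquisitionUnit:
    """Acquires online time information per user via an injected fetcher."""
    fetch: Callable[[str], List[Event]]

    def acquire(self, users: List[str]) -> Dict[str, List[Event]]:
        return {u: self.fetch(u) for u in users}


@dataclass
class OperationComputingUnit:
    """Turns each user's online time information into self-media operation data."""
    compute: Callable[[List[Event]], list]

    def run(self, online: Dict[str, List[Event]]) -> Dict[str, list]:
        return {u: self.compute(events) for u, events in online.items()}


@dataclass
class ClassificationUnit:
    """Divides users into label classifications from their operation data."""
    cluster: Callable[[Dict[str, list]], List[List[str]]]

    def run(self, data: Dict[str, list]) -> List[List[str]]:
        return self.cluster(data)


def classify(users: List[str], acquisition: AcquisitionUnit,
             computing: OperationComputingUnit,
             classification: ClassificationUnit) -> List[List[str]]:
    """Acquisition -> self-media operation data -> label classifications."""
    return classification.run(computing.run(acquisition.acquire(users)))


def time_ordered_platforms(events: List[Event]) -> List[str]:
    """Hypothetical operation data: platform ids sorted by online time."""
    return [platform for platform, _ in sorted(events, key=lambda e: e[1])]


def group_identical_sequences(data: Dict[str, list]) -> List[List[str]]:
    """Trivial stand-in clustering: users with identical sequences share a label."""
    groups: Dict[tuple, List[str]] = {}
    for user in sorted(data):
        groups.setdefault(tuple(data[user]), []).append(user)
    return sorted(groups.values())


if __name__ == "__main__":
    events = {"u1": [("B", 2.0), ("A", 1.0)],
              "u2": [("A", 3.0), ("B", 4.0)],
              "u3": [("C", 1.0)]}
    result = classify(["u1", "u2", "u3"],
                      AcquisitionUnit(fetch=lambda u: events[u]),
                      OperationComputingUnit(compute=time_ordered_platforms),
                      ClassificationUnit(cluster=group_identical_sequences))
    print(result)  # [['u1', 'u2'], ['u3']]
```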
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.