Employee community discovery method, system, computer and readable storage medium
1. A method for discovering a community of employees, comprising:
an interaction sequence obtaining step, namely obtaining session interaction data of a target employee, encrypting the session interaction data, dividing the session interaction data into a plurality of session units according to a preset segmentation unit, and outputting the session units as an employee interaction sequence;
a sequence model obtaining step, namely constructing and training a sequence model through a sequence modeling method based on the employee interaction sequence;
a sequence vector obtaining step, namely obtaining a sequence vector of the employee interaction sequence based on the sequence model;
and an employee community discovery step, namely clustering the sequence vectors by using a clustering algorithm to complete community discovery.
2. The employee community discovery method of claim 1, wherein the sequence model is a Word2Vec model, and the sequence model obtaining step further comprises:
a data preprocessing step, namely converting the employee interaction sequence into an employee interaction corpus;
a dictionary building step, namely traversing words in the employee interaction corpus to build a dictionary and counting word frequency;
a Huffman tree construction step, wherein a Huffman tree is constructed based on the word frequency;
and a model training step, namely training a CBOW model or a Skip-Gram model in the Word2Vec model by using the Huffman tree.
3. The employee community discovery method according to claim 1 or 2, further comprising:
and a community employee output step, namely outputting the employees clustered into the community based on the query request of the user.
4. The employee community discovery method according to claim 1 or 2, further comprising:
a model iteration step, namely obtaining incremental data of the session interaction data for a preset incremental period, and iteratively training the sequence model based on the incremental data;
and an incremental community discovery step, namely obtaining a sequence vector according to the sequence model and performing a clustering operation.
5. An employee community discovery system, comprising:
the interaction sequence acquisition module is used for acquiring session interaction data of a target employee, encrypting the session interaction data, dividing the session interaction data into a plurality of session units according to a preset segmentation unit, and outputting the session units as an employee interaction sequence;
the sequence model acquisition module is used for constructing and training a sequence model through a sequence modeling method based on the employee interaction sequence;
the sequence vector acquisition module is used for acquiring a sequence vector of the employee interaction sequence based on the sequence model;
and the employee community discovery module is used for clustering the sequence vectors by utilizing a clustering algorithm to complete community discovery.
6. The system of claim 5, wherein the sequence model is a Word2Vec model, and the sequence model acquisition module further comprises:
the data preprocessing module is used for converting the employee interaction sequence into an employee interaction corpus;
the dictionary building module is used for traversing words in the employee interaction corpus to build a dictionary and counting word frequency;
the Huffman tree construction module is used for constructing a Huffman tree based on the word frequency;
and the model training module is used for training a CBOW model or a Skip-Gram model in the Word2Vec model by utilizing the Huffman tree.
7. The system for community discovery of employees of claim 5 or 6, further comprising:
and the community employee output module is used for outputting the employees clustered into the community based on the query request of the user.
8. The system for community discovery of employees of claim 5 or 6, further comprising:
the model iteration module is used for acquiring incremental data of the session interaction data for a preset incremental period and iteratively training the sequence model based on the incremental data;
and the incremental community discovery module is used for acquiring the sequence vector according to the sequence model and performing a clustering operation.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the employee community discovery method of any one of claims 1 to 4.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method for employee community discovery according to any one of claims 1 to 4.
Background
Social networks originated from online social networking, which began with email. Today, social networking includes, but is not limited to, platforms such as Enterprise WeChat, QQ, Weibo, OA systems and DingTalk. The core of a social network is the users who participate in it and the relationships among them. From the perspective of enterprise management, community discovery (Community Detection) among employees helps an enterprise find the community structure of its employees in the social network, and thereby understand how employees cooperate and how teams are composed. This promotes cooperation among employees, improves the operating efficiency of the enterprise, and lays a foundation for building an enterprise community.
For example, patent document CN104077723A discloses a social network recommendation system and method, including a community discovery method in which user information, following lists and follower lists of a social network are used to calculate the link strength between users, thereby implementing community discovery. However, when such a method is applied inside an enterprise, especially a large one, obtaining the social account information of every employee is impractical and ill-suited to discovering employee communities, and the resulting communities tend to be inaccurate. In addition, the interaction data that employees generate at work is large in volume, so storing it is costly, and computing directly on it is costly as well.
Disclosure of Invention
The embodiment of the application provides a method, a system, computer equipment and a computer readable storage medium for discovering employee communities in an enterprise, so that more accurate and more complete community discovery can be realized.
In a first aspect, an embodiment of the present application provides a method for discovering an employee community, including:
an interaction sequence obtaining step, namely obtaining session interaction data of a target employee, encrypting the session interaction data, dividing the session interaction data into a plurality of session units according to a preset segmentation unit, and outputting the session units as an employee interaction sequence in chronological order;
a sequence model obtaining step, namely constructing and training a sequence model through a sequence modeling method based on the employee interaction sequence; specifically, the sequence model comprises a Word2Vec model and/or a GloVe model;
a sequence vector obtaining step, namely obtaining a sequence vector of the employee interaction sequence based on the sequence model;
and an employee community discovery step, namely clustering the sequence vectors by using a clustering algorithm to complete community discovery. Specifically, the clustering algorithm is a K-Means algorithm.
Based on the above steps, the embodiment of the application performs employee community discovery using the session interactions of enterprise employees as the underlying data. Because the session interaction data covers both cooperation among employees in group chats and interactions between individuals, employee communities can be discovered more accurately and more completely. In addition, the session interaction data is processed into employee interaction sequences, and the relationships among employees are represented as vectors by the sequence model, which greatly reduces the cost of storing the original data. Moreover, the method places no limit on the data volume; a larger data volume even improves model training, so the computational cost is reduced and massive data does not drive the computational cost up.
In some embodiments, the sequence model is a Word2Vec model, and the sequence model obtaining step further includes:
a data preprocessing step, namely converting the employee interaction sequence into an employee interaction corpus, wherein the commas in each line of data in the employee interaction sequence are replaced by spaces;
a dictionary building step, namely traversing words in the employee interaction corpus to build a dictionary and counting word frequency;
a Huffman tree construction step, wherein a Huffman tree is constructed based on the word frequency;
and a model training step, namely training a CBOW (Continuous Bag-of-Words) model or a Skip-Gram model in the Word2Vec model by using the Huffman tree.
Based on the above steps, the sequence model of the embodiment of the application is built and trained on the employee interaction sequence, so that the employee interaction sequence can conveniently be represented as vectors through the sequence model, which reduces the data storage cost.
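By way of a non-limiting illustration, the data preprocessing step and the two training objectives named above may be sketched as follows. The function names, the comma-separated input line and the window size are assumptions introduced for this example only; the application does not prescribe them. The sketch shows only how training samples are drawn from a corpus line, not the training itself.

```python
def to_corpus_line(sequence_line):
    """Replace commas with spaces and split the line into tokens (anonymous IDs)."""
    return sequence_line.replace(",", " ").split()


def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs: Skip-Gram predicts each context word from the center."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


def cbow_samples(tokens, window=2):
    """Yield (context_list, center) samples: CBOW predicts the center from its context."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # a one-token line has no context to learn from
            yield context, center
```

For example, `to_corpus_line("a1,b2,c3")` gives the tokens `a1`, `b2`, `c3`, from which either sample generator can be driven.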
In some embodiments, the method for discovering the employee community further includes:
and a community employee output step, namely outputting the employees clustered into the community based on the query request of the user.
In some embodiments, the method for discovering the employee community further includes:
a model iteration step, namely obtaining incremental data of the session interaction data for a preset incremental period, and iteratively training the sequence model based on the incremental data;
and an incremental community discovery step, namely obtaining a sequence vector according to the sequence model and performing a clustering operation.
Based on the above steps, dynamic community discovery is achieved through model iteration on incremental data. Even when data grows quickly, the communities are updated rapidly and dynamically through model iteration alone, so that the latest employee communities can be consulted when staffing a project, which improves cooperation among employees and work efficiency.
In a second aspect, an embodiment of the present application provides an employee community discovery system, including:
the interaction sequence acquisition module is used for acquiring session interaction data of a target employee, encrypting the session interaction data, dividing the session interaction data into a plurality of session units according to a preset segmentation unit, and outputting the session units as an employee interaction sequence in chronological order;
the sequence model acquisition module is used for constructing and training a sequence model through a sequence modeling method based on the employee interaction sequence; specifically, the sequence model comprises a Word2Vec model and/or a GloVe model;
the sequence vector acquisition module is used for acquiring a sequence vector of the employee interaction sequence based on the sequence model;
and the employee community discovery module is used for clustering the sequence vectors by utilizing a clustering algorithm to complete community discovery. Specifically, the clustering algorithm is a K-Means algorithm.
Based on the above structure, the embodiment of the application performs employee community discovery using the session interactions of enterprise employees as the underlying data. Because the session interaction data covers both cooperation among employees in group chats and interactions between individuals, employee communities can be discovered more accurately and more completely. In addition, the session interaction data is processed into employee interaction sequences, and the relationships among employees are represented as vectors by the sequence model, which greatly reduces the cost of storing the original data. Moreover, the system places no limit on the data volume; a larger data volume even improves model training, so the computational cost is reduced and massive data does not drive the computational cost up.
In some embodiments, the sequence model is a Word2Vec model, and the sequence model acquisition module further includes:
the data preprocessing module is used for converting the employee interaction sequence into an employee interaction corpus, wherein the commas in each line of data in the employee interaction sequence are replaced by spaces;
the dictionary building module is used for traversing words in the employee interaction corpus to build a dictionary and counting word frequency;
the Huffman tree construction module is used for constructing a Huffman tree based on the word frequency;
and the model training module is used for training a CBOW model or a Skip-Gram model in the Word2Vec model by utilizing the Huffman tree.
Based on the above structure, the sequence model of the embodiment of the application is built and trained on the employee interaction sequence, so that the employee interaction sequence can conveniently be represented as vectors through the sequence model, which reduces the data storage cost.
In some embodiments, the above-mentioned employee community discovery system further comprises:
and the community employee output module is used for outputting the employees clustered into the community based on the query request of the user.
In some embodiments, the above-mentioned employee community discovery system further comprises:
the model iteration module is used for acquiring incremental data of the session interaction data for a preset incremental period and iteratively training the sequence model based on the incremental data;
and the incremental community discovery module is used for acquiring the sequence vector according to the sequence model and performing a clustering operation.
Based on the above structure, dynamic community discovery is achieved through model iteration on incremental data. Even when data grows quickly, the communities are updated rapidly and dynamically through model iteration alone, so that the latest employee communities can be consulted when staffing a project, which improves cooperation among employees and work efficiency.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the employee community discovery method according to the first aspect is implemented.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the employee community discovery method according to the first aspect.
Compared with the related art, the employee community discovery method, system, computer device and computer-readable storage medium provided by the embodiments of the application relate in particular to marketing intelligence technology. Community discovery is performed on encrypted data, which effectively protects data security and privacy. By representing the session interaction data as vectors, the embodiments address the high storage and computational costs caused by large data volumes and rapid data growth in current big data environments, and effectively reduce both costs.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow diagram of a method for employee community discovery in accordance with an embodiment of the present application;
FIG. 2 is a preferred flow diagram of a method for employee community discovery in accordance with an embodiment of the present application;
FIG. 3 is a flow diagram of a method for employee community discovery in accordance with a preferred embodiment of the present application;
FIG. 4 is a diagram of conversational interaction data, according to a preferred embodiment of the present application;
FIG. 5 is a schematic diagram of an employee interaction sequence in accordance with a preferred embodiment of the present application;
FIG. 6 is a schematic diagram of a sequence vector according to a preferred embodiment of the present application;
FIG. 7 is a diagram illustrating employee community clustering results in accordance with a preferred embodiment of the present application;
FIG. 8 is a schematic diagram illustrating the principle of the steps of the employee community discovery method according to the preferred embodiment of the present application;
FIG. 9 is a block diagram of a system for community discovery of employees according to an embodiment of the present application;
FIG. 10 is a block diagram of a preferred architecture of a system for community discovery of employees according to an embodiment of the present application.
Wherein:
1. an interaction sequence acquisition module; 2. a sequence model acquisition module; 3. a sequence vector acquisition module;
4. an employee community discovery module; 5. a community employee output module; 6. a model iteration module;
7. an incremental community discovery module; 201. a data preprocessing module; 202. a dictionary building module;
203. a Huffman tree construction module; 204. a model training module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words in this application do not denote a limitation of quantity, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof in this application are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The words "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
Implicit interaction: implicit interactions are extracted from the "mention" ("@") and "forward" behaviors of users. When one user mentions or forwards another, the probability that an association is established between the two users increases.
The employee community discovery method of the present application is intended to discover employee communities from massive employee session interaction data, to cope with large data volumes and large data increments while reducing data storage and computational costs, and to take into account that employee session interaction data involves data security and privacy.
The embodiment provides a method for discovering a community of employees. Fig. 1 is a flowchart of an employee community discovery method according to an embodiment of the present application, and as shown in fig. 1, the flowchart includes the following steps:
an interaction sequence obtaining step S1, obtaining session interaction data of a target employee, encrypting the session interaction data, dividing the session interaction data into a plurality of session units according to a preset segmentation unit, and outputting the session units as an employee interaction sequence in chronological order. Specifically, encrypting the session interaction data means replacing employee names with anonymous IDs, for example and without limitation by means of the MD5 (Message-Digest Algorithm 5) algorithm. MD5 is a one-way hash algorithm rather than a reversible cipher, which helps protect privacy and data security. Optionally, the preset segmentation unit may be a day, week, month, year, and the like, and the embodiment of the application supports dividing the single-chat data and the group-chat data in the session interaction data into session units using the same or different preset segmentation units. On this basis, the underlying data of the embodiment covers both cooperation among employees in group chats and interactions between individuals, which helps discover employee communities more accurately and more completely.
A sequence model obtaining step S2, constructing and training a sequence model through a sequence modeling method based on the employee interaction sequence; specifically, the sequence model includes a Word2Vec model and/or a GloVe model.
A sequence vector obtaining step S3, obtaining a sequence vector of the employee interaction sequence based on the sequence model; the sequence vector obtained in this step may represent each employee as a 32-dimensional dense vector.
An employee community discovery step S4, clustering the sequence vectors by using a clustering algorithm to complete community discovery. Optionally, the clustering algorithm is the K-Means algorithm, although other clustering algorithms may also be used.
A community employee output step S5, outputting the employees clustered into a community based on a query request of the user. Notably, to protect data security and privacy, employees are output as anonymous IDs, while downstream services can convert anonymous IDs back to names by means of an employee dictionary repository.
Based on the above steps, the embodiment of the application performs employee community discovery using the session interactions of enterprise employees as the underlying data. Because the session interaction data covers both cooperation among employees in group chats and interactions between individuals, employee communities can be discovered more accurately and more completely. In addition, the session interaction data is processed into employee interaction sequences, and the relationships among employees are represented as vectors by the sequence model, which greatly reduces the cost of storing the original data. Moreover, the method places no limit on the data volume; a larger data volume even improves model training, so the computational cost is reduced and massive data does not drive the computational cost up.
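As a minimal sketch of step S1 only, and under stated assumptions, the anonymization, segmentation and ordering may look like the following. The record layout `(iso_timestamp, conversation_id, sender_name)`, the function names, and the in-memory `id_to_name` dictionary standing in for the employee dictionary repository are all hypothetical; the preset segmentation unit is fixed to one day here for brevity.

```python
import hashlib
from collections import defaultdict
from datetime import datetime


def anonymize(name):
    """Map an employee name to an anonymous ID using its MD5 digest."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()


def build_interaction_sequences(messages):
    """Split messages into per-day session units and emit time-ordered ID sequences.

    `messages` is an iterable of (iso_timestamp, conversation_id, sender_name).
    Returns (sequences, id_to_name).
    """
    id_to_name = {}             # hypothetical stand-in for the employee dictionary repository
    units = defaultdict(list)   # (day, conversation) -> [(timestamp, anonymous_id)]
    for ts, conversation, sender in messages:
        when = datetime.fromisoformat(ts)
        anon = anonymize(sender)
        id_to_name[anon] = sender
        units[(when.date(), conversation)].append((when, anon))

    sequences = []
    for key in sorted(units):                 # session units in day order
        ordered = sorted(units[key])          # messages in time order within a unit
        sequences.append([anon for _, anon in ordered])
    return sequences, id_to_name
```

Each returned sequence corresponds to one session unit, that is, one conversation on one day. A downstream service holding `id_to_name` can translate the anonymous IDs back to names, as described for step S5.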
In some embodiments, the sequence model is a Word2Vec model, and the sequence model obtaining step S2 further includes:
a data preprocessing step S201, converting the employee interaction sequence into an employee interaction corpus, wherein the commas in each line of data in the employee interaction sequence are replaced by spaces.
A dictionary building step S202, traversing the text in the employee interaction corpus, finding all words that appear to build a dictionary, and counting the frequency of occurrence of each word to obtain word frequency statistics.
A Huffman tree construction step S203, constructing a Huffman tree based on the word frequency. In the embodiment of the application, the Huffman tree replaces the neurons of the hidden layer and the output layer: the leaf nodes of the Huffman tree play the role of output-layer neurons, the number of leaf nodes equals the size of the vocabulary, and the internal nodes play the role of hidden-layer neurons.
A model training step S204, training a CBOW model or a Skip-Gram model in the Word2Vec model by using the Huffman tree obtained in step S203.
Based on the above steps, and in view of the generality and efficiency of the Word2Vec model, the embodiment builds the sequence model with Word2Vec and trains it on the employee interaction sequence, so that the employee interaction sequence can conveniently be represented as vectors through the sequence model, which reduces the data storage cost.
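For illustration, the dictionary building step S202 and the Huffman tree construction step S203 may be sketched with the standard library alone. The function names are assumptions made for this example, and tokens are assumed to be strings, since tuples are used internally to mark internal nodes. The sketch stops at the tree: it returns the Huffman code of each word and does not train any vectors.

```python
import heapq
from collections import Counter
from itertools import count


def build_vocab(corpus):
    """Count how often each token appears across all corpus lines (S202)."""
    return Counter(token for line in corpus for token in line)


def huffman_codes(freqs):
    """Build a Huffman tree from word frequencies (S203) and return {token: bit string}.

    Frequent tokens end up closer to the root and therefore get shorter codes.
    """
    tiebreak = count()  # unique counter so heap entries never compare nodes directly
    heap = [(weight, next(tiebreak), token) for token, weight in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf node, one per vocabulary entry
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes
```

With a vocabulary of V words the tree has V leaf nodes and V - 1 internal nodes, which is why a prediction along a root-to-leaf path touches far fewer nodes than a full output layer of size V.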
The embodiment also provides a method for discovering the employee community. Fig. 2 is a preferred flowchart of the employee community discovery method according to an embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps in addition to the steps of the above embodiment:
a model iteration step S6, obtaining incremental data of the session interaction data for a preset incremental period, and iteratively training the sequence model based on the incremental data;
and an incremental community discovery step S7, obtaining sequence vectors according to the sequence model and performing the clustering operation.
Based on the above steps, dynamic community discovery is achieved through model iteration on incremental data. Even when data grows quickly, the communities are updated rapidly and dynamically through model iteration alone, so that the latest employee communities can be consulted when staffing a project, which improves cooperation among employees and work efficiency.
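The control flow of steps S6 and S7 can be outlined as below. This is a scheduling sketch only: `update_model` and `recluster` are hypothetical callbacks standing in for the iterative training of the sequence model and for the vector extraction plus clustering, and the record layout `(day, sequence)` is likewise assumed for the example.

```python
from datetime import date


def split_into_periods(records, start, period_days):
    """Bucket (day, sequence) records into consecutive incremental periods."""
    buckets = {}
    for day, sequence in records:
        index = (day - start).days // period_days
        buckets.setdefault(index, []).append(sequence)
    return [buckets[i] for i in sorted(buckets)]


def run_incremental(records, start, period_days, update_model, recluster):
    """For each incremental period, iterate the model on the new data, then re-cluster."""
    results = []
    for increment in split_into_periods(records, start, period_days):
        update_model(increment)        # step S6: iterative training on the increment
        results.append(recluster())    # step S7: new sequence vectors and clustering
    return results


# Toy usage: the "model" is just a list of all sequences seen so far.
seen = []
records = [
    (date(2021, 3, 1), ["a", "b"]),
    (date(2021, 3, 3), ["b", "c"]),
    (date(2021, 3, 9), ["a", "c"]),
]
sizes = run_incremental(records, date(2021, 3, 1), 7, seen.extend, lambda: len(seen))
```

Only the data of the current period is passed to the model at each iteration, so earlier data does not need to be reprocessed.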
The embodiments of the present application are described and illustrated below by means of preferred embodiments.
Fig. 3 is a flowchart of an employee community discovery method according to a preferred embodiment of the present application, and as shown in fig. 3, the employee community discovery method includes the following steps:
Employee interaction sequence generation S301: employee session interaction data is obtained, as shown in fig. 4, which includes implicit interactions between employees. Employee names are replaced with anonymous IDs using the MD5 algorithm, the single-chat data and group-chat data of the session interaction data are divided into session units using one day as the preset segmentation unit, and the session units are then arranged in chronological order to generate the employee interaction sequence. A specific example of the employee interaction sequence is shown in fig. 5.
Employee interaction sequence modeling S302: sequence modeling is performed on the employee interaction sequence data. The modeling can be done with a sequence modeling method such as Word2Vec or GloVe; this embodiment preferably uses a Word2Vec model, with the following specific steps:
firstly, the employee interaction sequence data is processed into corpus form; specifically, the commas in each line of data are replaced with spaces.
Then, a dictionary is constructed and word frequencies are counted. Specifically, all of the text is traversed once to find every word that appears and to count the frequency of occurrence of each word.
Subsequently, a tree structure is constructed: a Huffman tree is built according to the frequency of occurrence of each word.
Finally, the intermediate vectors and the word vectors are trained based on the Huffman tree, which completes the training of the CBOW or Skip-Gram model of the Word2Vec model.
Employee digital representation S303: and completing vector representation of the staff based on the sequence model to obtain a sequence vector of staff interaction sequences, and specifically representing each staff according to a dense vector of which the staff is represented into 32 bits, as shown in fig. 6.
K-Means clustering S304: community discovery is completed using a clustering algorithm. Based on the 32-dimensional dense vectors obtained in step S303, the K-Means algorithm is tuned and validated on those vectors. Specifically, as shown in fig. 7, the numbers in the figure denote the communities partitioned by the clustering algorithm.
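As an illustration of step S304, the following is a minimal standard-library sketch of K-Means (Lloyd's algorithm). It is not the implementation used in the application; the function name `kmeans`, the iteration limit, and the seeded random initialization are assumptions, and 2-dimensional toy vectors stand in for the 32-dimensional employee vectors.

```python
import random

def kmeans(vectors, k, iterations=50, seed=0):
    """Cluster equal-length vectors into k groups with Lloyd's algorithm.

    Returns (labels, centroids); labels[i] is the cluster of vectors[i].
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [-1] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = kmeans(vectors, k=2)
```

The returned labels play the role of the community numbers shown in fig. 7; the number of clusters `k` is one of the parameters that would be tuned during debugging and verification.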
Output of employees clustered into communities S305: the employees clustered into a community are output according to the query request.
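One possible form of the query in step S305 is sketched below: given the clustering labels, return every employee in the community that contains a queried employee. The function names and the shape of the query are assumptions; the application does not specify the query interface.

```python
from collections import defaultdict

def group_by_community(employee_ids, labels):
    """Map each community label to the anonymous employee IDs assigned to it."""
    communities = defaultdict(list)
    for employee_id, label in zip(employee_ids, labels):
        communities[label].append(employee_id)
    return dict(communities)

def members_of_same_community(employee_ids, labels, query_id):
    """Answer a query: list everyone in the community containing query_id."""
    label_of = dict(zip(employee_ids, labels))
    if query_id not in label_of:
        return []  # unknown employee: nothing to output
    return group_by_community(employee_ids, labels)[label_of[query_id]]

ids = ["e1", "e2", "e3"]
community_labels = [0, 1, 0]
same_as_e3 = members_of_same_community(ids, community_labels, "e3")
```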
Data increment and model iteration S306: using the incremental data of each day or of a fixed number of days, the model training process and the subsequent digital representation of the employees are started directly in model iteration mode, and the clustering operation is then performed by the clustering algorithm to complete the final community discovery. Referring to fig. 8, the whole process in step S306 continuously processes incremental data, iterates the model, rebuilds the digital representation of the employees, and completes the clustering operation. Model iteration based on incremental enterprise employee session data is thereby achieved, and dynamic community discovery is achieved as the data grows.
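The control flow of step S306 can be sketched as follows. The sketch only shows how session units might be batched into increment periods and fed through the iterate, re-embed, re-cluster loop; `update_model`, `embed`, and `cluster` are hypothetical hooks supplied by the caller and stand in for the Word2Vec iteration, the vector lookup, and K-Means respectively.

```python
from datetime import timedelta

def increment_batches(units_by_day, period_days=1):
    """Yield (period_start, sequences) for each non-empty increment period.

    `units_by_day` maps a date to the list of session-unit sequences
    recorded on that day (assumed layout).
    """
    if not units_by_day:
        return
    step = timedelta(days=period_days)
    days = sorted(units_by_day)
    start, batch = days[0], []
    for day in days:
        while day >= start + step:  # this day belongs to a later period
            if batch:
                yield start, batch
                batch = []
            start += step
        batch.extend(units_by_day[day])
    if batch:
        yield start, batch

def run_iterations(units_by_day, period_days, update_model, embed, cluster):
    """For each increment: iterate the model, re-embed employees, re-cluster."""
    model, results = None, []
    for start, sequences in increment_batches(units_by_day, period_days):
        model = update_model(model, sequences)  # hypothetical training hook
        vectors = embed(model)                  # hypothetical: ID -> vector
        results.append((start, cluster(vectors)))
    return results
```

Passing the previous model back into `update_model` is what distinguishes iteration from retraining from scratch: each period only needs to process its own increment.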
It should be noted that the steps illustrated in the above flow or in the flowcharts of the figures may be performed in a computer system, such as by a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from the one shown here.
This embodiment also provides an employee community discovery system, which is used to implement the above embodiments and preferred embodiments; what has already been described is not repeated. As used hereinafter, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. While the system described in the embodiments below is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
Fig. 9 is a block diagram of a structure of an employee community discovery system according to an embodiment of the present application, and as shown in fig. 9, the system includes:
the interaction sequence acquisition module 1 is used for acquiring session interaction data of a target employee, encrypting the session interaction data, dividing the session interaction data into a plurality of session units according to a preset segmentation unit, and outputting the session units as an employee interaction sequence in chronological order. Specifically, the encryption processing of the session interaction data converts employee names into anonymous IDs, for example and without limitation using the MD5 algorithm, so as to protect privacy and data security. Optionally, the preset segmentation unit may be a day, week, month, year, or the like, and the embodiment of the application supports segmenting the single chat data and the group chat data in the session interaction data into session units using either the same preset segmentation unit or different preset segmentation units.
The sequence model acquisition module 2 is used for constructing and training a sequence model by a sequence modeling method based on the employee interaction sequence; specifically, the sequence model includes a Word2Vec model and/or a GloVe model. Optionally, the sequence model is a Word2Vec model, and the sequence model acquisition module 2 further includes: a data preprocessing module 201, used for converting the employee interaction sequence into an employee interaction corpus, wherein the commas in each line of data in the employee interaction sequence are replaced with spaces; a dictionary building module 202, used for traversing the words in the employee interaction corpus to build a dictionary and count word frequencies; a Huffman tree construction module 203, used for constructing a Huffman tree based on the word frequencies; and a model training module 204, used for training a CBOW model or a Skip-Gram model of Word2Vec by using the Huffman tree. On this basis, the sequence model of the embodiment of the application is built and trained from the employee interaction sequence, so that the employee interaction sequence can conveniently be represented as vectors through the sequence model, which reduces the data storage cost.
The sequence vector acquisition module 3 is used for obtaining the sequence vectors of the employee interaction sequence based on the sequence model; the sequence vector obtained by this module may represent each employee as a 32-dimensional dense vector.
The employee community discovery module 4 is used for clustering the sequence vectors by using a clustering algorithm to complete community discovery. Specifically, the clustering algorithm is the K-Means algorithm.
The community employee output module 5 is used for outputting the employees clustered into a community based on a query request of the user.
Based on this structure, the embodiment of the application performs employee community discovery using the session interactions of enterprise employees as the basic data. Because the session interaction data covers both the cooperation relationships among employees in group chats and the interactions between individuals, the employee community discovery is more accurate and complete. In addition, the session interaction data is processed into employee interaction sequences, and the employee relationships are represented as vectors based on the sequence model, which greatly reduces the storage cost compared with the original data. Moreover, the method places no limit on the data volume, and the larger the data volume, the better the model training effect, so that the computation cost is reduced and the problem of massive data driving up the computation cost is avoided.
Fig. 10 is a block diagram of a preferred structure of an employee community discovery system according to an embodiment of the present application, and as shown in fig. 10, the system includes all the modules shown in fig. 9, and further includes:
the model iteration module 6, used for acquiring the incremental session interaction data of a preset increment period and iteratively training the sequence model based on the incremental data;
and the incremental community discovery module 7, used for obtaining the sequence vectors according to the sequence model and performing the clustering operation.
Based on this structure, dynamic community discovery is achieved through model iteration driven by data increments: when data grows quickly, the communities are updated rapidly and dynamically by iterating the model directly. This makes it easier to refer to the latest employee communities when staffing a project during its preparation, which improves the employees' ability to cooperate and their work efficiency.
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
In addition, the employee community discovery method described in connection with figs. 1-2 in the embodiments of the present application may be implemented by a computer device. The computer device may include a processor and a memory storing computer program instructions. In particular, the processor may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
The memory may include, among other things, mass storage for data or instructions. By way of example, and not limitation, the memory may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory may include removable or non-removable (or fixed) media, where appropriate. The memory may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory is a non-volatile memory. In particular embodiments, the memory includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode Dynamic Random-Access Memory (FPMDRAM), an Extended Data Out Dynamic Random-Access Memory (EDODRAM), a Synchronous Dynamic Random-Access Memory (SDRAM), and the like.
The memory may be used to store or cache various data files for processing and/or communication use, as well as possibly computer program instructions for execution by the processor.
The processor may be configured to read and execute the computer program instructions stored in the memory to implement any one of the employee community discovery methods in the above embodiments.
In addition, in combination with the employee community discovery method in the foregoing embodiments, an embodiment of the present application may provide a computer-readable storage medium for its implementation. The computer-readable storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement any of the employee community discovery methods of the above embodiments.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several embodiments of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.