Robot personalized emotion interaction device and method based on a multi-modal knowledge graph
1. A robot personalized emotion interaction device based on a multi-modal knowledge graph, characterized by comprising: a multi-modal knowledge graph, a user registration module, a state recognition module, an interaction implementation module, and a knowledge updating module.
The multi-modal knowledge graph is used to store the information of each registered user, including a user ID, facial features, a voice preference, and emotion strategies.
The user registration module is used for registering a new user and comprises the sub-modules: user ID generation, facial feature acquisition, voice preference acquisition, and emotion strategy initialization.
The state recognition module is used for identifying the identity and emotion of the current user and comprises the sub-modules: face recognition and emotion recognition.
The interaction implementation module is used for generating a personalized emotion interaction scheme for the current user and issuing it to the robot for execution, and comprises the sub-modules: personalized speech synthesis and emotion interaction.
The knowledge updating module is used for updating the information of registered users in the multi-modal knowledge graph and comprises the sub-modules: user feedback collection, emotion strategy updating, and voice preference updating.
2. The robot personalized emotion interaction device based on a multi-modal knowledge graph according to claim 1, wherein the emotion strategy of a user comprises a feedback action and an answer sentence corresponding to each preset emotion.
3. The robot personalized emotion interaction device based on a multi-modal knowledge graph according to claim 2, wherein in the user registration module, the emotion strategy initialization is as follows: an initialized emotion strategy is generated using a randomization method.
4. The robot personalized emotion interaction device based on a multi-modal knowledge graph according to claim 2, wherein in the interaction implementation module, the personalized speech synthesis is as follows: the voice preference of the current user stored in the multi-modal knowledge graph is used to synthesize speech in the corresponding style when interacting with the current user by voice.
5. The robot personalized emotion interaction device based on a multi-modal knowledge graph according to claim 2, wherein in the interaction implementation module, the emotion interaction is as follows: the emotion of the current user recognized by the state recognition module is used to query the multi-modal knowledge graph for the emotion strategy of the current user under that emotion, so that the robot executes the feedback action and broadcasts the answer sentence using speech synthesized in the user's preferred voice style.
6. The robot personalized emotion interaction device based on a multi-modal knowledge graph according to claim 2, wherein in the knowledge updating module, the user feedback comprises satisfaction or dissatisfaction; the emotion strategy updating is as follows: for a feedback action or answer sentence the user is dissatisfied with, a new feedback action or answer sentence is randomly generated as a replacement and the multi-modal knowledge graph is updated; the voice preference updating is as follows: for a voice style the user is dissatisfied with, the user reselects a preset voice style as a replacement and the multi-modal knowledge graph is updated.
7. A robot personalized emotion interaction method based on a multi-modal knowledge graph, using the device of claim 1, characterized by comprising the following steps:
(1) The user registration module generates a user ID for a new user, collects the facial features and voice preference of the new user, generates an initialized emotion strategy for the new user, and finally records the user ID, facial features, voice preference, and emotion strategy of the new user into the multi-modal knowledge graph.
(2) The state recognition module calls face recognition to obtain the facial features of the current user and matches them against the facial features of registered users stored in the multi-modal knowledge graph; if no registered user is matched, the current user is prompted to register first and the method returns to step (1); if a registered user is matched, the user ID of the current user is queried and the method proceeds to step (3).
(3) The state recognition module calls emotion recognition to try to recognize the emotion of the current user; if a known emotion is recognized, the emotion recognition result and the user ID are transmitted to the interaction implementation module and the method proceeds to step (4); if no known emotion is recognized, this step is repeated until a known emotion is recognized.
(4) The interaction implementation module queries the multi-modal knowledge graph according to the emotion of the current user recognized in step (3) to obtain the emotion strategy of the current user corresponding to that emotion, and simultaneously queries the multi-modal knowledge graph according to the user ID to obtain the voice preference of the current user; next, the emotion interaction issues the feedback action in the emotion strategy to the robot for execution; then, the emotion interaction transmits the answer sentence in the emotion strategy to the personalized speech synthesis; and the personalized speech synthesis synthesizes speech from the answer sentence and the voice preference and sends it to the robot for broadcasting.
(5) The knowledge updating module collects the user's feedback on the personalized interaction through user feedback collection, including the user's feedback on the emotion interaction result and on the current voice style; the emotion strategy updating updates the emotion strategy of the user for the corresponding emotion in the multi-modal knowledge graph according to the user's feedback on the emotion interaction, including updating the feedback action and the answer sentence; the voice preference updating updates the user's voice preference stored in the multi-modal knowledge graph according to the current user's feedback on the current voice style.
Background
The home service robot is a service robot that works autonomously or semi-autonomously in a home environment and can provide services such as companionship, entertainment, education, home assistance, and cleaning to people. Enabling the home service robot to interact with people more intelligently is an important problem that urgently needs to be solved in this technical field, and an important aspect of intelligent interaction is that each member of a family can obtain a personalized emotional interaction experience.
Existing home service robot products and published research on home service robots do not specifically address or pay attention to the personalized emotional interaction experience of each member of a family.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention aims to provide a home service robot personalized emotion interaction device and method based on a multi-modal knowledge graph. The invention provides personalized emotional interaction for each user in a family by using the knowledge stored in the knowledge graph.
The purpose of the invention is realized by the following technical scheme: a robot personalized emotion interaction device based on a multi-modal knowledge graph comprises: a multi-modal knowledge graph, a user registration module, a state recognition module, an interaction implementation module, and a knowledge updating module.
The multi-modal knowledge graph is used to store the information of each registered user, including a user ID, facial features, a voice preference, and emotion strategies.
The user registration module is used for registering a new user and comprises the sub-modules: user ID generation, facial feature acquisition, voice preference acquisition, and emotion strategy initialization.
The state recognition module is used for identifying the identity and emotion of the current user and comprises the sub-modules: face recognition and emotion recognition.
The interaction implementation module is used for generating a personalized emotion interaction scheme for the current user and issuing it to the robot for execution, and comprises the sub-modules: personalized speech synthesis and emotion interaction.
The knowledge updating module is used for updating the information of registered users in the multi-modal knowledge graph and comprises the sub-modules: user feedback collection, emotion strategy updating, and voice preference updating.
Further, the emotion strategy of a user comprises a feedback action and an answer sentence corresponding to each preset emotion.
Further, in the user registration module, the emotion strategy initialization is as follows: an initialized emotion strategy is generated using a randomization method.
Further, in the interaction implementation module, the personalized speech synthesis is as follows: the voice preference of the current user stored in the multi-modal knowledge graph is used to synthesize speech in the corresponding style when interacting with the current user by voice.
Further, in the interaction implementation module, the emotion interaction is as follows: the emotion of the current user recognized by the state recognition module is used to query the multi-modal knowledge graph for the emotion strategy of the current user under that emotion, so that the robot executes the feedback action and broadcasts the answer sentence using speech synthesized in the user's preferred voice style.
Further, in the knowledge updating module, the user feedback comprises satisfaction or dissatisfaction; the emotion strategy updating is as follows: for a feedback action or answer sentence the user is dissatisfied with, a new feedback action or answer sentence is randomly generated as a replacement and the multi-modal knowledge graph is updated; the voice preference updating is as follows: for a voice style the user is dissatisfied with, the user reselects a preset voice style as a replacement and the multi-modal knowledge graph is updated.
A robot personalized emotion interaction method based on the multi-modal knowledge graph and on the above device comprises the following steps:
(1) The user registration module generates a user ID for a new user, collects the facial features and voice preference of the new user, generates an initialized emotion strategy for the new user, and finally records the user ID, facial features, voice preference, and emotion strategy of the new user into the multi-modal knowledge graph.
(2) The state recognition module calls face recognition to obtain the facial features of the current user and matches them against the facial features of registered users stored in the multi-modal knowledge graph; if no registered user is matched, the current user is prompted to register first and the method returns to step (1); if a registered user is matched, the user ID of the current user is queried and the method proceeds to step (3).
(3) The state recognition module calls emotion recognition to try to recognize the emotion of the current user; if a known emotion is recognized, the emotion recognition result and the user ID are transmitted to the interaction implementation module and the method proceeds to step (4); if no known emotion is recognized, this step is repeated until a known emotion is recognized.
(4) The interaction implementation module queries the multi-modal knowledge graph according to the emotion of the current user recognized in step (3) to obtain the emotion strategy of the current user corresponding to that emotion, and simultaneously queries the multi-modal knowledge graph according to the user ID to obtain the voice preference of the current user; next, the emotion interaction issues the feedback action in the emotion strategy to the robot for execution; then, the emotion interaction transmits the answer sentence in the emotion strategy to the personalized speech synthesis; and the personalized speech synthesis synthesizes speech from the answer sentence and the voice preference and sends it to the robot for broadcasting.
(5) The knowledge updating module collects the user's feedback on the personalized interaction through user feedback collection, including the user's feedback on the emotion interaction result and on the current voice style; the emotion strategy updating updates the emotion strategy of the user for the corresponding emotion in the multi-modal knowledge graph according to the user's feedback on the emotion interaction, including updating the feedback action and the answer sentence; the voice preference updating updates the user's voice preference stored in the multi-modal knowledge graph according to the current user's feedback on the current voice style.
Compared with the prior art, the invention has the following beneficial effects: by using a multi-modal knowledge graph to represent and store the personalized information of each user, the home service robot can provide a personalized emotional interaction experience for each member of a family, making the interaction between the home service robot and people more intelligent.
Drawings
FIG. 1 is a schematic diagram of the module architecture of the robot personalized emotion interaction device based on a multi-modal knowledge graph;
FIG. 2 is a diagram of the knowledge structure of the multi-modal knowledge graph.
Detailed Description
The technical solution proposed by the present invention is specifically described below with reference to the accompanying drawings and examples.
As shown in FIG. 1, the home service robot personalized emotion interaction device based on the multi-modal knowledge graph comprises five main parts, namely the multi-modal knowledge graph, the user registration module, the state recognition module, the interaction implementation module, and the knowledge updating module.
As shown in FIG. 2, the multi-modal knowledge graph stores the user ID, facial features, voice preference, and emotion strategies of each registered user. The user ID is the unique identifier of a user and is used to distinguish different users. The facial features are extracted using a face recognition algorithm and are used to identify the current user. The voice preference is one of several preset voice styles, for example "boy voice", and is used to synthesize speech in the corresponding style when interacting with the user by voice. Each emotion strategy specifically includes an emotion of the user and the feedback action and answer sentence corresponding to that emotion, for example the emotion "depressed", the corresponding feedback action "raise the right hand", and the answer sentence "let's work hard together"; the emotion strategies provide the emotion interaction schemes for the interaction implementation module.
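For concreteness, the following is a minimal, purely illustrative sketch of how one user node of the multi-modal knowledge graph could be held in memory; the field names and values are assumptions for illustration and are not prescribed by the embodiment.

```python
# Illustrative sketch only: one user node of the multi-modal knowledge graph,
# represented here as a plain dictionary; all field names and values are assumed.
example_graph = {
    "user-0001": {                                  # user ID: unique identifier
        "face_features": [0.12, -0.08, 0.33],       # embedding extracted at registration (truncated)
        "voice_preference": "boy voice",            # one of the preset voice styles
        "emotion_strategies": {                     # one strategy per preset emotion
            "depressed": {
                "feedback_action": "raise the right hand",
                "answer_sentence": "cheer up",
            },
        },
    },
}
```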
The user registration module is used for registering a new user and mainly comprises the sub-modules: user ID generation, facial feature acquisition, voice preference acquisition, and emotion strategy initialization. The user registration module first generates a user ID for the new user; in this embodiment, a UUID generation algorithm is used. It then collects the facial features of the new user; this embodiment uses the FaceNet algorithm. It then collects the new user's voice preference by letting the user select the preferred style among several preset voice styles. Next, an initialized emotion strategy is generated for the new user; in this embodiment the initialized emotion strategy is generated using a randomization method, that is, for each preset emotion, one of several preset feedback actions corresponding to that emotion is randomly selected as the feedback action in the emotion strategy, and one of several preset answer sentences corresponding to that emotion is randomly selected as the answer sentence in the emotion strategy, so that an initialized emotion strategy is obtained for each preset emotion. Finally, the user registration module records the user ID, facial features, voice preference, and emotion strategies obtained above into the multi-modal knowledge graph in the structure shown in FIG. 2.
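A minimal sketch of this registration flow is given below. The preset tables (VOICE_STYLES, PRESET_ACTIONS, PRESET_SENTENCES) and the extract_face_features() placeholder standing in for a FaceNet-style encoder are assumptions for illustration; only the UUID generation and the random initialization of the emotion strategy follow the text above.

```python
# Sketch of the registration step. The preset tables and the feature extractor
# are placeholders; only the UUID and random-initialization logic follow the text.
import random
import uuid

VOICE_STYLES = ["boy voice", "girl voice", "standard voice"]
PRESET_ACTIONS = {
    "depressed": ["raise the right hand", "nod slowly"],
    "happy": ["wave both hands", "clap"],
}
PRESET_SENTENCES = {
    "depressed": ["cheer up", "I am here with you"],
    "happy": ["great to see you so happy", "let's celebrate together"],
}


def extract_face_features(face_image):
    """Placeholder for a FaceNet-style encoder; returns a fixed-length embedding."""
    return [0.0] * 128


def register_user(face_image, chosen_voice_style, graph):
    user_id = str(uuid.uuid4())                       # user ID generation
    strategies = {
        emotion: {
            "feedback_action": random.choice(PRESET_ACTIONS[emotion]),
            "answer_sentence": random.choice(PRESET_SENTENCES[emotion]),
        }
        for emotion in PRESET_ACTIONS                 # random initialization per preset emotion
    }
    graph[user_id] = {                                # record the new user into the graph
        "face_features": extract_face_features(face_image),
        "voice_preference": chosen_voice_style,
        "emotion_strategies": strategies,
    }
    return user_id
```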
The state recognition module is used for identifying the identity and emotion of the current user and mainly comprises the sub-modules: face recognition and emotion recognition. The face recognition is used to identify the identity of the current user, specifically: a face recognition algorithm extracts the facial features of the current user, and these are matched against the facial features of each user stored in the multi-modal knowledge graph, so as to judge whether the current user is a registered user and, if so, which registered user the current user is. In this embodiment, the face recognition uses the FaceNet algorithm to extract the facial features of the user. The emotion recognition uses a facial expression recognition algorithm to recognize the emotion of the current user for the interaction implementation module to use; this embodiment recognizes facial expressions using a deep convolutional neural network.
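As a sketch of the identity-matching step only, the snippet below compares an embedding of the current user with the stored features of every registered user by Euclidean distance; the match_user() name and the distance threshold are assumptions for illustration, not values fixed by the embodiment.

```python
# Sketch of identity matching: nearest stored embedding within a distance threshold.
# The threshold value is an assumption for illustration.
import math


def match_user(current_features, graph, threshold=1.0):
    best_id, best_dist = None, float("inf")
    for user_id, node in graph.items():
        dist = math.dist(current_features, node["face_features"])
        if dist < best_dist:
            best_id, best_dist = user_id, dist
    # Return the matched user ID, or None to prompt the user to register first.
    return best_id if best_dist < threshold else None
```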
The interaction implementation module is used for generating a personalized emotion interaction scheme for the current user and issuing it to the robot control system for execution, and mainly comprises the sub-modules: personalized speech synthesis and emotion interaction. The personalized speech synthesis uses the voice preference of the current user stored in the multi-modal knowledge graph to synthesize speech in the corresponding style when interacting with the current user; in this embodiment, each preset voice style has a corresponding preset voice library, and during speech synthesis the voice library corresponding to the user's voice preference is selected to synthesize speech in that style. The emotion interaction uses the emotion of the current user recognized by the state recognition module to query the multi-modal knowledge graph for the emotion strategy of the current user under that emotion, and uses the feedback action and answer sentence in the emotion strategy to interact with the current user; specifically, the feedback action in the emotion strategy is issued to the robot for execution, and the text of the answer sentence in the emotion strategy is synthesized into speech by the personalized speech synthesis and issued to the robot for broadcasting.
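The snippet below sketches this flow under stated assumptions: robot.execute_action() and robot.play_audio() stand in for the robot control interface, and synthesize_speech() stands in for a TTS engine that selects a preset voice library by style; none of these names come from the embodiment, and only the knowledge graph lookup mirrors the text above.

```python
# Sketch of the interaction-implementation flow. The robot interface and the
# speech synthesizer are placeholders; only the graph lookup follows the text.
VOICE_LIBRARIES = {"boy voice": "voices/boy", "girl voice": "voices/girl"}


def synthesize_speech(text, voice_style):
    """Placeholder TTS: select the preset voice library for the style."""
    library = VOICE_LIBRARIES.get(voice_style, "voices/default")
    return f"<audio from {library}: {text}>"


def interact(user_id, emotion, graph, robot):
    node = graph[user_id]
    strategy = node["emotion_strategies"][emotion]      # query the emotion strategy
    robot.execute_action(strategy["feedback_action"])   # issue the feedback action
    audio = synthesize_speech(strategy["answer_sentence"], node["voice_preference"])
    robot.play_audio(audio)                             # broadcast in the preferred voice style
```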
The knowledge updating module is used for updating the information of registered users in the multi-modal knowledge graph and mainly comprises the sub-modules: user feedback collection, emotion strategy updating, and voice preference updating. The user feedback collection is used to collect the current user's feedback on the emotion interaction result and on the voice style; in this embodiment, each time the robot completes an emotion interaction through the interaction implementation module, the user feedback collection asks the user through the screen whether the feedback action, the voice style, and the answer sentence are satisfactory, and the user can select "satisfied" or "dissatisfied" for each item. The emotion strategy updating updates the emotion strategy of the user under the corresponding emotion stored in the multi-modal knowledge graph according to the current user's feedback on the emotion interaction; in this embodiment, if the user indicates dissatisfaction with the feedback action, the emotion strategy updating randomly selects another preset feedback action and updates the feedback action in the corresponding emotion strategy of the user in the multi-modal knowledge graph; if the user indicates dissatisfaction with the answer sentence, the emotion strategy updating randomly selects another preset answer sentence and updates the answer sentence in the corresponding emotion strategy of the user in the multi-modal knowledge graph. The voice preference updating updates the voice preference of the user stored in the multi-modal knowledge graph according to the current user's feedback on the current voice style; in this embodiment, if the user indicates dissatisfaction with the current voice style, the voice preference updating presents all preset voice style options so that the user can select one of them, and then updates the voice preference of the user in the multi-modal knowledge graph.
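A minimal sketch of this feedback-driven update is given below, assuming feedback arrives as a dictionary of "satisfied"/"dissatisfied" flags and reusing illustrative preset tables like those in the registration sketch; the function and parameter names are assumptions, not part of the embodiment.

```python
# Sketch of the knowledge-update step: replace whatever the user was dissatisfied
# with (random re-selection for action/sentence, user re-selection for voice style).
import random


def update_after_feedback(user_id, emotion, feedback, graph,
                          preset_actions, preset_sentences, ask_user_for_style):
    node = graph[user_id]
    strategy = node["emotion_strategies"][emotion]
    if feedback.get("feedback_action") == "dissatisfied":
        strategy["feedback_action"] = random.choice(preset_actions[emotion])
    if feedback.get("answer_sentence") == "dissatisfied":
        strategy["answer_sentence"] = random.choice(preset_sentences[emotion])
    if feedback.get("voice_style") == "dissatisfied":
        node["voice_preference"] = ask_user_for_style()   # user reselects a preset style
```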
The invention discloses a method for realizing personalized emotion interaction of a home service robot based on the multi-modal knowledge graph, which comprises the following steps (an illustrative sketch chaining these steps is given after step (5)):
(1) The user registration module generates a user ID for a new user, collects the facial features and voice preference of the user, generates an initialized emotion strategy for the user, and finally records the user ID, facial features, voice preference, and emotion strategy of the user into the multi-modal knowledge graph.
(2) The state recognition module calls face recognition to obtain the facial features of the current user and matches them against the facial features of registered users stored in the multi-modal knowledge graph; if no registered user is matched, the user is prompted to register first and the method returns to step (1); if a registered user is matched, the user ID of the user is queried and the method proceeds to step (3).
(3) The state recognition module calls emotion recognition to try to recognize the emotion of the current user; if a known emotion is recognized, the emotion recognition result and the user ID are transmitted to the interaction implementation module and the method proceeds to step (4); if no known emotion is recognized, this step is repeated until a known emotion is recognized. Preferably, the recognition is attempted once every second.
(4) The interaction implementation module queries the multi-modal knowledge graph according to the emotion of the current user recognized in the previous step to obtain the emotion strategy of the current user corresponding to that emotion, and simultaneously queries the multi-modal knowledge graph according to the user ID to obtain the voice preference of the user; next, the emotion interaction issues the feedback action in the emotion strategy to the robot for execution; then, the emotion interaction transmits the answer sentence in the emotion strategy to the personalized speech synthesis; and the personalized speech synthesis synthesizes speech from the answer sentence and the voice preference and sends it to the robot for broadcasting.
(5) The knowledge updating module collects the user's feedback on the personalized interaction through user feedback collection, including the user's feedback on the emotion interaction result and on the current voice style; the emotion strategy updating updates the emotion strategy of the user for the corresponding emotion in the multi-modal knowledge graph according to the user's feedback on the emotion interaction, including updating the feedback action and the answer sentence; the voice preference updating updates the user's voice preference stored in the multi-modal knowledge graph according to the current user's feedback on the current voice style.
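Purely for illustration, the fragment below chains steps (2) through (5) into one round, reusing the helpers sketched earlier in this description; recognize_emotion() and collect_feedback() are additional placeholders and do not denote specific algorithms of the embodiment.

```python
# Illustrative orchestration of steps (2)-(5), reusing the helpers sketched above;
# recognize_emotion() and collect_feedback() are additional placeholders.
def interaction_round(camera, graph, robot, ask_user_for_style):
    features = extract_face_features(camera.capture())       # step (2): extract and match
    user_id = match_user(features, graph)
    if user_id is None:
        return "please register first"                        # unmatched user -> back to step (1)
    emotion = recognize_emotion(camera.capture())             # step (3): retry until a known emotion
    while emotion is None:
        emotion = recognize_emotion(camera.capture())
    interact(user_id, emotion, graph, robot)                  # step (4): personalized interaction
    feedback = collect_feedback(robot)                        # step (5): satisfied/dissatisfied per item
    update_after_feedback(user_id, emotion, feedback, graph,
                          PRESET_ACTIONS, PRESET_SENTENCES, ask_user_for_style)
```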
The following describes in detail, according to an embodiment of the present invention, the workflow of each component of the device during the completion of a personalized interaction. Assuming that the family member "Zhang San" has not yet registered on the robot, when he uses the robot, the workflow of each component of the device is as follows:
(1) When Zhang San comes in front of the robot, the state recognition module calls face recognition to extract Zhang San's facial features and matches them against the facial features of registered users stored in the multi-modal knowledge graph; since Zhang San has not registered and there is no matching result, the state recognition module prompts Zhang San to register first;
(2) The user registration module generates a user ID for Zhang San using user ID generation; then the facial feature acquisition uses the FaceNet algorithm to extract Zhang San's facial features; then the voice preference acquisition prompts Zhang San to select one of several preset voice styles, for example "boy voice"; next, the emotion strategy initialization randomly generates an initialized emotion strategy for Zhang San for each preset emotion, for example, for the emotion "depressed", the feedback action "raise the right hand" and the answer sentence "cheer up"; finally, the user registration module records Zhang San's user ID, facial features, voice preference, and emotion strategies into the multi-modal knowledge graph;
(3) When Zhang San comes to the robot again, the state recognition module calls face recognition to extract Zhang San's facial features and matches them against the facial features of registered users stored in the multi-modal knowledge graph; since they match the facial features of Zhang San stored in the multi-modal knowledge graph, the face recognition judges that the current user is Zhang San and queries Zhang San's user ID;
(4) The state recognition module calls emotion recognition to continuously try to recognize Zhang San's emotion until a preset emotion is recognized; for example, when the emotion "depressed" is recognized, Zhang San's user ID and the emotion "depressed" are transmitted to the interaction implementation module;
(5) The interaction implementation module queries the multi-modal knowledge graph according to the emotion "depressed" recognized in the previous step and Zhang San's user ID to obtain Zhang San's emotion strategy corresponding to the emotion "depressed", for example the feedback action "raise the right hand" and the answer sentence "cheer up", and queries the multi-modal knowledge graph according to the user ID to obtain Zhang San's voice preference, for example "boy voice"; next, the emotion interaction issues the feedback action "raise the right hand" to the robot for execution; then the emotion interaction transmits the answer sentence "cheer up" and the voice preference "boy voice" to the personalized speech synthesis; the personalized speech synthesis synthesizes speech from the answer sentence "cheer up" in the voice preference "boy voice" and sends it to the robot for broadcasting; finally, the interaction implementation module transmits Zhang San's user ID and the emotion "depressed" to the knowledge updating module;
(6) The knowledge updating module uses the user feedback collection to ask Zhang San whether he is satisfied with the personalized interaction of the previous step, specifically whether he is satisfied with the feedback action and answer sentence for the emotion "depressed" and with the current voice style; if Zhang San indicates dissatisfaction with the feedback action "raise the right hand", the emotion strategy updating randomly selects another preset feedback action and updates the feedback action of Zhang San corresponding to the emotion "depressed" stored in the multi-modal knowledge graph; if Zhang San indicates dissatisfaction with the answer sentence "cheer up", the emotion strategy updating randomly selects another preset answer sentence and updates the answer sentence of Zhang San corresponding to the emotion "depressed" stored in the multi-modal knowledge graph; if Zhang San indicates dissatisfaction with the current voice style "boy voice", the voice preference updating prompts Zhang San to select another preset voice style and then updates Zhang San's voice preference stored in the multi-modal knowledge graph.
The above examples are only intended to illustrate the technical solution of the present invention rather than to limit it. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and these are intended to fall within the scope of protection of the invention.