Vehicle-based voice interaction method and device, vehicle and storage medium
1. A vehicle-based voice interaction method, characterized by comprising the following steps:
in response to a current voice uttered by a user, acquiring current spatiotemporal data of a vehicle;
determining a target behavior pattern from an inertial behavior set already generated for the vehicle based on the current voice and the current spatiotemporal data, the inertial behavior set being generated by behavior modeling of time-aligned historical spatiotemporal data and historical voice dialogs on the vehicle;
and executing a corresponding response operation using the inertial behavior information in the target behavior pattern.
2. The method of claim 1, wherein determining the target behavior pattern from the inertial behavior set already generated for the vehicle based on the current voice and the current spatiotemporal data comprises:
inputting the current voice into a pre-constructed semantic understanding model to obtain a corresponding general user intention, and searching the inertial behavior set for behavior patterns adapted to the current spatiotemporal data;
and determining the corresponding target behavior pattern according to a comparison result between the general user intention and the intent slot information in each adapted behavior pattern.
3. The method of claim 2, wherein the semantic understanding model is trained by performing the following steps:
determining a historical response operation executed after a historical voice uttered by the user has undergone multiple rounds of dialog operations;
if a first behavior pattern matching the historical response operation exists in the inertial behavior set, taking the intent slot information in the first behavior pattern as a sample label of the historical voice;
and training the semantic understanding model using the historical voice and the sample label of the historical voice.
4. The method of claim 1, wherein executing the corresponding response operation using the inertial behavior information in the target behavior pattern comprises:
if the target behavior pattern is not empty, executing the corresponding response operation using the inertial behavior information of each slot in the target behavior pattern;
and if the target behavior pattern is empty, determining the user's explicit intention through multiple rounds of dialog operations and executing the corresponding response operation.
5. The method of claim 4, further comprising, after determining the user's explicit intention through multiple rounds of dialog operations and executing the corresponding response operation:
time-aligning the dialog data and the spatiotemporal data collected during the multiple rounds of dialog operations, and optimizing the behavior patterns in the inertial behavior set using the time-aligned dialog data and spatiotemporal data.
6. The method of any one of claims 1-5, wherein each behavior pattern in the inertial behavior set includes a trigger execution condition for that behavior pattern.
7. The method of claim 6, further comprising:
for each behavior pattern in the inertial behavior set, if the trigger execution condition of the behavior pattern is met after the vehicle is started, generating a prompt message asking the user to decide whether to execute the behavior pattern.
8. A vehicle-based voice interaction apparatus, comprising:
a voice response module, configured to acquire current spatiotemporal data of a vehicle in response to a current voice uttered by a user;
a behavior pattern determination module, configured to determine a target behavior pattern from an inertial behavior set already generated for the vehicle based on the current voice and the current spatiotemporal data, the inertial behavior set being generated by behavior modeling of time-aligned historical spatiotemporal data and historical voice dialogs on the vehicle;
and an interactive response module, configured to execute a corresponding response operation using the inertial behavior information in the target behavior pattern.
9. A vehicle, characterized in that the vehicle comprises:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the vehicle-based voice interaction method of any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the vehicle-based voice interaction method according to any one of claims 1-7.
Background
While driving, a driver must keep both hands and eyes on the road, so voice interaction has become a standard core function of intelligent vehicles: through voice alone, the driver can perform various interactive operations with the vehicle, such as navigation, entertainment, or vehicle control.
Currently, existing voice interaction generally performs context resolution on the currently uttered voice information using the contextual voice conversation content uttered by the driver, so as to quickly eliminate ambiguity and determine the driver's operation intention. If the vehicle is then required to execute precise intended operations, such as opening a window to a specific degree or following a specific navigation route, the voice uttered by the driver must contain the precise intended content. This makes the voice information uttered during vehicle voice interaction extremely cumbersome and greatly reduces the convenience of vehicle voice interaction. Alternatively, the vehicle conducts multiple rounds of voice interaction with the driver so that the driver gradually supplies the precise intended content, which distracts the driver's attention from driving and thereby increases the safety risk of vehicle driving.
Disclosure of Invention
Embodiments of the present invention provide a vehicle-based voice interaction method and device, a vehicle, and a storage medium, which can improve the accuracy and convenience of vehicle voice interaction and reduce the safety risk of vehicle driving.
In a first aspect, an embodiment of the present invention provides a vehicle-based voice interaction method, where the method includes:
in response to a current voice uttered by a user, acquiring current spatiotemporal data of a vehicle;
determining a target behavior pattern from an inertial behavior set already generated for the vehicle based on the current voice and the current spatiotemporal data, the inertial behavior set being generated by behavior modeling of time-aligned historical spatiotemporal data and historical voice dialogs on the vehicle;
and executing a corresponding response operation using the inertial behavior information in the target behavior pattern.
In a second aspect, an embodiment of the present invention provides a vehicle-based voice interaction apparatus, including:
a voice response module, configured to acquire current spatiotemporal data of a vehicle in response to a current voice uttered by a user;
a behavior pattern determination module, configured to determine a target behavior pattern from an inertial behavior set already generated for the vehicle based on the current voice and the current spatiotemporal data, the inertial behavior set being generated by behavior modeling of time-aligned historical spatiotemporal data and historical voice dialogs on the vehicle;
and an interactive response module, configured to execute a corresponding response operation using the inertial behavior information in the target behavior pattern.
In a third aspect, an embodiment of the present invention provides a vehicle including:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the vehicle-based voice interaction method described in any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the vehicle-based voice interaction method according to any embodiment of the present invention.
Embodiments of the present invention provide a vehicle-based voice interaction method and device, a vehicle, and a storage medium. A corresponding inertial behavior set can be generated by behavior modeling of time-aligned historical spatiotemporal data and historical voice dialogs on the vehicle. After a current voice uttered by a user is received, the current spatiotemporal data of the vehicle is first acquired; a target behavior pattern matching the current voice and the current spatiotemporal data is then searched out from the inertial behavior set generated for the vehicle; and the inertial behavior information in the target behavior pattern is used to execute the corresponding response operation, thereby realizing vehicle voice interaction. Even if the current voice does not contain an explicit user intention, the target behavior pattern meeting the user's requirement can be accurately located by combining the current spatiotemporal data. This simplifies the user's voice interaction, improves the accuracy and convenience of vehicle voice interaction, removes the need for multiple rounds of voice dialog to determine the user's intention, and reduces the safety risk of vehicle driving.
Drawings
Other features, objects, and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 is a flowchart of a vehicle-based voice interaction method according to Example One of the present invention;
Fig. 2 is a flowchart of a vehicle-based voice interaction method according to Example Two of the present invention;
Fig. 3 is a schematic structural diagram of a vehicle-based voice interaction device according to Example Three of the present invention;
Fig. 4 is a schematic structural diagram of a vehicle according to Example Four of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example One
Fig. 1 is a flowchart of a vehicle-based voice interaction method according to Example One of the present invention. This embodiment is applicable to conducting voice interaction with a user on any vehicle so as to control the vehicle to execute corresponding operations. The vehicle-based voice interaction method provided by this embodiment can be executed by the vehicle-based voice interaction device provided by embodiments of the present invention; the device can be implemented in software and/or hardware and integrated into the vehicle executing the method.
Specifically, referring to Fig. 1, the method includes the following steps:
and S110, responding to the current voice sent by the user, and acquiring the current space-time data of the vehicle.
Specifically, with the development of vehicle intelligence, various vehicle functions support voice control by the user, such as opening windows, navigating, or playing songs. After the vehicle is started, it detects in real time whether a voice uttered by the user has been received, so that the user's intention can be obtained by semantic analysis of the received voice and the corresponding vehicle function can be controlled to respond accordingly.
However, the voice uttered by the user may not contain an explicit intention, in which case the user would normally need to conduct multiple rounds of voice dialog with the vehicle before the explicit intention can be resolved; this makes the voice information during vehicle voice interaction extremely cumbersome and increases the safety risk of driving. To avoid these problems, after receiving the current voice uttered by the user, this embodiment first acquires the current spatiotemporal data of the vehicle according to the vehicle's current driving state. The current spatiotemporal data may include the driving track, stopping points, Points of Interest (POIs) along the drive, the time periods the vehicle spends in different travel states, and the operation behavior between the user and the vehicle (such as a selected song to be played or an input navigation destination). When the current voice is subsequently analyzed, the current spatiotemporal data is combined with it so that the behavior operation the user wants the vehicle to execute can be accurately determined, without judging the user's explicit intention through multiple rounds of voice dialog.
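As a concrete illustration, the spatiotemporal record described above could be held in a small data structure. The following Python sketch is illustrative only; all field names are assumptions rather than definitions from this disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SpatioTemporalData:
    """Spatiotemporal snapshot captured when the user utters a voice command."""
    timestamp: float                                  # Unix time of the utterance
    track: List[Tuple[float, float]]                  # (lat, lon) samples of the driving track
    stop_points: List[Tuple[float, float]] = field(default_factory=list)
    pois: List[str] = field(default_factory=list)     # e.g. "subway station", "restaurant"
    user_operations: List[str] = field(default_factory=list)  # e.g. "selected playlist xxx"
```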
S120: determine a target behavior pattern from the inertial behavior set generated for the vehicle according to the current voice and the current spatiotemporal data.
Optionally, in order to accurately analyze the behavior operation intended when the user performs voice interaction with the vehicle, this embodiment may model a corresponding inertial behavior set for the vehicle offline. The inertial behavior set may include the various behavior patterns that the user commonly adopts during vehicle driving, such as navigating, listening to songs, or opening windows.
The inertial behavior set may be generated by behavior modeling of time-aligned historical spatiotemporal data and historical voice dialogs on the vehicle. Specifically, the historical spatiotemporal data of each previous drive of the vehicle is collected, such as the historical driving track (composed of position information at different times during the drive), stopping points (positions where the vehicle remained stationary for a period of time), and POIs (points of interest of a certain type in a certain area along the drive, such as subway stations, hospitals, shopping malls, or restaurants). The historical voice dialogs between the vehicle and the user during each historical drive are also obtained, including the voice interaction content (such as voice instructions controlling the vehicle to execute various behavior operations, and the vehicle's voice replies) and the behavior operations of the vehicle triggered by the user (such as selecting from a list or clicking a trigger button). The historical spatiotemporal data of the vehicle is then analyzed and mined to determine the inertial driving behaviors the vehicle frequently adopts, such as the daily morning track and the daily afternoon track. The historical voice dialogs are time-aligned with the inertial driving behaviors determined from the historical spatiotemporal data and are semantically analyzed; using the semantic analysis results of the historical voice dialogs time-aligned with each inertial driving behavior, the semantic label corresponding to that inertial driving behavior, as well as the user's behavior intention and operation state within it, are determined. This yields the various behavior patterns the vehicle frequently used during historical driving.
For example, the behavior patterns in this embodiment may include: <intent: navigate; behavior: commuting to work; time period: 8:00-8:30 a.m.; starting point: A; end point: B; route: xxx>, <intent: listen to songs; behavior: commuting to work; playlist: xxx>, and <intent: open window; behavior: commuting to work; window opening degree: 1/3>. Each behavior pattern in the inertial behavior set in this embodiment includes a plurality of slots, and each slot holds the vehicle's inertial behavior information in the corresponding dimension.
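One way to encode such slot-based behavior patterns is sketched below in Python. The dictionary representation, intent strings, and slot names follow the examples above but are otherwise illustrative assumptions.

```python
# Slot names follow the <intent: ...; behavior: ...> examples above.
commute_navigation = {
    "intent": "navigate",
    "behavior": "commuting to work",
    "time_window": ("08:00", "08:30"),   # zero-padded HH:MM strings
    "origin": "A",
    "destination": "B",
    "route": "xxx",
}

commute_music = {
    "intent": "listen_to_songs",
    "behavior": "commuting to work",
    "playlist": "xxx",
}

window_opening = {
    "intent": "open_window",
    "behavior": "commuting to work",
    "window_opening_degree": 1 / 3,
}

inertial_behavior_set = [commute_navigation, commute_music, window_opening]
```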
In this embodiment, after the current voice uttered by the user and the current spatiotemporal data of the vehicle are received, semantic analysis may first be performed on the current voice to obtain the user intention it contains. This user intention may be a simplified, general intention that does not specify concrete behavior content. Meanwhile, the current spatiotemporal data is analyzed for common inertial driving behavior; for example, if the current time is 8 a.m., the vehicle's inertial driving behavior at this moment can be determined to be commuting to work. Then, a target behavior pattern that contains both the user intention corresponding to the current voice and the inertial driving behavior indicated by the current spatiotemporal data is searched out from the generated inertial behavior set. For example, if the current voice is "play a song" and the current spatiotemporal data is the travel track since 8 a.m., the user intention of the current voice can be determined to be "listen to songs" and the inertial driving behavior corresponding to the current spatiotemporal data to be "commuting to work", so the target behavior pattern can be determined to be: <intent: listen to songs; behavior: commuting to work; playlist: xxx>.
In addition, to ensure comprehensive vehicle voice interaction, each behavior pattern in the inertial behavior set in this embodiment further includes a trigger execution condition for that behavior pattern, which is used to judge whether the behavior pattern needs to be executed during vehicle driving. For each behavior pattern in the inertial behavior set, the vehicle detects in real time whether the pattern's trigger execution condition has been met after start-up. If it has, a prompt message is generated asking the user whether to execute the behavior pattern; the message is displayed to the user, and the user decides whether the behavior pattern should be executed at this moment. In this way, each behavior pattern in the inertial behavior set can be proactively suggested for execution.
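A minimal sketch of this start-up check, assuming the dictionary patterns above and modeling only a time-window condition (real trigger conditions could also test position or other data):

```python
import datetime

def startup_prompts(inertial_behavior_set, now=None):
    """Collect prompt messages for every behavior pattern whose trigger
    execution condition is met after vehicle start-up. Only a time-window
    condition is modeled here."""
    now = now or datetime.datetime.now()
    hhmm = now.strftime("%H:%M")          # zero-padded, so string comparison works
    prompts = []
    for pattern in inertial_behavior_set:
        window = pattern.get("time_window")
        if window and window[0] <= hhmm <= window[1]:
            prompts.append(f"Execute '{pattern['intent']}' "
                           f"({pattern['behavior']})? (yes/no)")
    return prompts
```

For example, calling `startup_prompts(inertial_behavior_set, datetime.datetime(2024, 1, 1, 8, 10))` with the patterns sketched earlier would prompt only for the navigation pattern, the only one carrying a time-window condition.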
S130: execute the corresponding response operation using the inertial behavior information in the target behavior pattern.
In this embodiment, after the target behavior pattern is determined, the inertial behavior information in each of its slots may be obtained; the inertial behavior information in the target behavior pattern now indicates an explicit user intention, so the corresponding response operation is executed according to the inertial behavior information of each slot. For example, if the target behavior pattern is <intent: listen to songs; behavior: commuting to work; playlist: xxx>, then after the current voice "play a song" is received, the songs in playlist xxx are automatically played.
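Continuing the hypothetical dictionary patterns from the earlier sketch, executing the response from the slot information might look as follows; the print-based handlers are placeholder assumptions standing in for real vehicle controls.

```python
def execute_response(target_pattern):
    """Dispatch on the intent slot; the remaining slots supply the concrete
    execution parameters."""
    params = {k: v for k, v in target_pattern.items() if k != "intent"}
    handlers = {
        "listen_to_songs": lambda playlist, **_: print(f"Playing playlist {playlist}"),
        "navigate": lambda destination, route, **_: print(
            f"Navigating to {destination} via route {route}"),
        "open_window": lambda window_opening_degree, **_: print(
            f"Opening window to {window_opening_degree:.0%}"),
    }
    handlers[target_pattern["intent"]](**params)

# e.g. execute_response(commute_music) prints: Playing playlist xxx
```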
In the technical solution provided by this embodiment, a corresponding inertial behavior set can be generated by behavior modeling of time-aligned historical spatiotemporal data and historical voice dialogs on the vehicle. After the current voice uttered by the user is received, the current spatiotemporal data of the vehicle is first acquired; a target behavior pattern matching the current voice and the current spatiotemporal data is then found in the vehicle's generated inertial behavior set; and the inertial behavior information in the target behavior pattern is used to execute the corresponding response operation, thereby realizing vehicle voice interaction. Even if the current voice does not contain an explicit user intention, the target behavior pattern meeting the user's requirement can be accurately located by combining the current spatiotemporal data. This simplifies the user's voice interaction, improves the accuracy and convenience of vehicle voice interaction, removes the need for multiple rounds of voice dialog to determine the user's intention, and reduces the safety risk of vehicle driving.
Example Two
Fig. 2 is a flowchart of a vehicle-based voice interaction method according to Example Two of the present invention. This embodiment is optimized on the basis of the foregoing embodiment. Optionally, this embodiment mainly details the specific process of determining the target behavior pattern from the inertial behavior set and the specific response process for the target behavior pattern.
Specifically, referring to Fig. 2, the method of this embodiment may include the following steps:
and S210, responding to the current voice sent by the user, and acquiring the current space-time data of the vehicle.
S220: input the current voice into the pre-constructed semantic understanding model to obtain the corresponding general user intention, and search the inertial behavior set for behavior patterns adapted to the current spatiotemporal data.
Optionally, in order to accurately analyze the user intention in the current voice, this embodiment pre-constructs a semantic understanding model for performing the corresponding semantic analysis on the current voice. After the current voice uttered by the user is received, it is input into the pre-constructed semantic understanding model for semantic analysis. Even if the current voice is quite simple, the general user intention it represents can be obtained. The general user intention indicates a specific function the user wants the vehicle to execute but does not set the concrete execution parameters of that function; for example, the general user intention may be "open window" without specifying the opening degree.
It should be noted that the semantic understanding model in this embodiment may be trained as follows: determine the historical response operation executed after a historical voice uttered by the user has undergone multiple rounds of dialog operations; if a first behavior pattern matching the historical response operation exists in the inertial behavior set, use the intent slot information in the first behavior pattern as the sample label of the historical voice; and train the semantic understanding model using the historical voice and its sample label.
That is to say, before the semantic understanding model has been trained, in order to ensure the accuracy of voice control over vehicle functions, a historical voice that is simple and contains no explicit user intention usually causes the vehicle to conduct multiple rounds of voice interaction with the user, continuously guiding the user through the dialog operations to state the explicit intention for controlling the vehicle function, so that the vehicle executes the corresponding historical response operation according to that explicit intention and voice control of the specific function is achieved. The historical response operation executed after each user-uttered historical voice has undergone multiple rounds of dialog operations can then be obtained and compared with the response operation corresponding to each behavior pattern in the inertial behavior set. If a first behavior pattern matching the historical response operation exists in the inertial behavior set, the general user intention contained in the historical voice is consistent with the intent slot information in the first behavior pattern. Therefore, to ensure that the semantic understanding model can output the general user intention of the historical voice, the intent slot information in the first behavior pattern may be used as the sample label of that historical voice. Following these steps, a training sample set composed of a large number of historical voices can be obtained, with a corresponding sample label attached to each training sample. Trained on each historical voice in the training sample set together with its sample label, the semantic understanding model can then semantically analyze spoken voice data and output a general user intention consistent with the intent slot information of some behavior pattern in the inertial behavior set, ensuring the recognition accuracy of the general user intention.
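The label-construction step described above could be sketched as follows; matching a response operation to a pattern by its intent slot is a deliberate simplification of the comparison in the text, and the input format is an assumption.

```python
def build_label_set(dialog_history, inertial_behavior_set):
    """Build (historical_voice, sample_label) training pairs.
    `dialog_history` holds (historical_voice, historical_response_operation)
    tuples recorded after multi-round dialogs."""
    samples = []
    for voice, response_operation in dialog_history:
        first_match = next((p for p in inertial_behavior_set
                            if p["intent"] == response_operation), None)
        if first_match is not None:
            # The intent slot information of the first matching pattern
            # becomes the sample label of the historical voice.
            samples.append((voice, first_match["intent"]))
    return samples
```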
Moreover, since the current voice uttered by the user may be quite simple, it may not fully state the concrete operation data for the function the vehicle is being asked to perform; for example, when the current voice is "open window", it does not specify which window to open or to what degree, so the user's explicit intention cannot be obtained. It is therefore necessary to analyze the user's explicit intention by further combining the current spatiotemporal data of the vehicle.
Specifically, by analyzing the current spatiotemporal data of the vehicle, information such as the vehicle's current driving time period and driving position can be judged. The common driving behavior set from the historical spatiotemporal data in each behavior pattern of the generated inertial behavior set is then examined, and the common driving behavior in each behavior pattern is compared one by one with the driving time period, driving position, and other information represented by the current spatiotemporal data, so as to find the adapted behavior patterns in the inertial behavior set that contain the driving time period, driving position, and other information represented by the current spatiotemporal data. For example, if the current spatiotemporal data indicates 8 a.m. and a driving position continuously on the commuting route, then the behavior patterns in the inertial behavior set whose behavior slot is "commuting to work" may be the adapted behavior patterns in this embodiment.
S230: determine the corresponding target behavior pattern according to the comparison result between the general user intention and the intent slot information in each adapted behavior pattern.
In this embodiment, since the user may intend to control the vehicle to perform several different functions within the same driving behavior, the intent slot information differs between adapted behavior patterns. After the general user intention represented by the current voice is determined, it is compared one by one with the intent slot information in each adapted behavior pattern to find the adapted behavior pattern whose intent slot information is consistent with the general user intention; that pattern serves as the target behavior pattern in this embodiment. The target behavior pattern can thus be determined from a single utterance carrying only a general user intention, without multiple rounds of voice dialog with the user, which improves the intelligence of vehicle voice interaction.
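Combining S220 and S230 into one sketch, under the assumptions of the earlier snippets: `semantic_model` is any callable mapping an utterance to a general-intention string, `current_data` is the SpatioTemporalData record sketched earlier, and adaptation to the current spatiotemporal data is reduced to a time-window test.

```python
import datetime

def determine_target_pattern(current_voice, current_data,
                             inertial_behavior_set, semantic_model):
    """Sketch of S220 and S230."""
    general_intent = semantic_model(current_voice)                 # S220
    hhmm = datetime.datetime.fromtimestamp(
        current_data.timestamp).strftime("%H:%M")
    adapted = [p for p in inertial_behavior_set                    # adapted patterns
               if "time_window" not in p
               or p["time_window"][0] <= hhmm <= p["time_window"][1]]
    for pattern in adapted:                                        # S230
        if pattern["intent"] == general_intent:
            return pattern
    return None   # empty target pattern: fall back to multi-round dialog
```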
S240: judge whether the target behavior pattern is empty; if so, execute S260; if not, execute S250.
After the target behavior pattern is determined, it is possible that no behavior pattern in the inertial behavior set contains the general user intention represented by the current voice; in that case the target behavior pattern is empty and no corresponding response operation can be executed. To ensure the accuracy of vehicle voice interaction, it is therefore necessary to further judge whether the target behavior pattern is empty, so that different response operations can be executed accordingly.
S250: execute the corresponding response operation using the inertial behavior information of each slot in the target behavior pattern.
Optionally, if the target behavior pattern is non-empty, functional operation parameters of the various dimensions have already been set in its slots. The inertial behavior information of each slot in the target behavior pattern can then be obtained directly, and the corresponding response operation is executed according to that information.
S260: determine the user's explicit intention through multiple rounds of dialog operations, and execute the corresponding response operation.
Optionally, if the target behavior pattern is empty, the user's explicit intention still cannot be obtained. In that case, the vehicle may conduct multiple rounds of voice interaction with the user, continuously guiding the user through the dialog operations to state the explicit intention for controlling the vehicle function, and the corresponding response operation is then executed according to that explicit intention.
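The S240-S260 branch could be wired together as below; both callables are assumed interfaces, not APIs defined in this disclosure.

```python
def respond(target_pattern, run_multi_round_dialog, execute_response):
    """A non-empty target behavior pattern already carries the full slot
    information; an empty one triggers the multi-round dialog fallback."""
    if target_pattern is not None:                   # S250
        execute_response(target_pattern)
    else:                                            # S260
        explicit_pattern = run_multi_round_dialog()  # elicits explicit intention
        execute_response(explicit_pattern)
```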
S270: time-align the dialog data and the spatiotemporal data collected during the multiple rounds of dialog operations, and optimize the behavior patterns in the inertial behavior set using the time-aligned dialog data and spatiotemporal data.
Optionally, after the user's explicit intention is determined through multiple rounds of dialog operations, the vehicle's dialog data and spatiotemporal data under those dialog operations can be continuously collected. The vehicle's driving behavior can be obtained by analyzing the spatiotemporal data; by time-aligning the dialog data and the spatiotemporal data collected under the multiple rounds of dialog operations, a corresponding semantic label can be set for the driving behavior, and the behavior intention, operation state, and so on under that driving behavior can be determined. The driving behavior represented by the time-aligned dialog data and spatiotemporal data, together with the behavior intention and operation state under it, is then used to continuously optimize the behavior patterns in the inertial behavior set, so that the inertial behavior set is refined during every drive and the comprehensiveness of each behavior pattern is gradually achieved.
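One plausible concretization of the time-alignment step is nearest-neighbor pairing of timestamps; the pairing rule and input format below are assumptions, not a procedure specified in the text.

```python
def time_align(dialog_events, spatiotemporal_events, tolerance_s=60.0):
    """Pair each dialog turn with the spatiotemporal sample nearest in time,
    within `tolerance_s` seconds. Both inputs are lists of
    (unix_timestamp, payload) tuples."""
    aligned = []
    for t_dialog, dialog in dialog_events:
        nearest = min(spatiotemporal_events,
                      key=lambda event: abs(event[0] - t_dialog),
                      default=None)
        if nearest is not None and abs(nearest[0] - t_dialog) <= tolerance_s:
            aligned.append((dialog, nearest[1]))
    return aligned
```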
In the technical solution provided by this embodiment, a corresponding inertial behavior set can be generated by behavior modeling of time-aligned historical spatiotemporal data and historical voice dialogs on the vehicle. After the current voice uttered by the user is received, the current spatiotemporal data of the vehicle is first acquired; a target behavior pattern matching the current voice and the current spatiotemporal data is then found in the vehicle's generated inertial behavior set; and the inertial behavior information in the target behavior pattern is used to execute the corresponding response operation, thereby realizing vehicle voice interaction. Even if the current voice does not contain an explicit user intention, the target behavior pattern meeting the user's requirement can be accurately located by combining the current spatiotemporal data. This simplifies the user's voice interaction, improves the accuracy and convenience of vehicle voice interaction, removes the need for multiple rounds of voice dialog to determine the user's intention, and reduces the safety risk of vehicle driving.
Example Three
Fig. 3 is a schematic structural diagram of a vehicle-based voice interaction device according to Example Three of the present invention. As shown in Fig. 3, the device may include:
a voice response module 310, configured to acquire the current spatiotemporal data of the vehicle in response to the current voice uttered by the user;
a behavior pattern determination module 320, configured to determine a target behavior pattern from the inertial behavior set generated for the vehicle based on the current voice and the current spatiotemporal data, the inertial behavior set being generated by behavior modeling of time-aligned historical spatiotemporal data and historical voice dialogs on the vehicle;
and an interactive response module 330, configured to execute the corresponding response operation using the inertial behavior information in the target behavior pattern.
In the technical solution provided by this embodiment, a corresponding inertial behavior set can be generated by behavior modeling of time-aligned historical spatiotemporal data and historical voice dialogs on the vehicle. After the current voice uttered by the user is received, the current spatiotemporal data of the vehicle is first acquired; a target behavior pattern matching the current voice and the current spatiotemporal data is then found in the vehicle's generated inertial behavior set; and the inertial behavior information in the target behavior pattern is used to execute the corresponding response operation, thereby realizing vehicle voice interaction. Even if the current voice does not contain an explicit user intention, the target behavior pattern meeting the user's requirement can be accurately located by combining the current spatiotemporal data. This simplifies the user's voice interaction, improves the accuracy and convenience of vehicle voice interaction, removes the need for multiple rounds of voice dialog to determine the user's intention, and reduces the safety risk of vehicle driving.
Further, the behavior pattern determination module 320 may be specifically configured to:
input the current voice into the pre-constructed semantic understanding model to obtain the corresponding general user intention, and search the inertial behavior set for behavior patterns adapted to the current spatiotemporal data;
and determine the corresponding target behavior pattern according to the comparison result between the general user intention and the intent slot information in each adapted behavior pattern.
Further, the semantic understanding model can be obtained by performing the following training steps:
determining a historical response operation executed after a historical voice uttered by the user has undergone multiple rounds of dialog operations;
if a first behavior pattern matching the historical response operation exists in the inertial behavior set, taking the intent slot information in the first behavior pattern as the sample label of the historical voice;
and training the semantic understanding model using the historical voice and the sample label of the historical voice.
Further, the interactive response module 330 may be specifically configured to:
if the target behavior pattern is not empty, execute the corresponding response operation using the inertial behavior information of each slot in the target behavior pattern;
and if the target behavior pattern is empty, determine the user's explicit intention through multiple rounds of dialog operations and execute the corresponding response operation.
Further, the vehicle-based voice interaction device may further include:
an inertial behavior set optimization module, configured to time-align the dialog data and the spatiotemporal data collected during the multiple rounds of dialog operations, and to optimize the behavior patterns in the inertial behavior set using the time-aligned dialog data and spatiotemporal data.
Further, each behavior pattern in the inertial behavior set includes a trigger execution condition of the behavior pattern.
Further, the vehicle-based voice interaction device may further include:
a behavior trigger module, configured to, for each behavior pattern in the inertial behavior set, generate a prompt message asking the user whether to execute the behavior pattern if the trigger execution condition of the behavior pattern is met after the vehicle is started.
The vehicle-based voice interaction device provided by this embodiment can execute the vehicle-based voice interaction method provided by any embodiment, and has the corresponding functions and beneficial effects.
Example Four
Fig. 4 is a schematic structural diagram of a vehicle according to Example Four of the present invention. As shown in Fig. 4, the vehicle includes a processor 40, a storage device 41, and a communication device 42. The number of processors 40 in the vehicle may be one or more; one processor 40 is illustrated in Fig. 4. The processor 40, the storage device 41, and the communication device 42 of the vehicle may be connected by a bus or other means; a bus connection is illustrated in Fig. 4.
The storage device 41, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as the modules corresponding to the vehicle-based voice interaction method in the embodiments of the present invention (e.g., the voice response module 310, the behavior pattern determination module 320, and the interactive response module 330 of the vehicle-based voice interaction device). The processor 40 executes the various functional applications and data processing of the vehicle by running the software programs, instructions, and modules stored in the storage device 41, thereby implementing the vehicle-based voice interaction method described above.
The storage device 41 may mainly include a program storage area and a data storage area; the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created according to use of the terminal, and the like. Further, the storage device 41 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the storage device 41 may further include memory located remotely from the processor 40, which may be connected to the vehicle via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The communication device 42 may be used to establish a network connection or a mobile data connection between devices.
The vehicle provided by this embodiment can execute the vehicle-based voice interaction method provided by any embodiment, and has the corresponding functions and beneficial effects.
Example Five
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program can implement the vehicle-based voice interaction method in any of the above embodiments, which specifically comprises the following steps:
in response to a current voice uttered by a user, acquiring current spatiotemporal data of a vehicle;
determining a target behavior pattern from an inertial behavior set already generated for the vehicle based on the current voice and the current spatiotemporal data, the inertial behavior set being generated by behavior modeling of time-aligned historical spatiotemporal data and historical voice dialogs on the vehicle;
and executing a corresponding response operation using the inertial behavior information in the target behavior pattern.
Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in the vehicle-based voice interaction method provided by any embodiments of the present invention.
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention can be implemented by software plus the necessary general-purpose hardware, and certainly also by hardware alone, although the former is the preferable implementation in many cases. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present invention.
It should be noted that, in the embodiment of the vehicle-based voice interaction device, the included units and modules are divided only according to functional logic but are not limited to this division, as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for convenience of distinguishing them from one another and are not intended to limit the protection scope of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.