Monitoring scheduling method, system, device and storage medium
1. A monitoring and scheduling method is characterized by comprising the following steps:
acquiring scheduling voice;
determining a scheduling text according to the trained recognition model and the scheduling voice;
performing word segmentation processing on the scheduling text, and determining a word segmentation result;
determining an operation instruction according to the word segmentation result;
and displaying the monitoring video according to the operation instruction.
2. The monitoring and scheduling method of claim 1, wherein the recognition model is obtained by:
acquiring monitoring information, wherein the monitoring information at least comprises a name of a monitoring lens, coordinates of a position where the monitoring is located, and a name of the position where the monitoring is located;
classifying the monitoring information and determining a conceptual model, wherein the conceptual model comprises localized address data;
and training the recognition model according to the conceptual model, and determining the trained recognition model.
3. The monitoring and scheduling method of claim 1, wherein the determining an operation instruction according to the word segmentation result comprises:
performing part-of-speech tagging on the vocabulary in the word segmentation result, wherein the parts of speech at least comprise: a skill noun, a place noun, and a referring-entity noun;
determining the operation instruction according to the word segmentation result and the part of speech, wherein the operation instruction comprises a scheduling field, an element field and a reference field;
the determining of the operation instruction specifically comprises:
determining the scheduling field according to the skill noun;
determining the element field according to the place noun;
and determining the reference field according to the referring-entity noun.
4. The monitoring and scheduling method of claim 3, further comprising:
storing the word segmentation results of a plurality of sentences into an information queue;
and if at least one of the scheduling field, the element field or the reference field is absent from a current operation instruction, completing the operation instruction according to the word segmentation results of the sentences in the information queue.
5. The monitoring and scheduling method of claim 3, further comprising:
determining a corpus according to the word segmentation result;
determining the word frequency of the part of speech according to the corpus;
determining a local corpus rule according to the corpus and the word frequency;
and performing word segmentation processing on the scheduling text according to the local corpus rule, and determining the word segmentation result.
6. The monitoring and scheduling method of claim 5, further comprising:
determining a cleaning library according to the corpus;
and correcting the scheduling text according to the cleaning library, and determining the cleaned scheduling text.
7. The monitoring scheduling method of claim 3, wherein the displaying the monitoring video according to the operation instruction further comprises:
acquiring a plurality of instruction evaluation modes;
and when a sentence pattern of the operation instruction matches one of the instruction evaluation modes, displaying the monitoring video according to the operation instruction.
8. A monitoring scheduling system, comprising:
the acquisition module is used for acquiring scheduling voice;
the voice recognition module is used for determining a scheduling text according to the trained recognition model and the scheduling voice;
the word segmentation processing module is used for performing sentence segmentation processing and word segmentation processing on the scheduling text and determining a word segmentation result;
the instruction generation module is used for determining an operation instruction according to the word segmentation result;
and the instruction execution module is used for executing the operation instruction and displaying the monitoring video.
9. An apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the monitoring scheduling method of any one of claims 1-7.
10. A storage medium having stored therein a processor-executable program, wherein the processor-executable program, when executed by a processor, is configured to implement the monitoring scheduling method of any one of claims 1-7.
Background
In order to meet the needs of national informatization and to accelerate the pace of government informatization, a series of government construction projects such as Safe City, the Sharp Eyes project and Smart City have been advanced in recent years, greatly accelerating the construction of video monitoring equipment and giving rise to a variety of video monitoring systems. Most of these monitoring systems still rely on manual keyboard-and-mouse operation to retrieve and schedule video monitoring content, and are inconvenient even when projecting to a large-screen video command scene.
For city-level on-demand scheduling of monitoring video at a construction scale of tens of thousands of channels, an operation can generally be completed only through the cooperation of several people, and monitoring resources can be scheduled quickly and accurately only by assigning professional operators who are familiar with the video monitoring resources. This causes considerable inconvenience to users' daily scheduling and command work and to demonstration and reporting work.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art. Therefore, the application provides a monitoring scheduling method, system, device and storage medium.
In a first aspect, an embodiment of the present application provides a monitoring and scheduling method, including: acquiring scheduling voice; determining a scheduling text according to the trained recognition model and the scheduling voice; performing word segmentation processing on the scheduling text, and determining a word segmentation result; determining an operation instruction according to the word segmentation result; and displaying the monitoring video according to the operation instruction.
Optionally, the recognition model is obtained by: acquiring monitoring information, wherein the monitoring information at least comprises a name of a monitoring lens, coordinates of a position where the monitoring is located, and a name of the position where the monitoring is located; classifying the monitoring information and determining a conceptual model, wherein the conceptual model comprises localized address data; and training the recognition model according to the conceptual model, and determining the trained recognition model.
Optionally, the determining an operation instruction according to the word segmentation result includes: performing part-of-speech tagging on the vocabulary in the word segmentation result, wherein the parts of speech at least comprise: a skill noun, a place noun, and a referring-entity noun; determining the operation instruction according to the word segmentation result and the parts of speech, wherein the operation instruction comprises a scheduling field, an element field and a reference field; the determining of the operation instruction specifically comprises: determining the scheduling field according to the skill noun; determining the element field according to the place noun; and determining the reference field according to the referring-entity noun.
Optionally, the method further comprises: storing the word segmentation results of a plurality of sentences into an information queue; and if at least one of the scheduling field, the element field or the reference field is absent from a current operation instruction, completing the operation instruction according to the word segmentation results of the sentences in the information queue.
Optionally, the method further comprises: determining a corpus according to the word segmentation result; determining the word frequency of the part of speech according to the corpus; determining a local corpus rule according to the corpus and the word frequency; and performing word segmentation processing on the scheduling text according to the local corpus rule, and determining the word segmentation result.
Optionally, the method further comprises: determining a cleaning library according to the corpus; and correcting the scheduling text according to the cleaning library, and determining the cleaned scheduling text.
Optionally, the displaying the monitoring video according to the operation instruction further includes: acquiring a plurality of instruction evaluation modes; and when a sentence pattern of the operation instruction matches one of the instruction evaluation modes, displaying the monitoring video according to the operation instruction.
In a second aspect, an embodiment of the present application provides a monitoring and scheduling system, including: the acquisition module is used for acquiring scheduling voice; the voice recognition module is used for determining a scheduling text according to the trained recognition model and the scheduling voice; the word segmentation processing module is used for performing sentence segmentation processing and word segmentation processing on the scheduling text and determining a word segmentation result; the instruction generation module is used for determining an operation instruction according to the word segmentation result; and the instruction execution module is used for executing the operation instruction and displaying the monitoring video.
In a third aspect, an embodiment of the present application provides an apparatus, including:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the monitoring scheduling method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a storage medium, in which a program executable by a processor is stored, and when the program executable by the processor is executed by the processor, the program is configured to implement the monitoring scheduling method according to the first aspect.
The beneficial effects of the embodiments of the present application are as follows: a scheduling voice is acquired; the scheduling voice is converted into a scheduling text according to the trained recognition model, and word segmentation processing is performed on the scheduling text to obtain a word segmentation result; different operation instructions are determined according to different word segmentation results; and the corresponding monitoring video is displayed according to the operation instruction. The embodiments of the present application provide a method for converting a user's voice into text and determining an operation instruction from the text, and the accuracy of the whole process of voice conversion and instruction generation is improved through word segmentation processing. A user can schedule monitoring video simply by speaking a scheduling command in natural language, which avoids the complex traditional operation of scheduling monitoring with a mouse, a keyboard and other equipment and greatly reduces the user's operating burden.
Drawings
The accompanying drawings are included to provide a further understanding of the claimed subject matter and are incorporated in and constitute a part of this specification, illustrate embodiments of the subject matter and together with the description serve to explain the principles of the subject matter and not to limit the subject matter.
Fig. 1 is a flowchart illustrating steps of a monitoring and scheduling method according to an embodiment of the present application;
Fig. 2 is a flowchart of the steps of obtaining a recognition model provided by an embodiment of the present application;
Fig. 3 is a flowchart illustrating steps of a word segmentation processing method provided by an embodiment of the present application;
Fig. 4 is a flowchart illustrating steps of establishing a local corpus rule provided by an embodiment of the present application;
Fig. 5 is a flowchart illustrating steps of establishing a cleaning library provided by an embodiment of the present application;
Fig. 6 is a flowchart illustrating steps of completing an operation instruction provided by an embodiment of the present application;
Fig. 7 is a schematic diagram of a monitoring and scheduling system provided by an embodiment of the present application;
Fig. 8 is a schematic diagram of an apparatus provided by an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that although functional block divisions are provided in the system drawings and logical orders are shown in the flowcharts, in some cases, the steps shown and described may be performed in different orders than the block divisions in the systems or in the flowcharts. The terms first, second and the like in the description and in the claims, and the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The embodiments of the present application will be further explained with reference to the drawings.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a monitoring and scheduling method provided by an embodiment of the present application, where the method includes, but is not limited to, steps S100-S500;
S100, acquiring scheduling voice;
Specifically, many emergencies, such as fire accidents or traffic accidents, occur in a city every day, and a user needs to quickly call up the monitoring video near the incident location from the city's numerous video monitoring feeds so as to quickly grasp the on-scene situation.
Specifically, the embodiment of the present application implements an automatic speech recognition (ASR) audio acquisition scheme based on WebRTC (Web Real-Time Communication) technology, and enables the pickup function of an external microphone through the getUserMedia() method within an HTTPS (Hypertext Transfer Protocol Secure) security domain, so as to acquire a real-time audio stream in three supported formats: mp3, wav and pcm. The embodiment of the present application adopts a wav voice stream with a duration of 30 seconds as the scheduling voice, which achieves a high degree of sound-quality fidelity and prevents speech recognition errors caused by distortion of the scheduling voice.
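For illustration only, the following Python sketch shows one way the 30-second wav constraint could be checked on the receiving side before recognition; the function name and the byte-stream interface are assumptions and are not part of the embodiment.

```python
import io
import wave

MAX_DURATION_S = 30  # the embodiment adopts a 30-second wav stream as the scheduling voice


def is_valid_scheduling_voice(wav_bytes: bytes) -> bool:
    """Return True if the bytes form a wav stream no longer than 30 seconds."""
    try:
        with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
            duration = wav.getnframes() / float(wav.getframerate())
    except wave.Error:
        # mp3 or raw pcm streams would need a separate decoding path
        return False
    return duration <= MAX_DURATION_S
```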
S200, determining a scheduling text according to the trained recognition model and the scheduling voice;
Specifically, in the embodiment of the present application, the speech recognition framework FDNN (Feedforward Deep Neural Network) is used to train the recognition model, and the trained recognition model is used to recognize the scheduling voice, so as to convert the scheduling voice into a scheduling text in natural language. It can be understood that, in terms of denoising the semantics of the scheduling text, adding a filtering step that analyzes the "impurities" of the Chinese text content helps to improve the conversion accuracy; the "impurities" representing semantic noise include, but are not limited to, punctuation marks, filler words and the like.
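A minimal sketch of this impurity filtering is given below under stated assumptions: the filler-word list is hypothetical, and the example operates on whitespace-delimited text for brevity, whereas the actual Chinese text would be filtered at the character or token level.

```python
import re

# Hypothetical filler-word list; the actual set of semantic "impurities" is not enumerated in the source.
FILLER_WORDS = {"um", "uh", "er"}


def filter_impurities(text: str) -> str:
    """Strip punctuation marks and filler words that act as semantic noise in the scheduling text."""
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation marks
    tokens = [t for t in text.split() if t.lower() not in FILLER_WORDS]
    return " ".join(tokens)
```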
Referring to fig. 2, fig. 2 is a flowchart illustrating steps of obtaining a recognition model according to an embodiment of the present application, where the method includes, but is not limited to, steps S210-S230;
S210, acquiring monitoring information, wherein the monitoring information at least comprises a name of a monitoring lens, coordinates of the position where the monitoring is located, and a name of the position where the monitoring is located;
Specifically, a large amount of monitoring information in a city is obtained; the monitoring information at least comprises a name of a monitoring lens, coordinates of the position where the monitoring is located, and a name of the position where the monitoring is located, and according to the obtained monitoring information, the specific place where a camera is located or its monitoring coverage can be determined. In addition, the monitoring information may also comprise structured data such as video marshalling plan information and the "one machine, one file" records of video monitoring lenses. "One machine, one file" refers to the dedicated file information of a single monitoring device, and the file information includes, but is not limited to, device codes, video catalogues, affiliated organizations, device types, application modes, installation pole numbers, control modes, access modes, transmission modes, point location types, picture definition, picture bit rates, construction units, maintenance units and other file information.
It can be understood that, since city video monitoring is built by different projects and different units, the monitoring lens name field, for example, had no uniform specification when it was first defined; that is, the monitoring information obtained in this step contains a large amount of redundant information, so data preprocessing needs to be performed on the monitoring information. Data preprocessing mainly refers to unifying the data formats and numerical units of the monitoring information and removing noise data. Illustratively, suppose there is a monitoring name "face 1" at the third-phase area A of the ecological garden at the junction of the white stone road and the west sand river, where "face 1" identifies that this lens is used for face recognition. Since the face recognition function has little relevance to the monitoring scheduling method proposed in the embodiment of the present application, this field is redundant noise data for this solution and needs to be removed in the preprocessing stage. In addition, Chinese numerals are uniformly converted into Arabic numerals, and data related to the road network is separated out independently, so that the converted monitoring name can be obtained: "ecological garden phase 3 area A, northwest". Data preprocessing also needs to be completed for the other monitoring information.
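The following Python sketch illustrates this kind of preprocessing under stated assumptions: the numeral mapping covers only a few Chinese numerals, and the noise pattern for face-recognition tags is hypothetical.

```python
import re

# Subset mapping from Chinese numerals to Arabic digits, for illustration only.
CN_DIGITS = {"一": "1", "二": "2", "三": "3", "四": "4", "五": "5"}

# Hypothetical patterns for fields that are noise for scheduling, e.g. face-recognition tags.
NOISE_PATTERNS = [r"face\s*\d+"]


def preprocess_monitoring_name(name: str) -> str:
    """Unify numeral formats and strip fields unrelated to monitoring scheduling."""
    for cn, digit in CN_DIGITS.items():
        name = name.replace(cn, digit)
    for pattern in NOISE_PATTERNS:
        name = re.sub(pattern, "", name, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", name).strip()
```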
S220, classifying the monitoring information, and determining a conceptual model, wherein the conceptual model comprises localized address data;
Specifically, according to the monitoring information acquired in step S210, a conceptual model is established for the vocabulary of the relevant service locations involved in the monitoring information, and this vocabulary is managed and expanded. The conceptual model includes localized address data, including but not limited to data on local institutions, banks, schools, hospitals, enterprises, rivers, communities, hotels and the road network; the address data are classified and the corresponding coordinates are recorded, and the standard WGS-84 coordinate system is used in the embodiment of the present application. The conceptual model containing the localized address data provides the basis for correlating the scheduling voice of the embodiment of the present application with the localized spatio-temporal scene.
It can be understood that, according to the localized address data in the conceptual model, the address information of the conceptual model can be further expanded, for example by supplementing information such as a certain road intersection, the north side of a certain road, or a waterlogging-prone point of the city, and by labeling the coordinates of these addresses accordingly.
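A minimal sketch of one record in such a conceptual model is shown below; the field names, the category label and the sample coordinates are illustrative assumptions rather than the actual data layout.

```python
from dataclasses import dataclass


@dataclass
class AddressEntry:
    """One localized address record in the conceptual model (field names are illustrative)."""
    name: str      # e.g. a bank, school, hospital, community or road-network element
    category: str  # classification of the address datum, e.g. "hospital", "road"
    lon: float     # longitude in the WGS-84 coordinate system
    lat: float     # latitude in the WGS-84 coordinate system


# Expanding the model with a supplementary address, e.g. a waterlogging-prone point (placeholder coordinates).
conceptual_model = [AddressEntry("city waterlogging-prone point 1", "road", 113.94, 22.53)]
```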
S230, training the recognition model according to the conceptual model, and determining the trained recognition model;
Specifically, the speech recognition framework FDNN is used to train the recognition model in the embodiment of the present application. At the initial stage of training, the THCHS-30 and AISHELL training data sets can be used, so that the recognition model can recognize basic language content. However, the monitoring scheduling scheme of the embodiment of the present application involves a large number of localized vocabularies, and in order to improve the accuracy with which the recognition model recognizes the scheduling voice, the large amount of localized address data in the conceptual model needs to be used to train the recognition model; the localized address data includes, but is not limited to, local organs, businesses, scenic spots, banking outlets, communities, schools, kindergartens, hotels, hospitals, enterprises, the road network, places, river channels and the like. The training of the recognition model is completed according to the localized address data in the conceptual model, and the trained recognition model is obtained.
Through steps S210 to S230, the embodiment of the present application completes the training of the recognition model with the localized address data in the conceptual model, thereby improving the accuracy with which the recognition model recognizes local place names.
The steps related to obtaining the recognition model are already described above, and the following steps begin with step S300 in fig. 1.
S300, performing word segmentation processing on the scheduling text, and determining a word segmentation result;
Specifically, in order to obtain effective scheduling information from the scheduling text in its natural language state, word segmentation processing needs to be performed on the scheduling text. The embodiment of the present application implements word segmentation, lexical analysis, syntactic analysis, text analysis, sentiment analysis and other functions on the scheduling text based on HanLP, a natural semantic analysis framework built on the HMM (Hidden Markov Model) algorithm and the naive Bayes algorithm from statistical methods. After word segmentation processing is performed on the scheduling text, a number of vocabulary sets are obtained, and these vocabulary sets are called the word segmentation result.
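A minimal sketch of this step, assuming the pyhanlp Python wrapper around the HanLP engine is available, is as follows; the sample sentence is only an illustration.

```python
from pyhanlp import HanLP  # assumes the pyhanlp wrapper around the HanLP engine is installed


def segment_scheduling_text(text: str):
    """Segment the scheduling text and return (word, part-of-speech) pairs."""
    return [(term.word, str(term.nature)) for term in HanLP.segment(text)]


# e.g. segment_scheduling_text("play the first route monitoring of the ecology park")
```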
S400, determining an operation instruction according to the word segmentation result;
Specifically, after word segmentation processing is performed on the scheduling text, an operation instruction is determined according to the word segmentation result.
Referring to fig. 3, fig. 3 is a flowchart illustrating steps of a word segmentation processing method according to an embodiment of the present application, where the method includes, but is not limited to, steps S410-S420;
S410, performing part-of-speech tagging on the vocabulary in the word segmentation result, wherein the parts of speech at least comprise: a skill noun, a place noun, and a referring-entity noun;
Specifically, according to the word segmentation result obtained after the word segmentation processing, the different words in the word segmentation result are labeled according to their part of speech. Referring to Table 1, Table 1 is the part-of-speech tagging classification table provided in the embodiment of the present application; the parts of speech mentioned in the embodiment of the present application at least include skill nouns, place nouns and referring-entity nouns. Skill nouns are generally words representing actions, such as "on demand", "play" and "locate"; place nouns are localized address data, such as a local bank, company or factory; referring-entity nouns refer to the entities controlled by skill nouns, such as "monitoring", "monitoring video" and "pan/tilt". Referring to Table 1, for example, if there is a converted scheduling text "play the first route monitoring of the ecology park", then in the word segmentation result after segmentation, "play" is tagged sk (skill noun), "ecology park" is tagged ns (place noun), "first route" is tagged m (numeral), and "monitoring" is tagged nks (referring-entity noun).
TABLE 1
S420, determining an operation instruction according to the word segmentation result and the part of speech, wherein the operation instruction comprises a scheduling field, an element field and a reference field;
Specifically, the word segmentation result labeled with parts of speech is obtained according to step S410, and the operation instruction is determined. The operation instruction comprises at least three fields, namely a scheduling field, an element field and a reference field. The scheduling field refers to a field indicating an operation concept with explicit directivity in the scheduling service scene; specifically, it is a concept expressed by a single verb or a group of words, it points to the element field, and it is used for scheduling the content corresponding to the element field. Illustratively, if the word segmentation result is "please play the video monitoring of the ecology technology park", the scheduling field of the resulting operation instruction is "play". The element field refers to the element concepts related to the business, such as enterprises, organizations, place names and attributes commonly used in the business. The element field can be composed of a single element or of a plurality of elements, and it is composed of words with part-of-speech tags such as m, nr, nkj and ns. Illustratively, if the word segmentation result is "please play the video monitoring of the ecology technology park", the resulting element field is "ecology technology park". The reference field refers to the entity scheduled by the scheduling field, and it is composed of words with the part-of-speech tag nks. Illustratively, if the word segmentation result is "please play the video monitoring of the ecology technology park", the reference field of the resulting operation instruction is "video monitoring". Therefore, from the word segmentation result "please play the video monitoring of the ecology technology park", the operation instruction "play", "ecology technology park", "video monitoring" can be obtained.
It can be understood that each field of the operation instruction corresponds to a different part of speech in the word segmentation result, and the determination method of the operation instruction specifically includes: determining the scheduling field according to the skill noun; determining the element field according to the place noun; and determining the reference field according to the referring-entity noun.
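A minimal Python sketch of this mapping is shown below; the tag names sk, ns, m and nks follow the classification described for Table 1, while the field names and the function itself are hypothetical.

```python
# Hypothetical mapping from part-of-speech tags to operation-instruction fields;
# the tag names sk, ns, m and nks follow the classification described for Table 1.
FIELD_BY_TAG = {
    "sk": "scheduling_field",   # skill noun            -> scheduling field
    "ns": "element_field",      # place noun            -> element field
    "m": "element_field",       # numeral               -> part of the element field
    "nks": "reference_field",   # referring-entity noun -> reference field
}


def build_operation_instruction(tagged_words):
    """Collect POS-tagged words from the segmentation result into the three instruction fields."""
    instruction = {"scheduling_field": [], "element_field": [], "reference_field": []}
    for word, tag in tagged_words:
        field = FIELD_BY_TAG.get(tag)
        if field:
            instruction[field].append(word)
    return instruction


# build_operation_instruction([("play", "sk"), ("ecology technology park", "ns"), ("video monitoring", "nks")])
# -> {'scheduling_field': ['play'], 'element_field': ['ecology technology park'],
#     'reference_field': ['video monitoring']}
```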
Through steps S410-S420, part-of-speech tagging is performed on the segmentation result, and an operation instruction is determined according to the segmentation result tagged with the part-of-speech.
In some embodiments, a plurality of instruction evaluation modes are further established in the embodiment of the present application; an instruction evaluation mode refers to the specific composition of an operation instruction corresponding to a specific scheduling scenario. As mentioned above, the operation instruction includes at least a scheduling field, an element field and a reference field, and when the pattern of the operation instruction matches one of the instruction evaluation modes, the operation instruction is executed according to the corresponding scheduling scenario. Referring to Table 2, Table 2 is the instruction evaluation mode table provided in the embodiment of the present application, where "___" indicates different element fields. Illustratively, if the operation instruction is "play", "ecology garden", "monitor", then, referring to Table 2, the operation instruction conforms to the instruction evaluation mode "open ___ monitor"; it therefore corresponds to the playing skill of the system and thus to the playing operation. It should be noted that, because the system executes different operation instructions in a certain logical order (for example, the corresponding video resource needs to be found before it can be played), priorities need to be set for the evaluation modes corresponding to each scheduling scenario, so that the system can operate in order of priority. For example, referring to Table 2, the range lookup has priority over the playing skill.
TABLE 2
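As a minimal sketch of such priority-ordered matching (Table 2 is not reproduced here, so the keywords and the specific ordering below are assumptions consistent with the example above):

```python
# Hypothetical evaluation modes ordered by priority (lower value = evaluated first),
# following the "range lookup has priority over the playing skill" ordering mentioned above.
EVALUATION_MODES = [
    {"priority": 1, "skill": "range lookup", "keywords": {"search", "look for", "locate"}},
    {"priority": 2, "skill": "play",         "keywords": {"play", "open"}},
]


def match_evaluation_mode(instruction: dict):
    """Return the highest-priority evaluation mode whose keyword appears in the scheduling field."""
    scheduling_words = set(instruction.get("scheduling_field", []))
    for mode in sorted(EVALUATION_MODES, key=lambda m: m["priority"]):
        if scheduling_words & mode["keywords"]:
            return mode
    return None
```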
In some embodiments, referring to fig. 4, fig. 4 is a flowchart illustrating steps of establishing local corpus rules according to an embodiment of the present application, where the method includes, but is not limited to, steps S430 to S460:
S430, determining a corpus according to the word segmentation result;
Specifically, the word segmentation results obtained in step S300 are collected into a corpus.
S440, determining word frequency of the part of speech according to the corpus;
Specifically, according to step S410, part-of-speech tagging is performed on the word segmentation result, and the word frequency corresponding to each part of speech is determined according to the different parts of speech in the word segmentation result. Word frequency refers to how often a word is used in the corpus and is used to evaluate how important a word is with respect to a document or a set of domain documents in the corpus. Illustratively, referring to Table 1, the word frequency of the skill vocabulary sk is 5000. The word frequencies corresponding to the other parts of speech are counted by analogy.
S450, determining a local corpus rule according to the corpus and the word frequency;
Specifically, after the corpus with the word frequencies of the words is obtained, a higher word frequency indicates that a word occurs more often in the current corpus, so the word frequency can be used as a factor for deciding segmentation priority. Illustratively, suppose the scheduling text is "play pool crossing monitoring": "play" is a valid word and the overlapping "drop" is also a valid word, but in a corpus oriented to monitoring scheduling the word frequency of "play" is much higher than that of "drop", so "play", having the higher word frequency, is given segmentation priority, and after segmentation of the scheduling text the obtained word segmentation result is "play / pool crossing / monitoring". The segmentation priority is deduced from the word frequencies of different words in the corpus, forming the local corpus rule.
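A minimal sketch of such a frequency-driven rule is shown below; the individual counts are illustrative assumptions (the source only states a frequency of 5000 for the sk category), and the English words stand in for the actual corpus entries.

```python
# Illustrative corpus frequencies; the actual counts are not given in the source.
WORD_FREQ = {"play": 5000, "drop": 12, "pool crossing": 300, "monitoring": 4200}


def pick_segmentation(candidates):
    """Among candidate segmentations (lists of words), prefer the one with the highest total frequency."""
    def score(words):
        return sum(WORD_FREQ.get(w, 0) for w in words)
    return max(candidates, key=score)


# pick_segmentation([["play", "pool crossing", "monitoring"], ["drop", "pool crossing", "monitoring"]])
# -> ["play", "pool crossing", "monitoring"]
```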
S460, performing word segmentation processing on the scheduling text according to the local corpus rule, and determining a word segmentation result;
Specifically, according to the local corpus rule determined in step S450, the HanLP natural language engine performs word segmentation processing on the scheduling text using the local corpus rule, which can effectively improve the accuracy of the word segmentation.
Through steps S430 to S460, the local corpus rule is determined by determining the word frequencies of different words in the corpus, and the word segmentation processing is performed on the scheduled text according to the local corpus rule, so as to improve the accuracy of the word segmentation processing.
In some embodiments, referring to fig. 5, fig. 5 is a flowchart illustrating steps of establishing a cleaning library according to an embodiment of the present application, where the method includes, but is not limited to, steps S470-S480:
S470, determining a cleaning library according to the corpus;
Specifically, words that are easily confused in the corpus are collected, and the cleaning library is determined. Confusable vocabulary mainly refers to two or more words that are easily confused under the influence of the user's accent, the recognition accuracy, the microphone pickup quality and the like, such as "dead end" being transcribed for "lens", "reaching" for "great road", and "blue mountain / lan mountain / man san" for "south mountain".
S480, correcting the scheduling text according to the cleaning library, and determining the cleaned scheduling text;
Specifically, after the cleaning library is determined, it is compared against the scheduling text obtained through voice conversion, the scheduling text is corrected, and the cleaned scheduling text is determined.
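A minimal sketch of this correction, treating the confusable pairs above as a hypothetical cleaning library, is as follows.

```python
# Hypothetical cleaning library: confusable transcriptions mapped to the intended vocabulary,
# using the example pairs mentioned above.
CLEANING_LIBRARY = {
    "dead end": "lens",
    "reaching": "great road",
    "blue mountain": "south mountain",
}


def correct_scheduling_text(text: str) -> str:
    """Replace confusable words in the converted scheduling text with their corrected forms."""
    for wrong, right in CLEANING_LIBRARY.items():
        text = text.replace(wrong, right)
    return text
```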
Through the steps S470-S480, the cleaning library is determined according to the confusable vocabulary, and the scheduled text is corrected according to the cleaning library, so that the accuracy of the scheduled text is improved, and the fault tolerance of the voice recognition is improved.
The determination of the operation instruction has been described above; step S500 in fig. 1 is described next.
S500, executing an operation instruction, and displaying a monitoring video;
Specifically, referring to the above steps S410-S420, each field of the operation instruction is determined according to the parts of speech in the word segmentation result, and the monitoring scheduling system can control the display of the specified monitoring video according to these fields. Illustratively, if the operation instruction is "play", "ecology technology park", "video monitoring", arranged in the order of scheduling field, element field and reference field, the monitoring scheduling system can determine from this instruction that it needs to search the video monitoring database for the video resource corresponding to the ecology technology park and display it on the display device. In addition, the embodiment of the present application can display both the monitoring video and the scheduling text converted from the scheduling voice, so that the user can confirm whether the scheduling voice has been correctly understood, which improves the interactive user experience.
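A minimal sketch of this dispatch step is given below; `video_db` and `display` are assumed interfaces standing in for the video monitoring database and the display device, and are not part of the embodiment.

```python
def execute_instruction(instruction: dict, video_db, display) -> None:
    """Sketch of step S500: look up the resource named by the element field and display it."""
    if "play" in instruction.get("scheduling_field", []):
        place = " ".join(instruction.get("element_field", []))
        resource = video_db.find_by_place(place)   # assumed lookup interface
        if resource is not None:
            display.show(resource, caption=place)  # assumed display interface
```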
It should be noted that, referring to the above content and to Table 2, it can be understood that the operation instruction mentioned in the embodiment of the present application can, in addition to scheduling the display of a specified monitoring feed, also be used to control other referring entities to meet various scheduling scene requirements. For example, when a monitoring device is connected to a pan/tilt head, the operation instruction can control the rotation of the head in 8 directions through an interface and issue instructions on rotation amplitude, wiper, rotation speed, image zooming and the like, and the machine instruction converted from the scheduling voice can call the corresponding monitoring device interface through a reserved interface layer.
Through steps S100 to S500, the embodiment of the present application acquires a scheduling voice, converts the scheduling voice into a scheduling text with the trained recognition model, performs word segmentation processing on the scheduling text to obtain a word segmentation result, determines an operation instruction according to the word segmentation result, and displays the specified monitoring video according to the operation instruction; the user can schedule the monitoring video simply by uttering a scheduling command in natural language.
In some embodiments, the monitoring and scheduling method provided in the embodiment of the present application further includes: the information queue is utilized to improve the accuracy of the operation instruction. Referring to fig. 6, fig. 6 is a flowchart illustrating steps of completing an operation command according to an embodiment of the present application, where the method includes, but is not limited to, steps S600 to S610:
S600, storing the word segmentation results of a plurality of sentences into an information queue;
Specifically, in a natural language environment, the instructions contained in scheduling speech are complex, time-varying instructions with strong correlation among sentences. This correlation is mainly reflected in the fact that the contextual subject or object is stated up front when speaking, and several preceding and following sentences often affect the meaning a person currently wants to express; that is, long-term correlation exists between consecutive sentences of speech. Therefore, splicing sentences acquires context information to a certain extent, which helps to improve the understanding of the meaning currently intended. However, because the window length of the FDNN input is fixed, the recognition model learns a fixed input-to-output mapping, so the FDNN's modeling of the long-term correlation in the timing information is weak. Therefore, the embodiment of the present application uses an information queue to improve the contextual understanding of sentences and to help improve the friendliness of human-computer interaction.
Specifically, in the embodiment of the present application, the several sentences of the word segmentation result are placed in an information queue, the context is understood and recorded according to the semantics, and a sentence is removed from the queue after more than a certain number of rounds of conversation.
S610, if the current operation instruction lacks a required field, completing the operation instruction according to the word segmentation results of the sentences in the information queue;
Specifically, the several sentences of the word segmentation result are placed in the information queue according to step S600, and if a complete operation instruction containing the scheduling field, the element field and the reference field cannot be obtained from the current sentence, the operation instruction is completed according to the context of the sentences in the information queue. Illustratively, take four sentences in the information queue as an example: "Locate the main gate of the ecological technology park. Search for surrounding video monitoring. Play the second video channel. Zoom in the video pan/tilt." The first sentence, "locate the main gate of the ecological technology park", contains the element field "ecological technology park"; the second sentence, "search for surrounding video monitoring", only contains the scheduling field "search" and the reference field "monitoring" and lacks an element field, so, connecting the context, the operation instruction is completed from the first sentence, yielding the complete operation instruction "search", "ecological technology park", "surrounding", "monitoring". According to this operation instruction, the monitoring scheduling system can query the video monitoring resources within a range of a given number of meters in diameter around the ecological technology park. In addition, the third sentence, "play the second video channel", contains the element field "the second channel"; combined with the first and second sentences, the monitoring scheduling system can query the monitoring resources of the second channel of the ecological technology park.
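A minimal sketch of this completion mechanism, using a bounded queue of earlier instructions and the hypothetical field names from the earlier sketches, is shown below; the queue length is an illustrative assumption.

```python
from collections import deque

MAX_ROUNDS = 4  # illustrative: a sentence is dropped from the queue after several rounds of conversation

history = deque(maxlen=MAX_ROUNDS)  # information queue holding the instructions of earlier sentences


def complete_instruction(instruction: dict) -> dict:
    """Fill any missing field of the current instruction from earlier sentences in the queue."""
    for past in reversed(history):  # prefer the most recent context
        for field in ("scheduling_field", "element_field", "reference_field"):
            if not instruction.get(field) and past.get(field):
                instruction[field] = past[field]
    history.append(instruction)
    return instruction
```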
Through steps S600-S610, the word segmentation results are stored in the information queue, and the operation instruction is completed according to the contextual understanding of several sentences in the word segmentation results. This understanding of the preceding and following sentences effectively improves the accuracy of the operation instruction and facilitates switching among the monitoring video channels.
In summary, the embodiment of the present application provides a monitoring scheduling method, which obtains a scheduling text by converting the user's scheduling voice, performs word segmentation processing on the scheduling text, determines an operation instruction according to the word segmentation result, and schedules and displays the corresponding monitoring video according to the operation instruction. The embodiment of the present application provides a method for completing monitoring scheduling by voice: first, it frees the user from the traditional complex operation of scheduling monitoring with a mouse, a keyboard and other equipment, greatly reducing the user's operating burden; second, the accuracy of the whole process of voice conversion and instruction generation is improved through word segmentation processing; third, the expansibility of the monitoring scheduling system is effectively improved through the establishment of language resources such as the conceptual model and the corpus; finally, through the use of the information queue, the establishment of the local corpus rule, the establishment of the cleaning library and other steps, the accuracy and fault tolerance of speech recognition and word segmentation are effectively improved, so that the accuracy of voice-scheduled monitoring can meet practical operational requirements.
Referring to fig. 7, fig. 7 is a schematic diagram of a monitoring and scheduling system according to an embodiment of the present application, where the system 700 includes an obtaining module 710, a speech recognition module 720, a word segmentation processing module 730, an instruction generating module 740, and an instruction executing module 750, where the obtaining module is configured to obtain a scheduling speech; the voice recognition module is used for determining a scheduling text according to the trained recognition model and the scheduling voice; the word segmentation processing module is used for performing sentence segmentation processing and word segmentation processing on the scheduling text and determining a word segmentation result; the instruction generation module is used for determining an operation instruction according to the word segmentation result; and the instruction execution module is used for executing the operation instruction and displaying the monitoring video.
Referring to fig. 8, fig. 8 is an apparatus provided in some embodiments of the present application, the apparatus 800 comprising at least one processor 810 and further comprising at least one memory 820 for storing at least one program; in fig. 8, a processor and a memory are taken as an example.
The processor and memory may be connected by a bus or other means, such as by a bus in FIG. 8.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Another embodiment of the present application also provides an apparatus that may be used to perform the monitoring scheduling method of any of the embodiments above, for example, performing the method steps of fig. 1 described above.
The above-described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
The embodiment of the application also discloses a computer storage medium, in which a program executable by a processor is stored, wherein the program executable by the processor is used for implementing the monitoring scheduling method provided by the application when being executed by the processor.
One of ordinary skill in the art will appreciate that all or some of the steps, systems and methods disclosed above may be implemented as software, firmware, hardware and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media as known to those skilled in the art.
While the preferred embodiments of the present invention have been described, the present invention is not limited to the above embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and such equivalent modifications or substitutions are included in the scope of the present invention defined by the claims.