Method and apparatus for training a text processing model, electronic device, and storage medium
1. A method for training a text processing model, the method comprising:
constructing an initial neural network model, wherein the initial neural network model comprises a first text processing model and a second text processing model, the first text processing model comprises a text feature extraction module and a result prediction module which are connected in series, the second text processing model is connected with the output of the text feature extraction module, and the second text processing model comprises at least one of a mask language model or a named entity recognition model;
acquiring a first training data set corresponding to the first text processing model and a second training data set corresponding to the second text processing model, wherein the first training data set and the second training data set belong to different fields;
and training the initial neural network model based on the first training data set and the second training data set until a preset training end condition is met, and taking the first text processing model obtained when training ends as a final text processing model.
2. The method of claim 1, wherein the second text processing model comprises a mask language model and a named entity recognition model respectively connected to an output of the text feature extraction module, and wherein the second training data set comprises a third training data set corresponding to the mask language model and a fourth training data set corresponding to the named entity recognition model.
3. The method according to claim 1 or 2, wherein the training the initial neural network model based on the first training data set and the second training data set until a preset training end condition is met comprises:
performing joint alternating training on the first text processing model and the second text processing model based on the first training data set and the second training data set until a loss function corresponding to the first text processing model converges, wherein the training end condition comprises the convergence of the loss function corresponding to the first text processing model;
for each training, determining a loss function value corresponding to a model branch based on a training data set corresponding to the trained model branch and a text processing result of the training data set obtained through the model branch, and adjusting model parameters of the model branch based on the loss function value corresponding to the model branch, wherein the model branch is the first text processing model or the second text processing model.
4. The method of claim 3, wherein the second text processing model comprises a mask language model and a named entity recognition model respectively connected to the output of the text feature extraction module, and wherein the second training data set comprises a third training data set corresponding to the mask language model and a fourth training data set corresponding to the named entity recognition model;
the performing joint alternating training on the first text processing model and the second text processing model based on the first training data set and the second training data set until the loss function corresponding to the first text processing model converges comprises:
performing joint alternating training on the first text processing model, the mask language model and the named entity recognition model based on the first training data set, the third training data set and the fourth training data set until the loss function corresponding to the first text processing model converges;
wherein the model branch is any one of the first text processing model, the mask language model, or the named entity recognition model.
5. The method according to claim 4, wherein, for the mask language model or the named entity recognition model, determining, for each training, the value of the loss function corresponding to the model branch based on the training data set corresponding to the trained model branch and the text processing result of the training data set obtained through the model branch comprises:
inputting each training sample of the training data set corresponding to the model branch to the text feature extraction module to obtain the text feature of each sample;
inputting the text features of each sample into the model branch to obtain the text processing result of each sample;
and determining the value of the loss function corresponding to the model branch based on the sample label of each sample and the text processing result of each sample.
6. The method of claim 1, wherein the second text processing model comprises a mask language model, and wherein obtaining a second training data set corresponding to the second text processing model comprises:
acquiring first texts belonging to a first field;
for each first text, masking at least one character in the first text to obtain a second text;
and taking each first text and a second text corresponding to the first text as a training sample corresponding to the mask language model in the second training data set.
7. The method of claim 1, wherein the second text processing model comprises a named entity recognition model, and wherein obtaining a second training data set corresponding to the second text processing model comprises:
acquiring a third text belonging to a second field;
for each third text, labeling at least one entity in the third text to obtain a fourth text;
and taking each third text and a fourth text corresponding to the third text as a training sample of the second training data set.
8. An apparatus for training a text processing model, the apparatus comprising:
a model construction module, configured to construct an initial neural network model, wherein the initial neural network model comprises a first text processing model and a second text processing model, the first text processing model comprises a text feature extraction module and a result prediction module which are connected in series, the second text processing model is connected with the output of the text feature extraction module, and the second text processing model comprises at least one of a mask language model or a named entity recognition model;
a data obtaining module, configured to obtain a first training data set corresponding to the first text processing model and a second training data set corresponding to the second text processing model, where the first training data set and the second training data set belong to different fields;
and a model training module, configured to train the initial neural network model based on the first training data set and the second training data set until a preset training end condition is met, and take the first text processing model obtained when training ends as a final text processing model.
9. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory;
one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, the one or more computer programs configured to perform the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium is for storing a computer program which, when run on a processor, causes the processor to perform the method of any of claims 1-7.
Background
Text processing is a technology commonly used in natural language processing in the industry at present: text data is processed by a text processing model according to the application scenario and the actual processing task. Text processing may include text matching, text classification, text generation, and the like. In the prior art, the processing performance of a trained text processing model is largely limited to the field to which its training data belongs, and when text in an extended field is processed, the text processing effect is usually not ideal.
Disclosure of Invention
The embodiments of the present application provide a method and apparatus for training a text processing model, an electronic device, and a storage medium.
In order to achieve the above purpose, the embodiments of the present application provide the following specific technical solutions:
in one aspect, an embodiment of the present application provides a method for training a text processing model, where the method includes:
constructing an initial neural network model, wherein the initial neural network model comprises a first text processing model and a second text processing model, the first text processing model comprises a text feature extraction module and a result prediction module which are connected in series, the second text processing model is connected with the output of the text feature extraction module, and the second text processing model comprises at least one of a mask language model or a named entity recognition model;
acquiring a first training data set corresponding to a first text processing model and a second training data set corresponding to a second text processing model, wherein the first training data set and the second training data set belong to different fields;
and training the initial neural network model based on the first training data set and the second training data set until a preset training end condition is met, and taking the first text processing model obtained when training ends as a final text processing model.
In another aspect, an embodiment of the present application further provides a text processing method, the method comprising:
acquiring a text to be processed;
inputting a text to be processed into a text processing model to obtain a processing result;
and performing corresponding processing based on the processing result;
wherein the text processing model is trained based on the method of any one of the implementation manners of the first aspect.
An embodiment of the present application further provides an apparatus for training a text processing model, the apparatus comprising:
a model construction module, configured to construct an initial neural network model, wherein the initial neural network model comprises a first text processing model and a second text processing model, the first text processing model comprises a text feature extraction module and a result prediction module which are connected in series, the second text processing model is connected with the output of the text feature extraction module, and the second text processing model comprises at least one of a mask language model or a named entity recognition model;
a data acquisition module, configured to acquire a first training data set corresponding to the first text processing model and a second training data set corresponding to the second text processing model, wherein the first training data set and the second training data set belong to different fields;
and a model training module, configured to train the initial neural network model based on the first training data set and the second training data set until a preset training end condition is met, and take the first text processing model obtained when training ends as a final text processing model.
An embodiment of the present application further provides a text processing apparatus, the apparatus comprising:
a text acquisition module, configured to acquire a text to be processed;
a model processing module, configured to input the text to be processed into a text processing model to obtain a processing result;
a result processing module, configured to perform corresponding processing based on the processing result;
wherein the text processing model is trained based on the method of any one of the implementation manners of the first aspect.
An embodiment of the present application further provides an electronic device, comprising: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, the one or more computer programs being configured to perform the method described in the first aspect or the second aspect of the present application.
An embodiment of the present application further provides a computer-readable storage medium for storing a computer program which, when executed on a processor, enables the processor to perform the method described in the first aspect or the second aspect of the present application.
An embodiment of the present application further provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method provided by the above training method of a text processing model, the above text processing method, or any of their optional implementations.
The technical solutions provided by the present application bring the following beneficial effects:
the application provides a training method and a device of a text processing model, electronic equipment and a storage medium, wherein the training method of the text processing model comprises the following steps: and performing joint training on the first text processing model and the second text processing model by adopting a first training data set and a second training data set in different fields until a preset training ending condition is met, thereby obtaining a text processing model meeting requirements. The auxiliary training of the first text processing model can be realized through the combined training, and the accuracy of the trained text processing model for processing data in different fields is improved. Moreover, because the second text processing model is at least one of a mask language model or a named entity recognition model, the training sample does not need to expand sentence pairs and corresponding labels in the field, the training cost of the model is not increased, and the field expansion without increasing the cost can be realized, so that the problems of difficult field migration and low data processing accuracy when the model processes data in the expanded field are solved, and the use effect of the text processing model in the expanded field is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a method for training a text processing model according to an embodiment of the present disclosure;
FIG. 2 is a diagram illustrating a training process of a second text processing model according to an embodiment of the present application;
fig. 3 is a schematic diagram illustrating joint alternating training of a first text processing model and a second text processing model according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a text processing method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an interface for an accumulation fund query according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of data processing of a question-answering model provided in an embodiment of the present application;
FIG. 7 is a schematic interface diagram of a question-answer model provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a training apparatus for a text processing model according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any elements and all combinations of one or more of the associated listed items.
The embodiments of the present application address the problems in the prior art that a text processing model is limited to the field of its training data set and that, when text data in an extended field is processed, field migration is difficult and the processing effect is poor. According to the training method of the text processing model provided herein, a first training data set and a second training data set from different fields are used to jointly train the first text processing model and the second text processing model until a preset training end condition is met, thereby obtaining a text processing model that meets requirements. The joint training provides auxiliary training for the first text processing model and improves the accuracy with which the trained text processing model processes data from different fields. Moreover, because the second text processing model is a mask language model or a named entity recognition model, the training samples do not require sentence pairs and corresponding labels from the extended field, so the training cost of the model is not increased and field extension is achieved at no additional cost. This alleviates the problems of difficult field migration and low data processing accuracy when the model processes data from an extended field, and improves the usability of the text processing model in the extended field.
The present application further provides a text processing method based on the model obtained by the training method provided in the embodiments of the present application; with this method, corresponding processing of to-be-processed text in the model's extended field can be achieved with higher data processing accuracy.
The scheme provided by each optional embodiment of the application relates to the fields of artificial intelligence, cloud technology, big data and the like in the computer technology.
The model training method and the text processing method in the embodiment of the application can be realized through machine learning in an artificial intelligence technology.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. The artificial intelligence technologies involved in the embodiments of the present application mainly include natural language processing, machine learning/deep learning, and the like.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics; research in this field involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
The data processing related to the embodiment of the application can be realized by a cloud technology, and the data computing related to the data processing can be realized by cloud computing in the cloud technology.
Cloud computing is a computing model that distributes computing tasks over a resource pool formed by a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". To the user, the resources in the "cloud" appear to be infinitely expandable, available at any time, obtained on demand, and paid for according to use.
As a basic capability provider of cloud computing, a cloud computing resource pool (referred to as an IaaS (Infrastructure as a Service) platform for short) is established, and multiple types of virtual resources are deployed in the resource pool for external clients to use selectively.
According to logical function division, a PaaS (Platform as a Service) layer can be deployed on the IaaS (Infrastructure as a Service) layer, and a SaaS (Software as a Service) layer can be deployed on the PaaS layer; SaaS can also be deployed directly on IaaS. PaaS is a platform on which software runs, such as a database or a web container. SaaS is the various kinds of business software, such as web portals and bulk SMS services. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.
In a narrow sense, cloud computing refers to a delivery and use mode of IT infrastructure: obtaining required resources through the network in an on-demand, easily extensible manner. In a generalized sense, cloud computing refers to a delivery and use mode of services: obtaining required services through the network in an on-demand, easily extensible manner. Such services may be IT and software or internet related, or other services. Cloud computing is a product of the development and fusion of traditional computing and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, and load balancing.
With the diversified development of the internet, real-time data streams, and connected devices, and driven by demands such as search services, social networks, mobile commerce, and open collaboration, cloud computing has developed rapidly. Unlike earlier parallel distributed computing, the emergence of cloud computing will, in concept, drive revolutionary change in the entire internet model and enterprise management model.
The training data required for model training in the embodiment of the present application may be big data acquired from the internet.
Big data refers to data sets that cannot be captured, managed, and processed with conventional software tools within a certain time range; it is a massive, high-growth, diversified information asset that requires new processing modes to yield stronger decision-making power, insight, and process optimization capability. With the advent of the cloud era, big data has attracted more and more attention, and it requires special techniques to effectively process large amounts of data within a tolerable elapsed time. Technologies suited to big data include massively parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the internet, and scalable storage systems.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
The execution subject of the technical solution of the present application is a computer device, including but not limited to a server, a personal computer, a notebook computer, a tablet computer, a smart phone, and the like. Computer devices include user equipment and network devices. User equipment includes but is not limited to computers, smart phones, tablets, etc.; network devices include but are not limited to a single network server, a server group consisting of multiple network servers, or a cloud consisting of a large number of computers or network servers for cloud computing, where cloud computing is a kind of distributed computing in which a super virtual computer is composed of a group of loosely coupled computers. The computer device can run independently to implement the present application, or can access a network and implement the present application through interaction with other computer devices in the network. The network in which the computer device is located includes but is not limited to the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, etc.
An embodiment of the present application provides a method for training a text processing model, where an execution subject of the method may be any electronic device, and as shown in fig. 1, the method may include:
step S101, constructing an initial neural network model;
the initial neural network model comprises a first text processing model and a second text processing model, the first text processing model comprises a text feature extraction module and a result prediction module which are connected in series, the second text processing model is connected with the output of the text feature extraction module, and the second text processing model comprises at least one of a mask language model or a named entity recognition model.
The text feature extraction module is configured to perform feature extraction on an input text and may be any neural network model having a text feature extraction function, for example, a transformer-based Bidirectional Encoder Representations from Transformers (BERT) model, and the like. The result prediction module is configured to perform result prediction based on the text features output by the text feature extraction module and may be any neural network model with a result prediction function; taking text similarity prediction as an example, the result prediction module may be a siamese network or the like. The mask language model (Masked Language Model) is used for predicting the masked part of partially masked text data to obtain a predicted value of the masked part. The Named Entity Recognition (NER) model is used for recognizing named entities in the input text to obtain predicted values for the named entities. Named entities may include, but are not limited to, names of people, organizations, places, and all other entities identified by names, as well as numbers, dates, currencies, addresses, and the like.
In the embodiment of the present application, the second text processing model connected to the output of the text feature extraction module assists the training of the text feature extraction module, which can improve the encoding modeling capability of the finally trained text processing model.
Step S102, a first training data set corresponding to a first text processing model and a second training data set corresponding to a second text processing model are obtained;
wherein the first training data set and the second training data set belong to different fields. The first training data set is the training data set corresponding to the first text processing model, used for training the text feature extraction module and the result prediction module. The second training data set is the training data set corresponding to the second text processing model, used for training at least one of the mask language model or the named entity recognition model. Optionally, the first training data set may be data of an original preset field, and the second training data set may be data of an extended field.
Training the mask language model or the named entity recognition model with a second training data set from a field different from that of the first training data set helps improve the encoding modeling capability of the finally trained text processing model for text in the extended field, thereby improving its text processing effect in that field. Moreover, because the second text processing model is at least one of a mask language model or a named entity recognition model, the training samples do not require sentence pairs and corresponding labels from the extended field, so the training cost of the model is not increased and field extension is achieved at no additional cost. This alleviates the problems of difficult field migration and low data processing accuracy when the model processes data from an extended field, and improves the usability of the text processing model in the extended field.
Step S103, training the initial neural network model based on the first training data set and the second training data set until a preset training end condition is met, and taking the first text processing model at the end of training as a final text processing model.
In the embodiments of the present application, an initial neural network model comprising a first text processing model and a second text processing model is constructed, and the two models are jointly trained using the first training data set together with a second training data set composed of unlabeled data from a field different from that of the first training data set, thereby obtaining a text processing model that meets requirements. The joint training provides auxiliary training for the first text processing model and improves the accuracy with which the trained text processing model processes data from different fields. Moreover, because unlabeled data from a different field is used, the training cost of the model is not increased, field extension is achieved at no additional cost, and the usability of the text processing model in the extended field is improved.
The finally obtained text processing model in the embodiments of the present application can be applied to scenarios such as text matching, text classification, and text generation. Correspondingly, the text processing model is any one of a text matching model, a text classification model, or a text generation model, and a corresponding output result is obtained according to the specific type of the text processing model to perform the corresponding processing, such as text matching, text classification, or text generation.
In practical applications, the text matching model is a model that handles the text matching task. Text matching computes the semantic similarity between two texts and, through semantic similarity calculation, judges whether a text pair matches. Text matching can be applied to a large number of natural language processing tasks such as information retrieval, question answering systems, paraphrase identification, dialogue systems, and the like. These natural language processing tasks can to a large extent be abstracted as text matching problems; for example, information retrieval can be reduced to matching a search term against document resources, a question answering system to matching a question against candidate answers, paraphrase identification to matching two synonymous sentences, and a dialogue system to matching a preceding utterance against a reply.
A text classification model is a model that handles the text classification task. Text classification means that a computer automatically classifies and labels a text data set according to a certain classification system or standard. A text classification model is a model that determines the relationship between document features and text categories from a labeled training text data set, and it can be used to judge the category of unlabeled text.
A text generation model is a model that generates new text data based on input text data. Based on text statistics, a text generation model can learn combination rules among different texts from a large amount of text and then infer a likely combination as output from the input.
In addition, the specific structures of the text matching model, the text classification model and the text generation model can be constructed according to specific needs, and can be any neural network model capable of realizing corresponding functions, which is not limited in the application.
In one possible implementation, the second text processing model includes a mask language model and a named entity recognition model respectively connected to the output of the text feature extraction module, and the second training data set includes a third training data set corresponding to the mask language model and a fourth training data set corresponding to the named entity recognition model.
In practical application, the second text processing model may include two models, a mask language model and a named entity recognition model; correspondingly, the second training data set includes a third training data set and a fourth training data set corresponding to the mask language model and the named entity recognition model respectively, and during model training each of the two models is trained with its corresponding training data set. In the embodiment of the present application, the mask language model and the named entity recognition model assist the training of the text feature extraction module, so the encoding modeling capability of the finally trained text processing model can be improved in two respects, masked-text prediction and named entity recognition, further improving the data processing effect of the text processing model.
In a possible implementation manner, training the initial neural network model based on the first training data set and the second training data set until a preset training end condition is satisfied includes:
performing joint alternating training on the first text processing model and the second text processing model based on the first training data set and the second training data set until a loss function corresponding to the first text processing model converges, wherein the training end condition comprises convergence of the loss function corresponding to the first text processing model;
for each training, determining the value of the loss function corresponding to the model branch based on the training data set corresponding to the trained model branch and the text processing result of the training data set obtained through the model branch, and adjusting the model parameters of the model branch based on the value of the loss function corresponding to the model branch, wherein the model branch is the first text processing model or the second text processing model.
In practical application, when joint alternating training is performed on the first text processing model and the second text processing model, the second text processing model is connected to the output of the text feature extraction module to assist its training, and the order of the alternating training may be as follows: train the first text processing model once based on the first training data set, determine the value of a first loss function based on the first training data set and the text processing result of the first training data set obtained through the first text processing model, and adjust the model parameters of the first text processing model if the first loss function has not converged; then train the second text processing model once based on the second training data set, determine the value of a second loss function based on the second training data set and the text processing result of the second training data set obtained through the second text processing model, and adjust the model parameters of the second text processing model if the second loss function has not converged. These training steps are repeated until the first loss function converges; at that point training may stop, and model training ends, even if the second loss function has not converged. The first loss function may be a cross-entropy loss function. Conversely, if the second loss function converges first while the first loss function has not, the above joint alternating training process continues until the first loss function converges and model training is complete.
In the embodiment of the present application, the first training data set and the second training data set belong to different fields. Through the joint alternating training of the first text processing model and the second text processing model, the second text processing model assists the training of the text feature extraction module in the first text processing model, which can improve the encoding modeling capability of the finally trained text processing model in the field to which the second training data set belongs, thereby extending the model's field of application.
In one possible implementation, the second text processing model includes a mask language model and a named entity recognition model respectively connected to the output of the text feature extraction module, and the second training data set includes a third training data set corresponding to the mask language model and a fourth training data set corresponding to the named entity recognition model;
performing joint alternating training on the first text processing model and the second text processing model based on the first training data set and the second training data set until the loss function corresponding to the first text processing model converges, comprising:
and performing joint alternating training on the first text processing model, the mask language model and the named entity recognition model based on the first training data set, the third training data set and the fourth training data set until the loss function corresponding to the first text processing model converges.
In practical applications, if the second text processing model includes a mask language model and a named entity recognition model, the model branch is any one of the first text processing model, the mask language model, or the named entity recognition model. The first text processing model, the mask language model, and the named entity recognition model may be jointly and alternately trained as follows: train the first text processing model once based on the first training data set to determine the value of a first loss function; train the mask language model once based on the third training data set to determine the value of a third loss function; and train the named entity recognition model once based on the fourth training data set to determine the value of a fourth loss function. This is repeated for multiple rounds until the first loss function converges; training may then stop, and model training ends, even if the third and fourth loss functions have not converged. Conversely, if the third and fourth loss functions converge first while the first loss function has not, the above joint alternating training process continues until the first loss function converges and model training is complete.
In a possible implementation manner, for a mask language model or a named entity recognition model, for each training, determining a value of a loss function corresponding to a model branch based on a training data set corresponding to the trained model branch and a text processing result of the training data set obtained through the model branch includes:
inputting each training sample of the training data set corresponding to the model branch to a text feature extraction module to obtain the text feature of each sample;
inputting the text features of each sample into the model branch to obtain the text processing result of each sample;
and determining the value of the loss function corresponding to the model branch based on the sample label of each sample and the text processing result of each sample.
In practical application, when a model branch is trained, for the mask language model or the named entity recognition model, the corresponding training data set includes the training samples corresponding to that model branch and the sample label of each sample. The mask language model and the named entity recognition model are each connected to the output of the text feature extraction module: each training sample corresponding to the model branch is input to the text feature extraction module to obtain the text features of each sample; the text features of each sample are input into the model branch to obtain the text processing result of each sample; and, based on the sample label and the text processing result of each sample, the value of the loss function corresponding to the model branch can be calculated, that is, the value of the third loss function corresponding to the mask language model or the value of the fourth loss function corresponding to the named entity recognition model.
In one possible implementation, the second text processing model includes a mask language model, and obtaining a second training data set corresponding to the second text processing model includes:
acquiring first texts belonging to a first field;
for each first text, masking at least one character in the first text to obtain a second text;
and taking each first text and the second text corresponding to the first text as a training sample corresponding to the mask language model in the second training data set.
In practical application, the first field may be the field of the text to be processed by the trained text processing model, that is, the extended field of the model, which is a field different from that of the first training data set; obtaining a training data set in this field can enhance the encoding modeling capability of the trained text processing model in this field. Each training sample may consist of a first text and a second text obtained by masking characters in the first text. The first text may be a sentence in the first field.
In one example, the first text is sentence A1; 15% of the characters in sentence A1 are masked, the masked sentence A1 is used as the second text, and the unmasked sentence A1 together with the masked sentence A1 forms one training sample of the mask language model.
In one possible implementation, the second text processing model includes a named entity recognition model, and obtaining a second training data set corresponding to the second text processing model includes:
acquiring a third text belonging to a second field;
for each third text, labeling at least one entity in the third text to obtain a fourth text;
and taking each third text and a fourth text corresponding to the third text as a training sample of the second training data set.
In practical application, in order to enhance the encoding modeling capability of the trained text processing model for named entities in extended-field text, training samples for the named entity model may be obtained in the extended field. Specifically, a third text may be obtained, where the third text may be text in a second field different from the first field, or text in the first field; the third text may specifically be a sentence. Entities in the sentence are labeled to obtain a labeled sentence as a fourth text, and the third text and the fourth text are used as one training sample of the named entity recognition model. Optionally, when entity labeling is performed on the third text, the labeled entities may cover three categories: nouns in the sentence, proper nouns of the first field, and others.
The following describes the training process of the second text processing model in the present embodiment in detail by using a specific embodiment. The embodiment is only one embodiment of the technical solution of the present application, and does not represent all implementation manners of the technical solution of the present application.
As shown in fig. 2, in this embodiment, the second text processing model includes a mask language model or a named entity recognition model, and the text feature extraction module is a BERT model. The sample input and training process of the second text processing model is described below:
the BERT model comprises an input layer, an encoding layer and an output layer, and the training samples of the second text processing model are input into the input layer of the BERT model, and the training samples can be texts in the text processing model extension field. The embodiment takes the training MASK language model as an example, where each training sample is a sentence, as shown in the figure, "an example of presentation", where CLS and SEP are flag bits in the sentence, and are used to segment each sentence, and MASK "one" in the sentence, and the MASK flag bits represent the position of the masked character in the sentence. The input layer performs initial feature extraction on the input training sample, specifically, the initial feature of the text respectively acquires word features, segment features and position features through a word feature embedding (token embedding) layer, a segment feature embedding (segment embedding) layer and a position feature embedding (position embedding) layer in the input layer, splicing the features of the three dimensions to obtain an initialized feature vector, inputting the initialized feature vector into a coding layer (such as BERT shown in the figure) of a BERT model to obtain sample features after coding, outputting the sample features after coding to a mask language model for processing, and calculating a loss function corresponding to the mask language model, performing auxiliary training on the BERT model through the language model, the encoding modeling capacity of the text processing model in the expansion field can be improved, and the data processing effect of the text processing model in the expansion field is further improved.
The following describes in detail, with a specific embodiment, the process of performing joint alternating training on the first text processing model and the second text processing model in the technical solution of the present application. This embodiment is only one implementation of the technical solution of the present application and does not represent all implementations thereof.
As shown in fig. 3, in this embodiment the second text processing model includes a mask language model and a named entity recognition model, the text feature extraction module is a BERT model, and the result prediction module is a similarity prediction module (cosine-sim as shown in the figure). For ease of description: when the mask language model is trained, "sentence A" shown in the figure is used as a sample in the training set of the mask language model, consisting of sentence A and a masked sentence A obtained by masking characters in sentence A; when the named entity recognition model is trained, "sentence A" shown in the figure is used as a sample in the training set of the named entity recognition model, consisting of sentence A and a labeled sentence A obtained by labeling the named entities in sentence A. When the first text processing model is trained, sentence A and sentence B together with the similarity label of sentence A and sentence B are used as a training sample. The specific process of the joint alternating training of the first text processing model and the second text processing model is as follows:
First, the mask language model is trained once with its corresponding training sample, and a mask loss function is determined from sentence A and the prediction result for the masked sentence A output by the mask language model. Then, the named entity recognition model is trained once with its corresponding training sample, and a named entity loss function is determined from sentence A and the named entity prediction result for sentence A output by the named entity recognition model. Finally, the BERT model and the similarity prediction module are trained once based on the training sample corresponding to the first text processing model: sentence A and sentence B are each input into the BERT model for feature extraction; processing sentence A yields its multi-dimensional features, which are pooled to obtain the feature vector U corresponding to sentence A, and sentence B is processed in the same way to obtain the feature vector V corresponding to sentence B; the similarity of the vectors U and V is predicted by the similarity prediction module, and a similarity loss function is calculated from the similarity label and the predicted similarity of sentences A and B. This training process is repeated, jointly and alternately training the mask language model, the named entity recognition model, the BERT model, and the similarity prediction module, until the similarity loss function converges and the training of the text processing model is complete.
In an alternative embodiment, the relevant parameters of the software and hardware environment of the training method for the text processing model provided in the embodiment of the present application are shown in table 1:
TABLE 1
Operating system    Memory    Language environment
Linux               >16G      Python/C++
The electronic device executing the method for training a text processing model provided in the embodiments of the present application may be a server; the relevant parameters of the software and hardware environment are shown in Table 1: the operating system may be a Linux system, the memory is >16G, and the language environment may be Python/C++.
In the embodiment of the present application, through the joint alternating training of the first text processing model and the second text processing model, the second text processing model assists the training of the text feature extraction module in the first text processing model, so the encoding modeling capability of the finally trained text processing model in the extended field can be improved, extending the model's field of application.
According to the training method of the text processing model provided in the embodiments of the present application, a first training data set and a second training data set from different fields are used to jointly train the first text processing model and the second text processing model until a preset training end condition is met, thereby obtaining a text processing model that meets requirements. The joint training provides auxiliary training for the first text processing model and improves the accuracy with which the trained text processing model processes data from different fields. Moreover, because the second text processing model is at least one of a mask language model or a named entity recognition model, the training samples do not require sentence pairs and corresponding labels from the extended field, so the training cost of the model is not increased and field extension is achieved at no additional cost. This alleviates the problems of difficult field migration and low data processing accuracy when the model processes data from an extended field, and improves the usability of the text processing model in the extended field.
An embodiment of the present application provides a text processing method, where an execution subject of the method may be any electronic device, for example, the method may be executed by a server, as shown in fig. 4, and the method may include:
step S201, acquiring a text to be processed;
the source of the text to be processed is not limited in this embodiment, and may be any text in each text pre-stored in the database, or any text in a plurality of texts received from each user device.
It is understood that the above text to be processed may be different for different application scenarios.
In one possible implementation, the text to be processed comprises text of a domain to which the second training data set belongs.
In practical application, the field to which the second training data set belongs may be the extended field of the text processing model, and the text to be processed may be text in that extended field. It can be understood that the text to be processed may also be text in the model's original processing field, that is, the field to which the first training data set belongs. Because the first text processing model and the second text processing model were trained with training samples from different fields, the encoding modeling capability of the text processing model for the extended field is improved.
Step S202, inputting a text to be processed into a text processing model to obtain a processing result;
specifically, the text processing model is the first text processing model obtained at the end of the training provided in the embodiment of the application, and includes a text feature extraction module and a result prediction module. The text to be processed is input into the text processing model, which performs text feature extraction and result prediction to obtain the processing result output by the model.
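For concreteness, a minimal sketch of this serial structure is given below, assuming PyTorch; the class and module names are illustrative stand-ins, not the application's actual implementation.

```python
import torch.nn as nn

class FirstTextProcessingModel(nn.Module):
    """Text feature extraction module and result prediction module in series."""
    def __init__(self, encoder: nn.Module, predictor: nn.Module):
        super().__init__()
        self.encoder = encoder      # text feature extraction module
        self.predictor = predictor  # result prediction module

    def forward(self, text_inputs):
        # during training, the second text processing model is also
        # attached to these shared features
        features = self.encoder(text_inputs)
        return self.predictor(features)  # processing result
```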
Step S203, based on the processing result, corresponding processing is carried out;
the text processing model may be any one of a text matching model, a text classification model, or a text generation model. Based on the specific type of the text processing model, the corresponding output result is obtained and corresponding processing, such as text matching, text classification, or text generation, is performed.
In a possible implementation manner, the text processing model is a text matching model, and the text to be processed comprises a query text of a requester and a plurality of candidate query results corresponding to the query text;
inputting the text to be processed into a text processing model, comprising:
inputting the query text and each candidate query result into a text matching model to obtain a first matching degree of the query text and each candidate query result, wherein the processing result comprises the first matching degree;
based on the processing result, corresponding processing is carried out, and the processing comprises the following steps:
and determining a target query result from the candidate query results based on the first matching degrees, and providing the target query result to the requester.
In practical application, the text processing model may be a text matching model, and the text to be processed includes the query text of the requester and a plurality of candidate query results corresponding to the query text; the matching degree between the query text and the candidate query results may be calculated based on the text matching model to determine the target query result. Optionally, the candidate query results whose matching degrees fall within a preset range may be used as target query results, or the single query result with the highest matching degree may be used as the target query result and provided to the requester.
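Both selection strategies can be expressed compactly; the sketch below is illustrative, and the threshold parameter is a hypothetical stand-in for the preset range mentioned above.

```python
# Illustrative selection of the target query result from first matching degrees.
def select_target_results(candidates, scores, threshold=None):
    """threshold is a hypothetical stand-in for the preset range."""
    if threshold is not None:
        # all candidates whose matching degree falls within the preset range
        return [c for c, s in zip(candidates, scores) if s >= threshold]
    # otherwise, the single candidate with the highest matching degree
    best = max(range(len(scores)), key=scores.__getitem__)
    return [candidates[best]]
```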
In an example, as shown in fig. 5, in an application scenario of an accumulation fund query, the user terminal receives the query text "accumulation fund" input by a user through a search box in an accumulation fund query interface, and obtains a plurality of candidate query results corresponding to the query text from an accumulation fund query database. The query text "accumulation fund" and the candidate query results are input into the text matching model, which determines the matching degree between "accumulation fund" and each candidate query result; a target query result is then determined from the candidate query results according to the matching degrees and provided to the requester. As shown in the figure, the finally obtained target query results are the texts corresponding to the categories "accumulation fund query", "accumulation fund service", and "accumulation fund - articles", which are provided to the requester through a display interface of the user terminal.
In a possible implementation manner, the text processing model comprises a question-answer model, and the text to be processed comprises a question text of a question asking party and a plurality of candidate answers corresponding to the question text;
inputting the text to be processed into a text processing model, comprising:
inputting the questioning text and each candidate answer into a questioning and answering model to obtain a second matching degree of the questioning text and each candidate answer, wherein the processing result comprises the second matching degree;
based on the processing result, corresponding processing is carried out, and the processing comprises the following steps:
and determining a target answer from the candidate answers based on the second matching degrees, and providing the target answer to the questioner.
In practical application, the text processing model may be a question-answer model, the text to be processed includes a question text of a questioner and a plurality of candidate answers corresponding to the question text, and the matching degree between the question text and the plurality of candidate answers may be calculated based on the question-answer model to determine the target answer. Optionally, the candidate answer with the matching degree within the preset range may be used as the target answer, or one candidate answer with the highest matching degree may be used as the target answer and provided to the questioner.
In one example, as shown in fig. 6, a question text input by a user through a user terminal is received, a plurality of candidate answers corresponding to the question text are retrieved from a search library, and the question text and the candidate answers are input into the question-answer model. The matching degree between the question text and each candidate answer is calculated (the text matching calculation shown in the figure), the candidate answers are ranked according to the matching degrees (the matching result ranking shown in the figure), and the candidate answers ranked in the top preset positions are provided to the questioner. Alternatively, a target answer may be a text in the form of a question, and the corresponding answer is determined according to a selection instruction input by the user for that text. Alternatively, a target answer may also be a text in the form of an answer corresponding to the question text, that is, the answer to the question is directly provided to the user.
In yet another example, as shown in fig. 7, a question text input by a user through a user terminal is received, the content of the question text being "why am I not getting popular". A plurality of candidate answers corresponding to the question text are retrieved from a search library and input, together with the question text, into the question-answer model; the matching degree between the question text and each candidate answer is calculated, the candidate answers are ranked according to the matching degrees, and the candidate answers ranked in the top 5 are taken as target answers, shown in the figure as "Why can't I see the 30-second short video I synchronized to the friend circle?", "How do I remove attention?", "Can the account be logged off?", "How do I turn off the watermark?", and "Why did my playback fail?". A selection instruction given by the user for one of the target answers is received, the answer to the question corresponding to the selection instruction is determined, and the answer is provided to the questioner. Among them, the prompts "You may want to ask the following questions" and "None of the above" are default texts pre-configured in the question-answer model; they are provided to the user directly and do not participate in the similarity calculation.
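The ranking flow of this example can be sketched as follows; qa_model is a hypothetical callable returning one matching degree per question-answer pair, and the default prompt strings are assumptions mirroring the figure rather than text fixed by the application.

```python
# Sketch of the fig. 7 flow: score, rank, keep top 5, add unscored default texts.
def rank_answers(qa_model, question, candidates, top_k=5):
    scored = [(ans, qa_model(question, ans)) for ans in candidates]  # matching degrees
    scored.sort(key=lambda pair: pair[1], reverse=True)              # rank by degree
    top = [ans for ans, _ in scored[:top_k]]
    # pre-configured default texts: shown directly, never scored
    return (["You may want to ask the following questions"]
            + top + ["None of the above"])
```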
In the text processing method provided by the embodiment of the application, the text processing model is obtained by jointly training the first text processing model and the second text processing model with the first training data set and the second training data set, which belong to different fields. The joint training provides auxiliary training for the first text processing model and improves the accuracy with which the trained text processing model processes data in different fields. Moreover, because the second text processing model is at least one of a mask language model or a named entity recognition model, the training samples do not need to be expanded with sentence pairs and corresponding labels in the extended field, so the training cost of the model is not increased and field expansion is achieved at no extra cost. This alleviates the difficulty of field migration and the low data processing accuracy that arise when a model processes data in an extended field, and improves the usage effect of the text processing model in the extended field.
Based on the same principle as the method shown in fig. 1, an embodiment of the present disclosure further provides a training apparatus 30 for a text processing model. As shown in fig. 8, the training apparatus 30 for a text processing model includes:
the model construction module 31 is configured to construct an initial neural network model, where the initial neural network model includes a first text processing model and a second text processing model, where the first text processing model includes a text feature extraction module and a result prediction module that are cascaded, the second text processing model is connected to an output of the text feature extraction module, and the second text processing model includes at least one of a mask language model or a named entity recognition model;
a data obtaining module 32, configured to obtain a first training data set corresponding to a first text processing model and a second training data set corresponding to a second text processing model, where the first training data set and the second training data set belong to different fields;
and the model training module 33 is configured to train the initial neural network model based on the first training data set and the second training data set until a preset training end condition is met, and use the first text processing model at the end of training as a final text processing model.
In one possible implementation, the second text processing model includes a mask language model and a named entity recognition model respectively connected to the output of the text feature extraction module, and the second training data set includes a third training data set corresponding to the mask language model and a fourth training data set corresponding to the named entity recognition model.
In a possible implementation manner, the model training module 33 is specifically configured to:
performing joint alternate training on the first text processing model and the second text processing model based on the first training data set and the second training data set until a loss function corresponding to the first text processing model converges, wherein the training end condition comprises the loss function convergence corresponding to the first text processing model;
for each training, determining a value of a loss function corresponding to a model branch based on a training data set corresponding to the trained model branch and a text processing result of the training data set obtained through the model branch, and adjusting a model parameter of the model branch based on the value of the loss function corresponding to the model branch, wherein the model branch is a first text processing model or a second text processing model.
In one possible implementation, the second text processing model includes a mask language model and a named entity recognition model respectively connected to the output of the text feature extraction module, and the second training data set includes a third training data set corresponding to the mask language model and a fourth training data set corresponding to the named entity recognition model;
the model training module 33, when performing joint alternating training on the first text processing model and the second text processing model based on the first training data set and the second training data set until the loss function corresponding to the first text processing model converges, is configured to:
performing joint alternate training on the first text processing model, the mask language model and the named entity recognition model based on the first training data set, the third training data set and the fourth training data set until a loss function corresponding to the first text processing model is converged;
wherein the model branch is any one of a first text processing model, a mask language model or a named entity recognition model.
In a possible implementation manner, for the mask language model or the named entity recognition model, for each training, the model training module 33, when determining the value of the loss function corresponding to the model branch based on the training data set corresponding to the trained model branch and the text processing result of the training data set obtained through the model branch, is configured to:
inputting each training sample of the training data set corresponding to the model branch to a text feature extraction module to obtain the text feature of each sample;
inputting the text characteristics of each sample into the model branches to obtain the text processing result of each sample;
and determining the value of the loss function corresponding to the model branch based on the sample label of each sample and the text processing result of each sample.
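The three steps above can be sketched as a single helper, assuming PyTorch; the cross-entropy loss is an assumed placeholder, as the application does not fix a concrete loss function.

```python
import torch.nn.functional as F

# Per-branch loss for one training pass (loss choice is an assumption).
def compute_branch_loss(encoder, branch, samples, labels):
    features = encoder(samples)  # step 1: text features of each sample
    outputs = branch(features)   # step 2: text processing result of each sample
    return F.cross_entropy(outputs, labels)  # step 3: loss from labels vs. outputs
```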
In one possible implementation, the second text processing model includes a mask language model, and the data obtaining module 32, when obtaining the second training data set corresponding to the second text processing model, is configured to:
acquiring first texts belonging to a first field;
for each first text, masking at least one character in the first text to obtain a second text;
and taking each first text and the second text corresponding to the first text as a training sample corresponding to the mask language model in the second training data set.
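A minimal sketch of this sample construction follows; the [MASK] token and the 15% masking rate are assumptions borrowed from common masked-language-model practice, not requirements of this application.

```python
import random

# Mask at least one character of each first text to obtain the second text.
def build_mlm_samples(first_texts, mask_token="[MASK]", mask_rate=0.15):
    samples = []
    for first in first_texts:
        if not first:
            continue  # skip empty texts
        chars = list(first)
        n_mask = max(1, int(len(chars) * mask_rate))  # at least one character
        for i in random.sample(range(len(chars)), n_mask):
            chars[i] = mask_token
        samples.append(("".join(chars), first))  # (second text, first text as label)
    return samples
```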
In one possible implementation, the second text processing model includes a named entity recognition model, and the data obtaining module 32, when obtaining a second training data set corresponding to the second text processing model, is configured to:
acquiring a third text belonging to a second field;
for each third text, labeling at least one entity in the third text to obtain a fourth text;
and taking each third text and the fourth text corresponding to the third text as a training sample corresponding to the named entity recognition model in the second training data set.
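As an illustration, the sketch below labels entities by dictionary matching with BIO tags; both the lexicon-based matching and the tag scheme are assumptions for the sketch, since the application does not specify how entities are labeled.

```python
# Label at least one entity in each third text to obtain the fourth text.
def build_ner_samples(third_texts, entity_lexicon):
    samples = []
    for third in third_texts:
        tags = ["O"] * len(third)
        for entity in entity_lexicon:
            start = third.find(entity)  # first occurrence only, for brevity
            if start >= 0:
                tags[start] = "B-ENT"
                for i in range(start + 1, start + len(entity)):
                    tags[i] = "I-ENT"
        samples.append((third, tags))   # (third text, labels as fourth text)
    return samples
```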
The training apparatus for the text processing model according to the embodiment of the present disclosure may execute the training method for the text processing model corresponding to fig. 1 provided in the embodiment of the present disclosure, and the implementation principles are similar. The actions executed by each module in the training apparatus correspond to the steps in the training method for the text processing model; for a detailed functional description of each module, reference may be made to the description of the corresponding training method shown above, and details are not repeated here.
The application provides a training apparatus for a text processing model, which jointly trains the first text processing model and the second text processing model with the first training data set and the second training data set, which belong to different fields, until a preset training end condition is met, thereby obtaining a text processing model that meets the requirements. The joint training provides auxiliary training for the first text processing model and improves the accuracy with which the trained text processing model processes data in different fields. Moreover, because the second text processing model is at least one of a mask language model or a named entity recognition model, the training samples do not need to be expanded with sentence pairs and corresponding labels in the extended field, so the training cost of the model is not increased and field expansion is achieved at no extra cost. This alleviates the difficulty of field migration and the low data processing accuracy that arise when a model processes data in an extended field, and improves the usage effect of the text processing model in the extended field.
Based on the same principle as the method shown in fig. 4, an embodiment of the present disclosure further provides a text processing apparatus 40. As shown in fig. 9, the text processing apparatus 40 includes:
a text obtaining module 41, configured to obtain a text to be processed;
the model processing module 42 is configured to input the text to be processed into the text processing model to obtain a processing result;
a result processing module 43, configured to perform corresponding processing based on a processing result;
the text processing model is obtained by training based on the method in the embodiment corresponding to fig. 1 of the present application.
In one possible implementation, the text to be processed comprises text of a domain to which the second training data set belongs.
In a possible implementation manner, the text processing model is a text matching model, and the text to be processed comprises a query text of a requester and a plurality of candidate query results corresponding to the query text;
the model processing module 42 is specifically configured to:
inputting the query text and each candidate query result into a text matching model to obtain a first matching degree of the query text and each candidate query result, wherein the processing result comprises the first matching degree;
the result processing module 43 is specifically configured to:
and determining a target query result from the candidate query results based on the first matching degrees, and providing the target query result to the requester.
In a possible implementation manner, the text processing model comprises a question-answer model, and the text to be processed comprises a question text of a question asking party and a plurality of candidate answers corresponding to the question text;
the model processing module 42 is specifically configured to:
inputting the questioning text and each candidate answer into a questioning and answering model to obtain a second matching degree of the questioning text and each candidate answer, wherein the processing result comprises the second matching degree;
the result processing module 43 is specifically configured to:
and determining a target answer from the candidate answers based on the second matching degrees, and providing the target answer to the questioner.
The text processing apparatus according to the embodiment of the present disclosure may execute the text processing method corresponding to fig. 4 provided in the embodiment of the present disclosure, and the implementation principles are similar. The actions executed by the modules in the text processing apparatus correspond to the steps in the text processing method; for a detailed functional description of the modules, reference may be made to the description of the corresponding text processing method shown above, and details are not repeated here.
In the text processing apparatus provided in the embodiment of the application, the text processing model is obtained by jointly training the first text processing model and the second text processing model with the first training data set and the second training data set, which belong to different fields. The joint training provides auxiliary training for the first text processing model and improves the accuracy with which the trained text processing model processes data in different fields. Moreover, because the second text processing model is at least one of a mask language model or a named entity recognition model, the training samples do not need to be expanded with sentence pairs and corresponding labels in the extended field, so the training cost of the model is not increased and field expansion is achieved at no extra cost. This alleviates the difficulty of field migration and the low data processing accuracy that arise when a model processes data in an extended field, and improves the usage effect of the text processing model in the extended field.
The training apparatus for the text processing model or the text processing apparatus may be a computer program (including program code) running in a computer device; for example, the training apparatus for the text processing model or the text processing apparatus may be application software. The apparatus may be used to execute the corresponding steps in the training method for the text processing model or in the text processing method provided by the embodiments of the application.
In some embodiments, the training apparatus for the text processing model or the text processing apparatus provided in the embodiments of the present invention may be implemented by a combination of hardware and software. By way of example, the apparatus may be a processor in the form of a hardware decoding processor that is programmed to execute the training method for the text processing model or the text processing method provided in the embodiments of the present invention; for example, the processor in the form of a hardware decoding processor may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
In other embodiments, the training apparatus for the text processing model or the text processing apparatus provided in the embodiments of the present invention may be implemented in software. Fig. 8 and fig. 9 respectively show the training apparatus for the text processing model and the text processing apparatus stored in the memory, which may be software in the form of programs, plug-ins, and the like, comprising a series of modules. The training apparatus 30 for the text processing model includes the model construction module 31, the data obtaining module 32, and the model training module 33, and is used to implement the training method for the text processing model provided in the embodiments of the present invention; the text processing apparatus 40 includes the text obtaining module 41, the model processing module 42, and the result processing module 43, and is used to implement the text processing method provided in the embodiments of the present invention.
The above embodiments introduce the training apparatus for the text processing model and the text processing apparatus from the perspective of virtual modules; the following introduces an electronic device from the perspective of entity modules, specifically as follows:
an embodiment of the present application provides an electronic device. As shown in fig. 10, the electronic device 8000 includes a processor 8001 and a memory 8003, where the processor 8001 is coupled to the memory 8003, for example via a bus 8002. Optionally, the electronic device 8000 may also include a transceiver 8004. In practical applications, the number of transceivers 8004 is not limited to one, and the structure of the electronic device 8000 does not constitute a limitation on the embodiment of the present application.
The processor 8001 may be a CPU, a general-purpose processor, a GPU, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 8001 may also be a combination that implements computing functionality, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 8002 may include a path to transfer information between the aforementioned components. The bus 8002 may be a PCI bus or an EISA bus, etc. The bus 8002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 10, but this is not intended to represent only one bus or type of bus.
The memory 8003 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disk storage (including compact disks, laser disks, digital versatile disks, blu-ray disks, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 8003 is used for storing application program codes for executing the scheme of the present application, and the execution is controlled by the processor 8001. Processor 8001 is configured to execute application program code stored in memory 8003 to implement what is shown in any of the foregoing method embodiments.
An embodiment of the present application provides an electronic device, comprising: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors. The one or more programs, when executed by the processors, cause the electronic device to: construct an initial neural network model, the initial neural network model comprising a first text processing model and a second text processing model, wherein the first text processing model comprises a text feature extraction module and a result prediction module connected in series, the second text processing model is connected to an output of the text feature extraction module, and the second text processing model comprises at least one of a mask language model or a named entity recognition model; acquire a first training data set corresponding to the first text processing model and a second training data set corresponding to the second text processing model, wherein the first training data set and the second training data set belong to different fields; and train the initial neural network model based on the first training data set and the second training data set until a preset training end condition is met, taking the first text processing model at the end of training as the final text processing model.
The present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program runs on a processor, the processor can execute the corresponding content in the foregoing method embodiments.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the various alternative implementations of the training method for the text processing model or the text processing method described above.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict order limitation on the performance of these steps, and they may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that, for those skilled in the art, various modifications and refinements can be made without departing from the principle of the present invention, and such modifications and refinements shall also fall within the protection scope of the present invention.