Semantic engineering platform and construction method thereof
1. A method for constructing a semantic engineering platform, characterized by comprising the following steps:
extracting, from a preset data set, the semantic engineering problems that arise in the technical steps of human-like problem solving, wherein the data set is a set of elementary mathematics word problems stated in natural language;
performing corpus and semantic analysis on the extracted semantic engineering problems to obtain semantic representation instance data, wherein the semantic representation instance data comprises corpus instances of data element variables, semantic instances of numerical relationships among the variables, and semantic instances of logical relationships among the variables;
and constructing a problem-oriented semantic engineering platform from the obtained semantic representation instance data.
2. The method of claim 1, wherein performing corpus and semantic analysis on the extracted semantic engineering problems to obtain semantic representation instance data comprises:
identifying data elements in the semantic engineering problem, and representing the identified data elements by data element variables, wherein the data elements comprise numeric strings representing numerical quantities and Chinese character strings that ask for a quantity;
extracting the context word segments in which the data elements occur and the word stream obtained after semantic preprocessing, and constructing, from the obtained context word segments and word stream, a clause frame library and a semantic feature library corresponding to the data element variables;
identifying explicit contextual numerical relationships in the semantic engineering problem, wherein an explicit contextual numerical relationship is a numerical relationship among data element variables that is stated explicitly in the context, and constructing a semantic map library corresponding to the data element variables according to the obtained explicit contextual numerical relationships;
identifying the logical relationships of the data element variables in the semantic engineering problem, wherein a logical relationship of the data element variables is a direct operational relationship among the variables, and constructing the semantic map library corresponding to the data element variables according to the obtained logical relationships;
wherein the clause frame library and the semantic feature library form the corpus instances of the data element variables;
the clause frame library and the semantic map library form the semantic instances of the numerical relationships among the variables;
and the clause frame library, the semantic feature library and the semantic map library form the semantic instances of the logical relationships among the variables.
3. The method of claim 2, wherein identifying data elements in the semantic engineering problem and representing the identified data elements with data element variables comprises:
performing word segmentation on the extracted semantic engineering problem and, after word segmentation, sequentially identifying the data elements in the semantic engineering problem through quantity-word identification, concept attribute identification, concept relationship identification, reference identification, time segmentation identification and/or common-sense relationship identification;
and selecting, according to a preset variable label set, a core word set and a modifier set for the variables related to the current scene unit frame, so as to represent the identified data elements.
4. The method of claim 2, wherein the semantic map library consists of a plurality of closed semantic circles, each semantic circle comprising a plurality of semantic edges, each semantic edge corresponding to a mathematical formula.
5. The method of claim 4, wherein the method further comprises:
configuring data files for the variable labels, context word segments, semantic circles, semantic edges and mathematical formulas of the variables according to preset data rules.
6. The method of claim 4, wherein the method further comprises:
when a repeatedly named variable is detected, correcting the name according to the corresponding variable names in the related semantic circle and the semantic feature library, or adding a scene modifier to remove the duplicate name.
7. The method of claim 4, wherein the method further comprises:
matching the numerical operation relationships of the semantic edges, and supplementing the numerical relationships among variables of the same type so as to extend analogous relationships to similar models.
8. The method of any one of claims 1-7, further comprising:
analyzing and calculating an elementary mathematics word problem to be solved, and outputting the result, based on the semantic engineering platform.
9. The method according to claim 8, wherein analyzing and calculating the elementary mathematics word problem to be solved and outputting the result based on the semantic engineering platform specifically comprises:
identifying the data element variables, the explicit contextual numerical relationships and the logical relationships of the data element variables in the elementary mathematics word problem to be solved;
matching semantic representation instance data according to the identified data element variables, explicit contextual numerical relationships and logical relationships of the data element variables, and generating a dynamic semantic circle corresponding to the elementary mathematics word problem to be solved, so as to describe the one-to-one correspondence between the data element variables and the corresponding formula variables;
and performing a simulated solving calculation based on the dynamic semantic circle.
10. A semantic engineering platform constructed using the method of any one of claims 1-7.
Background
At present, existing techniques for solving natural-language elementary mathematics word problems use machine learning algorithms such as support vector machines for semantic recognition. They produce a result directly, without human-like intermediate steps, and the results are unsatisfactory. For human-like machine solving of natural-language mathematics word problems, semantic feature sparsity is a particularly prominent difficulty in semantic processing, because the forms of expression in natural language are almost unlimited. This makes many data-driven machine learning algorithms ill-suited to matching natural-language models. Addressing the difficulty requires, first, a new algorithm adapted to the sparsity of semantic representations and, second, a semantic engineering platform on which corpus and semantic representation data can be accumulated.
Therefore, how to construct a semantic engineering platform that accumulates corpus and semantic representation data is of great significance for human-like machine solving of natural-language mathematics word problems.
Disclosure of Invention
In view of the above technical problems, the invention provides a semantic engineering platform and a construction method thereof, which address the practical problem of human-like solving of mathematics word problems, accumulate problem-specific semantics and corpus data, construct a problem-oriented semantic engineering platform, and provide a data basis for genuinely human-like solving.
The invention provides a method for constructing a semantic engineering platform, comprising the following steps:
extracting, from a preset data set, the semantic engineering problems that arise in the technical steps of human-like problem solving, wherein the data set is a set of elementary mathematics word problems stated in natural language;
performing corpus and semantic analysis on the extracted semantic engineering problems to obtain semantic representation instance data, wherein the semantic representation instance data comprises corpus instances of data element variables, semantic instances of numerical relationships among the variables, and semantic instances of logical relationships among the variables;
and constructing a problem-oriented semantic engineering platform from the obtained semantic representation instance data.
Further, performing corpus and semantic analysis on the extracted semantic engineering problems to obtain semantic representation instance data comprises:
identifying data elements in the semantic engineering problem, and representing the identified data elements by data element variables, wherein the data elements comprise numeric strings representing numerical quantities and Chinese character strings that ask for a quantity;
extracting the context word segments in which the data elements occur and the word stream obtained after semantic preprocessing, and constructing, from the obtained context word segments and word stream, a clause frame library and a semantic feature library corresponding to the data element variables;
identifying explicit contextual numerical relationships in the semantic engineering problem, wherein an explicit contextual numerical relationship is a numerical relationship among data element variables that is stated explicitly in the context, and constructing a semantic map library corresponding to the data element variables according to the obtained explicit contextual numerical relationships;
identifying the logical relationships of the data element variables in the semantic engineering problem, wherein a logical relationship of the data element variables is a direct operational relationship among the variables, and constructing the semantic map library corresponding to the data element variables according to the obtained logical relationships;
wherein the clause frame library and the semantic feature library form the corpus instances of the data element variables;
the clause frame library and the semantic map library form the semantic instances of the numerical relationships among the variables;
and the clause frame library, the semantic feature library and the semantic map library form the semantic instances of the logical relationships among the variables.
Further, identifying data elements in the semantic engineering problem and representing the identified data elements by data element variables comprises:
performing word segmentation on the extracted semantic engineering problem and, after word segmentation, sequentially identifying the data elements in the semantic engineering problem through quantity-word identification, concept attribute identification, concept relationship identification, reference identification, time segmentation identification and/or common-sense relationship identification;
and selecting, according to a preset variable label set, a core word set and a modifier set for the variables related to the current scene unit frame, so as to represent the identified data elements.
Further, the semantic map library consists of a plurality of closed semantic circles, each semantic circle comprises a plurality of semantic edges, and each semantic edge corresponds to a mathematical formula.
Further, the method also comprises:
configuring data files for the variable labels, context word segments, semantic circles, semantic edges and mathematical formulas of the variables according to preset data rules.
Further, the method also comprises:
when a repeatedly named variable is detected, correcting the name according to the corresponding variable names in the related semantic circle and the semantic feature library, or adding a scene modifier to remove the duplicate name.
Further, the method also comprises:
matching the numerical operation relationships of the semantic edges, and supplementing the numerical relationships among variables of the same type so as to extend analogous relationships to similar models.
Further, the method also comprises:
analyzing and calculating an elementary mathematics word problem to be solved, and outputting the result, based on the semantic engineering platform.
Further, analyzing and calculating the elementary mathematics word problem to be solved and outputting the result based on the semantic engineering platform specifically comprises:
identifying the data element variables, the explicit contextual numerical relationships and the logical relationships of the data element variables in the elementary mathematics word problem to be solved;
matching semantic representation instance data according to the identified data element variables, explicit contextual numerical relationships and logical relationships of the data element variables, and generating a dynamic semantic circle corresponding to the elementary mathematics word problem to be solved, so as to describe the one-to-one correspondence between the data element variables and the corresponding formula variables;
and performing a simulated solving calculation based on the dynamic semantic circle.
The invention also provides a semantic engineering platform which is constructed by adopting the method.
In the semantic engineering platform and construction method provided by the invention, semantic representation instance data is obtained by performing corpus and semantic analysis on the semantic engineering problems extracted from natural-language elementary mathematics word problems, and a problem-oriented semantic engineering platform is constructed on the basis of that data. The practical problem of human-like solving of mathematics word problems can thus be addressed, problem-specific semantics and corpus data are accumulated, and a data basis is provided for genuinely human-like solving.
Furthermore, the semantic engineering platform constructed by the invention can carry out the human-like solving process with interpretable intermediate steps, and has significant application value in online education and in assisting student learning.
Drawings
FIG. 1 is a flow chart of a method of construction of a semantic engineering platform of the present invention;
FIG. 2 is a flowchart illustrating a specific implementation of step S12 in the method for constructing a semantic engineering platform according to the present invention;
FIG. 3 is a schematic diagram of the semantic problem analysis of mathematics word problems proposed by the present invention;
FIG. 4 is a semantic circle diagram involved in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Semantic representation spans many dimensions. To accumulate semantic representation data, the semantic engineering problems must first be determined, and semantic representation instance data is then accumulated along the dimension of those problems; afterwards, natural language processing methods are studied so that the semantic engineering problems can be solved automatically. The invention takes the human-like solving of natural-language elementary mathematics word problems as its research object, makes concrete the semantic engineering problems faced in the technical steps of such solving, builds a semantic engineering platform for those problems, accumulates semantic representation data and, when appropriate, processes that data automatically by algorithm.
Fig. 1 is a schematic flow chart illustrating a method for constructing a semantic engineering platform according to an embodiment of the present invention. The construction method of the semantic engineering platform provided by the embodiment of the invention specifically comprises the following steps:
S11, extracting, from a preset data set, the semantic engineering problems that arise in the technical steps of human-like problem solving, wherein the data set is a set of elementary mathematics word problems stated in natural language.
S12, performing corpus and semantic analysis on the extracted semantic engineering problems to obtain semantic representation instance data, wherein the semantic representation instance data comprises corpus instances of data element variables, semantic instances of numerical relationships among the variables, and semantic instances of logical relationships among the variables.
Mathematics word problems generally contain three types of information. The first is data element variables, namely character strings representing data, such as integers or real numbers, or Chinese character strings that ask for a quantity, such as 'how many', 'how long' or 'what percentage'; these strings can be replaced by other data to form other problems that have the same solution method but different answers. The second is the numerical relationships among variables, such as multiple relationships, multiple + multiple relationships and equivalence relationships; these are mostly used to derive the values of unknown variables from known ones, and also to express how relationships carry over between the unknowns of an equation. The third is the logical meanings of the variables, such as yield, efficiency, duration, distance and speed, behind which lie the formulas or methods for computing the variables.
In the embodiment of the invention, these three kinds of information are obtained by performing corpus and semantic analysis starting from the technical steps of solving the specific word problem, so that semantics and corpus data are accumulated.
S13, constructing a problem-oriented semantic engineering platform from the obtained semantic representation instance data.
In the semantic engineering platform and construction method provided by the invention, corpus and semantic analysis is performed on the semantic engineering problems extracted from natural-language elementary mathematics word problems, starting from the technical steps of solving the specific word problem, to obtain semantic representation instance data, and the problem-oriented semantic engineering platform is constructed on the basis of that data. Semantic engineering applications are thus developed from the underlying infrastructure upward, the practical problem of human-like solving of mathematics word problems can be addressed, problem-specific semantics and corpus data are accumulated, and a data basis is provided for genuinely human-like solving.
As shown in fig. 2, in the embodiment of the present invention, performing corpus and semantic analysis on the extracted semantic engineering problem in step S12 to obtain semantic representation instance data specifically includes the following steps:
S121, identifying data elements in the semantic engineering problem, and representing the identified data elements by data element variables, wherein the data elements comprise numeric strings representing numerical quantities and Chinese character strings that ask for a quantity.
In this embodiment, identifying data elements in the semantic engineering problem and representing the identified data elements by data element variables specifically comprises: performing word segmentation on the extracted semantic engineering problem and, after word segmentation, sequentially identifying the data elements in the semantic engineering problem through quantity-word identification, concept attribute identification, concept relationship identification, reference identification, time segmentation identification and/or common-sense relationship identification; and selecting, according to a preset variable label set, a core word set and a modifier set for the variables related to the current scene unit frame, so as to represent the identified data elements.
The invention uses a scene-unit word-sequence grammar to describe the scene unit frames of the input words, and implements an algorithm for matching and recognizing scene unit frames. A scene unit frame corresponds to the common-sense knowledge implied behind a linguistic expression and to the explicit knowledge, information and data of that expression; recognizing a scene unit frame therefore means recognizing the semantics of the corresponding expression.
S122, extracting the context word segments in which the data elements occur and the word stream obtained after semantic preprocessing, and constructing, from the obtained context word segments and word stream, a clause frame library and a semantic feature library corresponding to the data element variables.
Specifically, the data elements in a semantic engineering problem comprise all numeric strings representing numerical quantities and all Chinese character strings that ask for a quantity, such as 'how much', 'how long', 'what percentage' or 'what fraction'. A data element can be replaced by another value without changing the solution method. These replaceable data elements are represented by data element variables, each initially assigned the value of its data element. A data element takes its meaning from its context, so the context word segments in which the data element occurs and the word stream obtained after semantic preprocessing are stored as the clause frame library and the semantic feature library of the data element variable, thereby fixing the semantics of that variable.
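The replacement of data elements by variables can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the English question phrases stand in for the Chinese strings, and the function name `extract_data_elements` and the variable names `v1`, `v2`, ... are hypothetical; the platform itself works on segmented Chinese text and a preset variable label set.

```python
import re

# Hypothetical English stand-ins for the Chinese question strings.
QUESTION_PHRASES = ["how many", "how much", "how long"]

# One pattern: a question phrase, or an integer/real numeric string.
_PATTERN = re.compile(
    "|".join(re.escape(p) for p in QUESTION_PHRASES) + r"|\d+(?:\.\d+)?"
)

def extract_data_elements(text):
    """Replace each data element with a variable; return (template, bindings)."""
    bindings = {}

    def _substitute(match):
        name = f"v{len(bindings) + 1}"
        token = match.group(0)
        # A numeric string gives the variable its initial value;
        # a question phrase marks an unknown (None).
        bindings[name] = float(token) if token[0].isdigit() else None
        return "{" + name + "}"

    template = _PATTERN.sub(_substitute, text)
    return template, bindings

template, bindings = extract_data_elements(
    "The school is 8 km away. How many km per hour on average?".lower()
)
```

The returned template keeps the context around each variable, which is the kind of material the clause frame library is described as storing.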
S123, identifying explicit contextual numerical relationships in the semantic engineering problem, wherein an explicit contextual numerical relationship is a numerical relationship among data element variables that is stated explicitly in the context, and constructing a semantic map library corresponding to the data element variables according to the obtained explicit contextual numerical relationships.
S124, identifying the logical relationships of the data element variables in the semantic engineering problem, wherein a logical relationship of the data element variables is a direct operational relationship among the variables, and constructing the semantic map library corresponding to the data element variables according to the obtained logical relationships. Specifically, the semantic map library consists of a plurality of closed semantic circles; each semantic circle comprises a plurality of semantic edges, and each semantic edge corresponds to a mathematical formula.
Here, the clause frame library and the semantic feature library form the corpus instances of the data element variables; the clause frame library and the semantic map library form the semantic instances of the numerical relationships among the variables; and the clause frame library, the semantic feature library and the semantic map library form the semantic instances of the logical relationships among the variables.
Specifically, as shown in fig. 3, the semantic problems of elementary mathematics word problems involve data element identification, identification of explicit contextual numerical relationships, and identification of the logical meaning of data element variables. An explicit contextual numerical relationship is a numerical relationship between data element variables that the context states explicitly; it is described jointly by the clause frame library and the semantic map library. The logical meaning of the data element variables, that is, whether the variables have an implied direct operational relationship, is expressed jointly by the semantic map library, the semantic feature library and the clause frame library.
It should be noted that the clause frame library, semantic feature library and semantic map library provided in this embodiment are intermediate processing steps chosen here; other technical solutions may set the intermediate steps differently. The accumulation of semantics and corpus data can be summarized as two problems: (1) the accumulation of data element variables; (2) the accumulation of the formulas implied by data element variables. Correspondingly, as shown in fig. 3, solving a specific semantic engineering problem can be summarized as two steps: (1) matching the data element variables; (2) determining the formulas implied by the data element variables.
The construction method of the semantic engineering platform provided by the invention further comprises: using a full-index technique for the semantic corpus during corpus accumulation. Specifically, for solving steps such as mathematical formulas, semantic edges, semantic circles and variable naming, the related context corpus is accumulated and a full index over the attributes of the semantic engineering platform is built for it, which facilitates knowledge retrieval and backtracking and makes errors in the solving steps easier to detect. The invention follows the principle of separating program from data and uses a large number of data configuration files to make the platform more flexible: the variable labels, context word segments, semantic circles, semantic edges and mathematical formulas of the variables are configured as data files according to preset data rules. Each data configuration file has a machine recognition algorithm matching its format and content, and together these format recognition algorithms form a family of configuration file format recognition techniques.
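One plausible reading of the full index is an inverted index from platform attributes to corpus entries. The sketch below assumes that reading; the class `CorpusIndex`, its method names, the attribute keys and the sample entries are all hypothetical and are not taken from the platform's actual file formats.

```python
from collections import defaultdict

class CorpusIndex:
    """Inverted index from (attribute, value) pairs to corpus entry ids."""

    def __init__(self):
        self._entries = {}
        self._index = defaultdict(set)

    def add(self, entry_id, text, **attributes):
        """Store a corpus entry and index it under every given attribute."""
        self._entries[entry_id] = text
        for key, value in attributes.items():
            self._index[(key, value)].add(entry_id)

    def lookup(self, **attributes):
        """Return the texts of entries that carry all the given attributes."""
        ids = None
        for key, value in attributes.items():
            hits = self._index.get((key, value), set())
            ids = hits if ids is None else ids & hits
        return [self._entries[i] for i in sorted(ids or [])]

# Illustrative entries only; ids and texts are invented.
index = CorpusIndex()
index.add(205, "the school is XX km away", variable="distance", circle=1)
index.add(206, "XX km per hour", variable="speed", circle=1)
```

Looking up by semantic circle, variable name or any other indexed attribute then returns the context entries needed to trace a solving step back to its corpus.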
The accumulation of the related context corpus instances for solving steps such as mathematical formulas, semantic edges, semantic circles and variable naming is explained below by way of a specific embodiment.
1. Accumulation of mathematical formulas
The mathematical formula library consists of a set of 6-tuples <ID, MK, NA, BL, TZ, GS>, where ID is the formula identifier, MK is the library feature, NA is the formula name, BL is the variable table of the formula, TZ is the variable feature table corresponding to the variable table, and GS is the expression form of the formula. Variables are named with the modifier first and the core word last; mathematical operation symbols are written doubled, which facilitates semantic recognition, as shown in Table 1.
Table 1: Mathematical formula table
The invention was tested on 300 elementary mathematics word problems. The test results show that the number of mathematical formulas is limited, about 50, covering equivalent assignment; addition, subtraction, multiplication and division; the areas of triangles, squares, rectangles and trapezoids; and the like. The key to accumulating mathematical formulas is determining their correspondence with semantic edges; expanding the set of formulas on its own is meaningless.
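The 6-tuple can be pictured as a small record type. The following Python sketch assumes that BL and TZ are aligned position by position; the class name `Formula`, the sample entry and its field values are hypothetical, and the doubled operator notation of Table 1 is not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Formula:
    id: int      # ID: formula identifier
    mk: str      # MK: library feature
    na: str      # NA: formula name
    bl: tuple    # BL: variable table of the formula
    tz: tuple    # TZ: feature of each variable in BL (assumed aligned with BL)
    gs: str      # GS: expression form of the formula

    def __post_init__(self):
        # Assumption: one feature per formula variable.
        if len(self.bl) != len(self.tz):
            raise ValueError("BL and TZ must have the same length")

# Hypothetical entry; the id and feature strings are invented.
product_formula = Formula(
    id=1, mk="arithmetic", na="product",
    bl=("total", "rate", "duration"),
    tz=("result", "factor", "factor"),
    gs="total = rate * duration",
)
```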
2. Semantic edge accumulation
A semantic edge is a set of semantic variables that embodies a mathematical computation relationship, for example {length, speed, duration} or {total length (AU-a), length of part A, length of part B}. Each semantic edge corresponds to a mathematical formula, which expresses the mathematical meaning of the edge. A semantic edge is described by a five-tuple <ID, EQU_ID, DMVT, EQVT, XPIM>, where ID is the semantic edge identifier, EQU_ID is the identifier of the mathematical formula corresponding to the semantic edge, DMVT is the data element variable table, EQVT is the corresponding formula variable table, and XPIM indicates whether the relationship is explicitly expressed or implied. Examples of semantic edges are shown in Table 2 below:
Table 2: Semantic edge list
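As an illustration of how DMVT and EQVT pair up, the sketch below models a semantic edge as a record whose two variable tables are aligned by position. The class name `SemanticEdge`, the method `to_formula_variables` and the sample values are assumptions made for this example and do not reproduce the entries of Table 2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SemanticEdge:
    id: int        # ID: semantic edge identifier
    equ_id: int    # EQU_ID: identifier of the corresponding formula
    dmvt: tuple    # DMVT: data element variable table
    eqvt: tuple    # EQVT: formula variable table, assumed aligned with DMVT
    xpim: str      # XPIM: "explicit" or "implicit"

    def to_formula_variables(self, values):
        """Rename data element variables to their formula variables."""
        mapping = dict(zip(self.dmvt, self.eqvt))
        return {mapping[name]: v for name, v in values.items() if name in mapping}

# Hypothetical edge linking distance, speed and duration to a product formula.
edge = SemanticEdge(
    id=1, equ_id=1,
    dmvt=("distance", "speed", "duration"),
    eqvt=("total", "rate", "duration"),
    xpim="implicit",
)
formula_args = edge.to_formula_variables({"speed": 5.0, "duration": 2.0})
```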
3. Semantic circle accumulation
A semantic circle is defined as a set of closely associated data element variables and is described by a five-tuple <ID, VarNum, VarTable, EquNum, EquTable>, where ID is the semantic circle identifier, VarNum is the number of data element variables in the semantic circle, VarTable is the data element variable table, EquNum is the number of semantic edges, and EquTable is the semantic edge table. For example, the semantic circle of the round-trip distance problem is described as follows:
1
10
distance, speed, time,
9
9: : distance, when the distance is equal to-left end and right end
9: : distance, when the patient returns, the distance is-left end, right end
1: : distance, speed, time, length, efficiency, time
1: : the distance, speed, time, efficiency and time of the back-return
1: : distance, speed, time and efficiency
6: : distance, round trip, part-A, part-B, total amount
6: : the time of the part A and the part B and the total amount of the parts A and B
3: : distance of reciprocating:, 2, distance of reciprocating::, is-1 variable, multiple, 2 variable
3: : distance of reciprocating:, 2, distance of reciprocating:, is-1 variable, multiple, 2 variable
Fig. 4 is a semantic circle diagram of the invention. As fig. 4 shows: (1) only variable relationships of numerical magnitude (sum and difference) and multiple (product and quotient) need to be stated explicitly in words; the relationship among distance, speed and time, the relationship between the round-trip distance and the outbound/return distances, and the relationship between the distance and the outbound/return distances are all implicit and do not appear in the text. (2) Explicit numerical relationships are all relationships between core concept variables of the same type, or constant/multiple relationships of such variables. Variation among problems lies in how the explicit numerical relationships of the core concept variables are expressed, so different problem variants can be obtained by adding different expressions of explicit numerical relationships.
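To make the role of the implicit edges concrete, the following Python sketch chains the edges of a round-trip circle until a queried variable is derived, using a trip of 8 km out at 5 km/h and back at 4 km/h as sample data. It is a simplification under stated assumptions: each edge is given a fixed direction, the `modifier::core word` names are hypothetical, and the names `Edge`, `solve` and `ROUND_TRIP` are invented for the example rather than taken from the platform.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Edge:
    output: str                     # variable computed by this edge
    inputs: tuple                   # variables the edge needs
    compute: Callable[..., float]   # formula behind the edge

def solve(edges, known, target):
    """Apply edges whose inputs are all known until the target is derived."""
    values = dict(known)
    progress = True
    while target not in values and progress:
        progress = False
        for edge in edges:
            if edge.output not in values and all(v in values for v in edge.inputs):
                values[edge.output] = edge.compute(*(values[v] for v in edge.inputs))
                progress = True
    return values.get(target)  # None when the target cannot be derived

# Hypothetical directed edges for a round-trip circle.
ROUND_TRIP = [
    # The return leg covers the same distance as the outbound leg.
    Edge("return::distance", ("go::distance",), lambda d: d),
    Edge("go::duration", ("go::distance", "go::speed"), lambda d, s: d / s),
    Edge("return::duration", ("return::distance", "return::speed"), lambda d, s: d / s),
    Edge("round trip::distance", ("go::distance", "return::distance"), lambda a, b: a + b),
    Edge("round trip::duration", ("go::duration", "return::duration"), lambda a, b: a + b),
    Edge("round trip::average::speed",
         ("round trip::distance", "round trip::duration"), lambda d, t: d / t),
]

answer = solve(
    ROUND_TRIP,
    {"go::distance": 8.0, "go::speed": 5.0, "return::speed": 4.0},
    "round trip::average::speed",
)
```

With these inputs the chain gives 16 km over 3.6 hours, that is 40/9 km per hour; none of the intermediate relationships had to be stated in the problem text.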
4. Variable context accumulation
The variable context library consists of a set of four-tuples <ID, CNUM, TVAR, TCTX>, where ID is the variable identifier, CNUM is the number of context entries, TVAR is the variable table and TCTX is the context information table. The following is the context information for the variables "distance" and "speed":
1
2
the distance of 205 is,
school has the destination of XX as follows:
2
3
when the speed is 206, 219,
the medicine is characterized by comprising the following components in percentage by hour:
an automobile comprises the following components, in each hour, XX, kilometer:
5. Variable tag accumulation
The invention can specify the labels used to represent variables, for example core words: yield, distance, length, time, duration, speed, efficiency; and modifiers: planned, original, actual, current, total, and the like. These are gathered into a unified label set, which lays the foundation for automatic variable naming by the machine. The units separated by colons (::) in a variable name serve as labels.
The construction method of the semantic engineering platform provided by the embodiment of the invention further comprises internal management of the corpus, including variable naming, variable duplicate checking, variable aliasing, semantic circle recognition and processing of semantic edges for numerical operation relationships, specifically as follows:
1. Variable naming
A core word set and a modifier set for the variables involved in the scene unit framework are selected from the accumulated variable label set. For example, for the problem "The school is 8 km away; one travels 5 km per hour on the way there and 4 km per hour on the way back. What is the average number of kilometers traveled per hour over the round trip?", the four variables are named, in order: distance, outbound speed, return speed, and average round-trip speed.
2. Variable duplicate checking
When a duplicated variable name is detected, it is corrected according to the corresponding variable names in the relevant semantic circle and the semantic feature library, or a scene modifier is added to remove the duplication.
In this embodiment, duplicate names that may arise when variables are named according to the naming rule are corrected using the corresponding variable names in the semantic circle and the semantic feature table; alternatively, scene modifiers may be added to eliminate the duplication.
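The naming and duplicate-checking steps above can be sketched as follows. This is an illustrative sketch only: the helper names `make_name` and `register`, the tag strings, and the policy of prefixing a single scene modifier are assumptions, and the correction against the semantic circle and semantic feature library is not modeled.

```python
from typing import List, Set

SEP = "::"  # tag separator used in variable names


def make_name(modifiers: List[str], core_word: str) -> str:
    """Join modifier tags and a core word into a variable name."""
    return SEP.join(modifiers + [core_word])


def register(name: str, used: Set[str], scene_modifier: str) -> str:
    """Return a unique name, prefixing a scene modifier if the name is taken."""
    if name in used:
        name = scene_modifier + SEP + name
    if name in used:
        # Assumption: one modifier is tried; anything else needs manual review.
        raise ValueError(f"name still duplicated: {name}")
    used.add(name)
    return name


used_names: Set[str] = set()
going = register(make_name(["going"], "speed"), used_names, "car")
returning = register(make_name(["returning"], "speed"), used_names, "car")
# A second "going::speed" in the same problem gets a scene modifier prefix.
second = register(make_name(["going"], "speed"), used_names, "car")
```

Here `second` becomes `car::going::speed`, while the first two names are registered unchanged.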
3. Variable alias
A semantic circle may cover multiple scenes, and variable names may differ between scenes. From the standpoint of the semantic circle variable, these names are aliases: the calculation rules and methods behind them are identical. For example, consider the contexts of the following variables:
4
4
the back-and-forth movement has the speed | |208 and 221, the back-and-forth movement has the average speed | |423,
the average time of the two-way operation is as follows, every hour is as follows, XX is as follows, kilometer is as follows:
the automobile comprises the following components in percentage by weight:
the vehicle comprises a vehicle which reciprocates back and forth, and has an average speed as follows:
wherein the variable "round trip::average::speed" of clause frame 423 is an alias of the variable "round trip::speed" of clause frames 208 and 221.
4. Semantic circle recognition
If all variables of a given problem are contained in the variable set of a semantic circle, that semantic circle is identified. If the variables of the problem fall within the range expressed by the common-sense rules of a semantic circle but the numerical transfer formula does not, the numerical transfer rule formula is added to the semantic circle, and the semantic circle is then identified. The key to semantic circle identification is to identify the semantic correspondence between the problem variables and the semantic circle variables, and the correspondence between the common-sense rules and the numerical transfer rules.
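The first identification condition, that all problem variables are contained in a semantic circle's variable set, amounts to a subset test. The sketch below illustrates only that condition; the circle names, their variable sets, and the function `recognize_circle` are hypothetical, and the second branch (adding a numerical transfer rule formula to the circle) is not modeled.

```python
from typing import Dict, Optional, Set

# Hypothetical semantic circles: circle name -> set of variable concepts.
SEMANTIC_CIRCLES: Dict[str, Set[str]] = {
    "travel": {"distance", "speed", "time"},
    "cuboid": {"length", "width", "height", "volume"},
}


def recognize_circle(problem_vars: Set[str]) -> Optional[str]:
    """Return the first circle whose variable set contains every problem variable."""
    for name, circle_vars in SEMANTIC_CIRCLES.items():
        if problem_vars <= circle_vars:  # subset test
            return name
    return None  # no circle covers the problem's variables


travel_match = recognize_circle({"distance", "speed"})
no_match = recognize_circle({"speed", "volume"})
```

In this sketch `travel_match` is `"travel"` and `no_match` is `None`, since no single circle contains both "speed" and "volume".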
5. Variable concept identification
The identification of a variable's concept relies on comprehensive analysis of its feature attributes. This includes identifying the concept category, for example distance, length, speed, efficiency, density, time, duration, number, multiple, or query word(s), and identifying concept attributes, for example total (invariant) versus total (variant), or time order (planned/actual).
a. Quantifier recognition
The variable concept is identified from the quantifier that follows the numerical value. For example, distance quantifiers are kilometers and the like; time quantifiers are hours, minutes, and the like; the expression of speed is somewhat more complex, being a combination of time and distance terms that begins with the feature word "per". Quantifier identification thus recognizes the variable concept at the category level.
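Quantifier recognition as described above can be sketched as a lookup from the unit words that follow a number to a concept category. The unit tables, the English stand-ins for the original quantifiers, and the function `concept_from_quantifier` are assumptions for illustration.

```python
from typing import Optional

# Assumed unit tables; a real platform would accumulate these from its corpus.
DISTANCE_UNITS = {"kilometer", "kilometers", "km", "meter", "meters", "m"}
TIME_UNITS = {"hour", "hours", "minute", "minutes"}


def concept_from_quantifier(quantifier: str) -> Optional[str]:
    """Map the quantifier following a numerical value to a concept category."""
    words = quantifier.lower().split()
    if not words:
        return None
    has_distance = any(w in DISTANCE_UNITS for w in words)
    has_time = any(w in TIME_UNITS for w in words)
    # Speed: starts with the feature word "per" and combines time and distance terms.
    if words[0] == "per" and has_distance and has_time:
        return "speed"
    if has_distance and not has_time:
        return "distance"
    if has_time and not has_distance:
        return "time"
    return None  # unknown quantifier; left to the other identification steps


speed_concept = concept_from_quantifier("per hour kilometers")
```

A quantifier that matches no table, such as "blocks", returns `None` here and would be resolved by concept attribute identification instead.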
b. Concept attribute identification
Generally, the noun closest to a numerical variable represents the concept attribute of that variable. For example, in "100 cubic meters square", "square" is the noun closest to the data element "100 cubic meters", and indicates that the concept attribute, within the category of volume, is "square".
c. Concept relationship identification
The common-sense description file for concept attribute relationships can be designed in the following format:
1 // sequence number
2 // number of description entries
Chimney, // macroscopic concept
Surface area; length of side of cross section; long; // attribute concepts included
When a problem is determined to contain both a macroscopic concept and one of its attribute concepts, the data element having that attribute concept is determined to be constrained by the corresponding macroscopic concept. For example, in "there is an iron chimney with a length of 2 m", the attribute concept of the data element "2 m" is "long", and its macroscopic concept is "chimney".
d. Reference identification
The common-sense description file for reference relationships can be designed in the following format:
1
rectangular parallelepiped, steel material,
Square, steel,
2
Rectangular parallelepiped, steel material,
Steel material,
The corresponding original problems read "a section of rectangular steel ……, what is the weight of the square steel?" and "…… rectangular parallelepiped …… forged, how long is the steel material?". The concept attributes and relationships of the queried data elements "how much" and "how long" are determined through the reference relationship.
e. Time segment identification
The time attribute of a data element variable is distinguished by means of time feature words. For example, the following is the keyword list for the "total-part" structure:
Total: planning, conceiving, calculating,
Part: a,
Part: the rest, the remainder, the remaining,
The following is the keyword list for the "total-part-total" structure:
Total: TIME-MK,
Part: front,
Part: back,
Total: ask, TIME-MK,
In addition to time attribute identification, time segment identification implies a common-sense relationship: the total time equals the sum of the part times.
f. Common sense relationship recognition
Common-sense feature words are keywords contained in semantic segments (delimited by punctuation and context variables), such as "front/back", "already", and "plan/residual". Besides direct definition through the clause frame mode described above (pattern matching by clause) and derivation from concept attributes, common-sense relationships can also be determined by feature pattern identification. For example, the feature keyword "calculating as such" implies the common-sense relationship that the speed or efficiency is the same for the two concepts before and after it; the feature pattern "… forged steel material from steel slab …" implies that the volume, weight and density attributes of the steel slab and the steel material, the two concepts before and after, are the same.
Further, the method also comprises other platform management functions, specifically as follows:
first, matching of numerical operation relationships on semantic edges, such as the variable matching required for identifying sum/difference relationships and multiple relationships, corpus accumulation, the order of context vocabulary, and the like;
second, knowledge learning and expression supplementation, for example supplementing numerical relationships among variables of the same type, extending analogous relationships of similar models, and the like.
In another embodiment of the invention, the method further comprises analyzing, calculating and outputting results for elementary mathematic application questions to be calculated, based on the semantic engineering platform.
Further, the analysis, calculation and result output for the elementary mathematic application questions to be calculated, based on the semantic engineering platform, specifically comprises the following steps:
identifying the data element variables, the context explicit numerical relationships and the logical relationships of the data element variables in the elementary mathematic application questions to be calculated;
matching semantic representation instance data according to the identified data element variables, context explicit numerical relationships and logical relationships of the data element variables, and generating a dynamic semantic circle corresponding to the elementary mathematic application questions to be calculated, so as to describe the one-to-one correspondence between the data element variables and the corresponding formula variables;
and performing simulated solving calculation based on the dynamic semantic circle.
Solving a specific semantic engineering problem can be summarized in two steps: (1) matching the data element variables; (2) determining the formulas that contain the data element variables.
Data element variable matching refers to locating the explicit data elements in the problem context and finding the variables in the semantic feature library that they match. Locating data elements is comparatively easy: they are the digit strings and the Chinese character strings that express a data query. Matching them to variables in the semantic feature library requires accumulating a certain number of context instances and then classifying and predicting with a machine learning algorithm. Besides overcoming the semantic sparseness of small-sample multi-class classification when matching a single data element variable, the combined logical meaning of all data element variables in the problem must also be matched. That is, the logical meaning of the full problem text is considered in addition to individual clauses. In the embodiment of the invention, after the context of a single data element variable has been matched, the contexts of the other data element variables in the problem are matched as well, and the set of correspondences in which all matched data element variables fall within one logical semantic circle is selected to name the data element variables.
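The easier half of this step, locating data elements, can be sketched with regular expressions. The sketch below works on an English rendering of a problem, so the query patterns ("how many", "how much", "how long") are assumed stand-ins for the Chinese query strings of the source; the function name `locate_data_elements` is hypothetical, and the machine-learning matching against the semantic feature library is not shown.

```python
import re
from typing import List, Tuple

NUMBER = re.compile(r"\d+(?:\.\d+)?")                       # digit strings
QUERY = re.compile(r"how (?:many|much|long)", re.IGNORECASE)  # assumed query words


def locate_data_elements(text: str) -> List[Tuple[int, str, str]]:
    """Return (position, kind, surface string) for each data element, in text order."""
    found = [(m.start(), "number", m.group()) for m in NUMBER.finditer(text)]
    found += [(m.start(), "query", m.group()) for m in QUERY.finditer(text)]
    return sorted(found)


problem = (
    "A wall 30 meters long, 0.24 meters wide and 5 meters high is to be built. "
    "If 500 bricks are used per cubic meter, how many bricks are needed?"
)
elements = locate_data_elements(problem)
surfaces = [surface for _, _, surface in elements]
```

For this sample text `surfaces` is `["30", "0.24", "5", "500", "how many"]`: four numeric data elements followed by one query element.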
The semantic guide map is composed of a plurality of closed semantic circles. Each semantic circle comprises a plurality of semantic edges, and each semantic edge corresponds to a mathematical formula operation, such as speed × duration = distance, or to an assignment operation. Once the correspondence between the data element variables and the semantic circle containing the problem's data element set is established, the set of mathematical formulas required for solving is found; the next step is simply to perform simulated solving on the dynamic semantic circle, finding the minimal solving sequence that matches the formulas and calculating accordingly.
The following uses a specific example to describe human-like solving of an elementary mathematic application question to be calculated with the semantic engineering platform.
Human-like solution example
"An enclosing wall 30 meters long, 0.24 meters wide and 5 meters high is to be built on the south side of the park. If 500 bricks are used per cubic meter, how many bricks are needed?"
A. Variable concept identification
After word segmentation, data element variable names composed of key semantic words are identified in sequence through quantifier identification, concept attribute identification, concept relationship identification, reference identification or time segment identification, giving the following results:
Length::enclosing wall::length (30 meters)
Width::enclosing wall::length (0.24 meters)
Height::enclosing wall::length (5 meters)
Brick::volume::number::density (500)
Total::enclosing wall::brick::number (number of blocks)
B. Common sense relationship recognition
This problem implies two common-sense relationships:
length of cuboid × width of cuboid × height of cuboid = volume of cuboid
volume of cuboid × (number/volume) = total number of bricks
Note: from the length, width and height expressions it can be inferred that the enclosing wall is a cuboid.
C. Dynamic semantic circle
A dynamic semantic circle, which the system refers to at run time, is generated from the numerical relationships expressed in the problem and the implied common-sense relationships:
21: : the length is equal to the length of the rectangular parallelepiped, the width is equal to the length of the rectangular parallelepiped, the height is equal to the volume of the rectangular parallelepiped
22: : the enclosure wall comprises volume, bricks, the number, the density, the total number, the volume, the density, the number and the number
Note: "21" and "22" are the ID numbers of the corresponding formulas; the part before the symbol is the list of data element variables, and the part after it is the list of variables of the rule formula. The dynamic semantic circle describes the one-to-one correspondence between the data element variables and the corresponding formula variables.
D. Simulated operation solution
The platform traverses the list of solving rules expressed as formula knowledge. When a set of variables satisfies the activation condition of a semantic edge in the dynamic semantic circle, the variables are bound to the numerical values obtained in variable concept identification, and the calculation formula in the solving rule is applied. After the question variable has been assigned a value, all steps that feed a subsequent calculation are retained, interrupted and meaningless operation steps are deleted, and the solving steps and results of the system's simulated operation are output.
In the above example, formula 21 on semantic edge 1 gives:
length × width × height = enclosing wall volume
This in turn activates formula 22 on semantic edge 2, which gives:
enclosing wall volume × brick number density = total number of bricks in the enclosing wall
The total number of bricks in the enclosing wall is the question variable, so the problem is solved successfully, and solving steps 22 and 21 and the operation result are output.
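The simulated solving described above behaves like forward chaining over the rule list, followed by pruning of steps that do not lead to the question variable. The following is a minimal sketch under stated assumptions: the rule table `RULES`, the simplified variable names (`wall_volume`, `brick_density`, `total_bricks`) and the function `solve` are hypothetical stand-ins, not the platform's actual data structures.

```python
from typing import Callable, Dict, List, Tuple

# Each rule: (formula ID, input variables, output variable, calculation).
Rule = Tuple[int, Tuple[str, ...], str, Callable[..., float]]

RULES: List[Rule] = [
    (21, ("length", "width", "height"), "wall_volume",
     lambda l, w, h: l * w * h),
    (22, ("wall_volume", "brick_density"), "total_bricks",
     lambda v, d: v * d),
]


def solve(known: Dict[str, float], target: str) -> Tuple[float, List[int]]:
    """Forward-chain over RULES until `target` is known; return its value and the useful steps."""
    values = dict(known)
    produced_by: Dict[str, Rule] = {}
    progress = True
    while target not in values and progress:
        progress = False
        for rule in RULES:
            _, inputs, output, func = rule
            # A semantic edge is activated once all of its input variables have values.
            if output not in values and all(v in values for v in inputs):
                values[output] = func(*(values[v] for v in inputs))
                produced_by[output] = rule
                progress = True
    if target not in values:
        raise ValueError(f"cannot solve for {target}")
    # Trace back from the question variable, keeping only steps that feed it.
    steps: List[int] = []
    pending = [target]
    while pending:
        var = pending.pop()
        if var in produced_by:
            rule_id, inputs, _, _ = produced_by[var]
            steps.append(rule_id)
            pending.extend(inputs)
    return values[target], steps


answer, steps = solve(
    {"length": 30, "width": 0.24, "height": 5, "brick_density": 500},
    "total_bricks",
)
```

With the values of the example, `answer` is 18000 up to floating-point rounding (30 × 0.24 × 5 = 36 cubic meters, and 36 × 500 = 18000 bricks). Because the useful steps are traced back from the question variable, `steps` comes out as `[22, 21]`.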
Compared with the technical route of direct machine learning, the scheme of the invention realizes a human-like solving process with interpretable intermediate steps, achieves genuinely human-like reasoning, and has application value in online education and in assisting student learning. Compared with technical schemes that build general corpora and dictionaries, such as those based on dependency grammar, whose function is single and which concern only general-purpose value, the scheme is aimed at the practical problem of human-like solving of mathematic application questions: it analyzes the requirements of solving and designs specific, dedicated functions for accumulating, analyzing and managing semantic linguistic data. For the complicated semantic engineering problems of natural language, the scheme of the invention is a progressive, incremental solution that starts from basic work and advances in practice.
In addition, the invention also provides a semantic engineering platform which is constructed by adopting the method.
According to the semantic engineering platform and the construction method thereof provided by the invention, semantic representation instance data is obtained by performing linguistic data and semantic analysis on the semantic engineering problems extracted from natural language elementary mathematic application questions, and a problem-oriented semantic engineering platform is constructed on the basis of the obtained semantic representation instance data. The platform thus addresses the practical problem of human-like solving of mathematic application questions, accumulates specific and dedicated semantic and linguistic data, and provides a data basis for genuinely human-like solving.
Furthermore, the semantic engineering platform constructed by the invention can realize the human-like solving process with interpretable intermediate steps, and has important application value in online education and in assisting student learning.
The above embodiments are only for illustrating the invention and not for limiting it. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention; all equivalent technical solutions therefore also fall within the scope of the invention, which should be defined by the claims.