Large-scale Fuzzing optimization system and method based on human-machine collaboration
1. A large-scale Fuzzing optimization system based on human-machine collaboration, characterized by comprising a human-machine collaborative intelligent decision module based on a knowledge vector space, a human-machine collaborative parallelized Fuzzing module based on annotated source language, and a human-in-the-loop hybrid enhancement module;
the human-machine collaborative intelligent decision module based on the knowledge vector space is used for abstracting elements such as human vulnerability mining knowledge and machine vulnerability mining capability, mapping these elements into a unified vector space, taking the human-machine knowledge as part of the input to task planning, and providing decision guidance for parallelized Fuzzing execution;
the human-machine collaborative parallelized Fuzzing module based on annotated source language is used for constructing the Fuzzing workflow based on human-machine collaboration, receiving the decision result output by the human-machine collaborative intelligent decision module, the decision result comprising task information, an initially recommended vulnerability mining model, a Fuzzing execution strategy and a human-machine interaction data format, and then realizing data interaction among computing nodes and dynamic adjustment of the parallel strategy through a sharing mechanism;
the human-in-the-loop hybrid enhancement module is used for providing humans, through a human-machine interaction channel, with a data path for guiding the vulnerability mining process, and for transmitting the experts' assessments and selections to the human-machine collaborative parallelized Fuzzing module in the human-machine interaction data format selected by the human-machine collaborative intelligent decision module based on the knowledge vector space, thereby deeply integrating human intelligence with the vulnerability mining process.
2. The system according to claim 1, wherein the human-machine collaborative intelligent decision module based on the knowledge vector space is specifically configured to convert the human vulnerability mining knowledge space into a machine-understandable vector space of a human-machine collaborative vulnerability mining knowledge base, to actively construct a vulnerability mining model suited to the target from that vector space by using the automatic exploration, discovery and associative learning capabilities of deep reinforcement learning, and to make the corresponding optimal parameter decisions, wherein the vector space of the human-machine collaborative vulnerability mining knowledge base contains machine-understandable vulnerability principles, behavior models of vulnerability existence, knowledge related to human vulnerability mining, and expert knowledge data; and wherein, based on the knowledge vector space and expert experience, a strategy decision model is trained by deep reinforcement learning to obtain the optimal vulnerability mining strategy <task information, initially recommended vulnerability mining model, Fuzzing execution strategy, human-machine interaction data format>, the strategy result is transmitted to the human-machine collaborative parallelized Fuzzing module based on annotated source language, and the machine cluster is guided to execute the Fuzzing tasks.
3. The system of claim 2, wherein the human-machine collaborative intelligent decision module based on the knowledge vector space adopts a deep reinforcement learning framework; based on the interaction between the knowledge base vector space and the target program, it selects the vulnerability mining knowledge that matches the target program through aggregation analysis and association matching, so as to automatically construct a mining model; and, by simulating the way humans mine vulnerabilities, it uses the deep reinforcement learning framework to generate the optimal parameter configuration based on the degree of association between the knowledge base and the target information.
4. The system of claim 2, wherein the human-machine collaborative parallelized Fuzzing module based on annotated source language is specifically configured to integrate human knowledge and experience into the parallelized vulnerability mining process so as to guide the parallelized tasks of the machines.
5. The system of claim 4, wherein the human-machine collaborative parallelized Fuzzing module based on annotated source language consists of two parts, a server and a machine cluster, wherein the server comprises eight modules, namely a parallel task allocation and starting module, a state updating module, a database module, a Crash analysis module, a bitmap monitoring and task adjusting module, a seed selection module, an anomaly analysis module and a server-side communication module; all nodes in the machine cluster are peers, and each node comprises three modules, namely a node-side communication module, a state monitoring module and a mutation and test module;
the parallel task allocation and starting module is used for decomposing and slicing the vulnerability mining task so that the task is suitable for large-scale concurrent execution by the machine cluster, and for issuing the result of the task division (task description, slice index and task execution instruction) to the machine cluster through the server-side communication module; the seed selection module is used for completing seed selection for Fuzzing based on the knowledge vector space combined with manual selection by experts; the anomaly analysis module and the Crash analysis module are both used for handling anomalies arising during task execution: after expert analysis and judgment, if a Crash is determined to be a program or function anomaly, the anomaly information is transmitted to the state updating module, the database module and the bitmap monitoring and task adjusting module, which respectively update the task execution state, store the anomaly information and dynamically adjust task execution, after which the adjusted task execution information and instructions are sent to the machine cluster through the server-side communication module;
the mutation and test module of each node in the machine cluster is used for executing the Fuzzing test task against the target object; the recording of the task execution context and Crash information is completed through the state monitoring module, and information interaction with the server-side communication module is realized through the node-side communication module.
6. The system of claim 5, wherein the parallel task allocation and starting module is configured to perform static analysis of a given target program, divide the Fuzzing task according to the static analysis result and the number of parallel nodes, and issue a start command that is sent to the Fuzzing node cluster through the server-side communication module, whereupon each node receives its allocated task and the start command and begins Fuzzing;
the state updating module is used for monitoring and updating the working and liveness states of the server and the nodes, detecting in real time whether any node is abnormal or offline, and updating the overall running state of the system;
the database module is used for storing the various system operation data, including the md5 checksums of test cases, the bitmap of each node and the total bitmap;
the Crash analysis module is used for managing the analysis tasks for Crashes generated during vulnerability mining and for receiving feedback results from experts;
the bitmap monitoring and task adjusting module is used for collecting the bitmaps of all parallel nodes, updating the bitmaps at fixed time intervals, maintaining a table in the database that stores the collected bitmaps, observing the load of each node, and adjusting the Fuzzing strategy of each node according to its load so as to achieve load balancing among the parallel nodes;
the seed selection module is used for generating the seed cases used by the Fuzzing task and for collecting running information from the Fuzzing node cluster, the running information comprising target program coverage, execution time and the number of newly generated Crashes;
the anomaly analysis module is used for handling various unexpected situations and determining which fault handling strategy is to be executed;
the server-side communication module is used for collecting various running information from the Fuzzing node cluster, including test cases that cause Crashes, test cases that discover new paths, the md5 checksum of each newly generated test case, the program path coverage bitmap, the mutation strategies and mutation methods that lead to new discoveries, and candidate character sets, and is also used for interacting with the other modules in the server, submitting the shared information from the node cluster, collecting feedback information from the other modules in the server, and sending the feedback information to the node cluster.
7. The system of claim 5, wherein the node-side communication module is configured to collect the running information of its own node and send it to the server, to receive feedback information from the server and pass it to the mutation and test module of its own node, and further to interact with the state monitoring module and send state information to the server;
the state monitoring module is used for monitoring the running state of the Fuzzing task and passing this information to the node-side communication module, which forwards it to the server for further processing and decision making;
the mutation and test module is used for executing the specific mutation and test work based on the seed files, mutation strategies and mutation targets provided by experts, generating the various kinds of shared information, and reporting the task execution status.
8. The system of claim 3, wherein the human-in-the-loop hybrid enhancement module is specifically configured to adopt a Markov decision process as the reinforcement learning model and to select, according to the uncertainty and importance of each problem, a set of problems requiring manual intervention, thereby introducing deeper human intelligence into the vulnerability mining process and guiding the iterative optimization of the reinforcement learning model.
9. The system of claim 8, wherein the human-in-the-loop hybrid enhancement module is specifically configured to hand undetermined problems to a human for processing at the granularity of individual problems, this being the process of asking the human, and, in order to determine which problems require human intervention, to consider two aspects of each problem, namely its uncertainty and its importance:
(1) uncertainty: when judging whether a relation exists between two entities, a probability value in [0, 1] is given, and the closer this value is to either end (0 or 1), the more certain the given conclusion is considered to be; in the vulnerability mining scenario, when the reinforcement learning model selects an action it produces a probability distribution over the actions, which ultimately extends to a probability distribution over the selectable paths; the more the path probabilities differ from one another, the more confident the machine is in selecting or not selecting a given path, whereas when the probabilities of the individual paths tend to be uniform, the machine is considered to have difficulty distinguishing which path is better; the probability distribution of path selection is therefore evaluated by computing its entropy, the entropy being larger the more uniform the path probability distribution is, and smaller otherwise;
(2) importance: in addition to uncertainty, importance is also used as a criterion for deciding whether to hand a problem to a human for judgment; for a given relation, a problem may be selected many times during model learning and therefore carries a larger inference weight in class prediction; if the amount of inference information provided by the problem does not match this weight, the final inference result is affected, and more severely than by a mismatch on a path with a small weight, so such important problems are handed over for manual judgment; specifically, given a relation r, a plurality of inference paths $t_i \in T$ are obtained through the model, and importance is evaluated by taking the maximum of the cumulative probability:
$t_{\mathrm{selected}} = \max\left(\sum p(t_i)\right)$
finally, combining the two dimensions of uncertainty and importance, the rule for selecting the problems to be handed over for manual judgment is: for each relation, if $H(T) > c$, then $t_{\mathrm{selected}}$ among the inference paths of the relation is provided to a human for judgment, where $H(T)$ is the entropy of the path selection probability distribution and c is a constant;
during manual judgment, a human can more easily understand the inference relation between problems and categories and give a judgment; the human's judgment of a problem is fed back to the model as a score, this step being the process of the human solving the problem; for inferences that require manual judgment according to uncertainty and importance, the model feeds the inference process back to the human, and after the manual score is obtained, it is added to the return function of the path and the model parameters are retrained; specifically, for a path on which manual judgment has been completed, a manual feedback term $R_{\mathrm{human}}$ is added to the original return function, the manual feedback term being defined as:
$R_{\mathrm{human}} = (\mathrm{Score} - 3)^3$
where Score represents the human's score for the inference path, and the return function of a path on which manual judgment has been completed is defined as:
$R_{\mathrm{total}} = R_{\mathrm{return}} + \lambda_4 R_{\mathrm{human}}$
where $R_{\mathrm{return}} = \lambda_1 R_{\mathrm{reachability}} + \lambda_2 R_{\mathrm{length}} + \lambda_3 R_{\mathrm{diversity}}$ represents the return function excluding manual feedback, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are preset coefficients related to sample characteristics, and $R_{\mathrm{reachability}}$, $R_{\mathrm{length}}$ and $R_{\mathrm{diversity}}$ respectively represent the reachability measure, the length factor and the diversity score; if a path is repeatedly selected by the model, the previous manual judgment result is used directly without handing the path over for manual judgment again; finally, the comprehensive return function of the path is calculated and used to update the parameters of the model.
10. A large-scale Fuzzing optimization method based on human-machine collaboration, implemented by using the system of any one of claims 1 to 9.
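The selection rule and return function recited in claim 9 can be illustrated with a short Python sketch. It is a minimal reading under stated assumptions, not the claimed implementation: the function names are hypothetical, the entropy is the Shannon entropy with natural logarithm, the path distribution is obtained by normalizing the cumulative probabilities, the exponent in the manual feedback term is read as a cube, and a scoring scale centered on 3 is assumed.

```python
import math
from collections import defaultdict


def path_entropy(probs):
    """Shannon entropy H(T) of a path-selection distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_human(selections, c):
    """Pick the path to hand to a human for one relation, or None.

    selections: (path_id, probability) pairs, one per time the model chose
    a path (hypothetical input format). Assumption: H(T) is computed on the
    normalized cumulative probabilities of the distinct paths.
    """
    cumulative = defaultdict(float)
    for path_id, p in selections:
        cumulative[path_id] += p          # importance: cumulative probability
    total = sum(cumulative.values())
    if total == 0:
        return None
    distribution = [v / total for v in cumulative.values()]
    if path_entropy(distribution) <= c:   # machine is confident enough
        return None
    return max(cumulative, key=cumulative.get)  # t_selected


def human_feedback(score):
    """Manual feedback term, read here as (Score - 3) cubed."""
    return (score - 3) ** 3


def path_return(reachability, length, diversity, lambdas, score=None):
    """Return of a path; adds the manual term only if a score exists."""
    l1, l2, l3, l4 = lambdas
    base = l1 * reachability + l2 * length + l3 * diversity
    if score is None:
        return base
    return base + l4 * human_feedback(score)


if __name__ == "__main__":
    history = [("a", 0.5), ("b", 0.3), ("a", 0.4), ("c", 0.2)]
    print(select_for_human(history, c=0.5))
    print(path_return(0.5, 0.2, 0.1, (1.0, 1.0, 1.0, 0.5), score=5))
```

Under this reading, a cubic feedback term keeps the sign of the deviation from the neutral score and weights strong approval or rejection much more heavily than mild opinions.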
Background
Vulnerability mining generally refers to the process in which security researchers use various tools to audit software and system code and analyze software execution in order to find defects. Vulnerability mining techniques aim to fully uncover the vulnerabilities present in the hardware design, software programs, security management and network communication protocols of a target network architecture. Vulnerability mining is computation-intensive work that is closely tied to the scale and complexity of the software, the performance of the hardware system and the analysis techniques used; in research practice, the program analysis strategy must be adjusted dynamically according to these factors to strike a better balance between analysis efficiency and analysis depth. Vulnerability mining has distinctive characteristics in terms of algorithms and the storage and processing of analysis data; existing techniques analyze large, complex programs inefficiently and do not make full use of the parallel processing capability offered by high-performance hardware. Exploring large-scale, parallelized vulnerability mining techniques and improving the utilization of heterogeneous computing resources can well meet the need for rapid analysis of large, complex software. On the one hand, lightweight analysis techniques and heuristic state-space exploration techniques (such as vulnerable-path screening and low-frequency-path screening) need to be studied so as to strengthen the guidance of vulnerability mining at low cost; on the other hand, efficient large-scale, parallelized analysis methods need to be studied.
As an effective means of vulnerability discovery, fuzz testing (Fuzzing) is of great significance for security protection in scientific research and production. Owing to its ease of use, low cost and good detection results, Fuzzing has become one of the security vulnerability detection techniques most commonly used in industry, and a large body of research and applications has been developed for targets such as network protocols, portal websites and critical information systems. However, current Fuzzing techniques still suffer from problems such as reliance mainly on tool execution, a low degree of intelligence, a low degree of human-machine collaboration and a low degree of machine-machine collaboration.
Disclosure of Invention
(I) Technical problem to be solved
The technical problem to be solved by the invention is how to address the problems of current Fuzzing, such as reliance mainly on tool execution, a low degree of intelligence, a low degree of human-machine collaboration and a low degree of machine-machine collaboration.
(II) technical scheme
In order to solve the above technical problem, the invention provides a large-scale Fuzzing optimization system based on human-machine collaboration, which comprises a human-machine collaborative intelligent decision module based on a knowledge vector space, a human-machine collaborative parallelized Fuzzing module based on annotated source language, and a human-in-the-loop hybrid enhancement module;
the human-machine collaborative intelligent decision module based on the knowledge vector space is used for abstracting elements such as human vulnerability mining knowledge and machine vulnerability mining capability, mapping these elements into a unified vector space, taking the human-machine knowledge as part of the input to task planning, and providing decision guidance for parallelized Fuzzing execution;
the human-machine collaborative parallelized Fuzzing module based on annotated source language is used for constructing the Fuzzing workflow based on human-machine collaboration, receiving the decision result output by the human-machine collaborative intelligent decision module, the decision result comprising task information, an initially recommended vulnerability mining model, a Fuzzing execution strategy and a human-machine interaction data format, and then realizing data interaction among computing nodes and dynamic adjustment of the parallel strategy through a sharing mechanism;
the human-in-the-loop hybrid enhancement module is used for providing humans, through a human-machine interaction channel, with a data path for guiding the vulnerability mining process, and for transmitting the experts' assessments and selections to the human-machine collaborative parallelized Fuzzing module in the human-machine interaction data format selected by the human-machine collaborative intelligent decision module based on the knowledge vector space, thereby deeply integrating human intelligence with the vulnerability mining process.
The invention also provides a large-scale Fuzzing optimization method based on human-machine collaboration, which is implemented using the above system.
(III) advantageous effects
Based on the idea of human-machine collaboration and targeting file-parsing objects such as text editing software and audio and video software, the invention designs a large-scale Fuzzing optimization system based on human-machine collaboration, constructs a large-scale Fuzzing technical framework based on human-machine collaboration, studies the vulnerability mining process in the human-machine collaborative mode, organically combines human computation with machine computing capability, and explores ways to improve overall vulnerability mining capability.
Drawings
FIG. 1 is a schematic diagram of the large-scale Fuzzing optimization system based on human-machine collaboration;
FIG. 2 is a block diagram of the human-machine collaborative intelligent decision model designed in the present invention;
FIG. 3 is an architecture diagram of the human-machine collaborative parallelized Fuzzing module designed in the present invention;
FIG. 4 is a schematic diagram of the human-in-the-loop hybrid enhancement module designed in the present invention.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.
As shown in FIG. 1, the large-scale Fuzzing optimization system based on human-machine collaboration provided by the invention comprises a human-machine collaborative intelligent decision module based on a knowledge vector space, a human-machine collaborative parallelized Fuzzing module based on annotated source language, and a human-in-the-loop hybrid enhancement module.
The human-machine collaborative intelligent decision module based on the knowledge vector space is used for abstracting elements such as human vulnerability mining knowledge and machine vulnerability mining capability, mapping these elements into a unified vector space, taking the human-machine knowledge as part of the input to task planning, and providing decision guidance for parallelized Fuzzing execution;
the human-machine collaborative parallelized Fuzzing module based on annotated source language is used for constructing the Fuzzing workflow based on human-machine collaboration, receiving the decision result output by the human-machine collaborative intelligent decision module, namely task information, an initially recommended vulnerability mining model, a Fuzzing execution strategy and a human-machine interaction data format, and then realizing data interaction among large-scale computing nodes and dynamic adjustment of the parallel strategy through a sharing mechanism;
the human-in-the-loop hybrid enhancement module is used for providing humans, through a human-machine interaction channel, with a data path for guiding the vulnerability mining process, and for transmitting the experts' assessments and selections to the human-machine collaborative parallelized Fuzzing module in the human-machine interaction data format selected by the human-machine collaborative intelligent decision module based on the knowledge vector space, thereby deeply integrating human intelligence with the vulnerability mining process.
Human-machine collaborative intelligent decision module based on the knowledge vector space:
At many stages of vulnerability mining, a controlling entity must make decisions to guide the direction of the next task. In the traditional vulnerability mining mode, the decision-making role is usually taken by technicians or experts, who judge and decide according to their own experience and the specific circumstances. The main problem with decisions based on expert experience is that insufficient consideration of the current state of the target leads to decisions that are not optimal. To address this problem, the invention converts the human vulnerability mining knowledge space into a machine-understandable vector space of a human-machine collaborative vulnerability mining knowledge base, uses the automatic exploration, discovery and associative learning capabilities of deep reinforcement learning to actively construct a vulnerability mining model suited to the target from that vector space, and makes the corresponding optimal parameter decisions. The vector space of the human-machine collaborative vulnerability mining knowledge base contains data such as machine-understandable vulnerability principles, behavior models of vulnerability existence, knowledge related to human vulnerability mining, and expert knowledge. Based on the existing knowledge vector space and expert experience, a strategy decision model is trained by deep reinforcement learning to obtain the optimal vulnerability mining strategy <task information, initially recommended vulnerability mining model, Fuzzing execution strategy, human-machine interaction data format>; the strategy result is transmitted to the human-machine collaborative parallelized Fuzzing module based on annotated source language and guides the machine cluster in executing the Fuzzing tasks.
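As a rough illustration of how the four-part strategy result described above might be passed from the decision module to the parallelized Fuzzing module, the following Python sketch models it as a small record serialized to JSON. The class name, field names, field types and the use of JSON are all assumptions made for illustration; the source only names the four components.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DecisionResult:
    """Four-part strategy tuple; field names and types are illustrative."""
    task_info: dict            # description of the vulnerability mining task
    recommended_model: str     # initially recommended vulnerability mining model
    execution_strategy: dict   # Fuzzing execution strategy parameters
    interaction_format: str    # human-machine interaction data format


def encode(result: DecisionResult) -> str:
    """Serialize the decision result for transmission to the Fuzzing module."""
    return json.dumps(asdict(result), sort_keys=True)


def decode(message: str) -> DecisionResult:
    """Rebuild the decision result on the receiving side."""
    return DecisionResult(**json.loads(message))


if __name__ == "__main__":
    # All values below are placeholders, not taken from the source.
    result = DecisionResult(
        task_info={"target": "demo-parser"},
        recommended_model="model-a",
        execution_strategy={"nodes": 4},
        interaction_format="json",
    )
    print(decode(encode(result)) == result)
```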
The composition of the human-machine collaborative intelligent decision model is shown in FIG. 2.
The human-machine collaborative intelligent decision model adopts a deep reinforcement learning framework. Based on the interaction between the knowledge base vector space and the target program, it selects the vulnerability mining knowledge that matches the target program through aggregation analysis and association matching, and automatically constructs a mining model. By simulating the way humans mine vulnerabilities, it uses the deep reinforcement learning framework to generate the optimal parameter configuration based on the degree of association between the knowledge base and the target information. Different vulnerability mining models require different parameter settings, and the quality of those settings directly affects the effectiveness of vulnerability mining.
Human-machine collaborative parallelized Fuzzing module based on annotated source language:
The invention integrates human knowledge and experience into the parallelized vulnerability mining process to guide the parallelized tasks of the machines, offering an approach to the problems of low dynamic human participation and low human-machine and machine-machine collaboration in large-scale concurrent scenarios. At present, in Fuzzing-based vulnerability mining the human and the machine work independently, and the machine's high computing performance is not effectively combined with human experience. Meanwhile, in existing parallelized Fuzzing techniques the parallel nodes work relatively independently and do not share high-value information, such as experience gained during vulnerability mining, among all nodes, so parallelization efficiency is low and the nodes perform a large amount of repeated work. The invention therefore designs a large-scale human-machine collaborative Fuzzing method based on a sharing mechanism, which achieves multi-dimensional sharing of knowledge and experience among large numbers of nodes. Through continuous iteration over the shared information, the parameters of each node are optimized and repeated work among the parallel nodes is reduced, thereby improving overall mining efficiency. The overall framework of the human-machine collaborative parallelized Fuzzing module is shown in FIG. 3.
The human-machine collaborative parallelized Fuzzing module based on annotated source language mainly comprises two parts, namely a server part and a machine cluster part. The server comprises eight modules, namely a parallel task allocation and starting module, a state updating module, a database module, a Crash analysis module, a bitmap monitoring and task adjusting module, a seed selection module, an anomaly analysis module and a server-side communication module; all nodes in the machine cluster are peers, and each node comprises three modules, namely a node-side communication module, a state monitoring module and a mutation and test module.
Specifically, the server part is mainly responsible for anomaly analysis and recording. The parallel task allocation and starting module decomposes and slices the vulnerability mining task so that the task is suitable for large-scale concurrent execution by the machine cluster, and issues the result of the task division (task description, slice index and task execution instruction) to the machine cluster through the server-side communication module; the seed selection module completes seed selection for Fuzzing based on the knowledge vector space of the invention combined with manual selection by experts; the anomaly analysis module and the Crash analysis module both handle anomalies arising during task execution: after expert analysis and judgment, if a Crash is determined to be a program or function anomaly, the anomaly information is transmitted to the state updating module, the database module and the bitmap monitoring and task adjusting module, which respectively update the task execution state, store the anomaly information and dynamically adjust task execution, after which the adjusted task execution information and instructions are sent to the machine cluster through the server-side communication module.
The mutation and test module of each node in the machine cluster part is responsible for executing the Fuzzing test task against the target object; the recording of information such as the task execution context and Crashes is completed through the state monitoring module, and information interaction with the server-side communication module is realized through the node-side communication module.
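The task division described above, whose result consists of a task description, a slice index and a task execution instruction, can be sketched as follows. This is only an illustrative Python sketch: the round-robin split, the `units` list (standing in for whatever work units the static analysis yields) and all names are assumptions, since the source does not specify the slicing algorithm.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TaskSlice:
    description: str          # task description
    slice_index: int          # index of this slice within the task
    instruction: str          # task execution instruction, e.g. "start"
    units: Tuple[str, ...]    # work units assigned to this slice (assumed)


def slice_task(description: str, units: Sequence[str], node_count: int,
               instruction: str = "start") -> List[TaskSlice]:
    """Split the work units round-robin into one slice per parallel node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [
        TaskSlice(description, i, instruction, tuple(units[i::node_count]))
        for i in range(node_count)
    ]


if __name__ == "__main__":
    # Hypothetical work units identified by static analysis.
    for s in slice_task("demo task", ["f1", "f2", "f3", "f4", "f5"], 2):
        print(s.slice_index, s.units)
```

A round-robin split keeps slice sizes within one unit of each other; a real allocator would likely also weigh the static analysis results, which this sketch does not attempt.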
(1) The parallel task allocation and starting module comprises: the module is responsible for statically analyzing a given target program, dividing fuzzy tasks according to a static analysis result and the number of parallel nodes, executing a starting command and sending the starting command to a fuzzy node cluster through a server-side communication module, and each node receives the distributed tasks and the starting command to start to execute fuzzy test.
(2) A state updating module: the module is mainly used for monitoring and updating the working and survival states of the server and the nodes, detecting whether each node is abnormal or offline in real time, and updating the overall running state of the system.
(3) A database module: the module is responsible for storing various system operation data, and mainly comprises a test case md5 checksum, each node bitmap, a total bitmap and the like.
(4) A blast analysis module: the module mainly manages the analysis task of crash generated in the vulnerability mining process and receives the feedback result from the expert.
(5) The bitmap monitoring and task adjusting module: the module is responsible for collecting bitmaps of all parallel nodes, updating the bitmaps within a fixed time period, maintaining a table in the database, storing the bitmaps collected by the module, observing the load condition of each node, and adjusting the fuzzy strategy of each node according to the load condition to realize load balance among the parallel nodes.
(6) A seed selection module: the module is responsible for generating a seed use case used by a fuzzy task and collecting running information from a fuzzy node cluster, wherein the running information comprises target program coverage rate, execution time, newly generated crash quantity and the like.
(7) An anomaly analysis module: this module handles unexpected situations, such as the test task of a node stopping unexpectedly. It analyzes the situation according to the information fed back by the communication module and determines the fault handling strategy to be executed.
(8) A server-side communication module: this module collects running information from the fuzzing node cluster, including test cases that lead to crashes, test cases that discover new paths, the md5 checksum of each newly generated test case, the program path coverage bitmap, the mutation strategies and mutation methods that lead to new findings, candidate character sets, and the like. The module also interacts with the other modules in the server: it submits the shared information from the node cluster to them, collects their feedback information, and sends that feedback to the node cluster.
(9) A node-side communication module: this module collects the running information of the node and sends it to the server, and receives the feedback information of the server and pushes it to the variation and test module of the node. In addition, the module interacts with the state monitoring module and sends the state information to the server.
(10) A state monitoring module: this module monitors the running condition of the fuzzing task, such as target program coverage, runtime exception information, running duration, new paths, crashes, and the like. It passes this information to the node-side communication module, which forwards it to the server for further processing and decision making.
(11) A variation and test module: this module performs the actual testing for the whole framework. Based on key information provided by experts, such as seed files, mutation strategies, and mutation targets, it carries out the mutation and test work, generates the various kinds of shared information, and reports the task execution status.
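The bookkeeping described for the database module and the bitmap monitoring module (per-node bitmaps, a total bitmap, and md5 checksums of test cases) can be illustrated with a minimal Python sketch. The names `merge_bitmaps` and `TestCaseStore`, the byte-per-entry bitmap layout, and the in-memory set are assumptions made for illustration; the source does not specify the data layout or the storage backend.

```python
import hashlib


def merge_bitmaps(total: bytearray, node_bitmap: bytes) -> int:
    """OR one node's coverage bitmap into the total bitmap.

    Returns the number of entries that were empty in the total bitmap
    and are now covered thanks to this node (hypothetical metric).
    """
    if len(total) != len(node_bitmap):
        raise ValueError("bitmaps must have the same size")
    newly_covered = 0
    for i, value in enumerate(node_bitmap):
        if value and not total[i]:
            newly_covered += 1
        total[i] |= value
    return newly_covered


class TestCaseStore:
    """Deduplicates test cases by md5 checksum (in-memory stand-in for the database)."""

    def __init__(self) -> None:
        self._checksums: set[str] = set()

    def add(self, data: bytes) -> bool:
        """Record a test case; return False if an identical one was already seen."""
        digest = hashlib.md5(data).hexdigest()
        if digest in self._checksums:
            return False
        self._checksums.add(digest)
        return True
```

In this sketch the server would call `merge_bitmaps` once per node on each collection interval, and the returned count gives one possible signal of how much new coverage each node is contributing.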
Human-in-the-loop hybrid enhancement module:
Based on human experience and analysis results, the deep learning model is continuously guided toward optimization, so that human computation and machine computation adapt to each other and work cooperatively. This improves the accuracy of the deep learning model in executing tasks and forms an enhanced form of intelligence in which "1 + 1 > 2". Traditional intelligent vulnerability mining techniques execute the vulnerability mining process based on predefined strategies or learned models, and no interaction channel for model optimization is designed between human and machine. In the present invention, a Markov decision process is selected as the basic model of reinforcement learning, the uncertainty and importance of each problem are evaluated, the set of problems requiring manual intervention is selected, and deeper human intelligence is introduced into the vulnerability mining process to guide the iterative optimization of the reinforcement learning model. The process is shown in Fig. 4:
The Markov decision process is used as the basic model for reinforcement learning in the present invention. The model can be represented as a quadruple <S, A, P, R>, where S represents the state the model is currently in; A represents the actions the model can select; P represents the probability matrix of transitions from the current state to the next state; and R represents the return function for selecting a certain action or series of actions in the current state. The return function is the feedback value for the series of actions taken by the model, and it guides each action selection during training of the reinforcement learning model. Because actions are selected many times, and it is difficult for a person to judge intuitively whether each individual step is reasonable, the items to be judged are handed to people at the granularity of a problem; this is the process of querying a human. To determine which problems require human intervention, two aspects must be considered: the uncertainty of the problem and the importance of the problem.
(1) Uncertainty: when judging whether a certain relation exists between two entities, a probability value in [0, 1] is given, and the closer the value is to either end (0 or 1), the more certain the conclusion. In the vulnerability mining scenario, when the reinforcement learning model selects actions, it produces a probability distribution over the actions, which ultimately expands into a distribution over selection paths. The more uneven this distribution is, the more confident the machine is about selecting or not selecting a path. Conversely, when the probabilities of the paths tend to be even, the machine can be considered unable to distinguish which path is the better choice. The probability distribution of the path selection is evaluated by calculating its entropy: the more even the distribution over paths, the larger the entropy, and vice versa. The formula for calculating entropy is defined as follows:
$H(T) = -\sum_{t_i \in T} p(t_i) \log p(t_i)$
where $p(t_i)$ denotes the probability of selecting path $t_i$.
(2) Importance: in addition to uncertainty, the present invention takes "importance" as a measure of whether a question should be handed to a human for judgment. For a given relationship, a question may be selected multiple times during the learning of the model, so that it carries a greater inference weight in class prediction. If the amount of inference information provided by the question does not match this weight, the final inference result may be affected, and the effect is more serious than a mismatch on a path with small weight. Therefore, the invention hands such important questions to manual judgment to ensure that they provide a more accurate amount of inference information. Specifically, given a relationship r, a plurality of inference paths $t_i \in T$ are obtained through the model, and importance is evaluated by calculating the maximum value of the cumulative probability:
$t_{selected} = \max\left(\sum p(t_i)\right)$
Finally, combining the two dimensions of uncertainty and importance, the rule for selecting the problem to hand to manual judgment is as follows: for each relationship, if $H(T) > c$, then the path $t_{selected}$ among the inference paths of that relationship is submitted to a human for judgment, where c is a constant.
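The selection rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the natural logarithm is assumed for the entropy, and the "cumulative probability" is interpreted as the sum of the probabilities with which each path was selected over repeated selections.

```python
import math
from collections import defaultdict


def path_entropy(probabilities):
    """Shannon entropy of a path-selection distribution (natural log assumed)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def most_important_path(selections):
    """Return (path, cumulative probability) for the path with the largest sum.

    `selections` is an iterable of (path_id, probability) pairs, one pair per
    time a path was selected during learning (an assumed record format).
    """
    cumulative = defaultdict(float)
    for path_id, probability in selections:
        cumulative[path_id] += probability
    return max(cumulative.items(), key=lambda item: item[1])


def question_for_human(probabilities, selections, c):
    """Apply the rule: if H(T) > c, hand t_selected to a human, else nothing."""
    if path_entropy(probabilities) > c:
        return most_important_path(selections)[0]
    return None
```

For instance, four equally likely paths give an entropy of ln 4 (about 1.386), so with c = 1.0 the path with the largest cumulative probability would be handed to a human, whereas a distribution dominated by one path falls below the threshold and is left to the machine.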
When a judgment is made manually, it is easier for a person to understand the reasoning relationship between the problem and the category and to give a judgment. The human judgment is fed back to the model by scoring the question; this is the process of the human answering the question. For example, if the model attempts to infer path C from paths A and B, and the uncertainty and importance criteria above indicate that the inference needs manual judgment, the model presents the inference process to a human. The person rates whether the current inference is reasonable with a score of 1 to 5. To ensure the accuracy of manual judgment, a given group of inference paths may be distributed to 1 to 3 persons, and the average score is then computed and fed back to the computing cluster model. Once the manual score is obtained, it is added to the return function of the path and the parameters of the model are retrained. Specifically, for a path whose manual judgment has been completed, a manual feedback term $R_{return}^{man}$ is added to the original return function. The manual feedback term is defined as follows:
$R_{return}^{man} = (Score - 3)^3$
where Score denotes the person's score for the inference path. For a path whose manual judgment has been completed, the return function is defined as follows:
$R_{return} = R'_{return} + \lambda_4 R_{return}^{man}$
where $R'_{return} = \lambda_1 R_{reachability} + \lambda_2 R_{length} + \lambda_3 R_{diversity}$ denotes the return function excluding manual feedback; $\lambda_1$, $\lambda_2$, and $\lambda_3$ are preset coefficients related to the sample characteristics; and $R_{reachability}$, $R_{length}$, and $R_{diversity}$ denote the reachability measure, the length factor, and the diversity score, respectively. If a path is selected repeatedly by the model, the previous manual judgment result is reused directly instead of requesting manual judgment again. Finally, the comprehensive return function of the path is calculated and used to update the parameters of the model.
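A minimal Python sketch of the return computation follows, assuming that the manual feedback term is the cube of (Score - 3) and that Score is the average of the scores given by 1 to 3 persons. The function names and the default coefficient values are hypothetical; the source only states that the coefficients are preset.

```python
from statistics import mean


def manual_feedback(scores):
    """Manual feedback term (Score - 3) ** 3, with Score the average rating.

    `scores` holds the 1-5 ratings given by 1 to 3 persons (per the text).
    The cubic form is an assumed reading of the formula.
    """
    if not 1 <= len(scores) <= 3:
        raise ValueError("expected scores from 1 to 3 persons")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("each score must be between 1 and 5")
    return (mean(scores) - 3) ** 3


def path_return(reachability, length, diversity, scores=None,
                lambdas=(1.0, 1.0, 1.0, 1.0)):
    """Comprehensive return of a path.

    Without scores, only the weighted sum of the three machine-side terms is
    returned; with scores, the weighted manual feedback term is added.
    The default lambdas are placeholders, not values from the source.
    """
    l1, l2, l3, l4 = lambdas
    base = l1 * reachability + l2 * length + l3 * diversity
    if scores is None:
        return base
    return base + l4 * manual_feedback(scores)
```

Under the cubic reading, a neutral score of 3 leaves the return unchanged, while scores near 1 or 5 shift it strongly (by -8 or +8 before weighting), so a clear human verdict outweighs a lukewarm one.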
In the above process, the model adjusts its reasoning according to the human scores: once the manual score is obtained, the model adds it to the return function of the path and retrains its parameters. In this way, human-in-the-loop hybrid enhanced intelligence is realized, and the role of the human in the vulnerability mining loop is strengthened. The scheme realizes a dialogue between human and machine, in which the machine poses questions and the human answers them, and the dialogue rests on the human and the machine sharing a common level of knowledge. By introducing the human into the computing loop of the intelligent system, the high-level human cognitive mechanisms for analyzing and responding to fuzzy and uncertain problems can be tightly coupled with the machine intelligence system. The two adapt to each other and work cooperatively, forming bidirectional information exchange and control, and the perception and cognitive ability of the human is combined with the strong computing and storage capacity of the computer to form an enhanced form of intelligence in which "1 + 1 > 2".
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.