Task scheduling method, device, equipment and storage medium
1. A method for task scheduling, comprising:
acquiring a relation graph for reflecting the dependency relationship among tasks from a relation graph editing area of a visual configuration interface;
parsing the relation graph to generate a configuration file adapted to a task scheduling system;
after the configuration file is uploaded to the task scheduling system, acquiring scheduling information of each task from a scheduling information configuration area of the visual configuration interface, wherein the scheduling information comprises execution time and alarm information of a project to which each task belongs;
and sending the scheduling information to the task scheduling system so that the task scheduling system executes task scheduling on each task based on the configuration file and the scheduling information.
2. The method of claim 1, wherein parsing the relationship graph to generate a configuration file adapted to a task scheduling system comprises:
generating a relational graph description corresponding to the relational graph according to a preset computer language format, wherein the relational graph description comprises a first part description used for indicating node information of each node and a second part description used for indicating node relation of each node, and any task has a unique corresponding node in the relational graph;
respectively acquiring respective node information of each node from the first part of description and respectively acquiring respective dependent node of each node from the second part of description;
and generating the configuration file adaptive to the task scheduling system according to the respective node information of each node and the respective dependent node of each node.
3. The method of claim 2, wherein generating a relationship graph description corresponding to the relationship graph according to a preset computer language format comprises:
identifying at least one node in the relational graph according to preset node attributes;
respectively acquiring respective node information of each node;
respectively identifying directed line segments which respectively have connection relations with the nodes;
respectively determining the respective dependent nodes of the nodes based on the direction of the directed line segment;
and describing the respective node information of each node and the respective dependent node of each node according to the preset computer language format to obtain the relational graph description.
4. The method of claim 2, wherein obtaining respective dependent nodes of the nodes from the second partial description comprises:
for any node, searching the second part description for the node relation whose target node is that node, to obtain a target node relation of the node;
acquiring an initial node in the target node relation;
and determining the initial node as a dependent node of the node.
5. The method of claim 2, wherein generating the configuration file adapted to the task scheduling system according to the respective node information of each node and the respective dependent node of each node comprises:
determining leaf nodes from the nodes;
creating a tail node which has a dependency relationship with the leaf node, wherein the dependent node of the tail node is the leaf node;
acquiring node information of the tail node, and generating a configuration file corresponding to the tail node according to the node information of the tail node and the dependent node of the tail node;
generating configuration files corresponding to the nodes according to the respective node information of the nodes and the respective dependent nodes of the nodes;
and determining the configuration file corresponding to the tail node and the configuration files corresponding to the nodes as the configuration files.
6. The method of claim 5, wherein determining a leaf node from the nodes comprises:
determining a non-leaf node from each node according to the respective dependent node of each node;
and determining each of the nodes other than the non-leaf node as the leaf node.
7. The method of claim 6, wherein determining a non-leaf node from said nodes based on a respective dependent node of said nodes comprises:
searching the second part description for the node relation of which the target node is not empty;
acquiring an initial node from the node relation of which the target node is not empty;
and taking the initial node as the non-leaf node.
8. A task scheduling apparatus, comprising:
the first obtaining unit is used for obtaining a relation graph for reflecting the dependency relationship among tasks from a relation graph editing area of a visual configuration interface;
the generating unit is used for analyzing the relation graph and generating a configuration file adaptive to the task scheduling system;
the second obtaining unit is used for obtaining the scheduling information of each task from a scheduling information configuration area of the visual configuration interface after the configuration file is uploaded to the task scheduling system, wherein the scheduling information comprises the execution time and the alarm information of the project to which each task belongs;
and the scheduling unit is used for sending the scheduling information to the task scheduling system so that the task scheduling system executes task scheduling on each task based on the configuration file and the scheduling information.
9. An electronic device, comprising: the system comprises a processor, a memory and a communication bus, wherein the processor and the memory are communicated with each other through the communication bus;
the memory for storing a computer program;
the processor, configured to execute the program stored in the memory, and implement the task scheduling method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method for task scheduling according to any one of claims 1 to 7.
Background
Azkaban is a workflow scheduling tool that integrates scheduling, task arrangement, failure retry, mail alerting and similar functions. In current practice, a user manually writes the configuration file of each task, packs all the configuration files and the required resource files into a zip file, and uploads the zip file to the Azkaban server through the user interface provided by the Azkaban server. The user then opens the scheduling information configuration page of each task on the Azkaban server through the user interface and configures the scheduling information page by page, switching between pages.
However, because the user has to write the configuration files and configure the scheduling information in several separate passes, the configuration process is cumbersome.
Disclosure of Invention
The application provides a task scheduling method, a task scheduling device and a task scheduling storage medium, which are used for solving the problem that a configuration process is complex in the related art.
In a first aspect, a task scheduling method is provided, which includes:
acquiring a relation graph for reflecting the dependency relationship among tasks from a relation graph editing area of a visual configuration interface;
parsing the relation graph to generate a configuration file adapted to a task scheduling system;
after the configuration file is uploaded to the task scheduling system, acquiring scheduling information of each task from a scheduling information configuration area of the visual configuration interface, wherein the scheduling information comprises execution time and alarm information of a project to which each task belongs;
and sending the scheduling information to the task scheduling system so that the task scheduling system executes task scheduling on each task based on the configuration file and the scheduling information.
Optionally, parsing the relationship graph to generate a configuration file adapted to the task scheduling system includes:
generating a relational graph description corresponding to the relational graph according to a preset computer language format, wherein the relational graph description comprises a first part description used for indicating node information of each node and a second part description used for indicating node relation of each node, and any task has a unique corresponding node in the relational graph;
respectively acquiring respective node information of each node from the first part of description and respectively acquiring respective dependent node of each node from the second part of description;
and generating the configuration file adaptive to the task scheduling system according to the respective node information of each node and the respective dependent node of each node.
Optionally, generating a relationship diagram description corresponding to the relationship diagram according to a preset computer language format, including:
identifying at least one node in the relational graph according to preset node attributes;
respectively acquiring respective node information of each node;
respectively identifying directed line segments which respectively have connection relations with the nodes;
respectively determining the respective dependent nodes of the nodes based on the direction of the directed line segment;
and describing the respective node information of each node and the respective dependent node of each node according to the preset computer language format to obtain the relational graph description.
Optionally, the obtaining the respective dependent node of each node from the second partial description includes:
for any node, searching the second part description for the node relation whose target node is that node, to obtain a target node relation of the node;
acquiring an initial node in the target node relation;
and determining the initial node as a dependent node of the node.
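The dependent-node lookup described above can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes the second part description has already been read into a list of (initial node, target node) pairs, and the names `Relation`, `RELATIONS` and `dependent_nodes` are hypothetical.

```python
from typing import List, NamedTuple


class Relation(NamedTuple):
    """One node relation from the second part description (assumed shape)."""
    initial: str  # node the directed line segment starts from
    target: str   # node the directed line segment points to


# Hypothetical relations: B and C depend on A, D depends on B and C, E on D.
RELATIONS = [
    Relation("A", "B"),
    Relation("A", "C"),
    Relation("B", "D"),
    Relation("C", "D"),
    Relation("D", "E"),
]


def dependent_nodes(node: str, relations: List[Relation]) -> List[str]:
    """Return the nodes that `node` depends on.

    Every relation whose target is `node` is a target node relation of the
    node; the initial node of each such relation is a dependent node.
    """
    return [rel.initial for rel in relations if rel.target == node]


deps_of_d = dependent_nodes("D", RELATIONS)
```

Under these assumptions a node with no incoming relation, such as A, simply yields an empty list.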
Optionally, generating the configuration file adapted to the task scheduling system according to the respective node information of each node and the respective dependent node of each node, includes:
determining leaf nodes from the nodes;
creating a tail node which has a dependency relationship with the leaf node, wherein the dependent node of the tail node is the leaf node;
acquiring node information of the tail node, and generating a configuration file corresponding to the tail node according to the node information of the tail node and the dependent node of the tail node;
generating configuration files corresponding to the nodes according to the respective node information of the nodes and the respective dependent nodes of the nodes;
and determining the configuration file corresponding to the tail node and the configuration files corresponding to the nodes as the configuration files.
Optionally, determining a leaf node from the nodes includes:
determining a non-leaf node from each node according to the respective dependent node of each node;
and determining each of the nodes other than the non-leaf node as the leaf node.
Optionally, determining a non-leaf node from the nodes according to the respective dependent node of each node includes:
searching the second part description for the node relation of which the target node is not empty;
acquiring an initial node from the node relation of which the target node is not empty;
and taking the initial node as the non-leaf node.
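The leaf-node determination and tail-node creation described above can be sketched as follows. This is only an illustration under stated assumptions: node relations are modelled as (initial node, target node) tuples, a relation with an empty target is written with `None`, and the function names and the tail node name `end` are hypothetical.

```python
from typing import List, Optional, Tuple

NodeRelation = Tuple[str, Optional[str]]  # (initial node, target node)


def find_leaf_nodes(nodes: List[str], relations: List[NodeRelation]) -> List[str]:
    """Return the nodes that are not the initial node of any relation
    whose target node is non-empty."""
    non_leaf = {initial for initial, target in relations if target}
    return [node for node in nodes if node not in non_leaf]


def add_tail_node(
    nodes: List[str],
    relations: List[NodeRelation],
    tail_name: str = "end",  # hypothetical name for the created tail node
) -> Tuple[List[str], List[NodeRelation]]:
    """Create a tail node whose dependent nodes are all the leaf nodes."""
    leaves = find_leaf_nodes(nodes, relations)
    tail_relations = [(leaf, tail_name) for leaf in leaves]
    return nodes + [tail_name], relations + tail_relations


# Hypothetical graph: B and C both depend on A, so B and C are leaves.
example_nodes = ["A", "B", "C"]
example_relations = [("A", "B"), ("A", "C")]
all_nodes, all_relations = add_tail_node(example_nodes, example_relations)
```

In this sketch the tail node ends up depending on both B and C, so the graph has a single terminal node after the call.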
In a second aspect, there is provided a task scheduling apparatus, including:
the first obtaining unit is used for obtaining a relation graph for reflecting the dependency relationship among tasks from a relation graph editing area of a visual configuration interface;
the generating unit is used for analyzing the relation graph and generating a configuration file adaptive to the task scheduling system;
the second obtaining unit is used for obtaining the scheduling information of each task from a scheduling information configuration area of the visual configuration interface after the configuration file is uploaded to the task scheduling system, wherein the scheduling information comprises the execution time and the alarm information of the project to which each task belongs;
and the scheduling unit is used for sending the scheduling information to the task scheduling system so that the task scheduling system executes task scheduling on each task based on the configuration file and the scheduling information.
In a third aspect, an electronic device is provided, comprising: the system comprises a processor, a memory and a communication bus, wherein the processor and the memory are communicated with each other through the communication bus;
the memory for storing a computer program;
the processor is configured to execute the program stored in the memory, and implement the task scheduling method according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the task scheduling method according to the first aspect.
Compared with the prior art, the technical solution provided by the embodiments of the application has the following advantages. A relationship graph reflecting the dependency relationship among the tasks is obtained from the relationship graph editing area of the visual configuration interface and parsed to generate a configuration file adapted to the task scheduling system. After the configuration file is uploaded to the task scheduling system, the scheduling information of the tasks is obtained from the scheduling information configuration area of the visual configuration interface and sent to the task scheduling system, so that the task scheduling system can schedule the tasks based on the configuration file and the scheduling information. Both the dependency relationships from which the configuration file is generated and the scheduling information are therefore configured in a single configuration process, through the relationship graph editing area and the scheduling information configuration area of the same visual configuration interface. This simplifies the configuration process, greatly improves the efficiency with which users publish scheduled tasks, and shortens the offline data development process.
In addition, because the scheduling information of the tasks is configured through a unified visual configuration interface, switching back and forth between multiple pages on Azkaban and the associated complex configuration operations are reduced.
In addition, because the relationship diagram reflecting the dependency relationship of each task can be configured in the relationship diagram editing area on the visual configuration interface, and the configuration file is obtained based on the analysis of the relationship diagram, the step of manually writing a plurality of configuration files by a user can be avoided, the time overhead of arranging tasks by the user is greatly reduced, and the trial-and-error times of arranging tasks are reduced or even avoided.
Finally, by associating online development tasks, the solution lets a user release a project as soon as its development is complete, providing a one-stop experience of development and release.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a schematic structural diagram of an Azkaban-based task scheduling system provided in an embodiment of the present application;
FIG. 2 is a diagram illustrating a relationship between tasks provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of a task scheduling method according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of another task scheduling method according to an embodiment of the present application;
FIG. 5 is a further relationship diagram provided by an embodiment of the present application;
FIG. 6 is a diagram illustrating a relationship between nodes according to an embodiment of the present application;
FIG. 7 is a diagram illustrating another example of a node relationship according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a task scheduling device according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In order to facilitate understanding of the embodiments of the present application, some terms referred to in the present application will be explained below.
Task scheduling: in computer science, scheduling is a method of assigning tasks to resources for execution. A task may be at least one task or a workflow in a project, and may include a shell script, a Java program, a MapReduce program, a Hive script, and the like. Tasks are related to one another in time, in order, and through upstream and downstream dependencies, and some tasks repeat periodically. Determining a running rule for all tasks and arranging the tasks to be executed according to that rule can be understood as task scheduling. Common task scheduling systems include Azkaban, Oozie, Cascading and the like, and their most basic functions are task definition and task arrangement. Task definition mainly determines the logic and rules of data calculation and processing, including the frequency of task execution, the specific execution time, and the corresponding execution script and parameters. Task arrangement mainly determines the precedence relationship of different tasks and ensures that the tasks are performed in an orderly and efficient manner. The output of task arrangement is a directed acyclic graph (DAG), which lets a user conveniently check the dependency relationships and execution status of the tasks and visually track the running process of the program.
Directed acyclic graph: in mathematics and computer science, a directed acyclic graph refers to a directed graph without loops. The relationship existing between nodes of the directed acyclic graph is directional and unidirectional, can only point from one node to another node, cannot reverse, and can go from any node to a plurality of nodes along the direction, but can never return to the starting point again, and the path that goes cannot form a loop.
Leaf node: nodes in a tree that do not have children are called leaf nodes, called leaves for short, and also called terminal nodes.
Project release: the process of configuring the scheduling information and task dependency relationships of a project on an offline data development platform and correspondingly generating a scheduling project on the task scheduling system.
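To make the "no loops" property of a directed acyclic graph concrete, the following sketch checks whether a set of task dependencies is acyclic. It uses Kahn's algorithm, a standard technique that the application itself does not name; the function name and the (initial node, target node) edge representation are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Tuple


def is_acyclic(nodes: List[str], edges: List[Tuple[str, str]]) -> bool:
    """Return True if the directed graph has no loop (Kahn's algorithm)."""
    indegree: Dict[str, int] = {node: 0 for node in nodes}
    successors: Dict[str, List[str]] = {node: [] for node in nodes}
    for initial, target in edges:
        successors[initial].append(target)
        indegree[target] += 1

    # Repeatedly remove nodes that nothing else points to.
    queue = deque(node for node in nodes if indegree[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # If some nodes could never be removed, they lie on a loop.
    return removed == len(nodes)


# Hypothetical dependency edges among five tasks.
dag_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
dag_ok = is_acyclic(["A", "B", "C", "D", "E"], dag_edges)
loop_ok = is_acyclic(["A", "B"], [("A", "B"), ("B", "A")])
```

A check of this kind could be run before a configuration file is generated, so that a closed loop among tasks is reported early.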
The following describes an application scenario related to the present application.
With the development of information technology, most data analysis and processing jobs comprise multiple data processing steps, such as data acquisition, transmission, calculation and display. The data processing algorithm of each step has to be submitted to a computing framework to run; some steps can be executed concurrently, while others depend on one another. Simple cases can be controlled manually, but when there are many tasks with complex relationships and no clear task plan, it is easy either to form a closed loop among the tasks, which causes errors, or to run serially tasks that could run in parallel, which wastes resources. Moreover, some tasks need to be executed at a specific point in time and some are periodic. Many task scheduling tools have therefore emerged to organize such complex execution plans and dispatch the tasks to a distributed computing framework for execution.
Azkaban is a mainstream task scheduling tool that can be used to schedule and monitor tasks, including managing program scripts, configuring dependency relationships between tasks, checking whether a program executes correctly, and raising an alarm and retrying when a program fails. It should be understood that Azkaban is not limited to the field of big data; it can be applied wherever task scheduling is involved. For example, a simple timed mail-sending task can be scheduled by setting a task execution time on Azkaban. Azkaban manages work at three levels, in order: project, workflow (flow) and task (job). A project comprises one or more workflows, and a workflow comprises a plurality of tasks. A job is a process run by Azkaban, and can be a simple Linux command, a Java program, a complex shell script, a MapReduce program, a Hive script, a Python script and the like. One job can depend on the running result of another job; jobs linked by dependency relationships form a workflow, within which a group of tasks runs in a specific order.
FIG. 1 shows an Azkaban-based task scheduling system 100 according to an embodiment of the present disclosure. The task scheduling system 100 may include a management server 101 (Azkaban Web Server), an execution server 102 (Azkaban Executor Server), and a relational database 103. The management server 101, the execution server 102, and the relational database 103 may be connected to each other via a network, which may be a wired network, a wireless network, or a mixture thereof.
The relational database 103 currently supports only the use of MySQL databases, requiring the Azkaban database to be created in the MySQL server and the initialization to be completed. The Azkaban system stores configuration file information and most state information in the MySQL database 103, and the management server 101 and the execution server 102 both need to access MySQL.
The management server 101 serves as a main manager of Azkaban, and functions include user login authentication, project creation, project management, uploading tasks, task timing, task execution state checking, historical task checking and the like. The user can access the management server 101 through the browser 104, and perform various management operations described above on a User Interface (UI) provided by the management server.
The execution server 102 is a node of an actual operation task in the whole task scheduling system, and is mainly responsible for submitting and executing the workflow, including a Hadoop scheduling task, a shell script scheduling task, a hive scheduling task, a single point fault and the like, and coordinating the execution of each task through the MySQL database 103.
That is, the Azkaban management server 101 acts as a dispatcher and distributes tasks to the execution server 102. The execution server 102 fetches the compressed file uploaded by the user from the project files (project_files) of the Azkaban database 103, decompresses it into a local projects folder, and finally submits the tasks to a thread pool; in essence, execution consists of placing each job in the thread pool to run.
Azkaban defines a key-value (properties) file format for establishing dependencies between tasks and provides an easy-to-use web user interface for maintaining and tracking workflows. First, a user creates a file with the extension .job; one job file represents one task. Every task needs a type (type) that indicates how it is to be executed; the default task types of Azkaban include command, java, etc. After the job type is defined, the parameters required by the task and their values are added to the file. One parameter that can be added is the dependencies parameter, which defines the job files on which this file depends: its value is the file names of those jobs without the extension, with multiple names separated by commas. Once all parameters are defined, the file is saved as a job file, which creates a job; if the dependencies parameter was added, the dependency relationships among the tasks are thereby configured and a workflow is formed. All the job files and the required resource files (such as a Java package, a Hive script file, a MapReduce program jar package and the like) are packed into a zip file, which is uploaded on the user interface provided by Azkaban. Azkaban decompresses the uploaded zip file and parses it into a directed acyclic graph made up of all the nodes; that is, a dependency graph of the tasks is presented on the web user interface, so that a user can conveniently check the dependency relationships among the tasks, the execution state of each task and the like. In the dependency graph, nodes with a dependency relationship are connected by solid lines. A node is gray by default, indicating that the task has not been executed; blue indicates that the task is being executed; green indicates that the task executed successfully; and red indicates that the task failed.
The user then chooses either to configure timed scheduling (schedule) or to execute immediately (execute). If timed scheduling is selected, then when the scheduled time point is reached the execution server 102 reads the configuration file from the Azkaban database 103 and downloads the required data locally. The execution server 102 then starts executing the workflow and continuously writes the execution status information of each task to the database 103, so that the status can be viewed through the web management server 101.
For example, FIG. 2 illustrates a relationship diagram between tasks. A user creates an A.job file locally, then creates a B.job file and adds the statement "dependencies=A" to it; that is, adding the dependencies parameter defines that task B depends on the execution result of task A. Similarly, the user creates a C.job file so that task C depends on task A, creates a D.job file so that task D depends on tasks B and C, and finally creates an E.job file so that task E depends on task D. After the 5 job files are saved, they are packed together with the resource files required by each task into a zip package (i.e., the configuration file). The user creates a project on the web management interface of Azkaban, which includes filling in a workflow name, remark information and the like, and uploads the zip package, whereby a workflow is created. The task scheduling system stores the configuration file in the Azkaban database 103 and finally presents the dependency graph between tasks shown in FIG. 2.
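As an illustration of the manual procedure in this example, the following sketch builds the five job files and packs them into a zip package in memory. It is a sketch under stated assumptions rather than part of the application: the `echo` commands are placeholders for real task commands, the helper names are hypothetical, resource files are omitted, and the dependency key is written as `dependencies`, the spelling used in Azkaban job files.

```python
import io
import zipfile
from typing import Dict, List


def job_file_text(command: str, dependencies: List[str]) -> str:
    """Render the key-value content of one job file."""
    lines = ["type=command", f"command={command}"]
    if dependencies:
        # Dependency names are job file names without the .job extension.
        lines.append("dependencies=" + ",".join(dependencies))
    return "\n".join(lines) + "\n"


def build_zip(jobs: Dict[str, List[str]]) -> bytes:
    """Pack one <name>.job file per task into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, deps in jobs.items():
            # Placeholder command; a real task would run its own script.
            archive.writestr(f"{name}.job", job_file_text(f"echo {name}", deps))
    return buffer.getvalue()


# Task dependencies as described for FIG. 2.
FIG2_JOBS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}
package = build_zip(FIG2_JOBS)
```

The resulting bytes correspond to the zip package that, in the manual flow, the user would upload through the Azkaban web interface.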
In the related art, a user manually writes the configuration file of each task, packs all the configuration files and the required resource files into a zip file, and uploads the zip file to the Azkaban server through the user interface provided by the Azkaban server. The user then opens the scheduling information configuration page of each task on the Azkaban server through the user interface and configures the scheduling information page by page, switching between pages. However, because the user has to write the configuration files and configure the scheduling information in several separate passes, the configuration process is cumbersome.
In order to solve the above technical problems in the related art, an embodiment of the present application provides a task scheduling method, which may be applied to an electronic device capable of communicating with the task scheduling system shown in fig. 1, where an offline data development platform capable of presenting a visual configuration interface is deployed in the electronic device, and when a project needs to be released, a user starts to operate the offline data development platform and performs scheduling-related configuration on the visual configuration interface provided by the offline data development platform.
As shown in fig. 3, the method may include the steps of:
Step 301, obtaining a relationship graph for reflecting the dependency relationship among the tasks from a relationship graph editing area of the visual configuration interface.
The relationship graph editing area is used to generate a relationship graph reflecting the dependency relationship among the tasks according to drag operations by the user. It comprises a directory tree area and a relationship graph generation area; the directory tree area comprises at least one node, and each node uniquely corresponds to one task. In this embodiment, the nodes in the directory tree area may be organized by project; that is, the directory tree area includes the nodes corresponding to tasks under the current project and may further include nodes corresponding to tasks under other projects. Alternatively, to keep the nodes general, the nodes in the directory tree area may be organized by category, so that tasks of the same category under different projects are represented by the same node. The nodes may also be organized according to other attributes or parameters, which is not specifically limited in this embodiment.
When a relational graph is generated according to the dragging operation of a user, acquiring at least two nodes dragged by the user from a directory tree region, and displaying the at least two nodes in a relational graph generation region; the method comprises the steps of obtaining two nodes which are indicated by a user from at least two nodes and can be connected through directed line segments, and displaying the directed line segments between the two nodes according to the indication of the user, wherein the directed line segments are used for representing the dependency relationship between the two nodes. For example, when a first node is connected to a second node by a directed line segment, it means that the second node depends on the operation result of the first node.
In this embodiment, the relationship graph includes, but is not limited to, a directed acyclic graph (DAG).
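When the relationship graph is a DAG, a quick acyclicity check before parsing can catch editing mistakes such as two tasks that depend on each other. The following is a minimal sketch using Python's standard `graphlib` module; the function name `execution_order` and the sample graph are assumptions made for illustration and are not part of the described method.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return one valid execution order, or raise ValueError on a cycle.

    `dependencies` maps each node to the set of nodes it depends on.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes forming the detected cycle.
        raise ValueError(f"relationship graph is not a DAG: {exc.args[1]}") from exc


# Hypothetical graph: B and C depend on A, and D depends on B and C.
deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
order = execution_order(deps)
```

A node always appears in `order` after every node it depends on, which matches the meaning of the directed line segments described above.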
Step 302, parsing the relational graph to generate a configuration file adapted to the task scheduling system.
When parsing the relational graph, the electronic device first generates a relational graph description corresponding to the relational graph according to a preset computer language format. Because the task scheduling system can neither identify the dependency relationship between tasks from a relational graph description in the computer language format nor directly create scheduling tasks from it, the relational graph description needs to be converted into a configuration file that the task scheduling system can recognize.
In a specific embodiment, to make it easy to look up the node information and the dependent nodes of a node when generating the configuration file, the relational graph description generated according to the preset computer language format is divided into a first part description for indicating the node information of each node and a second part description for indicating the node relation of each node.
Optionally, in this embodiment, the preset computer language format includes, but is not limited to, a JSON format. When the relational graph description is generated according to the JSON format, the relational graph description is embodied as a JSON character string conforming to the JSON format. In the JSON character string, "nodes" are used for describing nodes in the relational graph, and "edges" are used for describing the node relation of each node in the relational graph.
The "nodes" part includes the node information of each node, which comprises a node identifier, a node name, an input port number, an output port number and/or the coordinates of the node in the relational graph generation area. The "edges" part includes multiple sets of node relations; each set of node relations includes the node identifier of a starting node (srcNodeId) and the node identifier of a target node (dstNodeId), and indicates that the node identified by dstNodeId depends on the node identified by srcNodeId.
In this embodiment, the first part description corresponds to the "nodes" part of the JSON string, and the second part description corresponds to the "edges" part of the JSON string.
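As an illustration of the two-part layout, the sketch below builds a small relational graph description and serializes it to a JSON string. Only the keys "nodes", "edges", "srcNodeId" and "dstNodeId" come from the text above; the remaining field names (`id`, `name`, `inPorts`, `outPorts`, `x`, `y`) and all values are assumptions chosen for the example.

```python
import json

# First part description ("nodes") and second part description ("edges").
graph_description = {
    "nodes": [
        # Field names other than those quoted in the text are hypothetical.
        {"id": "101", "name": "first_task", "inPorts": 0, "outPorts": 1, "x": 40, "y": 20},
        {"id": "102", "name": "second_task", "inPorts": 1, "outPorts": 0, "x": 40, "y": 120},
    ],
    "edges": [
        # The node identified by dstNodeId depends on the node identified by srcNodeId.
        {"srcNodeId": "101", "dstNodeId": "102"},
    ],
}

json_string = json.dumps(graph_description, indent=2)
```

Keeping node information and node relations in separate lists means a later step can look up either one without scanning the other.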
Based on the above specific implementation of the relationship diagram description, the process of converting the relationship diagram into the relationship diagram description is essentially the process of the electronic device to identify the relationship diagram, and based on this, as shown in fig. 4, the present embodiment provides the following steps of generating the relationship diagram description:
Step 401, identifying at least one node in the relational graph according to preset node attributes;
The node attributes include, but are not limited to, the node name of the node, the shape of the node, or the type of its identifier.
Directed line segments in the relational graph usually have no name, whereas a node can usually take the task corresponding to it as its node name, so nodes and directed line segments can be distinguished by node name, thereby identifying the nodes in the relational graph. Alternatively, nodes in the relational graph are generally not drawn as line segments but as circles, ellipses and similar shapes, so nodes and directed line segments can be distinguished by shape. Alternatively, the node identifiers and the identifiers of the directed line segments may use different numbers of digits, for example three digits for node identifiers and two digits for directed line segment identifiers, so that nodes and directed line segments are distinguished by identifier type.
It should be noted that identifying nodes by node name, node shape, or identifier type are all optional implementations of this embodiment and do not limit the present application; other prior-art manners capable of identifying the nodes in the relational graph are also applicable to this embodiment.
Step 402, respectively obtaining respective node information of each node;
In an optional implementation manner, each node in the relational graph carries a node attribute, such as a node name or a node identifier, for uniquely identifying the node, so that node information of the node can be acquired according to the node attribute carried by the node.
In this embodiment, each task may be associated with the electronic device in advance, so that the electronic device can obtain the relevant information of each task, and therefore, after the node attribute in the relational graph is obtained, the relevant information of the task corresponding to the node may be obtained according to the node attribute, and the relevant information is determined as the node information of the node.
Step 403, respectively identifying directed line segments which respectively have connection relations with each node;
Step 404, respectively determining respective dependent nodes of each node based on the direction of the directed line segment;
For example, when it is identified that the first node is connected to the second node through a directed line segment that points from the first node to the second node, the first node is determined to be the node on which the second node depends.
Step 405, describing the respective node information of each node and the respective dependent node of each node according to a preset computer language format to obtain a relational graph description.
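Steps 401 to 405 can be sketched as follows, assuming the drawn graph is available as a list of elements and that nodes are told apart from directed line segments by shape (one of the optional attributes mentioned above). The element layout, the shape names `"ellipse"` and `"arrow"`, and the keys `from` and `to` are hypothetical.

```python
import json


def describe_graph(elements):
    """Build a JSON relational graph description from drawn elements."""
    nodes, edges = [], []
    for element in elements:
        if element["shape"] == "arrow":
            # Steps 403-404: a directed line segment; its direction gives the dependency.
            edges.append({"srcNodeId": element["from"], "dstNodeId": element["to"]})
        else:
            # Steps 401-402: any other shape is treated as a node.
            nodes.append({"id": element["id"], "name": element["name"]})
    # Step 405: describe node information and node relations in JSON.
    return json.dumps({"nodes": nodes, "edges": edges})


elements = [
    {"shape": "ellipse", "id": "101", "name": "first_task"},
    {"shape": "ellipse", "id": "102", "name": "second_task"},
    {"shape": "arrow", "from": "101", "to": "102"},
]
description = json.loads(describe_graph(elements))
```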
Referring to fig. 5, fig. 5 shows another relational graph of this embodiment. In the relational graph, each node carries a node name, namely create_table.hql, al_partition.hql, select.hql and insert_override_tab, so that each node in the relational graph can be identified according to its node name. In addition, the directed line segments connected to the create_table.hql node lead to the exchange_partition.hql node and the select.hql node respectively, pointing from the create_table.hql node to the exchange_partition.hql node and from the create_table.hql node to the select.hql node. It can therefore be determined that both the exchange_partition.hql node and the select.hql node depend on the running result of the create_table.hql node. The node on which the insert_overlay_tab node depends is determined by the same logic, which is not repeated here.
In a specific embodiment, when the configuration file is generated based on the above description of the relationship graph, specifically, the respective node information of each node is respectively obtained from the first part of description, and the respective dependent node of each node is respectively obtained from the second part of description; and generating a configuration file adaptive to the task scheduling system according to the respective node information of each node and the respective dependent node of each node.
Optionally, in this embodiment, each node relation in the second part description is expressed as a pair consisting of a starting node and a target node, where the target node depends on the starting node. Therefore, to obtain the dependent node of a given node from the second part description, the node relation whose target node is the given node (the target node relation) is searched for in the second part description; the starting node in that target node relation is acquired; and the starting node is determined as a dependent node of the given node. In other words, the starting node that the relational graph description pairs with a target node is written into the configuration file as the dependent node of that node, which makes the configuration file recognizable by the task scheduling system.
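The lookup described above amounts to filtering the node relations by target node. A minimal sketch, assuming the second part description has already been parsed into a list of dictionaries with the keys `srcNodeId` and `dstNodeId` (the function name and sample data are hypothetical):

```python
def dependent_nodes(node_id, edges):
    """Return the starting nodes of every node relation whose target is node_id."""
    return [edge["srcNodeId"] for edge in edges if edge.get("dstNodeId") == node_id]


# Hypothetical second part description: "102" depends on "101"; "103" on "101" and "102".
edges = [
    {"srcNodeId": "101", "dstNodeId": "102"},
    {"srcNodeId": "101", "dstNodeId": "103"},
    {"srcNodeId": "102", "dstNodeId": "103"},
]
deps_of_103 = dependent_nodes("103", edges)
deps_of_101 = dependent_nodes("101", edges)
```

A node that is never a target, such as "101" here, simply gets an empty list of dependent nodes.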
Optionally, in this embodiment, to reduce the complexity of task scheduling and improve scheduling efficiency, one workflow is set under one project, and the workflow includes each of the tasks described above. In the Azkaban task scheduling system, the name of a workflow is determined by the name of the tail node in the dependency relationship; if there are several tail nodes, the name of the workflow cannot be determined in advance, which makes task scheduling harder to maintain. Therefore, in this embodiment, a tail node is created based on the nodes in the relational graph description, and configuration files are generated for each node and for the tail node respectively.
In a specific implementation, leaf nodes are determined from all the nodes; a tail node having a dependency relationship with the leaf nodes is created, where the dependent nodes of the tail node are the leaf nodes; the node information of the tail node is acquired, and a configuration file corresponding to the tail node is generated according to the node information of the tail node and the dependent nodes of the tail node; a configuration file corresponding to each node is generated according to the node information of that node and the dependent nodes of that node; and the configuration file corresponding to the tail node together with the configuration files corresponding to the nodes is determined as the configuration file.
It should be understood that creating the tail node includes determining the dependent nodes of the tail node and generating the node information of the tail node, so that after the tail node is created, its node information can be acquired and the configuration file corresponding to the tail node can be generated based on that node information and the dependent nodes of the tail node.
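The tail node creation can be sketched as appending one node plus one node relation per leaf node. In the sketch below the leaf nodes are passed in as already determined; the function name `add_tail_node` and the tail node name `"end"` are assumptions for illustration.

```python
def add_tail_node(nodes, edges, leaf_ids, tail_id="end"):
    """Return new node and edge lists that include a single tail node.

    The tail node depends on every leaf node, so it becomes the only node
    without downstream nodes.
    """
    tail = {"id": tail_id, "name": tail_id}  # generated node information
    tail_edges = [{"srcNodeId": leaf, "dstNodeId": tail_id} for leaf in leaf_ids]
    return nodes + [tail], edges + tail_edges


nodes = [{"id": "A", "name": "A"}, {"id": "B", "name": "B"}, {"id": "C", "name": "C"}]
edges = [{"srcNodeId": "A", "dstNodeId": "B"}, {"srcNodeId": "A", "dstNodeId": "C"}]

# B and C are the leaf nodes of this hypothetical graph.
all_nodes, all_edges = add_tail_node(nodes, edges, ["B", "C"])
tail_dependencies = [e["srcNodeId"] for e in all_edges if e["dstNodeId"] == "end"]
```

With a single tail node, the workflow name is known in advance because it is the name chosen for that node.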
The Azkaban configuration file comprises three parameters: type, command and dependencies, where type represents the type of the node, command represents the command to be executed by the node, and dependencies represents the nodes on which the node depends. In this embodiment, type is fixed to "command", which indicates that the task corresponding to the node is a Shell command task. To determine the value of the dependencies parameter of a node, each set of node relations in the second part description is traversed to find the node relations whose target node is that node, and the starting nodes of those node relations are determined as the value of the dependencies parameter in the configuration file corresponding to the node. Each node in the relational graph description corresponds to one configuration file, and the node name of the node is the name of the configuration file.
Referring to fig. 6, fig. 6 is a schematic diagram of node relations in this embodiment. The figure includes three sets of node relations, namely edge1, edge2 and edge3; of the two dashed boxes in the figure, the node in the left dashed box is the starting node (srcNodeId) and the node in the right dashed box is the target node (dstNodeId). To determine the value of the dependencies parameter in the configuration file of node A, all edges whose "dstNodeId" is A are traversed, and the "srcNodeId" values of the three edges are found to be B, C and D, so that the value of the dependencies parameter in the configuration file of node A is determined as "B, C, D".
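The example of node A can be sketched as rendering a key=value job file from the three parameters. The parameter keys `type`, `command` and `dependencies` follow the Azkaban job file format; the helper name `render_job_file` and the command string are assumptions for illustration.

```python
def render_job_file(command, dependencies):
    """Render the text of a job configuration file for one node."""
    lines = ["type=command", f"command={command}"]
    if dependencies:
        # Dependent nodes are written as a comma-separated list.
        lines.append("dependencies=" + ",".join(dependencies))
    return "\n".join(lines) + "\n"


# Node relations as in the example: A depends on B, C and D.
edges = [
    {"srcNodeId": "B", "dstNodeId": "A"},
    {"srcNodeId": "C", "dstNodeId": "A"},
    {"srcNodeId": "D", "dstNodeId": "A"},
]
deps_of_a = [e["srcNodeId"] for e in edges if e["dstNodeId"] == "A"]
job_a = render_job_file("sh A.sh", deps_of_a)  # the command is hypothetical
job_b = render_job_file("sh B.sh", [])
```

A node without dependent nodes, such as B here, gets a file with only the type and command lines.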
In this embodiment, because obtaining the non-leaf nodes directly costs less and is computationally simpler than obtaining the leaf nodes directly, the leaf nodes are determined indirectly: the non-leaf nodes among all the nodes are obtained first and then removed from all the nodes, and the remaining nodes are the leaf nodes.
Optionally, to determine the non-leaf nodes, the node relations whose target node is not empty are searched for in the second part description; the starting node is acquired from each node relation whose target node is not empty; and that starting node is taken as a non-leaf node.
A target node is empty when it has no node identifier in the set of node relations, that is, when the target node does not actually exist in the relational graph; a target node that is not empty is one that actually exists in the relational graph.
Referring to fig. 7, fig. 7 is another schematic diagram of node relations in this embodiment. The figure includes four node relations, namely edge1, edge2, edge3 and edge4; of the two dashed boxes in the figure, the node in the left dashed box is the starting node (srcNodeId) and the node in the right dashed box is the target node (dstNodeId). To determine the leaf nodes in fig. 7, all the node relations are traversed; whenever the dstNodeId of a node relation is not empty, the srcNodeId of that node relation is a non-leaf node. Through this traversal, nodes A, B and C in fig. 7 are determined to be non-leaf nodes.
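The leaf node determination can be sketched as a set difference. The edge list below is a hypothetical set of four node relations in which A, B and C come out as non-leaf nodes; it is not a reproduction of fig. 7, and representing an empty target node as `None` is an assumption.

```python
def find_leaf_nodes(node_ids, edges):
    """Return the nodes that are not the starting node of any real node relation."""
    # A starting node is a non-leaf node when its target node is not empty.
    non_leaf = {edge["srcNodeId"] for edge in edges if edge.get("dstNodeId")}
    # Removing the non-leaf nodes from all nodes leaves the leaf nodes.
    return [node_id for node_id in node_ids if node_id not in non_leaf]


edges = [
    {"srcNodeId": "A", "dstNodeId": "B"},
    {"srcNodeId": "B", "dstNodeId": "C"},
    {"srcNodeId": "C", "dstNodeId": "D"},
    {"srcNodeId": "D", "dstNodeId": None},  # empty target node
]
leaves = find_leaf_nodes(["A", "B", "C", "D"], edges)
```

Only one pass over the node relations is needed to collect the non-leaf nodes, which is the reason given above for determining them first.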
Step 303, after the configuration file is uploaded to the task scheduling system, obtaining scheduling information of each task from a scheduling information configuration area of the visual configuration interface, where the scheduling information includes execution time of a project to which each task belongs and alarm information.
The alarm information includes, but is not limited to, duration of each task when executed, a notifier of the alarm, and/or information such as whether to call the alarm.
Optionally, in other embodiments of the present application, the scheduling information may further include a project description, the number of data-dependency failure retries and the retry interval, the number of single-script reruns and the rerun interval, and other information.
The project description may be the name of the project. The number of data-dependency failure retries and the retry interval work as follows: after a "task dependency" is set in a script, the task fails automatically if the data volume of the table/partition it depends on is 0; setting a retry mechanism prevents the task from failing outright when the dependent data is delayed, so that the task continues to run once the dependent data is ready. The number of single-script reruns and the rerun interval work as follows: if a single script fails, only that script is re-executed, and scripts that have already been executed successfully are not re-executed.
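The data-dependency retry behaviour can be sketched as a bounded polling loop. This is a minimal sketch under assumptions: the function name, its parameters, and the idea of passing the readiness check as a callable are illustrative and are not taken from the task scheduling system.

```python
import time


def wait_for_dependency(is_ready, max_retries, interval_seconds, sleep=time.sleep):
    """Check a data dependency, retrying up to max_retries times.

    Returns True as soon as is_ready() reports that the dependent data exists,
    or False when all retries are used up (the task then fails).
    """
    for attempt in range(max_retries + 1):
        if is_ready():
            return True
        if attempt < max_retries:
            sleep(interval_seconds)  # wait before the next retry
    return False


# Hypothetical dependency that becomes ready on the third check.
checks = {"count": 0}


def partition_has_data():
    checks["count"] += 1
    return checks["count"] >= 3


# A no-op sleep keeps the example fast; real use would keep the default.
ready = wait_for_dependency(partition_has_data, max_retries=3, interval_seconds=60, sleep=lambda s: None)
```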
And step 304, sending scheduling information to the task scheduling system so that the task scheduling system executes task scheduling on each task based on the configuration file and the scheduling information.
The task scheduling system provides a RESTful API (application programming interface) for interaction with external systems, so the scheduling information can be sent to the task scheduling system through this API.
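As a rough illustration of sending scheduling information over an HTTP API, the sketch below only builds a request object and does not send it. The text states only that the interface is RESTful; the endpoint path `/schedule`, the JSON body and every field name are hypothetical and do not describe the actual API of Azkaban or any other scheduling system.

```python
import json
import urllib.request


def build_schedule_request(base_url, scheduling_info):
    """Build (but do not send) an HTTP POST request carrying scheduling information."""
    body = json.dumps(scheduling_info).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/schedule",  # hypothetical endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical scheduling information: execution time plus alarm information.
scheduling_info = {
    "projectName": "demo_project",
    "executionTime": "02:00",
    "alarm": {"maxDurationMinutes": 30, "notify": ["owner@example.com"]},
}
request = build_schedule_request("http://scheduler.example.com", scheduling_info)
```

Sending the request would be a call to `urllib.request.urlopen(request)` against a real server address.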
According to the technical scheme provided by the embodiment of the application, the relation graph used for reflecting the dependency relationship among the tasks is obtained through the relation graph editing area of the visual configuration interface, the relation graph is analyzed, the configuration file adapted to the task scheduling system is generated, after the configuration file is uploaded to the task scheduling system, the scheduling information of the tasks is obtained through the scheduling information configuration area of the visual configuration interface, and the scheduling information is sent to the task scheduling system, so that the task scheduling system can execute task scheduling on the tasks based on the configuration file and the scheduling information. In this way, both the dependency relationship used to generate the configuration file and the scheduling information are configured in a single configuration process, through the relationship graph editing area and the scheduling information configuration area of the visual configuration interface. The configuration process is thereby simplified, the efficiency with which the user releases scheduling tasks is improved, and the offline data development process is shortened.
In addition, because the scheduling information of the tasks is configured through one unified visual configuration interface, switching back and forth between multiple pages on Azkaban and the associated complex configuration operations are reduced.
In addition, because the relationship diagram reflecting the dependency relationship of each task can be configured in the relationship diagram editing area on the visual configuration interface, and the configuration file is obtained based on the analysis of the relationship diagram, the step of manually writing a plurality of configuration files by a user can be avoided, the time overhead of arranging tasks by the user is greatly reduced, and the trial-and-error times of arranging tasks are reduced or even avoided.
Finally, by associating with online development tasks, the scheme allows the user to release a task as soon as its development is completed, providing a one-stop development and release experience.
Based on the same concept, an embodiment of the present application provides a task scheduling apparatus. For the specific implementation of the apparatus, reference may be made to the description of the method embodiments, and details already described are not repeated. As shown in fig. 8, the apparatus mainly includes:
a first obtaining unit 801, configured to obtain a relationship diagram used for reflecting a dependency relationship between tasks from a relationship diagram editing area of a visual configuration interface;
a generating unit 802, configured to parse the relationship graph and generate a configuration file adapted to the task scheduling system;
a second obtaining unit 803, configured to obtain scheduling information of each task from a scheduling information configuration area of the visual configuration interface after uploading the configuration file to the task scheduling system, where the scheduling information includes execution time of a project to which each task belongs and alarm information;
the scheduling unit 804 is configured to send scheduling information to the task scheduling system, so that the task scheduling system performs task scheduling on each task based on the configuration file and the scheduling information.
In other embodiments of the present application, the generating unit 802 is specifically configured to:
generating a relational graph description corresponding to the relational graph according to a preset computer language format, wherein the relational graph description comprises a first part description used for indicating node information of each node and a second part description used for indicating node relation of each node, and any task has a unique corresponding node in the relational graph;
respectively acquiring respective node information of each node from the first part of description and respectively acquiring respective dependent node of each node from the second part of description;
and generating a configuration file adaptive to the task scheduling system according to the respective node information of each node and the respective dependent node of each node.
In other embodiments of the present application, the generating unit 802 is specifically configured to:
identifying at least one node in the relational graph according to preset node attributes;
respectively acquiring respective node information of each node;
respectively identifying directed line segments which respectively have connection relations with each node;
respectively determining the respective dependent nodes of each node based on the direction of the directed line segment;
and describing the respective node information of each node and the respective dependent node of each node according to a preset computer language format to obtain the relational graph description.
In other embodiments of the present application, the generating unit 802 is specifically configured to:
for any node, searching a target node relation with the target node as the node from the second part of description;
acquiring an initial node in a target node relation;
and determining the initial node as a dependent node of the node.
In other embodiments of the present application, the generating unit 802 is specifically configured to:
determining leaf nodes from the nodes;
creating a tail node which has a dependency relationship with the leaf nodes, wherein the dependency node of the tail node is the leaf node;
generating a first type configuration file corresponding to the tail node according to the dependent node of the tail node; generating a second type configuration file according to the respective node information of each node and the respective dependent node of each node;
and determining the first type configuration file and the second type configuration file as configuration files.
In other embodiments of the present application, the generating unit 802 is specifically configured to:
determining non-leaf nodes from each node according to the respective dependent nodes of each node;
and determining the nodes except the non-leaf nodes in all the nodes as leaf nodes.
In other embodiments of the present application, the generating unit 802 is specifically configured to:
searching the node relation of which the target node is not empty from the second part of description;
acquiring an initial node from the node relation of which the target node is not empty;
the start node is taken as a non-leaf node.
Based on the same concept, an embodiment of the present application further provides an electronic device, as shown in fig. 9, the electronic device mainly includes: a processor 901, a memory 902 and a communication bus 903, wherein the processor 901 and the memory 902 communicate with each other via the communication bus 903. The memory 902 stores a program executable by the processor 901, and the processor 901 executes the program stored in the memory 902, so as to implement the following steps:
acquiring a relation graph for reflecting the dependency relationship among tasks from a relation graph editing area of a visual configuration interface;
analyzing the relation graph to generate a configuration file adaptive to the task scheduling system;
after the configuration file is uploaded to a task scheduling system, scheduling information of each task is obtained from a scheduling information configuration area of a visual configuration interface, wherein the scheduling information comprises execution time and alarm information of a project to which each task belongs;
and sending scheduling information to the task scheduling system so that the task scheduling system executes task scheduling on each task based on the configuration file and the scheduling information.
The communication bus 903 mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 903 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 9, but this does not indicate only one bus or one type of bus.
The Memory 902 may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Alternatively, the memory may be at least one storage device located remotely from the processor 901.
The Processor 901 may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc., and may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic devices, discrete gates or transistor logic devices, and discrete hardware components.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the task scheduling method.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, tapes, etc.), optical media (e.g., DVDs), or semiconductor media (e.g., solid state drives), among others.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.