Resource demand-aware multi-queue scheduling method, system and server
1. A resource demand aware multi-queue scheduling method is characterized in that: the method comprises the following steps:
acquiring a task submitted by a user, and judging whether the task is a CPU (central processing unit) task or a GPU (graphics processing unit) task;
when the task is a GPU task, determining the optimal CPU configuration by adjusting the number of CPU cores and checking the GPU utilization rate, and entering GPU task scheduling;
when the task is a CPU task, directly entering CPU task scheduling;
executing GPU task scheduling or CPU task scheduling:
dividing CPU resources, and adjusting a CPU resource queue according to the queuing conditions of a current CPU task queue and a GPU task queue;
dividing GPU resources, and adjusting a GPU resource queue according to the queuing condition of the current GPU task queue;
and eliminating contention between the GPU task and the CPU task for the memory system of the same node.
2. The resource demand aware multi-queue scheduling method of claim 1, wherein: one implementation manner of adjusting the number of CPU cores is as follows:
acquiring a CPU search initial point based on historical information;
determining an increase or decrease of the CPU search initial point according to one or more of pipeline information, weight information and CPU computation complexity provided by the user;
the adjusted CPU search initial point serves as the search initial point of the task.
3. The resource demand aware multi-queue scheduling method of claim 2, wherein determining the increase or decrease of the CPU search initial point according to one or more of pipeline information, weight information and CPU computation complexity provided by the user comprises:
if the task submitted by the user contains pipeline information and pipeline optimization is used, subtracting 1 from the CPU search initial point;
if the task submitted by the user contains weight information and the weight reaches a preset weight value, subtracting 1 from the CPU search initial point;
and if the task submitted by the user contains the CPU-side computation complexity and the complexity reaches a preset complexity, adding 1 to the CPU search initial point.
4. The resource demand aware multi-queue scheduling method according to claim 1 or 2, characterized in that one implementation of determining the optimal CPU configuration by adjusting the number of CPU cores and checking the GPU utilization includes:
acquiring the GPU utilization rate under the current CPU configuration based on the search initial point of the task;
cyclically subtracting 1 from the search initial point of the task until the GPU utilization rate no longer increases;
cyclically adding 1 to the search initial point of the task until the GPU utilization rate no longer increases;
and outputting, as the optimal CPU configuration, the CPU configuration at which the GPU utilization rate no longer improves.
5. The resource demand aware multi-queue scheduling method of claim 1, wherein the dividing of the CPU resources comprises: dividing the CPU resources into a CPU task CPU resource queue and a GPU task CPU resource queue; and the adjusting of the CPU resource queues according to the queuing conditions of the current CPU task queue and GPU task queue comprises:
when a CPU task arrives and the GPU task queue is idle, allowing the CPU task to preempt CPU resources in the GPU task CPU resource queue and pre-run;
when a GPU task arrives and needs to preempt CPU resources in the GPU task CPU resource queue, suspending the currently running CPU task so that the GPU task can run; and the suspended CPU task re-enters the head of the CPU task CPU resource queue to wait to be rescheduled.
6. The resource demand aware multi-queue scheduling method of claim 5, wherein the GPU task CPU resource queue is sized by the highest CPU resource required by historical GPU tasks.
7. The resource demand aware multi-queue scheduling method of claim 1, wherein the dividing of the GPU resources comprises: dividing GPU tasks into single-GPU tasks and multi-GPU tasks, and correspondingly dividing the GPU resources into a single-GPU resource queue and a multi-GPU resource queue; and one way of adjusting the GPU resource queues according to the queuing status of the current GPU task queues comprises:
when the GPUs in the multi-GPU resource queue are used up and the single-GPU task queue is empty, a multi-GPU task attempts to occupy GPUs in the single-GPU resource queue;
when the GPUs in the single-GPU resource queue are used up and the multi-GPU task queue is empty, a single-GPU task attempts to occupy GPUs in the multi-GPU resource queue.
8. The resource demand aware multi-queue scheduling method of claim 1, wherein one implementation of eliminating contention between the GPU task and the CPU task for the memory system on the same node comprises:
monitoring the bandwidth usage of all processes;
and when the bandwidth usage of a CPU task reaches a preset bandwidth value and the GPU utilization rate of the GPU task on the node drops by a preset value, compressing the number of CPU cores used by the CPU task to reduce its bandwidth usage.
9. A resource demand aware multi-queue scheduling system, characterized in that the system comprises:
a self-aware CPU allocation module, configured to acquire a task submitted by a user, judge whether the task is a CPU task or a GPU task, and, when the task is a GPU task, determine the optimal CPU configuration by adjusting the number of CPU cores and checking the GPU utilization rate, and enter GPU task scheduling;
a multi-queue resource adjustment module, configured to perform GPU task scheduling or CPU task scheduling: dividing CPU resources, and adjusting the CPU resource queues according to the queuing conditions of the current CPU task queue and GPU task queue; dividing GPU resources, and adjusting the GPU resource queues according to the queuing condition of the current GPU task queue;
and a real-time contention elimination module, configured to eliminate contention between the GPU task and the CPU task for the memory system on the same node.
10. A server, characterized in that the server comprises a CPU and a GPU; the CPU and the GPU are operable to perform the resource demand aware multi-queue scheduling method of any one of claims 1 to 8.
Background
Because deep learning has strong feature-representation capability, many fields have begun to adopt it to solve problems such as object recognition, speech recognition, and natural language processing. Deep learning training, however, demands significant computational power. To meet this demand, a large number of enterprises have begun introducing GPUs into their existing CPU clusters, which has turned the original homogeneous clusters into heterogeneous clusters containing both CPUs and GPUs. Meanwhile, in addition to traditional CPU tasks, these heterogeneous clusters now support GPU tasks, i.e., deep learning training tasks. For example, Facebook relies on CPUs to support deep learning inference tasks and traditional tasks, and on GPUs to support deep learning training tasks. The new heterogeneous clusters and new task scenarios present new challenges to the scheduling system.
Training a deep learning model is a complex process. In addition to relying on GPUs for the bulk of the computation, it also requires CPUs for necessary auxiliary work. First, while the GPU performs computation, the CPU is responsible for preparing the next batch of training data for the GPU. Second, the CPU is responsible for collecting the gradients computed by all GPUs, performing a unified update, and distributing the result back to all GPUs. Therefore, a GPU task must apply for a certain number of CPUs in addition to a certain number of GPUs.
Currently, because the CPU requirements of GPU tasks are not perceived, the CPU applications of existing GPU tasks generally fall into two categories: excessive and insufficient. When a GPU task applies for too many CPUs, other GPUs on the same node cannot accept tasks, causing GPU fragmentation, which reduces cluster throughput and prolongs the queuing time of other tasks. When a GPU task applies for too few CPUs, the task is likely to suffer a corresponding performance loss, which likewise reduces cluster throughput and increases the queuing time of other tasks.
Beyond unreasonable CPU applications by training tasks, GPU tasks must also compete with CPU tasks for the CPU resources of the entire cluster. Because the new heterogeneous cluster supports both CPU tasks and GPU tasks, the two must contend for the cluster's CPU resources. When a burst of CPU tasks arrives, GPU tasks may suffer long waits because all CPUs have been requested. In addition, CPU tasks and GPU tasks share the CPU-side memory system of a GPU node, such as the cache and the memory bandwidth. Even when a CPU task does not block the submission of a GPU task, the GPU task's performance may be severely degraded when the CPU task occupies a large share of the node's memory resources.
The DRF (Dominant Resource Fairness) scheduling algorithm was proposed by the University of California, Berkeley at NSDI 2011, and most mainstream systems today support it. DRF is an allocation strategy that maximizes the minimum dominant resource share across multiple resource types. In other words, in a multi-resource scheduling environment, DRF holds that a user's allocation should be determined by the user's dominant resource, i.e., the resource that, among all resources the user has applied for, accounts for the largest share of the corresponding total. The DRF scheduling algorithm then attempts to maximize the smallest dominant share among all users.
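For intuition, the following Python sketch shows the dominant-share computation at the heart of DRF; the data layout (per-user `used` dictionaries and cluster-wide `total` capacities) is an illustrative assumption, not a structure defined above.

```python
def dominant_share(used, total):
    """A user's dominant share is the largest fraction it holds of any resource."""
    return max(used[r] / total[r] for r in total)

def pick_next_user(users, total):
    """DRF serves next the user with the smallest dominant share."""
    return min(users, key=lambda u: dominant_share(u["used"], total))

total = {"cpu": 128, "gpu": 8}
users = [
    {"name": "A", "used": {"cpu": 16, "gpu": 4}},  # dominant share: 4/8 = 0.50 (GPU)
    {"name": "B", "used": {"cpu": 32, "gpu": 1}},  # dominant share: 32/128 = 0.25 (CPU)
]
print(pick_next_user(users, total)["name"])        # -> "B"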
The current DRF algorithm has three disadvantages. First, although DRF essentially guarantees fairness among users, because the CPU demand of a GPU task is unknown, DRF cannot avoid the throughput and queuing problems caused by unreasonable CPU applications of GPU tasks. Second, DRF cannot distinguish between the two task types, so CPU tasks may still occupy a large amount of CPU resources, causing heavy queuing of GPU tasks. Third, since the GPU is a coarse-grained resource, once a user applies for some amount of GPU, the GPU is very likely to become that user's dominant resource; CPU tasks subsequently submitted by the same user are then likely to encounter long queuing (as in the example above, where user A's 4 GPUs out of 8 already dominate its share), which in fact harms fairness among users.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, an object of the present invention is to provide a resource demand-aware multi-queue scheduling method, system and server for controlling the resource scheduling of CPU tasks and GPU tasks, improving throughput and reducing task queuing.
To achieve the above and other related objects, the present invention provides a resource demand aware multi-queue scheduling method, including: acquiring a task submitted by a user, and judging whether the task is a CPU (central processing unit) task or a GPU (graphics processing unit) task; when the task is a GPU task, determining the optimal CPU configuration by adjusting the number of CPU cores and checking the GPU utilization rate, and entering GPU task scheduling; when the task is a CPU task, directly entering CPU task scheduling; executing GPU task scheduling or CPU task scheduling: dividing CPU resources, and adjusting the CPU resource queues according to the queuing conditions of the current CPU task queue and GPU task queue; dividing GPU resources, and adjusting the GPU resource queues according to the queuing condition of the current GPU task queue; and eliminating contention between the GPU task and the CPU task for the memory system of the same node.
In an embodiment of the present invention, one implementation of the adjusting of the number of CPU cores is as follows: acquiring a CPU search initial point based on historical information; determining an increase or decrease of the CPU search initial point according to one or more of pipeline information, weight information and CPU computation complexity provided by the user; the adjusted CPU search initial point serves as the search initial point of the task.
In an embodiment of the present invention, the increase or decrease of the CPU search initial point is determined according to one or more of pipeline information, weight information and CPU computation complexity provided by the user as follows: if the task submitted by the user contains pipeline information and pipeline optimization is used, subtracting 1 from the CPU search initial point; if the task submitted by the user contains weight information and the weight reaches a preset weight value, subtracting 1 from the CPU search initial point; and if the task submitted by the user contains the CPU-side computation complexity and the complexity reaches a preset complexity, adding 1 to the CPU search initial point.
In an embodiment of the present invention, the implementation of determining the optimal CPU configuration by adjusting the number of CPU cores and checking the GPU utilization includes: acquiring the GPU utilization rate under the current CPU configuration based on the search initial point of the task; cyclically subtracting 1 from the search initial point of the task until the GPU utilization rate no longer increases; cyclically adding 1 to the search initial point of the task until the GPU utilization rate no longer increases; and outputting, as the optimal CPU configuration, the CPU configuration at which the GPU utilization rate no longer improves.
In an embodiment of the present invention, the dividing of the CPU resources includes: dividing the CPU resources into a CPU task CPU resource queue and a GPU task CPU resource queue; and the adjusting of the CPU resource queues according to the queuing conditions of the current CPU task queue and GPU task queue includes: when a CPU task arrives and the GPU task queue is idle, allowing the CPU task to preempt CPU resources in the GPU task CPU resource queue and pre-run; when a GPU task arrives and needs to preempt CPU resources in the GPU task CPU resource queue, suspending the currently running CPU task so that the GPU task can run; and the suspended CPU task re-enters the head of the CPU task CPU resource queue to wait to be rescheduled.
In an embodiment of the present invention, the GPU task CPU resource queue is sized by the highest CPU resource required by historical GPU tasks.
In an embodiment of the present invention, the dividing of the GPU resources includes: dividing GPU tasks into single-GPU tasks and multi-GPU tasks, and correspondingly dividing the GPU resources into a single-GPU resource queue and a multi-GPU resource queue; and one way of adjusting the GPU resource queues according to the queuing status of the current GPU task queues includes: when the GPUs in the multi-GPU resource queue are used up and the single-GPU task queue is empty, a multi-GPU task attempts to occupy GPUs in the single-GPU resource queue; when the GPUs in the single-GPU resource queue are used up and the multi-GPU task queue is empty, a single-GPU task attempts to occupy GPUs in the multi-GPU resource queue.
In an embodiment of the present invention, one implementation of eliminating contention between the GPU task and the CPU task for the memory system on the same node includes: monitoring the bandwidth usage of all processes; and when the bandwidth usage of a CPU task reaches a preset bandwidth value and the GPU utilization rate of the GPU task on the node drops by a preset value, compressing the number of CPU cores used by the CPU task to reduce its bandwidth usage.
The embodiment of the present invention further provides a resource demand aware multi-queue scheduling system, where the system includes: a self-aware CPU allocation module, configured to acquire a task submitted by a user, judge whether the task is a CPU task or a GPU task, and, when the task is a GPU task, determine the optimal CPU configuration by adjusting the number of CPU cores and checking the GPU utilization rate, and enter GPU task scheduling; a multi-queue resource adjustment module, configured to perform GPU task scheduling or CPU task scheduling: dividing CPU resources, and adjusting the CPU resource queues according to the queuing conditions of the current CPU task queue and GPU task queue; dividing GPU resources, and adjusting the GPU resource queues according to the queuing condition of the current GPU task queue; and a real-time contention elimination module, configured to eliminate contention between the GPU task and the CPU task for the memory system on the same node.
The embodiment of the invention also provides a server, which comprises a CPU and a GPU; when running, the CPU and the GPU implement the resource demand aware multi-queue scheduling method as described above.
As described above, the resource demand aware multi-queue scheduling method, system and server of the present invention have the following beneficial effects:
the method effectively controls the resource scheduling of CPU tasks and GPU tasks, improves throughput and reduces task queuing, maximizes system throughput and minimizes queuing without requiring user awareness, can indirectly provide scheduling support for prospective heterogeneous clusters, and provides task scheduling services for cloud environments based on heterogeneous clusters.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic overall flowchart of a resource demand aware multi-queue scheduling method in an embodiment of the present application.
Fig. 2 is a flowchart illustrating an implementation manner of determining an optimal CPU configuration in the resource demand aware multi-queue scheduling method according to an embodiment of the present application.
Fig. 3 is a schematic diagram illustrating an implementation process of the resource demand aware multi-queue scheduling method according to an embodiment of the present application.
FIG. 4 is a schematic block diagram of a resource demand aware multi-queue scheduling system according to an embodiment of the present application.
Fig. 5 is a schematic diagram illustrating an implementation process of a resource demand aware multi-queue scheduling system according to an embodiment of the present application.
Description of the element reference numerals
100 resource demand aware multi-queue scheduling system
110 self-sensing CPU distribution module
120 multi-queue resource adjustment module
130 real-time contention elimination module
S100 to S600 steps
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
The embodiment aims to provide a resource demand-aware multi-queue scheduling method, system and server for controlling the resource scheduling of CPU tasks and GPU tasks, improving throughput and reducing task queuing.
The principle and implementation of the resource demand aware multi-queue scheduling method, system and server of the present invention will be described in detail below, so that those skilled in the art can understand the resource demand aware multi-queue scheduling method, system and server of the present invention without creative labor.
Example 1
Specifically, as shown in fig. 1, this embodiment provides a resource demand-aware multi-queue scheduling method, where the resource demand-aware multi-queue scheduling method includes:
step S100, acquiring a task submitted by a user, and judging whether the task is a CPU (central processing unit) task or a GPU (graphics processing unit) task;
step S200, when the task is a CPU task, directly entering CPU task scheduling;
step S300, when the task is a GPU task, determining the optimal CPU configuration by adjusting the number of CPU cores and checking the GPU utilization rate, and entering GPU task scheduling;
step S400, dividing CPU resource partitions, and adjusting a CPU resource queue according to the queuing conditions of the current CPU task queue and the GPU task queue;
step S500, dividing GPU resource partitions, and adjusting a GPU resource queue according to the queuing condition of the current GPU task queue;
step S600, eliminating contention between the GPU task and the CPU task for the memory system on the same node.
The steps S100 to S600 of the resource demand aware multi-queue scheduling method of the present embodiment will be described in detail below with reference to the accompanying drawings.
And step S100, acquiring a task submitted by a user, and judging whether the task is a CPU task or a GPU task.
And step S200, when the task is a CPU task, directly entering CPU task scheduling.
And step S300, when the task is a GPU task, determining the optimal CPU configuration by adjusting the number of CPU cores and checking the GPU utilization rate, and entering GPU task scheduling.
Since the CPU demand of a GPU task is not known in advance, this embodiment automatically searches for the optimal CPU configuration for the GPU task. Through corresponding experiments, this embodiment makes three findings. First, under the same CPU configuration, the GPU utilization of a model training task, i.e., the GPU utilization of a GPU task, follows a trend similar to the task's performance and reaches its optimum at the same point. Second, the GPU utilization of a GPU task has a roughly linear relationship with the number of allocated CPUs. Third, GPU tasks can be simply categorized into three classes: image tasks, speech tasks, and natural language tasks. Tasks within each class perform similar CPU-side processing, so the GPU utilization curves within each class are also similar. Based on these three findings, this embodiment searches for the number of CPU cores at which a GPU task achieves optimal performance, thereby determining the task's optimal CPU configuration.
In this embodiment, one implementation of adjusting the number of CPU cores is as follows: acquiring a CPU search initial point based on historical information; determining an increase or decrease of the CPU search initial point according to one or more of pipeline information, weight information and CPU computation complexity provided by the user; the adjusted CPU search initial point serves as the search initial point of the task.
The increase or decrease of the CPU search initial point is determined according to one or more of the pipeline information, the weight information and the CPU computation complexity provided by the user as follows: if the task submitted by the user contains pipeline information and pipeline optimization is used, subtracting 1 from the CPU search initial point; if the task submitted by the user contains weight information and the weight reaches a preset weight value, subtracting 1 from the CPU search initial point; and if the task submitted by the user contains the CPU-side computation complexity and the complexity reaches a preset complexity, adding 1 to the CPU search initial point.
Specifically, as shown in fig. 2, a CPU search initial point is first obtained from historical information. Then, if the user provides pipeline information and pipeline optimization is used, the initial point is decreased by 1; if the user provides weight information and the weights are heavy, the initial point is decreased by 1; and if the user provides the CPU-side computation complexity and that complexity is high, the initial point is increased by 1. The search initial point of each task is thereby determined.
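As a concrete illustration, the heuristic just described might be sketched in Python as follows; the preset weight value and preset complexity are left unspecified above, so the two thresholds and the field names here are assumed example values.

```python
WEIGHT_THRESHOLD = 100_000_000   # assumed: ~100M parameters counts as "heavy"
COMPLEXITY_THRESHOLD = 0.5       # assumed: normalized CPU-side complexity score

def search_initial_point(history_point: int, task: dict) -> int:
    """Adjust the history-based starting CPU core count with user-provided hints."""
    point = history_point
    if task.get("pipeline_optimized"):                         # pipelining hides CPU-side work
        point -= 1
    if task.get("weight", 0) >= WEIGHT_THRESHOLD:              # heavy models are GPU-bound
        point -= 1
    if task.get("cpu_complexity", 0) >= COMPLEXITY_THRESHOLD:  # heavy preprocessing needs cores
        point += 1
    return max(point, 1)                                       # never start below one core
```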
In this embodiment, one implementation of determining the optimal CPU configuration by adjusting the number of CPU cores and checking the GPU utilization includes: acquiring the GPU utilization rate under the current CPU configuration based on the search initial point of the task; cyclically subtracting 1 from the search initial point of the task until the GPU utilization rate no longer increases; cyclically adding 1 to the search initial point of the task until the GPU utilization rate no longer increases; and outputting, as the optimal CPU configuration, the CPU configuration at which the GPU utilization rate no longer improves.
Searching for the optimal CPU configuration means finding the CPU configuration at which the GPU task obtains its maximum GPU utilization. Specifically, as shown in fig. 3, the task is first run for a preset time (for example, 90 seconds) at its determined search initial point, and the GPU utilization under that CPU configuration is measured. Next, one core is removed and the GPU utilization is checked: if it improves, another core is removed, and so on, until the optimal number of CPU cores is found; if it does not improve, the next stage is entered. In that stage, one core is added and the GPU utilization is checked: if it improves, another core is added; if not, the optimal number of CPU cores has been found.
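The two-stage search can be expressed compactly. The sketch below follows the flow just described, with `run_and_measure(cores)` standing in as a hypothetical helper that runs the task for the preset time (e.g. 90 seconds) on `cores` CPU cores and returns the observed GPU utilization.

```python
def find_optimal_cores(start: int, max_cores: int, run_and_measure) -> int:
    """Return the CPU core count at which GPU utilization stops improving."""
    best_cores, best_util = start, run_and_measure(start)

    # Stage 1: remove cores one at a time while GPU utilization keeps rising.
    cores = start - 1
    while cores >= 1:
        util = run_and_measure(cores)
        if util <= best_util:
            break
        best_cores, best_util = cores, util
        cores -= 1
    if best_cores < start:        # shrinking helped, so the optimum was found
        return best_cores

    # Stage 2: otherwise add cores one at a time while utilization keeps rising.
    cores = start + 1
    while cores <= max_cores:
        util = run_and_measure(cores)
        if util <= best_util:
            break
        best_cores, best_util = cores, util
        cores += 1
    return best_cores
```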
The embodiment of the invention mainly solves the problem of CPU resource contention between GPU tasks and CPU tasks, and the problem of GPU resource contention between GPU tasks with different configurations.
The present embodiment performs the CPU resource partitioning and adjustment first, and then the GPU resource partitioning and adjustment.
And step S400, dividing the CPU resource partition, and adjusting the CPU resource queue according to the queuing conditions of the current CPU task queue and the GPU task queue.
In this embodiment, the dividing of the CPU resources includes: dividing the CPU resources into a CPU task CPU resource queue and a GPU task CPU resource queue; and the adjusting of the CPU resource queues according to the queuing conditions of the current CPU task queue and GPU task queue includes: when a CPU task arrives and the GPU task queue is idle, allowing the CPU task to preempt CPU resources in the GPU task CPU resource queue and pre-run; when a GPU task arrives and needs to preempt CPU resources in the GPU task CPU resource queue, suspending the currently running CPU task so that the GPU task can run; and the suspended CPU task re-enters the head of the CPU task CPU resource queue to wait to be rescheduled.
In this embodiment, the GPU task CPU resource queue is sized by the highest CPU resource required by historical GPU tasks.
Step S400 of this embodiment is responsible for partitioning and adjusting the CPU resources of GPU tasks and CPU tasks. Specifically, as shown in fig. 4, tasks are first divided into CPU tasks and GPU tasks, and the CPU resources are then divided into a CPU task CPU resource queue and a GPU task CPU resource queue, where the GPU task CPU resource queue holds the CPU resources reserved for GPU tasks. The division of the resource queues is determined by statistics of historical tasks: the GPU task CPU resource queue is sized by the highest CPU resource required by historical GPU tasks. For the CPU task queue, this embodiment employs the DRF scheduling algorithm and schedules CPUs according to each user's usage. For the GPU task queue, this embodiment also employs the DRF scheduling algorithm, schedules GPUs according to each user's GPU usage, and determines the CPU usage of each GPU task from its optimal CPU configuration.
In actual operation, when a burst of CPU tasks arrives and the GPU task queue is idle, the CPU tasks are allowed to preempt CPU resources in the GPU task CPU resource queue and pre-run. When a GPU task arrives and needs the occupied CPU resources, the running CPU task is suspended so that the GPU task can run, and the suspended CPU task re-enters the head of its queue to wait to be rescheduled on a suitable node. Conversely, when GPU tasks arrive in a burst, they may preempt the resources of the CPU task CPU resource queue.
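The borrowing-and-preemption rule can be sketched as follows; the `Task` shape, core counts, and method names are illustrative assumptions, not interfaces defined above.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    cores: int                               # CPU cores the task occupies

class CpuPartitions:
    """Two CPU partitions: one for CPU tasks, one reserved for GPU tasks."""

    def __init__(self, cpu_cores: int, gpu_reserved_cores: int):
        self.cpu_free = cpu_cores            # partition for ordinary CPU tasks
        self.gpu_free = gpu_reserved_cores   # partition reserved for GPU tasks
        self.borrowers: list[Task] = []      # CPU tasks pre-running on borrowed cores

    def try_run_cpu_task(self, task: Task, gpu_queue_idle: bool) -> bool:
        if self.cpu_free >= task.cores:
            self.cpu_free -= task.cores
            return True
        if gpu_queue_idle and self.gpu_free >= task.cores:
            self.gpu_free -= task.cores      # pre-run on the reserved partition
            self.borrowers.append(task)
            return True
        return False                         # stay in the CPU task queue

    def claim_cores_for_gpu_task(self, need: int, cpu_queue: deque) -> None:
        # Suspend borrowing CPU tasks until the GPU task's CPU demand is met;
        # assumes the reservation plus reclaimed cores can satisfy `need`.
        while self.gpu_free < need and self.borrowers:
            victim = self.borrowers.pop()
            self.gpu_free += victim.cores
            cpu_queue.appendleft(victim)     # back to the head of its queue
        self.gpu_free -= need
```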
And step S500, dividing GPU resource partitions, and adjusting a GPU resource queue according to the queuing condition of the current GPU task queue.
In this embodiment, the dividing of the GPU resources includes: dividing GPU tasks into single-GPU tasks and multi-GPU tasks, and correspondingly dividing the GPU resources into a single-GPU resource queue and a multi-GPU resource queue; and one way of adjusting the GPU resource queues according to the queuing status of the current GPU task queues includes: when the GPUs in the multi-GPU resource queue are used up and the single-GPU task queue is empty, a multi-GPU task attempts to occupy GPUs in the single-GPU resource queue; when the GPUs in the single-GPU resource queue are used up and the multi-GPU task queue is empty, a single-GPU task attempts to occupy GPUs in the multi-GPU resource queue.
Step S500 of this embodiment is responsible for partitioning and adjusting the GPU resources of single-GPU tasks and multi-GPU tasks. First, GPU tasks are divided into single-GPU tasks, which apply for fewer than 4 GPUs, and multi-GPU tasks, which apply for 4 or more GPUs. Second, the GPU resources are divided into a single-GPU resource queue and a multi-GPU resource queue, corresponding to the single-GPU and multi-GPU task queues. When the GPUs in the multi-GPU resource queue are used up and the single-GPU task queue is empty, multi-GPU tasks attempt to preempt GPUs from the single-GPU resource queue. Conversely, when the GPUs in the single-GPU resource queue are used up and the multi-GPU task queue is empty, single-GPU tasks attempt to preempt GPUs from the multi-GPU resource queue.
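A minimal sketch of the pool-selection rule, using the 4-GPU threshold from the text; the function and parameter names are assumptions for illustration.

```python
def choose_gpu_pool(task_gpus: int, single_free: int, multi_free: int,
                    single_queue_empty: bool, multi_queue_empty: bool):
    """Return the pool a task should draw GPUs from, or None to keep queuing."""
    if task_gpus >= 4:                       # multi-GPU task
        if multi_free >= task_gpus:
            return "multi"
        if single_queue_empty and single_free >= task_gpus:
            return "single"                  # borrow from the idle single-GPU pool
    else:                                    # single-GPU task
        if single_free >= task_gpus:
            return "single"
        if multi_queue_empty and multi_free >= task_gpus:
            return "multi"                   # borrow from the idle multi-GPU pool
    return None                              # wait in the corresponding task queue
```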
Step S600, eliminating contention between the GPU task and the CPU task for the memory system on the same node.
In this embodiment, one implementation of eliminating contention between the GPU task and the CPU task for the memory system on the same node includes: monitoring the bandwidth usage of all processes; and when the bandwidth usage of a CPU task reaches a preset bandwidth value and the GPU utilization rate of the GPU task on the node drops by a preset value, compressing the number of CPU cores used by the CPU task to reduce its bandwidth usage.
Specifically, step S600 is responsible for eliminating contention between GPU tasks and CPU tasks for the memory system on the same node. Because CPU tasks are diverse and their bandwidth usage cannot be predicted, this embodiment monitors the bandwidth usage of all processes through Intel MBM. When the bandwidth usage of a CPU task exceeds 75 percent and the GPU utilization of the GPU task on the node drops correspondingly, the number of CPU cores used by the CPU task is halved, reducing its bandwidth usage and thus its impact on the GPU task. The freed CPU cores are not scheduled with new CPU tasks until the GPU task completes.
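The monitoring loop might look like the following sketch. `bandwidth_share()` and `gpu_utilization()` are placeholders for readings that would come from Intel MBM (e.g. via Linux resctrl) and from the GPU driver, and `throttle_to()` stands for whatever mechanism (e.g. a cpuset) pins the task to fewer cores. The 75% trigger and the halving policy come from the text; the utilization-drop threshold is an assumed example value.

```python
import time

BANDWIDTH_LIMIT = 0.75   # trigger: CPU task uses >= 75% of node memory bandwidth
UTIL_DROP = 0.10         # assumed: GPU utilization drop that counts as interference

def contention_monitor(node, bandwidth_share, gpu_utilization, throttle_to):
    """Throttle bandwidth-hungry CPU tasks that degrade co-located GPU tasks."""
    baseline = gpu_utilization(node)
    while node.gpu_task_running:
        for task in node.cpu_tasks:
            if (bandwidth_share(task) >= BANDWIDTH_LIMIT
                    and gpu_utilization(node) <= baseline - UTIL_DROP):
                throttle_to(task, max(task.cores // 2, 1))  # halve the core count
        time.sleep(1)                                       # poll once per second
```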
To further illustrate the resource demand-aware multi-queue scheduling method of the present embodiment, as shown in fig. 4, its specific execution flow is described below.
1) The user submits a task: the user writes a program as required by calling the corresponding APIs.
2) Judging the task type: if the task is a CPU task, it is scheduled to the CPU task queue; if it is a GPU task, it is scheduled to the GPU task queue.
3) CPU task scheduling: scheduling CPU tasks according to the number of CPUs each task applies for and the available resources in the CPU task CPU resource queue.
4) Determining the search initial point of a GPU task: determining the initial point of the task's optimal CPU configuration search according to the task's relevant information.
5) Searching the optimal CPU configuration of the GPU task: searching for the optimal CPU configuration according to the flow designed for the optimal CPU configuration search submodule.
6) GPU task scheduling: scheduling GPU tasks according to the number of GPUs each task applies for and the available resources in the GPU resource queues.
7) Dynamic adjustment of the CPU resource queues: determining in real time whether the division of CPU resources needs to be adjusted, according to the queuing conditions of the current CPU task queue and GPU task queue.
8) Dynamic adjustment of the GPU resource queues: determining in real time whether the division of GPU resources needs to be adjusted, according to the queuing conditions of the current single-GPU task queue and multi-GPU task queue.
9) Real-time CPU-side resource contention elimination: eliminating contention between the GPU task and the CPU task for the memory system on the same node by monitoring the memory bandwidth used by CPU tasks. When the bandwidth used by a CPU task exceeds a certain threshold, the number of CPU cores used by the task is limited, reducing the task's demand on bandwidth.
10) Task execution: the task starts to execute on the corresponding node, and the result is returned after execution finishes.
Therefore, the resource demand aware multi-queue scheduling method of this embodiment can effectively control the resource scheduling of CPU tasks and GPU tasks, improve throughput, reduce task queuing, and maximize system throughput while minimizing queuing, all without requiring user awareness.
Example 2
As shown in fig. 4, the present embodiment provides a resource demand-aware multi-queue scheduling system 100, where the system 100 includes: a self-aware CPU allocation module 110, a multi-queue resource adjustment module 120, and a real-time contention elimination module 130.
Specifically, in this embodiment, as shown in fig. 5, the self-aware CPU allocation module 110 is configured to acquire a task submitted by a user, judge whether the task is a CPU task or a GPU task, and, when the task is a GPU task, determine the optimal CPU configuration by adjusting the number of CPU cores and checking the GPU utilization rate, and enter GPU task scheduling.
The self-aware CPU allocation module 110 includes an initial point search unit and an optimal CPU configuration unit.
The initial point search unit acquires a CPU search initial point based on historical information, determines an increase or decrease of the CPU search initial point according to one or more of pipeline information, weight information and CPU computation complexity provided by the user, and uses the adjusted CPU search initial point as the search initial point of the task.
The increase or decrease of the CPU search initial point is determined according to one or more of the pipeline information, the weight information and the CPU computation complexity provided by the user as follows: if the task submitted by the user contains pipeline information and pipeline optimization is used, subtracting 1 from the CPU search initial point; if the task submitted by the user contains weight information and the weight reaches a preset weight value, subtracting 1 from the CPU search initial point; and if the task submitted by the user contains the CPU-side computation complexity and the complexity reaches a preset complexity, adding 1 to the CPU search initial point.
Specifically, a CPU search initial point is obtained from historical information. Then, if the user provides pipeline information and pipeline optimization is used, the initial point is decreased by 1; if the user provides weight information and the weights are heavy, the initial point is decreased by 1; and if the user provides the CPU-side computation complexity and that complexity is high, the initial point is increased by 1. The search initial point of each task is thereby determined.
The optimal CPU configuration unit acquires the GPU utilization rate under the current CPU configuration based on the search initial point of the task, cyclically subtracts 1 from the search initial point until the GPU utilization rate no longer increases, and then cyclically adds 1 to the search initial point until the GPU utilization rate no longer increases; the CPU configuration at which the GPU utilization rate no longer improves is output as the optimal CPU configuration.
Specifically, the task is first run for a preset time (for example, 90 seconds) at its determined search initial point, and the GPU utilization under that CPU configuration is measured. Next, one core is removed and the GPU utilization is checked: if it improves, another core is removed, and so on, until the optimal number of CPU cores is found; if it does not improve, the next stage is entered. In that stage, one core is added and the GPU utilization is checked: if it improves, another core is added; if not, the optimal number of CPU cores has been found.
In this embodiment, as shown in fig. 5, the multi-queue resource adjustment module 120 is configured to perform GPU task scheduling or CPU task scheduling. The multi-queue resource adjustment module 120 includes a CPU resource adjustment unit and a GPU resource adjustment unit.
The CPU resource adjustment unit divides the CPU resources and adjusts the CPU resource queues according to the queuing conditions of the current CPU task queue and GPU task queue.
The CPU resource adjustment unit is responsible for partitioning and adjusting the CPU resources of GPU tasks and CPU tasks. Specifically, the multi-queue resource adjustment module 120 first divides tasks into CPU tasks and GPU tasks, and then divides the CPU resources into a CPU task CPU resource queue and a GPU task CPU resource queue, where the GPU task CPU resource queue holds the CPU resources reserved for GPU tasks. The division of the resource queues is determined by statistics of historical tasks: the GPU task CPU resource queue is sized by the highest CPU resource required by historical GPU tasks. For the CPU task queue, the CPU resource adjustment unit employs the DRF scheduling algorithm and schedules CPUs according to each user's usage. For the GPU task queue, the unit also employs the DRF scheduling algorithm, schedules GPUs according to each user's GPU usage, and determines the CPU usage of each GPU task from its optimal CPU configuration.
In actual operation, when a burst of CPU tasks arrives and the GPU task queue is idle, the CPU tasks are allowed to preempt CPU resources in the GPU task CPU resource queue and pre-run. When a GPU task arrives and needs the occupied CPU resources, the running CPU task is suspended so that the GPU task can run, and the suspended CPU task re-enters the head of its queue to wait to be rescheduled on a suitable node. Conversely, when GPU tasks arrive in a burst, they may preempt the resources of the CPU task CPU resource queue.
The GPU resource adjustment unit divides the GPU resources and adjusts the GPU resource queues according to the queuing condition of the current GPU task queue.
In this embodiment, the dividing of the GPU resources includes: dividing GPU tasks into single-GPU tasks and multi-GPU tasks, and correspondingly dividing the GPU resources into a single-GPU resource queue and a multi-GPU resource queue; and one way of adjusting the GPU resource queues according to the queuing status of the current GPU task queues includes: when the GPUs in the multi-GPU resource queue are used up and the single-GPU task queue is empty, a multi-GPU task attempts to occupy GPUs in the single-GPU resource queue; when the GPUs in the single-GPU resource queue are used up and the multi-GPU task queue is empty, a single-GPU task attempts to occupy GPUs in the multi-GPU resource queue.
In this embodiment, the GPU resource adjustment unit is responsible for partitioning and adjusting the GPU resources of single-GPU tasks and multi-GPU tasks. First, GPU tasks are divided into single-GPU tasks, which apply for fewer than 4 GPUs, and multi-GPU tasks, which apply for 4 or more GPUs. Second, the GPU resources are divided into a single-GPU resource queue and a multi-GPU resource queue, corresponding to the single-GPU and multi-GPU task queues. When the GPUs in the multi-GPU resource queue are used up and the single-GPU task queue is empty, multi-GPU tasks attempt to preempt GPUs from the single-GPU resource queue. Conversely, when the GPUs in the single-GPU resource queue are used up and the multi-GPU task queue is empty, single-GPU tasks attempt to preempt GPUs from the multi-GPU resource queue.
In this embodiment, the real-time contention elimination module 130 is configured to eliminate contention between the GPU task and the CPU task for the memory system on the same node.
Specifically, in this embodiment, as shown in fig. 5, the real-time contention elimination module 130 monitors the bandwidth usage of all processes; when the bandwidth usage of a CPU task reaches a preset bandwidth value and the GPU utilization rate of the GPU task on the node drops by a preset value, the module compresses the number of CPU cores used by the CPU task and reduces its bandwidth usage.
The real-time contention elimination module 130 is thus responsible for eliminating contention between GPU tasks and CPU tasks for the memory system on the same node. Because CPU tasks are diverse and their bandwidth usage cannot be predicted, this embodiment monitors the bandwidth usage of all processes through Intel MBM. When the bandwidth usage of a CPU task exceeds 75 percent and the GPU utilization of the GPU task on the node drops correspondingly, the number of CPU cores used by the CPU task is halved, reducing its bandwidth usage and thus its impact on the GPU task. The freed CPU cores are not scheduled with new CPU tasks until the GPU task completes.
Example 3
The embodiment of the invention also provides a server, wherein the server is a server in a cluster and comprises a CPU and a GPU; when running, the CPU and the GPU perform the resource demand aware multi-queue scheduling method described in Example 1. The method has been described in detail in Example 1 and is not repeated here.
In conclusion, the present invention effectively controls the resource scheduling of CPU tasks and GPU tasks, improves throughput and reduces task queuing, maximizes system throughput and minimizes queuing without requiring user awareness, can indirectly provide scheduling support for prospective heterogeneous clusters, and provides task scheduling services for cloud environments based on heterogeneous clusters. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which may be accomplished by those skilled in the art without departing from the spirit and scope of the present invention as set forth in the appended claims.