Compiler optimization option recommendation method based on Bayesian optimization
1. A compiler optimization option recommendation method based on Bayesian optimization is characterized by comprising the following steps:
step 1: randomly generating optimization sequences and measuring the execution time of the program compiled under each sequence to construct an initial training set; after the initial training set is generated, iterating on the basis of this training set according to the Bayesian optimization framework, each iteration being based on the current training set and executing steps 2 to 6;
step 2: establishing a prediction model through a random forest training algorithm; the processing performed by the prediction model comprises: predicting a given optimization sequence with each decision tree of the random forest trained on the training set to obtain the predicted value of each tree, and then combining the multiple decision trees in the random forest to obtain the mean and standard deviation of the prediction on the basis of all the predicted values made by the decision trees;
step 3: obtaining important optimization options by ranking the importance, and regarding the remaining optimization options as unimportant optimization options; the specific operation is as follows:
calculating the importance of each optimization option o based on the trained random forest model, as shown in the following formula:

importance(o) = (1/t) · Σ_{j=1}^{u} G(d_j)

where u represents the total number of decision trees that split a node using optimization option o, t represents the total number of decision trees, d_j represents the node split using o in the j-th decision tree, and G(d_j) represents the Gini importance of that node;
step 4: obtaining important optimization option combinations by enumerating all combinations consisting of the important optimization options; and determining, with the aid of a Gaussian decay function, the specific number of non-repeated combinations of unimportant optimization options corresponding to each fixed important optimization option combination, thereby obtaining a candidate optimization sequence set;
C(x) is calculated using a Gaussian decay function as shown in the following formula:

C(x) = C_1 · exp( (ln(decay) / scale²) · max(0, x − offset)² )

where C_1 represents the initial number of unimportant optimization option combinations corresponding to each important optimization option combination, x represents the iteration number, C(x) represents the number of unimportant optimization option combinations at iteration x, and offset, scale, and decay represent three parameters that control the decay shape of the Gaussian decay function;
step 5: predicting the performance of each candidate optimization sequence by using the constructed prediction model, and obtaining the mean and standard deviation of each prediction, which are used to calculate the Expected Improvement (EI) value of each candidate optimization sequence;
step 6: updating the training set, the specific operation comprising: selecting the optimization sequence that has the maximum EI value among all candidate optimization sequences and is not yet in the training set, measuring the program execution time under this optimization sequence, and adding the optimization sequence and its execution time to the training set for the next iteration; if the termination condition is not met, entering the next iteration and executing again from step 2;
step 7: outputting the final optimization sequence when the termination condition is reached.
2. The Bayesian-optimization-based compiler optimization option recommendation method of claim 1, wherein the training instances in the training set have two sources, namely: (1) randomly selected optimization sequences, which are evaluated to obtain the corresponding program execution times and form the initial training set; (2) in each subsequent iteration, the optimal optimization sequence selected from the unknown part of the optimization space according to the acquisition function, whose program execution time is evaluated and added to the training set as an instance.
Background
A compiler is a program that translates a program into a semantically equivalent form that executes faster and uses fewer resources. A typical compiler tends to have a large number of optimization options, each with an on state and an off state. Because the number of optimization options is so large, a user of the compiler cannot understand the function of every optimization option, and cannot judge whether a given option should be turned on or off so that the compiled code has a shorter execution time, a smaller code size, or lower energy consumption. The resulting large amount of high-dimensional data in the compiler optimization option recommendation problem is beyond what traditional Bayesian optimization can afford, since a huge number of optimization sequences must be predicted; here, an optimization sequence refers to a set of opened optimization options. To mitigate this problem, compilers introduce optimization levels: setting an optimization level opens a specific set of optimization options to achieve a certain optimization goal. However, these optimization levels do not allow every program to perform well enough on every platform; on some platforms, the performance achieved by an optimization level differs greatly from the optimal performance of the program. Therefore, in addition to using a given optimization level, it is necessary to make further optimization option recommendations for a particular program on a particular platform so that the compiled program achieves sufficiently good performance.
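As a purely illustrative aside (not part of the original text), an optimization sequence can be pictured as an on/off bit vector over a compiler's optimization flags; the GCC flag names below are real, but the particular selection is hypothetical:

```python
# Hypothetical illustration: an optimization sequence as an on/off bit
# vector over a few real GCC optimization options.
options = ["-funroll-loops", "-ftree-vectorize",
           "-finline-functions", "-fomit-frame-pointer"]
sequence = [1, 0, 1, 1]  # 1 = option turned on, 0 = turned off

# GCC spells the "off" state of -f<opt> as -fno-<opt>.
flags = [opt if on else opt.replace("-f", "-fno-", 1)
         for opt, on in zip(options, sequence)]
print(" ".join(flags))
# -funroll-loops -fno-tree-vectorize -finline-functions -fomit-frame-pointer
```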
There are two main categories of existing compiler optimization option recommendation methods: recommendation methods based on offline learning and recommendation methods based on online learning. Recent online-learning-based recommendation methods include the random optimization option recommendation method (in each iteration, each optimization option is randomly turned on or off, and the performance of the program is then evaluated under that combination of optimization options; denoted RIO), the genetic-algorithm-based optimization option recommendation method (a set of initial optimization option combinations is first generated; then, in each iteration, the optimization option combinations in the set are treated as chromosomes and subjected to crossover and mutation; denoted GA), and the IRace-based optimization option recommendation method (the sampling distribution of each optimization option is learned during the iterations and used to decide whether each optimization option is turned on or off; denoted IRace). Although these online learning methods have proven effective and do not rely on large amounts of training data, they still suffer from inefficiency.
Bayesian optimization is a method for optimizing an objective function whose evaluation is usually expensive. Its core idea is to use the knowledge accumulated about known regions of the search space to guide the selection of samples in the remaining regions, so as to find the optimal sample more efficiently. More specifically, it is an iterative process comprising two main steps: first, a prediction model is built from the objective-function values of the measured samples; second, further sampling is guided by the prediction model together with an acquisition function. In conventional Bayesian optimization, a Gaussian process is used to build the prediction model, which provides a Bayesian posterior probability distribution describing the objective-function output of a candidate sample. The acquisition function decides where to sample next, on the basis of the current best observation, in order to improve the effect of sampling. Widely used acquisition functions include Expected Improvement, Maximum Variance, and Maximum Mean.
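For concreteness, the standard closed form of Expected Improvement for a minimization problem under a Gaussian predictive posterior (a textbook definition, not taken from the original text) is, for a candidate s with predictive mean μ(s), standard deviation σ(s), and current best observation y*:

```latex
\mathrm{EI}(s) = \mathbb{E}\bigl[\max(y^{*} - f(s),\, 0)\bigr]
             = \bigl(y^{*} - \mu(s)\bigr)\,\Phi(z) + \sigma(s)\,\phi(z),
\qquad z = \frac{y^{*} - \mu(s)}{\sigma(s)}
```

where Φ and φ denote the standard normal CDF and PDF.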
Disclosure of Invention
The invention aims to provide a compiler optimization option recommendation method based on Bayesian optimization (BOCA), which recommends optimal compiler optimization options for a program to be compiled. A new optimization option search strategy is provided within the Bayesian optimization framework, and a decay function is added to balance exploration of the unknown optimization space against exploitation of the known optimization space, thereby realizing an innovative compiler optimization option recommendation method.
The technical scheme adopted by the invention to solve the problems is as follows:
a compiler optimization option recommendation method based on Bayesian optimization comprises the following steps
step 1: randomly generating optimization sequences and measuring the execution time of the program compiled under each sequence to construct an initial training set; after the initial training set is generated, iterating on the basis of this training set according to the Bayesian optimization framework, each iteration being based on the current training set and executing steps 2 to 6;
step 2: establishing a prediction model through a random forest training algorithm; the processing performed by the prediction model comprises: predicting a given optimization sequence with each decision tree of the random forest trained on the training set to obtain the predicted value of each tree, and then combining the multiple decision trees in the random forest to obtain the mean and standard deviation of the prediction on the basis of all the predicted values made by the decision trees;
step 3: obtaining important optimization options by ranking the importance, and regarding the remaining optimization options as unimportant optimization options; the specific operation is as follows:
calculating the importance of each optimization option o based on the trained random forest model, as shown in the following formula:

importance(o) = (1/t) · Σ_{j=1}^{u} G(d_j)

where u represents the total number of decision trees that split a node using optimization option o, t represents the total number of decision trees, d_j represents the node split using o in the j-th decision tree, and G(d_j) represents the Gini importance of that node;
step 4: obtaining important optimization option combinations by enumerating all combinations consisting of the important optimization options; and determining, with the aid of a Gaussian decay function, the specific number of non-repeated combinations of unimportant optimization options corresponding to each fixed important optimization option combination, thereby obtaining a candidate optimization sequence set;
C(x) is calculated using a Gaussian decay function as shown in the following formula:

C(x) = C_1 · exp( (ln(decay) / scale²) · max(0, x − offset)² )

where C_1 represents the initial number of unimportant optimization option combinations for each important optimization option combination, x represents the iteration number, C(x) represents the number of unimportant optimization option combinations at iteration x, and offset, scale, and decay represent three parameters that control the decay shape of the Gaussian decay function;
step 5: predicting the performance of each candidate optimization sequence by using the constructed prediction model, and obtaining the mean and standard deviation of each prediction, which are used to calculate the EI value of each candidate optimization sequence;
step 6: updating the training set, the specific operation comprising: selecting the optimization sequence that has the maximum EI value among all candidate optimization sequences and is not yet in the training set, measuring the program execution time under this optimization sequence, and adding the optimization sequence and its execution time to the training set for the next iteration; if the termination condition is not met, entering the next iteration and executing again from step 2;
step 7: outputting the final optimization sequence when the termination condition is reached.
Compared with the prior art, the invention can achieve the following beneficial technical effects:
compared with the existing Bayesian optimization method, the optimization method has better performance and can more effectively realize the optimization option recommendation of the compiler.
Drawings
FIG. 1 is a flowchart illustrating an overall compiler optimization option recommendation method based on Bayesian optimization according to the present invention;
FIG. 2 is a diagram illustrating an exemplary implementation process of a compiler optimization option recommendation method based on Bayesian optimization according to the present invention;
FIG. 3 is a schematic of the decay function used in the present invention;
FIG. 4 is a graph showing the influence curve of the K value;
FIG. 5 is a graph showing the influence curve of the scale value;
FIG. 6 shows the results of the comparison of the present invention with ε -PAL and FLASH.
Detailed Description
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and specific embodiments.
The invention discloses a compiler optimization option recommendation method based on Bayesian optimization (BOCA), which is implemented in the Python language and calls the existing third-party Python libraries scikit-learn and numpy. The part involving the machine learning model directly uses the random forest provided in scikit-learn with its default parameter settings. K, offset, decay, and scale in BOCA are set to 8, 20, 0.5, and 10, respectively. In addition, the initial set size is set to 2 and the default total number of iterations is set to 60. The invention adopts an online learning method.
In each iteration, the BOCA technical scheme comprises three aspects: (1) establishing a prediction model; (2) selecting candidate optimization sequences; (3) selecting the optimal sequence among these candidate optimization sequences according to the acquisition function, then evaluating it and using it to update the prediction model. Aspects (1) and (2) are the innovation points of the invention.
Fig. 1 is a flowchart illustrating the overall compiler optimization option recommendation method based on Bayesian optimization according to the present invention.
The method comprises the following specific implementation processes:
step 1: randomly generating optimization sequences and measuring the execution time of the program compiled under each sequence to construct an initial training set; after the initial training set is generated, iterating on the basis of this training set according to the Bayesian optimization framework, each iteration being based on the current training set and executing steps 2 to 6;
step 2: establishing a prediction model through a random forest training algorithm;
The processing performed by the prediction model comprises: predicting a given optimization sequence with each decision tree of the random forest trained on the training set to obtain the predicted value of each tree; then combining the multiple decision trees in the random forest to obtain the mean and standard deviation of the prediction on the basis of all the predicted values made by the decision trees.
Establishing a prediction model requires a training set and a model training algorithm. Each training instance in the training set is an optimization sequence together with the execution time of the given program after being compiled under that optimization sequence. The training instances have two sources, namely: (1) randomly selected optimization sequences, which are evaluated to obtain the corresponding program execution times and form the initial training set; (2) in each subsequent iteration, the optimal optimization sequence selected from the unknown part of the optimization space according to the acquisition function, whose program execution time is evaluated and added to the training set as an instance.
In this step, the prediction model is built with a random forest instead of the Gaussian process used in traditional Bayesian optimization, because the Gaussian process cannot scale to high-dimensional data (i.e., a huge number of optimization options). The inventive method (BOCA) uses Expected Improvement (EI) as the acquisition function, since in determining the next optimization sequence to measure it considers both exploration of the unknown optimization space (measured by the predicted standard deviation) and exploitation of the known optimization space (measured by the predicted mean). A Gaussian process can output the predictive mean and standard deviation directly; the method instead combines the multiple decision trees in the random forest to obtain the mean and standard deviation of the prediction from the per-tree predictions. Given an optimization sequence, each decision tree of the random forest predicts it, and the mean and standard deviation are then calculated from all these per-tree predictions.
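A minimal sketch of this step, assuming the scikit-learn random forest named in the implementation notes above; the training data here are random placeholders:

```python
# Minimal sketch: per-tree predictions from a scikit-learn random forest,
# combined into a predictive mean and standard deviation (placeholder data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X_train = np.random.randint(0, 2, size=(30, 50))  # 30 sequences, 50 on/off options
y_train = np.random.rand(30)                      # measured execution times (dummy)

model = RandomForestRegressor()                   # default parameters, as in the text
model.fit(X_train, y_train)

def predict_mean_std(model, X):
    """Mean and standard deviation over the per-tree predictions."""
    per_tree = np.stack([tree.predict(X) for tree in model.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)

mu, sigma = predict_mean_std(model, X_train[:5])
```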
step 3: obtaining a certain number of important optimization options by ranking the importance, and regarding the remaining optimization options as unimportant optimization options; the specific operation is as follows:
The importance of each optimization option is calculated based on the trained random forest model by using formulas (1), (2), and (3); the calculation process is as follows:
the Gini importance of node d is calculated as follows:
wherein I () represents Gini purity, ω, of a nodeleft、ωrightRepresenting the proportion of the optimized sequence split from node d to the left node, the proportion of the optimized sequence split from node d to the right node, ndRepresents the number of optimized sequences in node d, and N represents the total number of optimized sequences in the entire training set.
The Gini impurity of node d is calculated as follows:

I(d) = 1 − Σ_{i=1}^{c} p_i²    (2)

where c represents the total number of distinct labels in the set and p_i represents the probability that an optimization sequence randomly selected from the set has label i. Although the goal is to construct a regression model through random forests, during training each decision tree actually divides the entire output range into several intervals, and these intervals serve as labels, so that the Gini impurity can be calculated.
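A small sketch of formula (2), under the stated assumption that labels are the interval indices obtained by discretizing execution times:

```python
# Gini impurity of a node: I(d) = 1 - sum_i p_i^2 over its label distribution.
from collections import Counter

def gini_impurity(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# e.g. a node whose optimization sequences fall into three execution-time intervals
print(gini_impurity([0, 0, 1, 2, 2, 2]))  # ~0.611
```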
After obtaining the Gini importance of the relevant nodes in each decision tree, the importance of an optimization option is further calculated by merging all of these decision trees, as shown in the following formula:

importance(o) = (1/t) · Σ_{j=1}^{u} G(d_j)    (3)

where u represents the total number of decision trees that split a node using optimization option o, t represents the total number of decision trees, and d_j represents the node split using o in the j-th decision tree.
A number of important optimization options are obtained by ranking the importance, and the remaining optimization options are regarded as unimportant optimization options. The specific operation is as follows: the optimization options are arranged in descending order of importance, and the top K optimization options are identified as important optimization options. Important optimization options can have a significant impact on the execution time of a given program. All optimization option combinations consisting of these important optimization options are then enumerated. To avoid the enormous cost of enumerating all combinations, K should be as small as possible.
In addition to setting these important optimization options, the remaining optimization options must be set to obtain a complete optimization sequence. The invention is based on the idea that, for a fixed important optimization option combination, the remaining optimization options can be set randomly to form an optimization sequence, since they have less impact on the execution time of the given program. However, no machine learning technique can guarantee that the importance of each optimization option is predicted accurately, especially when the training set is small; the options outside the identified important ones may therefore include truly important optimization options. For this reason, the invention further explores combinations of the optimization options other than the important ones. Since it is impossible to explore all such combinations, a certain number of combinations are randomly selected for exploration. Another benefit of random exploration is avoiding local optima.
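For illustration, the ranking and split described above can be sketched with scikit-learn's impurity-based feature_importances_ standing in for formulas (1)-(3); `model` is the forest from the earlier sketch and K = 8 is the default given above:

```python
# Sketch: rank options by importance, keep the top K as "important",
# and treat the rest as "unimportant".
import numpy as np

K = 8
importances = model.feature_importances_          # one score per optimization option
important = np.argsort(importances)[::-1][:K]     # indices of the top-K options
unimportant = np.setdiff1d(np.arange(len(importances)), important)
```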
The training set in the first iterations is relatively small, so the identification of important optimization options is more likely to be inaccurate; the decay should therefore be slow at the beginning. The quantity being decayed is, for each important optimization option combination, the number of remaining optimization option combinations that need to be explored to expand it into a complete optimization sequence. As the training set grows, the inaccuracy decreases and the decay can be accelerated. Based on this assumption, in each iteration a Gaussian decay function is used to determine the number of remaining optimization option combinations to explore when extending a fixed important optimization option combination into a complete optimization sequence. C(x) is calculated using the Gaussian decay function as shown in the following formula:
C(x) = C_1 · exp( (ln(decay) / scale²) · max(0, x − offset)² )

where C_1 represents the initial number of unimportant optimization option combinations for each important optimization option combination, x represents the iteration number, C(x) represents the number of unimportant optimization option combinations at iteration x, and offset, scale, and decay represent three parameters that control the decay shape of the Gaussian decay function.
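A small sketch of this schedule; the exact exponential form below is a reconstruction from the parameter roles just described, with the default values offset = 20, scale = 10, and decay = 0.5 given in the implementation notes above:

```python
# Gaussian decay schedule: stays at C1 for the first `offset` iterations,
# then decays; by x = offset + scale it has fallen to decay * C1.
import math

def C(x, C1, offset=20, scale=10, decay=0.5):
    d = max(0.0, x - offset)
    return C1 * math.exp((math.log(decay) / scale ** 2) * d ** 2)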
Fig. 2 is a diagram illustrating an example implementation process of the compiler optimization option recommendation method based on Bayesian optimization according to the present invention.
As shown in Fig. 3, the decay process of BOCA is illustrated, along with the specific meanings of offset, scale, and decay. According to the calculated number of combinations, BOCA randomly generates, for each important optimization option combination, the unimportant optimization option combinations needed to expand it into a complete optimization option combination. On this basis, in each iteration the total number of selected candidate optimization sequences is C(x) · 2^K. The EI values of these candidate optimization sequences are calculated using the prediction model; the sequence with the largest EI value is selected as the optimal sequence in the set, and then the execution time of the given program after compilation under it is measured and added to the training set for updating the prediction model in the next iteration.
Therefore, the invention combines the random forest technique with a new candidate optimization sequence selection strategy to overcome the high-cost problem. The selection strategy of the invention aims to select a subset of the unevaluated optimization sequences that contains the optimal sequence with as high a probability as possible. By predicting this subset rather than all unevaluated optimization sequences, the invention finds the optimal optimization sequence more efficiently.
Since there are a great many unknown optimization sequences in the remaining optimization space, it is very expensive to predict all of them with the prediction model in order to find the optimal sequence. To overcome this problem, the invention selects, as the candidate optimization sequence set, a subset of the unknown optimization sequences that is likely to contain the optimal sequence. By predicting only this subset, the optimal optimization sequence is found more efficiently, although it cannot be guaranteed that the selected subset contains the optimal sequence. Because the optimization space to be searched is very large, selecting such a subset is itself difficult. To solve this problem, the invention designs a selection strategy based on the following observation: for a given program, only a small number of optimization options have a large impact on its execution time (these are called important optimization options), and an optimal solution is more likely to be found by fully exploiting them. However, it is generally difficult to determine accurately which of the numerous optimization options are important. Therefore, by balancing exploitation of the known optimization space and exploration of the unknown optimization space, the designed selection strategy selects a candidate optimization sequence subset that is likely to contain the optimal sequence.
Feature selection relies on decision trees and tree-based ensemble machine learning techniques; the decision trees used by the invention split leaf nodes according to Gini impurity. In a decision tree, each node has a Gini importance, which corresponds to the reduction of Gini impurity in the node when a leaf node is split using a feature (in compiler tuning, a feature corresponds to an optimization option). The Gini impurity of a node corresponds to the probability that an optimization sequence randomly chosen from the set of optimization sequences in the node is mislabeled. In the compiler optimization problem, each feature has only two possible values, 0 and 1, so one feature can be used to split at most one node in a decision tree; thus, the Gini importance of a feature equals the Gini importance of the node it is used to split.
step 4: obtaining important optimization option combinations by enumerating all combinations consisting of the important optimization options; and determining, with the aid of a Gaussian decay function, the specific number of non-repeated combinations consisting of unimportant optimization options, thereby obtaining a candidate optimization sequence set;
the purpose of this step is to fully explore these important optimization options. Besides setting important optimization options, to obtain a complete optimization sequence, unimportant optimization option combinations are also required to be set. A complete combination of optimization options consists of a combination of important optimization options and a combination of unimportant optimization options. While the setting of the unimportant optimization options is done by randomly generating the on or off state of the optimization options. And generating non-repeated combinations consisting of the unimportant optimization options corresponding to each important optimization option combination, wherein the specific number of the non-repeated combinations consisting of the unimportant optimization options is determined by assistance of a Gaussian attenuation function.
step 5: predicting optimization sequence performance, specifically: predicting the performance of each candidate optimization sequence by using the constructed prediction model, and obtaining the mean and standard deviation of each prediction, which are used to calculate the EI value of each candidate optimization sequence;
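A minimal sketch of the EI computation for execution-time minimization; scipy is used here for the normal CDF/PDF, which is an assumption (the text only names scikit-learn and numpy):

```python
# Expected Improvement for minimization: EI = (best - mu) * Phi(z) + sigma * phi(z).
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    sigma = np.maximum(sigma, 1e-9)  # guard against zero predictive spread
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
```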
step 6: updating the training set, specifically: selecting the optimization sequence that has the maximum EI value among all candidate optimization sequences and is not yet in the training set, measuring the program execution time under this optimization sequence, and adding the optimization sequence and its execution time to the training set for the next iteration; if the termination condition is not met, entering the next iteration and executing again from step 2;
and 7, outputting a final optimization sequence when the termination condition is reached.
The present invention also investigates the effect of the main parameters in BOCA, including K (the number of identified important optimization options) and scale (which controls the decay rate). The experimental results are shown in Fig. 4 and Fig. 5: Fig. 4 shows the influence curve of the K value and Fig. 5 shows the influence curve of the scale value. The x-axis represents the parameter value, and the y-axis represents the average time taken by BOCA to reach the speedup that it achieves in 60 iterations under the default parameter value. It can be seen that the default K value (i.e., 8) performs best. When K is set to 16, BOCA cannot complete the experiment within the given time, since enumerating all combinations of 16 important optimization options is very time-consuming; this also demonstrates the importance of setting an appropriate K value. It can further be seen that small scale values (i.e., 5 and 10) perform better than large ones (i.e., 15 and 20), indicating that relatively fast decay helps improve the efficiency of compiler auto-tuning, and the default scale value (i.e., 10) performs best. In summary, the main parameters do have a certain influence on the effectiveness of BOCA, and the current parameter settings of the invention are a good choice that can serve as default settings in subsequent uses of BOCA.
The present invention considers two categories of comparison methods. The first category is existing compiler optimization option recommendation methods, including RIO, GA, and IRace. The second category is existing Bayesian optimization methods, including a traditional Bayesian optimization method (ε-PAL), a Bayesian optimization method recently used for configuring software systems (FLASH), and a general-purpose advanced Bayesian optimization method (TPE). When comparing the various compiler optimization option recommendation methods and BOCA on GCC and LLVM, the invention compares the speedups they achieve relative to optimization level O3 and calculates the improvement of BOCA over the other methods based on these speedups. Twenty programs from PolyBench and cBench were used in the experiments.
BOCA was first compared with the other compiler optimization option recommendation methods, namely RIO, GA, and IRace; the results are shown in Tables 1 and 2. For a fair comparison, the initial set size in these methods was set to be the same as in BOCA, except for GA: because of the crossover mechanism in the genetic algorithm, the initial set size cannot be set to 2 for GA, so it is set to the nearest feasible even value, 4. In the existing literature, the initial set size of the genetic algorithm is set to 100, but in the experiments of the invention the cost of evaluating 100 optimization sequences is very high, especially when the compilation and execution time of the program is long, so this setting is not adopted. To investigate the effectiveness of GA with a larger initial set, a GA with an initial set size of 10 was also tried; these two GA implementations are referred to as GA4 and GA10, respectively. The invention first counts the time taken by each comparison method to reach the speedup achieved by BOCA at the 30th, 40th, 50th, and 60th iterations. Then, for these iterations, the improvement (%) of BOCA over the other compiler optimization option recommendation methods on GCC is shown in Table 1, and the improvement (%) on LLVM is shown in Table 2. Each column in Tables 1 and 2 shows the proportion of time saved by BOCA, relative to the comparison method, in reaching the speedup corresponding to the given iteration count.
TABLE 1
TABLE 2
As can be seen from Tables 1 and 2, the time saving achieved by BOCA varies from 42.30% to 78.04% across all cases, which confirms that BOCA can significantly improve the effectiveness of compiler optimization option recommendation.
The inventive method (BOCA) was then compared with the other Bayesian optimization methods, i.e., TPE, ε-PAL, and FLASH. In the two tables above, the average time saving of BOCA relative to TPE in all cases varies from 43.01% to 71.06%. Since both ε-PAL and FLASH require enumerating and predicting all optimization sequences in each iteration, and the number of optimization options in this study is large, these two Bayesian optimization methods cannot produce results in acceptable time. Thus BOCA cannot be compared directly with ε-PAL and FLASH over all optimization options; instead, a comparative test is conducted using a smaller set of optimization options. Here, 20 optimization options are randomly selected for optimization option recommendation on the GCC and LLVM compilers respectively, and 4 programs are randomly selected as representatives for the experiments.
FIG. 6 shows the results of the comparative experiments of the inventive method (BOCA) with ε-PAL and FLASH, where the x-axis represents the speedup sampled at each iteration count for BOCA and the y-axis represents the time taken to reach the corresponding speedup. It can be seen that, although a smaller set of optimization options is used, BOCA still takes less time than ε-PAL and FLASH to reach a given speedup. The experimental results confirm that BOCA performs better than the existing Bayesian optimization methods and can realize compiler optimization option recommendation more effectively.