Virtual driving scene element recombination method based on improved Latin hypercube sampling
1. A virtual driving scene element recombination method based on improved Latin hypercube sampling, the method operating on original data that comprise discrete elements and continuous elements, characterized by comprising the following steps:
1) counting all discrete element combinations in the original data;
2) determining the sampling point position of each sampled discrete element combination in the cumulative probability distribution;
3) determining which discrete element combination each sample belongs to;
4) determining the number of samples of each discrete element combination;
5) extracting the continuous elements from the original data corresponding to each discrete element combination;
6) calculating the cumulative probability distribution of the continuous elements;
7) determining the sampling point positions of the sampled continuous elements in the cumulative probability distribution according to the number of samples of the discrete element combination;
8) determining the continuous element values;
9) performing correlation control on the continuous elements;
10) recombining the sampled discrete elements and continuous elements to form virtual driving scene data;
wherein in step 1), the relative probability distribution and the cumulative probability distribution of all discrete element combinations are counted;
in step 2) and step 3), the sampling point position of each sampled discrete element combination in the cumulative probability distribution is calculated based on formula (1), where
P_S denotes the cumulative probability corresponding to the discrete element combination in the s-th group of sampled data,
when 0 < P_S ≤ P(D_1), the discrete element combination corresponding to P_S is assigned to D_1,
when P(D_i) < P_S ≤ P(D_{i+1}), the discrete element combination corresponding to P_S is assigned to D_{i+1},
and D_i denotes the i-th discrete element combination;
in step 7) and step 8), the sampling point positions of the sampled continuous elements in the cumulative probability distribution are calculated based on formula (1); then, according to the cumulative probability distribution calculated in step 6), the 5 closest sample points are taken in the range below that cumulative probability and the 5 closest sample points in the range above it; and the corresponding continuous element value is then calculated by cubic spline interpolation.
2. The virtual driving scene element recombination method according to claim 1,
wherein in step 4), the number of samples of each discrete element combination is determined by counting the combinations to which the samples were assigned.
3. The virtual driving scene element recombination method according to claim 1,
wherein in step 6), the cumulative probability distribution of the continuous elements is counted separately for each discrete element combination, as follows:
the data of the k-th continuous element are extracted from the original data corresponding to a given discrete element combination to form a sample set C_1k; the values in C_1k are then sorted in ascending order and repeated values are removed, giving the sample set C_1k-unique;
the spacing between adjacent values in C_1k-unique is analyzed; if the spacing between any two adjacent values is greater than the average of all spacings, new sample points are inserted between those adjacent values, giving a new sample set C_1k-unique' in which the spacing between all adjacent values is less than the average spacing;
based on C_1k-unique', the number of times each value of the k-th continuous element occurs in C_1k is counted to obtain the relative probability distribution, and the cumulative probability distribution P_C1k is obtained from the relative probability distribution.
4. The virtual driving scene element recombination method according to claim 1,
wherein in step 7), if fewer than 5 sample points lie in the range below the cumulative probability, or fewer than 5 sample points lie in the range above the cumulative probability, all sample points in that range are taken.
5. The virtual driving scene element recombination method according to claim 1,
wherein in step 9), all continuous elements corresponding to each discrete element combination form a sampling matrix C_1s, and correlation control is performed by the Cholesky decomposition method:
first, an order matrix R with the same numbers of rows and columns as C_1s is randomly generated, each column of R being a random permutation of integers; the Spearman correlation coefficient matrix CM_R of R is then calculated and Cholesky-decomposed to obtain a lower triangular matrix Q, as shown in formula (2);
then, the Spearman correlation coefficient matrix CM_in of C_1s is likewise Cholesky-decomposed to obtain a lower triangular matrix P, the output order matrix G is calculated according to formula (3), and the data in each column of C_1s are reordered according to the order of the data in the corresponding column of G; the reordered data C_1s' are the data after correlation control,
CM_R = Q * Q^T (2)
G = P * Q^{-1} * R (3).
Background
In recent years, with the continuing development of intelligent driving systems for automobiles, testing to ensure safety and compliance has become increasingly important. Before a vehicle is put on the market, it must be fully tested and verified to ensure that its intelligent control system can operate safely within the designed operational domain.
Compared with field tests and road tests, virtual simulation testing is efficient and safe, and is an important test method. Establishing virtual scenes that reflect real traffic characteristics is the basis of simulation testing and is highly valued by researchers. An important way to construct a virtual scene is to analyze and count the key scene elements based on real traffic data, and then to sample and recombine elements of different dimensions according to their probability characteristics, obtaining random test samples similar to the real traffic scenes.
A common approach to data resampling is Monte Carlo sampling, for example M-H (Metropolis-Hastings) sampling and Gibbs sampling. However, traditional Monte Carlo sampling needs a large number of simulation runs to obtain good samples, and when the original data contain many high-probability samples, the drawn samples tend to cluster. Latin hypercube sampling is a typical stratified sampling method: it samples on the basis of the cumulative probability distribution and can obtain a better random sample set with fewer iterations.
According to research results such as "Latin Hypercube Sampling Techniques for Power Systems Reliability Analysis With Renewable Energy Sources", an electric vehicle load model based on improved kernel density estimation and Latin hypercube sampling, and probabilistic power flow calculation based on improved Latin hypercube sampling, Latin hypercube sampling generally comprises two steps: sample generation and correlation control. Sample generation is generally based on probability modeling of the random elements. In practical engineering applications, however, the distribution of the random elements rarely matches a typical distribution function; if a probability density function is established by non-parametric estimation, the inverse function of the cumulative probability distribution is difficult to obtain; and when the original data contain both continuous elements and discrete elements, establishing the probability distribution is difficult. For correlation control, "A distribution-free approach to inducing rank correlation among input variables" provides a Cholesky decomposition method; because it relies on matrix calculation, it is computationally efficient and is an important and relatively mature method.
In summary, how to generate samples based on actual data is a key basis for Latin hypercube sampling.
Disclosure of Invention
To address the problem that existing Latin hypercube sampling cannot effectively generate samples from original data, the invention provides a virtual driving scene element recombination method based on an improved Latin hypercube sampling method that follows the idea of non-parametric estimation.
The technical scheme adopted by the invention is as follows: a virtual driving scene element recombination method based on improved Latin hypercube sampling, the method operating on original data that comprise discrete elements and continuous elements, characterized by comprising the following steps:
1) counting all discrete element combinations in the original data;
2) determining the sampling point position of each sampled discrete element combination in the cumulative probability distribution;
3) determining which discrete element combination each sample belongs to;
4) determining the number of samples of each discrete element combination;
5) extracting the continuous elements from the original data corresponding to each discrete element combination;
6) calculating the cumulative probability distribution of the continuous elements;
7) determining the sampling point positions of the sampled continuous elements in the cumulative probability distribution according to the number of samples of the discrete element combination;
8) determining the continuous element values;
9) performing correlation control on the continuous elements;
10) recombining the sampled discrete elements and continuous elements to form virtual driving scene data;
wherein in step 1), the relative probability distribution and the cumulative probability distribution of all discrete element combinations are counted;
in step 2) and step 3), the sampling point position of each sampled discrete element combination in the cumulative probability distribution is calculated based on formula (1), where
P_S denotes the cumulative probability corresponding to the discrete element combination in the s-th group of sampled data,
when 0 < P_S ≤ P(D_1), the discrete element combination corresponding to P_S is assigned to D_1,
when P(D_i) < P_S ≤ P(D_{i+1}), the discrete element combination corresponding to P_S is assigned to D_{i+1},
and D_i denotes the i-th discrete element combination;
in step 7) and step 8), the sampling point positions of the sampled continuous elements in the cumulative probability distribution are calculated based on formula (1); then, according to the cumulative probability distribution calculated in step 6), the 5 closest sample points are taken in the range below that cumulative probability and the 5 closest sample points in the range above it; and the corresponding continuous element value is then calculated by cubic spline interpolation.
Further, in step 4), the number of samples of each discrete element combination is determined by counting the combinations to which the samples were assigned.
Further, in step 6), the cumulative probability distribution of the continuous elements is counted separately for each discrete element combination, as follows:
the data of the k-th continuous element are extracted from the original data corresponding to a given discrete element combination to form a sample set C_1k; the values in C_1k are then sorted in ascending order and repeated values are removed, giving the sample set C_1k-unique;
the spacing between adjacent values in C_1k-unique is analyzed; if the spacing between any two adjacent values is greater than the average of all spacings, new sample points are inserted between those adjacent values, giving a new sample set C_1k-unique' in which the spacing between all adjacent values is less than the average spacing;
based on C_1k-unique', the number of times each value of the k-th continuous element occurs in C_1k is counted to obtain the relative probability distribution, and the cumulative probability distribution P_C1k is obtained from the relative probability distribution.
Further, in step 7), if fewer than 5 sample points lie in the range below the cumulative probability, or fewer than 5 sample points lie in the range above the cumulative probability, all sample points in that range are taken.
Further, in step 9), all continuous elements corresponding to each discrete element combination form a sampling matrix C_1s, and correlation control is performed by the Cholesky decomposition method:
first, an order matrix R with the same numbers of rows and columns as C_1s is randomly generated, each column of R being a random permutation of integers; the Spearman correlation coefficient matrix CM_R of R is then calculated and Cholesky-decomposed to obtain a lower triangular matrix Q, as shown in formula (2);
then, the Spearman correlation coefficient matrix CM_in of C_1s is likewise Cholesky-decomposed to obtain a lower triangular matrix P, the output order matrix G is calculated according to formula (3), and the data in each column of C_1s are reordered according to the order of the data in the corresponding column of G; the reordered data C_1s' are the data after correlation control,
CM_R = Q * Q^T (2)
G = P * Q^{-1} * R (3).
The invention adopts an improved Latin hypercube sampling method to recombine the elements of a virtual driving scene. In this method, the probability distribution of the continuous elements is counted separately for each discrete element combination; inserting interpolated sample points for the continuous elements ensures that regions with missing data are effectively bridged; the sampling point positions are determined from the cumulative probability distribution and the element values are obtained by cubic spline interpolation, which reproduces the characteristics of the original probability distribution; and the Cholesky decomposition method is used to preserve the correlation between elements, which greatly improves computational efficiency while keeping the sampling valid. The method analyzes and counts the key scene elements based on real data, then samples and recombines elements of different dimensions according to their probability characteristics to obtain random test samples whose probability distribution is similar to that of real traffic scenes. It is robust and well suited to testing and evaluating intelligent driving systems for automobiles.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention.
FIG. 1 is a graph of relative probability distribution of discrete element combinations according to an embodiment of the present invention;
FIG. 2 is a cumulative probability distribution diagram of a discrete element combination according to an embodiment of the present invention;
FIG. 3 is a graph of the locations of sample points in the cumulative probability distribution for discrete element combinations according to an embodiment of the present invention;
FIG. 4 is a graph of the locations of sample points in the cumulative probability distribution for continuous elements according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further illustrated by the accompanying drawings and examples.
Taking the adjacent-vehicle cut-in condition as an example, assume that 1000 groups of original data of adjacent-vehicle cut-in scenes have been collected and that each group contains 5 elements. The data are represented by a matrix X_{m×n}, where m = 1000 is the number of data groups (rows of the matrix) and n = 5 is the number of element types (columns of the matrix).
In this embodiment, 2 of the 5 elements are assumed to be discrete elements, located in the first two columns of the matrix: weather and road grade. The other 3 elements are continuous elements, located in the last 3 columns of the matrix: the host vehicle speed, the target vehicle speed, and the relative longitudinal distance of the adjacent vehicle at the cut-in time.
Assume that 100 groups of data are to be drawn from the 1000 groups of original data and recombined to construct typical virtual driving scenes. The element recombination method based on improved Latin hypercube sampling adopted by the invention comprises the following steps:
1) Step of counting the discrete element combinations
This step counts all discrete element combinations present in the original data and then calculates the relative probability and the cumulative probability of each combination, as follows:
first, all discrete elements are taken from the original data to form a sample set, and all distinct discrete element combinations are counted; assume that three combination types exist: [D_1 = (rainy, expressway), D_2 = (sunny, expressway), D_3 = (sunny, urban road)];
then, the number of times each discrete element combination appears in the original data is counted to obtain the relative probability distribution; assume that the relative probability distribution of the three combinations is [p(D_1) = 0.2, p(D_2) = 0.5, p(D_3) = 0.3], as shown in FIG. 1;
finally, the cumulative probability distribution is calculated from the relative probability distribution; for the three combinations it is [P(D_1) = 0.2, P(D_2) = 0.7, P(D_3) = 1], as shown in FIG. 2.
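As an illustration of step 1), the counting can be sketched as follows. This is a minimal sketch, not the implementation of the invention: the function name combination_distribution and the synthetic rows (200, 500 and 300 occurrences, chosen to match the assumed probabilities of the embodiment) are hypothetical.

```python
from collections import Counter

import numpy as np


def combination_distribution(discrete_rows):
    """Return the distinct combinations with their relative and cumulative probabilities."""
    counts = Counter(map(tuple, discrete_rows))
    combos = list(counts)  # distinct combinations, in order of first appearance
    relative = np.array([counts[c] for c in combos], dtype=float) / len(discrete_rows)
    cumulative = np.cumsum(relative)
    return combos, relative, cumulative


# Synthetic stand-in for the discrete columns of the 1000 groups of original data.
rows = (
    [("rainy", "expressway")] * 200
    + [("sunny", "expressway")] * 500
    + [("sunny", "urban road")] * 300
)
combos, relative, cumulative = combination_distribution(rows)
```

Here relative corresponds to [p(D_1), p(D_2), p(D_3)] and cumulative to [P(D_1), P(D_2), P(D_3)].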
2) Step of determining the discrete element combination sampling points
This step determines the sampling point positions of the discrete element combinations in the cumulative probability distribution. According to the preset number of samples, 100 groups of data are to be drawn from the 1000 groups of original data, and the sampling point position in the cumulative probability distribution is determined for the discrete element combination of each of these 100 groups.
The sampling point position of each discrete element combination in the cumulative probability distribution is determined based on formula (1), as shown in FIG. 3,
where P_S denotes the cumulative probability corresponding to the discrete element combination in the s-th group of sampled data, s is not less than 1 and not greater than the number of samples, and the number of samples is 100 in this embodiment.
3) Step of determining the combination to which each sample belongs
According to the sampling point position of the discrete element combination of each group of data in the cumulative probability distribution (determined by P_S), the combination to which each sample belongs is determined: for a sampling point with P(D_i) < P_S ≤ P(D_{i+1}) (1 ≤ i ≤ 2), the discrete element combination corresponding to P_S is assigned to D_{i+1}; for a sampling point with 0 < P_S ≤ P(D_1), the discrete element combination corresponding to P_S is assigned to D_1.
4) Step of determining the number of samples of each discrete element combination
Based on the assignments determined in the previous step, the number of samples of each discrete element combination is obtained by counting. Of the above 100 groups of data, the number of samples of D_1 is 20, that of D_2 is 50, and that of D_3 is 30, denoted [SN_1 = 20, SN_2 = 50, SN_3 = 30].
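Steps 2) to 4) can be sketched as follows. Formula (1) is not reproduced in this text, so the sketch ASSUMES that each sampling point is the midpoint of its stratum, P_S = (s - 0.5) / N; this choice, the function name assign_combinations, and the use of NumPy are illustrative assumptions rather than the formula of the invention.

```python
import numpy as np


def assign_combinations(cumulative, n_samples):
    """Place one sampling point per stratum and assign each point to a combination."""
    s = np.arange(1, n_samples + 1)
    # ASSUMPTION: stratum midpoints stand in for formula (1), which is not given here.
    p_s = (s - 0.5) / n_samples
    # side="left" returns the index j with cumulative[j-1] < P_S <= cumulative[j],
    # which is the rule P(D_i) < P_S <= P(D_{i+1}) with zero-based indices.
    idx = np.searchsorted(cumulative, p_s, side="left")
    idx = np.minimum(idx, len(cumulative) - 1)  # guard against rounding in the last entry
    counts = np.bincount(idx, minlength=len(cumulative))
    return p_s, idx, counts


p_s, idx, counts = assign_combinations(np.array([0.2, 0.7, 1.0]), 100)
```

Under the midpoint assumption, counts reproduces the numbers of samples [SN_1, SN_2, SN_3] = [20, 50, 30] of the embodiment.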
5) Step of continuous element extraction
This step extracts all relevant continuous elements from the original data corresponding to each discrete element combination. Taking discrete element combination D_1 as an example, all continuous elements are taken from the original data corresponding to D_1 to form a sample set C_1. According to the relative probability, D_1 corresponds to 200 groups of original data, so C_1 contains 200 groups of continuous element data, which serve as the basic data for continuous element sampling.
6) Step of calculating the cumulative probability of the continuous elements
The data of the k-th continuous element (for example, the host vehicle speed) are extracted from C_1 to form a sample set denoted C_1k; C_1k also contains 200 groups of original data (in this embodiment there are three types of continuous elements, so k takes three values). The values in the k-th continuous element sample set C_1k are sorted in ascending order and repeated values are removed, giving the sample set C_1k-unique. So that the sampling algorithm can handle sparsely distributed data, the spacing between adjacent values in C_1k-unique is analyzed: if the spacing between any two adjacent values is greater than the average of all spacings, new sample points are inserted between those adjacent values, so that the spacing between any adjacent values in the newly generated C_1k-unique' is less than the average spacing. Based on the final C_1k-unique', the number of times each value of the k-th continuous element occurs in C_1k is counted to obtain the relative probability distribution, and the cumulative probability distribution P_C1k is obtained from the relative probability distribution.
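Step 6) can be sketched as follows. The text does not say how many points are inserted into a wide gap or where they are placed, so this sketch ASSUMES equally spaced points, just enough to bring every sub-interval below the average spacing; the function name gap_filled_cdf is hypothetical. Inserted points do not occur in C_1k, so they carry a count of zero.

```python
import numpy as np


def gap_filled_cdf(values):
    """Return the gap-filled value grid (C_1k-unique') and its cumulative probabilities."""
    values = np.asarray(values, dtype=float)
    uniq, counts = np.unique(values, return_counts=True)  # sorted, duplicates removed
    if uniq.size < 2:
        return uniq, np.ones(uniq.size)
    gaps = np.diff(uniq)
    mean_gap = gaps.mean()
    grid, hits = [uniq[0]], [counts[0]]
    for left, right, gap, count in zip(uniq[:-1], uniq[1:], gaps, counts[1:]):
        if gap > mean_gap:
            # ASSUMPTION: split the wide gap into equal parts, each shorter than mean_gap.
            parts = int(np.floor(gap / mean_gap)) + 1
            inner = np.linspace(left, right, parts + 1)[1:-1]
            grid.extend(inner)
            hits.extend([0] * inner.size)  # inserted points never occur in the data
        grid.append(right)
        hits.append(count)
    cdf = np.cumsum(hits) / values.size
    return np.array(grid), cdf


# Small synthetic example: the gap between 3 and 10 exceeds the average spacing of 3.
grid, cdf = gap_filled_cdf([1, 1, 2, 3, 10])
```

In this example two points are inserted between 3 and 10, and the cumulative probability stays flat at 0.8 across them.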
7) Step of determining the continuous element sampling points
According to the 100 groups of sampled data of the discrete element combinations, the number of samples of discrete element combination D_1 is SN_1 = 20, so the number of samples of the corresponding k-th continuous element should also be 20.
The sampling point positions of the continuous elements of these sampled data in the cumulative probability distribution are determined in the manner of formula (1), with the number of samples equal to 20. For the calculated cumulative probability P_Sk of a given sampling point:
according to the cumulative probability distribution P_C1k of the continuous element sample set C_1k, the range [P_C1k < P_Sk] is searched for the 5 sample points of C_1k-unique whose cumulative probabilities are closest to P_Sk; if fewer than 5 such sample points exist, all sample points meeting the condition are selected;
similarly, the range [P_C1k ≥ P_Sk] is searched for the 5 sample points of C_1k-unique whose cumulative probabilities are closest to P_Sk; if fewer than 5 such sample points exist, all sample points meeting the condition are selected.
8) Step of determining the continuous element values
Based on the 10 sample points found (or all of them if fewer than 10 were found) and their cumulative probabilities, the continuous element value corresponding to P_Sk is calculated by cubic spline interpolation, as shown in FIG. 4.
Through steps 6) to 8) above, all values of the three continuous elements in the sampled data of C_1 (20 groups) are calculated.
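Steps 7) and 8) can be sketched as follows, using scipy.interpolate.CubicSpline as one possible cubic spline implementation. Two points are ASSUMPTIONS of this sketch and are not stated in the text: sample points that share the same cumulative probability are collapsed to one node (a spline needs strictly increasing abscissae), and the query probability is clipped to the range of the selected nodes to avoid extrapolation. The function name sample_value is hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def sample_value(grid, cdf, p_target, n_side=5):
    """Interpolate the element value at cumulative probability p_target."""
    grid = np.asarray(grid, dtype=float)
    cdf = np.asarray(cdf, dtype=float)
    # cdf is non-decreasing, so the closest points on each side are contiguous.
    below = np.flatnonzero(cdf < p_target)[-n_side:]
    above = np.flatnonzero(cdf >= p_target)[:n_side]
    idx = np.concatenate([below, above])
    # ASSUMPTION: keep the first point of each group of tied cumulative probabilities.
    p_nodes, first = np.unique(cdf[idx], return_index=True)
    x_nodes = grid[idx][first]
    if p_nodes.size < 2:
        return float(x_nodes[0])
    spline = CubicSpline(p_nodes, x_nodes)
    # ASSUMPTION: clip the query so the spline is never evaluated outside its nodes.
    p_query = min(max(p_target, p_nodes[0]), p_nodes[-1])
    return float(spline(p_query))


# Synthetic check data: element values that depend linearly on cumulative probability.
cdf_demo = np.linspace(0.05, 1.0, 20)
grid_demo = 10.0 + 20.0 * cdf_demo
value = sample_value(grid_demo, cdf_demo, 0.5)
```

For the linear demo data the interpolated value is 10 + 20 * 0.5 = 20, up to rounding. In the embodiment, grid and cdf would be C_1k-unique' and P_C1k from step 6), and p_target would be P_Sk.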
9) Step of continuous element correlation control
Continuing the above example, the sampled data of the three continuous elements of C_1 form a sampling matrix C_1s (20 rows, 3 columns), and correlation control is performed by the Cholesky decomposition method.
First, an order matrix R with 20 rows and 3 columns is randomly generated, each column of R being a random permutation of the integers in [1, 20]; the Spearman correlation coefficient matrix CM_R of R is then calculated and Cholesky-decomposed to obtain a lower triangular matrix Q, as shown in formula (2).
Then, the Spearman correlation coefficient matrix CM_in of C_1s is likewise Cholesky-decomposed to obtain a lower triangular matrix P, the output order matrix G is calculated according to formula (3), and the data in each column of C_1s are reordered according to the order of the data in the corresponding column of G; the reordered data C_1s' are the scene element data after correlation control.
CM_R = Q * Q^T (2)
G = P * Q^{-1} * R (3)
The Spearman correlation coefficient matrices of C_1 and C_1s are shown in Table 1 and Table 2, respectively. It can be seen that, for strongly correlated elements, the correlation is well preserved after sampling and recombination.
TABLE 1 Correlation coefficient matrix of the original data

                        Speed of main vehicle   Target vehicle speed   Time of cut-in
Speed of main vehicle   1                       0.9001                 0.1752
Target vehicle speed    0.9001                  1                      -0.0134
Time of cut-in          0.1752                  -0.0134                1
TABLE 2 Correlation coefficient matrix of the recombined data

                        Speed of main vehicle   Target vehicle speed   Time of cut-in
Speed of main vehicle   1                       0.8679                 0.2049
Target vehicle speed    0.9001                  1                      -0.0626
Time of cut-in          0.2049                  -0.0626                1
The above describes sampling and correlation control of the continuous elements only for discrete element combination D_1; for discrete element combinations D_2 and D_3, the continuous elements are sampled by the same process of steps 5) to 8) and then subjected to the same correlation control.
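The correlation control of step 9) can be sketched with NumPy as follows. Several points are assumptions of this sketch rather than statements of the text: R is stored with N rows and K columns, so formula (3) is applied in the transposed form G = R (P Q^{-1})^T to make the matrix dimensions agree; the target correlation matrix is passed in as an argument; ranks are computed without special handling of ties; and the function names and the synthetic demo data are hypothetical.

```python
import numpy as np


def rank_columns(a):
    """Rank the entries of each column from 1 to N (ties are not treated specially)."""
    return np.argsort(np.argsort(a, axis=0), axis=0) + 1


def spearman_matrix(a):
    """Spearman correlation matrix, computed as the Pearson correlation of the ranks."""
    return np.corrcoef(rank_columns(a), rowvar=False)


def correlation_control(samples, target_corr, rng):
    """Reorder each column of samples so that its rank order follows that of G."""
    n, k = samples.shape
    # Order matrix R: each column is a random permutation of 1..N.
    R = np.column_stack([rng.permutation(n) + 1 for _ in range(k)]).astype(float)
    Q = np.linalg.cholesky(spearman_matrix(R))  # CM_R = Q Q^T, formula (2)
    P = np.linalg.cholesky(target_corr)         # target matrix = P P^T
    # Formula (3), transposed because R is stored as N rows by K columns.
    G = R @ (P @ np.linalg.inv(Q)).T
    order = rank_columns(G) - 1                 # zero-based rank of each entry of G
    sorted_cols = np.sort(samples, axis=0)
    # Row i of column j receives the value whose rank equals the rank of G[i, j].
    return np.take_along_axis(sorted_cols, order, axis=0)


# Synthetic demonstration with 200 rows (the embodiment uses 20 rows and 3 columns).
rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 3))
target = np.array([[1.0, 0.9, 0.2], [0.9, 1.0, 0.0], [0.2, 0.0, 1.0]])
controlled = correlation_control(samples, target, rng)
```

Each column of controlled contains exactly the values of the corresponding column of samples, only in a different order, so the marginal distributions obtained in step 8) are unchanged while the rank correlation moves toward the target matrix.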
10) Step of recombining all elements
Finally, based on the obtained sampled data of all discrete elements and continuous elements, 100 groups of data samples containing both discrete elements and continuous elements are obtained.
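Step 10) can be sketched as follows: each discrete element combination is repeated as many times as it was sampled and paired row by row with its correlation-controlled continuous element data. The function name recombine and the small placeholder arrays are hypothetical and serve only as an illustration.

```python
import numpy as np


def recombine(combos, counts, continuous_by_combo):
    """Join each discrete combination with its sampled continuous rows."""
    rows = []
    for combo, n, cont in zip(combos, counts, continuous_by_combo):
        cont = np.asarray(cont, dtype=float)
        if cont.shape[0] != n:
            raise ValueError("continuous sample count does not match the number of samples")
        for values in cont:
            rows.append((*combo, *map(float, values)))
    return rows


# Placeholder data: two combinations with 2 and 1 samples of three continuous elements.
scene_rows = recombine(
    [("rainy", "expressway"), ("sunny", "expressway")],
    [2, 1],
    [np.zeros((2, 3)), np.ones((1, 3))],
)
```

In the embodiment the same call would receive the three combinations D_1 to D_3, the numbers of samples [20, 50, 30] and the three reordered sampling matrices, giving 100 recombined rows.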