Method for determining a traffic travel distribution impedance function based on mobile phone signaling data
1. A method for determining a traffic travel distribution impedance function based on mobile phone signaling data, characterized in that it comprises the following steps:
S1: obtaining citywide origin-destination (OD) travel data for the morning peak period from the mobile phone signaling data, and obtaining in ArcGIS the travel OD of trips whose origin lies in each subarea;
S2: calling the Amap (Gaode Map) API to acquire travel distance and trip count data, and calculating the travel probability;
S3: using the nonlinear regression tool of SPSS to regress travel probability on travel distance for different regions with candidate impedance functions, obtaining the parameters and the goodness of fit R²; comparing the R² values, selecting the functions with higher R², and calculating their sums of squared errors;
S4: calculating the sums of squared errors, comparing different functions within the same region, and then comparing the same function across different regions;
and S5: performing piecewise fitting of the function by segmenting the regional travel distance according to the fitting results, to obtain the final fitted function.
2. The method for determining a traffic travel distribution impedance function based on mobile phone signaling data according to claim 1, wherein the step S1 includes the following steps:
S11: after preprocessing the mobile phone signaling data, identifying user stay points based on the dwell time at a station and a service radius: when a user remains within a service-radius threshold D centered on a given station for longer than a time threshold T, that station is taken as a stay point of the user; trip ODs are obtained from consecutive stay points, and the departure times are extracted to obtain the morning-peak trip ODs;
and S12: displaying the trip origins in ArcGIS according to the longitude and latitude of the origins of the trip ODs obtained in S11, importing the subarea map file, and selecting in turn the O points within each refined subarea with the Select by Location tool, to obtain the subarea trip ODs.
3. The method for determining a traffic travel distribution impedance function based on mobile phone signaling data according to claim 1, wherein the step S2 includes the following steps:
S21: crawling the Amap navigation route-planning API with the coordinates of the trip origin O and destination D to obtain the travel distance (unit: km) of each OD trip, and matching it with the corresponding OD trip counts in the mobile phone signaling data to obtain the travel distance and trip count of each OD for the urban area and each subarea;
and S22: counting the number of travelers in each 1 km distance interval and calculating the corresponding travel probability for each region, where travel probability = number of travelers in the interval / total number of travelers.
4. The method for determining a traffic travel distribution impedance function based on mobile phone signaling data according to claim 1, wherein the step S3 includes the following steps:
S31: selecting five function forms, namely a power function, an exponential function, a composite function, a Rayleigh function and a general traffic impedance function, to perform regression analysis on the travel distance and travel probability obtained in step S2; the function forms are as follows:
power function:
exponential function:
composite function:
Rayleigh function:
general traffic impedance function:
where:
t_ij: the travel distance between mobile phone base stations i and j; α, β and γ: parameters of the traffic impedance function;
S32: opening SPSS, importing the citywide travel distance and travel probability data obtained in step S2, selecting Analyze > Regression > Nonlinear, and entering each impedance function in turn to fit the data, obtaining the parameter regression results for the city and each subarea;
and S33: comparing the actual travel distance distribution curve with the regression simulation curves, comparing the goodness of fit R², selecting the functions with higher R², and calculating their sums of squared errors.
5. The method for determining a traffic travel distribution impedance function based on mobile phone signaling data according to claim 4, wherein the step S4 includes the following steps:
S41: calculating the sums of squared errors of the three functions with the highest R² in S3, using the following formula:
where t_ij: the travel distance between mobile phone base stations i and j; f(t_ij): the value calculated from the corresponding impedance function; the remaining symbol denotes the mean of the observed values, i.e. the mean travel probability;
S42: plotting the actual travel distance-travel probability distribution curve and the regression fitting curves of the same region in one coordinate system, and analyzing the fitting performance of each function in each region;
and S43: plotting the actual travel distance-probability curves of different regions in one coordinate system and analyzing the travel distance characteristics of the urban area and each region; then plotting, for each of the three functions with the highest R², the simulation results of all regions in one coordinate system, and analyzing the simulation results of the three functions over different distance ranges.
6. The method for determining a traffic travel distribution impedance function based on mobile phone signaling data according to claim 1, wherein the step S5 includes the following steps:
S51: taking the urban area as an example, selecting distance segments according to the analysis results of S4, fitting the travel distance and travel probability of each segment separately, and finally obtaining a corrected impedance function;
and S52: comparing the actual values with the model values of the impedance function, and calculating the mean squared error.
Background
Traffic distribution is an important step in travel demand forecasting. Existing traffic distribution models mainly include the growth factor method, the gravity model and probability distribution models. Compared with the other models, the gravity model considers both the socioeconomic factors that drive trip generation and attraction in each zone and the impedance between traffic zones, such as travel time and distance, and it is the most widely used distribution model in traffic planning in China and abroad. Before the gravity model can be applied it must be calibrated, and the most important part of calibration is selecting and calibrating the impedance function. Traditionally, the distribution impedance is taken from surveys of the same city or from small samples, with travel time or travel distance used directly as the impedance value. For better model applicability, however, the distribution impedance should not be limited to such a simple value, and a more general impedance function form of the impedance factor should be considered.
Disclosure of Invention
The invention provides a method for determining a traffic travel distribution impedance function based on mobile phone signaling data. Travel OD data for a specific time period are obtained from large-sample mobile phone signaling data, and travel probabilities are obtained by combining them with travel distances from the Amap API. Impedance functions of different forms are then regressed on the travel distance and travel probability of different regions with the nonlinear regression tool in SPSS, and the results are compared in terms of R², across different functions within the same region, and across different regions for the same function. Finally, the impedance function is fitted piecewise according to the comparison results, and, taking the urban area as an example, a piecewise impedance function suitable for the urban area is derived, providing a qualitative and quantitative basis for improving travel distribution forecasts. At the same time, by exploiting the large sample size, wide coverage, mature and stable acquisition and low cost of mobile phone big data, the method improves the accuracy of the results, reduces cost and increases research efficiency.
The technical scheme is as follows:
A method for determining a traffic travel distribution impedance function based on mobile phone signaling data comprises the following steps:
S1: obtaining citywide origin-destination (OD) travel data for the morning peak period from the mobile phone signaling data, and obtaining in ArcGIS the travel OD of trips whose origin lies in each subarea;
S2: calling the Amap (Gaode Map) API to acquire travel distance and trip count data, and calculating the travel probability;
S3: using the nonlinear regression tool of SPSS to regress travel probability on travel distance for different regions with candidate impedance functions, obtaining the parameters and the goodness of fit R²; comparing the R² values, selecting the functions with higher R², and calculating their sums of squared errors;
S4: calculating the sums of squared errors, comparing different functions within the same region, and then comparing the same function across different regions;
and S5: performing piecewise fitting of the function by segmenting the regional travel distance according to the fitting results, to obtain the final fitted function.
The specific steps of step S1 are as follows:
S11: after preprocessing the mobile phone signaling data, identifying user stay points based on the dwell time at a station and a service radius: when a user remains within a service-radius threshold D centered on a given station for longer than a time threshold T, that station is taken as a stay point of the user; trip ODs are obtained from consecutive stay points, and the departure times are extracted to obtain the morning-peak trip ODs;
and S12: displaying the trip origins in ArcGIS according to the longitude and latitude of the origins of the trip ODs obtained in S11, importing the subarea map file, and selecting in turn the O points within each refined subarea with the Select by Location tool, to obtain the subarea trip ODs.
The specific steps of step S2 are as follows:
S21: crawling the Amap navigation route-planning API with the coordinates of the trip origin O and destination D to obtain the travel distance (unit: km) of each OD trip, and matching it with the corresponding OD trip counts in the mobile phone signaling data to obtain the travel distance and trip count of each OD for the urban area and each subarea;
and S22: counting the number of travelers in each 1 km distance interval and calculating the corresponding travel probability for each region, where travel probability = number of travelers in the interval / total number of travelers.
The specific steps of step S3 are as follows:
S31: selecting five function forms, namely a power function, an exponential function, a composite function, a Rayleigh function and a general traffic impedance function, to perform regression analysis on the travel distance and travel probability obtained in step S2; the function forms are as follows:
power function:
exponential function:
composite function:
Rayleigh function:
general traffic impedance function:
where:
t_ij: the travel distance between mobile phone base stations i and j; α, β and γ: parameters of the traffic impedance function;
S32: opening SPSS, importing the citywide travel distance and travel probability data obtained in step S2, selecting Analyze > Regression > Nonlinear, and entering each impedance function in turn to fit the data, obtaining the parameter regression results for the city and each subarea;
and S33: comparing the actual travel distance distribution curve with the regression simulation curves, comparing the goodness of fit R², selecting the functions with higher R², and calculating their sums of squared errors.
The specific steps of step S4 are as follows:
S41: calculating the sums of squared errors of the three functions with the highest R² in S3, using the following formula:
where t_ij: the travel distance between mobile phone base stations i and j; f(t_ij): the value calculated from the corresponding impedance function; the remaining symbol denotes the mean of the observed values, i.e. the mean travel probability;
S42: plotting the actual travel distance-travel probability distribution curve and the regression fitting curves of the same region in one coordinate system, and analyzing the fitting performance of each function in each region;
and S43: plotting the actual travel distance-probability curves of different regions in one coordinate system and analyzing the travel distance characteristics of the urban area and each region; then plotting, for each of the three functions with the highest R², the simulation results of all regions in one coordinate system, and analyzing the simulation results of the three functions over different distance ranges.
The specific steps of step S5 are as follows:
S51: taking the urban area as an example, selecting distance segments according to the analysis results of S4, fitting the travel distance and travel probability of each segment separately, and finally obtaining a corrected impedance function;
and S52: comparing the actual values with the model values of the impedance function, and calculating the mean squared error.
Advantages of the invention
The main data source of the invention is mobile phone signaling data, which offers large samples, low cost and wide coverage, is acquired in a stable and mature way, and records the spatio-temporal trajectories of user activity fairly completely, making it a high-quality data source for urban traffic analysis. The method obtains morning-peak trip OD data from mobile phone signaling data, derives travel probabilities by combining them with travel distances from the Amap API, and regresses impedance functions of different forms on travel distance and travel probability for different regions with the nonlinear regression tool in SPSS. It then compares the results in terms of R², across different functions within the same region, and across different regions for the same function, and fits the selected impedance function piecewise according to the comparison results to obtain a piecewise impedance function, providing a qualitative and quantitative basis for improving travel distribution forecasts. The method is applicable to different cities, is highly accurate, and helps improve the accuracy of urban travel forecasting.
Drawings
FIG. 1 is a flow chart of the method of the present invention
FIG. 2 is a schematic diagram of analysis region division in the embodiment
FIG. 3 is a table of travel distance and travel probability in the embodiment
FIG. 4 is a graph of the actual travel probability in the embodiment
FIG. 5 is a graph of the impedance distribution of actual and model values over the urban area in the embodiment
FIG. 6 is a graph of the impedance distribution of actual and model values in the Central Urban Area in the embodiment
FIG. 7 is a graph of the impedance distribution of actual and model values in the East Sub-center in the embodiment
FIG. 8 is a graph of the impedance distribution of actual and model values in the West Sub-center in the embodiment
FIG. 9 is a graph of the impedance distribution of actual and model values in Huaqiao Business City in the embodiment
FIG. 10 is a graph of the actual travel distance distribution in the embodiment
FIG. 11 is a graph of the travel distance distribution simulated by the composite function in the embodiment
FIG. 12 is a graph of the travel distance distribution simulated by the Rayleigh function in the embodiment
FIG. 13 is a graph of the travel distance distribution simulated by the general impedance function in the embodiment
FIG. 14 is a graph of the fitting result of the unsegmented impedance function in the embodiment
FIG. 15 is a graph of the fitting result of the piecewise impedance function in the embodiment
Detailed Description
The invention is further illustrated by the following examples, without limiting the scope of the invention:
As shown in FIG. 2, the example uses mobile phone data from one day in May 2019 in Kunshan as the sample, and the study area is partitioned according to the spatial structure of Kunshan into the Central Urban Area, the East Sub-center, the West Sub-center and Huaqiao Business City.
With reference to the flow chart in FIG. 1, the method comprises the following steps:
Step S1 obtains morning-peak travel OD data for the whole urban area from the mobile phone signaling data and, in ArcGIS, the travel OD of trips originating in each subarea; preferably:
S11: after preprocessing the mobile phone signaling data, user stay points are identified based on the dwell time at a station and a service radius: when a user remains within a service-radius threshold D centered on a given station for longer than a time threshold T, that station is taken as a stay point of the user; trip ODs are obtained from consecutive stay points, and the departure times are extracted to obtain the morning-peak trip ODs. In this example, the time threshold T is 40 min.
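The stay-point rule in S11 can be sketched as a sliding-window scan over one user's time-ordered records. This is a minimal illustration, not the patented implementation: the `Record` schema, the great-circle distance check and the 07:00-09:00 morning-peak window are assumptions; only the threshold T = 40 min comes from the example, and D is not given in the source.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    """One signaling record of a single user (hypothetical schema)."""
    time: datetime
    station_id: str
    lon: float
    lat: float


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two WGS-84 points."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def stay_points(records, radius_m, min_stay):
    """Return (anchor_record, arrival, departure) for every dwell longer than
    min_stay (threshold T) within radius_m (threshold D) of the anchor station."""
    recs = sorted(records, key=lambda r: r.time)
    stays, i, n = [], 0, len(recs)
    while i < n:
        anchor = recs[i]
        j = i + 1
        # Extend the window while records remain inside the service radius D.
        while j < n and haversine_m(anchor.lon, anchor.lat, recs[j].lon, recs[j].lat) <= radius_m:
            j += 1
        if recs[j - 1].time - anchor.time > min_stay:  # dwell exceeds T
            stays.append((anchor, anchor.time, recs[j - 1].time))
            i = j
        else:
            i += 1
    return stays


def peak_trips(stays, start_hour=7, end_hour=9):
    """Pair consecutive stay points into OD trips departing in [start_hour, end_hour)."""
    trips = []
    for (origin, _, depart), (dest, _, _) in zip(stays, stays[1:]):
        if start_hour <= depart.hour < end_hour:
            trips.append((origin.station_id, dest.station_id, depart))
    return trips
```

Usage would be `peak_trips(stay_points(records, D, timedelta(minutes=40)))` per user; the departure time of a trip is taken as the last record at the origin stay point, which is one reasonable reading of "extracting the departure time".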
S12: the trip origins are displayed in ArcGIS according to the longitude and latitude of the origins obtained in S11, the subarea map file is imported, and the O points within each refined subarea are selected in turn with the Select by Location tool to obtain the subarea trip ODs.
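Outside ArcGIS, the same "keep trips whose origin falls inside the subarea" selection can be approximated with a ray-casting point-in-polygon test. This is a swapped-in technique for illustration only; the trip tuple layout and the polygon as a list of (lon, lat) vertices are assumptions.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon,
    given as a list of (lon, lat) vertices in order."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Count crossings of a ray cast from the point towards +lon.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def subarea_od(trips, polygon):
    """Keep trips (o_lon, o_lat, d_lon, d_lat, count) whose origin O is in the subarea."""
    return [t for t in trips if point_in_polygon(t[0], t[1], polygon)]
```

Points lying exactly on a boundary may be classified either way by ray casting, whereas ArcGIS lets the user choose the spatial relationship explicitly.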
Step S2 calls the Amap API to obtain the travel distances and calculates the travel probability; preferably:
S21: the Amap navigation route-planning API is crawled with the coordinates of the trip origin O and destination D to obtain the travel distance (unit: m) of each OD trip, which is matched with the corresponding OD trip counts in the mobile phone signaling data to obtain the travel distance and trip count of each OD for the urban area and each subarea.
S22: the number of travelers in each 1 km distance interval is counted, and the corresponding travel probability of each region is calculated, where travel probability = number of travelers in the interval / total number of travelers. FIG. 3 is a table of travel probability by travel distance.
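The S22 binning rule can be sketched directly: group trips into 1 km distance intervals and divide each interval's trip count by the total. The input format, (distance_km, trip_count) pairs per region, is an assumption.

```python
from collections import Counter


def travel_probability(trips, bin_km=1.0):
    """trips: iterable of (distance_km, trip_count) pairs for one region.
    Returns {(lower_km, upper_km): share of all trips in that interval}."""
    counts = Counter()
    for dist, n in trips:
        counts[int(dist // bin_km)] += n  # index of the [k, k+1) km interval
    total = sum(counts.values())
    return {(k * bin_km, (k + 1) * bin_km): c / total for k, c in sorted(counts.items())}
```

For example, trips of 0.5 km (2), 1.2 km (3), 1.9 km (5) and 3.4 km (10) give probabilities 0.1, 0.4 and 0.5 for the 0-1, 1-2 and 3-4 km intervals; empty intervals are simply absent.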
S3: the nonlinear regression tool of SPSS is used with the candidate impedance functions to regress travel probability on travel distance for different regions, obtaining the parameters and the goodness of fit R²; the R² values are compared, the functions with higher R² are selected, and their sums of squared errors are calculated.
Preferably, the method specifically comprises the following steps:
S31: five function forms, namely a power function, an exponential function, a composite function, a Rayleigh function and a general traffic impedance function, are selected for regression analysis on the travel distance and travel probability obtained in step S2, with all parameters estimated by the regression;
S32: SPSS is opened, the urban area travel distance and travel probability data obtained in step S2 are imported, Analyze > Regression > Nonlinear is selected, and each impedance function is entered in turn to fit the data, giving the parameter regression results for the urban area and each subarea. The parameter results are shown in the following tables:
Table 1 Regression summary of traffic impedance function parameters (I)
Table 2 Regression summary of traffic impedance function parameters (II)
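The function formulas are missing from this text, so the sketch below assumes a gamma-type general impedance function f(t) = α·t^β·e^(γt), which matches the parameters α, β, γ and the "general gamma function" named later in the description. Instead of SPSS's iterative nonlinear least squares, it uses a log-linear ordinary least-squares fit (ln f = ln α + β ln t + γ t); this is a swapped-in shortcut whose estimates could serve as starting values for SPSS.

```python
import numpy as np


def gamma_impedance(t, alpha, beta, gamma):
    """Assumed general (gamma-type) impedance function alpha * t**beta * exp(gamma * t)."""
    t = np.asarray(t, dtype=float)
    return alpha * np.power(t, beta) * np.exp(gamma * t)


def fit_gamma_loglinear(t, p):
    """Estimate (alpha, beta, gamma) by OLS on ln p = ln alpha + beta ln t + gamma t.
    Only points with t > 0 and p > 0 can be log-transformed."""
    t = np.asarray(t, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = (t > 0) & (p > 0)
    X = np.column_stack([np.ones(mask.sum()), np.log(t[mask]), t[mask]])
    coef, *_ = np.linalg.lstsq(X, np.log(p[mask]), rcond=None)
    return float(np.exp(coef[0])), float(coef[1]), float(coef[2])
```

Because the log transform weights relative rather than absolute errors, parameters from this fit generally differ somewhat from a nonlinear least-squares fit on the original probabilities; the R² reported by SPSS refers to the latter.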
S33: the actual travel distance distribution curve is compared with the regression simulation curves, and the goodness of fit R² is compared.
Applying the two most commonly used traffic impedance functions to each region of Kunshan shows that the power function and the exponential function fit Kunshan's travel distances poorly, with R² below 0.5 in both cases; the actual probability curves, derived from the statistics of actual travel in the Kunshan urban area and each region, are shown in FIG. 4. It is therefore unreasonable to use a single power or exponential function as the traffic impedance function, as is commonly done, because it does not match the actual situation. The composite function, the Rayleigh function and the general impedance function yield higher R², so these models fit better and basically match actual travel in the Kunshan urban area and each region. To further compare the accuracy of these three functions, their simulated data are compared with the actual data, and their goodness of fit and errors are discussed further.
Step S4 calculates the sums of squared errors, compares different functions within the same region, and compares the same function across different regions; preferably:
S41: calculating the sums of squared errors of the three functions;
S42: plotting the actual travel distance-travel probability distribution curve and the regression fitting curves of the same region in one coordinate system, and analyzing the fitting performance of each function in each region;
the actual travel distance distribution curve is compared with the regression simulation curves, and the sum of squared errors is calculated to verify the reliability of the travel distribution model.
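The exact error formula is missing from this text, so the sketch below assumes the standard definitions SSE = Σ(P_obs − f(t))² and R² = 1 − SSE / Σ(P_obs − mean(P_obs))², which is consistent with the description's mention of the observed mean. The selection rule (highest R², ties broken by lower SSE) is an illustrative assumption that mirrors how the tables below are read.

```python
def sse_and_r2(p_obs, p_fit):
    """Sum of squared errors and coefficient of determination of a fitted curve."""
    n = len(p_obs)
    mean = sum(p_obs) / n
    sse = sum((o - f) ** 2 for o, f in zip(p_obs, p_fit))
    sst = sum((o - mean) ** 2 for o in p_obs)
    return sse, 1.0 - sse / sst


def best_function(metrics):
    """metrics: {name: (r2, sse)}; pick the highest R², then the lowest SSE."""
    return max(metrics, key=lambda name: (metrics[name][0], -metrics[name][1]))
```

Fed with the urban-area values of Table 3 (R² of 0.934, 0.849 and 0.949; SSE of 0.023%, 0.052% and 0.018%), `best_function` returns the general impedance function.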
Check of the travel distance distribution over the urban area
The impedance distribution curves of the actual and model values over the urban area are shown in FIG. 5, and the goodness of fit and error of each simulation function over the urban area are shown in Table 3.
Table 3 Goodness of fit and error of each simulation function over the urban area
Metric | Composite function | Rayleigh function | General impedance function
R² | 0.934 | 0.849 | 0.949
Sum of squared errors | 0.023% | 0.052% | 0.018%
The R² values of the three functions in the SPSS regression results are 0.934, 0.849 and 0.949, respectively, so the general impedance function fits best. The sums of squared errors between the impedance values computed from the calibrated parameters and the observed values are small for all three functions, and smallest for the general impedance function, which indicates that the parameters calibrated for the general impedance function are the most accurate.
Check of the travel distance distribution in each region
The impedance distribution curves of the actual and model values in the Central Urban Area are shown in FIG. 6, and the goodness of fit and error of each simulation function in the Central Urban Area are shown in Table 4.
Table 4 Goodness of fit and error of each simulation function in the Central Urban Area
Metric | Composite function | Rayleigh function | General impedance function
R² | 0.923 | 0.837 | 0.939
Sum of squared errors | 0.027% | 0.057% | 0.021%
The R² values of the three functions in the SPSS regression results are 0.923, 0.837 and 0.939, respectively, so the general impedance function fits best. Compared with the actual curve, the composite function and the general impedance function both give small sums of squared errors for the Central Urban Area, differing only slightly; the general impedance function has the smallest, and its simulation result is the most accurate.
Check of the travel distance distribution in the East Sub-center
The impedance distribution curves of the actual and model values in the East Sub-center are shown in FIG. 7, and the goodness of fit and error of each simulation function in the East Sub-center are shown in Table 5.
Table 5 Goodness of fit and error of each simulation function in the East Sub-center
Metric | Composite function | Rayleigh function | General impedance function
R² | 0.934 | 0.846 | 0.950
Sum of squared errors | 0.025% | 0.057% | 0.019%
The R² values of the three functions in the SPSS regression results are 0.934, 0.846 and 0.950, respectively, so the general impedance function fits best. The general impedance function also has the smallest sum of squared errors for the travel distribution of the East Sub-center, so its simulation result is the most accurate.
Check of the travel distance distribution in the West Sub-center
The impedance distribution curves of the actual and model values in the West Sub-center are shown in FIG. 8, and the goodness of fit and error of each simulation function in the West Sub-center are shown in Table 6.
Table 6 Goodness of fit and error of each simulation function in the West Sub-center
Metric | Composite function | Rayleigh function | General impedance function
R² | 0.827 | 0.795 | 0.913
Sum of squared errors | 0.035% | 0.041% | 0.017%
The R² values of the three functions in the SPSS regression results are 0.827, 0.795 and 0.913, respectively, so the general impedance function fits best. The general impedance function also has the smallest sum of squared errors for the travel distribution of the West Sub-center, so its simulation result is the most accurate.
Check of the travel distance distribution in Huaqiao Business City
The impedance distribution curves of the actual and model values in Huaqiao Business City are shown in FIG. 9, and the goodness of fit and error of each simulation function in Huaqiao Business City are shown in Table 7.
Table 7 Goodness of fit and error of each simulation function in Huaqiao Business City
Metric | Composite function | Rayleigh function | General impedance function
R² | 0.932 | 0.864 | 0.950
Sum of squared errors | 0.021% | 0.041% | 0.015%
The R² values of the three functions in the SPSS regression results are 0.932, 0.864 and 0.950, respectively, so the general impedance function fits best. The general impedance function also has the smallest sum of squared errors for the travel distribution of Huaqiao Business City, so its simulation result is the most accurate.
S43: the actual travel distance-probability curves of different regions are plotted in one coordinate system and the travel distance characteristics of the urban area and each region are analyzed; then the simulation results of the composite function, the Rayleigh function and the general impedance function for each region are plotted in one coordinate system, and the simulation results of the three functions over different distance ranges are analyzed.
Comparison of actual travel distance distributions
As shown in FIG. 10, travel distances in the urban area as a whole and in each region are concentrated in the 2-3 km range; ranked by degree of concentration in that range, they are the East Sub-center, the whole urban area, the Central Urban Area, Huaqiao Business City and the West Sub-center. Beyond 6 km, the West Sub-center has the highest travel probability, and as travel distance increases further, Huaqiao Business City has the lowest. This reflects spatial form: the West Sub-center is strip-shaped, so its base stations are not compactly distributed, whereas Huaqiao Business City is relatively round and compact. Beyond 15 km, the travel probability of every region falls to a very low level, with little difference between regions.
Comparison of traffic impedance function simulation results
As shown in FIG. 11, the composite function simulates the travel distance distribution well for travel distances under 5 km, but beyond 5 km the simulated curves for the urban area and each region almost coincide, so it cannot reproduce the differences and variation between regions at distances above 5 km.
As shown in FIG. 12, the curves simulated by the Rayleigh function for the urban area and each region have concentration ranges and trends of travel distance fairly close to the actual distributions, but the simulated travel probabilities are all lower than the actual values, with negative errors of up to about 0.02 in magnitude, so the Rayleigh function does not reflect travel probability well.
As shown in FIG. 13, the general impedance function closely reproduces the concentration ranges of travel distance for the urban area and each region, with trends close to reality; both the simulated travel probabilities and their variation with increasing travel distance in each region are very close to the actual situation, so the general impedance function simulates the travel distance distributions of the urban area and each region well.
First, comparing the simulation results across different ranges, the general impedance function outperforms the composite and Rayleigh functions: it simulates the travel probability values more accurately and better reproduces how each region's travel probability changes with travel distance. Second, comparing different impedance functions against reality within the same region, the general impedance function has the highest R² and the smallest sum of squared errors and is the most accurate, followed by the composite function and then the Rayleigh function. Finally, although all three traffic impedance functions fall within the allowable calibration error, the combined comparative analysis shows that the general impedance function agrees best with reality and its model parameters are the most reliable.
However, all the impedance functions share a common problem: at large travel distances their simulated values are lower than the actual values, so long-distance trip volumes are underestimated relative to actual volumes. Therefore, building on the calibration above, a piecewise fitting method is used to correct the general impedance function.
Step S5 selects a suitable function according to the fitting analysis and, taking the urban area as an example, performs piecewise fitting by segmenting the regional travel distance to obtain the final fitted function; preferably:
S51: taking the urban area as an example, distance segments are selected according to the analysis results of S4, the travel distance and travel probability of each segment are fitted separately, and the corrected impedance function is finally obtained.
During regression of the impedance function, the whole urban area and each sub-center were fitted separately. Because this study targets the urban area traffic model, the impedance function regressed over the urban area is ultimately adopted uniformly for calculation; the subarea fitting results can serve as a reference for regional traffic model studies.
For the general gamma function over the urban area, the simulated values are far below the actual values once the travel distance exceeds 7 km (FIG. 14), so piecewise fitting is used to correct the impedance function beyond 7 km (FIG. 15): for travel distances of 0-7 km the general gamma function is kept unchanged, and beyond 7 km a power function is fitted to the second segment, giving the corrected impedance function as follows:
where d_ij is the trip impedance, i.e. the travel distance.
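The corrected function itself is not reproduced in this text, so the sketch below only illustrates its described structure: the gamma-type function (assumed form α·d^β·e^(γd)) for d ≤ 7 km and a power function a·d^b for d > 7 km, with the tail fitted by log-log least squares. The parameter names, the log-log fitting shortcut and the absence of a continuity constraint at 7 km are assumptions; the mean squared error helper corresponds to the S52 check.

```python
import numpy as np

BREAK_KM = 7.0  # breakpoint reported in the embodiment


def fit_power_tail(d, p, break_km=BREAK_KM):
    """Fit p = a * d**b to points beyond the breakpoint via OLS on ln p = ln a + b ln d."""
    d = np.asarray(d, dtype=float)
    p = np.asarray(p, dtype=float)
    m = (d > break_km) & (p > 0)
    b, log_a = np.polyfit(np.log(d[m]), np.log(p[m]), 1)  # returns [slope, intercept]
    return float(np.exp(log_a)), float(b)


def piecewise_impedance(d, head, tail, break_km=BREAK_KM):
    """Gamma-type head (alpha, beta, gamma) up to break_km, power tail (a, b) beyond it.
    Assumes d > 0 so that both branches are finite."""
    alpha, beta, gamma = head
    a, b = tail
    d = np.asarray(d, dtype=float)
    return np.where(d <= break_km, alpha * d ** beta * np.exp(gamma * d), a * d ** b)


def mse(p_obs, p_fit):
    """Mean squared error between observed and modelled travel probabilities (S52)."""
    diff = np.asarray(p_obs, dtype=float) - np.asarray(p_fit, dtype=float)
    return float(np.mean(diff ** 2))
```

Because the two segments are fitted independently, the corrected curve may jump slightly at 7 km; adding a continuity constraint would be a design choice not stated in the source.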
S52: the actual values are compared with the model values of the impedance function, and the mean squared error is calculated.
The comparison of the actual and simulated values of the impedance function shows a high goodness of fit, with an average overall sum of squared errors of 0.3%.
The specific embodiments described herein merely illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments, or adopt similar alternatives, without departing from the spirit of the invention or the scope defined by the appended claims.