Static IP building-level positioning method based on IP historical positions
1. A static IP building-level positioning method based on IP historical positions, characterized by comprising the following steps:
Step 1, collecting and cleaning historical reference point data of IPs by using a big data processing technology;
Step 2, screening out the historical reference point data of static IPs;
Step 3, clustering the historical reference point data of each static IP by using a clustering algorithm to realize the positioning of the static IP.
2. The static IP building-level positioning method based on IP historical positions of claim 1, wherein: in step 1, a distributed data acquisition platform is built by using a big data processing technology, customized acquisition strategies are adopted for different data sources, and historical reference point data of at least a WHOIS type, a host name type, a mobile APP type and a website WEB type are acquired.
3. The static IP building-level positioning method based on IP historical positions of claim 1, wherein: in step 1, specific cleaning rules are set according to the data characteristics of each source, and the initially acquired historical reference point data are cleaned and filtered to obtain valid reference point data.
4. The static IP building-level positioning method based on IP historical positions of claim 1, wherein: in step 2, historical position data whose application scenario is of a static IP type are screened out of the cleaned historical reference point data according to the distribution characteristics of the historical reference point data; the static IP type application scenarios include at least schools and enterprise dedicated lines.
5. The static IP building-level positioning method based on IP historical positions of claim 1, wherein: in step 3, for each IP, a distance-based clustering algorithm is used to cluster the historical reference point data of that IP, and the clustering result is represented by the longitude and latitude of a central position and a corresponding radius, thereby realizing building-level positioning of the static IP.
6. The static IP building-level positioning method based on IP historical positions of claim 5, wherein: the clustering algorithm comprises one or more of a K-MEANS algorithm, a DBSCAN algorithm and a mean shift clustering algorithm.
Background
In recent years, IP address positioning technology has received increasing attention. Location-based services have become a trend in the internet industry, network applications based on geographic location keep emerging, and IP address positioning has been widely applied in fields such as network security, secure online payment, big data analysis, anti-fraud risk control and big data credit investigation. High-precision IP address positioning is therefore increasingly important in the internet field. Many research institutions and scholars have conducted systematic studies on issues such as how to improve the positioning accuracy of IP address positioning and its application scenarios.
At present, most IP positioning products position an IP to a broad or specific geographic area, with positioning accuracy at the national, provincial, city or street level. They cannot achieve building-level positioning for static IPs, and they suffer from problems such as outdated data and coarse positioning granularity (mostly reaching only the city level).
Disclosure of Invention
In order to solve the problems in the background art, the invention provides a static IP building-level positioning method based on IP historical positions.
A static IP building-level positioning method based on IP historical positions comprises the following steps:
Step 1, collecting and cleaning historical reference point data of IPs by using a big data processing technology;
Step 2, screening out the historical reference point data of static IPs;
Step 3, clustering the historical reference point data of each static IP by using a clustering algorithm to realize the positioning of the static IP.
Based on the above, in step 1, a distributed data acquisition platform is built by using a big data processing technology, customized acquisition strategies are adopted for different data sources, and historical reference point data of at least a WHOIS type, a host name type, a mobile APP type and a website WEB type are acquired.
Based on the above, in step 1, specific cleaning rules are set according to the data characteristics of each source, and the initially acquired historical reference point data are cleaned and filtered to obtain valid reference point data.
Based on the above, in step 2, historical position data whose application scenario is of a static IP type are screened out of the cleaned historical reference point data according to the distribution characteristics of the historical reference point data; the static IP type application scenarios include at least schools and enterprise dedicated lines.
Based on the above, in step 3, for each IP, a distance-based clustering algorithm is used to cluster the historical reference point data of that IP, and the clustering result is represented by the longitude and latitude of a central position and a corresponding radius, thereby realizing building-level positioning of the static IP.
Based on the above, the clustering algorithm comprises one or more of a K-MEANS algorithm, a DBSCAN algorithm and a mean shift clustering algorithm.
Compared with the prior art, the invention has outstanding substantive features and represents notable progress. Specifically, the method combines the network characteristics and the geographic characteristics of an IP, and uses a clustering algorithm to cluster the historical position information of a static IP into the longitude and latitude of a central position and a corresponding radius, thereby realizing building-level positioning of the static IP.
Drawings
FIG. 1 is a schematic diagram of the historical reference point distribution and the clustering results for a single static IP in accordance with an embodiment of the present invention.
In the figure: 1) the inverted-drop-shaped markers represent historical reference point data; 2) the circles represent the clustering results.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in FIG. 1, the static IP building-level positioning method based on IP historical positions is described below by taking the processing of one static IP as an example.
Step 1, historical reference point data of the IP are collected and cleaned by using a big data processing technology.
Firstly, an efficient, distributed data acquisition platform is built by using a big data processing technology, customized acquisition strategies are adopted for different data sources, and reference point data of types such as the WHOIS type, the host name type, the mobile APP type and the website WEB type are acquired. For the specified IP, the corresponding WHOIS data and related WEB data are acquired by using the big data processing technology, and the geographic position information in the WHOIS data and the WEB data is converted into longitude and latitude information through a map service, forming historical reference point data of the WHOIS type and the WEB type. Meanwhile, historical position data of the IP are extracted from the collected APP-type data to form historical reference point data of the APP type.
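The conversion of collected records into per-IP historical reference points can be sketched roughly as follows. This is only an illustrative sketch: the `ReferencePoint` record, its field names, and the `geocode` callable standing in for the map service are all hypothetical and are not specified by the method itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass(frozen=True)
class ReferencePoint:
    """One historical reference point of an IP (hypothetical record layout)."""
    ip: str
    source: str       # "WHOIS", "HOSTNAME", "APP" or "WEB"
    lat: float        # latitude in degrees
    lon: float        # longitude in degrees
    observed_at: str  # timestamp of the observation (assumed field)

# Stand-in for the map service: maps an address string to (lat, lon) or None.
Geocoder = Callable[[str], Optional[Tuple[float, float]]]

def to_reference_point(ip: str, source: str, address: str,
                       observed_at: str, geocode: Geocoder) -> Optional[ReferencePoint]:
    """Convert a textual address (e.g. from WHOIS or WEB data) into a reference point.

    Returns None when the address cannot be resolved to coordinates.
    """
    coords = geocode(address)
    if coords is None:
        return None
    lat, lon = coords
    return ReferencePoint(ip, source, lat, lon, observed_at)

def group_by_ip(points: List[ReferencePoint]) -> Dict[str, List[ReferencePoint]]:
    """Build one historical reference point set per IP."""
    grouped: Dict[str, List[ReferencePoint]] = defaultdict(list)
    for point in points:
        grouped[point.ip].append(point)
    return dict(grouped)
```

APP-type data that already carry coordinates could be wrapped in `ReferencePoint` directly, without going through the geocoder.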
Secondly, according to the data characteristics of the different sources, specific cleaning rules are applied to clean and filter the initially acquired historical reference point data, yielding valid reference point data for the specified IP. For example, for WHOIS-type reference points, the usability of each reference point is judged based on indicators such as the type of the IP registration organization, the region, the number of changes in the historical information, and the registration time, and only the reference points that meet the requirements on these indicators are retained.
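A WHOIS cleaning rule of the kind described above might look like the following sketch. The four indicators come from the description, but the record layout, the allowed organization types, and every threshold value are assumptions chosen purely for illustration; the description does not state the actual requirements.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List

@dataclass(frozen=True)
class WhoisRecord:
    """WHOIS-type reference point with the indicators used for cleaning (hypothetical layout)."""
    ip: str
    registrant_type: str  # type of the registration organization
    region: str
    change_count: int     # number of changes in the historical information
    registered_on: date

@dataclass(frozen=True)
class WhoisRule:
    """Cleaning rule; all default values are illustrative assumptions."""
    allowed_types: FrozenSet[str] = frozenset({"school", "enterprise"})
    max_changes: int = 3
    min_age_days: int = 180

def is_usable(rec: WhoisRecord, rule: WhoisRule, region: str, today: date) -> bool:
    """Return True only if the record meets the requirement on every indicator."""
    age_days = (today - rec.registered_on).days
    return (rec.registrant_type in rule.allowed_types
            and rec.region == region
            and rec.change_count <= rule.max_changes
            and age_days >= rule.min_age_days)

def clean_whois(records: List[WhoisRecord], rule: WhoisRule,
                region: str, today: date) -> List[WhoisRecord]:
    """Keep the WHOIS reference points that pass the cleaning rule."""
    return [r for r in records if is_usable(r, rule, region, today)]
```

Other source types (host name, APP, WEB) would need their own rules; keeping each rule as a small data object makes it straightforward to configure one per source.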
Step 2, the historical reference point data of static IPs are screened out.
From the cleaned historical reference points, historical position data whose application scenario is of a static IP type are screened out according to the distribution characteristics of the historical reference points. Static IP type application scenarios include schools, enterprise dedicated lines and the like, and the geographic distribution characteristic of their reference points is as follows: when the reference points are examined per IP (one historical reference point set per IP), the historical reference points of different IPs in adjacent IP segments, or even within the same IP segment, are distributed relatively independently of one another geographically. Static IPs are screened according to this characteristic of the single-IP historical reference point set, such as the reference point distribution shown in FIG. 1.
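One possible way to turn such a distribution characteristic into a concrete test is sketched below. It is an assumption of this sketch, not something the description specifies, that a static IP can be recognized by most of its own reference points lying within a small radius of a robust center; the function names and all threshold values are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))

def looks_static(points: List[Point], radius_m: float = 500.0,
                 min_fraction: float = 0.8, min_points: int = 5) -> bool:
    """Heuristic screen: True if most reference points of one IP are tightly grouped.

    The center is taken as the per-coordinate median, which is less sensitive
    to a few stray reference points than the mean.
    """
    if len(points) < min_points:
        return False  # too little history to judge
    lats = sorted(p[0] for p in points)
    lons = sorted(p[1] for p in points)
    center = (lats[len(lats) // 2], lons[len(lons) // 2])
    near = sum(1 for p in points if haversine_m(center, p) <= radius_m)
    return near / len(points) >= min_fraction
```

An IP whose history is spread over a whole city would fail this check and be excluded from the building-level clustering in step 3.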
Step 3, the historical reference point data are clustered by using a clustering algorithm to realize the positioning of the static IP.
For the static IP, the historical reference point data are clustered by using the DBSCAN clustering algorithm, and the clustering result is represented by the longitude and latitude of a central position and a corresponding radius, thereby realizing building-level positioning of the static IP. In other embodiments, other clustering algorithms such as the K-MEANS algorithm or the mean shift clustering algorithm may be adopted.
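The clustering step can be illustrated with a minimal, dependency-free DBSCAN over great-circle distances. This is a sketch under stated assumptions rather than the embodiment's actual implementation: the parameter values `eps_m` and `min_pts`, the use of the mean coordinate as the central position, and the use of the farthest member distance as the radius are all choices made here for illustration.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))

def dbscan(points: List[Point], eps_m: float, min_pts: int) -> List[int]:
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    labels: List[Optional[int]] = [None] * len(points)  # None means unvisited

    def neighbors(i: int) -> List[int]:
        return [j for j in range(len(points))
                if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return [-1 if label is None else label for label in labels]

def locate(points: List[Point], eps_m: float = 100.0,
           min_pts: int = 3) -> List[Tuple[float, float, float, int]]:
    """Cluster one IP's reference points; return (lat, lon, radius_m, size) per cluster.

    Clusters are sorted by size, largest first. Averaging coordinates is
    adequate at building scale but not for clusters spanning the antimeridian.
    """
    labels = dbscan(points, eps_m, min_pts)
    results = []
    for cid in sorted(set(labels) - {-1}):
        members = [p for p, label in zip(points, labels) if label == cid]
        center = (sum(p[0] for p in members) / len(members),
                  sum(p[1] for p in members) / len(members))
        radius = max(haversine_m(center, p) for p in members)
        results.append((center[0], center[1], radius, len(members)))
    return sorted(results, key=lambda r: r[3], reverse=True)
```

Each returned tuple corresponds to one circle of the kind drawn in FIG. 1: a central longitude and latitude plus a radius. Reference points labelled as noise do not contribute to any circle.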
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.