What are the methods used by GeoIP services besides WHOIS info?
Such services usually combine three techniques to geolocate an IP address:
- Searching WHOIS databases for an address registered for the IP block;
- Looking at reverse DNS records for clues in the host names, or tracing the path that packets take to the destination (using traceroute, for example), which can also give clues;
- And lastly, RTT triangulation.
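To illustrate the reverse DNS idea from the list above, here is a minimal sketch of pulling a location hint out of a host name. The `LOCATION_HINTS` table and the function name are hypothetical, made up for this example; real services presumably maintain far larger and messier rule sets, and operators are free to name routers however they like, so treat a match as a clue rather than proof.

```python
import re

# Hypothetical lookup table: tokens that operators sometimes embed in
# router host names (airport codes, city abbreviations). Illustrative only.
LOCATION_HINTS = {
    "lax": "Los Angeles, US",
    "fra": "Frankfurt, DE",
    "lhr": "London, GB",
    "nyc": "New York, US",
}

def guess_location_from_hostname(hostname):
    """Return a location guess from a PTR host name, or None if no hint matches."""
    # Split on anything that is not a letter, so "lax03" yields the token "lax".
    tokens = re.split(r"[^a-z]+", hostname.lower())
    for token in tokens:
        if token in LOCATION_HINTS:
            return LOCATION_HINTS[token]
    return None
```

The host name itself could come from a reverse lookup such as `socket.gethostbyaddr(ip)[0]`, or from the hops printed by traceroute.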
Round-Trip Time (RTT) Triangulation is a method used to obtain the approximate geolocation of an IP address by measuring the ping latency from three different locations.
For example, if you have three servers spread across the world in the shape of a triangle, and you ping an IP address from all three and get roughly the same latency from each, then the IP address is probably located near the centre of that triangle. This is how triangulation works in general; here it is applied to ICMP ping latencies. Keep in mind that latency reflects the network path rather than the straight-line distance, so the result is only an approximation.
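A rough sketch of the idea, under some loud assumptions: signals are taken to travel at about 200 km per millisecond (roughly two thirds of the speed of light, as in fibre), the path is treated as a straight great-circle line, and the location is found by a brute-force grid search rather than a proper solver. The function names and constants are my own, not taken from any particular GeoIP product.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: about 200 km per millisecond in fibre. Real paths are slower
# and indirect, so RTT-derived distances are really upper bounds.
KM_PER_MS = 200.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def rtt_to_distance_km(rtt_ms):
    """Convert a round-trip time to a one-way distance estimate."""
    return (rtt_ms / 2.0) * KM_PER_MS

def estimate_location(measurements, step_deg=1.0):
    """measurements: list of (probe_lat, probe_lon, rtt_ms) tuples.

    Returns the (lat, lon) grid point whose distances to the probes best
    match the RTT-derived distances, in the least-squares sense.
    """
    best, best_err = None, float("inf")
    lat = -90.0
    while lat <= 90.0:
        lon = -180.0
        while lon < 180.0:
            err = sum(
                (haversine_km(lat, lon, plat, plon) - rtt_to_distance_km(rtt)) ** 2
                for plat, plon, rtt in measurements
            )
            if err < best_err:
                best, best_err = (lat, lon), err
            lon += step_deg
        lat += step_deg
    return best
```

You would call it with something like `estimate_location([(51.5, -0.1, 12.0), (40.7, -74.0, 70.0), (35.7, 139.7, 95.0)])`, where each tuple is a probe's latitude, longitude and measured RTT in milliseconds (the numbers here are made up). With real measurements, queuing and routing detours inflate the RTT, so expect errors of hundreds of kilometres.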
Resources you can read:
- What is ping? @ Wikipedia
- SIGCOMM paper about RTT triangulation
I'm the founder of IPinfo, so I can definitely offer some details around this! There's not one single method we use, or a single data source, to produce our own geolocation database (or any of our other data sets, like IP to company, or IP to carrier). It's a mix of a bunch of different data sets, data processing techniques, and lessons learned doing this for several years now!
Some data sources and techniques not often mentioned include:
- Direct feeds from ISPs. Our service handles around 500 million API requests a day, and it is used on many popular, high-profile websites. ISPs are therefore incentivized to provide us with accurate, up-to-date geolocation data so that their customers get a great experience on the web. We're working directly with more and more ISPs all the time.
- GPS location data. It's possible to collect precise location information with GPS on mobile devices. You can pair that with the IP address and some network topology inference to work out the location for IP ranges given just a few measurements.
- User-submitted corrections. When we do get the location wrong (or it hasn't been updated after a change), we'll often quickly get feedback from users, and can manually fix the location or tweak our algorithm to ensure it's correctly located on the next run of our data processing pipeline.
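As a rough illustration of the GPS pairing idea mentioned above (and not a description of IPinfo's actual pipeline), one simple approach is to group GPS fixes by the IP prefix they were seen on and take a robust average per prefix. The function name, the /24 grouping and the `min_samples` threshold are all assumptions made for this sketch, and it ignores the network topology inference entirely.

```python
import ipaddress
from collections import defaultdict
from statistics import median

def locate_prefixes(observations, prefix_len=24, min_samples=3):
    """observations: iterable of (ip_string, lat, lon) GPS fixes.

    Returns {network: (median_lat, median_lon)} for every prefix that has
    at least min_samples fixes. Assumes IPv4 and prefixes that do not
    straddle the antimeridian (a plain median of longitudes breaks there).
    """
    by_prefix = defaultdict(list)
    for ip, lat, lon in observations:
        # strict=False accepts a host address and returns its enclosing network.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        by_prefix[net].append((lat, lon))

    result = {}
    for net, fixes in by_prefix.items():
        if len(fixes) >= min_samples:
            # The median tolerates a few outliers, e.g. roaming or VPN users.
            result[net] = (
                median(f[0] for f in fixes),
                median(f[1] for f in fixes),
            )
    return result
```

Prefixes with too few fixes are simply left out here; in practice they would have to be located by some other signal.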
For our IP to company data set we actually scrape every single domain name every month, and cross-reference the data we extract there with IP ownership information, rwhois records and more. We also use the domain scraping data to show which domains are hosted on which IP addresses, and feed it, along with many other data sources, into our IP type classifier, which determines the probability of an IP address being primarily used by a residential ISP, a business, or a hosting provider. We also analyze the link structure of those pages, and show some of this data on host.io.