Converting Maxmind’s New Geoip Database Format to MySQL

A Huge Number of IP Addresses to Resolve

IP addresses can be resolved to their approximate locations using FreeGeoip.net which, with its cap of 10,000 addresses an hour, should be fine for most applications.  Sometimes though, there are reasons to host that information database locally, perhaps for speed or bandwidth reasons.   The database behind FreeGeoip.net is Maxmind’s Geoip database.  They make a free version of this available for download.  This post describes how to migrate the new Maxmind Geoip database to MySQL.

This has been done plenty of times before, but only with Maxmind’s older Geoip Legacy databases, and not their newer, richer Geoip2 databases.  For new applications, it makes sense to now use the newer databases (for future-proofing and data quality reasons), but there does not seem to be a description about how to do this out there on the Web.  So this post is meant to address that information gap.

The best and most useful description that I could find of how to migrate the Free Geoip legacy database to MySQL was a brief account written by Gregor Aisch, in 2013.  It was terse, but easy to get the gist.  What I needed to do was to create a way to import of the newer Maxmind database into MySQL and no one had written an account how how it was done, so I used this as a rough guide.

Gregor Aisch is a graphics editor at The New York Times with a passion for visualization. His blog can be found at http://driven-by-data.net/ 

Maxmind provides both versions of its free Geoip databases in 2 formats; a binary MaxMind format, and a much more accessible CSV format.  The approach documented here is to take the 2 CSV files and import them into a 2-table MySQL database.

The 2 tables can be created as part of an existing database, or an entirely new one.  My SQL code below shows the tables’ create statements for the new Geoip database.  The legacy database has far fewer columns, it also stores the IP address ranges as a range, with a start IP and an end IP.  These correspond to two columns in the ‘blocks’ database table.    The new database has only one column, with a special notation call IP/CIDR, which looks like

 

Each column corresponds exactly to a column in the CSV.  The two fields at the bottom of the blocks table, network and broadcast, do not correspond to the CSV.   More on these two important columns later.

Gregor’s gist contained the shell script below:

I found that the mysqlimport command did not work properly when executed as part of the shell script.  So, instead of using the shell  script, I downloaded, extracted, and renamed the files manually to location.csv and blocks.csv.  Then I typed this command for each file (arguements on separate lines for clarity).

The second file, blocks.csv, took much longer to import.

Once the new database tables were populated, a new problem became apparent.

A Key Difference Between The Geoip Database Versions

A key difference between the older and newer Geoip database versions was the way that they stored IP address information.  With the older database, it had been possible to ask the database whether an IP address existed within a particular IP address range.  That IP address range was stored as a start and an end IP number.  An SQL query could test the IP number as follows:

In the new database, the address range is stored in a single column, using the IP/CIDR notation.  This value contains the start and end IP addresses, but they need to be extracted, into the broadcast and network fields, which were left blank earlier.  The start and end IP addresses are stored as a number.  The IP is converted to an IP number using MySQL’s native inet_aton() function, and back to a dotted IP address using the inet_ntoa() function.  Storing the addresses as numbers makes it more efficient to query the database.

Using the IP and CIDR, it’s possible to calculate the broadcast address.  The broadcast address is the network IP, plus, 32 minus the CIDR, to the power of 2, minus 1.

So, I ran the following SQL update to populate the two columns.

Information about an IP can now be extracted from the newly imported Geoip database, using the following query!

References:

 

Cite this article as: Kane, David, "Converting Maxmind’s New Geoip Database Format to MySQL," in DavidKane.Net, June 18, 2015, http://davidkane.net/installing-new-geoip-database-sql-database/.

 

One Comment

  1. Ben says:

    This is a great resource thank you. Do you know how I would similarly handle the Maxmind IPv6 table, so that I could have a start IP and end IP range? The table formats are nearly identical. Have you dealt with the calculations to to create the IPv6 version of:

    update blocks set broadcast = (INET_ATON(substring_index(network_cidr, ‘/’, 1)) + (pow(2, (32-substr(network_cidr, instr(network_cidr, ‘/’)+1)))-1)), network = inet_aton(substring_index(network_cidr, ‘/’, 1));

    Thanks,
    Ben

Leave a Reply

Your email address will not be published. Required fields are marked *

Pin It on Pinterest

Share This