Pretty much any language commonly used for data analysis (R, SAS, Python) can calculate the distance between two geographic coordinates with relative ease. But always having to pull your data out of your data warehouse any time you want to do some basic geographic analysis can be frustrating - sometimes it's nice to keep simple queries all in one system. If you've got a spatially enabled version of Postgres or SQL Server, you're in business. But if not, you'll need to roll your own SQL solution.

In today's post, we're going to write our own code in vanilla SQL to calculate the distance between two latitude and longitude coordinates.

## Basic concepts

I'm going to start out with a couple of areas of explanation. If you just want to know how to make this work, you can scroll down to the SQL at the bottom...

First area of explanation: Latitude and longitude are basically nothing more than angles. Latitude is measured as your degrees north or south of the equator. Longitude is your degrees east or west of the prime meridian. The combination of these two angles pinpoints an exact location on the surface of the earth.

Second bit of explanation: As shown in the image above, the quickest route between two points on the surface of the earth is a "great circle path" - in other words, a path that comprises a part of the longest circle you could draw around the globe that intersects the two points. (It's counter-intuitive, but it works if you think about it.) And, since this is a circular path on a sphere using coordinates expressed in angles, all of the properties of the distance will be given by trigonometric formulas. Fair enough.

Final bit of explanation: The shortest distance between two points on the globe can be calculated using the Haversine formula. All of the math behind it is beyond the scope of this post but (surprise, surprise), it's just a bunch of trig. The great circle distance can be calculated using the Haversine formula shown below.

Note that Phi 1 and Phi 2 are latitudes and Lambda 1 and Lambda 2 are longitudes, while R is the radius of the earth (which we'll call 3961 miles). All we need to do now is implement this in SQL!

## The SQL code

Here's the code for implementing the Haversine formula in SQL, plus a dummy table for trying it out. This code is specifically tailored to Redshift (since this is an analytics blog), but it should be relatively easy to translate to other flavors of SQL...

The demo calculation puts these points about 88.7 miles away from each other, which just happens to be the same result I get from Andrew Hedges' great circle distance calculator. Awesome.

## Conclusion

Implementing algorithms in SQL is always a good time. If you've got any comments about this one, be sure to leave them below! Also, feel free to check out my other posts on implementing various algorithms in standard SQL code:

Thanks for stopping by!

Jude MarcApril 10, 2017 / 11:32 amHi Dayne,

I’m not seeing the SQL Code and table on this one. I think you meant to post a link but it didn’t go through.

Thanks,

daynebattenApril 10, 2017 / 1:05 pmThere’s a large GitHub Gist right in the middle. Do you have JavaScript disabled or anything?

bikashJuly 8, 2017 / 5:32 amgot follwing error:

Msg 402, Level 16, State 1, Line 2

The data types float and int are incompatible in the ‘^’ operator.

daynebattenJuly 10, 2017 / 9:18 amMaybe cast, then?

KurtJuly 11, 2017 / 10:44 amHey All,

I had the same error “The data types float and int are incompatible in the ‘^’ operator.”

This is the solution I found, if you use the Power func, https://docs.microsoft.com/en-us/sql/t-sql/functions/power-transact-sql.

select

2 * 3961 * asin(sqrt( power((sin(radians((lat2 – lat1) / 2))) , 2) + cos(radians(lat1)) * cos(radians(lat2)) * power((sin(radians((lon2 – lon1) / 2))) , 2) )) as distance

from

the_data;

Seems to work great.

@daynebatten This is Amazing. THANK YOU VERY MUCH you have saved me a lot of time.

PaulaMarch 25, 2020 / 6:38 pmHi Kurt!

output is in miles, right?

JeffAugust 10, 2018 / 11:13 amPerhaps I missed it, even after reading it multiple times but what is the unit of measure of the output of this formula? Feet, Miles, Meters or Kilometers?

daynebattenAugust 10, 2018 / 11:53 amSame units as whatever you use for the radius of the earth. If you choose miles, it’s miles.

SatyamMarch 7, 2019 / 10:57 pmWhat’s the time complexity of this operation?

daynebattenApril 19, 2019 / 7:48 amIt’s not terrible if you’re, say, computing the distance between a bunch of points and a single other point. Computers are fast at math.

Doing spatial joins this way would be very slow, though, because you’d have to compare all rows in table A with all rows in table B. If you want to do that, I recommend using a spatially enabled database (e.g., PostGIS).

JurajFebruary 5, 2020 / 10:35 amHey daynebatten,

My college during code review pointed out to me, that we have to subtract cords in radiant format instead of pure cords.

Please take a look below whether you agree that our approach is correct.

Thank you

Yours

2 * 3961 * asin(sqrt((sin(radians((lat2 – lat1) / 2))) ^ 2 + cos(radians(lat1)) * cos(radians(lat2)) * (sin(radians((lon2 – lon1) / 2))) ^ 2))

Ours

2 * 3961 * asin(sqrt((sin((radians(lat2)-radians(lat1) / 2)^2) + cos(radians(lat1)) * cos(radians(lat2)) * (sin((radians(lon2) – radians(lon1)) / 2 )^2) ))

daynebattenApril 6, 2020 / 1:38 pmYour code looks fine, but converting to radians before subtracting isn’t necessary. They’re just units… Doing math in miles and converting to km is the same as converting to km and then doing your math.