How does Yelp efficiently calculate distance in the database?

If I understand the question correctly (and I'm not sure I do), you are worried about computing "(Some formula to compute distance here)" for every row in the table each time you do a query?

This can be mitigated to a degree by using the indexes on latitude and longitude so we only have to compute the distance for a 'box' of points containing the circle we actually want:

select * from business
where (latitude>96 and latitude<116) and 
      (longitude>-5 and longitude<15) and 
      (Some formula to compute distance here) < 2000

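Here 96, 116 and so on are placeholders: choose them to match the unit of the value 2000 and the point on the globe you are calculating distances from. If the columns hold degrees, real latitude bounds will lie between -90 and 90.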
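As a rough illustration of how the box bounds could be derived, here is a minimal Python sketch. It assumes coordinates stored in degrees, a radius in metres, a spherical Earth, and a search point away from the poles and the 180° meridian. The `name` column, the function names and the use of SQLite are assumptions made for the example; only the `business` table and its `latitude`/`longitude` columns come from the query above. The box filter runs in SQL (where the indexes can help) and the exact distance check runs on the few rows that survive.

```python
import math
import sqlite3

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical approximation)


def bounding_box(lat, lon, radius_m):
    """Return (min_lat, max_lat, min_lon, max_lon) in degrees for a box
    that contains the circle of radius_m metres around (lat, lon)."""
    ang = radius_m / EARTH_RADIUS_M  # angular radius in radians
    ratio = math.sin(ang) / math.cos(math.radians(lat))
    if ratio >= 1.0:
        # The circle reaches a pole; a simple box no longer works.
        raise ValueError("search circle too close to a pole")
    dlat = math.degrees(ang)
    # Widest longitude difference reached by the circle at this latitude.
    dlon = math.degrees(math.asin(ratio))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(conn, lat, lon, radius_m):
    """Box filter in SQL, then the exact distance check in Python."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_m)
    rows = conn.execute(
        "SELECT name, latitude, longitude FROM business "
        "WHERE latitude BETWEEN ? AND ? AND longitude BETWEEN ? AND ?",
        (min_lat, max_lat, min_lon, max_lon),
    ).fetchall()
    return [r for r in rows if haversine_m(lat, lon, r[1], r[2]) < radius_m]
```

The box is always a little larger than the circle, so points in its corners pass the first filter and are only rejected by the distance check; that is why both conditions appear in the query.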

How precisely this uses indexes will depend on your RDBMS and the choices its planner makes.

In general terms, this is a primitive way of optimising a kind of nearest neighbour search. If your RDBMS supports GiST indexes, as PostgreSQL does, you should consider using them instead.


(Disclosure: I'm a Microsoft SQL Server guy, so my answers are influenced by that.)

To really do it efficiently, there are two things you want: caching and native spatial data support. Spatial data support lets you store geography and geometry data directly in the database without doing expensive calculations on the fly, and lets you build indexes to very rapidly find the closest point to your current location (or the most efficient route, or whatever).

Caching is important if you want to scale, period. The fastest query is the one you never make. Whenever a user asks for the closest things to them, you store their location and the result set in a cache like Redis or memcached for a period of hours. Business locations aren't going to change in the next 4 hours. Well, they might if someone edits a business, but you don't necessarily need that to be immediately updated in all result sets.
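To make the caching idea concrete, here is a minimal Python sketch. A plain in-process dictionary stands in for Redis or memcached, and the `NearbyCache` class, its parameters and the `compute` callback are all hypothetical names for illustration. Rounding the coordinates so that users standing close together share one cache entry is an assumption of this sketch, not something described above; it trades a little accuracy for a much higher hit rate.

```python
import time

CACHE_TTL_S = 4 * 60 * 60  # four hours, matching the figure mentioned above


class NearbyCache:
    """In-memory stand-in for a Redis/memcached cache of nearby-search results."""

    def __init__(self, ttl_s=CACHE_TTL_S, precision=3, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._precision = precision  # decimal places kept in the cache key
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expires_at, result)

    def _key(self, lat, lon):
        # Assumption: nearby users share an entry by rounding coordinates.
        return (round(lat, self._precision), round(lon, self._precision))

    def get_or_compute(self, lat, lon, compute):
        """Return the cached result for this location, or call
        compute(lat, lon), store its result and return it."""
        key = self._key(lat, lon)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no database query needed
        result = compute(lat, lon)
        self._entries[key] = (now + self._ttl_s, result)
        return result
```

With a real cache server the expiry would normally be handled by the server's own TTL support rather than by comparing timestamps by hand.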