SQL Server 2005
(1)
SQL Server
(1)
CREATE TABLE
(1)
PRIMARY KEY
(1)
NOT NULL
(1)
Varchar
(1)
RevGeo
(1)
Longitude
(1)

Managing Large Datasets for Reverse Geocoding

Asked By Mari
30-Jul-08 11:34 AM
Hi

I'm facing a low performance issue in a new development probably someone had
facing before. There is many way to perform the reverse geocoding process (to
know the street address based on a Latitude Longitude Point). the easiest way
is to use the Haversine algorithm to measure the distance of the know
Lat/Long with a table wich contains the information of the streets of overall
nation with the format:

ID(Int),NAME(Char),LATITUDE(Float),LONGITUDE(Float),PARENTID(Int)

Now you can check the register with the lowest distance to the original
point(Wich means the nearest street address to the desired point) and use the
Name of the location of the entered Location.  Until here everything is ok
table with approximate 1.000.000 registres, so server have to calculate
1.000.000 distances and then Order by Distance!!!. Something like:

SELECT TOP 1
ID,Name,Latitude,Longitude,dbo.Haversine(4.7131725,-74.057433,dbo.RevGeo.Latitude,dbo.RevGeo.Longitude)as Distance FROM RevGeo Order by Distance

This operation takes around 3 seconds in a desktop computer, however it is
too much time if you keep in mind we have 1000 vehicles on field requeting
reverse geocoding services each minute, it mean i have to improve a lot the
service performance. Looking inside the Query Analizer it looks like Sorting
process takes more tan 90% of query time, so anyone can help me to understand
how to improve Query to reduce dramatically the query time? Thanks in advance
by your help!!

Mario

Please use real DDL instead of your personal shorthand.

Asked By --CELKO--
31-Jul-08 11:58 PM
Please use real DDL instead of your personal shorthand.  The data
element names make almost no sense.  You have magical generic "id" and
a "parent_id" without any hint as to what standard they follow or how
the parent role relates to the other.  what is the key?
UNIQUE(latitude, longitude)?  Maybe you meant this?:

CREATE TABLE Locations
(location_id INTEGER NOT NULL PRIMARY KEY, -- industry standard used?
location_name CHAR(n) NOT NULL, -- not varchar? what is (n)?
latitude FLOAT NOT NULL,
longitude FLOAT NOT NULL,
parent_location_id INTEGER NOT NULL);


SELECT TOP 1 location_id, location_name, latitude, longitude,
dbo.Haversine(4.7131725, -74.057433, dbo.RevGeo.latitude,
dbo.RevGeo.longitude)AS distance
FROM RevGeo
ORDER BY distance;

The Haversine() function is killing you.
1) Can you buy a floating point processor for your computer?  Probably
not, and I am not sure if SQL Server would use it.
2) Can you write the Haversine as in-line code and get rid of the row-
by-row function call?  This will help a little.

I did a system like this years ago for a city-wide delivery system.
In the U.S. we have a series of map books called the Thomas Guides
(part of the Rand-McNally map company), which break the city into a
square grid defined by (page_nbr, x-coordinate, y-coordinate).  You
can get an approximate distance with a simple Pythagorean distance
formula if the locations are on the same page.  You use a look-up
table if they are not on the same page.  You can buy the map
information from them on computer media.

Mario,Depending on the data, and range of input coordinates you deal with,

Asked By Michael Ware
30-Jul-08 01:27 PM
Mario,
Depending on the data, and range of  input coordinates you deal with, you
should be able narrow the result set before the sorting process.  Something
like:

SELECT TOP 1 ID,Name,Latitude,Longitude,dbo.Haversine(4.7131725,-74.057433,
dbo.RevGeo.Latitude,dbo.RevGeo.Longitude)as Distance
FROM RevGeo

WHERE Latitude BETWEEN (4.7131725)-0.5 AND (4.7131725)+0.5
AND  Longitude BETWEEN (-74.057433) -0.5 AND (-74.057433) +0.5

ORDER by Distance

This way only rows within a 1 degree "square" block need to be processed.
In dealing with US Zip Code's I've found +/- 0.2 degrees Latitude & +/-0.25
degrees Longitude is enough to find the nearest zip code to any point.
PROVIDED the point in question is within a valid zipcode to begin with.

-Mike
----------------------
little earth than plagues or earthquakes"
Voltaire

Managing Large Datasets for Reverse Geocoding

Asked By Hugo Kornelis
30-Jul-08 06:55 PM
Hi Mario,

Grab a copy of Expert SQL Server 2005 Development (by Adam Machanic,
Lara Rubbelke, Hugo Kornelis - yes, this is a shameless self-plug) and
read chapter 9. That's the chapter I wrote, and it deals with spatial
data. I discuss two common problems:
1) "which (hotels / trees / streets / whatever) are in a 10-mile radius
from a specific point?" (with an optimization to prevent having to
calculate distance for each point in the table), and
2) "which (hotel / tree / street / whatever) is nearest to a specific
point?" (with a different optimization to, again, prevent having to
calculate all those distances).

Unfortunately, one error in these queries slipped through review until
it was too late to change, so I blogged about that one:
http://sqlblog.com/blogs/hugo_kornelis/archive/2007/05/18/correcting-my-mistake.aspx
http://sqlblog.com/blogs/hugo_kornelis/archive/2007/06/27/The-bounding-box_2C00_-corrected-version.aspx

--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post Question To EggHeadCafe