The existence (and value of) "clean" geocoding tools?

The existence (and value of) "clean" geocoding tools?

by David Dearing

Hi.  I just recently stumbled across OSGeo and have poked around to try
and get a feel for the different projects, but still have a lingering
question.  Forgive me if this isn't the appropriate channel to be asking
this.

It seems that there is a solid focus on mapping, image manipulation, and
geometric processing at OSGeo.  And, in the more broad world including
non-open source projects, there are a lot of tools available for the
mass production of geotagged or geocoded documents.  However, the
accuracy of these systems, while good, doesn't seem sufficient when
accuracy is at a premium (from what I've seen they tend to focus on volume).

Are there any existing tools that can be used to tag/code documents,
perhaps sacrificing the mass-produced aspect for better accuracy?  Have
I just missed/overlooked some existing tool(s) that meet this
description?  Or, am I in the minority in wanting to produce fewer
"clean" geocoded/tagged documents rather than many "pretty good" documents?

Thanks so much,
dave
_______________________________________________
Discuss mailing list
Discuss@...
http://lists.osgeo.org/mailman/listinfo/discuss

Re: The existence (and value of) "clean" geocoding tools?

by Christopher Schmidt

On Wed, Sep 24, 2008 at 01:53:34PM -0700, David Dearing wrote:

> Are there any existing tools that can be used to tag/code documents,
> perhaps sacrificing the mass-produced aspect for better accuracy?  Have
> I just missed/overlooked some existing tool(s) that meet this
> description?  Or, am I in the minority in wanting to produce fewer
> "clean" geocoded/tagged documents rather than many "pretty good" documents?

I'm not aware of many/any open source solutions for "geotagging"
documents. The solutions that exist tend to be proprietary, so far as
I'm aware.

In general, if you're looking for a tool that maps natural language placenames
in documents to lat/lon locations, the 'best' solution I'm aware of is
MetaCarta's GeoTagger (which we can discuss more offlist, if you're
interested). However, I'm not aware of anything that does this kind of
analysis without being statistical in nature, and statistical approaches
are all subject to the same types of flaws in this field, so you just
tend to look for the people who are doing the statistics best. (Given
that I work for MetaCarta, I won't offer an opinion on how much better
we are than everyone else at this. ;))

Is this what you're looking for? Are there open source related solutions
that you've found?

Regards,
--
Christopher Schmidt
Web Developer

Re: The existence (and value of) "clean" geocoding tools?

by George Silva

Hello David,

A good geocoding algorithm will only produce good/great results with a
good street base. That's the first step: get an excellent street base.

There are some open source projects that will geocode for you. I'm
developing one for my BSc thesis based on PostgreSQL. I adapted it from a
previous geocoder that David Bitner wrote. If you want, send me an email
and I will pass you the details.

Later

george


Re: The existence (and value of) "clean" geocoding tools?

by Stephen Woodbridge

Have you looked at these?
http://ofb.net/~egnor/google.html
http://www.pagcgeo.org/


Geocoding is NOT exact; in fact, it deals with a very messy area of
natural language parsing. While it is more constrained than free text,
it still has to deal with all the issues of typos, abbreviations,
punctuation, etc., and then it has to match the user input to some vendor
data.

For example: matching AL 44, Alabama 44, AL-44, Alabama Highway 44,
Highway 44, State Highway 44, Rt 44, and various other abbreviations for
Highway; simple typos; N, N., North, S, S., South, etc. designations added
to the Highway; Alt., Bus., Byp., etc. added as well; and on it goes. You
also need to deal with accented characters, which are sometimes entered
without accents.
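
To make the variants above concrete, here is a minimal sketch of a token-level standardizer in Python. The token table and the function names (TOKEN_MAP, strip_accents, standardize) are made up for illustration and are not taken from any particular geocoder; a real table would be far larger and would need extra rules, for instance to decide that "AL 44" and "Alabama Highway 44" name the same road.

```python
import re
import unicodedata

# Hypothetical token table, for illustration only. Real standardizers carry
# far larger tables and must resolve clashes such as "ST" (Street vs. Saint).
TOKEN_MAP = {
    "al": "ALABAMA",
    "alabama": "ALABAMA",
    "hwy": "HIGHWAY",
    "highway": "HIGHWAY",
    "rt": "HIGHWAY",
    "rte": "HIGHWAY",
    "route": "HIGHWAY",
    "n": "NORTH",
    "north": "NORTH",
    "s": "SOUTH",
    "south": "SOUTH",
    "st": "STREET",
    "street": "STREET",
}


def strip_accents(text):
    """Drop combining marks so that accented and unaccented input compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def standardize(raw):
    """Return an upper-case, punctuation-free form with known tokens expanded."""
    text = strip_accents(raw).lower()
    # Periods, hyphens and commas simply act as token separators.
    tokens = re.findall(r"[a-z]+|\d+", text)
    return " ".join(TOKEN_MAP.get(tok, tok.upper()) for tok in tokens)


if __name__ == "__main__":
    for raw in ("AL-44", "Alabama 44", "Rt. 44 N.", "Cañon St."):
        print(raw, "->", standardize(raw))
```

With this table, "AL-44" and "Alabama 44" collapse to the same string, while "Alabama Highway 44" still does not, which shows how quickly the rule set has to grow.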

In a geocoder, you typically have a standardizer that sorts out all that
craziness. When you load the geocoder, you standardize the vendor
data and store it in a standard form. When you get a geocode request, you
standardize the incoming request and then try to match it against the
vendor data, which is also in standard form. As an alternative
to a standardizer, some geocoders use statistical record-matching techniques.
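
The load-then-match flow can be sketched roughly as follows, in Python. Everything here is an assumption made for illustration: the StreetIndex class, the deliberately crude standardize function, and the sample rows with made-up coordinates. The only point is that vendor data and requests pass through the same standardizer, so the comparison happens between standard forms.

```python
from collections import defaultdict

# Hypothetical abbreviation table; a real standardizer is far more thorough.
ABBREVIATIONS = {"ST": "STREET", "AVE": "AVENUE", "N": "NORTH", "S": "SOUTH"}


def standardize(raw):
    """Crude stand-in for a real standardizer: upper-case, drop punctuation,
    expand a few abbreviations."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in raw.upper())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in cleaned.split())


class StreetIndex:
    """Stores vendor rows keyed by their standardized street name."""

    def __init__(self):
        self._by_key = defaultdict(list)

    def load(self, vendor_rows):
        # Standardize once at load time and keep the original row alongside.
        for name, lon, lat in vendor_rows:
            self._by_key[standardize(name)].append((name, lon, lat))

    def lookup(self, request):
        # Standardize the request the same way, then do an exact key match.
        return list(self._by_key.get(standardize(request), []))


if __name__ == "__main__":
    index = StreetIndex()
    # Made-up sample rows: (street name, longitude, latitude).
    index.load([("Main St", -86.30, 32.37), ("N. Oak Street", -86.31, 32.38)])
    print(index.lookup("main street"))
    print(index.lookup("north oak st."))
    print(index.lookup("Oak St"))  # no exact match in standard form
```

An exact match on the standard form is the cheap first pass; requests that fall through it are where fuzzy matching comes in.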

You can also use techniques like Metaphone/Soundex codes to do fuzzy
searching and then use Levenshtein distance to score the possible
matches by how close they are to the request.
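
As a rough illustration of that two-stage idea, the Python sketch below uses a Soundex code as a coarse filter and Levenshtein distance to rank whatever survives. The Soundex function follows the commonly described American Soundex rules; the fuzzy_candidates helper and the sample street names are hypothetical. Note that Soundex keeps the first letter as-is, so a typo in the first letter will not be caught by this filter.

```python
# Letter-to-digit table for American Soundex.
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(word):
    """Return the 4-character Soundex code of a single word ('' if no letters)."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # a vowel resets this, so a repeated code counts again
    return (code + "000")[:4]


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    previous_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current_row = [i]
        for j, cb in enumerate(b, start=1):
            current_row.append(min(
                previous_row[j] + 1,               # deletion
                current_row[j - 1] + 1,            # insertion
                previous_row[j - 1] + (ca != cb),  # substitution
            ))
        previous_row = current_row
    return previous_row[-1]


def fuzzy_candidates(request, street_names):
    """Keep names sharing the request's Soundex code, closest spelling first."""
    key = soundex(request)
    hits = [name for name in street_names if soundex(name) == key]
    return sorted(hits, key=lambda name: levenshtein(request.upper(), name.upper()))


if __name__ == "__main__":
    print(fuzzy_candidates("Peechtre", ["Pine", "Peachtree", "Beech"]))
```

In a database-backed geocoder the phonetic code would normally be precomputed and indexed at load time rather than recalculated for every candidate as it is here.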

You need to be prepared to handle multiple results for a query. For
example, you search for Oak St. but only find North Oak Street and South
Oak Street.
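
One way to surface that situation instead of silently picking a winner is sketched below in Python. It assumes the names have already been standardized; the resolve function, its status strings, and the sample street set are hypothetical and only illustrate returning every candidate when the request is ambiguous.

```python
DIRECTIONALS = {"NORTH", "SOUTH", "EAST", "WEST"}


def core_name(standardized):
    """Street name with directional tokens removed."""
    return " ".join(tok for tok in standardized.split() if tok not in DIRECTIONALS)


def resolve(request, known_streets):
    """Return (status, candidates) for an already-standardized request.

    status is 'exact', 'unique', 'ambiguous' or 'none'.
    """
    if request in known_streets:
        return "exact", [request]
    wanted = core_name(request)
    candidates = sorted(s for s in known_streets if core_name(s) == wanted)
    if not candidates:
        return "none", []
    if len(candidates) == 1:
        return "unique", candidates
    return "ambiguous", candidates


if __name__ == "__main__":
    streets = {"NORTH OAK STREET", "SOUTH OAK STREET", "MAIN STREET"}
    print(resolve("OAK STREET", streets))
    print(resolve("MAIN STREET", streets))
```

What to do with an "ambiguous" result (ask the user, use a postal code or house number range, or flag the record) is an application decision that the geocoder cannot make on its own.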

And all this can only happen after you have tagged some text in a
document, if you are doing tagging. You mention accuracy is important,
but how do you determine what is "right"? Remember the Oak St. example
above.

Anyway this is a good place to discuss this topic.

-Stephen Woodbridge
  http://imaptools.com/

Re: The existence (and value of) "clean" geocoding tools?

by Andrew Turner

It seems as though the "where is a good geocoding engine" question typically
devolves into either "you need data", or "it's tough, and here's an
explicit explanation why". I'm surprised that there are rarely answers
(or projects) that say, "here's a project, it needs data, but just get
it into this form, and it has these shortcomings but here's how to
configure it".

The 2002 Google Code contest entry is a good start, and so is the
PostGIS-based one. SRC open-sourced a C++ one, but I've heard mixed reviews.
I just started playing with it myself:

http://www.extendthereach.com/products/OSGeocoder.srct

Anyways, seems like there is a severe need for a good, supported
geocoder. It's a major missing piece in the Open-Source Geo stack.

Andrew


On Thu, Sep 25, 2008 at 7:44 AM, Stephen Woodbridge
<woodbri@...> wrote:

> Have you looked at http://ofb.net/~egnor/google.html
> http://www.pagcgeo.org/



--
Andrew Turner
mobile: 248.982.3609
andrew@...
http://highearthorbit.com

http://mapufacture.com           Helping build the Geospatial Web
Introduction to Neogeography - http://oreilly.com/catalog/neogeography

The existence (and value of) "clean" geocoding tools?

by David Dearing

Thanks for the good variety of responses!

Certainly there is no way to perfectly automate geocoding.  That's just
a very hard problem and, as Stephen mentioned, there are so many
possible spelling/formatting differences that it's impossible to plan
for every case.

Although, to steer this away from the "where is a good geocoding engine"
line of thought, I am curious about how these systems are used.  I think we
all agree that current systems do not reach, as I termed it, "clean"
geocoding of a document.  Does this cause a problem for the applications
that consume the geocoded data?  I'm not very familiar with the traditional
uses of these systems.  Does the output need to be manually corrected, or
are the applications happy with "pretty good"?

dave



RE: The existence (and value of) "clean" geocoding tools?

by Woolard, Zachary S.

I've worked on several geocoding projects, and they have all required
manual cleanup of the geocoded addresses and the initial datasets.
Geocoding is used extensively in public safety/EMS applications.  For
their purposes a "pretty good" geocode is not acceptable.  The engine is
important, but not as important as having clean street centerline and
address data to work with.  My two cents.
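
A workflow like that is often organized as a triage step: matches above some confidence cutoff are accepted, and everything else is queued for a person to check. The Python sketch below is only an illustration of that idea; the GeocodeResult type, the 0-to-1 score scale, the 0.95 cutoff and the sample addresses are all assumptions, not values from any project mentioned here.

```python
from dataclasses import dataclass


@dataclass
class GeocodeResult:
    address: str
    score: float  # assumed scale: 0.0 (no match) to 1.0 (exact match)


def triage(results, accept_at=0.95):
    """Split results into (accepted, needs_review) around a score cutoff."""
    accepted = [r for r in results if r.score >= accept_at]
    needs_review = [r for r in results if r.score < accept_at]
    return accepted, needs_review


if __name__ == "__main__":
    # Made-up sample results.
    batch = [
        GeocodeResult("100 MAIN STREET", 1.0),
        GeocodeResult("12 OAK STREET", 0.60),
    ]
    accepted, needs_review = triage(batch)
    print("accepted:", [r.address for r in accepted])
    print("manual review:", [r.address for r in needs_review])
```

For public safety use the cutoff would presumably sit very high, which trades volume for accuracy in exactly the way the original question describes.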


Zach

 

-----Original Message-----
From: discuss-bounces@...
[mailto:discuss-bounces@...] On Behalf Of David Dearing
Sent: Monday, September 29, 2008 6:39 PM
To: discuss@...
Subject: [OSGeo-Discuss] The existence (and value of) "clean" geocoding
tools?

Thanks for the good variety of responses!

Certainly there is no way to perfectly automate geocoding.  That's just
a very hard problem and, as Stephen mentioned, there are so many
possible spelling/formatting differences that it's impossible to plan
for every case.

Although, to steer this away from the "where is a good geocoding engine"
line of thought, I am curious of the use of these systems.  I think we
all agree that current systems do not reach, as I termed it, "clean"
geocoding of a document.  Does this cause a problem for the applications
of the geocoded data?  I'm not very familiar with the traditional uses
of these systems.  Does it need to be manually corrected or are the
applications happy with "pretty good"?

dave

