The existence (and value of) "clean" geocoding tools?

Hi. I just recently stumbled across OSGeo and have poked around to try and get a feel for the different projects, but still have a lingering question. Forgive me if this isn't the appropriate channel to be asking this.

It seems that there is a solid focus on mapping, image manipulation, and geometric processing at OSGeo. And, in the broader world including non-open source projects, there are a lot of tools available for the mass production of geotagged or geocoded documents. However, the accuracy of these systems, while good, doesn't seem sufficient when accuracy is at a premium (from what I've seen they tend to focus on volume).

Are there any existing tools that can be used to tag/code documents, perhaps sacrificing the mass-produced aspect for better accuracy? Have I just missed/overlooked some existing tool(s) that meet this description? Or, am I in the minority in wanting to produce fewer "clean" geocoded/tagged documents rather than many "pretty good" documents?

Thanks so much,
dave
_______________________________________________
Discuss mailing list
Discuss@...
http://lists.osgeo.org/mailman/listinfo/discuss

Re: The existence (and value of) "clean" geocoding tools?

On Wed, Sep 24, 2008 at 01:53:34PM -0700, David Dearing wrote:
> Are there any existing tools that can be used to tag/code documents,
> perhaps sacrificing the mass-produced aspect for better accuracy? Have
> I just missed/overlooked some existing tool(s) that meet this
> description? Or, am I in the minority in wanting to produce fewer
> "clean" geocoded/tagged documents rather than many "pretty good" documents?

I'm not aware of many/any open source solutions for "geotagging" documents. The solutions that exist tend to be proprietary, so far as I'm aware.

In general, if you're looking for a tool to tag natural-language placenames in documents with lat/lon locations, the 'best' solution I'm aware of is MetaCarta's GeoTagger (which we can discuss more offlist, if you're interested). However, I'm not aware of anything that does this kind of analysis that isn't statistical in nature, and statistically based approaches are all subject to the same types of flaws in this field, so you just tend to look for the people who are doing statistics the best. (Given that I work for MetaCarta, I won't offer an opinion on how much better we are than everyone else at this. ;))

Is this what you're looking for? Are there open source related solutions that you've found?

Regards,
--
Christopher Schmidt
Web Developer

Re: The existence (and value of) "clean" geocoding tools?

Hello David,

A good geocoding algorithm will only produce good/great results with a good street base. That's the first step: get an excellent street base.

There are some open source projects that will geocode for you. I'm developing one for my BSc thesis based on PostgreSQL. I adapted it from a previous geocoder that David Bitner wrote. If you want, send me an email and I will pass you the details.

Later,
george

Re: The existence (and value of) "clean" geocoding tools?

David Dearing wrote:
> Are there any existing tools that can be used to tag/code documents,
> perhaps sacrificing the mass-produced aspect for better accuracy? Have
> I just missed/overlooked some existing tool(s) that meet this
> description? Or, am I in the minority in wanting to produce fewer
> "clean" geocoded/tagged documents rather than many "pretty good" documents?

Have you looked at
http://ofb.net/~egnor/google.html
http://www.pagcgeo.org/

Geocoding is NOT exact; in fact it deals with a very messy area of natural language parsing. While it is more constrained than free text, it still has to deal with all the issues of typos, abbreviations, punctuation, etc., and then it has to match the user input to some vendor data.

For example: matching AL 44, Alabama 44, AL-44, Alabama Highway 44, Highway 44, State Highway 44, Rt 44, and various other abbreviations for Highway, simple typo errors, adding N, N., North, S, S., South, etc. designations to the Highway, adding Alt., Bus., Byp., etc., and on it goes. You also need to deal with accented characters, which are sometimes entered without accents.

In a geocoder, you typically have a standardizer that sorts out all that craziness. When you load the geocoder, you standardize the vendor data and store it in a standard form. When you get a geocode request, you standardize the incoming request and then try to match its standard form against the vendor data, which is also in standard form. As an alternative to a standardizer, some geocoders use statistical record-matching techniques.

You can also use techniques like metaphone/soundex codes to do fuzzy searching and then use Levenshtein distance to score the possible matches by how close they are to the request.

You need to be prepared to handle multiple results for a query; for example, you search for Oak St. but only find North Oak Street and South Oak Street.

And all this can only happen after you have tagged some text in a document, if you are doing tagging. You mention accuracy is important; well, how do you determine what is "right"? Remember the Oak St example above.

Anyway, this is a good place to discuss this topic.

-Stephen Woodbridge
http://imaptools.com/

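The standardize-then-match flow described in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from any of the geocoders mentioned in the thread: the abbreviation table `TOKEN_MAP`, the function names, and the `max_distance` cutoff are all hypothetical, and the phonetic (metaphone/soundex) step is left out.

```python
import re
import unicodedata

# Hypothetical abbreviation table, for illustration only. A real
# standardizer needs a far larger lexicon and must resolve ambiguous
# tokens (for example "st" as "street" versus "saint").
TOKEN_MAP = {
    "st": "street", "hwy": "highway", "rt": "highway", "rte": "highway",
    "n": "north", "s": "south", "e": "east", "w": "west",
    "al": "alabama",
}

def standardize(text):
    """Lowercase, strip accents and punctuation, expand abbreviations."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = re.findall(r"[a-z0-9]+", plain.lower())
    return " ".join(TOKEN_MAP.get(t, t) for t in tokens)

def levenshtein(a, b):
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def candidates(query, reference, max_distance=6):
    """Score reference names against the query, closest first.

    Both sides are standardized before comparison; names farther than
    max_distance (an arbitrary cutoff) are dropped.
    """
    q = standardize(query)
    scored = [(levenshtein(q, standardize(name)), name) for name in reference]
    return sorted((d, n) for d, n in scored if d <= max_distance)

def containing_matches(query, reference):
    """Return every reference name containing all of the query's tokens."""
    q = set(standardize(query).split())
    return [n for n in reference if q <= set(standardize(n).split())]

streets = ["North Oak Street", "South Oak Street", "Pine Avenue"]
print(standardize("AL-44"), "|", standardize("Alabama 44"))
print(candidates("Oka Stret", ["Pine Avenue", "Oak Street"]))
print(containing_matches("Oak St.", streets))
```

The last call returns both North Oak Street and South Oak Street, which is the ambiguity raised above: the code can list the candidates, but it cannot decide which one is "right". Note also that character-level edit distance on whole strings penalizes a missing directional prefix heavily, which is one reason to standardize first and to consider token-level comparison as well.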
Re: The existence (and value of) "clean" geocoding tools?

It seems as though the "where is a good geocoding engine" question typically devolves into either "you need data", or "it's tough, and here's an explicit explanation why".

I'm surprised that there are rarely answers (or projects) that say, "here's a project, it needs data, but just get it into this form, and it has these shortcomings but here's how to configure it".

The 2002 Google Code contest is a good start, and so is the PostGIS-based one. SRC open-sourced a C++ one, but I've heard mixed reviews. I just started playing with it myself:
http://www.extendthereach.com/products/OSGeocoder.srct

Anyway, it seems like there is a severe need for a good, supported geocoder. It's a major missing piece in the Open-Source Geo stack.

Andrew

On Thu, Sep 25, 2008 at 7:44 AM, Stephen Woodbridge <woodbri@...> wrote:
> Have you looked at http://ofb.net/~egnor/google.html
> http://www.pagcgeo.org/

--
Andrew Turner
mobile: 248.982.3609
andrew@...
http://highearthorbit.com
http://mapufacture.com
Helping build the Geospatial Web
Introduction to Neogeography - http://oreilly.com/catalog/neogeography

The existence (and value of) "clean" geocoding tools?

Thanks for the good variety of responses!

Certainly there is no way to perfectly automate geocoding. That's just a very hard problem and, as Stephen mentioned, there are so many possible spelling/formatting differences that it's impossible to plan for every case.

However, to steer this away from the "where is a good geocoding engine" line of thought, I am curious about how these systems are used. I think we all agree that current systems do not reach, as I termed it, "clean" geocoding of a document. Does this cause a problem for the applications of the geocoded data? I'm not very familiar with the traditional uses of these systems. Does the output need to be manually corrected, or are the applications happy with "pretty good"?

dave

RE: The existence (and value of) "clean" geocoding tools?

I've worked on several geocoding projects, and they have all required manual cleanup of the geocoded addresses and the initial datasets. Geocoding is used extensively in public safety/EMS applications. For their purposes a "pretty good" geocode is not acceptable. The engine is important, but not as important as having clean street centerline and address data to work with.

My two cents.

Zach

-----Original Message-----
From: discuss-bounces@... [mailto:discuss-bounces@...] On Behalf Of David Dearing
Sent: Monday, September 29, 2008 6:39 PM
To: discuss@...
Subject: [OSGeo-Discuss] The existence (and value of) "clean" geocoding tools?

> Does it need to be manually corrected or are the applications happy
> with "pretty good"?
