FYI: Java-based email parser, mostly 2822-compliant

Hi Bill, all -

Previously Bill and I discussed (on this list) the understandable limitations of JavaMail's parsing/handling of email addresses.

I decided to take code by Les Hazlewood and improve it to suit our needs. My company is sharing the code (it's Apache-licensed by Mr. Hazlewood). As far as Mr. Hazlewood and I know, this is the only public Java-based code to handle this task. It's very nearly 2822-compliant (all the useful/important stuff is covered). It has extensive methods to verify and/or extract email addresses.

There are some caveats and so forth; you can find all the code, thorough documentation, etc., at:

http://boxbe.com/freebox.html

Bill, something tells me it may not be a perfect match for integration into JavaMail :-), but you mentioned that if someone wanted to write a more thorough/robust parser you'd be open to including it. I can't speak for the original author who issued the license, but if this code (or some derivative of it) would be useful to JavaMail, that'd be great with me. Let me know.

FYI, Bill is right: 2822 syntax is very tricky (some might say "crazy").

Ok, hope it's useful to some,
-Casey

-----
P.S. I tried to post this before by emailing it in, but it didn't seem to appear on the list... ? So now I'm trying to send it via the archives.java.sun.com web form...

===========================================================================
To unsubscribe, send email to listserv@... and include in the body of the message "signoff JAVAMAIL-INTEREST". For general help, send email to listserv@... and include in the body of the message "help".
Re: FYI: Java-based email parser, mostly 2822-compliant

Thanks, Casey! I'll add some pointers to this to our web site when I get a chance, and I'll look into it to see if it's reasonable to consider including it in JavaMail in any way.
Re: FYI: Java-based email parser, mostly 2822-compliant

> Bill - out of curiosity, can you describe some of the cases that don't
> follow 2822 but that "we need to support"?

While I'm not Bill, I'll reply anyway :)

A problem with strict 2822 compliance is that email addresses that are not 2822-compliant are sometimes used in practice. Most servers are more relaxed and allow, for example, the following email address:

local.@...

This is an invalid email address according to 2822 because the local part should have been quoted, but I know for a fact that email addresses like this one are used in practice.

Martijn Brinkers

On Thu, 2008-05-15 at 14:16 -0600, Casey Connor wrote:
> First, a quick update: I realized that there is a function or two that
> should be added to EmailAddress.java (extracting the local part), so I'll be
> whipping that up today, and there should be an update in the next week or so.
>
> > Can one of you explain what this does that JavaMail doesn't? I read the
> > website, but I wasn't clear on what scenarios this would be useful for.
>
> Thanks, I should have done so.
>
> In terms of scenarios, the main one that concerned me, and which I suspect
> may be useful to others, is the reliable and predictable extraction of
> addresses and their related personal names from emails (messages that were
> generated by remote senders, not your own code) or from other sources of
> addresses (e.g. importing an address book). That is, server-side tasks where
> you need to know that the "address" and "personal name" that you get back
> from parsing are accurate, 2822-legal, and ready for your database.
>
> That's the main scenario I know of, though anyone who wants more accurate
> address verification/extraction could maybe find further use for it.
>
> You said you read the website, so maybe the following is redundant, but for
> the general record, here's info about the parsing (Bill can do a much better
> job of this, but just for fun, I'll take a stab):
>
> For most users this class is probably not all that useful. Essentially, the
> issue is that the 2822 grammar is very difficult to parse, so handling all
> the corner cases isn't a reasonable proposition for most implementors of
> email services.
>
> Since JavaMail is not really built for server-side use, where accurately
> validating lots of real-world addresses is a major use case, the parser
> isn't too comprehensive (Bill, correct me here if needed).
>
> Bill said:
>
> > Simply put, the code is not perfect. It's an ad hoc "parser" that works
> > well for most common cases, both those that follow the specs and those
> > that don't but that we need to support. The "strict" flag should probably
> > be called "stricter". :-)
> ...
> > ...one of the shortcuts the code takes is that it
> > doesn't do any checking of the address if the address contains a quote mark.
>
> The example that spurred all this for me was the (note: illegal) address:
>
> "Bob Smith" bob@...
>
> (To be legal, it should be: "Bob Smith" <bob@...>)
>
> JavaMail passes that address. Further, the following (note: illegal) value
> is returned from getAddress() (which normally shouldn't return the personal
> name):
>
> <"Bob Smith" bob@...>
>
> The EmailAddress class will fail that example.
>
> There are also numerous examples of legal addresses that JavaMail won't
> allow, although these are less common (and often you may not want to accept
> them anyway). E.g.:
>
> bob @example.com
> "bob" @ example.com
>
> ...EmailAddress will accept and clean up those (legal) addresses.
>
> EmailAddress also has functions to:
>
> - extract or validate the return path accurately (which has a different spec)
> - extract addresses (with the same accuracy) from header Strings
> - validate whole headers
> - extract the domain part from an address
> - soon: extract the local part from an address
>
> Whether or not any of that is actually useful is up to you, of course.
>
> Bill - out of curiosity, can you describe some of the cases that don't
> follow 2822 but that "we need to support"?
>
> > p.s. Thanks Casey for putting this out there to the community!
>
> Sure, and thanks also to Les Hazlewood for the original code. :-)
>
> -Casey
> Boxbe, Inc. -- http://boxbe.com/freebox.html
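Martijn's `local.@...` case and Casey's examples can be checked against the RFC 2822 `addr-spec` grammar. The following is a minimal sketch, not the EmailAddress.java code and not a JavaMail API: the names `AddrSpecCheck` and `isAddrSpec` are hypothetical, the `example.com` addresses are placeholders, plain spaces and tabs stand in for CFWS, and comments, domain literals, and the obsolete `obs-` forms are not handled. It builds one regular expression from the grammar's `atext`, `dot-atom-text`, and `quoted-string` rules.

```java
import java.util.regex.Pattern;

// Hypothetical helper: checks a string against a simplified RFC 2822 addr-spec.
// Simplifications (assumptions): plain spaces/tabs stand in for CFWS, comments are
// not recognized, and domain literals and obs-* forms are rejected.
final class AddrSpecCheck {
    // atext (RFC 2822 section 3.2.4): letters, digits and these printable specials
    private static final String ATEXT = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]";
    // dot-atom-text = 1*atext *("." 1*atext): no leading, trailing or doubled dots
    private static final String DOT_ATOM = ATEXT + "+(?:\\." + ATEXT + "+)*";
    // quoted-string, simplified: printable ASCII other than '"' and backslash, or a quoted-pair
    private static final String QUOTED =
        "\"(?:[\\x20\\x21\\x23-\\x5B\\x5D-\\x7E]|\\\\[\\x20-\\x7E])*\"";
    private static final String WS = "[ \\t]*";
    private static final Pattern ADDR_SPEC = Pattern.compile(
        WS + "(?:" + DOT_ATOM + "|" + QUOTED + ")" + WS + "@" + WS + DOT_ATOM + WS);

    static boolean isAddrSpec(String s) {
        return ADDR_SPEC.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "local.@example.com",           // local part ends in a dot: not a dot-atom
            "\"local.\"@example.com",       // the same local part, quoted
            "bob @example.com",             // whitespace (CFWS) before '@'
            "\"bob\" @ example.com",        // quoted local part, whitespace around '@'
            "\"Bob Smith\" bob@example.com" // display name without angle brackets
        };
        for (String s : samples) {
            System.out.println(isAddrSpec(s) + "\t" + s);
        }
    }
}
```

With these rules the first and last samples are rejected and the three in between are accepted: `dot-atom-text` cannot end in a dot, so `local.` only becomes legal once quoted, while a display name like `"Bob Smith"` is only legal in the angle-bracket `name-addr` form.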
Re: FYI: Java-based email parser, mostly 2822-compliant

Hi,

I'm also interested in this. I'm curious what people do when they try to parse headers like:

- From: Administrator
- To: <Undisclosed-Recipient:;>

Some other cases I have found (and made my custom code more or less handle) are:

- To: "john doe" (john.doe@...)
- To: john.doe@...; john.smith@...; john.john@...

javier

On Thu, May 15, 2008 at 10:40 PM, Martijn Brinkers <martijn.list@...> wrote:
> [...]
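Javier's semicolon-separated example suggests one lenient approach: split the header value on both commas (the RFC 2822 separator) and semicolons, but only outside quoted strings, angle brackets, and comments, then examine each piece separately. The sketch below is one hypothetical way to do that; `LenientAddressSplitter` is an illustrative name, and treating `;` as a separator is an assumption that conflicts with RFC 2822 group syntax (`display-name ":" [mailbox-list] ";"`), which this sketch does not try to understand.

```java
import java.util.ArrayList;
import java.util.List;

final class LenientAddressSplitter {

    /**
     * Splits a raw To/Cc header value into candidate address strings.
     * Commas are the RFC 2822 separator; semicolons are also treated as
     * separators here because some senders use them (a lenient assumption).
     * Separators inside "quoted strings", <angle brackets> or (comments)
     * are ignored. Group syntax ("name: a, b;") is NOT understood.
     */
    static List<String> split(String header) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        int angleDepth = 0;
        int commentDepth = 0;
        for (int i = 0; i < header.length(); i++) {
            char c = header.charAt(i);
            if (inQuotes) {
                if (c == '\\' && i + 1 < header.length()) {
                    // quoted-pair: keep the backslash and the escaped character
                    current.append(c).append(header.charAt(++i));
                    continue;
                }
                if (c == '"') inQuotes = false;
            } else if (c == '"' && commentDepth == 0) {
                inQuotes = true;
            } else if (c == '(') {
                commentDepth++;
            } else if (c == ')' && commentDepth > 0) {
                commentDepth--;
            } else if (c == '<' && commentDepth == 0) {
                angleDepth++;
            } else if (c == '>' && angleDepth > 0 && commentDepth == 0) {
                angleDepth--;
            } else if ((c == ',' || c == ';') && angleDepth == 0 && commentDepth == 0) {
                addIfNotBlank(parts, current); // separator at top level ends a piece
                current.setLength(0);
                continue;
            }
            current.append(c);
        }
        addIfNotBlank(parts, current);
        return parts;
    }

    private static void addIfNotBlank(List<String> parts, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) parts.add(s);
    }

    public static void main(String[] args) {
        System.out.println(split("john.doe@example.com; john.smith@example.com; john.john@example.com"));
        System.out.println(split("\"Doe; John\" <john.doe@example.com>, jane@example.com"));
        System.out.println(split("<Undisclosed-Recipient:;>"));
    }
}
```

In `main`, the semicolon-separated list yields three pieces, the quoted `"Doe; John"` display name keeps its address intact, and `<Undisclosed-Recipient:;>` comes back as a single piece that the caller would still need to recognize and discard.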
Re: FYI: Java-based email parser, mostly 2822-compliant

Finally getting back to this...

> Bill - out of curiosity, can you describe some of the cases that don't
> follow 2822 but that "we need to support"?

The big one I remember is addresses of the form "joe", i.e., addresses with no domain. At the time JavaMail was created, such addresses were widely used within Sun. Since many people were in the same domain, specifying the domain name explicitly was seen as redundant, so the mail server supported simple names as email addresses in the same domain. That's less common these days, but some of us have mailboxes full of old messages that we'd still like to be able to read.

And again, a major reason that we never felt it was worth the effort to do a "perfect" job of parsing the email address was that JavaMail was created for use in clients, where the server would do the stricter checking. Plus, even if you could determine perfectly that the address had the correct syntax, that wouldn't tell you anything about whether the address was valid. The only way to find out that the address is valid is to send a message to it. Since applications had to handle such failures anyway, it's easy enough to also handle the failure where the server decides the address isn't syntactically correct, even though JavaMail thinks it's ok.

I have to say, the EmailAddress.java code is impressive in its use of regular expressions. A better (more reliable, more efficient) approach would be to use a lexical analyzer and a parser that follows the RFC 2822 BNF grammar. Applying common parser error-correcting techniques would probably allow you to handle many invalid addresses as well.

Still, I'll add a pointer to EmailAddress.java to the JavaMail third-party products page the next time I update it. It will surely be useful to some people.
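Bill's suggestion of a grammar-driven parser in place of regular expressions can be illustrated with a small hand-written recursive-descent parser over a subset of the RFC 2822 `addr-spec` rules, where each method corresponds to one grammar rule. This is a sketch under stated assumptions, not JavaMail's or EmailAddress.java's implementation: it scans characters directly instead of using a separate lexer, skips CFWS and the obsolete forms, and the `allowBareLocalPart` flag is a hypothetical opt-in for domain-less addresses like "joe". Errors are reported with the standard `java.text.ParseException`, whose error offset points at the failing character.

```java
import java.text.ParseException;

/**
 * Hypothetical recursive-descent parser for a subset of RFC 2822 addr-spec:
 *   addr-spec     = local-part "@" domain
 *   local-part    = dot-atom / quoted-string
 *   domain        = dot-atom / domain-literal
 *   dot-atom-text = 1*atext *("." 1*atext)
 * CFWS (whitespace and comments) and the obs-* forms are not handled.
 * allowBareLocalPart is a hypothetical lenient mode for addresses like "joe".
 */
final class AddrSpecParser {
    static final class AddrSpec {
        final String localPart;
        final String domain; // null when a bare local part was accepted
        AddrSpec(String localPart, String domain) { this.localPart = localPart; this.domain = domain; }
        @Override public String toString() { return domain == null ? localPart : localPart + "@" + domain; }
    }

    private final String in;
    private int pos = 0;

    private AddrSpecParser(String in) { this.in = in; }

    // addr-spec: local-part, then "@" and a domain unless the lenient flag allows omitting them
    static AddrSpec parse(String input, boolean allowBareLocalPart) throws ParseException {
        AddrSpecParser p = new AddrSpecParser(input);
        String local = p.peek() == '"' ? p.quotedString() : p.dotAtom();
        String domain = null;
        if (p.peek() == '@') {
            p.pos++;
            domain = p.peek() == '[' ? p.domainLiteral() : p.dotAtom();
        } else if (!allowBareLocalPart) {
            throw new ParseException("expected '@'", p.pos);
        }
        if (p.pos != input.length()) throw new ParseException("unexpected trailing text", p.pos);
        return new AddrSpec(local, domain);
    }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    private static boolean isAtext(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
            || "!#$%&'*+-/=?^_`{|}~".indexOf(c) >= 0;
    }

    // dot-atom-text: one or more atext runs separated by single dots
    private String dotAtom() throws ParseException {
        int start = pos;
        while (true) {
            int atomStart = pos;
            while (isAtext(peek())) pos++;
            if (pos == atomStart) throw new ParseException("expected atext", pos);
            if (peek() != '.') return in.substring(start, pos);
            pos++; // consume the dot; another atext run must follow
        }
    }

    // quoted-string: DQUOTE *(qtext / quoted-pair) DQUOTE; qtext ranges are not checked here
    private String quotedString() throws ParseException {
        int start = pos++; // consume the opening quote
        while (pos < in.length()) {
            char c = in.charAt(pos++);
            if (c == '"') return in.substring(start, pos);
            if (c == '\\') pos++; // quoted-pair: skip the escaped character
        }
        throw new ParseException("unterminated quoted-string", start);
    }

    // domain-literal: "[" *dtext "]"; quoted-pairs inside the brackets are not supported
    private String domainLiteral() throws ParseException {
        int start = pos++; // consume '['
        while (pos < in.length()) {
            char c = in.charAt(pos++);
            if (c == ']') return in.substring(start, pos);
            if (c == '[' || c == '\\') throw new ParseException("invalid dtext", pos - 1);
        }
        throw new ParseException("unterminated domain-literal", start);
    }

    public static void main(String[] args) throws ParseException {
        String[] samples = {"bob@example.com", "local.@example.com", "\"local.\"@example.com",
                            "joe", "bob@[192.0.2.1]"};
        for (String s : samples) {
            try {
                System.out.println(s + " -> " + parse(s, false));
            } catch (ParseException e) {
                System.out.println(s + " -> rejected: " + e.getMessage() + " at " + e.getErrorOffset());
            }
        }
        System.out.println("joe (lenient) -> " + parse("joe", true));
    }
}
```

Because each method mirrors one grammar rule, every rejection points at a specific rule and position (for example, `local.@example.com` fails in `dotAtom` at the `@`), which is also where error-recovery logic for common invalid forms could be hooked in.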