Hi Jukka,
>
> Tika has already come a long way since the 0.1 release, and I'd like
> to push for the next release, 0.2. Any special wishes for features
> to include?
Yes, you are right, and I am really looking forward to Tika 0.2. I have a
couple of wishes:
TIKA-80 Utility method in MimeUtils to perform full mime resolution using
all available strategies
TIKA-74 Test Resources should be loaded by the class loader (e.g.
getResourceAsStream()).
TIKA-61 Add namespaces to our metadata keys
TIKA-121 MimeType.clean method no longer exists as a capability
TIKA-79 Mime type detection from file header appears to be failing.
TIKA-118 Bouncycastle binaries requires US exports regulation compliance
I have assigned TIKA-80, TIKA-74, TIKA-61, TIKA-121, and TIKA-79 to myself
and will push hard to get them closed out within the next few weeks. I'm not
sure how much I can help with TIKA-118, but we now have the same issue in
Nutch (since Nutch now depends on the official apache-tika-0.1-incubating
release), so I will watch how you guys solve that problem and then follow
suit :)
>
> My goals for the release would be finishing TIKA-115 (making a
> runnable jar instead of using startup scripts), upgrading our parser
> dependencies (especially POI), and closing some of the reported bugs.
+1
>
> It would be nice to get the media type registry and configuration
> changes that I've been working on finished, but that's IMO not a
> requirement before 1.0. A nice extra feature would be some light
> integration with Lucene Java. Also, I've been thinking about
> potentially splitting Tika into component libraries like tika-core,
> tika-parsers, tika-lucene, etc. to better manage external dependencies
> and to make it more attractive for parser libraries to directly
> implement the Parser interface.
I think splitting Tika into separate libraries is a very interesting idea.
I'm happy to help out with the separation, but I don't think it's a
requirement for 0.2.
Also, once we're ready to release, I volunteer to be the release manager if
everyone is +1 for it.
Thanks!
Cheers,
Chris
>
> BR,
>
> Jukka Zitting
______________________________________________
Chris Mattmann, Ph.D.
Chris.Mattmann@...
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory Pasadena, CA
Office: 171-266B Mailstop: 171-246
_______________________________________________________
Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.