Re: Planning Tika 0.2

by Chris Mattmann

Hi Jukka,

>
> Tika has already come a long way since the 0.1 release, and I'd like
> to push for the next release, 0.2. Any special wishes of the features
> to include?

Yes, you're right, and I'm really looking forward to Tika 0.2. I've got a
couple of wishes:

TIKA-80 Utility method in MimeUtils to perform full mime resolution using
all available strategies

TIKA-74 Test Resources should be loaded by the class loader (e.g.
getResourceAsStream()).

TIKA-61 Add namespaces to our metadata keys

TIKA-121 MimeType.clean method no longer exists as a capability

TIKA-79 Mime type detection from file header appears to be failing.

TIKA-118 Bouncycastle binaries requires US exports regulation compliance
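
To illustrate what TIKA-74 is asking for, here is a minimal sketch of loading a
test resource through the class loader instead of a file path. The class name
ResourceLoadingSketch and the helper readResource are hypothetical and are not
part of Tika; only Class.getResourceAsStream() is a real JDK call.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not an existing Tika class.
public class ResourceLoadingSketch {

    /**
     * Reads a classpath resource fully into memory.
     * Returns null when the resource cannot be found.
     */
    public static byte[] readResource(String name) {
        // The name is resolved against the classpath, so the test no longer
        // depends on the working directory it is launched from.
        try (InputStream in = ResourceLoadingSketch.class.getResourceAsStream(name)) {
            if (in == null) {
                return null;
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test would then call something like readResource("/test-documents/sample.doc")
(a made-up path) and fail clearly on a null result, rather than opening a
java.io.File relative to the build directory.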


As for TIKA-80, TIKA-74, TIKA-61, TIKA-121, and TIKA-79, I've assigned them to
myself and will push hard to get them closed out within the next few weeks. I'm
not sure how much I can help with TIKA-118, but we have the same issue now in
Nutch (since Nutch now depends on the official apache-tika-0.1-incubating
release), so I will watch how you guys solve that problem and then follow
suit :)

>
> My goals for the release would be finishing TIKA-115 (making a
> runnable jar instead of using startup scripts), upgrading our parser
> dependencies (especially POI), and closing some of the reported bugs.

+1

>
> It would be nice to get the media type registry and configuration
> changes that I've been working on finished, but that's IMO not a
> requirement before 1.0. A nice extra feature would be some light
> integration with Lucene Java. Also, I've been thinking about
> potentially splitting Tika into component libraries like tika-core,
> tika-parsers, tika-lucene, etc. to better manage external dependencies
> and to make it more attractive for parser libraries to directly
> implement the Parser interface.

I think splitting Tika into separate libraries is a very interesting and cool
idea. I'm happy to help out with the separation, but I don't think it's a
requirement for 0.2.

Also, once we're ready to release, I volunteer to be the release manager if
everyone is +1 for it.

Thanks!

Cheers,
 Chris


>
> BR,
>
> Jukka Zitting

______________________________________________
Chris Mattmann, Ph.D.
Chris.Mattmann@...
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.

