Planning Tika 0.2

by Jukka Zitting

Hi,

Tika has already come a long way since the 0.1 release, and I'd like
to push for the next release, 0.2. Any special wishes for features
to include?

My goals for the release would be finishing TIKA-115 (making a
runnable jar instead of using startup scripts), upgrading our parser
dependencies (especially POI), and closing some of the reported bugs.

It would be nice to get the media type registry and configuration
changes that I've been working on finished, but that's IMO not a
requirement before 1.0. A nice extra feature would be some light
integration with Lucene Java. Also, I've been thinking about
potentially splitting Tika into component libraries like tika-core,
tika-parsers, tika-lucene, etc. to better manage external dependencies
and to make it more attractive for parser libraries to directly
implement the Parser interface.

BR,

Jukka Zitting

Re: Planning Tika 0.2

by robert burrell donkin-2

On Sun, May 25, 2008 at 2:50 PM, Jukka Zitting <jukka.zitting@...> wrote:

> Hi,
>
> Tika has already come a long way since the 0.1 release, and I'd like
> to push for the next release, 0.2. Any special wishes for features
> to include?
>
> My goals for the release would be finishing TIKA-115 (making a
> runnable jar instead of using startup scripts), upgrading our parser
> dependencies (especially POI), and closing some of the reported bugs.
>
> It would be nice to get the media type registry and configuration
> changes that I've been working on finished, but that's IMO not a
> requirement before 1.0. A nice extra feature would be some light
> integration with Lucene Java. Also, I've been thinking about
> potentially splitting Tika into component libraries like tika-core,
> tika-parsers, tika-lucene, etc. to better manage external dependencies
> and to make it more attractive for parser libraries to directly
> implement the Parser interface.

components sound good to me :-)

- robert

Re: Planning Tika 0.2

by Chris Mattmann

Hi Jukka,

>
> Tika has already come a long way since the 0.1 release, and I'd like
> to push for the next release, 0.2. Any special wishes for features
> to include?

Yes, you are right, and I am really looking forward to Tika 0.2. I've got a
couple of wishes:

TIKA-80 Utility method in MimeUtils to perform full mime resolution using
all available strategies

TIKA-74 Test Resources should be loaded by the class loader (e.g.
getResourceAsStream()).

TIKA-61 Add namespaces to our metadata keys

TIKA-121 MimeType.clean method no longer exists as a capability

TIKA-79 Mime type detection from file header appears to be failing.

TIKA-118 Bouncycastle binaries requires US exports regulation compliance


As for TIKA-80, TIKA-74, TIKA-61, TIKA-121, and TIKA-79, I assigned them to
myself and will push hard to get them closed out within the next few weeks.
I'm not sure how much I can help with TIKA-118, but we have the same issue
now in Nutch (since Nutch now depends on the official
apache-tika-0.1-incubating release), so I will watch how you guys solve that
problem and then follow suit :)

>
> My goals for the release would be finishing TIKA-115 (making a
> runnable jar instead of using startup scripts), upgrading our parser
> dependencies (especially POI), and closing some of the reported bugs.

+1

>
> It would be nice to get the media type registry and configuration
> changes that I've been working on finished, but that's IMO not a
> requirement before 1.0. A nice extra feature would be some light
> integration with Lucene Java. Also, I've been thinking about
> potentially splitting Tika into component libraries like tika-core,
> tika-parsers, tika-lucene, etc. to better manage external dependencies
> and to make it more attractive for parser libraries to directly
> implement the Parser interface.

I think separate libraries is a very interesting and cool idea. I'm happy to
help out with the separation, but I don't think it's a req for 0.2.

Also, once we're ready to release, I volunteer to be the release manager if
everyone is +1 for it.

Thanks!

Cheers,
 Chris


>
> BR,
>
> Jukka Zitting

______________________________________________
Chris Mattmann, Ph.D.
Chris.Mattmann@...
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.



Re: Planning Tika 0.2

by Jukka Zitting

Hi,

On Fri, Jun 6, 2008 at 8:45 PM, Chris Mattmann
<chris.mattmann@...> wrote:
> TIKA-118 Bouncycastle binaries requires US exports regulation compliance

Done.

> I think separate libraries is a very interesting and cool idea. I'm happy to
> help out with the separation, but I don't think it's a req for 0.2.

Agreed, we can do that later.

> Also, once we're ready to release, I volunteer to be the release manager if
> everyone is +1 for it.

Excellent, +1 from me.

BR,

Jukka Zitting

Re: Planning Tika 0.2

by Sami Siren-2

Chris Mattmann wrote:
> Hi Jukka,
>
>  
> Also, once we're ready to release, I volunteer to be the release manager if
> everyone is +1 for it.
>  

+1

--
 Sami Siren

Re: Planning Tika 0.2

by Niall Pemberton-2

On Mon, Jun 9, 2008 at 6:27 PM, Sami Siren <ssiren@...> wrote:

> Chris Mattmann wrote:
>>
>> Hi Jukka,
>>
>> Also, once we're ready to release, I volunteer to be the release manager if
>> everyone is +1 for it.
>>
>
> +1

+1 from me, sorry haven't found any time to help with Tika

Niall

Re: Planning Tika 0.2

by Rida Benjelloun

+1
Rida.


2008/6/9 Niall Pemberton <niall.pemberton@...>:

> On Mon, Jun 9, 2008 at 6:27 PM, Sami Siren <ssiren@...> wrote:
> > Chris Mattmann wrote:
> >>
> >> Hi Jukka,
> >>
> >> Also, once we're ready to release, I volunteer to be the release manager if
> >> everyone is +1 for it.
> >>
> >
> > +1
>
> +1 from me, sorry haven't found any time to help with Tika
>
> Niall
>

Re: Planning Tika 0.2

by Keith R. Bennett

A *very* belated +1 from me too.

- Keith

Chris Mattmann wrote:
> Hi Jukka,
>
> Also, once we're ready to release, I volunteer to be the release manager if
> everyone is +1 for it.
>
> Thanks!
>
> Cheers,
>  Chris

Re: Planning Tika 0.2

by Jukka Zitting

Hi,

The following issues were remaining on the 0.2 roadmap:

  TIKA-50  Unit tests are incomplete.
  TIKA-61  Add namespaces to our metadata keys
  TIKA-69  ParseUtils methods need to support Metadata
  TIKA-74  Test Resources should be loaded by the class loader ...
  TIKA-79  Mime type detection from file header appears to be failing
  TIKA-80  Utility method in MimeUtils to perform full mime resolution ...
  TIKA-121 MimeType.clean method no longer exists as a capability

None of them looked terribly urgent or blocking, so I just removed
them from the 0.2 roadmap.

I think the current trunk is good enough to be released.

BR,

Jukka Zitting

Re: Planning Tika 0.2

by Sami Siren-2

Jukka Zitting wrote:
> I think the current trunk is good enough to be released.
>  
+1

--
 Sami Siren


Re: Planning Tika 0.2

by Dave Meikle-3

2008/9/28 Sami Siren <ssiren@...>

> Jukka Zitting wrote:
>
>> I think the current trunk is good enough to be released.
>>
>>
> +1
>
> --
> Sami Siren
>
>
If it mattered coming from me, I would give it a +1, but since it doesn't, I
will just give it a smile :-)