<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<id>tag:www.nabble.com,2006:forum-20913</id>
	<title>Nabble - Apache Tika - Development</title>
	<updated>2008-10-08T16:16:50Z</updated>
	<link rel="self" type="application/atom+xml" href="http://www.nabble.com/Apache-Tika---Development-f20913.xml" />
	<link rel="alternate" type="text/html" href="http://www.nabble.com/Apache-Tika---Development-f20913.html" />
	<subtitle type="html">&lt;a href=&quot;http://incubator.apache.org/tika/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Apache Tika&lt;/a&gt;&amp;nbsp;is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.</subtitle>
	
<entry>
	<id>tag:www.nabble.com,2006:post-19889445</id>
	<title>Re: Tika report due October 8th</title>
	<published>2008-10-08T16:16:50Z</published>
	<updated>2008-10-08T16:16:50Z</updated>
	<author>
		<name>Jukka Zitting</name>
	</author>
	<content type="html">Hi,
&lt;br&gt;&lt;br&gt;On Thu, Oct 2, 2008 at 11:17 PM, Bertrand Delacretaz
&lt;br&gt;&amp;lt;&lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19889445&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;bdelacretaz@...&lt;/a&gt;&amp;gt; wrote:
&lt;br&gt;&amp;gt; Find a draft below - I'll be offline next week, feel free to finalize and post.
&lt;br&gt;&lt;br&gt;Posted, thanks!
&lt;br&gt;&lt;br&gt;BR,
&lt;br&gt;&lt;br&gt;Jukka Zitting
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/Tika-report-due-October-8th-tp19787768p19889445.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19854708</id>
	<title>[jira] Updated: (TIKA-164) Update nekohtml version</title>
	<published>2008-10-07T02:41:44Z</published>
	<updated>2008-10-07T02:41:44Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Jukka Zitting updated TIKA-164:
&lt;br&gt;-------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Component/s: parser
&lt;br&gt;&amp;nbsp; &amp;nbsp; Affects Version/s: &amp;nbsp; &amp;nbsp; (was: 0.1-incubating)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement &amp;nbsp;(was: Task)
&lt;br&gt;&lt;br&gt;+1
&lt;br&gt;&lt;br&gt;For the record, the nekohtml change history is at &lt;a href=&quot;http://nekohtml.sourceforge.net/changes.html&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://nekohtml.sourceforge.net/changes.html&lt;/a&gt;&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Update nekohtml version
&lt;br&gt;&amp;gt; -----------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-164
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-164&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-164&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Sami Siren
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Latest version currently available is 1.9.9.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;This message is automatically generated by JIRA.
&lt;br&gt;-
&lt;br&gt;You can reply to this email to add a comment to the issue online.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-164%29-Update-nekohtml-version-tp19728113p19854708.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19854654</id>
	<title>[jira] Updated: (TIKA-165) update icu4j</title>
	<published>2008-10-07T02:37:44Z</published>
	<updated>2008-10-07T02:37:44Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Jukka Zitting updated TIKA-165:
&lt;br&gt;-------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Component/s: parser
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement &amp;nbsp;(was: Task)
&lt;br&gt;&lt;br&gt;+1
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; update icu4j
&lt;br&gt;&amp;gt; ------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-165
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-165&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-165&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.1-incubating
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Sami Siren
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Latest version currently available is 3.8. 
&lt;/div&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-165%29-update-icu4j-tp19728170p19854654.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19852633</id>
	<title>[jira] Commented: (TIKA-167) Tika presentation @ ApacheConUs 2008: review</title>
	<published>2008-10-07T00:30:44Z</published>
	<updated>2008-10-07T00:30:44Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; [ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12637401#action_12637401&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12637401#action_12637401&lt;/a&gt;&amp;nbsp;] 
&lt;br&gt;&lt;br&gt;Jerome Charron commented on TIKA-167:
&lt;br&gt;-------------------------------------
&lt;br&gt;&lt;br&gt;Very nice presentation.
&lt;br&gt;Just a detail: Tika doesn't come from my son's name, but from the name my son gives to his &amp;quot;doudou&amp;quot; (I don't know the English word for &amp;quot;doudou&amp;quot;... do you see what I mean?)
&lt;br&gt;&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Tika presentation @ ApacheConUs 2008: review
&lt;br&gt;&amp;gt; --------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-167
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-167&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-167&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Task
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: documentation
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.2-incubating
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Paolo Mottadelli
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: ApacheConUS2008_Tika_PaoloMottadelli.pdf
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; As I have not been involved in the development process, it would be great if someone could review the Tika part of my presentation. I am attaching a rough version of my slides concerning the Tika presentation and listing some *** Open Points ***. Please let me know if I am out of scope in some parts and whether anything could be improved.
&lt;br&gt;&amp;gt; *** Open Points: ***
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* What does TIKA mean? (literally)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* How many registered media types; glob/magic header (slide 7)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* How many supported media types (slide 7)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* How many committers? (slide 7)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Do we have a download/checkout history for Tika? (possibly slide 10)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Future goals; to be completed? &amp;nbsp;(slide 31)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Next parsers to be implemented? (slide 32)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Who uses Tika? projects using Tika &amp;nbsp;(slide 33)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Integration scenarios with other Lucene projects (slide 34)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Related projects: others? (slide 34)
&lt;/div&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-167%29-Tika-presentation-%40-ApacheConUs-2008%3A-review-tp19744250p19852633.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19848470</id>
	<title>[jira] Commented: (TIKA-167) Tika presentation @ ApacheConUs 2008: review</title>
	<published>2008-10-06T16:42:44Z</published>
	<updated>2008-10-06T16:42:44Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; [ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12637302#action_12637302&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=12637302#action_12637302&lt;/a&gt;&amp;nbsp;] 
&lt;br&gt;&lt;br&gt;Jukka Zitting commented on TIKA-167:
&lt;br&gt;------------------------------------
&lt;br&gt;&lt;br&gt;The presentation looks great!
&lt;br&gt;&lt;br&gt;&amp;gt; What does TIKA mean? (literally)
&lt;br&gt;&lt;br&gt;&amp;quot;Tika&amp;quot; comes from the name of the son of Jérôme Charron, who first proposed the project in 2006.
&lt;br&gt;&lt;br&gt;&amp;gt; How many registered media types; glob/magic header (slide 7)
&lt;br&gt;&lt;br&gt;We currently have 77 registered mime types (plus 36 aliases), 179 glob patterns, and 18 magic patterns.
&lt;br&gt;&lt;br&gt;&amp;gt; How many supported media types (slide 7)
&lt;br&gt;&lt;br&gt;Depends on what you mean by &amp;quot;supported&amp;quot;. We currently have 15 parser classes configured for a total of 66 mime types.
&lt;br&gt;&lt;br&gt;&amp;gt; How many committers? (slide 7)
&lt;br&gt;&lt;br&gt;Six, plus Dave Meikle who was just voted in.
&lt;br&gt;&lt;br&gt;&amp;gt; Do we have a download/checkout history for Tika? (possibly slide 10)
&lt;br&gt;&lt;br&gt;Not really. I could try to dig something up if you want, though I expect the numbers to be fairly low still, as we've kept a relatively low profile so far.
&lt;br&gt;&lt;br&gt;&amp;gt; Future goals; to be completed? (slide 31)
&lt;br&gt;&lt;br&gt;Main goals off the top of my head, see the issue tracker for more:
&lt;br&gt;&lt;br&gt;- Improved metadata handling, perhaps with XMP support
&lt;br&gt;- Better configurability of Tika
&lt;br&gt;- Improved media type registry
&lt;br&gt;- More parser implementations
&lt;br&gt;&lt;br&gt;&amp;gt; Next parsers to be implemented? (slide 32)
&lt;br&gt;&lt;br&gt;- Office Open XML based on a POI upgrade
&lt;br&gt;- Structural parsers (i.e. more than just a flat text stream) for PDF, Word, OpenDocument, etc. 
&lt;br&gt;- More multimedia formats: image, audio, video
&lt;br&gt;&lt;br&gt;&amp;gt; Who uses Tika? projects using Tika (slide 33)
&lt;br&gt;&amp;gt; Integration scenarios with other Lucene projects (slide 34)
&lt;br&gt;&lt;br&gt;Not that many, as we're still incubating. Beyond Nutch, we have at least Apache Jackrabbit, which has a sandbox component with Tika support; the Droids lab (to be incubated), which is currently adding Tika integration; and the UIMA project (incubating), which has a proposed patch with Tika support.
&lt;br&gt;&lt;br&gt;&amp;gt; Related projects: others? (slide 34)
&lt;br&gt;&lt;br&gt;Aperture (&lt;a href=&quot;http://aperture.sourceforge.net/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://aperture.sourceforge.net/&lt;/a&gt;) is another project with similar (though wider) goals.
&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Tika presentation @ ApacheConUs 2008: review
&lt;br&gt;&amp;gt; --------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-167
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-167&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-167&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Task
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: documentation
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.2-incubating
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Paolo Mottadelli
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: ApacheConUS2008_Tika_PaoloMottadelli.pdf
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; As I have not been involved in the development process, it would be great if someone could review the Tika part of my presentation. I am attaching a rough version of my slides concerning the Tika presentation and listing some *** Open Points ***. Please let me know if I am out of scope in some parts and whether anything could be improved.
&lt;br&gt;&amp;gt; *** Open Points: ***
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* What does TIKA mean? (literally)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* How many registered media types; glob/magic header (slide 7)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* How many supported media types (slide 7)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* How many committers? (slide 7)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Do we have a download/checkout history for Tika? (possibly slide 10)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Future goals; to be completed? &amp;nbsp;(slide 31)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Next parsers to be implemented? (slide 32)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Who uses Tika? projects using Tika &amp;nbsp;(slide 33)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Integration scenarios with other Lucene projects (slide 34)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Related projects: others? (slide 34)
&lt;/div&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-167%29-Tika-presentation-%40-ApacheConUs-2008%3A-review-tp19744250p19848470.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19787768</id>
	<title>Tika report due October 8th</title>
	<published>2008-10-02T14:17:23Z</published>
	<updated>2008-10-02T14:17:23Z</updated>
	<author>
		<name>Bertrand Delacretaz</name>
	</author>
	<content type="html">Hi,
&lt;br&gt;&lt;br&gt;See &lt;a href=&quot;http://wiki.apache.org/incubator/October2008&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://wiki.apache.org/incubator/October2008&lt;/a&gt;&lt;br&gt;&lt;br&gt;Find a draft below - I'll be offline next week, feel free to finalize and post.
&lt;br&gt;&lt;br&gt;-Bertrand
&lt;br&gt;&lt;br&gt;&lt;br&gt;Apache Tika is a toolkit for detecting and extracting metadata and
&lt;br&gt;structured text content from various documents using existing parser
&lt;br&gt;libraries. Tika entered incubation on March 22nd, 2007.
&lt;br&gt;&lt;br&gt;Community
&lt;br&gt;Dave Meikle was just voted in as a new committer. Paolo Mottadelli
&lt;br&gt;will present Tika at ApacheCon US.
&lt;br&gt;&lt;br&gt;Development
&lt;br&gt;Tika 0.2 should be released soon.
&lt;br&gt;Usage documentation has been added to the website.
&lt;br&gt;&lt;br&gt;Issues before graduation:
&lt;br&gt;The current plan is to graduate as a Lucene subproject, which could
&lt;br&gt;happen soon as the incubation criteria seem to be met.
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/Tika-report-due-October-8th-tp19787768p19787768.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19774006</id>
	<title>Re: New Tika committer</title>
	<published>2008-10-01T23:20:56Z</published>
	<updated>2008-10-01T23:20:56Z</updated>
	<author>
		<name>Bertrand Delacretaz</name>
	</author>
	<content type="html">On Wed, Oct 1, 2008 at 10:50 PM, Jukka Zitting &amp;lt;&lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19774006&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;jukka.zitting@...&lt;/a&gt;&amp;gt; wrote:
&lt;br&gt;&amp;gt; ...Dave, welcome to the team!..
&lt;br&gt;&lt;br&gt;Congrats Dave, and welcome!
&lt;br&gt;-Bertrand
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/New-Tika-committer-tp19768863p19774006.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19768863</id>
	<title>New Tika committer</title>
	<published>2008-10-01T13:50:13Z</published>
	<updated>2008-10-01T13:50:13Z</updated>
	<author>
		<name>Jukka Zitting</name>
	</author>
	<content type="html">Hi,
&lt;br&gt;&lt;br&gt;I'm proud to announce that Dave Meikle will be joining the Tika team
&lt;br&gt;as a committer. Based on Dave's enthusiasm and the high quality of his
&lt;br&gt;contributions the Tika PPMC recently decided to offer him
&lt;br&gt;committership in the project. The Incubator PMC has approved the
&lt;br&gt;decision and Dave has accepted the nomination, so as soon as we get
&lt;br&gt;the administrative bits in order he will be a full member of the Tika
&lt;br&gt;team.
&lt;br&gt;&lt;br&gt;Dave, welcome to the team!
&lt;br&gt;&lt;br&gt;BR,
&lt;br&gt;&lt;br&gt;Jukka Zitting
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/New-Tika-committer-tp19768863p19768863.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19751216</id>
	<title>Re: Apache Tika on the Fast Feather Track</title>
	<published>2008-09-30T14:58:24Z</published>
	<updated>2008-09-30T14:58:24Z</updated>
	<author>
		<name>Grant Ingersoll-6</name>
	</author>
	<content type="html">Would love to have Tika people at the Search BOF, too, if you feel it &amp;nbsp;
&lt;br&gt;is the right fit. &amp;nbsp;It would be good to have someone update the search &amp;nbsp;
&lt;br&gt;community on Tika's progress. &amp;nbsp;See &lt;a href=&quot;http://wiki.apache.org/apachecon/BirdsOfaFeatherUs08&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://wiki.apache.org/apachecon/BirdsOfaFeatherUs08&lt;/a&gt;&lt;br&gt;&lt;br&gt;-Grant
&lt;br&gt;&lt;br&gt;On Sep 28, 2008, at 6:56 AM, Jukka Zitting wrote:
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Hi,
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; On Sun, Sep 28, 2008 at 1:21 AM, Jukka Zitting &amp;lt;&lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19751216&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;jukka.zitting@...&lt;/a&gt;&amp;gt; wrote:
&lt;br&gt;&amp;gt;&amp;gt; ApacheCon US is coming up and there will again be a Fast Feather Track
&lt;br&gt;&amp;gt;&amp;gt; where we could present the latest developments in Tika. We have
&lt;br&gt;&amp;gt;&amp;gt; implemented quite a lot of new stuff since the last ApacheCon, and it
&lt;br&gt;&amp;gt;&amp;gt; would be good to showcase our progress.
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; I just noticed (thanks to Grant's promo message) that Paolo Mottadelli
&lt;br&gt;&amp;gt; is already doing a full session [1] on Tika, so an extra FFT talk
&lt;br&gt;&amp;gt; might not be needed.
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; [1] &lt;a href=&quot;http://us.apachecon.com/c/acus2008/sessions/12&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://us.apachecon.com/c/acus2008/sessions/12&lt;/a&gt;&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; BR,
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Jukka Zitting
&lt;/div&gt;&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/Apache-Tika-on-the-Fast-Feather-Track-tp19707363p19751216.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19744396</id>
	<title>Re: Apache Tika on the Fast Feather Track</title>
	<published>2008-09-30T08:27:01Z</published>
	<updated>2008-09-30T08:27:01Z</updated>
	<author>
		<name>Paolo Mottadelli</name>
	</author>
	<content type="html">Hi,
&lt;br&gt;&lt;br&gt;&amp;gt; I just noticed (thanks to Grant's promo message) that Paolo Mottadelli
&lt;br&gt;&amp;gt; is already doing a full session [1] on Tika, so an extra FFT talk
&lt;br&gt;&amp;gt; might not be needed.
&lt;br&gt;&amp;gt; [1] &lt;a href=&quot;http://us.apachecon.com/c/acus2008/sessions/12&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://us.apachecon.com/c/acus2008/sessions/12&lt;/a&gt;&lt;br&gt;&lt;br&gt;I have been working with Tika since last spring, using Tika for
&lt;br&gt;building the Open XML support in Alfresco ECM. I have used Tika more as a
&lt;br&gt;user than as a Tika developer; still, I am really proud to present Tika
&lt;br&gt;at ApacheConUS.
&lt;br&gt;&lt;br&gt;I am completing a first version of the Office Open XML format Tika
&lt;br&gt;Parser, and I am looking forward to contributing it to the project. At
&lt;br&gt;the moment some *** POI(3.5-beta3)_dependencies -&amp;gt; maven *** related
&lt;br&gt;issues (and no time to solve them) are keeping me waiting.
&lt;br&gt;&lt;br&gt;As I have not been involved in the development process, it would be
&lt;br&gt;great if someone could review the Tika part of my presentation.
&lt;br&gt;I also listed some Open Points that it would be great to 'close'.
&lt;br&gt;&lt;br&gt;I have created a Jira issue for this, attaching that part of my presentation:
&lt;br&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-167&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-167&lt;/a&gt;&lt;br&gt;&lt;br&gt;Thank you very much for your valuable contribution.
&lt;br&gt;&lt;br&gt;Cheers,
&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;Paolo Mottadelli: &lt;a href=&quot;http://www.paolomottadelli.com&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.paolomottadelli.com&lt;/a&gt;&lt;br&gt;Sourcesense - making sense of Open Source: &lt;a href=&quot;http://www.sourcesense.com&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.sourcesense.com&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/Apache-Tika-on-the-Fast-Feather-Track-tp19707363p19744396.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19744249</id>
	<title>[jira] Updated: (TIKA-167) Tika presentation @ ApacheConUs 2008: review</title>
	<published>2008-09-30T08:19:44Z</published>
	<updated>2008-09-30T08:19:44Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Paolo Mottadelli updated TIKA-167:
&lt;br&gt;----------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Attachment: ApacheConUS2008_Tika_PaoloMottadelli.pdf
&lt;br&gt;&lt;br&gt;Attaching a PDF version (limited to the Tika part) of the presentation
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Tika presentation @ ApacheConUs 2008: review
&lt;br&gt;&amp;gt; --------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-167
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-167&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-167&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Task
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: documentation
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.2-incubating
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Paolo Mottadelli
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: ApacheConUS2008_Tika_PaoloMottadelli.pdf
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; As I have not been involved in the development process, it would be great if someone could review the Tika part of my presentation. I am attaching a rough version of my slides concerning the Tika presentation and listing some *** Open Points ***. Please let me know if I am out of scope in some parts and whether anything could be improved.
&lt;br&gt;&amp;gt; *** Open Points: ***
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* What does TIKA mean? (literally)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* How many registered media types; glob/magic header (slide 7)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* How many supported media types (slide 7)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* How many committers? (slide 7)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Do we have a download/checkout history for Tika? (possibly slide 10)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Future goals; to be completed? &amp;nbsp;(slide 31)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Next parsers to be implemented? (slide 32)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Who uses Tika? projects using Tika &amp;nbsp;(slide 33)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Integration scenarios with other Lucene projects (slide 34)
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;* Related projects: others? (slide 34)
&lt;/div&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-167%29-Tika-presentation-%40-ApacheConUs-2008%3A-review-tp19744250p19744249.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19744250</id>
	<title>[jira] Created: (TIKA-167) Tika presentation @ ApacheConUs 2008: review</title>
	<published>2008-09-30T08:19:44Z</published>
	<updated>2008-09-30T08:19:44Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">Tika presentation @ ApacheConUs 2008: review
&lt;br&gt;--------------------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Key: TIKA-167
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-167&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-167&lt;/a&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Project: Tika
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Issue Type: Task
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Components: documentation
&lt;br&gt;&amp;nbsp; &amp;nbsp; Affects Versions: 0.2-incubating
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Reporter: Paolo Mottadelli
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Attachments: ApacheConUS2008_Tika_PaoloMottadelli.pdf
&lt;br&gt;&lt;br&gt;As I have not been involved in the development process, it would be great if someone could review the Tika part of my presentation. I am attaching a rough version of my slides concerning the Tika presentation and listing some *** Open Points ***. Please let me know if I am out of scope in some parts and whether anything could be improved.
&lt;br&gt;&lt;br&gt;*** Open Points: ***
&lt;br&gt;&amp;nbsp; &amp;nbsp;* What does TIKA mean? (literally)
&lt;br&gt;&amp;nbsp; &amp;nbsp;* How many registered media types; glob/magic header (slide 7)
&lt;br&gt;&amp;nbsp; &amp;nbsp;* How many supported media types (slide 7)
&lt;br&gt;&amp;nbsp; &amp;nbsp;* How many committers? (slide 7)
&lt;br&gt;&amp;nbsp; &amp;nbsp;* Do we have a download/checkout history for Tika? (possibly slide 10)
&lt;br&gt;&amp;nbsp; &amp;nbsp;* Future goals; to be completed? &amp;nbsp;(slide 31)
&lt;br&gt;&amp;nbsp; &amp;nbsp;* Next parsers to be implemented? (slide 32)
&lt;br&gt;&amp;nbsp; &amp;nbsp;* Who uses Tika? projects using Tika &amp;nbsp;(slide 33)
&lt;br&gt;&amp;nbsp; &amp;nbsp;* Integration scenarios with other Lucene projects (slide 34)
&lt;br&gt;&amp;nbsp; &amp;nbsp;* Related projects: others? (slide 34)
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;This message is automatically generated by JIRA.
&lt;br&gt;-
&lt;br&gt;You can reply to this email to add a comment to the issue online.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-167%29-Tika-presentation-%40-ApacheConUs-2008%3A-review-tp19744250p19744250.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19729887</id>
	<title>[jira] Updated: (TIKA-166) Update HTMLParser to parse contents of meta tags</title>
	<published>2008-09-29T12:08:44Z</published>
	<updated>2008-09-29T12:08:44Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Dave Meikle updated TIKA-166:
&lt;br&gt;-----------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Attachment: TIKA-166.diff
&lt;br&gt;&lt;br&gt;Patch to implement the functionality. This can be used for further discussion as I am not sure if we want to include the http-equiv info as well.
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Update HTMLParser to parse contents of meta tags
&lt;br&gt;&amp;gt; ------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-166
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-166&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-166&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Improvement
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.1-incubating
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Dave Meikle
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Priority: Minor
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: TIKA-166.diff
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Enhancement of the HTMLParser to return the HTML &amp;lt;meta&amp;gt; tags found in the document in the Metadata object, as raised in the mailing list.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;This message is automatically generated by JIRA.
&lt;br&gt;-
&lt;br&gt;You can reply to this email to add a comment to the issue online.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-166%29-Update-HTMLParser-to-parse-contents-of-meta-tags-tp19729819p19729887.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19729819</id>
	<title>[jira] Created: (TIKA-166) Update HTMLParser to parse contents of meta tags</title>
	<published>2008-09-29T12:04:44Z</published>
	<updated>2008-09-29T12:04:44Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">Update HTMLParser to parse contents of meta tags
&lt;br&gt;------------------------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Key: TIKA-166
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-166&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-166&lt;/a&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Project: Tika
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Issue Type: Improvement
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Components: parser
&lt;br&gt;&amp;nbsp; &amp;nbsp; Affects Versions: 0.1-incubating
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Reporter: Dave Meikle
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Priority: Minor
&lt;br&gt;&lt;br&gt;&lt;br&gt;Enhancement of the HTMLParser to return the HTML &amp;lt;meta&amp;gt; tags found in the document in the Metadata object, as raised in the mailing list.
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;This message is automatically generated by JIRA.
&lt;br&gt;-
&lt;br&gt;You can reply to this email to add a comment to the issue online.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-166%29-Update-HTMLParser-to-parse-contents-of-meta-tags-tp19729819p19729819.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19728170</id>
	<title>[jira] Created: (TIKA-165) update icu4j</title>
	<published>2008-09-29T10:26:44Z</published>
	<updated>2008-09-29T10:26:44Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">update icu4j
&lt;br&gt;------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Key: TIKA-165
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-165&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-165&lt;/a&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Project: Tika
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Issue Type: Task
&lt;br&gt;&amp;nbsp; &amp;nbsp; Affects Versions: 0.1-incubating
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Reporter: Sami Siren
&lt;br&gt;&lt;br&gt;&lt;br&gt;Latest version currently available is 3.8. 
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;This message is automatically generated by JIRA.
&lt;br&gt;-
&lt;br&gt;You can reply to this email to add a comment to the issue online.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-165%29-update-icu4j-tp19728170p19728170.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19728113</id>
	<title>[jira] Created: (TIKA-164) Update nekohtml version</title>
	<published>2008-09-29T10:22:45Z</published>
	<updated>2008-09-29T10:22:45Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">Update nekohtml version
&lt;br&gt;-----------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Key: TIKA-164
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-164&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-164&lt;/a&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Project: Tika
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Issue Type: Task
&lt;br&gt;&amp;nbsp; &amp;nbsp; Affects Versions: 0.1-incubating
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Reporter: Sami Siren
&lt;br&gt;&lt;br&gt;&lt;br&gt;Latest version currently available is 1.9.9.
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;This message is automatically generated by JIRA.
&lt;br&gt;-
&lt;br&gt;You can reply to this email to add a comment to the issue online.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-164%29-Update-nekohtml-version-tp19728113p19728113.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19716297</id>
	<title>Re: Planning Tika 0.2</title>
	<published>2008-09-28T15:43:28Z</published>
	<updated>2008-09-28T15:43:28Z</updated>
	<author>
		<name>Dave Meikle-3</name>
	</author>
	<content type="html">2008/9/28 Sami Siren &amp;lt;&lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19716297&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ssiren@...&lt;/a&gt;&amp;gt;
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Jukka Zitting wrote:
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;&amp;gt; I think the current trunk is good enough to be released.
&lt;br&gt;&amp;gt;&amp;gt;
&lt;br&gt;&amp;gt;&amp;gt;
&lt;br&gt;&amp;gt; +1
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; --
&lt;br&gt;&amp;gt; Sami Siren
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;/div&gt;If it mattered from me I would give it a +1, but since it doesn't I will
&lt;br&gt;just give it a smile :-)
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/Planning-Tika-0.2-tp17458121p19716297.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19713498</id>
	<title>Re: Planning Tika 0.2</title>
	<published>2008-09-28T10:18:19Z</published>
	<updated>2008-09-28T10:18:19Z</updated>
	<author>
		<name>Sami Siren-2</name>
	</author>
	<content type="html">Jukka Zitting wrote:
&lt;br&gt;&amp;gt; I think the current trunk is good enough to be released.
&lt;br&gt;&amp;gt; &amp;nbsp; 
&lt;br&gt;+1
&lt;br&gt;&lt;br&gt;--
&lt;br&gt;&amp;nbsp;Sami Siren
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/Planning-Tika-0.2-tp17458121p19713498.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19710587</id>
	<title>Re: Apache Tika on the Fast Feather Track</title>
	<published>2008-09-28T03:56:08Z</published>
	<updated>2008-09-28T03:56:08Z</updated>
	<author>
		<name>Jukka Zitting</name>
	</author>
	<content type="html">Hi,
&lt;br&gt;&lt;br&gt;On Sun, Sep 28, 2008 at 1:21 AM, Jukka Zitting &amp;lt;&lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19710587&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;jukka.zitting@...&lt;/a&gt;&amp;gt; wrote:
&lt;br&gt;&amp;gt; ApacheCon US is coming up and there will again be a Fast Feather Track
&lt;br&gt;&amp;gt; where we could present the latest developments in Tika. We have
&lt;br&gt;&amp;gt; implemented quite a lot of new stuff since the last ApacheCon, and it
&lt;br&gt;&amp;gt; would be good to showcase our progress.
&lt;br&gt;&lt;br&gt;I just noticed (thanks to Grant's promo message) that Paolo Mottadelli
&lt;br&gt;is already doing a full session [1] on Tika, so an extra FFT talk
&lt;br&gt;might not be needed.
&lt;br&gt;&lt;br&gt;[1] &lt;a href=&quot;http://us.apachecon.com/c/acus2008/sessions/12&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://us.apachecon.com/c/acus2008/sessions/12&lt;/a&gt;&lt;br&gt;&lt;br&gt;BR,
&lt;br&gt;&lt;br&gt;Jukka Zitting
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/Apache-Tika-on-the-Fast-Feather-Track-tp19707363p19710587.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19707363</id>
	<title>Apache Tika on the Fast Feather Track</title>
	<published>2008-09-27T16:21:56Z</published>
	<updated>2008-09-27T16:21:56Z</updated>
	<author>
		<name>Jukka Zitting</name>
	</author>
	<content type="html">Hi,
&lt;br&gt;&lt;br&gt;ApacheCon US is coming up and there will again be a Fast Feather Track
&lt;br&gt;where we could present the latest developments in Tika. We have
&lt;br&gt;implemented quite a lot of new stuff since the last ApacheCon, and it
&lt;br&gt;would be good to showcase our progress.
&lt;br&gt;&lt;br&gt;I'll be attending the ApacheCon and would be interested in presenting
&lt;br&gt;Tika there. Anyone else interested?
&lt;br&gt;&lt;br&gt;BR,
&lt;br&gt;&lt;br&gt;Jukka Zitting
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/Apache-Tika-on-the-Fast-Feather-Track-tp19707363p19707363.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19707123</id>
	<title>Re: Planning Tika 0.2</title>
	<published>2008-09-27T15:46:10Z</published>
	<updated>2008-09-27T15:46:10Z</updated>
	<author>
		<name>Jukka Zitting</name>
	</author>
	<content type="html">Hi,
&lt;br&gt;&lt;br&gt;The following issues were remaining on the 0.2 roadmap:
&lt;br&gt;&lt;br&gt;&amp;nbsp; TIKA-50 &amp;nbsp;Unit tests are incomplete.
&lt;br&gt;&amp;nbsp; TIKA-61 &amp;nbsp;Add namespaces to our metadata keys
&lt;br&gt;&amp;nbsp; TIKA-69 &amp;nbsp;ParseUtils methods need to support Metadata
&lt;br&gt;&amp;nbsp; TIKA-74 &amp;nbsp;Test Resources should be loaded by the class loader ...
&lt;br&gt;&amp;nbsp; TIKA-79 &amp;nbsp;Mime type detection from file header appears to be failing
&lt;br&gt;&amp;nbsp; TIKA-80 &amp;nbsp;Utility method in MimeUtils to perform full mime resolution ...
&lt;br&gt;&amp;nbsp; TIKA-121 MimeType.clean method no longer exists as a capability
&lt;br&gt;&lt;br&gt;None of them looked terribly urgent or blocking, so I just removed
&lt;br&gt;them from the 0.2 roadmap.
&lt;br&gt;&lt;br&gt;I think the current trunk is good enough to be released.
&lt;br&gt;&lt;br&gt;BR,
&lt;br&gt;&lt;br&gt;Jukka Zitting
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/Planning-Tika-0.2-tp17458121p19707123.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19707054</id>
	<title>[jira] Updated: (TIKA-135) The command line files (tika.bat, tika.sh) are not usable</title>
	<published>2008-09-27T15:35:44Z</published>
	<updated>2008-09-27T15:35:44Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Jukka Zitting updated TIKA-135:
&lt;br&gt;-------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Fix Version/s: &amp;nbsp; &amp;nbsp; (was: 0.2-incubating)
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; The command line files (tika.bat, tika.sh) are not usable
&lt;br&gt;&amp;gt; ---------------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-135
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-135&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-135&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Bug
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: cli
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.2-incubating
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Environment: Windows XP; Java 1.5
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Karl Heinz Marbaise
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Chris A. Mattmann
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: batsh-patch1.diff
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; Original Estimate: 0.08h
&lt;br&gt;&amp;gt; &amp;nbsp;Remaining Estimate: 0.08h
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; The bat/sh files do not include the correct path to the jars or to the classpath where all other files can be found.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;This message is automatically generated by JIRA.
&lt;br&gt;-
&lt;br&gt;You can reply to this email to add a comment to the issue online.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-135%29-The-command-line-files-%28tika.bat%2C-tika.sh%29-are-not-usable-tp16328244p19707054.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19694432</id>
	<title>ApacheCon US promo</title>
	<published>2008-09-26T11:53:59Z</published>
	<updated>2008-09-26T11:53:59Z</updated>
	<author>
		<name>Grant Ingersoll-6</name>
	</author>
	<content type="html">Cross-posting...
&lt;br&gt;&lt;br&gt;Just wanted to let everyone know that there will be a number of Lucene/ 
&lt;br&gt;Solr/Mahout/Tika related talks, training sessions, and Birds of a &amp;nbsp;
&lt;br&gt;Feather (BOF) gatherings at ApacheCon New Orleans this fall.
&lt;br&gt;&lt;br&gt;Details:
&lt;br&gt;When: November 3-7
&lt;br&gt;Where: &amp;nbsp;Sheraton, New Orleans, USA
&lt;br&gt;URL: &lt;a href=&quot;http://us.apachecon.com/c/acus2008/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://us.apachecon.com/c/acus2008/&lt;/a&gt;&lt;br&gt;&lt;br&gt;Lucene:
&lt;br&gt;&lt;br&gt;Advanced Indexing Techniques by Michael Busch: &lt;a href=&quot;http://us.apachecon.com/c/acus2008/sessions/7&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://us.apachecon.com/c/acus2008/sessions/7&lt;/a&gt;&lt;br&gt;&lt;br&gt;Lucene Boot Camp (2 day hands-on training by me): &lt;a href=&quot;http://us.apachecon.com/c/acus2008/sessions/69&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://us.apachecon.com/c/acus2008/sessions/69&lt;/a&gt;&lt;br&gt;&lt;br&gt;Solr:
&lt;br&gt;&lt;br&gt;Solr out of the Box by Chris Hostetter: &lt;a href=&quot;http://us.apachecon.com/c/acus2008/sessions/9&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://us.apachecon.com/c/acus2008/sessions/9&lt;/a&gt;&lt;br&gt;&lt;br&gt;Beyond the Box by Hoss: &lt;a href=&quot;http://us.apachecon.com/c/acus2008/sessions/10&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://us.apachecon.com/c/acus2008/sessions/10&lt;/a&gt;&lt;br&gt;&lt;br&gt;Solr Boot Camp (1 day hands-on training by Erik Hatcher): &lt;a href=&quot;http://us.apachecon.com/c/acus2008/sessions/91&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://us.apachecon.com/c/acus2008/sessions/91&lt;/a&gt;&lt;br&gt;&lt;br&gt;Mahout:
&lt;br&gt;&lt;br&gt;Intro to Mahout and Machine Learning (by me): &lt;a href=&quot;http://us.apachecon.com/c/acus2008/sessions/11&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://us.apachecon.com/c/acus2008/sessions/11&lt;/a&gt;&lt;br&gt;&lt;br&gt;Tika:
&lt;br&gt;&lt;br&gt;Content Analysis for ECM with Apache Tika by Paolo Mottadelli : &lt;a href=&quot;http://us.apachecon.com/c/acus2008/sessions/12&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://us.apachecon.com/c/acus2008/sessions/12&lt;/a&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;There's also one more Lucene session that is TBD, but it will be on &amp;nbsp;
&lt;br&gt;that same Wednesday as everything else. &amp;nbsp;Chances are it will be an &amp;nbsp;
&lt;br&gt;intro to Lucene type talk.
&lt;br&gt;&lt;br&gt;&lt;br&gt;BOFs: &amp;nbsp;&lt;a href=&quot;http://wiki.apache.org/apachecon/BirdsOfaFeatherUs08&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://wiki.apache.org/apachecon/BirdsOfaFeatherUs08&lt;/a&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;Cheers,
&lt;br&gt;Grant
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/ApacheCon-US-promo-tp19694432p19694432.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19695055</id>
	<title>ANNOUNCE: Application Period Opens for Travel Assistance to ApacheCon US 2008</title>
	<published>2008-09-26T10:25:03Z</published>
	<updated>2008-09-26T10:25:03Z</updated>
	<author>
		<name>hossman</name>
	</author>
	<content type="html">&lt;br&gt;NOTE: This is a cross posted announcement to all Lucene sub-projects, 
&lt;br&gt;please confine any replies to general@lucene.
&lt;br&gt;&lt;br&gt;-------------
&lt;br&gt;&lt;br&gt;The Travel Assistance Committee is taking in applications for those wanting
&lt;br&gt;to attend ApacheCon US 2008 between the 3rd and 7th November 2008 in New
&lt;br&gt;Orleans.
&lt;br&gt;&lt;br&gt;The Travel Assistance Committee is looking for people who would like to be
&lt;br&gt;able to attend ApacheCon US 2008 who need some financial support in order to
&lt;br&gt;get there. There are VERY few places available and the criteria are strict;
&lt;br&gt;that aside, applications are open to all open source developers who feel that
&lt;br&gt;their attendance would benefit themselves, their project(s), the ASF and
&lt;br&gt;open source in general.
&lt;br&gt;&lt;br&gt;Financial assistance is available for flights, accommodation and entrance
&lt;br&gt;fees either in full or in part, depending on circumstances. It is intended
&lt;br&gt;that all our ApacheCon events are covered, so it may be prudent for those in
&lt;br&gt;Europe and/or Asia to wait until an event closer to them comes up - you are
&lt;br&gt;all welcome to apply for ApacheCon US of course, but there must be
&lt;br&gt;compelling reasons for you to attend an event further away than your home
&lt;br&gt;location for your application to be considered above those closer to the
&lt;br&gt;event location.
&lt;br&gt;&lt;br&gt;More information can be found on the main Apache website at
&lt;br&gt;&lt;a href=&quot;http://www.apache.org/travel/index.html&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.apache.org/travel/index.html&lt;/a&gt;&amp;nbsp;- where you will also find a link to
&lt;br&gt;the application form and details for submitting.
&lt;br&gt;&lt;br&gt;Time is very tight for this event, so applications are open now and will end
&lt;br&gt;on the 2nd October 2008 - to give enough time for travel arrangements to be
&lt;br&gt;made.
&lt;br&gt;&lt;br&gt;Good luck to all those that will apply.
&lt;br&gt;&lt;br&gt;Regards,
&lt;br&gt;&lt;br&gt;The Travel Assistance Committee
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/ANNOUNCE%3A-Application-Period-Opens-for-Travel-Assistance-to-ApacheCon-US-2008-tp19695055p19695055.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19661818</id>
	<title>RE: HTML &lt;meta&gt; tags</title>
	<published>2008-09-24T19:01:10Z</published>
	<updated>2008-09-24T19:01:10Z</updated>
	<author>
		<name>Brian Levay</name>
	</author>
	<content type="html">Dave,
&lt;br&gt;&lt;br&gt;I thought it might have been my configuration at work so I tried again at
&lt;br&gt;home on a clean machine tonight and I get the same error. &amp;nbsp;My files are
&lt;br&gt;identical to your diff. &amp;nbsp;For me it won't enter the startElement() method for
&lt;br&gt;the meta handler.
&lt;br&gt;&lt;br&gt;I attached the three files (HTMLParser, HTMLParserTester, testHTML.html).
&lt;br&gt;&lt;br&gt;I'm sure this is something simple. &amp;nbsp;Maybe some kind of configuration
&lt;br&gt;difference? &amp;nbsp;I've tried Java 5 and 6 with no change. &amp;nbsp;Do you have a zip of
&lt;br&gt;all your dependent tika .jar files I can try to use? &amp;nbsp;That is my only guess
&lt;br&gt;now.
&lt;br&gt;&lt;br&gt;--Brian
&lt;br&gt;&lt;br&gt;-----Original Message-----
&lt;br&gt;From: Dave Meikle [mailto:&lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19661818&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;loompa@...&lt;/a&gt;] 
&lt;br&gt;Sent: Wednesday, September 24, 2008 5:04 PM
&lt;br&gt;To: &lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19661818&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;tika-dev@...&lt;/a&gt;
&lt;br&gt;Subject: Re: HTML &amp;lt;meta&amp;gt; tags
&lt;br&gt;&lt;br&gt;Hi
&lt;br&gt;&lt;br&gt;2008/9/24 Brian Levay &amp;lt;&lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19661818&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;brian.levay@...&lt;/a&gt;&amp;gt;
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; I'll submit the updates when I'm done (along with unit tests). &amp;nbsp;I'm having
&lt;br&gt;&amp;gt; a
&lt;br&gt;&amp;gt; problem though. &amp;nbsp;I sync'ed my tika baseline this morning and the Matcher
&lt;br&gt;&amp;gt; stopped matching the &amp;lt;meta&amp;gt; tags. &amp;nbsp;Any idea what may be causing this? &amp;nbsp;I've
&lt;br&gt;&amp;gt; tried many variations of the xpath expressions to match the &amp;lt;meta&amp;gt; tags.
&lt;br&gt;&amp;gt; Right now my code in HTMLParser looks like this:
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Matcher body = xpath.parse(&amp;quot;/HTML/BODY//node()&amp;quot;);
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Matcher title = xpath.parse(&amp;quot;/HTML/HEAD/TITLE//node()&amp;quot;);
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Matcher meta = xpath.parse(&amp;quot;/HTML/HEAD/META//node()&amp;quot;);
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;handler = new TeeContentHandler(
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;new MatchingContentHandler(getBodyHandler(xhtml), body),
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;new MatchingContentHandler(getTitleHandler(metadata),
&lt;br&gt;&amp;gt; title),
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;new MatchingContentHandler(getMetaHandler(metadata),
&lt;/div&gt;meta));
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; The &amp;lt;meta&amp;gt; handler isn't being called. &amp;nbsp;If I use /HTML/HEAD//node() the
&lt;br&gt;&amp;gt; handler will get called for the &amp;lt;head&amp;gt; and &amp;lt;title&amp;gt; tags but it will skip
&lt;br&gt;&amp;gt; right past the &amp;lt;meta&amp;gt; tags. &amp;nbsp;I know the tika code is seeing the META tags
&lt;br&gt;&amp;gt; because I see the tags trying to be matched in the startElement method of
&lt;br&gt;&amp;gt; MatchingContentHandler. &amp;nbsp;Any ideas?
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; --Brian
&lt;br&gt;&amp;gt;
&lt;/div&gt;&lt;br&gt;I am using effectively the same thing in a local copy and have just re-based
&lt;br&gt;it against HEAD (shown in the diff below), and it appears to be working fine
&lt;br&gt;for me.
&lt;br&gt;&lt;br&gt;What is your test XML like?
&lt;br&gt;&lt;br&gt;Cheers,
&lt;br&gt;Dave
&lt;br&gt;&lt;br&gt;&lt;br&gt;Index: src/main/java/org/apache/tika/parser/html/HtmlParser.java
&lt;br&gt;===================================================================
&lt;br&gt;--- src/main/java/org/apache/tika/parser/html/HtmlParser.java &amp;nbsp; &amp;nbsp;(revision
&lt;br&gt;698705)
&lt;br&gt;+++ src/main/java/org/apache/tika/parser/html/HtmlParser.java &amp;nbsp; &amp;nbsp;(working
&lt;br&gt;copy)
&lt;br&gt;@@ -95,9 +95,11 @@
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;XPathParser xpath = new XPathParser(null, &amp;quot;&amp;quot;);
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Matcher body = xpath.parse(&amp;quot;/HTML/BODY//node()&amp;quot;);
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Matcher title = xpath.parse(&amp;quot;/HTML/HEAD/TITLE//node()&amp;quot;);
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Matcher meta = xpath.parse(&amp;quot;/HTML/HEAD/META//node()&amp;quot;);
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;handler = new TeeContentHandler(
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;new MatchingContentHandler(getBodyHandler(xhtml), body),
&lt;br&gt;- &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;new MatchingContentHandler(getTitleHandler(metadata),
&lt;br&gt;title));
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;new MatchingContentHandler(getTitleHandler(metadata),
&lt;br&gt;title),
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;new MatchingContentHandler(getMetaHandler(metadata),
&lt;br&gt;meta));
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;// Parse the HTML document
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;xhtml.startDocument();
&lt;br&gt;@@ -116,6 +118,17 @@
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;};
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;}
&lt;br&gt;&lt;br&gt;+ &amp;nbsp; &amp;nbsp;private ContentHandler getMetaHandler(final Metadata metadata) {
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;return new WriteOutContentHandler() {
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;@Override
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;public void startElement(
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;String uri, String local, String name, Attributes atts)
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;throws SAXException {
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;metadata.set(atts.getValue(0), atts.getValue(1));
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;}
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;};
&lt;br&gt;+ &amp;nbsp; &amp;nbsp;}
&lt;br&gt;+
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;private ContentHandler getBodyHandler(final XHTMLContentHandler xhtml)
&lt;br&gt;{
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;return new TextContentHandler(xhtml) {
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/HTML-%3Cmeta%3E-tags-tp19576308p19661818.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19658201</id>
	<title>Re: HTML &lt;meta&gt; tags</title>
	<published>2008-09-24T14:14:35Z</published>
	<updated>2008-09-24T14:14:35Z</updated>
	<author>
		<name>Dave Meikle-3</name>
	</author>
	<content type="html">Hi Jukka,
&lt;br&gt;&lt;br&gt;2008/9/24 Jukka Zitting &amp;lt;&lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19658201&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;jukka.zitting@...&lt;/a&gt;&amp;gt;
&lt;br&gt;&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; The XHTMLDowngradeHandler wrapper will uppercase all element names and
&lt;br&gt;&amp;gt; drop all namespaces and namespaced attributes, but as far as I can
&lt;br&gt;&amp;gt; tell your code should still match the META tags. But there might be
&lt;br&gt;&amp;gt; some bug in the XHTMLDowngradeHandler code that breaks things.
&lt;br&gt;&amp;gt;
&lt;br&gt;&lt;br&gt;This is working fine for me with the XHTMLDowngradeHandler in the code. I
&lt;br&gt;just sync'd with SVN before I posted.
&lt;br&gt;&lt;br&gt;Cheers,
&lt;br&gt;Dave
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/HTML-%3Cmeta%3E-tags-tp19576308p19658201.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19658043</id>
	<title>Re: HTML &lt;meta&gt; tags</title>
	<published>2008-09-24T14:04:13Z</published>
	<updated>2008-09-24T14:04:13Z</updated>
	<author>
		<name>Jukka Zitting</name>
	</author>
	<content type="html">Hi,
&lt;br&gt;&lt;br&gt;On Wed, Sep 24, 2008 at 10:17 PM, Brian Levay &amp;lt;&lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19658043&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;brian.levay@...&lt;/a&gt;&amp;gt; wrote:
&lt;br&gt;&amp;gt; I'll submit the updates when I'm done (along with unit tests). &amp;nbsp;I'm having a
&lt;br&gt;&amp;gt; problem though. &amp;nbsp;I sync'ed my tika baseline this morning and the Matcher
&lt;br&gt;&amp;gt; stopped matching the &amp;lt;meta&amp;gt; tags. &amp;nbsp;Any idea what may be causing this?
&lt;br&gt;&lt;br&gt;Most likely the TIKA-140 fix that I committed recently. You may want
&lt;br&gt;to try reverting this change:
&lt;br&gt;&lt;br&gt;--- incubator/tika/trunk/src/main/java/org/apache/tika/parser/html/HtmlParser.java	2008/09/22 22:57:10	698027
&lt;br&gt;+++ incubator/tika/trunk/src/main/java/org/apache/tika/parser/html/HtmlParser.java	2008/09/22 23:00:27	698028
&lt;br&gt;@@ -102,7 +102,7 @@
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;// Parse the HTML document
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;xhtml.startDocument();
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;SAXParser parser = new SAXParser();
&lt;br&gt;- &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;parser.setContentHandler(handler);
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;parser.setContentHandler(new XHTMLDowngradeHandler(handler));
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;parser.parse(new InputSource(Utils.getUTF8Reader(stream, metadata)));
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;xhtml.endDocument();
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;}
&lt;br&gt;&lt;br&gt;&amp;gt; The &amp;lt;meta&amp;gt; handler isn't being called. &amp;nbsp;If I use /HTML/HEAD//node() the
&lt;br&gt;&amp;gt; handler will get called for the &amp;lt;head&amp;gt; and &amp;lt;title&amp;gt; tags but it will skip
&lt;br&gt;&amp;gt; right past the &amp;lt;meta&amp;gt; tags. &amp;nbsp;I know the tika code is seeing the META tags
&lt;br&gt;&amp;gt; because I see the tags trying to be matched in the startElement method of
&lt;br&gt;&amp;gt; MatchingContentHandler. &amp;nbsp;Any ideas?
&lt;br&gt;&lt;br&gt;The XHTMLDowngradeHandler wrapper will uppercase all element names and
&lt;br&gt;drop all namespaces and namespaced attributes, but as far as I can
&lt;br&gt;tell your code should still match the META tags. But there might be
&lt;br&gt;some bug in the XHTMLDowngradeHandler code that breaks things.
&lt;br&gt;&lt;br&gt;BR,
&lt;br&gt;&lt;br&gt;Jukka Zitting
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/HTML-%3Cmeta%3E-tags-tp19576308p19658043.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19657997</id>
	<title>Re: HTML &lt;meta&gt; tags</title>
	<published>2008-09-24T14:03:30Z</published>
	<updated>2008-09-24T14:03:30Z</updated>
	<author>
		<name>Dave Meikle-3</name>
	</author>
	<content type="html">Hi
&lt;br&gt;&lt;br&gt;2008/9/24 Brian Levay &amp;lt;&lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19657997&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;brian.levay@...&lt;/a&gt;&amp;gt;
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; I'll submit the updates when I'm done (along with unit tests). &amp;nbsp;I'm having a
&lt;br&gt;&amp;gt; problem though. &amp;nbsp;I sync'ed my tika baseline this morning and the Matcher
&lt;br&gt;&amp;gt; stopped matching the &amp;lt;meta&amp;gt; tags. &amp;nbsp;Any idea what may be causing this? &amp;nbsp;I've
&lt;br&gt;&amp;gt; tried many variations of the xpath expressions to match the &amp;lt;meta&amp;gt; tags.
&lt;br&gt;&amp;gt; Right now my code in HTMLParser looks like this:
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Matcher body = xpath.parse(&amp;quot;/HTML/BODY//node()&amp;quot;);
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Matcher title = xpath.parse(&amp;quot;/HTML/HEAD/TITLE//node()&amp;quot;);
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Matcher meta = xpath.parse(&amp;quot;/HTML/HEAD/META//node()&amp;quot;);
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;handler = new TeeContentHandler(
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;new MatchingContentHandler(getBodyHandler(xhtml), body),
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;new MatchingContentHandler(getTitleHandler(metadata), title),
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;new MatchingContentHandler(getMetaHandler(metadata), meta));
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; The &amp;lt;meta&amp;gt; handler isn't being called. &amp;nbsp;If I use /HTML/HEAD//node() the
&lt;br&gt;&amp;gt; handler will get called for the &amp;lt;head&amp;gt; and &amp;lt;title&amp;gt; tags but it will skip
&lt;br&gt;&amp;gt; right past the &amp;lt;meta&amp;gt; tags. &amp;nbsp;I know the tika code is seeing the META tags
&lt;br&gt;&amp;gt; because I see the tags trying to be matched in the startElement method of
&lt;br&gt;&amp;gt; MatchingContentHandler. &amp;nbsp;Any ideas?
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; --Brian
&lt;br&gt;&amp;gt;
&lt;/div&gt;&lt;br&gt;I am using effectively the same thing in a local copy and have just re-based
&lt;br&gt;it against HEAD (shown in the diff below), and it appears to be working fine
&lt;br&gt;for me.
&lt;br&gt;&lt;br&gt;What is your test XML like?
&lt;br&gt;&lt;br&gt;Cheers,
&lt;br&gt;Dave
&lt;br&gt;&lt;br&gt;&lt;br&gt;Index: src/main/java/org/apache/tika/parser/html/HtmlParser.java
&lt;br&gt;===================================================================
&lt;br&gt;--- src/main/java/org/apache/tika/parser/html/HtmlParser.java &amp;nbsp; &amp;nbsp;(revision 698705)
&lt;br&gt;+++ src/main/java/org/apache/tika/parser/html/HtmlParser.java &amp;nbsp; &amp;nbsp;(working copy)
&lt;br&gt;@@ -95,9 +95,11 @@
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;XPathParser xpath = new XPathParser(null, &amp;quot;&amp;quot;);
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Matcher body = xpath.parse(&amp;quot;/HTML/BODY//node()&amp;quot;);
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Matcher title = xpath.parse(&amp;quot;/HTML/HEAD/TITLE//node()&amp;quot;);
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Matcher meta = xpath.parse(&amp;quot;/HTML/HEAD/META//node()&amp;quot;);
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;handler = new TeeContentHandler(
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;new MatchingContentHandler(getBodyHandler(xhtml), body),
&lt;br&gt;- &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;new MatchingContentHandler(getTitleHandler(metadata), title));
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;new MatchingContentHandler(getTitleHandler(metadata), title),
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;new MatchingContentHandler(getMetaHandler(metadata), meta));
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;// Parse the HTML document
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;xhtml.startDocument();
&lt;br&gt;@@ -116,6 +118,17 @@
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;};
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;}
&lt;br&gt;&lt;br&gt;+ &amp;nbsp; &amp;nbsp;private ContentHandler getMetaHandler(final Metadata metadata) {
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;return new WriteOutContentHandler() {
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;@Override
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;public void startElement(
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;String uri, String local, String name, Attributes atts)
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;throws SAXException {
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;metadata.set(atts.getValue(0), atts.getValue(1));
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;}
&lt;br&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;};
&lt;br&gt;+ &amp;nbsp; &amp;nbsp;}
&lt;br&gt;+
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;private ContentHandler getBodyHandler(final XHTMLContentHandler xhtml) {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;return new TextContentHandler(xhtml) {
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/HTML-%3Cmeta%3E-tags-tp19576308p19657997.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19657051</id>
	<title>Re: HTML &lt;meta&gt; tags</title>
	<published>2008-09-24T13:17:25Z</published>
	<updated>2008-09-24T13:17:25Z</updated>
	<author>
		<name>Brian Levay</name>
	</author>
	<content type="html">I'll submit the updates when I'm done (along with unit tests). &amp;nbsp;I'm having a
&lt;br&gt;problem though. &amp;nbsp;I sync'ed my tika baseline this morning and the Matcher
&lt;br&gt;stopped matching the &amp;lt;meta&amp;gt; tags. &amp;nbsp;Any idea what may be causing this? &amp;nbsp;I've
&lt;br&gt;tried many variations of the xpath expressions to match the &amp;lt;meta&amp;gt; tags.
&lt;br&gt;Right now my code in HTMLParser looks like this:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Matcher body = xpath.parse(&amp;quot;/HTML/BODY//node()&amp;quot;);
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Matcher title = xpath.parse(&amp;quot;/HTML/HEAD/TITLE//node()&amp;quot;);
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Matcher meta = xpath.parse(&amp;quot;/HTML/HEAD/META//node()&amp;quot;);
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; handler = new TeeContentHandler(
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; new MatchingContentHandler(getBodyHandler(xhtml), body),
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; new MatchingContentHandler(getTitleHandler(metadata), title),
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; new MatchingContentHandler(getMetaHandler(metadata), meta));
&lt;br&gt;&lt;br&gt;The &amp;lt;meta&amp;gt; handler isn't being called. &amp;nbsp;If I use /HTML/HEAD//node() the
&lt;br&gt;handler will get called for the &amp;lt;head&amp;gt; and &amp;lt;title&amp;gt; tags but it will skip
&lt;br&gt;right past the &amp;lt;meta&amp;gt; tags. &amp;nbsp;I know the tika code is seeing the META tags
&lt;br&gt;because I see the tags trying to be matched in the startElement method of
&lt;br&gt;MatchingContentHandler. &amp;nbsp;Any ideas?
&lt;br&gt;&lt;br&gt;--Brian
&lt;br&gt;&lt;br&gt;On Tue, Sep 23, 2008 at 6:04 PM, Thorsten Scherler &amp;lt;&lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19657051&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;thorsten@...&lt;/a&gt;&amp;gt; wrote:
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; On Sat, 2008-09-20 at 22:41 +0200, Jukka Zitting wrote:
&lt;br&gt;&amp;gt; &amp;gt; Hi,
&lt;br&gt;&amp;gt; &amp;gt;
&lt;br&gt;&amp;gt; &amp;gt; On Fri, Sep 19, 2008 at 7:16 PM, Brian Levay &amp;lt;&lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19657051&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;brian.levay@...&lt;/a&gt;&amp;gt; wrote:
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt; I need to enhance the functionality of HTMLParser to return the HTML &amp;lt;meta&amp;gt;
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt; tags found in the document in the Metadata object. &amp;nbsp;Is overriding HTMLParser
&lt;br&gt;&amp;gt; &amp;gt; &amp;gt; (or installing a custom HTMLParser) the best way to do this?
&lt;br&gt;&amp;gt; &amp;gt;
&lt;br&gt;&amp;gt; &amp;gt; We would be happy to receive a patch that adds this feature directly
&lt;br&gt;&amp;gt; &amp;gt; in Tika. :-)
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; +1
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; salu2
&lt;br&gt;&amp;gt; &amp;gt;
&lt;br&gt;&amp;gt; &amp;gt; BR,
&lt;br&gt;&amp;gt; &amp;gt;
&lt;br&gt;&amp;gt; &amp;gt; Jukka Zitting
&lt;br&gt;&amp;gt; --
&lt;br&gt;&amp;gt; Thorsten Scherler &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; thorsten.at.apache.org
&lt;br&gt;&amp;gt; Open Source Java &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;consulting, training and solutions
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&lt;/div&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/HTML-%3Cmeta%3E-tags-tp19576308p19657051.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19638139</id>
	<title>Re: HTML &lt;meta&gt; tags</title>
	<published>2008-09-23T15:04:05Z</published>
	<updated>2008-09-23T15:04:05Z</updated>
	<author>
		<name>Thorsten Scherler</name>
	</author>
	<content type="html">On Sat, 2008-09-20 at 22:41 +0200, Jukka Zitting wrote:
&lt;br&gt;&amp;gt; Hi,
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; On Fri, Sep 19, 2008 at 7:16 PM, Brian Levay &amp;lt;&lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19638139&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;brian.levay@...&lt;/a&gt;&amp;gt; wrote:
&lt;br&gt;&amp;gt; &amp;gt; I need to enhance the functionality of HTMLParser to return the HTML &amp;lt;meta&amp;gt;
&lt;br&gt;&amp;gt; &amp;gt; tags found in the document in the Metadata object. &amp;nbsp;Is overriding HTMLParser
&lt;br&gt;&amp;gt; &amp;gt; (or installing a custom HTMLParser) the best way to do this?
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; We would be happy to receive a patch that adds this feature directly
&lt;br&gt;&amp;gt; in Tika. :-)
&lt;br&gt;&lt;br&gt;+1
&lt;br&gt;&lt;br&gt;salu2
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; BR,
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Jukka Zitting
&lt;br&gt;-- 
&lt;br&gt;Thorsten Scherler &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; thorsten.at.apache.org
&lt;br&gt;Open Source Java &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;consulting, training and solutions
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/HTML-%3Cmeta%3E-tags-tp19576308p19638139.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19618405</id>
	<title>[jira] Resolved: (TIKA-163) GUI does not support drag and drop in Gnome or KDE</title>
	<published>2008-09-22T16:11:44Z</published>
	<updated>2008-09-22T16:11:44Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Jukka Zitting resolved TIKA-163.
&lt;br&gt;--------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Resolution: Fixed
&lt;br&gt;&amp;nbsp; &amp;nbsp; Fix Version/s: 0.2-incubating
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Jukka Zitting
&lt;br&gt;&lt;br&gt;Patch applied in revision 698032. Thanks!
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; GUI does not support drag and drop in Gnome or KDE
&lt;br&gt;&amp;gt; --------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-163
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-163&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-163&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Bug
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: gui
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.2-incubating
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Environment: Ubuntu 8.04.1, Gnome 2.22.3
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Dave Meikle
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.2-incubating
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: TIKA-163.diff
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Because Gnome and KDE do not represent transferred file lists as DataFlavor.javaFileListFlavor, the GUI does not respond to drag-and-drop events, rendering it useless in these desktop environments. Since they represent file lists as a list of URIs in a String, a workaround is to convert that String to a list of files.
&lt;br&gt;&amp;gt; A patch will follow to implement this.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;This message is automatically generated by JIRA.
&lt;br&gt;-
&lt;br&gt;You can reply to this email to add a comment to the issue online.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-163%29-GUI-does-not-support-drag-and-drop-in-Gnome-or-KDE-tp19615494p19618405.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19618327</id>
	<title>[jira] Resolved: (TIKA-140) HTML parser unable to extract text</title>
	<published>2008-09-22T16:03:44Z</published>
	<updated>2008-09-22T16:03:44Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Jukka Zitting resolved TIKA-140.
&lt;br&gt;--------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Resolution: Fixed
&lt;br&gt;&lt;br&gt;Resolved in a somewhat different manner in revision 698028.
&lt;br&gt;&lt;br&gt;Instead of adding the special &amp;quot;*&amp;quot; wildcard to the XPath matcher, I created a new XHTMLDowngradeHandler decorator class that makes sure that all incoming (X)HTML is uniformly structured.
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; HTML parser unable to extract text 
&lt;br&gt;&amp;gt; -----------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-140
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-140&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-140&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Bug
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: parser
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.2-incubating
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: julien nioche
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Assignee: Jukka Zitting
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Fix For: 0.2-incubating
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: 1.html, anynamespace.diff
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; At revision 648732
&lt;br&gt;&amp;gt; The attached file is not parsed properly by the current HTML parser, which returns an empty string when ParseUtils.getStringContent() is called. Saving the same document as .txt from Firefox does produce text.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;This message is automatically generated by JIRA.
&lt;br&gt;-
&lt;br&gt;You can reply to this email to add a comment to the issue online.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-140%29-HTML-parser-unable-to-extract-text-tp16727862p19618327.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19616893</id>
	<title>Re: New UIMA annotator based on Tika</title>
	<published>2008-09-22T14:25:49Z</published>
	<updated>2008-09-22T14:25:49Z</updated>
	<author>
		<name>Jukka Zitting</name>
	</author>
	<content type="html">Hi,
&lt;br&gt;&lt;br&gt;On Mon, Sep 22, 2008 at 10:58 AM, Julien Nioche
&lt;br&gt;&amp;lt;&lt;a href=&quot;http://www.nabble.com/user/SendEmail.jtp?type=post&amp;post=19616893&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;lists.digitalpebble@...&lt;/a&gt;&amp;gt; wrote:
&lt;br&gt;&amp;gt; Just to let you know that we've just donated a UIMA component based on Tika
&lt;br&gt;&amp;gt; which is used to convert markup into UIMA annotations, extract the text and
&lt;br&gt;&amp;gt; metadata etc...
&lt;br&gt;&lt;br&gt;Cool, thanks for sharing!
&lt;br&gt;&lt;br&gt;&amp;gt; More details on &lt;a href=&quot;https://issues.apache.org/jira/browse/UIMA-1095&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/UIMA-1095&lt;/a&gt;&lt;br&gt;&lt;br&gt;I took a quick look at the code and noticed that you apparently need
&lt;br&gt;to sanitize (clean out control characters, normalize spaces) some of
&lt;br&gt;the parsed text output from Word documents. I guess that's something
&lt;br&gt;that we could and should do already in Tika itself.
&lt;br&gt;&lt;br&gt;BR,
&lt;br&gt;&lt;br&gt;Jukka Zitting
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/New-UIMA-annotator-based-on-Tika-tp19604122p19616893.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19615577</id>
	<title>[jira] Updated: (TIKA-163) GUI does not support drag and drop in Gnome or KDE</title>
	<published>2008-09-22T13:07:44Z</published>
	<updated>2008-09-22T13:07:44Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;[ &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&lt;/a&gt;&amp;nbsp;]
&lt;br&gt;&lt;br&gt;Dave Meikle updated TIKA-163:
&lt;br&gt;-----------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; Attachment: TIKA-163.diff
&lt;br&gt;&lt;br&gt;Patch to implement workaround.
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; GUI does not support drag and drop in Gnome or KDE
&lt;br&gt;&amp;gt; --------------------------------------------------
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key: TIKA-163
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-163&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-163&lt;/a&gt;&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Project: Tika
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Issue Type: Bug
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Components: gui
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp;Affects Versions: 0.2-incubating
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Environment: Ubuntu 8.04.1, Gnome 2.22.3
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Reporter: Dave Meikle
&lt;br&gt;&amp;gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Attachments: TIKA-163.diff
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Because Gnome and KDE do not represent transferred file lists as DataFlavor.javaFileListFlavor, the GUI does not respond to drag-and-drop events, rendering it useless in these desktop environments. Since they represent file lists as a list of URIs in a String, a workaround is to convert that String to a list of files.
&lt;br&gt;&amp;gt; A patch will follow to implement this.
&lt;/div&gt;&lt;br&gt;-- 
&lt;br&gt;This message is automatically generated by JIRA.
&lt;br&gt;-
&lt;br&gt;You can reply to this email to add a comment to the issue online.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-163%29-GUI-does-not-support-drag-and-drop-in-Gnome-or-KDE-tp19615494p19615577.html" />
</entry>

<entry>
	<id>tag:www.nabble.com,2006:post-19615494</id>
	<title>[jira] Created: (TIKA-163) GUI does not support drag and drop in Gnome or KDE</title>
	<published>2008-09-22T13:03:44Z</published>
	<updated>2008-09-22T13:03:44Z</updated>
	<author>
		<name>JIRA jira@apache.org</name>
	</author>
	<content type="html">GUI does not support drag and drop in Gnome or KDE
&lt;br&gt;--------------------------------------------------
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Key: TIKA-163
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;URL: &lt;a href=&quot;https://issues.apache.org/jira/browse/TIKA-163&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;https://issues.apache.org/jira/browse/TIKA-163&lt;/a&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Project: Tika
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Issue Type: Bug
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Components: gui
&lt;br&gt;&amp;nbsp; &amp;nbsp; Affects Versions: 0.2-incubating
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Environment: Ubuntu 8.04.1, Gnome 2.22.3
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Reporter: Dave Meikle
&lt;br&gt;&lt;br&gt;&lt;br&gt;Because Gnome and KDE do not represent transferred file lists as DataFlavor.javaFileListFlavor, the GUI does not respond to drag-and-drop events, rendering it useless in these desktop environments. Since they represent file lists as a list of URIs in a String, a workaround is to convert that String to a list of files.
&lt;br&gt;&lt;br&gt;A patch will follow to implement this.
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;This message is automatically generated by JIRA.
&lt;br&gt;-
&lt;br&gt;You can reply to this email to add a comment to the issue online.
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://www.nabble.com/-jira--Created%3A-%28TIKA-163%29-GUI-does-not-support-drag-and-drop-in-Gnome-or-KDE-tp19615494p19615494.html" />
</entry>

</feed>
