once again: indexing of pdf documents

Hi folks,

when I installed my first tiki earlier this year I posted a problem here that my tiki was indexing every kind of file (text, ps, doc, ppt, whatever I set up), but no pdfs. Back in February I found a bug in refresh-functions.php, which was fixed (and maybe did some other things), and after that indexing of pdfs magically worked for me.

Now I have updated from 1.9.9 to 1.9.11, and indexing is broken again for pdfs. :-(
I already looked into refresh-functions.php to make sure it was the correct version. As far as I can tell it is (at least it contains the fix mentioned above). However, it is not the version I edited in February anymore, but a new one from the 1.9.11 install.

Can anyone here confirm that indexing of pdf documents is working for her/him (or not)? I am using

application/pdf /usr/local/bin/pdftotext %1 -

as Mime Type filter, if that should matter.
Any other hints/thoughts would be greatly appreciated, too.

cu
  Gerrit

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Tikiwiki-users mailing list
Tikiwiki-users@...
https://lists.sourceforge.net/lists/listinfo/tikiwiki-users
|
|
Re: once again: indexing of pdf documents

On Fri, Jun 27, 2008 at 03:54:08PM +0200, Gerrit Kühn wrote:

> Can anyone here confirm that indexing of pdf documents is working for
> her/him (or not)? I am using
>
> application/pdf /usr/local/bin/pdftotext %1 -
>
> as Mime Type filter, if that should matter.
> Any other hints/thoughts would be greatly appreciated, too.

One addition to this:
Meanwhile I replaced pdftotext with a self-written script that calls pdftotext and then tees the output into a file (and to stdout, of course). This way I can make sure that an uploaded pdf file is actually processed during indexing and that the output of pdftotext is correct.

The problem has to be somewhere later in the processing, because the pdf files are not found by the search module, although pdftotext is obviously working as expected. Postscript and other file types are working fine, too. I'm somewhat puzzled here...

cu
  Gerrit
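For readers who want to repeat this check, here is a minimal sketch of what such a wrapper might look like. It is not Gerrit's actual script, which was not posted. It assumes that Tiki replaces `%1` in the filter line with the path of the uploaded file and that the trailing `-` makes pdftotext write to stdout. The function name `pdf_filter`, the variables `PDFTOTEXT` and `FILTER_LOG`, and the log path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical debugging wrapper around pdftotext (not part of Tiki).
# Intended to be used in the Mime Type filter in place of pdftotext:
#   application/pdf /path/to/this-script %1 -

# Assumed locations; override via the environment if they differ.
PDFTOTEXT="${PDFTOTEXT:-/usr/local/bin/pdftotext}"
FILTER_LOG="${FILTER_LOG:-/tmp/tiki-pdf-filter.log}"

pdf_filter() {
    # $1 is the PDF path; "-" tells pdftotext to write the text to stdout.
    # tee appends a copy to the log and still passes the text on to Tiki.
    "$PDFTOTEXT" "$1" - | tee -a "$FILTER_LOG"
}

# Only run when called with arguments, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    pdf_filter "$1"
fi
```

If the log file fills with readable text after an upload or a manual reindex, the filter step itself is working and the fault has to lie further down the indexing path, which is the conclusion reached above.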
|
|
Re: once again: indexing of pdf documents

On Fri, Jun 27, 2008 at 11:24:47PM +0200, Gerrit Kühn wrote:

> One addition to this:
[...]

And one more thing:
I have enabled auto-indexing on upload. However, after uploading a file, the search function does not find its contents. I have to reindex all files for search manually to make this work. I guess that is not quite the way it should be...

cu
  Gerrit
|
|
Mostly solved (bugfix included): Re: once again: indexing of pdf documents

On Sat, 28 Jun 2008 15:11:32 +0200 Gerrit Kühn <gerrit@...> wrote about Re: [Tikiwiki-users] once again: indexing of pdf documents:

Hi,

I have made the following changes:

mclane# diff -u refresh-functions.php refresh-functions.php.orig
--- refresh-functions.php       2008-07-02 15:25:20.000000000 +0200
+++ refresh-functions.php.orig  2008-07-02 15:26:28.000000000 +0200
@@ -387,16 +387,16 @@
        $query="select * from `tiki_files`";
        $result=$tikilib->query($query,array(),1,rand(0,$cant-1));
        $info=$result->fetchRow();
-       $words=&search_index($info["description"]." ".$info["name"]." ".$info["search_data"]." ".$info["filename"]);
+       $words=&search_index($info["data"]." ".$info["description"]." ".$info["name"]);
        insert_index($words,"file",$info["fileId"]);
     }
 }

 function refresh_index_files() {
        global $tikilib;
-       $result = $tikilib->query("select * from `tiki_files`");
+       $result = $tikilib->query("select * from `tiki_files`", array());
        while ($info = $result->fetchRow()) {
-               $words=&search_index($info["description"]." ".$info["name"]." ".$info["search_data"]." ".$info["filename"]);
+               $words=&search_index($info['data'].' '.$info['description'].' '.$info['name'].' '.$info['search_data']);
                insert_index($words,"file",$info["fileId"]);
        }
 }

Now indexing sort of works for me, even for pdf files. I removed the 'data' from the search words, because they're binary anyway and it does not seem to make sense to me to include them here.

The most important change, the one which fixed my problems, was to remove the array() from the query in refresh_index_files(). I still do not know much about php and mysql, so I do not really know what I changed there, but the other functions in this file show the same structure (the array() is only there in the random_ versions of the functions). Maybe someone here can review this and commit it if it is the right solution.

I have one problem left which I know of only since yesterday, when I installed an sql browser to look into the actual database:
Some of my uploaded pdf files appear to have the wrong mime type. They are unrecognized and stored under filetype application/unknown. Therefore they are not converted to ascii and subsequently not indexed.
As I said, only some pdf files are affected, others are ok. I cannot see any pattern behind this. Does anyone here have an idea why that happens and how to fix it?

cu
  Gerrit

-------------------------------------------------------------------------
Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW!
Studies have shown that voting for your favorite open source project,
along with a healthy diet, reduces your potential for chronic lameness
and boredom. Vote Now at http://www.sourceforge.net/community/cca08
_______________________________________________
Tikiwiki-users mailing list
Tikiwiki-users@...
https://lists.sourceforge.net/lists/listinfo/tikiwiki-users
|
|
Re: Mostly solved (bugfix included): Re: once again: indexing of pdf documents

On Wed, 2 Jul 2008 15:35:47 +0200 Gerrit Kühn <gerrit@...> wrote about [Tikiwiki-users] Mostly solved (bugfix included): Re: once again: indexing of pdf documents:

GK> mclane# diff -u refresh-functions.php refresh-functions.php.orig
GK> --- refresh-functions.php 2008-07-02 15:25:20.000000000 +0200
GK> +++ refresh-functions.php.orig 2008-07-02 15:26:28.000000000 +0200

And for your convenience, here is the diff in the correct direction. :-)

mclane# diff -u refresh-functions.php.orig refresh-functions.php
--- refresh-functions.php.orig  2008-07-02 15:26:28.000000000 +0200
+++ refresh-functions.php       2008-07-02 16:03:58.000000000 +0200
@@ -387,16 +387,16 @@
        $query="select * from `tiki_files`";
        $result=$tikilib->query($query,array(),1,rand(0,$cant-1));
        $info=$result->fetchRow();
-       $words=&search_index($info["data"]." ".$info["description"]." ".$info["name"]);
+       $words=&search_index($info["description"]." ".$info["name"]." ".$info["search_data"]." ".$info["filename"]);
        insert_index($words,"file",$info["fileId"]);
     }
 }

 function refresh_index_files() {
        global $tikilib;
-       $result = $tikilib->query("select * from `tiki_files`", array());
+       $result = $tikilib->query("select * from `tiki_files`");
        while ($info = $result->fetchRow()) {
-               $words=&search_index($info['data'].' '.$info['description'].' '.$info['name'].' '.$info['search_data']);
+               $words=&search_index($info["description"]." ".$info["name"]." ".$info["search_data"]." ".$info["filename"]);
                insert_index($words,"file",$info["fileId"]);
        }
 }

GK> I have one problem left which I know of only since yesterday when I
GK> installed an sql browser to look into the actual database:
GK> Some of my uploaded pdf-files appear to have the wrong mime type. They
GK> are unrecognized and stored under filetype application/unknown.
GK> Therefore they are not converted to ascii and subsequently not
GK> indexed.
GK> As I said, there are only some pdf files concerned, others are ok. I
GK> cannot see any scheme or structure behind this. Does anyone here have
GK> an idea why that happens and how to fix it?

For the record: I was able to solve this with the help of Google. It turns out that the mime type stored on upload depends on the setup of the browser used, so I have to make the changes there. However, I think that a file-type detection independent of the browser would be desirable. There must already be something like this in tiki, because the file icons presented in the file gallery were always the right ones, independent of the mime type in the database, which could be pdf, application/pdf, x-application/pdf, application/unknown or something else...

cu
  Gerrit
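As a rough illustration of what browser-independent detection could mean, the following sketch looks at the first bytes of the file instead of trusting the uploaded mime type. It relies on the fact that PDF files normally begin with the five bytes `%PDF-`. The helper name `sniff_mime` is hypothetical, and nothing here is taken from Tiki's code or reflects how Tiki picks its file icons.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tiki): guess the mime type of a file
# from its content, ignoring whatever type the browser sent on upload.
sniff_mime() {
    # Read only the first five bytes; PDF files normally start with "%PDF-".
    magic=$(head -c 5 "$1" 2>/dev/null)
    if [ "$magic" = "%PDF-" ]; then
        echo "application/pdf"
    else
        # Same fallback type that showed up in the database above.
        echo "application/unknown"
    fi
}
```

Calling `sniff_mime /path/to/upload` would print `application/pdf` for a file with a PDF header, whatever its name or stored type. Where the `file` utility is installed, `file -b --mime-type` is probably the more robust choice, since it recognizes many more formats than this single check.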
|
|
Re: Mostly solved (bugfix included): Re: once again: indexing of pdf documents

Gerrit,

I would suggest you send this thread to the Tikiwiki developers <tikiwiki-devel@...> instead, for a higher probability of getting appropriate feedback (the coders and their coding discussions are mostly on that list).

Cheers

Xavi