Seems like I'm not the only one with this "no extraction" problem:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html
Apparently he tried the same thing, building from the trunk and indexing a PDF, and no extraction occurred... Strange.

Marc G.
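One way to narrow this down is to ask the extract handler to return what Tika extracted instead of indexing it, using the handler's extractOnly parameter. Below is a minimal sketch, not a verified fix. It assumes the handler is mapped at /update/extract on core0, as in the access log quoted below, and that Solr listens on http://localhost:8983/solr (the port is an assumption; it is not given in this thread). The function names are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_only_url(base_url, core):
    """Build an /update/extract URL that returns the extracted text
    instead of indexing the document."""
    params = [
        ("extractOnly", "true"),    # return Tika's output, do not index
        ("extractFormat", "text"),  # plain text rather than XHTML
    ]
    return f"{base_url}/{core}/update/extract?{urlencode(params)}"


def post_pdf(url, pdf_path):
    """POST the raw PDF bytes to the given URL and return the response body."""
    with open(pdf_path, "rb") as handle:
        body = handle.read()
    request = Request(url, data=body, headers={"Content-Type": "application/pdf"})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical usage (needs a running Solr instance and a local PDF):
#   url = build_extract_only_url("http://localhost:8983/solr", "core0")
#   print(post_pdf(url, "mysql-proxy-en.pdf"))
```

If the response contains no text for a PDF that clearly has some, the problem is likely on the Tika/Cell side (for example, missing jars in the lib directory) rather than in the schema or field mapping.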
> From: dekay...@hotmail.com
> To: solr-user@lucene.apache.org
> Subject: RE: Problem with pdf, upgrading Cell
> Date: Fri, 23 Apr 2010 15:12:39 +0200
>
> I'm launching it with the start.jar utility, and there doesn't seem to be
> anything weird in the console when I upload a PDF. Is there a way to
> output the console to a log file? The only log file that gets updated is a
> log file in the logs directory, and it seems to show only the input/output of
> the web requests (GETs and POSTs...).
> For example:
>
> 127.0.0.1 - - [23/Apr/2010:13:06:47 +0000] "GET /solr/core0/admin/luke?show=schema&wt=json HTTP/1.1" 200 21690
> 127.0.0.1 - - [23/Apr/2010:13:06:47 +0000] "GET /solr/core0/admin/luke?wt=json HTTP/1.1" 200 780
> 127.0.0.1 - - [23/Apr/2010:13:06:57 +0000] "POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Clucidworks-solr-refguide-1.4.pdf&literal.title=lucidworks-solr-refguide-1.4.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Flucidworks-solr-refguide-1.4.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1" 200 41
> 127.0.0.1 - - [23/Apr/2010:13:06:58 +0000] "POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cmysql-proxy-en.pdf&literal.title=mysql-proxy-en.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fmysql-proxy-en.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1" 200 44
> 127.0.0.1 - - [23/Apr/2010:13:06:59 +0000] "POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cpython-cheat-sheet-v1.pdf&literal.title=python-cheat-sheet-v1.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fpython-cheat-sheet-v1.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1" 200 44
> 127.0.0.1 - - [23/Apr/2010:13:07:00 +0000] "POST /solr/core0/update HTTP/1.1" 200 41
> 127.0.0.1 - - [23/Apr/2010:13:07:00 +0000] "POST /solr/core0/update HTTP/1.1" 200 41
> 127.0.0.1 - - [23/Apr/2010:13:07:05 +0000] "GET /solr/core0/admin/schema.jsp HTTP/1.1" 200 26395
> 127.0.0.1 - - [23/Apr/2010:13:07:05 +0000] "GET /solr/core0/admin/jquery-1.2.3.min.js HTTP/1.1" 304 0
>
> I don't think that's going to help much :)
>
> > Date: Fri, 23 Apr 2010 06:04:34 -0700
> > From: otis_gospodne...@yahoo.com
> > Subject: Re: Problem with pdf, upgrading Cell
> > To: solr-user@lucene.apache.org
> >
> > Marc, got anything in your logs?
> >
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> > ----- Original Message ----
> > > From: Marc Ghorayeb <dekay...@hotmail.com>
> > > To: solr-user@lucene.apache.org
> > > Sent: Fri, April 23, 2010 8:42:53 AM
> > > Subject: Problem with pdf, upgrading Cell
> > >
> > > Hello,
> > >
> > > I configured a Solr server to be able to extract data from various
> > > documents, including PDFs. Unfortunately, the data extraction fails on
> > > several PDFs.
> > > I have read around here that this may be due to the old Tika library being
> > > used? I looked around and saw that the svn had a newer version, so I
> > > checked out the trunk and built it using ant dist and ant example. I then
> > > set up my schema in the newly built server and inserted the library from
> > > the newly built cell into the lib directory (in Solr's home). However, now
> > > all I get is a blank response... The indexing works, but it doesn't extract
> > > anything; only the literal values that I pass on are indexed.
> > >
> > > Any help would be greatly appreciated!! :)
> > >
> > > Thank you.
> > >
> > > Marc Ghorayeb