RE: Problem with pdf, upgrading Cell

Marc Ghorayeb Fri, 23 Apr 2010 07:00:20 -0700

Seems like I'm not the only one with this "no extraction" problem:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html
Apparently he tried the same thing, building from the trunk and indexing a 
pdf, and no extraction occurred... Strange.
Marc G.
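
For anyone trying to reproduce this, here is a minimal Python sketch of the kind of request shown in the quoted log below. It builds an /update/extract URL with literal.* fields and can add extractOnly=true, which asks the extracting handler to return what Tika extracted instead of indexing it, so you can see whether extraction happens at all. The host, port, and core name in the usage note, and the helper names build_extract_url and post_pdf, are assumptions for illustration and not something from this thread.

```python
import urllib.request
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, literals=None, extract_only=False):
    """Build a URL for Solr's /update/extract handler (hypothetical helper).

    literals maps a field name to one value or a list of values; each
    value becomes a repeated literal.<field>=<value> query parameter.
    """
    params = [("literal.id", doc_id)]
    for name, values in (literals or {}).items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            params.append(("literal." + name, value))
    if extract_only:
        # Return the extracted content in the response instead of indexing it.
        params.append(("extractOnly", "true"))
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


def post_pdf(url, pdf_path):
    """POST the raw PDF bytes to the given URL and return the response text.

    Assumes the handler accepts the file as the raw request body.
    """
    with open(pdf_path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/pdf"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would look something like post_pdf(build_extract_url("http://localhost:8983/solr/core0", "test.pdf", {"group": ["portal", "var"]}, extract_only=True), "test.pdf"), assuming a server is running at that address. If the response contains no text from the pdf, the problem is in the extraction itself rather than in the schema or the field mapping.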


> From: dekay...@hotmail.com
> To: solr-user@lucene.apache.org
> Subject: RE: Problem with pdf, upgrading Cell
> Date: Fri, 23 Apr 2010 15:12:39 +0200
> 
> 
> I'm launching it with the start.jar utility, and there doesn't seem to be 
> anything weird inside the console when I upload a pdf. Is there a way to 
> output the console to a log file? The only log file that gets updated is a 
> log file in the logs directory, and it seems to only show the input/output of 
> the web requests (GETs and POSTs...).
> For example:
> 127.0.0.1 -  -  [23/Apr/2010:13:06:47 +0000] "GET /solr/core0/admin/luke?show=schema&wt=json HTTP/1.1" 200 21690
> 127.0.0.1 -  -  [23/Apr/2010:13:06:47 +0000] "GET /solr/core0/admin/luke?wt=json HTTP/1.1" 200 780
> 127.0.0.1 -  -  [23/Apr/2010:13:06:57 +0000] "POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Clucidworks-solr-refguide-1.4.pdf&literal.title=lucidworks-solr-refguide-1.4.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Flucidworks-solr-refguide-1.4.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1" 200 41
> 127.0.0.1 -  -  [23/Apr/2010:13:06:58 +0000] "POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cmysql-proxy-en.pdf&literal.title=mysql-proxy-en.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fmysql-proxy-en.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1" 200 44
> 127.0.0.1 -  -  [23/Apr/2010:13:06:59 +0000] "POST /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cpython-cheat-sheet-v1.pdf&literal.title=python-cheat-sheet-v1.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fpython-cheat-sheet-v1.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1 HTTP/1.1" 200 44
> 127.0.0.1 -  -  [23/Apr/2010:13:07:00 +0000] "POST /solr/core0/update HTTP/1.1" 200 41
> 127.0.0.1 -  -  [23/Apr/2010:13:07:00 +0000] "POST /solr/core0/update HTTP/1.1" 200 41
> 127.0.0.1 -  -  [23/Apr/2010:13:07:05 +0000] "GET /solr/core0/admin/schema.jsp HTTP/1.1" 200 26395
> 127.0.0.1 -  -  [23/Apr/2010:13:07:05 +0000] "GET /solr/core0/admin/jquery-1.2.3.min.js HTTP/1.1" 304 0
> I don't think that's going to help much :)
> > Date: Fri, 23 Apr 2010 06:04:34 -0700
> > From: otis_gospodne...@yahoo.com
> > Subject: Re: Problem with pdf, upgrading Cell
> > To: solr-user@lucene.apache.org
> > 
> > Marc, got anything in your logs?
> > 
> >  Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> > 
> > 
> > 
> > ----- Original Message ----
> > > From: Marc Ghorayeb <dekay...@hotmail.com>
> > > To: solr-user@lucene.apache.org
> > > Sent: Fri, April 23, 2010 8:42:53 AM
> > > Subject: Problem with pdf, upgrading Cell
> > > 
> > > 
> > > Hello,
> > > I configured a Solr server to be able to extract data from various 
> > > documents, including pdfs. Unfortunately, the data extraction fails on 
> > > several pdfs. I have read around here that this may be due to the old 
> > > Tika library being used? I looked around and saw that the svn had a newer 
> > > version, so I checked out the trunk and built it using ant dist and ant 
> > > example. I then set up my schema in the newly built server, and inserted 
> > > the library from the newly built cell into the lib directory (in solr's 
> > > home). However, now all I get is a blank response... The indexing works, 
> > > but it doesn't extract anything; only the literal values that I pass on 
> > > are indexed.
> > > Any help would be greatly appreciated!! :)
> > > Thank you.
> > > Marc Ghorayeb
                                          
_________________________________________________________________
Hotmail arrive sur votre téléphone ! Compatible Iphone, Windows Phone, 
Blackberry, …
http://www.messengersurvotremobile.com/?d=Hotmail
