Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread Erick Erickson
settings allow content extraction > etc. > > Regards, > > Vivek > > -----Original Message----- > From: 荣康 [mailto:whuiss_cs2...@163.com] > Sent: Wednesday, February 08, 2012 11:30 PM > To: solr-user@lucene.apache.org > Subject: Help:Solr can't put all pdf fi

RE: Help:Solr can't put all pdf files into index

2012-02-09 Thread Vivek Shrivastava
From: 荣康 [mailto:whuiss_cs2...@163.com] Sent: Wednesday, February 08, 2012 11:30 PM To: solr-user@lucene.apache.org Subject: Help:Solr can't put all pdf files into index Hey, I am using Solr as my search engine to search my PDF files. I have 18219 files (different file names) and all the files

Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread Michael Kuhlmann
I don't know much about Tika, but this seems to be a bug in PDFBox. See: https://issues.apache.org/jira/browse/PDFBOX-797 You might also have a look at this: http://stackoverflow.com/questions/7489206/error-while-parsing-binary-files-mostly-pdf At least that's what I found when I googled the
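Before digging into PDFBox itself, a crude pre-check can flag files that are obviously damaged. The sketch below is only a heuristic, not PDFBox's parser: it assumes a complete PDF starts with `%PDF-` and has an `%%EOF` marker in its last kilobyte, and the function names are hypothetical. A file that passes can still trip the PDFBox bug linked above, so treat the output as a short list to test by hand.

```python
import os


def looks_like_complete_pdf(path, tail_bytes=1024):
    """Heuristic only: True if the file starts with b"%PDF-" and has b"%%EOF"
    somewhere in its last `tail_bytes` bytes. Passing does NOT mean PDFBox can parse it."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(5)
        f.seek(max(0, size - tail_bytes))  # small files are read from the start
        tail = f.read()
    return head == b"%PDF-" and b"%%EOF" in tail


def suspicious_pdfs(directory):
    """Sorted names of *.pdf files (any letter case) in `directory` that fail the heuristic."""
    bad = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.lower().endswith(".pdf"):
            if not looks_like_complete_pdf(path):
                bad.append(name)
    return sorted(bad)
```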

Re:Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread Rong Kang
I tested one file that is missing from the Solr index, and Solr responded as below: ... 0 1 0 2012-02-10 00:03:23 Indexing completed. Added/Updated: 0 documents. Deleted 0 documents. ... I checked Tomcat's log file and found this exception: Exception in entity : tika-test:org.apache.solr.handler.dataimport.DataIm
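When feeding the missing files through the import one at a time, the summary in the response is the quickest signal of a silent skip. A small sketch, assuming the message keeps the exact wording quoted above (`Added/Updated: N documents. Deleted N documents.`); other Solr versions may phrase it differently, and `parse_dih_summary` is a hypothetical helper name.

```python
import re

# Pattern copied from the DIH status text quoted in this thread; it may not
# match the wording of other Solr versions.
_SUMMARY = re.compile(
    r"Added/Updated:\s*(\d+)\s+documents\.\s*Deleted\s+(\d+)\s+documents"
)


def parse_dih_summary(message):
    """Return (added_or_updated, deleted) parsed from a DIH status message, or None."""
    m = _SUMMARY.search(message)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```

A result of `(0, 0)` for a single-file run, as in the response above, means the file was skipped rather than indexed.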

Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread Michael Kuhlmann
I'd suggest that you check which documents *exactly* are missing from the Solr index. Or find at least one that's missing, and try to figure out how this document differs from the ones that can be found in Solr. Maybe we can then find out what the exact problem is. Greetings, -Kuli On 09.02
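One way to get the exact list is to diff the directory against the uniqueKey values stored in Solr. This sketch assumes the uniqueKey holds the bare file name, that the key field is called `id` (hypothetical; use the field your schema declares), and that you saved the JSON output of a query such as `select?q=*:*&fl=id&rows=20000&wt=json` to a string. The helper names are illustrative.

```python
import json
import os


def indexed_ids_from_solr_json(response_text, key_field="id"):
    """Collect uniqueKey values from a Solr select response written with wt=json.
    `key_field` is an assumption: pass your schema's uniqueKey field name."""
    docs = json.loads(response_text)["response"]["docs"]
    return {doc[key_field] for doc in docs}


def missing_from_index(directory, indexed_ids):
    """Sorted names of PDF files in `directory` whose name is not among `indexed_ids`.
    Assumes the uniqueKey value is exactly the file name."""
    on_disk = {
        n for n in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, n)) and n.lower().endswith(".pdf")
    }
    return sorted(on_disk - set(indexed_ids))
```

Any name this returns is a concrete file to re-import on its own and compare with the ones that did make it in.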

Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread François Schiettecatte
Have you tried checking any logs? Have you tried identifying a file which did not make it in, submitting just that one, and seeing what happens? François On Feb 9, 2012, at 10:37 AM, Rong Kang wrote: > Yes, I put all the files in one directory and I have tested the file names using code.
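For the log check, something like the following pulls out the DataImportHandler failure lines, using the `Exception in entity` text Rong Kang found in Tomcat's log as the marker. The helper name and the log path in the comment are assumptions; where Tomcat writes its log depends on the installation.

```python
def dih_entity_errors(log_lines, marker="Exception in entity"):
    """Yield (line_number, line) for each log line containing `marker`.
    `log_lines` can be any iterable of strings, e.g. an open text file."""
    for lineno, line in enumerate(log_lines, start=1):
        if marker in line:
            yield lineno, line.rstrip("\n")


# Example usage (the path is hypothetical):
# with open("/path/to/tomcat/logs/catalina.out", encoding="utf-8", errors="replace") as log:
#     for lineno, line in dih_entity_errors(log):
#         print(lineno, line)
```

The line numbers make it easy to open the log at each hit and read the full stack trace that follows, which usually names the failing file.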

Re:Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread Rong Kang
Yes, I put all the files in one directory, and I have tested the file names using code. At 2012-02-09 20:45:49, "Jan Høydahl" wrote: > Hi, > Are you 100% sure that the filename is globally unique, since you use it as the uniqueKey? > -- > Jan Høydahl, search solution architect > Cominvent AS - www.c

Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread Jan Høydahl
Hi, Are you 100% sure that the filename is globally unique, since you use it as the uniqueKey? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 9. feb. 2012, at 08:30, 荣康 wrote: > Hey, > I am using Solr as my search engine to s
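Files in a single directory can't share a name, but the key Solr sees may not be the raw name. If it is derived (extension stripped, lowercased, or collected from subdirectories), two files can map to one key, and Solr replaces the earlier document with the later one. A sketch for spotting such collisions, where `key_fn` is a hypothetical stand-in for however your import builds the uniqueKey:

```python
import os
from collections import defaultdict


def key_collisions(paths, key_fn=os.path.basename):
    """Group paths by the uniqueKey derived from them; keep only groups with >1 path.
    `key_fn` is an assumption: replace it with whatever your import uses as the key."""
    groups = defaultdict(list)
    for p in paths:
        groups[key_fn(p)].append(p)
    return {k: v for k, v in groups.items() if len(v) > 1}


def overwritten_count(collisions):
    """Documents lost to overwrites: every path after the first in each group."""
    return sum(len(v) - 1 for v in collisions.values())
```

If `overwritten_count` comes out near the gap between files on disk and documents in the index, key collisions are the likely explanation.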

Help:Solr can't put all pdf files into index

2012-02-08 Thread 荣康
Hey, I am using Solr as my search engine to search my PDF files. I have 18219 files (all with different file names), and all of them are in the same directory. But when I use Solr's DataImport method to import the files into the index, Solr reports that it imported only 17233 files. It's very strange. This problem
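A cheap first check for the 986 missing files (18219 - 17233) is whether the import's file-selection pattern actually matches every name. The original config isn't shown, so this is a hypothesis, not a diagnosis: if the import selects files with a case-sensitive regex such as `\.pdf$`, names ending in `.PDF` would be skipped without any error. The helpers below are hypothetical and use Python's `re.search` semantics.

```python
import os
import re


def list_files(directory):
    """Names of plain files directly inside `directory` (no recursion)."""
    return [
        n for n in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, n))
    ]


def unmatched(names, pattern, flags=0):
    """Sorted names that `pattern` does not match anywhere in the name."""
    rx = re.compile(pattern, flags)
    return sorted(n for n in names if not rx.search(n))
```

Comparing `len(unmatched(list_files(path), your_pattern))` with 986 shows quickly whether the selection pattern accounts for the gap; rerunning with `re.IGNORECASE` shows whether letter case is the culprit.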