I'd suggest that you check *exactly* which documents are missing from the Solr index. Or find at least one that's missing, and try to figure out how it differs from the documents that can be found in Solr.

Then maybe we can find out what exactly the problem is.
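
One way to make that comparison is sketched below in Python. It is only a sketch under assumptions: the select URL in the usage comment is hypothetical and has to be adapted to the actual Solr instance, the `id` field must be stored (it is in the schema quoted in this thread), and `rows` must be at least as large as the number of indexed documents. The helper names are made up for illustration.

```python
import json
import os
import urllib.request


def indexed_ids(select_url):
    """Fetch the stored id values returned by a Solr select request (JSON)."""
    with urllib.request.urlopen(select_url) as resp:
        data = json.load(resp)
    return {doc["id"] for doc in data["response"]["docs"]}


def pdf_names(base_dir):
    """Collect the names of all .pdf files below base_dir (any letter case)."""
    names = []
    for _root, _dirs, files in os.walk(base_dir):
        names.extend(f for f in files if f.lower().endswith(".pdf"))
    return names


def missing_from_index(disk_names, ids):
    """Return the file names that exist on disk but have no document in Solr."""
    return sorted(set(disk_names) - set(ids))


# Usage (the URL is an assumption, adjust host, port and core):
#   ids = indexed_ids("http://localhost:8983/solr/select?q=*:*&fl=id&rows=20000&wt=json")
#   for name in missing_from_index(pdf_names("H:/pdf/cls_1_16800_OCRed/1"), ids):
#       print(name)
```

Opening a few of the printed files by hand should show whether they have something in common, for example PDFs that cannot be parsed.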

Greetings,
-Kuli

On 09.02.2012 16:37, Rong Kang wrote:

Yes, I put all the files in one directory, and I have checked the file names with code.




At 2012-02-09 20:45:49,"Jan Høydahl"<jan....@cominvent.com>  wrote:
Hi,

Are you 100% sure that the filename is globally unique, since you use it as the 
uniqueKey?
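
A quick way to verify this is sketched below in Python. It assumes that the `file` column of FileListEntityProcessor holds only the file name without its directory, so that with recursive="true" two files with the same name in different subdirectories would end up with the same id and one would overwrite the other. The function names are hypothetical.

```python
import os
from collections import Counter


def duplicates(names):
    """Return {name: count} for every name that occurs more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


def duplicate_pdf_names(base_dir):
    """Walk base_dir recursively and report PDF file names that repeat."""
    names = []
    for _root, _dirs, files in os.walk(base_dir):
        names.extend(f for f in files if f.lower().endswith(".pdf"))
    return duplicates(names)


# Usage:
#   print(duplicate_pdf_names("H:/pdf/cls_1_16800_OCRed/1"))
```

An empty result means the file names are unique and the uniqueKey can be ruled out as the cause.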

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 9. feb. 2012, at 08:30, 荣康 wrote:

Hi,
I am using Solr as a search engine for my PDF files. I have 18219 
files (all with different file names), and all of them are in the same directory. But 
when I import the files into the index with the DataImportHandler, Solr 
reports that only 17233 files were imported. It's very strange. This problem has held up our 
project for a few days, and I can't work out the cause.


Please help me!


Schema.xml


<fields>
   <field name="text" type="text" indexed="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
   <field name="filename" type="filenametext" indexed="true" required="true" termVectors="true" termPositions="true" termOffsets="true"/>
   <field name="id" type="string" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<copyField source="filename" dest="text"/>


and
<dataConfig>
    <dataSource type="BinFileDataSource" name="bin"/>
    <document>
        <entity name="f" processor="FileListEntityProcessor" recursive="true"
                rootEntity="false"
                dataSource="null" baseDir="H:/pdf/cls_1_16800_OCRed/1"
                fileName=".*\.(PDF)|(pdf)|(Pdf)|(pDf)|(pdF)|(PDf)|(PdF)|(pDF)" onError="skip">
            <entity name="tika-test" processor="TikaEntityProcessor"
                    url="${f.fileAbsolutePath}" format="text" dataSource="bin" onError="skip">
                <field column="text" name="text"/>
            </entity>
            <field column="file" name="id"/>
            <field column="file" name="filename"/>
        </entity>
    </document>
</dataConfig>
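
A note on the fileName pattern in this config: alternation binds more loosely than concatenation, so the `.*\.` prefix applies only to the `(PDF)` branch, and the other branches are bare three-letter strings. The Python sketch below illustrates this; Python's `re` treats this pattern the same way as Java's regex engine. Whether it affects the import depends on whether Solr applies the pattern as a full match or as a search, which is an assumption to check. The case-insensitive `strict` pattern is only a suggested alternative, not something taken from this thread.

```python
import re

# The pattern exactly as written in the config.
original = re.compile(r".*\.(PDF)|(pdf)|(Pdf)|(pDf)|(pdF)|(PDf)|(PdF)|(pDF)")

# Suggested alternative: one case-insensitive pattern anchored at the end.
strict = re.compile(r"(?i).*\.pdf$")


def search_matches(pattern, name):
    """True if the pattern is found anywhere in the name."""
    return pattern.search(name) is not None


def full_matches(pattern, name):
    """True only if the pattern matches the whole name."""
    return pattern.fullmatch(name) is not None


# Used as a search, the original pattern also accepts non-PDF files:
#   search_matches(original, "pdf_notes.txt")
# Used as a full match, it rejects every spelling except ".PDF":
#   full_matches(original, "report.pdf")
```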




Sincerely,
Rong Kang




