Hi,

I'm new to Lucene/Solr and I'm trying to build an index of a large body of
plaintext files for some corpus research I'm doing.  There are about
37,000 files, typically 50-100 lines each, and they're scattered
throughout a huge nested directory structure.  I've worked through the basic
Solr tutorial and the text/html indexing tutorial at
http://www.slideshare.net/LucidImagination/indexing-text-and-html-files-with-solr-4063407,
but after some looking around, I haven't been able to find any resources
for indexing a large number of text files that aren't all sitting in the
same directory.

Is this simply a case of having to write a shell script to crawl the whole
directory tree and call cURL once for every file (sketched below), or is
there a library, utility, or otherwise easier way to do this?  Any help
would be greatly appreciated!  Alternatively, if this is a solved problem
and I just need to RTFM, it'd be great if someone could point me in the
right direction.
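
To make the question concrete, here's roughly what I have in mind.  This
is an untested sketch: I'm assuming a stock single-core Solr on
localhost:8983 with the ExtractingRequestHandler mapped to /update/extract
(as in the tutorial above), that the files all end in .txt, and that no
path contains spaces or characters needing URL-encoding.

  # walk the tree and post each file to Solr Cell,
  # using the file's path as its unique id
  find /path/to/corpus -type f -name '*.txt' | while read -r f; do
    curl "http://localhost:8983/solr/update/extract?literal.id=$f" \
         -F "myfile=@$f"
  done
  # one commit at the end instead of one per file
  curl http://localhost:8983/solr/update \
       -H 'Content-type: text/xml' --data-binary '<commit/>'

Firing off 37,000 separate curl processes this way feels clumsy, though,
so I'm hoping there's something better.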

Thanks a lot,
Colin
