Re: how to index large number of files?

2010-10-21 Thread Robert Cadena
Take a look at this question and the associated answer over on Stack Overflow: http://stackoverflow.com/questions/354703/is-there-a-workaround-for-javas-poor-performance-on-walking-huge-directories It's not all inside Java, but it might work for you, and you might not have to restructure your files.

Re: how to index large number of files?

2010-10-21 Thread Sahin Buyrukbilen
I don't know why I am getting this error, but it looks normal to me now, because when I try to list the contents of the folder I cannot get a response from the Linux shell. Now I have created a folder with 100,000 files and am running Eclipse with the -Xmx2G parameter. it is still indexing for about 15 minute

Re: Using a TermFreqVector to get counts of all words in a document

2010-10-21 Thread appy74
Would you have an example of this or be able to point me in the direction of an example at all?
Quoting Grant Ingersoll:
> On Oct 20, 2010, at 4:40 PM, Martin O'Shea wrote:
> > http://mail-archives.apache.org/mod_mbox/lucene-java-user/201010.mbox/%3c1287065863.4cb7110774...@netmail

Re: Using a TermFreqVector to get counts of all words in a document

2010-10-21 Thread Grant Ingersoll
On Oct 20, 2010, at 4:40 PM, Martin O'Shea wrote:
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/201010.mbox/%3c1287065863.4cb7110774...@netmail.pipex.net%3e will give you a better idea of
> what I'm moving towards.
>
> It's all a bit grey at the moment so further investigation i

Re: how to index large number of files?

2010-10-21 Thread Toke Eskildsen
On Thu, 2010-10-21 at 05:01 +0200, Sahin Buyrukbilen wrote:
> Unfortunately both methods didnt go through. I am getting memory error even
> at reading the directory contents.
Then your problem is probably not Lucene related, but the sheer number of files returned by listFiles. A Java File contain
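For illustration only, here is a minimal sketch of one way to avoid building the full array that listFiles returns. It assumes Java 7 or later for java.nio.file (Java 8 for the lambda), which was not available when this thread was written. Files.newDirectoryStream hands back entries one at a time, so the whole listing need not be held in memory at once. The class name DirectoryStreamSketch, the method forEachFile, and the empty callback in main are hypothetical placeholders, not anything from the thread or from Lucene.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Consumer;

// Hypothetical helper: walks one directory without materialising a File[] array.
final class DirectoryStreamSketch {

    // Passes each regular file in dir to action, one entry at a time,
    // and returns how many files were visited. Subdirectories are skipped.
    static long forEachFile(Path dir, Consumer<Path> action) throws IOException {
        long count = 0;
        // try-with-resources closes the underlying directory handle.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    action.accept(entry);
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        // Placeholder callback: a real indexer would add the file to its index here.
        long visited = forEachFile(dir, file -> { });
        System.out.println("Visited " + visited + " files in " + dir);
    }
}
```

Whether this helps in the case above is an assumption: if the shell itself cannot list the folder, the file system may be the bottleneck rather than the JVM heap.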

Re: how to index large number of files?

2010-10-21 Thread Ian Lea
Maybe mostly 1K, but you only need 1 very large doc to cause a problem. I haven't been following this thread, so apologies if I've missed things, but you seem to be having problems running what should be a simple job of the sort that Lucene handles every day without breaking a sweat. Does it alway