Re: Big size xml file indexing

2007-01-21 Thread aslam bari
Hi Saikrishna, Unfortunately my XML structure is not the same: sometimes the nodes are very long and sometimes very small. One element may span the whole document, or there may be many elements of different types. So I need your help on how to parse it in a good and efficient way so

Re: Big size xml file indexing

2007-01-21 Thread saikrishna venkata pendyala
Hi, Nothing needs to change in the indexing process. What is required is a little pre-processing. If the structure of your XML file is the same as what I described earlier, then split the 35 MB file into small files and make sure that the new small files are well-formed XML. Now index the small

Re: Big size xml file indexing

2007-01-21 Thread aslam bari
Hi Saikrishna, Thanks for the reply, but I don't know how to go about this. Here is my code sample; let me know where to change it. SAXBuilder builder = new SAXBuilder(); //CONTENT here is a ByteArrayInputStream; I know I can give a file URL here also. Let me know what is best. Document doc = builder.b

Re: Big size xml file indexing

2007-01-21 Thread saikrishna venkata pendyala
Hi, I have indexed a 6.2 GB XML file using Lucene. What I did was: 1. I split the 6.2 GB file into small files, each 10 MB in size. 2. Then I wrote a Python script to quantize the number of documents in each file. The structure of my XML file is """
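
The splitting step described above can be sketched in Java using the StAX API that ships with the JDK. This is only an illustration under stated assumptions, not the poster's actual script (which was written in Python and split by size): it assumes the big file is one root element containing many sibling record elements, and that the root has no attributes or namespace declarations that need to be carried over. The names `XmlSplitter`, `split`, and `part-N.xml` are made up for the example.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: splits <root><rec/>...<rec/></root> into several
// smaller files, each wrapped in its own copy of the root element.
final class XmlSplitter {

    /** Writes part-0.xml, part-1.xml, ... into outDir; returns the file count. */
    static int split(InputStream in, Path outDir, int recordsPerFile) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventFactory events = XMLEventFactory.newInstance();
        XMLOutputFactory outputs = XMLOutputFactory.newInstance();

        QName root = null;          // name of the document's root element
        OutputStream out = null;    // current output file, if any
        XMLEventWriter writer = null;
        int depth = 0;              // number of currently open elements
        int inFile = 0;             // records written to the current file
        int files = 0;

        while (reader.hasNext()) {
            XMLEvent e = reader.nextEvent();
            if (e.isStartElement()) {
                depth++;
                if (depth == 1) {
                    root = e.asStartElement().getName();
                } else if (depth == 2 && writer == null) {
                    // A record starts and no file is open: start a new part.
                    out = Files.newOutputStream(outDir.resolve("part-" + files + ".xml"));
                    files++;
                    writer = outputs.createXMLEventWriter(out, "UTF-8");
                    writer.add(events.createStartElement(
                            root.getPrefix(), root.getNamespaceURI(), root.getLocalPart()));
                }
            }
            // Copy everything that belongs to a record (depth 2 and deeper).
            if (depth >= 2 && writer != null) {
                writer.add(e);
            }
            if (e.isEndElement()) {
                depth--;
                if (depth == 1 && ++inFile == recordsPerFile) {
                    finish(writer, out, events, root);
                    writer = null;
                    inFile = 0;
                }
            }
        }
        if (writer != null) {
            finish(writer, out, events, root);
        }
        reader.close();
        return files;
    }

    // Closes the wrapping root element so the part is well-formed XML.
    private static void finish(XMLEventWriter writer, OutputStream out,
                               XMLEventFactory events, QName root) throws Exception {
        writer.add(events.createEndElement(
                root.getPrefix(), root.getNamespaceURI(), root.getLocalPart()));
        writer.flush();
        writer.close();
        out.close();
    }
}
```

Because the input is read as a stream of events, only one record's worth of events is in flight at a time, so the 6.2 GB file never has to fit in memory. Each part can then be parsed and indexed with the existing code.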

Big size xml file indexing

2007-01-21 Thread aslam bari
Dear all, I am using Lucene to index XML files. For parsing I am using JDOM to get XPath nodes, do some manipulation on them, and index them. Everything works well, but when the file is very big, about 35 - 50 MB, it runs out of memory or takes a lot of time. How can I set some parameter
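
One common alternative to building the whole JDOM tree is to stream the file and handle one record at a time. The sketch below uses the JDK's StAX `XMLStreamReader`; it is not taken from the thread. The class name `StreamingXmlReader`, the method `forEachRecord`, and the idea that each record is an element with a known name are assumptions made for illustration. If a single element spans the whole document, this approach does not bound memory either.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: hands the text of each record element to a callback
// without ever holding the whole document in memory.
final class StreamingXmlReader {

    /** Calls handler once per element named recordName; returns the record count. */
    static int forEachRecord(InputStream in, String recordName,
                             Consumer<String> handler) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        StringBuilder text = null;  // non-null while inside a record
        int nested = 0;             // open child elements inside the record
        int count = 0;
        try {
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (text != null) {
                            nested++;
                        } else if (r.getLocalName().equals(recordName)) {
                            text = new StringBuilder();
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        if (text != null) {
                            text.append(r.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (text != null) {
                            if (nested > 0) {
                                nested--;
                            } else {
                                // Record complete: this is where a Lucene
                                // document could be built and indexed.
                                handler.accept(text.toString());
                                text = null;
                                count++;
                            }
                        }
                        break;
                    default:
                        break;
                }
            }
        } finally {
            r.close();
        }
        return count;
    }
}
```

A caller would pass a `FileInputStream` for the large file and a handler that turns each record's text into an index entry. Memory use then depends on the largest single record rather than on the file size.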

RE : Re: Lucene and queries

2007-01-21 Thread david chris
Thank you for your posts. The "*" is any word. Hopefully, I should be able to identify sentences in a text and then apply one of these 3 rules: (1) A * * find the sentences that include the word A and to its left exactly two words (any words) (2) A * * B * find the sentences that inclu

Re: Lucene and queries

2007-01-21 Thread Chris Hostetter
: There's no syntax I know of that'll give you this kind of query out of the : box. The closest thing would be span queries, which will give you things : like A**B, meaning "give me all documents where A is NOT MORE THAN 2 words : away from B". This is not what you're asking for, since it would als

Re: Lucene and queries

2007-01-21 Thread Erick Erickson
My question is "what are you trying to accomplish"? The reason I ask is that all three queries presuppose that the search you're performing is on very precisely defined fields. (1) supposes a field where the A is exactly three words from the end. (3) supposes the A is exactly three words from t

Lucene and queries

2007-01-21 Thread david chris
Hi, I am wondering if Lucene can handle the following queries: (1) A * * give me all documents with word A followed by exactly two words (2) A * * B * give me all documents with words A and B exactly separated by 2 words and word B followed by one word (3) * * A give me all documents with word A
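
To make the intended semantics of these patterns concrete, here is a small plain-Java sketch that applies them to an already tokenized sentence. It does not use Lucene and is not a Lucene query syntax; it only illustrates "`*` stands for exactly one arbitrary word". The class `WordPattern`, the method `find`, whitespace tokenization, and case-sensitive matching are all assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: finds every position in a sentence where a pattern
// such as "A * * B *" matches, with "*" standing for exactly one word.
final class WordPattern {

    /** Returns the word offsets at which the pattern matches the sentence. */
    static List<Integer> find(String sentence, String pattern) {
        List<String> words = tokens(sentence);
        List<String> pat = tokens(pattern);
        List<Integer> hits = new ArrayList<>();
        if (pat.isEmpty()) {
            return hits;
        }
        // Slide a window of the pattern's length over the sentence.
        for (int start = 0; start + pat.size() <= words.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < pat.size() && ok; i++) {
                String p = pat.get(i);
                ok = p.equals("*") || p.equals(words.get(start + i));
            }
            if (ok) {
                hits.add(start);
            }
        }
        return hits;
    }

    // Splits on whitespace; a real analyzer would do more than this.
    private static List<String> tokens(String s) {
        String t = s.trim();
        if (t.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(t.split("\\s+"));
    }
}
```

Whether a pattern must be anchored to the start or end of the sentence is left to the caller: a hit at offset 0 starts the sentence, and a hit whose offset plus the pattern length equals the word count ends it.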