Hi Saikrishna,
Unfortunately, my XML structure is not uniform: sometimes it runs very long
and sometimes it has only a few nodes. A single element may span the whole
document, or there may be many elements of different types. So I need your
help on how to parse it in a good and efficient way so
Hi,
Nothing needs to change in the indexing process. What is required is a little
pre-processing.
If the structure of your XML file is the same as what I described earlier,
then split the 35 MB file into small files and make sure that each new small
file is well-formed XML.
Now index small
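
A rough sketch of that pre-processing step, in case it helps. It is not the script used in this thread. It assumes (and this is only an assumption) that every direct child of the root element is one record, and it uses the JDK's StAX API rather than JDOM. The class name XmlSplitSketch and the wrapper element name "chunk" are made up for the example. Each chunk gets its own root element, so every piece is well-formed on its own.

```java
import java.io.Reader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: splits one big XML stream into smaller well-formed pieces.
final class XmlSplitSketch {

    // Assumption: each direct child of the root is one record.
    // Returns the chunks as strings to keep the sketch short; for a large
    // input you would write each chunk to its own file instead.
    static List<String> split(Reader in, int recordsPerChunk) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();

        List<String> chunks = new ArrayList<>();
        StringWriter buffer = null;
        XMLEventWriter writer = null;
        int depth = 0;            // 1 = root element, 2 = record element
        int recordsInChunk = 0;

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                depth++;
                if (depth == 2 && writer == null) {
                    // First record of a new chunk: open a fresh wrapper root.
                    buffer = new StringWriter();
                    writer = outFactory.createXMLEventWriter(buffer);
                    writer.add(events.createStartElement("", "", "chunk"));
                }
            }
            if (depth >= 2) {
                writer.add(event);  // copy everything inside a record unchanged
            }
            if (event.isEndElement()) {
                if (depth == 2 && ++recordsInChunk == recordsPerChunk) {
                    chunks.add(finish(writer, buffer, events));
                    writer = null;
                    recordsInChunk = 0;
                }
                depth--;
            }
        }
        if (writer != null) {
            chunks.add(finish(writer, buffer, events));  // last, partly filled chunk
        }
        reader.close();
        return chunks;
    }

    // Closes the wrapper root so the chunk is well-formed, then returns its text.
    private static String finish(XMLEventWriter writer, StringWriter buffer,
                                 XMLEventFactory events) throws XMLStreamException {
        writer.add(events.createEndElement("", "", "chunk"));
        writer.close();
        return buffer.toString();
    }
}
```

Only one chunk is held in memory at a time, so the size of the original file should not matter much. If a single element really spans the whole document, this record-per-child assumption does not hold and the split point has to be chosen at a deeper level.
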
Hi Saikrishna,
Thanks for the reply,
but I don't know how to go about this. Here is my code sample; let me know
where to change it.
SAXBuilder builder = new SAXBuilder();
// CONTENT here is a ByteArrayInputStream; I know I can pass a file URL here as well.
// Let me know which is best.
Document doc = builder.b
Hi,
I have indexed a 6.2 GB XML file using Lucene. What I did was:
1. I split the 6.2 GB file into small files, each 10 MB in size.
2. Then I wrote a Python script to quantize the number of documents
in each file.
Structure of my xml file is """
Dear all,
I am using Lucene to index XML files. For parsing I am using JDOM to get XPath
nodes, do some manipulation on them, and index them. Everything works well,
but when the file is very big, about 35-50 MB, it runs out of memory
or takes a lot of time. How can I set some parameter
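
One way around the memory problem is to avoid building the whole tree at all. The sketch below is only an illustration and swaps JDOM for the JDK's streaming StAX reader: it walks the file once and hands the text of each record to a callback, which is where a Lucene document would be built. The class name StreamingRecordsSketch and the record element name passed in are assumptions for the example, and it assumes record elements do not nest inside each other.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: visits records one at a time instead of loading a full DOM.
final class StreamingRecordsSketch {

    // Calls handler with the concatenated text of every element named recordName.
    // Returns how many records were seen.
    static int forEachRecord(Reader in, String recordName, Consumer<String> handler)
            throws XMLStreamException {
        XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
        StringBuilder text = null;  // non-null only while inside a record
        int count = 0;

        while (xml.hasNext()) {
            switch (xml.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    if (text == null && xml.getLocalName().equals(recordName)) {
                        text = new StringBuilder();
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    if (text != null) {
                        text.append(xml.getText());
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (text != null && xml.getLocalName().equals(recordName)) {
                        handler.accept(text.toString());  // e.g. build and add a Lucene document here
                        text = null;                      // release the record before reading the next
                        count++;
                    }
                    break;
                default:
                    break;
            }
        }
        xml.close();
        return count;
    }
}
```

Memory use then depends on the largest single record rather than on the file size. The trade-off is that XPath over the whole document is no longer available; any per-record manipulation has to happen inside the callback.
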
Thank you for your posts.
The "*" is any word. Ideally, I should be able to identify sentences in a
text and then apply one of these 3 rules:
(1) A * *
find the sentences that include the word A with exactly two words
(any words) to its left
(2) A * * B *
find the sentences that inclu
: There's no syntax I know of that'll give you this kind of query out of the
: box. The closest thing would be span queries, which will give you things
: like A**B, meaning "give me all documents where A is NOT MORE THAN 2 words
: away from B". This is not what you're asking for, since it would als
My question is "what are you trying to accomplish"? The reason I ask is that
all three queries presuppose that the search you're performing is on very
precisely defined fields. (1) supposes a field where the A is exactly three
words from the end. (3) supposes the A is exactly three words from t
Hi,
I am wondering if Lucene can handle the following queries:
(1) A * *
give me all documents with word A followed by exactly two words
(2) A * * B *
give me all documents with words A and B separated by exactly 2 words and word
B followed by one word
(3) * * A
give me all documents with word A
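
To make the three patterns concrete, here is a small sketch in plain Java. It is not a Lucene query and does not use the index: it only checks an already tokenized sentence against a pattern in which "*" stands for exactly one word, so it could serve as a post-filter on retrieved text. The class and method names are made up for the example, and it assumes the match may start anywhere in the sentence, which may or may not be what is wanted.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: fixed-position wildcard matching over whitespace tokens.
final class WildcardPhraseSketch {

    static final String ANY = "*";  // matches exactly one word

    // Splits on whitespace only; a real analyzer would also handle punctuation.
    static List<String> tokenize(String sentence) {
        return Arrays.asList(sentence.trim().split("\\s+"));
    }

    // Returns the start index of the first window of tokens that matches
    // the pattern, or -1 if no window matches.
    static int indexOf(List<String> tokens, List<String> pattern) {
        for (int start = 0; start + pattern.size() <= tokens.size(); start++) {
            if (matchesAt(tokens, pattern, start)) {
                return start;
            }
        }
        return -1;
    }

    // True if every non-wildcard pattern word equals the token at the same offset.
    private static boolean matchesAt(List<String> tokens, List<String> pattern, int start) {
        for (int i = 0; i < pattern.size(); i++) {
            String expected = pattern.get(i);
            if (!expected.equals(ANY) && !expected.equalsIgnoreCase(tokens.get(start + i))) {
                return false;
            }
        }
        return true;
    }
}
```

For example, indexOf(tokenize("the quick brown fox jumps"), Arrays.asList("quick", "*", "*")) finds the window starting at "quick", while the pattern "fox * *" finds nothing in that sentence because only one word follows "fox".
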