Re: XML parsing using Lucene in Java

2007-11-19 Thread Catalin Mititelu
Hi Fayyaz, I recommend using SAX or, perhaps, a custom parser for large XML files. It should be faster than using Digester. The main difference between those XML parsers is that Digester needs to load the entire XML document into memory when it creates those objects, whereas you can parse the doc
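The streaming behavior described above can be sketched with the JDK's built-in SAX support: the handler sees events as the file is read, so memory use stays flat regardless of document size. The element name and counting logic here are hypothetical, purely to illustrate the pattern, not taken from the original thread.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSketch {
    // Streams the XML and counts elements with the given name; no DOM tree
    // is ever built, so a large file never needs to fit in memory at once.
    public static int countElements(String xml, final String name) {
        final int[] count = {0};
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    if (qName.equals(name)) count[0]++;
                }
            });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<docs><doc>a</doc><doc>b</doc></docs>";
        System.out.println(countElements(xml, "doc")); // prints 2
    }
}
```

With Digester, by contrast, each matched element is turned into an object before your code runs, which is where the memory cost for very large files comes from.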

Re: get terms by positions

2006-10-02 Thread Catalin Mititelu
Hi, I have the same problem. This is useful when you try to extract the contexts (the terms before and after) of a certain term, for example. I found a solution, but it performs badly: to retrieve those contexts you have to re-tokenize the documents containing the given term (i.e. "socc
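The windowing step of that re-tokenize approach can be sketched in plain Java. This is a simplification: it splits on whitespace instead of running a real Lucene Analyzer, and the method and parameter names are my own, not from the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextSketch {
    // Re-tokenizes the text (naively, on whitespace) and returns a
    // "before term after" window for every occurrence of the term.
    public static List<String> contexts(String text, String term, int window) {
        String[] tokens = text.split("\\s+"); // stand-in for a Lucene Analyzer
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(term)) continue;
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.length, i + window + 1);
            StringBuilder sb = new StringBuilder();
            for (int j = from; j < to; j++) {
                if (j > from) sb.append(' ');
                sb.append(tokens[j]);
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

The poor performance the poster mentions follows directly from this shape: every matching document must be fetched and tokenized again at query time, rather than the positions being read back from the index.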

Re: Big Document Indexing Limit?

2006-09-15 Thread Catalin Mititelu
ontent of your document. If you really want to index the whole XML file, just read the file using Java IO. Anyway, I would not suggest doing that at all. Best regards, Simon > > Thanks... > Catalin Mititelu wrote: > Yes. The default max limit for indexed tokens is 10,000. > Look

Re: Big Document Indexing Limit?

2006-09-15 Thread Catalin Mititelu
Yes. The default max limit for indexed tokens is 10,000. Look here: http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH aslam bari <[EMAIL PROTECTED]> wrote: Dear all, I am trying to index an XML file which is 6 MB in size. Does Lucene support t
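The effect of `DEFAULT_MAX_FIELD_LENGTH` is that `IndexWriter` silently stops adding tokens from a field once the limit is hit; in the Lucene version of that era the limit could be raised with `IndexWriter.setMaxFieldLength(int)`. A small stdlib-only sketch of the truncation behavior (the whitespace tokenizer and method name are my own simplification, not Lucene code):

```java
public class FieldLengthSketch {
    // Lucene's documented default cap on tokens indexed per field.
    static final int DEFAULT_MAX_FIELD_LENGTH = 10000;

    // Returns how many tokens of the field would actually be indexed:
    // everything past maxFieldLength is dropped, not an error.
    public static int tokensIndexed(String fieldText, int maxFieldLength) {
        String trimmed = fieldText.trim();
        int total = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return Math.min(total, maxFieldLength);
    }
}
```

So a 6 MB document does not fail to index; rather, only its first 10,000 tokens end up searchable unless the limit is raised before indexing.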