Hi, no, it is not the same program. I'm basically calling the method below in a for loop.
In my first app I invoked it only once over the entire index (30 rows), and it took 2 minutes. Now I'm calling it in a loop for each row, because I need to update my index, which is growing (first iteration 1 row, then 2, then 2 again, then 3, and so on; it is a clustering algorithm and each row is a cluster). It is supposed to be slow, but I'm surprised it takes more than 1 hour.

Thanks

    public static void performQuery(QueryDoc queryDoc) throws java.io.IOException {
        BooleanQuery booleanQuery = new BooleanQuery(true);
        notRelevant = new MatchAllDocsQuery();
        booleanQuery.add(notRelevant, BooleanClause.Occur.SHOULD);
        try {
            phrase = queryDoc.getTitle();
            for (int i = 0; i < phrase.length; i++) {
                // NOTE: a new QueryParser and a new WhitespaceAnalyzer are
                // constructed on every iteration of every loop below
                booleanQuery.add(new QueryParser(org.apache.lucene.util.Version.LUCENE_40,
                        "title",
                        new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_40))
                        .parse(phrase[i]), BooleanClause.Occur.SHOULD);
            }
            phrase = queryDoc.getDescription();
            for (int i = 0; i < phrase.length; i++) {
                booleanQuery.add(new QueryParser(org.apache.lucene.util.Version.LUCENE_40,
                        "description",
                        new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_40))
                        .parse(phrase[i]), BooleanClause.Occur.SHOULD);
            }
            //time = new TermQuery(new Term("time", queryDoc.getTime()));
            //booleanQuery.add(time, BooleanClause.Occur.SHOULD);
            phrase = queryDoc.getTags();
            for (int i = 0; i < phrase.length; i++) {
                booleanQuery.add(new QueryParser(org.apache.lucene.util.Version.LUCENE_40,
                        "tags",
                        new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_40))
                        .parse(phrase[i]), BooleanClause.Occur.SHOULD);
            }
        } catch (ParseException pe) {
            // parse failures are silently swallowed here
            //System.out.println(pe.getMessage());
        }
        topDocs = searcher.search(booleanQuery, 220000);
        writeResults(topDocs, queryDoc);
    }

On 5 April 2011 15:45, Simone Tripodi <simonetrip...@apache.org> wrote:
> Hi Patrick,
> if the Digester program you're speaking about is the one you pasted
> here time ago...
> well, there were a lot of optimizations missed. For
> example, I suggested you use the Lucene rules instead of storing all
> the properties in a POJO and then creating the Lucene Document; that
> way you limit the amount of stored data.
>
> When parsing a large XML document - as in your case - I suggest you
> map to objects as little as possible and stream more.
>
> HTH,
> Simo
>
> http://people.apache.org/~simonetripodi/
> http://www.99soft.org/
>
>
> 2011/4/5 Weiwei Wang <ww.wang...@gmail.com>:
> > I don't think your program becomes slower because you are not using
> > Digester; RAM should be much faster. I suggest you reduce your program
> > to its main part and paste it in the email so that others can help.
> >
> > On Tue, Apr 5, 2011 at 7:08 PM, Patrick Diviacco
> > <patrick.divia...@gmail.com> wrote:
> >
> >> hi,
> >>
> >> I have a Java app and I recently stopped using Digester because all my
> >> data is now kept in RAM and I don't need to write/parse XML files anymore.
> >>
> >> However, since I stopped using Digester and external XML files, the
> >> performance of my app got worse.
> >>
> >> I now have the same data stored in an ArrayList<ArrayList<String>> and
> >> I iterate over it with a for loop.
> >>
> >> Before, the data was in an XML file with the following structure:
> >>
> >> <collection>
> >>   <doc>
> >>     <field1></field1>
> >>     ..
> >>   </doc>
> >>   ..
> >> </collection>
> >>
> >> Is Digester really that much faster at iterating my data from an XML
> >> file than a for loop iterating an ArrayList with the same content?
> >>
> >> thanks
> >
> >
> > --
> > 王巍巍
> > Cell: 18911288489
> > MSN: ww.wang...@gmail.com
> > Blog: http://whisper.eyesay.org
> > 围脖: http://t.sina.com/lolorosa
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
> For additional commands, e-mail: user-h...@commons.apache.org
>
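One likely hot spot in the performQuery method above: a new QueryParser and a new WhitespaceAnalyzer are constructed inside every loop iteration, for every row of a growing index. Lucene analyzers are designed to be built once and reused, and a QueryParser instance can parse repeatedly within a single thread, so hoisting them out of the loops should help (the per-call search over up to 220000 hits plus writeResults may also dominate, so profiling is worthwhile). Below is a minimal, self-contained sketch of the same hoisting pattern using only the JDK - java.util.regex.Pattern stands in for the expensive-to-build parser, and the class and method names are hypothetical, not from the original program:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class HoistExample {

    // Slow variant: compiles the pattern on every iteration,
    // mirroring "new QueryParser(..., new WhitespaceAnalyzer(...))"
    // inside the for loops of performQuery.
    static int countMatchesSlow(List<String> rows) {
        int hits = 0;
        for (String row : rows) {
            if (Pattern.compile("\\bcluster\\b").matcher(row).find()) {
                hits++;
            }
        }
        return hits;
    }

    // Faster variant: the expensive object is built once and reused
    // across all iterations (and all calls).
    private static final Pattern CLUSTER = Pattern.compile("\\bcluster\\b");

    static int countMatchesFast(List<String> rows) {
        int hits = 0;
        for (String row : rows) {
            if (CLUSTER.matcher(row).find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "row 1 is a cluster",
                "row 2 has no match",
                "cluster again");
        // Both variants return the same result; only the cost differs.
        System.out.println(countMatchesSlow(rows)); // 2
        System.out.println(countMatchesFast(rows)); // 2
    }
}
```

In the Lucene code the analogous change would be to create one WhitespaceAnalyzer (and, if single-threaded, one QueryParser per field) outside the loops and reuse them for every phrase[i]; whether that alone accounts for the hour-long runtime is something only measurement can confirm.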