Re: indexing/xpath query question

Mark J. Stang 5 Mar 2002 17:28:01 -0000

What platform are you running on?

Mark


Sreeni Chippada wrote:

> Mark,
>         It took me about 3 minutes to load about 2300 documents in 102MB.
>         It took me 31 sec to index /INVOICE/BILL_INVOICE.bill_ref_no.
>         Now, I deleted that collection and added a new collection with 22
> documents/1MB(just to make it simple)
>         I did not index the xpath. If I run
>                 xindiceadmin xpath -c /db/lucent -q
> /INVOICE/BILL_INVOICE.bill_ref_no
>         I get all the 22 documents.
>         If I run
>                 xindiceadmin xpath -c /db/lucent -q
> /INVOICE/[BILL_INVOICE.bill_ref_no="2"]
>         I get nothing. I hope this query is correct.
>
>         What do you mean by 'VM startup is "hiding" the result'? Could this
> be what is happening in my case?
>
> Thanks,
> Sreeni
>
>
>
> -----Original Message-----
> From: Mark J. Stang [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 04, 2002 4:50 PM
> To: [email protected]
> Subject: Re: indexing/xpath query question
>
> How much time is it really taking?   It may be fast enough that
> the VM startup is "hiding" the result.   In Kimbro's example,
> he did a complete search of 149,025 documents in less than
> 12 minutes.   If you have 2,000 documents, then it could
> take 1/75 of the time or about 10 seconds.   If you are printing
> the output to the screen it may seem the same.   Try doing an XPath
> search for the last document added, with and without the index.
> Just the one document, not all of them.
>
> HTH,
> Mark
>
> Kimbro did some tests last September:
>
> He wrote this:
> "As I've been working out some issues with the CORBA system I've been
> working on getting larger document sets into the server. My largest set
> right now is 149,025 documents in a single collection. The server can
> easily handle more documents; this is just the largest dataset I have
> available right now. Here are some stats to give us a better idea where we
> stand. These are run against the current CVS version with one exception. I
> used OpenORB for the server ORB  instead of JacORB. JacORB was still used
> for the client. It's likely we'll need to switch to OpenORB overall as
> even the latest JacORB leaks memory on the server.
>
> computer: 750MHZ P3 256MB RAM Laptop running Mandrake Linux 8
> jdk: Sun 1.3.0_04
> Dataset size: 149,025 documents 601MB
> Insertion time (no indexes): 1 hour 45 minutes which is roughly 1,424 docs
> per minute or 24 per second.
> Collection size: 657MB
> Document retrieval: 2 seconds (including VM startup which is most of the
> time)
> Full collection scan query /disc[id = '11041c03']: 12 minutes
> Index creation: 13.5 minutes
> Index based query /disc[id = '11041c03']: 2.12 seconds (including VM
> startup which is most of that time)
> Index size 164MB
>
> The data set consists of documents similar to the following.
>
> <?xml version="1.0"?>
> <disc>
> <id>11041c03</id>
> <length>1054</length>
> <title>Orchestral Manoeuvres In The Dark / The OMD Remixes (Single)</title>
> <genre>cddb/misc</genre>
> <track index="1" offset="150">Enola Gay (OMD vs Sash! Radio Edit)</track>
> <track index="2" offset="18790"> (2)Souvenir (Moby Remix)</track>
> <track index="3" offset="39790"> (3)Electricity (The Micronauts
> Remix)</track>
> </disc>
>
> Kimbro Staken"
>
> Sreeni Chippada wrote:
>
> > Hi,
> >         I am new to Xindice. I added a few documents as DOMs and ran an xpath
> > query successfully. Then I added an index on the collection and ran the
> > query. It takes the same amount of time.
> >
> > Here are the details:
> >
> > My document structure looks like this:
> >
> > <INVOICE>
> >         <BILL_INVOICE.bill_ref_no>2</BILL_INVOICE.bill_ref_no>
> >         .
> >         .
> >         .
> > </INVOICE>
> >
> > I loaded about 2000 documents.
> >
> > When I run 'xindiceadmin xpath -c /db/test -q
> > /INOVICE/BILL_INVOICE.bill_ref_no' I get all the
> > /INOVICE/BILL_INVOICE.bill_ref_no elements.
> >
> > Then I ran the following command to add an index.
> >
> > xindiceadmin ai -c /db/test -n BillRefNum  -p
> > /INOVICE/BILL_INVOICE.bill_ref_no
> >
> > Now if I run the same query as above, it still takes the same time. It looks
> > like it is not using the index I created.
> >
> > Appreciate any help.
> >
> > Thanks,
> > Sreeni
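
The "1/75 of the time or about 10 seconds" estimate in Mark's reply can be reproduced with a few lines of arithmetic. This is only a rough sketch: it assumes scan time grows linearly with the number of documents, which the thread does not establish, and the function name is made up for illustration. The two constants are the figures Kimbro reported.

```python
# Figures quoted from Kimbro's full collection scan in this thread.
KIMBRO_DOCS = 149_025
KIMBRO_SCAN_SECONDS = 12 * 60


def estimate_scan_seconds(doc_count):
    """Scale Kimbro's scan time linearly by document count (an assumption)."""
    return KIMBRO_SCAN_SECONDS * doc_count / KIMBRO_DOCS


# 2,000 is the round figure Mark used; 2,300 is what Sreeni reported loading.
for docs in (2000, 2300):
    print(f"{docs} documents: about {estimate_scan_seconds(docs):.1f} s")
```

Scaling by data size instead (102MB against 601MB) would give roughly two minutes, so the estimate is only as good as the assumption behind it.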

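On the second query in the follow-up: in XPath 1.0 a predicate has to follow a node test, so the slash before the bracket may be why nothing came back. The usual form would be /INVOICE[BILL_INVOICE.bill_ref_no="2"]. The sketch below shows that form using Python's ElementTree rather than Xindice; the wrapper element, the sample values and the helper name are all hypothetical, and Xindice's own behaviour may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a collection: three invoices under one wrapper.
COLLECTION = """
<collection>
  <INVOICE><BILL_INVOICE.bill_ref_no>1</BILL_INVOICE.bill_ref_no></INVOICE>
  <INVOICE><BILL_INVOICE.bill_ref_no>2</BILL_INVOICE.bill_ref_no></INVOICE>
  <INVOICE><BILL_INVOICE.bill_ref_no>3</BILL_INVOICE.bill_ref_no></INVOICE>
</collection>
"""


def find_invoices(root, ref_no):
    """Return INVOICE elements whose bill_ref_no child has the given text."""
    # The predicate is attached directly to INVOICE, with no slash before it.
    return root.findall(f"INVOICE[BILL_INVOICE.bill_ref_no='{ref_no}']")


root = ET.fromstring(COLLECTION)
for invoice in find_invoices(root, "2"):
    print(ET.tostring(invoice, encoding="unicode").strip())
```

ElementTree only implements a subset of XPath, so this illustrates where the predicate goes and is not a test of the Xindice query engine.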