RE: indexing/xpath query question

Sreeni Chippada 5 Mar 2002 15:30:50 -0000

Mark,
        It took me about 3 minutes to load about 2300 documents in 102MB.
        It took me 31 sec to index /INVOICE/BILL_INVOICE.bill_ref_no.
        Now, I deleted that collection and added a new collection with 22
documents/1MB (just to make it simple).
        I did not index the xpath. If I run
                xindiceadmin xpath -c /db/lucent -q /INVOICE/BILL_INVOICE.bill_ref_no
        I get all the 22 documents.
        If I run
                xindiceadmin xpath -c /db/lucent -q /INVOICE/[BILL_INVOICE.bill_ref_no="2"]
        I get nothing. I hope this query is correct.
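
One thing worth checking in the second query: in XPath 1.0 a predicate has to follow a node test, so a step cannot consist of a predicate alone. Following the form of Kimbro's /disc[id = '11041c03'] query, the predicate would attach directly to the element, as in /INVOICE[BILL_INVOICE.bill_ref_no="2"]. Below is a minimal sketch of that predicate form. It uses Python's xml.etree.ElementTree rather than Xindice, and the <collection> wrapper, the sample data, and the find_invoices helper are all made up for illustration, so it only shows the syntax, not how Xindice evaluates the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: two invoices inside a made-up <collection> wrapper.
SAMPLE = """
<collection>
  <INVOICE><BILL_INVOICE.bill_ref_no>1</BILL_INVOICE.bill_ref_no></INVOICE>
  <INVOICE><BILL_INVOICE.bill_ref_no>2</BILL_INVOICE.bill_ref_no></INVOICE>
</collection>
"""

def find_invoices(xml_text, ref_no):
    """Return the INVOICE elements whose bill_ref_no child has the given text."""
    root = ET.fromstring(xml_text.strip())
    # The predicate follows the INVOICE node test directly, with no extra "/".
    return root.findall(f"INVOICE[BILL_INVOICE.bill_ref_no='{ref_no}']")

matches = find_invoices(SAMPLE, "2")
print(len(matches))
```

If the corrected form still returns nothing in Xindice, the problem lies somewhere other than the predicate syntax.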

        What do you mean by 'VM startup is "hiding" the result'? Could this
be what is happening in my case?

Thanks,
Sreeni

-----Original Message-----
From: Mark J. Stang [mailto:[EMAIL PROTECTED]]
Sent: Monday, March 04, 2002 4:50 PM
To: [email protected]
Subject: Re: indexing/xpath query question

How much time is it really taking?   It may be fast enough that
the VM startup is "hiding" the result.   In Kimbro's example,
he did a complete search of 149,025 documents in less than
12 minutes.   If you have 2,000 documents, then it could
take 1/75 of the time or about 10 seconds.   If you are printing
the output to the screen it may seem the same.   Try doing an XPath
search for the last document added, with and without the index.
Just the one document, not all of them.
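
The 10-second figure is a linear scaling of Kimbro's full-scan time. A rough sketch of the arithmetic, assuming scan time grows in proportion to document count and ignoring differences in document size and hardware (the constant and function names are just for illustration):

```python
# Figures quoted from Kimbro's test: full scan of 149,025 documents in 12 minutes.
KIMBRO_DOCS = 149_025
KIMBRO_SCAN_SECONDS = 12 * 60

def estimate_scan_seconds(doc_count):
    """Scale Kimbro's full-scan time linearly to a collection of doc_count documents."""
    return KIMBRO_SCAN_SECONDS * doc_count / KIMBRO_DOCS

# Estimated full-scan time, in seconds, for a 2,000-document collection.
print(round(estimate_scan_seconds(2000), 1))
```

At that scale a couple of seconds of VM startup is a noticeable share of the total, which is why timing a single-document lookup is the more telling test.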

HTH,
Mark

Kimbro did some tests last September:

He wrote this:
"As I've been working out some issues with the CORBA system I've been
working on getting larger document sets into the server. My largest set
right now is 149,025 documents in a single collection. The server can
easily handle more documents; this is just the largest dataset I have
available right now. Here are some stats to give us a better idea where we
stand. These are run against the current CVS version with one exception. I
used OpenORB for the server ORB  instead of JacORB. JacORB was still used
for the client. It's likely we'll need to switch to OpenORB overall as
even the latest JacORB leaks memory on the server.

computer: 750MHZ P3 256MB RAM Laptop running Mandrake Linux 8
jdk: Sun 1.3.0_04
Dataset size: 149,025 documents 601MB
Insertion time (no indexes): 1 hour 45 minutes which is roughly 1,424 docs
per minute or 24 per second.
Collection size: 657MB
Document retrieval: 2 seconds (including VM startup which is most of the
time)
Full collection scan query /disc[id = '11041c03']: 12 minutes
Index creation: 13.5 minutes
Index based query /disc[id = '11041c03']: 2.12 seconds (including VM
startup which is most of that time)
Index size 164MB

The data set consists of documents similar to the following.

<?xml version="1.0"?>
<disc>
<id>11041c03</id>
<length>1054</length>
<title>Orchestral Manoeuvres In The Dark / The OMD Remixes (Single)</title>
<genre>cddb/misc</genre>
<track index="1" offset="150">Enola Gay (OMD vs Sash! Radio Edit)</track>
<track index="2" offset="18790"> (2)Souvenir (Moby Remix)</track>
<track index="3" offset="39790"> (3)Electricity (The Micronauts
Remix)</track>
</disc>

Kimbro Staken"

Sreeni Chippada wrote:

> Hi,
>         I am new to xindice. I added a few documents as DOMs and ran xpath
> query successfully. Then I added an index on the collection and ran the
> query. It takes same amount of time.
>
> Here are the details:
>
> My document structure looks like this:
>
> <INVOICE>
>         <BILL_INVOICE.bill_ref_no>2</BILL_INVOICE.bill_ref_no>
>         .
>         .
>         .
> </INVOICE>
>
> I loaded about 2000 documents.
>
> When I run 'xindiceadmin xpath -c /db/test -q
> /INOVICE/BILL_INVOICE.bill_ref_no' I get all the
> /INOVICE/BILL_INVOICE.bill_ref_no elements.
>
> Then ran the following command to add an index.
>
> xindiceadmin ai -c /db/test -n BillRefNum  -p
> /INOVICE/BILL_INVOICE.bill_ref_no
>
> Now if I run the same query as above, it still takes the same time. It looks
> like it is not using the index I created.
>
> Appreciate any help.
>
> Thanks,
> Sreeni
