Re: how-to query an xml repository efficiently

DAVIGNON Andre - CETE NP/DIODé/PANDOC Tue, 08 Sep 2009 08:16:20 -0700

Robby,

One more thing about this subject.

You can do all that stuff directly with Cocoon / Lucene using only Java code, but Solr offers rich possibilities for configuring the index through schema.xml. The index can be handled with an HTTP client inside Cocoon through the Solr XML / HTTP API, or in Java code with the SolrJ API if you prefer.
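A minimal sketch of the XML / HTTP route, using only the JDK's java.net.http.HttpClient (Java 11+) rather than SolrJ. It assumes the Solr update handler is reachable at http://localhost:8983/solr/update and that schema.xml declares every field you send; the SolrXmlUpdate class and its method names are hypothetical:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;

// Sketch: send documents to Solr through its XML/HTTP update API with the JDK only.
// ASSUMPTIONS: the update handler URL below; schema.xml declares every field name sent.
class SolrXmlUpdate {

    // Hypothetical location of the Solr update handler; adjust to your installation.
    static final String UPDATE_URL = "http://localhost:8983/solr/update";

    // Escapes the characters that are significant in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Builds <add><doc><field name="...">...</field>...</doc></add> from name/value pairs.
    static String buildAddMessage(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(field.getKey())).append("\">")
               .append(escape(field.getValue())).append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    // POSTs an update message (an <add> document or "<commit/>") and returns the HTTP status.
    static int post(HttpClient client, String message) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(UPDATE_URL))
                .header("Content-Type", "text/xml; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(message))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

Typical use would be post(client, buildAddMessage(fields)) for each product, followed by a single post(client, "<commit/>") so the new documents become searchable.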


André (not David ;-) )

On 08/09/2009 11:12, Robby Pelssers wrote:

You all convinced me to investigate the SOLR path further ;-)

I already installed SOLR yesterday, but I probably haven't spent enough
time playing with it yet. That's why I'm asking the experts on this
mailing list ;-)

David's answer "The facet research functionality in Solr can give access
to all possible values in the index of your data for a given property so
the user can pick one among them, then find all matching data." was the
missing piece of the puzzle.
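As a sketch of what such a facet request can look like: assuming the default /solr/select handler at localhost:8983 and a hypothetical category field declared as an untokenized string type in schema.xml (faceting on a tokenized text field returns individual words rather than whole values), the dropdown values for one field can be requested with rows=0 and the standard facet parameters. The class name is hypothetical:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: build a Solr request that returns only the facet values of one field.
// ASSUMPTIONS: default /select handler at this URL; the field is an untokenized "string" field.
class SolrFacetQuery {

    // Hypothetical Solr location; adjust to your installation.
    static final String SELECT_URL = "http://localhost:8983/solr/select";

    static String facetValuesUrl(String field) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", "*:*");            // match every document
        params.put("rows", "0");           // no documents needed, only the counts
        params.put("facet", "true");       // switch faceting on
        params.put("facet.field", field);  // field whose distinct values we want
        params.put("facet.mincount", "1"); // omit values that match no document
        return SELECT_URL + "?" + params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```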

Thx a lot guys !!

Robby

-----Original Message-----

From: Jeroen Reijn [mailto:[email protected]]
Sent: Tuesday, September 08, 2009 10:45 AM
To: [email protected]
Subject: Re: how-to query an xml repository efficiently

Hi Robby,

I think SOLR would be a great match for this use case.

You can push XML to SOLR with an HTTP client and let SOLR index the information. See the post.jar that comes with the SOLR example: it pushes XML to the Solr app and indexes it based on your configuration.


The great thing is that you can even configure all kinds of facets based
on what is stored in such a product file, so you can create a nice facet
view in your webapp.
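With the default XML response format, the counts for a faceted field come back under lst[@name='facet_counts']/lst[@name='facet_fields']. A rough sketch of turning that into value/count pairs for a dropdown, using only the JDK's DOM and XPath APIs; the class name and the category field are illustrative:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: read value -> count pairs for one faceted field from Solr's XML response.
// ASSUMPTIONS: the default XML response writer and a trusted (developer-supplied) field name.
class FacetResponseParser {

    static Map<String, Integer> facetCounts(String solrXml, String field) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(solrXml)));
            // Path to the <int name="value">count</int> entries of the requested field.
            String expr = "/response/lst[@name='facet_counts']/lst[@name='facet_fields']"
                    + "/lst[@name='" + field + "']/int";
            NodeList entries = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            Map<String, Integer> counts = new LinkedHashMap<>(); // keeps Solr's ordering
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                counts.put(entry.getAttribute("name"),
                        Integer.parseInt(entry.getTextContent().trim()));
            }
            return counts;
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot read facet counts", e);
        }
    }
}
```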

A couple of years ago I was looking at some Forrest components [1], which
were made for using SOLR from a Cocoon point of view. They help you perform queries against a SOLR instance from your sitemap and get an XML response back.


Regards,

Jeroen

[1] http://wiki.apache.org/solr/SolrForrest

Robby Pelssers wrote:

Hi Jeroen and others who replied to my mail... Let me further explain
my use case and existing infrastructure.

My customer stores their product data in XML files on the file system, e.g.:

${repofolder}/
        products/
                product-1/      
                        product-1.xml
                        product-1-image.jpg
                        ...
                product-2/      
                        product-2.xml
                        product-2-image.jpg
                ...

This is a simplified representation but as you see there is no concept
of an xml database.
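Since there is no database, the first step for any indexing approach is to find the files. A minimal sketch with java.nio.file, assuming the layout above (${repofolder}/products/<product>/*.xml) and that every .xml file inside a product folder should be indexed; the ProductRepository class is hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: list the product XML files of the repository.
// ASSUMPTION: layout ${repofolder}/products/<product>/<file>.xml, as described above.
class ProductRepository {

    static List<Path> productFiles(Path repoFolder) {
        Path products = repoFolder.resolve("products");
        // Depth 1 = product-N/ folders, depth 2 = the files inside them.
        try (Stream<Path> found = Files.find(products, 2,
                (path, attrs) -> attrs.isRegularFile() && path.toString().endsWith(".xml"))) {
            return found.sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Images and other non-XML files in the product folders are skipped by the .xml filter.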

Now let's start with a small fictitious example for product-1.xml:

<product>
  <id>xxxx</id>
  <description>grandma's cookies</description>
  <category>food</category>
  <price>2.0</price>
</product>

From a functional point of view, they want to be able to search for
products based on some criteria. So I'll have to build a small
search form containing:
        - a dropdown with all possible categories
        - a textbox to search for part of the description
        - price search functionality (between / equal to / greater than / less than)
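If the data ends up in Solr, the three form fields map fairly directly onto Lucene/Solr query syntax: a field:value clause for the category, a term search on the description, and range queries ([a TO b] inclusive, {a TO b} exclusive, * as an open end) for the price. A sketch under those assumptions: field names follow the example product, price is assumed to be indexed as a numeric field, the description input is treated as a single escaped term (Solr matches whole analyzed tokens, not arbitrary substrings), and the ProductSearchQuery class is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: turn the search-form input into a Solr query string.
// ASSUMPTIONS: fields category/description/price exist; price is indexed numerically.
class ProductSearchQuery {

    // Price comparisons offered by the (hypothetical) search form.
    enum PriceOp { EQUAL, GREATER_THAN, LESS_THAN, BETWEEN }

    // Characters with a special meaning in Lucene/Solr query syntax.
    static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/ ";

    // Backslash-escapes special characters so user input is taken literally.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) out.append('\\');
            out.append(c);
        }
        return out.toString();
    }

    // Range clause: [..] includes the bounds, {..} excludes them, * leaves one side open.
    static String priceClause(PriceOp op, double a, double b) {
        switch (op) {
            case EQUAL:        return "price:[" + a + " TO " + a + "]";
            case GREATER_THAN: return "price:{" + a + " TO *}";
            case LESS_THAN:    return "price:{* TO " + a + "}";
            default:           return "price:[" + a + " TO " + b + "]"; // BETWEEN
        }
    }

    // Null arguments mean "criterion not used"; with no criteria, match everything.
    static String build(String category, String descriptionWord, PriceOp op, double a, double b) {
        List<String> clauses = new ArrayList<>();
        if (category != null) clauses.add("category:" + escape(category));
        if (descriptionWord != null) clauses.add("description:" + escape(descriptionWord));
        if (op != null) clauses.add(priceClause(op, a, b));
        return clauses.isEmpty() ? "*:*" : String.join(" AND ", clauses);
    }
}
```

For example, category "food", description "cookies" and a price between 1.0 and 5.0 produce category:food AND description:cookies AND price:[1.0 TO 5.0]. Passing the category as a separate fq filter instead of part of q is a common alternative.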

So for certain filter criteria I'll have to get all possible values so
they can pick one, and for others I don't need to know anything about
the actual data.

The actual product XML files are roughly 500 KB on average, and I'm talking
about LOTS of products, so I have to consider performance upfront.
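With files of that size, building a full DOM just to pick out a few fields is wasteful. One option, sketched here with the JDK's StAX API, is to stream through each file and stop as soon as the needed top-level fields have been read. It assumes those fields are text-only children of the root element, as in the example above; the ProductFieldExtractor class is hypothetical:

```java
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Sketch: pull a few top-level fields out of a large product file without building a DOM.
// ASSUMPTION: the wanted fields are text-only children of the root element.
class ProductFieldExtractor {

    static Map<String, String> extract(Reader xml, Set<String> wanted) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTD processing needed
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        Map<String, String> fields = new HashMap<>();
        int depth = 0; // 1 = root element, 2 = its children
        try {
            // Stop reading as soon as every wanted field has been found.
            while (fields.size() < wanted.size() && reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    String name = reader.getLocalName();
                    if (depth == 2 && wanted.contains(name)) {
                        // getElementText() consumes everything up to the matching end tag.
                        fields.put(name, reader.getElementText().trim());
                        depth--;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
        } finally {
            reader.close(); // does not close the underlying Reader
        }
        return fields;
    }
}
```

The early exit only pays off when the indexed fields appear near the top of each file; otherwise the whole file is still streamed, though without holding it in memory.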

SOLR seems good for indexing static HTML files etc., but I don't get the
impression it can offer the necessary functionality for this use case.

Any comments??

Cheers,
Robby





-----Original Message-----

From: Jeroen Reijn [mailto:[email protected]]
Sent: Tuesday, September 08, 2009 9:01 AM
To: [email protected]
Subject: Re: how-to query an xml repository efficiently

Hi Robby,

do you perhaps have any more specs on what kind of XML database it is?

At our company we have experience with an Apache Slide-backed database,
which we used for storing XML files and letting Slide index them with
Lucene. Then, based on DASL queries, we could search the repository
really quickly.

Besides DASL, I know there are also XML databases that can use XQuery
to perform fast searches on their XML data.
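XQuery needs an XML database, which this repository doesn't have. For comparison, a similar predicate can be evaluated with the JDK's XPath 1.0 API directly on the files. This sketch (hypothetical NaiveXPathSearch class) is only a brute-force baseline: it parses every file on every query, which is exactly the cost an indexed store such as Lucene, Slide or an XML database avoids:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.SAXException;

// Sketch: brute-force baseline that evaluates an XPath 1.0 predicate against every file.
// Every query re-parses every file, so the cost grows linearly with repository size.
class NaiveXPathSearch {

    static List<File> matching(List<File> files, String xpath)
            throws ParserConfigurationException, XPathExpressionException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        XPathExpression expr = XPathFactory.newInstance().newXPath().compile(xpath);
        List<File> hits = new ArrayList<>();
        for (File file : files) {
            // A file matches when the expression evaluates to true for its document.
            if ((Boolean) expr.evaluate(builder.parse(file), XPathConstants.BOOLEAN)) {
                hits.add(file);
            }
        }
        return hits;
    }
}
```

A query such as /product[category='food' and number(price) < 5] selects the matching products from the example layout.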

Regards,

Jeroen

Robby Pelssers wrote:

Hi all,

I have the following use case. The customer has an XML repository, which is
nothing more than a directory on the file system containing subdirectories
with one or more XML files. They now want to query those XML files on some
predefined criteria, which might change over time...

I'm looking for a solution that results in high-performance search, and
some things that came to my mind were:

*         extracting information and storing it in a database (e.g. HSQLDB)

*         using Lucene

Is there detailed documentation available somewhere on using these? And
what would you recommend for my use case?

I already found some stuff but no real quick-start material.

http://cocoon.apache.org/2.1/userdocs/concepts/xmlsearching.html

http://cocoon.apache.org/2.2/blocks/hsqldb-client/1.0/

http://cocoon.apache.org/2.2/blocks/hsqldb-server/1.0/

Thx in advance,

Robby Pelssers

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

