Thanks for the ideas.

After reading enough documentation and articles about Solr and XML indexing 
in general, I have come to understand that there is no escaping 
denormalization.

However, one small thought remains, perhaps my last shot at avoiding 
denormalization (of course, it's going to be a costly affair).

I was reading about how Solr can handle multiple cores and therefore multiple 
indexes.  Can a single search interface send queries to these three cores? 
In that case, who would do the load balancing and the merging of the results? 
And would I need to run three instances of Solr on my system(s), or can a 
single instance handle that?
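
To make the question concrete, here is a rough sketch of the kind of request I 
have in mind, assuming Solr's distributed search `shards` parameter is the right 
mechanism. The host, port, and core names (core0, core1, core2) are made up for 
illustration. As far as I understand it, the core that receives the request 
would fan the query out and merge the results, and the cores would need 
compatible schemas with a shared unique key, so this would not act as a join.

```python
from urllib.parse import urlencode


def build_distributed_query(front_core, shard_cores, query, rows=10):
    """Build a /select URL that asks one core to query several cores."""
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated host:port/path entries, without the http:// scheme.
        "shards": ",".join(shard_cores),
    }
    return f"http://{front_core}/select?{urlencode(params)}"


# Hypothetical cores, all hosted by one Solr instance on localhost.
cores = [
    "localhost:8983/solr/core0",
    "localhost:8983/solr/core1",
    "localhost:8983/solr/core2",
]
url = build_distributed_query(cores[0], cores, "city:Boston")
print(url)
```

If this is roughly right, only the one URL would be exposed to the search 
interface, and load balancing would still have to be handled separately.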



-----Original Message-----
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Thursday, September 30, 2010 9:25 PM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?

You need to be able to query the database with the 'Mother of all Queries', 
i.e. one that completely flattens all tables into each row. 

In other words, the JOIN section of the query will have EVERY table in it, and 
depending on your schema, some of them twice or more.

Trying to do that with separate CSV files, one per table, would require you to 
load them into your OWN database and then query against that, as above.
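
A minimal sketch of that flattening, assuming three made-up tables (property, 
owner, location) loaded into an in-memory SQLite database. The table and column 
names are purely illustrative, not your actual schema. Each flattened row 
becomes one dictionary that could be sent to Solr as a document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables standing in for the per-table flat files.
conn.executescript("""
    CREATE TABLE property (id INTEGER PRIMARY KEY, address TEXT, owner_id INTEGER);
    CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE location (property_id INTEGER, lat REAL, lon REAL);
    INSERT INTO owner VALUES (1, 'A. Smith');
    INSERT INTO property VALUES (10, '1 Main St', 1);
    INSERT INTO location VALUES (10, 42.36, -71.06);
""")

# Return rows that can be converted to dictionaries keyed by column name.
conn.row_factory = sqlite3.Row

# The flattening query: every table appears in the JOIN section.
flat_rows = conn.execute("""
    SELECT p.id, p.address, o.name AS owner_name, l.lat, l.lon
    FROM property p
    LEFT JOIN owner o ON o.id = p.owner_id
    LEFT JOIN location l ON l.property_id = p.id
""").fetchall()

# One flat dictionary per row, i.e. one candidate Solr document per row.
docs = [dict(row) for row in flat_rows]
print(docs)
```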

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/29/10, Sharma, Raghvendra <sraghven...@corelogic.com> wrote:

> From: Sharma, Raghvendra <sraghven...@corelogic.com>
> Subject: RE: Is Solr right for my business situation ?
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Date: Wednesday, September 29, 2010, 9:40 AM
> Some questions.  
> 
> 1. I have about 3-5 tables. Designing schema.xml for a
> single table looks OK, but I am not sure what the direction
> is for handling multiple table structures. Would it be one
> big XML document in which those three tables (assuming
> it's three) show up as three different, nullable
> tag-trees?
> 
> My source provides me a single flat file per table (tab
> delimited).
> 
> Do you think having multiple indexes could be a solution
> for this case ?? or do I really need to spend effort in
> denormalizing the data ?
> 
> 2. Further, loading into solr can use some perf tuning..
> any tips ? best practices ?
> 
> 3. Also, is there a way to specify a xslt at the server
> side, and make it default, i.e. whenever a response is
> returned, that xslt is applied to the response
> automatically...
> 
> 4. And last question for the day :) There was one post
> saying that the spatial support is really basic in Solr and
> is going to be improved in the next versions. Can you people
> help me get a definitive yes or no on spatial support? In
> its current form, does it work or not? I would store lat
> and long, and would need to make them searchable.
> 
> --raghav..
> 
> -----Original Message-----
> From: Sharma, Raghvendra [mailto:sraghven...@corelogic.com]
> 
> Sent: Tuesday, September 28, 2010 11:45 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Is Solr right for my business situation ?
> 
> Thanks for the responses people.
> 
> @Grant  
> 
> 1. can you show me some direction on that.. loading data
> from an incoming stream.. do I need some third party tools,
> or need to build something myself...
> 
> 4. I am basically attempting to build a very fast search
> interface for the existing data. The volume I mentioned is
> more like static one (data is already there). The sql
> statements I mentioned are daily updates coming. The good
> thing is that the history is not there, so the overall
> volume is not growing, but I need to apply the update
> statements. 
> 
> One workaround I had in mind (though not great for
> performance) is to apply the updates to a copy of the RDBMS
> and then feed the RDBMS extract to Solr.  Sounds like
> overkill, but I don't have another idea right now. Perhaps
> business discussions would yield something.
> 
> @All -
> 
> Some more questions guys.  
> 
> 1. I have about 3-5 tables. Designing schema.xml for a
> single table looks OK, but I am not sure what the direction
> is for handling multiple table structures. Would it be one
> big XML document in which those three tables (assuming
> it's three) show up as three different, nullable
> tag-trees?
> 
> My source provides me a single flat file per table (tab
> delimited).
> 
> 2. Further, loading into solr can use some perf tuning..
> any tips ? best practices ?
> 
> 3. Also, is there a way to specify a xslt at the server
> side, and make it default, i.e. whenever a response is
> returned, that xslt is applied to the response
> automatically...
> 
> 4. And last question for the day :) There was one post
> saying that the spatial support is really basic in Solr and
> is going to be improved in the next versions. Can you people
> help me get a definitive yes or no on spatial support? In
> its current form, does it work or not? I would store lat
> and long, and would need to make them searchable.
> 
> Looks like I m close to my solution.. :)
> 
> --raghav
> 
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsing...@apache.org]
> 
> Sent: Tuesday, September 28, 2010 1:05 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Is Solr right for my business situation ?
> 
> Inline.
> 
> On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:
> 
> > When do you need to deploy?
> > 
> > As I understand it, the spatial search in Solr is being
> > rewritten and is slated for Solr 4.0, the release after
> > next.
> 
> It will be in 3.x, the next release
> 
> > 
> > The existing spatial search has some serious problems
> > and is deprecated.
> > 
> > Right now, I think the only way to get spatial search in
> > Solr is to deploy a nightly snapshot from the active
> > development on trunk. If you are deploying a year from
> > now, that might change.
> > 
> > There is not any support for SQL-like statements or for
> > joins. The best practice for Solr is to think of your data
> > as a single table, essentially creating a view from your
> > database. The rows become Solr documents, the columns
> > become Solr fields.
> 
> There are now group-by capabilities in trunk as well, which
> may or may not help.
> 
> > 
> > wunder
> > 
> > On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:
> > 
> >> I am sure these kinds of questions keep coming to you
> >> guys, but I want to raise the same question in a
> >> different context... my own business situation.
> >> I am very, very new to Solr, and though I have tried to
> >> read through the documentation, I am nowhere near
> >> completing the whole read.
> >> 
> >> The need is like this - 
> >> 
> >> We have a huge RDBMS database/table. A single table
> >> perhaps houses 100+ million rows. Though Oracle is doing
> >> a fine job of handling the insertion and updating of
> >> data, the querying is where our main concerns lie.
> >> Since we have spatial data, the index building takes
> >> hours and hours for such tables.
> >> 
> >> That's when we thought of moving away from a standard
> >> RDBMS and trying something different and fast.
> >> My last week has been spent on a journey reading through
> >> Bigtable, Hadoop, HBase, and Hive, before finally landing
> >> on Solr. As far as I am in my tests, it looks pretty
> >> good, but I still have a few unanswered questions.
> >> Trying this group for them  :)  (I am sure I can find
> >> some answers if I read/google more on the topic, but now
> >> I'm being lazy and feel that asking the people who are
> >> already using it, or perhaps developing it, is a better
> >> bet).
> >> 
> >> 1. Can I get my solr instance to load data (fresh data
> >> for indexing) from a stream (imagine a mq kind of queue,
> >> or similar) ?
> 
> Yes, with a little bit of work.
> 
> >> 2. Can I host my solr instance to use hbase as the
> >> database/file system (read HDFS) ?
> 
> Probably, but I doubt it will be fast.  Local disk is
> usually the best.  100+ M rows is large but not
> unreasonable.
> 
> >> 3. are there somewhere any reports available (as in
> >> benchmarks ) for a solr instance's performance ?
> 
> You can probably search the web for these.  I've
> personally seen several installs w/ 1B+ docs and subsecond
> search and faceting and heard of others.  You might
> look at the stuff the Hathi trust has put up.  
> 
> >> 4. are there any APIs available which might help me
> >> apply ANSI sql kind of statements to my solr data ?
> 
> No.  Question back?  What kinds of things are you
> trying to do?
> 
> >> 
> >> It would be great if people could help share their
> >> experience in the area... if it's too much trouble
> >> writing all of it, perhaps url would be easier... I
> >> welcome all kinds of help here... any advice/suggestions
> >> are good ...
> >> 
> >> Looking forward to your viewpoints..
> >> 
> >> --raghav..
> >>
> >> 
> > 
> > 
> > 
> > 
> 
> --------------------------
> Grant Ingersoll
> http://lucenerevolution.org Apache Lucene/Solr
> Conference, Boston Oct 7-8
> 
> 
