Well,
Avoiding flattening the DB into a flat table sounds like a great plan.
I found this solution:
http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example

It imports via a join, without handling a flat table.
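
For what it is worth, here is a rough Python sketch of the idea behind the nested entities on that wiki page. It is not DataImportHandler itself: the tables, the column names, and the helper functions `flat_rows` and `nested_docs` are made up for illustration, and an in-memory SQLite database stands in for the real one. It compares the rows a flat join produces with one document per school whose child rows are collected into list-valued (multivalued) fields.

```python
import sqlite3

# Hypothetical toy schema; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE school  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE teacher (id INTEGER PRIMARY KEY, school_id INTEGER, name TEXT);
    CREATE TABLE student (id INTEGER PRIMARY KEY, school_id INTEGER, name TEXT);
    INSERT INTO school  VALUES (1, 'Springfield High');
    INSERT INTO teacher VALUES (1, 1, 'Ann'), (2, 1, 'Bob');
    INSERT INTO student VALUES (1, 1, 'Cid'), (2, 1, 'Dee'), (3, 1, 'Eve');
""")


def flat_rows(conn):
    """Rows a single flat join would produce (teachers x students per school)."""
    return conn.execute("""
        SELECT s.name, t.name, st.name
        FROM school s
        JOIN teacher t  ON t.school_id  = s.id
        JOIN student st ON st.school_id = s.id
    """).fetchall()


def nested_docs(conn):
    """One document per school; child rows are gathered into list-valued fields."""
    docs = []
    schools = conn.execute("SELECT id, name FROM school ORDER BY id").fetchall()
    for school_id, school_name in schools:
        # One child query per parent row, one list per child table.
        teachers = [row[0] for row in conn.execute(
            "SELECT name FROM teacher WHERE school_id = ? ORDER BY id",
            (school_id,))]
        students = [row[0] for row in conn.execute(
            "SELECT name FROM student WHERE school_id = ? ORDER BY id",
            (school_id,))]
        docs.append({
            "id": school_id,
            "name": school_name,
            "teacher_name": teachers,   # would map to a multivalued field
            "student_name": students,   # would map to a multivalued field
        })
    return docs


if __name__ == "__main__":
    print(len(flat_rows(conn)), "flat rows")
    print(len(nested_docs(conn)), "nested document(s)")
```

With 2 teachers and 3 students, the flat join yields 2 * 3 rows for the school, while the nested approach yields a single document; the gap grows multiplicatively with each extra 1:n relation.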



On Tue, Jun 18, 2013 at 5:53 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> You can in fact have multiple collections in Solr and do a limited amount
> of joining, and Solr has multivalued fields as well, but none of those
> techniques should be used to avoid the process of flattening and
> denormalizing a relational data model. It is hard work, but yes, it is
> required to use Solr effectively.
>
> Again, start with the queries - what problem are you trying to solve?
> Nobody stores data just for the sake of storing it - how will the data be
> used?
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Mysurf Mail
> Sent: Tuesday, June 18, 2013 9:58 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: How to define my data in schema.xml
>
> Hi Jack,
> Thanks for your kind comment.
>
> I am truly at the beginning of modeling my schema over an existing
> working DB.
> I have used the school-teachers-student db as an example scenario.
> (a. I wrote that as a disclaimer in my first post. b. I really do not
> know anyone who has 300 hobbies either.)
>
> In real life my db is obviously much different,
> I just used this as an example of potential pitfalls that will occur if I
> use my old db data modeling notions.
> obviously, the old relational modeling idioms do not apply here.
>
> Now, my question was referring to the fact that I would really like to
> avoid a flat table/join/view because of the reason listed above.
> So, my scenario is answering plain user-generated text searches over an
> MSSQL DB that contains a few 1:n relations (and a few 1:n:n relationships).
>
> So, I come here for tips. Should I use one combined index (treating it as a
> NoSQL source), separate indices, or something else? Are there any other ways
> to define relational data?
> Thanks.
>
>
>
> On Tue, Jun 18, 2013 at 4:30 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> It sounds like you still have a lot of work to do on your data model. No
>> matter how you slice it, 8 billion rows/fields/whatever is still way too
>> much for any engine to search on a single server. If you have 8 billion of
>> anything, a heavily sharded SolrCloud cluster is probably warranted. Don't
>> plan ahead to put more than 100 million rows on a single node; plan on a
>> proof of concept implementation to determine that number.
>>
>> When we in Solr land say "flattened" or "denormalized", we mean in an
>> intelligent, "smart", thoughtful sense, not a mindless, mechanical
>> flattening. It is an opportunity for you to reconsider your data models,
>> both old and new.
>>
>> Maybe data modeling is beyond your skill set. If so, have a chat with your
>> boss and ask for some assistance, training, whatever.
>>
>> Actually, I am suspicious of your 8 billion number - change each of those
>> 300's to realistic, average numbers. Each teacher teaches 300 courses?
>> Right. Each Student has 300 hobbies? If you say so, but...
>>
>> Don't worry about schema.xml until you get your data model under control.
>>
>> For an initial focus, try envisioning the use cases for user queries. That
>> will guide you in thinking about how the data would need to be organized
>> to satisfy those user queries.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Mysurf Mail
>> Sent: Tuesday, June 18, 2013 2:20 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to define my data in schema.xml
>>
>>
>> Thanks for your reply.
>> I have tried the simplest approach and it works absolutely fantastic.
>> Huge table - 0s to result.
>>
>> Two problems, as I described earlier, are what I am trying to solve:
>> 1. I create a flat table just for Solr. This requires maintenance and
>> development. Can I run Solr over my regular tables?
>>    That would be my simplest approach: working over my relational tables.
>> 2. When you query a flat table by school name, as I described, if the
>> school has 300 students and 300 teachers, each teacher with 300
>> teacherCourses and each student with 300 studentHobbies,
>>    you get 8.1 billion rows (300*300*300*300). Even though I am sure this
>> will work great on Solr, searching for the school name will retrieve 8.1
>> billion rows.
>> 3. Let's say all my searches are user-generated free-text searches over
>> the name and comments columns.
>> Thanks.
>>
>>
>> On Tue, Jun 18, 2013 at 7:32 AM, Gora Mohanty <g...@mimirtech.com> wrote:
>>
>>> On 18 June 2013 01:10, Mysurf Mail <stammail...@gmail.com> wrote:
>>>
>>> > Thanks for your quick reply. Here are some notes:
>>> >
>>> > 1. Consider that all tables in my example have two columns: Name &
>>> > Description which I would like to index and search.
>>> > 2. I have no reason to create a flat table other than for Solr. So
>>> > I would like to see if I can avoid it.
>>> > 3. If in my example I have a flat table then obviously it will
>>> > hold a lot of rows for a single school.
>>> >     By searching the exact school name I will likely receive a lot of
>>> > rows. (my flat table has its own pk)
>>>
>>> Yes, all of this is definitely the case, but in practice
>>> it does not matter. Solr can efficiently search through
>>> millions of rows. To start with, just try the simplest
>>> approach, and only complicate things as and when
>>> needed.
>>>
>>> >     That is something I would like to avoid, and I thought I could avoid
>>> > this by defining teachers and students as multivalued or something like
>>> > this, and then teacherCourses and studentHobbies as 1:n respectively.
>>> >     This is quite similar to my real-life demand, so I came here to
>>> > get some tips as a Solr noob.
>>>
>>> You have still not described the searches that
>>> you would want to do. Again, I would suggest starting
>>> with the most straightforward approach.
>>>
>>> Regards,
>>> Gora
>>>
>>>
>>>
>>
>
