I think he means one doc per element, so you end up with a disease
occurrence index:

<doc>
<group>1</group>
<dis>1</dis>
<occurrence>exist</occurrence>
<uniqueField>1-1</uniqueField>
</doc>

Assuming (and it's a pretty fair assumption?) that most groups have only a
subset of diseases, this will be a sparse matrix, so just don't index
the occurrence value "does not exist".
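
A minimal sketch of that flattening in Python, assuming the matrix is held
in memory as a dict of dicts. The helper name to_sparse_docs, the set of
"absent" markers and the group-disease key format are made up for
illustration; the field names follow the doc above.

```python
# Hypothetical helper: flatten a group x disease matrix into one doc per cell.
# The markers below are an assumed spelling of "does not exist".
ABSENT = {"not found", "notfound", "-", "", None}

def to_sparse_docs(matrix):
    """Return one dict per (group, disease) cell that holds a real value."""
    docs = []
    for group, row in matrix.items():
        for disease, occurrence in row.items():
            if occurrence in ABSENT:
                continue  # sparse: absent cells get no document at all
            docs.append({
                "uniqueField": f"{group}-{disease}",  # key built from both axes
                "group": group,
                "dis": disease,
                "occurrence": occurrence,
            })
    return docs

matrix = {
    "group1": {"disease1": "exist", "disease2": "slight", "disease3": "not found"},
    "group2": {"disease1": "slight", "disease2": "not found", "disease3": "exist"},
}
docs = to_sparse_docs(matrix)
```

Adding a new disease then only means adding docs for the groups that
actually have it; nothing already indexed needs to change.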

Basically, denormalize by adding fields which don't relate to the key.

This will work fine on modest hardware, with no thought given to performance,
for <5 million docs. It will work fine with some thought and hardware for very
large numbers. It's worth a go anyway just to test. It should probably be
your first method to try out.
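
To see roughly which side of that 5 million line you land on, a
back-of-envelope estimate is enough. This is only a sketch: the sizes and
the fill ratio below are invented numbers, not figures from this thread.

```python
def estimate_docs(num_groups, num_diseases, fill_ratio):
    """Approximate doc count: one doc per non-empty cell of the matrix."""
    return round(num_groups * num_diseases * fill_ratio)

# Hypothetical sizes: 1,000 groups, 1,000,000 diseases, 0.4% of cells filled.
n = estimate_docs(1_000, 1_000_000, 0.004)
side = "under" if n < 5_000_000 else "over"
print(f"{n} docs, {side} the 5 million mark")
```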




On 13 October 2013 12:10, Erick Erickson <erickerick...@gmail.com> wrote:

> This sounds like a denormalization issue. Don't be afraid <G>.
>
> Actually, I've seen from 50M to 300M small docs on a Solr node,
> depending on query type, hardware, etc. So that gives you a
> place to start being cautious about the number of docs in your
> system. If your full expansion of your table numbers in that range,
> you might be just fine denormalizing the data.
>
> Alternatively, there's the "pseudo join" capability to consider. I'm
> usually hesitant to recommend that, but Joel is committing some
> really interesting stuff in the join area which you might take a look
> at if the existing pseudo-join isn't performant enough.
>
> But I'd consider denormalizing the data as the first approach.
>
> Best,
> Erick
>
>
> On Sun, Oct 13, 2013 at 8:07 AM, David Philip
> <davidphilipshe...@gmail.com> wrote:
>
> > Hi Jack, for the point: "each element of the array as a solr document,
> > with a group field and a disease field"
> > Did you mean it this way:
> >
> > <doc>
> >   "group1_grp": G1
> >  "disease1_d": 2,
> >  "disease2_d": 3,
> > </doc>
> > <doc>
> >   "group1_grp": G2
> >  "disease1_d": 2,
> >  "disease2_d": 3,
> > "disease3_d":  1,
> > "disease4_d":  1,
> > </doc>
> > Similar to the first case, with dynamic fields for the diseases?
> > Will it be a performance issue if the disease fields increase to millions?
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Sun, Oct 13, 2013 at 9:00 AM, Jack Krupansky <j...@basetechnology.com>
> > wrote:
> >
> > > You may be better off indexing each element of the array as a solr
> > > document, with a group field and a disease field. Then you can easily
> > > and efficiently add new diseases. Then to query a row, you query for
> > > the group field having the desired group.
> > >
> > > If possible, index the array as being sparse - no document for a
> > > disease if it is not present for that group.
> > >
> > > -- Jack Krupansky
> > >
> > > -----Original Message----- From: David Philip
> > > Sent: Saturday, October 12, 2013 9:56 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Storing 2 dimension array in Solr
> > >
> > >
> > > Hi Erick, Yes it is. But the columns here are dynamically and very
> > > frequently added. They can increase up to 1 million right now. So, 1
> > > document with 1 million dynamic fields, is it fine? Or any other approach?
> > >
> > > While searching through the web, I found that docValues are column
> > > oriented.
> > > http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/
> > > However, I did not understand how to use docValues to add these
> > > columns.
> > >
> > > What is the recommended approach?
> > >
> > > Thanks - David
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Sun, Oct 13, 2013 at 3:33 AM, Erick Erickson <erickerick...@gmail.com>
> > > wrote:
> > >
> > >  Isn't this just indexing each row as a separate document
> > >> with a suitable ID "groupN" in your example?
> > >>
> > >>
> > >> On Sat, Oct 12, 2013 at 2:43 PM, David Philip
> > >> <davidphilipshe...@gmail.com> wrote:
> > >>
> > >> > Hi Erick,
> > >> >
> > >> >    We have a set of groups as represented below. New columns (diseases,
> > >> > as in the matrix below) keep coming and we need to add them as new
> > >> > columns. For each column, we have values such as 1 or 2 or 3 or 4
> > >> > (exist, slight, na, notfound) for the respective groups.
> > >> >
> > >> > While querying we need to get the entire row for group:"group1". We
> > >> > will not be searching on the column (*_disease) values: index=false
> > >> > but stored is true.
> > >> >
> > >> > For example: we get group:"group1" and we need to get the entire row -
> > >> > exist, slight, not found. Hoping this explanation is clearer.
> > >> >
> > >> >           disease1    disease2     disease3
> > >> > group1    exist       slight       not found
> > >> > group2    slight      not found    exist
> > >> > group3    slight      exist
> > >> > groupK    -           na           exist
> > >> >
> > >> >
> > >> >
> > >> > Thanks - David
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Sat, Oct 12, 2013 at 11:39 PM, Erick Erickson
> > >> > <erickerick...@gmail.com> wrote:
> > >> >
> > >> > > David:
> > >> > >
> > >> > > This feels like it may be an XY problem. _Why_ do you
> > >> > > want to store a 2-dimensional array and what
> > >> > > do you want to do with it? Maybe there are better
> > >> > > approaches.
> > >> > >
> > >> > > Best
> > >> > > Erick
> > >> > >
> > >> > >
> > >> > > On Sat, Oct 12, 2013 at 2:07 AM, David Philip
> > >> > > <davidphilipshe...@gmail.com> wrote:
> > >> > >
> > >> > > > Hi,
> > >> > > >
> > >> > > >   I have a 2 dimension array and want it to be persisted in solr.
> > >> > > > How can I do that?
> > >> > > >
> > >> > > > Sample case:
> > >> > > >
> > >> > > >              disease1    disease2     disease3
> > >> > > > group1    exist         slight          not found
> > >> > > > groups2   slight        not found    exist
> > >> > > > group2    slight         exist
> > >> > > >
> > >> > > > The values can also be stored as codes: exist = 1, not found = 2,
> > >> > > > slight = 3, ...
> > >> > > >
> > >> > > > Note: This array has frequent updates. Every time a new disease gets
> > >> > > > added, I have to add a description of that disease to all groups.
> > >> > > > And at query time, I will get by row, i.e. get by group only: the
> > >> > > > group = group2 row.
> > >> > > >
> > >> > > > Any suggestion on how I can achieve this?  I am thankful to the
> > >> > > > forum for replying with patience, on achieving this, i will blog
> > >> > > > and will share it with all.
> > >> > > > Thanks - David
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >
> >
>
