Hi Vijay, 

The short answer is yes, you can combine almost anything you want into a single
collection. But in addition to working out your queries, you might want to work
out your data life cycle.

In our application, we have commingled structured and unstructured documents
in a single collection for initial development purposes. The only field
they have in common is the unique ID. It works fine.

In production, however, we expect that things like query rates, access controls,
load balancing, availability, shard keys, overall document counts, update
frequency, etc. will drive us to use separate collections. For us, the deciding
factor is less about "structured vs. unstructured" and more about "public vs.
private". We have built our app to execute separate queries, in parallel, at
runtime, so that splitting the collection will have minimal impact.
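
Roughly, the parallel-query idea looks like the sketch below. It is
illustrative Python, not our production code: the function names, the
"public"/"private" collection names, and the stub searchers are all made up
for the example. In a real app each searcher would send the query to its own
collection's search handler instead of returning canned documents.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(query, searchers):
    """Run one query against every collection in parallel and merge the hits.

    `searchers` maps a collection name to a callable that takes the query
    string and returns a list of dicts, each carrying at least an "id" key.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(searchers))) as pool:
        # Submit every query first so they all run concurrently.
        futures = {name: pool.submit(fn, query) for name, fn in searchers.items()}
        merged = []
        # Collect results in the order the collections were listed.
        for name, future in futures.items():
            for doc in future.result():
                # Tag each hit with the collection it came from.
                merged.append({**doc, "collection": name})
    return merged


# Hypothetical stand-ins for real per-collection Solr queries.
def search_public(query):
    return [{"id": "doc-1", "title": "Public report"}]


def search_private(query):
    return [{"id": "doc-2", "account": "12345"}]


results = search_all("report", {"public": search_public, "private": search_private})
```

Because the caller only sees a merged list keyed by the unique ID, it does not
need to know whether the documents live in one collection or several.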

Of course, your application is different.  YMMV, etc.

hth,
Charlie


-----Original Message-----
From: Jack Krupansky [mailto:jack.krupan...@gmail.com] 
Sent: Sunday, March 29, 2015 4:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Structured and Unstructured data indexing in SolrCloud

The first step is to work out the queries that you wish to perform - that will 
determine how the data should be organized in the Solr schema.

-- Jack Krupansky

On Sun, Mar 29, 2015 at 4:04 PM, Vijay Bhoomireddy < 
vijaya.bhoomire...@whishworks.com> wrote:

> Hi,
>
>
>
> We have a requirement where both structured and unstructured data 
> comes into the system. We need to index both of them and then enable 
> search functionality on it. We are using SolrCloud on Hadoop platform. 
> For structured data, we are planning to put the data into HBase and 
> for unstructured, directly into HDFS.
>
>
>
> My question is how to index these sources under a single Solr core?
> Would it be possible to index both structured and unstructured data
> under a single core/collection in SolrCloud and then enable search
> functionality over that index?
>
>
>
> Thanks in advance.
