Re: Approach for Merge Database and Files

Erick Erickson Wed, 27 Jun 2018 08:48:02 -0700

Seriously consider de-normalizing the data at index time. Your
indexing client just accesses both the DB and the file system, selects
the relevant data from each then indexes that data as a _single_
document.


Then there's no joining necessary at query time.

It's a common pattern to do extra work at index time on the theory
that you query much more often than you index.
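
As a rough sketch of that de-normalization step, assuming the DB rows and
the file metadata can be matched on a shared file URL, the indexing client
could build one flat document per file along these lines. The function
name, the sample field names, the "id" key and the db_/fm_ prefixes are all
illustrative assumptions, not taken from any real schema:

```python
import json
from typing import Any, Dict, Iterable, List


def build_solr_docs(
    db_rows: Iterable[Dict[str, Any]],
    file_records: Iterable[Dict[str, Any]],
    key: str = "file_url",  # hypothetical shared field used to match the two sources
) -> List[Dict[str, Any]]:
    """Merge DB rows and file records that share `key` into one flat doc each."""
    # Index the file-side records by the shared key for constant-time lookup.
    files_by_key = {rec[key]: rec for rec in file_records}

    docs = []
    for row in db_rows:
        # Assumption: the shared key doubles as the document's unique "id".
        doc = {"id": row[key]}
        # Copy DB fields with a db_ prefix so their origin stays visible.
        for name, value in row.items():
            if name != key:
                doc[f"db_{name}"] = value
        # Copy file fields with an fm_ prefix when a matching file exists.
        file_rec = files_by_key.get(row[key])
        if file_rec is not None:
            for name, value in file_rec.items():
                if name != key:
                    doc[f"fm_{name}"] = value
        docs.append(doc)
    return docs


# Made-up sample data standing in for a DB query result and a file scan.
sample_db_rows = [{"file_url": "/docs/a.pdf", "f1": "invoice"}]
sample_file_records = [{"file_url": "/docs/a.pdf", "f2": "extracted text"}]

print(json.dumps(build_solr_docs(sample_db_rows, sample_file_records), indent=2))
```

The resulting list can be serialized as JSON and sent to Solr's update
handler with whatever client you already use; that step is left out here.
A single query such as db_f1:something fm_f2:something_else then matches
against one document, with no join involved.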

Best,
Erick

On Wed, Jun 27, 2018 at 7:01 AM, Angel Addati <angeladd...@gmail.com> wrote:
> Thank you, Erick. I have read a lot on the web and I am still a little
> lost. In summary, I need to perform a join (SQL join). So far, I have
> evaluated the following alternatives:
>
>    1. Alias (first response). I don't think it solves the problem, because
>    as I understand it, this is useful for a union (SQL union) of the
>    results. So I couldn't see how it can help me.
>    2. Join Query. It is not an SQL join, because it only shows data from
>    the first item and only filters by the second item. So I discarded it.
>    3. Block join. I don't fully understand it (the functionality and where
>    I need to configure it). But I don't think it solves my need, because it
>    defines a parent/child relationship.
>
>
> Do you have any pointers on what to investigate?  Thank you for your help!
>
>
> *Angel Adrián Addati*
>
>
> 2018-06-26 11:31 GMT-03:00 Erick Erickson <erickerick...@gmail.com>:
>
>> bq.  I don't know if the best approach is to combine at index time or at
>> query time
>>
>> It Depends (tm). What is your goal? Let's say you have db_f1 and fm_f2
>> (db == from the database and fm = file data).
>>
>> If you want to form a Solr query like
>>
>> db_f1:something fm_f2:something_else
>>
>> you don't have much choice, you've got to do it at index time or your
>> search time will be horrible.
>>
>> OTOH, if you want to search, say, _only_ on the db_* data or _only_ on
>> the file data and enrich the results returned to the user with data
>> from the other source, that's perfectly reasonable, although you
>> should really do some prototyping to see if it meets your SLA. This
>> presupposes that you're only returning a few rows. For example, use
>> Solr to get the top 10 docs based on file data and have your app layer
>> reach out to the DB to enrich just those 10 docs.
>>
>> In general, you should always consider doing as much pre-processing at
>> index time as you can on the theory that what you want is fast
>> searches and you'll search over a doc many, many more times than you
>> index it.
>>
>> Best,
>> Erick
>>
>>
>> On Tue, Jun 26, 2018 at 7:02 AM, Angel Addati <angeladd...@gmail.com>
>> wrote:
>> > Thanks to you both.
>> >
>> > *"From your problem description, it looks like you want to gather the
>> > data from the DB and filesystem and combine them into a Solr document
>> > at index time, then index that document."*
>> >
>> > Exactly. I don't know if the best approach is to combine at index time
>> > or at query time. But I need to search and show results for the combined
>> > items. I'm investigating the alias suggestion. Do you think it solves
>> > the problem, or do you know of another approach?
>> >
>> > PS: I need to index both the information in the file and the information
>> > in the database, because each has important content and metadata.
>> >
>> > Regards...
>> >
>> > * - - -*
>> > *Angel Adrián Addati*
>> >
>> >
>> > 2018-06-26 10:50 GMT-03:00 Erick Erickson <erickerick...@gmail.com>:
>> >
>> >> From your problem description, it looks like you want to gather the
>> >> data from the DB and filesystem and combine them into a Solr document
>> >> at index time, then index that document.
>> >>
>> >> Put enough information in Solr to fetch the document as necessary.
>> >> People often don't put the entire file in Solr, especially if it's,
>> >> say, a PDF or Word document.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Tue, Jun 26, 2018 at 5:21 AM, Peter Gylling Jørgensen
>> >> <peter.jorgen...@findwise.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I would create a search alias that contains the latest versions of
>> >> > the different collections.
>> >> >
>> >> > See:
>> >> > https://lucene.apache.org/solr/guide/7_3/collections-api.html#collections-api
>> >> >
>> >> > Then you use this alias to search for results
>> >> >
>> >> > You get better results if you define the same schema for all
>> >> > collections
>> >> >
>> >> > Best Regards
>> >> > Peter Gylling Jørgensen
>> >> > Findability Consultant
>> >> > Mail: peter.jorgen...@findwise.com
>> >> > Mobile: +45 42442890
>> >> >
>> >> >
>> >> > On 26 Jun 2018 at 13:55, angeladdati <angeladd...@gmail.com> wrote:
>> >> >
>> >> > Hi:
>> >> >
>> >> > I have two sources to index:
>> >> > Database: MetadataDB1, MetadataDB2, File Url...
>> >> > Files: MetadataF1, MetadataF2, File Url, Content...
>> >> >
>> >> > I index the database and the files. When I search, I need to search
>> >> > and show the merged result: Database + Files (MetadataDB1, MetadataDB2,
>> >> > MetadataF1, MetadataF2, File Url, Content, ...).
>> >> >
>> >> >
>> >> > Is it possible?
>> >> >
>> >> > Regards!
>> >> >
>> >> > Angel
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>> >>
>>
