Re: Question about best way to architect a Solr application with many data sources

2017-02-23 Thread Joel Bernstein
Alfresco has spent ten+ years building a content management system that follows this basic design: 1) Original bytes (PDF, Word Doc, image file) are stored in a filesystem based content store. 2) Meta-data is stored in a relational database, normalized. 3) Content is transformed to text and

Re: Question about best way to architect a Solr application with many data sources

2017-02-22 Thread Tim Casey
I would possibly extend this a bit further. There is the source, then the 'normalized' version of the data, then the indexed version. Sometimes you realize you miss something in the normalized view and you have to go back to the actual source. This will be as likely as there are number of sources
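
As a rough illustration of the source / normalized / indexed split described above, here is a minimal Python sketch. It assumes a hypothetical dealer feed with `Make`, `Model` and `Price` keys; the class, function and field names are invented for illustration and do not come from the thread. The point is only that the normalized record keeps a pointer (`source_ref`) back to the untouched original, so you can return to the source when the normalized view turns out to be missing something.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class NormalizedListing:
    """Common shape shared by every source (hypothetical fields)."""
    source: str       # which feed the record came from
    source_ref: str   # key or path of the untouched original record
    make: str
    model: str
    price_usd: int


def normalize_dealer(raw: dict, source_ref: str) -> NormalizedListing:
    """Map one record of an assumed dealer feed onto the common shape."""
    return NormalizedListing(
        source="dealer-feed",
        source_ref=source_ref,
        make=raw["Make"].strip().title(),
        model=raw["Model"].strip(),
        # Strip currency formatting such as "$7,500" before converting.
        price_usd=int(str(raw["Price"]).replace("$", "").replace(",", "")),
    )


def to_index_doc(listing: NormalizedListing) -> dict:
    """Build the flat document that would be sent to the index."""
    doc = asdict(listing)
    # Derive a stable id from the source name and the original record key.
    doc["id"] = f"{listing.source}:{listing.source_ref}"
    return doc
```

Each additional source would get its own small `normalize_...` function, while `to_index_doc` stays the same.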

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Walter Underwood
Reindexing is exactly why you want the Single Source of Truth to be in a repository outside of Solr. For our slowly-changing data sets, we have an intermediate JSONL batch. That is created from the source repositories and saved in Amazon S3. Then we load it into Solr nightly. That allows us to
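
A minimal Python sketch of the intermediate JSONL batch idea, under stated assumptions: it writes one JSON document per line to a local file and reads it back in fixed-size batches. The function names and the batch size are hypothetical, and the upload to S3 and the actual Solr load are deliberately left out; this is not the poster's pipeline.

```python
import json
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator


def write_jsonl(records: Iterable[dict], path: Path) -> int:
    """Write one JSON document per line; return the number written."""
    count = 0
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_batches(path: Path, batch_size: int = 500) -> Iterator[list[dict]]:
    """Yield lists of up to batch_size documents from a JSONL file."""
    with path.open(encoding="utf-8") as fh:
        # Skip blank lines so a trailing newline does not break parsing.
        lines = (line for line in fh if line.strip())
        while True:
            batch = [json.loads(line) for line in islice(lines, batch_size)]
            if not batch:
                return
            yield batch
```

Each batch is a plain list of dicts, so it could be serialized as a JSON array and posted to Solr's update handler by whatever client you already use. Because the file is kept outside Solr, a full reindex is just another pass over the same batches.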

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Erick Erickson
Dave: Oh, I agree that a DB is a perfectly valid place to store the data and you're absolutely right that it allows better interaction than flat files; you can ask questions of an RDBMS that you can't easily ask the disk ;). Storing to disk is an alternative if you're unwilling to deal with a DB

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Dave
Ha, I think I went to one of your training seminars in NYC maybe 4 years ago, Erick. I'm going to have to respectfully disagree about the RDBMS. It's such a well-known data format that you could hire a high school programmer to help with the db end if you knew how to flatten it to Solr. Besides

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Walter Underwood
Awesome advice. flat=fast in Solr. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Erick Erickson
I'll add that I _guarantee_ you'll want to re-index the data as you change your schema and the like. You'll be able to do that much more quickly if the data is stored locally somehow. An RDBMS is not necessary, however. You could simply store the data on disk in some format you could re-read and

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Robert Hume
Thanks for that! I was thinking (B) too, but wanted guidance that I'm using the tool correctly. Am still interested in hearing opinions from others, thanks! rh

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread David Hastings
And not to sound redundant, but if you ever need help, database programmers are a dime a dozen; good luck finding Solr developers who are available freelance for a price you're willing to pay. If you can do the Solr, anyone else who does web dev can do the SQL.

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Dave
B is a better option long term. Solr is meant for retrieving flat data, fast, not hierarchical. That's what a database is for and trust me you would rather have a real database on the end point. Each tool has a purpose, solr can never replace a relational database, and a relational database
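
To make "flat data" concrete, here is a small Python sketch that flattens a nested record into a single-level document of the kind a search index handles well. The sample listing, its field names, and the underscore naming scheme are assumptions made up for illustration; lists of scalars are left as-is on the assumption that they map to a multi-valued field.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Collapse nested dicts into one level, joining keys with '_'."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key along.
            flat.update(flatten(value, name))
        else:
            # Scalars and lists of scalars are copied unchanged.
            flat[name] = value
    return flat


# Hypothetical hierarchical record as it might live in the database layer.
listing = {
    "id": "listing-1",
    "source": "example-dealer",
    "vehicle": {"make": "Honda", "model": "Civic", "year": 2012},
    "price": {"amount": 7500, "currency": "USD"},
    "features": ["sunroof", "bluetooth"],
}

doc = flatten(listing)
```

The hierarchical form stays in the database, and only the flattened `doc` would be sent to Solr.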

Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Robert Hume
To learn how to properly use Solr, I'm building a little experimental project with it to search for used car listings. Car listings appear in a variety of places ... central places like Craigslist and also many, many individual used car dealership websites. I am wondering, should I: (a)