Solr and hadoop

2014-09-25 Thread Tom Chen
I wonder if Solr has an InputFormat and OutputFormat like the EsInputFormat and EsOutputFormat that are provided by Elasticsearch for Hadoop (es-hadoop). Is it possible for Solr to provide such integration with Hadoop? Best, Tom

Re: Solr and hadoop

2014-09-25 Thread Michael Della Bitta
Yes, there's SolrInputDocumentWritable and MapReduceIndexerTool, plus the Morphline stuff (check out https://github.com/markrmiller/solr-map-reduce-example).

Re: Solr and hadoop

2014-09-25 Thread Tom Chen
I'm aware of the MapReduceIndexerTool (MRIT). That might solve the indexing part -- the OutputFormat part. But what I asked about is more about making Solr index data available to Hadoop MapReduce -- using Solr as a data store like what HDFS can provide. With a Solr InputFormat, we can make

Re: Solr and hadoop

2014-09-25 Thread Joel Bernstein
-- the OutputFormat part. But what I asked about is more about making Solr index data available to Hadoop MapReduce -- using Solr as a data store like what HDFS can provide. With a Solr InputFormat, we can make the Solr index data available to Hadoop MapReduce. Along the same line, we can also make Solr

Re: Integrating solr with Hadoop

2014-07-02 Thread gurunath
Thanks Erick, I will watch out for the MapReduce option. It will be helpful if I get any links to set up Hadoop with Solr. -- View this message in context: http://lucene.472066.n3.nabble.com/Integrating-solr-with-Hadoop-tp4144715p4145157.html Sent from the Solr - User mailing list archive

Re: Integrating solr with Hadoop

2014-07-01 Thread Erick Erickson
. Solr 4.7 + Tomcat 7 + Apache zookeeper and Hadoop. Thanks Guru Pai. -- View this message in context: http://lucene.472066.n3.nabble.com/Integrating-solr-with-Hadoop-tp4144715p4144916.html Sent from the Solr - User mailing list archive at Nabble.com.

Integrating solr with Hadoop

2014-06-30 Thread gurunath
Hi, I want to set up Solr in production. Initially the data set I am using is small; the size of the data will grow gradually. I have heard about using *Big Data Work for Hadoop and Solr*. Is this a better option for large data, or is it better to go ahead with a Tomcat or Jetty server with Solr

Re: Integrating solr with Hadoop

2014-06-30 Thread Erick Erickson
Whoa! You're confusing a couple of things I think. The only real connection Solr - Hadoop _may_ be that Solr can have its indexes stored on HDFS. Well, you can also create map/reduce jobs that will index the data via M/R and merge them into a live index in Solr (assuming it's storing its indexes

Re: Integrating solr with Hadoop

2014-06-30 Thread Shawn Heisey
On 6/30/2014 3:19 AM, gurunath wrote: I want to set up Solr in production. Initially the data set I am using is small; the size of the data will grow gradually. I have heard about using *Big Data Work for Hadoop and Solr*. Is this a better option for large data, or is it better to go ahead

Re: Integrating solr with Hadoop

2014-06-30 Thread Jay Vyas
erickerick...@gmail.com wrote: Whoa! You're confusing a couple of things I think. The only real connection Solr - Hadoop _may_ be that Solr can have its indexes stored on HDFS. Well, you can also create map/reduce jobs that will index the data via M/R and merge them into a live index in Solr

Re: Integrating solr with Hadoop

2014-06-30 Thread gurunath
zookeeper and Hadoop. Thanks Guru Pai. -- View this message in context: http://lucene.472066.n3.nabble.com/Integrating-solr-with-Hadoop-tp4144715p4144916.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr with Hadoop

2013-07-18 Thread Matt Lieber
Rajesh, if you require integration between Solr and Hadoop or NoSQL, I would recommend using a commercial distribution. I think most are free to use as long as you don't require support. I inquired about the Cloudera Search capability, but it seems that so far it is just preliminary

RE: Solr with Hadoop

2013-07-18 Thread Saikat Kanjilal
have more questions. Regards From: mlie...@impetus.com To: solr-user@lucene.apache.org Subject: Re: Solr with Hadoop Date: Thu, 18 Jul 2013 15:41:36 + Rajesh, If you require to have an integration between Solr and Hadoop or NoSQL, I would recommend using a commercial distribution. I

Solr with Hadoop

2013-07-17 Thread Rajesh Jain
I have a newbie question on integrating Solr with Hadoop. There are some vendors like Cloudera/MapR who have announced Solr Search for Hadoop. If I use the Apache distro, how can I use Solr Search on docs in HDFS/Hadoop? Is there a tutorial on how to use it or get started? I am using Flume

Example for Scheduling Solr Indexing - Hadoop

2012-11-12 Thread Britt
/HH/10/indexes/ /2012/11/05/HH/20/indexes/ /2012/11/05/HH/30/indexes/ /2012/11/05/HH/40/indexes/ Anyone have an example of how to do this? -- View this message in context: http://lucene.472066.n3.nabble.com/Example-for-Scheduling-Solr-Indexing-Hadoop-tp4019862.html Sent from the Solr - User

Re: Example for Scheduling Solr Indexing - Hadoop

2012-11-12 Thread Otis Gospodnetic
to do this? -- View this message in context: http://lucene.472066.n3.nabble.com/Example-for-Scheduling-Solr-Indexing-Hadoop-tp4019862.html Sent from the Solr - User mailing list archive at Nabble.com.

Solr Consultant Available in Canada: Solr, HBase, Hadoop, Mahout, Lily

2012-04-16 Thread Fuad Efendi
Hi, if anyone is interested, I am available for full-time assignments; I have been involved in the Hadoop/Lucene/Solr world since 2005 (Nutch). I recently implemented a Lily-Framework-based distributed task executor which is currently used for Vertical Search by leading insurance companies and media: RSS, CVS

Solr Consultant Available in Canada: Solr, HBase, Hadoop, Lily

2012-04-16 Thread Fuad Efendi
Hi, if anyone is interested, I am available for full-time assignments; I have been involved in the Hadoop/Lucene/Solr world since 2005 (Nutch). I recently implemented a Lily-Framework-based distributed task executor which is currently used for Vertical Search by leading insurance companies and media: RSS, CVS

Re: solr with hadoop

2010-07-06 Thread Jason Rutherglen
correctly, what about updating the documents and what about replicating them correctly? Does this work? Or wasn't this an issue? Kind regards - Mitch -- View this message in context: http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p944413.html Sent from the Solr - User mailing

Re: solr with hadoop

2010-07-05 Thread MitchK
I need to revive this discussion... If you do distributed indexing correctly, what about updating the documents and what about replicating them correctly? Does this work? Or wasn't this an issue? Kind regards - Mitch -- View this message in context: http://lucene.472066.n3.nabble.com/solr

Re: solr with hadoop

2010-06-23 Thread Otis Gospodnetic
From: Jon Baer jonb...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, June 22, 2010 12:47:14 PM Subject: Re: solr with hadoop I was playing around w/ Sqoop the other day, it's a simple Cloudera tool for imports (mysql - hdfs) @ http://www.cloudera.com/developers/downloads/sqoop

Re: solr with hadoop

2010-06-22 Thread Neeb
of mapreduce? Any directions and guidance over this setup would be highly appreciated. Thanks in advance, -Ali -- View this message in context: http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914483.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: solr with hadoop

2010-06-22 Thread Marc Sturlese
I think a good solution could be to use hadoop with SOLR-1301 to build solr shards and then use solr distributed search against these shards (you will have to copy to local from HDFS to search against them) -- View this message in context: http://lucene.472066.n3.nabble.com/solr-with-hadoop
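
A rough sketch of the second half of that suggestion, querying several shards through Solr's distributed search. It assumes the shard indexes have already been copied from HDFS to local disk and are served by running Solr instances; the host names, port, and query below are hypothetical placeholders, not values from this thread. The only Solr-specific piece is the `shards` request parameter, which takes a comma-separated list of shard addresses.

```python
from urllib.parse import urlencode


def distributed_query_url(coordinator, shards, query, rows=10):
    """Build a Solr select URL that fans a query out to the given shards.

    coordinator and shards are "host:port/path" strings (hypothetical here).
    """
    params = {
        "q": query,
        "rows": rows,
        # Distributed search: comma-separated list of shard addresses.
        "shards": ",".join(shards),
    }
    # Keep ':', '/' and ',' unescaped so the URL stays readable.
    return "http://%s/select?%s" % (coordinator, urlencode(params, safe=":/,"))


# Hypothetical shard hosts, for illustration only.
SHARDS = ["host1:8983/solr", "host2:8983/solr"]
url = distributed_query_url("host1:8983/solr", SHARDS, "title:hadoop")
print(url)
```

Any one of the instances can act as the coordinator; it forwards the query to every address in `shards` and merges the results.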

Re: solr with hadoop

2010-06-22 Thread MitchK
Message From: Stu Hood stuh...@webmail.us To: solr-user@lucene.apache.org Sent: Monday, January 7, 2008 7:14:20 PM Subject: Re: solr with hadoop As Mike suggested, we use Hadoop to organize our data en route to Solr. Hadoop allows us to load balance the indexing stage, and then we use

Re: solr with hadoop

2010-06-22 Thread Jon Baer
IndexWriter.addAllIndexes or do you do that outside Hadoop? Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Stu Hood stuh...@webmail.us To: solr-user@lucene.apache.org Sent: Monday, January 7, 2008 7:14:20 PM Subject: Re: solr with hadoop

Re: Using Solr with Hadoop ....

2008-11-29 Thread Erik Hatcher
On Nov 28, 2008, at 8:38 PM, Yonik Seeley wrote: Or, it would be relatively trivial to write a Lucene program to merge the indexes. FYI, such a tool exists in Lucene's API already: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/misc/IndexMergeTool.html Erik

Re: Using Solr with Hadoop ....

2008-11-29 Thread Jon Baer
HadoopEntityProcessor for the DIH? I've wondered about this as they make HadoopCluster LiveCDs and EC2 has images, but the best way to make use of them is always a challenge. - Jon On Nov 29, 2008, at 3:34 AM, Erik Hatcher wrote: On Nov 28, 2008, at 8:38 PM, Yonik Seeley wrote: Or, it would

Re: Using Solr with Hadoop ....

2008-11-29 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Sat, Nov 29, 2008 at 7:26 PM, Jon Baer [EMAIL PROTECTED] wrote: HadoopEntityProcessor for the DIH? Reading data from Hadoop with DIH could be really cool. There are a few very useful ones which are badly needed. The most useful one would be a TikaEntityProcessor. But I do not see it solving the

Using Solr with Hadoop ....

2008-11-28 Thread souravm
using HDFS and MapReduce to do the indexing job within time. In that regard I have the following queries regarding using Solr with Hadoop. 1. After creating the index using Hadoop, whether storing them for query purposes again in HDFS would mean additional performance overhead (compared to storing

Re: Using Solr with Hadoop ....

2008-11-28 Thread Yonik Seeley
While future Solr-hadoop integration is a definite possibility (and will enable other cool stuff), it doesn't necessarily seem needed for the problem you are trying to solve. indexing them in parallel is not an option as my target doc size per hr itself can be very huge (3-6M) I'm not sure I

RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
, Sourav -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Friday, November 28, 2008 1:58 PM To: solr-user@lucene.apache.org Subject: Re: Using Solr with Hadoop While future Solr-hadoop integration is a definite possibility

Re: Using Solr with Hadoop ....

2008-11-28 Thread Yonik Seeley
Solr given the limited capabilities of the servers. Regards, Sourav -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Friday, November 28, 2008 1:58 PM To: solr-user@lucene.apache.org Subject: Re: Using Solr with Hadoop While

RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Friday, November 28, 2008 5:38 PM To: solr-user@lucene.apache.org Subject: Re: Using Solr with Hadoop The indexing rate you need to achieve should be equal to the rate that new documents are produced

Re: Using Solr with Hadoop ....

2008-11-28 Thread Yonik Seeley
Ah sorry, I had misread your original post. 3-6M docs per hour can be challenging. Using the CSV loader, I've indexed 4000 docs per second (14M per hour) on a 2.6GHz Athlon, but they were relatively simple and small docs. On Fri, Nov 28, 2008 at 9:54 PM, souravm [EMAIL PROTECTED] wrote: There
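
The arithmetic behind those figures can be checked with a short sketch. It only converts between per-second and per-hour rates using the numbers quoted in this thread (4000 docs per second measured, a 3-6M docs per hour target); the function names are made up for illustration.

```python
SECONDS_PER_HOUR = 3600


def per_hour(docs_per_second):
    """Convert an indexing rate from docs/second to docs/hour."""
    return docs_per_second * SECONDS_PER_HOUR


def per_second(docs_per_hour):
    """Convert an indexing rate from docs/hour to docs/second."""
    return docs_per_hour / SECONDS_PER_HOUR


# Measured rate quoted above: 4000 docs/second with the CSV loader.
measured_per_hour = per_hour(4000)

# Target range quoted above: 3M to 6M docs per hour.
required_low = per_second(3_000_000)
required_high = per_second(6_000_000)

print(measured_per_hour)     # 14400000, i.e. roughly the "14M per hour" quoted
print(round(required_low))   # 833 docs/second
print(round(required_high))  # 1667 docs/second
```

So the target works out to roughly 833 to 1667 docs per second, below the 4000 per second measured for small, simple documents; larger or more complex documents would narrow that margin.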

RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
be worthwhile where I use Solr/Lucene's indexing power and Hadoop's parallel processing capability. Regards, Sourav -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: Friday, November 28, 2008 7:08 PM To: solr-user@lucene.apache.org Subject: Re: Using Solr with Hadoop

Re: solr with hadoop

2008-01-07 Thread Stu Hood
As Mike suggested, we use Hadoop to organize our data en route to Solr. Hadoop allows us to load balance the indexing stage, and then we use the raw Lucene IndexWriter.addAllIndexes method to merge the data to be hosted on Solr instances. Thanks, Stu -Original Message- From: Mike

Re: solr with hadoop

2008-01-07 Thread Otis Gospodnetic
-user@lucene.apache.org Sent: Monday, January 7, 2008 7:14:20 PM Subject: Re: solr with hadoop As Mike suggested, we use Hadoop to organize our data en route to Solr. Hadoop allows us to load balance the indexing stage, and then we use the raw Lucene IndexWriter.addAllIndexes method to merge

Re: solr with hadoop

2008-01-04 Thread Mike Klaas
. So I was wondering, are there any benefits to using Hadoop for this? And if so, what direction should I go? Has anybody done something to integrate Solr with Hadoop? Does it give any performance boost? Hadoop might be useful for organizing your data en route to Solr, but I don't see how it could

Re: solr with hadoop

2008-01-04 Thread Ryan McKinley
, are running slow. So I was wondering, are there any benefits to using Hadoop for this? And if so, what direction should I go? Has anybody done something to integrate Solr with Hadoop? Does it give any performance boost? Hadoop might be useful for organizing your data en route to Solr, but I don't see how