I wonder if Solr has an InputFormat and OutputFormat like the EsInputFormat
and EsOutputFormat that are provided by Elasticsearch for Hadoop
(es-hadoop).
Is it possible for Solr to provide such integration with Hadoop?
Best,
Tom
Yes, there's SolrInputDocumentWritable and MapReduceIndexerTool, plus the
Morphline stuff (check out
https://github.com/markrmiller/solr-map-reduce-example).
Michael Della Bitta
Applications Developer
o: +1 646 532 3062
appinions inc.
“The Science of Influence Marketing”
18 East 41st Street
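For concreteness, a typical MapReduceIndexerTool run looks roughly like the sketch below. This is not from the thread: the jar name, morphline file, paths, and ZooKeeper host are all placeholders; the linked solr-map-reduce-example repo walks through a real setup.

```shell
# Sketch of a MapReduceIndexerTool invocation (jar, paths, and hosts are
# placeholders). It builds Lucene index shards in HDFS via MapReduce and,
# with --go-live, merges them into a live SolrCloud collection.
hadoop jar solr-map-reduce-4.7.0.jar \
  org.apache.solr.hadoop.MapReduceIndexerTool \
  --morphline-file morphline.conf \
  --output-dir hdfs://namenode:8020/tmp/outdir \
  --zk-host zkhost:2181/solr \
  --collection collection1 \
  --go-live \
  hdfs://namenode:8020/indir
```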
I'm aware of the MapReduceIndexerTool (MRIT). That might solve the
indexing part -- the OutputFormat part.
But what I asked for is more about making Solr index data available to
Hadoop MapReduce -- making Solr a data store like what HDFS can provide.
With a Solr InputFormat, we can make the Solr index data available to
Hadoop MapReduce. Along the same line, we can also make Solr
Thanks Erick,
I will look into the MapReduce option. It would be helpful to get any
links on setting up Hadoop with Solr.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Integrating-solr-with-Hadoop-tp4144715p4145157.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hi,
I want to set up Solr in production. Initially the data set I am using is
small scale; the size of the data will grow gradually. I have heard about
using *Big Data Work for Hadoop and Solr*. Is this a better option for large
data, or is it better to go ahead with a Tomcat or Jetty server with Solr
(Solr 4.7 + Tomcat 7 + Apache ZooKeeper and Hadoop)?
Thanks
Guru Pai.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Integrating-solr-with-Hadoop-tp4144715p4144916.html
Sent from the Solr - User mailing list archive at Nabble.com.
Whoa! You're confusing a couple of things, I think.
The only real connection between Solr and Hadoop _may_ be that Solr can
have its indexes stored on HDFS.
Well, you can also create map/reduce jobs that will index the data via
M/R and merge them into a live index in Solr (assuming it's storing its
indexes on HDFS).
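For reference, storing Solr indexes on HDFS is switched on in solrconfig.xml via the HdfsDirectoryFactory (Solr 4.x and later). A minimal sketch, in which the namenode URI is a placeholder:

```xml
<!-- solrconfig.xml fragment (sketch; the namenode URI is a placeholder) -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
</directoryFactory>
<!-- inside <indexConfig>: use the HDFS-aware lock implementation -->
<lockType>${solr.lock.type:hdfs}</lockType>
```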
Rajesh,
If you require an integration between Solr and Hadoop or NoSQL, I
would recommend using a commercial distribution. I think most are free to
use as long as you don't require support.
I inquired about the Cloudera Search capability, but it seems that so far
it is just preliminary.
Let me know if you have more questions.
Regards
From: mlie...@impetus.com
To: solr-user@lucene.apache.org
Subject: Re: Solr with Hadoop
Date: Thu, 18 Jul 2013 15:41:36 +
I have a newbie question on integrating Solr with Hadoop.
There are some vendors like Cloudera/MapR who have announced Solr Search
for Hadoop.
If I use the Apache distro, how can I use Solr Search on docs in
HDFS/Hadoop? Is there a tutorial on how to use it or get started?
I am using Flume
/HH/10/indexes/
/2012/11/05/HH/20/indexes/
/2012/11/05/HH/30/indexes/
/2012/11/05/HH/40/indexes/
Anyone have an example of how to do this?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Example-for-Scheduling-Solr-Indexing-Hadoop-tp4019862.html
Sent from the Solr - User mailing list archive at Nabble.com.
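One way to schedule indexing over hourly partition directories like those above is a simple marker-file loop driven by cron. This is only a sketch: the demo root, the .indexed marker convention, and the echo stand-in (which a real setup would replace with a Hadoop indexing job) are all hypothetical.

```shell
# Hedged sketch of the scheduling loop: scan hourly partition dirs
# (layout from the post) and "index" any hour without an .indexed marker.
# The marker convention and the echo stand-in are hypothetical; a real
# setup would submit a Hadoop/Solr indexing job instead.
BASE=$(mktemp -d)                    # demo root standing in for /2012/11/05
mkdir -p "$BASE/HH/10/indexes" "$BASE/HH/20/indexes"
for dir in "$BASE"/HH/*/indexes/; do
  if [ ! -f "$dir/.indexed" ]; then
    echo "indexing $dir"             # stand-in for the real indexing job
    touch "$dir/.indexed"            # mark this hour as done
  fi
done
```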
Hi,
If anyone is interested, I am available for full-time assignments; I have
been involved in the Hadoop/Lucene/Solr world since 2005 (Nutch). I recently
implemented a Lily-Framework-based distributed task executor which is
currently used for vertical search by leading insurance companies and media:
RSS, CSV
I need to revive this discussion...
If you do distributed indexing correctly, what about updating the
documents, and what about replicating them correctly?
Does this work? Or wasn't this an issue?
Kind regards
- Mitch
--
View this message in context:
http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p944413.html
Sent from the Solr - User mailing list archive at Nabble.com.
From: Jon Baer jonb...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, June 22, 2010 12:47:14 PM
Subject: Re: solr with hadoop
I was playing around w/ Sqoop the other day; it's a simple Cloudera tool
for imports (MySQL -> HDFS):
http://www.cloudera.com/developers/downloads/sqoop
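A Sqoop import along those lines looks roughly like this sketch; the JDBC connection string, credentials, table, and target directory are placeholders, not from the thread.

```shell
# Sketch of a Sqoop import (connection string, user, table, and target
# dir are placeholders): pulls a MySQL table into HDFS as delimited
# files, which could then feed a Solr indexing job.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username dbuser -P \
  --table documents \
  --target-dir /user/hadoop/documents
```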
of mapreduce?
Any directions and guidance on this setup would be highly appreciated.
Thanks in advance,
-Ali
--
View this message in context:
http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914483.html
Sent from the Solr - User mailing list archive at Nabble.com.
I think a good solution could be to use Hadoop with SOLR-1301 to build
Solr shards and then use Solr distributed search against those shards (you
will have to copy them from HDFS to local disk to search against them).
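Once the shards are copied out of HDFS and served by Solr instances, a distributed query fans out via the shards parameter. A sketch, with hostnames as placeholders:

```shell
# Sketch of a distributed search across two shard instances (hosts are
# placeholders). The shards parameter tells the receiving Solr node to
# query both shards and merge the results.
curl 'http://host1:8983/solr/select?q=*:*&shards=host1:8983/solr,host2:8983/solr'
```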
-Original Message-
From: Stu Hood stuh...@webmail.us
To: solr-user@lucene.apache.org
Sent: Monday, January 7, 2008 7:14:20 PM
Subject: Re: solr with hadoop
As Mike suggested, we use Hadoop to organize our data en route to Solr.
Hadoop allows us to load balance the indexing stage, and then we use
IndexWriter.addIndexes -- or do you do that outside Hadoop?
Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
On Nov 28, 2008, at 8:38 PM, Yonik Seeley wrote:
Or, it would be relatively trivial to write a Lucene program
to merge the indexes.
FYI, such a tool exists in Lucene's API already:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/misc/IndexMergeTool.html
Erik
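Invoking that merge tool from the command line looks roughly like the sketch below; the jar paths and index directories are placeholders.

```shell
# Sketch: merge several shard indexes into one with Lucene's
# IndexMergeTool (jar paths and directories are placeholders).
# Usage is: IndexMergeTool <mergedIndex> <index1> <index2> ...
java -cp lucene-core.jar:lucene-misc.jar \
  org.apache.lucene.misc.IndexMergeTool \
  /indexes/merged /indexes/shard1 /indexes/shard2
```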
HadoopEntityProcessor for the DIH?
I've wondered about this, as there are Hadoop cluster LiveCDs and EC2
images, but the best way to make use of them is always a challenge.
- Jon
On Sat, Nov 29, 2008 at 7:26 PM, Jon Baer [EMAIL PROTECTED] wrote:
HadoopEntityProcessor for the DIH?
Reading data from Hadoop with DIH could be really cool
There are a few very useful ones which are badly needed. The most useful
one would be a TikaEntityProcessor.
But I do not see it solving the
using HDFS and MapReduce to
do the indexing job within time.
In that regard I have the following queries regarding using Solr with Hadoop.
1. After creating the index using Hadoop, would storing it again in HDFS
for query purposes mean additional performance overhead (compared to storing
While future Solr-hadoop integration is a definite possibility (and
will enable other cool stuff), it doesn't necessarily seem needed for
the problem you are trying to solve.
indexing them in parallel is not an option, as my target doc count per
hour itself can be very large (3-6M)
I'm not sure I
Solr given the limited capabilities of the servers.
Regards,
Sourav
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Friday, November 28, 2008 1:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Using Solr with Hadoop
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Friday, November 28, 2008 5:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Using Solr with Hadoop
The indexing rate you need to achieve should be equal to the rate that
new documents are produced.
Ah sorry, I had misread your original post. 3-6M docs per hour can be
challenging.
Using the CSV loader, I've indexed 4000 docs per second (14M per hour)
on a 2.6GHz Athlon, but they were relatively simple and small docs.
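Bulk loading via the CSV handler is a one-liner against the update/csv endpoint. A sketch, in which the host, file name, and field layout are placeholders; committing once at the end rather than per batch helps throughput.

```shell
# Sketch of bulk CSV indexing via Solr's CSV loader (host and file are
# placeholders). The first line of docs.csv is expected to name the
# fields; commit=true issues a single commit after the load.
curl 'http://localhost:8983/solr/update/csv?commit=true' \
  --data-binary @docs.csv \
  -H 'Content-Type: text/plain; charset=utf-8'
```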
On Fri, Nov 28, 2008 at 9:54 PM, souravm [EMAIL PROTECTED] wrote:
be
worthwhile where I use Solr/Lucene's indexing power and Hadoop's parallel
processing capability.
Regards,
Sourav
-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Friday, November 28, 2008 7:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Using Solr with Hadoop
As Mike suggested, we use Hadoop to organize our data en route to Solr.
Hadoop allows us to load balance the indexing stage, and then we use the
raw Lucene IndexWriter.addIndexes method to merge the data to be hosted on
Solr instances.
Thanks,
Stu
-Original Message-
From: Mike
, are running slow.
So I was thinking: are there any benefits to using Hadoop for this? And if
so, what direction should I go? Has anybody done something to integrate
Solr with Hadoop? Does it give any performance boost?
Hadoop might be useful for organizing your data en route to Solr, but I
don't see how it could