Hmmm, interesting, I actually hadn't thought of doing it
that way. I don't know the internals well enough to comment on it
but I do know someone who does. I'll check with them....

Erick

On Thu, Jul 3, 2014 at 9:18 AM, Tom Chen <tomchen1...@gmail.com> wrote:
> Hi,
>
> In the GoLive stage, the MRIT sends MERGEINDEXES requests to the Solr
> instances. Each request has an indexDir parameter with an HDFS path to the
> index generated on HDFS, as shown in the MRIT log:
>
> 2014-07-02 15:03:55,123 DEBUG
> org.apache.http.impl.conn.DefaultClientConnection: Sending request: GET
> /solr/admin/cores?action=MERGEINDEXES&core=collection1&indexDir=hdfs%3A%2F%
> 2Fhdtest041.test.com%3A9000%2Foutdir_webaccess_app%2Fresults%2Fpart-00000%2Fdata%2Findex&wt=javabin&version=2
> HTTP/1.1
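The indexDir in that request is URL-encoded. As a small sketch, decoding it shows the underlying HDFS path (the encoded value is copied from the log above; the commented curl line is illustrative only, not a command from the thread):

```shell
# Decode the URL-encoded indexDir value from the MRIT log above.
# The host and path come from the log; the curl line is illustrative only.
ENCODED='hdfs%3A%2F%2Fhdtest041.test.com%3A9000%2Foutdir_webaccess_app%2Fresults%2Fpart-00000%2Fdata%2Findex'
INDEX_DIR=$(printf '%s' "$ENCODED" | sed 's|%3A|:|g; s|%2F|/|g')
echo "$INDEX_DIR"
# The request therefore amounts to a plain Core Admin call:
# curl "http://<solr-host>:8983/solr/admin/cores?action=MERGEINDEXES&core=collection1&indexDir=$INDEX_DIR"
```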
>
> So it's up to the Solr instance to know how to read the index from HDFS
> (rather than for the MRIT to find the local disk to copy to from HDFS).
>
> The go-live option is very convenient for merging the generated index into
> the live index. It's preferable to use go-live rather than copying indexes to
> the local file system and merging them there.
>
> I tried starting the Solr instance with these properties, so that it writes
> its index to the local file system while still being able to read an index
> from HDFS when doing MERGEINDEXES:
>
>   -Dsolr.directoryFactory=HdfsDirectoryFactory \
>   -Dsolr.hdfs.confdir=$HADOOP_HOME/hadoop-conf \
>   -Dsolr.lock.type=hdfs \
>   -Dsolr.hdfs.home=file:///opt/test/solr/node/solr
>
> i.e. the full command:
> java -DnumShards=2 \
>   -Dbootstrap_confdir=./solr/collection1/conf \
>   -Dcollection.configName=myconf \
>   -DzkHost=<zookeeper>:2181 \
>   -Dhost=<node1> \
>   -DSTOP.PORT=7983 -DSTOP.KEY=key \
>   -Dsolr.directoryFactory=HdfsDirectoryFactory \
>   -Dsolr.hdfs.confdir=$HADOOP_HOME/hadoop-conf \
>   -Dsolr.lock.type=hdfs \
>   -Dsolr.hdfs.home=file:///opt/test/solr/node/solr \
>   -jar start.jar
>
>
> With that, go-live works fine.
>
> Any comment on this approach?
>
>
>
> Tom
>
> On Wed, Jul 2, 2014 at 9:50 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> How would the MapReduceIndexerTool (MRIT for short)
>> find the local disk to copy from HDFS to for each shard?
>> All it has is the information in the Solr configs, which are
>> usually relative paths on the local Solr machines, relative
>> to SOLR_HOME, and those could be different on each node
>> (that would be screwy, but possible).
>>
>> Permissions would also be a royal pain to get right....
>>
>> You _can_ forgo the --go-live option and copy from
>> the HDFS nodes to your local drive and then execute
>> the "mergeIndexes" command, see:
>> https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
>> Note that there is the MergeIndexTool, but there's also
>> the Core Admin command.
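The manual route described here can be sketched per shard as follows. All hostnames, ports, and paths below are illustrative placeholders, not values from a real cluster; the hdfs and curl commands are shown commented out:

```shell
# Manual alternative to --go-live, per shard (all names are illustrative):
# 1) copy the generated shard index out of HDFS to the Solr node's local disk,
# 2) merge it into the live core via the Core Admin MERGEINDEXES command.
SHARD_INDEX=/tmp/part-00000-index
# hdfs dfs -copyToLocal "hdfs://<namenode>:9000/outdir/results/part-00000/data/index" "$SHARD_INDEX"
MERGE_URL="http://localhost:8983/solr/admin/cores?action=MERGEINDEXES&core=collection1&indexDir=$SHARD_INDEX"
echo "$MERGE_URL"
# curl "$MERGE_URL"
```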
>>
>> The sub-indexes are in a partition in HDFS and numbered
>> sequentially.
>>
>> Best,
>> Erick
>>
>> On Wed, Jul 2, 2014 at 3:23 PM, Tom Chen <tomchen1...@gmail.com> wrote:
>> > Hi,
>> >
>> >
>> > When we run the Solr Map Reduce Indexer Tool (
>> > https://github.com/markrmiller/solr-map-reduce-example), it generates
>> > indexes on HDFS.
>> >
>> > The last stage is Go Live, which merges the generated index into the live
>> > SolrCloud index.
>> >
>> > If the live SolrCloud writes its index to the local file system (rather
>> > than HDFS), Go Live gives an error like this:
>> >
>> > 2014-07-02 13:41:01,518 INFO org.apache.solr.hadoop.GoLive: Live merge
>> > hdfs://
>> >
>> bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00000
>> > into http://bdvs087.test.com:8983/solr
>> > 2014-07-02 13:41:01,796 ERROR org.apache.solr.hadoop.GoLive: Error
>> sending
>> > live merge command
>> > java.util.concurrent.ExecutionException:
>> > org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>> > directory '/opt/testdir/solr/node/hdfs:/
>> >
>> bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index
>> '
>> > does not exist
>> > at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:233)
>> > at java.util.concurrent.FutureTask.get(FutureTask.java:94)
>> > at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
>> > at
>> >
>> org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
>> > at
>> >
>> org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
>> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> > at
>> >
>> org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > at
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>> > at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>> > at java.lang.reflect.Method.invoke(Method.java:611)
>> > at
>> >
>> org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
>> > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:434)
>> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>> > at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>> > at java.security.AccessController.doPrivileged(AccessController.java:310)
>> > at javax.security.auth.Subject.doAs(Subject.java:573)
>> > at
>> >
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
>> > at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> > Caused by:
>> > org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>> > directory '/opt/testdir/solr/node/hdfs:/
>> >
>> bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index
>> '
>> > does not exist
>> > at
>> >
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
>> > at
>> >
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
>> > at
>> >
>> org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
>> > at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)
>> > at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)
>> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:149)
>> > at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
>> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:149)
>> > at
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
>> > at
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
>> > at java.lang.Thread.run(Thread.java:738)
>> >
>> > Is there any way to set up SolrCloud to write its index to the local file
>> > system, while still allowing the MapReduceIndexerTool's GoLive to merge an
>> > index generated on HDFS into the SolrCloud index?
>> >
>> > Thanks,
>> > Tom
>>
