Hmmm, interesting. I actually hadn't thought of doing it that way. I don't know the internals well enough to comment, but I do know someone who does. I'll check with them....
Erick

On Thu, Jul 3, 2014 at 9:18 AM, Tom Chen <tomchen1...@gmail.com> wrote:
> Hi,
>
> In the GoLive stage, the MRIT sends MERGEINDEXES requests to the Solr
> instances. The request has an indexDir parameter with an HDFS path to the
> index generated on HDFS, as shown in the MRIT log:
>
> 2014-07-02 15:03:55,123 DEBUG
> org.apache.http.impl.conn.DefaultClientConnection: Sending request: GET
> /solr/admin/cores?action=MERGEINDEXES&core=collection1&indexDir=hdfs%3A%2F%2Fhdtest041.test.com%3A9000%2Foutdir_webaccess_app%2Fresults%2Fpart-00000%2Fdata%2Findex&wt=javabin&version=2
> HTTP/1.1
>
> So it's up to the Solr instance to know how to read the index from HDFS
> (rather than up to the MRIT to find the local disk to write to from HDFS).
>
> The go-live option is very convenient for merging a generated index into a
> live index. It's preferable to use go-live rather than copy indexes around
> to the local file system and then merge.
>
> I tried to start the Solr instance with these properties, to allow it to
> write to the local file system while still being able to read an index on
> HDFS when doing MERGEINDEXES:
>
> -Dsolr.directoryFactory=HdfsDirectoryFactory \
> -Dsolr.hdfs.confdir=$HADOOP_HOME/hadoop-conf \
> -Dsolr.lock.type=hdfs \
> -Dsolr.hdfs.home=file:///opt/test/solr/node/solr \
>
> i.e. the full command:
>
> java -DnumShards=2 \
> -Dbootstrap_confdir=./solr/collection1/conf \
> -Dcollection.configName=myconf \
> -DzkHost=<zookeeper>:2181 \
> -Dhost=<node1> \
> -DSTOP.PORT=7983 -DSTOP.KEY=key \
> -Dsolr.directoryFactory=HdfsDirectoryFactory \
> -Dsolr.hdfs.confdir=$HADOOP_HOME/hadoop-conf \
> -Dsolr.lock.type=hdfs \
> -Dsolr.hdfs.home=file:///opt/test/solr/node/solr \
> -jar start.jar
>
> With that, go-live works fine.
>
> Any comments on this approach?
>
> Tom
>
> On Wed, Jul 2, 2014 at 9:50 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>> How would the MapReduceIndexerTool (MRIT for short)
>> find the local disk to write from HDFS to for each shard?
>> All it has is the information in the Solr configs, which are
>> usually relative paths on the local Solr machines, relative
>> to SOLR_HOME. Which could be different on each node
>> (that would be screwy, but possible).
>>
>> Permissions would also be a royal pain to get right....
>>
>> You _can_ forgo the --go-live option, copy from
>> the HDFS nodes to your local drive, and then execute
>> the "mergeIndexes" command; see:
>> https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
>> Note that there is the IndexMergeTool, but there's also
>> the Core Admin command.
>>
>> The sub-indexes are in a partition in HDFS and numbered
>> sequentially.
>>
>> Best,
>> Erick
>>
>> On Wed, Jul 2, 2014 at 3:23 PM, Tom Chen <tomchen1...@gmail.com> wrote:
>> > Hi,
>> >
>> > When we run the Solr MapReduceIndexerTool (
>> > https://github.com/markrmiller/solr-map-reduce-example), it generates
>> > indexes on HDFS.
>> >
>> > The last stage is Go Live, which merges the generated index into the
>> > live SolrCloud index.
>> > If the live SolrCloud writes its index to the local file system (rather
>> > than HDFS), the Go Live stage gives an error like this:
>> >
>> > 2014-07-02 13:41:01,518 INFO org.apache.solr.hadoop.GoLive: Live merge
>> > hdfs://bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00000
>> > into http://bdvs087.test.com:8983/solr
>> > 2014-07-02 13:41:01,796 ERROR org.apache.solr.hadoop.GoLive: Error
>> > sending live merge command
>> > java.util.concurrent.ExecutionException:
>> > org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>> > directory
>> > '/opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index'
>> > does not exist
>> >     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:233)
>> >     at java.util.concurrent.FutureTask.get(FutureTask.java:94)
>> >     at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
>> >     at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
>> >     at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
>> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >     at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>> >     at java.lang.reflect.Method.invoke(Method.java:611)
>> >     at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
>> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>> >     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:434)
>> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>> >     at java.security.AccessController.doPrivileged(AccessController.java:310)
>> >     at javax.security.auth.Subject.doAs(Subject.java:573)
>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
>> >     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> > Caused by:
>> > org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>> > directory
>> > '/opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/0000088-140618120223665-oozie-oozi-W/results/part-00001/data/index'
>> > does not exist
>> >     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
>> >     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
>> >     at org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
>> >     at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)
>> >     at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)
>> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:149)
>> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
>> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:149)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
>> >     at java.lang.Thread.run(Thread.java:738)
>> >
>> > Any way to set up SolrCloud to write its index to the local file system,
>> > while allowing the Solr MapReduceIndexerTool's GoLive to merge an index
>> > generated on HDFS into the SolrCloud?
>> >
>> > Thanks,
>> > Tom
>>
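[Editor's note: the MERGEINDEXES request quoted in the MRIT debug log earlier in the thread can be reconstructed with the short sketch below. The helper function name is hypothetical (not part of any Solr or MRIT API); the host, path, and parameter values are taken from the quoted log line. It simply shows how the indexDir value gets percent-encoded into the Core Admin URL.]

```python
from urllib.parse import urlencode

def merge_indexes_url(solr_base, core, index_dir):
    """Build a Core Admin MERGEINDEXES request URL of the shape MRIT logs.

    urlencode() percent-encodes the hdfs:// scheme and path separators,
    producing the indexDir=hdfs%3A%2F%2F... form seen in the debug log.
    """
    params = {
        "action": "MERGEINDEXES",
        "core": core,
        "indexDir": index_dir,
        "wt": "javabin",
        "version": "2",
    }
    return solr_base + "/admin/cores?" + urlencode(params)

# Values copied from the MRIT debug log quoted above:
print(merge_indexes_url(
    "/solr",
    "collection1",
    "hdfs://hdtest041.test.com:9000/outdir_webaccess_app/results/part-00000/data/index",
))
```

Note how the whole hdfs:// URL travels inside a single indexDir parameter: a core backed by a plain local directory factory treats that value as a relative path under its own data directory, which is consistent with the mangled '/opt/testdir/solr/node/hdfs:/...' directory in the stack trace above.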