By the way, which version of Hadoop do you recommend using with Gora, HBase and Nutch?

Thanks,
Tom

On Sat, Feb 20, 2016 at 10:59 PM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:

> This is a Nutch issue and always has been.
> Please go to the Nutch user@ list; it is a Nutch configuration issue, that is all.
>
>
> On Saturday, February 20, 2016, Tom Running <runningt...@gmail.com> wrote:
>
>> Lewis and Furkan,
>>
>> Thank you both for kindly explaining and providing great tips to help
>> me get Nutch, Gora and HBase working. I can see Nutch's crawl data in
>> HBase under the webpage table by running scan 'webpage' within the hbase
>> shell. Thank you.
>>
>> I am still trying to get Solr to work.
>> After I ran this command:
>> ./nutch solrindex http://localhost:8983/solr -all
>>
>> ****** it came back with the following info *****
>> ****** doesn't seem to have any problem there ****
>> IndexingJob: starting
>> Active IndexWriters :
>> SOLRIndexWriter
>> solr.server.url : URL of the SOLR instance (mandatory)
>> solr.commit.size : buffer size when sending to SOLR (default 1000)
>> solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
>> solr.auth : use authentication (default false)
>> solr.auth.username : username for authentication
>> solr.auth.password : password for authentication
>> IndexingJob: done.
>>
>> *** it doesn't seem to have any errors********************
>>
>> However, when I launch the Solr web UI I cannot query or find
>> anything under the default collection1 or under
>> gettingstarted_shard1_replica1 or gettingstarted_shard2_replica1.
>>
>> I have also tried this variant (with collection1) and am still not
>> able to query anything:
>> ./nutch solrindex http://localhost:8983/solr/collection1 -all
>>
>> I downloaded Solr 4.10.3 and started it as-is with the command
>> /home/solr/bin/solr start -e cloud -noprompt
>>
>> I did not modify any configuration file or post any file or directory
>> from within Solr.
>> I am assuming this command
>> ./nutch solrindex http://localhost:8983/solr/collection1
>> will do all the posting and indexing for Solr.
>>
>> Any idea what I am missing here? Do I need to do anything on the
>> Solr side for this to work?
>>
>> Thank you very much.
>> Tom
>>
>> On Sat, Feb 20, 2016 at 4:07 AM, Furkan KAMACI <furkankam...@gmail.com>
>> wrote:
>>
>>> Hi Tom,
>>>
>>> Download and configure both HBase and Solr and bring them up. You do not
>>> need to build Gora in your case (nor HBase or Solr). It is a
>>> dependency included in Nutch.
>>>
>>> Nutch will crawl webpages and use Gora as a backend system to
>>> communicate with HBase and Solr.
>>>
>>> Kind Regards,
>>> Furkan KAMACI
>>> On 20 Feb 2016 at 10:45, "Tom Running" <runningt...@gmail.com> wrote:
>>>
>>> I meant SOLR 4.10.3 instead of SOLR 2.X
>>>>
>>>> On Sat, Feb 20, 2016 at 3:44 AM, Tom Running <runningt...@gmail.com>
>>>> wrote:
>>>>
>>>>> Great. Thank you.
>>>>>
>>>>> I am just wondering: how will building Gora help with anything in
>>>>> my situation? Probably not at all, right? It doesn't seem I need to
>>>>> use any of the built artifacts.
>>>>>
>>>>> It seems Gora is already included in the SOLR 2.X and HBase 0.98.9
>>>>> releases. Is this a correct assumption?
>>>>>
>>>>> Thank you.
>>>>> Tom
>>>>>
>>>>> On Sat, Feb 20, 2016 at 1:35 AM, Lewis John Mcgibbney <
>>>>> lewis.mcgibb...@gmail.com> wrote:
>>>>>
>>>>>> Hi Tom,
>>>>>> All you need to do is ensure that the gora-hbase dependency is
>>>>>> uncommented within $NUTCH_HOME/ivy/ivy.xml
>>>>>> https://github.com/apache/nutch/blob/2.x/ivy/ivy.xml#L116
>>>>>>
>>>>>> You then need to ensure that the storage.data.store.class is
>>>>>> correct in $NUTCH_HOME/conf/nutch-default.xml.
>>>>>> This needs to be set to
>>>>>> 'org.apache.gora.hbase.store.HBaseStore'
>>>>>>
>>>>>> https://github.com/apache/nutch/blob/2.x/conf/nutch-default.xml#L1333-L1371
>>>>>>
>>>>>> Finally, you need to configure $NUTCH_HOME/conf/gora.properties
>>>>>> https://github.com/apache/nutch/blob/2.x/conf/gora.properties
>>>>>> Make sure that the correct gora-hbase configuration is included.
>>>>>>
>>>>>> That is all you need to do.
>>>>>> Lewis
>>>>>>
>>>>>> On Fri, Feb 19, 2016 at 10:29 PM, Tom Running <runningt...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Furkan,
>>>>>>>
>>>>>>> What you mentioned is exactly what I am trying to accomplish.
>>>>>>> > Using Nutch to crawl websites and storing them at Hbase and
>>>>>>> indexing at Solr via Gora?
>>>>>>>
>>>>>>> I need a bit more help to make sure what I am about to do is correct.
>>>>>>>
>>>>>>> #1.
>>>>>>> After successfully building Gora, I have the following two .jar files
>>>>>>> in the /gora/gora-solr/lib/ directory. There are lots of .jar files in
>>>>>>> the /lib directory, but only two .jar files related to Solr:
>>>>>>> solr-solrj-4.10.3.jar
>>>>>>> solr-core-4.10.3.jar
>>>>>>>
>>>>>>> #2.
>>>>>>> In the Solr source distribution directory I also see the exact same
>>>>>>> .jar files. This is a source code download. I have not built this
>>>>>>> Solr yet.
>>>>>>>
>>>>>>> /home/solr/dist
>>>>>>> solr-solrj-4.10.3.jar
>>>>>>> solr-core-4.10.3.jar
>>>>>>> solr-4.10.3.war
>>>>>>>
>>>>>>> My question is: should I copy the two Solr files in #1 to
>>>>>>> /home/solr/dist/ and then build Solr?
>>>>>>>
>>>>>>> #3.
>>>>>>> Should I also do the same thing for HBase: copy
>>>>>>> /gora/gora-hbase/lib/hbase-* into /hbase/lib/ and then build HBase?
>>>>>>>
>>>>>>> Thank you.
>>>>>>> Tom
>>>>>>>
>>>>>>> On Wed, Feb 17, 2016 at 5:31 PM, Furkan KAMACI <
>>>>>>> furkankam...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Tom,
>>>>>>>>
>>>>>>>> What is your aim?
>>>>>>>> Using Nutch to crawl websites and storing them at
>>>>>>>> Hbase and indexing at Solr via Gora? Do you have any other use cases?
>>>>>>>>
>>>>>>>> Put simply, you can think of Gora as acting like the Hibernate of the
>>>>>>>> NoSQL ecosystem in your use case. So it will not run as a service; it
>>>>>>>> will be a dependency.
>>>>>>>>
>>>>>>>> Kind Regards,
>>>>>>>> Furkan KAMACI
>>>>>>>> On 17 Feb 2016 at 22:13, "Lewis John Mcgibbney" <
>>>>>>>> lewis.mcgibb...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Tom,
>>>>>>>>> You can just follow this tutorial:
>>>>>>>>> http://wiki.apache.org/nutch/Nutch2Tutorial
>>>>>>>>> replacing the gora-hbase configuration in your Nutch
>>>>>>>>> conf/nutch-default.xml and conf/gora.properties, and the relevant
>>>>>>>>> dependency in ivy/ivy.xml, with the gora-solr equivalent.
>>>>>>>>> If you have any more issues, please let us know. No, Gora does not run
>>>>>>>>> as a service; it is a dependency and is managed through your client
>>>>>>>>> dependency manager (which in Nutch 2.X is Ivy).
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Wed, Feb 17, 2016 at 12:04 PM, Tom Running <
>>>>>>>>> runningt...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Furkan and Lewis,
>>>>>>>>>>
>>>>>>>>>> Thank you for your response to my SOS. I tried various suggestions,
>>>>>>>>>> including editing the pom.xml file and downgrading the Java JDK
>>>>>>>>>> version to 1.7, then removed the .m2 folder and ran
>>>>>>>>>> mvn clean install again, and it built successfully.
>>>>>>>>>>
>>>>>>>>>> Now that Gora is successfully built, I am trying to understand how to
>>>>>>>>>> run or start Gora so that the following three packages work
>>>>>>>>>> together with it: Nutch, Solr and HBase.
>>>>>>>>>> Does Gora start as a service?
>>>>>>>>>> Or, to get the other three packages to work with Gora, will I need to
>>>>>>>>>> copy the *.jar files into the lib folders of the three packages
>>>>>>>>>> (Nutch, Solr and HBase)?
>>>>>>>>>>
>>>>>>>>>> *I am a bit confused about how to get these packages to work with
>>>>>>>>>> Gora. I have read Gora's quickstart guide but am still not too clear
>>>>>>>>>> on what to do.*
>>>>>>>>>>
>>>>>>>>>> *Can you provide some direction?*
>>>>>>>>>>
>>>>>>>>>> *Thank you.*
>>>>>>>>>>
>>>>>>>>>> *Tom*
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 17, 2016 at 1:56 PM, Furkan KAMACI <
>>>>>>>>>> furkankam...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Tom,
>>>>>>>>>>>
>>>>>>>>>>> It seems that your Maven is in offline mode. There may be a
>>>>>>>>>>> problem with your settings.xml or the environment variable for the
>>>>>>>>>>> Maven home. How do you build your project? Could you build it with
>>>>>>>>>>> the -X option and send the output?
>>>>>>>>>>>
>>>>>>>>>>> Kind Regards,
>>>>>>>>>>> Furkan KAMACI
>>>>>>>>>>> On 17 Feb 2016 at 20:51, "Tom Running" <runningt...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> What should I do about the error below?
>>>>>>>>>>>
>>>>>>>>>>> [INFO] Building Apache Gora :: Accumulo 0.6.1
>>>>>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>>>>>> [WARNING] The POM for org.apache.accumulo:accumulo-core:jar:1.5.1 is missing, no dependency information available
>>>>>>>>>>> [WARNING] The POM for org.apache.accumulo:accumulo-minicluster:jar:1.5.1 is missing, no dependency information available
>>>>>>>>>>> [WARNING] The POM for org.jboss.netty:netty:jar:3.2.2.Final is missing, no dependency information available
>>>>>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>>>>>> [INFO] Reactor Summary:
>>>>>>>>>>> [INFO]
>>>>>>>>>>> [INFO] Apache Gora ........................................ SUCCESS [ 1.468 s]
>>>>>>>>>>> [INFO] Apache Gora :: Compiler ............................ SUCCESS [ 0.121 s]
>>>>>>>>>>> [INFO] Apache Gora :: Compiler-CLI ........................ SUCCESS [ 0.032 s]
>>>>>>>>>>> [INFO] Apache Gora :: Shims Hadoop ........................ SUCCESS [ 0.543 s]
>>>>>>>>>>> [INFO] Apache Gora :: Shims Hadoop 1.x .................... SUCCESS [ 0.190 s]
>>>>>>>>>>> [INFO] Apache Gora :: Shims Hadoop 2.x .................... SUCCESS [ 0.295 s]
>>>>>>>>>>> [INFO] Apache Gora :: Shims Distribution .................. SUCCESS [ 0.026 s]
>>>>>>>>>>> [INFO] Apache Gora :: Core ................................ SUCCESS [ 0.806 s]
>>>>>>>>>>> [INFO] Apache Gora :: Accumulo ............................ FAILURE [ 0.120 s]
>>>>>>>>>>> [INFO] Apache Gora :: Cassandra ........................... SKIPPED
>>>>>>>>>>> [INFO] Apache Gora :: GoraCI .............................. SKIPPED
>>>>>>>>>>> [INFO] Apache Gora :: HBase ............................... SKIPPED
>>>>>>>>>>> [INFO] Apache Gora :: MongoDB ............................. SKIPPED
>>>>>>>>>>> [INFO] Apache Gora :: Solr ................................ SKIPPED
>>>>>>>>>>> [INFO] Apache Gora :: Tutorial ............................ SKIPPED
>>>>>>>>>>> [INFO] Apache Gora :: Sources-Dist ........................ SKIPPED
>>>>>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>>>>>> [INFO] BUILD FAILURE
>>>>>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>>>>>> [INFO] Total time: 6.359 s
>>>>>>>>>>> [INFO] Finished at: 2016-02-17T02:00:39-05:00
>>>>>>>>>>> [INFO] Final Memory: 25M/61M
>>>>>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>>>>>> [ERROR] Failed to execute goal on project gora-accumulo: Could
>>>>>>>>>>> not resolve dependencies for project
>>>>>>>>>>> org.apache.gora:gora-accumulo:bundle:0.6.1: The following artifacts
>>>>>>>>>>> could not be resolved: org.apache.gora:gora-core:jar:0.6.1,
>>>>>>>>>>> org.apache.gora:gora-core:jar:tests:0.6.1,
>>>>>>>>>>> org.apache.accumulo:accumulo-core:jar:1.5.1,
>>>>>>>>>>> org.apache.accumulo:accumulo-minicluster:jar:1.5.1,
>>>>>>>>>>> jline:jline:jar:0.9.1,
>>>>>>>>>>> org.jboss.netty:netty:jar:3.2.2.Final,
>>>>>>>>>>> org.codehaus.jackson:jackson-jaxrs:jar:1.8.3,
>>>>>>>>>>> org.codehaus.jackson:jackson-xc:jar:1.8.3: Cannot access central
>>>>>>>>>>> (https://repo.maven.apache.org/maven2) in offline mode and the
>>>>>>>>>>> artifact org.apache.gora:gora-core:jar:0.6.1 has not been
>>>>>>>>>>> downloaded from it before. -> [Help 1]
>>>>>>>>>>> [ERROR]
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Lewis*
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Lewis*
>>>>>>
>>>>>
>>>>
>>
>
> --
> *Lewis*
>
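
The storage setting Lewis describes earlier in the thread can be sketched as the fragment below. This is a minimal sketch, not a verified configuration: the property name and value are the ones given in the thread, while placing the override in conf/nutch-site.xml rather than editing conf/nutch-default.xml directly is an assumption, based on the usual Nutch convention of keeping local overrides in nutch-site.xml.

```xml
<?xml version="1.0"?>
<!-- Assumed location: $NUTCH_HOME/conf/nutch-site.xml (local overrides). -->
<configuration>
  <property>
    <!-- Tells Nutch 2.x which Gora datastore implementation to use. -->
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
  </property>
</configuration>
```

The gora-hbase dependency in ivy/ivy.xml and the matching gora-hbase entries in conf/gora.properties still need to be enabled as described above; they are not shown here.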