Hi, please mind that the -Ppyspark flag will download the full Spark binary distribution, so it might take a while the first time you run it.
That was the main reason for hiding it behind a separate profile.

On Thu, Jul 9, 2015 at 11:53 PM, moon soo Lee <m...@apache.org> wrote:
> You can still manually configure all the environment variables and
> properties for pyspark, but it is suggested to build with -Ppyspark from
> now on.
>
> Thanks,
> moon
>
> On Wed, Jul 8, 2015 at 10:59 PM IT CTO <goi....@gmail.com> wrote:
>>
>> Does this mean that everyone who wants pySpark to work should use this
>> option in the build from now on, or is that going to be the default, like
>> Spark 1.4?
>> Eran
>>
>> On Thu, Jul 9, 2015 at 12:14 AM moon soo Lee <m...@apache.org> wrote:
>>>
>>> Is your source code older than 3 days? -Ppyspark was merged
>>> about 3 days ago.
>>>
>>> Thanks,
>>> moon
>>>
>>> On Wed, Jul 8, 2015 at 1:58 PM Vadla, Karthik <karthik.va...@intel.com>
>>> wrote:
>>>>
>>>> I'm using this .zip: https://github.com/apache/incubator-zeppelin
>>>>
>>>> Thanks
>>>> Karthik
>>>>
>>>> From: moon soo Lee [mailto:m...@apache.org]
>>>> Sent: Wednesday, July 8, 2015 1:37 PM
>>>> To: users@zeppelin.incubator.apache.org
>>>> Subject: Re: Not able to see registered table records and Pyspark not working
>>>>
>>>> Are you building on latest master?
>>>>
>>>> On Wed, Jul 8, 2015 at 1:34 PM Vadla, Karthik <karthik.va...@intel.com>
>>>> wrote:
>>>>
>>>> Hi Moon,
>>>>
>>>> Yeah, I tried the below command. The build was successful, but at the end I
>>>> got a warning message:
>>>>
>>>> [WARNING] The requested profile "pyspark" could not be activated because
>>>> it does not exist.
>>>>
>>>> Pyspark exists on the machine. Do I need to do anything further?
>>>>
>>>> Thanks
>>>> Karthik
>>>>
>>>> From: moon soo Lee [mailto:m...@apache.org]
>>>> Sent: Wednesday, July 8, 2015 10:58 AM
>>>> To: users@zeppelin.incubator.apache.org
>>>> Subject: Re: Not able to see registered table records and Pyspark not working
>>>>
>>>> Hi,
>>>>
>>>> I meant adding the -Ppyspark profile, like:
>>>>
>>>> mvn clean package -Pspark-1.3 -Ppyspark -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Wed, Jul 8, 2015 at 10:43 AM Vadla, Karthik <karthik.va...@intel.com>
>>>> wrote:
>>>>
>>>> Hi Moon,
>>>>
>>>> You mean to say I need to build something like this:
>>>>
>>>> mvn clean package -Ppyspark-1.3 -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests
>>>>
>>>> I had built my Zeppelin with the below command previously:
>>>>
>>>> mvn clean package -Pspark-1.3 -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests
>>>>
>>>> Thanks
>>>> Karthik
>>>>
>>>> From: moon soo Lee [mailto:m...@apache.org]
>>>> Sent: Wednesday, July 8, 2015 10:20 AM
>>>> To: users@zeppelin.incubator.apache.org
>>>> Subject: Re: Not able to see registered table records and Pyspark not working
>>>>
>>>> Hi,
>>>>
>>>> If you build the latest master branch with the -Ppyspark maven profile, it'll
>>>> let pyspark work without setting those environment variables.
>>>>
>>>> Hope this helps.
>>>>
>>>> Best,
>>>> moon
>>>>
>>>> On Tue, Jul 7, 2015 at 3:47 PM Vadla, Karthik <karthik.va...@intel.com>
>>>> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> This part is commented out in zeppelin-env.sh in my conf folder.
>>>>
>>>> # Pyspark (supported with Spark 1.2.1 and above)
>>>> # To configure pyspark, you need to set spark distribution's path to
>>>> # 'spark.home' property in Interpreter setting screen in Zeppelin GUI
>>>> # export PYSPARK_PYTHON    # path to the python command. must be the
>>>> #                          # same path on the driver(Zeppelin) and all workers.
>>>> # export PYTHONPATH        # extra PYTHONPATH.
>>>>
>>>> Can anyone help with how to set those up?
>>>>
>>>> Appreciate your help.
>>>>
>>>> Thanks
>>>> Karthik
>>>>
>>>> From: Vadla, Karthik [mailto:karthik.va...@intel.com]
>>>> Sent: Tuesday, July 7, 2015 3:29 PM
>>>> To: users@zeppelin.incubator.apache.org
>>>> Subject: RE: Not able to see registered table records and Pyspark not working
>>>>
>>>> Hi Moon,
>>>>
>>>> Thanks for that.
>>>> The problem was with my parsing. I resolved it.
>>>>
>>>> I have another question to ask.
>>>> I'm just trying to run a print command using the pyspark interpreter.
>>>> It is not responding.
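(As a side note on the commented-out zeppelin-env.sh section quoted above: when configuring manually instead of building with -Ppyspark, the two exports can be filled in roughly as below. The paths and the py4j version are assumptions for a typical Spark 1.3-era install, not values confirmed anywhere in this thread.)

```shell
# zeppelin-env.sh -- hypothetical values; adjust to your installation.

# PYSPARK_PYTHON must resolve to the same interpreter on the
# driver (Zeppelin) host and on every worker.
export PYSPARK_PYTHON=/usr/bin/python

# Put Spark's Python API and its bundled py4j zip on the PYTHONPATH.
# The py4j version varies with the Spark release.
export SPARK_HOME=/opt/spark
export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
```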
>>>>
>>>> When I look at the log, I don't see any information except this:
>>>>
>>>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41} SchedulerFactory.java[jobStarted]:132) - Job paragraph_1436305204170_601291630 started by scheduler remoteinterpreter_267235421
>>>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41} Paragraph.java[jobRun]:194) - run paragraph 20150707-144004_475199059 using pyspark org.apache.zeppelin.interpreter.LazyOpenInterpreter@33a625a7
>>>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41} Paragraph.java[jobRun]:211) - RUN : list=range(1,10)
>>>> print(list)
>>>> INFO [2015-07-07 15:19:18,060] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>>> INFO [2015-07-07 15:19:18,678] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>>> INFO [2015-07-07 15:19:19,278] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>>> INFO [2015-07-07 15:19:19,879] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>>>
>>>> Do I need to do any config settings in zeppelin-env.sh or zeppelin-site.xml?
>>>>
>>>> Thanks
>>>> Karthik
>>>>
>>>> From: moon soo Lee [mailto:m...@apache.org]
>>>> Sent: Friday, July 3, 2015 2:31 PM
>>>> To: users@zeppelin.incubator.apache.org
>>>> Subject: Re: Not able to see registered table records
>>>>
>>>> Hi,
>>>>
>>>> Could you try this branch?
>>>> https://github.com/apache/incubator-zeppelin/pull/136
>>>>
>>>> It'll give you a better stacktrace than just displaying
>>>> "java.lang.reflect.InvocationTargetException".
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Thu, Jul 2, 2015 at 10:34 AM Vadla, Karthik <karthik.va...@intel.com>
>>>> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I just registered a table using the code below:
>>>>
>>>> val eduText = sc.textFile("hdfs://ip.address/user/karthik/education.csv")
>>>>
>>>> case class Education(unitid: Integer, instnm: String, addr: String, city: String, stabbr: String, zip: Integer)
>>>>
>>>> val education = eduText.map(s => s.split(",")).filter(s => s(0) != "UNITID").map(
>>>>   s => Education(s(0).toInt,
>>>>     s(1).replaceAll("\"", ""),
>>>>     s(2).replaceAll("\"", ""),
>>>>     s(3).replaceAll("\"", ""),
>>>>     s(4).replaceAll("\"", ""),
>>>>     s(5).replaceAll("\"", "").toInt
>>>>   )
>>>> )
>>>>
>>>> // Below line works only in spark 1.3.0.
>>>> // For spark 1.1.x and spark 1.2.x,
>>>> // use bank.registerTempTable("bank") instead.
>>>> education.toDF().registerTempTable("education")
>>>>
>>>> When I run "%sql show tables", it displays the table "education".
>>>>
>>>> But when I try to run "%sql select count(*) from education", it throws the error below:
>>>>
>>>> java.lang.reflect.InvocationTargetException
>>>>
>>>> Can anyone help me with this?
>>>> Appreciate your help.
>>>>
>>>> I have enclosed the .csv file used to register the table.
>>>>
>>>> Thanks
>>>> Karthik

--
Kind regards,
Alexander.
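(A footnote on the parsing question resolved earlier in the thread: the Scala snippet above splits each CSV line on commas and strips the surrounding quotes by hand. A minimal sketch of the same logic in plain Python — no Spark needed, and the sample row is made up — makes the limitation easy to see: a quoted field that itself contains a comma would be split apart.)

```python
# Plain-Python sketch of the row parsing done in the Scala snippet above:
# split on commas, then strip the surrounding quotes from each field.
def parse_education(line):
    fields = [f.strip('"') for f in line.split(",")]
    return {
        "unitid": int(fields[0]),
        "instnm": fields[1],
        "addr":   fields[2],
        "city":   fields[3],
        "stabbr": fields[4],
        "zip":    int(fields[5]),
    }

# Hypothetical sample row in the same shape as education.csv.
sample = '100654,"Some University","123 Main St","Normal","AL",35762'
row = parse_education(sample)
print(row["instnm"])  # -> Some University
```

For real data, Python's csv module (or a proper CSV reader on the Spark side) handles quoted commas correctly; the naive split above, like the Scala version in the thread, does not.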