Hi,

please note that the -Ppyspark flag will download the full Spark binary
distribution, so the first build may take a while.

That was the main reason for hiding it behind a separate profile.
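For reference, a typical build invocation with the profile enabled looks like the line below (the Spark and Hadoop versions are just examples; match them to your cluster):

```shell
# The first run with -Ppyspark downloads the full Spark binary
# distribution, so expect it to take a while.
mvn clean package -Pspark-1.3 -Ppyspark -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests
```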



On Thu, Jul 9, 2015 at 11:53 PM, moon soo Lee <m...@apache.org> wrote:
> You can still manually configure all the environment variables and
> properties for pyspark, but building with -Ppyspark is the recommended
> approach from now on.
>
> Thanks,
> moon
>
>
> On Wed, Jul 8, 2015 at 10:59 PM IT CTO <goi....@gmail.com> wrote:
>>
>> Does this mean that everyone who wants PySpark to work should use this
>> option in the build from now on, or is it going to become the default,
>> like Spark 1.4?
>> Eran
>>
>> On Thu, Jul 9, 2015 at 12:14 AM moon soo Lee <m...@apache.org> wrote:
>>>
>>> Is your source code older than 3 days? The -Ppyspark profile was merged
>>> about 3 days ago.
>>>
>>> Thanks,
>>> moon
>>>
>>>
>>> On Wed, Jul 8, 2015 at 1:58 PM Vadla, Karthik <karthik.va...@intel.com>
>>> wrote:
>>>>
>>>> I’m using this .zip https://github.com/apache/incubator-zeppelin
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Karthik
>>>>
>>>>
>>>>
>>>> From: moon soo Lee [mailto:m...@apache.org]
>>>> Sent: Wednesday, July 8, 2015 1:37 PM
>>>> To: users@zeppelin.incubator.apache.org
>>>> Subject: Re: Not able to see registered table records and Pyspark not
>>>> working
>>>>
>>>>
>>>>
>>>> Are you building on latest master?
>>>>
>>>> On Wed, Jul 8, 2015 at 1:34 PM Vadla, Karthik <karthik.va...@intel.com>
>>>> wrote:
>>>>
>>>> Hi Moon,
>>>>
>>>>
>>>>
>>>> Yes, I tried the command below. The build was successful, but at the
>>>> end I got the warning message below:
>>>>
>>>> [WARNING] The requested profile "pyspark" could not be activated because
>>>> it does not exist.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> PySpark exists on the machine. Do I need to do anything further?
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Karthik
>>>>
>>>> From: moon soo Lee [mailto:m...@apache.org]
>>>> Sent: Wednesday, July 8, 2015 10:58 AM
>>>>
>>>>
>>>> To: users@zeppelin.incubator.apache.org
>>>> Subject: Re: Not able to see registered table records and Pyspark not
>>>> working
>>>>
>>>>
>>>>
>>>> Hi
>>>>
>>>>
>>>>
>>>> I meant adding the -Ppyspark profile, like:
>>>>
>>>>
>>>>
>>>> mvn clean package -Pspark-1.3 -Ppyspark -Dhadoop.version=2.6.0-cdh5.4.0
>>>> -Phadoop-2.6 -DskipTests
>>>>
>>>> Thanks,
>>>>
>>>> moon
>>>>
>>>> On Wed, Jul 8, 2015 at 10:43 AM Vadla, Karthik <karthik.va...@intel.com>
>>>> wrote:
>>>>
>>>> Hi Moon,
>>>>
>>>>
>>>>
>>>> You mean I need to build something like this?
>>>>
>>>> mvn clean package -Ppyspark-1.3 -Dhadoop.version=2.6.0-cdh5.4.0
>>>> -Phadoop-2.6 -DskipTests
>>>>
>>>>
>>>>
>>>> I previously built my Zeppelin with the command below:
>>>>
>>>> mvn clean package -Pspark-1.3 -Dhadoop.version=2.6.0-cdh5.4.0
>>>> -Phadoop-2.6 -DskipTests
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Karthik
>>>>
>>>> From: moon soo Lee [mailto:m...@apache.org]
>>>> Sent: Wednesday, July 8, 2015 10:20 AM
>>>> To: users@zeppelin.incubator.apache.org
>>>> Subject: Re: Not able to see registered table records and Pyspark not
>>>> working
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> If you build the latest master branch with the -Ppyspark maven profile,
>>>> pyspark will work without setting those environment variables.
>>>>
>>>> Hope this helps.
>>>>
>>>>
>>>>
>>>> Best,
>>>>
>>>> moon
>>>>
>>>>
>>>>
>>>> On Tue, Jul 7, 2015 at 3:47 PM Vadla, Karthik <karthik.va...@intel.com>
>>>> wrote:
>>>>
>>>> Hi All,
>>>>
>>>>
>>>>
>>>> This part is commented out in zeppelin-env.sh in my conf folder.
>>>>
>>>>
>>>>
>>>> # Pyspark (supported with Spark 1.2.1 and above)
>>>>
>>>> # To configure pyspark, you need to set spark distribution's path to
>>>> 'spark.home' property in Interpreter setting screen in Zeppelin GUI
>>>>
>>>> # export PYSPARK_PYTHON          # path to the python command. must be
>>>> the same path on the driver(Zeppelin) and all workers.
>>>>
>>>> # export PYTHONPATH              # extra PYTHONPATH.
>>>>
>>>>
>>>>
>>>> Can anyone help with how to set those up?
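For completeness, a minimal zeppelin-env.sh fragment along these lines might look as follows; the python path is a placeholder and the py4j zip name varies by Spark version, so treat both as assumptions to adapt:

```shell
# Path to the python command; must be the same on the driver (Zeppelin)
# and on all workers.
export PYSPARK_PYTHON=/usr/bin/python

# Extra PYTHONPATH so the pyspark and py4j modules shipped with Spark are
# importable; the py4j zip name depends on your Spark version.
export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
```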
>>>>
>>>>
>>>>
>>>> Appreciate your help.
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Karthik
>>>>
>>>>
>>>>
>>>> From: Vadla, Karthik [mailto:karthik.va...@intel.com]
>>>> Sent: Tuesday, July 7, 2015 3:29 PM
>>>> To: users@zeppelin.incubator.apache.org
>>>> Subject: RE: Not able to see registered table records and Pyspark not
>>>> working
>>>>
>>>>
>>>>
>>>> Hi Moon,
>>>>
>>>>
>>>>
>>>> Thanks for that.
>>>> The problem was with my parsing; I have resolved it.
>>>>
>>>>
>>>>
>>>> I have another question to ask.
>>>>
>>>> I’m just trying to run a print command using the pyspark interpreter.
>>>> It is not responding.
>>>>
>>>>
>>>>
>>>> When I look at the log, I see no information except this:
>>>>
>>>>
>>>>
>>>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41}
>>>> SchedulerFactory.java[jobStarted]:132) - Job
>>>> paragraph_1436305204170_601291630 started by scheduler
>>>> remoteinterpreter_267235421
>>>>
>>>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41}
>>>> Paragraph.java[jobRun]:194) - run paragraph 20150707-144004_475199059 using
>>>> pyspark org.apache.zeppelin.interpreter.LazyOpenInterpreter@33a625a7
>>>>
>>>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41}
>>>> Paragraph.java[jobRun]:211) - RUN : list=range(1,10)
>>>>
>>>> print(list)
>>>>
>>>> INFO [2015-07-07 15:19:18,060] ({Thread-255}
>>>> NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>>>
>>>> INFO [2015-07-07 15:19:18,678] ({Thread-255}
>>>> NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>>>
>>>> INFO [2015-07-07 15:19:19,278] ({Thread-255}
>>>> NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>>>
>>>> INFO [2015-07-07 15:19:19,879] ({Thread-255}
>>>> NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Do I need any configuration in zeppelin-env.sh or
>>>> zeppelin-site.xml?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Karthik
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> From: moon soo Lee [mailto:m...@apache.org]
>>>> Sent: Friday, July 3, 2015 2:31 PM
>>>> To: users@zeppelin.incubator.apache.org
>>>> Subject: Re: Not able to see registered table records
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> Could you try this branch?
>>>> https://github.com/apache/incubator-zeppelin/pull/136
>>>>
>>>>
>>>>
>>>> It'll give you better stacktrace than just displaying
>>>> "java.lang.reflect.InvocationTargetException"
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> moon
>>>>
>>>>
>>>>
>>>> On Thu, Jul 2, 2015 at 10:34 AM Vadla, Karthik <karthik.va...@intel.com>
>>>> wrote:
>>>>
>>>> Hi All,
>>>>
>>>>
>>>>
>>>> I just registered a table using the code below:
>>>>
>>>>
>>>>
>>>> val eduText =
>>>> sc.textFile("hdfs://ip.address/user/karthik/education.csv")
>>>>
>>>>
>>>>
>>>> case class Education(unitid:Integer, instnm:String, addr : String, city
>>>> : String, stabbr : String, zip : Integer)
>>>>
>>>>
>>>>
>>>> val education =
>>>> eduText.map(s=>s.split(",")).filter(s=>s(0)!="UNITID").map(
>>>>
>>>>     s=>Education(s(0).toInt,
>>>>
>>>>             s(1).replaceAll("\"", ""),
>>>>
>>>>             s(2).replaceAll("\"", ""),
>>>>
>>>>             s(3).replaceAll("\"", ""),
>>>>
>>>>             s(4).replaceAll("\"", ""),
>>>>
>>>>             s(5).replaceAll("\"", "").toInt
>>>>
>>>>         )
>>>>
>>>> )
>>>>
>>>>
>>>>
>>>> // The line below works only in Spark 1.3.0 and above.
>>>>
>>>> // For Spark 1.1.x and 1.2.x,
>>>>
>>>> // use education.registerTempTable("education") instead.
>>>>
>>>>
>>>>
>>>> education.toDF().registerTempTable("education")
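A side note on the parsing code above: splitting on a bare "," breaks as soon as a quoted field itself contains a comma, and because RDD transformations are lazy, the resulting toInt failure only surfaces later, when an action such as count(*) runs. A small illustration of the field-count mismatch, in plain Python with an invented sample line:

```python
import csv
import io

# Hypothetical CSV row whose second field contains a quoted comma.
line = '123,"Example College, Inc.","100 Main St","Springfield","IL",62704'

# Naive split: the comma inside the quoted name produces an extra field,
# shifting every later column (the zip ends up one position too late).
naive = line.split(",")

# A real CSV parser keeps the quoted field intact.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))   # 7 fields: one too many
print(len(parsed))  # 6 fields, as intended
```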
>>>>
>>>>
>>>>
>>>> When I run “%sql show tables”,
>>>>
>>>>
>>>>
>>>> it displays the table “education”.
>>>>
>>>>
>>>>
>>>> But when I try to run “%sql select count(*) from education”, it throws
>>>> the error below.
>>>>
>>>>
>>>>
>>>> java.lang.reflect.InvocationTargetException
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Can anyone help me with this?
>>>>
>>>> Appreciate your help.
>>>>
>>>>
>>>>
>>>> I have attached the .csv file used to register the table.
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Karthik



--
Kind regards,
Alexander.