Re: Zeppelin notes version control, scheduler and external deps

Asif Imran Sat, 05 Dec 2015 15:41:13 -0800

Hi Armen,

I remember having to make sure that /usr/lib/zeppelin/local-repo was owned
by the zeppelin user:


sudo chown zeppelin /usr/lib/zeppelin/local-repo
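
For reference, the two fixes discussed in this thread (the chown above and the
/var/lib symlink suggested below) could be combined roughly as follows. This is
only a sketch: the function name fix_local_repo and its three arguments are
made up for illustration, and it assumes it is run with enough privileges to
chown the directories involved.

```shell
#!/bin/sh
# Hypothetical helper (not part of Zeppelin or EMR): make the dependency
# repo writable by the given user.
#   $1 = path Zeppelin expects, e.g. /usr/lib/zeppelin/local-repo
#   $2 = real directory to create, e.g. /var/lib/zeppelin/local-repo
#   $3 = owner, e.g. zeppelin
fix_local_repo() {
    link=$1
    target=$2
    owner=$3

    # Case 1: the expected path is already a real directory, so only fix
    # its ownership.
    if [ -d "$link" ] && [ ! -L "$link" ]; then
        chown -R "$owner" "$link"
        return
    fi

    # Case 2: create the real directory and point the expected path at it.
    mkdir -p "$target" || return 1
    if [ ! -L "$link" ]; then
        ln -s "$target" "$link" || return 1
    fi

    # chown the real directory rather than the symlink itself.
    chown -R "$owner" "$target"
}

# Example invocation (as root), using the paths from this thread:
#   fix_local_repo /usr/lib/zeppelin/local-repo /var/lib/zeppelin/local-repo zeppelin
```

The symlink alone is not enough if the target directory is still owned by
another user, which may be why the NPE persisted after the ln -s step.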

Asif

On Sat, Dec 5, 2015 at 10:43 AM, armen donigian <[email protected]> wrote:

> Follow up to my previous email regarding loading of external jars & Null
> Pointer Exception (NPE).
>
> '/usr/lib/zeppelin/local-repo' doesn't exist for user 'hadoop' on
> master node. Is it supposed to?
> I created '/var/lib/zeppelin/local-repo', then 'ln -s
> /var/lib/zeppelin/local-repo /usr/lib/zeppelin/local-repo'...but still
> getting NPE error. Any suggestions?
>
> Btw, on an unrelated topic, does Zeppelin support a feature to email a
> user the output of a note? Just as a Unix process returns a status code,
> a Zeppelin note could return at minimum true (success) or false (failure).
>
>
> On Sat, Dec 5, 2015 at 12:18 AM Work <[email protected]> wrote:
>
>> 1. EMR does not currently provide anything like this for Zeppelin. (Good
>> idea though!) Zeppelin's built-in S3 notebook storage might help you,
>> especially if you turn on bucket versioning, I suppose, but I have not
>> tried this.
>>
>> 2. Yes, if you go to the ResourceManager on port 8088 then click the
>> ApplicationMaster link next to the Zeppelin app, you can get to the Spark
>> UI associated with the Zeppelin SparkContext (assuming you have first run a
>> notebook containing Spark code, otherwise the Zeppelin YARN app won't exist
>> yet).
>>
>> 3. Sorry, I have not tried using Zeppelin's notebook scheduler, but yes,
>> DataPipelines would probably provide you more reliability for production
>> batch ETL jobs. I don't know what your use case is, but maybe you could use
>> DataPipelines to generate some dataset that you store in S3 and can query
>> via Zeppelin?
>>
>> 4. This is a limitation of Zeppelin (really though, of Spark), not
>> specifically of Zeppelin on EMR, in that you must load any dependencies
>> before running any Spark code because the dependencies can only be loaded
>> once. However, once you solve this issue, you will run into a known issue
>> with Zeppelin on EMR where you hit a weird NPE that is caused by the
>> zeppelin user not having write access to /usr/lib/zeppelin/local-repo. I
>> would suggest creating /var/lib/zeppelin/local-repo then creating a symlink
>> from /usr/lib/zeppelin/local-repo to /var/lib/zeppelin/local-repo. We will
>> fix this in emr-4.3.0.
>>
>> ~ Jonathan
>>
>>
>>
>> On Fri, Dec 4, 2015 at 11:18 PM, armen donigian <[email protected]>
>> wrote:
>>
>>> Hi all,
>>> Installed Zeppelin on Amazon EMR and it's running swell. Had a few
>>> questions...
>>>
>>> 1. How do we version control Zeppelin notes?
>>>
>>> 2. How do you check for status of a long running Zeppelin task? Is there
>>> a web UI for this or do you simply check the Resource Manager UI
>>> @master-node:8088 (in case of AWS)?
>>>
>>> 3. Are there any known issues/limitations of running Zeppelin note
>>> scheduler in production for batch ETL jobs? Trying to assess it vs Amazon
>>> Data Pipelines.
>>>
>>> 4. When trying to add an external jar, I'm getting this error:
>>>
>>> %dep
>>> z.reset()
>>> z.load("com.databricks:spark-redshift_2.10:0.5.2")
>>>
>>> Must be used before SparkInterpreter (%spark) initialized
>>>
>>> Thanks
>>>
>>
>>
