Hi David, Jerry,

There is a series of ongoing efforts to improve the Spark integration.

Work with provided version of Spark
https://issues.apache.org/jira/browse/ZEPPELIN-160

Self diagnostics of configuration
https://issues.apache.org/jira/browse/ZEPPELIN-256

Use spark-submit to run spark interpreter process
https://issues.apache.org/jira/browse/ZEPPELIN-262

On the mailing list I have seen many people struggle with configuring Spark
in Zeppelin across various environments.
ZEPPELIN-262 should solve virtually all of the configuration problems
around Spark.

Thanks for sharing your problems and feedback. That is what enables
Zeppelin to make progress.

Best,
moon

On Mon, Aug 31, 2015 at 9:17 PM Jerry Lam <chiling...@gmail.com> wrote:

> Hi David,
>
> We gave up on Zeppelin because of the lack of support. It seems that
> Zeppelin has a lot of fancy features but lacks depth. Only time will tell
> whether Zeppelin can overcome those limitations.
>
> Good luck,
>
> Jerry
>
> On Mon, Aug 31, 2015 at 8:17 AM, David Salinas <
> david.salinas....@gmail.com> wrote:
>
>> Hi all,
>>
>> Has anyone been able to reproduce the error with the last code snippet I
>> gave? It fails 100% of the time on the cluster for me.
>> This serialization error complaining about ZeppelinContext also shows up
>> in many other cases in my setup where it should not, since the same code
>> works fine in the Spark shell.
>>
>> Best regards,
>>
>> David
>>
>> On Mon, Aug 24, 2015 at 9:07 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>
>>> Hi Zeppelin developers,
>>>
>>> This issue sounds very serious. Is this specific to David's use case
>>> here?
>>>
>>> Best Regards,
>>>
>>> Jerry
>>>
>>> On Mon, Aug 24, 2015 at 1:28 PM, David Salinas <
>>> david.salinas....@gmail.com> wrote:
>>>
>>>> I have looked at the SparkInterpreter.java code and this is indeed the
>>>> issue. Whenever a paragraph uses an instruction such as z.input("..."),
>>>> no Spark transformation can work, because z gets shipped to the slaves,
>>>> where Zeppelin is not installed, as shown by the example I sent.
>>>> A workaround could be to interpret the variables separately (by
>>>> defining a map of variables before interpreting the paragraph).
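>>>>
>>>> A rough sketch of what I mean (purely illustrative; "intp" and the
>>>> bind call are hypothetical names, not the actual SparkInterpreter
>>>> code):
>>>>
>>>> // resolve form values on the driver before interpreting the paragraph
>>>> val inputs: Map[String, String] =
>>>>   Map("Foo" -> z.input("Foo").toString)
>>>> // bind only the plain, serializable map into the REPL, so user
>>>> // closures never need to capture ZeppelinContext itself
>>>> intp.bind("inputs", "scala.collection.immutable.Map[String,String]", inputs)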
>>>>
>>>> Best,
>>>>
>>>> David
>>>>
>>>>
>>>> On Mon, Aug 24, 2015 at 6:45 PM, David Salinas <
>>>> david.salinas....@gmail.com> wrote:
>>>>
>>>>> Hi Moon,
>>>>>
>>>>> I found another way to reproduce the problem:
>>>>>
>>>>> //cell 1 does not work
>>>>> val file = "hdfs://someclusterfile.json"
>>>>> val s = z.input("Foo").toString
>>>>> val textFile = sc.textFile(file)
>>>>> textFile.filter(_.contains(s)).count
>>>>> // org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>> // Task 41 in stage 5.0 failed 4 times, most recent failure: Lost task
>>>>> // 41.3 in stage 5.0 (TID 2735, XXX.com): java.lang.NoClassDefFoundError:
>>>>> // Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>
>>>>> // cell 2 works
>>>>> val file = "hdfs://someclusterfile.json"
>>>>> val s = "Y"
>>>>> val textFile = sc.textFile(file)
>>>>> textFile.filter(_.contains(s)).count
>>>>> //res19: Long = 109
>>>>>
>>>>> This kind of issue also happens often when using variables from other
>>>>> cells, and when a closure is taken for a transformation. Maybe you are
>>>>> reading variables inside the transformation with something like
>>>>> z.get("s"), which causes z to be sent to the slaves because one of its
>>>>> members is used (although I also sometimes hit this issue without using
>>>>> anything from other cells).
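>>>>>
>>>>> To illustrate the capture pattern I mean (a hand-written sketch, not
>>>>> taken from my notebooks):
>>>>>
>>>>> // captures z: z.get("s") is evaluated inside the closure, so the
>>>>> // whole ZeppelinContext must be serialized and shipped to executors
>>>>> textFile.filter(_.contains(z.get("s").toString)).count
>>>>>
>>>>> // the usual fix: evaluate once on the driver so only the String is
>>>>> // captured (though, as above, this sometimes still fails for me,
>>>>> // presumably because the generated REPL line object itself keeps a
>>>>> // reference to z)
>>>>> val s = z.get("s").toString
>>>>> textFile.filter(_.contains(s)).count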
>>>>>
>>>>> Best,
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> On Mon, Aug 24, 2015 at 10:34 AM, David Salinas <
>>>>> david.salinas....@gmail.com> wrote:
>>>>>
>>>>>> Sorry I forgot to mention my environment:
>>>>>> Mesos 0.17, Spark 1.4.1, Scala 2.10.4, Java 1.8
>>>>>>
>>>>>> On Mon, Aug 24, 2015 at 10:32 AM, David Salinas <
>>>>>> david.salinas....@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Moon,
>>>>>>>
>>>>>>> Today I cannot reproduce the bug with an elementary example either,
>>>>>>> but it is still impacting all my notebooks. The weird thing is that
>>>>>>> when calling a transformation with map, the ZeppelinContext is pulled
>>>>>>> into the closure, which gives these java.lang.NoClassDefFoundError:
>>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext errors (the Spark shell
>>>>>>> runs the same command without any problem). I will try to find another
>>>>>>> example that fails more reliably (it is weird that this example was
>>>>>>> failing yesterday). Do you have any idea of what could cause the
>>>>>>> ZeppelinContext to be included in the closure?
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 21, 2015 at 6:29 PM, moon soo Lee <m...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I have tested your code and cannot reproduce the problem.
>>>>>>>>
>>>>>>>> Could you share your environment? How did you configure Zeppelin
>>>>>>>> with Spark?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> moon
>>>>>>>>
>>>>>>>> On Fri, Aug 21, 2015 at 2:25 AM David Salinas <
>>>>>>>> david.salinas....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have a problem when using Spark closures. This error did not
>>>>>>>>> appear with Spark 1.2.1.
>>>>>>>>>
>>>>>>>>> I have included a reproducible example that fails when the closure
>>>>>>>>> is taken (Zeppelin was built from the head of master with this
>>>>>>>>> command: mvn install -DskipTests -Pspark-1.4 -Dspark.version=1.4.1
>>>>>>>>> -Dhadoop.version=2.2.0 -Dprotobuf.version=2.5.0). Has anyone ever
>>>>>>>>> encountered this problem? All my previous notebooks are broken by
>>>>>>>>> this :(
>>>>>>>>>
>>>>>>>>> ------------------------------
>>>>>>>>> val textFile = sc.textFile("hdfs://somefile.txt")
>>>>>>>>>
>>>>>>>>> val f = (s: String) => s+s
>>>>>>>>> textFile.map(f).count
>>>>>>>>> //works fine
>>>>>>>>> //res145: Long = 407
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> def f(s: String) = {
>>>>>>>>>   s + s
>>>>>>>>> }
>>>>>>>>> textFile.map(f).count
>>>>>>>>>
>>>>>>>>> //fails ->
>>>>>>>>>
>>>>>>>>> org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>>>>>> Task 566 in stage 87.0 failed 4 times, most recent failure: Lost task
>>>>>>>>> 566.3 in stage 87.0 (TID 43396, XXX.com): java.lang.NoClassDefFoundError:
>>>>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>>>>>   at java.lang.Class.getDeclaredFields0(Native Method)
>>>>>>>>>   at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
>>>>>>>>>   at java.lang.Class.getDeclaredField(Class.java:2068)
>>>>>>>>>   ...
>>>>>>>>>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>>>>>>>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>>>>>>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>>>>>>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>>>>>>>>   ...
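>>>>>>>>>
>>>>>>>>> My working theory (an educated guess, not verified against the
>>>>>>>>> interpreter source): a def in the REPL is compiled as a method of
>>>>>>>>> the generated line object, so map(f) has to serialize that object
>>>>>>>>> and everything it references, including the injected
>>>>>>>>> ZeppelinContext, while a function val is a standalone Function1
>>>>>>>>> that serializes on its own:
>>>>>>>>>
>>>>>>>>> // ships only the anonymous Function1 instance
>>>>>>>>> val g = (s: String) => s + s
>>>>>>>>> textFile.map(g).count
>>>>>>>>>
>>>>>>>>> // map(f) eta-expands against the enclosing REPL object, dragging
>>>>>>>>> // it (and its reference to z) into the closure
>>>>>>>>> def f(s: String) = s + s
>>>>>>>>> textFile.map(f).count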
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
