Which Hadoop version are you using? It seems the exception you got was
caused by an incompatible Hadoop version.
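That particular error ("Server IPC version 9 cannot communicate with client
version 4") usually means a Hadoop 2.x server talking to a Hadoop 1.x
client. If Tachyon was built against the default Hadoop 1.x client,
recompiling it against your cluster's Hadoop version may help; roughly
(the version number is just an example, substitute your cluster's):

  mvn -Dhadoop.version=2.4.0 clean package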

Best,

Haoyuan

On Wed, Dec 10, 2014 at 12:30 AM, 十六夜涙 <cr...@qq.com> wrote:

> Hi All,
> I've read the official Tachyon docs, and it doesn't seem to fit my usage.
> As I understand it, Tachyon just caches files in memory, but I have a file
> of over a million lines (about 70 MB), and retrieving the data and mapping
> it into a *Map* variable costs several minutes, which I don't want to
> repeat each time in the map function. Besides that, Tachyon raises another
> problem: an exception while running *./bin/tachyon format*.
> The exception:
> Exception in thread "main" java.lang.RuntimeException:
> org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot
> communicate with client version 4
> It seems there's a compatibility problem with Hadoop, but even if I solve
> it, there's still the efficiency issue I described above.
> Could somebody tell me how to persist the data in memory? For now I just
> broadcast it, and re-submit the Spark application when the broadcast value
> becomes unavailable.
>
>
>
> ------------------ Original Message ------------------
> *From:* "Akhil Das";<ak...@sigmoidanalytics.com>;
> *Sent:* Tuesday, December 9, 2014, 3:42 PM
> *To:* "十六夜涙"<cr...@qq.com>;
> *Cc:* "user"<u...@spark.incubator.apache.org>;
> *Subject:* Re: spark broadcast unavailable
>
> You cannot use the sc object (*val b = Utils.load(sc,ip_lib_path)*)
> inside a map function, and that's why the serialization exception is
> popping up (since sc is not serializable). You can try Tachyon's cache if
> you want to persist the data in memory more or less forever.
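>
> For instance, with Spark 1.x you can keep an RDD in Tachyon-backed
> off-heap storage so it survives across jobs. A minimal sketch (the
> tachyon:// URL, file path and csv layout are placeholders, assuming a
> Tachyon master on localhost:19998 and sc in scope):
>
>   import org.apache.spark.storage.StorageLevel
>
>   // Parse the 70 MB ip library once; OFF_HEAP keeps the blocks in
>   // Tachyon, outside the executor JVMs.
>   val ipLib = sc.textFile("tachyon://localhost:19998/ip_lib.csv")
>     .map(_.split(","))                      // adapt to the actual csv layout
>     .map(fields => (fields(0), fields(1)))
>     .persist(StorageLevel.OFF_HEAP)
>   ipLib.count()                             // materialize the cache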
>
> Thanks
> Best Regards
>
> On Tue, Dec 9, 2014 at 12:12 PM, 十六夜涙 <cr...@qq.com> wrote:
>
>> Hi all,
>>     In my Spark application, I load a csv file and map the data to a Map
>> variable for later use on the driver node, then broadcast it. Everything
>> works fine until a java.io.FileNotFoundException occurs and the console
>> log shows the broadcast is unavailable. I googled this problem; it says
>> Spark will clean up the broadcast. There's a suggested solution where the
>> author mentions re-broadcasting, so I followed that approach and wrote
>> some exception-handling code with `try`/`catch`. After compiling and
>> submitting the jar, I faced another problem: it shows "task
>> not serializable".
>> So here I have three options:
>> 1. find the right way to persist the broadcast
>> 2. solve the "task not serializable" problem when re-broadcasting the
>> variable
>> 3. save the data to some kind of database, although I prefer to keep the
>> data in memory.
>>
>> here are some code snippets:
>>   val esRdd = kafkaDStreams.flatMap(_.split("\\n"))
>>     .map {
>>       case esregex(datetime, time_request) =>
>>         var ipInfo: Array[String] = Array.empty
>>         try {
>>           ipInfo = Utils.getIpInfo(client_ip, b.value)
>>         } catch {
>>           case e: java.io.FileNotFoundException =>
>>             val b = Utils.load(sc, ip_lib_path)
>>             ipInfo = Utils.getIpInfo(client_ip, b.value)
>>         }
>>     }
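>>
>> (One way to avoid both the disappearing broadcast and the "task not
>> serializable" error is to drop sc from the map entirely and let each
>> executor load the table lazily through a per-JVM singleton. A rough
>> sketch reusing the identifiers above; the local file path is a
>> placeholder and assumes ip_lib.csv is available on every executor:)
>>
>>   // Initialized at most once per executor JVM, on first use inside a
>>   // task; no SparkContext is captured, so the closure serializes fine.
>>   object IpLookup {
>>     lazy val table: Map[String, String] =
>>       scala.io.Source.fromFile("/local/path/ip_lib.csv")
>>         .getLines()
>>         .map(_.split(","))
>>         .map(fields => (fields(0), fields(1)))
>>         .toMap
>>   }
>>
>>   val esRdd = kafkaDStreams.flatMap(_.split("\\n")).map {
>>     case esregex(datetime, time_request) =>
>>       // same helper as before, backed by the executor-local table
>>       Utils.getIpInfo(client_ip, IpLookup.table)
>>   }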
>>
>
>


-- 
Haoyuan Li
AMPLab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/
