Dear Liu:

I have tested this under Spark 1.1.0, and the problem is solved in this
newer version.
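For anyone who has to stay on 1.0.x, one plausible reading of the "30M works, 35M fails" pattern is a 32-bit integer overflow on the Java side. The stack trace shows the pickled bytes being Base64-decoded by `py4j.Base64.decode`. The sketch below assumes, without having checked the py4j source, that this decoder sizes its output buffer as `(base64Length * 6) >> 3` using Java `int` arithmetic. The helper names (`to_java_int`, `base64_length`, `assumed_decode_size`) are hypothetical, and the few bytes of pickle header are ignored.

```python
# Hypothetical helpers illustrating an ASSUMED int32 overflow in py4j's
# Base64 decode sizing; the formula is an assumption, not from py4j source.

FLOAT64_BYTES = 8  # np.random.rand returns float64 values (8 bytes each)


def to_java_int(x: int) -> int:
    """Wrap a Python int into Java's signed 32-bit int range."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def base64_length(n_bytes: int) -> int:
    """Number of characters in the padded Base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


def assumed_decode_size(n_bytes: int) -> int:
    """Assumed decoder sizing: (sLen * 6) >> 3 evaluated with Java ints.

    '=' padding is ignored here; it would lower the result by at most 2.
    """
    s_len = base64_length(n_bytes)
    # The multiplication wraps like a Java int; >> is an arithmetic shift
    # in both Python and Java, so a wrapped (negative) value stays negative.
    return to_java_int(s_len * 6) >> 3


if __name__ == "__main__":
    for n_elems in (30_000_000, 35_000_000):
        payload = n_elems * FLOAT64_BYTES  # pickle header bytes ignored
        size = assumed_decode_size(payload)
        status = "OK" if size >= 0 else "negative -> NegativeArraySizeException"
        print(f"{n_elems:,} elements, {payload:,} bytes: {size:,} ({status})")
```

Under that assumption, the computed size turns negative once the Base64 string exceeds (2^31 - 1) / 6, about 357.9 million characters, i.e. a raw payload of roughly 268 million bytes (about 33.5M float64 values). That threshold falls between the 30M array that worked and the 35M array that failed, which is consistent with the report, though it does not prove this is the cause.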


On Wed, Nov 12, 2014 at 3:18 PM, Bo Liu <bli...@cse.ust.hk> wrote:

> Dear Liu:
>
> Thank you for your reply. I will set up an experimental environment for
> Spark 1.1 and test it.
>
> On Wed, Nov 12, 2014 at 2:30 PM, Davies Liu-2 [via Apache Spark User List]
> <ml-node+s1001560n1868...@n3.nabble.com> wrote:
>
>> Yes, your broadcast should be about 300 MB, much smaller than 2 GB; I
>> didn't read your post carefully.
>>
>> Broadcast in Python has been improved a lot in 1.1. I think it will
>> work in 1.1 or the upcoming 1.2 release; could you upgrade to 1.1?
>>
>> Davies
>>
>> On Tue, Nov 11, 2014 at 8:37 PM, bliuab <[hidden email]> wrote:
>>
>> > Dear Liu:
>> >
>> > Thank you very much for your help. I will try that patch. By the way,
>> > since I have succeeded in broadcasting an array of size 30M, and the log
>> > said that such an array takes around 230 MB of memory, I think the numpy
>> > array that leads to the error is much smaller than 2 GB.
>> >
>> > On Wed, Nov 12, 2014 at 12:29 PM, Davies Liu-2 [via Apache Spark User List]
>> > <[hidden email]> wrote:
>> >>
>> >> This PR fixes the problem: https://github.com/apache/spark/pull/2659
>> >>
>> >> cc @josh
>> >>
>> >> Davies
>> >>
>> >> On Tue, Nov 11, 2014 at 7:47 PM, bliuab <[hidden email]> wrote:
>> >>
>> >> > In Spark 1.0.2, I have come across an error when I try to broadcast a
>> >> > quite large numpy array (with 35M dimensions). The error information,
>> >> > including the java.lang.NegativeArraySizeException and its details, is
>> >> > listed below.
>> >> > Moreover, when I broadcast a relatively smaller numpy array (30M
>> >> > dimensions), everything works fine. A 30M-dimension numpy array takes
>> >> > 230 MB of memory, which, in my opinion, is not very large.
>> >> > As far as I have investigated, it seems related to py4j. However, I
>> >> > have no idea how to fix this. I would appreciate any hints.
>> >> > ------------
>> >> > py4j.protocol.Py4JError: An error occurred while calling o23.broadcast.
>> >> > Trace:
>> >> > java.lang.NegativeArraySizeException
>> >> >         at py4j.Base64.decode(Base64.java:292)
>> >> >         at py4j.Protocol.getBytes(Protocol.java:167)
>> >> >         at py4j.Protocol.getObject(Protocol.java:276)
>> >> >         at py4j.commands.AbstractCommand.getArguments(AbstractCommand.java:81)
>> >> >         at py4j.commands.CallCommand.execute(CallCommand.java:77)
>> >> >         at py4j.GatewayConnection.run(GatewayConnection.java:207)
>> >> > -------------
>> >> > And the test code is as follows:
>> >> > import numpy as np
>> >> > from pyspark import SparkConf, SparkContext
>> >> >
>> >> > conf = SparkConf().setAppName('brodyliu_LR').setMaster('spark://10.231.131.87:5051')
>> >> > conf.set('spark.executor.memory', '4000m')
>> >> > conf.set('spark.akka.timeout', '100000')
>> >> > conf.set('spark.ui.port', '8081')
>> >> > conf.set('spark.cores.max', '150')
>> >> > #conf.set('spark.rdd.compress', 'True')
>> >> > conf.set('spark.default.parallelism', '300')
>> >> > # configure the Spark environment
>> >> > sc = SparkContext(conf=conf, batchSize=1)
>> >> >
>> >> > # 35M float64 values (about 280 MB); broadcasting this fails on 1.0.2
>> >> > vec = np.random.rand(35000000)
>> >> > a = sc.broadcast(vec)
>> >> >
>> >> > --
>> >> > View this message in context:
>> >> > http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-Error-when-broadcast-numpy-array-tp18662.html
>> >> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: [hidden email]
>> >> > For additional commands, e-mail: [hidden email]
>> >> >
>> >>
>> >> ________________________________
>> >> If you reply to this email, your message will be added to the discussion below:
>> >> http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-Error-when-broadcast-numpy-array-tp18662p18673.html
>> >
>> >
>> >
>> >
>> > --
>> > My Homepage: www.cse.ust.hk/~bliuab
>> > MPhil student in Hong Kong University of Science and Technology.
>> > Clear Water Bay, Kowloon, Hong Kong.
>> > Profile at LinkedIn.
>> >
>>
>>
>
>
>
>



-- 
My Homepage: www.cse.ust.hk/~bliuab
MPhil student in Hong Kong University of Science and Technology.
Clear Water Bay, Kowloon, Hong Kong.
Profile at LinkedIn <http://www.linkedin.com/pub/liu-bo/55/52b/10b>.



