Dear Liu:

I have tested this issue under Spark 1.1.0, and the problem no longer occurs in this newer version.
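For reference, here is a rough sketch of why a 30M-element array might pass while a 35M-element one fails, using only the sizes mentioned in this thread. It assumes the array reaches the JVM as Base64 text (the trace goes through py4j.Base64.decode) and that the decoder computes the output length as (chars * 6) >> 3 in 32-bit integer arithmetic. That formula is an assumption and is not confirmed anywhere in this thread; pickle overhead is ignored, and the helper names are made up for illustration.

```python
# Sketch only: the decoder formula below is an assumption, not taken from the thread.

def to_int32(x):
    """Wrap a Python int to a signed 32-bit value, as Java int arithmetic would."""
    return (x + 2**31) % 2**32 - 2**31

def base64_length(n_bytes):
    """Number of Base64 characters (with padding) needed for n_bytes of data."""
    return 4 * ((n_bytes + 2) // 3)

def decoded_length_int32(n_chars):
    """Hypothetical decoder length computation: (n_chars * 6) >> 3 in 32-bit ints."""
    return to_int32(n_chars * 6) >> 3

for n_elements in (30000000, 35000000):
    raw_bytes = n_elements * 8          # float64 values, 8 bytes each
    chars = base64_length(raw_bytes)
    print(n_elements, raw_bytes, chars, decoded_length_int32(chars))
```

Under these assumptions the 30M case decodes to a length of 240,000,000 bytes, while the 35M case wraps around to a negative length, which would be consistent with the NegativeArraySizeException in the trace even though the array itself is far below 2G.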

On Wed, Nov 12, 2014 at 3:18 PM, Bo Liu <bli...@cse.ust.hk> wrote:
> Dear Liu:
>
> Thank you for your reply. I will set up an experimental environment for
> Spark 1.1 and test it.
>
> On Wed, Nov 12, 2014 at 2:30 PM, Davies Liu-2 [via Apache Spark User List]
> <ml-node+s1001560n1868...@n3.nabble.com> wrote:
>
>> Yes, your broadcast should be about 300M, much smaller than 2G. I
>> didn't read your post carefully.
>>
>> Broadcast in Python has been much improved since 1.1. I think it
>> will work in 1.1 or the upcoming 1.2 release. Could you upgrade to 1.1?
>>
>> Davies
>>
>> On Tue, Nov 11, 2014 at 8:37 PM, bliuab <[hidden email]> wrote:
>>
>> > Dear Liu:
>> >
>> > Thank you very much for your help. I will update with that patch. By
>> > the way, I have succeeded in broadcasting an array of size 30M, and
>> > the log said that this array takes around 230MB of memory. As a
>> > result, I think the numpy array that leads to the error is much
>> > smaller than 2G.
>> >
>> > On Wed, Nov 12, 2014 at 12:29 PM, Davies Liu-2 [via Apache Spark User List]
>> > <[hidden email]> wrote:
>> >>
>> >> This PR fixes the problem: https://github.com/apache/spark/pull/2659
>> >>
>> >> cc @josh
>> >>
>> >> Davies
>> >>
>> >> On Tue, Nov 11, 2014 at 7:47 PM, bliuab <[hidden email]> wrote:
>> >>
>> >> > In spark-1.0.2, I came across an error when I tried to broadcast a
>> >> > fairly large numpy array (35M elements). The error, a
>> >> > java.lang.NegativeArraySizeException, and its details are listed
>> >> > below. Moreover, when I broadcast a somewhat smaller numpy array
>> >> > (30M elements), everything works fine. A 30M-element numpy array
>> >> > takes 230M of memory, which in my opinion is not very large.
>> >> > As far as I can tell, it seems related to py4j. However, I have
>> >> > no idea how to fix this. I would appreciate any hints.
>> >> > ------------
>> >> > py4j.protocol.Py4JError: An error occurred while calling o23.broadcast.
>> >> > Trace:
>> >> > java.lang.NegativeArraySizeException
>> >> >     at py4j.Base64.decode(Base64.java:292)
>> >> >     at py4j.Protocol.getBytes(Protocol.java:167)
>> >> >     at py4j.Protocol.getObject(Protocol.java:276)
>> >> >     at py4j.commands.AbstractCommand.getArguments(AbstractCommand.java:81)
>> >> >     at py4j.commands.CallCommand.execute(CallCommand.java:77)
>> >> >     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>> >> > -------------
>> >> > And the test code is as follows:
>> >> >
>> >> > import numpy as np
>> >> > from pyspark import SparkConf, SparkContext
>> >> >
>> >> > conf = SparkConf().setAppName('brodyliu_LR').setMaster('spark://10.231.131.87:5051')
>> >> > conf.set('spark.executor.memory', '4000m')
>> >> > conf.set('spark.akka.timeout', '100000')
>> >> > conf.set('spark.ui.port', '8081')
>> >> > conf.set('spark.cores.max', '150')
>> >> > #conf.set('spark.rdd.compress', 'True')
>> >> > conf.set('spark.default.parallelism', '300')
>> >> > # configure the spark environment
>> >> > sc = SparkContext(conf=conf, batchSize=1)
>> >> >
>> >> > # 35M random float64 values; broadcasting this array triggers the error
>> >> > vec = np.random.rand(35000000)
>> >> > a = sc.broadcast(vec)
>> >> >
>> >> > --
>> >> > View this message in context:
>> >> > http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-Error-when-broadcast-numpy-array-tp18662.html
>> >> > Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
My Homepage: www.cse.ust.hk/~bliuab
MPhil student in Hong Kong University of Science and Technology.
Clear Water Bay, Kowloon, Hong Kong.
Profile at LinkedIn <http://www.linkedin.com/pub/liu-bo/55/52b/10b>.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-Error-when-broadcast-numpy-array-tp18662p18805.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.