Hello Andrew,

Thank you very much for your great tips. Your solution worked perfectly.

In fact, I was not aware that the right option for local mode is
"--driver-memory 1g".
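
For the record, here is the full command that worked for me (the install
path is from my own setup, so adjust it as needed):

    # launch spark-shell in local mode with a 1 GB driver heap
    ./bin/spark-shell --driver-memory 1g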

Cheers,

Rindra


On Mon, Jul 21, 2014 at 11:23 AM, Andrew Or-2 [via Apache Spark User List] <
ml-node+s1001560n10336...@n3.nabble.com> wrote:

> Hi Rindra,
>
> Depending on what you're doing with your groupBy, you may end up inflating
> your data quite a bit. Even if your machine has 16G, by default spark-shell
> only uses 512M, and the amount used for storing blocks is only 60% of that
> (spark.storage.memoryFraction), so this space becomes ~300M. This is still
> many multiples of the size of your dataset, but not by orders of magnitude.
> If you are running Spark 1.0+, you can increase the amount of memory used
> by spark-shell by adding "--driver-memory 1g" as a command line argument in
> local mode, or "--executor-memory 1g" in any other mode.
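>
> To make that concrete, the invocations would look something like this (a
> sketch; the master URL is a placeholder, not your actual cluster):
>
>   # default shell heap: 512 MB * 0.6 memoryFraction ~= 300 MB for cached blocks
>   # local mode: the driver hosts the shell, so size the driver heap
>   ./bin/spark-shell --driver-memory 1g
>
>   # any other mode (e.g. standalone): size the executors instead
>   ./bin/spark-shell --master spark://your-master:7077 --executor-memory 1g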
>
> (Also, it seems that you have set your log level to WARN. The warnings are
> most likely caused by the cache being too small; setting the log level to
> INFO will give you more information on the exact sizes being used by the
> storage and the blocks.)
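>
> In Spark 1.0 the shell picks up its log level from conf/log4j.properties;
> assuming you started from the bundled template, the relevant line is:
>
>   log4j.rootCategory=INFO, console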
>
> Andrew
>
>
> 2014-07-19 13:01 GMT-07:00 rindra <[hidden email]>:
>
>> Hi,
>>
>> I am working with a small dataset of about 13 MB in the spark-shell. After
>> doing a groupBy on the RDD, I wanted to cache the RDD in memory, but I keep
>> getting these warnings:
>>
>> scala> rdd.cache()
>> res28: rdd.type = MappedRDD[63] at repartition at <console>:28
>>
>>
>> scala> rdd.count()
>> 14/07/19 12:45:18 WARN BlockManager: Block rdd_63_82 could not be dropped
>> from memory as it does not exist
>> 14/07/19 12:45:18 WARN BlockManager: Putting block rdd_63_82 failed
>> 14/07/19 12:45:18 WARN BlockManager: Block rdd_63_40 could not be dropped
>> from memory as it does not exist
>> 14/07/19 12:45:18 WARN BlockManager: Putting block rdd_63_40 failed
>> res29: Long = 5
>>
>> It seems that I could not cache the data in memory, even though my local
>> machine has 16 GB of RAM and the data is only 13 MB, split into 100
>> partitions.
>>
>> How can I prevent this caching issue from happening? Thanks.
>>
>> Rindra
>>
>



