Re: bug for large textfiles on windows

Christopher Bourez Mon, 25 Jan 2016 13:02:42 -0800

Josh,

Thanks a lot!


You can download a video I recorded showing the issue:
https://s3-eu-west-1.amazonaws.com/christopherbourez/public/video.mov

I created a 13 MB sample file, as explained:
https://s3-eu-west-1.amazonaws.com/christopherbourez/public/test.csv
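Josh asks below for a standalone reproduction that generates its own large file. The block that follows is a minimal sketch of such a generator, written in Python to match the PySpark commands in this thread. The helper name `write_sample_csv` and the id,name,value column layout are hypothetical. The actual contents of test.csv are not described here, so this sketch only reproduces the file size, not the exact data.

```python
import os


def write_sample_csv(path, target_bytes=13 * 1024 * 1024):
    """Write synthetic CSV rows until at least target_bytes have been written.

    Hypothetical helper: the columns (id, name, value) are placeholders and
    are NOT the layout of the original test.csv. Binary mode plus explicit
    ASCII encoding keeps it working on both Python 2.7 and Python 3, and
    avoids Windows newline translation changing the byte count.
    """
    written = 0
    row_id = 0
    with open(path, "wb") as f:
        header = b"id,name,value\n"
        f.write(header)
        written += len(header)
        # Keep appending rows until the running byte count reaches the target.
        while written < target_bytes:
            line = ("%d,name_%d,%d\n" % (row_id, row_id, row_id * 7)).encode("ascii")
            f.write(line)
            written += len(line)
            row_id += 1
    return written


if __name__ == "__main__":
    size = write_sample_csv("test.csv")
    print("wrote %d bytes to %s" % (size, os.path.abspath("test.csv")))
```

You could run it once (for example `write_sample_csv("test.csv")`) and then point the PySpark commands in this message at the result. That would make the repro independent of the S3 download. Whether this synthetic data triggers the same failure as the original file is untested.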

Here are the steps I followed:

I created an AWS WorkSpace running Windows 7 (which I can share with you if
you'd like) on a Standard instance with 2 GiB of RAM.
On this instance, I:
- downloaded Spark (1.5 or 1.6, same problem) with Hadoop 2.6
- installed the Java 8 JDK
- downloaded Python 2.7.8
- downloaded the sample file
  https://s3-eu-west-1.amazonaws.com/christopherbourez/public/test.csv

Then I launch PySpark:
bin\pyspark --master local[1]
and, in the PySpark shell, run:
sc.textFile("test.csv").take(1)

As you can see in the video, this triggers the bug, whereas
sc.textFile("test.csv", 2000).take(1) works well.
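This gives some context on why the second argument helps. In `sc.textFile(path, minPartitions)`, the number is only a hint. The actual splits for a local file are computed by Hadoop's old-API `FileInputFormat`, and without the argument Spark falls back to `sc.defaultMinPartitions`.

The block below is a sketch that reimplements that split rule in plain Python. It rests on three assumptions:
- the `max(minSize, min(goalSize, blockSize))` formula
- a 1.1 split slop
- a 32 MiB default local block size

`compute_splits` is a hypothetical helper. It only illustrates partition sizes and is not a diagnosis of the bug.

```python
# Assumed constant: Hadoop lets the last split run up to 10% over split_size.
SPLIT_SLOP = 1.1


def compute_splits(file_size, min_partitions, block_size=32 * 1024 * 1024, min_size=1):
    """Return the list of split lengths for one file (file_size > 0 assumed).

    Hypothetical helper mirroring our understanding of Hadoop's old-API
    FileInputFormat.getSplits; block_size defaults to an ASSUMED 32 MiB
    local-filesystem block size.
    """
    # goalSize: the size each split would have if the hint were honored exactly.
    goal_size = file_size // max(min_partitions, 1)
    # Splits never exceed one block and never go below min_size.
    split_size = max(min_size, min(goal_size, block_size))
    splits = []
    remaining = file_size
    # Carve off full-size splits while more than SPLIT_SLOP splits' worth remains.
    while float(remaining) / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    # Whatever is left (at most 1.1 * split_size) becomes the final split.
    if remaining != 0:
        splits.append(remaining)
    return splits
```

Under these assumptions, reading a 13 MiB file with one partition yields a single split covering the whole file. With `minPartitions=2000`, it yields splits of about 6.8 KB each (2001 in total), so each task reads a much smaller slice of the file.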

Thanks a lot!


Christopher Bourez
06 17 17 50 60

On Mon, Jan 25, 2016 at 8:02 PM, Josh Rosen <joshro...@databricks.com>
wrote:

> Hi Christopher,
>
> What would be super helpful here is a standalone reproduction. Ideally
> this would be a single Scala file or set of commands that I can run in
> `spark-shell` in order to reproduce this. Ideally, this code would generate
> a giant file, then try to read it in a way that demonstrates the bug. If
> you have such a reproduction, could you attach it to that JIRA ticket?
> Thanks!
>
> On Mon, Jan 25, 2016 at 7:53 AM Christopher Bourez <
> christopher.bou...@gmail.com> wrote:
>
>> Dear all,
>>
>> I would like to reopen the ticket for a potential bug (its current status
>> is Resolved, but the problem does not seem to be fixed):
>>
>> https://issues.apache.org/jira/browse/SPARK-12261
>>
>> I believe there is something wrong with memory management under
>> Windows.
>>
>> It makes no sense to be limited to files smaller than a few MB...
>>
>> Do not hesitate to ask me any questions if you try to reproduce the
>> bug.
>>
>> Best,
>>
>> Christopher Bourez
>> 06 17 17 50 60
>>
>
