Josh, thanks a lot!
You can download a video I created: https://s3-eu-west-1.amazonaws.com/christopherbourez/public/video.mov

I created a 13 MB sample file, as explained: https://s3-eu-west-1.amazonaws.com/christopherbourez/public/test.csv

Here is what I did. I created an AWS WorkSpace with Windows 7 (which I can share with you if you'd like), Standard instance, 2 GiB of RAM. On this instance, I:
- downloaded Spark (1.5 or 1.6, same problem with both) with Hadoop 2.6
- installed the Java 8 JDK
- downloaded Python 2.7.8
- downloaded the sample file https://s3-eu-west-1.amazonaws.com/christopherbourez/public/test.csv

Then the commands I launch are:

bin\pyspark --master local[1]
sc.textFile("test.csv").take(1)

As you can see, sc.textFile("test.csv", 2000).take(1) works well, whereas the call with the default number of partitions does not.

Thanks a lot!

Christopher Bourez
06 17 17 50 60

On Mon, Jan 25, 2016 at 8:02 PM, Josh Rosen <joshro...@databricks.com> wrote:
> Hi Christopher,
>
> What would be super helpful here is a standalone reproduction. Ideally
> this would be a single Scala file or set of commands that I can run in
> `spark-shell` in order to reproduce this. Ideally, this code would generate
> a giant file, then try to read it in a way that demonstrates the bug. If
> you have such a reproduction, could you attach it to that JIRA ticket?
> Thanks!
>
> On Mon, Jan 25, 2016 at 7:53 AM Christopher Bourez <christopher.bou...@gmail.com> wrote:
>
>> Dear all,
>>
>> I would like to reopen an issue for a potential bug (its current status is
>> Resolved, but it does not seem to be):
>>
>> https://issues.apache.org/jira/browse/SPARK-12261
>>
>> I believe there is something wrong with memory management on Windows.
>>
>> It makes no sense to only be able to work with files smaller than a few MB...
>>
>> Do not hesitate to ask me questions if you try to reproduce the bug.
>>
>> Best,
>>
>> Christopher Bourez
>> 06 17 17 50 60
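For reference, a minimal sketch of the kind of standalone reproduction Josh asks for, written in Python rather than Scala to match the pyspark commands in this thread: it generates a CSV of roughly 13 MB, then reads its first line once with the default partitioning and once with 2000 partitions. The function names generate_csv and reproduce, the id,name,value column layout and the app name are hypothetical; the real test.csv contents are not described here, so this file may or may not trigger the same behavior.

```python
def generate_csv(path, target_bytes=13 * 1024 * 1024):
    """Write a synthetic CSV of at least target_bytes bytes; return the data row count.

    The id,name,value layout is hypothetical: the real test.csv contents
    are not described in the thread.
    """
    header = "id,name,value\n"
    written = len(header)
    rows = 0
    # Binary mode avoids "\n" -> "\r\n" translation on Windows, so the byte
    # count tracked here matches the size on disk (works on Python 2 and 3).
    with open(path, "wb") as f:
        f.write(header.encode("ascii"))
        while written < target_bytes:
            line = "%d,name_%d,%f\n" % (rows, rows, rows * 0.5)
            f.write(line.encode("ascii"))
            written += len(line)
            rows += 1
    return rows


def reproduce(path="test.csv", sc=None):
    """Read the first line with default and with 2000 partitions (needs pyspark).

    Pass the shell's existing `sc` when running inside bin\\pyspark; otherwise
    a local[1] context is created here and stopped afterwards.
    """
    own_context = sc is None
    if own_context:
        from pyspark import SparkContext  # imported lazily; only needed here
        sc = SparkContext("local[1]", "SPARK-12261-repro")
    try:
        # Default partitioning: the call the thread reports as failing on Windows.
        print(sc.textFile(path).take(1))
        # Explicit minPartitions=2000: the call reported to work.
        print(sc.textFile(path, 2000).take(1))
    finally:
        if own_context:
            sc.stop()
```

Usage sketch, untested on the Windows setup described above: inside bin\pyspark --master local[1], run generate_csv("test.csv") and then reproduce("test.csv", sc), reusing the shell's context because only one SparkContext can be active at a time.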