[GitHub] spark issue #15327: [SPARK-16575] [spark core] partition calculation mismatc...

2016-10-04 Thread kmader
Github user kmader commented on the issue: https://github.com/apache/spark/pull/15327 @rxin on the PS, how would you foresee the SQL implementation for binary support? Is there a standard method of going from bytestreams to dataframes? --- If your project is set up for it, you can
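One common way to turn raw bytes into table-shaped data is to decode fixed-length binary records into typed rows that a tabular API can then ingest. This is an illustrative sketch, not what the thread settled on. The record layout `<if` (a little-endian int32 id followed by a float32 value) and the function name `records_to_rows` are assumptions chosen for illustration.

```python
import struct
from typing import List, Tuple

# Hypothetical record layout: little-endian int32 id followed by a float32 value.
RECORD_FORMAT = "<if"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 8 bytes with standard sizes, no padding


def records_to_rows(blob: bytes) -> List[Tuple[int, float]]:
    """Split a byte string into fixed-length records and decode each into a (id, value) row."""
    if len(blob) % RECORD_SIZE != 0:
        raise ValueError(f"length {len(blob)} is not a multiple of {RECORD_SIZE}")
    # struct.iter_unpack walks the buffer one record at a time.
    return list(struct.iter_unpack(RECORD_FORMAT, blob))


if __name__ == "__main__":
    data = struct.pack(RECORD_FORMAT, 1, 0.5) + struct.pack(RECORD_FORMAT, 2, 1.25)
    print(records_to_rows(data))
```

In Spark, `SparkContext.binaryRecords(path, recordLength)` already yields an RDD of fixed-length `Array[Byte]` records, so a decoder of this shape could be mapped over that RDD. How the resulting rows would then reach a DataFrame is the part the comment asks about.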

[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-03 Thread kmader
Github user kmader commented on the pull request: https://github.com/apache/spark/pull/9417#issuecomment-153550148 @srowen @hvanhovell this is a nice improvement and more elegant than the original approach. As a side note, in our code base (which uses PortableDataStream

[GitHub] spark pull request: Syncing up local copy

2014-11-05 Thread kmader
GitHub user kmader opened a pull request: https://github.com/apache/spark/pull/3123 Syncing up local copy You can merge this pull request into a Git repository by running: $ git pull https://github.com/4Quant/spark master Alternatively you can review and apply these changes

[GitHub] spark pull request: Syncing up local copy

2014-11-05 Thread kmader
Github user kmader closed the pull request at: https://github.com/apache/spark/pull/3123

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-29 Thread kmader
Github user kmader commented on a diff in the pull request: https://github.com/apache/spark/pull/1658#discussion_r19582168 --- Diff: core/src/main/scala/org/apache/spark/rdd/BinaryFileRDD.scala --- @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-21 Thread kmader
Github user kmader commented on a diff in the pull request: https://github.com/apache/spark/pull/1658#discussion_r19133684 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala --- @@ -220,6 +227,83 @@ class JavaSparkContext(val sc: SparkContext) extends

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-20 Thread kmader
Github user kmader commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-59832070 So I made the requested changes and added a few more tests, but the tests appear not to have run, for some unclear reason: https://amplab.cs.berkeley.edu/jenkins/job

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-02 Thread kmader
Github user kmader commented on a diff in the pull request: https://github.com/apache/spark/pull/1658#discussion_r18335807 --- Diff: core/src/main/scala/org/apache/spark/input/RawFileInput.scala --- @@ -0,0 +1,221 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-01 Thread kmader
Github user kmader commented on a diff in the pull request: https://github.com/apache/spark/pull/1658#discussion_r18267674 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -511,6 +511,67 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-01 Thread kmader
Github user kmader commented on a diff in the pull request: https://github.com/apache/spark/pull/1658#discussion_r18267344 --- Diff: core/src/main/scala/org/apache/spark/input/RawFileInput.scala --- @@ -0,0 +1,221 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-09-16 Thread kmader
Github user kmader commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-55769371 Thanks @jrabary for this find; it had to do with the new method for handling PortableDataStreams, which didn't calculate the name correctly. I think I have it fixe

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-09-07 Thread kmader
Github user kmader commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-54744540 Hey @mateiz, Sorry, I had other projects to work on. I have made the changes and called the new class ```PortableDataStream```

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-08-14 Thread kmader
Github user kmader commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-52219293 Addressing the major issues brought up: Do we need both a stream API and a byte array one? The byte array might be more prone to out-of-memory errors, but stream
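The trade-off raised here (stream access versus a whole-file byte array) can be sketched with a lazy handle. The handle holds only a path, reads in bounded chunks when streamed, and materializes everything only when asked. `LazyFileStream`, `iter_chunks`, `to_array`, and `sha256_streaming` are hypothetical names, not Spark's API. Spark's `PortableDataStream` does expose comparable `open()` and `toArray()` methods, but this plain-Python sketch does not reproduce its implementation.

```python
import hashlib
import os
import tempfile
from typing import Iterator


class LazyFileStream:
    """Hypothetical handle: holds only a path until the bytes are actually needed."""

    def __init__(self, path: str):
        self.path = path  # cheap to serialize and ship; no file contents are held

    def iter_chunks(self, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
        """Stream the file in bounded chunks, so memory use is about chunk_size."""
        with open(self.path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk

    def to_array(self) -> bytes:
        """Materialize the whole file at once, so memory use is about the file size."""
        with open(self.path, "rb") as f:
            return f.read()


def sha256_streaming(handle: LazyFileStream) -> str:
    # Only one chunk is resident at a time, so this scales to files larger than memory.
    digest = hashlib.sha256()
    for chunk in handle.iter_chunks():
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Demo on a throwaway ~200 KB file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 200_000)
    handle = LazyFileStream(tmp.name)
    same = sha256_streaming(handle) == hashlib.sha256(handle.to_array()).hexdigest()
    print("streamed and materialized digests match:", same)
    os.remove(tmp.name)
```

Both access paths give the same bytes. The difference is peak memory: the streaming path is bounded by the chunk size, while `to_array` is bounded by the file size, which is the out-of-memory risk the comment mentions.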

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-08-13 Thread kmader
Github user kmader commented on a diff in the pull request: https://github.com/apache/spark/pull/1658#discussion_r16177677 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -511,6 +511,67 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-08-13 Thread kmader
Github user kmader commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-52049280 @freeman-lab looks good, I will add it to this pull request if that's ok for you. I think my personal preference would be to keep byteFile for standard operation

[GitHub] spark pull request: Generic Binary File Support in Spark

2014-07-30 Thread kmader
Github user kmader commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-50700133 Thanks for the feedback, I have made the changes requested, created an issue (https://issues.apache.org/jira/browse/SPARK-2759), and added a dataStreamFiles to both

[GitHub] spark pull request: Generic Binary File Support in Spark

2014-07-30 Thread kmader
GitHub user kmader opened a pull request: https://github.com/apache/spark/pull/1658 Generic Binary File Support in Spark These changes add the abstract BinaryFileInputFormat and BinaryRecordReader classes for reading in data as a byte stream and converting it to another format
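The pattern described here is a generic reader that hands each file over as a byte stream, plus a subclass that converts those bytes into some other format. It can be sketched in plain Python. This is not Spark's API and not the PR's code. `BinaryRecordParser`, `read_file`, and `Int32ArrayParser` are hypothetical names, and the little-endian int32 layout is an assumption chosen for illustration.

```python
import io
import os
import struct
import tempfile
from abc import ABC, abstractmethod
from typing import BinaryIO, Generic, List, Tuple, TypeVar

T = TypeVar("T")


class BinaryRecordParser(ABC, Generic[T]):
    """Hypothetical base class: owns file handling, delegates byte decoding to parse()."""

    def read_file(self, path: str) -> Tuple[str, T]:
        # Open the file as a raw byte stream and return a (path, decoded value) pair.
        with open(path, "rb") as stream:
            return path, self.parse(stream)

    @abstractmethod
    def parse(self, stream: BinaryIO) -> T:
        """Convert the raw byte stream into a value of type T."""


class Int32ArrayParser(BinaryRecordParser[List[int]]):
    """Example subclass: decode the stream as little-endian int32 values."""

    def parse(self, stream: BinaryIO) -> List[int]:
        data = stream.read()
        count = len(data) // 4  # trailing bytes that do not fill a full int32 are ignored
        return list(struct.unpack(f"<{count}i", data[: count * 4]))


if __name__ == "__main__":
    # In-memory stream: no file is needed to exercise the conversion step.
    print(Int32ArrayParser().parse(io.BytesIO(struct.pack("<3i", 1, -2, 3))))
    # On-disk file: the base class handles opening and closing.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(struct.pack("<2i", 10, 20))
    print(Int32ArrayParser().read_file(tmp.name))
    os.remove(tmp.name)
```

Keying the result by path loosely mirrors how `SparkContext.binaryFiles` returns (path, content) pairs. Keeping I/O in the base class means a new format only has to implement `parse`.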