[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48996149 I looked into the event logger code and it appears that codec change should be fine. It figures out the codec for old data automatically anyway. --- If your project is set

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48996256 Yes, we log the codec used in a separate file so we don't lock ourselves out of our old event logs. This change seems fine.

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-48996763 @andrewor14 do we also log the block size, etc. of the codec used? If yes, then at least for event data we should be fine. IIRC we use the codec to compress

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49001592 QA results for PR 1415:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49005728 Weird: the test failures are unrelated to this change.

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49005818 Ah yes, the block size is only used at compression time, and is inferred from the stream during decompression. Then the class name alone should be sufficient.

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49005883 Yea, the test failure isn't related. If there is no objection, I'm going to merge this tomorrow. I will file a JIRA ticket so we can prepend compression codec

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49006312 Can't comment on Tachyon, since we don't use it and unfortunately have no experience with it. I am fine with this change for the rest.

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49090832 @rxin IIRC at one point we changed this before and it caused a performance regression for our perf suite so we reverted it. At the time I think we were running on

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49099980 Yea - stability seems much more important than a small performance gain

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49100370 Only the codec names are stored in the event logs; no other information is currently recorded. But this change isn't really breaking anything in that area. (And, by

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1415#issuecomment-49100573 FYI filed JIRA: https://issues.apache.org/jira/browse/SPARK-2496 Compression streams should write its codec info to the stream
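
The idea in SPARK-2496 (a compressed stream that records which codec wrote it) can be illustrated with a minimal sketch. This is not Spark's implementation or on-disk format: Python's standard `zlib` and `bz2` modules stand in for Snappy and LZF, and the one-byte tags and the `write_tagged` / `read_tagged` helpers are hypothetical names chosen for illustration.

```python
import bz2
import zlib

# Hypothetical one-byte codec tags (an assumption, not Spark's format).
# zlib and bz2 are stand-ins for Snappy and LZF.
CODECS = {
    b"Z": (zlib.compress, zlib.decompress),
    b"B": (bz2.compress, bz2.decompress),
}


def write_tagged(tag: bytes, payload: bytes) -> bytes:
    """Compress payload with the codec named by tag and prepend the tag."""
    compress, _ = CODECS[tag]
    return tag + compress(payload)


def read_tagged(blob: bytes) -> bytes:
    """Read the leading tag, pick the matching codec, and decompress."""
    tag, body = blob[:1], blob[1:]
    if tag not in CODECS:
        raise ValueError(f"unknown codec tag: {tag!r}")
    _, decompress = CODECS[tag]
    return decompress(body)


if __name__ == "__main__":
    old = write_tagged(b"B", b"event written with the old default codec")
    new = write_tagged(b"Z", b"event written with the new default codec")
    # The reader needs no configuration: each blob names its own codec.
    print(read_tagged(old))
    print(read_tagged(new))
```

With a tag in the stream itself, changing the default codec does not affect the ability to read data written under the previous default, and no separate side file is needed to remember which codec was used.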

[GitHub] spark pull request: [SPARK-2469] Use Snappy (instead of LZF) for d...

2014-07-15 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1415