[GitHub] orc issue #163: ORC-162. Handle 0 byte files as empty ORC files.

2017-08-29 Thread prasanthj
Github user prasanthj commented on the issue: https://github.com/apache/orc/pull/163 I agree that reader should gracefully handle 0 length files like what this patch does instead of throwing. In addition to that we should also avoid creating splits for 0 length files. Spinning up task

[GitHub] orc issue #163: ORC-162. Handle 0 byte files as empty ORC files.

2017-08-29 Thread omalley
Github user omalley commented on the issue: https://github.com/apache/orc/pull/163 @prasanthj There are customers out there with millions of zero byte ORC files in their Hive warehouses. We need to have the reader not throw when they read them with Spark, etc. Rather than patch each c

[GitHub] orc issue #163: ORC-162. Handle 0 byte files as empty ORC files.

2017-08-29 Thread prasanthj
Github user prasanthj commented on the issue: https://github.com/apache/orc/pull/163 Hive creates empty files only for MR to support bucketed joins. Tez doesn't create empty bucket files anymore. Hive currently discards empty files during split generation. We can do similar thing in O
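Discarding empty files during split generation, as described for Hive above, amounts to a length check before a split is created. The sketch below is illustrative only: `FileStatus` and `generate_splits` are hypothetical stand-ins, not the Hive or ORC API, and real split generation works on byte ranges within files rather than whole paths.

```python
from typing import List, NamedTuple


class FileStatus(NamedTuple):
    # Hypothetical stand-in for a directory-listing entry: path plus length in bytes.
    path: str
    length: int


def generate_splits(files: List[FileStatus]) -> List[str]:
    """Return one split (here just the path) per non-empty file."""
    splits = []
    for f in files:
        # A 0-byte file holds no rows, so spinning up a task for it is wasted work.
        if f.length == 0:
            continue
        splits.append(f.path)
    return splits
```

Filtering at split time means the reader never sees the empty file at all, which is why it complements, rather than replaces, making the reader tolerant of 0-byte files.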

[GitHub] orc issue #163: ORC-162. Handle 0 byte files as empty ORC files.

2017-08-29 Thread omalley
Github user omalley commented on the issue: https://github.com/apache/orc/pull/163 I agree that a filename encoding would be a nice safeguard, but it doesn't work since Hive isn't using that convention. (Hive made the change in Hive's OrcInputFormat so it didn't move over to the ORC

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

2017-08-29 Thread omalley
Github user omalley commented on a diff in the pull request: https://github.com/apache/orc/pull/151#discussion_r135922340 --- Diff: c++/include/orc/Reader.hh --- @@ -288,6 +288,17 @@ namespace orc { virtual uint64_t getCompressionSize() const = 0; /** +

[GitHub] orc issue #159: ORC-175: ORC-232: add jmh-generator-annprocess in pom.xml. i...

2017-08-29 Thread omalley
Github user omalley commented on the issue: https://github.com/apache/orc/pull/159 Is the ISA-L zlib support sufficient to read and write the ORC files with zlib compression? I agree with Gopal that it doesn't feel like a separate compression codec. I don't think it is a go

[GitHub] orc issue #163: ORC-162. Handle 0 byte files as empty ORC files.

2017-08-29 Thread electrum
Github user electrum commented on the issue: https://github.com/apache/orc/pull/163 We had the same use case of making empty bucket creation more efficient. Encoding the fact that the file is intentionally empty in the name provides a good safeguard against storage system problems tha
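The filename safeguard described above can be sketched as follows: a 0-byte file is accepted only when its name says it is intentionally empty, and any other 0-byte file is reported as possible truncation. The `.empty` suffix, `classify`, and `CorruptFileError` are hypothetical choices for illustration; neither ORC nor Hive defines such a convention in this thread.

```python
EMPTY_MARKER = ".empty"  # hypothetical naming convention, not an ORC or Hive standard


class CorruptFileError(Exception):
    """Raised for a 0-byte file that was not marked as intentionally empty."""


def classify(path: str, length: int) -> str:
    if length > 0:
        # Non-empty: hand off to the normal ORC reader.
        return "orc"
    if path.endswith(EMPTY_MARKER):
        # Writer explicitly declared this file empty, so treat it as zero rows.
        return "intentionally-empty"
    # An unmarked 0-byte file may come from a storage problem such as truncation.
    raise CorruptFileError(f"{path}: 0-byte file without empty marker; possibly truncated")
```

The trade-off raised in the thread is that this only helps if every writer (Hive included) adopts the naming convention.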

[GitHub] orc issue #163: ORC-162. Handle 0 byte files as empty ORC files.

2017-08-29 Thread omalley
Github user omalley commented on the issue: https://github.com/apache/orc/pull/163 The problem is that Hive is doing this across the board. See HIVE-13040. Making the reader not throw is ok, if slightly incompatible. This patch doesn't change the writer to write such files.

[GitHub] orc issue #163: ORC-162. Handle 0 byte files as empty ORC files.

2017-08-29 Thread dain
Github user dain commented on the issue: https://github.com/apache/orc/pull/163 Also this is a backwards incompatible change, so we would, at the very least, need to do the trick where it is disabled by default in the writer until the reader is rolled out everywhere. --- If your pro

[GitHub] orc issue #163: ORC-162. Handle 0 byte files as empty ORC files.

2017-08-29 Thread dain
Github user dain commented on the issue: https://github.com/apache/orc/pull/163 We were considering doing this internally and then we ran into a production bug where files got truncated to zero bytes. Since empty files are illegal we could find all of the affected partitions easily,

[GitHub] orc issue #162: ORC-231 Configurable capability to overwrite the file if it ...

2017-08-29 Thread omalley
Github user omalley commented on the issue: https://github.com/apache/orc/pull/162 I undid the inadvertent change to the import lines. Please configure your IDE to not introduce wildcard imports for this project.

[GitHub] orc pull request #162: ORC-231 Configurable capability to overwrite the file...

2017-08-29 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/orc/pull/162

[GitHub] orc pull request #163: ORC-162. Handle 0 byte files as empty ORC files.

2017-08-29 Thread omalley
GitHub user omalley opened a pull request: https://github.com/apache/orc/pull/163 ORC-162. Handle 0 byte files as empty ORC files. Treat 0 byte files as an empty ORC file with schema of struct<>. You can merge this pull request into a Git repository by running: $ git pull https
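The PR description states the intended reader behavior: a 0-byte file is treated as an empty ORC file whose schema is `struct<>`. A minimal sketch of that check, assuming the caller already has the file contents in memory; `FileMeta`, `read_file_meta`, and the `parse_footer` callback are hypothetical names, and real footer parsing is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FileMeta:
    schema: str    # ORC type description, e.g. "struct<x:int>"
    num_rows: int


def read_file_meta(data: bytes, parse_footer: Callable[[bytes], FileMeta]) -> FileMeta:
    if len(data) == 0:
        # ORC-162 as described in the PR: no bytes means an empty file with schema struct<>.
        return FileMeta(schema="struct<>", num_rows=0)
    if not data.startswith(b"ORC"):
        # ORC files begin with the 3-byte magic "ORC".
        raise ValueError("not an ORC file: missing 'ORC' magic header")
    # Parsing the postscript and footer is delegated to the (hypothetical) callback.
    return parse_footer(data)
```

Returning an empty schema instead of throwing keeps readers such as Spark working over warehouses full of 0-byte files, at the cost of no longer flagging truncated files as errors.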

[GitHub] orc issue #134: Orc 17

2017-08-29 Thread AnatoliShein
Github user AnatoliShein commented on the issue: https://github.com/apache/orc/pull/134 Hi @omalley , the linking should be fixed with the new commit, and we will fix the warnings shortly.

[GitHub] orc pull request #160: ORC-233 Allow `orc.include.columns` to be empty

2017-08-29 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/orc/pull/160

[GitHub] orc issue #134: Orc 17

2017-08-29 Thread omalley
Github user omalley commented on the issue: https://github.com/apache/orc/pull/134 There are a lot of warnings in libhdfs that clang reports. It is currently failing in: [ 91%] Linking CXX shared library libhdfspp.dylib with link errors about sasl and protobuf.