Github user prasanthj commented on the issue:
https://github.com/apache/orc/pull/163
I agree that the reader should gracefully handle 0 length files, as this
patch does, instead of throwing. In addition, we should also avoid
creating splits for 0 length files. Spinning up task
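A minimal sketch of the "avoid creating splits for 0 length files" idea, written in plain Python against the local filesystem. It is not Hive's or ORC's split-generation code; the function name `non_empty_files` is hypothetical and only illustrates filtering by file size before any per-file work is scheduled.

```python
import os
import tempfile


def non_empty_files(paths):
    """Return only the paths whose on-disk size is greater than zero.

    Hypothetical helper: a real split generator would consult the
    storage system's file listing instead of os.path.getsize.
    """
    return [p for p in paths if os.path.getsize(p) > 0]


def demo():
    """Create one empty and one non-empty file and filter them."""
    directory = tempfile.mkdtemp()
    empty = os.path.join(directory, "bucket_0")
    full = os.path.join(directory, "bucket_1")
    open(empty, "wb").close()          # zero-byte file
    with open(full, "wb") as handle:   # file with some bytes in it
        handle.write(b"ORC")
    return non_empty_files([empty, full]), full
```

Filtering at listing time means no task is started for a file that cannot contain rows, which is separate from whether the reader itself tolerates such a file.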
Github user omalley commented on the issue:
https://github.com/apache/orc/pull/163
@prasanthj There are customers out there with millions of zero byte ORC
files in their Hive warehouses. We need to have the reader not throw when they
read them with Spark, etc. Rather than patch each c
Github user prasanthj commented on the issue:
https://github.com/apache/orc/pull/163
Hive creates empty files only for MR to support bucketed joins. Tez doesn't
create empty bucket files anymore. Hive currently discards empty files during
split generation. We can do a similar thing in O
Github user omalley commented on the issue:
https://github.com/apache/orc/pull/163
I agree that a filename encoding would be a nice safeguard, but it doesn't
work since Hive isn't using that convention. (Hive made the change in Hive's
OrcInputFormat so it didn't move over to the ORC
Github user omalley commented on a diff in the pull request:
https://github.com/apache/orc/pull/151#discussion_r135922340
--- Diff: c++/include/orc/Reader.hh ---
@@ -288,6 +288,17 @@ namespace orc {
virtual uint64_t getCompressionSize() const = 0;
/**
+
Github user omalley commented on the issue:
https://github.com/apache/orc/pull/159
Is the ISA-L zlib support sufficient to read and write the ORC files with
zlib compression? I agree with Gopal that it doesn't feel like a separate
compression codec.
I don't think it is a go
Github user electrum commented on the issue:
https://github.com/apache/orc/pull/163
We had the same use case of making empty bucket creation more efficient.
Encoding the fact that the file is intentionally empty in the name provides a
good safeguard against storage system problems tha
Github user omalley commented on the issue:
https://github.com/apache/orc/pull/163
The problem is that Hive is doing this across the board. See HIVE-13040.
Making the reader not throw is ok, if slightly incompatible. This patch
doesn't change the writer to write such files.
Github user dain commented on the issue:
https://github.com/apache/orc/pull/163
Also this is a backwards incompatible change, so we would, at the very
least, need to do the trick where it is disabled by default in the writer until
the reader is rolled out everywhere.
---
If your pro
Github user dain commented on the issue:
https://github.com/apache/orc/pull/163
We were considering doing this internally and then we ran into a production
bug where files got truncated to zero bytes. Since empty files are illegal we
could find all of the affected partitions easily,
Github user omalley commented on the issue:
https://github.com/apache/orc/pull/162
I undid the inadvertent change to the import lines. Please configure your
IDE to not introduce wildcard imports for this project.
Github user asfgit closed the pull request at:
https://github.com/apache/orc/pull/162
GitHub user omalley opened a pull request:
https://github.com/apache/orc/pull/163
ORC-162. Handle 0 byte files as empty ORC files.
Treat 0 byte files as an empty ORC file with schema of struct<>.
You can merge this pull request into a Git repository by running:
$ git pull https
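The behavior described in this pull request (a 0 byte file is treated as an empty ORC file with schema `struct<>`) can be sketched as follows. This is an illustration in plain Python, not the ORC reader's code: `FileSummary`, `summarize`, and the `read_footer` callable are hypothetical names standing in for whatever parses a non-empty file.

```python
import os
import tempfile
from dataclasses import dataclass

# Schema the pull request assigns to a zero-byte file.
EMPTY_SCHEMA = "struct<>"


@dataclass
class FileSummary:
    schema: str
    row_count: int


def summarize(path, read_footer):
    """Describe a file, short-circuiting on zero-byte files.

    read_footer is a hypothetical callable that parses a non-empty
    file and returns a FileSummary; it is never called for a
    zero-byte file, so nothing is thrown for that case.
    """
    if os.path.getsize(path) == 0:
        return FileSummary(EMPTY_SCHEMA, 0)
    return read_footer(path)


def demo_empty():
    """Summarize a freshly created zero-byte file."""
    path = os.path.join(tempfile.mkdtemp(), "000000_0")
    open(path, "wb").close()

    def fail(_path):
        raise AssertionError("footer parser must not run on empty files")

    return summarize(path, fail)
```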
Github user AnatoliShein commented on the issue:
https://github.com/apache/orc/pull/134
Hi @omalley, the linking should be fixed with the new commit, and we will
fix the warnings shortly.
Github user asfgit closed the pull request at:
https://github.com/apache/orc/pull/160
Github user omalley commented on the issue:
https://github.com/apache/orc/pull/134
There are a lot of warnings in libhdfs that clang reports.
It is currently failing in:
[ 91%] Linking CXX shared library libhdfspp.dylib
with link errors about sasl and protobuf.