Gopal Vijayaraghavan created ORC-629:
Summary: PPD: Floating point NaN is not transitive across comparisons
Key: ORC-629
URL: https://issues.apache.org/jira/browse/ORC-629
Project: ORC
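The issue title refers to predicate pushdown (PPD) over floating-point statistics. A minimal sketch, not ORC code, of why NaN breaks min/max-based row-group elimination: every ordered comparison against NaN is false, while `Double.compare` orders NaN as the largest value, so the two notions of ordering disagree.

```java
// Illustration (not ORC's reader) of NaN breaking min/max-based PPD.
public class NanPpd {
    // Naive row-group elimination check, as a PPD reader might do:
    // keep the row group only if [min, max] could contain the value.
    public static boolean mightContain(double min, double max, double value) {
        return min <= value && value <= max;
    }

    public static void main(String[] args) {
        // If NaN leaks into the statistics, every ordered comparison is
        // false and the row group is wrongly eliminated for all values.
        System.out.println(mightContain(Double.NaN, Double.NaN, 1.0));
        // NaN is not equal to itself under ==, yet Double.compare treats
        // it as equal to itself and greater than +Infinity.
        System.out.println(Double.NaN == Double.NaN);
        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY));
    }
}
```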
Gopal Vijayaraghavan created ORC-570:
Summary: FS: ReaderOptions.filesystem should also accept a lazy Supplier
Key: ORC-570
URL: https://issues.apache.org/jira/browse/ORC-570
Project: ORC
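The point of accepting a `java.util.function.Supplier` is to defer an expensive handle (here, a `FileSystem`) until it is actually needed, and create it at most once. A generic memoizing sketch of that pattern, not the actual ORC-570 API:

```java
import java.util.function.Supplier;

// Sketch of the lazy-Supplier idea in ORC-570: defer creation of an
// expensive object until first use, then memoize it. Generic pattern,
// not ORC's ReaderOptions.
public class LazyOptions<T> {
    private Supplier<T> supplier; // deferred factory
    private T value;              // memoized result

    public LazyOptions(Supplier<T> supplier) { this.supplier = supplier; }

    public synchronized T get() {
        if (value == null) {
            value = supplier.get(); // pay the construction cost once
            supplier = null;        // let the factory be garbage-collected
        }
        return value;
    }
}
```

A caller that never touches `get()` never pays for the `FileSystem` lookup at all.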
>We are conducting a project involving replacing (Linux) system's
>libz.so with our own hardware based implementation, but this requires us to
>replace libzip.so with our own so that small zip processing doesn't go through
>hardware, as hardware actually cannot process
> How small are you trying to make the stripes? I ask because all of the
> above should be small, so if they are dominating, I would assume the stripe
> is tiny or the compression really worked well.
I'm not in favour of stripelets for seek reasons, because reading a single
column from a
>Zstd with particular settings doesn’t work well on one particular
> non-public dataset after it is encoded by RLE.
>I’ve suggested that you try tuning the zstd compression to find a set of
> parameters that work well with RLE. Take a look at how we tune the zlib
> compression based
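The tuning the reply alludes to can be illustrated with the JDK's own zlib binding: compress the same RLE-like buffer under different level/strategy combinations and compare sizes. The constants below are standard `java.util.zip.Deflater` parameters; this is an illustration of the trade-off space, not ORC's actual zlib tuning code.

```java
import java.util.zip.Deflater;

// Compare zlib output sizes under different tuning parameters, the
// same kind of exploration suggested above for zstd. Illustrative only.
public class ZlibTune {
    public static int compressedSize(byte[] data, int level, int strategy) {
        Deflater d = new Deflater(level);
        d.setStrategy(strategy);
        d.setInput(data);
        d.finish();
        byte[] out = new byte[data.length * 2 + 64];
        int n = 0;
        while (!d.finished()) {
            n += d.deflate(out, n, out.length - n);
        }
        d.end();
        return n;
    }
}
```

Running it on a buffer of long byte runs (RLE-friendly data) shows both fast and thorough settings collapsing the input dramatically, which is why the choice of level matters less than the interaction with the upstream encoding.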
Hi,
+1 - verified keys, signature, rebuilt Hive master against this build & ran a
few queries on LLAP.
Cheers,
Gopal
On 9/20/18, 4:26 PM, "Owen O'Malley" wrote:
All,
Should we release the following artifacts as ORC 1.5.3?
tar: http://home.apache.org/~omalley/orc-1.5.3/
Hi,
> From above observation, we find that it is better to disable LEB128 encoding
> while zstd is used.
You can enable file size optimizations (automatically recommend better layouts
for compression) when
"orc.encoding.strategy"="COMPRESSION"
There are a bunch of bitpacking loops that's
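For readers unfamiliar with the term, LEB128 is the base-128 varint scheme: seven payload bits per byte plus a continuation bit. A minimal sketch of the encoding being discussed, as a generic illustration rather than ORC's RLE writer:

```java
import java.io.ByteArrayOutputStream;

// Unsigned LEB128 (base-128 varint): 7 payload bits per byte, high bit
// set on all bytes except the last. Generic sketch, not ORC internals.
public class Leb128 {
    public static byte[] encode(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        do {
            int b = (int) (v & 0x7f); // low 7 bits of the remaining value
            v >>>= 7;
            if (v != 0) b |= 0x80;    // continuation bit: more bytes follow
            out.write(b);
        } while (v != 0);
        return out.toByteArray();
    }

    public static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7f) << shift;
            shift += 7;
        }
        return result;
    }
}
```

The byte-boundary misalignment this produces is plausibly what interacts badly with zstd's match finder, which is the observation quoted above.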
Verified signatures against dist KEYS, checksums.
Built Hive3.0 against 1.5.2 & everything looks good.
+1 binding.
Cheers,
Gopal
On 6/25/18, 4:43 PM, "Prasanth Jayachandran" wrote:
Oops. My bad.
Here is the correct link
http://home.apache.org/~prasanthj/orc-1.5.2rc0/
Hi,
+1
Package builds clean & tested against HIVE-19465.
Cheers,
Gopal
On 5/14/18, 9:54 AM, "Owen O'Malley" wrote:
*Ping*
I need one more PMC vote, please. :)
On Thu, May 10, 2018 at 3:18 PM, Xiening Dai wrote:
>
Hi,
I agree with your analysis about Decimals.
Something similar has already gone into patch-available previously, but held back
https://issues.apache.org/jira/browse/ORC-209
This is somewhat stuck behind the Vector type system evolving support for this
> the bad thing is that we still have TWO encodings to discuss.
Two is exactly what we need, not five - from the existing ORC configs
hive.exec.orc.encoding.strategy=[SPEED, COMPRESSION];
FLIP8 was my original suggestion to Teddy from the byteuniq UDF runs, though
the regressions in
>2. Under seek or predicate pushdown scenario, there’s no need to load the
> entire stream.
Yes, that is a valid scenario where the reader reads partial-streams & causes
random IO.
The current double encoding is actually 2 streams today & will continue to use
2 streams for the FLIP
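To make the two-stream point concrete, here is one plausible reading of a split double encoding: route the upper bytes of each IEEE-754 bit pattern (sign, exponent, high mantissa) into one stream and the lower bytes into another, so that similar values cluster and compress better. This is a toy illustration of the stream-splitting idea, not the actual FLIP8 encoding.

```java
// Toy split of doubles into a "high bytes" stream and a "low bytes"
// stream, lossless by construction. Illustrative only, not FLIP8.
public class SplitDoubles {
    // Returns {highStream, lowStream}: top 4 and bottom 4 bytes of each
    // IEEE-754 bit pattern, in column order.
    public static byte[][] split(double[] values) {
        byte[] high = new byte[values.length * 4];
        byte[] low  = new byte[values.length * 4];
        for (int i = 0; i < values.length; i++) {
            long bits = Double.doubleToLongBits(values[i]);
            for (int b = 0; b < 4; b++) {
                high[i * 4 + b] = (byte) (bits >>> (56 - 8 * b));
                low[i * 4 + b]  = (byte) (bits >>> (24 - 8 * b));
            }
        }
        return new byte[][] { high, low };
    }

    public static double[] join(byte[] high, byte[] low) {
        double[] out = new double[high.length / 4];
        for (int i = 0; i < out.length; i++) {
            long bits = 0;
            for (int b = 0; b < 4; b++) bits = (bits << 8) | (high[i * 4 + b] & 0xffL);
            for (int b = 0; b < 4; b++) bits = (bits << 8) | (low[i * 4 + b] & 0xffL);
            out[i] = Double.longBitsToDouble(bits);
        }
        return out;
    }
}
```

Since the current double encoding is already two streams, a split encoding like this changes what lives in each stream without changing the seek pattern.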
Hi,
> Since Split creates two separated streams, reading one data batch will need
> an additional seek in order to reconstruct the column data
If you are seeing a seek like that, we've messed up something else higher up in
the pipeline & that can be fixed.
ORC columnar reads only do random
> existing work [1] from Teddy Choi and Owen O'Malley with some new compression
> codecs (e.g. ZSTD and Brotli), we propose to promote FLIP as the default
> encoding for the ORC double type to move this feature forward.
Since we're discussing these, I'm going to summarize my existing notes on this,
> For performance reasons, you prefer the second option that I rejected
> where users give a file and the system finds the deletes from there. I can
> buy that.
That's simpler at least to understand and debug, the logs from ORC alone are
enough to find consistency issues.
The rest of the
Hi,
> My intention is that we can iterate on the UNSTABLE-PRE-2.0 format without
> cross-version compatibility. It will only be used for developer testing.
Sounds good - I verified that Hive can communicate this to ORC correctly.
set hive.exec.orc.write.format="UNSTABLE-PRE-2.0";
offers a very
> I agree that we want to be able to trim the values. I've seen cases where
> the String is huge (~100k) and makes the StringStatistics huge. I'd propose
> that we do something like:
The only concrete consumer of this data outside of ORC readers is probably
partial scan computation of
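The standard way to trim string statistics while keeping them sound for predicate pushdown: any prefix of the minimum is still a valid lower bound, and the maximum's prefix with its last character bumped is still a valid upper bound. A sketch of that technique, assuming nothing about ORC's actual StringStatistics implementation:

```java
// Trim oversized string min/max statistics without breaking PPD
// soundness. Illustrative sketch, not ORC's StringStatistics code.
public class TrimStats {
    // A prefix of the minimum sorts <= the minimum, so it is still a
    // valid lower bound for every value in the stripe.
    public static String trimLowerBound(String min, int limit) {
        return min.length() <= limit ? min : min.substring(0, limit);
    }

    // A bare prefix of the maximum would sort *below* it, so bump the
    // last character of the prefix to keep a valid upper bound.
    public static String trimUpperBound(String max, int limit) {
        if (max.length() <= limit) return max;
        char[] prefix = max.substring(0, limit).toCharArray();
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] < Character.MAX_VALUE) {
                prefix[i]++;
                return new String(prefix, 0, i + 1);
            }
        }
        return max; // every prefix char is maximal; keep the original
    }
}
```

This keeps a ~100k statistic down to a few bytes while any range predicate evaluated against the trimmed bounds remains conservative.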
>I can see that row indices are being used to select only rowgroups that
>satisfy a search predicate in
…
> But, I cannot find where and if the stripe level indices are being used?