Re: Null filesystem when using ORC writer

2021-04-16 Thread Ryan Schachte
at 9:28 PM Ryan Schachte wrote: > Hi Pavan, > Thanks for getting back to me. I'm giving this a shot to simplify my > example to see if I can reproduce. Any chance you could speak to why this > error would keep happening? > Excep

Re: Null filesystem when using ORC writer

2021-04-16 Thread Ryan Schachte
are seeing is coming from the point that the > configuration object being set is null. > > Try the simple example I am sure that should clear things up for you. > > > On Apr 16, 2021, at 12:32 PM, Ryan Schachte > wrote: > > > > Hi team. Desperate to understand wh

Null filesystem when using ORC writer

2021-04-16 Thread Ryan Schachte
Hi team. Desperate to understand what my issue is here, hoping someone with better knowledge of the ORC writer and of how the local Hadoop fs works can help. I'm using the LocalFileSystem for my standalone app (compacts ORC files). Logging into the container, I see all my data written to the location I'm

Re: Disabling Kerberos when using ORC writer

2021-04-16 Thread Ryan Schachte
'm trying to write to. Still debugging. On Fri, Apr 16, 2021 at 3:55 AM Ryan Schachte wrote: > Hi everyone, > > I've spent many hours debugging this failure. I've written a small ORC > compactor using the Java libs. Everything works locally, but deployment to > cloud runn

Disabling Kerberos when using ORC writer

2021-04-16 Thread Ryan Schachte
Hi everyone, I've spent many hours debugging this failure. I've written a small ORC compactor using the Java libs. Everything works locally, but deployment to cloud running in Docker is giving me Kerberos auth failures. public FileSystem getHadoopFs() { return new LocalFileSystem() { @Overr

Avro representation of ORC time

2020-09-16 Thread Ryan Schachte
ors into time and nanos. The data is good, but one is represented in epoch (avro) and timestamp format (orc). The reason I bring this up as a question is I am unable to union these values in Hive since the data types are technically different. Just curious if there are thoughts on this or if I'm doing something wrong. Best, Ryan Schachte

Re: ORC vector rollback

2020-09-11 Thread Ryan Schachte
't reclaim the values, but they won't be > written to the file. > > .. Owen > > On Fri, Sep 11, 2020 at 5:58 PM Ryan Schachte > wrote: > > > Hi Owen, > > Thanks for the quick response. > > > > Essentially, I have an Avro -> ORC real-time conver

Re: ORC vector rollback

2020-09-11 Thread Ryan Schachte
> vector in the VectorizedRowBatch to just select the other rows. > > .. Owen > > On Fri, Sep 11, 2020 at 7:12 AM Ryan Schachte > wrote: > > > I'm writing a streaming application that converts incoming data into ORC > in > > real-time. One thing I'm imp

ORC vector rollback

2020-09-11 Thread Ryan Schachte
t vectorPosition I'm iterating on. Is it as simple as setting colVector.isNull[vectorPosition] to true and setting colVector.noNulls to false? I wanted to originally go into the index for each column vector and override, but I don't see an easy way to do that. Cheers!! Ryan Schachte
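
The two assignments asked about in this message can be sketched in plain Java. This is only an illustration of the question, not a confirmed way to roll back a row: `SketchLongColumn` and `markNull` are hypothetical stand-ins that borrow the field names `isNull` and `noNulls` from the message, and nothing here calls the ORC library.

```java
import java.util.Arrays;

public class NullMarkingSketch {
    /** Hypothetical stand-in for a column vector; only the field names come from the message. */
    static final class SketchLongColumn {
        final long[] vector;     // values, one slot per row position
        final boolean[] isNull;  // per-position null flags
        boolean noNulls = true;  // true while no position has been marked null

        SketchLongColumn(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    /** Applies the two assignments the message asks about to one position. */
    static void markNull(SketchLongColumn col, int vectorPosition) {
        col.isNull[vectorPosition] = true;
        col.noNulls = false;
    }

    public static void main(String[] args) {
        SketchLongColumn col = new SketchLongColumn(4);
        col.vector[2] = 42L;
        markNull(col, 2);
        System.out.println(Arrays.toString(col.isNull) + " noNulls=" + col.noNulls);
    }
}
```

Note that the stored value at the marked position is left in place; the sketch only flips the flags.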

Re: uniontypes and Spark

2020-07-31 Thread Ryan Schachte
u would have fields for each variant of the > union and N-1 of the N fields > would be null for each row. > > .. Owen > > On Thu, Jul 30, 2020 at 9:19 AM Ryan Schachte > wrote: > > > I am writing ORC binaries in Java and they deserialize perfectly with the > > Apa

uniontypes and Spark

2020-07-30 Thread Ryan Schachte
I am writing ORC binaries in Java and they deserialize perfectly with the Apache ORC jar on the docs that I've used to validate the data. The schemas look good, etc. When reading this information via Spark, we are encountering failures - in particular mismatched input '<' expecting '>'(line 1, p

In-memory VFS

2020-07-28 Thread Ryan Schachte
Hello all, I'm looking to batch some ORC files in memory before persisting to cloud storage. As a result, I was interested in utilizing an in-memory VFS to store this data to avoid disk writes for efficiency. JIMFS (from Google) seems to support what I need. The method signature of ORC writer expe

Re: Interpreting ORC Java Reference

2020-07-21 Thread Ryan Schachte
documentation - >https://orc.apache.org/docs/core-java.html >- The orc to json tool - > > https://github.com/apache/orc/blob/master/java/tools/src/java/org/apache/orc/tools/PrintData.java > > Feel free to ask questions here on the dev list too. > > .. Owen > > O

Interpreting ORC Java Reference

2020-07-20 Thread Ryan Schachte
Hi team, apologies for the last email, believe I sent too early. I'm interested in better understanding the ORC reference guide in the docs and wanted to clarify some things to see if I'm understanding correctly. I realize for the *VectorizedRowBatch* approach, we write in chunks of 1024 rows and
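
As a rough illustration of writing in chunks of 1024 rows, the sketch below buffers rows and flushes whenever the buffer is full, then flushes the remainder. It is plain Java under stated assumptions: `BATCH_SIZE` and `flushSizes` are hypothetical names, and the sketch only counts rows rather than using the ORC writer.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {
    // Chunk size mentioned in the message.
    static final int BATCH_SIZE = 1024;

    /** Returns the size of each flushed chunk when rowCount rows are buffered BATCH_SIZE at a time. */
    static List<Integer> flushSizes(int rowCount) {
        List<Integer> flushed = new ArrayList<>();
        int size = 0;
        for (int row = 0; row < rowCount; row++) {
            size++;                    // one more row in the current chunk
            if (size == BATCH_SIZE) {  // chunk is full: flush and start over
                flushed.add(size);
                size = 0;
            }
        }
        if (size > 0) {
            flushed.add(size);         // flush the final partial chunk
        }
        return flushed;
    }

    public static void main(String[] args) {
        System.out.println(flushSizes(2500)); // prints [1024, 1024, 452]
    }
}
```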

Interpreting ORC Java Reference

2020-07-20 Thread Ryan Schachte
Hi team, I'm new to ORC and I'm interested in getting confirmation on how I interpret the structure of the vectorized row/column vectors.

Re: Avro to ORC Conversion

2020-07-16 Thread Ryan Schachte
pile for Hive 2, but I have done some work with porting similar > > processors to Hive 2 and as I recall it was mostly API-type breaking > > changes and not so much from the behavior side of things, more of a > > Maven and Java-package-name kind of thing. > > > > Regards,

Re: Avro to ORC Conversion

2020-07-15 Thread Ryan Schachte
nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/ConvertAvroToORC.java > > On Wed, Jul 15, 2020 at 4:51 PM Ryan Schachte > wrote: > > > > I'm writing a standalone Java process and interested in converting the > > consum

Re: Avro to ORC Conversion

2020-07-15 Thread Ryan Schachte
erter tools. > For example, You can simply dockerize Apache Spark 3.0.0 on JDK11 > docker image and use it. The full JDK11 (openjdk:11) is 627MB. > If you use 11-jre-slim(`204MB`) as a base image, > the final docker image (Apache Spark 3.0.0 + JDK11) will be 500MB. > > Bests, > Dongjo

Avro to ORC Conversion

2020-07-15 Thread Ryan Schachte
I'm writing a standalone Java process and interested in converting the consumed Avro messages to ORC. I've seen a plethora of examples of writing to ORC, but the conversion to ORC from Avro is what I can't seem to find a lot of examples of. This is just a standard Java process running inside of a