at 9:28 PM Ryan Schachte
wrote:
> Hi Pavan,
> Thanks for getting back to me. I'm giving this a shot to simplify my
> example to see if I can reproduce. Any chance you could speak to why this
> error would keep happening?
> Excep
are seeing is coming from the point that the
> configuration object being set is null.
>
> Try the simple example I am sure that should clear things up for you.
>
> > On Apr 16, 2021, at 12:32 PM, Ryan Schachte
> wrote:
> >
> > Hi team. Desperate to understand wh
Hi team. Desperate to understand what my issue is here, hoping someone with
better knowledge of the ORC writer and how the local Hadoop fs works can help.
I'm using the LocalFileSystem for my standalone app (compacts ORC files).
Logging into the container, I see all my data written to the location I'm
trying to write to. Still debugging.
On Fri, Apr 16, 2021 at 3:55 AM Ryan Schachte
wrote:
> Hi everyone,
>
> I've spent many hours debugging this failure. I've written a small ORC
> compactor using the Java libs. Everything works locally, but deployment to
> cloud runn
Hi everyone,
I've spent many hours debugging this failure. I've written a small ORC
compactor using the Java libs. Everything works locally, but deployment to
cloud running in Docker is giving me Kerberos auth failures.
public FileSystem getHadoopFs() {
return new LocalFileSystem() {
@Overr
ors into time and nanos. The data is good, but one is
represented in epoch (avro) and timestamp format (orc). The reason I bring
this up as a question is I am unable to union these values in Hive since
the data types are technically different. Just curious if there are
thoughts on this or if I'm doing something wrong.
Best,
Ryan Schachte
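For the epoch-versus-timestamp question above, here is a minimal sketch of how a single epoch-milliseconds value splits into a "time" and "nanos" pair. It assumes the Avro field is a long holding milliseconds since the epoch and that the ORC side stores the pair returned by java.sql.Timestamp's getTime() and getNanos(); the class name, method name, and sample value are hypothetical, and nothing here touches the ORC library itself.

```java
import java.sql.Timestamp;

final class EpochToTimestampSketch {
    /**
     * Splits epoch milliseconds into {time, nanos}, where time is the
     * millisecond value and nanos is the fractional second in nanoseconds.
     * Assumption: the input is milliseconds, not microseconds or seconds.
     */
    static long[] toTimeAndNanos(long epochMillis) {
        Timestamp ts = new Timestamp(epochMillis);
        return new long[] { ts.getTime(), ts.getNanos() };
    }

    public static void main(String[] args) {
        long avroEpochMillis = 1600000000123L; // hypothetical sample value
        long[] pair = toTimeAndNanos(avroEpochMillis);
        System.out.println("time=" + pair[0] + " nanos=" + pair[1]);
    }
}
```

Both representations carry the same instant, so a Hive union would likely need an explicit cast on one side rather than a change to the data.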
't reclaim the values, but they won't be
> written to the file.
>
> .. Owen
>
> On Fri, Sep 11, 2020 at 5:58 PM Ryan Schachte
> wrote:
>
> > Hi Owen,
> > Thanks for the quick response.
> >
> > Essentially, I have an Avro -> ORC real-time conver
> vector in the VectorizedRowBatch to just select the other rows.
>
> .. Owen
>
> On Fri, Sep 11, 2020 at 7:12 AM Ryan Schachte
> wrote:
>
> > I'm writing a streaming application that converts incoming data into ORC
> in
> > real-time. One thing I'm imp
t vectorPosition I'm iterating on. Is it as
simple as setting colVector.isNull[vectorPosition] to true and setting
colVector.noNulls to false? I wanted to originally go into the index for
each column vector and override, but I don't see an easy way to do that.
Cheers!!
Ryan Schachte
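The null-marking step asked about above can be sketched as follows. To stay self-contained this uses a hypothetical stand-in class, SketchLongColumn, that only mirrors the vector, isNull, and noNulls fields of a column vector; it is not the ORC class, and the helper name setNull is also made up for illustration.

```java
import java.util.Arrays;

final class NullMarkingSketch {
    /** Hypothetical stand-in mirroring the fields of a long column vector. */
    static final class SketchLongColumn {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true; // column-wide hint: true means no row is null

        SketchLongColumn(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    /** Marks one row null: set the per-row flag and clear the column-wide hint. */
    static void setNull(SketchLongColumn col, int vectorPosition) {
        col.isNull[vectorPosition] = true;
        col.noNulls = false;
    }

    public static void main(String[] args) {
        SketchLongColumn col = new SketchLongColumn(4);
        col.vector[0] = 10;
        col.vector[1] = 20;
        setNull(col, 2); // row 2 has no value
        col.vector[3] = 40;
        System.out.println(Arrays.toString(col.isNull) + " noNulls=" + col.noNulls);
    }
}
```

Both flags matter in this sketch: a reader that trusts noNulls == true would skip the isNull array entirely, so setting only the per-row flag would not be enough.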
u would have fields for each variant of the
> union and N-1 of the N fields
> would be null for each row.
>
> .. Owen
>
> On Thu, Jul 30, 2020 at 9:19 AM Ryan Schachte
> wrote:
>
> > I am writing ORC binaries in Java and they deserialize perfectly with the
> > Apa
I am writing ORC binaries in Java and they deserialize perfectly with the
Apache ORC jar on the docs that I've used to validate the data. The schemas
look good, etc.
When reading this information via Spark, we are encountering failures - in
particular
mismatched input '<' expecting '>'(line 1, p
Hello all,
I'm looking to batch some ORC files in memory before persisting to cloud
storage. As a result, I was interested in utilizing an in-memory VFS to
store this data to avoid disk writes for efficiency. JIMFS (from Google)
seems to support what I need. The method signature of ORC writer expe
documentation -
>https://orc.apache.org/docs/core-java.html
>- The orc to json tool -
>
> https://github.com/apache/orc/blob/master/java/tools/src/java/org/apache/orc/tools/PrintData.java
>
> Feel free to ask questions here on the dev list too.
>
> .. Owen
>
> O
Hi team,
apologies for the last email, believe I sent too early. I'm interested in
better understanding the ORC reference guide in the docs and wanted to
clarify some things to see if I'm understanding correctly.
I realize for the *VectorizedRowBatch* approach, we write in chunks of 1024
rows and
Hi team,
I'm new to ORC and I'm interested in getting confirmation on how I interpret
the structure of the vectorized row/column vectors.
pile for Hive 2, but I have done some work with porting similar
> > processors to Hive 2 and as I recall it was mostly API-type breaking
> > changes and not so much from the behavior side of things, more of a
> > Maven and Java-package-name kind of thing.
> >
> > Regards,
nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/main/java/org/apache/nifi/processors/hive/ConvertAvroToORC.java
>
> On Wed, Jul 15, 2020 at 4:51 PM Ryan Schachte
> wrote:
> >
> > I'm writing a standalone Java process and interested in converting the
> > consum
erter tools.
> For example, you can simply dockerize Apache Spark 3.0.0 on JDK11
> docker image and use it. The full JDK11 (openjdk:11) is 627MB.
> If you use 11-jre-slim(`204MB`) as a base image,
> the final docker image (Apache Spark 3.0.0 + JDK11) will be 500MB.
>
> Bests,
> Dongjo
I'm writing a standalone Java process and interested in converting the
consumed Avro messages to ORC. I've seen a plethora of examples of writing
to ORC, but the conversion to ORC from Avro is what I can't seem to find a
lot of examples of.
This is just a standard Java process running inside of a