Re: Proposal: Revamp Apache Hive website.

2022-09-15 Thread Owen O'Malley
Look at the threads and talk to Apache Infra. They couldn't make it work before. We would have needed to manually publish to the asf-site branch. On Thu, Sep 15, 2022 at 7:54 PM Simhadri G wrote: > Thanks Ayush, Pau Tallada and Owen O'Malley for the feedback! > > @Owen ,

Re: Proposal: Revamp Apache Hive website.

2022-09-15 Thread Owen O'Malley
I found it - https://github.com/apache/hive/pull/1410 On Thu, Sep 15, 2022 at 6:42 PM Owen O'Malley wrote: > I had a PR to replace the website with markdown. Apache Infra was supposed > to make it autopublish. *sigh* > > .. Owen > > On Thu, Sep 15, 2022 at 4:23 PM Pau

Re: Proposal: Revamp Apache Hive website.

2022-09-15 Thread Owen O'Malley
I had a PR to replace the website with markdown. Apache Infra was supposed to make it autopublish. *sigh* .. Owen On Thu, Sep 15, 2022 at 4:23 PM Pau Tallada wrote: > Hi, > > Great work! > +1 on updating it as well > > Message from Ayush Saxena on Thu., 15 Sep. > 2022 at 17:40: > >> H

Re: Should we consider Spark3 support for Hive on Spark

2022-08-24 Thread Owen O'Malley
Hive on Spark is not recommended. The recommended path is to use either Tez or LLAP. If you already are using Spark 3, it would be far easier to use Spark SQL. .. Owen On Wed, Aug 24, 2022 at 3:46 AM Fred Bai wrote: > Hi everyone: > > Do we have any support for Hive on Spark? I need Hive on Spa

Re: Hive: Request for Dataset

2022-02-07 Thread Owen O'Malley
I believe it was just intended as an example with your own data. For an example that uses data available on every linux machine, you can do: create table passwd ( name string, not_used string, uid int, gid int, full_name string, home_dir string, shell string ) row format delimited fi

Re: LLAP can't read ORC ZLIB files from S3

2020-06-25 Thread Owen O'Malley
Actually, it looks like LLAP is trying to get the ByteBuffer array from a direct byte buffer. Turning off direct byte buffers on read should fix the problem. .. Owen On Thu, Jun 25, 2020 at 7:27 AM Aaron Grubb wrote: > This appears to have been caused by orc.write.variable.length.blocks=true >

Re: Delegation tokens for HDFS

2019-09-20 Thread Owen O'Malley
If you are using Hive Server 2 through jdbc: - The most common way is to have the data only accessible to the 'hive' user. Since the users don't have access to the underlying HDFS files, Hive can enforce column/row permissions. - The other option is to use doAs and run as the user. Tha

Re: Converting Hive Column from Varchar to String

2019-07-18 Thread Owen O'Malley
ORC files expect UTF-8, which is a superset of ascii, in strings, char, and varchar. The only place that I know that will cause trouble if you put non-utf-8 data in strings is the statistics. The API for getting the min/max will convert to Java strings. But back to your original point, the schema

Re: Converting Hive Column from Varchar to String

2019-07-18 Thread Owen O'Malley
Which version of Hive are you on? The recent versions (hive >= 2.3) should support schema evolution in the ORC reader. .. Owen On Wed, Jul 17, 2019 at 11:07 PM Jörn Franke wrote: > You have to create a new table with this column as varchar and do a select > insert from the old table. > > > Am 1

Re: Hive Compaction OOM

2018-09-17 Thread Owen O'Malley
of memory. > > > Thanks > > Shawn Weeks > > > -- > *From:* Owen O'Malley > *Sent:* Monday, September 17, 2018 3:37:09 PM > *To:* user@hive.apache.org > *Subject:* Re: Hive Compaction OOM > > How many files is it trying to merge at once? By far the ea

Re: Hive Compaction OOM

2018-09-17 Thread Owen O'Malley
at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:655) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:633) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) > at org.apache.hadoop.ma

Re: Hive Compaction OOM

2018-09-17 Thread Owen O'Malley
Shawn, Can you provide the stack trace that you get with the OOM? Thanks, Owen On Mon, Sep 17, 2018 at 9:27 AM Prasanth Jayachandran < pjayachand...@hortonworks.com> wrote: > Hi Shawn > > You might be running into issues related to huge protobuf objects from > huge string columns. Without

Re: issues with Hive 3 simple sellect from an ORC table

2018-06-08 Thread Owen O'Malley
r property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 8 June 2018 at 16:59, Owen O'Malley wrote: &g

Re: issues with Hive 3 simple sellect from an ORC table

2018-06-08 Thread Owen O'Malley
This looks like there is an API incompatibility between the version of hadoop expected and the version used. Which version of hadoop are you using? .. Owen > On Jun 8, 2018, at 08:31, Mich Talebzadeh wrote: > > Just installed and upgraded to Hive 3 where fun and game started :) > > First I ha

Re: Proposal: File based metastore

2018-01-29 Thread Owen O'Malley
> On Jan 29, 2018, at 9:29 AM, Edward Capriolo wrote: > > > > On Mon, Jan 29, 2018 at 12:10 PM, Owen O'Malley <mailto:owen.omal...@gmail.com>> wrote: > You should really look at what the Netflix guys are doing on Iceberg. > > https://github.com/Netfl

Re: Proposal: File based metastore

2018-01-29 Thread Owen O'Malley
You should really look at what the Netflix guys are doing on Iceberg. https://github.com/Netflix/iceberg They have put a lot of thought into how to efficiently handle tabular data in S3. They put all of the metadata in S3 except for a single link to the name of the table's root metadata file. Ot

Re: READING STRING, CONTAINS \R\N, FROM ORC FILES VIA JDBC DRIVER PRODUCES DIRTY DATA

2017-11-02 Thread Owen O'Malley
ORC stores the data in UTF-8 with the length of the value stored explicitly. Therefore, it doesn't do any parsing of newlines. You can see the contents of an ORC file by using: % hive --orcfiledump -d from https://orc.apache.org/docs/hive-ddl.html . How did you load the data into Hive? ... Owe

Re: Serde moved? version 2.3.0

2017-10-27 Thread Owen O'Malley
e > source code and re-implementing if i can. good to know. > > > > On Wed, Oct 25, 2017 at 6:32 PM, Owen O'Malley > wrote: > >> I considered that, but it won't work. >> >> The Hive 2.2 code looks like: >> >> public interface SerDe { ... }

Re: Serde moved? version 2.3.0

2017-10-25 Thread Owen O'Malley
d such that such a > mapping cannot be done. > > Regards, > Matt > > > > On Oct 25, 2017, at 7:31 PM, Owen O'Malley wrote: > > > On Wed, Oct 25, 2017 at 3:20 PM, Stephen Sprague > wrote: > >> i see. interesting. i think this breaks a ton of opensou

Re: Serde moved? version 2.3.0

2017-10-25 Thread Owen O'Malley
to be: 1. Change the plugin and recompile it. 2. Upgrade to Hive 2.2 instead of 2.3. 3. Make a case for reverting the change. I'm not sure what the original motivation of the change was. It seems like it was effectively a clean up. .. Owen > Am i interpreting this corre

Re: Serde moved? version 2.3.0

2017-10-25 Thread Owen O'Malley
SerDe was removed by https://issues.apache.org/jira/browse/HIVE-15167 You should use AbstractSerDe instead. .. Owen > On Oct 25, 2017, at 2:18 PM, Stephen Sprague wrote: > > hey guys, > > could be a dumb question but not being a java type of

Re: HIVE ORC table returns NULLs ( EMR 5.9 Hive 2.3.0 )

2017-10-25 Thread Owen O'Malley
ration, logging. I am > using defaults of EMR. > > Please advice. > Thanks, Oleg. > > > > > > > On Wed, Oct 25, 2017 at 2:30 PM, Owen O'Malley > wrote: > >> The file has the data. I'm not sure what Hive is doing wrong. >> >> owen@la

Re: HIVE ORC table returns NULLs ( EMR 5.9 Hive 2.3.0 )

2017-10-24 Thread Owen O'Malley
The file has the data. I'm not sure what Hive is doing wrong. owen@laptop> java -jar ../tools/target/orc-tools-1.5.0-SNAPSHOT-uber.jar > data ~/Downloads/Country.orc > Processing data file /Users/owen/Downloads/Country.orc [length: 392] > {"Id":1,"Name":"Singapore"} > {"Id":2,"Name":"Malaysia"} >

Re: Hive 2.2.0 and Hadoop 2.4.1 compatibility

2017-09-07 Thread Owen O'Malley
If I remember right, the encryption stuff went into Hadoop 2.5. It isn't clear to me what versions of Hive are expected to work with which version of Hadoop. I've just been cleaning up the ORC shims for Hadoop and now ORC works cleanly with versions of Hadoop from 2.2 and will use features from 2

Re: Format dillema

2017-06-20 Thread Owen O'Malley
On Tue, Jun 20, 2017 at 10:12 AM, Edward Capriolo wrote: > It is whack that two optimized row columnar formats exists and each > respective project (hive/impala) has good support for one and lame/no > support for the other. > We have two similar formats because they were designed at roughly the

Re: any hive release imminent?

2017-06-20 Thread Owen O'Malley
The natives are very restless. I'm actively working on getting Hive 2.2 released. I'm running through qfile tests now and I hope to have it in the next couple weeks. It will be quickly followed up by Hive 2.3, which will be more aggressive with features, but less stable. .. Owen On Mon, Jun 19, 2

Re: Format dillema

2017-06-20 Thread Owen O'Malley
You should also try LLAP. With ORC or text, it will cache the hot columns and partitions in memory. I can't seem to find the slides yet, but the Comcast team had good results with LLAP: https://dataworkssummit.com/san-jose-2017/sessions/hadoop-query-performance-smackdown/ https://twitter.com/thej

Re: Hive : Storing data in external RCFile/SEQUENCE FILE table

2017-06-16 Thread Owen O'Malley
On Fri, Jun 16, 2017 at 6:36 AM, Kuldeep Chitrakar < kuldeep.chitra...@synechron.com> wrote: > I have two questions regarding loading tables which are defined as RCFile, > Sequencfile etc. > > > > Q1. > > 1. Suppose a table is defined as STORED AS RCFILE or SEQUENCEFILE, > how do we load th

Re: Parquet tables with snappy compression

2017-01-25 Thread Owen O'Malley
Mich, Here are the benchmarks that I did using three different types of data: http://www.slideshare.net/HadoopSummit/file-format-benchmark-avro-json-orc-parquet I assume you are comparing parquet-snappy vs parquet-none. .. Owen On Wed, Jan 25, 2017 at 1:37 PM, Mich Talebzadeh wrote: > Hi,

Re: Column names in ORC file

2016-12-15 Thread Owen O'Malley
Yes, it was fixed in HIVE-4243. .. Owen On Thu, Dec 15, 2016 at 10:21 AM, Elliot West wrote: > Possibly related to HIVE-4243 which was fixed in Hive 2.0.0: > https://issues.apache.org/jira/browse/HIVE-4243 > > > On Thu, 15 Dec 2016 at 18:06, Daniel Haviv com> wrote: > >> Hi, >> When I'm genera

Re: IntWritable cannot be cast to LongWritable

2016-12-14 Thread Owen O'Malley
Which version of Hive are you on? Hive 2.1 should automatically handle the type conversions from the file to the table. .. Owen On Wed, Dec 14, 2016 at 9:36 AM, Daniel Haviv < daniel.ha...@veracity-group.com> wrote: > Hi, > I have an ORC table where one of the fields was an int and is now a bigi

Re: Malformed orc file

2016-08-05 Thread Owen O'Malley
The file has trailing data. If you want to recover the data, you can use: % strings -3 -t d ~/Downloads/bucket_0 | grep ORC will print the offsets where ORC occurs with in the file: 0 ORC 4559 ORC That means that there is one intermediate footer within the file. If you slice the file at the
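As a rough illustration of the recovery trick above, the offset scan that `strings -3 -t d | grep ORC` performs can be sketched in Python. This is a hypothetical helper, not part of the original thread; the 4559 offset is just the example from the message.

```python
def find_magic_offsets(data: bytes, magic: bytes = b"ORC") -> list[int]:
    """Return every byte offset where the magic string occurs,
    analogous to `strings -t d file | grep ORC`."""
    offsets, start = [], 0
    while (i := data.find(magic, start)) != -1:
        offsets.append(i)
        start = i + 1
    return offsets


# A file whose magic appears at offset 0 and again at 4559 contains one
# intermediate footer; slicing the file just before the second magic is
# the recovery approach the reply describes.
sample = b"ORC" + b"\x00" * 4556 + b"ORC trailing bytes"
print(find_magic_offsets(sample))  # [0, 4559]
recovered = sample[:find_magic_offsets(sample)[-1]]
```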

Re: Why does ORC use Deflater instead of native ZlibCompressor?

2016-06-23 Thread Owen O'Malley
For compression, I'm also interested in investigating the pure java compression codecs that were done by the Presto project: https://github.com/airlift/aircompressor They've implemented LZ4, Snappy, and LZO in pure java. On Thu, Jun 23, 2016 at 8:04 PM, Gopal Vijayaraghavan wrote: > > Though,

Re: Why does ORC use Deflater instead of native ZlibCompressor?

2016-06-23 Thread Owen O'Malley
On Fri, Jun 17, 2016 at 11:31 PM, Aleksei Statkevich < astatkev...@rocketfuel.com> wrote: > Hello, > > I recently looked at ORC encoding and noticed > that hive.ql.io.orc.ZlibCodec uses java's java.util.zip.Deflater and not > Hadoop's native ZlibCompressor. > > Can someone please tell me what is t

Re: [VOTE] Bylaws change to allow some commits without review

2016-04-22 Thread Owen O'Malley
+1 On Fri, Apr 22, 2016 at 1:42 PM, Lars Francke wrote: > Hi everyone, thanks for the votes. I've been held up by personal stuff > this week but as there have been no -1s or other objections I'd like to > keep this vote open a bit longer until I've had time to go through the PMCs > and contact t

Re: ORC file sort order ..

2016-04-08 Thread Owen O'Malley
Use orcfiledump with the -d parameter. It will print the contents of the orc file. You could also use the file-contents executable from the C++ ORC reader. .. Owen On Fri, Apr 8, 2016 at 5:53 PM, Gautam wrote: > Hey, > >This might be too obvious a question but I haven't found a way

Re: ORC files and statistics

2016-01-19 Thread Owen O'Malley
the reader only loads the parts of the index it needs for the columns it is reading.) .. Owen > thanks again > > > On Tuesday, 19 January 2016, 17:35, Jörn Franke > wrote: > > > Just be aware that you should insert the data sorted at least on the most > discrimating colu

Re: ORC files and statistics

2016-01-19 Thread Owen O'Malley
It has both. Each index has statistics of min, max, count, and sum for each column in the row group of 10,000 rows. It also has the location of the start of each row group, so that the reader can jump straight to the beginning of the row group. The reader takes a SearchArgument (eg. age > 100) tha
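The row-group pruning described above can be simulated with a small sketch. This is hypothetical Python, not ORC's real reader or its SearchArgument API; the real format keeps min/max per 10,000-row group, shrunk here to 4 rows for illustration.

```python
ROW_GROUP = 4  # real ORC uses 10,000 rows per group; small here for clarity

def row_groups(values, size=ROW_GROUP):
    """Yield row groups with the min/max statistics an ORC index would record."""
    for i in range(0, len(values), size):
        group = values[i:i + size]
        yield {"start": i, "min": min(group), "max": max(group), "rows": group}

def scan_greater_than(values, threshold):
    """Evaluate a predicate like `age > 100`, skipping any row group whose
    recorded max proves no row can match — without reading its rows."""
    hits, groups_read = [], 0
    for g in row_groups(values):
        if g["max"] <= threshold:   # statistics rule the whole group out
            continue
        groups_read += 1
        hits.extend(v for v in g["rows"] if v > threshold)
    return hits, groups_read

ages = [1, 2, 3, 4, 5, 6, 7, 8, 101, 102, 103, 104]
print(scan_greater_than(ages, 100))  # ([101, 102, 103, 104], 1)
```

Only the last of the three groups is actually read; the first two are skipped on statistics alone, which is the point of the index.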

Re: Create table from ORC or Parquet file?

2015-12-09 Thread Owen O'Malley
So your use case is that you already have the ORC files and you want a table that can read those files without specifying the columns in the table? Obviously without the columns being specified Hive wouldn't be able to write to that table, so I assume you only care about reading it. Is that right?

Re: ORC NPE while writing stats

2015-09-02 Thread Owen O'Malley
We have multiple threads writing, but each thread works on one file, so > orc writer is only touched by one thread (never cross threads) > On Sep 2, 2015 11:18 AM, "Owen O'Malley" wrote: > >> I don't see how it would get there. That implies that minimum was nul

Re: ORC NPE while writing stats

2015-09-02 Thread Owen O'Malley
I don't see how it would get there. That implies that minimum was null, but the count was non-zero. The ColumnStatisticsImpl$StringStatisticsImpl.serialize looks like: @Override OrcProto.ColumnStatistics.Builder serialize() { OrcProto.ColumnStatistics.Builder result = super.serialize(); OrcPr

Re: can OrcSerde work with custom input format

2015-07-24 Thread Owen O'Malley
Using OrcSerde should work fine. .. Owen > On Jul 24, 2015, at 17:39, Jie Zhang wrote: > > Hi, > > My application is using hive extenal tables with ORC and needs some special > logic to filter out some input files. I was thinking to write a custom > InputFormat extending OrcInputFormat. >

Re: Urgent : Issue with hive installation on Redhat linux 64bit

2015-07-08 Thread Owen O'Malley
Based on the answer here: http://stackoverflow.com/a/1096159/2301201 You must be trying to use a jdk older than java 1.7. Run the hive script with bash debugging turned on to see which jdk it is using. .. Owen On Wed, Jul 8, 2015 at 9:56 PM, Ravi Kumar Jain 03 wrote: > Hello All, > > > > We

Re: ApacheCON EU HBase Track Submissions

2015-06-25 Thread Owen O'Malley
Actually, Apache: Big Data Europe CFP closes 10 July. The CFP is: http://events.linuxfoundation.org/events/apache-big-data-europe/program/cfp On Thu, Jun 25, 2015 at 11:30 AM, Nick Dimiduk wrote: > Hello developers, users, speakers, > > As part of ApacheCON's inaugural "Apache: Big Data", I'm h

Apache: Big Data Europe call for proposals

2015-06-23 Thread Owen O'Malley
ApacheCon Europe is located in Budapest this year and has split into two co-located events and I'd like to encourage everyone to submit talk proposals on Hive and ORC to Apache: Big Data Europe. http://events.linuxfoundation.org/events/apache-big-data-europe/program/cfp The CFP closes on 10 Jul

Re: Malformed Orc file Invalid postscript length 0

2015-05-22 Thread Owen O'Malley
Bhavana, Could you send me (omal...@apache.org) the incorrect ORC file? Which file system were you using? hdfs? Which version of Hadoop and Hive? Thanks, Owen On Fri, May 22, 2015 at 9:37 AM, Grant Overby (groverby) wrote: > I’m getting the following exception when Hive executes a query

Re: Writing Sequence Files

2015-05-04 Thread Owen O'Malley
On Mon, May 4, 2015 at 11:02 AM, Grant Overby (groverby) wrote: > I’m looking for some sample code to write a hive compatible sequence > file for an external table and matching ddl. > In general the easiest way is to create a table with what you'd like to have and use Hive to write to table li

Re: ORC file across multiple HDFS blocks

2015-04-28 Thread Owen O'Malley
You can also use the C++ reader to read a set of stripes. Look at the ReaderOptions.range(offset, length), which selects the range of stripes to process in terms of bytes. .. Owen On Tue, Apr 28, 2015 at 11:02 AM, Demai Ni wrote: > Alan and Grant, > > many thanks. Grant's comment is exact on th

Re: How to add custom codec support for ORC file.

2015-04-08 Thread Owen O'Malley
There currently isn't a way to do that. What are your requirements that would be easier with a custom codec? ORC uses the codecs in a very specific way so that it can support indexing. By default ORC indexes each 10k rows and the compression is done in blocks so that the reader can skip over blocks

Re: Over-logging by ORC packages

2015-04-06 Thread Owen O'Malley
Sorry for the excessive logging. The pushdown logging should only be at the start, is there a particular message that was being repeated per a row? Thanks, Owen On Mon, Apr 6, 2015 at 9:15 AM, Moore, Douglas < douglas.mo...@thinkbiganalytics.com> wrote: > On a cluster recently upgraded to H

Re: 38 digits vs 35 digits for Decimal type?

2015-02-21 Thread Owen O'Malley
Hive decimal supports 38 digits also. It is a natural size since it fits in 127 bits. .. Owen > On Feb 21, 2015, at 07:35, Yang wrote: > > If I were to transfer a table from existing oracle to hive, I'd find it > impossible with NUMBER type columns. By default oracle NUMBER gives 38 bits, >
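The "38 digits fit in 127 bits" claim above is a quick arithmetic check (added here for illustration, not part of the original reply):

```python
# Largest value a 38-digit decimal column can hold.
max_38_digits = 10**38 - 1
bits_needed = max_38_digits.bit_length()
print(bits_needed)  # 127
# 2**126 < 10**38 - 1 < 2**127, so 127 bits suffice for the magnitude.
assert 2**126 < max_38_digits < 2**127
```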

Hive and Tez User Meetup

2014-06-03 Thread Owen O'Malley
We have a Hive and Tez User Meetup scheduled for Thursday afternoon in San Jose. The meetup is open to everyone. There is lots of work being done on Hive, come join us! http://www.meetup.com/Hadoop-Summit-Community-San-Jose/events/179084702/ Currently the schedule includes: * Hive 0.13 releas

Re: hive-exec shaded jar a bad idea?

2013-11-21 Thread Owen O'Malley
I think we should create a ql jar that contains the ql code and no dependencies. We can still make an exec jar that bundles the dependencies. On Thu, Nov 21, 2013 at 6:50 AM, Edward Capriolo wrote: > That is a good idea. I have also considered jar jar, as well as stripping > these things from hi

Re: hive-exec shaded jar a bad idea?

2013-11-21 Thread Owen O'Malley
There is already a jira for it: https://issues.apache.org/jira/browse/HIVE-5725 On Thu, Nov 21, 2013 at 10:25 AM, Owen O'Malley wrote: > I think we should create a ql jar that contains the ql code and no > dependencies. We can still make an exec jar that bundles the dependencies. &

Re: RLE in hive ORC

2013-11-11 Thread Owen O'Malley
Hi, The RLE in ORC is a tradeoff (as is all compression) between tight representations for commonly occurring patterns and longer representations for rarely occurring patterns. The question at hand is how to use the bits available to reduce the average size of the column. In Hive 0.12, ORC gained
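The tradeoff described above can be seen in a minimal run-length encoder. This is a hypothetical sketch, not ORC's actual RLEv2 (which also does deltas, bit-packing, and patched base): runs of identical values collapse to (count, value) pairs, so common patterns shrink while unrepeated values pay a small per-value overhead.

```python
def rle_encode(values):
    """Collapse runs of identical values into [count, value] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][1] == v:
            encoded[-1][0] += 1          # extend the current run
        else:
            encoded.append([1, v])       # start a new run
    return encoded

def rle_decode(encoded):
    """Expand [count, value] pairs back into the original sequence."""
    return [v for count, v in encoded for _ in range(count)]

data = [7, 7, 7, 7, 1, 2, 2]
print(rle_encode(data))  # [[4, 7], [1, 1], [2, 2]]
```

A long run of one value (common) encodes in two numbers; a stretch of distinct values (rare, in columnar data) actually grows, which is exactly the bits-budget tradeoff the reply describes.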

Re: Hive 0.11.0 | Issue with ORC Tables

2013-09-19 Thread Owen O'Malley
On Thu, Sep 19, 2013 at 5:04 AM, Savant, Keshav < keshav.c.sav...@fisglobal.com> wrote: > Hi All, > > We have setup apache “hive 0.11.0” services on Hadoop cluster (apache > version 0.20.203.0). Hive is showing expected results when tables are > stored as *TextFile*. > > Where

Re: Question for ORCFileFormat

2013-09-11 Thread Owen O'Malley
The easiest way to use it is to use HCatalog, which enables you to read or write ORC files from MapReduce or Pig. -- Owen On Mon, Sep 9, 2013 at 11:14 AM, Saptarshi Guha wrote: > Hello, > > Are there any examples of writing using ORC as aFileOutputFormat (and then > as a FileInputFormat) in Map

Re: ORC vs TEXT file

2013-08-12 Thread Owen O'Malley
's3://test/textfile/'; > Using block level compression and bzip2codec for output. > > b) With the above set of columns, just i have changed as STORED AS ORC for > creating ORC. Not using any compression option > > c)Inserted 7256852 records in both the tables > > d)Sp

Re: ORC vs TEXT file

2013-08-12 Thread Owen O'Malley
Pandees, I've never seen a table that was larger with ORC than with text. Can you share your text file's schema with us? Is the table very small? How many rows and GB are the tables? The overhead for ORC is typically small, but as Ed says it is possible in rare cases for the overhead to dominate

Re: Hortonworks HDP 1.3 vs. HDP 1.1

2013-07-04 Thread Owen O'Malley
For HDP specific questions, you should use the Hortonworks lists: http://hortonworks.com/community/forums/forum/hive/ Your question is about the difference between Hive 0.9 and Hive 0.11. The big additions are: Decimal type ORC files Analytics functions - cube roll up Windowing functions

Re: Partition performance

2013-07-03 Thread Owen O'Malley
On Wed, Jul 3, 2013 at 5:19 AM, David Morel wrote: > > That is still not really answering the question, which is: why is it slower > to run a query on a heavily partitioned table than it is on the same number > of files in a less heavily partitioned table. > According to Gopal's investigations i

Re: Partition performance

2013-07-02 Thread Owen O'Malley
On Tue, Jul 2, 2013 at 2:34 AM, Peter Marron < peter.mar...@trilliumsoftware.com> wrote: > Hi Owen, > > I’m curious about this advice about partitioning. Is there some > fundamental reason why Hive > > is slow when the number of partitions is 10,000 rather than 1,000? > The pre

Re: OrcFile writing failing with multiple threads

2013-06-11 Thread Owen O'Malley
ow(); > } > > Am I missing something here about the synchronized(this) ? Perhaps I am > looking in the wrong place. > > Thanks, > agp > > > From: Owen O'Malley > Reply-To: "user@hive.apache.org" > Date: Friday, May 24, 2013 2:15 PM > T

Re: Create table like with partitions

2013-06-10 Thread Owen O'Malley
You need to create the partitioned table and then copy the rows into it. create table foo_staging (x int, y int); create table foo (x int) partitioned by (y int) clustered by (x) into 16 buckets; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.enforc

Re: Using JSON Data with Hive

2013-06-07 Thread Owen O'Malley
You might look at Russell's blog about using JSON in Hive: http://hortonworks.com/blog/discovering-hive-schema-in-collections-of-json-documents/ On Fri, Jun 7, 2013 at 8:24 AM, Michael Duergner | Pockets United GmbH < mich...@pocketsunited.com> wrote: > Hi there, > > I'm looking if we can use H

Re: Accessing Table Properies from InputFormat

2013-05-28 Thread Owen O'Malley
On Tue, May 28, 2013 at 9:27 AM, Edward Capriolo wrote: > The question we are diving into is how much of hive is going to be > designed around edge cases? Hive really was not made for columnar formats, > or self describing data-types. For the most part it handles them fairly > well. > I don't vie

Re: Accessing Table Properies from InputFormat

2013-05-28 Thread Owen O'Malley
On Tue, May 28, 2013 at 8:45 AM, Edward Capriolo wrote: > That does not really make sense. Your breaking the layered approache. > InputFormats read/write data, serdes interpret data based on the table > definition. its like asking "Why can't my input format run assembly code?" > The current model

Re: Accessing Table Properies from InputFormat

2013-05-28 Thread Owen O'Malley
On Tue, May 28, 2013 at 7:59 AM, Peter Marron < peter.mar...@trilliumsoftware.com> wrote: > Hi, > > ** ** > > Hive 0.10.0 over Hadoop 1.0.4. > > ** ** > > Further to my filtering questions of before. > > I would like to be able to access the table properties from inside my > custom In

Re: OrcFile writing failing with multiple threads

2013-05-24 Thread Owen O'Malley
Currently, ORC writers, like the Java collections API, don't lock themselves. You should synchronize on the writer before adding a row. I'm open to making the writers synchronized. -- Owen On Fri, May 24, 2013 at 11:39 AM, Andrew Psaltis < andrew.psal...@webtrends.com> wrote: > All, > I have a
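The advice above — serialize access to a non-thread-safe writer yourself — can be sketched with a hypothetical Python analogue (the original discussion is about the Java ORC writer; the class names here are invented for illustration):

```python
import threading

class ListWriter:
    """Stand-in for a non-thread-safe writer such as an ORC Writer."""
    def __init__(self):
        self.rows = []

    def add_row(self, row):
        self.rows.append(row)

class SynchronizedWriter:
    """Wrap a writer so only one thread is inside add_row() at a time,
    mirroring 'synchronize on the writer before adding a row'."""
    def __init__(self, writer):
        self._writer = writer
        self._lock = threading.Lock()

    def add_row(self, row):
        with self._lock:             # the caller-side synchronization
            self._writer.add_row(row)
```

With four threads each adding 100 rows through the wrapper, all 400 rows land in the underlying writer without interleaved corruption.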

Re: Filtering

2013-05-19 Thread Owen O'Malley
On Sun, May 19, 2013 at 3:11 PM, Peter Marron < peter.mar...@trilliumsoftware.com> wrote: > Hi Owen, > > Firstly I want to say a huge thank you. You have really helped me > enormously. > You're welcome. > > OK. I think that I get it now. In my custom InputFormat I can read

[ANNOUNCE] Apache Hive 0.11.0 Released

2013-05-16 Thread Owen O'Malley
The Apache Hive team is proud to announce the release of Apache Hive version 0.11.0. The Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop, it provides: * Tools to enable easy data extract/transf

Re: Filtering

2013-05-15 Thread Owen O'Malley
On Wed, May 15, 2013 at 3:38 AM, Peter Marron < peter.mar...@trilliumsoftware.com> wrote: > Hi, > > I’m using Hive 0.10.0 and Hadoop 1.0.4. > > I would like to create a normal table but have some of my code run so that > I can remove filtering > > parts of the quer

Re: HADOOP and Query Capabilities

2013-05-13 Thread Owen O'Malley
On Mon, May 13, 2013 at 9:34 AM, Nalin Khosla wrote: > Had a quick question wrt to querying HADOOP data; > > 1. What tools are available to Query Hadoop data in real time vs batch? > The line between real time and batch isn't that clear. We are working on substantially speeding up the performance

Re: Trying to write a custom HiveOutputFormat

2013-05-13 Thread Owen O'Malley
You could also look at the OrcSerde and how it works. https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java Basically, OrcSerde on "serialize" just wraps the row and object inspector in a fake writable. That is passed down to the OutputFormat. On "de

Re: Who is the hive admin user?

2013-05-10 Thread Owen O'Malley
Unfortunately, the roles in Hive are advisory only. Effectively everyone is an admin who can grant anyone (including themselves) additional permissions. If you need security, the best option is to protect the HDFS directories that the data is stored in. Set the HDFS owner, group, and permissions s

Re: complex types and ORC

2013-05-03 Thread Owen O'Malley
On Mon, Apr 29, 2013 at 4:26 PM, Sean McNamara wrote: > If I create a table that has a map field, will ORC files > columnarize by the keys in the map? Or will all the pairs in the map be > grouped together? > It will break the map keys into one sub-column and the map values into a separate sub-
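The sub-column split described in the answer can be illustrated with a small sketch — hypothetical Python, not ORC's actual layout code: each map contributes its keys to one stream, its values to another, plus per-row lengths so rows can be reassembled.

```python
def columnarize_maps(rows):
    """Split map-typed rows into a keys sub-column, a values sub-column,
    and per-row entry counts, in the spirit of ORC's map encoding."""
    keys, values, lengths = [], [], []
    for m in rows:
        lengths.append(len(m))   # how many entries this row's map holds
        keys.extend(m.keys())
        values.extend(m.values())
    return keys, values, lengths

rows = [{"a": 1, "b": 2}, {"c": 3}]
print(columnarize_maps(rows))  # (['a', 'b', 'c'], [1, 2, 3], [2, 1])
```

All keys end up contiguous (and likewise all values), which is what lets a columnar reader compress and scan each stream independently.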

Re: ORC with Map Column Type using Hive 0.11.0-RC1

2013-05-03 Thread Owen O'Malley
On Fri, May 3, 2013 at 10:20 AM, Andrew Psaltis < andrew.psal...@webtrends.com> wrote: > Hello, > I am trying to evaluate Hive 0.11.0-RC1, in particular I am very > interested in the ORC storage mechanism. We have a need to have one column > be a Map in a table and from what I have read this is >

Re: builtins submodule - is it still needed?

2013-04-05 Thread Owen O'Malley
+1 to removing them. We have a Rot13 example in ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13{In,Out}putFormat.java anyways. *smile* -- Owen On Fri, Apr 5, 2013 at 3:11 PM, Gunther Hagleitner < ghagleit...@hortonworks.com> wrote: > +1 > > I would actually go a step further and propose to

Re: Partition performance

2013-04-04 Thread Owen O'Malley
See slide #9 from my Optimizing Hive Queries talk http://www.slideshare.net/oom65/optimize-hivequeriespptx . Certainly, we will improve it, but for now you are much better off with 1,000 partitions than 10,000. -- Owen On Thu, Apr 4, 2013 at 4:21 PM, Ramki Palle wrote: > Is it possible for you

Re: Optimizing hive queries

2013-03-28 Thread Owen O'Malley
people partition (state='nv'); You'll end up with the first partition with 2 columns (and thus implicitly the third one is null) and the second partition with 3 columns. -- Owen > > I tried searching but could not find any example. > > Thanks in advance for your help. &

Re: Optimizing hive queries

2013-03-28 Thread Owen O'Malley
Actually, Hive already has the ability to have different schemas for different partitions. (Although of course it would be nice to have the alter table be more flexible!) The "versioned metadata" means that the ORC file's metadata is stored in ProtoBufs so that we can add (or remove) fields to the

Re: Security for Hive

2013-02-23 Thread Owen O'Malley
Correct, you'll need to manage the permissions manually in HDFS. The authorization model in Hive is just to prevent accidents. Hopefully, we'll address this eventually, but in the mean time it is strongly encouraged to set the permissions of your databases and tables in hdfs to the desired permissi

Re: ROW_NUMBER() equivalent in Hive

2013-02-21 Thread Owen O'Malley
What are the semantics for ROW_NUMBER? Is it a global row number? Per a partition? Per a bucket? -- Owen On Wed, Feb 20, 2013 at 11:33 PM, kumar mr wrote: > Hi, > > This is Kumar, and this is my first question in this group. > > I have a requirement to implement ROW_NUMBER() from Teradata in

Re: hive - snappy and sequence file vs RC file

2012-06-26 Thread Owen O'Malley
SequenceFile compared to RCFile: * More widely deployed. * Available from MapReduce and Pig * Doesn't compress as small (in RCFile all of each columns values are put together) * Uncompresses and deserializes all of the columns, even if you are only reading a few In either case, for long te

Re: Hive in IntelliJ

2012-05-31 Thread Owen O'Malley
On Thu, May 31, 2012 at 9:35 AM, Edward Capriolo wrote: > Hive is in maven I meant that Hive is built with Ant. Intellij has better support for importing projects built by Maven. Hive's jars are published in Maven central, but that is a different thing. -- Owen

Re: Hive in IntelliJ

2012-05-31 Thread Owen O'Malley
On Thu, May 31, 2012 at 12:45 AM, Lars Francke wrote: > Hi, > > has anyone managed to get Hive properly set up in IntelliJ? I've tried > but so far I've failed to get it running with Ivy and its > dependencies. > I managed, but it wasn't easy. I let it do the original import, but then had to fix

Re: What's the right data storage/representation?

2012-05-15 Thread Owen O'Malley
On Tue, May 15, 2012 at 5:11 AM, Jon Palmer wrote: > I can see a few potential solutions: > > 1. Don’t solve it. Accept that you have some artifacts in your > reporting data that cannot be recovered from the source data. > > 2. Create status and location history tables in the applicati

Re: using the key from a SequenceFile

2012-04-19 Thread Owen O'Malley
On Thu, Apr 19, 2012 at 3:07 AM, Ruben de Vries wrote: > I’m trying to migrate a part of our current hadoop jobs from normal > mapreduce jobs to hive, > > Previously the data was stored in sequencefiles with the keys containing > valuable data! I think you'll want to define your table using a cu

Re: Why the name HIVE?

2012-02-26 Thread Owen O'Malley
On Sat, Feb 25, 2012 at 3:14 PM, Chandan B.K wrote: > > Why the name HIVE? I wasn't involved in naming it, but I would guess that it has to do with the metaphor of bees as distributed workers gathering data/pollen, storing it in their hive, and making it useful. -- Owen