Re: Hive on Spark - Mesos

2016-09-15 Thread John Omernik
age or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 15 September 2016 a

Hive on Spark - Mesos

2016-09-15 Thread John Omernik
Hey all, I was experimenting with some bleeding edge Hive (2.1) and trying to get it to run on bleeding edge Spark (2.0). Spark is working fine, I can query the data, and all is set up; however, I can't get Hive on Spark to work. I understand it's not really a thing (Hive on Spark on Mesos) but I am t

Re: Re: [VOTE] Hive 2.0 release plan

2015-11-30 Thread John Omernik
Agreed, any plans for Hive 1.3? Will Hive 2.0 be a breaking release for those running 1.x? On Sun, Nov 15, 2015 at 7:07 PM, Wangwenli wrote: > Good News, *Any release plan for hive 1.3* ??? > > -- > Wangwenli > > > *From:* Gopal Vijayaraghavan > *Date:* 2015-1

Hive on Spark on Mesos

2015-09-09 Thread John Omernik
In the docs for Hive on Spark, it appears to have instructions only for Yarn. Will there be instructions or the ability to run hive on spark with Mesos implementations of spark? Is it possible now and just not documented? What are the issues in running it this way? John

Parquet Files in Hive - Settings

2015-08-18 Thread John Omernik
Is there a good writeup on which settings can be tweaked in Hive when writing Parquet files? For example, in some obscure pages I've found settings like parquet.compression, parquet.dictionary.page.size and parquet.enable.dictionary, but they were in reference to stock mapr

Re: Hive 1.0 vs. 0.15

2015-02-09 Thread John Omernik
been-released/ > > On Mon, Feb 9, 2015 at 2:52 PM, John Omernik wrote: >> >> Can you point me to the blog, I didn't see that in the official onlist >> announce email, and probably need to bookmark said blogs if they are >> valuable. >> >> Thanks! >>

Re: Hive 1.0 vs. 0.15

2015-02-09 Thread John Omernik
Can you point me to the blog, I didn't see that in the official onlist announce email, and probably need to bookmark said blogs if they are valuable. Thanks! On Mon, Feb 9, 2015 at 1:46 PM, DU DU wrote: > as said in the blog, 0.15.0 maps to hive 1.1.0 > > On Mon, Feb 9, 2015 at

Hive 1.0 vs. 0.15

2015-02-09 Thread John Omernik
Hey all, I was monitoring a specific JIRA for Hive (https://issues.apache.org/jira/browse/HIVE-7073) and saw that it was resolved in Hive 0.15. But now with the Hive 1.0.0 release after Hive 0.14, I am confused about where the Binary column support for Parquet will be moved into mainline. Thoughts?

Re: Running hive inside a bash script

2014-12-02 Thread John Omernik
That's not what I've found: $ hive -e "show tables" table1 table2 $ echo $? 0 $ hive -e "show partitions notable" FAILED: SemanticException [Error 10001]: Table not found notable $ echo $? 17 In a bash script: hive -e "show partitions notable" hiveresult=`echo $?` if [ $hiveresult -ne 0 ]; t
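The exit-status check described in the message above can be sketched outside bash as well. This is a minimal sketch, not the poster's script: `run_and_check` is a hypothetical helper, and a small child process that exits with status 17 stands in for `hive -e "show partitions notable"` so the example runs without a Hive installation.

```python
import subprocess
import sys


def run_and_check(cmd):
    """Run cmd (a list of arguments) and return (exit_code, succeeded)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.returncode == 0


# Stand-in for the failing hive command: a child process that exits with 17.
failing = [sys.executable, "-c", "import sys; sys.exit(17)"]

code, ok = run_and_check(failing)
if not ok:
    print(f"command failed with exit code {code}")
```

In a real wrapper the argument list would be the actual command line; the point is only that a non-zero return code is visible to the caller and can gate later steps.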

Fwd: Files Per Partition Causing Slowness

2014-12-02 Thread John Omernik
-- Forwarded message -- From: John Omernik Date: Tue, Dec 2, 2014 at 1:58 PM Subject: Re: Files Per Partition Causing Slowness To: user@hive.apache.org Thank you Edward, I knew the number of partitions mattered, but I didn't think 1000 would be too much. However, I d

Files Per Partition Causing Slowness

2014-12-02 Thread John Omernik
I am running Hive 0.12 in production, I have a table that has 1100 partitions, (flat, no multi level partitions) and in those partitions some have a small number of files (5-10) and others have quite a few files (up to 120). The total table size is not "huge" around 285 GB. While this is not ter

Re: bug in hive

2014-09-22 Thread John Omernik
Shushant - What I believe Stephen is sarcastically trying to say is that some organizational education may be in order here. Hive itself is not even at version 1.0, those of us who use Hive in production know this, and have to accept that there will be bugs like the one you are trying to addr

Re: Weird Error on Inserting in Table [ORC, MESOS, HIVE]

2014-09-09 Thread John Omernik
/browse/HIVE-7140 On Tue, Sep 9, 2014 at 1:00 PM, John Omernik wrote: > I ran with debug logging, and this is interesting, there was a loss of > connection to the metastore client RIGHT before the partition mention > above... as data was looking to be moved around... I wonder if the timing

Re: Weird Error on Inserting in Table [ORC, MESOS, HIVE]

2014-09-09 Thread John Omernik
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) On Tue, Sep 9, 2014 at 11:02 AM, John Omernik wrote: > I am doing a dynamic partition load in Hive 0.13 using ORC files. This has > always worked in the past both with MapReduce V1 and YARN. I am working > with Mesos now, a

Weird Error on Inserting in Table [ORC, MESOS, HIVE]

2014-09-09 Thread John Omernik
I am doing a dynamic partition load in Hive 0.13 using ORC files. This has always worked in the past both with MapReduce V1 and YARN. I am working with Mesos now, and trying to trouble shoot this weird error: Failed with exception AlreadyExistsException(message:Partition already exists What's

Parquet Binary Column Support

2014-09-06 Thread John Omernik
Greetings all - We really want to look into the Parquet file format more, however, without supporting all the Hive Column types, we are hesitant to dive in more. Currently, it looks like it's just the BINARY column type (which I use) based on the JIRA below, there hasn't been any movement in on t

Count(distinct col) in Windowing

2014-05-16 Thread John Omernik
Is there a reason why I can't use select col1, col2, count(distinct col3) over (PARTITION by col4 order by col5 ROWS BETWEEN 5 PRECEDING AND FOLLOWING) as col1 from table ? I am trying to see for any given window if there is a lot of variability in a col4, and it just doesn't work with count dis

Re: ORC file in Hive 0.13 throws Java heap space error

2014-05-16 Thread John Omernik
When I created the table, I had to reduce the orc.compress.size quite a bit to make my table with many columns work. This was on Hive 0.12 (I thought it was supposed to be fixed on Hive 0.13, but 3k+ columns is huge) The default of orc.compress.size is quite a bit larger (I think in the 268k range)

Re: Hive 0.12 ORC Heap Issues on Write

2014-04-28 Thread John Omernik
ived this communication in error, please contact the sender immediately > and delete it from your system. Thank You. > > Thanks > Prasanth Jayachandran > > On Apr 27, 2014, at 3:06 PM, John Omernik wrote: > > So one more follow-up: > > The 16-.25-Success turns to a fail if I

Re: Hive 0.12 ORC Heap Issues on Write

2014-04-27 Thread John Omernik
So one more follow-up: The 16-.25-Success turns to a fail if I throw more data (and hence more partitions) at the problem. Could there be some sort of issue that rears its head based on the number of output dynamic partitions? Thanks all! On Sun, Apr 27, 2014 at 3:33 PM, John Omernik

Re: Hive 0.12 ORC Heap Issues on Write

2014-04-27 Thread John Omernik
. lucky? Or are they related? 5. Is there a better approach I can take on this? 6. Any other variables I could look at? On Sun, Apr 27, 2014 at 11:56 AM, John Omernik wrote: > Hello all, > > I am working with Hive 0.12 right now on YARN. When I am writing a table > that is adm

Hive 0.12 ORC Heap Issues on Write

2014-04-27 Thread John Omernik
Hello all, I am working with Hive 0.12 right now on YARN. When I am writing a table that is admittedly quite "wide" (there are lots of columns, near 60, including one binary field that can get quite large). Some tasks will fail on ORC file write with Java Heap Space Issues. I have confirmed th

HDFS Storage Locations and Hive

2014-03-03 Thread John Omernik
Given the direction HDFS is going with Storage locations as identified in https://issues.apache.org/jira/browse/HDFS-2832 and https://issues.apache.org/jira/secure/attachment/12597860/20130813-HeterogeneousStorage.pdf Is now the right time to toss out some suggestions for the Hive project on in

Re: What are all the factors that go into the number of mappers - ORC

2014-02-03 Thread John Omernik
) hive.min.split.size > 3) hive.max.split.size > 4) total size on disk for the table > > Thanks > Prasanth Jayachandran > > On Feb 2, 2014, at 5:25 PM, John Omernik wrote: > > > I have two clusters, but small dev clusters, and I loaded the same > dataset into both of them.

What are all the factors that go into the number of mappers - ORC

2014-02-02 Thread John Omernik
I have two clusters, but small dev clusters, and I loaded the same dataset into both of them. The data size on disk is within 2000 Bytes. Both are ORC, one is Hive 11 and one is Hive 12. One is allocating about 8 more mappers to the exact same query. I am just curious what settings would change

Survey: What do you use to interface with Hive?

2013-12-04 Thread John Omernik
This can be an interesting subject, I know orgs that are all over on this question. I'd be interested in hearing what you use, how it works for you, and what you are wishing you had in your interface. I'll start: We've used a number of things: - CLI for scheduled jobs. Pros: Solid running, fair

Re: Table creation for logfile data

2013-11-24 Thread John Omernik
ng date in the > form 22/11/13 which is dd/mm/yy, I have to rearrange this to /mm/dd, > can you please shed some light on this. I think we need to use split() to > get the tokens and then rearrange, but I am not able to think of an > efficient way to do this. > > Thanks.

Re: Table creation for logfile data

2013-11-24 Thread John Omernik
Put the logfile into a location on HDFS, and create an external table pointing to that location. The External table should just have one column, a string, CREATE EXTERNAL TABLE logfile_etl (message STRING) LOCATION '/etl/logfile' I think that should work. Then Create another table CREATE TABLE lo

Is this a Bug in from_utc_timestamp?

2013-11-19 Thread John Omernik
There are some discussions on this https://issues.apache.org/jira/browse/HIVE-3822 However, one person is stating there is not an issue with timestamp, thus I am asking this question: is this a bug in from_utc_timestamp? Example: I have a column starttime with the value 1384495201 in it (it's st
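As a reference point for the question above, the quoted value 1384495201 can be converted outside Hive. This is a minimal sketch using only the Python standard library; the fixed UTC-6 offset is an assumption chosen for illustration and does not come from the message, and nothing here reproduces what from_utc_timestamp itself returns.

```python
from datetime import datetime, timedelta, timezone

epoch_seconds = 1384495201  # the starttime value quoted in the message

# Interpret the value as seconds since 1970-01-01 00:00:00 UTC.
utc_time = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Assumption for illustration only: a fixed UTC-6 offset.
assumed_zone = timezone(timedelta(hours=-6))
local_time = utc_time.astimezone(assumed_zone)

print(utc_time.strftime("%Y-%m-%d %H:%M:%S"))    # 2013-11-15 06:00:01
print(local_time.strftime("%Y-%m-%d %H:%M:%S"))  # 2013-11-15 00:00:01
```

Comparing a Hive result against an independent conversion like this is one way to tell a function bug from a timezone-configuration surprise.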

Re: ORC Tuning - Examples?

2013-11-13 Thread John Omernik
udying file formats and it has some related > contents. Here is the link: > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-13-5.pdf > . > > Thanks, > > Yin > > > On Tue, Nov 12, 2013 at 8:51 PM, Lefty Leverenz > wrote: > >> If you get some usefu

ORC Tuning - Examples?

2013-11-12 Thread John Omernik
I am looking for guidance (read examples) on tuning ORC settings for my data. I see the documentation that shows the defaults, as well as a brief description of what it is. What I am looking for is some examples of things to try. *Note: I understand that nobody wants to make sweeping declaring o

Suggestion for Metastore Operations around ORC Files

2013-11-09 Thread John Omernik
I was testing out the conversion of a table to ORC. Using previous posts, I did alter table tablename set fileformat ORC; This worked great. All new partitions created were ORC, the RC and ORC files played nice next to each other. Then I had a hypothesis. I have tables that almost always have hi

ORC Files: Does this get me anything?

2013-10-16 Thread John Omernik
So I am experimenting with ORC files, and I have a fast little table that has login events. Out of curiosity, I was wondering, based on what we all know about ORC files, if I did the below, would the per file indexing get me anything? Now, before people complain about small files, let's toss that

Re: Bug in Hive Split function (Tested on Hive 0.9 and 0.11)

2013-10-09 Thread John Omernik
I opened a JIRA on this: https://issues.apache.org/jira/browse/HIVE-5506 On Wed, Oct 9, 2013 at 9:44 AM, John Omernik wrote: > Hello all, I think I have outlined a bug in the hive split function: > > Summary: When calling split on a string of data, it will only return all > ar

Bug in Hive Split function (Tested on Hive 0.9 and 0.11)

2013-10-09 Thread John Omernik
Hello all, I think I have outlined a bug in the hive split function: Summary: When calling split on a string of data, it will only return all array items if the last array item has a value. For example, if I have a string of text delimited by tab with 7 columns, and the first four are filled,
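The behaviour described above can be mimicked in a few lines. This is a sketch under a stated assumption: that the Hive versions in question delegate to Java's `String.split(regex)`, which discards trailing empty strings. `split_dropping_trailing_empties` is a hypothetical helper that approximates that behaviour for the common case; Python's own `str.split` keeps every field.

```python
def split_dropping_trailing_empties(text, delimiter):
    """Split text, then drop trailing empty fields (approximates the reported behaviour)."""
    parts = text.split(delimiter)
    while parts and parts[-1] == "":
        parts.pop()
    return parts


# Seven tab-delimited columns, only the first four filled.
row = "a\tb\tc\td\t\t\t"

print(len(row.split("\t")))                             # 7
print(split_dropping_trailing_empties(row, "\t"))       # ['a', 'b', 'c', 'd']
print(split_dropping_trailing_empties("a\tb\t\t\t\t\tg", "\t"))  # all 7 fields kept
```

When the last column has a value, nothing is trimmed, which matches the symptom in the report: the array length depends on whether the final item is filled.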

Re: Hive 0.11.0 | Issue with ORC Tables

2013-09-20 Thread John Omernik
Another advantage to the method described by Owen is your process of creating the ORC file is distributed. (Rather than precreating the ORC file off cluster and then moving into the cluster). This way, you just push your text files into the cluster, do the select statement and push into the ORC t

Re: Optimizing ORC Sorting - Replace two level Partitions with one?

2013-08-10 Thread John Omernik
number of files. Because you're hashing the value into a bucket. > > A query scanning many partitions and files is needlessly slow from MR > overhead. > > > On Sat, Aug 10, 2013 at 12:58 PM, John Omernik wrote: > >> One issue with the bucketing is that the number of sources o

Re: Optimizing ORC Sorting - Replace two level Partitions with one?

2013-08-10 Thread John Omernik
>> bucket a table into 10 buckets, select with where does not actually prune >> the input buckets so many queries scan all the buckets. >> >> >> On Sat, Aug 10, 2013 at 12:34 PM, Nitin Pawar wrote: >> >>> will bucketing help? if you know finite # partiot

Optimizing ORC Sorting - Replace two level Partitions with one?

2013-08-10 Thread John Omernik
I have a table that currently uses RC files and has two levels of partitions. day and source. The table is first partitioned by day, then within each day there are 6-15 source partitions. This makes for a lot of crazy partitions and was wondering if there'd be a way to optimize this with ORC fil

Re: RC -> ORC INSERT OVERWRITE metastore Heap Error

2013-08-10 Thread John Omernik
er it (due to my own error) and not have other be able to learn from my mistakes. Sorry group! John On Sat, Aug 10, 2013 at 8:18 AM, John Omernik wrote: > I am doing some testing going from table_rc to table_orc . The > table/partition structure is the same, and there is a two level pa

RC -> ORC INSERT OVERWRITE metastore Heap Error

2013-08-10 Thread John Omernik
I am doing some testing going from table_rc to table_orc . The table/partition structure is the same, and there is a two level partition day= then source= I am doing a single day (including all 10 or so sources in the day). This worked just fine in one environment, but now, I am getting strange er

Re: Large Scale Table Reprocess

2013-07-26 Thread John Omernik
f ORC files, do you see ORC files changing significantly (i.e. to the point where we have to do another re process?) On Fri, Jul 26, 2013 at 5:09 PM, John Omernik wrote: > Can you give some examples of how to alter partitions for different input > types? I'd appreciate it :) > >

Re: Large Scale Table Reprocess

2013-07-26 Thread John Omernik
s avoids all > the issues you noted. And since most queries probably only access recent > data you'll see speed ups soon after the switch. > > Alan. > > On Jul 25, 2013, at 4:45 PM, John Omernik wrote: > > > Just finishing up testing with Hive 11 and ORC. Thank you

Large Scale Table Reprocess

2013-07-25 Thread John Omernik
Just finishing up testing with Hive 11 and ORC. Thank you to Owen and all those who have put hard work into this. Just ORC files, when compared to RC files in Hive 9, 10, and 11 saw a huge increase in performance, it was amazing. That said, now we gotta reprocess. We have a large table with lots

Java Courses for Scripters/Big Data Geeks

2013-07-17 Thread John Omernik
Hey all - I was wondering if there were any "shortcut" Java courses out there. As in, I am not looking for a holistic learn everything about Java course, but more of a "So you are a big data/hive geek and you get Python/Perl pretty well, but when you try to understand Java your head explodes and

Re: Difference between like %A% and %a%

2013-05-24 Thread John Omernik
I have mentioned this before, and I think this is a big miss by the Hive team. Like, by default in many SQL RDBMS (like MSSQL or MYSQL) is not case sensitive. Thus when you have new users moving over to Hive, if they see a command like "like" they will assume similarity (like many other SQL like qua

Re: Hive Authorization and Views

2013-05-16 Thread John Omernik
permissions, RDBMS have column and sometimes row > level permissions. > > When you physically have access to the underlying file (row level) > permissions are not enforceable. The only way to enforce this type of > security is to force users through a "turnstile" that changes

Re: Hive Authorization and Views

2013-05-16 Thread John Omernik
I am curious on the thoughts of the community here, this seems like something many enterprises would drool over with Hive... I am not a coder so the level of coding involved in something like this is unknown. On Sat, May 4, 2013 at 8:31 AM, John Omernik wrote: > We were doing some tests this p

Re: HIVE-3979 in Hive 0.11

2013-05-06 Thread John Omernik
Bummer, ok thank you for fixing the release notes. :) On Mon, May 6, 2013 at 12:43 AM, Carl Steinbach wrote: > Hi John, > > This is a mistake in the release notes. It will be fixed in the next 0.11 > release candidate. > > Thanks. > > Carl > > > On Sat, May

Hive Authorization and Views

2013-05-04 Thread John Omernik
We were doing some tests this past week with hive authorization, one of our current use "challenges" is when we have an underlying, well managed and partitioned table, and we want to allow access to certain columns in that table. Our first thoughts went to VIEWs as that's a common use case with Re

HIVE-3979 in Hive 0.11

2013-05-04 Thread John Omernik
I see in the release notes for HIVE-3979 [HIVE-3979] - Provide syntax for unescaped regex on rlike, and other regexp_* functions Yet when I click on that JIRA there are no notes etc. Could it be that this was included by mistake? I am curious,

Re: Upgrade from Hive 0.9 to Hive 0.10 Heap Error on show tables;

2013-04-03 Thread John Omernik
* I did read some stuff about these settings changing in 0.10 On Wed, Apr 3, 2013 at 6:59 PM, Richard Nadeau wrote: > Hi John, > > Do you have a copy of the MySQL JDBC driver in your Hive library path? > > Rick > On Apr 3, 2013 3:57 PM, "John Omernik" wrote: > >&

Upgrade from Hive 0.9 to Hive 0.10 Heap Error on show tables;

2013-04-03 Thread John Omernik
Not sure what the issue is, conf is good, validated I can log in to mysql with username in the hive-site, and I ran the metastore update scripts. show tables; java.lang.OutOfMemoryError: Java heap space at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353) at org

Re: Books and good starting point for Hive

2013-02-24 Thread John Omernik
Hello William - Dean Wampler posts quite often on this list and has done (to my eye) a great job of separating his business (he and other authors have written a Hive book) from the community aspect of (he participates freely on the list without a lot of self promotion). Therefore, I will give him

Re: CHAN (Comprehensive Hive Archive Network) (Was Re: Using Reflect: A thread for ideas)

2013-02-14 Thread John Omernik
available to the > wider community. > > Is there interest in the user community for something like this? > > Robin > > From: John Omernik > Reply-To: "user@hive.apache.org" > Date: Wednesday, February 13, 2013 8:38 PM > To: "user@hive.apache.org"

Using Reflect: A thread for ideas

2013-02-13 Thread John Omernik
I stumbled across the little documented reflect function today. I've always known about it, but java scares me if it's not in a cup so I didn't dig. Well today I dug, and found an awesome use case for reflect (for me) and wanted to share. I also thought it would be nice to validate some thoughts

Re: Union in Multi Insert

2013-02-12 Thread John Omernik
> insert overwrite... > > > And, if I misunderstood your problem, my apologies. If you could provide > an example with sample data and expected output, that might be helpful. > > > Mark > > On Mon, Feb 11, 2013 at 7:34 PM, John Omernik wrote: > >> I

Union in Multi Insert

2013-02-11 Thread John Omernik
I am trying to do a union, group by, and multi insert all at once. I know this is convoluted, but what I am trying to do is avoid having to scan through the original table more than once... if I can get all my data from two columns that I want to pull together, in one round of mappers, I win... Basi

Re: Combine multiple row values based upon a condition.

2013-02-03 Thread John Omernik
INATED BY '\n' > > STORED AS TEXTFILE > > LOCATION '/research/45924/hive/entities_extract'; > > > > LOAD DATA LOCAL INPATH > > '/home/researcher/hadoop-runnables/files/entitie_extract_by_doc.txt' > > OVERWRITE INTO TABLE entities_extract

Re: Combine multiple row values based upon a condition.

2013-02-03 Thread John Omernik
nssen, [0, 48] > > I hope this clarifies my question. > If things are still unclear please don't hesitate to ask me to clarify my > question further. > > Kind regards, > Martijn > > On Feb 3, 2013, at 1:05 PM, John Omernik wrote: > > Well there are some methods tha

Re: Combine multiple row values based upon a condition.

2013-02-03 Thread John Omernik
Well there are some methods that may work, but I'd have to understand your data and your constraints more. You want to be able to (As it sounds) sort by offset, and then look at the one row, and then the next row, to determine if the two items should be joined. It "looks" like you are doing a

Re: The dreaded Heap Space Issue on a Transform

2013-01-30 Thread John Omernik
, and >> io.file.buffer.size? You should be able to adjust these and get past the >> heap issue. Be careful about how much ram you have though, and don't set them >> too high. >> >> Rick >> On Jan 30, 2013 8:55 AM, "John Omernik" wrote: >> >

Re: The dreaded Heap Space Issue on a Transform

2013-01-30 Thread John Omernik
hadoop nodes. > It's the heap within the process forked by the hadoop tasktracker, I think. > > Phil. > > > On 30 January 2013 14:28, John Omernik wrote: > >> So just a follow-up. I am less looking for specific troubleshooting on >> how to fix my problem, and more l

Re: The dreaded Heap Space Issue on a Transform

2013-01-30 Thread John Omernik
ckers are run etc. Thanks in advance! On Tue, Jan 29, 2013 at 7:43 AM, John Omernik wrote: > I am running a transform script that parses through a bunch of binary > data. In 99% of the cases it runs, it runs fine, but on certain files I get > a failure (as seen below). Funny thing i

The dreaded Heap Space Issue on a Transform

2013-01-29 Thread John Omernik
I am running a transform script that parses through a bunch of binary data. In 99% of the cases it runs, it runs fine, but on certain files I get a failure (as seen below). Funny thing is, I can run a job with "only" the problem source file, and it will work fine, but when as a group of files, I g

Re: Is this a known Bug: Multi Inserts from partitioned source ignore Where Clauses

2013-01-26 Thread John Omernik
structures were to see the OVERWRITE vs INTO they may see something different. On Sat, Jan 26, 2013 at 9:20 AM, Philip Tromans wrote: > This is a known (recently fixed) bug: > > https://issues.apache.org/jira/browse/HIVE-3699 > > Phil. > > > On 26 January 2013 15:17, John Om

Is this a known Bug: Multi Inserts from partitioned source ignore Where Clauses

2013-01-26 Thread John Omernik
I ran into an interesting bug. Basically, if your FROM() source is a partitioned table and you use a where clause that prunes, all of the INSERT HERE SELECT * WHERE x=y ignores each specified where clause. This does not occur if the source partition is not specified, but if the source as where par

Lateral View in sub query issue

2013-01-25 Thread John Omernik
Anyone else seeing this select col1, col2, col3, excol from sometablewithanarrayfield LATERAL VIEW explode(arcol) artab as excol Works just fine select col1, excol, count(1) as excount from ( select col1, col2, col3, excol from sometablewithanarrayfield LATERAL VIEW explode(arcol) artab as exc

Interaction between Java and Transform Scripts on Hive

2013-01-16 Thread John Omernik
I am perplexed if I run a transform script on a file by itself, it runs fine, outputs to standard out life is good. If I run the transform script on that same file (with the path and filename being passed into the script via transform so that the python script is doing the exact same thing) I get

Re: Timestamp, Epoch Time, Functions and other Frustrations

2013-01-05 Thread John Omernik
, whereas if the conversion is to a type such as timestamp which is by design timezoneless, we should not apply a timezone to it. (unless specified through the helper functions) I am open to seeing where I am looking at things wrong. On Fri, Jan 4, 2013 at 12:06 PM, John Omernik wrote: > So I r

Re: Timestamp, Epoch Time, Functions and other Frustrations

2013-01-04 Thread John Omernik
they expect milliseconds since the epoch instead of seconds. > > > > Brad. > > > > > > On 2013-01-04, at 8:03 AM, John Omernik wrote: > > > > Greetings all. I am getting frustrated with the documentation and lack of > > intuitiveness in Hive relating to timest
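The seconds-versus-milliseconds mix-up mentioned in the reply above is easy to see with a small sketch. The helper names and the sample value 1357315200 are hypothetical, chosen only for illustration; the code uses the Python standard library and makes no claim about which Hive functions expect which unit.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_seconds(value):
    """Interpret value as seconds since the epoch."""
    return EPOCH + timedelta(seconds=value)


def from_epoch_millis(value):
    """Interpret value as milliseconds since the epoch."""
    return EPOCH + timedelta(milliseconds=value)


ten_digit_value = 1357315200  # hypothetical 10-digit epoch value, in seconds

print(from_epoch_seconds(ten_digit_value))        # a date in 2013
print(from_epoch_millis(ten_digit_value))         # a date in January 1970
print(from_epoch_millis(ten_digit_value * 1000))  # scaled to millis: 2013 again
```

A result that lands in January 1970 is the usual sign that a seconds value was handed to something expecting milliseconds; multiplying by 1000 first restores the intended date.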

Re: Timestamp, Epoch Time, Functions and other Frustrations

2013-01-04 Thread John Omernik
ne without knowing a timezone or timezone offset. On Fri, Jan 4, 2013 at 10:03 AM, John Omernik wrote: > Greetings all. I am getting frustrated with the documentation and lack of > intuitiveness in Hive relating to timestamps and was hoping I could post > here and get some clarification or

Timestamp, Epoch Time, Functions and other Frustrations

2013-01-04 Thread John Omernik
Greetings all. I am getting frustrated with the documentation and lack of intuitiveness in Hive relating to timestamps and was hoping I could post here and get some clarification or other ideas. I have a field that is a string, but is actually a 10 digit int representation of epoch time, I am goin

Re: Running commands at hive cli or hive thrift startup

2012-12-10 Thread John Omernik
Will that work for my thrift server connections? On Sun, Dec 9, 2012 at 7:56 PM, विनोद सिंह wrote: > Put a .hiverc file in your home directory containing commands, Hive CLI > will execute all of them at startup. > > Thanks, > Vinod > > On Sun, Dec 9, 2012 at 10:25 PM

Running commands at hive cli or hive thrift startup

2012-12-09 Thread John Omernik
I am looking for ways to streamline some of my analytics. One thing I notice is that when I use hive cli, or connect to my hive thrift server, there are some commands I always end up running for my session. If I have multiple CLIs or connections to Thrift, then I have to run it each time. If I l

Re: BINARY column type

2012-12-02 Thread John Omernik
well. *shrug* On Sun, Dec 2, 2012 at 9:00 AM, Connell, Chuck wrote: > The hex idea is clever. But does this mean that the files you brought > into Hive (with a LOAD statement) were essentially ascii (hexed), not raw > binary? > > -- > *From:* John Om

Re: BINARY column type

2012-12-01 Thread John Omernik
e, and find a way to make this work. > > Chuck > > -- > *From:* John Omernik [j...@omernik.com] > *Sent:* Saturday, December 01, 2012 4:22 PM > *To:* user@hive.apache.org > *Subject:* Re: BINARY column type > > Hi Chuck - > > I've used binary columns with

Re: BINARY column type

2012-12-01 Thread John Omernik
Hi Chuck - I've used binary columns with Newlines in the data. I used RCFile format for my storage method. Works great so far. Whether or not this is "the" way to get data in, I use hexed data (my transform script outputs hex encoded) and the final insert into the table gets a unhex(sourcedata).
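The hex round trip described above can be sketched from the transform-script side. This is a minimal sketch, not the poster's script: `encode_field` and `decode_field` are hypothetical helpers, and `decode_field` only plays the role that `unhex(sourcedata)` plays in the final insert.

```python
import binascii


def encode_field(raw: bytes) -> str:
    """Hex-encode raw bytes so the field contains no tabs or newlines."""
    return binascii.hexlify(raw).decode("ascii").upper()


def decode_field(hexed: str) -> bytes:
    """Reverse the encoding; stands in for unhex() on the Hive side."""
    return binascii.unhexlify(hexed)


# Binary payload containing the delimiters that would break a text stream.
payload = b"line one\nline two\t\x00\xff"

hexed = encode_field(payload)
print(hexed)
print(decode_field(hexed) == payload)  # True
```

Because the encoded field uses only the characters 0-9 and A-F, embedded newlines or tabs in the original data cannot be mistaken for row or column delimiters while the rows pass through the script's output.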

Re: Hive table backed by a txt file on S3

2012-10-25 Thread John Omernik
Try putting the location to the directory the file is in. If there are other files you don't want to be included make a subdir. On Thu, Oct 25, 2012 at 6:26 AM, Nitin Pawar wrote: > In that case, it looks like when you do a select * .. its just a cat > operation. > > whats the error you are getti

Re: Writing Custom Serdes for Hive

2012-10-16 Thread John Omernik
ve the record reader > can get access to predicates. The code to access HBase from Hive needs it > for the same reasons as you would need with Mongo and might be a good place > to start. > > thanks, > Shrikanth > > On Oct 16, 2012, at 8:54 AM, John Omernik wrote: > > Th

Re: Writing Custom Serdes for Hive

2012-10-16 Thread John Omernik
ngiu… > > http://www.congiu.com/a-json-readwrite-serde-for-hive/ > > Chuck Connell > > Nuance R&D Data Team > > Burlington, MA > > *From:* John Omernik [mailto:j...@omernik.com] > *Sent:* T

Writing Custom Serdes for Hive

2012-10-16 Thread John Omernik
We have a maybe obvious question about a serde. When a serde is invoked, does it have access to the original hive query? Ideally the original query could provide the Serde some hints on how to access the data on the backend. Also, are there any good links/documentation on how to write Serdes? Kind

Hive Issues RCFile - Binary fields Corruption or field storage issues?

2012-10-15 Thread John Omernik
I am putting binary data into binary columns in hive and using RCFile. Most data is just fine in my very large table, however queries over certain time frames get me RCFile/Compression issues. The data goes in fine. Is this a FS level corruption issue? Is this something tunable? How would I even

Re: NEED HELP in Hive Query

2012-10-14 Thread John Omernik
select NAME, DATE, URL, SUM(HITCOUNT) as HITCOUNT from yourtable group by NAME, DATE, URL That's the HIVE answer. Not sure the PIG answer. On Sun, Oct 14, 2012 at 9:54 AM, yogesh dhari wrote: > Hi all, > > I have this file. I want this operation to perform in *HIVE & PIG* > > NAME

Custom Serde/Connector Null Pointer Exception

2012-10-13 Thread John Omernik
Greetings all. I am not sure if this is a hive issue or a custom serde issue. So I will ask in both places. I am trying to use the mongodb connection written by yc-huang. This could have great potential with our data. The link is here. https://github.com/yc-huang/Hive-mongo I followed instruct

Re: View Partition Pruning not Occurring during transform

2012-10-11 Thread John Omernik
own' as you would assume nesting froms > make the query happen in a specific way. > > > On Wednesday, October 10, 2012, John Omernik wrote: > > Agreed. That's the conclusion we came to as well. So it's less of a bug > and more of a feature request. I think one of t

Re: Book 'Programming Hive' from O'Reilly now available!

2012-10-11 Thread John Omernik
Read the book cover to cover. I feel like so many different areas have been filled in with my Hive knowledge. WONDERFUL book. One question: With RCFile, does Block compression with GZIP work like SequenceFiles where by using RCFILE + BLOCK + GZIP you actually get some of the benefits of splitabl

Re: View Partition Pruning not Occurring during transform

2012-10-10 Thread John Omernik
in the output of a > transform refer to specific columns in the input of a transform for > predicate push down purposes (and that such pushdown is legal for this > transformation) > > thanks, > Shrikanth > On Oct 10, 2012, at 12:04 PM, John Omernik wrote: > > > Greetings all

View Partition Pruning not Occurring during transform

2012-10-10 Thread John Omernik
Greetings all, I am trying to incorporate a TRANSFORM into a view (so we can abstract the transform script away from the user) As a Test, I have a table partitioned on day (in -MM-DD formatted) with lots of partitions and I tried this CREATE VIEW view_transform as Select TRANSFORM (day, ip)

Re: Hive File Sizes, Merging, and Splits

2012-09-25 Thread John Omernik
35 PM, Connell, Chuck wrote: > Why do you think the current generated code is inefficient? > > *From:* John Omernik [mailto:j...@omernik.com] > *Sent:* Tuesday, September 25, 2012 2:57 PM > *To:* user@hive.apache.org > *Subje

Hive File Sizes, Merging, and Splits

2012-09-25 Thread John Omernik
I am really struggling trying to make heads or tails out of how to optimize the data in my tables for best query times. I have a partition that is compressed (Gzip) RCFile data in two files total 421877 263715 -rwxr-xr-x 1 darkness darkness 270044140 2012-09-25 13:32 00_0 158162 -rwxr-xr-x 1

Hive Transform Scripts Ending Cleanly

2012-09-21 Thread John Omernik
Greetings All - I have a transform script that does some awesome stuff (at least to my eyes). Basically, here is the SQL: SELECT TRANSFORM (filename) USING 'worker.sh' as (col1, col2, col3, col4, col5) FROM mysource_filetable worker.sh is actually a wrapper script that looks like this:

Segfault in Python script during Transform

2012-09-17 Thread John Omernik
I am running a transform query in hive 0.9.0 and trying to figure out why my script segfaults while run as part of a hive job but not when run by itself. The library that is freaking out is libcrypto and is being called by the Python m2crypto module. That being said, I can run the same pytho

Hive job not distributing

2012-09-01 Thread John Omernik
I have a job that has lots of tiny map tasks that finish very fast (I think my max time was 9 seconds) I understand that I should change my input to avoid that... and it's difficult because these processes are using a transform script on binary data so it makes it difficult to pull off (Long story)

Force number of records per map task

2012-08-31 Thread John Omernik
This is going to sound very odd, but I am hoping to use a transform script in such a way that I pass a filepath to the transform script, which then reads the file and produces a bunch of rows in hive. In this case the data is pcaps. I have a location accessible to all nodes, and I want to have m

Re: Join with OR condition in hive

2012-08-29 Thread John Omernik
How do you join two tables that aren't represented in both sides of the =? Can you describe a bit more of what you are trying to get out of the data? I am having a hard time wrapping my head around this... On Wed, Aug 29, 2012 at 4:44 PM, sonia gehlot wrote: > Hi All, > > I am joining 2 table

Troubles with Heap Space Issues on Insert

2012-08-29 Thread John Omernik
I am running some data that isn't huge per se, but I am performing processing on it to get it into my final table (RCFile). One of the challenges is that it comes in large blocks of data; for example, I may have a 70MB chunk of binary data that I want to put in. My process that generates this data hexes

Hive 0.9 and Indexing

2012-07-26 Thread John Omernik
I am playing with Hive indexing and am a little discouraged by the gap between the potential seen and the amount of documentation around indexing. I am running Hive 0.9 and started playing with indexing as follows: I have a table logs that has a bunch of fields, but for this, let's say three. sessionut

Re: Searching for a string off a group by query

2012-07-17 Thread John Omernik
a set of > strings (ex: * *collect_set(msgBody)*)* that comes as a result of a group > by query. > > > On Tue, Jul 17, 2012 at 8:50 AM, John Omernik wrote: > >> Not sure what you are trying to do, but you may want to check out the >> array_contains function. Also, if

Re: Searching for a string off a group by query

2012-07-16 Thread John Omernik
Not sure what you are trying to do, but you may want to check out the array_contains function. Also, if you are using Hive 0.9 you can use the concat_ws() function. This is taken from a Google search: select concat_ws('.', array('www','apache','org')) from src limit 1; www.apache.org https://cwiki
