RegExSerde over JDBC... is it possible>

2013-12-30 Thread Jay Vyas
Hi folks.. Is there a JDBC or other API-driven, programmatic way to launch a Hive job using SerDes from the contrib library? I've been attempting to run a query using Hive's JDBC driver which uses the regex SerDe; however, it appears that Hive doesn't see this class. As I'm launching the job f

Re: ORC file tuning

2013-12-30 Thread Yin Huai
Hi Avrilia, In org.apache.hadoop.hive.ql.io.orc.WriterImpl, the block size is determined by Math.min(1.5GB, 2 * stripeSize). Also, you can use "orc.block.padding" in the table properties to control whether the writer pads HDFS blocks to prevent stripes from straddling blocks. The default value of
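To make the rule quoted above concrete, here is a minimal Python sketch of the Math.min(1.5GB, 2 * stripeSize) calculation. The function name orc_block_size is hypothetical, and reading "1.5GB" as 1.5 * 1024^3 bytes is an assumption; the actual constant in WriterImpl may be expressed differently.

```python
MB = 1024 ** 2
GB = 1024 ** 3

# Assumption: "1.5GB" in the post means 1.5 * 1024**3 bytes.
MAX_BLOCK_SIZE = 3 * GB // 2


def orc_block_size(stripe_size_bytes: int) -> int:
    """Return min(1.5 GB, 2 * stripe size), the rule described in the post."""
    return min(MAX_BLOCK_SIZE, 2 * stripe_size_bytes)


if __name__ == "__main__":
    # Print the resulting block size (in MB) for a few stripe sizes.
    for stripe in (64 * MB, 256 * MB, 1 * GB):
        print(stripe // MB, "MB stripe ->", orc_block_size(stripe) // MB, "MB block")
```

Under this reading, the cap only takes effect once the stripe size exceeds 768 MB; below that, the block size is simply twice the stripe size.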

Re: Using Hive with WebHCat

2013-12-30 Thread Eugene Koifman
Sorry, missed this mail earlier. The fact that these files are missing is OK. When you ask for status info via REST it tries to return a list of fields (each field like 'exitValue' is a file when using HDFSStorage, which is the default) but it doesn't know which ones have never been written. For

Re: RegExSerde over JDBC... is it possible>

2013-12-30 Thread Jay Vyas
Ah... I see what's going on. When Hive finally issues a "load" statement that actually USES the SerDe, Hadoop takes over. At that point, hadoop/lib needs to have the Hive SerDe libraries copied into it, otherwise it can't find the AbstractSerDe classes. I wonder if there is a more elegant way to i

WebHCat MapReduce Job Syntax

2013-12-30 Thread Jonathan Hodges
Hi, I am trying to kick off a mapreduce job via WebHCat. The following is the hadoop jar command. hadoop jar /home/hadoop/camus-non-avro-consumer-1.0-SNAPSHOT-jar-with-dependencies.jar com.linkedin.camus.etl.kafka.CamusJob -P /home/hadoop/camus_non_avro.properties As you can see there is an app

Re: WebHCat MapReduce Job Syntax

2013-12-30 Thread Jonathan Hodges
Sorry accidentally hit send before adding the lines from webhcat.log DEBUG | 30 Dec 2013 19:08:01,042 | org.apache.hcatalog.templeton.Server | queued job job_201312212124_0161 in 267 ms DEBUG | 30 Dec 2013 19:08:38,880 | org.apache.hcatalog.templeton.tool.HDFSStorage | Couldn't find /templeton-ha

Re: WebHCat MapReduce Job Syntax

2013-12-30 Thread Eugene Koifman
Have you tried adding -d arg=-P before -d arg=/tmp/properties? On Mon, Dec 30, 2013 at 11:14 AM, Jonathan Hodges wrote: > Sorry accidentally hit send before adding the lines from webhcat.log > > DEBUG | 30 Dec 2013 19:08:01,042 | org.apache.hcatalog.templeton.Server | > queued job job_20131

HiveMetaStoreClient only sees one of my DBs ?

2013-12-30 Thread Yang
If I log into my Hive shell and run "show databases;", I see many DBs: Logging initialized using configuration in file:/etc/hive/conf/hive-log4j.properties hive> show databases; OK conf confnew default money testdb Time taken: 1.57 seconds, Fetched: 6 row(s) but somehow if I run the following Java

RE: HiveMetaStoreClient only sees one of my DBs ?

2013-12-30 Thread java8964
The best mailing list for this question is the Hive one, but I will try to give my guess here anyway. If you only see the 'default' database, most likely you are using Hive's 'LocalMetaStore'. To help yourself find the problem, try to find out the following information: 1) What kind of Hive metastore you a

Re: WebHCat MapReduce Job Syntax

2013-12-30 Thread Jonathan Hodges
I didn't try that before, but I just did. curl -s -d user.name=hadoop \ >-d jar=/tmp/camus-non-avro-consumer-1.0-SNAPSHOT-jar-with-dependencies.jar \ >-d class=com.linkedin.camus.etl.kafka.CamusJob \ >-d arg=-P \ >-d arg=/tmp/camus_non_avro.properties \ >
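The point of the curl call above is that each job argument is sent as its own repeated "arg" form field, in order, so "-P" must come before the properties path. A minimal Python sketch of building that form body follows. The helper name build_mapreduce_jar_form is hypothetical, and the optional statusdir field is included only because it is suggested elsewhere in this thread; the endpoint URL is left out because it is cut off in the post.

```python
from urllib.parse import urlencode


def build_mapreduce_jar_form(user, jar, main_class, args, statusdir=None):
    """Build a form-encoded body with one repeated 'arg' field per argument.

    A list of (key, value) pairs is used instead of a dict so that the
    repeated 'arg' key and the order of the arguments are both preserved.
    """
    fields = [("user.name", user), ("jar", jar), ("class", main_class)]
    fields.extend(("arg", a) for a in args)
    if statusdir is not None:
        fields.append(("statusdir", statusdir))
    return urlencode(fields)


if __name__ == "__main__":
    # Values copied from the curl command in the post.
    body = build_mapreduce_jar_form(
        user="hadoop",
        jar="/tmp/camus-non-avro-consumer-1.0-SNAPSHOT-jar-with-dependencies.jar",
        main_class="com.linkedin.camus.etl.kafka.CamusJob",
        args=["-P", "/tmp/camus_non_avro.properties"],
    )
    print(body)
```

Passing a dict here would silently drop one of the two "arg" values, which is why the sketch sticks to a list of pairs.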

Re: WebHCat MapReduce Job Syntax

2013-12-30 Thread Eugene Koifman
Is there any output from TrivialExecService class in any hadoop logs? (it's DEBUG level log4j output in hive 0.12). It should print the command that TempletonControllerJob's launcher task (LaunchMapper) is trying to launch On Mon, Dec 30, 2013 at 12:55 PM, Jonathan Hodges wrote: > I didn't try

Re: Hive - Issue Converting Text to Orc

2013-12-30 Thread Bryan Jeffrey
Prasanth, Any luck? On Tue, Dec 24, 2013 at 4:31 PM, Bryan Jeffrey wrote: > Prasanth, > > I am also traveling this week. Your assistance would be appreciated, but > not at the expense of your holiday! > > Bryan > On Dec 24, 2013 2:23 PM, "Prasanth Jayachandran" < > pjayachand...@hortonworks.co

Re: HiveMetaStoreClient only sees one of my DBs ?

2013-12-30 Thread Yang
Thanks, I fixed it. It turns out that I need to put my hive-site.xml on the classpath; without it, the client still mysteriously works and somehow gives me only a "default" db. (I wish it had given a more explicit error.) On Mon, Dec 30, 2013 at 12:42 PM, java8964 wrote: > Best mailing list for this questi

Re: WebHCat MapReduce Job Syntax

2013-12-30 Thread Jonathan Hodges
I don't see 'TrivialExecService' output in the jobtracker or tasktracker logs. We are using hive 0.11 though so maybe not set to DEBUG? On Mon, Dec 30, 2013 at 2:11 PM, Eugene Koifman wrote: > Is there any output from TrivialExecService class in any hadoop logs? > (it's DEBUG level log4j outpu

Re: WebHCat MapReduce Job Syntax

2013-12-30 Thread Eugene Koifman
It looks like in 0.11 it writes to stderr (limited logging anyway). Perhaps you can try adding the 'statusdir' param to your REST call and see if anything useful is written to that directory. On Mon, Dec 30, 2013 at 2:22 PM, Jonathan Hodges wrote: > I don't see 'TrivialExecService' output in the

Re: WebHCat MapReduce Job Syntax

2013-12-30 Thread Jonathan Hodges
You're the man! When I included the 'statusdir' param I get the following output in stderr. Exception in thread "main" java.io.FileNotFoundException: /tmp/camus_non_avro.properties (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStrea

Re: hive hbase integration

2013-12-30 Thread Vikas Parashar
Hi guys, I eventually managed to solve that issue. As you know, the issue was related to map-reduce jobs. The error was clearly saying that some jar (class) was missing. So I created a temporary folder and copied some jar files there. Fortunately, it worked for me. Kindly find below steps

ORC performance

2013-12-30 Thread chandra Reddy Bogala
Hi, I have inserted around 700 million records, or 55 GB worth of data, from a staging table into the ORC table below. Then I tried running small queries to get a few columns of data from the ORC table, but ORC query performance is slower than staging table query performance. I am not sure where I am making a mistake