Hi all,
I'm running a query that scans a file stored in ORC format and extracts
some columns. My file is about 92 GB, uncompressed. I kept the default
stripe size. The MapReduce job generates 363 map tasks.
I have noticed that the first 180 map tasks each finish in 3 seconds, and after they
When running Hive via JDBC, how can I get the Hive job status?
Currently I'm using the following method:
Configuration conf = new Configuration();
JobConf job = new JobConf(conf);
JobClient jc = new JobClient(job);
// Get the cluster status
// ClusterStatus cs = jc.getClusterStatus();
// List the status of all jobs known to the JobTracker
JobStatus[] jobStatus = jc.getAllJobs();
Why not use INSERT INTO for appending the new data?
a) Load the new data into a staging table.
b) INSERT INTO the final table (a sketch of both steps is below).
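A minimal sketch of that two-step append, assuming a staging table stg_sales and a
final table sales with matching schemas (the table names and the input path are
hypothetical):

-- 1) Load the newly arrived file(s) into the staging table
LOAD DATA INPATH '/landing/sales/2014-02-10' INTO TABLE stg_sales;

-- 2) Append the staged rows to the final table
INSERT INTO TABLE sales
SELECT * FROM stg_sales;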
Sent from Windows Mail
From: Raj Hadoop
Sent: Monday, 10 February 2014 08:15
To: user, User
Hi,
My requirement is a typical data warehouse and
Hello all,
I'm looking for a Hive function that returns the day of the week (Monday, Tuesday,
etc.) when given a timestamp parameter.
Does anyone have an idea of how to do it?
--
*Eduardo Parra Valdés*
eduardo.pa...@beeva.com
WWW.BEEVA.COM http://www.beeva.com/
Hi Avrilia
Is it a partitioned table? If so, approximately how many partitions are there
and how many files are there? What is the value of hive.input.format?
My suspicion is that there are ~180 files and each file is ~515 MB in size.
Since you had mentioned you are using the default stripe size
Hi,
I know this has been asked before. I googled around this topic and tried to
understand as much as possible, but I got different answers in different places.
So I'd like to describe what I've run into and see if someone can help me
again on this topic.
I created one table with one
Hi Prasanth,
No, it's not a partitioned table. The table consists of only one file
(91.7 GB). When I created the table, I loaded data from a text table into the
ORC table and used only 1 map task so that only one large file would be created
rather than many small files. This is why I'm getting confused with
I'm working on a UDAF that takes in a constant string that defines
what the final output of the UDAF will be. In the mode=PARTIAL1 call
to the init function, all the parameters are available and the constant
can be read, so the output ObjectInspector can be built. I haven't
found a way to pass
Oddly enough, I don't see one here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
However, you're not the only one who would find something like this useful; cf.
https://issues.apache.org/jira/browse/HIVE-6046
In the meantime it appears as though
Hi Avrilia
I have a few more questions:
1) Have you enabled ORC predicate pushdown by setting
hive.optimize.index.filter?
2) What is the value for hive.input.format?
3) Which hive version are you using?
4) What query are you using?
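For reference, the current value of each setting can be printed from the Hive CLI by
issuing SET with just the property name:

-- Print the current values of these settings
SET hive.optimize.index.filter;
SET hive.input.format;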
Thanks
Prasanth Jayachandran
On Feb 10, 2014, at 1:26 PM,
Here's one implementation of it:
https://github.com/livingsocial/HiveSwarm#dayofweekdate.
The code for it is pretty straightforward:
https://github.com/livingsocial/HiveSwarm/blob/master/src/main/java/com/livingsocial/hive/udf/DayOfWeek.java
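If you want to try it, a rough sketch of how such a UDF is typically registered and
called in Hive (the jar path, table, and column names below are hypothetical; the
class name comes from the repository above):

-- Register the HiveSwarm jar and create a temporary function for the UDF
ADD JAR /path/to/hiveswarm.jar;
CREATE TEMPORARY FUNCTION dayofweek AS 'com.livingsocial.hive.udf.DayOfWeek';

-- Call it on a timestamp/date column
SELECT dayofweek(event_ts) FROM events;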
On Mon, Feb 10, 2014 at 4:38 PM, Stephen Sprague
Hi Prasanth,
Here are the answers to your questions:
1) Yes, I have set both: set hive.optimize.ppd=true; and set
hive.optimize.index.filter=true;
2) From describe extended: inputFormat:
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
3) Hive 0.12
4) select max(I1) from table;
Thanks,
Avrilia
On
2) From describe extended: inputFormat:
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OrcInputFormat can be bypassed if hive.input.format is set to
CombineHiveInputFormat. There are two different split computation code paths,
both of which may generate a different number of splits and hence
Hi Prasanth,
It seems that I was actually using the
hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat and
that was generating 363 map tasks. I tried
org.apache.hadoop.hive.ql.io.HiveInputFormat and I was actually able to get 182
map tasks and get rid of the short map
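For reference, that switch can be made per session before running the query; a
minimal sketch (the table name is a placeholder):

-- Let ORC compute its own splits instead of going through CombineHiveInputFormat
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

-- Keep ORC predicate pushdown enabled
SET hive.optimize.ppd=true;
SET hive.optimize.index.filter=true;

SELECT max(I1) FROM orc_table;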
Great to hear!
Thanks
Prasanth Jayachandran
On Feb 10, 2014, at 2:50 PM, Avrilia Floratou avrilia.flora...@gmail.com
wrote:
Hi Prasanth,
It seems that I was actually using the
hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat and
that was generating 363 map tasks. I
Hi,
I need to know the conditions based on which Hive decides to execute
a UDF, UDTF, or UDAF on the map or reduce side. So far I have understood
that UDFs are mostly executed on the map side and
UDAFs on the reduce side. But are there any conditions under which a UDF can be
executed on the reduce side
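A rough illustration of the usual behavior (table and column names here are
hypothetical): a UDF applied directly to a scanned column is evaluated in the
mappers, while the same UDF applied to the result of an aggregation generally runs
after the reduce-side GROUP BY.

-- upper() on a raw column: evaluated map-side while scanning
SELECT upper(name) FROM employees;

-- upper() on the grouping key of an aggregate: evaluated on the reduce side,
-- because the grouped rows only exist after the shuffle
SELECT upper(dept), count(*) FROM employees GROUP BY dept;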
The HBase storage handler uses its own InputFormat,
so hbase.client.scanner.caching (which is used in hbase.TableInputFormat)
does not work. It might be configurable via HIVE-2906, something like
select empno, ename from hbase_emp ('hbase.scan.cache'='1000'), but I've
not tried it.
bq. Is there any