I am using Hive 0.6, and I use JDBC to execute a search SQL query, for
example the code below:
public static void query() {
    // uses java.sql.Connection/DriverManager/Statement/ResultSet
    Connection hiveConnection = null;
    Statement statement = null;
    try {
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        // standard Hive 0.6 JDBC URL; host and port are placeholders
        hiveConnection = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
        statement = hiveConnection.createStatement();
        ResultSet results = statement.executeQuery("SELECT ..."); // the search query
    } catch (Exception e) { e.printStackTrace(); }
}
4M is a good number based on a lot of experiments. Increasing it will
reduce the file size, but the savings grow very slowly once the block
size goes beyond 4M, while the perf/memory cost increases a lot.
FYI, we are trying a new columnar file format, and it seems 4M is no
longer the magic number there.
On Mon, Jan 24, 2011 at 7:51 PM, yongqiang he wrote:
> Yes. It only supports block compression. (No record-level compression support.)
> You can use the config 'hive.io.rcfile.record.buffer.size' to specify
> the block size (before compression). The default is 4MB.
>
> Thanks
> Yongqiang
Yes. It only supports block compression. (No record-level compression support.)
You can use the config 'hive.io.rcfile.record.buffer.size' to specify
the block size (before compression). The default is 4MB.
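For example, to double it to 8MB before compression (the number here is
just illustrative):

    set hive.io.rcfile.record.buffer.size=8388608;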
Thanks
Yongqiang
hmmm, I've seen mention of SymLink but I don't yet grasp how it works or
applies to selecting files to process. Also, I don't have much control over
how the data gets to the bucket I end up reading from, hence the need for
powerful file selection.
Could you point me to some SymLink documentation or an example?
Although there is only 1 reducer, the amount of data going to that reducer
should be really small: it will have the same number of rows as the number
of mappers.
Can you check how much data your reducer is getting?
Is it taking a long time to read the small data from each mapper?
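Assuming map-side aggregation is on (it is by default):

    set hive.map.aggr=true;

each mapper emits a single partial count, so the reducer only has to sum
one row per mapper.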
Thanks,
-namit
Hi,
I really like the virtual column feature in 0.7 that allows me to request
INPUT__FILE__NAME and see the names of files that are being acted on.
Because I can see the files that are being read, I see that I am spending time
querying many, many very large files, most of which I do not need.
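For example (my_table stands in for the real table):

    SELECT INPUT__FILE__NAME, c1 FROM my_table LIMIT 10;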
So, in pig this is reduced by 4 threads automatically? That would be interesting.
I've usually used groups, but when there is only one group, you still
have this problem where the CPU resources are underutilized.
-Ajo.
On Mon, Jan 24, 2011 at 2:01 PM, Jonathan Coveney wrote:
> Yes, I tried that, it looks like it forces it to 1 if there are no groups.
Yes, I tried that, it looks like it forces it to 1 if there are no groups.
2011/1/24 Ajo Fod
> oh ... sorry you say you already tried that.
>
>
>
> On Mon, Jan 24, 2011 at 1:54 PM, Ajo Fod wrote:
> > you could try to set the number of reducers e.g:
> > set mapred.reduce.tasks=4;
> >
> > set this before doing the select.
oh ... sorry you say you already tried that.
On Mon, Jan 24, 2011 at 1:54 PM, Ajo Fod wrote:
> you could try to set the number of reducers e.g:
> set mapred.reduce.tasks=4;
>
> set this before doing the select.
>
> -Ajo
>
you could try to set the number of reducers e.g:
set mapred.reduce.tasks=4;
set this before doing the select.
-Ajo
On Mon, Jan 24, 2011 at 1:13 PM, Jonathan Coveney wrote:
> I have a 10 node server or so, and have been mainly using pig on it, but
> would like to try out Hive.
How did you upload the data to the new table?
You can get the data compressed by doing an INSERT OVERWRITE into the
destination table with "hive.exec.compress.output" set to true.
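For example (table names are placeholders):

    set hive.exec.compress.output=true;
    set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
    INSERT OVERWRITE TABLE dest SELECT * FROM src;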
Thanks
Yongqiang
I have a 10 node server or so, and have been mainly using pig on it, but
would like to try out Hive.
I am running this query, which doesn't take too long in Pig, but is taking
quite a long time in Hive.
hive -e "select count(1) as ct from my_table where v1='02' and v2 =
;" > thecount
I am trying to explore some use cases that I believe are perfect for
the ColumnarSerDe: tables with 100+ columns where only one or two are
selected in a particular query.
CREATE TABLE ()
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFILE;
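A concrete sketch (table and column names are made up; imagine ~100 more
columns in the list):

    CREATE TABLE wide_table (c1 STRING, c2 STRING, c3 STRING)
    ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
    STORED AS RCFILE;

    -- only the blocks for the selected column should be read:
    SELECT c2 FROM wide_table;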
Speculative execution must be on for Hadoop in your setup. With speculative
execution ON, the framework starts duplicate task attempts on different
nodes in parallel. The one that finishes first contributes; the other is killed.
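If the killed attempts bother you, you can switch speculative execution
off; assuming the usual Hadoop 0.20-era property names:

    set mapred.map.tasks.speculative.execution=false;
    set mapred.reduce.tasks.speculative.execution=false;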
-Shrijeet
Hi,
When I run hive queries, everything finishes successfully but in the
jobtracker UI, I see a lot of failed/killed task attempts. For example
I have a simple query that finishes in 4 minutes. On the jobtracker UI
here are the map and reduce stats:
MAP - Num Tasks: 138, Complete: 138
Hi All,
What's the best way to add a custom SerDe jar?
It looks like ADD JAR reports success, and I can list the jar in the
Hive interface.
However, CREATE TABLE fails to find the class.
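What I'm doing, roughly (the path and class name are placeholders):

    add jar /path/to/custom-serde.jar;
    list jars;
    create table t (s string) row format serde 'com.example.MySerDe';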
Thanks
Ankit