Re: Low performance map join when join key types are different

2015-12-23 Thread Zhiwen Sun
Got it. Thanks for your reply. Zhiwen Sun On Wed, Dec 23, 2015 at 2:24 PM, Gopal Vijayaraghavan wrote: > > > But why disable mapjoin has better performance when we don't use cast to > >string(user always lazy)? > > > > Join key values comparison in reduc

Re: Low performance map join when join key types are different

2015-12-22 Thread Zhiwen Sun
Thanks, Gopal. But why does disabling map join give better performance when we don't cast to string (users are always lazy)? Is comparing join key values in the reduce stage faster? Zhiwen Sun On Wed, Dec 23, 2015 at 2:36 AM, Gopal Vijayaraghavan wrote: > > > We found tha

Low performance map join when join key types are different

2015-12-22 Thread Zhiwen Sun
a.id and b.id to double. When the conversion occurs, the map join becomes very slow. A simple solution is to disable auto join. Does anyone know how to solve it more effectively? My Hive version: 1.1.0+cdh5.4.7+233 Zhiwen Sun
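The idea behind the cast-to-string workaround mentioned in the replies can be pictured with a small hash join. This is only a conceptual sketch in Python, not Hive's implementation: the function name `map_join` and the sample rows are made up for illustration, and it assumes one side stores ids as strings while the other stores them as integers.

```python
def map_join(small_rows, large_rows):
    """Hash join: build an in-memory table from the small side, stream the large side."""
    # Normalize every key to str so that 123 and "123" land in the same bucket.
    lookup = {}
    for key, value in small_rows:
        lookup.setdefault(str(key), []).append(value)
    # Probe the table once per row of the large side.
    for key, value in large_rows:
        for match in lookup.get(str(key), []):
            yield (str(key), value, match)


# Hypothetical sample data: same ids, different types on each side.
small = [("1", "alice"), ("2", "bob")]              # ids stored as strings
large = [(1, "click"), (2, "view"), (3, "click")]   # ids stored as integers
result = list(map_join(small, large))
```

Here both sides are converted to one explicit type before the lookup, which is the role an explicit cast on the join keys would play in the query.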

Re: dfs storage full on all slave machines of 6 machine hive cluster

2013-03-18 Thread Zhiwen Sun
The folder "/mnt/hadoop-fs/dfs/data/current/" is the datanode's main data folder in Hadoop. You can use *hadoop dfs -rmr {nouserdir}* to free up space in HDFS. *Don't delete files directly in the OS file system.* Zhiwen Sun On Mon, Mar 18, 2013 at 6:48 PM, Manish Bhoge wrote:

Re: how to handle variable format data of text file?

2013-03-18 Thread Zhiwen Sun
As you defined in the CREATE TABLE HQL, fields are delimited by a blank space, so any remaining data at the end of the line is dropped. If you want to keep the rest of the line, I suggest you use the org.apache.hadoop.hive.contrib.serde2.RegexSerDe row format instead of the default delimited format. Zhiwen Sun On Mon, Mar 11
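To show what a regex-based row format buys you here, the following is a rough Python sketch of the matching idea only. The line layout (two space-delimited fields followed by free text), the pattern, and the sample line are assumptions made for illustration; the original poster's actual data format is not shown in this thread, and Hive itself uses Java regular expressions.

```python
import re

# Hypothetical layout: two space-delimited fields, then free text to end of line.
# Each capture group would correspond to one table column.
LINE_PATTERN = re.compile(r"(\S+) (\S+) (.*)")


def parse_line(line):
    """Return the captured columns, or None if the line does not fit the layout."""
    match = LINE_PATTERN.fullmatch(line)
    if match is None:
        return None
    return match.groups()


# The last group keeps the embedded spaces instead of splitting on them.
row = parse_line("2013-03-11 ERROR disk quota exceeded on node 4")
```

The final `(.*)` group is what keeps the tail of the line together as one column, which a plain space delimiter cannot do.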