Managing input split sizes in Hive running the tez engine

2016-04-20 Thread Nitin Kumar
Hi, I want to gain a better understanding of how the input splits are calculated in the Tez engine. I am aware that the *hive.input.format* property can be set to either *HiveInputFormat* (default) or to *CombineHiveInputFormat* (generally accepted for a large number of files having sizes << hdf
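
As a point of reference for the property named above, the input format can be switched per session with a SET statement. On Tez, splits are additionally grouped between a lower and an upper size bound. The sketch below is a minimal illustration, not something taken from the thread: the byte values are assumptions chosen for readability and should be tuned against the actual file sizes.

```sql
-- Select the input format for the current session
-- (the two values discussed in the message above).
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
-- SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Bounds, in bytes, that Tez applies when grouping splits.
-- Illustrative values only (16 MB and 1 GB), not recommendations.
SET tez.grouping.min-size=16777216;
SET tez.grouping.max-size=1073741824;
```

Raising the minimum generally yields fewer, larger grouped splits and therefore fewer tasks; lowering the maximum does the opposite.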

RE: Question on Implementing CASE in Hive Join

2016-04-20 Thread Markovitz, Dudu
The second version works as expected (after fixing a typo in the word ‘indicator’). If you don’t get any results you should check your data (maybe the fields contain trailing spaces or control characters etc.). If you’re willing to replace the ‘OUTER’ with ‘INNER’, there’s another option - sel

Re: Hive footprint

2016-04-20 Thread Mich Talebzadeh
Hi, If I may, I would also like to see where the Hive optimizer shows that it is used with explain ... or other means. It will be interesting. HTH

Re: Hive footprint

2016-04-20 Thread Marcin Tustin
Could you expand on this? This sounds like something that would be great to know, and probably fold into the wiki. On Wed, Apr 20, 2016 at 11:57 AM, Jörn Franke wrote: > Hive has working indexes. However, many people overlook that a block is > usually much larger than in a relational database and

Re: Hive footprint

2016-04-20 Thread Jörn Franke
Hive has working indexes. However, many people overlook that a block is usually much larger than in a relational database and thus do not use them correctly. > On 19 Apr 2016, at 09:31, Mich Talebzadeh wrote: > > The issue is that Hive has indexes (not index store) but they don't work so > there we

Re: Hive footprint

2016-04-20 Thread Jörn Franke
It really depends on what you want to do. Hive is more for queries involving a lot of data, whereas HBase+Phoenix is more for OLTP scenarios or sensor ingestion. I think the reason is that Hive has been the entry point for many engines and formats. Additionally, there are a lot of tuning capabilities fr

Re: Hive footprint

2016-04-20 Thread Mich Talebzadeh
A caveat here. An OLTP database such as Oracle or SAP ASE will use indexes for point queries, in other words when the search is via an index scan. In that case the search will be very fast because typically few blocks will be needed, using the index scan and the RowID pointer to the underlying data blo

Re: Question on Implementing CASE in Hive Join

2016-04-20 Thread Kishore A
Hi Dudu, Thank you for sending queries around this. I have run these queries and below are the observations: 1. It returned the same error as before: "SemanticException [Error 10017]: Line 4:4 Both left and right aliases encountered in JOIN 'code'" 2. Query execution is successful but not retri
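
For context on error 10017 quoted above: Hive versions of that period reject a join condition in which a single expression, such as a CASE, refers to columns from both sides of the join. When an inner join is acceptable, one common workaround is to move that expression out of the ON clause and into WHERE. The sketch below only illustrates the idea and is not the query from the thread: the tables `table_a` and `table_b` and all column names are hypothetical.

```sql
-- Hypothetical tables and columns, for illustration only.
-- The CASE expression sits in WHERE rather than in ON, so it is
-- evaluated as an ordinary filter instead of a join condition.
SELECT a.id,
       b.description
FROM   table_a a
CROSS JOIN table_b b
WHERE  b.code = CASE
                  WHEN a.indicator = 'Y' THEN a.primary_code
                  ELSE a.secondary_code
                END;
```

This form filters a Cartesian product, so it can be expensive on large tables, and it may be rejected outright when Hive runs in strict mode. It also returns only matching rows, so it is not a substitute for an outer join.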

Re: Hive footprint

2016-04-20 Thread Sabarish Sasidharan
HBase is very good for direct key-based lookups, and for scans over a range of keys (data is sorted by key). Hive, by contrast, is not good for seeks (the needle-in-a-haystack problem). You can optimize with ORC, stripes, sorting etc., but it is still a needle-in-a-haystack problem. Apache Kyl