Hi Nitin,

I want queries to return within a second.

The Hive table's data size is 50 TB, stored as Snappy-compressed RCFile.

Thanks and Regards
Prabakaran.N  aka NP
nsn, Bangalore
When "I" is replaced by "We" - even Illness becomes "Wellness"


From: ext Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Thursday, July 31, 2014 6:25 PM
To: user@hadoop.apache.org
Subject: Re: Hadoop Realtime Queries

I want a quick response for SQL queries.

How quick is quick for you?
What's the data size?
What kind of queries do you want to run?
How often will the same query be run on the same dataset?


On Thu, Jul 31, 2014 at 6:20 PM, Natarajan, Prabakaran 1. (NSN - IN/Bangalore)
<prabakaran.1.natara...@nsn.com> wrote:
Hi,

Thank you all for the reply.

I want a quick response for SQL queries.

Thanks and Regards
Prabakaran.N

From: ext Bertrand Dechoux [mailto:decho...@gmail.com]
Sent: Thursday, July 31, 2014 1:28 PM
To: user@hadoop.apache.org
Subject: Re: Hadoop Realtime Queries

It all depends on the context and what is really meant by realtime. Impala (and
other competing alternatives) is not listed among the tools you have tried.
Maybe you should not focus only on batch frameworks for providing realtime
access? The results are not surprising.

Bertrand Dechoux

On Thu, Jul 31, 2014 at 9:38 AM, Kumar, Deepak8
<deepak8.ku...@citi.com> wrote:
Hi,
As far as I know, realtime queries are only possible using HBase and Cloudera
Search. Hive is a batch process; it is not realtime. So instead of tuning
different parameters, maybe you could look for a different architecture
design so that you could use HBase.

Regards,
Deepak

From: Natarajan, Prabakaran 1. (NSN - IN/Bangalore)
[mailto:prabakaran.1.natara...@nsn.com]
Sent: Thursday, July 31, 2014 3:32 AM
To: user@hadoop.apache.org
Subject: Hadoop Realtime Queries

Hi

I want to run realtime queries on HDFS data. I tried Hadoop/YARN/Hive,
Shark on Spark, Tez, etc.,
but I still couldn’t get subsecond performance on the large data set that I have.
I understand Hadoop is not meant for this, but I still want to get as close as
possible.

1. How can we tune RHEL OS for this?
2. How can we tune YARN?
3. Is there any stable framework like Tez that performs much better?
4. Is there any caching strategy that we can adopt?
5. Any articles related to this are welcome.

Thanks in Advance

Prabakaran.N







--
Nitin Pawar
