Re: How to work around non-executable /tmp with Hive in Parquet+Snappy compression?

2016-03-24 Thread Rex X
Sorry to bump this thread again. Got the following error: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.0.4.1-libsnappyjava.so: /tmp/snappy-1.0.4.1-libsnappyjava.so: failed to map segment from shared object: Operation not permitted. Based on the following post:
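That "failed to map segment" error is the classic symptom of a /tmp mounted noexec. A common workaround is to point the JVM and snappy-java at a temp directory that allows execution; the path below is an illustrative assumption, not taken from the thread:

```shell
# Sketch, assuming /var/hive/tmp exists and is on an executable mount.
# org.xerial.snappy.tempdir is the snappy-java property controlling where
# libsnappyjava.so gets unpacked; java.io.tmpdir moves other JVM temp files.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.io.tmpdir=/var/hive/tmp -Dorg.xerial.snappy.tempdir=/var/hive/tmp"
```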

Re: How to work around non-executable /tmp with Hive in Parquet+Snappy compression?

2016-03-24 Thread Rex X
Nice! Problem solved! On Mon, Mar 21, 2016 at 8:19 AM, Tale Firefly wrote: > Hey! > > Are you talking about the HDFS /tmp or the local FS /tmp? > > For the HDFS one, I think it should be the property: > hive.exec.scratchdir > > For the local, I think it should be the
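For reference, the two scratch-dir properties the reply alludes to can be set per session; the paths here are illustrative assumptions:

```sql
-- Illustrative paths only: redirect Hive scratch space away from /tmp.
SET hive.exec.scratchdir=/user/hive/scratch;       -- HDFS scratch directory
SET hive.exec.local.scratchdir=/var/hive/scratch;  -- local-FS scratch directory
```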

How to do multiple output of Hive with Python?

2016-03-24 Thread Rex X
Given a query select category, value from someHiveTable; I expect to output the result for each category into a separate file named after the corresponding category. Any tips on how to do this?
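Since the thread asks for Python, here is a minimal sketch of the splitting step, assuming the (category, value) rows have already been fetched (e.g. parsed from "hive -e" TSV output); split_by_category is a hypothetical helper name, not anything from the thread:

```python
import os
from collections import defaultdict

def split_by_category(rows, out_dir):
    # Hypothetical helper: write one file per category, named after the
    # category, with one value per line. `rows` is an iterable of
    # (category, value) pairs.
    buckets = defaultdict(list)
    for category, value in rows:
        buckets[category].append(value)
    for category, values in buckets.items():
        with open(os.path.join(out_dir, category + ".txt"), "w") as f:
            f.write("\n".join(values) + "\n")
    return sorted(buckets)
```

Inside Hive itself, dynamic partitioning on the category column achieves a similar per-category directory layout without leaving SQL.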

Re: Hadoop 2.6 version https://issues.apache.org/jira/browse/YARN-2624

2016-03-24 Thread mahender bigdata
Is there any other way to set the NM node cache directory? I'm using a Windows cluster with the Hortonworks HDP system. /mahender On 3/24/2016 11:27 AM, mahender bigdata wrote: Hi, is anyone holding a workaround for this bug? It looks like this problem still persists in Hadoop 2.6. Templeton job get

Hadoop 2.6 version https://issues.apache.org/jira/browse/YARN-2624

2016-03-24 Thread mahender bigdata
Hi, is anyone holding a workaround for this bug? It looks like this problem still persists in Hadoop 2.6. Templeton jobs fail as soon as the job is submitted. Please let us know as early as possible. Application application_1458842675930_0002 failed 2 times due to AM Container for

Hive CBO cost

2016-03-24 Thread Vijaya Chander
Hi, is there any way to see or get the cost (CPU and I/O units) of the optimized Hive query plan? Once the query is submitted, it gets parsed, multiple plans are generated, a cost is associated with each plan, and finally the lowest-cost plan is selected before execution. We were looking at
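As a starting point, EXPLAIN shows the plan the optimizer finally selected; note the Calcite cost figures themselves are not part of standard EXPLAIN output, so this is only a partial answer (table and column names are placeholders):

```sql
-- Prints the selected (lowest-cost) plan; the cost units themselves are
-- generally only visible via optimizer debug logging, not in this output.
EXPLAIN SELECT category, COUNT(*) FROM someHiveTable GROUP BY category;
```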

Hive TableSample with number of rows.

2016-03-24 Thread Sandeep Khurana
Hello. Hive provides a table-sampling approach based on a number of rows. The documentation is at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling#LanguageManualSampling-BlockSampling It states "For example, the following query will take the first 10 rows from each input split.
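The row-count form of block sampling that documentation page describes looks like this (the table name is a placeholder):

```sql
-- Takes the first 10 rows from each input split, per the wiki page above.
SELECT * FROM someHiveTable TABLESAMPLE (10 ROWS);
```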

RE: Issue joining 21 HUGE Hive tables

2016-03-24 Thread Loudongfeng
Not sure if there's anything special about external tables. Maybe you can try the cost-based optimizer in Hive 0.14 and above. analyze table your_table compute statistics; analyze table your_table compute statistics for columns col_1, col_2,...; or analyze table your_table compute statistics for
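Spelled out, the statistics-gathering steps the reply suggests look like this (your_table and the column names are placeholders, as in the reply):

```sql
-- Gather table-level and column-level statistics so the cost-based
-- optimizer has something to cost plans with:
ANALYZE TABLE your_table COMPUTE STATISTICS;
ANALYZE TABLE your_table COMPUTE STATISTICS FOR COLUMNS col_1, col_2;

-- Make sure the CBO actually consumes the column statistics:
SET hive.cbo.enable=true;
SET hive.stats.fetch.column.stats=true;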

Re: Issue joining 21 HUGE Hive tables

2016-03-24 Thread Mich Talebzadeh
Posting a typical query that you are using will help clarify the issue. You may also use TEMPORARY TABLEs to keep intermediate stage results. On the face of it, you can time each query to find the longest-running components, etc. select from_unixtime(unix_timestamp(), 'dd/MM/
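A sketch of the staging-plus-timing approach with hypothetical table names; the timestamp format is illustrative, since the original message is truncated:

```sql
-- Timestamp before the step:
SELECT from_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss');

-- Stage an intermediate join result in a temporary table:
CREATE TEMPORARY TABLE stage1 AS
SELECT a.id, a.amount, b.region
FROM table_a a JOIN table_b b ON a.id = b.id;

-- Timestamp after, to measure this component in isolation:
SELECT from_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss');
```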

Re: Issue joining 21 HUGE Hive tables

2016-03-24 Thread Jörn Franke
Joining so many external tables is always an issue with any engine. Your problem is not Hive-specific, but your data model seems to be off. First of all, you should store the tables in an appropriate format, such as ORC or Parquet, and the tables should not be external. Then you should use the
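A minimal sketch of the first suggestion, converting an external table into a managed ORC table (names are placeholders):

```sql
-- Materialize the external table's data as a managed table stored as ORC:
CREATE TABLE orders_orc STORED AS ORC AS
SELECT * FROM orders_external;
```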