Re: Namenode automatically going to safemode with 2.1.0-beta

2013-07-16 Thread Krishna Kishore Bonagiri
Yes Harsh, I haven't set dfs.namenode.name.dir anywhere in the config files. My NameNode has again gone into safe mode today while it was idle. I shall try setting this value to something other than /tmp. On Tue, Jul 16, 2013 at 6:39 AM, Harsh J ha...@cloudera.com wrote: 2013-07-12 11:04:26,002
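For reference, a small sketch (my own, not from the thread) that prints the NameNode metadata directories as Hadoop resolves them. When dfs.namenode.name.dir is unset it falls back to file://${hadoop.tmp.dir}/dfs/name, and hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}, which OS tmp cleaners can wipe and thereby push the NameNode into safe mode.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.HdfsConfiguration;

    public class PrintNameNodeDirs {
      public static void main(String[] args) {
        // HdfsConfiguration loads hdfs-default.xml plus hdfs-site.xml from the classpath.
        Configuration conf = new HdfsConfiguration();
        System.out.println("dfs.namenode.name.dir = " + conf.get("dfs.namenode.name.dir"));
        System.out.println("hadoop.tmp.dir        = " + conf.get("hadoop.tmp.dir"));
      }
    }

If the first line points under /tmp, moving it to a persistent location in hdfs-site.xml (and restarting the NameNode) should keep the metadata from disappearing.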

RE: Policies for placing a reducer

2013-07-16 Thread Devaraj k
Hi, It doesn’t consider where the maps ran when scheduling the reducers, because the reducers need to contact all the mappers for the map outputs. It schedules reducers wherever slots are available. Thanks, Devaraj k From: Felix.徐 [mailto:ygnhz...@gmail.com] Sent: 16 July 2013 09:25 To:

hive task fails when left semi join

2013-07-16 Thread kira.wang
Hello, I am trying to filter out some records in a table in Hive. The number of rows in this table is 4 billion+; I make a left semi join between the above table and a small table with 1k rows. However, after the job has run for 3 hours, it ends in a failed status. My questions are as follows,

Re: hive task fails when left semi join

2013-07-16 Thread Nitin Pawar
Can you try a map-only join? Your one table is just 1k records .. a map join will help you run it faster and hopefully you will not hit the memory condition. On Tue, Jul 16, 2013 at 12:56 PM, kira.w...@xiaoi.com wrote: Hello, I am trying to filter out some records in a table in Hive.
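For illustration only, a rough sketch of how that map join could be issued through the Hive JDBC driver (the HiveServer URL and the big_t/small_t table and column names are invented): the MAPJOIN hint asks Hive to load the 1k-row table into each mapper's memory and finish the semi join map-side, so no reduce stage is involved.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class MapJoinSketch {
      public static void main(String[] args) throws Exception {
        // HiveServer1-era JDBC driver; HiveServer2 would use org.apache.hive.jdbc.HiveDriver
        // and a jdbc:hive2:// URL instead.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection con =
            DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = con.createStatement();
        // The MAPJOIN hint streams small_t into memory, so the join needs no reducers.
        ResultSet rs = stmt.executeQuery(
            "SELECT /*+ MAPJOIN(small_t) */ big_t.* "
                + "FROM big_t LEFT SEMI JOIN small_t ON (big_t.id = small_t.id)");
        while (rs.next()) {
          System.out.println(rs.getString(1));
        }
        rs.close();
        stmt.close();
        con.close();
      }
    }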

Re: hive task fails when left semi join

2013-07-16 Thread kira.wang
Thanks for your positive answer. From your answer, I get the keyword “map join”; to realize it, do you mean that I can do as the blog says: http://blog.csdn.net/xqy1522/article/details/6699740 If you don't mind, please take a look at the page. From: Nitin Pawar

RE: hive task fails when left semi join

2013-07-16 Thread Devaraj k
Hi, In the given image, I see there are some failed/killed map-reduce task attempts. Could you check why these are failing? You can investigate further based on the fail/kill reason. Thanks, Devaraj k From: kira.w...@xiaoi.com [mailto:kira.w...@xiaoi.com] Sent: 16 July 2013 12:57 To:

Re: hive task fails when left semi join

2013-07-16 Thread kira.wang
I have checked it. As the datanode logs show: 2013-07-16 00:05:31,294 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_201307041810_0138_m_000259_0,53) failed : org.mortbay.jetty.EofException: timeout This may be caused by a so-called “data skew” problem. Thanks,

Re: hive task fails when left semi join

2013-07-16 Thread Nitin Pawar
Dev, from what I learned in my past experience with running huge single-table queries, one hits reduce-side memory limits or timeout limits. I will wait for Kira to give more details on the same. Sorry, I forgot to ask for the logs and suggested a different approach :( Kira, the page is in Chinese so can't

Collect, Spill and Merge phases insight

2013-07-16 Thread Felix.徐
Hi all, I am trying to understand the process of Collect, Spill and Merge in the map phase. I've referred to a few documents but still have a few questions. Here is my understanding of the spill phase in map: 1. The collect function adds a record into the buffer. 2. If the buffer exceeds a threshold
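As a side note (not part of the thread), the buffer and threshold in steps 1–2 are governed by a handful of job settings; the sketch below uses the Hadoop 1.x property names, on the assumption of the MRv1 code base (MRv2 renames them to mapreduce.task.io.sort.mb, mapreduce.map.sort.spill.percent and mapreduce.task.io.sort.factor).

    import org.apache.hadoop.mapred.JobConf;

    public class SpillTuningSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setInt("io.sort.mb", 200);                // size of the in-memory collect buffer, in MB
        conf.setFloat("io.sort.spill.percent", 0.80f); // begin spilling once the buffer is 80% full
        conf.setInt("io.sort.factor", 50);             // spill files merged at a time in the merge phase
        System.out.println("collect buffer = " + conf.get("io.sort.mb") + " MB");
      }
    }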

While inserting data into Hive, why am I not able to query?

2013-07-16 Thread samir das mohapatra
Dear All, Has anyone faced this issue: while loading a huge dataset into a Hive table, Hive restricts me from querying the same table. I have set hive.support.concurrency=true, but it still shows conflicting lock present for TABLENAME mode SHARED <property> <name>hive.support.concurrency</name>

java.io.IOException: error=2, No such file or directory

2013-07-16 Thread Fatih Haltas
Hi everyone, I am trying to import data from PostgreSQL to HDFS, but I am having some problems. Here are the problem details: Sqoop version: 1.4.3, Hadoop version: 1.0.4. 1) When I use this command: ./sqoop import-all-tables --connect jdbc:postgresql://192.168.194.158:5432/IMS --username

Re: Running a single cluster in multiple datacenters

2013-07-16 Thread Azuryy Yu
Hi Bertrand, I guess you configured two racks in total: one IDC is one rack, and the other IDC is another rack. So if you don't want re-replication to kick in while one IDC is down, you have to change the replica placement policy: if the minimum number of blocks is present on one rack, then don't do anything. (here

Re: java.io.IOException: error=2, No such file or directory

2013-07-16 Thread Shahab Yunus
The error is: Please set $HBASE_HOME to the root of your HBase installation. Have you checked whether it is set or not? Have you verified your HBase or Hadoop installation? Similarly, the following: Cannot run program psql: java.io.IOException: error=2, No such file or directory Also

Re: java.io.IOException: error=2, No such file or directory

2013-07-16 Thread Fatih Haltas
Thanks Shahab, I solved my problem in another way,

Re: java.io.IOException: error=2, No such file or directory

2013-07-16 Thread Shahab Yunus
Great. Can you please share, if possible, what the problem was and how you solved it? Thanks. Regards, Shahab On Tue, Jul 16, 2013 at 9:58 AM, Fatih Haltas fatih.hal...@nyu.edu wrote: Thanks Shahab, I solved my problem in another way,

Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-16 Thread Ram
Hi, Please replace 0.0.0.0 with your FTP host IP address and try it. Hi, From, Ramesh. On Mon, Jul 15, 2013 at 3:22 PM, Hao Ren h@claravista.fr wrote: Thank you, Ram. I have configured core-site.xml as follows: <?xml version="1.0"?> <?xml-stylesheet type="text/xsl"

Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-16 Thread Hao Ren
Hi, Actually, I tested with my own FTP host at first; however, it didn't work. Then I changed it to 0.0.0.0, but I always get the cannot access ftp message. Thank you. Hao. On 16/07/2013 17:03, Ram wrote: Hi, Please replace 0.0.0.0 with your FTP host IP address and try it. Hi,

Re: Collect, Spill and Merge phases insight

2013-07-16 Thread Stephen Boesch
Great questions, I am also looking forward to answers from the expert(s) here. 2013/7/16 Felix.徐 ygnhz...@gmail.com Hi all, I am trying to understand the process of Collect, Spill and Merge in the map phase. I've referred to a few documents but still have a few questions. Here is my understanding

Incrementally adding to existing output directory

2013-07-16 Thread Max Lebedev
Hi, I'm trying to figure out how to incrementally add to an existing output directory using MapReduce. I cannot specify the exact output path, as data in the input is sorted into categories and then written to different directories based on the contents. (in the examples below, token= or

header of a tuple/bag

2013-07-16 Thread Mix Nin
Hi, I am trying to query a data set on HDFS using Pig. Data = LOAD '/user/xx/20130523/*'; x = FOREACH Data GENERATE cookie_id; I get the error below: line 2, column 26 Invalid field projection. Projected field [cookie_id] does not exist How do I find the column names in the bag Data? The developer

spawn maps without any input data - hadoop streaming

2013-07-16 Thread Austin Chungath
Hi, I am trying to generate random data using Hadoop Streaming with Python. It's a map-only job and I need to run a number of maps. There is no input to the map, as it's just going to generate random data. How do I specify the number of maps to run? (I am confused here because, if I am not wrong,

Re: While inserting data into Hive, why am I not able to query?

2013-07-16 Thread Alan Gates
This question should be sent to u...@hive.apache.org. Alan. On Jul 16, 2013, at 3:23 AM, samir das mohapatra wrote: Dear All, Has anyone faced this issue: while loading a huge dataset into a Hive table, Hive restricts me from querying the same table. I have set

Re: While inserting data into Hive, why am I not able to query?

2013-07-16 Thread Nitin Pawar
Samir, try running the command UNLOCK TABLE and see if it works. On Tue, Jul 16, 2013 at 8:42 PM, Alan Gates ga...@hortonworks.com wrote: This question should be sent to u...@hive.apache.org. Alan. On Jul 16, 2013, at 3:23 AM, samir das mohapatra wrote: Dear All, Has anyone faced the

RE: Incrementally adding to existing output directory

2013-07-16 Thread Devaraj k
Hi Max, It can be done by customizing the output format class for your Job according to your expectations. You could refer to the OutputFormat.checkOutputSpecs(JobContext context) method, which checks the output specification. We can override this in your custom OutputFormat. You can also see
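To make that concrete, here is a minimal sketch of such an override (the class name is invented; the stock FileOutputFormat version throws FileAlreadyExistsException when the directory already exists):

    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class ExistingDirTextOutputFormat extends TextOutputFormat<Text, Text> {

      @Override
      public void checkOutputSpecs(JobContext job) throws IOException {
        Path outDir = getOutputPath(job);
        if (outDir == null) {
          throw new IOException("Output directory not set.");
        }
        // Unlike the default check, tolerate an existing output directory;
        // just make sure it is there so the committer can write into it.
        FileSystem fs = outDir.getFileSystem(job.getConfiguration());
        if (!fs.exists(outDir)) {
          fs.mkdirs(outDir);
        }
      }
    }

The job would then use it via job.setOutputFormatClass(ExistingDirTextOutputFormat.class); note that successive runs still need distinct part-file names (or separate sub-directories) so earlier output is not clobbered.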

RE: spawn maps without any input data - hadoop streaming

2013-07-16 Thread Devaraj k
Hi Austin, Here the number of maps for a Job depends on the splits returned by the InputFormat.getSplits() API. We can have an input format which decides the number of maps (by returning the splits) for a Job according to the need. If we use FileInputFormat, the number of splits
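As an illustration of that idea (everything below is a sketch; the class names and the "sketch.num.maps" property are invented), an InputFormat can fabricate N empty splits so that a map-only job, such as the random-data generator asked about, runs exactly N mappers with no real input:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;

    public class FixedMapCountInputFormat
        extends InputFormat<NullWritable, NullWritable> {

      /** A split that carries no data; it exists only to spawn one mapper. */
      public static class EmptySplit extends InputSplit implements Writable {
        @Override public long getLength() { return 0L; }
        @Override public String[] getLocations() { return new String[0]; }
        @Override public void write(DataOutput out) { /* nothing to serialize */ }
        @Override public void readFields(DataInput in) { /* nothing to read */ }
      }

      @Override
      public List<InputSplit> getSplits(JobContext context) {
        int numMaps = context.getConfiguration().getInt("sketch.num.maps", 1);
        List<InputSplit> splits = new ArrayList<InputSplit>(numMaps);
        for (int i = 0; i < numMaps; i++) {
          splits.add(new EmptySplit());   // one empty split per desired map task
        }
        return splits;
      }

      @Override
      public RecordReader<NullWritable, NullWritable> createRecordReader(
          InputSplit split, TaskAttemptContext context) {
        // Hand each mapper exactly one dummy record so map() is invoked once.
        return new RecordReader<NullWritable, NullWritable>() {
          private boolean used = false;
          @Override public void initialize(InputSplit s, TaskAttemptContext c) {}
          @Override public boolean nextKeyValue() {
            if (used) return false;
            used = true;
            return true;
          }
          @Override public NullWritable getCurrentKey() { return NullWritable.get(); }
          @Override public NullWritable getCurrentValue() { return NullWritable.get(); }
          @Override public float getProgress() { return used ? 1.0f : 0.0f; }
          @Override public void close() {}
        };
      }
    }

With Hadoop Streaming specifically, people often get the same effect by feeding NLineInputFormat a small file of N lines, one line per desired mapper, but a custom InputFormat like the one above is the general mechanism being described.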