Re: How to make the hive external table read from subdirectories

2012-09-12 Thread Bejoy KS
Hi Nataraj, Once you have created a partitioned table you need to add the partitions; only then will the data in the sub-directories be visible to Hive. After creating the table you need to execute a command like the one below: ALTER TABLE some_table ADD PARTITION (year='2012', month='09', dayofmonth='11') LOCATION
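A minimal sketch of issuing such a statement programmatically through the Hive JDBC driver (HiveServer1-era driver class and URL; the table name, partition values, and LOCATION path are hypothetical placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class AddPartition {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Connection con = DriverManager.getConnection(
                    "jdbc:hive://localhost:10000/default", "", "");
            Statement stmt = con.createStatement();
            // Register one sub-directory as a partition of the external table.
            stmt.execute("ALTER TABLE some_table"
                    + " ADD PARTITION (year='2012', month='09', dayofmonth='11')"
                    + " LOCATION '/user/data/some_table/2012/09/11'");
            con.close();
        }
    }

One such statement is needed per sub-directory; queries that filter on the partition columns will then prune to the matching directories.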

Re: rack topology data update

2012-09-12 Thread Saurabh bhutyani
I believe running the following command on the namenode should refresh it: 'hadoop dfsadmin -refreshNodes' Thanks & Regards, Saurabh Bhutyani Call : 9820083104 Gtalk: s4saur...@gmail.com On Thu, Sep 13, 2012 at 11:25 AM, Viji R wrote: > Hi Jameson, > > If the NameNode has cached the wrong valu
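For completeness, a minimal sketch of invoking the same refresh from Java rather than the shell (assumes a Hadoop configuration on the classpath that points at the NameNode):

    import org.apache.hadoop.hdfs.tools.DFSAdmin;
    import org.apache.hadoop.util.ToolRunner;

    public class RefreshNodes {
        public static void main(String[] args) throws Exception {
            // Equivalent to running `hadoop dfsadmin -refreshNodes`.
            int rc = ToolRunner.run(new DFSAdmin(), new String[] {"-refreshNodes"});
            System.exit(rc);
        }
    }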

Re: rack topology data update

2012-09-12 Thread Viji R
Hi Jameson, If the NameNode has cached the wrong value earlier, it will not refresh that value until you restart it. On Thu, Sep 13, 2012 at 11:21 AM, Jameson Li wrote: > Hi Harsh, > > I have followed your suggested steps. > > 1. Stop the new datanode. (I had already modified the topology file in the nam

Re: rack topology data update

2012-09-12 Thread Jameson Li
Hi Harsh, I have followed your suggested steps: 1. Stop the new datanode (I had already modified the topology file on the namenode). 2. Run 'hadoop dfsadmin -refreshNodes' on the namenode. 3. Start the new datanode. But it really does not pick up the new topology mapping. It just shows the start in

RE: removing datanodes from clusters.

2012-09-12 Thread yogesh.kumar13
Thanks Brahmareddy. Do we need to create the include and exclude files, and with which extension? Please suggest. Regards Yogesh Kumar From: Brahma Reddy Battula [brahmareddy.batt...@huawei.com] Sent: Wednesday, September 12, 2012 10:16 AM To: user@hadoop.apache.org S
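For reference, the include and exclude lists are ordinary plain-text files (one hostname per line, no particular extension required) that hdfs-site.xml on the NameNode points to. A minimal sketch with hypothetical paths (Hadoop 1.x property names):

    <property>
      <name>dfs.hosts</name>
      <value>/etc/hadoop/conf/include</value>
    </property>
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/conf/exclude</value>
    </property>

After adding a host to the exclude file, running 'hadoop dfsadmin -refreshNodes' makes the NameNode re-read the lists and begin decommissioning.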

Re: rack topology data update

2012-09-12 Thread Harsh J
Jameson, The right process to add a new node with the right mapping is: 1. Update the topology file for the new DN. 2. Issue a dfsadmin -refreshNodes to get the new topology mapping updated in the NN. 3. Start the DN only after (2) so it picks up the right mapping and a default mapping does not get cached.
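For context, a minimal sketch of the core-site.xml entry that wires in the topology script these steps refer to (Hadoop 1.x property name; the script path is hypothetical):

    <property>
      <name>topology.script.file.name</name>
      <value>/etc/hadoop/conf/topology.sh</value>
    </property>

The script is handed IPs/hostnames as arguments and must print one rack path (e.g. /rack1) per input; the NameNode caches each resolution, which is why step 3 must follow step 2.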

Re: multipleoutputs does not like speculative execution in map-only job

2012-09-12 Thread Harsh J
Hey Radim, Does your job use the FileOutputCommitter? On Thu, Sep 13, 2012 at 4:21 AM, Radim Kolar wrote: > with speculative execution enabled Hadoop can run a task attempt on more than > 1 node. If the mapper is using MultipleOutputs then the second attempt (or sometimes > even all) fails to create outpu

RE: How to make the hive external table read from subdirectories

2012-09-12 Thread Nataraj Rashmi - rnatar
Thanks for your response. Can someone see if this is OK? I am not getting any records when I query the Hive table when I use partitions. This is how I am creating the table: CREATE EXTERNAL TABLE Data (field1 STRING,field2) PARTITIONED BY(year STRING, month STRING, dayofmonth STRING) ROW FOR

rack topology data update

2012-09-12 Thread Jameson Li
Our Hadoop version is hadoop-0.20-append+4. We have configured rack awareness in the namenode. But when I add a new datanode, update the topology data file, and restart the datanode, I just see this log in the namenode: 2012-09-13 10:35:25,074 INFO org.apache.hadoop.net.NetworkTopology:

Re: Is mahout kmeans slow ?

2012-09-12 Thread Elaine Gan
Hi, Sorry, I sent this to the wrong ML. Please ignore it. Thank you. > Hi, > > I'm trying to do some text analysis using mahout kmeans (clustering), > processing the data on hadoop. > --numClusters = 160 > --maxIter (-x) maxIter = 200 > > Well my data is small, around 500MB . > I have 4 servers,

Is mahout kmeans slow ?

2012-09-12 Thread Elaine Gan
Hi, I'm trying to do some text analysis using Mahout kmeans (clustering), processing the data on Hadoop. --numClusters = 160 --maxIter (-x) maxIter = 200 Well, my data is small, around 500 MB. I have 4 servers, each with 4 CPUs, and TaskTrackers are set to a maximum of 4. When I run the Mahout task, I

multipleoutputs does not like speculative execution in map-only job

2012-09-12 Thread Radim Kolar
With speculative execution enabled, Hadoop can run a task attempt on more than one node. If the mapper is using MultipleOutputs, then the second attempt (or sometimes even all of them) fails to create its output file because it is being created by another attempt: attempt_1347286420691_0011_m_00_0 attempt_134728642
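A minimal sketch of the common workaround, assuming the org.apache.hadoop.mapreduce.Job API (job name is hypothetical): turn map-side speculative execution off so only one attempt writes each MultipleOutputs file:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class MapOnlySetup {
        public static Job configure(Configuration conf) throws Exception {
            Job job = new Job(conf, "map-only-with-multipleoutputs");
            job.setNumReduceTasks(0);               // map-only job
            job.setMapSpeculativeExecution(false);  // one attempt per input split
            return job;
        }
    }

The cleaner fix is for output files to live in a per-attempt work directory that only the committed attempt promotes, which is what the FileOutputCommitter question in this thread is probing.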

'Can't get service ticket for: host/0.0.0.0' when running hdfs with kerberos

2012-09-12 Thread jack chrispoo
Hi, I'm using Hadoop 1.0.1. I tried to follow https://ccp.cloudera.com/display/CDHDOC/Configuring+Hadoop+Security+in+CDH3+%28KSSL%29 to configure Hadoop with Kerberos authentication. I configured the KDC, added hdfs, mapred, and host principals for each node to Kerberos, and deployed the keytabs to each

Re: How to make the hive external table read from subdirectories

2012-09-12 Thread Bejoy KS
Hi Nataraj, Create a partitioned table and add the sub-directories as partitions. You need to have some logic in place for determining the partitions. Say, if the sub-directories hold data organized by date, then make the date the partition key. Regards Bejoy KS Sent from handheld, please excuse typos. -Origi

How to make the hive external table read from subdirectories

2012-09-12 Thread Nataraj Rashmi - rnatar
I have a Hive external table created from an HDFS location. How do I make it read the data from all the subdirectories as well? Thanks.

Re: Some general questions about DBInputFormat

2012-09-12 Thread Yaron Gonen
Hi again Nick, DBInputFormat does use Connection.TRANSACTION_SERIALIZABLE, but this is a per-connection attribute. Since every mapper has its own connection, and every connection is opened at a different time, every connection sees a different snapshot of the DB, and it can cause, for example, two mapper
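To make the failure mode concrete, a minimal sketch of a DBInputFormat job setup (the driver class, connection URL, table and column names are all hypothetical); because each map task opens its own connection at its own time, rows modified between those opens can be seen inconsistently across mappers:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
    import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
    import org.apache.hadoop.mapreduce.lib.db.DBWritable;

    public class DbJobSetup {
        // Value type read by each mapper; one instance per row.
        public static class MyRecord implements Writable, DBWritable {
            long id;
            String name;
            public void readFields(ResultSet rs) throws SQLException {
                id = rs.getLong("id"); name = rs.getString("name");
            }
            public void write(PreparedStatement ps) throws SQLException {
                ps.setLong(1, id); ps.setString(2, name);
            }
            public void readFields(DataInput in) throws IOException {
                id = in.readLong(); name = in.readUTF();
            }
            public void write(DataOutput out) throws IOException {
                out.writeLong(id); out.writeUTF(name);
            }
        }

        public static Job configure(Configuration conf) throws Exception {
            DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                    "jdbc:mysql://dbhost/mydb", "user", "password");
            Job job = new Job(conf, "db-input-example");
            job.setInputFormatClass(DBInputFormat.class);
            // Each mapper issues its own query for its split over its own
            // connection -- hence the snapshot caveat above.
            DBInputFormat.setInput(job, MyRecord.class, "mytable",
                    null /* conditions */, "id" /* orderBy */, "id", "name");
            return job;
        }
    }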

RE: How get information messages when a JobControl is used ?

2012-09-12 Thread Piter85 Piter85
OK, I will do it! From: rekha_jo...@intuit.com To: user@hadoop.apache.org Subject: Re: How get information messages when a JobControl is used ? Date: Wed, 12 Sep 2012 11:41:21 + Good that web hdfs is sufficient for now, Piter! The counters are part of o.a.h.mapreduce.Job so you can ge

Re: How get information messages when a JobControl is used ?

2012-09-12 Thread Joshi, Rekha
Good that web hdfs is sufficient for now, Piter! The counters are part of o.a.h.mapreduce.Job, so you can get them via job.getCounters() etc., or via JobInProgress. It is not a JobControl feature as such, so they will not be directly in the JobControl/ControlledJob API. However, Bertrand's point is an i
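A minimal sketch of pulling those counters once a job has completed (iterating all groups, since specific counter names vary across Hadoop versions):

    import org.apache.hadoop.mapreduce.Counter;
    import org.apache.hadoop.mapreduce.CounterGroup;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;

    public class PrintCounters {
        // Print every counter the job reported, e.g. record counts per phase.
        public static void print(Job job) throws Exception {
            Counters counters = job.getCounters();
            for (CounterGroup group : counters) {
                for (Counter counter : group) {
                    System.out.println(group.getDisplayName() + "\t"
                            + counter.getDisplayName() + " = " + counter.getValue());
                }
            }
        }
    }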

RE: How get information messages when a JobControl is used ?

2012-09-12 Thread Piter85 Piter85
Hi Rekha and Bertrand! Thanks for the answers! OK, I see that in the web interface (_logs->history->job_.) there is information about job executions. I hope this information will be enough for me. As I said before, scanning the APIs, the only method that I found was ControlledJob.toString(). Bye! :)

Expected behavior of nested UserGroupInformation

2012-09-12 Thread Bertrand Dechoux
Hi, I am using UserGroupInformation.doAs(...) in order to launch a job programmatically from a remote application. I was wondering: what is the expected behavior of nested UserGroupInformation? Is it the same as with JAAS, where, if I am not mistaken, the innermost 'subject' is used? If that
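A small probe of the question (user names are hypothetical); if nested doAs follows JAAS semantics, the inner call should report the inner subject:

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.security.UserGroupInformation;

    public class NestedUgi {
        public static void main(String[] args) throws Exception {
            UserGroupInformation outer = UserGroupInformation.createRemoteUser("alice");
            final UserGroupInformation inner = UserGroupInformation.createRemoteUser("bob");
            outer.doAs(new PrivilegedExceptionAction<Void>() {
                public Void run() throws Exception {
                    // Expect "alice" here.
                    System.out.println("outer: " + UserGroupInformation.getCurrentUser());
                    inner.doAs(new PrivilegedExceptionAction<Void>() {
                        public Void run() throws Exception {
                            // If JAAS-like, expect "bob" here.
                            System.out.println("inner: " + UserGroupInformation.getCurrentUser());
                            return null;
                        }
                    });
                    return null;
                }
            });
        }
    }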

Re: How get information messages when a JobControl is used ?

2012-09-12 Thread Bertrand Dechoux
But as far as I know there is no way to get a snapshot of the JobControl state. https://issues.apache.org/jira/browse/MAPREDUCE-3562 I was only trying to get the state of all jobs, and it is not possible to get a consistent view. For map/reduce progress, I guess you could do the same by digging into
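A minimal sketch of the polling approach under discussion, using the old-API jobcontrol classes present in 1.0.3 (job wiring is hypothetical); note that the lists are fetched one call at a time, so this is exactly the non-atomic view MAPREDUCE-3562 describes:

    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class RunChain {
        public static void run(Job first, Job second) throws Exception {
            JobControl control = new JobControl("two-stage-chain");
            control.addJob(first);
            second.addDependingJob(first);  // run second only after first succeeds
            control.addJob(second);
            Thread runner = new Thread(control);
            runner.setDaemon(true);
            runner.start();
            while (!control.allFinished()) {
                // Two separate calls: the counts may not be mutually consistent.
                System.out.println("running=" + control.getRunningJobs().size()
                        + " successful=" + control.getSuccessfulJobs().size());
                Thread.sleep(5000);
            }
            control.stop();
        }
    }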

Re: How get information messages when a JobControl is used ?

2012-09-12 Thread Joshi, Rekha
Hi Piter, JobControl just means there are multiple complex jobs, but you will still see the information for each job on your Hadoop web interface (webhdfs), won't you? Or, if that does not work, you might need to use Reporters/Counters to get the log info in a custom format as needed. Thank

Re: Question about the task assignment strategy

2012-09-12 Thread Hiroyuki Yamada
Hi, Thank you for re-running the experiments. I launched a job through Hive with the default TextInputFormat. The job is the TPC-H Q1 query, which is a simple selection query on the lineitem table. The size of each dataset (data01...data14) is about 300GB, so about 4.2TB (=300GB*14) in total. I really appreciat

How get information messages when a JobControl is used ?

2012-09-12 Thread Piter85 Piter85
Hi! I'm using JobControl (v. 1.0.3) to chain two MapReduce applications. It works and creates output data, but it doesn't give me back informational messages such as the number of mappers or the number of records in input or output, etc. It only returns messages like this: 12/09/12 09:56:38 WARN mapred.J

Re: How to make different mappers execute different processing on a same data ?

2012-09-12 Thread Jason Yang
All right, I got it. Thank you very much. 2012/9/11 Harsh J > Hey Jason, > > While I am not sure what's the best way to automatically "evaluate" > during the execution of a job, the MultipleInputs class offers a way > to run different map implementations within a single job for different > inpu
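A minimal sketch of what Harsh describes, assuming the new-API MultipleInputs (older releases carry an equivalent under org.apache.hadoop.mapred.lib); paths and mapper bodies are hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class MultiMapperSetup {
        public static class FirstMapper extends Mapper<LongWritable, Text, Text, Text> {
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                ctx.write(new Text("first"), value);   // first kind of processing
            }
        }

        public static class SecondMapper extends Mapper<LongWritable, Text, Text, Text> {
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                ctx.write(new Text("second"), value);  // second kind of processing
            }
        }

        public static void configure(Job job) {
            // Each input path is processed by its own Mapper within one job.
            MultipleInputs.addInputPath(job, new Path("/data/inputA"),
                    TextInputFormat.class, FirstMapper.class);
            MultipleInputs.addInputPath(job, new Path("/data/inputB"),
                    TextInputFormat.class, SecondMapper.class);
        }
    }

Note that the mapping is keyed by input path, so applying two different computations to one dataset would need the data reachable under two distinct paths.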

Re: how to make different mappers execute different processing on same data ?

2012-09-12 Thread Jason Yang
Thanks for your reply. But I'm not sure that works, since the data volume is large, which makes the cost of shuffling quite high if all the processing is applied in the Reducer. I thought Hadoop transfers all of the Mapper output to the Reducer over HTTP, right? 2012/9/11 Narasingu Ramesh > Hi Jaso