Hi Nataraj
Once you have created a partitioned table, you need to add the partitions; only
then will the data in the sub-directories be visible to Hive.
After creating the table, you need to execute a command like the one below:
ALTER TABLE some_table ADD PARTITION (year='2012', month='09', dayofmonth='11')
LOCATION
I believe running the following command on the namenode should refresh it:
'hadoop dfsadmin -refreshNodes'
Thanks & Regards,
Saurabh Bhutyani
Call : 9820083104
Gtalk: s4saur...@gmail.com
On Thu, Sep 13, 2012 at 11:25 AM, Viji R wrote:
> Hi Jameson,
>
> If the NameNode has cached the wrong valu
Hi Jameson,
If the NameNode has cached the wrong value earlier, it will not
refresh that value until you restart it.
On Thu, Sep 13, 2012 at 11:21 AM, Jameson Li wrote:
> Hi harsh,
>
> I have followed your suggestion operation.
>
> 1, stop the new datanode.(I have modified the topology file in the nam
Hi harsh,
I have followed the steps you suggested:
1. Stop the new datanode. (I had already modified the topology file on the namenode
before.)
2. Run 'hadoop dfsadmin -refreshNodes' on the namenode.
3. Start the new datanode.
But it still does not update the new topology mapping.
It just shows the start in
Thanks Brahmareddy,
Do we need to create the include and exclude files, and if so, with what extension?
Please suggest.
Regards
Yogesh Kumar
From: Brahma Reddy Battula [brahmareddy.batt...@huawei.com]
Sent: Wednesday, September 12, 2012 10:16 AM
To: user@hadoop.apache.org
S
Jameson,
The right process to add a new node with the right mapping is:
1. Update topology file for the new DN.
2. Issue a dfsadmin -refreshNodes to get the new topology mapping updated in the NN.
3. Start the DN only after (2) so it picks up the right mapping and a
default mapping does not get cached.
Hey Radim,
Does your job use the FileOutputCommitter?
On Thu, Sep 13, 2012 at 4:21 AM, Radim Kolar wrote:
> with speculative execution enabled Hadoop can run task attempt on more then
> 1 node. If mapper is using multipleoutputs then second attempt (or sometimes
> even all) fails to create outpu
Thanks for your response. Can someone check whether this is OK? I am not getting any
records back when I query the Hive table when I use partitions.
This is how I am creating the table.
CREATE EXTERNAL TABLE Data (field1 STRING,field2) PARTITIONED BY(year
STRING, month STRING, dayofmonth STRING) ROW FOR
Our hadoop version is hadoop-0.20-append+4.
We have configured the rack awareness in the namenode.
But when I add a new datanode, update the topology data file, and restart
the datanode, I just see this log message in the namenode:
2012-09-13 10:35:25,074 INFO org.apache.hadoop.net.NetworkTopology:
Hi,
Sorry, I sent this to the wrong mailing list.
Please ignore it.
Thank you.
> Hi,
>
> I'm trying to do some text analysis using mahout kmeans (clustering),
> processing the data on hadoop.
> --numClusters = 160
> --maxIter (-x) maxIter = 200
>
> Well my data is small, around 500MB .
> I have 4 servers,
Hi,
I'm trying to do some text analysis using mahout kmeans (clustering),
processing the data on hadoop.
--numClusters = 160
--maxIter (-x) maxIter = 200
Well, my data is small, around 500 MB.
I have 4 servers, each with 4 CPUs, and the TaskTrackers are set to a
maximum of 4.
When I run the Mahout task, I
With speculative execution enabled, Hadoop can run a task attempt on more
than one node. If the mapper is using MultipleOutputs, then the second attempt (or
sometimes even all of them) fails to create its output file because it is being
created by another attempt:
attempt_1347286420691_0011_m_00_0
attempt_134728642
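One blunt workaround (not necessarily what this thread ends up recommending, and the job name below is made up) is to switch speculative execution off for the job, so only a single attempt writes each MultipleOutputs file:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class NoSpeculationDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "multipleoutputs-job"); // hypothetical job name
        // With speculation off, only one attempt per task runs, so the
        // MultipleOutputs files are never created by two attempts at once.
        job.setMapSpeculativeExecution(false);
        job.setReduceSpeculativeExecution(false);
        // ...set Mapper, input/output formats and paths as usual, then submit...
    }
}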
Hi,
I'm using Hadoop 1.0.1. I tried to follow
https://ccp.cloudera.com/display/CDHDOC/Configuring+Hadoop+Security+in+CDH3+%28KSSL%29
to configure Hadoop with Kerberos authentication. I configured the KDC, added
hdfs, mapred, and host principals for each node to Kerberos, and deployed the
keytabs to each
Hi Natraj
Create a partitioned table and add the sub-directories as partitions. You need to have
some logic in place for determining the partitions. Say the sub-directories denote
data based on a date; then make the date the partition column.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Origi
I have a Hive external table created from an HDFS location. How do I make it
read the data from all the subdirectories as well?
Thanks.
Hi again Nick,
DBInputFormat does use Connection.TRANSACTION_SERIALIZABLE, but this is a
per-connection attribute. Since every mapper has its own connection, and every
connection is opened at a different time, every connection sees a different
snapshot of the DB, and this can cause, for example, two mapper
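For context, a minimal new-API DBInputFormat setup looks roughly like the sketch below (the driver class, JDBC URL, credentials, table and field names are all hypothetical); every map task produced by setInput() opens its own JDBC connection, which is exactly why the snapshots can differ as described above:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class DbReadDriver {

    // Hypothetical record for a table with columns (id BIGINT, payload VARCHAR).
    public static class MyRecord implements Writable, DBWritable {
        long id;
        String payload;

        public void readFields(ResultSet rs) throws SQLException {
            id = rs.getLong(1);
            payload = rs.getString(2);
        }
        public void write(PreparedStatement ps) throws SQLException {
            ps.setLong(1, id);
            ps.setString(2, payload);
        }
        public void readFields(DataInput in) throws IOException {
            id = in.readLong();
            payload = in.readUTF();
        }
        public void write(DataOutput out) throws IOException {
            out.writeLong(id);
            out.writeUTF(payload);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical JDBC driver, URL and credentials.
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://dbhost/mydb", "user", "password");
        Job job = new Job(conf, "db-read");
        job.setInputFormatClass(DBInputFormat.class);
        // Each split is read by its own mapper over its own connection.
        DBInputFormat.setInput(job, MyRecord.class, "my_table",
                null /* conditions */, "id" /* orderBy */, "id", "payload");
        // ...set Mapper and output as usual...
    }
}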
Ok I will do it!
From: rekha_jo...@intuit.com
To: user@hadoop.apache.org
Subject: Re: How get information messages when a JobControl is used ?
Date: Wed, 12 Sep 2012 11:41:21 +
Good that web hdfs is sufficient for now, Piter!
The counters are part of o.a.h.mapreduce.Job so you can ge
Good that web hdfs is sufficient for now, Piter!
The counters are part of o.a.h.mapreduce.Job, so you can get them via
job.getCounters(), etc., or via JobInProgress. They are not a JobControl feature as
such, so they will not be directly in the JobControl/ControlledJob API.
However Bertrand's point is an i
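For example, assuming you still hold the ControlledJob instances you handed to JobControl, a small sketch for dumping the wrapped Job's counters could look like this:

import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.CounterGroup;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;

public class CounterDump {
    // Print every counter of the Job wrapped by a ControlledJob.
    public static void dump(ControlledJob cj) throws Exception {
        Counters counters = cj.getJob().getCounters();
        for (CounterGroup group : counters) {
            System.out.println(group.getDisplayName());
            for (Counter counter : group) {
                System.out.println("  " + counter.getDisplayName()
                        + " = " + counter.getValue());
            }
        }
    }
}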
Hi Rekha and Bertrand! Thanks for the answers! OK, I see that in the web interface
(_logs->history->job_.) there is info about the execution of the jobs.
I hope this info will be enough for me.
As I said before, scanning the APIs, the only method that I found was
ControlledJob.toString().
Bye! :)
Hi,
I am using UserGroupInformation.doAs(...) in order to launch a job
programmatically from a remote application.
I was wondering: what is the expected behavior of nested
UserGroupInformation?
Is it the same as with JAAS, where, if I am not mistaken, the innermost
'subject' is used?
If that
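For reference, the basic pattern in question is roughly the sketch below (the user name and the action inside run() are made up); how nesting behaves is exactly the open question above:

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        // Hypothetical user; on a kerberized cluster you would log in from a
        // keytab and/or use createProxyUser(...) instead of createRemoteUser(...).
        UserGroupInformation ugi = UserGroupInformation.createRemoteUser("jobuser");
        ugi.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                // Everything inside run() executes as "jobuser".
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);
                System.out.println(fs.exists(new Path("/user/jobuser")));
                return null;
            }
        });
    }
}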
But as far as I know there is no way to have a snapshot of the JobControl
state.
https://issues.apache.org/jira/browse/MAPREDUCE-3562
I was only trying to get the state of all jobs, and it is not possible to
get a consistent view.
For map/reduce progress, I guess you could do the same by digging into
Hi Piter,
JobControl just means there are multiple complex jobs, but you will still see the
information for each job on your Hadoop web interface (webhdfs), wouldn't
you?
Or, if that does not work, you might need to use Reporters/Counters to get the
log info data in a custom format, as needed.
Thank
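As a small illustration of the Counters suggestion (group and counter names are made up), a mapper can publish its own numbers, which then show up with the job's built-in counters and on the web UI:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private final LongWritable one = new LongWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Hypothetical counter group/name; the framework aggregates these across
        // all task attempts and reports them alongside the built-in job counters.
        context.getCounter("MyApp", "RECORDS_SEEN").increment(1);
        context.write(value, one);
    }
}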
Hi,
Thank you for replaying the experiments.
I launched a job through Hive with the default TextInputFormat.
The job is the TPC-H Q1 query, which is a simple selection query on the lineitem table.
Each piece of data (data01...data14) is about 300 GB, so about
4.2 TB (= 300 GB * 14) in total.
I really appreciat
Hi! I'm using JobControl (v. 1.0.3) to chain two MapReduce applications. It
works and creates output data, but it doesn't give me back informational messages
such as the number of mappers, the number of records in input or output, etc.
It only returns messages like this:
12/09/12 09:56:38 WARN mapred.J
All right, I got it. Thank you very much.
2012/9/11 Harsh J
> Hey Jason,
>
> While I am not sure what's the best way to automatically "evaluate"
> during the execution of a job, the MultipleInputs class offers a way
> to run different map implementations within a single job for different
> inpu
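A rough sketch of what that looks like (the paths and mapper logic are made up; the new-API class is org.apache.hadoop.mapreduce.lib.input.MultipleInputs, with an equivalent under org.apache.hadoop.mapred.lib on older releases):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MultiInputDriver {

    // Two different map implementations; both emit (Text, LongWritable) pairs.
    public static class LogMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        protected void map(LongWritable k, Text v, Context c)
                throws IOException, InterruptedException {
            c.write(v, new LongWritable(1));
        }
    }

    public static class CsvMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        protected void map(LongWritable k, Text v, Context c)
                throws IOException, InterruptedException {
            c.write(new Text(v.toString().split(",")[0]), new LongWritable(1));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "multi-input");
        job.setJarByClass(MultiInputDriver.class);
        // Each input path gets its own InputFormat and Mapper within the one job.
        MultipleInputs.addInputPath(job, new Path("/data/logs"),
                TextInputFormat.class, LogMapper.class);
        MultipleInputs.addInputPath(job, new Path("/data/csv"),
                TextInputFormat.class, CsvMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // ...set reducer and FileOutputFormat.setOutputPath(...) as usual...
    }
}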
Thanks for your reply.
But I'm not sure that works, since the data volume is large, which makes the
cost of shuffling quite high if all the processing is applied in the Reducer.
I thought Hadoop would transfer all the output of the Mapper to the Reducer over
HTTP, right?
2012/9/11 Narasingu Ramesh
> Hi Jaso