Re: Why do some blocks refuse to replicate...?

2013-03-28 Thread MARCOS MEDRADO RUBINELLI
Felix, After changing hdfs-site.xml, did you run "hadoop dfsadmin -refreshNodes"? That should have been enough, but you can try increasing the replication factor of these files, waiting for them to be replicated to the new nodes, and then setting it back to its original value. Cheers, Marcos In 28-0
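For illustration, a rough sketch of that temporary-replication trick, driven from Python via the hadoop CLI. The HDFS path and replication factors below are hypothetical placeholders, and the sleep stands in for actually watching fsck or the web UI:

    import subprocess
    import time

    STUCK_PATH = "/data/stuck"       # hypothetical HDFS path holding the under-replicated files
    ORIGINAL_REPLICATION = 3         # assumed original replication factor
    BOOSTED_REPLICATION = 5          # temporarily raised so the namenode schedules new copies

    def setrep(factor, path):
        # Recursively set the replication factor of everything under `path`.
        subprocess.check_call(["hadoop", "fs", "-setrep", "-R", str(factor), path])

    setrep(BOOSTED_REPLICATION, STUCK_PATH)   # force new replicas, including onto the new nodes
    time.sleep(600)                           # crude placeholder; poll "hadoop fsck /" instead
    setrep(ORIGINAL_REPLICATION, STUCK_PATH)  # restore the original value afterwards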

Re: Job log location and retention

2013-04-03 Thread MARCOS MEDRADO RUBINELLI
Zheyi, The jobtracker doesn't keep a reference to the job, to save memory, but you may still find it in the filesystem. For a default CDH3 installation, it will be in the jobtracker's local filesystem, at /var/log/hadoop-0.20/history/done/ Logs from individual tasks are a little trickier to find

Re: one minute delay in running a simple ls command on hadoop (maybe near security groups..): hadoop 0.23.5

2013-04-04 Thread MARCOS MEDRADO RUBINELLI
Gopi, The namenode is essentially running bash -c "id -Gn hduser" and waiting for the response. You could try executing it from a shell to see whether it really does take a long time on Azure, or whether the output is too complex to parse. Regards, Marcos In 04-04-2013 09:33, Gopi Krishna M wrote: in case an
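If it helps, a small sketch that reproduces that lookup and times it, assuming the user in question is hduser and that you run it on the namenode host:

    import subprocess
    import time

    USER = "hduser"  # the user whose group membership the namenode resolves

    start = time.time()
    # Mirrors the shell command the namenode runs to resolve groups.
    groups = subprocess.check_output(["bash", "-c", "id -Gn %s" % USER])
    elapsed = time.time() - start

    print("groups: %s" % groups.strip())
    print("lookup took %.2f seconds" % elapsed)

If that command alone is slow, the delay is probably in name-service resolution on the host rather than in Hadoop itself.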

RE: [RFC] Deploy CDH4 on a cluster

2013-04-07 Thread MARCOS MEDRADO RUBINELLI
Harry, It doesn't have to be a proper, fully qualified name. Sometimes hosts don't even have one. It just has to match exactly what `hostname` returns, because that's what each node will use to identify itself. Also, only servers that will actually be part of the cluster need to be in /etc/hos
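As a quick, generic sanity check (nothing Hadoop-specific; run it on each node), you can confirm that the name you put in the config files matches what `hostname` reports and that it resolves locally:

    import socket
    import subprocess

    # Each daemon identifies itself by whatever `hostname` returns.
    reported = subprocess.check_output(["hostname"]).strip().decode()

    print("hostname reports: %s" % reported)
    print("resolves to: %s" % socket.gethostbyname(reported))  # should be this machine's address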

Re: I want to call HDFS REST api to upload a file using httplib.

2013-04-08 Thread MARCOS MEDRADO RUBINELLI
On your first call, Hadoop will return a URL pointing to a datanode in the Location header of the 307 response. On your second call, you have to use that URL instead of constructing your own. You can see the specific documentation here: http://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE R
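Sketched out with httplib, as in the original question. The namenode host, port, target path, and user below are placeholders, and the second request goes to whatever URL the Location header contains:

    import httplib                 # Python 2; use http.client on Python 3
    from urlparse import urlparse  # urllib.parse on Python 3

    NAMENODE = "namenode.example.com"   # hypothetical namenode host
    PATH = "/webhdfs/v1/tmp/test.txt?op=CREATE&user.name=hduser&overwrite=true"

    # Step 1: ask the namenode where to write. It replies with a 307 whose
    # Location header points at a datanode; no file data is sent yet.
    conn = httplib.HTTPConnection(NAMENODE, 50070)
    conn.request("PUT", PATH)
    location = conn.getresponse().getheader("Location")
    conn.close()

    # Step 2: PUT the actual file contents to the URL the namenode handed back.
    target = urlparse(location)
    conn = httplib.HTTPConnection(target.netloc)
    conn.request("PUT", "%s?%s" % (target.path, target.query), body=open("test.txt", "rb").read())
    print(conn.getresponse().status)    # expect 201 Created
    conn.close()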

Re: jps show nothing but hadoop still running

2013-04-11 Thread MARCOS MEDRADO RUBINELLI
It's a limitation of jps: it will only show processes run as the current user. Try using "sudo -u hdfs jps". This also means that you will have to sudo as the appropriate user to run the stop/start scripts. Regards, Marcos On 11-04-2013 06:26, zhang.hen...@zte.com.cn

Re: UNDERSTANDING HADOOP PERFORMANCE

2013-04-11 Thread MARCOS MEDRADO RUBINELLI
dfs.namenode.handler.count and dfs.datanode.handler.count control how many threads each server keeps available to handle incoming requests. The default values should be fine for smaller clusters, but if you have a lot of simultaneous HDFS operations, you may see performance gains by increasi

Adjusting tasktracker heap size?

2013-04-15 Thread MARCOS MEDRADO RUBINELLI
Hi, I am currently tuning a cluster, and I haven't found much information on what factors to consider while adjusting the heap size of tasktrackers. Is it a direct multiple of the number of map+reduce slots? Is there anything else I should consider? Thank you, Marcos

Re: HW infrastructure for Hadoop

2013-04-16 Thread MARCOS MEDRADO RUBINELLI
Tadas, "Hadoop Operations" has pretty useful, up-to-date information. The chapter on hardware selection is available here: http://my.safaribooksonline.com/book/databases/hadoop/9781449327279/4dot-planning-a-hadoop-cluster/id2760689 Regards, Marcos Em 16-04-2013 07:13, Tadas MakĨinskas escreveu

Re: Adjusting tasktracker heap size?

2013-04-17 Thread MARCOS MEDRADO RUBINELLI
can adjust this according to our requirement to fine tune our cluster. This is my thought. On Mon, Apr 15, 2013 at 4:40 PM, MARCOS MEDRADO RUBINELLI <marc...@buscapecompany.com> wrote: Hi, I am currently tuning a cluster, and I haven't found much information on what factors to co

Re: Physically moving HDFS cluster to new

2013-04-18 Thread MARCOS MEDRADO RUBINELLI
Here's a rough guideline: Moving a cluster isn't all that different from upgrading it. The initial steps are the same: - stop your mapreduce services - switch your namenode to safe mode - generate a final image with -saveNamespace - stop your hdfs services - back up your metadata - as long as you
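As a rough sketch of the metadata-freezing part of those steps, using the standard dfsadmin subcommands; the dfs.name.dir location and backup destination below are placeholders to substitute with your own:

    import subprocess

    def dfsadmin(*args):
        # Thin wrapper; assumes the hadoop CLI is on the PATH and you run as the HDFS superuser.
        subprocess.check_call(["hadoop", "dfsadmin"] + list(args))

    dfsadmin("-safemode", "enter")   # stop accepting namespace changes
    dfsadmin("-saveNamespace")       # write a final, consistent image to disk

    # With hdfs services stopped, copy the metadata somewhere safe before the move.
    subprocess.check_call(["cp", "-a", "/data/dfs/name", "/backup/dfs-name-pre-move"])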

Re: AW: How to process only input files containing 100% valid rows

2013-04-19 Thread MARCOS MEDRADO RUBINELLI
Matthias, As far as I know, there are no guarantees on when counters will be updated during the job. One thing you can do is to write a metadata file along with your parsed events listing what files have errors and should be ignored in the next step of your ETL workflow. If you really don't wa

Re: Management API

2013-06-09 Thread MARCOS MEDRADO RUBINELLI
Brian, If you have access to the web UI, you can get those metrics in JSON from the JMXJsonServlet. Try hitting http://namenode_hostname:50070/jmx?qry=Hadoop:* and http://jobtracker_v1_hostname:50030/jmx?qry=hadoop:* It isn't as extensive as other options, but if you just need a snapshot of nod
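A minimal sketch of pulling those metrics programmatically, assuming the default web UI ports and hypothetical hostnames:

    import json
    import urllib2  # urllib.request on Python 3

    ENDPOINTS = {
        "namenode": "http://namenode.example.com:50070/jmx?qry=Hadoop:*",
        "jobtracker": "http://jobtracker.example.com:50030/jmx?qry=hadoop:*",
    }

    for role, url in ENDPOINTS.items():
        beans = json.load(urllib2.urlopen(url))["beans"]
        print("%s exposes %d MBeans" % (role, len(beans)))
        # Each bean is a flat dict of metric names to values; the NameNodeInfo
        # bean, for example, carries the LiveNodes listing mentioned in the follow-up.
        for bean in beans:
            if bean.get("name", "").endswith("NameNodeInfo"):
                print(json.dumps(bean, indent=2)[:500])  # preview only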

Re: Management API

2013-06-12 Thread MARCOS MEDRADO RUBINELLI
,name=NameNodeInfo::LiveNodes ) and build in some flexibility, you shouldn't have any problems. Regards, Marcos On 09-06-2013 11:30, Rita wrote: Are there any specs for the JSON schema? On Thu, Jun 6, 2013 at 9:49 AM, MARCOS MEDRADO RUBINELLI <marc...@buscapecompany.com> wrote