Re: Management API

2013-06-12 Thread MARCOS MEDRADO RUBINELLI
=NameNodeInfo::LiveNodes ) and build in some flexibility, you shouldn't have any problems. Regards, Marcos On 09-06-2013 11:30, Rita wrote: Are there any specs for the JSON schema? On Thu, Jun 6, 2013 at 9:49 AM, MARCOS MEDRADO RUBINELLI marc...@buscapecompany.com
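The `NameNodeInfo` bean mentioned above is exposed by the namenode's `/jmx` servlet, and its `LiveNodes` attribute is itself a JSON-encoded string that needs a second parse. A minimal sketch, using an abridged made-up payload (field names follow the servlet; hostnames and values are invented):

```python
import json

# Abridged sample of what /jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo
# returns; "dn1.example.com" and the numbers are illustrative.
sample = """{
  "beans": [{
    "name": "Hadoop:service=NameNode,name=NameNodeInfo",
    "LiveNodes": "{\\"dn1.example.com\\":{\\"usedSpace\\":1024,\\"lastContact\\":1}}"
  }]
}"""

info = json.loads(sample)["beans"][0]
# LiveNodes is a JSON string embedded in JSON, hence the second json.loads.
live = json.loads(info["LiveNodes"])
for host, stats in live.items():
    print(host, stats["usedSpace"])  # prints: dn1.example.com 1024
```

Because the attribute is a string rather than a nested object, a schema-unaware client that only parses the outer document will miss the per-datanode fields, which is why some flexibility in the consumer is worth building in.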

Re: AW: How to process only input files containing 100% valid rows

2013-04-19 Thread MARCOS MEDRADO RUBINELLI
Matthias, As far as I know, there are no guarantees on when counters will be updated during the job. One thing you can do is to write a metadata file along with your parsed events listing what files have errors and should be ignored in the next step of your ETL workflow. If you really don't

Re: Adjusting tasktracker heap size?

2013-04-17 Thread MARCOS MEDRADO RUBINELLI
We can adjust this according to our requirement to fine tune our cluster. This is my thought. On Mon, Apr 15, 2013 at 4:40 PM, MARCOS MEDRADO RUBINELLI marc...@buscapecompany.com wrote: Hi, I am currently tuning a cluster, and I haven't found much information

Re: HW infrastructure for Hadoop

2013-04-16 Thread MARCOS MEDRADO RUBINELLI
Tadas, Hadoop Operations has pretty useful, up-to-date information. The chapter on hardware selection is available here: http://my.safaribooksonline.com/book/databases/hadoop/9781449327279/4dot-planning-a-hadoop-cluster/id2760689 Regards, Marcos On 16-04-2013 07:13, Tadas Makčinskas wrote:

Adjusting tasktracker heap size?

2013-04-15 Thread MARCOS MEDRADO RUBINELLI
Hi, I am currently tuning a cluster, and I haven't found much information on what factors to consider while adjusting the heap size of tasktrackers. Is it a direct multiple of the number of map+reduce slots? Is there anything else I should consider? Thank you, Marcos
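For context on the knobs involved: in Hadoop 1.x the tasktracker daemon's own heap is set separately from the per-task child JVMs, so it is not a direct multiple of the slot count. A sketch with illustrative values (1000 MB is the historical default, not a recommendation):

```shell
# hadoop-env.sh: heap for the Hadoop daemons themselves, in MB
# (illustrative value; applies to the tasktracker process, not its tasks)
export HADOOP_HEAPSIZE=1000

# Per-task heaps come from mapred.child.java.opts in mapred-site.xml,
# so a node's rough memory budget is:
#   HADOOP_HEAPSIZE + (map slots + reduce slots) * child JVM heap
```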

Re: jps show nothing but hadoop still running

2013-04-11 Thread MARCOS MEDRADO RUBINELLI
It's a limitation of jps: it will only show processes run as the current user. Try using `sudo -u hdfs jps`. This also means that you will have to sudo as the appropriate user to run the stop/start scripts. Regards, Marcos On 11-04-2013 06:26,

Re: UNDERSTANDING HADOOP PERFORMANCE

2013-04-11 Thread MARCOS MEDRADO RUBINELLI
dfs.namenode.handler.count and dfs.datanode.handler.count control how many concurrent threads the server will have to handle incoming requests. The default values should be fine for smaller clusters, but if you have a lot of simultaneous HDFS operations, you may see performance gains by
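These properties live in hdfs-site.xml; a sketch with illustrative values (in Hadoop 1.x the defaults were around 10 for the namenode and 3 for the datanode):

```xml
<!-- hdfs-site.xml: handler thread counts; values are illustrative -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>40</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>8</value>
</property>
```

More handler threads help only when requests are actually queueing; raising them on a small, lightly loaded cluster mostly costs memory.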

RES: I want to call HDFS REST api to upload a file using httplib.

2013-04-08 Thread MARCOS MEDRADO RUBINELLI
On your first call, Hadoop will return a URL pointing to a datanode in the Location header of the 307 response. On your second call, you have to use that URL instead of constructing your own. You can see the specific documentation here: http://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE
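The two-step flow above can be sketched in Python (shown with `http.client`, the Python 3 name for httplib; hostnames, ports, and the `upload` helper are illustrative, not a Hadoop API):

```python
import http.client
from urllib.parse import urlsplit

def create_url(hdfs_path, user):
    # Step-1 request path, sent to the *namenode*.
    return "/webhdfs/v1%s?op=CREATE&user.name=%s&overwrite=true" % (hdfs_path, user)

def upload(namenode_host, namenode_port, hdfs_path, data, user):
    """Two-step WebHDFS create: PUT to the namenode, read the datanode
    URL out of the 307 response's Location header, then PUT the bytes
    to that URL instead of one you construct yourself."""
    conn = http.client.HTTPConnection(namenode_host, namenode_port)
    conn.request("PUT", create_url(hdfs_path, user))
    resp = conn.getresponse()
    resp.read()
    datanode_url = resp.getheader("Location")  # full URL incl. host:port
    conn.close()

    # Step 2: send the file contents to the datanode named by the namenode.
    parts = urlsplit(datanode_url)
    conn2 = http.client.HTTPConnection(parts.netloc)
    conn2.request("PUT", parts.path + "?" + parts.query, body=data)
    return conn2.getresponse().status  # 201 on success
```

http.client does not follow redirects on its own, which is convenient here: the 307 must not be auto-followed with an empty body, the Location header has to be used explicitly for the second request.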

RE: [RFC] Deploy CDH4 on a cluster

2013-04-07 Thread MARCOS MEDRADO RUBINELLI
Harry, It doesn't have to be a proper, fully qualified name. Sometimes hosts don't even have one. It just has to match exactly what `hostname` returns, because that's what each node will use to identify itself. Also, only servers that will actually be part of the cluster need to be in
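A quick way to express that check: `socket.gethostname()` returns the same name the `hostname` command does, so a configured name can be compared against it directly (`name_matches_local_host` is a hypothetical helper for illustration):

```python
import socket

def name_matches_local_host(configured_name):
    # Hadoop identifies a node by whatever `hostname` returns, so the
    # name written in the config must equal it exactly; merely resolving
    # to the same address is not enough.
    return configured_name == socket.gethostname()
```

Running this on each node against the names in your config files catches the mismatch before the daemons fail to find each other.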

Re: one minute delay in running a simple ls command on hadoop (maybe near security groups..): hadoop 0.23.5

2013-04-04 Thread MARCOS MEDRADO RUBINELLI
Gopi, The namenode is essentially running `bash -c "id -Gn hduser"` and waiting for the response. You could try executing it from a shell to see if it takes a long time in Azure, or if the output is too complex to parse. Regards, Marcos On 04-04-2013 09:33, Gopi Krishna M wrote: in case
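Timing that lookup from a script is straightforward; this sketch substitutes the current user for `hduser` so it runs anywhere (the user name and the timing approach are illustrative):

```python
import getpass
import subprocess
import time

user = getpass.getuser()  # stand-in for 'hduser' from the thread
t0 = time.monotonic()
# Same shape of command the namenode runs to resolve a user's groups.
result = subprocess.run(["bash", "-c", "id -Gn " + user],
                        capture_output=True, text=True)
elapsed = time.monotonic() - t0
print("groups:", result.stdout.strip())
print("lookup took %.3fs" % elapsed)
```

If this takes close to a minute, the delay is in the host's name-service/group resolution rather than in Hadoop itself.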

Re: Job log location and retention

2013-04-03 Thread MARCOS MEDRADO RUBINELLI
Zheyi, The jobtracker doesn't keep a reference to the job to save memory, but you may still find it in the filesystem. For a default CDH3 installation, it will be in the jobtracker's local filesystem, at /var/log/hadoop-0.20/history/done/ Logs from individual tasks are a little trickier to

Re: Why do some blocks refuse to replicate...?

2013-03-28 Thread MARCOS MEDRADO RUBINELLI
Felix, After changing hdfs-site.xml, did you run `hadoop dfsadmin -refreshNodes`? That should have been enough, but you can try increasing the replication factor of these files, waiting for them to be replicated to the new nodes, then setting it back to its original value. Cheers, Marcos In

Re: Set number Reducer per machines.

2010-10-05 Thread Marcos Medrado Rubinelli
You can set the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties in your mapred-site.xml file, but you may also want to check your current mapred.child.java.opts and mapred.child.ulimit values to make sure they aren't overriding the 4GB you set
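Those three properties live in mapred-site.xml on each tasktracker; a sketch with illustrative values:

```xml
<!-- mapred-site.xml on each tasktracker; values are illustrative -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```

The `-Xmx` in mapred.child.java.opts applies per task JVM, so slot counts and child heap have to be sized together against the machine's total memory.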

Re: why does 'jps' lose track of hadoop processes ?

2010-03-29 Thread Marcos Medrado Rubinelli
find the processes to stop. -- Marcos Medrado Rubinelli Tecnologia - BuscaPé Tel. +55 11 3848-8700 Ramal 8788 marc...@buscape-inc.com