=NameNodeInfo::LiveNodes
) and build in some flexibility, you shouldn't have any problems.
Regards,
Marcos
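For illustration, the LiveNodes attribute can be pulled straight from the NameNode's /jmx servlet, which returns JSON; the host, port, and query string below are assumptions, not something quoted in this thread:

    # Sketch, assuming a 1.x-era NameNode web UI listening on port 50070
    curl -s 'http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'
    # LiveNodes comes back as a JSON-encoded string keyed by datanode, so parse it defensively.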
On 09-06-2013 11:30, Rita wrote:
Are there any specs for the JSON schema?
On Thu, Jun 6, 2013 at 9:49 AM, MARCOS MEDRADO RUBINELLI
marc...@buscapecompany.com
Matthias,
As far as I know, there are no guarantees on when counters will be updated
during the job. One thing you can do is write a metadata file alongside your parsed
events, listing which files have errors and should be ignored in the next step of
your ETL workflow.
If you really don't
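A minimal sketch of that metadata-file idea, with made-up HDFS paths and file names (nothing below comes from the thread):

    # After the parse step, publish the list of inputs that had errors next to the output
    hadoop fs -put bad_files.txt /data/events/2013-06-06/_errors
    # The next step of the workflow reads that list and skips anything named in it
    hadoop fs -cat /data/events/2013-06-06/_errors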
We can adjust this according to our requirements to fine-tune our cluster.
That is my thought.
On Mon, Apr 15, 2013 at 4:40 PM, MARCOS MEDRADO RUBINELLI
marc...@buscapecompany.com wrote:
Hi,
I am currently tuning a cluster, and I haven't found much information
Tadas,
Hadoop Operations has pretty useful, up-to-date information. The chapter on
hardware selection is available here:
http://my.safaribooksonline.com/book/databases/hadoop/9781449327279/4dot-planning-a-hadoop-cluster/id2760689
Regards,
Marcos
On 16-04-2013 07:13, Tadas Makčinskas wrote:
Hi,
I am currently tuning a cluster, and I haven't found much information on
what factors to consider while adjusting the heap size of tasktrackers.
Is it a direct multiple of the number of map+reduce slots? Is there
anything else I should consider?
Thank you,
Marcos
It's a limitation of jps: it will only show processes run as the current user.
Try using sudo -u hdfs jps. This also means that you will have to sudo as the
appropriate user to run the stop/start scripts.
Regards,
Marcos
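For example, assuming the daemons run as the hdfs and mapred users (typical of packaged installs, but not guaranteed):

    jps                  # as your own user: only your JVMs show up
    sudo -u hdfs jps     # shows the namenode/datanode JVMs owned by hdfs
    sudo -u mapred jps   # likewise for the jobtracker/tasktracker, if they run as mapred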
On 11-04-2013 06:26,
dfs.namenode.handler.count and dfs.datanode.handler.count control how many
concurrent threads each server uses to handle incoming requests. The
default values should be fine for smaller clusters, but if you have a lot of
simultaneous HDFS operations, you may see performance gains by
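For reference, both settings live in hdfs-site.xml. A quick way to see what a node is currently using; the file path assumes a CDH-style layout and the values in the comments are purely illustrative:

    # In hdfs-site.xml, the entries look like:
    #   <property><name>dfs.namenode.handler.count</name><value>64</value></property>
    #   <property><name>dfs.datanode.handler.count</name><value>8</value></property>
    grep -A1 'handler.count' /etc/hadoop/conf/hdfs-site.xml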
On your first call, Hadoop will return a URL pointing to a datanode in the
Location header of the 307 response. On your second call, you have to use that
URL instead of constructing your own. You can see the specific documentation
here:
http://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE
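As a concrete sketch of that two-step CREATE with curl (placeholder hosts and an example file path, not taken from the thread):

    # Step 1: ask the namenode where to write; -i makes the 307 Location header visible
    curl -i -X PUT "http://<namenode>:50070/webhdfs/v1/tmp/example.txt?op=CREATE&user.name=hdfs"
    # Step 2: PUT the data to the exact URL returned in the Location header, e.g.
    curl -i -X PUT -T example.txt "http://<datanode>:50075/webhdfs/v1/tmp/example.txt?op=CREATE&user.name=hdfs"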
Harry,
It doesn't have to be a proper, fully qualified name. Sometimes hosts don't
even have one. It just has to match exactly what `hostname` returns, because
that's what each node will use to identify itself.
Also, only servers that will actually be part of the cluster need to be in
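A quick sanity check on each box; the slaves file path below is only a guess at where your host list lives:

    hostname                                        # what this node will call itself
    grep -n "$(hostname)" /etc/hadoop/conf/slaves   # the entry must match that string exactly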
Gopi,
The namenode is essentially running `bash -c "id -Gn hduser"` and waiting for the
response. You could try running it from a shell yourself to see whether it really
does take a long time on Azure, or whether the output is too complex to parse.
Regards,
Marcos
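You can time the same lookup the namenode performs (hduser is just the user from the original question):

    time bash -c 'id -Gn hduser'
    # If this hangs or takes seconds on the Azure VM, the group lookup itself is the bottleneck;
    # if it returns instantly, the delay is probably elsewhere.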
On 04-04-2013 09:33, Gopi Krishna M wrote:
in case
Zheyi,
To save memory, the jobtracker doesn't keep a reference to the job, but you may
still find it in the filesystem. For a default CDH3 installation, it will be in
the jobtracker's local filesystem, at /var/log/hadoop-0.20/history/done/
Logs from individual tasks are a little trickier to
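For example, to dig a specific job out of that directory (the job ID below is made up):

    # CDH3 default location from above; adjust the ID to the job you're after
    find /var/log/hadoop-0.20/history/done/ -name '*job_201304041200_0042*'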
Felix,
After changing hdfs-site.xml, did you run hadoop dfsadmin -refreshNodes? That
should have been enough, but you can also try increasing the replication factor of
these files, waiting for them to be replicated to the new nodes, and then setting it
back to its original value.
Cheers,
Marcos
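Spelled out as commands, with a hypothetical path and replication factors (3 is assumed to be the original value):

    hadoop dfsadmin -refreshNodes
    # Temporarily raise replication so blocks get copied onto the new nodes...
    hadoop fs -setrep -w 5 /data/important
    # ...then drop it back once the extra copies exist
    hadoop fs -setrep -w 3 /data/important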
In
You can set the mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum properties in your
mapred-site.xml file, but you may also want to check your current
mapred.child.java.opts and mapred.child.ulimit values to make sure they
aren't overriding the 4GB you set
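A quick way to review those values on a tasktracker; the config path assumes a CDH-style layout and nothing here is a recommendation:

    # Slots per tasktracker, plus the per-task JVM/ulimit settings that could undercut your 4GB
    grep -A1 -E 'tasks.maximum|mapred.child' /etc/hadoop/conf/mapred-site.xml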
find the
processes to stop.
--
Marcos Medrado Rubinelli
Tecnologia - BuscaPé
Tel. +55 11 3848-8700 Ramal 8788
marc...@buscape-inc.com