RE: How can I increase the speed balancing?

2014-09-04 Thread Srikanth upputuri
AFAIK, this setting is meant to throttle the bandwidth used by the balancer so that balancing traffic does not severely impact the performance of running jobs. Increasing this value has an effect only when there is enough available bandwidth on the network. On an already overloaded
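A minimal sketch of adjusting this throttle at runtime (assuming a Hadoop 2.x client; the namenode URI below is a placeholder): DistributedFileSystem.setBalancerBandwidth is the programmatic counterpart of `hdfs dfsadmin -setBalancerBandwidth` and of the dfs.datanode.balance.bandwidthPerSec property.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SetBalancerBandwidth {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder URI: replace with your namenode address.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Caps how many bytes/sec each datanode may spend on balancing traffic.
        if (fs instanceof DistributedFileSystem) {
            ((DistributedFileSystem) fs).setBalancerBandwidth(10L * 1024 * 1024); // 10 MB/s
        }
        fs.close();
    }
}
```

Per the point above, raising this cap only speeds balancing if the network actually has spare capacity.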

Hadoop and Open Data (CKAN.org).

2014-09-04 Thread Henrik Aagaard Jørgensen
Dear all, I'm very new to Hadoop as I'm still trying to grasp its value and purpose. I do hope my question on this mailing list is OK. I manage our open data platform at our municipality, using CKAN.org. It works very well for its purpose of showing data and adding APIs to data. However,

RE: question about matching java API with libHDFS

2014-09-04 Thread Liu, Yi A
You can refer to the header file “src/main/native/libhdfs/hdfs.h” to get the APIs in detail. Regards, Yi Liu From: Demai Ni [mailto:nid...@gmail.com] Sent: Thursday, September 04, 2014 5:21 AM To: user@hadoop.apache.org Subject: question about matching java API with libHDFS hi,
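For illustration (a minimal sketch, not from the thread; the file path is a placeholder): the core calls in hdfs.h are thin wrappers over the Java FileSystem API, so the Java side of a typical open/read sequence looks like this, with the rough libhdfs counterparts noted in comments.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class JavaSideOfLibhdfs {
    public static void main(String[] args) throws Exception {
        // hdfsConnect() roughly maps to FileSystem.get().
        FileSystem fs = FileSystem.get(new Configuration());

        // hdfsOpenFile() roughly maps to FileSystem.open().
        FSDataInputStream in = fs.open(new Path("/tmp/example.txt"));

        // hdfsRead() roughly maps to InputStream.read().
        byte[] buf = new byte[4096];
        int n = in.read(buf);
        System.out.println("read " + n + " bytes");

        in.close();   // hdfsCloseFile()
        fs.close();   // hdfsDisconnect()
    }
}
```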

RE: HDFS balance

2014-09-04 Thread Jamal B
Yes. We do it all the time. The node you move this cron job to only needs to have the Hadoop environment set up and proper connectivity to the cluster it is writing to. On Sep 3, 2014 10:51 AM, John Lilley john.lil...@redpoint.net wrote: Can you run the load from an edge node
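As a hedged sketch of what "only needs the Hadoop environment set up" amounts to in code (the namenode URI and file paths are placeholders): a client on an edge node just points fs.defaultFS at the cluster and writes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EdgeNodeLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // On an edge node this usually comes from core-site.xml;
        // set it explicitly here for illustration.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem fs = FileSystem.get(conf);
        // Push a local file into the cluster, as a cron-driven load job might.
        fs.copyFromLocalFile(new Path("/data/export/today.csv"),
                             new Path("/ingest/today.csv"));
        fs.close();
    }
}
```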

Re: Hadoop and Open Data (CKAN.org).

2014-09-04 Thread Alec Ten Harmsel
I would recommend using Hadoop only if you are ingesting a lot of data and you need reasonable performance at scale. I would recommend starting with [insert language/tool of choice] to ingest and transform data until that process starts taking too long. For example, one of our researchers at

Re: Hadoop and Open Data (CKAN.org).

2014-09-04 Thread Mohan Radhakrishnan
I understand that coding MR jobs in a programming language is sometimes required, but if we are just processing large amounts of data (machine learning, for example) we could use Pig. I recently processed 0.25 TB on AWS clusters in a reasonably short time. In this case the development effort is much lower. Thanks,

Re: Datanode can not start with error Error creating plugin: org.apache.hadoop.metrics2.sink.FileSink

2014-09-04 Thread Rich Haase
The reason you can't launch your datanode is: 2014-09-04 10:20:01,677 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain java.net.BindException: Port in use: 0.0.0.0:50075 It appears that you already have a datanode instance listening on port

Re: question about matching java API with libHDFS

2014-09-04 Thread Demai Ni
hi, Yi A, Thanks for your response. I took a look at hdfs.h and hdfs.c, and it seems the lib only exposes some of the APIs; there are a lot of other public methods that can be accessed through the Java API/client but are not implemented in libhdfs, such as the one I am using now: DFSClient.getNamenode().
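For context, a minimal sketch of the kind of call that goes beyond what hdfs.h exposes (hedged: DFSClient is a private, version-dependent HDFS internal, and the namenode URI is a placeholder):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.HdfsConstants;

public class NamenodeProtocolExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DFSClient client = new DFSClient(URI.create("hdfs://namenode:8020"), conf);

        // Talk to the namenode protocol directly -- not reachable through libhdfs.
        ClientProtocol namenode = client.getNamenode();
        DatanodeInfo[] live =
            namenode.getDatanodeReport(HdfsConstants.DatanodeReportType.LIVE);
        System.out.println("live datanodes: " + live.length);

        client.close();
    }
}
```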

[ANN] Multireducers - run multiple reducers on the same mapreduce job

2014-09-04 Thread Elazar Leibovich
I'd appreciate reviews of the code and the API of multireducers - a way to run several map and reduce classes in the same MapReduce job. Thanks, https://github.com/elazarl/multireducers Usage example: MultiJob.create(). withMapper(SelectFirstField.class, Text.class,
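The usage snippet above is cut off, so rather than guess at the multireducers API, here is a hedged sketch of the closest stock-Hadoop technique, MultipleOutputs, which routes a single job's reduce output to several named outputs (class name, output names, and the routing rule are illustrative):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class SplitReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private MultipleOutputs<Text, IntWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // Route records to different named outputs from one reduce pass.
        String out = key.toString().startsWith("a") ? "first" : "second";
        mos.write(out, key, new IntWritable(sum));
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }

    // Driver-side registration of the named outputs:
    public static void register(Job job) {
        MultipleOutputs.addNamedOutput(job, "first", TextOutputFormat.class,
                Text.class, IntWritable.class);
        MultipleOutputs.addNamedOutput(job, "second", TextOutputFormat.class,
                Text.class, IntWritable.class);
    }
}
```

Unlike multireducers, this still runs a single reduce class; it only splits where the output lands, which may or may not cover the announcement's use case.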

Re: Need some tutorials for Mapreduce written in Python

2014-09-04 Thread Andrew Ehrlich
Also, when you look at examples, pay attention to the Hadoop version. The Java API has changed a bit, which can be confusing. On Aug 28, 2014, at 10:10 AM, Amar Singh amarsingh...@gmail.com wrote: Thank you to everyone who responded to this thread. I got a couple of good moves and got some good
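A quick illustration of the version point (a minimal word-count-style mapper, hedged as a sketch): the package move from org.apache.hadoop.mapred (old API) to org.apache.hadoop.mapreduce (new API) changed the Mapper signature, so old and new examples are not interchangeable.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// New API (org.apache.hadoop.mapreduce): Mapper is a class with a Context.
public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                context.write(new Text(token), ONE);
            }
        }
    }
}

// Old API (org.apache.hadoop.mapred): an interface whose map() takes an
// OutputCollector and Reporter instead of a Context -- examples written
// against it look like this and will not compile with the new API:
// public class OldTokenMapper extends MapReduceBase
//         implements Mapper<LongWritable, Text, Text, IntWritable> {
//     public void map(LongWritable key, Text value,
//                     OutputCollector<Text, IntWritable> out, Reporter r) { ... }
// }
```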