Re: Re: some ideas for QJM and NFS

2013-02-18 Thread 谢良
IMHO, deploying the JN and DN on the same nodes is not a recommended deployment. Regards, Liang From: Azuryy Yu [azury...@gmail.com] Sent: February 18, 2013, 15:56 To: user@hadoop.apache.org Subject: Re: Re: some ideas for QJM and NFS All JNs are deployed on the same node with

Re: Re: Re: some ideas for QJM and NFS

2013-02-18 Thread Azuryy Yu
I don't think this is an issue. QJM talks to the JNs using RPC, and the default handler count was enough for both DN and JN in my testing. On Mon, Feb 18, 2013 at 4:02 PM, 谢良 xieli...@xiaomi.com wrote: IMHO, deploying the JN and DN on the same nodes is not a recommended deployment. Regards, Liang

Re: Re: Re: some ideas for QJM and NFS

2013-02-18 Thread 谢良
But it will contend for I/O resources. When journal(...) is invoked on the JN, there is a flush(shouldFsync) operation that relies on Java's FileChannel.force(...) method, which in turn becomes an f(data)sync system call. So it's better to deploy the JN on different nodes (e.g. SAS+RAID?). Regards, Liang
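
A minimal sketch of the force-to-fdatasync path Liang describes (the file name and class are illustrative placeholders, not JournalNode source; JDK 6-era NIO only):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class EditLogFlushSketch {
    public static void main(String[] args) throws IOException {
        RandomAccessFile raf = new RandomAccessFile("edits.log", "rw");
        FileChannel ch = raf.getChannel();
        try {
            ch.position(ch.size());                                // append
            ch.write(ByteBuffer.wrap("txn-0001".getBytes("UTF-8")));
            // force(false) flushes file data (not metadata); on Linux it maps to
            // fdatasync(2) and blocks until the disk acknowledges the write.
            // This is the contention point when a JN shares spindles with a DN.
            ch.force(false);
        } finally {
            ch.close();
            raf.close();
        }
    }
}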

Re: product recommendations engine

2013-02-18 Thread Sofia Georgiakaki
Hello Douglass, you could take a look at the Mahout and Myrrix projects. These are two projects that provide implementations of recommendation machine learning algorithms. There are MapReduce implementations as well, to support massive datasets. In addition, these systems provide client
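
For a flavor of what Mahout offers, here is a minimal single-machine sketch using its Taste recommender API (the input file, user ID, and neighborhood size are made-up placeholders; the MapReduce variants Sofia mentions use different entry points):

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class RecommenderSketch {
    public static void main(String[] args) throws Exception {
        // ratings.csv holds "userID,itemID,preference" lines (hypothetical input).
        DataModel model = new FileDataModel(new File("ratings.csv"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Top 5 recommended items for user 1.
        List<RecommendedItem> recs = recommender.recommend(1, 5);
        for (RecommendedItem item : recs) {
            System.out.println(item.getItemID() + " " + item.getValue());
        }
    }
}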

Re: why my test result on dfs short circuit read is slower?

2013-02-18 Thread Michael Segel
On Feb 17, 2013, at 7:09 PM, Liu, Raymond raymond@intel.com wrote: io.file.buffer.size: drop this down to 64KB, not 128KB. You have 16 CPUs, which really means 8 cores, and 4 disks. Do you have Ganglia up and running? I'll wager that you'll see a lot of CPU wait cycles in both cases.
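
To try Michael's suggestion per-job rather than cluster-wide in core-site.xml, a minimal sketch using the standard Hadoop Configuration API (the value is his suggested 64KB; whether it helps depends on the workload):

import org.apache.hadoop.conf.Configuration;

public class BufferSizeSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // 64KB instead of the 128KB used in the test; for the whole cluster this
        // would normally be set in core-site.xml instead.
        conf.setInt("io.file.buffer.size", 64 * 1024);
        System.out.println(conf.getInt("io.file.buffer.size", -1));
    }
}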

Database insertion by Hadoop

2013-02-18 Thread Masoud
Dear All, We are going to run the experiments for a scientific paper. We must insert data into our database for later analysis: almost 300 tables, each with 2,000,000 records. As you know, it takes a lot of time to do this on a single machine, so we are going to use our Hadoop cluster (32

Re: Database insertion by Hadoop

2013-02-18 Thread Mohammad Tariq
Hello Masoud, You can use the Bulk Load feature. You might find it more efficient than the normal client APIs or using TableOutputFormat. The bulk load feature uses a MapReduce job to output table data in HBase's internal data format, and then directly loads the generated StoreFiles
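
A minimal sketch of the job setup for the bulk-load path Tariq describes (assuming HBase 0.94-era APIs; the table name and the mapper are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "bulk-load-prepare");
        job.setJarByClass(BulkLoadSketch.class);
        // A mapper (not shown) parses the input and emits row keys plus Puts.
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);

        // Wires in the TotalOrderPartitioner and reducers so the emitted HFiles
        // (HBase's internal StoreFile format) line up with region boundaries.
        HTable table = new HTable(conf, "my_table");
        HFileOutputFormat.configureIncrementalLoad(job, table);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean ok = job.waitForCompletion(true);
        // Then load the HFiles directly into the regionservers:
        //   hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <outdir> my_table
        System.exit(ok ? 0 : 1);
    }
}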

Re: Database insertion by Hadoop

2013-02-18 Thread Hemanth Yamijala
What database is this? Was HBase mentioned? On Monday, February 18, 2013, Mohammad Tariq wrote: Hello Masoud, You can use the Bulk Load feature. You might find it more efficient than the normal client APIs or using TableOutputFormat. The bulk load feature uses a MapReduce job

Re: Database insertion by Hadoop

2013-02-18 Thread Michael Segel
Nope, HBase wasn't mentioned. The OP could be talking about using external tables and Hive. The OP could still be stuck in the RDBMS world and hasn't flattened his data yet. 2 million records? Kinda small, dontcha think? Not Enough Information ... On Feb 18, 2013, at 8:58 AM, Hemanth

Piping output of hadoop command

2013-02-18 Thread Julian Wissmann
Hi, we're running a Hadoop cluster with HBase for the purpose of evaluating it as a database for a research project, and we've more or less decided to go with it. So now I'm exploring backup mechanisms and have decided to experiment with Hadoop's export functionality for that. What I am trying to

Re: Piping output of hadoop command

2013-02-18 Thread Jean-Marc Spaggiari
Hi Julian, I think it's not outputting on standard output but on standard error. You might want to test this: hadoop fs -copyToLocal FILE_IN_HDFS 2>&1 | ssh REMOTE_HOST dd of=FILE_ON_REMOTE_HOST, which will redirect stderr to stdout too. Not sure, but it might be your issue. JM

Using NFS mounted volume for Hadoop installation/configuration

2013-02-18 Thread Mehmet Belgin
Hi Everyone, Will it be a problem if I put the Hadoop executables and configuration on an NFS volume that is shared by all masters and slaves? This way configuration changes will be available to all nodes, without any need for syncing files. While this looks almost like a no-brainer,

Re: Piping output of hadoop command

2013-02-18 Thread Harsh J
Hi, The command you're looking for is not -copyToLocal (it doesn't really emit the file, which you seem to need here), but rather a simple -cat: Something like the below would make your command work: $ hadoop fs -cat FILE_IN_HDFS | ssh REMOTE_HOST dd of=TARGET_FILE On Mon, Feb 18, 2013 at

Re: Using NFS mounted volume for Hadoop installation/configuration

2013-02-18 Thread Chris Embree
I'm doing that currently. No problems to report so far. The only pitfall I've found is around NFS stability. If your NAS is 100% solid, no problems. I've seen mtab get messed up and refuse to remount if NFS has any hiccups. If you want to get really crazy, consider NFS for your datanode root fs.

Re: Using NFS mounted volume for Hadoop installation/configuration

2013-02-18 Thread Nan Zhu
I'm also maintaining an experimental Hadoop cluster, and I need to modify the Hadoop source code and test it, so I just use NFS to deploy the latest version of the code; no problems found yet. Best, -- Nan Zhu School of Computer Science, McGill University On Monday, 18 February, 2013 at 1:09

Re: Using NFS mounted volume for Hadoop installation/configuration

2013-02-18 Thread Chen He
Using NFS as the datanode fs may bring performance problems. Millions of requests may block your NFS server. On Mon, Feb 18, 2013 at 12:09 PM, Chris Embree cemb...@gmail.com wrote: I'm doing that currently. No problems to report so far. The only pitfall I've found is around NFS stability. If

Re: Using NFS mounted volume for Hadoop installation/configuration

2013-02-18 Thread Mehmet Belgin
It looks like NFS stability and performance are the two main concerns. Since my cluster is still experimental, I will continue to use NFS for now. In the future, when we have a larger production cluster, I will consider local configurations. Thank you all for your replies! -Mehmet On Feb

Re: Using NFS mounted volume for Hadoop installation/configuration

2013-02-18 Thread Chen He
Cloudera Manager or Zettaset could be an option if you like easy configuration. This type of software will do the rsync for you. On Mon, Feb 18, 2013 at 12:53 PM, Mehmet Belgin mehmet.bel...@oit.gatech.edu wrote: It looks like NFS stability and performance are the two main concerns. Since my

Re: product recommendations engine

2013-02-18 Thread Douglass Davis
Ok, thanks. Myrrix looks like it has much of the set-up work done, so I am taking a closer look at that. On Mon, Feb 18, 2013 at 4:00 AM, Sofia Georgiakaki geosofie_...@yahoo.com wrote: Hello Douglass, you could take a look at Mahout and Myrrix projects. These are two projects that provide

Re: Using NFS mounted volume for Hadoop installation/configuration

2013-02-18 Thread Chris Embree
Just for clarification, we only use NFS for binaries and config files. HDFS and MapReduce write to local disk. We just don't install an OS there. :) On Mon, Feb 18, 2013 at 1:44 PM, Paul Wilkinson paul.m.wilkin...@gmail.com wrote: That requirement for 100% availability is the issue. If NFS goes

Re: Can I perfrom a MR on my local filesystem

2013-02-18 Thread Sandy Ryza
Hi Nikhil, The jobtracker doesn't do any deployment of other daemons. They are expected to be installed and started on other nodes separately. If I understand your question more broadly, MR doesn't necessarily run its map and reduce tasks on the nodes that contain the data. All data is read

Namenode formatting problem

2013-02-18 Thread Keith Wiley
This is Hadoop 2.0. Formatting the namenode produces no errors in the shell, but the log shows this: 2013-02-18 22:19:46,961 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join java.net.BindException: Problem binding to [ip-13-0-177-110:9212]

Re: Namenode formatting problem

2013-02-18 Thread Azuryy Yu
Because the journal nodes are also formatted during NN format, you need to start all the JN daemons first. On Feb 19, 2013 7:01 AM, Keith Wiley kwi...@keithwiley.com wrote: This is Hadoop 2.0. Formatting the namenode produces no errors in the shell, but the log shows this: 2013-02-18

Re: Database insertion by Hadoop

2013-02-18 Thread Masoud
Hello Tariq, Our database is SQL Server 2008, and we don't need to develop a professional app; we just need to build it fast and get our experimental results soon. Thanks On 02/18/2013 11:58 PM, Hemanth Yamijala wrote: What database is this? Was HBase mentioned? On Monday, February 18,
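
Since the target is a JDBC-reachable SQL Server rather than HBase, one option is Hadoop's own DBOutputFormat, which lets each map task insert rows over JDBC. A minimal sketch; the driver class, connection URL, table, and column names are all placeholders for Masoud's setup, and one Record class per table schema would be needed:

import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class DbInsertSketch {
    // Row type for one hypothetical two-column target table.
    public static class Record implements DBWritable {
        private long id;
        private String payload;
        public Record() {}
        public Record(long id, String payload) { this.id = id; this.payload = payload; }
        public void write(PreparedStatement stmt) throws SQLException {
            stmt.setLong(1, id);
            stmt.setString(2, payload);
        }
        public void readFields(ResultSet rs) throws SQLException {
            id = rs.getLong(1);
            payload = rs.getString(2);
        }
    }

    public static class InsertMapper
            extends Mapper<LongWritable, Text, Record, NullWritable> {
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Assume one "id<TAB>payload" record per input line.
            String[] f = value.toString().split("\t", 2);
            ctx.write(new Record(Long.parseLong(f[0]), f[1]), NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DBConfiguration.configureDB(conf,
                "com.microsoft.sqlserver.jdbc.SQLServerDriver",    // JDBC driver on the classpath
                "jdbc:sqlserver://dbhost:1433;databaseName=exp");  // hypothetical URL
        Job job = new Job(conf, "db-insert");
        job.setJarByClass(DbInsertSketch.class);
        job.setMapperClass(InsertMapper.class);
        job.setNumReduceTasks(0);  // map-only: each mapper opens its own JDBC connection
        job.setOutputKeyClass(Record.class);
        job.setOutputValueClass(NullWritable.class);
        job.setOutputFormatClass(DBOutputFormat.class);
        DBOutputFormat.setOutput(job, "my_table", "id", "payload");  // hypothetical table/columns
        FileInputFormat.addInputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}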

Re: Contribute to Hadoop Community

2013-02-18 Thread Alexander Alten-Lorenz
Hey, Thank you for the offer. Please open an account at https://issues.apache.org/jira/ and file a JIRA about your work, attach the patches, and describe what changes you've made. To have the patches you've submitted reviewed, please open an account at reviews.apache.org and open a review request for your

Re: Contribute to Hadoop Community

2013-02-18 Thread Varsha Raveendran
Thank you very much for the quick response. On Tue, Feb 19, 2013 at 12:12 PM, Alexander Alten-Lorenz wget.n...@gmail.com wrote: Hey, Thank you for the offer. Please open an account at https://issues.apache.org/jira/ and file a JIRA about your work, attach the patches, and describe what changes

Re: building from subversion repository

2013-02-18 Thread George R Goffe
Harsh, Thanks for your response. I have implemented your suggestions and have met with great success... sort of. Now I'm trying to figure out my build environment. I'm running into errors. Perhaps I can ask you further questions? Maybe file more bug reports, unless of course the errors are