Dear All,
We are going to run the experiments for a scientific paper.
We must insert data into our database for later analysis; there are almost
300 tables, each with 2,000,000 records.
As you know, it takes a lot of time to do this with a single machine,
so we are going to use our Hadoop cluster (32
IMHO, it's not a recommended practice to deploy JN and DN on the same nodes.
Regards,
Liang
From: Azuryy Yu [azury...@gmail.com]
Sent: February 18, 2013 15:56
To: user@hadoop.apache.org
Subject: Re: Reply: some ideas for QJM and NFS
All JNs are deployed on the same node with
I don't think this is an issue. QJM talks to the JNs over RPC, and the default
number of RPC handlers is enough for both DN and JN in my testing.
On Mon, Feb 18, 2013 at 4:02 PM, 谢良 xieli...@xiaomi.com wrote:
IMHO, it's not a recommended practice to deploy JN and DN on the same
nodes.
Regards,
Liang
But it'll contend for I/O resources: when journal(...) is invoked on the JN, there
is a flush (shouldFsync) operation that relies on Java's
FileChannel.force(...) method, which in turn becomes an f(data)sync system call.
So it's better to deploy the JNs on different nodes (e.g. SAS + RAID?).
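As an illustration of that flush path, here is a minimal Java sketch (not the actual JournalNode code; the class name, file path and method are made up) showing how a durable append ends in FileChannel.force(), i.e. an fsync-class call on the edits disk:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    // Minimal sketch, not JournalNode code: every durable append ends in
    // FileChannel.force(), i.e. an fsync/fdatasync on the edits disk, which
    // is why a co-located DN competes with the JN for the same spindles.
    public class DurableAppend {
        public static void append(String path, byte[] edit) throws IOException {
            try (FileChannel ch = FileChannel.open(Paths.get(path),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                    StandardOpenOption.APPEND)) {
                ch.write(ByteBuffer.wrap(edit));
                // force(false) flushes file data (roughly fdatasync);
                // force(true) would also flush file metadata (roughly fsync).
                ch.force(false);
            }
        }
    }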
Regards,
Liang
Hello Douglass,
you could take a look at the Mahout and Myrrix projects. These are two projects
that provide implementations of recommendation machine-learning algorithms.
There are MapReduce implementations as well, to support massive datasets.
In addition, these systems provide client
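For a flavor of what these libraries expose, here is a rough sketch against Mahout's single-machine "Taste" recommender API (the ratings file and user ID are hypothetical; the MapReduce-based recommenders for massive datasets are driven differently):

    import java.io.File;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    // Minimal single-machine recommender using Mahout's Taste API;
    // ratings.csv (userID,itemID,preference per line) is a hypothetical file.
    public class TasteExample {
        public static void main(String[] args) throws Exception {
            DataModel model = new FileDataModel(new File("ratings.csv"));
            UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
            Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
            // Top 5 recommendations for (hypothetical) user 1.
            for (RecommendedItem item : recommender.recommend(1L, 5)) {
                System.out.println(item.getItemID() + " -> " + item.getValue());
            }
        }
    }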
On Feb 17, 2013, at 7:09 PM, Liu, Raymond raymond@intel.com wrote:
io.file.buffer.size
Drop this down to 64 KB, not 128 KB.
You have 16 CPUs, which really means 8 cores, and 4 disks.
Do you have Ganglia up and running?
I'll wager that you'll see a lot of CPU wait cycles in both cases.
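For reference, a hypothetical client-side override of that property in Java looks like this; the same value is normally set once in core-site.xml:

    import org.apache.hadoop.conf.Configuration;

    public class BufferSizeExample {
        public static void main(String[] args) {
            // Hypothetical override of io.file.buffer.size on the client side;
            // the usual place for this is core-site.xml, picked up by all clients.
            Configuration conf = new Configuration();
            conf.setInt("io.file.buffer.size", 64 * 1024); // 64 KB rather than 128 KB
            System.out.println(conf.getInt("io.file.buffer.size", -1));
        }
    }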
Hello Masoud,
You can use the Bulk Load feature. You might find it more
efficient than normal client APIs or using the TableOutputFormat.
The bulk load feature uses a MapReduce job to output table data
in HBase's internal data format, and then directly loads the
generated StoreFiles
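Roughly, the prepare side of such a bulk-load job could look like the sketch below, written against a 1.x-style HBase client API; the table name, column family, input format and paths are placeholders:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Sketch of the "prepare" half of a bulk load: a MapReduce job writes
    // HFiles (HBase's internal format), which are afterwards handed to the
    // running cluster with the completebulkload tool instead of going
    // through the normal write path.
    public class BulkLoadPrepare {

        // Hypothetical input: one "rowkey,value" pair per text line.
        public static class CsvToPutMapper
                extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String[] parts = value.toString().split(",", 2);
                byte[] row = Bytes.toBytes(parts[0]);
                Put put = new Put(row);
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(parts[1]));
                ctx.write(new ImmutableBytesWritable(row), put);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = Job.getInstance(conf, "bulk-load-prepare");
            job.setJarByClass(BulkLoadPrepare.class);
            job.setMapperClass(CsvToPutMapper.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(Put.class);
            FileInputFormat.addInputPath(job, new Path("/tmp/my_table_csv"));      // placeholder
            FileOutputFormat.setOutputPath(job, new Path("/tmp/my_table_hfiles")); // placeholder
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("my_table"));
                 RegionLocator locator = conn.getRegionLocator(TableName.valueOf("my_table"))) {
                // Sets the reducer, partitioner and sort order so that the output
                // HFiles line up with the table's current region boundaries.
                HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
            }
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }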
What database is this? Was HBase mentioned?
On Monday, February 18, 2013, Mohammad Tariq wrote:
Hello Masoud,
You can use the Bulk Load feature. You might find it more
efficient than normal client APIs or using the TableOutputFormat.
The bulk load feature uses a MapReduce job
Nope, HBase wasn't mentioned.
The OP could be talking about using external tables and Hive.
The OP could still be stuck in the RDBMS world and hasn't flattened his data
yet.
2 million records? Kinda small dontcha think?
Not Enough Information ...
On Feb 18, 2013, at 8:58 AM, Hemanth
Hi,
we're running a Hadoop cluster with HBase for the purpose of
evaluating it as a database for a research project, and we've more or
less decided to go with it.
So now I'm exploring backup mechanisms and have decided to experiment
with Hadoop's export functionality for that.
What I am trying to
Hi Julian,
I think it's not outputting on the standard output but on the error one.
You might want to test that:
hadoop fs -copyToLocal FILE_IN_HDFS 2>&1 | ssh REMOTE_HOST dd of=FILE_ON_REMOTE_HOST
which will redirect the stderr to the stdout too.
Not sure, but it might be your issue.
JM
Hi Everyone,
Will there be any problem if I put the Hadoop executables and configuration on an
NFS volume, which is shared by all masters and slaves? This way the
configuration changes will be available to all nodes, without the need for
syncing any files. While this looks almost like a no-brainer,
Hi,
The command you're looking for is not -copyToLocal (it doesn't really
emit the file, which you seem to need here), but rather a simple -cat:
Something like the below would make your command work:
$ hadoop fs -cat FILE_IN_HDFS | ssh REMOTE_HOST dd of=TARGET_FILE
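If you ever need the same thing from Java rather than the shell, a rough equivalent using the FileSystem API would be (path taken from the command line; the consumer here is just stdout):

    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    // Rough Java equivalent of "hadoop fs -cat FILE | <consumer>": open the
    // HDFS file and stream it to any OutputStream (stdout here).
    public class HdfsCat {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            try (FSDataInputStream in = fs.open(new Path(args[0]))) {
                OutputStream out = System.out;
                IOUtils.copyBytes(in, out, conf, false); // false: don't close the streams
            }
        }
    }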
On Mon, Feb 18, 2013 at
I'm doing that currently. No problems to report so far.
The only pitfall I've found is around NFS stability. If your NAS is 100%
solid, no problems. I've seen mtab get messed up and refuse to remount if
NFS has any hiccups.
If you want to get really crazy, consider NFS for your datanode root fs.
I'm also maintaining an experimental Hadoop cluster, and I need to modify the
Hadoop source code and test it,
so I just use NFS to deploy the latest version of the code; no problems found yet.
Best,
--
Nan Zhu
School of Computer Science,
McGill University
On Monday, 18 February, 2013 at 1:09
Using NFS as the datanode fs may bring performance problems. Millions of
requests may block your NFS server.
On Mon, Feb 18, 2013 at 12:09 PM, Chris Embree cemb...@gmail.com wrote:
I'm doing that currently. No problems to report so far.
The only pitfall I've found is around NFS stability. If
It looks like NFS stability and performance are the two main concerns. Since my
cluster is still experimental, I will continue to use NFS for now. In the
future, when we have a larger production cluster, I will consider local
configurations.
Thank you all for your replies!
-Mehmet
On Feb
Cloudera Manager or Zettaset can be a choice if you like easy
configuration. This type of software will do the rsync for you.
On Mon, Feb 18, 2013 at 12:53 PM, Mehmet Belgin
mehmet.bel...@oit.gatech.edu wrote:
It looks like the NFS stability and performance are two main concerns.
Since my
Ok thanks. Myrrix looks like it has much of the set-up work done so I am
taking a closer look at that.
On Mon, Feb 18, 2013 at 4:00 AM, Sofia Georgiakaki
geosofie_...@yahoo.com wrote:
Hello Douglass,
you could take a look at Mahout and Myrrix projects. These are two
projects that provide
Just for clarification, we only use NFS for binaries and config files.
HDFS and MapReduce write to local disk. We just don't install an OS there.
:)
On Mon, Feb 18, 2013 at 1:44 PM, Paul Wilkinson
paul.m.wilkin...@gmail.com wrote:
That requirement for 100% availability is the issue. If NFS goes
Hi Nikhil,
The jobtracker doesn't do any deployment of other daemons. They are
expected to be installed and started on other nodes separately.
If I understand your question more broadly, MR doesn't necessarily run its
map and reduce tasks on the nodes that contain the data. All data is read
This is Hadoop 2.0. Formatting the namenode produces no errors in the shell,
but the log shows this:
2013-02-18 22:19:46,961 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode:
Exception in namenode join
java.net.BindException: Problem binding to [ip-13-0-177-110:9212]
Because the journal nodes are also formatted during NN format, you need to
start all the JN daemons first.
On Feb 19, 2013 7:01 AM, Keith Wiley kwi...@keithwiley.com wrote:
This is Hadoop 2.0. Formatting the namenode produces no errors in the
shell, but the log shows this:
2013-02-18
Hello Tariq,
Our database is SQL Server 2008,
and we don't need to develop a professional app; we just need to develop
it fast and get our experiment results soon.
Thanks
On 02/18/2013 11:58 PM, Hemanth Yamijala wrote:
What database is this? Was HBase mentioned?
On Monday, February 18,
Hey,
Thank you for the offer.
Please open an account at https://issues.apache.org/jira/ and file a JIRA about
your work, attach the patches, and describe what changes you've made. To let us
review the patches you've submitted, please open an account at reviews.apache.org
and open a review request for your
Thank you very much for a quick response.
On Tue, Feb 19, 2013 at 12:12 PM, Alexander Alten-Lorenz
wget.n...@gmail.com wrote:
Hey,
Thank you for the offer.
Please open an account at https://issues.apache.org/jira/ and file a JIRA
about your work, attach the patches, and describe what changes
Harsh,
Thanks for your response.
I have implemented your suggestions and have met with great success... sort of.
Now I'm trying to figure out my build environment. I'm running into errors.
Perhaps I can ask you further questions? Maybe make more bug reports, unless, of
course, the errors are