streaming but no sorting

2009-04-28 Thread Dmitry Pushkarev
number of reducers to desired. This involves two steps that are highly inefficient for this task: sorting and fetching. Is there a way to get around that? Ideally I'd want all mapper outputs to be written to one file, one record per line. Thanks. --- Dmitry Pushkarev +1-650-644-8988
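
For context, a minimal hedged sketch of the usual workaround: with zero reducers the sort/shuffle phase is skipped entirely and each mapper writes its output straight to HDFS as its own part-NNNNN file; for streaming, the same setting can be passed as mapred.reduce.tasks=0 via -jobconf or -D, depending on version. Sketched on the 0.18-era mapred API (class and path arguments are illustrative):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MapOnlyJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MapOnlyJob.class);
        conf.setJobName("map-only");
        conf.setNumReduceTasks(0);  // no sort, no shuffle, no reduce
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // IdentityMapper is the default, so records pass through unchanged;
        // call conf.setMapperClass(...) to transform them instead.
        JobClient.runJob(conf);
      }
    }

The result is one output file per mapper rather than a single file; hadoop fs -getmerge can concatenate them into one local file afterwards if a single flat file is really needed.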

RE: CloudBurst: Hadoop for DNA Sequence Analysis

2009-04-08 Thread Dmitry Pushkarev
on DP alignment, whereas navigation in seed space and an N*log(N) sort require only a fraction of that time - that was my experience applying a hadoop cluster to sequencing human genomes. --- Dmitry Pushkarev +1-650-644-8988 -Original Message- From: michael.sch...@gmail.com [mailto:michael.sch

HDD benchmark/checking tool

2009-02-03 Thread Dmitry Pushkarev
everything abnormal for /dev/sdaX. But if you have a better solution I'd appreciate it if you shared it. --- Dmitry Pushkarev +1-650-644-8988

RE: Is Hadoop Suitable for me?

2009-01-28 Thread Dmitry Pushkarev
Definitely not. You should be looking at expandable Ethernet storage that can be extended by connecting additional SAS arrays (like Dell PowerVault and similar offerings from other companies). 600 MB is only about 6 seconds over a gigabit network... --- Dmitry Pushkarev -Original Message- From
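
For reference, the back-of-the-envelope arithmetic behind that 6-second figure, assuming roughly 800 Mbit/s of effective throughput on a gigabit link:

    600 MB x 8 bit/byte = 4800 Mbit
    4800 Mbit / ~800 Mbit/s effective  =  ~6 s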

streaming split sizes

2009-01-20 Thread Dmitry Pushkarev
I'm wondering if there is any way to make hadoop give larger datasets to map jobs? (The trivial way, of course, would be to split the dataset into N files and have it feed one file at a time, but is there any standard solution for this?) Thanks. --- Dmitry Pushkarev +1-650-644-8988
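
For context, one standard knob, sketched on the old JobConf API and hedged to the 0.18/0.19-era property name: FileInputFormat sizes each split as max(minSplitSize, min(goalSize, blockSize)), so raising mapred.min.split.size forces fewer, larger splits and each map task receives a bigger slice of the dataset. The same property can be passed to a streaming job with -jobconf or -D.

    import org.apache.hadoop.mapred.JobConf;

    public class LargeSplits {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Ask for ~512 MB splits instead of one split per 64 MB block.
        conf.setLong("mapred.min.split.size", 512L * 1024 * 1024);
        // ... set mapper, input/output paths, and submit as usual ...
      }
    }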

RE: streaming split sizes

2009-01-20 Thread Dmitry Pushkarev
we're running a very small cluster (15 nodes, 120 cores) specifically built for the task. --- Dmitry Pushkarev +1-650-644-8988 -Original Message- From: Delip Rao [mailto:delip...@gmail.com] Sent: Tuesday, January 20, 2009 6:19 PM To: core-user@hadoop.apache.org Subject: Re: streaming split sizes Hi

Hadoop and Matlab

2008-12-12 Thread Dmitry Pushkarev
Hi. Can anyone share experience of successfully parallelizing matlab tasks using hadoop? We have implemented this with python (in the form of a simple module that takes a serialized function and a data array and runs this function on the cluster), but we really have no clue how to do that in

Hadoop and security.

2008-10-05 Thread Dmitry Pushkarev
Dear hadoop users, I'm lucky to work in an academic environment where information security is not an issue. However, I'm sure that most hadoop users aren't so lucky. Here is the question: how secure is hadoop? (or let's say, how foolproof?) Here is the answer:

hadoop under windows.

2008-10-03 Thread Dmitry Pushkarev
Hi. I have a strange problem with hadoop when I run jobs under windows (my laptop runs XP, but all cluster machines including the namenode run Ubuntu). I run a job (which runs perfectly under linux, and all configs and Java versions are the same), all mappers finish successfully, and so does

jython HBase map/red task

2008-09-17 Thread Dmitry Pushkarev
Hi. I'm writing a mapreduce task in jython, and I can't launch ToolRunner.run; jython says TypeError: integer required on the ToolRunner.run line and I can't get a more detailed explanation. I guess the error is either in ToolRunner or in setConf. What am I doing wrong? :) And can anyone share a
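
For reference, the Java shape ToolRunner expects: the job class implements Tool (usually via Configured), and run(String[]) must return an int exit code, which the Jython version has to mirror as well; a run() that returns None where Java wants an int is one plausible source of an "integer required" TypeError. A minimal hedged sketch (class name illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyJob extends Configured implements Tool {
      public int run(String[] args) throws Exception {
        // build the JobConf and submit the job here
        return 0;                       // must be an int, not null/None
      }

      public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Configuration(), new MyJob(), args);
        System.exit(exitCode);
      }
    }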

RE: Installing Hadoop on OS X 10.5 single node cluster (MacPro) posted to wiki

2008-09-14 Thread Dmitry Pushkarev
Awesome, wish I had it a couple of weeks ago. By the way, can someone give me Jython code that interacts with HBase? I want to learn to write simple mapreducers (for example, to go over all rows in a given column, compute something, and put the result into another column). If it works I promise to write
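
Not Jython and not a mapreducer, but for the row-by-row pattern described, here is a hedged Java sketch of a plain client-side scan/compute/put loop. It is written against the later HBase client API (HTable, Scan, Put, roughly 0.90-era), so the class names will not match the HBase release current at the time, and the table and column names are made up; Jython can call the same classes directly.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ColumnCompute {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");
        byte[] fam = Bytes.toBytes("data");
        byte[] in  = Bytes.toBytes("input");
        byte[] out = Bytes.toBytes("output");

        Scan scan = new Scan();
        scan.addColumn(fam, in);                 // fetch only the input column
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result row : scanner) {
            byte[] value = row.getValue(fam, in);
            if (value == null) continue;
            // "compute something": here simply the length of the stored value
            Put put = new Put(row.getRow());
            put.add(fam, out, Bytes.toBytes(value.length));
            table.put(put);                      // result lands in another column
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }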

RE: HDFS

2008-09-13 Thread Dmitry Pushkarev
Why not use HAR over HDFS? The idea being that if you don't do too much writing, having the files compacted into har archives (which will be stored in 64 MB slices) might be a good answer. Hence the question for the hadoop developers: is hadoop har-aware? In two senses: 1. Whether it tries to assign tasks
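
For context on how a har is addressed: the archive is exposed as a read-only filesystem under the har:// scheme, so a path inside the archive can be handed to FileSystem, or used as a job input path, like any other path. A hedged sketch (the archive path is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadFromHar {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // A file stored inside /user/foo/data.har on the default filesystem.
        Path inHar = new Path("har:///user/foo/data.har/part-00000");
        FileSystem fs = inHar.getFileSystem(conf);   // resolves to the har filesystem
        FileStatus status = fs.getFileStatus(inHar);
        System.out.println(inHar + " has length " + status.getLen());
        FSDataInputStream stream = fs.open(inHar);   // data is read from the underlying HDFS part files
        stream.close();
      }
    }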

RE: namenode multithreaded

2008-09-12 Thread Dmitry Pushkarev
because of a global lock, unfortunately. The other cpus would still be used to some extent by network IO and other threads. Usually we don't see just one cpu at 100% and nothing else on the other cpus. What kind of load do you have? Raghu. Dmitry Pushkarev wrote: Hi. My namenode runs

namenode multithreaded

2008-09-11 Thread Dmitry Pushkarev
Hi. My namenode runs on an 8-core server with lots of RAM, but it only uses one core (100%). Is it possible to tell the namenode to use all available cores? Thanks.

RE: Thinking about retriving DFS metadata from datanodes!!!

2008-09-10 Thread Dmitry Pushkarev
This will effectively ruin the system at large scale, since you will have to update all blocks whenever you play with metadata... -Original Message- From: 叶双明 [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 10, 2008 12:06 AM To: core-user@hadoop.apache.org Subject: Re: Thinking about

number of tasks on a node.

2008-09-09 Thread Dmitry Pushkarev
Hi. How can a node find out how many tasks are being run on it at a given time? I want tasktracker nodes (which are allocated from Amazon EC2) to shut down if nothing has been run for some period of time, but I don't yet see the right way of implementing this.
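
Not from the thread, but one crude way for a node to approximate "how many tasks am I running" is to count the per-task child JVMs locally. The sketch below parses jps output and assumes the old MapReduce child main class appears there as "Child"; the process name, the thresholds, and the shutdown action are all assumptions to verify.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class IdleWatcher {
      static final long POLL_MS = 60L * 1000;            // check once a minute
      static final long IDLE_LIMIT_MS = 30L * 60 * 1000; // 30 idle minutes

      // Count running task JVMs by looking for "Child" entries in jps output.
      static int runningTasks() throws Exception {
        Process p = Runtime.getRuntime().exec("jps");
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        int count = 0;
        String line;
        while ((line = r.readLine()) != null) {
          if (line.endsWith(" Child")) count++;
        }
        r.close();
        p.waitFor();
        return count;
      }

      public static void main(String[] args) throws Exception {
        long idleSince = System.currentTimeMillis();
        while (true) {
          if (runningTasks() > 0) {
            idleSince = System.currentTimeMillis();
          } else if (System.currentTimeMillis() - idleSince > IDLE_LIMIT_MS) {
            System.out.println("Idle for too long; trigger the instance shutdown here.");
            return;
          }
          Thread.sleep(POLL_MS);
        }
      }
    }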

RE: task assignment management.

2008-09-08 Thread Dmitry Pushkarev
on that machine is running slower than others), speculative execution, if enabled, can help a lot. Also, implicitly, faster/better machines get more work than the slower machines. On 9/8/08 3:27 AM, Dmitry Pushkarev [EMAIL PROTECTED] wrote: Dear Hadoop users, Is it possible without
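
For reference, speculative execution is toggled per job; a minimal hedged sketch on the old JobConf API (behaviour as of the 0.18-era releases):

    import org.apache.hadoop.mapred.JobConf;

    public class SpeculativeToggle {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Launch backup attempts of straggling tasks on other nodes; the first
        // attempt to finish wins and the remaining attempts are killed.
        conf.setMapSpeculativeExecution(true);
        conf.setReduceSpeculativeExecution(true);
        // ... configure mapper/reducer, input/output paths, and submit as usual ...
      }
    }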

task assignment management.

2008-09-07 Thread Dmitry Pushkarev
Dear Hadoop users, Is it possible, without using java, to manage task assignment and implement some simple rules? Like: do not launch more than 1 instance of a crawling task on a machine, do not run data-intensive tasks on remote machines, and do not run computationally intensive tasks on

RE: no output from job run on cluster

2008-09-04 Thread Dmitry Pushkarev
Hi, I'd check the java version installed; that was the problem in my case, and surprisingly there was no output from hadoop. If that helps - can you submit a bug report? :) -Original Message- From: Shirley Cohen [mailto:[EMAIL PROTECTED] Sent: Thursday, September 04, 2008 10:07 AM To:

RE: har/unhar utility

2008-09-03 Thread Dmitry Pushkarev
/docs/r0.18.0/hadoop_archives.html On 9/3/08 3:21 PM, Dmitry Pushkarev [EMAIL PROTECTED] wrote: Does anyone have a har/unhar utility? Or at least a format description? It looks pretty obvious, but just in case. Thanks

RE: har/unhar utility

2008-09-03 Thread Dmitry Pushkarev
03, 2008 4:00 AM To: core-user@hadoop.apache.org Subject: Re: har/unhar utility You could create a har archive of the small files and then pass the corresponding har filesystem as input to your mapreduce job. Would that work? On 9/3/08 4:24 PM, Dmitry Pushkarev [EMAIL PROTECTED] wrote

datanodes in virtual networks.

2008-09-01 Thread Dmitry Pushkarev
Dear hadoop users, Our lab is slowly switching from SGE to hadoop; however, not everything seems to be easy and obvious. We are in no way computer scientists: we're just physicists, biologists and a couple of statisticians trying to solve our computational problems, please take this into