Text file character encoding

2008-06-01 Thread NOMURA Yoshihide
Hello, I'm using Hadoop 0.17.0 to analyze a large number of CSV files. I need to read such files in a character encoding other than UTF-8, but I think TextInputFormat doesn't support other character encodings. I guess LineRecordReader class or Text class should support encoding settings li
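
A workaround sometimes used for the situation described above is to treat each line as raw bytes and decode it explicitly with the file's real charset. The sketch below is plain Java with no Hadoop dependency; the class name LineDecoder and the sample data are hypothetical. In a mapper, the byte array and length would presumably come from Text.getBytes() and Text.getLength(), assuming the record reader passes the bytes through unmodified and that the encoding is ASCII-compatible enough for newline splitting to work.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LineDecoder {
    // Decode the first `length` bytes of `buf` using the named charset.
    // Only the first `length` bytes are used because a reused buffer
    // (such as the backing array of a Text object) may be longer than
    // the current record.
    public static String decode(byte[] buf, int length, String charsetName) {
        return new String(buf, 0, length, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Sample CSV line encoded as ISO-8859-1 (an assumed example charset):
        // the character U+00E9 becomes the single byte 0xE9.
        byte[] raw = "caf\u00e9,1".getBytes(StandardCharsets.ISO_8859_1);
        String line = decode(raw, raw.length, "ISO-8859-1");
        System.out.println(line.split(",")[0].length());
    }
}
```

Decoding the same bytes as UTF-8 would turn the 0xE9 byte into a replacement character, which is the kind of corruption the original question is trying to avoid.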

RE: bug on jute?

2008-06-01 Thread Vivek Ratan
Jute spits out "java.util.ArrayList", which is correct. > -Original Message- > From: Runping Qi [mailto:[EMAIL PROTECTED] > Sent: Monday, June 02, 2008 10:38 AM > To: core-user@hadoop.apache.org > Subject: FW: bug on jute? > > > > > > > > From: Fl

FW: bug on jute?

2008-06-01 Thread Runping Qi
From: Flavio Junqueira [mailto:[EMAIL PROTECTED] Sent: Saturday, May 31, 2008 2:27 AM To: [EMAIL PROTECTED] Subject: bug on jute? Hi, I found a small bug on jute, and I was wondering how to proceed with fixing it. The problem is the following. If I decla

Stack Overflow When Running Job

2008-06-01 Thread jkupferman
Hi everyone, I have a job running that keeps failing with Stack Overflows and I really don't see how that is happening. The job runs for about 20-30 minutes before one task errors, then a few more error and it fails. I am running hadoop-17 and I've tried lowering these settings to no avail: io.sort.

Re: Image processing with Hadoop and PHP

2008-06-01 Thread Jim R. Wilson
Hi Sheraz, As the others mentioned, one way to do this is via hadoop-streaming, which allows you to specify any program as the mapper and reducer parts of the MapReduce algorithm implemented by Hadoop. In your case, I'd imagine a solution looking something like this: 1) Collect a batch of images

Re: In memory Map Reduce

2008-06-01 Thread Ted Dunning
Hadoop goes to some lengths to make sure that things can stay in memory as much as possible. There are still cases, however, where intermediate results are normally written to disk. That means that implementors will have those time scales in their head as they do things which will inevitably mak

Re: Image processing with Hadoop and PHP

2008-06-01 Thread Yuri Kudryavcev
Hi. I guess you can use Hadoop Streaming ( http://wiki.apache.org/hadoop/HadoopStreaming), if you'd pack your php image processing into executables. That will run over Hadoop cluster. - Yuri. On Sun, Jun 1, 2008 at 7:38 PM, Sheraz Sharif <[EMAIL PROTECTED]> wrote: > Hi all, > > I am new to Hadoop
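
Hadoop Streaming runs any executable as the mapper: it feeds input lines on standard input and reads tab-separated key/value lines from standard output. The sketch below shows that contract in Java rather than PHP. The class name StreamingMapper is hypothetical, and so is the assumption that each input line is an image path; here the mapper only keys each path by its file extension and does no image processing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Locale;

public class StreamingMapper {
    // Turn one input line (assumed to be an image path) into a
    // "key<TAB>value" record: key = lower-cased extension, value = path.
    static String mapLine(String line) {
        String path = line.trim();
        int dot = path.lastIndexOf('.');
        String ext = dot >= 0
                ? path.substring(dot + 1).toLowerCase(Locale.ROOT)
                : "unknown";
        return ext + "\t" + path;
    }

    public static void main(String[] args) throws IOException {
        // Read records from stdin and write mapped records to stdout,
        // which is how a streaming mapper exchanges data with the framework.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                System.out.println(mapLine(line));
            }
        }
    }
}
```

A PHP script following the same stdin/stdout convention could be substituted for this program without changing the rest of the job.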

Re: Image processing with Hadoop and PHP

2008-06-01 Thread Rong-en Fan
On Sun, Jun 1, 2008 at 11:38 PM, Sheraz Sharif <[EMAIL PROTECTED]> wrote: > Hi all, > > I am new to Hadoop and have very little experience with java. However, I am > very experienced with PHP. I've seen one web page where a guy who wrote a > map-reduce function in PHP, and others in Python. > > I

Image processing with Hadoop and PHP

2008-06-01 Thread Sheraz Sharif
Hi all, I am new to Hadoop and have very little experience with java. However, I am very experienced with PHP. I've seen one web page where a guy who wrote a map-reduce function in PHP, and others in Python. I would like to receive hundreds, if not thousands of images a day and proces

Re: Realtime Map Reduce = Supercomputing for the Masses?

2008-06-01 Thread Edward Capriolo
I think that feature makes sense because starting a JVM has overhead. On Sun, Jun 1, 2008 at 4:26 AM, Christophe Taton <[EMAIL PROTECTED]> wrote: > Actually Hadoop could be made more friendly to such realtime Map/Reduce > jobs. > For instance, we could consider running all tasks inside the task trac

Re: Qt 4.4 / QtConcurrent

2008-06-01 Thread Martin Jaggi
Thanks, it's very nice to see that they integrated Map Reduce. But as I understood it this does not work (yet) for distributed systems, but only on one single machine. On 01.06.2008 at 14:33, Brice Arnould wrote: Hi! With Qt 4.4, Trolltech provides a GPLed implementation of an in memory

Re: In memory Map Reduce

2008-06-01 Thread Brice Arnould
Hi! With Qt 4.4, Trolltech provides a GPLed implementation of an in-memory map/reduce for many languages (at least C++ and Java) as a part of QtConcurrent. I have not used this yet, but in general their APIs are well thought out and their code very slick. You might want to have a look at this. Code s

Re: other implementations of TaskRunner

2008-06-01 Thread Martin Jaggi
That would indeed be a nice idea, that there could be other implementations of TaskRunner suited for special hardware, or for in-memory systems. But if the communication remains the same (HDFS with disk access), this would not necessarily make things faster in the shuffling phase etc.

Re: In memory Map Reduce

2008-06-01 Thread Martin Jaggi
Thanks for your comments! So in the case that all intermediate pairs fit into the RAM of the cluster, does the InMemoryFileSystem already allow the intermediate phase to be done without much disk access? Or what would be the current bottleneck in Hadoop in this scenario (huge computational

Re: Realtime Map Reduce = Supercomputing for the Masses?

2008-06-01 Thread Christophe Taton
Actually Hadoop could be made more friendly to such realtime Map/Reduce jobs. For instance, we could consider running all tasks inside the task tracker JVM as separate threads, which could be implemented as another personality of the TaskRunner. I looked into this a couple of weeks ago..
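
The idea of running tasks as threads inside one JVM, instead of starting a JVM per task, can be illustrated with a small in-process word count. This is only a sketch of the general technique using java.util.concurrent; it is not Hadoop's TaskRunner, and the class and method names (InProcessWordCount, mapSplit, run) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InProcessWordCount {
    // "Map" step for one input split: count the words in that split.
    static Map<String, Integer> mapSplit(String split) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Run one map task per split on a shared thread pool, then merge the
    // partial counts in memory (the "reduce" step). No intermediate data
    // is written to disk and no extra JVM is started.
    public static Map<String, Integer> run(List<String> splits, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            for (String split : splits) {
                partials.add(pool.submit(() -> mapSplit(split)));
            }
            Map<String, Integer> total = new TreeMap<>();
            for (Future<Map<String, Integer>> partial : partials) {
                for (Map.Entry<String, Integer> e : partial.get().entrySet()) {
                    total.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

The trade-off is isolation: a task that crashes or exhausts memory in a shared JVM affects every other task in it, which a process-per-task design avoids.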

Re: Realtime Map Reduce = Supercomputing for the Masses?

2008-06-01 Thread Ted Dunning
Hadoop is highly optimized towards handling datasets that are much too large to fit into memory. That means that there are many trade-offs that have been made that make it much less useful for very short jobs or jobs that would fit into memory easily. Multi-core implementations of map-reduce are