Hello,
I'm using Hadoop 0.17.0 to analyze a large number of CSV files,
and I need to read those files in character encodings other than UTF-8,
but I think TextInputFormat doesn't support other encodings.
I guess the LineRecordReader class or the Text class should support encoding
settings li
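For what it's worth, the manual workaround I can think of looks roughly
like the sketch below. It assumes the encoding keeps the newline byte at
the same value as ASCII (e.g. ISO-8859-1 or Shift_JIS), so that
LineRecordReader still splits lines correctly; the class name, charset,
and CSV handling are only placeholders:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Sketch: re-decode each line from a non-UTF-8 charset before parsing.
    public class EncodedCsvMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      // Placeholder -- put the real input charset here.
      private static final String ENCODING = "ISO-8859-1";

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        // Text carries the raw bytes of the line; decode them explicitly
        // instead of using Text.toString(), which assumes UTF-8.
        String line = new String(value.getBytes(), 0, value.getLength(),
                                 ENCODING);
        String[] fields = line.split(",");
        if (fields.length > 1) {
          output.collect(new Text(fields[0]), new Text(fields[1]));
        }
      }
    }

For encodings whose line terminator bytes differ (UTF-16, for example)
this breaks, and a custom RecordReader would be needed.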
Jute spits out "java.util.ArrayList", which is correct.
> -Original Message-
> From: Runping Qi [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 02, 2008 10:38 AM
> To: core-user@hadoop.apache.org
> Subject: FW: bug on jute?
From: Flavio Junqueira [mailto:[EMAIL PROTECTED]
Sent: Saturday, May 31, 2008 2:27 AM
To: [EMAIL PROTECTED]
Subject: bug on jute?
Hi, I found a small bug on jute, and I was wondering how to proceed with
fixing it. The problem is the following. If I decla
Hi everyone,
I have a job that keeps failing with stack overflows, and I really
don't see how that is happening.
The job runs for about 20-30 minutes before one task errors out, then a
few more fail and the job dies.
I am running Hadoop 0.17 and I've tried lowering these settings to no avail:
io.sort.
Hi Sheraz,
As the others mentioned, one way to do this is via hadoop-streaming,
which allows you to specify any program as the mapper and reducer
parts of the MapReduce algorithm implemented by Hadoop.
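The invocation then looks roughly like this (the jar location and HDFS
paths are only examples, and mapper.php / reducer.php stand for scripts
you would write yourself):

    hadoop jar contrib/streaming/hadoop-0.17.0-streaming.jar \
        -input  /user/you/images-in \
        -output /user/you/images-out \
        -mapper  "php mapper.php" \
        -reducer "php reducer.php" \
        -file mapper.php \
        -file reducer.php

The mapper and reducer simply read records from stdin and write
tab-separated key/value pairs to stdout.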
In your case, I'd imagine a solution looking something like this:
1) Collect a batch of images
Hadoop goes to some lengths to make sure that things can stay in memory as
much as possible. There are still cases, however, where intermediate
results are normally written to disk. That means that implementors will
have those time scales in their head as they do things which will inevitably
mak
Hi.
I guess you can use Hadoop Streaming
(http://wiki.apache.org/hadoop/HadoopStreaming) if you pack your PHP
image processing into executables. That will run over the Hadoop cluster.
- Yuri.
On Sun, Jun 1, 2008 at 7:38 PM, Sheraz Sharif <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I am new to Hadoop
On Sun, Jun 1, 2008 at 11:38 PM, Sheraz Sharif <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I am new to Hadoop and have very little experience with Java. However, I am
> very experienced with PHP. I've seen one web page where a guy wrote a
> map-reduce function in PHP, and others in Python.
>
> I
Hi all,
I am new to Hadoop and have very little experience with Java.
However, I am very experienced with PHP. I've seen one web page
where a guy wrote a map-reduce function in PHP, and others in
Python.
I would like to receive hundreds, if not thousands, of images a day
and proces
I think that feature makes sense because starting a JVM has overhead.
On Sun, Jun 1, 2008 at 4:26 AM, Christophe Taton <[EMAIL PROTECTED]> wrote:
> Actually Hadoop could be made more friendly to such realtime Map/Reduce
> jobs.
> For instance, we could consider running all tasks inside the task trac
Thanks, it's very nice to see that they integrated map/reduce.
But as I understand it, this does not work (yet) for distributed
systems, only on a single machine.
On 01.06.2008 at 14:33, Brice Arnould wrote:
Hi !
With Qt 4.4, Trolltech provides a GPLed implementation of an in memory
Hi!
With Qt 4.4, Trolltech provides a GPLed implementation of an in-memory
map/reduce for many languages (at least C++ and Java) as part of
QtConcurrent.
I have not used this yet, but in general their APIs are well thought out
and their code is very slick. You might want to have a look at this.
Code s
That would indeed be a nice idea: there could be other
implementations of TaskRunner suited to special hardware, or to
in-memory systems.
But if the communication remains the same (HDFS with disk access),
this would not necessarily make things like the shuffle phase
any faster.
Thanks for your comments!
So in the case that all intermediate pairs fit into the RAM of the
cluster, does the InMemoryFileSystem already allow the intermediate
phase to be done without much disk access? Or what would be the
current bottleneck in Hadoop in this scenario (huge computational
Actually, Hadoop could be made friendlier to such real-time Map/Reduce
jobs.
For instance, we could consider running all tasks inside the TaskTracker
JVM as separate threads, which could be implemented as another
personality of the TaskRunner.
I was looking into this a couple of weeks ago.
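As a toy illustration of that idea (this is not Hadoop's actual
TaskRunner API; the class below is made up), a threaded "personality"
would hand each task to a thread pool inside the TaskTracker JVM instead
of forking a child JVM per task, so per-task start-up cost becomes
thread creation rather than JVM start-up:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Toy illustration only -- not Hadoop code.
    public class InProcessTaskRunner {

      // Shared pool living inside the (hypothetical) TaskTracker process.
      private final ExecutorService pool = Executors.newFixedThreadPool(4);

      // Run a "task" as a thread in this JVM instead of exec'ing a new JVM.
      public Future<Boolean> runTask(Callable<Boolean> task) {
        return pool.submit(task);
      }

      public void shutdown() {
        pool.shutdown();
      }
    }

The trade-off is isolation: a task that leaks memory or calls
System.exit() now takes the whole TaskTracker down with it, which is a
big part of why tasks run in separate JVMs today.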
Hadoop is highly optimized for handling datasets that are much too large
to fit into memory. That means that many trade-offs have been made that
make it much less useful for very short jobs or for jobs that would fit
into memory easily.
Multi-core implementations of map-reduce are