s of files. New files may be added (or
removed) at arbitrary locations, which may trigger recalculation of the data
depending on them. It is similar to the Unix 'make' utility.
Sincerely,
Vadim Zaliva
Cascading is great.
If you are looking for a more pragmatic approach, one that lets you
build a workflow
from existing Hadoop tasks and Pig scripts without writing additional
Java code, you may want to take a look at HAMAKE:
http://code.google.com/p/hamake/
Vadim
HAMAKE is a make-like utility for Hadoop. More information is at the project page:
http://code.google.com/p/hamake/
The documentation is still quite sparse, but the core functionality is working
and I plan to improve it further.
Sincerely,
Vadim
I just got around to configuring this in my hadoop-0.18.3 install and I
can share my working topology script.
The documentation is a bit confusing on this matter, so I hope this will be helpful.
The script is called by the namenode as datanodes first connect to it. It
is passed an IP address of a datanode as a
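Since the script itself is cut off above, here is a minimal sketch of what such a topology script typically looks like; the subnets and rack names below are invented for illustration, not taken from the original message. The namenode passes one or more datanode IPs as arguments and expects one rack path per line on stdout.

```shell
#!/bin/sh
# Hypothetical rack-topology script. Map each datanode IP to a rack path;
# anything unrecognized falls back to the default rack.

resolve_rack() {
  case "$1" in
    10.1.1.*) echo "/dc1/rack1" ;;   # example subnet for rack 1
    10.1.2.*) echo "/dc1/rack2" ;;   # example subnet for rack 2
    *)        echo "/default-rack" ;;
  esac
}

# The namenode may pass several addresses in one invocation;
# print one rack path per argument, in order.
for node in "$@"; do
  resolve_rack "$node"
done
```

The script is wired in via the `topology.script.file.name` property in the Hadoop configuration.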
Great news! I've been using homemade Hadoop RPMs for some time and
will be glad to switch to these.
Since I am using a bleeding-edge version of Pig, I would be interested in
Pig RPMs built daily from the Pig SVN.
Vadim
On Mon, Mar 16, 2009 at 19:34, Christophe Bisciglia wrote:
> Mark, this is great feed
Scott,
Thanks for the interesting information. By JBOD, I assume you mean just listing
multiple partition mount points in the Hadoop config?
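For reference, that kind of JBOD setup is usually expressed by giving each disk's mount point in a comma-separated list, so HDFS and MapReduce round-robin across the disks without any RAID striping. A sketch of the relevant hadoop-site.xml properties from that era (the mount points here are made-up examples):

```xml
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data,/mnt/disk4/hdfs/data</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/mnt/disk1/mapred/local,/mnt/disk2/mapred/local,/mnt/disk3/mapred/local,/mnt/disk4/mapred/local</value>
</property>
```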
Vadim
On Fri, Mar 13, 2009 at 12:48, Scott Carey wrote:
> On 3/13/09 11:56 AM, "Allen Wittenauer" wrote:
>
> On 3/13/09 11:25 AM,
> When you stripe you automatically make every disk in the system have the
> same speed as the slowest disk. In our experiences, systems are more likely
> to have a 'slow' disk than a dead one and detecting that is really
> really hard. In a distributed system, that multiplier effect can h
bly a good starting point, so for eight cores,
> you should have at least 4 disks.
>
> - Aaron
>
> On Wed, Mar 11, 2009 at 10:15 AM, Vadim Zaliva wrote:
>
>> Hi!
>>
>> I have a question about fine-tuning hadoop performance on 8-core
>> machines.
>>
Hi!
I have a question about fine-tuning Hadoop performance on 8-core machines.
I have 2 machines I am testing. One is an 8-core Xeon and the other an 8-core
Opteron, with 16 GB RAM each. They both run mapreduce and dfs nodes. Currently
I've set up each of them to run 32 map and 8 reduce tasks.
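For context, the per-node slot counts described above would be set per tasktracker, roughly like this (a sketch using the 0.18/0.19-era property names):

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>32</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
</property>
```

Note 32 map slots on 8 cores is 4x oversubscription; the usual starting advice in these threads is closer to 1-2 slots per core.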
Also, HADOOP
On Wed, Feb 25, 2009 at 05:59, Ryan LeCompte wrote:
> Hello all,
>
> Is anyone using Hadoop as more of a near/almost real-time processing
> of log data for their systems to aggregate stats, etc? I know that
> Hadoop has generally been good at off-line processing of large amounts
> of data, but I'v
reducers?
> You could do that by doing
>
> job.setNumReduceTasks(0);
>
> Jothi
>
>
> On 2/25/09 10:34 AM, "Vadim Zaliva" wrote:
>
>> On Thu, Feb 7, 2008 at 10:07, Owen O'Malley wrote:
>>
>>> Setting it to 0 skips all of the buffering, sort
On Thu, Feb 7, 2008 at 10:07, Owen O'Malley wrote:
> Setting it to 0 skips all of the buffering, sorting, merging, and shuffling.
> It passes the objects straight from the mapper to the output format, which
> writes it straight to hdfs.
I just tried to set the number of Reduce tasks to 0, but Job Tr
> 2009/2/11 Owen O'Malley:
>>
>> On Feb 10, 2009, at 7:21 PM, Vadim Zaliva wrote:
>>
>>> Maybe version 0.18
>>> is better suited for production environment?
>>
>> Yahoo is mostly on 0.18.3 + some patches at this point.
>>
>> -- Owen
>>
>
>
>
> --
> M. Raşit ÖZDAŞ
>
Hi!
Kind of a novice question, but I need to know which Hadoop version is
considered stable. I was
trying to run version 0.19, and I've seen numerous stability issues
with it. Maybe version 0.18
is better suited for a production environment?
Vadim
I am starting to wonder if Hadoop 0.19 is stable enough for production.
Vadim
On 2/9/09, Vadim Zaliva wrote:
> yes, I can access DFS from the cluster. namenode status seems to be OK
> and I see no errors in namenode log files.
>
> initially all trackers were visible, and 9433 ma
works at
first and then starts failing.
Vadim
On Sun, Feb 8, 2009 at 22:19, Amar Kamat wrote:
> Vadim Zaliva wrote:
>>
>> Hi!
>>
>> I am observing a strange situation in my Hadoop cluster. While running a
>> task, it eventually gets into
>> this strange mode wh
Hi!
I am observing a strange situation in my Hadoop cluster. While running a
task, it eventually gets into
this strange mode where:
1. JobTracker reports 0 task trackers.
2. Task tracker processes are alive but log file is full of repeating
messages like this:
2009-02-08 19:16:47,761 INFO org.apach
On Mon, Jan 26, 2009 at 11:22, Edward Capriolo wrote:
> Zeroconf is more focused on simplicity than security. One of the
> original problems that may have been fixed is that any program can
> announce any service, i.e. my laptop can announce that it is the DNS for
> google.com etc.
I see two distin
Is it possible to obtain auto-generated IDs when writing data using
DBOutputFormat?
For example, is it possible to write a Mapper which stores records in a DB
and returns the auto-generated
IDs of these records?
Let me explain what I am trying to achieve:
I have data like this
which I would like to st
s your data you need to close the file - meaning you might
>> > have many small files, a situation where hdfs is not very strong
>> > (the namespace is held in memory).
>> > Hbase might be an interesting tool for you, also zookeeper if you want
>> > to do something home g
Hi!
I am considering using Hadoop for (almost) realtime data processing. I
have data coming in every second and I would like to use a Hadoop cluster
to process
it as fast as possible. I need to be able to maintain some guaranteed
max processing time, for example under 3 minutes.
Does anybody have expe
On Jan 30, 2008, at 13:57, Jason Venner wrote:
I think somebody mentioned WebDAV support. That would work for me,
so I can PUT files.
Vadim
I suppose we could add a feature to the hdfs web ui to allow
uploading files.
Ted Dunning wrote:
I am looking for a way for scripts to write data to HD
l
configuration is definitive.
On 1/29/08 10:33 AM, "Vadim Zaliva" <[EMAIL PROTECTED]> wrote:
I have a bunch of gzip files which I am trying to process with Hadoop
task. The task fails with exception:
java.io.EOFException: Unexpected end of ZLIB input stream at
java.util.zip.In
On Jan 29, 2008 10:50 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> If you drill into the task using the job tracker's web interface, you can
> get to the tasks xml configuration. That configuration will have the input
> file split specification in it.
>
> You may also be able to see the input file
I have a bunch of gzip files which I am trying to process with Hadoop
task. The task fails with exception:
java.io.EOFException: Unexpected end of ZLIB input stream at
java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:223)
at java.util.zip.InflaterInputStream.read(InflaterInputS
On Jan 26, 2008, at 23:36, Otis Gospodnetic wrote:
Miles and Vadim - are you aware of the new Lucene sub-project,
Mahout? I think Grant Ingersoll mentioned it here the other day... http://lucene.apache.org/mahout/
Yes, it looks very interesting. I will be following it closely and
maybe ev
On Jan 22, 2008, at 15:17, Miles Osborne wrote:
Thanks Miles!
Vadim
There are machine-learning papers dealing with Map Reduce proper, eg:
*Map-Reduce for Machine Learning on Multicore*. Cheng-Tao Chu, Sang
Kyun
Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Y. Ng and Kunle
Olukotun.
I
On Jan 22, 2008, at 14:44, Ted Dunning wrote:
I am also very interested in machine learning applications of MapReduce,
collaborative filtering in particular. If there are lists/groups/
publications
related to this subject, I would appreciate any pointers.
Sincerely,
Vadim
I would love to