[ANNOUNCE] hamake-1.1

2009-06-10 Thread Vadim Zaliva
s of files. New files may be added (or removed) at arbitrary locations, which may trigger recalculation of the data depending on them. It is similar to the Unix 'make' utility. Sincerely, Vadim Zaliva

Re: Complex workflows in Hadoop

2009-04-16 Thread Vadim Zaliva
Cascading is great. If you are looking for a more pragmatic approach which would allow you to build a workflow from existing Hadoop tasks and PIG scripts without writing additional Java code, you may want to take a look at HAMAKE: http://code.google.com/p/hamake/ Vadim

[ANNOUNCE] hamake-1.0

2009-04-13 Thread Vadim Zaliva
HAMAKE is a make-like utility for Hadoop. More information at the project page: http://code.google.com/p/hamake/ Documentation is still quite poor, but the core functionality is working and I plan to improve it further. Sincerely, Vadim

Re: Hadoop topology.script.file.name Form

2009-03-18 Thread Vadim Zaliva
I just got around to configuring this in my hadoop-0.18.3 install and I can share my working topology script. Documentation is a bit confusing on this matter, so I hope it will be helpful. The script is called by the namenode as datanodes first connect to it. It is passed the IP address of a datanode as a
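
For reference (this is not the script from the thread): in Hadoop 0.18-0.20 the same IP-to-rack mapping can also be supplied as a Java class plugged in through topology.node.switch.mapping.impl instead of a script named by topology.script.file.name. A minimal sketch with made-up subnet-to-rack rules:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.net.DNSToSwitchMapping;

    // Hypothetical mapping: assumes 10.0.1.* machines sit in /rack1 and 10.0.2.* in /rack2.
    public class SimpleRackMapping implements DNSToSwitchMapping {
        public List<String> resolve(List<String> names) {
            List<String> racks = new ArrayList<String>(names.size());
            for (String name : names) {
                if (name.startsWith("10.0.1.")) {
                    racks.add("/rack1");
                } else if (name.startsWith("10.0.2.")) {
                    racks.add("/rack2");
                } else {
                    racks.add("/default-rack"); // same fallback the stock mapping uses
                }
            }
            return racks; // one rack path per input name, in the same order
        }
    }

Like the script, it receives datanode names or IP addresses from the namenode and must return one rack path (e.g. /rack1) per entry.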

Re: Cloudera's Distribution for Hadoop

2009-03-16 Thread Vadim Zaliva
Great news! I've been using homemade Hadoop RPMs for some time and will be glad to switch to these. Since I am using a bleeding-edge version of Pig, I would be interested in PIG RPMs built daily from PIG SVN. Vadim On Mon, Mar 16, 2009 at 19:34, Christophe Bisciglia wrote: > Mark, this is great feed

Re: tuning performance

2009-03-14 Thread Vadim Zaliva
Scott, thanks for the interesting information. By JBOD, I assume you mean just listing multiple partition mount points in the Hadoop config? Vadim On Fri, Mar 13, 2009 at 12:48, Scott Carey wrote: > On 3/13/09 11:56 AM, "Allen Wittenauer" wrote: > > On 3/13/09 11:25 AM,
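
For context, "JBOD" here usually means mounting each physical disk separately and listing the mount points, comma-separated, in dfs.data.dir (and typically mapred.local.dir) rather than striping them into one RAID volume. A minimal sketch with made-up mount points; in practice these values live in hadoop-site.xml on each node:

    import org.apache.hadoop.conf.Configuration;

    public class JbodDirsSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // One directory per physical disk; the datanode spreads blocks across them.
            conf.set("dfs.data.dir",
                "/mnt/disk1/dfs/data,/mnt/disk2/dfs/data,/mnt/disk3/dfs/data,/mnt/disk4/dfs/data");
            conf.set("mapred.local.dir",
                "/mnt/disk1/mapred/local,/mnt/disk2/mapred/local,/mnt/disk3/mapred/local,/mnt/disk4/mapred/local");
            System.out.println("dfs.data.dir = " + conf.get("dfs.data.dir"));
        }
    }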

Re: tuning performance

2009-03-13 Thread Vadim Zaliva
>    When you stripe you automatically make every disk in the system have the > same speed as the slowest disk.  In our experiences, systems are more likely > to have a 'slow' disk than a dead one and detecting that is really > really hard.  In a distributed system, that multiplier effect can h

Re: tuning performance

2009-03-12 Thread Vadim Zaliva
bly a good starting point, so for eight cores, > you should have at least 4 disks. > > - Aaron > > On Wed, Mar 11, 2009 at 10:15 AM, Vadim Zaliva wrote: > >> Hi! >> >> I have a question about fine-tunining hadoop performance on 8-core >> machines. >>

tuning performance

2009-03-11 Thread Vadim Zaliva
Hi! I have a question about fine-tuning Hadoop performance on 8-core machines. I have 2 machines I am testing: one is an 8-core Xeon and the other an 8-core Opteron, with 16 GB RAM each. They both run MapReduce and DFS nodes. Currently I've set up each of them to run 32 map and 8 reduce tasks. Also, HADOOP
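
In the Hadoop versions discussed in this thread, the per-node map and reduce slot counts come from mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, normally set in hadoop-site.xml on every TaskTracker. A minimal sketch using the numbers from the message above:

    import org.apache.hadoop.conf.Configuration;

    public class TaskSlotsSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // 32 concurrent map tasks and 8 concurrent reduce tasks per node, as in the message above.
            conf.setInt("mapred.tasktracker.map.tasks.maximum", 32);
            conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 8);
            System.out.println(conf.getInt("mapred.tasktracker.map.tasks.maximum", -1)
                + " map slots / "
                + conf.getInt("mapred.tasktracker.reduce.tasks.maximum", -1)
                + " reduce slots per TaskTracker");
        }
    }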

Re: Using Hadoop for near real-time processing of log data

2009-02-25 Thread Vadim Zaliva
On Wed, Feb 25, 2009 at 05:59, Ryan LeCompte wrote: > Hello all, > > Is anyone using Hadoop as more of a near/almost real-time processing > of log data for their systems to aggregate stats, etc? I know that > Hadoop has generally been good at off-line processing of large amounts > of data, but I'v

Re: Skip Reduce Phase

2009-02-24 Thread Vadim Zaliva
reducers? > You could do that by doing > > job.setNumReduceTasks(0); > > Jothi > > > On 2/25/09 10:34 AM, "Vadim Zaliva" wrote: > >> On Thu, Feb 7, 2008 at 10:07, Owen O'Malley wrote: >> >>> Setting it to 0 skips all of the buffering, sort

Re: Skip Reduce Phase

2009-02-24 Thread Vadim Zaliva
On Thu, Feb 7, 2008 at 10:07, Owen O'Malley wrote: > Setting it to 0 skips all of the buffering, sorting, merging, and shuffling. > It passes the objects straight from the mapper to the output format, which > writes it straight to hdfs. I just tried to set the number of Reduce tasks to 0, but Job Tr
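
A minimal map-only job in the old mapred API, showing the setNumReduceTasks(0) call discussed in this thread; the mapper and paths are placeholders. With zero reduces, each mapper's output goes straight through the OutputFormat to HDFS, unsorted:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;

    public class MapOnlyJob {
        public static void main(String[] args) throws Exception {
            JobConf job = new JobConf(MapOnlyJob.class);
            job.setJobName("map-only");
            job.setMapperClass(IdentityMapper.class); // any Mapper works here
            job.setNumReduceTasks(0);                 // no sort/shuffle; map output goes straight to the OutputFormat
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.setInputPaths(job, new Path(args[0]));  // placeholder input path
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // placeholder output path
            JobClient.runJob(job);
        }
    }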

Re: stable version

2009-02-11 Thread Vadim Zaliva
> 2009/2/11 Owen O'Malley : >> >> On Feb 10, 2009, at 7:21 PM, Vadim Zaliva wrote: >>> Maybe version 0.18 >>> is better suited for production environment? >> >> Yahoo is mostly on 0.18.3 + some patches at this point. >> >> -- Owen >> > > > > -- > M. Raşit ÖZDAŞ >

stable version

2009-02-10 Thread Vadim Zaliva
Hi! A kind of novice question, but I need to know which Hadoop version is considered stable. I was trying to run version 0.19, and I've seen numerous stability issues with it. Maybe version 0.18 is better suited for a production environment? Vadim

Re: lost TaskTrackers

2009-02-09 Thread Vadim Zaliva
I am starting to wonder if Hadoop 0.19 is stable enough for production. Vadim On 2/9/09, Vadim Zaliva wrote: > yes, I can access DFS from the cluster. namenode status seems to be OK > and I see no errors in namenode log files. > > initially all trackers were visible, and 9433 ma

Re: lost TaskTrackers

2009-02-09 Thread Vadim Zaliva
works at first and then starts failing. Vadim On Sun, Feb 8, 2009 at 22:19, Amar Kamat wrote: > Vadim Zaliva wrote: >> >> Hi! >> >> I am observing strange situation in my Hadoop cluster. While running >> task, eventually it gets into >> this strange mode wh

lost TaskTrackers

2009-02-08 Thread Vadim Zaliva
Hi! I am observing a strange situation in my Hadoop cluster. While running a task, it eventually gets into this strange mode where: 1. The JobTracker reports 0 task trackers. 2. The TaskTracker processes are alive, but the log file is full of repeating messages like this: 2009-02-08 19:16:47,761 INFO org.apach

Re: Zeroconf for hadoop

2009-01-26 Thread Vadim Zaliva
On Mon, Jan 26, 2009 at 11:22, Edward Capriolo wrote: > Zeroconf is more focused on simplicity than security. One of the > original problems that may have been fixed is that any program can > announce any service. IE my laptop can announce that it is the DNS for > google.com etc. I see two distin

DBOutputFormat and auto-generated keys

2009-01-26 Thread Vadim Zaliva
Is it possible to obtain auto-generated IDs when writing data using DBOutputFormat? For example, is it possible to write a Mapper which stores records in a DB and returns the auto-generated IDs of these records? Let me explain what I am trying to achieve: I have data like this which I would like to st
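
As far as I know, DBOutputFormat in these Hadoop versions does not hand generated keys back to the job, and the thread does not show a resolution. One common workaround, sketched below with hypothetical table and column names, is to skip DBOutputFormat and write to the database from the task with plain JDBC, asking the driver for the generated key:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class JdbcGeneratedKeyExample {
        // Inserts one row and returns its auto-generated ID; the table and column names are hypothetical.
        public static long insertAndGetId(Connection conn, String value) throws SQLException {
            PreparedStatement stmt = conn.prepareStatement(
                    "INSERT INTO records (payload) VALUES (?)",
                    Statement.RETURN_GENERATED_KEYS);
            try {
                stmt.setString(1, value);
                stmt.executeUpdate();
                ResultSet keys = stmt.getGeneratedKeys();
                if (keys.next()) {
                    return keys.getLong(1);
                }
                throw new SQLException("driver returned no generated key");
            } finally {
                stmt.close();
            }
        }

        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(args[0]); // JDBC URL passed on the command line
            try {
                System.out.println("new id = " + insertAndGetId(conn, "example payload"));
            } finally {
                conn.close();
            }
        }
    }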

Re: realtime hadoop

2008-06-24 Thread Vadim Zaliva
s your data you need to close the file - means you might >> > have many small file, a situation where hdfs is not very strong >> > (namespace is hold in memory). >> > Hbase might be an interesting tool for you, also zookeeper if you want >> > to do something home g

realtime hadoop

2008-06-23 Thread Vadim Zaliva
Hi! I am considering using Hadoop for (almost) realtime data processing. I have data coming in every second and I would like to use a Hadoop cluster to process it as fast as possible. I need to be able to maintain some guaranteed maximum processing time, for example under 3 minutes. Does anybody have expe

Re: Low complexity way to write a file to hdfs?

2008-01-30 Thread Vadim Zaliva
On Jan 30, 2008, at 13:57, Jason Venner wrote: I think somebody mentioned WebDAV support. That would work for me, so I can PUT files. Vadim I suppose we could add a feature to the hdfs web ui to allow uploading files. Ted Dunning wrote: I am looking for a way for scripts to write data to HD
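
For reference, the programmatic route the thread is trying to avoid wrapping in scripts looks roughly like this in the old org.apache.hadoop.fs API (the paths are placeholders; a WebDAV gateway or web-UI upload would sit on top of something like it):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsPutExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // picks up hadoop-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);

            // Copy a local file into HDFS (placeholder paths).
            fs.copyFromLocalFile(new Path("/tmp/local.log"), new Path("/logs/local.log"));

            // Or write bytes directly.
            FSDataOutputStream out = fs.create(new Path("/logs/hello.txt"));
            try {
                out.writeBytes("hello from the FileSystem API\n");
            } finally {
                out.close();
            }
        }
    }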

Re: broken gzip file

2008-01-29 Thread Vadim Zaliva
l configuration is definitive. On 1/29/08 10:33 AM, "Vadim Zaliva" <[EMAIL PROTECTED]> wrote: I have a bunch of gzip files which I am trying to process with Hadoop task. The task fails with exception: java.io.EOFException: Unexpected end of ZLIB input stream at java.util.zip.In

Re: broken gzip file

2008-01-29 Thread Vadim Zaliva
On Jan 29, 2008 10:50 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: > If you drill into the task using the job tracker's web interface, you can > get to the task's xml configuration. That configuration will have the input > file split specification in it. > > You may also be able to see the input file

broken gzip file

2008-01-29 Thread Vadim Zaliva
I have a bunch of gzip files which I am trying to process with a Hadoop task. The task fails with an exception: java.io.EOFException: Unexpected end of ZLIB input stream at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:223) at java.util.zip.InflaterInputStream.read(InflaterInputS
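
A small stand-alone checker (not from the thread) that can find which archives are truncated before they are handed to Hadoop: it streams each file through GZIPInputStream and reports the ones that fail with the same EOFException as in the stack trace above.

    import java.io.EOFException;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;

    public class GzipChecker {
        public static void main(String[] args) throws IOException {
            byte[] buf = new byte[64 * 1024];
            for (String path : args) {
                GZIPInputStream in = null;
                try {
                    in = new GZIPInputStream(new FileInputStream(path));
                    while (in.read(buf) != -1) {
                        // just drain the stream; a truncated file throws EOFException here
                    }
                    System.out.println("OK      " + path);
                } catch (EOFException e) {
                    System.out.println("BROKEN  " + path + " (unexpected end of ZLIB input stream)");
                } catch (IOException e) {
                    System.out.println("BROKEN  " + path + " (" + e.getMessage() + ")");
                } finally {
                    if (in != null) {
                        in.close();
                    }
                }
            }
        }
    }

Run it over the job's input, e.g. java GzipChecker *.gz, to isolate the broken file.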

Re: Hadoop-2438

2008-01-28 Thread Vadim Zaliva
On Jan 26, 2008, at 23:36, Otis Gospodnetic wrote: Miles and Vadim - are you aware of the new Lucene sub-project, Mahout? I think Grant Ingersoll mentioned it here the other day... http://lucene.apache.org/mahout/ Yes, it looks very interesting. I will be following it closely and maybe ev

Re: Hadoop-2438

2008-01-22 Thread Vadim Zaliva
On Jan 22, 2008, at 15:17, Miles Osborne wrote: Thanks Miles! Vadim There are machine-learning papers dealing with Map Reduce proper, eg: *Map-Reduce for Machine Learning on Multicore*. Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Y. Ng and Kunle Olukotun. I

Re: Hadoop-2438

2008-01-22 Thread Vadim Zaliva
On Jan 22, 2008, at 14:44, Ted Dunning wrote: I am also very interested in machine learning applications of MapReduce, collaborative filtering in particular. If there are some lists/groups/publications related to this subject, I would appreciate any pointers. Sincerely, Vadim I would love to