Re: Indexed Hashtables

2009-01-14 Thread Sean Shanny
Delip, So far we have had pretty good luck with memcached. We are building a Hadoop-based solution for data warehouse ETL on XML-based log files that represent click stream data on steroids. We process about 34 million records, or about 70 GB of data, a day. We have to process dimensional da

Re: TestDFSIO delivers bad values of "throughput" and "average IO rate"

2009-01-14 Thread Konstantin Shvachko
In TestDFSIO we want each task to create only one file. It is a one-to-one mapping from files to map tasks. And splits are defined so that each map gets only one file name, which it creates or reads. --Konstantin tienduc_dinh wrote: I don't understand, why the parameter -nrFiles of TestDFSIO sh
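The one-to-one mapping Konstantin describes can be pictured with a small sketch. This is only an illustration in Python, not TestDFSIO's actual code: the helper name `plan_tasks` and the file-name pattern are hypothetical, chosen to show that the number of map tasks follows directly from -nrFiles.

```python
def plan_tasks(nr_files, file_size_mb):
    """Return one (task_id, file_name, size_mb) tuple per file.

    Hypothetical helper: each map task is handed exactly one file name,
    so the number of map tasks equals nr_files.
    """
    return [(i, f"test_io_{i}", file_size_mb) for i in range(nr_files)]


# Four files requested, so four map tasks, each owning one file.
for task_id, name, size_mb in plan_tasks(nr_files=4, file_size_mb=128):
    print(task_id, name, size_mb)
```

Under this reading, raising -nrFiles raises the number of concurrent writers or readers rather than the amount of work per task.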

Indexed Hashtables

2009-01-14 Thread Delip Rao
Hi, I need to look up a large number of key/value pairs in my map(). Is there any indexed hashtable available as part of the Hadoop I/O API? I find HBase overkill for my application; something along the lines of HashStore (www.cellspark.com/hashstore.html) should be fine. Thanks, Delip

Re: Merging reducer outputs into a single part-00000 file

2009-01-14 Thread Jim Twensky
Owen and Rasit, Thank you for the responses. I figured out that mapred.reduce.tasks was set to 1 in my hadoop-default.xml and I didn't override it in my hadoop-site.xml configuration file. Jim On Wed, Jan 14, 2009 at 11:23 AM, Owen O'Malley wrote: > On Jan 14, 2009, at 12:46 AM, Rasit OZDAS wr
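The cause Jim found can be sketched as a layered lookup: site settings override defaults, and each reduce task writes one part file. The Python below is a minimal illustration, not Hadoop's Configuration class; the helper names `effective_config` and `output_files` are hypothetical.

```python
def effective_config(defaults, site):
    """Merge settings so that site values override default values."""
    merged = dict(defaults)
    merged.update(site)
    return merged


def output_files(num_reduces):
    """One output file per reduce task, named part-00000, part-00001, ..."""
    return [f"part-{i:05d}" for i in range(num_reduces)]


defaults = {"mapred.reduce.tasks": "1"}  # value reported from hadoop-default.xml
site = {}                                # nothing overridden in hadoop-site.xml

conf = effective_config(defaults, site)
print(output_files(int(conf["mapred.reduce.tasks"])))  # ['part-00000']
```

Putting a larger mapred.reduce.tasks value in the site layer of this sketch yields one part file per reduce task instead of a single part-00000.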

Re: RAID vs. JBOD

2009-01-14 Thread Runping Qi
Hi, We at Yahoo did some Hadoop benchmarking experiments on clusters with JBOD and RAID0. We found that under heavy loads (such as gridmix), the JBOD cluster performed better. Gridmix tests: Load: gridmix2; Cluster size: 190 nodes. Test results: RAID0: 75 minutes; JBOD: 67 minutes; Difference: 10% T
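As a quick check on the quoted gridmix2 figures, the difference can be computed against either baseline. The sketch below uses only the two run times from the message; the function name `percent_difference` is hypothetical, and which baseline the original "10%" used is not stated.

```python
def percent_difference(slow_minutes, fast_minutes, baseline_minutes):
    """Time saved by the faster run, as a percentage of the chosen baseline."""
    return 100.0 * (slow_minutes - fast_minutes) / baseline_minutes


raid0_minutes = 75
jbod_minutes = 67

# Relative to the RAID0 run time.
print(round(percent_difference(raid0_minutes, jbod_minutes, raid0_minutes), 1))  # 10.7
# Relative to the JBOD run time.
print(round(percent_difference(raid0_minutes, jbod_minutes, jbod_minutes), 1))   # 11.9
```

Either way, the result is in line with the roughly 10% reported.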

Re: getting null from CompressionCodecFactory.getCodec(Path file)

2009-01-14 Thread Chris Douglas
> I got it. For some reason getDefaultExtension() returns ".lzo_deflate". > Is that a bug? Shouldn't it be .lzo? The .lzo suffix is reserved for lzop (LzopCodec). LzoCodec doesn't generate compatible output, hence "lzo_deflate". -C > In the head revision I couldn't find it at all in http://s
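Why a lookup by path can come back empty is easier to see with a small suffix-matching sketch. This is an illustration in Python under stated assumptions, not the CompressionCodecFactory implementation: the function `find_codec` and the registry contents are hypothetical, with only the ".lzo_deflate" suffix for LzoCodec taken from the thread.

```python
def find_codec(path, codecs_by_ext):
    """Return the codec whose extension is the longest suffix of path, else None."""
    matches = [ext for ext in codecs_by_ext if path.endswith(ext)]
    if not matches:
        return None
    return codecs_by_ext[max(matches, key=len)]


# Assumed registry: LzoCodec is registered under its own default extension.
registered = {".lzo_deflate": "LzoCodec", ".gz": "GzipCodec"}

print(find_codec("input/data.lzo", registered))          # None
print(find_codec("input/data.lzo_deflate", registered))  # LzoCodec
```

In this sketch a file named with the .lzo suffix matches nothing unless a codec that claims ".lzo" is also registered, which mirrors the null result discussed in the thread.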

Re: General questions about Map-Reduce

2009-01-14 Thread tienduc_dinh
I got it ... Thanks to all. Cheers, Duc

Re: libhdfs append question

2009-01-14 Thread Craig Macdonald
Tamas, There is a patch attached to the issue, which you should be able to apply to get O_APPEND. https://issues.apache.org/jira/browse/HADOOP-4494 Craig Tamás Szokol wrote: Hi! I'm using the latest stable 0.19.0 version of Hadoop. I'd like to try the new append functionality. Is it av

Re: Merging reducer outputs into a single part-00000 file

2009-01-14 Thread Owen O'Malley
On Jan 14, 2009, at 12:46 AM, Rasit OZDAS wrote: Jim, As far as I know, there is no operation done after Reducer. Correct, other than output promotion, which moves the output file to the final filename. But if you are a little experienced, you already know these. Ordered list means one

Hello, world for Hadoop + Lucene

2009-01-14 Thread John Howland
Howdy! Is there any sort of "Hello, world!" example for building a Lucene index with Hadoop? I am looking through the source in contrib/index and it is a bit beyond me at the moment. Alternatively, is there more documentation related to the contrib/index example code? There seems to be a lot of i

Does the logging properties still work?

2009-01-14 Thread Wu Wei
Hi all, In the hadoop-default.xml file, I found the properties hadoop.logfile.size and hadoop.logfile.count. I didn't change the default settings, but the log files still grew beyond the limit. Do these logging properties still work in Hadoop 0.19.0? btw: I found a class org.apache.hadoop.util.LogForm

HOD: cluster allocation don't work

2009-01-14 Thread Thomas Kobienia
Hi, We use a Hadoop cluster with version 0.18.2. The static hdfs and mapred system works fine. Sometimes tasktracker child processes don't finish after job completion. That is another problem. My present issue is hod. I've set up torque and hod according to the online documentation. The torque serv

Re: Re: getting null from CompressionCodecFactory.getCodec(Path file)

2009-01-14 Thread Tom White
LZO was removed due to license incompatibility: https://issues.apache.org/jira/browse/HADOOP-4874 Tom On Wed, Jan 14, 2009 at 11:18 AM, Gert Pfeifer wrote: > I got it. For some reason getDefaultExtension() returns ".lzo_deflate". > > Is that a bug? Shouldn't it be .lzo? > > In the head revision

Re: Re: getting null from CompressionCodecFactory.getCodec(Path file)

2009-01-14 Thread Gert Pfeifer
I got it. For some reason getDefaultExtension() returns ".lzo_deflate". Is that a bug? Shouldn't it be .lzo? In the head revision I couldn't find it at all in http://svn.apache.org/repos/asf/hadoop/core/trunk/src/core/org/apache/hadoop/io/compress/ There should be a Class LzoCodec.java. Was that

Re: Re: getting null from CompressionCodecFactory.getCodec(Path file)

2009-01-14 Thread Gert Pfeifer
Arun C Murthy wrote: > > On Jan 13, 2009, at 7:29 AM, Gert Pfeifer wrote: > >> Hi, >> I want to use an lzo file as input for a mapper. The record reader >> determines the codec using a CompressionCodecFactory, like this: >> >> (Hadoop version 0.19.0) >> > > http://hadoop.apache.org/core/docs/r0.

Re: Dynamic Node Removal and Addition

2009-01-14 Thread Rasit OZDAS
Hi Alyssa, http://markmail.org/message/jyo4wssouzlb4olm#query:%22Decommission%20of%20datanodes%22+page:1+mid:p2krkt6ebysrsrpl+state:results As pointed out here, decommissioning (removal) of datanodes was not an easy job as of version 0.12. I strongly suspect it's still not easy. As far as I know,

Namenode freeze

2009-01-14 Thread Sagar Naik
Hi, A datanode goes down, and then it looks like ReplicationMonitor tries to even out the replication. However, while doing so, it holds the lock on FSNamesystem. With this lock held, other threads wait on the lock and cannot respond. As a result, the namenode does not list the dirs and the Web UI does not respond. I w

Re: Merging reducer outputs into a single part-00000 file

2009-01-14 Thread Rasit OZDAS
Jim, As far as I know, there is no operation done after the Reducer. At first glance, the situation reminds me of the same keys being used for all the tasks. This can be the result of one of the following cases: - input format reads same keys for every task. - mapper collects every incoming key-value pairs under same