Re: Implementing VectorWritable

2009-12-28 Thread Jeff Zhang
The readFields and write methods are empty? When data is transferred from the map phase to the reduce phase, it is serialized and deserialized, so write and readFields will be called. You should not leave them empty. Jeff Zhang On Tue, Dec 29, 2009 at 1:29 PM, bharath v < bharathvissapragada1...@
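For illustration, a minimal sketch of what Jeff describes, assuming the vector holds ints (the element type is cut off in the original post, so the details below are assumptions): write out the length first, then each element, and read them back in the same order.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.Vector;
    import org.apache.hadoop.io.WritableComparable;

    // Hypothetical completion: a Vector of ints, serialized as length + elements.
    public class VectorWritable implements WritableComparable<VectorWritable> {
        private Vector<Integer> value = new Vector<Integer>();

        public void write(DataOutput out) throws IOException {
            out.writeInt(value.size());          // length first
            for (int v : value) {
                out.writeInt(v);                 // then each element
            }
        }

        public void readFields(DataInput in) throws IOException {
            value.clear();
            int size = in.readInt();             // read back in the same order
            for (int i = 0; i < size; i++) {
                value.add(in.readInt());
            }
        }

        public int compareTo(VectorWritable other) {
            // placeholder ordering by size; a real implementation should define
            // whatever ordering makes sense for the data
            return Integer.valueOf(value.size()).compareTo(other.value.size());
        }
    }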

Implementing VectorWritable

2009-12-28 Thread bharath v
Hi, I've implemented a simple VectorWritable class as follows: package com; import org.apache.hadoop.*; import org.apache.hadoop.io.*; import java.io.*; import java.util.Vector; public class VectorWritable implements WritableComparable { private Vector value = new Vector(); public Vector

XInclude and tags in conf files

2009-12-28 Thread Derek Brown
Regarding the ability to include other files in configuration files (http://issues.apache.org/jira/browse/HADOOP-4944): I'm seeing apparently differing behavior that may make sense to someone. Say I have my core-site.xml as follows: http://www.w3.org/2001/XInclude";> include-site.
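For reference, a sketch of the kind of layout HADOOP-4944 is meant to enable; the include-site.xml name comes from the truncated message above, and the property shown is only an example:

    <?xml version="1.0"?>
    <configuration xmlns:xi="http://www.w3.org/2001/XInclude">
      <!-- Pull shared properties in from a second file on the same config path -->
      <xi:include href="include-site.xml"/>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://namenode:9000</value>
      </property>
    </configuration>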

Re: Running Hadoop on demand

2009-12-28 Thread Jim Twensky
Thanks for pointing that out, Edward. I'll take a look at the most recent documentation on HOD. On Mon, Dec 28, 2009 at 7:39 PM, Edward Capriolo wrote: > Jim, > > Most components of Hadoop now have a documentation branch per release; that > documentation is inline and is generated by Forrest. Anything t

Re: HDFS read/write speeds, and read optimization

2009-12-28 Thread Stas Oskin
Hi. Going back to the subject, has anyone ever benchmarked small (10-20 node) HDFS clusters? I did my own speed checks, and it seems I can reach ~77 Mbps on a quad-disk node. This comes to ~19 Mbps per disk, which seems quite low in my opinion. Can anyone advise about this? Thanks.

Multiple file output

2009-12-28 Thread Huazhong Ning
Hi all, I need your help on multiple file output. I have many big files, and I would like the processing result of each file to be written to a separate output file. I know that in the old Hadoop API the class MultipleOutputFormat works for this purpose, but I cannot find the same class in the new API. Does anybod
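Not an authoritative answer, but a sketch of how this might look if a release ships the new-API org.apache.hadoop.mapreduce.lib.output.MultipleOutputs; the reducer below and its key layout are assumptions, not from the original post:

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    // Hypothetical reducer that routes each record to a file named after its input file.
    public class PerFileReducer extends Reducer<Text, Text, NullWritable, Text> {
        private MultipleOutputs<NullWritable, Text> mos;

        protected void setup(Context context) {
            mos = new MultipleOutputs<NullWritable, Text>(context);
        }

        protected void reduce(Text fileName, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text v : values) {
                // third argument is the base output path, here derived from the key
                mos.write(NullWritable.get(), v, fileName.toString());
            }
        }

        protected void cleanup(Context context) throws IOException, InterruptedException {
            mos.close();
        }
    }

Because the third argument to write() is a base output path, each distinct key ends up in its own file under the job's output directory.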

Re: Running Hadoop on demand

2009-12-28 Thread Edward Capriolo
Jim, Most components of Hadoop now have a documentation branch per release; that documentation is inline and is generated by Forrest. Anything that you find not on the wiki, with a version number, should have a more up-to-date page. For example: http://hadoop.apache.org/common/ Drop down 'Documentatio

Re: Text coding

2009-12-28 Thread Aram Mkhitaryan
The point is that in Java everything is an object, the byte[] as well. So when you call byte[].toString(), as usual you get '[', then a letter that defines the type ('B'), then '@', and then the hash code. This is the standard toString() implementation for an array object. I would recommend to impleme
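A standalone illustration of what Aram describes (the array contents here are made up):

    import java.util.Arrays;

    public class ArrayToStringDemo {
        public static void main(String[] args) {
            byte[] bits = {1, 0, 1, 1};
            // Arrays inherit Object.toString(): '[' + type code ('B' for byte) + '@' + hex hash code
            System.out.println(bits.toString());        // something like [B@1b6d3586
            // The contents have to be written out element by element (or via a helper)
            System.out.println(Arrays.toString(bits));  // [1, 0, 1, 1]
        }
    }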

Running Hadoop on demand

2009-12-28 Thread Jim Twensky
Hi, I'd like to get Hadoop running on a large university cluster that is used by many people to run different types of applications. We are currently using Torque to assign nodes and manage the queue. What I want to do is to enable people to request "n" processors, and automatically start Hadoop

Re: Text coding

2009-12-28 Thread Todd Lipcon
Furthermore, Text is meant for use when you have a UTF-8-encoded string. Creating a Text object from a byte array that is not proper UTF-8 is likely to result in some kind of exception or data mangling. You should use BytesWritable for this purpose. -Todd 2009/12/28 Edward Capriolo > Calling bitA
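A minimal sketch of the BytesWritable route Todd suggests, assuming the filter is kept as a plain byte[] (the class and method names are hypothetical):

    import org.apache.hadoop.io.BytesWritable;

    public class BloomFilterValue {
        // BytesWritable carries the raw bytes through MapReduce serialization unchanged,
        // whereas Text assumes the bytes are valid UTF-8.
        public static BytesWritable wrap(byte[] bitArray) {
            return new BytesWritable(bitArray);
        }
    }

The job would then declare BytesWritable as its output value class instead of Text.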

Re: Text coding

2009-12-28 Thread Edward Capriolo
Calling bitArray.toString() does not return your data. You can test this in a standalone program. You need to write the array out bitwise or byte-wise; toString() does not do what you want. Edward 2009/12/28 Gang Luo : > Hi all, > I don't know too much about text coding and there is one thing con

Reduce hangs, FileNotFoundException

2009-12-28 Thread Ferreira, Herve (NSN - PT/Amadora)
Hi, I'm really frustrated because I've already lost some days trying to deploy Hadoop, but it doesn't work. If I deploy on a single node, everything works fine (the MapReduce example as well as the deployment). However, when I try to install Hadoop on a cluster, the problems appear. The conf

Text coding

2009-12-28 Thread Gang Luo
Hi all, I don't know too much about text encoding, and there is one thing confusing me. I am implementing a Bloom filter in MapReduce. The output is a bit array (implemented as a byte[]) whose length is 2^24 bits (that is, 2^21 bytes), so the size of the array should be 2 MB. But when I output