Re: Hadoop summit video capture?

2008-03-26 Thread Enis Soztutar
+1 Otis Gospodnetic wrote: Hi, Wasn't there going to be a live stream from the Hadoop summit? I couldn't find any references on the event site/page, and searches on Veoh, YouTube and Google Video yielded nothing. Is an archived version of the video (going to be) available? Thanks, Otis --

Re: small sized files - how to use MultiInputFileFormat

2008-04-01 Thread Enis Soztutar
Hi, An example extracting one record per file would be: public class FooInputFormat extends MultiFileInputFormat { @Override public RecordReader getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException { return new FooRecordReader(job, (MultiFileSplit)spli
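
A fuller, self-contained sketch along the same lines, assuming one whole-file Text record per file (the FooRecordReader body is a guess at what the original showed, not the actual code from the post):

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MultiFileInputFormat;
import org.apache.hadoop.mapred.MultiFileSplit;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class FooInputFormat extends MultiFileInputFormat<Text, Text> {
  @Override
  public RecordReader<Text, Text> getRecordReader(InputSplit split, JobConf job,
      Reporter reporter) throws IOException {
    return new FooRecordReader(job, (MultiFileSplit) split);
  }

  // Emits one record per file: key = file path, value = whole file content.
  public static class FooRecordReader implements RecordReader<Text, Text> {
    private final JobConf job;
    private final MultiFileSplit split;
    private int index = 0; // which file of the split we are on

    public FooRecordReader(JobConf job, MultiFileSplit split) {
      this.job = job;
      this.split = split;
    }

    public boolean next(Text key, Text value) throws IOException {
      if (index >= split.getNumPaths()) return false;
      Path path = split.getPath(index);
      byte[] buf = new byte[(int) split.getLength(index)];
      FSDataInputStream in = path.getFileSystem(job).open(path);
      try {
        in.readFully(0, buf);
      } finally {
        in.close();
      }
      key.set(path.toString());
      value.set(buf, 0, buf.length);
      index++;
      return true;
    }

    public Text createKey() { return new Text(); }
    public Text createValue() { return new Text(); }
    public long getPos() { return index; }
    public float getProgress() { return index / (float) split.getNumPaths(); }
    public void close() throws IOException { }
  }
}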

Re: I need your help sincerely!

2008-04-22 Thread Enis Soztutar
Hi, The number of map tasks is supposed to be greater than the number of machines, so in your configuration, 6 map tasks is ok. However, the problem must lie elsewhere. Have you changed the code for word count? Please ensure that the example code is unchanged and your configuration is right.

Re: Best practices for handling many small files

2008-04-24 Thread Enis Soztutar
A shameless attempt to defend MultiFileInputFormat: A concrete implementation of MultiFileInputFormat is not needed, since every InputFormat relying on MultiFileInputFormat is expected to have its own custom RecordReader implementation and thus needs to override getRecordReader() anyway. An implementat

Re: JobConf: How to pass List/Map

2008-04-30 Thread Enis Soztutar
Hi, There are many ways in which you can pass objects using the configuration. Possibly the easiest way would be to use the Stringifier interface. You can, for example: DefaultStringifier.store(conf, variable, "mykey"); variable = DefaultStringifier.load(conf, "mykey", variableClass); you should take
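
Spelled out, the store/load pair from the post might look like this (a Text stands in for the variable here; any object serializable via Writable or Java Serialization works the same way):

import org.apache.hadoop.io.DefaultStringifier;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;

public class StringifierExample {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();

    // Job setup side: serialize the object into the configuration.
    Text variable = new Text("some shared state");
    DefaultStringifier.store(conf, variable, "mykey");

    // Task side (e.g. in Mapper.configure): read it back by key and class.
    Text restored = DefaultStringifier.load(conf, "mykey", Text.class);
    System.out.println(restored);
  }
}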

Re: JobConf: How to pass List/Map

2008-04-30 Thread Enis Soztutar
strings, then you can directly store them in conf. The other obvious alternative would be to switch to 0.17, once it is out. Tarandeep Singh wrote: On Wed, Apr 30, 2008 at 5:11 AM, Enis Soztutar <[EMAIL PROTECTED]> wrote: Hi, There are many ways which you can pass objects

Re: JobConf: How to pass List/Map

2008-05-02 Thread Enis Soztutar
It is exactly what DefaultStringifier does, ugly but useful *smile*. Jason Venner wrote: We have been serializing to a bytearrayoutput stream then base64 encoding the underlying byte array and passing that string in the conf. It is ugly but it works well until 0.17 Enis Soztutar wrote: Yes
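
The hand-rolled variant Jason describes is only a few lines; a sketch (using java.util.Base64 for brevity, where code of that era would have used something like commons-codec instead):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Base64;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class Base64ConfExample {
  // Serialize a Writable to bytes, base64-encode, and stash the string in conf.
  static void store(Configuration conf, Writable w, String key) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    w.write(new DataOutputStream(bytes));
    conf.set(key, Base64.getEncoder().encodeToString(bytes.toByteArray()));
  }

  // Decode the string from conf and rehydrate the Writable.
  static void load(Configuration conf, Writable w, String key) throws IOException {
    byte[] bytes = Base64.getDecoder().decode(conf.get(key));
    w.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    store(conf, new Text("hello"), "my.blob");
    Text out = new Text();
    load(conf, out, "my.blob");
    System.out.println(out); // prints: hello
  }
}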

Re: Why ComparableWritable does not take a template?

2008-05-12 Thread Enis Soztutar
Hi, WritableComparable uses generics in trunk, but if you use 0.16.x you cannot use that version. WritableComparable is not generified there yet due to legacy reasons, but the work is in progress. The problem with your code arises from WritableComparator.newKey(). It seems your object cannot be
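
The usual culprit: WritableComparator.newKey() creates key instances reflectively, so the key class needs a no-argument constructor (declaring it public is safest). A minimal sketch against the non-generified 0.16 interface, with a hypothetical MyKey:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class MyKey implements WritableComparable {
  private long id;

  // Required: WritableComparator.newKey() instantiates keys by reflection.
  public MyKey() { }

  public MyKey(long id) { this.id = id; }

  public void write(DataOutput out) throws IOException { out.writeLong(id); }
  public void readFields(DataInput in) throws IOException { id = in.readLong(); }

  public int compareTo(Object o) {
    long other = ((MyKey) o).id;
    return id < other ? -1 : (id == other ? 0 : 1);
  }
}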

Re: non-static map or reduce classes?

2008-05-21 Thread Enis Soztutar
Hi, Static inner classes and static fields are different things in Java. Hadoop needs to instantiate the Mapper and Reducer classes from their class names, so if they are defined as inner classes, they need to be static. You can either declare the inner classes static, and use the
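
For example, a hypothetical MyJob with its Mapper nested inside; without the static modifier, Hadoop could not instantiate MyMapper from its class name alone:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyJob {
  // static is essential: a non-static inner class has no no-arg constructor
  // usable by reflection, because it needs an enclosing MyJob instance.
  public static class MyMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      output.collect(value, ONE);
    }
  }
}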

Re: How long is Hadoop's full unit test suite expected to run?

2008-06-17 Thread Enis Soztutar
Lukas Vlcek wrote: Hi, How long is Hadoop's full unit test suite expected to run? How do you go about running Hadoop tests? I found that it can take hours for the [ant test] target to run, which does not seem to be very efficient for development. Is there anything I can do to speed up tests (like runni

Re: How long is Hadoop's full unit test suite expected to run?

2008-06-17 Thread Enis Soztutar
ake a big difference. What kind of HW are you using for Hadoop testing? I would definitely appreciate it if [ant test] ran under an hour. Regards, Lukas On Tue, Jun 17, 2008 at 10:26 AM, Enis Soztutar <[EMAIL PROTECTED]> wrote: Lukas Vlcek wrote: Hi, How long is Hadoop's full uni

Re: Hadoop supports RDBMS?

2008-06-25 Thread Enis Soztutar
Yes, there is a way to use a DBMS over JDBC. The feature is not released yet, but you can try it out and give us valuable feedback. You can find the patch and the jira issue at: https://issues.apache.org/jira/browse/HADOOP-2536 Lakshmi Narayanan wrote: Has anyone tried using any RDBMS wit
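
A sketch of how the patch is wired up; the record class, table, and connection details are invented for illustration, and the real API may differ slightly from the patch version you apply:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;

// One row of a hypothetical "employees" table with a single int column.
public class EmployeeRecord implements Writable, DBWritable {
  int id;

  public void readFields(ResultSet rs) throws SQLException { id = rs.getInt("id"); }
  public void write(PreparedStatement ps) throws SQLException { ps.setInt(1, id); }
  public void readFields(DataInput in) throws IOException { id = in.readInt(); }
  public void write(DataOutput out) throws IOException { out.writeInt(id); }

  public static void configureInput(JobConf job) {
    job.setInputFormat(DBInputFormat.class);
    DBConfiguration.configureDB(job, "com.mysql.jdbc.Driver",
        "jdbc:mysql://localhost/mydb", "user", "password");
    DBInputFormat.setInput(job, EmployeeRecord.class,
        "employees", null /* conditions */, "id" /* orderBy */, "id");
  }
}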

Re: edge count question

2008-06-27 Thread Enis Soztutar
Cam Bazz wrote: hello, I have a lucene index storing documents which hold src and dst words. Word pairs may repeat (it is a multigraph). I want to use hadoop to count how many of the same word pairs there are. I have looked at the aggregate word count example, and I understand that if I make a
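
The word-count trick carries over if the pair itself becomes the key; a sketch of such a mapper, assuming one "src dst" pair per input line (an assumption about the dump format) and the stock word-count sum reducer downstream:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class EdgeCount {
  // Emit the (src, dst) pair as one composite key; summing the 1s in the
  // reducer then gives the multiplicity of each edge in the multigraph.
  public static class EdgeMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    public void map(LongWritable offset, Text line,
        OutputCollector<Text, IntWritable> out, Reporter reporter)
        throws IOException {
      String[] words = line.toString().split("\\s+");
      if (words.length == 2) {
        out.collect(new Text(words[0] + "\t" + words[1]), ONE);
      }
    }
  }
}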

Re: edge count question

2008-06-27 Thread Enis Soztutar
alternatives to nutch? Yes, of course. There are all sorts of open-source crawlers / indexers. Best Regards -C.B. On Fri, Jun 27, 2008 at 10:08 AM, Enis Soztutar <[EMAIL PROTECTED]> wrote: Cam Bazz wrote: hello, I have a lucene index storing documents which hold src and dst

Re: MultiFileInputFormat - Not enough mappers

2008-07-11 Thread Enis Soztutar
MultiFileSplit currently does not support automatic map task count computation. You can manually set the number of maps via jobConf#setNumMapTasks() or via the command-line arg -D mapred.map.tasks= Goel, Ankur wrote: Hi Folks, I am using hadoop to process some temporal data which is
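
The work-around in code, assuming you can compute the input's total size yourself (the one-map-per-block heuristic comes from the reply below):

import org.apache.hadoop.mapred.JobConf;

public class MapCount {
  // MultiFileInputFormat will not derive this for you: aim for roughly one
  // map task per block of input.
  static void setMapsByBlock(JobConf conf, long totalSize, long blockSize) {
    conf.setNumMapTasks((int) ((totalSize + blockSize - 1) / blockSize));
  }
}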

Re: MultiFileInputFormat - Not enough mappers

2008-07-11 Thread Enis Soztutar
tasks in the application - (totalSize / blockSize), which is what I am doing as a work-around. I think this should be the default behaviour in MultiFileInputFormat. Should a JIRA be opened for the same ? -Ankur -Original Message- From: Enis Soztutar [mailto:[EMAIL PROTECTED] Sent: Friday

Re: MultiFileInputFormat and gzipped files

2008-08-05 Thread Enis Soztutar
MultiFileWordCount uses its own RecordReader, namely MultiFileLineRecordReader. This is different from LineRecordReader, which automatically detects the file's codec and decodes it. You can write a custom RecordReader similar to LineRecordReader and MultiFileLineRecordReader, or just add c
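
The codec lookup that LineRecordReader performs is small enough to lift into your own reader; a sketch of the relevant part (class and method names invented):

import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecAwareOpen {
  // Look up the codec by file name (e.g. .gz) and wrap the raw stream if one
  // matches; otherwise return the stream as-is.
  static InputStream open(Configuration conf, Path file) throws IOException {
    FileSystem fs = file.getFileSystem(conf);
    InputStream in = fs.open(file);
    CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(file);
    return codec == null ? in : codec.createInputStream(in);
  }
}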

Re: How I should use hadoop to analyze my logs?

2008-08-15 Thread Enis Soztutar
You can use Chukwa, which is a contrib in the trunk for collecting log entries from web servers. You can run adaptors on the web servers and a collector on the log server. The log entries may not be analyzed in real time, but it should be close to real time. I suggest you use Pig for log data

Re: Any InputFormat class implementation for Database records

2008-08-26 Thread Enis Soztutar
There is a patch; you can try it out and share your experience: https://issues.apache.org/jira/browse/HADOOP-2536 ruchir wrote: Hi, I want to know whether there is any implementation of the InputFormat class in Hadoop which can read data from a database instead of from HDFS while processing any hadoop Jo

Re: Questions about Hadoop

2008-09-24 Thread Enis Soztutar
for batch jobs; however, these jobs can be chained together to form a workflow. I can try to be more helpful if you expand on what you mean by workflow. Enis Soztutar Regards Arijit Dr. Arijit Mukherjee Principal Member of Technical Staff, Level-II Connectiva Systems (I) Pvt. Ltd. J-2
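
In the simplest form, chaining is just sequential runJob calls where each job reads its predecessor's output directory; a minimal sketch, assuming plain identity map/reduce steps and paths taken from the command line:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TwoStepWorkflow {
  public static void main(String[] args) throws Exception {
    Path input = new Path(args[0]);
    Path intermediate = new Path(args[1]);
    Path output = new Path(args[2]);

    // Step 1: runJob blocks until the job finishes, so step 2 only starts
    // once step 1's output is complete.
    JobConf step1 = new JobConf(TwoStepWorkflow.class);
    FileInputFormat.setInputPaths(step1, input);
    FileOutputFormat.setOutputPath(step1, intermediate);
    JobClient.runJob(step1);

    // Step 2: consumes the intermediate directory produced by step 1.
    JobConf step2 = new JobConf(TwoStepWorkflow.class);
    FileInputFormat.setInputPaths(step2, intermediate);
    FileOutputFormat.setOutputPath(step2, output);
    JobClient.runJob(step2);
  }
}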

Re: 1 file per record

2008-09-24 Thread Enis Soztutar
Yes, you can use MultiFileInputFormat. You can extend MultiFileInputFormat to return a RecordReader that reads one record for each file in the MultiFileSplit. Enis chandra wrote: hi.. By setting isSplitable false, we can send 1 file with n records to 1 mapper. Is there any way to set 1 com

Re: Questions about Hadoop

2008-09-24 Thread Enis Soztutar
Enis Arijit Dr. Arijit Mukherjee Principal Member of Technical Staff, Level-II Connectiva Systems (I) Pvt. Ltd. J-2, Block GP, Sector V, Salt Lake Kolkata 700 091, India Phone: +91 (0)33 23577531/32 x 107 http://www.connectivasystems.com -Original Message- From: Enis Soztutar [mailto:[

Re: 1 file per record

2008-09-24 Thread Enis Soztutar
Nope, not right now. But this has come up before. Perhaps you will contribute one? chandravadana wrote: thanks is there any built-in record reader which performs this function.. Enis Soztutar wrote: Yes, you can use MultiFileInputFormat. You can extend the MultiFileInputFormat to

Re: Hadoop with image processing

2008-10-15 Thread Enis Soztutar
From my understanding of the problem, you can - keep the image binary data in sequence files - copy the image whose similar images will be searched for to dfs with high replication - in each map, calculate the similarity to the image - output only the similar images from the map - no need for a reduce st
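
Two of those points translate directly into a couple of calls; a minimal sketch with an invented query path and replication factor:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class ImageSearchSetup {
  static void configure(JobConf conf) throws Exception {
    // No reduce step: with zero reducers the map output is the job output.
    conf.setNumReduceTasks(0);

    // Replicate the query image widely so most maps read a local copy.
    FileSystem fs = FileSystem.get(conf);
    fs.setReplication(new Path("/queries/query.jpg"), (short) 10);
  }
}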

Re: Anyone have a Lucene index InputFormat for Hadoop?

2008-11-12 Thread Enis Soztutar
I recommend you check nutch's src, which includes classes for Index input/output from mapred. Anthony Urso wrote: Anyone have a Lucene index InputFormat already implemented? Failing that, how about a Writable for the Lucene Document class? Cheers, Anthony

Re: how to pass an object to mapper

2008-12-23 Thread Enis Soztutar
There are several ways you can pass static information to tasks in Hadoop. The first is to store it in the conf via DefaultStringifier, which needs the object to be serializable through either the Writable or Serializable interface. The second way would be to save/serialize the data to a file and send it vi
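
The file-based option usually means DistributedCache; a sketch with an invented file name:

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class SideDataExample {
  // Job setup: register a file already in HDFS with the cache.
  static void addSideFile(JobConf conf) throws Exception {
    DistributedCache.addCacheFile(new URI("/data/lookup.dat"), conf);
  }

  // Task side (e.g. in Mapper.configure): the file appears on local disk.
  static Path[] localCopies(JobConf conf) throws IOException {
    return DistributedCache.getLocalCacheFiles(conf);
  }
}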

Re: HADOOP-2536 supports Oracle too?

2009-02-04 Thread Enis Soztutar
Hadoop-2536 connects to the db via JDBC, so in theory it should work with proper JDBC drivers. It has been tested against MySQL, Hsqldb, and PostgreSQL, but not Oracle. To answer your earlier question, the actual SQL statements might not be recognized by Oracle, so I suggest the best way to test

Re: HADOOP-2536 supports Oracle too?

2009-02-05 Thread Enis Soztutar
Wed, Feb 4, 2009 at 7:13 AM, Enis Soztutar wrote: Hadoop-2536 connects to the db via JDBC, so in theory it should work with proper JDBC drivers. It has been tested against MySQL, Hsqldb, and PostgreSQL, but not Oracle. To answer your earlier question, the actual SQL statements might not b

Re: How to use DBInputFormat?

2009-02-05 Thread Enis Soztutar
Please see below, Stefan Podkowinski wrote: As far as I understand, the main problem is that you need to create splits from streaming data with an unknown number of records and offsets. It's just the same problem as with externally compressed data (.gz). You need to go through the complete stream

Re: re : How to use MapFile in C++ program

2009-02-06 Thread Enis Soztutar
There is currently no way to read MapFiles in any language other than Java. You can write a JNI wrapper similar to libhdfs. Alternatively, you can write the complete stack from scratch; however, this might prove very difficult or impossible. You might want to check the ObjectFile/TFile speci

Re: HADOOP-2536 supports Oracle too?

2009-02-17 Thread Enis Soztutar
simple database connectivity program on it. Could you please tell me how you went about it? My mail id is "sandys_cr...@yahoo.com". A copy of your code that successfully connected to MySQL will also be helpful. Thanks, Sandhiya Enis Soztutar-2 wrote: From the exception:

Re: how to optimize mapreduce procedure??

2009-03-13 Thread Enis Soztutar
ZhiHong Fu wrote: Hello, I'm writing a program which will run lucene searches over about 12 index directories, all of them stored in HDFS. It is done like this: 1. We get about 12 index directories through lucene index functionality, each of which is about 100M in size, 2. We store thes

Re: merging files

2009-03-18 Thread Enis Soztutar
Use MultipleInputs with two different mappers for the inputs. map1 should be the IdentityMapper; mapper 2 should output key/value pairs where the value is a pseudo marker value (the same for all keys), which marks that the value is null/empty. In the reducer, just output the key/value pairs which do not
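
A sketch of that setup, assuming both inputs are tab-separated key/value text (KeyValueTextInputFormat) so the IdentityMapper preserves the real keys; the marker string and class names are invented:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class FilterJob {
  private static final String MARKER = "\u0000MARKER"; // pseudo marker value

  // Input 2's mapper: tag every key with the marker value.
  public static class MarkerMapper extends MapReduceBase
      implements Mapper<Text, Text, Text, Text> {
    public void map(Text key, Text value, OutputCollector<Text, Text> out,
        Reporter reporter) throws IOException {
      out.collect(key, new Text(MARKER));
    }
  }

  // Keep only keys that never saw the marker, i.e. keys absent from input 2.
  public static class FilterReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
      boolean marked = false;
      Text kept = null;
      while (values.hasNext()) {
        Text v = values.next();
        if (MARKER.equals(v.toString())) marked = true;
        else kept = new Text(v);
      }
      if (!marked && kept != null) out.collect(key, kept);
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(FilterJob.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    MultipleInputs.addInputPath(conf, new Path(args[0]),
        KeyValueTextInputFormat.class, IdentityMapper.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]),
        KeyValueTextInputFormat.class, MarkerMapper.class);
    conf.setReducerClass(FilterReducer.class);
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));
    JobClient.runJob(conf);
  }
}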

Re: Small Test Data Sets

2009-03-25 Thread Enis Soztutar
Patterson, Josh wrote: I want to confirm something with the list that I'm seeing; I needed to confirm that my Reader was reading our file format correctly, so I created a MR job that simply output each K/V pair to the reducer, which then just wrote out each one to the output file. This allows

Re: what change to be done in OutputCollector to print custom writable object

2009-04-01 Thread Enis Soztutar
Deepak Diwakar wrote: Hi, I am learning how to get a custom Writable working. So I have implemented a simple MyWritable class, and I can play with the MyWritable object within Map-Reduce. But suppose in Reduce the values are of type MyWritable and I put them into the OutputCollector to g
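
The reply is cut off above, but the usual fix is worth spelling out: TextOutputFormat prints keys and values via toString(), so the Writable just needs to override it. A sketch of such a MyWritable (the fields are invented):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class MyWritable implements Writable {
  private int count;
  private String label = "";

  public void write(DataOutput out) throws IOException {
    out.writeInt(count);
    out.writeUTF(label);
  }

  public void readFields(DataInput in) throws IOException {
    count = in.readInt();
    label = in.readUTF();
  }

  // TextOutputFormat writes value.toString() to the output file; without this
  // override you get Object's default "ClassName@hashcode" form.
  @Override
  public String toString() {
    return label + ":" + count;
  }
}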

Hadoop Presentation at Ankara / Turkey

2009-04-16 Thread Enis Soztutar
Hi all, I will be giving a presentation on Hadoop at "1. Ulusal Yüksek Başarım ve Grid Konferansı" (the 1st National High Performance and Grid Computing Conference) tomorrow (Apr 17, 13:10). The conference is at KKM, ODTU/Ankara/Turkey. The presentation will be in Turkish. All the Hadoop users and wanna-be users in the area are welcome to attend. More i

Re: hadoop file system browser

2008-01-22 Thread Enis Soztutar
The WebDAV interface for hadoop works as-is, but it needs a major redesign to be scalable; it is still useful, however. It has even been used with Windows Explorer by defining the webdav server as a remote service. Ted Dunning wrote: There has been significant work on building a web-DAV interface f

Re: hadoop file system browser

2008-01-22 Thread Enis Soztutar
webdav has faded for now. Alban Chevignard wrote: What are the scalability issues associated with the current WebDAV interface? Thanks, -Alban On Jan 22, 2008 7:27 AM, Enis Soztutar <[EMAIL PROTECTED]> wrote: Webdav interface for hadoop works as it is, but it needs a major redesign

Re: hadoop file system browser

2008-01-24 Thread Enis Soztutar
n On Jan 23, 2008 2:53 AM, Enis Soztutar <[EMAIL PROTECTED]> wrote: As you know, the dfs client connects to the individual datanodes to read/write data and has minimal interaction with the Namenode, which improves the I/O rate linearly (theoretically 1:1). However the current implementatio

Re: Why no DoubleWritable?

2008-02-05 Thread Enis Soztutar
Hi, The reason may be that nobody has needed the extra precision of a double enough to compensate for the extra space, compared to FloatWritable. If you really need DoubleWritable you may write the class, which will be straightforward, and then attach it to a jira issue so that we can add it to
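
For reference, the straightforward class would look roughly like this; Hadoop did later ship an org.apache.hadoop.io.DoubleWritable along these lines:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class DoubleWritable implements WritableComparable {
  private double value;

  public DoubleWritable() { }
  public DoubleWritable(double value) { this.value = value; }

  public void write(DataOutput out) throws IOException { out.writeDouble(value); }
  public void readFields(DataInput in) throws IOException { value = in.readDouble(); }

  public double get() { return value; }

  public int compareTo(Object o) {
    return Double.compare(value, ((DoubleWritable) o).value);
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof DoubleWritable && ((DoubleWritable) o).value == value;
  }

  @Override
  public int hashCode() {
    long bits = Double.doubleToLongBits(value);
    return (int) (bits ^ (bits >>> 32));
  }
}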

Re: Nutch Extensions to MapReduce

2008-03-06 Thread Enis Soztutar
Hi, Currently nutch is a fairly complex application that *uses* hadoop as a base for distributed computing and storage. In this regard there is no part of nutch that "extends" hadoop. The core of mapreduce indeed works with <key, value> pairs, and nutch uses its own specific pair types. So lo

Re: Difference between local mode and distributed mode

2008-03-06 Thread Enis Soztutar
Hi, LocalJobRunner uses just 0 or 1 reduce. This is because running in local mode is only supported for testing purposes. You can, however, simulate distributed mode locally by using MiniMRCluster and MiniDFSCluster under src/test. Best wishes Enis Naama Kraus wrote: Hi, I ran a simple Map

Re: Nutch Extensions to MapReduce

2008-03-06 Thread Enis Soztutar
help, I might have more concrete details of what I am trying to implement later on, now I am basically learning. Naama On Thu, Mar 6, 2008 at 3:13 PM, Enis Soztutar <[EMAIL PROTECTED]> wrote: Hi, Currently nutch is a fairly complex application that *uses* hadoop as a base for distributed

Re: Nutch Extensions to MapReduce

2008-03-06 Thread Enis Soztutar
is a List (for example, ArrayWritable) containing children names). If not, that's also fine, I was just curious :-) Naama On Thu, Mar 6, 2008 at 3:58 PM, Enis Soztutar <[EMAIL PROTECTED]> wrote: Let me explain this more technically :) An MR job takes <key, value> pairs. Each map(k1,v1) may resu

Re: displaying intermediate results of map/reduce

2008-03-06 Thread Enis Soztutar
You can also run the job in local mode with zero reducers, so that the map results are the results of the job. Prasan Ary wrote: Hi All, I am using eclipse to write a map/reduce java application that connects to hadoop on a remote cluster. Is there a way I can display intermediate results of

Re: [core-user] Processing binary files Howto??

2008-03-18 Thread Enis Soztutar
Hi, please see below, Ted Dunning wrote: This sounds very different from your earlier questions. If you have a moderate (10s to 1000s) number of binary files, then it is very easy to write a special-purpose InputFormat that tells hadoop that the file is not splittable. @ Ted, actually w
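
For the non-splittable part, a minimal sketch against the old mapred API; the class name is invented, and the record reader that feeds each whole file to a map is left abstract here:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;

// Each binary file goes to a single map task intact; implement
// getRecordReader() to hand the file's bytes to the mapper.
public abstract class WholeFileInputFormat
    extends FileInputFormat<Text, BytesWritable> {
  @Override
  protected boolean isSplitable(FileSystem fs, Path filename) {
    return false; // never split a binary file across map tasks
  }
}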