Fwd: inputformat - how to list all active trackers?

2019-10-23 Thread Marcelo Valle
Hi, I am creating a custom input format that extends FileInputFormat (org.apache.hadoop.mapreduce.lib.input.FileInputFormat). I use the new Hadoop API - hadoop-mapreduce-client-core 2.8.3 - and I am running this on AWS EMR. My intention is to spread the input files among the hosts, in a way each

Hadoop Custom InputFormat (SequenceFileInputFormat vs FileInputFormat)

2016-07-15 Thread Travis Chung
number of segments, size of subheaders and segment data (I'll need this to create my splits). To digest it all, I'm wondering if it's best to create a custom InputFormat inheriting from (1) FileInputFormat or (2) SequenceFileInputFormat. If I go with (1), I will create HeaderSplits and DataSplits

How to junit test customized Hadoop InputFormat/OutputFormat

2015-12-08 Thread Todd
Hi, I have defined an InputFormat class and an OutputFormat class. It looks to me like I have to create an MR job to test whether they work. I would like to ask whether there is a way to JUnit-test customized Hadoop InputFormat/OutputFormat classes without kicking off an MR application. Thanks!
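One way to JUnit-test such classes without an MR job (a sketch, not from the thread; the class name and format are hypothetical) is to keep the parsing logic in a plain class that needs no Hadoop context, leaving the InputFormat/RecordReader as thin wrappers around it:

```java
// Hypothetical example: keep the record-parsing logic in a plain class so
// it can be JUnit-tested without starting an MR job. The RecordReader then
// becomes a thin wrapper that feeds lines into this parser.
class TsvRecordParser {
    // Splits one raw line into key and value at the first tab.
    // Returns {key, value}, or null for malformed lines with no tab.
    public static String[] parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return null; // malformed record
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }
}
```

A JUnit test can then call `TsvRecordParser.parse` directly; only a thin integration test (e.g. against a local job runner) needs the real InputFormat.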

Changing the InputFormat

2015-02-26 Thread Arko Provo Mukherjee
Hello, I am trying to write a Hadoop program that handles JSON and hence wrote a CustomInputFormat to handle the data. The Custom format extends the RecordReader and then overrides the nextKeyValue() method. However, this doesn't solve the problem when one JSON object is split across two

A more scalable Kafka to Hadoop InputFormat

2014-10-30 Thread Casey Green
Hi Folks, I'm open sourcing a scalable Kafka InputFormat. As far as I know, my version is unique compared to other Kafka InputFormats out there, in that input splits are mapped to Kafka log files, rather than entire Kafka partitions. Mapping Kafka log files to input splits

InputFormat for dealing with log files.

2014-10-05 Thread Guillermo Ortiz
I'd like to know if there's an InputFormat able to deal with log files. The problem I have is that if I have to read a Tomcat log, for example, sometimes the exceptions are printed on several lines, but they should be processed just like one line, I mean all the lines together
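The grouping rule such an InputFormat's RecordReader would need can be sketched outside Hadoop (the timestamp pattern is an assumption): a line that starts with a date opens a new record, and any other line, e.g. a stack-trace line, is appended to the current one:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch of the grouping rule a multi-line-log RecordReader would apply:
// a line starting with a date stamp opens a new record; continuation
// lines (stack traces, wrapped messages) are appended to the previous one.
class LogRecordGrouper {
    // Assumption: records start with an ISO-like date, e.g. "2014-10-05 ...".
    private static final Pattern START = Pattern.compile("^\\d{4}-\\d{2}-\\d{2}.*");

    public static List<String> group(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (START.matcher(line).matches()) {
                if (current != null) records.add(current.toString());
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line); // continuation line
            }
        }
        if (current != null) records.add(current.toString());
        return records;
    }
}
```

Inside a real RecordReader, `nextKeyValue()` would apply the same rule incrementally: keep reading lines until the next one matches the start pattern, then emit the buffered record.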

Re: InputFormat for dealing with log files.

2014-10-05 Thread Ted Yu
Have you read http://blog.rguha.net/?p=293 ? Cheers On Sun, Oct 5, 2014 at 6:24 AM, Guillermo Ortiz gor...@pragsis.com wrote: I'd like to know if there's an InputFormat to be able to deal with log files. The problem that I have it's that if I have to read an Tomcat log for example

Re: InputFormat for dealing with log files.

2014-10-05 Thread Guillermo Ortiz
, although it seems too expensive in time process and the operations in the InputFormat should be pretty fast. Any better idea? - Mensaje original - De: Ted Yu yuzhih...@gmail.com Para: common-u...@hadoop.apache.org user@hadoop.apache.org Enviados: Domingo, 5 de Octubre 2014 16:27:18

Re: InputFormat for dealing with log files.

2014-10-05 Thread Ted Yu
and the operations in the InputFormat should be pretty fast. Any better idea? -- *De: *Ted Yu yuzhih...@gmail.com *Para: *common-u...@hadoop.apache.org user@hadoop.apache.org *Enviados: *Domingo, 5 de Octubre 2014 16:27:18 *Asunto: *Re: InputFormat for dealing

Re: Hadoop InputFormat - Processing large number of small files

2014-09-01 Thread rab ra
wholeFileInputFormat. But i am not sure filename comes to map process either as key or value. But, I think this file format reads the contents of the file. I wish to have a inputformat that just gives filename or list of filenames. Also, files are very small. The wholeFileInputFormat spans one map process

Re: Hadoop InputFormat - Processing large number of small files

2014-08-26 Thread rab ra
comes to map process either as key or value. But, I think this file format reads the contents of the file. I wish to have a inputformat that just gives filename or list of filenames. Also, files are very small. The wholeFileInputFormat spans one map process per file and thus results huge number

Re: Hadoop InputFormat - Processing large number of small files

2014-08-21 Thread Felix Chern
. But i am not sure filename comes to map process either as key or value. But, I think this file format reads the contents of the file. I wish to have a inputformat that just gives filename or list of filenames. Also, files are very small. The wholeFileInputFormat spans one map process per

Re: Hadoop InputFormat - Processing large number of small files

2014-08-21 Thread rab ra
filename comes to map process either as key or value. But, I think this file format reads the contents of the file. I wish to have a inputformat that just gives filename or list of filenames. Also, files are very small. The wholeFileInputFormat spans one map process per file and thus results

RE: Hadoop InputFormat - Processing large number of small files

2014-08-21 Thread java8964
that. Yong Date: Thu, 21 Aug 2014 22:26:12 +0530 Subject: Re: Hadoop InputFormat - Processing large number of small files From: rab...@gmail.com To: user@hadoop.apache.org Hello, This means that a file with names of all the files that need to be processed and is fed to hadoop with NLineInputFormat

Re: Hadoop InputFormat - Processing large number of small files

2014-08-20 Thread Shahab Yunus
Have you looked at the WholeFileInputFormat implementations? There are quite a few if search for them... http://hadoop-sandy.blogspot.com/2013/02/wholefileinputformat-in-java-hadoop.html https://github.com/tomwhite/hadoop-book/blob/master/ch07/src/main/java/WholeFileInputFormat.java Regards,

Re: Hadoop InputFormat - Processing large number of small files

2014-08-20 Thread rab ra
Thanks for the response. Yes, I know wholeFileInputFormat, but I am not sure whether the filename comes to the map process as key or value. I think this file format reads the contents of the file; I wish to have an InputFormat that just gives the filename or a list of filenames. Also, the files are very small

Re: Hadoop InputFormat - Processing large number of small files

2014-08-20 Thread Felix Chern
On Aug 20, 2014, at 8:19 AM, rab ra rab...@gmail.com wrote: Thanks for the response. Yes, I know wholeFileInputFormat. But i am not sure filename comes to map process either as key or value. But, I think this file format reads the contents of the file. I wish to have a inputformat

Hadoop InputFormat - Processing large number of small files

2014-08-19 Thread rab ra
Hello, I have a use case wherein I need to process a huge set of files stored in HDFS. Those files are non-splittable and they need to be processed as a whole. Here, I have the following questions, for which I need answers to proceed further. 1. I wish to schedule the map process in task

Re: InputFormat and InputSplit - Network location name contains /:

2014-04-11 Thread Patcharee Thongtra
done in your current InputFormat implementation. If you're looking to store a single file path, use the FileSplit class, or if not as simple as that, do use it as a base reference to build you Path based InputSplit derivative. Its sources are at https://github.com/apache/hadoop-common/blob/release

InputFormat and InputSplit - Network location name contains /:

2014-04-10 Thread Patcharee Thongtra
Hi, I wrote a custom InputFormat and InputSplit to handle netcdf files. I use them with a custom Pig load function. When I submitted a job by running a Pig script, I got the error below. From the error log, the network location name is hdfs://service-1-0.local:8020/user/patcharee/netcdf_data

Re: InputFormat and InputSplit - Network location name contains /:

2014-04-10 Thread Harsh J
Do not use the InputSplit's getLocations() API to supply your file path, it is not intended for such things, if thats what you've done in your current InputFormat implementation. If you're looking to store a single file path, use the FileSplit class, or if not as simple as that, do use

Re: A couple of Questions on InputFormat

2013-09-23 Thread Harsh J
Hi, (I'm assuming 1.0~ MR here) On Sun, Sep 22, 2013 at 1:00 AM, Steve Lewis lordjoe2...@gmail.com wrote: Classes implementing InputFormat implement public List<InputSplit> getSplits(JobContext job), which returns a List of InputSplits. For FileInputFormat the splits have Path, start and end. 1) When

Re: A couple of Questions on InputFormat

2013-09-23 Thread Steve Lewis
lordjoe2...@gmail.com wrote: Classes implementing InputFormat implement public List<InputSplit> getSplits(JobContext job), which returns a List of InputSplits. For FileInputFormat the splits have Path, start and end. 1) When is this method called and on which JVM on which machine and is it called

Re: A couple of Questions on InputFormat

2013-09-23 Thread Harsh J
ha...@cloudera.com wrote: Hi, (I'm assuming 1.0~ MR here) On Sun, Sep 22, 2013 at 1:00 AM, Steve Lewis lordjoe2...@gmail.com wrote: Classes implementing InputFormat implement public List<InputSplit> getSplits(JobContext job), which returns a List of InputSplits. For FileInputFormat the splits

A couple of Questions on InputFormat

2013-09-21 Thread Steve Lewis
Classes implementing InputFormat implement public List<InputSplit> getSplits(JobContext job), which returns a List of InputSplits. For FileInputFormat the splits have Path, start and end. 1) When is this method called, on which JVM on which machine, and is it called only once? 2) Do the number of Map task
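On question 1: getSplits() runs once, in the client JVM that submits the job, not on the cluster nodes. On question 2: yes, each split becomes one map task, and for FileInputFormat the split size is the block size clamped by the configured minimum and maximum split sizes. The clamping rule (mirroring FileInputFormat.computeSplitSize) is just:

```java
// The split-size rule used by FileInputFormat.computeSplitSize(): the HDFS
// block size clamped between the configured minimum and maximum split
// sizes. Each resulting split becomes exactly one map task.
class SplitSize {
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }
}
```

So with the defaults (minSize 1, maxSize Long.MAX_VALUE) a split is one HDFS block, and the number of map tasks is roughly totalInputBytes / blockSize.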

Re: Which InputFormat to use?

2013-07-05 Thread Azuryy Yu
Use the InputFormat under the mapreduce package; the mapred package is very old. Generally you can extend FileInputFormat under the o.a.h.mapreduce package. On Fri, Jul 5, 2013 at 1:23 PM, Devaraj k devara...@huawei.com wrote: Hi Ahmed, Hadoop 0.20.0 included

Which InputFormat to use?

2013-07-04 Thread Ahmed Eldawy
Hi I'm developing a new set of InputFormats that are used for a project I'm doing. I found that there are two ways to create a new InputFormat. 1- Extend the abstract class org.apache.hadoop.mapreduce.InputFormat 2- Implement the interface org.apache.hadoop.mapred.InputFormat I don't know why

RE: Which InputFormat to use?

2013-07-04 Thread Otto Mok
InputFormat to use? Hi I'm developing a new set of InputFormats that are used for a project I'm doing. I found that there are two ways to create a new InputFormat. 1- Extend the abstract class org.apache.hadoop.mapreduce.InputFormat 2- Implement the interface org.apache.hadoop.mapred.InputFormat I don't

RE: Which InputFormat to use?

2013-07-04 Thread Devaraj k
From: Ahmed Eldawy [mailto:aseld...@gmail.com] Sent: 05 July 2013 00:00 To: user@hadoop.apache.org Subject: Which InputFormat to use? Hi I'm developing a new set of InputFormats that are used for a project I'm doing. I found that there are two ways to create a new InputFormat. 1- Extend

Re: Inputformat

2013-06-22 Thread Azuryy Yu
in a nail (json file) with a screwdriver ( XMLInputReader) then perhaps the reason it won't work may be that you are using the wrong tool? On Jun 21, 2013 11:38 PM, jamal sasha jamalsha...@gmail.com wrote: Hi, I am using one of the libraries which rely on InputFormat. Right now

Re: Inputformat

2013-06-21 Thread Niels Basjes
If you try to hammer in a nail (json file) with a screwdriver ( XMLInputReader) then perhaps the reason it won't work may be that you are using the wrong tool? On Jun 21, 2013 11:38 PM, jamal sasha jamalsha...@gmail.com wrote: Hi, I am using one of the libraries which rely on InputFormat

Passing values from InputFormat via the Configuration object

2013-05-17 Thread Public Network Services
Hi... I need to communicate some proprietary number (long) values from the getSplits() method of a custom InputFormat class to the Hadoop driver class (used to launch the job), but the JobContext object passed to the getSplits() method has no access to a Counters object. From the source code

Re: Passing values from InputFormat via the Configuration object

2013-05-17 Thread Public Network Services
counters = work.getCounters(); } Would that be correct? On Fri, May 17, 2013 at 5:33 PM, Public Network Services publicnetworkservi...@gmail.com wrote: Hi... I need to communicate some proprietary number (long) values from the getSplits() method of a custom InputFormat class to the Hadoop driver

Help me improve this InputFormat/Loader

2013-04-11 Thread Mark
. We have something like the following: https://gist.github.com/anonymous/5364554 (The naming is a little off since it's technically not an InputFormat. Any ideas on a proper name?) Basically it retrieves all directories for a given path and sorts them in descending order, limiting to the last

Re: InputFormat for some REST api

2013-02-19 Thread Robert Evans
Subject: InputFormat for some REST api Hi, Do you know of any InputFormat implemented for some REST api provider? Usually when one needs to process data that is accessible only by REST, one should try to download the data first somehow, but what if you cannot download it? thanks

Re: InputFormat for some REST api

2013-02-19 Thread Mohammad Tariq
@hadoop.apache.org Date: Tuesday, February 19, 2013 4:49 AM To: user@hadoop.apache.org user@hadoop.apache.org Subject: InputFormat for some REST api Hi, Do you know of any InputFormat implemented for some REST api provider? Usually when one needs to process data that is accessible only by REST

Re: InputFormat for some REST api

2013-02-19 Thread Yaron Gonen
@hadoop.apache.org Subject: InputFormat for some REST api Hi, Do you know of any InputFormat implemented for some REST api provider? Usually when one needs to process data that is accessible only by REST, one should try to download the data first somehow, but what if you cannot download it? thanks

Re: InputFormat for some REST api

2013-02-19 Thread Alex Thieme
. --Bobby From: Yaron Gonen yaron.go...@gmail.com Reply-To: user@hadoop.apache.org user@hadoop.apache.org Date: Tuesday, February 19, 2013 4:49 AM To: user@hadoop.apache.org user@hadoop.apache.org Subject: InputFormat for some REST api Hi, Do you know of any InputFormat implemented for some

Custom InputFormat errer

2012-08-29 Thread Chen He
Hi guys, I met an interesting problem when I implemented my own custom InputFormat which extends FileInputFormat (I rewrote the RecordReader class but not the InputSplit class). My RecordReader will take the following format as a basic record: (my RecordReader extends LineRecordReader. It returns

Re: Custom InputFormat errer

2012-08-29 Thread Harsh J
wrote: Hi guys I met a interesting problem when I implement my own custom InputFormat which extends the FileInputFormat.(I rewrite the RecordReader class but not the InputSplit class) My recordreader will take following format as a basic record: (my recordreader extends the LineRecordReader

Re: Custom InputFormat errer

2012-08-29 Thread Chen He
InputFormat which extends the FileInputFormat.(I rewrite the RecordReader class but not the InputSplit class) My recordreader will take following format as a basic record: (my recordreader extends the LineRecordReader. It returns a record if it meets #Trailer# and contains #Header#. I only
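The record rule described, a record running from a #Header# line through the next #Trailer# line, can be sketched independently of LineRecordReader (only the marker strings come from the post; everything else is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the marker-delimited record logic from the post: a record is
// the run of lines from a line containing #Header# up to and including
// the next line containing #Trailer#. Lines outside any record, and
// records cut off without a trailer, are discarded.
class MarkerRecordAssembler {
    public static List<String> assemble(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (line.contains("#Header#")) {
                current = new StringBuilder(); // start a new record
            }
            if (current != null) {
                if (current.length() > 0) current.append('\n');
                current.append(line);
                if (line.contains("#Trailer#")) {
                    records.add(current.toString()); // record complete
                    current = null;
                }
            }
        }
        return records;
    }
}
```

In a RecordReader built on LineRecordReader the same loop runs incrementally: keep pulling lines until #Trailer# is seen, then emit the buffered record as one value.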

Re: Custom InputFormat errer

2012-08-29 Thread Harsh J
record? Your case is not very different from the newlines logic presented here: http://wiki.apache.org/hadoop/HadoopMapReduce On Wed, Aug 29, 2012 at 11:13 AM, Chen He airb...@gmail.com wrote: Hi guys I met a interesting problem when I implement my own custom InputFormat which extends

how to increment counters inside of InputFormat/RecordReader in mapreduce api?

2012-07-30 Thread Jim Donofrio
, both have no access to counters. Is there really no way to increment counters inside of a RecordReader or InputFormat in the mapreduce api?

Re: how to increment counters inside of InputFormat/RecordReader in mapreduce api?

2012-07-30 Thread Harsh J
, both have no access to counters. Is there really no way to increment counters inside of a RecordReader or InputFormat in the mapreduce api? -- Harsh J

RE: Suggestion for InputSplit and InputFormat - Split every line.

2012-03-16 Thread Vanessa van Gelder
gupta Sent: Friday, March 16, 2012 3:39 AM To: common-user@hadoop.apache.org Subject: Re: Suggestion for InputSplit and InputFormat - Split every line. Have a look at the NLineInputFormat class in Hadoop. It is built to split the input on the basis of the number of lines. On Thu, Mar 15, 2012 at 6:13 PM

RE: decompressing bzip2 data with a custom InputFormat

2012-03-16 Thread Tony Burton
Cool - thanks for the confirmation and link, Joey, very helpful. -Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: 14 March 2012 19:03 To: common-user@hadoop.apache.org Subject: Re: decompressing bzip2 data with a custom InputFormat Yes you have to deal

Suggestion for InputSplit and InputFormat - Split every line.

2012-03-15 Thread Deepak Nettem
to be called only once), and work on the data source. What's the best way to construct InputSplit, InputFormat and RecordReader to achieve this? I would appreciate any example code :) Best, Deepak

Re: Suggestion for InputSplit and InputFormat - Split every line.

2012-03-15 Thread anil gupta
, InputFormat and RecordReader to achieve this? I would appreciate any example code :) Best, Deepak -- Thanks Regards, Anil Gupta

RE: decompressing bzip2 data with a custom InputFormat

2012-03-14 Thread Tony Burton
Hi - sorry to bump this, but I'm having trouble resolving this. Essentially the question is: If I create my own InputFormat by subclassing TextInputFormat, does the subclass have to handle its own streaming of compressed data? If so, can anyone point me at an example where this is done

Re: decompressing bzip2 data with a custom InputFormat

2012-03-14 Thread Joey Echeverria
/LineRecordReader.java -Joey On Wed, Mar 14, 2012 at 11:08 AM, Tony Burton tbur...@sportingindex.com wrote: Hi - sorry to bump this, but I'm having trouble resolving this. Essentially the question is: If I create my own InputFormat by subclassing TextInputFormat, does the subclass have to handle its

decompressing bzip2 data with a custom InputFormat

2012-03-12 Thread Tony Burton
Hi, I'm setting up a map-only job that reads large bzip2-compressed data files, parses the XML and writes out the same data in plain text format. My XML InputFormat extends TextInputFormat and has a RecordReader based upon the one you can see at http://xmlandhadoop.blogspot.com/ (my version

mapreduce 0.21 problem with inputformat

2011-10-24 Thread Simon Klausner
Hi, I'm trying to define my own InputFormat and RecordReader, however I'm getting a type mismatch error in the createRecordReader method of the InputFormat class. Here is the InputFormat: http://codepad.org/wdr2NqBe here is the recordreader: http

InputFormat Problem

2011-10-21 Thread Simon Klausner
Hi, I'm trying to define my own InputFormat and RecordReader, however I'm getting a type mismatch error in the createRecordReader method of the InputFormat class. Here is the InputFormat: http://codepad.org/wdr2NqBe here is the recordreader: http://codepad.org/9cmY6BjS

Re: InputFormat Problem

2011-10-21 Thread Harsh J
that in a org.apache.hadoop.mapreduce.InputFormat derivative, which will not work. Try changing your InputFormat to import org.apache.hadoop.mapred.InputFormat or vice versa (for your RecordReader). On Fri, Oct 21, 2011 at 12:58 PM, Simon Klausner h...@gmx.at wrote: Hi, i'm trying to define my own InputFormat

Custom InputFormat for Multiline Input File Hive/Hadoop

2011-10-10 Thread Mike Sukmanowsky
that are escaped by a backslash (\\n and \\t). As a result I've opted to create my own InputFormat to handle the multiple newlines and convert those tabs to spaces when Hive is going to try to do a split on the tabs. I've found a fairly good reference for doing this using the newer InputFormat API

Re: Does anyone have sample code for forcing a custom InputFormat to use a small split

2011-09-12 Thread Harsh J
Hello Steve, On Mon, Sep 12, 2011 at 7:57 AM, Steve Lewis lordjoe2...@gmail.com wrote: I have a problem where there is a single, relatively small (10-20 MB) input file. (It happens it is a fasta file which will have meaning if you are a biologist.)  I am already using a custom  InputFormat

Re: Does anyone have sample code for forcing a custom InputFormat to use a small split

2011-09-12 Thread Harsh J
meaning if you are a biologist.)  I am already using a custom  InputFormat  and a custom reader to force a custom parsing. The file may generate tens or hundreds of millions of key value pairs and the mapper does a fair amount of work on each record. The standard implementation

questions regarding data storage and inputformat

2011-07-27 Thread Tom Melendez
accesses, but I do need to be aware of the keys, as I need to be sure that I get all of the relevant keys sent to a given mapper 2. Looks like I want a custom inputformat for this, extending SequenceFileInputFormat. Do you agree? I'll gladly take some opinions on this, as I ultimately want to split

Re: questions regarding data storage and inputformat

2011-07-27 Thread Joey Echeverria
below). 2. Looks like I want a custom inputformat for this, extending SequenceFileInputFormat.  Do you agree?  I'll gladly take some opinions on this, as I ultimately want to split the based on what's in the file, which might be a little unorthodox. If you need to split based on where certain

Re: questions regarding data storage and inputformat

2011-07-27 Thread Tom Melendez
3. Another idea might be create separate seq files for chunk of records and make them non-splittable, ensuring that they go to a single mapper.  Assuming I can get away with this, see any pros/cons with that approach? Separate sequence files would require the least amount of custom code.

Re: questions regarding data storage and inputformat

2011-07-27 Thread Joey Echeverria
You could either use a custom RecordReader or you could override the run() method on your Mapper class to do the merging before calling the map() method. -Joey On Wed, Jul 27, 2011 at 11:09 AM, Tom Melendez t...@supertom.com wrote: 3. Another idea might be create separate seq files for chunk

Re: Get the actual line number from inputformat in the mapper

2011-04-28 Thread Soren Flexner
This is a bit out of left field, but you could add a 'key' field at the beginning of each record (which you would arrange to be the record number), and then use the keyValue input format. Now your keys are the record number. This might be prohibitive if your data is already on HDFS, and you

Get the actual line number from inputformat in the mapper

2011-04-27 Thread Pei HE
Hi, I want to know how to get the actual line number of the input file in the mapper. The key which TextInputFormat generates is the byte offset in the file. So, how can I find the global line offset in the mapper? Thanks - - Pei

Re: Get the actual line number from inputformat in the mapper

2011-04-27 Thread Harsh J
Hello Pei, On Thu, Apr 28, 2011 at 6:58 AM, Pei HE pei...@gmail.com wrote: The key, which TextInputFormat generates, is the bytes offset in the file. So, how can I find the global line offset in the mapper? This is not achievable unless you have fixed byte records (in which case you should be
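As the reply says, this only works with fixed-length records; in that case the global line number is pure arithmetic on TextInputFormat's byte-offset key (a sketch; the record length is a per-dataset assumption):

```java
// With fixed-length records (the condition in the reply above), the global
// line number can be computed directly from TextInputFormat's byte-offset
// key, with no coordination between mappers.
class LineNumber {
    // recordLen must include the record's newline byte(s).
    public static long fromOffset(long byteOffset, long recordLen) {
        return byteOffset / recordLen + 1; // 1-based line number
    }
}
```

For variable-length lines there is no local way to recover the line number, since a mapper cannot know how many lines preceded its split.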

Re: custom InputFormat class

2011-03-05 Thread souri datta
, 2011 at 11:41 PM, Harsh J qwertyman...@gmail.com wrote: It is worth reading some implementations of already existing InputFormat classes, such as the simple TextInputFormat, or the SequenceFileInputFormat which also has a RecordReader implementation in it. You may find these source files

Re: custom InputFormat class

2011-03-05 Thread souri datta
FYI. Moving back to hadoop 0.20.2 solved my problem. Thanks, Souri On Fri, Mar 4, 2011 at 11:27 PM, souri datta souri.isthe...@gmail.com wrote: Hi,  Is there a good tutorial for writing custom InputFormat classes? Any help would be greatly appreciated. Thanks, Souri

custom InputFormat class

2011-03-04 Thread souri datta
Hi, Is there a good tutorial for writing custom InputFormat classes? Any help would be greatly appreciated. Thanks, Souri

Re: custom InputFormat class

2011-03-04 Thread Harsh J
It is worth reading some implementations of already existing InputFormat classes, such as the simple TextInputFormat, or the SequenceFileInputFormat which also has a RecordReader implementation in it. You may find these source files in your downloaded Hadoop distribution's src/ directory itself

Re: custom InputFormat class

2011-03-04 Thread souri datta
InputFormat classes, such as the simple TextInputFormat, or the SequenceFileInputFormat which also has a RecordReader implementation in it. You may find these source files in your downloaded Hadoop distribution's src/ directory itself (in their appropriate packages). I do not know

Re: custom InputFormat class

2011-03-04 Thread Harsh J
Yes, it may be too much to grasp in the first read. Reading a non text-based record reader implementation helps (something that has its own reader class, and just uses record readers to manage that). I'd suggested SequenceFile for this case. On Fri, Mar 4, 2011 at 11:51 PM, souri datta

Re: InputFormat for a big file

2010-12-20 Thread madhu phatak
as an InputFormat but it sends only one line to the Mapper, which is very inefficient. So can you guide me to write an InputFormat which splits the file into multiple splits so each mapper can read multiple lines from each split Regards Madhukar -- View this message in context

Re: InputFormat for a big file

2010-12-20 Thread Harsh J
line of the file is a number . I want to find the sum all those numbers. I wanted to use NLineInputFormat as a InputFormat but it sends only one line to the Mapper which is very in efficient. So can you guide me to write a InputFormat which splits the file into multiple Splits

InputFormat for a big file

2010-12-17 Thread madhu phatak
Hi, I have a very large file of size 1.4 GB. Each line of the file is a number. I want to find the sum of all those numbers. I wanted to use NLineInputFormat as the InputFormat but it sends only one line to the Mapper, which is very inefficient. So can you guide me to write an InputFormat which splits
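Whatever split strategy is chosen, the per-mapper work described here reduces to summing the numbers in one split; a sketch of that accumulation over a plain java.io.Reader, independent of any InputFormat:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Per-split summing logic for the use case in the post: one number per
// line. Each mapper would sum the lines of its split and emit a single
// partial sum; one reducer then adds the partial sums together.
class LineSummer {
    public static long sum(Reader in) {
        try (BufferedReader br = new BufferedReader(in)) {
            long total = 0;
            String line;
            while ((line = br.readLine()) != null) {
                String t = line.trim();
                if (!t.isEmpty()) {
                    total += Long.parseLong(t);
                }
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With plain TextInputFormat each mapper already receives a whole block's worth of lines, so emitting one partial sum per mapper avoids the one-line-per-mapper overhead without a custom InputFormat.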

Re: InputFormat for a big file

2010-12-17 Thread Matthew John
//So can you guide me to write an InputFormat which splits the file //into multiple splits. The more mappers you assign, the more input splits in the MapReduce; in effect, the number of input splits is equal to the number of mappers assigned. That should take care of the problem

Re: InputFormat for a big file

2010-12-17 Thread Ted Dunning
. On Fri, Dec 17, 2010 at 7:58 AM, madhu phatak phatak@gmail.com wrote: Hi I have a very large file of size 1.4 GB. Each line of the file is a number . I want to find the sum all those numbers. I wanted to use NLineInputFormat as a InputFormat but it sends only one line to the Mapper

Re: InputFormat for a big file

2010-12-17 Thread Aman
GB. Each line of the file is a number . I want to find the sum all those numbers. I wanted to use NLineInputFormat as a InputFormat but it sends only one line to the Mapper which is very in efficient. So can you guide me to write a InputFormat which splits the file into multiple Splits

Symbol Link as InputFormat Folder

2010-12-14 Thread lamfeeli...@gmail.com
Dear All, I've got a folder A, and a symbolic link A' pointing to A, but when I add A' as one of the input folders, it gives me this error: Exception in thread main org.apache.hadoop.hdfs.protocol.UnresolvedPathException: hdfs://localhost:9000/user/songliu/W

Re: InputFormat in mapred vs. mapreduce.

2010-12-07 Thread Jane Chen
Harsh, thank you for your response. That's what I guessed. In 0.20, Interface InputFormat under mapred package was deprecated. In 0.21, it is no longer deprecated. Why is that? Thanks, Jane --- On Tue, 12/7/10, Harsh J qwertyman...@gmail.com wrote: From: Harsh J qwertyman...@gmail.com

Re: InputFormat in mapred vs. mapreduce.

2010-12-07 Thread Greg Roelofs
Jane Chen wrote: In 0.20, Interface InputFormat under mapred package was deprecated. In 0.21, it is no longer deprecated. Why is that? IIRC, it's because not all of the old features have yet been reproduced in the new API, so it's premature to deprecate the old one. The list archives over

Loading a file from HDFS into InputFormat

2010-10-04 Thread maha
Hi guys, I'm looking at the diagram in Module 4 of the Yahoo Hadoop tutorial. The figure shows files in one node taken from HDFS to the InputFormat. My question is, what is the name of the object that is responsible for reading from HDFS? i.e. the one that gives the file address

Re: InputFormat version problem

2010-09-22 Thread Tianqiang Li
, I have customized InputFormat class to read our log format in our hadoop job and Pig, which is built on top of Hadoop 0.20 api, now I'd like to re-use this inputformat to load data into Hive table by specifying InputFormat, and a Serde when I create a table like below: CREATE TABLE

InputFormat version problem.

2010-09-21 Thread Tianqiang Li
Hi, I have customized InputFormat class to read our log format in our hadoop job and Pig, which is built on top of Hadoop 0.20 api, now I'd like to re-use this inputformat to load data into Hive table by specifying InputFormat, and a Serde when I create a table like below: CREATE TABLE

InputFormat version problem

2010-09-21 Thread Tianqiang Li
Hi, I have customized InputFormat class to read our log format in our hadoop job and Pig, which is built on top of Hadoop 0.20 api, now I'd like to re-use this inputformat to load data into Hive table by specifying InputFormat, and a Serde when I create a table like below: CREATE TABLE

Re: InputFormat version problem

2010-09-21 Thread Edward Capriolo
On Wed, Sep 22, 2010 at 12:08 AM, Tianqiang Li peter...@gmail.com wrote: Hi, I have customized InputFormat class to read our log format in our hadoop job and Pig, which is built on top of Hadoop 0.20 api, now I'd like to re-use this inputformat to load data into Hive table by specifying

Using the same InputFormat class for JOIN?

2010-07-01 Thread yan qi
and tmp2 respectively. I found that this query is executed in Hive with a MapReduce Job. Therefore, I am wondering if tmp2 and tmp7 are both assumed to share the same InputFormat class. What if tmp2 and tmp7 are using different InputFormat classes to read records? Thanks, WS

Re: Using the same InputFormat class for JOIN?

2010-07-01 Thread yan qi
Hi, Namit, Thanks a lot for your reply! I checked the source code. Given a query (select tmp7.* from tmp7 join tmp2 on (tmp7.c2 = tmp2.c1)), there is only one MapReduce job generated. As far as I know, the function setInputFormat would be used to set the job's InputFormat class

Re: KeyValueInputFormat Inputformat in showing error cannot find symbol

2010-06-09 Thread Ted Yu
See: http://www.mail-archive.com/common-user@hadoop.apache.org/msg03280.html On Tue, Jun 8, 2010 at 8:00 AM, Jak jakheart...@hotmail.com wrote: Hi All, I newbie to hadoop, Actually i followed the tutorial, configured and worked with wordcount example. it is worked great. I read yahoo

KeyValueInputFormat Inputformat in showing error cannot find symbol

2010-06-08 Thread Jak
Hi All, I'm a newbie to hadoop. Actually I followed the tutorial, configured and worked with the wordcount example; it worked great. I read the Yahoo developer tutorial; in module 4 they mentioned input formats. When I tried the KeyValueInputFormat I got the following error while compiling the java

custom InputFormat and RecordReader

2010-05-25 Thread Mo Zhou
Hi, I am quite new to hadoop. I wrote my own StreamFastaInputFormat and StreamFastaRecordReader in $hadoopbase/src/contrib/streaming/src/java/org/apache/hadoop/streaming/. I ran $ant under the directory $hadoopbase/src/contrib/streaming/ using the default build.xml. However it failed due to the

Re: Pipes program with Java InputFormat/RecordReader

2010-04-15 Thread Keith Wiley
On Apr 14, 2010, at 20:55 , Amareshwari Sri Ramadasu wrote: Hi Keith, My answers inline. On 4/15/10 12:57 AM, Keith Wiley kwi...@keithwiley.com wrote: How do I use a nondefault Java InputFormat/RecordReader with a Pipes program. I realize I can set: property

Pipes program with Java InputFormat/RecordReader

2010-04-14 Thread Keith Wiley
How do I use a nondefault Java InputFormat/RecordReader with a Pipes program? I realize I can set: <property><name>hadoop.pipes.java.recordreader</name><value>true</value></property> or alternatively -D hadoop.pipes.java.recordreader=true ...to get the default reader (and that works

Re: Pipes program with Java InputFormat/RecordReader

2010-04-14 Thread Amareshwari Sri Ramadasu
Hi Keith, My answers inline. On 4/15/10 12:57 AM, Keith Wiley kwi...@keithwiley.com wrote: How do I use a nondefault Java InputFormat/RecordReader with a Pipes program? I realize I can set: <property><name>hadoop.pipes.java.recordreader</name><value>true</value></property>

RE: Get Line Number from InputFormat

2010-04-06 Thread Michael Segel
haven't had my first cup of coffee yet. :-) ) From: am...@yahoo-inc.com To: common-user@hadoop.apache.org Date: Tue, 6 Apr 2010 12:14:56 +0530 Subject: Re: Get Line Number from InputFormat Hi, If your records are structured / of equal size, then getting the line number is straightforward

Get Line Number from InputFormat

2010-04-05 Thread Song Liu
Dear all, TextInputFormat sends (offset, line) into the Mapper; however, the offset is sometimes meaningless and confusing. Is it possible to have an InputFormat which outputs (line number, line) into the mapper? Thanks a lot. Song

Want to create custom inputformat to read from solr

2010-02-24 Thread Rakhi Khatwani
Hi, Has anyone tried creating a custom InputFormat which reads from a Solr index for processing using mapreduce? Is it possible to do that, and how? Regards, Raakhi

Re: Want to create custom inputformat to read from solr

2010-02-24 Thread Rekha Joshi
The last I heard, there were some discussions of instead creating the solr index using hadoop mapreduce rather than pushing the solr index into hdfs and so on. SOLR-1045 and SOLR-1301 can provide you more info. Cheers, /R On 2/24/10 4:23 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, Has

Custom InputFormat: LineRecordReader.LineReader reads 0 bytes

2010-02-23 Thread Alexey Tigarev
Hi All! I am implementing a custom InputFormat. Its custom RecordReader uses LineRecordReader.LineReader inside. In some cases its read() method returns 0, i.e. reads 0 bytes. This happens also in a unit test where it reads from a regular file on a UNIX filesystem. What does it mean and how should I

Re: custom InputFormat

2010-01-12 Thread valentina kroshilina
{ first.write(out); text.write(out); } } -Original Message- From: valentina kroshilina [mailto:kroshil...@gmail.com] Sent: 8 January 2010 12:05 To: common-user@hadoop.apache.org Subject: custom InputFormat I have LongWritable, IncidentWritable

InputFormat key and value class names

2010-01-08 Thread Bassam Tabbara
Given an InputFormat<K,V>, what is the easiest way of retrieving the class names of K and V? Is reflection the only way? Thanks! Bassam

custom InputFormat

2010-01-08 Thread valentina kroshilina
I have a LongWritable, IncidentWritable key-value pair as output from one job that I want to read as input in my second job, where IncidentWritable is a custom Writable (see code below). How do I read IncidentWritable in my custom reader? I don't know how to convert byte[] to IncidentWritable. Code
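The usual pattern (a sketch; only the class name IncidentWritable comes from the post, the fields are invented) is to make readFields() the mirror image of write(). Since Hadoop's Writable methods are declared against java.io.DataInput/DataOutput, the round trip can be shown with plain JDK streams:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Sketch of the Writable pattern: readFields() must read exactly the
// fields write() wrote, in the same order. In a real job this class would
// also declare "implements org.apache.hadoop.io.Writable"; the two fields
// here are invented for illustration.
class IncidentWritable {
    private long timestamp;
    private String description;

    public IncidentWritable() {}                 // Writables need a no-arg constructor

    public IncidentWritable(long timestamp, String description) {
        this.timestamp = timestamp;
        this.description = description;
    }

    public void write(DataOutput out) throws IOException {
        out.writeLong(timestamp);
        out.writeUTF(description);
    }

    public void readFields(DataInput in) throws IOException {
        timestamp = in.readLong();               // same order as write()
        description = in.readUTF();
    }

    public long getTimestamp() { return timestamp; }
    public String getDescription() { return description; }
}
```

With this in place there is never a manual byte[] conversion: the framework (or a test) wraps the bytes in a DataInputStream and calls readFields().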
