Hi,
I am creating a custom input format that extends FileInputFormat
(org.apache.hadoop.mapreduce.lib.input.FileInputFormat). I used the new
Hadoop API (hadoop-mapreduce-client-core 2.8.3) and I am running this on AWS
EMR.
My intention is to spread the input files among the hosts, in a way each
number of segments, size of subheaders and segment data
(I'll need this to create my splits).
To digest it all, I'm wondering if it's best to create a custom InputFormat
inheriting from (1) FileInputFormat or (2) SequenceFileInputFormat.
If I go with (1), I will create HeaderSplits and DataSplits
Hi,
I have defined an InputFormat class and an OutputFormat class. It looks to me
that I have to create an MR job to test whether they work.
I would like to ask whether there is a way to JUnit-test customized Hadoop
InputFormat/OutputFormat classes without kicking off an MR application.
Thanks!
Hello,
I am trying to write a Hadoop program that handles JSON, and hence wrote a
custom InputFormat to handle the data. The custom format extends
RecordReader and overrides the nextKeyValue() method.
However, this doesn't solve the problem when one JSON object is split
across two
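The boundary logic such a reader needs can be sketched in plain Java. This is a hedged illustration, not the poster's code: the class name, method name, and brace-counting approach are all assumptions.

```java
// Self-contained sketch (not the poster's code): the usual fix for records
// that straddle a split boundary mirrors TextInputFormat's newline handling --
// each reader skips its first partial record and reads past the end of its
// split until the record it started is complete. Here the record boundary is
// found by counting JSON braces instead of looking for '\n'.
public class JsonBoundary {
    // Returns the index just past the first complete top-level JSON object
    // starting at or after 'from', or -1 if no object completes in the buffer.
    public static int endOfFirstObject(String buf, int from) {
        int depth = 0;
        boolean inString = false, started = false;
        for (int i = from; i < buf.length(); i++) {
            char c = buf.charAt(i);
            if (inString) {
                if (c == '\\') i++;            // skip the escaped character
                else if (c == '"') inString = false;
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
                started = true;
            } else if (c == '}') {
                depth--;
                if (started && depth == 0) return i + 1;
            }
        }
        return -1;   // object still open: keep reading into the next split
    }
}
```

A reader whose split starts mid-object would scan forward to the first boundary and start emitting records from there, the same contract LineRecordReader uses for newlines.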
Hi Folks,
I'm open sourcing a scalable Kafka InputFormat. As far as I am aware,
my version is unique compared to other Kafka InputFormats out there, in
that input splits are mapped to Kafka log files, rather than entire Kafka
partitions. Mapping Kafka log files to input splits
I'd like to know if there's an InputFormat able to deal with log files.
The problem I have is that if I have to read a Tomcat log, for example,
sometimes the exceptions span several lines, but they should be
processed as one record, that is, all the lines together.
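One common answer is a custom RecordReader that treats stack-trace continuation lines as part of the previous record. A minimal self-contained sketch of that grouping rule follows; the continuation heuristics are assumptions for illustration, not from the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch (not a real Hadoop RecordReader): the record-boundary rule a
// custom reader for Tomcat-style logs would need. A line starting a new record
// is one that does NOT look like a stack-trace continuation; continuation
// lines are appended to the record that precedes them.
public class LogRecordGrouper {
    static boolean isContinuation(String line) {
        return line.startsWith("\t") || line.startsWith(" ")
            || line.startsWith("at ") || line.startsWith("Caused by:")
            || line.startsWith("...");
    }

    public static List<String> group(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (current != null && isContinuation(line)) {
                current.append('\n').append(line);   // glue to previous record
            } else {
                if (current != null) records.add(current.toString());
                current = new StringBuilder(line);   // start a new record
            }
        }
        if (current != null) records.add(current.toString());
        return records;
    }
}
```

The same rule would live in nextKeyValue(): keep reading lines while they look like continuations.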
Have you read http://blog.rguha.net/?p=293 ?
Cheers
On Sun, Oct 5, 2014 at 6:24 AM, Guillermo Ortiz gor...@pragsis.com wrote:
, although it seems too expensive in processing time, and the operations in
the InputFormat should be pretty fast.
Any better idea?
----- Original Message -----
From: Ted Yu yuzhih...@gmail.com
To: common-u...@hadoop.apache.org, user@hadoop.apache.org
Sent: Sunday, October 5, 2014 16:27:18
Subject: Re: InputFormat for dealing
wholeFileInputFormat. But I am not sure the filename comes to the map
process either as key or value; I think this file format reads the
contents of the file. I wish to have an InputFormat that just gives the
filename or a list of filenames.
Also, the files are very small. The wholeFileInputFormat spawns one map
process per file and thus results in a huge number
that.
Yong
Date: Thu, 21 Aug 2014 22:26:12 +0530
Subject: Re: Hadoop InputFormat - Processing large number of small files
From: rab...@gmail.com
To: user@hadoop.apache.org
Hello,
This means that a file with the names of all the files that need to be
processed is created and fed to Hadoop with NLineInputFormat.
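The manifest trick can be sketched as follows. This is a hedged illustration of how NLineInputFormat-style chunking would hand each mapper N file names; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the manifest workflow described above (assumed, not code from the
// thread): instead of feeding many small files to MapReduce, list their names
// in one text file; NLineInputFormat then hands each mapper N lines (= N file
// names), and the mapper opens the files itself.
public class ManifestSplitter {
    // Mimics NLineInputFormat's split logic on an in-memory manifest:
    // one "split" per N lines.
    public static List<List<String>> splits(List<String> fileNames, int n) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < fileNames.size(); i += n) {
            out.add(new ArrayList<>(
                fileNames.subList(i, Math.min(i + n, fileNames.size()))));
        }
        return out;
    }
}
```

With the real NLineInputFormat, N is set via mapreduce.input.lineinputformat.linespermap, and each mapper receives the file names as record values.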
Have you looked at the WholeFileInputFormat implementations? There are
quite a few if you search for them...
http://hadoop-sandy.blogspot.com/2013/02/wholefileinputformat-in-java-hadoop.html
https://github.com/tomwhite/hadoop-book/blob/master/ch07/src/main/java/WholeFileInputFormat.java
Regards,
Thanks for the response.
Yes, I know wholeFileInputFormat. But i am not sure filename comes to map
process either as key or value. But, I think this file format reads the
contents of the file. I wish to have a inputformat that just gives filename
or list of filenames.
Also, files are very small
On Aug 20, 2014, at 8:19 AM, rab ra rab...@gmail.com wrote:
Hello,
I have a use case wherein I need to process a huge set of files stored in
HDFS. Those files are non-splittable and they need to be processed as a
whole. Here, I have the following question for which I need answers to
proceed further in this.
1. I wish to schedule the map process in task
done in
your current InputFormat implementation.
If you're looking to store a single file path, use the FileSplit
class, or if not as simple as that, use it as a base reference to
build your Path-based InputSplit derivative. Its sources are at
https://github.com/apache/hadoop-common/blob/release
Hi,
I wrote a custom InputFormat and InputSplit to handle netCDF files, used
with a custom Pig load function. When I submitted a job by running a Pig
script, I got the error below. From the error log, the network location
name is
hdfs://service-1-0.local:8020/user/patcharee/netcdf_data
Do not use the InputSplit's getLocations() API to supply your file
path; it is not intended for such things, if that's what you've done in
your current InputFormat implementation.
If you're looking to store a single file path, use the FileSplit class.
Hi,
(I'm assuming 1.0~ MR here)
On Sun, Sep 22, 2013 at 1:00 AM, Steve Lewis lordjoe2...@gmail.com wrote:
Classes implementing InputFormat implement
public List<InputSplit> getSplits(JobContext job), which returns a List of
InputSplits. For FileInputFormat the splits have a Path, start and end.
1) When is this method called, on which JVM on which machine, and is it
called only once?
2) Do the number of map tasks
Use the InputFormat under the mapreduce package; the mapred package is the
very old package. Generally you can extend FileInputFormat under the
o.a.h.mapreduce package.
On Fri, Jul 5, 2013 at 1:23 PM, Devaraj k devara...@huawei.com wrote:
Hi Ahmed,
** **
Hadoop 0.20.0 included
From: Ahmed Eldawy [mailto:aseld...@gmail.com]
Sent: 05 July 2013 00:00
To: user@hadoop.apache.org
Subject: Which InputFormat to use?
Hi, I'm developing a new set of InputFormats that are used for a project I'm
doing. I found that there are two ways to create a new InputFormat:
1- Extend the abstract class org.apache.hadoop.mapreduce.InputFormat
2- Implement the interface org.apache.hadoop.mapred.InputFormat
I don't know why
If you try to hammer in a nail (json file) with a screwdriver
(XMLInputReader) then perhaps the reason it won't work may be that you are
using the wrong tool?
On Jun 21, 2013 11:38 PM, jamal sasha jamalsha...@gmail.com wrote:
Hi,
I am using one of the libraries which rely on InputFormat.
Right now
Hi...
I need to communicate some proprietary number (long) values from the
getSplits() method of a custom InputFormat class to the Hadoop driver class
(used to launch the job), but the JobContext object passed to the
getSplits() method has no access to a Counters object.
From the source code
counters = work.getCounters();
}
Would that be correct?
On Fri, May 17, 2013 at 5:33 PM, Public Network Services
publicnetworkservi...@gmail.com wrote:
.
We have something like the following: https://gist.github.com/anonymous/5364554
(The naming is a little off since it's technically not an InputFormat; any
ideas on a proper name?) Basically it retrieves all directories for a given
path and sorts them in descending order, limiting to the last
Subject: InputFormat for some REST api
Hi,
Do you know of any InputFormat implemented for some REST API provider?
Usually when one needs to process data that is accessible only by REST, one
should try to download the data first somehow, but what if you cannot
download it?
thanks
.
--Bobby
From: Yaron Gonen yaron.go...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Tuesday, February 19, 2013 4:49 AM
To: user@hadoop.apache.org
Subject: InputFormat for some REST api
Hi guys,
I met an interesting problem when I implemented my own custom InputFormat,
which extends FileInputFormat. (I rewrote the RecordReader class but not
the InputSplit class.)
My RecordReader will take the following format as a basic record (my
RecordReader extends the LineRecordReader): it returns a record if it
meets #Trailer# and contains #Header#. I only
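The record rule described can be sketched in plain Java. This is a hedged reconstruction of the semantics as stated (accumulate until #Trailer#, emit only if the block contains #Header#); the names and details are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the record rule described above (assumed semantics): keep
// accumulating lines until a #Trailer# line, and emit the accumulated block
// as one record only if it also contained a #Header# line.
public class HeaderTrailerReader {
    public static List<String> records(List<String> lines) {
        List<String> out = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        boolean sawHeader = false;
        for (String line : lines) {
            buf.append(line).append('\n');
            if (line.contains("#Header#")) sawHeader = true;
            if (line.contains("#Trailer#")) {
                if (sawHeader) out.add(buf.toString());
                buf.setLength(0);              // start the next record
                sawHeader = false;
            }
        }
        return out;
    }
}
```

The split-boundary caveat is the same as for newlines: a reader must skip a leading partial record and read past its split's end to finish the last one.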
record? Your case is not very different
from the newlines logic presented here:
http://wiki.apache.org/hadoop/HadoopMapReduce
On Wed, Aug 29, 2012 at 11:13 AM, Chen He airb...@gmail.com wrote:
, both have no access to counters.
Is there really no way to increment counters inside of a RecordReader or
InputFormat in the mapreduce api?
--
Harsh J
gupta
Sent: Friday, March 16, 2012 3:39 AM
To: common-user@hadoop.apache.org
Subject: Re: Suggestion for InputSplit and InputFormat - Split every
line.
Have a look at the NLineInputFormat class in Hadoop. It is built to split
the input on the basis of the number of lines.
On Thu, Mar 15, 2012 at 6:13 PM
Cool - thanks for the confirmation and link, Joey, very helpful.
-Original Message-
From: Joey Echeverria [mailto:j...@cloudera.com]
Sent: 14 March 2012 19:03
To: common-user@hadoop.apache.org
Subject: Re: decompressing bzip2 data with a custom InputFormat
Yes you have to deal
to be
called only once), and work on the data source.
What's the best way to construct InputSplit, InputFormat and RecordReader
to achieve this? I would appreciate any example code :)
Best,
Deepak
--
Thanks Regards,
Anil Gupta
Hi - sorry to bump this, but I'm having trouble resolving this.
Essentially the question is: If I create my own InputFormat by subclassing
TextInputFormat, does the subclass have to handle its own streaming of
compressed data? If so, can anyone point me at an example where this is done
/LineRecordReader.java
-Joey
On Wed, Mar 14, 2012 at 11:08 AM, Tony Burton tbur...@sportingindex.com wrote:
Hi,
I'm setting up a map-only job that reads large bzip2-compressed data files,
parses the XML and writes out the same data in plain text format. My XML
InputFormat extends TextInputFormat and has a RecordReader based upon the one
you can see at http://xmlandhadoop.blogspot.com/ (my version
Hi,
I'm trying to define my own InputFormat and RecordReader; however, I'm
getting a type mismatch error in the createRecordReader method of the
InputFormat class.
Here is the InputFormat: http://codepad.org/wdr2NqBe
Here is the RecordReader: http://codepad.org/9cmY6BjS
that in a org.apache.hadoop.mapreduce.InputFormat
derivative, which will not work.
Try changing your InputFormat to import
org.apache.hadoop.mapred.InputFormat or vice versa (for your
RecordReader).
On Fri, Oct 21, 2011 at 12:58 PM, Simon Klausner h...@gmx.at wrote:
that
are escaped by a backslash (\\n and \\t). As a result I've opted to create
my own InputFormat to handle the multiple newlines and convert those tabs to
spaces when Hive is going to try to do a split on the tabs.
I've found a fairly good reference for doing this using the newer
InputFormat API
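The pre-processing described can be sketched as follows. This is an assumed reading of the poster's intent (unescape backslash sequences, turn tabs into spaces before Hive splits columns on real tabs), not their actual code.

```java
// Sketch of the pre-processing the poster describes (assumed behavior, not
// their code): record fields contain newlines and tabs escaped as "\n" and
// "\t"; before Hive splits columns on real tabs, unescape the backslash
// sequences and turn tabs into spaces so they cannot break the column split.
public class FieldCleaner {
    public static String unescape(String field) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == '\\' && i + 1 < field.length()) {
                char next = field.charAt(++i);
                if (next == 'n') sb.append('\n');        // restore newline
                else if (next == 't') sb.append(' ');    // tabs become spaces
                else sb.append(c).append(next);          // unknown escape kept
            } else if (c == '\t') {
                sb.append(' ');   // a literal tab would break Hive's split
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```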
Hello Steve,
On Mon, Sep 12, 2011 at 7:57 AM, Steve Lewis lordjoe2...@gmail.com wrote:
I have a problem where there is a single, relatively small (10-20 MB) input
file. (It happens it is a FASTA file, which will have meaning if you are a
biologist.) I am already using a custom InputFormat and a custom reader
to force a custom parsing. The file may generate tens or hundreds of
millions of key-value pairs, and the mapper does a fair amount of work on
each record.
The standard implementation
accesses, but I do need
to be aware of the keys, as I need to be sure that I get all of the
relevant keys sent to a given mapper
2. Looks like I want a custom InputFormat for this, extending
SequenceFileInputFormat. Do you agree? I'll gladly take some
opinions on this, as I ultimately want to split based on what's in
the file, which might be a little unorthodox.
If you need to split based on where certain
3. Another idea might be create separate seq files for chunk of
records and make them non-splittable, ensuring that they go to a
single mapper. Assuming I can get away with this, see any pros/cons
with that approach?
Separate sequence files would require the least amount of custom code.
You could either use a custom RecordReader or you could override the
run() method on your Mapper class to do the merging before calling the
map() method.
-Joey
On Wed, Jul 27, 2011 at 11:09 AM, Tom Melendez t...@supertom.com wrote:
This is a bit out of left field, but you could add a 'key' field at the
beginning of each record (which you would arrange to be the record
number), and then use the keyValue input format. Now your keys are the
record number.
This might be prohibitive if your data is already on HDFS, and you
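That suggestion can be sketched as below; illustrative only, but note that KeyValueTextInputFormat does split each line on the first tab by default, so the record number lands in the mapper's key.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the suggestion above (names invented for the example): prefix
// each record with its record number plus a tab -- the layout
// KeyValueTextInputFormat expects -- so the mapper's key becomes the record
// number instead of TextInputFormat's byte offset.
public class KeyedRecords {
    public static List<String> addKeys(List<String> records) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            out.add(i + "\t" + records.get(i));   // key TAB value
        }
        return out;
    }
}
```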
Hi,
I want to know how to get the actual line number of the input file in
the mapper.
The key, which TextInputFormat generates, is the bytes offset in the
file. So, how can I find the global line offset in the mapper?
Thanks
- -
Pei
Hello Pei,
On Thu, Apr 28, 2011 at 6:58 AM, Pei HE pei...@gmail.com wrote:
The key, which TextInputFormat generates, is the bytes offset in the
file. So, how can I find the global line offset in the mapper?
This is not achievable unless you have fixed byte records (in which
case you should be
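The fixed-byte-record case mentioned here reduces to arithmetic; a hedged sketch under that assumption (names are invented):

```java
// Sketch of the fixed-size-record case (assumed record layout): only when
// every record occupies exactly the same number of bytes can a mapper turn
// TextInputFormat's byte-offset key into a global line number with plain
// arithmetic, with no scan from the start of the file.
public class LineNumberFromOffset {
    public static long lineNumber(long byteOffset, int recordSizeBytes) {
        if (byteOffset % recordSizeBytes != 0) {
            throw new IllegalArgumentException("offset not on a record boundary");
        }
        return byteOffset / recordSizeBytes + 1;   // 1-based line number
    }
}
```

With variable-length lines this cannot work from the offset alone, which is the point of the reply.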
, 2011 at 11:41 PM, Harsh J qwertyman...@gmail.com wrote:
FYI.
Moving back to hadoop 0.20.2 solved my problem.
Thanks,
Souri
On Fri, Mar 4, 2011 at 11:27 PM, souri datta souri.isthe...@gmail.com wrote:
Hi,
Is there a good tutorial for writing custom InputFormat classes?
Any help would be greatly appreciated.
Thanks,
Souri
It is worth reading some implementations of already existing
InputFormat classes, such as the simple TextInputFormat, or the
SequenceFileInputFormat which also has a RecordReader implementation
in it.
You may find these source files in your downloaded Hadoop
distribution's src/ directory itself (in their appropriate packages).
I do not know
Yes, it may be too much to grasp in the first read. Reading a non
text-based record reader implementation helps (something that has its
own reader class, and just uses record readers to manage that). I'd
suggested SequenceFile for this case.
On Fri, Mar 4, 2011 at 11:51 PM, souri datta
Hi,
I have a very large file of size 1.4 GB. Each line of the file is a number.
I want to find the sum of all those numbers.
I wanted to use NLineInputFormat as an InputFormat, but it sends only one
line to the Mapper, which is very inefficient.
So can you guide me to write an InputFormat which splits the file into
multiple splits, so each mapper can read multiple lines from each split?
Regards,
Madhukar
// So can you guide me to write an InputFormat which splits the file
// into multiple splits
The more mappers you assign, the more input splits in the
MapReduce job; in effect, the number of input splits is equal to the number
of mappers assigned. That should take care of the problem.
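The summing job itself can be sketched as below. This is an illustrative simulation of the map and reduce phases over an in-memory "file" of number lines, not the poster's job; with NLineInputFormat each mapper would receive N lines per split.

```java
import java.util.List;

// Illustrative sketch (not the poster's job): sum the N lines of each split
// in the mappers and add the per-split subtotals in a single reducer.
public class ChunkSum {
    // One mapper's work: sum the lines of its split.
    public static long mapChunk(List<String> lines) {
        long subtotal = 0;
        for (String line : lines) subtotal += Long.parseLong(line.trim());
        return subtotal;
    }

    // The reducer's work: add the per-split subtotals.
    public static long reduce(List<Long> subtotals) {
        long total = 0;
        for (long s : subtotals) total += s;
        return total;
    }
}
```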
.
On Fri, Dec 17, 2010 at 7:58 AM, madhu phatak phatak@gmail.com wrote:
Dear All,
I've got a folder A, and a symbolic link folder A' linked to A, but
when I add A' as one of the InputFormat folders, it gives me this error:
Exception in thread main
org.apache.hadoop.hdfs.protocol.UnresolvedPathException:
hdfs://localhost:9000/user/songliu/W
Harsh, thank you for your response. That's what I guessed.
In 0.20, Interface InputFormat under mapred package was deprecated. In 0.21,
it is no longer deprecated. Why is that?
Thanks,
Jane
--- On Tue, 12/7/10, Harsh J qwertyman...@gmail.com wrote:
From: Harsh J qwertyman...@gmail.com
Jane Chen wrote:
In 0.20, Interface InputFormat under mapred package was deprecated.
In 0.21, it is no longer deprecated. Why is that?
IIRC, it's because not all of the old features have yet been reproduced
in the new API, so it's premature to deprecate the old one. The list
archives over
Hi guys,
I'm looking at the diagram on Module 4 of Yahoo-Hadoop-Tutorial. The
figure shows files in one node taken from the HDFS to InputFormat. My
question is, what is the name of the object that is responsible for reading
from HDFS, i.e. the one that gives the file address
Hi,
I have a customized InputFormat class to read our log format in our Hadoop
job and Pig, which is built on top of the Hadoop 0.20 API. Now I'd like to
re-use this InputFormat to load data into a Hive table by specifying an
InputFormat and a SerDe when I create a table like below:
CREATE TABLE
On Wed, Sep 22, 2010 at 12:08 AM, Tianqiang Li peter...@gmail.com wrote:
and tmp2 respectively.
I found that this query is executed in Hive with a MapReduce Job.
Therefore, I am wondering if tmp2 and tmp7 are both assumed to share the
same InputFormat class.
What if tmp2 and tmp7 are using different InputFormat classes to read
records?
Thanks,
WS
Hi, Namit,
Thanks a lot for your reply!
I checked the source code. Given a query, (select tmp7.* from tmp7 join
tmp2 on (tmp7.c2 = tmp2.c1)), there is only a MapReduce job generated. As
far as I know, the function setInputFormat would be used to set the job's
InputFormat class
See:
http://www.mail-archive.com/common-user@hadoop.apache.org/msg03280.html
On Tue, Jun 8, 2010 at 8:00 AM, Jak jakheart...@hotmail.com wrote:
Hi All,
I'm a newbie to Hadoop. I followed the tutorial, configured it, and worked
with the wordcount example; it worked great. I read the Yahoo developer
tutorial; in module 4 they mentioned input formats. When I tried the
KeyValueInputFormat I got the following error while compiling the Java
Hi,
I am quite new to Hadoop. I wrote my own StreamFastaInputFormat and
StreamFastaRecordReader in
$hadoopbase/src/contrib/streaming/src/java/org/apache/hadoop/streaming/
I ran ant under the directory $hadoopbase/src/contrib/streaming/
using the default build.xml. However it failed due to the
On Apr 14, 2010, at 20:55 , Amareshwari Sri Ramadasu wrote:
Hi Keith,
My answers inline.
On 4/15/10 12:57 AM, Keith Wiley kwi...@keithwiley.com wrote:
How do I use a nondefault Java InputFormat/RecordReader with a Pipes program.
I realize I can set:
property
How do I use a nondefault Java InputFormat/RecordReader with a Pipes program?
I realize I can set:
<property>
  <name>hadoop.pipes.java.recordreader</name>
  <value>true</value>
</property>
or alternatively -D hadoop.pipes.java.recordreader=true
...to get the default reader (and that works
haven't had my first cup of coffee yet. :-) )
From: am...@yahoo-inc.com
To: common-user@hadoop.apache.org
Date: Tue, 6 Apr 2010 12:14:56 +0530
Subject: Re: Get Line Number from InputFormat
Hi,
If your records are structured / of equal size, then getting the line number
is straightforward
Dear all,
TextInputFormat sends (offset, line) into the Mapper; however, the
offset is sometimes meaningless and confusing. Is it possible to have an
InputFormat which outputs (line number, line) into the mapper?
Thanks a lot.
Song
Hi,
Has anyone tried creating a custom InputFormat which reads from
a Solr index for processing using MapReduce? Is it possible to do that, and
how?
Regards,
Raakhi
The last I heard, there were some discussions of instead creating the Solr
index using Hadoop MapReduce rather than pushing the Solr index into HDFS
and so on. SOLR-1045 and SOLR-1301 can provide you more info.
Cheers,
/R
On 2/24/10 4:23 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:
Hi All!
I am implementing a custom InputFormat.
Its custom RecordReader uses LineRecordReader.LineReader inside.
In some cases its read() method returns 0, i.e. it reads 0 bytes. This
happens also in a unit test where it reads from a regular file on a UNIX
filesystem.
What does it mean and how should I
{
first.write(out);
text.write(out);
}
}
-Original Message-
From: valentina kroshilina [mailto:kroshil...@gmail.com]
Sent: 2010年1月8日 12:05
To: common-user@hadoop.apache.org
Subject: custom InputFormat
I have LongWritable, IncidentWritable
Given an InputFormat<K,V>, what is the easiest way of retrieving the class
names of K and V? Is reflection the only way?
Thanks!
Bassam
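Reflection does work when a concrete subclass binds K and V explicitly. A self-contained sketch, with a toy generic base standing in for InputFormat<K,V> (all names here are invented for the example):

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Sketch answering the question above: for a concrete subclass of a generic
// base type, the K and V class names can be recovered via reflection from
// getGenericSuperclass(). This only works when the subclass binds the type
// parameters explicitly; erasure hides them on a raw instance.
public class TypeArgs {
    static class Base<K, V> {}                           // stand-in for InputFormat<K,V>
    static class LongTextFormat extends Base<Long, String> {}

    public static String[] typeArgNames(Class<?> subclass) {
        ParameterizedType pt =
            (ParameterizedType) subclass.getGenericSuperclass();
        Type[] args = pt.getActualTypeArguments();
        return new String[] { ((Class<?>) args[0]).getName(),
                              ((Class<?>) args[1]).getName() };
    }
}
```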
I have a <LongWritable, IncidentWritable> key-value pair as output from one
job that I want to read as input in my second job, where IncidentWritable
is a custom Writable (see code below).
How do I read IncidentWritable in my custom reader? I don't know how to
convert byte[] to IncidentWritable.
Code
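The write()/readFields() round trip being asked about can be sketched as below. The class and field names are illustrative, not the poster's; a real IncidentWritable would implement org.apache.hadoop.io.Writable.

```java
import java.io.*;

// Hedged sketch of the Writable round trip: the reader never decodes raw
// byte[] by hand -- it wraps the bytes in a DataInputStream and lets
// readFields() rebuild the object, mirroring how write() serialized it.
public class IncidentDemo {
    static class Incident {
        long timestamp;
        String text;

        void write(DataOutput out) throws IOException {
            out.writeLong(timestamp);
            out.writeUTF(text);
        }

        void readFields(DataInput in) throws IOException {
            timestamp = in.readLong();   // fields read back in write() order
            text = in.readUTF();
        }
    }

    public static Incident roundTrip(Incident src) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bos));            // serialize
            Incident dst = new Incident();
            dst.readFields(new DataInputStream(              // deserialize
                    new ByteArrayInputStream(bos.toByteArray())));
            return dst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```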