apparently things have changed from Nutch 0.9. Actually Hadoop's API
has also changed quite a bit since then; deprecated methods such as
JobConf.setInputKeyClass and setInputValueClass, which were used
extensively in Nutch 0.9, are no longer available.
-- Jim
On Tue, May 27, 2008 at 11:26 PM, Denn
I see. Just realized the Lucene tutorial from which I grabbed:
>> document.add(Field.Text("author", author));
>> document.add(Field.Text("title", title));
>> document.add(Field.Text("topic", topic));
are using obsolete APIs. The latest ones should use constructors
instead, and it b
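For comparison, the constructor-based form in the Lucene 2.x line looks roughly like the snippet below (this assumes a 2.x-era release; Field.Index.TOKENIZED was later renamed ANALYZED, so the exact enum names depend on the version):

```java
// Lucene 2.x replacement for the removed Field.Text(...) factory:
// the constructor takes explicit Store and Index policies.
document.add(new Field("author", author, Field.Store.YES, Field.Index.TOKENIZED));
document.add(new Field("title",  title,  Field.Store.YES, Field.Index.TOKENIZED));
document.add(new Field("topic",  topic,  Field.Store.YES, Field.Index.TOKENIZED));
```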
When reading docs from a processed Lucene index, say off disk, any
fields that are not stored are not repopulated and will not appear in
the document's fields. You also won't be able to read a document and
then pass it to another IndexWriter (say if you wanted to do an index
splitter) and have it k
In the nutch trunk svn it is like this:
output.collect(key, new LuceneDocumentWrapper(doc));
But that is only a pass-through to the output format. write and
readFields aren't implemented for the Writable; it just passes the
object through to the output format, which creates the Lucene index.
De
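A minimal sketch of that pass-through pattern, with hypothetical names and plain Object in place of the Lucene and Hadoop types so it stands alone: the real LuceneDocumentWrapper wraps org.apache.lucene.document.Document and implements org.apache.hadoop.io.Writable, and its write/readFields may simply be empty no-ops rather than throwing.

```java
import java.io.DataInput;
import java.io.DataOutput;

// Pass-through wrapper: it only carries the document in memory to the
// OutputFormat, which unwraps it and writes the Lucene index itself.
// Serialization is never exercised, so here the two Writable-shaped
// methods throw to make that contract explicit.
class PassThroughWrapper {
    private final Object doc;  // stands in for a Lucene Document

    PassThroughWrapper(Object doc) { this.doc = doc; }

    Object get() { return doc; }

    public void write(DataOutput out) {
        throw new UnsupportedOperationException("pass-through only");
    }

    public void readFields(DataInput in) {
        throw new UnsupportedOperationException("pass-through only");
    }
}
```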
Hi Dennis,
Now I see the picture. I would love to see the code you have for
creating complex writables - thanks for sharing it!
Since I just started to look at Lucene the other day, I may once
again misunderstand what you were saying by
"serialization/deserialization of lucene document will los
I am replying to myself because I just found something interesting in
Nutch, yet it raises more questions.
In Nutch 0.9 source code, in org.apache.nutch.indexer.Indexer.java,
there is a line that says:
output.collect(key, new ObjectWritable(doc));
where doc is a lucene Document object. This see
You can get the bytes using those methods and write them to a data
output. You would probably also want to write an int before it in the
stream to tell the number of bytes for the object. If you want to
avoid the Java serialization process and translate an object to
bytes, that is a l
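That length-prefix framing can be sketched with plain java.io streams; the class and method names below are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Length-prefixed framing: write the byte count first so the reader
// knows exactly how many bytes make up one object in the stream.
class Framing {
    static byte[] frame(byte[] payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(payload.length);  // the int prefix
        out.write(payload);            // the object bytes themselves
        out.flush();
        return buf.toByteArray();
    }

    static byte[] unframe(DataInputStream in) throws IOException {
        int len = in.readInt();        // read the prefix back first
        byte[] payload = new byte[len];
        in.readFully(payload);         // then exactly that many bytes
        return payload;
    }
}
```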
Thanks for the quick response, Dennis. However, your code snippet was
about how to serialize/deserialize using
ObjectInputStream/ObjectOutputStream. Maybe it was my fault for not
making the question clear enough - I was wondering if and how I can
serialize/deserialize using only DataInput and Dat
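For what it's worth, the DataInput/DataOutput-only pattern usually looks like the sketch below: the same write/readFields method shapes that Hadoop's Writable interface declares, shown here without the interface so the snippet compiles against the JDK alone (the class and its fields are illustrative; in real code it would `implements org.apache.hadoop.io.Writable`).

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Field-by-field serialization through DataInput/DataOutput only.
class DocFields {
    private String author = "";
    private String title  = "";

    DocFields() {}  // Writables need a no-arg constructor for readFields
    DocFields(String author, String title) {
        this.author = author;
        this.title = title;
    }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(author);  // writeUTF handles the length prefix for us
        out.writeUTF(title);
    }

    public void readFields(DataInput in) throws IOException {
        author = in.readUTF(); // must read back in the same order written
        title  = in.readUTF();
    }

    String getAuthor() { return author; }
    String getTitle()  { return title; }
}
```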
You can use something like the code below to go back and forth from
serializables. The problem with lucene documents is that fields which
are not stored will be lost during the serialization / deserialization
process.
Dennis
public static Object toObject(byte[] bytes, int start)
throws IOE
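The snippet is cut off in the archive; a reconstruction of the usual ObjectInputStream/ObjectOutputStream round trip (not necessarily the exact code from the thread) would look like:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Round trip between a Serializable object and a byte array
// using standard Java serialization.
class SerialUtil {
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        out.writeObject(obj);
        out.close();
        return buf.toByteArray();
    }

    static Object toObject(byte[] bytes, int start)
            throws IOException, ClassNotFoundException {
        // Start deserializing at the given offset into the byte array.
        ByteArrayInputStream buf =
            new ByteArrayInputStream(bytes, start, bytes.length - start);
        ObjectInputStream in = new ObjectInputStream(buf);
        try {
            return in.readObject();
        } finally {
            in.close();
        }
    }
}
```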
Hello,
I am not sure if this is a genuine Hadoop question or more of a
core-Java question. I am hoping to create a wrapper over Lucene
Document, so that this wrapper can be used for the value field of a
Hadoop SequenceFile, and therefore, this wrapper must also implement
the Writable interfa