Re: How to make a lucene Document hadoop Writable?

2008-05-28 Thread David Chung
unsubscribe

Re: How to make a lucene Document hadoop Writable?

2008-05-27 Thread Jim the Standing Bear
Apparently things have changed from nutch 0.9. Actually Hadoop's API has also changed quite a bit since then; deprecated methods such as JobConf.setInputKeyClass and setInputValueClass are no longer available, which were used extensively in nutch 0.9. -- Jim On Tue, May 27, 2008 at 11:26 PM, Denn

Re: How to make a lucene Document hadoop Writable?

2008-05-27 Thread Jim the Standing Bear
I see. Just realized the lucene tutorial from which I grabbed: >> document.add(Field.Text("author", author)); >> document.add(Field.Text("title", title)); >> document.add(Field.Text("topic", topic)); is using obsolete APIs. The latest ones should use constructors instead, and it b

Re: How to make a lucene Document hadoop Writable?

2008-05-27 Thread Dennis Kubes
When reading docs from a processed lucene index, say off disk, any fields that are not stored are not repopulated and will not appear in documents fields. You also won't be able to read a document and then pass it to another indexwriter (say if you wanted to do an index splitter) and have it k

Re: How to make a lucene Document hadoop Writable?

2008-05-27 Thread Dennis Kubes
In the nutch trunk svn it is like this: output.collect(key, new LuceneDocumentWrapper(doc)); But that is only a passthrough to the output format. Write and readFields isn't implemented for the writable, it just passes the object through to the output format which creates the lucene index. De
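The pass-through pattern Dennis describes can be sketched as below. This is an illustrative reconstruction, not the actual Nutch trunk code: the Writable interface is declared locally as a stand-in for org.apache.hadoop.io.Writable so the sketch compiles without the Hadoop jar, and the wrapped object is an Object placeholder for the Lucene Document.

```java
import java.io.*;

// Local stand-in for org.apache.hadoop.io.Writable (assumption: the real
// interface declares exactly these two methods).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Sketch of the pass-through wrapper: it carries the live object to a
// custom output format and never actually serializes it.
class LuceneDocumentWrapper implements Writable {
    private final Object doc;  // placeholder for org.apache.lucene.document.Document

    public LuceneDocumentWrapper(Object doc) { this.doc = doc; }

    // The custom output format pulls the wrapped object out directly.
    public Object get() { return doc; }

    // Never called in the pass-through scheme, so serialization is
    // deliberately unimplemented.
    public void write(DataOutput out) throws IOException {
        throw new UnsupportedOperationException("pass-through only");
    }

    public void readFields(DataInput in) throws IOException {
        throw new UnsupportedOperationException("pass-through only");
    }
}
```

The point of the design is that the document never has to survive a byte-level round trip, which sidesteps the lost-unstored-fields problem entirely, at the cost of only working with an output format that knows about the wrapper.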

Re: How to make a lucene Document hadoop Writable?

2008-05-27 Thread Jim the Standing Bear
Hi Dennis, Now I see the picture. I would love to see the code you have for creating complex writables - thanks for sharing it! Since I just started to look at lucene the other day, I may once again misunderstand what you were saying by "serialization/deserialization of lucene document will los

Re: How to make a lucene Document hadoop Writable?

2008-05-27 Thread Jim the Standing Bear
I am replying to myself because I just found something interesting in Nutch, yet it raises more questions. In Nutch 0.9 source code, in org.apache.nutch.indexer.Indexer.java, there is a line that says: output.collect(key, new ObjectWritable(doc)); where doc is a lucene Document object. This see

Re: How to make a lucene Document hadoop Writable?

2008-05-27 Thread Dennis Kubes
You can get the bytes using those methods and write them to a data output. You would probably also want to write an int before it in the stream to tell the number of bytes for the object. If you are wanting to not use the java serialization process and translate an object to bytes that is a l
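The length-prefix idea Dennis suggests can be sketched with plain java.io, writing an int before the payload so the reader knows how many bytes to pull back out. Class and method names here are illustrative, not from the thread:

```java
import java.io.*;

// Write a byte[] to a DataOutput with an int length prefix, and read it
// back the same way.
class FramedBytes {

    static void writeFramed(DataOutput out, byte[] payload) throws IOException {
        out.writeInt(payload.length);  // length prefix first
        out.write(payload);            // then the object's bytes
    }

    static byte[] readFramed(DataInput in) throws IOException {
        int len = in.readInt();        // how many bytes follow
        byte[] payload = new byte[len];
        in.readFully(payload);         // read exactly len bytes
        return payload;
    }
}
```

This is exactly the shape a Writable's write/readFields pair would take for a variable-length payload.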

Re: How to make a lucene Document hadoop Writable?

2008-05-27 Thread Jim the Standing Bear
Thanks for the quick response, Dennis. However, your code snippet was about how to serialize/deserialize using ObjectInputStream/ObjectOutputStream. Maybe it was my fault for not making the question clear enough - I was wondering if and how I can serialize/deserialize using only DataInput and Dat

Re: How to make a lucene Document hadoop Writable?

2008-05-27 Thread Dennis Kubes
You can use something like the code below to go back and forth from serializables. The problem with lucene documents is that fields which are not stored will be lost during the serialization / deserialization process. Dennis public static Object toObject(byte[] bytes, int start) throws IOE
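Dennis's snippet is truncated in the archive, but the round trip he describes can be sketched with standard java.io serialization. The names below are illustrative, not his actual code:

```java
import java.io.*;

// Round-trip a Serializable to a byte[] and back using java.io
// serialization (the approach Dennis describes).
class SerdeSketch {

    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    static Object toObject(byte[] bytes, int start)
            throws IOException, ClassNotFoundException {
        ByteArrayInputStream bis =
                new ByteArrayInputStream(bytes, start, bytes.length - start);
        try (ObjectInputStream ois = new ObjectInputStream(bis)) {
            return ois.readObject();
        }
    }
}
```

Note this only works for objects that implement java.io.Serializable, and, as Dennis warns, unstored Lucene fields would not survive such a round trip in any case.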

How to make a lucene Document hadoop Writable?

2008-05-27 Thread Jim the Standing Bear
Hello, I am not sure if this is a genuine hadoop question or more towards a core-java question. I am hoping to create a wrapper over Lucene Document, so that this wrapper can be used for the value field of a Hadoop SequenceFile, and therefore, this wrapper must also implement the Writable interfa
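One possible shape for the wrapper Jim asks about is sketched below: copy the document's stored field names and values into a String map and serialize that through write/readFields. This is an assumption-laden sketch, not code from the thread; the Writable interface is declared locally as a stand-in for org.apache.hadoop.io.Writable so the block compiles without the Hadoop jar, and, as Dennis points out elsewhere in the thread, unstored fields cannot survive such a round trip.

```java
import java.io.*;
import java.util.*;

// Local stand-in for org.apache.hadoop.io.Writable (assumption: the real
// interface declares exactly these two methods).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical wrapper: holds stored field name/value pairs and
// serializes them as a count followed by UTF string pairs.
class DocumentWritable implements Writable {
    private final Map<String, String> fields = new LinkedHashMap<>();

    public void put(String name, String value) { fields.put(name, value); }
    public String get(String name) { return fields.get(name); }

    public void write(DataOutput out) throws IOException {
        out.writeInt(fields.size());                 // number of fields
        for (Map.Entry<String, String> e : fields.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    public void readFields(DataInput in) throws IOException {
        fields.clear();                              // Writables are reused
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            fields.put(in.readUTF(), in.readUTF());
        }
    }
}
```

An instance like this could serve as the value class of a SequenceFile, since both sides of the serialization use only DataInput/DataOutput.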