The problem with the Lucene indexer really lies in the Document creation. A document, as best I can tell, is a sort of map-representation (in the form [field: value]) of the object to be indexed.
Right. You're coming from the perspective of indexing objects, which I see as an atypical operation, at least in the generalizable way you're approaching it with XDoclet. I've indexed all sorts of different things (XML files, HTML files, database information, and more), but have not had a need for general object indexing.
Creating such a Document requires the Document Builder to somehow know about the internals of the object to be indexed. There seem to be a few solutions:
1. For any object to be indexed, the object must implement some Indexable interface with a method like getObjectAsDocument or getObjectAsPropertyMap or something.
2. For any object to be indexed, the object must adhere to Bean standards and the Document Builder simply uses BeanUtils to do the indexing.
3. Use the strategy pattern and have a DocumentBuilderStrategy implementation for each and every class to be indexed.
4. Have the document builder refer to some external mappings file for each class.
Of these:
1 - unwieldy
2 - breaks encapsulation
3 - balloons the number of classes required
4 - a pain in the neck to continually update mappings files to match class changes
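To make option 2 concrete, here is a minimal sketch of a bean-based extractor. It is only an illustration under stated assumptions: it uses the JDK's java.beans.Introspector in place of BeanUtils, the names BeanFieldExtractor, toFieldMap and the sample Dog bean are hypothetical, and it stops at a [field: value] map rather than building a real Lucene Document.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Lucene, XDoclet or BeanUtils.
final class BeanFieldExtractor {

    // Hypothetical sample bean standing in for an object to be indexed.
    public static final class Dog {
        private final String name;
        private final int age;

        public Dog(String name, int age) {
            this.name = name;
            this.age = age;
        }

        public String getName() { return name; }
        public int getAge() { return age; }
    }

    // Reads every readable bean property and returns it as text, keyed by property name.
    static Map<String, String> toFieldMap(Object bean) {
        try {
            Map<String, String> fields = new TreeMap<>();
            // Stopping at Object.class leaves out the "class" property from getClass().
            BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                Method getter = pd.getReadMethod();
                if (getter == null) {
                    continue; // write-only property, nothing to read
                }
                Object value = getter.invoke(bean);
                if (value != null) {
                    fields.put(pd.getName(), String.valueOf(value));
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read bean properties", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFieldMap(new Dog("Rex", 3)));
    }
}
```

Note that this picks up every getter on the class, which is exactly the encapsulation concern raised against option 2: there is no way to say which properties should be indexed.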
... and that's where xdoclet comes in. By adding @lucene tags to a class and having a <lucenedoclet/> task, we can have Ant auto-generate these mappings when we update our class.
But your objects will still need to expose their data through the Bean standards (item 2 above) in order for a generic indexer to get at the data. You show that below by tagging a getter.
/**
 * @hibernate property column="name"
 * @lucene field name="name"
 */
public String getName() {
    return this.name;
}
this would generate, via ant, some sort of xml file, like:
[com/sun/petstore/Dog.lcm.xml]:
<lucene-mapping class="com.sun.petstore.Dog">
    <field name="name" type="java.lang.String" />
</lucene-mapping>
we then create a generic indexer which looks for .lcm.xml files and uses them to create fields for a supplied collection of objects to be indexed.
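As a rough sketch of what the core of such a generic indexer might look like, the following reads a mapping in the format shown above and pulls only the mapped properties from an object. Everything here is an assumption for illustration: the class MappingDrivenExtractor, the extract method and the sample Dog bean are hypothetical, the mapping is passed as a string rather than discovered as a .lcm.xml file, and the result is a plain map rather than a Lucene Document.

```java
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not an existing Lucene or XDoclet class.
final class MappingDrivenExtractor {

    // Hypothetical sample bean; only "name" appears in the mapping used below.
    public static final class Dog {
        private final String name;
        private final String owner;

        public Dog(String name, String owner) {
            this.name = name;
            this.owner = owner;
        }

        public String getName() { return name; }
        public String getOwner() { return owner; }
    }

    // For each <field name="..."/> in the mapping, calls the matching getter on the target.
    static Map<String, String> extract(String mappingXml, Object target) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(mappingXml)));
            NodeList fieldNodes = doc.getElementsByTagName("field");
            Map<String, String> fields = new LinkedHashMap<>();
            for (int i = 0; i < fieldNodes.getLength(); i++) {
                String name = ((Element) fieldNodes.item(i)).getAttribute("name");
                // Bean naming convention: field "name" maps to getter "getName".
                String getterName = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
                Method getter = target.getClass().getMethod(getterName);
                Object value = getter.invoke(target);
                if (value != null) {
                    fields.put(name, value.toString());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalStateException("Could not apply mapping", e);
        }
    }

    public static void main(String[] args) {
        String mapping = "<lucene-mapping class=\"com.sun.petstore.Dog\">"
                + "<field name=\"name\" type=\"java.lang.String\" />"
                + "</lucene-mapping>";
        System.out.println(extract(mapping, new Dog("Rex", "James")));
    }
}
```

The sketch ignores the class and type attributes of the mapping and still relies on public getters, so the objects must follow Bean conventions either way.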
Sure, this seems reasonable for the simplest cases. You'll also need a facility in there to control the attributes on the fields (indexed, tokenized, stored). And perhaps fields will require different types of analysis if they are tokenized? You may also want to capture boost information for index-time boosting.
I'm not sure what the type information gives you in your example. Lucene deals with Strings or java.io.Readers. That's all. Whatever information you get will need to be made textual somehow. And there are several considerations when dealing with numeric and date information, like zero-padding numerics and perhaps converting dates to YYYYMMDD (or using the built-in DateField utility).
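The point about numerics and dates can be illustrated with a small sketch. It assumes that values are compared as plain strings, so text must be encoded such that string order matches numeric or chronological order. The class and method names are hypothetical, the fixed width is an arbitrary choice, and java.time is used for brevity; this is not Lucene's DateField.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Lucene.
final class SortableFieldText {

    // Left-pads a non-negative number with zeros so that string order matches numeric order.
    static String padNumber(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        if (Long.toString(value).length() > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " digits");
        }
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    // Renders a date as YYYYMMDD so that string order matches chronological order.
    static String toYyyyMmDd(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        // Unpadded, "42" sorts after "100" as text; padded, the order is numeric again.
        System.out.println("42".compareTo("100") > 0);
        System.out.println(padNumber(42, 5).compareTo(padNumber(100, 5)) < 0);
        System.out.println(toYyyyMmDd(LocalDate.of(2004, 11, 22)));
    }
}
```

A mapping file would need some way to say which of these conversions applies to each field, which is one more attribute beyond name and type.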
In summary, I think I'm proposing the latter of your two options, but my discrete-math terminology is a bit lacking.
Right - I understand what you're after now.
I don't quite approve of it though, nor have I seen a scenario that would fit such a generic implementation. In my Lucene indexes, I have aggregate fields that glue together text from other fields in order to have an "artificial" field for easy searching. I have document-level boosting, not just field-level boosting, at index time in some cases.
Please feel free to continue your efforts despite my lack of enthusiasm, though. I'm no longer an active XDoclet user (I don't work with EJBs or Struts any more, thank goodness!). I'd love to see more Lucene integration at lots of levels. I'm skeptical of a generic indexer though, so prove me wrong.
Your first step should be to build the generic indexer, without regard for XDoclet. Create a mapping file by hand, iron out the kinks, then get to the XDoclet part when you've got the rest working fine.
Erik
-James
On Mon, 22 Nov 2004 14:48:20 -0500, Erik Hatcher <[EMAIL PROTECTED]> wrote:
As an active Lucene committer, (former) XDoclet user (and committer), and co-author of the upcoming Lucene in Action book - http://www.manning.com/hatcher2 - I'm intrigued by this.
I'm not following what you plan on indexing with Lucene though. Are you proposing to index the tag values as fields? Or are you proposing some type of object graph indexing with the mappings of object paths encoded as XDoclet tags?
Erik
On Nov 22, 2004, at 12:25 PM, James Rosen wrote:
While reading Ara Abrahamian's book on open-source development, I came
across a section on using xml config files to do lucene mappings. I
thought this would be a perfect addition to xdoclet. I would be happy
to write the following, but am new to sourceforge and will need a
little help getting started.
My proposal includes:
-a <lucenedoclet/> task
-@lucene tags
-an addition to the lucene library (I realize this is not your domain,
so it will have to be an add-on for now) allowing the indexers and
document-builders to use .lcm.xml files
If this project is of interest to you, please post to the list.
-James
(Attached: response from Mr. Abrahamian)
PS - moderators can ignore my previous message - I didn't realize it was a members-only list, and have subsequently joined to resend the message.
---------- Forwarded message ----------
From: Ara Abrahamian <[EMAIL PROTECTED]>
Date: Mon, 22 Nov 2004 18:47:49 +0330
Subject: RE: lucene tags
To: James <[EMAIL PROTECTED]>
Hi,
I'm not an active contributor to xdoclet any more. I don't know what's
going on in there these days :-) So, please send an email to xdoclet's
xdoclet-devel mailing list describing what you are going to do with the
@lucene tags and the lucenedoclet module. I'm sure you'll get a lot of
help from the team :-)
Ara.
-----Original Message-----
From: Nobody [mailto:[EMAIL PROTECTED] On Behalf Of James
Sent: Sunday, November 21, 2004 11:11 PM
To: [EMAIL PROTECTED]
Subject: lucene tags
Message body follows:
I have been reading your open source development book, and would like to offer my help to the project. The Lucene class-config mapping files you mention in chapter 18 are a perfect fit for a new <lucenedoclet/> task and some @lucene tags. I am happy to write these, but am new to SourceForge and don't really know where to begin contributing. Please advise if this would be a welcome addition. -James
--
This message has been sent to you, a registered SourceForge.net user,
by another site user, through the SourceForge.net site. This message
has been delivered to your SourceForge.net mail alias. You may reply
to this message using the "Reply" feature of your email client, or
using the messaging facility of SourceForge.net at:
https://sourceforge.net/sendmessage.php?touser=1163582
-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/
_______________________________________________
xdoclet-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/xdoclet-devel