: This example code looks interesting. If I understand
: correctly using this approach requires that builders
: like the "q" QueryObjectBuilder instance must be
: explicitly registered with each and every builder that
: consumes its type of output eg BQOB and FQOB. An
correct.
: provider for the
This example code looks interesting. If I understand
correctly using this approach requires that builders
like the "q" QueryObjectBuilder instance must be
explicitly registered with each and every builder that
consumes its type of output eg BQOB and FQOB. An
alternative would be to register "q" jus
: I'd still like to keep the parser core reasonably generic (ie
: java.lang.Object rather than Query or Filter) because I can see it being
: used for instantiating many different types of objects eg requests for
: GroupBy , highlighting, indexing, testing etc.
: As for your type-safety requiremen
On 03/01/2006, at 11:08 AM, markharw00d wrote:
I thought
you said you "didn't really want to have to design a general API for
parsing XML as part of this project" ? :)
Having grown tired of messing with my own solution I tried using
commons Digester with my example XML but ran into iss
I thought
you said you "didn't really want to have to design a general API for
parsing XML as part of this project" ? :)
Having grown tired of messing with my own solution I tried using commons
Digester with my example XML but ran into issues so I'm back looking at
a custom solution.
I'
: I'm personally happier to stick with one approach,
: preferably with an existing, standardized interface
: which lets me switch implementations. I didn't really
: want to have to design a general API for parsing XML
: as part of this project.
I'm not suggesting that, I'm just saying that the AP
I suspect it's a little too ambitious to provide a
unifying common abstraction which wraps event based
*and* "pull" parser approaches.
I'm personally happier to stick with one approach,
preferably with an existing, standardized interface
which lets me switch implementations. I didn't really
want
: > I think that the ideal API wouldn't require people
: > writing ObjectBuilders
: > to know anything about sax, or to ever need to
: > import anything from
: > org.xml.** or javax.xml.**
:
: Fair enough. I presume we want to maintain the
: position that Lucene should not have any dependencies
: o
Sorry, slip of keyboard meant I posted last message
mid-edit.
Hi Chris,
Thanks for taking the time to review this.
> 1) I applaud the pluggable nature of your solution.
I think that's definitely a worthwhile objective.
> 2) Digging into what was involved in writing an
> ObjectBuilder, I found...
Hi Chris,
Thanks for taking the time to review this.
> 1) I applaud the pluggable nature of your solution.
That's definitely a worthwhile objective.
> 2) Digging into what was involved in writing an
> ObjectBuilder, I found...
> don't really feel like
> the API has a very clean separation from SAX
Hey all,
I haven't been paying real close attention to this thread, but if any
of you are looking for something that has _easy_ Object->XML->Object
you should seriously try XStream (http://xstream.codehaus.org)..
Simplest/easiest api I've seen. BSD licensed too (Apache friendly).
One c
I finally got a chance to look at this code today (the best part about the
last day before vacation, is no one expects you to get anything done, so
you can ignore your "real work" and spend time on things that are more
important in the long run) and while I still haven't wrapped my head
around al
I've looked at Mark's concept and code, and, IMHO, his implementation is
well-done and addresses a huge need. It allows you to conduct Lucene
searches that can harness all the power of the latest Query objects,
without any special Java coding. Yet it also allows the user to be
presented with
Ok. I admit hijacking Mark's original intentions :-) Nevertheless, I think it
would be a very interesting project to expose Lucene indexes via XQuery or at
least XPath initially with Lucene based full-text extensions, regardless of the
type of documents that have been indexed (I'm not talking ab
> However the moment you are promoting INTEROPERABILITY with other
> search/retrieval systems by XMLizing the query input and the
> result output, like Mark is, then it makes sense to adhere to
> standards
I think this is hijacking my original intentions to
some extent. I may be accused of being shor
, December 19, 2005 6:44 PM
To: java-dev@lucene.apache.org
Subject: Re: "Advanced" query language
Comments in-line
Wolfgang Hoschek wrote:
> Yes, there are interesting impls out there. I've myself implemented
> XQuery fulltext search via extension functions built on Lucene. See
Comments in-line
Wolfgang Hoschek wrote:
Yes, there are interesting impls out there. I've myself implemented
XQuery fulltext search via extension functions built on Lucene. See
http://dsd.lbl.gov/nux/index.html#Google-like%20realtime%20fulltext%20search%20via%20Apache%20Lucene%20engine
H
On Dec 17, 2005, at 2:36 PM, Paul Elschot wrote:
Gentlemen,
While maintaining my bookmarks I ran into this:
"Case Study: Enabling Low-Cost XML-Aware Searching
Capable of Complex Querying":
http://www.idealliance.org/papers/xmle02/dx_xmle02/papers/03-02-08/03-02-08.html
Some loose thoughts:
Yes, there are interesting impls out there. I've myself implemented
XQuery fulltext search via extension functions built on Lucene. See
http://dsd.lbl.gov/nux/index.html#Google-like%20realtime%20fulltext%20search%20via%20Apache%20Lucene%20engine
However, rather than targeting fulltext sear
Paul and Wolfgang,
Thank you very much for your input. I think there are two distinct problems
that have emerged from this thread:
1) The ability to create efficient structures to index and query XML documents
(element, attributes and corresponding values) with a full-text query language
and pe
Gentlemen,
While maintaining my bookmarks I ran into this:
"Case Study: Enabling Low-Cost XML-Aware Searching
Capable of Complex Querying":
http://www.idealliance.org/papers/xmle02/dx_xmle02/papers/03-02-08/03-02-08.html
Some loose thoughts:
In the system described there a Lucene document is use
Personally, I tend to use DOM for config type stuff where performance
doesn't matter. I tend to avoid it for per-request XML processing
when you want potentially thousands per second. Besides being slower,
it generates more garbage.
-Yonik
On 12/16/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
I don't think DOM and RAM is necessarily an issue.
The object construction process accesses the content
in the same order that a SAX based path takes so that
just seems an appropriate approach. There is no need
to leap around the structure in any other way from
what I can see, which is where DOM w
Why wouldn't simply using DOM be sufficient? Is it envisioned that
a query XML would be large enough to prohibit RAM DOM loading of the
entire document?
Erik
On Dec 16, 2005, at 2:51 AM, mark harwood wrote:
While SAX is fast, I've found callback interfaces
more difficult to
deal
> While SAX is fast, I've found callback interfaces
> more difficult to
> deal with while generating nested object graphs...
> it normally
> requires one to maintain state in stack(s).
I've gone to some trouble to avoid the effects of this
on the programming model.
Stack management is handled by t
Right now the Sun STAX impl is decidedly buggy compared to xerces SAX
(and it's not faster either). The most complete, reliable and
efficient STAX impl seems to be woodstox.
Wolfgang.
On Dec 15, 2005, at 7:22 PM, Yonik Seeley wrote:
Agreed, that is a significant downside.
StAX is included
Agreed, that is a significant downside.
StAX is included in Java 6, but that doesn't help too much given the
Java 1.4 req.
-Yonik
On 12/15/05, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
> STAX would probably make coding easier, but unfortunately complicates
> the packaging side: one must ship at
I think implementing an XQuery Full-Text engine is far beyond the
scope of Lucene.
Implementing a building block for the fulltext aspect of it would be
more manageable. Unfortunately the W3C fulltext drafts
indiscriminately mix and mingle two completely different languages
into a single l
STAX would probably make coding easier, but unfortunately complicates
the packaging side: one must ship at least two additional external
jars (stax interfaces and impl) for it to become usable. Plus, STAX
is quite underspecified (I wrote a STAX parser + serializer impl
lately), so there's r
On 12/15/05, markharw00d <[EMAIL PROTECTED]> wrote:
> At this stage I am more interested in feedback on parser design/approach
Excellent idea.
While SAX is fast, I've found callback interfaces more difficult to
deal with while generating nested object graphs... it normally
requires one to maintain
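The stack bookkeeping mentioned here can be illustrated with a minimal sketch. It uses only the JDK's SAX classes; `QueryNode` and `TreeBuilder` are hypothetical names invented for this example and are not part of Lucene or of the parser under discussion.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical tree node; stands in for whatever object a builder would create. */
final class QueryNode {
    final String name;
    final List<QueryNode> children = new ArrayList<>();

    QueryNode(String name) { this.name = name; }

    @Override
    public String toString() {
        return children.isEmpty() ? name : name + children;
    }
}

/** SAX handler that tracks the chain of currently open elements on an explicit stack. */
final class TreeBuilder extends DefaultHandler {
    private final Deque<QueryNode> open = new ArrayDeque<>();
    private QueryNode root;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        QueryNode node = new QueryNode(qName);
        if (open.isEmpty()) {
            root = node;                      // first element seen is the document root
        } else {
            open.peek().children.add(node);   // attach to the innermost open element
        }
        open.push(node);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();                           // element closed: its parent is current again
    }

    /** Parses the XML text and returns the root of the nested object graph. */
    static QueryNode parse(String xml) {
        try {
            TreeBuilder handler = new TreeBuilder();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.root;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

With this sketch, `TreeBuilder.parse("<a><b/><c><d/></c></a>").toString()` yields `a[b, c[d]]`. The point is only that every callback has to consult and update the stack, which is the state management the message describes.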
Mark,
This is very cool. When I was at TripleHop we did something very similar where
both query and results conformed to an XML Schema and we used XML over HTTP as
our main vehicle to do remote/federated searches with quick rendering with
stylesheets.
That however is the first piece of the puz
Erik Hatcher wrote:
While there have been several different topics brought up on this
thread, it seems we're diverging from the original idea. Let's
consider the most basic use case example here, and I'm making it
intentionally as concrete as possible:
A Swing client performs searches by
For normal text data, with valid unicode characters that aren't legal
XML, I'd rather have a simple escaping mechanism. Something like
backslash escaping that is easily understood. Maybe something as
simple as \00 for &#0; (backslash followed by two hex digits).
Similar RFC for an extension to XM
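A rough sketch of how such a backslash scheme might behave, written in Java. The class `BackslashEscaper` is hypothetical, and two details are assumptions that the message does not state: the backslash itself is escaped as `\5c` so that decoding is unambiguous, and only control characters below 0x20 are handled, since two hex digits cannot express larger code points.

```java
/** Hypothetical sketch of a two-hex-digit backslash escape; not an existing Lucene API. */
final class BackslashEscaper {
    private BackslashEscaper() {}

    /** Escapes control characters that XML 1.0 forbids, plus the backslash itself. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean illegalControl = c < 0x20 && c != '\t' && c != '\n' && c != '\r';
            if (illegalControl || c == '\\') {
                out.append('\\')
                   .append(Character.forDigit(c >> 4, 16))
                   .append(Character.forDigit(c & 0xF, 16));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Reverses escape(): every backslash must be followed by exactly two hex digits. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 2 >= s.length()) {
                throw new IllegalArgumentException("truncated escape at index " + i);
            }
            int hi = Character.digit(s.charAt(i + 1), 16);
            int lo = Character.digit(s.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("bad escape at index " + i);
            }
            out.append((char) (hi * 16 + lo));
            i += 2; // skip the two hex digits just consumed
        }
        return out.toString();
    }
}
```

The escaped form contains only characters that are legal in XML 1.0 text, so it can be placed in an element or attribute once the usual `&` and `<` escaping has been applied.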
Yonik wrote:
For normal text data, with valid unicode characters that aren't legal
XML, I'd rather have a simple escaping mechanism. Something like
backslash escaping that is easily understood. Maybe something as
simple as \00 for &#0; (backslash followed by two hex digits).
I agree with your go
On Tuesday 06 December 2005 03:20, Chris Hostetter wrote:
...
>
> I can think of at least two big use cases that I'm concerned about
>
> 1) Human creation
...
>
> 2) Aliasing
>
...
Meanwhile I scratched some surface off XSL, and I think it can allow
both simplification and aliasing in one
That's basically what I'm implementing with Nux, except that the
syntax and calling conventions are a bit different, and that Lucene
analyzers can optionally be specified, which makes it a lot more
powerful (but also a bit more complicated).
Wolfgang.
On Dec 6, 2005, at 10:48 AM, Incze Laj
Maybe, I'm a bit late with this, but.
There is an ongoing effort at w3c to define a fulltext
search language that could extend their xpath and xquery
languages (which clearly makes sense).
These are the current documents on the topic:
http://www.w3.org/TR/2005/WD-xquery-full-text-20051103/
http:
> Are you aware, though, of an existing Unicode serialization/markup
> mechanism without XML's gaps?
No, but I'm not advocating anything other than XML. I'm just pointing
out a problem that needs to be solved.
> Base64 is frequently used as an escape mechanism for binary data in XML.
Yeah, but
Yonik Seeley wrote:
On 12/6/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
Also I'd be curious to see a problem with Unicode code points in XML,
if you have one handy.
The definition of valid XML 1.0 characters:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
The simplest
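The Char production of XML 1.0 can be checked with a handful of range comparisons. The following is a minimal sketch; `XmlChars` and its methods are hypothetical helper names and do not exist in Lucene or in the JDK.

```java
/** Hypothetical helper: checks code points against the XML 1.0 Char production. */
final class XmlChars {
    private XmlChars() {}

    /** True if the code point may appear in an XML 1.0 document. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Char index of the first code point that is not legal XML 1.0, or -1 if all are legal. */
    static int firstInvalid(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValidXml10(cp)) {
                return i;
            }
            i += Character.charCount(cp); // supplementary characters occupy two chars
        }
        return -1;
    }
}
```

A term containing a NUL, most other control characters, an unpaired surrogate, or U+FFFE/U+FFFF fails this check, and a numeric character reference does not make such a character legal in XML 1.0.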
On 12/6/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> Suppose a user of the Swing or RoR client enters "some phrase", who
> is responsible for analyzing that phrase so that it is suitable for
> PhraseQuery.add()? Right?
Right, and even more. The query one specifies may be morphed into
another ty
On 12/6/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> > example: &#0; is not valid XML
> Can you give an example of a query that needs binary information?
It's never an absolute need - one could always work around the
problem, for sure. The issue was more a desire to be able to
represent everything
One thing I like about the possibility of XML (as opposed to other
syntax) is that I could create query templates and process them with
XSLT. And I can do this client side and also in most modern browsers.
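As an illustration of the template idea, the sketch below runs a small stylesheet through the JDK's `javax.xml.transform` API. The element and attribute names (`search`, `text`, `TermQuery`, `field`) and the class `QueryTemplate` are invented for this example; the thread has not settled on any XML query format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical example: expands simple user input into a query document via XSLT. */
final class QueryTemplate {
    private QueryTemplate() {}

    // Assumed stylesheet; all element and attribute names are made up for this sketch.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/search'>"
      + "<TermQuery field='contents'><xsl:value-of select='text'/></TermQuery>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the template stylesheet to the input document and returns the result as text. */
    static String expand(String inputXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(inputXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("template expansion failed", e);
        }
    }
}
```

Calling `QueryTemplate.expand("<search><text>lucene</text></search>")` produces a `TermQuery` element wrapping the user's text. The same stylesheet could in principle be applied client side, which is the appeal described in the message.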
> but we should also allow for the client to push the
> analysis
> responsibility to the server:
Yet another variation we could support is to use the
existing QueryParser server-side for handling
user-typed input. On the client user input is unparsed
and combined with the lower-level constraints
On Dec 5, 2005, at 9:18 PM, Yonik Seeley wrote:
If we go with XML, I think this must be solved (or else we are at the
point where we can only represent a subset of queries that lucene can
handle again).
Hmmm, maybe it's not quite so serious if the XML represents a
pre-analyzed query vs post-a
On Dec 5, 2005, at 9:07 PM, Yonik Seeley wrote:
There is one little problem with XML though... Its inability to
directly represent binary data, or even all unicode code points (no,
entities don't fix this). I use binary data in lucene to represent
some numerics, and that can't be represented
: Though, I'd be careful with proposing a variety of equivalent
: syntaxes as it may easily lead to more confusion than good. Let's
: start with one canonical syntax. If desired, other (more pleasant)
: syntaxes may then be converted to that as part of a preprocessing step.
Experience has taught
Hopefully that makes sense to someone besides just me. It's certainly a
lot more complexity than a simple one to one mapping, but it seems to me
like the flexibility is worth spending the extra time to design/build it.
Makes perfect sense to me, and it doesn't seem any more complex
Hopefully that makes sense to someone besides just me. It's certainly a
lot more complexity than a simple one to one mapping, but it seems to me
like the flexibility is worth spending the extra time to design/build it.
Makes perfect sense to me, and it doesn't seem any more complex tha
I'm extremely stoked to see this topic come up, but very sad that I didn't
have time to read any Lucene mail this past weekend. I'll have to
catch up.
First off...
: Again, we're talking machine-to-machine communication here, not human-
: machine.
: While there have been several different topic
> If we go with XML, I think this must be solved (or else we are at the
> point where we can only represent a subset of queries that lucene can
> handle again).
Hmmm, maybe it's not quite so serious if the XML represents a
pre-analyzed query vs post-analyzed.
This doesn't appear quite as simple a
On 12/5/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> Having an XML format representing a Query
> and a mechanism to parse it into an actual Query instance makes a lot
> of sense.
Certainly. A representation is needed where it is very easy to add
support for new queries, flexible enough to handle
On Dec 5, 2005, at 5:36 PM, Erik Hatcher wrote:
While there have been several different topics brought up on this
thread, it seems we're diverging from the original idea. Let's
consider the most basic use case example here, and I'm making it
intentionally as concrete as possible:
A Swing
On Monday 05 December 2005 23:36, Erik Hatcher wrote:
> On Dec 5, 2005, at 3:18 PM, Paul Elschot wrote:
..
> >
> > boosting:
> > match:
> > moreLikeThis(percent="0.25", docId="44"):
> > compareField("contents")
> > compareField("title")
> > downgrade(demote="0.5"):
> > simpl
Erik's scenario pretty much nails it for me.
I prefer the Ant-like XML approach over a Spring one because all the
messy classnames are removed from document instances. ( I wasn't
suggesting we use either technology, merely citing them as object
assembly languages). Haven't seen HiveMind/Digest
On Dec 5, 2005, at 3:18 PM, Paul Elschot wrote:
On Monday 05 December 2005 17:56, John Haxby wrote:
Yonik Seeley wrote:
I looked into this a year ago... most scripting languages have an
emphasis on script execution speed, not script parsing speed
(which is
what we would need). The scriptin
On Monday 05 December 2005 17:56, John Haxby wrote:
> Yonik Seeley wrote:
>
> >I looked into this a year ago... most scripting languages have an
> >emphasis on script execution speed, not script parsing speed (which is
> >what we would need). The scripting languages I tried were horribly
> >slow
> is a more clear syntax have:
>
> QUERY title,date,size, content WHERE (title LIKE
> 'foo*' OR size>=0)
Let's not forget that unlike most query languages
which are boolean (things either match or they don't)
Lucene has many facilities for influencing the degree
to which matches occur.
A lot of
Yonik Seeley wrote:
I looked into this a year ago... most scripting languages have an
emphasis on script execution speed, not script parsing speed (which is
what we would need). The scripting languages I tried were horribly
slow at parsing a small script. The only one that could parse at a
rea
On 12/5/05, Mario Alejandro M. <[EMAIL PROTECTED]> wrote:
> or maybe embed a scripting engine? That can be more useful, and can be used
> to easily extend other things apart from the query language..
I looked into this a year ago... most scripting languages have an
emphasis on script execution speed
From my work porting Lucene to Delphi, I think that having an LQL (Lucene Query
Language) is a valuable idea, but I consider that putting it in XML is not
expressive enough for this...
Anyway, I think that going the way of SQL or OCL can be better, as it gives a
clearer syntax:
QUERY title,date,size, c
Hi
So far I have seen no mention of XPath like queries.
Not that I am a huge fan here, but it would give a standard query
language and standard parser (jaxen saxpath maybe). The disadvantage is
wrapping stuff up as functions, as already discussed. Adding functions
is OK.
E.g.
//*[termQuery(@my:
On Sunday 04 December 2005 22:32, markharw00d wrote:
> I think I'm with Erik on this - I generally don't see end users keen to
> type anything other than "words with spaces" as queries.
I think/hope that XSL allows a simplified front end that would fit
my needs.
> I do see them commonly using G
I think I'm with Erik on this - I generally don't see end users keen to
type anything other than "words with spaces" as queries.
I do see them commonly using GUI forms with multiple inputs and behind
the scenes application code assembling the query - the same way just
about every web app in the
On Dec 4, 2005, at 11:02 AM, Paul Elschot wrote:
Are there XML editors that can limit their output to a given
stylesheet?
In that case one only needs to predefine a style sheet for queries.
Yes, there are many sophisticated XML editors. I'm not quite sure
where you're going with this thou
On Sunday 04 December 2005 15:26, Erik Hatcher wrote:
>
> On Dec 4, 2005, at 6:52 AM, Paul Elschot wrote:
> > I tried rewriting the XML query in exactly this way, with a
> > few property=.. constructs:
> >
> > boostingQuery(
> > matchQuery=moreLikeThis(
> > percentTer
On Dec 4, 2005, at 6:52 AM, Paul Elschot wrote:
I tried rewriting the XML query in exactly this way, with a
few property=.. constructs:
boostingQuery(
matchQuery=moreLikeThis(
percentTermsToMatch="0.25",
docId="44",
On Sunday 04 December 2005 05:17, Yonik Seeley wrote:
> On 12/3/05, Paul Elschot <[EMAIL PROTECTED]> wrote:
> > Indeed, this is a disadvantage of the "function call" syntax.
>
> It depends on the language. Take Python for example:
>
> >>> def foo(a,b): print a,b
> >>> foo(1,2)
> 1 2
> >>> foo(a=1
Paul Elschot wrote:
Would it be possible to provide such a GUI automatically
(by introspection) given a set of Query classes of which objects
can be mixed to form a query?
Certainly possible - I've seen app servers with automatic GUI test
clients which can introspect an EJB interface and l
On 12/3/05, Paul Elschot <[EMAIL PROTECTED]> wrote:
> Indeed, this is a disadvantage of the "function call" syntax.
It depends on the language. Take Python for example:
>>> def foo(a,b): print a,b
>>> foo(1,2)
1 2
>>> foo(a=1,b=2)
1 2
>>> foo(b=2,a=1)
1 2
>>>
-Yonik
On Saturday 03 December 2005 19:00, markharw00d wrote:
> Erik Hatcher wrote:
>
...
> parameters that tweak its behaviour. If I don't have a query language
> that names the parameters explicitly (say, XML) I end up having to
> define what looks like a function with a long list of parameters: "li
Hi,
> From: markharw00d [mailto:[EMAIL PROTECTED]
> Re: MoreLikeThis queries.
> Yes, they can be usefully wrapped as queries (see attached simple
> example). In fact it was my attempts at bastardising QueryParser to
> support them that brought home its limitations. I ended up with a
> subcl
Hi,
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> > > maxQueryTerms="30">
>
> We're back to MoreLikeThis - it's not currently a Query subclass.
> How do you envision this sort of thing fitting in if it's not a Query?
But MoreLikeThis class produces a Query. It's similar to google "d
Erik Hatcher wrote:
Rest assured that human-readable query expressions aren't going away
at all. I don't think Mark even implied that.
That's right. The proposal is *not* to replace what is already there -
QueryParser will always have a useful role to play supporting the
"Google-like" que
Rest assured that human-readable query expressions aren't going away
at all. I don't think Mark even implied that. The idea is to have a
way to communicate a query electronically in a precise way that
avoids parser syntax and the awkwardness this could have with
analysis. This seems reas
On Saturday 03 December 2005 03:57, Yonik Seeley wrote:
> It would be nice to resolve/fix the whole "JavaCC using an exception
> for flow control" issue too.
Did anybody have a look yet at javacc 4.0beta1, does it maybe fix that
problem?
Regards
Daniel
--
http://www.danielnaber.de
--
Just as a clarification, human-readable strings for queries are
essential for how we do things at CNET.
In addition to Mark's comments:
- standard logging mechanisms such as the access log of an app server
are readable
- easily human typable one-off queries during development and for
troubleshootin
On Dec 2, 2005, at 10:03 AM, mark harwood wrote:
There seems to be a growing gap between Lucene
functionality and the query language offered by
QueryParser (eg no support for regex queries, span
queries, "more like this", filter queries,
minNumShouldMatch etc etc).
At least with a couple of the
Whatever is generating the xml could just as easily create/instantiate the
query objects.
Yes, it is easier using the existing Java objects to construct queries
but they are inappropriate when you consider the scenarios 1 to 3 I
outlined earlier (query persistence, support for clients wri
> It's worth considering why it's useful to have a
> String-representable form for queries:
Absolutely. A quickly parseable string representation for queries is
essential in so many contexts, for the reasons you brought out. Think
what SQL does for the database.
-Yonik
On Friday 02 December 2005 16:03, mark harwood wrote:
> There seems to be a growing gap between Lucene
> functionality and the query language offered by
> QueryParser (eg no support for regex queries, span
> queries, "more like this", filter queries,
> minNumShouldMatch etc etc).
>
> Closing this
I don't see the value in this. Whatever is generating the xml could just as
easily create/instantiate the query objects.
I would much rather see the query parser migrated to an internal parser (that
would be easier to maintain), and develop a syntax that allowed easier use of
the most common/p