On Tue, 2004-01-20 at 04:47, Stefano Mazzocchi wrote:

> On 19 Jan 2004, at 17:00, Michael Oliver wrote:
> 
> >
> > On Mon, 2004-01-19 at 08:31, Stefano Mazzocchi wrote:
> >
> >> On 19 Jan 2004, at 15:12, Michael Oliver wrote:
> >>
> >>> On Mon, 2004-01-19 at 06:32, Stefano Mazzocchi wrote:
> >>>
> >>>> I personally wouldn't know how to make use of a query against full
> >>>> text
> >>>> *and* properties. This is because such a query looks weird to me:
> >>>> full-text is the least structure possible (get me everything but I
> >>>> don't know where) while properties tend to be very much structured
> >>>> (last modified time, author, and so on).
> >>>>
> >>>> There is a decades-long discussion on what is data and what is
> >>>> metadata
> >>>> and I don't want to touch that with a stick, but I think that if you
> >>>> need to do full-text search on your metadata there is something 
> >>>> wrong.
> >>>
> >>> Stefano with all due respect, there is nothing wrong with a full-text
> >>> search on metadata because metadata in this case can be any 
> >>> properties
> >>> of any of the resources in the repository and that metadata can be
> >>> free
> >>> form text.
> >>
> >> Well, this is because I try to avoid having metadata that can be free
> >> form text, but as I said, this is my way and I don't want to impose it
> >> on others.
> >>
> >
> > Well as long as we CAN have properties that are free form text, we 
> > can't
> > avoid them.
> 
> Very true.
> 
> >>> consider a search query like
> >>>
> >>> doctype="memo" and description contains "Fire Stefano" and contents
> >>> contains "January"
> >>
> >> I would think that this schema is not appropriate. a description is
> >> part of content, not metadata. But it's like arguing about whether
> >> something should be an element or an attribute... sometimes it's just
> >> subjective.
> >>
> >
> > No, I don't think so.  Metadata IS data about data, eh?
> 
> Right, and for this very reason metadata is data.... this "about" is 
> the key: its semantic meaning is relative, not absolute.
> 
> > And a
> > "description" can't be anything else. You certainly don't think that,
> > for a binary file stored in Slide (content), the "description" of the
> > content, which is text, is part of the content?
> 
> eheh, we can agree to disagree then: my content could be something like
> 
>   <image>
>    <bits>
>     ... base64-encoded bitstream of the image ...
>    </bits>
>    <description>
>     this is an image about a horse
>    </description>
>   </image>
> 
> or could be a GIF image (which *does* have optional text-based 
> descriptions at the end of it).
> 
> In JCR we have nodes and properties but we don't specify that nodes are 
> data and properties are metadata. 

Well, we are talking about Slide, and you CAN store anything you want,
including binary data wrapped in XML, and version that as
"content"... and in your example you would still be using a full-text
search for the indexable values.

Whether you put the description in your XML, embed it in the GIF, or
keep it wherever else, I would still argue that your "description" IS
metadata, or attributes of the content, whether the content is a
discrete file or <bits>...</bits>. In <bits type="base64"> I doubt
you would call "base64" "content".

If a user decides they want the description in properties disambiguated
by namespace qualifiers, or to store binary data wrapped in XML, it does
not matter: it needs to be indexed and searchable, and on that I am
sure we agree.
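
To make that concrete, here is a rough Python sketch of one index that
covers both string properties and content, and answers the example query
from earlier in the thread. Everything in it (the Resource and Index
classes, the field names, the sample memos) is hypothetical and made up
for illustration; it is not how Slide stores or indexes anything.

```python
import re
from collections import defaultdict


class Resource:
    """Hypothetical stand-in for a stored resource: string properties plus text content."""

    def __init__(self, uri, properties, contents):
        self.uri = uri
        self.properties = properties
        self.contents = contents


def tokenize(text):
    # Lower-case and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class Index:
    def __init__(self):
        # (field, token) -> {uri: [token positions]}, shared by properties and contents
        self.postings = defaultdict(lambda: defaultdict(list))
        # (property name, exact value) -> set of uris, for equality tests
        self.exact = defaultdict(set)

    def add(self, resource):
        for name, value in resource.properties.items():
            self.exact[(name, value)].add(resource.uri)
        fields = dict(resource.properties)
        fields["contents"] = resource.contents
        for field, value in fields.items():
            for pos, token in enumerate(tokenize(value)):
                self.postings[(field, token)][resource.uri].append(pos)

    def equals(self, field, value):
        return set(self.exact.get((field, value), set()))

    def contains(self, field, phrase):
        # Phrase match: the tokens must appear consecutively in the field.
        tokens = tokenize(phrase)
        if not tokens:
            return set()
        hits = set()
        for uri, starts in self.postings.get((field, tokens[0]), {}).items():
            for start in starts:
                if all(
                    start + i in self.postings.get((field, tok), {}).get(uri, ())
                    for i, tok in enumerate(tokens)
                ):
                    hits.add(uri)
                    break
        return hits


index = Index()
index.add(Resource("/memos/1",
                   {"doctype": "memo", "description": "Reasons to fire Stefano"},
                   "Meeting scheduled for January 20"))
index.add(Resource("/memos/2",
                   {"doctype": "memo", "description": "Fire drill schedule"},
                   "The January drill is cancelled"))
index.add(Resource("/reports/1",
                   {"doctype": "report", "description": "Fire Stefano"},
                   "Filed in January"))

# doctype="memo" and description contains "Fire Stefano" and contents contains "January"
hits = (index.equals("doctype", "memo")
        & index.contains("description", "Fire Stefano")
        & index.contains("contents", "January"))
print(sorted(hits))
```

The point of the sketch is only that the property clauses and the
content clause go through the same postings table; a real
implementation would sit on a proper full-text engine.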

> There are some properties that are 
> read-only and auto-generated by the containers (for example things like 
> last-modified-time or creation-time), but they could be things like 
> width/height for images, autogenerated by the notion that the nodetype 
> is an image and for this reason the container reacts on this and 
> extracts the information.
> 
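As a rough illustration of a container reacting to the node type, here
is a small Python sketch that derives width and height from a GIF
header. The extract_properties hook and the "image/gif" node type string
are hypothetical names invented for this example; the only fact relied
on is the GIF header layout (a 6-byte signature followed by width and
height as little-endian 16-bit integers).

```python
import struct


def extract_properties(nodetype, data):
    """Hypothetical container hook: return auto-generated properties for a node."""
    props = {}
    is_gif = data[:6] in (b"GIF87a", b"GIF89a")
    if nodetype == "image/gif" and is_gif and len(data) >= 10:
        # Bytes 6-9 of a GIF hold the logical screen width and height.
        width, height = struct.unpack("<HH", data[6:10])
        props["width"] = width
        props["height"] = height
    return props


# A fake 10-byte header is enough to exercise the hook.
header = b"GIF89a" + struct.pack("<HH", 640, 480)
print(extract_properties("image/gif", header))
```
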
> Anyway, this is an academic point as I do agree that if people store 
> strings in properties, we need a way to search them using full-text.
> 
> > Slide/WebDAV properties
> > that can be created and saved by the user are all about 
> > categorization
> > of and description of the content, almost for the express purpose of
> > being able to find the right content and therefore should be very much
> > part of the search mechanism.
> 
> Very much agreed, yes.
> 
> >>> doctype and description are properties with string values that would 
> >>> be
> >>> indexed and matched with the same index as the contents.
> >>
> >> So, are you suggesting that we index everything? [not critical, just
> >> curious]
> >
> >
> > Absolutely, if someone wants to save some piece of information they will
> > want to retrieve it and search for it.
> 
> Ok, good, it seems we have a direction now.
> 
> --
> Stefano.