2010/8/30 Slavek Tecl <[email protected]>:
>
>  Sweet, it works, so the only problem was in my way of adding a property to 
> the lucene Document.
>
>  Many thanks for your help!
>  Cheers,
>  Slavek

You're welcome. Thanks for the feedback,

Regards Ard

>
>
>
>> From: [email protected]
>> To: [email protected]
>> Subject: RE: Searching for binary values
>> Date: Mon, 30 Aug 2010 11:20:22 +0200
>>
>>
>> Hi Ard,
>> now I know what the problem is: I thought that the call to 
>> doc.add(createFulltextField(...)) would be sufficient.
>> The query I've been using is more complicated, but for testing I used a 
>> really simple one:
>> SELECT child.* FROM [customns:customtype] AS child WHERE 
>> CONTAINS(child.binaryproperty, 'value').
>> In the future, I'm sure the query will contain non-binary properties as 
>> well, so a query like
>> SELECT child.* FROM [customns:customtype] AS child WHERE 
>> CONTAINS(child.binaryproperty, 'value') OR CONTAINS(child.stringproperty, 
>> 'foo')
>> would be used too.
>> Anyway, thanks for pointing me in the right direction. I'll try to 
>> implement this and see if it works correctly.
>> Best regards,
>> Slavek
>>
>>
>> > Date: Mon, 30 Aug 2010 10:28:39 +0200
>> > Subject: Re: Searching for binary values
>> > From: [email protected]
>> > To: [email protected]
>> >
>> > Hello,
>> >
>> > 2010/8/30 Slavek Tecl <[email protected]>:
>> > >
>> > >
>> > > once again in HTML...
>> > >
>> > > here comes the addBinaryValue method body:...
>> > >
>> > >
>> > > //standard way of indexing
>> > > String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
>> > > if (jcrData.equals(fieldName)) {
>> > > InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
>> > > if (type != null) {
>> > > Metadata metadata = new Metadata();
>> > > metadata.set(Metadata.CONTENT_TYPE, type.getString());
>> > > // jcr:encoding is not mandatory
>> > > InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
>> > > if (encoding != null) {
>> > > metadata.set(Metadata.CONTENT_ENCODING, encoding.getString());
>> > > }
>> > > doc.add(createFulltextField(internalValue, metadata));
>> > > }
>> > > } else {
>> > > //everything else gets indexed as well
>> > > MimeTypes gk = new MimeTypes();
>> > > MimeType mimeType = gk.getMimeType(internalValue.getStream());
>> > >
>> > > Metadata metadata = new Metadata();
>> > > metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
>> > > doc.add(createFulltextField(internalValue, metadata));
>> > > }
>> >
>> > ok, but as I said in one of my earlier mails, binaries are not indexed
>> > on property level, only on nodescope level. Your code doesn't index on
>> > property level either, see createFulltextField. I do hope I understood
>> > your first mail correctly: you want to search specifically in the
>> > binary property *only*, right? You could post the xpath that
>> > you want to execute.
>> >
>> > Anyway,
>> >
>> > you should also add the indexed binary as a non stored (I recommend
>> > non stored) property, thus something like the method
>> >
>> > protected void addStringValue(Document doc, String fieldName,
>> > Object internalValue, boolean tokenized,
>> > boolean includeInNodeIndex, float boost,
>> > boolean useInExcerpt) {
>> >
>> > does. However, you must realize that binaries get indexed in
>> > background lazily by default. I'd recommend to not call
>> >
>> > doc.add(createFulltextField(internalValue, metadata));
>> >
>> > but call
>> >
>> > doc.add(createFulltextField(fieldName, internalValue, metadata));
>> >
>> > Add this new createFulltextField method, and create your own
>> > LazyTextExtractorField class that also takes a fieldName argument.
>> >
>> > Then you also need to add the extracted, analysed text as a property.
>> >
>> > Regards Ard
>> >
>> > >
>> > >
>> > >
>> > > and here we have my custom parser (and I can see it's being started 
>> > > every time the binary value with my custom mime type is added):
>> > >
>> > > XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
>> > > xhtml.startDocument();
>> > > ...fetch keywords...
>> > > for(String value: keywords) {
>> > > xhtml.characters(value);
>> > > xhtml.characters(" ");
>> > > }
>> > > xhtml.endDocument();
>> > > ...
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > ----------------------------------------
>> > >> Date: Mon, 30 Aug 2010 09:52:28 +0200
>> > >> Subject: Re: Searching for binary values
>> > >> From: [email protected]
>> > >> To: [email protected]
>> > >>
>> > >> 2010/8/30 Slavek Tecl :
>> > >> >
>> > >> > Bloody hotmail screwed my awesome formatting ;) Hope it's ok now.
>> > >>
>> > >> Hmmmm...not really
>> > >>
>> > >> >
>> > >> > here comes the addBinaryValue method body:...//standard way of 
>> > >> > indexingString jcrData = mappings.getPrefix(Name.NS_JCR_URI) + 
>> > >> > ":data";if (jcrData.equals(fieldName)) { InternalValue type = 
>> > >> > getValue(NameConstants.JCR_MIMETYPE); if (type != null) { Metadata 
>> > >> > metadata = new Metadata(); metadata.set(Metadata.CONTENT_TYPE, 
>> > >> > type.getString()); // jcr:encoding is not mandatory InternalValue 
>> > >> > encoding = getValue(NameConstants.JCR_ENCODING); if (encoding != 
>> > >> > null) { metadata.set(Metadata.CONTENT_ENCODING, 
>> > >> > encoding.getString()); } doc.add(createFulltextField(internalValue, 
>> > >> > metadata)); }} else { //everything else gets indexed as well 
>> > >> > MimeTypes gk = new MimeTypes(); MimeType mimeType = 
>> > >> > gk.getMimeType(internalValue.getStream()); Metadata metadata = new 
>> > >> > Metadata(); metadata.set(Metadata.CONTENT_TYPE, mimeType.getName()); 
>> > >> > doc.add(createFulltextField(internalValue, metadata));}...
>> > >> >
>> > >> > and here we have my custom parser (and I can see it's being started 
>> > >> > everytime the binary value with my custom mime type is 
>> > >> > added):XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, 
>> > >> > metadata);xhtml.startDocument();...fetch keywords...for(String value: 
>> > >> > keywords) { xhtml.characters(value); xhtml.characters(" 
>> > >> > ");}xhtml.endDocument();...
>> > >> >> Date: Mon, 30 Aug 2010 09:31:47 +0200
>> > >> >> Subject: Re: Searching for binary values
>> > >> >> From: [email protected]
>> > >> >> To: [email protected]
>> > >> >>
>> > >> >> Slavek,
>> > >> >>
>> > >> >> I am no computer :-) Is there a way you can format this into 
>> > >> >> something a little more human-readable?
>> > >> >>
>> > >> >>
>> > >> >> 2010/8/30 Slavek Tecl :
>> > >> >>>
>> > >> >>> All right, here comes the addBinaryValue method body: ... 
>> > >> >>> //standard way of indexing String jcrData = 
>> > >> >>> mappings.getPrefix(Name.NS_JCR_URI) + ":data"; if 
>> > >> >>> (jcrData.equals(fieldName)) { InternalValue type = 
>> > >> >>> getValue(NameConstants.JCR_MIMETYPE); if (type != null) { Metadata 
>> > >> >>> metadata = new Metadata(); metadata.set(Metadata.CONTENT_TYPE, 
>> > >> >>> type.getString());
>> > >> >>> // jcr:encoding is not mandatory InternalValue encoding = 
>> > >> >>> getValue(NameConstants.JCR_ENCODING); if (encoding != null) { 
>> > >> >>> metadata.set(Metadata.CONTENT_ENCODING, encoding.getString()); }
>> > >> >>> doc.add(createFulltextField(internalValue, metadata)); } } else { 
>> > >> >>> //everything else gets indexed as well MimeTypes gk = new 
>> > >> >>> MimeTypes(); MimeType mimeType = 
>> > >> >>> gk.getMimeType(internalValue.getStream());
>> > >> >>> Metadata metadata = new Metadata(); 
>> > >> >>> metadata.set(Metadata.CONTENT_TYPE, mimeType.getName()); 
>> > >> >>> doc.add(createFulltextField(internalValue, metadata)); } ...
>> > >> >>> my custom parser leverages XHTMLContentHandler like this (and I can 
>> > >> >>> see it's being started every time the binary value with my custom 
>> > >> >>> mime type is added):
>> > >> >>> ...XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, 
>> > >> >>> metadata);xhtml.startDocument();... for(String value: keywords) { 
>> > >> >>> xhtml.characters(value); xhtml.characters(" "); 
>> > >> >>> //xhtml.element("p", value); }xhtml.endDocument();...
>> > >> >>>> Date: Mon, 30 Aug 2010 09:12:16 +0200
>> > >> >>>> Subject: Re: Searching for binary values
>> > >> >>>> From: [email protected]
>> > >> >>>> To: [email protected]
>> > >> >>>>
>> > >> >>>> 2010/8/27 Slavek Tecl :
>> > >> >>>>> In my case the addBinaryValue has been overridden in my custom 
>> > >> >>>>> class so I'm adding this field to the document as well.
>> > >> >>>>
>> > >> >>>> Is it possible that you made some error in this? I can't judge it 
>> > >> >>>> without code
>> > >> >>>>
>> > >> >>>> Regards Ard
>> > >> >>>>
>> > >> >>>>>
>> > >> >>>>>> Date: Fri, 27 Aug 2010 17:16:56 +0200
>> > >> >>>>>> Subject: Re: Searching for binary values
>> > >> >>>>>> From: [email protected]
>> > >> >>>>>> To: [email protected]
>> > >> >>>>>>
>> > >> >>>>>> 2010/8/27 Slavek Tecl :
>> > >> >>>>>>>
>> > >> >>>>>>> I'm looking for a clarification of how the query is processed in 
>> > >> >>>>>>> my customized jackrabbit instance. In my case the NodeIndexer 
>> > >> >>>>>>> is subclassed so it can add the binary value to the indexed 
>> > >> >>>>>>> Document even if it does not have the nt:resource type. Tika 
>> > >> >>>>>>> has also been customized with my mimetype so the parser is able 
>> > >> >>>>>>> to recognize the binary stream through its magic, and of course 
>> > >> >>>>>>> Tika's Parser object was implemented to support the custom 
>> > >> >>>>>>> binary stream and extract words from it. If I run a query on 
>> > >> >>>>>>> nt:resource nodes it correctly returns files including the 
>> > >> >>>>>>> searched word as expected, but when I invoke a similar query on 
>> > >> >>>>>>> a binary property (and the content of this binary property is 
>> > >> >>>>>>> exactly the type of stream Tika can parse) it does not 
>> > >> >>>>>>> return anything. Is there a way out?
>> > >> >>>>>>
>> > >> >>>>>>
>> > >> >>>>>> Binary properties are only indexed on nodescope level, not on 
>> > >> >>>>>> property level.
>> > >> >>>>>>
>> > >> >>>>>> See protected void addBinaryValue(Document doc,
>> > >> >>>>>> String fieldName,
>> > >> >>>>>> InternalValue internalValue) {
>> > >> >>>>>>
>> > >> >>>>>> and then specifically doc.add(createFulltextField(internalValue, 
>> > >> >>>>>> metadata));
>> > >> >>>>>>
>> > >> >>>>>> in jr NodeIndexer
>> > >> >>>>>>
>> > >> >>>>>> Regards Ard
>> > >> >>>>>
>> > >> >>>
>> > >> >
>> > >
>> > >
>>
>
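
Note on the resolution: the thread turns on the difference between indexing extracted text on nodescope level only and also indexing it under a per-property field. The following is a minimal plain-Java sketch of that idea, not Jackrabbit or Lucene code. The class, the Doc type, the field names "FULLTEXT" and "FULL:<property>", and both index methods are hypothetical names invented for illustration; the real NodeIndexer, createFulltextField and LazyTextExtractorField work on Lucene fields and extract text lazily in the background.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only: all names below are assumptions, not Jackrabbit's API.
final class PropertyScopeIndexSketch {

    // Hypothetical name of the single node-scope fulltext field.
    static final String NODE_SCOPE_FIELD = "FULLTEXT";

    // Hypothetical naming scheme for a per-property fulltext field.
    static String propertyField(String propertyName) {
        return "FULL:" + propertyName;
    }

    // A toy "document": field name mapped to a set of lower-cased tokens.
    static final class Doc {
        private final Map<String, Set<String>> fields = new HashMap<>();

        void add(String field, String text) {
            Set<String> tokens = fields.computeIfAbsent(field, k -> new HashSet<>());
            for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
        }

        // Stand-in for CONTAINS(field, term): exact match on a single token.
        boolean contains(String field, String term) {
            return fields.getOrDefault(field, Collections.emptySet())
                    .contains(term.toLowerCase(Locale.ROOT));
        }
    }

    // Mirrors the original situation: extracted text goes to node scope only.
    static Doc indexNodeScopeOnly(String extractedText) {
        Doc doc = new Doc();
        doc.add(NODE_SCOPE_FIELD, extractedText);
        return doc;
    }

    // Mirrors the suggested fix: the same text is also added under a
    // field derived from the property name.
    static Doc indexNodeAndPropertyScope(String propertyName, String extractedText) {
        Doc doc = indexNodeScopeOnly(extractedText);
        doc.add(propertyField(propertyName), extractedText);
        return doc;
    }

    public static void main(String[] args) {
        String text = "value extracted from the binary";
        String field = propertyField("binaryproperty");

        Doc before = indexNodeScopeOnly(text);
        Doc after = indexNodeAndPropertyScope("binaryproperty", text);

        System.out.println(before.contains(NODE_SCOPE_FIELD, "value")); // true
        System.out.println(before.contains(field, "value"));            // false
        System.out.println(after.contains(field, "value"));             // true
    }
}
```

In this toy model a search restricted to the property field finds nothing until the text is also added under that field, which matches the behaviour described in the thread: the nodescope query worked, the property-level CONTAINS did not.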
