Sweet, it works, so the only problem was in the way I was adding the property to the 
Lucene Document.
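
For anyone finding this thread in the archive, the difference can be sketched roughly as follows. This is a toy model only, not the real NodeIndexer code: ToyDocument, NODE_SCOPE, propertyField and the two index methods are made-up names that stand in for the Lucene Document and the Jackrabbit field naming, and the extracted text is assumed to be already available as a String (in Jackrabbit it is extracted lazily in the background). The point it illustrates is that text added only to the node-scope fulltext field cannot be found by a query that targets one property.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names are Jackrabbit or Lucene API.
class PropertyScopeSketch {

    // Hypothetical name for the node-scope fulltext field.
    static final String NODE_SCOPE = "FULLTEXT";

    // Hypothetical naming scheme for a per-property fulltext field.
    static String propertyField(String propertyName) {
        return "FULL:" + propertyName;
    }

    // Stand-in for a Lucene Document: field name -> set of lower-cased terms.
    static class ToyDocument {
        private final Map<String, Set<String>> fields = new HashMap<>();

        void add(String fieldName, String text) {
            Set<String> terms = fields.computeIfAbsent(fieldName, k -> new HashSet<>());
            for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!token.isEmpty()) {
                    terms.add(token);
                }
            }
        }

        boolean contains(String fieldName, String term) {
            return fields.getOrDefault(fieldName, Collections.emptySet())
                    .contains(term.toLowerCase(Locale.ROOT));
        }
    }

    // What the original override did: node-scope field only.
    static ToyDocument indexNodeScopeOnly(String extractedText) {
        ToyDocument doc = new ToyDocument();
        doc.add(NODE_SCOPE, extractedText);
        return doc;
    }

    // What was missing: the same text also added under a per-property field.
    static ToyDocument indexWithPropertyScope(String propertyName, String extractedText) {
        ToyDocument doc = indexNodeScopeOnly(extractedText);
        doc.add(propertyField(propertyName), extractedText);
        return doc;
    }

    public static void main(String[] args) {
        String field = propertyField("binaryproperty");
        ToyDocument before = indexNodeScopeOnly("value other");
        ToyDocument after = indexWithPropertyScope("binaryproperty", "value other");
        System.out.println("node scope only, property query matches: "
                + before.contains(field, "value"));
        System.out.println("with property field, property query matches: "
                + after.contains(field, "value"));
    }
}
```

In both variants a node-scope lookup still matches, which is why the nt:resource style queries kept working while the property-level CONTAINS returned nothing.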

Many thanks for your help!
Cheers,
Slavek



> From: [email protected]
> To: [email protected]
> Subject: RE: Searching for binary values
> Date: Mon, 30 Aug 2010 11:20:22 +0200
> 
> 
> Hi Ard, 
> now I know what the problem is. I just thought that the call to 
> doc.add(createFulltextField(...)) would
> be sufficient.
> The query I've been using is more complicated, but for testing I used a really 
> simple one:
> SELECT child.* FROM [customns:customtype] AS child WHERE 
> CONTAINS(child.binaryproperty, 'value').
> For the future, I'm sure the query will contain non-binary properties as well 
> so a query like
> SELECT child.* FROM [customns:customtype] AS child WHERE 
> CONTAINS(child.binaryproperty, 'value') OR CONTAINS(child.stringproperty, 
> 'foo')
> would be used too.
> Anyway, thanks for pointing me in the right direction. I'll try to implement 
> the stuff and see if it's working correctly.
> Best regards,
> Slavek
> 
> 
> > Date: Mon, 30 Aug 2010 10:28:39 +0200
> > Subject: Re: Searching for binary values
> > From: [email protected]
> > To: [email protected]
> > 
> > Hello,
> > 
> > 2010/8/30 Slavek Tecl <[email protected]>:
> > >
> > >
> > > once again in HTML...
> > >
> > > here comes the addBinaryValue method body:...
> > >
> > >
> > > // standard way of indexing
> > > String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
> > > if (jcrData.equals(fieldName)) {
> > >     InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
> > >     if (type != null) {
> > >         Metadata metadata = new Metadata();
> > >         metadata.set(Metadata.CONTENT_TYPE, type.getString());
> > >         // jcr:encoding is not mandatory
> > >         InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
> > >         if (encoding != null) {
> > >             metadata.set(Metadata.CONTENT_ENCODING, encoding.getString());
> > >         }
> > >         doc.add(createFulltextField(internalValue, metadata));
> > >     }
> > > } else {
> > >     // everything else gets indexed as well
> > >     MimeTypes gk = new MimeTypes();
> > >     MimeType mimeType = gk.getMimeType(internalValue.getStream());
> > >
> > >     Metadata metadata = new Metadata();
> > >     metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
> > >     doc.add(createFulltextField(internalValue, metadata));
> > > }
> > 
> > ok, but as I said in one of my earlier mails, binaries are not indexed
> > on property level, only on nodescope level. Your code doesn't index on
> > property level either, see createFulltextField. I do hope I understood
> > your first mail correctly, though: you want to search specifically
> > in the binary property *only*, right? You could post the xpath that
> > you want to be executed.
> > 
> > Anyway,
> > 
> > you should also add the indexed binary as a non stored (I recommend
> > non stored) property, thus something like the method
> > 
> > protected void addStringValue(Document doc, String fieldName,
> > Object internalValue, boolean tokenized,
> > boolean includeInNodeIndex, float boost,
> > boolean useInExcerpt) {
> > 
> > does. However, you must realize that binaries get indexed lazily in
> > the background by default. I'd recommend not calling
> > 
> > doc.add(createFulltextField(internalValue, metadata));
> > 
> > but call
> > 
> > doc.add(createFulltextField(fieldName, internalValue, metadata));
> > 
> > So add this new createFulltextField method, and create your own
> > LazyTextExtractorField class that also takes a fieldName argument.
> > 
> > Then you also need to add the extracted, analysed text as a property.
> > 
> > Regards Ard
> > 
> > >
> > >
> > >
> > > and here we have my custom parser (and I can see it's being started 
> > > every time a binary value with my custom mime type is added):
> > >
> > > XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
> > > xhtml.startDocument();
> > > ...fetch keywords...
> > > for (String value : keywords) {
> > >     xhtml.characters(value);
> > >     xhtml.characters(" ");
> > > }
> > > xhtml.endDocument();
> > > ...
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > ----------------------------------------
> > >> Date: Mon, 30 Aug 2010 09:52:28 +0200
> > >> Subject: Re: Searching for binary values
> > >> From: [email protected]
> > >> To: [email protected]
> > >>
> > >> 2010/8/30 Slavek Tecl :
> > >> >
> > >> > Bloody hotmail, screwed my awesome formatting ;) Hope it's ok now.
> > >>
> > >> Hmmmm...not really
> > >>
> > >> >
> > >> > here comes the addBinaryValue method body:...//standard way of 
> > >> > indexingString jcrData = mappings.getPrefix(Name.NS_JCR_URI) + 
> > >> > ":data";if (jcrData.equals(fieldName)) { InternalValue type = 
> > >> > getValue(NameConstants.JCR_MIMETYPE); if (type != null) { Metadata 
> > >> > metadata = new Metadata(); metadata.set(Metadata.CONTENT_TYPE, 
> > >> > type.getString()); // jcr:encoding is not mandatory InternalValue 
> > >> > encoding = getValue(NameConstants.JCR_ENCODING); if (encoding != null) 
> > >> > { metadata.set(Metadata.CONTENT_ENCODING, encoding.getString()); } 
> > >> > doc.add(createFulltextField(internalValue, metadata)); }} else { 
> > >> > //everything else gets indexed as well MimeTypes gk = new MimeTypes(); 
> > >> > MimeType mimeType = gk.getMimeType(internalValue.getStream()); 
> > >> > Metadata metadata = new Metadata(); 
> > >> > metadata.set(Metadata.CONTENT_TYPE, mimeType.getName()); 
> > >> > doc.add(createFulltextField(internalValue, metadata));}...
> > >> >
> > >> > and here we have my custom parser (and I can see it's being started 
> > >> > everytime the binary value with my custom mime type is 
> > >> > added):XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, 
> > >> > metadata);xhtml.startDocument();...fetch keywords...for(String value: 
> > >> > keywords) { xhtml.characters(value); xhtml.characters(" 
> > >> > ");}xhtml.endDocument();...
> > >> >> Date: Mon, 30 Aug 2010 09:31:47 +0200
> > >> >> Subject: Re: Searching for binary values
> > >> >> From: [email protected]
> > >> >> To: [email protected]
> > >> >>
> > >> >> Slavek,
> > >> >>
> > >> >> I am no computer :-) Is there a way you could format this a little,
> > >> >> into something a human can read?
> > >> >>
> > >> >>
> > >> >> 2010/8/30 Slavek Tecl :
> > >> >>>
> > >> >>> All right, here comes the addBinaryValue method body: ... //standard 
> > >> >>> way of indexing String jcrData = mappings.getPrefix(Name.NS_JCR_URI) 
> > >> >>> + ":data"; if (jcrData.equals(fieldName)) { InternalValue type = 
> > >> >>> getValue(NameConstants.JCR_MIMETYPE); if (type != null) { Metadata 
> > >> >>> metadata = new Metadata(); metadata.set(Metadata.CONTENT_TYPE, 
> > >> >>> type.getString());
> > >> >>> // jcr:encoding is not mandatory InternalValue encoding = 
> > >> >>> getValue(NameConstants.JCR_ENCODING); if (encoding != null) { 
> > >> >>> metadata.set(Metadata.CONTENT_ENCODING, encoding.getString()); }
> > >> >>> doc.add(createFulltextField(internalValue, metadata)); } } else { 
> > >> >>> //everything else gets indexed as well MimeTypes gk = new 
> > >> >>> MimeTypes(); MimeType mimeType = 
> > >> >>> gk.getMimeType(internalValue.getStream());
> > >> >>> Metadata metadata = new Metadata(); 
> > >> >>> metadata.set(Metadata.CONTENT_TYPE, mimeType.getName()); 
> > >> >>> doc.add(createFulltextField(internalValue, metadata)); } ...
> > >> >>> my custom parser leverages XMLContentHandler like this (and I can 
> > >> >>> see it's being started everytime the binary value with my custom 
> > >> >>> mime type is added):
> > >> >>> ...XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, 
> > >> >>> metadata);xhtml.startDocument();... for(String value: keywords) { 
> > >> >>> xhtml.characters(value); xhtml.characters(" "); //xhtml.element("p", 
> > >> >>> value); }xhtml.endDocument();...
> > >> >>>> Date: Mon, 30 Aug 2010 09:12:16 +0200
> > >> >>>> Subject: Re: Searching for binary values
> > >> >>>> From: [email protected]
> > >> >>>> To: [email protected]
> > >> >>>>
> > >> >>>> 2010/8/27 Slavek Tecl :
> > >> >>>>> In my case the addBinaryValue has been overridden in my custom 
> > >> >>>>> class so I'm adding this field to the document as well.
> > >> >>>>
> > >> >>>> Is it possible that you made some error in this? I can't judge it 
> > >> >>>> without code
> > >> >>>>
> > >> >>>> Regards Ard
> > >> >>>>
> > >> >>>>>
> > >> >>>>>> Date: Fri, 27 Aug 2010 17:16:56 +0200
> > >> >>>>>> Subject: Re: Searching for binary values
> > >> >>>>>> From: [email protected]
> > >> >>>>>> To: [email protected]
> > >> >>>>>>
> > >> >>>>>> 2010/8/27 Slavek Tecl :
> > >> >>>>>>>
> > >> >>>>>>> I'm looking for a clarification how the query is processed in my 
> > >> >>>>>>> customized jackrabbit instance. In my case the NodeIndexer is 
> > >> >>>>>>> subclassed so it can add the binary value to the indexed 
> > >> >>>>>>> Document even if it does not have nt:resource type. Then Tika 
> > >> >>>>>>> has been customized with my mimetype so the parser is able to 
> > >> >>>>>>> recognize the binary stream through its magic, and of course a 
> > >> >>>>>>> Tika Parser object was implemented to support the custom 
> > >> >>>>>>> binary stream to extract words from it. If I run a query on 
> > >> >>>>>>> nt:resource nodes it correctly returns files including the 
> > >> >>>>>>> searched word as expected but when I invoke a similar query on a 
> > >> >>>>>>> binary property (and the content of this binary property is 
> > >> >>>>>>> exactly the type of the stream Tika can parse) it does not 
> > >> >>>>>>> return anything - is there a way out?
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> Binary properties are only indexed on nodescope level, not on 
> > >> >>>>>> property level.
> > >> >>>>>>
> > >> >>>>>> See protected void addBinaryValue(Document doc,
> > >> >>>>>> String fieldName,
> > >> >>>>>> InternalValue internalValue) {
> > >> >>>>>>
> > >> >>>>>> and then specifically doc.add(createFulltextField(internalValue, 
> > >> >>>>>> metadata));
> > >> >>>>>>
> > >> >>>>>> in jr NodeIndexer
> > >> >>>>>>
> > >> >>>>>> Regards Ard
> > >> >>>>>
> > >> >>>
> > >> >
> > >
> > >
> 
                                          
