Sweet, it works, so the only problem was in the way I was adding the property to the Lucene Document.
Many thanks for your help!

Cheers,
Slavek

> From: [email protected]
> To: [email protected]
> Subject: RE: Searching for binary values
> Date: Mon, 30 Aug 2010 11:20:22 +0200
>
> Hi Ard,
> now I know what the problem is. I just thought that the call to
> doc.add(createFulltextField(...)) would be sufficient.
> The query I've been using is more complicated, but for testing I used a
> really simple one:
> SELECT child.* FROM [customns:customtype] WHERE
> CONTAINS(child.binaryproperty, 'value').
> For the future, I'm sure the query will contain non-binary properties as
> well, so a query like
> SELECT child.* FROM [customns:customtype] WHERE
> CONTAINS(child.binaryproperty, 'value') OR CONTAINS(child.stringproperty,
> 'foo')
> would be used too.
> Anyway, thanks for pointing me in the right direction, I'll try to
> implement the stuff and see if it's working correctly.
> Best regards,
> Slavek
>
> > Date: Mon, 30 Aug 2010 10:28:39 +0200
> > Subject: Re: Searching for binary values
> > From: [email protected]
> > To: [email protected]
> >
> > Hello,
> >
> > 2010/8/30 Slavek Tecl <[email protected]>:
> > >
> > > once again in HTML...
> > >
> > > here comes the addBinaryValue method body:...
> > >
> > > //standard way of indexing
> > > String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
> > > if (jcrData.equals(fieldName)) {
> > >     InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
> > >     if (type != null) {
> > >         Metadata metadata = new Metadata();
> > >         metadata.set(Metadata.CONTENT_TYPE, type.getString());
> > >         // jcr:encoding is not mandatory
> > >         InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
> > >         if (encoding != null) {
> > >             metadata.set(Metadata.CONTENT_ENCODING, encoding.getString());
> > >         }
> > >         doc.add(createFulltextField(internalValue, metadata));
> > >     }
> > > } else {
> > >     //everything else gets indexed as well
> > >     MimeTypes gk = new MimeTypes();
> > >     MimeType mimeType = gk.getMimeType(internalValue.getStream());
> > >
> > >     Metadata metadata = new Metadata();
> > >     metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
> > >     doc.add(createFulltextField(internalValue, metadata));
> > > }
> >
> > ok, but as I said in one of my earlier mails, binaries are not indexed
> > on property level, only on nodescope level. Your code doesn't index on
> > property level either, see createFulltextField. I do hope I understood
> > your first mail correctly: you want to search specifically in the
> > binary property *only*, right? You could post me the xpath that
> > you want to be executed.
> >
> > Anyway,
> >
> > you should also add the indexed binary as a non stored (I recommend
> > non stored) property, thus something like what the method
> >
> > protected void addStringValue(Document doc, String fieldName,
> >         Object internalValue, boolean tokenized,
> >         boolean includeInNodeIndex, float boost,
> >         boolean useInExcerpt) {
> >
> > does. However, you must realize that binaries get indexed in
> > background lazily by default.
> > I'd recommend not calling
> >
> > doc.add(createFulltextField(internalValue, metadata));
> >
> > but calling
> >
> > doc.add(createFulltextField(fieldName, internalValue, metadata));
> >
> > Add this new createFulltextField method, and create your own
> > LazyTextExtractorField class that also takes an arg for fieldName.
> >
> > Then you also need to add the extracted, analysed text as a property.
> >
> > Regards Ard
> >
> > >
> > > and here we have my custom parser (and I can see it's being started
> > > every time the binary value with my custom mime type is added):
> > >
> > > XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
> > > xhtml.startDocument();
> > > ...fetch keywords...
> > > for(String value: keywords) {
> > >     xhtml.characters(value);
> > >     xhtml.characters(" ");
> > > }
> > > xhtml.endDocument();
> > > ...
> > >
> > > ----------------------------------------
> > >> Date: Mon, 30 Aug 2010 09:52:28 +0200
> > >> Subject: Re: Searching for binary values
> > >> From: [email protected]
> > >> To: [email protected]
> > >>
> > >> 2010/8/30 Slavek Tecl :
> > >> >
> > >> > Bloody hotmail, screwed my awesome formatting ;) Hope it's ok now.
> > >>
> > >> Hmmmm...not really
> > >>
> > >> >
> > >> > here comes the addBinaryValue method body:...//standard way of
> > >> > indexingString jcrData = mappings.getPrefix(Name.NS_JCR_URI) +
> > >> > ":data";if (jcrData.equals(fieldName)) { InternalValue type =
> > >> > getValue(NameConstants.JCR_MIMETYPE); if (type != null) { Metadata
> > >> > metadata = new Metadata(); metadata.set(Metadata.CONTENT_TYPE,
> > >> > type.getString()); // jcr:encoding is not mandatory InternalValue
> > >> > encoding = getValue(NameConstants.JCR_ENCODING); if (encoding != null)
> > >> > { metadata.set(Metadata.CONTENT_ENCODING, encoding.getString()); }
> > >> > doc.add(createFulltextField(internalValue, metadata)); }} else {
> > >> > //everything else gets indexed as well MimeTypes gk = new MimeTypes();
> > >> > MimeType mimeType = gk.getMimeType(internalValue.getStream());
> > >> > Metadata metadata = new Metadata();
> > >> > metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
> > >> > doc.add(createFulltextField(internalValue, metadata));}...
> > >> >
> > >> > and here we have my custom parser (and I can see it's being started
> > >> > everytime the binary value with my custom mime type is
> > >> > added):XHTMLContentHandler xhtml = new XHTMLContentHandler(handler,
> > >> > metadata);xhtml.startDocument();...fetch keywords...for(String value:
> > >> > keywords) { xhtml.characters(value); xhtml.characters("
> > >> > ");}xhtml.endDocument();...
> > >> >> Date: Mon, 30 Aug 2010 09:31:47 +0200
> > >> >> Subject: Re: Searching for binary values
> > >> >> From: [email protected]
> > >> >> To: [email protected]
> > >> >>
> > >> >> Slavek,
> > >> >>
> > >> >> I am no computer :-) Is there a way you could format this a little,
> > >> >> into a human-understandable kind of thing?
> > >> >>
> > >> >> 2010/8/30 Slavek Tecl :
> > >> >>>
> > >> >>> All right, here comes the addBinaryValue method body: ...
> > >> >>> //standard
> > >> >>> way of indexing String jcrData = mappings.getPrefix(Name.NS_JCR_URI)
> > >> >>> + ":data"; if (jcrData.equals(fieldName)) { InternalValue type =
> > >> >>> getValue(NameConstants.JCR_MIMETYPE); if (type != null) { Metadata
> > >> >>> metadata = new Metadata(); metadata.set(Metadata.CONTENT_TYPE,
> > >> >>> type.getString());
> > >> >>> // jcr:encoding is not mandatory InternalValue encoding =
> > >> >>> getValue(NameConstants.JCR_ENCODING); if (encoding != null) {
> > >> >>> metadata.set(Metadata.CONTENT_ENCODING, encoding.getString()); }
> > >> >>> doc.add(createFulltextField(internalValue, metadata)); } } else {
> > >> >>> //everything else gets indexed as well MimeTypes gk = new
> > >> >>> MimeTypes(); MimeType mimeType =
> > >> >>> gk.getMimeType(internalValue.getStream());
> > >> >>> Metadata metadata = new Metadata();
> > >> >>> metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
> > >> >>> doc.add(createFulltextField(internalValue, metadata)); } ...
> > >> >>> my custom parser leverages XHTMLContentHandler like this (and I can
> > >> >>> see it's being started every time the binary value with my custom
> > >> >>> mime type is added):
> > >> >>> ...XHTMLContentHandler xhtml = new XHTMLContentHandler(handler,
> > >> >>> metadata);xhtml.startDocument();... for(String value: keywords) {
> > >> >>> xhtml.characters(value); xhtml.characters(" "); //xhtml.element("p",
> > >> >>> value); }xhtml.endDocument();...
> > >> >>>> Date: Mon, 30 Aug 2010 09:12:16 +0200
> > >> >>>> Subject: Re: Searching for binary values
> > >> >>>> From: [email protected]
> > >> >>>> To: [email protected]
> > >> >>>>
> > >> >>>> 2010/8/27 Slavek Tecl :
> > >> >>>>> In my case the addBinaryValue has been overridden in my custom
> > >> >>>>> class so I'm adding this field to the document as well.
> > >> >>>>
> > >> >>>> Is it possible that you made some error in this?
> > >> >>>> I can't judge it without code.
> > >> >>>>
> > >> >>>> Regards Ard
> > >> >>>>
> > >> >>>>>
> > >> >>>>>> Date: Fri, 27 Aug 2010 17:16:56 +0200
> > >> >>>>>> Subject: Re: Searching for binary values
> > >> >>>>>> From: [email protected]
> > >> >>>>>> To: [email protected]
> > >> >>>>>>
> > >> >>>>>> 2010/8/27 Slavek Tecl :
> > >> >>>>>>>
> > >> >>>>>>> I'm looking for a clarification of how the query is processed in
> > >> >>>>>>> my customized jackrabbit instance. In my case the NodeIndexer is
> > >> >>>>>>> subclassed so it can add the binary value to the indexed
> > >> >>>>>>> Document even if it does not have nt:resource type. Then Tika
> > >> >>>>>>> has been customized with my mimetype so the parser is able to
> > >> >>>>>>> recognize the binary stream through its magic, and of course
> > >> >>>>>>> Tika's Parser object was implemented to support the custom
> > >> >>>>>>> binary stream and extract words from it. If I run a query on
> > >> >>>>>>> nt:resource nodes it correctly returns files including the
> > >> >>>>>>> searched word as expected, but when I invoke a similar query on
> > >> >>>>>>> a binary property (and the content of this binary property is
> > >> >>>>>>> exactly the type of stream Tika can parse) it does not
> > >> >>>>>>> return anything - is there a way out?
> > >> >>>>>>
> > >> >>>>>> Binary properties are only indexed on nodescope level, not on
> > >> >>>>>> property level.
> > >> >>>>>>
> > >> >>>>>> See protected void addBinaryValue(Document doc,
> > >> >>>>>>         String fieldName,
> > >> >>>>>>         InternalValue internalValue) {
> > >> >>>>>>
> > >> >>>>>> and then specifically doc.add(createFulltextField(internalValue,
> > >> >>>>>> metadata));
> > >> >>>>>>
> > >> >>>>>> in jr NodeIndexer
> > >> >>>>>>
> > >> >>>>>> Regards Ard
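
The distinction this thread turns on, text indexed only at node scope versus text also indexed under the property's own field, can be shown with a toy model. The sketch below is not Jackrabbit or Lucene code: the class `PropertyScopeSketch`, the field name `node-fulltext`, and both methods are hypothetical stand-ins, and a plain map from field name to terms plays the role of the Lucene Document. It only illustrates why a property-scoped `CONTAINS` finds nothing until the extracted text is added under the property's field name as well.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Toy stand-in for one indexed node: field name -> indexed terms.
 * Not Lucene or Jackrabbit code; every name here is hypothetical.
 */
public class PropertyScopeSketch {

    /** Hypothetical name for the shared node-scope fulltext field. */
    public static final String NODE_SCOPE_FIELD = "node-fulltext";

    private final Map<String, Set<String>> fields = new HashMap<>();

    /** Splits the extracted text on whitespace and indexes it under fieldName. */
    public void add(String fieldName, String extractedText) {
        Set<String> terms = fields.computeIfAbsent(fieldName, k -> new HashSet<>());
        for (String token : extractedText.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
    }

    /** True only if the term was indexed under exactly this field. */
    public boolean contains(String fieldName, String term) {
        Set<String> terms = fields.get(fieldName);
        return terms != null && terms.contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String extracted = "alpha value beta"; // pretend parser output

        // Node scope only, which is how the thread describes the
        // two-argument createFulltextField call.
        PropertyScopeSketch doc = new PropertyScopeSketch();
        doc.add(NODE_SCOPE_FIELD, extracted);
        System.out.println("node scope:     " + doc.contains(NODE_SCOPE_FIELD, "value"));
        System.out.println("property scope: " + doc.contains("binaryproperty", "value"));

        // Also index the text under the property's own field, as Ard suggests.
        doc.add("binaryproperty", extracted);
        System.out.println("property scope: " + doc.contains("binaryproperty", "value"));
    }
}
```

In the real indexer the second `add` corresponds to passing `fieldName` through to the fulltext field, as suggested above; the lazy background extraction mentioned in the thread is deliberately left out of this sketch.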
