2010/8/30 Slavek Tecl:
>
> Sweet, it works, so the only problem was in my way of adding a property to
> the Lucene Document.
>
> Many thanks for your help!
> Cheers,
> Slavek
You're welcome, thx for the feedback.

Regards Ard

>
>> Subject: RE: Searching for binary values
>> Date: Mon, 30 Aug 2010 11:20:22 +0200
>>
>> Hi Ard,
>> now I know what the problem is: I just thought that the call to
>> doc.add(createFulltextField(...)) would be sufficient.
>> The query I've been using is more complicated, but for testing I used a
>> really simple one:
>> SELECT child.* FROM [customns:customtype] AS child WHERE
>> CONTAINS(child.binaryproperty, 'value')
>> In the future I'm sure the query will contain non-binary properties as
>> well, so a query like
>> SELECT child.* FROM [customns:customtype] AS child WHERE
>> CONTAINS(child.binaryproperty, 'value') OR CONTAINS(child.stringproperty,
>> 'foo')
>> would be used too.
>> Anyway, thanks for pointing me in the right direction. I'll try to
>> implement it and see if it works correctly.
>> Best regards,
>> Slavek
>>
>> > Date: Mon, 30 Aug 2010 10:28:39 +0200
>> > Subject: Re: Searching for binary values
>> >
>> > Hello,
>> >
>> > 2010/8/30 Slavek Tecl:
>> > >
>> > > once again in HTML...
>> > >
>> > > here comes the addBinaryValue method body:...
>> > >
>> > > //standard way of indexing
>> > > String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
>> > > if (jcrData.equals(fieldName)) {
>> > >     InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
>> > >     if (type != null) {
>> > >         Metadata metadata = new Metadata();
>> > >         metadata.set(Metadata.CONTENT_TYPE, type.getString());
>> > >         // jcr:encoding is not mandatory
>> > >         InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
>> > >         if (encoding != null) {
>> > >             metadata.set(Metadata.CONTENT_ENCODING, encoding.getString());
>> > >         }
>> > >         doc.add(createFulltextField(internalValue, metadata));
>> > >     }
>> > > } else {
>> > >     //everything else gets indexed as well
>> > >     MimeTypes gk = new MimeTypes();
>> > >     MimeType mimeType = gk.getMimeType(internalValue.getStream());
>> > >
>> > >     Metadata metadata = new Metadata();
>> > >     metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
>> > >     doc.add(createFulltextField(internalValue, metadata));
>> > > }
>> >
>> > ok, but as I said in one of my earlier mails, binaries are not indexed
>> > on the property level, only on the nodescope level. Your code doesn't
>> > index on the property level either, see createFulltextField. I do hope
>> > I understood your first mail correctly: you want to search specifically
>> > in the binary property *only*, right? You could send me the XPath query
>> > you want to execute.
>> >
>> > Anyway,
>> >
>> > you should also add the indexed binary as a non-stored (I recommend
>> > non-stored) property, i.e. something like what the method
>> >
>> > protected void addStringValue(Document doc, String fieldName,
>> >         Object internalValue, boolean tokenized,
>> >         boolean includeInNodeIndex, float boost,
>> >         boolean useInExcerpt) {
>> >
>> > does. However, you must realize that binaries get indexed in the
>> > background, lazily, by default.
>> > I'd recommend not calling
>> >
>> > doc.add(createFulltextField(internalValue, metadata));
>> >
>> > but calling
>> >
>> > doc.add(createFulltextField(fieldName, internalValue, metadata));
>> >
>> > Add this new createFulltextField method, and create your own
>> > LazyTextExtractorField class that also takes a fieldName argument.
>> >
>> > Then you also need to add the extracted, analysed text as a property.
>> >
>> > Regards Ard
>> >
>> > >
>> > > and here is my custom parser (I can see it being started every
>> > > time a binary value with my custom MIME type is added):
>> > >
>> > > XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
>> > > xhtml.startDocument();
>> > > ...fetch keywords...
>> > > for (String value : keywords) {
>> > >     xhtml.characters(value);
>> > >     xhtml.characters(" ");
>> > > }
>> > > xhtml.endDocument();
>> > > ...
>> > >
>> > > ----------------------------------------
>> > >> Date: Mon, 30 Aug 2010 09:52:28 +0200
>> > >> Subject: Re: Searching for binary values
>> > >>
>> > >> 2010/8/30 Slavek Tecl:
>> > >> >
>> > >> > Bloody hotmail, screwed my awesome formatting ;) Hope it's ok now.
>> > >>
>> > >> Hmmmm... not really
>> > >>
>> > >> >
>> > >> > here comes the addBinaryValue method body: [snip: same code as
>> > >> > quoted above, with all line breaks lost]
>> > >> >> Date: Mon, 30 Aug 2010 09:31:47 +0200
>> > >> >> Subject: Re: Searching for binary values
>> > >> >>
>> > >> >> Slavek,
>> > >> >>
>> > >> >> I am no computer :-) Is there a way you could format this into
>> > >> >> something a little more human-understandable?
>> > >> >>
>> > >> >> 2010/8/30 Slavek Tecl:
>> > >> >>>
>> > >> >>> All right, here comes the addBinaryValue method body: ...
>> > >> >>> [snip: same addBinaryValue body as quoted above]
>> > >> >>> My custom parser leverages XHTMLContentHandler like this (and I can
>> > >> >>> see it being started every time a binary value with my custom
>> > >> >>> MIME type is added):
>> > >> >>> [snip: same parser loop as quoted above]
>> > >> >>>> Date: Mon, 30 Aug 2010 09:12:16 +0200
>> > >> >>>> Subject: Re: Searching for binary values
>> > >> >>>>
>> > >> >>>> 2010/8/27 Slavek Tecl:
>> > >> >>>>> In my case addBinaryValue has been overridden in my custom
>> > >> >>>>> class, so I'm adding this field to the document as well.
>> > >> >>>>
>> > >> >>>> Is it possible that you made some error in this?
>> > >> >>>> I can't judge it without code.
>> > >> >>>>
>> > >> >>>> Regards Ard
>> > >> >>>>
>> > >> >>>>>> Date: Fri, 27 Aug 2010 17:16:56 +0200
>> > >> >>>>>> Subject: Re: Searching for binary values
>> > >> >>>>>>
>> > >> >>>>>> 2010/8/27 Slavek Tecl:
>> > >> >>>>>>>
>> > >> >>>>>>> I'm looking for a clarification of how queries are processed
>> > >> >>>>>>> in my customized Jackrabbit instance. In my case the
>> > >> >>>>>>> NodeIndexer is subclassed so it can add the binary value to the
>> > >> >>>>>>> indexed Document even if the node does not have the nt:resource
>> > >> >>>>>>> type. Tika has also been customized with my MIME type, so the
>> > >> >>>>>>> parser is able to recognize the binary stream through its
>> > >> >>>>>>> magic, and of course a Tika Parser was implemented to support
>> > >> >>>>>>> the custom binary stream and extract words from it.
>> > >> >>>>>>> If I run a query on nt:resource nodes, it correctly returns
>> > >> >>>>>>> the files containing the searched word, as expected. But when I
>> > >> >>>>>>> run a similar query on a binary property (whose content is
>> > >> >>>>>>> exactly the type of stream Tika can parse), it does not return
>> > >> >>>>>>> anything. Is there a way out?
>> > >> >>>>>>
>> > >> >>>>>> Binary properties are only indexed on the nodescope level, not
>> > >> >>>>>> on the property level.
>> > >> >>>>>>
>> > >> >>>>>> See
>> > >> >>>>>>
>> > >> >>>>>> protected void addBinaryValue(Document doc,
>> > >> >>>>>>         String fieldName,
>> > >> >>>>>>         InternalValue internalValue) {
>> > >> >>>>>>
>> > >> >>>>>> and then specifically
>> > >> >>>>>>
>> > >> >>>>>> doc.add(createFulltextField(internalValue, metadata));
>> > >> >>>>>>
>> > >> >>>>>> in the Jackrabbit NodeIndexer.
>> > >> >>>>>>
>> > >> >>>>>> Regards Ard
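A toy model of the nodescope vs. property-level point Ard makes in this thread, in plain Java with no Jackrabbit or Lucene dependency. The class name and field names (ToyIndexDocument, "_:FULLTEXT", the "FULL:" prefix) are assumptions, only loosely modeled on how Jackrabbit names its fulltext fields, and the tokenizer is a naive whitespace split standing in for an analyzer. It illustrates why CONTAINS(child.binaryproperty, 'value') finds nothing while the extracted text only reaches the node-scope field, and why also indexing it under a per-property field fixes that.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: a "document" maps a field name to its indexed tokens.
// Field names are hypothetical, not Jackrabbit's real constants.
final class ToyIndexDocument {
    // Hypothetical aggregated field, searched by CONTAINS(child.*, ...).
    static final String NODE_SCOPE_FIELD = "_:FULLTEXT";

    private final Map<String, List<String>> fields = new HashMap<>();

    // Hypothetical per-property field, searched by CONTAINS(child.prop, ...).
    static String propertyField(String propertyName) {
        return "FULL:" + propertyName;
    }

    // Naive lower-casing whitespace tokenizer standing in for a Lucene analyzer.
    private void addTokens(String field, String text) {
        List<String> tokens = fields.computeIfAbsent(field, k -> new ArrayList<>());
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
    }

    // The situation described in the thread: extracted binary text only
    // reaches the node-scope field.
    void addBinaryNodeScopeOnly(String extractedText) {
        addTokens(NODE_SCOPE_FIELD, extractedText);
    }

    // Ard's suggestion: additionally index the text under a field named
    // after the property itself.
    void addBinaryWithPropertyScope(String propertyName, String extractedText) {
        addTokens(NODE_SCOPE_FIELD, extractedText);
        addTokens(propertyField(propertyName), extractedText);
    }

    // propertyName == null means node scope, i.e. CONTAINS(child.*, term).
    boolean contains(String propertyName, String term) {
        String field = propertyName == null ? NODE_SCOPE_FIELD : propertyField(propertyName);
        List<String> tokens = fields.getOrDefault(field, List.of());
        return tokens.contains(term.toLowerCase(Locale.ROOT));
    }
}
```

After addBinaryNodeScopeOnly("alpha value beta"), contains(null, "value") is true but contains("binaryproperty", "value") is false; after addBinaryWithPropertyScope("binaryproperty", ...) both are true. The model ignores the lazy, background extraction Ard mentions, which a real per-property field would also have to cope with.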
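Why the xhtml.characters(" ") call in Slavek's parser loop matters, sketched with the JDK's own SAX DefaultHandler instead of Tika; TextCollector and KeywordEmitter are hypothetical names. The assumption is that the downstream text handler simply concatenates character events, so without an explicit separator adjacent keywords would end up as a single token.

```java
import java.util.List;
import org.xml.sax.helpers.DefaultHandler;

// Stdlib-only stand-in for a text-collecting SAX handler; it does not use Tika.
final class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() {
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Character events are appended verbatim; nothing inserts whitespace.
        text.append(ch, start, length);
    }

    @Override
    public void endDocument() {
        // Nothing to flush in this toy handler.
    }

    String text() {
        return text.toString();
    }
}

final class KeywordEmitter {
    // Hypothetical helper mirroring the loop in the thread: one characters()
    // event per keyword, optionally followed by a single-space event.
    static String emit(List<String> keywords, boolean withSeparator) {
        TextCollector handler = new TextCollector();
        handler.startDocument();
        for (String value : keywords) {
            handler.characters(value.toCharArray(), 0, value.length());
            if (withSeparator) {
                handler.characters(new char[] {' '}, 0, 1);
            }
        }
        handler.endDocument();
        return handler.text();
    }
}
```

With the separator, emit(List.of("alpha", "beta"), true) yields "alpha beta "; without it the result is "alphabeta", which a whitespace-based analyzer would index as one word.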
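The else branch of the quoted addBinaryValue body sniffs the MIME type from internalValue.getStream(), and, at least in the snippet as posted, that stream is never closed. Below is a stdlib-only sketch of magic-number sniffing that takes ownership of the stream it reads and always closes it. It does not use Tika; the magic bytes and the MIME type name are hypothetical placeholders, and it assumes each getStream() call hands back a fresh stream, so later text extraction is unaffected.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Stdlib-only sketch of magic-number detection; it does not use Tika.
final class MagicSniffer {
    // Hypothetical magic prefix and MIME type for the custom format.
    static final byte[] CUSTOM_MAGIC = "MYFMT".getBytes(StandardCharsets.US_ASCII);
    static final String CUSTOM_TYPE = "application/x-my-keywords";
    static final String FALLBACK_TYPE = "application/octet-stream";

    // Takes ownership of a stream opened only for detection and always
    // closes it, even when reading fails.
    static String detect(InputStream source) {
        try (InputStream in = source) {
            // readNBytes (Java 11+) returns fewer bytes if the stream is shorter.
            byte[] head = in.readNBytes(CUSTOM_MAGIC.length);
            return Arrays.equals(head, CUSTOM_MAGIC) ? CUSTOM_TYPE : FALLBACK_TYPE;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A stream starting with the placeholder bytes "MYFMT" is reported as the custom type and anything else falls back to application/octet-stream; in both cases the sniffing stream is closed before detect returns.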
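For the combined query Slavek expects to need, a small helper that builds the JCR-SQL2 string, declaring the selector explicitly with AS and quoting each search term as a string literal by doubling any embedded single quote. Sql2Queries and containsAny are hypothetical names; property names are assumed to be simple identifiers, and characters that are special to the full-text search grammar inside a term are passed through unescaped.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for building JCR-SQL2 full-text queries like the ones in the thread.
final class Sql2Queries {
    // A JCR-SQL2 string literal embeds a single quote by doubling it.
    static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // ORs together one CONTAINS(selector.property, 'term') per map entry, in
    // iteration order. Property names are assumed to be simple identifiers;
    // full-text syntax inside a term (double quotes, '-', OR) is not escaped.
    static String containsAny(String nodeType, String selector, Map<String, String> termByProperty) {
        if (termByProperty.isEmpty()) {
            throw new IllegalArgumentException("at least one property/term pair is required");
        }
        StringJoiner where = new StringJoiner(" OR ");
        for (Map.Entry<String, String> e : termByProperty.entrySet()) {
            where.add("CONTAINS(" + selector + "." + e.getKey() + ", " + literal(e.getValue()) + ")");
        }
        return "SELECT " + selector + ".* FROM [" + nodeType + "] AS " + selector + " WHERE " + where;
    }
}
```

With a LinkedHashMap mapping binaryproperty to value and stringproperty to foo, containsAny("customns:customtype", "child", terms) produces SELECT child.* FROM [customns:customtype] AS child WHERE CONTAINS(child.binaryproperty, 'value') OR CONTAINS(child.stringproperty, 'foo'). The CONTAINS on binaryproperty will only match once the binary text is also indexed at the property level, as discussed above.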
