2010/8/30 Slavek Tecl <[email protected]>:
>
> Bloody hotmail screwed up my awesome formatting ;) Hope it's OK now.
Hmmmm...not really
>
> here comes the addBinaryValue method body:
>
> ...
> //standard way of indexing
> String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
> if (jcrData.equals(fieldName)) {
>     InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
>     if (type != null) {
>         Metadata metadata = new Metadata();
>         metadata.set(Metadata.CONTENT_TYPE, type.getString());
>         // jcr:encoding is not mandatory
>         InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
>         if (encoding != null) {
>             metadata.set(Metadata.CONTENT_ENCODING, encoding.getString());
>         }
>         doc.add(createFulltextField(internalValue, metadata));
>     }
> } else {
>     //everything else gets indexed as well
>     MimeTypes gk = new MimeTypes();
>     MimeType mimeType = gk.getMimeType(internalValue.getStream());
>     Metadata metadata = new Metadata();
>     metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
>     doc.add(createFulltextField(internalValue, metadata));
> }
> ...
>
> and here we have my custom parser (and I can see it's being started every
> time a binary value with my custom mime type is added):
>
> XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
> xhtml.startDocument();
> ...fetch keywords...
> for (String value : keywords) {
>     xhtml.characters(value);
>     xhtml.characters(" ");
> }
> xhtml.endDocument();
> ...
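For reference, a minimal stand-alone sketch of the keyword-emitting pattern quoted above, using only the JDK's SAX classes in place of Tika's XHTMLContentHandler. The class name KeywordEmitter, its method names and the sample keywords are hypothetical. It only shows what text a downstream handler would receive from such a loop, not how Tika or Jackrabbit wire the parser in.

```java
import java.util.Arrays;
import java.util.List;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the custom parser: plain SAX instead of Tika.
public class KeywordEmitter {

    // Writes every keyword to the handler, followed by a single space,
    // mirroring the characters(value) / characters(" ") loop in the mail.
    public static void emit(List<String> keywords, ContentHandler handler)
            throws SAXException {
        handler.startDocument();
        for (String value : keywords) {
            char[] chars = (value + " ").toCharArray();
            handler.characters(chars, 0, chars.length);
        }
        handler.endDocument();
    }

    // Collects the character events so the extracted text can be inspected.
    public static String extract(List<String> keywords) {
        final StringBuilder text = new StringBuilder();
        try {
            emit(keywords, new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        } catch (SAXException e) {
            throw new IllegalStateException("SAX handler failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Brackets make the trailing space visible.
        System.out.println("[" + extract(Arrays.asList("alpha", "beta")) + "]");
    }
}
```

If the text collected this way looks right, the parser side is probably fine and the remaining question is which index field the extracted text ends up in.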
>> Date: Mon, 30 Aug 2010 09:31:47 +0200
>> Subject: Re: Searching for binary values
>> From: [email protected]
>> To: [email protected]
>>
>> Slavek,
>>
>> I am no computer :-) Is there a way you can format this into a slightly
>> more human-readable kind of thing?
>>
>>
>> 2010/8/30 Slavek Tecl <[email protected]>:
>>>
>>> All right, here comes the addBinaryValue method body:
>>>
>>> ...
>>> //standard way of indexing
>>> String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
>>> if (jcrData.equals(fieldName)) {
>>>     InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
>>>     if (type != null) {
>>>         Metadata metadata = new Metadata();
>>>         metadata.set(Metadata.CONTENT_TYPE, type.getString());
>>>         // jcr:encoding is not mandatory
>>>         InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
>>>         if (encoding != null) {
>>>             metadata.set(Metadata.CONTENT_ENCODING, encoding.getString());
>>>         }
>>>         doc.add(createFulltextField(internalValue, metadata));
>>>     }
>>> } else {
>>>     //everything else gets indexed as well
>>>     MimeTypes gk = new MimeTypes();
>>>     MimeType mimeType = gk.getMimeType(internalValue.getStream());
>>>     Metadata metadata = new Metadata();
>>>     metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
>>>     doc.add(createFulltextField(internalValue, metadata));
>>> }
>>> ...
>>> my custom parser leverages XHTMLContentHandler like this (and I can see
>>> it's being started every time a binary value with my custom mime type is
>>> added):
>>>
>>> ...
>>> XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
>>> xhtml.startDocument();
>>> ...
>>> for (String value : keywords) {
>>>     xhtml.characters(value);
>>>     xhtml.characters(" ");
>>>     //xhtml.element("p", value);
>>> }
>>> xhtml.endDocument();
>>> ...
>>>> Date: Mon, 30 Aug 2010 09:12:16 +0200
>>>> Subject: Re: Searching for binary values
>>>> From: [email protected]
>>>> To: [email protected]
>>>>
>>>> 2010/8/27 Slavek Tecl <[email protected]>:
>>>>> In my case addBinaryValue has been overridden in my custom class, so
>>>>> I'm adding this field to the document as well.
>>>>
>>>> Is it possible that you made some error in this? I can't judge it without
>>>> code
>>>>
>>>> Regards Ard
>>>>
>>>>>
>>>>>> Date: Fri, 27 Aug 2010 17:16:56 +0200
>>>>>> Subject: Re: Searching for binary values
>>>>>> From: [email protected]
>>>>>> To: [email protected]
>>>>>>
>>>>>> 2010/8/27 Slavek Tecl <[email protected]>:
>>>>>>>
>>>>>>> I'm looking for a clarification of how the query is processed in my
>>>>>>> customized Jackrabbit instance. In my case the NodeIndexer is
>>>>>>> subclassed so it can add the binary value to the indexed Document even
>>>>>>> if it does not have the nt:resource type. Tika has then been customized
>>>>>>> with my mime type so that it can recognize the binary stream
>>>>>>> through its magic, and of course a Tika Parser was
>>>>>>> implemented to support the custom binary stream and extract words from
>>>>>>> it. If I run a query on nt:resource nodes it correctly returns files
>>>>>>> containing the searched word, as expected, but when I invoke a similar
>>>>>>> query on a binary property (and the content of this binary property is
>>>>>>> exactly the type of stream Tika can parse) it does not return
>>>>>>> anything. Is there a way out?
>>>>>>
>>>>>>
>>>>>> Binary properties are only indexed on nodescope level, not on property
>>>>>> level.
>>>>>>
>>>>>> See protected void addBinaryValue(Document doc,
>>>>>> String fieldName,
>>>>>> InternalValue internalValue) {
>>>>>>
>>>>>> and then specifically doc.add(createFulltextField(internalValue,
>>>>>> metadata));
>>>>>>
>>>>>> in jr NodeIndexer
>>>>>>
>>>>>> Regards Ard
>>>>>
>>>
>
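
To make the node scope versus property scope point from earlier in the thread concrete, here is a small sketch that only builds the two JCR XPath query strings. The QueryScopes class and its method names are hypothetical and not part of Jackrabbit, and the quote-doubling is a simplifying assumption about escaping the search term. If binary values are indexed only at node scope, the first form is the one that can match the extracted text, while the second form targets a property-level field that is not populated for binaries.

```java
// Hypothetical helper, not part of Jackrabbit: it only builds query strings.
public class QueryScopes {

    // Doubles single quotes so the term stays inside the XPath string literal.
    private static String quote(String term) {
        return "'" + term.replace("'", "''") + "'";
    }

    // Node-scope full-text query: matches text indexed for the node as a whole.
    public static String nodeScope(String nodeType, String term) {
        return "//element(*, " + nodeType + ")[jcr:contains(., "
                + quote(term) + ")]";
    }

    // Property-scope full-text query: matches only text indexed for that property.
    public static String propertyScope(String nodeType, String property, String term) {
        return "//element(*, " + nodeType + ")[jcr:contains(@" + property + ", "
                + quote(term) + ")]";
    }

    public static void main(String[] args) {
        System.out.println(nodeScope("nt:base", "keyword"));
        System.out.println(propertyScope("nt:base", "jcr:data", "keyword"));
    }
}
```

Running both strings through the query manager against the same node should show quickly whether the difference in results comes from the query scope or from the indexing code.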