Bloody Hotmail screwed up my awesome formatting ;) Hope it's OK now.

Here comes the addBinaryValue method body:

    ...
    // standard way of indexing
    String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
    if (jcrData.equals(fieldName)) {
        InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
        if (type != null) {
            Metadata metadata = new Metadata();
            metadata.set(Metadata.CONTENT_TYPE, type.getString());
            // jcr:encoding is not mandatory
            InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
            if (encoding != null) {
                metadata.set(Metadata.CONTENT_ENCODING, encoding.getString());
            }
            doc.add(createFulltextField(internalValue, metadata));
        }
    } else {
        // everything else gets indexed as well
        MimeTypes gk = new MimeTypes();
        MimeType mimeType = gk.getMimeType(internalValue.getStream());
        Metadata metadata = new Metadata();
        metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
        doc.add(createFulltextField(internalValue, metadata));
    }
    ...

And here is my custom parser (I can see it being started every time a
binary value with my custom mime type is added):

    XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    ... fetch keywords ...
    for (String value : keywords) {
        xhtml.characters(value);
        xhtml.characters(" ");
    }
    xhtml.endDocument();
    ...

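To make the keyword loop above easier to check in isolation, here is a
minimal sketch that uses only the plain SAX classes from the JDK. It is not
Tika's XHTMLContentHandler and not my real parser: TextCollector,
emitKeywords and extract are hypothetical names made up for illustration,
and the collector only stands in for whatever handler the indexer passes in.
It shows that each keyword followed by a space ends up as one run of text
between startDocument() and endDocument().

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import java.util.Arrays;
import java.util.List;

public class KeywordEmitSketch {

    /** Hypothetical stand-in for the receiving handler: gathers all character events. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Emits every keyword followed by a single space, as in the loop above. */
    static void emitKeywords(ContentHandler handler, List<String> keywords)
            throws SAXException {
        handler.startDocument();
        for (String value : keywords) {
            char[] chars = (value + " ").toCharArray();
            handler.characters(chars, 0, chars.length);
        }
        handler.endDocument();
    }

    /** Runs the emitter against a collector and returns the text it received. */
    static String extract(List<String> keywords) {
        TextCollector collector = new TextCollector();
        try {
            emitKeywords(collector, keywords);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return collector.text.toString();
    }

    public static void main(String[] args) {
        // Brackets make the trailing space visible in the output.
        System.out.println("[" + extract(Arrays.asList("alpha", "beta")) + "]");
    }
}
```

If the collected text contains the expected words, the extraction side is
probably fine and the place to look is how the field is indexed and queried.
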
> Date: Mon, 30 Aug 2010 09:31:47 +0200
> Subject: Re: Searching for binary values
> From: [email protected]
> To: [email protected]
>
> Slavek,
>
> I am no computer :-) Is there a way you could format this into something
> a little more human-readable?
>
>
> 2010/8/30 Slavek Tecl <[email protected]>:
>>
>> All right, here comes the addBinaryValue method body:
>>
>>     ...
>>     // standard way of indexing
>>     String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
>>     if (jcrData.equals(fieldName)) {
>>         InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
>>         if (type != null) {
>>             Metadata metadata = new Metadata();
>>             metadata.set(Metadata.CONTENT_TYPE, type.getString());
>>             // jcr:encoding is not mandatory
>>             InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
>>             if (encoding != null) {
>>                 metadata.set(Metadata.CONTENT_ENCODING, encoding.getString());
>>             }
>>             doc.add(createFulltextField(internalValue, metadata));
>>         }
>>     } else {
>>         // everything else gets indexed as well
>>         MimeTypes gk = new MimeTypes();
>>         MimeType mimeType = gk.getMimeType(internalValue.getStream());
>>         Metadata metadata = new Metadata();
>>         metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
>>         doc.add(createFulltextField(internalValue, metadata));
>>     }
>>     ...
>>
>> My custom parser leverages XHTMLContentHandler like this (and I can see it
>> being started every time a binary value with my custom mime type is added):
>>
>>     ...
>>     XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
>>     xhtml.startDocument();
>>     ...
>>     for (String value : keywords) {
>>         xhtml.characters(value);
>>         xhtml.characters(" ");
>>         // xhtml.element("p", value);
>>     }
>>     xhtml.endDocument();
>>     ...
>>
>>> Date: Mon, 30 Aug 2010 09:12:16 +0200
>>> Subject: Re: Searching for binary values
>>> From: [email protected]
>>> To: [email protected]
>>>
>>> 2010/8/27 Slavek Tecl <[email protected]>:
>>>> In my case addBinaryValue has been overridden in my custom class, so I'm
>>>> adding this field to the document as well.
>>>
>>> Is it possible that you made some error in this? I can't judge it without
>>> the code.
>>>
>>> Regards Ard
>>>
>>>>
>>>>> Date: Fri, 27 Aug 2010 17:16:56 +0200
>>>>> Subject: Re: Searching for binary values
>>>>> From: [email protected]
>>>>> To: [email protected]
>>>>>
>>>>> 2010/8/27 Slavek Tecl <[email protected]>:
>>>>>>
>>>>>> I'm looking for a clarification of how the query is processed in my
>>>>>> customized Jackrabbit instance. In my case the NodeIndexer is subclassed
>>>>>> so it can add the binary value to the indexed Document even if it does
>>>>>> not have the nt:resource type. Tika has also been customized with my
>>>>>> mime type so that it can recognize the binary stream through its magic,
>>>>>> and of course Tika's Parser was implemented to support the custom
>>>>>> binary stream and extract words from it. If I run a query on
>>>>>> nt:resource nodes it correctly returns the files that include the
>>>>>> searched word, as expected, but when I invoke a similar query on a
>>>>>> binary property (and the content of this binary property is exactly the
>>>>>> type of stream Tika can parse) it does not return anything. Is there a
>>>>>> way out?
>>>>>
>>>>>
>>>>> Binary properties are only indexed on nodescope level, not on property
>>>>> level.
>>>>>
>>>>> See protected void addBinaryValue(Document doc,
>>>>> String fieldName,
>>>>> InternalValue internalValue) {
>>>>>
>>>>> and then specifically doc.add(createFulltextField(internalValue,
>>>>> metadata));
>>>>>
>>>>> in jr NodeIndexer
>>>>>
>>>>> Regards Ard
>>>>
>>
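
Ard's point in the oldest message of the thread, that binary properties are
indexed only at node scope and not at property level, comes down to which of
two query shapes is used. The sketch below only builds the two XPath strings
so they can be compared side by side; it does not talk to a repository. The
class and method names, the node type my:document and the property
my:payload are hypothetical, and the search term is assumed to contain no
quote characters.

```java
public class ScopeQuerySketch {

    /** Node-scope full-text query: matches text indexed for the node as a whole. */
    static String nodeScope(String nodeType, String term) {
        return "//element(*, " + nodeType + ")[jcr:contains(., '" + term + "')]";
    }

    /** Property-scope full-text query: restricted to a single named property. */
    static String propertyScope(String nodeType, String property, String term) {
        return "//element(*, " + nodeType + ")[jcr:contains(@" + property
                + ", '" + term + "')]";
    }

    public static void main(String[] args) {
        // my:document and my:payload are made-up names for illustration only.
        System.out.println(nodeScope("my:document", "keyword"));
        System.out.println(propertyScope("my:document", "my:payload", "keyword"));
    }
}
```

If Ard's description applies to the customized indexer as well, the first
shape is the one that can match extracted binary text, while the second
would come back empty for a binary property.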