Hello,

2010/8/30 Slavek Tecl <[email protected]>:
>
>
>  once again in HTML...
>
>  here comes the addBinaryValue method body:...
>
>
> // standard way of indexing
> String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
> if (jcrData.equals(fieldName)) {
>     InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
>     if (type != null) {
>         Metadata metadata = new Metadata();
>         metadata.set(Metadata.CONTENT_TYPE, type.getString());
>         // jcr:encoding is not mandatory
>         InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
>         if (encoding != null) {
>             metadata.set(Metadata.CONTENT_ENCODING, encoding.getString());
>         }
>         doc.add(createFulltextField(internalValue, metadata));
>     }
> } else {
>     // everything else gets indexed as well
>     MimeTypes gk = new MimeTypes();
>     MimeType mimeType = gk.getMimeType(internalValue.getStream());
>
>     Metadata metadata = new Metadata();
>     metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
>     doc.add(createFulltextField(internalValue, metadata));
> }

OK, but as I said in one of my earlier mails, binaries are not indexed
on property level, only on nodescope level. Your code doesn't index on
property level either, see createFulltextField. I do hope I understood
your first mail correctly: you want to search specifically in the
binary property *only*, right? You could post the XPath query that
you want to execute.
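
To make the nodescope versus property distinction concrete, here is a minimal sketch of the two query shapes in JCR XPath. The node type nt:base, the property name my:binary and the helper class are placeholders I made up for illustration, not taken from your setup, and the search term is assumed to be a plain word that needs no escaping.

```java
// Illustrative only: builds the two jcr:contains query shapes as strings.
// "my:binary" and "nt:base" are placeholder names, not from the real repository.
class ContainsScopes {

    // Nodescope: matches text that was indexed for the node as a whole.
    static String nodeScope(String nodeType, String term) {
        return "//element(*, " + nodeType + ")[jcr:contains(., '" + term + "')]";
    }

    // Property scope: matches only text that was indexed for that one property.
    static String propertyScope(String nodeType, String property, String term) {
        return "//element(*, " + nodeType + ")[jcr:contains(@" + property + ", '" + term + "')]";
    }

    public static void main(String[] args) {
        System.out.println(nodeScope("nt:base", "keyword"));
        System.out.println(propertyScope("nt:base", "my:binary", "keyword"));
    }
}
```

With the stock indexing, only the first shape can find text extracted from a binary; the second needs the extracted text to be indexed under the property as well.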

Anyway,

you should also add the indexed binary as a property field (I recommend
non-stored), thus something like what the method

protected void addStringValue(Document doc, String fieldName,
                                  Object internalValue, boolean tokenized,
                                  boolean includeInNodeIndex, float boost,
                                  boolean useInExcerpt) {

does. However, you must realize that binaries get indexed lazily in the
background by default. I'd recommend that you do not call

doc.add(createFulltextField(internalValue, metadata));

but call

doc.add(createFulltextField(fieldName, internalValue, metadata));

Add this new createFulltextField method, and create your own
LazyTextExtractorField class that also takes a fieldName argument.

Then you also need to add the extracted, analysed text as a property.
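
The idea can be sketched without any Jackrabbit or Lucene classes. The toy model below is not the NodeIndexer code: ToyDocument stands in for a Lucene Document, and the field names "FULLTEXT" and "FULLTEXT:<property>" are invented for the example. It only shows why text added under the nodescope field alone cannot be found by a property-scoped search, and how a variant that takes a fieldName closes that gap.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model, not Jackrabbit code: all names here are hypothetical.
class PropertyScopedFulltextSketch {

    /** Stand-in for an index document: field name -> lower-cased tokens. */
    static final class ToyDocument {
        private final Map<String, Set<String>> fields = new HashMap<>();

        void add(String fieldName, String text) {
            Set<String> tokens = fields.computeIfAbsent(fieldName, k -> new HashSet<>());
            for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
        }

        boolean contains(String fieldName, String term) {
            return fields.getOrDefault(fieldName, Collections.emptySet())
                    .contains(term.toLowerCase(Locale.ROOT));
        }
    }

    // Invented field names for the nodescope and per-property fulltext fields.
    static final String NODE_SCOPE_FIELD = "FULLTEXT";

    static String propertyScopeField(String propertyName) {
        return "FULLTEXT:" + propertyName;
    }

    /** Mirrors the existing two-argument call: nodescope only. */
    static void addFulltext(ToyDocument doc, String extractedText) {
        doc.add(NODE_SCOPE_FIELD, extractedText);
    }

    /** Mirrors the suggested variant with a fieldName: nodescope plus property. */
    static void addFulltext(ToyDocument doc, String fieldName, String extractedText) {
        doc.add(NODE_SCOPE_FIELD, extractedText);
        doc.add(propertyScopeField(fieldName), extractedText);
    }

    public static void main(String[] args) {
        ToyDocument nodeOnly = new ToyDocument();
        addFulltext(nodeOnly, "alpha beta");

        ToyDocument both = new ToyDocument();
        addFulltext(both, "my:binary", "alpha beta");

        String propertyField = propertyScopeField("my:binary");
        System.out.println(nodeOnly.contains(NODE_SCOPE_FIELD, "alpha"));
        System.out.println(nodeOnly.contains(propertyField, "alpha"));
        System.out.println(both.contains(propertyField, "alpha"));
    }
}
```

In the real indexer the text is extracted lazily, so the per-property field would have to be filled by your own lazy field class rather than eagerly as in this sketch.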

Regards Ard

>
>
>
> and here we have my custom parser (and I can see it's being started every time
> the binary value with my custom mime type is added):
>
> XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
> xhtml.startDocument();
> ...fetch keywords...
> for(String value: keywords) {
> xhtml.characters(value);
> xhtml.characters(" ");
> }
> xhtml.endDocument();
> ...
>
>
>
>
>
>
>
> ----------------------------------------
>> Date: Mon, 30 Aug 2010 09:52:28 +0200
>> Subject: Re: Searching for binary values
>> From: [email protected]
>> To: [email protected]
>>
>> 2010/8/30 Slavek Tecl :
>> >
>> > Bloody hotmail, screwed my awesome formatting ;)Hope it's ok now.
>>
>> Hmmmm...not really
>>
>> >
>> > here comes the addBinaryValue method body:...//standard way of 
>> > indexingString jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";if 
>> > (jcrData.equals(fieldName)) { InternalValue type = 
>> > getValue(NameConstants.JCR_MIMETYPE); if (type != null) { Metadata 
>> > metadata = new Metadata(); metadata.set(Metadata.CONTENT_TYPE, 
>> > type.getString()); // jcr:encoding is not mandatory InternalValue encoding 
>> > = getValue(NameConstants.JCR_ENCODING); if (encoding != null) { 
>> > metadata.set(Metadata.CONTENT_ENCODING, encoding.getString()); } 
>> > doc.add(createFulltextField(internalValue, metadata)); }} else { 
>> > //everything else gets indexed as well MimeTypes gk = new MimeTypes(); 
>> > MimeType mimeType = gk.getMimeType(internalValue.getStream()); Metadata 
>> > metadata = new Metadata(); metadata.set(Metadata.CONTENT_TYPE, 
>> > mimeType.getName()); doc.add(createFulltextField(internalValue, 
>> > metadata));}...
>> >
>> > and here we have my custom parser (and I can see it's being started 
>> > everytime the binary value with my custom mime type is 
>> > added):XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, 
>> > metadata);xhtml.startDocument();...fetch keywords...for(String value: 
>> > keywords) { xhtml.characters(value); xhtml.characters(" 
>> > ");}xhtml.endDocument();...
>> >> Date: Mon, 30 Aug 2010 09:31:47 +0200
>> >> Subject: Re: Searching for binary values
>> >> From: [email protected]
>> >> To: [email protected]
>> >>
>> >> Slavek,
>> >>
>> >> I am no computer :-) Is there a way you can format this a little into a
>> >> human-understandable kind of thing?
>> >>
>> >>
>> >> 2010/8/30 Slavek Tecl :
>> >>>
>> >>> All right, here comes the addBinaryValue method body: ... //standard way 
>> >>> of indexing String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + 
>> >>> ":data"; if (jcrData.equals(fieldName)) { InternalValue type = 
>> >>> getValue(NameConstants.JCR_MIMETYPE); if (type != null) { Metadata 
>> >>> metadata = new Metadata(); metadata.set(Metadata.CONTENT_TYPE, 
>> >>> type.getString());
>> >>> // jcr:encoding is not mandatory InternalValue encoding = 
>> >>> getValue(NameConstants.JCR_ENCODING); if (encoding != null) { 
>> >>> metadata.set(Metadata.CONTENT_ENCODING, encoding.getString()); }
>> >>> doc.add(createFulltextField(internalValue, metadata)); } } else { 
>> >>> //everything else gets indexed as well MimeTypes gk = new MimeTypes(); 
>> >>> MimeType mimeType = gk.getMimeType(internalValue.getStream());
>> >>> Metadata metadata = new Metadata(); metadata.set(Metadata.CONTENT_TYPE, 
>> >>> mimeType.getName()); doc.add(createFulltextField(internalValue, 
>> >>> metadata)); } ...
>> >>> my custom parser leverages XMLContentHandler like this (and I can see 
>> >>> it's being started everytime the binary value with my custom mime type 
>> >>> is added):
>> >>> ...XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, 
>> >>> metadata);xhtml.startDocument();... for(String value: keywords) { 
>> >>> xhtml.characters(value); xhtml.characters(" "); //xhtml.element("p", 
>> >>> value); }xhtml.endDocument();...
>> >>>> Date: Mon, 30 Aug 2010 09:12:16 +0200
>> >>>> Subject: Re: Searching for binary values
>> >>>> From: [email protected]
>> >>>> To: [email protected]
>> >>>>
>> >>>> 2010/8/27 Slavek Tecl :
>> >>>>> In my case the addBinaryValue has been overridden in my custom class so 
>> >>>>> I'm adding this field to the document as well.
>> >>>>
>> >>>> Is it possible that you made some error in this? I can't judge it 
>> >>>> without code
>> >>>>
>> >>>> Regards Ard
>> >>>>
>> >>>>>
>> >>>>>> Date: Fri, 27 Aug 2010 17:16:56 +0200
>> >>>>>> Subject: Re: Searching for binary values
>> >>>>>> From: [email protected]
>> >>>>>> To: [email protected]
>> >>>>>>
>> >>>>>> 2010/8/27 Slavek Tecl :
>> >>>>>>>
>> >>>>>>> I'm looking for a clarification how the query is processed in my 
>> >>>>>>> customized jackrabbit instance. In my case the NodeIndexer is 
>> >>>>>>> subclassed so it can add the binary value to the indexed Document 
>> >>>>>>> even if it does not have nt:resource type. Then Tika has been 
>> >>>>>>> customized with my mimetype so the parser is able to recognize the 
>> >>>>>>> binary stream through its magic and of course Tika's Parser 
>> >>>>>>> object was implemented to support the custom binary stream to 
>> >>>>>>> extract words from it. If I run a query on nt:resource nodes it 
>> >>>>>>> correctly returns files including the searched word as expected but 
>> >>>>>>> when I invoke a similar query on a binary property (and the content 
>> >>>>>>> of this binary property is exactly the type of the stream Tika can 
>> >>>>>>> parse) it does not return anything - is there a way out?
>> >>>>>>
>> >>>>>>
>> >>>>>> Binary properties are only indexed on nodescope level, not on 
>> >>>>>> property level.
>> >>>>>>
>> >>>>>> See protected void addBinaryValue(Document doc,
>> >>>>>> String fieldName,
>> >>>>>> InternalValue internalValue) {
>> >>>>>>
>> >>>>>> and then specifically doc.add(createFulltextField(internalValue, 
>> >>>>>> metadata));
>> >>>>>>
>> >>>>>> in jr NodeIndexer
>> >>>>>>
>> >>>>>> Regards Ard
>> >>>>>
>> >>>
>> >
>
>
