Hello,
2010/8/30 Slavek Tecl <[email protected]>:
>
>
> once again in HTML...
>
> here comes the addBinaryValue method body:...
>
>
> // standard way of indexing
> String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
> if (jcrData.equals(fieldName)) {
>     InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
>     if (type != null) {
>         Metadata metadata = new Metadata();
>         metadata.set(Metadata.CONTENT_TYPE, type.getString());
>         // jcr:encoding is not mandatory
>         InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
>         if (encoding != null) {
>             metadata.set(Metadata.CONTENT_ENCODING, encoding.getString());
>         }
>         doc.add(createFulltextField(internalValue, metadata));
>     }
> } else {
>     // everything else gets indexed as well
>     MimeTypes gk = new MimeTypes();
>     MimeType mimeType = gk.getMimeType(internalValue.getStream());
>
>     Metadata metadata = new Metadata();
>     metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
>     doc.add(createFulltextField(internalValue, metadata));
> }
OK, but as I said in one of my earlier mails, binaries are not indexed
at the property level, only at the node-scope level. Your code doesn't
index at the property level either; see createFulltextField. I do hope
I understood your first mail correctly, though: you specifically want
to search in the binary property *only*, right? Could you post the
XPath query you want to execute?
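A toy model may make the scope distinction concrete. The sketch below is plain Java, not Jackrabbit code: it keeps a map from field name to terms and, like the stock indexer, puts extracted binary text only into a node-scope field (the name `_:FULLTEXT` is an assumption here). A node-scope query, roughly `//*[jcr:contains(., 'alpha')]`, therefore matches, while a property-scoped one such as `//*[jcr:contains(@my:keywords, 'alpha')]` does not; `my:keywords` is a hypothetical property name. Also indexing the text under the property's own field makes the second query match.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (not Jackrabbit code) of node-scope vs property-scope fulltext
 * indexing. Field names are hypothetical stand-ins.
 */
public class ScopeDemo {
    // Assumed name for the node-scope fulltext field.
    static final String NODE_SCOPE = "_:FULLTEXT";

    // field name -> set of lower-cased terms
    private final Map<String, Set<String>> fields = new HashMap<>();

    void addTerms(String field, String text) {
        Set<String> terms = fields.computeIfAbsent(field, k -> new HashSet<>());
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
    }

    /** Node-scope only: the property name is deliberately ignored. */
    void indexBinaryNodeScopeOnly(String property, String extractedText) {
        addTerms(NODE_SCOPE, extractedText);
    }

    /** Node scope plus a field named after the property itself. */
    void indexBinaryBothScopes(String property, String extractedText) {
        addTerms(NODE_SCOPE, extractedText);
        addTerms(property, extractedText);
    }

    /** Roughly jcr:contains(., 'term'). */
    boolean containsNodeScope(String term) {
        return fields.getOrDefault(NODE_SCOPE, Set.of())
                .contains(term.toLowerCase(Locale.ROOT));
    }

    /** Roughly jcr:contains(@property, 'term'). */
    boolean containsProperty(String property, String term) {
        return fields.getOrDefault(property, Set.of())
                .contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        ScopeDemo stock = new ScopeDemo();
        stock.indexBinaryNodeScopeOnly("my:keywords", "alpha beta");
        System.out.println(stock.containsNodeScope("alpha"));               // true
        System.out.println(stock.containsProperty("my:keywords", "alpha")); // false

        ScopeDemo patched = new ScopeDemo();
        patched.indexBinaryBothScopes("my:keywords", "alpha beta");
        System.out.println(patched.containsProperty("my:keywords", "alpha")); // true
    }
}
```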
Anyway, you should also add the indexed binary as a property, preferably
a non-stored one, much like the method
protected void addStringValue(Document doc, String fieldName,
        Object internalValue, boolean tokenized,
        boolean includeInNodeIndex, float boost,
        boolean useInExcerpt) {
does. However, keep in mind that by default binaries are indexed lazily,
in the background. Instead of calling
doc.add(createFulltextField(internalValue, metadata));
I'd recommend calling
doc.add(createFulltextField(fieldName, internalValue, metadata));
Add this new createFulltextField method, and create your own
LazyTextExtractorField class that also takes a fieldName argument.
Then you also need to add the extracted, analysed text as a property.
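The idea of a text field that knows its own field name and extracts later can be sketched without Jackrabbit or Lucene on the classpath. The class below is a hypothetical stand-in, not Jackrabbit's LazyTextExtractorField: it records the field name it belongs to and defers extraction (modelled as a `Supplier<String>`) until the text is first requested, then caches the result. The real class extracts in the background and plugs into Lucene's field API; only the "carry a fieldName, extract later" part is modelled here.

```java
import java.util.Objects;
import java.util.function.Supplier;

/**
 * Hypothetical stand-in for a LazyTextExtractorField variant that carries a
 * field name. Plain Java only; extraction runs on first access, not in a
 * background thread as in Jackrabbit.
 */
public class NamedLazyTextField {
    private final String fieldName;
    private final Supplier<String> extractor;
    private String extracted; // cached text; null until first access
    private int extractions;  // how many times the extractor actually ran

    public NamedLazyTextField(String fieldName, Supplier<String> extractor) {
        this.fieldName = Objects.requireNonNull(fieldName);
        this.extractor = Objects.requireNonNull(extractor);
    }

    /** The property-level field this text should also be indexed under. */
    public String name() {
        return fieldName;
    }

    /** Runs the (possibly expensive) extraction once, then returns the cached text. */
    public synchronized String stringValue() {
        if (extracted == null) {
            String text = extractor.get();
            extracted = (text == null) ? "" : text;
            extractions++;
        }
        return extracted;
    }

    public synchronized int extractionCount() {
        return extractions;
    }

    public static void main(String[] args) {
        NamedLazyTextField f = new NamedLazyTextField("my:keywords", () -> "alpha beta");
        System.out.println(f.extractionCount()); // 0: nothing extracted yet
        System.out.println(f.stringValue());     // alpha beta
        System.out.println(f.stringValue());     // alpha beta (cached)
        System.out.println(f.extractionCount()); // 1
    }
}
```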
Regards Ard
>
> and here we have my custom parser (and I can see it's being started every time
> the binary value with my custom MIME type is added):
>
> XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
> xhtml.startDocument();
> ...fetch keywords...
> for (String value : keywords) {
>     xhtml.characters(value);
>     xhtml.characters(" ");
> }
> xhtml.endDocument();
> ...
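The detection step in the quoted addBinaryValue code (`gk.getMimeType(internalValue.getStream())`) relies on the same magic-number idea Slavek used to register his MIME type with Tika. Below is a plain-Java sketch of that technique, not Tika's API; the magic bytes `MYKW` and the type `application/x-my-keywords` are hypothetical. It peeks at the leading bytes and rewinds with mark/reset, so the same stream could still be handed to a parser. In the quoted code the stream returned by getStream() is not closed after detection; the sketch closes its stream with try-with-resources.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Minimal magic-number detector in plain Java (not Tika's API).
 * MAGIC and CUSTOM_TYPE are hypothetical placeholders.
 */
public class MagicSniffer {
    static final byte[] MAGIC = "MYKW".getBytes(StandardCharsets.US_ASCII);
    static final String CUSTOM_TYPE = "application/x-my-keywords";
    static final String FALLBACK_TYPE = "application/octet-stream";

    /**
     * Peeks at the first bytes and resets the stream afterwards, so the
     * caller can keep using it. Requires mark/reset support.
     */
    static String detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(MAGIC.length);
        try {
            // Reads at most MAGIC.length bytes; shorter streams simply won't match.
            byte[] head = in.readNBytes(MAGIC.length);
            return Arrays.equals(head, MAGIC) ? CUSTOM_TYPE : FALLBACK_TYPE;
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "MYKW alpha beta".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            System.out.println(detect(in));        // application/x-my-keywords
            System.out.println(in.read() == 'M');  // true: stream was rewound
        }
    }
}
```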
>
> ----------------------------------------
>> Date: Mon, 30 Aug 2010 09:52:28 +0200
>> Subject: Re: Searching for binary values
>> From: [email protected]
>> To: [email protected]
>>
>> 2010/8/30 Slavek Tecl :
>> >
>> > Bloody Hotmail, it screwed up my awesome formatting ;) Hope it's OK now.
>>
>> Hmmmm... not really.
>>
>> >
>> >> Date: Mon, 30 Aug 2010 09:31:47 +0200
>> >> Subject: Re: Searching for binary values
>> >> From: [email protected]
>> >> To: [email protected]
>> >>
>> >> Slavek,
>> >>
>> >> I am no computer :-) Is there a way you could format this into something
>> >> a little more human-readable?
>> >>
>> >>
>> >> 2010/8/30 Slavek Tecl :
>> >>>
>> >>>> Date: Mon, 30 Aug 2010 09:12:16 +0200
>> >>>> Subject: Re: Searching for binary values
>> >>>> From: [email protected]
>> >>>> To: [email protected]
>> >>>>
>> >>>> 2010/8/27 Slavek Tecl :
>> >>>>> In my case addBinaryValue has been overridden in my custom class, so
>> >>>>> I'm adding this field to the document as well.
>> >>>>
>> >>>> Is it possible that you made an error there? I can't judge it
>> >>>> without seeing the code.
>> >>>>
>> >>>> Regards Ard
>> >>>>
>> >>>>>
>> >>>>>> Date: Fri, 27 Aug 2010 17:16:56 +0200
>> >>>>>> Subject: Re: Searching for binary values
>> >>>>>> From: [email protected]
>> >>>>>> To: [email protected]
>> >>>>>>
>> >>>>>> 2010/8/27 Slavek Tecl :
>> >>>>>>>
>> >>>>>>> I'm looking for clarification of how queries are processed in my
>> >>>>>>> customized Jackrabbit instance. In my case NodeIndexer is
>> >>>>>>> subclassed so that it adds the binary value to the indexed Document
>> >>>>>>> even if the node is not of type nt:resource. Tika has been
>> >>>>>>> customized with my MIME type, so it can recognize the binary stream
>> >>>>>>> by its magic bytes, and of course I implemented a Tika Parser
>> >>>>>>> that handles the custom binary stream and extracts words from it.
>> >>>>>>> If I run a query on nt:resource nodes, it correctly returns files
>> >>>>>>> containing the searched word, as expected. But when I run a similar
>> >>>>>>> query on a binary property (whose content is exactly the kind of
>> >>>>>>> stream Tika can parse), it returns nothing. Is there a way out?
>> >>>>>>
>> >>>>>>
>> >>>>>> Binary properties are only indexed at the node-scope level, not at
>> >>>>>> the property level.
>> >>>>>>
>> >>>>>> See protected void addBinaryValue(Document doc,
>> >>>>>>         String fieldName,
>> >>>>>>         InternalValue internalValue) {
>> >>>>>>
>> >>>>>> and specifically doc.add(createFulltextField(internalValue,
>> >>>>>> metadata));
>> >>>>>>
>> >>>>>> in Jackrabbit's NodeIndexer.
>> >>>>>>
>> >>>>>> Regards Ard
>> >>>>>
>> >>>
>> >
>
>