Re: BinaryValue does not get indexed

Stefan Guggisberg Thu, 19 Apr 2007 01:40:14 -0700

hi phillip,

On 4/18/07, Phillip Rhodes <[EMAIL PROTECTED]> wrote:


I am adding BinaryValue properties to my nodes.  It appears that jackrabbit is 
not indexing the values of the BinaryValue even if the contents represent a 
string.  If I add the String value as a StringValue, the value is indexed and 
picked up in a contains search.

I have 2 issues with this:

1) String property values have a limit of around 16000 characters because the 
SimpleDBPersistence adapter stores the value in a BLOB field.  I get MySQL 
data truncation errors unless I chop the data down to 16000 characters.  In 
addition, I am doubling my space requirements.  Not only do I have to store my 
binary content, but also its string representation in the node.


there's been a related jira issue:
https://issues.apache.org/jira/browse/JCR-760

the issue has been resolved and will be included in the upcoming 1.3 release.

for the time being you could either change the table definitions
directly in MySQL
or change the mysql.ddl file and let jackrabbit recreate the tables.


2) I use a byte[] array throughout my application as a means to store pdf files, image 
files, text files, etc...  It is a "common denominator for all content".  PDF 
files, image files, wiki entries, etc... can all be stored, passed around and retrieved as 
a byte[] array.  I would like to figure out how to get jackrabbit to index the byte[] 
array properly.


see below


3) Not an issue, but a question. How does jackrabbit know that a node is a pdf document?  
It must figure it out somehow, because I see that there is support in the SearchIndex to 
configure pdf extraction.  Do I add a "jcr:mimeType" property of application/pdf 
to my pdf node and that will do it?  Will this solve the first 2 issues?


in order to be indexed, the binary data must be stored in the jcr:data
property of a
node of type nt:resource (or a subtype thereof).

e.g.

   import java.io.ByteArrayInputStream;
   import java.util.Calendar;
   import javax.jcr.Node;

   Node node = parent.addNode("jcr:content", "nt:resource");
   node.setProperty("jcr:mimeType", "application/pdf");
   node.setProperty("jcr:data", new ByteArrayInputStream(pdfBytes));
   node.setProperty("jcr:lastModified", Calendar.getInstance());
   session.save();
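To make the role of jcr:mimeType more concrete, here is a hypothetical, simplified model (these are not Jackrabbit classes) of how a search index can pick a text extractor based on the mime type and index only the text that extractor returns. It assumes text/plain content was written as UTF-8 bytes; a real PDF extractor would be a separate filter configured in the SearchIndex, which this model leaves out.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/**
 * Hypothetical model of mime-type-driven text extraction (NOT Jackrabbit API).
 * The mime type selects which extractor, if any, turns the raw bytes of a
 * jcr:data property into text that a full-text index could use.
 */
class TextExtractorRegistry {
    // mime type (lower case) -> function converting raw bytes into text
    private final Map<String, Function<byte[], String>> extractors = new LinkedHashMap<>();

    void register(String mimeType, Function<byte[], String> extractor) {
        extractors.put(mimeType.toLowerCase(), extractor);
    }

    /** Returns the text to index, or empty if no extractor handles this mime type. */
    Optional<String> extract(String mimeType, byte[] data) {
        if (mimeType == null) {
            return Optional.empty(); // no mime type: nothing is extracted
        }
        Function<byte[], String> extractor = extractors.get(mimeType.toLowerCase());
        return extractor == null ? Optional.empty() : Optional.of(extractor.apply(data));
    }

    /** A registry that only understands text/plain (assumed to be UTF-8). */
    static TextExtractorRegistry withPlainText() {
        TextExtractorRegistry registry = new TextExtractorRegistry();
        registry.register("text/plain", bytes -> new String(bytes, StandardCharsets.UTF_8));
        return registry;
    }
}
```

In this model, bytes without a mime type, or with a mime type that no extractor handles (here, application/pdf), produce no indexable text at all.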

once the resource nodes have gone through the text filters, you can search
the binary content using the jcr:contains function:

//element(*, nt:resource)[jcr:contains(., 'foo')]
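If the search term comes from user input, the query string above has to be built safely. The helper below is hypothetical (not part of the JCR API). It assumes the XPath 2.0 convention of writing a single quote inside a single-quoted literal as two single quotes, and it does not escape full-text operators that jcr:contains itself interprets (such as a leading "-" or double quotes for phrases).

```java
/**
 * Hypothetical helper (NOT part of the JCR API) that builds the
 * //element(*, nt:resource)[jcr:contains(., '...')] query for a search term.
 */
final class ContainsQuery {
    private ContainsQuery() {}

    /** Wraps a term in a single-quoted XPath literal, doubling embedded single quotes. */
    static String quote(String term) {
        return "'" + term.replace("'", "''") + "'";
    }

    /** Builds a full-text query over all nt:resource nodes. */
    static String forResources(String term) {
        return "//element(*, nt:resource)[jcr:contains(., " + quote(term) + ")]";
    }
}
```

The resulting string could then be passed to session.getWorkspace().getQueryManager().createQuery(xpath, Query.XPATH) and executed.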

cheers
stefan


I appreciate your thoughts on this!


My Code:

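import org.apache.jackrabbit.value.BinaryValue;
import org.apache.jackrabbit.value.StringValue;

String contentText = "this is a unique piece of text";
byte[] bytes = contentText.getBytes();
node.setProperty("content", new BinaryValue(bytes));
// truncate to stay under the ~16000-character limit described in 1)
if (contentText.length() > 16000) {
        contentText = contentText.substring(0, 16000);
}
node.setProperty("worksproperty", new StringValue(contentText));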



This is my xpath query:
//*[jcr:contains(.,'unique')]
