hi phillip,
On 4/18/07, Phillip Rhodes <[EMAIL PROTECTED]> wrote:
I am adding BinaryValue properties to my nodes. It appears that jackrabbit is
not indexing the values of the BinaryValue even if the contents represent a
string. If I add the String value as a StringValue, the value is indexed and
picked up in a contains search.
I have 2 issues with this:
1) String property values have a limit of around 16000 characters because the
SimpleDBPersistence adapter stores the value in a BLOB field. I get MySQL
data truncation errors unless I chop the data down to 16000 characters. In
addition, I am doubling my space requirements: not only do I have to store my
binary content, but also its string representation in the node.
there's been a related jira issue:
https://issues.apache.org/jira/browse/JCR-760
the issue has been resolved and the fix will be included in the upcoming 1.3 release.
for the time being you could either change the table definitions directly on mysql
or change the mysql.ddl file and let jackrabbit recreate the tables.
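If you keep truncating until the schema is changed, the cut can at least avoid splitting a UTF-16 surrogate pair at the boundary. A minimal sketch; the `Truncate` class and `truncate` helper are hypothetical, and the 16000 limit is the workaround value from the question above, not a Jackrabbit constant:

```java
public class Truncate {
    // Hypothetical helper: cut s to at most maxChars UTF-16 code units
    // without leaving a dangling high surrogate at the end.
    static String truncate(String s, int maxChars) {
        if (s.length() <= maxChars) {
            return s;
        }
        int end = maxChars;
        // If the last kept char starts a surrogate pair, drop it as well.
        if (end > 0 && Character.isHighSurrogate(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(truncate("abcdef", 4));
        // "a" + U+1F600 (two chars) + "b": cutting at 2 would split the pair
        System.out.println(truncate("a\uD83D\uDE00b", 2));
    }
}
```

Usage: `contentText = Truncate.truncate(contentText, 16000);` in place of the plain `substring` call.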
2) I use a byte[] array throughout my application as a means to store pdf files, image
files, text files, etc. It is a "common denominator" for all content: PDF
files, image files, wiki entries, etc. can all be stored, passed around and retrieved as
a byte[] array. I would like to figure out how to get jackrabbit to index the byte[]
array properly.
see below
3) Not an issue, but a question. How does jackrabbit know that a node is a pdf document?
It must figure it out somehow because I see that there is support in the SearchIndex to
configure pdf extractions. Do I add a "jcr:mimeType" property with the value application/pdf
to my pdf node and that will do it? Will this solve the first 2 issues?
in order to be indexed, the binary data must be stored in the jcr:data
property of a node of type nt:resource (or a subtype thereof).
e.g.
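import java.io.ByteArrayInputStream;
import java.util.Calendar;
import javax.jcr.Node;

// binary content goes into jcr:data; jcr:mimeType selects the text filter
Node node = parent.addNode("jcr:content", "nt:resource");
node.setProperty("jcr:mimeType", "application/pdf");
node.setProperty("jcr:data", new ByteArrayInputStream(pdfBytes));
node.setProperty("jcr:lastModified", Calendar.getInstance());
session.save();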
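A bare byte[] carries no information about what it contains, so it has to travel together with a MIME type that can be written to jcr:mimeType before the resource node is saved. A minimal sketch of a hypothetical extension-to-MIME-type lookup; the `MimeTypes` class and its table are illustrative assumptions, and which types actually get indexed depends on the text filters configured for the SearchIndex:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class MimeTypes {
    // Illustrative mapping only; extend it to match the text filters you configure.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("pdf", "application/pdf");
        BY_EXTENSION.put("txt", "text/plain");
        BY_EXTENSION.put("html", "text/html");
        BY_EXTENSION.put("png", "image/png");
    }

    // Hypothetical helper: derive a jcr:mimeType value from a file name,
    // falling back to application/octet-stream for unknown or missing extensions.
    static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "application/octet-stream";
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ENGLISH);
        String type = BY_EXTENSION.get(ext);
        return type != null ? type : "application/octet-stream";
    }
}
```

Usage: `node.setProperty("jcr:mimeType", MimeTypes.forFileName("report.pdf"));`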
once the resource nodes have gone through the text filters, you can search the
binary content using the jcr:contains function:
//element(*, nt:resource)[jcr:contains(., 'foo')]
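If the search term comes from user input, it has to be embedded safely in the single-quoted string literal. A minimal sketch of a hypothetical query builder, assuming Jackrabbit's XPath parser follows the XPath 2.0 rule that an apostrophe inside a '...' literal is written as two apostrophes; the resulting string is what you would pass to QueryManager.createQuery(statement, Query.XPATH):

```java
public class ContainsQuery {
    // Hypothetical helper: escape a value for a single-quoted XPath
    // string literal by doubling each apostrophe (XPath 2.0 rule).
    static String escapeLiteral(String value) {
        return value.replace("'", "''");
    }

    // Builds a full-text query over nt:resource nodes, same shape as the query in the reply.
    static String resourceContains(String term) {
        return "//element(*, nt:resource)[jcr:contains(., '" + escapeLiteral(term) + "')]";
    }

    public static void main(String[] args) {
        System.out.println(resourceContains("foo"));
        System.out.println(resourceContains("o'reilly"));
    }
}
```

This only handles the XPath literal; characters that are special to the full-text syntax of jcr:contains itself are not escaped here.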
cheers
stefan
I appreciate your thoughts on this!
My Code:
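import org.apache.jackrabbit.value.BinaryValue;
import org.apache.jackrabbit.value.StringValue;

String contentText = "this is a unique piece of text";
byte[] bytes = contentText.getBytes();
// binary property: not picked up by the full-text index
node.setProperty("content", new BinaryValue(bytes));
// truncate to stay under the ~16000 character column limit
if (contentText.length() > 16000) {
    contentText = contentText.substring(0, 16000);
}
// string property: indexed and found by jcr:contains
node.setProperty("worksproperty", new StringValue(contentText));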
This is my XPath query:
//*[jcr:contains(.,'unique')]