Good morning,
I would like to ask for your help in understanding what I am doing wrong in
building a working example in which I populate a repository with binary data,
index it, and run a CONTAINS query.
I have logging set to TRACE and I can see the indexing working; however, when I
execute the query I always get 0 results.
The repository is backed by a NodeStore (ns below) and I create it like this:
LuceneIndexProvider provider = new LuceneIndexProvider();
Oak oak = new Oak(ns)
    .with((QueryIndexProvider) provider)
    .with((Observer) provider)
    .with(new LuceneIndexEditorProvider());
repository = new Jcr(oak).createRepository();
Then I populate it in this way:
Node node = rootNode.addNode("node" + i, "nt:unstructured");
byte[] data = ("testo" + i).getBytes();
ByteArrayInputStream bais = new ByteArrayInputStream(data);
Binary binary = session.getValueFactory()
    .createBinary(bais);
try {
    node.setProperty("binaryData", binary);
} finally {
    binary.dispose();
}
node.setProperty("jcr:mimeType", "text/plain");
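A side note on the payload above: String.getBytes() with no argument encodes with the JVM's default charset, so the stored bytes can differ between machines. Below is a minimal JDK-only sketch of building the same payload with an explicit UTF-8 charset. The class and method names (PayloadSketch, payloadFor, decode) are hypothetical and are not part of the code above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class PayloadSketch {

    // Hypothetical helper: encodes the test text with an explicit charset
    // instead of relying on the platform default used by getBytes().
    public static byte[] payloadFor(int i) {
        return ("testo" + i).getBytes(StandardCharsets.UTF_8);
    }

    // Reads the stream fully and decodes it as UTF-8 to check the round trip.
    public static String decode(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = payloadFor(3);
        try (InputStream in = new ByteArrayInputStream(data)) {
            // Prints the decoded text that a text extractor would see.
            System.out.println(decode(in));
        }
    }
}
```

For pure ASCII text such as "testo3" this is unlikely to change the result, but it removes one variable when comparing what was stored with what was extracted.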
Then I define the index like this:
Node root = session.getRootNode();
Node oakIndex = root.getNode("oak:index");
Node index = oakIndex.addNode("contentTextIndex", "oak:QueryIndexDefinition");
index.setProperty("type", "lucene");
index.setProperty("async", (String[]) null);
Node indexRules = index.addNode("indexRules", "nt:unstructured");
Node ntBase = indexRules.addNode("nt:base", "nt:unstructured");
Node properties = ntBase.addNode("properties", "nt:unstructured");
Node binaryDataProperty = properties.addNode("binaryData", "nt:unstructured");
binaryDataProperty.setProperty("name", propertyName);
binaryDataProperty.setProperty("propertyIndex", true);
binaryDataProperty.setProperty("analyzed", true);
Node jcrMimeTypeProperty = properties.addNode("jcr:mimeType");
jcrMimeTypeProperty.setProperty("name", "jcr:mimeType");
jcrMimeTypeProperty.setProperty("propertyIndex", true);
jcrMimeTypeProperty.setProperty("analyzed", true);
Then I search in this way:
String sql2QueryString =
    "SELECT * FROM [nt:base] WHERE CONTAINS([binaryData], 'testo')";
Query sql2Query = queryManager.createQuery(sql2QueryString, Query.JCR_SQL2);
QueryResult result = sql2Query.execute();
Finally, I read the results like this:
NodeIterator nodes = result.getNodes();
while (nodes.hasNext()) {
    Node node = nodes.nextNode();
    log.info("Path: " + node.getPath());
    counter++;
}
log.info("Found {} results", counter);
I'm using Oak 1.68.0 with tika-core and tika-parsers-standard-package 2.9.2.
In the logs I can see both the indexing and the text extraction completing
correctly; if you want, I can attach a full log.
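One thing the logs alone would not show is whether the query term equals an indexed token. The stored text is "testo0", "testo1", and so on, while the query asks for 'testo'. The sketch below illustrates whole-token matching under a deliberately simplified tokenizer that splits only on characters that are neither letters nor digits. This is an assumption made for illustration, not a description of the analyzer Oak actually applies, and the names (TokenMatchSketch, tokenize, containsTerm) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenMatchSketch {

    // Assumption: a simplified tokenizer that lowercases the text and splits
    // only on characters that are neither letters nor digits. This is NOT
    // Oak's or Lucene's analyzer, just a stand-in for illustration.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        String lower = text.toLowerCase(Locale.ROOT);
        for (String part : lower.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // A term matches only if it equals a whole token (no prefix matching).
    public static boolean containsTerm(String text, String term) {
        return tokenize(text).contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(containsTerm("testo0", "testo"));   // false
        System.out.println(containsTerm("testo 0", "testo"));  // true
    }
}
```

If the real analyzer behaved like this stand-in, populating the nodes with "testo " + i (with a space before the number) would be a quick way to check whether token boundaries explain the empty result. I have not verified this against Oak itself.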
Thank you very much for your help.
Best regards,
Raffaele Gambelli
Senior Java Developer