Good morning,

I would like to ask for your help in understanding what I am doing wrong in 
building a working example in which I populate a repository with binary data, 
index it, and run a CONTAINS query.

I have logging set to TRACE and I can see the indexing working; when I execute 
the query, however, I always get 0 results.

The repository is backed by a NodeStore (ns below), and I create it in this way:

LuceneIndexProvider provider = new LuceneIndexProvider();
Oak oak = new Oak(ns)
                .with((QueryIndexProvider) provider)
                .with((Observer) provider)
                .with(new LuceneIndexEditorProvider());
repository = new Jcr(oak).createRepository();

Then I populate it in this way:

Node node = rootNode.addNode("node" + i, "nt:unstructured");
byte[] data = ("testo" + i).getBytes();
ByteArrayInputStream bais = new ByteArrayInputStream(data);
Binary binary = session.getValueFactory().createBinary(bais);
try {
    node.setProperty("binaryData", binary);
} finally {
    binary.dispose();
}
node.setProperty("jcr:mimeType", "text/plain");
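
One side detail of the populate step: getBytes() without an argument uses the 
platform default charset. I do not think this explains the 0 results, but to 
rule it out the test payload could be built with an explicit charset. A minimal 
sketch using only the JDK (the class and method names are hypothetical, not part 
of my project):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the same "testo" + i payload with a fixed charset.
final class TestPayloads {

    // Encodes the test text as UTF-8 instead of the platform default charset.
    static byte[] payloadFor(int i) {
        return ("testo" + i).getBytes(StandardCharsets.UTF_8);
    }

    // Wraps the bytes in a stream, the shape passed to createBinary(...) above.
    static InputStream streamFor(int i) {
        return new ByteArrayInputStream(payloadFor(i));
    }

    public static void main(String[] args) throws IOException {
        // Read the stream back to check which text an extractor would see.
        try (InputStream in = streamFor(3)) {
            String roundTrip = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            System.out.println(roundTrip);
        }
    }
}
```

The stream returned by streamFor(i) would take the place of bais in the snippet 
above.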

Then I define the index in this way:

Node root = session.getRootNode();
Node oakIndex = root.getNode("oak:index");
Node index = oakIndex.addNode("contentTextIndex", "oak:QueryIndexDefinition");
index.setProperty("type", "lucene");
index.setProperty("async", (String[]) null);
Node indexRules = index.addNode("indexRules", "nt:unstructured");
Node ntBase = indexRules.addNode("nt:base", "nt:unstructured");
Node properties = ntBase.addNode("properties", "nt:unstructured");
Node binaryDataProperty = properties.addNode("binaryData", "nt:unstructured");
binaryDataProperty.setProperty("name", propertyName);
binaryDataProperty.setProperty("propertyIndex", true);
binaryDataProperty.setProperty("analyzed", true);
Node jcrMimeTypeProperty = properties.addNode("jcr:mimeType");
jcrMimeTypeProperty.setProperty("name", "jcr:mimeType");
jcrMimeTypeProperty.setProperty("propertyIndex", true);
jcrMimeTypeProperty.setProperty("analyzed", true);

Then I search in this way:

String sql2QueryString =
    "SELECT * FROM [nt:base] WHERE CONTAINS([binaryData], 'testo')";
Query sql2Query = queryManager.createQuery(sql2QueryString, Query.JCR_SQL2);
QueryResult result = sql2Query.execute();

and I read the results in this way:

NodeIterator nodes = result.getNodes();
while (nodes.hasNext()) {
    Node node = nodes.nextNode();
    log.info("Path: " + node.getPath());
    counter++;
}
log.info("Found {} results", counter);
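
To narrow this down, I could also ask Oak for the query plan by prefixing the 
statement with "explain", to see whether contentTextIndex is selected at all. 
Below is a minimal sketch of how the two query strings could be built. The class 
and method names are hypothetical, and only the string handling is shown; the 
resulting strings would be passed to queryManager.createQuery(...) exactly as in 
the search snippet above. My understanding is that the plan comes back as a 
single row with a "plan" column, but I have not verified that here.

```java
// Hypothetical helper: builds the CONTAINS query and its "explain" variant.
final class FullTextQueries {

    // Quotes a search term as a JCR-SQL2 string literal, doubling single quotes.
    static String quote(String term) {
        return "'" + term.replace("'", "''") + "'";
    }

    // Builds: SELECT * FROM [nodeType] WHERE CONTAINS([property], 'term')
    static String contains(String nodeType, String property, String term) {
        return "SELECT * FROM [" + nodeType + "] WHERE CONTAINS(["
                + property + "], " + quote(term) + ")";
    }

    // Prefixes a statement with "explain" so the plan is returned instead of the nodes.
    static String explain(String query) {
        return "explain " + query;
    }

    public static void main(String[] args) {
        String query = contains("nt:base", "binaryData", "testo");
        System.out.println(query);
        System.out.println(explain(query));
    }
}
```

If the plan shows a traversal rather than the Lucene index, the problem would be 
in the index definition rather than in the text extraction.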

I'm using Oak 1.68.0 with tika-core and tika-parsers-standard-package 2.9.2.

In the logs I can see both the indexing and the text extraction happening 
correctly; if you want, I can attach a full log.

Thank you very much for your help.

Best regards,

Raffaele Gambelli
Senior Java Developer