Hi all,
I was able to get results, but only when searching across all properties, with
a query like this:
SELECT * FROM [nt:unstructured] WHERE CONTAINS(*, 'text')
and indexing all properties with an index like this:
/oak:index/assetType
  - jcr:primaryType = "oak:QueryIndexDefinition"
  - type = "lucene"
  - compatVersion = 2
  - async = "async"
  + indexRules
    - jcr:primaryType = "nt:unstructured"
    + nt:base
      + properties
        - jcr:primaryType = "nt:unstructured"
        + allProps
          - name = ".*"
          - isRegexp = true
          - nodeScopeIndex = true
This means that searching indexed binary data works. What I would really like,
though, is to query only one specific property and to index only that property,
and why that does not work remains a mystery.
Do you have an explanation for this difference in behaviour?
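
For reference, the two query shapes contrasted above can be written out side by
side. The following is only a minimal Java sketch that builds the JCR-SQL2
strings; the class name, the method names, and the use of jcr:data as the
property are assumptions made for illustration, not part of Oak, and the sketch
says nothing about which index definition is needed to serve each query.

```java
// Minimal sketch: builds the two JCR-SQL2 full-text query shapes.
// FullTextQueries and its methods are hypothetical helpers, not an Oak API.
final class FullTextQueries {

    private FullTextQueries() {
    }

    // Wraps a value in a single-quoted JCR-SQL2 string literal,
    // doubling any embedded single quotes.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Node-scope search: CONTAINS(*, ...) is not tied to one property.
    static String containsAnyProperty(String nodeType, String term) {
        return "SELECT * FROM [" + nodeType + "] WHERE CONTAINS(*, "
                + quote(term) + ")";
    }

    // Property-scope search: CONTAINS([property], ...) names a single property.
    static String containsProperty(String nodeType, String property, String term) {
        return "SELECT * FROM [" + nodeType + "] WHERE CONTAINS([" + property + "], "
                + quote(term) + ")";
    }
}
```

Calling containsAnyProperty("nt:unstructured", "text") reproduces the query
shown earlier in this message. Note that quote() only handles the SQL-2 string
literal; special characters of the full-text search syntax itself are not
escaped here.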
Thanks
Cordiali saluti / Best regards,
Raffaele Gambelli
Senior Java Developer
E [email protected]
[CEGEKA] Via Ettore Cristoni, 84
IT-40033 Bologna (IT), Italy
T +39 02 2544271
WWW.CEGEKA.COM<https://www.cegeka.com/>
Confidentiality Notice
The information contained in this email is confidential. If you realize that
you are not the intended recipient, please report the error to the sender and
delete the message immediately. Improper use of confidential information may
result in penalties.
Personal Data Protection
Please be informed that your data will be processed by Cegeka in compliance
with applicable law (D. Lgs 196/2003 and EU Regulation 679/2016).
For further details, see our privacy notices at
https://www.cegeka.com/it/informazioni-sulla-privacy
________________________________
From: Raffaele Gambelli <[email protected]>
Sent: Thursday, September 12, 2024 10:20 AM
To: [email protected] <[email protected]>
Subject: Re: Indexing a binary and searching with contains, help request
Thanks Julian and Thomas,
* Yes, I know https://oakutils.appspot.com/generate/index, but it did not help
me accomplish my task.
* I have already posted the same question on Stack Overflow:
https://stackoverflow.com/questions/78973742/indexing-a-binary-and-searching-with-contains-cannot-find-results
I think, and hope, that a JUnit test or something similar exists in the Oak
repository covering a scenario like mine, which I think is fairly typical:
binary data (text/plain or application/pdf) is indexed, and a full-text search
with CONTAINS finds it.
Can anyone find it and send me a link, please?
Cordiali saluti / Best regards,
Raffaele Gambelli
________________________________
From: Thomas Mueller <[email protected]>
Sent: Thursday, September 12, 2024 9:19 AM
To: [email protected] <[email protected]>
Subject: Re: Indexing a binary and searching with contains, help request
Hi,
I'm not sure if you are aware of the following, it might help:
https://oakutils.appspot.com/generate/index
https://www.aemstuff.com/blogs/feb/aemindexcheatsheat.html
https://experienceleague.adobe.com/docs/experience-manager-65/assets/JCR_query_cheatsheet-v1.1.pdf
These were written for the Adobe AEM product, but I find them useful even
outside of AEM.
And here is an example index definition:
{
  "/oak:index/acmeAsset-1": {
    "compatVersion": 2,
    "type": "lucene",
    "tags": ["asset"],
    "async": ["async", "nrt"],
    "includedPaths": ["/content/dam"],
    "jcr:primaryType": "oak:QueryIndexDefinition",
    "evaluatePathRestrictions": true,
    "maxFieldLength": 100000,
    "aggregates": {
      "jcr:primaryType": "nt:unstructured",
      "dam:Asset": {
        "jcr:primaryType": "nt:unstructured",
        "include0": {
          "path": "jcr:content",
          "jcr:primaryType": "nt:unstructured"
        },
        "include1": {
          "path": "jcr:content/metadata",
          "jcr:primaryType": "nt:unstructured"
        },
        "include2": {
          "path": "jcr:content/metadata/*",
          "jcr:primaryType": "nt:unstructured"
        },
        "include3": {
          "path": "jcr:content/renditions",
          "jcr:primaryType": "nt:unstructured"
        },
        "include4": {
          "path": "jcr:content/renditions/original",
          "jcr:primaryType": "nt:unstructured"
        },
        "include5": {
          "path": "jcr:content/renditions/original/jcr:content",
          "jcr:primaryType": "nt:unstructured"
        },
        "include6": {
          "path": "jcr:content/comments",
          "jcr:primaryType": "nt:unstructured"
        },
        "include7": {
          "path": "jcr:content/comments/*",
          "jcr:primaryType": "nt:unstructured"
        },
        "include8": {
          "path": "jcr:content/data/master",
          "jcr:primaryType": "nt:unstructured"
        },
        "include9": {
          "path": "jcr:content/usages",
          "jcr:primaryType": "nt:unstructured"
        },
        "include10": {
          "path": "jcr:content/renditions/text.txt/jcr:content",
          "jcr:primaryType": "nt:unstructured"
        }
      }
    },
    "facets": {
      "jcr:primaryType": "nt:unstructured",
      "topChildren": "100",
      "secure": "insecure"
    },
    "indexRules": {
      "jcr:primaryType": "nt:unstructured",
      "dam:Asset": {
        "jcr:primaryType": "nt:unstructured",
        "properties": {
          "jcr:primaryType": "nt:unstructured",
          "jcrLastModified": {
            "ordered": true,
            "name": "jcr:content/jcr:lastModified",
            "propertyIndex": true,
            "jcr:primaryType": "nt:unstructured",
            "type": "Date"
          },
          "jcrTitle": {
            "useInSpellcheck": true,
            "useInSuggest": true,
            "nodeScopeIndex": true,
            "name": "jcr:content/jcr:title",
            "propertyIndex": true,
            "boost": 2.0,
            "jcr:primaryType": "nt:unstructured"
          },
          "jcrDescription": {
            "nodeScopeIndex": true,
            "useInSpellcheck": true,
            "name": "jcr:content/jcr:description",
            "propertyIndex": true,
            "jcr:primaryType": "nt:unstructured",
            "useInSuggest": true
          },
          "jcrCreated": {
            "ordered": true,
            "name": "jcr:created",
            "propertyIndex": true,
            "jcr:primaryType": "nt:unstructured",
            "type": "Date"
          },
          "nodeName": {
            "nodeScopeIndex": true,
            "name": ":nodeName",
            "jcr:primaryType": "nt:unstructured",
            "useInSuggest": true
          }
        }
      }
    }
  }
}
I wonder if nowadays, you would get more answers on stackoverflow.com? I'm not
sure...
Regards,
Thomas