Hi Orlando,

Did you read the answer from Alexander Klimetschek here: http://mail-archives.apache.org/mod_mbox/jackrabbit-users/201207.mbox/%[email protected]%3E ?

On 27.06.2012, at 17:19, Furst, Carl wrote:

> > So here's the sql I use:
> >
> > select * from [nt:resource] where  contains([jcr:data], 'include');
>
> The full-text index for binary properties is by default aggregated on the node itself, not on the jcr:data property. You address that with "*", and you need a selector (s in this case):
>
> select * from [nt:resource] as s where contains(s.*, 'include')
>
> (In the former SQL1 you could simply do CONTAINS(., 'include') to address the node itself.)
>
> See my recent mail (about xpath, but same index is used): http://markmail.org/message/oc6uootrpxepso4d

> Cheers,
> Alex
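A minimal Java sketch of the selector-based statement Alex describes, with CONTAINS applied to the [nt:resource] selector "s" rather than to jcr:data. The class and method names are hypothetical, and the sketch assumes the resulting string is then passed to the standard JCR 2.0 call QueryManager.createQuery(statement, Query.JCR_SQL2). It only doubles single quotes (the JCR-SQL2 string-literal escape); it does not escape characters that are special in the full-text search syntax itself.

```java
// Hypothetical helper, not part of Jackrabbit or the JCR API.
public class FullTextQuery {

    /**
     * Builds a JCR-SQL2 statement that searches the full-text index of
     * nt:resource nodes, addressing the node itself via "s.*".
     * Single quotes in the term are doubled so the string literal stays valid.
     */
    static String resourceFullTextQuery(String term) {
        String literal = term.replace("'", "''");
        return "SELECT * FROM [nt:resource] AS s WHERE CONTAINS(s.*, '" + literal + "')";
    }

    public static void main(String[] args) {
        // Assumption: in a real application these strings would be handed to
        // session.getWorkspace().getQueryManager().createQuery(stmt, Query.JCR_SQL2).
        System.out.println(resourceFullTextQuery("include"));
        System.out.println(resourceFullTextQuery("O'Reilly"));
    }
}
```

For example, resourceFullTextQuery("include") produces the same statement as Alex's example, only with uppercase keywords.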

Hope this helps,

Torsten


On 07.06.2013 02:58, Orlando Palis wrote:
Hi Folks,

I'm new to Jackrabbit and I'm trying out full-text search using Jackrabbit
2.6.0 (with Tika 1.3). I have a custom node type that allows me to store
some custom properties and multiple HTML files (stored as binary). I have
the following configuration:

*workspace.xml:*

<?xml version="1.0" encoding="UTF-8"?>
<Workspace name="default">
    <!--
        virtual file system of the workspace:
        class: FQN of class implementing the FileSystem interface
    -->
    <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
        <param name="dataSourceName" value="ds1"/>
        <param name="schemaObjectPrefix" value="fs_${wsp.name}_"/>
    </FileSystem>
    <!--
        persistence manager of the workspace:
        class: FQN of class implementing the PersistenceManager interface
    -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.OraclePersistenceManager">
        <param name="dataSourceName" value="ds1"/>
        <param name="schemaObjectPrefix" value="pm_${wsp.name}_"/>
    </PersistenceManager>
    <!--
        Search index and the file system it uses.
        class: FQN of class implementing the QueryHandler interface
    -->
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${wsp.home}/index"/>
        <param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
        <param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl"/>
        <param name="excerptProviderClass" value="org.apache.jackrabbit.core.query.lucene.DefaultHTMLExcerpt"/>
        <param name="supportHighlighting" value="true"/>
        <param name="tikaConfigPath" value="${wsp.home}/tika-config.xml"/>
    </SearchIndex>
</Workspace>


*tika-config.xml:*

<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <mimeTypeRepository resource="/org/apache/tika/mime/tika-mimetypes.xml" magic="false"/>
    <parsers>
        <parser name="parse-html" class="org.apache.tika.parser.html.HtmlParser">
            <mime>text/html</mime>
            <mime>application/xhtml+xml</mime>
            <mime>application/x-asp</mime>
        </parser>
    </parsers>
</properties>

*JCR-SQL2 queries tested:*

1) SELECT * FROM [nt:file] as file WHERE CONTAINS(file.*, 'This')

2) SELECT * FROM [nt:file] as file WHERE CONTAINS(file.*, 'This*')

3)
SELECT file.*, resource.* FROM [nt:file] AS file
INNER JOIN [nt:resource] AS resource ON ISCHILDNODE(resource, file)
WHERE resource.[jcr:mimeType] = 'text/html'
AND CONTAINS(file.*, 'This')

4)
SELECT file.*, resource.* FROM [nt:file] AS file
INNER JOIN [nt:resource] AS resource ON ISCHILDNODE(resource, file)
WHERE resource.[jcr:mimeType] = 'text/html'
AND CONTAINS(file.*, 'This*')

*Result:*
None of the queries returns any rows. If I remove the CONTAINS() clause,
all of the queries above return rows, and for queries #3 and #4 I can see,
when I dump the results to the log file, that resource.[jcr:data] contains
the text ("This") I am searching for. I've also tried deleting the index
folder so that the repository is re-indexed, but full-text search still
does not return anything.

What am I missing? Also, is there any documentation on how to
configure Tika (tika-config.xml)?


Thanks and Regards,
Orlando



--
Torsten Stolpmann
Geschäftsführender Gesellschafter

verit Informationssysteme GmbH
Europaallee 10
67657 Kaiserslautern

E-Mail: [email protected]
Telefon: +49 631 520 840 00
Fax: +49 631 520 840 01
Web: http://www.verit.de/

Registergericht: Amtsgericht Kaiserslautern
Registernummer: HRB 3751
Geschäftsleitung: Claudia Könnecke, Torsten Stolpmann
