From: SARA KRISHNAN (IRIS-ISD-OOCLL/SNT)
Sent: Wednesday, January 22, 2014 4:40 PM
To: '[email protected]'; '[email protected]'
Cc: ORLANDO PALIS (IRIS-ISD-OOCLL/SNT)
Subject: Help needed - Text search not working in Jackrabbit 2.6.5
Hello,
We have been working with Jackrabbit for a year now, starting with version
2.4.3, and are now in the process of upgrading to the latest stable version,
2.6.5. We have not been able to get the content search function to work since
moving to this version. Some background on our setup and requirements follows.
We reference the Jackrabbit jars in WebLogic 10.3.6 via the server classpath
setting shown below, along with our application. Users need to be able to
search the content (HTML files) in the repository using JCR-SQL2.
1. Server classpath setting
CLASSPATH=D:\oracle\MIDDLE~1\modules\javax.persistence_1.1.0.0_2-0.jar;D:\oracle\MIDDLE~1\modules\com.oracle.jpa2support_1.0.0.0_2-0.jar;D:\oracle\MIDDLE~1\patch_wls1036\profiles\default\sys_manifest_classpath\weblogic_patch.jar;D:\oracle\MIDDLE~1\patch_ocp371\profiles\default\sys_manifest_classpath\weblogic_patch.jar;D:\Oracle\Middleware\coherence_3.7.0.2\eclipselink.jar;D:\Oracle\Middleware\coherence_3.7.0.2\coherence.jar;D:\Oracle\Middleware\coherence_3.7.0.2\toplink-grid.jar;D:\oracle\MIDDLE~1\JROCKI~1.1-3\lib\tools.jar;D:\oracle\MIDDLE~1\WLSERV~1.3\server\lib\weblogic_sp.jar;D:\oracle\MIDDLE~1\WLSERV~1.3\server\lib\weblogic.jar;D:\oracle\MIDDLE~1\modules\features\weblogic.server.modules_10.3.6.0.jar;D:\oracle\MIDDLE~1\WLSERV~1.3\server\lib\webservices.jar;D:\oracle\MIDDLE~1\modules\ORGAPA~1.1/lib/ant-all.jar;D:\oracle\MIDDLE~1\modules\NETSFA~1.0_1/lib/ant-contrib.jar;D:\oracle\MIDDLE~1\WLSERV~1.3\common\derby\lib\derbyclient.jar;D:\oracle\MIDDLE~1\WLSERV~1.3\server\lib\xqrl.jar;D:/Oracle/Middleware/modules/javax.persistence_1.1.0.0_2-0.jar;D:/oracle/Middleware/jackrabbit/jackrabbit-api-2.6.5.jar;D:/oracle/Middleware/jackrabbit/jackrabbit-core-2.6.5.jar;D:/oracle/Middleware/jackrabbit/jackrabbit-jcr-commons-2.6.5.jar;D:/oracle/Middleware/jackrabbit/jackrabbit-jcr-rmi-2.6.5.jar;D:/oracle/Middleware/jackrabbit/jackrabbit-jcr-server-2.6.5.jar;D:/oracle/Middleware/jackrabbit/jackrabbit-jcr-servlet-2.6.5.jar;D:/oracle/Middleware/jackrabbit/jackrabbit-spi-2.6.5.jar;D:/oracle/Middleware/jackrabbit/jackrabbit-spi-commons-2.6.5.jar;D:/oracle/Middleware/jackrabbit/jackrabbit-webdav-2.6.5.jar;D:/oracle/Middleware/jackrabbit/commons-io-2.2.jar;D:/oracle/Middleware/jackrabbit/concurrent-1.3.4.jar;D:/oracle/Middleware/jackrabbit/jcr-2.0.jar;D:/oracle/Middleware/jackrabbit/lucene-core-3.6.0.jar;D:/oracle/Middleware/jackrabbit/slf4j-api-1.6.4.jar;D:/oracle/Middleware/jackrabbit/tika-core-1.3.jar;D:/oracle/Middleware/jackrabbit/tika-parsers-1.3.jar;D:/oracle/Middleware/jackrabbit/tagsoup-1.2.1.jar;%CLASSPATH%;
2. Attached repository.xml, with the workspace section changed to load an
externalized tika-config.xml
3. Attached tika-config.xml with the HtmlParser configuration
4. JCR-SQL2 query used to search the content ('two' is the text being searched)
SELECT rt.*, file.*, resource.* FROM [rt:RuleTariff] AS rt INNER JOIN [rt:file]
AS file ON ISCHILDNODE(file, rt) INNER JOIN [nt:resource] AS resource ON
ISCHILDNODE(resource, file) WHERE resource.[jcr:mimeType] = 'text/html' AND
CONTAINS(resource.*, 'two')
The content search works perfectly well with the query above when we switch
back to Jackrabbit 2.4.3, which uses Lucene.
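For reference, here is a minimal sketch of how the query in item 4 could be assembled for an arbitrary search term. The class and method names (FullTextQuery, quote, build) are made up for illustration and are not part of Jackrabbit. It only doubles single quotes so the term is a valid JCR-SQL2 string literal; it does not escape the special characters of the full-text search expression itself.

```java
/** Hypothetical helper, for illustration only; not part of Jackrabbit. */
final class FullTextQuery {

    /** Wraps a value in single quotes, doubling any embedded single quote. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds the JCR-SQL2 statement from item 4 for the given search term. */
    static String build(String searchTerm) {
        return "SELECT rt.*, file.*, resource.* "
             + "FROM [rt:RuleTariff] AS rt "
             + "INNER JOIN [rt:file] AS file ON ISCHILDNODE(file, rt) "
             + "INNER JOIN [nt:resource] AS resource ON ISCHILDNODE(resource, file) "
             + "WHERE resource.[jcr:mimeType] = 'text/html' "
             + "AND CONTAINS(resource.*, " + quote(searchTerm) + ")";
    }

    public static void main(String[] args) {
        // The resulting string would be passed to
        // session.getWorkspace().getQueryManager()
        //        .createQuery(statement, javax.jcr.query.Query.JCR_SQL2)
        System.out.println(build("two"));
    }
}
```

The sketch uses only the JDK, so it can be run without the repository to inspect the exact statement being sent.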
1. Are we missing any other configuration?
2. Can we use Lucene for full-text search in 2.6.5 instead of Tika? If so,
how should this be configured?
Thanks in advance.
-Sara
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!DOCTYPE Repository
PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN"
"http://jackrabbit.apache.org/dtd/repository-2.0.dtd">
<!-- Example Repository Configuration File
Used by
- org.apache.jackrabbit.core.config.RepositoryConfigTest.java
-
-->
<Repository>
<!--
JNDI data source shared by the file system, data store and
persistence manager definitions below
-->
<DataSources>
<DataSource name="ds1">
<param name="driver" value="javax.naming.InitialContext"/>
<param name="url" value="JACKRABBITDS"/>
<param name="databaseType" value="oracle"/>
</DataSource>
</DataSources>
<!--
virtual file system where the repository stores global state
(e.g. registered namespaces, custom node types, etc.)
-->
<FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
<param name="dataSourceName" value="ds1"/>
<param name="schemaObjectPrefix" value="fs_" />
</FileSystem>
<!--
data store configuration
-->
<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
<param name="dataSourceName" value="ds1"/>
<param name="schemaObjectPrefix" value="ds_" />
</DataStore>
<!--
security configuration
-->
<Security appName="Jackrabbit">
<!--
security manager:
class: FQN of class implementing the JackrabbitSecurityManager interface
-->
<SecurityManager class="org.apache.jackrabbit.core.DefaultSecurityManager" workspaceName="security">
<!--
workspace access:
class: FQN of class implementing the WorkspaceAccessManager interface
-->
<!-- <WorkspaceAccessManager class="..."/> -->
<!-- <param name="config" value="${rep.home}/security.xml"/> -->
</SecurityManager>
<!--
access manager:
class: FQN of class implementing the AccessManager interface
-->
<AccessManager class="org.apache.jackrabbit.core.security.DefaultAccessManager">
<!-- <param name="config" value="${rep.home}/access.xml"/> -->
</AccessManager>
<LoginModule class="org.apache.jackrabbit.core.security.authentication.DefaultLoginModule">
<!--
anonymous user name ('anonymous' is the default value)
-->
<param name="anonymousId" value="anonymous"/>
<!--
administrator user id (default value if param is missing is 'admin')
-->
<param name="adminId" value="admin"/>
</LoginModule>
</Security>
<!--
location of workspaces root directory and name of default workspace
-->
<Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>
<!--
workspace configuration template:
used to create the initial workspace if there's no workspace yet
-->
<Workspace name="${wsp.name}">
<!--
virtual file system of the workspace:
class: FQN of class implementing the FileSystem interface
-->
<FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
<param name="dataSourceName" value="ds1"/>
<param name="schemaObjectPrefix" value="fs_${wsp.name}_" />
</FileSystem>
<!--
persistence manager of the workspace:
class: FQN of class implementing the PersistenceManager interface
-->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.OraclePersistenceManager">
<param name="dataSourceName" value="ds1"/>
<param name="schemaObjectPrefix" value="pm_${wsp.name}_" />
</PersistenceManager>
<!--
Search index and the file system it uses.
class: FQN of class implementing the QueryHandler interface
-->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
<param name="path" value="${wsp.home}/index"/>
<param name="supportHighlighting" value="true"/>
<param name="tikaConfigPath" value="${rep.home}/../../config/tika-config.xml"/>
<param name="useCompoundFile" value="true"/>
<param name="minMergeDocs" value="100"/>
<param name="volatileIdleTime" value="3"/>
<param name="maxMergeDocs" value="2147483647"/>
<param name="mergeFactor" value="10"/>
<param name="maxFieldLength" value="10000"/>
<param name="bufferSize" value="10"/>
<param name="cacheSize" value="1000"/>
<param name="forceConsistencyCheck" value="false"/>
<param name="enableConsistencyCheck" value="false"/>
<param name="autoRepair" value="true"/>
<param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
<param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl"/>
<param name="respectDocumentOrder" value="true"/>
<param name="resultFetchSize" value="2147483647"/>
<param name="extractorPoolSize" value="0"/>
<param name="extractorTimeout" value="100"/>
<param name="extractorBackLogSize" value="100"/>
<param name="excerptProviderClass" value="org.apache.jackrabbit.core.query.lucene.DefaultHTMLExcerpt"/>
</SearchIndex>
</Workspace>
<!--
Configures the versioning
-->
<Versioning rootPath="${rep.home}/version">
<!--
Configures the filesystem to use for versioning for the respective
persistence manager
-->
<FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
<param name="dataSourceName" value="ds1"/>
<param name="schemaObjectPrefix" value="fs_ver_" />
</FileSystem>
<!--
Configures the persistence manager to be used for persisting version state.
Please note that the current versioning implementation is based on
a 'normal' persistence manager, but this could change in future
implementations.
-->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.OraclePersistenceManager">
<param name="dataSourceName" value="ds1"/>
<param name="schemaObjectPrefix" value="pm_ver_" />
</PersistenceManager>
</Versioning>
<!--
Search index for content that is shared repository wide
(/jcr:system tree, contains mainly versions)
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
<param name="path" value="${rep.home}/repository/index"/>
<param name="supportHighlighting" value="true"/>
</SearchIndex>
-->
<!--
Run with a cluster journal
-->
<Cluster id="node1">
<Journal class="org.apache.jackrabbit.core.journal.MemoryJournal"/>
</Cluster>
<RepositoryLockMechanism class="org.apache.jackrabbit.core.util.CooperativeFileLock"/>
</Repository>
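A relative tikaConfigPath such as the one in the SearchIndex section above is easy to get wrong, so here is a small sketch for checking where it ends up. It assumes the value is treated as an ordinary file path once ${rep.home} has been substituted; that is an assumption about Jackrabbit's behaviour, not something we have confirmed. The class name TikaConfigPathCheck and the repository home passed on the command line are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, for illustration only; not part of Jackrabbit. */
final class TikaConfigPathCheck {

    /** Substitutes ${rep.home} and collapses the ".." segments. */
    static Path resolve(String repHome, String configuredPath) {
        String substituted = configuredPath.replace("${rep.home}", repHome);
        return Paths.get(substituted).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // args[0] is the repository home directory (assumed; defaults to ".")
        String repHome = args.length > 0 ? args[0] : ".";
        Path resolved = resolve(repHome, "${rep.home}/../../config/tika-config.xml");
        System.out.println(resolved
                + (Files.isReadable(resolved) ? " is readable" : " is NOT readable"));
    }
}
```

Running it with the actual repository home as the argument shows whether the externalized file is where the configuration expects it.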
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<properties>
<detectors>
<detector class="org.apache.tika.detect.DefaultDetector"/>
</detectors>
<parsers>
<parser name="parse-html" class="org.apache.tika.parser.html.HtmlParser">
<mime>text/html</mime>
<mime>application/xhtml+xml</mime>
<mime>application/x-asp</mime>
</parser>
<parser class="org.apache.tika.parser.DefaultParser"/>
<parser class="org.apache.tika.parser.EmptyParser">
<!-- Disable package extraction as it's too resource-intensive -->
<mime>application/x-archive</mime>
<mime>application/x-bzip</mime>
<mime>application/x-bzip2</mime>
<mime>application/x-cpio</mime>
<mime>application/x-gtar</mime>
<mime>application/x-gzip</mime>
<mime>application/x-tar</mime>
<mime>application/zip</mime>
<!-- Disable image extraction as there's no text to be found -->
<mime>image/bmp</mime>
<mime>image/gif</mime>
<mime>image/jpeg</mime>
<mime>image/png</mime>
<mime>image/vnd.wap.wbmp</mime>
<mime>image/x-icon</mime>
<mime>image/x-psd</mime>
<mime>image/x-xcf</mime>
</parser>
</parsers>
</properties>
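To double-check which parser class the file above assigns to each MIME type, a sketch along these lines could be used. It reads the XML with the JDK's DOM parser only and does not involve Tika itself, so it shows what the file declares rather than what Tika actually loads. The class name TikaConfigInspector is made up for illustration.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper, for illustration only; not part of Tika or Jackrabbit. */
final class TikaConfigInspector {

    /** Maps each declared parser class to the MIME types listed under it. */
    static Map<String, List<String>> parserMimeTypes(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(in);
            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList parsers = doc.getElementsByTagName("parser");
            for (int i = 0; i < parsers.getLength(); i++) {
                Element parser = (Element) parsers.item(i);
                List<String> mimes = result.computeIfAbsent(
                        parser.getAttribute("class"), k -> new ArrayList<>());
                NodeList mimeNodes = parser.getElementsByTagName("mime");
                for (int j = 0; j < mimeNodes.getLength(); j++) {
                    mimes.add(mimeNodes.item(j).getTextContent().trim());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read Tika config", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to tika-config.xml
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            parserMimeTypes(in).forEach((cls, mimes) ->
                    System.out.println(cls + " -> " + mimes));
        }
    }
}
```

A parser listed with no MIME types (such as DefaultParser above) appears with an empty list.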