So after some investigation I'm at a loss as to which class to use for text extraction (i.e. what to set textFilterClasses to in the workspace.xml file). Which class is the default in 2.4.2? I think the wiki is incorrect: it states org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter as the default, but I don't see that class in the source code.
Possible candidates are:

  org.apache.jackrabbit.core.query.lucene.SearchIndex (regular search indexer)
  org.apache.jackrabbit.core.query.lucene.BlockingParser
  org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField

Any suggestions? I'll plug in the last two and see if things improve.

Thanks,
Carl Furst

On 7/11/12 1:36 PM, "Furst, Carl" <[email protected]> wrote:

>2.4.2 - Thanks for the references. I'll check out Tika and try a test.
>
>Thanks,
>Carl Furst
>
>On 7/3/12 5:19 AM, "Alex Parvulescu" <[email protected]> wrote:
>
>>Hi Carl,
>>
>>What version of Jackrabbit are you on?
>>
>>Next, are you sure you have the Tika extractors in the classpath? Maybe
>>you are seeing something along the lines of [0].
>>
>>I would try to isolate the problem by taking Tomcat out of the setup.
>>Build a simple test, see how it works, then deploy on Tomcat and verify.
>>A good place to start is the unit test collection available in
>>jackrabbit-core [1].
>>
>>best,
>>alex
>>
>>[0] https://issues.apache.org/jira/browse/JCR-3287
>>[1] http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/FulltextSQL2QueryTest.java?view=markup
>>
>>On Wed, Jun 27, 2012 at 8:06 PM, Furst, Carl <[email protected]> wrote:
>>
>>> So given the below I tried 'inclu*' and 'include*' and still got no
>>> results, so I'm going to start looking into some of these possible
>>> reasons why:
>>>
>>> https://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
>>>
>>> Of course it could just be that the parser is not parsing the '*'.
>>>
>>> Thanks again,
>>>
>>> Carl Furst
>>>
>>> On 6/27/12 1:59 PM, "Furst, Carl" <[email protected]> wrote:
>>>
>>> >Thanks Torsten,
>>> >
>>> >So even using JQOM would not help here. I'll read up more on Lucene
>>> >and find out more. My main stumbling block here was where the query
>>> >was being executed: was it on the Derby level or the Lucene level?
>>> >
>>> >This has cleared that part of it up for me as well.
>>> >
>>> >Thanks again,
>>> >
>>> >Carl Furst
>>> >
>>> >On 6/27/12 1:50 PM, "Torsten Stolpmann" <[email protected]> wrote:
>>> >
>>> >>Hi Carl,
>>> >>
>>> >>By default the underlying Lucene implementation does not match
>>> >>leading wildcards, for performance reasons. See also:
>>> >>
>>> >>https://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_Lucene.3F
>>> >>
>>> >>So just matching '*' will not work, but e.g. 'i*' might give you the
>>> >>results you were looking for.
>>> >>
>>> >>Sadly enough I did not find any reference to this in the Jackrabbit
>>> >>documentation.
>>> >>
>>> >>Took me quite a while to find that too.
>>> >>
>>> >>Hope this helps,
>>> >>
>>> >>Torsten
>>> >>
>>> >>On 27.06.2012 17:19, Furst, Carl wrote:
>>> >>> I'm probably missing something here but everything I've read so far
>>> >>> leads me to believe this should work.
>>> >>>
>>> >>> I have nodes in a repository of type nt:folder and nt:file. nt:file
>>> >>> has a child node jcr:content of type nt:resource, which has a child
>>> >>> property called jcr:data.
>>> >>>
>>> >>> There are many cases where the jcr:data column has the word
>>> >>> 'include' in it. They are JSP files so, yes, I know this word exists
>>> >>> in several files.
>>> >>>
>>> >>> So here's the SQL I use:
>>> >>>
>>> >>> select * from [nt:resource] where contains([jcr:data], 'include');
>>> >>>
>>> >>> Here's the SQL that is returned from q.getStatement():
>>> >>>
>>> >>> SELECT [nt:resource].* FROM [nt:resource] WHERE
>>> >>> CONTAINS([nt:resource].[jcr:data], 'include');
>>> >>>
>>> >>> Here is a sample text in jcr:data to search on.
>>> >>>
>>> >>> <%@ include file="..."
>>> >>>
>>> >>> ... More jsp here..
>>> >>> <%/jsp:include...
>>> >>>
>>> >>> Yet it doesn't find it. I feel I'm missing something. Do I need to
>>> >>> add a "searchable" mixin or something?
>>> >>>
>>> >>> Any ideas why this is not being found?
>>> >>>
>>> >>> It used to be that the CND file for Jackrabbit node types was
>>> >>> readily available from Apache. Does anyone know where I can find the
>>> >>> CND file for Jackrabbit node types?
>>> >>>
>>> >>> jcr:content is unstructured, but I explicitly make the type
>>> >>> nt:resource (otherwise the statement would not be parsed; the Query
>>> >>> object would throw an error like "table not found," right? Because
>>> >>> the type is a table). So the type is right. The field is right. The
>>> >>> search is not working.
>>> >>>
>>> >>> I'm using Jackrabbit without any special configuration. Just the war
>>> >>> in a simple Tomcat deployment. So it's sitting on top of Derby and
>>> >>> Lucene.
>>> >>>
>>> >>> Any help would be appreciated.
>>> >>>
>>> >>> Thanks,
>>> >>>
>>> >>> Carl Furst
>>> >>>
>>> >>> **********************************************************
>>> >>>
>>> >>> MLB.com: Where Baseball is Always On
