Hi, I'm having some problems with full-text search in Magnolia 5.2.5 EE.
It works properly for Word files (.doc, .docx), but I can't search the contents
of PDF files.
Here is my test case.
I have uploaded a DOC file and a PDF file, both containing the keyword "worldcup".
Using the following FTL expression:
[#assign results_dam = cmsfn.simpleSearch("dam", "worldcup", "mgnl:asset", "/") /]
only the .doc file is returned.
I have also tried running the following query directly:
select * from [nt:base] as t where ISDESCENDANTNODE([/]) AND contains(t.*, 'worldcup')
but still the PDF file is not returned.
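For anyone who wants to reproduce this with a different keyword or node type, here is a minimal sketch of how the same JCR-SQL2 statement could be assembled in Java. The class and method names (FullTextQuery, build) are hypothetical and not part of Magnolia or Jackrabbit; the sketch only builds the string and assumes single quotes in the keyword are escaped by doubling them.

```java
public class FullTextQuery {

    // Hypothetical helper: builds the same JCR-SQL2 full-text statement as the
    // query quoted above, for a given node type, root path and keyword.
    static String build(String nodeType, String path, String term) {
        // Assumption: a single quote inside a JCR-SQL2 string literal is
        // escaped by doubling it.
        String escaped = term.replace("'", "''");
        return "select * from [" + nodeType + "] as t"
                + " where ISDESCENDANTNODE([" + path + "])"
                + " AND contains(t.*, '" + escaped + "')";
    }

    public static void main(String[] args) {
        // Prints the statement used in the test case above.
        System.out.println(build("nt:base", "/", "worldcup"));
    }
}
```

The resulting string would then be passed to the JCR QueryManager of a session on the "dam" workspace, using Query.JCR_SQL2 as the query language.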
What could be the reason? Is there some required configuration that is not
included in a standard installation?
I have tried modifying the Jackrabbit configuration file (in my local dev
environment it is jackrabbit-bundle-derby-search.xml) by adding the following
configuration to <SearchIndex>:
[code]
...
<param name="textFilterClasses"
value="org.apache.jackrabbit.extractor.PlainTextExtractor,
org.apache.jackrabbit.extractor.MsWordTextExtractor,
org.apache.jackrabbit.extractor.MsExcelTextExtractor,
org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,
org.apache.jackrabbit.extractor.PdfTextExtractor,
org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,
org.apache.jackrabbit.extractor.RTFTextExtractor,
org.apache.jackrabbit.extractor.HTMLTextExtractor,
org.apache.jackrabbit.extractor.XMLTextExtractor"/>
...
[/code]
but I found the following warning in the log file during Magnolia startup:
[i]WARN rg.apache.jackrabbit.core.query.lucene.SearchIndex:
The textFilterClasses configuration parameter has been deprecated, and the
configured value will be ignored:
org.apache.jackrabbit.extractor.PlainTextExtractor,
org.apache.jackrabbit.extractor.MsWordTextExtractor,org.apache.jackrabbit.extractor.MsExcelTextExtractor,org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,org.apache.jackrabbit.extractor.PdfTextExtractor,org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,org.apache.jackrabbit.extractor.RTFTextExtractor,org.apache.jackrabbit.extractor.HTMLTextExtractor,org.apache.jackrabbit.extractor.XMLTextExtractor[/i]
Thanks,
Pietro