Re: How to index PDF file stored in SQL Server 2008

Lance Norskog Thu, 07 Apr 2011 19:23:28 -0700

You need the TikaEntityProcessor to extract the text from the PDF. As
configured, you are sticking the raw binary blobs into the index, so
there is no text to search on. Tika unpacks the text out of the file.

TikaEP is not in Solr 1.4, but it is in the new Solr 3.1 release.
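
Something along these lines should work as a starting point for the
data-config on 3.1. This is an untested sketch, not a verified setup: the
dataSource names "db" and "fieldStream" and the inner entity name "pdf"
are placeholders I made up, and I am assuming the SQL Server driver hands
the image column back as a byte[] or Blob that FieldStreamDataSource can
read. The inner entity pulls the bytes from the outer row's attachment
column and maps Tika's extracted "text" column onto your existing
attachment field.

```xml
<dataConfig>
  <!-- JDBC source for the rows; the name "db" is a placeholder -->
  <dataSource name="db"
              type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;databaseName=master"
              user="user"
              password="pw"/>
  <!-- Streams the bytes of a column from the parent entity's row;
       the name "fieldStream" is a placeholder -->
  <dataSource name="fieldStream" type="FieldStreamDataSource"/>
  <document>
    <entity name="doc"
            dataSource="db"
            query="select id,title,attachment from attachment">
      <!-- Nested entity: Tika parses the blob in doc.attachment -->
      <entity name="pdf"
              processor="TikaEntityProcessor"
              dataSource="fieldStream"
              dataField="doc.attachment"
              format="text">
        <!-- "text" is the column Tika emits for the document body -->
        <field column="text" name="attachment"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

You will also need the Tika jars (they ship with the extraction contrib)
on the classpath for the processor to load.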

On Thu, Apr 7, 2011 at 7:14 PM, Roy Liu <liuchua...@gmail.com> wrote:
> Hi,
>
> I have a table named *attachment* in MS SQL Server 2008.
>
> COLUMN        TYPE
> ------------  ------------
> id            int
> title         varchar(200)
> attachment    image
>
> I need to index the attachment column (which stores PDF files) from the
> database via DIH.
>
> When I access this URL, it returns "Indexing completed. Added/Updated: 5
> documents. Deleted 0 documents."
> http://localhost:8080/solr/dataimport?command=full-import
>
> However, my searches do not return anything.
>
> Can anyone help me?
>
> Thanks.
>
>
> --------------------
> *data-config-sql.xml*
> <dataConfig>
>  <dataSource type="JdbcDataSource"
>              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>              url="jdbc:sqlserver://localhost:1433;databaseName=master"
>              user="user"
>              password="pw"/>
>  <document>
>    <entity name="doc"
>            query="select id,title,attachment from attachment">
>    </entity>
>  </document>
> </dataConfig>
>
> *schema.xml*
> <field name="attachment" type="text" indexed="true" stored="true"/>
>
>
>
> Best Regards,
> Roy Liu
>



-- 
Lance Norskog
goks...@gmail.com
