You need the TikaEntityProcessor to extract the text from the PDFs stored in the image column. As configured, you are putting the raw binary blobs into the index, so there is nothing searchable in them. Tika extracts the text from the file.
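
A minimal sketch of what the data-config could look like, assuming Solr 3.1 with the DataImportHandler extras and Tika jars on the classpath. The data source names `db` and `fieldReader`, the nested entity name `tika`, and the SQL alias `pdf_blob` are made up for illustration. FieldStreamDataSource hands the blob column of the parent row to Tika, and TikaEntityProcessor returns the extracted body in a column named `text`, which is mapped here onto your `attachment` field. This is untested against your setup.

```xml
<dataConfig>
  <!-- JDBC source, same settings as in your config; "db" is an illustrative name -->
  <dataSource name="db" type="JdbcDataSource"
      driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
      url="jdbc:sqlserver://localhost:1433;databaseName=master"
      user="user"
      password="pw"/>
  <!-- Streams a binary column of the parent row to the nested entity -->
  <dataSource name="fieldReader" type="FieldStreamDataSource"/>
  <document>
    <!-- "pdf_blob" is a hypothetical alias so the raw bytes do not land in the attachment field -->
    <entity name="doc" dataSource="db"
        query="select id, title, attachment as pdf_blob from attachment">
      <!-- dataField is parentEntityName.columnName -->
      <entity name="tika" processor="TikaEntityProcessor"
          dataSource="fieldReader" dataField="doc.pdf_blob" format="text">
        <!-- Tika's extracted body text goes into the searchable field -->
        <field column="text" name="attachment"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The alias is there because DIH copies any column whose name matches a schema field straight into that field, which is probably how the blobs ended up in the index in the first place.
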
TikaEP is not in Solr 1.4, but it is in the new Solr 3.1 release.

On Thu, Apr 7, 2011 at 7:14 PM, Roy Liu <liuchua...@gmail.com> wrote:
> Hi,
>
> I have a table named *attachment *in MS SQL Server 2008.
>
> COLUMN         TYPE
> -------------  ----------------
> id             int
> title          varchar(200)
> attachment     image
>
> I need to index the attachment(store pdf files) column from database via
> DIH.
>
> After access this URL, it returns "Indexing completed. Added/Updated: 5
> documents. Deleted 0 documents."
> http://localhost:8080/solr/dataimport?command=full-import
>
> However, I can not search anything.
>
> Anyone can help me ?
>
> Thanks.
>
>
> --------------------
> *data-config-sql.xml*
> <dataConfig>
>   <dataSource type="JdbcDataSource"
>       driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>       url="jdbc:sqlserver://localhost:1433;databaseName=master"
>       user="user"
>       password="pw"/>
>   <document>
>     <entity name="doc"
>         query="select id,title,attachment from attachment">
>     </entity>
>   </document>
> </dataConfig>
>
> *schema.xml*
> <field name="attachment" type="text" indexed="true" stored="true"/>
>
>
>
> Best Regards,
> Roy Liu
>

--
Lance Norskog
goks...@gmail.com