Hello there, I am using DIH to import data from a MySQL database and a
directory. For this purpose I have written my own Transformer class to
modify imported values in several cases. Now we need to add document
support to our indexing server, which led us to use Tika to import the
documents' content. My index server contains data for the following
objects:
 
* Bookmarks

* Courses

* Files (here I need to use Tika)


All of the elements above share some common properties, such as Id, Title,
Description and Text. All of the needed data is stored in the database, which
is why we decided to use a single DIH mechanism to import all of these
elements into the Solr index. Of course, in the case of the files I also need
to read their content.

So I have written something similar to the following code to handle the
documents' content:


            // Each file is first downloaded over FTP to a local temp file.
            FTPClient ftpClient = new FTPClient();
            ftpClient.connect("FTPServer");
            ftpClient.login("uname", "pass");
            File localFile = new File("/tmp/" + fileName);
            ftpClient.download("/repos/files/original/" + fileName, localFile);

            // Parse the downloaded file with Tika.
            InputStream input = new FileInputStream(localFile);
            ContentHandler textHandler = new BodyContentHandler(-1); // -1 disables the write limit
            Metadata metadata = new Metadata();

            AutoDetectParser parser = new AutoDetectParser();
            try {
                parser.parse(input, textHandler, metadata);
            } catch (IOException ex) {
                Logger.getLogger("SCX.Indexing.Main").log(Level.SEVERE, null, ex);
            } catch (SAXException ex) {
                Logger.getLogger("SCX.Indexing.Main").log(Level.SEVERE, null, ex);
            } catch (TikaException ex) {
                Logger.getLogger("SCX.Indexing.Main").log(Level.SEVERE, null, ex);
            } finally {
                input.close();
            }

            // Copy the extracted body text and title into the DIH row.
            row.put("text", textHandler.toString());
            row.put("title", metadata.get("title"));


This code lives in the transformRow method that my class overrides.
The problem is that when I run the same code from a main class it executes
normally, but when I move it into the transformRow method,
textHandler.toString() returns no text and the metadata is empty as well.
Also, no exception is thrown!
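
To make the setup concrete, here is a minimal sketch of the shape of the
transformer step, with an added warning when extraction comes back empty so
that a silent failure at least shows up in the log. It is not the actual
class: RowTransformSketch, the TextExtractor interface and the "fileName"
column are hypothetical stand-ins, used only so that the sample compiles
without Solr or Tika on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

class RowTransformSketch {

    /** Hypothetical stand-in for the Tika call; not part of Solr or Tika. */
    interface TextExtractor {
        String extract(String fileName) throws Exception;
    }

    private static final Logger LOG = Logger.getLogger("SCX.Indexing.Main");

    /** Mirrors the shape of transformRow: mutate the row map and return it. */
    static Map<String, Object> transformRow(Map<String, Object> row,
                                            TextExtractor extractor) {
        String fileName = (String) row.get("fileName"); // assumed column name
        if (fileName == null) {
            return row; // not a file row (bookmark or course): leave untouched
        }
        try {
            String text = extractor.extract(fileName);
            if (text == null || text.trim().isEmpty()) {
                // Make the "no text, no exception" case visible in the log.
                LOG.log(Level.WARNING, "Empty extraction result for {0}", fileName);
            }
            row.put("text", text == null ? "" : text);
        } catch (Exception ex) {
            LOG.log(Level.SEVERE, "Extraction failed for " + fileName, ex);
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("fileName", "example.pdf");
        // The lambda plays the role of the FTP download plus Tika parse.
        transformRow(row, name -> "content of " + name);
        System.out.println(row.get("text"));
    }
}
```

Passing the extractor in as a parameter is only a convenience for the sketch:
it lets the same row logic run from a main method and from inside the
transformer, which is the comparison described above.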

Has anyone faced something similar in the past?

Thanks a lot

