Thanks a lot, Jan. That works very well.

I am now trying to index the document in Solr, so I removed the extractOnly parameter, but I can't find a similar option to get the content indexed as plain text. I am getting the metadata as well.

This is my request:
http://localhost:8983/solr/document/update/extract?literal.id=DDOC001&stream.file=C:\TIKA\FileTest\Test.txt&commit=true&fmap.content=DocContentS
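
For reference, here is a minimal Python sketch of how the same request URL could be assembled with the query string URL-encoded, since the Windows path contains a colon and backslashes. It only builds the URL and does not contact Solr. The helper name build_extract_url is hypothetical, and the sketch assumes the first parameter is meant to be literal.id.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, params):
    """Return base_url with params appended as a URL-encoded query string.

    Hypothetical helper for illustration; it is not part of Solr or Tika.
    """
    return base_url + "?" + urlencode(params)


# Parameters taken from the request in this message.
# Assumption: "literal.id" is the intended name of the first parameter.
params = {
    "literal.id": "DDOC001",
    "stream.file": r"C:\TIKA\FileTest\Test.txt",
    "commit": "true",
    "fmap.content": "DocContentS",
}

url = build_extract_url(
    "http://localhost:8983/solr/document/update/extract", params
)
print(url)
```

In the resulting URL the colon and backslashes of the file path are percent-encoded as %3A and %5C, so the path reaches the server as a single parameter value.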
My DocContentS contains "\n \n stream_size 13 \n X-Parsed-By org.apache.tika.parser.DefaultParser \n X-Parsed-By org.apache.tika.parser.txt.TXTParser \n stream_name Test.txt \n stream_source_info file:/C:/TIKA/FileTest/Test.txt \n Content-Encoding ISO-8859-1 \n Content-Type text/plain; charset=ISO-8859-1 \n \n \n Prueba Sergio \n "

I can't find anywhere how to modify this behaviour.

On Wed, 14 Nov 2018 at 13:06, Jan Høydahl <jan....@cominvent.com> wrote:

> Have you tried to specify &extractFormat=text
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 14 Nov 2018, at 12:09, marotosg <marot...@gmail.com> wrote:
> >
> > Hi all,
> >
> > Currently I am trying to index documents of different kinds with Solr
> > and Tika. It's working fine, but when Solr returns the content of the
> > document, it doesn't return just the plain text. It comes back with
> > some metadata as well.
> >
> > For instance, my request:
> >
> > http://localhost:8983/solr/document/update/extract?extractOnly=true&stream.file=C:\TIKA\FileTest\Test.txt
> >
> > Content of Test.txt file is just "*Test File*".
> >
> > The response from Solr, as you can see below, returns plenty of
> > information. I would like the answer to be something like this,
> > without noise for the search.
> > <str name="Test.txt">
> > Test File
> > </str>
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">135</int>
> >   </lst>
> >   <str name="Test.txt">
> >     <?xml version="1.0" encoding="UTF-8"?>
> >     <html xmlns="http://www.w3.org/1999/xhtml">
> >     <head>
> >     <meta name="stream_size" content="13"/>
> >     <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
> >     <meta name="X-Parsed-By" content="org.apache.tika.parser.txt.TXTParser"/>
> >     <meta name="stream_name" content="Test.txt"/>
> >     <meta name="stream_source_info" content="file:/C:/TIKA/FileTest/Test.txt"/>
> >     <meta name="Content-Encoding" content="ISO-8859-1"/>
> >     <meta name="Content-Type" content="text/plain; charset=ISO-8859-1"/>
> >     <title></title>
> >     </head>
> >     <body>
> >     <p>Test File</p>
> >     </body>
> >     </html>
> >   </str>
> >   <lst name="Test.txt_metadata">
> >     <arr name="stream_size">
> >       <str>13</str>
> >     </arr>
> >     <arr name="X-Parsed-By">
> >       <str>org.apache.tika.parser.DefaultParser</str>
> >       <str>org.apache.tika.parser.txt.TXTParser</str>
> >     </arr>
> >     <arr name="stream_name">
> >       <str>Test.txt</str>
> >     </arr>
> >     <arr name="stream_source_info">
> >       <str>file:/C:/TIKA/FileTest/Test.txt</str>
> >     </arr>
> >     <arr name="Content-Encoding">
> >       <str>ISO-8859-1</str>
> >     </arr>
> >     <arr name="Content-Type">
> >       <str>text/plain; charset=ISO-8859-1</str>
> >     </arr>
> >   </lst>
> > </response>
> >
> > Can anyone shed some light here?
> > Thanks a lot.
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
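
For the extractOnly=true case quoted above, one client-side workaround is to pull the body text out of the XHTML that Tika produces. The Python sketch below is an illustration under assumptions, not a Solr or Tika feature: the function name body_text is hypothetical, the sample is a shortened copy of the XHTML from the quoted response, and it assumes the XHTML has already been read out of the response as a plain string.

```python
import xml.etree.ElementTree as ET

# Namespace used by the XHTML in the quoted response.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def body_text(xhtml):
    """Return the whitespace-normalised text inside <body>, ignoring <head>.

    Hypothetical helper for illustration only.
    """
    # Parse from bytes so the XML declaration's encoding is honoured.
    root = ET.fromstring(xhtml.strip().encode("utf-8"))
    body = root.find(XHTML_NS + "body")
    if body is None:
        return ""
    # Join all text nodes under <body> and collapse runs of whitespace.
    return " ".join("".join(body.itertext()).split())


# Shortened copy of the XHTML from the quoted response.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="stream_size" content="13"/>
<meta name="stream_name" content="Test.txt"/>
<title></title>
</head>
<body>
<p>Test File</p>
</body>
</html>"""

text = body_text(sample)
print(text)
```

Because the metadata sits in meta attributes inside head, reading only the text under body leaves it out.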