about PatternCaptureGroupFilterFactory. This isn't going to help. The stored data you get back is what the field held _before_ any analysis, so PatternCaptureGroupFilterFactory is never applied to it. You could do this in a ScriptUpdateProcessorFactory, which runs before the document is stored. Or just don't worry about it and have the real app deal with it.
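For the "have the real app deal with it" route, a minimal client-side sketch in Python might look like the following. It assumes the metadata always shows up as leading "key value" lines and that the key list, copied from the sample in this thread, is complete for your documents. The names strip_extract_metadata and METADATA_KEYS are made up for illustration and are not part of Solr or Tika.

```python
# Metadata keys seen in the sample content from this thread. This list is
# an assumption and may need extending for other document types.
METADATA_KEYS = (
    "stream_source_info",
    "stream_content_type",
    "stream_size",
    "stream_name",
    "Content-Type",
    "resourceName",
)


def strip_extract_metadata(content: str) -> str:
    """Drop leading 'key value' metadata lines and surrounding blank lines."""
    lines = content.split("\n")
    start = 0
    # Only skip blank lines and known metadata lines at the top, so body
    # text that mentions one of these keys later on is left alone.
    while start < len(lines):
        stripped = lines[start].strip()
        if stripped == "" or stripped.split(" ", 1)[0] in METADATA_KEYS:
            start += 1
        else:
            break
    body = [line.strip() for line in lines[start:]]
    return "\n".join(body).strip()


# The stored value quoted in the original question.
sample = (
    " \n \nstream_source_info MARLON BRANDO.rtf \nstream_content_type "
    "application/rtf \nstream_size 13580 \nstream_name MARLON BRANDO.rtf "
    "\nContent-Type application/rtf \nresourceName MARLON BRANDO.rtf \n \n "
    '\n 1. Vivien Leigh and Marlon Brando in "A Streetcar Named Desire" '
    "directed by Elia Kazan \n"
)
print(strip_extract_metadata(sample))
```

This only changes what the application displays; the stored value in the index stays as it was sent.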
I don't particularly know about the Tika settings; that's largely a guess.

Best,
Erick

On Thu, Nov 24, 2016 at 8:43 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> Hi Erick,
>
> 1) I am looking at the stored data via the Solr Admin UI. I send the query
> and check what is in the content field.
>
> 2) I can debug the Tika settings if you think that having such metadata
> fields combined into the content field is not the desired behaviour.
>
> *PS:* Is there any solution to get rid of it except for
> using PatternCaptureGroupFilterFactory?
>
> Kind Regards,
> Furkan KAMACI
>
> On Thu, Nov 24, 2016 at 6:31 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> 1> I'm assuming when you "see" this data you're looking at the stored
>> data, right? It's a verbatim copy of whatever you sent to the field.
>> I'm guessing it's a character-encoding mismatch between the source and
>> what you use to display.
>>
>> 2> How are you extracting this data? There are Tika options I think
>> that can/do mush fields together.
>>
>> Best,
>> Erick
>>
>> On Thu, Nov 24, 2016 at 7:54 AM, Furkan KAMACI <furkankam...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I'm testing Solr 4.9.1 and I've indexed documents with it. The content
>> > field in the schema has the text_general field type, which is not
>> > modified from the original. I do not copy any fields to content. When I
>> > check the data I see content values like:
>> >
>> > " \n \nstream_source_info MARLON BRANDO.rtf \nstream_content_type
>> > application/rtf \nstream_size 13580 \nstream_name MARLON BRANDO.rtf
>> > \nContent-Type application/rtf \nresourceName MARLON BRANDO.rtf \n \n
>> > \n 1. Vivien Leigh and Marlon Brando in \"A Streetcar Named Desire\"
>> > directed by Elia Kazan \n"
>> >
>> > My questions:
>> >
>> > 1) Is it usual to have those newline characters?
>> > 2) Is it usual to have file metadata at the beginning of the content
>> > (i.e. stream source, stream_content_type), or is it related to the tool
>> > that I use to post data to Solr?
>> >
>> > Kind Regards,
>> > Furkan KAMACI