about PatternCaptureGroupFilterFactory. This isn't going to help. The
stored data you get back is a verbatim copy of what was sent, taken
_before_ any analysis, so the PatternCaptureGroupFilterFactory is never
applied to it. You could strip the metadata at index time in a
ScriptUpdateProcessorFactory (StatelessScriptUpdateProcessorFactory in
Solr 4.x). Or, just don't worry about it and have the real app deal with
it.
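
As a rough illustration of the "have the app deal with it" option, here
is a minimal Python sketch that strips the leading metadata lines from
the stored value on the client side. The key list is simply taken from
the sample in this thread and the function name is made up, so treat
both as assumptions and adjust them for your data.

```python
# Assumption: these are the metadata keys seen in the sample content from
# this thread; other extraction setups may emit different ones.
METADATA_KEYS = (
    "stream_source_info",
    "stream_content_type",
    "stream_size",
    "stream_name",
    "Content-Type",
    "resourceName",
)


def strip_extraction_metadata(content: str) -> str:
    """Drop leading blank lines and 'key value' metadata lines.

    Hypothetical helper: once the first real body line is seen, every
    later line is kept, even if it happens to start with a metadata key.
    """
    kept = []
    in_header = True
    for line in content.split("\n"):
        stripped = line.strip()
        if in_header:
            # Skip blank lines and lines whose first token is a known key.
            if not stripped or stripped.split(None, 1)[0] in METADATA_KEYS:
                continue
            in_header = False
        kept.append(stripped)
    return "\n".join(kept).strip()
```

Calling strip_extraction_metadata() on the content value returned by the
query should leave only the document text. It does not change what is
stored in Solr; for that the cleanup has to happen before indexing.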

I don't know the Tika settings in detail; that's largely a guess.

Best,
Erick

On Thu, Nov 24, 2016 at 8:43 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> Hi Erick,
>
> 1) I am looking at the stored data via the Solr Admin UI. I send the
> query and check what is in the content field.
>
> 2) I can debug the Tika settings if you think that having such metadata
> fields combined into the content field is not the desired behaviour.
>
> *PS:* Is there any way to get rid of it other than
> using PatternCaptureGroupFilterFactory?
>
> Kind Regards,
> Furkan KAMACI
>
> On Thu, Nov 24, 2016 at 6:31 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> 1> I'm assuming when you "see" this data you're looking at the stored
>> data, right? It's a verbatim copy of whatever you sent to the field.
>> I'm guessing it's a character-encoding mismatch between the source and
>> what you use to display.
>>
>> 2> How are you extracting this data? There are Tika options I think
>> that can/do mush fields together.
>>
>> Best,
>> Erick
>>
>>
>>
>> On Thu, Nov 24, 2016 at 7:54 AM, Furkan KAMACI <furkankam...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I'm testing Solr 4.9.1 and have indexed documents with it. The content
>> > field in the schema has the text_general field type, which is unmodified
>> > from the original. I do not copy any fields to content. When I check the
>> > data, I see content values like:
>> >
>> >  " \n \nstream_source_info MARLON BRANDO.rtf   \nstream_content_type
>> > application/rtf   \nstream_size 13580   \nstream_name MARLON BRANDO.rtf
>> > \nContent-Type application/rtf   \nresourceName MARLON BRANDO.rtf   \n
>> > \n
>> > \n  1. Vivien Leigh and Marlon Brando in \"A Streetcar Named Desire\"
>> > directed by Elia Kazan \n"
>> >
>> > My questions:
>> >
>> > 1) Is it usual to have those newline characters?
>> > 2) Is it usual to have file metadata at the beginning of the content
>> > (e.g. stream_source_info, stream_content_type), or is it related to the
>> > tool that I use to post data to Solr?
>> >
>> > Kind Regards,
>> > Furkan KAMACI
>>
