Re: Unicode Character Problem

Ahmet Arslan Sat, 10 Dec 2016 08:25:07 -0800

Hi Furkan,

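I am pretty sure this is a PDF extraction issue.
Turkish characters have caused us trouble in the past when extracting text from
PDF files.
You can confirm this by manually copying and pasting the text from the original PDF file.
If the pasted text contains the same broken character, the problem lies in the PDF's
text layer rather than in Solr.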
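A quick way to check whether the broken character comes from extraction rather than from Solr is to scan the text Tika extracted (before it reaches the text_general analyzer) for U+FFFD, the replacement character decoders emit when they cannot map something. This is a minimal Python sketch under that assumption: `suspicious_chars` is a hypothetical helper name, and how you obtain the extracted text (for example by running Tika on the PDF by itself) is left to you.

```python
import unicodedata


def suspicious_chars(text):
    """Hypothetical helper: return (index, character, Unicode name) for characters
    that often signal a broken PDF text extraction."""
    hits = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        # U+FFFD is the replacement character emitted when input could not be mapped.
        # Cc/Cf/Co/Cn are control, format, private-use and unassigned code points;
        # ordinary line breaks and tabs are ignored.
        if ch == "\ufffd" or (cat in ("Cc", "Cf", "Co", "Cn") and ch not in "\n\r\t"):
            hits.append((i, ch, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


# The two forms from the report: the broken one should be flagged, the clean one not.
print(suspicious_chars("aç \ufffdklama"))
print(suspicious_chars("açıklama"))
```

If the extracted text is already flagged before indexing, Solr's analysis chain is not the cause.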


Ahmet


On Friday, December 9, 2016 8:44 PM, Furkan KAMACI <furkankam...@gmail.com> 
wrote:
Hi,

I'm trying to index Turkish characters. This is what I see in my index (I
see both forms in different places in my content):

aç �klama
açıklama

These are the same word, but they are indexed differently (the first form always
contains the same weird character). When I check the original PDF file, there is
no weird character.
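Listing the code points of the two forms shows exactly which letter was lost. This is a minimal Python sketch; `describe` is a hypothetical helper, and the broken string is typed in from the report, so the space and the U+FFFD in it are assumptions about what is actually stored in the index.

```python
import unicodedata


def describe(word):
    """Hypothetical helper: list each code point of a word with its Unicode name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in word]


good = "açıklama"
bad = "aç \ufffdklama"  # assumed stored form: space + U+FFFD where the report shows "�"

for label, word in (("original", good), ("broken", bad)):
    print(label)
    for code, name in describe(word):
        print(f"  {code}  {name}")
```

In the broken form, the dotless ı (U+0131) has been replaced by a space and U+FFFD, while ç (U+00E7) survived. ç is in Latin-1 and ı is not, which may hint at a font or encoding mapping problem in the PDF, though this comparison alone cannot prove it.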

What do you think? Is this related to Solr or to Tika?

PS: I use text_general as the analyzer for the content field.

Kind Regards,
Furkan KAMACI