My input is:
{| style="text-align: left; width: 50%; table-layout: fixed;" border="0" |}
Analysis is as follows:
WT
text   raw_bytes         start  end  type  flags  position
style  [73 74 79 6c 65]  3      8          0      1
text   [74 65 78 74]     10     14         0      2
align  [61 6c 69 67 6e]  15     20         0      3
left   [6c 65 66 74]     22     26         0      4
width  [77 69 64 74 68]  28     33         0      5
50     [35
I compared the results with WikipediaTokenizer as the index-time
analyzer, but there is no difference?
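
For comparison, here is a minimal sketch of what a plain alphanumeric
splitter would report for the same input. It is not WikipediaTokenizer and
does not use Lucene at all; the function name alnum_tokens and the choice of
ASCII letters and digits as token characters are assumptions made only for
illustration.

```python
import re


def alnum_tokens(text):
    """Split text into runs of ASCII letters/digits and report offsets.

    Returns tuples of (token, raw_bytes, start, end, position), loosely
    mirroring the columns of an analysis listing. Hypothetical helper,
    not part of Lucene or Solr.
    """
    rows = []
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text), start=1):
        token = match.group()
        # Hex dump of the UTF-8 bytes, e.g. "73 74 79 6c 65" for "style".
        raw_bytes = " ".join(f"{b:02x}" for b in token.encode("utf-8"))
        rows.append((token, raw_bytes, match.start(), match.end(), position))
    return rows


sample = '{| style="text-align: left; width: 50%; table-layout: fixed;" border="0" |}'
for row in alnum_tokens(sample):
    print(row)
```

If the first rows printed here carry the same text, bytes, and offsets as the
analysis listing, that would suggest the table markup is being split like
ordinary text rather than interpreted as wiki syntax.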
2014-02-23 3:44 GMT+02:00 Ahmet Arslan:
Hi Furkan,
There is org.apache.lucene.analysis.wikipedia.WikipediaTokenizer
Ahmet
On Sunday, February 23, 2014 2:22 AM, Furkan KAMACI
wrote:
Hi;
I want to run an NLP algorithm on Wikipedia data. I used the
DataImportHandler to import the dump, and everything is OK. However, some
of the text looks like this:
== Altyapı bilgileri == Köyde, [[ilköğretim]] okulu yoktur fakat taşımalı
eğitimden yararlanılmaktadır.
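
As a rough illustration of removing this kind of markup before running NLP,
here is a minimal regex sketch. It assumes only "== ... ==" headings and
"[[...]]" links need handling; the function name strip_wiki_markup is
hypothetical, and this is a plain text-substitution approach, not what
WikipediaTokenizer does.

```python
import re


def strip_wiki_markup(text):
    """Remove heading markers and link brackets from wiki text.

    Hypothetical helper covering only two constructs:
    [[target]] / [[target|label]] links and == Heading == markers.
    """
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # == Heading == -> Heading (also handles === and deeper levels)
    text = re.sub(r"={2,}\s*(.*?)\s*={2,}", r"\1", text)
    return text


sample = "== Altyapı bilgileri == Köyde, [[ilköğretim]] okulu yoktur"
print(strip_wiki_markup(sample))
```

Templates, tables, and nested links are not covered here; a real dump would
likely need a proper wiki parser.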
I think it should look like this: