Re: Solr Japanese support

Erick Erickson Sun, 16 Mar 2014 18:19:27 -0700

Tri Dang:

Please follow the instructions here:


https://lucene.apache.org/solr/discussion.html

Best,
Erick

On Sun, Mar 16, 2014 at 6:15 PM, Tri Dang <tritd...@yahoo.com> wrote:
> Please unsubscribe me.
>
> On Sunday, March 16, 2014 9:10 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>
> Which version of Solr are you on? For Solr 4, the endpoint should be
> /update, and the Content-Type should be set correctly for CSV. See:
> http://wiki.apache.org/solr/UpdateCSV
>
> I would expect the problem NOT to be Japanese, but something else.
> For example, you could try to index Japanese into the example
> collection that comes with Solr. That way, all the other variables are
> correct. Then add another field + fieldType and see if it still
> works.
>
> Regards,
>    Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
> On Sat, Mar 15, 2014 at 11:50 PM, Bala Iyer <grb...@yahoo.com> wrote:
>> Hi,
>>
>> I am new to Solr's Japanese support.
>> I added Japanese support to schema.xml.
>> How can I insert Japanese text into that field, either with a Solr client
>> (Java / PHP / Ruby) or with curl?
>>
>>
>> schema.xml
>> ====================================
>>     <field name="username" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
>>     <field name="timestamp" type="date" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
>>     <field name="jtxt" type="text_ja" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
>>
>>     <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
>>       <analyzer>
>>         <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
>>
>>         <!--<tokenizer class="solr.JapaneseTokenizerFactory" mode="search" userDictionary="lang/userdict_ja.txt"/>-->
>>         <!-- Reduces inflected verbs and adjectives to their base/dictionary forms (辞書形) -->
>>         <filter class="solr.JapaneseBaseFormFilterFactory"/>
>>         <!-- Removes tokens with certain part-of-speech tags -->
>>         <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" />
>>         <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->
>>         <filter class="solr.CJKWidthFilterFactory"/>
>>         <!-- Removes common tokens typically not useful for search, but have a negative effect on ranking -->
>>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" />
>>         <!-- Normalizes common katakana spelling variations by removing any last long sound character (U+30FC) -->
>>         <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
>>         <!-- Lower-cases romaji characters -->
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>       </analyzer>
>>     </fieldType>
>> ====================================
>>
>> My insert.csv file:
>>
>> "id","username","timestamp","content","jtxt"
>> "999999999","xxxxx","2013-12-26T10:14:26Z","Hello ","マイ ドキュメント"
>> =========================
>> I am trying to insert it with curl, but it gives me an error:
>> curl "http://localhost:8983/solr/collection1/update/csv?separator=,&commit=true" -H "Content-Type: text/plain; charset=utf-8" --data-binary @insert.csv
>>
>>
>> ERROR
>> ----------------------------
>> <?xml version="1.0" encoding="UTF-8"?>
>> <response>
>> <lst name="responseHeader"><int name="status">400</int><int name="QTime">23</int></lst><lst name="error"><str name="msg">Document is missing mandatory uniqueKey field: id</str><int name="code">400</int></lst>
>> </response>
>>
>> I know I should not use text/plain as the Content-Type.
>> =========================
>>
>>
>> Thanks
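Following Alex's suggestion, here is a minimal shell sketch of the same upload sent to /update with a CSV Content-Type instead of text/plain. It assumes a Solr 4.x instance on localhost:8983 with the collection1 core from the original post. application/csv is the CSV content type used in the Solr 4 CSV indexing documentation, and the helper names write_sample_csv, strip_bom and post_csv are hypothetical. The BOM check is only a guess at one possible cause of the "missing mandatory uniqueKey field: id" error, not a confirmed diagnosis: if an editor saved insert.csv with a UTF-8 byte order mark, the first header may not be read as plain "id".

```shell
#!/bin/sh
# Sketch only: assumes Solr 4.x with the "collection1" core on localhost:8983.
# write_sample_csv, strip_bom and post_csv are hypothetical helper names.
set -eu

CSV_FILE=insert.csv
SOLR_UPDATE_URL="http://localhost:8983/solr/collection1/update?commit=true"

# Write the sample document from the post. printf emits the bytes of this
# script unchanged (UTF-8 if the script is saved as UTF-8) and adds no BOM.
write_sample_csv() {
  printf '%s\n' \
    '"id","username","timestamp","content","jtxt"' \
    '"999999999","xxxxx","2013-12-26T10:14:26Z","Hello ","マイ ドキュメント"' \
    > "$1"
}

# Drop a leading UTF-8 byte order mark (EF BB BF) if one is present, so the
# first header is exactly "id". Files without a BOM are left untouched.
strip_bom() {
  if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
    tail -c +4 "$1" > "$1.tmp" && mv "$1.tmp" "$1"
  fi
}

# POST the file to /update, declaring it as CSV encoded in UTF-8.
# --data-binary sends the file byte for byte, keeping the record newlines.
post_csv() {
  curl "$SOLR_UPDATE_URL" \
    -H "Content-Type: application/csv; charset=utf-8" \
    --data-binary @"$1"
}

write_sample_csv "$CSV_FILE"
strip_bom "$CSV_FILE"
# post_csv "$CSV_FILE"   # uncomment once Solr is running
```

Once Solr is running, uncomment the post_csv line or run the same curl command by hand. The Japanese value in jtxt needs no special escaping; what matters is that the bytes reach Solr unchanged (--data-binary rather than -d, which strips newlines from files) and that the request declares charset=utf-8.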
