Re: Solr Japanese support

Alexandre Rafalovitch Sun, 16 Mar 2014 18:11:08 -0700

Which version of Solr are you on? For Solr 4, the endpoint should be
/update, and the Content-Type has to be set correctly for CSV. See:
http://wiki.apache.org/solr/UpdateCSV
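
If you would rather script this than use curl, here is a rough Python sketch of the same request. It only builds the request, so you can inspect it before sending. The host, port and collection name are copied from your curl command. The "text/csv" Content-Type is my reading of the wiki page above, so treat it as an assumption and check it against your version. The helper name build_csv_update_request is made up for this example.

```python
import urllib.request


def build_csv_update_request(csv_bytes, base_url="http://localhost:8983/solr/collection1"):
    """Build (but do not send) a POST of CSV data to the Solr /update handler.

    Assumption: Solr 4.x accepts CSV on /update when the Content-Type is a CSV type.
    """
    url = base_url + "/update?commit=true"
    return urllib.request.Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "text/csv; charset=utf-8"},
        method="POST",
    )


# Same two lines as insert.csv in the original message.
csv_text = (
    '"id","username","timestamp","content","jtxt"\n'
    '"999999999","xxxxx","2013-12-26T10:14:26Z","Hello ","マイ ドキュメント"\n'
)

# Encode explicitly as UTF-8 so the Japanese text matches the declared charset.
request = build_csv_update_request(csv_text.encode("utf-8"))

# To actually send it (needs a running Solr), uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to print the URL, headers and first bytes of the body when something goes wrong.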


I would expect the problem NOT to be with Japanese, but with something
else. You could, for example, try to index Japanese text into the example
collection that comes with Solr. That way, all the other variables are
known to be correct. Then add another field and fieldType and see if it
still works.
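
A minimal test payload for that first step could be generated like the sketch below. It assumes the stock example schema with its "id" and "name" fields. The document id "ja-test-1" and the helper names are invented for illustration. Since the error complains about a missing "id", the sketch also checks that the payload's first header is exactly "id" and that no UTF-8 byte order mark precedes it. That check is only a guess at one thing worth ruling out, not a diagnosis.

```python
import csv
import io

UTF8_BOM = b"\xef\xbb\xbf"


def make_test_csv(rows, fieldnames):
    """Serialize rows to fully quoted CSV and return UTF-8 bytes without a BOM."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, quoting=csv.QUOTE_ALL, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def first_header(csv_bytes):
    """Return the first column name exactly as a CSV parser would see it."""
    text = csv_bytes.decode("utf-8")
    return next(csv.reader(io.StringIO(text, newline="")))[0]


# One document with only the uniqueKey and one Japanese value.
payload = make_test_csv(
    [{"id": "ja-test-1", "name": "マイ ドキュメント"}],
    ["id", "name"],
)

has_bom = payload.startswith(UTF8_BOM)
header = first_header(payload)
```

If the same checks are run on the bytes of a real file and the header comes back as anything other than "id", the problem is in the file rather than in the Japanese analysis chain.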

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Sat, Mar 15, 2014 at 11:50 PM, Bala Iyer <grb...@yahoo.com> wrote:
> Hi,
>
> I am new to Japanese support in Solr.
> I added support for Japanese to schema.xml.
> How can I insert Japanese text into that field, either with a Solr client
> (Java / PHP / Ruby) or with curl?
>
>
> schema.xml
> ====================================
>     <field name="username" type="string" indexed="true" stored="true" 
> multiValued="true" omitNorms="true" termVectors="true" />
>     <field name="timestamp" type="date" indexed="true" stored="true" 
> multiValued="true" omitNorms="true" termVectors="true" />
>     <field name="jtxt" type="text_ja" indexed="true" stored="true" 
> multiValued="true" omitNorms="true" termVectors="true" />
>
>     <fieldType name="text_ja" class="solr.TextField" 
> positionIncrementGap="100" autoGeneratePhraseQueries="false">
>       <analyzer>
>         <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
>
>         <!--<tokenizer class="solr.JapaneseTokenizerFactory" mode="search" 
> userDictionary="lang/userdict_ja.txt"/>-->
>         <!-- Reduces inflected verbs and adjectives to their base/dictionary 
> forms (辞書形) -->
>         <filter class="solr.JapaneseBaseFormFilterFactory"/>
>         <!-- Removes tokens with certain part-of-speech tags -->
>         <filter class="solr.JapanesePartOfSpeechStopFilterFactory" 
> tags="lang/stoptags_ja.txt" />
>         <!-- Normalizes full-width romaji to half-width and half-width kana 
> to full-width (Unicode NFKC subset) -->
>         <filter class="solr.CJKWidthFilterFactory"/>
>         <!-- Removes common tokens typically not useful for search, but have 
> a negative effect on ranking -->
>         <filter class="solr.StopFilterFactory" ignoreCase="true" 
> words="lang/stopwords_ja.txt" />
>         <!-- Normalizes common katakana spelling variations by removing any 
> last long sound character (U+30FC) -->
>         <filter class="solr.JapaneseKatakanaStemFilterFactory" 
> minimumLength="4"/>
>         <!-- Lower-cases romaji characters -->
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
> ====================================
>
> my insert.csv file
>
> "id","username","timestamp","content","jtxt"
> "999999999","xxxxx","2013-12-26T10:14:26Z","Hello ","マイ ドキュメント"
> =========================
> I am trying to insert through curl, but it gives me an error:
> curl "http://localhost:8983/solr/collection1/update/csv?separator=,&commit=true" \
>   -H "Content-Type: text/plain; charset=utf-8" --data-binary @insert.csv
>
>
> ERROR
> ----------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">400</int><int name="QTime">23</int></lst>
> <lst name="error"><str name="msg">Document is missing mandatory uniqueKey field: id</str><int name="code">400</int></lst>
> </response>
>
> I know I should not use text/plain as the Content-Type.
> =========================
>
>
> Thanks
