FYI folks. Attn: @Wcolen
---------- Forwarded message ----------
From: Gustavo Frederico <gustavo.freder...@thinkwrap.com>
Date: Thu, Jan 19, 2017 at 9:59 AM
Subject: Re: text classification in portuguese
To: u...@predictionio.incubator.apache.org

Marcus,

At first sight this looks like a correct JSON encoding. JSON itself encodes the UTF-8 characters as \uXXXX escape sequences.

Regards,
Gustavo

On Thu, Jan 19, 2017 at 8:54 AM, Marcus Vinicius <marcus...@gmail.com> wrote:
> Hello guys,
>
> It's me again. I'm trying to classify Portuguese text following the demo
> tutorial (http://predictionio.incubator.apache.org/demo/textclassification/).
>
> Has anyone already done this with PredictionIO? What would be the best
> way to deal with stemming and Portuguese stop words?
>
> Allow me to take this opportunity to ask another question. Has anyone had
> problems with encoding? My CSV load file is in ISO-8859-1 and in my Python
> script I'm transforming my text to UTF-8.
>
> text_utf8 = text.decode('iso-8859-1').encode('utf-8')
> client.create_event(
>     event="documents",
>     entity_type="source",
>     entity_id=str(count),  # use the running count as the entity ID
>     properties={
>         "text": text_utf8,
>         "category": attr[2],
>         "label": int(attr[3])
>     }
> )
>
> When I retrieve the event from http://localhost:7070/events.json I get an
> encoded word. Is that right?
>
> {"eventId":"x","event":"documents","entityType":"source","entityId":"73","properties":{"category":"A","text":"Gest\u008bo de Caixa","label":2},"eventTime":"2017-01-19T12:31:27.863Z","creationTime":"2017-01-19T12:31:27.867Z"}
>
> I really appreciate your attention.
>
> --
> Marcus Vinicius A. Silva
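
One way to sanity-check the escaped value in the event quoted above is to decode it and compare code points. The snippet below is only a Python 3 sketch, not something from the thread: it parses the stored string, prints the code point sitting where the "ã" of "Gestão" should be, and shows what a correctly decoded "Gestão de Caixa" looks like once JSON-escaped. The last step is an assumption on my part: byte 0x8B happens to be "ã" in Mac OS Roman, so it tries re-reading the original bytes with that codec. Whether the CSV was really saved in that encoding would have to be confirmed against the file itself.

```python
import json

# Escaped text value copied from the event returned by events.json.
stored = json.loads('"Gest\\u008bo de Caixa"')

# Code point found where the "ã" of "Gestão" is expected.
suspect = stored[4]
print(hex(ord(suspect)))  # 0x8b

# Reference: JSON escape of the correctly decoded string (ã is U+00E3).
expected = json.dumps("Gestão de Caixa")
print(expected)

# Undo the ISO-8859-1 decoding step to get the original CSV bytes back.
raw = stored.encode("iso-8859-1")

# ASSUMPTION: the file may be Mac OS Roman rather than ISO-8859-1,
# because 0x8B maps to "ã" in that code page.
recovered = raw.decode("mac_roman")
print(recovered)
```

If the last line prints a readable "Gestão de Caixa", decoding the CSV with 'mac_roman' instead of 'iso-8859-1' in the import script would be worth trying.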