text classification in portuguese

Marcus Vinicius Thu, 19 Jan 2017 05:55:19 -0800

Hello guys,

It's me again. I'm trying to classify Portuguese text by following the demo
tutorial (http://predictionio.incubator.apache.org/demo/textclassification/).


Has anyone already done this with PredictionIO? What would be the best way
to handle stemming and Portuguese stop words?

Allow me to take this opportunity to ask another question. Has anyone had
problems with encoding? My CSV input file is in ISO-8859, and in the Python
script I'm converting the text to UTF-8:

# Python 2: re-encode the ISO-8859-1 text as UTF-8 before sending it
text_utf8 = text.decode('iso-8859-1').encode('utf-8')
client.create_event(
    event="documents",
    entity_type="source",
    entity_id=str(count),  # use the running count as the entity ID
    properties={
        "text": text_utf8,
        "category": attr[2],
        "label": int(attr[3])
    }
)

When I retrieve the event from http://localhost:7070/events.json, I get an
encoded word. Is that right?

{"eventId":"x","event":"documents","entityType":"source","entityId":"73",
"properties":{"category":"A","text":"Gest\u008bo de Caixa","label":2},
"eventTime":"2017-01-19T12:31:27.863Z","creationTime":"2017-01-19T12:31:27.867Z"}
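
One way to narrow this down is to reproduce the escape outside the event
server. The sketch below is only a diagnostic under stated assumptions: it
uses Python 3 and the standard json module rather than PredictionIO itself,
and the raw bytes are a guess at what the CSV would have to contain to
produce \u008b. A \uXXXX sequence is a valid JSON escape and decodes back to
a character, so the escaping alone is not a problem. However, "ã" is U+00E3,
not U+008B. Byte 0x8B is "ã" in Mac OS Roman, so the file may not actually
be ISO-8859-1; that is an assumption, not something confirmed here.

```python
import json

# Hypothetical bytes that would lead to the \u008b seen in the event output.
raw = b"Gest\x8bo de Caixa"

# ISO-8859-1 maps byte 0x8B directly to the control character U+008B.
as_latin1 = raw.decode("iso-8859-1")

# Mac OS Roman maps byte 0x8B to "ã" (U+00E3), giving "Gestão de Caixa".
as_mac_roman = raw.decode("mac_roman")

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True);
# json.loads turns the escape back into the original character.
escaped_latin1 = json.dumps(as_latin1)
escaped_mac_roman = json.dumps(as_mac_roman)
round_trip = json.loads(escaped_mac_roman)

print(escaped_latin1)
print(escaped_mac_roman)
print(round_trip)
```

If the second printed line contains \u00e3 and the third reads correctly,
decoding the CSV with the matching codec before sending the event would be
the thing to try.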


I really appreciate your attention.


-- 

Marcus Vinicius A. Silva

