I'm coming at this from a market research point of view (that's my
background). There seem to be a number of opportunities there for
classification, clustering, and regression analysis tools, so I am building
(or rather attempting to build) tools with the aim that they will go on
the web, and peo
Hi Nigel. I see you're in the UK; I'm based east of you, in London. My
goal with the disambiguator is to provide a well documented pipeline
such that it can be easily retrained.
I have a notion that in the future I'll host a production-ready version of my
code under my http://annotate.io/, ready f
Hi Harold. Are you using different models for the different types of
social media? I'd guess that the grammar/terms used in a tweet could
look quite different to what you see in e.g. a Google+ comment
(different demographic->probably higher quality English, less space
restrictions->longer/clearer w
I am just starting down the road towards having a text classifier for
social media posts. As this may be used in a variety of situations
(currently negotiating 2 freelance analytics positions with research
agencies), the classifier will need to have a mechanism for retraining on a
project by projec
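One way to make "retrain per project" concrete is to keep one small model object per project and call `fit()` again on each project's labelled posts. The sketch below is a hypothetical illustration, not Ian's design: it swaps in a pure-Python multinomial naive Bayes with Laplace smoothing, and the class name `ProjectTextClassifier` and the whitespace tokenizer are assumptions. In scikit-learn the rough equivalent would be a `Pipeline` of `CountVectorizer` and `MultinomialNB`.

```python
import math
from collections import Counter, defaultdict


class ProjectTextClassifier:
    """Hypothetical per-project classifier: multinomial naive Bayes.

    Each project gets its own instance trained only on that project's
    labelled posts, so switching projects means calling fit() on new data.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    @staticmethod
    def tokenize(text):
        # Assumption: naive lowercase/whitespace tokenizer; tweets need more care
        return text.lower().split()

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = self.tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / n_docs)
            denom = self.total_words[label] + self.alpha * v
            for tok in self.tokenize(text):
                # Words never seen in training are ignored; words seen only in
                # other classes still get a small smoothed probability here
                if tok in self.vocab:
                    score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled posts for one project
project_a = ProjectTextClassifier().fit(
    ["love this phone", "great battery life", "screen cracked again", "terrible support"],
    ["positive", "positive", "negative", "negative"],
)
print(project_a.predict("great phone"))  # prints "positive"
```

A new project would simply get a fresh instance fitted on its own posts and label set, leaving earlier projects' models untouched.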
Hi Ian,
Thank you very much for writing this message, and especially for
sharing your experience. I am actually
doing the very same thing, and would love to collaborate with you,
if possible. I'm not as far along in my journey as you are, but I hope
we can help each other in the future!
I'm categ
Hello Mike. Could you give a summary of your problem? It sounds like
you're categorising text (tweets? medical text? news articles?) into
>2 categories (how many?), is that right? Is the goal really to
optimise your F1 score, or do you only want accurate categorisations
(precision), or maybe high
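The precision/recall/F1 question can be made concrete with a small multi-class example. This is a pure-Python sketch for illustration; the function names `per_class_scores` and `macro_f1` are hypothetical, and the category labels are made up. scikit-learn offers the same numbers via `sklearn.metrics.precision_recall_fscore_support` and `f1_score(..., average="macro")`.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for a multi-class problem."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but that was wrong
            fn[t] += 1  # the true label t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f1)
    return scores


def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1, so rare categories count as much as common ones
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


y_true = ["sport", "sport", "politics", "tech"]
y_pred = ["sport", "politics", "politics", "politics"]
print(per_class_scores(y_true, y_pred)["politics"])  # precision 1/3, recall 1.0, F1 0.5
print(macro_f1(y_true, y_pred))
```

Here "politics" has perfect recall but poor precision, which is exactly the trade-off a single F1 number hides; which metric to optimise depends on whether false positives or missed posts are more costly.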
2013/7/10 Mike Hansen :
I have been using Scikit's text classification for several weeks, and I really
like it. I use my own corpus (self-generated) and prepare each document using
the NLTK. Presently I am relying on this tutorial/code-base, only making
changes when absolutely necessary for my documents to work.
The