Got it. Thanks, Joel.

On Thu, Feb 9, 2017 at 11:17 AM, Susheel Kumar <susheel2...@gmail.com> wrote:
> I increased from 250 to 2500 and from 100 to 1000 when I didn't get the
> expected result. Let me add more examples.
>
> Thanks,
> Susheel
>
> On Thu, Feb 9, 2017 at 11:03 AM, Joel Bernstein <joels...@gmail.com>
> wrote:
>
>> A few things that I see right off:
>>
>> 1) 2500 terms is too many. I was testing with 100-250 terms.
>> 2) 1000 iterations is too high. If the model hasn't converged by 100
>> iterations it's likely not going to converge.
>> 3) You're going to need more examples. You may want to run features first
>> and see what it selects. Then you need multiple examples for each feature.
>> I was testing with the enron ham/spam data set. It would be good to
>> download that dataset and see what that looks like.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Thu, Feb 9, 2017 at 10:15 AM, Susheel Kumar <susheel2...@gmail.com>
>> wrote:
>>
>> > Hello Joel,
>> >
>> > Here is the final iteration in JSON format.
>> >
>> > https://www.dropbox.com/s/g3a3606ms6cu8q4/final_iteration.json?dl=0
>> >
>> > Below is the expression used:
>> >
>> > update(models,
>> >        batchSize="50",
>> >        train(trainingSet,
>> >              features(trainingSet,
>> >                       q="*:*",
>> >                       featureSet="threatFeatures",
>> >                       field="body_txt",
>> >                       outcome="out_i",
>> >                       numTerms=2500),
>> >              q="*:*",
>> >              name="threatModel",
>> >              field="body_txt",
>> >              outcome="out_i",
>> >              maxIterations="1000"))
>> >
>> > I have just 16 documents, 8 positive and 8 negative. The field that
>> > contains the feedback is body_txt (text_general type).
>> >
>> > Thanks for looking.
>> >
>> > On Wed, Feb 8, 2017 at 7:52 AM, Joel Bernstein <joels...@gmail.com>
>> > wrote:
>> >
>> > > Can you post the final iteration of the model?
>> > >
>> > > Also the expression you used to train the model?
>> > >
>> > > How much training data do you have? How many positive examples and
>> > > negative examples?
>> > >
>> > > Joel Bernstein
>> > > http://joelsolr.blogspot.com/
>> > >
>> > > On Tue, Feb 7, 2017 at 2:14 PM, Susheel Kumar <susheel2...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > I tried to follow http://joelsolr.blogspot.com/ to see if we can
>> > > > classify positive and negative feedback using streaming expressions.
>> > > > Everything works, but in the end result the probability_d value from
>> > > > the classify expression is similar for positive and negative
>> > > > feedback. See below.
>> > > >
>> > > > What might I be missing here? Do I need to put more data in the
>> > > > training set, or something else?
>> > > >
>> > > > { "result-set": { "docs": [
>> > > >   { "body_txt": [ "love the company" ],
>> > > >     "score_d": 2.1892474120319667, "id": "6",
>> > > >     "probability_d": 0.977944433135261 },
>> > > >   { "body_txt": [ "bad experience " ],
>> > > >     "score_d": 3.1689453250842914, "id": "5",
>> > > >     "probability_d": 0.9888109278133054 },
>> > > >   { "body_txt": [ "This company rewards its employees, but you should only work here if you truly love sales. The stress of the job can get to you and they definitely push you." ],
>> > > >     "score_d": 4.621702323888672, "id": "4",
>> > > >     "probability_d": 0.9999999999898557 },
>> > > >   { "body_txt": [ "no chance for advancement with that company every year I was there it got worse I don't know if all branches of adp but Florence organization was turn over rate would be higher if it was for temp workers" ],
>> > > >     "score_d": 5.288898825826228, "id": "3",
>> > > >     "probability_d": 0.9999999999999956 },
>> > > >   { "body_txt": [ "It was a pleasure to work at the Milpitas campus. The team that works there are professional and dedicated individuals. The level of loyalty and dedication is impressive" ],
>> > > >     "score_d": 2.5303947056922937, "id": "2",
>> > > >     "probability_d": 0.9999990430778418 },
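
For reference, a minimal sketch of the training expression from this thread with Joel's first two points applied: numTerms lowered into the 100-250 range he tested with, and maxIterations capped at 100. The values 250 and 100 are assumptions picked from his stated ranges, not tuned settings, and the collection, model, and field names are simply the ones from the original expression. His third point, more training examples per selected feature, still applies and is not something the expression itself can fix.

```
update(models,
       batchSize="50",
       train(trainingSet,
             features(trainingSet,
                      q="*:*",
                      featureSet="threatFeatures",
                      field="body_txt",
                      outcome="out_i",
                      numTerms=250),
             q="*:*",
             name="threatModel",
             field="body_txt",
             outcome="out_i",
             maxIterations="100"))
```

Running the inner features(...) call on its own first, as Joel suggests, shows which terms get selected before any model is trained on them.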