Hi All,

This is my current process:


1.       Reading unstructured data and working out what each piece of text 
means. For example, a phrase like "JOHN SMITH" is a customer name, and a 
phrase like "CLAIM DURATION" is a reason-type field. I currently do this 
manually (no issue here).

2.       Creating a data frame with the fields Reason, Name, Category, 
Sub-Reason and Description. There are no boolean or numeric fields (do I need 
to create any, given what I need in #5?).

3.       Manually filling the data frame with values read from the 
unstructured text. It will have approximately 50 rows, and this is my 
training set (no issue here).

4.       The data frame from #3 is therefore just a supervised training data 
set.
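As a rough sketch of steps 2 to 4: most classifiers expect one row per phrase 
with its label, so the wide data frame can be reshaped into (text, label) 
pairs. Only "JOHN SMITH" and "CLAIM DURATION" come from the description above; 
the other values and the names train_wide and train_long are made up for 
illustration, and only two of the five fields are shown.

```r
# Wide layout as described in step 2 (only two of the five fields shown).
# "CLAIM AMOUNT" and "JANE DOE" are hypothetical example values.
train_wide <- data.frame(
  Reason = c("CLAIM DURATION", "CLAIM AMOUNT"),
  Name   = c("JOHN SMITH", "JANE DOE"),
  stringsAsFactors = FALSE
)

# Long layout: one row per phrase, with the column name as the class label.
# unlist() concatenates the columns in order, so the labels are repeated
# once per row of the wide data frame.
train_long <- data.frame(
  text  = unlist(train_wide, use.names = FALSE),
  label = rep(names(train_wide), each = nrow(train_wide)),
  stringsAsFactors = FALSE
)

print(train_long)
```

In this layout the column name itself is the supervised label for each 
phrase.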



Step #5 is what I need to achieve (and where I need help):


5.       Now, when I receive another file (test data), say to start with just 
a phrase "******************", can I predict from the trained model that it is 
a Reason-related phrase (based on the probability, say more than 70%), then 
create a new data frame (PredictedDF), add a column (e.g. Reason) and add a 
row for this text under the Reason field? When I then receive one more text 
phrase which looks like a "Name", I would add a column "Name" to PredictedDF 
and add this value under the Name column as a second row, and so on.

I was reading about RTextTools (http://www.rtexttools.com/), but as far as I 
understand, in that case it has to be told which value goes with which text, 
and so on...

Any help would be appreciated.

Regards,
Anshuk Pal Chaudhuri



______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.