I am using the IDF estimator/model (TF-IDF) to convert text features into vectors. Currently, I fit the IDF model on all of my sample data and then transform that same data. I read somewhere that I should split the data into training and test sets before fitting the IDF model: fit IDF only on the training data, then use the same fitted model to transform both the training and the test data. This raises more questions:

1) Why would you do that? What exactly does IDF learn during fitting that it can reuse to transform any new dataset? Perhaps the idea is to keep the same values for |D| and DF(t, D) while using the new TF(t, d)?

2) If not, then having separate fit and transform steps seems redundant for the IDF model.
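
One way to make question 1 concrete is the plain-Python sketch below. It is not Spark code, and `IdfModel` and `fit_idf` are hypothetical names chosen for illustration. It assumes the smoothed formula idf(t) = log((|D| + 1) / (DF(t, D) + 1)) given in the Spark MLlib documentation, and it assumes that fitting stores only |D| and DF(t, D), while transforming computes a fresh TF(t, d) for each incoming document.

```python
import math
from collections import Counter


class IdfModel:
    """Holds what fitting learned: |D| and DF(t, D) from the training data."""

    def __init__(self, n_docs, doc_freq):
        self.n_docs = n_docs      # |D|: number of training documents
        self.doc_freq = doc_freq  # DF(t, D): training documents containing t

    def idf(self, term):
        # Assumed smoothed formula; a term unseen in training has DF = 0.
        return math.log((self.n_docs + 1) / (self.doc_freq.get(term, 0) + 1))

    def transform(self, doc):
        # TF(t, d) comes from the new document; the IDF weights stay fixed.
        tf = Counter(doc)
        return {term: count * self.idf(term) for term, count in tf.items()}


def fit_idf(train_docs):
    """Count document frequencies over the training documents only."""
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(doc))  # each term counts once per document
    return IdfModel(len(train_docs), doc_freq)


# Toy data, invented for this example.
train_docs = [["spark", "ml"], ["spark", "idf"], ["spark"]]
test_doc = ["ml", "ml", "new"]

model = fit_idf(train_docs)           # learns |D| = 3 and the DF counts
test_vec = model.transform(test_doc)  # reuses them with the new term counts
```

If this reading is right, fitting on the training split alone means that document counts from the test set never influence the learned weights, and the same weights can be applied to any later dataset.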