Could the 'metadata model' be a separate file?

On Tue, Jun 7, 2011 at 12:22 AM, Hector Yee <[email protected]> wrote:
> I've used systems before that kept the original mapping to the classifier
> specific mapping.
> It can be nice because you can add new features and an old model may still
> work because the new features would be out of range of the old mappings.
> It can also provide a place to store score statistics (such as min / max /
> avg / std dev) for classifiers that need to normalize their features, such
> as the linear models.
>
> It could be something like this
>
> FeatureInfo
>   int32 original_index
>   int32 internal_index
>   float min_value
>   float max_value
>
> FeatureSetInfo
>   repeated FeatureInfo
>
> The drawback is potentially adding 32-bytes per feature, which could be
> detrimental in terms of size, especially for high dimensional feature spaces
> (e.g. text).
> If the writable interface could make this optional it would work.
> Or we could make all classifiers have a fixed header that we write
> containing the common meta-data followed by the actual model itself.
>
> On Mon, Jun 6, 2011 at 3:17 AM, Ted Dunning <[email protected]> wrote:
>
>> You have to remember that mapping. You will have created it when you
>> encoded the target variable.
>>
>> This is occasionally a nasty problem. I have considered adding the ability
>> to record a dictionary in the classification models, but have not done so.
>>
>> What interface would you like to see?
>>
>> Hector, you might like a vote on this. What do you think?
>>
>> Jeff, what do you think about the impact on the clustering/classification
>> unification?
>>
>> On Sat, Jun 4, 2011 at 10:39 PM, XiaoboGu <[email protected]> wrote:
>>
>> > How can I find the map between original target labels and the encoded
>> > target codes?
>> >
>>
>
> --
> Yee Yang Li Hector
> http://hectorgon.blogspot.com/ (tech + travel)
> http://hectorgon.com (book reviews)
>
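
To make the separate-file idea concrete, here is a minimal sketch, assuming the FeatureInfo layout quoted above. The class name, method names, and byte layout are hypothetical and are not existing Mahout API; the point is only that the mapping can be serialized on its own, apart from the model.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sidecar metadata kept apart from the model file.
// Names mirror the FeatureInfo sketch quoted above; none of this is Mahout API.
class FeatureSetInfo {

  static final class FeatureInfo {
    final int originalIndex;
    final int internalIndex;
    final float minValue;
    final float maxValue;

    FeatureInfo(int originalIndex, int internalIndex, float minValue, float maxValue) {
      this.originalIndex = originalIndex;
      this.internalIndex = internalIndex;
      this.minValue = minValue;
      this.maxValue = maxValue;
    }
  }

  // Two int32 fields plus two float fields.
  static final int BYTES_PER_FEATURE = 16;

  private final List<FeatureInfo> features = new ArrayList<>();

  void add(FeatureInfo info) {
    features.add(info);
  }

  int size() {
    return features.size();
  }

  // Returns the internal index, or -1 when the original index is unknown,
  // e.g. a feature that was added after this mapping was written.
  int internalIndexOf(int originalIndex) {
    for (FeatureInfo f : features) {
      if (f.originalIndex == originalIndex) {
        return f.internalIndex;
      }
    }
    return -1;
  }

  // Serialize as: feature count, then one fixed-size record per feature.
  byte[] toBytes() {
    ByteBuffer buf = ByteBuffer.allocate(4 + features.size() * BYTES_PER_FEATURE);
    buf.putInt(features.size());
    for (FeatureInfo f : features) {
      buf.putInt(f.originalIndex);
      buf.putInt(f.internalIndex);
      buf.putFloat(f.minValue);
      buf.putFloat(f.maxValue);
    }
    return buf.array();
  }

  // Read back the layout produced by toBytes().
  static FeatureSetInfo fromBytes(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    int count = buf.getInt();
    FeatureSetInfo result = new FeatureSetInfo();
    for (int i = 0; i < count; i++) {
      int originalIndex = buf.getInt();
      int internalIndex = buf.getInt();
      float minValue = buf.getFloat();
      float maxValue = buf.getFloat();
      result.add(new FeatureInfo(originalIndex, internalIndex, minValue, maxValue));
    }
    return result;
  }
}
```

The resulting bytes could be written to a file next to the model, or placed as a fixed header ahead of it. With only these four fields, each record in this sketch takes 16 bytes; storing avg and std dev as well would add to that.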
--
Lance Norskog
[email protected]
