Marc Perkel wrote:
This command used to display tokens. Now it displays just numbers. Is there any way to get the text back?

The hash used is one-way. It was put in because working with fixed-size numbers instead of variable-size strings gives large performance gains.


The only way to get the strings back is to compile a dictionary from the original corpus and then use it to convert the numbers in the dump. I don't know of anyone who is writing a tool to do that, but it is the only way I can think of to get such a dump utility to work.
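
A minimal sketch of that dictionary approach, for illustration only. The hash function here (the last 5 bytes of SHA-1, hex-encoded), the whitespace tokenizer, and the names token_hash, build_dictionary and annotate_dump are all assumptions made for the example. A real tool would have to use exactly the same tokenizer and hash as the program that wrote the database.

```python
import hashlib


def token_hash(token: str) -> str:
    # Assumption: stand-in one-way hash (last 5 bytes of SHA-1, as hex).
    # Substitute the hash the real database actually uses.
    return hashlib.sha1(token.encode("utf-8")).digest()[-5:].hex()


def build_dictionary(corpus_texts):
    # Hash every token seen in the original corpus and remember
    # which token produced which hash.
    table = {}
    for text in corpus_texts:
        for token in text.split():  # assumption: whitespace tokenization
            table[token_hash(token)] = token
    return table


def annotate_dump(hashes, table):
    # Replace each hashed entry with its token when the dictionary
    # knows it; otherwise leave the hash unchanged.
    return [table.get(h, h) for h in hashes]
```

Tokens that never appeared in the corpus used to build the dictionary cannot be recovered and stay as numbers in the output.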

 -- sidney



