Hi Gene,
thanks for the reply!
The majority of operations performed on this matrix will be searching
for documents that contain a specific term.
Regards,
Dirk
Try b-trees. They are quite useful representations of large databases.
On 5/22/06, Dirk [EMAIL PROTECTED] wrote:
Hi Varun,
thanks for your reply!
Sparse matrix memory implementation sounds like a fit to me.
Will give Google a try and find out more about it!
Thanks,
Dirk
Dirk wrote:
Hi Gene,
thanks for the reply!
The majority of operations performed on this matrix will be searching
for documents that contain a specific term.
Regards,
Dirk
A natural implementation would be a table of (docname, term) pairs.
Index the table on term so you can look up
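
A minimal sketch of that table-plus-index idea, assuming SQLite as the store (the thread does not name a database). The names doc_term, idx_doc_term_term, build_index and docs_with_term are made up for illustration:

```python
import sqlite3


def build_index(pairs):
    """Load (docname, term) pairs into an in-memory table indexed on term."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_term (docname TEXT NOT NULL, term TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO doc_term (docname, term) VALUES (?, ?)", pairs
    )
    # The index on term is what makes "which documents contain X?" cheap.
    conn.execute("CREATE INDEX idx_doc_term_term ON doc_term (term)")
    conn.commit()
    return conn


def docs_with_term(conn, term):
    """Return the sorted names of all documents that contain the term."""
    rows = conn.execute(
        "SELECT DISTINCT docname FROM doc_term WHERE term = ? ORDER BY docname",
        (term,),
    )
    return [row[0] for row in rows]


# Hypothetical sample data, only to show the lookup.
pairs = [
    ("doc1", "matrix"),
    ("doc1", "sparse"),
    ("doc2", "matrix"),
    ("doc3", "tree"),
]
conn = build_index(pairs)
print(docs_with_term(conn, "matrix"))  # ['doc1', 'doc2']
```

Only the pairs that actually occur are stored, so the table grows with the number of term occurrences, not with documents times terms.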
Dirk wrote:
Hi!
I'm looking for an effective way to store a large document-term matrix.
The matrix I'm looking at has about 100,000 documents and probably
1,000 terms.
Which representation of this matrix would be the most effective to work
with?
Putting the whole thing into memory at
The term-document matrix is a perfect candidate for a sparse matrix representation. I have had cases where I was able to represent a matrix of 5,000 words by 50,000 documents (a little more than that, actually) quite efficiently using in-memory sparse matrix techniques.
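
To make "sparse" concrete, here is a rough sketch that stores only the nonzero cells, keyed by term so that the "documents containing a term" lookup is direct. This is one possible layout, not necessarily the one used in the cases mentioned above; the class SparseTermDocMatrix and its methods are hypothetical names:

```python
from collections import defaultdict


class SparseTermDocMatrix:
    """Term-document matrix that stores only nonzero entries.

    Layout (an assumption for this sketch): term -> {doc_id: count}.
    """

    def __init__(self):
        self._columns = defaultdict(dict)

    def add(self, doc_id, term, count=1):
        """Increase the count of term in document doc_id."""
        column = self._columns[term]
        column[doc_id] = column.get(doc_id, 0) + count

    def get(self, doc_id, term):
        """Return the stored count, or 0 for a cell that was never set."""
        return self._columns.get(term, {}).get(doc_id, 0)

    def docs_with(self, term):
        """Return the sorted ids of all documents containing term."""
        return sorted(self._columns.get(term, {}))

    def nonzero_count(self):
        """Number of cells actually held in memory."""
        return sum(len(column) for column in self._columns.values())


# Hypothetical sample data.
m = SparseTermDocMatrix()
m.add(0, "matrix")
m.add(0, "matrix")
m.add(2, "matrix")
m.add(1, "sparse")

print(m.docs_with("matrix"))  # [0, 2]
print(m.get(0, "matrix"))     # 2
print(m.get(1, "matrix"))     # 0
print(m.nonzero_count())      # 3
```

Memory use scales with the number of nonzero cells rather than with documents times terms, which is why a matrix this size can fit in memory when most documents contain only a small fraction of the terms.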