You might find
http://www.lucidimagination.com/search/document/39b53fbf4b525f2f/lda_only_executes_a_single_map_task_per_iteration_when_running_in_actual_distributed_mode#311eb323a8208e28
informative.
(BTW, LDA is only meant to run w/ TF)
-Grant
On May 19, 2010, at 9:49 PM, Jeff Eastman wrote:
On Wed, May 19, 2010 at 10:10 PM, Drew Farris wrote:
> Of course this doesn't really address the root problem however -- why LDA
> on reuters is slow. How long is it taking to run?
>
> Drew
nm, saw it in the JIRA issue (5.5min vs. 1.5min)
Jeff,
Just curious, have you tried:
./bin/mahout seq2sparse -Dmapred.reduce.tasks=2 \
  -i ./examples/bin/work/reuters-out-seqdir/ \
  -o ./examples/bin/work/reuters-out-seqdir-sparse -wt tf -seq
The mahout script (MahoutDriver) allows arbitrary Hadoop properties to be
specified via -D arguments which a
[
https://issues.apache.org/jira/browse/MAHOUT-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Eastman updated MAHOUT-397:
Status: Patch Available (was: Open)
patch submitted runs on r946508
> SparseVectorsFromSequenceFi
[
https://issues.apache.org/jira/browse/MAHOUT-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Eastman updated MAHOUT-397:
Attachment: MAHOUT-397.patch
This patch seems to resolve the issue by propagating the number of red
SparseVectorsFromSequenceFiles only outputs a single vector file
Key: MAHOUT-397
URL: https://issues.apache.org/jira/browse/MAHOUT-397
Project: Mahout
Issue Type: Improvement
On 5/19/10 3:19 PM, Jeff Eastman wrote:
I tried propagating numReducers into its makePartialVectors driver;
however, a single reducer is still all I get. I need to figure out
how to tickle the elephant to give me more.
Note to self: Use a real elephant. Running Hadoop in Eclipse is great
f
On 5/19/10 1:49 PM, Drew Farris wrote:
On Wed, May 19, 2010 at 3:49 PM, Jeff Eastman wrote:
I cannot imagine how one could ever get LDA to scale if it is always
limited to a single input vector file. Is there a way to get multiple output
vector files from seqtosparse?
I don't know o
On Wed, May 19, 2010 at 3:49 PM, Jeff Eastman wrote:
> I cannot imagine how one could ever get LDA to scale if it is always
> limited to a single input vector file. Is there a way to get multiple output
> vector files from seqtosparse?
>
I don't know offhand, but is the default input split (mapr
I ran the Reuters dataset against LDA yesterday on a 2-node cluster and
it took a really long time to converge (100 iterations * 10 min ea)
extracting 20 topics. I was able to reduce the iteration time by 50% by
using just TF and SeqAccSparseVectors but it was still only using a
single mapper a
[
https://issues.apache.org/jira/browse/MAHOUT-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869050#action_12869050
]
Zoran Sevarac commented on MAHOUT-383:
--
Just to let you know that we've released the Neurop
[
https://issues.apache.org/jira/browse/MAHOUT-364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869048#action_12869048
]
Zoran Sevarac commented on MAHOUT-364:
--
Hi.
Just to let you know that we've released the N