Re: Mahout fpg

Suneel Marthi Wed, 20 Nov 2013 15:12:40 -0800

From the stack trace:

FAILED
java.lang.NumberFormatException: For input string: "A1234567"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)


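Obviously, the input's incorrect: the item code "A1234567" is being passed to Long.parseLong() in ItemIDIndexMapper, and ItemSimilarityJob expects numeric (long) user and item IDs.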
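
One possible workaround, sketched under stated assumptions: assign every distinct user code and item code a dense long ID before running the job, write "userId,itemId" lines that Long.parseLong accepts, and keep a reverse table for the items so the similarity output can be translated back. This assumes each input line holds a user code and an item code separated by whitespace or a comma, and that all codes fit in memory. The class IdTranslator and its methods are hypothetical and not part of Mahout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout: maps string user/item codes to
// dense long IDs so that every output token parses with Long.parseLong.
class IdTranslator {
    private final Map<String, Long> userIds = new HashMap<>();
    private final Map<String, Long> itemIds = new HashMap<>();

    // Returns the ID already assigned to a code, or assigns the next free one (0, 1, 2, ...).
    private static long idFor(Map<String, Long> ids, String code) {
        Long id = ids.get(code);
        if (id == null) {
            id = (long) ids.size();
            ids.put(code, id);
        }
        return id;
    }

    // Translates one "usercode itemcode" line (whitespace- or comma-separated)
    // into a "userId,itemId" line of longs. Extra columns are dropped in this sketch.
    String translate(String line) {
        String[] tokens = line.trim().split("[\\s,]+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("expected a user code and an item code: " + line);
        }
        long user = idFor(userIds, tokens[0]);
        long item = idFor(itemIds, tokens[1]);
        return user + "," + item;
    }

    // Reverse table for items, used to turn the job's numeric output back into codes.
    Map<Long, String> itemIndex() {
        Map<Long, String> reverse = new HashMap<>();
        for (Map.Entry<String, Long> e : itemIds.entrySet()) {
            reverse.put(e.getValue(), e.getKey());
        }
        return reverse;
    }

    public static void main(String[] args) {
        IdTranslator translator = new IdTranslator();
        String[] lines = {"u1\tA1234567", "u1\tB7654321", "u2\tA1234567"};
        for (String line : lines) {
            System.out.println(translator.translate(line)); // 0,0 then 0,1 then 1,0
        }
    }
}
```

The translated file would then be used as --input, and itemIndex() maps the numeric item IDs in the job's output back to the original codes.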

On Wednesday, November 20, 2013 6:02 PM, Sameer Tilak <ssti...@live.com> wrote:
 
Dear Sebastian,

I tried using ItemSimilarityJob. Each line of my data has the format: userid    itemid
(I also tried userid, itemcode). The item code is a string. However, I am getting
the following error. Maybe my input format is incorrect.

  ./mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob \
    --input testdata/similarityinput -o testdata/similarityoutput \
    --similarityClassname SIMILARITY_COOCCURRENCE --maxSimilaritiesPerItem 10
13/11/20 14:46:39 WARN driver.MahoutDriver: No org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.props found on classpath, will use command-line arguments only
13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --maxPrefs=[500], --maxSimilaritiesPerItem=[10], --minPrefsPerUser=[1], --output=[testdata/similarityoutput], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp]}
13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --minPrefsPerUser=[1], --output=[temp/prepareRatingMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
13/11/20 14:46:41 INFO input.FileInputFormat: Total input paths to process : 1
13/11/20 14:46:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/11/20 14:46:41 WARN snappy.LoadSnappy: Snappy native library not loaded
13/11/20 14:46:41 INFO mapred.JobClient: Running job: job_201311111627_0115
13/11/20 14:46:42 INFO mapred.JobClient:  map 0% reduce 0%
13/11/20 14:47:00 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_0, Status : FAILED
java.lang.NumberFormatException: For input string: "A1234567"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:441)
    at java.lang.Long.parseLong(Long.java:483)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
13/11/20 14:47:11 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_1, Status : FAILED
java.lang.NumberFormatException: For input string: "A1234567"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:441)
    at java.lang.Long.parseLong(Long.java:483)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
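
For intuition about what --similarityClassname SIMILARITY_COOCCURRENCE should produce once the IDs parse, here is a small single-machine sketch. It assumes the co-occurrence score for an item pair is simply the number of users who interacted with both items. It is an illustration with hypothetical names (CooccurrenceCounter, count), not Mahout's MapReduce implementation, and it ignores --maxPrefs, --minPrefsPerUser and --maxSimilaritiesPerItem.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration: counts, for each pair of items, how many users
// interacted with both. Single-machine sketch, not Mahout's distributed job.
class CooccurrenceCounter {

    // prefs: each entry is {userId, itemId}.
    // Returns "itemA,itemB" -> number of users having both, with itemA < itemB.
    static Map<String, Integer> count(List<long[]> prefs) {
        // Group distinct items per user; TreeSet sorts and deduplicates them.
        Map<Long, TreeSet<Long>> itemsByUser = new HashMap<>();
        for (long[] p : prefs) {
            itemsByUser.computeIfAbsent(p[0], u -> new TreeSet<>()).add(p[1]);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (TreeSet<Long> items : itemsByUser.values()) {
            List<Long> list = new ArrayList<>(items);
            // Every unordered pair of a user's items co-occurs once for that user.
            for (int i = 0; i < list.size(); i++) {
                for (int j = i + 1; j < list.size(); j++) {
                    counts.merge(list.get(i) + "," + list.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<long[]> prefs = new ArrayList<>();
        prefs.add(new long[] {0, 0});
        prefs.add(new long[] {0, 1});
        prefs.add(new long[] {1, 0});
        prefs.add(new long[] {1, 1});
        prefs.add(new long[] {1, 2});
        System.out.println(count(prefs)); // prints {0,1=2, 0,2=1, 1,2=1}
    }
}
```

The pair loop is quadratic in the number of items per user, which is presumably one reason the real job caps preferences per user with --maxPrefs.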

> Date: Wed, 20 Nov 2013 08:22:07 +0100
> From: ssc.o...@googlemail.com
> To: user@mahout.apache.org
> Subject: Re: Mahout fpg
> 
> You can use ItemSimilarityJob to find sets of items that co-occur
> together in your users' interactions.
> 
> --sebastian
> 
> 
> On 20.11.2013 08:11, Sameer Tilak wrote:
> > 
> > 
> > 
> > Hi Suneel,
> > Thanks for your reply. We can benefit a lot from the parallel frequent
> > pattern mining functionality. Will there be any alternative in future
> > releases? I guess we can use older versions of Mahout if we need that.
> > 
> >> Date: Tue, 19 Nov 2013 19:25:54 -0800
> >> From: suneel_mar...@yahoo.com
> >> Subject: Re: Mahout fpg
> >> To: user@mahout.apache.org
> >>
> >> FPG has been removed from the codebase, as it will no longer be supported.
> >>
> >> On Tuesday, November 19, 2013 8:56 PM, Sameer Tilak <ssti...@live.com> 
> >> wrote:
> >>  
> >> Hi everyone,
> >>
> >> I downloaded the latest version of Mahout and did mvn install.
> >> When I try to run fpg, I get the following errors. Do I need to download
> >> and compile FPG separately? It looks like it has somehow not been included
> >> in the list of valid programs.
> >> 13/11/19 17:49:19 WARN driver.MahoutDriver: Unable to add class: fpg
> >> 13/11/19 17:49:19 WARN driver.MahoutDriver: No fpg.props found on classpath, will use command-line arguments only
> >> Unknown program 'fpg' chosen.
> >> Valid program names are:
> >>   arff.vector: : Generate Vectors from an ARFF file or directory
> >>   baumwelch: : Baum-Welch algorithm for unsupervised HMM training
> >>   canopy: : Canopy clustering
> >>   cat: : Print a file or resource as the logistic regression models would see it
> >>   cleansvd: : Cleanup and verification of SVD output
> >>   clusterdump: : Dump cluster output to text
> >>   clusterpp: : Groups Clustering Output In Clusters
> >>   cmdump: : Dump confusion matrix in HTML or text formats
> >>   concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
> >>   cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> >>   cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> >>   evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
> >>   fkmeans: : Fuzzy K-means clustering
> >>   hmmpredict: : Generate random sequence of observations by given HMM
> >>   itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
> >>   kmeans: : K-means clustering
> >>   lucene.vector: : Generate Vectors from a Lucene index
> >>   lucene2seq: : Generate Text SequenceFiles from a Lucene index
> >>   matrixdump: : Dump matrix in CSV format
> >>   matrixmult: : Take the product of two matrices
> >>   parallelALS: : ALS-WR factorization of a rating matrix
> >>   qualcluster: : Runs clustering experiments and summarizes results in a CSV
> >>   recommendfactorized: : Compute recommendations using the factorization of a rating matrix
> >>   recommenditembased: : Compute recommendations using item-based collaborative filtering
> >>   regexconverter: : Convert text files on a per line basis based on regular expressions
> >>   resplit: : Splits a set of SequenceFiles into a number of equal splits
> >>   rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> >>   rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
> >>   runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
> >>   runlogistic: : Run a logistic regression model against CSV data
> >>   seq2encoded: : Encoded Sparse Vector generation from Text sequence files
> >>   seq2sparse: : Sparse Vector generation from Text sequence files
> >>   seqdirectory: : Generate sequence files (of Text) from a directory
> >>   seqdumper: : Generic Sequence File dumper
> >>   seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
> >>   seqwiki: : Wikipedia xml dump to sequence file
> >>   spectralkmeans: : Spectral k-means clustering
> >>   split: : Split Input data into test and train sets
> >>   splitDataset: : split a rating dataset into training and probe parts
> >>   ssvd: : Stochastic SVD
> >>   streamingkmeans: : Streaming k-means clustering
> >>   svd: : Lanczos Singular Value Decomposition
> >>   testnb: : Test the Vector-based Bayes classifier
> >>   trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> >>   trainlogistic: : Train a logistic regression using stochastic gradient descent
> >>   trainnb: : Train the Vector-based Bayes classifier
> >>   transpose: : Take the transpose of a matrix
> >>   validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
> >>   vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
> >>   vectordump: : Dump vectors from a sequence file to text
> >>   viterbi: : Viterbi decoding of hidden states from given output states sequence
>