Re: Sorting Values sent to reducer NOT based on KEY (Depending on part of VALUE)

2013-04-23 Thread Sofia Georgiakaki
Hello, sorting is done by the SortingComparator, which sorts based on the value of the key. A possible solution would be the following: you could write a custom class that implements WritableComparable (let's call it MyCompositeFieldWritableComparable), that will store
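
The message above is cut off, so the following is only a minimal sketch of the composite-key idea it describes, not the author's full solution. It uses plain Java with no Hadoop dependency: the class name MyCompositeKey, its two fields, and the sorted() helper are hypothetical. The write and readFields methods mirror the shape of Hadoop's Writable methods, and in a real job the class would implement org.apache.hadoop.io.WritableComparable instead of plain Comparable.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical composite key: the original map key plus the part of the
// value that the reducer input should be sorted on.
final class MyCompositeKey implements Comparable<MyCompositeKey> {
    private String naturalKey = "";
    private long sortField;

    // No-arg constructor, needed when an instance is filled via readFields.
    MyCompositeKey() {}

    MyCompositeKey(String naturalKey, long sortField) {
        this.naturalKey = naturalKey;
        this.sortField = sortField;
    }

    String getNaturalKey() { return naturalKey; }

    long getSortField() { return sortField; }

    // Serializes both fields, in the same order readFields expects them.
    void write(DataOutput out) throws IOException {
        out.writeUTF(naturalKey);
        out.writeLong(sortField);
    }

    // Restores both fields from the serialized form.
    void readFields(DataInput in) throws IOException {
        naturalKey = in.readUTF();
        sortField = in.readLong();
    }

    // Orders by the natural key first, then by the value-derived field.
    @Override
    public int compareTo(MyCompositeKey other) {
        int byKey = naturalKey.compareTo(other.naturalKey);
        return byKey != 0 ? byKey : Long.compare(sortField, other.sortField);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MyCompositeKey)) {
            return false;
        }
        MyCompositeKey k = (MyCompositeKey) o;
        return naturalKey.equals(k.naturalKey) && sortField == k.sortField;
    }

    @Override
    public int hashCode() {
        return 31 * naturalKey.hashCode() + Long.hashCode(sortField);
    }

    // Helper for trying out the ordering outside of a MapReduce job.
    static List<MyCompositeKey> sorted(List<MyCompositeKey> keys) {
        List<MyCompositeKey> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }
}
```

With a key like this, records sharing a natural key reach the reducer ordered by the second field. In an actual job one would typically also partition and group on the natural key alone (for example via Job.setPartitionerClass and Job.setGroupingComparatorClass), so that all composite keys with the same natural key end up in the same reduce call.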

Re: product recommendations engine

2013-02-18 Thread Sofia Georgiakaki
Hello Douglass, you could take a look at the Mahout and Myrrix projects. These are two projects that provide implementations of machine learning algorithms for recommendations. There are MapReduce implementations as well, to support massive datasets. In addition, these systems provide client

Re: regarding the ioexception in java

2011-10-02 Thread Sofia Georgiakaki
Good morning! Check the name and path of your jar file again. I guess you are not spelling it correctly when you write the command, so hadoop cannot find it, as indicated by this message: Error opening job jar: hadoop-examples-0.20.203.0.jar Good luck, Sofia

many killed tasks, long execution time

2011-09-23 Thread Sofia Georgiakaki
Good morning! I would be grateful if anyone could help me with a serious problem that I'm facing. I am trying to run a hadoop job on a 12-node cluster (with a capacity of 48 tasks), and I have problems when dealing with big input data (10-20GB), which get worse when I increase the number of reducers.

Re: many killed tasks, long execution time

2011-09-23 Thread Sofia Georgiakaki
common-user@hadoop.apache.org; Sofia Georgiakaki geosofie_...@yahoo.com Sent: Friday, September 23, 2011 4:28 PM Subject: Re: many killed tasks, long execution time Can you include the complete stack trace of the IOException you are seeing? --Bobby Evans On 9/23/11 2:15 AM, Sofia Georgiakaki

Re: Is it possible to access the HDFS via Java OUTSIDE the Cluster?

2011-09-05 Thread Sofia Georgiakaki
Good evening, this topic seems very interesting. To be sure I understood the case: do you mean that I can write a simple Java program and access a file stored in HDFS from within the Java application? Assuming that I have e.g. 10 files of size 30GB each stored on HDFS on a cluster of 15

Re: Hadoop--store a sequence file in distributed cache?

2011-08-13 Thread Sofia Georgiakaki
: Joey Echeverria [mailto:j...@cloudera.com] Sent: Friday, August 12, 2011 6:28 AM To: common-user@hadoop.apache.org; Sofia Georgiakaki Subject: Re: Hadoop--store a sequence file in distributed cache? You can use any kind of format for files in the distributed cache, so yes you can use sequence

Re: Hadoop--store a sequence file in distributed cache?

2011-08-12 Thread Sofia Georgiakaki
-user@hadoop.apache.org; Sofia Georgiakaki geosofie_...@yahoo.com Sent: Friday, August 12, 2011 11:30 AM Subject: Re: Hadoop--store a sequence file in distributed cache? Hi Sofia, I assume that output of first job is stored on HDFS. In that case I would directly read file from Mappers without

TotalOrderPartitioner with new api - help

2011-08-03 Thread Sofia Georgiakaki
thesis, and I don't know whom I should ask for help. Thank you very much in advance, Sofia Georgiakaki, undergraduate student, department of Electronic Computer Engineering, Technical University of Crete, Greece

Re: TotalOrderPartitioner with new api - help

2011-08-03 Thread Sofia Georgiakaki
is possible that it will be updated to Hadoop 0.20.203. Will I have a problem using the old API then? Hadoop is confusing, I say. Thank you, Sofia Georgiakaki

how to use TotalOrderPartitioner

2011-07-29 Thread Sofia Georgiakaki
can set the different InputFormats... Could someone give me a helping hand please? Thank you in advance, Sofia Georgiakaki

cannot get configuration settings from API

2011-07-27 Thread Sofia Georgiakaki
Good afternoon, while writing a MapReduce job, I need to get the value of some configuration settings. For instance, I need to get the value of dfs.write.packet.size inside the reducer, so I write, using the context of the reducer: Configuration

Running queries using index on HDFS

2011-07-25 Thread Sofia Georgiakaki
Good evening, I have built an Rtree on HDFS, in order to improve the query performance of high-selectivity spatial queries. The Rtree is composed of a number of HDFS files (each one created by one Reducer, so that the number of files is equal to the number of reducers), where each file