Re: HDFS file missing a part-file

2012-10-02 Thread Björn-Elmar Macek
On 01.10.2012 22:36, Björn-Elmar Macek wrote: The script I now want to execute looks like this: x = load 'tag_count_ts_pro_userpair' as (group:tuple(),cnt:int,times:bag{t:tuple(c:chararray)}); y = foreach x generate *, moins.daysFromStart('2011-06-01 00:00:00', times); store y into 'test_daysFromStart
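
The snippet above calls a UDF `moins.daysFromStart` on a start timestamp and a bag of event timestamps. The source does not show the UDF itself, so as an illustration only, here is a minimal plain-Java sketch of the date arithmetic such a function would presumably perform (the class name and the whole-days semantics are assumptions, not the author's actual code):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of the core logic a daysFromStart UDF might use:
// the number of whole days between a fixed start timestamp and an event timestamp.
public class DaysFromStart {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    public static long daysFromStart(String start, String event) {
        LocalDateTime s = LocalDateTime.parse(start, FMT);
        LocalDateTime e = LocalDateTime.parse(event, FMT);
        // ChronoUnit.DAYS truncates toward zero, i.e. whole elapsed days.
        return ChronoUnit.DAYS.between(s, e);
    }
}
```

In the actual Pig setup this logic would be wrapped in an EvalFunc and applied to each tuple in the `times` bag.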

HDFS file missing a part-file

2012-10-01 Thread Björn-Elmar Macek
Hi, I am kind of unsure where to post this problem, but I think it is more related to Hadoop than to Pig. By successfully executing a Pig script I created a new file in my HDFS. Sadly though, I cannot use it for further processing except for dumping and viewing the data: every

Re: HDFS file missing a part-file

2012-10-01 Thread Björn-Elmar Macek
) at org.apache.hadoop.mapred.Child.main(Child.java:249) On Mon, 1 Oct 2012 10:12:22 -0700, Robert Molina rmol...@hortonworks.com wrote: Hi Bjorn, Can you post the exception you are getting during the map phase? On Mon, Oct 1, 2012 at 9:11 AM, Björn-Elmar Macek wrote: Hi, I am kind of unsure where

Re: HDFS file missing a part-file

2012-10-01 Thread Björn-Elmar Macek
correctly on HDFS. Can you provide the Pig script you are trying to run? Also, for the original script that ran and generated the file, can you verify whether that job had any failed tasks? On Mon, Oct 1, 2012 at 10:31 AM, Björn-Elmar Macek wrote: Hi Robert, the exception I see in the output

Re: Join-package combiner number of input and output records the same

2012-09-25 Thread Björn-Elmar Macek
Hi, I had this problem once too. Did you properly override the reduce method with the @Override annotation? Does your reduce method use OutputCollector or Context for gathering outputs? If you are using the current version, it has to be Context. The thing is: if you do NOT override the standard
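
The pitfall described above — a reduce method whose signature does not exactly match the base class, so Java treats it as an overload and the default identity implementation runs — can be demonstrated without any Hadoop dependency. This is an illustrative analogue, not the poster's actual code:

```java
import java.util.Arrays;

// Self-contained analogue of the @Override pitfall: a mismatched signature
// silently OVERLOADS instead of overriding, so the default (pass-through)
// behavior runs and input and output record counts stay identical.
class BaseReducer {
    // Default behavior: identity, like Hadoop Reducer's default reduce().
    public String reduce(String key, Iterable<Integer> values) {
        return key; // passes records through unchanged
    }
}

class SummingReducer extends BaseReducer {
    @Override // compiler verifies this really overrides the base method
    public String reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return key + "=" + sum;
    }
}

class BrokenReducer extends BaseReducer {
    // Extra parameter: this is an overload, never called via the base class.
    // Adding @Override here would turn the mistake into a compile error.
    public String reduce(String key, Iterable<Integer> values, int unused) {
        return "never reached through the base-class interface";
    }
}
```

Calling `reduce` through a `BaseReducer` reference on a `BrokenReducer` instance falls back to the identity behavior, which matches the symptom in the subject line: combiner input and output record counts are the same.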

Re: Join-package combiner number of input and output records the same

2012-09-25 Thread Björn-Elmar Macek
-Elmar Macek ma...@cs.uni-kassel.de: Hi, I had this problem once too. Did you properly override the reduce method with the @Override annotation? Does your reduce method use OutputCollector or Context for gathering outputs? If you are using the current version, it has to be Context. The thing

Re: mortbay, huge files and the ulimit

2012-09-05 Thread Björn-Elmar Macek
(Text.class); FileInputFormat.addInputPath(job, new Path(otherArgs[0])); FileOutputFormat.setOutputPath(job, new Path(otherArgs[1])); System.exit(job.waitForCompletion(true) ? 0 : 1); } } On 05.09.2012 13:56, Björn-Elmar Macek wrote: Hello again, I just wanted to keep you updated, in case

Re: mortbay, huge files and the ulimit

2012-09-05 Thread Björn-Elmar Macek
people use a rule of thumb of x4 to get the approximate memory requirement. Just some ideas, not really a solution, but maybe it helps you further. On Wed, Sep 5, 2012 at 2:02 PM, Björn-Elmar Macek ma...@cs.uni-kassel.de wrote: Excuse me: in my last code section some old code was included. Here it is again

Re: mortbay, huge files and the ulimit

2012-09-05 Thread Björn-Elmar Macek
if it is possible to say anything about how much useful work the program is still doing. On Wed, Sep 5, 2012 at 2:48 PM, Björn-Elmar Macek ma...@cs.uni-kassel.de wrote: Hi Vasco, thank you for your help! I can try to add the limit again (I currently have it turned off for all Java processes spawned

Re: best way to join?

2012-09-04 Thread Björn-Elmar Macek
Hi Dexter, I think what you want is a clustering of points based on the Euclidean distance, or density-based clustering ( http://en.wikipedia.org/wiki/Cluster_analysis ). I bet some of these are already implemented quite well in Mahout: AFAIK this is the data-mining framework based on Hadoop.
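
For concreteness, the Euclidean distance underlying the clustering suggestion above, plus a nearest-centroid assignment step (the core of k-means, which Mahout implements on top of Hadoop), can be sketched in a few lines of plain Java. All names here are illustrative:

```java
// Euclidean distance between two points, and assignment of a point to the
// nearest centroid -- the basic building blocks of distance-based clustering.
public class Euclid {
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d; // accumulate squared coordinate differences
        }
        return Math.sqrt(sum);
    }

    // Index of the centroid closest to the given point.
    public static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = distance(point, centroids[i]);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }
}
```

In a MapReduce formulation of k-means, each mapper performs this nearest-centroid assignment and the reducers recompute centroid means.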

Re: mortbay, huge files and the ulimit

2012-08-30 Thread Björn-Elmar Macek
) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) On 29.08.12 15:53, Björn-Elmar Macek wrote: Hi there, I am currently running a job where I self-join a 63-gigabyte CSV file

JobTracker assigns TaskTracker role to a server of the cluster that it should not use...

2012-08-22 Thread Björn-Elmar Macek
Hi all, well, since I now have all servers continuously running for my job, I still encounter problems: although all services seem to be up and no errors are produced, I seem to be stuck in the map process at a certain percentage. I am not yet sure that just letting the cluster run may solve

Re: DataNode and TaskTracker communication

2012-08-20 Thread Björn-Elmar Macek
OK, to give you the solution to the namespace errors on the datanodes, the startup, and the communication problem between datanodes/tasktrackers and namenode/jobtracker, I did the following: as you can read on several sites, there are two strategies for fixing datanode namespaces. Since I like

Re: DataNode and TaskTracker communication

2012-08-14 Thread Björn-Elmar Macek
Hi James, thank you for your reply! I tried to, but I can only see my own processes, since I am not a root user. :( I have already sent out a request to the cluster admins to sort this out for me. Regards, Björn On 14.08.2012 08:51, James Brown wrote: Hi Bjorn, For the two items below, it is

DataNode and TaskTracker communication

2012-08-13 Thread Björn-Elmar Macek
Hi, I am currently trying to run my Hadoop program on a cluster. Sadly though, my datanodes and tasktrackers seem to have difficulties with their communication, as their logs say: * Some datanodes and tasktrackers seem to have port problems of some kind, as can be seen in the logs below. I

OutputValueGroupingComparator gets strange inputs (topic changed from Logs cannot be created)

2012-08-09 Thread Björn-Elmar Macek
Hi again, this is a direct response to my previous posting with the title Logs cannot be created, where logs could not be created (Spill failed). I got the hint that I should check privileges, but that was not the problem, because I own the folders that were used for this. I finally found

Namenode and Jobtracker dont start

2012-07-18 Thread Björn-Elmar Macek
Hi, I have lately been running into problems since I started running Hadoop on a cluster. The setup is the following: 1 computer is NameNode and JobTracker, 1 computer is SecondaryNameNode, 2 computers are TaskTracker and DataNode. I ran into problems with running the wordcount example:

Re: Hadoop Debugging in LocalMode (Breakpoints not reached)

2012-05-25 Thread Björn-Elmar Macek
On 23.05.2012 10:47, Björn-Elmar Macek wrote: OK, I have looked at the logs some further and googled every tiny bit of them, hoping to find an answer out there. I fear that the following line nails my problem at a big scale: 12/05/22 01:30:21 INFO mapred.ReduceTask: attempt_local_0001_r_00_0

Re: Hadoop Debugging in LocalMode (Breakpoints not reached)

2012-05-23 Thread Björn-Elmar Macek
* The Partitioner always returns proper values. Please, I would really need a hint as to where I have to look. On 22.05.2012 16:57, Björn-Elmar Macek wrote: Hi Jayaseelan, thanks for the bump! ;) I have continued working on the problem, but with no further success. I emptied the log directory and started

Hadoop Debugging in LocalMode (Breakpoints not reached)

2012-05-22 Thread Björn-Elmar Macek
Hi there, I am currently trying to get rid of bugs in my Hadoop program by debugging it. Everything went fine until some point yesterday. I don't know what exactly happened, but my program does not stop at breakpoints within the Reducer and also not within the RawComparator for the values

Is the order of collected outputs in the map step preserved til the reduce step?

2012-05-10 Thread Björn-Elmar Macek
Hello all, I am currently working with a set of data which is chronologically ordered (every data element has a timestamp and the timestamps are monotonically increasing). Please correct me if I am mistaken, but the data should arrive chronologically ordered at the mapper, right? But is the order in
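
As background to the question above: Hadoop sorts map output by key only, so the order of values within a single reduce call is not guaranteed. The standard remedy is a "secondary sort": put the timestamp into a composite key and sort on both fields. This dependency-free Java sketch shows the comparator such a setup would use (the field names are illustrative, not from the original thread):

```java
import java.util.Comparator;

// Secondary-sort sketch: a composite key carrying the natural key plus a
// timestamp, compared first by key and then chronologically within a key.
// In Hadoop this comparator logic would live in a WritableComparator.
public class SecondarySort {
    static final class CompositeKey {
        final String user;     // natural (grouping) key
        final long timestamp;  // secondary sort field

        CompositeKey(String user, long timestamp) {
            this.user = user;
            this.timestamp = timestamp;
        }
    }

    // Sort by natural key first, then by timestamp within each key.
    static final Comparator<CompositeKey> CMP =
            Comparator.comparing((CompositeKey k) -> k.user)
                      .thenComparingLong(k -> k.timestamp);
}
```

With a matching grouping comparator that looks only at the natural key, each reducer then receives its values in timestamp order.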

Hadoop Configuration Issues

2012-04-27 Thread Björn-Elmar Macek
I read on Hadoop never discussed these issues. BTW: HADOOP_HOME is defined, although the log says otherwise. I hope you can assist me. Best regards, Björn-Elmar Macek

Re: Hadoop Configuration Issues

2012-04-27 Thread Björn-Elmar Macek
://mapredit.blogspot.com On Apr 27, 2012, at 12:01 PM, Björn-Elmar Macek wrote: Hello, I have recently installed Hadoop on my machine and a second one in order to test the setup and develop little programs locally before deploying them to the cluster. I stumbled over several difficulties, which I could fix with some

Re: Hadoop Configuration Issues

2012-04-27 Thread Björn-Elmar Macek
would suggest you use the default configs: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/conf/ - Alex -- Alexander Lorenz http://mapredit.blogspot.com On Apr 27, 2012, at 12:39 PM, Björn-Elmar Macek wrote: Hi Alex, as I have written, I already