Free some space on data volume

2015-02-16 Thread Georgi Ivanov
Hi, I need to free some space on one of the data directories on one server. I have 2 data volumes: /data/1 and /data/2. One of the hard drives was broken, and after we replaced it, I ran a rebalance. The problem is I forgot to change the policy to FreeSpace (instead of the default round-robin). S
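
For context, the policy the post refers to corresponds to Apache Hadoop's AvailableSpaceVolumeChoosingPolicy, which is configured per datanode in hdfs-site.xml rather than in client code. Below is a minimal Java sketch that only reads back the relevant keys from a loaded configuration; the property names are taken from Apache Hadoop's documentation, not from the thread, and they only take effect when set in the datanode's hdfs-site.xml.

```java
import org.apache.hadoop.conf.Configuration;

public class VolumePolicyCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource("hdfs-site.xml");

        // Which policy the datanode uses when choosing a volume for a new block
        // (unset means the default round-robin behaviour).
        System.out.println("policy = " + conf.get(
                "dfs.datanode.fsdataset.volume.choosing.policy",
                "<unset: round-robin default>"));

        // Tuning knobs of AvailableSpaceVolumeChoosingPolicy: how uneven the
        // volumes may get before the emptier one is preferred, and how strongly.
        System.out.println("threshold = " + conf.get(
                "dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold"));
        System.out.println("preference = " + conf.get(
                "dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction"));
    }
}
```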

CDH 4.7 upgrade java 6 to java 7

2014-11-17 Thread Georgi Ivanov
Hi, I am planning to upgrade the Java version to Java 7. I am using Ubuntu Server. Here is what I'm planning to do: 1. Stop all Hadoop services in Cloudera Manager. 2. Update the Java version on all nodes (update-alternatives --set java /path/to/java). 3. Start all Hadoop services in Cloudera Manager. 4
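
Not from the thread, but a trivial way to double-check what the switch actually did on each node: a tiny Java class that reports the JVM it runs under, executed with the same `java` binary that update-alternatives now points at.

```java
public class JavaVersionCheck {
    public static void main(String[] args) {
        // After `update-alternatives --set java ...`, this should report 1.7.x
        // and a java.home pointing at the Java 7 installation.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
    }
}
```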

Re: HDFS multiple dfs_data_dir disbalance

2014-10-22 Thread Georgi Ivanov
Reddy Battula ____ From: Georgi Ivanov [iva...@vesseltracker.com] Sent: Wednesday, October 22, 2014 5:17 PM To: user@hadoop.apache.org Subject: HDFS multiple dfs_data_dir disbalance Hi, My cluster is configured with 2 data dirs: /data/1 and /data/2. Usually Hadoop is

HDFS multiple dfs_data_dir disbalance

2014-10-22 Thread Georgi Ivanov
Hi, My cluster is configured with 2 data dirs: /data/1 and /data/2. Usually Hadoop balances the utilization of these dirs. Now I have one node where /data/1 is 100% full and /data/2 is not. Is there anything I can do about this, as it results in failed mappers/reducers? Georgi
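
As a possible monitoring aid (not a fix, and not from the thread): a small Java check of how full each configured data dir is, using the /data/1 and /data/2 paths mentioned in the post.

```java
import java.io.File;

public class DataDirUsage {
    public static void main(String[] args) {
        // Paths taken from the post; in practice read them from dfs.datanode.data.dir.
        for (String dir : new String[] {"/data/1", "/data/2"}) {
            File f = new File(dir);
            long total = f.getTotalSpace();
            long usable = f.getUsableSpace();
            double pctUsed = total == 0 ? 0.0 : 100.0 * (total - usable) / total;
            System.out.printf("%s: %.1f%% used, %d GB free%n",
                    dir, pctUsed, usable / (1024L * 1024 * 1024));
        }
    }
}
```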

Re: Bzip2 files as an input to MR job

2014-09-22 Thread Georgi Ivanov
cture inside the AVRO and these blocks are gzipped. I suggest you simply try it. Niels On Mon, Sep 22, 2014 at 4:40 PM, Georgi Ivanov <iva...@vesseltracker.com> wrote: Hi guys, I would like to compress the files on HDFS to save some storage. As far as I see, bzip2
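
To illustrate the reply's point that compression in Avro happens per block inside the container file (which therefore stays splittable), here is a minimal sketch of writing a deflate-compressed Avro file; the schema and file name are invented for the example.

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroDeflateExample {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"long\"},"
          + "{\"name\":\"payload\",\"type\":\"string\"}]}");

        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema));
        // Each Avro block is compressed individually, so the container file
        // remains splittable for MapReduce even though it is compressed.
        writer.setCodec(CodecFactory.deflateCodec(6));
        writer.create(schema, new File("events.avro"));

        GenericRecord r = new GenericData.Record(schema);
        r.put("id", 1L);
        r.put("payload", "hello");
        writer.append(r);
        writer.close();
    }
}
```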

Bzip2 files as an input to MR job

2014-09-22 Thread Georgi Ivanov
Hi guys, I would like to compress the files on HDFS to save some storage. As far as I see, bzip2 is the only format which is splittable (and slow). The actual files are Avro. So in my driver class I have: job.setInputFormatClass(AvroKeyInputFormat.class); I have a number of jobs running processi
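
For completeness, a driver sketch around the quoted setInputFormatClass line (mapper/reducer wiring and paths are placeholders, not from the thread). The relevant point is that Avro container files split on their internal block boundaries, so a separate splittable codec such as bzip2 is not needed to keep the job parallel.

```java
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AvroDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "avro input example");
        job.setJarByClass(AvroDriver.class);

        // The line quoted in the thread: Avro container files are read with
        // AvroKeyInputFormat and are split on Avro block boundaries.
        job.setInputFormatClass(AvroKeyInputFormat.class);

        // setMapperClass / setReducerClass, Avro schemas and output types are
        // omitted here; input/output paths are placeholders.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```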

Re: Re-sampling time data with MR job. Ideas

2014-09-19 Thread Georgi Ivanov
the individual entities or do you still need all individual entities and just want to translate the timestamp to another resolution (5s => 10 min)? Cheers, Mirko 2014-09-19 9:17 GMT+01:00 Georgi Ivanov <iva...@vesseltracker.com>: Hello, I have time-related data

Re-sampling time data with MR job. Ideas

2014-09-19 Thread Georgi Ivanov
Hello, I have time-related data like this: entity_id, timestamp, data. The resolution of the data is something like 5 seconds. I want to extract the data with 10-minute resolution. So what I can do is: just emit everything in the mapper, as the data is not sorted there. Emit only every 10 minutes
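
One way to express the "emit everything, keyed by a coarser timestamp" idea is a mapper that floors each timestamp to its 10-minute bucket; the CSV layout and the timestamp unit (seconds) are assumptions for the sketch, not confirmed by the thread.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// key = entity_id plus the timestamp truncated to 10-minute resolution,
// so one reduce call sees all records of one entity for one bucket.
public class ResampleMapper extends Mapper<LongWritable, Text, Text, Text> {
    private static final long BUCKET_SECONDS = 600;  // 10 minutes

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split(",", 3);
        if (fields.length < 3) {
            return; // skip malformed lines
        }
        long ts = Long.parseLong(fields[1].trim());
        long bucket = ts - (ts % BUCKET_SECONDS);     // floor to the 10-minute bucket
        context.write(new Text(fields[0] + "\t" + bucket),
                      new Text(fields[2]));
    }
}
```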

Re: Regular expressions in fs paths?

2014-09-10 Thread Georgi Ivanov
Yes you can: hadoop fs -ls /tmp/myfiles* I would recommend first using -ls in order to verify you are selecting the right files. #Mahesh: do you need some help doing this? On 10.09.2014 13:46, Mahesh Khandewal wrote: I want to unsubscribe from this mailing list On Wed, Sep 10, 2014 at
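
The same glob (HDFS path matching uses glob patterns rather than full regular expressions) is also available programmatically; a minimal Java equivalent of the quoted -ls command using FileSystem.globStatus:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobList {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Same pattern as the CLI example: hadoop fs -ls /tmp/myfiles*
        FileStatus[] matches = fs.globStatus(new Path("/tmp/myfiles*"));
        if (matches != null) {
            for (FileStatus status : matches) {
                System.out.println(status.getPath());
            }
        }
        fs.close();
    }
}
```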

HDFS balance

2014-09-03 Thread Georgi Ivanov
Hi, We have an 11-node cluster. Every hour a cron job is started to upload one file (~1 GB) to Hadoop on node1 (plain hadoop fs -put). This way node1 is getting full because the first replica is always stored on the node where the command is executed. Every day I am running a re-balance, but this seems
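
Because the first replica lands on the local datanode whenever the client is itself a datanode, the usual workaround is to run the hourly upload from a machine that is not a datanode (an edge/gateway node). A minimal Java equivalent of the cron job's put, with source and destination passed as arguments; the class and argument names are illustrative, not from the thread.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HourlyUpload {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Equivalent of `hadoop fs -put`. Run from a non-datanode client so the
        // first replicas are spread across the cluster instead of filling node1.
        fs.copyFromLocalFile(new Path(args[0]), new Path(args[1]));
        fs.close();
    }
}
```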