Configuring hadoop in Azure on linux using Azure BLOB storage

2016-01-28 Thread Jakub Stransky
Hello, we are trying to configure hadoop HDP 2.2 running on the Azure cloud to use an Azure Storage BLOB instead of regular HDFS. The cluster is up and running, and we can list files in Azure blob storage via hadoop fs commands. But when trying to run the smoke test mapreduce teragen we are getting the following excep
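
For reference, a minimal core-site.xml sketch for the wasb filesystem (the account name, container name, and key below are placeholders, and this assumes a Hadoop build that ships the hadoop-azure module; none of it is quoted from the thread):

    <!-- core-site.xml: point the default filesystem at an Azure Blob container -->
    <property>
      <name>fs.defaultFS</name>
      <value>wasb://mycontainer@myaccount.blob.core.windows.net</value>
    </property>
    <!-- credentials for the storage account -->
    <property>
      <name>fs.azure.account.key.myaccount.blob.core.windows.net</name>
      <value>YOUR_STORAGE_ACCOUNT_KEY</value>
    </property>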

Re: Capacity scheduler properties

2015-01-15 Thread Jakub Stransky
Wow, pretty awesome documentation! Thx On 15 January 2015 at 19:53, Wangda Tan wrote: > You can check HDP 2.2's document: > > http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/capacity_scheduler/index.html > > HTH, > Wangda > > On Thu, Jan 15, 20

Capacity scheduler properties

2015-01-15 Thread Jakub Stransky
Hello, I am configuring the capacity scheduler. All seems OK, but I cannot find the meaning of the following property: yarn.scheduler.capacity.root.unfunded.capacity. I just found that it is set to 50 everywhere and the description is "No description". Can anybody clarify or point to where to find rel
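
For context, the capacity-scheduler.xml properties that do carry documented meaning follow this pattern (queue names and values below are placeholder examples; the unfunded entry asked about above appears to be a distribution default rather than a core scheduler setting):

    <!-- capacity-scheduler.xml: declare the child queues of root -->
    <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>default</value>
    </property>
    <!-- guaranteed share of cluster capacity for the default queue, in percent -->
    <property>
      <name>yarn.scheduler.capacity.root.default.capacity</name>
      <value>100</value>
    </property>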

Memory consumption by AM

2014-10-23 Thread Jakub Stransky
Hello experienced users, we are new to hadoop, hence using a nearly default configuration, including the scheduler - which I guess is the Capacity Scheduler by default. Lately we were confronted with the following behaviour on the cluster. We are using apache oozie for job submission of various data pipes. We ha
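
One setting often relevant when many application masters (e.g. Oozie launcher jobs, each of which runs its own AM) eat into cluster memory is the Capacity Scheduler's AM resource cap. A hedged sketch; the 0.2 is an arbitrary example, not a value from the thread:

    <!-- capacity-scheduler.xml: fraction of cluster resources that may be
         occupied by application masters (default 0.1, i.e. 10%) -->
    <property>
      <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
      <value>0.2</value>
    </property>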

Re: How to limit the number of containers requested by a pig script?

2014-10-21 Thread Jakub Stransky
then we can, I think. We do have this property mapreduce.job.maps. > > Regards, > Shahab > > On Tue, Oct 21, 2014 at 2:42 AM, Jakub Stransky > wrote: > >> Hello, >> >> as far as I understand, the number of mappers is not something you can drive directly. The number of >> reducers y
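
A sketch of what in-script limits can look like in Pig (the numbers are arbitrary examples, not values from the thread): reducer parallelism is directly settable, while the mapper count follows from the input splits, so it can only be lowered indirectly by enlarging the splits:

    -- reducers: set the default parallelism for reduce-side operators
    set default_parallel 4;
    -- mappers: raise the minimum split size (bytes) so fewer map tasks are created
    set mapreduce.input.fileinputformat.split.minsize 268435456;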

Re: How to limit the number of containers requested by a pig script?

2014-10-20 Thread Jakub Stransky
requested (and used of course) by my pig-script (not as a yarn queue > configuration or some such stuff.. I want to limit it from outside on a > per-job basis. I would ideally like to set the number in my pig-script.) > Can I do this? > Thanks, > Sunil. > -- Jakub Stransky cz.linkedin.com/in/jakubstransky

Re: how to copy data between two hdfs cluster fastly?

2014-10-17 Thread Jakub Stransky
Distcp? On 17 Oct 2014 20:51, "Alexander Pivovarov" wrote: > try to run on dest cluster datanode > $ hadoop fs -cp hdfs://from_cluster/ hdfs://to_cluster/ > > > > On Fri, Oct 17, 2014 at 11:26 AM, Shivram Mani wrote: >> What is your approx input size ? >> Do you have multiple files
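
For the archive, the shape of a distcp invocation (host names, port, and paths are placeholders; -m caps the number of map tasks, i.e. parallel copiers):

    $ hadoop distcp -m 20 hdfs://source-nn:8020/data hdfs://dest-nn:8020/data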

Cannot find profiling log file

2014-09-23 Thread Jakub Stransky
Hello experienced users, I tried to use profiling of tasks during mapreduce: mapreduce.task.profile=true, mapreduce.task.profile.maps=0-5, mapreduce.task.profile.params=-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,
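
Reconstructed as a mapred-site.xml (or per-job) sketch. The excerpt cuts off after thread=y, so the tail of the agent string below (verbose=n,file=%s, where the framework substitutes the per-task profile output path) follows the stock Hadoop default rather than the thread:

    <property>
      <name>mapreduce.task.profile</name>
      <value>true</value>
    </property>
    <!-- profile only the first few map task attempts -->
    <property>
      <name>mapreduce.task.profile.maps</name>
      <value>0-5</value>
    </property>
    <property>
      <name>mapreduce.task.profile.params</name>
      <value>-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s</value>
    </property>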

Re: CPU utilization

2014-09-12 Thread Jakub Stransky
u run a reduce task, you need 1024 MB (mapreduce.reduce.memory.mb). > If you run the MapReduce app master, you need 1024 MB ( > yarn.app.mapreduce.am.resource.mb). > > Therefore, when you run a MapReduce job, you can run only 2 containers per > NodeManager (3 x 768 = 2304 > 2048) on your setup. > > 2014-09
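
Spelled out, the container arithmetic from this reply (NodeManager capacity 2048 MB, container sizes as quoted in the thread):

    AM container                 1024 MB
    + one map container           768 MB -> 1792 MB <= 2048 MB, fits
    + another 768 MB container           -> 2560 MB >  2048 MB, does not fit
    => at most 2 containers per NodeManager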

Re: CPU utilization

2014-09-12 Thread Jakub Stransky
ee higher CPU utilization than 30%. > > Cheers! > Adam > > 2014-09-12 17:51 GMT+02:00 Jakub Stransky : > >> Hello experienced hadoop users, >> >> I have a beginner's question regarding cpu utilization on datanodes when >> running an MR job. Cluster of 5 mach

Re: Enable Debug logging for a job

2014-09-12 Thread Jakub Stransky
e your response. > > Thanks, > Siddhi > > > -- Jakub Stransky cz.linkedin.com/in/jakubstransky
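
On this thread's subject, a sketch of requesting DEBUG logging on a per-job basis (assuming the driver goes through ToolRunner/GenericOptionsParser so that -D options are picked up; job.jar and MyJob are hypothetical names):

    $ hadoop jar job.jar MyJob \
        -Dmapreduce.map.log.level=DEBUG \
        -Dmapreduce.reduce.log.level=DEBUG \
        -Dyarn.app.mapreduce.am.log.level=DEBUG \
        input output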

CPU utilization

2014-09-12 Thread Jakub Stransky
Hello experienced hadoop users, I have a beginner's question regarding cpu utilization on datanodes when running an MR job. Cluster of 5 machines, 2 NN + 3 DN, really inexpensive hw, using the following parameters: # hadoop - yarn-site.xml yarn.nodemanager.resource.memory-mb : 2048 yarn.scheduler.minimum-a
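
The flattened settings above, restored to their usual shape. The 2048 is quoted from the message; the remaining values are examples consistent with the replies in this thread rather than confirmed settings:

    <!-- yarn-site.xml: memory the NodeManager offers to containers -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>2048</value>
    </property>
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>512</value>
    </property>
    <!-- mapred-site.xml: per-task container sizes -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>768</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>1024</value>
    </property>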

task slowness

2014-09-11 Thread Jakub Stransky
Hello experienced hadoop users, I have a data pipeline consisting of two java MR jobs coordinated by the oozie scheduler. Both of them process the same data, but the first one is more than 10 times slower than the second one. Job counters on the RM page are not much help in this matter. I have verified

Re: virtual memory consumption

2014-09-11 Thread Jakub Stransky
map memory as 768M and reduce memory as 1024M and am as > 1024M. > > With AM and a single map task it is 1.7G and cannot start another > container for the reducer. > Reduce these values and check. > > On 9/11/14, Jakub Stransky wrote: > > Hello hadoop users, > > >

virtual memory consumption

2014-09-11 Thread Jakub Stransky
Hello hadoop users, I am facing the following issue when running an M/R job during the reduce phase: Container [pid=22961,containerID=container_1409834588043_0080_01_10] is running beyond virtual memory limits. Current usage: 636.6 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used.

running beyond virtual memory limits

2014-09-10 Thread Jakub Stransky
Hello, I am getting the following error when running on a 500MB dataset compressed in the avro data format. Container [pid=22961,containerID=container_1409834588043_0080_01_10] is running beyond virtual memory limits. Current usage: 636.6 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory
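
The numbers in this error (and in the virtual memory consumption thread above) line up with YARN's defaults: the NodeManager multiplies each container's physical allocation (1 GB here) by yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1, hence the 2.1 GB virtual ceiling. A hedged yarn-site.xml sketch of the two usual remedies (the 3.5 is an arbitrary example):

    <!-- raise the virtual-to-physical memory ratio used by the check -->
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>3.5</value>
    </property>
    <!-- or switch the virtual memory check off entirely -->
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>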

Error could only be replicated to 0 nodes instead of minReplication (=1)

2014-08-28 Thread Jakub Stransky
Hello, we are using Hadoop 2.2.0 (HDP 2.0), avro 1.7.4, running on CentOS 6.3. I am facing the following issue when using AvroMultipleOutputs with dynamic output files. My M/R job works fine for a smaller amount of data, or at least the error hasn't appeared there so far. With a bigger amount of data I
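
One frequently cited trigger for this error when a job holds many HDFS writers open at once (as AvroMultipleOutputs with dynamic output files does) is exhausting the datanodes' transfer-thread limit. A sketch of the knob, offered as a possible lead rather than a confirmed diagnosis for this thread:

    <!-- hdfs-site.xml: concurrent transfer threads per datanode
         (default 4096; known in older releases as dfs.datanode.max.xcievers) -->
    <property>
      <name>dfs.datanode.max.transfer.threads</name>
      <value>8192</value>
    </property>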