finish elementary school first. (plus, minus operations at least)
On Thu, Jan 10, 2013 at 7:23 PM, Panshul Whisper ouchwhis...@gmail.com wrote:
Thank you for the response.
Actually it is not a single file, I have JSON files that amount to 115 GB,
these JSON files need to be processed and
it's enough. Hadoop uses only 1 GB RAM by default.
On Sat, Jan 18, 2014 at 10:11 PM, sri harsha rsharsh...@gmail.com wrote:
Hi,
I want to install a 4-node cluster on 64-bit Linux. Is 4 GB RAM and a 500 GB
HDD per node enough, or do I need to expand?
Please advise.
Thanks
--
amiable
Probably a permissions issue.
On Thu, Jul 31, 2014 at 11:32 AM, Houston King houston.k...@gmail.com
wrote:
Hey Everyone,
I'm a noob working to set up my first 13-node Hadoop 2.4.0 cluster, and
I've run into some problems that I'm having a heck of a time debugging.
I've been following the
The max Java array size is Integer.MAX_VALUE, so a byte array cannot be bigger than 2 GB.
On Aug 22, 2014 1:41 PM, Yuriy yuriythe...@gmail.com wrote:
The Hadoop Writable interface relies on the public void write(DataOutput out)
method.
It looks like behind the DataOutput interface, Hadoop uses DataOutputStream,
which ... That, at least, explains the problem. And what
should be the workaround if the combined set of data is larger than 2 GB?
On Fri, Aug 22, 2014 at 1:50 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
The max Java array size is Integer.MAX_VALUE, so a byte array cannot be bigger than 2 GB.
On Aug 22, 2014 1:41 PM
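A possible workaround (a hedged sketch, not from the thread): keep the payload
in multiple smaller byte arrays and serialize them chunk by chunk, so no single
array hits the 2 GB cap. The class and field names below are illustrative.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Writable;

// Stores a payload bigger than 2 GB as a list of smaller chunks.
public class ChunkedBytesWritable implements Writable {
  private final List<byte[]> chunks = new ArrayList<byte[]>();

  public void addChunk(byte[] chunk) { chunks.add(chunk); }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(chunks.size());
    for (byte[] c : chunks) {
      out.writeInt(c.length);
      out.write(c); // each individual array stays under Integer.MAX_VALUE
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    chunks.clear();
    int n = in.readInt();
    for (int i = 0; i < n; i++) {
      byte[] c = new byte[in.readInt()];
      in.readFully(c);
      chunks.add(c);
    }
  }
}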
e.g. in Hive, to switch engines:
set hive.execution.engine=mr;
or
set hive.execution.engine=tez;
Tez is faster, especially on complex queries.
On Aug 31, 2014 10:33 PM, Adaryl Bob Wakefield, MBA
adaryl.wakefi...@hotmail.com wrote:
Can Tez and MapReduce live together and get along in the same
It can read/write in parallel to all drives. More HDDs means more I/O speed.
On Sep 27, 2014 7:28 AM, Susheel Kumar Gadalay skgada...@gmail.com
wrote:
Correct me if I am wrong.
Adding multiple directories will not balance the file distribution
across these locations.
Hadoop will exhaust the
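For context, a hedged sketch of how multiple datanode directories are listed in
hdfs-site.xml (the paths are illustrative); by default the datanode round-robins
new blocks across the volumes rather than rebalancing existing files:

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data1/dfs/dn,/data2/dfs/dn,/data3/dfs/dn</value>
</property>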
AMPLab, where Spark was created, did some benchmarks.
https://amplab.cs.berkeley.edu/benchmark/
On Fri, Oct 17, 2014 at 11:06 AM, Adaryl Bob Wakefield, MBA
adaryl.wakefi...@hotmail.com wrote:
Does anybody have any performance figures on how Spark stacks up
against Tez? If you don’t have figures, does
It's going to be a Spark engine for Hive (in addition to mr and tez).
The Spark API is available for Java and Python as well.
The Tez engine is available now and it's quite stable. As for speed: for
complex queries it shows a 10x-20x improvement in comparison to the mr engine.
e.g. one of my queries runs 30
Are RHEL7 based OSs supported?
On Wed, Oct 29, 2014 at 3:59 PM, David Novogrodsky
david.novogrod...@gmail.com wrote:
All,
I am new to Hadoop so any help would be appreciated.
I have a question for the mailing list regarding Hadoop. I have installed
the most recent stable version (2.4.1)
2 boxes for 2 NNs (dedicated boxes are better)
min 3 JNs
min 3 ZKs
JNs and ZKs can share boxes with other services
On Wed, Nov 5, 2014 at 11:31 PM, Oleg Ruchovets oruchov...@gmail.com
wrote:
Hello.
We are using the Hortonworks distribution and want to evaluate HA capabilities.
Can the community please
Thank you for the link.
Just to be sure - can JNs be installed on data nodes, like ZooKeeper?
If we have 2 Name Nodes and 15 Data Nodes - is it correct to install ZK
and JN on the datanode machines?
Thanks
Oleg.
On Thu, Nov 6, 2014 at 5:06 PM, Alexander Pivovarov apivova...@gmail.com
wrote
for a balanced conf you need (per core)
1-1.5 x 2 TB 7200 RPM SATA HDDs for HDFS (in JBOD mode, not RAID)
3-4 GB ECC RAM
reserve 4 GB RAM for the OS
better to use a separate HDD or USB stick for the OS
e.g. for 16 cores you can use
16-24 x 2 TB HDDs
64 GB RAM (if planning to use Apache Spark, put in 128 GB)
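(As a worked check of the per-core math, not in the original message: 16 cores
x 3-4 GB/core = 48-64 GB for tasks, plus the 4 GB OS reserve, which matches the
64 GB figure above; and 16 cores x 1-1.5 disks/core = 16-24 HDDs.)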
On Sun,
I found that the easiest way is to put the UDF jar into /usr/lib/hadoop-mapred
on all computers in the cluster. The Hive CLI, HiveServer2, the Oozie launcher,
Oozie Hive actions, and MR will then see the jar. I'm using HDP-2.1.5.
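A hedged sketch of doing that with scp (the hostnames and jar name are
illustrative):

for h in node01 node02 node03; do
  scp my-udf.jar "$h":/usr/lib/hadoop-mapred/
done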
On Dec 30, 2014 10:58 PM, reena upadhyay reena2...@gmail.com wrote:
Hi,
I am using
try
mvn package -Pdist -Dtar -DskipTests
On Wed, Feb 11, 2015 at 2:02 PM, Lucio Crusca lu...@sulweb.org wrote:
Hello everybody,
I'm absolutely new to Hadoop and a customer asked me to build version 2.6
for Windows Server 2012 R2. I'm a Java programmer myself, among other
things, but I've
\target\.
https://wiki.apache.org/hadoop/Hadoop2OnWindows
https://svn.apache.org/viewvc/hadoop/common/branches/branch-2/BUILDING.txt?view=markup
On Wed, Feb 11, 2015 at 3:09 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
try
mvn package -Pdist -Dtar -DskipTests
On Wed, Feb 11, 2015 at 2
PM, Lucio Crusca lu...@sulweb.org wrote:
On Wednesday, February 11, 2015 at 15:17:23, Alexander Pivovarov
wrote:
in addition to skipTests you want to add the native-win profile:
mvn clean package -Pdist,native-win -DskipTests -Dtar
Ok thanks but... what's the point of having tests
Hi Kevin,
What is the network throughput between
1. the NFS server and the client machine?
2. the client machine and the datanodes?
Alex
On Feb 13, 2015 5:29 AM, Kevin kevin.macksa...@gmail.com wrote:
Hi,
I am setting up a Hadoop cluster (CDH5.1.3) and I need to copy a thousand
or so files into HDFS, which totals
Start several VMs and install Hadoop on each VM.
Keywords: KVM, QEMU
On Mon, Jan 26, 2015 at 1:18 AM, Harun Reşit Zafer
harun.za...@tubitak.gov.tr wrote:
Hi everyone,
We have set up and been playing with Hadoop 1.2.x and its friends (HBase,
Pig, Hive etc.) on 7 physical servers. We want to
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Managing_Smart_Cards/Configuring_a_Kerberos_5_Server.html
On Wed, Feb 18, 2015 at 4:49 PM, Krish Donald gotomyp...@gmail.com
are not using Cloudera Manager to set up your
cluster.
On Wed, Feb 18, 2015 at 4:51 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6
master node.
---
Regards,
Jonathan Aquilina
Founder Eagle Eye T
On 2015-03-06 02:11, Alexander Pivovarov wrote:
What is the easiest way to assign names to AWS EC2 computers?
I guess a computer needs a static hostname and DNS name before it can be used
in a Hadoop cluster.
On Mar 5, 2015 4:36 PM
bootstrap
action (script) to have it distribute esri to the entire cluster.
Why are you guys reinventing the wheel?
---
Regards,
Jonathan Aquilina
Founder Eagle Eye T
On 2015-03-06 03:35, Alexander Pivovarov wrote:
I found the following solution to this problem:
I registered 2
same result as
order by ?
On Sat, Mar 7, 2015 at 7:05 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
a sort by query produces multiple independent files.
order by - just one file.
usually sort by is used with distribute by.
In older Hive versions (0.7) they might be used to implement
a sort by query produces multiple independent files.
order by - just one file.
usually sort by is used with distribute by.
In older Hive versions (0.7) they might be used to implement a local sort
within a partition,
similar to RANK() OVER (PARTITION BY A ORDER BY B)
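For illustration, a hedged sketch (the table and column names are made up):

select a, b
from my_table
distribute by a
sort by a, b;

distribute by sends all rows with the same value of a to one reducer; sort by
then orders the rows within each reducer's output file.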
On Sat, Mar 7, 2015 at 3:02 PM,
What is the easiest way to assign names to AWS EC2 computers?
I guess a computer needs a static hostname and DNS name before it can be used in
a Hadoop cluster.
On Mar 5, 2015 4:36 PM, Jonathan Aquilina jaquil...@eagleeyet.net wrote:
When I started with EMR it was a lot of testing and trial and error.
On Thu, Mar 5, 2015 at 8:47 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
what about DNS?
If you have 2 computers (NN and DN), how does the NN know the DN's IP?
The script puts only this computer's IP into /etc/hosts.
On Thu, Mar 5, 2015 at 6:39 PM, max scalf oracle.bl...@gmail.com wrote:
Here is an easy
Follow the links I sent you already.
On Apr 30, 2015 11:52 AM, Kumar Jayapal kjayapa...@gmail.com wrote:
Hi Alex,
How do I create an external textfile Hive table pointing to /extract/DBCLOC and
specify CSVSerde?
Thanks
Jay
On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov apivova
Try to find the file in hdfs trash
On Apr 30, 2015 2:14 PM, Kumar Jayapal kjayapa...@gmail.com wrote:
Hi,
I loaded one file into a Hive table; it has a .gz extension. The file was
moved/deleted from HDFS.
When I execute a select command I get an error.
Error: Error while processing statement: FAILED:
wrong.
I appreciate your help.
Thanks
jay
Thank you very much for your help Alex,
On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
1. Create external textfile hive table pointing to /extract/DBCLOC and
specify CSVSerde
if using hive-0.14 and newer
try
desc formatted table_name;
it shows you the table location on HDFS
On Thu, Apr 30, 2015 at 2:43 PM, Kumar Jayapal kjayapa...@gmail.com wrote:
I did not find it in .Trash. The file was moved into the Hive table; I want to
move it back to HDFS.
On Thu, Apr 30, 2015 at 2:20 PM, Alexander Pivovarov apivova
1. Create external textfile hive table pointing to /extract/DBCLOC and
specify CSVSerde
if using hive-0.14 and newer use this
https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
You do not even need to gunzip the file. Hive
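A hedged sketch of step 1 (the column names are illustrative; the SerDe class
is the hive-0.14+ OpenCSVSerde mentioned above):

CREATE EXTERNAL TABLE dbcloc_csv (col1 string, col2 string, col3 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';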
and Kryo issues fixed after 0.13.1?
On Fri, May 15, 2015 at 3:20 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
Looks like it was fixed in hive-0.14
https://issues.apache.org/jira/browse/HIVE-7079
On Fri, May 15, 2015 at 2:26 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
Hi
, Alexander Pivovarov apivova...@gmail.com
wrote:
I also noticed another error message in logs
10848 [main] ERROR org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor -
Status: Failed
10849 [main] ERROR org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor -
Vertex failed, vertexName=Map 32, vertexId
Looks like it was fixed in hive-0.14
https://issues.apache.org/jira/browse/HIVE-7079
On Fri, May 15, 2015 at 2:26 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
Hi Everyone
I'm using hive-0.13.1 (HDP-2.1.5) and getting the following stacktrace
if I run my query (which has a WITH block
Hi Everyone
I'm using hive-0.13.1 (HDP-2.1.5) and getting the following stacktrace if
I run my query (which has a WITH block) via Oozie. (BTW, the query works fine
in the CLI.)
I can't post the exact query, but the structure is similar to
create table my_consumer
as
with sacusaloan as (select distinct
1. create a pom.xml for your project (a minimal sketch follows below)
2. add the hadoop dependencies which you need
3. $ mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
4. import the existing java project into eclipse
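A minimal hedged sketch of such a pom.xml; the groupId/artifactId and the
hadoop-client version are illustrative, pick what matches your cluster:

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>my-hadoop-job</artifactId>
  <version>1.0</version>
  <dependencies>
    <!-- pulls in the client-side Hadoop classes needed to compile MR jobs -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.4.1</version>
    </dependency>
  </dependencies>
</project>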
On Wed, May 20, 2015 at 5:31 PM, Caesar Samsi caesarsa...@mac.com wrote:
Hello,
I’m embarking on
Hi Everyone
I have 2 HA clusters mydev and myqa
I want to be able to access hdfs://myqa/ paths from mydev cluster
boxes.
What settings should I add to the mydev hdfs-site.xml so that Hadoop can
resolve the myqa HA alias to the active NN?
Thank you
Alex
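For reference, a hedged sketch of the client-side hdfs-site.xml properties that
typically make a remote HA nameservice resolvable; the NN hostnames are
illustrative:

<property>
  <name>dfs.nameservices</name>
  <value>mydev,myqa</value>
</property>
<property>
  <name>dfs.ha.namenodes.myqa</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.myqa.nn1</name>
  <value>myqa-nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.myqa.nn2</name>
  <value>myqa-nn2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.myqa</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>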
http://www.jets3t.org/toolkit/configuration.html
On Jan 14, 2016 10:56 AM, "Alexander Pivovarov" <apivova...@gmail.com>
wrote:
> Add jets3t.properties file with s3service.s3-endpoint= to
> /etc/hadoop/conf folder
>
> The folder with the file should be in HADOOP_CLASSP
Add a jets3t.properties file with s3service.s3-endpoint= to the
/etc/hadoop/conf folder.
The folder with the file should be in HADOOP_CLASSPATH.
The JetS3t library, which is used by Hadoop, looks for this file.
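A hedged example of the file's contents; the endpoint host is illustrative:

# /etc/hadoop/conf/jets3t.properties
s3service.s3-endpoint=s3-us-west-2.amazonaws.com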
On Dec 22, 2015 12:39 PM, "Phillips, Caleb" wrote:
> Hi All,
>
>
I tried to use one or the other for secondary sort -- both options work fine
-- I get a combined, sorted result in the reduce() iterator.
Also I noticed that if I set both of them at the same time,
then KeyComparatorClass.compare(O1, O2) is never called; Hadoop calls
only ValueGroupingComparator.compare().
I
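For reference, a hedged sketch of the wiring being compared here, on the old
mapred API, assuming a composite Text key of the form "naturalKey<TAB>secondaryField";
all class names are illustrative:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapred.JobConf;

public class SecondarySortWiring {

  // Key comparator: the default Text ordering sorts by the whole
  // composite key, i.e. by natural key, then by secondary field.
  public static class FullKeyComparator extends WritableComparator {
    public FullKeyComparator() { super(Text.class, true); }
  }

  // Grouping comparator: compares only the natural-key part, so one
  // reduce() call sees all values for a natural key, already sorted
  // by the secondary field.
  public static class GroupComparator extends WritableComparator {
    public GroupComparator() { super(Text.class, true); }
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
      String ka = a.toString().split("\t", 2)[0];
      String kb = b.toString().split("\t", 2)[0];
      return ka.compareTo(kb);
    }
  }

  public static void wire(JobConf conf) {
    conf.setOutputKeyComparatorClass(FullKeyComparator.class);
    conf.setOutputValueGroupingComparator(GroupComparator.class);
    // a partitioner on the natural key alone is also needed so that
    // equal natural keys land on the same reducer
  }
}

In the usual semantics, the key comparator drives the shuffle sort and the
grouping comparator only decides where one reduce() group ends and the next
begins.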
More nodes means more read I/O at the mapper step.
If you use combiners you might need to send only a small amount of data over
the network to the reducers.
Alexander
On Tue, Dec 13, 2011 at 12:45 PM, real great.. greatness.hardn...@gmail.com
wrote:
more cores might help in hadoop environments as there
Hi Oleg
Cloudera and Dell set up the following cluster for my company
Company receives 1.5 TB raw data per day
38 data nodes + 2 Name Nodes
Data Node:
Dell PowerEdge C2100 series
2 x XEON x5670
48 GB RAM ECC (12x4GB 1333MHz)
12 x 2 TB 7200 RPM SATA HDD (with hot swap) JBOD
Intel Gigabit
Great,
thank you for such detailed information.
By the way, what type of disk controller do you use?
Thanks
Oleg.
On Tue, Oct 2, 2012 at 6:34 AM, Alexander Pivovarov apivova...@gmail.com
wrote:
Hi Oleg
Cloudera and Dell set up the following cluster for my company
Company
On 10/02/2012 12:55 PM, Alexander Pivovarov wrote:
38 data nodes + 2 Name Nodes
Data Node:
Dell PowerEdge C2100 series
2 x XEON x5670
48 GB RAM ECC (12x4GB 1333MHz)
12 x 2 TB 7200 RPM SATA HDD (with hot swap) JBOD
Intel Gigabit ET Dual port PCIe