finish elementary school first. (plus, minus operations at least)
On Thu, Jan 10, 2013 at 7:23 PM, Panshul Whisper ouchwhis...@gmail.com wrote:
Thank you for the response.
Actually it is not a single file, I have JSON files that amount to 115 GB,
these JSON files need to be processed and
it's enough. Hadoop uses only 1 GB RAM by default.
On Sat, Jan 18, 2014 at 10:11 PM, sri harsha rsharsh...@gmail.com wrote:
Hi,
I want to install a 4-node cluster on 64-bit Linux. Is 4 GB RAM and a 500 GB
HDD per node enough, or do I need to expand?
Please advise.
Thanks
--
amiable
Probably a permissions issue.
On Thu, Jul 31, 2014 at 11:32 AM, Houston King houston.k...@gmail.com
wrote:
Hey Everyone,
I'm a noob working to set up my first 13-node Hadoop 2.4.0 cluster, and
I've run into some problems that I'm having a heck of a time debugging.
I've been following the
The max Java array size is Integer.MAX_VALUE, so a byte array cannot be bigger than 2 GB.
On Aug 22, 2014 1:41 PM, Yuriy yuriythe...@gmail.com wrote:
The Hadoop Writable interface relies on the public void write(DataOutput out)
method.
It looks like behind the DataOutput interface, Hadoop uses DataOutputStream,
which ... That, at least, explains the problem. And what
should be the workaround if the combined set of data is larger than 2 GB?
On Fri, Aug 22, 2014 at 1:50 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
The max Java array size is Integer.MAX_VALUE, so a byte array cannot be bigger than 2 GB.
On Aug 22, 2014 1:41 PM
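A possible workaround (a hedged sketch, not from the thread): keep the payload
in multiple smaller byte arrays and serialize them chunk by chunk, so no single
array hits the 2 GB cap. The class and field names below are illustrative.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Writable;

// Stores a payload bigger than 2 GB as a list of smaller chunks.
public class ChunkedBytesWritable implements Writable {
  private final List<byte[]> chunks = new ArrayList<byte[]>();

  public void addChunk(byte[] chunk) { chunks.add(chunk); }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(chunks.size());
    for (byte[] c : chunks) {
      out.writeInt(c.length);
      out.write(c); // each individual array stays under Integer.MAX_VALUE
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    chunks.clear();
    int n = in.readInt();
    for (int i = 0; i < n; i++) {
      byte[] c = new byte[in.readInt()];
      in.readFully(c);
      chunks.add(c);
    }
  }
}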
e.g. in Hive, to switch engines:
set hive.execution.engine=mr;
or
set hive.execution.engine=tez;
Tez is faster, especially on complex queries.
On Aug 31, 2014 10:33 PM, Adaryl Bob Wakefield, MBA
adaryl.wakefi...@hotmail.com wrote:
Can Tez and MapReduce live together and get along in the same
It can read/write in parallel to all drives. More HDDs means more I/O speed.
On Sep 27, 2014 7:28 AM, Susheel Kumar Gadalay skgada...@gmail.com
wrote:
Correct me if I am wrong.
Adding multiple directories will not balance the file distribution
across these locations.
Hadoop will exhaust the
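For context, a hedged sketch of how multiple datanode directories are listed in
hdfs-site.xml (the paths are illustrative); by default the datanode round-robins
new blocks across the volumes rather than rebalancing existing files:

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data1/dfs/dn,/data2/dfs/dn,/data3/dfs/dn</value>
</property>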
AMPLab, where Spark was created, did some benchmarks.
https://amplab.cs.berkeley.edu/benchmark/
On Fri, Oct 17, 2014 at 11:06 AM, Adaryl Bob Wakefield, MBA
adaryl.wakefi...@hotmail.com wrote:
Does anybody have any performance figures on how Spark stacks up
against Tez? If you don’t have figures, does
It's going to be a Spark engine for Hive (in addition to mr and tez).
The Spark API is available for Java and Python as well.
The Tez engine is available now and it's quite stable. As for speed: for
complex queries it shows a 10x-20x improvement in comparison to the mr engine.
e.g. one of my queries runs 30
Are RHEL7 based OSs supported?
On Wed, Oct 29, 2014 at 3:59 PM, David Novogrodsky
david.novogrod...@gmail.com wrote:
All,
I am new to Hadoop so any help would be appreciated.
I have a question for the mailing list regarding Hadoop. I have installed
the most recent stable version (2.4.1)
2 boxes for 2 NNs (dedicated boxes are better)
min 3 JNs
min 3 ZKs
JNs and ZKs can share boxes with other services
On Wed, Nov 5, 2014 at 11:31 PM, Oleg Ruchovets oruchov...@gmail.com
wrote:
Hello.
We are using the Hortonworks distribution and want to evaluate HA capabilities.
Can the community please
Thank you for the link.
Just to be sure - can JNs be installed on data nodes, like ZooKeeper?
If we have 2 Name Nodes and 15 Data Nodes - is it correct to install ZK
and JN on the datanode machines?
Thanks
Oleg.
On Thu, Nov 6, 2014 at 5:06 PM, Alexander Pivovarov apivova...@gmail.com
wrote
for a balanced conf you need (per core)
1-1.5 x 2 TB 7200 RPM SATA HDDs for HDFS (in JBOD mode, not RAID)
3-4 GB ECC RAM
reserve 4 GB RAM for the OS
better to use a separate HDD or USB stick for the OS
e.g. for 16 cores you can use
16-24 x 2 TB HDDs
64 GB RAM (if planning to use Apache Spark, put in 128 GB)
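(As a worked check of the per-core math, not in the original message: 16 cores
x 3-4 GB/core = 48-64 GB for tasks, plus the 4 GB OS reserve, which matches the
64 GB figure above; and 16 cores x 1-1.5 disks/core = 16-24 HDDs.)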
On Sun,
I found that the easiest way is to put the UDF jar into /usr/lib/hadoop-mapred
on all computers in the cluster. The Hive CLI, HiveServer2, the Oozie launcher,
Oozie Hive actions, and MR will then see the jar. I'm using HDP-2.1.5.
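A hedged sketch of doing that with scp (the hostnames and jar name are
illustrative):

for h in node01 node02 node03; do
  scp my-udf.jar "$h":/usr/lib/hadoop-mapred/
done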
On Dec 30, 2014 10:58 PM, reena upadhyay reena2...@gmail.com wrote:
Hi,
I am using
try
mvn package -Pdist -Dtar -DskipTests
On Wed, Feb 11, 2015 at 2:02 PM, Lucio Crusca lu...@sulweb.org wrote:
Hello everybody,
I'm absolutely new to Hadoop and a customer asked me to build version 2.6
for Windows Server 2012 R2. I'm a Java programmer myself, among other
things, but I've
\target\.
https://wiki.apache.org/hadoop/Hadoop2OnWindows
https://svn.apache.org/viewvc/hadoop/common/branches/branch-2/BUILDING.txt?view=markup
On Wed, Feb 11, 2015 at 3:09 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
try
mvn package -Pdist -Dtar -DskipTests
On Wed, Feb 11, 2015 at 2
PM, Lucio Crusca lu...@sulweb.org wrote:
On Wednesday, February 11, 2015 at 15:17:23, Alexander Pivovarov
wrote:
in addition to skipTests you want to add the native-win profile:
mvn clean package -Pdist,native-win -DskipTests -Dtar
Ok thanks but... what's the point of having tests
Hi Kevin,
What is the network throughput between
1. the NFS server and the client machine?
2. the client machine and the datanodes?
Alex
On Feb 13, 2015 5:29 AM, Kevin kevin.macksa...@gmail.com wrote:
Hi,
I am setting up a Hadoop cluster (CDH5.1.3) and I need to copy a thousand
or so files into HDFS, which totals
Start several VMs and install Hadoop on each VM.
Keywords: KVM, QEMU
On Mon, Jan 26, 2015 at 1:18 AM, Harun Reşit Zafer
harun.za...@tubitak.gov.tr wrote:
Hi everyone,
We have set up and been playing with Hadoop 1.2.x and its friends (HBase,
Pig, Hive etc.) on 7 physical servers. We want to
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Managing_Smart_Cards/Configuring_a_Kerberos_5_Server.html
On Wed, Feb 18, 2015 at 4:49 PM, Krish Donald gotomyp...@gmail.com
are not using Cloudera Manager to set up your
cluster.
On Wed, Feb 18, 2015 at 4:51 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6
master node.
---
Regards,
Jonathan Aquilina
Founder Eagle Eye T
On 2015-03-06 02:11, Alexander Pivovarov wrote:
What is the easiest way to assign names to AWS EC2 computers?
I guess a computer needs a static hostname and DNS name before it can be used
in a Hadoop cluster.
On Mar 5, 2015 4:36 PM
bootstrap
action (script) to have it distribute esri to the entire cluster.
Why are you guys reinventing the wheel?
---
Regards,
Jonathan Aquilina
Founder Eagle Eye T
On 2015-03-06 03:35, Alexander Pivovarov wrote:
I found the following solution to this problem:
I registered 2
same result as
order by ?
On Sat, Mar 7, 2015 at 7:05 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
a sort by query produces multiple independent files.
order by - just one file.
usually sort by is used with distribute by.
In older Hive versions (0.7) they might be used to implement
a sort by query produces multiple independent files.
order by - just one file.
usually sort by is used with distribute by.
In older Hive versions (0.7) they might be used to implement a local sort
within a partition,
similar to RANK() OVER (PARTITION BY A ORDER BY B)
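For illustration, a hedged sketch (the table and column names are made up):

select a, b
from my_table
distribute by a
sort by a, b;

distribute by sends all rows with the same value of a to one reducer; sort by
then orders the rows within each reducer's output file.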
On Sat, Mar 7, 2015 at 3:02 PM,
What is the easiest way to assign names to AWS EC2 computers?
I guess a computer needs a static hostname and DNS name before it can be used in
a Hadoop cluster.
On Mar 5, 2015 4:36 PM, Jonathan Aquilina jaquil...@eagleeyet.net wrote:
When I started with EMR it was a lot of testing and trial and error.
On Thu, Mar 5, 2015 at 8:47 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
what about DNS?
If you have 2 computers (NN and DN), how does the NN know the DN's IP?
The script puts only this computer's IP into /etc/hosts.
On Thu, Mar 5, 2015 at 6:39 PM, max scalf oracle.bl...@gmail.com wrote:
Here is an easy
Follow the links I sent you already.
On Apr 30, 2015 11:52 AM, Kumar Jayapal kjayapa...@gmail.com wrote:
Hi Alex,
How do I create an external textfile Hive table pointing to /extract/DBCLOC and
specify CSVSerde?
Thanks
Jay
On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov apivova
Try to find the file in hdfs trash
On Apr 30, 2015 2:14 PM, Kumar Jayapal kjayapa...@gmail.com wrote:
Hi,
I loaded one file into a Hive table; it has a .gz extension. The file was
moved/deleted from HDFS.
When I execute a select command I get an error.
Error: Error while processing statement: FAILED:
wrong.
I appreciate your help.
Thanks
jay
Thank you very much for your help Alex,
On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
1. Create external textfile hive table pointing to /extract/DBCLOC and
specify CSVSerde
if using hive-0.14 and newer
try
desc formatted table_name;
it shows you the table location on HDFS
On Thu, Apr 30, 2015 at 2:43 PM, Kumar Jayapal kjayapa...@gmail.com wrote:
I did not find it in .Trash. The file was moved into the Hive table; I want to
move it back to HDFS.
On Thu, Apr 30, 2015 at 2:20 PM, Alexander Pivovarov apivova
1. Create external textfile hive table pointing to /extract/DBCLOC and
specify CSVSerde
if using hive-0.14 and newer use this
https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
You do not even need to gunzip the file. Hive
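A hedged sketch of step 1 (the column names are illustrative; the SerDe class
is the hive-0.14+ OpenCSVSerde mentioned above):

CREATE EXTERNAL TABLE dbcloc_csv (col1 string, col2 string, col3 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';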
and Kryo issues fixed after 0.13.1?
On Fri, May 15, 2015 at 3:20 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
Looks like it was fixed in hive-0.14
https://issues.apache.org/jira/browse/HIVE-7079
On Fri, May 15, 2015 at 2:26 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
Hi
, Alexander Pivovarov apivova...@gmail.com
wrote:
I also noticed another error message in logs
10848 [main] ERROR org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor -
Status: Failed
10849 [main] ERROR org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor -
Vertex failed, vertexName=Map 32, vertexId
Looks like it was fixed in hive-0.14
https://issues.apache.org/jira/browse/HIVE-7079
On Fri, May 15, 2015 at 2:26 PM, Alexander Pivovarov apivova...@gmail.com
wrote:
Hi Everyone
I'm using hive-0.13.1 (HDP-2.1.5) and getting the following stacktrace
if I run my query (which has a WITH block
Hi Everyone
I'm using hive-0.13.1 (HDP-2.1.5) and getting the following stacktrace if
I run my query (which has a WITH block) via Oozie. (BTW, the query works fine
in the CLI.)
I can't post the exact query, but the structure is similar to
create table my_consumer
as
with sacusaloan as (select distinct
1. create a pom.xml for your project (a minimal sketch follows below)
2. add the hadoop dependencies which you need
3. $ mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
4. import the existing java project into eclipse
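A minimal hedged sketch of such a pom.xml; the groupId/artifactId and the
hadoop-client version are illustrative, pick what matches your cluster:

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>my-hadoop-job</artifactId>
  <version>1.0</version>
  <dependencies>
    <!-- pulls in the client-side Hadoop classes needed to compile MR jobs -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.4.1</version>
    </dependency>
  </dependencies>
</project>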
On Wed, May 20, 2015 at 5:31 PM, Caesar Samsi caesarsa...@mac.com wrote:
Hello,
I’m embarking on
Hi Everyone
I have 2 HA clusters mydev and myqa
I want to be able to access hdfs://myqa/ paths from mydev cluster
boxes.
What settings should I add to the mydev hdfs-site.xml so that Hadoop can
resolve the myqa HA alias to the active NN?
Thank you
Alex
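For reference, a hedged sketch of the client-side hdfs-site.xml properties that
typically make a remote HA nameservice resolvable; the NN hostnames are
illustrative:

<property>
  <name>dfs.nameservices</name>
  <value>mydev,myqa</value>
</property>
<property>
  <name>dfs.ha.namenodes.myqa</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.myqa.nn1</name>
  <value>myqa-nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.myqa.nn2</name>
  <value>myqa-nn2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.myqa</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>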
http://www.jets3t.org/toolkit/configuration.html
On Jan 14, 2016 10:56 AM, "Alexander Pivovarov" <apivova...@gmail.com>
wrote:
> Add jets3t.properties file with s3service.s3-endpoint= to
> /etc/hadoop/conf folder
>
> The folder with the file should be in HADOOP_CLASSP
Add a jets3t.properties file with s3service.s3-endpoint= to the
/etc/hadoop/conf folder.
The folder with the file should be in HADOOP_CLASSPATH.
The JetS3t library, which is used by Hadoop, looks for this file.
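A hedged example of the file's contents; the endpoint host is illustrative:

# /etc/hadoop/conf/jets3t.properties
s3service.s3-endpoint=s3-us-west-2.amazonaws.com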
On Dec 22, 2015 12:39 PM, "Phillips, Caleb" wrote:
> Hi All,
>
>
I tried to use one or the other for secondary sort -- both options work fine
-- I get a combined, sorted result in the reduce() iterator.
Also I noticed that if I set both of them at the same time,
then KeyComparatorClass.compare(O1, O2) is never called; Hadoop calls
only ValueGroupingComparator.compare().
I
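For reference, a hedged sketch of the wiring being compared here, on the old
mapred API, assuming a composite Text key of the form "naturalKey<TAB>secondaryField";
all class names are illustrative:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapred.JobConf;

public class SecondarySortWiring {

  // Key comparator: the default Text ordering sorts by the whole
  // composite key, i.e. by natural key, then by secondary field.
  public static class FullKeyComparator extends WritableComparator {
    public FullKeyComparator() { super(Text.class, true); }
  }

  // Grouping comparator: compares only the natural-key part, so one
  // reduce() call sees all values for a natural key, already sorted
  // by the secondary field.
  public static class GroupComparator extends WritableComparator {
    public GroupComparator() { super(Text.class, true); }
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
      String ka = a.toString().split("\t", 2)[0];
      String kb = b.toString().split("\t", 2)[0];
      return ka.compareTo(kb);
    }
  }

  public static void wire(JobConf conf) {
    conf.setOutputKeyComparatorClass(FullKeyComparator.class);
    conf.setOutputValueGroupingComparator(GroupComparator.class);
    // a partitioner on the natural key alone is also needed so that
    // equal natural keys land on the same reducer
  }
}

In the usual semantics, the key comparator drives the shuffle sort and the
grouping comparator only decides where one reduce() group ends and the next
begins.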
More nodes means more read I/O at the mapper step.
If you use combiners you might need to send only a small amount of data over
the network to the reducers.
Alexander
On Tue, Dec 13, 2011 at 12:45 PM, real great.. greatness.hardn...@gmail.com
wrote:
more cores might help in hadoop environments as there
Hi Oleg
Cloudera and Dell set up the following cluster for my company
Company receives 1.5 TB raw data per day
38 data nodes + 2 Name Nodes
Data Node:
Dell PowerEdge C2100 series
2 x XEON x5670
48 GB RAM ECC (12x4GB 1333MHz)
12 x 2 TB 7200 RPM SATA HDD (with hot swap) JBOD
Intel Gigabit
Great,
thank you for such detailed information.
By the way, what type of disk controller do you use?
Thanks
Oleg.
On Tue, Oct 2, 2012 at 6:34 AM, Alexander Pivovarov apivova...@gmail.com
wrote:
Hi Oleg
Cloudera and Dell set up the following cluster for my company
Company
On 10/02/2012 12:55 PM, Alexander Pivovarov wrote:
38 data nodes + 2 Name Nodes
Data Node:
Dell PowerEdge C2100 series
2 x XEON x5670
48 GB RAM ECC (12x4GB 1333MHz)
12 x 2 TB 7200 RPM SATA HDD (with hot swap) JBOD
Intel Gigabit ET Dual port PCIe