Hi all,
How can the Hive query below be written in Pig Latin?
SELECT t2.col1, t3.col2
FROM table2 t2
JOIN table3 t3
WHERE t3.col2 IS NOT NULL
AND t2.col1 LIKE CONCAT(CONCAT('%', t3.col2), '%')
Regards
Abhi
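For reference, one possible Pig Latin rendering of the query above — a sketch only; the load paths, delimiters, and schemas are assumptions. Because the LIKE condition is a non-equi join, Pig has no direct JOIN for it; a CROSS followed by a FILTER approximates it (expensive on large inputs):

```pig
-- Hypothetical paths/schemas; adjust to your storage format.
t2 = LOAD 'table2' USING PigStorage(',') AS (col1:chararray);
t3 = LOAD 'table3' USING PigStorage(',') AS (col2:chararray);

t3_nn = FILTER t3 BY col2 IS NOT NULL;

-- No equi-join key, so cross then filter. MATCHES takes a Java regex,
-- so '%x%' becomes '.*x.*'; if col2 can contain regex metacharacters,
-- they would need escaping, since LIKE is a plain substring test.
crossed = CROSS t2, t3_nn;
matched = FILTER crossed BY t2::col1 MATCHES CONCAT('.*', CONCAT(t3_nn::col2, '.*'));

result = FOREACH matched GENERATE t2::col1, t3_nn::col2;
```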
Of course it all depends...
But something like this could work:
Leave 1-2 GB for the kernel, page cache, tools, other overhead, etc.
Plan 3-4 GB each for the DataNode and TaskTracker daemons.
Plan 2.5-3 GB per slot; depending on the kinds of jobs, you may need more
or less memory per slot.
Have 2-3 times as many
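Putting those rules of thumb together, a back-of-the-envelope calculation for a hypothetical 48 GB worker node (all figures here are assumptions, not recommendations):

```java
// Slot-budget sketch for an assumed 48 GB node, using the guidance above.
public class SlotBudget {
    public static void main(String[] args) {
        double totalGb = 48;       // assumed node RAM
        double osGb = 2;           // kernel, page cache, tools, overhead
        double daemonGb = 4 + 4;   // DataNode + TaskTracker, high end
        double perSlotGb = 3;      // high-end per-slot estimate
        int slots = (int) ((totalGb - osGb - daemonGb) / perSlotGb);
        System.out.println("Slots: " + slots); // (48 - 2 - 8) / 3 -> 12
    }
}
```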
Well...
If you're not running HBase, you're less harmed by minimal swapping so you
could push the number of slots and over subscribe.
The only thing I would suggest is that you monitor your system closely
as you adjust the number of slots.
You have to admit though, it's fun to tune.
moved common-user@hadoop.apache.org to bcc and added u...@pig.apache.org
Best asked on the Pig users list.
Cheers,
Joep
On Wed, Oct 3, 2012 at 7:04 AM, Abhishek abhishek.dod...@gmail.com wrote:
Hi all,
Can we discuss the performance of Pig vs. Hive?
1) What is Hive good at?
2) What is Pig good at?
3) Hive's optimizer vs. Pig's optimizer
4) Hive's limitations vs. Pig's limitations
Regards
Abhi
Sent from my iPhone
from amazon web site:
http://aws.amazon.com/elasticmapreduce/faqs/#hive-8
Q: When should I use Hive vs. PIG?
Hive and PIG both provide high level data-processing languages with support
for complex data types for operating on large datasets. The Hive language
is a variant of SQL and so is more
Hi Zhu,
Thanks for the reply. I am running some queries where Hive is slower than Pig.
I was also thinking that Pig's optimizer is better than Hive's.
Regards
Abhi
Sent from my iPhone
On Oct 3, 2012, at 7:15 PM, TianYi Zhu tianyi@facilitatedigital.com wrote:
Hi Abhishek,
I don't know much about the optimizers. In my opinion, a SQL-like
programming language is hard to optimize, so Hive may be slower than Pig in
many cases. But ultimately, for every Hadoop job there must be a
best (in time or space) sequence of map/reduce phases. You should rewrite your
pig/hive
Thanks Zhu for your reply; your points make sense to me.
Regards
Abhishek
On Oct 3, 2012, at 8:14 PM, TianYi Zhu tianyi@facilitatedigital.com wrote:
On Wed, Oct 3, 2012 at 7:50 PM, Dan Richelson drichel...@tendrilinc.com wrote:
Anecdotally, I can say that Pig seems to scale down better than Hive.
We see this in tests: Hive scripts running small amounts of data take
much longer than similar Pig scripts, even with Hive's parallel settings
enabled. I think this has to do with the fact that there doesn't seem
to be a 'local' mode for
Sorry for double posting
Hi!
The services are running on the master (NN, JT, TT, DN) and slave (TT, DN)
according to jps. In the web UIs the slaves are shown as up and
running, and I'm getting heartbeats and everything. When running a job, it
completes and logs everything to the command prompt. The
Hi,
The classic option exists to provide backward compatibility for users
wanting to run an MR1 cluster (with JT, etc.).
With the inclusion of YARN and MR2 modes of runtime, Apache Hadoop
removed MR1 services support:
➜ mapred jobtracker
Sorry, the jobtracker command is no longer supported.
I have no DHCP, and DNS is configured manually (and there might be the problem).
/etc/hosts on both machines:
172.16.0.1 master vmmaker-ubd64
172.16.0.11 slave vmmaker-slave
127.0.0.1 localhost
# on master
127.0.0.1 vmmaker-ubd64
# on slave
127.0.0.1 vmmaker-slave
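One commonly suggested fix (a sketch based on the entries above; your hostnames may differ): when a machine's own hostname resolves to 127.0.0.1, Hadoop daemons can bind or advertise the loopback address, so slaves look alive in the UIs while tasks never reach them. Dropping the loopback hostname mappings and keeping only the LAN addresses avoids that:

```
# /etc/hosts on both machines (sketch)
127.0.0.1    localhost
172.16.0.1   master vmmaker-ubd64
172.16.0.11  slave vmmaker-slave
# i.e. remove the "127.0.0.1 vmmaker-ubd64" and "127.0.0.1 vmmaker-slave" lines
```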
Yup, I hate that when it happens.
You tend to see this more with Avro than anything else.
The issue is that in Java, the first class loaded wins. So when Hadoop loads
Avro 1.4 first, you can't unload it and replace it with 1.7.
The only solution that we found to be workable is to replace the
Hi Ben,
As long as the switch of libraries doesn't impact the execution of the
Child task code itself, for Apache Hadoop 1.x, setting the config
property mapreduce.user.classpath.first to true may solve your problem.
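For reference, a sketch of how that flag might be set in the job configuration (property name as given above; where you set it — per-job or in mapred-site.xml — depends on your setup):

```xml
<!-- Job configuration fragment (Hadoop 1.x) -->
<property>
  <name>mapreduce.user.classpath.first</name>
  <value>true</value>
</property>
```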
On Wed, Oct 3, 2012 at 4:51 PM, Ben Rycroft brycrof...@gmail.com wrote:
Hi all,
I
Yup, I hate this issue. It also happens with JSON libs if you have an old
Hadoop!
It's also really easy to dump the exact classpath at runtime using the
((URLClassLoader) ClassLoader.getSystemClassLoader()).getURLs() trick...
very, very useful. :)
I always do this just to make sure.
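A minimal, self-contained version of that trick. Note the cast only works on Java 8 and earlier; on Java 9+ the system class loader is no longer a URLClassLoader, so this sketch falls back to the java.class.path property:

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class ClasspathDump {
    // Returns the effective classpath entries visible to this JVM.
    public static String[] entries() {
        ClassLoader cl = ClassLoader.getSystemClassLoader();
        if (cl instanceof URLClassLoader) {            // Java 8 and earlier
            URL[] urls = ((URLClassLoader) cl).getURLs();
            String[] out = new String[urls.length];
            for (int i = 0; i < urls.length; i++) out[i] = urls[i].toString();
            return out;
        }
        // Java 9+: fall back to the java.class.path system property.
        return System.getProperty("java.class.path").split(File.pathSeparator);
    }

    public static void main(String[] args) {
        for (String e : entries()) System.out.println(e);
    }
}
```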
Hi,
Would reducing the output from the map tasks solve the problem ? i.e. are
reducers slowing down because a lot of data is being shuffled ?
If that's the case, you could see if the map output size will reduce by
using the framework's combiner or an in-mapper combining technique.
Thanks
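The in-mapper combining idea mentioned above can be illustrated with a plain-Java stand-in (names here are illustrative; in a real Hadoop Mapper the map would be a field populated in map() and flushed to the Context in cleanup(), so each mapper emits one record per key instead of one per input token):

```java
import java.util.HashMap;
import java.util.Map;

public class InMapperCombine {
    // Aggregates token counts locally, as map() calls would into a field.
    public static Map<String, Integer> combine(String[] tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {              // one "map()" call per token
            counts.merge(t, 1, Integer::sum);  // local aggregation
        }
        return counts;                         // "cleanup()" would emit these
    }

    public static void main(String[] args) {
        System.out.println(combine(new String[]{"a", "b", "a"}));
    }
}
```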
I followed a suggestion at the given link and set mapred.min.split.size
to 134217728.
With the above mapred.min.split.size, I get mapred.map.tasks = 121
(previously it was 242).
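For context on that number: 134217728 bytes is 128 MB, i.e. double an assumed 64 MB default split size, which is consistent with the map-task count roughly halving from 242 to 121:

```java
public class SplitMath {
    public static void main(String[] args) {
        long minSplit = 134217728L;          // value set above
        long mb = 1024L * 1024;
        System.out.println(minSplit / mb);   // 128 MB per split
        System.out.println(242 * 64 / 128);  // 121 map tasks at double the split
    }
}
```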
Thanks for all the replies !
Shing
From: Romedius Weiss
HI hadoop-users,
I'm curious if there is an implementation somewhere of a counter which
tracks the maximum of some value across all mappers or reducers?
Thanks
J
Jeremy,
Here's my shot at it (pardon the quick crappy code):
https://gist.github.com/3828246
Basically - you can achieve it in two ways:
Requirement: All tasks must increment the max designated counter
only AFTER the max has been computed (i.e. in cleanup).
1. All tasks may use same counter
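A sketch of the per-task-counter variant (API calls from memory; the key point is that Hadoop counters aggregate by summing, so a single shared counter cannot take a max directly — instead each task reports its own maximum and the driver reduces over them):

```java
// In the task, e.g. Mapper.cleanup(), once localMax is final:
//   context.getCounter("MAXES", context.getTaskAttemptID().toString())
//          .increment(localMax);
//
// In the driver, after job.waitForCompletion(true):
//   long globalMax = Long.MIN_VALUE;
//   for (Counter c : job.getCounters().getGroup("MAXES"))
//       globalMax = Math.max(globalMax, c.getValue());
```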
Why does GenericOptionsParser also remove -Dprop=value options (without
setting the system properties)? Those are not Hadoop options but Java
options. And why does hadoop jar not accept -Dprop=value before the class
name, as java does? Like: hadoop jar -Dprop=value class
How do I set Java system
Hi All,
I recently had occasion to learn Hadoop/HBase and set up a fully distributed
cluster on EC2. To save future Hadoop'ers some of the frustration that
involved, I wrote a how-to blog post detailing the necessary steps to get going
quickly and avoid the gotchas. As far as I know, this is
Koert,
System properties ought to go to JVM args directly. Use
HADOOP_CLIENT_OPTS to pass those -Ds. This is cause System properties
is a JVM-level concept and java is the utility that accepts those.
We provide a wrapper in form of hadoop jar.
GenericOptionsParser (i.e. Tool - Don't use GOP
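To illustrate the distinction (jar, class, and property names below are hypothetical):

```shell
# JVM system property, passed to the client JVM via the environment:
export HADOOP_CLIENT_OPTS="-Dprop=value"
hadoop jar myjob.jar com.example.MyJob input/ output/

# Hadoop configuration property, consumed by GenericOptionsParser/Tool;
# note it goes after the class name:
hadoop jar myjob.jar com.example.MyJob -D mapred.task.timeout=600000 input/ output/
```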
I am incorrect on the below:
The classic option exists to provide backward compatibility for users
wanting to run an MR1 cluster (with JT, etc.).
Turns out, classic is just for older clients to run on YARN without
any other changes. It roughly translates the RM address from a JT
property into
Moving this to user@ since it's not appropriate for general@.
On Fri, Sep 28, 2012 at 11:16 PM, Xiang Hua bea...@gmail.com wrote:
Hi,
I want to use 4 (600 GB) local disks combined with 3 × 800 GB disks from a
disk array in one DataNode.
Is there any problem with this? Performance?
The recommended