Best regards / Met vriendelijke groeten,
Niels Basjes
(in either Hadoop or Flink) or am I doing something wrong?
Would upgrading Yarn to 2.7.1 (i.e. HDP 2.3) fix this?
Niels Basjes
21:30:27,821 WARN org.apache.hadoop.security.UserGroupInformation
- PriviledgedActionException as:nbasjes (auth:SIMPLE)
cause:org.apache.hadoop.ipc.RemoteException
Just curious: what is the input for your job? If it is a single gzipped
file then that is the cause of getting exactly 1 mapper.
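The reason is that gzip is not a splittable codec, so the whole file ends up
in one input split. A rough sketch of the check the input formats perform
(this mirrors the isSplitable() logic of Hadoop 2.x; the wrapper class name
is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.SplittableCompressionCodec;

public class SplittabilityCheck {
  /** True when the file can be cut into multiple input splits. */
  public static boolean isSplittable(Configuration conf, Path file) {
    CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(file);
    // Uncompressed files are splittable; compressed files only when the
    // codec supports it (BZip2Codec does, GzipCodec does not).
    return codec == null || codec instanceof SplittableCompressionCodec;
  }
}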
Niels
On Fri, Apr 10, 2015, 09:21 Amit Kumar amiti...@msn.com wrote:
Thanks a lot, Harsha, for replying.
This problem has wasted at least the last week.
We tried
'] ];
Is this something for which there is already a JIRA (couldn't find it)?
If not, should I create one? (I.e. do you think this would make sense for
others?)
Niels Basjes
On Fri, Jan 2, 2015 at 9:00 PM, Yakubovich, Alexey
alexey.yakubov...@searshc.com wrote:
Try to look here:
http://stackoverflow.com
I created https://issues.apache.org/jira/browse/HIVE-9252 for this
improvement.
On Sun, Jan 4, 2015 at 5:16 PM, Niels Basjes ni...@basjes.nl wrote:
Hi,
These options:
- HIVE_HOME/auxlib
- http://stackoverflow.com/questions/14032924/how-to-add-serde-jar
- ADD JAR commands in your $HOME
Thanks for the pointer.
This seems to work for functions. Is there something similar for CREATE
EXTERNAL TABLE?
Niels
On Dec 31, 2014 8:13 AM, Ted Yu yuzhih...@gmail.com wrote:
Have you seen this thread?
or anyone who has used your email account.
cheers Aleks
--
Best regards / Met vriendelijke groeten,
Niels Basjes
Dec 2014 18:05, Niels Basjes ni...@basjes.nl wrote:
Yes, I agree. We should accept people as they are.
So perhaps we should increase the hurdle to subscribe in the first place?
Something like adding a question like "What do you do if you want to
unsubscribe from a mailing list?"
That way
.
Perhaps an issue indicating that the use of the deprecated parameters
should be removed from the main code base is in order here.
Niels Basjes
On Fri, Nov 14, 2014 at 9:22 PM, Tianyin Xu t...@cs.ucsd.edu wrote:
Hi,
I'm very confused by some of the MapReduce configuration parameters
which appear
Very interesting!
What makes Tez more scalable than Spark?
What architectural thing makes the difference?
Niels Basjes
On Oct 19, 2014 3:07 AM, Jeff Zhang zjf...@gmail.com wrote:
Tez has a feature called pre-warm which will launch the JVM before you use it
and you can reuse the container
suitable.
Did I understand correctly?
Niels Basjes
On Oct 17, 2014 8:30 PM, Gavin Yue yue.yuany...@gmail.com wrote:
Spark and Tez both make MR faster; there is no doubt about that.
They also provide new features like DAG, which is quite important for
interactive query processing. From this perspective, you
on the Internet.
Georgi
--
Best regards / Met vriendelijke groeten,
Niels Basjes
googling.
Does anyone know where I can find such a thing?
--
Best regards / Met vriendelijke groeten,
Niels Basjes
I expect the impact on the IO speed to be almost 0 because waiting for a
single disk seek is longer than many thousands of calls to a synchronized
method.
Niels
On Aug 11, 2013 3:00 PM, Harsh J ha...@cloudera.com wrote:
Yes, I feel we could discuss this over a JIRA to remove it if it hurts
--
Best regards / Met vriendelijke groeten,
Niels Basjes
--
Best regards / Met vriendelijke groeten,
Niels Basjes
A circular file on HDFS is not possible.
Some of the ways around this limitation:
- Create a series of files and delete the oldest file when you have too
many.
- Put the data into an HBase table and do something similar.
- Use a completely different technology like MongoDB, which has built in
) you need them all to update their config
files.
My question is: Can you set the HADOOP_CONF_DIR to be a URL on a webserver?
A while ago I tried this and (back then) it didn't work.
Would this be a useful enhancement?
--
Best regards,
Niels Basjes
fast.
What else should we consider?
Has anyone any experience with such a setup?
Is it a good idea to do this?
What are better options for us to consider?
Thanks for any input.
--
Best regards,
Niels Basjes
If you try to hammer in a nail (JSON file) with a screwdriver
(XMLInputReader), then perhaps the reason it won't work is that you are
using the wrong tool?
On Jun 21, 2013 11:38 PM, jamal sasha jamalsha...@gmail.com wrote:
Hi,
I am using one of the libraries which rely on InputFormat.
My best guess is that at a low level a string is often terminated by having
a null byte at the end.
Perhaps that's where the difference lies.
Perhaps the gz decompressor simply stops at the null byte and the basic
record reader that follows simply continues.
In this situation your input file
Have you tried something like this (I do not have a PC here to check this
code)?
context.write(NullWritable.get(), new Text(jsn.toString()));
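For completeness, a sketch of how that line could sit in the full reducer
(untested, as said; the hand-rolled JSON string stands in for the original
jsn object, and the job would also need setOutputKeyClass(NullWritable.class)):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TokenCounterReducer
    extends Reducer<Text, IntWritable, NullWritable, Text> {
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();
    }
    // Illustrative JSON; the original built this with a JSON library.
    String jsn = "{\"token\":\"" + key + "\",\"count\":" + sum + "}";
    // NullWritable.get() suppresses the key column in the output file.
    context.write(NullWritable.get(), new Text(jsn));
  }
}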
On Jun 4, 2013 8:10 PM, Chengi Liu chengi.liu...@gmail.com wrote:
Hi,
I have the following reducer class
public static class TokenCounterReducer
on something identical to what you are
describing here.
Niels Basjes
On Sat, Jun 1, 2013 at 9:47 PM, Rody BigData rodybigd...@gmail.com wrote:
I have some old (not very old - each with 4GB RAM and a decent processor
etc., and working fine till now) Dell Windows XP machines and want to
convert
I've installed CentOS on several different types of old (originally Windows
XP) Dell desktops for the last 4 years (i.e. desktops as old as 7 years
ago) and so far installing CentOS was as easy as booting from the
installation CD/DVD and doing next, next, finish.
The only thing that you may run
I never configure the SSH feature,
not for running on a single node and not for a full-size cluster.
I simply start all the required daemons (name/data/job/task) and configure
the ports on which each can be reached.
Niels Basjes
On May 16, 2013 4:55 PM, Raj Hadoop hadoop...@yahoo.com wrote
cluster.
On Tue, May 14, 2013 at 5:09 PM, Niels Basjes ni...@basjes.nl wrote:
I made a typo. I meant API (instead of SPI).
Have a look at this for more information:
http://stackoverflow.com/questions/833768/java-code-for-getting-current-time
If you have a client that is not under
the time from the namenode or jobtracker
would suffice.
i looked at JobClient but didn't see anything helpful.
--
Best regards / Met vriendelijke groeten,
Niels Basjes
time is easy.
Niels Basjes
On Tue, May 14, 2013 at 5:46 PM, Jane Wayne jane.wayne2...@gmail.com wrote:
niels,
i'm not familiar with the native java spi. spi = service provider
interface? could you let me know if this spi is part of the hadoop
api? if so, which package/class?
but yes, all
How about a different approach:
If you use the multiple output option you can process the valid lines in a
normal way and put the invalid lines in a special separate output file.
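A minimal sketch of that idea with the new-API MultipleOutputs (the named
output "invalid" and the isValid() rule are placeholders I made up; the
driver must also register the named output via MultipleOutputs.addNamedOutput):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class ValidatingMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {

  private MultipleOutputs<LongWritable, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<>(context);
  }

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    if (isValid(line)) {
      context.write(offset, line);                    // normal processing path
    } else {
      mos.write("invalid", NullWritable.get(), line); // side file of bad lines
    }
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    mos.close(); // flush the side output
  }

  private boolean isValid(Text line) {
    return !line.toString().isEmpty();                // placeholder rule
  }
}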
On Apr 18, 2013 9:36 PM, Matthias Scherer matthias.sche...@1und1.de
wrote:
Hi all,
In my mapreduce job,
Have a look at this
http://stackoverflow.com/questions/3546025/is-it-possible-to-run-hadoop-in-pseudo-distributed-operation-without-hdfs
--
Kind regards,
Niels Basjes
(Sent from mobile)
On 17 Feb 2013 07:51, Agarwal, Nikhil nikhil.agar...@netapp.com wrote:
Hi
My suggestion is to use secondary sort with a single reducer. That way you
can easily extract the top N. If you want to get the top N% you'll need an
additional phase to determine how many records this N% really is.
--
Kind regards,
Niels Basjes
(Sent from mobile)
On 2 Feb
F. Put a MongoDB replica set on all Hadoop worker nodes and let the tasks
query MongoDB at localhost.
(This is what I did recently with a multi-GiB dataset.)
--
Kind regards,
Niels Basjes
(Sent from mobile)
On 30 Dec 2012 20:01, Jonathan Bishop jbishop@gmail.com wrote:
into blocks and stored in HDFS?
Yes, and then the mapper will read the other parts of the file over the network.
So what I do is I upload such files with a bigger HDFS blocksize so
the mapper has the entire file locally.
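A sketch of doing that upload programmatically (the 1 GiB value is just an
example; note the property is dfs.block.size on older releases and
dfs.blocksize on Hadoop 2.x):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadWithBigBlocks {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Make the block bigger than the file so it is never split over nodes.
    conf.setLong("dfs.blocksize", 1024L * 1024 * 1024); // 1 GiB
    FileSystem fs = FileSystem.get(conf);
    fs.copyFromLocalFile(new Path(args[0]), new Path(args[1]));
  }
}

The same should work from the shell by passing the property as a -D option
to hadoop fs -put.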
--
Best regards / Met vriendelijke groeten,
Niels Basjes
.
Then you also have other solutions which will allow you to scale, such as
Storm.
A few people have already considered using Storm for scalability and Esper
to do the real computation.
Regards
Bertrand
On Sun, Aug 19, 2012 at 9:44 PM, Niels Basjes ni...@basj.es wrote:
Is there a complete
Have a look at the WordCount example.
Input of a single map call is 1 record: "This is a line"
Output is 4 records:
This 1
is 1
a 1
line 1
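In code, that map call is roughly the standard WordCount mapper, condensed:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // One input record ("This is a line") yields one output record per word.
    StringTokenizer tokens = new StringTokenizer(value.toString());
    while (tokens.hasMoreTokens()) {
      word.set(tokens.nextToken());
      context.write(word, ONE);
    }
  }
}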
--
Best regards / Met vriendelijke groeten,
Niels Basjes
and up (I tested it with Cloudera
CDH4b1).
So for now Hadoop 1.x is not yet supported (waiting for HADOOP-7823).
Running mvn package automatically generates an RPM on my CentOS system.
Have fun with it and let me know what you think.
--
Best regards / Met vriendelijke groeten,
Niels Basjes
that takes the output of run 1 and creates an aggregate that
can be used to partition the dataset
2) Use the partitioning dataset from '1)' to distribute the processing for
the next run.
Thanks for your suggestions.
--
Best regards / Met vriendelijke groeten,
Niels Basjes
and from there simply manually define the partitions based on the
pattern we find.
--
Best regards / Met vriendelijke groeten,
Niels Basjes
each line in the very
first mapper. Then we store the result in (Snappy-compressed) Avro files.
I don't disagree, I just want to have a solid argument in favor of it...
:)
--
Best regards / Met vriendelijke groeten,
Niels Basjes
of concatenated gzipped files. (HADOOP-7909)
--
Best regards / Met vriendelijke groeten,
Niels Basjes
it by setting the right configuration?
- a separate library?
- a nice idea I had fun building but that no one needs?
- ... ?
--
Best regards / Met vriendelijke groeten,
Niels Basjes
Best regards / Met vriendelijke groeten,
Niels Basjes
and base the partitioning on that (like the one used in
TeraSort) wouldn't help.
The data has a special distribution...
Niels Basjes
--Bobby Evans
On 2/28/12 2:10 PM, Niels Basjes ni...@basjes.nl wrote:
Hi,
We have a job that outputs a set of files that are several hundred MB
Kind regards,
Niels Basjes
On 6 Sep 2011 01:54, ilyal levin nipponil...@gmail.com wrote:
O.k., so now I'm using SequenceFileInputFormat
and SequenceFileOutputFormat and it works fine, but the output of the reducer
is now a binary file (not txt) so I can't understand the data. How can I
Yes, that way it could work.
I'm just wondering ... Why would you want to have a script like this in
HDFS?
Kind regards,
Niels Basjes
On 16 Aug 2011 06:49, Friso van Vollenhoven fvanvollenho...@xebia.com
wrote:
hadoop fs -cat /path/on/hdfs/script.sh | bash
Should
/ mapper_number records. Does anyone have such
experience?
--
Best Regards
Jeff Zhang
--
Best regards / Met vriendelijke groeten,
Niels Basjes
blocks in
parallel.) Note that you’ll need enough storage capacity. I don’t have
example code, but I’m guessing Google can help.
From: Mapred Learn [mailto:mapred.le...@gmail.com]
Sent: Monday, 20 June 2011 18:09
To: Niels Basjes; Evert Lammerts
Subject: Re: AW: How to split a big file
mappers running in parallel. Isn't it so?
Yes, that is very true.
--
Best regards / Met vriendelijke groeten,
Niels Basjes
*/
--
Kind regards,
Niels Basjes
)
at org.apache.hadoop.mapred.Task.initialize(Task.java:486)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
...
So what is the correct way of doing this?
--
Kind regards,
Niels Basjes
that
a datanode that has blocks of this file must always have ALL blocks of
this file?
--
Best regards,
Niels Basjes
at 1:25 PM, Niels Basjes ni...@basjes.nl wrote:
Hi,
In some scenarios you have gzipped files as input for your map reduce
job (apache logfiles is a common example).
Now some of those files are several hundred megabytes and as such will
be split by HDFS in several blocks.
When looking
Set 64-bit
I was thinking of running a TaskTracker on each core, although I don't
know how to do it.
Any help to install Hadoop MR in cluster mode on my laptop?
Thanks,
--
Pedro
--
Kind regards,
Niels Basjes
scripting for
Anaconda (= the Red Hat installer).
--
Kind regards,
Niels Basjes
storageID (as defined in
.../cache/hdfs/dfs/data/current/VERSION).
So, my question is: how do I resolve the collision of the storageIDs?
Thanks!
-mgl
--
Kind regards,
Niels Basjes
the Hadoop code actually chooses the decompressor based on the
extension of the filename.
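That lookup lives in CompressionCodecFactory; a small sketch of opening a
file through whatever codec its extension implies (the helper class name is
mine):

import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class OpenMaybeCompressed {
  public static InputStream open(Configuration conf, Path path)
      throws IOException {
    FileSystem fs = path.getFileSystem(conf);
    // Picks GzipCodec for *.gz, BZip2Codec for *.bz2, etc., by extension.
    CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(path);
    InputStream in = fs.open(path);
    return codec == null ? in : codec.createInputStream(in);
  }
}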
--
Niels Basjes
solution of their own a while ago. Howl?
--
Harsh J
http://harshj.com
--
Kind regards,
Niels Basjes
.
Disadvantage:
Each run will show different results.
Only works if the set of keys that needs to be chopped is small
enough that you can keep it in memory in the call to the second map.
HTH
Niels Basjes
2011/3/10 Luca Aiello alu...@yahoo-inc.com:
Dear users,
hope this is the right list to submit
the distribution.
This can introduce some errors, but it should produce an output which is quite
uniformly distributed.
Thanks again!
You're welcome.
Niels
On Mar 10, 2011, at 12:23 PM, Niels Basjes wrote:
If I understand your problem correctly you actually need some way of
knowing if you need
.
--
Kind regards,
Niels Basjes
Kind regards,
Niels Basjes
.
HTH
--
Kind regards,
Niels Basjes
).
If you really need real-time (as in: "I want a guarantee that I have
an answer within 0.x seconds"), the answer is: no, HDFS/HBase cannot
guarantee that.
Other components like MapReduce (and Hive which run on top of
MapReduce) are purely batch oriented.
--
Kind regards,
Niels Basjes
of
MR parallelism?
AFAIK it should be splittable into the same blocks in which the compression was done.
How to control the size of block to be compressed in SequenceFile?
Can't help you with that one.
--
Kind regards,
Niels Basjes
is stop
reading the input iterator after N records and limit the output in
that way.
Doing it in the reducer also allows you to easily add a concept of
Top N by using the Secondary Sort trick to sort the input before
it arrives at the reducer.
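A minimal sketch of that reducer-side cut-off (N is an arbitrary example
value; the incoming order is assumed to already be the one you want, e.g.
via the secondary sort above):

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TopNReducer extends Reducer<Text, Text, Text, Text> {
  private static final int N = 10; // example cut-off

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    int emitted = 0;
    for (Text value : values) {
      if (emitted++ >= N) {
        break; // stop reading the iterator; the rest is never consumed
      }
      context.write(key, value);
    }
  }
}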
HTH
Niels Basjes
)
I am using CDH3B3, even though I think this is not specific to CDH3B3.
Sorry for the cross post.
Raj
--
Kind regards,
Niels Basjes
are
being used, as we are not getting the full advantage of our cluster.
--
Kind regards,
Niels Basjes
?
Thanks,
Pedro
--
Kind regards,
Niels Basjes
-site.xml using these two settings:
mapred.tasktracker.{map|reduce}.tasks.maximum
Have a look at this page for more information
http://hadoop.apache.org/common/docs/current/cluster_setup.html
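For example, in mapred-site.xml on each TaskTracker (the values are
arbitrary examples; match them to your hardware):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>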
--
Kind regards,
Niels Basjes
a
(certain) job to invoke exactly N Mappers, where N is the number of cores in
the cluster, regardless of the size of the data. This is not critical if
it can't be done, but it can improve the performance of my job if it can be
done.
Thanks
Shai
On Thu, Nov 25, 2010 at 9:55 PM, Niels Basjes
that you can't.
The main limit is that the Iterator does not have a size or length
method.
--
Kind regards,
Niels Basjes
times there is a design fault in the processing and the combiner
disrupts the processing.
HTH
Niels Basjes
2010/11/5 Adam Phelps a...@opendns.com
I've noticed an odd behavior with a map-reduce job I've written which is
reading data out of an HBase table. After a couple days of poking
would like to understand the logic behind the current implementation
choice in relation to what I expected (mainly from the documentation).
Thanks for explaining.
--
Best regards,
Niels Basjes