Hi,
You say you are on an HA cluster, yet just looking at the errors I see the
stack being routed through
"org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy".
My best guess is that your HA config is incomplete.
Niels Basjes
On Wed, Apr 27, 2016 at 4:27 PM, Mayank Mishra wr
Best regards / Met vriendelijke groeten,
Niels Basjes
g (in either Hadoop or Flink) or am I doing something wrong?
Would upgrading Yarn to 2.7.1 (i.e. HDP 2.3) fix this?
Niels Basjes
21:30:27,821 WARN org.apache.hadoop.security.UserGroupInformation
- PriviledgedActionException as:nbasjes (auth:SIMPLE)
cause:org.apache.hadoop.ipc.RemoteExce
MapReduce is based on the premise that several parts of a task can be
processed independently in parallel.
If you "require" an order of processing then these files are depending on
each other. Why use MapReduce at all?
With your requirement you cannot use more than one CPU anyway.
Niels
On Thu, 3
Just curious: what is the input for your job? If it is a single gzipped
file then that is the cause of getting exactly 1 mapper: gzip is not a
splittable compression format, so the whole file becomes a single input split.
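For reference, a minimal sketch (the file name is made up) of the
splittability check that TextInputFormat itself performs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.SplittableCompressionCodec;

public class SplitCheck {
  public static void main(String[] args) {
    CompressionCodecFactory factory =
        new CompressionCodecFactory(new Configuration());
    // Hypothetical input file; the .gz suffix maps to GzipCodec.
    CompressionCodec codec = factory.getCodec(new Path("input/logs.gz"));
    // No codec (plain text) or a splittable codec (e.g. bzip2) allows
    // multiple splits; gzip allows neither, hence exactly 1 mapper.
    boolean splittable =
        codec == null || codec instanceof SplittableCompressionCodec;
    System.out.println("splittable: " + splittable); // prints: false
  }
}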
Niels
On Fri, Apr 10, 2015, 09:21 Amit Kumar wrote:
> Thanks a lot Harsha for replying
>
> This problem has wasted at least the last week.
>
> We tried what you sugg
I created https://issues.apache.org/jira/browse/HIVE-9252 for this
improvement.
On Sun, Jan 4, 2015 at 5:16 PM, Niels Basjes wrote:
> Hi,
>
> These options:
> - HIVE_HOME/auxlib
> - http://stackoverflow.com/questions/14032924/how-to-add-serde-jar
> - ADD JAR commands in your
[, JAR|FILE|ARCHIVE 'file_uri'] ];
Is this something for which there is already a JIRA (I couldn't find it)?
If not, should I create one? (I.e., do you think this would make sense for
others?)
Niels Basjes
On Fri, Jan 2, 2015 at 9:00 PM, Yakubovich, Alexey <
alexey.yakubov...@sear
Thanks for the pointer.
This seems to work for functions. Is there something similar for CREATE
EXTERNAL TABLE?
Niels
On Dec 31, 2014 8:13 AM, "Ted Yu" wrote:
> Have you seen this thread ?
>
> http://search-hadoop.com/m/8er9TcALc/Hive+udf+custom+jar&subj=Best+way+to+add+custom+UDF+jar+in+HiveS
t unsubscribe in the same way as the subscription
>> was done, as described here.
>>
>> http://hadoop.apache.org/mailing_lists.html
>>
>> In case YOU don't know how a mailing list works, please take a
>> look here.
>>
>> http://en.wikipedia.org/wik
t;
> I'm new to this list, but from my point of view it is very disrespectful to
> the list members and developers that YOU don't invest a little bit of time
> yourself to find out how to unsubscribe from a list to which YOU (or anyone
> who has used your email account) subscribed.
>
> cheers Aleks
>
--
Best regards / Met vriendelijke groeten,
Niels Basjes
Perhaps filing an issue indicating that the use of these deprecated
parameters should be removed from the main code base is in order here.
Niels Basjes
On Fri, Nov 14, 2014 at 9:22 PM, Tianyin Xu wrote:
> Hi,
>
> I'm very confused by some of the MapReduce configuration parameters
> which app
Very interesting!
What makes Tez more scalable than Spark?
What architectural "thing" makes the difference?
Niels Basjes
On Oct 19, 2014 3:07 AM, "Jeff Zhang" wrote:
> Tez has a feature called pre-warm which will launch JVM before you use it
> and you can reuse the c
seems more suitable.
Did I understand correctly?
Niels Basjes
On Oct 17, 2014 8:30 PM, "Gavin Yue" wrote:
> Spark and Tez both make MR faster, there is no doubt about that.
>
> They also provide new features like DAG, which is quite important for
> interactive query processing. From
ar i don't seem to find any
> good resources on the Internet.
>
>
> Georgi
>
--
Best regards / Met vriendelijke groeten,
Niels Basjes
Use the LazyOutputFormat.
Have a look at this:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/LazyOutputFormat.html
and
http://stackoverflow.com/questions/6137139/how-to-save-only-non-empty-reducers-output-in-hdfs
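A minimal driver sketch of how to wire it in (the job name is made up, and
the rest of the job setup is omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class LazyOutputDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "lazy-output-example");
    // Instead of job.setOutputFormatClass(TextOutputFormat.class):
    // the output file is now only created on the first actual write,
    // so reducers that emit nothing produce no empty part files.
    LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
  }
}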
Niels Basjes
On Mon, Oct 28, 2013 at 8:11 PM
imply googling.
Does anyone know where I can find such a thing?
--
Best regards / Met vriendelijke groeten,
Niels Basjes
I expect the impact on the IO speed to be almost 0, because waiting for a
single disk seek takes longer than many thousands of calls to a
synchronized method.
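As a rough back-of-the-envelope check (not a rigorous benchmark; the
numbers are only illustrative):

public class SyncCost {
  private static long counter = 0;
  private static synchronized void bump() { counter++; }

  public static void main(String[] args) {
    final int n = 10_000_000;
    long t0 = System.nanoTime();
    for (int i = 0; i < n; i++) bump();
    long perCall = (System.nanoTime() - t0) / n;
    // Typically some tens of nanoseconds per uncontended call,
    // versus roughly 10,000,000 ns (~10 ms) for one disk seek.
    System.out.println("~" + perCall + " ns per synchronized call");
  }
}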
Niels
On Aug 11, 2013 3:00 PM, "Harsh J" wrote:
> Yes, I feel we could discuss this over a JIRA to remove it if it hurts
> perf. too much, bu
>>>
>>>>> because we may use multi-threads to write a single file.
>>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>>> RecordWriter implementations define the write as synchronized.
>>>>>> Any specific reason for this.
>>>>>>
>>>>>> regards,
>>>>>> sathwik
>>>>>>
>>>>>
>>>>
>
--
Best regards / Met vriendelijke groeten,
Niels Basjes
A circular file on HDFS is not possible.
Some of the ways around this limitation:
- Create a series of files and delete the oldest file when you have too
many (see the sketch below).
- Put the data into an HBase table and do something similar.
- Use a completely different technology like MongoDB, which has built-in
support
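For the first option, a minimal sketch using the HDFS FileSystem API
(assuming all segments live in one directory; names here are made up):

import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RollingDir {
  // Keep at most maxFiles segments in 'dir'; delete the oldest extras.
  public static void prune(FileSystem fs, Path dir, int maxFiles)
      throws IOException {
    FileStatus[] files = fs.listStatus(dir);
    Arrays.sort(files,
        Comparator.comparingLong(FileStatus::getModificationTime));
    for (int i = 0; i < files.length - maxFiles; i++) {
      fs.delete(files[i].getPath(), false);
    }
  }
}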
as
will fill up the disks fast.
What else should we consider?
Has anyone any experience with such a setup?
Is it a good idea to do this?
What are better options for us to consider?
Thanks for any input.
--
Best regards,
Niels Basjes
pport) you need them all to update their config
files.
My question is: Can you set the HADOOP_CONF_DIR to be a URL on a webserver?
A while ago I tried this and (back then) it didn't work.
Would this be a useful enhancement?
--
Best regards,
Niels Basjes
If you try to hammer in a nail (a JSON file) with a screwdriver
(XMLInputReader), then perhaps the reason it won't work is that you are
using the wrong tool?
On Jun 21, 2013 11:38 PM, "jamal sasha" wrote:
> Hi,
>
> I am using one of the libraries which rely on InputFormat.
> Right now, it is
My best guess is that at a low level a string is often terminated by having
a null byte at the end.
Perhaps that's where the difference lies.
Perhaps the gz decompressor simply stops at the null byte and the basic
record reader that follows simply continues.
In this situation your input file contai
Have you tried something like this (I do not have a PC here to check this
code)?
context.write(NullWritable.get(), new Text(jsn.toString()));
On Jun 4, 2013 8:10 PM, "Chengi Liu" wrote:
> Hi,
>
> I have the following redcuer class
>
> public static class TokenCounterReducer
> extends Reducer {
>
I've installed CentOS on several different types of old (originally Windows
XP) Dell desktops for the last 4 years (i.e. desktops as old as 7 years
ago) and so far installing CentOS was as easy as booting from the
installation CD/DVD and doing "next, next, finish".
The only thing that you may run
something identical to what you are
describing here.
Niels Basjes
On Sat, Jun 1, 2013 at 9:47 PM, Rody BigData wrote:
>
>
> I have some old ( not very old - each of 4GB RAM with a decent processor
> etc., and working fine till now ) Dell Windows XP machines and want to
> conver
I never configure the ssh feature.
Not for running on a single node, and not for a full size cluster.
I simply start all the required daemons (name/data/job/task) and configure
on which ports each of them can be reached.
Niels Basjes
On May 16, 2013 4:55 PM, "Raj Hadoop" wrote:
> Hi,
>
> On Tue, May 14, 2013 at 5:09 PM, Niels Basjes wrote:
>
> > I made a typo. I meant API (instead of SPI).
> >
> > Have a look at this for more information:
> >
> >
> http://stackoverflow.com/questions/833768/java-code-for-getting-current-time
> >
>
How about a different approach:
If you use the multiple outputs option you can process the valid lines in
the normal way and put the invalid lines in a separate output file.
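A minimal mapper sketch of that idea; the named output "invalid", the
validation check, and the types are all assumptions here, and the driver
must also register the named output via MultipleOutputs.addNamedOutput(...):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class ValidatingMapper
    extends Mapper<LongWritable, Text, NullWritable, Text> {
  private MultipleOutputs<NullWritable, Text> out;

  @Override
  protected void setup(Context context) {
    out = new MultipleOutputs<>(context);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    if (isValid(value)) {
      context.write(NullWritable.get(), value);       // normal output
    } else {
      out.write("invalid", NullWritable.get(), value); // separate file
    }
  }

  private boolean isValid(Text line) {
    return !line.toString().isEmpty(); // placeholder validation
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    out.close();
  }
}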
On Apr 18, 2013 9:36 PM, "Matthias Scherer"
wrote:
> Hi all,
>
>
> In my mapreduce job, I would like to pr
Have a look at this
http://stackoverflow.com/questions/3546025/is-it-possible-to-run-hadoop-in-pseudo-distributed-operation-without-hdfs
--
Kind regards,
Niels Basjes
(Sent from mobile)
On 17 Feb 2013 07:51, "Agarwal, Nikhil"
wrote the following:
> Hi,
>
>
F. Put a MongoDB replica set on all Hadoop worker nodes and let the tasks
query the MongoDB at localhost.
(This is what I did recently with a multi-GiB dataset.)
--
Kind regards,
Niels Basjes
(Sent from mobile)
On 30 Dec 2012 20:01, "Jonathan Bishop" wrote the
following:
it get splitted into blocks and stored in HDFS?
Yes, and then the mapper will read the other parts of the file over the
network.
So what I do is upload such files with a bigger HDFS block size so that
the mapper has "the entire file" locally.
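A minimal sketch of such an upload (the path and the 1 GiB block size are
made up; anything at least as large as the file itself works):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BigBlockUpload {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    long blockSize = 1024L * 1024L * 1024L; // 1 GiB
    // create(path, overwrite, bufferSize, replication, blockSize)
    FSDataOutputStream out = fs.create(
        new Path("/data/whole-file.bin"), true, 4096,
        fs.getDefaultReplication(), blockSize);
    // ... write the file contents here ...
    out.close();
  }
}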
--
Best regards / Met vriendelijke groeten,
Niels Basjes
SP engine fits with log collection using a tool such as
> Flume.
>
> Then you also have other solutions which will allow you to scale such as
> Storm.
> A few people have already considered using Storm for scalability and Esper
> to do the real computation.
>
> Regards
>
Is there a "complete" overview of the tools that allow processing streams
of data in realtime?
Or even better; what are the terms to google for?
--
Kind regards,
Niels Basjes
(Sent from mobile)
On 19 Aug 2012 18:22, "Bertrand Dechoux" wrote the
following: