Re: Issue on accessing AVRO format file from HA Cluster.

2016-04-27 Thread Niels Basjes
Hi, You say you are on an HA cluster; yet by just looking at the errors I see the stack being routed through "org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy". My best guess is that your HA config is incomplete. Niels Basjes On Wed, Apr 27, 2016 at 4:27 PM, Mayank Mishra wr

Long running Yarn Applications on a secured HA cluster?

2016-01-28 Thread Niels Basjes
Best regards / Met vriendelijke groeten, Niels Basjes

Flink job on secure Yarn fails after many hours

2015-12-02 Thread Niels Basjes
g (in either Hadoop or Flink) or am I doing something wrong? Would upgrading Yarn to 2.7.1 (i.e. HDP 2.3) fix this? Niels Basjes 21:30:27,821 WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:nbasjes (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteExce

Re: Sorting the inputSplits

2015-07-30 Thread Niels Basjes
MapReduce is based on the premise that several parts of a task can be processed independently in parallel. If you "require" an order of processing then these files depend on each other. Why use MapReduce at all? With your requirement you cannot use more than one CPU anyway. Niels On Thu, 3

Re: Not able to run more than one map task

2015-04-10 Thread Niels Basjes
Just curious: what is the input for your job? If it is a single gzipped file then that is the cause of getting exactly 1 mapper. Niels On Fri, Apr 10, 2015, 09:21 Amit Kumar wrote: > Thanks a lot Harsha for replying > > This problem has wasted at least the last week. > > We tried what you sugg
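The point about a single gzipped file can be illustrated outside Hadoop. The sketch below is not Hadoop code; it uses only `java.util.zip`, and the class and method names (`GzipSplitDemo`, `canDecodeFrom`) are made up for this illustration. It suggests why a plain gzip stream is normally read by one mapper: a reader can start at the beginning of the stream, but a reader handed an arbitrary offset has no gzip header to start from.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipSplitDemo {

    // Compress text into one gzip stream, as a single .gz input file would be.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Try to decompress starting at an arbitrary byte offset, the way a second
    // mapper would have to if the file were cut into independent splits.
    static boolean canDecodeFrom(byte[] compressed, int offset) {
        byte[] tail = Arrays.copyOfRange(compressed, offset, compressed.length);
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(tail))) {
            while (in.read() != -1) {
                // drain the stream; we only care whether decoding succeeds
            }
            return true;
        } catch (IOException e) {
            // No valid gzip header at this offset, or the data is unreadable.
            return false;
        }
    }

    public static void main(String[] args) {
        StringBuilder lines = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            lines.append("record ").append(i).append('\n');
        }
        byte[] compressed = gzip(lines.toString());
        System.out.println("from the start:  " + canDecodeFrom(compressed, 0));
        System.out.println("from the middle: " + canDecodeFrom(compressed, compressed.length / 2));
    }
}
```

Decoding from offset 0 succeeds; decoding from the middle is expected to fail because the bytes there are not a gzip header. Splitting the input into several files, or using a splittable compression format, are the usual ways to get more than one mapper.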

Re: way to add custom udf jar in hadoop 2.x version

2015-01-04 Thread Niels Basjes
I created https://issues.apache.org/jira/browse/HIVE-9252 for this improvement. On Sun, Jan 4, 2015 at 5:16 PM, Niels Basjes wrote: > Hi, > > These options: > - HIVE_HOME/auxlib > - http://stackoverflow.com/questions/14032924/how-to-add-serde-jar > - ADD JAR commands in your

Re: way to add custom udf jar in hadoop 2.x version

2015-01-04 Thread Niels Basjes
[, JAR|FILE|ARCHIVE 'file_uri'] ]; Is this something for which there is already a JIRA (couldn't find it)? If not; Should I create one? (I.e. do you think this would make sense for others?) Niels Basjes On Fri, Jan 2, 2015 at 9:00 PM, Yakubovich, Alexey < alexey.yakubov...@sear

Re: way to add custom udf jar in hadoop 2.x version

2014-12-31 Thread Niels Basjes
Thanks for the pointer. This seems to work for functions. Is there something similar for CREATE EXTERNAL TABLE? Niels On Dec 31, 2014 8:13 AM, "Ted Yu" wrote: > Have you seen this thread? > > http://search-hadoop.com/m/8er9TcALc/Hive+udf+custom+jar&subj=Best+way+to+add+custom+UDF+jar+in+HiveS

Re: to all this unsubscribe sender

2014-12-05 Thread Niels Basjes
t unsubscribe at the same way as the subscription >> was, as described here. >> >> http://hadoop.apache.org/mailing_lists.html >> >> In the case that YOU don't know how a mailinglist works, please take a >> look here. >> >> http://en.wikipedia.org/wik

Re: to all this unsubscribe sender

2014-12-05 Thread Niels Basjes
> > I'm new to this list but from my point of view it is very disrespectful to > the list members and developers that YOU don't invest a little bit of time > by yourself to search how you can unsubscribe from a list on which YOU > have subscribed or anyone which have used your email account. > > cheers Aleks > -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: Are these configuration parameters deprecated?

2014-11-14 Thread Niels Basjes
. Perhaps an issue indicating that the use of the deprecated parameters should be removed from the main code base is in order here. Niels Basjes On Fri, Nov 14, 2014 at 9:22 PM, Tianyin Xu wrote: > Hi, > > I'm very confused by some of the MapReduce configuration parameters > which app

Re: Spark vs Tez

2014-10-19 Thread Niels Basjes
Very interesting! What makes Tez more scalable than Spark? What architectural "thing" makes the difference? Niels Basjes On Oct 19, 2014 3:07 AM, "Jeff Zhang" wrote: > Tez has a feature called pre-warm which will launch JVM before you use it > and you can reuse the c

Re: Spark vs Tez

2014-10-18 Thread Niels Basjes
seems more suitable. Did I understand correctly? Niels Basjes On Oct 17, 2014 8:30 PM, "Gavin Yue" wrote: > Spark and tez both make MR faster, this has no doubt. > > They also provide new features like DAG, which is quite important for > interactive query processing. From

Re: Bzip2 files as an input to MR job

2014-09-22 Thread Niels Basjes
ar i don't seem to find any > good resources on the Internet. > > > Georgi > -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: stop generating these "part-XXXX" empty files when using MultipleOutputs in mapreduce job

2013-10-28 Thread Niels Basjes
Use the LazyOutputFormat. Have a look at this: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/LazyOutputFormat.html and http://stackoverflow.com/questions/6137139/how-to-save-only-non-empty-reducers-output-in-hdfs Niels Basjes On Mon, Oct 28, 2013 at 8:11 PM
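In a real job the usual call is `LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class)` in the driver. The idea behind it can be sketched without Hadoop: the output file is only created when the first record is written, so a task that writes nothing leaves no empty "part-XXXX" file behind. The class below is a stand-alone illustration using `java.nio.file`, not Hadoop's implementation; `LazyFileWriter` and `producesFile` are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LazyFileWriter implements AutoCloseable {
    private final Path target;
    private BufferedWriter writer; // stays null until the first record arrives

    LazyFileWriter(Path target) {
        this.target = target;
    }

    void write(String record) {
        try {
            if (writer == null) {
                // The file is created here, on first use, not in the constructor.
                writer = Files.newBufferedWriter(target);
            }
            writer.write(record);
            writer.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            if (writer != null) {
                writer.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: write the given records to a fresh temp directory and
    // report whether an output file exists afterwards.
    static boolean producesFile(String... records) {
        try {
            Path target = Files.createTempDirectory("lazy-output-demo").resolve("part-00000");
            try (LazyFileWriter out = new LazyFileWriter(target)) {
                for (String record : records) {
                    out.write(record);
                }
            }
            return Files.exists(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("no records:  " + producesFile());
        System.out.println("one record:  " + producesFile("key\t1"));
    }
}
```

With no records the helper reports that no file was created; with at least one record the file exists.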

Generating mysql or sqlite datafiles from Hadoop (Java)?

2013-09-17 Thread Niels Basjes
imply googling. Does anyone know where I can find such a thing? -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-11 Thread Niels Basjes
I expect the impact on the IO speed to be almost 0 because waiting for a single disk seek is longer than many thousands of calls to a synchronized method. Niels On Aug 11, 2013 3:00 PM, "Harsh J" wrote: > Yes, I feel we could discuss this over a JIRA to remove it if it hurts > perf. too much, bu

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Niels Basjes
>>> >>>>> because we may use multi-threads to write a single file. >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other >>>>>> RecordWriter implementations define the write as synchronized. >>>>>> Any specific reason for this. >>>>>> >>>>>> regards, >>>>>> sathwik >>>>>> >>>>> >>>> > -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Niels Basjes
>>>> Hi, >>>>> >>>>> LineRecordWriter.write(..) is synchronized. I did not find any other >>>>> RecordWriter implementations define the write as synchronized. >>>>> Any specific reason for this. >>>>> >>>>> regards, >>>>> sathwik >>>>> >>>> >>> -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: Is there any way to use a hdfs file as a Circular buffer?

2013-07-24 Thread Niels Basjes
A circular file on HDFS is not possible. Some of the ways around this limitation: - Create a series of files and delete the oldest file when you have too many. - Put the data into an HBase table and do something similar. - Use completely different technology like MongoDB which has built-in support
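The first workaround (a series of files, deleting the oldest) can be sketched as follows. This is a minimal illustration on the local filesystem with `java.nio.file` standing in for HDFS; the class name `RollingSegments`, the `segment-NNNNNN` naming scheme and the `demo` helper are assumptions made for the example, not an existing API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;

class RollingSegments {
    private final Path dir;
    private final int maxSegments;
    private final Deque<Path> segments = new ArrayDeque<>(); // oldest first
    private long nextId = 0;

    RollingSegments(Path dir, int maxSegments) {
        this.dir = dir;
        this.maxSegments = maxSegments;
    }

    // Write one new segment file, then drop the oldest files beyond the limit.
    void append(String content) {
        try {
            Path segment = dir.resolve(String.format(Locale.ROOT, "segment-%06d", nextId++));
            Files.write(segment, content.getBytes(StandardCharsets.UTF_8));
            segments.addLast(segment);
            while (segments.size() > maxSegments) {
                Files.delete(segments.removeFirst());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Names of the segments that currently exist, oldest first.
    List<String> liveSegments() {
        List<String> names = new ArrayList<>();
        for (Path segment : segments) {
            names.add(segment.getFileName().toString());
        }
        return names;
    }

    // Demo helper: perform `writes` appends into a fresh temp directory.
    static List<String> demo(int maxSegments, int writes) {
        try {
            RollingSegments buffer =
                    new RollingSegments(Files.createTempDirectory("rolling-demo"), maxSegments);
            for (int i = 0; i < writes; i++) {
                buffer.append("chunk " + i + "\n");
            }
            return buffer.liveSegments();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints [segment-000002, segment-000003, segment-000004]
        System.out.println(demo(3, 5));
    }
}
```

Readers then treat the surviving segments, in name order, as the contents of the "buffer". On HDFS the same pattern would use the Hadoop `FileSystem` API for the create and delete steps.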

Running a single cluster in multiple datacenters

2013-07-15 Thread Niels Basjes
as will fill up the disks fast. What things should we consider also? Has anyone any experience with such a setup? Is it a good idea to do this? What are better options for us to consider? Thanks for any input. -- Best regards, Niels Basjes

Use a URL for the HADOOP_CONF_DIR?

2013-07-15 Thread Niels Basjes
pport) you need them all to update their config files. My question is: Can you set the HADOOP_CONF_DIR to be a URL on a webserver? A while ago I tried this and (back then) it didn't work. Would this be a useful enhancement? -- Best regards, Niels Basjes

Re: Inputformat

2013-06-21 Thread Niels Basjes
If you try to hammer in a nail (JSON file) with a screwdriver (XMLInputReader) then perhaps the reason it won't work may be that you are using the wrong tool? On Jun 21, 2013 11:38 PM, "jamal sasha" wrote: > Hi, > > I am using one of the libraries which rely on InputFormat. > Right now, it is

Re: gz containing null chars?

2013-06-10 Thread Niels Basjes
My best guess is that at a low level a string is often terminated by having a null byte at the end. Perhaps that's where the difference lies. Perhaps the gz decompressor simply stops at the null byte and the basic record reader that follows simply continues. In this situation your input file contai

Re: Reducer to output only json

2013-06-04 Thread Niels Basjes
Have you tried something like this (I do not have a PC here to check this code): context.write(NullWritable.get(), new Text(jsn.toString())); On Jun 4, 2013 8:10 PM, "Chengi Liu" wrote: > Hi, > > I have the following reducer class > > public static class TokenCounterReducer > extends Reducer { >

Re: Experimental Hadoop Cluster - Linux Windows machines

2013-06-01 Thread Niels Basjes
I've installed CentOS on several different types of old (originally Windows XP) Dell desktops for the last 4 years (i.e. desktops as old as 7 years ago) and so far installing CentOS was as easy as booting from the installation CD/DVD and doing "next, next, finish". The only thing that you may run

Re: Experimental Hadoop Cluster - Linux Windows machines

2013-06-01 Thread Niels Basjes
something identical to what you are describing here. Niels Basjes On Sat, Jun 1, 2013 at 9:47 PM, Rody BigData wrote: > > > I have some old ( not very old - each of 4GB RAM with a decent processor > etc., and working fine till now ) Dell Windows XP machines and want to > conver

Re: Configuring SSH - is it required? for a psedo distriburted mode?

2013-05-19 Thread Niels Basjes
I never configure the ssh feature. Not for running on a single node and not for a full size cluster. I simply start all the required daemons (name/data/job/task) and configure the ports on which each can be reached. Niels Basjes On May 16, 2013 4:55 PM, "Raj Hadoop" wrote: > Hi,

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-16 Thread Niels Basjes
> > On Tue, May 14, 2013 at 5:09 PM, Niels Basjes wrote: > > > I made a typo. I meant API (instead of SPI). > > > > Have a look at this for more information: > > > > > http://stackoverflow.com/questions/833768/java-code-for-getting-current-time > > >

Re: How to process only input files containing 100% valid rows

2013-04-19 Thread Niels Basjes
How about a different approach: If you use the multiple output option you can process the valid lines in a normal way and put the invalid lines in a special separate output file. On Apr 18, 2013 9:36 PM, "Matthias Scherer" wrote: > Hi all, > > In my mapreduce job, I would like to pr

Re: Can I perfrom a MR on my local filesystem

2013-02-16 Thread Niels Basjes
Have a look at this http://stackoverflow.com/questions/3546025/is-it-possible-to-run-hadoop-in-pseudo-distributed-operation-without-hdfs -- With kind regards, Niels Basjes (Sent from mobile) On 17 Feb 2013 07:51, "Agarwal, Nikhil" wrote: > Hi, > >

Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer

2012-12-30 Thread Niels Basjes
F. put a mongodb replica set on all hadoop workernodes and let the tasks query the mongodb at localhost. (this is what I did recently with a multi GiB dataset) -- With kind regards, Niels Basjes (Sent from mobile) On 30 Dec 2012 20:01, "Jonathan Bishop" wrote

Re: Doubts on compressed file

2012-11-07 Thread Niels Basjes
it get splitted into blocks and stored in HDFS? Yes, and then the mapper will read the other parts of the file over the network. So what I do is I upload such files with a bigger HDFS blocksize so the mapper has "the entire file" locally. -- Best regards / Met vriendelijke groeten, Niels Basjes

Re: Hadoop Real time help

2012-08-22 Thread Niels Basjes
SP engine fits with log collection using a tool such as > Flume. > > Then you also have other solutions which will allow you to scale such as > Storm. > A few people have already considered using Storm for scalability and Esper > to do the real computation. > > Regards >

Re: Hadoop Real time help

2012-08-19 Thread Niels Basjes
Is there a "complete" overview of the tools that allow processing streams of data in realtime? Or even better; what are the terms to google for? -- With kind regards, Niels Basjes (Sent from mobile) On 19 Aug 2012 18:22, "Bertrand Dechoux" wrote