unsubscribe

2023-02-02 Thread John Lilley
John Lilley Data Management Chief Architect, Redpoint Global Inc. 888 Worcester Street, Suite 200 Wellesley, MA 02482 M: +1 7209385761 | john.lil...@redpointglobal.com PLEASE NOTE: This e-mail f

Re: [DISCUSS] fate of branch-2.9

2020-08-27 Thread John Zhuge
On Mon, Mar 2, 2020 at 6:30 PM Sree Vaddi wrote: +1. Sent from Yahoo Mail on Android. On Mon, Mar 2, 2020 at 5:12 PM, Wei-Chiu Chuang <weic...@apache.org> wrote: Hi, Following the discussion to end branch-2.8, I want to start a discussion around what's next with branch-2.9. I am hesitant to use the word "end of life" but consider these facts: * 2.9.0 was released Dec 17, 2017. * 2.9.2, the last 2.9.x release, went out Nov 19 2018, which is more than 15 months ago. * No one seems to be interested in being the release manager for 2.9.3. * Most if not all of the active Hadoop contributors are using Hadoop 2.10 or Hadoop 3.x. * We as a community do not have the cycle to manage multiple release lines, especially since Hadoop 3.3.0 is coming out soon. It is perhaps the time to gradually reduce our footprint in Hadoop 2.x, and encourage people to upgrade to Hadoop 3.x. Thoughts? -- John Zhuge

RE: Multiple classloaders and Hadoop APIs

2019-02-15 Thread John Lilley
happens by trial-and-error, and then needing to figure out how to get the thread-context-classloader set at the appropriate place and time. John Lilley -Original Message- From: Sean Busbey Sent: Tuesday, February 5, 2019 10:42 AM To: John Lilley Cc: user@hadoop.apache.org Subject: Re

Multiple classloaders and Hadoop APIs

2019-02-01 Thread John Lilley
y? If not, is there a list of such classes? Thanks, John Lilley

Example of YARN delegation token for hiveserver2

2018-09-19 Thread John Lilley
and MapReduce do this somehow, but we haven't been able to dig out the specifics from the code. John Lilley

Re: Help me understand hadoop caching behavior

2017-12-27 Thread Avery, John
Nevermind. I found my stupid mistake. I didn’t reset a variable…this fact had escaped me for the past two days. From: "Avery, John" Date: Wednesday, December 27, 2017 at 4:20 PM To: "user@hadoop.apache.org" Subject: Help me understand hadoop caching behavior I’m writing a

Help me understand hadoop caching behavior

2017-12-27 Thread Avery, John
I’m writing a program using the C API for Hadoop. I have a 4-node cluster. (Cluster was setup according to https://www.tutorialspoint.com/hadoop/hadoop_multi_node_cluster.htm) Of the 4 nodes, one is the namenode and a datanode, the others are datanodes (with one being a secondary namenode). I’

[ANNOUNCE] Apache Gora 0.7 Release

2017-03-23 Thread lewis john mcgibbney
Hi Folks, The Apache Gora team are pleased to announce the immediate availability of Apache Gora 0.7. The Apache Gora open source framework provides an in-memory data model and persistence for big data. Gora supports persisting to column stores, key value stores, document stores and RDBMSs, and an

RE: What is the recommended way to append to files on hdfs?

2016-05-31 Thread John Lilley
ood design, coupling your input readers directly with output writers. Instead, put the writers in separate threads and push byte arrays to be written to them via a queue. John Lilley From: Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com] Sent: Wednesday, May 25, 2016 9:12 PM To: user@hadoop.
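A minimal sketch of the decoupled-writer pattern suggested above, with reader threads pushing byte arrays onto a queue and a single thread owning the HDFS append; the path handling, queue size, and poison-pill convention are illustrative assumptions, not taken from the thread:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class QueuedHdfsWriter implements Runnable {
        static final byte[] POISON = new byte[0];                 // sentinel telling the writer to stop
        final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<byte[]>(1024);
        final Configuration conf;
        final Path target;

        QueuedHdfsWriter(Configuration conf, Path target) { this.conf = conf; this.target = target; }

        public void put(byte[] record) throws InterruptedException { queue.put(record); }

        public void run() {
            try {
                FileSystem fs = FileSystem.get(conf);
                FSDataOutputStream out = fs.exists(target) ? fs.append(target) : fs.create(target);
                byte[] buf;
                while ((buf = queue.take()) != POISON) {          // readers enqueue; this thread is the only writer
                    out.write(buf);
                }
                out.close();                                      // close flushes and finalizes the last block
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }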

RE: Filing a JIRA

2016-05-24 Thread John Lilley
That's it! Thanks. John Lilley From: Chris Nauroth [mailto:cnaur...@hortonworks.com] Sent: Tuesday, May 24, 2016 10:24 AM To: John Lilley ; 'user@hadoop.apache.org' Subject: Re: Filing a JIRA Something is definitely odd about the UI there. From your second link, can yo

RE: Filing a JIRA

2016-05-24 Thread John Lilley
to here: https://issues.apache.org/jira/browse/HADOOP/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel But still, creating a bug using the "Bug -> Create Detailed" menu files it against Atlas. John Lilley From: Chris Nauroth [mailto:cnaur...@hortonworks.com] Sent: Monday, M

Filing a JIRA

2016-05-09 Thread John Lilley
I am having trouble filing a bug report. I was trying this: https://issues.apache.org/jira/servicedesk/customer/portal/5/create/27 But this doesn't seem right and refuses my request. Can someone point me to the right place? Thanks John L

Can a YARN container request be fulfilled by a container with less memory?

2015-12-11 Thread John Lilley
2048MB. We are using the default Capacity Scheduler. Is this a configuration error on our part or has the Resource Manager somehow returned the wrong size container allocation? Should we simply reject small containers and wait for the RM to find a larger one for us? John Lilley

RE: Ubuntu open file limits

2015-10-02 Thread John Lilley
ice.datalever.com_45454 Only the first worker node has the higher file limit. The rest have lower limits. I have verified this on two separate clusters now. The same discrepancies are observed by looking at /proc//limits for the datanode processes on each worker node. This is looking like

Ubuntu open file limits

2015-09-30 Thread John Lilley
oot soft nofile 65536 root hard nofile 65536 But none of this seems to affect the RM/NM limits. Thanks john <>

Fw: important

2015-09-08 Thread John Hu
Hello! Important message, visit http://bookreviewsrus.com/appear.php?dfw4 John Hu

RE: UserGroupInformation and login with password

2015-08-19 Thread John Lilley
to understand why this process diverges from reloginFromKeytab(), which works just fine. John Lilley From: Zheng, Kai [mailto:kai.zh...@intel.com] Sent: Monday, August 17, 2015 5:40 PM To: user@hadoop.apache.org Subject: RE: UserGroupInformation and login with password Hi John, Login from keytab is mostly e
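For comparison with the keytab path that the thread says works fine, here is a rough sketch of keytab login plus periodic relogin; the principal and keytab path are placeholders, and the password-based variant discussed in the thread is not shown because the exchange leaves it unresolved:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KeytabLoginExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");     // enable Kerberos on the client side
            UserGroupInformation.setConfiguration(conf);
            // Principal and keytab path are placeholders.
            UserGroupInformation.loginUserFromKeytab("svc@EXAMPLE.COM", "/etc/security/keytabs/svc.keytab");
            UserGroupInformation ugi = UserGroupInformation.getLoginUser();
            // Before long-running RPC work, renew the TGT if it is close to expiring.
            ugi.checkTGTAndReloginFromKeytab();
        }
    }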

UserGroupInformation and login with password

2015-08-17 Thread John Lilley
It seems like this should be something simple. We do need to support password, because many of our customers do not allow keytabs. Thanks John Lilley <>

RE: Error in YARN localization with Active Directory user -- inconsistent directory name escapement

2015-05-04 Thread John Lilley
Follow-up, this is indeed a YARN bug and I've filed a JIRA, which has garnered a lot of attention and a patch. john From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Friday, April 17, 2015 1:01 PM To: 'user@hadoop.apache.org' Subject: Error in YARN localization with Active

RE: Trusted-realm vs default-realm kerberos issue

2015-04-19 Thread John Lilley
files not matching exactly. Unfortunately we made so many attempts that I can’t now recall exactly what we did to bring it all into line. john From: Alexander Alten-Lorenz [mailto:wget.n...@gmail.com] Sent: Wednesday, March 25, 2015 3:28 AM To: user@hadoop.apache.org Subject: Re: Trusted-realm

Error in YARN localization with Active Directory user -- inconsistent directory name escapement

2015-04-17 Thread John Lilley
s/^(.*)@OFFICE\.DATALEVER\.COM$/office\\$1/g> Thanks John Lilley <>

Trusted-realm vs default-realm kerberos issue

2015-03-17 Thread John Lilley
at Edge AD controller trusts an "enterprise" AD controller. Trying to authenticate using the password equivalent of UserGroupInformation.loginUserFromKeytab() with a user in the "enterprise" realm fails, while a user in the "edge" realm succeeds. Thanks John <>

RE: SSH passwordless & Hadoop startup/shutdown scripts

2014-11-26 Thread John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco)
Please disregard. Issue resolved. -John From: John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco) Sent: Wednesday, November 26, 2014 9:34 AM To: user@hadoop.apache.org Subject: SSH passwordless & Hadoop startup/shutdown scripts Hello, I had originally configured our

SSH passwordless & Hadoop startup/shutdown scripts

2014-11-26 Thread John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco)
,hostname: Name or service not known Can someone suggest where I should be looking for configuration issues? Thank you -John

Need some help with RecordReader

2014-10-28 Thread John Dison
fter reading the first line of the next record. I would be very thankful for any help with implementing such a RecordReader. Thanks in advance,John.

YARN rack-specific, relax_locality=false container request does not respond

2014-10-20 Thread John (Youngseok) Yang
request properly, but *fails to somehow resolve the rack at each of nodeUpdates* (never mind the resource limit of virtualCores: I am using DefaultResourceCalculator, which only looks at the memory) I would appreciate any advice or suggestions. Best regards, John Rack
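For reference, a rack-only container request with relaxed locality disabled would look roughly like the following with AMRMClient; the rack name, memory, and priority values are placeholder assumptions, not the poster's actual settings:

    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

    public class RackRequestExample {
        // amRMClient: an already-initialized and started AMRMClient<ContainerRequest>
        static void requestOnRack(AMRMClient<ContainerRequest> amRMClient, String rack) {
            Resource capability = Resource.newInstance(1024, 1);        // 1 GB, 1 vcore (placeholder sizing)
            Priority priority = Priority.newInstance(1);
            String[] nodes = null;                                       // no node-level constraint
            String[] racks = new String[] { rack };                      // e.g. "/rack1"
            // Last argument is relaxLocality = false: do not fall back to off-rack containers.
            amRMClient.addContainerRequest(
                new ContainerRequest(capability, nodes, racks, priority, false));
        }
    }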

JNI native-method warning in HDP 2.0

2014-10-16 Thread John Lilley
We are seeing a warning deep in HDFS code, I was wondering if anyone knows of this or if a JIRA has been filed or fixed? Searching on the warning text didn't crop up anything. It is not harmful AFAIK. John [Dynamic-linking native method java.net.NetworkInterface.init ... JNI] WARNI

RE: Issue with Hadoop/Kerberos security as client

2014-09-18 Thread John Lilley
n properly. Any C++ thread calling AttachCurrentThread() must then fetch the system class loader and set it to be the thread's context class loader: java.lang.Thread.currentThread().setContextClassLoader(java.lang.ClassLoader.getSystemClassLoader()); john From: John Lilley [mailto:john.li
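The fix quoted above boils down to one Java call made on the attached native thread; a small helper like the following (the class and method names are invented for illustration) can be invoked over JNI right after AttachCurrentThread() and before the first Hadoop call on that thread:

    public final class NativeThreadBootstrap {
        private NativeThreadBootstrap() {}

        // Invoked via JNI from a C++ thread immediately after AttachCurrentThread().
        // Hadoop resolves classes through the thread context classloader, which is
        // not set on freshly attached native threads.
        public static void fixContextClassLoader() {
            Thread.currentThread().setContextClassLoader(ClassLoader.getSystemClassLoader());
        }
    }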

RE: How can I increase the speed balancing?

2014-09-16 Thread John Lilley
Srikanth, The cluster is idle while balancing, and it seems to move about 2MB/minute. There is no discernable CPU load. john From: Srikanth upputuri [mailto:srikanth.upput...@huawei.com] Sent: Thursday, September 04, 2014 12:57 AM To: user@hadoop.apache.org Subject: RE: How can I increase the

RE: How can I increase the speed balancing?

2014-09-03 Thread John Lilley
I have also found that neither dfsadmin -setBalancerBandwidth nor dfs.datanode.balance.bandwidthPerSec have any notable effect on apparent balancer rate. This is on Hadoop 2.2.0 john From: cho ju il [mailto:tjst...@kgrid.co.kr] Sent: Wednesday, September 03, 2014 12:55 AM To: user

RE: HDFS balance

2014-09-03 Thread John Lilley
Can you run the load from an "edge node" that is not a DataNode? john John Lilley Chief Architect, RedPoint Global Inc. 1515 Walnut Street | Suite 300 | Boulder, CO 80302 T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077 Skype: jlilley.redpoint | john.lil...@re

RE: YARN userapp cache lifetime: can't find core dump

2014-09-02 Thread John Lilley
I think I found it: yarn.nodemanager.delete.debug-delay-sec From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Tuesday, September 02, 2014 2:02 PM To: 'user@hadoop.apache.org' Subject: YARN userapp cache lifetime: can't find core dump We have a YARN task that is core-dumpi

RE: YARN userapp cache lifetime: can't find core dump

2014-09-02 Thread John Lilley
Shahab, Thanks, but I think that is just for log aggregation. I want to retain the entire localized directory structure for a YARN task, including any files written to that place, after the task has exited. John From: Shahab Yunus [mailto:shahab.yu...@gmail.com] Sent: Tuesday, September 02

YARN userapp cache lifetime: can't find core dump

2014-09-02 Thread John Lilley
below here is empty /data2/hadoop/yarn/local/usercache/jlilley/appcache I seem to recall there is a YARN setting to control the time these files are kept around after application exit, but I can't figure out what it is. Thanks, john <>

RE: winutils and security

2014-08-26 Thread John Lilley
Call addDelegationTokens() to extract delegated Credentials for HDFS and keep them around. Once this has been done, it appears that all is well. We can use those Credentials in the YARN application master launch context. john From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Sunday, August 24, 2014 11:
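A rough sketch of the approach described above, serializing the HDFS delegation tokens into the AM launch context; the renewer name "yarn" and the omitted launch-context fields (commands, resources, environment) are assumptions for illustration:

    import java.nio.ByteBuffer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.DataOutputBuffer;
    import org.apache.hadoop.security.Credentials;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.util.Records;

    public class TokensForAmExample {
        static ContainerLaunchContext buildAmContext(Configuration conf) throws Exception {
            Credentials credentials = new Credentials();
            FileSystem fs = FileSystem.get(conf);
            fs.addDelegationTokens("yarn", credentials);                // collect HDFS tokens; "yarn" renews them

            DataOutputBuffer dob = new DataOutputBuffer();
            credentials.writeTokenStorageToStream(dob);                 // serialize tokens for the launch context
            ByteBuffer tokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());

            ContainerLaunchContext amContext = Records.newRecord(ContainerLaunchContext.class);
            amContext.setTokens(tokens);                                // the AM inherits these credentials
            return amContext;                                           // commands, resources, env omitted here
        }
    }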

RE: winutils and security

2014-08-24 Thread John Lilley
Following up on this, I was able to extract a winutils.exe and Hadoop.dll from a Hadoop install for Windows, and set up HADOOP_HOME and PATH to find them. It makes no difference to security, apparently. John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Saturday, August 23, 2014 2

winutils and security

2014-08-23 Thread John Lilley
these errors occur, messages come from Hadoop like those below. Is it possible that this is leading to our security failures? (I posted previously about that problem but got no response). What does winutils.exe have to do with security, if anything? Thanks john The relevant portions of the log

Issue with Hadoop/Kerberos security as client

2014-08-19 Thread John Lilley
(Client.java:654) ... 24 more Oddly, the multi-thread access pattern works in pure Java, it just fails when performed from C++ via JNI. We are very careful to maintain global JNI references etc... the JNI interface works flawlessly in all other cases. Any ideas? Thanks John <>

RE: Issues with documentation on YARN

2014-07-27 Thread John Lilley
The only way we've found to write an application master is by example. However, the distributed-shell example in 2.2 was not very good; it is much improved in 2.4. I would start with that and edit, rather than create from scratch. john -Original Message- From: Никитин Конст

Re: what changes needed for existing HDFS java client in order to work with kerberosed hadoop server ?

2014-07-17 Thread John Glynn
Unsubscribe On Jul 16, 2014 5:00 PM, "Xiaohua Chen" wrote: > Hi Experts, > > I am new to Hadoop. I would like to get some help from you: > > Our current HDFS java client works fine with hadoop server which has > NO Kerberos security enabled. We use HDFS lib e.g. > org.apache.hadoop.fs.*. > > No

silence/suppress output from mkdir & put

2014-07-11 Thread John Meza
I'm trying to silence the output of a hdfs load script (snippets below), but continue to get output. This causes issues when the multiple load scripts are put into background. Am I missing something? Are there switches I could use? Any info is appreciated. John hadoop fs -mkdir /logs/firsttier/2014/D

RE: HBase metadata

2014-07-08 Thread John Lilley
Thanks. Apologies, I should have gone there first. john From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Tuesday, July 08, 2014 11:04 AM To: common-u...@hadoop.apache.org Subject: Re: HBase metadata I have forwarded the original post to user@hbase FYI On Tue, Jul 8, 2014 at 10:01 AM, Martin

RE: HBase metadata

2014-07-08 Thread John Lilley
Sorry to be rude, but what does everyone actually use now? We are an ISV and need to support the most common access pattern. john From: Martin, Nick [mailto:nimar...@pssd.com] Sent: Tuesday, July 08, 2014 10:53 AM To: user@hadoop.apache.org Subject: RE: HBase metadata Have you looked @ Lingual

RE: HBase metadata

2014-07-08 Thread John Lilley
Those look intriguing. But what do people actually use today? Is it all application-specific coding? Hive? John From: Mirko Kämpf [mailto:mirko.kae...@gmail.com] Sent: Tuesday, July 08, 2014 10:12 AM To: user@hadoop.apache.org Subject: Re: HBase metadata Hi John, I suggest the project: http

HBase metadata

2014-07-08 Thread John Lilley
question about metadata standards. What do users mostly do to use HBase for row-oriented access? It is always going through Hive? Thanks john

RE: persistent services in Hadoop

2014-06-27 Thread John Lilley
. Cheers, john From: Arun Murthy [mailto:a...@hortonworks.com] Sent: Wednesday, June 25, 2014 11:50 PM To: user@hadoop.apache.org Subject: Re: persistent services in Hadoop John, We are excited to see ISVs like you get value from YARN, and appreciate the patience you've already shown in the pa

add to example programs

2014-06-26 Thread John Hancock
describes how to add a job to this jar? -John

persistent services in Hadoop

2014-06-25 Thread John Lilley
have an AM that creates a set of YARN tasks and just waits until YARN gives a task on each node, and restart any failed tasks, but it doesn't really fit the AM/container structure very well. I've also read about Slider, which looks interesting. Other ideas? --john

RE: Gathering connection information

2014-06-14 Thread John Lilley
require direct connections to each DataNode? Does such an Edge Node proxy all of those connections automatically, or does our software need to be made aware of this convention somehow? Thanks, John From: Rishi Yadav [mailto:ri...@infoobjects.com] Sent: Saturday, June 07, 2014 8:20 AM To: user

hdfs over http error

2014-06-12 Thread John Beaulaurier -X (jbeaulau - ADVANCED NETWORK INFORMATION INC at Cisco)
. Add the following property to hdfs-site.xml: dfs.web.ugi = webgroup. Is a restart of dfs, or mapred, or both necessary after adding the property? Thanks -John

RE: Hadoop SAN Storage reuse

2014-06-12 Thread John Lilley
to disk contention. john From: Natarajan, Prabakaran 1. (NSN - IN/Bangalore) [mailto:prabakaran.1.natara...@nsn.com] Sent: Thursday, June 12, 2014 12:00 AM To: user@hadoop.apache.org Subject: Hadoop SAN Storage reuse Hi I know SAN storage is not recommended for Hadoop. But we don't

Gathering connection information

2014-06-04 Thread John Lilley
nformation more easily, like from a web API (where at least we'd only need one address and port)? john

RE: partition file by content based through HDFS

2014-05-11 Thread John Lilley
files to indicate the key split. But this kind of begs the question “why”? MapReduce has built-in support for data partitioning on the fly in the “mappers” and you don’t really need to do anything. Is that too slow for your needs? john From: Mirko Kämpf [mailto:mirko.kae...@gmail.com] Sent
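The built-in partitioning referred to above is the MapReduce Partitioner; a minimal custom partitioner, with made-up key and value types, looks roughly like this:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class KeyPrefixPartitioner extends Partitioner<Text, Text> {
        @Override
        public int getPartition(Text key, Text value, int numPartitions) {
            // Records with the same key always land in the same reduce partition (output file).
            return (key.toString().hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // In the job driver:
    //   job.setPartitionerClass(KeyPrefixPartitioner.class);
    //   job.setNumReduceTasks(4);   // one output file per partition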

HDFS and YARN security and interface impacts

2014-04-25 Thread John Lilley
Hi, can anyone help me with this? From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Sunday, April 20, 2014 3:40 PM To: user@hadoop.apache.org Subject: HDFS and YARN security and interface impacts We have an application that interfaces directly to HDFS and YARN (no MapReduce). It does

HDFS and YARN security and interface impacts

2014-04-20 Thread John Lilley
ll be MapReduce. For a "native" YARN/HDFS application, what changes if any must be made to the API calls to support Kerberos or other authentication? Does it just happen automatically at the OS level using the authenticated user ID of the process? If there's a good reference I'd appreciate it. john

Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-14 Thread John Meagher
Also, "Source Compatibility" also means ONLY a recompile is needed. No code changes should be needed. On Mon, Apr 14, 2014 at 10:37 AM, John Meagher wrote: > Source Compatibility = you need to recompile and use the new version > as part of the compilation > > Binary Compat

Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-14 Thread John Meagher
Source Compatibility = you need to recompile and use the new version as part of the compilation Binary Compatibility = you can take something compiled against the old version and run it on the new version On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe wrote: > Hello People, > > As per the Apache s

reducing HDFS FS connection timeouts

2014-03-27 Thread John Lilley
t.connect.timeout", "7000"); before calling FileSystem.get() but it doesn't seem to matter. What is the prescribed technique for lowering connection timeout to HDFS? Thanks john
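For completeness, the client-side IPC settings usually tried for this are the two below; the property names are from the standard Hadoop client configuration, though as the snippet notes they did not appear to take effect for the poster, and the chosen values here are examples only:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ConnectTimeoutExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.setInt("ipc.client.connect.timeout", 7000);                 // per-attempt connect timeout, ms
            conf.setInt("ipc.client.connect.max.retries.on.timeouts", 2);    // attempts before the connect fails
            FileSystem fs = FileSystem.get(conf);                            // settings must precede FileSystem.get()
            System.out.println(fs.getUri());
        }
    }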

RE: Hadoop Takes 6GB Memory to run one mapper

2014-03-27 Thread John Lilley
This discussion may also be relevant to your question: http://stackoverflow.com/questions/21005643/container-is-running-beyond-memory-limits Do you actually need to specify that -Xmx6000m for java heap or could it be one of the other issues discussed? John From: John Lilley [mailto:john.lil

RE: Hadoop Takes 6GB Memory to run one mapper

2014-03-27 Thread John Lilley
Could you have a pmem-vs-vmem issue as in: http://stackoverflow.com/questions/8017500/specifying-memory-limits-with-hadoop john From: praveenesh kumar [mailto:praveen...@gmail.com] Sent: Tuesday, March 25, 2014 7:38 AM To: user@hadoop.apache.org Subject: Re: Hadoop Takes 6GB Memory to run one

RE: ipc.Client: Retrying connect to server

2014-03-27 Thread John Lilley
Does "netstat -an | grep LISTEN" show these ports being listened on? Can you stat hdfs from the command line e.g.: hdfs dfsadmin -report hdfs fsck / hdfs dfs -ls / Also, check out /var/log/hadoop or /var/log/hdfs for more details. john From: Mahmood Naderan [mailto:nt_mahm...@yahoo

RE: Getting error message from AM container launch

2014-03-26 Thread John Lilley
Wangda Tan, Thanks for your reply! We did actually figure out where the problem was coming from, but this is a very helpful technique to know. John From: Wangda Tan [mailto:wheele...@gmail.com] Sent: Wednesday, March 26, 2014 6:35 PM To: user@hadoop.apache.org Subject: Re: Getting error

RE: Getting error message from AM container launch

2014-03-26 Thread John Lilley
the total command-line-argument + environment space. Cheers, John From: Azuryy [mailto:azury...@gmail.com] Sent: Wednesday, March 26, 2014 5:13 PM To: user@hadoop.apache.org Subject: Re: Getting error message from AM container launch You used 'nice' in your app? Sent from my iPhone5

RE: Getting error message from AM container launch

2014-03-26 Thread John Lilley
On further examination they appear to be 369 characters long. I've read about similar issues showing when the environment exceeds 132KB, but we aren't putting anything significant in the environment. John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Wednesday, March 26,

RE: Getting error message from AM container launch

2014-03-26 Thread John Lilley
We do have a fairly long container command-line. Not huge, around 200 characters. John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Wednesday, March 26, 2014 4:38 PM To: user@hadoop.apache.org Subject: Getting error message from AM container launch Running a non-MapReduce YARN

Getting error message from AM container launch

2014-03-26 Thread John Lilley
Running a non-MapReduce YARN application, one of the containers launched by the AM is failing with an error message I've never seen. Any ideas? I'm not sure who exactly is running "nice" or why its argument list would be too long. Thanks john Container for appattempt_13957

Re: re-replication after data node failure

2014-03-26 Thread John Meagher
The balancer is not what handles adding extra replicas in the case of a node failure, but it looks like the balancer bandwidth setting is the way to throttle. See: http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201301.mbox/%3c50f870c1.5010...@getjar.com%3E On Wed, Mar 26, 2014 at 10:51

RE: ResourceManager shutting down

2014-03-13 Thread John Lilley
Never mind... we figured out its DNS entry was going missing. john From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Thursday, March 13, 2014 2:52 PM To: user@hadoop.apache.org Subject: ResourceManager shutting down We have this erratic behavior where every so often the RM will shutdown

ResourceManager shutting down

2014-03-13 Thread John Lilley
We have this erratic behavior where every so often the RM will shutdown with an UnknownHostException. The odd thing is, the host it complains about have been in use for days at that point without problem. Any ideas? Thanks, John 2014-03-13 14:38:14,746 INFO rmapp.RMAppImpl

RE: Fetching configuration values from cluster

2014-03-07 Thread John Lilley
, Stanley Shi On Fri, Mar 7, 2014 at 1:46 AM, John Lilley <john.lil...@redpoint.net> wrote: How would I go about fetching configuration values (e.g. yarn-site.xml) from the cluster via the API from an appli

Fetching configuration values from cluster

2014-03-06 Thread John Lilley
How would I go about fetching configuration values (e.g. yarn-site.xml) from the cluster via the API from an application not running on a cluster node? Thanks John
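One way to do this from off-cluster, assuming the daemon web UIs are reachable, is the /conf servlet that Hadoop daemons expose; the ResourceManager host and port below are placeholders:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class FetchClusterConfExample {
        public static void main(String[] args) throws Exception {
            // Dumps the RM's effective configuration (yarn-site.xml plus defaults) as XML.
            URL confUrl = new URL("http://rm.example.com:8088/conf");        // placeholder RM host and web port
            BufferedReader in = new BufferedReader(new InputStreamReader(confUrl.openStream(), "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        }
    }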

Re: Hadoop FileCrush

2014-03-05 Thread John Meagher
I've used it regularly and it works great. It has come up on the lists occasionally, but not that often. A more recent version is available at https://github.com/edwardcapriolo/filecrush On Thu, Feb 27, 2014 at 12:26 PM, Devin Suiter RDX wrote: > Hi, > > Has anyone used Hadoop Filecrush? > >

RE: Need help: fsck FAILs, refuses to clean up corrupt fs

2014-03-04 Thread John Lilley
Ah... found the answer. I had to manually leave safe mode to delete the corrupt files. john From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Tuesday, March 04, 2014 9:33 AM To: user@hadoop.apache.org Subject: RE: Need help: fsck FAILs, refuses to clean up corrupt fs More information
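The manual step referred to above is "hdfs dfsadmin -safemode leave"; a programmatic equivalent would look roughly like the sketch below, where the corrupt-file path is a placeholder and the DistributedFileSystem cast is an assumption about running against HDFS:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.HdfsConstants;

    public class LeaveSafeModeExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            if (fs instanceof DistributedFileSystem) {
                DistributedFileSystem dfs = (DistributedFileSystem) fs;
                dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_LEAVE); // like 'hdfs dfsadmin -safemode leave'
                dfs.delete(new Path("/path/to/corrupt/file"), false);         // placeholder path
            }
        }
    }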

RE: Need help: fsck FAILs, refuses to clean up corrupt fs

2014-03-04 Thread John Lilley
More information from the NameNode log. I don't understand... it is saying that I cannot delete the corrupted file until the NameNode leaves safe mode, but it won't leave safe mode until the file system is no longer corrupt. How do I get there from here? Thanks john 2014-03-04 06

Re: [hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException

2014-03-04 Thread John Pauley
On Mon, Mar 3, 2014 at 11:49 PM, John Pauley <john.pau...@threattrack.com> wrote: This is cross posted to avro-user list (http://mail-archives.apache.org/mod_mbox/avro-user/201402.mbox/%3ccf3612f6.94d2%25john.pau.

Need help: fsck FAILs, refuses to clean up corrupt fs

2014-03-04 Thread John Lilley
I have a file system with some missing/corrupt blocks. However, running hdfs fsck -delete also fails with errors. How do I get around this? Thanks John [hdfs@metallica yarn]$ hdfs fsck -delete /rpdm/tmp/ProjectTemp_461_40/TempFolder_4/data00012_00.dld Connecting to namenode via http

RE: decommissioning a node

2014-03-04 Thread John Lilley
OK, restarting all services now fsck shows under-replication. Was it the NameNode restart? John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Tuesday, March 04, 2014 5:47 AM To: user@hadoop.apache.org Subject: decommissioning a node Our cluster has a node that reboot randomly. So

decommissioning a node

2014-03-04 Thread John Lilley
that this node is really gone, and it should start replicating the missing blocks? Thanks John

[hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException

2014-03-03 Thread John Pauley
mReduceTasks(0); AvroJob.setOutputKeySchema(job, schema); AvroMultipleOutputs.addNamedOutput(job, "avro", AvroKeyOutputFormat.class, schema); FileInputFormat.addInputPath(job, new Path(("/tmp/avrotest/input"))); FileOutputFormat.setOutputPath(job, new Path("/tmp/avrotest/output")); return (job.waitForCompletion(true) ? 0 : 1); } public static void main(String[] args) throws Exception { int exitCode = ToolRunner.run(new AvroContainerFileDriver(), args); System.exit(exitCode); } } Thanks, John Pauley Sr. Software Engineer ThreatTrack Security

ResourceManager crash on deleted NM node back from the dead

2014-03-03 Thread John Lilley
rce with checksum d23ee1d271c6ac5bd27de664146be2 This command was run using /usr/lib/hadoop/hadoop-common-2.2.0.2.0.6.0-76.jar Thanks John

RE: how to remove a dead node?

2014-03-02 Thread John Lilley
Actually it does show in Ambari, but the only time I've seen it is when adding a new host it shows in the "other registered hosts" list. john From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Saturday, March 01, 2014 12:32 PM To: user@hadoop.apache.org Subject: how to rem

how to remove a dead node?

2014-03-01 Thread John Lilley
: 0 (0 B) DFS Used: 0 (0 B) Non DFS Used: 0 (0 B) DFS Remaining: 0 (0 B) DFS Used%: 100.00% DFS Remaining%: 0.00% Last contact: Fri Feb 28 18:20:38 MST 2014 Ambari doesn't show it at all in the hosts. How do I remove it so I can re-add it without conflict? john

RE: large CDR data samples

2014-03-01 Thread John Lilley
Yes I have, and I'm talking to them now about getting a sample file. They may be nice and give me a large file. I was also hoping to find "real" data if possible. Thanks, john From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Saturday, March 01, 2014 8:43 AM To: common-u...@ha

large CDR data samples

2014-03-01 Thread John Lilley
I would like to explore Call Data Record (CDR aka Call Detail Record) analysis, and to that end I'm looking for a large (GB+) CDR file or a program to synthesize a somewhat-realistic sample file. Does anyone know where to find such a thing? Thanks John

RE: very long timeout on failed RM connect

2014-03-01 Thread John Lilley
one full round of retries of the underlying ipc. In each ClientRMProxy retry, the max number of underlying ipc retry is controlled by "ipc.client.connect.max.retries". Did you try setting both ? Jian On Wed, Feb 12, 2014 at 8:36 AM, John Lilley mailto:john.lil.
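Putting the settings mentioned in this exchange together, the client-side sketch below bounds both the ClientRMProxy retry window and the per-attempt IPC retries; the specific values are examples only:

    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class RmConnectRetryExample {
        public static void main(String[] args) {
            YarnConfiguration conf = new YarnConfiguration();
            conf.setLong("yarn.resourcemanager.connect.max-wait.ms", 30000);      // total ClientRMProxy retry window
            conf.setLong("yarn.resourcemanager.connect.retry-interval.ms", 1000); // delay between proxy-level attempts
            conf.setInt("ipc.client.connect.max.retries", 2);                     // IPC retries inside each attempt

            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(conf);                                                // settings must be in place before init
            yarnClient.start();
        }
    }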

RE: very long timeout on failed RM connect

2014-02-12 Thread John Lilley
quot;, "2"); Also does not help. Is there a retry parameter that can be set? Thanks John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Monday, February 10, 2014 12:12 PM To: user@hadoop.apache.org Subject: RE: very long timeout on failed RM connect I tried: conf.set(&q

RE: very long timeout on failed RM connect

2014-02-10 Thread John Lilley
I tried: conf.set("yarn.resourcemanager.connect.max-wait.ms", "1"); conf.set("yarn.resourcemanager.connect.retry-interval.ms", "1000"); But it has no apparent effect. Still hangs for a very long time. john From: Jian He [mailto:j...@hortonworks.com]

very long timeout on failed RM connect

2014-02-10 Thread John Lilley
ategy exists, in order to prevent a highly-loaded cluster from failing jobs. But it is not appropriate for an interactive application. Thanks John

RE: HDFS buffer sizes

2014-02-09 Thread John Lilley
Thanks. Experimentally, I have found that changing the buffer sizes has no effect, so that makes sense. John From: Arpit Agarwal [mailto:aagar...@hortonworks.com] Sent: Tuesday, January 28, 2014 12:35 AM To: user@hadoop.apache.org Subject: Re: HDFS buffer sizes Looks like

RE: HDFS read stats

2014-02-09 Thread John Lilley
Thanks! I would have never found that. john From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Monday, January 27, 2014 4:57 PM To: common-u...@hadoop.apache.org Subject: Re: HDFS read stats FSDataInputStream has this javadoc: /** Utility that wraps a {@link FSInputStream} in a {@link

RE: BlockMissingException reading HDFS file, but the block exists and fsck shows OK

2014-01-27 Thread John Lilley
, multi-process access to the same set of files. 3) Replication=1 having an influence. Any ideas? I am not seeing any errors in the datanode logs. I will run some other tests with replication=3 to see what happens. John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Monday, Januar

RE: HDFS read stats

2014-01-27 Thread John Lilley
Ummm... so if I've called FileSystem.open() with an hdfs:// path, and it returns an FSDataInputStream, how do I get from there to the DFSInputStream that you say has the interface I want? Thanks John From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Sunday, January 26, 2014 6:16 PM To: com
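The resolution this thread is heading toward is roughly the following: on 2.x, the stream returned for an hdfs:// path is an HdfsDataInputStream, which exposes per-stream read statistics; the cast and method names below are from memory and worth checking against your Hadoop version:

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DFSInputStream;
    import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

    public class ReadStatsExample {
        // fs: an HDFS FileSystem obtained via FileSystem.get(conf)
        static void printReadStats(FileSystem fs, Path path) throws Exception {
            FSDataInputStream in = fs.open(path);
            // ... read the stream as usual ...
            if (in instanceof HdfsDataInputStream) {
                DFSInputStream.ReadStatistics stats = ((HdfsDataInputStream) in).getReadStatistics();
                long total = stats.getTotalBytesRead();
                long local = stats.getTotalLocalBytesRead();            // bytes served by a DataNode on this host
                System.out.println("local=" + local + " remote=" + (total - local));
            }
            in.close();
        }
    }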

RE: HDFS open file limit

2014-01-27 Thread John Lilley
What exception would I expect to get if this limit was exceeded? john From: Harsh J [mailto:ha...@cloudera.com] Sent: Monday, January 27, 2014 8:12 AM To: Subject: Re: HDFS open file limit Hi John, There is a concurrent connections limit on the DNs that's set to a default of 4k max par

RE: BlockMissingException reading HDFS file, but the block exists and fsck shows OK

2014-01-27 Thread John Lilley
for any errors? On Jan 27, 2014 8:37 PM, "John Lilley" <john.lil...@redpoint.net> wrote: I am getting this perplexing error. Our YARN application launches tasks that attempt to simultaneously open a large number of files for merge. There seems to be a load threshold in te

BlockMissingException reading HDFS file, but the block exists and fsck shows OK

2014-01-27 Thread John Lilley
However, I would still like to know what limit is being hit, and how to best predict that limit on various cluster configurations. Thanks, john

RE: HDFS read stats

2014-01-26 Thread John Lilley
Ted, Thanks for the link! It says 2.1.0 beta fix, and I can find FileSystem$Statistics class in 2.2.0 but it only seems to talk about read/write ops and bytes, not the local-vs-remote bytes. What am I missing? John From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Sunday, January 26, 2014 10:26 AM

HDFS open file limit

2014-01-26 Thread John Lilley
I have an application that wants to open a large set of files in HDFS simultaneously. Are there hard or practical limits to what can be opened at once by a single process? By the entire cluster in aggregate? Thanks John

HDFS read stats

2014-01-26 Thread John Lilley
Is there a way to monitor the proportion of HDFS read data that is satisfied by local nodes vs going across the network? Thanks John

RE: HDFS data transfer is faster than SCP based transfer?

2014-01-26 Thread John Lilley
'd need to know more about your application. john From: rab ra [mailto:rab...@gmail.com] Sent: Saturday, January 25, 2014 7:29 AM To: user@hadoop.apache.org Subject: RE: HDFS data transfer is faster than SCP based transfer? The input files are provided as argument to a binary being executed b

RE: HDFS data transfer is faster than SCP based transfer?

2014-01-25 Thread John Lilley
There are no short-circuit writes, only reads, AFAIK. Is it necessary to transfer from HDFS to local disk? Can you read from HDFS directly using the FileSystem interface? john From: Shekhar Sharma [mailto:shekhar2...@gmail.com] Sent: Saturday, January 25, 2014 3:44 AM To: user@hadoop.apache.org
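Reading straight from HDFS instead of staging to local disk looks roughly like this; the NameNode URI and input path are placeholders, and the loop body stands in for whatever the consuming binary actually does with the bytes:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DirectHdfsReadExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
            FSDataInputStream in = fs.open(new Path("/data/input/part-00000")); // placeholder path
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) > 0) {
                // hand (buf, n) straight to the consumer instead of staging the file on local disk
            }
            in.close();
        }
    }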
