December 2011 SF Hadoop User Group

2011-11-16 Thread Aaron Kimball
schedule: - 6pm - Welcome - 6:30pm - Introductions; start creating agenda - Breakout sessions begin as soon as we're ready - 8pm - Conclusion Food and refreshments will be provided, courtesy of Splunk. Please RSVP at http://www.meetup.com/hadoopsf/events/41427512/ Regards, - Aaron Kimball

October SF Hadoop Meetup

2011-09-30 Thread Aaron Kimball
up.com/hadoopsf/events/35650052/ Regards, - Aaron Kimball

August 2011 San Francisco Hadoop User Group Meetup

2011-07-20 Thread Aaron Kimball
ssions begin as soon as we're ready - 8pm - Conclusion Food and refreshments will be provided, courtesy of SnapLogic. Please RSVP at http://bit.ly/mZPQYC so we can get an accurate count for food and drink. Hope to see you there! Regards, - Aaron Kimball

Meetup Announcement: July 2011 SF HUG (7/13/2011)

2011-06-15 Thread Aaron Kimball
Breakout sessions begin as soon as we're ready * 8pm - Conclusion Food and refreshments will be provided, courtesy of CBSi. I hope to see you there! Please RSVP at http://bit.ly/kLpLQR so we can get an accurate count for food and beverages. Cheers, - Aaron Kimball

Re: [VOTE] Powered by Logo

2011-06-15 Thread Aaron Kimball
5 4 2 6 3 1 - Aaron On Tue, Jun 14, 2011 at 11:57 PM, Daniel Jue wrote: > My vote: > > 5,4,3,2,6,1 > > IMO, 5 is much more polished than the rest, but I like the whole elephant > in 4. >

Next SF HUG: June 8, at RichRelevance

2011-05-19 Thread Aaron Kimball
ons begin as soon as we're ready - 8pm - Conclusion Food and refreshments will be provided, courtesy of RichRelevance. If you're going to attend, please RSVP at http://bit.ly/kxaJqa. Hope to see you all there! - Aaron Kimball

Re: Defining Hadoop Compatibility -revisiting-

2011-05-11 Thread Aaron Kimball
What does it mean to "implement" those interfaces? I'm +1 for a TCK-based definition. In addition to statically implementing a set of interfaces, each interface also implicitly includes a set of acceptable inputs and predicted outputs (or ranges of outputs) for those inputs. - Aaron On Wed, May 1

April SFHUG recap, May SFHUG meetup announcement

2011-04-18 Thread Aaron Kimball
Conclusion Food and refreshments will be provided, courtesy of Cloudera. Please RSVP at http://bit.ly/hwMCI2 Looking forward to seeing you there! Regards, - Aaron Kimball

Re: problem to write on HDFS

2011-03-14 Thread Aaron Kimball
Alessandro, I think your requirements are outside the operating envelope for HDFS' design. HDFS is not particularly well-suited for interactive operation -- it's designed for batch workloads like those performed by MapReduce. Opening, writing, and closing 100,000 files/second is unlikely to work o

Re: SF Hadoop Meetup - March review and April announcement (April 13)

2011-03-12 Thread Aaron Kimball
oards/ - Aaron On Sat, Mar 12, 2011 at 11:00 AM, Marcos Ortiz Valmaseda wrote: > Regards, Aaron for this excellent meetup. > There are many topics very interesting on this meetup. > Are available all talks slides? > > Thanks a lot, > - Original message ----- > From: "

SF Hadoop Meetup - March review and April announcement (April 13)

2011-03-11 Thread Aaron Kimball
s begin as soon as we're ready * 8pm - Conclusion Regards, - Aaron Kimball

Reminder: SF Hadoop meetup in 1 week

2011-03-02 Thread Aaron Kimball
d volunteer to facilitate a discussion. All members of the Hadoop community are welcome to attend. While all Hadoop-related subjects are on topic, this month's discussion theme is "integration." Regards, - Aaron Kimball

March 2011 San Francisco Hadoop User Meetup ("integration")

2011-02-23 Thread Aaron Kimball
the theme of "integration." Yelp has asked that all attendees RSVP in advance, to comply with their security policy. Please join the meetup group and RSVP at http://www.meetup.com/hadoopsf/events/16678757/ Refreshments will be provided. Regards, - Aaron Kimball

Re: [VOTE] Abandon mrunit MapReduce contrib

2011-02-21 Thread Aaron Kimball
ity weigh in. If that expands to include other testing > projects/etc., we can address that over the Incubation process, and as > needed. > >>> > >>> Eric: as soon as that wiki page is up, I'd be happy to add my name to > it as a mentor and /kick the can

Re: Hadoop testing project [Was: [VOTE] Abandon mrunit MapReduce contrib]

2011-02-17 Thread Aaron Kimball
Working to develop code as a client of Hadoop is a path full of landmines. The more tools we can provide to users to improve the quality of their code, the better. I think it is important, though, to draw a clear distinction between tools intended for different audiences. Talking about system testi

Re: [VOTE] Abandon mrunit MapReduce contrib

2011-02-17 Thread Aaron Kimball
" version of MRUnit would need to compile against multiple versions of Hadoop itself. This is not possible if it is in the same source tree as Hadoop. - Aaron On Thu, Feb 17, 2011 at 5:31 AM, Bernd Fondermann < bernd.fonderm...@googlemail.com> wrote: > On Fri, Feb 11, 2011 at 23:1

SF Hadoop meetup report

2011-02-11 Thread Aaron Kimball
etup announcement! Sign up at http://www.meetup.com/hadoopsf/ Regards, - Aaron Kimball

Re: [VOTE] Abandon mrunit MapReduce contrib

2011-02-11 Thread Aaron Kimball
The main reason I am interested in removing MRUnit from Hadoop is that I believe that MRUnit deserves its own release cycle. I think this is in the best interest of its users. MRUnit is valuable to users of several different versions of Hadoop. But MRUnit has only ever been committed to version 0.

Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere

2011-02-11 Thread Aaron Kimball
Tom, How do these contrib components get released then? If the intent of having the code is to eventually produce release artifacts that people can use, then allowing them to further degrade in releasability seems antithetical to the point of keeping the source around. I think users who download H

Re: [VOTE] Abandon mrunit MapReduce contrib

2011-02-10 Thread Aaron Kimball
+1. Eric Sammer and I will be working on this via github. (Come join us!) - Aaron On Thu, Feb 10, 2011 at 11:08 PM, Nigel Daley wrote: > I think the PMC should abandon the mrunit MapReduce contrib component. The > originator of mrunit and primary maintainer (Aaron Kimball) is movi

Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere

2011-01-31 Thread Aaron Kimball
+1 to this process in general. In particular, tools like MRUnit can benefit from having an independent release due to where they are used in a project's lifecycle. MRUnit should be specified as a test dependency, whereas Hadoop itself is a compile/runtime dependency. As it stands, there isn't an

Re: MRUnit

2011-01-31 Thread Aaron Kimball
+1 - thanks for taking the initiative to clean this up! You should file a JIRA with a patch that removes src/contrib/mrunit and removes it from src/contrib/build.xml. - Aaron On Sun, Jan 30, 2011 at 7:59 PM, Nigel Daley wrote: > +1. I just started a thread on moving all components out of c

February SF Hadoop Meetup -- Feb 9, 2011

2011-01-20 Thread Aaron Kimball
many people to expect. Refreshments will be provided. Regards, - Aaron Kimball

SF Hadoop meetup report

2011-01-14 Thread Aaron Kimball
ring the event. Based on the outcome of this event, we will definitely hold another SF Hadoop meetup in the near future. I will send out another announcement when details on time and location are available. Regards, - Aaron Kimball

Announcing the first San Francisco Hadoop meetup -- Jan 12, 2011

2010-12-01 Thread Aaron Kimball
oopsf -- please sign up for this meetup group to get information about future meetups. Based on the success of the initial event, we hope to hold more of these in the future. Regards, - Aaron Kimball

San Francisco Hadoop meetup

2010-11-04 Thread Aaron Kimball
in joining us, please fill out the following: * I've created a short survey to help understand days / times that would work for the most people: http://bit.ly/ajK26U * Please also join the meetup group at http://meetup.com/hadoopsf -- We'll use this to plan the event, RSVP information, et

Re: Branching and testing strategy for 0.22

2010-08-23 Thread Aaron Kimball
Would it be worthwhile to give branches unique, persistent names? branch-0.22-qa1, branch-0.22-qa2, etc. Then problems in a later incarnation of the QA branch could be regression-tested against the previous one. Your point about automated builds is, however, noted. If this were git, branch-0.22 co

Re: problem starting cdh3b2 jobtracker

2010-08-06 Thread Aaron Kimball
The IOException in MRAsyncDiskService is being logged with severity level WARN; I believe that system operation continues normally despite being unable to clean some of the directories. Are you experiencing problems where a partial mis-match of mapred.local.dir configuration and available disks ca

Re: HEP proposal

2010-07-14 Thread Aaron Kimball
Eli, Great work. I like where this is going. Here's something that I think might be problematic: 3. Copyright/public domain -- Each HEP must either be explicitly labelled as placed in the public domain (see this HEP as an example). "must either be placed in the public domain, or..?" I assume t

Re: Can we modify files in HDFS?

2010-07-05 Thread Aaron Kimball
On Tue, Jun 29, 2010 at 2:57 AM, Steve Loughran wrote: > elton sky wrote: > >> thanx Jeff, >> >> So...it is a significant drawback. >> As a matter of fact, there are many cases we need to modify. >> > > > When people say "Hadoop filesystems are not posix", this is what they mean. > No locks, no r

Re: Displaying Map output in MapReduce

2010-07-05 Thread Aaron Kimball
If you set the number of reduce tasks to zero, the outputs of the mappers will be sent directly to the OutputFormat. You can debug your map phase of a job by disabling reduce and inspecting the mapper outputs, and then re-enable the reducer after you've got the mapping part of the job running corre
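
To illustrate the point above, here is a toy in-memory model in Python. It is only a sketch and not the Hadoop API: `run_job`, `word_mapper`, and `sum_reducer` are hypothetical names made up for this example. It shows that with zero reduce tasks the mapper output is emitted as-is, with no grouping or sorting, which is what makes it easy to inspect. In a real job the corresponding switch is `job.setNumReduceTasks(0)`.

```python
from collections import defaultdict

def word_mapper(line):
    """Hypothetical mapper: emit (word, 1) for each word in a line."""
    for word in line.split():
        yield (word, 1)

def sum_reducer(key, values):
    """Hypothetical reducer: sum the counts for one key."""
    yield (key, sum(values))

def run_job(records, mapper, reducer, num_reduce_tasks):
    """Toy model of a job; NOT the Hadoop API."""
    map_output = [kv for rec in records for kv in mapper(rec)]
    if num_reduce_tasks == 0:
        # Map-only job: mapper output goes straight to the output, ungrouped.
        return map_output
    # Otherwise group values by key (a stand-in for the shuffle) and reduce.
    groups = defaultdict(list)
    for key, value in map_output:
        groups[key].append(value)
    return [kv for key in sorted(groups) for kv in reducer(key, groups[key])]

lines = ["b a", "a"]
map_only = run_job(lines, word_mapper, sum_reducer, num_reduce_tasks=0)
full = run_job(lines, word_mapper, sum_reducer, num_reduce_tasks=1)
print(map_only)
print(full)
```

Once the map-only output looks right, switching `num_reduce_tasks` back to a non-zero value re-enables the reduce step without touching the mapper.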

Re: How many records will be passed to a map function??

2010-06-18 Thread Aaron Kimball
Short answer: FileInputFormat & friends generate splits based on byte ranges. Assuming your records are all equally sized, you'll get half your records in each mapper. If your records have many different sizes represented, then your mileage may vary. - Aaron On Fri, Jun 18, 2010 at 4:27 PM, Eric
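
A small Python sketch of the arithmetic behind that answer. It is a simplification and not Hadoop code: it assumes a record belongs to whichever byte-range split contains its first byte, and the helper names `byte_range_splits` and `records_per_split` are made up for illustration.

```python
def byte_range_splits(total_bytes, num_splits):
    """Cut [0, total_bytes) into contiguous (start, end) byte ranges."""
    size = -(-total_bytes // num_splits)  # ceiling division
    return [(s, min(s + size, total_bytes)) for s in range(0, total_bytes, size)]

def records_per_split(record_sizes, num_splits):
    """Count records per split, assigning each record by its starting offset."""
    splits = byte_range_splits(sum(record_sizes), num_splits)
    counts = [0] * len(splits)
    offset = 0
    for size in record_sizes:
        for i, (start, end) in enumerate(splits):
            if start <= offset < end:
                counts[i] += 1
                break
        offset += size
    return counts

# Ten equally sized records over two splits: half go to each mapper.
equal = records_per_split([100] * 10, 2)
# Eight small records followed by two large ones: the split is by bytes, not by count.
skewed = records_per_split([10] * 8 + [460, 460], 2)
print(equal, skewed)
```

With equal sizes the records divide evenly; with mixed sizes the same byte boundary leaves nine records in the first split and one in the second.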

Re: How to apply RDBMS table updates and deletes into Hadoop

2010-06-08 Thread Aaron Kimball
your help adding this feature :) Send me an email off-list if you're interested. At the very least, I'd urge you to try out the tool. Cheers, - Aaron Kimball On Tue, Jun 8, 2010 at 8:54 PM, atreju wrote: > To generate smart output from base data we need to copy some base tables > from

Re: Mapper Reducer : Unit Test and mocking with static variables

2010-05-27 Thread Aaron Kimball
Varene, You might want to check out MRUnit. It's a unit test harness that contains mock objects for the context & other associated classes, and works with JUnit. It's included in the (unreleased) Hadoop 0.21, as well as Cloudera's Distribution for Hadoop. See http://archive.cloudera.com/docs/mrun

Re: Hadoop Data Sharing

2010-05-11 Thread Aaron Kimball
nsiderably better idea (from both a throughput and a sanity perspective) in a chained MapReduce job. - Aaron On Tue, May 11, 2010 at 10:31 AM, Aaron Kimball wrote: > What objects are you referring to? I'm not sure I understand your question. > - Aaron > > > On Tue, May 11, 201

Re: Hadoop Data Sharing

2010-05-11 Thread Aaron Kimball
he objects? Would you think that is a good idea? > Thanks again. > > Renato M. > > > 2010/5/5 Aaron Kimball > > > Renato, > > > > In general if you need to perform a multi-pass MapReduce workflow, each > > pass > > materializes its output to files. Th

Re: Hadoop Data Sharing

2010-05-05 Thread Aaron Kimball
Renato, In general if you need to perform a multi-pass MapReduce workflow, each pass materializes its output to files. The subsequent pass then reads those same files back in as input. This allows the workflow to start at the last "checkpoint" if it gets interrupted. There is no persistent in-memo
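
The checkpointing idea can be sketched in a few lines of Python. This is a local-filesystem toy, not MapReduce: the file names, the two passes, and the helpers `run_pass` and `run_workflow` are all hypothetical. Each pass writes its result to a file, and a pass whose output file already exists is skipped on a rerun.

```python
import json
import os
import tempfile

def run_pass(name, func, in_path, out_path, log):
    """Run one pass unless its output file already exists (the checkpoint)."""
    if os.path.exists(out_path):
        log.append(f"skip {name}")
        return
    with open(in_path) as f:
        data = json.load(f)
    tmp = out_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(func(data), f)
    os.replace(tmp, out_path)  # publish only fully written output
    log.append(f"run {name}")

def run_workflow(workdir, log):
    """Two chained passes; each reads the previous pass's file."""
    p0 = os.path.join(workdir, "input.json")
    p1 = os.path.join(workdir, "pass1.json")
    p2 = os.path.join(workdir, "pass2.json")
    run_pass("pass1", lambda xs: [x * x for x in xs], p0, p1, log)
    run_pass("pass2", lambda xs: sum(xs), p1, p2, log)
    with open(p2) as f:
        return json.load(f)

workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "input.json"), "w") as f:
    json.dump([1, 2, 3], f)

log = []
first = run_workflow(workdir, log)
os.remove(os.path.join(workdir, "pass2.json"))  # simulate an interrupted final pass
second = run_workflow(workdir, log)
print(first, second, log)
```

On the second run only the final pass is repeated, because the first pass's output is still on disk.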

Re: Different exception handling on corrupt GZip file reading

2010-04-15 Thread Aaron Kimball
If you ever wonder "why doesn't Hadoop do _REASONABLE_THING_X_", the answer is usually one of: * Somebody made a mistake the first time it got written * Nobody needed quite that corner case before * Maybe people thought that was useful, but didn't know how to fix it, or were too lazy to contribute

Re: DBInputFormat number of mappers

2010-04-15 Thread Aaron Kimball
Hi Dan, It's also worth pointing out that DBInputFormat's queries are written in such a way as to make parallelism more likely to hurt than to help. Each mapper submits a query to the database that does a full table scan followed by an ORDER BY clause which has to run database-side. Only after thi
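
A rough illustration of that query pattern, using Python's built-in `sqlite3` rather than Hadoop. The message above is cut off, so the LIMIT/OFFSET paging step is an assumption about how each mapper's slice is carved out; the `employees` table and the `mapper_query` helper are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
# Insert rows in reverse order so the ORDER BY has real work to do.
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(i, f"emp{i}") for i in range(10, 0, -1)])

def mapper_query(conn, num_mappers, mapper_index, total_rows):
    """Query one 'mapper' would issue: sort the whole table, keep one slice."""
    chunk = -(-total_rows // num_mappers)  # ceiling division
    sql = "SELECT id, name FROM employees ORDER BY id LIMIT ? OFFSET ?"
    return conn.execute(sql, (chunk, mapper_index * chunk)).fetchall()

total = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
slices = [mapper_query(conn, 3, i, total) for i in range(3)]
ids = [[row[0] for row in s] for s in slices]
print(ids)
```

Every call to `mapper_query` makes the database order the full table before discarding most of it, so three mappers mean three full sorts for one table's worth of rows.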

Re: Sqoop is moving to github!

2010-03-30 Thread Aaron Kimball
On Tue, Mar 30, 2010 at 7:54 AM, Bernd Fondermann < bernd.fonderm...@googlemail.com> wrote: > On Tue, Mar 30, 2010 at 15:48, Owen O'Malley > wrote: > > On Tue, Mar 30, 2010 at 1:55 AM, Bernd Fondermann < > > bernd.fonderm...@googlemail.com> wrote: > > > >> > >> @Hadoop PMC: What is your statement

Re: Sqoop is moving to github!

2010-03-30 Thread Aaron Kimball
oop MapReduce (or any other ASF) project. Thus far, nobody has invited me to sign an ICLA with my contributor-only status. I have relied on others (primarily Tom White) to actually commit all the Sqoop patches to svn. > On Mon, Mar 29, 2010 at 21:02, Aaron Kimball wrote: > > Hi Hadoop, Hive

Sqoop is moving to github!

2010-03-29 Thread Aaron Kimball
f you have any questions about this move process, please ask me. Regards, - Aaron Kimball Cloudera, Inc.

Re: How to send KV pair to a reduce task on a particular machine?

2010-03-21 Thread Aaron Kimball
Yanfeng, The sort of behavior you want is intentionally omitted from MapReduce's capabilities. Reduce partitions are kept as abstract notions and your MapReduce program cannot bind partitions to particular physical nodes. This is for fault-tolerance purposes. If machine1 crashes, then partition1 c
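
A toy Python sketch of that separation, under stated assumptions: the CRC32-based `partition_for` is a stand-in and not Hadoop's hash function, and `assign_partitions` is a made-up round-robin scheduler. The program only decides which partition a key goes to; which machine runs that partition is decided elsewhere and can change after a failure.

```python
import zlib

def partition_for(key, num_partitions):
    """Deterministic key -> partition mapping (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def assign_partitions(num_partitions, live_nodes):
    """Toy scheduler: place each partition on some live node, round-robin."""
    return {p: live_nodes[p % len(live_nodes)] for p in range(num_partitions)}

keys = ["apple", "banana", "cherry"]
before = {k: partition_for(k, 4) for k in keys}
placement = assign_partitions(4, ["machine1", "machine2"])

# machine1 crashes: key -> partition is untouched, only the placement changes.
after = {k: partition_for(k, 4) for k in keys}
replacement = assign_partitions(4, ["machine2"])
print(before == after, placement, replacement)
```

Because keys never refer to machines, every partition can be rescheduled on a surviving node without changing where any key is routed.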

Re: Release plans

2010-02-19 Thread Aaron Kimball
+1 from me. I agree with Stack; faster release cycles will help keep the project focused and get code testing and soaking in more environments. - Aaron On Fri, Feb 19, 2010 at 1:44 PM, Eli Collins wrote: > On Thu, Feb 18, 2010 at 8:36 PM, Stack wrote: > > On Thu, Feb 18, 2010 at 5:02 PM, Owen