Re: remote job submission

2012-04-21 Thread Harsh J
Hi, A JobClient is something that facilitates validating your job configuration, shipping the necessary files to the cluster, and notifying the JobTracker of the new job. Afterwards, its responsibility is merely to monitor progress via reports from the JobTracker (MR1) / ApplicationMaster (MR2). A client
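The submission flow described above can be sketched in plain Java. This is a schematic illustration only: every class and method name here is invented for the sketch and is NOT the real Hadoop API.

```java
// Schematic sketch of what a job client does at submit time: validate the
// configuration, ship the job's files to DFS, hand the job to the tracker,
// then merely poll for progress. All names here are illustrative.
import java.util.*;

public class JobClientSketch {
    static class JobConf {
        Map<String, String> props = new HashMap<>();
        List<String> filesToShip = List.of("job.jar", "job.xml");
    }

    static class JobTrackerStub {
        private int progress = 0;
        String submit(JobConf conf) { return "job_201204_0001"; }       // hypothetical job id
        int pollProgress(String jobId) { progress = Math.min(100, progress + 50); return progress; }
    }

    static String submitAndMonitor(JobConf conf, JobTrackerStub tracker, List<String> dfs) {
        // 1. Validate the job configuration (illustrative check).
        if (!conf.props.containsKey("output.dir"))
            throw new IllegalStateException("output directory not set");
        // 2. Ship necessities (jar, serialized conf) to DFS.
        dfs.addAll(conf.filesToShip);
        // 3. Notify the tracker of the new job.
        String jobId = tracker.submit(conf);
        // 4. Afterwards, merely monitor progress via the tracker's reports.
        while (tracker.pollProgress(jobId) < 100) { /* wait, report status */ }
        return jobId;
    }

    public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.props.put("output.dir", "/out");
        List<String> dfs = new ArrayList<>();
        String id = submitAndMonitor(conf, new JobTrackerStub(), dfs);
        System.out.println(id + " shipped=" + dfs);
    }
}
```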

Re: Accessing global Counters

2012-04-21 Thread Harsh J
Currently the DistributedCache is populated pre-Job run, hence both Map and Reduce phases carry the same items. With MR2, the approach Robert describes above should work better instead. On Sat, Apr 21, 2012 at 5:21 AM, JAX jayunit...@gmail.com wrote: No, reducers can't access mapper counters.
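The timing problem behind "reducers can't access mapper counters" can be illustrated in plain Java (this is an illustrative model, not the Hadoop counter API): each map task keeps a task-local count, and the framework only folds it into the global view when the task completes, so reliable totals exist only after the job finishes.

```java
// Illustrative sketch: task-local counters are merged into the global
// counter map only at task completion. A reducer running concurrently
// would see at most a partial, stale view of "VALID_RECORDS".
import java.util.*;

public class CounterSketch {
    static Map<String, Long> globalCounters = new HashMap<>();

    static void runMapTask(List<String> records) {
        long localCount = 0;                         // task-local counter
        for (String r : records) if (!r.isEmpty()) localCount++;
        // Merged only when the task reports completion:
        globalCounters.merge("VALID_RECORDS", localCount, Long::sum);
    }

    public static void main(String[] args) {
        // Before any map task completes, a "reducer" sees nothing:
        System.out.println("before: " + globalCounters.getOrDefault("VALID_RECORDS", 0L));
        runMapTask(Arrays.asList("a", "", "b"));
        runMapTask(Arrays.asList("c", "d"));
        // After the job, the client can read the aggregated counter:
        System.out.println("after: " + globalCounters.get("VALID_RECORDS"));
    }
}
```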

Re: Reporter vs context

2012-04-21 Thread Harsh J
Context is what the new MR API offers; it wraps over a Reporter object and provides other helpful functions and data you'd require within a task (it lives up to its name). Reporter was the raw object provided in the old MR API that lets one report progress, set status, etc. In the new API, you
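The wrap-over relationship described above is essentially a decorator. The sketch below models it in plain Java; the interface and class names are illustrative stand-ins, not Hadoop's actual `Reporter`/`Context` classes.

```java
// Schematic sketch: the new-API "context" wraps the old raw "reporter"
// and adds convenience methods (here, emitting output), so everything a
// task needs lives in one object. Names are illustrative, not Hadoop's.
import java.util.*;

public class ContextSketch {
    interface Reporter {                  // old-API style: raw progress/status
        void progress();
        void setStatus(String s);
    }

    static class TaskContext {            // new-API style: wraps a Reporter
        private final Reporter reporter;
        final List<String> output = new ArrayList<>();
        TaskContext(Reporter r) { this.reporter = r; }
        void write(String key, String value) {   // helper the old API lacked
            output.add(key + "\t" + value);
            reporter.progress();                 // delegates to the wrapped Reporter
        }
        void setStatus(String s) { reporter.setStatus(s); }
    }

    public static void main(String[] args) {
        List<String> statusLog = new ArrayList<>();
        Reporter r = new Reporter() {
            public void progress() { }
            public void setStatus(String s) { statusLog.add(s); }
        };
        TaskContext ctx = new TaskContext(r);
        ctx.setStatus("map 50%");
        ctx.write("word", "1");
        System.out.println(ctx.output + " " + statusLog);
    }
}
```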

Re: Feedback on real world production experience with Flume

2012-04-21 Thread alo alt
Hi, in my former job: productive, Germany, Web portal. Throughput 600 MB/minute. Logfiles from Windows IIS and Apache. Used in the usual way, no custom decorators or sinks. Simply syslog - bucketing (1-minute rollover) - HDFS, split into minutes (MMDDHHMM). Stable, some issues (you'll find on
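The per-minute bucketing described above boils down to formatting each event's timestamp into a minute-granularity directory name. A minimal sketch, assuming a `yyyyMMddHHmm` pattern and a `/logs/` prefix (the exact pattern and path in the original setup are not fully shown in the message):

```java
// Sketch of minute-granularity HDFS bucketing: each event is routed to a
// per-minute directory, and a 1-minute rollover closes the current file
// and starts the next bucket. Pattern and path prefix are assumptions.
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class MinuteBuckets {
    static String bucketFor(long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmm");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return "/logs/" + fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // 2012-04-21 12:34:56 UTC falls in the 201204211234 bucket.
        System.out.println(bucketFor(1335011696000L));
    }
}
```

At 600 MB/minute, a 1-minute rollover means each bucket holds on the order of 600 MB before the next file is opened.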

Re: Feedback on real world production experience with Flume

2012-04-21 Thread M. C. Srivas
Karl, since you did ask for alternatives, people using MapR prefer to use the NFS access to directly deposit data (or access it). Works seamlessly from all Linuxes, Solaris, Windows, AIX and a myriad of other legacy systems without having to load any agents on those machines. And it is fully

Re: remote job submission

2012-04-21 Thread JAX
Thanks Harsh J: I have another question, though. You mentioned that: The client needs access to the DataNodes (for actually writing the previous files to DFS for the JobTracker to pick up). What do you mean by previous files? It seems like, if designing Hadoop from scratch, I wouldn't

Re: remote job submission

2012-04-21 Thread Harsh J
By previous files I meant the job-related files there. DataNodes are persistent members in HDFS; removal of a DN results in loss of blocks. Usually replication handles DN failures flawlessly, but consider a 1-replication cluster: DN downtime isn't acceptable in that case.
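The 1-replication point above can be quantified with a small simulation (illustrative only, not how HDFS places blocks): assign each block's replicas to random distinct nodes, kill one node, and count blocks that lost all replicas.

```java
// Sketch: with replication factor 1, every block stored on the dead
// DataNode is unrecoverable; with r >= 2 distinct replica nodes, a
// single-node failure leaves at least one surviving replica per block.
import java.util.*;

public class ReplicationSketch {
    static int blocksLost(int numBlocks, int numNodes, int replication, int deadNode, long seed) {
        Random rnd = new Random(seed);
        int lost = 0;
        for (int b = 0; b < numBlocks; b++) {
            // Pick `replication` distinct nodes for this block's replicas.
            List<Integer> nodes = new ArrayList<>();
            for (int n = 0; n < numNodes; n++) nodes.add(n);
            Collections.shuffle(nodes, rnd);
            Set<Integer> replicas = new HashSet<>(nodes.subList(0, replication));
            replicas.remove(deadNode);          // the failed DataNode
            if (replicas.isEmpty()) lost++;     // no surviving replica
        }
        return lost;
    }

    public static void main(String[] args) {
        System.out.println("r=1 lost: " + blocksLost(1000, 10, 1, 0, 42L));
        System.out.println("r=3 lost: " + blocksLost(1000, 10, 3, 0, 42L));
    }
}
```

With 10 nodes and r=1, roughly a tenth of all blocks sit only on the dead node and are lost; with r=3 the loss from a single failure is zero.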

Re: Feedback on real world production experience with Flume

2012-04-21 Thread alo alt
We decided NO product and vendor advertising on apache mailing lists! I do not understand why you'd put that closed-source stuff from your employer in the room. It has nothing to do with Flume or the use cases! -- Alexander Lorenz http://mapredit.blogspot.com On Apr 21, 2012, at 4:06 PM, M. C.

Re: Feedback on real world production experience with Flume

2012-04-21 Thread Edward Capriolo
It seems pretty relevant. If you can log directly via NFS, that is a viable alternative. On Sat, Apr 21, 2012 at 11:42 AM, alo alt wget.n...@googlemail.com wrote: We decided NO product and vendor advertising on apache mailing lists! I do not understand why you'll put that closed source stuff

oceansync hadoop connection properties

2012-04-21 Thread John Stein
Hello, I am fairly new to Hadoop and I am trying to figure out how to find the full NameNode URI with port and the full JobTracker URI with port for usage with the new oceansync hadoop management software that came out. The software is asking for two configuration properties and I am trying to
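For context on the two values being asked for: in Hadoop 1.x these usually come from `fs.default.name` in `core-site.xml` (a URI like `hdfs://namenode-host:9000`) and `mapred.job.tracker` in `mapred-site.xml` (typically `jobtracker-host:9001`). The host names and ports below are illustrative examples taken from the style of the single-node setup docs, not universal defaults; check your own config files.

```java
// Sketch: pulling host and port out of the two Hadoop 1.x-style values.
// The actual values must be read from your core-site.xml / mapred-site.xml.
import java.net.URI;

public class HadoopUris {
    public static void main(String[] args) {
        // fs.default.name is a full URI with a scheme:
        URI nameNode = URI.create("hdfs://namenode-host:9000");
        System.out.println(nameNode.getHost() + ":" + nameNode.getPort());

        // mapred.job.tracker is typically bare host:port without a scheme;
        // prepend a placeholder scheme so java.net.URI can parse it.
        URI jobTracker = URI.create("any://" + "jobtracker-host:9001");
        System.out.println(jobTracker.getHost() + ":" + jobTracker.getPort());
    }
}
```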

Re: Feedback on real world production experience with Flume

2012-04-21 Thread Chen He
Can the NFS become the bottleneck? Chen On Sat, Apr 21, 2012 at 5:23 PM, Edward Capriolo edlinuxg...@gmail.com wrote: It seems pretty relevant. If you can directly log via NFS that is a viable alternative. On Sat, Apr 21, 2012 at 11:42 AM, alo alt wget.n...@googlemail.com wrote: We

Re: oceansync hadoop connection properties

2012-04-21 Thread Jagat
Hi, can you tell us how you started Hadoop? Those are the locations where the Hadoop NameNode is running. http://hadoop.apache.org/common/docs/current/single_node_setup.html If you read the link above, there is detailed info about them and the Hadoop install. If you are new to Hadoop then you should not

Re: Feedback on real world production experience with Flume

2012-04-21 Thread Alexander Lorenz
No. That is the Flume open source mailing list, not a vendor list. NFS logging has nothing to do with decentralized collectors like Flume, JMS or Scribe. sent via my mobile device On Apr 22, 2012, at 12:23 AM, Edward Capriolo edlinuxg...@gmail.com wrote: It seems pretty relevant. If you can