Re: Bit of help debugging a TIMED OUT session please

2010-02-22 Thread Ted Dunning
Not sure this helps at all, but these times are remarkably asymmetrical. I would expect members of a ZK cluster to have very comparable times. Additionally, 345 ms is nowhere near large enough to cause a session to expire. My take is that ZK doesn't think it caused the timeout. On Mon, Feb

Re: ZooKeeper packages for Ubuntu

2010-02-16 Thread Ted Dunning
/+archive/ppa https://launchpad.net/%7Ettx/+archive/ppa This is a Personal Package Archive at the moment, but these packages may end up being promoted depending on how relevant they are. Please let me know if these work or do not work for you. -- Gustavo Niemeyer http://niemeyer.net -- Ted

Re: ephemeral node after server bounce

2010-02-04 Thread Ted Dunning
On Thu, Feb 4, 2010 at 2:20 PM, Yonik Seeley yo...@lucidimagination.com wrote: There's no way to hand over responsibility for an ephemeral znode, right? Right. We have solr nodes create ephemeral znodes (name based on host and port). The ephemeral znode takes some time to remove of course,

Re: ephemeral node after server bounce

2010-02-04 Thread Ted Dunning
, Patrick Hunt ph...@apache.org wrote: Ah, excellent idea [jvm shutdownhooks], won't always work but may help. I think in this case (ephemerals) all Yonik would need to do is close the session. That will remove all ephemerals. -- Ted Dunning, CTO DeepDyve

Re: question regarding connectionloss

2010-02-02 Thread Ted Dunning
: For example Hardware misconfiguration - NIC caused one system to basically work, but with huge numbers of connection loss, esp whenever there was load (and I've seen this particular issue twice now). -- Ted Dunning, CTO DeepDyve

Re: question regarding connectionloss

2010-02-01 Thread Ted Dunning
!? Thanks for any help. Cheers, Michael -- Michael Bauland michael.baul...@knipp.de bauland.tel -- Ted Dunning, CTO DeepDyve

Re: Q about ZK internal: how commit is being remembered

2010-01-28 Thread Ted Dunning
according to Zab's FIFO nature...just want to hear some clarification about it. Thanks a lot! -- With Regards! Ye, Qian Made in Zhejiang University -- Ted Dunning

Re: Server exception when closing session

2010-01-25 Thread Ted Dunning
-- Ted Dunning, CTO DeepDyve

Re: Using zookeeper to assign a bunch of long-running tasks to nodes (without unhandled tasks and double-handled tasks)

2010-01-23 Thread Ted Dunning
processing the corresponding task (if something goes wrong, just kill itself and the node will be gone) if not, we go back to wait for watcher. Will this work? -- Ted Dunning, CTO DeepDyve

Re: Can zookeeper achive the IBM TSA function?

2010-01-20 Thread Ted Dunning
Take a look here at the recipes: http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html On Wed, Jan 20, 2010 at 12:15 AM, xeoshow xeos...@gmail.com wrote: Ted, thank you very much for your reply. I think A will exit and so ZK can help .. Not sure if any further link can help on how to

Re: Can zookeeper achive the IBM TSA function?

2010-01-19 Thread Ted Dunning
, but with a database, I would wonder if there are others. On Tue, Jan 19, 2010 at 11:30 PM, xeoshow xeos...@gmail.com wrote: I am wondering can this monitor part be replaced by zookeeper, using zookeeper watch or something else? -- Ted Dunning, CTO DeepDyve

Re: Share Zookeeper instance and Connection Limits

2009-12-18 Thread Ted Dunning
only an idea. The world is changing to SSDs too! -- Ted Dunning, CTO DeepDyve

Re: Share Zookeeper instance and Connection Limits

2009-12-16 Thread Ted Dunning
). Well, does disk IO or the network limit the throughput first? Thanks for your quick response. I'm studying ZooKeeper in my master's thesis, for coordinating distributed index structures. -- Ted Dunning, CTO DeepDyve

Re: SLF4J for logging

2009-12-04 Thread Ted Dunning
? Solr now uses it, as does Avro I believe, and other parts of Hadoop. -Yonik http://www.lucidimagination.com -- Ted Dunning, CTO DeepDyve

Re: Observers!

2009-11-18 Thread Ted Dunning
13:06:39 -0600 (Wed, 18 Nov 2009) | 1 line ZOOKEEPER-368. Observers: core functionality (henry robinson via mahadev) Sweet! Congratulations, and thanks Henry. -- Gustavo Niemeyer http://niemeyer.net -- Ted Dunning, CTO DeepDyve

Re: ZK on EC2

2009-11-10 Thread Ted Dunning
to get double what I got for incoming transfer. On Mon, Nov 9, 2009 at 9:47 PM, Patrick Hunt ph...@apache.org wrote: Could you test networking - scping data between hosts? (I was seeing 64.1MB/s for a 512mb file - the one created by dd, random data) -- Ted Dunning, CTO DeepDyve

Re: ZK on EC2

2009-11-10 Thread Ted Dunning
on the wiki for others interested in running in EC2. -- Ted Dunning, CTO DeepDyve

Re: ZK on EC2

2009-11-10 Thread Ted Dunning
collector? Patrick Ted Dunning wrote: The server side is a fairly standard (but old) config: tickTime=2000 dataDir=/home/zookeeper/ clientPort=2181 initLimit=5 syncLimit=2 Most of our clients now use 5 seconds as the timeout, but I think that we went to longer timeouts in the past. Without
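
The quoted settings translate into concrete time limits. A minimal Python sketch of that arithmetic, assuming ZooKeeper's documented rules that initLimit and syncLimit are counted in ticks and that, by default, the server bounds a negotiated session timeout to between 2 and 20 ticks. The helper name `zk_limits` is made up for illustration:

```python
def zk_limits(tick_time_ms, init_limit, sync_limit):
    """Convert tick-based ZooKeeper settings into milliseconds."""
    return {
        # Time a follower gets to connect and sync with the leader at startup.
        "init_ms": init_limit * tick_time_ms,
        # How far a follower may lag behind the leader before it is dropped.
        "sync_ms": sync_limit * tick_time_ms,
        # Default bounds applied to the session timeout a client requests.
        "min_session_ms": 2 * tick_time_ms,
        "max_session_ms": 20 * tick_time_ms,
    }


# Values from the quoted config: tickTime=2000, initLimit=5, syncLimit=2.
limits = zk_limits(tick_time_ms=2000, init_limit=5, sync_limit=2)

# The 5-second client timeout mentioned in the message.
client_timeout_ms = 5000
within_bounds = (
    limits["min_session_ms"] <= client_timeout_ms <= limits["max_session_ms"]
)
```

Under these assumptions a 5-second client timeout sits only one second above the 4-second floor, which may be why longer timeouts were tried on EC2.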

Re: ZK on EC2

2009-11-09 Thread Ted Dunning
the experience there? Are there more timeouts, lead re-election, etc? Thanks, Jun IBM Almaden Research Center K55/B1, 650 Harry Road, San Jose, CA 95120-6099 jun...@almaden.ibm.com -- Ted Dunning, CTO DeepDyve

Re: ZK on EC2

2009-11-09 Thread Ted Dunning
...@almaden.ibm.com Ted Dunning ted.dunn...@gmail.com wrote on 11/09/2009 04:24:16 PM: Worked pretty well for me. We did extend all of our timeouts. The biggest

Re: ZK on EC2

2009-11-09 Thread Ted Dunning
in the wiki page on say, EC2 small/large nodes? I'd do it myself but I've not used ec2. If anyone could try these and report I'd appreciate it. Patrick Ted Dunning wrote: Worked pretty well for me. We did extend all of our timeouts. The biggest worry for us was timeouts on the client side

Re: API for node entry to the cluster.

2009-11-05 Thread Ted Dunning
in the future to do this? TIA A -- Ted Dunning, CTO DeepDyve

Re: API for node entry to the cluster.

2009-11-05 Thread Ted Dunning
not restarting. Start/Stop the new/old process and then start a round of consensus for adding/removing a machine. I guess if one can do that then there is stopping of process required. Am I missing something here? A On Thu, Nov 5, 2009 at 11:14 AM, Ted Dunning ted.dunn...@gmail.com wrote

Re: zookeeper viewer

2009-10-25 Thread Ted Dunning
/24/09 4:18 PM, Hamoun gh hamoun...@gmail.com wrote: I am looking for the zookeeper viewer. seems the link is broken. can somebody please help? Thank you, Hamoun Ghanbari -- Ted Dunning, CTO DeepDyve

Re: Cluster Configuration Issues

2009-10-22 Thread Ted Dunning
, I know it makes more sense to run an odd number of zookeeper nodes but I just want to make sure it works first). Any suggestions? -- Ted Dunning, CTO DeepDyve
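
The remark about odd ensemble sizes can be checked with a little arithmetic. A rough Python sketch, assuming the usual majority-quorum rule (more than half of the servers must be up); the function names are illustrative only:

```python
def quorum_size(n):
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many servers can fail while a majority is still running."""
    return n - quorum_size(n)


# Map ensemble size -> (quorum size, tolerated failures).
table = {n: (quorum_size(n), tolerated_failures(n)) for n in range(1, 8)}
```

Under that rule a 4-server ensemble tolerates one failure, the same as a 3-server ensemble, so the even-sized setup should still work but buys no extra fault tolerance.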

Re: Restarting a single zookeeper Server on the same port within the process

2009-10-22 Thread Ted Dunning
a delay and restarting it on the same port. But the server doesn't startup. When I re-start on a different port, it starts up correctly. Can you let me know how I can make this one work. Thank you. Regards, Siddharth -- Ted Dunning, CTO DeepDyve

Re: how to transfer the parameter from a reduce to another iteration of mapred

2009-10-11 Thread Ted Dunning
the average. I've attached my code here. Can someone shed some light on this? Thanks very much! Congcong -- Ted Dunning, CTO DeepDyve

Re: Parallel data stream processing

2009-10-10 Thread Ted Dunning
On Fri, Oct 9, 2009 at 11:02 PM, Ricky Ho r...@adobe.com wrote: ... To my understanding, within a Map/Reduce cycle, the input data set is frozen (no change is allowed) while the output data set is created from scratch (doesn't exist before). Therefore, the map/reduce model is inherently

Re: Hey Cloudera can you help us In beating Google Yahoo Facebook?

2009-10-02 Thread Ted Dunning
that above mentioned giants use Hadoop via Cloudera? Yahoo sponsored most of the writing of Hadoop and does not use Cloudera's distribution. Facebook sponsored the writing of Hive and probably still runs their own version of Hadoop. Why do you care if they use Cloudera's distribution? -- Ted

Re: feedback zkclient

2009-10-01 Thread Ted Dunning
) somewhere that totally ignores that this would reset the interrupt flag, if e is an InterruptedException. Therefore we better avoid having all of the methods throwing that exception. -- Ted Dunning, CTO DeepDyve

Re: feedback zkclient

2009-10-01 Thread Ted Dunning
is back and check if the znode is there. There is no way of knowing whether it was us who created the node or somebody else, right? -- Ted Dunning, CTO DeepDyve

Re: feedback zkclient

2009-10-01 Thread Ted Dunning
sessionid. As you say, it's highly implementation dependent. It's also something we recognize is a problem for users, we've slated it for 3.3.0 http://issues.apache.org/jira/browse/ZOOKEEPER-22 -- Ted Dunning, CTO DeepDyve

Re: How do we find the Server the client is connected to?

2009-10-01 Thread Ted Dunning
but that is not exposed. Rob Baccus 425-201-3812 -- Ted Dunning, CTO DeepDyve

Re: Advice on new Datacenter Hadoop Cluster?

2009-09-30 Thread Ted Dunning
://di2.deri.ie http://webstar.deri.ie http://sindice.com -- Ted Dunning, CTO DeepDyve

Re: Advice on new Datacenter Hadoop Cluster?

2009-09-30 Thread Ted Dunning
3460 Xeon $350 Samsung 7200 RPM SATA2 2x$85 2GB Non-ECC DIMM 4x$65 This totals $1052. Doesn't this seem like a reasonable setup? Isn't the purpose of a hadoop cluster to build cheap,fast, replaceable nodes? On Wed, Sep 30, 2009 at 9:06 PM, Ted Dunning

Re: Running Hadoop on cluster with NFS booted systems

2009-09-28 Thread Ted Dunning
I think that the last time you asked this question, the suggestion was to look at DNS and make sure that everything is exactly correct in the net-boot configuration. Hadoop is very sensitive to network routing and naming details. So, a) in your net-boot, how are IP addresses assigned? b) how

Re: Running Hadoop on cluster with NFS booted systems

2009-09-28 Thread Ted Dunning
. I have also confirmed that the IB IP addresses match our DNS. -- Ted Dunning, CTO DeepDyve

Re: The idea behind 'myid'

2009-09-25 Thread Ted Dunning
there is a good reason for using this approach, but it is the first time I have come over this type of non-automatic way for administrating replicas. Regards, Orjan -- Ted Dunning, CTO DeepDyve

Re: hadoop fsck through proxy...

2009-09-24 Thread Ted Dunning
through a proxy with the hadoop command line. -- Ted Dunning, CTO DeepDyve

Re: Start problem of Running Replicated ZooKeeper

2009-09-23 Thread Ted Dunning
in mailing list archives, but got nothing helpful. I need your help, thanks and best regards! -- Ted Dunning, CTO DeepDyve

Re: Start problem of Running Replicated ZooKeeper

2009-09-23 Thread Ted Dunning
Good points. On the other hand, it could still be firewall issues. On Wed, Sep 23, 2009 at 8:30 AM, Benjamin Reed br...@yahoo-inc.com wrote: The connection refused message, as opposed to no route to host or unknown host, indicates that zookeeper has not been started on the other machines. are

Re: Hadoop On Cluster

2009-09-23 Thread Ted Dunning
resources and computing? -- Ted Dunning, CTO DeepDyve

Re: Hadoop On Cluster

2009-09-23 Thread Ted Dunning
To amplify this point, don't try reading from MySQL with a whole bunch of map tasks either. It is very impressive how quickly Hadoop can take down a database. On Wed, Sep 23, 2009 at 12:51 PM, Jeff Hammerbacher ham...@cloudera.com wrote: NFS mounts can be quite flaky at scale -- Ted

Re: Re: Processing a large quantity of smaller XML files?

2009-09-17 Thread Ted Dunning
? If you really mean changing existing files, HBase might be good for you - We have to change existing files...and add some new ones as well. So HAR won't really cut it for us. -- Ted Dunning, CTO DeepDyve

Re: HadoopDB and similar stuff

2009-09-15 Thread Ted Dunning
, Jeff Hammerbacher ham...@cloudera.com wrote: Do you want to tightly integrate SQL and map-reduce? Asterdata has a product that might help you. As does Greenplum. You could also get this functionality from Pig or Hive, which are Apache 2.0-licensed subprojects of Hadoop. -- Ted Dunning

Re: HadoopDB and similar stuff

2009-09-15 Thread Ted Dunning
On Tue, Sep 15, 2009 at 7:28 AM, Jeff Hammerbacher ham...@cloudera.com wrote: ... I would like to correct any misperceptions which may exist in the community. 1) HiveQL intends to include SQL as a subset of its syntax: see the VLDB paper for more (

Re: HadoopDB and similar stuff

2009-09-15 Thread Ted Dunning
there is a work around. The flip side is true as well, Hive has specific support that other databases don't :) -- Ted Dunning, CTO DeepDyve

Re: HadoopDB and similar stuff

2009-09-15 Thread Ted Dunning
Great description. On Tue, Sep 15, 2009 at 12:01 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I often describe Hive query language in this way: If you know SQL I can teach you Hive-QL rather quickly. -- Ted Dunning, CTO DeepDyve

Re: computing pairwise document similarity

2009-09-10 Thread Ted Dunning
this with the pig commands. Any ideas? Thanks, Tommy -- Ted Dunning, CTO DeepDyve

Re: Decommissioning Individual Disks

2009-09-10 Thread Ted Dunning
replacing one disk at a time, we wouldn't worry about it (because of redundancy). We can decommission the servers, but moving all the data off of all their disks is a waste. What's the best way to handle this? Thanks! David -- Ted Dunning, CTO DeepDyve

Re: computing pairwise document similarity

2009-09-09 Thread Ted Dunning
Post in haste, repent at leisure. The thrust of my comment is correct. The details are not. The cost of the correct solution is more like sum_w (DF(w)^2) where you sum over all words. If you use a stop list, you eliminate all words with large DF. On Wed, Sep 9, 2009 at 2:43 PM, Ted Dunning
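
The corrected cost estimate can be tried on a toy corpus. A minimal Python sketch, assuming DF(w) means the number of documents that contain word w; the four example documents and the helper names are made up for illustration:

```python
from collections import Counter


def document_frequencies(docs):
    """Count, for each word, how many documents contain it."""
    df = Counter()
    for doc in docs:
        # set() so a word repeated inside one document counts once.
        df.update(set(doc.split()))
    return df


def pairwise_cost(df, stop_words=frozenset()):
    """Estimate the pairwise-comparison cost as sum over words of DF(w)^2."""
    return sum(
        count ** 2 for word, count in df.items() if word not in stop_words
    )


docs = ["the cat sat", "the dog sat", "the bird flew", "the cat flew"]
df = document_frequencies(docs)

full_cost = pairwise_cost(df)                          # every word included
reduced_cost = pairwise_cost(df, stop_words={"the"})   # stop list applied
```

In this toy case the single word that appears in every document accounts for more than half of the total, which is the effect the stop list is meant to remove.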

Re: computing pairwise document similarity

2009-09-09 Thread Ted Dunning
, 2009 at 3:16 PM, Paolo D'alberto pdalb...@yahoo-inc.com wrote: Interesting, how is matrix multiply used for the 1-1 comparison? I knew that you can use matrix multiply for the all-pairs shortest path (N^3), but all 1-1 comparisons should be N^2 ... would you mind sharing? -- Ted Dunning

Re: Pregel

2009-09-04 Thread Ted Dunning
, 2009 at 9:57 AM, Edward J. Yoon edwardy...@apache.org wrote: We've already made a prototype of Hamburg based on multi thread. It's a BSP based graph computing framework, not a M/R based application. Please Join to ... http://groups.google.com/group/hamburg-dev -- Ted Dunning, CTO DeepDyve

Re: Pregel

2009-09-03 Thread Ted Dunning
You would be entirely welcome in Mahout. Graph based algorithms are key for lots of kinds of interesting learning and would be a fabulous thing to have in a comprehensive substrate. I personally would also be very interested in learning more about what sorts of things Pregel is doing. It

Re: zookeeper on ec2

2009-09-01 Thread Ted Dunning
) It's been running for about 48 hours. On Tue, Sep 1, 2009 at 5:12 PM, Ted Dunning ted.dunn...@gmail.com wrote: Do you have long GC delays? On Tue, Sep 1, 2009 at 4:51 PM, Satish Bhatti cthd2...@gmail.com wrote: Session timeout is 30 seconds. On Tue, Sep 1, 2009 at 4

Re: Watches

2009-08-29 Thread Ted Dunning
in receiving notifications. Cheers Avinash -- Ted Dunning, CTO DeepDyve

Re: NN memory consumption on 0.20/0.21 with compressed pointers/

2009-08-25 Thread Ted Dunning
had a recent article in which the author measured (informally) the throughput available for Disk, SSD and main memory with sequential and random access patterns. Sequential disk was slightly faster than random access to main memory. -- Ted Dunning, CTO DeepDyve

Re: Intra-datanode balancing?

2009-08-25 Thread Ted Dunning
It used to matter quite a lot. On Tue, Aug 25, 2009 at 1:25 PM, Kris Jirapinyo kris.jirapi...@biz360.com wrote: The order matters?

Re: How to deal with too many fetch failures?

2009-08-20 Thread Ted Dunning
://markmail.org/message/lgafou6d434n2dvx On Wed, Aug 19, 2009 at 10:39 PM, yang song hadoop.ini...@gmail.com wrote: Thank you Ted. Update current cluster is a huge work, we don't want to do so. Could you tell me how 0.19.1 causes certain failures in detail? Thanks again. 2009/8/20 Ted Dunning

Re: File Chunk to Map Thread Association

2009-08-20 Thread Ted Dunning
. Increasing the DFS blocksize for the input files is another means to achieve the same effect. -- Ted Dunning, CTO DeepDyve
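
The truncated message does not show which effect is meant. Assuming it is fewer, larger map inputs, and assuming roughly one map task per HDFS block for splittable input, the relationship can be sketched in Python as follows. The 10 GB file size and the function name are hypothetical:

```python
def map_task_count(file_size_bytes, block_size_bytes):
    """Number of block-sized splits, rounding the last partial block up."""
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


MB = 1024 * 1024
file_size = 10 * 1024 * MB  # a hypothetical 10 GB input file

tasks_64mb = map_task_count(file_size, 64 * MB)    # smaller blocks
tasks_256mb = map_task_count(file_size, 256 * MB)  # larger blocks
```

Quadrupling the block size cuts the number of splits to a quarter in this sketch, so each map task would read four times as much data.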

Re: Why the jobs are suspended when I add new nodes?

2009-08-17 Thread Ted Dunning
Have you looked at the logs? On Sun, Aug 16, 2009 at 11:36 PM, yang song hadoop.ini...@gmail.com wrote: Hi, all When I add another 50 nodes into the current cluster(200 nodes) at the same time, the jobs run very smoothly at first. However, after a while, all the jobs are suspended and

Re: zkclient now has a mailing list

2009-08-13 Thread Ted Dunning
That would be a great way to get really good feedback. On Thu, Aug 13, 2009 at 4:13 PM, Stefan Groschupf s...@101tec.com wrote: If we have something clean and stable running we might contribute it back to the apache zk project. -- Ted Dunning, CTO DeepDyve

Re: What will we encounter if we add a lot of nodes into the current cluster?

2009-08-12 Thread Ted Dunning
at the same time. Could you give me some tips and notes? During the process, which part shall we pay much attention on? Thank you! P.S. Our environment is hadoop-0.19.1, jdk1.6.0_06, linux redhat enterprise 4.0 -- Ted Dunning, CTO DeepDyve

Re: What will we encounter if we add a lot of nodes into the current cluster?

2009-08-12 Thread Ted Dunning
tool(bin/hadoop balancer -t xxx). However, the data transfer is so slow that it will take a long long time. Is there a good method to solve it? -- Ted Dunning, CTO DeepDyve

Re: How to break a hadoop-cluster in subclusters (how to group physical nodes)?

2009-08-09 Thread Ted Dunning
On Sun, Aug 9, 2009 at 8:17 AM, Harold Valdivia Garcia harold.valdi...@upr.edu wrote: Ok, you mean that I could set up an instance of HDFS, then install multiple clusters of tasktrackers on the same HDFS? I think so. In this configuration, as you say, I'd lose data locality because map-task

Re: How to break a hadoop-cluster in subclusters (how to group physical nodes)?

2009-08-08 Thread Ted Dunning
lower cluster utilization combined with configuration headaches. On Sat, Aug 8, 2009 at 7:28 PM, Harold Valdivia Garcia harold.valdi...@upr.edu wrote: for example I'd like to have a region for only sorting, other for only joins, other for only groupby -- Ted Dunning, CTO DeepDyve

Re: How to redistribute files on HDFS after adding new machines to cluster?

2009-08-07 Thread Ted Dunning
to one local machine and then uploading it back on HDFS? ~ Prashant, SIEL, IIIT-Hyderabad. -- Ted Dunning, CTO DeepDyve

Re: How to redistribute files on HDFS after adding new machines to cluster?

2009-08-07 Thread Ted Dunning
together as one will slow down the other. -- Ted Dunning, CTO DeepDyve

Re: Counting no. of keys.

2009-08-04 Thread Ted Dunning
prashullega...@gmail.com wrote: Hi, I've say 800 sequence files written using SequenceFileOutputFormat. Is there any way to know no. of unique keys in those sequence files? Thanks, Prashant. -- Ted Dunning, CTO DeepDyve -- Zhong Wang -- Ted Dunning

Re: Counting no. of keys.

2009-08-02 Thread Ted Dunning
wrote: Hi, I've say 800 sequence files written using SequenceFileOutputFormat. Is there any way to know no. of unique keys in those sequence files? Thanks, Prashant. -- Ted Dunning, CTO DeepDyve

Re: Map performance with custom binary format

2009-07-28 Thread Ted Dunning
), and actually saw a decrease in performance (~90MB/s). Any help is appreciated. Thanks! Will some hadoop-site.xml values: dfs.replication 3 io.file.buffer.size 65536 dfs.datanode.handler.count 3 mapred.tasktracker.map.tasks.maximum 6 dfs.namenode.handler.count 5 -- Ted

Re: Map performance with custom binary format

2009-07-28 Thread Ted Dunning
expect good performance. Is it possible that the 50MB/s on a single node was not a real number? It seems somewhat high but probably reasonable with modern hardware. Was the file already in memory? -- Ted Dunning, CTO DeepDyve

Re: Map performance with custom binary format

2009-07-28 Thread Ted Dunning
(isSplittable() is false). I don't think that would count for such a large discrepancy in expected performance, would it? -- Ted Dunning, CTO DeepDyve

Re: Zookeeper WAN Configuration

2009-07-26 Thread Ted Dunning
the performance of the ensemble, provided large blobs of traffic were not being sent across the network. -- Ted Dunning, CTO DeepDyve

Re: Zookeeper WAN Configuration

2009-07-24 Thread Ted Dunning
to...@audiencescience.com wrote: Ted, could you elaborate a bit more on this? I was under the (mis) impression that each ZK server in an ensemble only needed connectivity to another member in the ensemble, not to each member in the ensemble. It sounds like you are saying the latter is true. -- Ted Dunning, CTO

Re: Remote access to cluster using user as hadoop

2009-07-23 Thread Ted Dunning
Another best practice is to have a sandbox cluster separated from production. On Thu, Jul 23, 2009 at 9:17 AM, Aaron Kimball aa...@cloudera.com wrote: The current best practice is to firewall off your cluster, configure a SOCKS proxy/gateway, and only allow traffic to the cluster from the

Re: Remote access to cluster using user as hadoop

2009-07-23 Thread Ted Dunning
it to downgrade superusers, and it doesn't have to be too clean or work for every edge case. it's more to stop accidental problems. -- Ted Dunning, CTO DeepDyve

Re: drbl for hadoop

2009-07-21 Thread Ted Dunning
: I would imagine os mgmt would be easier and you could use all the disk space in the machines for data, any ideas on this? -- Ted Dunning, CTO DeepDyve

Re: Queue code

2009-07-17 Thread Ted Dunning
for extremely large queues of pending tasks. On Fri, Jul 17, 2009 at 1:20 PM, Mahadev Konar maha...@yahoo-inc.com wrote: Also, are there any performance numbers for ZooKeeper-based queues? How does it compare with JMS? -- Ted Dunning, CTO DeepDyve

Re: map side Vs. Reduce side join

2009-07-17 Thread Ted Dunning
. On Thu, Jul 16, 2009 at 11:01 PM, jason hadoop jason.had...@gmail.com wrote: I seem to be one of the mapside join champions. For jobs that fit onto that pattern there is usually a 100x speed improvement, compared to doing reduce side joins, for real (large) datasets. -- Ted Dunning, CTO DeepDyve

Re: Data-local map tasks lower than Launched map tasks even with full replication

2009-07-17 Thread Ted Dunning
Does [hadoop fsck /] show any under-replicated files/blocks? You may not have waited long enough after increasing the target replication rate. Another thing to watch out for in a production node is the distribution of node blocks. You should be careful to load data from outside the cluster to

Re: Hardware Manufacturer

2009-07-15 Thread Ted Dunning
It is very unusual to have enough power to fill a rack with servers. Check your power and heat loading calculations. You might also consider a new box that Silicon Mechanics has. It is essentially four 1U servers in a 2U package. Each of the four servers has dual quad core machines and up to

Re: Looking for counterpart of Configure Method

2009-07-14 Thread Ted Dunning
the Hadoop core-user mailing list archive at Nabble.com. -- Ted Dunning, CTO DeepDyve 111 West Evelyn Ave. Ste. 202 Sunnyvale, CA 94086 http://www.deepdyve.com 858-414-0013 (m) 408-773-0220 (fax)

Re: How to implement Counters in Pig ?

2009-07-13 Thread Ted Dunning
This doesn't sound like support for the standard hadoop counters. Those counters are *very* valuable for job monitoring. Logs serve a very different purpose. On Mon, Jul 13, 2009 at 11:44 AM, Alan Gates ga...@yahoo-inc.com wrote: For 3) the logs seems sufficient to me, but if people want them

Re: How to implement Counters in Pig ?

2009-07-13 Thread Ted Dunning
I am saying that Hadoop counters as displayed in the standard Hadoop web interface are really useful to have. As such, it is really nice to be able to increment those counters directly without having to write a UDF just for that purpose. Counters in logs are a completely orthogonal issue.

Re: Disk configuration.

2009-07-13 Thread Ted Dunning
be a dfs directory and another for task temp storage. Hadoop will round-robin writes to these automatically. dfs.data.dir might look something like: <property> <name>dfs.data.dir</name> <value>/hadoop1/dfs/data,/hadoop2/dfs/data</value> </property> -- Ted Dunning, CTO DeepDyve 111 West Evelyn Ave. Ste

Re: Limit the number of open files in MultipleTextOutputFormat

2009-07-10 Thread Ted Dunning
On Fri, Jul 10, 2009 at 1:16 AM, Marcus Herou marcus.he...@tailsweep.com wrote: However I am sure that we have more keys than that in our production data so I guess hadoop will throw the Too many open files exception then. Generally having lots of small files is very bad for performance. It

Re: Lucene index creation using Hadoop

2009-07-09 Thread Ted Dunning
You don't mention what size cluster you have, but we use a relatively small cluster and index hundreds of GB in an hour to a few hours (depending on the content and the size of the cluster). So your results are anomalous. However, we wrote our own indexer. The way it works is that documents are

Re: How to make data available in 10 minutes.

2009-07-09 Thread Ted Dunning
You are basically re-inventing lots of capabilities that others have solved before. The idea of building an index that refers to files which are constructed by progressive merging is very standard and very similar to the way that Lucene works. You don't say how much data you are moving, but I

Re: Accessing static variables in map function

2009-07-09 Thread Ted Dunning
Use the configuration object. Remember that the outer class is replicated all across the known universe. Your command line arguments only exist on your original machine. On Thu, Jul 9, 2009 at 11:35 AM, smarthr...@yahoo.co.in wrote: Hey Ram. The problem is i initialize these variables in the

Re: Extracting data from HDFS and displaying stats to a webpage

2009-07-08 Thread Ted Dunning
On Wed, Jul 8, 2009 at 7:46 PM, Christophe Bisciglia christo...@cloudera.com wrote: Hey Usman, your second approach is on the right track. You don't want to have your end users interacting directly with HDFS. The latency is too high, and it wasn't designed for this. This definitely used to

Re: zookeeper on ec2

2009-07-06 Thread Ted Dunning
On Mon, Jul 6, 2009 at 12:58 PM, Gustavo Niemeyer gust...@niemeyer.net wrote: can make the ZK servers appear a bit less connected. You have to plan for ConnectionLoss events. Interesting. Note that most of these seem to be related to client issues, especially GC. If you configure in such

Re: Parallell maps

2009-07-03 Thread Ted Dunning
I don't understand this statement. Basic page rank in map-reduce is normally a simple undergraduate class assignment: http://www.ics.uci.edu/~abehm/class.../uci/.../Behm-Shah_PageRank.ppt http://www.umiacs.umd.edu/~jimmylin/cloud9/docs/exercises/pagerank.html What is it about your problem that

Re: Parallell maps

2009-07-03 Thread Ted Dunning
hardware to get the most bang for the buck. What is mostly needed for HBase to scale? Memory? Total amount of HDFS IO? CPU? With too little memory, I guess the load goes IO-bound? -- Ted Dunning, CTO DeepDyve 111 West Evelyn Ave. Ste. 202 Sunnyvale, CA 94086 http://www.deepdyve.com

Re: Parallell maps

2009-07-02 Thread Ted Dunning
is a cartoon, but is surprisingly realistic. Whenever you say random access, I think that you are paying four orders of magnitude more in costs than you should. -- Ted Dunning, CTO DeepDyve 111 West Evelyn Ave. Ste. 202 Sunnyvale, CA 94086 http://www.deepdyve.com 858-414-0013 (m) 408-773-0220 (fax)

Re: Using addCacheArchive

2009-07-02 Thread Ted Dunning
a directory to the tasktrackers. The API doc http://hadoop.apache.org/core/docs/r0.20.0/api/index.html says that archives are unzipped on the tasktrackers but I want an example of how to use this in case of a directory. -- Ted Dunning, CTO DeepDyve 111 West Evelyn Ave. Ste. 202 Sunnyvale, CA 94086

Re: local directory

2009-07-01 Thread Ted Dunning
-- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall www.prohadoopbook.com a community for Hadoop Professionals -- Ted Dunning, CTO DeepDyve 111 West Evelyn Ave. Ste. 202 Sunnyvale, CA 94086 http

Re: Some questions about Zookeeper 3.2.0

2009-06-29 Thread Ted Dunning
A rolling update works very well for that. You can also change the number of nodes in the cluster. To do this, you replace the config files on the surviving servers and on the new server. Then take down the one that is leaving the cluster and then one by one restart the servers that will remain

Re: Some questions about Zookeeper 3.2.0

2009-06-28 Thread Ted Dunning
I don't think you should be very nervous at all. There are two questions: 1) can 3.1.1 go to 3.2 with no down time. This is very likely, but a wiser head than mine should have final say 2) can 3.1.1 go to 3.2 with 1 minute of downtime. This is for sure. Neither option involves data loss. ZK

Re: Some questions about Zookeeper 3.2.0

2009-06-27 Thread Ted Dunning
In general for changes like this, you need to be running more than one server in a cluster to avoid losing state such as the ephemeral nodes. I can't say for certain that the 3.1.1 to 3.2 change can be done this way, but most upgrades can be done by stopping one server at a time, changing the
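
The one-server-at-a-time procedure can be walked through in a small simulation. A rough Python sketch, assuming a majority quorum must stay up at every step; the function name is invented for illustration and the version strings are only labels, not a statement about what this particular upgrade supports:

```python
def rolling_upgrade(ensemble_size, old="3.1.1", new="3.2"):
    """Upgrade one server at a time and record whether quorum held."""
    quorum = ensemble_size // 2 + 1
    versions = [old] * ensemble_size
    steps = []
    for i in range(ensemble_size):
        up = ensemble_size - 1          # server i is stopped for its upgrade
        steps.append((i, up, up >= quorum))
        versions[i] = new               # server i restarts on the new version
    return versions, steps


versions, steps = rolling_upgrade(3)
always_had_quorum = all(ok for _, _, ok in steps)

# For comparison: a two-server ensemble loses its majority as soon as
# one server is stopped, under the majority-quorum assumption above.
_, two_node_steps = rolling_upgrade(2)
two_node_ok = all(ok for _, _, ok in two_node_steps)
```

With three servers, two remain up at every step, so sessions and their ephemeral nodes should survive. Under the same assumption, two servers are not enough, which refines "more than one server" to "at least three".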
