Re: Bloom Filter analogy in SQL

2013-03-29 Thread Ted Dunning
This isn't really a Hadoop question. A Bloom filter is a very low-level data structure that doesn't really have any correlate in SQL. It allows you to find duplicates quickly and probabilistically: in return for a small probability of a false positive, it uses less memory. On Fri, Mar 29, 2013 at 5:3
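For readers who want something concrete, here is a minimal Bloom filter sketch in Java (an illustration with made-up names, not code from this thread). A lookup answers "definitely absent" or "probably present"; false positives are the price paid for the small bit set.

    import java.util.BitSet;

    // Toy Bloom filter: k hash probes into an m-bit set.
    public class TinyBloomFilter {
        private final BitSet bits;
        private final int numBits;    // m
        private final int numHashes;  // k

        public TinyBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // Derive the i-th probe position from the key's hash.
        private int probe(String key, int i) {
            int h = key.hashCode() * 31 + i * 0x9E3779B9;
            return (h & 0x7fffffff) % numBits;
        }

        public void add(String key) {
            for (int i = 0; i < numHashes; i++) bits.set(probe(key, i));
        }

        // true  = "probably seen before" (false positives possible),
        // false = "definitely not seen before".
        public boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++)
                if (!bits.get(probe(key, i))) return false;
            return true;
        }
    }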

Re: FileSystem Error

2013-03-29 Thread Azuryy Yu
Use hadoop jar instead of java -jar; the hadoop script sets up a proper classpath for you. On Mar 29, 2013 11:55 PM, "Cyril Bogus" wrote: > Hi, > > I am running a small Java program that basically writes a small input data > set to the Hadoop FileSystem, runs Mahout Canopy and KMeans clustering and
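For example (jar and argument names are placeholders), instead of

    java -jar myjob.jar /input /output

run

    hadoop jar myjob.jar /input /output

so the Hadoop client libraries and the cluster configuration directory end up on the classpath automatically.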

Re: Running test on hadoop cluster

2013-03-29 Thread Mohammad Tariq
Hello Rahul, You might find these links useful: http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/ And the official page: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-s
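As a rough sketch of the TeraSort round trip on a 1.x cluster (the examples jar name varies by distribution, and the sizes and paths here are placeholders):

    # generate ~1 GB of input (10 million 100-byte rows)
    hadoop jar hadoop-examples.jar teragen 10000000 /benchmarks/terasort-input
    # sort it across the cluster
    hadoop jar hadoop-examples.jar terasort /benchmarks/terasort-input /benchmarks/terasort-output
    # verify the output is globally sorted
    hadoop jar hadoop-examples.jar teravalidate /benchmarks/terasort-output /benchmarks/terasort-report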

Running test on hadoop cluster

2013-03-29 Thread Shah, Rahul1
Hi all, I have my Hadoop cluster set up. I am using the Intel distribution of Hadoop. I was planning to run some tests like TeraSort on the cluster just to check whether all the nodes in the cluster are working properly. As I am new to Hadoop, I am not sure where to start. Any kind of h

Re: error copying file from local to hadoop fs

2013-03-29 Thread Jens Scheidtmann
Dear Ravi, 2013/3/29 Ravi Chandran > But in standalone mode, should the safe mode be faster? I mean because > everything is running locally. Still the daemons are not visible in jps. > How can I restart them individually? > What do you mean by standalone mode? "standalone" in the sense of hadoo

Re: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput

2013-03-29 Thread Jens Scheidtmann
Hi Himanish, 2013/3/29 Himanish Kushary > [...] > But the real issue is the throughput. You mentioned that you had > transferred 1.5 TB in 45 mins, which comes to around 583 MB/s. I am barely > getting 4 MB/s upload speed. How large is your outgoing link? Can you expect 500 MB/s with it?

Re: Understanding Sys.output from mapper & partitioner

2013-03-29 Thread Jens Scheidtmann
Dear Sai Sai, you wrote: > key = 0 value = 1010 > key = 6 value = 20200 > ... The provided key is the byte offset of the respective line in your input file. See the TextInputFormat docs here: http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/TextInputFormat.html I guess this
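A minimal sketch of what TextInputFormat hands a mapper (old mapred API; the class name is made up): the key is the byte offset of the line within the file, so the first line has key 0 and each later key is the previous offset plus that line's length and terminator.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Echoes exactly what the framework passes in: offset key, line value.
    public class OffsetEchoMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, LongWritable, Text> {
        public void map(LongWritable key, Text value,
                        OutputCollector<LongWritable, Text> out,
                        Reporter reporter) throws IOException {
            out.collect(key, value); // key is a byte offset, not a line number
        }
    }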

Re: Understanding Sys.output from mapper & partitioner

2013-03-29 Thread Sai Sai
Hi Jens, Here is the code for the driver, if this is what you are referring to as missing; please let me know if you need any additional info. Your help is appreciated. public class SecondarySortDriver { /** * @param args * @throws Exception */ public static void main(String[] args) throws Exce

Re: list of linux commands for hadoop

2013-03-29 Thread Sai Sai
Just wondering if there is a list of Linux commands, or any article covering the commands needed for learning Hadoop. Thanks

Re: Bloom Filter analogy in SQL

2013-03-29 Thread Ted Yu
From http://msdn.microsoft.com/en-us/library/cc278097(v=sql.100).aspx : "The new technology employed is based on bitmap filters, also known as *Bloom filters* (see *Bloom filter*, Wikipedia 2007, http://en.wikipedia.org/wiki/Bloom_filter)" ... HBase uses Bloom filters extensively. I can give refer

Re: Bloom Filter analogy in SQL

2013-03-29 Thread Sai Sai
Can someone give a simple analogy of a Bloom filter in SQL? I am trying to understand it and always get confused. Thanks

Re: Understanding Sys.output from mapper & partitioner

2013-03-29 Thread Jens Scheidtmann
Hello Sai, the interesting bit is how your job is configured. Depending on how you defined the input to the MR job (e.g., as a text file), you might get this result. Unfortunately, you didn't include this source code... Best regards, Jens
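For comparison, a driver configured along these lines (old mapred API; class and path names are hypothetical) is the sort of setup that produces byte-offset keys like the ones Sai posted:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class TextInputDemoDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(TextInputDemoDriver.class);
            conf.setJobName("text-input-demo");
            // TextInputFormat => LongWritable byte-offset keys, Text line values.
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputKeyClass(LongWritable.class);
            conf.setOutputValueClass(Text.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf); // identity map/reduce by default
        }
    }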

FileSystem Error

2013-03-29 Thread Cyril Bogus
Hi, I am running a small Java program that basically writes a small input data set to the Hadoop FileSystem, runs Mahout Canopy and KMeans clustering, and then outputs the content of the data. In my hadoop.properties I have included the core-site.xml definition for the Java program to connect to my sin
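A hedged sketch of the kind of program described (paths and contents invented for illustration); it assumes core-site.xml is on the classpath so FileSystem.get() resolves to HDFS rather than the local file system:

    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteToHdfs {
        public static void main(String[] args) throws Exception {
            // new Configuration() picks up core-site.xml from the classpath;
            // without it, this silently uses the local file system.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            OutputStream out = fs.create(new Path("/tmp/clustering-input.txt"));
            out.write("1.0,2.0\n3.0,4.0\n".getBytes("UTF-8"));
            out.close();
            fs.close();
        }
    }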

How to use HPROF for rhadoop jobs ?

2013-03-29 Thread rohit sarewar
Hi All, I can use HPROF in Java MapReduce jobs: Configuration conf = getConf(); conf.setBoolean("mapred.task.profile", true); conf.set("mapred.task.profile.params", "-agentlib:hprof=cpu=samples," + "heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s"); conf.set("mapred.task.profile.maps",
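For reference, a hedged guess at where the truncated snippet is heading, using the standard 1.x profiling properties (the task ranges are example values; assumes a Tool subclass so getConf() exists):

    Configuration conf = getConf();
    conf.setBoolean("mapred.task.profile", true);
    conf.set("mapred.task.profile.params",
        "-agentlib:hprof=cpu=samples,heap=sites,depth=6,"
        + "force=n,thread=y,verbose=n,file=%s");
    conf.set("mapred.task.profile.maps", "0-2");     // profile only map tasks 0-2
    conf.set("mapred.task.profile.reduces", "0-2");  // profile only reduce tasks 0-2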

Re: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput

2013-03-29 Thread Himanish Kushary
Yes, you are right, CDH4 is on the 2.x line, but I even checked the javadocs for the 1.0.4 branch (could not find the 1.0.3 APIs, so I used http://hadoop.apache.org/docs/r1.0.4/api/index.html) but did not find the "ProgressableResettableBufferedFileInputStream" class. Not sure how it is present in the hadoop-co

Re: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput

2013-03-29 Thread David Parks
CDH4 can be either 1.x or 2.x Hadoop; are you using the 2.x line? I've used it primarily with 1.0.3, which is what AWS uses, so I presume that's what it's tested on. Himanish Kushary wrote: > Thanks Dave. > > I had already tried using the s3distcp jar. But got stuck on the below > error, which m

Re: Million docs and word count scenario

2013-03-29 Thread Ling Kun
Maybe har is a choice. http://hadoop.apache.org/docs/r1.1.2/hadoop_archives.html Ling kun On Friday, March 29, 2013, Ted Dunning wrote: > Putting each document into a separate file is not likely to be a great > thing to do. > > On the other hand, putting them all into one file may not be what y
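For what it's worth, creating a Hadoop archive is a single command (names and paths below are placeholders):

    hadoop archive -archiveName docs.har -p /user/hadoop docs /user/hadoop/archives

The result can then be read back through the har:// filesystem, e.g. har:///user/hadoop/archives/docs.har.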

Re: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput

2013-03-29 Thread Himanish Kushary
Thanks Dave. I had already tried using the s3distcp jar. But got stuck on the below error, which made me think that this is something specific to the Amazon Hadoop distribution. Exception in thread "Thread-28" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/s3native/ProgressableResettableBuffered
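If s3distcp keeps failing off EMR, plain distcp against the s3n filesystem is the usual fallback (bucket, paths, and credentials below are placeholders):

    hadoop distcp hdfs://namenode:8020/data/export s3n://ACCESS_KEY:SECRET_KEY@my-bucket/export

Throughput is then bounded by the number of concurrent map tasks and the outbound bandwidth of each node.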

Re: Million docs and word count scenario

2013-03-29 Thread Ted Dunning
Putting each document into a separate file is not likely to be a great thing to do. On the other hand, putting them all into one file may not be what you want either. It is probably best to find a middle ground and create files each with many documents and each a few gigabytes in size. On Fri,
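One common way to hit that middle ground is to pack many small documents into a few large SequenceFiles keyed by document name. A minimal sketch (1.x API; all paths and names invented):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class PackDocs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // One SequenceFile holding many documents: key = doc name,
            // value = doc text. Roll to a new file every few gigabytes.
            SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, new Path("/docs/bundle-00000.seq"),
                Text.class, Text.class);
            writer.append(new Text("doc-000001.txt"), new Text("...contents..."));
            writer.close();
        }
    }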

Million docs and word count scenario

2013-03-29 Thread pathurun
If there are 1 million docs in an enterprise and we need to perform a word-count computation on all of them, what is the first step? Is it to extract the text of all the docs into a single file and then put it into HDFS, or to put each one separately into HDFS? Thanks Sent from BlackBerry®

Re: Which hadoop installation should I use on ubuntu server?

2013-03-29 Thread Bertrand Dechoux
For information, the 50-node limit on CDH is a past limitation; it is no longer the case. *Support for unlimited nodes*: "Previous versions of Cloudera Manager Free Edition limited the number of managed nodes to 50. This limitation has been removed." https://ccp.cloudera.com/display/FREE45DOC

Re: Which hadoop installation should I use on ubuntu server?

2013-03-29 Thread Bruno Mahé
On 03/29/2013 01:09 AM, David Parks wrote: Hmm, seems intriguing. I’m still not totally clear on bigtop here. It seems like they’re creating and maintaining basically an installer for Hadoop? I tried following their docs for Ubuntu, but just get a 404 error on the first step, so it makes me wonder

Re: Differences hadoop-2.0.0-alpha Vs hadoop-2.0.3-alpha

2013-03-29 Thread Krishna Kishore Bonagiri
Hi Arun, I had to change the way I get queueInfo in Client.java from GetQueueInfoRequest queueInfoReq = Records.newRecord(GetQueueInfoRequest.class); GetQueueInfoResponse queueInfoResp = applicationsManager.getQueueInfo(queueInfoReq); QueueInfo queueInfo = queueInfoResp.getQueueInfo

RE: Which hadoop installation should I use on ubuntu server?

2013-03-29 Thread David Parks
I’ve never used the Cloudera distributions, but you can’t not hear about them. Is it really much easier to manage the whole platform using Cloudera’s manager? 50 free nodes is generous enough that I’d feel comfortable committing to them as a platform (and thus to the potential future cost), I think.

Re: Which hadoop installation should I use on ubuntu server?

2013-03-29 Thread Håvard Wahl Kongsgård
I recommend Cloudera's CDH4 on Ubuntu 12.04 LTS. On Thu, Mar 28, 2013 at 7:07 AM, David Parks wrote: > I’m moving off AWS MapReduce to our own cluster; I’m installing Hadoop on > Ubuntu Server 12.10. > > I see a .deb installer and installed that, but it seems like files are all > o

RE: Which hadoop installation should I use on ubuntu server?

2013-03-29 Thread David Parks
Hmm, seems intriguing. I'm still not totally clear on bigtop here. It seems like they're creating and maintaining basically an installer for Hadoop? I tried following their docs for Ubuntu, but just get a 404 error on the first step, so it makes me wonder how reliable that project is. https://

Fwd:

2013-03-29 Thread Mohit Vadhera
I have now linked the shared library, but I get the below error while running mount -a: # mount -a INFO /data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.1.3/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:164 Adding FUSE arg /mnt/san1/hadoo

Re: error copying file from local to hadoop fs

2013-03-29 Thread Anand Aravindan
It would appear that your HDFS setup is not functioning properly. Please try shutting down HDFS (stop-all.sh), waiting a bit (5 mins), and restarting HDFS (start-all.sh). If that does not work, you might have to reformat the NameNode (after shutting down HDFS again). Similar solution presente
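Concretely, on a 1.x-style setup that would be something like this (reformatting destroys all HDFS data, so it is a last resort):

    stop-all.sh
    # wait ~5 minutes, then
    start-all.sh
    # only if HDFS still refuses to leave safe mode:
    stop-all.sh
    hadoop namenode -format
    start-all.sh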

Re: error copying file from local to hadoop fs

2013-03-29 Thread Ravi Chandran
But in standalone mode, should the safe mode be faster? I mean, because everything is running locally. Still the daemons are not visible in jps. How can I restart them individually? Also, the service status returned this info: Hadoop namenode is running [ OK ] Hadoop

[no subject]

2013-03-29 Thread Mohit Vadhera
Hi, I am getting the below error while mounting fuse_dfs: a shared-library error while running the command mount -a. Can anybody tell me how to fix this, please? # cat /etc/fstab | grep hadoop hadoop-fuse-dfs#dfs://localhost:8020 /mnt/san1/hadoop_mount fuse allow_other,usetrash,rw 2 0 # mount -

Re: error copying file from local to hadoop fs

2013-03-29 Thread Anand Aravindan
Hello, In safe mode, -copyFromLocal will not work. Please read: http://hadoop.apache.org/docs/stable/hdfs_user_guide.html#Safemode Please wait a bit for HDFS to exit safe mode. If it takes a significantly long time and HDFS is still in safe mode, something could be wrong wit
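You can check the state, and force an exit if you are sure the cluster is otherwise healthy, with dfsadmin:

    hadoop dfsadmin -safemode get
    hadoop dfsadmin -safemode leave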

Re: error copying file from local to hadoop fs

2013-03-29 Thread Ravi Chandran
Thanks for replying. I did jps; it doesn't show any of the daemon services. Also, I just got the error message: Cannot create file /user/training/inputs/basic.txt._COPYING_. Name node is in safe mode. Looks like JT and DN are not responding to NN, but this is a standalone setup, I don't und

Re: error copying file from local to hadoop fs

2013-03-29 Thread Jagat Singh
Your cluster is not running properly. Can you do jps and see if all the services are running: JT, NN, DN, etc.? On Fri, Mar 29, 2013 at 6:07 PM, Ravi Chandran wrote: > Hi, > I am trying to copy a local text file into the Hadoop fs using the -copyFromLocal > option, but I am getting an error: > > 13/03/29 03:02:

error copying file from local to hadoop fs

2013-03-29 Thread Ravi Chandran
Hi, I am trying to copy a local text file into the Hadoop fs using the -copyFromLocal option, but I am getting an error: 13/03/29 03:02:54 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8020. Already tried 0 time(s); maxRetries=45 13/03/29 03:03:15 INFO ipc.Client: Retrying connect to server: 0
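Retrying connect to 0.0.0.0:8020 usually means the client has no NameNode address configured and is falling back to the default. A minimal 1.x core-site.xml for a single-node setup would look roughly like this (host and port are examples):

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8020</value>
      </property>
    </configuration>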