This isn't really a Hadoop question.
A Bloom filter is a very low-level data structure that doesn't really have a
direct correlate in SQL. It lets you test set membership (and so spot likely
duplicates) quickly and probabilistically: in return for a small probability
of a false positive, it uses far less memory than storing the full set.
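As a rough illustration only (not from the original reply), here is a minimal
Bloom filter sketch in Java backed by a BitSet; the two hash functions and the
bit-array size are arbitrary choices for the example:

import java.util.BitSet;

// Minimal Bloom filter sketch: k = 2 hash functions over an m-bit BitSet.
// False positives are possible; false negatives are not.
public class TinyBloomFilter {
    private final BitSet bits;
    private final int m;

    public TinyBloomFilter(int m) {
        this.m = m;
        this.bits = new BitSet(m);
    }

    private int h1(String s) { return (s.hashCode() & 0x7fffffff) % m; }
    private int h2(String s) { return ((s.hashCode() * 31 + s.length()) & 0x7fffffff) % m; }

    public void add(String s) {
        bits.set(h1(s));
        bits.set(h2(s));
    }

    // true means "probably present", false means "definitely absent".
    public boolean mightContain(String s) {
        return bits.get(h1(s)) && bits.get(h2(s));
    }

    public static void main(String[] args) {
        TinyBloomFilter f = new TinyBloomFilter(1 << 20);
        f.add("row-42");
        System.out.println(f.mightContain("row-42"));   // true
        System.out.println(f.mightContain("row-43"));   // false (almost always)
    }
}

Storing the full set of strings would need far more memory than the single bit
array, which is the trade-off described above.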
On Fri, Mar 29, 2013 at 5:3
Use hadoop jar instead of java -jar.
The hadoop script can set up a proper classpath for you.
On Mar 29, 2013 11:55 PM, "Cyril Bogus" wrote:
> Hi,
>
> I am running a small Java program that basically writes a small input data
> set to the Hadoop FileSystem, runs Mahout Canopy and KMeans clustering, and
Hello Rahul,
You might find these links useful:
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
And the official page :
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html
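Not from the original reply, but a rough sketch of driving the TeraSort
benchmark steps programmatically, assuming the example classes from the
package linked above are on the classpath (they implement Tool in the stock
examples jar, so ToolRunner can drive them); the row count and paths are
placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.examples.terasort.TeraGen;
import org.apache.hadoop.examples.terasort.TeraSort;
import org.apache.hadoop.examples.terasort.TeraValidate;
import org.apache.hadoop.util.ToolRunner;

public class TeraSortSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Generate rows, sort them, then validate the sorted output.
        ToolRunner.run(conf, new TeraGen(), new String[] {"1000000", "/bench/unsorted"});
        ToolRunner.run(conf, new TeraSort(), new String[] {"/bench/unsorted", "/bench/sorted"});
        ToolRunner.run(conf, new TeraValidate(), new String[] {"/bench/sorted", "/bench/report"});
    }
}

In practice the same three steps are usually launched from the command line
via the bundled examples jar, as described in Michael Noll's post above.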
Hi all,
I have my Hadoop cluster set up. I am using the Intel distribution of Hadoop. I
was planning to run some tests like terasort on the cluster just to check
whether all the nodes in the cluster are working properly. As I am new to
Hadoop I am not sure where to start. Any kind of help is appreciated.
Dear Ravi,
2013/3/29 Ravi Chandran
> But in standalone mode, should the safe mode be faster? I mean, because
> everything is running locally. Still, the daemons are not visible in jps.
> How can I restart them individually?
>
What do you mean by standalone mode?
"Standalone" in the sense of Hadoop
Hi Himanish,
2013/3/29 Himanish Kushary
> [...]
>
>
> But the real issue is the throughput. You mentioned that you had
> transferred 1.5 TB in 45 mins which comes to around 583 MB/s. I am barely
> getting 4 MB/s upload speed
>
How large is your outgoing link? Can you expect 500 MB/s with it?
Dear Sai Sai,
you wrote:
> key = 0 value = 1010
> key = 6 value = 20200
> ...
the provided key is the byte offset of the respective line in your input
file.
See TextInputFormat docs here:
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/TextInputFormat.html
I guess this
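Not part of the original reply, but a minimal sketch of what the mapper sees
with TextInputFormat (old mapred API, as in the docs linked above): the key is
the byte offset of the line, the value is the line itself.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Emitting the (offset, line) pair unchanged reproduces output like
// "key = 0 value = 1010" from the quoted message.
public class OffsetEchoMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, Text> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<LongWritable, Text> out, Reporter reporter)
            throws IOException {
        out.collect(offset, line);
    }
}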
Hi Jens
Here is the code for the driver, if this is what you are referring to as
missing. Please let me know if you need any additional info.
Your help is appreciated.
public class SecondarySortDriver {
    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
Just wondering if there is a list of Linux commands, or any article covering
the ones needed for learning Hadoop.
Thanks
From http://msdn.microsoft.com/en-us/library/cc278097(v=sql.100).aspx :
The new technology employed is based on bitmap filters, also known as *Bloom
filters* (see *Bloom filter*, Wikipedia 2007,
http://en.wikipedia.org/wiki/Bloom_filter) ...
HBase uses Bloom filters extensively. I can give references.
Can someone give a simple analogy of a Bloom filter in SQL terms?
I am trying to understand it and always get confused.
Thanks
Hello Sai,
the interesting bit is how your job is configured. Depending on how you
defined the input to the MR job (e.g. as a text file), you might get this
result. Unfortunately, you didn't include that source code...
Best regards,
Jens
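Not from Jens's reply, but as an illustration of why the input configuration
matters, a minimal driver fragment (old mapred API; the class name is a
placeholder): switching the input format changes what arrives as the key.

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.TextInputFormat;

public class InputFormatChoice {
    public static void main(String[] args) {
        JobConf conf = new JobConf(InputFormatChoice.class);
        // TextInputFormat (the default): key = LongWritable byte offset,
        // value = Text line, which is exactly what Sai is seeing.
        conf.setInputFormat(TextInputFormat.class);
        // KeyValueTextInputFormat instead: key = text before the first tab,
        // value = the rest of the line, so no byte offsets appear.
        // conf.setInputFormat(KeyValueTextInputFormat.class);
    }
}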
Hi,
I am running a small Java program that basically writes a small input data
set to the Hadoop FileSystem, runs Mahout Canopy and KMeans clustering, and
then outputs the content of the data.
In my hadoop.properties I have included the core-site.xml definition for
the Java program to connect to my single-node cluster.
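Not from the original message, but a minimal sketch of how a standalone Java
program typically picks up the cluster settings before talking to HDFS; the
core-site.xml path here is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConnectSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Explicitly add the cluster's core-site.xml. When the program is
        // launched with "hadoop jar" (as suggested earlier in the thread),
        // the hadoop script puts the conf directory on the classpath and
        // this step becomes unnecessary.
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to: " + fs.getUri());
    }
}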
Hi All
I can use HPROF in Java MapReduce jobs:
Configuration conf = getConf();
conf.setBoolean("mapred.task.profile", true);
conf.set("mapred.task.profile.params", "-agentlib:hprof=cpu=samples," +
    "heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s");
// "0-2" is the documented default range: profile only the first three map tasks.
conf.set("mapred.task.profile.maps", "0-2");
Yes, you are right, CDH4 is the 2.x line, but I even checked the javadocs
for the 1.0.4 branch (could not find the 1.0.3 APIs, so used
http://hadoop.apache.org/docs/r1.0.4/api/index.html) but did not find
the "ProgressableResettableBufferedFileInputStream"
class. Not sure how it is present in the hadoop-co
CDH4 can be either 1.x or 2.x Hadoop; are you using the 2.x line? I've used it
primarily with 1.0.3, which is what AWS uses, so I presume that's what it's
tested on.
Himanish Kushary wrote:
>Thanks Dave.
>
>
>I had already tried using the s3distcp jar. But got stuck on the below
>error, which m
Maybe a HAR (Hadoop archive) is a choice.
http://hadoop.apache.org/docs/r1.1.2/hadoop_archives.html
Ling kun
On Friday, March 29, 2013, Ted Dunning wrote:
> Putting each document into a separate file is not likely to be a great
> thing to do.
>
> On the other hand, putting them all into one file may not be what y
Thanks Dave.
I had already tried using the s3distcp jar, but got stuck on the below
error, which made me think that this is something specific to the Amazon
Hadoop distribution.
Exception in thread "Thread-28" java.lang.NoClassDefFoundError:
org/apache/hadoop/fs/s3native/ProgressableResettableBufferedFileInputStream
Putting each document into a separate file is not likely to be a great
thing to do.
On the other hand, putting them all into one file may not be what you want
either.
It is probably best to find a middle ground and create files each with many
documents and each a few gigabytes in size.
On Fri,
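Not from Ted's reply, but a minimal sketch of the usual middle ground: packing
many small documents into a single SequenceFile keyed by filename (the local
input directory and the HDFS output path are placeholders):

import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// key = original filename, value = document contents.
public class PackDocs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, new Path("/docs/packed.seq"), Text.class, Text.class);
        try {
            for (File doc : new File("/local/docs").listFiles()) {
                String body = new String(Files.readAllBytes(doc.toPath()), "UTF-8");
                writer.append(new Text(doc.getName()), new Text(body));
            }
        } finally {
            writer.close();
        }
    }
}

Writing several such files, each a few gigabytes as suggested, keeps the file
count low while the SequenceFiles remain splittable for MapReduce.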
If there are 1 million docs in an enterprise and we need to perform a word
count computation on all the docs, what is the first step? Is it to extract
all the text of all the docs into a single file and then put it into HDFS, or
to put each one separately into HDFS?
Thanks
Sent from BlackBerry®
For information, the 50-node limit is a past limitation of Cloudera Manager
Free Edition. It is no longer the case.
*Support for unlimited nodes*. Previous versions of Cloudera Manager Free
> Edition limited the number of managed nodes to 50. This limitation has been
> removed.
>
https://ccp.cloudera.com/display/FREE45DOC
On 03/29/2013 01:09 AM, David Parks wrote:
Hmm, seems intriguing. I’m still not totally clear on Bigtop here. It
seems like they’re creating and maintaining basically an installer for Hadoop?
I tried following their docs for Ubuntu, but just get a 404 error on the
first step, so it makes me wonder
Hi Arun,
I had to change the way I get queueInfo in Client.java
from
GetQueueInfoRequest queueInfoReq =
    Records.newRecord(GetQueueInfoRequest.class);
GetQueueInfoResponse queueInfoResp =
    applicationsManager.getQueueInfo(queueInfoReq);
QueueInfo queueInfo = queueInfoResp.getQueueInfo();
I've never used the Cloudera distributions, but you can't not hear about
them. Is it really much easier to manage the whole platform using Cloudera's
manager? 50 nodes free is generous enough that I'd feel comfortable
committing to them as a platform (and thus the future potential cost), I
think.
I recommend Cloudera's CDH4 on Ubuntu 12.04 LTS.
On Thu, Mar 28, 2013 at 7:07 AM, David Parks wrote:
> I’m moving off AWS MapReduce to our own cluster, I’m installing Hadoop on
> Ubuntu Server 12.10.
>
>
> I see a .deb installer and installed that, but it seems like files are all
> o
Hmm, seems intriguing. I'm still not totally clear on Bigtop here. It seems
like they're creating and maintaining basically an installer for Hadoop?
I tried following their docs for Ubuntu, but just get a 404 error on the
first step, so it makes me wonder how reliable that project is.
https://
Now I have linked the shared library, but I still get the below error while
running mount -a:
# mount -a
INFO
/data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.1.3/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:164
Adding FUSE arg /mnt/san1/hadoo
It would appear that your HDFS setup is not functioning properly.
Please try to shut down HDFS (stop-all.sh), wait for a bit (5 mins), and
restart HDFS (start-all.sh).
If that does not work, you might have to reformat the NameNode (after
shutting down HDFS again).
Similar solution presente
But in standalone mode, should the safe mode be faster? I mean, because
everything is running locally. Still, the daemons are not visible in jps.
How can I restart them individually?
Also, the service status returned this info:
Hadoop namenode is running [ OK ]
Hadoop
Hi,
I am getting a shared library error while mounting fuse_dfs when running the
command mount -a. Can anybody tell me how to fix this, please?
# cat /etc/fstab | grep hadoop
hadoop-fuse-dfs#dfs://localhost:8020 /mnt/san1/hadoop_mount fuse
allow_other,usetrash,rw 2 0
# mount -
Hello,
In Safe Mode, -copyFromLocal will not work.
Please read:
http://hadoop.apache.org/docs/stable/hdfs_user_guide.html#Safemode
Please wait a bit for the HDFS system to exit Safe Mode. If it takes a
significantly long time and HDFS is still in Safe Mode, something
could be wrong wit
Thanks for replying. I did jps; it doesn't show any of the daemon services.
Also, I just got the error message showing:
Cannot create file/user/training/inputs/basic.txt._COPYING_. Name node is
in safe mode.
Looks like JT and DN are not responding to NN. But this is a standalone
setup, I don't und
Your cluster is not running properly.
Can you do jps and see if all services are running?
JT
NN
DN
etc
On Fri, Mar 29, 2013 at 6:07 PM, Ravi Chandran wrote:
> hi,
> I am trying to copy a local text file into HDFS using the -copyFromLocal
> option, but I am getting an error:
>
> 13/03/29 03:02:
Hi,
I am trying to copy a local text file into HDFS using the -copyFromLocal
option, but I am getting an error:
13/03/29 03:02:54 INFO ipc.Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8020. Already tried 0 time(s); maxRetries=45
13/03/29 03:03:15 INFO ipc.Client: Retrying connect to server:
0