Re: doubts reg Hive

2012-10-01 Thread sudha sadhasivam
Sir, I want to retrieve the names of fields having a particular value. How can it be done on a single table and across multiple tables? G Sudha --- On Mon, 10/1/12, Harsh J ha...@cloudera.com wrote:

AUTO: Yuan Jin is out of the office. (returning 10/08/2012)

2012-10-01 Thread Yuan Jin
I am out of the office until 10/08/2012 and will reply to you after the holiday. Note: This is an automated response to your message java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING sent on 01/10/2012 21:32:09. This is the only notification you will receive while this person is away.

Add file to distributed cache

2012-10-01 Thread Abhishek
Hi all, How do you add a small file to the distributed cache in an MR program? Regards, Abhi Sent from my iPhone

Re: Add file to distributed cache

2012-10-01 Thread Bejoy KS
Hi Abhishek, You can find a simple example of using the DistributedCache here: http://kickstarthadoop.blogspot.co.uk/2011/05/hadoop-for-dependent-data-splits-using.html
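For reference, a minimal sketch of the pattern, assuming the Hadoop 1.x-era DistributedCache API (the file path and class names here are hypothetical):

```java
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheDemo {
  public static class CacheMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException {
      // Each task node receives a local copy of every cached file.
      Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
      // ... read cached[0] (e.g. a small lookup table) into memory here ...
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "distributed-cache-demo");
    job.setJarByClass(CacheDemo.class);
    // Register a small HDFS file before submitting; it is shipped to all tasks.
    DistributedCache.addCacheFile(new URI("/user/abhi/lookup.txt"),
        job.getConfiguration());
    job.setMapperClass(CacheMapper.class);
    // ... set input/output formats and paths, then: ...
    job.waitForCompletion(true);
  }
}
```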

Re: Which hardware to choose

2012-10-01 Thread Alexander Pivovarov
Hi Oleg, Cloudera and Dell set up the following cluster for my company. The company receives 1.5 TB of raw data per day. 38 data nodes + 2 name nodes. Data node: Dell PowerEdge C2100 series, 2 x Xeon X5670, 48 GB ECC RAM (12x4GB 1333MHz), 12 x 2 TB 7200 RPM SATA HDD (with hot swap), JBOD, Intel Gigabit

Re: DFSClient may read wrong data in local read

2012-10-01 Thread jlei liu
Hi Colin, Thanks for your reply. What does it mean that the patch will work on files that are in the process of being written? Thanks, LiuLei 2012/10/1 Colin McCabe cmcc...@alumni.cmu.edu I'm going to post a patch to HDFS-347 shortly. From the user's point of view, the important thing about the

HADOOP in Production

2012-10-01 Thread yogesh dhari
Hi all, I have understood Hadoop and the Hadoop ecosystem (Pig as ETL, Hive as a data warehouse, Sqoop as an import tool). I have worked and learned on a single-node cluster with demo data. As Hadoop suits Unix platforms best, please help me understand the requirements from start to finish to use

Re: NEED HELP:: using Hadoop in Production

2012-10-01 Thread Bertrand Dechoux
Nothing personal, but I might be the only one to answer, and I will provide only one link: http://www.catb.org/~esr/faqs/smart-questions.html Regards, Bertrand On Mon, Oct 1, 2012 at 4:19 PM, yogesh dhari yogeshdh...@live.com wrote: Hi all, I have understood the Hadoop and Hadoop

RES: NEED HELP:: using Hadoop in Production

2012-10-01 Thread Ferreira, Rafael
Try using Cloudera CDH4. http://www.cloudera.com/products-services/enterprise/ It's an easy way: a web-front-ended Hadoop ecosystem manager. Regards, Rafael Pecin From: yogesh dhari [mailto:yogeshdh...@live.com] Sent: Monday, October 1, 2012 11:19 To: hadoop helpforoum Subject:

HDFS file missing a part-file

2012-10-01 Thread Björn-Elmar Macek
Hi, I am kind of unsure where to post this problem, but I think it is more related to Hadoop than to Pig. By successfully executing a Pig script I created a new file in my HDFS. Sadly though, I cannot use it for further processing except for dumping and viewing the data: every

Re: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING

2012-10-01 Thread Harsh J
Vinod's right, but it's waitForCompletion(true|false) you ought to use before isSuccessful() checks, because with submit(), which is a non-blocking way of submission, you'll end up checking immediately and always getting false, because the state will still mostly be RUNNING at that
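In code, the difference Harsh describes looks roughly like this (a minimal sketch; job configuration is omitted and the class name is hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitVsWait {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "submit-vs-wait");
    // ... set mapper, reducer, input/output formats and paths ...

    // Blocking submission: returns only after the job reaches a terminal
    // state, so isSuccessful() gives a meaningful answer afterwards.
    boolean ok = job.waitForCompletion(true); // true = report progress

    // By contrast, job.submit() returns immediately while the job is still
    // RUNNING, so calling job.isSuccessful() right away would return false.
    System.exit(ok ? 0 : 1);
  }
}
```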

Re: HDFS file missing a part-file

2012-10-01 Thread Björn-Elmar Macek
Hi Robert, the exception I see in the output of the grunt shell and in the Pig log, respectively, is: Backend error message - java.util.EmptyStackException at java.util.Stack.peek(Stack.java:102) at

Reduce Copy Speed

2012-10-01 Thread Brandon
What speed do people typically see for the copy during a reduce? From the tasktracker, here is an average: reduce copy (500 of 504 at 1.52 MB/s). We have seen it range from 0.5 to 4 MB/s. That seems a bit slow. Does anyone else have benchmark numbers to share?

Re: Reduce Copy Speed

2012-10-01 Thread Harsh J
Hi Brandon, On Mon, Oct 1, 2012 at 11:23 PM, Brandon bma...@upstreamsoftware.com wrote: What speed do people typically see for the copy during a reduce? It varies due to a few factors. But there are highly improved Netty-based transfers in Hadoop 2.x that you can use for even faster, and more

RE: Nested class

2012-10-01 Thread Kartashov, Andy
Got it to work by emptying the HADOOP_CLASSPATH variable. Andy Kartashov MPAC Architecture R&D, Co-op 1340 Pickering Parkway, Pickering, L1V 0C4 * Phone: (905) 837 6269 * Mobile: (416) 722 1787 andy.kartas...@mpac.ca From: Alexander Pivovarov

File block size use

2012-10-01 Thread Anna Lahoud
I would like to be able to resize a set of inputs, already in SequenceFile format, to be larger. I have tried 'hadoop distcp -Ddfs.block.size=$[64*1024*1024]' and did not get what I expected. The outputs were exactly the same as the inputs. I also tried running a job with an IdentityMapper and

AUTO: Yuan Jin is out of the office. (returning 10/08/2012)

2012-10-01 Thread Yuan Jin
I am out of the office until 10/08/2012 and will reply to you after the holiday. Note: This is an automated response to your message HADOOP in Production sent on 01/10/2012 21:36:58. This is the only notification you will receive while this person is away.

Re: HDFS file missing a part-file

2012-10-01 Thread Björn-Elmar Macek
The script I now want to execute looks like this: x = load 'tag_count_ts_pro_userpair' as (group:tuple(),cnt:int,times:bag{t:tuple(c:chararray)}); y = foreach x generate *, moins.daysFromStart('2011-06-01 00:00:00', times); store y into 'test_daysFromStart'; The problem is that I do not

RE: Two map inputs (file HBase). Join file data and Hbase data into a map reduce.

2012-10-01 Thread Pablo Musa
Hi Bejoy, thank you for the answer, I will try it. But I still have a doubt: how should I manage connections to HBase inside the job? Should I open a new connection in each job? How can I set up a connection pool inside a job? Thank you, Pablo From: Bejoy Ks [mailto:bejoy.had...@gmail.com] Sent:
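Not from the thread itself, but the usual pattern here is one table handle per task rather than per record or a pool: open it in the Mapper's setup() and close it in cleanup(). A minimal sketch, assuming the HBase 0.94-era client API (the table name and key layout are hypothetical):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FileHBaseJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
  private HTable table; // one handle per task, reused across all map() calls

  @Override
  protected void setup(Context context) throws IOException {
    // Open the connection once per task, not once per record.
    table = new HTable(HBaseConfiguration.create(context.getConfiguration()),
        "my_table");
  }

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // Hypothetical: the file line is the row key to join against.
    Result row = table.get(new Get(Bytes.toBytes(line.toString())));
    if (!row.isEmpty()) {
      context.write(line, new Text(row.toString()));
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    table.close(); // release the connection when the task finishes
  }
}
```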

Re: File block size use

2012-10-01 Thread Chris Nauroth
Hello Anna, If I understand correctly, you have a set of multiple sequence files, each much smaller than the desired block size, and you want to concatenate them into a set of fewer files, each one more closely aligned to your desired block size. Presumably, the goal is to improve throughput of

Re: File block size use

2012-10-01 Thread Bejoy KS
Hi Anna, If you want to increase the block size of existing files, you can use an identity mapper with no reducer. Set the min and max split sizes to your requirement (512 MB). Use SequenceFileInputFormat and SequenceFileOutputFormat for your job. Your job should be done. Regards, Bejoy KS
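A minimal sketch of that suggestion, assuming the Hadoop 1.x new (mapreduce) API and Text/Text sequence files (the key/value types and paths are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ResizeSequenceFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Larger block size for the files this job writes (pre-2.x property name).
    conf.setLong("dfs.block.size", 512L * 1024 * 1024);
    Job job = new Job(conf, "resize-seqfiles");
    job.setJarByClass(ResizeSequenceFiles.class);

    // The default Mapper is the identity map; with no reducer, map output
    // is written directly as the final output.
    job.setNumReduceTasks(0);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    // Aim each split (and hence each output file) at ~512 MB.
    FileInputFormat.setMinInputSplitSize(job, 512L * 1024 * 1024);
    FileInputFormat.setMaxInputSplitSize(job, 512L * 1024 * 1024);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

One caveat: input splits never span files, so a map-only identity job raises the block size of each output file but keeps the file count. Actually merging many small files into fewer large ones would need a reduce phase or a combining input format.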