Size of data directory same on all nodes in cluster

2014-03-12 Thread Vimal Jain
Hi, I have set up a 2-node HBase cluster on top of a 2-node HDFS cluster. When I run the du -sh command on the data directory (where Hadoop stores its data) on both machines, it shows the same size. As per my understanding, half of the entire data should be stored on one machine and the other half on the other
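
A likely explanation (an assumption, since the thread is truncated here): HDFS stores dfs.replication copies of every block, and with a replication factor of 2 on a 2-node cluster each node ends up holding a full copy of the data, so du -sh matches on both machines. A minimal sketch of the arithmetic, assuming blocks are evenly balanced and ignoring metadata overhead:

```java
// Sketch: expected on-disk usage per datanode for an evenly balanced HDFS.
// Assumes uniform block distribution; real clusters add some metadata overhead.
public class HdfsUsage {
    static long bytesPerNode(long logicalBytes, int replication, int nodes) {
        // Total physical bytes = logical data * replication factor,
        // spread evenly across all datanodes.
        return logicalBytes * replication / nodes;
    }

    public static void main(String[] args) {
        long tenGb = 10L * 1024 * 1024 * 1024;
        // 2 nodes, replication 2: every node stores the full data set.
        System.out.println(bytesPerNode(tenGb, 2, 2)); // same as tenGb
        // 2 nodes, replication 1: each node stores roughly half.
        System.out.println(bytesPerNode(tenGb, 1, 2));
    }
}
```

With replication 1 (and a balanced cluster) the "half on each machine" intuition would hold; with replication 2 it cannot.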

Re: GC overhead limit exceeded

2014-03-12 Thread haihong lu
Thanks, but even though I added this parameter, it had no effect. On Tue, Mar 11, 2014 at 12:11 PM, unmesha sreeveni unmeshab...@gmail.com wrote: Try to increase the memory for the datanode and see. This needs a Hadoop restart: export HADOOP_DATANODE_OPTS=-Xmx10g This will set the heap to 10gb You

error in hadoop hdfs while building the code.

2014-03-12 Thread Avinash Kujur
Hi, I am getting an error like "RefreshCallQueueProtocol cannot be resolved". It is a Java problem. Help me out. Regards, Avinash

Re: error in hadoop hdfs while building the code.

2014-03-12 Thread unmesha sreeveni
I think it is a Hadoop problem, not Java: https://issues.apache.org/jira/browse/HADOOP-5396 On Wed, Mar 12, 2014 at 11:37 AM, Avinash Kujur avin...@gmail.com wrote: Hi, I am getting an error like "RefreshCallQueueProtocol cannot be resolved". It is a Java problem. Help me out. Regards, Avinash

Re: GC overhead limit exceeded

2014-03-12 Thread divye sheth
Hi Haihong, Please check out the link below; I believe it should solve your problem. http://stackoverflow.com/questions/21005643/container-is-running-beyond-memory-limits Thanks Divye Sheth On Wed, Mar 12, 2014 at 11:33 AM, haihong lu ung3...@gmail.com wrote: Thanks, but even though I added
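
The linked fix comes down to keeping the JVM heap (mapreduce.*.java.opts) comfortably below the YARN container size (mapreduce.*.memory.mb) so the container is neither killed nor starved. A hedged sketch of the sizing rule; the 0.8 fraction is a common rule of thumb, not an official constant:

```java
// Sketch: pick an -Xmx that leaves headroom for non-heap JVM memory
// (stacks, metaspace/permgen, native buffers) inside a YARN container.
public class ContainerHeap {
    static int heapForContainerMb(int containerMb, double heapFraction) {
        // heapFraction is the share of the container given to the Java heap.
        return (int) (containerMb * heapFraction);
    }

    public static void main(String[] args) {
        int containerMb = 4096; // e.g. mapreduce.map.memory.mb = 4096
        // ~80% of the container for the heap is a common rule of thumb,
        // e.g. mapreduce.map.java.opts = -Xmx3276m
        System.out.println("-Xmx" + heapForContainerMb(containerMb, 0.8) + "m");
    }
}
```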

Re: error in hadoop hdfs while building the code.

2014-03-12 Thread Avinash Kujur
+ import org.apache.hadoop.ipc.RefreshCallQueueProtocol;
+ import org.apache.hadoop.ipc.protocolPB.RefreshCallQueueProtocolPB;
+ import org.apache.hadoop.ipc.protocolPB.RefreshCallQueueProtocolClientSideTranslatorPB;
+ private static RefreshCallQueueProtocol +

Process of files in mapreduce

2014-03-12 Thread Ranjini Rathinam
Hi, How do I read a PDF file in MapReduce? Please provide sample code or a sample link for reference. Thanks in advance. Ranjini

Re: Process of files in mapreduce

2014-03-12 Thread Stanley Shi
For reading PDFs in Java, you may refer to this link: http://stackoverflow.com/questions/4784825/how-to-read-pdf-files-using-java In MapReduce you can use the same code, except that each map() function processes one file. Regards, *Stanley Shi* On Wed, Mar 12, 2014 at 4:53 PM, Ranjini

unsubsrcibe

2014-03-12 Thread Mailing-List
unsubsrcibe

Re: Process of files in mapreduce

2014-03-12 Thread Mingjiang Shi
Hi Ranjini, What's your use case? How would you create key-value pairs from the PDF? On Wed, Mar 12, 2014 at 4:53 PM, Ranjini Rathinam ranjinibe...@gmail.com wrote: Hi, How do I read a PDF file in MapReduce? Please provide sample code or a sample link for reference. Thanks in advance.

unsubsrcibe

2014-03-12 Thread Junpeng Wang
unsubsrcibe Junpeng WANG(王俊鹏) Mobile: +86.186.1819.5625 Skype: wangj...@hotmail.com -Original Message- From: Mailing-List [mailto:mailingl...@datenvandalismus.org] Sent: 2014-03-12 18:16 To: user@hadoop.apache.org Subject: unsubsrcibe unsubsrcibe

Re: Process of files in mapreduce

2014-03-12 Thread Ranjini Rathinam
Hi, I need to convert the PDF file into a text file using MapReduce. Thanks for the support. Ranjini On Wed, Mar 12, 2014 at 3:54 PM, Mingjiang Shi m...@gopivotal.com wrote: Hi Ranjini, What's your use case? How would you create key-value pairs from the PDF? On Wed, Mar 12, 2014 at
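
The approach Stanley describes (each map() processes one whole file) could be sketched roughly as below. This is an illustrative sketch only: it assumes Apache PDFBox 1.8 is on the job classpath and a custom whole-file input format that hands each map() call one complete PDF as bytes keyed by filename — neither ships with Hadoop out of the box.

```java
// Sketch only: assumes PDFBox on the classpath and a custom input format
// delivering (filename, whole PDF bytes) pairs to each map() call.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public class PdfToTextMapper extends Mapper<Text, BytesWritable, Text, Text> {
    @Override
    protected void map(Text filename, BytesWritable pdfBytes, Context context)
            throws IOException, InterruptedException {
        // Parse the PDF in memory and emit (filename, extracted text).
        PDDocument doc = PDDocument.load(new ByteArrayInputStream(pdfBytes.copyBytes()));
        try {
            String text = new PDFTextStripper().getText(doc);
            context.write(filename, new Text(text));
        } finally {
            doc.close();
        }
    }
}
```

A job using a mapper like this, with zero reducers and TextOutputFormat, would write the extracted text back to HDFS.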

RE: In hadoop All racks belongs to same subnet !

2014-03-12 Thread German Florez-Larrahondo
If you want to learn more about the networking on small to mid-size production sites, I recommend you do a search for “Hadoop reference architectures”. As an example, a good source of information is here: http://en.community.dell.com/techcenter/extras/m/white_papers/20353010.aspx (I’m not

Re: unsubsrcibe

2014-03-12 Thread sawan gupta
unsubsrcibe ME FROM LIST On Wednesday, 12 March 2014 5:28 PM, Junpeng Wang junpeng.w...@pactera.com wrote: unsubsrcibe Junpeng WANG(王俊鹏) Mobile: +86.186.1819.5625 Skype: wangj...@hotmail.com -Original Message- From: Mailing-List [mailto:mailingl...@datenvandalismus.org] Sent: 2014-03-12

Re: unsubsrcibe

2014-03-12 Thread Devin Suiter RDX
You are: 1) Not unsubscribing correctly. From the welcome email you get when you subscribed - 'To remove your address from the list, send a message to: user-unsubscr...@hadoop.apache.org' 2) Spelling 'unsubscribe' incorrectly. When you send the 'unsubscribe' request to

Use Cases for Structured Data

2014-03-12 Thread ados1...@gmail.com
Hello Team, I am starting off on the Hadoop ecosystem and wanted to learn first, based on my use case, whether Hadoop is the right tool for me. I have only structured data, and my goal is to save this data into Hadoop and take advantage of the replication factor. I am using Microsoft tools for doing analysis and it

Re: Use Cases for Structured Data

2014-03-12 Thread Shahab Yunus
I would suggest that, given the level of detail that you are looking for and the fundamental nature of your questions, you should get hold of books or online documentation. Basically some reading/research. The latest edition of http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520 is

Creating a new file in HDFS from a URL

2014-03-12 Thread Shayan Pooya
Hello, I'd like to create a new file in HDFS from a remote URL. Is there any way to do that with WebHDFS? So instead of using the following command curl -i -X PUT -T FILE 'data_node_url' I'd like to use a URL instead of the FILE. The input will be in another distributed filesystem (Disco's DFS)
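
WebHDFS itself has no "fetch from a URL" operation, so a common workaround (a sketch, not an official API) is to stream the remote source yourself and PUT the bytes to the datanode, avoiding a local copy. The step-1 CREATE request goes to the namenode, which answers with a 307 redirect to a datanode; the URL for that first step can be built like this (hostname and path below are examples):

```java
// Sketch: build the step-1 WebHDFS CREATE URL. The namenode replies with a
// 307 redirect to a datanode; the file bytes are then PUT to that redirect URL.
public class WebHdfsCreateUrl {
    static String createUrl(String namenodeHost, int port, String path) {
        return "http://" + namenodeHost + ":" + port
                + "/webhdfs/v1" + path + "?op=CREATE";
    }

    public static void main(String[] args) {
        System.out.println(createUrl("namenode.example.com", 50070, "/user/shayan/data.bin"));
        // To skip the local FILE, pipe the download straight into the PUT, e.g.:
        //   curl -sL http://source.example.com/file | curl -i -X PUT -T - '<redirect_url>'
        // (curl -T - reads the upload body from stdin.)
    }
}
```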

Re: Use Cases for Structured Data

2014-03-12 Thread ados1...@gmail.com
Thank you, Shahab, but it would be really nice if I could get some input on my initial question, as it would really help. On Wed, Mar 12, 2014 at 3:11 PM, Shahab Yunus shahab.yu...@gmail.com wrote: I would suggest that, given the level of detail that you are looking for and the fundamental nature of

Re: Use Cases for Structured Data

2014-03-12 Thread Shahab Yunus
Assuming that the following is your initial question: *My question here is: what benefits does the YARN architecture give me in terms of analysis that my Microsoft, Netezza, or Tableau products are not giving me? I am just trying to understand the value of introducing Hadoop in my architecture in terms of

Re: Use Cases for Structured Data

2014-03-12 Thread Dieter De Witte
Hi, 1) HDFS is just a file system; it hides the fact that it is distributed. 2) MapReduce is the most low-level analytics tool, I think: you just specify an input and, in your map and reduce functions, define some functionality to deal with that input. No need for HBase, ... although they can be
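
The map/reduce model Dieter describes can be illustrated without any Hadoop at all. In this plain-Java sketch the "map" phase emits a word for each token in each input line, and the "reduce" phase sums occurrences per key, like a summing reducer in word count (purely illustrative, not Hadoop API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MiniWordCount {
    // "map": split each input line into words (the emitted keys);
    // "reduce": count occurrences per key, like a summing reducer.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = wordCount(Arrays.asList("hadoop is", "hadoop"));
        System.out.println(counts.get("hadoop")); // 2
    }
}
```

Hadoop adds the distribution: the input is split across machines, map runs near the data, and the framework groups keys before reduce.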

Re: Use Cases for Structured Data

2014-03-12 Thread ados1...@gmail.com
Thanks D, that certainly answers my question. I was just taking a quick look at Hortonworks HDP vs. the Hortonworks Sandbox; do you know of any benefits of using the Sandbox as opposed to the Hortonworks Data Platform? On Wed, Mar 12, 2014 at 4:02 PM, Dieter De Witte drdwi...@gmail.com wrote: Hi, 1) HDFS

Re: Use Cases for Structured Data

2014-03-12 Thread ados1...@gmail.com
Hey D, Regarding your point 5: For a proof of concept I would use a ready-made virtual machine from one of the 3 big vendors - Cloudera, MapR, and Hortonworks. I want to understand how this virtual setup would work, how many master and slave nodes I can have in this virtual setup, and in general

Re: GC overhead limit exceeded

2014-03-12 Thread haihong lu
Thanks a lot, the answer is helpful. On Wed, Mar 12, 2014 at 2:20 PM, divye sheth divs.sh...@gmail.com wrote: Hi Haihong, Please check out the link below, I believe it should solve your problem. http://stackoverflow.com/questions/21005643/container-is-running-beyond-memory-limits Thanks