Re: Processing small xml files

2012-02-17 Thread Srinivas Surasani
Hi Mohit, You can use Pig for processing XML files. PiggyBank has a built-in load function for loading XML files. You can also set pig.maxCombinedSplitSize and pig.splitCombination for efficient processing. On Sat, Feb 18, 2012 at 1:18 AM, Mohit Anchlia wrote: > On Tue, Feb 14, 2012 at 10:56
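The idea behind combining small inputs into larger splits can be sketched roughly as follows. This is only an illustration of the general technique, not Pig's actual implementation: the function name `combine_splits`, the greedy packing strategy, and the file sizes are all assumptions made for the example.

```python
from typing import List


def combine_splits(file_sizes: List[int], max_combined_size: int) -> List[List[int]]:
    """Greedily pack consecutive file sizes into groups.

    A group is closed as soon as adding the next file would push it past
    max_combined_size. A single file larger than the limit still gets its
    own group. (Hypothetical helper, not part of Pig.)
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in file_sizes:
        # Close the current group if this file would overflow it.
        if current and current_size + size > max_combined_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical small-file sizes (in KB) and a hypothetical 100 KB limit.
small_files = [10, 20, 30, 40, 50, 60]
combined = combine_splits(small_files, max_combined_size=100)
print(len(small_files), "files ->", len(combined), "combined splits")
```

With many small XML files, fewer and larger splits generally mean fewer map tasks, which is presumably the efficiency gain the two properties are meant to provide.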

Re: Processing small xml files

2012-02-17 Thread Mohit Anchlia
On Tue, Feb 14, 2012 at 10:56 AM, W.P. McNeill wrote: > I'm not sure what you mean by "flat format" here. > > In my scenario, I have a file input.xml that looks like this. > > > > 1 > > > 2 > > > > input.xml is a plain text file. Not a sequence file. If I read it with the

Re: Addendum to Hypertable vs. HBase Performance Test (w/ mslab enabled)

2012-02-17 Thread Edward Capriolo
As your numbers show.
Dataset Size   Hypertable Queries/s   HBase Queries/s   Hypertable Latency (ms)   HBase Latency (ms)
0.5 TB         3256.42                2969.52           157.221                   172.351
5 TB           2450.01                2066.52           208.972                   247.680
Raw data goes up. Read performance g
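As a quick worked check of the figures quoted in this message, the relative change between the 0.5 TB and 5 TB runs can be computed directly. The numbers are copied from the message; the dictionary layout and the helper name `percent_change` are assumptions made for this sketch.

```python
# Figures quoted in the message: (queries/s, latency in ms) per dataset size.
results = {
    "Hypertable": {"0.5 TB": (3256.42, 157.221), "5 TB": (2450.01, 208.972)},
    "HBase": {"0.5 TB": (2969.52, 172.351), "5 TB": (2066.52, 247.680)},
}


def percent_change(before: float, after: float) -> float:
    """Signed percentage change from before to after."""
    return (after - before) / before * 100.0


throughput_change = {}
latency_change = {}
for system, runs in results.items():
    small_qps, small_lat = runs["0.5 TB"]
    large_qps, large_lat = runs["5 TB"]
    throughput_change[system] = percent_change(small_qps, large_qps)
    latency_change[system] = percent_change(small_lat, large_lat)
    print(
        f"{system}: queries/s {throughput_change[system]:+.1f}%, "
        f"latency {latency_change[system]:+.1f}%"
    )
```

Both systems lose read throughput and gain latency as the dataset grows tenfold, which is consistent with the point being made about reads as raw data size goes up.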

Re: Addendum to Hypertable vs. HBase Performance Test (w/ mslab enabled)

2012-02-17 Thread Doug Judd
Hi Edward, The problem is that even if the workload is 5% write and 95% read, if you can't load the data, you need more machines. In the 167 billion insert test, HBase failed with *Concurrent mode failure* after 20% of the data was loaded. One of our customers has loaded 1/2 trillion records of

Re: Addendum to Hypertable vs. HBase Performance Test (w/ mslab enabled)

2012-02-17 Thread Edward Capriolo
I would almost agree with that perspective. But there is a problem with the 'java is slow' theory. The reason is that in a 100 percent write workload, GC might be a factor. But in the real world people have to read data, and reads become disk bound as your data gets larger than memory. Unless C++ can make y

Re: location of HDFS directory on my Local system

2012-02-17 Thread Harsh J
To add onto Rohit's great response on HDFS interaction, and to clear up your confusion here, HDFS does not exactly make itself available as a physical filesystem like, say, ext3/4 mount points. The usual way of interacting with it is by communicating with the services you run, via RPC calls (over a ne

Re: location of HDFS directory on my Local system

2012-02-17 Thread Rohit
Hi Sujit, I'd recommend you look at this tutorial on interacting with HDFS: http://developer.yahoo.com/hadoop/tutorial/module2.html#interacting When you create a folder '/foodir' in HDFS, that folder will be in the top level of HDFS. If you were to create a directory 'foodir' (without the '/'),
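The difference between the two forms of the path can be sketched as below. This assumes the usual HDFS convention that a relative path is resolved against the user's home directory, `/user/<username>`; the user name `sujit` and the helper `resolve_hdfs_path` are hypothetical and only mimic the resolution rule, they do not talk to a cluster.

```python
import posixpath


def resolve_hdfs_path(path: str, username: str) -> str:
    """Mimic how a path would resolve, assuming the home directory is
    /user/<username>. Absolute paths are returned unchanged.
    (Hypothetical helper, for illustration only.)
    """
    home = posixpath.join("/user", username)
    # posixpath.join discards `home` when `path` is already absolute.
    return posixpath.normpath(posixpath.join(home, path))


absolute = resolve_hdfs_path("/foodir", "sujit")
relative = resolve_hdfs_path("foodir", "sujit")
print(absolute)
print(relative)
```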

Re: Hadoop Example in java

2012-02-17 Thread Owen O'Malley
On Fri, Feb 17, 2012 at 1:00 AM, vikas jain wrote: > > Hi All, > > I am looking for examples in Java for Hadoop. I have done lots of searching but > I have only found word count. Are there any other examples for the same? If you want to find them on the web, you can look in subversion: http://svn.ap

What determines task attempt list URLs?

2012-02-17 Thread Keith Wiley
What property or setup parameter determines the URLs displayed on the task attempts webpage of the job/task trackers? My cluster seems to be configured such that all URLs for higher pages (the top cluster admin page, the individual job overview page, and the map/reduce task list page) show URLs

Re: Building Hadoop UI

2012-02-17 Thread Fabio Pitzolu
Thanks Neil, I'll get some documentation for that. Can we maybe get in touch so you can explain that integration to me in a little more detail? Fabio 2012/2/17 > We in our company have integrated Liferay which is an open source java > portal with hbase using its java api by creating an extension to its >

Re: Building Hadoop UI

2012-02-17 Thread neil . harwani
We in our company have integrated Liferay which is an open source java portal with hbase using its java api by creating an extension to its document library portlet to show content. We are indexing content separately on a solr index server where you can search and that content (image, docs, etc.

Re: Building Hadoop UI

2012-02-17 Thread fabio . pitzolu
Thanks Peter, I'll give Spring a try, especially because there is also a .NET version of this framework, which is basically what my company is looking for. Fabio Pitzolu fabio.pitz...@gmail.com On 17 Feb 2012, at 15:31, Jamack, Peter wrote: > You could use something like Spring, b

Re: Building Hadoop UI

2012-02-17 Thread Jamack, Peter
You could use something like Spring, but you'll need to figure ways to connect and integrate and it'll be a homegrown solution. Peter Jamack On 2/17/12 5:52 AM, "fabio.pitz...@gmail.com" wrote: >Hello everyone, > >in order to provide our clients a custom UI for their MapReduce jobs and >HDFS fi

Building Hadoop UI

2012-02-17 Thread fabio . pitzolu
Hello everyone, in order to provide our clients a custom UI for their MapReduce jobs and HDFS files, what is the best solution to create a web-based UI for Hadoop? We are not going to use Cloudera HUE; we need something more user-friendly and tailored to our clients' needs. Thanks, Fabio Pitzolu

Re: Hadoop Example in java

2012-02-17 Thread Harsh J
For more framework-provided examples, also take a look at your downloaded distribution's src/examples directory. I also suggest getting "Hadoop: The Definitive Guide" by Tom White (O'Reilly) to get started; it carries examples and all other information useful for using/deploying/developing wi

Hadoop Example in java

2012-02-17 Thread vikas jain
Hi All, I am looking for examples in Java for Hadoop. I have done lots of searching but I have only found word count. Are there any other examples for the same? -- View this message in context: http://old.nabble.com/Hadoop-Example-in-java-tp33341353p33341353.html Sent from the Hadoop core-user mai

Fwd: when run mrbench on pseudo distributed failed.....

2012-02-17 Thread Rock Ice
Hi guys, execute command : bin/hadoop jar build/hadoop-0.20.3-dev-test.jar mrbench -numRuns 5 error message: [hadoop@localhost hadoop]$ bin/hadoop jar build/hadoop-0.20.3-dev-test.jar mrbench -numRuns 5 MRBenchmark.0.0.2 12/02/17 16:45:11 INFO mapred.MRBench: creating control file: 1 numLines, A

when run mrbench on pseudo distributed failed.....

2012-02-17 Thread Rock Ice
Hi guys, execute command : bin/hadoop jar build/hadoop-0.20.3-dev-test.jar mrbench -numRuns 5 error message: [hadoop@localhost hadoop]$ bin/hadoop jar build/hadoop-0.20.3-dev-test.jar mrbench -numRuns 5 MRBenchmark.0.0.2 12/02/17 16:45:11 INFO mapred.MRBench: creating control file: 1 numLines, A