Re: Is it wrong to bypass HDFS?

2014-11-09 Thread Steve Lewis
You should consider writing a custom InputFormat which reads directly from the database. While FileInputFormat is the most common InputFormat implementation, neither the InputFormat contract nor its critical method getSplits requires HDFS. A custom version can return database entries as
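
A minimal sketch of that idea (not code from this thread; the class names, row count, and split size below are made up): getSplits returns primary-key ranges of a database table instead of HDFS blocks, and each range becomes one map task. Hadoop's stock DBInputFormat already covers the plain JDBC case.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class DatabaseRowInputFormat extends InputFormat<LongWritable, Text> {

    // A split is just a range of primary keys - nothing about HDFS here.
    public static class RowRangeSplit extends InputSplit implements Writable {
        private long start;
        private long end;

        public RowRangeSplit() { }                         // needed for deserialization
        public RowRangeSplit(long start, long end) { this.start = start; this.end = end; }

        @Override public long getLength() { return end - start; }
        @Override public String[] getLocations() { return new String[0]; }   // no data locality
        @Override public void write(DataOutput out) throws IOException { out.writeLong(start); out.writeLong(end); }
        @Override public void readFields(DataInput in) throws IOException { start = in.readLong(); end = in.readLong(); }
    }

    @Override
    public List<InputSplit> getSplits(JobContext context) {
        // Divide the table into fixed-size key ranges; one mapper per range.
        long totalRows = 1_000_000L;     // in practice, query the database for the row count
        long rowsPerSplit = 100_000L;
        List<InputSplit> splits = new ArrayList<>();
        for (long start = 0; start < totalRows; start += rowsPerSplit) {
            splits.add(new RowRangeSplit(start, Math.min(start + rowsPerSplit, totalRows)));
        }
        return splits;
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) {
        // The reader would open a JDBC connection and iterate the rows in [start, end);
        // omitted here to keep the sketch short.
        throw new UnsupportedOperationException("RecordReader sketch omitted");
    }
}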

Re: Need some help with RecordReader

2014-10-28 Thread Steve Lewis
This InputFormat reads a Fasta file (see below). The format is a line starting with '>' plus N lines of data. The projects in https://code.google.com/p/distributed-tools/ have other samples of more complex input formats. >YDR356W SPC110 SGDID:S02764, Chr IV from 1186099-1188933, Verified ORF, "Inner plaq
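
A hedged sketch of such a reader, not the code from the linked project: splitting is disabled so each Fasta file stays in one mapper, the '>' header line becomes the key, and the following data lines are concatenated into the value.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class FastaInputFormat extends FileInputFormat<Text, Text> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;    // keep each Fasta file in a single mapper so records never straddle splits
    }

    @Override
    public RecordReader<Text, Text> createRecordReader(InputSplit split, TaskAttemptContext ctx) {
        return new FastaRecordReader();
    }

    public static class FastaRecordReader extends RecordReader<Text, Text> {
        private BufferedReader reader;
        private String pendingHeader;      // the '>' line that starts the next record
        private final Text key = new Text();
        private final Text value = new Text();

        @Override
        public void initialize(InputSplit split, TaskAttemptContext ctx) throws IOException {
            Path path = ((FileSplit) split).getPath();
            FileSystem fs = path.getFileSystem(ctx.getConfiguration());
            FSDataInputStream in = fs.open(path);
            reader = new BufferedReader(new InputStreamReader(in));
            pendingHeader = reader.readLine();             // first '>' line, or null for an empty file
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (pendingHeader == null) return false;
            key.set(pendingHeader);
            pendingHeader = null;
            StringBuilder sequence = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) { pendingHeader = line; break; }
                sequence.append(line);
            }
            value.set(sequence.toString());
            return true;
        }

        @Override public Text getCurrentKey() { return key; }
        @Override public Text getCurrentValue() { return value; }
        @Override public float getProgress() { return pendingHeader == null ? 1.0f : 0.0f; }
        @Override public void close() throws IOException { if (reader != null) reader.close(); }
    }
}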

Re: What configuration parameters cause a Hadoop 2.x job to run on the cluster

2014-04-25 Thread Steve Lewis
es). So you can copy the > dependencies to all hadoop nodes classpath (e.g., shared dir) > > Oleg > > > On Fri, Apr 25, 2014 at 1:02 PM, Steve Lewis wrote: > >> so if I create a Hadoop jar file with referenced libraries in the lib >> directory do I need to move it to

Re: What configuration parameters cause a Hadoop 2.x job to run on the cluster

2014-04-25 Thread Steve Lewis
kou...@gmail.com> wrote: > Yes, if you are running MR > > > On Fri, Apr 25, 2014 at 12:48 PM, Steve Lewis wrote: > >> Thank you for your answer >> >> 1) I am using YARN >> 2) So presumably dropping core-site.xml, yarn-site into user.dir works >> do

Re: What configuration parameters cause a Hadoop 2.x job to run on the cluster

2014-04-25 Thread Steve Lewis
om the actual cluster to the > application classpath and then you can run it straight from IDE. > > Not a windows user so not sure about that second part of the question. > > Cheers > Oleg > > > On Fri, Apr 25, 2014 at 11:46 AM, Steve Lewis wrote: > >> Assume I

What configuration parameters cause a Hadoop 2.x job to run on the cluster

2014-04-25 Thread Steve Lewis
Assume I have a machine on the same network as a hadoop 2 cluster but separate from it. My understanding is that by setting certain elements of the config file or local xml files to point to the cluster I can launch a job without having to log into the cluster, move my jar to hdfs and start the jo
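
For reference, a minimal client-side sketch of the properties usually involved on Hadoop 2.x with YARN; the hostnames and ports below are placeholders for an assumed cluster:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the cluster instead of the local runner.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.hostname", "resourcemanager.example.com");
        // Helpful when the submitting client is a Windows box and the cluster is Linux.
        conf.set("mapreduce.app-submission.cross-platform", "true");

        Job job = Job.getInstance(conf, "remote-example");
        job.setJarByClass(RemoteSubmit.class);   // the job jar is shipped to the cluster for you
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}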

Re: hadoop version

2014-03-31 Thread Steve Lewis
How about programmatically with my code? On Mon, Mar 31, 2014 at 9:09 AM, Zhijie Shen wrote: > Run "hadoop version" > > > On Mon, Mar 31, 2014 at 2:22 AM, Avinash Kujur wrote: > >> hi, >> >> how can i know my hadoop version which i have build in my system (apart >> from the version which wa
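
For the programmatic case, Hadoop ships a VersionInfo utility class; a short example:

import org.apache.hadoop.util.VersionInfo;

public class PrintHadoopVersion {
    public static void main(String[] args) {
        System.out.println("Hadoop version: " + VersionInfo.getVersion());
        System.out.println("Built from revision: " + VersionInfo.getRevision());
    }
}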

Does anyone have a downloadable version of the buildable files to run under windows

2014-03-14 Thread Steve Lewis
To run Hadoop 2.0 on Windows you need to build winutils.exe and hadoop.dll. I am having problems building these, and given that virtually ALL Windows work is on 64-bit Windows, I see little reason why users cannot download these - does anyone have these built and in a spot where they can be downloaded?

How do I programmatically run a Hadoop 2.0 job from a Hadoop Client outside the cluster

2014-03-10 Thread Steve Lewis
Under Hadoop 0.2 I was able to run a Hadoop job from an external machine (say a Windows box with Cygwin) running on the same network as the cluster by setting "fs.default.name" in my Java code on the client machine and little else in the config file. With 2.0 I want to do something similar launching a

Re: 答复: Application of MapReduce

2013-12-26 Thread Steve Lewis
Hydra is an application for doing tandem mass spectrometry searches for proteomics. The search uses three MapReduce jobs run in succession (the last simply uses a single reducer to create several output files). http://www.biomedcentral.com/1471-2105/13/324/ Code at https://code.google.com/p/hydr

Re: Execute hadoop job remotely and programmatically

2013-12-09 Thread Steve Lewis
Put them in a lib directory in the jar you pass to Hadoop and they will be found. On Mon, Dec 9, 2013 at 12:58 PM, Yexi Jiang wrote: > Hi, All, > > I am working on a project that requires to execute a hadoop job remotely > and the job requires some third-part libraries (jar files). > > Based on

Re: How to process only input files containing 100% valid rows

2013-04-18 Thread Steve Lewis
With files that small it is much better to write a custom InputFormat which checks the entire file and only passes records from good files. If you need Hadoop you are probably processing a large number of these files, and an InputFormat could easily read the entire file and handle it if it as as s

Re: Querying a Prolog Server from a JVM during a MapReduce Job

2013-04-16 Thread Steve Lewis
Assuming that the server can handle high volume and multiple queries, there is no reason not to run it on a large and powerful machine outside the cluster. Nothing prevents your mappers from accessing a server or even, depending on the design, a custom InputFormat from pulling data from the server.
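
A small hedged sketch of the mapper side of that arrangement; the configuration key and query URL layout are invented for illustration, and the server is assumed to speak HTTP:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class QueryServerMapper extends Mapper<LongWritable, Text, Text, Text> {
    private String serverBase;

    @Override
    protected void setup(Context context) {
        // e.g. conf.set("query.server.url", "http://prolog-host.example.com:8080/query")
        serverBase = context.getConfiguration().get("query.server.url");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Query the external server once per input record.
        URL url = new URL(serverBase + "?q=" + URLEncoder.encode(value.toString(), "UTF-8"));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String answer = in.readLine();     // take the first line of the reply
            context.write(value, new Text(answer == null ? "" : answer));
        } finally {
            conn.disconnect();
        }
    }
}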

Re: Java jars and MapReduce

2013-03-01 Thread Steve Lewis
A few basic questions - 1) Is the rate-limiting step the Java processing or storage in Accumulo? Hadoop may not be able to speed up a database which is not designed to work in a distributed manner. 2) Can ObjectD or any intermediate objects be serialized, possibly to XML, and efficiently deseriali

Re: Using Hadoop infrastructure with input streams instead of key/value input

2012-12-03 Thread Steve Lewis
I presume a single file is handled by one and only one mapper. In that case you can pass the path as a string and do something like this: public void map(Object key, Text value, Context context) throws IOException, InterruptedException { String hdfspath = value.toString();
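
A fuller sketch along those lines; processStream below is a placeholder for whatever stream-oriented code you want to reuse, and each input record is assumed to hold one HDFS path:

import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StreamPerFileMapper extends Mapper<Object, Text, Text, Text> {

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String hdfsPath = value.toString();                  // each input record is one file path
        Path path = new Path(hdfsPath);
        FileSystem fs = path.getFileSystem(context.getConfiguration());
        try (InputStream in = fs.open(path)) {
            String result = processStream(in);               // hand the raw stream to existing code
            context.write(new Text(hdfsPath), new Text(result));
        }
    }

    private String processStream(InputStream in) throws IOException {
        // Stand-in for the real stream-based processing.
        int bytes = 0;
        while (in.read() != -1) bytes++;
        return "read " + bytes + " bytes";
    }
}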