You should consider writing a custom InputFormat which reads directly from
the database. While FileInputFormat is the most common base class for an
InputFormat, nothing in the InputFormat contract, or in its critical method
getSplits, requires HDFS. A custom version can return database entries as
input records.
This InputFormat reads a Fasta file (see the sample record below and the
sketch that follows it). The format is a header line starting with '>'
plus N lines of data.
The projects in
https://code.google.com/p/distributed-tools/
have other samples of more complex input formats.
>YDR356W SPC110 SGDID:S02764, Chr IV from 1186099-1188933, Verified
ORF, "Inner plaq
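
For illustration, here is a minimal sketch of such a reader, assuming
records are delimited by '>' header lines; the class and field names are
mine, not from any released project. (For the database case, Hadoop also
ships org.apache.hadoop.mapreduce.lib.db.DBInputFormat.)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class FastaRecordReader extends RecordReader<Text, Text> {
    private LineReader in;
    private final Text key = new Text();    // the '>' header line
    private final Text value = new Text();  // the concatenated data lines
    private final Text line = new Text();
    private boolean headerPending;          // header read ahead of its record

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException {
        Path path = ((FileSplit) split).getPath();
        Configuration conf = context.getConfiguration();
        FSDataInputStream stream = path.getFileSystem(conf).open(path);
        in = new LineReader(stream, conf);
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (!headerPending) {               // scan forward to the next header
            while (in.readLine(line) > 0) {
                if (line.toString().startsWith(">")) { headerPending = true; break; }
            }
            if (!headerPending) return false;   // end of file
        }
        key.set(line);
        headerPending = false;
        StringBuilder data = new StringBuilder();
        while (in.readLine(line) > 0) {         // gather the N data lines
            if (line.toString().startsWith(">")) { headerPending = true; break; }
            data.append(line.toString()).append('\n');
        }
        value.set(data.toString());
        return true;
    }

    @Override public Text getCurrentKey() { return key; }
    @Override public Text getCurrentValue() { return value; }
    @Override public float getProgress() { return 0f; }   // omitted for brevity
    @Override public void close() throws IOException { if (in != null) in.close(); }
}

The matching InputFormat would extend FileInputFormat<Text, Text>, return
this reader from createRecordReader, and override isSplitable to return
false so a record never straddles a split boundary.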
> …). So you can copy the
> dependencies to all Hadoop nodes' classpath (e.g., a shared dir).
>
> Oleg
>
>
> On Fri, Apr 25, 2014 at 1:02 PM, Steve Lewis wrote:
>
>> So if I create a Hadoop jar file with referenced libraries in the lib
>> directory, do I need to move it to…
…<kou...@gmail.com> wrote:
> Yes, if you are running MR
>
>
> On Fri, Apr 25, 2014 at 12:48 PM, Steve Lewis wrote:
>
>> Thank you for your answer
>>
>> 1) I am using YARN
>> 2) So presumably dropping core-site.xml and yarn-site.xml into user.dir
>> works…
> …from the actual cluster to the
> application classpath, and then you can run it straight from the IDE.
>
> Not a Windows user, so not sure about that second part of the question.
>
> Cheers
> Oleg
>
>
> On Fri, Apr 25, 2014 at 11:46 AM, Steve Lewis wrote:
>
Assume I have a machine on the same network as a Hadoop 2 cluster but
separate from it.
My understanding is that by setting certain elements of the config file, or
local xml files, to point to the cluster, I can launch a job without having
to log into the cluster, move my jar to HDFS, and start the job by hand.
How about doing it programmatically from my code?
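
A minimal sketch of what that client-side setup can look like, assuming the
cluster's core-site.xml / yarn-site.xml are on the client classpath or the
key properties are set by hand (the host names and ports below are
placeholders, not real cluster addresses):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Either ship the cluster's config files on the classpath, or
        // set the key properties directly (placeholder hosts/ports):
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.address", "rmhost:8032");

        Job job = Job.getInstance(conf, "remote-submit-test");
        job.setJarByClass(RemoteSubmit.class); // jar is staged for you
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Submitted this way, the client copies the jar to the cluster's staging
area itself, which is exactly the "without logging into the cluster"
behavior asked about.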
On Mon, Mar 31, 2014 at 9:09 AM, Zhijie Shen wrote:
> Run "hadoop version"
>
>
> On Mon, Mar 31, 2014 at 2:22 AM, Avinash Kujur wrote:
>
>> hi,
>>
>> how can I know the Hadoop version which I have built on my system (apart
>> from the version which was…
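
To do the same check programmatically, the version compiled into the jars
on the classpath is exposed through the VersionInfo utility class:

import org.apache.hadoop.util.VersionInfo;

public class PrintVersion {
    public static void main(String[] args) {
        // Reports the version of the Hadoop jars actually on the
        // classpath, which is what a locally built Hadoop would show.
        System.out.println(VersionInfo.getVersion());
        System.out.println(VersionInfo.getBuildVersion()); // adds build details
    }
}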
To run Hadoop 2.0 on Windows you need to build winutils.exe and hadoop.dll.
I am having problems building these, and given that virtually ALL Windows
work is on 64-bit Windows, I see little reason why users cannot download
them. Does anyone have these built and in a spot where they can be
downloaded?
Under Hadoop 0.2 I was able to run a Hadoop job from an external machine
(say a Windows box with Cygwin) on the same network as the cluster by
setting "fs.default.name" in my Java code on the client machine,
and little else in the config file.
With 2.0 I want to do something similar, launching a job…
Hydra is an application for doing tandem mass spectrometry searches for
proteomics.
The search uses three map-reduce jobs run in succession (the last simply
uses a single reducer to create several output files).
http://www.biomedcentral.com/1471-2105/13/324/
Code at
https://code.google.com/p/hydr
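
Chaining the jobs is plain sequential submission; a sketch with placeholder
Job variables (not Hydra's actual driver code):

// job1, job2, job3 are org.apache.hadoop.mapreduce.Job instances wired
// so each job reads the previous job's output directory.
if (!job1.waitForCompletion(true)) System.exit(1);
if (!job2.waitForCompletion(true)) System.exit(1);
System.exit(job3.waitForCompletion(true) ? 0 : 1);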
Put them in a lib directory in the jar you pass to Hadoop and they will be
found.
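
That is, the job jar carries its dependencies internally, along these lines
(the names here are illustrative):

myjob.jar
    com/example/MyDriver.class
    com/example/MyMapper.class
    lib/third-party-a.jar
    lib/third-party-b.jar

Hadoop unpacks the job jar on each task node and adds the jars under lib/
to the task classpath.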
On Mon, Dec 9, 2013 at 12:58 PM, Yexi Jiang wrote:
> Hi, All,
>
> I am working on a project that requires executing a Hadoop job remotely,
> and the job requires some third-party libraries (jar files).
>
> Based on…
With files that small it is much better to write a custom input format
which checks the entire file and only passes records from good files. If
you need Hadoop, you are probably processing a large number of these files,
and an input format can easily read an entire file and handle it when it is
as small as that.
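
A sketch of the relevant hook, assuming a FileInputFormat subclass (the
class name is illustrative; createRecordReader, which would do the
validation, is left abstract here):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public abstract class WholeFileValidatingInputFormat
        extends FileInputFormat<Text, Text> {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Never split: one record reader sees one whole file, so it can
        // validate the file before emitting any records from it.
        return false;
    }
}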
Assuming that the server can handle high volume and multiple queries, there
is no reason not to run it on a large and powerful machine outside the
cluster. Nothing prevents your mappers from accessing a server or even,
depending on the design, a custom InputFormat from pulling data from the
server.
A few basic questions:
1) Is the rate-limiting step the Java processing or storage in Accumulo?
Hadoop may not be able to speed up a database which is not designed to work
in a distributed manner.
2) Can ObjectD or any intermediate objects be serialized, possibly to XML,
and efficiently deserialized?
I presume a single file is handled by one and only one mapper. In that case
you can pass the path as a string and do something like this:

public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    // the record's value carries the HDFS path of the file to process
    String hdfsPath = value.toString();
    // ... open and process the file at hdfsPath ...
}
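
A sketch of what typically follows, assuming the usual org.apache.hadoop.fs
imports (FileSystem, FSDataInputStream, Path) are in place:

// inside map(), after extracting hdfsPath from the record value
FileSystem fs = FileSystem.get(context.getConfiguration());
try (FSDataInputStream in = fs.open(new Path(hdfsPath))) {
    // read and process the file's contents here ...
}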