The colon is a reserved character in a URI according to RFC 3986[1].
You should be able to percent-encode those colons as %3A.
[1] http://tools.ietf.org/html/rfc3986
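A minimal sketch of that encoding, using the filename from this thread (plain string replacement; Hadoop's Path/URI parsing is what actually rejects the raw colons):

```java
// Minimal sketch: percent-encoding the colons in a filename so that
// a URI parser no longer mistakes the text before ':' for a scheme.
public class EncodeColons {
    public static void main(String[] args) {
        String name = "hjob.2012:12:26:11.0.dat";
        // ':' is reserved per RFC 3986; %3A is its percent-encoded form.
        String encoded = name.replace(":", "%3A");
        System.out.println(encoded);
    }
}
```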
On Wed, Dec 26, 2012 at 1:00 PM, Mohit Anchlia wrote:
> It looks like hadoop fs -put command doesn't like ":" in the file names.
It looks like the hadoop fs -put command doesn't like ":" in file names. Is
there a way I can escape it?
hadoop fs -put /home/mapr/p/hjob.2012:12:26:11.0.dat
/user/apuser/temp-qdc/scratch/merge_jobs
put: java.net.URISyntaxException: Relative path in absolute URI:
hjob.2012:12:26:11.0.dat
Hi Harsh,
Fixed it. I was putting the -Dmapred.map.tasks=20 after specifying the
input directory. I completely forgot about this trick of
genericOptionParser of Hadoop. Thanks a lot. :)
On Wed, Dec 26, 2012 at 10:33 AM, Harsh J wrote:
> The MR1 teragen's mappers # depends on the total number of
The number of mappers in MR1's teragen depends on the total number of
rows and the requested number of maps.
How are you passing -Dmapred.map.tasks=20 (no spaces) exactly? All
generic options must come before any tool-specific options, so it
should appear right after the word "teragen" in your command.
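A simplified sketch of why the ordering matters (this is not Hadoop's actual GenericOptionsParser, just an illustration of front-consuming option parsing that stops at the first non-generic token):

```java
// Sketch: generic options are consumed from the front of the arg
// list; parsing stops at the first token that is not a recognized
// generic option, so a -D placed after tool arguments is never seen.
import java.util.LinkedHashMap;
import java.util.Map;

public class GenericOptionsSketch {
    static Map<String, String> parseGenerics(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (String arg : args) {
            if (arg.startsWith("-D")) {
                String[] kv = arg.substring(2).split("=", 2);
                if (kv.length == 2) conf.put(kv[0], kv[1]);
            } else {
                break; // first non-generic token ends generic parsing
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        // Correct: -D right after the tool name, before other args.
        System.out.println(parseGenerics(
                new String[]{"-Dmapred.map.tasks=20", "in", "out"}));
        // Wrong: -D after the tool arguments is ignored.
        System.out.println(parseGenerics(
                new String[]{"in", "out", "-Dmapred.map.tasks=20"}));
    }
}
```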
On Wed, Dec 26, 20
Hi Jamal,
A missing semicolon should get flagged by the Java compiler, but one way to
keep your debug cycles short is to (1) use local mode and (2) use small data
sets that you can run through in under a minute. Once you are happy that your
code works, move to distributed mode and your target data sets.
HTH
I actually have this exact same error. After running my namenode for
a while (with an SNN), it gets to a point where the SNN starts crashing, and
if I try to restart the NN I will get this problem. I typically wind up
having to go with a much older copy of the image and edits files in order
to get i
For this I need to know where an input split is located, and where a join
is computed. How can I do this programmatically?
How much disk space is needed for hadoop.tmp.dir?
Thanks
2012/12/26 Harsh J
> The hadoop.tmp.dir is a local directory, usually defaulting to under
> /tmp/ and is thereby limited by that mount's space, not the HDFS
> space.
>
> On Wed, Dec 26, 2012 at 1:25 PM, centerqi hu wrote:
> > hi all
Hi,
Sorry for having been ambiguous. For (1) I meant a large block (if the
block size is large). For (2) I meant multiple, concurrent threads.
On Wed, Dec 26, 2012 at 5:36 PM, Lin Ma wrote:
> Thanks Harsh,
>
> For long read, you mean read a large continuous part of a file, other than a
> small c
Thanks Harsh,
1. By "long read", do you mean reading a large contiguous part of a file,
rather than a small chunk of it?
2. "gradually decreasing performance for long reads" -- do you mean that
multiple parallel long-read threads degrade performance? Or that a single
thread's exclusive long read degrade
This isn't called 'shuffle' (but rather a plain remote read) so your
original question was confusing, thanks for clarifying!
In that case, you could count the bytes coming in from the required
record reader - for example a TextRecordReader uses a Long key that
denotes current offset in file, which
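That offset-key technique can be sketched with plain JDK I/O (a simplified stand-in for a text record reader, not Hadoop's actual class; the counter call in the comment is illustrative only):

```java
// Sketch: counting input bytes the way a text record reader can,
// using a running byte offset like the one that serves as the
// LongWritable map input key.
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class OffsetByteCounter {
    public static void main(String[] args) throws IOException {
        byte[] data = "line one\nline two\n".getBytes(StandardCharsets.UTF_8);
        BufferedReader r = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8));
        long offset = 0; // analogous to the current-offset key
        String line;
        while ((line = r.readLine()) != null) {
            long lineBytes =
                    line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
            // In a real mapper you would increment a counter here, e.g.
            // context.getCounter("app", "BYTES_READ").increment(lineBytes);
            offset += lineBytes;
        }
        System.out.println(offset);
    }
}
```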
Hi Lin,
It is comparable (and is also logically similar) to reading a file
multiple times in parallel in a local filesystem - not too much of a
performance hit for small reads (by virtue of OS caches, and quick
completion per read, as is usually the case for distributed cache
files), and gradually
Hi,
I mean TO the mappers. I'm using the CompositeInputFormat for my
application to compute map-side joins.
I want to join two datasets A and B one is stored on node 1 and the
other one on node 2.
For example if the join will be computed on node 2 then the inputsplit
of the dataset which is st
Thanks Harsh, so are multiple concurrent reads generally faster?
regards,
Lin
On Wed, Dec 26, 2012 at 6:21 PM, Harsh J wrote:
> There is no limitation in HDFS that limits reads of a block to a
> single client at a time (no reason to do so) - so downloads can be as
> concurrent as possible.
>
> On
There is no limitation in HDFS that limits reads of a block to a
single client at a time (no reason to do so) - so downloads can be as
concurrent as possible.
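The local-filesystem analogy from earlier in this thread can be sketched with plain JDK threads (file contents and thread count here are arbitrary):

```java
// Sketch of the analogy: several threads reading the same file
// concurrently; reads do not lock each other out, just as HDFS
// places no single-client limit on reads of a block.
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrentReads {
    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("block", ".dat");
        Files.write(f, new byte[1024]);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<Integer> cs = new ExecutorCompletionService<>(pool);
        for (int i = 0; i < 4; i++) {
            cs.submit(() -> Files.readAllBytes(f).length);
        }
        long total = 0;
        for (int i = 0; i < 4; i++) {
            total += cs.take().get();
        }
        pool.shutdown();
        Files.delete(f);
        System.out.println(total); // 4 readers x 1024 bytes each
    }
}
```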
On Wed, Dec 26, 2012 at 3:41 PM, Lin Ma wrote:
> Thanks Harsh,
>
> Supposing DistributedCache is uploaded by client, for each replica, in
Thanks Harsh,
Supposing the DistributedCache is uploaded by the client: for each replica,
in Hadoop's design, can it only serve one download session (a download from a
mapper or a reducer that requires the DistributedCache) at a time until the
DistributedCache file download is completed, or can it serve mult
Hi Lin,
DistributedCache files are stored on HDFS by the client first.
The TaskTrackers then download and localize them. Therefore, as with any
other file on HDFS, "downloads" can be efficiently parallel with
higher replica counts.
The point of having higher replication for these files is also tied to
t
Hi,
What do you mean by "shuffled bytes [to] the mappers"? If you mean
"from", it is "Reduce shuffle bytes" you look for; otherwise, you may
be looking for the per-map counter of "Map output bytes".
Per-partition counters can be constructed on the user side if needed,
by pre-computing the partiti
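The pre-computed partition can mirror the formula used by the default HashPartitioner; a minimal sketch (the counter-name convention in the comment is hypothetical):

```java
// Sketch: recomputing a record's target partition on the map side,
// mirroring the default HashPartitioner formula, so per-partition
// byte counters can be maintained by the user.
public class PartitionSketch {
    static int partitionFor(String key, int numReduces) {
        // Same formula as Hadoop's HashPartitioner:
        // (hash & Integer.MAX_VALUE) % numReduceTasks
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    public static void main(String[] args) {
        int p = partitionFor("hello", 4);
        // A mapper could then increment a counter such as
        // "bytes.partition." + p by the record's serialized size.
        System.out.println(p);
    }
}
```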
The hadoop.tmp.dir is a local directory, usually defaulting to under
/tmp/ and is thereby limited by that mount's space, not the HDFS
space.
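A quick way to see how much usable space that mount has from plain Java (using the JVM's java.io.tmpdir as a stand-in for the actual hadoop.tmp.dir location, which varies per cluster):

```java
// Sketch: checking the usable space on the mount backing a local
// temp directory, since that mount (not HDFS capacity) is what
// limits hadoop.tmp.dir.
import java.io.File;

public class TmpSpace {
    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        long freeGb = tmp.getUsableSpace() / (1024L * 1024 * 1024);
        System.out.println(tmp + " free GiB: " + freeGb);
    }
}
```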
On Wed, Dec 26, 2012 at 1:25 PM, centerqi hu wrote:
> hi all
> I encountered trouble
>
> Message: org.apache.hadoop.ipc.RemoteException: java.io.IOExceptio
For Java MR jobs, there is Apache MRUnit that provides a good way of
writing test cases. See http://mrunit.apache.org
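MRUnit itself drives real Mapper/Reducer classes and needs the library on the classpath; as a library-free illustration of the same testing idea, pure map logic can be factored out and asserted on directly (all names below are hypothetical, not MRUnit's API):

```java
// Library-free illustration of the MRUnit idea: keep map logic in a
// pure function and assert on its output directly. Class and method
// names here are hypothetical.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordCountMapLogicTest {
    // The pure logic a word-count Mapper.map() could delegate to.
    static List<String> tokenize(String line) {
        List<String> out = new ArrayList<>();
        for (String tok : line.toLowerCase().split("\\s+")) {
            if (!tok.isEmpty()) out.add(tok);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> toks = tokenize("Hello hadoop  hello");
        if (!toks.equals(Arrays.asList("hello", "hadoop", "hello"))) {
            throw new AssertionError("unexpected tokens: " + toks);
        }
        System.out.println("ok");
    }
}
```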
On Wed, Dec 26, 2012 at 7:26 AM, jamal sasha wrote:
> Hi,
> I have been using python hadoop streaming framework to write the code and
> now I am slowly moving towards the core j
Do you have drives mounted in a JBOD-like setup, where you have
allocated a few drives to HDFS?
Check df -h on all the nodes; the mount which holds the logs, or any
other information that lives outside DFS, may be full.
On Wed, Dec 26, 2012 at 1:25 PM, centerqi hu wrote:
> hi all
>