Well, when I said I had found a solution, this link was one of them :). Even
though I set:
dfs.block.size = mapred.min.split.size = mapred.max.split.size = 14MB
the job is still running maps with 64MB splits!
I don't see what else I can change :(
Thanks,
Mark
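One possible explanation, sketched under assumptions rather than confirmed from the job itself: if the job uses the older mapred FileInputFormat, the split size is derived from a goal size (total input divided by the requested number of maps), the minimum split size, and the block size recorded on the existing input files. The maximum split size is only consulted by the newer mapreduce API, and changing dfs.block.size does not affect files that were already written with 64MB blocks. The function names and the 1GB input size below are illustrative only.

```python
MB = 1024 * 1024


def new_api_split_size(block_size, min_size, max_size):
    """Split size as computed by the newer mapreduce FileInputFormat."""
    return max(min_size, min(max_size, block_size))


def old_api_split_size(total_size, requested_maps, block_size, min_size):
    """Split size as computed by the older mapred FileInputFormat.

    There is no maximum-split-size term here; the upper bound comes from
    the goal size and from the block size of the files already in HDFS.
    """
    goal_size = total_size // max(requested_maps, 1)
    return max(min_size, min(goal_size, block_size))


# Assumed scenario: 1GB of input written earlier with 64MB blocks,
# two requested maps, and a 14MB minimum split size.
old = old_api_split_size(1024 * MB, 2, 64 * MB, 14 * MB)
new = new_api_split_size(64 * MB, 14 * MB, 14 * MB)
print(old // MB, new // MB)
```

If this is what is happening, either rewriting the input with the smaller block size or requesting more map tasks (which lowers the goal size) would be the settings to try.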
On Fri, Oct 26, 2012 at 2:23 PM, Bertrand
I think these methods should be idempotent: repeated calls by the same
client should be harmless.
Thanks,
LiuLei
Create cannot be idempotent because of the problem of watches and
sequential files.
Similarly, mkdirs, rename and delete cannot generally be idempotent. In
particular applications, you might find it is OK to treat them as such, but
there are definitely applications where they are not idempotent.
I just completed the Cloudera Developer course last week and would
highly recommend it. I have not taken the test yet, but the
instructor will point out many topics that are included in the test.
For resources, be sure to make use of the Cloudera University,
I need a unique permanent ID assigned to each new item encountered, which has a
constraint that it is in the range of, let's say for simple discussion, one
to one million.
I suppose I could assign a range of usable IDs to each reduce task (where
IDs are assigned) and keep those organized somehow
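The range-per-reduce-task idea could be sketched roughly as follows. This is a minimal illustration, not a tested Hadoop job: the names id_range_for_task and RangeAllocator are made up for the example, and it assumes the number of reduce tasks is fixed and that each task's next unused ID is persisted somewhere between runs (which is what would make the IDs permanent).

```python
def id_range_for_task(task_index, num_tasks, lo=1, hi=1_000_000):
    """Return the inclusive (start, end) ID range owned by one reduce task.

    The ranges of all tasks are contiguous, disjoint, and together cover
    lo..hi; any remainder is spread over the first few tasks.
    """
    total = hi - lo + 1
    base, extra = divmod(total, num_tasks)
    start = lo + task_index * base + min(task_index, extra)
    size = base + (1 if task_index < extra else 0)
    return start, start + size - 1


class RangeAllocator:
    """Hands out IDs from one task's range, failing when it is exhausted."""

    def __init__(self, start, end, next_id=None):
        self.end = end
        # next_id would be restored from persistent storage on later runs.
        self.next_id = start if next_id is None else next_id

    def allocate(self):
        if self.next_id > self.end:
            raise RuntimeError("ID range exhausted for this task")
        allocated = self.next_id
        self.next_id += 1
        return allocated


start, end = id_range_for_task(task_index=1, num_tasks=3)
allocator = RangeAllocator(start, end)
print(start, end, allocator.allocate())
```

The trade-off is that a task which sees many new items can exhaust its range while other ranges sit mostly unused, so a small total range such as one million leaves little slack.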
Twitter's Snowflake may provide you with some inspiration:
https://github.com/twitter/snowflake
-Michael
On Oct 28, 2012, at 9:16 PM, David Parks davidpark...@yahoo.com wrote:
I need a unique permanent ID assigned to each new item encountered, which has
a constraint that it is in the range of,
Thanks Ted for your reply.
What is the problem with watches and sequential files? If you can
describe it in detail, I can better understand the problem.
2012/10/29 Ted Dunning tdunn...@maprtech.com
Create cannot be idempotent because of the problem of watches and
sequential files.
On Sun, Oct 28, 2012 at 9:15 PM, David Parks davidpark...@yahoo.com wrote:
I need a unique permanent ID assigned to each new item encountered, which has
a constraint that it is in the range of, let’s say for simple discussion,
one to one million.
Having such a limited range may require that you
Create cannot be idempotent with sequential files. Doing the same create
twice creates two different files.
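A toy illustration of that point, using a made-up in-memory store rather than any real client API: the SequentialStore class and its naming scheme are assumptions for the example only. A client that retries a sequential create after a lost reply ends up with two entries, while retrying a plain create of a fixed path fails visibly on the second attempt.

```python
class SequentialStore:
    """In-memory stand-in for a store that supports sequential creates."""

    def __init__(self):
        self.counter = 0
        self.nodes = {}

    def create(self, path, data):
        """Plain create: a repeat of the same call is rejected."""
        if path in self.nodes:
            raise FileExistsError(path)
        self.nodes[path] = data
        return path

    def create_sequential(self, prefix, data):
        """Sequential create: the store appends its own counter to the name."""
        name = f"{prefix}{self.counter:010d}"
        self.counter += 1
        self.nodes[name] = data
        return name


store = SequentialStore()
first = store.create_sequential("/queue/item-", b"payload")
# The client never saw the reply and retries the identical request.
second = store.create_sequential("/queue/item-", b"payload")
print(first, second, len(store.nodes))
```

Because the server, not the client, picks the final name, the client has no simple way to tell whether the first attempt already succeeded, which is why a blind retry is not harmless here.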
On Sun, Oct 28, 2012 at 10:25 PM, lei liu liulei...@gmail.com wrote:
Thanks Ted for your reply.
What is the problem with watches and sequential files? If you can
describe in