https://bugzilla.wikimedia.org/show_bug.cgi?id=65420

--- Comment #20 from Andrew Otto <o...@wikimedia.org> ---
Geez, sometimes my mail doesn't check often enough; I just got this.

Yes, I can look into it!  But it'll have to wait until next week... :/

That query Oliver is running generates > 80,000 mappers.  Hive has some fancy
ways to do sampling of data, but it doesn't work on external tables.  If we
get that sorted out, these types of queries should be more feasible.  We need
to get the data refining (aka ETL) phase up and going for that first.

Yes, there are almost certainly tweaks we can do to make Hadoop more efficient
for things like this, but I have yet to be convinced that there is actually a
memory problem on the datanodes themselves.  All of the OOMs that we've seen
were client side.  We brainstormed for a few minutes about this in standup
today.

Re: R OOMing connecting to analytics1027, I'd need to check, but that also
sounds like weird client side stuff.  analytics1027 is not a datanode.  You're
connecting to Hive there with R just like you do with the Hive CLI.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
