Yeah, maybe I should have said "right or outer join". What I wanted to make clear is that if you want to identify non-matches in the large input (the fragment, or left side), you can still use fragment-replicate join. If you want to identify non-matches in the small input (the replicate, or right side), you cannot.
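To illustrate why the two cases differ, here is a minimal Python sketch of a map-side (fragment-replicate) join. It is not Pig's actual implementation; the function name `replicated_join` and the sample data are hypothetical. It assumes each map task sees only its own fragment of the large input plus a full copy of the small input.

```python
from collections import defaultdict


def replicated_join(fragment, small, left_outer=False):
    """Join one fragment of the large input against the whole small input.

    fragment: (key, value) records seen by a single map task.
    small:    (key, value) records, copied in full to every map task.
    """
    # Each task builds its own in-memory hash table of the small input.
    table = defaultdict(list)
    for key, value in small:
        table[key].append(value)

    out = []
    for key, value in fragment:
        matches = table.get(key)
        if matches:
            out.extend((key, value, m) for m in matches)
        elif left_outer:
            # A large-side record without a match is visible locally,
            # so the task can emit it padded with None.
            out.append((key, value, None))
    return out


# Hypothetical data: the large input is split across two map tasks.
sessions_part1 = [("paris", "s1"), ("nowhere", "s2")]
sessions_part2 = [("rennes", "s3")]
locations = [("paris", "FR-75"), ("rennes", "FR-35")]

left = (replicated_join(sessions_part1, locations, left_outer=True)
        + replicated_join(sessions_part2, locations, left_outer=True))

# For the small side, a task only knows what went unmatched in ITS fragment.
small_keys = {k for k, _ in locations}
unmatched_in_task1 = small_keys - {k for k, _ in sessions_part1}
unmatched_in_task2 = small_keys - {k for k, _ in sessions_part2}
```

In this sketch the left outer result is correct because every unmatched session is detected by the task that holds it. For the small side, task 1 believes "rennes" is unmatched and task 2 believes "paris" is unmatched, although both match somewhere in the large input. No single task can decide that without a global view, which a map-only join does not have.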
Alan.

On Jan 30, 2012, at 6:09 AM, Vincent Barat wrote:

> I understand your point and it makes sense.
>
> The graph in Alan's book says that if you "outer join on the small input" you
> should not use replicated join.
>
> Maybe this sentence is not clear enough :)
>
> On 28/01/12 00:21, Alex Rovner wrote:
>> From what I understand, replicated should not be used with a full outer join,
>> since full outer means both tables' records will be in the output regardless
>> of whether they exist in the joined table. In your case you only care about
>> sessions, which is a left join and not a full outer.
>>
>> The reason for that is the Pig and Hadoop semantics of the join: the "small"
>> table is loaded into each mapper and thus is not meant to be used solely in
>> the output.
>>
>> Alex
>>
>> On Jan 27, 2012, at 8:15 AM, Vincent Barat<vincent.ba...@gmail.com> wrote:
>>
>>> Hi folks,
>>>
>>> I use replicated joins, and recently I encountered an issue: my rightmost
>>> relation seems to have become too big and, even though I don't get any
>>> "Java heap space" error, the time it takes to finish the maps becomes
>>> exponentially long (I cannot figure out exactly why).
>>>
>>> Removing "replicated" fixes the issue, but several questions arise.
>>>
>>> In Alan's book, "Figure 8.1. Choosing a Join Implementation" says that
>>> replicated joins should NOT BE USED for outer joins.
>>>
>>> Nevertheless, it seems to work in the following case, and is faster than
>>> regular joins. So why?
>>>
>>> sessions = JOIN sessions BY locid LEFT, locations BY locid USING 'replicated';
>>>
>>> (not all sessions have a location in this case)
>>>
>>> Thanks for your advice.
>
> --
> Vincent BARAT, UBIKOD, CTO