Hey Boris, Sorry to say that the situation is still the same.
-Todd On Wed, Apr 24, 2019 at 9:02 AM Boris Tyukin <bo...@boristyukin.com> wrote: > sorry to revive the old thread but curious if there is a better solution 1 > year after...We have a few small tables (under 300k rows) which are > practically used with every single query and to make things worse joined > more than once in the same query. > > Is there a way to replicate this table on every node to improve > performance and avoid broadcasting this table every time? > > On Mon, Jul 23, 2018 at 10:52 AM Todd Lipcon <t...@cloudera.com> wrote: > >> >> >> On Mon, Jul 23, 2018, 7:21 AM Boris Tyukin <bo...@boristyukin.com> wrote: >> >>> Hi Todd, >>> >>> Are you saying that your earlier comment below is not longer valid with >>> Impala 2.11 and if I replicate a table to all our Kudu nodes Impala can >>> benefit from this? >>> >> >> No, the earlier comment is still valid. Just saying that in some cases >> exchange can be faster in the new Impala version. >> >> >>> " >>> *It's worth noting that, even if your table is replicated, Impala's >>> planner is unaware of this fact and it will give the same plan regardless. >>> That is to say, rather than every node scanning its local copy, instead a >>> single node will perform the whole scan (assuming it's a small table) and >>> broadcast it from there within the scope of a single query. So, I don't >>> think you'll see any performance improvements on Impala queries by >>> attempting something like an extremely high replication count.* >>> >>> *I could see bumping the replication count to 5 for these tables since >>> the extra storage cost is low and it will ensure higher availability of the >>> important central tables, but I'd be surprised if there is any measurable >>> perf impact.* >>> " >>> >>> On Mon, Jul 23, 2018 at 9:46 AM Todd Lipcon <t...@cloudera.com> wrote: >>> >>>> Are you on the latest release of Impala? It switched from using Thrift >>>> for RPC to a new implementation (actually borrowed from kudu) which might >>>> help broadcast performance a bit. >>>> >>>> Todd >>>> >>>> On Mon, Jul 23, 2018, 6:43 AM Boris Tyukin <bo...@boristyukin.com> >>>> wrote: >>>> >>>>> sorry to revive the old thread but I am curious if there is a good way >>>>> to speed up requests to frequently used tables in Kudu. >>>>> >>>>> On Thu, Apr 12, 2018 at 8:19 AM Boris Tyukin <bo...@boristyukin.com> >>>>> wrote: >>>>> >>>>>> bummer..After reading your guys conversation, I wish there was an >>>>>> easier way...we will have the same issue as we have a few dozens of >>>>>> tables >>>>>> which are used very frequently in joins and I was hoping there was an >>>>>> easy >>>>>> way to replicate them on most of the nodes to avoid broadcasts every time >>>>>> >>>>>> On Thu, Apr 12, 2018 at 7:26 AM, Clifford Resnick < >>>>>> cresn...@mediamath.com> wrote: >>>>>> >>>>>>> The table in our case is 12x hashed and ranged by month, so the >>>>>>> broadcasts were often to all (12) nodes. >>>>>>> >>>>>>> On Apr 12, 2018 12:58 AM, Mauricio Aristizabal <mauri...@impact.com> >>>>>>> wrote: >>>>>>> Sorry I left that out Cliff, FWIW it does seem to have been >>>>>>> broadcast.. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Not sure though how a shuffle would be much different from a >>>>>>> broadcast if entire table is 1 file/block in 1 node. >>>>>>> >>>>>>> On Wed, Apr 11, 2018 at 8:52 PM, Cliff Resnick <cre...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> From the screenshot it does not look like there was a broadcast of >>>>>>>> the dimension table(s), so it could be the case here that the multiple >>>>>>>> smaller sends helps. Our dim tables are generally in the single-digit >>>>>>>> millions and Impala chooses to broadcast them. Since the fact result >>>>>>>> cardinality is always much smaller, we've found that forcing a >>>>>>>> [shuffle] >>>>>>>> dimension join is actually faster since it only sends dims once rather >>>>>>>> than >>>>>>>> all to all nodes. The degenerative performance of broadcast is >>>>>>>> especially >>>>>>>> obvious when the query returns zero results. I don't have much >>>>>>>> experience >>>>>>>> here, but it does seem that Kudu's efficient predicate scans can >>>>>>>> sometimes >>>>>>>> "break" Impala's query plan. >>>>>>>> >>>>>>>> -Cliff >>>>>>>> >>>>>>>> On Wed, Apr 11, 2018 at 5:41 PM, Mauricio Aristizabal < >>>>>>>> mauri...@impact.com> wrote: >>>>>>>> >>>>>>>>> @Todd not to belabor the point, but when I suggested breaking up >>>>>>>>> small dim tables into multiple parquet files (and in this thread's >>>>>>>>> context >>>>>>>>> perhaps partition kudu table, even if small, into multiple tablets), >>>>>>>>> it was >>>>>>>>> to speed up joins/exchanges, not to parallelize the scan. >>>>>>>>> >>>>>>>>> For example recently we ran into this slow query where the 14M >>>>>>>>> record dimension fit into a single file & block, so it got scanned on >>>>>>>>> a >>>>>>>>> single node though still pretty quickly (300ms), however it caused >>>>>>>>> the join >>>>>>>>> to take 25+ seconds and bogged down the entire query. See highlighted >>>>>>>>> fragment and its parent. >>>>>>>>> >>>>>>>>> So we broke it into several small files the way I described in my >>>>>>>>> previous post, and now join and query are fast (6s). >>>>>>>>> >>>>>>>>> -m >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Mar 16, 2018 at 3:55 PM, Todd Lipcon <t...@cloudera.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I suppose in the case that the dimension table scan makes a >>>>>>>>>> non-trivial portion of your workload time, then yea, parallelizing >>>>>>>>>> the scan >>>>>>>>>> as you suggest would be beneficial. That said, in typical analytic >>>>>>>>>> queries, >>>>>>>>>> scanning the dimension tables is very quick compared to scanning the >>>>>>>>>> much-larger fact tables, so the extra parallelism on the dim table >>>>>>>>>> scan >>>>>>>>>> isn't worth too much. >>>>>>>>>> >>>>>>>>>> -Todd >>>>>>>>>> >>>>>>>>>> On Fri, Mar 16, 2018 at 2:56 PM, Mauricio Aristizabal < >>>>>>>>>> mauri...@impactradius.com> wrote: >>>>>>>>>> >>>>>>>>>>> @Todd I know working with parquet in the past I've seen small >>>>>>>>>>> dimensions that fit in 1 single file/block limit parallelism of >>>>>>>>>>> join/exchange/aggregation nodes, and I've forced those dims to >>>>>>>>>>> spread >>>>>>>>>>> across 20 or so blocks by leveraging SET PARQUET_FILE_SIZE=8m; or >>>>>>>>>>> similar >>>>>>>>>>> when doing INSERT OVERWRITE to load them, which then allows these >>>>>>>>>>> operations to parallelize across that many nodes. >>>>>>>>>>> >>>>>>>>>>> Wouldn't it be useful here for Cliff's small dims to be >>>>>>>>>>> partitioned into a couple tablets to similarly improve parallelism? >>>>>>>>>>> >>>>>>>>>>> -m >>>>>>>>>>> >>>>>>>>>>> On Fri, Mar 16, 2018 at 2:29 PM, Todd Lipcon <t...@cloudera.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> On Fri, Mar 16, 2018 at 2:19 PM, Cliff Resnick < >>>>>>>>>>>> cre...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hey Todd, >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for that explanation, as well as all the great work >>>>>>>>>>>>> you're doing -- it's much appreciated! I just have one last >>>>>>>>>>>>> follow-up >>>>>>>>>>>>> question. Reading about BROADCAST operations (Kudu, Spark, Flink, >>>>>>>>>>>>> etc. ) it >>>>>>>>>>>>> seems the smaller table is always copied in its entirety BEFORE >>>>>>>>>>>>> the >>>>>>>>>>>>> predicate is evaluated. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> That's not quite true. If you have a predicate on a joined >>>>>>>>>>>> column, or on one of the columns in the joined table, it will be >>>>>>>>>>>> pushed >>>>>>>>>>>> down to the "scan" operator, which happens before the "exchange". >>>>>>>>>>>> In >>>>>>>>>>>> addition, there is a feature called "runtime filters" that can push >>>>>>>>>>>> dynamically-generated filters from one side of the exchange to the >>>>>>>>>>>> other. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> But since the Kudu client provides a serialized scanner as >>>>>>>>>>>>> part of the ScanToken API, why wouldn't Impala use that instead >>>>>>>>>>>>> if it knows >>>>>>>>>>>>> that the table is Kudu and the query has any type of predicate? >>>>>>>>>>>>> Perhaps if >>>>>>>>>>>>> I hash-partition the table I could maybe force this (because that >>>>>>>>>>>>> complicates a BROADCAST)? I guess this is really a question for >>>>>>>>>>>>> Impala but >>>>>>>>>>>>> perhaps there is a more basic reason. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Impala could definitely be smarter, just a matter of >>>>>>>>>>>> programming Kudu-specific join strategies into the optimizer. >>>>>>>>>>>> Today, the >>>>>>>>>>>> optimizer isn't aware of the unique properties of Kudu scans vs >>>>>>>>>>>> other >>>>>>>>>>>> storage mechanisms. >>>>>>>>>>>> >>>>>>>>>>>> -Todd >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -Cliff >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Mar 16, 2018 at 4:10 PM, Todd Lipcon < >>>>>>>>>>>>> t...@cloudera.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Mar 16, 2018 at 12:30 PM, Clifford Resnick < >>>>>>>>>>>>>> cresn...@mediamath.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I thought I had read that the Kudu client can configure a >>>>>>>>>>>>>>> scan for CLOSEST_REPLICA and assumed this was a way to take >>>>>>>>>>>>>>> advantage of >>>>>>>>>>>>>>> data collocation. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yea, when a client uses CLOSEST_REPLICA it will read a local >>>>>>>>>>>>>> one if available. However, that doesn't influence the higher >>>>>>>>>>>>>> level >>>>>>>>>>>>>> operation of the Impala (or Spark) planner. The planner isn't >>>>>>>>>>>>>> aware of the >>>>>>>>>>>>>> replication policy, so it will use one of the existing supported >>>>>>>>>>>>>> JOIN >>>>>>>>>>>>>> strategies. Given statistics, it will choose to broadcast the >>>>>>>>>>>>>> small table, >>>>>>>>>>>>>> which means that it will create a plan that looks like: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> +-------------------------+ >>>>>>>>>>>>>> | | >>>>>>>>>>>>>> +---------->build JOIN | >>>>>>>>>>>>>> | | | >>>>>>>>>>>>>> | | probe | >>>>>>>>>>>>>> +--------------+ +-------------------------+ >>>>>>>>>>>>>> | | | >>>>>>>>>>>>>> | Exchange | | >>>>>>>>>>>>>> +----+ (broadcast | | >>>>>>>>>>>>>> | | | | >>>>>>>>>>>>>> | +--------------+ | >>>>>>>>>>>>>> | | >>>>>>>>>>>>>> +---------+ | >>>>>>>>>>>>>> | | >>>>>>>>>>>>>> +-----------------------+ >>>>>>>>>>>>>> | SCAN | | >>>>>>>>>>>>>> | >>>>>>>>>>>>>> | KUDU | | SCAN (other >>>>>>>>>>>>>> side) | >>>>>>>>>>>>>> | | | >>>>>>>>>>>>>> | >>>>>>>>>>>>>> +---------+ >>>>>>>>>>>>>> +-----------------------+ >>>>>>>>>>>>>> >>>>>>>>>>>>>> (hopefully the ASCII art comes through) >>>>>>>>>>>>>> >>>>>>>>>>>>>> In other words, the "scan kudu" operator scans the table >>>>>>>>>>>>>> once, and then replicates the results of that scan into the JOIN >>>>>>>>>>>>>> operator. >>>>>>>>>>>>>> The "scan kudu" operator of course will read its local copy, but >>>>>>>>>>>>>> it will >>>>>>>>>>>>>> still go through the exchange process. >>>>>>>>>>>>>> >>>>>>>>>>>>>> For the use case you're talking about, where the join is just >>>>>>>>>>>>>> looking up a single row by PK in a dimension table, ideally we'd >>>>>>>>>>>>>> be using >>>>>>>>>>>>>> an altogether different join strategy such as nested-loop join, >>>>>>>>>>>>>> with the >>>>>>>>>>>>>> inner "loop" actually being a Kudu PK lookup, but that strategy >>>>>>>>>>>>>> isn't >>>>>>>>>>>>>> implemented by Impala. >>>>>>>>>>>>>> >>>>>>>>>>>>>> -Todd >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> If this exists then how far out of context is my >>>>>>>>>>>>>>> understanding of it? Reading about HDFS cache replication, I do >>>>>>>>>>>>>>> know that >>>>>>>>>>>>>>> Impala will choose a random replica there to more evenly >>>>>>>>>>>>>>> distribute load. >>>>>>>>>>>>>>> But especially compared to Kudu upsert, managing mutable data >>>>>>>>>>>>>>> using Parquet >>>>>>>>>>>>>>> is painful. So, perhaps to sum thing up, if nearly 100% of my >>>>>>>>>>>>>>> metadata scan >>>>>>>>>>>>>>> are single Primary Key lookups followed by a tiny broadcast >>>>>>>>>>>>>>> then am I >>>>>>>>>>>>>>> really just splitting hairs performance-wise between Kudu and >>>>>>>>>>>>>>> HDFS-cached >>>>>>>>>>>>>>> parquet? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> From: Todd Lipcon <t...@cloudera.com> >>>>>>>>>>>>>>> Reply-To: "user@kudu.apache.org" <user@kudu.apache.org> >>>>>>>>>>>>>>> Date: Friday, March 16, 2018 at 2:51 PM >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> To: "user@kudu.apache.org" <user@kudu.apache.org> >>>>>>>>>>>>>>> Subject: Re: "broadcast" tablet replication for kudu? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It's worth noting that, even if your table is replicated, >>>>>>>>>>>>>>> Impala's planner is unaware of this fact and it will give the >>>>>>>>>>>>>>> same plan >>>>>>>>>>>>>>> regardless. That is to say, rather than every node scanning its >>>>>>>>>>>>>>> local copy, >>>>>>>>>>>>>>> instead a single node will perform the whole scan (assuming >>>>>>>>>>>>>>> it's a small >>>>>>>>>>>>>>> table) and broadcast it from there within the scope of a single >>>>>>>>>>>>>>> query. So, >>>>>>>>>>>>>>> I don't think you'll see any performance improvements on Impala >>>>>>>>>>>>>>> queries by >>>>>>>>>>>>>>> attempting something like an extremely high replication count. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I could see bumping the replication count to 5 for these >>>>>>>>>>>>>>> tables since the extra storage cost is low and it will ensure >>>>>>>>>>>>>>> higher >>>>>>>>>>>>>>> availability of the important central tables, but I'd be >>>>>>>>>>>>>>> surprised if there >>>>>>>>>>>>>>> is any measurable perf impact. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -Todd >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Mar 16, 2018 at 11:35 AM, Clifford Resnick < >>>>>>>>>>>>>>> cresn...@mediamath.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for that, glad I was wrong there! Aside from >>>>>>>>>>>>>>>> replication considerations, is it also recommended the number >>>>>>>>>>>>>>>> of tablet >>>>>>>>>>>>>>>> servers be odd? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I will check forums as you suggested, but from what I read >>>>>>>>>>>>>>>> after searching is that Impala relies on user configured >>>>>>>>>>>>>>>> caching strategies >>>>>>>>>>>>>>>> using HDFS cache. The workload for these tables is very light >>>>>>>>>>>>>>>> write, maybe >>>>>>>>>>>>>>>> a dozen or so records per hour across 6 or 7 tables. The size >>>>>>>>>>>>>>>> of the tables >>>>>>>>>>>>>>>> ranges from thousands to low millions of rows so so >>>>>>>>>>>>>>>> sub-partitioning would >>>>>>>>>>>>>>>> not be required. So perhaps this is not a typical use-case but >>>>>>>>>>>>>>>> I think it >>>>>>>>>>>>>>>> could work quite well with kudu. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> From: Dan Burkert <danburk...@apache.org> >>>>>>>>>>>>>>>> Reply-To: "user@kudu.apache.org" <user@kudu.apache.org> >>>>>>>>>>>>>>>> Date: Friday, March 16, 2018 at 2:09 PM >>>>>>>>>>>>>>>> To: "user@kudu.apache.org" <user@kudu.apache.org> >>>>>>>>>>>>>>>> Subject: Re: "broadcast" tablet replication for kudu? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The replication count is the number of tablet servers which >>>>>>>>>>>>>>>> Kudu will host copies on. So if you set the replication level >>>>>>>>>>>>>>>> to 5, Kudu >>>>>>>>>>>>>>>> will put the data on 5 separate tablet servers. There's no >>>>>>>>>>>>>>>> built-in >>>>>>>>>>>>>>>> broadcast table feature; upping the replication factor is the >>>>>>>>>>>>>>>> closest >>>>>>>>>>>>>>>> thing. A couple of things to keep in mind: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Always use an odd replication count. This is important >>>>>>>>>>>>>>>> due to how the Raft algorithm works. Recent versions of Kudu >>>>>>>>>>>>>>>> won't even >>>>>>>>>>>>>>>> let you specify an even number without flipping some flags. >>>>>>>>>>>>>>>> - We don't test much much beyond 5 replicas. It *should* >>>>>>>>>>>>>>>> work, but you may run in to issues since it's a relatively rare >>>>>>>>>>>>>>>> configuration. With a heavy write workload and many replicas >>>>>>>>>>>>>>>> you are even >>>>>>>>>>>>>>>> more likely to encounter issues. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It's also worth checking in an Impala forum whether it has >>>>>>>>>>>>>>>> features that make joins against small broadcast tables >>>>>>>>>>>>>>>> better? Perhaps >>>>>>>>>>>>>>>> Impala can cache small tables locally when doing joins. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Dan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Mar 16, 2018 at 10:55 AM, Clifford Resnick < >>>>>>>>>>>>>>>> cresn...@mediamath.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The problem is, AFIK, that replication count is not >>>>>>>>>>>>>>>>> necessarily the distribution count, so you can't guarantee >>>>>>>>>>>>>>>>> all tablet >>>>>>>>>>>>>>>>> servers will have a copy. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Mar 16, 2018 1:41 PM, Boris Tyukin < >>>>>>>>>>>>>>>>> bo...@boristyukin.com> wrote: >>>>>>>>>>>>>>>>> I'm new to Kudu but we are also going to use Impala mostly >>>>>>>>>>>>>>>>> with Kudu. We have a few tables that are small but used a >>>>>>>>>>>>>>>>> lot. My plan is >>>>>>>>>>>>>>>>> replicate them more than 3 times. When you create a kudu >>>>>>>>>>>>>>>>> table, you can >>>>>>>>>>>>>>>>> specify number of replicated copies (3 by default) and I >>>>>>>>>>>>>>>>> guess you can put >>>>>>>>>>>>>>>>> there a number, corresponding to your node count in cluster. >>>>>>>>>>>>>>>>> The downside, >>>>>>>>>>>>>>>>> you cannot change that number unless you recreate a table. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Mar 16, 2018 at 10:42 AM, Cliff Resnick < >>>>>>>>>>>>>>>>> cre...@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> We will soon be moving our analytics from AWS Redshift to >>>>>>>>>>>>>>>>>> Impala/Kudu. One Redshift feature that we will miss is its >>>>>>>>>>>>>>>>>> ALL >>>>>>>>>>>>>>>>>> Distribution, where a copy of a table is maintained on each >>>>>>>>>>>>>>>>>> server. We >>>>>>>>>>>>>>>>>> define a number of metadata tables this way since they are >>>>>>>>>>>>>>>>>> used in nearly >>>>>>>>>>>>>>>>>> every query. We are considering using parquet in HDFS cache >>>>>>>>>>>>>>>>>> for these, and >>>>>>>>>>>>>>>>>> Kudu would be a much better fit for the update semantics but >>>>>>>>>>>>>>>>>> we are worried >>>>>>>>>>>>>>>>>> about the additional contention. I'm wondering if having a >>>>>>>>>>>>>>>>>> Broadcast, or >>>>>>>>>>>>>>>>>> ALL, tablet replication might be an easy feature to add to >>>>>>>>>>>>>>>>>> Kudu? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -Cliff >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Todd Lipcon >>>>>>>>>>>>>>> Software Engineer, Cloudera >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Todd Lipcon >>>>>>>>>>>>>> Software Engineer, Cloudera >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Todd Lipcon >>>>>>>>>>>> Software Engineer, Cloudera >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> *MAURICIO ARISTIZABAL* >>>>>>>>>>> Architect - Business Intelligence + Data Science >>>>>>>>>>> mauri...@impactradius.com(m)+1 323 309 4260 <(323)%20309-4260> >>>>>>>>>>> 223 E. De La Guerra St. | Santa Barbara, CA 93101 >>>>>>>>>>> <https://maps.google.com/?q=223+E.+De+La+Guerra+St.+%7C+Santa+Barbara,+CA+93101&entry=gmail&source=g> >>>>>>>>>>> >>>>>>>>>>> Overview <http://www.impactradius.com/?src=slsap> | Twitter >>>>>>>>>>> <https://twitter.com/impactradius> | Facebook >>>>>>>>>>> <https://www.facebook.com/pages/Impact-Radius/153376411365183> >>>>>>>>>>> | LinkedIn >>>>>>>>>>> <https://www.linkedin.com/company/impact-radius-inc-> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Todd Lipcon >>>>>>>>>> Software Engineer, Cloudera >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Mauricio Aristizabal >>>>>>>>> Architect - Data Pipeline >>>>>>>>> *M * 323 309 4260 >>>>>>>>> *E *mauri...@impact.com | *W * https://impact.com >>>>>>>>> <https://www.linkedin.com/company/608678/> >>>>>>>>> <https://www.facebook.com/ImpactMarTech/> >>>>>>>>> <https://twitter.com/impactmartech> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Mauricio Aristizabal >>>>>>> Architect - Data Pipeline >>>>>>> *M * 323 309 4260 >>>>>>> *E *mauri...@impact.com | *W * https://impact.com >>>>>>> <https://www.linkedin.com/company/608678/> >>>>>>> <https://www.facebook.com/ImpactMarTech/> >>>>>>> <https://twitter.com/impactmartech> >>>>>>> >>>>>> >>>>>> -- Todd Lipcon Software Engineer, Cloudera