OK, so I think we have our candidates:
1) PostgreSQL
2) Cassandra
We can speak about this at our next tasking meeting.
If someone has more suggestions or comments, we still have a couple of days
until then.
Thank you all!
Marcel
On Sat, Jun 13, 2015 at 11:37 AM, Joseph Allemandou <
jalleman...@wiki
Andrew, Toby, that makes perfect sense.
Although I thought the distributed aspect of Impala would handle
high-availability issues, I very much understand that having a front-end
system rely on the analytics cluster is not as good as having a dedicated
storage solution.
Thanks for the good point.
As someone who has run production serving systems on top of Hadoop, I think
this is risky. We've had substantial planned and unplanned downtime on the
cluster (which is to be expected) and it would be bad for a pageview API to
be impacted.
-Toby
On Fri, Jun 12, 2015 at 9:46 AM, Andrew Otto wrote
> I think we could add Impala in storage technologies to assess.
I think we don’t want to build the pageview API on top of the Analytics Cluster.
> On Jun 12, 2015, at 05:37, Joseph Allemandou
> wrote:
I think we could add Impala in storage technologies to assess.
It allows reading / computing straight from HDFS and should be fast enough
for a not-too-bad user experience.
Maybe ?
On Thu, Jun 11, 2015 at 11:11 PM, Marcel Ruiz Forns
wrote:
This thread seems to have paused for 1 or 2 days now.
So summarizing, the following storage technologies have been mentioned:
- PostgreSQL
- MySQL
- Cassandra
- Voldemort
And the following concerns have been raised on using something that:
- We're already familiar with
- Permi
If we are going to completely denormalize the data sets for anonymization,
and we expect just slice and dice queries to the database,
I think we wouldn't take much advantage of a relational DB,
because it wouldn't need to aggregate values, slice or dice,
all slices and dices would be precomputed, r
On Tue, Jun 9, 2015 at 5:23 PM, Gabriel Wicke wrote:
> On Tue, Jun 9, 2015 at 11:53 AM, Dan Andreescu
> wrote:
>
>> Eric, I think we should allow arbitrary querying on any dimension for
>> that first data block. We could pre-aggregate all of those combinations
>> pretty easily since the dimensi
On Tue, Jun 9, 2015 at 11:53 AM, Dan Andreescu
wrote:
> Eric, I think we should allow arbitrary querying on any dimension for that
> first data block. We could pre-aggregate all of those combinations pretty
> easily since the dimensions have very low cardinality.
>
Are you thinking about someth
>
> Dan/Kevin: slightly OT, are we aware of any use case related to features
> that would be exposing PV data in production? I’ve seen mocks from the
> Discovery team with PV data embedded in articles or search interfaces and
> I’m not sure what their status is.
>
We have asks from people to expos
Thanks, Gabriel – this is super-helpful.
Dan/Kevin: slightly OT, are we aware of any use case related to features that
would be exposing PV data in production? I’ve seen mocks from the Discovery
team with PV data embedded in articles or search interfaces and I’m not sure
what their status is.
Eric, I think we should allow arbitrary querying on any dimension for that
first data block. We could pre-aggregate all of those combinations pretty
easily since the dimensions have very low cardinality. For the
article-level data, no, we'd want just basic timeseries querying.
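As a rough illustration of what "basic timeseries querying" for article-level data could look like, here is a minimal sketch. The in-memory `series` dict, the article name, the view counts and the `query_range` helper are all hypothetical stand-ins, not part of any proposed design; a real store would replace the dict.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data: per-article list of (ISO date, views), sorted by date.
series = {
    "Main_Page": [
        ("2015-06-07", 120),
        ("2015-06-08", 135),
        ("2015-06-09", 128),
    ],
}

def query_range(series, article, start, end):
    """Return the (date, views) points for article with start <= date <= end."""
    points = series.get(article, [])
    # ISO-8601 dates sort lexicographically, so plain string comparison works.
    dates = [date for date, _ in points]
    lo = bisect_left(dates, start)
    hi = bisect_right(dates, end)
    return points[lo:hi]

result = query_range(series, "Main_Page", "2015-06-08", "2015-06-09")
```

The only access pattern here is "one article, one date range", which is why a key plus an ordered range scan is enough and no aggregation is needed at query time.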
Thanks Gabriel, if
I think Eric's original response got lost, so let me include it below:
>>> dim1, dim2, dim3, ..., dimN, val
>>> a,    null, null, ..., null, 15  // pv for dim1=a
>>> a,    x,    null, ..., null, 34  // pv for dim1=a & dim2=x
>>> a,    x,    1,    ..., null, 27  // pv f
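To make the row layout above concrete, here is a minimal sketch of pre-aggregating every combination of dimensions, with None playing the role of null ("any value"). The dimension names, the sample records and the `cube` helper are hypothetical; it also assumes a full cube over all subsets of dimensions, which is one reading of "all of those combinations" and only stays small while the dimensions have low cardinality.

```python
from collections import Counter
from itertools import combinations

def cube(records, dims, value_key="views"):
    """Sum value_key for every subset of dims; dropped dimensions become None."""
    totals = Counter()
    for rec in records:
        for size in range(len(dims) + 1):
            for kept in combinations(dims, size):
                key = tuple(rec[d] if d in kept else None for d in dims)
                totals[key] += rec[value_key]
    return dict(totals)

# Hypothetical sample data, not real pageview numbers.
DIMS = ("project", "access", "agent")
records = [
    {"project": "en.wikipedia", "access": "desktop", "agent": "user", "views": 10},
    {"project": "en.wikipedia", "access": "mobile", "agent": "user", "views": 5},
    {"project": "de.wikipedia", "access": "desktop", "agent": "spider", "views": 3},
]
totals = cube(records, DIMS)

# A "slice" is then a plain key lookup, with no aggregation at query time.
en_total = totals[("en.wikipedia", None, None)]
```

Each input record contributes to 2^N keys for N dimensions, so the write volume grows quickly with the number of dimensions even though reads become trivial.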
@otto: I believe that the high throughput bulk input use case will be
difficult for a relational db (e.g. postgres) to handle. It will be
interesting to see how well cassandra can handle the queries that people
want to run. Tradeoffs...
@dario: RESTBase and Cassandra are definitely different thing
> p.s. I will never drink Bud Lite Lime. Like, never.
You have the wrong attitude. Pretend it is beer-soda, not beer. Beer + sprite
is yummy!
(I’ll let someone else figure out how this advice also applies to the database
analogy.)
> On Jun 8, 2015, at 19:52, Dan Andreescu wrote:
>
>>
Does one of these options support both SQL based and REST interfaces with
TSV and JSON output with little user setup? I'm thinking of your typical
Ubuntu/Mac/Windows user with the ability to import TSV data into a
spreadsheet.
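For reference, producing both output formats from the same result rows is straightforward whichever store is chosen. The sketch below uses only the Python standard library; the row contents and the `to_tsv` / `to_json` helper names are hypothetical and not tied to any of the candidate databases.

```python
import csv
import io
import json

def to_tsv(rows, columns):
    """Render a list of dict rows as TSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[col] for col in columns])
    return buf.getvalue()

def to_json(rows):
    """Render the same rows as a JSON array of objects."""
    return json.dumps(rows, sort_keys=True)

# Hypothetical query result.
COLUMNS = ["project", "date", "views"]
rows = [
    {"project": "en.wikipedia", "date": "2015-06-08", "views": 15},
    {"project": "de.wikipedia", "date": "2015-06-08", "views": 3},
]
tsv_text = to_tsv(rows, COLUMNS)
json_text = to_json(rows)
```

The TSV text can be saved to a file and opened directly in a spreadsheet, while the JSON form suits a REST response.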
On Tuesday, June 9, 2015, Dario Taraborelli
wrote:
I too would love to understand if RESTBase can become our default solution for
these kinds of data-intensive APIs. Can you guys briefly explain what kind of
queries and aggregations would be problematic if we were to go with Cassandra?
> On Jun 9, 2015, at 8:39 AM, Oliver Keyes wrote:
Remember that (as things currently stand) putting the thing on labs
means meta-analytics ("how are the cubes being used?") being a pain in
the backside to integrate with our existing storage solutions.
On 8 June 2015 at 22:52, Dan Andreescu wrote:
> As always, I'd recommend that we go with tech we are familiar with --
> mysql or cassandra. We have a cassandra committer on staff who would be
> able to answer these questions in detail.
>
>
> WMF uses PostgreSQL for some things, no? Or is that just in labs?
>
Since this data is meant to be
On Mon, Jun 8, 2015 at 7:44 PM, Gabriel Wicke wrote:
> (+ Eric)
>
> On Mon, Jun 8, 2015 at 5:42 PM, Toby Negrin wrote:
>>
>> As always, I'd recommend that we go with tech we are familiar with --
>> mysql or cassandra. We have a cassandra committer on staff who would be able
>> to answer these que
> As always, I'd recommend that we go with tech we are familiar with -- mysql
> or cassandra. We have a cassandra committer on staff who would be able to
> answer these questions in detail.
WMF uses PostgreSQL for some things, no? Or is that just in labs?
> On Jun 8, 2015, at 17:42, Toby N
(+ Eric)
On Mon, Jun 8, 2015 at 5:42 PM, Toby Negrin wrote:
> As always, I'd recommend that we go with tech we are familiar with --
> mysql or cassandra. We have a cassandra committer on staff who would be
> able to answer these questions in detail.
>
> -Toby
>
> On Mon, Jun 8, 2015 at 4:46 PM,
As always, I'd recommend that we go with tech we are familiar with -- mysql
or cassandra. We have a cassandra committer on staff who would be able to
answer these questions in detail.
-Toby
On Mon, Jun 8, 2015 at 4:46 PM, Marcel Ruiz Forns
wrote:
*This discussion is intended to be a branch of the thread: "[Analytics]
Pageview API Status update".*
Hi all,
We in Analytics are trying to *choose a storage technology to keep the
pageview data* for analysis.
We don't want to get to a final system that covers all our needs yet (there
are still thi