seems like a nice effort; generic ETLs are tough!

On Jan 20, 2011, at 10:25 PM, Stefan Urbanek wrote:

> Hi,
> I am working on a framework called Brewery. The goal is to provide an
> abstract interface for data streams from heterogeneous sources into
> heterogeneous targets. More information with images:
> The point is to have objects similar to file streams, but streaming
> structured data in the form of records/rows instead of bytes.
> Currently implemented sources/targets are:
> * Relational database table through SQLAlchemy (source+target)
> * CSV file (source+target)
> * XLS file (source only)
> * MongoDB (source+target)
> * Google spreadsheet (source only)
> * directory with YAML files - one file per record (source+target)
> Each source provides three basic members:
> - fields - list of fields provided by the source (has to be explicitly
> set for sources with unknown fields)
> - rows() - iterator over data represented as lists
> - records() - iterator over data represented as dict objects
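To make the source contract concrete, here is a minimal sketch in plain Python (standard library only). `CSVSource` is a hypothetical name, not Brewery's actual class; it only illustrates the fields / rows() / records() split described in the message, with rows as lists and records as dicts.

```python
import csv
import io


class CSVSource:
    """Hypothetical minimal source exposing fields, rows() and records()."""

    def __init__(self, text):
        self._text = text
        # The first CSV line is treated as the header holding the field names.
        self.fields = next(csv.reader(io.StringIO(text)))

    def rows(self):
        """Yield each data row as a list of values, in field order."""
        reader = csv.reader(io.StringIO(self._text))
        next(reader)  # skip the header line
        for row in reader:
            yield row

    def records(self):
        """Yield each data row as a dict keyed by field name."""
        for row in self.rows():
            yield dict(zip(self.fields, row))


source = CSVSource("name,amount\nalice,10\nbob,20\n")
print(source.fields)           # ['name', 'amount']
print(list(source.rows()))     # [['alice', '10'], ['bob', '20']]
print(list(source.records()))  # [{'name': 'alice', 'amount': '10'}, {'name': 'bob', 'amount': '20'}]
```

Building records() on top of rows() keeps a single parsing path; a real database source would more likely fetch rows from a cursor instead.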
> Optionally, you can use read_fields(limit) to learn which fields are
> present in the data source (for example, in MongoDB).
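For schemaless stores, "learning" the fields presumably means scanning some documents and collecting their keys. The sketch below is a hypothetical standalone `read_fields(documents, limit)` over an in-memory list of dicts standing in for MongoDB documents; it is not Brewery's implementation, just one plausible reading of the idea.

```python
from itertools import islice


def read_fields(documents, limit=None):
    """Collect field names from the first `limit` documents (all if None).

    Names are returned in the order they are first seen, since documents
    in a schemaless store may each carry a different set of keys.
    """
    fields = []
    seen = set()
    for doc in islice(documents, limit):
        for key in doc:
            if key not in seen:
                seen.add(key)
                fields.append(key)
    return fields


docs = [{"name": "alice"}, {"name": "bob", "city": "Oslo"}, {"email": "c@example.com"}]
print(read_fields(docs, limit=2))  # ['name', 'city']
print(read_fields(docs))           # ['name', 'city', 'email']
```

The `limit` trades accuracy for speed: fields that appear only in later documents are missed, which is why such sources may still need fields set explicitly.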
> For each target:
> - append() - append an object, either a dictionary or a list, to the
> target
> With this simple interface you can easily create pipes between MongoDB
> and Postgres, import a directory of YAML files into MySQL, ...
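A pipe then reduces to "iterate a source, call append() on a target". As a sketch under assumptions, the example below uses an in-memory SQLite database (standard-library `sqlite3`) as a stand-in for Postgres or MySQL; `SQLiteTarget` and `pipe` are hypothetical names, not part of Brewery.

```python
import sqlite3


class SQLiteTarget:
    """Hypothetical target that appends dicts or lists to a SQLite table."""

    def __init__(self, connection, table, fields):
        self.connection = connection
        self.fields = list(fields)
        columns = ", ".join(self.fields)
        placeholders = ", ".join("?" for _ in self.fields)
        # Table and column names are interpolated directly, so they are
        # assumed to be trusted identifiers (no quoting is done here).
        connection.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
        self._insert_sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"

    def append(self, obj):
        """Append a dict (values looked up by field name) or a list (field order)."""
        if isinstance(obj, dict):
            values = [obj.get(field) for field in self.fields]
        else:
            values = list(obj)
        self.connection.execute(self._insert_sql, values)


def pipe(source_records, target):
    """Copy every object from a source iterator into a target; return the count."""
    count = 0
    for obj in source_records:
        target.append(obj)
        count += 1
    return count


conn = sqlite3.connect(":memory:")
people = SQLiteTarget(conn, "people", ["name", "city"])
copied = pipe([{"name": "alice", "city": "Oslo"}, ["bob", "Rome"]], people)
rows = conn.execute("SELECT name, city FROM people ORDER BY rowid").fetchall()
print(copied, rows)  # 2 [('alice', 'Oslo'), ('bob', 'Rome')]
```

Because the target accepts both dicts and lists, it can consume either records() or rows() from any source without the pipe knowing which kind of store sits on either end.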
> In addition, there is a simple data auditing tool for basic data
> quality audits. You can use StreamAuditor (a stream target) to collect
> information about the data and then generate a data quality report.
> Currently audited data properties are:
> * record and value count (might differ in document-based
> DBs, same in relational)
> * null count
> * empty string count
> * distinct value count
> * distinct values
> * storage types (only one for relational databases)
> * ratios of measured properties, such as null/value count or
> null/record count
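The audited properties map naturally onto per-field counters updated as records stream past. The sketch below is a hypothetical `SimpleAuditor`, not Brewery's StreamAuditor; it assumes records arrive as dicts with hashable scalar values and computes a subset of the listed properties, including why value count can differ from record count in document-based data.

```python
from collections import defaultdict


class FieldStats:
    """Counters gathered for one field while auditing a stream."""

    def __init__(self):
        self.value_count = 0        # records in which the field is present
        self.null_count = 0         # present, but None
        self.empty_count = 0        # present and an empty string
        self.distinct = set()       # distinct non-null values (assumed hashable)
        self.storage_types = set()  # Python type names seen for this field


class SimpleAuditor:
    """Hypothetical stream target that audits every record passed to append()."""

    def __init__(self):
        self.record_count = 0
        self.stats = defaultdict(FieldStats)

    def append(self, record):
        self.record_count += 1
        for field, value in record.items():
            stats = self.stats[field]
            stats.value_count += 1
            stats.storage_types.add(type(value).__name__)
            if value is None:
                stats.null_count += 1
                continue
            if value == "":
                stats.empty_count += 1
            stats.distinct.add(value)

    def report(self):
        """Return per-field counts plus null/value and null/record ratios."""
        return {
            field: {
                "value_count": s.value_count,
                "null_count": s.null_count,
                "empty_count": s.empty_count,
                "distinct_count": len(s.distinct),
                "distinct_values": sorted(s.distinct, key=repr),
                "storage_types": sorted(s.storage_types),
                "null_value_ratio": s.null_count / s.value_count,
                "null_record_ratio": s.null_count / self.record_count,
            }
            for field, s in self.stats.items()
        }


auditor = SimpleAuditor()
for record in [{"city": "Oslo"}, {"city": None}, {"city": ""}, {"name": "bob"}]:
    auditor.append(record)
city = auditor.report()["city"]
print(city["value_count"], city["null_count"], city["null_record_ratio"])  # 3 1 0.25
```

Here "city" has 3 values across 4 records, because one document simply lacks the key; in a relational table the two counts would always match.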
> More probes to come (in a modular way).
> The API is documented here:
> Sources:
> * bitbucket: (main Mercurial repository)
> * github: (synchronized with main)
> Example usage: some source streams (XLS/CSV) are already being used
> as a data proxy in the CKAN project for converting data from various
> resources into a common structured form:
> Plans for the future are:
> * command-line tools for simple data streaming tasks: copy, quality
> audit
> * data processing stream network with nodes for simple
> transformations, analysis and data mining
> * modular data quality probes - injectable into the network
> The Brewery project is at an early stage. I would like to have some
> feedback: what do you think about it? Do you have any suggestions or
> comments? If anyone would like to try it and runs into any trouble,
> just drop me a line and I will help.
> Regards,
> Stefan Urbanek
> --
> Twitter: @Stiivi
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To post to this group, send email to
> To unsubscribe from this group, send email to 
> For more options, visit this group at 
