Hello -

I have created an open-source project on 
http://code.google.com/p/opencalaccess/ and I would like to hear if anyone is 
interested in participating in this with me.

There is a web page maintained by the California Secretary of State called 
CalAccess (http://cal-access.sos.ca.gov/campaign/). It provides access to a 
large database of campaign financing data from candidates, lobbyists, 
fundraisers and many others who have their noses in the electoral trough. The 
data is a mess and, from what I have seen, the SoS puts up a sanitized version 
of the data with lots of data removed. One can get dumps of the data from the 
SoS. I have done so and I am interested in resolving ambiguities in the data 
and in presenting the data, using WebObjects apps of course, in more useful 
ways.

There are several interesting problems one sees as one works with the data.

1 - There are errors in every place there can be errors. There are illegal 
characters in the data dump and malformed lines. There are "information" errors 
as well. For every constraint that is documented, there are violations. There 
is a document that describes the schema but it is rife with errors. For 
example, one table in the data only includes half of its columns. In the 
document, there is a page break in the list of columns and the columns on the 
second page do not appear in the data dump. Really.

2 - The schema is hugely bloated in the way that government agencies mess up 
all databases.

3 - There is another layer of obfuscation. For example, names are not tied 
to primary keys, so records in different tables can only be linked by string 
comparison. A committee treasurer may appear as "Bob Smith" in 
one table, "Robert Smith" in another, "R. E. Smith" in yet another, and then 
again as "Robert Smyth" at the same address....

4 - The schema itself is obfuscated. One can tease out the relationships, for 
example, in the tables that contain info on lobbyists, their employers, and who 
they give money to, but the graph of these is amazingly complicated. One wants 
to assume the SoS is not deliberately trying to hide the relationships. But then 
one sees the schema. Something is definitely going on.

5 - There are many ways to display the data and I am not sure which are useful. 
Anyone with political oversight or forensic accounting experience would be 
appreciated.

6 - It is not clear what the "WO way" to work with the data should be. If one 
is creating a shopping cart app, for example, WO makes this easy. When one 
wants to find near-duplicates in a list of 1.5 million addresses, it is not 
clear what the "WO way" is. Naive implementations cause much suckage.
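
One way around the naive all-pairs comparison, offered only as a sketch and 
not as what the project code does, is "blocking": normalize each address, 
group addresses by a cheap key, and run the expensive edit-distance 
comparison only inside each group. Everything named here (the class, the 
choice of the first token as the key, the distance threshold) is a 
hypothetical choice made for illustration, in plain Java with no WO or EOF 
involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: near-duplicate detection with blocking.
final class AddressDedupSketch {

    // Upper-case, turn punctuation into spaces, collapse runs of whitespace.
    static String normalize(String raw) {
        return raw.toUpperCase(Locale.ROOT)
                  .replaceAll("[^A-Z0-9 ]", " ")
                  .trim()
                  .replaceAll("\\s+", " ");
    }

    // Assumed blocking key: the first token, which is usually the house number.
    static String blockKey(String normalized) {
        int space = normalized.indexOf(' ');
        return space < 0 ? normalized : normalized.substring(0, space);
    }

    // Levenshtein distance computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns pairs of distinct normalized addresses that share a block
    // and are within maxDistance edits of each other.
    static List<String[]> nearDuplicates(List<String> addresses, int maxDistance) {
        Map<String, Set<String>> blocks = new HashMap<>();
        for (String raw : addresses) {
            String norm = normalize(raw);
            blocks.computeIfAbsent(blockKey(norm), k -> new LinkedHashSet<>()).add(norm);
        }
        List<String[]> pairs = new ArrayList<>();
        for (Set<String> blockSet : blocks.values()) {
            List<String> block = new ArrayList<>(blockSet);
            for (int i = 0; i < block.size(); i++) {
                for (int j = i + 1; j < block.size(); j++) {
                    if (editDistance(block.get(i), block.get(j)) <= maxDistance) {
                        pairs.add(new String[] {block.get(i), block.get(j)});
                    }
                }
            }
        }
        return pairs;
    }
}
```

The cost is then quadratic only within each block rather than across all 
1.5 million rows. The trade-off is that an error in the key itself (a 
mistyped house number, say) puts two matching addresses in different blocks, 
so a real run would probably want more than one key.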

I have put up a slice of the data (CSV files), some scripts I use to clean up 
and import the data into MySQL and a basic WO app. I have tried to document 
what is going on. I am only relying on WebObjects (and so Java), Perl and 
Bourne shell and I am trying to do this cross-platform-ishly, but am working on 
Mac OS X.
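
To give a feel for the clean-up step, here is a minimal sketch of the kind 
of check that catches the illegal characters and malformed lines mentioned 
above. It is written in Java for illustration and is not one of the scripts 
in the project. The class name, the delimiter argument and the expected 
column count are all assumptions, and the split is deliberately naive: it 
does not handle quoted fields that contain the delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: flag dump lines that cannot be imported as-is.
final class DumpLineCheckSketch {

    enum Problem { OK, ILLEGAL_CHARACTER, WRONG_COLUMN_COUNT }

    // Classifies one line. A tab is tolerated because it may be the delimiter.
    static Problem check(String line, String delimiter, int expectedColumns) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean control = Character.isISOControl(c) && c != '\t';
            // U+FFFD is what a decoder leaves behind for undecodable bytes.
            if (control || c == '\uFFFD') {
                return Problem.ILLEGAL_CHARACTER;
            }
        }
        // The -1 limit keeps trailing empty fields so they are counted too.
        int columns = line.split(Pattern.quote(delimiter), -1).length;
        return columns == expectedColumns ? Problem.OK : Problem.WRONG_COLUMN_COUNT;
    }

    // Keeps only the lines that pass both checks.
    static List<String> keepClean(List<String> lines, String delimiter, int expectedColumns) {
        List<String> clean = new ArrayList<>();
        for (String line : lines) {
            if (check(line, delimiter, expectedColumns) == Problem.OK) {
                clean.add(line);
            }
        }
        return clean;
    }
}
```

In practice the rejected lines are worth writing to a separate file rather 
than dropping, since they are often where the interesting records are.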

I have built and taken apart parts of this Edsel before and, since I am 
interested in this for citizen advocacy reasons and not to make money, I 
thought I would try the open-source route.

The state of California keeps saying that they cannot afford to do anything 
with the data. I think it would be cool to come up with an open-source solution 
and tell the SoS that they can either get a $5 million proposal from KPMG for 
another bloat-ware system or they can use something that is free and which 
works. When citizens put up data of this sort, the government does tend to get 
embarrassed into doing their jobs.

Let me know if you are interested.

cheers - ray

r...@ganymede.org
