Hi, Joseph:

          What are your priorities regarding the US lobbying database?


At "http://www.senate.gov/legislative/Public_Disclosure/LDA_reports.htm", I see four links:

          * Search the Lobbying Database (LD-1, LD-2)
          * Download a Lobbying Documents Database
          * Search the Contributions Database (LD-203)
          * Downloadable Contributions Databases


Am I correct that the two "Search" links lead to databases that contain a lot of nonsense? And is the task to download the "Lobbying Documents" database, and maybe also the "Contributions" database, run a number of checks, screen out the nonsense, and create search capabilities similar to those offered at this web site but without the garbage?


I downloaded one file from "http://www.senate.gov/legislative/Public_Disclosure/database_download.htm". I see that it contains XML. I have not worked with XML much before, but from a casual perusal it doesn't look too difficult, and R has an "XML" package.
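
To make the "read the XML, then screen out the nonsense" idea concrete, here is a minimal sketch. It is written in Python only so that it is self-contained; the same steps should be possible with R's "XML" package. The element and attribute names (PublicFilings, Filing, Registrant, Client, ID, Year, Amount, RegistrantName, ClientName) are hypothetical placeholders, not the actual schema of the Senate files, and the screening rule is just one illustrative check.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the real Senate files may use different
# element and attribute names.
SAMPLE = """\
<PublicFilings>
  <Filing ID="A1" Year="2009" Amount="50000">
    <Registrant RegistrantName="Example Lobbying LLC"/>
    <Client ClientName="Example Client Inc"/>
  </Filing>
  <Filing ID="A2" Year="2009" Amount="">
    <Registrant RegistrantName=""/>
    <Client ClientName="Another Client"/>
  </Filing>
</PublicFilings>
"""


def parse_filings(xml_text):
    """Turn each (hypothetical) <Filing> element into a flat dictionary."""
    root = ET.fromstring(xml_text)
    rows = []
    for filing in root.iter("Filing"):
        registrant = filing.find("Registrant")
        client = filing.find("Client")
        rows.append({
            "id": filing.get("ID", ""),
            "year": filing.get("Year", ""),
            "amount": filing.get("Amount", ""),
            "registrant": registrant.get("RegistrantName", "")
            if registrant is not None else "",
            "client": client.get("ClientName", "")
            if client is not None else "",
        })
    return rows


def is_usable(row):
    """Illustrative check: keep rows with a registrant name and a numeric amount."""
    return bool(row["registrant"].strip()) and row["amount"].strip().isdigit()


rows = parse_filings(SAMPLE)
clean = [r for r in rows if is_usable(r)]
```

Here "rows" holds every filing as read and "clean" holds only those that pass the check. The real checks would have to come from whatever the researchers consider garbage in these data.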


Also, do you have a list of publications by others who have done things with these data? I'd like to contact them to find out what tools they have that they'd be willing to share, the priorities they would suggest for a project like this, etc. The project already exists on R-Forge at "https://r-forge.r-project.org/R/?group_id=84". Currently, that contains only a very brief statement of intent. However, that's clear evidence that I've done something, and it's available in an environment that would support collaboration from others who might be interested in contributing.


I thought I'd first ask interested researchers for their input on priorities and the circumstances under which they might use and even contribute to a project like this. I also plan to review the 41 packages contributed to the Comprehensive R Archive Network (CRAN) with "political science" mentioned on a help page. Some of those identify political science professors, whom I plan to contact with similar questions. After I've done this, I plan to send a broader invitation to "R-help@r-project.org" to see if I can get volunteers there. With a modest amount of luck, this will generate both advice on the most important things to consider here AND volunteers to help produce the tools needed to make it all happen.


          Comments?
          Best Wishes,
          Spencer


p.s. A journey of a thousand miles can be achieved in a year at 3 miles per day or 20 miles per week.


#######################################


Does the database you identified (http://www.senate.gov/legislative/Public_Disclosure/LDA_reports.htm) pertain to all branches of government (Senate, House, executive) or only the US Senate?


I ask, because I'd like a terse name for the project like "USSenateLobbying" or just "USlobbying". Which more accurately describes these data? Or would you recommend something different? (The name should not include blank space, though it can include a period ".".)


I recommend we create the desired software using, at least in part, the free, open-source language R (www.r-project.org). I propose we structure the code in a "package" to be developed in a Subversion repository on R-Forge (r-forge.r-project.org) and submitted to the Comprehensive R Archive Network (CRAN). I have substantial experience with R, CRAN, and R package creation, including using R-Forge.


      Thanks,
      Spencer


p.s. R is the language of choice for a large and growing number of people engaged in new statistical algorithm development, with almost 3700 contributed packages currently downloadable from any of 84 mirrors in 38 countries. I like it partly because it promotes good development practices by encouraging simultaneous development of documentation and code. Creating a package on R-Forge makes it easy to involve a team of volunteers, none of whom ever need to meet face to face. We can start as soon as we have a name. After that, we can notify developers of other R packages designed for political science applications to seek their suggestions and possible collaboration. With a little luck we may be able to obtain help from professors and similar researchers at Harvard, Stanford and elsewhere.


--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
