Archives for category: Data

Hey y’all (currently in Atlanta, y’know?),

So the impression is that this blog is done and gone, and my work on the Clear Congress Project is also finished.  Not so! I’ve made a number of improvements over the past 2 months. I also want to explicitly outline some of the features I hope to implement soon!

Features Added

First I’ll talk about the passive interface elements I’ve added. The most important is the legend on the right-hand side, which provides some immediate explanation. I also think it’s important to give the user some simple initial directions, since it may be hard to tell that the scatter plot can be interacted with; I will likely change the cursor CSS for the entire canvas to imply more interactivity. I also added midlines across the chart to create quadrants, and I will likely add an option to show or hide these. Finally, I changed the background color to black. I think it makes the details window pop more and makes the graphic a bit more dramatic. I want to give the user the ability to switch between black and white, and also to provide a color-blind viewing option, since color blindness affects around 2% of people and almost 8% of all men.
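For the quadrant midlines, here’s a minimal sketch of the underlying geometry — the function name and coordinate fields are my own illustration, not the actual CCP code:

```javascript
// Hypothetical helper: compute the two midlines that split the chart
// into quadrants, so the draw loop can render them (e.g. with
// Processing.js line() calls).
function quadrantLines(width, height) {
  return {
    vertical:   { x1: width / 2, y1: 0,          x2: width / 2, y2: height },
    horizontal: { x1: 0,         y1: height / 2, x2: width,     y2: height / 2 }
  };
}
```

Making these optional then just means skipping the two line() calls when the toggle is off.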

On the interactive side of things, I implemented a few viewing options, such as a jitter/reset option, as well as the ability to show/hide labels and the network graph. I’m still having some performance issues when collision is enabled, particularly in Firefox. I also added the ability to capture an image of the current state of the graphic. I felt it was necessary to add a time element at the top of the canvas so that each captured image is automatically placed in a temporal context. Currently it uses the viewer’s local system time, but I will probably standardize on Eastern time eventually. I haven’t implemented any new filtering options yet, but that leads me into the next section.
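On the capture feature: an HTML5 canvas exposes toDataURL(), which encodes the current frame as a data URI. A hedged sketch of pairing that with a timestamp — the helper and its field names are my own, not CCP’s actual code:

```javascript
// Hypothetical capture helper. `canvas` is assumed to be the element the
// Processing.js sketch draws into; toDataURL() snapshots the current
// frame as a PNG data URI. The timestamp gives each capture its
// temporal context.
function captureFrame(canvas, now) {
  return {
    time: now.toISOString(),             // client time for now; could be pinned to Eastern later
    image: canvas.toDataURL('image/png')
  };
}
```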

What’s To Come

First, let’s talk filters. I plan on cleaning up the interface, turning each element into a button instead of a form checkbox. This will be my first big change. Then I plan on adding more filters. Lots more. So many that I’ll need to divide them up accordion-style. First I want to add some flexible sliding-bar filters for the derived attributes: the partisanship score and the leader-follower score. I also want to add sliding bars for experience (in years as a legislator) and for age. I’d love to add income or wealth at some point as well, but that will require integrating a new API, so it’s likely a long-term goal. Finally, I’d like the ability to filter out everyone except those connected to the currently revealed network.
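A sliding-bar filter like the ones described above boils down to keeping members whose derived score falls inside the slider’s current interval. A minimal sketch, with field names of my own invention:

```javascript
// Hypothetical range filter for derived attributes such as a
// partisanship or leader-follower score. `field` names the attribute;
// [lo, hi] is the slider's current interval.
function filterByRange(members, field, lo, hi) {
  return members.filter(function (m) {
    return m[field] >= lo && m[field] <= hi;
  });
}
```

Stacking several such filters (partisanship, age, experience) is then just chaining calls on the same array.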

Now, the largest feature I HAVE to implement is the ability to view changes through time. As one of the few people who checks the view on a daily basis, I’ve found the evolution over the past few months astonishing. Basically, the Republicans’ legislative stonewalling has pushed the entire House further and further to the right, with a large number of Democrats now crossing the center partisanship line, some dramatically so. Being able to view these changes fluidly over time will have an incredible impact on the strength of the application, while at the same time creating a complete 365-image-per-year archive! Yes, I’m excited about this one. You should be too! I hope to complete it by the end of the summer, maybe sooner if I can get someone to help me out!


I plan on blogging regularly starting today, likely linking an image from Clear Congress Project to something I’ve read or some relevant news story. Just a heads-up.

I spent a few hours last night confirming that it is indeed fairly easy to integrate JavaScript elements into a Processing program using Processing.js. I was able to pull info from an API with PHP, turn it into a global JS variable, and have Processing access it, no problem!
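The pattern, roughly: PHP fetches the API response and echoes it into the page as a JS global, and the Processing.js sketch reads that global directly. A minimal sketch of the JS side — the variable and field names here are placeholders, not CCP’s actual ones:

```javascript
// In the page, PHP would emit something along the lines of:
//   <script>var congressData = <?php echo $json_from_api; ?>;</script>
// Here a literal stands in for that PHP-injected value.
var congressData = { legislators: [
  { name: 'Rep. A', party: 'D' },
  { name: 'Rep. B', party: 'R' },
  { name: 'Rep. C', party: 'D' }
] };

// The Processing.js sketch can then read the global, e.g. via a helper
// that tallies members by party for coloring nodes.
function partyCounts(data) {
  var counts = {};
  data.legislators.forEach(function (m) {
    counts[m.party] = (counts[m.party] || 0) + 1;
  });
  return counts;
}
```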

I then spent a few more hours poking through more data and testing out some APIs. One thing I realized is that it’s impractical for me to pull all my data from an API. The only case in which I might use one is if I attempt to add a real-time element to my viz, in which case I’ll use Sunlight’s new Real Time Congress API.

As for my other data sources, I’ve decided to pull the dumps from Transparency Data and Sunlight and then modify those CSV files to suit my needs. I will then load them into a database, most likely accessing the data from SQL with PHP, transforming it into JS, and then plugging it into my Processing code. As for historical data, if I decide to go that route, it appears I have no choice but to use the bulk XML source data, and it’s a huge amount (16GB!), so I will probably try to avoid this if possible.
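The CSV-massaging step above can be sketched as turning a dump’s header row plus data rows into records before they ever hit the database. This is a hypothetical helper of my own, not the actual pipeline, and it assumes simple unquoted CSV:

```javascript
// Hypothetical CSV-to-records step: split a simple (unquoted) CSV dump
// into objects keyed by the header row, ready for loading into SQL or
// handing to the front end.
function csvToRecords(text) {
  var lines = text.trim().split('\n');
  var headers = lines[0].split(',');
  return lines.slice(1).map(function (line) {
    var cells = line.split(',');
    var record = {};
    headers.forEach(function (h, i) { record[h] = cells[i]; });
    return record;
  });
}
```

A real dump with quoted or embedded-comma fields would need a proper CSV parser rather than a naive split.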

One nice thing about this approach is that I will be able to mount a “two-pronged attack” going forward. I will spend half my time getting the back end set up, massaging data, etc. The other half of my time will be spent on the front-end interface, using placeholder data. Then, sometime soon, the front end and back end will “meet in the middle.” After tying the two together, I can move on to testing, evaluation, and refinement.
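For the front-end prong, placeholder data just needs to match the shape the back end will eventually deliver. A sketch with fields I’ve invented for illustration:

```javascript
// Hypothetical placeholder generator: fake legislator records whose
// shape mirrors what the back end will eventually serve, so front-end
// work isn't blocked on the data pipeline. Deterministic values make
// layout bugs reproducible.
function makePlaceholders(n) {
  var members = [];
  for (var i = 0; i < n; i++) {
    members.push({
      id: i,
      party: i % 2 === 0 ? 'D' : 'R',
      partisanship: (i / n) * 2 - 1   // spread evenly across [-1, 1)
    });
  }
  return members;
}
```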

Spent a few hours last night swimming through the sea of data available on the political process, examining various sources. It appears a lot of places draw from the same source data. Its maintainers have done a great job of keeping up a huge database of congressional data (including historical data!), and it’s mostly in XML and maintained BY HAND (quite a task). Sunlight’s Drumbone API uses this data, and likely, so will I. I also found another fantastic source of bulk data at Transparency Data. This means I potentially will not have to rely on an API at all.

I’ve also realized that, apart from financial contributions and earmark costs, I’m having trouble finding interesting quantitative data (most of it is nominal). I might have to integrate data from other sources. For partisan/ideology scores, I will probably look to the Cook PVI (Partisan Voting Index) or the work done at VoteView. I’ll keep my eye out for more in the coming weeks. If I implement the system properly, adding data sources late in the game shouldn’t cause too many problems.

As I mentioned in my proposal, I expect to rely heavily on Sunlight Labs for my basic congressional data. Specifically, I will probably use the Sunlight Congress API. Another option from the same source is to download the latest API dump, which might provide better performance and ultimately be more manageable, since I can dump it fairly easily into an SQL format.

To access the data in the API, I plan on leaning heavily on the Sunlight Labs PHP Library. Not only does it have methods and classes for Sunlight, but also for some great data from OpenSecrets, such as lobbyist details, reps’ financial transactions, and various fundraising and campaign-contributor information. What a boon!

Sunlight Labs also provides access to their Drumbone API, which they developed for their free “Congress” Android app. This looks especially useful for finding information on particular legislation or policy.

Also, GovTrack has a huge pile of source data. It might be a little more difficult to sift through, but it may ultimately prove to be the best data source available.

I will spend the next week determining how to organize this data and how to tackle the problem of historical data… It’s unclear how easy it will be to access through the API, and if it isn’t, I’ll either have to abandon the historical perspective or identify an alternative data-collection method.
