The City of Chicago makes many government data sets publicly available through its data portal. One of those data sets contains the records from food inspections conducted throughout the city from January 2010 forward, and it is updated weekly (the version I worked from was last updated August 21st, 2014).
There is a column in this data set that explains what prompted the particular inspection, with possible values being “canvass” (basically random selection), “complaint” (a patron of the venue complained), “sickness” (a special case of complaint, a patron believes he/she contracted food poisoning from the venue), etc. I thought it might be interesting to use this to try to find out which specific violations are correlated with food poisoning.
The data set lists 45 different violations that a restaurant could have. I condensed these 45 separate violations into a few larger categories. I wanted to see which categories are more prevalent during inspections prompted by sickness than during random canvass inspections. Here is a chart showing my findings:
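For anyone who wants to try this comparison themselves, here is a rough pandas sketch of the idea, not the exact code behind the chart. It assumes the portal export has an "Inspection Type" column and a "Violations" column in which each inspection's violations are separated by "|" and each begins with its number (for example "33. ..."). The CATEGORY_BY_NUMBER mapping is purely hypothetical; the real grouping of the 45 violations would go in its place.

```python
import pandas as pd

# Hypothetical mapping from violation number to a broad category.
# These assignments are placeholders, not the grouping used for the chart.
CATEGORY_BY_NUMBER = {
    3: "temperature",
    6: "hygiene",
    18: "pests",
    33: "equipment",
}


def category_shares(df, inspection_col="Inspection Type", violations_col="Violations"):
    """Share of violations per category, computed within each inspection type."""
    rows = df[[inspection_col, violations_col]].dropna()
    # Split the "|"-separated text so that each violation gets its own row.
    rows = rows.assign(violation=rows[violations_col].str.split("|"))
    rows = rows.explode("violation").reset_index(drop=True)
    # Pull the leading violation number out of text such as "33. SOME TITLE - Comments: ...".
    number = rows["violation"].str.extract(r"^\s*(\d+)\.", expand=False)
    rows = rows.assign(number=pd.to_numeric(number, errors="coerce"))
    rows = rows.dropna(subset=["number"])
    # Anything not covered by the mapping falls into "other".
    category = rows["number"].astype(int).map(CATEGORY_BY_NUMBER).fillna("other")
    rows = rows.assign(category=category)
    # normalize="index" makes each inspection type's row sum to 1.
    return pd.crosstab(rows[inspection_col], rows["category"], normalize="index")
```

Calling category_shares on the full export and comparing the sickness row against the canvass row gives the kind of side-by-side view described above. The exact inspection type labels in the export may differ from the ones quoted in this post, so it is worth checking them first.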
Probably not surprisingly, the violations that seem to be the most likely cause of sickness are hygiene violations (not washing hands, not wearing hairnets, etc.). From this, I was curious which venues are the most and least likely to make people sick. Here is a bar chart showing venue types ranked from the most to the fewest hygiene violations:
Link to Viz (at this link you can hover over each region to see the zip code and the percentage of the violations at that zip code that were due to hygiene)
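The percentages in these charts (the share of violations in a group that were due to hygiene) can be sketched roughly as follows. This assumes a table with one row per violation, a "category" column like the one described earlier, and a grouping column such as "Zip" or "Facility Type"; those column names and the min_count cutoff are my assumptions for illustration, not part of the original analysis.

```python
import pandas as pd


def hygiene_share(violations, group_col, category_col="category", min_count=1):
    """Percentage of each group's violations that are hygiene violations."""
    is_hygiene = violations[category_col].eq("hygiene")
    grouped = is_hygiene.groupby(violations[group_col])
    summary = pd.DataFrame({
        "violations": grouped.size(),          # total violations in the group
        "hygiene_pct": grouped.mean() * 100,   # share of them that are hygiene
    })
    # Optionally drop groups with very few violations, whose percentages are noisy.
    summary = summary[summary["violations"] >= min_count]
    return summary.sort_values("hygiene_pct", ascending=False)
```

Passing "Zip" as group_col gives per-zip-code numbers like those in the map, and passing a venue type column gives a ranking like the bar chart.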
Given this, it looks like you are safer from food poisoning if you eat at a gas station in Portage Park than if you eat at a coffee shop near Goose Island. Who would have thought? I think it’s also a bit surprising how high health-focused venues (like juice bars and cafes in fitness centers) rank in hygiene violations. Perhaps not so healthy after all!
For my next post I’m hoping to merge a few other datasets into this data keyed on zip code in an attempt to “profile” each neighborhood based on characteristics like number of 311 calls, types of restaurants, and whatever else I can pull from the data portal, so stay tuned!