Pedestrian fatality data from 2001-2021 and how the time of fatal crashes varies over the day
There is a pedestrian fatality crisis in the United States. After years of progress, the number of people killed while walking started rising again around 2010, contrary to what’s happening outside of the US. A recent article in the New York Times took a closer look at the fatality data and made a convincing case that the crisis is specifically a nighttime crisis. A chart from that article shows the peak of pedestrian fatalities shifting over the course of the year, presumably corresponding with when the sun rises and sets.
It occurred to me and my fellow data nerd Austin Griesbach that we could look explicitly at sunrise and sunset times and how they correlate with the time of a crash. This turned out to be a more complicated process than expected, and I document my part of it at the end of the article, in the Methods section. Most people probably won’t care about that though, so let’s dive into our findings.
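A quick note for readers following along with the code: the charts below plot a column called `time_from_sunset_min`, converted to hours. Here is a minimal sketch of how such a value can be derived and converted, assuming the column is stored as a `difftime` (which the `as.numeric(..., units = "hours")` calls suggest). The two timestamps are made up for illustration and are not taken from FARS; the actual data preparation is the one described in the Methods section.

```r
# Hypothetical crash and sunset timestamps (made-up values, both in UTC)
crash_time  <- as.POSIXct("2021-11-05 23:32:00", tz = "UTC")
sunset_time <- as.POSIXct("2021-11-05 22:47:00", tz = "UTC")

# Signed time difference: positive means the crash happened after sunset
time_from_sunset_min <- difftime(crash_time, sunset_time, units = "mins")

# Convert the difftime to a plain number of hours for plotting
hours_from_sunset <- as.numeric(time_from_sunset_min, units = "hours")
print(hours_from_sunset)
```

A crash before sunset would produce a negative value, which is why the x-axis of the charts runs from negative to positive hours.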
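library(tidyverse)  # dplyr::filter() and ggplot2
library(sf)         # st_drop_geometry()
library(hrbrthemes) # theme_ipsum()
# load cleaned data. Data cleaning is documented in the Methods section at the end of this article
peds <- readRDS("data/peds_with_sunset.rds")
# drop the sf geometry column; the charts below only need the attributes
peds <- peds |>
  st_drop_geometry()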
peds |>
filter(YEAR > 2000) |>
filter(abs(as.numeric(time_from_sunset_min, units = "hours")) < 6.125) |>
ggplot(aes(as.numeric(time_from_sunset_min, units = "hours"))) +
geom_histogram(binwidth = 1 / 4) +
geom_vline(xintercept = 0, color = "red", alpha = 0.3, linewidth = 1) +
# facet_wrap(~YEAR) +
ylab("Pedestrian fatalities") +
xlab("Hours from sunset") +
labs(
title = "Crashes are highly concentrated right after sunset",
subtitle = "Fatal pedestrian crashes in the US, 2001-2021; 15-minute bins",
caption = "Data: Fatality Analysis Reporting System (FARS)\nVisualization: Harald Kliems"
) +
theme_ipsum(base_family = "Roboto Condensed") +
annotate(geom = "text", x = .5, y = 5400, label = "Peak about 20-50\n minutes after sunset", family = "Roboto Condensed") +
annotate("segment",
x = 22.5 / 60, xend = 52.5 / 60, y = 4950, yend = 4950,
arrow = arrow(ends = "both", angle = 90, length = unit(.2, "cm"))
) +
annotate(geom = "text", x = -2.8, y = 2000, label = "No marked increase right\nbefore/after sunset", hjust = 0, family = "Roboto Condensed") +
annotate(geom = "text", x = 3.3, y = 3000, label = "Gradual decline for several \nhours after sunset", family = "Roboto Condensed") +
geom_curve(x = -1, xend = -.05, y = 2000, yend = 800, curvature = -.3, arrow = arrow(type = "closed", length = unit(.2, "cm"))) +
geom_curve(x = 1.5, xend = 4.5, y = 3900, yend = 1690, arrow = arrow(type = "closed", length = unit(.2, "cm")), curvature = .2)
This chart is stunning: The number of fatal pedestrian crashes is fairly flat for several hours before sunset. Crashes don’t start going up right before sunset or in the first few minutes after it, when the sun is low, even though news reports frequently mention glare as a factor contributing to crashes. But within less than an hour of sunset, pedestrian fatalities spike. And they spike a lot. In the 15 minutes right before/after sunset, 683 pedestrians were killed. In the 15-minute bin 45 minutes later, that number is 4802, more than 7 times as many! The spike is so pronounced that it looks like an error. But we checked and re-checked the data and could not find any errors in the analysis (see also below for some data checks). After the peak within the first hour after sunset, the number of crashes declines. The decline is gradual, and the number of crashes remains much higher than before sunset.
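To make the arithmetic behind those numbers concrete, here is a small sketch of how crashes can be sorted into 15-minute bins centered on sunset, and how the "more than 7 times" figure follows from the two counts above. The `minutes_from_sunset` vector is made-up toy data, and the rounding rule is an assumption about how the histogram's default bins line up (centered on multiples of the bin width), not code taken from the analysis.

```r
# Toy data (hypothetical, not FARS): minutes between crash and sunset,
# negative values are before sunset
minutes_from_sunset <- c(-50, -20, -5, 3, 25, 31, 38, 44, 47, 52, 70, 95)

# Assign each crash to a 15-minute bin centered on a multiple of 15 minutes,
# so bin 0 covers roughly -7.5 to +7.5 minutes around sunset
bin_center <- round(minutes_from_sunset / 15) * 15
bin_counts <- table(bin_center)
print(bin_counts)

# Ratio of the two counts reported in the text
sunset_bin <- 683   # the 15 minutes right before/after sunset
peak_bin   <- 4802  # the bin 45 minutes later
peak_ratio <- peak_bin / sunset_bin
print(round(peak_ratio, 2))  # about 7.03
```

With this rule, the bin labelled 45 spans 37.5 to 52.5 minutes after sunset, which matches the right-hand end of the bracket drawn on the chart.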
What does the pattern look like for sunrise? It is largely a less dramatic version of the sunset chart.
peds |>
filter(YEAR > 2000) |>
filter(abs(as.numeric(time_from_sunrise_min, units = "hours")) < 6.125) |>
ggplot(aes(as.numeric(time_from_sunrise_min, units = "hours"))) +
geom_histogram(binwidth = 1 / 4) +
geom_vline(xintercept = 0, color = "red", alpha = 0.3, linewidth = 1) +
ylab("Pedestrian fatalities") +
xlab("Hours from sunrise") +
ylim(c(0, 5500)) +
labs(
title = "Pedestrian fatalities have a less pronounced peak before sunrise",
subtitle = "Fatal pedestrian crashes in the US, 2001-2021; 15-minute bins",
caption = "Data: Fatality Analysis Reporting System (FARS)\nVisualization: Harald Kliems"
) +
theme_ipsum(base_family = "Roboto Condensed") +
annotate(geom = "text", x = -.5, y = 2450, label = "Peak about 20-50\n minutes before sunrise", family = "Roboto Condensed") +
annotate("segment",
x = -22.5 / 60, xend = -52.5 / 60, y = 1980, yend = 1980,
arrow = arrow(ends = "both", angle = 90, length = unit(.2, "cm"))
) +
annotate(geom = "text", x = 1.8, y = 2000, label = "Local minimum\naround sunrise", family = "Roboto Condensed") +
annotate(geom = "text", x = -3.1, y = 1600, label = "Gradual increase starting\n ~2 hours before sunrise", family = "Roboto Condensed") +
geom_curve(x = 1, xend = .05, y = 2000, yend = 800, curvature = .3, arrow = arrow(type = "closed", length = unit(.2, "cm"))) +
geom_curve(x = -2.5, xend = -1.1, y = 820, yend = 1700, arrow = arrow(type = "closed", length = unit(.2, "cm")), curvature = .2)
Pedestrian fatalities start increasing about 3 hours before sunrise, peak 20-50 minutes before sunrise, decline rapidly right before and after sunrise, and then stay relatively flat. The peak 15-minute bin has fewer than 1800 fatalities, and that is only about 4 times as many as in the low period around sunrise.
Rather than using bar charts to show the crashes in relation to sunset and time, Austin put all the data together and added it to an animated map.