Data

Data Description

Our primary dataset is the Fatal Police Shooting Data collected by The Washington Post.
- The data relies primarily on news accounts, social media posts, and police reports.
- The data is collected mainly to analyze the circumstances of fatal shootings and the overall demographics of the victims.
- The dataset and detailed description can be found here.

TidyCensus Data
- It contains geographic information for the US, which we use for a spatial merge with the primary dataset and for geographic visualization.
- It is retrieved with the tidycensus R package.

ACS 2015-2019 SAIPE (Small Area Income and Poverty Estimates) Data
- It contains county-level income and poverty information for different age groups, which we use to analyze the impact of economic status on each shooting case.
- The dataset and detailed description can be found here.

State Gun Ownership Data
- It contains gun ownership information for each state, which we use to analyze the impact of state gun ownership on each shooting case.
- The dataset and detailed description can be found here.

Year –> year in which each row's information was collected
State –> numeric code of each state
County.ID –> numeric code of each county
All.Ages.in.Poverty.Count –> number of people of all ages living under the poverty line in each county
All.Ages.in.Poverty.Percent –> percentage of people of all ages living under the poverty line in each county
Median.Household.Income.in.Dollars –> median household income in dollars for each county
state –> letter code of each state
County –> name of each county

State –> name of each state
gunOwnership –> proportion of people who own one or more guns in each state
totalGuns –> total number of guns in each state
state –> letter code of each state (manually added for joining with the primary dataset)
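
The manually added `state` column and the join it enables can be sketched as follows. This is a minimal Python illustration, not the project's own code: the `STATE_CODES` lookup is truncated to three states, the helper `add_state_code` is a hypothetical name, and the ownership rows are placeholder values rather than real figures.

```python
# Hypothetical lookup from state name to two-letter code (truncated for illustration).
STATE_CODES = {"Alabama": "AL", "Alaska": "AK", "Arizona": "AZ"}


def add_state_code(rows, lookup=STATE_CODES):
    """Return copies of the rows with a lowercase `state` letter-code column added."""
    coded = []
    for row in rows:
        code = lookup.get(row["State"])  # None when the name is not in the lookup
        coded.append({**row, "state": code})
    return coded


# Placeholder rows, not real figures.
ownership = [
    {"State": "Alabama", "gunOwnership": 0.5, "totalGuns": 100},
    {"State": "Alaska", "gunOwnership": 0.6, "totalGuns": 50},
]
shootings = [{"id": 1, "state": "AK"}, {"id": 2, "state": "AL"}]

# Index the ownership table by letter code, then left-join onto the shootings:
# every shooting keeps its row and gains the ownership columns when a match exists.
by_code = {row["state"]: row for row in add_state_code(ownership)}
joined = [{**by_code.get(s["state"], {}), **s} for s in shootings]
```

Keying the lookup on the letter code keeps the join a simple dictionary access, and a shooting whose state is missing from the ownership table is kept unchanged rather than dropped.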

Since our primary dataset has exact location information, we converted the data frame to a spatial object with the CRS set to EPSG:4326 (WGS 84), so that it can be merged with the geographic data provided by TidyCensus.
We selected 7 useful columns out of the 45 columns in the ACS 2015-2019 SAIPE dataset.
Since the original county name column contains both the county name and the state code, while the county column of our primary dataset contains only the county name, we removed the state codes so that the dataset is easier to join with the primary data.
Since our primary dataset does not include the state name, while the state gun ownership dataset uses only state names, we manually added a column with the letter code of each state so that the two can be merged.
Since our model focuses on whether the victims were armed with a gun, while the armed column contains 96 types of arms, we recategorized the armed column into a new column named armed_with_gun, which contains only Yes or No to indicate whether each victim was armed with a real gun.
After all tables were joined, we removed the rows that contain NA values.
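
The county-name cleaning, the `armed_with_gun` recategorization, and the NA removal described above can be sketched roughly as follows. This is a minimal Python illustration rather than the project's own code. It assumes that the state code appears as a trailing `(XX)` suffix, that a label counts as a real gun only when one of its `and`-separated parts is exactly `gun` or `guns`, and that `None` stands in for NA. The helper names and the sample rows are hypothetical.

```python
import re


def strip_state_code(county_name):
    """Drop a trailing state code, e.g. 'Autauga County (AL)' -> 'Autauga County'.

    The '(XX)' suffix format is an assumption; adjust the pattern to the real data.
    """
    return re.sub(r"\s*\([A-Z]{2}\)\s*$", "", county_name)


def armed_with_gun(armed):
    """Map a free-form `armed` label to 'Yes' or 'No' (missing labels stay None).

    Assumption: only a part that is exactly 'gun' or 'guns' counts as a real gun,
    so labels such as 'toy weapon' map to 'No'.
    """
    if armed is None:
        return None
    parts = [p.strip().lower() for p in armed.split(" and ")]
    return "Yes" if any(p in ("gun", "guns") for p in parts) else "No"


def drop_incomplete(rows):
    """Keep only rows in which no value is missing (None stands in for NA)."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Illustrative rows only; the labels are examples, not an excerpt of the dataset.
rows = [
    {"County": "Autauga County (AL)", "armed": "gun and knife"},
    {"County": "Baldwin County (AL)", "armed": "toy weapon"},
    {"County": "Bibb County (AL)", "armed": None},
]
cleaned = drop_incomplete(
    {"County": strip_state_code(r["County"]), "armed_with_gun": armed_with_gun(r["armed"])}
    for r in rows
)
```

Matching whole parts instead of searching for the substring `gun` is a deliberate choice in this sketch: it keeps replica or toy weapons out of the Yes group, which is what "armed with a real gun" requires.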
