11/12 Post 5

2021-11-12
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(tidyverse))
## Warning: package 'readr' was built under R version 4.1.2
options(warn = -1)

shooting.joined = readRDS("../../../dataset/Merge-with-County/shooting_joined_sf_obj.rds")

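# Drop column 24, then count the shootings per county. County names repeat
# across states, so the poverty count serves as a second grouping key.
# mutate() keeps one row per shooting and repeats the county count on each row.
shooting.joined1 = shooting.joined[,-24] %>% group_by(county, All.Ages.in.Poverty.Count) %>% mutate(num_events_of_county = n())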
shooting.joined1
## # A tibble: 6,283 x 24
## # Groups:   county, All.Ages.in.Poverty.Count [2,919]
##       id name     date    manner_of_death armed    age gender race  city   state
##    <int> <chr>    <chr>   <chr>           <chr>  <int> <chr>  <chr> <chr>  <chr>
##  1     3 Tim Ell~ 2015-0~ shot            gun       53 M      A     Shelt~ WA   
##  2     4 Lewis L~ 2015-0~ shot            gun       47 M      W     Aloha  OR   
##  3     5 John Pa~ 2015-0~ shot and Taser~ unarm~    23 M      H     Wichi~ KS   
##  4     8 Matthew~ 2015-0~ shot            toy w~    32 M      W     San F~ CA   
##  5     9 Michael~ 2015-0~ shot            nail ~    39 M      H     Evans  CO   
##  6    11 Kenneth~ 2015-0~ shot            gun       18 M      W     Guthr~ OK   
##  7    13 Kenneth~ 2015-0~ shot            gun       22 M      H     Chand~ AZ   
##  8    15 Brock N~ 2015-0~ shot            gun       35 M      W     Assar~ KS   
##  9    16 Autumn ~ 2015-0~ shot            unarm~    34 F      W     Burli~ IA   
## 10    17 Leslie ~ 2015-0~ shot            toy w~    47 M      B     Knoxv~ PA   
## # ... with 6,273 more rows, and 14 more variables:
## #   signs_of_mental_illness <chr>, threat_level <chr>, flee <chr>,
## #   body_camera <chr>, is_geocoding_exact <chr>, county <chr>, GEOID <int>,
## #   Year <int>, State <int>, County.ID <int>, All.Ages.in.Poverty.Count <dbl>,
## #   All.Ages.in.Poverty.Percent <dbl>,
## #   Median.Household.Income.in.Dollars <dbl>, num_events_of_county <int>
ggplot(shooting.joined1) + geom_point(aes(x = All.Ages.in.Poverty.Count, y = num_events_of_county,color=County.ID))

model0 = lm(num_events_of_county ~ signs_of_mental_illness + age + All.Ages.in.Poverty.Count + Median.Household.Income.in.Dollars, data = shooting.joined1)
summary(model0)
## 
## Call:
## lm(formula = num_events_of_county ~ signs_of_mental_illness + 
##     age + All.Ages.in.Poverty.Count + Median.Household.Income.in.Dollars, 
##     data = shooting.joined1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.4009  -1.0638  -0.1785   0.7583  22.6951 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         1.012e-01  2.629e-01   0.385    0.700    
## signs_of_mental_illnessTrue        -1.127e-01  1.222e-01  -0.922    0.356    
## age                                -8.444e-04  4.003e-03  -0.211    0.833    
## All.Ages.in.Poverty.Count           2.641e-05  1.598e-07 165.236  < 2e-16 ***
## Median.Household.Income.in.Dollars  1.920e-05  3.511e-06   5.468 4.81e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.492 on 4508 degrees of freedom
##   (1770 observations deleted due to missingness)
## Multiple R-squared:  0.8622, Adjusted R-squared:  0.862 
## F-statistic:  7050 on 4 and 4508 DF,  p-value: < 2.2e-16
model1 = lm(num_events_of_county ~ signs_of_mental_illness + age + All.Ages.in.Poverty.Count + Median.Household.Income.in.Dollars + race + flee + gender + threat_level, data = shooting.joined1)
summary(model1)
## 
## Call:
## lm(formula = num_events_of_county ~ signs_of_mental_illness + 
##     age + All.Ages.in.Poverty.Count + Median.Household.Income.in.Dollars + 
##     race + flee + gender + threat_level, data = shooting.joined1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.5111  -1.1229  -0.2985   0.8732  23.3617 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         1.057e+00  5.147e-01   2.053   0.0401 *  
## signs_of_mental_illnessTrue        -2.053e-01  1.244e-01  -1.651   0.0989 .  
## age                                -8.512e-03  4.193e-03  -2.030   0.0424 *  
## All.Ages.in.Poverty.Count           2.649e-05  1.647e-07 160.859  < 2e-16 ***
## Median.Household.Income.in.Dollars  1.835e-05  3.500e-06   5.243 1.65e-07 ***
## raceA                              -4.551e-02  4.496e-01  -0.101   0.9194    
## raceB                              -1.432e+00  2.586e-01  -5.537 3.25e-08 ***
## raceH                              -1.116e-01  2.679e-01  -0.417   0.6770    
## raceN                               9.114e-01  4.816e-01   1.892   0.0585 .  
## raceO                               1.157e-01  5.767e-01   0.201   0.8410    
## raceW                              -1.826e-01  2.436e-01  -0.750   0.4536    
## fleeCar                             3.975e-02  2.889e-01   0.138   0.8906    
## fleeFoot                           -1.461e-01  2.969e-01  -0.492   0.6226    
## fleeNot fleeing                    -5.868e-02  2.662e-01  -0.220   0.8256    
## fleeOther                           4.119e-01  3.964e-01   1.039   0.2988    
## genderM                            -6.731e-02  2.445e-01  -0.275   0.7831    
## threat_levelother                  -1.504e-01  1.114e-01  -1.350   0.1771    
## threat_levelundetermined           -2.705e-01  3.344e-01  -0.809   0.4186    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.447 on 4495 degrees of freedom
##   (1770 observations deleted due to missingness)
## Multiple R-squared:  0.866,  Adjusted R-squared:  0.8655 
## F-statistic:  1709 on 17 and 4495 DF,  p-value: < 2.2e-16
# model1 = lm(num_events_of_illness ~ signs_of_mental_illness + age + All.Ages.in.Poverty.Count, data = df)
# summary(model1)

unique(shooting.joined1$armed)
##  [1] "gun"                              "unarmed"                         
##  [3] "toy weapon"                       "nail gun"                        
##  [5] "knife"                            ""                                
##  [7] "shovel"                           "vehicle"                         
##  [9] "hammer"                           "hatchet"                         
## [11] "sword"                            "machete"                         
## [13] "box cutter"                       "undetermined"                    
## [15] "metal object"                     "screwdriver"                     
## [17] "lawn mower blade"                 "flagpole"                        
## [19] "guns and explosives"              "cordless drill"                  
## [21] "crossbow"                         "BB gun"                          
## [23] "metal pole"                       "Taser"                           
## [25] "metal pipe"                       "metal hand tool"                 
## [27] "blunt object"                     "metal stick"                     
## [29] "sharp object"                     "meat cleaver"                    
## [31] "carjack"                          "chain"                           
## [33] "contractor's level"               "railroad spikes"                 
## [35] "stapler"                          "beer bottle"                     
## [37] "unknown weapon"                   "binoculars"                      
## [39] "bean-bag gun"                     "baseball bat and fireplace poker"
## [41] "straight edge razor"              "gun and knife"                   
## [43] "ax"                               "brick"                           
## [45] "baseball bat"                     "hand torch"                      
## [47] "chain saw"                        "garden tool"                     
## [49] "scissors"                         "pole"                            
## [51] "pick-axe"                         "flashlight"                      
## [53] "spear"                            "chair"                           
## [55] "pitchfork"                        "hatchet and gun"                 
## [57] "rock"                             "piece of wood"                   
## [59] "pipe"                             "glass shard"                     
## [61] "motorcycle"                       "pepper spray"                    
## [63] "metal rake"                       "baton"                           
## [65] "crowbar"                          "oar"                             
## [67] "machete and gun"                  "air conditioner"                 
## [69] "pole and knife"                   "baseball bat and bottle"         
## [71] "fireworks"                        "pen"                             
## [73] "chainsaw"                         "gun and sword"                   
## [75] "gun and car"                      "pellet gun"                      
## [77] "claimed to be armed"              "incendiary device"               
## [79] "samurai sword"                    "bow and arrow"                   
## [81] "gun and vehicle"                  "vehicle and gun"                 
## [83] "wrench"                           "walking stick"                   
## [85] "barstool"                         "grenade"                         
## [87] "BB gun and vehicle"               "wasp spray"                      
## [89] "air pistol"                       "Airsoft pistol"                  
## [91] "vehicle and machete"              "ice pick"                        
## [93] "tire iron"                        "bottle"                          
## [95] "gun and machete"                  "knife and vehicle"
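The `armed` column takes 96 distinct values, including the empty string. As a minimal sketch of the regrouping planned under Problems & Future plan, the hypothetical helper `regroup_armed()` below collapses them into a few broad classes with keyword rules. The class names and rules are our own assumptions, not a standard coding scheme, and a value that matches several rules (e.g. "gun and knife") goes to the first matching class.

```r
library(dplyr)

# Hypothetical grouping of the free-text `armed` values into broad classes.
# The keyword rules are an assumption; case_when() uses the first rule that matches.
regroup_armed <- function(armed) {
  a <- tolower(armed)
  case_when(
    is.na(a) | a %in% c("", "undetermined", "unknown weapon",
                        "claimed to be armed")                  ~ "unknown",
    a == "unarmed"                                              ~ "unarmed",
    grepl("toy|bb gun|pellet|airsoft|air pistol|bean-bag|taser|pepper spray|wasp spray", a)
                                                                ~ "less-lethal/replica",
    a == "nail gun"                                             ~ "other",
    grepl("gun|pistol|grenade|explosive", a)                    ~ "firearm",
    grepl("knife|sword|machete|razor|cleaver|hatchet|scissors|box cutter|glass shard|sharp object|spear|pick|^ax$", a)
                                                                ~ "blade",
    grepl("vehicle|motorcycle", a)                              ~ "vehicle",
    TRUE                                                        ~ "other"
  )
}

# Usage (assumption): add the grouped column and inspect the class sizes.
# shooting.joined1$armed_group <- regroup_armed(shooting.joined1$armed)
# table(shooting.joined1$armed_group)
```

Keeping the rules in one ordered `case_when()` makes the priority between overlapping keywords explicit and easy to revise.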
plot(model1$residuals,lwd = 0.5)

plot(model1)

#Idea:
Instead of fitting a point process model, we count the shootings in each county and use each county's number of shootings as the dependent variable (y). We then try to find out how independent variables such as gender, race, flee status, and threat level influence this count and our model. There are several reasons for this choice.

First, we have the median household income and the poverty count of each county, which relate directly to what we are interested in (the potential relationship between shootings and the poverty rate).

Second, a point process is a much harder starting point for our project, since analyzing one and making predictions from it requires more time and knowledge. Third, a simple multiple linear regression is a good starting point for getting a sense of the modeling, and it can be investigated further in the future.
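The code earlier keeps one row per shooting (6,283 rows) and attaches the county count to every row, so a county with many shootings contributes many identical y values. A minimal sketch of a one-row-per-county layout is the hypothetical helper `county_counts()` below. It assumes `GEOID` identifies a county and that the poverty and income columns are constant within it; individual-level variables then have to enter as county shares (here, the share of cases with signs of mental illness). If the object still carries an sf geometry column, dropping it first (e.g. with `sf::st_drop_geometry()`) avoids unioning geometries inside `summarise()`.

```r
library(dplyr)

# Hypothetical helper: collapse event-level rows (one row per shooting) into
# one row per county. Assumes GEOID identifies a county and that the poverty
# and income columns are constant within each county.
county_counts <- function(events) {
  events %>%
    ungroup() %>%
    group_by(GEOID) %>%
    summarise(
      n_events = n(),                                            # shootings in the county
      poverty_count = first(All.Ages.in.Poverty.Count),
      median_income = first(Median.Household.Income.in.Dollars),
      share_mental_illness = mean(signs_of_mental_illness == "True"),
      .groups = "drop"
    )
}

# Usage (assumption): y is now the number of shootings per county.
# county_df <- county_counts(shooting.joined1)
# lm(n_events ~ poverty_count + median_income + share_mental_illness, data = county_df)
```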

#Analysis:
Test statistics for model1:
F-statistic: 1709 on 17 and 4495 DF
p-value: < 2.2e-16
Adjusted R-squared: 0.8655

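The adjusted R-squared is 0.8655, which is high: the model explains about 87% of the variance in the county shooting counts (a share of the variance, not of the data points). The p-value of the overall F-test is very small, so we reject the null hypothesis that all slope coefficients are zero; the model as a whole is significant. Most of the fit comes from All.Ages.in.Poverty.Count (t = 160.9): model0, with only four predictors, already reaches an adjusted R-squared of 0.862. Because a raw poverty count grows with county population, part of this association may reflect county size rather than poverty itself.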
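For reference, both summary numbers for model1 follow from its multiple R-squared (0.866), with n = 4513 rows used (6,283 minus the 1,770 deleted for missingness) and p = 17 slope coefficients:

$$
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} = 1 - 0.134 \times \frac{4512}{4495} \approx 0.8655
$$

$$
F = \frac{R^2 / p}{(1 - R^2)/(n - p - 1)} = \frac{0.866 / 17}{0.134 / 4495} \approx 1709
$$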

In the residual diagnostics, the Residuals vs Fitted plot shows the residuals staying close to the horizontal reference line. However, the Scale-Location plot and the normal Q-Q plot show a poorer fit: especially in the upper tail of the Q-Q plot, the points lie far from the line, which suggests the residuals are right-skewed rather than normal (they range from -14.5 to 23.4).
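The diagnostics discussed here come from `plot(model1)`, which by default draws four plots: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage. A small sketch (the helper name `show_lm_diagnostics()` is hypothetical) that puts all four in one 2 x 2 panel so they can be compared side by side:

```r
# Hypothetical helper: draw the four default lm diagnostic plots in one panel.
show_lm_diagnostics <- function(model) {
  old_par <- par(mfrow = c(2, 2))  # 2 rows x 2 columns of plots
  on.exit(par(old_par))            # restore the previous layout afterwards
  plot(model)                      # Residuals vs Fitted, Q-Q, Scale-Location, Residuals vs Leverage
  invisible(model)
}

# Usage: show_lm_diagnostics(model1)
```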

#Problems & Future plan:
We may later use criteria such as AIC or BIC, together with forward or backward selection, to improve our model. Another problem we met is that the “armed” variable has too many categories (96 distinct values), and some of them are unusual, such as “piece of wood” or “stapler”. We plan to regroup them into a few broader categories or simply not use the variable.
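One way to carry out the AIC/BIC selection mentioned above is base R's `step()`, which penalizes model size with `k = 2` for AIC and `k = log(n)` for a BIC-type criterion. The hypothetical wrapper `select_model()` below is a sketch assuming backward elimination from the full model1 formula. It first keeps only complete rows, because `step()` compares models that must be fitted on the same observations and stops if the number of rows in use changes.

```r
# Hypothetical wrapper: backward stepwise selection by AIC (k = 2) or BIC (k = log(n)).
select_model <- function(formula, data, criterion = c("AIC", "BIC")) {
  criterion <- match.arg(criterion)
  data <- as.data.frame(data)                       # drop tibble grouping, if any
  vars <- all.vars(formula)
  complete <- data[stats::complete.cases(data[, vars]), ]  # same rows for every candidate
  full <- lm(formula, data = complete)
  k <- if (criterion == "AIC") 2 else log(nrow(complete))
  step(full, direction = "backward", k = k, trace = 0)
}

# Usage (assumption): start from the model1 formula.
# select_model(num_events_of_county ~ signs_of_mental_illness + age +
#                All.Ages.in.Poverty.Count + Median.Household.Income.in.Dollars +
#                race + flee + gender + threat_level,
#              shooting.joined1, "BIC")
```

BIC's larger penalty (log(4513) is about 8.4) will usually keep fewer terms than AIC.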

Since our dataset covers a four-year period, we are considering making the graphs interactive with a timeline.

#Improving graphs from former posts:
We will try to find a better coloring scheme for the usmap and concentrate on some “spots” at the census-tract level.
