Intro

These are suggested solutions to mandatory exercise set 2 for Log 708, prepared using R Markdown. As usual, file paths used in the solutions must be changed to work on other PCs. I have also included some more advanced solutions where these give a better result. Beginners in R are not expected to find or use these solutions; they are included only to show that some problems can be solved more easily with newer packages than with base R.

Exercise 1

This was regular exercise 7.1; the solution is provided along with the other chapter 7 solutions. (You may need to reload the web page to make sure updates are included.)

Exercise 2

OK, let’s go:

library(stargazer) #just in case we need it. 


cardata <- read.csv("M:/Undervisning/Undervisningh21/Data/el_car_data_1.csv")

cardata$zone <- factor(cardata$zone,
                       levels = c("city", "below30", "30to60", "above60"))

# "city" is already the first level, so relevel() only makes the
# reference category explicit.
cardata$zone <- relevel(cardata$zone, ref = "city")

This was also done in the previous submission:

cardata12 <- subset(cardata, year == 2012)
cardata18 <- subset(cardata, year == 2018)

We follow the suggested way:

selectvars <- subset(cardata18, select = c(pct_elcar, sent_index, income_med_adj))

We continue with selectvars:

cor(selectvars)
##                pct_elcar sent_index income_med_adj
## pct_elcar      1.0000000  0.6291064      0.6475091
## sent_index     0.6291064  1.0000000      0.4849241
## income_med_adj 0.6475091  0.4849241      1.0000000
plot(selectvars)

We see positive correlations between all pairs of these variables. So, municipalities rated as more central tend to have higher income levels and also a higher share of el-cars. Moreover, high income goes together with a high share of el-cars. All of these findings are as expected.

This would be achieved as follows.

model1 <- lm(pct_elcar ~ sent_index + income_med_adj, data = cardata18)
stargazer(model1, type = "text")
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                              pct_elcar         
## -----------------------------------------------
## sent_index                   0.011***          
##                               (0.001)          
##                                                
## income_med_adj               0.023***          
##                               (0.002)          
##                                                
## Constant                    -18.469***         
##                               (1.102)          
##                                                
## -----------------------------------------------
## Observations                    422            
## R2                             0.549           
## Adjusted R2                    0.547           
## Residual Std. Error      2.444 (df = 419)      
## F Statistic          255.119*** (df = 2; 419)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

Both effects are highly significant; the estimates are given in the table above. An increased income of 1000 NOK means the income_med_adj variable increases by 1 unit, so the dependent variable pct_elcar will on average increase by 0.023 percentage points, with sent_index held fixed. When we compare municipalities differing by 100 000 NOK in income level, the pct_elcar difference is expected to be about 100 times larger, i.e. about 2.3 percentage points. (Like: from 4% to 6.3%)
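The scaling argument above is plain arithmetic on the coefficient. A minimal sketch, assuming only the rounded estimate 0.023 copied from the table (the variable names are made up for illustration):

```r
# Coefficient on income_med_adj, copied (rounded) from the model 1 table.
# income_med_adj is measured in units of 1000 NOK.
b_income <- 0.023

# Income differences to evaluate, in units of 1000 NOK:
# 1 unit = 1000 NOK, 100 units = 100 000 NOK.
delta_income <- c(1, 100)

# Expected difference in pct_elcar (percentage points),
# with the other regressor held fixed.
effect <- b_income * delta_income
print(effect)
```

With the fitted object at hand, the unrounded coefficient could be taken from coef(model1)["income_med_adj"] instead of being typed in.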

We can update our object.

model2 <- update(model1, . ~ . + pct_expense)
#compare models
stargazer(model1, model2, type = "text")
## 
## =====================================================================
##                                    Dependent variable:               
##                     -------------------------------------------------
##                                         pct_elcar                    
##                               (1)                      (2)           
## ---------------------------------------------------------------------
## sent_index                  0.011***                 0.010***        
##                             (0.001)                  (0.001)         
##                                                                      
## income_med_adj              0.023***                 0.020***        
##                             (0.002)                  (0.002)         
##                                                                      
## pct_expense                                          0.474***        
##                                                      (0.075)         
##                                                                      
## Constant                   -18.469***               -16.790***       
##                             (1.102)                  (1.087)         
##                                                                      
## ---------------------------------------------------------------------
## Observations                  422                      422           
## R2                           0.549                    0.588          
## Adjusted R2                  0.547                    0.585          
## Residual Std. Error     2.444 (df = 419)         2.338 (df = 418)    
## F Statistic         255.119*** (df = 2; 419) 199.044*** (df = 3; 418)
## =====================================================================
## Note:                                     *p<0.1; **p<0.05; ***p<0.01

The previous estimates are almost the same in model 2, pct_expense is highly significant, and \(R^2\) goes from 0.55 to 0.59, so the fit improves.
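A more formal way to compare two nested models like these is a partial F-test with anova(). The sketch below is not part of the exercise, and it uses the built-in mtcars data as a stand-in so that it runs without the el-car file; the variables chosen there are arbitrary.

```r
# Stand-in data: the built-in mtcars data set, NOT the el-car data.
small <- lm(mpg ~ wt + hp, data = mtcars)

# Add one regressor with update(), the same way model2 was built from model1.
large <- update(small, . ~ . + qsec)

# Partial F-test of the nested models; the p-value is in column "Pr(>F)".
comparison <- anova(small, large)
print(comparison)

# R-squared before and after adding the regressor (it can never decrease).
r2 <- c(small = summary(small)$r.squared,
        large = summary(large)$r.squared)
print(r2)
```

For the models in this exercise, the corresponding call would be anova(model1, model2). With a single added variable, this F-test is equivalent to the t-test on that variable's coefficient.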

with(cardata18, plot(population, pct_elcar))

The population variable is HEAVILY skewed, with a few municipalities being very much larger than the others. This is what we termed an outlier problem, and as the plot shows, fitting a linear regression model will likely give VERY weird estimates of the effect, and can moreover distort the other estimates as well.
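One way to see why a few very large values are a problem is to look at leverage, i.e. how strongly single observations pull on the fitted line. The sketch below uses simulated, right-skewed values (an assumption, not the real population data) and compares the largest leverage for the raw variable with that for its logarithm.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Simulated, heavily right-skewed "population" values (NOT the real data).
n <- 400
population <- rlnorm(n, meanlog = 8.5, sdlog = 1.2)
y <- rnorm(n)  # placeholder response; leverage depends only on the regressor

# Leverage (hat values) with the raw and with the log-transformed regressor.
lev_raw <- hatvalues(lm(y ~ population))
lev_log <- hatvalues(lm(y ~ log(population)))

# Largest leverage in each case, next to the average leverage 2/n.
print(c(raw = max(lev_raw), log = max(lev_log), average = 2 / n))
```

A maximum leverage far above the average means that one or a few observations largely determine the estimated slope.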

  1. We can do this:
model3 <- update(model2, . ~ . + zone)
stargazer(model1, model2, model3, type = "text", keep.stat = c("n", "rsq"))
## 
## ==============================================
##                      Dependent variable:      
##                -------------------------------
##                           pct_elcar           
##                   (1)        (2)        (3)   
## ----------------------------------------------
## sent_index      0.011***   0.010***  0.006*** 
##                 (0.001)    (0.001)    (0.001) 
##                                               
## income_med_adj  0.023***   0.020***  0.013*** 
##                 (0.002)    (0.002)    (0.002) 
##                                               
## pct_expense                0.474***  0.414*** 
##                            (0.075)    (0.068) 
##                                               
## zonebelow30                           -1.884  
##                                       (1.163) 
##                                               
## zone30to60                           -5.375***
##                                       (1.095) 
##                                               
## zoneabove60                          -7.224***
##                                       (1.075) 
##                                               
## Constant       -18.469*** -16.790*** -3.360** 
##                 (1.102)    (1.087)    (1.661) 
##                                               
## ----------------------------------------------
## Observations      422        422        422   
## R2               0.549      0.588      0.681  
## ==============================================
## Note:              *p<0.1; **p<0.05; ***p<0.01

So while “city” is the reference category, we estimate corrections to the pct_elcar level for the other zones, and we get a non-significant effect for zone “below30” (as expected?). For the zones further away we get significant and large corrections of about -5.4 and -7.2 percentage points, respectively. Basically, the closer we are to the major cities, the higher the elcar density becomes. Note that the zone variable is quite strongly related to the sent_index variable, and in model 3 the effect of sent_index is reduced. To some extent, the same goes for income.
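To make the role of the reference category concrete, the sketch below shows the dummy columns R builds for a factor with “city” as the first level, and then uses the zone estimates copied (rounded) from the model 3 table to compare two non-reference zones. The small zone vector is made up for illustration.

```r
# One made-up observation per zone, with "city" as the reference (first) level.
zone <- factor(c("city", "below30", "30to60", "above60"),
               levels = c("city", "below30", "30to60", "above60"))

# Design matrix: an intercept plus one 0/1 column per non-reference zone.
X <- model.matrix(~ zone)
print(X)

# Zone corrections copied from the model 3 table; "city" is 0 by construction.
zone_effects <- c("city" = 0, "below30" = -1.884,
                  "30to60" = -5.375, "above60" = -7.224)

# Expected difference between two non-reference zones,
# other variables held fixed (percentage points).
diff_far <- unname(zone_effects["above60"] - zone_effects["30to60"])
print(diff_far)
```

The “city” row has zeros in all three dummy columns, which is why each zone coefficient reads as a difference from the city level.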

Ok, let’s try.

  1. Public transport lanes are more or less only found in-and-out of major cities in Norway, so the incentive for el-cars to freely use such lanes should be effective mainly in the two “inner” zones. So, some of the high pct_elcar levels in “city” and “below30” zones could be explained by this incentive working.

  2. Additionally, there are (to my knowledge) toll road stations on inbound/outbound roads of all the major cities. So another part of the observed zone effect could be attributed to the incentive allowing el_cars to pass for free.

More on 2. and 3.: The pct_expense variable measures to a certain extent whether people in a municipality tend to have high traveling expenses related to work. (Such expenses are to a certain degree deductible from tax in Norway.) So, where this variable is high there is likely a large proportion of people passing either toll stations or ferry crossings to and from work. Since this is free for el-cars, we could also take the highly significant effect of pct_expense on pct_elcar as a sign that incentives 2 and 3 are working.

Further inspection of (say) top 25 municipalities wrt pct_elcar would point very much in the same direction, as we see almost exclusively

I want to do this :-) So, we start with a plot,

with(cardata18, plot(log(population), pct_elcar))

There are still some large deviations, but now mainly in the vertical direction, which is less of a problem for regression models. So, it doesn’t seem all too crazy to include log(population) in the model. We try it and see what we get: we update model 3 and then compare.

model4 <- update(model3, . ~ . + log(population))

stargazer(model3, model4, type ="text", keep.stat=c("n", "rsq"))
## 
## ============================================
##                     Dependent variable:     
##                 ----------------------------
##                          pct_elcar          
##                      (1)            (2)     
## --------------------------------------------
## sent_index         0.006***        0.002    
##                    (0.001)        (0.002)   
##                                             
## income_med_adj     0.013***      0.014***   
##                    (0.002)        (0.002)   
##                                             
## pct_expense        0.414***      0.449***   
##                    (0.068)        (0.068)   
##                                             
## zonebelow30         -1.884        -0.873    
##                    (1.163)        (1.201)   
##                                             
## zone30to60        -5.375***      -4.264***  
##                    (1.095)        (1.147)   
##                                             
## zoneabove60       -7.224***      -6.275***  
##                    (1.075)        (1.112)   
##                                             
## log(population)                  0.548***   
##                                   (0.184)   
##                                             
## Constant           -3.360**      -6.434***  
##                    (1.661)        (1.942)   
##                                             
## --------------------------------------------
## Observations         422            422     
## R2                  0.681          0.688    
## ============================================
## Note:            *p<0.1; **p<0.05; ***p<0.01

So, something perhaps interesting happens. Even though \(R^2\) does not go up much, we get a significant contribution from population in logarithmic form. Compared to model 3, the centrality index (sent_index) is no longer significant, so we can say that the zone and population variables were probably what made centrality look significant at the beginning. The other coefficient estimates are similar in models 3 and 4.

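Finally, to quantify the effect of population along the lines given in the exercise: the estimate is 0.55, and log() in R is the natural logarithm, so doubling the population (+100%) raises log(population) by log(2) ≈ 0.69. We would therefore on average see an increase in pct_elcar of about 0.55 × 0.69 ≈ 0.38 percentage points when the population doubles. (Reading the coefficient directly as “0.55 percentage points per +100%” stretches the usual 1% rule of thumb, which is only accurate for small changes, and so overstates the effect.) This seems small, but to appreciate the number, recall that starting from the small municipalities, we have to double the population many times to reach the larger central municipalities.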
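The doubling effect can be computed exactly rather than read off the coefficient: since log() in R is the natural logarithm, multiplying the population by a factor k changes the prediction by the coefficient times log(k). A minimal sketch, using the rounded estimate from the model 4 table and two made-up population sizes:

```r
# Coefficient on log(population), copied (rounded) from the model 4 table.
b_logpop <- 0.548

# Exact expected change in pct_elcar (percentage points) when the
# population doubles, other variables held fixed.
effect_doubling <- b_logpop * log(2)
print(effect_doubling)

# Hypothetical sizes for a small and a large municipality (illustration only).
pop_small <- 1000
pop_large <- 500000

# Number of doublings needed to get from the small to the large one ...
n_doublings <- log2(pop_large / pop_small)
print(n_doublings)

# ... and the implied total difference in pct_elcar between them.
total_effect <- b_logpop * log(pop_large / pop_small)
print(total_effect)
```

The total difference equals the per-doubling effect times the number of doublings, which is why a modest per-doubling effect can still add up across the full range of municipality sizes.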