This exercise examines two methods of summarizing the relationship between two variables: a simple graphical analysis and the bivariate linear regression model. The application is to the relationship between infant mortality rates (IMRs) and air pollution from total suspended particulates (TSPs). The Environmental Protection Agency recently toughened the regulations that limit firms' ability to emit TSPs because of the presumed health effects of TSPs. Whether IMRs and TSPs are causally related is an issue of tremendous importance to public policy.
Feel free to work cooperatively, but each person is required to turn in their own problem set that provides the solutions in their own words.
If you become interested in this topic, you might want to read:
Chay, Kenneth Y. and Michael Greenstone. 2005. Does Air Quality Matter?: Evidence from the Housing Market. Journal of Political Economy, 113(2): 376-424.
Chay, Kenneth Y. and Michael Greenstone. 2003. Air Quality, Infant Mortality, and the Clean Air Act of 1970. MIT Department of Economics Working Paper No. 04-08.
Data Source: imrtsp71.dta and imrtsp72.dta
imrtsp71.dta is a data file from 1971. The unit of observation is the county and there are 715 observations of 21 variables.
This Stata format data file contains county-level information on the number of infant deaths per 1000 births (IMR), the natural log of this same number, TSPs concentrations, number of births, characteristics of new parents (e.g. race of mother, years of education, marital status of mother, mother's age), whether the infant is considered to have a low birth weight (an indicator of poor infant health), the month of the pregnancy in which the mother initiated prenatal care, and mean per-capita income.
The relevant variables with descriptions in quotations are:
imr71 "# inf deaths per 1000 births 71"
lnimr71 "ln(# inf death per 1000 births 71)"
mtspar71 "county-level tsps concentration, measured in micrograms per cubic meter 71"
tsp sq "the square of mtspar71"
birth71 "# births 71"
white71 "% births, white mom 71"
othr71 "% births, nonwhite/nonblack mom 71"
female71 "% female births 71"
edudad71 "mean father years of ed 71"
edumom71 "mean mother years of ed 71"
maried71 "% mother married 71"
umard71 "% mother unmarried 71"
agemom71 "mean mother age 71"
lwght71 "% births with weight <2500 g in 71"
pcare171 "% mother began prenatal care in 1st or 2nd month 71"
pcare271 "% mother began prenatal care in 3rd month 71"
pcare371 "% mother began prenatal care in 4th-6th month 71"
pcare471 "% mother began prenatal care in 7th-9th month 71"
pcinc71 "county-level per cap income 71"
location "5-digit county fips code"
fstate "2 digit state fips code"
[Note: There may be a few extra variables in the data file, but you should ignore them.]
imrtsp72.dta is structured exactly the same way except that the observations are from 1972 and all the appropriate variable names end with "72" instead of "71". Again, the unit of observation is the county and here there are 983 observations of 22 variables. DO NOT USE imrtsp72.dta in this problem set.
1. Summarize the relationship between the number of infant deaths per 1000 births and TSPs concentrations.
a. Create histograms of imr71 and lnimr71. Does either of these variables look normal? (Hint: experimenting with the number of bins and overlaying a normal curve will help with this.)
b. Graph scatter plots of imr71 and lnimr71 against mtspar71. Does it look like there is an association between infant mortality and TSPs?
c. Examine the edudad71 variable. What are the deciles of the variable? What is the average number of years of education in the top decile? Graph a scatter plot of imr71 against edudad71. Do you think that counties with more educated fathers have lower levels of infant mortality?
d. Graph scatter plots of imr71 and lnimr71 against mtspar71, but this time, weight the observations by the total number of births in the county. What is your prediction about the covariance of infant mortality rates and TSPs? Does this relationship appear linear for either form of the dependent variable?
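The decile question in part 1 can be illustrated with a minimal sketch. It is written in Python rather than Stata, and the numbers are made-up stand-ins for edudad71, not values from imrtsp71.dta. It assumes the "inclusive" interpolation rule of Python's statistics.quantiles; other packages, Stata included, may place the cut points slightly differently.

```python
import statistics

# Made-up values standing in for edudad71 (mean father's years of education).
# These are NOT taken from imrtsp71.dta.
edudad = [9.5, 10.1, 10.8, 11.0, 11.2, 11.4, 11.5, 11.7, 11.9, 12.0,
          12.1, 12.2, 12.3, 12.4, 12.6, 12.8, 13.0, 13.3, 13.9, 14.6]

# Nine cut points split the sorted data into ten groups (deciles).
cutpoints = statistics.quantiles(edudad, n=10, method="inclusive")

# The top decile holds every observation above the ninth cut point.
top_decile = [v for v in edudad if v > cutpoints[-1]]
top_decile_mean = statistics.mean(top_decile)

print("decile cut points:", [round(c, 2) for c in cutpoints])
print("mean in top decile:", round(top_decile_mean, 2))
```

The same cut points could also be used to assign each county to a decile group before building dummy variables.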
2. Background Questions
a. Does the available data allow for a determination of the causal relationship between infant mortality and TSPs? Why or why not? Describe the data file that would allow for an examination of this issue.
b. Under what assumptions is the least squares estimator the best linear unbiased estimator (BLUE)?
c. What assumption is necessary for LS to produce an unbiased estimate of the IMR/TSPs relationship? Do you think this assumption is likely to hold? If you had any data file that you wanted, how would you test whether this assumption may be valid? Describe your ideal data file. With the current data file, present some evidence as to whether this assumption is likely to hold.
d. In the bivariate linear regression model, derive the estimating equations for the intercept and slope coefficients. Derive their standard errors.
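As a starting point for the derivation in part d, the standard textbook result for the unweighted case can be sketched as follows. The notation is an assumption, not part of the problem set: y_i is the dependent variable (e.g. imr71), x_i is the regressor (e.g. mtspar71), and the variance formulas additionally assume homoskedastic, uncorrelated errors with variance \sigma^2.

```latex
% Model: y_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1, \dots, n
\begin{align*}
S(\alpha, \beta) &= \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2 \\
% First-order conditions (the estimating, or normal, equations):
\frac{\partial S}{\partial \alpha} = 0 &\;\Rightarrow\; \sum_{i=1}^{n} (y_i - \hat\alpha - \hat\beta x_i) = 0 \\
\frac{\partial S}{\partial \beta} = 0 &\;\Rightarrow\; \sum_{i=1}^{n} x_i (y_i - \hat\alpha - \hat\beta x_i) = 0 \\
% Solving the two equations:
\hat\beta &= \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2},
\qquad \hat\alpha = \bar y - \hat\beta \bar x \\
% Sampling variances under Var(\varepsilon_i \mid x) = \sigma^2:
\operatorname{Var}(\hat\beta) &= \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar x)^2},
\qquad \operatorname{Var}(\hat\alpha) = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar x)^2} \\
% Standard errors replace \sigma^2 with its estimate:
s^2 &= \frac{1}{n - 2} \sum_{i=1}^{n} (y_i - \hat\alpha - \hat\beta x_i)^2,
\qquad \operatorname{se}(\hat\beta) = \sqrt{\frac{s^2}{\sum_{i=1}^{n} (x_i - \bar x)^2}}
\end{align*}
```

A full answer would still need to show the intermediate algebra and state which assumptions each step relies on.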
3. The bivariate linear regression model of infant mortality rates and TSPs.
a. Run the regressions of imr71 on a constant and mtspar71 and of lnimr71 on a constant and mtspar71. In both cases, weight the regressions by birth71 so that larger counties have a greater influence. Interpret the parameter estimate (i.e., Beta Hat) in words; for instance, describe the effect of a 10 unit decline in TSPs on infant mortality.
b. Plot the residuals from both regressions and overlay a normal curve. Does the normality assumption appear reasonable? Does homoscedasticity of residuals hold? (Hint: graph residuals against the fitted values.)
c. Use the total sum of squares (TSS), error sum of squares (ESS), and regression sum of squares (RSS) to derive the R² statistic. Determine the components of the corrected R² statistic and show that Stata accurately calculated that statistic.
d. Determine the values of TSPs that define the deciles of TSPs. Create 10 dummy variables where each one corresponds to a decile of TSPs. For instance, an observation that has a TSPs concentration in the smallest decile would have a value of 1 for the dummy variable that corresponds to the smallest decile and a value of 0 for the other 9 dummy variables. Regress imr71 on a constant and the 10 dummy variables. Why does Stata drop one of the dummy variables? Plot the parameter estimates from the dummy variables, where the y-axis is the parameter estimate of each dummy variable and the values on the x-axis are the midpoints of the ranges that determine each of the dummy variables. Is the effect of TSPs on imr71 linear in TSPs?
e. Now regress imr71 on mtspar71 and the square of mtspar71. (Note that you will have to generate the square variable.) Plot the predicted values of this regression against mtspar71. Describe the shape of this function.
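The weighted regression and the sums-of-squares decomposition in question 3 can be illustrated with a minimal Python sketch (not Stata). The function name weighted_bivariate_ols and the data are hypothetical, and the numbers are made up rather than taken from imrtsp71.dta. It follows the problem set's naming, where ESS is the error sum of squares and RSS is the regression sum of squares. Whether the results match Stata's output exactly depends on the weight type used there, so treat that correspondence as an assumption.

```python
def weighted_bivariate_ols(x, y, w):
    """Regress y on a constant and x by weighted least squares (weights w)."""
    n, k = len(x), 2                      # k = number of estimated coefficients
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted means
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    fitted = [intercept + slope * xi for xi in x]
    # Weighted sums of squares: total, error, and regression.
    tss = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    ess = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, fitted))
    rss = sum(wi * (fi - ybar) ** 2 for wi, fi in zip(w, fitted))
    r2 = rss / tss
    adj_r2 = 1 - (ess / (n - k)) / (tss / (n - 1))
    return {"intercept": intercept, "slope": slope, "tss": tss,
            "ess": ess, "rss": rss, "r2": r2, "adj_r2": adj_r2}


# Made-up county-level numbers, NOT taken from imrtsp71.dta.
tsp = [45.0, 60.0, 72.0, 85.0, 98.0, 120.0]       # stand-in for mtspar71
imr = [17.2, 18.9, 18.1, 20.4, 21.0, 22.7]        # stand-in for imr71
births = [1200, 5400, 800, 15000, 3000, 9500]     # stand-in for birth71

fit = weighted_bivariate_ols(tsp, imr, births)
print("slope:", fit["slope"])
print("predicted change in IMR from a 10 unit decline in TSPs:", -10 * fit["slope"])
print("R-squared:", fit["r2"], "corrected R-squared:", fit["adj_r2"])
```

With equal weights the function reduces to ordinary least squares, which gives a quick way to check it against hand calculations.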