Project #83153 - Econometrics by Stata

Econ: Practice of Econometrics

Department of Economics

This problem set has 2 pages. Return a printout with your answers and an appendix with your do file.

Download the data file “mus03data.dta” from the MEPS directory in the Datasets section of Blackboard. This dataset is an extract from the 2003 Medical Expenditure Panel Survey (MEPS), used in Cameron & Trivedi (2010), Microeconometrics Using Stata (Rev. Ed.). It contains individuals aged 65 and older, all of whom have health insurance through Medicare. However, Medicare does not cover all medical expenditures; in particular, at that time it did not cover prescription drugs. Therefore, some individuals had supplemental health insurance, but not everyone did.

You will need the following variables:

totexp    Total medical expenditure
ltotexp   ln(totexp) if totexp > 0
posexp    =1 if total expenditure > 0
suppins   =1 if has supplemental private insurance
phylim    =1 if has functional limitation
actlim    =1 if has activity limitation
totchr    Number of chronic problems
age       Age
female    =1 if female
income    Annual household income/1000
famsze    Size of the family

Note that total medical expenditure as measured in the MEPS includes not only what the individual pays (out-of-pocket), but also what the insurance company or Medicare pays.


(a)  Load the data. In the income variable, the code −1 is a special kind of missing, so to avoid complications later on, set this to missing (“.”). What is the percentage of individuals who have supplemental insurance? What is the percentage of individuals who have positive medical expenditure? 

(b)  Regress posexp on suppins. Also, regress totexp on suppins. For both regressions, answer the following questions: (i) What is the interpretation of the constant and the coefficient? (ii) Is the coefficient significant? (iii) Give an explanation for the sign of the coefficient (i.e., explain why it is positive or negative).
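When the only regressor is a 0/1 dummy such as suppins, OLS has a mechanical reading: the constant is the sample mean of the outcome among those without supplemental insurance, and the coefficient is the difference in sample means between the insured and uninsured groups (for posexp, both are proportions). The sketch below checks this identity on simulated data; the variable names and data-generating values are assumptions for illustration only, say nothing about the MEPS results, and the code is Python/NumPy rather than the Stata do file the assignment asks for.

```python
import numpy as np

# Simulated stand-ins (assumptions, NOT the MEPS extract):
# d plays the role of suppins, y the role of posexp or totexp.
rng = np.random.default_rng(0)
n = 1000
d = rng.integers(0, 2, size=n)            # binary regressor (0/1)
y = 2.0 + 0.5 * d + rng.normal(size=n)    # any outcome works for the identity

# OLS of y on a constant and d
X = np.column_stack([np.ones(n), d])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
const, slope = b

# Group means of the outcome
mean0 = y[d == 0].mean()
mean1 = y[d == 1].mean()

print(const, mean0)          # intercept equals the mean of y in the d = 0 group
print(slope, mean1 - mean0)  # slope equals the difference in group means
```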

(c) The remaining questions use only the observations that have no missing values on ltotexp and income, so drop any observations with missing values on these variables. Regress ltotexp on suppins. What is the interpretation of the coefficient? (For answering this question, you may pretend the coefficient is “small”.)
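The hint about a “small” coefficient refers to the usual log approximation: with ln(totexp) as the dependent variable, a dummy coefficient b corresponds to a percentage difference of exactly 100·(exp(b) − 1), which is close to 100·b only when b is small. A minimal sketch comparing the two; the coefficient values are hypothetical and the helper names are made up for illustration.

```python
import math

def pct_effect_exact(b):
    """Exact % difference in the level of the outcome implied by a dummy coefficient b in a log regression."""
    return 100.0 * (math.exp(b) - 1.0)

def pct_effect_approx(b):
    """First-order approximation 100*b, reasonable only when b is small."""
    return 100.0 * b

# Hypothetical coefficient values, not estimates from the data
for b in (0.05, 0.25, 0.50):
    print(b, round(pct_effect_approx(b), 2), round(pct_effect_exact(b), 2))
```

The gap between the two columns widens as b grows, which is why the approximation is only advertised for small coefficients.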

(d)  “totchr” is the number of chronic conditions an individual has, and it seems obvious that medical expenditures are (partly/largely) spent on treating these conditions, so expenditures should be related to totchr. Therefore, add totchr as a control to the regression in (c). Comparing the results from this regression with the regression in (c), what is your estimate of the omitted variables bias in (c)? 

(e)  Estimate the two factors that make up the omitted variables bias and verify that these correspond with your estimate in (d). Do you find the evidence for the presence of omitted variables bias in the regression in (c) strong or weak? 
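In OLS the omitted variables bias decomposes exactly in the sample: if $\tilde\beta_1$ is the suppins coefficient from the short regression (c), $\hat\beta_1$ and $\hat\beta_2$ are the suppins and totchr coefficients from the long regression (d), and $\hat\delta_1$ is the slope from an auxiliary regression of totchr on suppins (with a constant), then $\tilde\beta_1 - \hat\beta_1 = \hat\beta_2\,\hat\delta_1$. The sketch below verifies this identity on simulated data; the stand-in variables s, z, y, the helper `ols`, and all parameter values are hypothetical.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients of y on a constant and the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated stand-ins (assumptions, not MEPS data)
rng = np.random.default_rng(1)
n = 2000
s = rng.integers(0, 2, size=n).astype(float)      # plays the role of suppins
z = 1.5 + 0.3 * s + rng.poisson(1.0, size=n)      # plays the role of totchr
y = 6.0 + 0.2 * s + 0.4 * z + rng.normal(size=n)  # plays the role of ltotexp

b_short = ols(y, s)[1]          # short regression: y on s only
_, b_long, b_z = ols(y, s, z)   # long regression: y on s and z
delta = ols(z, s)[1]            # auxiliary regression: z on s

ovb = b_short - b_long
print(ovb, b_z * delta)         # equal up to floating-point error
```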

(f)  Investigate whether medical expenditures are related to family size by adding famsze to the regression in (d). Rerun this same regression, but now treating family size as a categorical variable (using dummies). Choose one of the estimated dummy coefficients and explain how this coefficient should be interpreted. 

(g)  Add the remaining variables (phylim, actlim, age, female, income) to the regression in (f) that uses the dummies. Comment on the sign and significance of the age variable. How would you explain this result? 

(h)  Compute the fitted values and residuals from the regression in (g) and compute their sample means. Explain how you could find these sample means without actually computing these two variables. Make a histogram of the residuals. What do you conclude about the assumption that $\varepsilon_i$ is normally distributed?
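Because the regression includes a constant, the OLS first-order condition for the intercept forces the residuals to sum to zero; consequently the fitted values average to the sample mean of the dependent variable. A minimal sketch on simulated data (the design matrix and coefficients are arbitrary assumptions) illustrates both facts.

```python
import numpy as np

# Simulated data: a constant plus three arbitrary regressors (assumption)
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, -0.2, 0.3]) + rng.normal(size=n)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
resid = y - fitted

print(resid.mean())             # zero up to floating-point noise
print(fitted.mean(), y.mean())  # identical: mean of fitted values = mean of y
```

Dropping the column of ones from X would break both properties, which is why the constant matters here.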

(i)  On slide 16 of lecture 7, it is stated that a possible model of heteroskedasticity is an exponential function of the regressors. A crude way to implement this is the following: (i) compute $\hat v_i = \ln(\hat\varepsilon_i^2)$; (ii) regress $\hat v_i$ on $\hat Y_i$. Does this provide evidence for heteroskedasticity?
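The two-step procedure can be sketched as follows, assuming simulated data whose error spread grows exponentially in the regressor (so heteroskedasticity is present by construction). A clearly nonzero slope on $\hat Y_i$ in the auxiliary regression is what would point to heteroskedasticity; this is only the crude check described above, not a formal test such as Breusch–Pagan, and the helper `ols_with_se` and all values are hypothetical.

```python
import numpy as np

def ols_with_se(y, X):
    """OLS coefficients, conventional (homoskedastic) standard errors, and residuals."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, np.sqrt(np.diag(cov)), resid

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(0, 2, size=n)
eps = rng.normal(size=n) * np.exp(0.5 * x)   # error sd grows exponentially in x (assumption)
y = 1.0 + 2.0 * x + eps

# Main regression: fitted values and residuals
X = np.column_stack([np.ones(n), x])
coef, se, resid = ols_with_se(y, X)
yhat = X @ coef

# Step (i): v_i = ln(residual_i^2)
v = np.log(resid ** 2)

# Step (ii): regress v on the fitted values; inspect the slope's t-statistic
Z = np.column_stack([np.ones(n), yhat])
g, g_se, _ = ols_with_se(v, Z)
t_slope = g[1] / g_se[1]
print(g[1], t_slope)
```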

(j)  Comment on how well the regressions from (c), (d), (f), and (g) fit the data. Compare the coefficient of supplemental insurance in these regressions. Does the coefficient vary a lot among these regressions? How about the standard error? How confident are you about the causal effect of supplemental insurance on medical expenditures? (For example, could there be selection bias? If so, give an example. If not, give an argument why not.) 


Subject: Mathematics
Due by (Pacific Time): 09/23/2015 01:35 pm