Elements of Econometrics

Task description

Instruction

Answer all questions in a format similar to the answers to your tutorial questions. That is, when you use Stata (or R) to conduct empirical analysis, show your commands and outputs (e.g., screenshots of commands, tables, and figures). When you are asked to discuss or interpret, keep your response brief and compact. To facilitate grading, please label all your answers clearly.

The marking system will check submissions for similarity, and UQ’s student integrity and misconduct policy on plagiarism applies.

Question 1: OLS, TSLS, and Panel Data Regression

Consider the following linear panel data model

y_it = β₀ + β₁ x_it,1 + β₂ x_it,2 + β₃ x_it,3 + u_it    (1)

where y_it is the dependent variable, (x_it,1; x_it,2; x_it,3) are explanatory variables, u_it is an unobservable error, and (β₀, β₁, β₂, β₃) are unknown parameters of interest. As usual, i = 1, …, N refers to individuals (id, cross-sectional units) and t = 1, …, T refers to time periods. Use the data file Q1data.dta (provided) to answer the following questions. Unless otherwise specified, use 5% as the significance level for all the tests below.

(a) (5 points) Declare the data to be a panel by specifying the individual identifier (id) and time identifier (t). Which regressor(s) are not time-varying? What are N and T? Do you have a balanced panel? [Hint: You can use the egen command along with the by option to compute the standard deviation of each regressor for each i. Which regressor(s) have zero variation over time?]

(b) (5 points) Use OLS to estimate (1) and report the estimation results.

(c) (10 points) It is well known that the standard errors (SE) of panel data estimators need to be adjusted to account for likely correlation of the error u_it over time for a given i (clustering on i), i.e., Cov(u_it, u_is) ≠ 0 for t ≠ s. Re-estimate (1) using OLS and calculate cluster-robust SE. Compare the estimation results with those obtained in (b). Comment on your findings. If Cov(u_it, u_is) ≠ 0 is true, do you think the OLS estimator is BLUE?

(d) (10 points) One of your friends argues that the OLS estimator may be problematic because x_it,1 is probably endogenous. If this were true, which assumption of linear regression would not be valid, and what could be wrong with using OLS? Your friend suggests that you should use TSLS rather than OLS. In particular, he proposes two instrumental variables (IVs), z_it,1 and z_it,2, for x_it,1. What conditions must hold for z_it,1 and z_it,2 to be valid IVs?

(e) (15 points) Estimate (1) using TSLS with z_it,1 and z_it,2 as IVs. As in (c), you should compute and report cluster-robust SE. Compare the TSLS estimates with the OLS estimates obtained in (c), and comment on your findings. Assuming both z_it,1 and z_it,2 are valid IVs, do you think x_it,1 is an endogenous regressor? Explain your answer. Suppose you are pretty sure that z_it,1 is exogenous. Name a test that can be used to check if z_it,2 also satisfies the exogeneity condition. Assess the strength of (z_it,1; z_it,2) as IVs. What is the first-stage regression of the TSLS?

(f) (10 points) To capture potential time effects, consider the following model

y_it = β₀ + β₁ x_it,1 + β₂ x_it,2 + β₃ x_it,3 + γ₂ d_2,t + … + γ_T d_T,t + v_it    (2)

where d_s,t are time dummies (d_s,t = 1 if s = t, and 0 otherwise). Note that the sample includes data from t = 1 to t = T, but (2) includes only dummies for t = 2 to t = T. Why?

Estimate (2) using TSLS with z_it,1 and z_it,2 as IVs and test whether time effects are significant, i.e., whether at least one γ_t is nonzero. With time effects controlled for, do you think x_it,1 is still an endogenous regressor? [Hint: Use OLS and TSLS to estimate (2) and compare their estimates.]

(g) (10 points) Suppose that v_it = α_i + e_it with . Re-write (2) as

y_it = β₀ + β₁ x_it,1 + β₂ x_it,2 + β₃ x_it,3 + γ₂ d_2,t + … + γ_T d_T,t + α_i + e_it    (3)

Treat α_i as fixed effects (FE). Use an FE estimator to estimate (3). (To report the estimation results, you only need to post the FE regression table returned by Stata.) Explain why the FE estimator cannot estimate all slope coefficients. Compare the FE estimates with the TSLS estimates obtained in (f). Comment on your findings.
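The hint in part (a) amounts to a within-individual variation check. The following is a minimal sketch of that logic in Python, using a made-up toy panel rather than Q1data.dta; the column layout (id, t, x1, x2, x3) and all values are assumptions for illustration only. In Stata the same idea is carried out with the egen approach named in the hint.

```python
from collections import defaultdict
from statistics import pstdev

# Toy long-format panel: (id, t, x1, x2, x3). All values are made up.
rows = [
    (1, 1, 0.5, 2.0, 1.0),
    (1, 2, 0.7, 2.0, 1.5),
    (1, 3, 0.9, 2.0, 0.8),
    (2, 1, 1.1, 3.0, 0.2),
    (2, 2, 1.0, 3.0, 0.4),
    (2, 3, 1.4, 3.0, 0.1),
]
names = ["x1", "x2", "x3"]  # hypothetical regressor names

# Group observations by individual identifier.
by_id = defaultdict(list)
for rec in rows:
    by_id[rec[0]].append(rec)

# A regressor is time-invariant if its within-individual SD is zero for every id.
time_invariant = []
for k, name in enumerate(names):
    sds = [pstdev([rec[2 + k] for rec in recs]) for recs in by_id.values()]
    if all(sd == 0 for sd in sds):
        time_invariant.append(name)

# The panel is balanced if every individual is observed in the same set of periods.
period_sets = {frozenset(rec[1] for rec in recs) for recs in by_id.values()}
balanced = len(period_sets) == 1

N = len(by_id)
T = len(next(iter(period_sets))) if balanced else None

print(time_invariant, N, T, balanced)
```

With real data, a small tolerance in place of the exact comparison with zero may be safer if values carry rounding noise.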

Question 2: Binary Response Model

In April 2008, the unemployment rate in the United States stood at 5%. By April 2009, it had increased to 9%, and it had increased further, to 10%, by October 2009. Were some groups of workers more likely to lose their jobs than others during the Great Recession? For example, were young workers more likely to lose their jobs than middle-aged workers? What about workers with a college degree versus those without a degree or women versus men? The data file employment 08-09.dta (provided) contains a random sample of 5440 workers who were surveyed in April 2008 and reported that they were employed full-time. A detailed description is given in employment 08 09 description.pdf (provided). These workers were surveyed one year later, in April 2009, and asked about their employment status (employed, unemployed, or out of the labor force). The data set also includes various demographic measures for each individual. Use these data to answer the following questions.

(a) (5 points) Regress employed on age and age², using a linear probability model (LPM). Report regression results. Was age a statistically significant determinant of employment? Is there evidence of a nonlinear effect of age on the probability of being employed?

(b) (5 points) Repeat (a) using a probit regression.

(c) (5 points) Repeat (a) using a logit regression.

(d) (6 points) Compute the predicted probability of employment for a 20-year-old worker, a 40-year-old worker, and a 60-year-old worker.

(e) (4 points) Are there important differences in your answers to (d)? Explain.

(f) (10 points) The data set includes variables measuring the workers’ educational attainment, sex, race, marital status, region of the country, and weekly earnings in April 2008. Repeat (a)-(c) using these factors as additional regressors and construct a table like Table 11.2 in SW (pp. 410-411) to investigate whether the conclusions on the effect of age on employment from (a)-(c) are affected by omitted variable bias. Use the regressions in your table to discuss the characteristics of workers who were hurt most by the Great Recession. [Hint: You will need to generate dummies for race groups and use logarithm of weekly earnings.]
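The data preparation described in the hint for part (f) can be sketched as follows. This is an illustration in Python on made-up records, not the provided data file: the field names race and earnwke, the category labels, and the helper add_regressors are all hypothetical. One category is left out as the base group so that the dummies are not perfectly collinear with the intercept, and the log is taken only where earnings are observed and positive.

```python
import math

# Toy records; field names, labels, and values are hypothetical.
workers = [
    {"race": "white", "earnwke": 800.0},
    {"race": "black", "earnwke": 650.0},
    {"race": "other", "earnwke": 1200.0},
    {"race": "black", "earnwke": None},  # earnings not reported
]


def add_regressors(records, base="white"):
    """Return copies of the records with race dummies and log weekly earnings."""
    groups = sorted({r["race"] for r in records})
    out = []
    for r in records:
        row = dict(r)
        # One dummy per race group, omitting the base group.
        for g in groups:
            if g != base:
                row[f"race_{g}"] = 1 if r["race"] == g else 0
        # The log is defined only for positive, non-missing earnings.
        earn = r.get("earnwke")
        row["ln_earnwke"] = math.log(earn) if earn is not None and earn > 0 else None
        out.append(row)
    return out


augmented = add_regressors(workers)
for row in augmented:
    print(row)
```

Observations with a missing log-earnings value would drop out of any regression that includes that variable, so the sample used in the extended regressions may be smaller than in (a)-(c).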