**Task description**

**Instruction**

Answer all questions in a format similar to that of the answers to your tutorial questions. That is, when you use Stata (R) to conduct empirical analysis, you should show your Stata (R) commands and outputs (e.g., screenshots for commands, tables, and figures). When you are asked to discuss or interpret, your response should be brief and compact. To facilitate grading, please clearly label all your answers.

The marking system will check submissions for similarity, and UQ’s student integrity and misconduct policy on plagiarism applies.


__Question 1: OLS, TSLS, and Panel Data Regression__

Consider the following linear panel data model

where (*x_{it,1}*; *x_{it,2}*; *x_{it,3}*) are explanatory variables, u

(a) (5 points) Declare the data to be a panel by specifying the individual identifier (*id*) and time identifier (*t*). Which regressor(s) are *not* time-varying? What are *N* and *T*? Do you have a balanced panel? [*Hint*: You can use the **egen** command along with the by option to compute the standard deviation of each regressor for each *i*. Which regressor(s) have zero variation over time?]

(b) (5 points) Use OLS to estimate (1) and report the estimation results.

(c) (10 points) It is well known that the standard errors (SE) of panel data estimators need to be adjusted to control for likely correlation of the error *u_{it}* over time for given *i* (clustering on *i*), i.e., *Cov*(*u_{it}*, *u_{is}*) ≠ 0 for *t* ≠ *s*. Re-estimate (1) using OLS and calculate cluster-robust SE. Compare the estimation results with those obtained in (b). Comment on your findings. If *Cov*(*u_{it}*, *u_{is}*) ≠ 0 is true, do you think the OLS estimator is BLUE?

(d) (10 points) One of your friends argues that the OLS estimator may be problematic as *x_{it,1}* is probably endogenous. If this were true, which assumption of linear regression would not be valid, and what could be wrong with using OLS? Your friend suggests that you should use TSLS rather than OLS. In particular, he proposes two instrumental variables (IV), *z_{it,1}* and *z_{it,2}*, for *x_{it,1}*. What conditions must hold for *z_{it,1}* and *z_{it,2}* to be valid IV?

(e) (15 points) Estimate (1) using TSLS with *z_{it,1}* and *z_{it,2}* as IV. As in (c), you should compute and report cluster-robust SE. Compare the TSLS estimates with the OLS estimates obtained in (c), and comment on your findings. Assuming both *z_{it,1}* and *z_{it,2}* are valid IV, do you think *x_{it,1}* is an endogenous regressor? Explain your answer. Suppose you are pretty sure that *z_{it,1}* is exogenous. Name a test that can be used to check if *z_{it,2}* also satisfies the exogeneity condition. Assess the strength of (*z_{it,1}*; *z_{it,2}*) as IV. What is the first-stage regression of the TSLS?
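
The hint in part (a) and the estimation commands behind parts (b) to (e) can be sketched on simulated data as follows. This is only an illustration under stated assumptions: the variable names (`y`, `x1`, `x2`, `x3`, `z1`, `z2`), the panel dimensions (N = 200, T = 5), and the data-generating process are all hypothetical and are not taken from the assignment data.

```stata
* Simulated balanced panel; all names and values are hypothetical.
clear
set seed 12345
set obs 200                        // assumed N = 200 individuals
generate id = _n
generate x3 = rnormal()            // drawn once per id, so constant over t
generate a  = rnormal()            // individual effect: makes a + e correlated over t within id
expand 5                           // assumed T = 5 periods
bysort id: generate t = _n
generate z1 = rnormal()
generate z2 = rnormal()
generate e  = rnormal()
generate x1 = 0.5*z1 + 0.5*z2 + 0.5*e + rnormal()   // correlated with the error e
generate x2 = rnormal()
generate y  = 1 + x1 + x2 + x3 + a + e

* Part (a): declare the panel, then apply the egen hint
xtset id t
xtdescribe                         // participation pattern, N and T
foreach v of varlist x1 x2 x3 {
    bysort id: egen sd_`v' = sd(`v')
}
summarize sd_x1 sd_x2 sd_x3        // sd equal to zero for every id means no time variation

* Parts (b)-(c): OLS with default and with cluster-robust SE
regress y x1 x2 x3
regress y x1 x2 x3, vce(cluster id)

* Part (e): TSLS with z1 and z2 as instruments for x1, clustering on id
ivregress 2sls y x2 x3 (x1 = z1 z2), vce(cluster id)
estat firststage                   // first-stage statistics
estat endogenous                   // test of exogeneity of x1
estat overid                       // test of overidentifying restrictions
```

In this simulated design only `x3` is constant within `id`, so only `sd_x3` should be zero up to rounding. With the assignment data, the variable names and the set of time-invariant regressors may differ.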

(f) (10 points) To capture potential time effects, consider the following model

where *d_{s,t}* are time dummies (

Estimate (2) using TSLS with *z_{it,1}* and

(g) (10 points) Suppose that *v_{it}* =

Treat *α_{i}* as fixed effects (FE). Use an FE estimator to estimate (3)


__Question 2: Binary Response Model__

In April 2008, the unemployment rate in the United States stood at 5%. By April 2009, it had increased to 9%, and it had increased further, to 10%, by October 2009. Were some groups of workers more likely to lose their jobs than others during the Great Recession? For example, were young workers more likely to lose their jobs than middle-aged workers? What about workers with a college degree versus those without a degree, or women versus men? The data file employment 08-09.dta *(provided)* contains a random sample of 5440 workers who were surveyed in April 2008 and reported that they were employed full-time. A detailed description is given in employment 08 09 description.pdf *(provided)*. These workers were surveyed again one year later, in April 2009, and asked about their employment status (employed, unemployed, or out of the labor force). The data set also includes various demographic measures for each individual. Use these data to answer the following questions.

(a) (5 points) Regress employed on age and age^{2}, using a linear probability model (LPM). Report regression results. Was age a statistically significant determinant of employment? Is there evidence of a nonlinear effect of age on the probability of being employed?

(b) (5 points) Repeat (a) using a probit regression.

(c) (5 points) Repeat (a) using a logit regression.

(d) (6 points) Compute the predicted probability of employment for a 20-year-old worker, a 40-year-old worker, and a 60-year-old worker.

(e) (4 points) Are there important differences in your answers to (d)? Explain.
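
For the three models in (a) to (c), the predicted probability in (d) is the fitted index β0 + β1·age + β2·age^{2} itself under the LPM, the standard normal CDF of that index under probit, and the logistic function of that index under logit. A minimal Stata sketch on simulated data is given below. The coefficient values and sample size are hypothetical, and entering age through factor-variable notation is just one possible way to set this up, chosen so that `margins` updates the squared term together with age.

```stata
* Simulated data; names mirror the question but all values are hypothetical.
clear
set seed 2009
set obs 2000
generate age = 18 + floor(47*runiform())           // ages 18 to 64
generate employed = runiform() < normal(-1 + 0.12*age - 0.0014*age^2)

* c.age##c.age enters age and age^2, and margins treats them as one variable
regress employed c.age##c.age, vce(robust)         // LPM
margins, at(age=(20 40 60))                        // fitted values at ages 20, 40, 60

probit employed c.age##c.age
margins, at(age=(20 40 60))                        // Phi(index) at ages 20, 40, 60

logit employed c.age##c.age
margins, at(age=(20 40 60))                        // logistic(index) at ages 20, 40, 60
```

If age^{2} is instead generated as a separate variable, `margins` will not update it automatically, and the predicted probabilities would need to be computed by hand from the estimated coefficients.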

(f) (10 points) The data set includes variables measuring the workers’ educational attainment, sex, race, marital status, region of the country, and weekly earnings in April 2008. Repeat (a)-(c) using these factors as additional regressors and construct a table like Table 11.2 in SW (pp. 410-411) to investigate whether the conclusions on the effect of age on employment from (a)-(c) are affected by omitted variable bias. Use the regressions in your table to discuss the characteristics of workers who were hurt most by the Great Recession. [*Hint*: You will need to generate dummies for race groups and use the logarithm of weekly earnings.]
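
The hint in (f) and a side-by-side results table can be sketched as below. This is a minimal illustration on simulated data rather than the provided file: the variable names (`race`, `earnwke`, `female`), the number of race groups, and all coefficient values are hypothetical, and `estimates table` is only one of several ways to lay out the three regressions in columns.

```stata
* Simulated data; variable names and values are hypothetical.
clear
set seed 2008
set obs 2000
generate age      = 18 + floor(47*runiform())
generate female   = runiform() < 0.5
generate race     = 1 + floor(3*runiform())        // three assumed groups coded 1, 2, 3
generate earnwke  = exp(6 + 0.5*rnormal())         // weekly earnings, strictly positive
generate employed = runiform() < normal(-1 + 0.12*age - 0.0014*age^2)

* The hint: dummies for race groups and log weekly earnings
tabulate race, generate(race_)                     // creates race_1, race_2, race_3
generate ln_earn = ln(earnwke)

* Same specification under the three models (race_1 is the omitted group)
local x c.age##c.age female race_2 race_3 ln_earn
regress employed `x', vce(robust)
estimates store m_lpm
probit employed `x', vce(robust)
estimates store m_probit
logit employed `x', vce(robust)
estimates store m_logit

* Coefficients and standard errors in adjacent columns
estimates table m_lpm m_probit m_logit, b(%9.4f) se(%9.4f) stats(N)
```

Because the block uses a local macro, it is meant to be run as a single do-file. The coefficients in the three columns are on different scales, so comparisons across models are usually made on predicted probabilities rather than on raw coefficients.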