# lab2

Q: In question 6 from the HW, I am assuming that we start from the error term e2 and determine the true variance and 95% CI based on the sum output for that variable. I am also assuming that I can check my calculation against what Stata calculated. Is that right? Thanks, Nora

A (Ted): Correct, except that any estimate in Stata is the result of random draws from the error term, so it will not match the true value exactly. By knowing the true variance of the error term you can calculate the true 95% CI. The estimated and true values should, in general, be close.

eed7: I am just writing to verify that on the second lab (including homework question 6) where you say/wrote the error term is N(0,2) that the 2 is not the variance but the standard deviation. I thought I remember you saying that in class, but might be mistaken.

TM: yes...the s.d. is 2, so the variance is 4.
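A quick way to check the s.d.-versus-variance point in Stata is to draw a large sample of errors and summarize it. This is only a sketch: the variable name `e2_check`, the sample size, and the seed are made up, and it assumes a Stata version that provides `rnormal()`, whose second argument is the standard deviation rather than the variance.

```stata
* Sketch only: variable name, sample size, and seed are hypothetical.
clear
set obs 100000
set seed 12345
* rnormal(mean, sd): the 2 here is the standard deviation
gen e2_check = rnormal(0, 2)
summarize e2_check
* r(sd) should be close to 2 and r(Var) close to 4
display "s.d. = " r(sd) "   variance = " r(Var)
```

Because these are random draws, the displayed values will be near 2 and 4 rather than exactly equal to them.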

Q: I'm not sure I understand what you mean by a "random draw".  Can you explain this?

A: Sure. Take the example of dice. A single roll of a die is a "draw" from the discrete uniform distribution on 1-6. A draw from the standard normal distribution is one randomly drawn z value.
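The dice and z examples can be tried directly in Stata. The following is a minimal sketch, assuming a Stata version with `runiform()` and `rnormal()`; the seed value is arbitrary. It also shows what `set seed` does: resetting the same seed restarts the sequence of random numbers.

```stata
* Sketch only: the seed value is arbitrary.
set seed 2008
* one draw from the discrete uniform distribution on 1-6 (a die roll)
display ceil(6*runiform())
* one draw from the standard normal distribution (one z)
display rnormal()
* resetting the same seed restarts the sequence, so the two draws repeat
set seed 2008
display ceil(6*runiform())
display rnormal()
```

With a different seed the draws would come from a different point in the sequence, which is why regression estimates built on simulated errors change from run to run.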

lab 2 survey, 14 responses as of 12:45pm, 1/28/2008 (answers to questions typed in directly)

1. Again, I apologize for taking your time to answer these questions. However, this is extremely helpful for me to improve the course. Thanks. These questions refer to: Lab 2 How well do you feel that you understoo...

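| # | Answer | Response | % |
|---|--------|----------|---|
| 1 | very well | 2 | 14% |
| 2 | well | 7 | 50% |
| 3 | so-so, OK | 1 | 7% |
| 4 | not well | 1 | 7% |
| 5 | I was lost | 2 | 7% |
| 6 | I was absent from class (see next question) | 2 | 14% |
|   | Total | 14 | 100% |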

2. If you were absent from this class, please explain why.

Text Response

sick

I was presenting a poster at the INSNA's Sunbelt Social Networks conference in St. Petersburg, FL.

3. If, in the previous question, you said you didn't understand the lecture/lab "well" or "very well", please comment on what the principal source of difficulty was.

Text Response

Issues with Stata syntax as well as some of the content of the lab

A (Ted): ok...let me or Michael know where you were having trouble.

It was hard to pay attention because I was having issues with my computer and I didn't fully understand what the purpose of xi was, but I asked my partner at the end of the class and she explained to me that it was used for categorical variables. It automatically assigns a random number to the categories so that you don't have to do each one individually by hand.

A (Ted): xi allows you to turn a categorical variable into a set of indicator (1 or 0) variables. You have to exclude 1 category or you can't estimate the constant term. Example: you have a variable for race with 5 categories. "xi: reg wage i.race" will give you 4 variables indicating specific racial categories.

I also was a little apprehensive about the example you used in class regarding the use of a "seed number" and why we needed to manipulate the regression model, but then, again, I asked my partner, who explained to me that this was just a hypothetical example of what would happen to the regression coefficients if the seed numbers were randomly chosen. At first, I thought this was being used to show how we could manipulate the regression model to say whatever we wanted it to!!

A (Ted): Don't worry about the set seed #. The random numbers follow a sequence and the set seed changes the sequence.

Lastly, I don't think I understood why the midpoint connections on the graph didn't show up...

A (Ted): because the graph only had 1 point per education category, it was redundant syntax. What I wanted to do was connect the points together in a line.

I am having trouble remembering/looking up basic stuff from last semester, like how to calculate the 90% confidence interval, what we mean when we talk about cumulative probability, what is r^2 supposed to tell me. I feel like I'm just muddling through.

I did OK with the first part of the lab, where we created the short do-file, but I find the labs kind of loud and chaotic; I can't always focus on what you're saying and also focus on Stata.
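Ted's description of xi can be tried on made-up data. This is a sketch under stated assumptions: the data are simulated, the variable names `race` and `wage` only mirror the example in the answer, and the coefficients and seed are hypothetical. With a 5-category variable, xi should create 4 indicator variables (`_Irace_2` through `_Irace_5`) and omit the lowest category by default.

```stata
* Sketch only: simulated data; coefficients and seed are hypothetical.
clear
set obs 500
set seed 2008
* a categorical variable with 5 categories, coded 1-5
gen race = ceil(5*runiform())
gen wage = 10 + race + rnormal(0, 2)
* xi expands i.race into indicator variables and runs the regression
xi: reg wage i.race
* list the indicator variables that xi created
describe _Irace_*
* the same indicators built by hand, leaving out category 1
tabulate race, generate(race_)
reg wage race_2 race_3 race_4 race_5
```

The two regressions should give the same coefficients, which shows that xi only saves the step of building the indicators yourself.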

4. Did the class move at the right speed for you today?

| # | Answer | Response | % |
|---|--------|----------|---|
| 1 | Too fast | 1 | 8% |
| 2 | OK | 12 | 92% |
| 3 | Too slow | 0 | 0% |
|   | Total | 12 | 100% |

5. If you said either "too fast" or "too slow" please explain (it can be brief, just enough so I get the idea)

Text Response

Once I got behind with the syntax, then I lost pace with the other info. I have scheduled a meeting with you to discuss specific questions, so I won't list them below.

6. Do you have any specific questions regarding this lecture/lab or the class in general?

| # | Answer | Response | % |
|---|--------|----------|---|
| 1 | No questions, I'm cool. | 11 | 92% |
| 2 | Yes (see below). | 1 | 8% |
|   | Total | 12 | 100% |

7. What questions do you have?

Text Response

I think I elaborated pretty well in the previous question box. I'm still only about halfway through finishing the lab. I don't know how to begin interpreting "test whether the coefficient from #4 is equal to 1" or "calculate the true variance of the coefficient." I'll look through your slides and the reading, but I often don't have a clue as to the first step I need to take when breaking down that kind of problem. I will need to schedule time with you or Andrew so I can work these things out loud; I feel like I'm behind the rest of the class. mike

8. Do you have any suggestions on how this lecture/lab (or the course in general) could be improved?

Text Response

I almost wish the lab sessions could be longer!!

(Ted): I will try to ensure that we have plenty of time for you to work on the homework in lab. I feel that whatever hands on learning we can get done in lab will be efficient in reducing the probability you get stuck working on your own.

I think it would be good to switch up groups every once in a while

Ted: good suggestion. I didn't intend for the groups to be permanent.

I found the simulation we did really helpful - it helped clarify some of the things we went over in lecture, and it's really useful to use as a study tool at home. Thanks for including this part in the lab.

For me, I think that the homework is great for learning Stata commands but that more "optional" homework would be even better. I hate to just memorize commands instead of learning them in context.

More questions:

> Hi Ted,
>
> I am trying to do the rest of the lab 2 homework. Number 5 says: 5b: Use the following syntax as a guide and calculate the MSE of your regression (show the steps you took)
>
>     reg y x
>     predict yhat
>     gen resid=yhat-y
>     sum resid
>
> Doesn't the regression give you the MSE? In the output it says Root MSE. I'm not sure how I would calculate it otherwise. I have this equation in my class notes that says MSE=RSS/(N-2), but which value is RSS in the output? Should the equation read MSE=4845642.98/55943 ??
>
>     i.sex             _Isex_1-2           (naturally coded; _Isex_1 omitted)
>
>           Source |       SS       df       MS              Number of obs =   55945
>     -------------+------------------------------           F(  1, 55943) =13373.27
>            Model |  1158359.21     1  1158359.21           Prob > F      =  0.0000
>         Residual |  4845642.98 55943  86.6175032           R-squared     =  0.1929
>     -------------+------------------------------           Adj R-squared =  0.1929
>            Total |   6004002.2 55944  107.321647           Root MSE      =  9.3069
>
>     ------------------------------------------------------------------------------
>          wage_re |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>     -------------+----------------------------------------------------------------
>            edyrs |   2.374525   .0205333   115.64   0.000      2.33428     2.41477
>          _Isex_2 |  (dropped)
>            _cons |  -15.42167   .2721664   -56.66   0.000    -15.95512   -14.88822

Ted: Yes...the regression output gives it to you. The RSS is the SS for the residual, 4845642.98.

I also want you to be able to calculate it using Stata. Here is an example:

>     . use morg05_small_1
>
>     . reg wage sex
>
>           Source |       SS       df       MS              Number of obs =  111588
>     -------------+------------------------------           F(  1,111586) = 2747.77
>            Model |  367325.419     1  367325.419           Prob > F      =  0.0000
>         Residual |  14916957.7111586  133.681266           R-squared     =  0.0240
>     -------------+------------------------------           Adj R-squared =  0.0240
>            Total |  15284283.2111587  136.971898           Root MSE      =  11.562
>
>     (Residual: SS = 14916957.7, df = 111586; Total: SS = 15284283.2, df = 111587)
>
>     ------------------------------------------------------------------------------
>          wage_re |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>     -------------+----------------------------------------------------------------
>              sex |  -3.628677   .0692242   -52.42   0.000    -3.764356   -3.492999
>            _cons |   22.97912   .1095419   209.77   0.000     22.76442    23.19382
>     ------------------------------------------------------------------------------
>
>     . reg wage edyrs
>
>           Source |       SS       df       MS              Number of obs =  111588
>     -------------+------------------------------           F(  1,111586) =24198.42
>            Model |  2723843.14     1  2723843.14           Prob > F      =  0.0000
>         Residual |    12560440111586  112.562867           R-squared     =  0.1782
>     -------------+------------------------------           Adj R-squared =  0.1782
>            Total |  15284283.2111587  136.971898           Root MSE      =   10.61
>
>     (Residual: SS = 12560440, df = 111586; Total: SS = 15284283.2, df = 111587)
>
>     ------------------------------------------------------------------------------
>          wage_re |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>     -------------+----------------------------------------------------------------
>            edyrs |   2.301577   .0147956   155.56   0.000     2.272578    2.330577
>            _cons |  -12.39165    .194962   -63.56   0.000    -12.77378   -12.00953
>     ------------------------------------------------------------------------------
>
>     . predict hat
>     (option xb assumed; fitted values)
>
>     . gen e=wage-hat
>
>     . sum e
>
>         Variable |       Obs        Mean    Std. Dev.       Min        Max
>     -------------+--------------------------------------------------------
>                e |    111588    1.74e-08    10.60952  -30.15074   230.0743
>
>     . di 10.609^2
>     112.55088
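The squared s.d. of the residuals in Ted's example (112.55) comes out slightly below the Residual MS (112.56). Part of that is rounding 10.60952 to 10.609, and part is that `summarize` divides by N-1 while the MSE divides by N-2. The following is a minimal sketch of the same calculation using Stata's stored results; the data are simulated, and the variable names, coefficients, and seed are hypothetical.

```stata
* Sketch only: simulated data; names, coefficients, and seed are hypothetical.
clear
set obs 1000
set seed 12345
gen x = rnormal(12, 3)
gen y = 2*x + rnormal(0, 2)
reg y x
* MSE straight from the stored results: RSS/(N-2)
display e(rss)/e(df_r)
* the same number, from Root MSE
display e(rmse)^2
* by hand, following the steps in the example
predict yhat
gen resid = y - yhat
summarize resid
* summarize divides by N-1, so rescale to N-2 to match the MSE
display r(Var)*(r(N)-1)/(r(N)-2)
```

All three displayed values should agree with the Residual MS in the regression output, up to rounding.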

I know it's probably too late to post questions and get a response, but I am working on the homework for lab 2 and have gotten to the same problem as the one previously posted. Only, this time, whenever I type "sum resid" I get 0's for all of the summary statistics (mean, std dev, min, max). I know this is not right, but I don't know what I did wrong. I did everything else according to the directions on the lab handout. Here is what I did below.

    . reg wage_ave edyrs if sex==1

          Source |       SS       df       MS              Number of obs =      14
    -------------+------------------------------           F(  1,    12) =   26.57
           Model |  753.680098     1  753.680098           Prob > F      =  0.0002
        Residual |  340.448425    12  28.3707021           R-squared     =  0.6888
    -------------+------------------------------           Adj R-squared =  0.6629
           Total |  1094.12852    13  84.1637326           Root MSE      =  5.3264

        wage_ave |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           edyrs |   1.393086   .2702834     5.15   0.000     .8041889    1.981983
           _cons |  -.5771335   3.244205    -0.18   0.862     -7.64565    6.491383

    . predict yhat
    (option xb assumed; fitted values)

    . gen resid=yehat-y
    yehat not found
    r(111);

    . gen resid=yhat-y

    . sum resid

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
           resid |        28           0           0          0          0

Also, on number 6, how are we supposed to get the denominator, the sum of (x - x-bar)^2, for the standard error of B1 when the regression output and summary statistics do not give the individual values for x?

***I think the answer to your problem is that you hadn't defined y in your statement: <<gen resid=yhat-y>>. I think the instructions are simply to show the syntax, in other words they say y as in the y variable you used in the equation, the dependent variable (in this case wage). Hope that helps and didn't lead you astray.

(Ted):  instead of

gen resid=yhat-y

do this:

gen resid=wage-yhat
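The general pattern is to substitute whichever dependent variable you actually regressed, which in the session posted above would presumably be wage_ave. The sketch below uses simulated data with the same variable names (`wage_ave`, `edyrs`, `sex`); the values, coefficients, and seed are hypothetical. It also restricts the residuals to the estimation sample, since the posted regression used only the sex==1 observations, and shows that `predict` with the `residuals` option gives the same thing.

```stata
* Sketch only: simulated data; values, coefficients, and seed are hypothetical.
clear
set obs 28
set seed 1
gen sex = cond(_n <= 14, 1, 2)
gen edyrs = 8 + mod(_n, 14)
gen wage_ave = 1.4*edyrs + rnormal(0, 5)
reg wage_ave edyrs if sex == 1
predict yhat
* residual = observed minus fitted, only for observations used in the regression
gen resid = wage_ave - yhat if e(sample)
summarize resid
* the same residuals, computed by predict directly
predict resid2 if e(sample), residuals
summarize resid2
```

Both summaries should report 14 observations with a mean of essentially zero and a nonzero standard deviation.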

Hi everyone, I was not in the lab and am having problems answering question 6 (at the beginning) because it seems like a word is missing. It says "Q6: referring to the output, what is the estimated". My question is what is the estimated what? Or does it mean what is estimated? If anyone could help me with this I'd appreciate it. Sincerely, Ashton

(Ted): That was part of a question for me to ask in class, not to answer as homework.  The homework questions start farther down in the lab.  Sorry for the confusion.

Q: I'm still feeling like there's a huge disconnect between the lab and the lectures.  I get that you are trying to get us to understand the calculations behind the Stata work but I feel like I'm missing basic stuff here.  I still didn't get how to set up the hypothesis test in the lab 2 question 4, or how to get the estimated variance of B1 based on the sum table in question 6.

(Ted):  I assume you are talking about question 5 from lab2:

5.  Test whether the coefficient from #4 is equal to 1.

Refer to equations 3 and 4 from lecture 2.  Convert the test into a z value.  Follow the steps in the lecture...(***we will talk about this in class)
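As a rough illustration of converting the test into a z value, the sketch below takes the estimated coefficient, subtracts the hypothesized value of 1, and divides by the standard error. This assumes the lecture's equations follow that standard form, which the page itself does not show; the data are simulated, and the variable names, coefficients, and seed are hypothetical.

```stata
* Sketch only: simulated data; names, coefficients, and seed are hypothetical.
clear
set obs 200
set seed 12345
gen x = rnormal(12, 3)
gen y = 1.2*x + rnormal(0, 2)
reg y x
* z value for H0: the coefficient on x equals 1
scalar z = (_b[x] - 1)/_se[x]
display "z = " z
* two-sided p-value from the standard normal distribution
display "p = " 2*(1 - normal(abs(z)))
* Stata's built-in version of the same test (reports F, the square of the t value)
test x = 1
```

The p-value from `test` is based on the F distribution, so it can differ slightly from the normal-based one in small samples.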

(Ted) With respect to question 6, refer to the formula for the variance of b1.  What pieces of information do you need?  We know what the s.d. of the error term is.  What else do you need to know?  You have all you need with the formula and a "sum" of all the variables in Stata.
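One way to put those pieces together, assuming the lecture's formula is the standard one for simple regression, Var(b1) = (variance of the error) / (sum of (x - x-bar)^2): the denominator does not need the individual x values, because it equals (N-1) times the variance of x, and both N and the s.d. of x appear in the `sum` output. The sketch below uses simulated data; the variable names, coefficients, and seed are hypothetical.

```stata
* Sketch only: simulated data; names, coefficients, and seed are hypothetical.
clear
set obs 1000
set seed 12345
gen x = rnormal(12, 3)
* error term with s.d. 2, so its true variance is 4
gen y = 1*x + rnormal(0, 2)
summarize x
* sum of (x - xbar)^2 equals (N-1)*Var(x), both available after summarize
scalar ssx = (r(N) - 1)*r(Var)
scalar true_var = 4/ssx
display "true variance of b1 = " true_var
display "true s.e. of b1     = " sqrt(true_var)
reg y x
* compare with Stata's estimate, which uses the MSE in place of 4
display "estimated variance of b1 = " _se[x]^2
* one reading of the true 95% CI: the estimate plus or minus 1.96 true s.e.
display _b[x] - 1.96*sqrt(true_var) "  to  " _b[x] + 1.96*sqrt(true_var)
```

The true and estimated variances should be close but not identical, since the estimate depends on the particular random draws of the error term.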
