Q: In question 6 from the HW I am assuming that we are going from the error term e2 and determining the true variance and 95% CI based off of the sum for that variable. I am assuming also that I can check my calculation against what stata calculated. Is that right? Thanks, Nora
A (Ted): Correct, except that any estimate Stata reports depends on the particular random draws from the error term, so it will differ somewhat from the true value. Knowing the true variance of the error term lets you calculate the true 95% CI. The estimated and true values should, in general, be close.
eed7: I am just writing to verify that on the second lab (including homework question 6) where you say/wrote the error term is N(0,2) that the 2 is not the variance but the standard deviation. I thought I remember you saying that in class, but might be mistaken.
TM: yes...the s.d. is 2, so the variance is 4.
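To see the s.d./variance distinction in Stata, here is a minimal sketch. It assumes the error term is generated with rnormal(), whose second argument is the standard deviation; the seed and the number of observations are arbitrary choices, not values from the lab.

```stata
clear
set seed 12345
set obs 10000
* rnormal(m, s) takes the standard deviation s, not the variance
generate e2 = rnormal(0, 2)
summarize e2
* the sample variance should be close to, but not exactly, 4
display "sample variance = " r(Var)
display "true variance   = " 2^2
```

The sample variance differs a little from 4 on every run with a different seed, which is the same point made above about estimates versus true values.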
Q: I'm not sure I understand what you mean by a "random draw". Can you explain this?
Sure. Take the example of dice. A single roll of a die is a "draw" from the discrete distribution over the values 1-6. Likewise, a draw from the standard normal distribution is one randomly drawn z.
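A small sketch of both kinds of draw in Stata, assuming a fair die. The seed value and variable names are made up for illustration.

```stata
clear
set seed 2008
set obs 5
* one die roll per row: integers 1-6, each with probability 1/6
generate die = floor(6*runiform()) + 1
* one draw per row from the standard normal distribution
generate z = rnormal()
list
```

Running this again with the same seed reproduces the same five rolls and the same five z values. Changing the seed gives a different set of draws from the same distributions.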
lab 2 survey, 14 responses as of 12:45pm, 1/28/2008 (answers to questions typed in directly)
1. Again, I apologize for taking your time to answer these questions. However, this is extremely helpful for me to improve the course. Thanks. These questions refer to: Lab 2 How well do you feel that you understoo...

#  Answer                                        Response  %
1  very well                                     2         14%
2  well                                          7         50%
3  so-so, OK                                     1         7%
4  not well                                      1         7%
5  I was lost                                    2         7%
6  I was absent from class (see next question)   2         14%
   Total                                         14        100%
2. If you were absent from this class, please explain why.
Text Response 
sick 
I was presenting a poster at the INSNA's Sunbelt Social Networks conference in St. Petersburg, FL. 
3. If, in the previous question, you said you didn't understand the lecture/lab "well" or "very well", please comment on what the principal source of difficulty was.
Text Response 
Issues with Stata syntax as well as some of the content of the lab
A (Ted): ok...let me or Michael know where you were having trouble. 
It was hard to pay attention because I was having issues with my computer and I didn't fully understand what the purpose of xi was, but I asked my partner at the end of the class and she explained to me that it was used for categorical variables. It automatically assigns a random number to the categories so that you don't have to do each one individually by hand.
A (Ted): xi allows you to turn a categorical variable into a set of indicator (1 or 0) variables. You have to exclude 1 category or you can't estimate the constant term. Example: you have a variable for race with 5 categories. "xi: reg wage i.race" will give you 4 variables indicating specific racial categories.
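A runnable version of the same idea, using the auto dataset that ships with Stata as a stand-in for the wage data (an assumption; the race variable above is not in it). Here rep78 has 5 categories, so xi creates 4 indicator variables.

```stata
sysuse auto, clear
* rep78 takes the values 1-5; xi builds _Irep78_2 ... _Irep78_5
* and omits _Irep78_1 as the reference category
xi: regress price i.rep78
* list the indicator variables that xi created
describe _Irep78_*
```

Each coefficient is the difference in mean price between that category and the omitted one. The constant is the mean for the omitted category.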
I also was a little apprehensive about the example you used in class regarding the use of a "seed number" and why we needed to manipulate the regression model, but then, again, I asked my partner, who explained to me that this was just a hypothetical example of what would happen to the regression coefficients if the seed numbers were randomly chosen. At first, I thought this was being used to show how we could manipulate the regression model to say whatever we wanted it to!!
A (Ted): Don't worry about the set seed #. The random numbers follow a sequence, and set seed changes where in that sequence you start.
Lastly, I don't think I understood why the midpoint connections on the graph didn't show up...
A (Ted): because the graph only had 1 point per education category, it was redundant syntax. What I wanted to do was connect the points together in a line.

I am having trouble remembering/looking up basic stuff from last semester, like how to calculate the 90% confidence interval, what we mean when we talk about cumulative probability, what is r^2 supposed to tell me. I feel like I'm just muddling through. I did OK with the first part of the lab, where we created the short dofile, but I find the labs kind of loud and chaotic; I can't always focus on what you're saying and also focus on Stata. 
4. Did the class move at the right speed for you today?

#  Answer     Response  %
1  Too fast   1         8%
2  OK         12        92%
3  Too slow   0         0%
   Total      12        100%
5. If you said either "too fast" or "too slow" please explain (it can be brief, just enough so I get the idea)
Text Response 
Once I got behind with the syntax, I lost pace with the other info. I have scheduled a meeting with you to discuss specific questions, so I won't list them below. 
6. Do you have any specific questions regarding this lecture/lab or the class in general?

#  Answer                    Response  %
1  No questions, I'm cool.   11        92%
2  Yes (see below).          1         8%
   Total                     12        100%
7. What questions do you have?
Text Response 
I think I elaborated pretty well in the previous question box. 
I'm still only about halfway through finishing the lab. I don't know how to begin interpreting "test whether the coefficient from #4 is equal to 1" or "calculate the true variance of the coefficient." I'll look through your slides and the reading, but I often don't have a clue as to the first step I need to take when breaking down that kind of problem. I will need to schedule time with you or Andrew so I can work through these things out loud; I feel like I'm behind the rest of the class. mike 
8. Do you have any suggestions on how this lecture/lab (or the course in general) could be improved?
Text Response 
I almost wish the lab sessions could be longer!!
(Ted): I will try to ensure that we have plenty of time for you to work on the homework in lab. I feel that whatever hands on learning we can get done in lab will be efficient in reducing the probability you get stuck working on your own. 
I think it would be good to switch up groups every once in a while
Ted: good suggestion. I didn't intend for the groups to be permanent. 
I found the simulation we did really helpful - it helped clarify some of the things we went over in lecture, and it's really useful to use as a study tool at home. Thanks for including this part in the lab. 
For me, I think that the homework is great for learning Stata commands but that more "optional" homework would be even better. I hate to just memorize commands instead of learning them in context. 
More questions:
> Hi Ted,
>
> I am trying to do the rest of the lab 2 homework. Number 5 says: 5b:
> Use the following syntax as a guide and calculate the MSE of your
> regression (show the steps you took)
>
> reg y x
>
> predict yhat
>
> gen resid=yhat-y
>
> sum resid
>
> Doesn't the regression give you the MSE. In the output it says root
> MSE. I'm not sure how I would calculate it otherwise. I have this
> equation in my class notes that says MSE=RSS/(N-2), but which value is RSS
> in the output? Should the equation read MSE=4845642.98/55943 ??
>
>
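> i.sex             _Isex_1-2           (naturally coded; _Isex_1 omitted)
>
>       Source |       SS        df       MS           Number of obs =    55945
> -------------+-------------------------------        F(  1, 55943) = 13373.27
>        Model |  1158359.21      1  1158359.21        Prob > F      =   0.0000
>     Residual |  4845642.98  55943  86.6175032        R-squared     =   0.1929
> -------------+-------------------------------        Adj R-squared =   0.1929
>        Total |   6004002.2  55944  107.321647        Root MSE      =   9.3069
>
> ------------------------------------------------------------------------------
>      wage_re |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>        edyrs |   2.374525   .0205333   115.64   0.000      2.33428     2.41477
>      _Isex_2 |  (dropped)
>        _cons |  -15.42167   .2721664   -56.66   0.000    -15.95512   -14.88822
> ------------------------------------------------------------------------------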
>
Ted: Yes...the regression output gives it to you. The RSS is the
SS for the residual, 4845642.98.
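As a quick check of the arithmetic, the numbers from the Residual row of the output above can be typed straight into display. This is only a sketch of the by-hand calculation: RSS = 4845642.98 and N - 2 = 55943 are read off the table.

```stata
* MSE = RSS / (N - 2); this should match the MS column of the Residual row
display 4845642.98/55943
* Root MSE is the square root of the MSE; compare with Root MSE in the output
display sqrt(4845642.98/55943)
```

So the equation in the question is set up correctly.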
I also want you to be able to calculate it using Stata. Here is an example:
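> . use morg05_small_1
>
> . reg wage sex
>
>       Source |       SS         df       MS          Number of obs =   111588
> -------------+--------------------------------       F(  1,111586) =  2747.77
>        Model |  367325.419       1  367325.419       Prob > F      =   0.0000
>     Residual |  14916957.7  111586  133.681266       R-squared     =   0.0240
> -------------+--------------------------------       Adj R-squared =   0.0240
>        Total |  15284283.2  111587  136.971898       Root MSE      =   11.562
>
> ------------------------------------------------------------------------------
>      wage_re |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>          sex |  -3.628677   .0692242   -52.42   0.000    -3.764356   -3.492999
>        _cons |   22.97912   .1095419   209.77   0.000     22.76442    23.19382
> ------------------------------------------------------------------------------
>
> . reg wage edyrs
>
>       Source |       SS         df       MS          Number of obs =   111588
> -------------+--------------------------------       F(  1,111586) = 24198.42
>        Model |  2723843.14       1  2723843.14       Prob > F      =   0.0000
>     Residual |    12560440  111586  112.562867       R-squared     =   0.1782
> -------------+--------------------------------       Adj R-squared =   0.1782
>        Total |  15284283.2  111587  136.971898       Root MSE      =    10.61
>
> ------------------------------------------------------------------------------
>      wage_re |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>        edyrs |   2.301577   .0147956   155.56   0.000     2.272578    2.330577
>        _cons |  -12.39165    .194962   -63.56   0.000    -12.77378   -12.00953
> ------------------------------------------------------------------------------
>
> . predict hat
> (option xb assumed; fitted values)
>
> . gen e=wage-hat
>
> . sum e
>
>     Variable |     Obs        Mean    Std. Dev.        Min        Max
> -------------+---------------------------------------------------------
>            e |  111588    1.74e-08    10.60952   -30.15074   230.0743
>
> . di 10.609^2
> 112.55088
>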
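The 112.55 from the by-hand calculation is slightly below the 112.562867 in the MS column. Most of that gap comes from rounding the s.d. to 10.609 before squaring, and a small part comes from summarize dividing by N-1 where the regression divides by N-2. The sketch below reproduces the MSE without retyping any numbers. It uses the auto dataset that ships with Stata as a stand-in (an assumption, since morg05_small_1 is a course file).

```stata
sysuse auto, clear
regress price mpg
* residuals = actual - fitted
predict e, residuals
summarize e
* summarize divides by N-1; rescale so the divisor is N-2
display "from summarize: " r(Var)*(r(N)-1)/(r(N)-2)
* the same quantity from results stored by regress
display "from regress:   " e(rss)/e(df_r)
display "Root MSE^2:     " e(rmse)^2
```

All three lines should print the same number, up to floating-point rounding in the first one.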
I know it's probably too late to post questions and get a response, but I am working on the homework for lab 2 and have gotten to the same problem as the one previously posted. Only, this time, whenever I type "sum resid" I get 0's for all of the summary statistics (mean, std dev, min, max). I know this is not right, but I don't know what I did wrong. I did everything else according to the directions on the lab handout. Here is what I did below.
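. reg wage_ave edyrs if sex==1

      Source |       SS       df       MS              Number of obs =      14
-------------+------------------------------           F(  1,    12) =   26.57
       Model |  753.680098     1  753.680098           Prob > F      =  0.0002
    Residual |  340.448425    12  28.3707021           R-squared     =  0.6888
-------------+------------------------------           Adj R-squared =  0.6629
       Total |  1094.12852    13  84.1637326           Root MSE      =  5.3264

------------------------------------------------------------------------------
    wage_ave |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       edyrs |   1.393086   .2702834     5.15   0.000     .8041889    1.981983
       _cons |  -.5771335   3.244205    -0.18   0.862     -7.64565    6.491383
------------------------------------------------------------------------------

. predict yhat
(option xb assumed; fitted values)

. gen resid=yehat-y
yehat not found
r(111);

. gen resid=yhat-y

. sum resid

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       resid |      28           0           0          0          0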
Also, on number 6, how are we supposed to get the denominator, the sum of (x-xbar)^2, for the standard error of B1 when the regression output and summary statistics do not give the individual values for x?
***I think the answer to your problem is that you hadn't defined y in your statement: <<gen resid=yhat-y>>. I think the instructions are simply to show the syntax; in other words, they say y as in the y variable you used in the equation, the dependent variable (in this case wage). Hope that helps and didn't lead you astray.
(Ted): instead of
gen resid=yhat-y
do this:
gen resid=wage-yhat
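A likely explanation for the all-zero residuals (an inference from the log, not something stated in the lab handout): Stata accepts unambiguous abbreviations of variable names. With no variable called y in the data, y is read as yhat, so yhat-y is yhat-yhat. The sketch below reproduces this on the auto dataset that ships with Stata, used here as a stand-in.

```stata
sysuse auto, clear
regress price mpg
predict yhat
* no variable is named y, so Stata reads y as an abbreviation of yhat
generate resid0 = yhat - y
summarize resid0
* residual = actual - fitted, restricted to the estimation sample
generate resid = price - yhat if e(sample)
summarize resid
```

The if e(sample) restriction also matters for the regression in the question. It was run with if sex==1 on 14 observations, but predict filled in fitted values for all 28 rows.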
Hi everyone, I was not in the lab and am having problems answering question 6 (at the beginning) because it seems like a word is missing. It says "Q6: referring to the output, what is the estimated". My question is what is the estimated what? Or does it mean what is estimated? If anyone could help me with this I'd appreciate it. Sincerely, Ashton
(Ted): That was part of a question for me to ask in class, not to answer as homework. The homework questions start farther down in the lab. Sorry for the confusion.
Q: I'm still feeling like there's a huge disconnect between the lab and the lectures. I get that you are trying to get us to understand the calculations behind the Stata work but I feel like I'm missing basic stuff here. I still didn't get how to set up the hypothesis test in the lab 2 question 4, or how to get the estimated variance of B1 based on the sum table in question 6.
(Ted): I assume you are talking about question 5 from lab2:
5. Test whether the coefficient from #4 is equal to 1.
Refer to equations 3 and 4 from lecture 2. Convert the test into a z value. Follow the steps in the lecture...(***we will talk about this in class)
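A minimal sketch of converting such a test into a z value. It assumes the usual form z = (b1 - 1)/se(b1); the lecture's equations 3 and 4 are not reproduced here. The auto dataset and the variable weight are stand-ins for the lab's data and coefficient.

```stata
sysuse auto, clear
regress price weight
* z = (estimated coefficient - hypothesized value) / standard error
scalar z = (_b[weight] - 1)/_se[weight]
display "z = " z
* two-sided p-value from the standard normal distribution
display "p = " 2*(1 - normal(abs(z)))
* built-in check: test reports an F statistic equal to z^2
test weight = 1
```

If the absolute value of z is larger than 1.96, the hypothesis that the coefficient equals 1 is rejected at the 5% level.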
(Ted) With respect to question 6, refer to the formula for the variance of b1. What pieces of information do you need? We know what the s.d. of the error term is. What else do you need to know? You have all you need with the formula and a "sum" of all the variables in Stata.
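One way to get the missing piece from "sum" alone: the sum of (x-xbar)^2 equals (N-1) times the sample variance of x, and summarize reports both N and the s.d. The sketch below assumes the standard formula, true Var(b1) = sigma^2 / sum of (x-xbar)^2. Its simulated data are hypothetical: the mean and s.d. of x, the coefficients, the seed, and the sample size are made up, and only the error s.d. of 2 comes from the lab.

```stata
clear
set seed 1
set obs 1000
* hypothetical regressor and outcome; the error term has s.d. 2
generate x = rnormal(10, 3)
generate y = 1 + 1*x + rnormal(0, 2)
summarize x
* sum of (x - xbar)^2 = (N - 1) * sample variance of x
scalar sxx = (r(N) - 1)*r(Var)
* true variance of b1 uses the known error variance, 2^2 = 4
scalar true_var = 2^2/sxx
display "true variance of b1    = " true_var
display "true s.e. of b1        = " sqrt(true_var)
display "true 95% CI half-width = " 1.96*sqrt(true_var)
* compare with the estimated standard error in the regression output
regress y x
```

The estimated standard error from regress should be close to the true one, but not identical, because it uses the estimated error variance in place of 4.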