discuss at beginning of class:
equation 1 and table 1 from lec0
go over the mean, variance, and covariance formulas
> these are the three key pieces to understanding regression
discuss why covariance is so important.
Q: draw x-y scatterplots indicating negative (-), zero (0), and positive (+) covariance.
(show how this works with actual numbers, i.e., to maximize covariance the deviations from the mean should go in the same direction, etc.)
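A small sketch of the "actual numbers" point above, using made-up data: when y moves with x, the deviations from the means share a sign and the covariance comes out positive; when y moves against x, it comes out negative.

```python
# Toy data (made-up numbers) to show how covariance picks up direction.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2.0, 4.0, 6.0, 8.0, 10.0]    # moves with x    -> positive covariance
y_down = [10.0, 8.0, 6.0, 4.0, 2.0]  # moves against x -> negative covariance

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    # Population covariance: average product of deviations from the means.
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

print(cov(x, y_up))    # 4.0: deviations go in the same direction
print(cov(x, y_down))  # -4.0: deviations go in opposite directions
```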
Calculating beta:
In the regression line y = a + bx, the slope b = beta is calculated as:
b = cov(x, y) / var(x).
Thus, beta is the ratio of the joint variation of x and y (their covariance) to the total variation observed in x.
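The formula can be checked on made-up data that lies exactly on a known line, so the recovered slope and intercept should match the line we built in.

```python
def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    # Population covariance: average product of deviations from the means.
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def var(a):
    # Variance is the covariance of a variable with itself.
    return cov(a, a)

# Made-up data lying exactly on y = 1 + 2x, so b should come out as 2.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

b = cov(x, y) / var(x)
a = mean(y) - b * mean(x)  # intercept: the line passes through (mean x, mean y)
print(a, b)  # 1.0 2.0
```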
The error term
In measuring a relationship between y and x modeled by the regression line y = a + bx, the relationship between x(i) and y(i) for observation i is:
y(i) = a + b*x(i) + e(i)
The error term e is assumed to be normally distributed around the line y = a + bx with mean zero and variance sigma squared. Thus, on average,
e does not change the regression line. However, for observation i, e(i) is the difference between what the line predicts at x(i) and
what is actually observed for the pair (x(i), y(i)).
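A quick simulation of this setup (line and noise level chosen arbitrarily for illustration): each e(i) is the gap between the line and the observed point, and because the errors have mean zero they roughly cancel out on average.

```python
import random

random.seed(0)
a_true, b_true, sigma = 1.0, 2.0, 0.5  # made-up line and noise level

# Simulate y(i) = a + b*x(i) + e(i) with e ~ Normal(0, sigma^2).
x = [i / 10 for i in range(200)]
y = [a_true + b_true * xi + random.gauss(0.0, sigma) for xi in x]

# For each observation, e(i) is the gap between the line and the data point.
e = [yi - (a_true + b_true * xi) for xi, yi in zip(x, y)]
print(sum(e) / len(e))  # close to zero: on average the errors cancel out
```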
Lecture 1 questions and comments (from the survey)
Q: I feel like some steps were missing. I went from understanding y = mx + b to not understanding where all of
the other terms (e.g., variance and covariance) play into this equation.
A:(Ted) We jumped from the formula for the regression line to the solution for b1 and b0 on slide 14 of lecture 1. In equation 3 of slide 14 we
want to minimize the sum of squared errors, and the answer to how to do that is Equations 4 and 5. Later we will prove it.
Right now you have to take it on faith.
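Until the proof comes, here is a numerical sanity check you don't have to take on faith (data made up for illustration): compute b1 and b0 from the closed-form solution, then verify that nudging either coefficient in any direction only increases the sum of squared errors.

```python
def mean(v):
    return sum(v) / len(v)

def sse(a, b, x, y):
    # Sum of squared errors for the line y = a + b*x.
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]  # made-up data, roughly y = 2x

# Closed-form OLS solution for the slope and intercept.
mx, my = mean(x), mean(y)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

# Perturbing the solution in any direction makes the fit strictly worse.
best = sse(b0, b1, x, y)
for da in (-0.1, 0.1):
    for db in (-0.1, 0.1):
        assert sse(b0 + da, b1 + db, x, y) > best
print(b0, b1)
```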
I very much enjoyed the class. I understand that some students might not be comfortable with equations, but I felt there
was no need to apologize for using math notation.

But, more details would have been beneficial.
Ted: Details on what?

Today's class was very helpful.

I think everyone needs different things explained to them. For me the review of the
equation from class 1 was too slow and I was eager to move on. But it is not a big deal.

Q: Your description of covariance sounded a lot like correlation. What's the difference between
the two?
How is the extent to which x and y are correlated related to slope of the regression line?
A:(Ted) They are very similar. Correlation is Cov(x,y)/[SD(x)*SD(y)], where SD is the standard deviation, so correlation is just covariance rescaled to lie between -1 and 1. Hence, in simple
OLS, the estimate of b1 is directly related to the correlation: b1 = corr(x,y)*SD(y)/SD(x).
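The relationship between the two, and the link to the slope, can be checked numerically on made-up data:

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    # Population covariance: average product of deviations from the means.
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

# Made-up data with a strong positive relationship.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [0.5, 1.9, 4.2, 6.8, 11.1]

sdx, sdy = math.sqrt(cov(x, x)), math.sqrt(cov(y, y))
r = cov(x, y) / (sdx * sdy)  # correlation: covariance rescaled to [-1, 1]
b = cov(x, y) / cov(x, x)    # OLS slope estimate

print(r)                 # a number between -1 and 1
print(b, r * sdy / sdx)  # the slope equals correlation times SD(y)/SD(x)
```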

Q: We discussed that Y has a variance = sigma squared. I think the textbook also said that the error term
has a variance of sigma squared. Does that mean that Y and the error term have the same variance?
A:(Ted) Yes. Conditional on x, the piece a + bx is a fixed number, so all of the variation in y comes from the error term: Var(y|x) = Var(e) = sigma squared.

Q1. During class you pointed out that although we make the assumption that the residual is normally
distributed, we can relax this assumption and still get "consistent" estimates of the parameters. You also
clarified what you meant by "consistent". Could you reiterate what you mean by "consistent"?
A:(Ted) Consistent means the estimator homes in on the right number: as the sample size grows, the estimate converges to the true value, but
any single estimate could be off.
Q2. You pointed out that in some (more advanced) cases of regression, you may minimize the absolute
value of the residuals rather than minimizing the sum of the squares. Out of curiosity, what
type of situation would incline you to use the absolute value rather than the sum of the squares
(and is there a specific name for this technique)?
A:(Ted) An example is quantile regression, which estimates the effect of X at different percentiles of Y.
http://en.wikipedia.org/wiki/Quantile_regression
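The intuition in the simplest (intercept-only) case can be seen with a crude grid search on made-up data: minimizing squared error recovers the mean, which gets dragged toward an outlier, while minimizing absolute error recovers the median, which does not.

```python
# Intercept-only intuition behind quantile regression: minimizing squared
# error recovers the mean; minimizing absolute error recovers the median.
y = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up data with one big outlier

def sq_loss(c):
    return sum((yi - c) ** 2 for yi in y)

def abs_loss(c):
    return sum(abs(yi - c) for yi in y)

grid = [i / 100 for i in range(0, 10001)]  # candidate constants in [0, 100]
best_sq = min(grid, key=sq_loss)
best_abs = min(grid, key=abs_loss)
print(best_sq)   # 22.0, the mean: dragged toward the outlier
print(best_abs)  # 3.0, the median: robust to the outlier
```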
