
# lec1


discuss at beginning of class:

equation 1 and table 1 from lec0

go over mean, variance and covariance formulas

-->these are the three key pieces to understanding regression
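A minimal sketch of the three formulas in Python (using the sample versions that divide by n - 1; whether the slides use n or n - 1 is an assumption here):

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # average squared deviation from the mean (sample version, n - 1)
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def covariance(xs, ys):
    # average product of paired deviations from the two means
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]        # y = 2x, so cov(x, y) = 2 * var(x)
print(mean(x))              # 3.0
print(variance(x))          # 2.5
print(covariance(x, y))     # 5.0
```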

discuss why covariance is so important.

Q: Draw x-y plots indicating negative, zero, and positive covariance.

(Show how this works with actual numbers --> i.e., to maximize covariance, the deviations from the mean should move in the same direction, etc.)

Calculating beta:

In the regression line y=a+bx, b=beta is calculated as:

b=cov(x,y)/var(x).

Thus, beta is the ratio of the covariance of x and y to the total variation observed in x.
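The formula can be checked numerically (a small sketch with made-up data lying exactly on y = 1 + 2x, so the slope should come out as 2):

```python
def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((v - m) ** 2 for v in xs) / (len(xs) - 1)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]      # exactly y = 1 + 2x
b = cov(x, y) / var(x)    # b = cov(x, y) / var(x)
print(b)                  # 2.0
```

Note that the n - 1 divisors cancel in the ratio, so b is the same whether you use sample or population versions of cov and var.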

The error term

In measuring a relationship between y and x modeled by the regression line y=a+bx, the relationship between x(i) and y(i) for observation i is:

y(i)=a+bx(i)+e(i)

The error term e is assumed to be normally distributed around the line y=a+bx with mean zero and variance sigma squared.  Thus, on average,

e does not change the regression line.  However, for observation i, e(i) is the difference between what the line predicts at x(i) and

what is observed for the pair (x(i), y(i)).
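A sketch of this with made-up noisy data: fit the line by the formulas above, then compute each residual e(i) = y(i) - (a + b*x(i)). An algebraic property of least squares is that the fitted residuals sum to zero, which matches the "mean zero" idea.

```python
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]   # roughly y = 2x, with some noise

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
# slope b = cov(x, y) / var(x); the common divisor cancels
b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
     / sum((xi - xbar) ** 2 for xi in x))
a = ybar - b * xbar              # intercept makes the line pass through (xbar, ybar)

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(round(sum(residuals), 10))   # 0.0 -- fitted residuals sum to zero
```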

Lecture 1 questions and comments (from the survey)

Q: I feel like some steps were missing. I feel like I went from understanding y=mx+b to not understanding where all of

the other terms (e.g., variance and covariance) play into this equation.

A: (Ted) We jumped from the formula for the regression line to the solution for b1 and b0 on slide 14 of lecture 1.  In Equation 3 of slide 14 we

want to minimize the sum of squared errors, and the answer for how to do that is Equations 4 and 5.  Later we will prove it.

Right now you have to take it on faith.
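Until the proof, you can at least check numerically that the closed-form b1 and b0 minimize the sum of squared errors: perturbing either coefficient away from the formula values can only make the SSE larger. (A sketch with made-up data; `sse` is a hypothetical helper name.)

```python
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

def sse(a, b):
    # sum of squared errors for the line y = a + b*x
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

best = sse(b0, b1)
# nudging either coefficient never improves on the closed-form solution
print(best <= sse(b0 + 0.1, b1))       # True
print(best <= sse(b0, b1 - 0.1))       # True
print(best <= sse(b0 - 0.1, b1 + 0.1)) # True
```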

Comment: I very much enjoyed the class. I understand that some students might not be comfortable with equations, but I felt there was no need to apologize for using math notation. But more details would have been beneficial.

A: (Ted) Details on what?

Comment: Today's class was very helpful. I think everyone needs different things explained to them. For me the review of the equation from class 1 was too slow and I was eager to move on. But it is not a big deal.

Q: Your description of covariance sounded a lot like correlation. What's the difference between the two? How is the extent to which x and y are correlated related to the slope of the regression line?

A: (Ted) They are very similar.  Correlation is Cov(x,y)/[SD(x)*SD(y)], where SD is the standard deviation.  Hence, in simple OLS, the estimate of b1 is related to the concept of correlation.

Q: We discussed that Y has a variance = sigma squared.  I think the textbook also said that the error term has a variance of sigma squared.  Does that mean that Y and the error term have the same variance?

A: (Ted) Yes.

Q1: During class you pointed out that although we make the assumption that the residual is normally distributed, we can relax this assumption and still get "consistent" estimates of the parameters.  You also clarified what you meant by "consistent".  Could you reiterate what you mean by "consistent"?

A: (Ted) Consistent means "the right number, on average" --> i.e., the average estimate will have the right answer, but any single estimate could be off.

Q2: You pointed out that in some (more advanced) cases of regression, you may minimize the absolute value of the residuals rather than minimizing the sum of the squares.  Out of curiosity, what type of situation would incline you to use the absolute value rather than the sum of the squares (and is there a specific name for this technique)?

A: (Ted) An example is quantile regression, which estimates the effect of X at different percentiles of Y. http://en.wikipedia.org/wiki/Quantile_regression
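The covariance-vs-correlation point can be sketched numerically (made-up data): correlation is covariance rescaled by the two standard deviations, so it is unitless and bounded by 1, while the slope b1 = cov(x,y)/var(x) keeps the units of y per unit of x. The two are linked by b1 = r * SD(y)/SD(x).

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
cov = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
sdx = sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))
sdy = sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))

r = cov / (sdx * sdy)     # correlation: unitless, always in [-1, 1]
b1 = cov / sdx ** 2       # OLS slope: units of y per unit of x
print(abs(b1 - r * sdy / sdx) < 1e-12)   # True: slope = r * SD(y)/SD(x)
```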