**p** (probability) values are reported for the constant (a) and for X, which is actually the slope of the line (b). Each p value is the probability of obtaining an estimate at least as far from zero as the one observed if the true value were zero, so a small p value indicates that the estimate of a or b is unlikely to have arisen by chance. These p values are not a measure of ‘goodness of fit’ *per se*; rather, they state the confidence that one can have in the estimated values, given the constraints of the regression analysis (*i.e.*, linear with all data points having equal influence on the fitted line).
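As a rough illustration of where these p values come from, the sketch below fits the line by least squares, divides each estimate by its standard error to get a t statistic, and converts that to a two-sided p value with n − 2 degrees of freedom. The function names are made up for this example, and the numerical integration of the t density is only a stand-in for the t distribution a statistics package would provide.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return norm * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central_area


def coefficient_p_values(x, y):
    """Return {'a': (estimate, p), 'b': (estimate, p)} for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # constant

    # Residual variance s^2 = SSE / (n - 2).
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)

    # Standard errors of the slope and the constant.
    se_b = sqrt(s2 / sxx)
    se_a = sqrt(s2 * (1 / n + mean_x ** 2 / sxx))

    df = n - 2
    return {
        "a": (a, two_sided_p(a / se_a, df)),
        "b": (b, two_sided_p(b / se_b, df)),
    }
```

Calling `coefficient_p_values([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])` returns the estimate and p value for each coefficient. The sketch is meant for small samples; for real work a statistics library's t distribution would be the safer choice.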

**R-squared** and **adjusted R-squared** values are estimates of the ‘goodness of fit’ of the line. They represent the **% variation of the data explained by the fitted line**; the closer the points to the line, the better the fit. Adjusted R-squared corrects for the number of data points relative to the number of fitted parameters, so it is less sensitive than R-squared to the number of points within the data. R-squared is derived from

R-squared = 100 * SS(regression) / SS(total)

For linear regression with a single explanatory variable, R-squared (as a proportion) equals the square of *r*, the correlation coefficient.

R-squared is also known as the **coefficient of determination**: the amount of variation in Y that is explained by the model.
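The relationship between R-squared, adjusted R-squared and *r* can be checked numerically. The sketch below is a minimal illustration for a single explanatory variable. The function name is invented for this example, and the adjusted R-squared formula is the usual textbook one rather than something taken from the references.

```python
from math import sqrt


def r_squared_summary(x, y):
    """Return (R-squared %, adjusted R-squared %, r) for the line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)

    # Least-squares slope (b) and constant (a).
    b = sxy / sxx
    a = mean_y - b * mean_x

    # SS(regression): squared distance of each fitted value from the mean of Y.
    ss_regression = sum((a + b * xi - mean_y) ** 2 for xi in x)

    r_squared = 100 * ss_regression / ss_total
    # Usual adjustment for one explanatory variable (assumed definition):
    # rescale the unexplained share by (n - 1) / (n - 2).
    adjusted = 100 * (1 - (1 - r_squared / 100) * (n - 1) / (n - 2))
    # Pearson correlation coefficient r.
    r = sxy / sqrt(sxx * ss_total)
    return r_squared, adjusted, r
```

For example, `r_squared_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])` gives an R-squared that matches `100 * r ** 2`, with the adjusted value somewhat lower because the sample is small.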

**SS(regression)** **SSR** describes the variation within the fitted values of **Y**, and is the sum of the squared difference between each **fitted** value of **Y** and the mean of **Y**. The squares are taken to ‘remove’ the sign (+ or -) from the differences, so that positive and negative values do not cancel each other out.

**SS(error)** **SSE** describes the variation of observed Y from estimated (fitted) Y. It is the sum of the squares of the **residuals**, where a residual is the distance of a data point above or below the fitted line (see **Fig 2.2**).

**SS(total)** **SST** describes the variation within the values of **Y**, and is the sum of the squared difference between each value of **Y** and the mean of **Y**.
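Written out as formulas, the three sums of squares look as follows. The symbols are introduced here for illustration and are not taken from the references: $y_i$ is an observed value, $\hat{y}_i$ the corresponding fitted value, $\bar{y}$ the mean of Y, and $n$ the number of data points.

```latex
\begin{align}
SS(\text{regression}) &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
SS(\text{error})      &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
SS(\text{total})      &= \sum_{i=1}^{n} (y_i - \bar{y})^2
                       = SS(\text{regression}) + SS(\text{error})
\end{align}
```

The final equality holds for a least-squares line that includes a constant term, which is why R-squared can be read as the share of the total variation accounted for by the fit.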

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Regression | 1 | SSR | SSR / 1 | SSR / s^2 |
| Error | n − 2 | SSE | s^2 = SSE / (n − 2) | |
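The entries of this table can be computed directly from the data. The sketch below assumes a simple linear regression with one explanatory variable. The function name and the dictionary layout are choices made for this example only.

```python
def anova_rows(x, y):
    """Return the Regression and Error rows of the ANOVA table for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    # Least-squares slope (b) and constant (a).
    b = sxy / sxx
    a = mean_y - b * mean_x

    # SSR: fitted values against the mean; SSE: observed against fitted.
    ssr = sum((a + b * xi - mean_y) ** 2 for xi in x)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

    # Mean squares are sums of squares divided by their degrees of freedom.
    s2 = sse / (n - 2)
    return {
        "Regression": {"df": 1, "SS": ssr, "MS": ssr / 1, "F": ssr / s2},
        "Error": {"df": n - 2, "SS": sse, "MS": s2},
    }
```

A large F value means the regression mean square is large relative to the residual variance s^2. With one explanatory variable, F is the square of the t statistic for the slope.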

References: http://www.le.ac.uk/bl/gat/virtualfc/Stats/regression/regr1.html

http://www.math.umbc.edu/~kofi/Courses/Stat355/Chap12.pdf

http://www.econ.ucsb.edu/~pjkuhn/AEASTP/Lecture Notes is a good starting point for linear regression notes.

From Wood, GAM.pdf

The results of the linear model need to be checked to see whether the assumptions about the errors are met, namely independent errors and homoscedasticity (constant variance). This is done using residual plots:

1. The residuals plotted against the fitted values should be scattered around 0 without any discernible pattern.

2. The QQ plot of the residuals should be close to a straight line, indicating that they are approximately normally distributed.
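The numbers behind both plots can be produced with the standard library alone. The sketch below returns the fitted values, the residuals, and the pairs of points for a normal QQ plot. The function name is invented for this example, and the plotting positions (i − 0.5) / n are one common convention assumed here, not something specified in the source.

```python
from statistics import NormalDist


def residual_checks(x, y):
    """Return (fitted, residuals, qq) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    # Least-squares slope (b) and constant (a).
    b = sxy / sxx
    a = mean_y - b * mean_x

    # Plot 1: residuals (vertical axis) against fitted values (horizontal axis).
    fitted = [a + b * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    # Plot 2: sorted residuals against standard normal quantiles.
    normal = NormalDist()
    theoretical = [normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    qq = list(zip(theoretical, sorted(residuals)))
    return fitted, residuals, qq
```

The `(fitted, residuals)` pairs and the `qq` pairs can be passed to any plotting tool. A funnel or curve in the first plot suggests non-constant variance or a missed trend, and strong bends in the second suggest non-normal residuals.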