LINEAR REGRESSION AND STATISTICS

Last Update 03/6/2016


The program min6 on this page uses the least squares method to obtain the linear regression, the x and y graphic with the line of best fit, and the corresponding straight line equation for the data entered by the user. It requires at least 4 pairs of numbers, x[0], y[0], x[1], y[1], x[2], y[2] and x[3], y[3], entered on panel Ci, sketched in figure-1.
The program writes the straight line equation, Y = C2 * x + C1, the constants C2 and C1, the calculated value Y[n] for each x[n] entered, and the percentage error E%[n] between the original y[n] entered and the Y[n] calculated by regression. It also writes the amplification factors, Nx for the values represented on the x axis and Ny for the values represented on the y axis of the graphic, together with the standard deviation of the residuals of Y, the standard deviations of C1 and C2, and the correlation coefficient.
The program can acquire a population (pop) of up to 50 pairs of numbers x[n] and y[n]; the 50th pair entered will be x[49] and y[49]. The program calculates Nx and Ny itself, but the user can enter other Nx and/or Ny values to customize the graphic. Any acquired pair of values x[n] and y[n] can be corrected by acquiring a new pair x'[n] and y'[n] with the same index, where both values may be new or one may be new and the other unchanged. Before starting a new set of values, it is necessary to clean all the previously registered values. Table-I links directly to selected sections of this page.
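The quantities listed above can be sketched in a few lines of Python. This is only an illustration, not the source code of min6: the function name linear_regression and the dictionary keys are hypothetical, and the formulas are the standard unweighted least squares expressions with n - 2 degrees of freedom for the residuals, which are assumed here to correspond to what the program reports.

```python
import math

def linear_regression(xs, ys):
    """Unweighted least squares fit of Y = C2 * x + C1 (hypothetical helper).

    Assumption: standard textbook formulas; min6 may differ in detail.
    """
    n = len(xs)
    if n != len(ys) or n < 4:
        raise ValueError("at least 4 pairs x[n], y[n] are required")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    c2 = sxy / sxx                      # slope
    c1 = y_mean - c2 * x_mean           # intercept
    fitted = [c2 * x + c1 for x in xs]  # Y[n] for each x[n]
    # Standard deviation of the residuals of Y, with n - 2 degrees of freedom.
    s_res = math.sqrt(sum((y - f) ** 2 for y, f in zip(ys, fitted)) / (n - 2))
    s_c2 = s_res / math.sqrt(sxx)
    s_c1 = s_res * math.sqrt(sum(x * x for x in xs) / (n * sxx))
    # The correlation coefficient is undefined when all y values are equal.
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else float("nan")
    return {"C1": c1, "C2": c2, "Y": fitted,
            "s_res": s_res, "s_C1": s_c1, "s_C2": s_c2, "r": r}
```

For example, linear_regression([1, 2, 3, 4], [3, 5, 7, 9]) describes points lying exactly on Y = 2 * x + 1, so the residual standard deviation is zero and the correlation coefficient is 1.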

Table-I The sections of this resource
Sections
Beginning
To acquire a pair of values x[n] and y[n]
To change from points to rings on the graphic and vice versa
To change the graphic amplification
To clean all acquired data
To drag the panel of functions
To enable or not indices on the graphic
To erase anything before a click on button E
To observe the graphic
To see the statistics and acquired values
To select the other panel of functions
Reference

Beginning

When this page opens, the application shows a hypothetical study of students' grades. The x axis represents the grades obtained on a test taken in the classroom with the teacher present; the y axis represents the grades obtained on a homework assignment as difficult as the test, done without consultation and with no teacher present. The scale runs from 0 for completely failed to 100 for excellent.

To select the other panel of functions
If the panel of functions shows button Ci, click on button Ci to operate on the panel of functions Cii, see figure-1.
If the panel of functions shows button Cii, click on button Cii to operate on the panel of functions Ci, see figure-2.

To observe the graphic
If the program acquired 4 or more pairs of numbers x[n] and y[n]: click on button xy on panel Cii, as sketched on figure-2.

To change from points to rings on the graphic and vice versa
On panel of functions Cii, the user can change the marks of the points on the graphic from points to rings with a click on the points/rings button, as shown in figure-2. A second click on the same button will change the rings back to points.

The figures below are static; they are rough sketches that represent the dynamic panel of functions of program min6.
 
 
(Blue Button)
0
-
Ci
[
]
X Y B
E
.
0
1 2 3
4 5 6
7 8 9

Figure-1. Panel Ci

 
(Blue Button)
xy
(points/rings button)
i
Cii
W
XY N
E
.
0
1 2 3
4 5 6
7 8 9

Figure-2. Panel Cii

To enable or not indices on the graphic
A click on button i, as in figure-2, will remove the indices of the points from the graphic. A second click will enable them again.

To drag the panel of functions
To move the panel of functions away from the line, drag the blue button, as represented in figure-2.

To change the graphic amplification
The first click on button N on panel Cii, as represented in figure-2, enables the program to acquire a new Nx factor, which amplifies or reduces the representation of the scale on the x axis.
A second click on button N enables the program to acquire a new Ny factor for the scale on the y axis.
Example: to enter the factor Ny=3.5 when the first click on button N displays Nx=, click on button N once more to show Ny= on the display. Then click on button 3, on the point button, on button 5 and finally on button E.

To see the statistics and acquired values
If at least 3 pairs of values x[n] and y[n] were already acquired and the last action acquired one more pair of values x[n+1] and y[n+1] on panel of functions Ci, as shown in figure-1: change to panel of functions Cii, click on button xy, as shown in figure-2, change back to panel of functions Ci and click on button B, represented in figure-1, to read the new statistics on the blue pages.
In summary, the new statistics become readable on the blue pages only after the graphic output has been observed. New statistics can be observed only after a new pair of values x[m] and y[m], with m > 2, has been acquired.
Repeated clicks on button B change to the next blue page. There are 10 blue pages: B1, B2, ..., B10. Pages B6 to B10 show the values Y[n], calculated by the regression equation for each x[n], and the percentage error E%[n], calculated by equation-1:

E%[n] = (y[n] - Y[n]) / y[n] * 100, for all y[n] not equal to 0.       (1)
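As an illustration, equation-1 can be written as a short Python function. The name percentage_errors is hypothetical, and returning None where y[n] is 0 is an assumption made for this sketch; the page does not say what min6 displays in that case.

```python
def percentage_errors(ys, fitted):
    """Apply equation-1, E%[n] = (y[n] - Y[n]) / y[n] * 100, to each pair.

    ys     -- the original y[n] values entered by the user
    fitted -- the Y[n] values calculated by the regression
    Assumption: entries with y[n] == 0 yield None, since equation-1
    is only defined for y[n] not equal to 0.
    """
    errors = []
    for y, big_y in zip(ys, fitted):
        if y == 0:
            errors.append(None)
        else:
            errors.append((y - big_y) / y * 100.0)
    return errors
```

With y[n] = 4 and Y[n] = 5, for instance, the function gives (4 - 5) / 4 * 100, a negative error, because the regression value is above the value entered.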

Data not yet acquired have the value 0.0. The blue pages follow the data organization sketched in Table-II.

 Table-II. Data organization on the blue pages.
Page   Contents
B1     x[0]= and y[0]=
       x[1]= and y[1]=
       ...        ...
       x[9]= and y[9]=
       pop = number of pairs x[n] and y[n] acquired
       amplification factors Nx and Ny
       statistics if pop > 3
B2     x[10]= and y[10]=
       x[11]= and y[11]=
       ...        ...
       x[19]= and y[19]=
       pop = number of pairs x[n] and y[n] acquired
       amplification factors Nx and Ny
       statistics if pop > 3
...    ...
B5     x[40]= and y[40]=
       x[41]= and y[41]=
       ...        ...
       x[49]= and y[49]=
       pop = number of pairs x[n] and y[n] acquired
       amplification factors Nx and Ny
       statistics if pop > 3
B6     x[0]= and Y[0]= and E%[0]=
       x[1]= and Y[1]= and E%[1]=
       ...        ...              ...
       x[9]= and Y[9]= and E%[9]=
...    ...
B10    x[40]= and Y[40]= and E%[40]=
       x[41]= and Y[41]= and E%[41]=
       ...        ...              ...
       x[49]= and Y[49]= and E%[49]=
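
The organization of the blue pages can be summarized by a small lookup, sketched below in Python. The function name blue_pages_for_index is hypothetical, and the sketch assumes that the pages not listed explicitly in the table follow the same pattern of 10 indices per page as the pages that are listed.

```python
def blue_pages_for_index(n):
    """Return the two blue pages that show data for index n.

    The first page (B1 to B5) shows the entered pair x[n], y[n]; the second
    (B6 to B10) shows x[n], Y[n] and E%[n].
    Assumption: every page holds 10 consecutive indices, as on the pages
    listed in the table.
    """
    if not 0 <= n <= 49:
        raise ValueError("the index must be between 0 and 49")
    block = n // 10
    return "B%d" % (block + 1), "B%d" % (block + 6)
```

Under this assumption, blue_pages_for_index(49) returns the pair of pages B5 and B10, in agreement with the last rows of the table.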

To clean all acquired data
The program min6 initially shows an example. Click on button W on panel Cii, as represented in figure-2, to clean all acquired data.

To erase anything before a click on button E
To erase a wrong number or a wrong bracket before button E has been clicked, click on button Ci or Cii.

To acquire a pair of values x[n] and y[n]
Although not mandatory, it is convenient to first click on button B on panel of functions Ci, as represented in figure-1, in order to see in real time what the linear regression program has acquired after each value is entered. The first pair of values will be x[0] and y[0], where 0 is the index. Suppose, as an example, x[0]= 2.5 and y[0]= -2.5. These values are entered on the same panel of functions as follows: click on button x, on button [, on button 0, on button ], on button 2, on the point button, on button 5 and on button E. Then click on button y, on button [, on button 0, on button ], on button -, and follow the same sequence used to enter the number 2.5 above. Check that pop = 1 and that the first pair of values appears on the blue page.
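The sequence of clicks described above can be generated by a short Python sketch. The function click_sequence is hypothetical and only mirrors the example in the text; entering an index of two digits as two separate digit clicks is an assumption.

```python
def click_sequence(axis, index, value):
    """List the buttons to click on panel Ci to enter axis[index] = value.

    Assumptions: a two-digit index is typed one digit at a time, and the
    value is typed with at most 4 decimal places, without trailing zeros.
    """
    if axis not in ("x", "y"):
        raise ValueError("axis must be 'x' or 'y'")
    if not 0 <= index <= 49:
        raise ValueError("the index must be between 0 and 49")
    digits = ("%.4f" % abs(value)).rstrip("0").rstrip(".")
    clicks = [axis, "["] + list(str(index)) + ["]"]
    if value < 0:
        clicks.append("-")          # the minus button comes before the digits
    clicks += list(digits)          # "." stands for the point button
    clicks.append("E")              # button E ends the entry
    return clicks

print(click_sequence("x", 0, 2.5))
print(click_sequence("y", 0, -2.5))
```

The two printed lists correspond to the two entries of the example, x[0]= 2.5 and y[0]= -2.5.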
Limitations: the largest value for x or y that can be entered is 9999.9999 and the smallest positive value is 0.0001.
A new pair of values always has an index equal to 1 plus the index of the previous new pair of values acquired, and the number pop increases by 1 once the pair is entered.
A pair of values is called a substitution if its index is equal to the index of any pair of values previously acquired; in this case the number pop stays constant.
The largest index allowed is [49] and the smallest index allowed is [0].
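
The rules above for new pairs, substitutions and limits can be sketched in Python as follows. The names store_pair and check_value are hypothetical, and applying the limit 9999.9999 to the magnitude of negative values is an assumption, since the page states the limits for positive values only.

```python
MAX_PAIRS = 50          # indices [0] to [49]
MAX_VALUE = 9999.9999   # largest value that can be entered
MIN_POSITIVE = 0.0001   # smallest positive value that can be entered

def check_value(v):
    """Reject values outside the stated limits (sign handling is assumed)."""
    if abs(v) > MAX_VALUE:
        raise ValueError("value too large")
    if v != 0 and abs(v) < MIN_POSITIVE:
        raise ValueError("value too small")

def store_pair(pairs, n, x, y):
    """Store the pair (x, y) at index n in the list pairs and return pop."""
    check_value(x)
    check_value(y)
    if not 0 <= n < MAX_PAIRS:
        raise IndexError("the index must be between 0 and 49")
    if n < len(pairs):
        pairs[n] = (x, y)        # substitution: pop stays constant
    elif n == len(pairs):
        pairs.append((x, y))     # new pair: pop increases by 1
    else:
        raise IndexError("a new pair must use the next free index")
    return len(pairs)
```

Starting from an empty list, store_pair(pairs, 0, 2.5, -2.5) returns pop = 1; storing again at index 0 replaces the pair and leaves pop unchanged.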

Exercise

Analyse the study described in the section Beginning and answer the following questions:
1) How many students took the test?
2) Suggest an explanation for the high value of constant C1.

Reference

J. C. Miller and J. N. Miller, Statistics for Analytical Chemistry, John Wiley & Sons, 1988.
