Saturday, March 17, 2012

MODULE 7: EDUCATIONAL STATISTICS

Reflection about Module 7
            Lesson 1: Descriptive vs. Inferential Statistics
What is Statistics?
            Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this work, including the planning of data collection through the design of surveys and experiments. It covers scientific methods of collecting, organizing, summarizing, presenting, and analyzing data, as well as drawing valid conclusions and making reasonable decisions on the basis of that analysis. It may also be considered a method for organizing and making sense of a large amount of material.
What is Descriptive Statistics?
            Descriptive statistics comprises the kinds of analysis we use when we want to describe the population we are studying and when that population is small enough to permit including every case. It comprises those methods concerned with collecting and describing a set of data so as to yield meaningful information. It provides information only about the collected data and in no way draws inferences or conclusions concerning a larger set of data. The tables, charts, graphs, and related computations that appear in newspapers and magazines usually fall under this method.
Types of descriptive statistics:
      A quantitative index that describes performance of a sample or samples
      A quantitative index describing the performance of a population
      Measures of central tendency are used to determine the typical or average value among a group of values
      Measures of variability indicate how spread out the values are

What is Inferential Statistics?
Inferential statistics comprises those methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data. It consists of methods used to infer characteristics of a population from observations on a sample, or to formulate general laws on the basis of repeated observations. Considered the central function of modern statistics, statistical inference is concerned with two types of problems: estimation of population parameters and tests of hypotheses. It extends conclusions to a broader population, such as all classes of a given kind, all workers, or all women.
Levels of measurement that guide the choice of an inferential test:
·         NOMINAL
·         ORDINAL
·         INTERVAL/ RATIO

Descriptive Statistics vs. Inferential Statistics
            Both descriptive and inferential statistics rely on the same set of data. Descriptive statistics rely solely on this set of data, whilst inferential statistics use it to make generalizations about a larger population.
            Descriptive statistics summarize large volumes of data clearly, and there is no uncertainty about the values you get, because they describe only what was actually measured. Their limitation is that they only allow you to make summations about the people or objects that you have actually measured. You cannot use the data you have collected to generalize to other people or objects.
            There are two main limitations to the use of inferential statistics. The first, and most important, which is present in all inferential statistics, is that you are providing data about a population that you have not fully measured and therefore cannot ever be completely sure that the values you calculate are correct. Inferential statistics are based on using the values measured in a sample to estimate the values that would be measured in a population, so there will always be a degree of uncertainty in doing this. The second limitation is connected with the first. Some, but not all, inferential tests require the user to make educated guesses in order to run the test. Again, there will be some uncertainty in this process, which affects the certainty of the results of some inferential statistics.
Lesson 2: Measures of Central Tendency
            A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set. Measures of central tendency are also classed as summary statistics. The mean is most likely the measure you are most familiar with, but there are others, such as the median and the mode. All three are valid measures of central tendency.
Mean
            The mean (average) is the most popular and well-known measure of central tendency. The mean is the sum of the item values divided by the number of items (Jose-Dilao, 2003). So, if we have n values in a data set and they have values $x_1, x_2, \ldots, x_n$, then the sample mean, usually denoted by $\bar{x}$, is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

For grouped data, the mean is:

$$\bar{x} = \frac{\sum f x}{n}$$

$\Sigma$ (sigma) is a Greek letter which means “the sum of”.
 Where:
            $f$ = frequency of the class interval
            $x$ = midpoint of the class interval
            $n$ = total number of observations ($\sum f$)




Example:
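The numbers of students in six fourth-year sections are 45, 55, 68, 62, 57, and 60. What is the average class size in the fourth year?
Solution:
$$\bar{x} = \frac{\sum x}{n} = \frac{45 + 55 + 68 + 62 + 57 + 60}{6} = \frac{347}{6} = 57.83$$
                The average class size is 57.83, or about 58 students.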
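The arithmetic can be checked with a short Python sketch. The function names `mean` and `grouped_mean` are illustrative choices, not part of the lesson. The grouped example borrows the Math class score distribution from the median example later in this lesson, using class midpoints (an assumption made only to have grouped data to work with).

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def grouped_mean(classes):
    """Mean of grouped data given (midpoint, frequency) pairs."""
    n = sum(freq for _, freq in classes)
    return sum(freq * midpoint for midpoint, freq in classes) / n


# Class sizes of the six fourth-year sections from the example.
class_sizes = [45, 55, 68, 62, 57, 60]
print(round(mean(class_sizes), 2))  # 57.83

# (midpoint, frequency) pairs for the classes 60-64 up to 95-99.
math_scores = [(62, 3), (67, 7), (72, 12), (77, 20),
               (82, 25), (87, 17), (92, 11), (97, 5)]
print(grouped_mean(math_scores))  # 80.85
```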
Median
            The median is the point on the scale of scores below which half of the scores lie and above which the other half lie (Pangan, 1996). It is the value of the middle term when the data are arranged in ascending or descending order.
In computing the median, remember:
1.)  Arrange the data in an array of ascending or descending order.
2.)  Take note of the item in the middle position. If there is an odd number of items, the middle item is the median. If there is an even number of items, the median is the arithmetic mean of the two middle values.
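The two steps above can be sketched in Python as follows. The function name `median` is an illustrative choice. The sample data are the six class sizes from the mean example (an even count) and the nine Ed103 quiz scores used later under Quartile Deviation (an odd count).

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    ordered = sorted(values)          # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle item
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle items


class_sizes = [45, 55, 68, 62, 57, 60]
quiz_scores = [15, 12, 33, 25, 30, 23, 17, 19, 16]

print(median(class_sizes))  # 58.5
print(median(quiz_scores))  # 19
```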







Median of grouped data
$$\text{Median} = L + \left(\frac{\frac{n}{2} - CF}{f}\right) i$$
Where:
L = the lower limit of the class containing the median
n = the total number of frequencies
f = the frequency of the median class
CF = the cumulative number of frequencies in the classes preceding the class containing the median
i = the width of the class containing the median
Example: Find the median score of students in the Math class of Miss Jose.
Scores      Frequency      Exact Lower Limit (L)      Cumulative Frequency (CF)
95-99       5              94.5                       100
90-94       11             89.5                       95
85-89       17             84.5                       84
80-84       25 *           79.5                       67
75-79       20             74.5                       42
70-74       12             69.5                       22
65-69       7              64.5                       10
60-64       3              59.5                       3
* median class




           
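            The median class is 80-84, because it is the first class whose cumulative frequency (67) reaches n/2 = 50. Substituting L = 79.5, n = 100, CF = 42, f = 25, and i = 5:
$$\text{Median} = 79.5 + \left(\frac{50 - 42}{25}\right)(5) = 79.5 + 1.6 = 81.1$$
            Therefore, the median is 81.1. This means that one half of the students scored above 81.1 and the other half scored below 81.1.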
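As a cross-check, here is a minimal Python sketch of the grouped-median formula, assuming equal class widths and classes listed from lowest to highest. The function name `grouped_median` and the tuple layout are illustrative choices, not part of the lesson.

```python
def grouped_median(classes):
    """Median of grouped data.

    classes: list of (exact_lower_limit, frequency) pairs, lowest class first,
    all with the same width.
    """
    n = sum(freq for _, freq in classes)
    width = classes[1][0] - classes[0][0]
    cumulative = 0  # CF: total frequency of the classes below the current one
    for lower, freq in classes:
        if cumulative + freq >= n / 2:          # this class contains the median
            return lower + (n / 2 - cumulative) / freq * width
        cumulative += freq


# Math class scores from the example, lowest class first.
math_scores = [(59.5, 3), (64.5, 7), (69.5, 12), (74.5, 20),
               (79.5, 25), (84.5, 17), (89.5, 11), (94.5, 5)]
print(round(grouped_median(math_scores), 1))  # 81.1
```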

Mode
            The mode is the most frequently occurring value in a given set of data. In a distribution, the element that is repeated the greatest number of times is the mode. A data set can have more than one mode.
Example:
1.)  The sizes of 15 families in a barangay chosen at random are as follows: 8, 7, 4, 6, 12, 6, 7, 6, 8, 10, 7, 8, 5, 3, 4.
The values 6, 7, and 8 each occur three times, more often than any other value. Therefore the modes are 6, 7, and 8.
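The count can be verified with Python's standard library. This sketch uses `collections.Counter` to tally the values and `statistics.multimode` (available from Python 3.8) to return every value tied for the highest count.

```python
from collections import Counter
from statistics import multimode

family_sizes = [8, 7, 4, 6, 12, 6, 7, 6, 8, 10, 7, 8, 5, 3, 4]

# How many times each family size occurs.
counts = Counter(family_sizes)
print(counts[6], counts[7], counts[8])  # 3 3 3

# All values that share the highest frequency, sorted for readability.
modes = sorted(multimode(family_sizes))
print(modes)  # [6, 7, 8]
```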

Mode of grouped data
$$\text{Mode} = L_{mo} + \left(\frac{d_1}{d_1 + d_2}\right) i$$
Where:
$L_{mo}$ = exact lower limit of the modal class
$d_1$ = the difference between the frequency of the modal class and the frequency of the class below it
$d_2$ = the difference between the frequency of the modal class and the frequency of the class above it
$i$ = size of the class interval
Example:
Consider the distribution of the weekly wages of the factory workers in Matina Garments Factory. Find the modal weekly wage of the workers in the factory.
Weekly wages (P)      No. of workers
1,380-1,399           4
1,360-1,379           6
1,340-1,359           12
1,320-1,339           31 (modal class)
1,300-1,319           24
1,280-1,299           15
1,260-1,279           11
1,240-1,259           8

With $L_{mo}$ = 1,319.5, $d_1$ = 31 - 24 = 7, $d_2$ = 31 - 12 = 19, and $i$ = 20:
$$\text{Mode} = 1319.5 + \left(\frac{7}{7 + 19}\right)(20) = 1319.5 + 5.38 = 1324.88$$
Therefore, the modal weekly wage of the factory workers is approximately P1,324.88.
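The grouped-mode computation can be sketched in Python. This is a minimal version that assumes equal class widths, classes listed from lowest to highest, and a single modal class. The name `grouped_mode` and the variables `d1` and `d2` are illustrative labels for the two frequency differences described above.

```python
def grouped_mode(classes):
    """Mode of grouped data.

    classes: list of (exact_lower_limit, frequency) pairs, lowest class first,
    all with the same width. Assumes one class has the highest frequency.
    """
    width = classes[1][0] - classes[0][0]
    freqs = [freq for _, freq in classes]
    k = freqs.index(max(freqs))                         # position of the modal class
    below = freqs[k - 1] if k > 0 else 0                # frequency of the class below
    above = freqs[k + 1] if k < len(freqs) - 1 else 0   # frequency of the class above
    d1 = freqs[k] - below
    d2 = freqs[k] - above
    return classes[k][0] + d1 / (d1 + d2) * width


# Weekly wages at Matina Garments Factory, lowest class first.
wages = [(1239.5, 8), (1259.5, 11), (1279.5, 15), (1299.5, 24),
         (1319.5, 31), (1339.5, 12), (1359.5, 6), (1379.5, 4)]
print(round(grouped_mode(wages), 2))  # 1324.88
```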
Lesson 3: Measures of Variability
MEASURES OF VARIABILITY
            The measure of variability, also called the measure of dispersion or spread, is a descriptive measurement that indicates the amount of variation in a data set. The most common measures of variability are the RANGE, the QUARTILE DEVIATION, and the STANDARD DEVIATION.
Range and its Properties
            The range of a data set is defined as the difference between the highest and lowest values in the data set. The range is very unstable because it depends on only two scores. If one of those scores moves further from the rest of the distribution, the range will increase even though the typical variability among the scores has changed very little. In other words, the range is greatly influenced by the presence of just one unusually large or small value in the sample. This instability has led to the development of two other range measures, neither of which relies on only the lowest and highest scores.
Computation of the Range from Ungrouped Scores
Method of computing the range from ungrouped scores:
A. Data Given:
      48 79 39 51 62 41 35 80 75 63 67 49 55 69 70 45 71 49 58 62 70
B. Procedure:
1. Use the formula AR = H - L if it is the absolute range, or TR = H - L + 1 if it is the total range.
    Where:       H = the highest score
                 L = the lowest score
2. Find the values of the symbols from the given data.
      H = 80
      L = 35
3. Substitute the values in the formula and solve.
      AR = H - L
      AR = 80 - 35
      AR = 45

      TR = H - L + 1
      TR = 80 - 35 + 1
      TR = 45 + 1
      TR = 46


Computation of the Range from Grouped Scores
            Method of computing the range from grouped scores:
A.   Data Given:
Classes      Frequency
46-50        5
41-45        7
36-40        9
31-35        10
26-30        8
21-25        6
16-20        4
11-15        4

B.   Procedure:
1.    Use the formula
AR = EU - EL, if it is the absolute range
TR = EU - EL + 1, if it is the total range
Where: EU = 50.5 = exact upper limit of the highest class interval
       EL = 10.5 = exact lower limit of the lowest class interval
2.    Substitute the values in the formula and solve.
      AR = EU - EL
      AR = 50.5 - 10.5
      AR = 40

      TR = EU - EL + 1
      TR = 50.5 - 10.5 + 1
      TR = 41
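Both range computations, ungrouped and grouped, reduce to the same two subtractions. The sketch below shows this in Python. The function names `absolute_range` and `total_range` are illustrative, and the exact limits 50.5 and 10.5 are taken from the grouped example.

```python
def absolute_range(highest, lowest):
    """AR = H - L (or EU - EL for grouped data)."""
    return highest - lowest


def total_range(highest, lowest):
    """TR = H - L + 1 (or EU - EL + 1 for grouped data)."""
    return highest - lowest + 1


# Ungrouped scores: H and L are simply the largest and smallest scores.
scores = [48, 79, 39, 51, 62, 41, 35, 80, 75, 63, 67,
          49, 55, 69, 70, 45, 71, 49, 58, 62, 70]
print(absolute_range(max(scores), min(scores)))  # 45
print(total_range(max(scores), min(scores)))     # 46

# Grouped scores: use the exact upper limit of the highest class (50.5)
# and the exact lower limit of the lowest class (10.5).
print(absolute_range(50.5, 10.5))  # 40.0
print(total_range(50.5, 10.5))     # 41.0
```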





Quartile Deviation
            A quartile is a quantile that divides the distribution into four equal parts. The quartile deviation is a measure that describes the dispersion in terms of the distance between selected observation points, namely the first and third quartiles. It is the measure of variation typically used when the median is the measure of central tendency.
Computation of Quartile Deviation for Ungrouped Data
A.   Data Given: The scores of 9 students in an Ed103 quiz
15, 12, 33, 25, 30, 23, 17, 19, 16
n = 9

B.   To find the quartile deviation, the quartiles Q1 and Q3 must be computed first.

1.    Arrange the test scores from lowest to highest and rank them.
Rank      Score
1st       12
2nd       15
3rd       16
4th       17
5th       19
6th       23
7th       25
8th       30
9th       33

2.    Compute Q1 and Q3 by using the formula
$Q_k$ = the score at rank $\frac{kn}{4}$, where a fractional rank is rounded up to the next whole rank

a. Solving Q1
      Q1 = score at rank 1(9)/4 = 2.25
      Q1 = 3rd score
      Therefore, Q1 = 16

b. Solving Q3
      Q3 = score at rank 3(9)/4 = 6.75
      Q3 = 7th score
      Therefore, Q3 = 25

C.   Computation of the Quartile Deviation:
Use the formula $QD = \frac{Q_3 - Q_1}{2}$
in which, QD = the quartile deviation
                     Q1 = the first quartile
                     Q3 = the third quartile
and solve.
      QD = (25 - 16) / 2
      QD = 9 / 2
      QD = 4.5
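The procedure can be sketched in Python as below. The rank rule used here (take rank kn/4 and round a fractional rank up) is an assumption chosen because it reproduces the 3rd and 7th scores picked in this example. Other quartile conventions, such as the interpolating ones in `statistics.quantiles`, can give different values. The function names are illustrative.

```python
import math


def quartile(scores, k):
    """k-th quartile as the score at rank ceil(k*n/4) in the sorted data (assumed rule)."""
    ordered = sorted(scores)
    rank = math.ceil(k * len(ordered) / 4)  # 1-based rank, fractional ranks rounded up
    return ordered[rank - 1]


def quartile_deviation(scores):
    """QD = (Q3 - Q1) / 2."""
    return (quartile(scores, 3) - quartile(scores, 1)) / 2


quiz_scores = [15, 12, 33, 25, 30, 23, 17, 19, 16]
print(quartile(quiz_scores, 1))         # 16
print(quartile(quiz_scores, 3))         # 25
print(quartile_deviation(quiz_scores))  # 4.5
```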
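Computation of Quartile Deviation for Grouped Data
A. Data Given: the same frequency distribution of 50 scores used in the grouped standard deviation example
Class Interval      Frequency      Cumulative Frequency
61-63               2              2
64-66               5              7
67-69               12             19
70-72               15             34
73-75               8              42
76-78               5              47
79-81               3              50
n = 50

B. Computation of the quartiles, using the formula
$$Q_k = LB + \left(\frac{\frac{kn}{4} - cf_p}{f_q}\right) C$$
in which, LB = the exact lower limit of the quartile class
                     cfp = the cumulative frequency of the classes preceding the quartile class
                     fq = the frequency of the quartile class
                     C = the class width

            a. Computation of Q1
      n = 50, n/4 = 12.5
      Q1 class = 67-69
      LB = 66.5
      C = 3
      cfp = 7
      fq = 12
      Q1 = 66.5 + ((12.5 - 7) / 12)(3)
      Q1 = 66.5 + 1.375
      Q1 = 67.88

            b. Computation of Q3
      n = 50, 3n/4 = 37.5
      Q3 class = 73-75
      LB = 72.5
      C = 3
      cfp = 34
      fq = 8
      Q3 = 72.5 + ((37.5 - 34) / 8)(3)
      Q3 = 72.5 + 1.3125
      Q3 = 73.81

C. Computation of Quartile Deviation
    a. Use the formula $QD = \frac{Q_3 - Q_1}{2}$
in which, QD = the quartile deviation
                     Q1 = 67.88
                     Q3 = 73.81

    b. Substitute the values of Q1 and Q3 in the formula and solve.
      QD = (73.81 - 67.88) / 2
      QD = 5.93 / 2
      QD = 2.965, or about 2.97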
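A minimal Python sketch of the grouped-quartile computation follows. It assumes the data are the 50-score distribution tabulated in the grouped standard deviation example, which matches the values LB = 72.5, cfp = 34, fq = 8, and C = 3 quoted in this section. The function name `grouped_quartile` and the tuple layout are illustrative.

```python
def grouped_quartile(classes, k):
    """k-th quartile of grouped data.

    classes: list of (exact_lower_limit, frequency) pairs, lowest class first,
    all with the same width.
    """
    n = sum(freq for _, freq in classes)
    width = classes[1][0] - classes[0][0]
    target = k * n / 4                 # kn/4: how many scores lie below the quartile
    cumulative = 0                     # cfp: frequency of the classes already passed
    for lower, freq in classes:
        if cumulative + freq >= target:
            return lower + (target - cumulative) / freq * width
        cumulative += freq


# Classes 61-63 up to 79-81 (assumed data, see the note above).
scores = [(60.5, 2), (63.5, 5), (66.5, 12), (69.5, 15),
          (72.5, 8), (75.5, 5), (78.5, 3)]

q1 = grouped_quartile(scores, 1)
q3 = grouped_quartile(scores, 3)
qd = (q3 - q1) / 2
print(q1, q3)        # 67.875 73.8125
print(round(qd, 2))  # 2.97
```

Carrying the unrounded quartiles through gives QD = 2.96875, which agrees with the hand computation to two decimal places.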
           



Standard Deviation
The standard deviation is a measure of the average deviation, or departure, of the individual scores from the mean. It has proven to be an extremely useful measure of spread, in part because it is mathematically tractable.




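Computation of Standard Deviation for Ungrouped Data
A. Data Given:
23 18 24 17 27 19 15 20
n = 8

Table for the Given Data
x         x²
23        529
18        324
24        576
17        289
27        729
19        361
15        225
20        400
Σx = 163      Σx² = 3433

B. Procedure:
1. Use the formula (this form divides by n):
$$s = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$
2. Substitute the values of the symbols in the formula and solve.
      s = sqrt(3433/8 - (163/8)²)
      s = sqrt(429.125 - 415.141)
      s = sqrt(13.984)
      s = 3.74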
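The result of 3.74 corresponds to dividing by n. Python's `statistics` module exposes both conventions, so a short sketch can show the difference: `pstdev` divides by n and `stdev` divides by n - 1.

```python
from statistics import pstdev, stdev

data = [23, 18, 24, 17, 27, 19, 15, 20]

# The totals used in the hand computation.
sum_x = sum(data)                   # 163
sum_x2 = sum(x * x for x in data)   # 3433

# Divide-by-n form: matches the 3.74 obtained in the worked example.
print(round(pstdev(data), 2))  # 3.74

# Divide-by-(n - 1) form: the usual sample standard deviation.
print(round(stdev(data), 2))   # 4.0
```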






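Standard Deviation for Grouped Data
A.   Data Given
Class Interval   Frequency (f)   Class Mark (X)   $|X - \bar{X}|$   $(X - \bar{X})^2$   $f(X - \bar{X})^2$
61-63            2               62               8.94              79.924              159.848
64-66            5               65               5.94              35.284              176.42
67-69            12              68               2.94              8.644               103.728
70-72            15              71               0.06              0.004               0.06
73-75            8               74               3.06              9.364               74.912
76-78            5               77               6.06              36.724              183.62
79-81            3               80               9.06              82.084              246.252

 n = 50
 Mean: $\bar{X} = \frac{\sum fX}{n} = \frac{3547}{50} = 70.94$
 $\sum f(X - \bar{X})^2 = 944.84$

B.   Procedure
1.    Use the formula:
$$S = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}}$$
2.    Substitute the values in the formula and solve.
      S = sqrt(944.84 / (50 - 1))
      S = sqrt(19.28)
      S = 4.39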
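The grouped computation can be sketched in Python using the class marks and frequencies from the table. The function name `grouped_sample_sd` is illustrative. It divides by n - 1, which is the form that reproduces the 4.39 obtained here.

```python
import math


def grouped_sample_sd(classes):
    """Sample standard deviation of grouped data from (class_mark, frequency) pairs."""
    n = sum(freq for _, freq in classes)
    mean = sum(freq * mark for mark, freq in classes) / n
    # Sum of squared deviations from the mean, weighted by class frequency.
    ss = sum(freq * (mark - mean) ** 2 for mark, freq in classes)
    return math.sqrt(ss / (n - 1))


# Class marks 62 to 80 with their frequencies.
scores = [(62, 2), (65, 5), (68, 12), (71, 15), (74, 8), (77, 5), (80, 3)]
print(round(grouped_sample_sd(scores), 2))  # 4.39
```

Working with unrounded squares gives a sum of 944.82 rather than 944.84, a difference that disappears when the result is rounded to two decimals.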





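Standard Deviation Formulas

Standard Deviation for Ungrouped Data
      Sample Standard Deviation:       $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
      Population Standard Deviation:   $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Standard Deviation for Grouped Data
      Sample Standard Deviation:       $S = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}}$
      Population Standard Deviation:   $\sigma = \sqrt{\frac{\sum f(X - \mu)^2}{N}}$
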

                               Lesson 4: Measures of Correlation
CORRELATION
            Correlation is the relationship between two or more paired factors of two or more sets of test scores (Best & Kahn, 1998). It is the tendency for the corresponding observations in two or more series to vary together from the averages of their respective series, that is, to have similar relative positions (Good, 134).
3 Types of Correlation
Positive Correlation
            Positive correlation is the tendency for the corresponding observations to have similar relative positions in their respective series (Good, 134). It means that high scores in one variable (x) are associated with high scores in another variable (y).


Negative Correlation
            Negative correlation is the tendency for corresponding values to be divergent in position in their respective series (Good, 134). It means that high scores on one variable are associated with low scores on another variable, or vice versa.
Zero Correlation
            Zero correlation is the absence of any systematic tendency for corresponding observations to be either similar or dissimilar in their relative positions in their respective series (Good, 134). There is no definite relationship between the two sets of measures.
            A correlation coefficient can range from +1.00 through zero to -1.00. The sign of the coefficient indicates the direction of the relationship, and its numerical value indicates the strength.
            An obtained correlation coefficient can be interpreted with the use of a scale like the one presented below (Best & Kahn, 1998).

                     Correlation Coefficient          Degree of Relationship
                     .00 - .20                        Negligible
                     .21 - .40                        Low
                     .41 - .60                        Moderate
                     .61 - .80                        Substantial
                     .81 - 1.00                       High to Very High

Pearson’s Product-Moment Correlation
             
            This measure of relationship is used when the factors to be correlated are both metric data. Metric data are measurements that can be subjected to the four fundamental operations. To compute the correlation coefficient using this test statistic, follow these steps:
1.    Compute the sum of each set of scores (ΣX, ΣY).
2.    Square each score and sum the squares (ΣX², ΣY²).
3.    Count the number of scores in each group (N).
4.    Multiply each X score by its corresponding Y score.
5.    Sum the cross products of X and Y (ΣXY).
6.    Calculate the correlation, following the formula:

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$

Where:   N     = number of paired observations
               ΣXY   = sum of the cross products of X and Y
               ΣX    = sum of the scores under variable X
               ΣY    = sum of the scores under variable Y
               (ΣX)² = square of the sum of the X scores
               (ΣY)² = square of the sum of the Y scores
               ΣX²   = sum of the squared X scores
               ΣY²   = sum of the squared Y scores
Let us illustrate how Pearson’s r is computed. This table shows the computational procedures in determining the degree of relationship between test scores of 10 students in English (X) and Mathematics (Y).




Computation of Correlation Coefficient using Pearson’s r
X          Y          X²            Y²            XY
90         80         8100          6400          7200
85         72         7225          5184          6120
80         70         6400          4900          5600
75         65         5625          4225          4875
70         68         4900          4624          4760
65         55         4225          3025          3575
60         60         3600          3600          3600
55         50         3025          2500          2750
50         53         2500          2809          2650
45         44         2025          1936          1980
ΣX=675     ΣY=617     ΣX²=47625     ΣY²=39203     ΣXY=43110

N = 10
            NΣXY = 10(43,110) = 431,100
            (ΣX)(ΣY) = (675)(617) = 416,475
            NΣXY - (ΣX)(ΣY) = 431,100 - 416,475 = 14,625
            NΣX² = 10(47,625) = 476,250
            (ΣX)² = (675)(675) = 455,625
            NΣX² - (ΣX)² = 476,250 - 455,625 = 20,625
            NΣY² = 10(39,203) = 392,030
            (ΣY)² = (617)(617) = 380,689
            NΣY² - (ΣY)² = 392,030 - 380,689 = 11,341
            [NΣX² - (ΣX)²][NΣY² - (ΣY)²] = (20,625)(11,341) = 233,908,125
            square root of 233,908,125 = 15,294.05522
            r = 14,625 / 15,294.05522
            r = 0.956 or 0.96
            The computation of Pearson’s r yielded an r of 0.96. This indicates that a very high degree of relationship exists between the test scores in English and Mathematics. A student who scored high in English tended to obtain a high score in Mathematics as well.
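The six computational steps can be sketched in Python using the raw-score formula. The function name `pearson_r` and the variable names are illustrative. The data are the ten English (X) and Mathematics (Y) scores from the table.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation from raw scores."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)              # sum of squared X scores
    sum_y2 = sum(y * y for y in ys)              # sum of squared Y scores
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of cross products
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


english = [90, 85, 80, 75, 70, 65, 60, 55, 50, 45]
mathematics = [80, 72, 70, 65, 68, 55, 60, 50, 53, 44]

r = pearson_r(english, mathematics)
print(round(r, 3))  # 0.956
```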
Spearman Rho
            This measure of relationship is used when test scores are ordinal or rank-ordered. In computing rho, the following steps have to be observed:
1.    Rank the scores in distribution X, giving the highest score a rank of 1.
2.    Repeat the process for the scores in distribution Y.
3.    Obtain the difference between the two sets of ranks (D).
4.    Square each difference and sum the squared differences (ΣD²).
5.    Solve for rho, following the formula:

$$\text{rho} = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

               Where:        rho = rank-order correlation coefficient
                                    D    = difference between paired ranks
                                    ΣD²  = sum of squared differences between paired ranks
                                    N    = number of paired ranks
            The computational procedures for the calculation of rho are reflected in this table.
Computation of Correlation Coefficient using Spearman Rho
X        Y        Rank of X      Rank of Y      D        D²
90       80       1              1              0        0
85       72       2              2              0        0
80       70       3              3              0        0
75       65       4              5              -1       1
70       68       5              4              1        1
65       55       6              7              -1       1
60       60       7              6              1        1
55       50       8              9              -1       1
50       53       9              8              1        1
45       44       10             10             0        0
                                                         ΣD² = 6

                             N = 10
$$\text{rho} = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(6)}{10(10^2 - 1)} = 1 - \frac{36}{10(99)}$$

$$= 1 - \frac{36}{990} = 1 - 0.0364 = 0.964 \text{ or } 0.96$$
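The ranking and the rho formula can be sketched in Python. This minimal version assumes there are no tied scores, which holds for the data in the table. Tied scores would need averaged ranks, which this sketch does not handle. The function names are illustrative.

```python
def ranks_descending(scores):
    """Rank each score with 1 for the highest. Assumes no tied scores."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(score) + 1 for score in scores]


def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), with D the difference in paired ranks."""
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2
                 for rx, ry in zip(ranks_descending(xs), ranks_descending(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


english = [90, 85, 80, 75, 70, 65, 60, 55, 50, 45]
mathematics = [80, 72, 70, 65, 68, 55, 60, 50, 53, 44]

print(ranks_descending(mathematics))  # [1, 2, 3, 5, 4, 7, 6, 9, 8, 10]
print(round(spearman_rho(english, mathematics), 3))  # 0.964
```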












