The MathFillers: MODULE 7: EDUCATIONAL STATISTICS


MODULE 7: EDUCATIONAL STATISTICS

Lesson 1: Descriptive vs. Inferential Statistics

What is Statistics?

Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection through the design of surveys and experiments. It covers scientific methods of collecting, organizing, summarizing, presenting, and analyzing data, as well as drawing valid conclusions and making reasonable decisions on the basis of this analysis. It may also be considered a method for analyzing data, that is, for organizing and making sense of a large amount of material.

What is Descriptive Statistics?

Descriptive statistics comprises the kinds of analyses we use when we want to describe the population we are studying, and when that population is small enough for us to include every case. It comprises the methods concerned with collecting and describing a set of data so as to yield meaningful information. It provides information only about the collected data and in no way draws inferences or conclusions about a larger set of data. The tables, charts, graphs, and related computations found in newspapers and magazines usually fall under this method.

Types of descriptive statistics:

• Statistic: a quantitative index that describes the performance of a sample or samples

• Parameter: a quantitative index that describes the performance of a population

• Measures of central tendency are used to determine the typical or average value among a group of values

• Measures of variability indicate how spread out the values are

What is Inferential Statistics?

Inferential statistics comprises the methods concerned with analyzing a subset of data in order to make predictions or inferences about the entire set of data. It consists of methods used to infer characteristics of a population from observations on a sample, or to formulate general laws on the basis of repeated observations. Considered the central function of modern statistics, statistical inference is concerned with two types of problems: estimation of population parameters and testing of hypotheses. It extends conclusions to a broader population, such as all such classes, all workers, or all women.

Types of Inferential Statistics (classified by the level of measurement of the data):

· NOMINAL

· ORDINAL

· INTERVAL/ RATIO

Descriptive Statistics vs. Inferential Statistics

Both descriptive and inferential statistics rely on the same set of data. Descriptive statistics rely solely on this set of data, whilst inferential statistics also use it to make generalizations about a larger population.

Besides bringing clarity to large volumes of data, descriptive statistics involve no uncertainty about the values you get. They are limited, however, in that they only allow you to summarize the people or objects that you have actually measured. You cannot use the data you have collected to generalize to other people or objects.

There are two main limitations to the use of inferential statistics. The first, and most important, limitation, which is present in all inferential statistics, is that you are providing data about a population that you have not fully measured; therefore, you can never be completely sure that the values or statistics you calculate are correct. Remember, inferential statistics use the values measured in a sample to estimate or infer the values that would be measured in a population, and there will always be a degree of uncertainty in doing this. The second limitation is connected with the first. Some, but not all, inferential tests require the user to make educated guesses, based on theory, in order to run them. Again, there will be some uncertainty in this process, which affects the certainty of the results of some inferential statistics.

Lesson 2: Measures of Central Tendency

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set. Measures of central tendency are also classed as summary statistics. The mean is probably the measure of central tendency you are most familiar with, but there are others, such as the median and the mode. The mean, median, and mode are all valid measures of central tendency.

Mean

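The mean (average) is the most popular and well-known measure of central tendency. The mean is the sum of the item values divided by the number of items (Jose-Dilao, 2003). So, if we have n values in a data set, $x_1, x_2, \ldots, x_n$, then the sample mean, usually denoted by $\bar{x}$, is

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

∑ (capital sigma) is a Greek letter; in this formula it means “the sum of”.

For grouped data, the mean is

$$\bar{x} = \frac{\sum f X_m}{n}$$

Where:

f = frequency of the class interval

$X_m$ = midpoint of the class interval

n = Σf, the total number of frequencies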

Example:

The number of students in six fourth year sections are 45,55,68,62,57,60. What is the average class size in the fourth year?

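Solution:

$$\bar{x} = \frac{45 + 55 + 68 + 62 + 57 + 60}{6} = \frac{347}{6} \approx 57.83$$

The average class size is about 58 students.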
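A minimal Python sketch of the two mean formulas, assuming the data arrive as plain lists of numbers. The function names `mean` and `grouped_mean` are illustrative, not from the module. The grouped version computes Σf·X_m / n, here applied to the distribution of 50 scores (class marks 62 to 80) that appears later in the standard deviation section.

```python
def mean(values):
    """Sum of the item values divided by the number of items."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


def grouped_mean(frequencies, midpoints):
    """Sum of f * X_m over all classes, divided by n = sum of f."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("grouped_mean() needs a nonzero total frequency")
    return sum(f * x for f, x in zip(frequencies, midpoints)) / n


class_sizes = [45, 55, 68, 62, 57, 60]  # the six fourth-year sections
average = mean(class_sizes)
print(f"sum = {sum(class_sizes)}, mean = {average:.2f}")  # sum = 347, mean = 57.83

# Frequencies and class marks of the 50-score distribution (61-63 up to 79-81)
score_mean = grouped_mean([2, 5, 12, 15, 8, 5, 3], [62, 65, 68, 71, 74, 77, 80])
print(f"grouped mean = {score_mean:.2f}")  # grouped mean = 70.94
```

Python's standard `statistics.mean` gives the same result for the ungrouped case.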

Median

The median is the point on the scale of scores below which half of the scores lie and above which the other half lie (Pangan, 1996). The median, often represented by $\tilde{x}$, is the value of the middle term when the data are arranged in ascending or descending order.

In computing the median, remember:

1.) Arrange the data in ascending or descending order.

2.) Locate the item(s) in the middle position. If the number of items is odd, the middle item is the median. If the number of items is even, the median is the arithmetic mean of the two middle values.
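The two steps translate directly into code. This is a sketch using a hypothetical `median` helper; Python's standard `statistics.median` follows the same odd/even rule. The sample inputs reuse the class sizes and family sizes from this lesson's examples.

```python
def median(values):
    """Median of ungrouped data, following the two steps above."""
    if not values:
        raise ValueError("median() needs at least one value")
    ordered = sorted(values)  # step 1: arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:  # odd number of items: the middle item
        return ordered[middle]
    # even number of items: mean of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([45, 55, 68, 62, 57, 60]))  # even count: (57 + 60) / 2 = 58.5
print(median([8, 7, 4, 6, 12, 6, 7, 6, 8, 10, 7, 8, 5, 3, 4]))  # odd count: 7
```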

Median of grouped data

$$\text{Median} = L + \left(\frac{\frac{n}{2} - CF}{f}\right) i$$

Where:

L = the lower limit of the class containing the median
n = the total number of frequencies
f = the frequency of the median class
CF = the cumulative number of frequencies in the classes preceding the class containing the median
i = the width of the class containing the median

Example: Find the median score of the students in Miss Jose's Math class.

Scores	Frequency	Exact lower Limit (L)	Cumulative Frequency (CF)
95-99	5	94.5	100
90-94	11	89.5	95
85-89	17	84.5	84
80-84	25 *	79.5	67
75-79	20	74.5	42
70-74	12	69.5	22
65-69	7	64.5	10
60-64	3	59.5	3

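Here n = 100, so n/2 = 50. Counting up from the lowest class, the cumulative frequency first reaches 50 in the 80-84 class (marked *), so that is the median class, with L = 79.5, f = 25, i = 5, and CF = 42 (the cumulative frequency of the 75-79 class just below it).

$$\text{Median} = 79.5 + \left(\frac{50 - 42}{25}\right) 5 = 79.5 + 1.6 = 81.1$$

Therefore, the median is 81.1. This means that half of the students scored above 81.1 and the other half scored below 81.1.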
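The grouped-data median, L + ((n/2 − CF) / f) · i, can be sketched as a short loop that accumulates frequencies until it reaches n/2. This is an illustration under stated assumptions: classes are given as hypothetical (exact lower limit, frequency) pairs ordered from the lowest class up, and all classes share the same width i.

```python
def grouped_median(classes, width):
    """Median of grouped data: L + ((n/2 - CF) / f) * i.

    classes: (exact lower limit, frequency) pairs, ordered from the lowest class up.
    width:   the class interval size i (assumed equal for all classes).
    """
    n = sum(f for _, f in classes)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cf = 0  # cumulative frequency of the classes below the current one
    for lower, f in classes:
        if f > 0 and cf + f >= half:  # first class whose cumulative frequency reaches n/2
            return lower + ((half - cf) / f) * width
        cf += f
    raise ValueError("median class not found")


# Miss Jose's Math class, from 60-64 (L = 59.5) up to 95-99 (L = 94.5)
scores = [(59.5, 3), (64.5, 7), (69.5, 12), (74.5, 20),
          (79.5, 25), (84.5, 17), (89.5, 11), (94.5, 5)]
median_score = grouped_median(scores, width=5)
print(round(median_score, 2))  # 81.1
```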

Mode

The mode is the most frequently occurring value in a given set of data. In a distribution, the mode is the element repeated the greatest number of times.

Example:

1.) The sizes of 15 families in a barangay chosen at random are as follows: 8,7,4,6,12,6,7,6,8,10,7,8,5,3,4.

Since 6, 7, and 8 each occur three times, more often than any other size, the modes are 6, 7, and 8.
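Because a data set can have more than one mode, a sketch that returns every most-frequent value is more faithful than one returning a single number. The helper name `modes` is hypothetical; it counts values with `collections.Counter`. Python's `statistics.multimode` (3.8+) does the same job but lists the modes in order of first appearance rather than sorted.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs the greatest number of times, in ascending order."""
    if not values:
        raise ValueError("modes() needs at least one value")
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


family_sizes = [8, 7, 4, 6, 12, 6, 7, 6, 8, 10, 7, 8, 5, 3, 4]
print(modes(family_sizes))  # [6, 7, 8]
```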

Mode of grouped data

$$\text{Mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) i$$

Where:

L = exact lower limit of the modal class

$d_1$ = the difference between the frequency of the modal class and the frequency of the class below it

$d_2$ = the difference between the frequency of the modal class and the frequency of the class above it

i = size of the class interval

Example:

Consider the distribution of the weekly wages of the factory workers in Matina Garments Factory. Find the modal weekly wage of the workers in the factory.

Weekly wages (P)	No. of workers (f)
1,380-1,399	4
1,360-1,379	6
1,340-1,359	12
1,320-1,339	31 (modal class)
1,300-1,319	24
1,280-1,299	15
1,260-1,279	11
1,240-1,259	8

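The modal class is 1,320-1,339, so L = 1,319.5 and i = 20. The difference from the class below it (1,300-1,319, with 24 workers) is 31 − 24 = 7, and the difference from the class above it (1,340-1,359, with 12 workers) is 31 − 12 = 19.

$$\text{Mode} = 1319.5 + \left(\frac{7}{7 + 19}\right) 20 \approx 1319.5 + 5.38 = 1324.88$$

Therefore, the modal weekly wage of the factory workers is approximately P1,324.88.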
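A minimal sketch of the grouped-data mode, L + (d₁ / (d₁ + d₂)) · i, that also locates the modal class itself. The function name and the (exact lower limit, frequency) input layout are assumptions for illustration, and when the modal class sits at either end of the table the missing neighbor is treated as having frequency 0.

```python
def grouped_mode(classes, width):
    """Mode of grouped data: L + (d1 / (d1 + d2)) * i.

    classes: (exact lower limit, frequency) pairs, ordered from the lowest class up.
    Assumption: a missing neighbor (modal class at either end) counts as frequency 0.
    """
    freqs = [f for _, f in classes]
    k = freqs.index(max(freqs))  # index of the modal class (the first one, if tied)
    f_below = freqs[k - 1] if k > 0 else 0
    f_above = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1 = freqs[k] - f_below
    d2 = freqs[k] - f_above
    if d1 + d2 == 0:
        raise ValueError("cannot interpolate: d1 + d2 is zero")
    return classes[k][0] + (d1 / (d1 + d2)) * width


# Weekly wages, from 1,240-1,259 (L = 1,239.5) up to 1,380-1,399 (L = 1,379.5)
wages = [(1239.5, 8), (1259.5, 11), (1279.5, 15), (1299.5, 24),
         (1319.5, 31), (1339.5, 12), (1359.5, 6), (1379.5, 4)]
modal_wage = grouped_mode(wages, width=20)
print(f"P{modal_wage:,.2f}")  # P1,324.88
```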

Lesson 3: Measures of Variability

MEASURES OF VARIABILITY

A measure of variability, also called a measure of dispersion or spread, is a descriptive measure that indicates the amount of variation in a data set. The most common measures of variability are the RANGE, the QUARTILE DEVIATION, and the STANDARD DEVIATION.

Range and its Properties

The range of a data set is the difference between the highest and lowest values in the data set. The range is very unstable because it depends on only two scores. If one of those scores moves further from the rest of the distribution, the range increases even though the typical variability among the scores has changed very little. For the same reason, the range is greatly influenced by the presence of just one unusually large or small value in the sample. This instability has led to the development of two other range measures, neither of which relies only on the lowest and highest scores.

Computation of the Range from Ungrouped Scores

Method of computing the Range from Ungrouped Scores:

A. Data Given:

48 79 39 51 62 41 35 80 75 63 67 49 55 69 70 45 71 49 58 62 70

B. Procedure:

1. Use the formula AR = H - L if it is the absolute range, or TR = H - L + 1 if it is the total range,

Where: H = the highest score

L = the lowest score

2. Find the values of the symbols from the given data: H = 80 and L = 35.

3. Substitute the values in the formula and solve:

AR = 80 - 35 = 45

TR = 80 - 35 + 1 = 46
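Both range formulas for ungrouped scores fit in two one-line functions. This is a sketch with hypothetical names `absolute_range` and `total_range`, assuming whole-number scores so that the +1 in TR counts the number of score values from L to H inclusive.

```python
def absolute_range(scores):
    """AR = H - L: highest score minus lowest score."""
    return max(scores) - min(scores)


def total_range(scores):
    """TR = H - L + 1: the number of whole-number score values from L to H inclusive."""
    return absolute_range(scores) + 1


data = [48, 79, 39, 51, 62, 41, 35, 80, 75, 63, 67,
        49, 55, 69, 70, 45, 71, 49, 58, 62, 70]
print(absolute_range(data), total_range(data))  # 45 46
```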

Computation of the Range from Grouped Scores

Method of computing the Range from Grouped Scores:

A. Data Given:

Classes	Frequency
46-50	5
41-45	7
36-40	9
31-35	10
26-30	8
21-25	6
16-20	4
11-15	4

B. Procedure:

1. Use the formula AR = EU - EL if it is the absolute range, or TR = EU - EL + 1 if it is the total range,

Where: EU = exact upper limit of the highest class interval = 50.5

EL = exact lower limit of the lowest class interval = 10.5

2. Substitute the values in the formula and solve:

AR = 50.5 - 10.5 = 40

TR = 50.5 - 10.5 + 1 = 41
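For grouped scores the only extra step is turning stated class limits into exact limits. The sketch below assumes whole-number class limits, so each exact limit lies 0.5 beyond the stated one; the function name `grouped_ranges` is hypothetical.

```python
def grouped_ranges(class_limits):
    """Return (AR, TR) for grouped data with whole-number class limits.

    Assumption: exact limits are the stated limits -/+ 0.5, so
    EL = lowest lower limit - 0.5 and EU = highest upper limit + 0.5.
    """
    exact_lower = min(lower for lower, _ in class_limits) - 0.5
    exact_upper = max(upper for _, upper in class_limits) + 0.5
    ar = exact_upper - exact_lower
    return ar, ar + 1


classes = [(46, 50), (41, 45), (36, 40), (31, 35),
           (26, 30), (21, 25), (16, 20), (11, 15)]
print(grouped_ranges(classes))  # (40.0, 41.0)
```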

Quartile Deviation

A quartile is a quantile that divides the distribution into four equal parts. The quartile deviation (QD), also called the semi-interquartile range, describes dispersion in terms of the distance between selected observation points, namely the first quartile (Q₁) and the third quartile (Q₃). It is the measure of variation usually reported when the median is used as the measure of central tendency.

Computation of Quartile Deviation for Ungrouped Data

A. Data Given: The scores of 9 students in an Ed103 quiz

15, 12, 33, 25, 30, 23, 17, 19, 16

B. To find the quartile deviation, the quartiles Q₁ and Q₃ must be computed first.

1. Arrange the test scores from lowest to highest and rank them.

Rank	Score
1st	12
2nd	15
3rd	16
4th	17
5th	19
6th	23
7th	25
8th	30
9th	33
n = 9

2. Compute Q₁ and Q₃. The kth quartile is the score at rank

$$\frac{kn}{4}$$

rounded up to the next whole rank when kn/4 is not a whole number.

a. Solving Q₁: 1(9)/4 = 2.25, so Q₁ is the 3rd score. Therefore, Q₁ = 16.

b. Solving Q₃: 3(9)/4 = 6.75, so Q₃ is the 7th score. Therefore, Q₃ = 25.

C. Computation of Quartile Deviation:

Use the formula

$$QD = \frac{Q_3 - Q_1}{2}$$

in which QD = the quartile deviation, Q₁ = the first quartile, and Q₃ = the third quartile, and solve:

$$QD = \frac{25 - 16}{2} = 4.5$$
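Quartile conventions differ between textbooks and software, so this sketch picks one explicitly: Q_k is the score at rank kn/4, rounded up to the next whole rank. That rule is an assumption chosen because it yields the 3rd and 7th scores for n = 9; the helper names are hypothetical. Python's `statistics.quantiles(data, n=4)` uses an interpolating (n + 1)-based rule by default, so it will not necessarily agree with this one.

```python
import math


def quartile_by_rank(values, k):
    """Q_k taken as the score at rank k*n/4, rounded up to the next whole rank.

    This is one of several quartile conventions (assumed here).
    """
    if not values:
        raise ValueError("quartile_by_rank() needs at least one value")
    ordered = sorted(values)
    rank = math.ceil(k * len(ordered) / 4)  # 1-based rank
    return ordered[rank - 1]


def quartile_deviation(values):
    """QD = (Q3 - Q1) / 2."""
    return (quartile_by_rank(values, 3) - quartile_by_rank(values, 1)) / 2


quiz = [15, 12, 33, 25, 30, 23, 17, 19, 16]
q1, q3 = quartile_by_rank(quiz, 1), quartile_by_rank(quiz, 3)
print(q1, q3, quartile_deviation(quiz))  # 16 25 4.5
```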

Computation of Quartile Deviation for Grouped Data

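A. Data Given: the distribution of 50 scores below (the same distribution used for the standard deviation of grouped data)

Class Interval	Frequency (f)	Cumulative Frequency (cf)
79-81	3	50
76-78	5	47
73-75	8	42
70-72	15	34
67-69	12	19
64-66	5	7
61-63	2	2

B. Use the formula

$$Q_k = L_B + \left(\frac{\frac{kn}{4} - cf_p}{f_q}\right) C$$

in which $L_B$ = the exact lower limit of the quartile class, $cf_p$ = the cumulative frequency of the classes preceding the quartile class, $f_q$ = the frequency of the quartile class, and C = the size of the class interval.

a. Computation of Q₁

n/4 = 50/4 = 12.5, so the Q₁ class is 67-69, with $L_B$ = 66.5, C = 3, $cf_p$ = 7, and $f_q$ = 12.

$$Q_1 = 66.5 + \left(\frac{12.5 - 7}{12}\right) 3 = 66.5 + 1.375 = 67.88$$

b. Computation of Q₃

3n/4 = 3(50)/4 = 37.5, so the Q₃ class is 73-75, with $L_B$ = 72.5, C = 3, $cf_p$ = 34, and $f_q$ = 8.

$$Q_3 = 72.5 + \left(\frac{37.5 - 34}{8}\right) 3 = 72.5 + 1.3125 = 73.81$$

C. Computation of Quartile Deviation

Substitute Q₁ = 67.88 and Q₃ = 73.81 in the formula QD = (Q₃ − Q₁)/2 and solve:

$$QD = \frac{73.81 - 67.88}{2} \approx 2.97$$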
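The grouped-data quartile formula, Q_k = L_B + ((kn/4 − cf_p) / f_q) · C, follows the same cumulative-frequency search as the grouped median. A sketch under stated assumptions: classes are hypothetical (exact lower limit, frequency) pairs ordered from the lowest class up, with a common width C.

```python
def grouped_quartile(classes, width, k):
    """Q_k = L_B + ((k*n/4 - cf_p) / f_q) * C for grouped data.

    classes: (exact lower limit, frequency) pairs, ordered from the lowest class up.
    """
    n = sum(f for _, f in classes)
    if n == 0:
        raise ValueError("empty distribution")
    target = k * n / 4
    cf_p = 0  # cumulative frequency of the classes preceding the current one
    for lower, f_q in classes:
        if f_q > 0 and cf_p + f_q >= target:
            return lower + ((target - cf_p) / f_q) * width
        cf_p += f_q
    raise ValueError("quartile class not found")


# 50 scores grouped as 61-63 (L_B = 60.5) up to 79-81 (L_B = 78.5), C = 3
scores = [(60.5, 2), (63.5, 5), (66.5, 12), (69.5, 15),
          (72.5, 8), (75.5, 5), (78.5, 3)]
q1 = grouped_quartile(scores, 3, k=1)
q3 = grouped_quartile(scores, 3, k=3)
qd = (q3 - q1) / 2
print(f"Q1 = {q1:.4f}, Q3 = {q3:.4f}, QD = {qd:.4f}")
```

Unrounded, the sketch gives Q₁ = 67.875 and Q₃ = 73.8125, so QD = 2.96875, which rounds to the same 2.97.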

Standard Deviation

Standard deviation is a measure of the average deviation or departure of the individual scores from the mean. The standard deviation has proven to be an extremely useful measure of spread, in part because it is mathematically tractable.

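Computation of Standard Deviation for Ungrouped Data

A. Data Given:

23 18 24 17 27 19 15 20

n = 8

Table for the Given Data

x	x²
23	529
18	324
24	576
17	289
27	729
19	361
15	225
20	400
Σx = 163	Σx² = 3433

B. Procedure:

1. Use the formula:

$$s = \sqrt{\frac{n\sum x^2 - (\sum x)^2}{n(n-1)}}$$

2. Substitute the values of the symbols in the formula and solve:

$$s = \sqrt{\frac{8(3433) - (163)^2}{8(7)}} = \sqrt{\frac{27464 - 26569}{56}} = \sqrt{\frac{895}{56}} \approx 4.00$$

If the eight scores are treated as the whole population (dividing by $n^2$ instead of $n(n-1)$), the result is $\sigma = \sqrt{895/64} \approx 3.74$.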
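The raw-score computation can be checked in a few lines. This sketch computes the sample standard deviation (dividing by n − 1) and the population standard deviation (dividing by n) from Σx and Σx², then compares them with Python's standard `statistics.stdev` and `statistics.pstdev`. The two results differ (about 4.00 versus 3.74 for these eight scores), which is why it matters which formula a worked example uses.

```python
import math
import statistics

scores = [23, 18, 24, 17, 27, 19, 15, 20]
n = len(scores)
sum_x = sum(scores)                  # Σx = 163
sum_x2 = sum(x * x for x in scores)  # Σx² = 3433
numerator = n * sum_x2 - sum_x ** 2  # 8(3433) - 163² = 895

s = math.sqrt(numerator / (n * (n - 1)))  # sample SD, divides by n - 1
sigma = math.sqrt(numerator / n ** 2)     # population SD, divides by n

# The standard library should agree with both hand formulas.
print(f"s = {s:.2f} (stdev {statistics.stdev(scores):.2f})")
print(f"sigma = {sigma:.2f} (pstdev {statistics.pstdev(scores):.2f})")
```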

Standard Deviation for Grouped Data

A. Data Given

Class Interval	Frequency (f)	Class mark (X)	X − x̄	(X − x̄)²	f(X − x̄)²
61-63	2	62	-8.94	79.924	159.848
64-66	5	65	-5.94	35.284	176.420
67-69	12	68	-2.94	8.644	103.728
70-72	15	71	0.06	0.004	0.060
73-75	8	74	3.06	9.364	74.912
76-78	5	77	6.06	36.724	183.620
79-81	3	80	9.06	82.084	246.252
Total	n = 50				Σf(X − x̄)² = 944.84

The mean of the distribution is x̄ = ΣfX / n = 3547 / 50 = 70.94.

B. Procedure

Substitute the values in the formula and solve:

$$s = \sqrt{\frac{\sum f(X - \bar{x})^2}{n - 1}} = \sqrt{\frac{944.84}{49}} \approx 4.39$$
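A sketch of the grouped-data computation, assuming each class is represented by its class mark and dividing by n − 1 as the worked example does. Computed without intermediate rounding, Σf(X − x̄)² comes out as 944.82 rather than the table's 944.84, because the table rounds each squared deviation to three decimals; either way s rounds to 4.39.

```python
import math

class_marks = [62, 65, 68, 71, 74, 77, 80]  # midpoints of 61-63 ... 79-81
frequencies = [2, 5, 12, 15, 8, 5, 3]

n = sum(frequencies)                                             # 50
mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n  # 3547 / 50
ss = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, class_marks))  # Σf(X - x̄)²
s = math.sqrt(ss / (n - 1))  # sample SD, dividing by n - 1 as in the example

print(f"n = {n}, mean = {mean:.2f}, sum of squares = {ss:.2f}, s = {s:.2f}")
```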

Standard Deviation Formulas

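	Sample Standard Deviation	Population Standard Deviation
Standard Deviation for Ungrouped Data	$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$	$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Standard Deviation for Grouped Data	$s = \sqrt{\frac{\sum f(X - \bar{x})^2}{n - 1}}$	$\sigma = \sqrt{\frac{\sum f(X - \mu)^2}{N}}$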

Lesson 4: Measures of Correlation

CORRELATION

Correlation is the relationship between two or more paired factors of two or more sets of test scores (Best & Kahn, 1998). It is the tendency for the corresponding observations in two or more series to vary together from the averages of their respective series, that is, to have similar relative positions (Good, 134).

3 Types of Correlation

Positive Correlation

Positive correlation is the tendency for corresponding observations to have similar relative positions in their respective series (Good, 134). It means that high scores on one variable (X) are associated with high scores on another variable (Y).

Negative Correlation

Negative correlation is the tendency for corresponding values to be divergent in position in their respective series (Good, 134). It means that high scores on one variable are associated with low scores on the other variable, or vice versa.

Zero Correlation

Zero correlation is the absence of any systematic tendency for corresponding observations to be either similar or dissimilar in their relative positions in their respective series (Good, 134). There is no definite relationship between the two sets of measures.

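A correlation coefficient can range from -1.00 through zero to +1.00. The sign of the coefficient indicates the direction of the relationship, and its numerical (absolute) value indicates the strength: values closer to -1.00 or +1.00 indicate stronger relationships, and values closer to zero indicate weaker ones.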

An obtained correlation coefficient can be interpreted using a scale like the one below (Best & Kahn, 1998). The scale applies to the absolute value of the coefficient.

Correlation Coefficient	Degree of Relationship
.00 - .20	Negligible
.21 - .40	Low
.41 - .60	Moderate
.61 - .80	Substantial
.81 - 1.00	High to Very High
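The interpretation scale can be encoded as a small lookup so that a computed coefficient is labeled consistently. This is a sketch with a hypothetical `degree_of_relationship` function; it applies the bands to |r|, and because the printed bands leave gaps (for example between .20 and .21), it assumes each band runs up to and including its upper limit, so a value such as 0.205 goes to the next band up.

```python
def degree_of_relationship(r):
    """Label |r| using the Best & Kahn (1998) scale.

    Assumption: each band includes its upper limit, so a value between the
    printed bands (e.g. 0.205) falls into the next band up.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)
    if size <= 0.20:
        return "Negligible"
    if size <= 0.40:
        return "Low"
    if size <= 0.60:
        return "Moderate"
    if size <= 0.80:
        return "Substantial"
    return "High to Very High"


print(degree_of_relationship(0.96))   # High to Very High
print(degree_of_relationship(-0.35))  # Low
```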

Pearson’s Product-Moment Correlation

Pearson's r is used when the factors to be correlated are both metric data, that is, measurements that can be subjected to the four fundamental arithmetic operations. To compute the correlation coefficient using this statistic, follow these steps:

1. Compute the sum of each set of scores (ΣX, ΣY).

2. Square each score and sum the squares (ΣX², ΣY²).

3. Count the number of paired scores (N).

4. Multiply each X score by its corresponding Y score.

5. Sum the cross products of X and Y (ΣXY).

6. Calculate the correlation using the formula:

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$

Where: N = number of paired observations

ΣXY = sum of the products of the paired X and Y scores

ΣX = sum of the scores under variable X

ΣY = sum of the scores under variable Y

(ΣX)² = square of the sum of the X scores

(ΣY)² = square of the sum of the Y scores

ΣX² = sum of the squared X scores

ΣY² = sum of the squared Y scores

Let us illustrate how Pearson’s r is computed. This table shows the computational procedures in determining the degree of relationship between test scores of 10 students in English (X) and Mathematics (Y).

Computation of Correlation Coefficient using Pearson’s r

X	Y	X²	Y²	XY
90	80	8100	6400	7200
85	72	7225	5184	6120
80	70	6400	4900	5600
75	65	5625	4225	4875
70	68	4900	4624	4760
65	55	4225	3025	3575
60	60	3600	3600	3600
55	50	3025	2500	2750
50	53	2500	2809	2650
45	44	2025	1936	1980
ΣX = 675	ΣY = 617	ΣX² = 47,625	ΣY² = 39,203	ΣXY = 43,110

N = 10

NΣXY = 10(43,110) = 431,100

(ΣX)(ΣY) = (675)(617) = 416,475

NΣXY − (ΣX)(ΣY) = 431,100 − 416,475 = 14,625

NΣX² = 10(47,625) = 476,250

(ΣX)² = (675)(675) = 455,625

NΣX² − (ΣX)² = 476,250 − 455,625 = 20,625

NΣY² = 10(39,203) = 392,030

(ΣY)² = (617)(617) = 380,689

NΣY² − (ΣY)² = 392,030 − 380,689 = 11,341

[NΣX² − (ΣX)²][NΣY² − (ΣY)²] = (20,625)(11,341) = 233,908,125

√233,908,125 ≈ 15,294.0552

r = 14,625 / 15,294.0552

r ≈ 0.956, or 0.96

The computation yielded r = 0.96. This indicates that a very high degree of relationship exists between the test scores in English and Mathematics: students who scored high in English also tended to obtain high scores in Mathematics.
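The six steps map onto a short function built from the same sums (N, ΣX, ΣY, ΣX², ΣY², ΣXY) and the raw-score formula r = [NΣXY − (ΣX)(ΣY)] / √([NΣX² − (ΣX)²][NΣY² − (ΣY)²]). The name `pearson_r` is illustrative; on Python 3.10 or later, `statistics.correlation(x, y)` can serve as a cross-check.

```python
import math


def pearson_r(x, y):
    """Pearson's r from the raw-score formula using N, ΣX, ΣY, ΣX², ΣY², and ΣXY."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


english = [90, 85, 80, 75, 70, 65, 60, 55, 50, 45]
mathematics = [80, 72, 70, 65, 68, 55, 60, 50, 53, 44]
r = pearson_r(english, mathematics)
print(f"r = {r:.3f}")  # r = 0.956
```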

Spearman Rho

This measure of relationship is used when the test scores are ordinal or rank-ordered. In computing rho, the following steps have to be observed:

1. Rank the scores in distribution X, giving the highest score a rank of 1.

2. Repeat the process for the scores in distribution Y.

3. Obtain the difference between the two sets of ranks (D).

4. Square each difference (D²), sum the squares (ΣD²), and solve for rho using the formula:

$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

Where: ρ (rho) = rank-order correlation coefficient

D = difference between paired ranks

ΣD² = sum of squared differences between paired ranks

N = number of paired ranks

The computational procedures for the calculation of rho are reflected in this table.

Computation of Correlation Coefficient using Spearman Rho

X	Y	Rank of X	Rank of Y	D	D²
90	80	1	1	0	0
85	72	2	2	0	0
80	70	3	3	0	0
75	65	4	5	-1	1
70	68	5	4	1	1
65	55	6	7	-1	1
60	60	7	6	1	1
55	50	8	9	-1	1
50	53	9	8	1	1
45	44	10	10	0	0
					ΣD² = 6
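A minimal sketch of the rank-order steps, finishing the calculation with ρ = 1 − 6ΣD² / (N(N² − 1)). The helper names are hypothetical. Ranks give 1 to the highest score, and tied scores (none occur here) would share the average of their positions, an assumption beyond the module's steps; with ties the formula is only approximate. For these scores ΣD² = 6 and N = 10, so ρ = 1 − 36/990 ≈ 0.96.

```python
def ranks_highest_first(values):
    """Rank scores with 1 = highest; tied scores share the average of their positions."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1  # first position this score occupies
        ties = ordered.count(v)       # how many scores share that value
        ranks.append(first + (ties - 1) / 2)
    return ranks


def spearman_rho(x, y):
    """rho = 1 - 6ΣD² / (N(N² - 1)); exact when there are no tied ranks."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    rank_x, rank_y = ranks_highest_first(x), ranks_highest_first(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


english = [90, 85, 80, 75, 70, 65, 60, 55, 50, 45]
mathematics = [80, 72, 70, 65, 68, 55, 60, 50, 53, 44]
print(ranks_highest_first(mathematics))
print(f"rho = {spearman_rho(english, mathematics):.2f}")  # rho = 0.96
```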