Reflection about Module 7
MODULE 7: EDUCATIONAL STATISTICS
Lesson 1: Descriptive vs. Inferential Statistics
What is Statistics?
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments. It deals with scientific methods of collecting, organizing, summarizing, presenting, and analyzing data, as well as drawing valid conclusions and making reasonable decisions on the basis of this analysis. It may also be considered a method for analyzing data, that is, for organizing and making sense of a large amount of material.
What is Descriptive Statistics?
Descriptive statistics comprises the kind of analyses we use when we want to describe the population we are studying, and when we have a population that is small enough to permit our including every case. It comprises those methods concerned with collecting and describing a set of data so as to yield meaningful information. It provides information only about the collected data and in no way draws inferences or conclusions concerning a larger set of data. The tables, charts, graphs, and relevant computations found in newspapers and magazines usually fall under this method.
Types of descriptive statistics:
• A quantitative index that describes performance of a sample or samples
• A quantitative index describing the performance of a population
• Measures of central tendency are used to determine the typical or average value among a group of values
• Measures of variability indicate how spread out the values are
What is Inferential Statistics?
Inferential statistics comprises those methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data. It consists of methods that are used to infer characteristics of a population from observations on a sample, or to formulate general laws on the basis of repeated observations. Considered the central function of modern statistics, statistical inference is concerned with two types of problems: estimation of population parameters and tests of hypotheses. It extends conclusions to a broader population, such as all such classes, all workers, or all women.
Levels of measurement to consider when choosing an inferential test:
• NOMINAL
• ORDINAL
• INTERVAL/RATIO
Descriptive Statistics vs. Inferential Statistics
Both descriptive and inferential statistics rely on the same set of data. Descriptive statistics rely solely on this set of data, whilst inferential statistics use it to make generalizations about a larger population. Descriptive statistics summarize large volumes of data clearly, and because every case has been measured, there is no uncertainty about the values you get. They are limited, however, in that they only allow you to summarize the people or objects that you have actually measured. You cannot use the data you have collected to generalize to other people or objects.
There are two main limitations to the use of inferential statistics. The first, and most important, limitation, which is present in all inferential statistics, is that you are providing data about a population that you have not fully measured and, therefore, cannot ever be completely sure that the values/statistics you calculate are correct. Remember, inferential statistics are based on the concept of using the values measured in a sample to estimate/infer the values that would be measured in a population; there will always be a degree of uncertainty in doing this. The second limitation is connected with the first. Some, but not all, inferential tests require the user to make educated guesses (assumptions) in order to run the test. Again, there will be some uncertainty in this process, which will have repercussions on the certainty of the results of some inferential statistics.
Lesson 2: Measures of Central Tendency
A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. Measures of central tendency are also classed as summary statistics. The mean is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode. The mean, median, and mode are all valid measures of central tendency.
Mean
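The mean (average) is the most popular and well-known measure of central tendency. The mean is the sum of item values divided by the number of items (Jose-Dilao, 2003). So, if we have n values in a data set and they have values $x_1, x_2, \ldots, x_n$, then the sample mean, usually denoted by $\bar{x}$, is:
$$\bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
∑ (sigma) is a Greek letter which means “the sum of”.
Where:
$\bar{x}$ = the sample mean
$\sum x$ = the sum of all the values
n = the number of values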
Example:
The number of students in six fourth year sections are 45, 55, 68, 62, 57, 60. What is the average class size in the fourth year?
Solution:
$$\bar{x} = \frac{45 + 55 + 68 + 62 + 57 + 60}{6} = \frac{347}{6}$$
= 57.83, or about 58 students
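As a quick check on the arithmetic, the same computation can be sketched in Python. The variable names are illustrative only; the data are the six class sizes from the example.

```python
from statistics import mean

# Class sizes of the six fourth-year sections from the example
class_sizes = [45, 55, 68, 62, 57, 60]

total = sum(class_sizes)      # sum of the item values
n = len(class_sizes)          # number of items
average = total / n           # the mean: sum divided by count

# statistics.mean applies the same definition
assert abs(average - mean(class_sizes)) < 1e-9
print(f"sum = {total}, n = {n}, mean = {average:.2f}")
```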
Median
The median is the point on the scale of scores below which half of the scores lie and above which the other half of the scores lie (Pangan, 1996). The median, represented by $\tilde{x}$, is the value of the middle term when data are arranged in ascending or descending order.
In computing the median, remember:
1.) Arrange the data in an array in ascending or descending order.
2.) Take note of the items in the middle position. If there is an odd number of items, the middle item is the median. If there is an even number of items, the median is taken as the arithmetic mean of the two values falling in the middle.
Median of grouped data
$$\text{Median} = L + \left(\frac{\frac{n}{2} - CF}{f}\right) i$$
Where:
L = the lower limit of the class containing the median
n = the total number of frequencies
f = the frequency of the median class
CF = the cumulative number of frequencies in the classes preceding the class containing the median
i = the width of the class containing the median
Example: Find the median score of students in Math class of Miss Jose.
| Scores | Frequency | Exact lower Limit (L) | Cumulative Frequency (CF) |
| 95-99 | 5 | 94.5 | 100 |
| 90-94 | 11 | 89.5 | 95 |
| 85-89 | 17 | 84.5 | 84 |
| 80-84 | 25 * | 79.5 | 67 |
| 75-79 | 20 | 74.5 | 42 |
| 70-74 | 12 | 69.5 | 22 |
| 65-69 | 7 | 64.5 | 10 |
| 60-64 | 3 | 59.5 | 3 |
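| Total | n = 100 | | |
Solution: n/2 = 100/2 = 50, so the median class is 80-84 (marked *), the first class whose cumulative frequency (67) reaches 50. Thus L = 79.5, CF = 42, f = 25, and i = 5.
$$\text{Median} = 79.5 + \left(\frac{50 - 42}{25}\right)(5) = 79.5 + 1.6 = 81.1$$
Therefore, the median is 81.1. This means that one half of the students scored above 81.1 and the other half scored below 81.1.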
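The grouped-median procedure can be sketched as a small Python function. This is a minimal sketch: the function name and the (exact lower limit, frequency) input layout are assumptions made for illustration, and it assumes equal class widths with classes listed from lowest to highest.

```python
def grouped_median(classes):
    """Median of grouped data.

    classes: list of (exact_lower_limit, frequency) pairs,
    ordered from the lowest class to the highest, equal widths assumed.
    """
    n = sum(f for _, f in classes)
    width = classes[1][0] - classes[0][0]   # i, the class width
    cf = 0                                  # cumulative frequency below the current class
    for lower, f in classes:
        if cf + f >= n / 2:                 # this class contains the median
            return lower + ((n / 2 - cf) / f) * width
        cf += f

# Scores in Miss Jose's Math class, lowest class first
scores = [(59.5, 3), (64.5, 7), (69.5, 12), (74.5, 20),
          (79.5, 25), (84.5, 17), (89.5, 11), (94.5, 5)]

median = grouped_median(scores)
print(round(median, 2))  # 81.1
```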
Mode
The mode is referred to as the most frequently occurring value in a given set of data. In a distribution, the element which is repeated the most number of times is the mode.
Example:
1.) The sizes of 15 families in a barangay chosen at random are as follows: 8, 7, 4, 6, 12, 6, 7, 6, 8, 10, 7, 8, 5, 3, 4.
The values 6, 7, and 8 each occur three times, more often than any other value. Therefore, the modes are 6, 7, and 8.
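A data set with several modes, like this one, can be checked with Python's standard library. `statistics.multimode` returns every value tied for the highest count; sorting the result is only for readability.

```python
from statistics import multimode

# Sizes of the 15 families from the example
family_sizes = [8, 7, 4, 6, 12, 6, 7, 6, 8, 10, 7, 8, 5, 3, 4]

# multimode returns all values that share the highest frequency
modes = sorted(multimode(family_sizes))
print(modes)  # [6, 7, 8]
```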
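Mode of grouped data
$$\text{Mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) i$$
Where:
L = the exact lower limit of the modal class
$d_1$ = the difference between the frequency of the modal class and the frequency of the next lower class
$d_2$ = the difference between the frequency of the modal class and the frequency of the next higher class
i = the width of the modal class
Example: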
Consider the distribution of the weekly wages of the factory workers in Matina Garments Factory. Find the modal weekly wage of the workers in the factory.
| Weekly wages | No. of workers |
| P 1,380-1,399 | 4 |
| 1,360-1,379 | 6 |
| 1,340-1,359 | 12 |
| 1,320-1,339 | 31 (modal class) |
| 1,300-1,319 | 24 |
| 1,280-1,299 | 15 |
| 1,260-1,279 | 11 |
| 1,240-1,259 | 8 |
Solution: The modal class is 1,320-1,339, so L = 1,319.5, $d_1$ = 31 - 24 = 7, $d_2$ = 31 - 12 = 19, and i = 20.
$$\text{Mode} = 1{,}319.5 + \left(\frac{7}{7 + 19}\right)(20) = 1{,}319.5 + 5.38 = 1{,}324.88$$
Therefore, the modal weekly wage of the factory workers is approximately P1,324.88.
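The modal wage can be reproduced with a short Python sketch. It assumes the common interpolation formula Mode = L + (d1 / (d1 + d2)) × i, which yields the figure quoted in the example. The function name and input layout are illustrative, and equal class widths are assumed.

```python
def grouped_mode(classes):
    """Mode of grouped data by interpolation within the modal class.

    classes: list of (exact_lower_limit, frequency) pairs,
    ordered from the lowest class to the highest, equal widths assumed.
    """
    width = classes[1][0] - classes[0][0]
    freqs = [f for _, f in classes]
    m = freqs.index(max(freqs))                        # position of the modal class
    below = freqs[m - 1] if m > 0 else 0               # frequency of next lower class
    above = freqs[m + 1] if m < len(freqs) - 1 else 0  # frequency of next higher class
    d1 = freqs[m] - below
    d2 = freqs[m] - above
    return classes[m][0] + (d1 / (d1 + d2)) * width

# Weekly wages at Matina Garments Factory, lowest class first
wages = [(1239.5, 8), (1259.5, 11), (1279.5, 15), (1299.5, 24),
         (1319.5, 31), (1339.5, 12), (1359.5, 6), (1379.5, 4)]

mode = grouped_mode(wages)
print(round(mode, 2))  # 1324.88
```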
Lesson 3: Measures of Variability
MEASURES OF VARIABILITY
A measure of variability, also called a measure of dispersion or spread, is a descriptive measurement used to indicate the amount of variation in a data set. The most common measures of variability are the RANGE, QUARTILE DEVIATION, and STANDARD DEVIATION.
Range and its Properties
The Range of a data set is defined to be the difference between the highest and lowest values in the data set. The range is very unstable, because it depends on only two scores. If one of those scores moves further from the distribution, the range will increase even though the typical variability among the scores has changed very little. This instability of the range has led to the development of two other range measures, neither of which relies on only the lowest and highest scores. The range value of a data set is greatly influenced by the presence of just one unusually large or small value in the sample.
Computation of the Range from Ungrouped Scores
Method of computing the Range from Ungrouped scores:
A. Data Given:
B. Procedure:
1. Use the formula
AR = H - L, if it is absolute range
TR = H - L + 1, if it is total range
Where: H = the highest score
L = the lowest score
2. Find the values of the symbols from the given data
Computation of the Range from Grouped Scores
Method of computing the Range from Grouped scores:
A. Data Given:
| Classes | Frequency |
| 46-50 | 5 |
| 41-45 | 7 |
| 36-40 | 9 |
| 31-35 | 10 |
| 26-30 | 8 |
| 21-25 | 6 |
| 16-20 | 4 |
| 11-15 | 4 |
B. Procedure:
1. Use the formula
AR = EU - EL, if it is absolute range
TR = EU - EL + 1, if it is total range
Where: EU = 50.5 = exact upper limit of the highest class interval
EL = 10.5 = exact lower limit of the lowest class interval
2. Substitute the values in the formula and solve:
AR = 50.5 - 10.5 = 40
TR = 50.5 - 10.5 + 1 = 41
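The two range formulas are simple enough to express directly in Python. In this sketch the function names are illustrative. The grouped case uses the exact limits given above, and the ungrouped case reuses the nine Ed103 quiz scores that appear in the quartile deviation example.

```python
def absolute_range(high, low):
    """AR = H - L (or EU - EL for grouped scores)."""
    return high - low

def total_range(high, low):
    """TR = H - L + 1 (or EU - EL + 1 for grouped scores)."""
    return high - low + 1

# Grouped scores: exact upper limit of the highest class, exact lower limit of the lowest
grouped_ar = absolute_range(50.5, 10.5)
grouped_tr = total_range(50.5, 10.5)
print(grouped_ar, grouped_tr)  # 40.0 41.0

# Ungrouped scores: the highest and lowest scores come straight from the data
quiz_scores = [15, 12, 33, 25, 30, 23, 17, 19, 16]
ungrouped_ar = absolute_range(max(quiz_scores), min(quiz_scores))
ungrouped_tr = total_range(max(quiz_scores), min(quiz_scores))
print(ungrouped_ar, ungrouped_tr)  # 21 22
```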
Quartile Deviation
Quartiles are quantiles that divide the distribution into four equal parts. The quartile deviation is a measure that describes the existing dispersion in terms of the distance between selected observation points, namely the first and third quartiles. It is a measure of variation that is typically used when the median is the measure of central tendency.
Computation of Quartile Deviation for Ungrouped Data
A. Data Given: The scores of 9 students in Ed103 quiz
15, 12, 33, 25, 30, 23, 17, 19, 16
B. To find the Quartile Deviation, the quartiles Q1 and Q3 must be computed first
1. Arrange the test scores from lowest to highest and rank them accordingly
| Rank | Scores |
| 1st | 12 |
| 2nd | 15 |
| 3rd | 16 |
| 4th | 17 |
| 5th | 19 |
| 6th | 23 |
| 7th | 25 |
| 8th | 30 |
| 9th | 33 |
| | n=9 |
2. Compute Q1 and Q3 by using the quartile formula Qk, where k = 1 for the first quartile and k = 3 for the third quartile
a. Solving Q1
b. Solving Q3
C. Computation of Quartile Deviation:
Use the formula
$$QD = \frac{Q_3 - Q_1}{2}$$
in which QD = the quartile deviation
Q1 = the first quartile
Q3 = the third quartile
and solve.
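The procedure above can be sketched in Python with the standard library. One assumption needs flagging: the quartile position formula is not shown in the text, so this sketch uses `statistics.quantiles` with `method="exclusive"`, which places the k-th quartile at position k(n + 1)/4 in the ranked scores and interpolates between neighbours. Other textbooks use slightly different conventions and may give slightly different quartiles.

```python
from statistics import quantiles

# Scores of the 9 students in the Ed103 quiz
scores = [15, 12, 33, 25, 30, 23, 17, 19, 16]

# quantiles() sorts the data itself; n=4 returns the three cut points Q1, Q2, Q3.
# ASSUMPTION: "exclusive" uses position k(n + 1)/4, one common textbook convention.
q1, q2, q3 = quantiles(scores, n=4, method="exclusive")

qd = (q3 - q1) / 2   # quartile deviation
print(q1, q3, qd)    # 15.5 27.5 6.0
```

Under this convention Q1 falls halfway between the 2nd and 3rd ranked scores and Q3 halfway between the 7th and 8th.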
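Computation of Quartile Deviation for Grouped Data
A. Data Given:
| Class Interval | Frequency | Cumulative Frequency (CF) |
| 61-63 | 2 | 2 |
| 64-66 | 5 | 7 |
| 67-69 | 12 | 19 |
| 70-72 | 15 | 34 |
| 73-75 | 8 | 42 |
| 76-78 | 5 | 47 |
| 79-81 | 3 | 50 |
| | n=50 | |
B. Computation of the quartiles, using the formula
$$Q_k = L + \left(\frac{\frac{kn}{4} - CF}{f}\right) i$$
a. Computation of Q1: n/4 = 12.5, so Q1 lies in the class 67-69, with L = 66.5, CF = 7, f = 12, and i = 3
$$Q_1 = 66.5 + \left(\frac{12.5 - 7}{12}\right)(3) = 67.88$$
b. Computation of Q3: 3n/4 = 37.5, so Q3 lies in the class 73-75, with L = 72.5, CF = 34, f = 8, and i = 3
$$Q_3 = 72.5 + \left(\frac{37.5 - 34}{8}\right)(3) = 73.81$$
C. Computation of Quartile Deviation
a. Take the values of the quartiles:
Q1 = 67.88
Q3 = 73.81
b. Substitute the values of Q1 and Q3 in the formula and solve
$$QD = \frac{Q_3 - Q_1}{2} = \frac{73.81 - 67.88}{2} \approx 2.97$$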
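The quoted values Q1 = 67.88 and Q3 = 73.81 can be reproduced with a short sketch. It rests on two assumptions: that the grouped data are the class intervals 61-63 through 79-81 (n = 50) tabulated in the grouped standard deviation example, and that the quartiles use the usual interpolation formula Qk = L + ((kn/4 - CF) / f) × i. The function name and input layout are illustrative.

```python
def grouped_quartile(classes, k):
    """k-th quartile of grouped data (k = 1, 2, or 3).

    classes: list of (exact_lower_limit, frequency) pairs,
    ordered from the lowest class to the highest, equal widths assumed.
    """
    n = sum(f for _, f in classes)
    width = classes[1][0] - classes[0][0]
    target = k * n / 4                      # position of the quartile
    cf = 0                                  # cumulative frequency below the current class
    for lower, f in classes:
        if cf + f >= target:                # this class contains the quartile
            return lower + ((target - cf) / f) * width
        cf += f

# ASSUMED data: classes 61-63 ... 79-81 with frequencies 2, 5, 12, 15, 8, 5, 3
classes = [(60.5, 2), (63.5, 5), (66.5, 12), (69.5, 15),
           (72.5, 8), (75.5, 5), (78.5, 3)]

q1 = grouped_quartile(classes, 1)
q3 = grouped_quartile(classes, 3)
qd = (q3 - q1) / 2
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, QD = {qd:.2f}")
```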
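Standard Deviation
Standard Deviation is a measure of the average deviation or departure of the individual scores from the mean. The standard deviation has proven to be an extremely useful measure of spread, in part because it is mathematically tractable.
Standard Deviation for Ungrouped Data
A. Data Given:
23 18 24 17 27 19 15 20
n = 8
B. Procedure:
1. Use the formula:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
for a sample, or
$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$
when the scores make up the whole population.
2. Substitute the values in the formula and solve. Here $\bar{X} = 163/8 = 20.375$ and $\sum (X - \bar{X})^2 = 111.875$, so $s = \sqrt{111.875/7} \approx 4.00$ and $\sigma = \sqrt{111.875/8} \approx 3.74$.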
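For the eight scores above, both versions of the standard deviation can be checked with Python's standard library. `statistics.stdev` divides by n - 1 (sample) and `statistics.pstdev` divides by n (population). The text does not say which version it intends, so both are shown.

```python
from statistics import mean, pstdev, stdev

scores = [23, 18, 24, 17, 27, 19, 15, 20]

x_bar = mean(scores)                                # 20.375
sum_sq_dev = sum((x - x_bar) ** 2 for x in scores)  # sum of squared deviations, 111.875

sample_sd = stdev(scores)        # divides the sum of squares by n - 1
population_sd = pstdev(scores)   # divides the sum of squares by n

print(f"mean = {x_bar}, sum of squares = {sum_sq_dev}")
print(f"sample SD = {sample_sd:.2f}, population SD = {population_sd:.2f}")
```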
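Standard Deviation for Grouped Data
A. Data Given
| Class Interval | Frequency (f) | Class mark (X) | \|X - X̄\| | (X - X̄)² | f(X - X̄)² |
| 61-63 | 2 | 62 | 8.94 | 79.924 | 159.848 |
| 64-66 | 5 | 65 | 5.94 | 35.284 | 176.42 |
| 67-69 | 12 | 68 | 2.94 | 8.644 | 103.728 |
| 70-72 | 15 | 71 | .06 | .004 | 0.06 |
| 73-75 | 8 | 74 | 3.06 | 9.364 | 74.912 |
| 76-78 | 5 | 77 | 6.06 | 36.724 | 183.62 |
| 79-81 | 3 | 80 | 9.06 | 82.084 | 246.252 |
| | n=50 | | | | ∑f(X - X̄)² = 944.84 |
B. Procedure
1. Compute the mean from the class marks: X̄ = ∑fX / n = 3547 / 50 = 70.94
2. Substitute the value of ∑f(X - X̄)² in the formula and solve:
$$s = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}} = \sqrt{\frac{944.84}{49}} \approx 4.39$$
for a sample, or
$$\sigma = \sqrt{\frac{\sum f(X - \bar{X})^2}{n}} = \sqrt{\frac{944.84}{50}} \approx 4.35$$
when the scores make up the whole population.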
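The grouped computation can be sketched in Python from the class marks and frequencies alone. Because the code does not round the squared deviations, its sum of squares (944.82) differs slightly from a total built from the rounded table entries. Variable names are illustrative.

```python
from math import sqrt

# Class marks (midpoints) and frequencies for classes 61-63 ... 79-81
class_marks = [62, 65, 68, 71, 74, 77, 80]
freqs = [2, 5, 12, 15, 8, 5, 3]

n = sum(freqs)                                                 # 50
mean_x = sum(f * x for f, x in zip(freqs, class_marks)) / n    # 70.94

# Sum of f(X - mean)^2 over all classes, computed without intermediate rounding
sum_f_sq_dev = sum(f * (x - mean_x) ** 2 for f, x in zip(freqs, class_marks))

sample_sd = sqrt(sum_f_sq_dev / (n - 1))     # sample formula
population_sd = sqrt(sum_f_sq_dev / n)       # population formula

print(f"mean = {mean_x:.2f}, sum f(X - mean)^2 = {sum_f_sq_dev:.2f}")
print(f"sample SD = {sample_sd:.2f}, population SD = {population_sd:.2f}")
```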
Standard Deviation Formulas
| | Sample Standard Deviation | Population Standard Deviation |
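| Standard Deviation for Ungrouped Data | $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$ | $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$ |
| Standard Deviation for Grouped Data | $s = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}}$ | $\sigma = \sqrt{\frac{\sum f(X - \mu)^2}{N}}$ |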
Lesson 4: Measures of Correlation
CORRELATION
Correlation is the relationship between two or more paired factors of two or more sets of test scores (Best & Kahn, 1998). It is the tendency for the corresponding observations in two or more series to vary together from the averages of their respective series, that is, to have similar relative positions (Good, 134).
3 Types of Correlation
Positive Correlation
Positive correlation is the tendency for the corresponding observations to have similar relative positions in their respective series (Good, 134). It means that high scores in one variable (x) are associated with high scores in another variable (y).
Negative Correlation
Negative correlation is the tendency for corresponding values to be divergent in position in their respective series (Good, 134). It means that high scores on one variable are associated with low scores on another variable, or vice versa.
Zero Correlation
Zero correlation is the absence of any systematic tendency for corresponding observations to be either similar or dissimilar in their relative position in their respective series (Good, 134). There is no definite relationship between the two sets of measures.
A correlation coefficient can range from +1.00 or -1.00 toward zero. The sign of the coefficient indicates the direction of the relationship, and the numerical value indicates its strength.
An obtained correlation coefficient can be interpreted with the use of a scale, like the one presented below (Best & Kahn, 1998). The scale applies to the size of the coefficient regardless of its sign.
| Correlation Coefficient | Degree of Relationship |
| .00 - .20 | Negligible |
| .21 - .40 | Low |
| .41 - .60 | Moderate |
| .61 - .80 | Substantial |
| .81 - 1.00 | High to Very High |
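The interpretation scale can be written as a small lookup function. This is only a sketch: the function name is made up for illustration, and it assumes the scale is applied to the absolute value of the coefficient, so that a negative coefficient is judged by its size.

```python
def degree_of_relationship(r):
    """Map a correlation coefficient to the scale's verbal label."""
    size = abs(r)   # the sign gives direction only, so strength uses |r|
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if size <= 0.20:
        return "Negligible"
    if size <= 0.40:
        return "Low"
    if size <= 0.60:
        return "Moderate"
    if size <= 0.80:
        return "Substantial"
    return "High to Very High"

print(degree_of_relationship(0.96))   # High to Very High
print(degree_of_relationship(-0.35))  # Low
```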
Pearson’s Product-Moment Correlation
This measure of relationship is used when the factors to be correlated are both metric data. Metric data are measurements that can be subjected to the four fundamental operations. To compute the correlation coefficient using this statistic, follow these steps:
1. Compute the sum of each set of scores (∑X, ∑Y).
2. Square each score and sum the squares (∑X², ∑Y²).
3. Count the number of paired scores (N).
4. Multiply each X score by its corresponding Y score.
5. Sum the cross products of X and Y (∑XY).
6. Calculate the correlation, following the formula:
$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$
Where: N = number of paired observations
∑XY = sum of the cross products of X and Y
∑X = sum of the scores under variable X
∑Y = sum of the scores under variable Y
(∑X)² = square of the sum of the X scores
(∑Y)² = square of the sum of the Y scores
∑X² = sum of squared X scores
∑Y² = sum of squared Y scores
Let us illustrate how Pearson’s r is computed. This table shows the computational procedures in determining the degree of relationship between test scores of 10 students in English (X) and Mathematics (Y).
Computation of Correlation Coefficient using Pearson’s r
| X | Y | X² | Y² | XY |
| 90 | 80 | 8100 | 6400 | 7200 |
| 85 | 72 | 7225 | 5184 | 6120 |
| 80 | 70 | 6400 | 4900 | 5600 |
| 75 | 65 | 5625 | 4225 | 4875 |
| 70 | 68 | 4900 | 4624 | 4760 |
| 65 | 55 | 4225 | 3025 | 3575 |
| 60 | 60 | 3600 | 3600 | 3600 |
| 55 | 50 | 3025 | 2500 | 2750 |
| 50 | 53 | 2500 | 2809 | 2650 |
| 45 | 44 | 2025 | 1936 | 1980 |
| ∑X=675 | ∑Y=617 | ∑X²=47625 | ∑Y²=39203 | ∑XY=43110 |
N = 10
N∑XY = 10(43 110) = 431 100
(∑X)(∑Y) = (675)(617) = 416 475
N∑XY - (∑X)(∑Y) = 431 100 - 416 475 = 14 625
N∑X² = 10(47 625) = 476 250
(∑X)² = (675)(675) = 455 625
N∑X² - (∑X)² = 476 250 - 455 625 = 20 625
N∑Y² = 10(39 203) = 392 030
(∑Y)² = (617)(617) = 380 689
N∑Y² - (∑Y)² = 392 030 - 380 689 = 11 341
[N∑X² - (∑X)²][N∑Y² - (∑Y)²] = (20 625)(11 341) = 233 908 125
Square root of 233 908 125 = 15 294.05522
r = 14 625 / 15 294.05522
r = 0.956 or 0.96
The computation of Pearson’s r yielded an r of 0.96. This indicates that a very high degree of relationship exists between the test scores in English and Mathematics. A student who scored high in English tended to obtain a high score in Mathematics as well.
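The six computational steps can be followed line by line in Python. This sketch uses the raw-score formula for Pearson's r with the ten pairs of English (X) and Mathematics (Y) scores from the table. Variable names are illustrative.

```python
from math import sqrt

english = [90, 85, 80, 75, 70, 65, 60, 55, 50, 45]       # X
mathematics = [80, 72, 70, 65, 68, 55, 60, 50, 53, 44]   # Y

n = len(english)                                          # number of paired observations
sum_x = sum(english)                                      # sum of X = 675
sum_y = sum(mathematics)                                  # sum of Y = 617
sum_x2 = sum(x * x for x in english)                      # sum of squared X = 47625
sum_y2 = sum(y * y for y in mathematics)                  # sum of squared Y = 39203
sum_xy = sum(x * y for x, y in zip(english, mathematics)) # sum of cross products = 43110

numerator = n * sum_xy - sum_x * sum_y                    # 14625
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

r = numerator / denominator
print(round(r, 3))  # 0.956
```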
Spearman Rho
This measure of relationship is used when test scores are ordinal or rank-ordered. In computing rho, the following steps have to be observed:
1. Rank the scores in distribution X, giving the highest score a rank of 1.
2. Repeat the process for the scores in distribution Y.
3. Obtain the difference between the two sets of ranks (D).
4. Square each difference and sum the squares (∑D²).
5. Solve for rho, following the formula:
$$\text{rho} = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
Where: rho = rank-order correlation coefficient
D = difference between paired ranks
∑D² = sum of squared differences between paired ranks
N = number of paired ranks
The computational procedures for the calculation of rho are reflected in this table.
Computation of Correlation Coefficient using Spearman Rho
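| X | Y | Rank of X | Rank of Y | D | D² |
| 90 | 80 | 1 | 1 | 0 | 0 |
| 85 | 72 | 2 | 2 | 0 | 0 |
| 80 | 70 | 3 | 3 | 0 | 0 |
| 75 | 65 | 4 | 5 | -1 | 1 |
| 70 | 68 | 5 | 4 | 1 | 1 |
| 65 | 55 | 6 | 7 | -1 | 1 |
| 60 | 60 | 7 | 6 | 1 | 1 |
| 55 | 50 | 8 | 9 | -1 | 1 |
| 50 | 53 | 9 | 8 | 1 | 1 |
| 45 | 44 | 10 | 10 | 0 | 0 |
| | | | | | ∑D²=6 |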
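N = 10
$$\text{rho} = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
$$= 1 - \frac{6(6)}{10(10^2 - 1)}$$
$$= 1 - \frac{36}{10(99)}$$
$$= 1 - \frac{36}{990}$$
= 1 - 0.0364 = 0.964 or 0.96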
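The ranking steps and the rho formula can be sketched as follows. The helper `rank_descending` is a hypothetical name, and it assumes there are no tied scores, which holds for this data set. Tied scores would need averaged ranks, which this sketch does not handle.

```python
def rank_descending(values):
    """Rank scores so that the highest gets rank 1 (assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

x = [90, 85, 80, 75, 70, 65, 60, 55, 50, 45]
y = [80, 72, 70, 65, 68, 55, 60, 50, 53, 44]

rank_x = rank_descending(x)
rank_y = rank_descending(y)   # [1, 2, 3, 5, 4, 7, 6, 9, 8, 10]

# D is the difference between paired ranks; sum the squared differences
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))   # 6

n = len(x)
rho = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))
print(round(rho, 3))  # 0.964
```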