E-Learn Knowledge Base


In early 2025, Vsasf Tech ICT Academy, Enugu, introduced a flexible hybrid learning system for all the courses it offers to the general public. With the E-Learn platform, powered by Vsasf Nig Ltd, students can continue learning remotely regardless of their location, which promotes the open and distance learning (ODL) system of education for Nigerians and the world at large.

Students are encouraged to continue learning online once they are fully registered through the academy's registration portal. Fully registered students who have completed their training fee payment can use the login link to access their course materials online.

What is Categorical Data?

Data that can be categorized or grouped is called categorical data. In statistics, it consists of categorical variables, and it can arise either from qualitative observations summarized as counts or from quantitative observations grouped into intervals. Categorical data is also known as qualitative data.

Definition of Categorical Data

Categorical data is a type of data in statistics that organizes observations into groups or categories using names or labels.

Types of Categorical Data

Categorical data is divided into two main types:

  • Nominal Categorical Data
  • Ordinal Categorical Data

Both types can be represented with pie charts and bar graphs.

Nominal Data

Nominal data consists of two or more categories without any specific order. The categories cannot be quantified or placed in a definite hierarchy. Variables with no quantitative value or order are measured on the nominal scale.

Nominal data is the simplest level of measurement and is often described as the foundation of statistical analysis. Examples of nominal data include hair color, gender, race, place of residence, and college major.

Ordinal Data

Ordinal categorical data consists of categories with a natural rank order. However, the differences between ranks may not be equal. It is a type of qualitative data in which variables take values in naturally ordered categories.

Ordinal data is widely used in social science and survey research, because respondents can choose a category relatively easily even when the underlying attribute is difficult to measure. This type of data can be represented using bar graphs, pie charts, etc.

Difference Between Ordinal Data and Nominal Data

On the basis of their characteristics, ordinal data and nominal data can be differentiated as follows:

Ordinal Data vs Nominal Data

| Characteristics | Ordinal Data | Nominal Data |
| --- | --- | --- |
| Definition | Represents categories with a specific order or ranking. | Represents categories with no inherent order or ranking. |
| Examples | Grades (A, B, C), rankings (1st, 2nd, 3rd), socio-economic status (Low, Medium, High). | Colors (Red, Blue, Green), gender (Male, Female), types of fruit (Apple, Orange, Banana). |
| Order | Values have a meaningful order or sequence. | Values do not have a meaningful order or sequence. |
| Arithmetic Operations | Limited arithmetic operations (e.g., you can say B is higher than C, but not by how much). | No meaningful arithmetic operations (e.g., no sense in saying Red + Blue = Green). |
| Scale of Measurement | Falls under the ordinal scale. | Falls under the nominal scale. |
| Examples in Everyday Life | Ranking your preferences, ordering items by importance. | Categorizing items without any inherent order, like classifying colors or gender. |
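The difference in the table can be made concrete with a short Python sketch. The values and the rank mapping `STATUS_RANK` are hypothetical and chosen only for illustration; the point is that ordinal labels can be sorted once an order is declared, while nominal labels can only be counted or compared for equality.

```python
# Hypothetical data; STATUS_RANK is an assumed ordering used for illustration.
STATUS_RANK = {"Low": 0, "Medium": 1, "High": 2}

status = ["High", "Low", "Medium", "Low"]   # ordinal: order is meaningful
colors = ["Red", "Blue", "Green", "Blue"]   # nominal: no inherent order

# Ordinal values can be sorted and compared through their ranks.
sorted_status = sorted(status, key=STATUS_RANK.get)
highest = max(status, key=STATUS_RANK.get)

# Nominal values support only equality checks and counting.
blue_count = colors.count("Blue")

print(sorted_status, highest, blue_count)
```

Note that the ranks only encode order: the gap between "Low" and "Medium" is not assumed to equal the gap between "Medium" and "High".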

Features of Categorical Data

Understanding the features of categorical data helps in choosing appropriate statistical methods and making meaningful interpretations.

Here are some key features of categorical data:

Categorical Data

Categorical data is further sub-classified into nominal and ordinal data.

Nominal Data: Nominal data represents unordered categories or categories without any inherent order.

  • Example: Colors, gender, and types of animals.

Ordinal Data: Ordinal Data represents ordered categories or categories having systematic order or ranking.

  • Example: Education level (high school, college, graduate).

Mutually Exclusive

Categorical data are mutually exclusive: each observation falls into exactly one category, and the categories do not overlap.

Countable Categories

The categories in categorical data are countable and distinct, which is why they are summarized with frequency distributions and bar charts.

No Arithmetic Operations

Arithmetic operations are not meaningful for categorical data; for example, you cannot compute the average of a set of categories.

Mode as Measure of Central Tendency

For categorical data, the mode is often used to describe the central tendency. It is the category that occurs most frequently.
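As a minimal sketch, the mode of a categorical variable can be found with Python's standard `statistics` module. The response lists below are hypothetical.

```python
from statistics import mode, multimode

# Hypothetical survey responses.
pets = ["dog", "cat", "dog", "bird", "cat", "dog"]
most_common = mode(pets)  # the single most frequent category

# When several categories tie, multimode returns all of them.
ties = multimode(["red", "blue", "red", "blue", "green"])

print(most_common, ties)
```

`multimode` is useful because categorical data often has more than one most frequent category.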

Chi-Square Test

A widely used statistical test for categorical data is the chi-square test. It helps determine whether there is a significant association between two categorical variables.
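To illustrate the idea, the sketch below computes the chi-square statistic by hand for a hypothetical 2x2 table of observed counts. The counts and their labels are invented for illustration. Each expected count is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells.

```python
# Hypothetical observed counts: rows = pet owner (yes / no),
# columns = lives in a house / lives in an apartment.
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if the two variables were independent.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_square, 3), degrees_of_freedom)
```

With one degree of freedom, the critical value at the 5% significance level is about 3.841, so a statistic well above that suggests the two variables are associated.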

Examples of Categorical Data

Some examples of categorical data are:

Pet Preference: This is an example of nominal data, where the categories are based on qualitative characteristics. The categories are dogs, cats, birds, etc.

Yes/No Questions: This is an example of binary data, where the categories are limited to two values. For example, a survey question asking if someone has a pet or not.

Color Grouping: This is an example of nominal data, where the categories are based on qualitative characteristics. The categories are red, blue, green, etc.

Breed or Model: This is an example of nominal data, where the categories are based on qualitative characteristics. The categories are poodle, bulldog, sedan, SUV, etc.

Gender: This is an example of nominal data, where the categories are based on qualitative characteristics. The categories are male, female, non-binary, etc.

Hometown: This is an example of nominal data, where the categories are based on qualitative characteristics. The categories are New York, Los Angeles, Chicago, etc.

Coffee Preference: This is an example of nominal data, where the categories are based on qualitative characteristics. The categories are latte, espresso, cappuccino, etc.

Clothing Sizes: This is an example of ordinal data, where the categories have a natural order. The categories are small, medium, large, etc.

Analysis of Categorical Data

Analysis of categorical data refers to using statistical methods to analyze data grouped into categories. These categories can be nominal (with no inherent order, like hair color) or ordinal (with an inherent order, like education level). The goal of categorical data analysis is to uncover the patterns, relationships, and insights within this data type.

Here are some common ways to analyze categorical data:

Frequency Tables: Create tables to display the data counts or frequencies of different categories.

Crosstabulation: Crosstabulation of two categorical variables is performed to explore the relationship between the two variables.

Chi-Squared Tests: A statistical method used to determine if there is a significant association between two categorical variables.

Contingency Tables: Constructing a two-way table showcases the frequency of occurrence of all unique pairs of values in two columns of attribute data.

Bar Charts and Pie Charts: Graphical representations of categorical data that help visualize the distribution of the categories.

Odds Ratios: A statistical measure used to quantify the association between two categorical variables, commonly in case-control studies.

Logistic Regression: A regression analysis used to model the relationship between a categorical dependent variable and one or more categorical or continuous independent variables.

Multiple Correspondence Analysis: A technique used to analyze the relationships among categories of multiple nominal variables.

Analysis of Variance (ANOVA): A set of statistical tests used to compare the means of three or more groups, allowing for the analysis of the effects of categorical variables on continuous outcomes.

Regression Analysis: Modeling the relationship between a continuous outcome and one or more categorical predictors, providing insights into the effects of categorical variables on continuous outcomes.
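Two of the techniques above, the contingency table and the odds ratio, can be sketched with the standard library alone. The records below are hypothetical case-control data. The odds ratio for a 2x2 table with cells a, b, c, d is (a × d) / (b × c).

```python
from collections import Counter

# Hypothetical case-control records: (exposure, outcome).
records = [
    ("exposed", "case"), ("exposed", "case"), ("exposed", "case"),
    ("exposed", "control"),
    ("unexposed", "case"),
    ("unexposed", "control"), ("unexposed", "control"), ("unexposed", "control"),
]

# Contingency table keyed by (row, column) pairs.
table = Counter(records)

a = table[("exposed", "case")]
b = table[("exposed", "control")]
c = table[("unexposed", "case")]
d = table[("unexposed", "control")]

# Odds of being a case among the exposed, relative to the unexposed.
odds_ratio = (a * d) / (b * c)
print(dict(table), odds_ratio)
```

An odds ratio of 1 would indicate no association; values above 1 indicate that the outcome is more likely in the exposed group.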

What is a Categorical Variable?

A categorical variable is a type of variable in statistics that can take on a limited or usually fixed number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of a characteristic.

  • Qualitative variables or attribute variables are other names for categorical variables. They may be ordinal or nominal.
  • Nominal variables describe a name, label, or category without natural order.

In contrast, ordinal variables have a straightforward ordering of the categories. Examples of categorical variables include demographic information of a population, college major, and the roll of a six-sided die.

Advantages of Categorical Data

The advantages below show the value of categorical data for various analytical and business purposes, including market segmentation, trend analysis, and targeted marketing. The following are the advantages of categorical data:

  • Easy Interpretation: Categorical data is easier to interpret and analyze than quantitative data, making it an ideal choice for individuals without a strong background in mathematics or statistics.
  • Quick Recognition of Trends and Patterns: Categorical data allows for the quick recognition of trends, changes, and patterns based on interrelated variables, making the information easier to digest and understand.
  • Segmentation for Targeted Marketing: Segmenting categorical data helps to divide customers into groups for targeted marketing, allowing businesses to tailor their strategies to specific customer segments.
  • Use in Correlation and Trend Analysis: Categorical data is useful for understanding how different populations interact with each other, for identifying associations between variables, and for understanding trends and patterns within a population.
  • Concrete Results: The results of categorical data are concrete, without subjective, open-ended questions, providing straightforward insights.

Disadvantages of Categorical Data

There are some disadvantages to using categorical data, which are mentioned below:

  • Limited Statistical Analysis: The range of statistical analyses that can be performed on categorical data is limited. It does not have the same statistical properties as quantitative data, so arithmetic summaries such as the mean or standard deviation are not meaningful for it.
  • Loss of Detail: When continuous variables are categorized, a level of detail is lost. This can make it challenging to analyze the data and may result in a less accurate representation of the underlying patterns or relationships.
  • Low Sensitivity: Categorical data research is often low in sensitivity, with responses typically being either good/bad or yes/no. This can limit the ability to detect subtle differences or trends in the data.
  • Expensive and Time-Consuming: Categorical data often requires larger samples, which can be more expensive and time-consuming to gather compared to quantitative data.
  • Potential for Irrelevant Data: When collecting categorical data, researchers may have to handle irrelevant data, which can add complexity to the data analysis process.
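The "loss of detail" point can be seen in a small sketch that bins exact ages into categories. The function `age_group` and its cut-offs are hypothetical and used only for illustration.

```python
def age_group(age):
    """Map an exact age to a coarse category (hypothetical cut-offs)."""
    if age < 18:
        return "child"
    if age < 65:
        return "adult"
    return "senior"

ages = [18, 40, 64, 70]
groups = [age_group(a) for a in ages]
print(groups)
```

After binning, an 18-year-old and a 64-year-old fall into the same category, so the 46-year difference between them is no longer visible in the data.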

Categorical and Numerical Data

On the basis of the following aspects, categorical data and numerical data can be differentiated as:

Categorical Data vs Numerical Data

| Aspects | Categorical Data | Numerical Data |
| --- | --- | --- |
| Other Name | Qualitative Data | Quantitative Data |
| Nature of Data | Non-numerical and can be identified based on names or labels. | Takes the form of numbers and can be used in arithmetic operations. |
| Types of Data | Nominal and Ordinal Data | Discrete and Continuous Data |
| Analysis Technique | Used in research involving qualitative analysis. | Used for calculations in statistics. |
| Examples | Name, gender, phone number, etc. | Measurements such as height and weight, etc. |

Applications of Categorical Data

Categorical data is divided into nominal and ordinal data. Both have various real-world applications. Here are some real-world examples.

Nominal data appears in places such as purchase information, where non-numerical, unordered categorical data is collected from customers for activities like shipping orders or serving food.

Educational levels, income ranges, and customer satisfaction surveys are all examples of ordinal data, where the data has a natural order or ranking.

Challenges in Categorical Data

While working with categorical data, several challenges need to be considered. Some of these challenges include:

  • Data Quality: Ensuring the accuracy and consistency of categorical data is crucial for accurate analysis. Errors in categorization or incorrect labelling can lead to incorrect insights and conclusions.
  • Measurement Error: Ordinal data with a ranked order can suffer from measurement error due to the lack of consistent spacing between ranks. This can make it difficult to compare and analyze the data accurately.
  • Mutually Exclusive Categories: Categories in categorical data must be mutually exclusive, meaning each category should not overlap with any other category. This ensures that the data is properly organized and can be analyzed effectively.
  • Lack of Quantitative Information: Nominal data does not provide any quantitative information, which can limit the types of analyses and insights that can be derived from the data.
  • Difficulty in Ranking: Nominal data cannot be ranked or ordered, making it challenging to compare and analyze the data in a meaningful way.
  • Limited Analysis Options: Nominal data has fewer analysis options compared to ordinal data, as it does not provide any information about the ranking or order of the categories.
  • Handling Irrelevant Data: Nominal data, which is often collected through surveys or questionnaires, can sometimes contain irrelevant or empty responses. Researchers need to find ways to handle this irrelevant data to ensure accurate analysis.

Examples on Categorical Data

Example 1: Favorite Ice Cream Flavors

You conduct a survey in your school cafeteria to find out students' favorite ice cream flavors. You collect the following data:

| Student | Favorite Flavor |
| --- | --- |
| John | Chocolate |
| Mary | Vanilla |
| Peter | Mint Chocolate Chip |
| Alice | Chocolate |
| Bob | Strawberry |
| Sarah | Strawberry |

Solution:

The data is categorical because "Favorite Flavor" has distinct categories such as chocolate, strawberry, and vanilla. You can analyze this data in various ways; one such way is shown below.

Create a table showing the number of students who prefer each flavor.

| Flavor | Frequency |
| --- | --- |
| Chocolate | 2 |
| Strawberry | 2 |
| Vanilla | 1 |
| Mint Chocolate Chip | 1 |
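The same frequency table can be produced programmatically. The sketch below uses `collections.Counter` from the Python standard library on the survey responses listed above; the variable names are illustrative.

```python
from collections import Counter

# Survey responses from Example 1.
favorite_flavors = {
    "John": "Chocolate",
    "Mary": "Vanilla",
    "Peter": "Mint Chocolate Chip",
    "Alice": "Chocolate",
    "Bob": "Strawberry",
    "Sarah": "Strawberry",
}

# Count how many students chose each flavor.
frequency = Counter(favorite_flavors.values())

for flavor, count in frequency.most_common():
    print(f"{flavor:<20} {count}")
```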

Example 2: Movie Genre Preferences

You ask your classmates about their favorite movie genres and get the following data:

| Student | Favorite Genre |
| --- | --- |
| David | Animation |
| Emma | Sci-Fi |
| Liam | Action |
| Olivia | Drama |
| Adam | Comedy |
| Noah | Comedy |

Solution:

Data is categorical because "Favorite Genre" has distinct categories like Comedy, Drama, Action, etc.

Create a table showing the number of students who prefer each Genre.

| Genre | Frequency |
| --- | --- |
| Comedy | 2 |
| Action | 1 |
| Drama | 1 |
| Sci-Fi | 1 |
| Animation | 1 |

Both of these examples can be represented with a pie chart as well as a bar graph.
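As a rough sketch of what those charts encode, the code below turns the genre frequencies from Example 2 into pie-chart shares (each count divided by the total) and a simple text bar chart. It uses no plotting library; the `#` bars are only a stand-in for a real chart.

```python
# Frequencies from Example 2.
genre_counts = {"Comedy": 2, "Action": 1, "Drama": 1, "Sci-Fi": 1, "Animation": 1}
total = sum(genre_counts.values())

# Each pie slice is the category's share of all responses.
shares = {genre: count / total for genre, count in genre_counts.items()}

# Each bar's length is proportional to the category's count.
bars = {genre: "#" * count for genre, count in genre_counts.items()}

for genre in genre_counts:
    print(f"{genre:<10} {bars[genre]:<3} {shares[genre]:.1%}")
```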

Example 3: Clothing Sizes

You ask a group of people their clothing size, and you get the following responses:

Person 1: Medium

Person 2: Large

Person 3: Small

Person 4: Medium

Person 5: Large

Solution:

Create a frequency table:

| Size | Frequency |
| --- | --- |
| Small | 1 |
| Medium | 2 |
| Large | 2 |

Example 4: Color Preference Survey

You survey a group of people to find out their favorite colors:

Person 1: Blue

Person 2: Red

Person 3: Green

Person 4: Blue

Person 5: Red

Solution:

Create a frequency table:

| Color | Frequency |
| --- | --- |
| Blue | 2 |
| Red | 2 |
| Green | 1 |

Example 5: Students’ Sports Participation

You collect data on sports participation in your school:

Alex: Basketball, Soccer

Ben: Baseball

Emma: Volleyball

Chloe: Tennis, Swimming

David: Basketball, Track

Solution:

This data is categorical, and you can create a frequency table showing the number of students participating in each sport. Because a student can play more than one sport, each sport a student lists is counted separately:

| Sport | Frequency |
| --- | --- |
| Basketball | 2 |
| Soccer | 1 |
| Baseball | 1 |
| Volleyball | 1 |
| Tennis | 1 |
| Swimming | 1 |
| Track | 1 |
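A minimal sketch of this multi-response count in Python: every sport each student lists is flattened into one sequence before counting. The variable names are illustrative.

```python
from collections import Counter

# Responses from Example 5; a student may list several sports.
participation = {
    "Alex": ["Basketball", "Soccer"],
    "Ben": ["Baseball"],
    "Emma": ["Volleyball"],
    "Chloe": ["Tennis", "Swimming"],
    "David": ["Basketball", "Track"],
}

# Flatten all lists and count each sport once per student who plays it.
sport_counts = Counter(
    sport for sports in participation.values() for sport in sports
)
total_mentions = sum(sport_counts.values())

print(sport_counts.most_common(), total_mentions)
```

The total number of mentions (8) exceeds the number of students (5), which is expected whenever respondents can choose more than one category.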

Practice Questions on Categorical Data

1: Sports Survey: Your school wants to improve its sports program and asks students which sports they participate in. You collect the following data:

| Student | Sports |
| --- | --- |
| Alex | Basketball, Soccer |
| Ben | Baseball |
| Emma | Volleyball |
| Chloe | Tennis, Swimming |
| David | Basketball, Track |

  • Create a frequency table showing the number of students who participate in each sport.
  • Draw a bar chart to visualize the popularity of different sports.
  • If your school has budget limitations, which sports might be prioritized based on this data? Why?

2: You want to understand your classmates' lunch habits. You ask them about their preferred lunch options (packed lunch, school cafeteria, fast food) and their grade level. Examine the data and answer these questions:

  • Are there any differences in lunch preferences between different grade levels? Explain your findings.
  • If you were in charge of the school cafeteria, what changes might you make based on this data?

3: A teacher wants to know how students travel to school. The options are Bus, Car, Bike, Walk. Create a frequency table and visualize the data using a bar chart.

4: Survey the favorite social media platform (Facebook, Instagram, Twitter, LinkedIn) of 10 people. Create a frequency table and visualize the data using a pie chart.

5: You collect data on the favorite pet animals (Dog, Cat, Fish, Bird) among 20 people. Create a frequency table and represent it using a bar chart.

6: A restaurant owner wants to know the preferred payment methods (Cash, Card, Mobile Payment) of their customers. Analyze the data by creating a frequency table and visualizing it with a pie chart.

7: A school survey asks students about their favorite subject (Math, Science, History, English). Create a frequency table and visualize the data using a bar chart.

8: You want to find out the most common shoe size in your neighborhood. The options are 7, 8, 9, 10. Create a frequency table and represent the data using a bar chart.

9: A researcher surveys people on their coffee preference (Latte, Espresso, Cappuccino, Americano). Create a frequency table and visualize the data using a pie chart.

10: Conduct a survey to find out the preferred movie-watching method (Theater, Streaming, DVD) among a group of friends. Analyze the data by creating a frequency table and visualizing it with a bar chart.

Authors: T. C. Okenna, GeeksforGeeks
Register for this course: Enrol Now

Statistics in Economics

Statistics plays a major role in economics. It helps in studying market structures and understanding different economic problems. Once these problems are better understood, statistics also helps in solving them by informing appropriate economic policies. Every branch of economics uses statistics to test economic theories. With the help of statistics, one can establish mathematical relationships between economic variables, present economic facts precisely, and investigate cause-and-effect relationships between different data sets.

Functions of Statistics

1. Simplification of Complex Facts: Large and complex data sets are difficult to understand. A layperson cannot follow the complex terms and information presented in the analysis and results of a study. Statistical methods help an economist or other user present complex data in a simple, understandable form.

2. Presentation of Facts in a Definite Form: Statistics helps present facts precisely, using figures. Stating facts qualitatively rather than with quantitative figures does not convey them effectively. For example, saying that the literacy rate has increased by 5% over the past two years is more informative than simply saying that the literacy rate is increasing.

3. Comparison of Facts: Comparing facts and figures for a pre-determined purpose is an essential function of statistics, because absolute figures alone rarely convey concrete meaning. The relationship between two data sets or groups can be compared through statistical measures such as ratios, averages, percentages, and rates.

4. Forecasting: Uncertainty and risk are abundant in business. Organizations, and the economy as a whole, therefore have to forecast the future to prepare for change. Accurate forecasting helps reduce uncertainty. Statistical tools such as time series analysis and interpolation can help make projections of the future.

5. Formulation and Testing of Hypotheses: A hypothesis is a tentative assumption about a population or a relationship, and testing it means checking whether the data support it. Different statistical tools and methods help an economist formulate and test hypotheses.

6. Enlarging Individual Knowledge and Experience: Working through statistical procedures widens an individual's horizons. Statistics also strengthens thinking and reasoning and ultimately helps a person reach rational conclusions.

Importance of Statistics

To Government

  • Statistics helps the government of a country fulfill different objectives by letting it collect, organize, present, analyze, and interpret large amounts of information as numerical figures. With this, the government can run the economy efficiently and pursue welfare and other objectives.
  • The government of a country can also formulate various economic policies by using statistical methods like index numbers, forecasting and demand analysis, time-series analysis, and many more.
  • In democratic countries like India, different political groups take the help of statistics to know about their popularity among the masses.

In Economics

  • Statistics helps in formulating economic laws. Laws such as the Law of Demand, Law of Supply, Elasticity of Demand, and Elasticity of Supply were developed using the inductive method of generalization.
  • An economy faces different economic problems such as unemployment and poverty. Statistics, with the help of different techniques and tools, helps an economy understand and solve these problems efficiently.
  • An economy includes different market structures such as perfect competition, monopoly, and oligopoly. For better results and functioning, the study of these market structures is essential. Statistics helps in this study through the comparison of the costs, profits, and prices of firms.
  • An economist can also estimate a mathematical relationship between different economic variables.
  • Ultimately, statistics helps study the behavior of different concepts of economics. For example, the laws of supply and demand are used to understand the behavior of consumers toward the purchase and usage of a commodity or service by considering different determinants of supply and demand.

In Business

  • Statistics provides guidelines and tools for assessing feasibility, location, availability of inputs, taxes, size of output, turnover, market size, etc., before establishing a business.
  • A business owner can estimate the demand for a service or product with the help of statistical methods such as trend analysis.
  • Statistics also helps a business in the production planning process to ensure a proper balance between the supply of and demand for a good or service offered by the firm.
  • Different statistical techniques help a business analyze purchasing power, consumer wants, pricing, population, etc., to understand the potential of the target market for its service or product.

Limitations of Statistics

1. Ignores the Qualitative Aspect: Statistics does not consider aspects that cannot be expressed in quantitative terms. Qualitative aspects such as kindness, honesty, care, health, and intelligence must first be converted into quantitative terms before they can be studied.

2. Does not Deal with Individual Items: As the definition of statistics suggests, it only deals with aggregates of facts and does not consider individual items. For example, it does not consider the marks of one student but will consider the marks of a class.

3. Requires Uniform and Homogeneous Data: An economist cannot perform a meaningful statistical study if the data gathered is not homogeneous.

4. Can Be Misused: If statistical methods are not applied by a trained and unbiased person, there is a high risk of misuse and inaccurate results. A biased individual can manipulate the data to suit their own needs and purposes.

5. Results are True only on Average: In statistics, a result is true only on average. For example, if the average mark of a class of 50 students is 60, it does not mean that every student scored 60. One student might have 30 or 40 marks.
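The last limitation can be illustrated with a short sketch. The marks below are hypothetical and were chosen only so that a class of 50 students has a mean of exactly 60.

```python
from statistics import mean

# Hypothetical marks for a class of 50 students.
marks = [30] * 10 + [40] * 5 + [60] * 10 + [76] * 25

class_average = mean(marks)
students_at_average = marks.count(60)

print(len(marks), class_average, min(marks), max(marks), students_at_average)
```

Although the class average is 60, only 10 of the 50 students in this made-up class actually scored 60, and individual marks range from 30 to 76.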

Authors: T. C. Okenna, GeeksforGeeks

What Are Probability Distributions?

A probability distribution is a statistical function that describes all the possible values a random variable can take within a given range, together with their probabilities. The range is bounded by the minimum and maximum possible values, but where a value is likely to fall within it depends on several factors, including the distribution's mean (average), standard deviation, skewness, and kurtosis.

Types of Probability Distribution

Probability distributions are divided into two types:

  1. Discrete Probability Distributions
  2. Continuous Probability Distributions

Discrete Probability Distribution

A discrete distribution describes the probability of occurrence of each value of a discrete random variable. The number of spoiled apples out of 6 in your refrigerator can be an example of a discrete probability distribution.

Each possible value of the discrete random variable can be associated with a non-zero probability in a discrete probability distribution.

Let's discuss some significant probability distribution functions.

Binomial Distribution

The binomial distribution is a discrete distribution with a finite number of possible outcomes. It arises when observing a series of what are known as Bernoulli trials. A Bernoulli trial is a random experiment with only two outcomes: success or failure.

Consider a random experiment in which you toss a biased coin six times, with a 0.4 chance of getting a head on each toss. If 'getting a head' is considered a 'success', the binomial distribution gives the probability of r successes for each value of r from 0 to 6.

The binomial random variable represents the number of successes (r) in n independent Bernoulli trials.

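For the biased-coin experiment described above (n = 6 tosses, p = 0.4), the probabilities can be computed with the standard binomial formula P(r) = C(n, r) · p^r · (1 − p)^(n − r). The following is a minimal sketch using only the Python standard library; the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent Bernoulli trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 6, 0.4  # six tosses, 0.4 chance of a head on each toss
pmf = {r: binomial_pmf(r, n, p) for r in range(n + 1)}

for r, prob in pmf.items():
    print(f"P(r = {r}) = {prob:.5f}")
```

The probabilities over r = 0, 1, ..., 6 add up to 1, as they must for any probability distribution.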

Bernoulli Distribution

The Bernoulli distribution is a special case of the binomial distribution in which only one trial is conducted (n = 1), resulting in a single observation. As a result, the Bernoulli distribution describes events that have exactly two outcomes.

Here is Python code that shows the Bernoulli distribution:

[Image: Python code for the Bernoulli distribution (not available)]

The expected value of a Bernoulli random variable is p, which is also the distribution's parameter.

The outcome of the experiment, and therefore the value of the Bernoulli random variable, is either 0 or 1.

The pmf (probability mass function) gives the probability of each value of the random variable.
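The original listing is not reproduced on this page. As a stand-in, a minimal sketch using only the Python standard library might look like the following; the function name `bernoulli_pmf`, the parameter value p = 0.4, and the seed are assumptions made for illustration.

```python
import random

def bernoulli_pmf(k, p):
    """Probability that a Bernoulli(p) variable equals k (k is 0 or 1)."""
    if k not in (0, 1):
        return 0.0
    return p if k == 1 else 1 - p

p = 0.4
rng = random.Random(0)  # fixed seed so the run is repeatable

# Each sample is 1 (success) with probability p, otherwise 0 (failure).
samples = [1 if rng.random() < p else 0 for _ in range(10_000)]
sample_mean = sum(samples) / len(samples)

print(bernoulli_pmf(0, p), bernoulli_pmf(1, p), sample_mean)
```

The sample mean should land close to p, in line with the statement that the expected value of a Bernoulli random variable is p.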

Poisson Distribution

A Poisson distribution is a probability distribution used in statistics to show how many times an event is likely to happen over a given period of time. In other words, it is a count distribution. Poisson distributions are frequently used to model independent events that occur at a constant average rate over a given time interval. The distribution is named after the French mathematician Siméon Denis Poisson.

The Python code below shows a simple example of the Poisson distribution.

The sampling function takes two parameters:

  1. lam: the expected number of occurrences in the interval
  2. size: the shape of the returned array

The code generates a 1x100 array of samples for an expected occurrence count of 5.

[Image: Python code for the Poisson distribution (not available)]
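The parameter names `lam` and `size` match NumPy's Poisson sampler, so the missing listing can plausibly be sketched as below. This is an assumption about what the original code did, not a copy of it; the seed is added only to make the run repeatable.

```python
import numpy as np

# Assumed reconstruction: draw a 1x100 array of Poisson counts with lam = 5.
rng = np.random.default_rng(seed=0)
samples = rng.poisson(lam=5, size=(1, 100))

print(samples.shape)
print(samples.mean())  # should be close to lam
```

Every sample is a non-negative integer count, and the sample mean approaches `lam` as more samples are drawn.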

Continuous Probability Distributions

A continuous distribution describes the probabilities of a continuous random variable's possible values. A continuous random variable has an infinite and uncountable set of possible values (known as the range). Measured time is an example: it can take any value, from 1 second to 1 billion seconds and beyond.

The probability for a continuous random variable is calculated as the area under the curve of its probability density function (PDF). As a result, only ranges of values can have a non-zero probability; the probability that a continuous random variable equals any single value is always zero.

Now, look at some varieties of the continuous probability distribution.

Normal Distribution

The normal distribution is one of the most basic continuous distribution types. It is also known as the Gaussian distribution. This probability distribution is symmetrical around its mean, and data close to the mean occur more frequently than data far from it. It is fully specified by its mean and variance; the standard normal distribution has a mean of 0 and a variance of 1.

In the example, 100 random values ranging from 1 to 50 are generated. A function then implements the normal distribution formula to calculate the probability density function. Finally, the data points are plotted on the X-axis against the probability density on the Y-axis.

[Images: Python code and resulting plot for the normal distribution (not available)]
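The original listing and plot are not reproduced here. As a hedged sketch of the density function it describes, the code below implements the normal PDF formula f(x) = 1 / (σ√(2π)) · exp(−(x − μ)² / (2σ²)) and checks it against `statistics.NormalDist` from the standard library. The function name `normal_pdf` is illustrative, and plotting is omitted.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of a normal distribution at x."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Density at the mean of the standard normal distribution.
peak = normal_pdf(0.0)

# Probability of a range is the area under the curve, obtained via the CDF.
standard = NormalDist()
within_one_sigma = standard.cdf(1) - standard.cdf(-1)

print(round(peak, 4), round(within_one_sigma, 4))
```

The second calculation shows the area-under-the-curve idea for continuous variables: about 68% of the probability lies within one standard deviation of the mean, while any single point has probability zero.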

Continuous Uniform Distribution

In a continuous uniform distribution, all outcomes in an interval [a, b] are equally likely. The distribution is symmetric, with a constant probability density of 1/(b - a) over the interval.

The Python code below is a simple example of a continuous uniform distribution that draws 1000 random samples.

[Image: Python code for the continuous uniform distribution (not available)]
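Since the listing itself is not available, here is a minimal standard-library sketch of the same idea. The interval endpoints a = 2 and b = 10 and the seed are hypothetical choices.

```python
import random

a, b = 2.0, 10.0          # hypothetical interval endpoints
rng = random.Random(42)   # fixed seed so the run is repeatable

# Draw 1000 samples from the continuous uniform distribution on [a, b].
samples = [rng.uniform(a, b) for _ in range(1000)]

# The density is constant, so a range's probability is its width times 1/(b - a).
density = 1 / (b - a)
prob_3_to_5 = (5 - 3) * density

sample_mean = sum(samples) / len(samples)
print(density, prob_3_to_5, round(sample_mean, 2))
```

The sample mean should fall near the midpoint (a + b) / 2, reflecting the symmetry of the distribution.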

Log-Normal Distribution

This distribution describes a random variable whose logarithm follows a normal distribution. Consider two random variables X and Y with Y = ln(X), where ln denotes the natural logarithm: if Y is normally distributed, then X follows a log-normal distribution.

The size distribution of rain droplets can be modeled using a log-normal distribution.

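The relationship Y = ln(X) can be checked numerically. The sketch below draws log-normal samples with `random.lognormvariate` from the standard library and confirms that their logarithms behave like normal samples; the parameters mu = 0 and sigma = 0.5 and the seed are assumptions made for illustration.

```python
import math
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
mu, sigma = 0.0, 0.5     # parameters of the underlying normal distribution

# X is log-normal: always positive and right-skewed.
x_samples = [rng.lognormvariate(mu, sigma) for _ in range(10_000)]

# Y = ln(X) should be approximately normal with mean mu.
y_samples = [math.log(x) for x in x_samples]
y_mean = sum(y_samples) / len(y_samples)

print(round(min(x_samples), 4), round(y_mean, 4))
```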

Exponential Distribution

The exponential distribution is a continuous probability distribution that describes the time between events (success, failure, arrival, etc.) in a Poisson process.

The example below shows how to draw random samples from an exponential distribution with the numpy.random.exponential() method, which returns the samples as a NumPy array.

[Image: Python code for the exponential distribution (not available)]
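A minimal sketch of such a call follows. The `scale` value (the mean time between events), the sample size, and the seed are hypothetical choices, not values taken from the original listing.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

# scale is the mean time between events (the inverse of the event rate).
samples = np.random.exponential(scale=2.0, size=10_000)

print(type(samples).__name__, samples.shape)
print(samples.mean())  # should be close to scale
```

All samples are non-negative, since they represent waiting times.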

Authors: T. C. Okenna

Skewness is a measure used in statistics to understand a data set's symmetry or lack thereof. It helps determine whether the data is more spread out on one side of the mean than the other. A data set can be skewed either to the left (negative skew) or the right (positive skew), or it can be symmetrical (zero skew).

The measure of skewness tells us the direction and the extent of skewness. In a symmetrical distribution with a single peak, the mean, median, and mode are identical. The more the mean moves away from the mode, the larger the asymmetry or skewness.

Before studying skewness, let's review the mean, median, and mode.

Table of Content

  • Skewness Formula
  • Type of Skewness
    • Positive Skewness
    • Negative Skewness
    • Zero Skewness
  • Methods to Measure Skewness
  • Karl Pearson's Coefficient of Skewness
  • Solved Examples on Skewness Formula

Mean

Mean is the average of the numbers in the data distribution. It is calculated by adding up all the values in the dataset and dividing the sum by the number of values in the dataset.

Mean = (Sum of all values in the dataset) / (Total number of values)

Example: Find the mean of a dataset of exam scores: 70, 80, 85, 90, and 95.

Solution:

Mean = (70 + 80 + 85 + 90 + 95) / 5 = 84

So the mean of this dataset is 84.
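The same calculation can be written in a few lines of Python. The variable names are chosen here for illustration.

```python
scores = [70, 80, 85, 90, 95]

# Mean = sum of all values / number of values
mean = sum(scores) / len(scores)

print(mean)  # 84.0
```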

Median

  • When all the data are arranged in order (ascending or descending), the value that falls in the middle is called the median.
  • In other words, the median is the middle value of a dataset when the values are arranged in order from smallest to largest.

Examples with an Odd Number of Values in the Dataset

Example 1: Find the median of a dataset of exam scores: 70, 85, 80, 95, 90

Solution:

First, arrange all the numbers in order from smallest to largest: 70, 80, 85, 90, 95.
The middle value is 85, so the median is 85.

Example 2: Find the median of a dataset: 5, 10, 15, 20, 25. 

Solution:

First, arrange all the numbers in order from smallest to largest: 5, 10, 15, 20, 25.
The middle value is 15, so the median is 15.

If there are an even number of values in the dataset, the median is calculated by taking the average of the two middle values.

Examples with an Even Number of Values in the Dataset

Example 1: Find the median of a dataset of exam scores: 70, 80, 85, 90.

Solution:

The median is calculated as (80 + 85) / 2 = 82.5
So the median of this dataset is 82.5.

Example 2: Find the median of a dataset: 2, 4, 6, 8, 10, 12.

Solution:

First, find the two middle numbers. The dataset is already in order, so the two middle values are 6 and 8.
Median = (6 + 8) / 2 = 7
So the median of this dataset is 7.
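The odd and even cases above can be combined into one small function. This is a sketch with a hypothetical function name; Python's built-in `statistics.median` does the same job.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([70, 85, 80, 95, 90]))    # 85
print(median([70, 80, 85, 90]))        # 82.5
print(median([2, 4, 6, 8, 10, 12]))    # 7.0
```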

Mode

The value that occurs most frequently in the data is called the mode of the data.

Example 1: We have a data set representing the number of pets owned by 10 people: 3, 1, 0, 2, 1, 1, 4, 2, 2, 1. Find the mode.

Solution:

The value that appears most frequently in the data set is 1, which appears four times. Therefore, the mode of this data set is 1.
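Counting the frequencies in code confirms the result. The sketch uses `collections.Counter` from the standard library; the variable names are illustrative.

```python
from collections import Counter

pets = [3, 1, 0, 2, 1, 1, 4, 2, 2, 1]

# Count how often each value occurs, then take the most common one.
counts = Counter(pets)
mode, frequency = counts.most_common(1)[0]

print(mode, frequency)  # 1 4
```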

Skewness Formula

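The moment-based skewness formula, as applied in Example 3 of the solved examples, is:

Skewness = ∑(yi - ymean)³ / ((n - 1) x (sd)³)

where yi is each value in the dataset, ymean is the mean, n is the number of values, and sd is the standard deviation.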

Type of Skewness

The types of skewness are:

  • Positive Skewness
  • Negative Skewness
  • Zero Skewness

Positive Skewness

  • Positive Skewness means the tail on the right side of the distribution is longer. The mean and median will be greater than the mode. 
  • The condition for positive skewness is Mean > Median > Mode

[Figure: Graph of Positive Skew]

Consider an income distribution in which a few people earn very high incomes while the majority earn lower incomes. Such a distribution is often positively skewed. Analyzing skewed data can provide valuable insights into the underlying causes and potential solutions or interventions.

Negative Skewness

  • Negative Skewness means the tail of the left side of the distribution is longer than the tail on the right side. The mean and median will be less than the mode.
  • The condition for negative skewness is Mode > Median > Mean

[Figure: Graph of Negative Skew]

Consider a match in which most of the players of a particular team scored above 50 runs and only a few scored below 10. Such data is generally represented by a negatively skewed distribution, which is helpful for analyzing the team's performance.

Zero Skewness

  • It is also known as a "symmetric distribution". The data are evenly distributed around the mean, with no long tail on either side of the distribution.
  • The condition for zero skewness is Mean = Mode = Median

[Figure: Graph of Zero Skew]

Methods to Measure Skewness

Skewness can be measured using Karl Pearson's Coefficient of Skewness.

Karl Pearson's Coefficient of Skewness 

Karl Pearson's coefficient of skewness has two common forms:

Skewness = (Mean - Mode) / S.D.
Skewness = 3(Mean - Median) / S.D.

The second form, based on the median, is the one applied in the solved examples.

Karl Pearson's Conditions

  • If Mean = Mode = Median, the coefficient of skewness is zero, indicating a symmetrical distribution.
  • If Mean > Mode, the coefficient of skewness is positive.
  • If Mean < Mode, the coefficient of skewness is negative.

Karl Pearson's coefficient of skewness has a positive sign for positively skewed data and a negative sign for negatively skewed data.

Solved Examples on Skewness Formula

Example 1: Find the skewness of the given data (2, 4, 6, 6).

Solution:

Mean of Data = (2 + 4 + 6 + 6) / 4
= 18 / 4
= 4.5

Number of terms (n) = 4 (even)

Median of Data = [(n/2)th term + (n/2 + 1)th term] / 2
= [(4/2)th term + (4/2 + 1)th term] / 2
= [2nd term + 3rd term] / 2
= [4 + 6] / 2
= 10 / 2
= 5

Mode of Data = Highest Frequency term = 6 (frequency 2)

S.D. = √[((4.5 - 2)² + (4.5 - 4)² + (4.5 - 6)² + (4.5 - 6)²) / 4]
= √[(6.25 + 0.25 + 2.25 + 2.25) / 4]
= √(11 / 4)
= √2.75
= 1.658

Skewness = 3(Mean - Median)/S.D.

By Applying Skewness Formula,
Skewness = 3(4.5 - 5)/1.658
= 3(-0.5)/1.658
= -1.5/1.658
Skewness = -0.905

So, the skewness of these data is negative.

Example 2: A boy collects the following amounts of rupees over a week: (25, 28, 26, 30, 40, 50, 40). Find the skewness of the data using the skewness formula.

Solution:

Mean of Data = (25 + 28 + 26 + 30 + 40 + 50 + 40) / 7
= 239 / 7
= 34.14

Number of terms (n) = 7 (odd)

Data arranged in ascending order = 25, 26, 28, 30, 40, 40, 50
The median of the data is 30.

Mode of Data = Highest Frequency term = 40 (frequency 2)

S.D. = √[(1/(7 - 1)) x ((25 - 34.1429)² + (28 - 34.1429)² + (26 - 34.1429)² + (30 - 34.1429)² + (40 - 34.1429)² + (50 - 34.1429)² + (40 - 34.1429)²)]
= √[(1/6) x ((-9.1429)² + (-6.1429)² + (-8.1429)² + (-4.1429)² + (5.8571)² + (15.8571)² + (5.8571)²)]
= √[(1/6) x (83.5926 + 37.7352 + 66.3068 + 17.1636 + 34.3056 + 251.4476 + 34.3056)]
= √(524.8571 / 6)
= √87.4762
= 9.3529

Skewness = 3(Mean - Median)/S.D.

By Applying Skewness Formula,
Skewness = 3(34.1429 - 30)/9.3529
= 12.4287/9.3529
Skewness = 1.33

So, the skewness of these data is positive.
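Examples 1 and 2 can be reproduced with a short function. This is a sketch with a hypothetical function name and a `sample` flag introduced here: Example 1 divides by n when computing the standard deviation, while Example 2 divides by n - 1, so the flag switches between the two.

```python
import statistics

def pearson_skewness(values, sample=True):
    """Pearson's median-based coefficient: 3 * (mean - median) / SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    # sample=True divides by n - 1 (as in Example 2); False divides by n (as in Example 1).
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return 3 * (mean - median) / sd

ex1 = pearson_skewness([2, 4, 6, 6], sample=False)
ex2 = pearson_skewness([25, 28, 26, 30, 40, 50, 40], sample=True)

print(round(ex1, 3))  # -0.905
print(round(ex2, 3))  # 1.329
```

The sign of each result matches the conclusions above: negative for Example 1 and positive for Example 2.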

Example 3: The numbers of students in the classes of a school are as follows. Find the skewness.

Class Name    Number of students
1st           35
2nd           32
3rd           38
4th           39
5th           42

Solution:

Mean of Data = (35 + 32 + 38 + 39 + 42) / 5
= 186 / 5
= 37.2

Number of terms (n) = 5 (odd)

Arrange Data in ascending order = 32,35,38,39,42 

Median of Data  = 38

S.D. = √[(1/(5 - 1)) x ((35 - 37.2)² + (32 - 37.2)² + (38 - 37.2)² + (39 - 37.2)² + (42 - 37.2)²)]
= √[(1/4) x ((-2.2)² + (-5.2)² + (0.8)² + (1.8)² + (4.8)²)]
= √[0.25 x (4.84 + 27.04 + 0.64 + 3.24 + 23.04)]
= √(0.25 x 58.8)
= √14.7
= 3.8341

Skewness = ∑(yi - ymean)³ / ((n - 1) x (sd)³)

Skewness = ((35 - 37.2)³ + (32 - 37.2)³ + (38 - 37.2)³ + (39 - 37.2)³ + (42 - 37.2)³) / ((5 - 1) x (3.8341)³)
Skewness = ((-2.2)³ + (-5.2)³ + (0.8)³ + (1.8)³ + (4.8)³) / (4 x 56.3625)
Skewness = ((-10.648) + (-140.608) + (0.512) + (5.832) + (110.592)) / 225.45
Skewness = -34.32 / 225.45
Skewness = -0.1522

So, the skewness of these data is negative.
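The calculation in Example 3 can be verified in code. The sketch below follows the formula as used in this example (sum of cubed deviations divided by (n - 1) times the cube of the sample standard deviation); the function name is hypothetical, and statistical libraries may use slightly different definitions of skewness.

```python
import statistics

def moment_skewness(values):
    """Sum of cubed deviations divided by (n - 1) * sd**3, as in Example 3."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample SD, divides by n - 1
    cubed = sum((y - mean) ** 3 for y in values)
    return cubed / ((n - 1) * sd ** 3)

students = [35, 32, 38, 39, 42]
print(round(moment_skewness(students), 4))  # -0.1522
```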

Authors: T. C. Okenna, GeeksforGeeks

Kurtosis is a statistical measure used to describe the distribution of observed data around the mean. It is used to identify the tails and sharpness of a distribution. The kurtosis of a probability distribution for a random variable x is defined as the ratio of the fourth central moment (μ4) to the fourth power of the standard deviation (σ⁴).

In this article, we will explore how to calculate kurtosis in statistics.

Table of Contents

  • What is Kurtosis in Statistics?
  • Types of Kurtosis
  • How to Calculate Kurtosis?

What is Kurtosis in Statistics?

Kurtosis is a measure of the “tailedness” of the probability distribution of a real-valued random variable. In other words, kurtosis identifies whether the tails of a given distribution contain extreme values.

Types of Kurtosis

There are three types of kurtosis: mesokurtic, leptokurtic, and platykurtic. Mesokurtic distributions have a kurtosis value similar to that of the normal distribution, that is, an excess kurtosis close to zero. Leptokurtic distributions have positive excess kurtosis and platykurtic distributions have negative excess kurtosis.

How to Calculate Kurtosis?

Kurtosis is based on the fourth central moment of the probability distribution of a random variable. It is calculated by dividing the fourth central moment by the fourth power of the standard deviation, which is the same as dividing it by the square of the variance.


To calculate kurtosis in statistics, you can follow these steps:

  1. Compute the Mean (μ): Calculate the arithmetic mean of the dataset.
  2. Compute the Variance (σ²): Calculate the variance of the dataset, which is the average of the squared differences from the mean.
  3. Compute the Standard Deviation (σ): Take the square root of the variance to find the standard deviation.
  4. Compute the Fourth Moment (μ4): Calculate the fourth moment of the dataset, which is the average of the fourth power of the differences from the mean.
  5. Compute Kurtosis: The formula for calculating kurtosis is:
    Kurtosis = μ4/σ⁴

    Sometimes, you might also see a version of kurtosis that subtracts 3 from this calculation. This is called excess kurtosis, and it subtracts 3 because the kurtosis of a normal distribution is 3.
    So the formula becomes:
    Excess Kurtosis = (μ4/σ⁴) − 3
    This version is often used because it allows for easier comparison with the normal distribution, which has an excess kurtosis of 0.

Kurtosis can be classified as:

  • Leptokurtic: Distributions with heavy tails and positive excess kurtosis.
  • Mesokurtic: When the excess kurtosis is zero or close to zero.
  • Platykurtic: When the excess kurtosis is negative.
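The calculation steps and the classification above can be sketched in plain Python. The dataset, the function names, and the tolerance used to decide what counts as "close to zero" are all assumptions made for illustration.

```python
def kurtosis(values):
    """Fourth central moment divided by the square of the variance."""
    n = len(values)
    mean = sum(values) / n                                      # step 1
    variance = sum((x - mean) ** 2 for x in values) / n         # step 2
    fourth_moment = sum((x - mean) ** 4 for x in values) / n    # step 4
    # Step 3 (standard deviation) is implicit: sd**4 equals variance**2.
    return fourth_moment / variance ** 2                        # step 5

def classify(excess, tol=0.1):
    """Label by excess kurtosis; tol is a hypothetical 'close to zero' band."""
    if excess > tol:
        return "leptokurtic"
    if excess < -tol:
        return "platykurtic"
    return "mesokurtic"

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical dataset
k = kurtosis(data)
excess = k - 3

print(k)                 # 2.78125
print(excess)            # -0.21875
print(classify(excess))  # platykurtic
```

The population form (dividing by n) is used here because the steps describe the variance and fourth moment as averages; sample-corrected versions give slightly different values.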

Conclusion - How to Calculate Kurtosis in Statistics

Kurtosis is a valuable tool in statistics that allows us to understand the shape of a distribution. By calculating kurtosis, we can identify whether a dataset has heavy or light tails, and whether it has more or fewer extreme values than the normal distribution.

Authors: T. C. Okenna, GeeksforGeeks