E-Learn Knowledge Base
Register for this course: Enrol Now
Data Analysis with Python
There are many programming languages available, but Python is widely used by statisticians, engineers, and scientists to perform data analytics.
Here are some of the reasons why data analytics using Python has become popular:
- Python is easy to learn and understand and has a simple syntax.
- The programming language is scalable and flexible.
- It has a vast collection of libraries for numerical computation and data manipulation.
- Python provides libraries for graphics and data visualization to build plots.
- It has a large, active community that can help answer many kinds of questions.
Python Libraries for Data Analytics
One of the main reasons Python has become such a popular choice for data analytics is the range of libraries it provides.
NumPy: NumPy supports n-dimensional arrays and provides numerical computing tools. It is useful for linear algebra and Fourier transforms.
Pandas: Pandas provides functions to handle missing data, perform mathematical operations, and manipulate data.
Matplotlib: The Matplotlib library is commonly used to plot data points and to create static and interactive visualizations of the data.
SciPy: The SciPy library is used for scientific computing. It contains modules for optimization, linear algebra, integration, interpolation, special functions, and signal and image processing.
Scikit-Learn: The Scikit-Learn library provides tools to build regression, classification, and clustering models.
Now, let’s look at how to perform data analytics using Python and its libraries.
Data Analytics Using the Python Library, NumPy
Let’s see how you can perform numerical analysis and data manipulation using the NumPy library.
1. Create a NumPy array.
2. Access and manipulate elements in the array.
3. Create a 2-dimensional array and check the shape of the array.
4. Access elements from the 2D array using index positions.
5. Create an array of type string.
6. Use the arange() and linspace() functions to create evenly spaced values in a specified interval.
7. Create an array of random values between 0 and 1 in a given shape.
8. Create an array of constant values in a given shape.
9. Repeat the elements of an array a specified number of times using the repeat() and tile() functions (repeat() repeats each element; tile() repeats the whole array).
10. Create an identity matrix using the eye() and identity() functions.
11. Create a 5x5 2D array of random numbers between 0 and 1.
12. Sum an array down each column (axis=0).
13. Sum an array across each row (axis=1).
14. Calculate the mean, median, standard deviation, and variance.
15. Sort the elements within each row using the sort() function.
16. Append elements to an array using the append() function.
17. Delete multiple elements from an array.
18. Concatenate two arrays.
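The code for these steps is not included in this text. The sketch below walks through all 18 steps with standard NumPy functions; the array values are arbitrary examples chosen for illustration (not the original article's), and np.random.rand returns different numbers on every run unless you seed the generator.

```python
import numpy as np

# 1-2. Create a 1-D array, then access and modify its elements
a = np.array([10, 20, 30, 40, 50])
first, last = a[0], a[-1]           # indexing from the front and the back
a[1] = 25                           # in-place update
middle = a[1:4]                     # slicing -> [25 30 40]

# 3-4. Create a 2-D array, check its shape, and index it
m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.shape)                      # (2, 3)
print(m[1, 2])                      # row 1, column 2 -> 6

# 5. Array of strings
names = np.array(["apple", "banana", "cherry"])

# 6. Evenly spaced values
steps = np.arange(0, 10, 2)         # fixed step size; the stop value is excluded
points = np.linspace(0, 1, 5)       # fixed number of points; the stop value is included

# 7. Random values in [0, 1) with a given shape
rand = np.random.rand(2, 3)

# 8. Constant values with a given shape
sevens = np.full((2, 3), 7)

# 9. repeat() repeats each element; tile() repeats the whole array
rep = np.repeat([1, 2, 3], 2)       # [1 1 2 2 3 3]
til = np.tile([1, 2, 3], 2)         # [1 2 3 1 2 3]

# 10. Identity matrices (both give the same 3x3 result)
i3 = np.eye(3)
i3b = np.identity(3)

# 11. 5x5 array of random numbers in [0, 1)
r = np.random.rand(5, 5)

# 12-13. Sum down each column (axis=0) and across each row (axis=1)
col_sums = m.sum(axis=0)            # [5 7 9]
row_sums = m.sum(axis=1)            # [ 6 15]

# 14. Summary statistics (np.std and np.var use the population formulas by default)
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
stats = (np.mean(data), np.median(data), np.std(data), np.var(data))

# 15. Sort within each row (axis=-1 is the default)
sorted_rows = np.sort(np.array([[3, 1, 2], [9, 7, 8]]))

# 16. append() returns a new array; it does not modify `a` in place
appended = np.append(a, [60, 70])

# 17. Delete several elements by index
deleted = np.delete(a, [0, 2])

# 18. Concatenate two arrays
joined = np.concatenate([np.array([1, 2]), np.array([3, 4])])
```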
Data Analytics Using Python Libraries, Pandas and Matplotlib
We’ll use a car.csv dataset and perform exploratory data analysis using Pandas and Matplotlib library functions to manipulate and visualize the data and find insights.
1. Import the libraries.
2. Load the dataset using pandas read_csv() function.
3. Display the head of the dataset using the head() function.
4. Display the bottom 5 rows from the dataset using the tail() function.
5. Print summary statistics of the dataset using the describe() function.
6. Plot a histogram for each of the variables.
7. Draw a box plot to visualize the relationship between vehicle size and engine HP.
8. Build a pair plot using the seaborn library.
9. Drop irrelevant columns from the dataset using the drop() function.
10. Use the rename() function to rename the columns.
11. Print the total number of duplicate rows.
12. Remove the duplicate rows using the drop_duplicates() function.
13. Drop the missing values from the dataset.
14. Plot a bar chart of the number of cars per brand.
15. Draw a correlation plot between the variables.
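The original code and the car.csv file are not part of this text, so the following is only a sketch of the steps above. The column names ("Make", "Engine HP", "Vehicle Size", and those in DROP_COLS and RENAME) are assumptions about what car.csv contains, and the function names are hypothetical; adjust both to your data. Seaborn is imported only inside the pair-plot step, and the correlation heatmap is drawn with plain Matplotlib rather than whatever the original used.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed column names: change these to match your copy of car.csv.
DROP_COLS = ["Engine Fuel Type", "Market Category", "Vehicle Style",
             "Popularity", "Number of Doors"]
RENAME = {"Engine HP": "HP", "Engine Cylinders": "Cylinders",
          "Transmission Type": "Transmission", "Driven_Wheels": "Drive Mode",
          "highway MPG": "MPG-H", "city mpg": "MPG-C", "MSRP": "Price"}


def overview_plots(df):
    """Steps 6-8 on the raw data: histograms, a box plot, and a pair plot."""
    df.hist(figsize=(12, 10))                            # 6. one histogram per numeric column
    df.boxplot(column="Engine HP", by="Vehicle Size")    # 7. engine HP grouped by vehicle size
    import seaborn as sns                                # 8. seaborn is only needed here
    sns.pairplot(df.select_dtypes("number"))


def clean(df):
    """Steps 9-13: drop, rename, de-duplicate, and remove missing values."""
    df = df.drop(columns=[c for c in DROP_COLS if c in df.columns])  # 9
    df = df.rename(columns=RENAME)                                   # 10
    print("Duplicate rows:", df.duplicated().sum())                  # 11
    df = df.drop_duplicates()                                        # 12
    return df.dropna()                                               # 13


def insight_plots(df):
    """Steps 14-15: cars per brand and a correlation heatmap."""
    plt.figure()
    df["Make"].value_counts().plot(kind="bar")           # 14. number of cars per brand
    corr = df.select_dtypes("number").corr()             # 15. pairwise correlations
    plt.figure()
    plt.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
    plt.colorbar()
    plt.xticks(range(len(corr)), corr.columns, rotation=90)
    plt.yticks(range(len(corr)), corr.columns)


def main(path="car.csv"):
    df = pd.read_csv(path)    # 2. load the dataset
    print(df.head())          # 3. first 5 rows
    print(df.tail())          # 4. last 5 rows
    print(df.describe())      # 5. summary statistics
    overview_plots(df)
    df = clean(df)
    insight_plots(df)
    plt.show()
```

To run it, call main() from the directory that contains car.csv. Cleaning is kept in its own function, separate from plotting, so it can be checked on a small DataFrame without opening any windows.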
Conclusion
Data is being generated rapidly in various formats, and companies rely on data analytics to derive valuable information and hidden insights from it. In this ‘Data analytics using Python’ article, you learned what data analytics is and its various applications. You also looked at the different types of data analytics and the steps in the process. Finally, you performed data analytics using Python’s NumPy, Pandas, and Matplotlib libraries.
Authors: Avijeet Biswal, T. C. Okenna
Vectors for Data Science in Python
With the democratization of AI/ML and open-source libraries such as Keras and scikit-learn, anyone with basic Python knowledge can set up a working ML classifier in under five minutes. While this is more than enough to get started, if you want to understand how different ML algorithms work, or to apply the latest SOTA (state-of-the-art) papers to your particular domain, a lack of mathematical expertise quickly becomes a bottleneck, as I have experienced firsthand.
In this series of articles, I will introduce fundamental mathematical concepts one at a time for a non-math audience and show their practical use in the ML/AI domain.
We start off with the simplest of the lot, vectors.
Vectors are quantities that have both magnitude and direction. A few relatable real-world examples of vectors are force, velocity, and displacement.
To move a shopping cart, you need to push it (apply a force) in the direction you want it to move. The force you apply to the cart can be described fully by two values: the intensity (magnitude) of the push and the direction in which you pushed. Any quantity that requires both a magnitude and a direction to be described completely is called a vector.
Vectors are usually written as bold lowercase letters such as v and w. Since boldface is difficult to write with pen and paper, handwritten vectors are instead shown as lowercase letters with an arrow on top. In this article, we will stick with the boldface representation.
Graphically, vectors are represented as arrows whose length signifies the magnitude (intensity) of the vector and whose angle from a frame of reference (in this case, the horizontal) represents its direction, as shown below.

Note that a vector does not have to start at the origin (0, 0); it can start from any point. For example, in the diagram above, u = w and v = a, since each pair has the same magnitude and the same direction.
There are many ways to represent vectors mathematically. As data scientists, we are interested in representing them as tuples of numbers. Thus vector u can be represented as (2, 2) and vector v as (4, 1). Since w = u and a = v, w is also (2, 2) and a is also (4, 1).
Although it is easy for us to visualize vectors in two and three dimensions, the concept of a vector is not limited to them. It generalizes to any number of dimensions, and this is what makes vectors so useful in machine learning.
For example, c = (2, 1, 0) represents a vector in 3-dimensional space, while d = (2, 1, 3, 4) represents a vector in 4-dimensional space. Although we humans cannot visualize more than three dimensions, representing vectors mathematically lets us perform operations in higher-dimensional vector spaces.
By now, you may be bored and wondering why, as an ML enthusiast, you need to learn elementary physics and vectors. It turns out that vectors have many applications in machine learning, from building recommendation engines to representing words numerically for natural language processing, and they form the basis of deep learning models for NLP.
Let’s start with a code example of how vectors are implemented in NumPy and TensorFlow.
Note: The full code is available as a gist in the last section. Relevant excerpts are inserted as pictures for illustration purposes.
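The pictures with the original code are not reproduced in this text. As a minimal stand-in covering only the NumPy side (the TensorFlow part is omitted), a vector can be stored as a 1-D array: its shape gives the number of dimensions, and np.linalg.norm gives its magnitude. The vectors are the ones used in this article.

```python
import numpy as np

u = np.array([2, 2])          # vector u from the diagram
v = np.array([4, 1])          # vector v from the diagram
c = np.array([2, 1, 0])       # a vector in 3-dimensional space
d = np.array([2, 1, 3, 4])    # a vector in 4-dimensional space

for name, vec in [("u", u), ("v", v), ("c", c), ("d", d)]:
    # .shape holds the number of components (dimensions);
    # np.linalg.norm returns the magnitude (Euclidean length)
    print(name, vec.shape, np.linalg.norm(vec))
```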

Below you can see a simple word2vec example that shows a practical use of vectors in natural language processing.
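The original word2vec picture is not reproduced here. As a hedged substitute, this is a from-scratch skip-gram word2vec with a full softmax written in NumPy (likely not the library or variant the original used). The toy corpus and every function name are made up for illustration. Each word ends up as an 8-dimensional vector learned from the words around it.

```python
import numpy as np

def train_skipgram(sentences, dim=8, window=1, lr=0.05, epochs=300, seed=0):
    """Train a tiny skip-gram word2vec model with a full softmax.

    Returns (vocab, W) where W[vocab[word]] is the learned vector for word.
    """
    words = sorted({w for s in sentences for w in s})
    vocab = {w: i for i, w in enumerate(words)}
    V = len(vocab)
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(V, dim))       # input (word) vectors
    W_ctx = rng.normal(scale=0.1, size=(dim, V))   # output (context) weights

    # (center, context) index pairs from a sliding window
    pairs = []
    for s in sentences:
        ids = [vocab[w] for w in s]
        for i, center in enumerate(ids):
            for j in range(max(0, i - window), min(len(ids), i + window + 1)):
                if j != i:
                    pairs.append((center, ids[j]))

    for _ in range(epochs):
        for center, context in pairs:
            h = W[center].copy()                   # vector of the center word
            scores = h @ W_ctx                     # one score per vocabulary word
            grad = np.exp(scores - scores.max())
            grad /= grad.sum()                     # softmax probabilities
            grad[context] -= 1.0                   # d(cross-entropy)/d(scores)
            grad_h = W_ctx @ grad                  # d(loss)/d(h), before W_ctx changes
            W_ctx -= lr * np.outer(h, grad)
            W[center] -= lr * grad_h
    return vocab, W

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy corpus invented for illustration only
corpus = [
    "the king rules the kingdom".split(),
    "the queen rules the kingdom".split(),
    "the dog chases the cat".split(),
    "the cat chases the mouse".split(),
]
vocab, W = train_skipgram(corpus)
print(W[vocab["king"]].round(2))                   # the 8-dimensional vector for "king"
print(cosine(W[vocab["king"]], W[vocab["queen"]]))
```

Words that share contexts (here "king" and "queen") are pushed toward similar vectors, but on a corpus this small the similarity values are noisy and depend on the seed.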

As you can see, representing words as higher-dimensional vectors is one of the main applications of vectors in natural language processing. This type of embedding captures the contexts in which a word is used, so words that appear in similar contexts end up with similar vectors.
In the next section, we will go through basic operations such as addition and subtraction and how they apply to vectors.
Vector Addition
Now that we have defined what a vector is, let’s find out how to perform basic arithmetic operations on them.
Let’s take the same two vectors u and v and perform a vector addition on them.

To add two vectors u and v graphically, we move vector v so that its tail starts at the head of vector u, as shown above (lines DE and EF). The sum of the two vectors is the vector b that starts at the tail of u and ends at the head of v (line DF).
For better intuition, let’s take a real-world example: driving to a grocery store. On the way, you stop at a gas station to fill up. Let vector u represent how far the gas station (point E) is from your home (point D). If vector v represents the displacement from the gas station to the grocery store, then vector b, drawn from the tail of u to the head of v, represents the sum u + v. It represents how far the grocery store (point F) is from your home (the starting point D).
Reading from the graph, b can be represented mathematically as (6, 3).
There are other graphical methods of vector addition, such as the parallelogram method, which you can explore on your own.
However, it is not practical to plot a graph every time we want to do vector arithmetic, especially with higher-dimensional vectors. Fortunately, the mathematical representation of vectors provides an easy way to do vector addition.
Since each vector is a tuple of numbers, let’s see what we get if we add the corresponding numbers of each vector.
In the example above, u = (2, 2) and v = (4, 1), so b = (2 + 4, 2 + 1) = (6, 3), which is the same as the solution obtained graphically.
Thus, vector addition can be done by simply adding the corresponding elements of each vector, and, as you might have already inferred, only vectors with the same number of dimensions can be added together. Let’s see a code implementation.
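The original code picture is not included here, so this is a minimal NumPy sketch of element-wise addition with the u and v from the example. It also shows NumPy refusing to add vectors of different dimensions. (NumPy's broadcasting does let you add a single number to every element of a vector, but that is a different operation from vector addition.)

```python
import numpy as np

u = np.array([2, 2])
v = np.array([4, 1])
b = u + v                  # adds corresponding elements
print(b)                   # [6 3]

# Vectors with different numbers of dimensions cannot be added
try:
    u + np.array([1, 2, 3])
except ValueError as err:
    print("Cannot add:", err)
```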

Vector Subtraction
Before we move on to vector subtraction, let’s take a quick look at scalar multiplication, another useful operation on vectors.
A scalar is a quantity with only magnitude and no direction; for example, any real number is a scalar. Real-world examples of scalar quantities include mass and a person’s height.
Let’s see what happens when we multiply a vector with a scalar quantity.
u = (2, 2), v = (4, 1)
If we want to multiply u by a scalar C = 3, one intuitive way to do it is to multiply each number in vector u = (2, 2) by 3. Let’s see how that looks:
d = C x u = 3 x u = (3 x 2, 3 x 2) = (6, 6)
Let’s plot the vectors on a graph and see.

As you can see, multiplying vector u by a positive scalar results in a new vector d in the same direction, but with its magnitude scaled by a factor of C = 3.
Now let’s multiply vector v by a negative value, C = -1:
e = C x v = -1 x v = (-1 x 4, -1 x 1) = (-4, -1).
Let’s plot it and see what that looks like.

As you can see, multiplying vector v by -1 results in a vector with the same magnitude but the opposite direction, which can be written as
e = -v, or e + v = 0 (the null vector)
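A minimal NumPy sketch of the two scalar multiplications above (not the original gist): scaling by 3 triples the length while keeping the direction, and scaling by -1 keeps the length while reversing the direction, so e + v gives the null vector.

```python
import numpy as np

u = np.array([2, 2])
v = np.array([4, 1])

d = 3 * u        # (6, 6): same direction as u, three times the length
e = -1 * v       # (-4, -1): same length as v, opposite direction

print(d, e)
print(np.linalg.norm(d) / np.linalg.norm(u))   # the scale factor C (3, up to rounding)
print(e + v)                                   # the null vector
```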
Graphically, vector subtraction can be treated as a special case of vector addition, where u - v = u + (-v).
Solving graphically:

w = u, a = -v, b = w + a = u + (-v) = u - v = (-2, 1)
As is evident from the graph, b = c: the vector subtraction u - v equals the vector c drawn from the head of v to the head of u (the displacement between the two heads).
Intuitively, this makes sense and is consistent with ordinary arithmetic:
7 - 5 = 2, because 2 is the quantity that, when added to 5, gives 7: 5 + 2 = 7.
Similarly, in the graph, c is the vector that, when added to v, gives the vector u: u = v + c.
Now let’s do this mathematically by subtracting the corresponding components of the two vectors: c = u - v = (2, 2) - (4, 1) = (2 - 4, 2 - 1) = (-2, 1).
The result is the same as the one obtained graphically. A code example is given below for reference.
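Since the original code picture is not included, here is a minimal NumPy sketch: subtracting component by component gives c = (-2, 1), which matches u + (-v), and adding c back to v recovers u.

```python
import numpy as np

u = np.array([2, 2])
v = np.array([4, 1])

c = u - v                             # component-wise: (2 - 4, 2 - 1) = (-2, 1)
print(c)
print(np.array_equal(c, u + (-v)))    # True: u - v equals u + (-v)
print(np.array_equal(v + c, u))       # True: c is what you add to v to get u
```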

Introduction to Vectors for Data Science
In data science, a vector describes the properties of a data point across different dimensions. The components of a data point together form a vector, and each component corresponds to one dimension.

In the image above, point P is a vector in 2-D space with components x1 and x2 (the x and y components).
(0,0) represents the origin.

In the image above, point Q is a vector in 3-D space with components x1, x2, and x3. A mosquito at one position in a square room is like a data point in 3-D space 😄.
Similarly, a vector can have N dimensions, but it is hard to plot an N-D vector on a 2-D surface. A vector in N-D space looks like this: V = [x1, x2, x3, …, xN]
Distance of a point from origin:
Let’s see how to calculate the distance of a point from the origin.

In the image above, we computed the distance from the origin of three points: point P in 2-D, point Q in 3-D, and point X in N-D. We can use the Pythagorean theorem to compute the distance of a point from the origin: for X = [x1, x2, …, xN], the distance is ||X|| = √(x1² + x2² + … + xN²).
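The points in the image are not recoverable, so this sketch applies the Pythagorean formula to arbitrary example points in 2-D, 3-D, and 4-D. The function name is illustrative; np.linalg.norm computes the same quantity.

```python
import math

def distance_from_origin(point):
    """Euclidean distance of an N-D point from the origin (Pythagorean theorem)."""
    return math.sqrt(sum(x * x for x in point))

print(distance_from_origin([3, 4]))        # 5.0 (2-D)
print(distance_from_origin([1, 2, 2]))     # 3.0 (3-D)
print(distance_from_origin([1, 1, 1, 1]))  # 2.0 (4-D)
```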
Distance between two points:
Let’s see how to calculate the distance between two points.

In the image above, we calculate the distance d between two points P and Q. Calculating the distance between two points is similar to calculating the distance of a point from the origin: for P = [p1, …, pN] and Q = [q1, …, qN], d = √((p1 - q1)² + … + (pN - qN)²). If we let point Q be the origin, its coordinates become (0, 0), and this formula reduces to the distance-from-origin formula we used earlier.
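A sketch of the two-point distance formula with illustrative example points (the image's points are not available). Setting q to the origin reproduces the distance-from-origin case, and the standard library's math.dist (Python 3.8+) gives the same result.

```python
import math

def distance(p, q):
    """Euclidean distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(distance([1, 2], [4, 6]))     # 5.0
print(distance([3, 4], [0, 0]))     # 5.0: with q at the origin, this is the distance from the origin
print(math.dist([1, 2], [4, 6]))    # 5.0: standard-library equivalent
```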

Types of Vector Representation:
There are two types of vector representation:
Row Vector:
A row vector has one row and n columns.

Column Vector:
A column vector has one column and n rows.
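A minimal NumPy sketch of the two representations. One caveat worth knowing: a plain 1-D NumPy array has no row or column orientation, so an explicit 2-D shape is needed to get a true row or column vector.

```python
import numpy as np

row = np.array([[1, 2, 3]])        # 1 row, 3 columns
col = np.array([[1], [2], [3]])    # 3 rows, 1 column
print(row.shape)   # (1, 3)
print(col.shape)   # (3, 1)

flat = np.array([1, 2, 3])         # plain 1-D array: neither a row nor a column
print(flat.shape)  # (3,)
```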

Addition of two vectors:

In the image above, we can see how to add two vectors.
Multiplication of two vectors:
There are two types of multiplication we can perform on vectors: the dot product and the cross product. The cross product is not used frequently in data science, so we will focus on the dot product.
Transpose:
Before computing the dot product of two vectors, transpose one of them if both have the same representation, for example if both are row vectors. Transposing a vector converts a row vector into a column vector and a column vector into a row vector.

In the image above, vector A^T is the transpose of vector A.
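In NumPy, the transpose is the .T attribute. This sketch uses an arbitrary example row vector A: transposing it once gives a column vector, and transposing that again gives back the row vector. Note that .T has no effect on a 1-D array.

```python
import numpy as np

A = np.array([[1, 2, 3]])     # row vector, shape (1, 3)
A_T = A.T                     # column vector, shape (3, 1)
print(A_T)
print(A_T.T.shape)            # (1, 3): transposing twice gives back the row vector

# Caveat: .T leaves a 1-D array unchanged
print(np.array([1, 2, 3]).T.shape)   # (3,)
```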
Dot Product:
We denote the dot product of two vectors by placing a dot between them, e.g. A . B.
Note: To compute the dot product of two vectors, the number of columns in the first vector must equal the number of rows in the second, which means both vectors must have the same dimension. As mentioned above, if both vectors have the same representation (e.g. both are row vectors), transpose one of them first. The result is the sum of the products of corresponding components: A . B = a1b1 + a2b2 + … + anbn.
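A NumPy sketch with arbitrary example vectors. Two row vectors of shape (1, 3) cannot be matrix-multiplied directly, so one is transposed to shape (3, 1) first. With plain 1-D arrays, np.dot skips the orientation question and just sums the products of matching components.

```python
import numpy as np

A = np.array([[1, 2, 3]])     # row vector, shape (1, 3)
B = np.array([[4, 5, 6]])     # row vector, shape (1, 3)

# Both are row vectors, so transpose B: (1, 3) @ (3, 1) -> (1, 1)
print(A @ B.T)                # [[32]] = 1*4 + 2*5 + 3*6

# With 1-D arrays, np.dot multiplies matching components and sums them
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.dot(a, b))                        # 32
print(sum(x * y for x, y in zip(a, b)))    # 32, written out explicitly
```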

Geometric Intuition Behind Dot Product:
Now that we have learned what the dot product is and how to compute it, let’s look at the geometric intuition behind it so that we can connect the dots.
A . B = ||A|| ||B|| cos θ
The equation above is another way to compute the dot product of two vectors A and B, and it can be rearranged to calculate the angle between them. Here ||A|| is the length of vector A, ||B|| is the length of vector B, and θ is the angle between A and B.
Now let’s see how to calculate the angle between two vectors.

In the image above, we can see how easily the angle between two vectors can be computed by rearranging the equation: cos θ = (A . B) / (||A|| ||B||). Now let’s look at one interesting case.
What if the dot product between two vectors is zero?

In the image above, we can see that if the dot product of two nonzero vectors is zero, then cos θ = 0, so θ = 90° and the vectors are perpendicular (orthogonal) to each other.
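A small check for perpendicularity based on this result. With floating-point data, an exact zero is rare, so the sketch compares against a tolerance; the default of 1e-9 is an arbitrary choice and should be scaled to the size of your vectors.

```python
import numpy as np

def is_perpendicular(a, b, tol=1e-9):
    """True if two nonzero vectors are orthogonal (their dot product is ~0)."""
    return bool(abs(np.dot(a, b)) <= tol)

print(is_perpendicular(np.array([2, 2]), np.array([-1, 1])))   # True: 2*(-1) + 2*1 = 0
print(is_perpendicular(np.array([2, 2]), np.array([4, 1])))    # False: 2*4 + 2*1 = 10
```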
Projection of Vector:
Projecting one vector onto another is like shining a light on one vector and looking at the shadow it casts on the other. Let’s see how to get the projection of one vector onto another.

In the image above, AB is the projection of vector A onto vector B. Imagine that light is coming from a bulb above vector A and the shadow of A falls on B. The length of this shadow is (A . B) / ||B||.
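A NumPy sketch of the vector projection (A . B / ||B||²) B, using arbitrary example vectors. The length of the result equals the scalar projection A . B / ||B||, and what remains of A after removing its shadow is perpendicular to B. B must be nonzero.

```python
import numpy as np

def projection(a, b):
    """Vector projection of a onto a nonzero vector b: (a.b / ||b||^2) * b."""
    return (np.dot(a, b) / np.dot(b, b)) * b

A = np.array([2.0, 2.0])
B = np.array([4.0, 0.0])
shadow = projection(A, B)
print(shadow)                    # [2. 0.]: the "shadow" of A on B
print(np.linalg.norm(shadow))    # 2.0, equal to A.B / ||B|| = 8 / 4
print(np.dot(A - shadow, B))     # 0.0: the leftover part of A is perpendicular to B
```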
Unit Vector:
A unit vector is denoted by a hat on top of the vector symbol (Â). It is obtained by dividing a vector by its own length: Â = A / ||A||.
A unit vector always has the same direction as the original vector.
The length of a unit vector is 1: ||Â|| = 1.
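A minimal NumPy sketch, using an arbitrary example vector: dividing a nonzero vector by its length gives a vector that points in the same direction and has length 1.

```python
import numpy as np

def unit_vector(a):
    """Return a / ||a||: same direction as a, length 1 (a must be nonzero)."""
    return a / np.linalg.norm(a)

A = np.array([3.0, 4.0])
A_hat = unit_vector(A)
print(A_hat)                     # [0.6 0.8]
print(np.linalg.norm(A_hat))     # approximately 1.0
```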
