Introduction to Data Science in Python: Numpy Module
NumPy is a popular Python module that provides fast, efficient operations on n-dimensional arrays of homogeneous data. It offers a wide variety of functions for scientific and numerical computing.
In this post, we will look at some common basic uses of the module.
First, we have to import the module. By convention, it is imported under the alias np:
import numpy as np
Now, let’s check out some basic operations.
Creating Numpy Arrays
From Python Lists
We can simply use the np.array() function to convert a Python list into an array.
>>> a = [1,2,3,4,5]
>>> np.array(a)
array([1, 2, 3, 4, 5])
>>> matrix = [[1,2,3],[4,5,6],[7,8,9]]
>>> np.array(matrix)
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9]])
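Once a list has been converted, the resulting array exposes a few attributes that describe it. The snippet below is a small sketch of those attributes; the variable names are chosen only for illustration.

```python
import numpy as np

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr = np.array(matrix)

ndim = arr.ndim      # number of dimensions
shape = arr.shape    # (rows, columns)
size = arr.size      # total number of elements
print(ndim, shape, size)  # 2 (3, 3) 9

# An array holds a single data type, so mixing ints and
# floats converts every element to float.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```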
Built-in Methods
np.arange() works much like Python's range(), but returns an array. Let’s look at some examples. The first example has a start and an end value (the end value is excluded), while the second adds a third parameter for the step.
>>> np.arange(0,10)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(9,21,2)
array([ 9, 11, 13, 15, 17, 19])
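To make the comparison with range() concrete, here is a minimal sketch that builds the same sequence both ways and then uses a fractional step, which range() does not accept. The variable names are illustrative only.

```python
import numpy as np

from_range = list(range(9, 21, 2))
from_arange = np.arange(9, 21, 2)

# Both produce the same values; arange() just returns an array.
same_values = from_arange.tolist() == from_range
print(same_values)  # True

# Unlike range(), arange() also accepts a non-integer step.
halves = np.arange(0, 2, 0.5)
print(halves.tolist())  # [0.0, 0.5, 1.0, 1.5]
```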
If we want to create n-dimensional arrays (matrices) filled with only zeros or ones, we can use np.zeros() and np.ones() respectively.
>>> np.zeros(5)
array([0., 0., 0., 0., 0.])
We can also define the size $m \times n$, where m is the number of rows and n is the number of columns.
>>> np.zeros((4,3))
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
>>> np.ones((2,3))
array([[1., 1., 1.],
       [1., 1., 1.]])
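The trailing dots in the output above show that both functions produce floats by default. As a small sketch, the dtype argument can be used to ask for integers instead, and np.zeros_like() copies the shape of an existing array. The variable names are illustrative.

```python
import numpy as np

default_zeros = np.zeros(5)
print(default_zeros.dtype)  # float64

# Request integers explicitly, e.g. for counters.
counts = np.zeros((2, 3), dtype=int)
print(counts.dtype.kind)  # i

# Build a zero array with the same shape as an existing one.
template = np.ones((2, 3))
blank = np.zeros_like(template)
print(blank.shape)  # (2, 3)
```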
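We can use np.linspace() to return evenly spaced numbers over a specified interval. The third argument is the number of values to generate, not the step, and both endpoints are included by default.
>>> np.linspace(0,10,3)
array([ 0., 5., 10.])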
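Because np.linspace() takes a count rather than a step, it can help to see the spacing it works out. The sketch below uses the retstep and endpoint arguments to show this; the interval and counts are arbitrary examples.

```python
import numpy as np

# Five values from 0 to 10, both ends included.
points, step = np.linspace(0, 10, 5, retstep=True)
print(points.tolist())  # [0.0, 2.5, 5.0, 7.5, 10.0]
print(step)             # 2.5

# With endpoint=False the stop value is left out,
# so the same count gives a smaller spacing.
open_end = np.linspace(0, 10, 5, endpoint=False)
print(open_end.tolist())  # [0.0, 2.0, 4.0, 6.0, 8.0]
```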
np.eye() can be used to create an identity matrix. The following example creates an identity matrix of size $4 \times 4$.
>>> np.eye(4)
array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
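As a quick check of what "identity" means here, the following sketch multiplies a small example matrix by np.eye() and confirms the result is unchanged. The matrix m is made up for illustration.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])
identity = np.eye(2)

# Matrix-multiplying by the identity leaves the values unchanged
# (the result is float because np.eye() returns floats).
product = m @ identity
unchanged = np.array_equal(product, m)
print(unchanged)  # True
```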
np.random.rand() can be used to create random samples from a uniform distribution over [0, 1).
>>> np.random.rand(5)
array([0.26107011, 0.39744656, 0.30421456, 0.33464264, 0.00952929])
>>> np.random.rand(3,3)
array([[0.06062392, 0.50044789, 0.16726052],
[0.22518242, 0.6768908 , 0.34212198],
[0.81288987, 0.71675748, 0.65559496]])
To create an array of random samples from the standard normal distribution, we use the np.random.randn() function.
>>> np.random.randn(5)
array([-0.4356041 , 3.21925889, -1.95362245, -1.11175927, -0.61676613])
An array of random integers can be created using the np.random.randint() function. The following example creates an array of $10$ random integers between $1$ and $99$; the upper bound $100$ is excluded.
>>> np.random.randint(1,100,10)
array([61, 73, 57, 15, 13, 61, 50, 14, 47, 44])
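The random functions above return different values on every run. If repeatable results are needed, one option is NumPy's Generator interface, created with np.random.default_rng() and a seed. This is a sketch of an alternative to the functions used in this post, not a replacement for them; the seed 42 is an arbitrary choice.

```python
import numpy as np

# The seed value is arbitrary and used only for illustration.
rng = np.random.default_rng(42)

uniform = rng.random(5)                    # uniform over [0, 1)
normal = rng.standard_normal(5)            # standard normal
integers = rng.integers(1, 100, size=10)   # 1 <= value < 100

# A second generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(42)
repeat = rng_again.random(5)
print(np.array_equal(uniform, repeat))  # True
```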
We can also use the np.random.choice() function. Here, we create an array of $10$ random numbers between $0$ and $19$. Passing replace=False ensures that no value is picked more than once.
>>> np.random.choice(range(20), 10, replace=False)
array([ 8, 0, 6, 10, 3, 7, 2, 4, 1, 16])
A similar sample without duplicates can be drawn with Python's random.sample() function (here from the range $0$ to $99$). It returns a plain list, so you need to convert it to a NumPy array if you want to perform NumPy operations on it.
>>> import random
>>> random.sample(range(100), 10)
[18, 99, 98, 92, 63, 30, 5, 20, 60, 47]
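The conversion mentioned above is a single call to np.array(). The sketch below draws a sample with random.sample(), converts it, and then applies an element-wise operation that a plain list would not support; the range and sample size are example values.

```python
import random
import numpy as np

picked = random.sample(range(20), 10)   # plain Python list, no duplicates
picked_arr = np.array(picked)           # now a NumPy array

# Element-wise arithmetic works on the array.
# On the list, picked * 2 would repeat the list instead.
doubled = picked_arr * 2
print(len(set(picked)) == len(picked))  # True
```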
max/min and argmax/argmin
In NumPy, array_variable.max() and array_variable.min() return the maximum and minimum values respectively. array_variable.argmax() and array_variable.argmin() return the indices of the maximum and minimum values respectively. By default, these indices refer to positions in the flattened array.
>>> test = np.random.randint(1,100,9)
>>> test = test.reshape(3,3)
>>> test
array([[34, 12, 22],
[69, 36, 27],
[26, 57, 53]])
>>> test.max()
69
>>> test.min()
12
>>> test.argmax()
3
>>> test.argmin()
1
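The indices 3 and 1 above count positions in the flattened array, which is why they are single numbers rather than row and column pairs. As a sketch, np.unravel_index() converts such a position back to (row, column), and the axis argument returns one maximum per column or per row. The array is rebuilt here with the same values so the snippet runs on its own.

```python
import numpy as np

test = np.array([[34, 12, 22],
                 [69, 36, 27],
                 [26, 57, 53]])

flat_index = test.argmax()                      # position in the flattened array
row, col = np.unravel_index(flat_index, test.shape)
print(flat_index, int(row), int(col))           # 3 1 0

col_max = test.max(axis=0)   # maximum of each column
row_max = test.max(axis=1)   # maximum of each row
print(col_max.tolist())      # [69, 57, 53]
print(row_max.tolist())      # [34, 69, 57]
```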
Numpy Operations
We can use element-wise addition, subtraction, multiplication, and division as follows. Square roots and natural logarithms can be calculated using the np.sqrt() and np.log() functions.
>>> print(test + test)
[[ 68 24 44]
[138 72 54]
[ 52 114 106]]
>>> print(test - test)
[[0 0 0]
[0 0 0]
[0 0 0]]
>>> print(test * test)
[[1156 144 484]
[4761 1296 729]
[ 676 3249 2809]]
>>> print(test / test)
[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]
>>> print(test ** 2)
[[1156 144 484]
[4761 1296 729]
[ 676 3249 2809]]
>>> np.sqrt(test)
array([[5.83095189, 3.46410162, 4.69041576],
[8.30662386, 6. , 5.19615242],
[5.09901951, 7.54983444, 7.28010989]])
>>>
>>> np.log(test)
array([[3.52636052, 2.48490665, 3.09104245],
[4.2341065 , 3.58351894, 3.29583687],
[3.25809654, 4.04305127, 3.97029191]])
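Note that * multiplies matching elements and is not matrix multiplication. The sketch below contrasts the two on a pair of small example matrices (a and b are made up for illustration) and also shows that a scalar is applied to every element.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

elementwise = a * b   # multiplies matching positions
matmul = a @ b        # matrix product (rows times columns)
scaled = a * 10       # the scalar is applied to every element

print(elementwise.tolist())  # [[5, 12], [21, 32]]
print(matmul.tolist())       # [[19, 22], [43, 50]]
print(scaled.tolist())       # [[10, 20], [30, 40]]
```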
Indexing in Numpy Array
Indexing works much like a regular Python list. In addition, a comma-separated index lets us select rows and columns in one step.
>>> test[2]
array([26, 57, 53])
>>> test[1:]
array([[69, 36, 27],
[26, 57, 53]])
>>> test[1][1]
36
>>>
>>> test[:2,1:]
array([[12, 22],
[36, 27]])
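Two details of the indexing above are worth a short sketch: test[1][1] and test[1, 1] reach the same element, and a slice is a view on the original data rather than a copy. The array is rebuilt here with the same values so the snippet runs on its own.

```python
import numpy as np

test = np.array([[34, 12, 22],
                 [69, 36, 27],
                 [26, 57, 53]])

chained = test[1][1]       # index the row, then the element
combined = test[1, 1]      # same element in one step
first_column = test[:, 0]  # every row, first column
print(chained, combined)       # 36 36
print(first_column.tolist())   # [34, 69, 26]

# A slice is a view on the same data, not a copy,
# so writing to it changes the original array.
block = test[:2, 1:]
block[0, 0] = 0
print(test[0, 1])  # 0
```

If an independent copy is needed, calling .copy() on the slice avoids changing the original.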
Mapping/Filtering
NumPy lets us apply a comparison to a whole array at once, which gives a direct way to map and filter. Let’s take a look at an example:
>>> test > 30
array([[ True, False, False],
[ True, True, False],
[False, True, True]])
In plain Python, the same mapping is normally done using map():
>>> a = [5, 6, 78, 34, 56]
>>> list(map(lambda x: x>15, a))
[False, False, True, True, True]
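The boolean array produced by a comparison such as test > 30 can itself be used as an index, which is how filtering is done. The following sketch selects, counts, and replaces values using such a mask; the array is rebuilt with the same values so the snippet runs on its own.

```python
import numpy as np

test = np.array([[34, 12, 22],
                 [69, 36, 27],
                 [26, 57, 53]])

mask = test > 30

selected = test[mask]         # keep only values where the mask is True
count = int(mask.sum())       # True counts as 1
zeroed = np.where(mask, test, 0)  # keep matches, set the rest to 0

print(selected.tolist())  # [34, 69, 36, 57, 53]
print(count)              # 5
print(zeroed.tolist())    # [[34, 0, 0], [69, 36, 0], [0, 57, 53]]
```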
I hope the methods covered above are enough for the basics. We will go through more advanced operations in a later post. Till then, cheers!