Introduction to NumPy

Jason Drummond
Jan 19, 2021

NumPy is one of the fundamental packages, alongside SciPy, scikit-learn, and pandas, that every data scientist should become familiar with. It provides an array data structure with several advantages over Python lists: faster reads and writes, a more compact memory footprint, and more convenient syntax for numerical work. As a refresher, Python lists are powerful in their own right: a list can hold values of any type, can mix different types at the same time, and lets you change, add, and remove elements. There is, however, one feature missing that data scientists need: the ability to carry out operations over entire collections of values. This is where NumPy arrays come in.

NumPy Array

As mentioned above, the NumPy array is similar to the Python list but has one additional feature: you can perform calculations over entire arrays. To work with NumPy arrays, we first need to install the NumPy package and import it. To install it, open a terminal window and run pip install numpy.

We then import the NumPy package into our environment with the statement import numpy as np.
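A quick way to confirm that the install and import worked is to print the installed version. This is only a sanity-check sketch; the version you see depends on your environment.

```python
import numpy as np  # "np" is the conventional alias for NumPy

# If this prints a version string, NumPy is installed and importable
print(np.__version__)
```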

You may be wondering why we imported numpy as np. This is simply a shorthand, so that we do not have to type out numpy every time we use the library. Now let’s move on to actually creating a NumPy array. We can do this with NumPy’s array function, which accepts a Python list as its input. Our list will represent the heights, in centimeters, of various baseball players, and we will convert it into an array by passing it to np.array().
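A minimal sketch of this step might look as follows. The height values and the variable names heights_cm and np_heights_cm are made up for illustration and are not real player data.

```python
import numpy as np

# Illustrative heights in centimeters (made-up values, not real player data)
heights_cm = [180, 215, 210, 210, 188]

# Convert the Python list into a NumPy array
np_heights_cm = np.array(heights_cm)

print(type(np_heights_cm))  # <class 'numpy.ndarray'>
print(np_heights_cm)        # [180 215 210 210 188]
```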

np.array() function

Congratulations! You have now created your first NumPy array. This is a relatively easy step, but it now allows you to perform calculations on the elements of the array, which we will look at next.

Calculations in NumPy

Now let’s say we want to find the BMI, or Body Mass Index, of these baseball players. We can do this using the standard formula: weight in kilograms divided by the square of height in meters.
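Written out as an equation, with weight measured in kilograms and height in meters, the standard BMI definition is:

```latex
\mathrm{BMI} = \frac{\text{weight (kg)}}{\bigl(\text{height (m)}\bigr)^{2}}
```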

Formula for BMI

If you haven’t noticed already, our NumPy array of player heights is in centimeters, so we need to convert every element to meters before we can use the formula above. Luckily, when we multiply a NumPy array by a constant, the multiplication is applied element-wise. A plain Python list behaves differently: multiplying it by a float raises an error, and multiplying it by an integer, say 2, produces a new list with the original elements repeated twice.
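The list behavior described above can be reproduced with a short sketch. The values are made up for illustration, and the raised flag is a hypothetical helper used only to record that the error occurred.

```python
# Illustrative heights in centimeters (made-up values)
heights_cm = [180, 215, 210]

# Multiplying a list by an integer repeats the list; no arithmetic happens
doubled = heights_cm * 2
print(doubled)  # [180, 215, 210, 180, 215, 210]

# Multiplying a list by a float is not allowed and raises a TypeError
try:
    heights_cm * 0.01
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)
```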

Multiplication of Python Lists example

We have now seen why NumPy can be so powerful: we can simply multiply our array of player heights by 0.01 to convert each element to meters, whereas with a plain Python list we would have to loop over the elements to perform the same operation.
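Since 1 cm is 0.01 m, the conversion factor is 0.01. The sketch below compares the loop-based list approach with the NumPy one, again using made-up heights and hypothetical variable names.

```python
import numpy as np

# Illustrative heights in centimeters (made-up values, not real player data)
heights_cm = [180, 215, 210, 210, 188]

# Plain Python: we have to visit each element ourselves
heights_m_list = [h * 0.01 for h in heights_cm]

# NumPy: the scalar multiplication is applied to every element at once
np_heights_m = np.array(heights_cm) * 0.01

print(heights_m_list)
print(np_heights_m)
```

Both approaches produce the same numbers; the NumPy version simply expresses the whole conversion as a single operation.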

Now, to finish calculating the players’ BMIs, we create a NumPy array of the players’ weights; luckily, these are already given to us in kilograms. We then simply plug the two arrays into the formula above to get an array with the BMI of every player.
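Putting the pieces together, the BMI calculation could be sketched like this. The heights and weights are invented for illustration (they are not real player data), and the variable names are assumptions.

```python
import numpy as np

# Illustrative values only (not real player data)
np_heights_m = np.array([180, 215, 210, 210, 188]) * 0.01   # cm -> m
np_weights_kg = np.array([81.6, 97.5, 95.2, 95.3, 85.0])    # already in kg

# BMI = weight (kg) / height (m) squared, computed element-wise
bmi = np_weights_kg / np_heights_m ** 2

print(bmi.round(1))
```

Because both arrays have the same length, the division pairs each weight with the height at the same position, so bmi holds one value per player.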

Conclusion

Congratulations! You have successfully used NumPy to create arrays and perform calculations on them. You have seen the advantages arrays have over regular Python lists and why a data scientist would want to learn NumPy. This is just an introduction; the package has many more fast and useful features for data scientists. I urge you to dive deeper into NumPy and learn its many useful, and sometimes niche, applications.
