Javaexercise.com

How To Drop Rows Of Pandas DataFrame Whose Value In A Certain Column Is NaN?

A DataFrame is the primary data structure of the Pandas library in Python and is commonly used for storing and working with tabular data. A common operation on such data is dropping the rows whose value in a certain column is NaN, since missing values can cause problems at later stages of processing.

To start working with Pandas, we first need to import it:

import pandas as pd

We’ll also need the NumPy library to specify NaN values, so we import it as well:

Python 3 Code :

import numpy as np

Running Example

Let us understand this operation with the help of an example. Consider the following DataFrame containing 3 students with names A, B, and C and their corresponding marks (out of 10) for two subjects, Mathematics and Physics.

Code snippet for generating this DataFrame:

Python 3 Code : 

import pandas as pd
import numpy as np

# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [np.nan, np.nan, 8]}

# DataFrame for the dictionary
df = pd.DataFrame.from_dict(data=data)

# Printing the DataFrame
print(df)

Here, data is a dictionary we created to initialize the DataFrame. For this, we use the DataFrame.from_dict() function of the Pandas library, which takes the dictionary as an argument and returns the required DataFrame. One important thing to note here is that the first two rows of the Physics column have NaN values.

Now, let’s say we need to drop the rows whose value in the Physics column is NaN, so that later processing only sees complete records. In the resulting DataFrame, only the row for student C remains.

Let us look at different ways of performing this operation on a given DataFrame : 

1. Using the dataframe.dropna() function

In this method, we use the dataframe.dropna() function. By default, it drops every row that has a NaN value in any column of the DataFrame; its subset parameter restricts the check to specific columns.

This function returns a new DataFrame with the rows removed. The changes are not made in place, so reassignment is required. Let us take a look at the corresponding code snippet and generated output for this method:

# Importing required libraries
import pandas as pd
import numpy as np

# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [np.nan, np.nan, 8]}

# DataFrame for the dictionary
df = pd.DataFrame.from_dict(data=data)

# Performing the operation
df = df.dropna()

# Printing the result
print(df)

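Output: only the row for student C (index 2) remains, because rows 0 and 1 contain NaN in the Physics column.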

To perform the above operation in place, we pass inplace=True to the dropna() function. Here, we do not require reassignment: the call modifies df directly and returns None.

Let us take a look at the corresponding code snippet and generated output for this method:

# Importing required libraries
import pandas as pd
import numpy as np

# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [np.nan, np.nan, 8]}

# DataFrame for the dictionary
df = pd.DataFrame.from_dict(data=data)

# Performing the operation
df.dropna(inplace = True)

# Printing the result
print(df)

Output: only the row for student C (index 2) remains in df.
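Since the goal is to drop rows based on one particular column, it may help to see the subset parameter of dropna() in action. The sketch below reuses the running example but adds a hypothetical fourth student D (not part of the original data) whose Mathematics mark is missing, so that the difference between checking any column and checking only Physics becomes visible. The variable names any_column and physics_only are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Running example plus a hypothetical student D (an assumption for this sketch)
# whose Mathematics mark is missing but whose Physics mark is present
data = {'Name': ['A', 'B', 'C', 'D'],
        'Mathematics': [8, 5, 10, np.nan],
        'Physics': [np.nan, np.nan, 8, 6]}
df = pd.DataFrame.from_dict(data=data)

# Default behaviour: drop every row that has a NaN in any column
any_column = df.dropna()

# Restrict the check to the Physics column only
physics_only = df.dropna(subset=['Physics'])

print(any_column['Name'].tolist())    # ['C']
print(physics_only['Name'].tolist())  # ['C', 'D']
```

Student D survives the second call because only the Physics column is inspected; the NaN in Mathematics is left untouched.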

2. Using the dataframe.isna() function

In this method, we use the dataframe.isna() function to find the rows that have NaN values in any column of the DataFrame. This function returns a DataFrame of True or False values depending on the entries, i.e. True for NaN values and False for values that are not NaN. Then we apply the .any() function with axis = 1, which gives True for every row that has a NaN value in at least one column.

We use the tilde operator or ~ to complement this selection, and finally pass the resulting boolean mask inside square brackets to keep only the rows without NaN values. Let us take a look at the corresponding code snippet and generated output for this method:

# Importing required libraries
import pandas as pd
import numpy as np

# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [np.nan, np.nan, 8]}

# DataFrame for the dictionary
df = pd.DataFrame.from_dict(data=data)

# Performing the operation
df = df[~df.isna().any(axis=1)]

# Printing the result
print(df)

Output: only the row for student C (index 2) remains.
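The same masking idea can be narrowed to a single column. The following is a minimal sketch, assuming we only care about NaN values in the Physics column: it builds the mask from that one column with notna(), the complement of isna(), so no tilde operator is needed. The names mask and filtered are illustrative.

```python
import numpy as np
import pandas as pd

# Dictionary and DataFrame from the running example
data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [np.nan, np.nan, 8]}
df = pd.DataFrame.from_dict(data=data)

# Boolean Series: True where the Physics column holds a non-NaN value
mask = df['Physics'].notna()

# Keep only the rows selected by the mask
filtered = df[mask]

print(mask.tolist())              # [False, False, True]
print(filtered['Name'].tolist())  # ['C']
```

The surviving row keeps its original index label (2). If a fresh 0-based index is wanted afterwards, reset_index(drop=True) can be applied to the result.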

Conclusion

In this topic, we have learned how to drop the rows of an existing Pandas DataFrame whose value in a certain column is NaN. We followed a running example of students' test scores in different subjects, which gives an intuition of how this concept could be applied in real-world situations. Feel free to reach out to info.javaexercise@gmail.com in case of any suggestions.