Javaexercise.com

Use A List Of Values To Select Rows From A Pandas DataFrame

A DataFrame is the primary data structure of the Pandas library in Python and is commonly used for storing and working with tabular data.

A common operation on such data is to select only the rows whose value in a given column appears in a list of values.

To start working with Pandas, we first need to import it:

import pandas as pd

Running Example

Let us understand this operation with the help of an example. Consider a DataFrame containing 3 students named A, B, and C and their marks (out of 10) in two subjects, Mathematics and Physics.

Code snippet for generating this DataFrame:

import pandas as pd

# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}

# DataFrame for the dictionary
df = pd.DataFrame(data)

# Printing the DataFrame
print(df)

Here, data is a dictionary we created to initialize the DataFrame. We pass it to the pd.DataFrame() constructor of the Pandas library, which returns the required DataFrame with one column per dictionary key.

Now, let's say we need only those rows whose value in a particular column is present in a specified list of values. Say the column of interest is Physics and the desired values are 7 and 8. This gives us the first and third rows (students A and C), as they match this condition. The resulting output would look like this:

  Name  Mathematics  Physics
0    A            8        7
2    C           10        8

Let us look at different ways of performing this operation on a given DataFrame:

1. Using the .isin() function

In this method, we use isin() to select rows based on a list of values. We first access the column with the Physics marks using df['Physics'], then pass the list of values, here [7,8], to isin(). This marks which rows match, and indexing df with that result keeps only those rows. The filtered DataFrame is reassigned to the variable df and then printed. Let us take a look at the corresponding code snippet and generated output for this method:

# Importing pandas
import pandas as pd

# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}

# DataFrame for the dictionary
df = pd.DataFrame(data)

# Performing the operation
df = df[df['Physics'].isin([7,8])]

# Printing
print(df)

Output:

  Name  Mathematics  Physics
0    A            8        7
2    C           10        8
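To see why this works, it can help to look at the intermediate result. As a minimal sketch (the variable names mask and selected are illustrative and not part of the original snippet), isin() returns a boolean Series, and indexing the DataFrame with it keeps only the rows marked True:

```python
import pandas as pd

# Same data as the running example
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
df = pd.DataFrame(data)

# isin() checks each Physics value against the list and returns a boolean Series
mask = df['Physics'].isin([7, 8])
print(mask.tolist())  # [True, False, True]

# Indexing with the mask keeps only the rows where the mask is True
selected = df[mask]
print(selected['Name'].tolist())  # ['A', 'C']
```

Note that the selected rows keep their original index labels (0 and 2 here).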

Instead of using df['Physics'] to access the Physics column of the DataFrame as shown earlier, we can also write df.Physics and get the same results. Let us take a look at the corresponding code snippet and generated output for this method:

# Importing pandas
import pandas as pd

# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}

# DataFrame for the dictionary
df = pd.DataFrame(data)

# Performing the operation
df = df[df.Physics.isin([7,8])]

# Printing
print(df)

Output:

  Name  Mathematics  Physics
0    A            8        7
2    C           10        8
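One caveat worth keeping in mind: attribute access only works when the column name is a valid Python identifier and does not collide with an existing DataFrame attribute or method. The sketch below uses a hypothetical column named 'Physics Marks' (not part of the running example) to show the bracket form handling a name that dot notation cannot:

```python
import pandas as pd

# Hypothetical variant of the running example with a space in the column name
data = {'Name' : ['A', 'B', 'C'], 'Physics Marks' : [7, 9, 8]}
df = pd.DataFrame(data)

# Writing df.Physics Marks would be a syntax error, so bracket access is needed
selected = df[df['Physics Marks'].isin([7, 8])]
print(selected['Name'].tolist())  # ['A', 'C']
```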

2. Using the dataframe.query() function

In this method, we use the dataframe.query() function to select rows based on a list of values.

The list of values here is [7,8]. We pass the query to the function as a string, and it returns a DataFrame containing only the rows that satisfy the query.

The desired column here is Physics, so the query becomes 'Physics in [7,8]'. The returned DataFrame is then reassigned to the variable df and printed.

Let us take a look at the corresponding code snippet and generated output for this method:

# Importing pandas
import pandas as pd

# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}

# DataFrame for the dictionary
df = pd.DataFrame(data)

# Performing the operation
df = df.query('Physics in [7,8]')

# Printing
print(df)

Output:

  Name  Mathematics  Physics
0    A            8        7
2    C           10        8
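When the list of values lives in a Python variable instead of being typed into the query string, query() lets us refer to it with an @ prefix. A minimal sketch, where the variable names wanted and selected are illustrative choices and not part of the original snippet:

```python
import pandas as pd

# Same data as the running example
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
df = pd.DataFrame(data)

# List of values kept in an ordinary Python variable
wanted = [7, 8]

# The @ prefix tells query() to look up 'wanted' in the surrounding Python scope
selected = df.query('Physics in @wanted')
print(selected['Name'].tolist())  # ['A', 'C']
```

This avoids building the query string by hand when the list is computed elsewhere in the program.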

Conclusion

In this article, we learned two ways to use a list of values to select rows from an existing Pandas DataFrame, isin() and query(), following a running example of students' test scores in different subjects. The same approach applies to real-world tabular data. Feel free to reach out to info.javaexercise@gmail.com with any suggestions.