Javaexercise.com

How to Select Multiple Columns In A Pandas DataFrame?

A DataFrame is the primary data structure of the Pandas library and is commonly used for storing and working with tabular data. A common operation on such data is selecting multiple columns, which yields a smaller DataFrame containing only the columns needed for further work.

To start working with Pandas, we first need to import the library in our Python code:

Python 3 Code :

import pandas as pd

Running Example

Let us understand this operation with the help of an example. Consider the following DataFrame containing 3 students with names A, B, and C and their corresponding marks (out of 10) for two subjects, Mathematics and Physics.

  Name  Mathematics  Physics
0    A            8        7
1    B            5        9
2    C           10        8

Python code snippet for generating the above DataFrame : 

Python 3 Code : 

# Importing pandas
import pandas as pd

# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}

# DataFrame for the dictionary
df = pd.DataFrame(data)

# Printing the DataFrame
print(df)

Here, data is a dictionary we created to initialize the DataFrame. We pass it to the pd.DataFrame() constructor, which returns a DataFrame in which each dictionary key becomes a column label and each list becomes that column's values.
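To see what the constructor produced, a small sketch like the following can inspect the column labels and the shape of the DataFrame. It uses the same data as above; the variable name column_labels is only illustrative.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# Each dictionary key became a column label, in insertion order
column_labels = list(df.columns)
print(column_labels)   # ['Name', 'Mathematics', 'Physics']

# shape is (number of rows, number of columns)
print(df.shape)        # (3, 3)
```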

Now, let’s say that for some reason, we need to select only the columns Name and Physics from the DataFrame, maybe to perform more operations on them.

The resulting DataFrame would look like this :

  Name  Physics
0    A        7
1    B        9
2    C        8

Let us look at different ways of performing this operation on a given DataFrame : 

Selecting Multiple Columns In A Pandas DataFrame Using Square Brackets

This method is straightforward and is the most commonly used one. We use square brackets to list the columns we want to select.

Selecting several columns requires two pairs of square brackets, i.e. df[['Name', 'Physics']]. The outer pair is the indexing operator, and the inner pair creates a Python list holding the column labels.

The selection returns a new DataFrame and does not modify the original, so we need to assign the result to a variable. In the code below we reassign it to df, which replaces the original DataFrame.

Let us look at the Python 3 code and corresponding output for this method.

Python 3 Code : 

# Importing pandas
import pandas as pd

# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}

# DataFrame for the dictionary
df = pd.DataFrame(data)

# Performing the operation
df = df[['Name', 'Physics']]

# Printing the updated DataFrame
print(df)

Output : 

  Name  Physics
0    A        7
1    B        9
2    C        8
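As a follow-up to the square-bracket method, the sketch below illustrates how one pair of brackets differs from two, and shows that the original DataFrame is left untouched when the result is stored under a different name. The label 'Chemistry' is a hypothetical column that does not exist in our data; it is used only to show what happens when a label is missing.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# One pair of brackets with a single label returns a Series
physics_series = df['Physics']

# A list of labels inside the indexing brackets returns a DataFrame
selected = df[['Name', 'Physics']]

print(type(physics_series).__name__)  # Series
print(type(selected).__name__)        # DataFrame

# The original DataFrame still has all three columns
print(list(df.columns))               # ['Name', 'Mathematics', 'Physics']

# Asking for a label that does not exist ('Chemistry' is hypothetical) raises KeyError
missing_raised = False
try:
    df[['Name', 'Chemistry']]
except KeyError:
    missing_raised = True
print(missing_raised)                 # True
```

Storing the selection in a separate variable such as selected keeps the full DataFrame available for later steps.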

Selecting Multiple Columns In A Pandas DataFrame Using the DataFrame.columns property

This method is an alternative to the previous one. Here we use the DataFrame.columns property to look up the labels of the columns to be selected and pass them inside square brackets as before.

The df.columns property returns an Index object containing all the column labels. Indexing it with a position returns the label at that position, so df.columns[0] is 'Name' and df.columns[2] is 'Physics'.

This method is useful when the column labels are unknown or cumbersome to type but the column positions are known. The selection returns a new DataFrame and does not modify the original, so we need to assign the result.

Let us look at the Python code and corresponding output for this method.

Python 3 Code : 

# Importing pandas
import pandas as pd

# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}

# DataFrame for the dictionary
df = pd.DataFrame(data)

# Performing the operation
df = df[[df.columns[0], df.columns[2]]]

# Printing the updated DataFrame
print(df)

Output : 

  Name  Physics
0    A        7
1    B        9
2    C        8
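The lookup through df.columns can also be written more compactly. The following sketch assumes the same data as above and indexes df.columns with a list of positions, which returns several labels at once; the variable name wanted is only illustrative.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# Indexing df.columns with a single position returns one label
print(df.columns[0])   # Name
print(df.columns[2])   # Physics

# Indexing it with a list of positions returns an Index of several labels
wanted = df.columns[[0, 2]]

# The resulting labels can be passed straight to the square brackets
selected = df[wanted]
print(list(selected.columns))  # ['Name', 'Physics']
```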

Selecting Multiple Columns In A Pandas DataFrame Using the DataFrame.loc property

This method is another alternative. Here we use the DataFrame.loc property, a label-based indexer that takes a row selector and a column selector inside square brackets, separated by a comma.

The row selector is passed as : (colon) to specify that all rows have to be selected. The column selector is a list of labels of the columns that need to be selected.

The selection returns a new DataFrame and does not modify the original, so we need to assign the result.

Let us look at the Python code and corresponding output for this method.

Python 3 Code : 

# Importing pandas
import pandas as pd

# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}

# DataFrame for the dictionary
df = pd.DataFrame(data)

# Performing the operation
df = df.loc[:,['Name', 'Physics']]

# Printing the updated DataFrame
print(df)

Output : 

  Name  Physics
0    A        7
1    B        9
2    C        8
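Besides a list of labels, loc also accepts a label slice for the columns, and the row selector does not have to be a colon. The sketch below, using the same data, shows both; the threshold of 8 marks is an arbitrary value chosen for illustration.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# A label slice with loc includes both endpoints
maths_to_physics = df.loc[:, 'Mathematics':'Physics']
print(list(maths_to_physics.columns))  # ['Mathematics', 'Physics']

# The row selector can be a boolean condition instead of ':'
# (here: students with at least 8 marks in Physics)
high_physics = df.loc[df['Physics'] >= 8, ['Name', 'Physics']]
print(list(high_physics['Name']))      # ['B', 'C']
```

Combining a row condition with a column list in one loc call avoids chaining two separate selections.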

Selecting Multiple Columns In A Pandas DataFrame Using the DataFrame.iloc property

This method is another alternative. Here we use the DataFrame.iloc property, a position-based indexer that takes a row selector and a column selector inside square brackets, separated by a comma.

The row selector is passed as : (colon) to specify that all rows have to be selected.

The column selector is a list of integer positions of the columns that need to be selected, here 0 and 2. Positions start at 0. This method is very useful in cases where the column labels are unknown or too complex.

The selection returns a new DataFrame and does not modify the original, so we need to assign the result.

Let us look at the Python 3 code and corresponding output for this method.

Python 3 Code : 

# Importing pandas
import pandas as pd

# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}

# DataFrame for the dictionary
df = pd.DataFrame(data)

# Performing the operation
df = df.iloc[:,[0,2]]

# Printing the updated DataFrame
print(df)

Output : 

  Name  Physics
0    A        7
1    B        9
2    C        8
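Positional selection with iloc is not limited to a list of positions. As a brief sketch on the same data, the example below uses a positional slice, which excludes the stop position like ordinary Python slicing, and a negative position, which counts from the last column.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# A positional slice excludes the stop position: columns 0 and 1 only
first_two = df.iloc[:, 0:2]
print(list(first_two.columns))       # ['Name', 'Mathematics']

# Negative positions count from the end: -1 is the last column
first_and_last = df.iloc[:, [0, -1]]
print(list(first_and_last.columns))  # ['Name', 'Physics']
```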

Conclusion

In this topic, we have learned how to select multiple columns in an existing Pandas DataFrame using square brackets, the DataFrame.columns property, DataFrame.loc, and DataFrame.iloc. The running example of students' test scores in different subjects gives an intuition of how this operation can be applied in real-world situations. Feel free to reach out to info.javaexercise@gmail.com in case of any suggestions.