A DataFrame is the primary data structure of the Pandas library and is commonly used for storing and working with tabular data. A common operation that could be performed on such data is deleting a column from the DataFrame in order to remove data that is no longer useful.
To start working with Pandas, we first need to import it:
import pandas as pd
Let us understand this operation with the help of an example. Consider the following DataFrame containing 3 students with names A, B and C and their corresponding marks (out of 10) for three subjects, Mathematics, Physics, and History.
Code snippet for generating the above DataFrame:
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8], 'History' : [6, 8, 9]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Printing the DataFrame
print(df)
Here, data is a dictionary whose keys become the column labels and whose values become the column contents. We pass it to the pd.DataFrame() constructor of the Pandas library, which returns the required DataFrame.
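Before deleting anything, it can help to confirm the shape and the column order of the DataFrame, since some of the approaches in this article rely on column positions. The following is a minimal sketch using the same data; the variable names shape and labels are only illustrative.

```python
import pandas as pd

# Same data as in the running example
data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10],
        'Physics': [7, 9, 8], 'History': [6, 8, 9]}
df = pd.DataFrame(data)

# (number of rows, number of columns)
shape = df.shape

# Column labels in positional order, converted to a plain Python list
labels = df.columns.tolist()

print(shape)
print(labels)
```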
Now, let’s say that there has been some problem in evaluating the students in the subject of History and the column needs to be deleted. Let us have a look at different ways of performing this operation:
In this method, we will use the drop() method of the DataFrame. The labels of the columns to be deleted are passed as arguments.
The first argument is a list of labels of the columns to be deleted, here [‘History’]. The second argument, ‘axis’, is set to 1 to indicate that we are deleting a column rather than a row.
drop() does not modify the original DataFrame by default. It returns a new DataFrame without the column, which we assign back to the df variable.
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8], 'History' : [6, 8, 9]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Deleting the column named 'History'
df = df.drop(['History'], axis = 1)
# Printing the updated DataFrame
print(df)
Output:
  Name  Mathematics  Physics
0    A            8        7
1    B            5        9
2    C           10        8
To make the deletion ‘in-place’ instead of reassigning the result, we pass the argument inplace = True (the default is False). In this case drop() modifies df directly and returns None.
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8], 'History' : [6, 8, 9]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Deleting the column named 'History'
df.drop(['History'], axis = 1, inplace = True)
# Printing the updated DataFrame
print(df)
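As a complement to the axis = 1 form shown above, drop() also accepts a columns keyword, which some readers find easier to read. The sketch below shows that form, dropping several columns at once, the errors='ignore' option, and the fact that an in-place call returns None. The label 'Geography' and the variable names are hypothetical and used only for illustration.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10],
        'Physics': [7, 9, 8], 'History': [6, 8, 9]}
df = pd.DataFrame(data)

# columns=[...] is equivalent to passing the labels together with axis=1
without_history = df.drop(columns=['History'])

# Several columns can be dropped in a single call
names_only = df.drop(columns=['Mathematics', 'Physics', 'History'])

# errors='ignore' skips labels that do not exist instead of raising KeyError
# ('Geography' is a hypothetical label that is not in the DataFrame)
unchanged = df.drop(columns=['Geography'], errors='ignore')

# With inplace=True the call returns None, so the result must not be reassigned to df
result = df.drop(columns=['History'], inplace=True)

print(without_history.columns.tolist())
print(names_only.columns.tolist())
print(result)
```

Writing df = df.drop(..., inplace = True) is a common slip: it would replace df with None.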
In this method, we again use the drop() method, but this time we use the .columns property of the DataFrame to specify the column to delete by its position.
The .columns property returns a pandas Index holding the column labels of the DataFrame. We can extract the label of the fourth column, i.e., History, using df.columns[3].
Remember that positions in an Index, like those in Python lists, are zero-based.
The first argument of drop() is a list of labels of the columns to be deleted, here [df.columns[3]]. The second argument, ‘axis’, is set to 1 to indicate that we are deleting a column rather than a row.
drop() returns a new DataFrame without that column, which we assign back to the df variable.
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8], 'History' : [6, 8, 9]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Deleting the column named 'History'
df = df.drop([df.columns[3]], axis = 1)
# Printing the updated DataFrame
print(df)
Output:
  Name  Mathematics  Physics
0    A            8        7
1    B            5        9
2    C           10        8
To make the deletion ‘in-place’ instead of reassigning the result, we pass the argument inplace = True (the default is False).
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8], 'History' : [6, 8, 9]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Deleting the column named 'History'
df.drop([df.columns[3]], axis = 1, inplace = True)
# Printing the updated DataFrame
print(df)
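Hard-coding the position 3 works for this small example, but it breaks silently if the column order changes. The sketch below shows a few position-related helpers on df.columns: looking up a position with get_loc(), selecting several labels with a list of positions, and using a negative position for the last column. The variable names are illustrative.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10],
        'Physics': [7, 9, 8], 'History': [6, 8, 9]}
df = pd.DataFrame(data)

# df.columns is a pandas Index, not a plain Python list
is_index = isinstance(df.columns, pd.Index)

# Look up the position of a label instead of hard-coding it
pos = df.columns.get_loc('History')

# Indexing the Index with a list of positions selects several labels at once
trimmed = df.drop(columns=df.columns[[2, pos]])

# A negative position counts from the end, so -1 refers to the last column
no_last = df.drop(columns=df.columns[-1])

print(pos)
print(trimmed.columns.tolist())
print(no_last.columns.tolist())
```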
In this method, we use the del keyword to delete a particular column using the corresponding column label.
The del statement in Python removes a name or an item from a container (in Python, everything is treated as an object).
Here, the item of interest is the column of the DataFrame df, specified by the column label ‘History’.
The syntax to use del is as follows:
del <object to be deleted>
The deletion performed by this method is in-place, so no reassignment is needed.
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8], 'History' : [6, 8, 9]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Deleting the column named 'History'
del df['History']
# Printing the updated DataFrame
print(df)
Output:
  Name  Mathematics  Physics
0    A            8        7
1    B            5        9
2    C           10        8
Alternatively, we can look up the label by its position through the .columns property of the DataFrame, as described in method 2:
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8], 'History' : [6, 8, 9]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Deleting the column named 'History'
del df[df.columns[3]]
# Printing the updated DataFrame
print(df)
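Because del works in place and gives no return value, running the same statement twice fails the second time. The sketch below illustrates this under the assumption that a missing label should simply be reported; the variable names remaining and raised are illustrative.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10],
        'Physics': [7, 9, 8], 'History': [6, 8, 9]}
df = pd.DataFrame(data)

# The first deletion succeeds and modifies df directly
del df['History']
remaining = df.columns.tolist()

# Deleting a label that is no longer present raises KeyError
try:
    del df['History']
    raised = False
except KeyError:
    raised = True

print(remaining)
print(raised)
```

A check such as 'History' in df.columns before the del statement avoids the exception when a script may be run more than once.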
In this method, we use the pop() method of the DataFrame to delete a particular column using the corresponding column label.
pop() takes the label of the column to be deleted as its argument, here ‘History’. The deletion carried out by this method is ‘in-place’.
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8], 'History' : [6, 8, 9]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Deleting the column named 'History'
df.pop('History')
# Printing the updated DataFrame
print(df)
Output:
  Name  Mathematics  Physics
0    A            8        7
1    B            5        9
2    C           10        8
The pop() function returns the deleted column as can be seen in the code snippet below:
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8], 'History' : [6, 8, 9]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Deleting the column named 'History'
col = df.pop('History')
# Printing the deleted column
print(col)
Output:
0    6
1    8
2    9
Name: History, dtype: int64
Alternatively, we can look up the label by its position through the .columns property of the DataFrame, as described in method 2:
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8], 'History' : [6, 8, 9]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Deleting the column named 'History'
df.pop(df.columns[3])
# Printing the updated DataFrame
print(df)
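Since pop() hands back the removed column, it is convenient when the column is still needed elsewhere. As one possible use, the sketch below removes History and re-inserts it at a different position with DataFrame.insert(); the target position 1 is an arbitrary choice for illustration.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10],
        'Physics': [7, 9, 8], 'History': [6, 8, 9]}
df = pd.DataFrame(data)

# Remove the column in place and keep the returned Series
col = df.pop('History')

# Put it back as the second column (position 1, zero-based)
df.insert(1, 'History', col)

order = df.columns.tolist()
print(order)
```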
In this method, we create a new DataFrame containing only the columns of the original DataFrame that we want to keep. The original DataFrame is left unchanged. This method is useful when the columns to keep are fewer than the columns to delete.
The required column labels are specified as a list, [‘Name’, ‘Mathematics’, ‘Physics’] here.
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8], 'History' : [6, 8, 9]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Keeping every column except 'History' in a new DataFrame
df1 = df[['Name', 'Mathematics', 'Physics']]
# Printing the new DataFrame
print(df1)
Output:
  Name  Mathematics  Physics
0    A            8        7
1    B            5        9
2    C           10        8
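Typing out every label to keep becomes tedious for wide DataFrames. The sketch below shows two ways the list of columns to keep might be built programmatically instead: a list comprehension over df.columns and a boolean mask passed to .loc. The names keep, df1 and df2 are illustrative, and the .copy() call is an optional choice that makes the new DataFrame independent of the original.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10],
        'Physics': [7, 9, 8], 'History': [6, 8, 9]}
df = pd.DataFrame(data)

# Build the list of labels to keep instead of typing each one
keep = [label for label in df.columns if label != 'History']
df1 = df[keep].copy()

# Equivalent selection with a boolean mask over the column labels
df2 = df.loc[:, df.columns != 'History']

print(df1.columns.tolist())
print(df2.columns.tolist())
# The original DataFrame still has all four columns
print(df.columns.tolist())
```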
In this article, we have learned several ways to delete a column from a Pandas DataFrame: drop() with a label or a position, the del keyword, pop(), and selecting the columns to keep. The running example of students' test scores shows how the same operation could be applied in real-world situations.