A DataFrame is the primary data structure of the Pandas library in Python and is commonly used for storing and working with tabular data.
A common operation on such data is sorting the rows by two or more columns, which makes patterns in the data easier to spot.
To start working with Pandas, we first need to import it:
import pandas as pd
Let us understand this operation with the help of an example. Consider the following DataFrame containing 3 students with names A, B, and C and their corresponding marks (out of 10) for two subjects, Mathematics and Physics.
The following code snippet generates this DataFrame:
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Printing the DataFrame
print(df)
Here, data is a dictionary we created to initialize the DataFrame. We pass it to the pd.DataFrame() constructor, which returns a DataFrame whose column labels are the dictionary's keys.
Now, let's say we need to sort the DataFrame by the values in the Mathematics and Physics columns so that trends in the data are easier to see.
Sorted in ascending order by these two columns, the rows would appear in the order B (5, 9), A (8, 7), C (10, 8).
Let us look at different ways of performing this operation on a given DataFrame:
In this method, we use the DataFrame.sort_values() function to sort the DataFrame based on the values in two or more columns; here the columns of interest are Mathematics and Physics.
The sort_values() function returns the sorted DataFrame when the labels of the desired columns are passed as a list. Rows are ordered by the first column in the list, and each later column is used only to break ties in the columns before it. By default, the sorting is performed in ascending order. The original DataFrame is not modified, so the result must be assigned back to a variable.
Let us take a look at the corresponding code snippet and generated output for this method:
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
df = df.sort_values(['Mathematics', 'Physics'])
# Printing the sorted DataFrame
print(df)
Output :
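With the running data, the rows come out in the order B, A, C. Since no two students share a Mathematics mark, the Physics column never affects the result here. The sketch below uses hypothetical data (a fourth student D, and a tie between A and B in Mathematics, neither of which is part of the example above) to show the second column breaking a tie:

```python
import pandas as pd

# Hypothetical data: students A and B share the same Mathematics mark
tied = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Mathematics': [8, 8, 5, 10],
    'Physics': [9, 7, 6, 8],
})

# Sort by Mathematics first; Physics decides the order among equal Mathematics marks
tied_sorted = tied.sort_values(['Mathematics', 'Physics'])

print(tied_sorted['Name'].tolist())  # ['C', 'B', 'A', 'D']
```

B is placed before A because both scored 8 in Mathematics and B has the lower Physics mark.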
To sort in descending order instead, we pass the ascending parameter as [False, False], with one False for each column being sorted.
Let us take a look at the corresponding code snippet and generated output for this method:
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
df = df.sort_values(['Mathematics', 'Physics'], ascending = [False, False])
# Printing the sorted DataFrame
print(df)
Output :
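With the running data, the rows come out in the order C, A, B. The list passed to ascending does not have to use the same direction for every column, and a single boolean can stand in for the whole list. The following is a minimal sketch of both variations, again using hypothetical data with a tie in Mathematics rather than the data from the example above:

```python
import pandas as pd

# Hypothetical data with a tie in Mathematics between A and B
scores = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Mathematics': [8, 8, 5, 10],
    'Physics': [9, 7, 6, 8],
})

# Mathematics from highest to lowest, Physics from lowest to highest within ties
mixed = scores.sort_values(['Mathematics', 'Physics'], ascending=[False, True])
print(mixed['Name'].tolist())  # ['D', 'B', 'A', 'C']

# A single boolean applies the same direction to every listed column
all_desc = scores.sort_values(['Mathematics', 'Physics'], ascending=False)
print(all_desc['Name'].tolist())  # ['D', 'A', 'B', 'C']
```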
To sort the DataFrame in place instead of reassigning it, we pass the inplace parameter as True.
Let us take a look at the corresponding code snippet and generated output for this method:
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
df.sort_values(['Mathematics', 'Physics'], inplace = True)
# Printing the sorted DataFrame
print(df)
Output :
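The rows are printed in the same order as with reassignment (B, A, C). Two details are easy to miss: with inplace=True the call returns None, and in every variant the sorted rows keep their original index labels. The sketch below illustrates both, and also shows the ignore_index parameter, which is available in pandas 1.0 and later (an assumption about the installed version, not something stated in this article):

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# With inplace=True the method returns None, so its result should not be assigned
result = df.sort_values(['Mathematics', 'Physics'], inplace=True)
print(result)             # None
print(df.index.tolist())  # [1, 0, 2]  (original row labels are kept)

# ignore_index=True (pandas 1.0+) relabels the sorted rows 0, 1, 2
df_fresh = pd.DataFrame(data).sort_values(['Mathematics', 'Physics'], ignore_index=True)
print(df_fresh.index.tolist())  # [0, 1, 2]
```

Writing df = df.sort_values(..., inplace=True) would therefore replace df with None, which is a common source of errors.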
In this topic, we learned how to sort an existing Pandas DataFrame by two or more columns, following a running example of students' test scores in different subjects to build an intuition for how this operation applies to real-world data. Feel free to reach out to info.javaexercise@gmail.com with any suggestions.