A DataFrame is the primary data structure of the Pandas library in Python and is commonly used for storing and working with tabular data.
A common operation on such data is converting a Pandas DataFrame to a list of dictionaries, one dictionary per row, so that the information can be handled with plain Python code.
To start working with Pandas, we first need to import it:
import pandas as pd
Let us understand this operation with the help of an example. Consider a DataFrame containing three students, named A, B, and C, and their marks (out of 10) in two subjects, Mathematics and Physics.
The following code snippet generates this DataFrame:
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Printing the DataFrame
print(df)
Here, data is the dictionary used to initialize the DataFrame. Its keys become the column labels and its lists become the column values. We pass it to pd.DataFrame(), the DataFrame constructor of the Pandas library, which returns the required DataFrame.
Now, let's say we need to convert this DataFrame into a list of dictionaries, in which each dictionary represents one row and maps column labels to values. The resulting output would look like this:
[{'Name': 'A', 'Mathematics': 8, 'Physics': 7}, {'Name': 'B', 'Mathematics': 5, 'Physics': 9}, {'Name': 'C', 'Mathematics': 10, 'Physics': 8}]
Let us look at different ways of performing this operation on a given DataFrame :
In this method, we use the DataFrame.to_dict() function to convert the given DataFrame into the desired form, a list of dictionaries.
We pass 'records' as the orient parameter, and the resulting list is the returned object. The DataFrame itself is not modified in place, so the return value must be assigned to a variable.
Let us take a look at the corresponding code snippet and generated output for this method:
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
df = df.to_dict('records')
# Printing
print(df)
Output :
[{'Name': 'A', 'Mathematics': 8, 'Physics': 7}, {'Name': 'B', 'Mathematics': 5, 'Physics': 9}, {'Name': 'C', 'Mathematics': 10, 'Physics': 8}]
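The snippet above rebinds df to the returned list, after which the DataFrame is no longer reachable under that name. When the DataFrame is still needed, a minimal sketch like the following stores the result under a separate name (records is a name chosen here for illustration) and checks that df is left untouched:

```python
import pandas as pd

# Same example data as above
data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# Store the result under a new name so that df stays available
records = df.to_dict(orient='records')

# df is still a DataFrame; records is a plain Python list of dicts
print(type(df).__name__)
print(type(records).__name__)
print(records[0])
```

Keeping the two names apart makes it possible to continue using DataFrame methods on df after the conversion.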
If we need the index as well, we pass 'index' instead of 'records'. Note that the result is then not a list but a dictionary of dictionaries, keyed by the index labels. As before, the DataFrame is not modified in place, so the return value must be assigned to a variable. Let us take a look at the corresponding code snippet and generated output for this method:
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
df = df.to_dict('index')
# Printing
print(df)
Output :
{0: {'Name': 'A', 'Mathematics': 8, 'Physics': 7}, 1: {'Name': 'B', 'Mathematics': 5, 'Physics': 9}, 2: {'Name': 'C', 'Mathematics': 10, 'Physics': 8}}
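If a list is still wanted but each dictionary should also carry its index label, one possible approach (not part of the walkthrough above) is to turn the index into a regular column with reset_index() before calling to_dict() with 'records'. The sketch assumes the default unnamed index, in which case Pandas names the new column 'index'; records_with_index is a name chosen here for illustration.

```python
import pandas as pd

# Same example data as above
data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# reset_index() returns a new DataFrame whose old index is an ordinary
# column (named 'index' because the original index has no name)
records_with_index = df.reset_index().to_dict(orient='records')

# Each dictionary now starts with the row's index label
for record in records_with_index:
    print(record)
```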
In this method, we use the DataFrame.iterrows() function to convert the Pandas DataFrame to a list of dictionaries. First, we initialize a variable rows with an empty list. This will store our desired result at the end.
Then, we have a for loop that iterates over the (index, row) pairs yielded by DataFrame.iterrows(), where each row is a Pandas Series. Inside this loop, we append a dictionary to the list stored in the variable rows.
Each dictionary has the column labels as keys and the corresponding values in that row as values. The row values are accessed using square brackets and the required column label, for example, row['Name'].
Let us take a look at the corresponding code snippet and generated output for this method:
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
rows = []
for index, row in df.iterrows():
  rows.append({
      'Name': row['Name'],
      'Mathematics' : row['Mathematics'],
      'Physics' : row['Physics']
  })
# Printing
print(rows)
Output :
[{'Name': 'A', 'Mathematics': 8, 'Physics': 7}, {'Name': 'B', 'Mathematics': 5, 'Physics': 9}, {'Name': 'C', 'Mathematics': 10, 'Physics': 8}]
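The loop above spells out every column by hand. As a variation, each row yielded by iterrows() is a Pandas Series, so its own to_dict() method can build the dictionary without naming the columns. This is a minimal sketch rather than a recommendation; for large DataFrames, calling DataFrame.to_dict() directly is generally the faster choice.

```python
import pandas as pd

# Same example data as above
data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# Series.to_dict() maps each column label to the value in that row,
# so no column name has to be hardcoded; the index label is not needed
rows = [row.to_dict() for _, row in df.iterrows()]

print(rows)
```

Because the column labels are read from the row itself, this version keeps working if columns are added to or removed from the DataFrame.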
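Instead of using square brackets to access the column values, we can also use the dot operator, for example, row.Physics in place of row['Physics']. This works only when the column label is a valid Python identifier and does not clash with an existing attribute or method of the row object. Let us take a look at the Python code and the corresponding output for this method: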
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
rows = []
for index, row in df.iterrows():
  rows.append({
      'Name': row.Name,
      'Mathematics' : row.Mathematics,
      'Physics' : row.Physics
  })
# Printing
print(rows)
Output :
[{'Name': 'A', 'Mathematics': 8, 'Physics': 7}, {'Name': 'B', 'Mathematics': 5, 'Physics': 9}, {'Name': 'C', 'Mathematics': 10, 'Physics': 8}]
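One case where dot access can silently give the wrong value is a column label that matches an existing attribute of the row object. The sketch below uses a hypothetical lowercase column called name, which is not part of the running example: row.name returns the row's index label, because name is already an attribute of a Series, while row['name'] returns the value stored in the column.

```python
import pandas as pd

# Hypothetical data with a lowercase 'name' column, chosen to show the clash
df = pd.DataFrame({'name': ['A', 'B'], 'Physics': [7, 9]})

attribute_values = []
bracket_values = []
for index, row in df.iterrows():
    # row.name is the Series' own name attribute (the index label here)
    attribute_values.append(row.name)
    # row['name'] is the value stored in the 'name' column
    bracket_values.append(row['name'])

print(attribute_values)
print(bracket_values)
```

For this reason, square brackets are the safer choice whenever the column labels are not fully under your control.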
In this article, we learned how to convert an existing Pandas DataFrame to a list of dictionaries using DataFrame.to_dict() and DataFrame.iterrows(). The running example of students' test scores in different subjects gives an intuition of how this technique could be applied in real-world situations. Feel free to reach out to info.javaexercise@gmail.com with any suggestions.