A DataFrame is the primary data structure of the Pandas library in Python and is commonly used for storing and working with tabular data. A common operation on such data is iterating over its rows, for example to read or process the values stored in each row.
To start working with Pandas, we first need to import it:
import pandas as pd
Let us understand this operation with the help of an example. Consider the following DataFrame, which holds the marks (out of 10) of three students, A, B, and C, in two subjects, Mathematics and Physics:
  Name  Mathematics  Physics
0    A            8        7
1    B            5        9
2    C           10        8
Code snippet for generating the above DataFrame:
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Printing the DataFrame
print(df)
Here, data is a dictionary that we use to initialize the DataFrame. We pass it to the pd.DataFrame() constructor of the Pandas library, which uses each key as a column label and the corresponding list as that column's values, and returns the resulting DataFrame.
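A quick way to confirm what the constructor built is to inspect the result. The sketch below reuses the same data dictionary; the inspection calls (columns, index, dtypes) are standard DataFrame attributes added here for illustration and are not part of the original example.

```python
import pandas as pd

# Same dictionary as in the example: keys become column labels,
# and each list supplies that column's values
data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

print(list(df.columns))  # ['Name', 'Mathematics', 'Physics'] (dictionary order)
print(list(df.index))    # [0, 1, 2]: default integer row labels
print(df.dtypes)         # inferred type of each column
```

Since no index was passed to the constructor, the rows get the default labels 0, 1, 2, which are the index values the loops later in this article receive.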
Now, let's say we need to iterate over the rows of the DataFrame and, for each row, print the corresponding entry in the Physics column. The resulting output would look like this:
7
9
8
Let us look at different ways of performing this operation on a given DataFrame:
The first method uses the DataFrame.iterrows() method in a for loop to iterate over the rows of the DataFrame and print the corresponding entries in the Physics column. On each iteration, iterrows() yields a pair (index, row), where index is the row label and row is a pandas Series holding that row's values, labelled by column name.
We can then use square brackets to access the entry in the Physics column, as in row['Physics']. Let us take a look at the corresponding code snippet and generated output for this method:
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
for index, row in df.iterrows():
    print(row['Physics'])
Output:
7
9
8
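To make the (index, row) pair concrete, the sketch below pulls a single iteration out of iterrows() with next() and inspects it. It also reproduces a caveat described in the pandas documentation: because each row is returned as a Series, values from columns of different dtypes are combined into a single dtype, so an integer can come back as a float. The small int/float DataFrame named mixed is a hypothetical example, not part of the student data.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# Take only the first (index, row) pair produced by iterrows()
index, row = next(df.iterrows())
print(index)            # 0: the label of the first row
print(type(row))        # each row is a pandas Series
print(list(row.index))  # ['Name', 'Mathematics', 'Physics']: labelled by column name
print(row['Physics'])   # 7

# Hypothetical frame showing that iterrows() does not preserve dtypes
mixed = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])
_, first = next(mixed.iterrows())
print(mixed['int'].dtype)  # int64 in the DataFrame
print(first['int'].dtype)  # float64 in the row Series
```

For reading the Physics marks this conversion is harmless, but it is worth keeping in mind when a row mixes integer and floating-point columns.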
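Instead of using square brackets to access the Physics column, as in row['Physics'], we can also use attribute (dot) access to get the same result, as in row.Physics. This works because pandas exposes each label of the row Series as an attribute, provided the label is a valid Python identifier that does not clash with an existing Series attribute or method. Let us take a look at the corresponding code snippet and generated output for this method: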
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
for index, row in df.iterrows():
    print(row.Physics)
Output:
7
9
8
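The limits of attribute access show up with less convenient column labels. The sketch below uses a hypothetical frame with a column named count, which collides with the Series.count() method, and a column named Physics Score, which contains a space and therefore cannot be written as an attribute at all. Bracket access works for both.

```python
import pandas as pd

# Hypothetical column labels chosen to break attribute access
df = pd.DataFrame({'count': [3, 4], 'Physics Score': [7, 9]})

from_brackets = []
attr_is_method = []
for _, row in df.iterrows():
    # Bracket access looks up the label, so any column name works
    from_brackets.append((int(row['count']), int(row['Physics Score'])))
    # row.count resolves to the Series.count method, not the column value
    attr_is_method.append(callable(row.count))

print(from_brackets)   # [(3, 7), (4, 9)]
print(attr_is_method)  # [True, True]
```

A reasonable habit, then, is to use row.Physics for quick exploration and row['Physics'] in code that may meet arbitrary column names.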
The second method uses the DataFrame.itertuples() method in a for loop to iterate over the rows of the DataFrame and print the corresponding entries in the Physics column. On each iteration, itertuples() yields a named tuple for the row, whose first element is the row's index label and whose remaining elements are the column values in order. We can then use square brackets to access the entry in the Physics column by its position.
We obtain this position with df.columns.get_loc(), passing the label of the desired column, here 'Physics'. We need to add 1 to the result because position 0 of each tuple holds the index label, which shifts every column one place to the right. Let us take a look at the corresponding code snippet and generated output for this method:
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
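# Position of 'Physics' in each tuple (+1 because position 0 holds the index)
physics_pos = df.columns.get_loc('Physics') + 1
# Performing the operation
for row in df.itertuples():
    print(row[physics_pos])
Output: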
7
9
8
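The +1 offset comes from the structure of what itertuples() yields: a named tuple whose first field, Index, holds the row label. The sketch below inspects that structure and then uses the index=False parameter of itertuples(), which leaves the row label out, so the position returned by get_loc() can be used without any offset.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

first = next(df.itertuples())
print(first._fields)  # ('Index', 'Name', 'Mathematics', 'Physics')
print(first[0])       # 0: the row label stored in the Index field

# With index=False the tuple holds only the column values,
# so the column position can be used as-is
pos = df.columns.get_loc('Physics')
physics = [row[pos] for row in df.itertuples(index=False)]
print(physics)
```

Both forms give the same Physics marks; index=False simply removes the need to remember the offset.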
Instead of indexing by position, we can also access the field by name with the dot operator, as in row.Physics, because itertuples() names each field of the tuple after its column. Let us take a look at the corresponding code snippet and generated output for this method:
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
for row in df.itertuples():
    print(row.Physics)
Output:
7
9
8
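Field names on these tuples have a limit similar to the one for iterrows() rows. The itertuples() documentation notes that column labels which are not valid Python identifiers (or are repeated, or start with an underscore) are renamed to positional names. The sketch below uses a hypothetical Physics Score column to show the renaming.

```python
import pandas as pd

# Hypothetical frame: 'Physics Score' is not a valid Python identifier
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Physics Score': [7, 9, 8]})

first = next(df.itertuples())
print(first._fields)  # ('Index', 'Name', '_2'): the invalid label is renamed

# The renamed field (position 2, counting the Index field) holds the scores
scores = [row._2 for row in df.itertuples()]
print(scores)
```

With labels like these, positional access as in the previous method is usually clearer than relying on the generated _2 name.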
In this topic, we have learned to iterate over the rows of an existing Pandas DataFrame with iterrows() and itertuples(), accessing values either with square brackets or with the dot operator, following a running example of students' test scores in different subjects. This gives us an intuition for how the concept can be applied in real-world situations. Feel free to reach out to info.javaexercise@gmail.com with any suggestions.