A DataFrame is the primary data structure of the Pandas library and is commonly used for storing and working with tabular data. A common operation on such data is checking whether an existing DataFrame is empty, that is, whether it contains any data points at all.
To start working with Pandas, we first need to import it into our Python code:
import pandas as pd
Let us understand this operation with the help of an example. Consider a DataFrame containing 3 students with names A, B, and C and their corresponding marks (out of 10) for two subjects, Mathematics and Physics. Let this be DataFrame df1:
  Name  Mathematics  Physics
0    A            8        7
1    B            5        9
2    C           10        8
Code snippet for generating the above DataFrame:
import pandas as pd
# Dictionary for our data
data1 = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df1 = pd.DataFrame(data1)
# Printing the DataFrame
print(df1)
Here, data1 is a dictionary we created to initialize the DataFrame. We pass it to the pd.DataFrame() constructor of the Pandas library, which turns each key into a column label and each list into that column's values, and returns the required DataFrame.
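As a quick sanity check on the DataFrame built above, the shape attribute reports its dimensions as a (rows, columns) tuple. The snippet below is a minimal sketch that reuses the same data1 dictionary; the variable names n_rows and n_cols are only illustrative.

```python
import pandas as pd

# Same dictionary as in the running example
data1 = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df1 = pd.DataFrame(data1)

# shape is a (number_of_rows, number_of_columns) tuple
n_rows, n_cols = df1.shape
print(n_rows, n_cols)  # 3 3

# The dictionary keys become the column labels
print(list(df1.columns))  # ['Name', 'Mathematics', 'Physics']
```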
Let us consider another DataFrame df2 with the same column labels but no student entries. Therefore, the number of rows in df2 is 0. Printed, this empty DataFrame looks like this:
Empty DataFrame
Columns: [Name, Mathematics, Physics]
Index: []
Code snippet for generating the above DataFrame:
import pandas as pd
# Dictionary for our data
data1 = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df1 = pd.DataFrame(data1)
# Dictionary for our data
data2 = {'Name' : [], 'Mathematics' : [], 'Physics' : []}
# DataFrame for the dictionary
df2 = pd.DataFrame(data2)
# Printing the DataFrame
print(df2)
Here, data2 is a dictionary whose lists are all empty. Passing it to pd.DataFrame() therefore returns a DataFrame that has the three column labels but no rows.
Now let us say that for some reason, we need to check if our DataFrames are empty or not. The expected output is very straightforward since the dataframe df1 is not empty and the dataframe df2 is empty.
An important fact to note here is that Pandas treats a DataFrame as empty whenever either of its axes has length 0. A DataFrame with 0 rows is therefore empty even if it still has one or more columns.
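The point about columns without rows can be checked directly. The following is a small sketch, assuming a hypothetical DataFrame named df_cols_only that is created with column labels only:

```python
import pandas as pd

# Hypothetical frame: three column labels, but no rows at all
df_cols_only = pd.DataFrame(columns=['Name', 'Mathematics', 'Physics'])

# The shape shows 0 rows and 3 columns
print(df_cols_only.shape)  # (0, 3)

# Even though the columns exist, the frame counts as empty
print(df_cols_only.empty)  # True
```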
Expected Output:
DataFrame df1 is not empty
DataFrame df2 is empty
Let us look at different ways of performing this operation:
In this method, we use the dataframe.empty property to check if a Pandas dataframe is empty or not. The property returns a boolean value, which is True if the corresponding dataframe is empty and False if it is not empty.
Therefore, here the variable is_df1_empty stores False, and the variable is_df2_empty stores True.
Let us look at the code for this method and the corresponding output.
import pandas as pd
# Dictionary for our data
data1 = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df1 = pd.DataFrame(data1)
# Dictionary for our data
data2 = {'Name' : [], 'Mathematics' : [], 'Physics' : []}
# DataFrame for the dictionary
df2 = pd.DataFrame(data2)
# Variable to store True if df1 is empty and False if it is not empty
is_df1_empty = df1.empty
# Variable to store True if df2 is empty and False if it is not empty
is_df2_empty = df2.empty
# Print statements inside conditions for printing if a
# particular DataFrame is empty or not.
if is_df1_empty:
  print("DataFrame df1 is empty")
else:
  print("DataFrame df1 is not empty")
if is_df2_empty:
  print("DataFrame df2 is empty")
else:
  print("DataFrame df2 is not empty")
Output:
DataFrame df1 is not empty
DataFrame df2 is empty
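In practice, an empty DataFrame often appears as the result of a filter that matches no rows. The sketch below illustrates this with the running example; the variable name toppers and the chosen condition are only assumptions made for illustration.

```python
import pandas as pd

# Same dictionary as in the running example
data1 = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df1 = pd.DataFrame(data1)

# Marks are out of 10, so no row can satisfy this condition
toppers = df1[df1['Mathematics'] > 10]

# The filtered frame keeps the columns but has no rows, so it is empty
if toppers.empty:
    print("No student scored above 10 in Mathematics")
else:
    print(toppers)
```

Checking the empty property before further processing avoids running later steps on a result that has no rows.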
In this method, we use the built-in len() function to check if a Pandas DataFrame is empty or not. When a DataFrame, say df1, is passed as an argument to this function, it returns the length of the DataFrame, i.e. the number of rows.
If the number of rows is 0 then the dataframe is empty, else the number of rows would be an integer greater than 0 and the dataframe would not be empty.
Therefore, here the variable is_df1_empty stores False and the variable is_df2_empty stores True. Let us look at the code for this method and the corresponding output.
import pandas as pd
# Dictionary for our data
data1 = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df1 = pd.DataFrame(data1)
# Dictionary for our data
data2 = {'Name' : [], 'Mathematics' : [], 'Physics' : []}
# DataFrame for the dictionary
df2 = pd.DataFrame(data2)
# Variable to store True if df1 is empty and False if it is not empty
is_df1_empty = (len(df1) == 0)
# Variable to store True if df2 is empty and False if it is not empty
is_df2_empty = (len(df2) == 0)
# Print statements inside conditions for printing if a
# particular DataFrame is empty or not.
if is_df1_empty:
  print("DataFrame df1 is empty")
else:
  print("DataFrame df1 is not empty")
if is_df2_empty:
  print("DataFrame df2 is empty")
else:
  print("DataFrame df2 is not empty")
Output:
DataFrame df1 is not empty
DataFrame df2 is empty
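Because len() only counts rows, it can disagree with the empty property in one corner case: a DataFrame that has row labels but no columns. The sketch below shows this with a hypothetical DataFrame named df_rows_only. It also shows why an explicit check is needed at all, since truth-testing a DataFrame directly raises an error.

```python
import pandas as pd

# Hypothetical frame: three row labels, but no columns
df_rows_only = pd.DataFrame(index=[0, 1, 2])

print(len(df_rows_only))   # 3, because len() counts the rows
print(df_rows_only.empty)  # True, because the column axis has length 0

# A DataFrame cannot be used directly as a condition:
# pandas raises ValueError because its truth value is ambiguous
try:
    bool(df_rows_only)
except ValueError as err:
    print("ValueError:", err)
```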
In this method, we use the dataframe.index property to check if a Pandas DataFrame is empty or not. This property returns a Pandas Index object containing the row labels of the DataFrame.
When we pass dataframe.index as an argument to the len() function, the returned value is the number of rows in the DataFrame.
If the number of rows is 0 then the dataframe is empty, else the number of rows would be an integer greater than 0 and the dataframe would not be empty.
Therefore, here the variable is_df1_empty stores False, and the variable is_df2_empty stores True.
Let us look at the code for this method and the corresponding output.
import pandas as pd
# Dictionary for our data
data1 = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df1 = pd.DataFrame(data1)
# Dictionary for our data
data2 = {'Name' : [], 'Mathematics' : [], 'Physics' : []}
# DataFrame for the dictionary
df2 = pd.DataFrame(data2)
# Variable to store True if df1 is empty and False if it is not empty
is_df1_empty = (len(df1.index) == 0)
# Variable to store True if df2 is empty and False if it is not empty
is_df2_empty = (len(df2.index) == 0)
# Print statements inside conditions for printing if a
# particular DataFrame is empty or not.
if is_df1_empty:
  print("DataFrame df1 is empty")
else:
  print("DataFrame df1 is not empty")
if is_df2_empty:
  print("DataFrame df2 is empty")
else:
  print("DataFrame df2 is not empty")
Output:
DataFrame df1 is not empty
DataFrame df2 is empty
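The row count obtained from the index matches the first entry of the shape tuple, and the Index object also exposes its own empty property. A minimal sketch using the df2 from the running example (the variable names for the two counts are illustrative):

```python
import pandas as pd

# Empty DataFrame from the running example
df2 = pd.DataFrame({'Name': [], 'Mathematics': [], 'Physics': []})

# Two equivalent ways of reading the number of rows
row_count_from_index = len(df2.index)
row_count_from_shape = df2.shape[0]
print(row_count_from_index, row_count_from_shape)  # 0 0

# The Index object has an empty property of its own
print(df2.index.empty)  # True
```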
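In this method, we use the dataframe.count() function to check if a Pandas DataFrame is empty or not. This function returns a Series holding, for each column, the number of non-NaN entries in that column.
Calling sum() on this Series adds up the column-wise counts. If the total is 0, the DataFrame holds no non-NaN values and we treat it as empty; otherwise it is not empty. Note that this test is stricter than the previous ones: a DataFrame whose rows contain only NaN values also gives a total of 0, even though its row count is greater than 0.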
Therefore, here the variable is_df1_empty stores False and the variable is_df2_empty stores True.
Let us look at the code for this method and the corresponding output.
import pandas as pd
# Dictionary for our data
data1 = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df1 = pd.DataFrame(data1)
# Dictionary for our data
data2 = {'Name' : [], 'Mathematics' : [], 'Physics' : []}
# DataFrame for the dictionary
df2 = pd.DataFrame(data2)
# Variable to store True if df1 is empty and False if it is not empty
is_df1_empty = (df1.count().sum() == 0)
# Variable to store True if df2 is empty and False if it is not empty
is_df2_empty = (df2.count().sum() == 0)
# Print statements inside conditions for printing if a
# particular DataFrame is empty or not.
if is_df1_empty:
  print("DataFrame df1 is empty")
else:
  print("DataFrame df1 is not empty")
if is_df2_empty:
  print("DataFrame df2 is empty")
else:
  print("DataFrame df2 is not empty")
Output:
DataFrame df1 is not empty
DataFrame df2 is empty
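Since count() ignores missing values, this method can report a DataFrame as empty even when it has rows. The sketch below illustrates the difference, assuming a hypothetical DataFrame named df_nan whose entries are all NaN:

```python
import pandas as pd

nan = float('nan')

# Hypothetical frame: two rows exist, but every value is missing
df_nan = pd.DataFrame({'Mathematics': [nan, nan], 'Physics': [nan, nan]})

# count() skips NaN, so the column-wise counts add up to 0
print(df_nan.count().sum() == 0)  # True

# The other checks still see two rows
print(df_nan.empty)  # False
print(len(df_nan))   # 2
```

If rows of missing values should count as data, the empty property or len() is the safer choice. If they should not, count().sum() expresses that intent.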
In this topic, we learned four ways to check whether a Pandas DataFrame is empty: the empty property, len() on the DataFrame, len() on its index, and count() followed by sum(). We followed a running example of students' test scores in different subjects, which gives an intuition for how this check could be applied in real-world situations. Feel free to reach out to info.javaexercise@gmail.com in case of any suggestions.