A DataFrame is the primary data structure of the Pandas library and is commonly used for storing and working with tabular data. A common operation on such data is checking whether a given value is present in a column, usually as a first step toward extracting information from it.
To start working with Pandas, we first need to import it in Python code:
import pandas as pd
Let us understand this operation with the help of an example. Consider a DataFrame containing three students named A, B, and C and their marks (out of 10) in two subjects, Mathematics and Physics.
Code snippet for generating this DataFrame:
import pandas as pd
# Dictionary for our data
data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Printing the DataFrame
print(df)
Here, data is a dictionary whose keys become the column labels and whose lists become the column values. We pass it to the pd.DataFrame() constructor, which returns the corresponding DataFrame.
Now suppose we want to determine whether a particular value, say 7, is present in the Physics column of the DataFrame df. The resulting output will look as follows:
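As a quick check on how the dictionary maps onto the DataFrame, the following sketch inspects the column labels and the shape of df. It uses only the data from the running example; the names columns and shape are introduced here purely for illustration.

```python
import pandas as pd

# Same dictionary as in the running example
data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# Dictionary keys become the column labels, in insertion order
columns = list(df.columns)
# Each list becomes one column, so there is one row per student
shape = df.shape

print(columns)  # ['Name', 'Mathematics', 'Physics']
print(shape)    # (3, 3)
```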
7 is present in the column Physics
If we instead want to determine whether a value that does not occur, say 2, is present in the Physics column of df, the resulting output will look as follows:
2 is not present in the column Physics
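The two messages above can be produced by one small helper. The sketch below assumes a hypothetical function name, presence_message, which is not part of Pandas, and relies on the list-based membership check described later in this article.

```python
import pandas as pd

def presence_message(df, column, value):
    """Return a sentence saying whether `value` occurs in `df[column]`.

    Hypothetical helper for illustration; not part of the Pandas API.
    """
    # Convert the column to a plain list so that `in` compares against the values
    found = value in df[column].tolist()
    status = "is present" if found else "is not present"
    return f"{value} {status} in the column {column}"

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

print(presence_message(df, 'Physics', 7))  # 7 is present in the column Physics
print(presence_message(df, 'Physics', 2))  # 2 is not present in the column Physics
```

Wrapping the check in a function avoids repeating the if/else block for every value we want to test.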
Let us look at different ways of performing this operation on a given DataFrame:
In this method, we use the tolist() method of a Pandas Series to determine whether a particular value is present in a column of a given Pandas DataFrame.
Let the column of interest be the one labeled Physics. df['Physics'] returns this column as a Series, and calling .tolist() on it gives a plain Python list of the values in that column. We then use the in keyword to check whether a value is in this list.
Let us take a look at the code and corresponding output for this method. Here, tolist() simply converts the column into a list.
import pandas as pd
# Dictionary for our data
data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Determining if 2 and 7 are present in column Physics
is2 = 2 in df['Physics'].tolist()
is7 = 7 in df['Physics'].tolist()
# Printing the output
if is2:
    print("2 is present in the column Physics")
else:
    print("2 is not present in the column Physics")
if is7:
    print("7 is present in the column Physics")
else:
    print("7 is not present in the column Physics")
Output:
2 is not present in the column Physics
7 is present in the column Physics
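The conversion to a list is not just cosmetic. Applied directly to a Series, the in keyword tests the index labels rather than the stored values, so skipping the conversion can silently give the wrong answer. The minimal sketch below illustrates this with a standalone Series holding the same Physics marks; the variable names are chosen here for illustration.

```python
import pandas as pd

# Same values as df['Physics'], with the default index labels 0, 1, 2
physics = pd.Series([7, 9, 8], name='Physics')

# `in` on a Series looks at the index labels, not the values
value_in_series = 7 in physics        # False: 7 is not an index label
label_in_series = 0 in physics        # True: 0 is an index label

# After converting to a list, `in` compares against the values
value_in_list = 7 in physics.tolist()  # True

print(value_in_series, label_in_series, value_in_list)  # False True True
```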
In this method, we use Python's built-in list() function to determine whether a particular value is present in a column of a given Pandas DataFrame.
Let the column of interest be the one labeled Physics. df['Physics'] returns this column as a Series, and passing it to list() gives a list of the values in that column. We then use the in keyword to check whether a value is in this list.
Let us take a look at the code and corresponding output for this method. Here, list() simply converts the column into a list.
import pandas as pd
# Dictionary for our data
data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Determining if 2 and 7 are present in column Physics
is2 = 2 in list(df['Physics'])
is7 = 7 in list(df['Physics'])
# Printing the output
if is2:
    print("2 is present in the column Physics")
else:
    print("2 is not present in the column Physics")
if is7:
    print("7 is present in the column Physics")
else:
    print("7 is not present in the column Physics")
Output:
2 is not present in the column Physics
7 is present in the column Physics
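When several values need to be checked against the same column, it may be tidier to convert the column once and reuse the resulting list. The following is a minimal sketch of that idea; the names physics_marks and results are introduced here for illustration only.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# Convert the column to a list a single time
physics_marks = list(df['Physics'])

# Check several candidate values against the same list
results = {value: value in physics_marks for value in (2, 7, 9)}

print(results)  # {2: False, 7: True, 9: True}
```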
In this method, we use the values property to determine whether a particular value is present in a column of a given Pandas DataFrame.
Let the column of interest be the one labeled Physics. df['Physics'] returns this column as a Series, and the df['Physics'].values property gives us a NumPy array of the values in that column. We then use the in keyword to check whether a value is in this array.
Let us take a look at the code and corresponding output for this method.
import pandas as pd
# Dictionary for our data
data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Determining if 2 and 7 are present in column Physics
is2 = 2 in df['Physics'].values
is7 = 7 in df['Physics'].values
# Printing the output
if is2:
    print("2 is present in the column Physics")
else:
    print("2 is not present in the column Physics")
if is7:
    print("7 is present in the column Physics")
else:
    print("7 is not present in the column Physics")
Output:
2 is not present in the column Physics
7 is present in the column Physics
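To make the return type concrete, the sketch below checks that .values yields a NumPy array rather than a Python list and that the in keyword still compares against the stored values. It also shows Series.to_numpy(), which returns the same kind of array; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

values = df['Physics'].values        # NumPy array of the column values
as_array = df['Physics'].to_numpy()  # equivalent array via the to_numpy() method

is_ndarray = isinstance(values, np.ndarray)
# `in` on a NumPy array compares element-wise against the values
has_7 = bool(7 in values)
has_2 = bool(2 in as_array)

print(is_ndarray, has_7, has_2)  # True True False
```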
In this topic, we learned how to determine whether a value is present in a column of an existing DataFrame. The running example of students' test scores in different subjects shows how this check can be applied in real-world situations. Feel free to reach out to info.javaexercise@gmail.com with any suggestions.