A DataFrame is the primary data structure of the Pandas library and is commonly used for storing and working with tabular data. A common operation on such data is to convert a single column or row into a Python list so that its values can be processed further.
To start working with Pandas, we first need to import it in the Python code:
import pandas as pd
Let us understand this operation with the help of an example. Consider a DataFrame containing three students named A, B, and C and their marks (out of 10) in two subjects, Mathematics and Physics.
The following code snippet generates this DataFrame:
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Printing the DataFrame
print(df)
Here, data is a dictionary we created to initialize the DataFrame. We pass it to the pd.DataFrame() constructor, which returns a DataFrame whose column labels are the dictionary keys. Since no index is specified, the rows are labelled 0, 1, and 2 by default.
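As a quick check of how the dictionary maps onto the DataFrame, the following minimal sketch inspects the column labels, row labels, and shape. The variable names column_labels, row_labels, and shape are chosen here only for illustration.

```python
import pandas as pd

# Same data as in the running example
data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# The dictionary keys become the column labels
column_labels = list(df.columns)
# With no index given, the rows get the default labels 0, 1, 2
row_labels = list(df.index)
# Shape is (number of rows, number of columns)
shape = df.shape

print(column_labels)  # ['Name', 'Mathematics', 'Physics']
print(row_labels)     # [0, 1, 2]
print(shape)          # (3, 3)
```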
Now suppose we want a list of the values in the Physics column of the DataFrame df. The expected output is:
[7, 9, 8]
If we instead want a list of the values in a particular row, say the one with index 1, the expected output is:
['B', 5, 9]
Let us look at different ways of performing this operation on a given DataFrame:
In this method, we use the built-in list() function to get a list of a column of a given Pandas DataFrame. The column of interest is Physics here. df['Physics'] returns the Physics column as a Pandas Series, and passing that Series to list() returns a Python list of its values. Let us take a look at the code and corresponding output for this method.
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Getting the list of column Physics
required_list = list(df['Physics'])
# Printing the list
print(required_list)
Output :
[7, 9, 8]
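One property worth noting is that the list returned by list() is a new Python object, so changing it does not change the DataFrame. The sketch below illustrates this by appending a made-up mark (6) to the list; the appended value is purely an assumption for demonstration.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# Build a plain Python list from the Physics column
required_list = list(df['Physics'])

# Appending to the list only affects the list itself
required_list.append(6)

print(required_list)  # [7, 9, 8, 6]
print(len(df))        # 3 (the DataFrame still has three rows)
```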
In this method, we use the list() function to get a list of a row of a given Pandas DataFrame. The row of interest is the one with index 1 here. The df.loc property selects rows by index label, so df.loc[1] returns the row labelled 1 as a Pandas Series. When this Series is passed to the list() function, a Python list is returned. Let us take a look at the code and corresponding output for this method.
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Getting the list of row with index 1
required_list = list(df.loc[1])
# Printing the list
print(required_list)
Output :
['B', 5, 9]
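You can also use the dataframe.iloc property in place of dataframe.loc for this operation. Note that .iloc selects a row by its integer position, whereas .loc selects it by its index label; the two give the same row here because df uses the default index 0, 1, 2. Let us take a look at the code and corresponding output for this method.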
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Getting the list of row with index 1
required_list = list(df.iloc[1])
# Printing the list
print(required_list)
Output :
['B', 5, 9]
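The difference between .loc and .iloc only becomes visible when the index is not the default one. The following sketch assumes hypothetical string labels 's1', 's2', and 's3' for the three students (these labels are not part of the original example) and fetches the same row both ways.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
# Hypothetical custom row labels, used only for this illustration
df = pd.DataFrame(data, index=['s1', 's2', 's3'])

# .loc looks the row up by its label
by_label = list(df.loc['s2'])
# .iloc looks the row up by its integer position (0-based)
by_position = list(df.iloc[1])

print(by_label == by_position)  # True

# With string labels, the integer 1 is no longer a valid label for .loc
try:
    df.loc[1]
    label_found = True
except KeyError:
    label_found = False

print(label_found)  # False
```

If the row order is what matters, .iloc is usually the safer choice; if the rows carry meaningful labels, .loc is the more readable one.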
In this method, we use the tolist() method of a Pandas Series to get a list of a column of a given Pandas DataFrame. The column of interest is Physics here. df['Physics'] returns the Physics column as a Series, and calling .tolist() on it returns a Python list. Let us take a look at the code and corresponding output for this method.
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Getting the list of column Physics
required_list = df['Physics'].tolist()
# Printing the list
print(required_list)
Output :
[7, 9, 8]
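For a column like Physics, list() and .tolist() produce the same result. The short sketch below compares the two; the names via_list and via_tolist are illustrative only.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

# Two ways of turning the same column into a Python list
via_list = list(df['Physics'])
via_tolist = df['Physics'].tolist()

print(via_list == via_tolist)  # True
# For an integer column, tolist() yields plain Python ints
print(type(via_tolist[0]))     # <class 'int'>
```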
In this method, we use the .tolist() method to get a list of a row of a given Pandas DataFrame. The row of interest is the one with index 1 here. df.loc[1] returns the row labelled 1 as a Pandas Series. When the .tolist() method is called on this Series, a Python list is returned. Let us take a look at the code and corresponding output for this method.
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Getting the list of row with index 1
required_list = df.loc[1].tolist()
# Printing the list
print(required_list)
Output :
['B', 5, 9]
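You can also use the dataframe.iloc property in place of dataframe.loc for this operation; .iloc selects the row by integer position rather than by index label, which gives the same row here because df uses the default index. Let us take a look at the code and corresponding output for this method.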
import pandas as pd
# Dictionary for our data
data = {'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Getting the list of row with index 1
required_list = df.iloc[1].tolist()
# Printing the list
print(required_list)
Output :
['B', 5, 9]
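If these conversions are needed repeatedly, they can be wrapped in small helpers. The sketch below is one possible arrangement; column_to_list and row_to_list are hypothetical names introduced here, not part of Pandas.

```python
import pandas as pd


def column_to_list(frame, column):
    """Return the values of the named column as a Python list."""
    return frame[column].tolist()


def row_to_list(frame, position):
    """Return the values of the row at the given integer position as a Python list."""
    return frame.iloc[position].tolist()


data = {'Name': ['A', 'B', 'C'], 'Mathematics': [8, 5, 10], 'Physics': [7, 9, 8]}
df = pd.DataFrame(data)

physics_marks = column_to_list(df, 'Physics')
second_row = row_to_list(df, 1)

print(physics_marks == [7, 9, 8])   # True
print(second_row == ['B', 5, 9])    # True
```

The row helper uses .iloc so that it keeps working even if the DataFrame is later given a non-default index.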
In this topic, we have learned how to get a list of a column or row of an existing Pandas DataFrame using the list() function and the .tolist() method, following a running example of students' test scores in different subjects to show how this could be applied in real-world situations. Feel free to reach out to info.javaexercise@gmail.com with any suggestions.