A DataFrame is the primary data structure of the Pandas library in Python and is commonly used for storing and working with tabular data.
A common operation that could be performed on such data is to convert the type of a column to date-time in order to update existing information and have it in a more meaningful format.
To start working with Pandas, we first need to import it:
import pandas as pd
Let us understand this operation with the help of an example. Consider the following DataFrame containing 3 students with names A, B, and C and their corresponding marks (out of 10) for two subjects, Mathematics and Physics, with a column for dates.
Code snippet for generating the above DataFrame :
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'date': ['10/1/2022', '11/1/2022', '12/1/2022'], 'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Printing
print(df)
Here, data is a dictionary that we created to initialize the DataFrame. For this, we use the DataFrame() function of the Pandas library which takes the dictionary as an argument and returns the required DataFrame.
Now, let’s say we need to convert the date column from string type to datetime type. The output doesn’t change much and can be interpreted in the same manner but the object type does change. The resulting output would look like this :
Let us look at different ways of performing this operation on a given DataFrame :
In this method, we use the pandas.to_datetime() function to convert the date column to date-time type.
The column to be converted is passed as a parameter to this function and the required object is returned. Reassignment is required here since the update is not done in an in place manner.
Let us take a look at the corresponding code snippet and generated output for this method:
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'date': ['10/1/2022', '11/1/2022', '12/1/2022'], 'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
df['date'] = pd.to_datetime(df['date'])
# Printing
print(df)
Output :
Instead of using df[‘date’] to access the date column of the DataFrame as shown earlier, we can also use df.date to access it instead and get the same results. Let us take a look at the corresponding code snippet and generated output for this method:
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'date': ['10/1/2022', '11/1/2022', '12/1/2022'], 'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
df.date = pd.to_datetime(df.date)
# Printing
print(df)
Output :
In this method, we use the .astype() function to convert the date column to datetime type. The desired column, here date, is accessed using square brackets in the manner df[‘date’], and then we apply the .astype() function with the desired data type, datetime64 passed as a parameter here.
The returned object is the updated column which requires reassignment since the updates are not made in an in place manner.
Let us take a look at the corresponding code snippet and generated output for this method :
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'date': ['10/1/2022', '11/1/2022', '12/1/2022'], 'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
df['date'] = df['date'].astype('datetime64')
# Printing
print(df)
Output :
Instead of using df[‘date’] to access the date column of the DataFrame as shown earlier, we can also use df.date to access it instead and get the same results.
Let us take a look at the corresponding code snippet and generated output for this method :
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'date': ['10/1/2022', '11/1/2022', '12/1/2022'], 'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
df.date = df.date.astype('datetime64')
# Printing
print(df)
Output :
In this method, we use the .apply() function to convert the date column to datetime type.
The desired column, here date, is accessed using square brackets in the manner df[‘date’] and then we use the .apply() function with the desired function, pd.to_datetime passed as a parameter here.
The returned object is the updated column which requires reassignment since the updates are not made in an in place manner.
Let us take a look at the corresponding code snippet and generated output for this method:
# Importing pandas
import pandas as pd
# Dictionary for our data
data = {'date': ['10/1/2022', '11/1/2022', '12/1/2022'], 'Name' : ['A', 'B', 'C'], 'Mathematics' : [8, 5, 10], 'Physics' : [7, 9, 8]}
# DataFrame for the dictionary
df = pd.DataFrame(data)
# Performing the operation
df.date = df.date.apply(pd.to_datetime)
# Printing
print(df)
Output :
In this topic, we have learned to convert a column to datetime type in an existing Pandas DataFrame, following a running example of test scores of students in different subjects, thus giving us an intuition of how this concept could be applied in real-world situations. Feel free to reach out to info.javaexercise@gmail.com in case of any suggestions.