Pandas is a well-known Python library used for data manipulation and analysis. Adding a new column to a DataFrame object is pretty straightforward. The Pandas library provides a few different techniques to do this.
In this tutorial, we will learn how to add a new column in a DataFrame.
The simplest way of adding a new column at the end of a DataFrame is by using the [] accessor. We need to specify the column name inside the square brackets.
Let's first create a DataFrame object.
import pandas as pd

# Create the DataFrame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})

dataframe.head()
  Student Name  Marks Grade
0       Justin     56     C
1      Jessica     71     B
2        Simon     31     D
3        Harry     92     A
4       Victor     40     C
Now, let's add a new GPA column to the above DataFrame.
dataframe["GPA"] = [8.6, 8.9, 7.8, 9.1, 8.0]
print(dataframe)
Output:
  Student Name  Marks Grade  GPA
0       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
3        Harry     92     A  9.1
4       Victor     40     C  8.0
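The [] accessor is not limited to lists of literal values. As a small sketch (the column names Passed and Section, and the pass mark of 40, are illustrative assumptions and not part of the original example), it can also store a value computed from an existing column, or a single scalar that Pandas repeats for every row:

```python
import pandas as pd

dataframe = pd.DataFrame({
    "Student Name": ["Justin", "Jessica", "Simon", "Harry", "Victor"],
    "Marks": [56, 71, 31, 92, 40],
    "Grade": ["C", "B", "D", "A", "C"],
})

# Derived column: True where Marks is at least 40 (assumed pass mark)
dataframe["Passed"] = dataframe["Marks"] >= 40

# Scalar value: Pandas broadcasts it to every row
dataframe["Section"] = "A"

print(dataframe)
```

Note that a list whose length differs from the number of rows raises a ValueError, so list-based columns must supply exactly one value per row.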
The method discussed in the previous section always adds the new column at the end of the DataFrame. We can use insert() to add a column at any other position.
This method takes the insertion position (loc), the column name, and the column values as parameters. It also takes an optional boolean allow_duplicates parameter, which, when set to True, permits inserting a column whose name already exists.
Let's add a new column at the very beginning of the DataFrame (position 0).
import pandas as pd

# Create the DataFrame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})

# Add a column at position 0
c4 = [8.6, 8.9, 7.8, 9.1, 8.0]
dataframe.insert(0, "GPA", c4, allow_duplicates=True)
print(dataframe)
Output:
   GPA Student Name  Marks Grade
0  8.6       Justin     56     C
1  8.9      Jessica     71     B
2  7.8        Simon     31     D
3  9.1        Harry     92     A
4  8.0       Victor     40     C
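Hard-coding the position works for small examples. As a sketch of an alternative, the position can be looked up from an existing column name with columns.get_loc(); the variable names position and duplicate_rejected are illustrative. The example also shows that, unless allow_duplicates is set to True, insert() raises a ValueError when the column name already exists.

```python
import pandas as pd

dataframe = pd.DataFrame({
    "Student Name": ["Justin", "Jessica", "Simon", "Harry", "Victor"],
    "Marks": [56, 71, 31, 92, 40],
    "Grade": ["C", "B", "D", "A", "C"],
})
c4 = [8.6, 8.9, 7.8, 9.1, 8.0]

# Find the position of "Marks" and insert "GPA" right after it
position = dataframe.columns.get_loc("Marks") + 1
dataframe.insert(position, "GPA", c4)
print(list(dataframe.columns))

# Inserting the same name again fails unless allow_duplicates=True
try:
    dataframe.insert(0, "GPA", c4)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Unlike assign(), insert() modifies the DataFrame in place and returns None, so its result should not be assigned back to the variable.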
The assign() method is also a simple way of adding a new column to a DataFrame. However, it will not modify the existing DataFrame. Instead, it returns a new DataFrame object. We can make our current reference point to the new DataFrame. The following code demonstrates the use of this method.
import pandas as pd

# Create the DataFrame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})

# Add a column; assign() returns a new DataFrame, so rebind the name
dataframe = dataframe.assign(GPA=[8.6, 8.9, 7.8, 9.1, 8.0])
print(dataframe)
Output:
  Student Name  Marks Grade  GPA
0       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
3        Harry     92     A  9.1
4       Victor     40     C  8.0
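To make the "returns a new DataFrame" behaviour concrete, here is a minimal sketch that keeps both objects around. The names original and with_columns, the Honors column, and the 8.5 cutoff are assumptions made for illustration. It also uses a callable argument, which assign() evaluates against the DataFrame being built, so it can refer to a column created earlier in the same call.

```python
import pandas as pd

original = pd.DataFrame({
    "Student Name": ["Justin", "Jessica", "Simon", "Harry", "Victor"],
    "Marks": [56, 71, 31, 92, 40],
    "Grade": ["C", "B", "D", "A", "C"],
})

# assign() leaves `original` untouched and returns a copy with the new columns
with_columns = original.assign(
    GPA=[8.6, 8.9, 7.8, 9.1, 8.0],
    # The lambda receives the DataFrame that already contains GPA
    Honors=lambda df: df["GPA"] >= 8.5,
)

print(list(original.columns))
print(list(with_columns.columns))
```

Because nothing is modified in place, assign() fits naturally into method chains where each step produces a new DataFrame.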
The loc property of a DataFrame is used to access a group of rows or columns. We can also use the loc property to insert a new column. The following Python code demonstrates this.
import pandas as pd

# Create the DataFrame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})

# Add a column: select all rows and the new column label
dataframe.loc[:, "GPA"] = [8.6, 8.9, 7.8, 9.1, 8.0]
print(dataframe)
Output:
  Student Name  Marks Grade  GPA
0       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
3        Harry     92     A  9.1
4       Victor     40     C  8.0
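Because loc selects rows as well as columns, it can also create a column while filling only some of its rows. The following is a sketch under assumed names (a Result column and a pass mark of 40, neither of which appears in the original example); rows not matched by the condition are left as NaN until they are filled.

```python
import pandas as pd

dataframe = pd.DataFrame({
    "Student Name": ["Justin", "Jessica", "Simon", "Harry", "Victor"],
    "Marks": [56, 71, 31, 92, 40],
    "Grade": ["C", "B", "D", "A", "C"],
})

# Create "Result" and set it only for rows where Marks is at least 40
dataframe.loc[dataframe["Marks"] >= 40, "Result"] = "Pass"

# The remaining rows hold NaN; fill them with the other label
dataframe["Result"] = dataframe["Result"].fillna("Fail")

print(dataframe)
```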
One caveat with the methods discussed above is that they can give unexpected results when the column being added is a Pandas Series.
Pandas matches a Series to the DataFrame by index label, not by position. If the DataFrame's index is not in the default order, the Series values will be placed in a different order than they appear in the Series.
For example, consider the following DataFrame object where the index is not in order.
import pandas as pd

# Create the DataFrame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})
dataframe.index = [4, 1, 2, 0, 3]  # Shuffle the DataFrame index
print(dataframe)
Output:
  Student Name  Marks Grade
4       Justin     56     C
1      Jessica     71     B
2        Simon     31     D
0        Harry     92     A
3       Victor     40     C
Now, let's try to add a Pandas Series as a column from top to bottom. That is, the first entry of the Series should become the first entry of the DataFrame column, the second entry of the Series the second entry of the column, and so on.
gpa = pd.Series([8.6, 8.9, 7.8, 9.1, 8.0])
dataframe["GPA"] = gpa
print(dataframe)
Output:
  Student Name  Marks Grade  GPA
4       Justin     56     C  8.0
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
0        Harry     92     A  8.6
3       Victor     40     C  9.1
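As we can see, the Series data gets jumbled. This happens because the Series has its own index (0 to n-1 by default), and Pandas matches Series values to DataFrame rows by index label rather than by position. To avoid this, we can use the values property of the Series, which returns the underlying data without the index.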
gpa = pd.Series([8.6, 8.9, 7.8, 9.1, 8.0])
dataframe["GPA"] = gpa.values  # Use the values property to drop the Series index
print(dataframe)
Output:
  Student Name  Marks Grade  GPA
4       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
0        Harry     92     A  9.1
3       Victor     40     C  8.0
Or we can change the index of the Series to match the index of the DataFrame.
gpa = pd.Series([8.6, 8.9, 7.8, 9.1, 8.0])
gpa.index = dataframe.index  # Give the Series the same index as the DataFrame
dataframe["GPA"] = gpa
print(dataframe)
Output:
  Student Name  Marks Grade  GPA
4       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
0        Harry     92     A  9.1
3       Victor     40     C  8.0
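Index alignment can also leave gaps, not just reorder values. The sketch below uses a smaller, made-up DataFrame whose index contains a label (4) that the Series does not have, and compares label-based assignment with positional assignment. It uses to_numpy(), which the Pandas documentation recommends over the values property; the column names "GPA aligned" and "GPA positional" are illustrative.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {"Student Name": ["Justin", "Jessica", "Simon"], "Marks": [56, 71, 31]},
    index=[4, 1, 2],
)

# The Series has labels 0, 1, 2; only labels 1 and 2 exist in the DataFrame
gpa = pd.Series([8.6, 8.9, 7.8])

# Matched by label: the row labelled 4 has no match and becomes NaN
dataframe["GPA aligned"] = gpa

# Matched by position: the index of the Series is ignored
dataframe["GPA positional"] = gpa.to_numpy()

print(dataframe)
```

If the values are meant to line up by position, stripping the index as above (or building the Series with index=dataframe.index in the first place) avoids both reordering and NaN gaps.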
Adding a new column to a Pandas DataFrame is pretty simple. The square brackets method is the most intuitive and easiest to remember. Use the insert() method if you wish to add a column at some other index. Use assign() to create a new DataFrame with an additional column.
Remember that assign() does not alter the original DataFrame. When adding a Pandas Series to a DataFrame whose index does not match the Series index, use the values property or set the Series index to the DataFrame's index first.