Javaexercise.com

How to Add New Columns to a Pandas DataFrame

Pandas is a well-known Python library used for data manipulation and analysis. Adding a new column to a DataFrame object is pretty straightforward. The Pandas library provides a few different techniques to do this.

In this tutorial, we will learn several ways to add a new column to a DataFrame.

Adding Columns to a DataFrame by Using the [] Accessor

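The simplest way to add a new column at the end of a DataFrame is to assign to it with the [] accessor. We put the new column name inside the square brackets and assign the values on the right-hand side. A list must contain exactly one value per row, and if a column with that name already exists, its values are replaced.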

Let's first create a DataFrame object.

import pandas as pd

# Create the DataFrame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})

# head() returns the first five rows
print(dataframe.head())

Output:

  Student Name  Marks Grade
0       Justin     56     C
1      Jessica     71     B
2        Simon     31     D
3        Harry     92     A
4       Victor     40     C

Now, let's add a new GPA column to the above DataFrame.

dataframe["GPA"] = [8.6, 8.9, 7.8, 9.1, 8.0]
print(dataframe)

Output:

  Student Name  Marks Grade  GPA
0       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
3        Harry     92     A  9.1
4       Victor     40     C  8.0
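
The [] accessor also accepts a single scalar, which pandas broadcasts to every row, or an expression computed from existing columns, and assigning to a name that already exists replaces that column. The sketch below illustrates all three on the same sample data; the Semester column and the pass mark of 40 are hypothetical values chosen only for illustration.

```python
import pandas as pd

dataframe = pd.DataFrame({
    "Student Name": ["Justin", "Jessica", "Simon", "Harry", "Victor"],
    "Marks": [56, 71, 31, 92, 40],
    "Grade": ["C", "B", "D", "A", "C"],
})

# A scalar is broadcast to every row (hypothetical "Semester" column)
dataframe["Semester"] = 1

# A column derived from an existing one (hypothetical pass mark of 40)
dataframe["Passed"] = dataframe["Marks"] >= 40

# Assigning to an existing name replaces that column's values
dataframe["Grade"] = dataframe["Grade"].str.lower()

print(dataframe)
```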

Adding Columns to a DataFrame by Using the insert() Method

The [] accessor always adds a new column at the end of the DataFrame. To place a column at a specific position instead, we can use the insert() method.

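This method takes the position (an integer column index), the column name, and the column values as parameters. It also accepts an optional boolean allow_duplicates parameter, which defaults to False; in that case, inserting a column whose name already exists raises a ValueError. Note that insert() modifies the DataFrame in place and returns None.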

Let's add a new column at the very beginning of the DataFrame (index 0).

import pandas as pd

# Create the DataFrame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})

# Add a column at index 0
c4 = [8.6, 8.9, 7.8, 9.1, 8.0]
# allow_duplicates=True would permit a repeated column name; "GPA" is new here
dataframe.insert(0, "GPA", c4, allow_duplicates=True)
print(dataframe)

Output:

   GPA Student Name  Marks Grade
0  8.6       Justin     56     C
1  8.9      Jessica     71     B
2  7.8        Simon     31     D
3  9.1        Harry     92     A
4  8.0       Victor     40     C
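
Hard-coding the position passed to insert() can break if the column order changes. A minimal sketch, assuming you want the new column right after an existing one: look up that column's position with columns.get_loc() and add 1. It also shows what happens with the default allow_duplicates=False when the name is already taken.

```python
import pandas as pd

dataframe = pd.DataFrame({
    "Student Name": ["Justin", "Jessica", "Simon", "Harry", "Victor"],
    "Marks": [56, 71, 31, 92, 40],
    "Grade": ["C", "B", "D", "A", "C"],
})

# Place GPA immediately after "Marks" instead of hard-coding a position
position = dataframe.columns.get_loc("Marks") + 1
dataframe.insert(position, "GPA", [8.6, 8.9, 7.8, 9.1, 8.0])

# With the default allow_duplicates=False, reusing an existing name raises ValueError
try:
    dataframe.insert(0, "GPA", [0.0] * 5)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(dataframe)
print("Duplicate rejected:", duplicate_rejected)
```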

Adding Columns to a DataFrame by Using the assign() Method

The assign() method is another simple way to add a new column to a DataFrame. Unlike the previous two methods, it does not modify the existing DataFrame; instead, it returns a new DataFrame object that includes the additional column. To keep the result, we reassign it to the same variable, as the following code demonstrates.

import pandas as pd

# Create the DataFrame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})

# assign() returns a new DataFrame, so rebind the name to the result
dataframe = dataframe.assign(GPA=[8.6, 8.9, 7.8, 9.1, 8.0])
print(dataframe)

Output:

  Student Name  Marks Grade  GPA
0       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
3        Harry     92     A  9.1
4       Victor     40     C  8.0
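
assign() also accepts callables: each one is called with the DataFrame being built and must return the column values, and later arguments can refer to columns created by earlier ones in the same call. Because keyword names must be valid Python identifiers, a name containing a space can be passed by unpacking a dictionary with **. The Rank and Top Student columns below are illustrative only.

```python
import pandas as pd

original = pd.DataFrame({
    "Student Name": ["Justin", "Jessica", "Simon", "Harry", "Victor"],
    "Marks": [56, 71, 31, 92, 40],
})

result = original.assign(
    # A callable receives the DataFrame being built (illustrative "Rank" column)
    Rank=lambda df: df["Marks"].rank(ascending=False).astype(int),
    # ** unpacking allows a name with a space; this callable reads "Rank" from above
    **{"Top Student": lambda df: df["Rank"] == 1},
)

print(result)
print(list(original.columns))  # the original DataFrame is unchanged
```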

Adding Columns to a DataFrame by Using loc

The loc property of a DataFrame accesses a group of rows and columns by label. When the column label passed to loc does not exist yet, assigning to it creates a new column. The following Python code demonstrates this.

import pandas as pd

# Create the DataFrame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})

# ":" selects every row; "GPA" is a new column label
dataframe.loc[:, "GPA"] = [8.6, 8.9, 7.8, 9.1, 8.0]
print(dataframe)

Output:

  Student Name  Marks Grade  GPA
0       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
3        Harry     92     A  9.1
4       Victor     40     C  8.0
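
loc becomes more useful than plain [] when you want to set values only for some rows. A sketch, assuming a hypothetical 70-mark threshold for a Distinction column: rows selected by the boolean mask receive the value, and every other row of the newly created column is filled with NaN.

```python
import pandas as pd

dataframe = pd.DataFrame({
    "Student Name": ["Justin", "Jessica", "Simon", "Harry", "Victor"],
    "Marks": [56, 71, 31, 92, 40],
})

# Hypothetical rule: 70 marks or more earns a distinction.
# "Distinction" does not exist yet, so loc creates it; unselected rows get NaN.
dataframe.loc[dataframe["Marks"] >= 70, "Distinction"] = "Yes"

print(dataframe)
```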

Adding a Pandas Series as a Column

One thing to note about the methods above: when the values being added are a Pandas Series rather than a list, pandas aligns the Series with the DataFrame by index label, not by position.

So if the DataFrame's index labels are not in the default order (0, 1, 2, ...), the Series values will not land in the rows you might expect.

For example, consider the following DataFrame object whose index labels are not in order.

import pandas as pd

# Create the DataFrame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})
dataframe.index = [4, 1, 2, 0, 3]  # Shuffle the index labels
print(dataframe)

Output:

  Student Name  Marks Grade
4       Justin     56     C
1      Jessica     71     B
2        Simon     31     D
0        Harry     92     A
3       Victor     40     C

Now, let's add a Pandas Series as a column, intending to fill it from top to bottom: the first entry of the Series should become the value in the first row, the second entry the value in the second row, and so on.

gpa = pd.Series([8.6, 8.9, 7.8, 9.1, 8.0])
dataframe["GPA"] = gpa
print(dataframe)

Output:

  Student Name  Marks Grade  GPA
4       Justin     56     C  8.0
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
0        Harry     92     A  8.6
3       Victor     40     C  9.1

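As we can see, the GPA values are out of order. This happens because the Series has its own default index (0 to n-1), and pandas matches each Series value to the DataFrame row with the same index label rather than the same position. For example, the DataFrame's first row has label 4, so it receives gpa[4], which is 8.0. To avoid this, we can use the values property, which returns the Series data without its index, so the values are assigned by position.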

gpa = pd.Series([8.6, 8.9, 7.8, 9.1, 8.0])
dataframe["GPA"] = gpa.values  # .values drops the index, so values are assigned by position
print(dataframe)

Output:

  Student Name  Marks Grade  GPA
4       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
0        Harry     92     A  9.1
3       Victor     40     C  8.0

Or we can change the index of the Series to match the index of the DataFrame.

gpa = pd.Series([8.6, 8.9, 7.8, 9.1, 8.0])
gpa.index = dataframe.index  # Give the Series the same labels as the DataFrame
dataframe["GPA"] = gpa
print(dataframe)

Output:

  Student Name  Marks Grade  GPA
4       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
0        Harry     92     A  9.1
3       Victor     40     C  8.0
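
Label alignment also explains two other surprises. If the Series lacks some of the DataFrame's labels, those rows get NaN; and a positional array (from values or the to_numpy() method) must have exactly one value per row, otherwise pandas raises a ValueError. The sketch below shows both; the partial Series and its labels are made up for illustration.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {"Student Name": ["Justin", "Jessica", "Simon", "Harry", "Victor"]},
    index=[4, 1, 2, 0, 3],
)

# Made-up Series: label 7 is not in the DataFrame and is dropped;
# DataFrame rows labelled 4, 2 and 3 have no match and get NaN.
partial = pd.Series([8.6, 8.9, 7.8], index=[0, 1, 7])
dataframe["Aligned"] = partial

# to_numpy(), like .values, strips the index, so values are placed by position
gpa = pd.Series([8.6, 8.9, 7.8, 9.1, 8.0])
dataframe["Positional"] = gpa.to_numpy()

# A positional array needs exactly one value per row; 3 values for 5 rows fails
try:
    dataframe["TooShort"] = partial.to_numpy()
    length_checked = False
except ValueError:
    length_checked = True

print(dataframe)
```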

Summary

Adding a new column to a Pandas DataFrame is pretty simple. The square brackets method is the most intuitive and easiest to remember. Use the insert() method to add a column at a specific position, or the loc property to create a column by label. Use assign() when you want a new DataFrame with the additional column while leaving the original DataFrame unchanged.

When the new column is a Pandas Series, pandas aligns it with the DataFrame by index label. If the DataFrame's index is not in the default order, use the values property or set the Series index to match the DataFrame's index.

 

Useful References: