How to Add New Columns in a DataFrame Pandas?

Pandas is a well-known Python library used for data manipulation and analysis. Adding a new column to a DataFrame object is pretty straightforward. The Pandas library provides a few different techniques to do this.

In this tutorial, we will learn how to add a new column in a DataFrame.

Adding Columns To Dataframe by Using Simple Column Assignment

The simplest way of adding a new column at the end of a DataFrame is by using the [] accessor. We need to specify the column name inside the square brackets.

Let's first create a DataFrame object.

import pandas as pd

#creating the data frame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})

print(dataframe.head())

  Student Name  Marks Grade
0       Justin     56     C
1      Jessica     71     B
2        Simon     31     D
3        Harry     92     A
4       Victor     40     C

Now, let's add a new GPA column to the above DataFrame.

dataframe["GPA"] = [8.6, 8.9, 7.8, 9.1, 8.0]
print(dataframe)

Output:

  Student Name  Marks Grade  GPA
0       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
3        Harry     92     A  9.1
4       Victor     40     C  8.0
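
The [] accessor also accepts a single scalar, which is repeated for every row, or an expression built from existing columns. The following is a minimal sketch of both cases. The column names "School" and "Passed", the value "Central High", and the pass mark of 40 are assumptions made for illustration and are not part of the tutorial data.

```python
import pandas as pd

# Same sample data as in the tutorial
dataframe = pd.DataFrame({
    "Student Name": ["Justin", "Jessica", "Simon", "Harry", "Victor"],
    "Marks": [56, 71, 31, 92, 40],
    "Grade": ["C", "B", "D", "A", "C"],
})

# A scalar is repeated for every row (hypothetical column and value)
dataframe["School"] = "Central High"

# A column derived from an existing one (hypothetical pass mark of 40)
dataframe["Passed"] = dataframe["Marks"] >= 40

print(dataframe)
```

In both cases the new column is appended at the end, just as with a list of values.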

Adding Columns To Dataframe by Using insert() Method

The method discussed in the previous section will always add a new column at the end of the DataFrame. We can use insert() to add a column at some other location.

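This method takes the position at which to insert the column (loc), the column name (column), and the column values (value) as parameters. It also takes an optional boolean allow_duplicates parameter, which permits inserting a column whose name already exists in the DataFrame.

Let's add a new column at the very beginning of the DataFrame (index 0).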

import pandas as pd

#creating the data frame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})

#Adding a column at index 0
c4 = [8.6, 8.9, 7.8, 9.1, 8.0]
dataframe.insert(0, "GPA", c4, allow_duplicates=True)
print(dataframe)

Output:

   GPA Student Name  Marks Grade
0  8.6       Justin     56     C
1  8.9      Jessica     71     B
2  7.8        Simon     31     D
3  9.1        Harry     92     A
4  8.0       Victor     40     C
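
To illustrate what allow_duplicates controls, the sketch below inserts a column in the middle of a smaller DataFrame and then tries to insert a second column with the same name. The three-row data and the flag variable duplicate_rejected are assumptions made for this illustration only.

```python
import pandas as pd

dataframe = pd.DataFrame({
    "Student Name": ["Justin", "Jessica", "Simon"],
    "Marks": [56, 71, 31],
})

# Insert a new column at position 1 (between the two existing columns)
dataframe.insert(loc=1, column="GPA", value=[8.6, 8.9, 7.8])

# Inserting a column whose name already exists raises ValueError by default
try:
    dataframe.insert(loc=0, column="GPA", value=[1.0, 2.0, 3.0])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# With allow_duplicates=True the second "GPA" column is accepted
dataframe.insert(loc=0, column="GPA", value=[1.0, 2.0, 3.0], allow_duplicates=True)

print(duplicate_rejected)
print(list(dataframe.columns))
```

Note that insert() modifies the DataFrame in place and returns None, so its result should not be assigned back to the variable.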

Adding Columns To Dataframe by Using assign() Method

The assign() method is also a simple way of adding a new column to a DataFrame. However, it will not modify the existing DataFrame. Instead, it returns a new DataFrame object. We can make our current reference point to the new DataFrame. The following code demonstrates the use of this method.

import pandas as pd

#creating the data frame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})

#Adding a column
dataframe = dataframe.assign(GPA = [8.6, 8.9, 7.8, 9.1, 8.0])
print(dataframe)

Output:

  Student Name  Marks Grade  GPA
0       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
3        Harry     92     A  9.1
4       Victor     40     C  8.0
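
The sketch below shows that assign() leaves the original DataFrame untouched, and that it can add several columns in one call, including one computed by a callable. The variable names original and with_extra and the derived "Fraction" column are assumptions made for illustration.

```python
import pandas as pd

original = pd.DataFrame({
    "Student Name": ["Justin", "Jessica", "Simon", "Harry", "Victor"],
    "Marks": [56, 71, 31, 92, 40],
})

# assign() returns a new DataFrame; a callable receives the DataFrame
# and returns the values for the new column
with_extra = original.assign(
    GPA=[8.6, 8.9, 7.8, 9.1, 8.0],
    Fraction=lambda df: df["Marks"] / 100,  # hypothetical derived column
)

print(list(original.columns))    # the original has no new columns
print(list(with_extra.columns))  # the copy has both new columns
```

Because the result is a new object, assign() fits well in method chains where the intermediate DataFrame is never given a name.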

Adding Columns To Dataframe by Using loc

The loc property of a DataFrame is used to access a group of rows and columns by label. We can also use the loc property to insert a new column: assigning to a column label that does not exist yet creates that column. The following Python code demonstrates this.

import pandas as pd

#creating the data frame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})

#Adding a column
dataframe.loc[:, "GPA"] = [8.6, 8.9, 7.8, 9.1, 8.0]
print(dataframe)

Output:

  Student Name  Marks Grade  GPA
0       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
3        Harry     92     A  9.1
4       Victor     40     C  8.0
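
Since loc selects rows as well as columns, it can also create a column while filling only some of the rows. The sketch below assumes a hypothetical "Result" column and a hypothetical pass mark of 40; neither comes from the tutorial data.

```python
import pandas as pd

dataframe = pd.DataFrame({
    "Student Name": ["Justin", "Jessica", "Simon", "Harry", "Victor"],
    "Marks": [56, 71, 31, 92, 40],
})

# Hypothetical rule: marks of 40 or more count as a pass
passed = dataframe["Marks"] >= 40

# Assigning through loc with a row mask creates the column;
# rows outside the mask stay missing until they are filled
dataframe.loc[passed, "Result"] = "Pass"
dataframe.loc[~passed, "Result"] = "Fail"

print(dataframe)
```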

Adding Columns To Dataframe by Working with Pandas Series

A thing to note about the methods discussed above is that they may not give the desired outcome if the column to add is a Pandas Series.

Pandas aligns a Series with the DataFrame by index label, not by position. If the DataFrame index is not in the same order as the Series index, the values of the new column will appear jumbled.

For example, consider the following DataFrame object where the indexes are not in order.

import pandas as pd

#creating the data frame
c1 = ["Justin", "Jessica", "Simon", "Harry", "Victor"]
c2 = [56, 71, 31, 92, 40]
c3 = ["C", "B", "D", "A", "C"]

dataframe = pd.DataFrame({"Student Name": c1, "Marks": c2, "Grade": c3})
dataframe.index = [4, 1, 2, 0, 3] #Jumbling the dataframe index
print(dataframe)

Output:

  Student Name  Marks Grade
4       Justin     56     C
1      Jessica     71     B
2        Simon     31     D
0        Harry     92     A
3       Victor     40     C

Now, let's add a Pandas Series column from top to bottom (the first entry of the Series should be the first entry of the DataFrame column, the second entry of the Series should be the second entry of the DataFrame column, and so on).

gpa = pd.Series([8.6, 8.9, 7.8, 9.1, 8.0])
dataframe["GPA"] = gpa
print(dataframe)

Output:

  Student Name  Marks Grade  GPA
4       Justin     56     C  8.0
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
0        Harry     92     A  8.6
3       Victor     40     C  9.1

As we can see, the Series data gets jumbled. It happens because the Series has its own index (0 to n-1 by default), and Pandas matches the two indexes by label rather than by position. To avoid this, we can use the values property of the Series, which returns the underlying array without an index.

gpa = pd.Series([8.6, 8.9, 7.8, 9.1, 8.0])
dataframe["GPA"] = gpa.values #Using values property 
print(dataframe)

Output:

  Student Name  Marks Grade  GPA
4       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
0        Harry     92     A  9.1
3       Victor     40     C  8.0

Or we can change the index of the Series to match the index of the DataFrame.

gpa = pd.Series([8.6, 8.9, 7.8, 9.1, 8.0])
gpa.index = dataframe.index #Changing the index
dataframe["GPA"] = gpa
print(dataframe)

Output:

  Student Name  Marks Grade  GPA
4       Justin     56     C  8.6
1      Jessica     71     B  8.9
2        Simon     31     D  7.8
0        Harry     92     A  9.1
3       Victor     40     C  8.0
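
The alignment behaviour described above can be seen side by side in the following sketch. It uses a smaller three-row DataFrame and hypothetical column names ("By Label", "By Position", "Partial") chosen for illustration. It also uses to_numpy(), which, like the values property, returns the data without an index.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {"Student Name": ["Justin", "Jessica", "Simon"]},
    index=[2, 0, 1],  # index deliberately out of order
)
gpa = pd.Series([8.6, 8.9, 7.8])  # default index 0, 1, 2

# Aligned by index label: row 2 gets gpa[2], row 0 gets gpa[0], and so on
dataframe["By Label"] = gpa

# Positional: the plain array carries no index, so the order is preserved
dataframe["By Position"] = gpa.to_numpy()

# Labels that are missing from the Series become NaN in the new column
dataframe["Partial"] = pd.Series([9.9], index=[0])

print(dataframe)
```

Label alignment is useful when the Series really does carry matching labels, for example when it was computed from the same DataFrame. Positional assignment is the safer choice when the Series is just a list of values in row order.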

Summary

Adding a new column to a Pandas DataFrame is pretty simple. The square brackets method is the most intuitive and easiest to remember. Use the insert() method if you wish to add a column at some other index. Use assign() to create a new DataFrame with an additional column.

The assign() method will not alter the original DataFrame. Because Pandas aligns a Series with the DataFrame by index label, make sure to use the values property or change the index of the Pandas Series object if its index does not match the index of the DataFrame.

Useful References:

Pandas Official Documentation
