Removing NaN from List in Python


Updated June 12, 2023

Learn how to efficiently remove NaN (Not a Number) values from lists in Python, making your data analysis and machine learning projects more robust.

Introduction

In data analysis and machine learning, missing or null values can be a major headache. One common issue is the presence of NaN values in numerical datasets. In this article, we’ll explore how to remove NaN values from a list in Python using several methods: NumPy masking, list comprehension, and Pandas.

Definition: What are NaN values?

NaN (Not a Number) is a special floating-point value, defined by the IEEE 754 standard, that represents an undefined or unrepresentable result. In the context of lists, NaN values can arise when:

  • A numerical operation yields undefined results.
  • Data is missing or incomplete.
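To make the two cases above concrete, here is a minimal sketch that uses only the standard library. The variable names are illustrative and not part of any API. It also shows why NaN needs a dedicated check: it never compares equal to anything, including itself.

```python
import math

# An undefined operation: infinity minus infinity has no meaningful value.
undefined = float("inf") - float("inf")

# A placeholder standing in for a missing measurement.
missing = float("nan")

print(math.isnan(undefined))  # True
print(math.isnan(missing))    # True

# NaN never compares equal to anything, including itself,
# so an equality test cannot be used to find it.
print(missing == missing)     # False
```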

Step 1: Identifying NaN Values

Before removing NaN values, it’s essential to detect them in your list. NumPy provides the np.isnan() function for this. Given a list of numbers, it returns a Boolean array that marks which elements are NaN.

import numpy as np

# Create a sample list with NaN value
data = [1, 2, np.nan, 4, 5]

# Check for NaN values using np.isnan()
print(np.isnan(data))

Output:

[False False  True False False]
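The Boolean array can also tell you how many NaN values there are and where they sit, which is often worth checking before deleting anything. The following is a small sketch under the same sample data; nan_count and nan_positions are illustrative names.

```python
import numpy as np

data = [1, 2, np.nan, 4, 5]

nan_mask = np.isnan(data)

# Number of NaN entries (each True counts as 1).
nan_count = int(nan_mask.sum())

# Positions of the NaN entries in the original list.
nan_positions = np.flatnonzero(nan_mask).tolist()

print(nan_count)      # 1
print(nan_positions)  # [2]
```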

Step 2: Removing NaN Values using Masking

One efficient way to remove NaN values is by using masking. Build a Boolean mask that is True wherever the value is not NaN, and then use it to select the values to keep. A plain Python list cannot be indexed with a mask, so convert the list to a NumPy array first.

import numpy as np

# Create a sample list with NaN value
data = [1, 2, np.nan, 4, 5]

# Create a mask for non-NaN values
mask = ~np.isnan(data)

# Convert to an array, then keep only the non-NaN values
clean_data = np.array(data)[mask]

print(clean_data)

Output:

[1. 2. 4. 5.]

The values print as floats because NaN is itself a float, so the array uses a floating-point dtype.
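If you need a plain list back rather than a NumPy array, the masking step can be wrapped in a small function. This is a sketch only: remove_nan is a hypothetical helper name, and it assumes every item can be converted to a float.

```python
import numpy as np

def remove_nan(values):
    """Return a new list with NaN entries removed (hypothetical helper)."""
    # Boolean masks work on arrays, not on plain lists, so convert first.
    arr = np.asarray(values, dtype=float)
    return arr[~np.isnan(arr)].tolist()

print(remove_nan([1, 2, np.nan, 4, 5]))  # [1.0, 2.0, 4.0, 5.0]
```

Note that the integers come back as floats, since the intermediate array has a floating-point dtype.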

Step 3: Removing NaN Values using List Comprehension

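List comprehension is a concise way to create lists. You can use it to remove NaN values by iterating over the original list and only including non-NaN values. This approach keeps the original element types, but np.isnan() raises a TypeError for non-numeric items such as strings, so it suits lists that contain only numbers.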

import numpy as np

# Create a sample list with NaN value
data = [1, 2, np.nan, 4, 5]

# Remove NaN values using list comprehension
clean_data = [value for value in data if not np.isnan(value)]

print(clean_data)

Output:

[1, 2, 4, 5]
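For lists that mix numbers with strings or None, one option is to check the type before testing for NaN. The sketch below uses math.isnan() from the standard library. The is_nan helper is a hypothetical name, and treating only float NaN as missing is an assumption you may want to adjust.

```python
import math

def is_nan(value):
    """True only for float NaN; other types are never treated as NaN."""
    return isinstance(value, float) and math.isnan(value)

mixed = [1, "a", float("nan"), None, 2.5]

# Keep everything that is not a float NaN.
clean = [v for v in mixed if not is_nan(v)]

print(clean)  # [1, 'a', None, 2.5]
```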

Step 4: Removing NaN Values using Pandas DataFrame

If you’re working with a Pandas DataFrame containing NaN values, you can use the dropna() method to remove rows or columns with missing data. By default it drops every row that contains at least one NaN.

import numpy as np
import pandas as pd

# Create a sample DataFrame with NaN values
data = {'A': [1, 2, np.nan, 4, 5],
        'B': ['a', np.nan, 'c', 'd', 'e']}

df = pd.DataFrame(data)

# Remove rows with NaN values
clean_df = df.dropna()

print(clean_df)

Output:

     A  B
0  1.0  a
3  4.0  d
4  5.0  e

Row 1 is dropped because column B is NaN, and row 2 is dropped because column A is NaN.
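dropna() also accepts parameters that control what is removed. The sketch below shows axis=1 for dropping columns and subset for checking only certain columns, plus a Series round trip for a plain list. Column C is an extra column added here purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan, 4, 5],
                   "B": ["a", np.nan, "c", "d", "e"],
                   "C": [10, 20, 30, 40, 50]})  # C is illustrative only

# Drop columns (instead of rows) that contain any NaN.
no_nan_columns = df.dropna(axis=1)
print(list(no_nan_columns.columns))  # ['C']

# Drop rows only when column "A" is NaN.
rows_with_a = df.dropna(subset=["A"])
print(list(rows_with_a.index))  # [0, 1, 3, 4]

# For a plain list, a Series round trip gives the same effect.
clean_list = pd.Series([1, 2, np.nan, 4, 5]).dropna().tolist()
print(clean_list)  # [1.0, 2.0, 4.0, 5.0]
```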

Conclusion

Removing NaN values from a list in Python is a crucial step in ensuring the accuracy and reliability of your data analysis and machine learning projects. By using masking, list comprehension, or the Pandas dropna() method, you can efficiently remove missing values and focus on meaningful insights.
