Pandas DataFrames in Python

When working with data in Python, the pandas library is one of the most popular tools. It helps manage and analyze data in a simple way. One of the most important features in pandas is the DataFrame.

What is a Pandas DataFrame?

A pandas DataFrame is a table-like data structure. It stores data in rows and columns, just like a spreadsheet or a SQL table. This makes it easy to read, understand, and work with data.

You can use a DataFrame to:

  • Store different types of data
  • Access and update specific parts of the data
  • Perform operations like sorting, filtering, and grouping

The DataFrame is usually the starting point for data analysis in Python.
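As a rough illustration of the sorting, filtering, and grouping operations listed above, here is a minimal sketch. The names, cities, and ages are hypothetical sample data, not taken from a real dataset.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    'Name': ['Tom', 'Jerry', 'Mickey', 'Donald'],
    'City': ['Paris', 'London', 'Paris', 'London'],
    'Age': [20, 22, 25, 19],
})

# Sorting: order the rows by Age, youngest first
by_age = df.sort_values('Age')

# Filtering: keep only the rows where Age is above 20
over_20 = df[df['Age'] > 20]

# Grouping: average Age for each City
mean_age = df.groupby('City')['Age'].mean()

print(by_age)
print(over_20)
print(mean_age)
```

Each operation returns a new object, so the original df stays unchanged.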

Main Features of Pandas DataFrame

Let’s understand what makes a DataFrame useful:

1. Two-Dimensional Structure

A DataFrame holds data in two dimensions. This means it has rows and columns, similar to an Excel sheet. Each row and each column has a label or index.

2. Size-Mutable

You can change the size of the DataFrame. You can add or remove rows and columns whenever needed.

3. Heterogeneous Data

A DataFrame can hold different types of data. For example, one column can have numbers while another column has text.

4. Labeled Axes

The rows and columns in a DataFrame have labels. You can use these labels to easily select data.

These features make the DataFrame a powerful structure for handling real-world data.
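The four features above can be checked on a small DataFrame. This is only a sketch; the column names and the Score values are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Jerry', 'Mickey'],  # text column
    'Age': [20, 22, 25],                 # integer column
})

print(df.shape)          # two dimensions: (rows, columns)
print(df.dtypes)         # heterogeneous: one data type per column
print(list(df.columns))  # column labels
print(list(df.index))    # row labels

# Size-mutable: adding a column changes the shape
df['Score'] = [88.5, 92.0, 79.5]
print(df.shape)
```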

Pandas DataFrame Analogy

You can think of a pandas DataFrame as a dictionary of Series. A Series in pandas is a one-dimensional array with labels. So, a DataFrame is like a bunch of Series placed side by side, sharing the same row labels. This mental model helps explain how pandas aligns data by label.
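To make the analogy concrete, the sketch below builds a DataFrame from two Series whose row labels only partly overlap. The labels and values are hypothetical. Rows are matched by label, and a label missing from one Series gets a missing value (NaN) in that column.

```python
import pandas as pd

# Two Series with partly different row labels (hypothetical data)
ages = pd.Series([20, 22], index=['tom', 'jerry'])
cities = pd.Series(['Paris', 'London'], index=['jerry', 'mickey'])

# Each dictionary key becomes a column; rows are matched by label
df = pd.DataFrame({'Age': ages, 'City': cities})
print(df)

# Selecting a column gives back a Series
print(type(df['Age']).__name__)
```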

How to Use Pandas in Python

Before you can use a DataFrame, you need to install and import pandas. You can install pandas by running this command in a terminal:

shell
pip install pandas

To use pandas in your Python code, import it like this:

python
import pandas as pd

pd is the conventional alias for pandas. It keeps your code shorter and easier to read.

Create a Pandas DataFrame

This section is for beginners who want to know how to create a DataFrame in Python using the pandas library. We'll use basic examples that are easy to understand.

Different Ways to Create a Pandas DataFrame

You can create a pandas DataFrame in multiple ways. The most common methods are:

  • From a list
  • From a list of lists
  • From a dictionary
  • From a list of dictionaries

Let’s look at each method with simple examples.

1. Create DataFrame from a List

You can create a DataFrame from a single list. In this case, each element in the list becomes a row.

Example:

python
import pandas as pd

# A list of strings
data = ['Python', 'Pandas', 'Data', 'Frame']

# Create DataFrame
df = pd.DataFrame(data)

# Display result
print(df)

Output:

text
        0
0  Python
1  Pandas
2    Data
3   Frame

Because no column names were given, pandas labels the single column 0 and numbers the rows from 0 to 3.
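If you would rather have a meaningful column label than the default 0, one option is to pass the columns parameter. In this sketch, 'Word' is a hypothetical name chosen for the example.

```python
import pandas as pd

data = ['Python', 'Pandas', 'Data', 'Frame']

# 'Word' is a hypothetical column name chosen for this example
df = pd.DataFrame(data, columns=['Word'])
print(df)
```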

2. Create DataFrame from a List of Lists

Each inner list becomes a row, and each item in an inner list becomes the value for one column of that row.

Example:

python
import pandas as pd

# List of lists
data = [['Tom', 20], ['Jerry', 22], ['Mickey', 25]]

# Create DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])

print(df)

Output:

text
     Name  Age
0     Tom   20
1   Jerry   22
2  Mickey   25

This example shows how you can give names to the columns using the columns parameter.

3. Create DataFrame from a Dictionary

A dictionary can also be used to create a DataFrame. The keys in the dictionary become column names, and the values become column data.

Example:

python
import pandas as pd

# Dictionary with equal-length lists
data = {
    'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
    'Age': [20, 21, 19, 18]
}

df = pd.DataFrame(data)

print(df)

Output:

text
    Name  Age
0    Tom   20
1   Nick   21
2  Krish   19
3   Jack   18

Note: Make sure all the lists in the dictionary have the same length. Otherwise, pandas raises a ValueError.
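Here is a minimal sketch of what happens when the lengths differ. The data is the dictionary from the example with one age removed, and the try/except is only there so the script keeps running.

```python
import pandas as pd

# 'Age' has one value fewer than 'Name'
data = {
    'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
    'Age': [20, 21, 19],
}

try:
    df = pd.DataFrame(data)
    error = None
except ValueError as exc:
    # pandas refuses to build a DataFrame from columns of unequal length
    error = exc
    print('Could not build the DataFrame:', exc)
```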

4. Create DataFrame from a List of Dictionaries

Each dictionary becomes a row, and the keys become column names.

Example:

python
import pandas as pd

# List of dictionaries
data = [
    {'Name': 'Tom', 'Age': 20},
    {'Name': 'Nick', 'Age': 21},
    {'Name': 'Krish', 'Age': 19}
]

df = pd.DataFrame(data)

print(df)

Output:

text
    Name  Age
0    Tom   20
1   Nick   21
2  Krish   19

This method is very common when loading data from external sources like JSON or APIs.
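As a sketch of the JSON case, the example below parses a small JSON string with Python's json module and passes the resulting list of dictionaries to pd.DataFrame. The JSON text is hypothetical and stands in for an API response. One record has no "Age" key, which shows that missing keys become missing values (NaN).

```python
import json
import pandas as pd

# Hypothetical JSON text, standing in for an API response
raw = '[{"Name": "Tom", "Age": 20}, {"Name": "Nick"}]'

records = json.loads(raw)   # a list of dictionaries
df = pd.DataFrame(records)
print(df)

# The second record has no "Age" key, so pandas stores NaN there
print(df['Age'].isna().tolist())
```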

Rows and Columns in a DataFrame

This section is helpful for beginners who want to learn how to access, select, and update rows and columns in a pandas DataFrame. It covers basic operations that are used often in data analysis.

Access Columns in a DataFrame

To access a column, you can use either square brackets [] or dot . notation.

Example 1: Using Square Brackets

python
import pandas as pd

data = {
    'Name': ['Tom', 'Jerry', 'Mickey'],
    'Age': [20, 21, 19]
}

df = pd.DataFrame(data)

# Access 'Name' column
print(df['Name'])

Output:

text
0       Tom
1     Jerry
2    Mickey
Name: Name, dtype: object

Use square brackets if the column name has spaces or special characters.

Example 2: Using Dot Notation

python
print(df.Name)

This gives the same output. Avoid dot notation, however, if the column name contains spaces or clashes with an existing DataFrame attribute or method, such as count or shape.
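The sketch below shows both problems on a hypothetical DataFrame: one column name contains a space, and the other shares its name with the DataFrame method count().

```python
import pandas as pd

# 'count' is also the name of a DataFrame method; 'First Name' contains a space
df = pd.DataFrame({
    'First Name': ['Tom', 'Jerry'],
    'count': [3, 5],
})

print(df['First Name'])    # square brackets work for any column name
print(df['count'])         # the column
print(callable(df.count))  # dot notation gives the method, not the column
```

Square brackets always refer to the column, which is why they are the safer default.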

Access Multiple Columns

You can pass a list of column names to get more than one column.

python
print(df[['Name', 'Age']])
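One detail worth knowing: a single label returns a Series, while a list of labels returns a DataFrame, even if the list holds only one name. A short sketch, with the variable names one and many chosen just for this example:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Mickey'], 'Age': [20, 21, 19]})

one = df['Name']      # single label: a Series
many = df[['Name']]   # list of labels: a DataFrame, even with one column

print(type(one).__name__)
print(type(many).__name__)
```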

Access Rows in a DataFrame

You can use .loc[] or .iloc[] to access rows.

1. .loc[] for Row by Label

.loc[] selects rows by index label. Use it when you know the label of the row you want.

python
# Get row with index label 1
print(df.loc[1])

Output:

text
Name    Jerry
Age        21
Name: 1, dtype: object

2. .iloc[] for Row by Position

.iloc[] is used for accessing rows by their position (like using list indexing).

python
# Get second row (position 1)
print(df.iloc[1])

The output is the same as .loc[1] in this case, because the default index labels (0, 1, 2) happen to match the row positions.
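The difference between the two becomes visible once the index is not the default 0, 1, 2. In this sketch the labels 'a', 'b', and 'c' are hypothetical and chosen only to make the contrast clear.

```python
import pandas as pd

# Hypothetical index labels 'a', 'b', 'c' replace the default 0, 1, 2
df = pd.DataFrame(
    {'Name': ['Tom', 'Jerry', 'Mickey'], 'Age': [20, 21, 19]},
    index=['a', 'b', 'c'],
)

print(df.loc['b'])   # by label
print(df.iloc[1])    # by position; the same row here

# df.loc[1] would raise a KeyError, because no row is labeled 1
```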

Access a Cell (Specific Value)

You can combine row and column selection.

python
# Get the value in row 1, column 'Name'
print(df.loc[1, 'Name'])  # Output: Jerry

Or using position:

python
print(df.iloc[1, 0])  # Output: Jerry
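The same row and column selection can be used on the left side of an assignment to update values. This is a minimal sketch, and the new ages are arbitrary example values.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Mickey'], 'Age': [20, 21, 19]})

# Update one cell by label
df.loc[1, 'Age'] = 22

# Update one cell by position (row 2, column 1)
df.iloc[2, 1] = 30

# Update every row that matches a condition
df.loc[df['Name'] == 'Tom', 'Age'] = 25

print(df)
```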

Add a New Column

You can add a new column by assigning to a column name that does not exist yet. A list assigned this way must have one value per row.

python
df['Country'] = ['USA', 'UK', 'Canada']
print(df)

Output:

text
     Name  Age  Country
0     Tom   20      USA
1   Jerry   21       UK
2  Mickey   19   Canada
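A new column does not have to come from a list. As a sketch, the example below assigns a single value to every row and computes another column from an existing one. The column names Active and AgeNextYear are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Mickey'], 'Age': [20, 21, 19]})

# A single value is repeated for every row
df['Active'] = True

# A column computed from another column, row by row
df['AgeNextYear'] = df['Age'] + 1

print(df)
```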

Remove a Column

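Use the drop() method with axis=1. drop() returns a new DataFrame and leaves the original unchanged, so assign the result back to df to keep the change.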

python
df = df.drop('Country', axis=1)
print(df)
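Two related variations, sketched on a fresh copy of the example data: the columns keyword is an alternative to axis=1, and without either of them drop() removes rows by index label. The variable names no_country and no_first_row are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Jerry', 'Mickey'],
    'Age': [20, 21, 19],
    'Country': ['USA', 'UK', 'Canada'],
})

# columns= is an alternative to axis=1
no_country = df.drop(columns=['Country'])

# Rows are dropped by index label (axis=0 is the default)
no_first_row = df.drop(0)

# drop() returns a new DataFrame; the original is unchanged
print(list(no_country.columns))
print(list(no_first_row.index))
print(df.shape)
```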

Frequently Asked Questions