Pandas DataFrames in Python
When working with data in Python, the pandas library is one of the most popular tools. It helps manage and analyze data in a simple way. One of the most important features in pandas is the DataFrame.
What is a Pandas DataFrame?
A pandas DataFrame is a table-like data structure. It stores data in rows and columns, just like a spreadsheet or a SQL table. This makes it easy to read, understand, and work with data.
You can use a DataFrame to:
- Store different types of data
- Access and update specific parts of the data
- Perform operations like sorting, filtering, and grouping
The DataFrame is usually the first structure you learn when you start doing data analysis in Python.
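As a rough illustration of the operations listed above, the sketch below sorts, filters, and groups a small DataFrame. The sample data and the `Team` column are made up for this example.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration
df = pd.DataFrame({
    'Name': ['Tom', 'Jerry', 'Mickey', 'Donald'],
    'Team': ['A', 'B', 'A', 'B'],
    'Age': [20, 22, 25, 19],
})

# Sorting: order rows by Age, youngest first
by_age = df.sort_values('Age')

# Filtering: keep only rows where Age is above 20
older = df[df['Age'] > 20]

# Grouping: average Age per Team
mean_age = df.groupby('Team')['Age'].mean()

print(by_age)
print(older)
print(mean_age)
```

Each of these operations returns a new object and leaves `df` unchanged.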
Main Features of Pandas DataFrame
Let’s understand what makes a DataFrame useful:
1. Two-Dimensional Structure
A DataFrame holds data in two dimensions. This means it has rows and columns, similar to an Excel sheet. Each row and each column has a label or index.
2. Size-Mutable
You can change the size of the DataFrame. You can add or remove rows and columns whenever needed.
3. Heterogeneous Data
A DataFrame can hold different types of data. For example, one column can have numbers while another column has text.
4. Labeled Axes
The rows and columns in a DataFrame have labels. You can use these labels to easily select data.
These features make the DataFrame a powerful structure for handling real-world data.
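The four features above can be checked on a small DataFrame. This is a minimal sketch; the names, ages, and the `City` column are invented for the example.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'Name': ['Tom', 'Jerry'], 'Age': [20, 22]})

# Two-dimensional: shape is (number of rows, number of columns)
shape_before = df.shape
print(shape_before)

# Heterogeneous: each column has its own data type
print(df.dtypes)

# Labeled axes: column labels and row labels
print(list(df.columns))
print(list(df.index))

# Size-mutable: add a column, then add a row with label 2
df['City'] = ['Paris', 'Rome']
df.loc[2] = ['Mickey', 25, 'Oslo']
print(df.shape)
```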
Pandas DataFrame Analogy
You can think of a pandas DataFrame as a dictionary of Series. A Series in pandas is a one-dimensional array with labels. So, a DataFrame is like a bunch of Series placed side by side, sharing the same row labels. This idea helps you understand how pandas stores and aligns data internally.
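To make the analogy concrete, here is a small sketch that builds a DataFrame from a dictionary of Series. The index labels `a`, `b`, and `c` are made up for the example. The second Series has no entry for `c`, so pandas fills that cell with a missing value when it aligns the rows.

```python
import pandas as pd

# Two Series with partly different row labels (hypothetical data)
names = pd.Series(['Tom', 'Jerry', 'Mickey'], index=['a', 'b', 'c'])
ages = pd.Series([20, 22], index=['a', 'b'])  # no entry for 'c'

# Dictionary keys become column names; rows are aligned by label
df = pd.DataFrame({'Name': names, 'Age': ages})
print(df)

# Selecting one column gives back a Series
print(type(df['Name']))
```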
How to Use Pandas in Python
Before you use DataFrame, you need to install and import pandas. You can install pandas using this command:
```shell
pip install pandas
```
To use pandas in your Python code, import it like this:
```python
import pandas as pd
```
The `pd` part is a short name or alias that makes your code cleaner and easier to read.
Create a Pandas DataFrame
This section is for beginners who want to know how to create a DataFrame in Python using the pandas library. We'll use basic examples that are easy to understand.
Different Ways to Create a Pandas DataFrame
You can create a pandas DataFrame in multiple ways. The most common methods are:
- From a list
- From a list of lists
- From a dictionary
- From a list of dictionaries
Let’s look at each method with simple examples.
1. Create DataFrame from a List
You can create a DataFrame from a single list. In this case, each element in the list becomes a row.
Example:
```python
import pandas as pd

# A list of strings
data = ['Python', 'Pandas', 'Data', 'Frame']

# Create DataFrame
df = pd.DataFrame(data)

# Display result
print(df)
```
Output:
```text
        0
0  Python
1  Pandas
2    Data
3   Frame
```
Because no column name was given, pandas labels the single column `0` and numbers the rows from `0` to `3`.
2. Create DataFrame from a List of Lists
Each list inside the main list becomes a row, and each item inside becomes a column.
Example:
```python
import pandas as pd

# List of lists
data = [['Tom', 20], ['Jerry', 22], ['Mickey', 25]]

# Create DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])

print(df)
```
Output:
```text
     Name  Age
0     Tom   20
1   Jerry   22
2  Mickey   25
```
This example shows how you can give names to the columns using the `columns` parameter.
3. Create DataFrame from a Dictionary
A dictionary can also be used to create a DataFrame. The keys in the dictionary become column names, and the values become column data.
Example:
```python
import pandas as pd

# Dictionary with equal-length lists
data = {
    'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
    'Age': [20, 21, 19, 18]
}

df = pd.DataFrame(data)

print(df)
```
Output:
```text
    Name  Age
0    Tom   20
1   Nick   21
2  Krish   19
3   Jack   18
```
Note: Make sure all the lists in the dictionary have the same length. Otherwise, pandas raises a `ValueError`.
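The sketch below shows what happens when the lists differ in length. The data is made up, and the `try`/`except` block is only there so the script keeps running and prints the error.

```python
import pandas as pd

# 'Name' has 3 items but 'Age' has only 2 (deliberately mismatched)
data = {'Name': ['Tom', 'Nick', 'Krish'], 'Age': [20, 21]}

error_message = None
try:
    df = pd.DataFrame(data)
except ValueError as err:
    # pandas refuses to build a DataFrame from unequal-length lists
    error_message = str(err)
    print('Could not build DataFrame:', error_message)
```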
4. Create DataFrame from a List of Dictionaries
Each dictionary becomes a row, and the keys become column names.
Example:
```python
import pandas as pd

# List of dictionaries
data = [
    {'Name': 'Tom', 'Age': 20},
    {'Name': 'Nick', 'Age': 21},
    {'Name': 'Krish', 'Age': 19}
]

df = pd.DataFrame(data)

print(df)
```
Output:
```text
    Name  Age
0    Tom   20
1   Nick   21
2  Krish   19
```
This method is very common when loading data from external sources like JSON or APIs.
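As a small sketch of the JSON case, the example below parses a JSON string with Python's standard `json` module and passes the resulting list of dictionaries to `pd.DataFrame`. The JSON text is made up. The second record has no `Age` key, so pandas leaves that cell as a missing value.

```python
import json
import pandas as pd

# Hypothetical JSON text, as it might arrive from a file or an API
json_text = '[{"Name": "Tom", "Age": 20}, {"Name": "Nick"}]'

# json.loads turns the text into a list of dictionaries
records = json.loads(json_text)

# Each dictionary becomes a row; missing keys become missing values
df = pd.DataFrame(records)
print(df)
```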
Rows and Columns in a DataFrame
This section is helpful for beginners who want to learn how to access, select, and update rows and columns in a pandas DataFrame. It covers basic operations that are used often in data analysis.
Access Columns in a DataFrame
To access a column, you can use either square brackets `[]` or dot `.` notation.
Example 1: Using Square Brackets
```python
import pandas as pd

data = {
    'Name': ['Tom', 'Jerry', 'Mickey'],
    'Age': [20, 21, 19]
}

df = pd.DataFrame(data)

# Access 'Name' column
print(df['Name'])
```
Output:
```text
0       Tom
1     Jerry
2    Mickey
Name: Name, dtype: object
```
Use square brackets if the column name has spaces or special characters.
Example 2: Using Dot Notation
```python
print(df.Name)
```
This gives the same output. Avoid dot notation if the column name contains spaces or has the same name as an existing DataFrame attribute or method, such as `count` or `mean`.
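Here is a short sketch of the problem with dot notation. The column names `count` and `unit price` are chosen for this example because one matches a DataFrame method and the other contains a space.

```python
import pandas as pd

# Hypothetical data with two awkward column names
df = pd.DataFrame({'count': [3, 5], 'unit price': [1.5, 2.0]})

# Square brackets always return the column
print(df['count'])
print(df['unit price'])

# Dot notation finds the DataFrame.count method, not the column
print(callable(df.count))  # True

# df.unit price is not valid Python, so brackets are the only option
```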
Access Multiple Columns
You can pass a list of column names to get more than one column.
```python
print(df[['Name', 'Age']])
```
Access Rows in a DataFrame
You can use `.loc[]` or `.iloc[]` to access rows.
1. .loc[] for Row by Label
`.loc[]` selects rows by index label. Use it when you know the label of the row you want.
```python
# Get row with index label 1
print(df.loc[1])
```
Output:
```text
Name    Jerry
Age        21
Name: 1, dtype: object
```
2. .iloc[] for Row by Position
`.iloc[]` is used for accessing rows by their position (like using list indexing).
```python
# Get second row (position 1)
print(df.iloc[1])
```
The output is the same as `.loc[1]` in this case, because the default index labels (0, 1, 2) match the row positions.
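The two accessors give different results once the index labels no longer match the positions. The sketch below uses a made-up index of `10`, `20`, `30` to show this.

```python
import pandas as pd

# Same data as before, but with custom row labels (hypothetical)
df = pd.DataFrame(
    {'Name': ['Tom', 'Jerry', 'Mickey'], 'Age': [20, 21, 19]},
    index=[10, 20, 30],
)

# .loc[] looks up the label 20
by_label = df.loc[20]

# .iloc[] looks up position 1 (the second row)
by_position = df.iloc[1]

print(by_label['Name'], by_position['Name'])

# There is no row labeled 1, so .loc[1] fails here
try:
    df.loc[1]
    label_1_found = True
except KeyError:
    label_1_found = False
print(label_1_found)
```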
Access a Cell (Specific Value)
You can combine row and column selection.
```python
# Get the value in row 1, column 'Name'
print(df.loc[1, 'Name'])  # Output: Jerry
```
Or using position:
```python
print(df.iloc[1, 0])  # Output: Jerry
```
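The same row and column selection can also appear on the left side of an assignment to update a value. A minimal sketch, with replacement values made up for the example:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Jerry', 'Mickey'],
    'Age': [20, 21, 19]
})

# Update by label: row 1, column 'Age'
df.loc[1, 'Age'] = 22

# Update by position: third row, first column
df.iloc[2, 0] = 'Minnie'

print(df)
```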
Add a New Column
You can add a new column using assignment.
```python
df['Country'] = ['USA', 'UK', 'Canada']
print(df)
```
Output:
```text
     Name  Age  Country
0     Tom   20      USA
1   Jerry   21       UK
2  Mickey   19   Canada
```
Remove a Column
Use the `drop()` function with `axis=1`.
```python
df = df.drop('Country', axis=1)
print(df)
```
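Note that `drop()` returns a new DataFrame and leaves the original alone unless you assign the result back. The sketch below also uses the `columns=` keyword, which is an equivalent way to write `axis=1`. The variable name `trimmed` is just for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Jerry', 'Mickey'],
    'Age': [20, 21, 19],
    'Country': ['USA', 'UK', 'Canada']
})

# Same effect as df.drop('Country', axis=1)
trimmed = df.drop(columns=['Country'])

print(list(trimmed.columns))  # the new DataFrame has no 'Country'
print(list(df.columns))       # the original still has it
```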