Introduction to Pandas DataFrame
Pandas is a powerful and widely used Python library for data analysis and manipulation. One of its core structures is the DataFrame, which provides an efficient way to handle tabular data, similar to spreadsheets or SQL tables. In this article, we will explore what a Pandas DataFrame is, its key features, and how to work with it.
What is a Pandas DataFrame?
A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure in Pandas. It consists of labeled rows and columns, where:
- Rows represent individual records (like database rows).
- Columns represent different data attributes (like fields in a table).
- Each column can contain different data types (integers, strings, floats, etc.).
DataFrames are built on top of NumPy arrays and can be created from various data sources, including dictionaries, lists, CSV files, SQL databases, and JSON files.
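The properties listed above can be checked directly on a small DataFrame. The following is a minimal sketch, assuming Pandas is already installed; the variable name people and the Height column are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],   # text column
    'Age': [25, 30, 35],                   # integer column
    'Height': [1.65, 1.80, 1.75],          # float column
})

print(people.shape)    # (rows, columns) -> (3, 3)
print(people.dtypes)   # one data type per column
```

Each column keeps its own data type, which is what "heterogeneous" means in practice.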
Creating a Pandas DataFrame
To work with DataFrames, you need to install Pandas if you haven’t already:
pip install pandas
Then, import Pandas:
import pandas as pd
Creating a DataFrame from a Dictionary
One of the most common ways to create a DataFrame is using a dictionary:
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
Creating a DataFrame from a List of Lists
You can also create a DataFrame from a list of lists:
data = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago']
]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
Creating a DataFrame from a CSV File
To read data from a CSV file:
df = pd.read_csv('data.csv')
print(df.head()) # Displays the first 5 rows
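If you do not have a CSV file at hand, the same call can be tried on in-memory text. This is a sketch under the assumption that the file has a header row; the CSV content and the name sample_df are hypothetical stand-ins for a real data.csv.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as 'data.csv'.
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# read_csv accepts a file path or any file-like object.
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df.head(2))   # pass a number to head() to change the default of 5 rows
```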
Common DataFrame Operations
Accessing Columns
To access a specific column:
print(df['Name'])
Accessing Rows
Use .loc[] and .iloc[] to access specific rows:
print(df.loc[0]) # Access row by label (index)
print(df.iloc[1]) # Access row by position
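With the default integer index, labels and positions happen to coincide, so the difference between the two accessors is easy to miss. The sketch below assumes a hypothetical DataFrame with string labels ('a', 'b', 'c') to make the distinction visible.

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['a', 'b', 'c'],   # custom string labels (illustrative)
)

by_label = people.loc['b']       # row whose index label is 'b'
by_position = people.iloc[1]     # second row, counting from 0
print(by_label['Name'], by_position['Name'])   # both refer to Bob

# Slices behave differently: .loc includes the end label, .iloc excludes the end position.
label_slice = people.loc['a':'b']    # rows 'a' and 'b'
position_slice = people.iloc[0:1]    # only the first row
```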
Filtering Data
You can filter data using conditions:
filtered_df = df[df['Age'] > 28]
print(filtered_df)
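Several conditions can be combined in one filter. A minimal sketch, assuming the same sample columns as earlier; the name people is illustrative.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Wrap each condition in parentheses and combine with & (and), | (or), ~ (not).
# Python's plain 'and' / 'or' keywords do not work element-wise on columns.
older_not_chicago = people[(people['Age'] > 28) & (people['City'] != 'Chicago')]
print(older_not_chicago)   # only Bob matches both conditions
```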
Adding a New Column
To add a new column:
df['Salary'] = [50000, 60000, 70000]
print(df)
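A new column does not have to come from a hand-written list. The sketch below shows two other common patterns; the column names Active and AgeInMonths are made up for illustration.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

people['Active'] = True                       # a single value is repeated for every row
people['AgeInMonths'] = people['Age'] * 12    # computed element-wise from another column
print(people)
```

When assigning a list, its length must match the number of rows; otherwise Pandas raises a ValueError.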
Deleting a Column
To remove a column:
df = df.drop(columns=['Salary'])
print(df)
Sorting Data
Sort data by a column:
df = df.sort_values(by='Age', ascending=False)
print(df)
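Sorting reorders the rows but keeps their original index labels. If you want the index renumbered from zero afterwards, reset_index can be used, as in this small sketch (variable names are illustrative):

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

oldest_first = people.sort_values(by='Age', ascending=False)
print(oldest_first.index.tolist())   # [2, 1, 0]: the labels travel with their rows

# drop=True discards the old labels instead of keeping them as a new column.
renumbered = oldest_first.reset_index(drop=True)
print(renumbered.index.tolist())     # [0, 1, 2]
```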
Grouping Data
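You can group data using groupby() and then aggregate a numeric column such as Age. Selecting the column first avoids trying to average text columns such as Name, which raises an error in Pandas 2.0 and later:
grouped_df = df.groupby('City')['Age'].mean()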
print(grouped_df)
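The sample data used so far has one row per city, so the effect of grouping is easier to see when some values repeat. A minimal sketch, assuming a hypothetical fourth person (Dana) and cities changed so that two people share each one:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, 45],
    'City': ['New York', 'Chicago', 'Chicago', 'New York'],
})

# Select the numeric column before aggregating.
mean_age = people.groupby('City')['Age'].mean()
print(mean_age)

# agg() computes several statistics per group in one call.
summary = people.groupby('City')['Age'].agg(['mean', 'max', 'count'])
print(summary)
```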
Exporting Data
Save the DataFrame as a CSV file:
df.to_csv('output.csv', index=False)
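To see what index=False changes, the export can be tried without touching the disk. In this sketch, to_csv is called without a path, in which case it returns the CSV text as a string; the variable names are illustrative.

```python
import io
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# No path given, so the CSV text is returned instead of written to a file.
csv_text = people.to_csv(index=False)
restored = pd.read_csv(io.StringIO(csv_text))
print(list(restored.columns))     # ['Name', 'Age']

# Without index=False the row labels are written as an extra, unnamed first column.
with_index = pd.read_csv(io.StringIO(people.to_csv()))
print(list(with_index.columns))   # ['Unnamed: 0', 'Name', 'Age']
```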