Working with Excel or CSV Data in Python: A Comprehensive Guide

DS - VRP
3 min readJul 14, 2024

--

Python offers a plethora of libraries for working with Excel and CSV data. Whether you’re dealing with simple CSV files or complex Excel spreadsheets, there’s a tool that fits your needs. This article will guide you through the most popular libraries and provide interactive examples to help you get started.

1. Using pandas

pandas is a powerful data manipulation library that provides data structures like DataFrames to work with structured data efficiently. It offers various functions to read and write data from and to different file formats like CSV and Excel.

Reading CSV Files

import pandas as pd

# Read a CSV file into a DataFrame
df = pd.read_csv(‘data.csv’)
print(df.head()) # Display the first 5 rows

Here, pd.read_csv('data.csv') reads the CSV file into a DataFrame. The head() method is used to display the first five rows of the DataFrame.

Reading Excel Files

import pandas as pd

# Read an Excel file into a DataFrame
df = pd.read_excel(‘data.xlsx’, sheet_name=’Sheet1')
print(df.head()) # Display the first 5 rows

Here, pd.read_excel('data.xlsx', sheet_name='Sheet1') reads the specified sheet from the Excel file into a DataFrame.

Writing CSV Files

import pandas as pd

# Write a DataFrame to a CSV file
df.to_csv(‘output.csv’, index=False)

Here, df.to_csv('output.csv', index=False) writes the DataFrame to a CSV file without writing the index column.

Writing Excel Files

import pandas as pd

# Write a DataFrame to an Excel file
df.to_excel(‘output.xlsx’, index=False, sheet_name=’Sheet1')

Here, df.to_excel('output.xlsx', index=False, sheet_name='Sheet1') writes the DataFrame to an Excel file, specifying the sheet name.

2. Using the csv Module

The csv module provides functionality to read from and write to CSV files. It’s a part of Python’s standard library and is suitable for basic CSV operations.

Reading CSV Files

import csv

with open(‘data.csv’, mode=’r’) as file:
csv_reader = csv.reader(file)
for row in csv_reader:
print(row)

Here, csv.reader(file) reads the CSV file, and we iterate through each row using a for loop to print the data.

Writing CSV Files

import csv

with open(‘output.csv’, mode=’w’, newline=’’) as file:
csv_writer = csv.writer(file)
csv_writer.writerow([‘column1’, ‘column2’, ‘column3’])
csv_writer.writerow([‘value1’, ‘value2’, ‘value3’])

Here, csv.writer(file) creates a CSV writer object, and writerow() writes a single row to the CSV file.

3. Using openpyxl

openpyxl is a library for reading and writing Excel files with the .xlsx extension. It is useful for more complex Excel operations.

Reading Excel Files

from openpyxl import load_workbook

# Load the workbook and select a sheet
wb = load_workbook(‘data.xlsx’)
sheet = wb[‘Sheet1’]

for row in sheet.iter_rows(values_only=True):
print(row)

Here, load_workbook('data.xlsx') loads the Excel file, and sheet.iter_rows(values_only=True) iterates over the rows in the sheet, printing the values.

from openpyxl import Workbook

# Create a workbook and add a sheet
wb = Workbook()
sheet = wb.active

# Write data to the sheet
sheet[‘A1’] = ‘Hello’
sheet[‘B1’] = ‘World’

# Save the workbook
wb.save(‘output.xlsx’)

Here, Workbook() creates a new workbook, and sheet['A1'] = 'Hello' writes data to the specified cell. The wb.save('output.xlsx') method saves the workbook to a file.

4. Using xlrd and xlwt

xlrd and xlwt are libraries for reading and writing old-style Excel files with the .xls extension. xlrd is used for reading .xls files, and xlwt is used for writing .xls files.

import xlrd

# Open the workbook and select the first sheet
wb = xlrd.open_workbook(‘data.xls’)
sheet = wb.sheet_by_index(0)

for rowx in range(sheet.nrows):
print(sheet.row_values(rowx))

Here, xlrd.open_workbook('data.xls') opens the Excel file, and sheet.row_values(rowx) returns the values of the specified row.

Writing Excel Files

import xlwt

# Create a workbook and add a sheet
wb = xlwt.Workbook()
sheet = wb.add_sheet(‘Sheet1’)

# Write data to the sheet
sheet.write(0, 0, ‘Hello’)
sheet.write(0, 1, ‘World’)

# Save the workbook
wb.save(‘output.xls’)

Here, xlwt.Workbook() creates a new workbook, and sheet.write(0, 0, 'Hello') writes data to the specified cell. The wb.save('output.xls') method saves the workbook to a file.

Conclusion

Each of these libraries provides different levels of functionality and complexity for working with Excel and CSV data in Python. pandas is the most versatile and powerful for data analysis and manipulation, while csv is suitable for simpler CSV operations. openpyxl and xlrd/xlwt are specialized for handling Excel files, with openpyxl being more modern and suited for .xlsx files and xlrd/xlwt for .xls files.

By choosing the right tool for your needs, you can efficiently handle and analyze your data, making Python a powerful ally in your data processing tasks.

Feel free to connect over Linked

--

--

DS - VRP
DS - VRP

Written by DS - VRP

An aspiring data scientist on a journey of continuous learning and discovery—turning curiosity into insights and challenges into opportunities to innovate

No responses yet