CSV files

By Martin McBride, 2018-08-07


Introduction

In this lesson we will look at how to use CSV (Comma Separated Values) files in Python:

  • What is CSV?
  • Reading data
  • Writing data

What is CSV?

A CSV file, or comma-separated values file, is a plain text file for storing structured data. It holds a table of data records, one record per line, and is often used to store or exchange spreadsheet data.

The idea of a CSV file is that each line of text stores several values, like a single row of a spreadsheet. For example, here is a CSV containing information about capital cities - the name, the country and the approximate population in millions (in 2017):

Beijing, China, 21
New Delhi, India, 17
Tokyo, Japan, 13
Manila, Philippines, 15
Moscow, Russia, 12

Spreadsheet programs like Calc or Excel can load CSV files. If you load this information into a spreadsheet, each line becomes a row, with the city, the country and the population in separate columns.

You can also read and write CSV files using Python, as we will see in this section.

Reading data

To read data from a CSV file, you first import the csv module, then open the file in the normal way.

You then pass the open file to csv.reader to read the data. Be sure to close the file afterwards:

import csv

f = open('cities.csv')
csv = csv.reader(f)

for line in csv:
    print(line)

f.close()

Trying the code

To run this code, you should create a text file in any text editor (such as Windows Notepad or similar), and type in the following lines (or copy and paste them to avoid errors):

Beijing, China, 21
New Delhi, India, 17
Tokyo, Japan, 13
Manila, Philippines, 15
Moscow, Russia, 12

Save the file as cities.csv in the same folder that your Python files are stored in.

Result

This part of the code loops through the CSV file, one line at a time:

for line in csv:
    print(line)

If you look at the output, you will see that each time through the loop, the variable line contains a list of that line's values:

['Beijing', ' China', '21']
['New Delhi', ' India', '17']
['Tokyo', ' Japan', '13']
['Manila', ' Philippines', '15']
['Moscow', ' Russia', '12']

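The CSV reader reads the input file and automatically splits each line wherever it sees a comma. It stores the values from each line in a list. Every value is a string, and the space after each comma is kept as part of the value, which is why the country appears as ' China' rather than 'China'.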
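If the space that follows each comma in the file is unwanted, csv.reader accepts a skipinitialspace option that drops it. The sketch below is illustrative rather than part of the original lesson: it uses io.StringIO as a stand-in for the open cities.csv file, with only two of the rows, so that it runs without any file on disk.

```python
import csv
import io

# io.StringIO stands in for the open cities.csv file in this sketch.
f = io.StringIO('Beijing, China, 21\nNew Delhi, India, 17\n')

# skipinitialspace=True ignores whitespace immediately after each comma.
reader = csv.reader(f, skipinitialspace=True)
rows = list(reader)
print(rows)
```

The values are still strings, but ' China' is now read as 'China'.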

Formatting the output

Rather than just printing the output as a Python list, we can access the individual elements to format the data.

  • line[0] is the name of the city
  • line[1] is the name of the country for that city
  • line[2] is the approximate population of the city

So we can change our print code to make it easier to read the values:

for line in csv:
    print('City:', line[0],
          'Country:', line[1],
          'Pop:', line[2])

Of course, you could put more effort into improving the display by lining up the columns etc.
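One possible way to line up the columns is to give each value a fixed width in an f-string. This is only a sketch under a few assumptions: io.StringIO stands in for the open file, skipinitialspace=True (an option of csv.reader) drops the space after each comma, and the widths 12 and 4 are arbitrary choices that happen to fit this data.

```python
import csv
import io

# io.StringIO stands in for the open cities.csv file in this sketch.
f = io.StringIO('Beijing, China, 21\nNew Delhi, India, 17\nTokyo, Japan, 13\n')

lines_out = []
for city, country, pop in csv.reader(f, skipinitialspace=True):
    # <12 left-aligns text in a 12-character column;
    # >4 right-aligns the number in a 4-character column.
    text = f'{city:<12}{country:<12}{int(pop):>4}'
    lines_out.append(text)
    print(text)
```

Each row is unpacked into three names instead of being indexed as line[0], line[1] and line[2], and int(pop) turns the population string into a number before it is formatted.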

Writing data

To write data to a CSV file, you need to open the file for writing in the normal way.

You then use csv.writer to write rows of data to the file. Each row of data is stored as a list. The elements of the list are written out to file, separated by commas. Each new row is written on a separate line.

import csv

f = open('output.csv', 'w', newline='')
csv = csv.writer(f)

for i in range(5):
    data = [i, i*2, i*10]
    csv.writerow(data)

f.close()

Understanding the code

We first open the file for writing:

f = open('output.csv', 'w', newline='')

This is fairly standard, but notice that we use the parameter newline=''. This is important: if you don't use it, then on Windows you will get a blank line between each real line in your output file.
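The reason is that csv.writer ends each row with its own line ending, '\r\n' by default. The sketch below shows this using io.StringIO(newline='') as a stand-in for a file opened with newline='', so we can inspect the exact characters written. It is an illustration, not part of the original example.

```python
import csv
import io

# io.StringIO(newline='') stands in for a file opened with newline=''.
buffer = io.StringIO(newline='')
csv.writer(buffer).writerow([1, 2, 10])

raw = buffer.getvalue()
print(repr(raw))  # '1,2,10\r\n'
```

Without newline='', a text file on Windows also converts the '\n' to '\r\n', so each row ends in '\r\r\n', which shows up as an extra blank line.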

We then set up a csv.writer to write data to the file:

csv = csv.writer(f)

In the loop, we generate some data records. Each record has 3 values, stored in a list. Each record creates one line in the file, and since the loop generates 5 records we will get a 5-line file. We use writerow to write out each record.

for i in range(5):
    data = [i, i*2, i*10]
    csv.writerow(data)

As always, we remember to close the file at the end.
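As an alternative, the file can be opened in a with statement, which closes it for us. The following is a minimal sketch of the same example in that style. It names the writer object writer rather than csv, so the module name is not reused, and it reads the file back at the end purely to check the result.

```python
import csv

# The with statement closes the file for us once the block ends.
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)  # 'writer' avoids reusing the module name csv
    for i in range(5):
        data = [i, i*2, i*10]
        writer.writerow(data)

# Read the file back to check what was written.
with open('output.csv', newline='') as f:
    written = list(csv.reader(f))
print(written)
```

Notice that the values come back as strings, such as '10', even though they were written as numbers.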

Trying the code

When you run the code, it will create a file called output.csv in your Python folder (the same place your Python source files are stored). You can open the file in a text editor, such as Windows Notepad. You should see something like this:

0,0,0
1,2,10
2,4,20
3,6,30
4,8,40

The lines are generated by the code:

data = [i, i*2, i*10]

This code just generates some artificial data so we can see the code working:

  • On the first line, i is 0, so i*2 is 0, and i*10 is 0.
  • On the second line, i is 1, so i*2 is 2, and i*10 is 10.
  • And so on, up to the fifth line, where i is 4, so i*2 is 8, and i*10 is 40.
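Real data often already exists as a list of rows rather than being generated in a loop. In that case the writer's writerows method can write every row in one call. The sketch below makes some assumptions for illustration: the cities list reuses a few rows from the reading example, and the file name cities_out.csv is hypothetical.

```python
import csv

# Hypothetical sample rows, based on the cities data used earlier.
cities = [
    ['Beijing', 'China', 21],
    ['New Delhi', 'India', 17],
    ['Tokyo', 'Japan', 13],
]

with open('cities_out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(cities)  # writes one line per inner list

# Read the raw text back to see exactly what was written.
with open('cities_out.csv', newline='') as f:
    text = f.read()
print(text)
```

The writer puts no space after the commas, so reading this file back gives 'China' rather than ' China'.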

