Programmer's Python Data - Text Files & CSV
Written by Mike James   
Tuesday, 10 June 2025

CSV Dialects

To determine the format that is used for the CSV file you have to select or configure a dialect. The csv.list_dialects() function returns the names of the dialects that are currently registered. There are three predefined dialect classes:

  • csv.excel standard Excel format CSV, short name 'excel'

  • csv.excel_tab Excel format but using TAB separators, short name 'excel-tab'

  • csv.unix_dialect usual format generated by UNIX systems using '\n' as line terminator and quoting all fields, short name 'unix'
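As a quick check, the predefined dialects can be listed and inspected. The following is a minimal sketch using csv.list_dialects() and csv.get_dialect(); the variable names are only illustrative:

```python
import csv

# Names of all currently registered dialects
names = csv.list_dialects()

# Look up each dialect by its short name and show its key settings
for name in sorted(names):
    d = csv.get_dialect(name)
    print(name, repr(d.delimiter), repr(d.lineterminator), d.quoting)
```

The quoting value is printed as an integer because the QUOTE_ constants are plain integer codes.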

You can define your own dialect by creating a custom class that inherits from csv.Dialect.

The Dialect class has the following attributes:

  • delimiter single character used to separate fields

  • doublequote if True, a quote character within a quoted field is entered as two quotes, otherwise the quote is prefixed by the specified escape character.

  • escapechar single character used to prefix special characters when included in data.

  • lineterminator string used by the writer to terminate lines. The reader currently ignores this setting and always recognizes either '\r' or '\n' as the end of a line.

  • quotechar single character used to quote fields.

  • skipinitialspace whitespace following the delimiter is ignored if True, the default is False.

  • quoting determines which fields are quoted:
    QUOTE_ALL all fields
    QUOTE_MINIMAL only fields containing a special
    character, this is the default
    QUOTE_NONNUMERIC all non-numeric fields; the reader
    converts unquoted fields to floats
    QUOTE_NONE no fields.

  • strict raises a csv.Error exception on bad CSV input if True, the default is False.
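The effect of some of these attributes is easiest to see by passing them as formatting parameters and looking at the text produced. This is only a sketch: the helper to_csv and the sample row are made up for illustration and are not part of the csv module:

```python
import csv
import io

def to_csv(row, **fmt):
    """Write one row using the given formatting parameters and return the text."""
    buf = io.StringIO(newline="")
    csv.writer(buf, **fmt).writerow(row)
    return buf.getvalue()

row = ["mike", 42, 'a"b']

# The same row written with three different quoting settings
minimal    = to_csv(row, quoting=csv.QUOTE_MINIMAL)
everything = to_csv(row, quoting=csv.QUOTE_ALL)
nonnumeric = to_csv(row, quoting=csv.QUOTE_NONNUMERIC)
print(minimal, everything, nonnumeric, sep="")

# With doublequote=False an escapechar has to be supplied
escaped = to_csv(row, doublequote=False, escapechar="\\")
back = next(csv.reader([escaped], doublequote=False, escapechar="\\"))
print(back)

# skipinitialspace drops whitespace that follows a delimiter
plain   = next(csv.reader(["a, b, c"]))
skipped = next(csv.reader(["a, b, c"], skipinitialspace=True))
print(plain, skipped)
```

Notice that the default reader returns every field as a string, so the 42 comes back as '42'.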

For example, to create and read a CSV file that quotes only non-numeric data you would define a new Dialect subclass:

class MyCSV(csv.Dialect):
    quoting        = csv.QUOTE_NONNUMERIC
    delimiter      = ','
    quotechar      = '"'
    lineterminator = '\r\n'

Once you have this defined you can use it as a dialect class in the reader and writer objects:

import pathlib
import dataclasses
import csv

@dataclasses.dataclass
class person:
    name:str=""
    id:int=0
    score:float=0.0
    def values(self):
        return [self.name,str(self.id),str(self.score)]

class MyCSV(csv.Dialect):
    quoting        = csv.QUOTE_NONNUMERIC
    delimiter      = ','
    quotechar      = '"'
    lineterminator = '\r\n'

me=person("mike",42,3.145)
path=pathlib.Path("myTextFile.csv")
# newline="" leaves line endings under the control of the csv module
with path.open(mode="wt",newline="") as f:
    peopleWriter=csv.writer(f,dialect=MyCSV)
    for i in range(43):
        peopleWriter.writerow([me.name,i,me.score])

# the csv documentation recommends newline="" when reading as well
with path.open(mode="rt",newline="") as f:
    peopleReader=csv.reader(f,dialect=MyCSV)
    for row in peopleReader:
        print(row)

If you run this you will see that a side effect of selecting csv.QUOTE_NONNUMERIC is that the reader converts every unquoted field to a float. So a typical record is:

['mike', 42.0, 3.145]

Notice that the integer 42 has been converted to a float.
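If you need the original types back, you have to convert the fields yourself after reading. The following sketch assumes a Person dataclass modelled on the one used above and reads from an in-memory string rather than a file; the names data, rows and people are hypothetical:

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class Person:
    name: str = ""
    id: int = 0
    score: float = 0.0

# One record in the format produced by QUOTE_NONNUMERIC
data = '"mike",42,3.145\r\n'

reader = csv.reader(io.StringIO(data, newline=""),
                    quoting=csv.QUOTE_NONNUMERIC)
rows = list(reader)
print(rows)

# Restore the intended types explicitly: id is converted back to an int
people = [Person(name, int(id_), score) for name, id_, score in rows]
print(people)
```

This works because the reader has already turned the unquoted fields into floats, leaving only the id to be converted back to an int.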

You can also register a dialect under a name, using either a subclass of Dialect or a set of parameters. For example, after calling:

csv.register_dialect("myDialect",MyCSV)

you can use the dialect via the short name “myDialect”:

peopleReader=csv.reader(f,dialect="myDialect")

Alternatively you can register the dialect directly without having to create a subclass. For example:

csv.register_dialect("myDialect",quoting=csv.QUOTE_NONNUMERIC)

defines the quoting attribute and accepts the defaults for the others.
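To confirm what a registered dialect contains, you can retrieve it with csv.get_dialect(). This is a small sketch that registers the dialect, inspects it, writes one row to an in-memory buffer and then removes the registration again; the variable names are illustrative:

```python
import csv
import io

# Register by parameters only; unspecified attributes take their defaults
csv.register_dialect("myDialect", quoting=csv.QUOTE_NONNUMERIC)
registered = "myDialect" in csv.list_dialects()

d = csv.get_dialect("myDialect")
print(repr(d.delimiter), repr(d.quotechar), repr(d.lineterminator))

# Use the short name with a writer
buf = io.StringIO(newline="")
csv.writer(buf, dialect="myDialect").writerow(["mike", 42, 3.145])
line = buf.getvalue()
print(repr(line))

# Remove the name again so that it does not linger in the registry
csv.unregister_dialect("myDialect")
```

The defaults used for the unspecified attributes are the same as those of the excel dialect.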

CSV is a very common format for encoding data, but there are more modern alternatives that are better if you are free to choose the format.

In chapter but not in this extract

  • JSON
  • Multiple Records
  • JSON and Dataclasses
  • XML
  • Python XML
  • ElementTree
  • More XML
  • Pickle
  • Advanced Pickling

Summary

  • Text files are simply binary files where the conversion to a string with suitable decoding is automatic.

  • As well as reading a fixed number of characters, you can also use the readline method to read in a single line of text.

  • The print function can be used with files and has the advantage of performing the conversion to a string automatically.

  • To make text files able to be read in and decoded you need to use a standard format like CSV, JSON or XML.

  • CSV, Comma Separated Values, is simple but it has a number of disadvantages in that converting from a string to the appropriate data type isn’t generally automatic and there are different dialects of CSV.

  • JSON is a good match to Python’s objects. It is easy to use and is cross-platform.

  • XML is more complicated and probably not a good choice if you can avoid it, but it is a widespread standard and very suitable for representing complex data.

  • XML is not well supported if you are looking for standard processing options such as DOM or SAX. The ElementTree module, however, provides good Python-oriented processing of XML.

  • Pickle is Python’s own object serialization format. It uses a binary file but it is very easy to use to save and load almost any Python object. Pickle is a good choice if the data is being produced and consumed by Python programs.

Programmer's Python
Everything is Data

Is now available as a print book: Amazon

