Programmer's Python Data - Text Files & CSV
Written by Mike James   
Tuesday, 10 June 2025

CSV Dialects

To determine the format used for the CSV file you have to select or configure a dialect. The csv.list_dialects() function returns the names of the dialects that are currently registered. Three standard dialect classes are predefined:

  • csv.excel standard Excel format CSV, short name 'excel'

  • csv.excel_tab Excel format but using TAB separators, short name 'excel-tab'

  • csv.unix_dialect usual format generated by UNIX systems using '\n' as line terminator and quoting all fields, short name 'unix'
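To see which dialects your installation knows about, and what their settings are, you can query the registry. This is a minimal sketch; future Python versions could add more names, so treat the printed list as illustrative rather than fixed.

```python
import csv

# Names of all dialects currently registered (built-in ones plus any you add)
names = csv.list_dialects()
print(names)

# get_dialect() returns an object whose attributes describe each format
for name in names:
    d = csv.get_dialect(name)
    print(name, repr(d.delimiter), repr(d.lineterminator), d.quoting)
```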

You can create your own dialect by defining a class that inherits from csv.Dialect.
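Instead of starting from csv.Dialect, you can also inherit from one of the predefined classes and override only the attributes you want to change, so everything else keeps its Excel value. A minimal sketch; the class name SemicolonCSV is hypothetical, and io.StringIO stands in for a real file so the output can be inspected directly.

```python
import csv
import io

# Hypothetical dialect: inherit every Excel setting, change only the separator
class SemicolonCSV(csv.excel):
    delimiter = ";"

buf = io.StringIO()
writer = csv.writer(buf, dialect=SemicolonCSV)
writer.writerow(["mike", 42, 3.145])
writer.writerow(["a;b", 1, 2.0])   # contains the delimiter, so it is quoted
text = buf.getvalue()
print(repr(text))

# Reading back with the same dialect recovers the fields (as strings)
rows = list(csv.reader(io.StringIO(text), dialect=SemicolonCSV))
print(rows)
```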

The dialect class has the following attributes:

  • delimiter single character used to separate fields

  • doublequote if True, a quotechar appearing inside a field is written as two quote characters; if False, it is prefixed by escapechar instead.

  • escapechar single character used to prefix special characters when included in data.

  • lineterminator string used by the writer to terminate lines. The reader ignores it and recognises either '\r' or '\n' as the end of a line.

  • quotechar single character used to quote fields.

  • skipinitialspace whitespace following the delimiter is ignored if True, the default is False.

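  • quoting determines which fields are quoted:
    QUOTE_ALL all fields
    QUOTE_MINIMAL only fields containing a special character such as the delimiter, quotechar or a line terminator, this is the default
    QUOTE_NONNUMERIC all non-numeric fields; the reader converts unquoted fields to floats
    QUOTE_NONE no fields; special characters must then be escaped using escapechar.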

  • strict raises a csv.Error exception on bad CSV input if True, the default is False.
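The interplay between doublequote and escapechar is easiest to see with a field that contains a quote character. A minimal sketch, passing the attributes as keyword parameters rather than through a dialect class and writing to an in-memory io.StringIO:

```python
import csv
import io

row = ['say "hi"', "x"]

# doublequote=True (the Excel default): the embedded quote is written twice
# and the field is enclosed in quotes
buf1 = io.StringIO()
csv.writer(buf1, doublequote=True).writerow(row)
doubled = buf1.getvalue()
print(repr(doubled))

# doublequote=False: the embedded quote is prefixed with escapechar instead
buf2 = io.StringIO()
csv.writer(buf2, doublequote=False, escapechar="\\").writerow(row)
escaped = buf2.getvalue()
print(repr(escaped))

# A reader needs the same settings to undo the escaping
back = next(csv.reader(io.StringIO(escaped), doublequote=False, escapechar="\\"))
print(back)
```

With doublequote=False and no escapechar set, the writer cannot represent the embedded quote and raises csv.Error instead.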

For example, to create and read a CSV file that quotes only non-numeric data you would define a new dialect subclass:

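import csv
class MyCSV(csv.Dialect):
    quoting = csv.QUOTE_NONNUMERIC
    delimiter = ','
    quotechar = '"'
    lineterminator = '\r\n'
    # csv.Dialect's placeholders are None (treated as False), so set these explicitly
    doublequote = True
    skipinitialspace = False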
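As a quick check of what this dialect does on the writing side, you can send the output to an in-memory io.StringIO rather than a file. A minimal sketch; the class is repeated so the snippet runs on its own, with doublequote and skipinitialspace set explicitly because csv.Dialect leaves them as None.

```python
import csv
import io

class MyCSV(csv.Dialect):
    quoting = csv.QUOTE_NONNUMERIC
    delimiter = ","
    quotechar = '"'
    lineterminator = "\r\n"
    doublequote = True        # double any embedded quote characters
    skipinitialspace = False

buf = io.StringIO()
csv.writer(buf, dialect=MyCSV).writerow(["mike", 42, 3.145])
text = buf.getvalue()
print(repr(text))   # the string is quoted, the numbers are not
```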

Once you have this defined you can use it as a dialect class in the reader and writer objects:

import pathlib
import dataclasses
import csv

@dataclasses.dataclass
class person:
    name: str = ""
    id: int = 0
    score: float = 0.0
    def values(self):
        return [self.name, str(self.id), str(self.score)]

class MyCSV(csv.Dialect):
    quoting = csv.QUOTE_NONNUMERIC
    delimiter = ','
    quotechar = '"'
    lineterminator = '\r\n'
    doublequote = True
    skipinitialspace = False

me = person("mike", 42, 3.145)
path = pathlib.Path("myTextFile.csv")
# newline="" lets the csv module handle line endings itself
with path.open(mode="wt", newline="") as f:
    peopleWriter = csv.writer(f, dialect=MyCSV)
    for i in range(43):
        # ids run from 0 to 42, so the last record uses 42
        peopleWriter.writerow([me.name, i, me.score])

with path.open(mode="rt", newline="") as f:
    peopleReader = csv.reader(f, dialect=MyCSV)
    for row in peopleReader:
        print(row)

If you run this you will see that a side effect of selecting csv.QUOTE_NONNUMERIC is that numeric values are converted to floats. So a typical record is:

['mike', 42.0, 3.145]

Notice that the integer 42 has been converted to a float.
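Because the reader returns a float for every unquoted field, restoring the original types is up to you. A minimal sketch, reusing the person dataclass from the example together with a hypothetical helper, row_to_person, and reading from an in-memory string instead of the file:

```python
import csv
import dataclasses
import io

@dataclasses.dataclass
class person:
    name: str = ""
    id: int = 0
    score: float = 0.0

def row_to_person(row):
    # hypothetical helper: QUOTE_NONNUMERIC hands back 42.0, so cast id to int
    name, id_, score = row
    return person(name, int(id_), score)

data = io.StringIO('"mike",42,3.145\r\n')
rows = list(csv.reader(data, quoting=csv.QUOTE_NONNUMERIC))
print(rows)
p = row_to_person(rows[0])
print(p)
```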

You can also register a dialect under a name, using either a Dialect subclass or a set of keyword parameters. For example, after:

csv.register_dialect("myDialect",MyCSV)

you can use the dialect via the short name “myDialect”:

peopleReader=csv.reader(f,dialect="myDialect")
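A registered name is global to the csv module, so any reader or writer can refer to it as a string. A minimal sketch that registers the class, round-trips one record through an in-memory buffer by name, and then removes the name again with csv.unregister_dialect():

```python
import csv
import io

class MyCSV(csv.Dialect):
    quoting = csv.QUOTE_NONNUMERIC
    delimiter = ","
    quotechar = '"'
    lineterminator = "\r\n"
    doublequote = True
    skipinitialspace = False

# Register the class under a short name; it then shows up in list_dialects()
csv.register_dialect("myDialect", MyCSV)
registered = "myDialect" in csv.list_dialects()
print(registered)

buf = io.StringIO()
csv.writer(buf, dialect="myDialect").writerow(["mike", 42, 3.145])
rows = list(csv.reader(io.StringIO(buf.getvalue()), dialect="myDialect"))
print(rows)

# Remove the name when it is no longer needed
csv.unregister_dialect("myDialect")
```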

Alternatively you can register the dialect directly without having to create a subclass. For example:

csv.register_dialect("myDialect",quoting=csv.QUOTE_NONNUMERIC)

defines the quoting attribute and accepts the defaults, which match the excel dialect, for the others.
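You can confirm which defaults were filled in by looking the dialect up again with csv.get_dialect(). A minimal sketch:

```python
import csv

# Only quoting is given; every other attribute takes its default value
csv.register_dialect("myDialect", quoting=csv.QUOTE_NONNUMERIC)
d = csv.get_dialect("myDialect")
print(repr(d.delimiter), repr(d.quotechar), repr(d.lineterminator), d.doublequote)
```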

CSV is a very common format for encoding data, but there are more modern alternatives that are better if you are free to choose the format.

In the chapter but not in this extract

  • JSON
  • Multiple Records
  • JSON and Dataclasses
  • XML
  • Python XML
  • ElementTree
  • More XML
  • Pickle
  • Advanced Pickling

Summary

  • Text files are simply binary files where the conversion to a string with suitable decoding is automatic.

  • As well as reading a fixed number of characters, you can use the readline method to read in a single line of text.

  • The print function can be used with files and has the advantage of performing the conversion to a string automatically.

  • To create text files that can be read back in and decoded reliably, you need to use a standard format such as CSV, JSON or XML.

  • CSV, Comma Separated Values, is simple but has a number of disadvantages: converting from a string to the appropriate data type isn’t generally automatic and there are different dialects of CSV.

  • JSON is a good match to Python’s objects. It is easy to use and is cross-platform.

  • XML is more complicated and probably not a good choice if you can avoid it, but it is a widespread standard and very suitable for representing complex data.

  • XML is not well supported if you are looking for standard processing options such as DOM or SAX. The ElementTree module, however, provides good Python-oriented processing of XML.

  • Pickle is Python’s own object serialization format. It uses a binary file but is very easy to use to save and load most Python objects, including instances of your own classes. Pickle is a good choice if the data is being produced and consumed by Python programs.

Programmer's Python
Everything is Data

Is now available as a print book: Amazon
