To determine the format that is used for the CSV file you have to select or configure a dialect. The csv.list_dialects() function lets you discover which predefined dialects are supported. Currently there are three standard classes:
csv.excel standard Excel format CSV, short name 'excel'
csv.excel_tab Excel format but using TAB separators, short name 'excel-tab'
csv.unix_dialect usual format generated by UNIX systems using '\n' as line terminator and quoting all fields, short name 'unix'
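As a quick check, the registered dialects and a few of their settings can be listed from the interpreter. This is a minimal sketch using csv.list_dialects() and csv.get_dialect(); the variable names are illustrative only:

```python
import csv

# Names of every dialect currently registered with the csv module.
names = csv.list_dialects()
print(names)

# Look up each dialect by its short name and show a few of its settings.
for name in names:
    d = csv.get_dialect(name)
    print(name, repr(d.delimiter), repr(d.lineterminator), d.quoting)
```

The list also includes any dialects your own program has registered, so it may be longer than the three standard names.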
You can create your own dialect by defining a custom class that inherits from csv.Dialect.
The dialect class has the following attributes:
delimiter single character used to separate fields
doublequote if True, a quote character inside a quoted field is written as two quote characters, otherwise it is prefixed by the specified escape character.
escapechar single character used to prefix special characters when included in data.
lineterminator string used by the writer to terminate lines. The reader ignores it and treats either '\r' or '\n' as the end of a line.
quotechar single character used to quote fields.
skipinitialspace whitespace following the delimiter is ignored if True, the default is False.
quoting determines which fields are quoted:
QUOTE_ALL all fields
QUOTE_MINIMAL only fields containing a special character, this is the default
QUOTE_NONNUMERIC all non-numeric fields, and the reader converts unquoted fields to floats
QUOTE_NONE no fields
strict raises a csv.Error exception on bad CSV input if True, the default is False.
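The effect of some of these attributes is easiest to see by passing them as formatting parameters to a writer or reader. The following is a sketch only; write_row is a hypothetical helper introduced for the illustration, and the sample data is made up:

```python
import csv
import io

def write_row(row, **fmt):
    """Write one row with the given formatting parameters and return the text."""
    buf = io.StringIO()
    csv.writer(buf, **fmt).writerow(row)
    return buf.getvalue()

row = ['say "hi"', "a,b"]

# Default excel dialect: doublequote is True, so embedded quotes are doubled.
doubled = write_row(row)
print(repr(doubled))

# doublequote=False needs an escapechar to prefix the embedded quotes.
fmt = dict(doublequote=False, escapechar="\\")
escaped = write_row(row, **fmt)
print(repr(escaped))

# Reading back with the same parameters recovers the original row.
round_trip = next(csv.reader(io.StringIO(escaped), **fmt))
print(round_trip)

# skipinitialspace drops the space that follows each delimiter.
kept = next(csv.reader(["a, b, c"]))
skipped = next(csv.reader(["a, b, c"], skipinitialspace=True))
print(kept, skipped)

# strict=True turns malformed input into a csv.Error.
try:
    next(csv.reader(['a,"b"c'], strict=True))
    strict_failed = False
except csv.Error:
    strict_failed = True
print(strict_failed)
```

Whatever parameters the writer used, the reader has to be given the same ones, which is the main reason for bundling them into a dialect.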
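For example, to create and read a CSV file that quotes only non-numeric data you would define a new dialect subclass:
class MyCSV(csv.Dialect):
    quoting = csv.QUOTE_NONNUMERIC
    delimiter = ','
    quotechar = '"'
    lineterminator = '\r\n'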
Once you have this defined you can use it as a dialect class in the reader and writer objects:
import pathlib
import dataclasses
import csv

@dataclasses.dataclass
class person:
    name: str = ""
    id: int = 0
    score: float = 0.0

    def values(self):
        return [self.name, str(self.id), str(self.score)]

class MyCSV(csv.Dialect):
    quoting = csv.QUOTE_NONNUMERIC
    delimiter = ','
    quotechar = '"'
    lineterminator = '\r\n'

me = person("mike", 42, 3.145)
path = pathlib.Path("myTextFile.csv")

# newline="" leaves line-ending handling to the csv module
with path.open(mode="wt", newline="") as f:
    peopleWriter = csv.writer(f, dialect=MyCSV)
    for i in range(43):
        peopleWriter.writerow([me.name, i, me.score])

with path.open(mode="rt", newline="") as f:
    peopleReader = csv.reader(f, dialect=MyCSV)
    for row in peopleReader:
        print(row)
If you run this you will see that a side effect of selecting csv.QUOTE_NONNUMERIC is that the reader converts every unquoted field to a float. So a typical record is:
['mike', 42.0, 3.145]
Notice that the integer 42 has been converted to a float.
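If the integer is needed back, the conversion has to be done by hand after reading. The sketch below assumes the three-field layout used above and that the second field always holds a whole number; it also shows that an unquoted non-numeric field makes the reader raise a ValueError:

```python
import csv
import io

# One line in the layout produced with QUOTE_NONNUMERIC.
data = '"mike",42,3.145\r\n'
row = next(csv.reader(io.StringIO(data), quoting=csv.QUOTE_NONNUMERIC))
print(row)  # ['mike', 42.0, 3.145]

# Restore the integer type explicitly; the reader cannot know it was an int.
name, id_value, score = row
record = (name, int(id_value), score)
print(record)  # ('mike', 42, 3.145)

# An unquoted field that is not a number cannot be converted to a float.
try:
    next(csv.reader(io.StringIO("mike,42\r\n"), quoting=csv.QUOTE_NONNUMERIC))
    unquoted_ok = True
except ValueError:
    unquoted_ok = False
print(unquoted_ok)  # False
```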
You can also register a dialect under a name, using either a subclass of Dialect or a set of parameters. For example, after:
csv.register_dialect("myDialect",MyCSV)
you can use the dialect via the short name “myDialect”:
peopleReader=csv.reader(f,dialect="myDialect")
Alternatively you can register the dialect directly without having to create a subclass. For example:
csv.register_dialect("myDialect",quoting=csv.QUOTE_NONNUMERIC)
defines the quoting attribute and accepts the defaults for the others.
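Registration by parameters can be tried end to end with a short sketch. The dialect name "pipes" and the choice of "|" as delimiter are assumptions made for the illustration, and an in-memory io.StringIO stands in for a file:

```python
import csv
import io

# Register a dialect from keyword parameters; unspecified ones keep their defaults.
csv.register_dialect("pipes", delimiter="|", quoting=csv.QUOTE_NONNUMERIC)

# Write one row using the short name.
buf = io.StringIO()
csv.writer(buf, dialect="pipes").writerow(["mike", 42, 3.145])
text = buf.getvalue()
print(repr(text))  # '"mike"|42|3.145\r\n'

# Read it back using the same short name.
buf.seek(0)
row = next(csv.reader(buf, dialect="pipes"))
print(row)  # ['mike', 42.0, 3.145]

# Remove the name again once it is no longer needed.
csv.unregister_dialect("pipes")
still_registered = "pipes" in csv.list_dialects()
print(still_registered)  # False
```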
CSV is a very common format for encoding data, but there are more modern alternatives that are better if you are free to choose the format.
In chapter but not in this extract
JSON
Multiple Records
JSON and Dataclasses
XML
Python XML
ElementTree
More XML
Pickle
Advanced Pickling
Summary
Text files are simply binary files where the conversion to a string with suitable decoding is automatic.
As well as reading a fixed number of characters, you can also use the readline method to read in a single line of text.
The print function can be used with files and has the advantage of performing the conversion to a string automatically.
For text files to be read in and decoded reliably you need to use a standard format such as CSV, JSON or XML.
CSV, Comma Separated Values, is simple but it has a number of disadvantages in that converting from a string to the appropriate data type isn’t generally automatic and there are different dialects of CSV.
JSON is a good match to Python’s objects. It is easy to use and is cross-platform.
XML is more complicated and probably not a good choice if you can avoid it, but it is a widespread standard and very suitable for representing complex data.
XML is not well supported if you are looking for standard processing options such as DOM or SAX. The ElementTree module, however, provides good Python-oriented processing of XML.
Pickle is Python’s own object serialization format. It uses a binary file but it is very easy to use to save and load most Python objects. Pickle is a good choice if the data is being produced and consumed by Python programs.
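The difference in type handling between the formats can be seen by pushing the same record through each of them. This is only a sketch; the record is made-up sample data and everything is kept in memory rather than written to a file:

```python
import csv
import io
import json
import pickle

record = {"name": "mike", "id": 42, "score": 3.145}

# CSV: every field comes back as a string, so types must be restored by hand.
buf = io.StringIO()
csv.writer(buf).writerow(record.values())
buf.seek(0)
from_csv = next(csv.reader(buf))
print(from_csv)  # ['mike', '42', '3.145']

# JSON: strings, integers and floats survive the round trip.
from_json = json.loads(json.dumps(record))
print(from_json)

# Pickle: the Python object is restored as it was, but the format is binary
# and specific to Python.
from_pickle = pickle.loads(pickle.dumps(record))
print(from_pickle)
```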