Coroutines
Coroutines are functions whose processing can be suspended and resumed at specific points. So, typically, a coroutine will execute up to a certain statement, then suspend execution while waiting for some data. At this point other parts of the program can continue to execute (usually other coroutines that aren't suspended). Once the data is received the coroutine resumes from the point it was suspended, performs processing (presumably based on the data it got), and possibly sends its results to...
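The suspend-and-resume behavior described above can be sketched with a generator-based coroutine. This is a minimal illustration, not one of the book's examples; the names running_total and accumulator are made up for the sketch.

```python
def running_total():
    """Coroutine that keeps a running total of the values sent to it."""
    total = 0
    while True:
        # Execution suspends here until send() delivers a value.
        value = yield total
        total += value


accumulator = running_total()
next(accumulator)              # run up to the first yield ("prime" the coroutine)
first = accumulator.send(10)   # resumes, adds 10, suspends again
second = accumulator.send(5)   # resumes, adds 5, suspends again
accumulator.close()            # finish the coroutine
```

Between the two send() calls the caller is free to do other work; the coroutine keeps its local state (total) while it is suspended.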
generate_usernames.py
Imagine we are setting up a new computer system and need to generate usernames for all of our organization's staff. We have a plain text data file (UTF-8 encoding) where each line represents a record and fields are colon-delimited. Each record concerns one member of the staff and the fields are their unique staff ID, forename, middle name (which may be an empty field), surname, and department name. Here is an extract of a few lines from an example users.txt data file: 1601:Albert:Lukas...
A Quick Introduction to PyParsing
PyParsing makes no real distinction between lexing and parsing. Instead, it provides functions and classes to create parser elements (one element for each thing to be matched). Some parser elements are provided predefined by PyParsing, others can be created by calling PyParsing functions or by instantiating PyParsing classes. Parser elements can also be created by combining other parser elements together: for example, concatenating them with + to form a sequence of parser elements, or or-ing them...
Exercises
The first exercise involves copying and modifying the Bookmarks program shown in this chapter; the second exercise involves creating a GUI program from scratch.
1. Copy the bookmarks-tk.pyw program and modify it so that it can import and export the DBM files that the bookmarks.py console program (created as an exercise in Chapter 12) uses. Provide two new menu options in the File menu, Import and Export. Make sure you provide keyboard shortcuts for both; keep in mind that Ctrl+E is already in use for...
Context Managers
Context managers allow us to simplify code by ensuring that certain operations are performed before and after a particular block of code is executed. The behavior is achieved because context managers define two special methods, __enter__() and __exit__(), that Python treats specially in the scope of a with statement. When a context manager is created in a with statement its __enter__() method is automatically called, and when the context manager goes out of scope after its with statement its __exit__()...
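To make the before-and-after behavior concrete, here is a small sketch of a custom context manager. The Tracker class is hypothetical and exists only to record the order in which Python calls the two special methods.

```python
class Tracker:
    """Hypothetical context manager that records when its block starts and ends."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        # Called automatically when the with statement is entered.
        self.events.append("enter")
        return self  # this is the object bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        # Called automatically when the with block is left, even on error.
        self.events.append("exit")
        return False  # do not suppress any exception


with Tracker() as tracker:
    tracker.events.append("body")
```

After the with statement finishes, tracker.events holds the three entries in the order enter, body, exit.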
Algorithms and Collection Data Types
The bisect module provides functions for searching sorted sequences such as sorted lists, and for inserting items while preserving the sort order. This module's functions use the binary search algorithm, so they are very fast. The heapq module provides functions for turning a sequence such as a list into a heap (a collection data type where the first item, at index position 0, is always the smallest item), and for inserting and removing items while keeping the sequence as a heap. Unfortunately for...
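A brief sketch of both modules in use follows; the sample numbers are arbitrary and chosen only for illustration.

```python
import bisect
import heapq

# bisect: keep a list sorted while inserting, then search it.
scores = [10, 20, 30, 40]
bisect.insort(scores, 25)                   # inserts 25 in sorted position
position = bisect.bisect_left(scores, 30)   # index at which 30 is found

# heapq: turn a list into a heap, then push and pop items.
heap = [7, 3, 9, 1]
heapq.heapify(heap)                 # rearranges the list in place
heapq.heappush(heap, 2)
smallest = heapq.heappop(heap)      # removes and returns the smallest item
```

Note that a heap is only partially ordered: the item at index 0 is always the smallest, but the rest of the list is not necessarily sorted.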
XML
There are two widely used approaches to parsing XML documents. One is the DOM (Document Object Model) and the other is SAX (Simple API for XML). Two DOM parsers are provided, one by the xml.dom module and the other by the xml.dom.minidom module. A SAX parser is provided by the xml.sax module. We have already used the xml.sax.saxutils module for its xml.sax.saxutils.escape() function to XML-escape &, <, and >. There is also an xml.sax.saxutils.quoteattr() function that does the same thing but...
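The two xml.sax.saxutils helper functions mentioned above can be tried directly. This is a small sketch with made-up input strings.

```python
from xml.sax.saxutils import escape, quoteattr

# escape() replaces &, <, and > with their XML entities.
text = escape("Fish & <Chips>")

# quoteattr() escapes in the same way and also wraps the result in quotes,
# so that it can be used directly as an attribute value.
attribute = quoteattr("Tom & Jerry")
```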
Branching Using Dictionaries
As we noted earlier, functions are objects like everything else in Python, and a function's name is an object reference that refers to the function. If we write a function's name without parentheses, Python knows we mean the object reference, and we can pass such object references around just like any others. We can use this fact to replace if statements that have lots of elif clauses with a single function call. In Chapter 12 we will review an interactive console program called dvds-dbm.py,...
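As a sketch of the idea of replacing an if...elif chain with a dictionary lookup, the following maps single-letter actions to functions. The function names and the actions are hypothetical and are not taken from dvds-dbm.py.

```python
def add_item(items, name):
    items.append(name)
    return "added"


def remove_item(items, name):
    items.remove(name)
    return "removed"


def count_items(items, name):
    return str(items.count(name))


# Each value is a function object: the name is written without parentheses.
ACTIONS = dict(a=add_item, r=remove_item, c=count_items)


def perform(action, items, name):
    # One lookup and one call replace a chain of if/elif tests.
    return ACTIONS[action](items, name)


shelf = []
results = [perform("a", shelf, "Brazil"),
           perform("c", shelf, "Brazil"),
           perform("r", shelf, "Brazil")]
```

An unknown action raises a KeyError here; a real program would probably validate the user's choice before the lookup.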
Summary
This chapter showed the most widely used techniques for saving and loading collections of data to and from files. We have seen how easy pickles are to use, and how we can handle both compressed and uncompressed files without knowing in advance whether compression has been used. We saw how writing and reading binary data requires care, and saw that the code can be quite long if we need to handle variable length strings. But we also learned that using binary files usually results in the smallest...
DOM (Document Object Model)
The DOM is a standard API for representing and manipulating an XML document in memory. The code for creating a DOM and writing it to a file, and for parsing an XML file using a DOM, is structurally very similar to the element tree code, only slightly longer. We will begin by reviewing the export_xml_dom() method in two parts. This method works in two phases: first a DOM is created to reflect the incident data, and then the DOM is written out to a file. Just as with an element tree, some programs...
Parsing the Blocks Domain-Specific Language
The blocks.py program is provided as one of the book's examples. It reads one or more .blk files that use a custom text format (blocks format, a made-up language) that are specified on the command line, and for each one creates an SVG (Scalable Vector Graphics) file with the same name, but with its suffix changed to .svg. While the rendered SVG files could not be accused of being pretty, they provide a good visual representation that makes it easy to see mistakes in the .blk...
Examples
We have now completed our review of Python's built-in collection data types, and three of the standard library collection types (collections.namedtuple, collections.defaultdict, and collections.OrderedDict). Python also provides the collections.deque type, a double-ended queue, and many other collection types are available from third parties and from the Python Package Index, pypi.python.org/pypi. But now we will look at a couple of slightly longer examples that draw together many of the things...
Inheritance and Polymorphism
The Circle class builds on the Point class using inheritance. The Circle class adds one additional data attribute (radius), and three new methods. It also reimplements a few of Point's methods. Here is the complete class definition:

    class Circle(Point):
        def edge_distance_from_origin(self):
            return abs(self.distance_from_origin() - self.radius)
        def __eq__(self, other):
            return self.radius == other.radius and super().__eq__(other)
        def __repr__(self):
            return "Circle({0.radius!r}, {0.x!r}, {0.y!r})".format(self)

Inheritance is achieved simply by listing the class or classes that we...
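For readers who want to run the inheritance example, here is a self-contained sketch. The Point class below is a minimal stand-in whose interface (x, y, distance_from_origin(), and __eq__()) is assumed from the way Circle uses it, and the Circle initializer shown is likewise an assumption.

```python
import math


class Point:
    """Minimal stand-in for the Point base class (assumed interface)."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def distance_from_origin(self):
        return math.hypot(self.x, self.y)

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y


class Circle(Point):  # the base class is listed in parentheses

    def __init__(self, radius, x=0, y=0):
        super().__init__(x, y)   # let Point store x and y
        self.radius = radius

    def edge_distance_from_origin(self):
        return abs(self.distance_from_origin() - self.radius)

    def __eq__(self, other):
        # Compare the new attribute, then defer to Point for x and y.
        return self.radius == other.radius and super().__eq__(other)

    def __repr__(self):
        return "Circle({0.radius!r}, {0.x!r}, {0.y!r})".format(self)


circle = Circle(2, 3, 4)
```

Because Circle inherits from Point, circle.distance_from_origin() works without being redefined, and super() gives access to the base class versions of reimplemented methods.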
Times and Dates
The calendar and datetime modules provide functions and classes for date and time handling. However, they are based on an idealized Gregorian calendar, so they are not suitable for dealing with pre-Gregorian dates. Time and date handling is a very complex topic: the calendars in use have varied in different places and at different times, a day is not precisely 24 hours, a year is not exactly 365 days, and daylight saving time and time zones vary. The datetime.datetime class, but not the...
String Handling
The string module provides some useful constants such as string.ascii_letters and string.hexdigits. It also provides the string.Formatter class which we can subclass to provide custom string formatters. The textwrap module can be used to wrap lines of text to a specified width, and to minimize indentation. Python's most powerful string handling module is the re (regular expression) module. This is covered in Chapter 13. The io.StringIO class can provide a string-like object that behaves like an...
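The following sketch touches each of the facilities just mentioned, apart from re and string.Formatter. The sample strings are arbitrary.

```python
import io
import string
import textwrap

sentence = "The quick brown fox jumps over the lazy dog"
# Break the sentence into lines of at most 15 characters each.
wrapped = textwrap.wrap(sentence, width=15)
# Remove the indentation that is common to every line.
dedented = textwrap.dedent("    alpha\n    beta")

# io.StringIO gives an in-memory object that can be written to like a file.
buffer = io.StringIO()
print("first line", file=buffer)
buffer.write("second line\n")
contents = buffer.getvalue()

# string.hexdigits is a constant containing the hexadecimal digits.
is_hex = all(c in string.hexdigits for c in "c0ffee")
```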
A Generic BinaryRecordFile Class
The BinaryRecordFile.BinaryRecordFile class's API is similar to a list in that we can get/set/delete a record at a given index position. When a record is deleted, it is simply marked "deleted"; this saves having to move all the records that follow it up to fill the gap, and also means that after a deletion all the original index positions remain valid. Another benefit is that a record can be undeleted simply by unmarking it. The price we pay for this is that deleting records doesn't save any disk...
Command-Line Programming
If we need a program to be able to process text that may have been redirected in the console or that may be in files listed on the command line, we can use the fileinput module's fileinput.input() function. This function iterates over all the lines redirected from the console (if any) and over all the lines in the files listed on the command line, as one continuous sequence of lines. The module can report the current filename and line number at any time using fileinput.filename() and fileinput.lineno...
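Here is a sketch of fileinput.input() reading two files as one continuous sequence of lines. To keep it self-contained it writes two small temporary files and passes their names explicitly through the files argument, rather than taking them from the command line; the filenames and contents are made up.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    # Create two small sample files (hypothetical names and contents).
    paths = []
    for name, text in (("a.txt", "one\ntwo\n"), ("b.txt", "three\n")):
        path = os.path.join(directory, name)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)
        paths.append(path)

    seen = []
    # With no files argument, fileinput.input() would use sys.argv[1:] or stdin.
    with fileinput.input(files=paths) as lines:
        for line in lines:
            seen.append((os.path.basename(fileinput.filename()),
                         fileinput.lineno(), line.rstrip()))
```

The line number reported by fileinput.lineno() is cumulative across all the files, which is why the single line of the second file is numbered 3.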
Exercises
1. Modify the external_sites.py program to use a default dictionary. This is an easy change requiring an additional import, and changes to just two other lines. A solution is provided in external_sites_ans.py.
2. Modify the uniquewords2.py program so that it outputs the words in frequency of occurrence order rather than in alphabetical order. You'll need to iterate over the dictionary's items and create a tiny two-line function to extract each item's value and pass this function as sorted()'s...
Custom Collection Classes
In this section's subsections we will look at custom classes that are responsible for large amounts of data. The first class we will review, Image, is one that holds image data. This class is typical of many data-holding custom classes in that it not only provides in-memory access to its data, but also has methods for saving and loading the data to and from disk. The second and third classes we will study, SortedList and SortedDict, are designed to fill a rare and surprising gap in Python's...
Creating a TCP Server
Since the code for creating servers often follows the same design, rather than having to use the low-level socket module, we can use the high-level socketserver module, which takes care of all the housekeeping for us. All we have to do is provide a request handler class with a handle() method which is used to read requests and write replies. The socketserver module handles the communications for us, servicing each connection request, either serially or by passing each request to its own separate...
DBM Databases
The shelve module provides a wrapper around a DBM that allows us to interact with the DBM as though it were a dictionary, providing that we use only string keys and picklable values. Behind the scenes the shelve module converts the keys and values to and from bytes objects. The shelve module uses the best underlying DBM available, so it is possible that a DBM file saved on one machine won't be readable on another, if the other machine doesn't have the same DBM. One solution is to...
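The dictionary-like interface can be sketched as follows. The filename and the record stored are hypothetical, and a temporary directory is used so that the example leaves nothing behind.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "films")  # hypothetical DBM filename

    # Store a picklable value under a string key.
    with shelve.open(path) as db:
        db["brazil"] = {"year": 1985, "minutes": 142}

    # Reopen the DBM and read the value back.
    with shelve.open(path) as db:
        record = db["brazil"]
        keys = sorted(db.keys())
```

Which DBM implementation is used underneath, and therefore which files appear on disk, depends on what is available on the machine.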
File Handling
Most programs need to save and load information, such as data or state information, to and from files. Python provides many different ways of doing this. We already briefly discussed handling text files in Chapter 3 and pickles in the preceding chapter. In this chapter we will cover file handling in much more depth. All the techniques presented in this chapter are platform-independent. This means that a file saved using one of the example programs on one operating system/processor architecture...
Integral Types
Python provides two built-in integral types, int and bool. Both integers and Booleans are immutable, but thanks to Python's augmented assignment operators this is rarely noticeable. When used in Boolean expressions, 0 and False are False, and any other integer and True are True. When used in numerical expressions True evaluates to 1 and False to 0. This means that we can write some rather odd things; for example, we can increment an integer, i, using the expression i += True. Naturally, the correct...
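The relationship between Booleans and integers described above can be checked with a few lines; this is only an illustration, not recommended style.

```python
i = 7
i += True                    # legal but odd: True counts as 1

total = True + True + False  # Booleans take part in arithmetic as 1 and 0

# In a Boolean context 0 and False are False; any other integer is True.
truthy = [bool(0), bool(-3), bool(False)]
```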
BNF Syntax and Parsing Terminology
Parsing is a means of transforming data that is in some structured format (whether the data represents actual data, or statements in a programming language, or some mixture of both) into a representation that reflects the data's structure and that can be used to infer the meaning that the data represents. The parsing process is most often done in two phases: lexing (also called lexical analysis, tokenizing, or scanning), and parsing proper (also called syntactic analysis). For example, given a...
Using the Multiprocessing Module
In some situations we already have programs that have the functionality we need but we want to automate their use. We can do this by using Python's subprocess module, which provides facilities for running other programs, passing any command-line options we want, and if desired, communicating with them using pipes. We saw one very simple example of this in Chapter 5 when we used the subprocess.call() function to clear the console in a platform-specific way. But we can also use these facilities to...
Pythonic Parsing with PyParsing
Writing recursive descent parsers by hand can be quite tricky to get right, and if we need to create many parsers it can soon become tedious both to write them and especially to maintain them. One obvious solution is to use a generic parsing module, and those experienced with BNFs or with the Unix lex and yacc tools will naturally gravitate to similar tools. In the section following this one we cover PLY (Python Lex Yacc), a tool that exemplifies this classic approach. But in this section we...
The Object-Oriented Approach
In this section we will look at some of the problems of a purely procedural approach by considering a situation where we need to represent circles, potentially lots of them. The minimum data required to represent a circle is its (x, y) position and its radius. One simple approach is to use a 3-tuple for each circle. One drawback of this approach is that it isn't obvious what each element of the tuple represents. We could mean (x, y, radius) or, just as easily, (radius, x, y). Another...
Creating a TCP Client
The client program is car_registration.py. Here is an example of interaction (with the server already running, and with the menu edited slightly to fit on the page):

    (C)ar (M)ileage (O)wner (N)ew car (S)top server (Q)uit [c]:
    (C)ar (M)ileage (O)wner (N)ew car (S)top server (Q)uit [c]: m
    License [024 HYR]:
    Mileage [97543]: 103491
    Mileage successfully changed

The data entered by the user is shown in bold; where there is no visible input it means that the user pressed Enter to accept the default. Here the user has asked to...
Floating-Point Numbers
All the numeric operators and functions in Table 2.2 can be used with floats, including the augmented assignment versions. The float data type can be called as a function: with no arguments it returns 0.0, with a float argument it returns a copy of the argument, and with any other argument it attempts to convert the given object to a float. When used for conversions a string argument can be given, either using simple decimal notation or using exponential notation. It is possible that NaN...
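The different ways of calling float() can be sketched like this; the values are arbitrary.

```python
import math

zero = float()                    # no argument
copy = float(2.5)                 # float argument
from_int = float(3)               # conversion from another numeric type
from_decimal = float("-17.25")    # string in simple decimal notation
from_exponent = float("8.9e-4")   # string in exponential notation
not_a_number = float("nan")       # NaN can be created from a string

# NaN is not equal to anything, itself included, so test with math.isnan().
nan_equals_itself = not_a_number == not_a_number
```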
Playlist Data Parsing
The playlists.py program mentioned in the previous subsection can read and write .pls format files. In this subsection we will write a parser that can read files in .m3u format and that returns its results in the form of a list of collections.namedtuple objects, each of which holds a title, a duration in seconds, and a filename. As usual, we will begin by looking at an extract of the data we want to parse, then we will create a suitable BNF, and finally we will create a parser to parse the...
Function Annotations
Functions and methods can be defined with annotations: expressions that can be used in a function's signature. Here's the general syntax:

    def functionName(par1 : exp1, par2 : exp2, ..., parN : expN) -> rexp:
        suite

Every colon expression part (: expX) is an optional annotation, and so is the arrow return expression part (-> rexp). The last (or only) positional parameter (if present) can be of the form *args, with or without an annotation; similarly, the last (or only) keyword parameter (if present) can be of the form...
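A concrete instance of the syntax may help. The repeat() function below is made up for the sketch; it annotates two ordinary parameters, a *args-style parameter, and the return value.

```python
def repeat(text: str, times: int = 2, *extras: str) -> str:
    """Return text repeated times times, followed by any extra strings."""
    return text * times + "".join(extras)


# Python stores the annotations but does not act on them by itself.
annotations = repeat.__annotations__
result = repeat("ab", 2, "!", "?")
```

The annotations are available in the function's __annotations__ dictionary, keyed by parameter name, with the key "return" used for the return annotation.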
Integers
The size of an integer is limited only by the machine's memory, so integers hundreds of digits long can easily be created and worked with, although they will be slower to use than integers that can be represented natively by the machine's processor.
When an invalid identifier is used it causes a SyntaxError exception to be raised. In each case the part of the error message that appears in parentheses varies, so we have replaced it with an ellipsis. The first assignment fails because - is not a...
Custom Modules
Since modules are just .py files they can be created without formality. In this section we will look at two custom modules. The first module, TextUtil (in file TextUtil.py), contains just three functions: is_balanced(), which returns True if the string it is passed has balanced parentheses of various kinds, shorten() (shown earlier), and simplify(), a function that can strip spurious whitespace and other characters from a string. In the coverage of this module we will also see how to execute the...
Summary
This chapter showed that creating network clients and servers can be quite straightforward in Python thanks to the standard library's networking modules, and the struct and pickle modules. In the first section we developed a client program and gave it a single function, handle_request(), to send and receive arbitrary picklable data to and from a server using a generic data format of "length plus pickle". In the second section we saw how to create a server subclass using the classes from the...
Piece Collection Data Types
It is often convenient to hold entire collections of data items. Python provides several collection data types that can hold items, including associative arrays and sets. But here we will introduce just two: tuple and list. Python tuples and lists can be used to hold any number of data items of any data types. Tuples are immutable, so once they are created we cannot change them. Lists are mutable, so we can easily insert items and remove items whenever we want. Tuples are created using commas (,)...
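The difference between the two types can be seen in a few lines. The country names are just sample data.

```python
countries = "Denmark", "Finland", "Norway"   # the commas create the tuple
singleton = "one",                           # a one-item tuple needs a comma

names = list(countries)    # a list holding the same items
names.append("Sweden")     # lists can grow...
names.remove("Finland")    # ...and shrink

try:
    countries[0] = "Iceland"   # tuples cannot be changed
    changed = True
except TypeError:
    changed = False
```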
Raw Binary Data with Optional Compression
Writing our own code to handle raw binary data gives us complete control over our file format. It should also be safer than using pickles, since maliciously invalid data will be handled by our code rather than executed by the interpreter. When creating custom binary file formats it is wise to create a magic number to identify your file type, and a version number to identify the version of the file format in use. Here are the definitions used in the convert-incidents.py program: MAGIC = b"AIB\x00...
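As a sketch of how a magic number and a version number might be written and checked, consider the following. The MAGIC and FORMAT_VERSION values, the little-endian unsigned 16-bit layout, and both function names are assumptions made for this illustration; they are not the definitions used by convert-incidents.py.

```python
import io
import struct

MAGIC = b"DEMO\x00"    # hypothetical magic number
FORMAT_VERSION = 1     # hypothetical file format version


def write_header(fh):
    """Write the magic number followed by the version as an unsigned 16-bit integer."""
    fh.write(MAGIC)
    fh.write(struct.pack("<H", FORMAT_VERSION))


def read_header(fh):
    """Check the magic number and return the file's format version."""
    if fh.read(len(MAGIC)) != MAGIC:
        raise ValueError("unrecognized file type")
    (version,) = struct.unpack("<H", fh.read(2))
    if version > FORMAT_VERSION:
        raise ValueError("unsupported format version")
    return version


# io.BytesIO stands in for a real binary file here.
good = io.BytesIO()
write_header(good)
good.seek(0)
version = read_header(good)

try:
    read_header(io.BytesIO(b"not ours"))
    rejected = False
except ValueError:
    rejected = True
```

Checking the header first means that a file of the wrong type is rejected before any attempt is made to interpret the rest of its bytes.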
Simple Key-Value Data Parsing
The book's examples include a program called playlists.py. This program can read a playlist in .m3u (extended Moving Picture Experts Group Audio Layer 3 Uniform Resource Locator) format, and output an equivalent playlist in .pls (Play List 2) format, or vice versa. In this subsection we will write a parser for .pls format, and in the following subsection we will write a parser for .m3u format. Both parsers are handcrafted and both use regexes. The .pls format is essentially the...
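To give a feel for a handcrafted regex-based key-value parser, here is a minimal sketch. The regex, the function name, and the sample data are assumptions for illustration and are not the code used by playlists.py; the sketch simply collects key=value lines into a dictionary with lowercase keys and ignores everything else.

```python
import re

# One key=value pair per line; whitespace around the key and value is ignored.
KEY_VALUE_RE = re.compile(r"^\s*(?P<key>\w+)\s*=\s*(?P<value>.*?)\s*$")


def parse_key_values(text):
    """Return a dictionary of the key=value lines in text, with lowercase keys."""
    key_values = {}
    for line in text.splitlines():
        match = KEY_VALUE_RE.match(line)
        if match:  # lines that are not key=value pairs are skipped
            key_values[match.group("key").lower()] = match.group("value")
    return key_values


sample = ("[playlist]\n"
          "File1=Blondie/Atomic.ogg\n"
          "Title1=Blondie - Atomic\n"
          "Length1=230\n")
playlist = parse_key_values(sample)
```

All the values are kept as strings; converting a value such as the length to an integer would be a separate step.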
Lex/Yacc-Style Parsing with PLY
PLY (Python Lex Yacc) is a pure Python implementation of the classic Unix tools, lex and yacc. Lex is a tool that creates lexers, and yacc is a tool that creates parsers, often using a lexer created by lex. PLY is described by its author, David Beazley, as "reasonably efficient and well suited for larger grammars". It provides most of the standard lex/yacc features including support for empty productions, precedence rules, error recovery, and support for ambiguous grammars. PLY is straightforward to...
Slicing and Striding Strings
We know from Piece 3 that individual items in a sequence, and therefore individual characters in a string, can be extracted using the item access operator ([]). In fact, this operator is much more versatile and can be used to extract not just one item or character, but an entire slice (subsequence) of items or characters, in which context it is referred to as the slice operator. First we will begin by looking at extracting individual characters. Index positions into a string begin at...
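Here is a short sketch of the item access, slice, and stride forms of the [] operator applied to a sample string.

```python
s = "The waxwork man"

first = s[0]         # index positions start at 0
last = s[-1]         # negative indexes count back from the end
word = s[4:11]       # slice: from index 4 up to but excluding index 11
tail = s[-3:]        # slice: from the third-last character to the end
every_other = s[::2] # stride: every second character
backward = s[::-1]   # a stride of -1 reverses the string
```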
Pickles with Optional Compression
It is usually easier when creating file formats to write the saving code before the loading code, so we will begin by seeing how to save the incidents into a pickle.

    def export_pickle(self, filename, compress=False):
        fh = None
        try:
            if compress:
                fh = gzip.open(filename, "wb")
            else:
                ...

Pickles offer the simplest approach to saving and loading data from Python programs, but as we noted in the preceding chapter, pickles have no security mechanisms (no encryption, no digital signature), so loading a pickle that comes from an...
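The following self-contained sketch shows one way to save a pickle with optional gzip compression and to load it back without knowing in advance whether it was compressed. It is written as a pair of plain functions rather than methods, the filenames and sample record are hypothetical, and the compression check relies on the two-byte signature that gzip files start with.

```python
import gzip
import os
import pickle
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # the two bytes that every gzip file starts with


def export_pickle(data, filename, compress=False):
    """Pickle data to filename, using gzip compression if requested."""
    opener = gzip.open if compress else open
    with opener(filename, "wb") as fh:
        pickle.dump(data, fh, pickle.HIGHEST_PROTOCOL)


def import_pickle(filename):
    """Load a pickle, detecting gzip compression from the file's first bytes."""
    with open(filename, "rb") as fh:
        magic = fh.read(len(GZIP_MAGIC))
    opener = gzip.open if magic == GZIP_MAGIC else open
    with opener(filename, "rb") as fh:
        return pickle.load(fh)  # only do this with files from a trusted source


incidents = {"20070927022009C": "hypothetical incident record"}
with tempfile.TemporaryDirectory() as directory:
    plain = os.path.join(directory, "incidents.pkl")
    packed = os.path.join(directory, "incidents.pkl.gz")
    export_pickle(incidents, plain)
    export_pickle(incidents, packed, compress=True)
    loaded = [import_pickle(plain), import_pickle(packed)]
```

Both files load back to the same dictionary, so callers do not need to remember how a particular file was saved.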
Playlist Data Parsing
In the previous section's second subsection we created a handcrafted regex-based parser for .m3u files. In this subsection we will create a parser to do the same thing, but this time using the PyParsing module. An extract from a .m3u file is shown in Figure 14.6, and the BNF is shown in Figure 14.7. As we did when reviewing the previous subsection's .pls parser, we will review the .m3u parser in three parts: first the creation of the parser, then the...
String Formatting with the str.format() Method
The str.format() method provides a very flexible and powerful way of creating strings. Using str.format() is easy for simple cases, but for complex formatting we need to learn the formatting syntax the method requires. The str.format() method returns a new string with the replacement fields in its string replaced with its arguments suitably formatted. For example:

    >>> "The novel '{0}' was published in {1}".format("Hard Times", 1854)
    "The novel 'Hard Times' was published in 1854"

Each replacement field is...
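Replacement fields can refer to arguments by position or by name. The sketch below repeats the positional example and adds a keyword-argument variant and an attribute-access variant with made-up data.

```python
import collections

by_position = "The novel '{0}' was published in {1}".format("Hard Times", 1854)
by_name = "{who} turned {age} this year".format(who="She", age=88)

# A field name can also access an attribute of its argument.
Book = collections.namedtuple("Book", "title year")
book = Book("Hard Times", 1854)
by_attribute = "{0.title} ({0.year})".format(book)
```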
Simple Key-Value Data Parsing
In the previous section's first subsection we created a handcrafted regex-based key-value parser that was used by the playlists.py program to read .pls files. In this subsection we will create a parser to do the same job, but this time using the PyParsing module. As before, the purpose of our parser is to populate a dictionary with key-value items matching those in the file, but with lowercase keys. An extract from a .pls file is...
The __del__() Special Method
The __del__(self) special method is called when an object is destroyed, at least in theory. In practice, __del__() may never be called, even at program termination. Furthermore, when we write del x, all that happens is that the object reference x is deleted and the count of how many object references refer to the object that was referred to by x is decreased by 1. Only when this count reaches 0 is __del__() likely to be called, but Python offers no guarantee that it will ever be called. In view of this, __del__()...
Exercises
1. Modify the print_unicode.py program so that the user can enter several separate words on the command line, and print rows only where the Unicode character name contains all the words the user has specified. One way of doing this is to replace the word variable (which held 0, None, or a string), with a words list. Don't forget to update the usage information as well as the code. The changes involve adding fewer than ten lines of code, and changing...
Writing Handcrafted Parsers
In this section we will develop three handcrafted parsers. The first is little more than an extension of the key-value regex seen in the previous chapter, but shows the infrastructure needed to use such a regex. The second is also regex-based, but is actually a finite state automaton since it has two states. Both the first and second examples are data parsers. The third example is a parser for a DSL and uses recursive descent since the DSL allows expressions to be nested. In later sections we...
Functors
In Python a function object is an object reference to any callable, such as a function, a lambda function, or a method. The definition also includes classes, since an object reference to a class is a callable that, when called, returns an object of the given class; for example, x = int(5). In computer science a functor is an object that can be called as though it were a function, so in Python terms a functor is just another kind of function object. Any class that has a __call__() special method is a...
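A small sketch of a functor follows. The Strip class is an illustrative example: each instance remembers the characters to strip and can then be called like a function.

```python
class Strip:
    """Functor that strips a fixed set of characters from the ends of strings."""

    def __init__(self, characters):
        self.characters = characters   # state kept between calls

    def __call__(self, string):
        # Makes instances callable: strip_punctuation("...") invokes this method.
        return string.strip(self.characters)


strip_punctuation = Strip(",;:.!?")
result = strip_punctuation("Land ahoy!")
is_callable = callable(strip_punctuation)
```

Unlike a plain function, a functor carries its own state, so several differently configured instances of Strip can exist side by side.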


