Further Reading

itertools: Iterator functions for efficient looping
Watch Raymond Hettinger's "Easy AI with Python" talk at PyCon 2009
Recipe 576615: Alphametics solver, Raymond Hettinger's original alphametics solver for Python 2
More of Raymond Hettinger's recipes in the ActiveState Code repository
Alphametics Index, including lots of puzzles and a generator to make your own

Many thanks to Raymond Hettinger for agreeing to relicense his code so I could port it to Python 3 and use it as the basis for this...

Slicing a List

Once you've defined a list, you can get any part of it as a new list. This is called slicing the list.
>>> a_list
['a', 'b', 'mpilgrim', 'z', 'example']
>>> a_list[1:3]
['b', 'mpilgrim']
>>> a_list[1:-1]
['b', 'mpilgrim', 'z']
>>> a_list[0:3]
['a', 'b', 'mpilgrim']
>>> a_list[:3]
['a', 'b', 'mpilgrim']
>>> a_list[3:]
['z', 'example']
>>> a_list[:]
['a', 'b', 'mpilgrim', 'z', 'example']
1. You can get a part of a list, called a slice, by specifying two indices. The return value is a new list containing all the items of the list,...

callable() global function

In Python 2, you could check whether an object was callable (like a function) with the global callable() function. In Python 3.0 and 3.1, this global function was eliminated (it was reinstated in Python 3.2). To check whether an object is callable without it, check for the existence of the __call__() special method. In Python 2, the global zip() function took any number of sequences and returned a list of tuples. The first tuple contained the first item from each sequence; the second tuple contained the second item from each sequence; and so on. In Python...
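The __call__() check can be sketched as follows. The helper name is_callable and the Greeter class are hypothetical, introduced only for illustration.

```python
def is_callable(obj):
    """Return True if obj exposes the __call__ special method."""
    return hasattr(obj, '__call__')


class Greeter:
    """A class whose instances can be called like functions."""

    def __call__(self, name):
        return 'Hello, ' + name


print(is_callable(len))        # a built-in function
print(is_callable(Greeter()))  # an instance that defines __call__
print(is_callable(42))         # a plain integer
```

Functions and instances of classes that define __call__() pass the check; a plain integer does not.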

Binary Files

Not all files contain text. Some of them contain pictures of my dog.
>>> an_image = open('examples/beauregard.jpg', mode='rb')
>>> an_image.encoding
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: '_io.BufferedReader' object has no attribute 'encoding'
1. Opening a file in binary mode is simple but subtle. The only difference from opening it in text mode is that the mode parameter contains a 'b' character.
2. The stream object you get from opening a file in binary mode has many...
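Here is a self-contained sketch of the same idea. It writes a few raw bytes to a temporary file instead of relying on the book's examples folder; the file contents and variable names are assumptions made for illustration.

```python
import os
import tempfile

# Create a small binary file so the example does not depend on any
# particular image being present on disk.
with tempfile.NamedTemporaryFile(suffix='.bin', delete=False) as tmp:
    tmp.write(b'\xff\xd8\xff\xe0')
    path = tmp.name

# Open it in binary mode: note the 'b' in the mode string.
with open(path, mode='rb') as an_image:
    has_encoding = hasattr(an_image, 'encoding')
    data = an_image.read()

os.remove(path)

print(has_encoding)            # binary streams carry no text encoding
print(type(data), len(data))   # reading yields bytes, not a string
```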

lambda functions that take a tuple instead of multiple

In Python 2, you could define anonymous lambda functions which took multiple parameters by defining the function as taking a tuple with a specific number of items. In effect, Python 2 would unpack the tuple into named arguments, which you could then reference by name within the lambda function. In Python 3, you can still pass a tuple to a lambda function, but the Python interpreter will not unpack the tuple into named arguments. Instead, you will need to reference each argument by its...
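As a rough illustration, one way to write such a lambda in Python 3 is to accept a single tuple parameter and index into it. The name add_pair is hypothetical.

```python
# Python 2 allowed:  lambda (x, y): x + y
# In Python 3, accept one tuple parameter and index into it instead.
add_pair = lambda pair: pair[0] + pair[1]

result = add_pair((3, 4))
print(result)  # 7
```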

Metaclasses

In Python 2, you could create metaclasses either by defining the metaclass argument in the class declaration, or by defining a special class-level __metaclass__ attribute. In Python 3, the class-level attribute has been eliminated.
class Whip(metaclass=PapayaMeta): pass
class C(Whipper, Beater, metaclass=PapayaMeta): pass
1. Declaring the metaclass in the class declaration worked in Python 2, and it still works the same in Python 3.
2. Declaring the metaclass in a class attribute worked in Python 2, but doesn't work in...
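A minimal sketch of the Python 3 syntax follows. The body of PapayaMeta is an assumption (the excerpt only names it); here it simply records which metaclass built the class.

```python
class PapayaMeta(type):
    """Hypothetical metaclass that tags every class it creates."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        cls.created_by = mcls.__name__
        return cls


# Python 3: the metaclass is a keyword argument in the class declaration.
class Whip(metaclass=PapayaMeta):
    pass


print(type(Whip).__name__)
print(Whip.created_by)
```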

Strings vs Bytes

Bytes are bytes; characters are an abstraction. An immutable sequence of Unicode characters is called a string. An immutable sequence of numbers-between-0-and-255 is called a bytes object.
>>> by += b'\xff'
>>> by[0] = 102
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bytes' object does not support item assignment
1. To define a bytes object, use the b'' byte literal syntax. Each byte within the byte literal can be an ASCII character or an encoded hexadecimal number from \x00 to \xff (0-255)...

Going Further With lxml

lxml is an open source third-party library that builds on the popular libxml2 parser. It provides a 100% compatible ElementTree API, then extends it with full XPath 1.0 support and a few other niceties. There are installers available for Windows; Linux users should always try to use distribution-specific tools like yum or apt-get to install precompiled binaries from their repositories. Otherwise you'll need to install lxml manually.
>>> from lxml import etree
>>> tree = etree.parse...

How Not To Fetch Data Over HTTP

Let's say you want to download a resource over HTTP, such as an Atom feed. Being a feed, you're not just going to download it once; you're going to download it over and over again. (Most feed readers will check for changes once an hour.) Let's do it the quick-and-dirty way first, and then see how you can do better.
>>> a_url = 'http://diveintopython3.org/examples/feed.xml'
>>> data = urllib.request.urlopen(a_url).read()
>>> type(data)
<class 'bytes'>
>>> print(data)
<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http...

basestring datatype

Python 2 had two string types: Unicode and non-Unicode. But there was also another type, basestring. It was an abstract type, a superclass for both the str and unicode types. It couldn't be called or instantiated directly, but you could pass it to the global isinstance() function to check whether an object was either a Unicode or non-Unicode string. In Python 3, there is only one string type, so basestring has no reason to exist. Python 2.3 introduced the itertools module, which defined variants...

Using The Python Shell

The Python Shell is where you can explore Python syntax, get interactive help on commands, and debug short programs. The graphical Python Shell named IDLE also contains a decent text editor that supports Python syntax coloring and integrates with the Python Shell. If you don't already have a favorite text editor, you should give IDLE a try. First things first. The Python Shell itself is an amazing interactive playground. Throughout this book, you'll see examples like this The three angle...

Calculating Permutations The Lazy Way

First of all, what the heck are permutations? Permutations are a mathematical concept. (There are actually several definitions, depending on what kind of math you're doing. Here I'm talking about combinatorics, but if that doesn't mean anything to you, don't worry about it. As always, Wikipedia is your friend.) The idea is that you take a list of things (could be numbers, could be letters, could be dancing bears) and find all the possible ways to split them up into smaller lists. All the smaller...
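To make the idea concrete, here is a small sketch using the standard library's itertools.permutations() function. The input list and the length 2 are arbitrary choices for illustration.

```python
import itertools

# All ordered ways to pick 2 items out of [1, 2, 3].
perms = list(itertools.permutations([1, 2, 3], 2))
print(perms)
# [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
```

Order matters here: (1, 2) and (2, 1) are distinct permutations.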

Parsing Broken XML

The XML specification mandates that all conforming XML parsers employ draconian error handling. That is, they must halt and catch fire as soon as they detect any sort of wellformedness error in the XML document. Wellformedness errors include mismatched start and end tags, undefined entities, illegal Unicode characters, and a number of other esoteric rules. This is in stark contrast to other common formats like HTML: your browser doesn't stop rendering a web page if you forget to close an HTML...

Some Boring Stuff You Need To Understand Before You Can Dive In

Few people think about it, but text is incredibly complicated. Start with the alphabet. The people of Bougainville have the smallest alphabet in the world; their Rotokas alphabet is composed of only 12 letters: A, E, G, I, K, O, P, R, S, T, U, and V. On the other end of the spectrum, languages like Chinese, Japanese, and Korean have thousands of characters. English, of course, has 26 letters (52 if you count uppercase and lowercase separately), plus a handful of punctuation marks. When you talk...

Assigning Multiple Values At Once

Here's a cool programming shortcut: in Python, you can use a tuple to assign multiple values at once.
>>> v = ('a', 2, True)
>>> (x, y, z) = v
1. v is a tuple of three elements, and (x, y, z) is a tuple of three variables. Assigning one to the other assigns each of the values of v to each of the variables, in order. This has all kinds of uses. Suppose you want to assign names to a range of values. You can use the built-in range() function with multi-variable assignment to quickly assign...
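Both uses can be sketched in a few lines. The weekday names in the second assignment are illustrative choices, not taken from the excerpt above.

```python
# Unpack a three-element tuple into three variables, in order.
v = ('a', 2, True)
(x, y, z) = v
print(x, y, z)

# Combine range() with multi-variable assignment to name consecutive values.
(MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY) = range(7)
print(MONDAY, SUNDAY)
```

The number of variables on the left must match the number of values on the right; otherwise Python raises a ValueError.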

Halt And Catch Fire

It is not enough to test that functions succeed when given good input; you must also test that they fail when given bad input. And not just any sort of failure; they must fail in the way you expect.
>>> import roman1
>>> roman1.to_roman(4000)
'MMMM'
>>> roman1.to_roman(5000)
'MMMMM'
>>> roman1.to_roman(9000)
'MMMMMMMMM'
1. That's definitely not what you wanted; that's not even a valid Roman numeral! In fact, each of these numbers is outside the range of acceptable input, but the...

Diving In

Convention dictates that I should bore you with the fundamental building blocks of programming, so we can slowly work up to building something useful. Let's skip all that. Here is a complete, working Python program. It probably makes absolutely no sense to you. Don't worry about that, because you're going to dissect it line by line. But read through it first and see what, if anything, you can make of it.
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'], 1024: ['KiB', 'MiB', 'GiB',...

os.getcwdu() function

Python 2 had a function named os.getcwd(), which returned the current working directory as a (non-Unicode) string. Because modern file systems can handle directory names in any character encoding, Python 2.3 introduced os.getcwdu(). The os.getcwdu() function returned the current working directory as a Unicode string. In Python 3, there is only one string type (Unicode), so os.getcwd() is all you need.

Refactoring

The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even the feeling you get when someone else blames you for breaking their code and you can actually prove that you didn't. The best thing about unit testing is that it gives you the freedom to refactor mercilessly. Refactoring is the process of taking working code and making it work better. Usually, better means faster, although it can also mean using less memory, or using less...

xreadlines() I/O method

In Python 2, file objects had an xreadlines() method which returned an iterator that would read the file one line at a time. This was useful in for loops, among other places. In fact, it was so useful, later versions of Python 2 added the capability to file objects themselves. In Python 3, the xreadlines() method no longer exists. 2to3 can fix the simple cases, but some edge cases will require manual intervention.
1. If you used to call xreadlines() with no arguments, 2to3 will convert it to just...
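In Python 3, iterating over the file object itself reads one line at a time. A minimal sketch, using a temporary file whose name and contents are made up for the example:

```python
import os
import tempfile

# Write a three-line text file to iterate over.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('one\ntwo\nthree\n')
    path = tmp.name

lines = []
with open(path, encoding='utf-8') as a_file:
    # The file object is its own line iterator; no xreadlines() needed.
    for a_line in a_file:
        lines.append(a_line.rstrip('\n'))

os.remove(path)
print(lines)
```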

Beyond HTTP GET

HTTP web services are not limited to GET requests. What if you want to create something new? Whenever you post a comment on a discussion forum, update your weblog, or publish your status on a microblogging service like Twitter or Identi.ca, you're probably already using HTTP POST. Twitter and Identi.ca both offer a simple HTTP-based API for publishing and updating your status in 140 characters or less. Let's look at Identi.ca's API documentation for updating your status: Identi.ca rest api...

Relative imports within a package

A package is a group of related modules that function as a single entity. In Python 2, when modules within a package need to reference each other, you use import foo or from foo import Bar. The Python 2 interpreter first searches within the current package to find foo.py, and then moves on to the other directories in the Python search path (sys.path). Python 3 works a bit differently. Instead of searching the current package, it goes directly to the Python search path. If you want one module...

Further Reading

abc (Abstract Base Classes) module
Other light reading:
Format Specification Mini-Language
PEP 357: Allowing Any Object to be Used for Slicing
PEP 3119: Introducing Abstract Base Classes

Running 2to3

We're going to migrate the chardet module from Python 2 to Python 3. Python 3 comes with a utility script called 2to3, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. In some cases this is easy (a function was renamed or moved to a different module), but in other cases it can get pretty complex. To get a sense of all that it can do, refer to the appendix, Porting code to Python 3 with 2to3. In this chapter, we'll start by running 2to3 on the...

What's On The Wire

To see why this is inefficient and rude, let's turn on the debugging features of Python's HTTP library and see what's being sent on the wire (i.e. over the network).
>>> from http.client import HTTPConnection
>>> HTTPConnection.debuglevel = 1
>>> from urllib.request import urlopen
>>> response = urlopen('http://diveintopython3.org/examples/feed.xml')
send: b'GET /examples/feed.xml HTTP/1.1
Host: diveintopython3.org
Accept-Encoding: identity
User-Agent: Python-urllib/3.1'
Connection: close
reply: 'HTTP/1.1 200 OK'
further debugging...

What is Character Encoding Auto-Detection

It means taking a sequence of bytes in an unknown character encoding, and attempting to determine the encoding so you can read the text. It's like cracking a code when you don't have the decryption key. Isn't that impossible? In general, yes. However, some encodings are optimized for specific languages, and languages are not random. Some character sequences pop up all the time, while other sequences make no sense. A person fluent in English who opens a newspaper and finds txzqJv 2 dasd0a QqdKjvz will instantly...

Other Fun Stuff in the itertools Module

>>> list(itertools.product('ABC', '123'))
[('A', '1'), ('A', '2'), ('A', '3'), ('B', '1'), ('B', '2'), ('B', '3'), ('C', '1'), ('C', '2'), ('C', '3')]
>>> list(itertools.combinations('ABC', 2))
[('A', 'B'), ('A', 'C'), ('B', 'C')]
1. The itertools.product() function returns an iterator containing the Cartesian product of two sequences.
2. The itertools.combinations() function returns an iterator containing all the possible combinations of the given sequence of the given length. This is like the...
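The two calls above can be run as a standalone script; this sketch only adds the import and stores the results in variables (the variable names are illustrative).

```python
import itertools

# Cartesian product: every pairing of one item from each sequence.
pairs = list(itertools.product('ABC', '123'))
print(len(pairs), pairs[0], pairs[-1])

# Combinations: unordered selections of the given length.
combos = list(itertools.combinations('ABC', 2))
print(combos)
```

Both functions return lazy iterators, which is why list() is needed to see all the results at once.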

The Current Working Directory

When you're just getting started with Python, you're going to spend a lot of time in the Python Shell. Throughout this book, you will see examples that go like this:
1. Import one of the modules in the examples folder
2. Call a function in that module
If you don't know about the current working directory, step 1 will probably fail with an ImportError. Why? Because Python will look for the...

Introducing The chardet Module

Before we set off porting the code, it would help if you understood how the code worked! This is a brief guide to navigating the code itself. The chardet library is too large to include inline here, but you can download it from chardet.feedparser.org. The main entry point is universaldetector.py, which has one class, UniversalDetector. (You might think the main entry point is the detect function in chardet/__init__.py, but that's really just a convenience function that creates a UniversalDetector object, calls it, and...

Slicing a String

Once you've defined a string, you can get any part of it as a new string. This is called slicing the string. Slicing strings works exactly the same as slicing lists, which makes sense, because strings are just sequences of characters.
>>> a_string = 'My alphabet starts where your alphabet ends.'
>>> a_string[3:-3]
'alphabet starts where your alphabet en'
>>> a_string[0:2]
'My'
>>> a_string[:18]
'My alphabet starts'
>>> a_string[18:]
' where your alphabet ends.'
1. You can get a part of a string, called a slice, by specifying two indices. The return value...
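A short runnable sketch of string slicing on the same sentence. The variable names for the slices are introduced here for illustration.

```python
a_string = 'My alphabet starts where your alphabet ends.'

middle = a_string[3:-3]   # from index 3 up to (not including) the last 3 characters
first_two = a_string[0:2]
head = a_string[:18]      # omitted start index means 0
tail = a_string[18:]      # omitted end index means "to the end"

print(repr(middle))
print(repr(first_two))
print(repr(head))
print(repr(tail))
```

Note that head + tail always reproduces the original string, whatever index you split on.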

Diving In

So much has changed between Python 2 and Python 3, there are vanishingly few programs that will run unmodified under both. But don't despair! To help with this transition, Python 3 comes with a utility script called 2to3, which takes your actual Python 2 source code as input and auto-converts as much as it can to Python 3. Case study: porting chardet to Python 3 describes how to run the 2to3 script, then shows some things it can't fix automatically. This appendix documents what it can fix...

The import Search Path

Before this goes any further, I want to briefly mention the library search path. Python looks in several places when you try to import a module. Specifically, it looks in all the directories defined in sys.path. This is just a list, and you can easily view it or modify it with standard list methods. (You'll learn more about lists in Native Datatypes.)
>>> import sys
>>> sys.path
['/usr/lib/python31.zip', '/usr/lib/python3.1', '/usr/lib/python3.1/lib-dynload', '/usr/lib/python3.1...

Working With Filenames and Directory Names

While we're on the subject of directories, I want to point out the os.path module. os.path contains functions for manipulating filenames and directory names. >>> 'humansize.py' >>> print(os.path.expanduser('~')) >>> 'diveintopython3', 'examples', 'humansize.py'
1. The os.path.join() function constructs a pathname out of one or more partial pathnames. In this case, it simply concatenates strings.
2. In this slightly less trivial case, calling the os.path.join() function will add an...

Getting File Metadata

Every modern file system stores metadata about each file: creation date, last-modified date, file size, and so on. Python provides a single API to access this metadata. You don't need to open the file; all you need is the filename.
>>> print(os.getcwd())
>>> metadata = os.stat('feed.xml')
>>> metadata.st_mtime
>>> time.localtime(metadata.st_mtime)
time.struct_time(tm_year=2009, tm_mon=7, tm_mday=13, tm_hour=17, tm_min=25, tm_sec=44, tm_wday=0, tm_yday=194, tm_isdst=1)
1. The current...
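The same calls can be tried without the book's feed.xml. This sketch creates a throwaway file first; the file contents and variable names are assumptions for illustration.

```python
import os
import tempfile
import time

# Create a small file so there is something to inspect.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'hello')
    path = tmp.name

# os.stat() needs only the filename; the file is not opened here.
metadata = os.stat(path)
size = metadata.st_size
modified = time.localtime(metadata.st_mtime)

os.remove(path)

print(size)
print(modified.tm_year)
```

st_mtime is a number of seconds since the epoch; time.localtime() turns it into a structure with readable fields.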

Serializing Datatypes Unsupported by json

Even if JSON has no built-in support for bytes, that doesn't mean you can't serialize bytes objects. The json module provides extensibility hooks for encoding and decoding unknown datatypes. By unknown, I mean not defined in JSON. Obviously the json module knows about byte arrays, but it's constrained by the limitations of the JSON specification. If you want to encode bytes or other datatypes that JSON doesn't support natively, you need to provide custom encoders and decoders for those types....
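One way to provide such hooks is through the default parameter of json.dumps() and the object_hook parameter of json.loads(). The sketch below is an illustration under stated assumptions: the helper names to_json and from_json are hypothetical, and the '__class__'/'__value__' wrapper is one possible convention for marking bytes.

```python
import json


def to_json(python_object):
    """Encoder hook: called for objects json cannot serialize natively."""
    if isinstance(python_object, bytes):
        return {'__class__': 'bytes', '__value__': list(python_object)}
    raise TypeError(repr(python_object) + ' is not JSON serializable')


def from_json(json_object):
    """Decoder hook: called for every JSON object (dict) that is decoded."""
    if json_object.get('__class__') == 'bytes':
        return bytes(json_object['__value__'])
    return json_object


entry = {'internal_id': b'\xde\xd5\xb4\xf8'}
encoded = json.dumps(entry, default=to_json)
decoded = json.loads(encoded, object_hook=from_json)

print(encoded)
print(decoded)
```

The encoder turns the bytes into a list of integers that JSON can represent; the decoder recognizes the wrapper and rebuilds the bytes object.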

Dictionary Comprehensions

A dictionary comprehension is like a list comprehension, but it constructs a dictionary instead of a list.
>>> metadata = [(f, os.stat(f)) for f in glob.glob('*test*.py')]
>>> metadata[0]
('alphameticstest.py', nt.stat_result(st_mode=33206, st_ino=0, st_dev=0, st_nlink=0, st_uid=0, st_gid=0, st_size=2509, st_atime=1247520344, st_mtime=1247520344, st_ctime=1247520344))
>>> metadata_dict = {f:os.stat(f) for f in glob.glob('*test*.py')}
>>> type(metadata_dict)
>>> list(metadata_dict.keys...
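To see the difference without touching the file system, here is a small sketch that swaps the file metadata for word lengths. The data is made up for illustration.

```python
words = ['alpha', 'beta', 'gamma']

# List comprehension: builds a list of (key, value) tuples.
lengths_list = [(w, len(w)) for w in words]

# Dictionary comprehension: braces, plus key: value before the "for".
lengths_dict = {w: len(w) for w in words}

print(lengths_list)
print(lengths_dict)
```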

Unbound Variables

Take another look at this line of code from the approximate_size() function:
multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
You never declare the variable multiple, you just assign a value to it. That's OK, because Python lets you do that. What Python will not let you do is reference a variable that has never been assigned a value. Trying to do so will raise a NameError exception.
>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> x = 1
>>> x
1
You will...
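The same behavior can be observed in a script by catching the exception. The name undefined_name is deliberately never assigned.

```python
# Referencing a name that was never assigned raises NameError.
try:
    print(undefined_name)
except NameError as error:
    message = str(error)

print(message)

# After assignment, the same kind of reference works fine.
x = 1
print(x)
```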

Searching For Nodes Within An XML Document

So far, we've worked with this XML document from the top down, starting with the root element, getting its child elements, and so on throughout the document. But many uses of XML require you to find specific elements. Etree can do that, too.
>>> import xml.etree.ElementTree as etree
>>> tree = etree.parse('examples/feed.xml')
>>> root = tree.getroot()
>>> root.findall('{http://www.w3.org/2005/Atom}entry')
[<Element {http://www.w3.org/2005/Atom}entry at e2b4e0>, <Element {http://www.w3.org/2005/Atom}entry at e2b510>, <Element...
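A self-contained sketch of searching by element name, using an inline Atom-like document instead of examples/feed.xml. The document contents and the ATOM constant are assumptions made for illustration.

```python
import xml.etree.ElementTree as etree

# Namespace prefix in the {uri}localname form that ElementTree uses.
ATOM = '{http://www.w3.org/2005/Atom}'

document = (
    "<feed xmlns='http://www.w3.org/2005/Atom'>"
    "<title>sample</title>"
    "<entry><title>first</title></entry>"
    "<entry><title>second</title></entry>"
    "</feed>"
)

root = etree.fromstring(document)

# findall() returns the direct children that match the qualified name.
entries = root.findall(ATOM + 'entry')
titles = [entry.find(ATOM + 'title').text for entry in entries]

print(len(entries))
print(titles)
```

Because the feed declares a default namespace, every element name must be qualified with it when searching.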

StandardError exception

In Python 2, StandardError was the base class for all built-in exceptions other than StopIteration, GeneratorExit, KeyboardInterrupt, and SystemExit. In Python 3, StandardError has been eliminated; use Exception instead. The types module contains a variety of constants to help you determine the type of an object. In Python 2, it contained constants for all primitive types like dict and int. In Python 3, these constants have been eliminated; just use the primitive type name instead....

Stream Objects From Non-File Sources

Imagine you're writing a library, and one of your library functions is going to read some data from a file. The function could simply take a filename as a string, go open the file for reading, read it, and close it before exiting. But you shouldn't do that. Instead, your API should take an arbitrary stream object. In the simplest case, a stream object is anything with a read() method which takes an optional size parameter and returns a string. When called with no size parameter, the read() method...
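As a sketch of this design, the function below accepts any object with a read() method, and the caller passes an in-memory io.StringIO stream instead of a real file. The function name count_words and the sample text are hypothetical.

```python
import io


def count_words(stream):
    """Count whitespace-separated words in any object with a read() method."""
    return len(stream.read().split())


# An in-memory stream behaves like an open text file.
a_file = io.StringIO('PapayaWhip is the new black.')
total = count_words(a_file)
print(total)
```

Because count_words() never opens anything itself, the same function works on files, in-memory strings, or any other stream-like object.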

Diving In

Question: what's the #1 cause of gibberish text on the web, in your inbox, and across every computer system ever written? It's character encoding. In the Strings chapter, I talked about the history of character encoding and the creation of Unicode, the one encoding to rule them all. I'd love it if I never had to see a gibberish character on a web page again, because all authoring systems stored accurate encoding information, all transfer protocols were Unicode-aware, and every system that handled...

A New Kind Of String Manipulation

Python strings have many methods. You learned about some of those methods in the Strings chapter: lower(), count(), and format(). Now I want to introduce you to a powerful but little-known string manipulation technique: the translate() method.
>>> translation_table = {ord('A'): ord('O')}
>>> translation_table
{65: 79}
1. String translation starts with a translation table, which is just a dictionary that maps one character to another. Actually, character is incorrect; the translation table really...
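A brief runnable sketch of the technique. The sample string 'MARK' is an arbitrary choice for illustration.

```python
# The table maps code points (integers) to code points.
translation_table = {ord('A'): ord('O')}
print(translation_table)  # {65: 79}

# translate() replaces each character whose code point is a key in the table.
translated = 'MARK'.translate(translation_table)
print(translated)  # MORK
```

Characters that have no entry in the table pass through unchanged.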

Loading Data from a json File

Like the pickle module, the json module has a load() function which takes a stream object, reads JSON-encoded data from it, and creates a new Python object that mirrors the JSON data structure.
>>> entry
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'entry' is not defined
>>> import json
>>> with open('entry.json', 'r', encoding='utf-8') as f:
...     entry = json.load(f)
...
>>> entry
{'internal_id': {'__class__': 'bytes', '__value__': [222, 213, 180, 248]}, 'title': 'Dive into history, 2009 edition', 'tags'...

Diving In

Having grown up the son of a librarian and an English major, I have always been fascinated by languages. Not programming languages. Well yes, programming languages, but also natural languages. Take English. English is a schizophrenic language that borrows words from German, French, Spanish, and Latin (to name a few). Actually, borrows is the wrong word; pillages is more like it. Or perhaps assimilates, like the Borg. Yes, I like that. We are the Borg. Your linguistic and etymological...

Declaring Functions

Python has functions like most other languages, but it does not have separate header files like C or interface/implementation sections like Pascal. When you need a function, just declare it, like this:
def approximate_size(size, a_kilobyte_is_1024_bytes=True):
The keyword def starts the function declaration, followed by the function name, followed by the arguments in parentheses. Multiple arguments are separated with commas. Also note that the function doesn't define a return datatype. Python...
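To show the declaration in context, here is a sketch of a complete function with that signature. The body is an illustration, not necessarily the author's exact code, and the binary suffix list is filled in with the standard IEC names as an assumption.

```python
# Suffix tables keyed by the size of a "kilobyte".
# The 1024 list uses the standard IEC names (an assumption for this sketch).
SUFFIXES = {
    1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
    1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB'],
}


def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    """Convert a file size in bytes to a human-readable string."""
    if size < 0:
        raise ValueError('number must be non-negative')

    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
    for suffix in SUFFIXES[multiple]:
        size /= multiple
        if size < multiple:
            return '{0:.1f} {1}'.format(size, suffix)

    raise ValueError('number too large')


print(approximate_size(4096))         # second argument defaults to True
print(approximate_size(4000, False))  # positional override of the default
```

The second parameter has a default value, so callers may omit it, pass it by position, or pass it by name.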

Modifying a Dictionary

Dictionaries do not have any predefined size limit. You can add new key-value pairs to a dictionary at any time, or you can modify the value of an existing key. Continuing from the previous example db.diveintopython3.org' database' 'blog' db.diveintopython3.org' user' 'dora' db.diveintopython3.org' User' 'mark' 'server' 'db.diveintopython3.org', 'database' 'mysql' gt gt gt a_dict gt gt gt a_dict 'server' gt gt gt a_dict gt gt gt a_dict 'server' gt gt gt a_dict gt gt gt a_dict 'server' gt gt gt...
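Adding and modifying keys can be sketched like this. The starting dictionary follows the values visible above; the exact order of operations is an assumption.

```python
a_dict = {'server': 'db.diveintopython3.org', 'database': 'mysql'}

# Assigning to an existing key replaces its value.
a_dict['database'] = 'blog'

# Assigning to a new key adds a key-value pair.
a_dict['user'] = 'mark'

# Keys are case-sensitive, so 'User' is a different key from 'user'.
a_dict['User'] = 'dora'

print(a_dict)
```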

Matters of style

The rest of the fixes listed here aren't really fixes per se. That is, the things they change are matters of style, not substance. They work just as well in Python 3 as they do in Python 2, but the developers of Python have a vested interest in making Python code as uniform as possible. To that end, there is an official Python style guide which outlines in excruciating detail all sorts of nitpicky details that you almost certainly don't care about. And given that 2to3 provides such a great...

Things Distutils Can't Do For You

Releasing your first Python package is a daunting process. (Releasing your second one is a little easier.) Distutils tries to automate as much of it as possible, but there are some things you simply must do yourself. Choose a license. This is a complicated topic, fraught with politics and peril. If you wish to release your software as open source, I humbly offer five pieces of advice:
1. Don't write your own license.
2. Don't write your own license.
3. Don't write your own license.
4. It doesn't...

Catching Import Errors

One of Python's built-in exceptions is ImportError, which is raised when you try to import a module and fail. This can happen for a variety of reasons, but the simplest case is when the module doesn't exist in your import search path. You can use this to include optional features in your program. For example, the chardet library provides character encoding auto-detection. Perhaps your program wants to use this library if it exists, but continue gracefully if the user hasn't installed it. You...
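The optional-feature pattern can be sketched as follows. The helper name guess_encoding and the 'utf-8' fallback are hypothetical; chardet.detect() is the library's convenience function, which returns a dictionary containing an 'encoding' key.

```python
# Try to import the optional library; fall back gracefully if it is missing.
try:
    import chardet
except ImportError:
    chardet = None


def guess_encoding(raw_bytes, fallback='utf-8'):
    """Return a detected encoding name, or the fallback if detection is unavailable."""
    if chardet is None:
        return fallback
    result = chardet.detect(raw_bytes)
    return result['encoding'] or fallback


print(guess_encoding(b'hello world'))
```

The rest of the program only checks whether chardet is None, so it runs either way.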

Writing to Text Files

You can write to files in much the same way that you read from them. First you open a file and get a stream object, then you use methods on the stream object to write data to the file, then you close the file. To open a file for writing, use the open() function and specify the write mode. There are two file modes for writing:
Write mode will overwrite the file. Pass mode='w' to the open() function.
Append mode will add data to the end of the file. Pass mode='a' to the open() function.
Either mode will...
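The two modes can be compared in a short sketch. The file name test.log and the strings written are made up for illustration, and the file lives in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.log')

# Write mode creates the file (or overwrites it if it already exists).
with open(path, mode='w', encoding='utf-8') as a_file:
    a_file.write('test succeeded')

# Append mode adds to the end of the existing contents.
with open(path, mode='a', encoding='utf-8') as a_file:
    a_file.write(' and again')

with open(path, encoding='utf-8') as a_file:
    contents = a_file.read()

print(contents)
```

Using a with block closes the file automatically, even if an exception is raised while writing.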