Indexing Lists Versus Dictionaries

A text, as we have seen, is treated in Python as a list of words. An important property of lists is that we can "look up" a particular item by giving its index, e.g., text1[100]. Notice how we specify a number and get back a word. We can think of a list as a simple kind of table, as shown in Figure 5-2.

Index  Word
0      Call
1      me
2      Ishmael

Figure 5-2. List lookup: We access the contents of a Python list with the help of an integer index.
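The lookup in Figure 5-2 can be sketched in a few lines. This is only an illustration: the three-word list `words` is a hypothetical stand-in for a full text such as text1.

```python
# A tiny stand-in for a text: a list of words (hypothetical example data).
words = ['Call', 'me', 'Ishmael']

# List lookup: we give an integer index and get back a word.
first = words[0]
third = words[2]
print(first, third)

# Going the other way (word in, position out) means searching the list,
# which is one reason a different structure suits word-based lookup.
position = words.index('me')
print(position)
```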

Contrast this situation with frequency distributions (Section 1.3), where we specify a word and get back a number, e.g., fdist['monstrous'], which tells us the number of times a given word has occurred in a text. Lookup using words is familiar to anyone who has used a dictionary. Some more examples are shown in Figure 5-3.

Phone List
Alex             x154
Dana             x642
Kim              x911
Les              x120
Sandy            x124

Domain Name Resolution
aclweb.org       128.231.23.4
amazon.com       12.118.92.43
google.com       28.31.23.124
python.org       18.21.3.144
sourceforge.net  51.98.23.53

Word Frequency Table
computational    25
language         196
linguistics      17
natural          56
processing       57

Figure 5-3. Dictionary lookup: we access the entry of a dictionary using a key such as someone's name, a web domain, or an English word; other names for dictionary are map, hashmap, hash, and associative array.

In the case of a phonebook, we look up an entry using a name and get back a number. When we type a domain name in a web browser, the computer looks this up to get back an IP address. A word frequency table allows us to look up a word and find its frequency in a text collection. In all these cases, we are mapping from names to numbers, rather than the other way around as with a list. In general, we would like to be able to map between arbitrary types of information. Table 5-4 lists a variety of linguistic objects, along with what they map.
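As a minimal sketch of name-to-number lookup, the phone list from Figure 5-3 can be written as a Python dictionary. The variable names are illustrative, the sample sentence is made up, and `collections.Counter` from the standard library stands in here for a frequency distribution.

```python
from collections import Counter

# Phone list from Figure 5-3: keys are names, values are extensions.
phone_list = {
    'Alex': 'x154',
    'Dana': 'x642',
    'Kim': 'x911',
    'Les': 'x120',
    'Sandy': 'x124',
}

# Dictionary lookup: we give a name and get back an extension.
kim_ext = phone_list['Kim']
print(kim_ext)

# A word frequency table maps each word to a count.
# The sentence below is invented purely for illustration.
tokens = 'the cat sat on the mat near the door'.split()
word_freq = Counter(tokens)

the_count = word_freq['the']   # 'the' occurs three times in the sample
dog_count = word_freq['dog']   # a Counter returns 0 for unseen words
print(the_count, dog_count)
```

Note that a plain dictionary raises a KeyError for a missing key, whereas a Counter returns 0, which is convenient when counting words.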

Table 5-4. Linguistic objects as mappings from keys to values

Linguistic object     Maps from     Maps to
Document Index        Word          List of pages (where word is found)
Thesaurus             Word sense    List of synonyms
Dictionary            Headword      Entry (part-of-speech, sense definitions, etymology)
Comparative Wordlist  Gloss term    Cognates (list of words, one per language)
Morph Analyzer        Surface form  Morphological analysis (list of component morphemes)

Most often, we are mapping from a "word" to some structured object. For example, a document index maps from a word (which we can represent as a string) to a list of pages (represented as a list of integers). In this section, we will see how to represent such mappings in Python.
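The document index described above, a mapping from a word (a string) to a list of page numbers (integers), might be built along the following lines. This is a sketch under stated assumptions: the `pages` data is invented for illustration, and each "page" is just a short string split on whitespace.

```python
# Hypothetical document: page number -> text on that page.
pages = {
    1: 'call me ishmael',
    2: 'some years ago never mind how long',
    3: 'call it what you will',
}

# Build the index: word (string) -> list of page numbers (integers).
index = {}
for page_number in sorted(pages):
    # Use a set so a word repeated on one page is recorded only once.
    for word in set(pages[page_number].split()):
        index.setdefault(word, []).append(page_number)

call_pages = index['call']
print(call_pages)        # pages 1 and 3 contain 'call'
print('whale' in index)  # words that never occur have no entry
```

Visiting the pages in sorted order keeps each list of page numbers in ascending order, as a reader would expect from a printed index.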
