generate_usernames.py

Imagine we are setting up a new computer system and need to generate usernames for all of our organization's staff. We have a plain text data file (UTF-8 encoding) where each line represents a record and fields are colon-delimited. Each record concerns one member of the staff and the fields are their unique staff ID, forename, middle name (which may be an empty field), surname, and department name. Here is an extract of a few lines from an example data/users.txt data file:

1601:Albert:Lukas:Montgomery:Legal
3702:Albert:Lukas:Montgomery:Sales
4730:Nadelle::Landale:Warehousing

The program must read in all the data files given on the command line, and for every line (record) must extract the fields and return the data with a suitable username. Each username must be unique and based on the person's name. The output must be text sent to the console, sorted alphabetically by surname and forename, for example:

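Name                               ID   Username
-------------------------------- ------ ---------
Landale, Nadelle................ (4730) nlandale
Montgomery, Albert L............ (1601) almontgo
Montgomery, Albert L............ (3702) almontgo1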

Each record has exactly five fields, and although we could refer to them by number, we prefer to use names to keep our code clear:

ID, FORENAME, MIDDLENAME, SURNAME, DEPARTMENT = range(5)

It is a Python convention that identifiers written in all uppercase characters are to be treated as constants.
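As a brief illustration of how these index constants read in practice, the following sketch unpacks range(5) into the five names and uses them to index one record from the extract shown earlier. Nothing here goes beyond the constants and sample data already introduced.

```python
# Unpack range(5) into five named index constants (0 through 4).
ID, FORENAME, MIDDLENAME, SURNAME, DEPARTMENT = range(5)

# One record from the sample extract, split into its five fields.
fields = "4730:Nadelle::Landale:Warehousing".split(":")

print(fields[SURNAME])            # Landale
print(repr(fields[MIDDLENAME]))   # '' (the empty middle name is still a field)
```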

We also need to create a named tuple type for holding the data on each user:

User = collections.namedtuple("User",
        "username forename middlename surname id")

We will see how the constants and the User named tuple are used when we look at the rest of the code.
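Before moving on, here is a small sketch of what the named tuple gives us. The field values are illustrative ones based on the first sample record; the point is only that items can be read by name as well as by position.

```python
import collections

User = collections.namedtuple("User",
        "username forename middlename surname id")

# Illustrative values based on the first record in the sample extract.
user = User("almontgo", "Albert", "Lukas", "Montgomery", "1601")

print(user.surname)    # Montgomery
print(user[3])         # Montgomery (positional access still works)
print(user._fields)    # the field names, in declaration order
```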

The program's overall logic is captured in the main() function:

def main():
    if len(sys.argv) == 1 or sys.argv[1] in {"-h", "--help"}:
        print("usage: {0} file1 [file2 [... fileN]]".format(
              sys.argv[0]))
        sys.exit()

    usernames = set()
    users = {}
    for filename in sys.argv[1:]:
        for line in open(filename, encoding="utf8"):
            line = line.rstrip()
            if line:
                user = process_line(line, usernames)
                users[(user.surname.lower(), user.forename.lower(),
                       user.id)] = user
    print_users(users)

If the user doesn't provide any filenames on the command line, or if they type "-h" or "--help" on the command line, we simply print a usage message and terminate the program.

For each line read, we strip off any trailing whitespace (e.g., \n) and process only nonempty lines. This means that if the data file contains blank lines they will be safely ignored.
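The effect of stripping and then testing each line can be seen in isolation with a short sketch. It uses io.StringIO as a stand-in for an open file (an assumption made only so the example runs without a data file); the two records come from the sample extract.

```python
import io

# A stand-in for an open data file: two records separated by an empty
# line and a whitespace-only line.
data = io.StringIO(
    "1601:Albert:Lukas:Montgomery:Legal\n"
    "\n"
    "   \n"
    "4730:Nadelle::Landale:Warehousing\n")

records = []
for line in data:
    line = line.rstrip()    # removes the trailing newline and any trailing spaces
    if line:                # an empty string is false, so blank lines are skipped
        records.append(line)

print(len(records))         # 2
```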

We keep track of all the allocated usernames in the usernames set to ensure that we don't create any duplicates. The data itself is held in the users dictionary, with each user (member of the staff) stored as a dictionary item whose key is a tuple of the user's surname, forename, and ID, and whose value is a named tuple of type User. Using a tuple of the user's surname, forename, and ID for the dictionary's keys means that if we call sorted() on the dictionary, the iterable returned will be in the order we want (i.e., surname, forename, ID), without us having to provide a key function.
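To see why no key function is needed, consider this minimal sketch. The dictionary values are plain username strings rather than User named tuples, purely to keep the example short; the keys have the same (surname, forename, ID) shape as in the program.

```python
# Keys are (surname, forename, ID) tuples, deliberately inserted out of order.
users = {
    ("montgomery", "albert", "3702"): "almontgo1",
    ("landale", "nadelle", "4730"): "nlandale",
    ("montgomery", "albert", "1601"): "almontgo",
}

# sorted() on a dictionary iterates over its keys; tuples compare item by
# item, so the order is by surname, then forename, then ID.
ordered = [users[key] for key in sorted(users)]
print(ordered)    # ['nlandale', 'almontgo', 'almontgo1']
```

Note that the IDs are strings, so they are compared character by character; for IDs that all have the same number of digits this gives the same order as a numeric comparison.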

def process_line(line, usernames):
    fields = line.split(":")
    username = generate_username(fields, usernames)
    user = User(username, fields[FORENAME], fields[MIDDLENAME],
                fields[SURNAME], fields[ID])
    return user

Since the data format for each record is so simple, and because we've already stripped the trailing whitespace from the line, we can extract the fields simply by splitting on the colons. We pass the fields and the usernames set to the generate_username() function, and then we create an instance of the User named tuple type which we then return to the caller (main()), which inserts the user into the users dictionary, ready for printing.

If we had not created suitable constants to hold the index positions, we would be reduced to using numeric indexes, for example:

user = User(username, fields[1], fields[2], fields[3], fields[0])

Although this is certainly shorter, it is poor practice. First, it isn't clear to future maintainers what each field is, and second, it is vulnerable to data file format changes: if the order or number of fields in a record changes, this code will break everywhere it is used. But by using named constants, in the face of changes to the record structure we would have to change only the values of the constants, and all uses of the constants would continue to work.
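A hypothetical format change makes the point concrete. Suppose (this layout is an assumption for illustration, not something the data file actually does) that the department were moved to the front of each record. Only the line defining the constants would need to change; the code that uses them would stay the same.

```python
# Hypothetical new layout: the department now comes first in each record.
DEPARTMENT, ID, FORENAME, MIDDLENAME, SURNAME = range(5)

fields = "Legal:1601:Albert:Lukas:Montgomery".split(":")

# This expression is written exactly as it would be for the original
# layout, yet it still picks out the right fields.
details = (fields[FORENAME], fields[MIDDLENAME], fields[SURNAME], fields[ID])
print(details)    # ('Albert', 'Lukas', 'Montgomery', '1601')
```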

def generate_username(fields, usernames):
    username = ((fields[FORENAME][0] + fields[MIDDLENAME][:1] +
                 fields[SURNAME]).replace("-", "").replace("'", ""))
    username = original_name = username[:8].lower()
    count = 1
    while username in usernames:
        username = "{0}{1}".format(original_name, count)
        count += 1
    usernames.add(username)
    return username

We make a first attempt at creating a username by concatenating the first letter of the forename, the first letter of the middle name, and the whole surname, and deleting any hyphens or single quotes from the resultant string. The code for getting the first letter of the middle name is quite subtle. If we had used fields[MIDDLENAME][0] we would get an IndexError exception for empty middle names. But by using a slice we get the first letter if there is one, or an empty string otherwise.
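The difference between indexing and slicing an empty string can be checked directly with a few lines. This is only a demonstration of the behavior described above, using an empty middle name like the one in the Landale record.

```python
middlename = ""           # an empty middle-name field

first = middlename[:1]    # slicing never raises; here it yields ""
print(repr(first))        # ''

try:
    middlename[0]         # indexing an empty string raises IndexError
    index_failed = False
except IndexError:
    index_failed = True
print(index_failed)       # True

print("Lukas"[:1])        # L
```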

Next we make the username lowercase and no more than eight characters long. If the username is in use (i.e., it is in the usernames set), we try the username with a "1" tacked on at the end, and if that is in use we try with a "2", and so on until we get one that isn't in use. Then we add the username to the set of usernames and return the username to the caller.
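Running the function on the three records from the sample extract shows the numbering at work. The sketch below restates the constants and the function so that it is self-contained; the second replace() call removes single quotes, following the description above.

```python
ID, FORENAME, MIDDLENAME, SURNAME, DEPARTMENT = range(5)

def generate_username(fields, usernames):
    # First initial + optional middle initial + surname, minus - and '.
    username = ((fields[FORENAME][0] + fields[MIDDLENAME][:1] +
                 fields[SURNAME]).replace("-", "").replace("'", ""))
    username = original_name = username[:8].lower()
    count = 1
    while username in usernames:    # try name1, name2, ... until unused
        username = "{0}{1}".format(original_name, count)
        count += 1
    usernames.add(username)
    return username

usernames = set()
records = ["1601:Albert:Lukas:Montgomery:Legal",
           "3702:Albert:Lukas:Montgomery:Sales",
           "4730:Nadelle::Landale:Warehousing"]
results = [generate_username(record.split(":"), usernames)
           for record in records]
print(results)    # ['almontgo', 'almontgo1', 'nlandale']
```

Notice that the numeric suffix is added after truncation, so a username that needed a suffix can be longer than eight characters ("almontgo1" has nine).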

def print_users(users):
    namewidth = 32
    usernamewidth = 9

    print("{0:<{nw}} {1:^6} {2:{uw}}".format(
          "Name", "ID", "Username", nw=namewidth, uw=usernamewidth))
    print("{0:-<{nw}} {0:-<6} {0:-<{uw}}".format(
          "", nw=namewidth, uw=usernamewidth))

    for key in sorted(users):
        user = users[key]
        initial = ""
        if user.middlename:
            initial = " " + user.middlename[0]
        name = "{0.surname}, {0.forename}{1}".format(user, initial)
        print("{0:.<{nw}} ({1.id:4}) {1.username:{uw}}".format(
              name, user, nw=namewidth, uw=usernamewidth))

Once all the records have been processed, the print_users() function is called, with the users dictionary passed as its parameter.

The first print() statement prints the column titles, and the second print() statement prints hyphens under each title. This second statement's str.format() call is slightly subtle. The string we give to be printed is "", that is, the empty string; we get the hyphens by printing the empty string padded with hyphens to the given widths.
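The two heading lines can be tried on their own. This sketch uses the same format strings and widths as print_users(), storing the results in variables (header and rule are names chosen here for illustration) before printing them.

```python
namewidth = 32
usernamewidth = 9

# Titles: left-aligned name, centered ID, left-aligned username.
header = "{0:<{nw}} {1:^6} {2:{uw}}".format(
         "Name", "ID", "Username", nw=namewidth, uw=usernamewidth)

# Rule: the empty string padded with "-" (fill character) to each width.
rule = "{0:-<{nw}} {0:-<6} {0:-<{uw}}".format(
       "", nw=namewidth, uw=usernamewidth)

print(header)
print(rule)
```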

Next we use a for ... in loop to print the details of each user, extracting the key for each user's dictionary item in sorted order. For convenience we create the user variable so that we don't have to keep writing users[key] throughout the rest of the function. In the loop's first call to str.format() we set the name variable to the user's name in surname, forename (and optional initial) form. We access items in the user named tuple by name. Once we have the user's name as a single string we print the user's details, constraining each column (name, ID, username) to the widths we want.
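The formatting of a single row can be sketched in isolation as follows. The user values are illustrative ones based on the first sample record, and row is a name introduced here so that the result can be inspected; the format strings are the ones used in the loop.

```python
import collections

User = collections.namedtuple("User",
        "username forename middlename surname id")
user = User("almontgo", "Albert", "Lukas", "Montgomery", "1601")

initial = ""
if user.middlename:
    initial = " " + user.middlename[0]

# {0.surname} and {0.forename} read attributes of the first argument.
name = "{0.surname}, {0.forename}{1}".format(user, initial)

# The name is padded with periods to 32 characters; the username is
# padded with spaces to 9 characters.
row = "{0:.<{nw}} ({1.id:4}) {1.username:{uw}}".format(
      name, user, nw=32, uw=9)
print(row)
```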

The complete program (which differs from what we have reviewed only in that it has some initial comment lines and some imports) is in generate_usernames.py. The program's structure (read in a data file, process each record, write output) is one that is very frequently used, and we will meet it again in the next example.
