Introduction to Python

From oldITPedia

Examples used in this tutorial

This presentation is composed mostly of beginner-friendly (and hopefully somewhat practical) sample code. We've only got an hour, so I'm glossing over a number of important topics, such as:

  • lists vs. tuples
  • sets and other useful data structures
  • how to define your own classes
  • keyword arguments
  • lambda functions
  • exceptions
  • string formatting
  • Python 3.0
  • list comprehensions (unless we have extra time)

The idea is to get as much working code in front of your eyes as possible, so you'll have something to hang your hats on when moving on to more advanced code.

Where to get it?

Here's the Python download page.

Python comes pre-installed on most UNIX-like systems, including OS X. If you're running Tiger, you'll want to download the newest version (Tiger shipped with an ancient and broken version of Python). Leopard users should be okay.

Here's the latest version for Windows.

Why Python?

It's clean, fast, elegant and portable. Here's what the Python folks have to say.

I personally like Python because it has simple idioms for working with files, strings, and lists. As an example, here are two programs, both of which do the same thing: read in a file and print out the first three characters of each line in the file.

First, in Java:

import java.io.*;
class TextTest {
  public static void main(String args[]) {
    String line;
    try {
      BufferedReader stdin =
        new BufferedReader(new InputStreamReader(System.in));
      while ((line = stdin.readLine()) != null) {
        if (line.length() >= 3) {
          System.out.println(line.substring(0, 3));
        }
        else {
          System.out.println(line);
        }
      }
    } 
    catch (IOException e) {
      System.err.println("Error: " + e);
    }
  }
}

The equivalent in Python:

import sys
for line in sys.stdin:
  # drop the trailing newline first, so short lines don't print an extra blank line
  print line.rstrip("\n")[0:3]

Python development frameworks and libraries

Among others:

  • Python for S60 (a mobile phone development platform from Nokia)
  • NodeBox, a Processing-like environment for visual programming (OS X only, unfortunately)
  • Django, "the web framework for perfectionists with deadlines" (Python's answer to Ruby on Rails)
  • Plone is an extensible CMS written in Python
  • Pyglet, an OpenFrameworks-esque library (see also Mirra)
  • PyGame, a framework for writing games in Python
  • Beautiful Soup, a handy library for extracting data from HTML (even poorly structured HTML)
  • Python Natural Language Toolkit, a collection of libraries for parsing and generating natural language

Tutorial

Getting the example code

If you're using a Mac, you can follow along thusly:

  • Open Terminal (in Applications > Utilities)
  • Type in the following (that's a capital O, not a zero):
cd ~/Desktop
mkdir python_examples
cd python_examples
curl -O http://static.decontextualize.com/itp/python_examples.zip
unzip python_examples

This will create a folder on your desktop with the Python examples inside. (Python files have a .py extension.) You can edit these files in any text editor (on OS X, drag them to TextEdit, or use TextWrangler; on Windows, try Notepad++). Keep your Terminal window open, though: you'll be using it to run your scripts.

Using the interactive interpreter

Mac users can simply type the following in Terminal:

python

Windows users may have to type something crazy, like:

C:\Python26\python.exe

This will start the Python interactive interpreter. You can type in Python code, and the program will interpret and evaluate your input. Try some simple arithmetic:

>>> 9 + 5
14
>>> 42 + (3 * 6)
60
>>> 9 / 4.0
2.25

Create variables and use them in arithmetic:

>>> foo = 9
>>> bar = 5
>>> baz = 1.1
>>> foo + bar - baz
12.9

Create a string variable using quotes (you can use single or double quotes):

>>> message = "python"
>>> message
'python'
>>> message + " is for lovers"
'python is for lovers'

Note that you don't have to give each variable a specific type when you create it: Python works out the type from the value you assign. Every value keeps its type, though, and Python won't silently convert one type into another. If you try to combine types in unexpected ways, Python will get angry with you.

>>> message + foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
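One way around this error is to do the conversion yourself with the built-in str() and int() functions. A minimal sketch, reusing the variable names from above (combined and total are names made up for this example):

```python
foo = 9
message = "python"

# str() turns the integer into a string, so + can concatenate
combined = message + " " + str(foo)   # 'python 9'

# int() goes the other way, so + can do arithmetic
total = foo + int("5")                # 14
```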

How do I exit the interactive interpreter?

On UNIX/Linux/OS X: Ctrl-D

On Windows: Ctrl-Z, then Enter

Everything is an object

In Python, everything is an object--including integers, floating-point numbers, strings, etc. You can ask Python what type a variable is like so:

>>> type(foo)
<type 'int'>
>>> type(baz)
<type 'float'>
>>> type(message)
<type 'str'>

Python lets you look inside of any object to see what methods and properties it supports. You can do this right from the interpreter, like so:

>>> dir(message)
[... 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', ...]

What you're seeing is a list (another built-in Python type!) of methods that you can call on string objects. Some of these are mysterious (those beginning and ending with double underscores), others are straightforward. If you want to know more about these methods, you can use "help":

>>> help(message.upper)
Help on built-in function upper:

upper(...)
   S.upper() -> string
   
   Return a copy of the string S converted to uppercase.

To call a method on an object, use a period and parentheses, just like you would in Java. Method parameters are separated by commas, like you'd expect.

>>> message.upper()
'PYTHON'
>>> message.center(24, '*')
'*********python*********'

More about strings

Python has a powerful syntax for indexing parts of strings. (You use the same syntax for lists, which we'll talk about below.) Some examples:

>>> message = "bungalow"
>>> message[3]
'g'
>>> message[1:6]
'ungal'
>>> message[:3]
'bun'
>>> message[2:]
'ngalow'
>>> message[-2]
'o'
>>> message[:]
'bungalow'
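Slices also accept negative positions and an optional third "step" value. A short sketch using the same string (every name other than message is just for illustration):

```python
message = "bungalow"

last_three = message[-3:]          # 'low': from three before the end, to the end
all_but_last = message[:-1]        # 'bungalo': everything except the final character
every_other = message[::2]         # 'bnao': every second character
backwards = message[::-1]          # 'wolagnub': a step of -1 walks the string in reverse
```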

Use the built-in function len (short for "length") to determine the length of the string:

>>> len(message)
8

Or the in keyword to check if a particular character (or a longer substring) occurs within a string:

>>> 'a' in message
True
>>> 'x' in message 
False
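The same check works for whole substrings, and the find() and replace() string methods cover two closely related jobs. A small sketch (the variable names on the left are made up for this example):

```python
message = "bungalow"

has_gal = "gal" in message                 # True: 'gal' appears inside 'bungalow'
has_cat = "cat" in message                 # False
position = message.find("gal")             # 3: the index where 'gal' starts
missing = message.find("cat")              # -1: find() returns -1 when there's no match
swapped = message.replace("low", "high")   # 'bungahigh': a new string; message is unchanged
```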

See the Python documentation for the other methods supported by string objects.

Lists

Lists in Python are like arrays in Processing, but much more powerful. Some code:

>>> parts = ['led', 'resistor', 'capacitor']
>>> len(parts)
3
>>> parts[1]
'resistor'
>>> parts.append('ultrasonic range finder')
>>> parts
['led', 'resistor', 'capacitor', 'ultrasonic range finder']
>>> parts[2:]
['capacitor', 'ultrasonic range finder']
>>> parts.sort()
>>> parts
['capacitor', 'led', 'resistor', 'ultrasonic range finder']
>>> parts.reverse()
>>> parts
['ultrasonic range finder', 'resistor', 'led', 'capacitor']
>>> 'led' in parts
True
>>> 'flex sensor' in parts
False
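A few more list operations that tend to come up quickly, as a sketch. Note the difference between the sort() method, which rearranges the list in place, and the built-in sorted() function, which leaves the original alone and hands back a new list:

```python
parts = ['led', 'resistor', 'capacitor']

# sorted() returns a new list; parts itself is untouched
alphabetical = sorted(parts)        # ['capacitor', 'led', 'resistor']

# insert() adds an item at a given position
parts.insert(0, 'flex sensor')      # ['flex sensor', 'led', 'resistor', 'capacitor']

# pop() removes the last item and returns it
last = parts.pop()                  # 'capacitor'

# index() tells you where an item lives
where = parts.index('resistor')     # 2
```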

Scripts and for loops

Typing commands directly into the interpreter is lots of fun (and very helpful when testing). But you'll mostly be running Python programs as scripts--files containing Python code. (Like a Processing sketch.) Here's how to do it from the command line (make sure you're in the same directory that you created earlier):

python words.py

The output should look like this:

Burr is a short word.
Symbologists is a long word.
Sawdusted is a long word.
Paramaecium is a long word.
Stags is a short word.
Untie is a short word.

Open words.py in your favorite text editor, and let's take a look.

# our very first python program... aw, so cute.

words = ['burr', 'symbologists', 'sawdusted', 'paramaecium', 'stags', 'untie']

for word in words:
  capitalized = word.title()
  if len(word) > 5:
    print capitalized + " is a long word."
  else:
    print capitalized + " is a short word."

Some things to note immediately:

  • Lines that begin with # are comments. (Python will ignore anything from # to the end of the line.)
  • You don't need to put a semicolon after every statement; you just put one statement on each line of the file.
  • Code blocks are indented--everything at the same indent level is part of the same block.
  • Anything inside brackets, such as a long list, can be broken up over multiple lines as-is; use a backslash (\) to break other long statements across multiple lines.
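To illustrate the last point, here is a sketch of both kinds of line break. The total_letters calculation is made up for this example; it isn't part of words.py.

```python
# inside brackets, a statement can simply continue on the next line
words = [
  'burr',
  'symbologists',
  'sawdusted',
]

# outside brackets, a backslash at the end of a line continues the statement
total_letters = len(words[0]) + \
  len(words[1]) + \
  len(words[2])

print(total_letters)  # the parentheses keep this line valid in Python 2 and 3
```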

Syntax of the for loop:

for temp_variable in list:
  statements

Syntax of if/elif/else:

if expression:
  statements
elif expression:
  statements
else:
  statements
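A concrete instance of this pattern, as a sketch. Note that elif takes its own expression, just like if does. The words come from words.py; the "very long" category and its cutoff of 10 letters are made up for illustration.

```python
words = ['burr', 'untie', 'sawdusted', 'symbologists']
labels = []

for word in words:
  if len(word) > 10:
    label = "very long"
  elif len(word) > 5:
    label = "long"
  else:
    label = "short"
  labels.append(label)
  print(word.title() + " is a " + label + " word.")
```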

The very same program as Processing code, to compare and contrast:

String[] words =
  {"burr", "symbologists", "sawdusted", "paramaecium", "stags", "untie"};

for (int i = 0; i < words.length; i++) {
  String word = words[i];
  String capitalized = word.substring(0, 1).toUpperCase() +
    word.substring(1, word.length());
  if (word.length() > 5) {
    println(capitalized + " is a long word.");
  }
  else {
    println(capitalized + " is a short word.");
  }
}

What if I just want to count from one to ten?

Use Python's built-in range() function, which returns a list containing numbers in the desired range:

>>> range(1,11)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

This script:

for i in range(1,6):
  print i

Will output:

1
2
3
4
5
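range() also takes an optional third argument, the step, which can be negative for counting down. A quick sketch; the list() wrapper is harmless in Python 2 and is needed in Python 3, where range() returns a lazy object instead of a list:

```python
evens = list(range(0, 11, 2))       # [0, 2, 4, 6, 8, 10]
countdown = list(range(5, 0, -1))   # [5, 4, 3, 2, 1]
first_five = list(range(5))         # [0, 1, 2, 3, 4]: with one argument, counting starts at zero
```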

Reading files

Reading files in Python is simple, because file objects in Python are iterable: you can use a for loop to look at every line in the file. Check out words2.py for an example:

file_object = open("words.txt")
short_total = 0
long_total = 0

for line in file_object:
  line = line.strip()
  capitalized = line.title()
  if len(capitalized) > 5:
    print capitalized + " is a long word."
    long_total += 1
  else:
    print capitalized + " is a short word."
    short_total += 1

print
print "Total number of short words: " + str(short_total)
print "Total number of long words: " + str(long_total)

Should output:

Halides is a long word.
Preconception is a long word.
Daft is a short word.
Snobbiest is a long word.
Dirts is a short word.
Tumescing is a long word.
Drearest is a long word.
Distracters is a long word.
Nightsides is a long word.
Helminthologic is a long word.
Shrow is a short word.
Aloofly is a long word.
Trophotropic is a long word.
Reattached is a long word.
Alberghi is a long word.
Muzaky is a long word.
Grimier is a long word.
Publicness is a long word.
Zygospore is a long word.
Muckspreader is a long word.

Total number of short words: 3
Total number of long words: 17

Notes:

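  • short_total and long_total need to be given a starting value before the loop, because += can only add to a variable that already exists. (Variable scope differs from Java here: a loop doesn't get a scope of its own, so anything assigned inside the loop is still around after the loop ends.)
  • line.strip() is called to remove whitespace from both ends of the line (otherwise we'd still have the newline character from the text file).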
  • We call the built-in function str() to convert the total integers to strings, so they can be concatenated with the rest of the text.
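The script above never closes the file itself, which is fine for a short script. In Python 2.6 and later, the with statement will close the file for you as soon as its block ends. A minimal sketch of the same counting loop; so that it runs on its own, it first writes a three-word words.txt into a temporary directory (that setup step is an assumption of this example, not part of words2.py):

```python
import os
import tempfile

# assumption: create a tiny stand-in for words.txt in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "words.txt")
with open(path, "w") as out:
  out.write("halides\ndaft\nshrow\n")

short_total = 0
long_total = 0

with open(path) as file_object:
  for line in file_object:
    word = line.strip()
    if len(word) > 5:
      long_total += 1
    else:
      short_total += 1
# the file has been closed by the time we get here

print("Short: " + str(short_total) + ", long: " + str(long_total))
```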

Dictionaries

The dictionary is a very powerful data structure. You can think of it as an array whose indexes are strings (or any other hashable object, such as a number) instead of just sequential numbers. In PHP, they're known as associative arrays and in Perl they're hashes; in Java, the Map interface (and classes like HashMap that implement it) does the same thing. Some sample code for the interactive interpreter:

>>> assoc = {'butter': 'flies', 'cheese': 'wheel', 'milk': 'expensive'}
>>> assoc['butter']
'flies'
>>> assoc['gelato'] = 'delicious'
>>> assoc.keys()  # the order of keys is arbitrary in Python 2, so yours may differ
['butter', 'cheese', 'milk', 'gelato']
>>> assoc.values()
['flies', 'wheel', 'expensive', 'delicious']
>>> 'milk' in assoc
True
>>> 'yogurt' in assoc
False
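Two more things you'll often want to do with a dictionary are walking through its keys in a predictable order and removing an entry. A sketch using the same dictionary (pairs and remaining are names made up for this example):

```python
assoc = {'butter': 'flies', 'cheese': 'wheel', 'milk': 'expensive'}

# sorted() gives us the keys in alphabetical order
pairs = []
for key in sorted(assoc):
  pairs.append(key + " -> " + assoc[key])

# del removes a key (and its value) from the dictionary
del assoc['milk']
remaining = len(assoc)   # 2
```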

One classic application of this data structure is to keep track of how many times a particular string occurs in a source text. Here's the listing for token_count.py, which prints out all words that occur more than 100 times in Shakespeare's sonnets:

tokens = dict()

for line in open("sonnets.txt"):
  line = line.strip()
  for token in line.split(" "):
    if token in tokens:
      tokens[token] += 1
    else:
      tokens[token] = 1

for key in tokens.keys():
  if tokens[key] > 100:
    print key + " occurs " + str(tokens[key]) + " times."

Notes:

  • The split() method of Python's string class returns a list of substrings, broken up with the given separator.
  • Unlike PHP and Perl, we have to assign a value to a key before we can add one to it, hence the if/else in the loop.
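If the if/else feels bulky, the dictionary's get() method offers a shortcut: it returns the value for a key, or a default of your choosing when the key is missing. A sketch of the same counting idea; a short sample string stands in for sonnets.txt here, and the cutoff is lowered from 100 to 1 to suit it:

```python
text = "to be or not to be that is the question"

tokens = dict()
for token in text.split(" "):
  # get() returns 0 the first time we see a token, so there's no need for if/else
  tokens[token] = tokens.get(token, 0) + 1

# collect the tokens that occur more than once, in alphabetical order
repeated = []
for key in sorted(tokens):
  if tokens[key] > 1:
    repeated.append(key)
    print(key + " occurs " + str(tokens[key]) + " times.")
```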

Using this code to generate a NY Times/Jonathan Harris-style word frequency bubble visualization in Processing is an exercise left to the reader.

Using libraries

Python comes with a wide range of useful libraries. These libraries provide functionality ranging from parsing XML to network sockets to GUIs. Full list here. In order to use these libraries, we need to tell Python to load the library and make its functions and classes available to our program. The way we do that is with import. The first library we'll use is Python's random library, which has some very useful functions for generating random numbers, choosing random items from lists, shuffling lists, etc. Examples from the interactive interpreter:

>>> import random
>>> dir(random) # gives us a list of everything defined in the library
[...'betavariate', 'choice', 'expovariate', 'gammavariate', 'gauss', 'getrandbits', 'getstate', ..., 'random', 'randrange', ...]
>>> random.randrange(10)
3
>>> random.choice(['foo', 'bar', 'baz'])
'baz'

As you can see, import random lets us use functions from the random module, but we have to type random. before those functions so Python knows where to look. If we want to use those functions without prefixing the name of the library, we can use an alternate syntax:

>>> from random import choice
>>> choice(['foo', 'bar', 'baz'])
'bar'

The choice function returns a random item from the given list. We'll be using it below.
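A few other functions from the random library, as a sketch. Calling random.seed() with a fixed number makes a run repeatable, which helps when testing; the seed value and the variable names here are arbitrary choices for this example. No output is shown because the results depend on the seed and the Python version.

```python
import random

random.seed(2008)  # arbitrary fixed seed, so repeated runs give the same results

parts = ['led', 'resistor', 'capacitor', 'flex sensor']

# shuffle() rearranges a list in place, so we shuffle a copy
shuffled = list(parts)
random.shuffle(shuffled)

pick = random.choice(parts)    # one random item from the list
roll = random.randint(1, 6)    # a random integer from 1 to 6, both ends included
```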

Functions

Here's the syntax for defining a function in Python:

def name_of_function(argument_list):
  statements
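As a small warm-up, here is a sketch that wraps the long/short test from words.py in a function. The name describe_word is made up for this example.

```python
def describe_word(word):
  capitalized = word.title()
  if len(word) > 5:
    return capitalized + " is a long word."
  return capitalized + " is a short word."

# call the function just like a built-in one
description = describe_word("burr")
print(description)
```

Once the function is defined, the body of the loop in words.py could shrink to a single call.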

Here's the listing of autopoet.py, which uses the random library mentioned above. This program opens a file, reads all of the words from it, and then generates a series of ten (ahem) poems, inserting a random conjunction between two random words. It defines a function for reading tokens from a given file. (You could potentially re-use this function in other programs.)

from random import choice

conjunctions = ['and', 'if', 'so', 'even if', 'not', 'but', 'therefore',
  'notwithstanding']

def get_tokens_from_file(filename):
  tokens = list()
  f = open(filename)
  for line in f:
    line = line.strip()
    for token in line.split(" "):
      tokens.append(token)
  return tokens

tokens = get_tokens_from_file("sowpods.txt")

for i in range(0, 10):
  print choice(tokens) + " " + choice(conjunctions) + " " + choice(tokens)

Sample output:

brassfounding therefore accoutred
sabering even if cashed
pudsier and stagecoach
mortalised and buboes
geometrical therefore skoaled
sluttiest therefore whirret
nemeses even if nutter
kidologists therefore overeaten
titivator therefore philibeg
pandation notwithstanding churchmanships

Notes:

  • Use the return keyword to return a value from a function. Note that you don't have to specify a return type.
  • The range built-in function is used here to generate a list of values from zero to nine. We're just using this to count, so we don't need to use the loop variable (i).

Parsing HTML with Beautiful Soup

Beautiful Soup is a Python library developed by Leonard Richardson. It makes parsing HTML files a snap--even if the HTML files in question are poorly structured. It's not included with Python by default; I've included a copy of the library in the .zip file for the drive-by. To use the library, it just needs to be in the same directory as the rest of your code.

Here's some sample code that uses the library. It loads the most recent ITP Projects page, finds all of the project titles, and prints out the titles that begin with a vowel:

from BeautifulSoup import BeautifulSoup
import urllib

url = "http://itp.nyu.edu/shows/spring2008/category/projects/"

# use urllib to get the page, then create BeautifulSoup object
page = urllib.urlopen(url).read()
soup = BeautifulSoup(page)

# search the soup object for td cells with class="cell title"
cells = soup('td', {'class': 'cell title'})

for cell in cells:
  anchors = cell('a')
  title = anchors[0].contents[0]
  title = title.strip()
  if title[0] in "aeiouAEIOU":
    print title

Notes:

  • We're using Python's urllib library to retrieve the web page. The urlopen() function returns a response object, and we can read the contents of the page we requested by calling read() on the response.
  • In Python, you (usually) create an object by "calling" its class as if it were a function.
  • Python objects can be "called" like functions as well. Here, the soup object takes two parameters: the first tells it which HTML tags to find, and the second filters those tags by attribute.
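One fragile spot in the loop above: title[0] raises an IndexError if a title turns out to be empty after strip(). A sketch of a guard against that, using a hand-written list of made-up titles in place of the scraped page so that it runs without Beautiful Soup or a network connection:

```python
# assumption: these titles are invented stand-ins for the scraped project titles
titles = ["Apple Cart", "", "  umbrella synth ", "Robot Band"]

vowel_titles = []
for title in titles:
  title = title.strip()
  # an empty string counts as false, so the check on title[0] is skipped for it
  if title and title[0] in "aeiouAEIOU":
    vowel_titles.append(title)
    print(title)
```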

List comprehensions

When you're programming computers, there frequently arises the need to create a list that is a copy of another list, except with certain elements modified or filtered. Normally, you'd write code following this pattern:

source = variable of type list
dest = list()
for item in source:
  if condition:
    dest.append(expression)

Python has a special syntactic structure called a list comprehension to condense this logic into one line:

dest = [expression for item in source if condition]
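To make the pattern concrete, here is a sketch that builds the same list both ways, using the word list from words.py (the names long_words and long_words_lc are made up for this example):

```python
words = ['burr', 'symbologists', 'sawdusted', 'paramaecium', 'stags', 'untie']

# the long way: loop, test, append
long_words = list()
for word in words:
  if len(word) > 5:
    long_words.append(word.title())

# the same thing as a list comprehension
long_words_lc = [word.title() for word in words if len(word) > 5]

print(long_words_lc == long_words)
```

The expression is word.title(), the condition is len(word) > 5, and both versions end up with the same three capitalized words.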

Here's our project parser from the previous section, redone to use list comprehensions:

from BeautifulSoup import BeautifulSoup
import urllib

url = "http://itp.nyu.edu/shows/spring2008/category/projects/"
page = urllib.urlopen(url).read()
soup = BeautifulSoup(page)

cells = soup('td', {'class': 'cell title'})
titles = [cell('a')[0].contents[0].strip() for cell in cells]
titles = [t for t in titles if t[0] in "aeiouAEIOU"]

for title in titles:
  print title

Arguably more expressive and readable.

Learning more
