Python Tutorial for CS114

September 7, 2001

Please send questions or comments to Anna Rumshisky

Content of this tutorial:

Python at Brandeis

Python should be available on both the SGI and Linux workstations in the Berry Patch (slate, granite, capri, quartz, talc; psyche, calypso, circe, calliope, orestes, theseus, electra, alcestis, cleo, medea, antiope, penelope, iphitus, omphale). To run Python on any CS machine, type python at the shell prompt.

To use Python from xemacs:

  • Edit a file ending in .py to invoke Python mode.
  • C-c ! in Python mode invokes the Python interpreter as an inferior process of emacs.
  • C-c C-c in Python mode (in a .py buffer) executes the current buffer in the Python interpreter process.

Python Documentation and Tutorials

    The official Python language website is www.python.org.

    You can download a copy to your PC from http://www.python.org/download/

    The following Python tutorials are available on the web:

  • Python tutorial from the official www.python.org site.
  • Python 101 cheat sheet
  • Another minimal crash-course in Python
  • A good summary of features can be found in the cheatsheet that comes with the documentation (see /usr/doc/python-docs-1.5.2/cheatsheet)
  • The Python documentation, along with links to more tutorials, is available at http://www.python.org/doc/.

    You might find the Python Library Reference especially useful.

    A Quick Intro to Python

    Python is an interpreted language; the standard interpreter is implemented in C. The interpreter provides an interactive command-line interface.

    Invoking the interpreter

    You can invoke the interpreter by typing python at the shell prompt.

    To exit from the interpreter: C-d

    In order to access command line arguments, you need to import the appropriate variable from the module sys.

    Importing modules

    You can import standard modules or your own files. To import the whole module:

    
    >>> import sys
    >>> sys.argv
    [<filename>.py, <arg1>, <arg2>]
    

    To import a particular variable or function:

    
    >>> from sys import exit
    >>> exit()
    [yourhandle@cleo ~]$ _
    

    Basics

    Indentation after a colon is used to mark blocks:

    
    if a < 0:
        print "a is negative"
    else:
        print "a is non-negative"    
    

    Comments start with '#', as in Perl.

    Variables and Built-in Data Types

    Variables are just names for values. Types belong to the values, not to the variables, so no variable declarations are necessary.

    Numeric types are standard, except long integers have unlimited precision in Python:

    
    >>> a = 44444444444444444422222222222222222444444444444444444L
    >>> a * a
    1975308641975308640000000000000000020246913580246913530864197530864197599999999999999999802469135802469136L
    

    Python has the following built-in data types:

    Strings
    Lists
    Tuples
    Dictionaries

    The first three types are sequence types and are broadly similar to arrays in other programming languages. Strings are sequences of characters; tuples and lists are sequences of arbitrary values, so they can be nested. Strings and tuples are immutable: unlike in C, you cannot assign a new value to an individual element. Lists are mutable.
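A minimal sketch of the mutable/immutable distinction. The variable names are illustrative only, and the code sticks to syntax that also runs on later Python releases.

```python
# Lists are mutable: an element can be replaced in place.
items = [1, 2, 3]
items[0] = 99

# Strings and tuples are immutable: item assignment raises TypeError.
word = "spam"
pair = (1, 2)

rejected = []
for target in (word, pair):
    try:
        target[0] = "x"
    except TypeError:
        rejected.append(target)   # remember which objects refused the change

# To "change" a string, build a new one from pieces of the old one.
new_word = "S" + word[1:]
```

After this runs, items holds the new first element, both the string and the tuple end up in rejected, and new_word is a separate string; word itself is unchanged.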

    Truth Values and Boolean Operations

    The following values are false in Python:

    None
    zero of any numeric type
    any empty sequence or mapping

    All other values are true, except for instances of a user-defined class that defines a __nonzero__() or __len__() method, when that method returns zero.
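The rules above can be checked with a short sketch. Basket and truth are hypothetical names invented for this example; Basket relies only on __len__(), which is one of the two methods mentioned.

```python
class Basket:
    """Hypothetical container: it counts as false when it holds no items."""
    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)


def truth(value):
    # Report how 'value' behaves when used as a condition.
    if value:
        return "true"
    return "false"


# None, zeros, and empty sequences or mappings are all false.
falsy = [truth(None), truth(0), truth(0.0), truth(""), truth([]), truth({})]

# Non-empty values are true, even if their contents look "empty".
truthy = [truth("0"), truth([0]), truth(Basket([1]))]

# An instance whose __len__() returns zero is false.
empty_basket = truth(Basket([]))
```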

    Boolean operations: not, and, or

    
    >>> a = '' 
    >>> b = 32
    >>> a or b
    32
    >>> a or not b
    0
    

    For sequences, there is a membership testing operator in:

    
    >>> b = 32
    >>> b in [21, 32]
    1
    

    You can also compare sequence objects of the same type (using lexicographic order):

    
    >>> (1, 2, ('aa', 'ab')) < (1, 2, ('abc', 'a'), 4)
    1
    

    Control Flow

    All the standard loop and conditional constructs are present:
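A minimal sketch of the usual constructs (if/elif/else, for, while, break, continue). The function and variable names are made up for illustration.

```python
def classify(n):
    # if / elif / else
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


# for iterates over the items of a sequence
labels = []
for n in [-2, 0, 7]:
    labels.append(classify(n))

# while repeats as long as its condition is true;
# break leaves the loop, continue skips to the next iteration
total = 0
i = 0
while 1:
    i = i + 1
    if i > 10:
        break
    if i % 2:
        continue          # skip odd numbers
    total = total + i     # sum of the even numbers from 2 to 10
```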

    Functions

    Python allows default arguments:

    
    >>> def modsum(a, b, c = 10):
    ...     return (a+b)%c
    ... 
    >>> modsum(22, 33)
    5
    

    By the way, Python allows function objects to be returned or stored in variables:

    
    >>> def square(x): return x*x
    >>> def cube(x): return x*x*x
    >>> def compose(f1, f2):
    ...     return lambda x, f1=f1, f2=f2: f1(f2(x))
    >>> f = compose(square, cube)
    >>> f(2)
    64    
    

    Reading and writing files

    Printing in Python is easy:

    
    >>> x = 10 * 3.14
    >>> y = 200 * 200
    >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...'
    >>> print s
    The value of x is 31.400000000000002, and y is 40000...
    

    Note that writing `x` above is the same as repr(x), repr being a built-in function. So

    
    >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...'
    
    is the same as
    
    >>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...'
    
    The built-in function str(x) is similar, but it aims for a more readable result, so for floats it may print fewer digits.
    

    You can also build strings with standard C-style format strings:

    
    >>> import math
    >>> x = math.pi
    >>> a = '%f %f %f' % (x, x*x, x*x*x)
    >>> print a
    3.141593 9.869604 31.006277
    

    Some common commands for dealing with file objects:
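A minimal sketch of the most common file-object calls: open(), write(), readline(), read(), readlines() and close(). The scratch file name is an assumption made for the example, and the code is written so that it also runs on later Python releases.

```python
import os
import tempfile

# A scratch file for the example; the name itself is arbitrary.
path = os.path.join(tempfile.gettempdir(), "cs114_demo.txt")

f = open(path, "w")          # open for writing (creates or truncates)
f.write("first line\n")      # write() does not add a newline for you
f.write("second line\n")
f.close()

f = open(path, "r")          # open for reading
first = f.readline()         # one line, including its trailing newline
rest = f.read()              # everything that remains, as one string
f.close()

f = open(path)               # "r" is the default mode
lines = f.readlines()        # all lines, as a list of strings
f.close()

os.remove(path)              # clean up the scratch file
```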

    Some relevant modules

    string

    string.split(str, sep)
    
    >>> from string import split
    >>> split("There is no place like home")
    ['There', 'is', 'no', 'place', 'like', 'home']
    

    re

    Similar to regular expressions in Perl, etc.

    Using regular expressions in Python:

    Always compile regular expressions that you are going to use more than once; it saves time.

    So, instead of

    
    result = re.search("<pattern>", "<string>")
    
    use
    
    pat = re.compile("<pattern>")
    result = pat.search("<string>")
    
    Always use re.search() or pat.search(), unless you specifically want a match only at the start of the string, which is what match() gives you.
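A small sketch of the difference between search() and match(), using a compiled pattern as recommended above. The pattern and the sample text are chosen only for illustration.

```python
import re

pat = re.compile("home")
text = "There is no place like home"

found_anywhere = pat.search(text)   # scans the whole string for a match
found_at_start = pat.match(text)    # only tries to match at position 0

anywhere_span = found_anywhere.span()
at_start_is_none = found_at_start is None
```

Here search() finds "home" at the end of the string, while match() returns None because the string does not begin with "home".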

    (1) Methods

    
    compile("<pattern>")
      - useful flags are I or IGNORECASE - matching is case-insensitive
                         S or DOTALL     - dot also matches newlines
        # flags seem absent from 1.5.2 (!)
    
    search()
    match()
    findall() 
      - finds all nonoverlapping matches of pattern in a string
    finditer()          # seems absent from 1.5.2
      - returns an iterator over match objects for the same
        non-overlapping matches that findall() finds
    split(<pattern>, <string>)
      - returns list
    escape() 
      - escapes all non-alphanumerics in a string
    sub(pattern, repl, string[, count]) 
      - returns the string obtained by replacing the leftmost
        non-overlapping occurrences of pattern in string by the
        replacement repl. repl can be a string or a function;
        if it is a string, any backslash escapes in it are processed.
      - If repl is a function, it is called for every non-overlapping
        occurrence of pattern. The function takes a single match object
        argument, and returns the replacement string
    subn()
      - same as sub() but returns a tuple (new_string, number_of_subs_made)
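The module-level functions listed above can be tried out with a short sketch. The sample text, the patterns, and the helper shout are made up for illustration; the code is written so that it also runs on later Python releases.

```python
import re

text = "I haven't noticed, and they've left"

# findall: every non-overlapping match, as a list of strings
clitics = re.findall(r"n't|'ve|'ll|'re", text)

# split: break the string wherever the pattern matches
parts = re.split(r",\s*", text)

# sub with a string replacement; count limits the number of replacements
spaced = re.sub(r"\s+", "_", "a  b   c", count=1)

# sub with a function: it receives a match object and returns the replacement
def shout(m):
    return m.group(0).upper()

loud = re.sub(r"n't|'ve", shout, text)

# subn: like sub, but also reports how many replacements were made
result, count = re.subn("e", "E", "electra")

# escape: protect characters that would otherwise be special in a pattern
literal_dot = re.search(re.escape("a.b"), "axb a.b")
```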
    

    (2) Match Objects

    
    compile() returns a Regular Expression Object, with the same
              methods as the re module.
      
    search()  returns a Match Object, or None if there is no match.
    

    (3) Match Object methods and fields:

    
    group([group1, ...]) 
         Returns one or more subgroups of the match. If there is a single
         argument, the result is a single string; if there are multiple
         arguments, the result is a tuple with one item per argument.
    
    groups([default]) 
         Return a tuple containing all the subgroups of the match, from 1
         up to however many.
    
    span([group]) 
         For MatchObject m, return the 2-tuple (m.start(group), m.end(group)).
    
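         - group() and span() default their argument to 0, i.e. the
         whole match, rather than a specific group marked with '(' and
         ')' in the regular expression. The argument of groups() is
         instead the value reported for groups that did not take part
         in the match (None by default).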
    
    re 
      - Regular expression object whose match() or search() method
        produced this MatchObject instance.
    string
      - string on which searching was performed
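A short sketch of how the arguments of group(), groups() and span() behave. The pattern and the sample string are invented for this example; the third group is optional so that it can fail to take part in the match.

```python
import re

m = re.search(r"(\w+)@(\w+)(\.edu)?", "mail yourhandle@cleo now")

whole = m.group()            # same as m.group(0): the entire match
user_host = m.group(1, 2)    # several arguments give a tuple
all_groups = m.groups()      # group 3 did not take part, so it is None
filled = m.groups("")        # the argument replaces None for such groups
whole_span = m.span()        # (start, end) of the entire match
searched = m.string          # the string that was searched
```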
    

    Example:

    
    >>> m = re.search("(n't|'ve|'ll|'er|'em|'re)", "I haven't noticed")
    >>> m.groups()
    ("n't",)
    >>> m.group(1)
    "n't"
    >>> m.span(1)
    (6, 9)
    >>> m.string  
    "I haven't noticed"
    >>> m.re    
    
    >>> m.re.pattern
    "(n't|'ve|'ll|'er|'em|'re)"
    

    (4) Raw strings

    Python's raw string notation for regular expression patterns:

    
    r"<string>"
    r'<string>'
    

    Backslashes are treated as literal characters in raw strings, e.g. r"\n" is a two-character string containing a backslash followed by the letter n. REF: http://www.python.org/doc/current/lib/node99.html
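A minimal sketch of why raw strings are convenient for patterns. The pattern used here (a word-boundary test around "home") is just an example.

```python
import re

plain = "\n"      # one character: a newline
raw = r"\n"       # two characters: a backslash followed by 'n'

# Without raw strings, each backslash meant for the regular expression
# has to be doubled so that Python passes it through unchanged.
doubled = re.compile("\\bhome\\b")
rawpat = re.compile(r"\bhome\b")

same_pattern = doubled.pattern == rawpat.pattern
hit = rawpat.search("no place like home")
miss = rawpat.search("homework")
```

Both compile() calls receive exactly the same pattern text; the raw form is simply easier to read and harder to get wrong.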

    
    

    Classes

    Misc. remarks