Introductions

PyLadies

Welcome all!

We like you, and want to know your OS, skill level, and primary interest / goal related to Python (and us!). Please fill in the survey linked here or use this QR code:

http://svy.mk/18hTQTE

Please

Find people with a similar OS and interests — and different skill levels.



Quick look at survey results

link to survey results

Today is the best day
of your life so far

... because ...

Hallelujah! Python! PyLadies

Python

  • Is named after Monty Python, the favorite comedy troupe of Guido van Rossum (Python's creator)
  • Was first released in 1991 (Guido started writing it in late 1989)
  • Is now the second most frequently used language in business analytics, after SQL (2012 Berkeley paper)

Goals

If this is your first language, don't sweat it!

These slides will be here later. If all you get out of this is the ability to open a Python shell and understand these slides when you look at them again, you're doing great!

Make your own micro-goals and don't be intimidated.

P.S.

One of your goals should be meeting other awesome PyLadies — we are doing this together for a reason, so please talk and interact and exchange cards and make friends today!

Outline

Today is a three-hour workshop. We have to move fast so please help the people around you if you finish before them.

  • Operators and functions
  • Data and container types
  • Control structures
  • I/O, including basic web APIs
  • How to write and run a Python script

Unlike yesterday, we will try to stick together (I'll be talking in between).

Setup

We want a terminal open to run Python interactively, and Sublime Text (or other editor) open to edit code at the same time.

  • Open a terminal (or PowerShell) and change directories (cd) to inside the directory python_intro
  • Open Sublime Text within the context of python_intro by typing
    subl .
    
    at the terminal prompt
  • Go to Project > Save Project As... and save the project as "python_intro.sublime-project" in the same folder as the Git repo. The project file will not be added to your Git repo because the .gitignore file has an entry that ignores the sublime-project file extension

Right now you have:


  • A terminal open and you are in the directory python_intro that you cloned yesterday
  • An editor open from inside python_intro

Sublime projects

The concept of 'projects' in Sublime makes editing and searching more powerful. Projects, customizability, and a whole lot of keyboard shortcuts are among the major draws of Sublime (just trust me for today).

The Python interpreter

  • Python is an interpreted language
  • It is interpreted while it is run, not compiled before it is run
  • There is a Python shell that allows you to interact with the interpreter — in your terminal, just change directories to inside python_intro, and then type:

      python
    

    and the Python shell will open (please do it now)

In [41]:
# First, try Python as a calculator.
#
#    One at a time, in the interpreter please
1 + 1
3 / 4  # Caution about integer division!
3.0 / 4  # That's more like it
7**3   # use ** not ^ to raise to a power
Out[41]:
343
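
Here is a small sketch of the integer-division caution, written so it behaves the same on Python 2 and Python 3. The variable names are only for illustration. The idea: // always floors, and making one operand a float gives the fractional answer.

```python
# Floor division drops the fractional part on every Python version.
floored = 3 // 4

# Making one operand a float gives true division, even on Python 2.
exact = 3.0 / 4

# ** raises to a power; ^ is bitwise XOR, which is rarely what you want.
cubed = 7 ** 3
xored = 7 ^ 3

print(floored)
print(exact)
print(cubed)
print(xored)
```

If a result looks suspiciously round, check whether both operands were integers.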

Challenge for you

The arithmetic operators in Python are:

    +   -   *   /   **   %   //

Use the Python interpreter to calculate:

  • 16 times 26515
  • 1835 modulo 163
In [61]:
16 * 26515
1835 % 163
424240
42

More math requires the math module

In [56]:
import math

print "The square root of 3 is:", math.sqrt(3)
print "pi is:", math.pi
print "The sin of  90 degrees is:", math.sin(math.pi / 2)
The square root of 3 is: 1.73205080757
pi is: 3.14159265359
The sin of  90 degrees is: 1.0

  • The import statement imports the module into the namespace
  • Then access functions (or constants) by using:
        <module>.<function>
    
  • And get help on what is in the module by using:
      help(<module>)
    

Challenge for you

Hint: help(math) will show all the functions...

  • What is the arc cosine of 0.743144 in degrees?
In [66]:
from math import acos, degrees  # use 'from' sparingly

int(degrees(acos(0.743144)))  # 'int' to make an integer
Out[66]:
42

Math takeaways

  • Operators are what you think
  • Be careful of unintended integer math
  • the math module has the remaining functions

Strings

(Easier in Python than in any other language ever. Even Perl.)

Strings

Use help(str) to see available functions for string objects. For help on a particular function from the class, type the class name and the function name: help(str.join)

String operations are easy:

s = "foobar"

"bar" in s
s.find("bar")
index = s.find("bar")
s[:index]
s[index:] + "  this is intuitive! Hooray!"
s[-1]  # The last element in the list or string

Strings are immutable, meaning they cannot be modified, only copied or replaced. (This is related to memory use, and interesting for experienced programmers ... don't worry if you don't get what this means.)
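
If you are curious what "immutable" looks like in practice, here is a minimal sketch (the variable names are made up for illustration). Assigning to one character fails, while string methods hand back a new string and leave the original alone.

```python
s = "foobar"

# Item assignment is not allowed on a string.
try:
    s[0] = "g"
    changed_in_place = True
except TypeError:
    changed_in_place = False

# String methods return a new string; the original is untouched.
t = s.replace("foo", "goo")

print(changed_in_place)
print(s)
print(t)
```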

Challenge for you

Using only string addition (concatenation) and the function str.join, combine declaration and sayings:

declaration = "We are the knights who say:\n"
sayings = ['"icky"'] * 3 + ['"p\'tang"']
# the (\') escapes the quote

to a variable, sentence, that when printed does this:

>>> print sentence
We are the knights who say:
"icky", "icky", "icky", "p'tang"!
In [39]:
declaration = "We are the knights who say:\n"
sayings = ['"icky"'] * 3 + ['"p\'tang"']

sentence = declaration + ", ".join(sayings) + "!"
print sentence
print    # empty 'print' makes a newline

# By the way:
print " - ".join( ['ni'] * 20 )
print "\n".join("icky, icky, icky, p'tang!".split(", "))
We are the knights who say:
"icky", "icky", "icky", "p'tang"!

ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni
icky
icky
icky
p'tang!

String formatting

There are a bunch of ways to do string formatting:

  • C-style:
    "%s is: %.3f (or %d in Indiana)" % \
          ("Pi", math.pi, math.pi) 
    # Style notes:
    #     Line continuation with '\' works but
    #     is frowned upon. Indent twice
    #     (8 spaces) so it doesn't look
    #     like a control statement
    
  • New in Python 2.6, str.format doesn't require types:
    "{0} is: {1} ({1:.3} truncated)".format(
          "Pi", math.pi)
    # More style notes:
    #    Line continuation in square or curly
    #    braces or parentheses is better.
    
  • str.format also allows named fields:
    "{pi} is {pie:05.3}".format(
          pi="Pi", pie=math.pi)
    # Zero-padding and 5 total chars
    
In [42]:
# Try it out -- yeah!

print "%s is: %.3f (%d in Indiana)" % ("Pi", math.pi, math.pi)
print "{0} is: {1} ({1:.3} truncated)".format("Pi", math.pi)
print "{pi} is: {pie:05.3}".format(pi="Pi", pie=math.pi)
Pi is: 3.142 (3 in Indiana)
Pi is: 3.14159265359 (3.14 truncated)
Pi is: 03.14

String takeaways

  • str.split and str.join, plus the regex module (pattern matching tools for strings), make Python my language of choice for data manipulation
  • There are many ways to format a string
  • help(str) for more

Quick look at other types

In [ ]:
# I'll comment on these one by one while you try them:
x = True
x.__class__

# Lists can contain multiple types
x = [True, 1, 1.2, 'hi', [1], (1,2,3), {}, None]
x.__class__
# (the underscores are for special internal variables)

# List access
x[1]
x[0]  # Python is zero-indexed
x.append(set(["a", "b", "c"]))

for item in x:
    print item, "has class:", item.__class__

# If you need to check a type, using __class__
# is not considered very Pythonic. Instead do:
isinstance(x, list)
isinstance(x[1], bool)

Caveat

Lists, when copied, are copied by pointer. What that means is every symbol that points to a list, points to that same list.

Same with dictionaries and sets.

Example:

fifth_element = x[4]
fifth_element.append("Both!")
print fifth_element
print x

Why? The assignment (=) operator copies the pointer to the place on the computer where the list (or dictionary or set) is: it does not copy the actual contents of the whole object, just the address where the data is in the computer. This is efficient because the object could be megabytes big.
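
A minimal sketch of that pointer behavior, with made-up variable names. The is operator asks whether two names point to the very same object, and list(...) is one simple way to build a separate (shallow) duplicate.

```python
original = ["a", "b"]
alias = original            # copies the pointer only
duplicate = list(original)  # builds a new list with the same elements

alias.append("c")

print(original)               # changed, because alias is the same list
print(duplicate)              # unchanged, because it is a separate list
print(alias is original)
print(duplicate is original)
```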

To make a duplicate copy you must do it explicitly

The copy module

Example:

import copy

# -------------------- A shallow copy
x[4] = ["list"]
shallow_copy_of_x = copy.copy(x)
shallow_copy_of_x[0] = "Shallow copy"
fifth_element = x[4]
fifth_element.append("Both?")

def print_list(l):
    print "-" * 10
    for elem in l:
        print elem
    print


# look at them
print_list(shallow_copy_of_x)
print_list(x)
fifth_element
# -------------------- A deep copy

x[4] = ["list"]
deep_copy_of_x = copy.deepcopy(x)
deep_copy_of_x[0] = "Deep copy"
fifth_element = deep_copy_of_x[4]
fifth_element.append("Both?")

# look at them
print_list(deep_copy_of_x)
print_list(x)
fifth_element

Common atomic types

boolean   integer   float   string    None
True      42        42.0    "hello"   None

Common container types

list
  • Iterable
  • Mutable
  • No restriction on elements
  • Elements are ordered

tuple
  • Iterable
  • Immutable
  • No restriction on elements (but a tuple is hashable only if all of its elements are)
  • Elements are ordered

set
  • Iterable
  • Mutable
  • Elements are unique and must be hashable
  • Elements are not ordered

dictionary
  • Iterable
  • Mutable
  • Key, value pairs. Keys are unique and must be hashable
  • Keys are not ordered

Iterable

You can loop over it

Mutable

You can change it

Hashable

A hash function converts an object to a number that will always be the same for that object. Hashes help with identifying the object quickly. A better explanation kind of has to go into the guts of the code...
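
Without going into the guts, here is a rough way to poke at hashability from the interpreter. The helper is_hashable is a made-up name for this sketch, not something built into Python. It simply tries the built-in hash() and reports whether that worked.

```python
def is_hashable(obj):
    """Return True if the built-in hash() accepts obj."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable("hello"))   # strings are hashable
print(is_hashable((1, 2)))    # so is a tuple of hashable items
print(is_hashable([1, 2]))    # lists are not
print(is_hashable({}))        # dictionaries are not
print(is_hashable(([1],)))    # a tuple holding a list is not either

# Equal objects give equal hashes.
same_hash = hash((1, 2)) == hash((1, 2))
print(same_hash)
```

This is why lists, dictionaries, and sets cannot be set elements or dictionary keys.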

Container examples

List

  • To make a list, use square brackets.
    l = ["a", 0, [1, 2] ]
    l[1] = "second element"
    
  • Items in a list can be anything:
    sets, other lists, dictionaries, atoms
indices = range(len(l))
print indices


for i in indices:
    print l[i]
for x in l:
    print x

Tuple

To make a tuple, use parentheses.

t = ("a", 0, "tuple")
for x in t:
    print x

Set

To make a set, wrap a list with the function set().

  • Items in a set are unique
  • Lists, dictionaries, and sets cannot be in a set
s = set(['a', 0])
if 'b' in s:
    print "has b"

s.add("b")
s.remove("a")

l = [1,2,3]
try:
    s.add(l)
except TypeError:
    print "Could not add the list"
    #raise  # uncomment to raise error

Dictionary

To make a dictionary, use curly braces.

  • A dictionary is a set of key,value pairs where the keys are unique.
  • Lists, dictionaries, and sets cannot be dictionary keys
  • To iterate over a dictionary use iteritems
#   two ways to do the same thing
d = {"mother":"hamster",
     "father":"elderberries"}
d = dict(mother="hamster",
         father="elderberries")

for k, v in d.iteritems():
    print "key: ", k,
    print "val: ", v
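
A side note, as a small sketch: iteritems exists only in Python 2, while items works in both Python 2 and Python 3. The sorted call here is just an illustration choice so the pairs come out in a predictable order.

```python
d = {"mother": "hamster", "father": "elderberries"}

# items() gives the (key, value) pairs; sorted() orders them by key.
pairs = sorted(d.items())

for key, value in pairs:
    print("key: %s  val: %s" % (key, value))
```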

Type takeaways

  • Lists, tuples, dictionaries, sets all are base Python objects
  • Be careful of duck typing
  • Remember about copy / deepcopy
# For more information, use help(object)
help(tuple)
help(set)
help

Function definition and punctuation

Please open scratch.py in your editor to follow along in the next few slides.

The syntax for creating a function is:

def function_name(arg1, arg2, kwarg1=default1):
    """Docstring goes here -- triple quoted."""
    pass  # the 'pass' keyword means 'do nothing'


# The next unindented statement is outside
# of the function. Leave a blank line between the
# end of the function and the next statement.

The def keyword begins a function declaration and the colon finishes the signature. The body must be indented. There are no curly braces for function bodies in Python; white space at the beginning of a line has meaning.

Also, at the end of a function, leave at least one blank line to separate the thought from the next thing in the script.

Whitespace matters

The 'tab' character '\t' counts as one single character even if it looks like multiple characters in your editor.

But indentation is how you denote nesting!

So, this can seriously mess up your coding. The Python style guide recommends configuring your editor to make the tab keypress type four spaces.
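
To see why mixed tabs and spaces cause trouble, here is a tiny sketch you can paste into the interpreter (the variable names are made up). Two lines that look identical in an editor are different strings until the tab is expanded.

```python
tab_indented = "\tprint"
space_indented = "    print"

print(len("\t"))                                     # a tab is one character
print(tab_indented == space_indented)                # looks the same, is not
print(tab_indented.expandtabs(4) == space_indented)  # equal once expanded
```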

Please set the spacing for Python code in Sublime by going to Sublime Text > Preferences > Settings - More > Syntax Specific - User

It will open up the file Python.sublime-settings. Please put this inside, then save and close.

{
    "tab_size": 4,
    "translate_tabs_to_spaces": true
}

Duck typing

Python's philosophy for handling data types is called duck typing (If it walks like a duck, and quacks like a duck, it's a duck). Functions do no type checking — they happily process an argument until something breaks. This is great for fast coding but can sometimes make for odd errors. (This may change, per a recent suggestion)

Challenge for you

Modify scratch.py to have another function named greet_people that takes a list of people and greets them all one by one. Hint: you can call the function greet_person.

In [ ]:
def greet_people(list_of_people):
    for person in list_of_people:
        greet_person(person)

Quack quack

Be sure to save your code and then change back to your Python interactive session. Import the scratch module by typing:

import scratch as sc  # shorthand for 'scratch'

If you modify the module and need to reload it, use

reload(sc)

Then make a list of all of the people in your group and use your function to greet them:

people = ["King Arthur",
          "Sir Galahad",
          "Sir Robin"]
sc.greet_people(people)

# What do you think will happen if I do:
sc.greet_people("Tanya")

WTW?

Remember strings are iterable...

quack!
quack!
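
What happened there can be sketched like this. The greet_person below is a hypothetical stand-in (the real one lives in scratch.py and may look different), and this greet_people returns its greetings instead of printing them so they are easy to inspect.

```python
def greet_person(person):
    # Hypothetical stand-in for greet_person in scratch.py.
    return "Hello, " + person + "!"


def greet_people(list_of_people):
    # No type checking: anything you can loop over is accepted.
    return [greet_person(person) for person in list_of_people]


print(greet_people(["King Arthur", "Sir Robin"]))
print(greet_people("Tanya"))  # a string is iterable, one character at a time
```

Nothing broke, so Python never complained. The string simply walked and quacked like a list of one-letter names.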

Whitespace / duck typing takeaways

  • Indentation is how to denote nesting in Python
  • Do not use tabs; expand them to spaces
  • If it walks like a duck and quacks like a duck, it's a duck

Control structures

Common comparison operators

==                          equals
!=                          not equals
<= or <, >= or >            less or equal, etc.
x in (1, 2)                 works for sets, lists, tuples, dictionary keys, strings
x is None, x is not None    just for None

If statement

i = 1

if i is None:
    print "None!"
elif i % 2 == 0:
    print "even!"
else:
    print "Neither None nor even"

# ternary:
"Y" if i==1 else "N"

Advanced users, there is no switch statement in Python.

While loop

i = 0
while i < 3:
    print i
    i += 1

print i

For loop

for i in range(3):
    print i

for element in ("one", 2, "three"):
    print element

Challenge for you

Please look at this code and think of what will happen, then copy it and run it:

  • When will it stop?
  • What will it print out?
  • What will i be at the end?
for i in range(20):
    if i == 15:
        break
    elif i % 2 == 0:
        continue
    for j in range(5):
        print i + j,
    print  # newline
In [5]:
for i in range(20):
    if i == 15:
        break
    elif i % 2 == 0:
        continue
    for j in range(5):
        print i + j,
    print  # newline
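
If you want to check your answers, here is one way to trace the same loop. It is only a sketch: instead of printing, it collects each printed row in a list (rows is a made-up name) so you can inspect the result afterwards.

```python
rows = []
for i in range(20):
    if i == 15:
        break      # leaves the outer loop completely
    elif i % 2 == 0:
        continue   # skips even numbers and goes to the next i
    rows.append([i + j for j in range(5)])

print(len(rows))   # one row for each odd i below 15
print(rows[0])
print(rows[-1])
print(i)           # the loop variable keeps its last value
```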

File I/O

There are two ways to open a file handle in Python: using a with statement (Python 2.6+) and the standard way, a plain call to open. With the standard way, you must close the file handle yourself. Both options are shown below. The standard way is easier for interactive use.

Write to a file

# 'w' for write -- overwrites existing content
outfile = open("test_file.txt", "w")
outfile.write("This is a test.\n")

# Close the file handle yourself
outfile.close()

Append to a file

# 'a' for append -- appends to existing content
with open("test_file.txt", "a") as outfile:
    outfile.write("This is only a test.\n")

# The file handle automatically closes on exit
# of the 'with' statement
# (Advanced users: it's the C++ RAII philosophy)

Read from a file

# 'r' for read -- only reads existing content
with open("test_file.txt", "r") as infile:
    for line in infile:
        print "-".join(line.strip().split())

# -------------------------------------- #
# or (easier when in interactive mode):
infile = open("test_file.txt", "r")
for line in infile:
    print "-".join(line.strip().split())

# Close the file handle yourself
infile.close()

Challenge for you

What happens if you remove the strip()? Go ahead and try it... Why?

Hint: help(str.strip)
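
One way to explore this challenge without touching your real test_file.txt is sketched below. It writes the same two lines to a temporary folder (an assumption of this sketch, not part of the workshop files), reads them back, and compares the result with and without strip().

```python
import os
import shutil
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "test_file.txt")

with open(path, "w") as outfile:
    outfile.write("This is a test.\n")
with open(path, "a") as outfile:
    outfile.write("This is only a test.\n")

with open(path, "r") as infile:
    raw_lines = list(infile)   # each line still ends with "\n"

shutil.rmtree(folder)          # clean up the temporary folder

with_strip = ["-".join(line.strip().split()) for line in raw_lines]
without_strip = ["-".join(line.split()) for line in raw_lines]

print(raw_lines)
print(with_strip)
print(with_strip == without_strip)
```

With no arguments, split() already throws away leading and trailing whitespace, including the newline. strip() matters when you keep the line whole or split on a specific separator.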

Onward to web stuff

Scraping

Scraping means getting data directly from a site's web page. This is different from using an Application Programming Interface (API) that a provider makes available.

It is impolite to scrape too much because it takes up the provider's bandwidth...so just be polite!

Tools:

Scraping the list of Python base modules

The list of Python base modules is great for an introduction to scraping because the page is simple, and the hard part is figuring out from the site's HTML how to get what you want.

Getting the page is easy with the requests library.

Challenge for you

Look on the requests quickstart page to find out how to get the content from the given url:

import requests

url = "https://docs.python.org/2/py-modindex.html"
## what next?
In [3]:
import requests

url = "https://docs.python.org/2/py-modindex.html"
response = requests.get(url)
print response.text

Parsing the HTML

Try it! It's as easy as:

import requests
from bs4 import BeautifulSoup

url = "https://docs.python.org/2/py-modindex.html"
response = requests.get(url)
# The parse step
soup = BeautifulSoup(response.text, "html.parser")
# The step where you find the HTML elements 
modules = soup.find_all(
        "tt",
        attrs={"class":"xref"})
print "\n".join(m.text for m in modules)

Moving on

In the interest of time, this code is already in the script get_base_packages.py for later. Let's keep going.

U.S. census data

The U.S. actually puts lots of data online, and they have an API (Application Programming Interface) to get the data: you send a query in the correct format and they'll send a response with the data.

We are going to figure out salary data for the U.S. and for the Chicago region for a handful of jobs.

Script from scratch

We are going to start to write our code that pulls from the BLS API. Please open a new file, write this at the top, and save it inside the python_intro directory as get_bls_data.py:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
get_bls_data.py

This pulls data from the BLS
    - useful code:
      http://www.bls.gov/developers/api_python.htm
    - reference for series IDs:
      http://www.bls.gov/help/hlpforma.htm
"""
import json  # JSON to dictionary or vice versa
import requests  # for HTTP requests and much more
import tablib  # to output data

Also open the link to the BLS developers' page for Python

Double check


Your file should be in the directory python_intro/get_bls_data.py

Go over the BLS developer site's code

We will use tablib not prettytable because tablib is the new cool kid:

  • There are over 16,000 downloads per week
  • It's by Kenneth Reitz, the same guy who wrote the really great requests library
  • and prettytable hasn't really been changed in a few years

Lucky you

It's really great to have sample code.

No matter what, the approach would be the same for other APIs: you have to read the documentation or just navigate the JSON or XML response and then manipulate it to get the data you want out. That is always the hard part.

Please copy the sample code for the API Version 2 from the BLS site into your get_bls_data.py script.

You have to register here to use API Version 2. You want to because they'll cut off your access if you query too much with API Version 1. Please do it now.

They will send you a code via email. Open that email and click the 'verify' link before you go to the next slide. Keep the email on hand for later when we use the access key.

How to code

Python has a philosophy about how to code. You can see it if, in the Python interpreter, you type

import this

What you see is The Zen of Python. My favorite pearl of wisdom is Flat is better than nested, because modular code is easier to test.

If you code by writing small functions, you can test them in the interpreter as you go. That is what we'll do!

Challenge for you

Would you please write a function

def get_data(series_list):
    """Get results from the BLS API query.

    :param series_list: list of series IDs
    :returns: a dictionary parsed from the
              returned JSON string
    """
    pass  ## for you to do!

at the top of your get_bls_data.py file that contains the BLS sample code starting from the line with header and returning the json_data.

Be sure to preserve the original series ID list as a variable:

series_ids = ['CUUR0000SA0','SUUR0000SA0']

Note

The example code is missing the field for your registration key. Be sure to add it. Follow the pattern in the BLS API definition. (Use your own key, though. The one you got from the email.)

In [6]:
def get_data(series_list):
    """Get results from the BLS API query.

    :param series_list: list of series IDs
    :returns: a dictionary parsed from the returned JSON string.
    """
    headers = {'Content-type': 'application/json'}
    url = 'http://api.bls.gov/publicAPI/v2/timeseries/data/'
    data = json.dumps({
            "registrationKey": 'YOUR BLS API KEY HERE',
            "seriesid": series_list,
            "startyear": "2011",
            "endyear": "2014"})
    p = requests.post(url, data=data, headers=headers)
    return json.loads(p.text)



series_ids = ['CUUR0000SA0','SUUR0000SA0']
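
In the spirit of small testable functions, here is a sketch that checks the request body without contacting the BLS at all. build_payload is a hypothetical helper made up for this example. The field names are the ones used in get_data above.

```python
import json


def build_payload(series_list, start_year, end_year, key):
    """Hypothetical helper: build the JSON request body, but do not send it."""
    return json.dumps({
        "registrationKey": key,
        "seriesid": series_list,
        "startyear": str(start_year),
        "endyear": str(end_year)})


payload = build_payload(['CUUR0000SA0', 'SUUR0000SA0'], 2011, 2014,
                        'YOUR BLS API KEY HERE')

# json.loads turns the JSON string back into a dictionary.
decoded = json.loads(payload)
print(sorted(decoded.keys()))
print(decoded["seriesid"])
```

Checking the payload this way costs none of your query allowance.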

Keep the interpreter session current

Remember to copy and paste your get_data function definition into the interpreter...

Test the code

... by running it in the interpreter.

series_ids = ['CUUR0000SA0','SUUR0000SA0']
json_data = get_data(series_ids)

Also put the above lines in your get_bls_data.py script.

Right now the top of get_bls_data.py should look something like this:

In [ ]:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
get_bls_data.py

This pulls data from the BLS
    - useful code: http://www.bls.gov/developers/api_python.htm
    - reference for series IDs http://www.bls.gov/help/hlpforma.htm
"""
import json
import requests  # for HTTP requests and much more
import tablib  # to format data for output


#----------------------------------------------------- Setup ---#
def get_data(series_list):
    """Get results from the BLS API query.

    :param series_list: list of series IDs
    :returns: a dictionary parsed from the returned JSON string.
    """
    headers = {'Content-type': 'application/json'}
    url = 'http://api.bls.gov/publicAPI/v2/timeseries/data/'
    data = json.dumps({
            "registrationKey": 'YOUR BLS API KEY HERE',
            "seriesid": series_list,
            "startyear": "2011",
            "endyear": "2014"})
    p = requests.post(url, data=data, headers=headers)
    return json.loads(p.text)



series_ids = ['CUUR0000SA0','SUUR0000SA0']
json_data =  get_data(series_ids)

Typo

By the way there is a typo in the BLS code.

'if 'M01' <= period <= 'M12':'

should not have the quotes around it.



It should be:

if 'M01' <= period <= 'M12':

(and indented correctly)
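
If the chained comparison looks odd, here is a minimal sketch of what it does. Strings compare character by character, so the test keeps only codes from 'M01' through 'M12'. The other period codes in this list are made up for illustration and may not match what the BLS actually returns.

```python
# Sample period codes; anything outside M01..M12 is an assumption.
periods = ["M01", "M06", "M12", "M13", "Q01"]

monthly = [p for p in periods if 'M01' <= p <= 'M12']
print(monthly)
```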

Tablib

We need to convert from prettytable to tablib. The tablib quickstart is here. Three things need to change:

Creating the dataset

x=prettytable.PrettyTable([
        "series id", "year", "period",
        "value",  "footnotes"])

Adding rows

x.add_row([
        seriesId,year, period,
        value,footnotes[0:-1]])

Exporting

x.get_string()

Challenge for you

Convert from prettytable to tablib. You can do it!

In [12]:
# Your code should look like this:

for series in json_data['Results']['series']:
    x=tablib.Dataset(
            headers= ["series id","year",
                      "period","value","footnotes"])
    seriesId = series['seriesID']
    for item in series['data']:
        year = item['year']
        period = item['period']
        value = item['value']
        footnotes=""
        for footnote in item['footnotes']:
            if footnote:
                footnotes = footnotes + footnote['text'] + ','
        if 'M01' <= period <= 'M12':
            x.append([seriesId,year,period,value,footnotes[0:-1]])
    output = open(seriesId + '.csv', 'w')
    output.write(x.csv)
    output.close()
    output = open(seriesId + '.xlsx', 'wb')
    output.write(x.xlsx)
    output.close()

Data series selection

If you have time, you can pick a different set of data series to pull.

The help and tutorials page for the Bureau of Labor Statistics describes each dataset and how to indicate the series you want.

We will look at:

Lucky you

We pulled the descriptions for those three data series for you in bls_codes.json, in JSON format.

'JSON' is an acronym for JavaScript Object Notation. It represents data in key-value pairs. It looks similar to a Python dictionary (not too hard to read).
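
To see how close JSON is to a Python dictionary, here is a small sketch using the json module. The keys and values in this string are made up for illustration and are not the real contents of bls_codes.json.

```python
import json

# A made-up JSON string, not the real contents of bls_codes.json.
text = '{"prefix": "CE", "data_types": ["01", "02", "11"], "seasonal": null}'

codes = json.loads(text)        # JSON string -> Python dictionary
print(codes["prefix"])
print(codes["data_types"])
print(codes["seasonal"])        # JSON null becomes Python None

# Dumping and loading again gives back an equal dictionary.
round_trip = json.loads(json.dumps(codes))
print(round_trip == codes)
```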

Please open bls_codes.json in your editor.

Choose what data we want

We have to look at how to request data now, so change to bls_codes.json and we can look.

For each series, there is a series_id_format term that is written as a Python format string. Let's just focus on the "National Employment, Hours, and Earnings" series for now, and add this to the script:

id_format = ("{prefix}{seasonal_adjustment}"
             "{supersector_and_industry}"
             "{data_type}")
# This is an elegant way to make one long string.

Check it by copying this to the interpreter and then typing

id_format

at the interpreter prompt

Choose what data we want

Add these too, both to the script and in the interpreter:

series = ("National Employment, "
          "Hours, and Earnings")
prefix = "CE"
seasonal_adjustment = "U"
supersector_and_industry = "50511200"  #Software
data_type_list = ["01", "02", "11"]

Challenge for you

Fill in this loop to create a list of the series IDs:

series_list = []
for data_type in data_type_list:
    ### Your Code
    ### Hint:
    ###    you are appending to the series_list
    ###    you are formatting the id_format string
In [ ]:
series_list = []
for data_type in data_type_list:
    series_list.append(id_format.format(
        prefix=prefix,
        seasonal_adjustment=seasonal_adjustment,
        supersector_and_industry=supersector_and_industry,
        data_type=data_type))
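
For comparison, here is a sketch of the same loop written as a list comprehension. The dictionary named fields is an illustration choice, not part of the workshop script. It holds the values that stay the same for every series ID and is unpacked with **.

```python
id_format = ("{prefix}{seasonal_adjustment}"
             "{supersector_and_industry}"
             "{data_type}")

# The values that do not change between series IDs.
fields = {
    "prefix": "CE",
    "seasonal_adjustment": "U",
    "supersector_and_industry": "50511200",
}
data_type_list = ["01", "02", "11"]

series_list = [id_format.format(data_type=data_type, **fields)
               for data_type in data_type_list]
print(series_list)
```

Either version is fine. Pick whichever you find easier to read.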

Now run the loop in the interpreter

If there are no errors, paste this into both your script and the interpreter

json_data = get_data(series_list)

Check that it worked by just typing

json_data

in the interpreter

Hooray! You have new data! (Or ask for help...)

Try to run the whole file

See if everything runs by typing

python get_bls_data.py

in the PowerShell window / terminal (not the Python interpreter shell)

Congratulations!

You have written a script that pulls data from the Bureau of Labor Statistics API.

Great resources

In addition to the other PyLadies and, especially, your awesome TAs, here are some well-loved resources:

To sync your fork of the python_intro repo

Follow the commands in the GitHub 'sync a fork' article, which are copied below.

Open a PowerShell or terminal window, and change directories into python_intro. Then type:

git fetch upstream
git checkout master
git merge upstream/master