Welcome all!
We like you, and want to know your OS, skill level, and primary interest / goal related to Python (and us!). Please fill in the survey linked here or use this QR code:
Find people with a similar OS and interests — and different skill levels.
If this is your first language, don't sweat it!
These slides will be here later...if all you get out of this is the ability to open a Python shell and understand these slides later when you look at them again, you're doing great!
Make your own micro-goals and don't be intimidated.
One of your goals should be meeting other awesome PyLadies — we are doing this together for a reason, so please talk and interact and exchange cards and make friends today!
Today is a three-hour workshop. We have to move fast so please help the people around you if you finish before them.
Unlike yesterday, we will try to stick together (I'll be talking in between).
We want a terminal open to run Python interactively, and Sublime Text (or other editor) open to edit code at the same time.
Change directories (cd) to inside the directory python_intro, then type
subl .
at the terminal prompt.
The concept of 'projects' in Sublime makes editing and searching more powerful. Projects, customizability, and a whole lot of keystroke shortcuts are the major draws of Sublime (just trust me for today).
There is a Python shell that allows you to interact with the interpreter — in your terminal, just change directories to inside python_intro, and then type:
python
and the Python shell will open (please do it now)
# First, try Python as a calculator.
#
# One at a time, in the interpreter please
1 + 1
3 / 4 # Caution about integer division!
3.0 / 4 # That's more like it
7**3 # use ** not ^ to raise to a power
343
The arithmetic operators in Python are:
+ - * / ** % //
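Here is a small sketch of what each operator does. The numbers are arbitrary (picked so they don't give away the exercise), and the lines are written so they behave the same in Python 2 and Python 3. Plain `/` on two integers is the one case that differs between versions, so a float operand is used for it here.

```python
# Arithmetic operators on arbitrary example values
total = 7 + 2        # addition
difference = 7 - 2   # subtraction
product = 7 * 2      # multiplication
quotient = 7.0 / 2   # division with a float operand keeps the fraction
power = 7 ** 2       # exponentiation
remainder = 7 % 2    # modulo: the remainder after division
floored = 7 // 2     # floor division: drops the fraction

print((total, difference, product, quotient, power, remainder, floored))
```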
Use the Python interpreter to calculate:
16 * 26515
1835 % 163
Answers: 424240 and 42
import math
print "The square root of 3 is:", math.sqrt(3)
print "pi is:", math.pi
print "The sin of 90 degrees is:", math.sin(math.pi / 2)
The square root of 3 is: 1.73205080757
pi is: 3.14159265359
The sin of 90 degrees is: 1.0
The import statement imports the module into the namespace.
Call a function from the module with <module>.<function>.
For help on a module, type help(<module>).
What is the arc cosine of 0.743144 in degrees? Hint: help(math) will show all the functions...
from math import acos, degrees # use 'from' sparingly
int(degrees(acos(0.743144))) # 'int' to make an integer
42
The math module has the remaining functions.
Strings (Easier in Python than in any other language ever. Even Perl.)
Use help(str) to see available functions for string objects. For help on a particular function from the class, type the class name and the function name: help(str.join)
String operations are easy:
s = "foobar"
"bar" in s
s.find("bar")
index = s.find("bar")
s[:index]
s[index:] + " this is intuitive! Hooray!"
s[-1] # The last element in the list or string
Strings are immutable, meaning they cannot be modified, only copied or replaced. (This is related to memory use, and interesting for experienced programmers ... don't worry if you don't get what this means.)
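If you want to see immutability in action, here is a minimal sketch. The variable names are made up for illustration. Changing a character in place fails, while building a new string works and leaves the original alone.

```python
s = "foobar"

# Item assignment is not allowed on a string
try:
    s[0] = "g"
    changed_in_place = True
except TypeError:
    changed_in_place = False

# Build a new string instead; the original is untouched
t = "g" + s[1:]
u = s.replace("foo", "crow")

print(changed_in_place)
print(s)
print(t)
print(u)
```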
Using only string addition (concatenation) and the function str.join, combine declaration and sayings:
declaration = "We are the knights who say:\n"
sayings = ['"icky"'] * 3 + ['"p\'tang"']
# the (\') escapes the quote
to a variable, sentence, that when printed does this:
>>> print sentence
We are the knights who say:
"icky", "icky", "icky", "p'tang"!
declaration = "We are now the knights who say:\n"
sayings = ['"icky"'] * 3 + ['"p\'tang"']
sentence = declaration + ", ".join(sayings) + "!"
print sentence
print # empty 'print' makes a newline
# By the way:
print " - ".join( ['ni'] * 20 )
print "\n".join("icky, icky, icky, p'tang!".split(", "))
We are now the knights who say:
"icky", "icky", "icky", "p'tang"!

ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni - ni
icky
icky
icky
p'tang!
There are a bunch of ways to do string formatting:
"%s is: %.3f (or %d in Indiana)" % \
("Pi", math.pi, math.pi)
# Style notes:
# Line continuation with '\' works but
# is frowned upon. Indent twice
# (8 spaces) so it doesn't look
# like a control statement
str.format doesn't require types:
"{0} is: {1} ({1:.3} truncated)".format(
        "Pi", math.pi)
# More style notes:
# Line continuation in square or curly
# braces or parenthesis is better.
"{pi} is {pie:05.3}".format(
pi="Pi", pie=math.pi)
# Zero-padding and 5 total chars
# Try it out -- yeah!
print "%s is: %.3f (%d in Indiana)" % ("Pi", math.pi, math.pi)
print "{0} is: {1} ({1:.3} truncated)".format("Pi", math.pi)
print "{pi} is: {pie:05.3}".format(pi="Pi", pie=math.pi)
Pi is: 3.142 (3 in Indiana)
Pi is: 3.14159265359 (3.14 truncated)
Pi is: 03.14
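One detail that is easy to miss in the output above: a precision with no type letter (like .3) counts significant digits, while a precision followed by f counts digits after the decimal point. A short sketch, with variable names chosen just for this illustration:

```python
import math

# %-style: .3f means three digits after the decimal point
old_style = "%s is: %.3f" % ("Pi", math.pi)

# str.format with the same 'f' type gives the same text
new_style = "{0} is: {1:.3f}".format("Pi", math.pi)

# No type letter: .3 means three significant digits,
# and 05 zero-pads the result to five characters
padded = "{0:05.3}".format(math.pi)

print(old_style)
print(new_style)
print(padded)
```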
str.split and str.join, plus the regex module (pattern matching tools for strings), make Python my language of choice for data manipulation.
See help(str) for more.
# I'll comment on these one by one while you try them:
x = True
x.__class__
# Lists can contain multiple types
x = [True, 1, 1.2, 'hi', [1], (1,2,3), {}, None]
x.__class__
# (the underscores are for special internal variables)
# List access
x[1]
x[0] # Python is zero-indexed
x.append(set(["a", "b", "c"]))
for item in x:
    print item, "has class:", item.__class__
# If you need to check a type, using __class__
# is not considered very Pythonic. Instead do:
isinstance(x, list)
isinstance(x[1], bool)
Lists, when assigned to a new name, are copied by pointer. That means every name that points to a list points to that same list.
The same is true of dictionaries and sets.
fifth_element = x[4]
fifth_element.append("Both!")
print fifth_element
print x
Why? The assignment (=) operator copies the pointer to the place on the computer where the list (or dictionary or set) is: it does not copy the actual contents of the whole object, just the address where the data is in the computer. This is efficient because the object could be megabytes big.
Example:
import copy
# -------------------- A shallow copy
x[4] = ["list"]
shallow_copy_of_x = copy.copy(x)
shallow_copy_of_x[0] = "Shallow copy"
fifth_element = x[4]
fifth_element.append("Both?")
def print_list(l):
    print "-" * 10
    for elem in l:
        print elem
    print
# look at them
print_list(shallow_copy_of_x)
print_list(x)
fifth_element
# -------------------- A deep copy
x[4] = ["list"]
deep_copy_of_x = copy.deepcopy(x)
deep_copy_of_x[0] = "Deep copy"
fifth_element = deep_copy_of_x[4]
fifth_element.append("Both?")
# look at them
print_list(deep_copy_of_x)
print_list(x)
fifth_element
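To check your understanding without reading long printouts, here is a compact sketch of the same idea. The names (original, alias, shallow, deep) are invented for this example. It changes the original after copying and then looks at which copies noticed.

```python
import copy

original = [1, ["nested"]]
alias = original                 # a second name for the same list
shallow = copy.copy(original)    # new outer list, shared inner list
deep = copy.deepcopy(original)   # new outer list and new inner list

original[1].append("changed")    # modify the inner list
original[0] = 99                 # replace an item in the outer list

print(alias is original)   # the alias is the very same object
print(alias[0])            # so it sees the replaced item
print(shallow[0])          # the shallow copy kept its own outer list
print(shallow[1])          # but shares the inner list
print(deep[1])             # the deep copy shares nothing
```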
boolean | integer | float | string | None |
True | 42 | 42.0 | "hello" | None |
                     | list | tuple | set | dictionary |
You can loop over it | yes  | yes   | yes | yes        |
You can change it    | yes  | no    | yes | yes        |
A hash function converts an object to a number that will always be the same for that object. Sets and dictionaries use the number to find an object quickly, which is why their members and keys must be hashable. A better explanation kind of has to go into the guts of the code...
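You can poke at hashing with the built-in hash() function. This is only a sketch to build intuition; the variable names are made up. It shows that equal strings hash to the same number in one session, that tuples can be hashed, and that lists cannot.

```python
# Equal objects give the same hash within one Python session
same_hash = hash("spam") == hash("spam")

# Tuples of hashable items are hashable too
tuple_hash_is_number = isinstance(hash((1, 2)), int)

# Lists can change, so Python refuses to hash them
try:
    hash([1, 2])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

print(same_hash)
print(tuple_hash_is_number)
print(list_is_hashable)
```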
l = ["a", 0, [1, 2] ]
l[1] = "second element"
indices = range(len(l))
print indices
for i in indices:
    print l[i]
for x in l:
    print x
To make a tuple, use parentheses.
t = ("a", 0, "tuple")
for x in t:
    print x
To make a set, wrap a list with the function set().
s = set(['a', 0])
if 'b' in s:
    print "has b"
s.add("b")
s.remove("a")
l = [1,2,3]
try:
    s.add(l)
except TypeError:
    print "Could not add the list"
    #raise # uncomment to raise error
To make a dictionary, use curly braces. Loop over its keys and values with iteritems.
# two ways to do the same thing
d = {"mother":"hamster",
     "father":"elderberries"}
d = dict(mother="hamster",
         father="elderberries")
for k, v in d.iteritems():
    print "key: ", k,
    print "val: ", v
# For more information, use help(object)
help(tuple)
help(set)
help
Please open scratch → scratch.py in your editor to follow along in the next few slides.
The syntax for creating a function is:
def function_name(arg1, arg2, kwarg1=default1):
    """Docstring goes here -- triple quoted."""
    pass # the 'pass' keyword means 'do nothing'
# The next unindented statement is outside
# of the function. Leave a blank line between the
# end of the function and the next statement.
The def keyword begins a function declaration and the colon finishes the signature. The body must be indented. There are no curly braces for function bodies in Python: white space at the beginning of a line has meaning.
Also, at the end of a function, leave at least one blank line to separate the thought from the next thing in the script.
The 'tab' character '\t' counts as one single character even if it looks like multiple characters in your editor.
But indentation is how you denote nesting!
So, this can seriously mess up your coding. The Python style guide recommends configuring your editor to make the tab keypress type four spaces.
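A tiny sketch of why tabs are sneaky: the two lines below can look identical in an editor, but Python sees different characters. The variable names are just for this illustration.

```python
tab_line = "\tprint"       # one tab, then the word
space_line = "    print"   # four spaces, then the word

# The tab is a single character, so the strings differ in length
tab_length = len(tab_line)
space_length = len(space_line)

# expandtabs(4) shows what the editor setting will type for you
converted = tab_line.expandtabs(4)

print(tab_length)
print(space_length)
print(converted == space_line)
```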
Please set the spacing for Python code in Sublime by going to Sublime Text → Preferences → Settings - More → Syntax Specific - User
It will open up the file Python.sublime-settings. Please put this inside, then save and close.
{
"tab_size": 4,
"translate_tabs_to_spaces": true
}
Python's philosophy for handling data types is called duck typing (If it walks like a duck, and quacks like a duck, it's a duck). Functions do no type checking — they happily process an argument until something breaks. This is great for fast coding but can sometimes make for odd errors. (This may change, per a recent suggestion)
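Here is duck typing in a few lines. The function total_length is a made-up example, not part of scratch.py. It never checks types: anything that works with len() is welcome, and the first thing that doesn't is where it breaks.

```python
def total_length(things):
    """Add up len() of every item; no type checks are made."""
    total = 0
    for thing in things:
        total += len(thing)
    return total

# A string, a list, a dictionary and a tuple all "quack" with len()
mixed = ["spam", [1, 2, 3], {"a": 1}, (0,)]
result = total_length(mixed)
print(result)

# An integer has no len(), so the error only shows up at that point
try:
    total_length([1, 2])
    failed = False
except TypeError:
    failed = True
print(failed)
```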
Modify scratch.py to have another function named greet_people that takes a list of people and greets them all one by one. Hint: you can call the function greet_person.
def greet_people(list_of_people):
    for person in list_of_people:
        greet_person(person)
Be sure to save your code and then change back to your Python interactive session. Import the scratch module by typing:
import scratch as sc # shorthand for 'scratch'
If you modify the module and need to reload it, use
reload(sc)
Then make a list of all of the people in your group and use your function to greet them:
people = ["King Arthur",
"Sir Galahad",
"Sir Robin"]
sc.greet_people(people)
# What do you think will happen if I do:
sc.greet_people("Tanya")
Remember strings are iterable...
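To see what that means, here is a sketch with stand-in functions. This greet_person is hypothetical (the real one lives in scratch.py), and both functions return text instead of printing so the result is easy to inspect.

```python
def greet_person(person):
    # Hypothetical stand-in for the function in scratch.py
    return "Hello, " + person + "!"

def greet_people(list_of_people):
    # Returns the greetings as a list so we can look at them
    return [greet_person(person) for person in list_of_people]

# A list with one name gives one greeting
as_list = greet_people(["Tanya"])

# A bare string is looped over one character at a time
as_string = greet_people("Tanya")

print(as_list)
print(as_string)
```

Wrapping a single name in a list, as in the first call, is the usual way around this.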
== | != | <= or <, >= or > | x in (1, 2) | x is None, x is not None |
equals | not equals | less or equal, etc. | works for sets, lists, tuples, dictionary keys, strings | just for None |
i = 1
if i is None:
    print "None!"
elif i % 2 == 0:
    print "even!"
else:
    print "Neither None nor even"
# ternary:
"Y" if i==1 else "N"
Advanced users, there is no switch statement in Python.
i = 0
while i < 3:
    print i
    i += 1

print i
for i in range(3):
    print i

for element in ("one", 2, "three"):
    print element
Please look at this code and think of what will happen, then copy it and run it. What will i be at the end?
for i in range(20):
    if i == 15:
        break
    elif i % 2 == 0:
        continue
    for j in range(5):
        print i + j,
    print # newline
There are two ways to open a file handle in Python: using a with statement (Python 2.7+) and the standard way, a plain call to open(). With the standard way, you must close the file handle yourself. Both options are shown below. The standard way is easier for interactive use.
# 'w' for write -- overwrites existing content
outfile = open("test_file.txt", "w")
outfile.write("This is a test.\n")
# Close the file handle yourself
outfile.close()
# 'a' for append -- appends to existing content
with open("test_file.txt", "a") as outfile:
    outfile.write("This is only a test.\n")
# The file handle automatically closes on exit
# of the 'with' statement
# (Advanced users: it's the C++ RAII philosophy)
# 'r' for read -- only reads existing content
with open("test_file.txt", "r") as infile:
    for line in infile:
        print "-".join(line.strip().split())
# -------------------------------------- #
# or (easier when in interactive mode):
infile = open("test_file.txt", "r")
for line in infile:
    print "-".join(line.strip().split())
# Close the file handle yourself
infile.close()
What happens if you remove the strip()? Go ahead and try it... Why?
Hint: help(str.strip)
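For the curious, here is a rough sketch of what the with statement saves you from writing. It uses a temporary directory (an assumption made so the example doesn't touch your python_intro files) and a try/finally block, which is one common way to make sure close() runs even if the write fails.

```python
import os
import tempfile

# A throwaway location, chosen only for this example
path = os.path.join(tempfile.mkdtemp(), "test_file.txt")

# The standard way, made safe: 'finally' always runs close()
outfile = open(path, "w")
try:
    outfile.write("This is a test.\n")
finally:
    outfile.close()

# The 'with' way: closing happens automatically on exit
with open(path, "a") as outfile:
    outfile.write("This is only a test.\n")

with open(path, "r") as infile:
    content = infile.read()

print(outfile.closed)
print(infile.closed)
print(content)
```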
Scraping means getting data directly from a site's web page. This is different from using an Application Programming Interface (API) that a provider makes available.
It is impolite to scrape too much because it takes up the provider's bandwidth...so just be polite!
Tools: the requests library (to get the page) and BeautifulSoup (to parse its HTML).
The list of Python base modules is great for an introduction to scraping because the page is simple, and the hard part is figuring out from the site's HTML how to get what you want.
Getting the page is easy with the requests library.
Look on the requests quickstart page to find out how to get the content from the given url:
import requests
url = "https://docs.python.org/2/py-modindex.html"
## what next?
import requests
url = "https://docs.python.org/2/py-modindex.html"
response = requests.get(url)
print response.text
Try it! It's as easy as:
import requests
from bs4 import BeautifulSoup
url = "https://docs.python.org/2/py-modindex.html"
response = requests.get(url)
# The parse step (naming the parser avoids a bs4 warning)
soup = BeautifulSoup(response.text, "html.parser")
# The step where you find the HTML elements
modules = soup.find_all(
        "tt",
        attrs={"class":"xref"})
print "\n".join(m.text for m in modules)
In the interest of time, this code is already in the script get_base_packages.py for later. Let's keep going.
The U.S. actually puts lots of data online, and they have an API (Application Programming Interface) to get the data: you send a query in the correct format and they'll send a response with the data.
We are going to figure out salary data for the U.S. and for the Chicago region for a handful of jobs.
We are going to start to write our code that pulls from the BLS API. Please open a new file, write this at the top, and save it inside the python_intro directory as get_bls_data.py:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
get_bls_data.py
This pulls data from the BLS
- useful code:
http://www.bls.gov/developers/api_python.htm
- reference for series IDs:
http://www.bls.gov/help/hlpforma.htm
"""
import json # JSON to dictionary or vice versa
import requests # for HTTP requests and much more
import tablib # to output data
Also open the link to the BLS developers' page for Python.
We will use tablib, not prettytable, because tablib is the new cool kid.
It's really great to have sample code.
No matter what, the approach would be the same for other APIs: you have to read the documentation or just navigate the JSON or XML response and then manipulate it to get the data you want out. That is always the hard part.
Please copy the sample code for the API Version 2 from the BLS site into your get_bls_data.py script.
You have to register here to use API Version 2. You want to because they'll cut off your access if you query too much with API Version 1. Please do it now.
They will send you a code via email. Open that email and click the 'verify' link before you go to the next slide. Keep the email on hand for later when we use the access key.
Python has a philosophy about how to code. You can see it if, in the Python interpreter, you type
import this
What you see is The Zen of Python. My favorite pearl of wisdom is "Flat is better than nested", because modular code is easier to test.
If you code by writing small functions, you can test them in the interpreter as you go. That is what we'll do!
Would you please write a function
def get_data(series_list):
    """Get results from the BLS API query.
    :param series_list: list of series IDs
    :returns: a dictionary parsed from the
        returned JSON string
    """
    pass ## for you to do!
at the top of your get_bls_data.py file that contains the BLS sample code starting from the line with header and returning the json_data.
Be sure to preserve the original series ID list as a variable:
series_ids = ['CUUR0000SA0','SUUR0000SA0']
The example code is missing the field for your registration key. Be sure to add it. Follow the pattern in the BLS API definition. (Use your own key, though. The one you got from the email.)
def get_data(series_list):
    """Get results from the BLS API query.
    :param series_list: list of series IDs
    :returns: a dictionary parsed from the returned JSON string.
    """
    headers = {'Content-type': 'application/json'}
    # API Version 2 is the one that accepts a registration key
    url = 'http://api.bls.gov/publicAPI/v2/timeseries/data/'
    data = json.dumps({
            "registrationKey": 'YOUR BLS API KEY HERE',
            "seriesid": series_list,
            "startyear": "2011",
            "endyear": "2014"})
    p = requests.post(url, data=data, headers=headers)
    return json.loads(p.text)

series_ids = ['CUUR0000SA0','SUUR0000SA0']
Remember to copy and paste your get_data function definition into the interpreter, then test it by running it in the interpreter:
series_ids = ['CUUR0000SA0','SUUR0000SA0']
json_data = get_data(series_ids)
Also put the above line in your get_bls_data.py script.
Right now the top of get_bls_data.py should look something like this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
get_bls_data.py
This pulls data from the BLS
- useful code: http://www.bls.gov/developers/api_python.htm
- reference for series IDs http://www.bls.gov/help/hlpforma.htm
"""
import json
import requests # for HTTP requests and much more
import tablib # to format data for output
#----------------------------------------------------- Setup ---#
def get_data(series_list):
    """Get results from the BLS API query.
    :param series_list: list of series IDs
    :returns: a dictionary parsed from the returned JSON string.
    """
    headers = {'Content-type': 'application/json'}
    url = 'http://api.bls.gov/publicAPI/v2/timeseries/data/'
    data = json.dumps({
            "registrationKey": 'YOUR BLS API KEY HERE',
            "seriesid": series_list,
            "startyear": "2011",
            "endyear": "2014"})
    p = requests.post(url, data=data, headers=headers)
    return json.loads(p.text)

series_ids = ['CUUR0000SA0','SUUR0000SA0']
json_data = get_data(series_ids)
By the way, there is a typo in the BLS code. The line
'if 'M01' <= period <= 'M12':'
should not have the quotes around it.
It should be:
if 'M01' <= period <= 'M12':
(and indented correctly)
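If that line looks odd, here is a sketch of what it does. Python compares strings character by character, and comparisons can be chained, so the test keeps only period codes that sort between 'M01' and 'M12'. The list of periods below is made up for illustration and is not real BLS output.

```python
# Hypothetical period codes, including two outside the monthly range
periods = ["M01", "M06", "M12", "M13", "Q01"]

# Chained comparison: true only when both halves are true
monthly = [p for p in periods if "M01" <= p <= "M12"]

print(monthly)
```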
We need to convert from prettytable to tablib. The tablib quickstart is here. Three things need to change:
x=prettytable.PrettyTable([
        "series id", "year", "period",
        "value", "footnotes"])
x.add_row([
        seriesId,year, period,
        value,footnotes[0:-1]])
x.get_string()
Convert from prettytable to tablib. You can do it!
# Your code should look like this:
for series in json_data['Results']['series']:
    x = tablib.Dataset(
            headers=["series id", "year",
                     "period", "value", "footnotes"])
    seriesId = series['seriesID']
    for item in series['data']:
        year = item['year']
        period = item['period']
        value = item['value']
        footnotes = ""
        for footnote in item['footnotes']:
            if footnote:
                footnotes = footnotes + footnote['text'] + ','
        if 'M01' <= period <= 'M12':
            x.append([seriesId, year, period, value, footnotes[0:-1]])
    # Write one CSV file per series, and close it when done
    output = open(seriesId + '.csv', 'w')
    output.write(x.csv)
    output.close()
    # Excel output is binary, so open with 'wb'
    # (assumes tablib's xlsx export is what was intended here)
    output = open(seriesId + '.xlsx', 'wb')
    output.write(x.xlsx)
    output.close()
If you have time, you can pick a different set of data series to pull.
The help and tutorials page for the Bureau of Labor Statistics describes each dataset and how to indicate the series you want.
We will look at:
We pulled the descriptions for those three data series for you into the JSON file bls_codes.json.
'JSON' is an acronym for JavaScript Object Notation. It represents data in key-value pairs. It looks similar to a Python dictionary (not too hard to read).
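To see how close JSON is to a Python dictionary, here is a minimal sketch using the json module. The JSON text below is invented for illustration and is not the contents of bls_codes.json.

```python
import json

# A made-up JSON string with a string, a list and a null value
text = '{"prefix": "CE", "data_types": ["01", "02"], "seasonal": null}'

# json.loads turns JSON text into Python objects
parsed = json.loads(text)
print(parsed["prefix"])
print(parsed["data_types"])
print(parsed["seasonal"])    # JSON null becomes Python None

# json.dumps goes the other way, and the round trip is lossless here
round_trip = json.loads(json.dumps(parsed))
print(round_trip == parsed)
```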
Please open bls_codes.json in your editor.
We have to look at how to request data now, so change to bls_codes.json and we can look.
For each series, there is a series_id_format term that is written as a Python format string. Let's just focus on the "National Employment, Hours, and Earnings" series for now, and add this to the script:
id_format = ("{prefix}{seasonal_adjustment}"
             "{supersector_and_industry}"
             "{data_type}")
# This is an elegant way to make one long string.
Check it by copying this to the interpreter and then typing id_format at the interpreter prompt.
Add these too, both to the script and in the interpreter:
series = ("National Employment, "
"Hours, and Earnings")
prefix = "CE"
seasonal_adjustment = "U"
supersector_and_industry = "50511200" #Software
data_type_list = ["01", "02", "11"]
Fill in this loop to create a list of the series IDs:
series_list = []
for data_type in data_type_list:
    ### Your Code
    ### Hint:
    ### you are appending to the series_list
    ### you are formatting the id_format string
series_list = []
for data_type in data_type_list:
    series_list.append(id_format.format(
            prefix=prefix,
            seasonal_adjustment=seasonal_adjustment,
            supersector_and_industry=supersector_and_industry,
            data_type=data_type))
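If you want to check your loop against something, here is a self-contained sketch that builds the same list with the values from the previous slides. It uses a list comprehension instead of append, which is just a stylistic choice for this example.

```python
id_format = ("{prefix}{seasonal_adjustment}"
             "{supersector_and_industry}"
             "{data_type}")

data_type_list = ["01", "02", "11"]

# Fill in the format string once per data type
series_list = [
    id_format.format(
        prefix="CE",
        seasonal_adjustment="U",
        supersector_and_industry="50511200",
        data_type=data_type)
    for data_type in data_type_list]

print(series_list)
```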
If there are no errors, paste this into both your script and the interpreter:
json_data = get_data(series_list)
Check that it worked by just typing json_data in the interpreter.
Hooray! You have new data! (Or ask for help...)
See if everything runs by typing
python get_bls_data.py
in the Powershell window / terminal (not the Python interpreter shell).
You have written a script that pulls data from the Bureau of Labor Statistics API.
In addition to the other PyLadies and, especially, your awesome TAs, here are some well-loved resources:
Follow the commands in the GitHub 'sync a fork' article, which are copied below.
Open a powershell or terminal window, and change directories into python_intro. Then type:
git fetch upstream
git checkout master
git merge upstream/master