Tanya Schlusser 11 December 2014
Slides prepared using IPython Notebook. (Awesome quick tutorial... and how to 'Markdown'.)
Following along? Clone this: https://github.com/tanyaschlusser/ipython_talk__OReilly_python_books
...wouldn't be able to get a drink for the next 18 - 36 months...
There are over 120 Python publications by O'Reilly alone
Of course we're going to use Python to find out. And of course the universe is only the size of O'Reilly
# See what's out there now. Pull the:
# -- media type (book | video)
# -- title
# -- publication date
import requests
from bs4 import BeautifulSoup

books_uri = "http://shop.oreilly.com/category/browse-subjects/programming/python.do?sortby=publicationDate&page=%d"

# Loop over all of the pages
results = []
description_results = {}
for page in range(1, 5):
    result = requests.get(books_uri % page)
    # Name the parser explicitly so BeautifulSoup doesn't guess (and warn)
    soup = BeautifulSoup(result.text, "html.parser")
    books = soup.find_all("td", "thumbtext")
    for b in books:
        # The date text ends with the year: drop trailing non-numeric tokens
        yr = b.find("span", "directorydate").string.strip().split()
        while not yr[-1].isdigit():
            yr.pop()
        yr = int(yr[-1])
        title = b.find("div", "thumbheader").text.strip()
        url = b.find("div", "thumbheader").find("a")["href"]  # link to the book's own page
        hasvideo = "Video" in b.text
        results.append(dict(year=yr, title=title, hasvideo=hasvideo))
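The year parsing in the loop keeps the last purely numeric token of the date text. Below is a minimal, stdlib-only sketch of that step as a standalone function, assuming date strings shaped like "December 2014" (the exact format of the old O'Reilly pages is an assumption). Unlike the `while` loop above, it raises a clear `ValueError` when no year is present instead of an `IndexError`.

```python
def extract_year(date_text):
    """Return the last all-digit token of a date string, e.g. 'December 2014' -> 2014."""
    tokens = date_text.strip().split()
    # Drop trailing tokens that are not pure digits (same idea as the while/pop loop)
    while tokens and not tokens[-1].isdigit():
        tokens.pop()
    if not tokens:
        raise ValueError("no year found in %r" % date_text)
    return int(tokens[-1])
```

For example, `extract_year("March 2013 (Early Release)")` skips the two trailing words and returns 2013.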
# Goals:
# -- plot the number of books year over year
#    ++ as a stacked plot of video + print
# -- get all the different words in the titles
#    ++ count them
#    ++ and order them by frequency
#
# Use the Matplotlib magic command. Magic commands start with '%'.
# This sets up inline plotting; it doesn't import anything.
# (%pylab inline also works, but it imports a lot of names into
# the global namespace.)
#
%matplotlib inline
# For the year-over-year counts I need pandas (pd.crosstab)
# For the stacked plot I need matplotlib.pyplot
# A plain dictionary is enough for the word counts
#
import matplotlib.pyplot as plt
import pandas as pd
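One minimal sketch of the title word count with a plain dictionary, as planned in the goals above. Lowercasing the titles and stripping surrounding punctuation are assumptions here, not necessarily what the talk did; ties in frequency are broken alphabetically so the order is deterministic.

```python
def title_word_counts(titles):
    """Count words across titles with a plain dict; return (word, count) pairs, most frequent first."""
    counts = {}
    for title in titles:
        for word in title.lower().split():
            # Strip punctuation clinging to either end, e.g. "Python," -> "python"
            word = word.strip(",.:;!?()'\"")
            if word:
                counts[word] = counts.get(word, 0) + 1
    # Sort by descending count, then alphabetically to break ties
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Against the scraped data this would be called as `title_word_counts(r["title"] for r in results)`.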
# Year over year -- number of publications by 'video' and 'print'.
#
df = pd.DataFrame(results)
byyear = pd.crosstab(df["year"],df["hasvideo"])
byyear.rename(columns={True:'video', False:'print'}, inplace=True)
byyear.plot(kind="area", xlim=(2000,2014), title="Ever increasing publications")
[Figure: "Ever increasing publications", a stacked area plot of print and video titles per year, 2000 to 2014]
That actually wasn't very satisfying...
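`pd.crosstab(df["year"], df["hasvideo"])` counts how often each (year, hasvideo) pair occurs. As a sketch of what that table contains, here is a stdlib-only equivalent over the same `results` records, with the columns already renamed to 'print' and 'video'.

```python
from collections import Counter

def year_by_media(records):
    """Plain-dict equivalent of the renamed crosstab: {year: {'print': n, 'video': m}}."""
    # Count each (year, hasvideo) combination once per record
    counts = Counter((r["year"], r["hasvideo"]) for r in records)
    table = {}
    for year in sorted({r["year"] for r in records}):
        # Counter returns 0 for combinations that never occurred
        table[year] = {"print": counts[(year, False)], "video": counts[(year, True)]}
    return table
```

One difference in design: `pd.crosstab` only creates columns for values it actually sees, so a data set with no videos would have no 'video' column, while this version always reports both.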
# Out of curiosity, what happened in 2010?
df[df["year"]==2010]
| | hasvideo | title | year |
|---|---|---|---|
| 77 | False | Python 2.6 Text Processing Beginner's Guide | 2010 |
| 78 | False | Programming Python, 4th Edition | 2010 |
| 79 | False | Python Geospatial Development | 2010 |
| 80 | False | wxPython 2.8 Application Development Cookbook | 2010 |
| 81 | False | Python 2.6 Graphics Cookbook | 2010 |
| 82 | False | Head First Python | 2010 |
| 83 | False | Real World Instrumentation with Python | 2010 |
| 84 | False | Python Text Processing with NLTK 2.0 Cookbook | 2010 |
| 85 | False | MySQL for Python | 2010 |
| 86 | False | Python Multimedia | 2010 |
| 87 | True | Practical Python Programming: Callbacks | 2010 |
| 88 | False | Python 3 Object Oriented Programming | 2010 |
| 89 | False | Spring Python 1.1 | 2010 |
| 90 | False | Blender 2.49 Scripting | 2010 |
| 91 | False | Professional IronPython | 2010 |
| 92 | False | Grok 1.0 Web Development | 2010 |
| 93 | False | Beginning Python | 2010 |
| 94 | False | Python Testing | 2010 |
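# percent_overlap and sorted_titles are defined in earlier cells
from scipy.cluster.hierarchy import linkage, dendrogram

plt.figure(figsize=(5, 20))
# Single-linkage clustering, treating each row of the overlap matrix as one book
data_link = linkage(percent_overlap, method='single', metric='euclidean')
den = dendrogram(data_link, labels=sorted_titles, orientation="left")
plt.ylabel('Samples', fontsize=9)
plt.xlabel('Distance')
plt.suptitle('Books clustered by description similarity', fontweight='bold', fontsize=14);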
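The dendrogram needs a book-by-book overlap matrix and a matching list of titles. A hypothetical sketch of one way to build them: it assumes a dict mapping each title to its description text (for instance, the `description_results` dict initialized in the scraping cell) and uses Jaccard overlap of the description word sets, scaled to percent. Jaccard is a stand-in chosen here; the talk's own definition of "percent overlap" may differ.

```python
def percent_overlap_matrix(descriptions):
    """Hypothetical: pairwise Jaccard overlap (in %) of description word sets.

    Returns (titles, matrix) where matrix[i][j] compares titles[i] and titles[j].
    """
    titles = sorted(descriptions)
    word_sets = [set(descriptions[t].lower().split()) for t in titles]
    matrix = []
    for a in word_sets:
        row = []
        for b in word_sets:
            union = a | b
            # Shared words divided by all distinct words; 0 when both are empty
            row.append(100.0 * len(a & b) / len(union) if union else 0.0)
        matrix.append(row)
    return titles, matrix
```

Usage would look like `sorted_titles, percent_overlap = percent_overlap_matrix(description_results)`. Passing this square matrix to `linkage` makes SciPy treat each row as an observation vector (a book's overlap profile against every other book), which matches the `metric='euclidean'` call above.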
from IPython.display import HTML

# Render the clustering as an interactive d3 tree
container = """
<a href='http://bl.ocks.org/mbostock/4063570'>Layout attribution: Michael Bostock</a>
<div id='display_container'></div>"""
# The JavaScript file is a %-style template with one slot for the JSON data
with open("data/d3-stacked-Tree.js") as infile:
    display_js = infile.read()
with open("data/human_hclust.json") as infile:
    the_json = infile.read()
HTML(container + display_js % the_json)
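The d3 cluster layout credited above reads a nested hierarchy of `{"name": ..., "children": [...]}` objects. How `human_hclust.json` was produced isn't stated; as a hypothetical sketch, a SciPy-style linkage matrix (such as `data_link`) can be turned into that shape because row k of the matrix merges clusters `Z[k][0]` and `Z[k][1]` into a new cluster numbered `len(labels) + k`.

```python
import json

def linkage_to_d3(Z, labels):
    """Hypothetical: convert a linkage matrix into nested {"name", "children"} dicts."""
    n = len(labels)
    # Leaves are clusters 0..n-1
    nodes = {i: {"name": labels[i]} for i in range(n)}
    for k, row in enumerate(Z):
        # Linkage rows hold floats, so cast the cluster ids to int
        left, right = int(row[0]), int(row[1])
        nodes[n + k] = {
            "name": "",  # internal merge nodes are left unlabeled
            "children": [nodes.pop(left), nodes.pop(right)],
        }
    # After n - 1 merges exactly one node, the root, remains
    (root,) = nodes.values()
    return root

def linkage_to_d3_json(Z, labels):
    """Serialize the tree for substitution into the JavaScript template."""
    return json.dumps(linkage_to_d3(Z, labels))
```

If you build the JSON this way, keep in mind that `display_js % the_json` treats every `%` in the JavaScript file as a format directive, so any literal percent sign in the script has to be written as `%%`.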
It makes sense that topics have little overlap. Otherwise why write a different book? Do we have anything to contribute?
Making this deck in IPython was life-changing -- Python talks belong in IPython.
How to:
1. Make the notebook:
       pip install ipython
       ipython notebook   # Make something.
2. Remember to identify the slides (set each cell's slide type from the Slideshow cell toolbar).
3. Convert to an HTML slideshow:
       export PREFIX=http://cdn.jsdelivr.net/reveal.js/2.6.2
       ipython nbconvert <my_notebook>.ipynb \
           --to slides \
           --reveal-prefix ${PREFIX}
4. Add the new slides to the
5. Wait about 10 minutes and the slides are there.
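Identifying the slides amounts to setting `metadata.slideshow.slide_type` on each cell, which is what the Slideshow cell toolbar writes and what `--to slides` reads. A minimal sketch that tags cells in bulk by editing the notebook JSON directly, assuming the nbformat 4 layout (cells in a top-level `"cells"` list); notebooks saved by IPython 2.x use nbformat 3, where cells sit under `"worksheets"` instead. Marking every non-empty cell as its own slide is a simplifying assumption.

```python
def mark_slides(notebook, slide_type="slide"):
    """Tag every cell of an nbformat-4 notebook dict with a slideshow slide type.

    Non-empty cells get `slide_type`; empty cells get "skip" so they stay out of the deck.
    """
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # Cell source may be stored as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        kind = slide_type if source.strip() else "skip"
        cell.setdefault("metadata", {})["slideshow"] = {"slide_type": kind}
    return notebook
```

Typical use is to `json.load` the `.ipynb` file, call `mark_slides` on the result, and `json.dump` it back before running `nbconvert`; other valid slide types include "subslide", "fragment", and "notes".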
Also: