## Jupyter at Bryn Mawr College


# US Presidential Debate, Sep 26, 2016

This is a Jupyter notebook detailing an analysis of the language used in the first debate.

Prepared by Doug Blank, Bryn Mawr College.

For the full discussion, see: http://blankversusblank.blogspot.com/2016/09/post-debate-analysis.html

The original transcript contained some errors, which I corrected, so please use the corrected version here: first_debate.txt.

First, we read the text into a list of lines, replacing punctuation with spaces:

In [165]:
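```python
# Characters to turn into spaces so that words split cleanly later: period,
# question mark, curly quotes, colon, comma, em dash, and hyphen.
# (strip() already removes each line's trailing newline.)
PUNCTUATION = str.maketrans({ch: " " for ch in ".?\u201c\u201d:,\u2014-"})

# The transcript contains curly quotes, so it is read as UTF-8 (assumed encoding).
with open("first_debate.txt", encoding="utf-8") as f:
    text = [line.strip().translate(PUNCTUATION) for line in f]
text_all = " ".join(text)
```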


Here is the first line, to see what the cleaned text looks like:

In [166]:
```python
text[0]
```

Out[166]:
'HOLT  Good evening from Hofstra University in Hempstead  New York  I’m Lester Holt  anchor of  NBC Nightly News   I want to welcome you to the first presidential debate '
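Each punctuation mark is replaced by a space rather than deleted, so "HOLT:" becomes "HOLT " followed by the original space, which is why the sample shows double spaces ("HOLT  Good"). The minimal sketch below applies the same substitutions to a short made-up line; the helper name `clean_line` is hypothetical, and the final step collapses runs of spaces the way the notebook does after splitting the text by speaker.

```python
import re

# Same characters the notebook replaces: newline, period, question mark,
# curly quotes, colon, comma, em dash, and hyphen.
PUNCTUATION = "\n.?\u201c\u201d:,\u2014-"

def clean_line(line):
    """Hypothetical helper: strip the line, then turn each punctuation mark into a space."""
    return line.strip().translate(str.maketrans({ch: " " for ch in PUNCTUATION}))

cleaned = clean_line("HOLT: Good evening, New York.\n")
print(repr(cleaned))  # 'HOLT  Good evening  New York '

# Collapse runs of spaces and trim the ends.
collapsed = re.sub(" {2,}", " ", cleaned).strip()
print(collapsed)  # HOLT Good evening New York
```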

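Next, we split the transcript by speaker, skipping blank lines and audience reactions such as (APPLAUSE). A line without a speaker label continues the previous speaker's turn: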

In [167]:
```python
import re

holt = ""
clinton = ""
trump = ""

current = None  # the speaker who is currently talking
for line in text:
    if not line:
        continue  # skip blank lines
    elif line in ["(APPLAUSE)", "(CROSSTALK)", "(LAUGHTER)"]:
        continue  # skip audience reactions
    elif line.startswith("HOLT"):
        current = "HOLT"
        holt += line[4:] + " "     # drop the "HOLT" label
    elif line.startswith("TRUMP"):
        current = "TRUMP"
        trump += line[5:] + " "    # drop the "TRUMP" label
    elif line.startswith("CLINTON"):
        current = "CLINTON"
        clinton += line[7:] + " "  # drop the "CLINTON" label
    else:
        # An unlabeled line continues the previous speaker's turn.
        if current == "HOLT":
            holt += line + " "
        elif current == "TRUMP":
            trump += line + " "
        elif current == "CLINTON":
            clinton += line + " "
        else:
            raise ValueError("No speaker?!")

def normalize(s):
    """Lowercase, trim, and collapse runs of spaces into a single space."""
    return re.sub(" {2,}", " ", s.lower().strip())

holt = normalize(holt)
clinton = normalize(clinton)
trump = normalize(trump)
```
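The same speaker-tracking logic can be written more compactly by keeping one list of fragments per speaker in a dictionary. This is a minimal sketch, not the notebook's code: the function name `split_by_speaker` and the tiny sample transcript are made up, and it assumes every turn starts with an uppercase label (HOLT, TRUMP, or CLINTON), as in first_debate.txt.

```python
import re

SPEAKERS = ["HOLT", "TRUMP", "CLINTON"]
STAGE_DIRECTIONS = {"(APPLAUSE)", "(CROSSTALK)", "(LAUGHTER)"}

def split_by_speaker(lines):
    """Hypothetical helper: group cleaned lines into one lowercase string per speaker."""
    fragments = {name: [] for name in SPEAKERS}
    current = None
    for line in lines:
        if not line or line in STAGE_DIRECTIONS:
            continue  # skip blank lines and audience reactions
        for name in SPEAKERS:
            if line.startswith(name):
                current = name
                line = line[len(name):]  # drop the speaker label
                break
        if current is None:
            raise ValueError("No speaker?!")
        fragments[current].append(line)  # unlabeled lines go to the current speaker
    # Join, lowercase, trim, and collapse repeated spaces.
    return {name: re.sub(" {2,}", " ", " ".join(parts).lower().strip())
            for name, parts in fragments.items()}

sample = ["HOLT  Good evening ", "(APPLAUSE)", "TRUMP  Thank you  Lester ",
          "", "We are losing our jobs "]
speeches = split_by_speaker(sample)
print(speeches["TRUMP"])  # thank you lester we are losing our jobs
```

A dictionary keyed by speaker avoids repeating the same branch three times, so adding a fourth speaker only means extending `SPEAKERS`.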


## Number of characters spoken

In [168]:
```python
len(holt), len(trump), len(clinton)
```

Out[168]:
(10400, 42263, 33173)

Then split each speaker's text into words:

In [169]:
```python
clinton_words = clinton.split(" ")
trump_words = trump.split(" ")
holt_words = holt.split(" ")
```
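Splitting on a single space works here only because runs of spaces were already collapsed; on uncollapsed text, `split(" ")` produces empty strings that would be counted as "words". Python's argument-free `split()` splits on any run of whitespace and never returns empty strings, as this comparison on a made-up string shows:

```python
raw = "good  evening  from hofstra "

print(raw.split(" "))  # ['good', '', 'evening', '', 'from', 'hofstra', '']
print(raw.split())     # ['good', 'evening', 'from', 'hofstra']
```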


## Total number of "words" spoken

In [170]:
```python
len(clinton_words), len(trump_words), len(holt_words)
```

Out[170]:
(6237, 8139, 1878)

## Number of unique words spoken

In [171]:
```python
clinton_set = set(clinton_words)
trump_set = set(trump_words)
holt_set = set(holt_words)
```

In [172]:
```python
len(clinton_set), len(trump_set)
```

Out[172]:
(1379, 1269)

## How often each word was spoken, ranked from highest to lowest

In [173]:
```python
def make_dict(words):
    """Count how many times each word appears in a list of words."""
    d = {}
    for word in words:
        d[word] = d.get(word, 0) + 1
    return d
```
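`make_dict` is a hand-rolled word counter. The standard library's `collections.Counter` does the same job and also returns 0 for words that never occur. A quick check on a made-up word list, with `make_dict` copied from above so the snippet runs on its own:

```python
from collections import Counter

def make_dict(words):
    """Count how many times each word appears (same logic as the notebook)."""
    d = {}
    for word in words:
        d[word] = d.get(word, 0) + 1
    return d

words = "we need jobs and we need growth".split()
counts = Counter(words)

print(counts["we"], counts["jobs"])      # 2 1
print(make_dict(words) == dict(counts))  # True
print(counts["china"])                   # 0, missing words default to zero
```

`Counter` also provides `most_common(n)`, which returns the n highest-count pairs directly.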

In [174]:
```python
clinton_dict = make_dict(clinton_words)
trump_dict = make_dict(trump_words)
```
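Later cells look up individual words directly, e.g. `clinton_dict["china"]`. Indexing a plain dict raises `KeyError` for a word the speaker never used, so `dict.get` with a default of 0 is a safer way to compare counts. A small sketch with made-up counts (the helper name `compare` is hypothetical):

```python
def compare(word, *counts):
    """Hypothetical helper: return the count of `word` in each dict, or 0 if absent."""
    return tuple(d.get(word, 0) for d in counts)

a = {"jobs": 4, "tax": 2}
b = {"jobs": 5, "wall": 1}

print(compare("jobs", a, b))  # (4, 5)
print(compare("wall", a, b))  # (0, 1)
```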

In [175]:
```python
# Very frequent function words ("stop words") to leave out of the rankings.
common_words = {"the", "to", "and", "or", "that", "of", "a", "in", "have", "it",
                "be", "am", "are", "was", "were", "been", "being", "is", "do",
                "would", "but", "what", "so", "with", "about", "at", "on", "has",
                "can", "as", "because", "when", "by", "an", "for", "this"}
```

In [184]:
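```python
# Keep words that are not in common_words and were said more than twice,
# then sort by count, most frequent first.
frequent = [item for item in clinton_dict.items()
            if item[0] not in common_words and item[1] > 2]
for pair in sorted(frequent, key=lambda pair: pair[1], reverse=True):
    print("%s: %s" % pair)
```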

i: 138
we: 122
you: 76
he: 56
our: 42
not: 40
think: 38
well: 36
people: 32
they: 31
know: 28
going: 27
donald: 26
your: 25
one: 24
need: 23
us: 22
who: 21
more: 21
will: 21
that’s: 21
them: 21
it’s: 21
really: 20
there: 20
their: 19
want: 19
his: 18
from: 18
said: 17
if: 17
we’re: 17
country: 16
just: 16
good: 16
we’ve: 16
jobs: 16
lot: 16
tax: 16
make: 15
get: 15
new: 15
out: 15
up: 15
business: 15
got: 14
how: 14
work: 14
some: 14
should: 14
go: 13
all: 13
economy: 13
very: 13
also: 12
american: 12
had: 12
nuclear: 11
see: 11
he’s: 11
no: 11
down: 10
debt: 10
don’t: 10
look: 10
million: 10
my: 10
too: 10
into: 10
i’ve: 10
actually: 10
many: 10
kind: 10
fact: 10
important: 10
now: 9
put: 9
other: 9
police: 9
deal: 9
did: 9
again: 9
information: 9
middle: 9
say: 9
wealthy: 8
those: 8
first: 8
isis: 8
iran: 8
let’s: 8
plan: 8
president: 8
talk: 8
back: 8
over: 8
then: 8
support: 8
years: 8
much: 7
i’m: 7
lester: 7
ever: 7
things: 7
why: 7
me: 7
class: 7
something: 7
even: 7
paid: 7
state: 7
states: 7
him: 7
working: 7
pay: 7
trade: 7
world: 7
time: 7
proposed: 7
only: 7
communities: 7
home: 7
where: 7
taken: 6
everyone: 6
top: 6
national: 6
young: 6
percent: 6
everything: 6
having: 6
thing: 6
man: 6
money: 6
called: 6
which: 6
they’ve: 6
part: 6
we’ll: 6
made: 6
different: 6
nations: 6
weapons: 6
take: 6
you’re: 6
any: 6
trying: 6
government: 6
after: 6
worked: 6
able: 6
better: 6
against: 6
cyber: 5
returns: 5
obama: 5
two: 5
future: 5
plans: 5
come: 5
iraq: 5
there’s: 5
right: 5
heard: 5
united: 5
kinds: 5
never: 5
these: 5
together: 5
maybe: 5
give: 5
fair: 5
system: 5
believe: 5
number: 5
done: 5
federal: 5
sure: 5
law: 5
long: 5
both: 5
justice: 5
use: 5
incomes: 5
looked: 5
hope: 5
let: 5
doing: 4
like: 4
secretary: 4
saying: 4
benefit: 4
facing: 4
started: 4
facts: 4
troops: 4
york: 4
black: 4
5: 4
does: 4
clean: 4
says: 4
families: 4
hack: 4
clear: 4
provide: 4
went: 4
finally: 4
worst: 4
best: 4
zero: 4
america: 4
determines: 4
debate: 4
deals: 4
means: 4
security: 4
gun: 4
try: 4
understand: 4
they’re: 4
way: 4
real: 4
donald’s: 4
family: 4
taxes: 4
nato: 4
military: 4
help: 4
foreign: 4
attacks: 4
question: 4
trillion: 4
buy: 4
sometimes: 4
making: 4
here: 4
biggest: 4
you’ve: 4
job: 4
reasons: 4
around: 4
russia: 4
hard: 4
year: 4
invest: 4
most: 4
off: 4
same: 4
growth: 4
energy: 4
businesses: 4
small: 4
add: 4
under: 4
away: 4
could: 4
trickle: 4
great: 4
create: 4
still: 4
second: 4
issues: 4
problems: 4
recession: 3
street: 3
father: 3
grow: 3
seen: 3
intelligence: 3
40: 3
another: 3
asked: 3
half: 3
americans: 3
deserve: 3
course: 3
vote: 3
opportunities: 3
building: 3
lose: 3
installers: 3
financial: 3
white: 3
start: 3
private: 3
live: 3
muslim: 3
near: 3
may: 3
she: 3
before: 3
keep: 3
every: 3
race: 3
sailors: 3
stand: 3
life: 3
prepared: 3
responsibilities: 3
college: 3
matter: 3
remember: 3
took: 3
her: 3
release: 3
budget: 3
crime: 3
face: 3
criminal: 3
investments: 3
china: 3
racist: 3
though: 3
war: 3
build: 3
met: 3
policy: 3
voted: 3
tried: 3
health: 3
men: 3
education: 3
efforts: 3
leadership: 3
putin: 3
lie: 3
\$5: 3
abroad: 3
absolutely: 3
rising: 3
enough: 3
else: 3
word: 3
problem: 3
barack: 3
can’t: 3
intend: 3
african: 3
happen: 3
share: 3
east: 3
unfortunately: 3
ways: 3
defeat: 3

In [177]:
```python
clinton_dict["china"], clinton_dict["plan"]
```

Out[177]:
(3, 8)

In [178]:
```python
trump_dict["china"], trump_dict["plan"]
```

Out[178]:
(9, 3)

In [185]:
```python
# Same filter and sort as for Clinton, applied to Trump's word counts.
frequent = [item for item in trump_dict.items()
            if item[0] not in common_words and item[1] > 2]
for pair in sorted(frequent, key=lambda pair: pair[1], reverse=True):
    print("%s: %s" % pair)
```

i: 229
you: 189
we: 109
it’s: 72
they: 71
very: 65
our: 55
not: 50
country: 46
going: 43
all: 41
they’re: 40
look: 40
me: 39
said: 37
think: 37
them: 35
just: 35
she: 33
don’t: 32
will: 32
say: 32
i’m: 29
that’s: 29
people: 28
no: 28
doing: 27
out: 27
know: 27
get: 26
secretary: 25
clinton: 25
should: 25
one: 25
we’re: 24
many: 24
want: 23
years: 22
now: 21
thing: 21
things: 21
did: 21
their: 20
like: 20
your: 20
can’t: 20
if: 20
much: 20
other: 20
companies: 20
her: 20
jobs: 19
good: 19
really: 19
well: 19
way: 19
my: 19
some: 19
go: 18
these: 18
from: 17
money: 17
tell: 17
new: 17
over: 16
great: 16
you’re: 16
into: 15
believe: 15
time: 15
lot: 15
tax: 15
us: 15
up: 15
could: 15
leaving: 15
bad: 14
ever: 14
world: 14
deal: 14
isis: 14
he: 14
against: 14
agree: 13
got: 13
more: 13
first: 13
bring: 13
which: 13
back: 13
war: 12
trillion: 12
she’s: 12
i’ll: 12
right: 12
lester: 12
even: 12
than: 12
wrong: 12
countries: 12
where: 12
see: 11
done: 11
had: 11
how: 11
tremendous: 11
take: 11
better: 10
nato: 10
i’ve: 10
also: 10
hillary: 10
politicians: 10
why: 10
there: 10
let: 10
stop: 10
president: 10
doesn’t: 10
taken: 9
there’s: 9
trade: 9
job: 9
didn’t: 9
you’ve: 9
never: 9
him: 9
china: 9
times: 9
maybe: 9
whether: 8
big: 8
trump: 8
come: 8
taking: 8
far: 8
then: 8
almost: 8
campaign: 8
russia: 8
company: 8
community: 8
talking: 8
deals: 8
iran: 8
need: 8
sean: 8
nuclear: 8
happened: 8
regulations: 7
help: 7
obama: 7
greatest: 7
who: 7
saying: 7
nafta: 7
started: 7
down: 7
work: 7
nobody: 7
haven’t: 7
defend: 7
give: 7
experience: 7
nothing: 7
before: 7
30: 7
middle: 7
able: 7
worst: 7
under: 7
paying: 7
make: 7
fact: 7
hannity: 7
day: 7
business: 7
losing: 7
wait: 7
taxes: 7
last: 6
seen: 6
korea: 6
mean: 6
everybody: 6
endorsed: 6
ok: 6
he’s: 6
website: 6
long: 6
another: 6
\$20: 6
oil: 6
respond: 6
spent: 6
something: 6
problem: 6
little: 6
debate: 6
political: 6
mess: 6
happen: 6
watch: 6
mexico: 6
release: 6
east: 6
important: 6
laws: 5
cyber: 5
somebody: 5
hundreds: 5
terms: 5
strongly: 5
put: 5
probably: 5
billions: 5
nation: 5
york: 5
million: 5
fed: 5
biggest: 5
north: 5
after: 5
debt: 5
everything: 5
anywhere: 5
used: 5
called: 5
stamina: 5
land: 5
dollars: 5
income: 5
question: 5
major: 5
formed: 5
single: 5
percent: 5
off: 5
true: 5
disaster: 5
only: 5
cannot: 5
fight: 5
i’d: 5
10: 5
old: 5
talks: 5
we’ve: 5
approve: 5
getting: 5
name: 5
made: 5
list: 5
african: 5
murders: 5
totally: 5
four: 5
lots: 5
returns: 5
temperament: 5
care: 5
certainly: 5
states: 4
wealthy: 4
places: 4
certificate: 4
different: 4
interest: 4
supposed: 4
500: 4
number: 4
read: 4
asked: 4
american: 4
thinking: 4
ask: 4
soon: 4
frisk: 4
around: 4
raise: 4
produce: 4
place: 4
sent: 4
advantage: 4
perhaps: 4
year: 4
during: 4
lists: 4
lawsuit: 4
best: 4
any: 4
billion: 4
proud: 4
expand: 4
anybody: 4
gave: 4
actually: 4
iraq: 4
cut: 4
we’ll: 4
find: 4
minute: 4
airports: 4
relationships: 4
does: 4
once: 4
two: 4
audited: 4
brought: 4
japan: 4
kind: 4
real: 4
budget: 4
birth: 4
excuse: 4
what’s: 4
week: 4
talk: 4
they’ve: 4
needs: 4
inner: 4
cities: 4
left: 4
learn: 4
tough: 4
too: 4
build: 3
beautiful: 3
night: 3
myself: 3
his: 3
article: 3
audit: 3
feel: 3
15: 3
coming: 3
concerned: 3
reason: 3
energy: 3
yes: 3
wonder: 3
avenue: 3
met: 3
releases: 3
trying: 3
created: 3
unbelievable: 3
happy: 3
strong: 3
defective: 3
quickly: 3
5: 3
fine: 3
banks: 3
800: 3
course: 3
shows: 3
small: 3
call: 3
interview: 3
200: 3
\$650: 3
badly: 3
michael: 3
control: 3
blumenthal: 3
mine: 3
life: 3
roads: 3
terror: 3
000: 3
looks: 3
e: 3
bit: 3
start: 3
thousands: 3
dnc: 3
hard: 3
telling: 3
arabia: 3
communities: 3
schedule: 3
pennsylvania: 3
history: 3
opening: 3
united: 3
plants: 3
isn’t: 3
economy: 3
cost: 3
millions: 3
told: 3
numbers: 3
own: 3
assets: 3
thinks: 3
same: 3
impact: 3
reporter: 3
estate: 3
every: 3
credit: 3
went: 3
anything: 3
saudi: 3
within: 3
approved: 3
person: 3
d: 3
keeping: 3
election: 3
end: 3
admirals: 3
fifth: 3
manager: 3
donald: 3
yeah: 3
mails: 3
since: 3
internet: 3
certain: 3
mainstream: 3
fault: 3
plan: 3
order: 3
ohio: 3
shear: 3
ahead: 3
ago: 3
winning: 3
win: 3
signed: 3
love: 3
nice: 3
report: 3
truth: 3
defending: 3
terrible: 3
fly: 3
obama’s: 3
family: 3
sell: 3
rates: 3
pay: 3
frankly: 3
oh: 3
most: 3
story: 3
power: 3
audit’s: 3
worth: 3
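Both rankings above use the same filter-and-sort expression. A reusable version is sketched below, assuming a counts dictionary like `clinton_dict` and a stop-word collection like `common_words`; the function name `top_words`, its `min_count` parameter, and the sample data are hypothetical. Python's `sorted` is stable even with `reverse=True`, so words with equal counts keep their dictionary order (insertion order in Python 3.7+).

```python
def top_words(counts, stop_words, min_count=3):
    """Hypothetical helper: (word, count) pairs for non-stop words said at least
    min_count times, most frequent first."""
    kept = [(word, n) for word, n in counts.items()
            if word not in stop_words and n >= min_count]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

counts = {"the": 50, "jobs": 4, "tax": 3, "we": 12, "wall": 1}
stop = {"the", "and", "to"}

for word, n in top_words(counts, stop):
    print("%s: %s" % (word, n))
# we: 12
# jobs: 4
# tax: 3
```

With `min_count=3` this matches the notebook's `items[1] > 2` test for integer counts.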


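## How often did each candidate speak?

Counting the uppercase speaker labels in the full text gives a rough count of how many turns each candidate took:

In [180]:
```python
text_all.count("TRUMP")
```

Out[180]:
126

In [181]:
```python
text_all.count("CLINTON")
```

Out[181]:
93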
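`text_all.count("TRUMP")` counts every occurrence of the uppercase string, so besides the speaker labels it would also match "TRUMP" anywhere else it happens to appear in uppercase. Assuming each turn begins with its speaker label, as in first_debate.txt, counting only line prefixes is a more targeted sketch; the function name `count_turns` and the sample lines are made up.

```python
from collections import Counter

def count_turns(lines, speakers=("HOLT", "TRUMP", "CLINTON")):
    """Hypothetical helper: count lines that start with each speaker's label."""
    turns = Counter()
    for line in lines:
        for name in speakers:
            if line.startswith(name):
                turns[name] += 1
                break
    return turns

sample = ["HOLT  Good evening ", "TRUMP  Thank you ", "We will win ",
          "CLINTON  Donald TRUMP said ", "TRUMP  Wrong "]
turns = count_turns(sample)
print(turns["TRUMP"], turns["CLINTON"])  # 2 1
print(" ".join(sample).count("TRUMP"))   # 3, includes the mention inside a line
```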