Hillary

Ben Schmidt, November 15, 2015

Comparing language between any two senders. Click on the words to see who uses the word the most.

Not directly Hillary-related, but something I wanted to show because it connects to your other work and could easily be run on the Hillary e-mails: a map of every place mentioned in FRUS.

The difference between 2011 and 2010. You can use the query box at the bottom to compare any two years or, going by the Bookworm docs, any arbitrary queries you can come up with.

Something different.

I copied this data to my server before moving the classification codes onto my laptop, so those aren’t here. If I were going to look into this further, I’d look at the built-in topic model as a way of sorting e-mails into classified and unclassified, one that wouldn’t get so hung up on the minutiae of words that are inserted into the e-mails or redacted from them, which would create a false sense of security.

There are a bunch of problems remaining. The tokenization was confused by all the \n characters in the e-mails and prefixed a bunch of words with an n (nTo, nSubject, nSent). I’m also not 100% confident that the distinction between sender and recipient is parsing correctly here.
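To make the tokenization problem concrete, here is a minimal Python sketch. It assumes the e-mail text contains literal backslash-n pairs (two characters) rather than real newlines; the regex tokenizer and the sample string are stand-ins I made up for illustration, not the actual Bookworm tokenization or a real e-mail.

```python
import re

# Hypothetical sample: each "\n" below is a literal backslash followed by "n",
# as it might appear in an escaped text dump (an assumption about the source).
raw = r"From: H\nTo: Sullivan\nSubject: Benghazi\nSent: Monday"

def tokenize(text):
    """Stand-in tokenizer: every run of ASCII letters is a token."""
    return re.findall(r"[A-Za-z]+", text)

def unescape_newlines(text):
    """Turn literal backslash-n pairs into real newlines before tokenizing."""
    return text.replace("\\n", "\n")

broken = tokenize(raw)                     # the stray "n" sticks to the next word
fixed = tokenize(unescape_newlines(raw))   # the newline now acts as a separator

print(broken)
print(fixed)
```

Unescaping before tokenizing is the simplest fix under that assumption; stripping a leading n after the fact would be riskier, since it can’t tell nTo apart from a real word that starts with n.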

You can click on all the charts to get directly to the underlying e-mail and see what’s going on.

Here’s a basic Ngram browser by sender

{
    "database": "hillary",
    "plotType": "linechart",
    "method": "return_json",
    "search_limits": {
        "date_year": {"$gte": 2005, "$lte": 2014},
        "word": ["Benghazi"],
        "sender": ["H"]
    },
    "aesthetic": {
        "y": "WordsPerMillion",
        "x": "date_month"
    }
}

And here’s one with raw counts on the y-axis instead of words per million

{
    "database": "hillary",
    "plotType": "linechart",
    "method": "return_json",
    "search_limits": {
        "date_year": {"$gte": 2005, "$lte": 2014},
        "word": ["Benghazi"],
        "sender": ["H"]
    },
    "aesthetic": {
        "y": "WordCount",
        "x": "date_month"
    }
}
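The two queries above differ only in the y aesthetic, so if you want to generate them rather than edit JSON by hand, something like the following Python sketch would do. The helper name ngram_query is mine, not part of Bookworm; it only builds the dictionary and prints it, and leaves out how the query gets sent to a server.

```python
import json

def ngram_query(word, sender, y="WordsPerMillion"):
    """Build a line chart query like the ones above as a plain dict.

    Hypothetical helper: the keys mirror the JSON in this post.
    """
    return {
        "database": "hillary",
        "plotType": "linechart",
        "method": "return_json",
        "search_limits": {
            "date_year": {"$gte": 2005, "$lte": 2014},
            "word": [word],
            "sender": [sender],
        },
        "aesthetic": {"y": y, "x": "date_month"},
    }

rate_query = ngram_query("Benghazi", "H")
count_query = ngram_query("Benghazi", "H", y="WordCount")

# Collect the aesthetic keys whose values differ between the two queries.
changed = {
    key
    for key in rate_query["aesthetic"]
    if rate_query["aesthetic"][key] != count_query["aesthetic"][key]
}

print(json.dumps(count_query, indent=4))
```

Swapping the word or the sender code is then a one-argument change.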

Just the overall counts for each person are useful to know

{
    "database": "hillary",
    "plotType": "linechart",
    "method": "return_json",
    "search_limits": {
        "date_year": {"$gte": 2005, "$lte": 2014},
        "sender": ["H"]
    },
    "aesthetic": {
        "y": "TextCount",
        "x": "date_month"
    }
}

That view can also be a streamgraph

{
    "database": "hillary",
    "plotType": "streamgraph",
    "method": "return_json",
    "search_limits": {
        "date_year": {"$gte": 2005, "$lte": 2014}
    },
    "aesthetic": {
        "y": "TextCount",
        "x": "date_month",
        "fill": "sender"
    }
}

And a sender-specific count aggregator

{
    "database": "hillary",
    "plotType": "barchart",
    "method": "return_json",
    "search_limits": {
        "date_year": {"$gte": 2005, "$lte": 2014},
        "word": ["Benghazi"]
    },
    "aesthetic": {
        "x": "TextCount",
        "y": "sender"
    }
}

What time of day is the inner circle e-mailing: a heat map of several candidates

You may need to refresh if the color scheme is off. This is a bug.

{
    "database": "hillary",
    "plotType": "heatmap",
    "method": "return_json",
    "search_limits": {
        "sender__id": {
            "$lte": 10
        }
    },
    "aesthetic": {
        "y": "sender",
        "color": "TextPercent",
        "x": "*date_hour_day"
    },
    "counttype": ["TextPercent"],
    "groups": ["sender", "*date_hour_day"]
}
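As I understand the Bookworm convention (treat this as an assumption rather than a confirmed reading of the API), the asterisk on "*date_hour_day" drops that field from the denominator, so each cell of the heat map is the share of one sender’s e-mails that fall in one hour. A minimal Python sketch of that aggregation, using made-up sender codes and hours rather than anything from the real data:

```python
from collections import Counter

# Hypothetical (sender, hour-of-day) pairs standing in for real e-mails.
emails = [
    ("sender_a", 7), ("sender_a", 7), ("sender_a", 22), ("sender_a", 23),
    ("sender_b", 9), ("sender_b", 10), ("sender_b", 10), ("sender_b", 16),
]

def percent_by_hour(rows):
    """Percentage of each sender's e-mails sent in each hour of the day."""
    per_cell = Counter(rows)                          # numerator: sender x hour
    per_sender = Counter(sender for sender, _ in rows)  # denominator: sender only
    return {
        (sender, hour): 100.0 * n / per_sender[sender]
        for (sender, hour), n in per_cell.items()
    }

cells = percent_by_hour(emails)
for (sender, hour), pct in sorted(cells.items()):
    print(sender, hour, pct)
```

Normalizing within each sender is what makes rows comparable: a heavy e-mailer and a light one both sum to 100 across the day.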

What time of day is the inner circle e-mailing?

A line chart view. Check out Anne-Marie Slaughter. This is the only finding in here I’d write home about: Slaughter actually does seem to knock off for the day, which is really an accomplishment.

{
    "database": "hillary",
    "plotType": "linechart",
    "method": "return_json",
    "search_limits": {
        "sender": ["H"]
    },
    "aesthetic": {
        "y": "TextCount",
        "x": "date_hour_day"
    }
}

What time of day is the inner circle e-mailing, broken down by year.

Sometimes it’s interesting to see how something like this differs by year.

{
    "database": "hillary",
    "plotType": "linechart",
    "method": "return_json",
    "search_limits": {
        "date_year": [2009],
        "sender": ["H"]
    },
    "aesthetic": {
        "y": "TextCount",
        "x": "date_hour_day"
    }
}
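To compare years side by side, one option is to generate one copy of this query per year. The Python sketch below does only that; queries_by_year is a hypothetical helper of mine, and the year range 2009 to 2012 is simply the one used elsewhere on this page.

```python
import copy
import json

# The hour-of-day query from above, as a Python dict.
base = {
    "database": "hillary",
    "plotType": "linechart",
    "method": "return_json",
    "search_limits": {"date_year": [2009], "sender": ["H"]},
    "aesthetic": {"y": "TextCount", "x": "date_hour_day"},
}

def queries_by_year(base_query, years):
    """Return {year: query}, each a deep copy limited to a single year."""
    result = {}
    for year in years:
        query = copy.deepcopy(base_query)   # leave the base query untouched
        query["search_limits"]["date_year"] = [year]
        result[year] = query
    return result

per_year = queries_by_year(base, [2009, 2010, 2011, 2012])
print(json.dumps(per_year[2011]["search_limits"]))
```

The deep copy matters because search_limits is a nested dict; a shallow copy would leave every year’s query pointing at the same limits.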

E-mails by month.

Just another aggregate view.

{
    "database": "hillary",
    "plotType": "linechart",
    "method": "return_json",
    "search_limits": {
        "date_year": [2009, 2010, 2011, 2012],
        "sender": ["H"]
    },
    "aesthetic": {
        "y": "TextCount",
        "x": "date_month"
    }
}

Finally some exploration of a topic model.

No results in here, just something to push and prod at.

A heatmap of the most common topics in each person’s e-mails

(Starting with Hillary)

Again, heatmaps may require a page refresh right now.

{
    "database": "hillary",
    "plotType": "heatmap",
    "method": "return_json",
    "search_limits": {
        "date_year": [2009, 2010, 2011, 2012],
        "sender": ["H"]
    },
    "aesthetic": {
        "color": "WordCount",
        "y": "topic_label",
        "x": "date_month"
    },
    "scaleType": "log"
}

A linechart of the most common topics in each person’s e-mails

This is just the same data shown in a slightly different way. See what percentage of each person’s e-mails in each month belong to any topic, selected from the dropdown.

{
    "database": "hillary",
    "plotType": "linechart",
    "method": "return_json",
    "search_limits": {
        "date_year": [2009, 2010, 2011, 2012],
        "sender": ["H"],
        "*topic_label": ["state gov Sullivan nTo nSubject nSent Message"]
    },
    "aesthetic": {
        "y": "WordCount",
        "x": "date_month"
    }
}