CableWeaving workshop - talking points and links
CableGate and other journalistic treasures
The "CableGate" leak is a library of 250,000 US diplomatic cables, 2003-2010
For comparison, the "Kissinger library" contains 1.7 million declassified documents
http://wikileaks.org/plusd
Other libraries
http://mirror.wikileaks-press.org
Analysis tools for journalists (general purpose and library-specific)
- http://wikileaks-press.org/wiki/wiki/tools
Lots of processing was done specifically on CableGate
- https://github.com/wlwardiary/cable2graph/wiki/data
Why CableGate is the most researched library
- Cables are [more or less] structured information with rich meta-data
- A formally binding medium: Less chit-chat [than e.g. GIFiles], better signal/noise ratio
- "Big enough" but not "too big" - a prototype for Kissinger library :)
- Quite recent. Refers to events and people that are still relevant
What does a cable look like?
CableGate analysis using graph theory methods
Cables refer to each other, e.g. https://www.wikileaks.org/plusd/cables/08JAKARTA659_a.html#efmAakAdO
@datapornstar decided to refer to the entire CableGate library as a huge graph where:
- Nodes are cables
- Edges are references (a directed graph - edges pointing backwards in time, but
cable2graph treats it as undirected for the sake of community clustering)
- A reference to a cable that isn't in the library is treated as a "missing cable",
and much can be learned from the cables referring to it. E.g. http://cdpn.io/vkfCw
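The graph model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not cable2graph's actual code: the input format (a dict mapping each MRN to the MRNs it references), the function name build_reference_graph and the sample MRNs are all made up for this example.

```python
from collections import defaultdict

def build_reference_graph(cables):
    """Build an undirected reference graph.

    cables: dict mapping a cable's MRN to the list of MRNs it references
            (hypothetical input format, chosen for this sketch).
    Returns (neighbours, missing):
      neighbours: MRN -> set of MRNs linked to it (direction is dropped)
      missing:    MRN of a cable not in the library -> set of cables citing it
    """
    neighbours = defaultdict(set)
    missing = defaultdict(set)
    for mrn, refs in cables.items():
        neighbours[mrn]  # make sure cables without references still get a node
        for ref in refs:
            # Store the edge in both directions: the graph is treated as undirected.
            neighbours[mrn].add(ref)
            neighbours[ref].add(mrn)
            if ref not in cables:
                # The referenced cable is not in the library: a "missing cable".
                missing[ref].add(mrn)
    return dict(neighbours), dict(missing)

# Made-up MRNs, for illustration only.
sample = {
    "08POSTA2": ["08POSTA1", "07POSTB9"],
    "08POSTA1": [],
}
graph, missing_cables = build_reference_graph(sample)
print(missing_cables)
```

Here "07POSTB9" is not in the sample library, so it shows up as a missing cable together with the cable that cites it.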
Operations cable2graph performs
It can also perform other operations (e.g. neighbourhood graphs, TAGS), but these are outside the scope of this talk.
- It's still something humans do better than computers
- Requires heavy computation (not practical as a web service)
CableWeaver was created in order to address these problems.
CableWeaver - browser-side semi-manual graph layout
An attempt to make cable2graph findings more accessible to journalists and other non-technical researchers
Layout is computed in the browser, e.g. http://thedod.github.io/cableweaver/#58ce93b9f8a77aab243cab354c3aa6b4
- Static site. No computation on the server (less exposure to DoS attack).
- Instead of a long wait in front of an "hourglass" animation - user gets a psychedelic experience :)
- Semi-manual layout lets the software reach "near optimal" results and trusts the user to improve them if needed.
Basic user interaction
- click - Open cable in new tab
- Ctrl-click - Toggle whether cable is selected (part of the story line)
- Shift-click - Toggle manual/automatic layout
The first two operations can also be done from the timeline on the left
3 steps:
- Search for graphs by MRNs (Message Reference Numbers, e.g. 08JAKARTA659): you can search for cables at PlusD or other CableGate search engines. E.g. a search for cables from Israel containing "wataniya" returns 34 results
- Select a graph
- Examine graph
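If you collect MRNs from search results before pasting them in, a small helper can sanity-check them. This is a rough sketch, assuming an MRN is a two-digit year, then the originating post in capital letters, then a serial number (as in 08JAKARTA659). The pattern and the name parse_mrn are assumptions made for this example, not part of CableWeaver or PlusD, and real MRNs may not all fit this shape.

```python
import re

# Assumed MRN shape: two-digit year, post name in capitals, serial number.
MRN_PATTERN = re.compile(r"^(\d{2})([A-Z]+)(\d+)$")

def parse_mrn(mrn):
    """Split an MRN into its parts, or return None if it doesn't fit the pattern."""
    match = MRN_PATTERN.match(mrn.strip().upper())
    if match is None:
        return None
    year, origin, serial = match.groups()
    # The year stays a two-digit string; no century is guessed here.
    return {"year": year, "origin": origin, "serial": int(serial)}

print(parse_mrn("08JAKARTA659"))
```

Anything that returns None is worth checking by hand before searching for it.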
Additional functionality
CableWeaving - sample story line repository
Let the audience choose story lines to display
How to collaborate on a story
- Codepen is friendly and responsive, but less suitable for long term collaboration (no merge, link sandboxing problems only partially solved)
- You can create a gist either from CableWeaver or from Codepen
- "Informal pull requests". Just tell me and maybe I'll merge your fork ;)
How to work on your scoop in secrecy
Install CableWeaver on your computer (avoid snooping):
https://github.com/thedod/cableweaver/wiki/Installing-on-your-own-server-or-home-computer
Known problems
- You can't export the layout without "locking down" the story line (if you need to select/deselect cables, you'll also need to redo the layout).
- Nodes in story line pages are not links (you only see meta data on hover)
- Most journalists aren't comfy with HTML (or words like "fork").