A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.


About me


Faceted maps in R

19 minute read


I recently needed to create a choropleth of a few different countries for a project I’m working on about the targeting of UN peacekeepers by non-state armed actors. A choropleth is a type of thematic map where data are aggregated up from smaller areas (or discrete points) to larger ones and then visualized using different colors to represent different numeric values.
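The two steps in that definition, aggregating points up to areas and then mapping numeric values to colors, can be illustrated with a minimal, language-neutral sketch. This is written in plain Python rather than the R used in the post, and it is not the post's code. The event records, area labels, equal-interval binning rule, and four-color palette are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical event records: each point is already tagged with the larger area it falls in.
events = ["A", "A", "B", "C", "C", "C", "C", "D"]

# Step 1: aggregate discrete points up to areas by counting events per area.
counts = Counter(events)

def equal_interval_class(value, lo, hi, n_classes):
    """Assign a value to one of n_classes equal-width bins spanning [lo, hi]."""
    if hi == lo:
        return 0
    width = (hi - lo) / n_classes
    # min() keeps the maximum value in the top class instead of one past it.
    return min(int((value - lo) / width), n_classes - 1)

# Step 2: map each area's class to a fill color (hypothetical light-to-dark palette).
palette = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]
lo, hi = min(counts.values()), max(counts.values())
fills = {
    area: palette[equal_interval_class(c, lo, hi, len(palette))]
    for area, c in counts.items()
}
print(fills)
```

Equal-width intervals are only one classification scheme. Quantile or manually chosen breaks are common alternatives and can change how the map reads considerably.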

Finding Backcountry Campsites with CalTopo, OpenStreetMap, and R

32 minute read


Like many people, I’ve been spending more time outdoors during this pandemic. While this means daily walks in my neighborhood, it also means getting out into the wilderness and sleeping in a tent when I can. Although outdoor recreation is one of the safer ways to entertain yourself these days, it’s not without its own concerns. The difficulty of safely getting to trailheads means that while I’m backpacking more than usual, it’s still not as often as I’d like.

R Markdown, Jekyll, and Footnotes

8 minute read


I use Jekyll to create my website. Jekyll converts Markdown files into the HTML that your browser renders into the pages you see. As others and I have written before, it’s pretty easy to use R Markdown to generate pages with R code and output all together. One thing has consistently eluded me, however: footnotes.

Working with Large Spatial Data in R

22 minute read


In my research I frequently work with large datasets. Sometimes that means datasets that cover the entire globe, and other times it means working with lots of micro-level event data. Usually, my computer is powerful enough to load and manipulate all of the data in R without issue. When my computer’s fallen short of the task at hand, my solution has often been to throw it at a high performance computing cluster. However, I finally ran into a situation where the data proved too large even for that approach.

Jekyll and HTML Widgets

9 minute read


I’m currently compiling a list of university-affiliated programs designed to help prepare students for graduate study in political science and assist them in the process of applying to graduate school (a labyrinthine and opaque process in many regards). Since travel costs can be a deciding factor for some students weighing whether to apply to these programs, I thought it would be nice to also put them on a map.

Extracting UN Peacekeeping Data from PDF Files

18 minute read


Some coauthors and I recently published a piece in the Monkey Cage on the military coup in Mali and the overthrow of President Ibrahim Boubacar Keïta. We examine what the ouster of Keïta means for the future of MINUSMA, the United Nations peacekeeping mission in Mali. One of my contributions that didn’t make the final cut was this plot of casualties to date among UN peacekeepers in the so-called big 5 peacekeeping missions.

Adding Content to an Academic Website

12 minute read


One thing I haven’t covered in my previous posts on creating and customizing an academic website is how to actually add content to your site. You know, the stuff people visit your website for in the first place? If you’ve followed those guides, your website should be professional-looking and already feeling a little bit different from the stock template. However, adding new pages or tweaking the existing pages can be a little intimidating, and I realized I should probably walk through how to do so. Luckily, Jekyll’s use of Markdown makes it really easy to add new content!

Customizing an Academic Website

10 minute read


This is a follow-up to my previous post on creating an academic website. If you’ve followed that guide, you should have a website that’s professional-looking and informative, but it’s probably lacking something to really make it feel like your own. There are an infinite number of ways you could customize the academicpages template (many of them far, far beyond my abilities) but I’m going to walk you through the process I used to start tweaking my website. The goal here isn’t to tell you how you should personalize your website, but to give you the tools to learn how to implement whatever changes you want to make.

Building an Academic Website

29 minute read


If you’re an academic, you need a website. Obviously I agree with this since you’re reading this on my website, but if you don’t have one, you should get one. Most universities these days provide a free option, usually powered by WordPress (both WashU and UNC use WordPress for their respective offerings). While these sites are quick to set up and come with the prestige of a .edu URL, they have several drawbacks that have been written about extensively.

Visualizing Police Militarization

5 minute read


Much has been written lately about the increasing militarization of US law enforcement. One of the most visible indicators of this shift in recent decades is the increased frequency of tactical gear and equipment worn and carried by police officers. However, this pales in comparison to images of police departments bringing armored vehicles to peaceful protests.

Counting Words in a Snap

3 minute read


14 pt periods. 1.05” margins. 2.1 spaced lines. Times Newer Roman. I’ve seen them all, and I’m tired of trying to catch them. So, I’ve stopped assigning papers in terms of page length and switched to word counts. Unfortunately, counting words is more time-intensive than counting pages.

Better Beamer Presentations the Easy Way

9 minute read


Everyone knows that Beamer makes frankly terrible presentations without a good deal of help. A well-crafted Beamer presentation can be a thing of beauty, especially since you can use knitr or R Markdown to automatically generate tables and figures, but it takes a lot of work.

Checking Progress with Bash

8 minute read


I’m currently cleaning and wrangling a large (> 2 billion observations) dataset. Due to its size, I’m running code in batch mode on a remote cluster. Not running interactively makes it harder for me to check on my code’s progress.

Fancy Icons and LaTeX Quirks

2 minute read


I recently updated my CV to add my ORCiD identifier at the top, alongside the other places to find me online. An ORCiD is an online identifier that persists through any changes to your name, institution, or email address throughout your life.

Combining PDF Documents the Smarter Way

5 minute read


My previous post on combining multiple PDF files had an important caveat: if you had files with leading ID numbers running from 1 to 12, they would be combined in the wrong order, 1, 10, 11, 12, 2, 3, …, 9.
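That ordering is what plain lexicographic (character-by-character) string sorting produces: "10" sorts before "2" because the character "1" sorts before "2". Below is a minimal Python sketch of a "natural" sort key, which compares runs of digits as integers. It assumes file names like `1.pdf` through `12.pdf`. The file names and the `natural_key` helper are hypothetical, and this is not necessarily the approach the post itself takes.

```python
import re

def natural_key(name: str):
    """Sort key that compares digit runs numerically and other text case-insensitively."""
    # Splitting on a capturing group keeps the digit runs, so the parts alternate:
    # even indices are non-digit text (possibly empty), odd indices are digit runs.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

files = [f"{i}.pdf" for i in range(1, 13)]

print(sorted(files)[:5])                    # lexicographic order
print(sorted(files, key=natural_key)[:5])   # natural order
```

Because the split always alternates text and digit runs, keys from different names compare position by position with matching types. Zero-padding the IDs (01, 02, …) is a simpler alternative when you control the file names.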

Combining PDF Documents

3 minute read


How many times have you found that your institution has access to a digital version of a book you need only to discover that it comes in 15 different PDF files?


Learning to Log: Helping Developers Make Informed Logging Decisions

Jieming Zhu, Pinjia He, Qiang Fu, Hongyu Zhang, Michael R. Lyu, Dongmei Zhang.
ICSE'15: International Conference on Software Engineering

An Evaluation Study on Log Parsing and Its Use in Log Mining

Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu.
DSN'16: International Conference on Dependable Systems and Networks

Experience Report: System Log Analysis for Anomaly Detection

Shilin He, Jieming Zhu, Pinjia He, Michael R. Lyu.
ISSRE'16: International Symposium on Software Reliability Engineering
Most Influential Paper Award

Online QoS Prediction for Runtime Service Adaptation via Adaptive Matrix Factorization

Jieming Zhu, Pinjia He*, Zibin Zheng, Michael R. Lyu.
TPDS'17: IEEE Transactions on Parallel and Distributed Systems

Drain: An Online Log Parsing Approach with Fixed Depth Tree

Pinjia He, Jieming Zhu, Zibin Zheng, Michael R. Lyu.
ICWS'17: International Conference on Web Services

Towards Automated Log Parsing for Large-Scale Log Data Analysis

Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu.
TDSC'18: IEEE Transactions on Dependable and Secure Computing

Characterizing the Natural Language Descriptions in Software Logging Statements

Pinjia He, Zhuangbin Chen, Shilin He, Michael R. Lyu.
ASE'18: International Conference on Automated Software Engineering

Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression

Jinyang Liu, Jieming Zhu, Shilin He, Pinjia He*, Zibin Zheng, Michael R. Lyu.
ASE'19: International Conference on Automated Software Engineering

Structure-Invariant Testing for Machine Translation

Pinjia He, Clara Meister, Zhendong Su.
ICSE'20: International Conference on Software Engineering

Machine Translation Testing via Pathological Invariance

Shashij Gupta, Pinjia He*, Clara Meister, Zhendong Su.
ESEC/FSE'20: ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Testing Machine Translation via Referential Transparency

Pinjia He, Clara Meister, Zhendong Su.
ICSE'21: International Conference on Software Engineering

A Survey on Automated Log Analysis for Reliability Engineering

Shilin He, Pinjia He*, Zhuangbin Chen, Tianyi Yang, Yuxin Su, Michael R. Lyu.
CSUR'21: ACM Computing Surveys

SanRazor: Reducing Redundant Sanitizer Checks in C/C++ Programs

Jiang Zhang, Shuai Wang, Manuel Rigger, Pinjia He, Zhendong Su.
OSDI'21: USENIX Symposium on Operating Systems Design and Implementation

Automated Testing of Image Captioning Systems

Boxi Yu, Zhiqing Zhong, Xinran Qin, Jiayi Yao, Yuancheng Wang, Pinjia He*.
ISSTA'22: International Symposium on Software Testing and Analysis

AEON: A Method for Automatic Evaluation of NLP Test Cases

Jen-tse Huang, Jianping Zhang, Wenxuan Wang, Pinjia He*, Yuxin Su, Michael R. Lyu.
ISSTA'22: International Symposium on Software Testing and Analysis