Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

About me

Posts

Faceted maps in R

19 minute read

Published:

I recently needed to create a choropleth of a few different countries for a project I’m working on about the targeting of UN peacekeepers by non-state armed actors. A choropleth is a type of thematic map where data are aggregated up from smaller areas (or discrete points) to larger ones and then visualized using different colors to represent different numeric values.

Finding Backcountry Campsites with CalTopo, OpenStreetMap, and R

32 minute read

Published:

Like many people, I’ve been spending more time outdoors during this pandemic. While this means daily walks in my neighborhood, it also means getting out into the wilderness and sleeping in a tent when I can. Although outdoor recreation is one of the safer ways to entertain yourself these days, it’s not without its own concerns. The difficulty of safely getting to trailheads means that while I’m backpacking more than usual, it’s still not as often as I’d like.

R Markdown, Jekyll, and Footnotes

8 minute read

Published:

I use Jekyll to create my website. Jekyll converts Markdown files into the HTML that your browser renders into the pages you see. As others and I have written before, it’s pretty easy to use R Markdown to generate pages with R code and output all together. One thing has consistently eluded me, however: footnotes.

Working with Large Spatial Data in R

22 minute read

Published:

In my research I frequently work with large datasets. Sometimes that means datasets that cover the entire globe, and other times it means working with lots of micro-level event data. Usually, my computer is powerful enough to load and manipulate all of the data in R without issue. When my computer’s fallen short of the task at hand, my solution has often been to throw it at a high performance computing cluster. However, I finally ran into a situation where the data proved too large even for that approach.

Jekyll and HTML Widgets

9 minute read

Published:

I’m currently compiling a list of university-affiliated programs designed to help prepare students for graduate study in political science and assist them in the process of applying to graduate school (a labyrinthine and opaque process in many regards). Since travel costs can be a deciding factor for some students when deciding whether to apply to these programs, I thought it would be nice to also put them on a map.

Extracting UN Peacekeeping Data from PDF Files

18 minute read

Published:

Some coauthors and I recently published a piece in the Monkey Cage on the recent military coup in Mali and the overthrow of president Ibrahim Boubacar Keïta. We examine what the ouster of Keïta means for the future of MINUSMA, the United Nations peacekeeping mission in Mali. One of my contributions that didn’t make the final cut was this plot of casualties to date among UN peacekeepers in the so-called big 5 peacekeeping missions.

Adding Content to an Academic Website

12 minute read

Published:

One thing I haven’t covered in my previous posts on creating and customizing an academic website is how to actually add content to your site. You know, the stuff that’s the reason why people go to your website in the first place? If you’ve followed those guides, your website should be professional-looking and already feeling a little bit different from the stock template. However, adding new pages or tweaking the existing pages can be a little intimidating, and I realized I should probably walk through how to do so. Luckily, Jekyll’s use of Markdown makes it really easy to add new content!

Customizing an Academic Website

10 minute read

Published:

This is a followup to my previous post on creating an academic website. If you’ve followed that guide, you should have a website that’s professional-looking and informative, but it’s probably lacking something to really make it feel like your own. There are an infinite number of ways you could customize the academicpages template (many of them far, far beyond my abilities) but I’m going to walk you through the process I used to start tweaking my website. The goal here isn’t to tell you how you should personalize your website, but to give you the tools to learn how to implement whatever changes you want to make.

Building an Academic Website

29 minute read

Published:

If you’re an academic, you need a website. Obviously I agree with this since you’re reading this on my website, but if you don’t have one, you should get one. Most universities these days provide a free option, usually powered by WordPress (both WashU and UNC use WordPress for their respective offerings). While these sites are quick to set up and come with the prestige of a .edu URL, they have several drawbacks that have been extensively written on.

Visualizing Police Militarization

5 minute read

Published:

Much has been written lately about the increasing militarization of US law enforcement. One of the most visible indicators of this shift in recent decades is the increased frequency of tactical gear and equipment worn and carried by police officers. However, this pales in comparison to images of police departments bringing armored vehicles to peaceful protests.

Counting Words in a Snap

3 minute read

Published:

14 pt periods. 1.05” margins. 2.1 spaced lines. Times Newer Roman. I’ve seen them all, and I’m tired of trying to catch them. So, I’ve stopped assigning papers in terms of page length and switched to word counts. Unfortunately, counting words is more time-intensive than counting pages.

Better Beamer Presentations the Easy Way

9 minute read

Published:

Everyone knows that Beamer makes frankly terrible presentations without a good deal of help. A well-crafted Beamer presentation can be a thing of beauty, especially since you can use knitr or R Markdown to automatically generate tables and figures, but it takes a lot of work.

Checking Progress with Bash

8 minute read

Published:

I’m currently cleaning and wrangling a large (> 2 billion observations) dataset. Due to its size, I’m running code in batch mode on a remote cluster. Not running interactively makes it harder for me to check on my code’s progress.

Fancy Icons and LaTeX Quirks

2 minute read

Published:

I recently updated my CV to add my ORCiD identifier at the top, alongside the other places to find me online. An ORCiD is an online identifier that persists through any changes to your name, institution, or email address throughout your life.

Combining PDF Documents the Smarter Way

5 minute read

Published:

My previous post on combining multiple PDF files had an important caveat: things would end up in the wrong order. If you had files with leading ID numbers that started at 1 and ended at 12, you’d end up with PDFs combined in the order 1, 10, 11, 12, 2, 3, …, 9.
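
The ordering problem described above comes from sorting file names as text rather than as numbers. Here is a minimal Python sketch of the difference. The file names (`1.pdf` through `12.pdf`) and the helper `leading_id` are hypothetical, and this is not necessarily the fix used in the post itself.

```python
import re

# Hypothetical file names with leading ID numbers 1 through 12.
files = [f"{i}.pdf" for i in range(1, 13)]


def leading_id(name: str) -> int:
    """Return the leading integer of a file name (illustrative helper)."""
    match = re.match(r"\d+", name)
    if match is None:
        raise ValueError(f"no leading ID number in {name!r}")
    return int(match.group())


# Plain string sorting compares character by character,
# so "10.pdf" sorts before "2.pdf".
lexicographic = sorted(files)

# Sorting on the parsed integer restores the intended order.
numeric = sorted(files, key=leading_id)

lex_ids = [leading_id(name) for name in lexicographic]
num_ids = [leading_id(name) for name in numeric]

print(lex_ids)  # text order
print(num_ids)  # numeric order
```

Zero-padding the IDs (`01.pdf`, `02.pdf`, and so on) is another way to make text order and numeric order agree.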

Combining PDF Documents

3 minute read

Published:

How many times have you found that your institution has access to a digital version of a book you need, only to discover that it comes in 15 different PDF files?

publications

Experience Report: System Log Analysis for Anomaly Detection

Shilin He, Jieming Zhu, Pinjia He, Michael R. Lyu.
ISSRE'16: International Symposium on Software Reliability Engineering
Most Influential Paper Award

Drain: An Online Log Parsing Approach with Fixed Depth Tree

Pinjia He, Jieming Zhu, Zibin Zheng, Michael R. Lyu.
ICWS'17: International Conference on Web Services

Structure-Invariant Testing for Machine Translation

Pinjia He, Clara Meister, Zhendong Su.
ICSE'20: International Conference on Software Engineering

Testing Machine Translation via Referential Transparency

Pinjia He, Clara Meister, Zhendong Su.
ICSE'21: International Conference on Software Engineering

A Survey on Automated Log Analysis for Reliability Engineering

Shilin He, Pinjia He, Zhuangbin Chen, Tianyi Yang, Yuxin Su, Michael R. Lyu.
CSUR'21: ACM Computing Surveys

ROME: Testing Image Captioning Systems via Recursive Object Melting

Boxi Yu+, Zhiqing Zhong+, Jiaqi Li+, Yixing Yang+, Shilin He, Pinjia He.
ISSTA'23: International Symposium on Software Testing and Analysis

Deep Learning or Classical Machine Learning? An Empirical Study on Log-Based Anomaly Detection

Boxi Yu+, Jiayi Yao+, Qiuai Fu, Zhiqing Zhong+, Haotian Xie+, Yaoliang Wu, Yuchi Ma, Pinjia He.
ICSE'24: International Conference on Software Engineering

Testing Graph Database Systems via Equivalent Query Rewriting

Qiuyang Mang+, Aoyang Fang+, Boxi Yu+, Hanfei Chen+, Pinjia He.
ICSE'24: International Conference on Software Engineering

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

Youliang Yuan+, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Pinjia He, Shuming Shi, Zhaopeng Tu.
ICLR'24: International Conference on Learning Representations

UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench

Boxi Yu+, Yuxuan Zhu, Pinjia He, Daniel Kang.
ACL'25: Annual Meeting of the Association for Computational Linguistics

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Youliang Yuan+, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu.
ACL'25: Annual Meeting of the Association for Computational Linguistics

An Empirical Study on Package-Level Deprecation in Python Ecosystem

Zhiqing Zhong+, Shilin He, Haoxuan Wang+, Boxi Yu+, Haowen Yang+, Pinjia He.
ICSE'25: International Conference on Software Engineering

Aligning the Objective of LLM-Based Program Repair

Junjielong Xu+, Ying Fu+, Shin Hwei Tan, Pinjia He.
ICSE'25: International Conference on Software Engineering

OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?

Junjielong Xu+, Qinan Zhang+, Zhiqing Zhong+, Shilin He, Chaoyun Zhang, Qingwei Lin, Dan Pei, Pinjia He, Dongmei Zhang, Qi Zhang.
ICLR'25: International Conference on Learning Representations

Towards Evaluating Proactive Risk Awareness of Multimodal Language Models

Youliang Yuan+, Wenxiang Jiao, Yuejin Xie+, Chihao Shen+, Menghan Tian+, Wenxuan Wang, Jen-tse Huang, Pinjia He.
NeurIPS'25: Annual Conference on Neural Information Processing Systems, Datasets and Benchmarks Track

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

Xiaoyuan Liu+, Tian Liang, Zhiwei He, Jiahao Xu, Wenxuan Wang, Pinjia He, Zhaopeng Tu, Haitao Mi, Dong Yu.
NeurIPS'25: Annual Conference on Neural Information Processing Systems

research

talks