Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

About me

Posts

Faceted maps in R

19 minute read

Published:

I recently needed to create a choropleth of a few different countries for a project I’m working on about the targeting of UN peacekeepers by non-state armed actors. A choropleth is a type of thematic map where data are aggregated up from smaller areas (or discrete points) to larger ones and then visualized using different colors to represent different numeric values.

Finding Backcountry Campsites with CalTopo, OpenStreetMap, and R

32 minute read

Published:

Like many people, I’ve been spending more time outdoors during this pandemic. While this means daily walks in my neighborhood, it also means getting out into the wilderness and sleeping in a tent when I can. Although outdoor recreation is one of the safer ways to entertain yourself these days, it’s not without its own concerns. The difficulty of safely getting to trailheads means that while I’m backpacking more than usual, it’s still not as often as I’d like.

R Markdown, Jekyll, and Footnotes

8 minute read

Published:

I use Jekyll to create my website. Jekyll converts Markdown files into the HTML that your browser renders into the pages you see. As others and I have written before, it’s pretty easy to use R Markdown to generate pages with R code and output all together. One thing has consistently eluded me, however: footnotes.

Working with Large Spatial Data in R

22 minute read

Published:

In my research I frequently work with large datasets. Sometimes that means datasets that cover the entire globe, and other times it means working with lots of micro-level event data. Usually, my computer is powerful enough to load and manipulate all of the data in R without issue. When my computer’s fallen short of the task at hand, my solution has often been to throw it at a high performance computing cluster. However, I finally ran into a situation where the data proved too large even for that approach.

Jekyll and HTML Widgets

9 minute read

Published:

I’m currently compiling a list of university-affiliated programs designed to help prepare students for graduate study in political science and assist them in the process of applying to graduate school (a labyrinthine and opaque process in many regards). Since travel costs can be a deciding factor for some students when deciding whether to apply to these programs, I thought it would be nice to also put them on a map.

Extracting UN Peacekeeping Data from PDF Files

18 minute read

Published:

Some coauthors and I recently published a piece in the Monkey Cage on the recent military coup in Mali and the overthrow of president Ibrahim Boubacar Keïta. We examine what the ouster of Keïta means for the future of MINUSMA, the United Nations peacekeeping mission in Mali. One of my contributions that didn’t make the final cut was this plot of casualties to date among UN peacekeepers in the so-called big 5 peacekeeping missions.

Adding Content to an Academic Website

12 minute read

Published:

One thing I haven’t covered in my previous posts on creating and customizing an academic website is how to actually add content to your site. You know, the stuff that’s the reason why people go to your website in the first place? If you’ve followed those guides, your website should be professional-looking and already feeling a little bit different from the stock template. However, adding new pages or tweaking the existing pages can be a little intimidating, and I realized I should probably walk through how to do so. Luckily, Jekyll’s use of Markdown makes it really easy to add new content!

Customizing an Academic Website

10 minute read

Published:

This is a followup to my previous post on creating an academic website. If you’ve followed that guide, you should have a website that’s professional-looking and informative, but it’s probably lacking something to really make it feel like your own. There are an infinite number of ways you could customize the academicpages template (many of them far, far beyond my abilities) but I’m going to walk you through the process I used to start tweaking my website. The goal here isn’t to tell you how you should personalize your website, but to give you the tools to learn how to implement whatever changes you want to make.

Building an Academic Website

29 minute read

Published:

If you’re an academic, you need a website. Obviously I agree with this since you’re reading this on my website, but if you don’t have one, you should get one. Most universities these days provide a free option, usually powered by WordPress (both WashU and UNC use WordPress for their respective offerings). While these sites are quick to set up and come with the prestige of a .edu URL, they have several drawbacks that have been extensively written on.

Visualizing Police Militarization

5 minute read

Published:

Much has been written lately about the increasing militarization of US law enforcement. One of the most visible indicators of this shift in recent decades is the increased frequency of tactical gear and equipment worn and carried by police officers. However, this pales in comparison to images of police departments bringing armored vehicles to peaceful protests.

Counting Words in a Snap

3 minute read

Published:

14 pt periods. 1.05” margins. 2.1 spaced lines. Times Newer Roman. I’ve seen them all, and I’m tired of trying to catch them. So, I’ve stopped assigning papers in terms of page length and switched to word counts. Unfortunately, counting words is more time-intensive than counting pages.

Better Beamer Presentations the Easy Way

9 minute read

Published:

Everyone knows that Beamer makes frankly terrible presentations without a good deal of help. A well-crafted Beamer presentation can be a thing of beauty, especially since you can use knitr or R Markdown to automatically generate tables and figures, but it takes a lot of work.

Checking Progress with Bash

8 minute read

Published:

I’m currently cleaning and wrangling a large (> 2 billion observations) dataset. Due to its size, I’m running code in batch mode on a remote cluster. Not running interactively makes it harder for me to check on my code’s progress.

Fancy Icons and LaTeX Quirks

2 minute read

Published:

I recently updated my CV to add my ORCiD identifier at the top, alongside the other places to find me online. An ORCiD is an online identifier that persists through any changes to your name, institution, or email address throughout your life.

Combining PDF Documents the Smarter Way

5 minute read

Published:

My previous post on combining multiple PDF files had an important caveat: things would end up in the wrong order. If you had files with leading ID numbers that started at 1 and ended at 12, you’d end up with PDFs combined in the order 1, 10, 11, 12, 2, 3, …, 9.
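
The ordering problem comes from sorting filenames as plain strings, where "10" sorts before "2" because the comparison goes character by character. As a minimal sketch (not the approach from the post itself), the Python snippet below contrasts a plain string sort with a numeric-aware sort. The filenames and the `natural_key` helper are hypothetical and exist only for illustration.

```python
import re

def natural_key(name):
    """Hypothetical helper: split a filename into digit and non-digit chunks.

    Digit chunks become (0, int) and text chunks become (1, str), so numbers
    compare by value and an int is never compared directly with a string.
    """
    chunks = [c for c in re.split(r"(\d+)", name) if c]
    return [(0, int(c)) if c.isdigit() else (1, c.lower()) for c in chunks]

# Hypothetical filenames with leading ID numbers from 1 to 12.
files = [f"{i}-chapter.pdf" for i in range(1, 13)]

lexicographic = sorted(files)              # plain string sort
natural = sorted(files, key=natural_key)   # numeric-aware sort

# Keep only the leading ID of each filename to make the order easy to read.
lexicographic_ids = [f.split("-")[0] for f in lexicographic]
natural_ids = [f.split("-")[0] for f in natural]

print(lexicographic_ids)
print(natural_ids)
```

The first list reproduces the 1, 10, 11, 12, 2, … order described above, while the second keeps the IDs in numeric order. Zero-padding the IDs (01, 02, …, 12) is another common way to get the same result without a custom sort key.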

Combining PDF Documents

3 minute read

Published:

How many times have you found that your institution has access to a digital version of a book you need, only to discover that it comes in 15 different PDF files?

publications

Experience Report: System Log Analysis for Anomaly Detection

Shilin He, Jieming Zhu, Pinjia He, Michael R. Lyu.
ISSRE'16: International Symposium on Software Reliability Engineering
Most Influential Paper Award

Drain: An Online Log Parsing Approach with Fixed Depth Tree

Pinjia He, Jieming Zhu, Zibin Zheng, Michael R. Lyu.
ICWS'17: International Conference on Web Services

Towards Automated Log Parsing for Large-Scale Log Data Analysis

Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu.
TDSC'18: IEEE Transactions on Dependable and Secure Computing

Characterizing the Natural Language Descriptions in Software Logging Statements

Pinjia He, Zhuangbin Chen, Shilin He, Michael R. Lyu.
ASE'18: International Conference on Automated Software Engineering

Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression

Jinyang Liu, Jieming Zhu, Shilin He, Pinjia He*, Zibin Zheng, Michael R. Lyu.
ASE'19: International Conference on Automated Software Engineering

Structure-Invariant Testing for Machine Translation

Pinjia He, Clara Meister, Zhendong Su.
ICSE'20: International Conference on Software Engineering

Machine Translation Testing via Pathological Invariance

Shashij Gupta, Pinjia He*, Clara Meister, Zhendong Su.
ESEC/FSE'20: Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Testing Machine Translation via Referential Transparency

Pinjia He, Clara Meister, Zhendong Su.
ICSE'21: International Conference on Software Engineering

A Survey on Automated Log Analysis for Reliability Engineering

Shilin He, Pinjia He*, Zhuangbin Chen, Tianyi Yang, Yuxin Su, Michael R. Lyu.
CSUR'21: ACM Computing Surveys

SanRazor: Reducing Redundant Sanitizer Checks in C/C++ Programs

Jiang Zhang, Shuai Wang, Manuel Rigger, Pinjia He, Zhendong Su.
OSDI'21: USENIX Symposium on Operating Systems Design and Implementation

Automated Testing of Image Captioning Systems

Boxi Yu, Zhiqing Zhong, Xinran Qin, Jiayi Yao, Yuancheng Wang, Pinjia He*.
ISSTA'22: International Symposium on Software Testing and Analysis

AEON: A Method for Automatic Evaluation of NLP Test Cases

Jen-tse Huang, Jianping Zhang, Wenxuan Wang, Pinjia He*, Yuxin Su, Michael R. Lyu.
ISSTA'22: International Symposium on Software Testing and Analysis

MTTM: Metamorphic Testing for Textual Content Moderation Software

Wenxuan Wang, Jen-tse Huang, Weibin Wu, Jianping Zhang, Yizhan Huang, Shuqing Li, Pinjia He*, Michael R. Lyu.
ICSE'23: International Conference on Software Engineering

Validating Multimedia Content Moderation Software via Semantic Fusion

Wenxuan Wang, Jingyuan Huang, Chang Chen, Jiazhen Gu, Jianping Zhang, Weibin Wu, Pinjia He*, Michael R. Lyu.
ISSTA'23: International Symposium on Software Testing and Analysis

ROME: Testing Image Captioning Systems via Recursive Object Melting

Boxi Yu, Zhiqing Zhong, Jiaqi Li, Yixing Yang, Shilin He, Pinjia He*.
ISSTA'23: International Symposium on Software Testing and Analysis

BiasAsker: Measuring the Bias in Conversational AI System

Yuxuan Wan, Wenxuan Wang, Pinjia He, Jiazhen Gu, Haonan Bai, Michael R. Lyu.
ESEC/FSE'23: Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Automated Testing and Improvement of Named Entity Recognition Systems

Boxi Yu, Yiyan Hu, Qiuyang Mang, Wenhan Hu, Pinjia He*.
ESEC/FSE'23: Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software

Wenxuan Wang, Jingyuan Huang, Jen-tse Huang, Chang Chen, Jiazhen Gu, Pinjia He, Michael R. Lyu.
ASE'23: International Conference on Automated Software Engineering

AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection

Yintong Huo, Yichen Li, Yuxin Su, Pinjia He, Zifan Xie, Michael R. Lyu.
ASE'23: International Conference on Automated Software Engineering

Hue: A User-Adaptive Parser for Hybrid Logs

Junjielong Xu, Qiuai Fu, Zhouruixing Zhu, Yutong Cheng, Zhijing Li, Yuchi Ma, Pinjia He*.
ESEC/FSE'23: Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

UniLog: Automatic Logging via LLM and In-Context Learning

Junjielong Xu, Ziang Cui, Yuan Zhao, Xu Zhang, Shilin He, Pinjia He, Liqun Li, Yu Kang, Qingwei Lin, Yingnong Dang, Saravan Rajmohan, Dongmei Zhang.
ICSE'24: International Conference on Software Engineering

Deep Learning or Classical Machine Learning? An Empirical Study on Log-Based Anomaly Detection

Boxi Yu, Jiayi Yao, Qiuai Fu, Zhiqing Zhong, Haotian Xie, Yaoliang Wu, Yuchi Ma, Pinjia He*.
ICSE'24: International Conference on Software Engineering

An Exploratory Investigation of Log Anomalies in Unmanned Aerial Vehicles

Dinghua Wang, Shuqing Li, Guanping Xiao, Yepang Liu, Yulei Sui, Pinjia He*, Michael R. Lyu.
ICSE'24: International Conference on Software Engineering

DivLog: Log Parsing with Prompt Enhanced In-Context Learning

Junjielong Xu, Ruichun Yang, Yintong Huo, Chengyu Zhang, Pinjia He*.
ICSE'24: International Conference on Software Engineering

Testing Graph Database Systems via Equivalent Query Rewriting

Qiuyang Mang+, Aoyang Fang+, Boxi Yu, Hanfei Chen, Pinjia He*.
ICSE'24: International Conference on Software Engineering

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Pinjia He*, Shuming Shi, Zhaopeng Tu.
ICLR'24: International Conference on Learning Representations

research

talks