Abstract

Location-Based Anomaly Detection in Social Media

We live in a time of unprecedented access to social shared data; this creates new opportunities for monitoring behavior and trends ranging from public health, the hottest concert venues, and even criminal activity. Unlike traditional methods of monitoring, such as surveys, phone calls, and a census, we’re able to create a purely dynamic method. This method delivers information for monitoring the population in real-time, a significant advantage over the staticness of traditional methods. In this research project, we focus on detecting location-based anomalies in social media, which is an important first step toward creating a concise visualization of population trends and movements across the globe. A location-based anomaly is defined as any deviation from the normal behavior patterns of the population. For example, if we perform a search with “concert” as the keyword within the region of Austin, TX, it should be no surprise that venues such as the Austin Music Hall or Frank Erwin Center appear on a regular basis. However, if there’s a surge of activity in a suburb, it may mean that people are gathering for an unannounced block party. Hence, we develop new methods and a prototype anomaly detection system that evolves with respect to social media (in this particular case, Twitter) to track anomalies and alert users. The system is based on keyword-sensitive heat maps that contains implicit patterns relative to the density of Tweets in an area, which ultimately reflect the behavior and activity within a set geographical region. In our initial work, we have (i) collected geo-tagged tweets from particular geographical bounding boxes (e.g., reflecting all activity in Austin, TX); (ii) placed these tweets in a database for analysis and application (iii) charted each individual tweet’s location using a heat map based API to express the density of keywords (initially on count) in areas within the bounding box.

In our continuing work, we are developing the core anomaly detection framework by considering keyword-based heat map patterns over multiple time intervals to identify outliers.