Interactive Social Insight Discovery Using Visual Analytics
Uses language analysis to summarize social media and enables users to interactively explore the temporal and geospatial trends in topics.

Background
Massive amounts of data, both textual and multimedia, are collected in real time about who we are, where we are, and what we are talking about. In particular, the emergence of microblogging services such as Twitter has produced an overwhelming amount of social media data. To illustrate how popular these services have become over the past few years, Twitter alone grew from about 6 million visitors per month in January 2009 to over 32 million per month as of July 2011. By multiple estimates, users worldwide post 95 million tweets on Twitter on an average day and share about 30 billion pieces of content on Facebook each month.
Analyzing this rich textual data lets us understand and even predict people’s interests, and depict the shifts and turns of social activity at the individual, group, and global levels. Analysis of this now-ubiquitous social media will have a profound impact on our understanding of social phenomena. This abundance of information, however, is not matched by methods for extracting the latent semantic information in such massive text corpora. Because streams of diverse information constantly reach end users, it is difficult for them to pick out important and interesting messages. In addition, a user might want to find useful content outside her own streams, such as blogs whose content is similar to her own tweets. This task requires not only a meaningful summary of vast information streams but also support for exploring content by topical similarity.
Technology Overview
This technology is a new process that combines data-centric topic modeling with user-centric interactive visual analytics in a system we call “Interactive-Social-Insights” (I-Si). I-Si is built on a rich-server, rich-client architecture and visually represents the semantically meaningful topical aspects of a large social text corpus. Currently, I-Si supports effective text analysis of moderate-sized social data sets and enables interactive analysis of the extracted topical categories, including their temporal and geospatial development. Users can interactively monitor and respond to topical trends and patterns that would otherwise remain hidden in the social data.
In detail, I-Si integrates Latent Dirichlet Allocation (LDA), a state-of-the-art probabilistic topic model, with interactive visualization. Instead of relying on keywords (such as hashtags on Twitter), I-Si first uses LDA to extract a set of semantically meaningful topics. Because LDA represents each microblog as a probability distribution over topics, I-Si uses rich visualizations to present how microblogs are distributed across topics. Users can thus follow how topics develop over time and across locations, and identify useful topical trends and patterns interactively. I-Si is unique in providing this in-depth analysis of social data.
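The listing does not say how per-microblog topic distributions are turned into temporal and geospatial views. The sketch below shows one simple possibility. It assumes an LDA model has already assigned each tweet a topic distribution, and that each tweet carries a day label and a region label. It then averages those distributions within each (day, region) cell. The record layout and the functions `aggregate_topic_trends`, `dominant_topic`, and `topic_series` are hypothetical illustrations, not part of I-Si.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (day, region, theta), where theta is the
# per-tweet topic distribution an LDA model would output (sums to 1).
Tweet = Tuple[str, str, List[float]]
Cell = Tuple[str, str]  # (day, region)


def aggregate_topic_trends(tweets: List[Tweet], num_topics: int) -> Dict[Cell, List[float]]:
    """Average the topic distributions of all tweets in each (day, region) cell."""
    sums: Dict[Cell, List[float]] = defaultdict(lambda: [0.0] * num_topics)
    counts: Dict[Cell, int] = defaultdict(int)
    for day, region, theta in tweets:
        if len(theta) != num_topics:
            raise ValueError("topic distribution has the wrong length")
        cell = sums[(day, region)]
        for k, p in enumerate(theta):
            cell[k] += p
        counts[(day, region)] += 1
    # Dividing by the tweet count keeps each cell a probability distribution.
    return {key: [s / counts[key] for s in vec] for key, vec in sums.items()}


def dominant_topic(trends: Dict[Cell, List[float]], key: Cell) -> int:
    """Index of the topic with the highest average weight in one cell."""
    vec = trends[key]
    return max(range(len(vec)), key=vec.__getitem__)


def topic_series(trends: Dict[Cell, List[float]], region: str, topic: int) -> List[Tuple[str, float]]:
    """Time series of one topic's average weight in one region, ordered by day label."""
    return sorted((day, vec[topic]) for (day, reg), vec in trends.items() if reg == region)


if __name__ == "__main__":
    # Toy input for illustration only; not real data.
    tweets = [
        ("day-1", "region-A", [0.7, 0.2, 0.1]),
        ("day-1", "region-A", [0.5, 0.4, 0.1]),
        ("day-2", "region-A", [0.1, 0.8, 0.1]),
        ("day-1", "region-B", [0.2, 0.2, 0.6]),
    ]
    trends = aggregate_topic_trends(tweets, num_topics=3)
    print(dominant_topic(trends, ("day-1", "region-A")))  # prints 0
    print(topic_series(trends, "region-A", 1))
```

Averaging is the simplest choice. A temporal view plots `topic_series` for each region, and a geospatial view colors each region by its dominant topic for a given day. A real system might instead weight tweets differently or smooth the series over time.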
Technology Status
The technology has been demonstrated to work, most notably on Twitter feeds. Near-real-time analysis of these feeds and presentation of the results using data visualization techniques have been achieved.
Benefits
- The technology can monitor very large data sets in near real time to track topic trends both over time and by geographic location.
- Large amounts of data can be presented graphically to the user in an easily understood format so as to inform the viewer of trending topics.
- Topic modeling of the text shows the viewer the broad spread of topics without relying on keywords or hashtags to identify and group them.
Applications
- Public health officials and emergency services can use this technology to detect disease outbreaks and identify areas affected by natural disasters in near real time.
- Advertisers and pollsters can use the technology to gauge the effectiveness of their marketing or polling efforts, both overall and across geographic areas.
Opportunity
UNC Charlotte is looking for a commercial partner to bring this technology to market. Exclusive patent license available with flexible and favorable terms.