Congratulations to the three winners of the poster competition at the Computer Science and Electrical Engineering Department's annual Research Review, which took place in the UMBC Technology Center's business incubator and accelerator building last Friday. Winners were chosen by UMBC faculty who scored their top five choices with [-9, +9] range voting.
1st place (26 points):
Varish Mulwad (CS, Ph.D.) "A Probabilistic Model for Generating Linked Data from Tables"
Advisor: Tim Finin
A vast amount of information is encoded in tables found in documents, on the Web, and in spreadsheets or databases. Integrating or searching over this information benefits from understanding its intended meaning and making it explicit in a semantic representation language like RDF. Most current approaches to generating Semantic Web representations from tables require human input to create schemas and often result in graphs that do not follow best practices for linked data. Evidence for a table's meaning can be found in its column headers, cell values, implicit relations between columns, caption, and surrounding text, but interpreting that evidence also requires general and domain-specific background knowledge. Approaches that work well for one domain may not necessarily work well for others. We describe a domain-independent framework for interpreting the intended meaning of tables and representing it as Linked Data. At the core of the framework are techniques grounded in graphical models and probabilistic reasoning to infer the meaning associated with a table. Using background knowledge from resources in the Linked Open Data cloud, we jointly infer the semantics of column headers, table cell values (e.g., strings and numbers), and relations between columns, and represent the inferred meaning as a graph of RDF triples. A table's meaning is thus captured by mapping columns to classes in an appropriate ontology, linking cell values to literal constants, implied measurements, or entities in the linked data cloud (existing or new), and discovering or identifying relations between columns.
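To make the output representation concrete, here is a minimal sketch of how an already-interpreted table could be written out as triples. It does not perform the probabilistic inference the abstract describes: the column-to-class mapping, the cell-to-entity links, and the column relation are supplied by hand. The function name `table_to_triples` and every `ex:` identifier are hypothetical, chosen only for illustration.

```python
def table_to_triples(rows, column_classes, cell_entities, relations):
    """Emit (subject, predicate, object) triples for an interpreted table.

    All three mappings are assumed inputs; in the framework described
    above they would be inferred, not hand-written.
    """
    triples = set()
    for row in rows:
        # Each linked cell value is an instance of its column's class.
        for column, cls in column_classes.items():
            triples.add((cell_entities[row[column]], "rdf:type", cls))
        # Each relation between two columns links the entities in that row.
        for (subject_col, object_col), prop in relations.items():
            subject = cell_entities[row[subject_col]]
            obj = cell_entities[row[object_col]]
            triples.add((subject, prop, obj))
    return triples


# Toy table and hypothetical interpretation (illustrative only).
rows = [
    {"City": "Baltimore", "State": "Maryland"},
    {"City": "Annapolis", "State": "Maryland"},
]
column_classes = {"City": "ex:City", "State": "ex:State"}
cell_entities = {
    "Baltimore": "ex:Baltimore",
    "Annapolis": "ex:Annapolis",
    "Maryland": "ex:Maryland",
}
relations = {("City", "State"): "ex:locatedIn"}

triples = table_to_triples(rows, column_classes, cell_entities, relations)
```

Collecting the triples in a set means an entity that appears in several rows, such as the state here, is typed only once.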
2nd place (18 points):
Richard Van Tassel (CS, M.S.) "Visual Obstruction Resistance for Emotion Detection"
Advisor: Marie desJardins
There is an increasing interest in developing systems that can determine a user's emotion by analyzing a video feed of the user's face. However, it cannot always be assumed that the user's face will be completely unobstructed by facial hair or apparel. If the system is a recreational or consumer good, it could be considered too restrictive to require a perfect view of the face at all times. Obstructions can prevent the system from identifying all of the facial expression components, called action units, present in the input face. It is therefore important that such emotion detection systems are capable of coping with partially obstructed faces. I propose a technique for reducing the effect of face obstructions. The technique will learn association rules between sets of action units from a set of unobstructed faces. Then, for a given input obstructed face, the technique will infer what action units are likely to be obstructed based on the visible ones, and will use this hypothetical set of action units to infer the emotion. This technique is tested on real face data, with simulated face obstructions. It will provide a statistically significant improvement in emotion detection accuracy over the same process without the technique applied.
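The inference step can be illustrated with a small sketch. It is an assumption-laden simplification, not the poster's implementation: rules have a single antecedent (one visible action unit implies one hidden one), confidence is a plain conditional frequency, and the function names, the 0.8 threshold, and the toy action-unit sets are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def learn_rules(unobstructed_faces, min_confidence=0.8):
    """Learn rules a -> b with confidence = count(a and b) / count(a)."""
    single = Counter()
    pair = Counter()
    for action_units in unobstructed_faces:
        single.update(action_units)
        # Ordered pairs, so a -> b and b -> a are scored separately.
        pair.update(permutations(sorted(action_units), 2))
    rules = {}
    for (a, b), count in pair.items():
        confidence = count / single[a]
        if confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def infer_action_units(visible, possibly_hidden, rules):
    """Add hidden action units implied by a visible one."""
    inferred = {
        b for (a, b) in rules
        if a in visible and b in possibly_hidden
    }
    return set(visible) | inferred


# Toy training data: each set lists the action units present in one face.
faces = [
    {"AU6", "AU12"},
    {"AU6", "AU12"},
    {"AU6", "AU12", "AU25"},
    {"AU4", "AU15"},
]
rules = learn_rules(faces)

# The lower face is covered, so AU12 and AU25 cannot be observed directly.
hypothesis = infer_action_units({"AU6"}, {"AU12", "AU25"}, rules)
```

In this toy data AU12 always accompanies AU6, so it is added to the hypothesis, while AU25 accompanies AU6 in only one of three faces and is left out. The resulting set would then be passed to whatever emotion classifier the system already uses.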
3rd place (16 points):
Patricia Ordonez (CS, Ph.D.) (pictured) "Multivariate Time Series Analysis of Physiological and Clinical Data"
Advisors: Marie desJardins, Tim Oates
The complexity and volume of collected medical data are greater now than at any point in the history of medicine. Providers are expected to examine large volumes of data and identify correlations between parameters based on their own clinical experience to detect significant medical events. The information overload that providers face may hinder the diagnostic process. Existing visualizations to assist the provider in analyzing information consist mainly of tables or plots of values for a particular parameter over time. Multivariate Time Series Amalgams (MTSAs) provide an integrated, multivariate approach to represent clinical and physiological data. The hybrid representation automates the personalization of baselines and threshold values based on a patient’s medical history, while also incorporating traditional baselines and thresholds. MTSA visualizations capture the rate of change of provider-selected parameters and the relationships among them.
The second half of my research consists of developing automated techniques for discovering correlations among parameters over time to assist providers in making a diagnosis. The underlying premise of my research is that the complexity of a highly integrated system such as a human being is better captured by examining patterns as multivariate temporal abstractions than as conjunctions of univariate ones, which is the more common approach in multivariate time series analysis and in medicine. The objective of such an approach is to assist in the identification of latent patterns within the data associated with specific medical conditions or significant medical events. Thus, in addition to the MTSA visualizations, I will present two novel multivariate time series representations, Stacked Bags-of-Patterns and Multivariate Bag-of-Patterns, which have been effective at classifying medical data. These representations are more compact than the raw multivariate time series and would facilitate the retrieval of patients from large medical databases based on physiological similarity and ideally on the presence of similar medically significant events or medical conditions. These techniques have been compared to two other multivariate versions of univariate time series representations, Piecewise Dynamic Time Warping and Ensemble Voting using Bag-of-Patterns. Results demonstrate the potential of using these representations for multivariate time series analysis.
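For readers unfamiliar with bag-of-patterns representations, the sketch below shows the general idea on which they build: slide a window over a series, turn each window into a short symbolic word, and count the words. It is a simplified illustration, not the representations presented on the poster. The word construction follows the usual SAX recipe with a three-symbol alphabet, the window length is assumed to be divisible by the number of segments, and `stacked_bag`, which keeps one histogram per variable under a shared key space, is only one plausible reading of "stacking" and is an assumption.

```python
from collections import Counter
from statistics import mean, pstdev

# Standard Gaussian breakpoints for a three-symbol alphabet.
BREAKPOINTS = (-0.43, 0.43)
ALPHABET = "abc"


def sax_word(window, segments):
    """Z-normalize a window, average it per segment, map to symbols."""
    mu, sigma = mean(window), pstdev(window)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in window]
    size = len(z) // segments
    word = ""
    for i in range(segments):
        average = mean(z[i * size:(i + 1) * size])
        # Number of breakpoints below the average selects the symbol.
        word += ALPHABET[sum(average > b for b in BREAKPOINTS)]
    return word


def bag_of_patterns(series, window=4, segments=2):
    """Histogram of the words from every sliding window of one series."""
    return Counter(
        sax_word(series[i:i + window], segments)
        for i in range(len(series) - window + 1)
    )


def stacked_bag(multivariate, window=4, segments=2):
    """Assumed 'stacking': per-variable histograms keyed by (variable, word)."""
    stacked = Counter()
    for name, series in multivariate.items():
        for word, count in bag_of_patterns(series, window, segments).items():
            stacked[(name, word)] += count
    return stacked


# Toy data: one steadily rising and one steadily falling parameter.
patient = {"hr": [1, 2, 3, 4, 5, 6], "bp": [6, 5, 4, 3, 2, 1]}
rising = bag_of_patterns(patient["hr"])
stacked = stacked_bag(patient)
```

The histogram is much smaller than the raw series and can be compared across patients with any vector distance, which is the property that makes such representations attractive for retrieval.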