
\section{Related Work}
\label{sec:related}

The need for more semantic mechanisms to describe observational data
has led to many proposals for observational data models (e.g.,
\cite{om, tarboton07:_cuahs_commun_obser_data_model,netcdf}) and
ontologies (e.g.,
\cite{fox09:_ontol,sweet,mungall07:_repres_phenot_in_owl}).  The work
presented here complements these efforts by providing a concrete set
of software components, integrated with the widely used metadata tools
Metacat \cite{berkley01:_metac} and Morpho
\cite{metacat02:_manag_heter_ecolog_data_using_morph}, that offer a
more uniform, semantic view of heterogeneous observational data.
Extending Morpho and Metacat with support for semantic annotations
allows these tools to better assist researchers conducting synthesis
studies through semantically enhanced discovery and integration
services, which are largely lacking in many existing environmental
information management frameworks \cite{jones_new_2006}.

Our work on using semantic annotations for data integration is closely
aligned with traditional information integration approaches (e.g.,
\cite{kolaitis05}), in which a global mediated schema is used to
(physically or logically) merge the structures of heterogeneous data
sources via mapping constraints between the source and target
schemas. In this sense, the observational model employed in our
framework can be viewed as a general-purpose mediated schema for
observational data sets. This schema can be augmented with logic
rules (as target constraints), with semantic annotations serving as
the mapping constraints. However, rather than requiring users to
specify logic constraints directly, we provide a high-level annotation
language and user-interface components (through Morpho) that simplify
the specification of mappings and align more naturally with the
observation model.
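
To make the role of annotations as mapping constraints more concrete,
one such constraint can be sketched as a source-to-target
tuple-generating dependency, a form of mapping studied in data
exchange (e.g., \cite{kolaitis05}). The sketch is illustrative only:
the source relation $\mathit{Trees}(t, h)$, holding a tree identifier
$t$ and a height value $h$, and the target predicates
$\mathit{Observation}$ and $\mathit{Measurement}$ are hypothetical and
do not reproduce the concrete OBOE vocabulary or the syntax of our
annotation language.
\[
\forall t, h \; \bigl( \mathit{Trees}(t, h) \rightarrow
\exists o, m \; ( \mathit{Observation}(o, \mathit{Tree}, t) \wedge
\mathit{Measurement}(m, o, \mathit{Height}, h) ) \bigr)
\]
Under this reading, each source row yields an observation $o$ of a
tree together with a height measurement $m$, where $o$ and $m$ are new
identifiers generated during integration.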

Annotations also play an increasingly prominent role in database
systems. For example, the MONDRIAN system \cite{GeertsKM06} employs an
annotation model and a set of query operators to manipulate both data
and annotations. However, users must be familiar with the underlying
data structures (schemas) to take advantage of these operators, which
is generally infeasible for observational data, where data sets
exhibit a high degree of structural and semantic heterogeneity. The
annotation approach we use to extend EML is also similar in spirit to
a number of high-level mapping languages for data exchange (e.g.,
\cite{fagin09:_clio,an06:_build_seman_mappin_datab_ontol}). Our
approach differs in being specifically tailored to the OBOE
observational model, which simplifies the annotation language and
generally makes it easier for users to annotate observational data.
Our approach also provides well-defined, unambiguous mappings from
data sets to the observation and measurement model, which is critical
for providing automated, high-quality data integration services over
heterogeneous observational data.

% Some efforts have also been made to leverage annotations, e.g., for
% the discovery of domain-specific data
% \cite{obsdatasearch09,StoyanovichMR10}; however, these approaches are
% largely based on keyword queries and do not consider structured
% searches. Our work differs in that we consider a highly structured and
% generic model for annotations with the aim of providing a uniform
% approach for issuing structured data-discovery searches.



%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "main"
%%% End: 
