A framework for querying heterogeneous data sources driven by context

The following image provides an overview of the entire system, which aims to let the user retrieve the information s/he is interested in from a set of heterogeneous data sources. The user is not necessarily aware of the richness and heterogeneity of the data sources; the information request s/he performs is driven by her/his current context, which keeps the entire operation as simple as possible.

Overview

The scenario of the framework consists of several data sources, not necessarily implemented with the same technology (relational databases, XML documents, ontologies, etc.), that are used as the sources of information to be queried. In order to obtain a unified vision of all the available information, a global view of the schemata of the data sources must be derived (using a Global As View -- GAV -- approach). Such a global view is expressed in an internal representation format suitable for the entire methodology, able to offer the necessary support and features for the context-aware management of the information.
The methodology defines a context-aware association between (a) the possible contexts a user may be in and (b) the portion of the information s/he would be interested in, with respect to the global schema. This association consists of a "context--data portion dictionary" that is used when the user executes a "context-aware" query.
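The association described above can be pictured as a lookup table. The following is only a minimal sketch, assuming that a context is a set of (dimension, value) pairs and that a data portion is a set of global-schema attribute names; the names `make_context`, `CONTEXT_DICTIONARY`, and `data_portion`, the sample entries, and the choice to take the union of all matching entries are hypothetical and not part of the framework.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Assumption: a context is a set of (dimension, value) pairs.
Context = FrozenSet[Tuple[str, str]]


def make_context(**dimensions: str) -> Context:
    """Build a hashable context from keyword arguments."""
    return frozenset(dimensions.items())


# Hypothetical dictionary: context -> relevant portion of the global schema.
CONTEXT_DICTIONARY: Dict[Context, Set[str]] = {
    make_context(role="tourist", location="museum"): {"Exhibit.title", "Exhibit.room"},
    make_context(role="tourist"): {"Restaurant.name", "Restaurant.address"},
}


def data_portion(current: Context) -> Set[str]:
    """Union of the portions of every entry whose context is contained
    in the current one (an illustrative matching rule, not the only one)."""
    portion: Set[str] = set()
    for context, elements in CONTEXT_DICTIONARY.items():
        if context <= current:
            portion |= elements
    return portion
```

With these sample entries, a tourist located in a museum would receive both the exhibit and the restaurant attributes, while a context with no matching entry would receive an empty portion.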

The user query is processed and reformulated into a set of queries against the original schemata and formats of the data sources, exploiting the meta-information derived by the wrappers and the schema integration phase (intensional integration).
The data provided by the sources, in their (heterogeneous) formats, are then integrated and sent to the user. Again, the meta-information from the wrapping and schema integration phases is used to perform the fusion of the data (extensional integration).

Elements

Internal Representation (IR) Format

We are investigating different solutions for the pivot format used to represent the global schema, to extract and integrate the data source schemata, and to perform the association between contexts and data portions. At present, the formalisms under consideration are: (a) relational databases, (b) Extended SDR Networks, and (c) ontologies.

Domain Model

In order to manage the information of a given application scenario, a domain model is used, containing -- in the same formalism adopted for the Internal Representation -- the knowledge about the "world".

Wrappers

These modules extract the schema of a data source from its native format and translate it into the Internal Representation (IR) format adopted within the framework.
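As an illustration of what a wrapper for a relational source might look like, the sketch below reads the schema of a SQLite database and returns it in a toy IR. The classes `IRAttribute` and `IREntity` and the function `wrap_sqlite` are hypothetical stand-ins, since the actual IR formalism is still under investigation; only the standard `sqlite3` module is used.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class IRAttribute:
    name: str
    type: str


@dataclass
class IREntity:
    source: str   # name of the data source the entity comes from
    name: str     # native name of the table
    attributes: List[IRAttribute] = field(default_factory=list)


def wrap_sqlite(conn: sqlite3.Connection, source_name: str) -> List[IREntity]:
    """Extract the schema of a SQLite source into the toy IR."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    entities = []
    for (table,) in tables:
        # Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        attributes = [IRAttribute(name=col[1], type=col[2]) for col in columns]
        entities.append(IREntity(source=source_name, name=table, attributes=attributes))
    return entities
```

Wrappers for XML documents or ontologies would expose the same output type, so that the later integration steps do not depend on the native technology of each source.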

Data Sources Schema Integrator

The internal representations of the schemata of the available data sources need to be combined to provide an integrated global view of the available information. This operation is performed with the support of a domain model, which describes the reality of the working scenario.
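A very simplified way to picture this step, in the GAV spirit, is to define each global attribute in terms of the source attributes it is derived from. The sketch below assumes that the domain model is just a table of synonyms mapping source terms to domain concepts; `DOMAIN_MODEL`, `integrate`, and the sample terms are hypothetical, and real schema matching is considerably more involved.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy IR for source schemata: source -> entity -> attribute names.
SourceSchemas = Dict[str, Dict[str, List[str]]]
# Global schema: concept -> global attribute -> source attributes defining it.
GlobalSchema = Dict[str, Dict[str, List[Tuple[str, str, str]]]]

# Hypothetical domain model: lowercased source terms -> domain concepts.
DOMAIN_MODEL = {
    "hotel": "Accommodation", "lodging": "Accommodation",
    "name": "name", "title": "name",
    "city": "city", "town": "city",
}


def integrate(schemas: SourceSchemas) -> GlobalSchema:
    """Group source attributes under the domain concepts they correspond to."""
    global_schema = defaultdict(lambda: defaultdict(list))
    for source, entities in schemas.items():
        for entity, attributes in entities.items():
            concept = DOMAIN_MODEL.get(entity.lower())
            if concept is None:
                continue  # entity unknown to the domain model: left out
            for attribute in attributes:
                global_attribute = DOMAIN_MODEL.get(attribute.lower())
                if global_attribute is None:
                    continue  # attribute unknown to the domain model
                global_schema[concept][global_attribute].append(
                    (source, entity, attribute)
                )
    return {concept: dict(attrs) for concept, attrs in global_schema.items()}
```

The resulting structure doubles as the meta-information needed later on: it records, for every global attribute, where the corresponding data lives in each source.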

Context-Aware Query

The user identifies her/his current context, either explicitly by selecting options or implicitly if the parameters can be perceived autonomously by the device (e.g., the location by means of a GPS), and requests the data that is deemed important in such a situation.
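The two ways of identifying the context can be combined in a few lines. This is a sketch under stated assumptions: context dimensions are plain strings, each implicit parameter is read through a callable that may return `None` when no reading is available, and an explicit selection overrides a sensed value. The function `current_context` and the sensor interface are hypothetical.

```python
from typing import Callable, Dict, Optional

# Assumption: a sensor is a callable returning a value, or None if unavailable.
Sensor = Callable[[], Optional[str]]


def current_context(explicit: Dict[str, str],
                    sensors: Dict[str, Sensor]) -> Dict[str, str]:
    """Merge sensed parameters with the options selected by the user."""
    context: Dict[str, str] = {}
    for dimension, read in sensors.items():
        value = read()
        if value is not None:
            context[dimension] = value
    # Assumption: what the user selects explicitly wins over sensed values.
    context.update(explicit)
    return context
```

For example, a user who selects the role "tourist" on a device whose GPS reports a museum would end up with a context containing both dimensions.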

Query Conversion and Distribution

A single context-aware view is associated with each context and is expressed with respect to the derived global schema. In order to retrieve the data from the real data sources, this view must be converted into queries formulated on the specific sources, in their native languages.
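The conversion can be sketched as a lookup in the mappings produced during schema integration, followed by the generation of one native query per source entity. In the sketch below, the view is a list of global attribute names, relational sources receive SQL, and XML sources receive XPath expressions; `MAPPINGS`, `SOURCE_KIND`, `convert`, and the source names are hypothetical, and selection conditions and joins are deliberately ignored.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mappings: global attribute -> (source, entity, native attribute).
MAPPINGS: Dict[str, List[Tuple[str, str, str]]] = {
    "Accommodation.name": [("hotels_db", "hotel", "name"),
                           ("guide_xml", "lodging", "title")],
    "Accommodation.city": [("hotels_db", "hotel", "city"),
                           ("guide_xml", "lodging", "town")],
}
# Hypothetical registry of the native query language of each source.
SOURCE_KIND = {"hotels_db": "sql", "guide_xml": "xpath"}


def convert(view: List[str]) -> Dict[str, List[str]]:
    """Rewrite a view over the global schema as native queries per source."""
    needed: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for global_attribute in view:
        for source, entity, attribute in MAPPINGS.get(global_attribute, []):
            needed[(source, entity)].append(attribute)

    queries: Dict[str, List[str]] = defaultdict(list)
    for (source, entity), attributes in needed.items():
        if SOURCE_KIND[source] == "sql":
            queries[source].append(f"SELECT {', '.join(attributes)} FROM {entity}")
        else:
            # XPath union of the child elements holding the requested data.
            queries[source].append(
                " | ".join(f"//{entity}/{attribute}" for attribute in attributes)
            )
    return dict(queries)
```

Keeping the per-source generation behind a single function means that supporting a further technology only requires a new branch (or a new generator) keyed on the source kind.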

Data Sources Information Integrator

Once each data source has been queried individually, in its own native format, the retrieved information needs to be merged and integrated in order to provide the user with a single block of information. In this phase, both data conversion and data integration are necessary.
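The two operations mentioned above, conversion and integration, can be illustrated as a renaming step followed by a merge on a shared key. This is a minimal sketch, assuming that every source returns flat records, that records describing the same real-world object share the value of one key attribute, and that the first source to provide a value wins on conflict; `RENAMING`, `to_global`, and `fuse` are hypothetical names.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical renaming from native attribute names to global ones.
RENAMING: Dict[str, Dict[str, str]] = {
    "hotels_db": {"name": "name", "city": "city"},
    "guide_xml": {"title": "name", "town": "city", "stars": "stars"},
}


def to_global(source: str, record: Record) -> Record:
    """Data conversion: rename native attributes to global-schema attributes."""
    renaming = RENAMING[source]
    return {renaming[k]: v for k, v in record.items() if k in renaming}


def fuse(results: Dict[str, List[Record]], key: str = "name") -> List[Record]:
    """Data integration: merge records that share the same key value."""
    merged: Dict[str, Record] = {}
    for source, records in results.items():
        for record in records:
            converted = to_global(source, record)
            if key not in converted:
                continue  # cannot be matched against other records
            target = merged.setdefault(converted[key], {})
            for attribute, value in converted.items():
                # Assumption: on conflict, the first value encountered is kept.
                target.setdefault(attribute, value)
    return list(merged.values())
```

Real sources would also need value-level reconciliation (for instance "Milan" versus "Milano"), which this sketch sidesteps by keeping the first value.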

Open projects

Part of the framework has already been implemented, while there are portions that still need to be developed. Among the elements that still need to be analyzed, designed and implemented there are:

For more information on the available projects and theses, send an email.