Structure Preserving Semantic Matching (SPSM)

Structure Preserving Semantic Matching is a type of semantic matching that takes two tree-like structures and produces a global similarity score between the trees and a set of correspondences between the nodes. The correspondences preserve a set of structural properties, namely:

  1. one-to-one correspondences between semantically related nodes,
  2. leaf nodes are matched to leaf nodes and internal nodes are matched to internal nodes.

A set of mappings produced by SPSM when comparing two extract of University course catalogs is presented in Figure 1. Note how the set of mappings preserve the above mentioned structural properties. A comparison with the other semantic approaches which do not preserve these set of structural properties can be found in the Examples and comparison page.

The set of minimal mappings produced by SPSM when comparing two extracts of University course catalogs
Figure 1: The set of minimal mappings produced by SPSM when comparing two extracts of University course catalogs.

The main functionality of this type of semantic matching technique is to compare function definitions. Functions can be represented as tree like structures where each function name is an internal node, and each parameter is represented as a leaf node. The goal of SPSM when comparing functions (for example, web services) is to establish whether two functions are similar enough (given by the global similarity score) and to create an adaptor that maps the components of these functions. The structural properties of SPSM guarantees that functions are mapped only to functions (internal nodes) and parameters are mapped only to parameters (leaf nodes). The additional property of generating only one-to-one correspondences between nodes allows us to generate an adaptor where one function or parameter in one tree is mapped only to one function call or parameter in the other tree respectively.

The SPSM approach follows the work in (Giunchiglia et al., 2008). The matching process is organized in two steps: (i) node matching and (ii) tree matching. Node matching solves the semantic heterogeneity problem by considering only labels at nodes and domain specific contextual information of the trees. To match nodes, SPSM approach uses the S-Match system as proposed in (Giunchiglia et al., 2007). Tree matching, in turn, exploits the results of the node matching and the structure of the trees to find if these globally match each other. We are mainly interested in approximate matching, since two web service descriptions may only rarely match perfectly in open environments, see (Giunchiglia et al., 2008) for details. Technically, two trees T1 and T2 approximately match if there is at least one node n1i in T1 and node n2j in T2 such that: (i) n1i approximately matches n2j, (ii) all ancestors of n1i are approximately matched to the ancestors of n2j. Notice that the horizontal order of siblings is not preserved being not a desirable property for the data translation purposes.

The implementation of approximate structure preserving semantic matching is based on (i) the theory of abstraction (Giunchiglia and Walsh, 1992) and (ii) the tree-edit distance. Specifically, the work in (Giunchiglia and Walsh, 1992) categorizes the various kinds of abstraction operations. The key idea is to use abstractions/refinements as tree-edit distance operations in order to estimate the similarity of two tree structures.