Thesis
Approximate XPath
Master of Science (MS), Washington State University
2004
Handle:
https://hdl.handle.net/2376/202
Abstract
As XML has developed over the past few years, its role has expanded beyond its original domain as a semantics-preserving markup language for online documents, and it is now the de facto format for data interchange and integration among distributed, heterogeneous sources. Several query languages based on path expressions have been proposed. Because of the inherent heterogeneity of XML data, exact path expressions may not locate the desired data. It is more appropriate to have an approximate query system that can return relevant results when exact path expressions fail to locate the data. This thesis proposes an approximate query language, ApproXPath, that can cope with data heterogeneity. It extends the popular XPath language by relaxing its semantics. ApproXPath allows both content mismatch and structure mismatch. ApproXPath queries can locate data that is within some number of errors of the original XML data. The distance from the exact data is measured by counting how many string edit and tree edit operations are needed to find the data. Our approach can be categorized as query relaxation. ApproXPath redefines the semantics of axes, node tests, and predicates based on string and tree edit distance. The algorithms we present use navigation-based query evaluation. We also sketch an index-based solution, which is useful for searching in an XML database. We show that the complexity of ApproXPath is reasonable. The thesis also presents an empirical evaluation. ApproXPath is implemented in Java. It combines the front end of Apache Xalan with our own approximate query engine as its back end. The thesis reports the performance of ApproXPath, both for exact matching relative to Xalan and for inexact matching with a varying number of errors allowed. For many queries, inexact matching with no errors allowed is as fast as exact matching, and its running time increases linearly with the number of errors allowed.
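To make the idea of a relaxed node test concrete, the following is a minimal sketch of how an element name could be matched within a bounded number of string edit operations. It is an illustration only, not the thesis's algorithm: the class name ApproxNameTest, the methods editDistance and select, and the parameter maxErrors are hypothetical, and the sketch assumes the standard Levenshtein distance (unit-cost insert, delete, and substitute). It does not cover tree edit distance or structure mismatch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating an approximate (edit-distance) name test. */
final class ApproxNameTest {

    /** Levenshtein distance between a and b, using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into the first j characters of b takes j inserts.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning the first i characters of a into the empty string takes i deletes.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            // Swap rows so that prev always holds the most recently completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Keeps the element names that are within maxErrors edits of queryName. */
    static List<String> select(String queryName, List<String> elementNames, int maxErrors) {
        List<String> matches = new ArrayList<>();
        for (String name : elementNames) {
            if (editDistance(queryName, name) <= maxErrors) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```

Under these assumptions, a query step named author with maxErrors set to 1 would accept elements named author, authors, and autor but reject title, while maxErrors set to 0 reduces to an exact name test.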
Details
- Title
- Approximate XPath
- Creators
- Lin Xu
- Contributors
- Curtis Dyreson (Degree Supervisor)
- Awarding Institution
- Washington State University
- Academic Unit
- Electrical Engineering and Computer Science, School of
- Theses and Dissertations
- Master of Science (MS), Washington State University
- Identifiers
- 99900525043901842
- Language
- English
- Resource Type
- Thesis