Thesis
Improving the efficiency of graph-based data mining with application to public health data
Washington State University
Master of Science (MS), Washington State University
2007
Handle:
https://hdl.handle.net/2376/102993
Abstract
Relational data are most naturally represented as a graph, with entities as nodes and the relations between them as edges. Graph-based data mining searches for the patterns that best compress and represent the dataset, and thereby extracts useful information from the data. Efficiency is an important concern in graph-based relational learning: a suitable search algorithm can greatly improve the efficiency of substructure mining by finding better patterns in less time. This thesis studies the performance of different search algorithms for graph-based relational pattern learning. A complete graph-space search algorithm, an efficient depth-limited search, and heuristic searches including beam search, hill climbing, stochastic hill climbing, and simulated annealing (SA) are designed and implemented for pattern search in graph space. We also design two new algorithms, SA-Greedy and Hill-Climbing with Stochastic Escape (HCSE). All seven algorithms are evaluated and compared by running them with several depth limits on several datasets using Subdue, a graph-based data mining tool. The experimental results show that SA-Greedy finds the best substructures in less time than the other search algorithms. Graph-based data mining is also applied to the public health domain: the Pandemic dataset is represented as a graph, and its intrinsic patterns are explored through graph-based data mining. The different search algorithms are run on this dataset and give results consistent with the previous experiments.
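To make the idea of depth-limited heuristic search over graph patterns concrete, the following is a minimal Python sketch of a beam search on a toy labeled graph. It is not the thesis's implementation and it does not use Subdue. The data (`NODES`, `EDGES`), the restriction to path-shaped patterns, and the `score` function (instance count times edge count, a crude stand-in for a compression-based measure) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy relational data: node id -> label, plus labeled directed edges.
NODES = {1: "person", 2: "person", 3: "city", 4: "person", 5: "city"}
EDGES = [(1, "lives_in", 3), (2, "lives_in", 3), (4, "lives_in", 5),
         (1, "contact", 2), (2, "contact", 4)]

# Adjacency list: source node -> list of (edge label, target node).
OUT = defaultdict(list)
for src, label, dst in EDGES:
    OUT[src].append((label, dst))

def instances(pattern):
    """Return all node paths matching pattern = (node_label, edge_label, node_label, ...)."""
    paths = [[n] for n, lab in NODES.items() if lab == pattern[0]]
    for i in range(1, len(pattern), 2):
        edge_label, node_label = pattern[i], pattern[i + 1]
        paths = [p + [dst] for p in paths for lab, dst in OUT[p[-1]]
                 if lab == edge_label and NODES[dst] == node_label]
    return paths

def expand(pattern):
    """Grow the pattern by one edge in every way that actually occurs in the data."""
    return {pattern + (lab, NODES[dst])
            for p in instances(pattern) for lab, dst in OUT[p[-1]]}

def score(pattern):
    """Assumed toy score: number of instances times number of edges in the pattern."""
    return len(instances(pattern)) * (len(pattern) // 2)

def beam_search(beam_width, depth_limit):
    """Keep the beam_width best patterns per level, for at most depth_limit levels."""
    frontier = {(lab,) for lab in NODES.values()}  # start from single-node patterns
    best = None
    for _ in range(depth_limit):
        candidates = {c for p in frontier for c in expand(p)}
        if not candidates:
            break
        # Sort by score, breaking ties by the pattern itself for determinism.
        ranked = sorted(candidates, key=lambda c: (score(c), c), reverse=True)
        frontier = set(ranked[:beam_width])
        if best is None or score(ranked[0]) > score(best):
            best = ranked[0]
    return best

print(beam_search(beam_width=1, depth_limit=3))
print(beam_search(beam_width=2, depth_limit=3))
```

On this toy graph, a beam width of 1 behaves like a greedy, hill-climbing-style search and stops at the single-edge pattern `person -lives_in-> city`, while a beam width of 2 keeps the lower-scoring `contact` pattern alive long enough to reach the higher-scoring two-edge pattern. This is the kind of trade-off between search effort and pattern quality that the abstract describes, shown here only under the simplifying assumptions stated above.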
Details
- Title
- Improving the efficiency of graph-based data mining with application to public health data
- Creators
- Yan Zhang
- Contributors
- Lawrence B. Holder (Degree Supervisor)
- Awarding Institution
- Washington State University
- Academic Unit
- Electrical Engineering and Computer Science, School of
- Theses and Dissertations
- Master of Science (MS), Washington State University
- Publisher
- Washington State University; Pullman, Wash.
- Identifiers
- 99900525105501842
- Language
- English
- Resource Type
- Thesis