The goal of this research is to analyze patterns in the annotation of Common Vulnerabilities and Exposures (CVEs). One way to express these patterns is to relate CVEs to classes in the Common Weakness Enumeration (CWE). Our research aims to improve this mapping using the extensive annotation in the National Vulnerability Database (NVD). To this end, we process information from the NVD with the natural language processing model V2W-BERT, which generates a large tabular database of approximately 137,226 records, each characterizing an annotation by a vector of 768 numerical attributes. Given the data in vector form, we use unsupervised machine learning tools to discover patterns through clustering. One of the tools we use is the Self-Organizing Map (SOM), a well-established technique for data compression. We expect at least a 10-fold data compression, which here means a SOM output array of 6,417 nodes for the full dataset (roughly 21 records per node). We are investigating the most informative way to interpret the SOM output array. For example, we have investigated how the SOM-generated codebooks of the output array can be used to suggest the number of clusters in a K-means representation of the tabular data, followed by tracing back to the annotations to assign labels to the clusters.
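As a rough illustration of the pipeline described in the abstract, the sketch below trains a small SOM, uses its codebook to suggest a number of clusters, and then runs K-means on the full table. It is not the thesis's implementation: the minimal NumPy SOM, the silhouette-based choice of k, the synthetic 16-dimensional data (standing in for the 768-dimensional V2W-BERT vectors), and every function name are assumptions made for illustration only.

```python
import numpy as np

# Compression implied by the figures quoted in the abstract.
n_records, n_nodes = 137_226, 6_417
ratio = n_records / n_nodes  # about 21 records per SOM node


def train_som(data, rows, cols, n_iter=3000, lr0=0.5, seed=0):
    """Online SOM training on a rectangular grid; returns the codebook."""
    rng = np.random.default_rng(seed)
    # Start the codebook from randomly chosen data points.
    codebook = data[rng.integers(0, len(data), rows * cols)].astype(float)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                # learning rate decays linearly
        sigma = sigma0 * np.exp(-3.0 * frac)   # neighbourhood radius shrinks
        x = data[rng.integers(0, len(data))]
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))  # best matching unit
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d.argmin(axis=1)


def silhouette(points, labels):
    """Mean silhouette score, used here as an assumed criterion for k."""
    uniq = np.unique(labels)
    if len(uniq) < 2:
        return -1.0
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(points)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)
            continue
        a = d[i, same].sum() / (same.sum() - 1)
        b = min(d[i, labels == j].mean() for j in uniq if j != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


# Synthetic stand-in for the embedding table (hypothetical; not NVD data).
rng = np.random.default_rng(42)
blob_centers = rng.normal(scale=10.0, size=(3, 16))
data = np.vstack([c + rng.normal(size=(200, 16)) for c in blob_centers])

# Step 1: compress 600 records into a 6 x 6 codebook.
codebook = train_som(data, rows=6, cols=6)

# Step 2: score candidate k values on the small codebook, not the full table.
scores = {k: silhouette(codebook, kmeans(codebook, k)[1]) for k in range(2, 7)}
suggested_k = max(scores, key=scores.get)

# Step 3: cluster the full table with the suggested k.
centers, labels = kmeans(data, suggested_k)
print(f"compression ratio ~ {ratio:.1f}, suggested k = {suggested_k}")
```

Choosing k on the codebook rather than on the full table keeps the model-selection step cheap, since the codebook is far smaller than the data. In a real workflow each cluster's member records would then be traced back to their NVD annotations to assign labels.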
Details
Title
CLUSTERING SOFTWARE VULNERABILITIES USING SELF-ORGANIZING MAPS
Creators
Khyati Panchal
Contributors
John Miller (Advisor)
Luis DeLaTorre (Committee Member)
Mahantesh Halappanavar (Committee Member)
Awarding Institution
Washington State University
Academic Unit
Engineering and Applied Sciences (TRIC), School of
Theses and Dissertations
Master of Science (MS), Washington State University