Thesis
Detecting code clones
Washington State University
Master of Science (MS), Washington State University
08/2020
DOI: https://doi.org/10.7273/000004169
Handle: https://hdl.handle.net/2376/125250
Abstract
People plagiarize because it is sometimes easier to plagiarize than to do the work in the first place. Plagiarism takes many forms, ranging from copying and pasting text found on the internet to modifying the text so that it appears to be the plagiarist's own work. In programming assignments, the text can be modified in ways that do not affect execution. Files that are functionally the same but textually different are clones of each other. We explored techniques that are good at finding such clones and picked the one that worked best for our use case. An important part of using a technique is fine-tuning the algorithm so that its inputs are handled in a way that optimizes the output without hurting efficiency. By modifying the inputs in the ways that someone plagiarizing the code would, we can evaluate the performance of the technique. While processing the documents, we found that some parts of a document are overrepresented relative to other parts. The theory is that the overrepresented parts are those shared across many of the documents. We found that the tuning of the technique used to detect code clones depends on the format of the documents being used, and that the impact of parts shared among the documents may be discounted to better evaluate document-pair similarity.
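The abstract does not name the technique the thesis selected, so the following is only an illustrative sketch of the two ideas it describes: comparing files in a way that ignores edits that do not affect execution, and discounting fragments shared across many documents. It assumes a stand-in approach (token k-gram shingles compared with a Jaccard measure, weighted by inverse document frequency). The function names, the tokenizer, and the sample files are hypothetical and are not taken from the thesis.

```python
import math
import re
from collections import Counter

def shingles(source, k=3):
    """Return the set of k-token shingles for one source file.

    Whitespace is dropped during tokenization, so re-indenting or
    re-spacing a file does not change its shingles.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def idf_weights(shingle_sets):
    """Weight each shingle by log(N / df); a shingle in every file gets 0."""
    n = len(shingle_sets)
    df = Counter(s for doc in shingle_sets for s in doc)
    return {s: math.log(n / count) for s, count in df.items()}

def plain_jaccard(a, b):
    """Unweighted Jaccard similarity of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def weighted_jaccard(a, b, weights):
    """Jaccard similarity where widely shared shingles count for less."""
    union = sum(weights[s] for s in a | b)
    if union == 0:
        return 0.0
    return sum(weights[s] for s in a & b) / union

# Hypothetical submissions: all three start from the same starter code.
starter = "import sys\n\ndef main():\n    data = sys.stdin.read()\n"
file_a = starter + "    print(len(data.split()))\n"
# file_b is file_a with only indentation and spacing changed.
file_b = starter.replace("    ", "\t") + "\tprint( len( data.split() ) )\n"
# file_c solves a different task on top of the same starter code.
file_c = starter + "    print(data.upper())\n"

sets = [shingles(f) for f in (file_a, file_b, file_c)]
weights = idf_weights(sets)

sim_clone = weighted_jaccard(sets[0], sets[1], weights)
sim_other_plain = plain_jaccard(sets[0], sets[2])
sim_other_weighted = weighted_jaccard(sets[0], sets[2], weights)
print(sim_clone, sim_other_plain, sim_other_weighted)
```

In this sketch the shared starter code inflates the unweighted similarity between `file_a` and `file_c`, while the weighted score discounts it. The shingle size `k` is one example of a parameter that would need tuning for a given document format.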
Details
- Title: Detecting code clones
- Creators: Christopher Daniel Bettis
- Contributors: Scott Wallace (Advisor) - Washington State University, Engineering and Computer Science (VANC), School of
- Awarding Institution: Washington State University
- Academic Unit: Engineering and Computer Science (VANC), School of
- Theses and Dissertations: Master of Science (MS), Washington State University
- Publisher: Washington State University
- Identifiers: 99900890769401842
- Language: English
- Resource Type: Thesis