I have concrete results for my research! I can now successfully parse a program, assign each of its opcodes unique x and y components, and graph the walk that the program takes. I also decided to graph frequent patterns in the code. This could be useful because there are many ways to write code that does the same thing, so graphing each individual assembly instruction may be too low level. Hopefully, by extracting frequent patterns, we are extracting the most important operations. If different programs then turn out to share common patterns, they share some core features, so they may do similar things. I still need to test this on more samples. I’ve only been working with two so far, which is not nearly enough to make generalized claims.
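Here's a minimal sketch of the two ideas above, not my actual pipeline. It assumes the program has already been parsed into a list of opcode mnemonics, and it makes up its own mapping: each distinct opcode gets a unit step at a unique angle on a circle. The function names (`opcode_steps`, `walk`, `frequent_ngrams`) and the choice of fixed-length n-grams as "frequent patterns" are illustrative assumptions.

```python
import math
from collections import Counter


def opcode_steps(opcodes):
    """Give each distinct opcode a unique (dx, dy) step.

    Assumption: steps are unit vectors spread evenly around a circle,
    ordered alphabetically by mnemonic. Any other one-to-one mapping works.
    """
    vocab = sorted(set(opcodes))
    n = len(vocab)
    return {
        op: (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
        for i, op in enumerate(vocab)
    }


def walk(opcodes, steps):
    """Return the list of (x, y) points visited, starting at the origin."""
    x, y = 0.0, 0.0
    path = [(x, y)]
    for op in opcodes:
        dx, dy = steps[op]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def frequent_ngrams(opcodes, n=3, min_count=2):
    """Count every run of n consecutive opcodes and keep the repeated ones."""
    counts = Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )
    return {gram: c for gram, c in counts.items() if c >= min_count}
```

The points returned by `walk` can be handed to any plotting library to draw the path. The same walk can be run over the frequent n-grams instead of single opcodes by treating each n-gram tuple as one symbol.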
The other thing I’ve started working on is defining a distance metric between two programs. There are some papers by Numrich that I’m trying to read through and implement. Getting a numerical value to compare programs would be a big improvement. The eventual idea is to have code running in the background that alerts someone if a piece of code they have appears to be malware, so we don’t want to rely on visual comparisons. Will update when I’ve made more progress!
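To make "a numerical value to compare the programs" concrete, here's a rough placeholder. This is not the metric from the Numrich papers, which I haven't implemented yet. It is a plain cosine dissimilarity between opcode n-gram counts, and it is not a true metric in the mathematical sense (it doesn't guarantee the triangle inequality). The function names and the default `n=2` are assumptions for illustration.

```python
import math
from collections import Counter


def ngram_counts(opcodes, n=2):
    """Count every run of n consecutive opcodes in a program."""
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


def cosine_distance(a, b):
    """1 - cosine similarity of two count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a program with no n-grams is maximally far from anything.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def program_distance(ops_a, ops_b, n=2):
    """Score from 0 (same n-gram profile) to 1 (no n-grams in common)."""
    return cosine_distance(ngram_counts(ops_a, n), ngram_counts(ops_b, n))
```

A background checker could compare a new program against known malware samples with `program_distance` and raise an alert below some cutoff. Where that cutoff should sit is something only testing on many more samples can answer.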