Microsoft and Intel turn malware samples into images for deep learning
Microsoft and Intel will collaborate on a research project that converts malware into images, so that deep-learning algorithms can recognize patterns in the malware more easily.
The project is called Stamina, an acronym for static malware-as-image network analysis. With the project, Microsoft and Intel want to convert malware code samples into grayscale images. The binary data of such a sample is converted into raw pixel data. That one-dimensional pixel stream is then converted into a two-dimensional image.
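The conversion described above can be sketched in a few lines. This is only an illustration, not the researchers' code: the function name `bytes_to_grayscale` and the fixed default row width of 16 pixels are assumptions, and the article does not say how the real image width is chosen. Each byte (0 to 255) becomes one grayscale pixel, and the one-dimensional stream is cut into rows of equal length.

```python
def bytes_to_grayscale(data: bytes, width: int = 16) -> list[list[int]]:
    """Turn raw bytes into a 2D grid of grayscale pixel values (0-255).

    `width` is a hypothetical parameter chosen for illustration only.
    """
    if width <= 0:
        raise ValueError("width must be positive")

    # Every byte is already a value in 0..255, i.e. one grayscale pixel.
    pixels = list(data)

    # Pad the last row with black pixels so all rows have the same length.
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([0] * (width - remainder))

    # Cut the one-dimensional pixel stream into rows of `width` pixels.
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


if __name__ == "__main__":
    sample = bytes([0, 1, 2, 3, 4])  # stand-in for a file's binary content
    for row in bytes_to_grayscale(sample, width=2):
        print(row)
```

In practice the resulting grid would be handed to an image library; plain lists are used here to keep the sketch free of dependencies.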
The images must first be scaled down. According to the companies, this is necessary because processing very large images would slow the process down. The downscaled images are then fed to a deep learning algorithm, that is, an algorithm that can learn on its own from unstructured data.
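As a rough illustration of the downscaling step, the sketch below shrinks a grayscale pixel grid with nearest-neighbor sampling. The article does not say which resizing method the researchers use, so both the method and the function name `resize_nearest` are assumptions.

```python
def resize_nearest(image: list[list[int]], new_height: int,
                   new_width: int) -> list[list[int]]:
    """Resize a 2D grayscale grid using nearest-neighbor sampling."""
    if not image or not image[0]:
        raise ValueError("image must not be empty")
    if new_height <= 0 or new_width <= 0:
        raise ValueError("target size must be positive")

    old_height = len(image)
    old_width = len(image[0])

    # For each target pixel, pick the source pixel at the proportional position.
    return [
        [
            image[row * old_height // new_height][col * old_width // new_width]
            for col in range(new_width)
        ]
        for row in range(new_height)
    ]


if __name__ == "__main__":
    # A 4x4 test image whose pixel values encode their position.
    big = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(resize_nearest(big, 2, 2))
```

Nearest-neighbor sampling is the simplest option; an averaging or interpolating filter would preserve more detail at a higher computational cost.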
The researchers provided the algorithm with 2.2 million hashes of malware files. Two-thirds of those were used to train the algorithm, and the rest to test and validate the results.
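A split like the one described could look roughly as follows. The sketch assumes the remaining third is divided evenly between testing and validation, which the article does not state; the function name `split_dataset` and the fixed random seed are likewise illustrative.

```python
import random


def split_dataset(items: list, seed: int = 0) -> tuple[list, list, list]:
    """Shuffle items and split them into train, test and validation sets.

    Two-thirds go to training. The rest is split evenly between test and
    validation, which is an assumption made for this sketch.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility

    train_count = (2 * len(shuffled)) // 3
    train = shuffled[:train_count]
    holdout = shuffled[train_count:]

    half = len(holdout) // 2
    return train, holdout[:half], holdout[half:]


if __name__ == "__main__":
    train, test, validation = split_dataset(list(range(9)))
    print(len(train), len(test), len(validation))
```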
The researchers say that in an initial test the algorithm classified the samples with an accuracy of 99.07 percent, at a false positive rate of 2.58 percent. They see the results as a sign that image-based deep learning on malware is a promising method for further study.
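To make the two reported figures concrete, the sketch below shows how an accuracy and a false positive rate are typically computed from labels and predictions. It assumes the 99.07 percent figure is an overall accuracy and the 2.58 percent figure is the share of benign files wrongly flagged as malware. The function name and the tiny data set are made up for illustration and have nothing to do with the researchers' data.

```python
def accuracy_and_fpr(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Return (accuracy, false positive rate) for binary labels.

    Label 1 means malware, label 0 means benign.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for truth, pred in pairs if truth == pred)
    false_positives = sum(1 for truth, pred in pairs if truth == 0 and pred == 1)
    benign_total = sum(1 for truth in y_true if truth == 0)

    accuracy = correct / len(pairs)
    # The false positive rate is measured against benign samples only.
    fpr = false_positives / benign_total if benign_total else 0.0
    return accuracy, fpr


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0]        # made-up ground truth
    predictions = [1, 1, 0, 1, 0]  # made-up classifier output
    print(accuracy_and_fpr(truth, predictions))
```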