VULNERABILITY ANALYSIS PIPELINE USING COMPILER BASED SOURCE TO SOURCE TRANSLATION AND DEEP LEARNING

Author(s): Jan-Alexandru Văduva, Ioana Culi, Alexandru Radovici, Razvan Rughinis, Ştefan-Gabriel Dascălu
Subject(s): Security and defense, Higher Education, ICT Information and Communications Technologies, Distance learning / e-learning
Published by: Carol I National Defence University Publishing House
Keywords: artificial neural networks; computer security; source code vulnerability

Summary/Abstract: A major problem in modern systems is vulnerable applications. As modern operating systems become more user-friendly, a large share of their users are inexperienced and not trained to prevent the exploitation of vulnerabilities in their systems. Many users do not update their systems regularly, which leaves them exposed to publicly known vulnerabilities that have already been exploited. Forecasts show that security-related expenditure is becoming extremely high. This reinforces the fact that prevention is a key part of the business plan of many corporations, since exploitable vulnerabilities tend to cost far more. For example, an IBM report analyzing 507 companies between July 2018 and April 2019 concluded that the average cost of a data breach is 8.19 million dollars. In this context, the need for automatic vulnerability detection has grown, especially given the large number of available libraries and open-source programs. When an application is written, trust is placed in every library it links to and in every other application that communicates with it.
This paper aims to improve code vulnerability detection results with an implementation of a C/C++ vulnerability detection method based on token-based lexical normalization of project source code and classification with deep learning. The proposed pipeline is a three-stage process: (1) scraping source files from GitHub and labeling them with a static analyzer, (2) training the deep learning model on the labeled and normalized samples, and (3) analyzing the results obtained with different deep learning methods. We supplement the available labeled vulnerability datasets with a set of labeled open-source projects gathered from GitHub by a web crawler. We improve the vulnerability detection tool with this new input and evaluate an implementation trained with the binary cross-entropy loss and the Adam optimizer.
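The abstract does not spell out the normalization rules, so the following is only a minimal sketch, assuming a scheme similar to common lexer-based approaches. It tokenizes a C/C++ snippet, drops comments and whitespace, replaces string, character and numeric literals with the placeholders STR, CHR and NUM, and maps user-defined variable and function names to numbered placeholders (VAR0, FUN0, ...). Keywords and a small list of library calls such as strcpy are kept verbatim. The placeholder names, the keyword set and KEEP_NAMES are illustrative assumptions, not the authors' vocabulary.

```python
import re

# C89/C99 keywords kept verbatim (C++ keywords omitted for brevity).
C_KEYWORDS = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if",
    "inline", "int", "long", "register", "restrict", "return", "short",
    "signed", "sizeof", "static", "struct", "switch", "typedef", "union",
    "unsigned", "void", "volatile", "while",
}

# Hypothetical list of library calls kept by name because they often
# relate to memory-safety bugs; the paper's actual list is not given.
KEEP_NAMES = {"gets", "strcpy", "strcat", "sprintf", "memcpy",
              "malloc", "free", "printf", "scanf"}

# Alternatives are tried left to right, so comments and literals win over
# operators and identifiers. VERBOSE ignores whitespace outside classes.
TOKEN_RE = re.compile(
    r"""
      (?P<comment>//[^\n]*|/\*.*?\*/)
    | (?P<string>"(?:\\.|[^"\\])*")
    | (?P<char>'(?:\\.|[^'\\])*')
    | (?P<number>(?:0[xX][0-9A-Fa-f]+|\d+(?:\.\d+)?)[uUlLfF]*)
    | (?P<ident>[A-Za-z_]\w*)
    | (?P<op>->|\+\+|--|<<=|>>=|<<|>>|<=|>=|==|!=|&&|\|\||[-+*/%=<>!&|^~?:;,.()\[\]{}])
    | (?P<space>\s+)
    | (?P<other>.)
    """,
    re.VERBOSE | re.DOTALL,
)


def normalize(source: str) -> list[str]:
    """Return a normalized token sequence for a C/C++ snippet."""
    raw = []
    for m in TOKEN_RE.finditer(source):
        kind = m.lastgroup
        if kind in ("space", "comment"):
            continue  # whitespace and comments are discarded
        raw.append((kind, m.group()))

    var_ids: dict[str, str] = {}
    fun_ids: dict[str, str] = {}
    out = []
    for i, (kind, text) in enumerate(raw):
        if kind == "ident":
            if text in C_KEYWORDS or text in KEEP_NAMES:
                out.append(text)
            elif i + 1 < len(raw) and raw[i + 1][1] == "(":
                # An identifier followed by '(' is treated as a function name.
                out.append(fun_ids.setdefault(text, f"FUN{len(fun_ids)}"))
            else:
                out.append(var_ids.setdefault(text, f"VAR{len(var_ids)}"))
        elif kind == "string":
            out.append("STR")
        elif kind == "char":
            out.append("CHR")
        elif kind == "number":
            out.append("NUM")
        else:
            out.append(text)  # operators, punctuation, unknown characters
    return out
```

Numbering identifiers per sample means that in `strcpy(dst, src)` every use of `dst` becomes the same placeholder, so repeated uses of a variable stay recognizable while the vocabulary stays small enough for an embedding layer.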
We also compare several types of neural networks to determine whether the temporal approach offered by long short-term memory (LSTM) and bidirectional LSTM networks performs better than multilayer perceptrons and convolutional neural networks (CNN). We evaluated our implementation on real software packages and on the NIST SATE IV and Draper VDISC benchmark datasets, obtaining 96% accuracy.
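To make the training objective and the reported metric concrete, here is a minimal sketch for a vulnerable (1) versus non-vulnerable (0) labeling. It computes the mean binary cross-entropy, -[y log p + (1 - y) log(1 - p)] averaged over samples, which an optimizer such as Adam would minimize. It also computes accuracy by thresholding predicted probabilities. The 0.5 threshold, the clipping constant eps and the function names are assumptions for illustration; the abstract does not state them.

```python
import math


def _check(labels, probs):
    # Guard against empty or mismatched inputs, which zip would hide.
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and equal length")


def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and probabilities."""
    _check(labels, probs)
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)


def accuracy(labels, probs, threshold=0.5):
    """Fraction of samples whose thresholded prediction equals the label."""
    _check(labels, probs)
    correct = sum(int(p >= threshold) == y for y, p in zip(labels, probs))
    return correct / len(labels)
```

The loss rewards calibrated probabilities, while accuracy only counts which side of the threshold each prediction falls on. This is why two models with the same accuracy can still differ in loss.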

  • Issue Year: 16/2020
  • Issue No: 01
  • Page Range: 645-652
  • Page Count: 8
  • Language: English