Lost and Found in Translation - An Intro to ML Assisted Decompilation and Deobfuscation

Overview

Explore the intersection of machine learning and cybersecurity in this 54-minute conference talk from NDC Conferences. The talk introduces ML-assisted decompilation and deobfuscation and examines how large language models such as BERT, RoBERTa, and GPT-3 are being applied to source code. It discusses the potential security risks and benefits of AI-powered code generation tools such as GitHub's Copilot, and shows how neural machine translation techniques can be applied to reverse engineering, including decompilation and deobfuscation for malware analysis. Along the way it covers the transformer architecture, encoder-decoder models, and fine-tuning. By the end, you should have a working overview of natural language processing techniques, large language model training, and their applications in code generation, semantic code search, decompilation, and deobfuscation.

Syllabus

Introduction
Who am I
Agenda
Google Whistleblower
GitHub Copilot
Security
Unsupervised Learning
Reinforcements
Two Rules of Thumb
Machine Learning Models
Feature Engineering
Pseudocode
Natural Language Translation
Neural Machine Translation
Sequence-to-Sequence Models
Encoder-Decoder Architecture
Recurrent Neural Networks
Encoder
Attention
BERT
Language Models
Preprocessing
Data Acquisition
Summary
Questions