Overview
Explore an in-depth analysis of the TransCoder paper, which introduces an unsupervised neural machine translation approach for translating between programming languages. Delve into the challenges of migrating code between C++, Java, and Python, and learn how this method overcomes the limitations of traditional rule-based transcompilers. Discover the key components of the TransCoder model: shared embeddings built on token overlap, and the masked language modeling, denoising, and back-translation training objectives. Examine the evaluation dataset, results, tokenization choices, and the human-aware aspects of the generated translations. Gain insights into the model's performance, its typical failure cases, and the implications for automated code translation across programming languages.
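The sketch below is a minimal, illustrative view of the three unsupervised training objectives discussed in the video (MLM, denoising, back-translation), shown as the data transformations applied to a toy token sequence. It is not the paper's implementation: the function names, probabilities, and the stub `translate` function standing in for the current model are all hypothetical placeholders.

```python
import random

MASK = "<MASK>"

def mask_tokens(tokens, p=0.15):
    """MLM-style corruption: randomly replace a fraction of tokens with a mask
    symbol; the model is trained to recover the original tokens."""
    return [MASK if random.random() < p else t for t in tokens]

def add_noise(tokens, p_drop=0.1, p_swap=0.1):
    """Denoising-style corruption: drop some tokens and locally swap others;
    the model is trained to reconstruct the clean sequence."""
    kept = [t for t in tokens if random.random() > p_drop]
    noisy = kept[:]
    for i in range(len(noisy) - 1):
        if random.random() < p_swap:
            noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy

def back_translation_pair(src_tokens, translate):
    """Back-translation: translate source code into the other language with the
    current model, then use (translation, original) as a synthetic supervised
    pair for training the reverse direction."""
    synthetic_target = translate(src_tokens)   # e.g. Python -> C++
    return synthetic_target, src_tokens        # train C++ -> Python on this pair

if __name__ == "__main__":
    python_tokens = "def add ( a , b ) : return a + b".split()
    print("MLM input:       ", mask_tokens(python_tokens))
    print("Denoising input: ", add_noise(python_tokens))
    # Hypothetical stand-in for the model's current Python -> C++ decoding.
    fake_translate = lambda toks: ["int", "add", "(", "int", "a", ",", "int", "b",
                                   ")", "{", "return", "a", "+", "b", ";", "}"]
    print("Back-translation pair:", back_translation_pair(python_tokens, fake_translate))
```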
Syllabus
- Intro & Overview
- The Transcompiling Problem
- Neural Machine Translation
- Unsupervised NMT
- Shared Embeddings via Token Overlap
- MLM Objective
- Denoising Objective
- Back-Translation Objective
- Evaluation Dataset
- Results
- Tokenization
- Shared Embeddings
- Human-Aware Translation
- Failure Cases
- Conclusion
Taught by
Yannic Kilcher