Training AI to Code Using Project CodeNet - Largest Code Dataset

Overview

This conference talk shows how to use Project CodeNet, a dataset of 14 million code samples, to train AI models for coding tasks. It walks through using the Machine Learning Exchange (MLX) to classify code and analyze its complexity in three steps: turning domain-specific data subsets into Kubernetes Custom Resources with DataShim, training deep learning models in Jupyter notebooks on Kubernetes, and serving the trained models for inference as Kubernetes Custom Resources via KServe. The talk also explains how MLX generates Kubeflow Pipelines on Tekton, so data scientists do not have to write Kubernetes-specific code, and surveys further uses of machine learning for code, including code similarity detection, semantic context extraction, and cross-language translation.
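
To make the first and last steps of that workflow more concrete, here is a minimal Python sketch of the two kinds of Kubernetes Custom Resources involved: a DataShim Dataset and a KServe InferenceService. It is an illustration under stated assumptions, not the configuration used in the talk. The resource names, bucket, endpoint, and storage path are hypothetical placeholders, the field layout follows the publicly documented DataShim and KServe examples as best understood here, and the API versions should be checked against whatever releases are installed on the cluster.

```python
import json

# Hypothetical names used only for illustration; none of them come from the talk.
DATASET_NAME = "codenet-subset"
MODEL_NAME = "codenet-classifier"


def dataset_manifest(name, bucket, endpoint):
    """Build a DataShim Dataset resource pointing at an S3-compatible bucket."""
    return {
        # Assumed API group/version; verify against the installed DataShim release.
        "apiVersion": "datashim.io/v1alpha1",
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {
            "local": {
                "type": "COS",
                # Placeholders: real credentials must never be hard-coded.
                "accessKeyID": "<ACCESS_KEY_ID>",
                "secretAccessKey": "<SECRET_ACCESS_KEY>",
                "endpoint": endpoint,
                "bucket": bucket,
                "readonly": "true",
            }
        },
    }


def inference_service_manifest(name, storage_uri):
    """Build a KServe InferenceService that serves a TensorFlow model."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {"predictor": {"tensorflow": {"storageUri": storage_uri}}},
    }


dataset = dataset_manifest(DATASET_NAME, "<BUCKET>", "https://<S3_ENDPOINT>")
# Assumption: the trained model was written to a volume backed by the dataset.
service = inference_service_manifest(
    MODEL_NAME, f"pvc://{DATASET_NAME}/models/classifier"
)

# kubectl accepts JSON as well as YAML, so this output can be saved and applied.
print(json.dumps(dataset, indent=2))
print(json.dumps(service, indent=2))
```

The training step in between is omitted because it depends on the model and notebook environment. According to the talk description, MLX generates the Kubeflow Pipelines that tie these steps together, so in that setup a data scientist would not normally write manifests like these by hand.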