Training AI to Code Using Project CodeNet - Largest Code Dataset
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore a comprehensive conference talk on leveraging Project CodeNet, a massive dataset of 14 million code samples, to train AI for coding tasks. Discover how the Machine Learning eXchange (MLX) can be used to classify code and analyze its complexity in a three-step workflow. Learn about turning domain-specific data subsets into Kubernetes Custom Resources using DataShim, training deep learning models with Jupyter notebooks on Kubernetes, and serving models for inference as Kubernetes Custom Resources via KServe. Gain insights into how MLX generates Kubeflow Pipelines on Tekton, eliminating the need for data scientists to write Kubernetes-specific code. Delve into the potential of machine learning for code, including code similarity detection, semantic context extraction, and cross-language translation.
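
To make the "no Kubernetes-specific code" idea concrete, the sketch below shows how a pipeline authored in plain Python can be compiled into Tekton resources with the KFP v1 SDK and the kfp-tekton compiler, the stack that MLX builds on. This is a minimal illustration, not the speakers' actual pipeline: the step names, container images, and dataset identifier are hypothetical placeholders.

```python
# Minimal sketch, assuming the KFP v1 SDK plus the kfp-tekton compiler.
# Image names and the dataset identifier are illustrative placeholders.
import kfp.dsl as dsl
from kfp_tekton.compiler import TektonCompiler


@dsl.pipeline(
    name="codenet-classifier",
    description="Train and serve a code-classification model on a CodeNet subset",
)
def codenet_pipeline(dataset_id: str = "codenet-subset"):
    # Step 1: read the data subset registered as a DataShim Dataset custom resource
    prepare = dsl.ContainerOp(
        name="prepare-data",
        image="example/prepare-data:latest",      # hypothetical image
        arguments=["--dataset-id", dataset_id],
    )

    # Step 2: train the deep learning model (e.g. a script exported from Jupyter)
    train = dsl.ContainerOp(
        name="train-model",
        image="example/train-classifier:latest",  # hypothetical image
        arguments=["--data-path", "/mnt/datasets"],
    ).after(prepare)

    # Step 3: create a KServe InferenceService custom resource to serve the model
    dsl.ContainerOp(
        name="deploy-model",
        image="example/kserve-deploy:latest",     # hypothetical image
        arguments=["--model-uri", "pvc://trained-models/classifier"],
    ).after(train)


if __name__ == "__main__":
    # Compile the Python pipeline into Tekton PipelineRun YAML; MLX performs an
    # equivalent compilation and submission on the user's behalf.
    TektonCompiler().compile(codenet_pipeline, "codenet_pipeline.yaml")
```

The point of the example is that the data scientist only writes Python: the compiler emits the Tekton YAML, and the platform handles creating the Kubernetes objects for data access, training, and serving.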
Syllabus
Training AI To Code Using the Largest Code Dataset - Tommy Li & Animesh Singh, IBM
Taught by
CNCF [Cloud Native Computing Foundation]