Datasets, tools, and benchmarks for representation learning of code
CodeSearchNet is a large-scale dataset and research benchmark designed to advance the development of systems that retrieve source code using natural language queries. The project was created through a collaboration between GitHub and Microsoft Research and aims to support research on semantic code search and program understanding. The dataset contains millions of pairs of source code functions and corresponding documentation comments extracted from open-source repositories. ...
A tool that uses AI to automatically recommend commit messages
This is the implementation of CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model. CommitBERT was accepted at the ACL workshop NLP4Prog. Have you ever hesitated to write a commit message? Now get a commit message from Artificial Intelligence! CodeBERT: A Pre-Trained Model for Programming and Natural Languages introduces a model pre-trained on a combination of Programming Language and Natural Language (PL-NL). It also introduces the problem of converting code into natural...