Datasets, tools, and benchmarks for representation learning of code
CodeSearchNet is a large-scale dataset and research benchmark designed to advance systems that retrieve source code from natural language queries. The project was created through a collaboration between GitHub and Microsoft Research and aims to support research on semantic code search and program understanding. The dataset contains millions of pairs of source code functions and their corresponding documentation comments, extracted from open-source repositories. ...
Brainiac is a collection of C/C++ libraries and programs, plus Python and Lua scripts, for neural networking and genetic programming. It is an attempt at a "glue-it-all-together" project striving towards general artificial intelligence.