
Generating Automatic Documentation and Comments for Source Code Using Codist AI

Codist is a platform as a service (PaaS), built for MLonCode and programming language theory, that automatically interprets source code and helps developers understand and maintain it faster.

Machine learning is now widely applied to source code. Applications built to help developers write better code include autocompletion, type checking and hinting, unit-testing assistants, code summarization, plagiarism checkers that detect code written in different programming languages but performing the same task, bug detection, program repair and induction, and, lastly, generating docstrings or comments that explain what the code does.


For large codebases, say 1M lines of code, writing comments and documentation can be very difficult. Commenting cannot be avoided, though, because the next set of developers and maintainers will rely on it to understand how the code works and to build on top of it. A well-structured codebase with consistent comments is easier to understand and maintain. Another aspect is reaching users: software companies have to provide good documentation so that clients can understand how their products work.

It is a big challenge for developers to cope with the pressure of writing code and testing modules while also keeping the documentation up to date.


Codist makes source code documentation easier to produce at scale. It analyzes both new and legacy code by auditing the existing documentation, and it automatically fills in documentation that is missing and updates documentation that is outdated.

Docly by Codist

Docly is a CLI-based tool that reviews and completes the required code documentation with a single command. This can be helpful just before pushing code to a version control host such as GitHub.


pip install docly


docly-gen /path/to/file_or_folder_with_python_files

This command prints an interactive prompt asking whether you want to see and apply the changes [y/n]. By default, it generates a comment for each function and lists all the arguments the function declares. Docly can also be told not to generate the argument list.
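To make concrete what "a comment plus a list of the declared arguments" looks like, here is a minimal sketch built on Python's standard `ast` module. It is not how Docly works internally (Docly generates the summary text with a trained model); the helper name `docstring_skeleton` and the `<summary goes here>` placeholder are assumptions made purely for illustration.

```python
import ast


def docstring_skeleton(source: str) -> dict:
    """Map each function name in `source` to a placeholder docstring that lists its arguments."""
    skeletons = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Regular and keyword-only arguments; positional-only, *args and
            # **kwargs are left out to keep the sketch short.
            arg_names = [a.arg for a in node.args.args + node.args.kwonlyargs]
            # Hypothetical placeholder: a real tool would generate this summary.
            lines = ["<summary goes here>"]
            if arg_names:
                lines.append("")
                lines.append("Args:")
                lines.extend(f"    {name}" for name in arg_names)
            skeletons[node.name] = "\n".join(lines)
    return skeletons


example = "def add(a, b):\n    return a + b\n"
print(docstring_skeleton(example)["add"])
```

The sketch only recovers what is visible in the signature. Writing the natural-language summary is the hard part, and that is what a learned model is needed for.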

To use Docly in Jupyter notebooks:

pip install 'docly[jupyter]'

To run Docly on .ipynb files from the CLI:

docly-gen --run_on_notebooks /path/to/file_or_folder_with_python_files 

Save the generated comments:

docly-gen --no_generate_diff --print_report /path/to/file_or_folder_with_python_files

Revert Changes: 

docly-restore

Currently, Codist provides access to a beta version of Docly. For now it is available only on macOS and Linux; a Windows version is expected to follow soon. Docly combines three things: vector embeddings of source code, programming language theory to analyse the code and make it automatically understandable, and natural language processing to build a semantic understanding between computers and humans. Docly will soon be open-sourced.

Docly uses Graphical navigation of Source Code

Tree-hugger

Building a large-scale commenting system requires scraping a large amount of code from various sources and training models on it to give accurate predictions. Codist therefore developed Tree-hugger to mine source code repositories. Tree-hugger is a high-level, lightweight library that provides Pythonic APIs for scraping Git repositories, together with universal code parsers built on top of tree-sitter. So far it supports parsers for Python, Java, JavaScript, PHP and C++. With the advent of the Hugging Face Transformers library and other open-source libraries, working with NLP has become easier. Codist has open-sourced Tree-hugger.


Installation: pip install -U tree-hugger PyYAML

# Python

from tree_hugger.core import PythonParser
pp = PythonParser()

# Example output: names of the functions found in a parsed file
['first_child', 'second_child', 'say_whee', 'wrapper', 'my_decorator', 'parent']

# Example output: docstring of each function, keyed by function name
{'parent': '"""This is the parent function\n    \n    There are other lines in the doc string\n    This is the third line\n\n    And this is the fourth\n    """',
 'first_child': "'''\n        This is first child\n        '''",
 'second_child': '"""\n        This is second child\n        """',
 'my_decorator': '"""\n    Outer decorator function\n    """',
 'say_whee': '"""\n    Hellooooooooo\n\n    This is a function with decorators\n    """'}
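For readers without Tree-hugger installed, the same kind of result (a list of function names and a name-to-docstring mapping) can be approximated for Python alone with the standard library. This is a stand-in sketch, not Tree-hugger's implementation: Tree-hugger parses with tree-sitter and covers several languages, whereas `ast` only handles Python. The helper name `function_names_and_docstrings` and the sample source are assumptions for illustration.

```python
import ast


def function_names_and_docstrings(source: str):
    """Return (function names, {name: docstring}) for every function in `source`."""
    names = []
    docstrings = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names.append(node.name)
            doc = ast.get_docstring(node)  # cleaned text, without the quote marks
            if doc is not None:
                docstrings[node.name] = doc
    return names, docstrings


# Hypothetical sample input, including a nested function without a docstring.
sample = '''
def parent():
    """This is the parent function"""
    def first_child():
        """This is first child"""
    def second_child():
        pass
'''

names, docs = function_names_and_docstrings(sample)
print(sorted(names))
print(docs)
```

Unlike the output shown above, `ast.get_docstring` strips the surrounding quotes and indentation, and functions without a docstring are simply absent from the mapping.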

An end-to-end pipeline implementation of Tree-hugger is available in this notebook.


Codist has also built a package, code-bert, that automatically checks whether source code documentation is up to date. code-bert currently works for Python code.

Microsoft has also recently released CodeBERT. The difference is that Codist's model is trained with masked language modelling (MLM) and next-word prediction, whereas Microsoft's is trained with MLM and replaced token detection.
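The two objectives that differ in name can be contrasted with a toy sketch. It only builds the training inputs and targets for a single token sequence; no model is involved, and the positions and replacement tokens are hand-picked here, whereas real training samples them (and, for replaced token detection, usually draws replacements from a small generator model). The function names and the `<mask>` string are assumptions for illustration.

```python
MASK = "<mask>"


def mlm_example(tokens, mask_positions):
    """Masked language modelling: hide some tokens; the model must predict the originals."""
    inputs = [MASK if i in mask_positions else tok for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in mask_positions}
    return inputs, targets


def rtd_example(tokens, replacements):
    """Replaced token detection: swap some tokens; the model labels every
    position as original (0) or replaced (1)."""
    inputs = [replacements.get(i, tok) for i, tok in enumerate(tokens)]
    labels = [int(new != old) for new, old in zip(inputs, tokens)]
    return inputs, labels


tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]

mlm_inputs, mlm_targets = mlm_example(tokens, {1, 5})
rtd_inputs, rtd_labels = rtd_example(tokens, {1: "sub"})

print(mlm_inputs, mlm_targets)
print(rtd_inputs, rtd_labels)
```

In MLM the training signal comes only from the masked positions, while replaced token detection produces a label for every token in the sequence.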

CodistAI's open-source release makes it easy to use a fine-tuned model based on the open-source MLM code model codeBERT-small-v2, a RoBERTa model that was trained with the Hugging Face Transformers library and then fine-tuned.


Although these tools and packages do impressive work in analyzing code and generating documentation and comments, they have drawbacks that researchers are still working to address. These language models are extensively trained but can still fail. Moreover, using them on real code is challenging because many prerequisites are expected.


Jayita Bhattacharyya
Machine learning and data science enthusiast. Eager to learn new technology advances. A self-taught techie who loves to do cool stuff using technology for fun and worthwhile.
