10 Most Popular Machine Learning & Data Science Packages On Github




The GitHub community decided to dig deeper into machine learning and pulled data on contributions made between January and December 2018. Contributions include pushing code, opening an issue or pull request, commenting on an issue, and reviewing a pull request. For the most imported packages, the Octoverse report used data from the dependency graph, which covers all public repositories and any private repositories that have opted into it. The information in this article is drawn from the original documentation of each package.



In this article, we list the 10 most popular machine learning and data science packages on GitHub.

1| NumPy

NumPy is the fundamental package for scientific computing with Python. It contains a powerful N-dimensional array object, sophisticated (broadcasting) functions, tools for integrating C/C++ and Fortran code, and useful linear algebra, Fourier transform, and random number capabilities. NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data types can be defined, which allows NumPy to integrate seamlessly and speedily with a wide variety of databases.
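As a minimal sketch of the features mentioned above, the snippet below shows broadcasting, a linear algebra call, and a user-defined (structured) data type. The array contents and the field names `name` and `score` are made up for illustration.

```python
import numpy as np

# A 2-D array (3 rows, 4 columns) and a 1-D array of 4 column offsets.
matrix = np.arange(12).reshape(3, 4)
offsets = np.array([10, 20, 30, 40])

# Broadcasting: the 1-D array is applied to every row of the 2-D array.
shifted = matrix + offsets

# Linear algebra: solve the system A @ x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# A structured dtype: each record holds a name and a score
# (illustrative field names, not part of NumPy itself).
record_type = np.dtype([("name", "U10"), ("score", "f8")])
records = np.array([("ada", 9.5), ("bob", 7.0)], dtype=record_type)
mean_score = records["score"].mean()
```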


2| SciPy

SciPy is open-source software for mathematics, science, and engineering. It includes modules for statistics, optimisation, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more. It depends on NumPy, which provides convenient and fast N-dimensional array manipulation. SciPy is built to work with NumPy arrays and provides many user-friendly and efficient numerical routines, such as routines for numerical integration and optimisation.
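The two routines named above, numerical integration and optimisation, can be sketched as follows. The integrand and the objective function are arbitrary choices for illustration.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: the integral of sin(x) from 0 to pi (exact value is 2).
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Optimisation: minimise (x - 3)^2 + 1, whose minimum lies at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2 + 1.0)
best_x = result.x
```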


3| Pandas

Pandas is a Python package which provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis/manipulation tool available in any language.
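A short sketch of what working with labeled data looks like in practice. The `sales` table, its column names, and its values are invented for this example.

```python
import pandas as pd

# A small labeled table; the data is made up for illustration.
sales = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Mumbai"],
    "units": [10, 4, 6, 9, 3],
})

# Group rows by label and aggregate, much like a relational GROUP BY.
totals = sales.groupby("city")["units"].sum()

# Boolean filtering: keep only the rows with more than 5 units.
busy = sales[sales["units"] > 5]
```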


4| Matplotlib

Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits. You can easily generate plots, histograms, power spectra, bar charts, error charts, scatterplots, etc., with just a few lines of code using this library.
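To illustrate the "few lines of code" claim, here is a minimal sketch that draws a histogram and a scatterplot and renders them to PNG. It assumes a headless environment, so it selects the non-interactive `Agg` backend and writes to an in-memory buffer; the random data is purely illustrative.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless use
import matplotlib.pyplot as plt
import numpy as np

# Illustrative random data with a fixed seed.
rng = np.random.default_rng(0)
samples = rng.normal(size=500)

# One figure with two panels: a histogram and a scatterplot.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))
ax_hist.hist(samples, bins=20)
ax_hist.set_title("Histogram")
ax_scatter.scatter(samples[:-1], samples[1:], s=5)
ax_scatter.set_title("Scatterplot")

# Render the figure as PNG into memory instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```

In a script or notebook, `fig.savefig("figure.png")` or `plt.show()` would typically replace the in-memory buffer.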


5| Scikit-learn

Scikit-learn is an open source machine learning library for Python which is built on top of SciPy and distributed under the 3-Clause BSD license. Scikit-learn 0.20 is the last version to support Python 2.7; Scikit-learn 0.21 and later require Python 3.5 or newer. Scikit-learn also uses CBLAS, the C interface to the Basic Linear Algebra Subprograms library. If you already have a working installation of NumPy and SciPy, the easiest way to install Scikit-learn is with pip or conda:

pip install -U scikit-learn

or

conda install scikit-learn
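Once installed, a quick way to check that the library works is to fit a small model. The sketch below is one possible smoke test, not part of the official installation steps; the choice of the bundled iris dataset and of logistic regression is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a simple classifier and measure accuracy on the held-out data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```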


6| Six

Six is a Python 2 and 3 compatibility library. It provides utility functions for smoothing over the differences between the two Python versions, with the goal of writing code that runs on both. Six is a utility package intended to support codebases that work on both Python 2 and 3 without modification, and it can be downloaded from PyPI.
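A minimal sketch of how such compatibility helpers are typically used. The function `describe` and the `settings` dictionary are hypothetical names introduced for this example.

```python
import six


def describe(value):
    """Classify a value the same way on Python 2 and Python 3."""
    # six.string_types is (basestring,) on Python 2 and (str,) on Python 3.
    if isinstance(value, six.string_types):
        return "text"
    # six.integer_types is (int, long) on Python 2 and (int,) on Python 3.
    if isinstance(value, six.integer_types):
        return "integer"
    return "other"


# six.iteritems calls dict.iteritems() on Python 2 and dict.items() on Python 3.
settings = {"b": 2, "a": 1}
pairs = sorted(six.iteritems(settings))
```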


7| TensorFlow

TensorFlow is an open source software library for numerical computation using data flow graphs. The graph nodes represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture enables you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code. TensorFlow also includes TensorBoard, a data visualization toolkit.
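The graph model described above can be sketched as follows. This example assumes TensorFlow 2.x, where code runs eagerly by default and `tf.function` traces a Python function into a data flow graph; the function name `affine` and the input values are illustrative.

```python
import tensorflow as tf


@tf.function  # traces the Python function into a data flow graph
def affine(x, w, b):
    # Each call below becomes a node in the graph; tensors flow along the edges.
    return tf.matmul(x, w) + b


x = tf.constant([[1.0, 2.0]])    # shape (1, 2)
w = tf.constant([[3.0], [4.0]])  # shape (2, 1)
b = tf.constant([0.5])

result = affine(x, w, b)

# Inspect the operations (nodes) recorded in the traced graph.
concrete = affine.get_concrete_function(x, w, b)
op_types = [op.type for op in concrete.graph.get_operations()]
```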


8| Requests

Requests is an Apache2 Licensed HTTP library, written in Python and designed to be used by humans. It allows you to send organic, grass-fed HTTP/1.1 requests without manual labor such as adding query strings to URLs or form-encoding POST data. It is one of the most downloaded Python packages of all time, pulling in over 11,000,000 downloads every month.
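As a small sketch of the query-string handling, the snippet below builds a request and prepares it without sending anything over the network. The URL `https://example.com/search` and the parameter values are placeholders.

```python
import requests

# Describe the request; the URL and parameters are placeholders.
request = requests.Request(
    "GET",
    "https://example.com/search",
    params={"q": "numpy"},
    headers={"Accept": "application/json"},
)

# Preparing the request encodes the parameters into the final URL.
prepared = request.prepare()
final_url = prepared.url

# To actually send it, something like the following would be used:
# with requests.Session() as session:
#     response = session.send(prepared, timeout=10)
```

For everyday use, the shortcut `requests.get(url, params=...)` performs the same preparation and sends the request in one call.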


9| Python-dateutil

The dateutil package provides powerful extensions to the standard datetime module available in Python. Its features include computing relative deltas between two given date or datetime objects, computing dates based on very flexible recurrence rules using a superset of the iCalendar specification, and generic parsing of dates.
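The three features listed above can be sketched in a few lines. The dates used here are arbitrary examples.

```python
from datetime import datetime

from dateutil import parser, rrule
from dateutil.relativedelta import relativedelta

# Generic parsing: turn a free-form string into a datetime.
start = parser.parse("31 January 2018 09:30")

# Relative deltas: adding one month clips to the last valid day of February.
one_month_later = start + relativedelta(months=+1)

# Relative delta between two dates, expressed in months and days.
gap = relativedelta(datetime(2018, 12, 25), datetime(2018, 1, 1))

# Recurrence rules: the first three Mondays starting from 1 January 2018.
mondays = list(
    rrule.rrule(
        rrule.WEEKLY,
        byweekday=rrule.MO,
        dtstart=datetime(2018, 1, 1),
        count=3,
    )
)
```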


10| Pytz

The pytz library allows accurate and cross-platform timezone calculations using Python 2.4 or higher and provides access to the Olson timezone database. It also solves the issue of ambiguous times at the end of daylight saving time, which you can read more about in the Python Library Reference.
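The ambiguous-time problem can be sketched as follows. On 4 November 2018, clocks in the `US/Eastern` zone fell back from 02:00 to 01:00, so the wall-clock time 01:30 occurred twice; the choice of zone and date is only an example.

```python
from datetime import datetime, timedelta

import pytz

eastern = pytz.timezone("US/Eastern")

# A naive wall-clock time that occurred twice on this date.
ambiguous = datetime(2018, 11, 4, 1, 30)

# is_dst picks which of the two occurrences is meant.
first = eastern.localize(ambiguous, is_dst=True)    # daylight time, UTC-4
second = eastern.localize(ambiguous, is_dst=False)  # standard time, UTC-5

# With is_dst=None, pytz refuses to guess and raises an error instead.
try:
    eastern.localize(ambiguous, is_dst=None)
    raised = False
except pytz.AmbiguousTimeError:
    raised = True
```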



