GitHub has become the go-to source for all things open source, and it hosts a wealth of resources for machine learning practitioners. Here is a list of the 10 machine learning repositories on GitHub with the most stars. We have excluded tutorial projects and restricted the list to projects and frameworks.
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code. TensorFlow also includes TensorBoard, a data visualization toolkit.
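The data-flow idea can be made concrete with a toy graph evaluator in plain Python. This is not TensorFlow's API. The `Node` class and `run` function are hypothetical, and "tensors" are simplified to flat lists of floats. Each node holds an operation, and its `inputs` list names the edges that carry values into it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Tensor = List[float]  # simplification: a "tensor" is a flat list of floats


@dataclass
class Node:
    """A graph node: an operation plus the names of the nodes whose outputs feed it."""
    op: Callable[..., Tensor]
    inputs: List[str] = field(default_factory=list)


def run(graph: Dict[str, Node], target: str, feeds: Dict[str, Tensor]) -> Tensor:
    """Evaluate `target`, pulling values along the input edges it depends on."""
    values: Dict[str, Tensor] = dict(feeds)  # placeholder values supplied by the caller

    def evaluate(name: str) -> Tensor:
        if name not in values:  # compute each node at most once
            node = graph[name]
            args = [evaluate(dep) for dep in node.inputs]
            values[name] = node.op(*args)
        return values[name]

    return evaluate(target)


def add(a: Tensor, b: Tensor) -> Tensor:
    return [x + y for x, y in zip(a, b)]


def mul(a: Tensor, b: Tensor) -> Tensor:
    return [x * y for x, y in zip(a, b)]


# (x + y) * x expressed as a graph: nodes are operations, edges carry tensors.
graph = {
    "sum": Node(add, ["x", "y"]),
    "out": Node(mul, ["sum", "x"]),
}
result = run(graph, "out", {"x": [1.0, 2.0], "y": [3.0, 4.0]})
print(result)  # [4.0, 12.0]
```

Because the computation is described as a graph and is not executed eagerly line by line, an evaluator only computes the nodes the requested output depends on. The same description could, in principle, be handed to a different device backend without rewriting the graph itself.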
TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google’s Machine Intelligence research organization, for the purpose of conducting machine learning and deep neural network research. The system is general enough to be applicable in a wide variety of other domains as well.
scikit-learn is a Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license.
The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. It is currently maintained by a team of volunteers.
Keras is a high-level neural networks API, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Use Keras if you need a deep learning library that:
- Allows for easy and fast prototyping (through user friendliness, modularity, and extensibility).
- Supports both convolutional networks and recurrent networks, as well as combinations of the two.
- Runs seamlessly on CPU and GPU.
Apache PredictionIO (incubating) is an open source machine learning framework for developers, data scientists, and end users. It supports event collection, algorithm deployment, evaluation, and querying of predictive results via REST APIs. It is based on scalable open source services such as Hadoop, HBase (and other databases), Elasticsearch, and Spark, and it implements what is called a Lambda Architecture.
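PredictionIO's own event server, engine templates, and REST endpoints are not reproduced here. The sketch below only illustrates the general Lambda Architecture pattern the paragraph mentions, and every name in it (`batch_view`, `SpeedLayer`, `query`, the `(user, item)` event shape) is hypothetical. A batch layer recomputes a view from the full event history. A speed layer counts events that arrived since the last batch run. A serving step merges the two at query time.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[str, str]  # hypothetical event shape: (user, item)


def batch_view(history: Iterable[Event]) -> Counter:
    """Batch layer: recompute item popularity from the full, immutable event log."""
    return Counter(item for _, item in history)


class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.recent: Counter = Counter()

    def ingest(self, event: Event) -> None:
        _, item = event
        self.recent[item] += 1


def query(item: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the batch and real-time views when answering a query."""
    return batch[item] + speed.recent[item]  # Counter returns 0 for unseen items


history = [("u1", "book"), ("u2", "book"), ("u1", "pen")]
batch = batch_view(history)
speed = SpeedLayer()
speed.ingest(("u3", "book"))  # arrives after the batch view was built
print(query("book", batch, speed))  # 3
```

The design trade-off is that the batch view is accurate but stale, while the speed layer is fresh but covers only recent data. Merging them lets a query reflect both.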
Tesseract is an open source OCR engine. It has Unicode (UTF-8) support and can recognize more than 100 languages “out of the box”. It can also be trained to recognize other languages.
Tesseract supports various output formats: plain text, hOCR (HTML), and PDF.
Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co., Greeley, Colorado, between 1985 and 1994. Further changes were made in 1996 to port it to Windows, and part of the code was converted to C++ in 1998.
In 2005 Tesseract was open sourced by HP. Since 2006 it has been developed by Google.
MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. MXNet is portable and lightweight, scaling effectively to multiple GPUs and multiple machines.
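A dependency scheduler can be sketched in a much simplified form. This toy version is not MXNet's engine or API, and `schedule` and the `ops` format are hypothetical. In it, each operation declares which other operations it depends on, and any operation whose inputs are all finished is launched on a thread pool. Independent operations therefore run concurrently without the caller ordering them by hand.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Tuple

Op = Tuple[List[str], Callable[..., float]]  # (names of dependencies, function)


def schedule(ops: Dict[str, Op], workers: int = 4) -> Dict[str, float]:
    """Run every op as soon as all of its dependencies have produced results."""
    results: Dict[str, float] = {}
    pending = dict(ops)
    running: Dict[Future, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending or running:
            # Launch every op whose dependencies have all finished.
            ready = [n for n, (deps, _) in pending.items()
                     if all(d in results for d in deps)]
            for name in ready:
                deps, fn = pending.pop(name)
                args = [results[d] for d in deps]
                running[pool.submit(fn, *args)] = name
            if not running:
                raise ValueError("cycle or missing dependency among: %s" % sorted(pending))
            # Block until at least one op finishes, then record its result.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()
    return results


ops = {
    "a": ([], lambda: 2.0),              # "a" and "b" have no dependencies,
    "b": ([], lambda: 3.0),              # so they can start at the same time
    "c": (["a", "b"], lambda x, y: x * y),
    "d": (["c"], lambda z: z + 1.0),
}
print(schedule(ops)["d"])  # 7.0
```

In CPython, threads give real parallelism only when the work releases the GIL, as native numeric kernels typically do. A pure-Python toy like this one mainly illustrates the ordering logic.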
WaveFunctionCollapse generates bitmaps that are locally similar to the input bitmap.
Local similarity means that:
- (C1) Each NxN pattern of pixels in the output should occur at least once in the input.
- (Weak C2) The distribution of NxN patterns in the input should be similar to the distribution of NxN patterns over a sufficiently large number of outputs. In other words, the probability of encountering a particular pattern in the output should be close to the density of such patterns in the input.
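The two conditions above can be checked directly on small bitmaps. The sketch below extracts every NxN window from a bitmap, tests C1 as a subset check, and computes the pattern densities that Weak C2 compares. It is a hypothetical checker, not code from the generator itself. As a simplifying assumption, it ignores wrap-around windows and rotated or reflected variants of patterns, which a real generator may also take into account.

```python
from collections import Counter
from typing import Dict, List, Tuple

Bitmap = List[List[int]]            # rows of pixel values (e.g. palette indices)
Pattern = Tuple[Tuple[int, ...], ...]


def patterns(bitmap: Bitmap, n: int) -> Counter:
    """Count every NxN window in the bitmap (no wrap-around at the edges)."""
    h, w = len(bitmap), len(bitmap[0])
    counts: Counter = Counter()
    for y in range(h - n + 1):
        for x in range(w - n + 1):
            window = tuple(tuple(bitmap[y + dy][x:x + n]) for dy in range(n))
            counts[window] += 1
    return counts


def satisfies_c1(output: Bitmap, source: Bitmap, n: int) -> bool:
    """C1: every NxN pattern in the output occurs at least once in the input."""
    return set(patterns(output, n)) <= set(patterns(source, n))


def pattern_density(bitmap: Bitmap, n: int) -> Dict[Pattern, float]:
    """Relative frequency of each NxN pattern, the quantity Weak C2 compares."""
    counts = patterns(bitmap, n)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}


source = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
]
good = [[0, 0], [0, 0]]  # its only 2x2 pattern appears in the source
bad = [[1, 0], [0, 1]]   # this 2x2 pattern never appears in the source
print(satisfies_c1(good, source, 2), satisfies_c1(bad, source, 2))  # True False
```

For Weak C2, one would average `pattern_density` over many generated outputs and compare the result with the density of the input.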
Pattern is a web mining module for Python. It has tools for:
- Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM parser
- Natural Language Processing: part-of-speech taggers, n-gram search, sentiment analysis, WordNet
- Machine Learning: vector space model, clustering, classification (KNN, SVM, Perceptron)
- Network Analysis: graph centrality and visualization
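The vector space model and KNN classification listed above can be illustrated in plain Python. This deliberately does not use Pattern's own API, and all function names are hypothetical. Each document becomes a bag-of-words term-frequency vector, documents are compared by cosine similarity, and a new text takes the majority label of its k most similar training documents.

```python
import math
from collections import Counter
from typing import List, Tuple


def vectorize(text: str) -> Counter:
    """Bag-of-words term-frequency vector (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def knn_classify(text: str, training: List[Tuple[str, str]], k: int = 3) -> str:
    """Label `text` by majority vote among its k most similar training documents."""
    query = vectorize(text)
    ranked = sorted(training, key=lambda doc: cosine(query, vectorize(doc[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


training = [
    ("great fun movie", "pos"),
    ("loved this great film", "pos"),
    ("boring and awful plot", "neg"),
    ("awful acting boring film", "neg"),
]
print(knn_classify("a great film", training, k=3))  # pos
```

An odd `k` avoids tied votes when there are two labels. Real text classifiers usually also apply weighting such as TF-IDF and remove stop words, both of which are omitted here.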
It is well documented and bundled with 50+ examples and 350+ unit tests.
NLTK — the Natural Language Toolkit — is a suite of open source Python modules, data sets and tutorials supporting research and development in Natural Language Processing.
Swift AI is a high-performance machine learning library written entirely in Swift.
Swift AI includes a set of common tools used for machine learning and artificial intelligence. These tools are designed to be flexible, powerful and suitable for a wide range of applications.
Bhasker is a data science evangelist and practitioner with a proven record of thought leadership and of incubating analytics practices for various organizations. With over 16 years of experience in business analytics, he is well recognized as an expert in the industry. Earlier, Bhasker worked as Vice President at Goldman Sachs. He holds a B.Tech from the Indian Institute of Technology, Varanasi, and an MBA from the Indian Institute of Management, Lucknow.