Apache SystemML is a flexible machine learning system that automatically scales to Spark and Hadoop clusters. SystemML’s distinguishing characteristics are:
- Algorithm customizability via R-like and Python-like languages.
- Multiple execution modes, including Standalone, Spark Batch, Spark MLContext, Hadoop Batch, and JMLC.
- Automatic optimization based on data and cluster characteristics to ensure both efficiency and scalability.
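To make the first point concrete, here is a minimal sketch (in Python/NumPy, not SystemML's own DML) of the kind of small-data linear-regression prototype a data scientist might write; SystemML's goal is to let the same high-level script scale to cluster-sized data automatically. All names here are illustrative, not part of any SystemML API:

```python
import numpy as np

def fit_linear_regression(X, y):
    # Ordinary least squares via the normal equations:
    # w = (X^T X)^{-1} X^T y, solved without forming an explicit inverse.
    XtX = X.T @ X
    Xty = X.T @ y
    return np.linalg.solve(XtX, Xty)

# Tiny example: y = 2 * x with no noise, so the fitted weight is 2.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
w = fit_linear_regression(X, y)
print(w)
```

A script in this linear-algebra style maps naturally onto SystemML's R-like DML, where the optimizer decides how to execute each matrix operation on Spark or Hadoop.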
History
SystemML was created in 2010 at the IBM Almaden Research Center under IBM Fellow Shivakumar Vaithyanathan. It was observed that data scientists would write machine learning algorithms in languages such as R and Python for small data. When it was time to scale to big data, a systems programmer would be needed to rewrite the algorithm in a language such as Scala. This translation process was error-prone and typically required multiple iterations. SystemML seeks to simplify it: a primary goal of SystemML is to automatically scale an algorithm written in an R-like or Python-like language to operate on big data, generating the same answer without the error-prone, multi-iterative translation approach.
On June 15, 2015, at the Spark Summit in San Francisco, Beth Smith, General Manager of IBM Analytics, announced that IBM was open-sourcing SystemML as part of IBM’s major commitment to Apache Spark and Spark-related projects. SystemML became publicly available on GitHub on August 27, 2015 and became an Apache Incubator project on November 2, 2015. On May 17, 2017, the Apache Software Foundation Board approved the graduation of Apache SystemML to an Apache Top Level Project.