MapReduce (Dean and Ghemawat) | Bigtable (Chang et al.) | Dynamo (DeCandia et al.) | McKinsey whitepaper
(But all of the Hadoop jobs can run on one computer / VM for debugging and development.)
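To see why a job's logic can be debugged on a single machine, here is a rough single-process sketch of the map, shuffle, and reduce steps as a word count. It is plain Python, not the Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job) and the sample input lines are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in order, all in one process."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input; on a cluster these lines would come from file splits.
counts = run_job(["spark is fast", "hadoop is batch", "spark is in memory"])
```

The mapper and reducer are the only parts a job author normally writes. On a cluster the framework supplies the shuffle and spreads the work across machines, which is why the same logic can be exercised locally first.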
Mahout is a machine learning and parallel linear algebra library written in Java. Its Spark + Scala shell currently implements only the distributed matrix libraries (with plans for the rest).
MapReduce is painfully slow and inflexible; MapReduce 2 splits the JobTracker into two processes (one for cluster resource management, one for per-job scheduling), broadens the API, and gives the user more control over how jobs execute.
Speed wins: Mahout announced near the end of last year that it
will stop developing for MapReduce and will exclusively write
code for use with Spark.
No wonder, looking at the performance:
MLlib is part of Spark; ALS (alternating least squares) is a machine learning algorithm commonly used for recommendation.
blog post with benchmark
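For readers who have not met ALS, here is a minimal pure-Python sketch of the idea, restricted to rank-1 factors so each update has a closed form. It is an illustration under stated assumptions, not MLlib's or Mahout's implementation: the function name als_rank1, the regularization value, the iteration count, and the toy ratings are all hypothetical.

```python
def als_rank1(ratings, n_users, n_items, reg=0.01, iters=50):
    """Rank-1 alternating least squares.

    ratings maps (user, item) -> observed rating; missing pairs are unobserved.
    Returns one factor per user (u) and one per item (v), so that
    u[i] * v[j] approximates the rating of item j by user i.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iters):
        # Hold item factors fixed; each user factor is a 1-D ridge regression.
        for i in range(n_users):
            seen = [(j, r) for (a, j), r in ratings.items() if a == i]
            num = sum(r * v[j] for j, r in seen)
            den = sum(v[j] ** 2 for j, _r in seen) + reg
            u[i] = num / den
        # Hold user factors fixed; solve for each item factor the same way.
        for j in range(n_items):
            seen = [(i, r) for (i, b), r in ratings.items() if b == j]
            num = sum(r * u[i] for i, r in seen)
            den = sum(u[i] ** 2 for i, _r in seen) + reg
            v[j] = num / den
    return u, v


# Toy data (assumed): a rank-1 pattern with the (user 1, item 2) rating withheld.
ratings = {(0, 0): 1.0, (0, 1): 2.0, (0, 2): 3.0, (1, 0): 2.0, (1, 1): 4.0}
u, v = als_rank1(ratings, n_users=2, n_items=3)
predicted = u[1] * v[2]  # estimate for the withheld rating
```

Because the observed ratings follow a rank-1 pattern, the estimate for the withheld entry should come out close to 6. Each half-step only needs the ratings for one user or one item, so the updates within a half-step are independent of each other, which is what makes the algorithm a natural fit for a distributed engine.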