MapReduce is a software framework created by Google. Its primary purpose is to support distributed computing, specifically the processing of large data sets across a cluster of many computers. The framework takes its inspiration from the map and reduce functions of functional programming.
Explain how MapReduce works.
The data being processed can reside in a file system (unstructured) or in a database (structured). The MapReduce framework works in two main steps:
1. Map step: The master node accepts an input (the problem) and splits it into smaller sub-problems. It then distributes these sub-problems to the worker nodes, which solve them.
2. Reduce step: Once a worker node has solved its sub-problem, it returns the solution to the master node. The master node collects the solutions from all worker nodes and combines them into a single solution to the original input.
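
The two steps above can be illustrated with a small word-count sketch in Python. This is only a single-process illustration of the idea, not Google's implementation or any real framework's API: the function names (split_input, map_worker, reduce_results, word_count) are hypothetical, and the "worker nodes" are simulated by ordinary function calls that run one after another instead of in parallel on separate machines.

```python
from collections import defaultdict
from functools import reduce


def split_input(text, num_workers):
    """Master node: split the input into one sub-problem (list of lines) per worker."""
    lines = text.splitlines()
    return [lines[i::num_workers] for i in range(num_workers)]


def map_worker(lines):
    """Worker node: solve one sub-problem by counting the words in its lines."""
    counts = defaultdict(int)
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)


def reduce_results(partials):
    """Master node: combine the workers' partial counts into one final answer."""
    def merge(total, partial):
        for word, n in partial.items():
            total[word] = total.get(word, 0) + n
        return total

    return reduce(merge, partials, {})


def word_count(text, num_workers=3):
    # Map step: split the problem and hand each piece to a (simulated) worker.
    chunks = split_input(text, num_workers)
    partials = [map_worker(chunk) for chunk in chunks]
    # Reduce step: collect every partial solution and merge them.
    return reduce_results(partials)


if __name__ == "__main__":
    print(word_count("a b a\nb c\na"))
```

Because each call to map_worker depends only on its own chunk, the calls could in principle run on different machines at the same time, which is the property the framework relies on.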