Assignment Task
Learning Outcomes
1. Evaluate the challenges within big data analytics (data complexity, computational complexity, and system complexity).
2. Hypothesize the development and techniques for the design of systems capable of performing a given recognition task for a specific research application.
Task:
You are required to carry out a simulation and write a report evaluating the challenges in big data analytics. You are expected to generate synthetic data in which each object is represented by a 100-dimensional data point. The objective is to implement a k-nearest neighbours (KNN) algorithm that returns the K nearest neighbours to a query point. You can use the Euclidean distance between two vectors as the measure of similarity. Push the limits of the computer you are using by generating a sufficiently large dataset, then compare a MapReduce implementation with a classic (single-machine, non-distributed) implementation. You can use R or any other software for the comparison. Discuss data complexity, computational complexity, and system complexity in this context. Finally, discuss the use of such a KNN technique in a real-world situation: where can it be used in a Big Data context?
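
As a starting point, the comparison described above could be sketched as follows. This is only an illustrative outline, not a required solution. It assumes Python rather than R, uses the Euclidean distance d(x, y) = sqrt(sum_i (x_i - y_i)^2), and simulates the map and reduce steps sequentially on one machine instead of running them on a real framework such as Hadoop or Spark. All function names (make_points, knn_classic, map_partition, reduce_topk, knn_mapreduce) are hypothetical and chosen for this sketch. The dataset size is deliberately small and would have to be increased considerably to stress a real machine.

```python
import heapq
import math
import random
from functools import reduce


def make_points(n, dim=100, seed=0):
    """Generate n synthetic points with `dim` uniform random coordinates in [0, 1)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classic(points, query, k):
    """Classic baseline: score every point once and keep the k smallest distances."""
    scored = ((euclidean(p, query), i) for i, p in enumerate(points))
    return heapq.nsmallest(k, scored)


def map_partition(partition, query, k):
    """Map step: compute the local top-k for one partition of (index, point) pairs."""
    scored = ((euclidean(p, query), i) for i, p in partition)
    return heapq.nsmallest(k, scored)


def reduce_topk(left, right, k):
    """Reduce step: merge two local top-k lists into a single top-k list."""
    return heapq.nsmallest(k, left + right)


def knn_mapreduce(points, query, k, n_partitions=4):
    """MapReduce-style KNN, run sequentially here.

    Assumption: a real framework would execute the map calls on separate workers.
    """
    indexed = list(enumerate(points))
    # Round-robin split of the data into n_partitions chunks.
    partitions = [indexed[j::n_partitions] for j in range(n_partitions)]
    local = [map_partition(part, query, k) for part in partitions]
    return reduce(lambda a, b: reduce_topk(a, b, k), local)


if __name__ == "__main__":
    data = make_points(2000)          # small on purpose; scale up for the experiment
    q = make_points(1, seed=1)[0]     # one synthetic query point
    # Both versions return (distance, index) pairs sorted by distance.
    assert knn_classic(data, q, 5) == knn_mapreduce(data, q, 5)
```

Both versions cost on the order of n x 100 arithmetic operations per query, so the interesting differences for the report lie in memory use, how the data is partitioned, and the overhead of merging the partial results. Each map call returns at most K candidates, which keeps the data passed to the reduce step small regardless of the partition size.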