(Big Data Analytics)
- Introduction to the course, preliminaries, massive data mining
- Properties of big data, big data storage methods
- Algorithms for finding large itemsets, association rules, I/O-efficient algorithms, algorithms for extracting complex patterns
- MapReduce programming, distributed file systems
- Finding similar items in high dimensions, distance measures
- Locality-sensitive hashing (LSH)
- Algorithms for processing data streams, fixed-ratio sampling from a data stream, fixed-size sampling from a data stream, counting the number of 1 bits in a data stream
- Filtering a data stream, counting the number of distinct elements in a data stream, estimating the standard deviation of a data stream
- Dimension reduction and matrix decomposition for big data, singular value decomposition (SVD) for big data, CUR decomposition for big data
- Big data clustering, hierarchical big data clustering, the BFR algorithm for big data clustering, the CURE algorithm for big data clustering
- Classification of online and streaming data, Hoeffding tree
- Concept drift and algorithms for detecting it, classification of streaming data in the presence of concept drift
- Sketching, sketching for least squares regression, subspace embedding (subsampled randomized Hadamard transform, CountSketch embedding, affine embedding)
- Web advertising
- Large-scale recommendation systems, content-based recommendation systems, recommendation systems based on collaborative filtering, recommendation systems based on latent factors