Processing Big Data for Analytics Applications (CSCI-UA 476)

This course introduces the platforms, tools, and architectures that facilitate scalable management and processing of vast quantities of data. We will explore open source tools that enable the efficient acquisition, storage, and processing of Big Data. Students will learn about distributed storage solutions such as the Apache Hadoop Distributed File System (HDFS). Students will gain hands-on experience with Apache distributed processing solutions such as Hadoop MapReduce, HBase, Hive, Impala, Pig, core Spark, Spark SQL, and Spark Streaming. Other Apache big data tools covered are Sqoop, Oozie, ZooKeeper, Flume, and Kafka.

Computer Science (Undergraduate)
4 credits – 15 Weeks

Sections (Spring 2021)


CSCI-UA 476-000 (10054)
01/28/2021 – 05/10/2021 Tue, Thu
12:00 PM – 1:00 PM (Early afternoon)
at Washington Square
Instructed by Malavet, Ann