This course introduces the platforms, tools, and architectures that enable scalable management and processing of vast quantities of data. We will explore open source tools for the efficient acquisition, storage, and processing of Big Data. Students will learn about distributed storage solutions such as the Apache Hadoop Distributed File System (HDFS). Students will gain hands-on experience with Apache distributed processing tools such as Hadoop MapReduce, HBase, Hive, Impala, Pig, core Spark, Spark SQL, and Spark Streaming. Other Apache big data tools covered are Sqoop, Oozie, ZooKeeper, Flume, and Kafka.
Computer Science (Undergraduate)
4 credits – 15 Weeks
Sections (Spring 2021)
CSCI-UA 476-000 (10054) 01/28/2021 – 05/10/2021 Tue, Thu 12:00 AM – 1:00 PM (Early afternoon) at Washington Square Instructed by Malavet, Ann