Learningbitbybit-S11

Learning Bit by Bit

Instructor: Heather Dewey-Hagborg
Contact: hdh216@nyu.edu ~or~ heather@deweyhagborg.com
Class: Friday 12:30 - 3pm
Office Hours: Friday 3 - 4pm in the Adjunct Office and by appointment

Homework Wiki

Course Description
From mailing a letter to shopping online to walking down a city street, applications of machine learning have penetrated our daily experience. Our faces, our voices, the emails we write, the products we buy, the content we choose, all constitute our data portrait: aggregates of information that are meticulously sifted, sorted and searched by algorithms behind the scenes. This class takes a critical tour of the technologies that learn from this data. We look at the information that defines us and how it is analyzed using techniques common to biology, computer science, robotics and surveillance. We cover both the theory and the implementation of machine learning techniques that are commonly used today in applications of text analysis, web search, classification, and content suggestion. We discuss the concept of a data portrait and how heuristics and inductive bias shape the way we are seen. Finally, we apply these techniques to create projects of our own. This class involves weekly blog responses to readings of advanced technical and theoretical texts, as well as in-class and out-of-class work on individual and group projects engaging with the concepts. Class discussion is a vital component of the class, and students must be willing to engage in weekly discussions about both readings and projects. Students are encouraged to implement projects in a variety of media but must be comfortable programming in Java. Prerequisite: H79.2233 Introduction to Computational Media or equivalent programming experience.

Course Goals
Students will learn the concepts behind common machine learning techniques and apply these ideas to projects of their own design.

Expectations
Assignments will include weekly readings and projects. Project mediums will be left open to student interest. Students will be expected to collaborate, to document their work, to make presentations and to discuss their ideas regularly in class.

Grading
Homework/preparedness 50%
Class Participation 20%
Final Project 30%

Books
Required Books
All books are available from the NYU Bookstore and are also on reserve at Bobst Library. If you have trouble acquiring these books please discuss with me.

  • Speech and Language Processing (Jurafsky and Martin) 2nd Edition (You can order the softcover online for under $50 but order ASAP!)
  • Algorithms of the Intelligent Web (Marmanis and Babenko) $27.50 http://www.manning.com/marmanis/
  • Lingpipe book (free PDF and code online) http://alias-i.com/lingpipe-book/index.html
  • You Are Not a Gadget (Jaron Lanier) ~$13
  • The Mind Within The Net (Manfred Spitzer) – Available for free online through Bobst

Suggested Books

  • Head First Java (if you are inexperienced with Java development); freely available through the NYU O’Reilly account
  • Artificial Intelligence 6th Edition, George F. Luger (For larger context of ML within AI)

Week 1. Introductions, tools of the trade

  • Introductions – who am I? who are you?
  • Intro to the idea of machine learning, where it is used
  • APIs we will use
  • Practical matters, get Java and Eclipse working together

Homework

  • Install Shiftspace
  • Request an invitation to the class google group
  • Read:
    • Computing Machinery and Intelligence, Alan Turing
    • Minds, Brains and Programs, John Searle

(both freely available as PDFs online)

  • Jurafsky and Martin ch. 1 (1st chapter ONLY is available free online)
  • Lingpipe Book ch. Getting Started (go through the examples). Try importing the examples into Eclipse using the “new project from Ant build file” technique we described in class.
  • Write:
    • A blog response to Turing and Searle
    • Post a link on HW wiki and be prepared to discuss in class

Week 2. Regular Expressions and Finite State Automata

  • Slides
  • Turing vs. Searle debate:
    • Teams will be assigned based on personal allegiances as much as possible.
    • You will have 10 min at the beginning of class to discuss a strategy with your team.
    • Each of you will get < 2 minutes to present an argument/rebuttal to what the other team has said.
  • Eliza
    • To run Eliza on a Mac, open Terminal
    • Type emacs at the prompt
    • Press Escape, then x - this opens the command prompt in Emacs (M-x)
    • Type doctor and press Return
    • Press ctrl-x ctrl-c to quit
  • Bruce Wilcox's Blue Mars, the most recent winner of the Loebner Prize
  • Elbot, the 2008 winner
  • ALICE-based Captain Kirk
  • Lynn Hershman
  • Stelarc
  • Regular Expressions are Finite State Automata
    • Regexes in Java, match and find examples
    • Simple Eliza demo
    • Useful online reference
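
The difference between matching and finding, and the way a single regular expression can drive an Eliza-style reply, can be sketched as follows. This is only an illustration and not the in-class demo: the class name RegexEliza, the I_AM rule, and the canned replies are all made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexEliza {
    // Hypothetical rule: capture whatever follows "I am" or "I'm",
    // ignoring any trailing punctuation.
    private static final Pattern I_AM =
        Pattern.compile("\\bI(?: am|'m) (.+?)[.!?]*$", Pattern.CASE_INSENSITIVE);

    /** Returns a canned Eliza-style reply based on one regex rule. */
    static String respond(String input) {
        Matcher m = I_AM.matcher(input.trim());
        if (m.find()) {
            // group(1) is the text captured after "I am".
            return "Why do you say you are " + m.group(1) + "?";
        }
        return "Please tell me more.";
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        // matches() succeeds only if the entire input fits the pattern.
        System.out.println(digits.matcher("abc 123").matches());
        // find() succeeds if any substring fits the pattern.
        System.out.println(digits.matcher("abc 123").find());
        System.out.println(respond("I am tired of regular expressions."));
    }
}
```

A fuller agent would keep an ordered list of such rules and fall through to a default reply when none applies.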

Homework

  • Read:
    • Jurafsky and Martin ch. 2
    • Lingpipe ch. Regular Expressions (import examples into Eclipse to run and feel free to test regexes in the online editor)
    • ELIZA--A Computer Program For the Study of Natural Language Communication Between Man and Machine, Joseph Weizenbaum (freely available as PDF online)
    • (optional) Luger ch. 3.0 – 3.2 Graph theory background and more on state space search
  • Code:
    • Program your own Eliza – use only regular expressions to create a conversational agent (it doesn’t have to be a therapist). Program may be command line interface or an applet. Post a link to code on HW wiki.

Week 3. Morphology, Stemming, Tokenization

  • Slides
  • Discussion of homework, a few lucky students present
  • Stemming- basic morphology, finite state transducers, common stemmers
    • Lingpipe book example: Porter Stemmer
  • Tokenization- sentences, words

Homework

  • Read:
    • Jurafsky and Martin ch. 3
    • Lingpipe Book ch. Tokenization (import examples into Eclipse to run)
  • Code:
    • Program a stop tokenizer to normalize an input sentence as if it were being passed to a search engine. What makes a good stop list? How does your stop list compare to Google's? How does Google compare to Yahoo! and Bing? How important do you think a good stop list is in web search? Blog your answers.
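
One possible starting point for the stop tokenizer, written in plain Java rather than with the Lingpipe tokenizer factories from the reading. The class name StopTokenizer and the eight-word stop list are assumptions chosen for illustration; deciding what belongs on the list is the actual assignment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopTokenizer {
    // Hypothetical stop list, deliberately tiny.
    private static final Set<String> STOP_WORDS = new HashSet<>(
        Arrays.asList("a", "an", "and", "the", "of", "to", "in", "is"));

    /** Lowercases the sentence, splits it into tokens, and drops stop words. */
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        // Split on runs of characters that are neither letters nor digits.
        for (String token : sentence.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [history, loebner, prize]
        System.out.println(tokenize("The history of the Loebner Prize"));
    }
}
```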

Week 4. N-Grams

Homework

  • Read:
    • Jurafsky and Martin ch. 4.0 - 4.3 (feel free to read more of this chapter if you want to)
    • When Software is the Sportswriter
    • Optional: Claude Shannon, “A Mathematical Theory of Communication,” 1948 (PDF available online for free). This is the seminal paper that originated the idea of text generation.
  • Code:
    • Write a program that takes one or more corpora of text as input and generates new text as output in the style of the original. Write a blog post including the generated text and your evaluation of it and post a link to this as well as your code on the HW wiki.
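
A minimal sketch of one way to approach this, assuming a bigram (two-word) model over whitespace-separated words. The class name BigramGenerator, the seed parameter, and the toy corpus are invented for this example; longer n-grams, smoothing, and sentence boundaries are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class BigramGenerator {
    // Maps each word to every word observed directly after it.
    private final Map<String, List<String>> successors = new HashMap<>();
    private final Random random;

    BigramGenerator(String corpus, long seed) {
        random = new Random(seed);
        String[] words = corpus.trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            // Duplicates are kept, so frequent bigrams are sampled more often.
            successors.computeIfAbsent(words[i], k -> new ArrayList<>()).add(words[i + 1]);
        }
    }

    /** Generates up to maxWords words, beginning with the given start word. */
    String generate(String start, int maxWords) {
        StringBuilder out = new StringBuilder(start);
        String current = start;
        for (int i = 1; i < maxWords; i++) {
            List<String> options = successors.get(current);
            if (options == null) {
                break; // no successor was ever observed for this word
            }
            current = options.get(random.nextInt(options.size()));
            out.append(' ').append(current);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        BigramGenerator g =
            new BigramGenerator("the cat sat on the mat and the cat ran", 42L);
        System.out.println(g.generate("the", 8));
    }
}
```

Every adjacent pair of words in the output occurs somewhere in the input, which is why the result tends to sound like the original locally while drifting globally.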

Week 5. Hidden Markov Models

  • Slides
  • Discussion of homework, a few lucky students present
  • Parts of Speech (POS)
  • Hidden Markov Models
  • Bayesian Inference
  • Corpora of POS tagged text
    • Brown Corpus (popular general purpose) download and info
    • GENIA Corpus (biomedical) download
    • MedPost (biomedical) download
    • Index of many many corpora of various stripes
  • Lingpipe POS Tagging Tutorial
  • My POS examples

Homework

  • Read:
    • Jurafsky and Martin ch. 6.0-6.5, 5.0-5.3, 5.5, 5.7
    • Lingpipe Book ch. Handlers, Parsers, Corpora, parts 1 and 2
  • Code:
    • Update your text generator from last week using POS tagging. Write a blog post including the generated text and your evaluation of it and post a link to this as well as your code on the HW wiki.
      OR visualize the grammar of a text of your choice. Write a blog post including images of the visualization, source code, and a description of your work, and post a link to this on the HW wiki.

Week 6. Collective Intelligence and Machine Learning

  • Discussion of homework, a few lucky students present
  • What is Collective Intelligence?
  • Supervised vs. Unsupervised Learning
  • Data Representation
  • Evaluating Results - Confusion Matrix - Precision vs. Recall
  • Algorithms of the Intelligent Web book examples download
  • If you want to try BeanShell
    • Check out Appendix A of the Intelligent Web book
    • Also the online manual
    • If you are on a Mac, check out the hack here to get the examples to work
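
For the evaluation topic above, precision and recall for a binary classifier can be read directly off the four cells of a confusion matrix. The sketch below assumes made-up counts and a hypothetical class name BinaryEvaluation.

```java
class BinaryEvaluation {
    /** Of everything predicted positive, the fraction that really was positive. */
    static double precision(int truePositives, int falsePositives) {
        int predictedPositive = truePositives + falsePositives;
        return predictedPositive == 0 ? 0.0 : (double) truePositives / predictedPositive;
    }

    /** Of everything actually positive, the fraction the classifier found. */
    static double recall(int truePositives, int falseNegatives) {
        int actualPositive = truePositives + falseNegatives;
        return actualPositive == 0 ? 0.0 : (double) truePositives / actualPositive;
    }

    public static void main(String[] args) {
        // Made-up confusion matrix counts for illustration.
        int tp = 40, fp = 10, fn = 20;
        System.out.printf("precision = %.3f%n", precision(tp, fp)); // 40 / 50
        System.out.printf("recall    = %.3f%n", recall(tp, fn));    // 40 / 60
    }
}
```

A classifier that labels everything positive gets perfect recall and poor precision, which is why the two numbers are reported together.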

Homework

  • Read:
    • Algorithms of the Intelligent Web ch. 1
    • Jaron Lanier “You are not a Gadget” ch. 1-3
    • Optional: Langdon Winner's essay Do Artifacts Have Politics?
  • Write:
    • A persuasive essay in response to Lanier’s ideas. Post it online. Be prepared to present your ideas next week in class

Week 7. Search

  • Slides
  • Lanier Discussion
  • Information Retrieval using Lucene
    • See Lingpipe examples in Appendix E of the Lingpipe book
  • Web Crawling
  • Link Analysis and Google’s PageRank algorithm
  • Algorithms of the Intelligent Web book examples download
    • Download book source code and add to Eclipse
    • You will need to add external archives to your build path for the project iweb2

(right click “referenced libraries” at the bottom of the project -> build config, add external archives, navigate to iweb2/lib and select ALL jars)
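
The link analysis idea can be sketched independently of the book's iweb2 code. What follows is a simplified power-iteration version of PageRank, not the implementation from the book or from Google: the class name PageRankSketch, the damping value of 0.85, the iteration count, and the three-page graph are assumptions for illustration.

```java
import java.util.Arrays;

class PageRankSketch {
    /**
     * outLinks[p] lists the pages that page p links to.
     * Returns one rank per page; the ranks sum to 1.
     */
    static double[] pageRank(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by pages with no outgoing links
            for (int page = 0; page < n; page++) {
                if (outLinks[page].length == 0) {
                    dangling += rank[page];
                } else {
                    // A page splits its rank evenly among the pages it links to.
                    double share = rank[page] / outLinks[page].length;
                    for (int target : outLinks[page]) {
                        next[target] += share;
                    }
                }
            }
            for (int page = 0; page < n; page++) {
                // Random jump term plus damped link-following term;
                // dangling rank is spread uniformly over all pages.
                next[page] = (1 - damping) / n + damping * (next[page] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
        int[][] graph = { {1, 2}, {2}, {0} };
        System.out.println(Arrays.toString(pageRank(graph, 0.85, 50)));
    }
}
```

In this toy graph page 2 ends up ranked above page 1 because it receives links from both other pages.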

Homework

  • Read:
  • Write:
    • A blog post describing your thoughts on the PageRank algorithm. Where does it excel? What are its shortcomings? How could it be improved? Code or pseudocode your ideas for improvements and post a link to all on the homework wiki.

Week 8. Recommendation Systems

Homework

  • Read:
    • Algorithms of the Intelligent Web ch. 3 (esp. 3.0 - 3.2 and 3.5, more if you are interested)
    • “The A.I. Revolution Is On” in Wired
  • Code:
    • Keep a diary for the week of your interactions with computer-generated suggestions. How much influence do these algorithms have on your behavior? If possible, try to get multiple perspectives on the options and compare them. Analyze one of these systems in depth and post a link to your discussion, and code if applicable, on the homework wiki.

Week 9. Clustering

Homework

  • Read:
    • Algorithms of the Intelligent Web ch. 4
    • Optional: Lingpipe Tutorial on Clustering
  • Code:
    • Collect a body of data using sensors or the web and analyze for clusters. Visualize your results in Processing. Blog about your results and post links to the wiki.

Week 10. Intro to Classification and Regression

Homework

  • Read:
    • Algorithms of the Intelligent Web ch. 5.0 – 5.3.1 (there is a typo in the formula on p. 182; it should read p(X|Y) = p(Y|X)p(X) / p(Y))
    • Algorithms of the Intelligent Web ch. 2.4
    • Jurafsky and Martin ch. 6.6.1 – 6.6.3
    • Lingpipe Book ch. 10 Classifiers and Evaluation through section 4
    • Lingpipe Book ch. 11 Naïve Bayes Classifiers
  • Write:
    • A final project proposal and post a link on the wiki. Be prepared to discuss in class next week.
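
As a quick check on the corrected formula in this week's reading, p(X|Y) = p(Y|X)p(X) / p(Y), here is a small worked example. The spam scenario and all three probabilities are made up for illustration, and the class name BayesRule is hypothetical.

```java
class BayesRule {
    /** Returns p(X|Y) given p(Y|X), p(X) and p(Y). */
    static double posterior(double pYGivenX, double pX, double pY) {
        return pYGivenX * pX / pY;
    }

    public static void main(String[] args) {
        // Made-up numbers: X = "message is spam", Y = "message contains the word".
        double pWordGivenSpam = 0.6;
        double pSpam = 0.2;
        double pWord = 0.15;
        // 0.6 * 0.2 / 0.15, which is approximately 0.8
        System.out.println(posterior(pWordGivenSpam, pSpam, pWord));
    }
}
```

Seeing the word raises the probability of spam from the prior of 0.2 to roughly 0.8, which is the update a Naïve Bayes classifier performs once per feature.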

Week 11. Natural Language Processing Guest Lecture and Student Project Ideas

  • Lingpipe's Breck Baldwin will come give a talk in class
  • Students present final project ideas

Week 12. Supervised Neural Networks

Homework

  • Read:
    • The Mind within the Net ch. 1-2 and 6
    • Algorithms of the Intelligent Web ch. 5.5
    • Optional: Algorithms of the Intelligent Web ch. 5.4, 5.6
  • Code:
    • Work on final projects

Week 13. Unsupervised Neural Networks

  • Self Organizing Maps
    • org.encog.examples.neural.gui.som (colors, dimensionality reduction/compression)
    • org.encog.examples.neural.som (simple illustration)
  • Hebbian Learning
    • com.heatonresearch.book.introneuralnet.ch4.hebb (simple illustration)
  • Bidirectional Associative Memories
    • org.encog.examples.neural.bam (names -> phone numbers)
  • Hopfield Attractor Networks
    • org.encog.examples.neural.hopfield (pattern recognition)
  • Spurious Memories
  • Discuss final project issues/workshop
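
To connect Hebbian learning with Hopfield attractor networks, here is a small sketch in plain Java that does not use the Encog examples listed above. The class name HopfieldSketch, the six-unit pattern, and the number of sweeps are assumptions for illustration.

```java
import java.util.Arrays;

class HopfieldSketch {
    private final int size;
    private final double[][] weights;

    HopfieldSketch(int size) {
        this.size = size;
        this.weights = new double[size][size];
    }

    /** Hebbian storage: units that are active together get a stronger connection. */
    void train(int[] pattern) {
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                if (i != j) { // no self-connections
                    weights[i][j] += pattern[i] * pattern[j];
                }
            }
        }
    }

    /** Updates units one at a time until the given number of sweeps is done. */
    int[] recall(int[] input, int sweeps) {
        int[] state = input.clone();
        for (int s = 0; s < sweeps; s++) {
            for (int i = 0; i < size; i++) {
                double sum = 0.0;
                for (int j = 0; j < size; j++) {
                    sum += weights[i][j] * state[j];
                }
                // A unit keeps its current value when the input sum is exactly zero.
                if (sum > 0) {
                    state[i] = 1;
                } else if (sum < 0) {
                    state[i] = -1;
                }
            }
        }
        return state;
    }

    public static void main(String[] args) {
        int[] stored = { 1, -1, 1, -1, 1, -1 };   // bipolar pattern (+1 / -1)
        int[] noisy  = { 1, -1, 1, -1, 1,  1 };   // last unit flipped
        HopfieldSketch net = new HopfieldSketch(stored.length);
        net.train(stored);
        System.out.println(Arrays.toString(net.recall(noisy, 5)));
    }
}
```

With a single stored pattern the network also settles into the inverted pattern when started close to it, a simple instance of the spurious memories on the topic list.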

Homework

  • Read:
    • The Mind within the Net ch. 3, 5 and 8
  • Code:
    • Finish final projects, prepare a 5 min. presentation.
    • Form:
      • One liner – What did you do?
      • Content – Why did you do it? Who is the audience? How does it engage with the theoretical concerns we have discussed in this class?
      • Demo
      • Comments/critique

Week 14. Final Project Presentations

  Page last modified on April 28, 2011, at 04:44 PM