LearningBitByBitMathCheatSheet
Math Cheat Sheet

Helpful links for deciphering equations:

Sets

∀  Universal quantifier: for all, or for any.
∈  Set membership: is a member of.
∉  Set membership: is not a member of.

a ∈ S  Element a belongs to set S.
    Example: a = 1 and S = {1,2,3}
    Negation: a ∉ S

A ⊆ B  A is a subset of B.
    Example: A = {1,2,3} and B = {1,2,3,4,5,6}
    Negation: A ⊈ B

A ⊂ B  A is a proper subset of B: A is a subset of B but A ≠ B.
    Example: A = {1,2,3} and B = {1,2,3,4,5,6}

∅  Null, the empty set.

Ω  Universal set: the universe of a specific context, the sample space.
    Example: The universal set of English vowels {a,e,i,o,u} does not contain the letter z.

ℝ  The set of real numbers.

A = {x₁, x₂, …, xₙ}  Finite set.
A = {x₁, x₂, …}  Infinite set.

Sᶜ  The complement of set S: every element of Ω that is not in S.
    Example: If Ω = {a,e,i,o,u} and S = {a,e,i}, then Sᶜ = {o,u}

A \ B  Difference between sets A and B: the elements of A that are not in B.
    Example: A = {1,2,3} and B = {1,2,3,4,5,6}; B \ A = {4,5,6}, while A \ B = ∅

A ∪ B  The union of A and B: all elements occurring in A OR B.
    Example: A = {1,2,3} and B = {1,2,3,4,5,6}; the union is {1,2,3,4,5,6}

A ∩ B  The intersection of A and B: all elements occurring in A AND B.
    Example: A = {1,2,3} and B = {1,2,3,4,5,6}; the intersection is {1,2,3}

{x | K}  The set of all x such that x satisfies property K.
    Example: The set of all even integers, {k | k/2 is an integer}

Probability

P(X)  The probability of X. When all outcomes are equally likely:
    P(X) = (the number of ways the event can occur) / (the total number of possible outcomes, i.e. the size of the sample space)
    Example: The probability of rolling a 5 on a 6-sided die is 1 in 6, or 1/6.

P(W ∩ O)  AND, joint probability: the probability of W and O happening together.
P(W ∪ O)  OR: the probability of W or O happening.

Conditional Probability - Reasoning based on partial information

P(A|B)  The conditional probability of A given B.
Example: A six-sided die is rolled and you are told the outcome is even. What is the probability that a 6 was rolled?
P(A | B) = (number of elements of A ∩ B) / (number of elements of B)
In the above case, A is an outcome of 6 and B is an even outcome:
    A = {6}
    B = {2,4,6}
The intersection of sets A and B (the elements in common between sets A and B):
    A ∩ B = {6}
The number of elements in set A ∩ B = 1
The number of elements in set B = 3
So given our equation above, the conditional probability of A given B is 1/3:
    P(A | B) = 1/3

Chains of Probability

(Speech and Language Processing, p. 87)

{w₁, w₂, …, wₙ}  A sequence of n words (w), starting with word 1 and continuing to word n.

P(w₁, w₂, …, wₙ)  The joint probability that word 1 through word n will all occur together in sequence. This is calculated as
    P(w₁) * P(w₂|w₁) * P(w₃|w₁,w₂) * ... * P(wₙ|w₁,…,wₙ₋₁)
that is:
    the probability of word 1
    * the conditional probability of word 2 given word 1
    * the conditional probability of word 3 given the sequence word 1, word 2
    and so on through the last word:
    * the conditional probability of word n given the sequence of words 1, 2, …, n-1

Bigram

P(Wₙ | Wₙ₋₁)  The probability of a word (Wₙ) given the word that preceded it (Wₙ₋₁).

N-Gram

Given a sequence of words W, e.g.
W = (the, dog, runs)
For k = 1, while k <= n, take the product of:
    the probability of word k in W given word k-1 in W
    then increment k.
In symbols: ∏ (k = 1 to n) P(wₖ | wₖ₋₁)

Maximum Likelihood Estimation

Used to normalize n-gram counts.
    P(Wₙ | Wₙ₋₁) = C(Wₙ₋₁ Wₙ) / C(Wₙ₋₁)
Given a corpus of word sequences, the probability of word n given word n-1 is the frequency count for the bigram (word n-1 word n) divided by the frequency count for word n-1.

Similarity Metrics

Euclidean Distance

d(X, Y) = √( Σ (i = 1 to n) (Xᵢ - Yᵢ)² )
translates to:
    for i = 1, while i <= n, sum the following:
        (index i of X - index i of Y) squared
        increment i
    Take the square root of this sum.
Distance between points A and B in 2D space:
    √((A₁ - B₁)² + (A₂ - B₂)²)
This is easily extended into n-dimensional space as such:
    √((A₁ - B₁)² + (A₂ - B₂)² + (A₃ - B₃)² + (A₄ - B₄)² + … + (Aₙ - Bₙ)²)
where n is the number of dimensions of points A and B.
Find python translation of Euclidean distance formula in utilities.py

Pearson Correlation

ρ (rho, or just r): Given two vectors X and Y in n-dimensional space, find how close to linear the relationship between them is.
The correlation coefficient may take any value between
    -1.0 (inversely correlated), e.g. X = (1,2,3) and Y = (3,2,1)
    to 0.0 (no linear correlation), e.g. X = (1,2,3) and Y = (1,2,1)
    and +1.0 (perfectly correlated), e.g.
X = (1,2,3) and Y = (4,5,6)
You can always double-check your answers using the correl(X, Y) function in a spreadsheet application like Excel.

Deconstructed:
    x̄ - The mean (average value) of X
    Σ - Sum: for i = 1, while i <= n, sum xyz, then increment i

Given vector X and vector Y:
    For i = 1
        Subtract the mean of X from the ith index of X
        Subtract the mean of Y from the ith index of Y
        Multiply both values
        Add this to the running sum
        Increment i
    Repeat until i > n
    Divide the running sum by √( Σ(Xᵢ - x̄)² * Σ(Yᵢ - ȳ)² ) to get r.
Find python translation of this Pearson Correlation formula in utilities.py

Jaccard Index

Given 2 sets A and B, the Jaccard index is the size of the intersection of the sets divided by the size of the union of the sets.
Example:
    Set A = {1,2,3}
    Set B = {2,3,4}
    The intersection is {2,3}, size of 2
    The union is {1,2,3,4}, size of 4
    J = 2/4 = 0.5
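
The three similarity metrics above can be sketched in Python as follows. This is only a minimal illustration and not the contents of utilities.py, which is not shown on this page. The function names (euclidean_distance, pearson_correlation, jaccard_index) are hypothetical, and raising ValueError for undefined cases (mismatched lengths, a vector with zero variance, two empty sets) is an assumption rather than anything the cheat sheet specifies. The Jaccard function follows the worked example: intersection size divided by union size.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared differences, index by index."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def pearson_correlation(x, y):
    """Pearson r: summed products of deviations from the means, divided by
    the square root of the product of the summed squared deviations."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("vectors must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Running sum described in the "Deconstructed" steps.
    running_sum = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sq_dev_x = sum((xi - mean_x) ** 2 for xi in x)
    sq_dev_y = sum((yi - mean_y) ** 2 for yi in y)
    denominator = math.sqrt(sq_dev_x * sq_dev_y)
    if denominator == 0:
        # Assumption: treat a constant vector as an error, since r is undefined.
        raise ValueError("correlation is undefined when a vector has zero variance")
    return running_sum / denominator


def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets have no defined index here.
        raise ValueError("Jaccard index is undefined for two empty sets")
    return len(a & b) / len(union)
```

For instance, jaccard_index({1, 2, 3}, {2, 3, 4}) reproduces the 0.5 from the example above, and pearson_correlation((1, 2, 3), (3, 2, 1)) gives the -1.0 listed for inversely correlated vectors.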