
MTH5P2A: Probability Theory

Introduction to Probability Theory

The statistician is fundamentally concerned with drawing conclusions (or inferences) from experiments involving uncertainty. For these conclusions and inferences to be reasonably accurate, an understanding of probability theory is essential.

In this section, we shall develop the concept of probability with equally likely outcomes.

Experiment, Sample Space and Event

Experiment: This is any process of observation or procedure that:

(1) Can be repeated (theoretically) an infinite number of times; and

(2) Has a well-defined set of possible outcomes.

Sample space: This is the set of all possible outcomes of an experiment.

Event: This is a subset of the sample space of an experiment.

Consider the following illustrations:

Experiment 1: Tossing a coin.

Sample space: S = {Head, Tail}, or equivalently S = {0, 1}, where the two numbers label the two possible outcomes.

Random Experiment
A random experiment is a physical situation whose outcome cannot be predicted until it is observed.

Sample Space
A sample space is the set of all possible outcomes of a random experiment.

Random Variables
A random variable is a variable whose possible values are numerical outcomes of a random experiment. There are two types of random variables.
1. A discrete random variable is one which may take on only a countable number of distinct values, such as 0, 1, 2, 3, 4, …. Discrete random variables are usually (but not necessarily) counts.
2. A continuous random variable is one which takes an uncountably infinite number of possible values. Continuous random variables are usually measurements.
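As a quick illustration, the two kinds of random variable can be sampled in Python (a minimal sketch; the variable names are our own):

```python
import random

# Discrete random variable: the number of heads in 10 coin flips (a count).
heads = sum(random.randint(0, 1) for _ in range(10))

# Continuous random variable: a measurement drawn uniformly from [0, 1).
measurement = random.random()

print(heads)        # an integer between 0 and 10
print(measurement)  # a float in [0, 1)
```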

Probability
Probability is the measure of the likelihood that an event will occur in a Random Experiment. Probability is quantified as a number between 0 and 1, where, loosely speaking, 0 indicates impossibility and 1 indicates certainty. The higher the probability of an event, the more likely it is that the event will occur.
Example
A simple example is the tossing of a fair (unbiased) coin. Since the coin is fair, the two outcomes (“heads” and “tails”) are both equally probable; the probability of “heads” equals the probability of “tails”; and since no other outcomes are possible, the probability of either “heads” or “tails” is 1/2 (which could also be written as 0.5 or 50%).
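The fair-coin probability can also be checked empirically with a short simulation (a sketch; the sample size and seed are arbitrary choices):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
n = 100_000
heads = sum(random.choice(["H", "T"]) == "H" for _ in range(n))

p_heads = heads / n
print(round(p_heads, 2))  # close to 0.5, as the theory predicts
```

For large n the relative frequency of heads settles near 1/2, which is exactly the "equally likely outcomes" idea above.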

Conditional Probability
Conditional probability is a measure of the probability of an event given that (by assumption, presumption, assertion or evidence) another event has already occurred. If the event of interest is A and the event B is known or assumed to have occurred, "the conditional probability of A given B" is usually written as P(A|B), and for P(B) > 0 it is computed as P(A|B) = P(A ∩ B) / P(B).
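For instance, for a fair die with A = "roll a 6" and B = "roll an even number", P(A|B) = P(A ∩ B) / P(B) can be computed by counting equally likely outcomes (a minimal sketch; the choice of events is ours):

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # fair six-sided die
A = {6}                          # "roll a 6"
B = {2, 4, 6}                    # "roll an even number"

n = len(sample_space)
p_B = Fraction(len(B), n)
p_A_and_B = Fraction(len(A & B), n)
p_A_given_B = p_A_and_B / p_B    # P(A|B) = P(A ∩ B) / P(B)
print(p_A_given_B)  # 1/3
```

Knowing the roll was even shrinks the sample space to {2, 4, 6}, so the chance of a 6 rises from 1/6 to 1/3.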

Independence
Two events are said to be independent of each other if the probability that one event occurs in no way affects the probability of the other event occurring; in other words, an observation about one event does not change the probability of the other. For independent events A and B, the following holds: P(A ∩ B) = P(A) × P(B).

Example

Let’s say you rolled a die and flipped a coin. The probability of getting any particular face on the die in no way influences the probability of getting a head or a tail on the coin.
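The die-and-coin example can be verified by enumerating the 12 equally likely outcomes (a sketch; the events A = "die shows 3" and B = "coin shows heads" are our choices):

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), ["H", "T"]))  # 12 equally likely outcomes
A = {(d, c) for (d, c) in sample_space if d == 3}      # die shows 3
B = {(d, c) for (d, c) in sample_space if c == "H"}    # coin shows heads

n = len(sample_space)
p_A = Fraction(len(A), n)
p_B = Fraction(len(B), n)
p_AB = Fraction(len(A & B), n)
print(p_AB == p_A * p_B)  # True: the events are independent
```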

Conditional Independence
Two events A and B are conditionally independent given a third event C precisely if the occurrence of A and the occurrence of B are independent events in their conditional probability distribution given C. In other words, A and B are conditionally independent given C if and only if, given knowledge that C already occurred, knowledge of whether A occurs provides no additional information on the likelihood of B occurring, and knowledge of whether B occurs provides no additional information on the likelihood of A occurring.

Example
A box contains two coins: a regular coin and a fake two-headed coin (P(H) = 1). I choose a coin at random and toss it twice.
Let
A = First coin toss results in an H.
B = Second coin toss results in an H.
C = Coin 1 (regular) has been selected.
If C is already observed, i.e. we already know whether the regular coin was selected or not, then the events A and B become independent, because the outcome of one toss no longer affects the probability of the other.
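The two-coin example can be checked exactly by enumerating the outcomes with their probabilities (a sketch of the computation described above):

```python
from fractions import Fraction

half = Fraction(1, 2)

# Outcomes are triples (coin, toss1, toss2) with their probabilities.
# The regular coin has P(H) = 1/2; the fake coin has P(H) = 1.
outcomes = {}
for coin, p_h in [("regular", half), ("fake", Fraction(1))]:
    for t1 in "HT":
        for t2 in "HT":
            p1 = p_h if t1 == "H" else 1 - p_h
            p2 = p_h if t2 == "H" else 1 - p_h
            outcomes[(coin, t1, t2)] = half * p1 * p2  # coin chosen w.p. 1/2

def prob(pred):
    return sum(p for o, p in outcomes.items() if pred(o))

# Unconditionally, A and B are NOT independent: seeing a first head
# makes the fake coin more likely, which makes a second head more likely.
p_A = prob(lambda o: o[1] == "H")
p_B = prob(lambda o: o[2] == "H")
p_AB = prob(lambda o: o[1] == "H" and o[2] == "H")
print(p_AB == p_A * p_B)  # False

# Given C (the regular coin was selected), A and B are independent.
p_C = prob(lambda o: o[0] == "regular")
p_A_C = prob(lambda o: o[0] == "regular" and o[1] == "H") / p_C
p_B_C = prob(lambda o: o[0] == "regular" and o[2] == "H") / p_C
p_AB_C = prob(lambda o: o[0] == "regular" and o[1] == "H" and o[2] == "H") / p_C
print(p_AB_C == p_A_C * p_B_C)  # True
```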

Expectation
The expectation of a random variable X is written as E(X). If we observe N random values of X, then the mean of the N values will be approximately equal to E(X) for large N. In more concrete terms, the expectation is what you would expect the outcome of an experiment to be on average if you repeated the experiment a large number of times. For a discrete random variable, E(X) = Σ x · p(x). For a fair six-sided die, each face has probability 1/6, so E(X) = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5.

So the expectation is 3.5. If you think about it, 3.5 is halfway between the possible values the die can take, and so this is what you should have expected.
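The die calculation can be reproduced directly (a minimal sketch using exact arithmetic):

```python
from fractions import Fraction

faces = range(1, 7)
# E(X) = sum of x * p(x), with p(x) = 1/6 for each face of a fair die.
expectation = sum(Fraction(x, 6) for x in faces)
print(expectation)         # 7/2
print(float(expectation))  # 3.5
```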

Variance
The variance of a random variable X is a measure of how concentrated the distribution of X is around its mean. It is defined as Var(X) = E[(X − E(X))²] = E(X²) − (E(X))².
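Continuing the fair-die example, the variance follows from Var(X) = E(X²) − (E(X))² (a minimal sketch):

```python
from fractions import Fraction

faces = range(1, 7)
mean = sum(Fraction(x, 6) for x in faces)                # E(X) = 7/2
second_moment = sum(Fraction(x * x, 6) for x in faces)   # E(X^2) = 91/6
variance = second_moment - mean ** 2                     # E(X^2) - E(X)^2
print(variance)         # 35/12
print(float(variance))  # about 2.92
```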

Probability Distribution
A probability distribution is a mathematical function that maps each possible outcome of a random experiment to its associated probability. Its form depends on whether the random variable X is discrete or continuous.
1. Discrete probability distribution: a discrete probability function p(x) satisfies p(x) ≥ 0 for every value x, and Σ p(x) = 1 over all possible values. This is referred to as a probability mass function (PMF).

2. Continuous probability distribution: a continuous probability function f(x) satisfies f(x) ≥ 0 for every x, and ∫ f(x) dx = 1 over the whole range of X. This is referred to as a probability density function (PDF).
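Both sets of properties can be checked numerically for simple distributions (a sketch; the uniform density on [0, 1] is our choice of example):

```python
# PMF of a fair die: p(x) = 1/6 for x in {1, ..., 6}.
pmf = {x: 1 / 6 for x in range(1, 7)}
assert all(p >= 0 for p in pmf.values())      # non-negativity
assert abs(sum(pmf.values()) - 1.0) < 1e-9    # probabilities sum to 1

# PDF of the uniform distribution on [0, 1]: f(x) = 1 there, 0 elsewhere.
# Approximate the integral of f with a Riemann sum and check it is 1.
n = 10_000
integral = sum(1.0 * (1 / n) for _ in range(n))
print(abs(integral - 1.0) < 1e-6)  # True
```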

Joint Probability Distribution
If X and Y are two random variables, the probability distribution that defines their simultaneous behaviour during the outcomes of a random experiment is called a joint probability distribution. The joint distribution function of X and Y is defined as F(x, y) = P(X ≤ x, Y ≤ y).

In general, if there are n random variables and variable i can take v_i different values, then the joint distribution table has v_1 × v_2 × … × v_n rows in total.
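For example, the number of rows in a joint distribution table is the product of the variables' value counts (a sketch with hypothetical value sets):

```python
from itertools import product

# Hypothetical example: three random variables with 2, 3 and 2 possible values.
values = [["a", "b"], [0, 1, 2], ["x", "y"]]

rows = list(product(*values))  # one row per combination of values
print(len(rows))  # 2 * 3 * 2 = 12
```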

Conditional Probability Distribution (CPD)
If Z is a random variable that depends on other variables X and Y, then the distribution P(Z|X,Y) is called the CPD of Z with respect to X and Y. It means that for every possible combination of values of the random variables X and Y, we represent a probability distribution over Z.
Example
There is a student with a property called ‘Intelligence’, which can be either low (I_0) or high (I_1). He/she enrols in a course; the course has a property called ‘Difficulty’, which can take the binary values easy (D_0) or difficult (D_1). The student then gets a ‘Grade’ in the course based on his/her performance, and the grade can take three values: G_1 (best), G_2, and G_3 (worst). The CPD P(G|I,D) then gives, for each (I, D) combination, a distribution over the grades.
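A CPD such as P(G|I,D) can be stored as a table keyed by the parent values; the probabilities below are purely illustrative, not taken from the text:

```python
# Hypothetical CPD P(G | I, D): for every (Intelligence, Difficulty) pair,
# the probabilities over the three grades must sum to 1.
cpd = {
    ("I_0", "D_0"): {"G_1": 0.30, "G_2": 0.40, "G_3": 0.30},
    ("I_0", "D_1"): {"G_1": 0.05, "G_2": 0.25, "G_3": 0.70},
    ("I_1", "D_0"): {"G_1": 0.90, "G_2": 0.08, "G_3": 0.02},
    ("I_1", "D_1"): {"G_1": 0.50, "G_2": 0.30, "G_3": 0.20},
}

# Every row of a CPD is itself a probability distribution.
for parents, dist in cpd.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9

print(cpd[("I_1", "D_1")]["G_1"])  # 0.5
```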

There are a number of operations that one can perform over any probability distribution to get interesting results. Some of the important operations are as below.

  1. Conditioning/Reduction
    Suppose we have a probability distribution over n random variables X1, X2, …, Xn and we make an observation that k of the variables have acquired certain values a1, a2, …, ak; that is, we already know their assignment. The rows in the joint distribution that are not consistent with the observation can then simply be removed, leaving us with a smaller number of rows. This operation is known as reduction.
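Reduction amounts to filtering the rows of the joint distribution (a minimal sketch over two hypothetical binary variables):

```python
from itertools import product

# Hypothetical joint distribution over two binary variables X1, X2
# (uniform, for simplicity): each of the 4 rows has probability 1/4.
joint = {(x1, x2): 0.25 for x1, x2 in product([0, 1], repeat=2)}

# Reduction: observe X1 = 1, and keep only the consistent rows.
reduced = {row: p for row, p in joint.items() if row[0] == 1}
print(sorted(reduced))  # [(1, 0), (1, 1)]
```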

2. Marginalisation
This operation takes a probability distribution over a large set of random variables and produces a probability distribution over a smaller subset of those variables; it is known as marginalising out the remaining variables. The operation is very useful when we have a large set of random variables as features and are interested in how a smaller subset of them affects the output.
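Marginalisation amounts to summing out the unwanted variables (a sketch with hypothetical probabilities):

```python
# Hypothetical joint distribution P(X1, X2) over two binary variables.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

# Marginalise out X2: P(X1 = x1) = sum over x2 of P(x1, x2).
marginal = {}
for (x1, x2), p in joint.items():
    marginal[x1] = marginal.get(x1, 0.0) + p

print(marginal)  # approximately {0: 0.3, 1: 0.7}
```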

Factor
A factor is a function or table that takes a number of random variables {X_1, X_2, …, X_n} as arguments and produces a real number as output. The set of input random variables is called the scope of the factor. For example, a joint probability distribution is a factor that takes each possible combination of values of the random variables as input and produces a probability value for that combination, which is a real number. Factors are the fundamental building block for representing distributions in high dimensions, and they support all the basic operations that joint distributions can be operated on with, such as product, reduction and marginalisation.

Factor Product
We can take the product of two factors, and the result is also a factor.
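A factor product can be sketched as table-by-table multiplication over the union of the scopes (the factors phi1 and phi2 and their values below are hypothetical):

```python
from itertools import product

# Hypothetical factors phi1(A, B) and phi2(B, C) over binary variables.
phi1 = {(a, b): 1.0 + a + b for a, b in product([0, 1], repeat=2)}
phi2 = {(b, c): 2.0 * b + c for b, c in product([0, 1], repeat=2)}

# Factor product: psi(A, B, C) = phi1(A, B) * phi2(B, C).
# The scope of psi is the union of the two scopes, {A, B, C}.
psi = {(a, b, c): phi1[(a, b)] * phi2[(b, c)]
       for a, b, c in product([0, 1], repeat=3)}

print(psi[(1, 1, 0)])  # phi1(1,1) * phi2(1,0) = 3.0 * 2.0 = 6.0
```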


Assignment

Probability Theory Assignment

ASSIGNMENT: Probability Theory Assignment | MARKS: 10 | DURATION: 1 week, 3 days

 
