From: Andrew Lorimer
Date: Mon, 16 Sep 2019 12:14:40 +0000 (+1000)
Subject: [methods] expand probability (conf. intervals, sample proportions)
X-Git-Tag: yr12~28
X-Git-Url: https://git.lorimer.id.au/notes.git/diff_plain/496559bac54b297a9e4d6a27e3cb33e92f1c671e?ds=sidebyside

[methods] expand probability (conf. intervals, sample proportions)
---

diff --git a/methods/methods-collated.pdf b/methods/methods-collated.pdf
index ba671ee..776aa11 100644
Binary files a/methods/methods-collated.pdf and b/methods/methods-collated.pdf differ
diff --git a/methods/methods-collated.tex b/methods/methods-collated.tex
index b3b29d5..a3a1e79 100644
--- a/methods/methods-collated.tex
+++ b/methods/methods-collated.tex
@@ -42,6 +42,7 @@
   decorations.text,
   decorations.pathreplacing,
   decorations.text,
+  patterns,
   scopes
 }
 
@@ -89,6 +90,7 @@
 
 \definecolor{cas}{HTML}{e6f0fe}
 \definecolor{important}{HTML}{fc9871}
+\definecolor{highlight}{HTML}{ffb84d}
 \definecolor{dark-gray}{gray}{0.2}
 \definecolor{peach}{HTML}{e6beb2}
 \definecolor{lblue}{HTML}{e5e9f0}
diff --git a/methods/statistics-ref.pdf b/methods/statistics-ref.pdf
index 2cbc960..b0484e0 100644
Binary files a/methods/statistics-ref.pdf and b/methods/statistics-ref.pdf differ
diff --git a/methods/statistics-ref.tex b/methods/statistics-ref.tex
index 196455f..8731999 100644
--- a/methods/statistics-ref.tex
+++ b/methods/statistics-ref.tex
@@ -13,7 +13,7 @@
   \Pr(A) &= \Pr(A|B) \cdot \Pr(B) + \Pr(A|B^{\prime}) \cdot \Pr(B^{\prime})
 \end{align*}
 
-Mutually exclusive \(\implies \Pr(A \cup B) = 0\) \\
+Mutually exclusive: \(\Pr(A \cap B) = 0\) \\
 
 Independent events:
 \begin{flalign*}
@@ -24,17 +24,37 @@ Independent events:
 
 \subsection*{Combinatorics}
 
-\begin{itemize}
+\begin{itemize} \tightlist
   \item Arrangements \({n \choose k} = \frac{n!}{(n-k)}\)
-  \item \colorbox{important}{Combinations} \({n \choose k} = \frac{n!}{k!(n-k)!}\)
+  \item \colorbox{highlight}{Combinations} \({n \choose k} = \frac{n!}{k!(n-k)!}\)
   \item Note \({n \choose k} = {n \choose k-1}\)
 \end{itemize}
 
 \subsection*{Distributions}
 
-\subsubsection*{Mean \(\mu\)}
+\begin{tikzpicture}
+  \begin{axis}[axis lines=left,
+    ticks=none,
+    xmin=0,
+    ymax=0.5,
+    enlargelimits=upper,
+    ylabel={\(\Pr(X=x)\)},
+    xlabel={\(x\)},
+    every axis x label/.style={at={(current axis.right of origin)},anchor=north west},
+    every axis y label/.style={at={(axis description cs:-0.02,0.5)}, anchor=south west, rotate=90},
+    ]
+    \fill[pattern=north east lines, pattern color=orange] (0,0) -- plot[domain=0:1.68, samples=50] function {abs(x)*exp(-x)} -- (1.68,0) -- cycle;
+    \fill[pattern=north west lines, pattern color=red] (1.68,0) -- plot[domain=1.68:5, samples=50] function {abs(x)*exp(-x)} -- (5,0) -- cycle;
+    \draw[dashed, blue, very thick] (axis cs:1.68,0) -- (axis cs:1.68,0.31) node [above, anchor=south west, black] {Median};
+    \draw[dashed, blue, very thick] (axis cs:2,0) -- (axis cs:2,0.27) node [above, anchor=west, black] {Mean};
+    \draw[dashed, blue, very thick] (axis cs:1,0) -- (axis cs:1,0.365) node [above, black] {Mode};
+    \node at (1,0.18) {\textbf{50\%}};
+    \node at (3.1,0.08) {\textbf{50\%}};
+    \addplot[thick, black, no markers, samples=200, domain=0:5] {abs(x)*exp(-x)};
+  \end{axis}
+\end{tikzpicture}
 
-\textbf{Mean} \(\mu\) or \textbf{expected value} \(E(X)\)
+\subsubsection*{Mean \(\mu\)}
 
 \begin{align*}
   E(X) &= \frac{\Sigma \left[ x \cdot f(x) \right]}{\Sigma f} \tag{\(f =\) absolute frequency} \\
@@ -44,13 +64,31 @@ Independent events:
 
 \subsubsection*{Mode}
 
-Most popular value (has highest probability of all \(X\) values). Multiple modes can exist if \(>1 \> X\) value have equal-highest probability. Number must exist in distribution.
+Value of \(X\) which has the highest probability
+
+\begin{itemize} \tightlist
+  \item Most popular value in discrete distributions
+  \item Must exist in distribution
+  \item Represented by local max in pdf
+  \item Multiple modes exist when \(>1 \> X\) value have equal-highest probability
+\end{itemize}
 
 \subsubsection*{Median}
 
-If \(m > 0.5\), then value of \(X\) that is reached is the median of \(X\). If \(m = 0.5 = 0.5\), then \(m\) is halfway between this value and the next. To find \(m\), add values of \(X\) from smallest to alrgest until the sum reaches 0.5.
+Value separating lower and upper half of distribution area
 
-\[ m = X \> \text{such that} \> \int_{-\infty}^{m} f(x) dx = 0.5 \]
+\textbf{Continuous:}
+\[ m = X \> \text{such that} \> \int_{-\infty}^{m} f(x) \> dx = 0.5 \]
+
+\textbf{Discrete:} (not in course)
+\begin{itemize} \tightlist
+  \item Does not have to exist in distribution
+  \item Add values of \(X\) smallest to largest until sum is \(\ge 0.5\)
+  \item If \(X_1 < 0.5 < X_2\), then median is the average of \(X_1\) and \(X_2\)
+    \begin{itemize}\tightlist
+      \item If \(m > 0.5\), then value of \(X\) that is reached is the median of \(X\)
+    \end{itemize}
+\end{itemize}
 
 \subsubsection*{Variance \(\sigma^2\)}
 
@@ -72,7 +110,7 @@ If \(m > 0.5\), then value of \(X\) that is reached is the median of \(X\). If \
 \subsection*{Binomial distributions}
 
 Conditions for a \textit{binomial distribution}:
-\begin{enumerate}
+\begin{enumerate} \tightlist
   \item Two possible outcomes: \textbf{success} or \textbf{failure}
   \item \(\Pr(\text{success})\) (=\(p\)) is constant across trials
   \item Finite number \(n\) of independent trials
@@ -115,7 +153,17 @@ A continuous random variable \(X\) has a pdf \(f\) such that:
 
 \begin{cas}
   Define piecewise functions: \\
-  Math3 \(\rightarrow\)
+  \-\hspace{1em}Math3 \(\rightarrow\)
+  \begin{tikzpicture}%
+    \draw rectangle (0.5,0.5);
+    \node at (0.08,0.25) {\(\{\)};
+    \filldraw [black] (0.15, 0.4) rectangle(0.25, 0.3);
+    \draw (0.35, 0.4) rectangle(0.45, 0.3);
+    \node [font=\footnotesize] at (0.3,0.3) {\verb;,;};
+    \draw (0.15, 0.2) rectangle(0.25, 0.1);
+    \node [font=\footnotesize] at (0.3,0.1) {\verb;,;};
+    \draw (0.35, 0.2) rectangle(0.45, 0.1);
+  \end{tikzpicture}
   % TODO: finish this section
 \end{cas}
 
@@ -197,6 +245,20 @@ For a new distribution with mean of \(n\) trials, \(\operatorname{E}(X^\prime) =
 
 \end{cas}
 
+\subsection*{Population sampling}
+
+\subsubsection*{Population proportion}
+
+\[ p = \dfrac{n \text{ with attribute in population}}{\text{population size}} \]
+
+Constant for a given population.
+
+\subsection*{Sample proportion}
+
+\[ \hat{p} = \dfrac{n \text{ with attribute in sample}}{\text{sample size}} \]
+
+Varies with each sample.
+
 \subsection*{Normal distributions}
 
 
@@ -217,6 +279,11 @@ Normal distributions must have area (total prob.) of 1 \(\implies \int^\infty_{-
   \item \(C\)\% confidence interval \(\implies\) \(C\)\% of samples will contain population mean \(\mu\)
 \end{itemize}
 
+\begin{cas}
+  Menu \(\rightarrow\) Stats \(\rightarrow\) Calc \(\rightarrow\) Interval \\
+  Set \textit{Type = One-Sample Z Int} \\ \-\hspace{1em} and select \textit{Variable}
+\end{cas}
+
 \subsubsection*{95\% confidence interval}
 
 For 95\% c.i. of population mean \(\mu\):
@@ -230,10 +297,9 @@ where:
   \item \(n\) is the sample size from which \(\overline{x}\) was calculated
 \end{description}
 
-\begin{cas}
-  Menu \(\rightarrow\) Stats \(\rightarrow\) Calc \(\rightarrow\) Interval \\
-  Set \textit{Type = One-Sample Z Int} \\ \-\hspace{1em} and select \textit{Variable}
-\end{cas}
+\subsubsection*{Confidence interval of \(p\) from \(\hat{p}\)}
+
+\[ x \in \left( \hat{p} \pm Z \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \right) \]
 
 \subsection*{Margin of error}
 
diff --git a/methods/statistics.pdf b/methods/statistics.pdf
index 27e441e..cf7a26c 100644
Binary files a/methods/statistics.pdf and b/methods/statistics.pdf differ
diff --git a/practice-exams.xlsx b/practice-exams.xlsx
index 0e491fb..24b6882 100644
Binary files a/practice-exams.xlsx and b/practice-exams.xlsx differ
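
A worked instance of the proportion interval added in the last statistics-ref.tex hunk may help when checking the typeset formula. This is only a sketch: the values \(\hat{p} = 0.4\) and \(n = 100\) are hypothetical, \(Z = 1.96\) is the usual 95\% value, and the result is written for \(p\) on the assumption that the patch's \(x\) stands for the population proportion. None of these numbers come from the patch.

```latex
% Hypothetical values, not from the patch: \hat{p} = 0.4, n = 100, Z = 1.96 (95% level).
% Requires amsmath for align*, which the notes already use.
\begin{align*}
  \hat{p} \pm Z \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}
    &= 0.4 \pm 1.96 \sqrt{\dfrac{0.4 \times 0.6}{100}} \\
    &\approx 0.4 \pm 1.96 \times 0.0490 \\
    &\approx 0.4 \pm 0.096 \\
  \implies p &\in (0.304,\ 0.496)
\end{align*}
```

The half-width \(0.096\) is the quantity the notes go on to call the margin of error, so the same arithmetic can be reused there.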