Thomas Russell: Programmer & Designer
Have a browse of my thoughts!

Sherlock and Probability (in general)

I’ve recently been exploring HackerRank, a wonderful (and somewhat addictive) site with interactive programming problems and I came across one of the problems in the Probability section and thought that it would be interesting to share a slightly more probabilistic approach here.

The problem (loosely stated) is that given a binary string $S$, and two randomly selected indices $i,j\in \{1,\dots,N\}$, what is the probability that: $S_{i}=S_{j}=1$ and $|i-j|\leq K$, where $K \in \{1,\dots,N\}$.

Now in the HackerRank problem, we are given the string explicitly, and the problem reduces to a counting problem; we count all the pairs of indices which meet the criteria, and divide this number by the total number of pairs of indices ($N^{2}$)

General Probability-based Approach

However, we can take a slightly more abstract approach, where $S$ is a random string described by a probability distribution, rather than being given. We can write the problem as trying to find the following probability:

$$\mathbb{P}[S_{i} = 1 \cup S_{j} = 1 \cup |i – j| \leq K]$$

We note that these three things are independent, so using the identity $\mathbb{P}\left[\bigcup_{i=1}^{n}A_{i}\right]=\prod_{i=1}^{n}\mathbb{P}[A_{i}]$, we have:

$$\mathbb{P}[S_{i}=1 \cup S_{j} =1 \cup |i-j|\leq K] = \mathbb{P}[S_{i}=1]\mathbb{P}[S_{j}=1]\mathbb{P}\left[|i-j|\leq K\right]$$

Now if we examine $\mathbb{P}\left[|i-j|\leq K\right] = \mathbb{P}\left[-K \leq i-j \leq K\right]$, we see that we can write:

\mathbb{P}\left[-K \leq i-j \leq K\right] &= \sum_{i=1}^{N}p_{i}\mathbb{P}\left[j \in [\operatorname{max}(i-K, 1), \operatorname{min}(N, i+K)]\right] \\
&= \sum_{i=1}^{N}p_{i}\sum_{j=\operatorname{max}(i-K, 1)}^{\operatorname{min}(N,i+K)}p_{j}

Where $p_{k} \triangleq \mathbb{P}[i = k] \triangleq \mathbb{P}[j = k]$. So we now have our general formula:

$$\mathbb{P}\left[S_{i} = 1 \cup S_{j} = 1 \cup |i-j|\leq K\right] = \mathbb{P}[S_{i} = 1]\mathbb{P}[S_{j}=1]\sum_{i=1}^{N}p_{i}\sum_{j=\operatorname{max}(i-K, 1)}^{\operatorname{min}(N,i+K)}p_{j}$$

Random Boolean Strings

If we examine the case of a random Boolean string; that is one where $\mathbb{P}[S_{i}=1]=\frac{1}{2}$, $\forall i \in \{1,\dots,N\}$, and where we choose $i$ and $j$ from a discrete uniform distribution, $\mathcal{U}\{1,N\}$. We can thus find the probability:

$$\mathbb{P}\left[S_{i}=1 \cup S_{j} = 1 \cup |i-j|\leq K\right] = \frac{1}{4N^{2}}\sum_{i=1}^{N}\left\{\operatorname{min}(N, i+K) – \operatorname{max}(i-K, 1)\right\}$$

We can simplify this sum by examining 3 cases; firstly, the simple case where $N = K+1$, allowing the sum to simplify to $\sum_{i=1}^{N} (N-1)$ giving us:

$$P = \frac{N(N-1)}{4N^{2}} = \frac{N-1}{4N}$$

Where we have $P = \mathbb{P}\left[S_{i} = 1 \cup S_{j} = 1 \cup |i-j| \leq K\right]$ for brevity. If we take the case where $N = 2K + 1$, we note that we can split the sum into two parts:

$$\begin{align*}\sum_{i=1}^{N}\left\{\operatorname{min}(N, i+K) – \operatorname{max}(i-K,1)\right\} &= \sum_{i=1}^{K}(i+K-1) + \sum_{i=K+1}^{N}(N-i+K) \\
&= \frac{K}{2}(3K – 1) + \frac{N-K}{2}(N+K-1) \\
&= K^{2} + \frac{N}{2}(N-1)\end{align*}$$

In general, we can say the following:

$$\begin{align*}\sum_{i=1}^{N}\left\{\operatorname{min}(N, i+K) – \operatorname{max}(i-K,1)\right\} &= \sum_{i=1}^{K}(i+K-1)+2\sum_{i=K+1}^{N-K-1}K + \sum_{i=N-K}^{N}(N-i+K) \\
&= \frac{3}{2}K(K+1) + \frac{1}{2}(3K^2 – K) + 2K(N – 2K – 1) \\
&= -K(1+K-2N)\end{align*}$$

As far as I know this doesn’t have any real-world application, it was just a bit of fun; but I’d be interested to hear in the comments if anyone knows of a use?

Leave a Reply

Your email address will not be published. Required fields are marked *

Designed and Produced by Thomas Russell © 2014-2017

Log-in | Register