
Using Conditional Probability in SQL for Fraud Detection
Probability concepts, such as conditional probability and Bayes’ Theorem, are widely used in real-world applications like fraud detection, risk assessment, and recommendation systems. In this post, I’ll explore how to calculate the probability of fraud given a high-value transaction using SQL and apply probability principles directly to a database.
Understanding the Problem
I have a database table fraud_example, which contains transaction data with the following structure (I made it as an example):
- Transaction: The amount of the transaction
- Fraud: Whether the transaction was fraudulent (1 = Fraud, 0 = Not Fraud)
We want to calculate:
Probability of fraud given that a transaction amount is greater than $500.
We can formulate that as: P (Fraud | Transaction > 500)
First, let me explain briefly the probability:
In a simple word, if we did not have any information about incidence B, we could calculate simply P(A) as:
So, for calculating P(A) given that B is already happened (or P(A|B)) , since we have new information about B (we know it happened), our  whole possible outcomes limited to B (instead of S), and P(A) given that B is happened actually means the proportion of (A to n(B).
 Now let’s see the real example,
F: The transactions which are fraudulent
G: The transactions which are genuine
O: The transactions which are more than 500 $.
P(A)= The number of fraudulent transactions / Total transactions
P(B)= The number of genuine transactions / Total transactions
P(O) = The number of transactions > 500 / Total transactions
  The number of fraudulent transactions and over 500/ total transactions
  The number of genuine transactions and over 500/ total transactions
In this case what is being asked is:
P (Fraud | Transaction > 500) = P (F|O)
We also need P(O) which is: the number of transactions over 500 / total transactions
Â
Now let’s calculate it by SQL:
To compute this probability, we break it down into key counts:
- Total Transactions (nT)
- Total Fraudulent Transactions (nF)
- Total High-Value Transactions (amount > 500) (nO)
- Fraudulent High-Value Transactions (nF and nO)
We use a Common Table Expression (CTE) to first compute these counts and then calculate the conditional probability.
Example Output
In this example:
- There are 3 high-value transactions.
- 3 of them are fraudulent.
- The probability of fraud given a high-value transaction is 0.66 (66%).
Conclusion
Using SQL, I applied conditional probability to a real-world fraud detection problem. This method can be extended to other financial risk analyses, medical diagnoses, and recommendation engines. Integrating probability theory with SQL helps businesses make data-driven decisions directly within their databases.