TimD.one

Phred Quality Scores

Quality scores (Q_score) are frequently used in genome sequencing datasets to encode the probability of a basecalling or alignment error (P_error). To span a wide range of values, the score is logarithmic: Q_score = -10·log₁₀(P_error), or equivalently P_error = 10^(-Q_score/10). Since this formula is somewhat non-intuitive, here's a table:

Phred Quality Score	Probability of Error	Accuracy
0	1 in 1	0%
3	~1 in 2	49.9%
10	1 in 10	90%
20	1 in 100	99%
30	1 in 1,000	99.9%
40	1 in 10,000	99.99%
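
To reproduce the table, here is a minimal Python sketch of the conversion in both directions. The function names are my own placeholders, not taken from any particular library, and scores are treated as plain floats.

```python
import math


def error_prob_to_phred(p_error: float) -> float:
    """Q_score = -10 * log10(P_error), for P_error in (0, 1]."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def phred_to_error_prob(q_score: float) -> float:
    """Inverse of the formula: P_error = 10 ** (-Q_score / 10)."""
    return 10.0 ** (-q_score / 10.0)


# Print one row per score listed in the table.
for q in (0, 3, 10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q={q:>2}  P_error={p:.4g}  accuracy={100 * (1 - p):.4g}%")
```

Every 10 points of Q_score corresponds to a tenfold drop in the error probability, which is why Q20 and Q30 are common filtering thresholds.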

Quality scores range from 0 to 93 (although some tools cap them at a lower value, such as 40 or 60) and are frequently encoded as a single character C = ASCII(Q+33). Thus, C spans all the printable, non-space ASCII characters, from ASCII 33 to 126 (! to ~). For reference, an ASCII table is provided below (or run man ascii in a terminal):

Hex                                   Decimal
0x 2 3 4 5 6 7       30 40 50 60 70 80 90 100 110 120
 -------------      ---------------------------------
0:   0 @ P ` p     0:    (  2  <  F  P  Z  d   n   x
1: ! 1 A Q a q     1:    )  3  =  G  Q  [  e   o   y
2: " 2 B R b r     2:    *  4  >  H  R  \  f   p   z
3: # 3 C S c s     3: !  +  5  ?  I  S  ]  g   q   {
4: $ 4 D T d t     4: "  ,  6  @  J  T  ^  h   r   |
5: % 5 E U e u     5: #  -  7  A  K  U  _  i   s   }
6: & 6 F V f v     6: $  .  8  B  L  V  `  j   t   ~
7: ' 7 G W g w     7: %  /  9  C  M  W  a  k   u  DEL
8: ( 8 H X h x     8: &  0  :  D  N  X  b  l   v
9: ) 9 I Y i y     9: '  1  ;  E  O  Y  c  m   w
A: * : J Z j z
B: + ; K [ k {
C: , < L \ l |
D: - = M ] m }
E: . > N ^ n ~
F: / ? O _ o DEL
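
The C = ASCII(Q+33) encoding above can be sketched in a few lines of Python. This is an illustration under the assumption that scores are integers from 0 to 93; the names PHRED_OFFSET, encode_quality and decode_quality are hypothetical, not from any existing tool.

```python
PHRED_OFFSET = 33  # the "+33" in C = ASCII(Q+33)
MAX_Q = 93         # 93 + 33 = 126, the last printable ASCII character (~)


def encode_quality(scores):
    """Turn a list of integer quality scores into a quality string."""
    for q in scores:
        if not 0 <= q <= MAX_Q:
            raise ValueError(f"quality score out of range: {q}")
    return "".join(chr(q + PHRED_OFFSET) for q in scores)


def decode_quality(chars):
    """Turn a quality string back into a list of integer scores."""
    scores = [ord(c) - PHRED_OFFSET for c in chars]
    for q in scores:
        if not 0 <= q <= MAX_Q:
            raise ValueError(f"character decodes to out-of-range score: {q}")
    return scores


print(encode_quality([0, 10, 20, 30, 40, 93]))  # !+5?I~
print(decode_quality("II5+!"))                  # [40, 40, 20, 10, 0]
```

The two functions are inverses, so decoding an encoded list of valid scores returns the original list.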