Quality scores (`Qscore`) are frequently used in genome sequencing datasets to encode the probability of a basecalling or alignment error (`Perror`).
To span a wide range of values, the score is defined on a logarithmic scale: `Qscore = -10 * log10(Perror)`.
Since this formula is somewhat non-intuitive, here's a table of common values:

Phred Quality Score | Probability of Error | Accuracy |
---|---|---|
0 | 1 in 1 | 0% |
3 | ~1 in 2 | 49.9% |
10 | 1 in 10 | 90% |
20 | 1 in 100 | 99% |
30 | 1 in 1,000 | 99.9% |
40 | 1 in 10,000 | 99.99% |
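
The rows above can be reproduced with a short Python sketch of the formula and its inverse. The function names `phred_to_perror` and `perror_to_phred` are illustrative choices, not taken from any particular library.

```python
import math


def phred_to_perror(q: float) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def perror_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)


# Print the error probability and accuracy for the scores in the table.
for q in (0, 3, 10, 20, 30, 40):
    p = phred_to_perror(q)
    print(f"Q{q}: Perror = {p:.4g}, accuracy = {100 * (1 - p):.2f}%")
```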
Quality scores range from 0 to 93 (although some tools cap them at a lower value, such as 40 or 60) and are frequently encoded as a single character: `C = ASCII(Q + 33)`.
Thus, `C` spans all the printable, non-space ASCII characters, ranging from ASCII 33 to 126 (`!` to `~`).
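
As a minimal sketch of this encoding, the Python below converts between a list of integer scores and a quality string using the offset of 33 described above. The names `encode_quals`, `decode_quals`, and `PHRED_OFFSET` are hypothetical helpers introduced for illustration.

```python
PHRED_OFFSET = 33  # offset from the formula C = ASCII(Q + 33)
MAX_QUAL = 93      # highest score that still maps to a printable character (~)


def encode_quals(scores):
    """Encode integer quality scores (0 to 93) as a string of characters."""
    scores = list(scores)
    for q in scores:
        if not 0 <= q <= MAX_QUAL:
            raise ValueError(f"quality score out of range: {q}")
    return "".join(chr(q + PHRED_OFFSET) for q in scores)


def decode_quals(qual_string):
    """Decode a quality string back into a list of integer scores."""
    return [ord(c) - PHRED_OFFSET for c in qual_string]


print(encode_quals([0, 20, 40, 93]))
print(decode_quals("!5I~"))
```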
For reference, an ASCII table is provided below (or run `man ascii` in a terminal):

    Hex                 Decimal
       2 3 4 5 6 7         30 40 50 60 70 80 90 100 110 120
     -------------        ---------------------------------
    0:   0 @ P ` p      0:     (  2  <  F  P  Z   d   n   x
    1: ! 1 A Q a q      1:     )  3  =  G  Q  [   e   o   y
    2: " 2 B R b r      2:     *  4  >  H  R  \   f   p   z
    3: # 3 C S c s      3:  !  +  5  ?  I  S  ]   g   q   {
    4: $ 4 D T d t      4:  "  ,  6  @  J  T  ^   h   r   |
    5: % 5 E U e u      5:  #  -  7  A  K  U  _   i   s   }
    6: & 6 F V f v      6:  $  .  8  B  L  V  `   j   t   ~
    7: ' 7 G W g w      7:  %  /  9  C  M  W  a   k   u DEL
    8: ( 8 H X h x      8:  &  0  :  D  N  X  b   l   v
    9: ) 9 I Y i y      9:  '  1  ;  E  O  Y  c   m   w
    A: * : J Z j z
    B: + ; K [ k {
    C: , < L \ l |
    D: - = M ] m }
    E: . > N ^ n ~
    F: / ? O _ o DEL