Class CbclReader

java.lang.Object
picard.illumina.parser.readers.BaseBclReader
picard.illumina.parser.readers.CbclReader
All Implemented Interfaces:
htsjdk.samtools.util.CloseableIterator<CbclData>, Closeable, AutoCloseable, Iterator<CbclData>

public class CbclReader extends BaseBclReader implements htsjdk.samtools.util.CloseableIterator<CbclData>
------------------------------------- CBCL Header -----------------------------------
Bytes 0 - 1: Version number, current version is 1 (unsigned 16-bit little-endian integer)
Bytes 2 - 5: Header size (unsigned 32-bit little-endian integer)
Byte 6: Number of bits per basecall (unsigned)
Byte 7: Number of bits per q-score (unsigned)

q-val mapping info
Bytes 0-3: Number of bins (B); zero indicates no mapping
B pairs of 4-byte values (if B > 0): {from, to}, {from, to}, {from, to}
from: quality score bin
to: quality score

Number of tile records: unsigned 32-bit little-endian integer

gzip virtual file offsets, one record per tile
Bytes 0-3: Tile number
Bytes 4-7: Number of clusters that were written into the current block (required due to bit-packed q-scores) (unsigned 32-bit integer)
Bytes 8-11: Uncompressed block size of the tile data (useful for sanity check when excluding non-PF clusters) (unsigned 32-bit integer)
Bytes 12-15: Compressed block size of the tile data (unsigned 32-bit integer)

non-PF clusters excluded flag
1: non-PF clusters are excluded
0: non-PF clusters are included
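
The header layout above can be read sequentially with a little-endian ByteBuffer. The following is a minimal sketch, not the code CbclReader itself uses: the class CbclHeaderSketch and its field names are hypothetical, and it assumes that every multi-byte field is little-endian, that the q-val mapping starts immediately after byte 7, and that the non-PF flag occupies a single byte after the last tile record.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical holder for the header fields described above; not part of Picard. */
final class CbclHeaderSketch {
    int version;
    long headerSize;
    int bitsPerBasecall;
    int bitsPerQScore;
    int[][] qualityBins;   // each row is {from, to}; empty when there is no mapping
    long[][] tiles;        // each row is {tileNumber, numClusters, uncompressedSize, compressedSize}
    boolean nonPfExcluded;

    static CbclHeaderSketch parse(byte[] headerBytes) {
        // Assumption: all multi-byte fields are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(headerBytes).order(ByteOrder.LITTLE_ENDIAN);
        CbclHeaderSketch h = new CbclHeaderSketch();

        h.version = Short.toUnsignedInt(buf.getShort());      // bytes 0-1
        h.headerSize = Integer.toUnsignedLong(buf.getInt());  // bytes 2-5
        h.bitsPerBasecall = Byte.toUnsignedInt(buf.get());    // byte 6
        h.bitsPerQScore = Byte.toUnsignedInt(buf.get());      // byte 7

        // q-val mapping: bin count B, then B {from, to} pairs.
        int bins = buf.getInt();
        h.qualityBins = new int[bins][2];
        for (int i = 0; i < bins; i++) {
            h.qualityBins[i][0] = buf.getInt(); // from: quality score bin
            h.qualityBins[i][1] = buf.getInt(); // to: quality score
        }

        // Tile records: four unsigned 32-bit values per tile.
        int tileCount = buf.getInt();
        h.tiles = new long[tileCount][4];
        for (int i = 0; i < tileCount; i++) {
            for (int field = 0; field < 4; field++) {
                h.tiles[i][field] = Integer.toUnsignedLong(buf.getInt());
            }
        }

        // Assumption: the flag is stored in one byte (1 = excluded, 0 = included).
        h.nonPfExcluded = buf.get() == 1;
        return h;
    }
}
```

Reading the unsigned fields through Short.toUnsignedInt and Integer.toUnsignedLong avoids negative values when the high bit is set, since Java has no unsigned primitive types.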

------------------------------------- CBCL File Content -----------------------------------

N blocks of gzip files, where N is the number of tiles.
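
One way to reach a single tile's data is to skip the header and the compressed blocks of the preceding tiles, then inflate exactly one block. The sketch below illustrates this with the standard java.util.zip classes. It assumes the blocks are stored back to back, directly after the header, in the same order as the tile records; the class CbclBlockSketch and its parameters are hypothetical and are not part of the CbclReader API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper that inflates the gzip block of one tile; not part of Picard. */
final class CbclBlockSketch {
    /**
     * @param file            the whole CBCL file in memory
     * @param headerSize      the header size field from the CBCL header
     * @param compressedSizes the compressed block size of each tile, in tile-record order
     * @param tileIndex       zero-based index of the tile record to inflate
     */
    static byte[] inflateBlock(byte[] file, long headerSize, long[] compressedSizes, int tileIndex)
            throws IOException {
        // Assumption: blocks are contiguous and follow the header immediately.
        long offset = headerSize;
        for (int i = 0; i < tileIndex; i++) {
            offset += compressedSizes[i];
        }
        int length = Math.toIntExact(compressedSizes[tileIndex]);

        // Bound the input to this tile's block so the next block is never read.
        try (GZIPInputStream in = new GZIPInputStream(
                     new ByteArrayInputStream(file, Math.toIntExact(offset), length));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

The length of the returned array can be compared with the uncompressed block size from the tile record as a sanity check.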

Each block consists of C basecall, quality score pairs, where C is the number of clusters for the given tile.

Each basecall, quality score pair has the following format (assuming 2 bits are used for the basecalls):
Bits 0-1: Basecalls (respectively [A, C, G, T] for [00, 01, 10, 11])
Bits 2 and up: Quality score (unsigned Q-bit little-endian integer, where Q is the number of bits per q-score)

For a two-bit quality score, this is two clusters per byte, where the bottom 4 bits are the first cluster and the higher 4 bits are the second cluster.
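
The bit layout above, for the case of 2 bits per basecall and 2 bits per q-score, can be unpacked as in the following sketch. The class CbclNibbleSketch and its method names are hypothetical and only illustrate the packing; the value returned by qualityBinAt is the raw bin, which would still need to be translated through the q-val mapping from the header when one is present.

```java
/** Hypothetical decoder for the 2-bit basecall / 2-bit q-score packing; not part of Picard. */
final class CbclNibbleSketch {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Extracts the 4-bit record for a cluster: even clusters use the low nibble, odd the high. */
    private static int nibbleAt(byte[] block, int cluster) {
        int packed = block[cluster / 2] & 0xFF;
        return (cluster % 2 == 0) ? (packed & 0x0F) : (packed >>> 4);
    }

    /** Bits 0-1 of the record: 00 = A, 01 = C, 10 = G, 11 = T. */
    static char baseAt(byte[] block, int cluster) {
        return BASES[nibbleAt(block, cluster) & 0x3];
    }

    /** Bits 2-3 of the record: the quality score bin (before any q-val mapping). */
    static int qualityBinAt(byte[] block, int cluster) {
        return (nibbleAt(block, cluster) >>> 2) & 0x3;
    }
}
```

For example, the byte 0xB4 (binary 1011 0100) decodes to base A with quality bin 1 for the first cluster and base T with quality bin 2 for the second.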