Package picard.illumina
Class IlluminaBasecallsToSam
java.lang.Object
picard.cmdline.CommandLineProgram
picard.illumina.ExtractBarcodesProgram
picard.illumina.IlluminaBasecallsToSam
IlluminaBasecallsToSam transforms a lane of Illumina data files (bcl, locs, clocs, qseqs, etc.) into
SAM, BAM or CRAM format.
In this application, barcode data is read from Illumina data file groups, each of which is associated with a tile.
Each tile may contain data for any number of barcodes, and a single barcode's data may span multiple tiles. Once the
barcode data is collected from files, each barcode's data is written to its own SAM/BAM/CRAM. The barcode data must be
written in order; this means that barcode data from each tile is sorted before it is written to file, and that if a
barcode's data does span multiple tiles, data collected from each tile must be written in the order of the tiles
themselves.
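The ordering rule above can be illustrated with a minimal sketch. It is not Picard's implementation: the class name, the method name, and the use of read names as stand-ins for records are all assumptions made for illustration. The tile order is passed in explicitly because tiles are not necessarily processed in numerical order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the write-order rule; not part of Picard. */
final class TileOrderedWriteSketch {
    /**
     * Returns one barcode's records in the order they would be written:
     * tiles in the given tile order, records sorted within each tile.
     *
     * @param tileOrder     the order in which tiles must be written (assumed input)
     * @param recordsByTile tile number to unsorted read names for one barcode
     */
    static List<String> writeOrder(List<Integer> tileOrder,
                                   Map<Integer, List<String>> recordsByTile) {
        List<String> out = new ArrayList<>();
        for (Integer tile : tileOrder) {
            List<String> tileRecords = recordsByTile.get(tile);
            if (tileRecords == null) {
                continue; // this barcode has no data in this tile
            }
            List<String> sorted = new ArrayList<>(tileRecords);
            Collections.sort(sorted); // sort within the tile before writing
            out.addAll(sorted);       // then append in tile order
        }
        return out;
    }
}
```

Records from different tiles are never interleaved in this sketch; only the records inside a single tile are sorted.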
This class employs a number of private subclasses to achieve this goal. The TileReadAggregator controls the flow
of operation. It is fed a number of Tiles which it uses to spawn TileReaders. TileReaders are responsible for
reading Illumina data for their respective tiles from disk, and as they collect that data, it is fed back into the
TileReadAggregator. When a TileReader completes a tile, it notifies the TileReadAggregator, which reviews what was
read and conditionally queues its writing to disk, bearing in mind the write-order requirements described in the
previous paragraph. As writes complete, the TileReadAggregator re-evaluates the state of reads/writes and may queue
more writes. When all barcodes for all tiles have been written, the TileReadAggregator shuts down.
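The "conditionally queues" step can be sketched as follows. The tracker below is hypothetical and much simpler than the real TileReadAggregator: it follows a single barcode and treats a queued write as already done. It shows only the gating rule, which is that a tile becomes writable when it has been read and every earlier tile has already been released.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of write gating for one barcode; not part of Picard. */
final class WriteOrderTrackerSketch {
    private final List<Integer> tileOrder;
    private final Set<Integer> readTiles = new HashSet<>();
    private int nextToWrite = 0; // index into tileOrder of the next tile to release

    WriteOrderTrackerSketch(List<Integer> tileOrder) {
        this.tileOrder = new ArrayList<>(tileOrder);
    }

    /**
     * Records that a tile has been fully read and returns the tiles that
     * may now be queued for writing, in write order.
     */
    synchronized List<Integer> tileRead(int tile) {
        readTiles.add(tile);
        List<Integer> writable = new ArrayList<>();
        // Release the longest run of consecutive tiles that have been read.
        while (nextToWrite < tileOrder.size()
                && readTiles.contains(tileOrder.get(nextToWrite))) {
            writable.add(tileOrder.get(nextToWrite));
            nextToWrite++;
        }
        return writable;
    }

    /** True once every tile has been released for writing. */
    synchronized boolean isDone() {
        return nextToWrite == tileOrder.size();
    }
}
```

With tile order 1, 2, 3, finishing tile 3 first releases nothing; finishing tile 1 releases tile 1; finishing tile 2 then releases tiles 2 and 3 together.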
The TileReadAggregator controls task execution using a specialized ThreadPoolExecutor. It accepts special Runnables
of type PriorityRunnable which allow a priority to be assigned to the runnable. When the ThreadPoolExecutor is
assigning threads, it gives priority to those PriorityRunnables with higher priority values. In this application,
TileReaders are assigned lowest priority, and write tasks are assigned high priority. It is designed in this fashion
to minimize the amount of time data must remain in memory (write the data as soon as possible, then discard it from
memory) while maximizing CPU usage.
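One common way to build such a priority-aware executor is a ThreadPoolExecutor backed by a PriorityBlockingQueue. The sketch below uses that approach; whether Picard does the same is not stated here, and the class and field names (PriorityExecutorSketch, PriorityTask) are hypothetical.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a priority-ordered thread pool; not part of Picard. */
final class PriorityExecutorSketch {

    /** A Runnable with a priority; tasks with higher values are dequeued first. */
    static final class PriorityTask implements Runnable, Comparable<PriorityTask> {
        final int priority;
        final Runnable body;

        PriorityTask(int priority, Runnable body) {
            this.priority = priority;
            this.body = body;
        }

        @Override
        public void run() {
            body.run();
        }

        @Override
        public int compareTo(PriorityTask other) {
            // Reversed comparison: the queue hands out its "smallest" element
            // first, so a higher priority must compare as smaller.
            return Integer.compare(other.priority, this.priority);
        }
    }

    /**
     * Creates a fixed-size pool whose waiting tasks are ordered by priority.
     * Tasks must be PriorityTask instances passed to execute(); submit()
     * wraps tasks in a FutureTask, which is not Comparable.
     */
    static ThreadPoolExecutor newExecutor(int threads) {
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new PriorityBlockingQueue<Runnable>());
    }
}
```

In this scheme, write tasks would be given a larger priority value than read tasks, so a waiting write is picked up before a waiting read as soon as a thread is free. The order among tasks of equal priority is not guaranteed.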
-
Field Summary
Fields inherited from class picard.illumina.ExtractBarcodesProgram
BARCODE_COLUMN, BARCODE_NAME_COLUMN, BARCODE_PREFIXES, BARCODE_SEQUENCE_COLUMN, barcodeToMetrics, BASECALLS_DIR, COMPRESS_OUTPUTS, DISTANCE_MODE, INPUT_PARAMS_FILE, inputReadStructure, LANE, LIBRARY_NAME_COLUMN, MAX_MISMATCHES, MAX_NO_CALLS, METRICS_FILE, MIN_MISMATCH_DELTA, MINIMUM_BASE_QUALITY, MINIMUM_QUALITY, noMatchMetric, READ_STRUCTURE
Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, MAX_ALLOWABLE_ONE_LINE_SUMMARY_LENGTH, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, SYNTAX_TRANSITION_URL, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY
-
Constructor Summary
Constructors -
Method Summary
Methods inherited from class picard.illumina.ExtractBarcodesProgram
collectErrorMessages, createBarcodeExtractor, finalizeMetrics, outputMetrics, parseInputFile
Methods inherited from class picard.cmdline.CommandLineProgram
checkRInstallation, getCommandLine, getCommandLineParser, getCommandLineParserForArgs, getDefaultHeaders, getFaqLink, getMetricsFile, getPGRecord, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, requiresReference, setDefaultHeaders, useLegacyParser
-
Field Details
-
USAGE
-
BARCODES_DIR
@Argument(doc="The barcodes directory with _barcode.txt files (generated by ExtractIlluminaBarcodes). If not set, use BASECALLS_DIR. ", shortName="BCD", optional=true) public File BARCODES_DIR -
OUTPUT
@Argument(doc="Deprecated (use LIBRARY_PARAMS). The output SAM, BAM or CRAM file. Format is determined by extension.", shortName="O", mutex={"BARCODE_PARAMS","LIBRARY_PARAMS"}) public File OUTPUT -
RUN_BARCODE
-
SAMPLE_ALIAS
@Argument(doc="Deprecated (use LIBRARY_PARAMS). The name of the sequenced sample", shortName="ALIAS", mutex={"BARCODE_PARAMS","LIBRARY_PARAMS"}) public String SAMPLE_ALIAS -
READ_GROUP_ID
@Argument(doc="ID used to link RG header record with RG tag in SAM record. If these are unique in SAM files that get merged, merge performance is better. If not specified, READ_GROUP_ID will be set to <first 5 chars of RUN_BARCODE>.<LANE> .", shortName="RG", optional=true) public String READ_GROUP_ID -
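The default described in the doc string can be sketched as below. The helper name is hypothetical, and the handling of a RUN_BARCODE shorter than five characters is an assumption; the doc string does not say what happens in that case.

```java
/** Hypothetical sketch of the READ_GROUP_ID default; not part of Picard. */
final class ReadGroupIdSketch {
    /**
     * Builds "<first 5 chars of RUN_BARCODE>.<LANE>".
     * Assumption: a run barcode shorter than 5 characters is used whole.
     */
    static String defaultReadGroupId(String runBarcode, int lane) {
        String prefix = runBarcode.substring(0, Math.min(5, runBarcode.length()));
        return prefix + "." + lane;
    }
}
```

For example, an illustrative run barcode of "H7BATADXX" on lane 1 gives "H7BAT.1".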
LIBRARY_NAME
@Argument(doc="Deprecated (use LIBRARY_PARAMS). The name of the sequenced library", shortName="LIB", optional=true, mutex={"BARCODE_PARAMS","LIBRARY_PARAMS"}) public String LIBRARY_NAME -
SEQUENCING_CENTER
@Argument(doc="The name of the sequencing center that produced the reads. Used to set the @RG->CN header tag.") public String SEQUENCING_CENTER -
RUN_START_DATE
-
PLATFORM
@Argument(doc="The name of the sequencing technology that produced the read.", optional=true) public String PLATFORM -
INCLUDE_BC_IN_RG_TAG
@Argument(doc="Whether to include the barcode information in the @RG->BC header tag. Defaults to false until included in the SAM spec.") public boolean INCLUDE_BC_IN_RG_TAG -
BARCODE_PARAMS
@Argument(doc="Deprecated (use LIBRARY_PARAMS). Tab-separated file for creating all output SAM, BAM or CRAM files for barcoded run with single IlluminaBasecallsToSam invocation. Columns are BARCODE, OUTPUT, SAMPLE_ALIAS, and LIBRARY_NAME. Row with BARCODE=N is used to specify a file for no barcode match", mutex={"OUTPUT","SAMPLE_ALIAS","LIBRARY_NAME","LIBRARY_PARAMS"}) public File BARCODE_PARAMS -
LIBRARY_PARAMS
@Argument(doc="Tab-separated file for creating all output SAM, BAM or CRAM files for a lane with single IlluminaBasecallsToSam invocation. The columns are OUTPUT, SAMPLE_ALIAS, and LIBRARY_NAME, BARCODE_1, BARCODE_2 ... BARCODE_X where X = number of barcodes per cluster (optional). Row with BARCODE_1 set to \'N\' is used to specify a file for no barcode match. You may also provide any 2 letter RG header attributes (excluding PU, CN, PL, and DT) as columns in this file and the values for those columns will be inserted into the RG tag for the SAM, BAM or CRAM file created for a given row.", mutex={"OUTPUT","SAMPLE_ALIAS","LIBRARY_NAME","BARCODE_PARAMS"}) public File LIBRARY_PARAMS -
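The column rules in this doc string can be made concrete with a small classifier for LIBRARY_PARAMS header names. This is a sketch of one reading of the description, not Picard's parser: the class, the enum, and the exact matching rules (for example, that RG attribute columns are exactly two ASCII letters) are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical classifier for LIBRARY_PARAMS column names; not part of Picard. */
final class LibraryParamsHeaderSketch {
    enum Kind { STANDARD, BARCODE, RG_ATTRIBUTE, UNSUPPORTED }

    // Columns named explicitly in the doc string.
    private static final Set<String> STANDARD =
            new HashSet<>(Arrays.asList("OUTPUT", "SAMPLE_ALIAS", "LIBRARY_NAME"));

    // RG attributes the doc string excludes from per-row columns.
    private static final Set<String> EXCLUDED_RG =
            new HashSet<>(Arrays.asList("PU", "CN", "PL", "DT"));

    static Kind classify(String column) {
        if (STANDARD.contains(column)) {
            return Kind.STANDARD;
        }
        if (column.matches("BARCODE_[1-9][0-9]*")) {
            return Kind.BARCODE; // BARCODE_1, BARCODE_2, ... BARCODE_X
        }
        if (column.matches("[A-Za-z]{2}") && !EXCLUDED_RG.contains(column)) {
            return Kind.RG_ATTRIBUTE; // any other 2-letter RG header attribute
        }
        return Kind.UNSUPPORTED;
    }
}
```

Under these assumptions, a header containing OUTPUT, SAMPLE_ALIAS, LIBRARY_NAME, BARCODE_1 and DS would be accepted, while a PU column would be flagged as unsupported.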
ADAPTERS_TO_CHECK
@Argument(doc="Which adapters to look for in the read.") public List<IlluminaUtil.IlluminaAdapterPair> ADAPTERS_TO_CHECK -
FIVE_PRIME_ADAPTER
@Argument(doc="For specifying adapters other than standard Illumina", optional=true) public String FIVE_PRIME_ADAPTER -
THREE_PRIME_ADAPTER
@Argument(doc="For specifying adapters other than standard Illumina", optional=true) public String THREE_PRIME_ADAPTER -
NUM_PROCESSORS
@Argument(doc="The number of threads to run in parallel. If NUM_PROCESSORS = 0, number of cores is automatically set to the number of cores available on the machine. If NUM_PROCESSORS < 0, then the number of cores used will be the number available on the machine less NUM_PROCESSORS.") public Integer NUM_PROCESSORS -
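The three cases in this doc string can be sketched as below. The phrase "less NUM_PROCESSORS" is read here as "available cores minus the magnitude of the negative value"; that reading, and the helper name, are assumptions. The available core count is a parameter so the logic can be checked without depending on the host machine.

```java
/** Hypothetical sketch of NUM_PROCESSORS resolution; not part of Picard. */
final class NumProcessorsSketch {
    /**
     * @param numProcessors the NUM_PROCESSORS argument value
     * @param available     cores on the machine, e.g. Runtime.getRuntime().availableProcessors()
     * @return the number of threads to run
     */
    static int resolveThreads(int numProcessors, int available) {
        if (numProcessors == 0) {
            return available; // use every core
        }
        if (numProcessors < 0) {
            // Assumed reading: leave |numProcessors| cores free.
            // This sketch does not guard against a result of zero or less.
            return available + numProcessors;
        }
        return numProcessors; // explicit thread count
    }
}
```

Under this reading, NUM_PROCESSORS=-2 on an 8-core machine yields 6 threads.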
FIRST_TILE
@Argument(doc="If set, this is the first tile to be processed (used for debugging). Note that tiles are not processed in numerical order.", mutex="PROCESS_SINGLE_TILE", optional=true) public Integer FIRST_TILE -
TILE_LIMIT
@Argument(doc="If set, process no more than this many tiles (used for debugging).", optional=true) public Integer TILE_LIMIT -
PROCESS_SINGLE_TILE
@Argument(doc="If set, process only the tile number given and prepend the tile number to the output file name.", mutex="FIRST_TILE", optional=true) public Integer PROCESS_SINGLE_TILE -
APPLY_EAMSS_FILTER
@Argument(doc="Apply EAMSS filtering to identify inappropriately quality scored bases towards the ends of reads and convert their quality scores to Q2.") public boolean APPLY_EAMSS_FILTER -
MAX_READS_IN_RAM_PER_TILE
@Argument(doc="Configure SortingCollections to store this many records before spilling to disk. For an indexed run, each SortingCollection gets this value/number of indices. Deprecated: use `MAX_RECORDS_IN_RAM`") public int MAX_READS_IN_RAM_PER_TILE -
INCLUDE_NON_PF_READS
@Argument(doc="Whether to include non-PF reads", shortName="NONPF", optional=true) public boolean INCLUDE_NON_PF_READS -
IGNORE_UNEXPECTED_BARCODES
@Argument(doc="Whether to ignore reads whose barcodes are not found in LIBRARY_PARAMS. Useful when outputting SAM, BAM or CRAM files for only a subset of the barcodes in a lane.", shortName="IGNORE_UNEXPECTED") public boolean IGNORE_UNEXPECTED_BARCODES -
MOLECULAR_INDEX_TAG
@Argument(doc="The tag to use to store any molecular indexes. If more than one molecular index is found, they will be concatenated and stored here.", optional=true) public String MOLECULAR_INDEX_TAG -
MOLECULAR_INDEX_BASE_QUALITY_TAG
@Argument(doc="The tag to use to store any molecular index base qualities. If more than one molecular index is found, their qualities will be concatenated and stored here (.i.e. the number of \"M\" operators in the READ_STRUCTURE)", optional=true) public String MOLECULAR_INDEX_BASE_QUALITY_TAG -
TAG_PER_MOLECULAR_INDEX
-
BARCODE_POPULATION_STRATEGY
@Argument(doc="When should the sample barcode (as read by the sequencer) be placed on the reads in the BC tag?") public picard.illumina.ClusterDataToSamConverter.PopulateBarcode BARCODE_POPULATION_STRATEGY -
INCLUDE_BARCODE_QUALITY
@Argument(doc="Should the barcode quality be included when the sample barcode is included?") public boolean INCLUDE_BARCODE_QUALITY -
SORT
@Argument(doc="If true, the output records are sorted by read name. Otherwise they are unsorted.") public Boolean SORT -
MATCH_BARCODES_INLINE
@Argument(doc="If true, match barcodes on the fly. Otherwise parse the barcodes from the barcodes file.") public Boolean MATCH_BARCODES_INLINE
-
-
Constructor Details
-
IlluminaBasecallsToSam
public IlluminaBasecallsToSam()
-
-
Method Details
-
doWork
protected int doWork()
Description copied from class: CommandLineProgram
Do the work after command line has been parsed. RuntimeException may be thrown by this method, and is reported appropriately.
- Specified by:
doWork in class CommandLineProgram
- Returns:
- program exit status.
-
customCommandLineValidation
Put any custom command-line validation in an override of this method. clp is initialized at this point and can be used to print usage and access args. Any options set by command-line parser can be validated.
- Overrides:
customCommandLineValidation in class ExtractBarcodesProgram
- Returns:
- null if command line is valid. If command line is invalid, returns an array of error messages to be written to the appropriate place.
-