Class SortedBasecallsConverter<CLUSTER_OUTPUT_RECORD>


  • public class SortedBasecallsConverter<CLUSTER_OUTPUT_RECORD>
    extends BasecallsConverter<CLUSTER_OUTPUT_RECORD>
    SortedBasecallsConverter utilizes an underlying IlluminaDataProvider to convert parsed and decoded sequencing data from standard Illumina formats to specific output records (FASTA records/SAM records). This data is processed on a tile-by-tile basis and sorted using an output record comparator.

    The underlying IlluminaDataProvider applies several optional transformations, which can include EAMSS filtering, non-PF read filtering, and quality score recoding using a BclQualityEvaluationStrategy.

    The converter can also limit the scope of data that is converted from the data provider by setting the tile to start on (firstTile) and the total number of tiles to process (tileLimit).
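
    The tile scoping described above can be sketched as follows. This is a minimal illustration, not the converter's actual code: the class TileScopeSketch and the helper selectTiles are hypothetical names, and the assumption that firstTile is matched against a tile number in an ordered list is ours.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TileScopeSketch {
    /**
     * Hypothetical helper: returns the tiles starting at firstTile (if non-null)
     * and containing at most tileLimit tiles (if non-null).
     */
    static List<Integer> selectTiles(List<Integer> allTiles, Integer firstTile, Integer tileLimit) {
        int start = 0;
        if (firstTile != null) {
            start = allTiles.indexOf(firstTile);
            if (start < 0) {
                // Assumption: an unknown first tile is treated as an error.
                throw new IllegalArgumentException("firstTile " + firstTile + " not found");
            }
        }
        int end = allTiles.size();
        if (tileLimit != null) {
            end = Math.min(end, start + tileLimit);
        }
        return new ArrayList<>(allTiles.subList(start, end));
    }

    public static void main(String[] args) {
        List<Integer> tiles = Arrays.asList(1101, 1102, 1103, 1104, 1105);
        // Start at tile 1102 and process at most two tiles.
        System.out.println(selectTiles(tiles, 1102, 2));
        // Both arguments null: every tile is processed.
        System.out.println(selectTiles(tiles, null, null));
    }
}
```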

    Additionally, BasecallsConverter can optionally demultiplex reads by outputting barcode specific reads to their associated writers.
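
    As a rough sketch of the demultiplexing idea, the example below routes each read to the sink registered for its barcode. The Read type, the route method, and the use of plain lists in place of writers are all hypothetical stand-ins. Throwing an exception when a barcode has no writer and unexpected barcodes are not ignored is an assumption, not documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DemultiplexSketch {
    /** Hypothetical record type: a called barcode plus the read bases. */
    static final class Read {
        final String barcode;
        final String bases;

        Read(String barcode, String bases) {
            this.barcode = barcode;
            this.bases = bases;
        }
    }

    /**
     * Routes each read to the sink stored under its barcode. When demultiplex is
     * false, every read goes to the single sink stored under the null key.
     */
    static void route(List<Read> reads, Map<String, List<String>> sinks,
                      boolean demultiplex, boolean ignoreUnexpectedBarcodes) {
        for (Read read : reads) {
            String key = demultiplex ? read.barcode : null;
            List<String> sink = sinks.get(key);
            if (sink == null) {
                if (ignoreUnexpectedBarcodes) {
                    continue; // skip reads whose barcode has no registered sink
                }
                // Assumption: failing loudly is the alternative to ignoring.
                throw new IllegalStateException("No writer for barcode " + key);
            }
            sink.add(read.bases);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> sinks = new HashMap<>();
        sinks.put("ACGT", new ArrayList<>());
        sinks.put("TTAG", new ArrayList<>());
        List<Read> reads = Arrays.asList(
                new Read("ACGT", "GATTACA"),
                new Read("TTAG", "CCCGGG"),
                new Read("NNNN", "AAAAAA")); // barcode with no sink
        route(reads, sinks, true, true);
        System.out.println(sinks);
    }
}
```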

    • Field Detail

      • log

        protected static final htsjdk.samtools.util.Log log
    • Constructor Detail

      • SortedBasecallsConverter

        protected SortedBasecallsConverter(File basecallsDir,
                                           File barcodesDir,
                                           int[] lanes,
                                           ReadStructure readStructure,
                                           Map<String,? extends htsjdk.io.Writer<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap,
                                           boolean demultiplex,
                                           int maxReadsInRamPerTile,
                                           List<File> tmpDirs,
                                           int numThreads,
                                           Integer firstTile,
                                           Integer tileLimit,
                                           Comparator<CLUSTER_OUTPUT_RECORD> outputRecordComparator,
                                           htsjdk.samtools.util.SortingCollection.Codec<CLUSTER_OUTPUT_RECORD> codecPrototype,
                                           Class<CLUSTER_OUTPUT_RECORD> outputRecordClass,
                                           BclQualityEvaluationStrategy bclQualityEvaluationStrategy,
                                           boolean ignoreUnexpectedBarcodes,
                                           boolean applyEamssFiltering,
                                           boolean includeNonPfReads,
                                           htsjdk.io.AsyncWriterPool writerPool,
                                           BarcodeExtractor barcodeExtractor)
        Constructs a new SortedBasecallsConverter.
        Parameters:
        basecallsDir - Where to read basecalls from.
        barcodesDir - Where to read barcodes from (optional; use basecallsDir if not specified).
        lanes - What lanes to process.
        readStructure - How to interpret each cluster.
        barcodeRecordWriterMap - Map from barcode to CLUSTER_OUTPUT_RECORD writer. If demultiplex is false, must contain one writer stored with key=null.
        demultiplex - If true, output is split by barcode, otherwise all are written to the same output stream.
        maxReadsInRamPerTile - Configures number of reads each tile will store in RAM before spilling to disk.
        tmpDirs - For SortingCollection spilling.
        numThreads - Controls number of threads.
        firstTile - (For debugging) If non-null, start processing at this tile.
        tileLimit - (For debugging) If non-null, process no more than this many tiles.
        outputRecordComparator - For sorting output records within a single tile.
        codecPrototype - For spilling output records to disk.
        outputRecordClass - Class needed to create SortingCollections.
        bclQualityEvaluationStrategy - The basecall quality evaluation strategy that is applied to decoded base calls.
        ignoreUnexpectedBarcodes - If true, will ignore reads whose called barcode is not found in barcodeRecordWriterMap.
        applyEamssFiltering - If true, apply EAMSS filtering if parsing BCLs for bases and quality scores.
        includeNonPfReads - If true, will include ALL reads (including those which do not have PF set). This option does nothing for instruments that output CBCLs (NovaSeqs).
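        The requirement that a non-demultiplexed barcodeRecordWriterMap hold one writer under key=null affects which Map implementation can be used. The sketch below is illustrative only: singleWriterMap is a hypothetical helper and a String stands in for the writer. It relies on the fact that java.util.HashMap permits a null key, whereas Map.of rejects null keys with a NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

public class SingleWriterMapSketch {
    /** Hypothetical helper: builds a map holding one writer under the null key. */
    static <W> Map<String, W> singleWriterMap(W writer) {
        Map<String, W> map = new HashMap<>(); // HashMap allows a null key
        map.put(null, writer);
        return map;
    }

    public static void main(String[] args) {
        // A String stands in for a real writer in this sketch.
        Map<String, String> writers = singleWriterMap("the-only-writer");
        System.out.println(writers.containsKey(null));
        System.out.println(writers.get(null));
    }
}
```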
    • Method Detail

      • processTilesAndWritePerSampleOutputs

        public void processTilesAndWritePerSampleOutputs(Set<String> barcodes)
                                                  throws IOException
        Sets up the tile processing and record writing threads for this converter. This creates a tile processing thread pool of size `numThreads`. The tile processing threads notify the completed work checking thread when they are done processing a tile. The completed work checking thread then dispatches the record writing for tiles in order.
        Specified by:
        processTilesAndWritePerSampleOutputs in class BasecallsConverter<CLUSTER_OUTPUT_RECORD>
        Parameters:
        barcodes - The barcodes used for demultiplexing. When there is no demultiplexing done this should be a Set containing a single null value.
        Throws:
        IOException
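        The general pattern described for this method (sort tiles in parallel, write them in tile order) can be approximated as below. This is a simplified sketch and not the class's implementation: it swaps the completed work checking thread for futures consumed in submission order, uses in-memory lists in place of SortingCollection and writers, and the names OrderedTileWriteSketch and processTiles are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OrderedTileWriteSketch {
    /**
     * Sorts each tile's records on a pool of numThreads threads, then appends
     * the sorted tiles to the output in ascending tile order.
     */
    static <R> List<R> processTiles(SortedMap<Integer, List<R>> recordsByTile,
                                    Comparator<R> comparator,
                                    int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            // Submit one sorting task per tile, in tile order.
            List<Future<List<R>>> pending = new ArrayList<>();
            for (List<R> tileRecords : recordsByTile.values()) {
                pending.add(pool.submit(() -> {
                    List<R> sorted = new ArrayList<>(tileRecords);
                    sorted.sort(comparator);
                    return sorted;
                }));
            }
            // Tiles may finish in any order; reading the futures in submission
            // order keeps the written output in tile order.
            List<R> output = new ArrayList<>();
            for (Future<List<R>> tile : pending) {
                output.addAll(tile.get());
            }
            return output;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        SortedMap<Integer, List<String>> tiles = new TreeMap<>();
        tiles.put(1102, Arrays.asList("c", "a"));
        tiles.put(1101, Arrays.asList("z", "b"));
        System.out.println(processTiles(tiles, Comparator.<String>naturalOrder(), 2));
    }
}
```

        Records are sorted only within a tile, so the combined output is ordered by tile first and by the comparator second.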
      • awaitTileProcessingCompletion

        protected void awaitTileProcessingCompletion()
                                              throws IOException
        Throws:
        IOException