Package htsjdk.samtools.util
Class QualityEncodingDetector
- java.lang.Object
-
- htsjdk.samtools.util.QualityEncodingDetector
-
public class QualityEncodingDetector extends Object
Utility for determining the type of quality encoding/format (see FastqQualityFormat) used in a SAM/BAM or Fastq file. To use this class, invoke the detect() method with a SamReader or FastqReader, as appropriate. The consumer is responsible for closing readers.
-
-
Nested Class Summary
Modifier and Type Class Description
static class QualityEncodingDetector.FileContext
-
Field Summary
Modifier and Type Field Description
static long DEFAULT_MAX_RECORDS_TO_ITERATE
The maximum number of records over which the detector will iterate before making a determination, by default.
-
Constructor Summary
Constructor Description
QualityEncodingDetector()
-
Method Summary
Modifier and Type Method Description
long add(long maxRecords, FastqReader... readers)
Adds the provided reader's records to the detector.
long add(long maxRecords, SamReader reader)
Adds the provided reader's records to the detector.
long add(long maxRecords, CloseableIterator<SAMRecord> iterator)
long add(long maxRecords, CloseableIterator<SAMRecord> iterator, boolean useOriginalQualities)
Adds the provided iterator's records (optionally using the original qualities) to the detector.
void add(FastqRecord fastqRecord)
Adds the provided record's qualities to the detector.
void add(SAMRecord samRecord)
void add(SAMRecord samRecord, boolean useOriginalQualities)
Adds the provided record's qualities to the detector.
static FastqQualityFormat detect(long maxRecords, FastqReader... readers)
Reads through the records in the provided fastq reader and uses their quality scores to determine the quality format used in the fastq.
static FastqQualityFormat detect(long maxRecords, SamReader reader)
static FastqQualityFormat detect(long maxRecords, CloseableIterator<SAMRecord> iterator)
static FastqQualityFormat detect(long maxRecords, CloseableIterator<SAMRecord> iterator, boolean useOriginalQualities)
Reads through the records in the provided SAM reader and uses their quality scores to determine the quality format used in the SAM.
static FastqQualityFormat detect(FastqReader... readers)
static FastqQualityFormat detect(SamReader reader)
static FastqQualityFormat detect(SamReader reader, FastqQualityFormat expectedQualityFormat)
Reads through the records in the provided SAM reader and uses their quality scores to sanity check the expected quality passed in.
FastqQualityFormat generateBestGuess(QualityEncodingDetector.FileContext context, FastqQualityFormat expectedQuality)
Make the best guess at the quality format.
EnumSet<FastqQualityFormat> generateCandidateQualities(boolean checkExpected)
Processes collected quality data and applies rules to determine which quality formats are possible.
boolean isDeterminationAmbiguous()
Tests whether or not the detector can make a determination without guessing (i.e., if all but one quality format can be excluded using established exclusion conventions).
-
-
-
Field Detail
-
DEFAULT_MAX_RECORDS_TO_ITERATE
public static final long DEFAULT_MAX_RECORDS_TO_ITERATE
The maximum number of records over which the detector will iterate before making a determination, by default.
See Also:
Constant Field Values
-
-
Method Detail
-
add
public long add(long maxRecords, FastqReader... readers)
Adds the provided reader's records to the detector.
Returns:
The number of records read
-
add
public long add(long maxRecords, SamReader reader)
Adds the provided reader's records to the detector.
Returns:
The number of records read
-
add
public long add(long maxRecords, CloseableIterator<SAMRecord> iterator, boolean useOriginalQualities)
Adds the provided iterator's records (optionally using the original qualities) to the detector.
Returns:
The number of records read
-
add
public long add(long maxRecords, CloseableIterator<SAMRecord> iterator)
-
add
public void add(FastqRecord fastqRecord)
Adds the provided record's qualities to the detector.
-
add
public void add(SAMRecord samRecord, boolean useOriginalQualities)
Adds the provided record's qualities to the detector.
-
add
public void add(SAMRecord samRecord)
-
isDeterminationAmbiguous
public boolean isDeterminationAmbiguous()
Tests whether or not the detector can make a determination without guessing (i.e., if all but one quality format can be excluded using established exclusion conventions).
Returns:
True if more than one format is possible after exclusions; false otherwise
-
generateCandidateQualities
public EnumSet<FastqQualityFormat> generateCandidateQualities(boolean checkExpected)
Processes collected quality data and applies rules to determine which quality formats are possible. Specifically, for each format's known range of possible values (its "quality scheme"), a format is excluded if any observed value falls outside that range. Additionally, a format is excluded if at least one quality is expected to appear in a given range of values, but none does. (For example, for Phred, we expect to eventually see a value below 58. If we never see such a value, we exclude Phred as a possible format, unless the checkExpected flag is set to false, in which case we leave Phred as a possible quality format.)
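The two exclusion rules described above can be illustrated with a small, self-contained sketch. It does not use htsjdk: the Format enum, the candidates method, and all numeric ranges are hypothetical stand-ins chosen for illustration (only the "below 58" expectation for Phred comes from the description above), so treat it as one possible reading of the rules rather than the library's implementation.

```java
import java.util.EnumSet;

public class CandidateFormatSketch {
    /**
     * Hypothetical stand-in for FastqQualityFormat. Each format has a legal
     * range [min, max] and an "expected" range in which at least one value
     * should eventually be seen. All numbers are illustrative assumptions.
     */
    enum Format {
        STANDARD(33, 126, 33, 57),   // expect at least one value below 58
        SOLEXA(59, 126, 59, 126),
        ILLUMINA(64, 126, 64, 126);

        final int min, max, expectedMin, expectedMax;

        Format(int min, int max, int expectedMin, int expectedMax) {
            this.min = min;
            this.max = max;
            this.expectedMin = expectedMin;
            this.expectedMax = expectedMax;
        }
    }

    /** Returns the formats that survive both exclusion rules. */
    static EnumSet<Format> candidates(int[] observed, boolean checkExpected) {
        EnumSet<Format> result = EnumSet.allOf(Format.class);
        for (Format f : Format.values()) {
            boolean sawExpected = false;
            for (int q : observed) {
                // Rule 1: any value outside the legal range excludes the format.
                if (q < f.min || q > f.max) {
                    result.remove(f);
                    break;
                }
                if (q >= f.expectedMin && q <= f.expectedMax) {
                    sawExpected = true;
                }
            }
            // Rule 2: never seeing a value in the expected range excludes the
            // format, but only when checkExpected is true.
            if (checkExpected && !sawExpected) {
                result.remove(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(candidates(new int[] {40, 70}, true));
        System.out.println(candidates(new int[] {70, 80, 100}, true));
        System.out.println(candidates(new int[] {70, 80, 100}, false));
    }
}
```

In this sketch, a result containing more than one format corresponds to the ambiguous case reported by isDeterminationAmbiguous(), where a best guess is needed.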
-
detect
public static FastqQualityFormat detect(long maxRecords, FastqReader... readers)
Reads through the records in the provided fastq reader and uses their quality scores to determine the quality format used in the fastq.
Parameters:
readers - The fastq readers from which qualities are to be read; at least one must be provided
maxRecords - The maximum number of records to read from the reader before making a determination (a guess, so more records is better)
Returns:
The determined quality format
-
detect
public static FastqQualityFormat detect(FastqReader... readers)
-
detect
public static FastqQualityFormat detect(long maxRecords, CloseableIterator<SAMRecord> iterator, boolean useOriginalQualities)
Reads through the records in the provided SAM reader and uses their quality scores to determine the quality format used in the SAM.
Parameters:
iterator - The iterator from which SAM records are to be read
maxRecords - The maximum number of records to read from the reader before making a determination (a guess, so more records is better)
useOriginalQualities - Whether to use the original qualities (if available) rather than the current ones
Returns:
The determined quality format
-
detect
public static FastqQualityFormat detect(long maxRecords, CloseableIterator<SAMRecord> iterator)
-
detect
public static FastqQualityFormat detect(long maxRecords, SamReader reader)
-
detect
public static FastqQualityFormat detect(SamReader reader)
-
detect
public static FastqQualityFormat detect(SamReader reader, FastqQualityFormat expectedQualityFormat)
Reads through the records in the provided SAM reader and uses their quality scores to sanity check the expected quality passed in. If the expected quality format is sane, it is handed back; otherwise, a SAMException is thrown.
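The "hand back or throw" contract described above could look roughly like the following self-contained sketch. It does not use htsjdk: the Format enum, its ranges, and the checkExpected method are hypothetical, IllegalStateException stands in for SAMException, and "sane" is assumed here to mean that every observed value lies inside the expected format's legal range.

```java
public class ExpectedFormatCheckSketch {
    /** Hypothetical stand-in for FastqQualityFormat with illustrative ranges. */
    enum Format {
        STANDARD(33, 126),
        SOLEXA(59, 126),
        ILLUMINA(64, 126);

        final int min, max;

        Format(int min, int max) {
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Returns the expected format if every observed value fits its range;
     * otherwise throws (IllegalStateException is a stand-in for SAMException).
     */
    static Format checkExpected(int[] observedQualities, Format expected) {
        for (int q : observedQualities) {
            if (q < expected.min || q > expected.max) {
                throw new IllegalStateException(
                        "Quality value " + q + " is outside the range of " + expected);
            }
        }
        return expected;
    }

    public static void main(String[] args) {
        // All values fit the assumed ILLUMINA range, so the expectation is handed back.
        System.out.println(checkExpected(new int[] {70, 80, 100}, Format.ILLUMINA));
        try {
            // 40 is below the assumed ILLUMINA minimum, so the check fails.
            checkExpected(new int[] {40, 70}, Format.ILLUMINA);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```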
-
generateBestGuess
public FastqQualityFormat generateBestGuess(QualityEncodingDetector.FileContext context, FastqQualityFormat expectedQuality)
Make the best guess at the quality format. If an expected quality is passed in, the values are sanity checked (ignoring the expected range) and, if they are deemed acceptable, the expected quality is passed back. Otherwise, we use a set of heuristics to make our best guess.
-
-