libStatGen Software  1
SamFile Class Reference

Allows the user to easily read/write a SAM/BAM file.

#include <SamFile.h>


Public Types

enum  OpenType { READ, WRITE }
 Enum for indicating whether to open the file for read or write.
 
enum  SortedType { UNSORTED = 0, FLAG, COORDINATE, QUERY_NAME }
 Enum for indicating the type of sort expected in the file.
 

Public Member Functions

 SamFile ()
 Default constructor; initializes the variables but does not open any files.
 
 SamFile (ErrorHandler::HandlingType errorHandlingType)
 Constructor that sets the error handling type.
 
 SamFile (const char *filename, OpenType mode)
 Constructor that opens the specified file in the specified mode (READ/WRITE); aborts if the file could not be opened.
 
 SamFile (const char *filename, OpenType mode, ErrorHandler::HandlingType errorHandlingType)
 Constructor that opens the specified file in the specified mode (READ/WRITE) and handles errors per the specified errorHandlingType.
 
 SamFile (const char *filename, OpenType mode, SamFileHeader *header)
 Constructor that opens the specified file in the specified mode (READ/WRITE) and reads or writes the header; aborts if the file could not be opened or the header could not be processed.
 
 SamFile (const char *filename, OpenType mode, ErrorHandler::HandlingType errorHandlingType, SamFileHeader *header)
 Constructor that opens the specified file in the specified mode (READ/WRITE) and reads or writes the header, handling errors per the specified errorHandlingType.
 
virtual ~SamFile ()
 Destructor.
 
bool OpenForRead (const char *filename, SamFileHeader *header=NULL)
 Open a SAM/BAM file for reading, determining whether it is SAM or BAM by reading the file (if not stdin).
 
bool OpenForWrite (const char *filename, SamFileHeader *header=NULL)
 Open a SAM/BAM file for writing, determining SAM/BAM from the extension (.bam = BAM, .ubam = uncompressed BAM).
 
bool ReadBamIndex (const char *filename)
 Read the specified BAM index file.
 
bool ReadBamIndex ()
 Read the BAM index file, using the BAM filename as the base name.
 
void SetReference (GenomeSequence *reference)
 Sets the reference to the specified genome sequence object.
 
void SetReadSequenceTranslation (SamRecord::SequenceTranslation translation)
 Set the type of sequence translation to use when reading the sequence.
 
void SetWriteSequenceTranslation (SamRecord::SequenceTranslation translation)
 Set the type of sequence translation to use when writing the sequence.
 
void Close ()
 Close the file if there is one open.
 
bool IsOpen ()
 Returns whether or not the file has been opened successfully.
 
bool IsEOF ()
 Returns whether or not the end of the file has been reached.
 
bool ReadHeader (SamFileHeader &header)
 Reads the header section from the file and stores it in the passed-in header.
 
bool WriteHeader (SamFileHeader &header)
 Writes the specified header into the file.
 
bool ReadRecord (SamFileHeader &header, SamRecord &record)
 Reads the next record from the file and stores it in the passed-in record.
 
bool WriteRecord (SamFileHeader &header, SamRecord &record)
 Writes the specified record into the file.
 
void setSortedValidation (SortedType sortType)
 Set the flag to validate that the file is sorted as it is read/written.
 
uint32_t GetCurrentRecordCount ()
 Return the number of records that have been read/written so far.
 
SamStatus::Status GetFailure ()
 Deprecated; get the Status of the last call that sets status.
 
SamStatus::Status GetStatus ()
 Get the Status of the last call that sets status.
 
const char * GetStatusMessage ()
 Get the Status Message of the last call that sets status.
 
bool SetReadSection (int32_t refID)
 Sets which reference id (index into the BAM list of reference information) of the BAM file should be read.
 
bool SetReadSection (const char *refName)
 Sets which reference name of the BAM file should be read.
 
bool SetReadSection (int32_t refID, int32_t start, int32_t end, bool overlap=true)
 Sets which reference id (index into the BAM list of reference information) and start/end positions of the BAM file should be read.
 
bool SetReadSection (const char *refName, int32_t start, int32_t end, bool overlap=true)
 Sets which reference name and start/end positions of the BAM file should be read.
 
void SetReadFlags (uint16_t requiredFlags, uint16_t excludedFlags)
 Specify which reads should be returned by ReadRecord.
 
int32_t getNumMappedReadsFromIndex (int32_t refID)
 Get the number of mapped reads in the specified reference id.
 
int32_t getNumUnMappedReadsFromIndex (int32_t refID)
 Get the number of unmapped reads in the specified reference id.
 
int32_t getNumMappedReadsFromIndex (const char *refName, SamFileHeader &header)
 Get the number of mapped reads in the specified reference name.
 
int32_t getNumUnMappedReadsFromIndex (const char *refName, SamFileHeader &header)
 Get the number of unmapped reads in the specified reference name.
 
uint32_t GetNumOverlaps (SamRecord &samRecord)
 Returns the number of bases in the passed-in read that overlap the region that is currently set.
 
void GenerateStatistics (bool genStats)
 Whether or not statistics should be generated for this file.
 
const BamIndex * GetBamIndex ()
 Return the BAM index if one has been opened.
 
int64_t GetCurrentPosition ()
 Get the current file position.
 
void DisableBuffering ()
 Turn off file read buffering.
 
void PrintStatistics ()
 Print the statistics that have been recorded due to a call to GenerateStatistics.
 
bool attemptRecoverySync (bool(*checkSignature)(void *data), int length)
 
void setAttemptRecovery (bool flag=false)
 

Protected Member Functions

void init ()
 
void init (const char *filename, OpenType mode, SamFileHeader *header)
 
void resetFile ()
 Resets the file prepping for a new file.
 
bool validateSortOrder (SamRecord &record, SamFileHeader &header)
 Validate that the record is sorted compared to the previously read record, if there is one, according to the specified sort order.
 
SortedType getSortOrderFromHeader (SamFileHeader &header)
 
bool processNewSection (SamFileHeader &header)
 
bool ensureIndexedReadPosition ()
 
bool checkRecordInSection (SamRecord &record)
 

Protected Attributes

IFILE myFilePtr
 
GenericSamInterface * myInterfacePtr
 
bool myIsOpenForRead
 Flag to indicate if a file is open for reading.
 
bool myIsOpenForWrite
 Flag to indicate if a file is open for writing.
 
bool myHasHeader
 Flag to indicate if a header has been read/written; required before a record can be read/written.
 
SortedType mySortedType
 
int32_t myPrevCoord
 Previous values used for checking if the file is sorted.
 
int32_t myPrevRefID
 
String myPrevReadName
 
uint32_t myRecordCount
 Keep a count of the number of records that have been read/written so far.
 
SamStatistics * myStatistics
 Pointer to the statistics for this file.
 
SamStatus myStatus
 The status of the last SamFile command.
 
bool myIsBamOpenForRead
 Values for reading Sorted BAM files via the index.
 
bool myNewSection
 
bool myOverlapSection
 
int32_t myRefID
 
int32_t myStartPos
 
int32_t myEndPos
 
uint64_t myCurrentChunkEnd
 
SortedChunkList myChunksToRead
 
BamIndex * myBamIndex
 
GenomeSequence * myRefPtr
 
SamRecord::SequenceTranslation myReadTranslation
 
SamRecord::SequenceTranslation myWriteTranslation
 
std::string myRefName
 

Detailed Description

Allows the user to easily read/write a SAM/BAM file.

The SamFile class also supports reading specific sections of sorted and indexed BAM files. To use this capability, the index file must be read before the read section is set. Reading only the requested sections avoids reading the entire file by taking advantage of BGZF's ability to seek.
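Seeking in a BGZF file uses 64-bit virtual file offsets as defined by the SAM/BAM specification: the upper 48 bits hold the byte offset of a compressed block within the file, and the lower 16 bits hold the offset inside that block once it is decompressed. The sketch below only illustrates that encoding; the helper names are hypothetical and are not part of the SamFile API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating BGZF virtual file offsets:
// upper 48 bits = file offset of the start of a compressed block,
// lower 16 bits = offset within that block's uncompressed data.
inline uint64_t makeVirtualOffset(uint64_t blockOffset, uint16_t withinBlock)
{
    return (blockOffset << 16) | withinBlock;
}

inline uint64_t blockOffsetOf(uint64_t virtualOffset)
{
    return virtualOffset >> 16;
}

inline uint16_t withinBlockOffsetOf(uint64_t virtualOffset)
{
    return static_cast<uint16_t>(virtualOffset & 0xFFFF);
}
```

Because the block offset occupies the high bits, comparing two virtual offsets as plain integers orders them by block first and by position inside the block second.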

Definition at line 35 of file SamFile.h.

Member Enumeration Documentation

◆ OpenType

Enum for indicating whether to open the file for read or write.

Enumerator
READ 

open for reading.

WRITE 

open for writing.

Definition at line 39 of file SamFile.h.

39  {
40  READ, ///< open for reading.
41  WRITE ///< open for writing.
42  };

◆ SortedType

Enum for indicating the type of sort expected in the file.

Enumerator
UNSORTED 

file is not sorted.

FLAG 

SO flag from the header indicates the sort type.

COORDINATE 

file is sorted by coordinate.

QUERY_NAME 

file is sorted by queryname.

Definition at line 46 of file SamFile.h.

46  {
47  UNSORTED = 0, ///< file is not sorted.
48  FLAG, ///< SO flag from the header indicates the sort type.
49  COORDINATE, ///< file is sorted by coordinate.
50  QUERY_NAME ///< file is sorted by queryname.
51  };
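With FLAG, the sort order is taken from the SO tag of the @HD header line; the SAM specification defines the values unknown, unsorted, queryname and coordinate. The body of getSortOrderFromHeader is not shown in this section, so the following is only a minimal sketch of such a mapping, assuming that missing, "unknown", "unsorted" and unrecognized values are all treated as UNSORTED. The helper name is hypothetical.

```cpp
#include <cassert>
#include <string>

// Mirrors the SortedType enumerators documented above.
enum SortedType { UNSORTED = 0, FLAG, COORDINATE, QUERY_NAME };

// Hypothetical helper: map the value of the @HD SO tag to a SortedType.
// Assumption: anything other than "coordinate" or "queryname" -> UNSORTED.
SortedType sortTypeFromSoTag(const std::string& soValue)
{
    if (soValue == "coordinate") return COORDINATE;
    if (soValue == "queryname")  return QUERY_NAME;
    return UNSORTED;
}
```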

Constructor & Destructor Documentation

◆ SamFile() [1/6]

SamFile::SamFile ( )

Default Constructor, initializes the variables, but does not open any files.

Definition at line 26 of file SamFile.cpp.

References resetFile().

27  : myStatus()
28 {
29  init();
30  resetFile();
31 }

◆ SamFile() [2/6]

SamFile::SamFile ( ErrorHandler::HandlingType  errorHandlingType)

Constructor that sets the error handling type.

Parameters
errorHandlingType: how to handle errors.

Definition at line 35 of file SamFile.cpp.

References resetFile().

36  : myStatus(errorHandlingType)
37 {
38  init();
39  resetFile();
40 }

◆ SamFile() [3/6]

SamFile::SamFile ( const char *  filename,
OpenType  mode 
)

Constructor that opens the specified file based on the specified mode (READ/WRITE), aborts if the file could not be opened.

Parameters
filename: name of the file to open.
mode: mode to use for opening the file.

Definition at line 45 of file SamFile.cpp.

46  : myStatus()
47 {
48  init(filename, mode, NULL);
49 }

◆ SamFile() [4/6]

SamFile::SamFile ( const char *  filename,
OpenType  mode,
ErrorHandler::HandlingType  errorHandlingType 
)

Constructor that opens the specified file in the specified mode (READ/WRITE) and handles errors per the specified errorHandlingType.

Parameters
filename: name of the file to open.
mode: mode to use for opening the file.
errorHandlingType: how to handle errors.

Definition at line 54 of file SamFile.cpp.

56  : myStatus(errorHandlingType)
57 {
58  init(filename, mode, NULL);
59 }

◆ SamFile() [5/6]

SamFile::SamFile ( const char *  filename,
OpenType  mode,
SamFileHeader *  header 
)

Constructor that opens the specified file in the specified mode (READ/WRITE) and reads or writes the header; aborts if the file could not be opened or the header could not be processed.

Parameters
filename: name of the file to open.
mode: mode to use for opening the file.
header: header to read into (READ) or write from (WRITE).

Definition at line 64 of file SamFile.cpp.

65  : myStatus()
66 {
67  init(filename, mode, header);
68 }

◆ SamFile() [6/6]

SamFile::SamFile ( const char *  filename,
OpenType  mode,
ErrorHandler::HandlingType  errorHandlingType,
SamFileHeader *  header 
)

Constructor that opens the specified file in the specified mode (READ/WRITE) and reads or writes the header, handling errors per the specified errorHandlingType.

Parameters
filename: name of the file to open.
mode: mode to use for opening the file.
errorHandlingType: how to handle errors.
header: header to read into (READ) or write from (WRITE).

Definition at line 73 of file SamFile.cpp.

76  : myStatus(errorHandlingType)
77 {
78  init(filename, mode, header);
79 }

Member Function Documentation

◆ GenerateStatistics()

void SamFile::GenerateStatistics ( bool  genStats)

Whether or not statistics should be generated for this file.

The value is carried over between files and is not reset, but the statistics themselves are reset between files.

Parameters
genStats: set to true if statistics should be generated, false if not.

Definition at line 878 of file SamFile.cpp.

References myStatistics.

Referenced by GetStatusMessage().

879 {
880  if(genStats)
881  {
882  if(myStatistics == NULL)
883  {
884  // Want to generate statistics, but do not yet have the
885  // structure for them, so create one.
886  myStatistics = new SamStatistics();
887  }
888  }
889  else
890  {
891  // Do not generate statistics, so if myStatistics is not NULL,
892  // delete it.
893  if(myStatistics != NULL)
894  {
895  delete myStatistics;
896  myStatistics = NULL;
897  }
898  }
899 
900 }
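GenerateStatistics allocates the statistics object lazily and frees it when statistics are turned off. The same on/off behavior can be written with std::unique_ptr so the delete-and-NULL bookkeeping happens automatically. StatsHolder and Stats are hypothetical stand-ins used only for illustration, not libStatGen classes.

```cpp
#include <cassert>
#include <memory>

struct Stats { long recordCount = 0; };  // hypothetical stand-in for SamStatistics

class StatsHolder
{
public:
    // Same semantics as SamFile::GenerateStatistics: create the object on
    // the first "true", keep an existing one, and release it on "false".
    void generateStatistics(bool genStats)
    {
        if (genStats)
        {
            if (!myStats) myStats = std::make_unique<Stats>();
        }
        else
        {
            myStats.reset();  // deletes the object (if any) and leaves nullptr
        }
    }

    bool hasStats() const { return myStats != nullptr; }
    Stats* stats() { return myStats.get(); }

private:
    std::unique_ptr<Stats> myStats;
};
```

Calling generateStatistics(true) twice keeps the existing object and its accumulated counts, matching the NULL check in the listing.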

◆ GetBamIndex()

const BamIndex * SamFile::GetBamIndex ( )

Return the bam index if one has been opened.

Returns
const pointer to the bam index, or null if one has not been opened.

Definition at line 903 of file SamFile.cpp.

References GetStatusMessage(), myStatistics, SamRecord::NONE, OpenForRead(), OpenForWrite(), READ, and resetFile().

Referenced by GetStatusMessage().

904 {
905  return(myBamIndex);
906 }

◆ GetCurrentPosition()

int64_t SamFile::GetCurrentPosition ( )
inline

Get the current file position.

Returns
current position in the file.

Definition at line 336 of file SamFile.h.

References iftell().

337  {
338  return(iftell(myFilePtr));
339  }

◆ GetFailure()

SamStatus::Status SamFile::GetFailure ( )
inline

Deprecated, get the Status of the last call that sets status.

Kept for backwards compatibility; will be removed later.

Definition at line 201 of file SamFile.h.

References GetStatus().

202  {
203  return(GetStatus());
204  }

◆ getNumMappedReadsFromIndex() [1/2]

int32_t SamFile::getNumMappedReadsFromIndex ( int32_t  refID)

Get the number of mapped reads in the specified reference id.

Returns -1 for out-of-range refIDs. If the BAM index has not been read yet, returns false (0) and sets the status to FAIL_ORDER.

Parameters
refID: reference ID for which to extract the number of mapped reads.
Returns
number of mapped reads for the specified reference id.

Definition at line 790 of file SamFile.cpp.

References StatGenStatus::FAIL_ORDER, BamIndex::getNumMappedReads(), myStatus, and StatGenStatus::setStatus().

Referenced by GetStatusMessage().

791 {
792  // The bam index must have already been read.
793  if(myBamIndex == NULL)
794  {
796  "Cannot get num mapped reads from the index until it has been read.");
797  return(false);
798  }
799  return(myBamIndex->getNumMappedReads(refID));
800 }
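Because the method returns false (0) rather than -1 when no index has been read, a result of 0 can mean either an empty reference or a call made before ReadBamIndex; checking GetBamIndex() for NULL first removes the ambiguity. The sketch below models that contract with a hypothetical IndexModel type and makes the "no index" case explicit with std::optional. It does not model any special handling the real BamIndex may apply to REF_ID_UNMAPPED.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of the per-reference mapped-read counts in a BAM index.
struct IndexModel
{
    std::vector<int32_t> mappedPerRef;  // indexed by refID
};

// Models getNumMappedReadsFromIndex(refID):
//  - no index loaded    -> the real method returns false (0) and sets FAIL_ORDER;
//                          here that case is reported as std::nullopt instead
//  - refID out of range -> -1
//  - otherwise          -> the stored count
// Assumption: every refID outside [0, size) is treated as out of range.
std::optional<int32_t> numMappedReads(const IndexModel* index, int32_t refID)
{
    if (index == nullptr) return std::nullopt;
    if (refID < 0 || refID >= static_cast<int32_t>(index->mappedPerRef.size()))
        return -1;
    return index->mappedPerRef[refID];
}
```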

◆ getNumMappedReadsFromIndex() [2/2]

int32_t SamFile::getNumMappedReadsFromIndex ( const char *  refName,
SamFileHeader &  header 
)

Get the number of mapped reads in the specified reference name.

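Returns -1 for unknown reference names. An empty name or "*" selects the entry for unmapped reads (BamIndex::REF_ID_UNMAPPED). If the BAM index has not been read yet, returns false (0) and sets the status to FAIL_ORDER.

Parameters
refName: reference name for which to extract the number of mapped reads.
header: header object containing the map from refName to refID.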
Returns
number of mapped reads for the specified reference name.

Definition at line 820 of file SamFile.cpp.

References StatGenStatus::FAIL_ORDER, BamIndex::getNumMappedReads(), SamFileHeader::getReferenceID(), myStatus, BamIndex::REF_ID_UNMAPPED, and StatGenStatus::setStatus().

822 {
823  // The bam index must have already been read.
824  if(myBamIndex == NULL)
825  {
826  myStatus.setStatus(SamStatus::FAIL_ORDER,
827  "Cannot get num mapped reads from the index until it has been read.");
828  return(false);
829  }
830  int32_t refID = BamIndex::REF_ID_UNMAPPED;
831  if((strcmp(refName, "") != 0) && (strcmp(refName, "*") != 0))
832  {
833  // Reference name specified, so read just the "-1" entries.
834  refID = header.getReferenceID(refName);
835  }
836  return(myBamIndex->getNumMappedReads(refID));
837 }
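The name-based overloads first turn refName into a refID: "" and "*" select the unmapped-reads entry, and any other name is looked up in the header. A minimal sketch of that resolution step, using a std::map as a hypothetical stand-in for SamFileHeader::getReferenceID and assuming REF_ID_UNMAPPED is -1 (the value implied by the "-1" comment in the listing):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

constexpr int32_t kRefIdUnmapped = -1;  // assumed value of BamIndex::REF_ID_UNMAPPED

// Hypothetical stand-in for the refName -> refID step of
// getNumMappedReadsFromIndex(refName, header).
// Returns std::nullopt for names not in the header (the SamFile method
// documents a result of -1 reads for that case).
std::optional<int32_t> resolveRefId(const std::map<std::string, int32_t>& header,
                                    const std::string& refName)
{
    // "" and "*" select the unmapped-reads entry, as in the listing.
    if (refName.empty() || refName == "*") return kRefIdUnmapped;
    auto it = header.find(refName);
    if (it == header.end()) return std::nullopt;
    return it->second;
}
```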

◆ GetNumOverlaps()

uint32_t SamFile::GetNumOverlaps ( SamRecord &  samRecord)

Returns the number of bases in the passed-in read that overlap the region that is currently set.

Overlapping means that the bases occur in both the read and the reference as either matches or mismatches. This does not count insertions, deletions, clips, pads, or skips.

Parameters
samRecord: record to check for overlapping bases.
Returns
number of bases that overlap the region that is currently set.

Definition at line 864 of file SamFile.cpp.

References SamRecord::getNumOverlaps(), SamRecord::setReference(), and SamRecord::setSequenceTranslation().

Referenced by GetStatusMessage().

865 {
866  if(myRefPtr != NULL)
867  {
868  samRecord.setReference(myRefPtr);
869  }
870  samRecord.setSequenceTranslation(myReadTranslation);
871 
872  // Get the overlaps in the sam record for the region currently set
873  // for this file.
874  return(samRecord.getNumOverlaps(myStartPos, myEndPos));
875 }
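The counting rule above can be sketched directly over a CIGAR: only M, = and X place a read base against a reference position, so only those can overlap. This sketch assumes the read's 0-based alignment start and its CIGAR are already parsed into the hypothetical CigarOp list, and that the region is 0-based with an exclusive end; the exact conventions of SamRecord::getNumOverlaps are not shown in this section.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical parsed CIGAR element, e.g. {'M', 10}.
struct CigarOp { char op; int32_t len; };

// Count read bases aligned (as match or mismatch) to reference positions
// in [regionStart, regionEnd). alignStart is the 0-based reference position
// of the first aligned base.
uint32_t countOverlaps(int32_t alignStart, const std::vector<CigarOp>& cigar,
                       int32_t regionStart, int32_t regionEnd)
{
    uint32_t overlaps = 0;
    int32_t refPos = alignStart;
    for (const CigarOp& c : cigar)
    {
        switch (c.op)
        {
        case 'M': case '=': case 'X':
            // Read base against reference base: count it if inside the region.
            for (int32_t i = 0; i < c.len; ++i, ++refPos)
                if (refPos >= regionStart && refPos < regionEnd) ++overlaps;
            break;
        case 'D': case 'N':
            refPos += c.len;  // deletion/skip: reference advances, no read bases
            break;
        default:              // I, S, H, P: reference does not advance
            break;
        }
    }
    return overlaps;
}
```

For example, a read starting at 100 with CIGAR 5M2D5M covers 100-104 and 107-111, so the region [103, 109) overlaps 4 of its bases.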

◆ getNumUnMappedReadsFromIndex() [1/2]

int32_t SamFile::getNumUnMappedReadsFromIndex ( int32_t  refID)

Get the number of unmapped reads in the specified reference id.

Returns -1 for out-of-range refIDs. If the BAM index has not been read yet, returns false (0) and sets the status to FAIL_ORDER.

Parameters
refID: reference ID for which to extract the number of unmapped reads.
Returns
number of unmapped reads for the specified reference id.

Definition at line 805 of file SamFile.cpp.

References StatGenStatus::FAIL_ORDER, BamIndex::getNumUnMappedReads(), myStatus, and StatGenStatus::setStatus().

Referenced by GetStatusMessage().

806 {
807  // The bam index must have already been read.
808  if(myBamIndex == NULL)
809  {
810  myStatus.setStatus(SamStatus::FAIL_ORDER,
811  "Cannot get num unmapped reads from the index until it has been read.");
812  return(false);
813  }
814  return(myBamIndex->getNumUnMappedReads(refID));
815 }

◆ getNumUnMappedReadsFromIndex() [2/2]

int32_t SamFile::getNumUnMappedReadsFromIndex ( const char *  refName,
SamFileHeader &  header 
)

Get the number of unmapped reads in the specified reference name.

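Returns -1 for unknown reference names. An empty name or "*" selects the entry for unmapped reads (BamIndex::REF_ID_UNMAPPED). If the BAM index has not been read yet, returns false (0) and sets the status to FAIL_ORDER.

Parameters
refName: reference name for which to extract the number of unmapped reads.
header: header object containing the map from refName to refID.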
Returns
number of unmapped reads for the specified reference name.

Definition at line 842 of file SamFile.cpp.

References StatGenStatus::FAIL_ORDER, BamIndex::getNumUnMappedReads(), SamFileHeader::getReferenceID(), myStatus, BamIndex::REF_ID_UNMAPPED, and StatGenStatus::setStatus().

844 {
845  // The bam index must have already been read.
846  if(myBamIndex == NULL)
847  {
848  myStatus.setStatus(SamStatus::FAIL_ORDER,
849  "Cannot get num unmapped reads from the index until it has been read.");
850  return(false);
851  }
852  int32_t refID = BamIndex::REF_ID_UNMAPPED;
853  if((strcmp(refName, "") != 0) && (strcmp(refName, "*") != 0))
854  {
855  // Reference name specified, so read just the "-1" entries.
856  refID = header.getReferenceID(refName);
857  }
858  return(myBamIndex->getNumUnMappedReads(refID));
859 }

◆ IsEOF()

bool SamFile::IsEOF ( )

Returns whether or not the end of the file has been reached.

Returns
true = EOF; false = not EOF. If the file is not open, true is returned.

Definition at line 424 of file SamFile.cpp.

References ifeof().

425 {
426  if (myFilePtr != NULL)
427  {
428  // File Pointer is set, so return if eof.
429  return(ifeof(myFilePtr));
430  }
431  // File pointer is not set, so return true, eof.
432  return true;
433 }

◆ IsOpen()

bool SamFile::IsOpen ( )

Returns whether or not the file has been opened successfully.

Returns
true = open; false = not open.

Definition at line 410 of file SamFile.cpp.

References InputFile::isOpen().

411 {
412  if (myFilePtr != NULL)
413  {
414  // File Pointer is set, so return if it is open.
415  return(myFilePtr->isOpen());
416  }
417  // File pointer is not set, so return false, not open.
418  return false;
419 }

◆ OpenForRead()

bool SamFile::OpenForRead ( const char *  filename,
SamFileHeader *  header = NULL 
)

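Open a SAM/BAM file for reading with the specified filename, determining whether it is SAM or BAM by reading the file (if not stdin). When reading from stdin, the type comes from the name instead: "-" or "-.sam" for SAM, "-.bam" for BAM, and "-.ubam" for uncompressed BAM.

Parameters
filename: the SAM/BAM file to open for reading.
header: header to read into (optional).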
Returns
true = success; false = failure.

Definition at line 93 of file SamFile.cpp.

References InputFile::BGZF, InputFile::DEFAULT, StatGenStatus::FAIL_IO, ifopen(), ifread(), ifrewind(), myIsBamOpenForRead, myIsOpenForRead, myStatus, ReadHeader(), resetFile(), InputFile::setAttemptRecovery(), StatGenStatus::setStatus(), StatGenStatus::SUCCESS, and InputFile::UNCOMPRESSED.

Referenced by GetBamIndex(), and Pileup< TestPileupElement >::processFile().

94 {
95  // Reset for any previously operated on files.
96  resetFile();
97 
98  int lastchar = 0;
99 
100  while (filename[lastchar] != 0) lastchar++;
101 
102  // If at least one character, check for '-'.
103  if((lastchar >= 1) && (filename[0] == '-'))
104  {
105  // Read from stdin - determine type of file to read.
106  // Determine if compressed bam.
107  if(strcmp(filename, "-.bam") == 0)
108  {
109  // Compressed bam - open as bgzf.
110  // -.bam is the filename, read compressed bam from stdin
111  filename = "-";
112 
113  myFilePtr = new InputFile;
114  // support recover mode - this switches in a reader
115  // capable of recovering from bad BGZF compression blocks.
116  myFilePtr->setAttemptRecovery(myAttemptRecovery);
117  myFilePtr->openFile(filename, "rb", InputFile::BGZF);
118 
119  myInterfacePtr = new BamInterface;
120 
121  // Read the magic string.
122  char magic[4];
123  ifread(myFilePtr, magic, 4);
124  }
125  else if(strcmp(filename, "-.ubam") == 0)
126  {
127  // uncompressed BAM File.
128  // -.ubam is the filename, read uncompressed bam from stdin.
129  // uncompressed BAM is still compressed with BGZF, but using
130  // compression level 0, so still open as BGZF since it has a
131  // BGZF header.
132  filename = "-";
133 
134  // Uncompressed, so do not require the eof block.
135 #ifdef __ZLIB_AVAILABLE__
136  BgzfFileType::setRequireEofBlock(false);
137 #endif
138  myFilePtr = ifopen(filename, "rb", InputFile::BGZF);
139 
140  myInterfacePtr = new BamInterface;
141 
142  // Read the magic string.
143  char magic[4];
144  ifread(myFilePtr, magic, 4);
145  }
146  else if((strcmp(filename, "-") == 0) || (strcmp(filename, "-.sam") == 0))
147  {
148  // SAM File.
149  // read sam from stdin
150  filename = "-";
151  myFilePtr = ifopen(filename, "rb", InputFile::UNCOMPRESSED);
152  myInterfacePtr = new SamInterface;
153  }
154  else
155  {
156  std::string errorMessage = "Invalid SAM/BAM filename, ";
157  errorMessage += filename;
158  errorMessage += ". From stdin, can only be '-', '-.sam', '-.bam', or '-.ubam'";
159  myStatus.setStatus(SamStatus::FAIL_IO, errorMessage.c_str());
160  delete myFilePtr;
161  myFilePtr = NULL;
162  return(false);
163  }
164  }
165  else
166  {
167  // Not from stdin. Read the file to determine the type.
168 
169  myFilePtr = new InputFile;
170 
171  // support recovery mode - this conditionally enables a reader
172  // capable of recovering from bad BGZF compression blocks.
173  myFilePtr->setAttemptRecovery(myAttemptRecovery);
174  bool rc = myFilePtr->openFile(filename, "rb", InputFile::DEFAULT);
175 
176  if (rc == false)
177  {
178  std::string errorMessage = "Failed to Open ";
179  errorMessage += filename;
180  errorMessage += " for reading";
181  myStatus.setStatus(SamStatus::FAIL_IO, errorMessage.c_str());
182  delete myFilePtr;
183  myFilePtr = NULL;
184  return(false);
185  }
186 
187  char magic[4];
188  ifread(myFilePtr, magic, 4);
189 
190  if (magic[0] == 'B' && magic[1] == 'A' && magic[2] == 'M' &&
191  magic[3] == 1)
192  {
193  myInterfacePtr = new BamInterface;
194  // Set that it is a bam file open for reading. This is needed to
195  // determine if an index file can be used.
196  myIsBamOpenForRead = true;
197  }
198  else
199  {
200  // Not a bam, so rewind to the beginning of the file so it
201  // can be read.
202  ifrewind(myFilePtr);
203  myInterfacePtr = new SamInterface;
204  }
205  }
206 
207  // File is open for reading.
208  myIsOpenForRead = true;
209 
210  // Read the header if one was passed in.
211  if(header != NULL)
212  {
213  return(ReadHeader(*header));
214  }
215 
216  // Successfully opened the file.
217  myStatus = SamStatus::SUCCESS;
218  return(true);
219 }
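The type-detection rules in the listing can be summarized in two small functions: stdin names are classified purely by spelling, while regular files are recognized as BAM by the four magic bytes 'B', 'A', 'M', 1 read at the start of the (decompressed) stream. ReadKind, classifyStdinName and hasBamMagic are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of how OpenForRead chooses a reader.
enum class ReadKind { Bam, UncompressedBam, Sam, Invalid };

// Names beginning with '-' mean stdin; only these four spellings are accepted.
ReadKind classifyStdinName(const std::string& filename)
{
    if (filename == "-.bam")  return ReadKind::Bam;
    if (filename == "-.ubam") return ReadKind::UncompressedBam;  // BGZF, level 0
    if (filename == "-" || filename == "-.sam") return ReadKind::Sam;
    return ReadKind::Invalid;  // OpenForRead sets FAIL_IO in this case
}

// For regular files the type comes from the content, not the name.
bool hasBamMagic(const unsigned char magic[4])
{
    return magic[0] == 'B' && magic[1] == 'A' && magic[2] == 'M' && magic[3] == 1;
}
```

If the magic check fails, the listing rewinds the file and reads it as SAM, which is why only BAM needs a positive signature.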

◆ OpenForWrite()

bool SamFile::OpenForWrite ( const char *  filename,
SamFileHeader *  header = NULL 
)

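Open a SAM/BAM file for writing with the specified filename, determining SAM/BAM from the end of the name: names ending in "ubam" are written as uncompressed BAM (BGZF with compression level 0), names ending in "bam" as compressed BAM, and anything else as SAM. To write to stdout, use "-.ubam" or "-.bam"; any other name beginning with "-" that does not end in "bam" writes SAM to stdout.

Parameters
filename: the SAM/BAM file to open for writing.
header: header to write to the file (optional).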
Returns
true = success; false = failure.

Definition at line 223 of file SamFile.cpp.

References InputFile::BGZF, StatGenStatus::FAIL_IO, ifopen(), myIsOpenForWrite, myStatus, resetFile(), StatGenStatus::setStatus(), StatGenStatus::SUCCESS, InputFile::UNCOMPRESSED, and WriteHeader().

Referenced by GetBamIndex().

224 {
225  // Reset for any previously operated on files.
226  resetFile();
227 
228  int lastchar = 0;
229  while (filename[lastchar] != 0) lastchar++;
230  if (lastchar >= 4 &&
231  filename[lastchar - 4] == 'u' &&
232  filename[lastchar - 3] == 'b' &&
233  filename[lastchar - 2] == 'a' &&
234  filename[lastchar - 1] == 'm')
235  {
236  // BAM File.
237  // if -.ubam is the filename, write uncompressed bam to stdout
238  if((lastchar == 6) && (filename[0] == '-') && (filename[1] == '.'))
239  {
240  filename = "-";
241  }
242 
243  myFilePtr = ifopen(filename, "wb0", InputFile::BGZF);
244 
245  myInterfacePtr = new BamInterface;
246  }
247  else if (lastchar >= 3 &&
248  filename[lastchar - 3] == 'b' &&
249  filename[lastchar - 2] == 'a' &&
250  filename[lastchar - 1] == 'm')
251  {
252  // BAM File.
253  // if -.bam is the filename, write compressed bam to stdout
254  if((lastchar == 5) && (filename[0] == '-') && (filename[1] == '.'))
255  {
256  filename = "-";
257  }
258  myFilePtr = ifopen(filename, "wb", InputFile::BGZF);
259 
260  myInterfacePtr = new BamInterface;
261  }
262  else
263  {
264  // SAM File
265  // if - (followed by anything) is the filename,
266  // write uncompressed sam to stdout
267  if((lastchar >= 1) && (filename[0] == '-'))
268  {
269  filename = "-";
270  }
271  myFilePtr = ifopen(filename, "wb", InputFile::UNCOMPRESSED);
272 
273  myInterfacePtr = new SamInterface;
274  }
275 
276  if (myFilePtr == NULL)
277  {
278  std::string errorMessage = "Failed to Open ";
279  errorMessage += filename;
280  errorMessage += " for writing";
281  myStatus.setStatus(SamStatus::FAIL_IO, errorMessage.c_str());
282  return(false);
283  }
284 
285  myIsOpenForWrite = true;
286 
287  // Write the header if one was passed in.
288  if(header != NULL)
289  {
290  return(WriteHeader(*header));
291  }
292 
293  // Successfully opened the file.
294  myStatus = SamStatus::SUCCESS;
295  return(true);
296 }
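The listing above picks the output format purely from the end of the filename. As a minimal standalone sketch (no libStatGen dependency), the same decisions can be modelled as below; `OutputKind`, `OutputChoice` and `chooseOutput` are hypothetical names, not part of the library. Like the listing, it matches the bare suffixes "ubam" and "bam" without requiring a preceding dot.

```cpp
#include <string>

// Hypothetical classification mirroring the suffix checks in OpenForWrite().
enum class OutputKind { UncompressedBam, CompressedBam, Sam };

struct OutputChoice {
    OutputKind kind;
    std::string openName;  // name that would be passed to ifopen(); "-" means stdout
};

static bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

OutputChoice chooseOutput(const std::string& filename) {
    // Like the listing, only the last characters are checked (no dot required).
    if (endsWith(filename, "ubam")) {
        // "-.ubam" writes uncompressed BAM to stdout.
        bool toStdout = filename.size() == 6 && filename.compare(0, 2, "-.") == 0;
        return {OutputKind::UncompressedBam, toStdout ? "-" : filename};
    }
    if (endsWith(filename, "bam")) {
        // "-.bam" writes compressed (BGZF) BAM to stdout.
        bool toStdout = filename.size() == 5 && filename.compare(0, 2, "-.") == 0;
        return {OutputKind::CompressedBam, toStdout ? "-" : filename};
    }
    // Anything else is SAM; any name starting with '-' goes to stdout.
    bool toStdout = !filename.empty() && filename[0] == '-';
    return {OutputKind::Sam, toStdout ? "-" : filename};
}
```

For example, `chooseOutput("-.bam")` selects compressed BAM written to stdout, while `chooseOutput("out.sam")` selects plain SAM written to out.sam.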

◆ PrintStatistics()

void SamFile::PrintStatistics ( )
inline

Print the statistics that have been recorded due to a call to GenerateStatistics.

Definition at line 352 of file SamFile.h.

References myStatistics, resetFile(), and validateSortOrder().

352 {if(myStatistics != NULL) myStatistics->print();}

◆ ReadBamIndex() [1/2]

bool SamFile::ReadBamIndex ( const char *  filename)

Read the specified bam index file.

The index must be read before setting a read section; it is what allows seeking to and reading portions of a BAM file.

Parameters
filename: the name of the bam index file to be read.
Returns
true = success; false = failure.

Definition at line 300 of file SamFile.cpp.

References myStatus, BamIndex::readIndex(), StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

301 {
302  // Cleanup a previously setup index.
303  if(myBamIndex != NULL)
304  {
305  delete myBamIndex;
306  myBamIndex = NULL;
307  }
308 
309  // Create a new bam index.
310  myBamIndex = new BamIndex();
311  SamStatus::Status indexStat = myBamIndex->readIndex(bamIndexFilename);
312 
313  if(indexStat != SamStatus::SUCCESS)
314  {
315  std::string errorMessage = "Failed to read the bam Index file: ";
316  errorMessage += bamIndexFilename;
317  myStatus.setStatus(indexStat, errorMessage.c_str());
318  delete myBamIndex;
319  myBamIndex = NULL;
320  return(false);
321  }
322  myStatus = SamStatus::SUCCESS;
323  return(true);
324 }

◆ ReadBamIndex() [2/2]

bool SamFile::ReadBamIndex ( )

Read the bam index file using the BAM filename as a base.

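The index must be read before setting a read section, since it is used to seek to and read portions of a BAM file. It must also be read after the BAM file has been opened, because the BAM filename is used as the base name of the index file; if no file is open, FAIL_ORDER is set and false is returned. It first tries the BAM filename with ".bai" appended (for example, sample.bam.bai). If that fails, it removes the first ".bam" found in that name and tries again (for example, sample.bai).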

Returns
true = success; false = failure.

Definition at line 328 of file SamFile.cpp.

References StatGenStatus::FAIL_ORDER, InputFile::getFileName(), myStatus, and StatGenStatus::setStatus().

329 {
330  if(myFilePtr == NULL)
331  {
332  // Can't read the bam index file because the BAM file has not yet been
333  // opened, so we don't know the base filename for the index file.
334  std::string errorMessage = "Failed to read the bam Index file -"
335  " the BAM file needs to be read first in order to determine"
336  " the index filename.";
337  myStatus.setStatus(SamStatus::FAIL_ORDER, errorMessage.c_str());
338  return(false);
339  }
340 
341  const char* bamBaseName = myFilePtr->getFileName();
342 
343  std::string indexName = bamBaseName;
344  indexName += ".bai";
345 
346  bool foundFile = true;
347  try
348  {
349  if(ReadBamIndex(indexName.c_str()) == false)
350  {
351  foundFile = false;
352  }
353  }
354  catch (std::exception&)
355  {
356  foundFile = false;
357  }
358 
359  // Check to see if the index file was found.
360  if(!foundFile)
361  {
362  // Not found - try without the bam extension.
363  // Locate the start of the bam extension
364  size_t startExt = indexName.find(".bam");
365  if(startExt == std::string::npos)
366  {
367  // Could not find the .bam extension, so just return false since the
368  // call to ReadBamIndex set the status.
369  return(false);
370  }
371  // Remove ".bam" and try reading the index again.
372  indexName.erase(startExt, 4);
373  return(ReadBamIndex(indexName.c_str()));
374  }
375  return(true);
376 }
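The fallback naming in ReadBamIndex() can be checked in isolation with the hypothetical helper below, which returns the index filenames the listing would try, in order. The real method only tries the second name if reading the first one fails or throws. Because the listing uses `std::string::find`, the fallback strips the first ".bam" anywhere in the path, which could belong to a directory name rather than the file extension.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: lists the index filenames ReadBamIndex() would try, in order.
std::vector<std::string> candidateIndexNames(const std::string& bamName) {
    std::vector<std::string> names;
    std::string indexName = bamName + ".bai";  // first attempt: <bam name>.bai
    names.push_back(indexName);

    // Fallback: erase the FIRST ".bam" found anywhere in the string.
    std::string::size_type startExt = indexName.find(".bam");
    if (startExt != std::string::npos) {
        indexName.erase(startExt, 4);
        names.push_back(indexName);
    }
    return names;
}
```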

◆ ReadHeader()

bool SamFile::ReadHeader ( SamFileHeader header)

Reads the header section from the file and stores it in the passed in header.

Returns
true = success; false = failure.

Definition at line 437 of file SamFile.cpp.

References StatGenStatus::FAIL_ORDER, myHasHeader, myIsOpenForRead, myStatus, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

Referenced by OpenForRead(), and Pileup< TestPileupElement >::processFile().

438 {
439  myStatus = SamStatus::SUCCESS;
440  if(myIsOpenForRead == false)
441  {
442  // File is not open for read
443  myStatus.setStatus(SamStatus::FAIL_ORDER,
444  "Cannot read header since the file is not open for reading");
445  return(false);
446  }
447 
448  if(myHasHeader == true)
449  {
450  // The header has already been read.
451  myStatus.setStatus(SamStatus::FAIL_ORDER,
452  "Cannot read header since it has already been read.");
453  return(false);
454  }
455 
456  if(myInterfacePtr->readHeader(myFilePtr, header, myStatus))
457  {
458  // The header has now been successfully read.
459  myHasHeader = true;
460  return(true);
461  }
462  return(false);
463 }

◆ ReadRecord()

bool SamFile::ReadRecord ( SamFileHeader header,
SamRecord record 
)

Reads the next record from the file and stores it in the passed-in record.

If the file is an indexed BAM file and SetReadSection was called, only alignments in the section specified by SetReadSection are read; once all of them have been read, this method returns false. Records that do not match the flags set by SetReadFlags are skipped. If the file is not open for reading, or its header has not been read, the status is set to FAIL_ORDER and a std::runtime_error is thrown.

Validates that the record is sorted according to the value set by setSortedValidation. No sorting validation is done if the sort type is UNSORTED or setSortedValidation was never called.

Returns
true = record was successfully set (and sorted if applicable), false = record was not successfully set (or not sorted as expected).

Definition at line 501 of file SamFile.cpp.

References StatGenStatus::FAIL_ORDER, SamRecord::getFlag(), myHasHeader, myIsOpenForRead, myRecordCount, myStatistics, myStatus, SamRecord::setReference(), SamRecord::setSequenceTranslation(), StatGenStatus::setStatus(), StatGenStatus::SUCCESS, and validateSortOrder().

Referenced by Pileup< TestPileupElement >::processFile().

503 {
504  myStatus = SamStatus::SUCCESS;
505 
506  if(myIsOpenForRead == false)
507  {
508  // File is not open for read
509  myStatus.setStatus(SamStatus::FAIL_ORDER,
510  "Cannot read record since the file is not open for reading");
511  throw(std::runtime_error("SOFTWARE BUG: trying to read a SAM/BAM record prior to opening the file."));
512  return(false);
513  }
514 
515  if(myHasHeader == false)
516  {
517  // The header has not yet been read.
518  // TODO - maybe just read the header.
519  myStatus.setStatus(SamStatus::FAIL_ORDER,
520  "Cannot read record since the header has not been read.");
521  throw(std::runtime_error("SOFTWARE BUG: trying to read a SAM/BAM record prior to reading the header."));
522  return(false);
523  }
524 
525  // Check to see if a new region has been set. If so, determine the
526  // chunks for that region.
527  if(myNewSection)
528  {
529  if(!processNewSection(header))
530  {
531  // Failed processing a new section. Could be an
532  // order issue like the file not being open or the
533  // indexed file not having been read.
534  // processNewSection sets myStatus with the failure reason.
535  return(false);
536  }
537  }
538 
539  // Read until a record is not successfully read or there are no more
540  // requested records.
541  while(myStatus == SamStatus::SUCCESS)
542  {
543  record.setReference(myRefPtr);
544  record.setSequenceTranslation(myReadTranslation);
545 
546  // If reading by index, this method will setup to ensure it is in
547  // the correct position for the next record (if not already there).
548  // Sets myStatus if it could not move to a good section.
549  // Just returns true if it is not setup to read by index.
550  if(!ensureIndexedReadPosition())
551  {
552  // Either there are no more records in the section
553  // or it failed to move to the right section, so there
554  // is nothing more to read, stop looping.
555  break;
556  }
557 
558  // File is open for reading and the header has been read, so read the
559  // next record.
560  myInterfacePtr->readRecord(myFilePtr, header, record, myStatus);
561  if(myStatus != SamStatus::SUCCESS)
562  {
563  // Failed to read the record, so break out of the loop.
564  break;
565  }
566 
567  // Successfully read a record, so check if we should filter it.
568  // First check if it is out of the section. Returns true
569  // if not reading by sections, returns false if the record
570  // is outside of the section. Sets status to NO_MORE_RECS if
571  // there is nothing left to read in the section.
572  if(!checkRecordInSection(record))
573  {
574  // The record is not in the section.
575  // The while loop will detect if NO_MORE_RECS was set.
576  continue;
577  }
578 
579  // Check the flag for required/excluded flags.
580  uint16_t flag = record.getFlag();
581  if((flag & myRequiredFlags) != myRequiredFlags)
582  {
583  // The record does not contain all required flags, so
584  // continue looking.
585  continue;
586  }
587  if((flag & myExcludedFlags) != 0)
588  {
589  // The record contains an excluded flag, so continue looking.
590  continue;
591  }
592 
593  //increment the record count.
594  myRecordCount++;
595 
596  if(myStatistics != NULL)
597  {
598  // Statistics should be updated.
599  myStatistics->updateStatistics(record);
600  }
601 
602  // Successfully read the record, so check the sort order.
603  if(!validateSortOrder(record, header))
604  {
605  // ValidateSortOrder sets the status on a failure.
606  return(false);
607  }
608  return(true);
609 
610  } // End while loop that checks if a desired record is found or failure.
611 
612  // Return true if a record was found.
613  return(myStatus == SamStatus::SUCCESS);
614 }
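ReadHeader() and ReadRecord() report out-of-order calls differently: ReadHeader() sets FAIL_ORDER and returns false, while ReadRecord() sets the status and then throws std::runtime_error. The hypothetical class below is a standalone sketch of only those ordering checks; it performs no I/O, is not part of libStatGen, and its `open()` simply stands in for a successful OpenForRead() call made without a header argument.

```cpp
#include <stdexcept>

// Hypothetical model of the read-side ordering checks in ReadHeader()/ReadRecord().
class ReadOrderModel {
public:
    // Stands in for a successful OpenForRead(filename) with no header argument.
    void open() { isOpenForRead_ = true; }

    // Mirrors ReadHeader(): ordering problems are reported by returning false.
    bool readHeader() {
        if (!isOpenForRead_) return false;  // FAIL_ORDER: not open for reading
        if (hasHeader_) return false;       // FAIL_ORDER: header already read
        hasHeader_ = true;
        return true;
    }

    // Mirrors the guards at the top of ReadRecord(): ordering problems throw.
    void checkReadRecordPreconditions() const {
        if (!isOpenForRead_)
            throw std::runtime_error("reading a record before opening the file");
        if (!hasHeader_)
            throw std::runtime_error("reading a record before reading the header");
    }

private:
    bool isOpenForRead_ = false;
    bool hasHeader_ = false;
};
```

In practice this means a caller should check ReadHeader's return value, while forgetting to call ReadHeader before ReadRecord surfaces as an exception rather than a false return.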

◆ SetReadFlags()

void SamFile::SetReadFlags ( uint16_t  requiredFlags,
uint16_t  excludedFlags 
)

Specify which records ReadRecord should return.

ReadRecord returns only records that have all of the specified required flags set and none of the specified excluded flags set. It keeps reading from the file until it finds a record that satisfies these settings or reaches the end of the file/region.

Parameters
requiredFlags: flags that must all be set in records returned by ReadRecord (use 0x0 if there are no required flags).
excludedFlags: flags that must not be set in records returned by ReadRecord (use 0x0 if there are no excluded flags).

Definition at line 781 of file SamFile.cpp.

Referenced by GetStatusMessage().

782 {
783  myRequiredFlags = requiredFlags;
784  myExcludedFlags = excludedFlags;
785 }
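The filter that ReadRecord() applies with these two values is a pair of bitwise tests. The sketch below reproduces that test on its own; the flag constants are standard FLAG bits from the SAM specification, and `passesReadFlags` is a hypothetical name. With libStatGen itself, keeping paired reads while dropping unmapped, secondary and duplicate reads would be a call such as `samIn.SetReadFlags(0x1, 0x4 | 0x100 | 0x400)`, where `samIn` is a hypothetical open SamFile.

```cpp
#include <cstdint>

// SAM FLAG bits (values from the SAM format specification).
constexpr std::uint16_t kPaired    = 0x1;    // template has multiple segments
constexpr std::uint16_t kUnmapped  = 0x4;    // segment unmapped
constexpr std::uint16_t kSecondary = 0x100;  // secondary alignment
constexpr std::uint16_t kDuplicate = 0x400;  // PCR or optical duplicate

// Same test ReadRecord() applies with the values stored by SetReadFlags():
// every required bit must be set, and no excluded bit may be set.
bool passesReadFlags(std::uint16_t flag, std::uint16_t requiredFlags,
                     std::uint16_t excludedFlags) {
    if ((flag & requiredFlags) != requiredFlags) return false;
    if ((flag & excludedFlags) != 0) return false;
    return true;
}
```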

◆ SetReadSection() [1/4]

bool SamFile::SetReadSection ( int32_t  refID)

Sets which reference id (index into the BAM list of reference information) of the BAM file should be read.

The records for that reference id will be retrieved on each ReadRecord call. Reference ids start at 0, and -1 indicates reads with no reference. When all records have been retrieved for the specified reference id, ReadRecord will return failure until a new read section is set. Must be called only after a BAM file has been opened for reading; otherwise FAIL_ORDER is set and false is returned. Sorting validation is reset every time SetReadSection is called, since it can jump around in the file.

Parameters
refID: the reference ID of the records to read from the file.
Returns
true = success; false = failure.

Definition at line 683 of file SamFile.cpp.

Referenced by GetStatusMessage(), and SetReadSection().

684 {
685  // No start/end specified, so set back to default -1.
686  return(SetReadSection(refID, -1, -1));
687 }

◆ SetReadSection() [2/4]

bool SamFile::SetReadSection ( const char *  refName)

Sets which reference name of the BAM file should be read.

The records for that reference name will be retrieved on each ReadRecord call. Specify "" or "*" to read records not associated with a reference. When all records have been retrieved for the specified reference name, ReadRecord will return failure until a new read section is set. Must be called only after a BAM file has been opened for reading. Sorting validation is reset every time SetReadSection is called, since it can jump around in the file.

Parameters
refName: the reference name of the records to read from the file.
Returns
true = success; false = failure.

Definition at line 692 of file SamFile.cpp.

References SetReadSection().

693 {
694  // No start/end specified, so set back to default -1.
695  return(SetReadSection(refName, -1, -1));
696 }

◆ SetReadSection() [3/4]

bool SamFile::SetReadSection ( int32_t  refID,
int32_t  start,
int32_t  end,
bool  overlap = true 
)

Sets which reference id (index into the BAM list of reference information) and start/end positions of the BAM file should be read.

The records for that reference id and positions will be retrieved on each ReadRecord call. Reference ids start at 0, and -1 indicates reads with no reference. When all records have been retrieved for the specified section, ReadRecord will return failure until a new read section is set. Must be called only after a BAM file has been opened for reading; otherwise FAIL_ORDER is set and false is returned. Sorting validation is reset every time SetReadSection is called, since it can jump around in the file.

Parameters
refID: the reference ID of the records to read from the file.
start: inclusive 0-based start position of records that should be read for this refID (-1 for no start bound).
end: exclusive 0-based end position of records that should be read for this refID (-1 for no end bound).
overlap: when true (default), return reads that overlap the region at all; when false, only return reads that fall completely within the region.
Returns
true = success; false = failure.

Definition at line 700 of file SamFile.cpp.

References StatGenStatus::FAIL_ORDER, myIsBamOpenForRead, myPrevCoord, myStatus, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

702 {
703  // If there is not a BAM file open for reading, return failure.
704  // Opening a new file clears the read section, so it must be
705  // set after the file is opened.
706  if(!myIsBamOpenForRead)
707  {
708  // There is not a BAM file open for reading.
709  myStatus.setStatus(SamStatus::FAIL_ORDER,
710  "Cannot set section since there is no bam file open");
711  return(false);
712  }
713 
714  myNewSection = true;
715  myOverlapSection = overlap;
716  myStartPos = start;
717  myEndPos = end;
718  myRefID = refID;
719  myRefName.clear();
720  myChunksToRead.clear();
721  // Reset the end of the current chunk. We are resetting our read, so
722  // we no longer have a "current chunk" that we are reading.
723  myCurrentChunkEnd = 0;
724  myStatus = SamStatus::SUCCESS;
725 
726  // Reset the sort order criteria since we moved around in the file.
727  myPrevCoord = -1;
728  myPrevRefID = 0;
729  myPrevReadName.Clear();
730 
731  return(true);
732 }
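The listing only records the section; the per-record test happens later, in checkRecordInSection(), which is not shown here. As a rough sketch under stated assumptions (the section is the half-open 0-based interval [start, end), -1 means "no bound" as the default in SetReadSection(refID) suggests, and each read is modelled as a half-open interval [readStart, readEnd)), the overlap flag can be pictured as the difference between an overlap test and a containment test. `readInSection` is a hypothetical helper, and the library's exact boundary handling may differ.

```cpp
#include <cstdint>

// Hypothetical model of the section test described for SetReadSection().
// Assumptions: section = [secStart, secEnd), 0-based; -1 means "no bound";
// a read occupies [readStart, readEnd).
bool readInSection(std::int32_t readStart, std::int32_t readEnd,
                   std::int32_t secStart, std::int32_t secEnd, bool overlap) {
    bool noStart = (secStart == -1);
    bool noEnd = (secEnd == -1);
    if (overlap) {
        // Any overlap counts: the read must end after the section starts
        // and begin before the section ends.
        return (noStart || readEnd > secStart) && (noEnd || readStart < secEnd);
    }
    // Containment: the whole read must lie inside the section.
    return (noStart || readStart >= secStart) && (noEnd || readEnd <= secEnd);
}
```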

◆ SetReadSection() [4/4]

bool SamFile::SetReadSection ( const char *  refName,
int32_t  start,
int32_t  end,
bool  overlap = true 
)

Sets which reference name and start/end positions of the BAM file should be read.

The records for this reference name and positions will be retrieved on each ReadRecord call. Specify "" or "*" to indicate reads with no reference. When all records have been retrieved for the specified section, ReadRecord will return failure until a new read section is set. Must be called only after a BAM file has been opened for reading; otherwise FAIL_ORDER is set and false is returned. Sorting validation is reset every time SetReadSection is called, since it can jump around in the file.

Parameters
refName: the reference name of the records to read from the file.
start: inclusive 0-based start position of records that should be read for this reference name (-1 for no start bound).
end: exclusive 0-based end position of records that should be read for this reference name (-1 for no end bound).
overlap: when true (default), return reads that overlap the region at all; when false, only return reads that fall completely within the region.
Returns
true = success; false = failure.

Definition at line 736 of file SamFile.cpp.

References StatGenStatus::FAIL_ORDER, myIsBamOpenForRead, myPrevCoord, myStatus, BamIndex::REF_ID_ALL, BamIndex::REF_ID_UNMAPPED, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

738 {
739  // If there is not a BAM file open for reading, return failure.
740  // Opening a new file clears the read section, so it must be
741  // set after the file is opened.
742  if(!myIsBamOpenForRead)
743  {
744  // There is not a BAM file open for reading.
745  myStatus.setStatus(SamStatus::FAIL_ORDER,
746  "Cannot set section since there is no bam file open");
747  return(false);
748  }
749 
750  myNewSection = true;
751  myOverlapSection = overlap;
752  myStartPos = start;
753  myEndPos = end;
754  if((strcmp(refName, "") == 0) || (strcmp(refName, "*") == 0))
755  {
756  // No Reference name specified, so read just the "-1" entries.
757  myRefID = BamIndex::REF_ID_UNMAPPED;
758  }
759  else
760  {
761  // save the reference name and revert the reference ID to unknown
762  // so it will be calculated later.
763  myRefName = refName;
764  myRefID = BamIndex::REF_ID_ALL;
765  }
766  myChunksToRead.clear();
767  // Reset the end of the current chunk. We are resetting our read, so
768  // we no longer have a "current chunk" that we are reading.
769  myCurrentChunkEnd = 0;
770  myStatus = SamStatus::SUCCESS;
771 
772  // Reset the sort order criteria since we moved around in the file.
773  myPrevCoord = -1;
774  myPrevRefID = 0;
775  myPrevReadName.Clear();
776 
777  return(true);
778 }

◆ SetReadSequenceTranslation()

void SamFile::SetReadSequenceTranslation ( SamRecord::SequenceTranslation  translation)

Set the type of sequence translation to use when reading the sequence.

Passed down to the SamRecord when it is read. The default type (if this method is never called) is NONE (the sequence is left as-is).

Parameters
translation: type of sequence translation to use.

Definition at line 387 of file SamFile.cpp.

388 {
389  myReadTranslation = translation;
390 }
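To illustrate what a translation does to a sequence, the sketch below applies one. It assumes the translation values are NONE, EQUAL (bases that match the reference become '=') and BASES ('=' becomes the reference base); check SamRecord::SequenceTranslation for the real definitions. It also assumes an ungapped alignment with no insertions, deletions or clipping, which real CIGAR-aware code must handle. `Translation` and `translate` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical mirror of SamRecord::SequenceTranslation.
enum class Translation { None, Equal, Bases };

// Simplified model: assumes the read aligns to `ref` base-for-base,
// with no insertions, deletions or clipping.
std::string translate(const std::string& read, const std::string& ref,
                      Translation t) {
    std::string out = read;
    for (std::size_t i = 0; i < out.size() && i < ref.size(); ++i) {
        if (t == Translation::Equal && out[i] == ref[i]) {
            out[i] = '=';     // base matches the reference
        } else if (t == Translation::Bases && out[i] == '=') {
            out[i] = ref[i];  // '=' expands to the reference base
        }
    }
    return out;  // Translation::None leaves the sequence as-is
}
```

The same idea applies on the write side via SetWriteSequenceTranslation.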

◆ SetReference()

void SamFile::SetReference ( GenomeSequence reference)

Sets the reference to the specified genome sequence object.

Parameters
reference: pointer to the GenomeSequence object.

Definition at line 380 of file SamFile.cpp.

Referenced by Pileup< TestPileupElement >::processFile().

381 {
382  myRefPtr = reference;
383 }

◆ setSortedValidation()

void SamFile::setSortedValidation ( SortedType  sortType)

Set the flag to validate that the file is sorted as it is read/written.

Must be called after the file has been opened. Sorting validation is reset every time SetReadSection is called, since it can jump around in the file.

Parameters
sortType: specifies the type of sort to be checked for.

Definition at line 669 of file SamFile.cpp.

Referenced by Pileup< TestPileupElement >::processFile().

670 {
671  mySortedType = sortType;
672 }

◆ SetWriteSequenceTranslation()

void SamFile::SetWriteSequenceTranslation ( SamRecord::SequenceTranslation  translation)

Set the type of sequence translation to use when writing the sequence.

Passed down to the SamRecord when it is written. The default type (if this method is never called) is NONE (the sequence is left as-is).

Parameters
translation: type of sequence translation to use.

Definition at line 394 of file SamFile.cpp.

395 {
396  myWriteTranslation = translation;
397 }

◆ validateSortOrder()

bool SamFile::validateSortOrder ( SamRecord record,
SamFileHeader header 
)
protected

Validate that the record is sorted compared to the previously read record if there is one, according to the specified sort order.

If the sort order is UNSORTED, true is returned. If it is FLAG, the sort order is taken from the SO field of the header. Sorting validation is reset every time SetReadSection is called, since it can jump around in the file.

Definition at line 1006 of file SamFile.cpp.

References COORDINATE, InputFile::disableBuffering(), StatGenStatus::FAIL_IO, StatGenStatus::FAIL_ORDER, StatGenStatus::FAIL_PARSE, FLAG, SamRecord::get0BasedAlignmentEnd(), SamRecord::get0BasedPosition(), BamIndex::getChunksForRegion(), SamRecord::getReadName(), SamFileHeader::getReferenceID(), SamRecord::getReferenceID(), SamFileHeader::getReferenceLabel(), SamFileHeader::getSortOrder(), ifseek(), iftell(), StatGenStatus::INVALID_SORT, myHasHeader, myIsBamOpenForRead, myPrevCoord, myRecordCount, myStatus, StatGenStatus::NO_MORE_RECS, SamReferenceInfo::NO_REF_ID, QUERY_NAME, BamIndex::REF_ID_ALL, BamIndex::REF_ID_UNMAPPED, SamRecord::setReference(), SamRecord::setSequenceTranslation(), StatGenStatus::setStatus(), StatGenStatus::SUCCESS, and UNSORTED.

Referenced by PrintStatistics(), ReadRecord(), and WriteRecord().

1007 {
1008  if(myRefPtr != NULL)
1009  {
1010  record.setReference(myRefPtr);
1011  }
1012  record.setSequenceTranslation(myReadTranslation);
1013 
1014  bool status = false;
1015  if(mySortedType == UNSORTED)
1016  {
1017  // Unsorted, so nothing to validate, just return true.
1018  status = true;
1019  }
1020  else
1021  {
1022  // Check to see if mySortedType is based on the header.
1023  if(mySortedType == FLAG)
1024  {
1025  // Determine the sorted type from what was read out of the header.
1026  mySortedType = getSortOrderFromHeader(header);
1027  }
1028 
1029  if(mySortedType == QUERY_NAME)
1030  {
1031  // Validate that it is sorted by query name.
1032  // Get the query name from the record.
1033  const char* readName = record.getReadName();
1034 
1035  // Check if it is sorted either in samtools way or picard's way.
1036  if((myPrevReadName.Compare(readName) > 0) &&
1037  (strcmp(myPrevReadName.c_str(), readName) > 0))
1038  {
1039  // return false.
1040  String errorMessage = "ERROR: File is not sorted by read name at record ";
1041  errorMessage += myRecordCount;
1042  errorMessage += "\n\tPrevious record was ";
1043  errorMessage += myPrevReadName;
1044  errorMessage += ", but this record is ";
1045  errorMessage += readName;
1046  myStatus.setStatus(SamStatus::INVALID_SORT,
1047  errorMessage.c_str());
1048  status = false;
1049  }
1050  else
1051  {
1052  myPrevReadName = readName;
1053  status = true;
1054  }
1055  }
1056  else
1057  {
1058  // Validate that it is sorted by COORDINATES.
1059  // Get the leftmost coordinate and the reference index.
1060  int32_t refID = record.getReferenceID();
1061  int32_t coord = record.get0BasedPosition();
1062  // The unmapped reference id is at the end of a sorted file.
1063  if(refID == BamIndex::REF_ID_UNMAPPED)
1064  {
1065  // A new reference ID that is for the unmapped reads
1066  // is always valid.
1067  status = true;
1068  myPrevRefID = refID;
1069  myPrevCoord = coord;
1070  }
1071  else if(myPrevRefID == BamIndex::REF_ID_UNMAPPED)
1072  {
1073  // Previous reference ID was for unmapped reads, but the
1074  // current one is not, so this is not sorted.
1075  String errorMessage = "ERROR: File is not coordinate sorted at record ";
1076  errorMessage += myRecordCount;
1077  errorMessage += "\n\tPrevious record was unmapped, but this record is ";
1078  errorMessage += header.getReferenceLabel(refID) + ":" + coord;
1079  myStatus.setStatus(SamStatus::INVALID_SORT,
1080  errorMessage.c_str());
1081  status = false;
1082  }
1083  else if(refID < myPrevRefID)
1084  {
1085  // Current reference id is less than the previous one,
1086  //meaning that it is not sorted.
1087  String errorMessage = "ERROR: File is not coordinate sorted at record ";
1088  errorMessage += myRecordCount;
1089  errorMessage += "\n\tPrevious record was ";
1090  errorMessage += header.getReferenceLabel(myPrevRefID) + ":" + myPrevCoord;
1091  errorMessage += ", but this record is ";
1092  errorMessage += header.getReferenceLabel(refID) + ":" + coord;
1093  myStatus.setStatus(SamStatus::INVALID_SORT,
1094  errorMessage.c_str());
1095  status = false;
1096  }
1097  else
1098  {
1099  // The reference IDs are in the correct order.
1100  if(refID > myPrevRefID)
1101  {
1102  // New reference id, so set the previous coordinate to -1
1103  myPrevCoord = -1;
1104  }
1105 
1106  // Check the coordinates.
1107  if(coord < myPrevCoord)
1108  {
1109  // New Coord is less than the previous position.
1110  String errorMessage = "ERROR: File is not coordinate sorted at record ";
1111  errorMessage += myRecordCount;
1112  errorMessage += "\n\tPreviousRecord was ";
1113  errorMessage += header.getReferenceLabel(myPrevRefID) + ":" + myPrevCoord;
1114  errorMessage += ", but this record is ";
1115  errorMessage += header.getReferenceLabel(refID) + ":" + coord;
1116  myStatus.setStatus(SamStatus::INVALID_SORT,
1117  errorMessage.c_str());
1118  status = false;
1119  }
1120  else
1121  {
1122  myPrevRefID = refID;
1123  myPrevCoord = coord;
1124  status = true;
1125  }
1126  }
1127  }
1128  }
1129 
1130  return(status);
1131 }
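The COORDINATE branch of the listing can be reproduced as a small standalone checker. The sketch below mirrors its rules: unmapped records (reference id -1, the value the SetReadSection documentation gives for reads with no reference) may always follow; a mapped record after an unmapped one, a smaller reference id, or a smaller position on the same reference is rejected; and a rejected record does not advance the previous position. `CoordinateSortChecker` is a hypothetical name, and its initial state (reference id 0, position -1) is taken from the reset performed in SetReadSection().

```cpp
#include <cstdint>

// Assumed to match BamIndex::REF_ID_UNMAPPED (-1 = no reference).
constexpr std::int32_t kUnmappedRefID = -1;

// Hypothetical standalone mirror of the COORDINATE branch of validateSortOrder().
class CoordinateSortChecker {
public:
    // Returns true if the record keeps the file coordinate-sorted.
    bool accept(std::int32_t refID, std::int32_t coord) {
        if (refID == kUnmappedRefID) {
            // Unmapped reads belong at the end and are always accepted.
            prevRefID_ = refID;
            prevCoord_ = coord;
            return true;
        }
        if (prevRefID_ == kUnmappedRefID) return false;  // mapped after unmapped
        if (refID < prevRefID_) return false;            // reference id went backwards
        if (refID > prevRefID_) prevCoord_ = -1;         // new reference: reset position
        if (coord < prevCoord_) return false;            // position went backwards
        prevRefID_ = refID;
        prevCoord_ = coord;
        return true;
    }

    // Same reset SetReadSection() performs before jumping in the file.
    void reset() { prevRefID_ = 0; prevCoord_ = -1; }

private:
    std::int32_t prevRefID_ = 0;
    std::int32_t prevCoord_ = -1;
};
```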

◆ WriteHeader()

bool SamFile::WriteHeader ( SamFileHeader header)

Writes the specified header into the file.

Returns
true = success; false = failure.

Definition at line 467 of file SamFile.cpp.

References StatGenStatus::FAIL_ORDER, myHasHeader, myIsOpenForWrite, myStatus, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

Referenced by OpenForWrite().

468 {
469  myStatus = SamStatus::SUCCESS;
470  if(myIsOpenForWrite == false)
471  {
472  // File is not open for write
473  // -OR-
474  // The header has already been written.
475  myStatus.setStatus(SamStatus::FAIL_ORDER,
476  "Cannot write header since the file is not open for writing");
477  return(false);
478  }
479 
480  if(myHasHeader == true)
481  {
482  // The header has already been written.
483  myStatus.setStatus(SamStatus::FAIL_ORDER,
484  "Cannot write header since it has already been written");
485  return(false);
486  }
487 
488  if(myInterfacePtr->writeHeader(myFilePtr, header, myStatus))
489  {
490  // The header has now been successfully written.
491  myHasHeader = true;
492  return(true);
493  }
494 
495  // return the status.
496  return(false);
497 }
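The two guards in WriteHeader() form a small call-order contract: the file must be open for writing, and a header may be written only once. The sketch below models only that contract so it can be exercised without libStatGen. HeaderGuardModel, its Status enum, and the in-memory output string are hypothetical stand-ins and are not part of the SamFile API. The model also assumes that the underlying SAM/BAM write always succeeds.

```cpp
#include <string>

// Hypothetical status codes standing in for the StatGenStatus values referenced
// above (SUCCESS, FAIL_ORDER); an illustrative model, not libStatGen code.
enum class Status { SUCCESS, FAIL_ORDER };

// Models the two call-order guards in SamFile::WriteHeader().
class HeaderGuardModel {
public:
    void openForWrite() { isOpenForWrite_ = true; }

    bool writeHeader(const std::string& headerText) {
        status_ = Status::SUCCESS;
        message_.clear();
        if (!isOpenForWrite_) {
            // Same condition as the first check in WriteHeader().
            status_ = Status::FAIL_ORDER;
            message_ = "Cannot write header since the file is not open for writing";
            return false;
        }
        if (hasHeader_) {
            // A header may only be written once per file.
            status_ = Status::FAIL_ORDER;
            message_ = "Cannot write header since it has already been written";
            return false;
        }
        output_ += headerText;  // assumed-successful stand-in for the real writer
        hasHeader_ = true;      // set only after the write
        return true;
    }

    Status status() const { return status_; }
    const std::string& message() const { return message_; }
    bool hasHeader() const { return hasHeader_; }
    const std::string& output() const { return output_; }

private:
    bool isOpenForWrite_ = false;
    bool hasHeader_ = false;
    Status status_ = Status::SUCCESS;
    std::string message_;
    std::string output_;
};
```

For example, calling writeHeader() before openForWrite() returns false with Status::FAIL_ORDER. A second call after a successful write is rejected the same way and leaves the output unchanged.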

◆ WriteRecord()

bool SamFile::WriteRecord(SamFileHeader &header, SamRecord &record)

Writes the specified record into the file.

The file must be open for writing and its header must already have been written; otherwise the call fails with FAIL_ORDER. Before writing, the record is checked against the sort order set by setSortedValidation(). No sort validation is done if the sort type is UNSORTED or if setSortedValidation() was never called. If the record is out of order, it is not written, the status is set to INVALID_SORT, and false is returned.

Returns
true = success; false = failure.
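When setSortedValidation() has been given COORDINATE, each record must not sort before the previous one. The sketch below shows one way to express that check. It assumes the usual SAM convention: records are ordered by reference ID, then by 0-based position, and unmapped records (reference ID -1) come after all mapped ones. CoordinateOrderChecker and its members are hypothetical. libStatGen's validateSortOrder() may differ in details, for example in how it treats unmapped records.

```cpp
#include <cstdint>

// Hypothetical model of coordinate-order validation. Assumes records are ordered
// by (reference ID, 0-based position) and that unmapped records, with reference
// ID -1, come after every mapped record. Not libStatGen's validateSortOrder().
class CoordinateOrderChecker {
public:
    // Assumed to match the value of BamIndex::REF_ID_UNMAPPED.
    static constexpr int32_t kRefIdUnmapped = -1;

    // Returns true if a record at (refId, pos0) may follow the last accepted record.
    // Rejected records do not change the comparison point or the count.
    bool accept(int32_t refId, int32_t pos0) {
        if (hasPrev_) {
            if (refId != kRefIdUnmapped) {
                // A mapped record may not follow an unmapped one.
                if (prevRefId_ == kRefIdUnmapped) return false;
                // Reference IDs must not decrease.
                if (refId < prevRefId_) return false;
                // Within one reference, positions must not decrease.
                if (refId == prevRefId_ && pos0 < prevPos_) return false;
            }
            // Unmapped records are accepted without further ordering checks.
        }
        hasPrev_ = true;
        prevRefId_ = refId;
        prevPos_ = pos0;
        ++acceptedCount_;
        return true;
    }

    uint32_t acceptedCount() const { return acceptedCount_; }

private:
    bool hasPrev_ = false;
    int32_t prevRefId_ = 0;
    int32_t prevPos_ = 0;
    uint32_t acceptedCount_ = 0;
};
```

Equal positions are allowed, so duplicate coordinates pass. A record whose reference ID or position goes backwards is rejected, and so is a mapped record that follows an unmapped one.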

Definition at line 619 of file SamFile.cpp.

References StatGenStatus::FAIL_ORDER, StatGenStatus::INVALID_SORT, myHasHeader, myIsOpenForWrite, myRecordCount, myStatus, SamRecord::setReference(), StatGenStatus::setStatus(), StatGenStatus::SUCCESS, and validateSortOrder().

Referenced by SamCoordOutput::flush().

{
    if(myIsOpenForWrite == false)
    {
        // File is not open for writing
        myStatus.setStatus(SamStatus::FAIL_ORDER,
                           "Cannot write record since the file is not open for writing");
        return(false);
    }

    if(myHasHeader == false)
    {
        // The header has not yet been written.
        myStatus.setStatus(SamStatus::FAIL_ORDER,
                           "Cannot write record since the header has not been written");
        return(false);
    }

    // Before trying to write the record, validate the sort order.
    if(!validateSortOrder(record, header))
    {
        // Not sorted like it is supposed to be, do not write the record
        myStatus.setStatus(SamStatus::INVALID_SORT,
                           "Cannot write the record since the file is not properly sorted.");
        return(false);
    }

    if(myRefPtr != NULL)
    {
        record.setReference(myRefPtr);
    }

    // File is open for writing and the header has been written, so write the
    // record.
    myStatus = myInterfacePtr->writeRecord(myFilePtr, header, record,
                                           myWriteTranslation);

    if(myStatus == SamStatus::SUCCESS)
    {
        // A record was successfully written, so increment the record count.
        myRecordCount++;
        return(true);
    }
    return(false);
}
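WriteRecord() applies its checks in a fixed order, and the first failing check determines the status. It returns FAIL_ORDER if the file is not open for writing or the header has not been written. It returns INVALID_SORT if the record is out of order. Otherwise the status is whatever the format-specific writer reports. The record count grows only on a successful write.

The sketch below models that sequence. RecordWriterModel is hypothetical, as are its two boolean inputs, which stand in for validateSortOrder() and the writer. The FAIL_IO value is also a hypothetical stand-in for a writer failure.

```cpp
#include <cstdint>

// Hypothetical status values; FAIL_IO stands in for any writer failure.
enum class WriteStatus { SUCCESS, FAIL_ORDER, INVALID_SORT, FAIL_IO };

// Illustrative model of the check order in SamFile::WriteRecord(); not libStatGen code.
struct RecordWriterModel {
    bool isOpenForWrite = false;
    bool hasHeader = false;
    uint32_t recordCount = 0;
    WriteStatus status = WriteStatus::SUCCESS;

    // inSortOrder stands in for validateSortOrder(); writerOk for the result
    // of the SAM/BAM-specific writer. Both are hypothetical inputs.
    bool writeRecord(bool inSortOrder, bool writerOk) {
        if (!isOpenForWrite) {          // 1. file must be open for writing
            status = WriteStatus::FAIL_ORDER;
            return false;
        }
        if (!hasHeader) {               // 2. header must already be written
            status = WriteStatus::FAIL_ORDER;
            return false;
        }
        if (!inSortOrder) {             // 3. record must respect the sort order
            status = WriteStatus::INVALID_SORT;
            return false;
        }
        // 4. write; the status comes from the writer
        status = writerOk ? WriteStatus::SUCCESS : WriteStatus::FAIL_IO;
        if (status != WriteStatus::SUCCESS) {
            return false;
        }
        ++recordCount;                  // 5. count only successful writes
        return true;
    }
};
```

Because the open and header checks run before the sort check, an out-of-order record written to an unopened file reports FAIL_ORDER, not INVALID_SORT.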

Member Data Documentation

◆ myHasHeader

bool SamFile::myHasHeader
protected

Flag to indicate if a header has been read/written - required before being able to read/write a record.

Definition at line 399 of file SamFile.h.

Referenced by ReadHeader(), ReadRecord(), resetFile(), validateSortOrder(), WriteHeader(), and WriteRecord().


The documentation for this class was generated from the following files: