libStatGen Software  1
SamFile Class Reference

Allows the user to easily read/write a SAM/BAM file.

#include <SamFile.h>


Public Types

enum  OpenType { READ , WRITE }
 Enum for indicating whether to open the file for read or write.
 
enum  SortedType { UNSORTED = 0 , FLAG , COORDINATE , QUERY_NAME }
 Enum for indicating the type of sort expected in the file.
 

Public Member Functions

 SamFile ()
 Default Constructor, initializes the variables, but does not open any files.
 
 SamFile (ErrorHandler::HandlingType errorHandlingType)
 Constructor that sets the error handling type.
 
 SamFile (const char *filename, OpenType mode)
 Constructor that opens the specified file based on the specified mode (READ/WRITE); aborts if the file could not be opened.
 
 SamFile (const char *filename, OpenType mode, ErrorHandler::HandlingType errorHandlingType)
 Constructor that opens the specified file based on the specified mode (READ/WRITE) and handles errors per the specified errorHandlingType.
 
 SamFile (const char *filename, OpenType mode, SamFileHeader *header)
 Constructor that opens the specified file based on the specified mode (READ/WRITE) and reads the header; aborts if the file could not be opened or the header could not be read.
 
 SamFile (const char *filename, OpenType mode, ErrorHandler::HandlingType errorHandlingType, SamFileHeader *header)
 Constructor that opens the specified file based on the specified mode (READ/WRITE) and reads the header, handling errors per the specified errorHandlingType.
 
virtual ~SamFile ()
 Destructor.
 
bool OpenForRead (const char *filename, SamFileHeader *header=NULL)
 Open a sam/bam file for reading with the specified filename, determining the type of file and SAM/BAM by reading the file (if not stdin).
 
bool OpenForWrite (const char *filename, SamFileHeader *header=NULL)
 Open a sam/bam file for writing with the specified filename, determining SAM/BAM from the extension (.bam = BAM).
 
bool ReadBamIndex (const char *filename)
 Read the specified bam index file.
 
bool ReadBamIndex ()
 Read the bam index file using the BAM filename as a base.
 
void SetReference (GenomeSequence *reference)
 Sets the reference to the specified genome sequence object.
 
void SetReadSequenceTranslation (SamRecord::SequenceTranslation translation)
 Set the type of sequence translation to use when reading the sequence.
 
void SetWriteSequenceTranslation (SamRecord::SequenceTranslation translation)
 Set the type of sequence translation to use when writing the sequence.
 
void Close ()
 Close the file if there is one open.
 
bool IsOpen ()
 Returns whether or not the file has been opened successfully.
 
bool IsEOF ()
 Returns whether or not the end of the file has been reached.
 
bool IsStream ()
 Returns whether or not the file has been opened for streaming input/output.
 
bool ReadHeader (SamFileHeader &header)
 Reads the header section from the file and stores it in the passed in header.
 
bool WriteHeader (SamFileHeader &header)
 Writes the specified header into the file.
 
bool ReadRecord (SamFileHeader &header, SamRecord &record)
 Reads the next record from the file & stores it in the passed in record.
 
bool WriteRecord (SamFileHeader &header, SamRecord &record)
 Writes the specified record into the file.
 
void setSortedValidation (SortedType sortType)
 Set the flag to validate that the file is sorted as it is read/written.
 
uint32_t GetCurrentRecordCount ()
 Return the number of records that have been read/written so far.
 
SamStatus::Status GetFailure ()
 Deprecated, get the Status of the last call that sets status.
 
SamStatus::Status GetStatus ()
 Get the Status of the last call that sets status.
 
const char * GetStatusMessage ()
 Get the Status Message of the last call that sets status.
 
bool SetReadSection (int32_t refID)
 Sets which reference id (index into the BAM list of reference information) of the BAM file should be read.
 
bool SetReadSection (const char *refName)
 Sets which reference name of the BAM file should be read.
 
bool SetReadSection (int32_t refID, int32_t start, int32_t end, bool overlap=true)
 Sets which reference id (index into the BAM list of reference information) & start/end positions of the BAM file should be read.
 
bool SetReadSection (const char *refName, int32_t start, int32_t end, bool overlap=true)
 Sets which reference name & start/end positions of the BAM file should be read.
 
void SetReadFlags (uint16_t requiredFlags, uint16_t excludedFlags)
 Specify which reads should be returned by ReadRecord.
 
int32_t getNumMappedReadsFromIndex (int32_t refID)
 Get the number of mapped reads in the specified reference id.
 
int32_t getNumUnMappedReadsFromIndex (int32_t refID)
 Get the number of unmapped reads in the specified reference id.
 
int32_t getNumMappedReadsFromIndex (const char *refName, SamFileHeader &header)
 Get the number of mapped reads in the specified reference name.
 
int32_t getNumUnMappedReadsFromIndex (const char *refName, SamFileHeader &header)
 Get the number of unmapped reads in the specified reference name.
 
uint32_t GetNumOverlaps (SamRecord &samRecord)
 Returns the number of bases in the passed in read that overlap the region that is currently set.
 
void GenerateStatistics (bool genStats)
 Whether or not statistics should be generated for this file.
 
const BamIndex * GetBamIndex ()
 Return the bam index if one has been opened.
 
int64_t GetCurrentPosition ()
 Get the current file position.
 
void DisableBuffering ()
 Turn off file read buffering.
 
void PrintStatistics ()
 Print the statistics that have been recorded due to a call to GenerateStatistics.
 
bool attemptRecoverySync (bool(*checkSignature)(void *data), int length)
 
void setAttemptRecovery (bool flag=false)
 

Protected Member Functions

void init ()
 
void init (const char *filename, OpenType mode, SamFileHeader *header)
 
void resetFile ()
 Resets the file prepping for a new file.
 
bool validateSortOrder (SamRecord &record, SamFileHeader &header)
 Validate that the record is sorted compared to the previously read record if there is one, according to the specified sort order.
 
SortedType getSortOrderFromHeader (SamFileHeader &header)
 
bool processNewSection (SamFileHeader &header)
 
bool ensureIndexedReadPosition ()
 
bool checkRecordInSection (SamRecord &record)
 

Protected Attributes

IFILE myFilePtr
 
GenericSamInterface * myInterfacePtr
 
bool myIsOpenForRead
 Flag to indicate if a file is open for reading.
 
bool myIsOpenForWrite
 Flag to indicate if a file is open for writing.
 
bool myHasHeader
 Flag to indicate if a header has been read/written - required before being able to read/write a record.
 
SortedType mySortedType
 
int32_t myPrevCoord
 Previous values used for checking if the file is sorted.
 
int32_t myPrevRefID
 
String myPrevReadName
 
uint32_t myRecordCount
 Keep a count of the number of records that have been read/written so far.
 
SamStatistics * myStatistics
 Pointer to the statistics for this file.
 
SamStatus myStatus
 The status of the last SamFile command.
 
bool myIsBamOpenForRead
 Values for reading Sorted BAM files via the index.
 
bool myNewSection
 
bool myOverlapSection
 
int32_t myRefID
 
int32_t myStartPos
 
int32_t myEndPos
 
uint64_t myCurrentChunkEnd
 
SortedChunkList myChunksToRead
 
BamIndex * myBamIndex
 
GenomeSequence * myRefPtr
 
SamRecord::SequenceTranslation myReadTranslation
 
SamRecord::SequenceTranslation myWriteTranslation
 
std::string myRefName
 

Detailed Description

Allows the user to easily read/write a SAM/BAM file.

The SamFile class contains additional functionality that allows a user to read specific sections of sorted & indexed BAM files. In order to take advantage of this capability, the index file must be read prior to setting the read section. This logic saves the time of having to read the entire file and takes advantage of the seeking capability of BGZF.

Definition at line 35 of file SamFile.h.

Member Enumeration Documentation

◆ OpenType

Enum for indicating whether to open the file for read or write.

Enumerator
READ 

open for reading.

WRITE 

open for writing.

Definition at line 39 of file SamFile.h.

39  {
40  READ, ///< open for reading.
41  WRITE ///< open for writing.
42  };

◆ SortedType

Enum for indicating the type of sort expected in the file.

Enumerator
UNSORTED 

file is not sorted.

FLAG 

SO flag from the header indicates the sort type.

COORDINATE 

file is sorted by coordinate.

QUERY_NAME 

file is sorted by queryname.

Definition at line 46 of file SamFile.h.

46  {
47  UNSORTED = 0, ///< file is not sorted.
48  FLAG, ///< SO flag from the header indicates the sort type.
49  COORDINATE, ///< file is sorted by coordinate.
50  QUERY_NAME ///< file is sorted by queryname.
51  };
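
The COORDINATE setting implies comparing each record with the one before it. The sketch below shows one way such a check could look. It is not the library's validateSortOrder(): the class CoordinateOrderChecker, its accept() method, and the treatment of reference id -1 as "unmapped, sorts last" are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical checker (not part of libStatGen): remembers the previous
// record's reference id and position and reports whether the next record
// keeps coordinate order.
class CoordinateOrderChecker
{
public:
    bool accept(std::int32_t refID, std::int32_t pos)
    {
        // Assumption: a negative reference id marks an unmapped record,
        // which coordinate order places after every mapped record.
        const std::int32_t key =
            (refID < 0) ? std::numeric_limits<std::int32_t>::max() : refID;
        const bool ok = !myHasPrev || key > myPrevKey ||
                        (key == myPrevKey && pos >= myPrevPos);
        // The current record becomes the reference point for the next call,
        // whether or not it was in order.
        myHasPrev = true;
        myPrevKey = key;
        myPrevPos = pos;
        return ok;
    }

private:
    bool myHasPrev = false;
    std::int32_t myPrevKey = 0;
    std::int32_t myPrevPos = 0;
};
```

Calling accept() once per record in file order returns false at the first record that breaks the ordering.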

Constructor & Destructor Documentation

◆ SamFile() [1/5]

SamFile::SamFile ( ErrorHandler::HandlingType  errorHandlingType)

Constructor that sets the error handling type.

Parameters
errorHandlingType  how to handle errors.

Definition at line 35 of file SamFile.cpp.

36  : myStatus(errorHandlingType)
37 {
38  init();
39  resetFile();
40 }

References resetFile().

◆ SamFile() [2/5]

SamFile::SamFile ( const char *  filename,
OpenType  mode 
)

Constructor that opens the specified file based on the specified mode (READ/WRITE), aborts if the file could not be opened.

Parameters
filename  name of the file to open.
mode      mode to use for opening the file.

Definition at line 45 of file SamFile.cpp.

46  : myStatus()
47 {
48  init(filename, mode, NULL);
49 }

◆ SamFile() [3/5]

SamFile::SamFile ( const char *  filename,
OpenType  mode,
ErrorHandler::HandlingType  errorHandlingType 
)

Constructor that opens the specified file based on the specified mode (READ/WRITE) and handles errors per the specified handleType.

Parameters
filename           name of the file to open.
mode               mode to use for opening the file.
errorHandlingType  how to handle errors.

Definition at line 54 of file SamFile.cpp.

56  : myStatus(errorHandlingType)
57 {
58  init(filename, mode, NULL);
59 }

◆ SamFile() [4/5]

SamFile::SamFile ( const char *  filename,
OpenType  mode,
SamFileHeader *  header 
)

Constructor that opens the specified file based on the specified mode (READ/WRITE) and reads the header, aborts if the file could not be opened or the header not read.

Parameters
filename  name of the file to open.
mode      mode to use for opening the file.
header    to read into or write from

Definition at line 64 of file SamFile.cpp.

65  : myStatus()
66 {
67  init(filename, mode, header);
68 }

◆ SamFile() [5/5]

SamFile::SamFile ( const char *  filename,
OpenType  mode,
ErrorHandler::HandlingType  errorHandlingType,
SamFileHeader *  header 
)

Constructor that opens the specified file based on the specified mode (READ/WRITE) and reads the header, handling errors per the specified handleType.

Parameters
filename           name of the file to open.
mode               mode to use for opening the file.
errorHandlingType  how to handle errors.
header             to read into or write from

Definition at line 73 of file SamFile.cpp.

76  : myStatus(errorHandlingType)
77 {
78  init(filename, mode, header);
79 }

Member Function Documentation

◆ GenerateStatistics()

void SamFile::GenerateStatistics ( bool  genStats)

Whether or not statistics should be generated for this file.

The value is carried over between files and is not reset, but the statistics themselves are reset between files.

Parameters
genStats  set to true if statistics should be generated, false if not.

Definition at line 891 of file SamFile.cpp.

892 {
893  if(genStats)
894  {
895  if(myStatistics == NULL)
896  {
897  // Want to generate statistics, but do not yet have the
898  // structure for them, so create one.
899  myStatistics = new SamStatistics();
900  }
901  }
902  else
903  {
904  // Do not generate statistics, so if myStatistics is not NULL,
905  // delete it.
906  if(myStatistics != NULL)
907  {
908  delete myStatistics;
909  myStatistics = NULL;
910  }
911  }
912 
913 }

References myStatistics.
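
The listing above creates the statistics object on the first call with true and deletes it on a call with false. As a hedged illustration of the same create-on-enable, destroy-on-disable pattern, the sketch below uses std::unique_ptr in place of manual new/delete. StatsStub and StatsToggle are hypothetical stand-ins, not libStatGen classes.

```cpp
#include <cassert>
#include <memory>

// Stand-in for the library's SamStatistics class; the real class is not
// reproduced here.
struct StatsStub
{
    unsigned long recordsSeen = 0;
};

// Hypothetical holder that mirrors the logic of GenerateStatistics().
class StatsToggle
{
public:
    void generateStatistics(bool genStats)
    {
        if (genStats)
        {
            // Allocate only if no structure exists yet, so repeated calls
            // with true keep the counts gathered so far.
            if (!myStatistics)
            {
                myStatistics.reset(new StatsStub());
            }
        }
        else
        {
            // reset() deletes the object (if any) and leaves a null pointer.
            myStatistics.reset();
        }
    }

    bool isGenerating() const { return myStatistics != nullptr; }
    StatsStub* stats() { return myStatistics.get(); }

private:
    std::unique_ptr<StatsStub> myStatistics;
};
```

With a smart pointer the owning class needs no explicit delete in its destructor, which is the main difference from the raw-pointer version shown in the listing.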

◆ GetBamIndex()

const BamIndex * SamFile::GetBamIndex ( )

Return the bam index if one has been opened.

Returns
const pointer to the bam index, or null if one has not been opened.

Definition at line 916 of file SamFile.cpp.

917 {
918  return(myBamIndex);
919 }

◆ GetCurrentPosition()

int64_t SamFile::GetCurrentPosition ( )
inline

Get the current file position.

Returns
current position in the file.

Definition at line 341 of file SamFile.h.

342  {
343  return(iftell(myFilePtr));
344  }

References iftell().

◆ GetFailure()

SamStatus::Status SamFile::GetFailure ( )
inline

Deprecated, get the Status of the last call that sets status.

Kept for backward compatibility; will be removed later.

Definition at line 206 of file SamFile.h.

207  {
208  return(GetStatus());
209  }

References GetStatus().

◆ getNumMappedReadsFromIndex() [1/2]

int32_t SamFile::getNumMappedReadsFromIndex ( const char *  refName,
SamFileHeader &  header 
)

Get the number of mapped reads in the specified reference name.

Returns -1 for unknown reference names.

Parameters
refName  reference name for which to extract the number of mapped reads.
header   header object containing the map from refName to refID
Returns
number of mapped reads for the specified reference name.

Definition at line 833 of file SamFile.cpp.

835 {
836  // The bam index must have already been read.
837  if(myBamIndex == NULL)
838  {
839  myStatus.setStatus(SamStatus::FAIL_ORDER,
840  "Cannot get num mapped reads from the index until it has been read.");
841  return(false);
842  }
843  int32_t refID = BamIndex::REF_ID_UNMAPPED;
844  if((strcmp(refName, "") != 0) && (strcmp(refName, "*") != 0))
845  {
846  // Reference name specified, so read just the "-1" entries.
847  refID = header.getReferenceID(refName);
848  }
849  return(myBamIndex->getNumMappedReads(refID));
850 }

References StatGenStatus::FAIL_ORDER, BamIndex::getNumMappedReads(), SamFileHeader::getReferenceID(), myStatus, BamIndex::REF_ID_UNMAPPED, and StatGenStatus::setStatus().
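
The name-based overload first turns the reference name into a reference id: an empty name or "*" selects the unmapped entries, and any other name is looked up in the header. The sketch below isolates that step. It is an illustration only: resolveRefId, the std::map standing in for the header, the value -1 for the unmapped id, and the -2 sentinel for unknown names are all assumptions rather than library definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>

// Assumed value for the unmapped reference id.
const std::int32_t kRefIdUnmapped = -1;
// Hypothetical sentinel for a name that is not in the table.
const std::int32_t kRefIdUnknown = -2;

// Hypothetical helper: resolves a reference name to an id using a plain
// name-to-id table in place of SamFileHeader.
std::int32_t resolveRefId(const char* refName,
                          const std::map<std::string, std::int32_t>& nameToId)
{
    assert(refName != nullptr);
    // "" and "*" both select the unmapped entries.
    if (std::strcmp(refName, "") == 0 || std::strcmp(refName, "*") == 0)
    {
        return kRefIdUnmapped;
    }
    const auto it = nameToId.find(refName);
    return (it == nameToId.end()) ? kRefIdUnknown : it->second;
}
```

The resulting id is what would then be passed to the id-based lookup in the index.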

◆ getNumMappedReadsFromIndex() [2/2]

int32_t SamFile::getNumMappedReadsFromIndex ( int32_t  refID)

Get the number of mapped reads in the specified reference id.


Returns -1 for out of range refIDs.

Parameters
refID  reference ID for which to extract the number of mapped reads.
Returns
number of mapped reads for the specified reference id.

Definition at line 803 of file SamFile.cpp.

804 {
805  // The bam index must have already been read.
806  if(myBamIndex == NULL)
807  {
808  myStatus.setStatus(SamStatus::FAIL_ORDER,
809  "Cannot get num mapped reads from the index until it has been read.");
810  return(false);
811  }
812  return(myBamIndex->getNumMappedReads(refID));
813 }

References StatGenStatus::FAIL_ORDER, BamIndex::getNumMappedReads(), myStatus, and StatGenStatus::setStatus().

◆ GetNumOverlaps()

uint32_t SamFile::GetNumOverlaps ( SamRecord &  samRecord)

Returns the number of bases in the passed in read that overlap the region that is currently set.

Overlapping means that the bases occur in both the read and the reference as either matches or mismatches. This does not count insertions, deletions, clips, pads, or skips.

Parameters
samRecord  to check for overlapping bases.
Returns
number of bases that overlap region that is currently set.

Definition at line 877 of file SamFile.cpp.

878 {
879  if(myRefPtr != NULL)
880  {
881  samRecord.setReference(myRefPtr);
882  }
883  samRecord.setSequenceTranslation(myReadTranslation);
884 
885  // Get the overlaps in the sam record for the region currently set
886  // for this file.
887  return(samRecord.getNumOverlaps(myStartPos, myEndPos));
888 }

References SamRecord::getNumOverlaps(), SamRecord::setReference(), and SamRecord::setSequenceTranslation().
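
To make the overlap rule concrete, the sketch below counts only bases from match/mismatch CIGAR operations (M, =, X) that fall inside a region and skips insertions, deletions, clips, pads, and skips. It is not SamRecord::getNumOverlaps(): the CigarOp struct, the function countOverlappingBases, and the choice of 0-based positions with a half-open region [regionStart, regionEnd) are assumptions for illustration, and the library's own conventions may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One CIGAR operation: an op code from "MIDNSHP=X" and its length.
struct CigarOp
{
    char op;
    std::int32_t length;
};

// Hypothetical helper: counts read bases aligned as match/mismatch whose
// reference position lies in [regionStart, regionEnd).
std::uint32_t countOverlappingBases(std::int32_t readStart,
                                    const std::vector<CigarOp>& cigar,
                                    std::int32_t regionStart,
                                    std::int32_t regionEnd)
{
    std::uint32_t overlaps = 0;
    std::int32_t refPos = readStart;
    for (const CigarOp& c : cigar)
    {
        switch (c.op)
        {
            case 'M': case '=': case 'X':
            {
                // Present in both read and reference: count the part of
                // [refPos, refPos + length) that lies inside the region.
                const std::int32_t segEnd = refPos + c.length;
                const std::int32_t lo = refPos > regionStart ? refPos : regionStart;
                const std::int32_t hi = segEnd < regionEnd ? segEnd : regionEnd;
                if (hi > lo)
                {
                    overlaps += static_cast<std::uint32_t>(hi - lo);
                }
                refPos = segEnd;
                break;
            }
            case 'D': case 'N':
                // Deletions and skips consume reference only: not counted.
                refPos += c.length;
                break;
            default:
                // I, S, H, and P do not consume reference positions.
                break;
        }
    }
    return overlaps;
}
```

For a read starting at position 100 with CIGAR 5S10M2I3D10M, the two M blocks cover reference positions 100 to 109 and 113 to 122, so only those positions can contribute to the count.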

◆ getNumUnMappedReadsFromIndex() [1/2]

int32_t SamFile::getNumUnMappedReadsFromIndex ( const char *  refName,
SamFileHeader &  header 
)

Get the number of unmapped reads in the specified reference name.

Returns -1 for unknown reference names.

Parameters
refName  reference name for which to extract the number of unmapped reads.
header   header object containing the map from refName to refID
Returns
number of unmapped reads for the specified reference name.

Definition at line 855 of file SamFile.cpp.

857 {
858  // The bam index must have already been read.
859  if(myBamIndex == NULL)
860  {
861  myStatus.setStatus(SamStatus::FAIL_ORDER,
862  "Cannot get num unmapped reads from the index until it has been read.");
863  return(false);
864  }
865  int32_t refID = BamIndex::REF_ID_UNMAPPED;
866  if((strcmp(refName, "") != 0) && (strcmp(refName, "*") != 0))
867  {
868  // Reference name specified, so read just the "-1" entries.
869  refID = header.getReferenceID(refName);
870  }
871  return(myBamIndex->getNumUnMappedReads(refID));
872 }

References StatGenStatus::FAIL_ORDER, BamIndex::getNumUnMappedReads(), SamFileHeader::getReferenceID(), myStatus, BamIndex::REF_ID_UNMAPPED, and StatGenStatus::setStatus().

◆ getNumUnMappedReadsFromIndex() [2/2]

int32_t SamFile::getNumUnMappedReadsFromIndex ( int32_t  refID)

Get the number of unmapped reads in the specified reference id.


Returns -1 for out of range refIDs.

Parameters
refID  reference ID for which to extract the number of unmapped reads.
Returns
number of unmapped reads for the specified reference id.

Definition at line 818 of file SamFile.cpp.

819 {
820  // The bam index must have already been read.
821  if(myBamIndex == NULL)
822  {
823  myStatus.setStatus(SamStatus::FAIL_ORDER,
824  "Cannot get num unmapped reads from the index until it has been read.");
825  return(false);
826  }
827  return(myBamIndex->getNumUnMappedReads(refID));
828 }

References StatGenStatus::FAIL_ORDER, BamIndex::getNumUnMappedReads(), myStatus, and StatGenStatus::setStatus().

◆ IsEOF()

bool SamFile::IsEOF ( )

Returns whether or not the end of the file has been reached.

Returns
true = EOF; false = not eof. If the file is not open, true is returned.

Definition at line 424 of file SamFile.cpp.

425 {
426  if(myIsOpenForRead == false)
427  {
428  // Not open for read, return true.
429  return(true);
430  }
431  return(myInterfacePtr->isEOF(myFilePtr));
432 }

References myIsOpenForRead.

◆ IsOpen()

bool SamFile::IsOpen ( )

Returns whether or not the file has been opened successfully.

Returns
true = open; false = not open.

Definition at line 410 of file SamFile.cpp.

411 {
412  if (myFilePtr != NULL)
413  {
414  // File Pointer is set, so return if it is open.
415  return(myFilePtr->isOpen());
416  }
417  // File pointer is not set, so return false, not open.
418  return false;
419 }

References InputFile::isOpen().

◆ IsStream()

bool SamFile::IsStream ( )

Returns whether or not the file has been opened for streaming input/output.

Returns
true = stream; false = not a stream.

Definition at line 437 of file SamFile.cpp.

438 {
439  if (myFilePtr != NULL)
440  {
441  // File Pointer is set, so return if it is a stream.
442  return((myFilePtr->getFileName())[0] == '-');
443  }
444  // File pointer is not set, so return false, not a stream.
445  return false;
446 }

References InputFile::getFileName().

◆ OpenForRead()

bool SamFile::OpenForRead ( const char *  filename,
SamFileHeader *  header = NULL 
)

Open a sam/bam file for reading with the specified filename, determining the type of file and SAM/BAM by reading the file (if not stdin).

Parameters
filename  the sam/bam file to open for reading.
header    header to read into (optional)
Returns
true = success; false = failure.

Definition at line 93 of file SamFile.cpp.

94 {
95  // Reset for any previously operated on files.
96  resetFile();
97 
98  int lastchar = 0;
99 
100  while (filename[lastchar] != 0) lastchar++;
101 
102  // If at least one character, check for '-'.
103  if((lastchar >= 1) && (filename[0] == '-'))
104  {
105  // Read from stdin - determine type of file to read.
106  // Determine if compressed bam.
107  if(strcmp(filename, "-.bam") == 0)
108  {
109  // Compressed bam - open as bgzf.
110  // -.bam is the filename, read compressed bam from stdin
111  filename = "-";
112 
113  myFilePtr = new InputFile;
114  // support recover mode - this switches in a reader
115  // capable of recovering from bad BGZF compression blocks.
116  myFilePtr->setAttemptRecovery(myAttemptRecovery);
117  myFilePtr->openFile(filename, "rb", InputFile::BGZF);
118 
119  myInterfacePtr = new BamInterface;
120 
121  // Read the magic string.
122  char magic[4];
123  ifread(myFilePtr, magic, 4);
124  }
125  else if(strcmp(filename, "-.ubam") == 0)
126  {
127  // uncompressed BAM File.
128  // -.ubam is the filename, read uncompressed bam from stdin.
129  // uncompressed BAM is still compressed with BGZF, but using
130  // compression level 0, so still open as BGZF since it has a
131  // BGZF header.
132  filename = "-";
133 
134  // Uncompressed, so do not require the eof block.
135 #ifdef __ZLIB_AVAILABLE__
136  BgzfFileType::setRequireEofBlock(false);
137 #endif
138  myFilePtr = ifopen(filename, "rb", InputFile::BGZF);
139 
140  myInterfacePtr = new BamInterface;
141 
142  // Read the magic string.
143  char magic[4];
144  ifread(myFilePtr, magic, 4);
145  }
146  else if((strcmp(filename, "-") == 0) || (strcmp(filename, "-.sam") == 0))
147  {
148  // SAM File.
149  // read sam from stdin
150  filename = "-";
151  myFilePtr = ifopen(filename, "rb", InputFile::UNCOMPRESSED);
152  myInterfacePtr = new SamInterface;
153  }
154  else
155  {
156  std::string errorMessage = "Invalid SAM/BAM filename, ";
157  errorMessage += filename;
158  errorMessage += ". From stdin, can only be '-', '-.sam', '-.bam', or '-.ubam'";
159  myStatus.setStatus(SamStatus::FAIL_IO, errorMessage.c_str());
160  delete myFilePtr;
161  myFilePtr = NULL;
162  return(false);
163  }
164  }
165  else
166  {
167  // Not from stdin. Read the file to determine the type.
168 
169  myFilePtr = new InputFile;
170 
171  // support recovery mode - this conditionally enables a reader
172  // capable of recovering from bad BGZF compression blocks.
173  myFilePtr->setAttemptRecovery(myAttemptRecovery);
174  bool rc = myFilePtr->openFile(filename, "rb", InputFile::DEFAULT);
175 
176  if (rc == false)
177  {
178  std::string errorMessage = "Failed to Open ";
179  errorMessage += filename;
180  errorMessage += " for reading";
181  myStatus.setStatus(SamStatus::FAIL_IO, errorMessage.c_str());
182  delete myFilePtr;
183  myFilePtr = NULL;
184  return(false);
185  }
186 
187  char magic[4];
188  ifread(myFilePtr, magic, 4);
189 
190  if (magic[0] == 'B' && magic[1] == 'A' && magic[2] == 'M' &&
191  magic[3] == 1)
192  {
193  myInterfacePtr = new BamInterface;
194  // Set that it is a bam file open for reading. This is needed to
195  // determine if an index file can be used.
196  myIsBamOpenForRead = true;
197  }
198  else
199  {
200  // Not a bam, so rewind to the beginning of the file so it
201  // can be read.
202  ifrewind(myFilePtr);
203  myInterfacePtr = new SamInterface;
204  }
205  }
206 
207  // File is open for reading.
208  myIsOpenForRead = true;
209 
210  // Read the header if one was passed in.
211  if(header != NULL)
212  {
213  return(ReadHeader(*header));
214  }
215 
216  // Successfully opened the file.
217  myStatus = SamStatus::SUCCESS;
218  return(true);
219 }

References InputFile::BGZF, InputFile::DEFAULT, StatGenStatus::FAIL_IO, ifopen(), ifread(), ifrewind(), myIsBamOpenForRead, myIsOpenForRead, myStatus, ReadHeader(), resetFile(), InputFile::setAttemptRecovery(), StatGenStatus::setStatus(), StatGenStatus::SUCCESS, and InputFile::UNCOMPRESSED.

Referenced by Pileup< PILEUP_TYPE, FUNC_CLASS >::processFile().
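
OpenForRead makes two separate decisions: for names that begin with '-', the format is taken from the name itself, and for ordinary files it is taken from the first four bytes. The sketch below isolates both checks as small functions. StdinFormat, classifyStdinName, and looksLikeBam are hypothetical names used for illustration and are not part of libStatGen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical names for illustration; libStatGen does not define these.
enum class StdinFormat { Sam, Bam, UncompressedBam, Invalid };

// Maps a filename that starts with '-' to the format read from stdin,
// following the string comparisons in the listing above.
StdinFormat classifyStdinName(const char* filename)
{
    assert(filename != nullptr);
    if (std::strcmp(filename, "-.bam") == 0)
    {
        return StdinFormat::Bam;
    }
    if (std::strcmp(filename, "-.ubam") == 0)
    {
        return StdinFormat::UncompressedBam;
    }
    if (std::strcmp(filename, "-") == 0 || std::strcmp(filename, "-.sam") == 0)
    {
        return StdinFormat::Sam;
    }
    // Any other name is rejected (the listing reports FAIL_IO).
    return StdinFormat::Invalid;
}

// Returns true when the first four bytes are 'B', 'A', 'M', 1, the same
// comparison the listing applies to the bytes it reads from a regular file.
bool looksLikeBam(const char* magic, std::size_t length)
{
    assert(magic != nullptr);
    // Fewer than four bytes cannot hold the magic string.
    if (length < 4)
    {
        return false;
    }
    return magic[0] == 'B' && magic[1] == 'A' &&
           magic[2] == 'M' && magic[3] == 1;
}
```

A file whose first bytes fail the magic check is treated as SAM, which is why the listing rewinds before handing the file to the SAM reader.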

◆ OpenForWrite()

bool SamFile::OpenForWrite ( const char *  filename,
SamFileHeader *  header = NULL 
)

Open a sam/bam file for writing with the specified filename, determining SAM/BAM from the extension (.bam = BAM).

Parameters
filename  the sam/bam file to open for writing.
header    header to write to the file (optional)
Returns
true = success; false = failure.

Definition at line 223 of file SamFile.cpp.

224 {
225  // Reset for any previously operated on files.
226  resetFile();
227 
228  int lastchar = 0;
229  while (filename[lastchar] != 0) lastchar++;
230  if (lastchar >= 4 &&
231  filename[lastchar - 4] == 'u' &&
232  filename[lastchar - 3] == 'b' &&
233  filename[lastchar - 2] == 'a' &&
234  filename[lastchar - 1] == 'm')
235  {
236  // BAM File.
237  // if -.ubam is the filename, write uncompressed bam to stdout
238  if((lastchar == 6) && (filename[0] == '-') && (filename[1] == '.'))
239  {
240  filename = "-";
241  }
242 
243  myFilePtr = ifopen(filename, "wb0", InputFile::BGZF);
244 
245  myInterfacePtr = new BamInterface;
246  }
247  else if (lastchar >= 3 &&
248  filename[lastchar - 3] == 'b' &&
249  filename[lastchar - 2] == 'a' &&
250  filename[lastchar - 1] == 'm')
251  {
252  // BAM File.
253  // if -.bam is the filename, write compressed bam to stdout
254  if((lastchar == 5) && (filename[0] == '-') && (filename[1] == '.'))
255  {
256  filename = "-";
257  }
258  myFilePtr = ifopen(filename, "wb", InputFile::BGZF);
259 
260  myInterfacePtr = new BamInterface;
261  }
262  else
263  {
264  // SAM File
265  // if - (followed by anything) is the filename,
266  // write uncompressed sam to stdout
267  if((lastchar >= 1) && (filename[0] == '-'))
268  {
269  filename = "-";
270  }
271  myFilePtr = ifopen(filename, "wb", InputFile::UNCOMPRESSED);
272 
273  myInterfacePtr = new SamInterface;
274  }
275 
276  if (myFilePtr == NULL)
277  {
278  std::string errorMessage = "Failed to Open ";
279  errorMessage += filename;
280  errorMessage += " for writing";
281  myStatus.setStatus(SamStatus::FAIL_IO, errorMessage.c_str());
282  return(false);
283  }
284 
285  myIsOpenForWrite = true;
286 
287  // Write the header if one was passed in.
288  if(header != NULL)
289  {
290  return(WriteHeader(*header));
291  }
292 
293  // Successfully opened the file.
294  myStatus = SamStatus::SUCCESS;
295  return(true);
296 }

References InputFile::BGZF, StatGenStatus::FAIL_IO, ifopen(), myIsOpenForWrite, myStatus, resetFile(), StatGenStatus::setStatus(), StatGenStatus::SUCCESS, InputFile::UNCOMPRESSED, and WriteHeader().
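
OpenForWrite chooses the output format purely from the trailing characters of the filename and redirects a few special names to stdout. The sketch below restates that decision as a function that can be tested in isolation. WriteFormat, WriteTarget, and classifyWriteTarget are hypothetical names, not part of libStatGen.

```cpp
#include <cassert>
#include <string>

// Hypothetical names for illustration; libStatGen does not define these.
enum class WriteFormat { UncompressedBam, Bam, Sam };

struct WriteTarget
{
    WriteFormat format;
    bool toStdout;
};

// Returns true if 'name' ends with 'suffix'.
static bool endsWith(const std::string& name, const std::string& suffix)
{
    return name.size() >= suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

WriteTarget classifyWriteTarget(const std::string& filename)
{
    // "ubam" must be tested before "bam", because every name that ends in
    // "ubam" also ends in "bam".
    if (endsWith(filename, "ubam"))
    {
        return {WriteFormat::UncompressedBam, filename == "-.ubam"};
    }
    if (endsWith(filename, "bam"))
    {
        return {WriteFormat::Bam, filename == "-.bam"};
    }
    // Anything else is written as SAM; a leading '-' selects stdout.
    return {WriteFormat::Sam, !filename.empty() && filename[0] == '-'};
}
```

As in the listing, the suffix test does not require a dot, so a name such as "foobam" would also be classified as BAM.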

◆ ReadBamIndex() [1/2]

bool SamFile::ReadBamIndex ( )

Read the bam index file using the BAM filename as a base.

The index must be read before setting a read section, because it is what allows seeking to and reading portions of a BAM file. It must be read after opening the BAM file, since the BAM filename is used as the base name for the index file. The method first tries filename.bam.bai. If that fails, it tries the name without the .bam extension, filename.bai.

Returns
true = success; false = failure.

Definition at line 328 of file SamFile.cpp.

329 {
330  if(myFilePtr == NULL)
331  {
332  // Can't read the bam index file because the BAM file has not yet been
333  // opened, so we don't know the base filename for the index file.
334  std::string errorMessage = "Failed to read the bam Index file -"
335  " the BAM file needs to be read first in order to determine"
336  " the index filename.";
337  myStatus.setStatus(SamStatus::FAIL_ORDER, errorMessage.c_str());
338  return(false);
339  }
340 
341  const char* bamBaseName = myFilePtr->getFileName();
342 
343  std::string indexName = bamBaseName;
344  indexName += ".bai";
345 
346  bool foundFile = true;
347  try
348  {
349  if(ReadBamIndex(indexName.c_str()) == false)
350  {
351  foundFile = false;
352  }
353  }
354  catch (std::exception&)
355  {
356  foundFile = false;
357  }
358 
359  // Check to see if the index file was found.
360  if(!foundFile)
361  {
362  // Not found - try without the bam extension.
363  // Locate the start of the bam extension
364  size_t startExt = indexName.find(".bam");
365  if(startExt == std::string::npos)
366  {
367  // Could not find the .bam extension, so just return false since the
368  // call to ReadBamIndex set the status.
369  return(false);
370  }
371  // Remove ".bam" and try reading the index again.
372  indexName.erase(startExt, 4);
373  return(ReadBamIndex(indexName.c_str()));
374  }
375  return(true);
376 }

References StatGenStatus::FAIL_ORDER, InputFile::getFileName(), myStatus, and StatGenStatus::setStatus().
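The filename fallback in the listing above can be sketched in isolation. This is a minimal illustration, not library code: candidateIndexNames is a hypothetical helper that only reproduces the string handling (append ".bai", then erase the first ".bam") and performs no file I/O.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of libStatGen): list the index file names
// that the fallback would try, in order, for a given BAM file name.
std::vector<std::string> candidateIndexNames(const std::string& bamName)
{
    std::vector<std::string> names;

    // First candidate: append ".bai" to the full BAM file name.
    std::string indexName = bamName + ".bai";
    names.push_back(indexName);

    // Second candidate: erase the first ".bam" found in the name, if any.
    std::string::size_type startExt = indexName.find(".bam");
    if(startExt != std::string::npos)
    {
        indexName.erase(startExt, 4);
        names.push_back(indexName);
    }
    return names;
}
```

For "sample.bam" this yields "sample.bam.bai" followed by "sample.bai". Because find() returns the first occurrence, a name that contains ".bam" more than once would lose the first one rather than the last.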

◆ ReadBamIndex() [2/2]

bool SamFile::ReadBamIndex ( const char *  filename)

Read the specified bam index file.

The index must be read before setting a read section, since it is needed for seeking and reading portions of a BAM file.

Parameters
filename  the name of the bam index file to be read.
Returns
true = success; false = failure.

Definition at line 300 of file SamFile.cpp.

301 {
302  // Cleanup a previously setup index.
303  if(myBamIndex != NULL)
304  {
305  delete myBamIndex;
306  myBamIndex = NULL;
307  }
308 
309  // Create a new bam index.
310  myBamIndex = new BamIndex();
311  SamStatus::Status indexStat = myBamIndex->readIndex(bamIndexFilename);
312 
313  if(indexStat != SamStatus::SUCCESS)
314  {
315  std::string errorMessage = "Failed to read the bam Index file: ";
316  errorMessage += bamIndexFilename;
317  myStatus.setStatus(indexStat, errorMessage.c_str());
318  delete myBamIndex;
319  myBamIndex = NULL;
320  return(false);
321  }
323  return(true);
324 }

References myStatus, BamIndex::readIndex(), StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

◆ ReadHeader()

bool SamFile::ReadHeader ( SamFileHeader header)

Reads the header section from the file and stores it in the passed-in header.

Returns
true = success; false = failure.

Definition at line 450 of file SamFile.cpp.

451 {
453  if(myIsOpenForRead == false)
454  {
455  // File is not open for read
456  myStatus.setStatus(SamStatus::FAIL_ORDER,
457  "Cannot read header since the file is not open for reading");
458  return(false);
459  }
460 
461  if(myHasHeader == true)
462  {
463  // The header has already been read.
464  myStatus.setStatus(SamStatus::FAIL_ORDER,
465  "Cannot read header since it has already been read.");
466  return(false);
467  }
468 
469  if(myInterfacePtr->readHeader(myFilePtr, header, myStatus))
470  {
471  // The header has now been successfully read.
472  myHasHeader = true;
473  return(true);
474  }
475  return(false);
476 }

References StatGenStatus::FAIL_ORDER, myHasHeader, myIsOpenForRead, myStatus, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

Referenced by OpenForRead(), and Pileup< PILEUP_TYPE, FUNC_CLASS >::processFile().

◆ ReadRecord()

bool SamFile::ReadRecord ( SamFileHeader header,
SamRecord record 
)

Reads the next record from the file and stores it in the passed-in record.

If it is an indexed BAM file and SetReadSection was called, only alignments in the section specified by SetReadSection are read. If they all have already been read, this method returns false.

Validates that the record is sorted according to the value set by setSortedValidation. No sorting validation is done if specified to be unsorted, or setSortedValidation was never called.

Returns
true = record was successfully set (and sorted if applicable), false = record was not successfully set (or not sorted as expected).

Definition at line 514 of file SamFile.cpp.

516 {
518 
519  if(myIsOpenForRead == false)
520  {
521  // File is not open for read
522  myStatus.setStatus(SamStatus::FAIL_ORDER,
523  "Cannot read record since the file is not open for reading");
524  throw(std::runtime_error("SOFTWARE BUG: trying to read a SAM/BAM record prior to opening the file."));
525  return(false);
526  }
527 
528  if(myHasHeader == false)
529  {
530  // The header has not yet been read.
531  // TODO - maybe just read the header.
532  myStatus.setStatus(SamStatus::FAIL_ORDER,
533  "Cannot read record since the header has not been read.");
534  throw(std::runtime_error("SOFTWARE BUG: trying to read a SAM/BAM record prior to reading the header."));
535  return(false);
536  }
537 
538  // Check to see if a new region has been set. If so, determine the
539  // chunks for that region.
540  if(myNewSection)
541  {
542  if(!processNewSection(header))
543  {
544  // Failed processing a new section. Could be an
545  // order issue like the file not being open or the
546  // indexed file not having been read.
547  // processNewSection sets myStatus with the failure reason.
548  return(false);
549  }
550  }
551 
552  // Read until a record is not successfully read or there are no more
553  // requested records.
554  while(myStatus == SamStatus::SUCCESS)
555  {
556  record.setReference(myRefPtr);
557  record.setSequenceTranslation(myReadTranslation);
558 
559  // If reading by index, this method will setup to ensure it is in
560  // the correct position for the next record (if not already there).
561  // Sets myStatus if it could not move to a good section.
562  // Just returns true if it is not setup to read by index.
563  if(!ensureIndexedReadPosition())
564  {
565  // Either there are no more records in the section
566  // or it failed to move to the right section, so there
567  // is nothing more to read, stop looping.
568  break;
569  }
570 
571  // File is open for reading and the header has been read, so read the
572  // next record.
573  myInterfacePtr->readRecord(myFilePtr, header, record, myStatus);
574  if(myStatus != SamStatus::SUCCESS)
575  {
576  // Failed to read the record, so break out of the loop.
577  break;
578  }
579 
580  // Successfully read a record, so check if we should filter it.
581  // First check if it is out of the section. Returns true
582  // if not reading by sections, returns false if the record
583  // is outside of the section. Sets status to NO_MORE_RECS if
584  // there is nothing left to read in the section.
585  if(!checkRecordInSection(record))
586  {
587  // The record is not in the section.
588  // The while loop will detect if NO_MORE_RECS was set.
589  continue;
590  }
591 
592  // Check the flag for required/excluded flags.
593  uint16_t flag = record.getFlag();
594  if((flag & myRequiredFlags) != myRequiredFlags)
595  {
596  // The record does not contain all required flags, so
597  // continue looking.
598  continue;
599  }
600  if((flag & myExcludedFlags) != 0)
601  {
602  // The record contains an excluded flag, so continue looking.
603  continue;
604  }
605 
606  //increment the record count.
607  myRecordCount++;
608 
609  if(myStatistics != NULL)
610  {
611  // Statistics should be updated.
612  myStatistics->updateStatistics(record);
613  }
614 
615  // Successfully read the record, so check the sort order.
616  if(!validateSortOrder(record, header))
617  {
618  // ValidateSortOrder sets the status on a failure.
619  return(false);
620  }
621  return(true);
622 
623  } // End while loop that checks if a desired record is found or failure.
624 
625  // Return true if a record was found.
626  return(myStatus == SamStatus::SUCCESS);
627 }

References StatGenStatus::FAIL_ORDER, SamRecord::getFlag(), myHasHeader, myIsOpenForRead, myRecordCount, myStatistics, myStatus, SamRecord::setReference(), SamRecord::setSequenceTranslation(), StatGenStatus::setStatus(), StatGenStatus::SUCCESS, and validateSortOrder().

Referenced by Pileup< PILEUP_TYPE, FUNC_CLASS >::processFile().
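The call-order rules enforced by ReadHeader and ReadRecord can be sketched with a small stand-alone mock. ReadOrderGuard is a hypothetical class, not part of libStatGen. It tracks only the two flags consulted in the listings, and assumes that opening a file clears the header flag.

```cpp
#include <stdexcept>

// Hypothetical mock (not the libStatGen class): models only the ordering
// checks, with no file access.
class ReadOrderGuard
{
public:
    // Assumption: opening a file means no header has been read yet.
    void open() { myIsOpenForRead = true; myHasHeader = false; }

    // Like ReadHeader: reports failure by return value when called out of
    // order (file not open, or header already read).
    bool readHeader()
    {
        if(!myIsOpenForRead || myHasHeader)
        {
            return false;
        }
        myHasHeader = true;
        return true;
    }

    // Like the precondition checks in ReadRecord: throws on misuse.
    bool readRecord()
    {
        if(!myIsOpenForRead)
        {
            throw std::runtime_error("record read before the file was opened");
        }
        if(!myHasHeader)
        {
            throw std::runtime_error("record read before the header was read");
        }
        return true; // a real reader would fetch the next record here
    }

private:
    bool myIsOpenForRead = false;
    bool myHasHeader = false;
};
```

The asymmetry is the point of the sketch: per the listings, an out-of-order ReadHeader call only returns false, while an out-of-order ReadRecord call throws std::runtime_error, so callers that skip the header need a try/catch rather than a return-value check.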

◆ SetReadFlags()

void SamFile::SetReadFlags ( uint16_t  requiredFlags,
uint16_t  excludedFlags 
)

Specify which reads should be returned by ReadRecord.

ReadRecord only returns reads that contain all of the specified required flags and none of the specified excluded flags. It continues reading from the file until it finds a record that satisfies these flag settings or reaches the end of the file/region.

Parameters
requiredFlags  flags that are required to be in records returned by ReadRecord (set to 0x0 if there are no required flags).
excludedFlags  flags that are required to not be in records returned by ReadRecord (set to 0x0 if there are no excluded flags).

Definition at line 794 of file SamFile.cpp.

795 {
796  myRequiredFlags = requiredFlags;
797  myExcludedFlags = excludedFlags;
798 }
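The two bitmask tests that ReadRecord applies to these values can be tried out in isolation. A minimal sketch follows. passesFlagFilter is a hypothetical free function that copies the comparisons from the ReadRecord listing. It is not a libStatGen API.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the flag checks in ReadRecord.
bool passesFlagFilter(std::uint16_t flag,
                      std::uint16_t requiredFlags,
                      std::uint16_t excludedFlags)
{
    // Every required bit must be set in the record's flag.
    if((flag & requiredFlags) != requiredFlags)
    {
        return false;
    }
    // No excluded bit may be set.
    return (flag & excludedFlags) == 0;
}
```

For example, with requiredFlags = 0x1 (paired) and excludedFlags = 0x4 (unmapped), a record with flag 0x3 is kept, while flags 0x5 (excluded bit set) and 0x2 (required bit missing) are skipped. Passing 0x0 for both masks accepts every record.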

◆ SetReadSection() [1/4]

bool SamFile::SetReadSection ( const char *  refName)

Sets which reference name of the BAM file should be read.

The records for that reference name will be retrieved on each ReadRecord call. Specify "" or "*" to read records not associated with a reference. When all records have been retrieved for the specified reference name, ReadRecord will return failure until a new read section is set. Must be called only after the file has been opened for reading. Sorting validation is reset every time SetReadSection is called since it can jump around in the file.

Parameters
refName  the reference name of the records to read from the file.
Returns
true = success; false = failure.

Definition at line 705 of file SamFile.cpp.

706 {
707  // No start/end specified, so set back to default -1.
708  return(SetReadSection(refName, -1, -1));
709 }

References SetReadSection().

◆ SetReadSection() [2/4]

bool SamFile::SetReadSection ( const char *  refName,
int32_t  start,
int32_t  end,
bool  overlap = true 
)

Sets which reference name & start/end positions of the BAM file should be read.

The records for this reference name & positions will be retrieved on each ReadRecord call. Specify "" or "*" to indicate reads with no reference. When all records have been retrieved for the specified section, ReadRecord will return failure until a new read section is set. Must be called only after the file has been opened for reading. Sorting validation is reset every time SetReadSection is called since it can jump around in the file.

Parameters
refName  the reference name of the records to read from the file.
start  inclusive 0-based start position of records that should be read for this reference name.
end  exclusive 0-based end position of records that should be read for this reference name.
overlap  When true (default), return reads that just overlap the region; when false, only return reads that fall completely within the region.
Returns
true = success; false = failure.

Definition at line 749 of file SamFile.cpp.

751 {
752  // If there is not a BAM file open for reading, return failure.
753  // Opening a new file clears the read section, so it must be
754  // set after the file is opened.
755  if(!myIsBamOpenForRead)
756  {
757  // There is not a BAM file open for reading.
758  myStatus.setStatus(SamStatus::FAIL_ORDER,
759  "Cannot set section since there is no bam file open");
760  return(false);
761  }
762 
763  myNewSection = true;
764  myOverlapSection = overlap;
765  myStartPos = start;
766  myEndPos = end;
767  if((strcmp(refName, "") == 0) || (strcmp(refName, "*") == 0))
768  {
769  // No Reference name specified, so read just the "-1" entries.
770  myRefID = BamIndex::REF_ID_UNMAPPED;
771  }
772  else
773  {
774  // save the reference name and revert the reference ID to unknown
775  // so it will be calculated later.
776  myRefName = refName;
777  myRefID = BamIndex::REF_ID_ALL;
778  }
779  myChunksToRead.clear();
780  // Reset the end of the current chunk. We are resetting our read, so
781  // we no longer have a "current chunk" that we are reading.
782  myCurrentChunkEnd = 0;
784 
785  // Reset the sort order criteria since we moved around in the file.
786  myPrevCoord = -1;
787  myPrevRefID = 0;
788  myPrevReadName.Clear();
789 
790  return(true);
791 }

References StatGenStatus::FAIL_ORDER, myIsBamOpenForRead, myPrevCoord, myStatus, BamIndex::REF_ID_ALL, BamIndex::REF_ID_UNMAPPED, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.
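The start, end, and overlap parameters can be illustrated with a small interval test. This is only a sketch of the documented semantics (inclusive 0-based start, exclusive 0-based end). The library's own check, checkRecordInSection, is not shown in this section, so inSection below is a hypothetical helper, and treating -1 as "no bound" is an assumption based on the -1 defaults passed by the single-argument overloads.

```cpp
#include <cstdint>

// Hypothetical helper (not libStatGen code).
// recStart/recEnd: 0-based alignment span, start inclusive, end exclusive.
// secStart/secEnd: section bounds with the same convention; -1 is assumed
// to mean that the bound is not set.
bool inSection(std::int32_t recStart, std::int32_t recEnd,
               std::int32_t secStart, std::int32_t secEnd,
               bool overlap)
{
    if(overlap)
    {
        // Keep the record if any base of it lies inside the section.
        bool startsBeforeSectionEnd = (secEnd == -1) || (recStart < secEnd);
        bool endsAfterSectionStart = (secStart == -1) || (recEnd > secStart);
        return startsBeforeSectionEnd && endsAfterSectionStart;
    }
    // Keep the record only if it lies completely inside the section.
    bool startsInside = (secStart == -1) || (recStart >= secStart);
    bool endsInside = (secEnd == -1) || (recEnd <= secEnd);
    return startsInside && endsInside;
}
```

With a section of [100, 200), a record spanning [90, 110) is kept when overlap is true and dropped when it is false. A record spanning [200, 250) is dropped in both modes because the end position is exclusive.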

◆ SetReadSection() [3/4]

bool SamFile::SetReadSection ( int32_t  refID)

Sets which reference id (index into the BAM list of reference information) of the BAM file should be read.

The records for that reference id will be retrieved on each ReadRecord call.
Reference ids start at 0, and -1 indicates reads with no reference. When all records have been retrieved for the specified reference id, ReadRecord will return failure until a new read section is set. Must be called only after the file has been opened for reading. Sorting validation is reset every time SetReadSection is called since it can jump around in the file.

Parameters
refID  the reference ID of the records to read from the file.
Returns
true = success; false = failure.

Definition at line 696 of file SamFile.cpp.

697 {
698  // No start/end specified, so set back to default -1.
699  return(SetReadSection(refID, -1, -1));
700 }

Referenced by SetReadSection().

◆ SetReadSection() [4/4]

bool SamFile::SetReadSection ( int32_t  refID,
int32_t  start,
int32_t  end,
bool  overlap = true 
)

Sets which reference id (index into the BAM list of reference information) & start/end positions of the BAM file should be read.

The records for that reference id and positions will be retrieved on each ReadRecord call. Reference ids start at 0, and -1 indicates reads with no reference. When all records have been retrieved for the specified reference id, ReadRecord will return failure until a new read section is set. Must be called only after the file has been opened for reading. Sorting validation is reset every time SetReadSection is called since it can jump around in the file.

Parameters
refID  the reference ID of the records to read from the file.
start  inclusive 0-based start position of records that should be read for this refID.
end  exclusive 0-based end position of records that should be read for this refID.
overlap  When true (default), return reads that just overlap the region; when false, only return reads that fall completely within the region.
Returns
true = success; false = failure.

Definition at line 713 of file SamFile.cpp.

715 {
716  // If there is not a BAM file open for reading, return failure.
717  // Opening a new file clears the read section, so it must be
718  // set after the file is opened.
719  if(!myIsBamOpenForRead)
720  {
721  // There is not a BAM file open for reading.
722  myStatus.setStatus(SamStatus::FAIL_ORDER,
723  "Cannot set section since there is no bam file open");
724  return(false);
725  }
726 
727  myNewSection = true;
728  myOverlapSection = overlap;
729  myStartPos = start;
730  myEndPos = end;
731  myRefID = refID;
732  myRefName.clear();
733  myChunksToRead.clear();
734  // Reset the end of the current chunk. We are resetting our read, so
735  // we no longer have a "current chunk" that we are reading.
736  myCurrentChunkEnd = 0;
738 
739  // Reset the sort order criteria since we moved around in the file.
740  myPrevCoord = -1;
741  myPrevRefID = 0;
742  myPrevReadName.Clear();
743 
744  return(true);
745 }

References StatGenStatus::FAIL_ORDER, myIsBamOpenForRead, myPrevCoord, myStatus, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

◆ SetReadSequenceTranslation()

void SamFile::SetReadSequenceTranslation ( SamRecord::SequenceTranslation  translation)

Set the type of sequence translation to use when reading the sequence.

Passed down to the SamRecord when it is read.
The default type (if this method is never called) is NONE (the sequence is left as-is).

Parameters
translation  type of sequence translation to use.

Definition at line 387 of file SamFile.cpp.

388 {
389  myReadTranslation = translation;
390 }

◆ SetReference()

void SamFile::SetReference ( GenomeSequence reference)

Sets the reference to the specified genome sequence object.

Parameters
reference  pointer to the GenomeSequence object.

Definition at line 380 of file SamFile.cpp.

381 {
382  myRefPtr = reference;
383 }

Referenced by Pileup< PILEUP_TYPE, FUNC_CLASS >::processFile().

◆ setSortedValidation()

void SamFile::setSortedValidation ( SortedType  sortType)

Set the flag to validate that the file is sorted as it is read/written.

Must be called after the file has been opened. Sorting validation is reset every time SetReadSection is called since it can jump around in the file.

Parameters
sortType  specifies the type of sort to be checked for.

Definition at line 682 of file SamFile.cpp.

683 {
684  mySortedType = sortType;
685 }

Referenced by Pileup< PILEUP_TYPE, FUNC_CLASS >::processFile().

◆ SetWriteSequenceTranslation()

void SamFile::SetWriteSequenceTranslation ( SamRecord::SequenceTranslation  translation)

Set the type of sequence translation to use when writing the sequence.

Passed down to the SamRecord when it is written. The default type (if this method is never called) is NONE (the sequence is left as-is).

Parameters
translation  type of sequence translation to use.

Definition at line 394 of file SamFile.cpp.

395 {
396  myWriteTranslation = translation;
397 }

◆ validateSortOrder()

bool SamFile::validateSortOrder ( SamRecord record,
SamFileHeader header 
)
protected

Validate that the record is sorted compared to the previously read record if there is one, according to the specified sort order.

If the sort order is UNSORTED, true is returned. Sorting validation is reset every time SetReadSection is called since it can jump around in the file.

Definition at line 1019 of file SamFile.cpp.

1020 {
1021  if(myRefPtr != NULL)
1022  {
1023  record.setReference(myRefPtr);
1024  }
1025  record.setSequenceTranslation(myReadTranslation);
1026 
1027  bool status = false;
1028  if(mySortedType == UNSORTED)
1029  {
1030  // Unsorted, so nothing to validate, just return true.
1031  status = true;
1032  }
1033  else
1034  {
1035  // Check to see if mySortedType is based on the header.
1036  if(mySortedType == FLAG)
1037  {
1038  // Determine the sorted type from what was read out of the header.
1039  mySortedType = getSortOrderFromHeader(header);
1040  }
1041 
1042  if(mySortedType == QUERY_NAME)
1043  {
1044  // Validate that it is sorted by query name.
1045  // Get the query name from the record.
1046  const char* readName = record.getReadName();
1047 
1048  // Check if it is sorted either in samtools way or picard's way.
1049  if((myPrevReadName.Compare(readName) > 0) &&
1050  (strcmp(myPrevReadName.c_str(), readName) > 0))
1051  {
1052  // return false.
1053  String errorMessage = "ERROR: File is not sorted by read name at record ";
1054  errorMessage += myRecordCount;
1055  errorMessage += "\n\tPrevious record was ";
1056  errorMessage += myPrevReadName;
1057  errorMessage += ", but this record is ";
1058  errorMessage += readName;
1059  myStatus.setStatus(SamStatus::INVALID_SORT,
1060  errorMessage.c_str());
1061  status = false;
1062  }
1063  else
1064  {
1065  myPrevReadName = readName;
1066  status = true;
1067  }
1068  }
1069  else
1070  {
1071  // Validate that it is sorted by COORDINATES.
1072  // Get the leftmost coordinate and the reference index.
1073  int32_t refID = record.getReferenceID();
1074  int32_t coord = record.get0BasedPosition();
1075  // The unmapped reference id is at the end of a sorted file.
1076  if(refID == BamIndex::REF_ID_UNMAPPED)
1077  {
1078  // A new reference ID that is for the unmapped reads
1079  // is always valid.
1080  status = true;
1081  myPrevRefID = refID;
1082  myPrevCoord = coord;
1083  }
1084  else if(myPrevRefID == BamIndex::REF_ID_UNMAPPED)
1085  {
1086  // Previous reference ID was for unmapped reads, but the
1087  // current one is not, so this is not sorted.
1088  String errorMessage = "ERROR: File is not coordinate sorted at record ";
1089  errorMessage += myRecordCount;
1090  errorMessage += "\n\tPrevious record was unmapped, but this record is ";
1091  errorMessage += header.getReferenceLabel(refID) + ":" + coord;
1092  myStatus.setStatus(SamStatus::INVALID_SORT,
1093  errorMessage.c_str());
1094  status = false;
1095  }
1096  else if(refID < myPrevRefID)
1097  {
1098  // Current reference id is less than the previous one,
1099  //meaning that it is not sorted.
1100  String errorMessage = "ERROR: File is not coordinate sorted at record ";
1101  errorMessage += myRecordCount;
1102  errorMessage += "\n\tPrevious record was ";
1103  errorMessage += header.getReferenceLabel(myPrevRefID) + ":" + myPrevCoord;
1104  errorMessage += ", but this record is ";
1105  errorMessage += header.getReferenceLabel(refID) + ":" + coord;
1106  myStatus.setStatus(SamStatus::INVALID_SORT,
1107  errorMessage.c_str());
1108  status = false;
1109  }
1110  else
1111  {
1112  // The reference IDs are in the correct order.
1113  if(refID > myPrevRefID)
1114  {
1115  // New reference id, so set the previous coordinate to -1
1116  myPrevCoord = -1;
1117  }
1118 
1119  // Check the coordinates.
1120  if(coord < myPrevCoord)
1121  {
1122  // New Coord is less than the previous position.
1123  String errorMessage = "ERROR: File is not coordinate sorted at record ";
1124  errorMessage += myRecordCount;
1125  errorMessage += "\n\tPreviousRecord was ";
1126  errorMessage += header.getReferenceLabel(myPrevRefID) + ":" + myPrevCoord;
1127  errorMessage += ", but this record is ";
1128  errorMessage += header.getReferenceLabel(refID) + ":" + coord;
1129  myStatus.setStatus(SamStatus::INVALID_SORT,
1130  errorMessage.c_str());
1131  status = false;
1132  }
1133  else
1134  {
1135  myPrevRefID = refID;
1136  myPrevCoord = coord;
1137  status = true;
1138  }
1139  }
1140  }
1141  }
1142 
1143  return(status);
1144 }

References FLAG, SamRecord::get0BasedPosition(), SamRecord::getReadName(), SamRecord::getReferenceID(), SamFileHeader::getReferenceLabel(), StatGenStatus::INVALID_SORT, myPrevCoord, myRecordCount, myStatus, QUERY_NAME, BamIndex::REF_ID_UNMAPPED, SamRecord::setReference(), SamRecord::setSequenceTranslation(), StatGenStatus::setStatus(), and UNSORTED.

Referenced by ReadRecord(), and WriteRecord().
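The coordinate branch of the listing above reduces to a small state machine over the previous reference id and position. The sketch below restates that logic without the status and error-message handling. CoordSortChecker and kRefIdUnmapped are hypothetical names, the value -1 for unmapped reads is taken from the SetReadSection description, and the initial state copies the values assigned when a read section is set.

```cpp
#include <cstdint>

// Assumed value of BamIndex::REF_ID_UNMAPPED (reads with no reference).
constexpr std::int32_t kRefIdUnmapped = -1;

// Hypothetical stand-alone checker (not libStatGen code).
struct CoordSortChecker
{
    std::int32_t prevRefID = 0;
    std::int32_t prevCoord = -1;

    // Returns true if the record keeps the stream coordinate sorted.
    bool accept(std::int32_t refID, std::int32_t coord)
    {
        if(refID == kRefIdUnmapped)
        {
            // Unmapped reads belong at the end, so they are always valid.
            prevRefID = refID;
            prevCoord = coord;
            return true;
        }
        if(prevRefID == kRefIdUnmapped)
        {
            // A mapped read after an unmapped one is out of order.
            return false;
        }
        if(refID < prevRefID)
        {
            return false;
        }
        if(refID > prevRefID)
        {
            // New reference: any position is acceptable as the first one.
            prevCoord = -1;
        }
        if(coord < prevCoord)
        {
            return false;
        }
        prevRefID = refID;
        prevCoord = coord;
        return true;
    }
};
```

Equal positions on the same reference are accepted, since only a strictly smaller coordinate is rejected. As in the listing, a rejected record does not replace the stored previous reference id.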

◆ WriteHeader()

bool SamFile::WriteHeader ( SamFileHeader header)

Writes the specified header into the file.

Returns
true = success; false = failure.

Definition at line 480 of file SamFile.cpp.

481 {
483  if(myIsOpenForWrite == false)
484  {
485  // File is not open for write
486  // -OR-
487  // The header has already been written.
488  myStatus.setStatus(SamStatus::FAIL_ORDER,
489  "Cannot write header since the file is not open for writing");
490  return(false);
491  }
492 
493  if(myHasHeader == true)
494  {
495  // The header has already been written.
496  myStatus.setStatus(SamStatus::FAIL_ORDER,
497  "Cannot write header since it has already been written");
498  return(false);
499  }
500 
501  if(myInterfacePtr->writeHeader(myFilePtr, header, myStatus))
502  {
503  // The header has now been successfully written.
504  myHasHeader = true;
505  return(true);
506  }
507 
508  // return the status.
509  return(false);
510 }

References StatGenStatus::FAIL_ORDER, myHasHeader, myIsOpenForWrite, myStatus, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

Referenced by OpenForWrite().

◆ WriteRecord()

bool SamFile::WriteRecord ( SamFileHeader header,
SamRecord record 
)

Writes the specified record into the file.

Validates that the record is sorted according to the value set by setSortedValidation. No sorting validation is done if specified to be unsorted, or setSortedValidation was never called. Returns false and does not write the record if the record was not properly sorted.

Returns
true = success; false = failure.

Definition at line 632 of file SamFile.cpp.

634 {
635  if(myIsOpenForWrite == false)
636  {
637  // File is not open for writing
638  myStatus.setStatus(SamStatus::FAIL_ORDER,
639  "Cannot write record since the file is not open for writing");
640  return(false);
641  }
642 
643  if(myHasHeader == false)
644  {
645  // The header has not yet been written.
646  myStatus.setStatus(SamStatus::FAIL_ORDER,
647  "Cannot write record since the header has not been written");
648  return(false);
649  }
650 
651  // Before trying to write the record, validate the sort order.
652  if(!validateSortOrder(record, header))
653  {
654  // Not sorted like it is supposed to be, do not write the record
655  myStatus.setStatus(SamStatus::INVALID_SORT,
656  "Cannot write the record since the file is not properly sorted.");
657  return(false);
658  }
659 
660  if(myRefPtr != NULL)
661  {
662  record.setReference(myRefPtr);
663  }
664 
665  // File is open for writing and the header has been written, so write the
666  // record.
667  myStatus = myInterfacePtr->writeRecord(myFilePtr, header, record,
668  myWriteTranslation);
669 
670  if(myStatus == SamStatus::SUCCESS)
671  {
672  // A record was successfully written, so increment the record count.
673  myRecordCount++;
674  return(true);
675  }
676  return(false);
677 }

References StatGenStatus::FAIL_ORDER, StatGenStatus::INVALID_SORT, myHasHeader, myIsOpenForWrite, myRecordCount, myStatus, SamRecord::setReference(), StatGenStatus::setStatus(), StatGenStatus::SUCCESS, and validateSortOrder().

Referenced by SamCoordOutput::flush().


The documentation for this class was generated from the following files: