libStatGen Software  1
InputFile Class Reference

Class for easily reading/writing files without having to worry about file type (uncompressed, gzip, bgzf) when reading.

#include <InputFile.h>


Public Types

enum  ifileCompression { DEFAULT, UNCOMPRESSED, GZIP, BGZF }
 Compression to use when writing a file & decompression used when reading a file from stdin.
 

Public Member Functions

 InputFile ()
 Default constructor.
 
 ~InputFile ()
 Destructor.
 
 InputFile (const char *filename, const char *mode, InputFile::ifileCompression compressionMode=InputFile::DEFAULT)
 Constructor for opening a file.
 
void bufferReads (unsigned int bufferSize=DEFAULT_BUFFER_SIZE)
 Set the buffer size for reading from files so that bufferSize bytes are read at a time and stored until accessed by another read call.
 
void disableBuffering ()
 Disable read buffering.
 
int ifclose ()
 Close the file.
 
int ifread (void *buffer, unsigned int size)
 Read size bytes from the file into the buffer.
 
int readTilChar (const std::string &stopChars, std::string &stringRef)
 Read until one of the specified characters is found, returning the index in stopChars of the character that caused the stop, or -1 for EOF, and storing the other read characters in the specified string.
 
int readTilChar (const std::string &stopChars)
 Read until one of the specified characters is found, returning the index in stopChars of the character that caused the stop, or -1 for EOF, and dropping all read characters.
 
int discardLine ()
 Read until the end of the line, discarding the characters; returns -1 for EOF and 0 if the end of the line was found.
 
int readLine (std::string &line)
 Read, appending the characters into the specified string until new line or EOF is found, returning -1 if EOF is found first and 0 if new line is found first.
 
int readTilTab (std::string &field)
 Read, appending the characters into the specified string until tab, new line, or EOF is found, returning -1 if EOF is found first, 0 if new line is found first, or 1 if a tab is found first.
 
int ifgetc ()
 Get a character from the file.
 
bool ifgetline (void *voidBuffer, size_t max)
 Get a line from the file.
 
void ifrewind ()
 Reset to the beginning of the file.
 
int ifeof () const
 Check to see if we have reached the EOF.
 
unsigned int ifwrite (const void *buffer, unsigned int size)
 Write the specified buffer into the file.
 
bool isOpen () const
 Returns whether or not the file was successfully opened.
 
int64_t iftell ()
 Get current position in the file.
 
bool ifseek (int64_t offset, int origin)
 Seek to the specified offset from the origin.
 
const char * getFileName () const
 Get the filename that is currently opened.
 
void setAttemptRecovery (bool flag=false)
 Enable or disable recovery; flag defaults to false (disabled).
 
bool attemptRecoverySync (bool(*checkSignature)(void *data), int length)
 
bool openFile (const char *filename, const char *mode, InputFile::ifileCompression compressionMode)
 

Protected Member Functions

int readFromFile (void *buffer, unsigned int size)
 

Protected Attributes

FileType * myFileTypePtr
 
unsigned int myAllocatedBufferSize
 
char * myFileBuffer
 
int myBufferIndex
 
int myCurrentBufferSize
 
std::string myFileName
 

Static Protected Attributes

static const unsigned int DEFAULT_BUFFER_SIZE = 65536
 

Detailed Description

Class for easily reading/writing files without having to worry about file type (uncompressed, gzip, bgzf) when reading.

It hides the low-level file operations and structure from the user, allowing them to open and operate on a file through the same interface without knowing the file format (standard uncompressed, gzip, or bgzf). For writing, the user must specify the file type. A typedef, IFILE, is defined as InputFile* and is set up to mimic FILE, including global functions that take an IFILE as a parameter.

Definition at line 36 of file InputFile.h.

Member Enumeration Documentation

◆ ifileCompression

Compression to use when writing a file & decompression used when reading a file from stdin.

Any other read checks the file to determine how to uncompress it.

Enumerator
DEFAULT 

Check the extension, if it is ".gz", treat as gzip, otherwise treat it as UNCOMPRESSED.

UNCOMPRESSED 

uncompressed file.

GZIP 

gzip file.

BGZF 

bgzf file.

Definition at line 44 of file InputFile.h.

44  {
45  DEFAULT, ///< Check the extension, if it is ".gz", treat as gzip, otherwise treat it as UNCOMPRESSED.
46  UNCOMPRESSED, ///< uncompressed file.
47  GZIP, ///< gzip file.
48  BGZF ///< bgzf file.
49  };
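
The DEFAULT rule described above can be sketched as follows. This is an illustration only, not the library's implementation: `Compression` and `resolveDefault` are hypothetical names, and the sketch assumes the check is a case-sensitive comparison against the last three characters of the filename.

```cpp
#include <string>

// Hypothetical stand-in for InputFile::ifileCompression.
enum class Compression { DEFAULT, UNCOMPRESSED, GZIP, BGZF };

// Resolve DEFAULT by filename extension: ".gz" means GZIP, anything else
// means UNCOMPRESSED. Explicitly requested modes pass through unchanged.
inline Compression resolveDefault(const std::string& filename,
                                  Compression requested)
{
    if (requested != Compression::DEFAULT)
    {
        return requested;
    }
    const std::string ext = ".gz";
    const bool endsWithGz =
        filename.size() >= ext.size() &&
        filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0;
    return endsWithGz ? Compression::GZIP : Compression::UNCOMPRESSED;
}
```

Under this reading, a bgzf file is never selected by DEFAULT when writing, so BGZF has to be requested explicitly.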

Constructor & Destructor Documentation

◆ InputFile()

InputFile::InputFile ( const char *  filename,
const char *  mode,
InputFile::ifileCompression  compressionMode = InputFile::DEFAULT 
)

Constructor for opening a file.

Parameters
filename: file to open
mode: same format as fopen: "r" for read & "w" for write.
compressionMode: set the type of file to open for writing or for reading from stdin (when reading files, the compression type is determined by reading the file).

Definition at line 28 of file InputFile.cpp.

30 {
31  // XXX duplicate code
32  myAttemptRecovery = false;
33  myFileTypePtr = NULL;
34  myBufferIndex = 0;
35  myCurrentBufferSize = 0;
36  myAllocatedBufferSize = DEFAULT_BUFFER_SIZE;
37  myFileBuffer = new char[myAllocatedBufferSize];
38  myFileName.clear();
39 
40  openFile(filename, mode, compressionMode);
41 }

Member Function Documentation

◆ bufferReads()

void InputFile::bufferReads ( unsigned int  bufferSize = DEFAULT_BUFFER_SIZE)
inline

Set the buffer size for reading from files so that bufferSize bytes are read at a time and stored until accessed by another read call.

This improves performance over reading the file small bits at a time. Buffering reads disables the tell call for bgzf files. Any previous values in the buffer will be deleted.

Parameters
bufferSize: number of bytes to read/buffer at a time; turn off read buffering by setting bufferSize = 1.

Definition at line 83 of file InputFile.h.

Referenced by disableBuffering(), and readTilTab().

84  {
85  // If the buffer size is the same, do nothing.
86  if(bufferSize == myAllocatedBufferSize)
87  {
88  return;
89  }
90  // Delete the previous buffer.
91  if(myFileBuffer != NULL)
92  {
93  delete[] myFileBuffer;
94  }
95  myBufferIndex = 0;
96  myCurrentBufferSize = 0;
97  // The buffer size must be at least 1 so one character can be
98  // read and ifgetc can just assume reading into the buffer.
99  if(bufferSize < 1)
100  {
101  bufferSize = 1;
102  }
103  myFileBuffer = new char[bufferSize];
104  myAllocatedBufferSize = bufferSize;
105 
106  if(myFileTypePtr != NULL)
107  {
108  if(bufferSize == 1)
109  {
110  myFileTypePtr->setBuffered(false);
111  }
112  else
113  {
114  myFileTypePtr->setBuffered(true);
115  }
116  }
117  }

◆ discardLine()

int InputFile::discardLine ( )

Read until the end of the line, discarding the characters; returns -1 for EOF and 0 if the end of the line was found.

Returns
0 if the end of the line was found before EOF or -1 for EOF.

Definition at line 95 of file InputFile.cpp.

References ifgetc().

Referenced by GenomeSequence::getChromosome(), and ifread().

96 {
97  int charRead = 0;
98  // Loop until the character was not found in the stop characters.
99  while((charRead != EOF) && (charRead != '\n'))
100  {
101  charRead = ifgetc();
102  }
103  // First Check for EOF. If EOF is found, just return -1
104  if(charRead == EOF)
105  {
106  return(-1);
107  }
108  return(0);
109 }

◆ getFileName()

const char* InputFile::getFileName ( ) const
inline

Get the filename that is currently opened.

Returns
filename associated with this class

Definition at line 473 of file InputFile.h.

Referenced by SamFile::ReadBamIndex().

474  {
475  return(myFileName.c_str());
476  }

◆ ifclose()

int InputFile::ifclose ( )
inline

Close the file.

Returns
status of the close (0 is success).

Definition at line 133 of file InputFile.h.

Referenced by ifclose().

134  {
135  if (myFileTypePtr == NULL)
136  {
137  return EOF;
138  }
139  int result = myFileTypePtr->close();
140  delete myFileTypePtr;
141  myFileTypePtr = NULL;
142  myFileName.clear();
143  return result;
144  }

◆ ifeof()

int InputFile::ifeof ( ) const
inline

Check to see if we have reached the EOF.

Returns
0 if not EOF, any other value means EOF.

Definition at line 386 of file InputFile.h.

Referenced by GenomeSequence::getChromosome(), ifeof(), readLine(), and readTilTab().

387  {
388  // Not EOF if we are not at the end of the buffer.
389  if (myBufferIndex < myCurrentBufferSize)
390  {
391  // There are still available bytes in the buffer, so NOT EOF.
392  return false;
393  }
394  else
395  {
396  if (myFileTypePtr == NULL)
397  {
398  // No myFileTypePtr, so not eof (return 0).
399  return 0;
400  }
401  // exhausted our buffer, so check the file for eof.
402  return myFileTypePtr->eof();
403  }
404  }

◆ ifgetc()

int InputFile::ifgetc ( )
inline

Get a character from the file.

Read a character from the internal buffer, or if the end of the buffer has been reached, read from the file into the buffer and return the character at index 0.

Returns
character that was read or EOF.

Definition at line 324 of file InputFile.h.

Referenced by discardLine(), ifgetc(), ifgetline(), operator>>(), readLine(), readTilChar(), and readTilTab().

325  {
326  if (myBufferIndex >= myCurrentBufferSize)
327  {
328  // at the last index, read a new buffer.
329  myCurrentBufferSize = readFromFile(myFileBuffer, myAllocatedBufferSize);
330  myBufferIndex = 0;
331  // If the buffer index is still greater than or equal to the
332  // myCurrentBufferSize, then we failed to read the file - return EOF.
333  // NB: This only needs to be checked when myCurrentBufferSize
334  // is changed. Simplify check - readFromFile returns zero on EOF
335  if (myCurrentBufferSize == 0)
336  {
337  return(EOF);
338  }
339  }
340  return(myFileBuffer[myBufferIndex++]);
341  }

◆ ifgetline()

bool InputFile::ifgetline ( void *  voidBuffer,
size_t  max 
)
inline

Get a line from the file.

Parameters
voidBuffer: the buffer into which data is to be placed
max: the maximum size of the buffer, in bytes
Returns
true if the last character read was an EOF

Definition at line 347 of file InputFile.h.

References ifgetc().

Referenced by ifgetline().

348  {
349  int ch;
350  char *buffer = (char *) voidBuffer;
351 
352  while( (ch=ifgetc()) != '\n' && ch != EOF) {
353  *buffer++ = ch;
354  if((--max)<2)
355  {
356  // truncate the line, so drop remainder
357  while( (ch=ifgetc()) && ch != '\n' && ch != EOF)
358  {
359  }
360  break;
361  }
362  }
363  *buffer++ = '\0';
364  return ch==EOF;
365  }

◆ ifread()

int InputFile::ifread ( void *  buffer,
unsigned int  size 
)
inline

Read size bytes from the file into the buffer.

Parameters
buffer: pointer to memory at least size bytes big to write the data into.
size: number of bytes to be read
Returns
number of bytes read. If this is not equal to size, either an error occurred or the end of the file was reached; use ifeof to determine which case it was.

Definition at line 153 of file InputFile.h.

References discardLine(), readLine(), readTilChar(), and readTilTab().

Referenced by ifread().

154  {
155  // There are 2 cases:
156  // 1) There are already size available bytes in buffer.
157  // 2) There are not size bytes in buffer.
158 
159  // Determine the number of available bytes in the buffer.
160  unsigned int availableBytes = myCurrentBufferSize - myBufferIndex;
161  int returnSize = 0;
162 
163  // Case 1: There are already size available bytes in buffer.
164  if (size <= availableBytes)
165  {
166  // Just copy from the buffer, increment the index and return.
167  memcpy(buffer, myFileBuffer+myBufferIndex, size);
168  // Increment the buffer index.
169  myBufferIndex += size;
170  returnSize = size;
171  }
172  // Case 2: There are not size bytes in buffer.
173  else
174  {
175  // Check to see if there are some bytes in the buffer.
176  if (availableBytes > 0)
177  {
178  // Size > availableBytes > 0
179  // Copy the available bytes into the buffer.
180  memcpy(buffer, myFileBuffer+myBufferIndex, availableBytes);
181  }
182  // So far availableBytes have been copied into the read buffer.
183  returnSize = availableBytes;
184  // Increment myBufferIndex by what was read.
185  myBufferIndex += availableBytes;
186 
187  unsigned int remainingSize = size - availableBytes;
188 
189  // Check if the remaining size is more or less than the
190  // max buffer size.
191  if(remainingSize < myAllocatedBufferSize)
192  {
193  // the remaining size is not the full buffer, but read
194  // a full buffer worth of data anyway.
195  myCurrentBufferSize =
196  readFromFile(myFileBuffer, myAllocatedBufferSize);
197 
198  // Check for an error.
199  if(myCurrentBufferSize <= 0)
200  {
201  // No more data was successfully read, so check to see
202  // if any data was copied to the return buffer at all.
203  if( returnSize == 0)
204  {
205  // No data has been copied at all into the
206  // return read buffer, so just return the value
207  // returned from readFromFile.
208  returnSize = myCurrentBufferSize;
209  // Otherwise, returnSize is already set to the
210  // available bytes that was already copied (so no
211  // else statement is needed).
212  }
213  // Set myBufferIndex & myCurrentBufferSize to 0.
214  myCurrentBufferSize = 0;
215  myBufferIndex = 0;
216  }
217  else
218  {
219  // Successfully read more data.
220  // Check to see how much was copied.
221  int copySize = remainingSize;
222  if(copySize > myCurrentBufferSize)
223  {
224  // Not the entire requested amount was read
225  // (either from EOF or there was a partial read due to
226  // an error), so set the copySize to what was read.
227  copySize = myCurrentBufferSize;
228  }
229 
230  // Now copy the rest of the bytes into the buffer.
231  memcpy((char*)buffer+availableBytes,
232  myFileBuffer, copySize);
233 
234  // set the buffer index to the location after what we are
235  // returning as read.
236  myBufferIndex = copySize;
237 
238  returnSize += copySize;
239  }
240  }
241  else
242  {
243  // More remaining to be read than the max buffer size, so just
244  // read directly into the output buffer.
245  int readSize = readFromFile((char*)buffer + availableBytes,
246  remainingSize);
247 
248  // Already used the buffer, so "clear" it.
249  myCurrentBufferSize = 0;
250  myBufferIndex = 0;
251  if(readSize <= 0)
252  {
253  // No more data was successfully read, so check to see
254  // if any data was copied to the return buffer at all.
255  if(returnSize == 0)
256  {
257  // No data has been copied at all into the
258  // return read buffer, so just return the value
259  // returned from readFromFile.
260  returnSize = readSize;
261  // Otherwise, returnSize is already set to the
262  // available bytes that was already copied (so no
263  // else statement is needed).
264  }
265  }
266  else
267  {
268  // More data was read, so increment the return count.
269  returnSize += readSize;
270  }
271  }
272  }
273  return(returnSize);
274  }

◆ ifseek()

bool InputFile::ifseek ( int64_t  offset,
int  origin 
)
inline

Seek to the specified offset from the origin.

Parameters
offset: offset into the file to move to (must be from a tell call)
origin: can be any of the following (note: not all are valid for all file types):
SEEK_SET - Beginning of file
SEEK_CUR - Current position of the file pointer
SEEK_END - End of file
Returns
true on successful seek and false on a failed seek.

Definition at line 457 of file InputFile.h.

Referenced by ifseek().

458  {
459  if (myFileTypePtr == NULL)
460  {
461  // No myFileTypePtr, so return false - could not seek.
462  return false;
463  }
464  // TODO - may be able to seek within the buffer if applicable.
465  // Reset buffering since a seek is being done.
466  myBufferIndex = 0;
467  myCurrentBufferSize = 0;
468  return myFileTypePtr->seek(offset, origin);
469  }

◆ iftell()

int64_t InputFile::iftell ( )
inline

Get current position in the file.

Returns
current position in the file, -1 indicates an error.

Definition at line 436 of file InputFile.h.

Referenced by iftell().

437  {
438  if (myFileTypePtr == NULL)
439  {
440  // No myFileTypePtr, so return -1 - could not get the position.
441  return -1;
442  }
443  int64_t pos = myFileTypePtr->tell();
444  pos -= (myCurrentBufferSize - myBufferIndex);
445  return(pos);
446  }
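
The subtraction on line 444 accounts for read-ahead: the underlying file position is already past bytes that sit unread in the buffer, so the position the caller sees is earlier by that amount. A minimal sketch of the arithmetic follows; `logicalPosition` is a hypothetical helper name, not part of the library.

```cpp
#include <cstdint>

// Position as seen by the caller: the underlying file position minus the
// bytes that were read ahead into the buffer but not yet consumed.
inline std::int64_t logicalPosition(std::int64_t filePosition,
                                    int currentBufferSize,
                                    int bufferIndex)
{
    return filePosition - (currentBufferSize - bufferIndex);
}
```

For example, if one full 65536-byte buffer has been read from the start of the file and the caller has consumed 100 bytes of it, the underlying position is 65536 and the reported position is 100.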

◆ ifwrite()

unsigned int InputFile::ifwrite ( const void *  buffer,
unsigned int  size 
)
inline

Write the specified buffer into the file.

Parameters
buffer: buffer containing size bytes to write to the file.
size: number of bytes to write
Returns
number of bytes written. The write call is not buffered, so it is passed straight through to the file.

Definition at line 411 of file InputFile.h.

Referenced by ifwrite(), and operator<<().

412  {
413  if (myFileTypePtr == NULL)
414  {
415  // No myFileTypePtr, so return 0 - nothing written.
416  return 0;
417  }
418  return myFileTypePtr->write(buffer, size);
419  }

◆ isOpen()

bool InputFile::isOpen ( ) const
inline

Returns whether or not the file was successfully opened.

Returns
true if the file is open, false if not.

Definition at line 423 of file InputFile.h.

Referenced by ifopen(), FastQFile::isOpen(), SamFile::IsOpen(), GlfHeader::read(), SamRecord::setBufferFromFile(), GlfHeader::write(), and SamRecord::writeRecordBuffer().

424  {
425  // It is open if the myFileTypePtr is set and says it is open.
426  if ((myFileTypePtr != NULL) && myFileTypePtr->isOpen())
427  {
428  return true;
429  }
430  // File was not successfully opened.
431  return false;
432  }

◆ readLine()

int InputFile::readLine ( std::string &  line)

Read, appending the characters into the specified string until new line or EOF is found, returning -1 if EOF is found first and 0 if new line is found first.

The new line and EOF are not written into the specified string.

Parameters
line: reference to a string that the read characters should be appended to (does not include the new line or EOF).
Returns
0 if new line and -1 for EOF.

Definition at line 112 of file InputFile.cpp.

References ifeof(), and ifgetc().

Referenced by ifread().

113 {
114  int charRead = 0;
115  while(!ifeof())
116  {
117  charRead = ifgetc();
118  if(charRead == EOF)
119  {
120  return(-1);
121  }
122  if(charRead == '\n')
123  {
124  return(0);
125  }
126  line += charRead;
127  }
128  // Should never get here.
129  return(-1);
130 }

◆ readTilChar() [1/2]

int InputFile::readTilChar ( const std::string &  stopChars,
std::string &  stringRef 
)

Read until one of the specified characters is found, returning the index in stopChars of the character that caused the stop, or -1 for EOF, and storing the other read characters in the specified string.

Note: If stopChars is just '\n', readLine is faster, and if stopChars is just '\n' and '\t', readTilTab is faster.

Parameters
stopChars: characters at which to stop reading.
stringRef: reference to a string that the read characters should be appended to (does not include the stop character).
Returns
index of the character in stopChars that caused it to stop reading or -1 for EOF.

Definition at line 44 of file InputFile.cpp.

References ifgetc().

Referenced by ifread().

45 {
46  int charRead = 0;
47  size_t pos = std::string::npos;
48  // Loop until the character was not found in the stop characters.
49  while(pos == std::string::npos)
50  {
51  charRead = ifgetc();
52 
53  // First Check for EOF. If EOF is found, just return -1
54  if(charRead == EOF)
55  {
56  return(-1);
57  }
58 
59  // Try to find the character in the stopChars.
60  pos = stopChars.find(charRead);
61 
62  if(pos == std::string::npos)
63  {
64  // Didn't find a stop character and it is not an EOF,
65  // so add it to the string.
66  stringRef += charRead;
67  }
68  }
69  return(pos);
70 }

◆ readTilChar() [2/2]

int InputFile::readTilChar ( const std::string &  stopChars)

Read until one of the specified characters is found, returning the index in stopChars of the character that caused the stop, or -1 for EOF, and dropping all read characters.

Note: If stopChars is just '\n', discardLine is faster.

Parameters
stopChars: characters at which to stop reading.
Returns
index of the character in stopChars that caused it to stop reading or -1 for EOF.

Definition at line 73 of file InputFile.cpp.

References ifgetc().

74 {
75  int charRead = 0;
76  size_t pos = std::string::npos;
77  // Loop until the character was not found in the stop characters.
78  while(pos == std::string::npos)
79  {
80  charRead = ifgetc();
81 
82  // First Check for EOF. If EOF is found, just return -1
83  if(charRead == EOF)
84  {
85  return(-1);
86  }
87 
88  // Try to find the character in the stopChars.
89  pos = stopChars.find(charRead);
90  }
91  return(pos);
92 }

◆ readTilTab()

int InputFile::readTilTab ( std::string &  field)

Read, appending the characters into the specified string until tab, new line, or EOF is found, returning -1 if EOF is found first, 0 if new line is found first, or 1 if a tab is found first.

The tab, new line, and EOF are not written into the specified string.

Parameters
field: reference to a string that the read characters should be appended to (does not include the tab, new line, or EOF).
Returns
1 if tab is found, 0 if new line, and -1 for EOF.

Definition at line 133 of file InputFile.cpp.

References BGZF, bufferReads(), DEFAULT, GZIP, ifeof(), ifgetc(), and UNCOMPRESSED.

Referenced by GenomeSequence::getChromosome(), and ifread().

134 {
135  int charRead = 0;
136  while(!ifeof())
137  {
138  charRead = ifgetc();
139  if(charRead == EOF)
140  {
141  return(-1);
142  }
143  if(charRead == '\n')
144  {
145  return(0);
146  }
147  if(charRead == '\t')
148  {
149  return(1);
150  }
151  field += charRead;
152  }
153  return(-1);
154 }

◆ setAttemptRecovery()

void InputFile::setAttemptRecovery ( bool  flag = false)
inline

Enable or disable recovery; flag defaults to false (disabled).

When true, we can attach a myFileTypePtr that implements a recovery capable decompressor. This requires that the caller be able to catch the exception XXX "blah blah blah".

Definition at line 485 of file InputFile.h.

Referenced by SamFile::OpenForRead().

486  {
487  myAttemptRecovery = flag;
488  }

The documentation for this class was generated from the following files: