Package picard.sam.markduplicates.util
Class DiskBasedReadEndsForMarkDuplicatesMap
- java.lang.Object
  - picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap
All Implemented Interfaces: ReadEndsForMarkDuplicatesMap
public class DiskBasedReadEndsForMarkDuplicatesMap extends Object implements ReadEndsForMarkDuplicatesMap
Disk-based implementation of ReadEndsForMarkDuplicatesMap. A subdirectory of the system tmpdir is created to store files, one for each reference sequence. The reference sequence that is currently being queried (i.e., the sequence for which remove() was most recently called) is held in RAM; ReadEnds for all other sequences are stored on disk.

When put() is called for the sequence currently in RAM, the ReadEnds object is simply put into the in-memory map. When put() is called for any other sequence ID, the ReadEnds object is appended to the file for that sequence, creating the file if necessary.

When remove() is called for the sequence currently in RAM, remove() is called on the in-memory map. When remove() is called for any other sequence, the current RAM sequence is written to disk, the new sequence is read from disk into the in-memory map, and the file for the new sequence is deleted.

If things work properly and reads are processed in genomic order, records are written to disk for mates that lie on a later sequence. When the mate is reached in the input SAM file, the file that was written for its sequence is deleted, so all temporary files should be gone by the time every read has been processed. The temp directory is also marked for deletion on exit, so everything should get cleaned up.
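The put()/remove() strategy described above can be sketched as follows. This is a simplified illustration, not Picard's implementation: the class DiskSpillingMapSketch is hypothetical, keys and values are plain Strings (assumed to contain no tabs or newlines) instead of ReadEndsForMarkDuplicates, and each spilled entry is stored as one text line rather than through a codec.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the strategy above (not Picard's code): one sequence in RAM,
// every other sequence spilled to its own "key<TAB>value" file in a temp directory.
final class DiskSpillingMapSketch {
    private final Path tmpDir;
    private final Map<String, String> ramMap = new HashMap<>();        // entries of the current sequence
    private final Map<Integer, Integer> countOnDisk = new HashMap<>(); // sequence -> entries in its file
    private int currentSequence;
    private boolean hasCurrent = false;                                // no sequence loaded yet

    DiskSpillingMapSketch() {
        try {
            tmpDir = Files.createTempDirectory("readEnds");
            tmpDir.toFile().deleteOnExit(); // only succeeds if the directory is empty at JVM exit
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private Path fileFor(int sequence) {
        return tmpDir.resolve("seq_" + sequence + ".tsv");
    }

    // Appends entries to a sequence's file, creating the file if necessary.
    private void appendToDisk(int sequence, Map<String, String> entries) {
        try (BufferedWriter w = Files.newBufferedWriter(fileFor(sequence), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (Map.Entry<String, String> entry : entries.entrySet()) {
                w.write(entry.getKey() + "\t" + entry.getValue());
                w.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        countOnDisk.merge(sequence, entries.size(), Integer::sum);
    }

    void put(int mateSequenceIndex, String key, String value) {
        if (hasCurrent && mateSequenceIndex == currentSequence) {
            ramMap.put(key, value);                              // current sequence: RAM only
        } else {
            appendToDisk(mateSequenceIndex, Map.of(key, value)); // any other sequence: spill
        }
    }

    String remove(int mateSequenceIndex, String key) {
        if (!hasCurrent || mateSequenceIndex != currentSequence) {
            switchTo(mateSequenceIndex);
        }
        return ramMap.remove(key); // null if the key is not present
    }

    // Writes the RAM sequence back to disk, loads the new one, and deletes its file.
    private void switchTo(int newSequence) {
        if (hasCurrent && !ramMap.isEmpty()) {
            appendToDisk(currentSequence, ramMap);
        }
        ramMap.clear();
        Path file = fileFor(newSequence);
        if (Files.exists(file)) {
            try {
                try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = r.readLine()) != null) {
                        int tab = line.indexOf('\t');
                        ramMap.put(line.substring(0, tab), line.substring(tab + 1));
                    }
                }
                Files.delete(file); // the reader is closed by now
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        countOnDisk.remove(newSequence);
        currentSequence = newSequence;
        hasCurrent = true;
    }

    int size() {
        return ramMap.size() + countOnDisk.values().stream().mapToInt(Integer::intValue).sum();
    }

    int sizeInRam() {
        return ramMap.size(); // always <= size()
    }
}
```

Keeping only one sequence in RAM bounds memory by the largest per-sequence working set, at the cost of a file read each time remove() moves to a new sequence; with coordinate-sorted input those switches happen roughly once per sequence.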
Constructor Summary
- DiskBasedReadEndsForMarkDuplicatesMap(int maxOpenFiles, ReadEndsForMarkDuplicatesCodec readEndsForMarkDuplicatesCodec)
Method Summary
- void put(int mateSequenceIndex, String key, ReadEndsForMarkDuplicates readEnds)
  Store the element in the map with the given key.
- ReadEndsForMarkDuplicates remove(int mateSequenceIndex, String key)
  Remove element with given key from the map.
- int size()
- int sizeInRam()
Constructor Detail
DiskBasedReadEndsForMarkDuplicatesMap
public DiskBasedReadEndsForMarkDuplicatesMap(int maxOpenFiles, ReadEndsForMarkDuplicatesCodec readEndsForMarkDuplicatesCodec)
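This page does not document what maxOpenFiles controls. Since the map keeps one append-only file per reference sequence, a plausible reading is that it caps how many of those files are held open at once. The following is a minimal sketch of such a cap under that assumption, using least-recently-used eviction through LinkedHashMap's access-order mode; the class AppendStreamLruCache and its methods are hypothetical and not part of Picard's API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not Picard's API): keeps at most maxOpenFiles append
// streams open and closes the least recently used one when another is needed.
final class AppendStreamLruCache {
    private final int maxOpenFiles;
    private final Map<Path, OutputStream> open;

    AppendStreamLruCache(int maxOpenFiles) {
        if (maxOpenFiles < 1) {
            throw new IllegalArgumentException("maxOpenFiles must be at least 1");
        }
        this.maxOpenFiles = maxOpenFiles;
        // accessOrder = true: iteration runs from least to most recently used entry.
        this.open = new LinkedHashMap<Path, OutputStream>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Path, OutputStream> eldest) {
                if (size() > AppendStreamLruCache.this.maxOpenFiles) {
                    closeQuietly(eldest.getValue()); // bytes already written stay in the file
                    return true;
                }
                return false;
            }
        };
    }

    // Appends bytes to file, reopening it in APPEND mode if it was evicted earlier.
    void append(Path file, byte[] data) {
        try {
            OutputStream s = open.get(file); // get() also marks the entry most recently used
            if (s == null) {
                s = Files.newOutputStream(file, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                open.put(file, s);           // may evict the least recently used stream
            }
            s.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int openCount() {
        return open.size();
    }

    void closeAll() {
        open.values().forEach(AppendStreamLruCache::closeQuietly);
        open.clear();
    }

    private static void closeQuietly(OutputStream s) {
        try {
            s.close();
        } catch (IOException ignored) {
            // best effort: nothing useful to do on a failed close here
        }
    }
}
```

Because an evicted file is reopened in APPEND mode, eviction loses no data; it only costs an extra open() when that sequence is written to again.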
Method Detail
remove
public ReadEndsForMarkDuplicates remove(int mateSequenceIndex, String key)
Description copied from interface: ReadEndsForMarkDuplicatesMap
Remove element with given key from the map. Because an implementation may be disk-based, the object returned may not be the same object that was put into the map.
- Specified by:
  remove in interface ReadEndsForMarkDuplicatesMap
- Parameters:
  mateSequenceIndex - must agree with the value used when the object was put into the map
  key - typically, concatenation of read group ID and read name
- Returns:
  null if the key is not found, otherwise the object removed.
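The caveat that remove() may return a different object follows from serialization: anything that was spilled to disk comes back as a freshly decoded instance. A minimal sketch, assuming a hypothetical value type EndsSketch with a DataOutputStream-based encoding (this is not the format used by ReadEndsForMarkDuplicatesCodec):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a ReadEnds-like value; a record provides value-based equals().
record EndsSketch(String readName, int coordinate, byte orientation) {

    // Serialize the fields, as a disk-based map must do before writing a spill file.
    byte[] encode() {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(readName);
            out.writeInt(coordinate);
            out.writeByte(orientation);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild a value from bytes: a new object whose contents equal the original's.
    static EndsSketch decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new EndsSketch(in.readUTF(), in.readInt(), in.readByte());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Callers should therefore compare returned values by content (equals()) rather than by reference (==).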
put
public void put(int mateSequenceIndex, String key, ReadEndsForMarkDuplicates readEnds)
Description copied from interface: ReadEndsForMarkDuplicatesMap
Store the element in the map with the given key. It is assumed that the element is not already present in the map.
- Specified by:
  put in interface ReadEndsForMarkDuplicatesMap
- Parameters:
  mateSequenceIndex - used to optimize storage and retrieval. The same value must be used when trying to remove this element. It is not valid to store the same key with two different mateSequenceIndexes.
  key - typically, concatenation of read group ID and read name
  readEnds - the object to be stored
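The conditions on put() (the key is not already present, and each key is tied to a single mateSequenceIndex that remove() must repeat) can be made explicit with a small bookkeeping class. This is a hypothetical debugging aid, not part of Picard; its key() helper simply follows the "read group ID followed by read name" convention given for the key parameter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical checker (not part of Picard) for the put()/remove() contract:
// it records which mateSequenceIndex each outstanding key was stored under.
final class PutRemoveContractChecker {
    private final Map<String, Integer> indexByKey = new HashMap<>();

    // Key built by concatenating read group ID and read name.
    static String key(String readGroupId, String readName) {
        return readGroupId + readName;
    }

    // Call alongside put(): rejects a key that is already present.
    void onPut(int mateSequenceIndex, String key) {
        Integer previous = indexByKey.putIfAbsent(key, mateSequenceIndex);
        if (previous != null) {
            throw new IllegalStateException("key already present: " + key);
        }
    }

    // Call alongside remove(): rejects an index that differs from the one used in put().
    void onRemove(int mateSequenceIndex, String key) {
        Integer stored = indexByKey.get(key);
        if (stored != null && stored.intValue() != mateSequenceIndex) {
            throw new IllegalStateException("key " + key + " was put with mateSequenceIndex "
                    + stored + " but removed with " + mateSequenceIndex);
        }
        indexByKey.remove(key);
    }
}
```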
size
public int size()
- Specified by:
  size in interface ReadEndsForMarkDuplicatesMap
- Returns:
  number of elements stored in map
sizeInRam
public int sizeInRam()
- Specified by:
  sizeInRam in interface ReadEndsForMarkDuplicatesMap
- Returns:
  number of elements stored in RAM. Always <= size()