Class DataSource
- java.lang.Object
  - uk.ac.starlink.util.DataSource
- Direct Known Subclasses:
  ByteArrayDataSource, FileDataSource, ProcessDataSource, ResourceDataSource, URLDataSource
public abstract class DataSource extends java.lang.Object
Represents a stream-like source of data. Instances of this class can be used to encapsulate the data available from a stream. The idea is that the stream should return the same sequence of bytes each time.
As well as the ability to return a stream, a DataSource may also have a position, which corresponds to the 'ref' or 'frag' part of a URL (the part after the '#'). This is an indication of a location in the stream; it is a string, and its interpretation is entirely up to the application (though it may be specified by the documentation of specific DataSource subclasses).
As well as providing the facility for several different objects to get their own copy of the underlying input stream, this class also handles decompression of the stream. Compression types are those understood by the associated Compression class.
For efficiency, a buffer holding the bytes at the start of the stream, called the 'intro buffer', is recorded the first time that the stream is read. It can then be used cheaply for magic number queries, without having to open a new input stream. If the whole input stream is shorter than the intro buffer, the underlying input stream never has to be read again.
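The intro-buffer idea can be sketched with the standard library alone. The class IntroBuffer below is hypothetical, not part of uk.ac.starlink.util: it reads at most introLimit bytes and records whether the stream ended inside that limit, which is the case where the underlying stream never needs to be reopened. It does not attempt any decompression.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical illustration of an intro buffer; not the Starlink implementation. */
final class IntroBuffer {
    final byte[] bytes;          // first min(introLimit, streamLength) bytes
    final boolean wholeStream;   // true if the stream ended before the limit

    private IntroBuffer(byte[] bytes, boolean wholeStream) {
        this.bytes = bytes;
        this.wholeStream = wholeStream;
    }

    static IntroBuffer read(InputStream in, int introLimit) throws IOException {
        byte[] buf = new byte[introLimit];
        // readNBytes blocks until introLimit bytes arrive or EOF is reached
        int n = in.readNBytes(buf, 0, introLimit);
        // Fewer bytes than requested means EOF was hit: the buffer is the whole
        // stream. A stream of exactly introLimit bytes is conservatively
        // reported as not whole, since no extra byte is read to check.
        return new IntroBuffer(Arrays.copyOf(buf, n), n < introLimit);
    }
}
```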
Any implementation of getRawInputStream() that returns different byte sequences on different occasions may lead to unpredictable behaviour from this class.
- Author:
- Mark Taylor (Starlink)
- See Also:
Compression
-
Field Summary
- static int DEFAULT_INTRO_LIMIT
- static java.lang.String MARK_WORKAROUND_PROPERTY
-
Constructor Summary
- DataSource()
  Constructs a DataSource with a default size of intro buffer.
- DataSource(int introLimit)
  Constructs a DataSource with a given size of intro buffer.
-
Method Summary
- void close()
  Closes any open streams owned and not yet dispatched by this DataSource.
- DataSource forceCompression(Compression compress)
  Returns a DataSource representing the same underlying stream, but with a forced compression mode compress.
- Compression getCompression()
  Returns an object which will handle any required decompression for this stream.
- java.io.InputStream getHybridInputStream()
  Returns an input stream which appears just the same as the one returned by getInputStream(), but only incurs the expense of obtaining an actual input stream (by calling getRawInputStream()) if more bytes are read than the cached magic number.
- java.io.InputStream getInputStream()
  Returns an InputStream containing the whole of this DataSource.
- static java.io.InputStream getInputStream(java.lang.String location, boolean allowSystem)
  Returns an input stream based on the given location string.
- byte[] getIntro()
  Returns the intro buffer, first reading it if this hasn't been done before.
- int getIntroLimit()
  Returns the maximum length of the intro buffer.
- long getLength()
  Returns the length of the stream returned by getInputStream in bytes, if known.
- static boolean getMarkWorkaround()
  Returns true if we are working around potential bugs in the InputStream.mark(int)/InputStream.reset() methods (such bugs are common, including in J2SE classes).
- java.lang.String getName()
  Returns a name for this source.
- java.lang.String getPosition()
  Returns the position associated with this source.
- protected abstract java.io.InputStream getRawInputStream()
  Provides a new InputStream for this data source.
- long getRawLength()
  Returns the length in bytes of the stream returned by getRawInputStream, if known.
- java.lang.String getSystemId()
  Returns a System ID for this DataSource; this is a string representation of a file name or URL, as used by javax.xml.transform.Source and friends.
- java.net.URL getURL()
  Returns a URL which corresponds to this data source, if one exists.
- static DataSource makeDataSource(java.lang.String loc)
  Attempts to make a source given a string identifying its location as a file, URL or system command output.
- static DataSource makeDataSource(java.lang.String loc, boolean allowSystem)
  Attempts to make a source given a string identifying its location as a file, URL or, optionally, system command output.
- static DataSource makeDataSource(java.net.URL url)
  Makes a source from a URL.
- void setCompression(Compression compress)
  Sets the compression to be associated with this data source.
- void setIntroLimit(int limit)
  Sets the maximum size of the intro buffer to a new value.
- static void setMarkWorkaround(boolean workaround)
  Sets whether we want to work around bugs in InputStream mark/reset methods.
- void setName(java.lang.String name)
  Sets the name of this source.
- void setPosition(java.lang.String position)
  Sets the position associated with this source.
- java.lang.String toString()
  Returns a short description of this source (name plus compression type).
-
Field Detail
-
DEFAULT_INTRO_LIMIT
public static final int DEFAULT_INTRO_LIMIT
- See Also:
- Constant Field Values
-
MARK_WORKAROUND_PROPERTY
public static final java.lang.String MARK_WORKAROUND_PROPERTY
- See Also:
- Constant Field Values
-
Method Detail
-
getRawInputStream
protected abstract java.io.InputStream getRawInputStream() throws java.io.IOException
Provides a new InputStream for this data source. This method should be implemented by subclasses to provide a new InputStream giving the raw content of the source each time it is called. The general contract of this method is that each time it is called it will return a stream with the same content.
- Returns:
- an InputStream containing the data of this source
- Throws:
java.io.IOException
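A minimal sketch of the getRawInputStream() contract, assuming nothing beyond java.io. The abstract class RepeatableSource is a hypothetical stand-in (it is not the Starlink DataSource), and BytesSource is an in-memory implementation in the spirit of ByteArrayDataSource. The key point is that each call builds a fresh stream over unchanging data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical stand-in for the abstract contract; not the Starlink class. */
abstract class RepeatableSource {
    /** Must return a fresh stream with identical content on every call. */
    protected abstract InputStream getRawInputStream() throws IOException;

    /** Reads the raw stream twice and checks that both reads agree. */
    boolean honoursContract() throws IOException {
        byte[] first;
        byte[] second;
        try (InputStream in = getRawInputStream()) {
            first = in.readAllBytes();
        }
        try (InputStream in = getRawInputStream()) {
            second = in.readAllBytes();
        }
        return Arrays.equals(first, second);
    }
}

/** Hypothetical in-memory implementation. */
final class BytesSource extends RepeatableSource {
    private final byte[] data;

    BytesSource(byte[] data) {
        this.data = data.clone();   // defensive copy so the content cannot change later
    }

    @Override
    protected InputStream getRawInputStream() {
        // A new stream on each call: handing out one shared stream would break
        // the contract, because a second caller would only see the unread remainder.
        return new ByteArrayInputStream(data);
    }
}
```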
-
getURL
public java.net.URL getURL()
Returns a URL which corresponds to this data source, if one exists. A call to URL.openConnection() on the URL returned by this method should provide a stream with the same content as the getRawInputStream() method of this data source. If no such URL exists or is known, then null should be returned.
If this source has a non-null position value, it will be appended to the main part of the URL after a '#' character (as the URL's ref part).
- Returns:
- a URL corresponding to this source, or null
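The rule that a non-null position becomes the URL's ref part can be illustrated with java.net.URI, which avoids the checked exceptions of the URL constructors; call toURL() on the result if a java.net.URL is needed. PositionRefs.withPosition is a hypothetical helper, not the library's code.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: attach a DataSource-style position as a URI fragment. */
final class PositionRefs {
    private PositionRefs() {}

    static URI withPosition(URI base, String position) {
        if (position == null) {
            return base;   // no position: leave the location unchanged
        }
        try {
            // The (scheme, ssp, fragment) constructor quotes illegal characters;
            // getSchemeSpecificPart() excludes any existing fragment, so an old
            // '#...' part on base is replaced rather than doubled.
            return new URI(base.getScheme(), base.getSchemeSpecificPart(), position);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```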
-
getIntroLimit
public int getIntroLimit()
Returns the maximum length of the intro buffer.
- Returns:
- maximum length of the intro buffer
-
setIntroLimit
public void setIntroLimit(int limit)
Sets the maximum size of the intro buffer to a new value. Setting the intro limit to a new value will discard any state which this source has, so for reasons of efficiency it's not a good idea to call this method except immediately after the source has been constructed and before any reads have taken place.
- Parameters:
limit
- the new maximum length of the intro buffer
-
getRawLength
public long getRawLength()
Returns the length in bytes of the stream returned by getRawInputStream, if known. If the length is not known then -1 should be returned. The implementation of this method in DataSource returns -1; subclasses should override it if they can determine their length.
- Returns:
- the length of the raw input stream, or -1
-
getLength
public long getLength()
Returns the length of the stream returned by getInputStream in bytes, if known. A return value of -1 indicates that the length is unknown. The return value of this method may change from -1 to a positive value during the life of this object if it happens to work out how long it is.
- Returns:
- the length of the stream in bytes, or -1
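One way a length could go from unknown to known is by counting bytes until the end of the stream is reached. The wrapper below is a hypothetical illustration of that behaviour; this page does not say how DataSource actually learns its length.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper: reports -1 until the wrapped stream hits EOF. */
final class LengthLearningStream extends FilterInputStream {
    private long count;              // bytes delivered or skipped so far
    private long knownLength = -1;   // -1 means "length not known yet"

    LengthLearningStream(InputStream in) {
        super(in);
    }

    long getLength() {
        return knownLength;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b < 0) {
            knownLength = count;     // EOF reached: the total is now known
        } else {
            count++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n < 0) {
            knownLength = count;
        } else {
            count += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false;                // reset() would invalidate the byte count
    }
}
```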
-
getName
public java.lang.String getName()
Returns a name for this source. This name is mainly intended as a label identifying the source for use in informational messages; it is not in general intended to be used to provide an absolute reference to the source. Thus, for instance, if the source references a file, its name might be a relative pathname or simple filename, rather than its absolute pathname. To identify the source absolutely, the getURL() method (or some suitable class-specific method) should be used. If this source has a position, it should probably form part of this name.
- Returns:
- a name
-
setName
public void setName(java.lang.String name)
Sets the name of this source.
- Parameters:
name
- a name
- See Also:
getName()
-
getPosition
public java.lang.String getPosition()
Returns the position associated with this source. It is a string giving an indication of the part of the stream which is of interest. Its interpretation is up to the application.
- Returns:
- the position string, or null
-
setPosition
public void setPosition(java.lang.String position)
Sets the position associated with this source. It is a string giving an indication of the part of the stream which is of interest. Its interpretation is up to the application.
- Parameters:
position
- the new position (may be null)
-
getSystemId
public java.lang.String getSystemId()
Returns a System ID for this DataSource; this is a string representation of a file name or URL, as used by javax.xml.transform.Source and friends. The return value may be null if none is known. It does not contain any reference to the position.
- Returns:
- the System ID string for this source, or null
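A system ID is what javax.xml.transform.stream.StreamSource accepts alongside a stream, letting an XML processor resolve relative references. The hypothetical SystemIds helper below shows that pairing, and strips any '#' fragment because the system ID carries no position.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.net.URISyntaxException;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helpers; not part of uk.ac.starlink.util. */
final class SystemIds {
    private SystemIds() {}

    /** Drops any '#position' part, since a system ID carries no position. */
    static String withoutPosition(URI location) {
        try {
            return new URI(location.getScheme(), location.getSchemeSpecificPart(), null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Wraps bytes for an XML processor, recording where they came from. */
    static StreamSource toSource(byte[] data, String systemId) {
        // The system ID lets the processor resolve relative references
        // (for example external entities); null simply leaves it unset.
        return new StreamSource(new ByteArrayInputStream(data), systemId);
    }
}
```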
-
getCompression
public Compression getCompression() throws java.io.IOException
Returns an object which will handle any required decompression for this stream. A raw data stream is read and its magic number (first few bytes) matched against known patterns to determine if any known compression method is in use. If no known compression is being used, the value Compression.NONE is returned.
- Returns:
- a Compression object encoding this stream
- Throws:
java.io.IOException
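A sketch of magic-number matching on the first few bytes of a stream, such as the intro buffer. The enum MagicCompression is hypothetical. The signatures used are the standard ones for gzip (0x1f 0x8b), bzip2 ("BZh") and Unix compress (0x1f 0x9d), but which formats the real Compression class recognises is not stated on this page.

```java
/** Hypothetical classifier in the spirit of getCompression(). */
enum MagicCompression {
    NONE, GZIP, BZIP2, COMPRESS;

    static MagicCompression fromIntro(byte[] intro) {
        if (startsWith(intro, 0x1f, 0x8b)) {
            return GZIP;          // gzip: 1f 8b
        }
        if (startsWith(intro, 'B', 'Z', 'h')) {
            return BZIP2;         // bzip2: "BZh"
        }
        if (startsWith(intro, 0x1f, 0x9d)) {
            return COMPRESS;      // Unix compress (.Z): 1f 9d
        }
        return NONE;              // no recognised signature
    }

    private static boolean startsWith(byte[] data, int... magic) {
        if (data.length < magic.length) {
            return false;         // too short to carry this signature
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xff) != magic[i]) {   // compare as unsigned bytes
                return false;
            }
        }
        return true;
    }
}
```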
-
getIntro
public byte[] getIntro() throws java.io.IOException
Returns the intro buffer, first reading it if this hasn't been done before. The intro buffer will contain the first few bytes of the decompressed stream. The number of bytes it contains (the size of the returned byte[] array) will be the smaller of introLimit and the length of the underlying uncompressed stream.
The returned buffer is the original, not a copy: don't change its contents!
- Returns:
- the first few bytes of the uncompressed stream, up to a limit of introLimit
- Throws:
java.io.IOException
-
setCompression
public void setCompression(Compression compress)
Sets the compression to be associated with this data source. In general it will not be necessary or advisable to call this method, since this object will figure it out using magic numbers of the underlying stream. It can be used if the compression method is known, or to force use of a particular compression; in particular setCompression(Compression.NONE) can be used to force direct examination of the underlying stream without decompression, even if the underlying stream is in fact compressed.
The effects of setting a compression to a mode (other than NONE) which does not match the actual compression mode of the underlying stream are undefined, so this method should be used with care.
- Parameters:
compress
- the compression mode encoding the underlying stream
-
forceCompression
public DataSource forceCompression(Compression compress)
Returns a DataSource representing the same underlying stream, but with a forced compression mode compress. The returned DataSource object may be the same object as this one, but if it has a different compression mode from compress a new one will be created. As with setCompression(uk.ac.starlink.util.Compression), the consequences of using a value of compress other than the correct one (or Compression.NONE) are unpredictable.
- Parameters:
compress
- the compression mode to be used for the returned data source
- Returns:
- a data source with the same underlying stream as this, but a compression mode given by compress
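The rule "same object if the mode already matches, otherwise a new one over the same stream" can be sketched as follows. ForcedSource and its Mode enum are hypothetical stand-ins for DataSource and Compression.

```java
import java.util.Objects;

/** Hypothetical value class illustrating the forceCompression() contract. */
final class ForcedSource {
    /** Hypothetical stand-in for the Compression modes. */
    enum Mode { NONE, GZIP, BZIP2 }

    final byte[] raw;         // underlying data, shared rather than copied
    final Mode compression;

    ForcedSource(byte[] raw, Mode compression) {
        this.raw = Objects.requireNonNull(raw);
        this.compression = Objects.requireNonNull(compression);
    }

    /** Returns this if the mode already matches, else a new view of the same data. */
    ForcedSource forceCompression(Mode mode) {
        if (compression == mode) {
            return this;
        }
        return new ForcedSource(raw, mode);
    }
}
```

Returning a new object rather than mutating this one leaves the original's compression setting intact, which is the practical difference from setCompression.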
-
getInputStream
public java.io.InputStream getInputStream() throws java.io.IOException
Returns an InputStream containing the whole of this DataSource. If compression is detected in the underlying stream, it will be decompressed. The returned stream should be closed by the user when no longer required.
- Returns:
- an input stream that reads from the beginning of the underlying data source, decompressing it if appropriate
- Throws:
java.io.IOException
-
getHybridInputStream
public java.io.InputStream getHybridInputStream() throws java.io.IOException
Returns an input stream which appears just the same as the one returned by getInputStream(), but only incurs the expense of obtaining an actual input stream (by calling getRawInputStream()) if more bytes are read than the cached magic number. This is an efficient way to read if you need an InputStream but may only end up reading the first few bytes of it.
- Returns:
- an input stream that reads from the beginning of the underlying data source, decompressing it if appropriate
- Throws:
java.io.IOException
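The hybrid-stream idea can be sketched with java.io.SequenceInputStream: serve the cached intro bytes first, and open the real stream only when a read goes past them. HybridStreams and its Opener callback (standing in for getRawInputStream()) are hypothetical; decompression is left out, and InputStream.skipNBytes requires Java 12 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

/** Hypothetical sketch; not the Starlink implementation. */
final class HybridStreams {
    private HybridStreams() {}

    /** Hypothetical callback standing in for getRawInputStream(). */
    interface Opener {
        InputStream open() throws IOException;
    }

    static InputStream hybrid(byte[] intro, boolean introIsWholeStream, Opener opener) {
        InputStream head = new ByteArrayInputStream(intro);
        if (introIsWholeStream) {
            return head;                              // nothing exists beyond the cache
        }
        InputStream tail = new InputStream() {
            private InputStream real;                 // opened lazily

            private InputStream realStream() throws IOException {
                if (real == null) {
                    real = opener.open();
                    real.skipNBytes(intro.length);    // skip bytes already served
                }
                return real;
            }

            @Override
            public int read() throws IOException {
                return realStream().read();
            }

            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return len == 0 ? 0 : realStream().read(b, off, len);
            }

            @Override
            public void close() throws IOException {
                if (real != null) {
                    real.close();
                }
            }
        };
        // SequenceInputStream only reads from tail once head is exhausted.
        return new SequenceInputStream(head, tail);
    }
}
```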
-
close
public void close()
Closes any open streams owned and not yet dispatched by this DataSource. Should be called if this object is no longer required, or if it may not be required for some while. Calling this method does not prevent any other method being called on this object in the future. This method throws no checked exceptions; any IOExceptions thrown while closing owned streams are simply discarded.
-
toString
public java.lang.String toString()
Returns a short description of this source (name plus compression type).
- Overrides:
toString in class java.lang.Object
- Returns:
- description of this DataSource
-
makeDataSource
public static DataSource makeDataSource(java.lang.String loc) throws java.io.IOException
Attempts to make a source given a string identifying its location as a file, URL or system command output. This may be one of the following options:
- filename
- URL
- a string preceded by "<" or followed by "|", giving a shell command line (may not work on all platforms)
If a '#' character exists in the string, text after it will be interpreted as a position value. Otherwise, the position is considered to be null.
Note: this method presents a security risk if the loc string is vulnerable to injection. Consider using the variant method makeDataSource(loc,false) in such cases. This method just calls makeDataSource(loc,true).
- Parameters:
loc
- the location of the data, with optional position
- Returns:
- a DataSource based on the data at loc
- Throws:
java.io.IOException
- if loc does not name an existing readable file or valid URL
-
makeDataSource
public static DataSource makeDataSource(java.lang.String loc, boolean allowSystem) throws java.io.IOException
Attempts to make a source given a string identifying its location as a file, URL or, optionally, system command output.
The supplied loc may be one of the following:
- filename
- URL
- only if allowSystem=true: a string preceded by "<" or followed by "|", giving a shell command line (may not work on all platforms)
If a '#' character exists in the string, text after it will be interpreted as a position value. Otherwise, the position is considered to be null.
Note: setting allowSystem=true may introduce a security risk if the loc string is vulnerable to injection.
- Parameters:
loc
- the location of the data, with optional position
allowSystem
- whether to allow system commands using the format above
- Returns:
- a DataSource based on the data at loc
- Throws:
java.io.IOException
- if loc does not name an existing readable file or valid URL
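A sketch of how such a location string might be classified. Location is a hypothetical class, and two choices in it are assumptions not stated above: the position starts after the first '#', and with allowSystem=false a "<cmd" or "cmd|" string is simply treated as a file name. The URL test relies on URI.create(...).toURL() succeeding only for absolute URIs with a known protocol.

```java
import java.net.MalformedURLException;
import java.net.URI;

/** Hypothetical location-string parser; not the Starlink implementation. */
final class Location {
    enum Kind { FILE, URL, COMMAND }

    final Kind kind;
    final String target;     // file name, URL text or command line
    final String position;   // text after '#', or null if there is none

    private Location(Kind kind, String target, String position) {
        this.kind = kind;
        this.target = target;
        this.position = position;
    }

    static Location parse(String loc, boolean allowSystem) {
        // Assumption: the position starts after the first '#'.
        int hash = loc.indexOf('#');
        String body = hash >= 0 ? loc.substring(0, hash) : loc;
        String position = hash >= 0 ? loc.substring(hash + 1) : null;

        if (allowSystem && body.startsWith("<")) {
            return new Location(Kind.COMMAND, body.substring(1).trim(), position);
        }
        if (allowSystem && body.endsWith("|")) {
            String cmd = body.substring(0, body.length() - 1).trim();
            return new Location(Kind.COMMAND, cmd, position);
        }
        // Without allowSystem, "<..." and "...|" fall through to file/URL handling.
        return new Location(looksLikeUrl(body) ? Kind.URL : Kind.FILE, body, position);
    }

    private static boolean looksLikeUrl(String text) {
        try {
            URI.create(text).toURL();   // needs an absolute URI with a known protocol
            return true;
        } catch (IllegalArgumentException | MalformedURLException e) {
            return false;
        }
    }
}
```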
-
makeDataSource
public static DataSource makeDataSource(java.net.URL url)
Makes a source from a URL. If url is a file-protocol URL referencing an existing file then a FileDataSource will be returned, otherwise it will be a URLDataSource. Under certain circumstances, it may be more efficient to use a FileDataSource than a URLDataSource, which is why this method may be worth using.
- Parameters:
url
- location of the data stream
- Returns:
- data source which returns the data at url
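The file-versus-URL decision can be sketched as below. UrlDispatch is hypothetical and returns a label rather than constructing a FileDataSource or URLDataSource; "existing file" is interpreted here as Files.exists on the path the URL denotes, which is an assumption.

```java
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical dispatcher mirroring the rule described above. */
final class UrlDispatch {
    enum Kind { FILE, URL }

    private UrlDispatch() {}

    static Kind kindOf(URL url) {
        if ("file".equalsIgnoreCase(url.getProtocol())) {
            try {
                Path path = Paths.get(url.toURI());
                if (Files.exists(path)) {
                    return Kind.FILE;   // local file: direct file access is possible
                }
            } catch (URISyntaxException | IllegalArgumentException e) {
                // e.g. a file: URL with an authority component; treat as generic URL
            }
        }
        return Kind.URL;
    }
}
```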
-
getInputStream
public static java.io.InputStream getInputStream(java.lang.String location, boolean allowSystem) throws java.io.IOException
Returns an input stream based on the given location string. The content of the stream may be compressed or uncompressed data; the returned stream will be an uncompressed version. The following options are allowed for the location:
- filename
- URL
- "-" meaning standard input
- only if allowSystem=true: a string preceded by "<" or followed by "|", giving a shell command line (may not work on all platforms)
Note: setting allowSystem=true may introduce a security risk if the location string is vulnerable to injection.
- Parameters:
location
- URL, filename, "cmdline|"/"<cmdline", or "-"
allowSystem
- whether to allow system commands using the format above
- Returns:
- uncompressed stream containing the data at location
- Throws:
java.io.FileNotFoundException
- if location cannot be interpreted as a source of bytes
java.io.IOException
- if there is an error obtaining the stream
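A reduced sketch covering only the "-" and filename cases, with gzip as the only recognised compression. LocationStreams is hypothetical; URL and shell-command locations, and any other compression formats, are omitted. BufferedInputStream is used for the two-byte peek because its mark/reset support is dependable.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical reduced version of a location-to-stream helper. */
final class LocationStreams {
    private LocationStreams() {}

    /** "-" means standard input; anything else is treated as a local file name. */
    static InputStream open(String location) throws IOException {
        InputStream raw = "-".equals(location)
                ? System.in
                : new FileInputStream(location);   // FileNotFoundException if absent
        return decompressIfGzip(raw);
    }

    /** Peeks at the first two bytes and unwraps gzip data transparently. */
    static InputStream decompressIfGzip(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);            // remember the start of the stream
        int b0 = in.read();
        int b1 = in.read();
        in.reset();            // rewind so the caller sees every byte
        boolean gzip = b0 == 0x1f && b1 == 0x8b;
        return gzip ? new GZIPInputStream(in) : in;
    }
}
```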
-
getMarkWorkaround
public static boolean getMarkWorkaround()
Returns true if we are working around potential bugs in the InputStream.mark(int)/InputStream.reset() methods (such bugs are common, including in J2SE classes). The return value depends on the system property named MARK_WORKAROUND_PROPERTY.
- Returns:
- true iff we are working around mark/reset bugs
-
setMarkWorkaround
public static void setMarkWorkaround(boolean workaround)
Sets whether we want to work around bugs in InputStream mark/reset methods.
- Parameters:
workaround
- true to employ the workaround