Public API
The main factory
The main factory function you should use in your application to get an openxmllib.document.Document subclass:
- openxmllib.wordprocessing.WordprocessingDocument, typically built from MS Word.
- openxmllib.presentation.PresentationDocument, typically built from MS PowerPoint.
- openxmllib.spreadsheet.SpreadsheetDocument, typically built from MS Excel.
If you misuse this factory, you’ll get a ValueError exception that says what’s wrong.
- openxmllib.openXmlDocument(path=None, file_=None, data=None, url=None, mime_type=None)
  Factory function. Guesses which document type is best suited and returns the appropriate document object. The caller must provide one of the path, file_, data or url parameters.
  Parameters:
  - path – file path in the local filesystem to a document.
  - file_ – a file(-like) object for a document (must be opened in ‘rb’ mode)
  - data – the binary data of a document
  - url – the URL of a document
  - mime_type – MIME type if known. One of the known MIME types from openxmllib.contenttypes.
  Note that the mime_type parameter must be provided if you provide the Open XML document through the data parameter. Otherwise, if you don’t provide one, we’ll try to guess the most appropriate one using the file extension.
  Returns: A subclass of openxmllib.document.Document.
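As an illustration of the factory contract described above, here is a minimal sketch that opens a local file and turns the documented ValueError into an error message. The helper name open_document and its factory argument are hypothetical (they are not part of openxmllib); the argument only exists so the sketch can be tried with a stand-in factory, and the default path assumes the openxmllib package is installed.

```python
def open_document(path, factory=None):
    """Return a (document, error_message) pair for a local Open XML file.

    `open_document` and `factory` are illustrative names, not openxmllib API.
    """
    if factory is None:
        # Imported lazily: assumes the openxmllib package is installed.
        import openxmllib
        factory = openxmllib.openXmlDocument
    try:
        # Only the `path` parameter is used here; the factory also accepts
        # file_, data or url according to its documented signature.
        return factory(path=path), None
    except ValueError as exc:
        # Misuse of the factory is reported through a ValueError that
        # says what is wrong, so we hand its message back to the caller.
        return None, str(exc)
```

Returning the message instead of re-raising is just one possible design; an application may prefer to let the ValueError propagate.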
The document classes
Base class
All document classes inherit from openxmllib.document.Document.
- class openxmllib.document.Document(file_, mime_type=None)
  Base class for handling Open XML documents (all types).
  Must be subclassed for the various types of documents (word processing, …).
  Parameters:
  - file_ – an open file(-like) object for the document, which must be opened in ‘rb’ mode
  - mime_type – the MIME type for the file, potentially found by openxmllib.openXmlDocument()
- allProperties
  Helper that merges core, extended and custom properties.
  Returns: mapping of all properties
- classmethod canProcessFilename(filename)
  Checks whether we can process such a file, based on its name.
  Parameters: filename – file name such as ‘mydoc.docx’
  Returns: True if we can process such a file
- classmethod canProcessMime(mime_type)
  Checks whether we can process such a MIME type.
  Parameters: mime_type – MIME type such as ‘application/xxx’
  Returns: True if we can process such a MIME type
- content_types = None
  An openxmllib.contenttypes.ContentTypes object for this document
- coreProperties
  Document core properties (author, …), similar to Dublin Core.
  Returns: mapping of standard metadata like {'title': 'blah', 'language': 'fr-FR', ...}
- customProperties
  Document custom properties added by the document author.
  We cannot convert the properties as indicated with the http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes namespace
  Returns: mapping of metadata
- extendedProperties
  Additional automatic document properties provided by the office application.
  Returns: mapping of metadata like {'Pages': '14', ...}
- filename = None
  The file name of the document
- indexableText(include_properties=True)
  Words found in the various texts of the document.
  Parameters: include_properties – if true, adds words from the properties
  Returns: space-separated words of the document.
- mimeType
  The official MIME type for this document, guessed from the extension of the openxmllib.document.Document.filename attribute, as opposed to the openxmllib.document.Document.mime_type attribute.
  Returns: application/xxx for this file
- mime_type = None
  The MIME type of the document
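The members listed above can be combined to feed a search index. The following is only a sketch: build_index_entry is a hypothetical helper, and it assumes that allProperties and mimeType are read as plain attributes (they are listed without a call signature) and that indexableText() returns a space-separated string, as documented.

```python
def build_index_entry(doc):
    """Collect searchable data from an object exposing the Document interface.

    `build_index_entry` is an illustrative name, not part of openxmllib.
    """
    # Copy the merged core, extended and custom properties into a plain dict.
    props = dict(doc.allProperties)
    # Ask for the body text only; the properties are stored separately below.
    words = doc.indexableText(include_properties=False).split()
    return {
        "mime_type": doc.mimeType,
        "title": props.get("title"),  # None when the author set no title
        "word_count": len(words),
        "properties": props,
    }
```

Any object with the same three members can be passed in, which makes the helper easy to exercise without a real document.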
Other attributes
- Document._extpattern_to_mime = {}
  A mapping like {glob-expr: mime-type, ...}; must be overridden by subclasses.
- Document._text_extractors = []
  A sequence of extractor objects for text extraction; must be overridden by subclasses.
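To make the {glob-expr: mime-type, ...} mapping more concrete, here is a self-contained sketch of how a subclass could override it and how a file name could be matched against it. The classes SketchDocument and SketchWordprocessing and the methods can_process_filename and mime_for are hypothetical stand-ins; the matching strategy (fnmatch from the standard library) is an assumption, not necessarily what openxmllib does internally.

```python
import fnmatch


class SketchDocument:
    """Hypothetical stand-in for a base document class (not openxmllib's)."""

    # Mapping {glob-expr: mime-type}; empty in the base class,
    # meant to be overridden by subclasses.
    _extpattern_to_mime = {}

    @classmethod
    def can_process_filename(cls, filename):
        # True when at least one glob pattern matches the file name.
        return any(fnmatch.fnmatch(filename, pattern)
                   for pattern in cls._extpattern_to_mime)

    @classmethod
    def mime_for(cls, filename):
        # Return the MIME type of the first matching pattern, else None.
        for pattern, mime in cls._extpattern_to_mime.items():
            if fnmatch.fnmatch(filename, pattern):
                return mime
        return None


class SketchWordprocessing(SketchDocument):
    """Hypothetical subclass that overrides the mapping for Word files."""

    _extpattern_to_mime = {
        "*.docx": "application/vnd.openxmlformats-officedocument"
                  ".wordprocessingml.document",
        "*.dotx": "application/vnd.openxmlformats-officedocument"
                  ".wordprocessingml.template",
    }
```

With this layout the base class recognizes nothing on its own, and each subclass only has to declare its patterns.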
Hint
Metadata
The various metadata provided by openxmllib.document.Document.coreProperties, openxmllib.document.Document.extendedProperties and openxmllib.document.Document.customProperties depend on the application used to build the document. You can use the openxmlinfo command-line tool (see Command line: openxmlinfo) to see which properties / metadata are set on your document, using the command: openxmlinfo -vv metadata your-file