Class hierarchy: object -> _BaseParser -> _FeedParser -> HTMLParser
HTMLParser(self, encoding=None, remove_blank_text=False, remove_comments=False, remove_pis=False, strip_cdata=True, no_network=True, target=None, schema: XMLSchema =None, recover=True, compact=True, collect_ids=True, huge_tree=False)
The HTML parser.
This parser allows reading HTML into a normal XML tree. By default, it can read broken (non well-formed) HTML, depending on the capabilities of libxml2. Use the 'recover' option to switch this off.
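As a rough illustration of the recovery behaviour described above, the following sketch parses a deliberately malformed snippet with the default parser and then attempts the same input with recover=False. The sample markup and variable names are made up for this example, and whether the strict attempt raises depends on the installed libxml2 version, so the sketch records the outcome instead of assuming it.

```python
from lxml import etree

# Hypothetical sample input: unclosed <p> tags and no closing </html>.
broken_html = "<html><body><p>Unclosed paragraph<p>Second<br></body>"

# Default settings (recover=True): libxml2 repairs the markup as best it can.
parser = etree.HTMLParser()
root = etree.fromstring(broken_html, parser)
paragraphs = [p.text for p in root.iter("p")]

# With recover=False the parser may refuse broken input. Parse failures in
# lxml derive from etree.LxmlError; the exact outcome here is up to libxml2.
strict_parser = etree.HTMLParser(recover=False)
try:
    etree.fromstring(broken_html, strict_parser)
    strict_accepted = True
except etree.LxmlError:
    strict_accepted = False

print(root.tag, paragraphs, strict_accepted)
```

The result of the recovering parse is an ordinary element tree, so the usual ElementTree-style calls (iter, find, findtext) apply to it.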
Available boolean keyword arguments:
recover - try hard to parse through broken HTML (default: True)
no_network - prevent network access for related files (default: True)
remove_blank_text - discard empty text nodes that are ignorable (i.e. not actual text content)
remove_comments - discard comments
remove_pis - discard processing instructions
strip_cdata - replace CDATA sections by normal text content (default: True)
compact - save memory for short text content (default: True)
default_doctype - add a default doctype even if it is not found in the HTML (default: True)
collect_ids - use a hash table of XML IDs for fast access (default: True)
huge_tree - disable security restrictions and support very deep trees and very long text content (only affects libxml2 2.7+)
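The boolean options are passed as keyword arguments to the constructor. The sketch below, using made-up sample markup, compares a default parser with one configured to drop comments. It also passes remove_blank_text=True, but makes no claim about which whitespace nodes libxml2 treats as ignorable.

```python
from lxml import etree

# Hypothetical sample input containing a comment and inter-element whitespace.
html = "<html><body><!-- note --><p>Hello</p>   <p>World</p></body></html>"

# Default parser: comments are kept in the tree.
default_root = etree.fromstring(html, etree.HTMLParser())
default_comments = list(default_root.iter(etree.Comment))

# Configured parser: comments are discarded while parsing.
cleaning_parser = etree.HTMLParser(remove_comments=True, remove_blank_text=True)
clean_root = etree.fromstring(html, cleaning_parser)
clean_comments = list(clean_root.iter(etree.Comment))

# The element content itself is unaffected by these options.
texts = [p.text for p in clean_root.iter("p")]

print(len(default_comments), len(clean_comments), texts)
```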
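Other keyword arguments:
encoding - override the document encoding
target - a parser target object that will receive the parse events
schema - an XMLSchema to validate against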
Note that you should avoid sharing parsers between threads for performance reasons.
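One way to follow the threading advice above is to give each thread its own parser instance. The sketch below does this with threading.local. The helper names (get_parser, extract_title, worker) are hypothetical and not part of lxml, and this is only one possible arrangement.

```python
import threading
from lxml import etree

# Per-thread storage; a hypothetical helper, not an lxml facility.
_local = threading.local()

def get_parser():
    """Return this thread's HTMLParser, creating it on first use."""
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = etree.HTMLParser()
        _local.parser = parser
    return parser

def extract_title(html):
    """Parse the document with the calling thread's parser and return its title."""
    root = etree.fromstring(html, get_parser())
    return root.findtext("head/title")

results = {}

def worker(name, html):
    # Each thread writes to its own key, so no extra locking is needed here.
    results[name] = extract_title(html)

threads = []
for i in range(3):
    doc = "<html><head><title>Page %d</title></head><body></body></html>" % i
    t = threading.Thread(target=worker, args=("doc%d" % i, doc))
    threads.append(t)
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))
```

Within a single thread the same parser object can be reused across documents, so the per-thread cache avoids both sharing and repeated construction.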
Generated by Epydoc 3.0.1 on Tue Mar 13 20:17:56 2018 | http://epydoc.sourceforge.net |