genshi.filters
Implementation of a number of stream filters.
genshi.filters.html
Implementation of a number of stream filters.
class genshi.filters.html.HTMLFormFiller(name=None, id=None, data=None, passwords=False)
A stream filter that can populate HTML forms from a dictionary of values.
>>> from genshi.input import HTML
>>> html = HTML('''<form>
... <p><input type="text" name="foo" /></p>
... </form>''', encoding='utf-8')
>>> filler = HTMLFormFiller(data={'foo': 'bar'})
>>> print(html | filler)
<form>
<p><input type="text" name="foo" value="bar"/></p>
</form>
class genshi.filters.html.HTMLSanitizer(safe_tags=frozenset(['em', 'pre', 'code', 'p', 'h2', 'h3', 'h1', 'h6', 'h4', 'h5', 'table', 'font', 'u', 'select', 'kbd', 'strong', 'span', 'sub', 'img', 'area', 'menu', 'tt', 'tr', 'tbody', 'label', 'hr', 'dfn', 'tfoot', 'th', 'sup', 'strike', 'input', 'td', 'samp', 'cite', 'thead', 'map', 'dl', 'blockquote', 'fieldset', 'option', 'form', 'acronym', 'big', 'dd', 'var', 'ol', 'abbr', 'br', 'address', 'optgroup', 'li', 'dt', 'ins', 'legend', 'a', 'b', 'center', 'textarea', 'colgroup', 'i', 'button', 'q', 'caption', 's', 'del', 'small', 'div', 'col', 'dir', 'ul']), safe_attrs=frozenset(['rev', 'prompt', 'color', 'colspan', 'accesskey', 'usemap', 'cols', 'accept', 'datetime', 'char', 'accept-charset', 'shape', 'href', 'hreflang', 'selected', 'frame', 'type', 'alt', 'nowrap', 'border', 'id', 'axis', 'compact', 'rows', 'checked', 'for', 'start', 'hspace', 'charset', 'ismap', 'label', 'target', 'bgcolor', 'readonly', 'rel', 'valign', 'scope', 'size', 'cellspacing', 'cite', 'media', 'multiple', 'src', 'rules', 'nohref', 'action', 'rowspan', 'abbr', 'span', 'method', 'height', 'class', 'enctype', 'lang', 'disabled', 'name', 'charoff', 'clear', 'summary', 'value', 'longdesc', 'headers', 'vspace', 'noshade', 'coords', 'width', 'maxlength', 'cellpadding', 'title', 'align', 'dir', 'tabindex']), safe_schemes=frozenset(['mailto', 'ftp', 'http', 'file', 'https', None]), uri_attrs=frozenset(['src', 'lowsrc', 'href', 'dynsrc', 'background', 'action']))
A filter that removes potentially dangerous HTML tags and attributes from the stream.
>>> from genshi import HTML
>>> html = HTML('<div><script>alert(document.cookie)</script></div>', encoding='utf-8')
>>> print(html | HTMLSanitizer())
<div/>
The default set of safe tags and attributes can be modified when the filter is instantiated. For example, to allow inline style attributes, the following instantiation would work:
>>> html = HTML('<div style="background: #000"></div>', encoding='utf-8')
>>> sanitizer = HTMLSanitizer(safe_attrs=HTMLSanitizer.SAFE_ATTRS | set(['style']))
>>> print(html | sanitizer)
<div style="background: #000"/>
Note that even in this case, the filter does attempt to remove dangerous constructs from style attributes:
>>> html = HTML('<div style="background: url(javascript:void); color: #000"></div>', encoding='utf-8')
>>> print(html | sanitizer)
<div style="color: #000"/>
This handles HTML entities, Unicode escapes in CSS and JavaScript text, as well as a lot of other things. However, the style tag is still excluded by default because it is very hard for such sanitizing to be completely safe, especially considering how much error recovery current web browsers perform.
It also does some basic filtering of CSS properties that may be used for typical phishing attacks. For more sophisticated filtering, this class provides a couple of hooks that can be overridden in sub-classes.
Warning: Note that this special processing of CSS is currently only applied to style attributes, not style elements.
is_safe_css(propname, value)
Determine whether the given CSS property declaration is to be considered safe for inclusion in the output.
Parameters: - propname – the CSS property name
- value – the value of the property
Returns: whether the property value should be considered safe
Return type: bool
Since: version 0.6
is_safe_elem(tag, attrs)
Determine whether the given element should be considered safe for inclusion in the output.
Parameters: - tag – the tag name of the element
- attrs – the attributes of the element
Returns: whether the element should be considered safe
Return type: bool
Since: version 0.6
is_safe_uri(uri)
Determine whether the given URI is to be considered safe for inclusion in the output.
The default implementation checks whether the scheme of the URI is in the set of allowed URI schemes (safe_schemes).
>>> sanitizer = HTMLSanitizer()
>>> sanitizer.is_safe_uri('http://example.org/')
True
>>> sanitizer.is_safe_uri('javascript:alert(document.cookie)')
False
Parameters: uri – the URI to check
Returns: True if the URI can be considered safe, False otherwise
Return type: bool
Since: version 0.4.3
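The scheme check described above can be sketched in plain Python. This is a simplified model rather than Genshi's actual implementation: the function name scheme_is_allowed and the treatment of scheme-less (relative) URIs as the None entry of the set are assumptions made for illustration.

```python
# Simplified model of a scheme allow-list check; not Genshi's own code.
# The default set mirrors the safe_schemes default from the class signature.
SAFE_SCHEMES = frozenset(['file', 'ftp', 'http', 'https', 'mailto', None])


def scheme_is_allowed(uri, safe_schemes=SAFE_SCHEMES):
    """Return True if the scheme of `uri` is in `safe_schemes`."""
    # Ignore the fragment so a ':' after '#' is not mistaken for a scheme.
    uri = uri.split('#', 1)[0]
    if ':' not in uri:
        # Assumption: a URI without a scheme is represented by None.
        return None in safe_schemes
    scheme = uri.split(':', 1)[0]
    # Drop non-alphanumeric characters so that obfuscated schemes such as
    # "java\tscript" are still recognised.
    scheme = ''.join(ch for ch in scheme if ch.isalnum()).lower()
    return scheme in safe_schemes


print(scheme_is_allowed('http://example.org/'))
print(scheme_is_allowed('javascript:alert(document.cookie)'))
print(scheme_is_allowed('/relative/path#a:b'))
```

For stricter or looser policies, the documented route is to pass a different safe_schemes set to HTMLSanitizer or to override is_safe_uri in a subclass.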
sanitize_css(text)
Remove potentially dangerous property declarations from CSS code.
In particular, properties using the CSS url() function with a scheme that is not considered safe are removed:
>>> sanitizer = HTMLSanitizer()
>>> sanitizer.sanitize_css(u'''
... background: url(javascript:alert("foo"));
... color: #000;
... ''')
[u'color: #000']
Also, the proprietary Internet Explorer function expression() is always stripped:
>>> sanitizer.sanitize_css(u'''
... background: #fff;
... color: #000;
... width: e/**/xpression(alert("foo"));
... ''')
[u'background: #fff', u'color: #000']
Parameters: text – the CSS text; this is expected to be unicode and to not contain any character or numeric references
Returns: a list of declarations that are considered safe
Return type: list
Since: version 0.4.3
genshi.filters.i18n
Directives and utilities for internationalization and localization of templates.
Since: version 0.4
Note: Directives support added since version 0.6
class genshi.filters.i18n.Translator(translate=<gettext.NullTranslations instance>, ignore_tags=frozenset([QName('http://www.w3.org/1999/xhtml}style'), QName('http://www.w3.org/1999/xhtml}script'), QName('style'), QName('script')]), include_attrs=frozenset(['prompt', 'title', 'standby', 'summary', 'abbr', 'alt', 'label']), extract_text=True)
Can extract and translate localizable strings from markup streams and templates.
For example, assume the following template:
>>> tmpl = MarkupTemplate('''<html xmlns:py="http://genshi.edgewall.org/">
... <head>
... <title>Example</title>
... </head>
... <body>
... <h1>Example</h1>
... <p>${_("Hello, %(name)s") % dict(name=username)}</p>
... </body>
... </html>''', filename='example.html')
For demonstration, we define a dummy gettext-style function with a hard-coded translation table, and pass that to the Translator initializer:
>>> def pseudo_gettext(string):
...     return {
...         'Example': 'Beispiel',
...         'Hello, %(name)s': 'Hallo, %(name)s'
...     }[string]
>>> translator = Translator(pseudo_gettext)
Next, the translator needs to be prepended to any already defined filters on the template:
>>> tmpl.filters.insert(0, translator)
When generating the template output, our hard-coded translations should be applied as expected:
>>> print(tmpl.generate(username='Hans', _=pseudo_gettext))
<html>
<head>
<title>Beispiel</title>
</head>
<body>
<h1>Beispiel</h1>
<p>Hallo, Hans</p>
</body>
</html>
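The dummy gettext-style function in this example looks strings up with a plain dictionary index, so it raises a KeyError for any string missing from its table. A minimal sketch of a more forgiving variant, which falls back to the untranslated string, might look as follows; the name tolerant_gettext is hypothetical and not part of Genshi.

```python
# Hypothetical variant of the dummy gettext function that never raises
# KeyError: unknown strings are returned unchanged.
TRANSLATIONS = {
    'Example': 'Beispiel',
    'Hello, %(name)s': 'Hallo, %(name)s',
}


def tolerant_gettext(string):
    """Return the translation of `string`, or `string` itself if unknown."""
    return TRANSLATIONS.get(string, string)


print(tolerant_gettext('Example'))
print(tolerant_gettext('Hello, %(name)s') % dict(name='Hans'))
print(tolerant_gettext('Untranslated'))
```

Like the dummy function, such a callable could be handed to the Translator initializer and to generate() as the _ function.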
Note that elements defining xml:lang attributes that do not contain variable expressions are ignored by this filter. That can be used to exclude specific parts of a template from being extracted and translated.
extract(stream, gettext_functions=('_', 'gettext', 'ngettext', 'dgettext', 'dngettext', 'ugettext', 'ungettext'), search_text=True, comment_stack=None)
Extract localizable strings from the given template stream.
For every string found, this function yields a (lineno, function, message, comments) tuple, where:
- lineno is the number of the line on which the string was found,
- function is the name of the gettext function used (if the string was extracted from embedded Python code),
- message is the string itself (a unicode object, or a tuple of unicode objects for functions with multiple string arguments), and
- comments is a list of comments related to the message, extracted from i18n:comment attributes found in the markup.
>>> tmpl = MarkupTemplate('''<html xmlns:py="http://genshi.edgewall.org/">
... <head>
... <title>Example</title>
... </head>
... <body>
... <h1>Example</h1>
... <p>${_("Hello, %(name)s") % dict(name=username)}</p>
... <p>${ngettext("You have %d item", "You have %d items", num)}</p>
... </body>
... </html>''', filename='example.html')
>>> for line, func, msg, comments in Translator().extract(tmpl.stream):
...     print('%d, %r, %r' % (line, func, msg))
3, None, u'Example'
6, None, u'Example'
7, '_', u'Hello, %(name)s'
8, 'ngettext', (u'You have %d item', u'You have %d items', None)
Parameters: - stream – the event stream to extract strings from; can be a regular stream or a template stream
- gettext_functions – a sequence of function names that should be treated as gettext-style localization functions
- search_text – whether the content of text nodes should be extracted (used internally)
Note: Changed in 0.4.1: For a function with multiple string arguments (such as ngettext), a single item with a tuple of strings is yielded, instead of an item for each string argument.
Note: Changed in 0.6: The returned tuples now include a fourth element, which is a list of comments for the translator.
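As an illustration of how the yielded (lineno, function, message, comments) tuples might be consumed, the following sketch groups line numbers by message. It works on a hand-written list modelled on the extract() example output, with empty comment lists added as an assumption, so it runs without Genshi; collect_messages is a hypothetical helper.

```python
from collections import OrderedDict

# Sample (lineno, function, message, comments) tuples modelled on the
# extract() example output; the empty comment lists are an assumption.
EXTRACTED = [
    (3, None, 'Example', []),
    (6, None, 'Example', []),
    (7, '_', 'Hello, %(name)s', []),
    (8, 'ngettext', ('You have %d item', 'You have %d items', None), []),
]


def collect_messages(extracted):
    """Map each message to the list of line numbers where it occurs."""
    catalog = OrderedDict()
    for lineno, function, message, comments in extracted:
        if isinstance(message, tuple):
            # Functions with several arguments yield a tuple; keep only
            # the string arguments and drop non-string placeholders.
            message = tuple(m for m in message if m is not None)
        catalog.setdefault(message, []).append(lineno)
    return catalog


for message, lines in collect_messages(EXTRACTED).items():
    print(lines, message)
```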
setup(template)
Convenience function to register the Translator filter and the related directives with the given template.
Parameters: template – a Template instance
genshi.filters.i18n.extract(fileobj, keywords, comment_tags, options)
Babel extraction method for Genshi templates.
Parameters: - fileobj – the file-like object the messages should be extracted from
- keywords – a list of keywords (i.e. function names) that should be recognized as translation functions
- comment_tags – a list of translator tags to search for and include in the results
- options – a dictionary of additional options (optional)
Returns: an iterator over (lineno, funcname, message, comments) tuples
Return type: iterator
genshi.filters.transform
A filter for functional-style transformations of markup streams.
The Transformer filter provides a variety of transformations that can be applied to parts of streams that match given XPath expressions. These transformations can be chained to achieve results that would be comparatively tedious to achieve by writing stream filters by hand. The approach of chaining node selection and transformation has been inspired by the jQuery JavaScript library.
For example, the following transformation upper-cases the text of the <em> element in the <body> of the input document, removes the <em> tags, and wraps the text in a <u> element instead:
>>> from genshi.builder import tag
>>> html = HTML('''<html>
... <head><title>Some Title</title></head>
... <body>
... Some <em>body</em> text.
... </body>
... </html>''',
... encoding='utf-8')
>>> print(html | Transformer('body/em').map(unicode.upper, TEXT)
... .unwrap().wrap(tag.u))
<html>
<head><title>Some Title</title></head>
<body>
Some <u>BODY</u> text.
</body>
</html>
The Transformer supports a large number of useful transformations out of the box, but custom transformations can be added easily.
Since: version 0.5
class genshi.filters.transform.Transformer(path='.')
Stream filter that can apply a variety of different transformations to a stream.
This is achieved by selecting the events to be transformed using XPath, then applying the transformations to the events matched by the path expression. Each marked event is in the form (mark, (kind, data, pos)), where mark can be any of ENTER, INSIDE, EXIT, OUTSIDE, or None.
ENTER and EXIT mark the START and END events of a selected XML/HTML element, and INSIDE marks any events contained within it. A non-element match outside a START/END container (e.g. text()) will yield an OUTSIDE mark.
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
Transformations act on selected stream events matching an XPath expression. Here’s an example of removing some markup (the title, in this case) selected by an expression:
>>> print(html | Transformer('head/title').remove())
<html><head/><body>Some <em>body</em> text.</body></html>
Inserted content can be passed in the form of a string, or a markup event stream, which includes streams generated programmatically via the builder module:
>>> from genshi.builder import tag
>>> print(html | Transformer('body').prepend(tag.h1('Document Title')))
<html><head><title>Some Title</title></head><body><h1>Document Title</h1>Some <em>body</em> text.</body></html>
Each XPath expression determines the set of tags that will be acted upon by subsequent transformations. In this example we select the <title> text, copy it into a buffer, then select the <body> element and paste the copied text into the body as <h1> enclosed text:
>>> buffer = StreamBuffer()
>>> print(html | Transformer('head/title/text()').copy(buffer)
...     .end().select('body').prepend(tag.h1(buffer)))
<html><head><title>Some Title</title></head><body><h1>Some Title</h1>Some <em>body</em> text.</body></html>
Transformations can also be assigned and reused, although care must be taken when using buffers, to ensure that buffers are cleared between transforms:
>>> emphasis = Transformer('body//em').attr('class', 'emphasis')
>>> print(html | emphasis)
<html><head><title>Some Title</title></head><body>Some <em class="emphasis">body</em> text.</body></html>
after(content)
Insert content after selection.
Here, we insert some text after the </em> closing tag:
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//em').after(' rock'))
<html><head><title>Some Title</title></head><body>Some <em>body</em> rock text.</body></html>
Parameters: content – Either a callable, an iterable of events, or a string to insert.
Return type: Transformer
append(content)
Insert content before the END event of the selection.
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//body').append(' Some new body text.'))
<html><head><title>Some Title</title></head><body>Some <em>body</em> text. Some new body text.</body></html>
Parameters: content – Either a callable, an iterable of events, or a string to insert.
Return type: Transformer
apply(function)
Apply a transformation to the stream.
Transformations can be chained, similar to stream filters. Any callable accepting a marked stream can be used as a transform.
As an example, here is a simple TEXT event upper-casing transform:
>>> def upper(stream):
...     for mark, (kind, data, pos) in stream:
...         if mark and kind is TEXT:
...             yield mark, (kind, data.upper(), pos)
...         else:
...             yield mark, (kind, data, pos)
>>> short_stream = HTML('<body>Some <em>test</em> text</body>',
...                     encoding='utf-8')
>>> print(short_stream | Transformer('.//em/text()').apply(upper))
<body>Some <em>TEST</em> text</body>
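Because a transformation is just a callable over (mark, (kind, data, pos)) tuples, it can be exercised without parsing any markup. The sketch below feeds a hand-built marked stream to an upper-casing transform of the same shape; the string constants are stand-ins for Genshi's event kinds and marks, chosen for illustration only, and the element data is simplified to a bare tag name.

```python
# Stand-ins for Genshi's event kinds and marks (assumed values, so that the
# sketch runs without Genshi installed).
START, END, TEXT = 'START', 'END', 'TEXT'
OUTSIDE = 'OUTSIDE'


def upper(stream):
    """Upper-case the data of every marked TEXT event."""
    for mark, (kind, data, pos) in stream:
        if mark and kind == TEXT:
            yield mark, (kind, data.upper(), pos)
        else:
            yield mark, (kind, data, pos)


# A hand-built marked stream: only the text inside <em> carries a mark,
# as a text() selection would produce.
marked = [
    (None, (START, 'em', (None, 1, 0))),
    (OUTSIDE, (TEXT, 'test', (None, 1, 4))),
    (None, (END, 'em', (None, 1, 8))),
    (None, (TEXT, ' text', (None, 1, 13))),
]

for mark, (kind, data, pos) in upper(marked):
    print(mark, kind, data)
```

Unmarked events pass through unchanged, which is what lets such a transform be chained with others that work on the same stream.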
attr(name, value)
Add, replace or delete an attribute on selected elements.
If value evaluates to None the attribute will be deleted from the element:
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em class="before">body</em> <em>text</em>.</body>'
...             '</html>', encoding='utf-8')
>>> print(html | Transformer('body/em').attr('class', None))
<html><head><title>Some Title</title></head><body>Some <em>body</em> <em>text</em>.</body></html>
Otherwise the attribute will be set to value:
>>> print(html | Transformer('body/em').attr('class', 'emphasis'))
<html><head><title>Some Title</title></head><body>Some <em class="emphasis">body</em> <em class="emphasis">text</em>.</body></html>
If value is a callable it will be called with the attribute name and the START event for the matching element. Its return value will then be used to set the attribute:
>>> def print_attr(name, event):
...     attrs = event[1][1]
...     print(attrs)
...     return attrs.get(name)
>>> print(html | Transformer('body/em').attr('class', print_attr))
Attrs([(QName('class'), u'before')])
Attrs()
<html><head><title>Some Title</title></head><body>Some <em class="before">body</em> <em>text</em>.</body></html>
Parameters: - name – the name of the attribute
- value – the value that should be set for the attribute.
Return type: Transformer
before(content)
Insert content before selection.
In this example we insert the word ‘emphasised’ before the <em> opening tag:
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//em').before('emphasised '))
<html><head><title>Some Title</title></head><body>Some emphasised <em>body</em> text.</body></html>
Parameters: content – Either a callable, an iterable of events, or a string to insert.
Return type: Transformer
buffer()
Buffer the entire stream (can consume a considerable amount of memory).
Useful in conjunction with copy(accumulate=True) and cut(accumulate=True) to ensure that all marked events in the entire stream are copied to the buffer before further transformations are applied.
For example, to move all <note> elements inside a <notes> tag at the top of the document:
>>> doc = HTML('<doc><notes></notes><body>Some <note>one</note> '
...            'text <note>two</note>.</body></doc>',
...            encoding='utf-8')
>>> buffer = StreamBuffer()
>>> print(doc | Transformer('body/note').cut(buffer, accumulate=True)
...     .end().buffer().select('notes').prepend(buffer))
<doc><notes><note>one</note><note>two</note></notes><body>Some text .</body></doc>
copy(buffer, accumulate=False)
Copy selection into buffer.
The buffer is replaced by each contiguous selection before being passed to the next transformation. If accumulate=True, further selections will be appended to the buffer rather than replacing it.
>>> from genshi.builder import tag
>>> buffer = StreamBuffer()
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('head/title/text()').copy(buffer)
...     .end().select('body').prepend(tag.h1(buffer)))
<html><head><title>Some Title</title></head><body><h1>Some Title</h1>Some <em>body</em> text.</body></html>
This example illustrates that only a single contiguous selection will be buffered:
>>> print(html | Transformer('head/title/text()').copy(buffer)
...     .end().select('body/em').copy(buffer).end().select('body')
...     .prepend(tag.h1(buffer)))
<html><head><title>Some Title</title></head><body><h1>Some Title</h1>Some <em>body</em> text.</body></html>
>>> print(buffer)
<em>body</em>
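The difference between the default behaviour and accumulate=True can be modelled with a plain list. This is a conceptual sketch only: SketchBuffer and copy_selections are hypothetical names and do not reflect how Genshi implements StreamBuffer or copy().

```python
class SketchBuffer(object):
    """Toy stand-in for a stream buffer: a resettable list of events."""

    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

    def reset(self):
        del self.events[:]


def copy_selections(selections, buffer, accumulate=False):
    """Copy each contiguous selection into `buffer`.

    Without `accumulate`, the buffer is emptied before every selection, so
    only the last one survives; with it, selections pile up in order.
    """
    for selection in selections:
        if not accumulate:
            buffer.reset()
        for event in selection:
            buffer.append(event)


selections = [['Some Title'], ['<em>', 'body', '</em>']]

replaced = SketchBuffer()
copy_selections(selections, replaced)
print(replaced.events)

accumulated = SketchBuffer()
copy_selections(selections, accumulated, accumulate=True)
print(accumulated.events)
```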
Element attributes can also be copied for later use:
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body><em>Some</em> <em class="before">body</em>'
...             '<em>text</em>.</body></html>',
...             encoding='utf-8')
>>> buffer = StreamBuffer()
>>> def apply_attr(name, entry):
...     return list(buffer)[0][1][1].get('class')
>>> print(html | Transformer('body/em[@class]/@class').copy(buffer)
...     .end().buffer().select('body/em[not(@class)]')
...     .attr('class', apply_attr))
<html><head><title>Some Title</title></head><body><em class="before">Some</em> <em class="before">body</em><em class="before">text</em>.</body></html>
Parameters: buffer – the StreamBuffer in which the selection should be stored
Return type: Transformer
Note: Copy (and cut) copy each individual selected object into the buffer before passing to the next transform. For example, the XPath *|text() will select all elements and text, each instance of which will be copied to the buffer individually before passing to the next transform. This has implications for how StreamBuffer objects can be used, so some experimentation may be required.
cut(buffer, accumulate=False)
Copy selection into buffer and remove the selection from the stream.
>>> from genshi.builder import tag
>>> buffer = StreamBuffer()
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//em/text()').cut(buffer)
...     .end().select('.//em').after(tag.h1(buffer)))
<html><head><title>Some Title</title></head><body>Some <em/><h1>body</h1> text.</body></html>
Specifying accumulate=True appends all selected intervals onto the buffer. Combining this with the .buffer() operation allows us to operate on all copied events rather than per-segment. See the documentation on buffer() for more information.
Parameters: buffer – the StreamBuffer in which the selection should be stored
Return type: Transformer
Note: this transformation will buffer the entire input stream
empty()
Empty selected elements of all content.
Example:
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//em').empty())
<html><head><title>Some Title</title></head><body>Some <em/> text.</body></html>
Return type: Transformer
end()
End current selection, allowing all events to be selected.
Example:
>>> html = HTML('<body>Some <em>test</em> text</body>', encoding='utf-8')
>>> print(html | Transformer('//em').end().trace())
('OUTSIDE', ('START', (QName('body'), Attrs()), (None, 1, 0)))
('OUTSIDE', ('TEXT', u'Some ', (None, 1, 6)))
('OUTSIDE', ('START', (QName('em'), Attrs()), (None, 1, 11)))
('OUTSIDE', ('TEXT', u'test', (None, 1, 15)))
('OUTSIDE', ('END', QName('em'), (None, 1, 19)))
('OUTSIDE', ('TEXT', u' text', (None, 1, 24)))
('OUTSIDE', ('END', QName('body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>
Returns: the stream augmented by transformation marks
Return type: Transformer
filter(filter)
Apply a normal stream filter to the selection. The filter is called once for each contiguous block of marked events.
>>> from genshi.filters.html import HTMLSanitizer
>>> html = HTML('<html><body>Some text<script>alert(document.cookie)'
...             '</script> and some more text</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('body/*').filter(HTMLSanitizer()))
<html><body>Some text and some more text</body></html>
Parameters: filter – The stream filter to apply.
Return type: Transformer
invert()
Invert selection so that marked events become unmarked, and vice versa.
Specifically, all marks are converted to null marks, and all null marks are converted to OUTSIDE marks.
>>> html = HTML('<body>Some <em>test</em> text</body>', encoding='utf-8')
>>> print(html | Transformer('//em').invert().trace())
('OUTSIDE', ('START', (QName('body'), Attrs()), (None, 1, 0)))
('OUTSIDE', ('TEXT', u'Some ', (None, 1, 6)))
(None, ('START', (QName('em'), Attrs()), (None, 1, 11)))
(None, ('TEXT', u'test', (None, 1, 15)))
(None, ('END', QName('em'), (None, 1, 19)))
('OUTSIDE', ('TEXT', u' text', (None, 1, 24)))
('OUTSIDE', ('END', QName('body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>
Return type: Transformer
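The rule stated for invert() (every mark becomes None, every None becomes OUTSIDE) is small enough to sketch as a generator over a marked stream. The version below is an illustration working on simplified, hand-written events, not Genshi's implementation.

```python
def invert(marked_stream):
    """Swap marked and unmarked events, following the rule described above."""
    for mark, event in marked_stream:
        # Any truthy mark (ENTER, INSIDE, EXIT, OUTSIDE) becomes None;
        # a null mark becomes OUTSIDE.
        yield (None if mark else 'OUTSIDE'), event


# Simplified marked stream for <body>Some <em>test</em> text</body> with the
# <em> element selected; events are reduced to (kind, data) for brevity.
marked = [
    (None, ('START', 'body')),
    (None, ('TEXT', 'Some ')),
    ('ENTER', ('START', 'em')),
    ('INSIDE', ('TEXT', 'test')),
    ('EXIT', ('END', 'em')),
    (None, ('TEXT', ' text')),
    (None, ('END', 'body')),
]

for mark, event in invert(marked):
    print(mark, event)
```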
map(function, kind)
Applies a function to the data element of events of kind in the selection.
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('head/title').map(unicode.upper, TEXT))
<html><head><title>SOME TITLE</title></head><body>Some <em>body</em> text.</body></html>
Parameters: - function – the function to apply
- kind – the kind of event the function should be applied to
Return type: Transformer
prepend(content)
Insert content after the ENTER event of the selection.
Inserting some new text at the start of the <body>:
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//body').prepend('Some new body text. '))
<html><head><title>Some Title</title></head><body>Some new body text. Some <em>body</em> text.</body></html>
Parameters: content – Either a callable, an iterable of events, or a string to insert.
Return type: Transformer
remove()
Remove selection from the stream.
Example:
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//em').remove())
<html><head><title>Some Title</title></head><body>Some text.</body></html>
Return type: Transformer
rename(name)
Rename matching elements.
>>> html = HTML('<html><body>Some text, some more text and '
...             '<b>some bold text</b></body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('body/b').rename('strong'))
<html><body>Some text, some more text and <strong>some bold text</strong></body></html>
replace(content)
Replace selection with content.
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//title/text()').replace('New Title'))
<html><head><title>New Title</title></head><body>Some <em>body</em> text.</body></html>
Parameters: content – Either a callable, an iterable of events, or a string to insert.
Return type: Transformer
select(path)
Mark events matching the given XPath expression, within the current selection.
>>> html = HTML('<body>Some <em>test</em> text</body>', encoding='utf-8')
>>> print(html | Transformer().select('.//em').trace())
(None, ('START', (QName('body'), Attrs()), (None, 1, 0)))
(None, ('TEXT', u'Some ', (None, 1, 6)))
('ENTER', ('START', (QName('em'), Attrs()), (None, 1, 11)))
('INSIDE', ('TEXT', u'test', (None, 1, 15)))
('EXIT', ('END', QName('em'), (None, 1, 19)))
(None, ('TEXT', u' text', (None, 1, 24)))
(None, ('END', QName('body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>
Parameters: path – an XPath expression (as string) or a Path instance
Returns: the stream augmented by transformation marks
Return type: Transformer
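The trace for select() shows how the ENTER, INSIDE and EXIT marks line up with the START event, the contents, and the END event of a selected element. A rough model of that marking, matching elements by bare tag name instead of XPath, is sketched here; mark_element is a hypothetical helper, the events are simplified (kind, data, pos) tuples with a plain string as element name, and the handling of nested matches is an assumption.

```python
def mark_element(events, name):
    """Attach ENTER/INSIDE/EXIT marks to elements called `name`."""
    depth = 0  # greater than zero while inside a matching element
    for kind, data, pos in events:
        if kind == 'START' and data == name:
            depth += 1
            mark = 'ENTER' if depth == 1 else 'INSIDE'
        elif kind == 'END' and data == name and depth:
            mark = 'EXIT' if depth == 1 else 'INSIDE'
            depth -= 1
        else:
            mark = 'INSIDE' if depth else None
        yield mark, (kind, data, pos)


# Events for <body>Some <em>test</em> text</body>, with the same positions
# as in the trace output.
events = [
    ('START', 'body', (None, 1, 0)),
    ('TEXT', 'Some ', (None, 1, 6)),
    ('START', 'em', (None, 1, 11)),
    ('TEXT', 'test', (None, 1, 15)),
    ('END', 'em', (None, 1, 19)),
    ('TEXT', ' text', (None, 1, 24)),
    ('END', 'body', (None, 1, 29)),
]

for marked_event in mark_element(events, 'em'):
    print(marked_event)
```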
substitute(pattern, replace, count=1)
Replace text matching a regular expression.
Refer to the documentation for re.sub() for details.
>>> html = HTML('<html><body>Some text, some more text and '
...             '<b>some bold text</b>\n'
...             '<i>some italicised text</i></body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('body/b').substitute('(?i)some', 'SOME'))
<html><body>Some text, some more text and <b>SOME bold text</b>
<i>some italicised text</i></body></html>
>>> tags = tag.html(tag.body('Some text, some more text and\n',
...                          Markup('<b>some bold text</b>')))
>>> print(tags.generate() | Transformer('body').substitute(
...     '(?i)some', 'SOME'))
<html><body>SOME text, some more text and
<b>SOME bold text</b></body></html>
Parameters: - pattern – A regular expression object or string.
- replace – Replacement pattern.
- count – Number of replacements to make in each text fragment.
Return type: Transformer
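The effect of count being applied per text fragment, rather than once for the whole selection, can be reproduced with re.sub() alone. The sketch below assumes each fragment corresponds to one text event; substitute_fragments is a hypothetical helper, not part of Genshi.

```python
import re


def substitute_fragments(fragments, pattern, replace, count=1):
    """Apply a regular expression substitution to each fragment separately."""
    regex = re.compile(pattern)
    # `count` limits the replacements within each fragment; 0 means no limit.
    return [regex.sub(replace, text, count=count) for text in fragments]


fragments = ['Some text, some more text and\n', 'some bold text']

# One replacement per fragment: the second "some" in the first fragment stays.
print(substitute_fragments(fragments, '(?i)some', 'SOME'))

# count=0 replaces every match in every fragment.
print(substitute_fragments(fragments, '(?i)some', 'SOME', count=0))
```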
trace(prefix='', fileobj=None)
Print events as they pass through the transform.
>>> html = HTML('<body>Some <em>test</em> text</body>', encoding='utf-8')
>>> print(html | Transformer('em').trace())
(None, ('START', (QName('body'), Attrs()), (None, 1, 0)))
(None, ('TEXT', u'Some ', (None, 1, 6)))
('ENTER', ('START', (QName('em'), Attrs()), (None, 1, 11)))
('INSIDE', ('TEXT', u'test', (None, 1, 15)))
('EXIT', ('END', QName('em'), (None, 1, 19)))
(None, ('TEXT', u' text', (None, 1, 24)))
(None, ('END', QName('body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>
Parameters: - prefix – a string to prefix each event with in the output
- fileobj – the writable file-like object to write to; defaults to the standard output stream
Return type: Transformer
unwrap()
Remove outermost enclosing elements from selection.
Example:
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//em').unwrap())
<html><head><title>Some Title</title></head><body>Some body text.</body></html>
Return type: Transformer
wrap(element)
Wrap selection in an element.
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//em').wrap('strong'))
<html><head><title>Some Title</title></head><body>Some <strong><em>body</em></strong> text.</body></html>
Parameters: element – either a tag name (as string) or an Element object
Return type: Transformer
class genshi.filters.transform.StreamBuffer
Stream event buffer used for cut and copy transformations.
append(event)
Add an event to the buffer.
Parameters: event – the markup event to add
reset()
Empty the buffer of events.
class genshi.filters.transform.InjectorTransformation(content)
Abstract base class for transformations that inject content into a stream.
>>> class Top(InjectorTransformation):
...     def __call__(self, stream):
...         for event in self._inject():
...             yield event
...         for event in stream:
...             yield event
>>> html = HTML('<body>Some <em>test</em> text</body>', encoding='utf-8')
>>> print(html | Transformer('.//em').apply(Top('Prefix ')))
Prefix <body>Some <em>test</em> text</body>