genshi.filters

Implementation of a number of stream filters.

genshi.filters.html

Implementation of a number of stream filters.

class genshi.filters.html.HTMLFormFiller(name=None, id=None, data=None, passwords=False)

A stream filter that can populate HTML forms from a dictionary of values.

>>> from genshi.input import HTML
>>> html = HTML('''<form>
...   <p><input type="text" name="foo" /></p>
... </form>''', encoding='utf-8')
>>> filler = HTMLFormFiller(data={'foo': 'bar'})
>>> print(html | filler)
<form>
  <p><input type="text" name="foo" value="bar"/></p>
</form>
class genshi.filters.html.HTMLSanitizer(safe_tags=frozenset(['em', 'pre', 'code', 'p', 'h2', 'h3', 'h1', 'h6', 'h4', 'h5', 'table', 'font', 'u', 'select', 'kbd', 'strong', 'span', 'sub', 'img', 'area', 'menu', 'tt', 'tr', 'tbody', 'label', 'hr', 'dfn', 'tfoot', 'th', 'sup', 'strike', 'input', 'td', 'samp', 'cite', 'thead', 'map', 'dl', 'blockquote', 'fieldset', 'option', 'form', 'acronym', 'big', 'dd', 'var', 'ol', 'abbr', 'br', 'address', 'optgroup', 'li', 'dt', 'ins', 'legend', 'a', 'b', 'center', 'textarea', 'colgroup', 'i', 'button', 'q', 'caption', 's', 'del', 'small', 'div', 'col', 'dir', 'ul']), safe_attrs=frozenset(['rev', 'prompt', 'color', 'colspan', 'accesskey', 'usemap', 'cols', 'accept', 'datetime', 'char', 'accept-charset', 'shape', 'href', 'hreflang', 'selected', 'frame', 'type', 'alt', 'nowrap', 'border', 'id', 'axis', 'compact', 'rows', 'checked', 'for', 'start', 'hspace', 'charset', 'ismap', 'label', 'target', 'bgcolor', 'readonly', 'rel', 'valign', 'scope', 'size', 'cellspacing', 'cite', 'media', 'multiple', 'src', 'rules', 'nohref', 'action', 'rowspan', 'abbr', 'span', 'method', 'height', 'class', 'enctype', 'lang', 'disabled', 'name', 'charoff', 'clear', 'summary', 'value', 'longdesc', 'headers', 'vspace', 'noshade', 'coords', 'width', 'maxlength', 'cellpadding', 'title', 'align', 'dir', 'tabindex']), safe_schemes=frozenset(['mailto', 'ftp', 'http', 'file', 'https', None]), uri_attrs=frozenset(['src', 'lowsrc', 'href', 'dynsrc', 'background', 'action']))

A filter that removes potentially dangerous HTML tags and attributes from the stream.

>>> from genshi import HTML
>>> html = HTML('<div><script>alert(document.cookie)</script></div>', encoding='utf-8')
>>> print(html | HTMLSanitizer())
<div/>

The default set of safe tags and attributes can be modified when the filter is instantiated. For example, to allow inline style attributes, the following instantiation would work:

>>> html = HTML('<div style="background: #000"></div>', encoding='utf-8')
>>> sanitizer = HTMLSanitizer(safe_attrs=HTMLSanitizer.SAFE_ATTRS | set(['style']))
>>> print(html | sanitizer)
<div style="background: #000"/>

Note that even in this case, the filter does attempt to remove dangerous constructs from style attributes:

>>> html = HTML('<div style="background: url(javascript:void); color: #000"></div>', encoding='utf-8')
>>> print(html | sanitizer)
<div style="color: #000"/>

This handles HTML entities, unicode escapes in CSS and JavaScript text, as well as a lot of other things. However, the style attribute is still excluded by default because it is very hard for such sanitizing to be completely safe, especially considering how much error recovery current web browsers perform.

It also does some basic filtering of CSS properties that may be used for typical phishing attacks. For more sophisticated filtering, this class provides a couple of hooks that can be overridden in sub-classes.

Warning: This special processing of CSS is currently only applied to style attributes, not style elements.
is_safe_css(propname, value)

Determine whether the given css property declaration is to be considered safe for inclusion in the output.

Parameters:
  • propname – the CSS property name
  • value – the value of the property
Returns:

whether the property value should be considered safe

Return type:

bool

Since:

version 0.6

is_safe_elem(tag, attrs)

Determine whether the given element should be considered safe for inclusion in the output.

Parameters:
  • tag (QName) – the tag name of the element
  • attrs (Attrs) – the element attributes
Returns:

whether the element should be considered safe

Return type:

bool

Since:

version 0.6

is_safe_uri(uri)

Determine whether the given URI is to be considered safe for inclusion in the output.

The default implementation checks whether the scheme of the URI is in the set of allowed schemes (safe_schemes).

>>> sanitizer = HTMLSanitizer()
>>> sanitizer.is_safe_uri('http://example.org/')
True
>>> sanitizer.is_safe_uri('javascript:alert(document.cookie)')
False
Parameters:uri – the URI to check
Returns:True if the URI can be considered safe, False otherwise
Return type:bool
Since:version 0.4.3
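
As a rough illustration of a scheme check of this kind, the following stand-alone sketch does not call Genshi. The function name scheme_is_allowed and its normalization steps are assumptions made for this example, not the library's code; only the default set of schemes is copied from the safe_schemes default in the class signature.

```python
# Default schemes copied from the HTMLSanitizer signature; None stands for
# "no scheme at all", i.e. a relative URI.
SAFE_SCHEMES = frozenset(['file', 'ftp', 'http', 'https', 'mailto', None])


def scheme_is_allowed(uri, safe_schemes=SAFE_SCHEMES):
    """Hypothetical helper: return True if the URI scheme is in safe_schemes."""
    # Drop the fragment so a ':' after '#' is not mistaken for a scheme separator.
    uri = uri.split('#', 1)[0]
    if ':' not in uri:
        # Relative URI: allowed only if None is part of the set.
        return None in safe_schemes
    scheme = uri.split(':', 1)[0]
    # Keep alphanumerics only, so 'java\tscript' is compared as 'javascript'.
    normalized = ''.join(ch for ch in scheme if ch.isalnum()).lower()
    return normalized in safe_schemes


print(scheme_is_allowed('http://example.org/'))
print(scheme_is_allowed('javascript:alert(document.cookie)'))
```

The two calls mirror the doctest above and print True and False. A subclass could apply a stricter policy by passing a smaller set, for example frozenset(['https']).
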
sanitize_css(text)

Remove potentially dangerous property declarations from CSS code.

In particular, properties using the CSS url() function with a scheme that is not considered safe are removed:

>>> sanitizer = HTMLSanitizer()
>>> sanitizer.sanitize_css(u'''
...   background: url(javascript:alert("foo"));
...   color: #000;
... ''')
[u'color: #000']

Also, the proprietary Internet Explorer function expression() is always stripped:

>>> sanitizer.sanitize_css(u'''
...   background: #fff;
...   color: #000;
...   width: e/**/xpression(alert("foo"));
... ''')
[u'background: #fff', u'color: #000']
Parameters:text – the CSS text; this is expected to be unicode and to not contain any character or numeric references
Returns:a list of declarations that are considered safe
Return type:list
Since:version 0.4.3
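
The two rules described above (dropping url() values with an unsafe scheme and always dropping expression()) can be modelled in a few lines of plain Python. This is a simplified sketch under stated assumptions, not Genshi's implementation: filter_declarations, URL_RE and the naive split on ';' are hypothetical, and unicode escapes are not handled.

```python
import re

# Matches the target of a CSS url(...) value, with or without quotes.
URL_RE = re.compile(r"""url\s*\(\s*["']?([^)"']*)""", re.IGNORECASE)


def filter_declarations(css_text,
                        safe_schemes=('file', 'ftp', 'http', 'https', 'mailto')):
    """Hypothetical helper: return the declarations that look safe."""
    # Strip comments first so 'e/**/xpression' cannot hide the keyword.
    text = re.sub(r'/\*.*?\*/', '', css_text, flags=re.DOTALL)
    safe = []
    # Naive split: does not cope with ';' inside strings or data URIs.
    for decl in text.split(';'):
        decl = decl.strip()
        if not decl:
            continue
        # Conservative: reject anything that mentions 'expression'.
        if 'expression' in decl.lower():
            continue
        unsafe_url = False
        for match in URL_RE.finditer(decl):
            target = match.group(1)
            # A target without ':' is treated as a relative URL and accepted.
            if ':' in target:
                scheme = target.split(':', 1)[0].strip().lower()
                if scheme not in safe_schemes:
                    unsafe_url = True
        if not unsafe_url:
            safe.append(decl)
    return safe


print(filter_declarations('''
  background: url(javascript:alert("foo"));
  color: #000;
'''))
```

With the input from the first doctest above, the sketch prints ['color: #000'].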

genshi.filters.i18n

Directives and utilities for internationalization and localization of templates.

Since: version 0.4
Note: Support for directives was added in version 0.6
class genshi.filters.i18n.Translator(translate=<gettext.NullTranslations instance>, ignore_tags=frozenset([QName('http://www.w3.org/1999/xhtml}style'), QName('http://www.w3.org/1999/xhtml}script'), QName('style'), QName('script')]), include_attrs=frozenset(['prompt', 'title', 'standby', 'summary', 'abbr', 'alt', 'label']), extract_text=True)

Can extract and translate localizable strings from markup streams and templates.

For example, assume the following template:

>>> tmpl = MarkupTemplate('''<html xmlns:py="http://genshi.edgewall.org/">
...   <head>
...     <title>Example</title>
...   </head>
...   <body>
...     <h1>Example</h1>
...     <p>${_("Hello, %(name)s") % dict(name=username)}</p>
...   </body>
... </html>''', filename='example.html')

For demonstration, we define a dummy gettext-style function with a hard-coded translation table, and pass that to the Translator initializer:

>>> def pseudo_gettext(string):
...     return {
...         'Example': 'Beispiel',
...         'Hello, %(name)s': 'Hallo, %(name)s'
...     }[string]
>>> translator = Translator(pseudo_gettext)

Next, the translator needs to be prepended to any already defined filters on the template:

>>> tmpl.filters.insert(0, translator)

When generating the template output, our hard-coded translations should be applied as expected:

>>> print(tmpl.generate(username='Hans', _=pseudo_gettext))
<html>
  <head>
    <title>Beispiel</title>
  </head>
  <body>
    <h1>Beispiel</h1>
    <p>Hallo, Hans</p>
  </body>
</html>

Note that elements defining xml:lang attributes that do not contain variable expressions are ignored by this filter. That can be used to exclude specific parts of a template from being extracted and translated.
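
The dummy pseudo_gettext function above raises KeyError for any string that is missing from its table. A minimal, more forgiving sketch falls back to the untranslated string instead; make_gettext is a hypothetical helper introduced for this example and is not part of Genshi.

```python
def make_gettext(table):
    """Hypothetical helper: build a gettext-style function from a dict."""
    def _gettext(string):
        # Fall back to the original string when no translation is known.
        return table.get(string, string)
    return _gettext


pseudo_gettext = make_gettext({
    'Example': 'Beispiel',
    'Hello, %(name)s': 'Hallo, %(name)s',
})

print(pseudo_gettext('Example'))
print(pseudo_gettext('Hello, %(name)s') % dict(name='Hans'))
print(pseudo_gettext('Untranslated'))
```

The three calls print Beispiel, Hallo, Hans and Untranslated. A function built this way has the same one-argument shape as the pseudo_gettext passed to Translator above.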

extract(stream, gettext_functions=('_', 'gettext', 'ngettext', 'dgettext', 'dngettext', 'ugettext', 'ungettext'), search_text=True, comment_stack=None)

Extract localizable strings from the given template stream.

For every string found, this function yields a (lineno, function, message, comments) tuple, where:

  • lineno is the number of the line on which the string was found,
  • function is the name of the gettext function used (if the string was extracted from embedded Python code),
  • message is the string itself (a unicode object, or a tuple of unicode objects for functions with multiple string arguments), and
  • comments is a list of comments related to the message, extracted from i18n:comment attributes found in the markup.
>>> tmpl = MarkupTemplate('''<html xmlns:py="http://genshi.edgewall.org/">
...   <head>
...     <title>Example</title>
...   </head>
...   <body>
...     <h1>Example</h1>
...     <p>${_("Hello, %(name)s") % dict(name=username)}</p>
...     <p>${ngettext("You have %d item", "You have %d items", num)}</p>
...   </body>
... </html>''', filename='example.html')
>>> for line, func, msg, comments in Translator().extract(tmpl.stream):
...    print('%d, %r, %r' % (line, func, msg))
3, None, u'Example'
6, None, u'Example'
7, '_', u'Hello, %(name)s'
8, 'ngettext', (u'You have %d item', u'You have %d items', None)
Parameters:
  • stream – the event stream to extract strings from; can be a regular stream or a template stream
  • gettext_functions – a sequence of function names that should be treated as gettext-style localization functions
  • search_text – whether the content of text nodes should be extracted (used internally)
Note:

Changed in 0.4.1: For a function with multiple string arguments (such as ngettext), a single item with a tuple of strings is yielded, instead of an item for each string argument.

Note:

Changed in 0.6: The returned tuples now include a fourth element, which is a list of comments for the translator.
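
Because message can be either a single string or a tuple, code that consumes these tuples usually has to normalize it. The sketch below groups extracted items by message; collect_messages is a hypothetical helper, and the sample data repeats the doctest output above with empty comment lists assumed.

```python
def collect_messages(extracted):
    """Hypothetical helper: group (lineno, function, message, comments) tuples."""
    catalog = {}
    for lineno, function, message, comments in extracted:
        # For multi-argument functions such as ngettext the message is a
        # tuple; this sketch uses its first element (the singular) as key.
        key = message[0] if isinstance(message, tuple) else message
        entry = catalog.setdefault(key, {'lines': [], 'comments': []})
        entry['lines'].append(lineno)
        entry['comments'].extend(comments)
    return catalog


sample = [
    (3, None, 'Example', []),
    (6, None, 'Example', []),
    (7, '_', 'Hello, %(name)s', []),
    (8, 'ngettext', ('You have %d item', 'You have %d items', None), []),
]
catalog = collect_messages(sample)
print(catalog['Example']['lines'])
```

The final line prints [3, 6], since 'Example' was found on both of those lines.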

setup(template)

Convenience function to register the Translator filter and the related directives with the given template.

Parameters:template – a Template instance
genshi.filters.i18n.extract(fileobj, keywords, comment_tags, options)

Babel extraction method for Genshi templates.

Parameters:
  • fileobj – the file-like object the messages should be extracted from
  • keywords – a list of keywords (i.e. function names) that should be recognized as translation functions
  • comment_tags – a list of translator tags to search for and include in the results
  • options – a dictionary of additional options (optional)
Returns:

an iterator over (lineno, funcname, message, comments) tuples

Return type:

iterator

genshi.filters.transform

A filter for functional-style transformations of markup streams.

The Transformer filter provides a variety of transformations that can be applied to parts of streams that match given XPath expressions. These transformations can be chained to achieve results that would be comparatively tedious to achieve by writing stream filters by hand. The approach of chaining node selection and transformation has been inspired by the jQuery JavaScript library.

For example, the following transformation upper-cases the text of the <em> element in the <body> of the input document, removes the <em> tags, and wraps the text in a <u> element instead:

>>> from genshi.builder import tag
>>> html = HTML('''<html>
...  <head><title>Some Title</title></head>
...  <body>
...    Some <em>body</em> text.
...  </body>
... </html>''',
... encoding='utf-8')
>>> print(html | Transformer('body/em').map(unicode.upper, TEXT)
...                                    .unwrap().wrap(tag.u))
<html>
  <head><title>Some Title</title></head>
  <body>
    Some <u>BODY</u> text.
  </body>
</html>

The Transformer supports a large number of useful transformations out of the box, but custom transformations can be added easily.

Since: version 0.5
class genshi.filters.transform.Transformer(path='.')

Stream filter that can apply a variety of different transformations to a stream.

This is achieved by selecting the events to be transformed using XPath, then applying the transformations to the events matched by the path expression. Each marked event is in the form (mark, (kind, data, pos)), where mark can be any of ENTER, INSIDE, EXIT, OUTSIDE, or None.

The first three marks match START and END events, and any events contained INSIDE any selected XML/HTML element. A non-element match outside a START/END container (e.g. text()) will yield an OUTSIDE mark.
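
To make the marking scheme concrete, here is a plain-Python model that assigns ENTER, INSIDE and EXIT marks to a flat list of events for one selected tag name. It is only a sketch: mark_elements is hypothetical, events are simplified to (kind, data) pairs, and no XPath matching is involved.

```python
def mark_elements(events, tag):
    """Hypothetical model: mark every element named `tag` in a flat stream."""
    depth = 0  # nesting depth inside a selected element; 0 means outside
    for event in events:
        kind, data = event
        if depth == 0:
            if kind == 'START' and data == tag:
                depth = 1
                yield 'ENTER', event
            else:
                yield None, event
        else:
            if kind == 'START':
                depth += 1
            elif kind == 'END':
                depth -= 1
            # The END event that brings depth back to 0 closes the selection.
            yield ('EXIT' if depth == 0 else 'INSIDE'), event


events = [
    ('START', 'body'), ('TEXT', 'Some '),
    ('START', 'em'), ('TEXT', 'test'), ('END', 'em'),
    ('TEXT', ' text'), ('END', 'body'),
]
marks = [mark for mark, _event in mark_elements(events, 'em')]
print(marks)
```

For this input the marks are None, None, ENTER, INSIDE, EXIT, None, None, the same sequence that trace() reports for an em selection further down in this reference.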

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')

Transformations act on selected stream events matching an XPath expression. Here’s an example of removing some markup (the title, in this case) selected by an expression:

>>> print(html | Transformer('head/title').remove())
<html><head/><body>Some <em>body</em> text.</body></html>

Inserted content can be passed in the form of a string, or a markup event stream, which includes streams generated programmatically via the builder module:

>>> from genshi.builder import tag
>>> print(html | Transformer('body').prepend(tag.h1('Document Title')))
<html><head><title>Some Title</title></head><body><h1>Document
Title</h1>Some <em>body</em> text.</body></html>

Each XPath expression determines the set of tags that will be acted upon by subsequent transformations. In this example we select the <title> text, copy it into a buffer, then select the <body> element and paste the copied text into the body as <h1> enclosed text:

>>> buffer = StreamBuffer()
>>> print(html | Transformer('head/title/text()').copy(buffer)
...     .end().select('body').prepend(tag.h1(buffer)))
<html><head><title>Some Title</title></head><body><h1>Some Title</h1>Some
<em>body</em> text.</body></html>

Transformations can also be assigned and reused, although care must be taken when using buffers, to ensure that buffers are cleared between transforms:

>>> emphasis = Transformer('body//em').attr('class', 'emphasis')
>>> print(html | emphasis)
<html><head><title>Some Title</title></head><body>Some <em
class="emphasis">body</em> text.</body></html>
after(content)

Insert content after selection.

Here, we insert some text after the </em> closing tag:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//em').after(' rock'))
<html><head><title>Some Title</title></head><body>Some <em>body</em>
rock text.</body></html>
Parameters:content – Either a callable, an iterable of events, or a string to insert.
Return type:Transformer
append(content)

Insert content before the END event of the selection.

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//body').append(' Some new body text.'))
<html><head><title>Some Title</title></head><body>Some <em>body</em>
text. Some new body text.</body></html>
Parameters:content – Either a callable, an iterable of events, or a string to insert.
Return type:Transformer
apply(function)

Apply a transformation to the stream.

Transformations can be chained, similar to stream filters. Any callable accepting a marked stream can be used as a transform.

As an example, here is a simple TEXT event upper-casing transform:

>>> def upper(stream):
...     for mark, (kind, data, pos) in stream:
...         if mark and kind is TEXT:
...             yield mark, (kind, data.upper(), pos)
...         else:
...             yield mark, (kind, data, pos)
>>> short_stream = HTML('<body>Some <em>test</em> text</body>',
...                      encoding='utf-8')
>>> print(short_stream | Transformer('.//em/text()').apply(upper))
<body>Some <em>TEST</em> text</body>
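
Since a transform is just a callable over (mark, (kind, data, pos)) tuples, one of this shape can be exercised without parsing any markup. In the sketch below the event kinds are plain string stand-ins for Genshi's constants and the marked stream is built by hand; both are assumptions made for illustration.

```python
# Stand-ins for the Genshi event kinds (assumption: plain strings).
TEXT, START, END = 'TEXT', 'START', 'END'


def upper(stream):
    # Upper-case the data of marked TEXT events; pass everything else through.
    for mark, (kind, data, pos) in stream:
        if mark and kind is TEXT:
            yield mark, (kind, data.upper(), pos)
        else:
            yield mark, (kind, data, pos)


# Hand-built marked stream: only the text inside <em> carries a mark.
marked = [
    (None, (START, 'body', (None, 1, 0))),
    (None, (TEXT, 'Some ', (None, 1, 6))),
    (None, (START, 'em', (None, 1, 11))),
    ('OUTSIDE', (TEXT, 'test', (None, 1, 15))),
    (None, (END, 'em', (None, 1, 19))),
    (None, (TEXT, ' text', (None, 1, 24))),
    (None, (END, 'body', (None, 1, 29))),
]
texts = [data for _mark, (kind, data, _pos) in upper(marked) if kind is TEXT]
print(texts)
```

Only the marked fragment changes, so the printed list is ['Some ', 'TEST', ' text'].
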
attr(name, value)

Add, replace or delete an attribute on selected elements.

If value evaluates to None the attribute will be deleted from the element:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em class="before">body</em> <em>text</em>.</body>'
...             '</html>', encoding='utf-8')
>>> print(html | Transformer('body/em').attr('class', None))
<html><head><title>Some Title</title></head><body>Some <em>body</em>
<em>text</em>.</body></html>

Otherwise the attribute will be set to value:

>>> print(html | Transformer('body/em').attr('class', 'emphasis'))
<html><head><title>Some Title</title></head><body>Some <em
class="emphasis">body</em> <em class="emphasis">text</em>.</body></html>

If value is a callable it will be called with the attribute name and the START event for the matching element. Its return value will then be used to set the attribute:

>>> def print_attr(name, event):
...     attrs = event[1][1]
...     print(attrs)
...     return attrs.get(name)
>>> print(html | Transformer('body/em').attr('class', print_attr))
Attrs([(QName('class'), u'before')])
Attrs()
<html><head><title>Some Title</title></head><body>Some <em
class="before">body</em> <em>text</em>.</body></html>
Parameters:
  • name – the name of the attribute
  • value – the value that should be set for the attribute.
Return type:

Transformer

before(content)

Insert content before selection.

In this example we insert the word ‘emphasised’ before the <em> opening tag:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//em').before('emphasised '))
<html><head><title>Some Title</title></head><body>Some emphasised
<em>body</em> text.</body></html>
Parameters:content – Either a callable, an iterable of events, or a string to insert.
Return type:Transformer
buffer()

Buffer the entire stream (can consume a considerable amount of memory).

Useful in conjunction with copy(accumulate=True) and cut(accumulate=True) to ensure that all marked events in the entire stream are copied to the buffer before further transformations are applied.

For example, to move all <note> elements inside a <notes> tag at the top of the document:

>>> doc = HTML('<doc><notes></notes><body>Some <note>one</note> '
...            'text <note>two</note>.</body></doc>',
...             encoding='utf-8')
>>> buffer = StreamBuffer()
>>> print(doc | Transformer('body/note').cut(buffer, accumulate=True)
...     .end().buffer().select('notes').prepend(buffer))
<doc><notes><note>one</note><note>two</note></notes><body>Some  text
.</body></doc>
copy(buffer, accumulate=False)

Copy selection into buffer.

The buffer is replaced by each contiguous selection before being passed to the next transformation. If accumulate=True, further selections will be appended to the buffer rather than replacing it.

>>> from genshi.builder import tag
>>> buffer = StreamBuffer()
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('head/title/text()').copy(buffer)
...     .end().select('body').prepend(tag.h1(buffer)))
<html><head><title>Some Title</title></head><body><h1>Some
Title</h1>Some <em>body</em> text.</body></html>

This example illustrates that only a single contiguous selection will be buffered:

>>> print(html | Transformer('head/title/text()').copy(buffer)
...     .end().select('body/em').copy(buffer).end().select('body')
...     .prepend(tag.h1(buffer)))
<html><head><title>Some Title</title></head><body><h1>Some
Title</h1>Some <em>body</em> text.</body></html>
>>> print(buffer)
<em>body</em>

Element attributes can also be copied for later use:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body><em>Some</em> <em class="before">body</em>'
...             '<em>text</em>.</body></html>',
...             encoding='utf-8')
>>> buffer = StreamBuffer()
>>> def apply_attr(name, entry):
...     return list(buffer)[0][1][1].get('class')
>>> print(html | Transformer('body/em[@class]/@class').copy(buffer)
...     .end().buffer().select('body/em[not(@class)]')
...     .attr('class', apply_attr))
<html><head><title>Some Title</title></head><body><em
class="before">Some</em> <em class="before">body</em><em
class="before">text</em>.</body></html>
Parameters:buffer – the StreamBuffer in which the selection should be stored
Return type:Transformer
Note:Copy (and cut) copy each individual selected object into the buffer before passing to the next transform. For example, the XPath *|text() will select all elements and text, each instance of which will be copied to the buffer individually before passing to the next transform. This has implications for how StreamBuffer objects can be used, so some experimentation may be required.
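
The difference between replacing and accumulating can be modelled with an ordinary list as the buffer. This is a simplified sketch, not Genshi's implementation: copy_selection is hypothetical, events are plain strings, and a "selection" is taken to be a contiguous run of marked items.

```python
from itertools import groupby


def copy_selection(marked_stream, buffer, accumulate=False):
    """Hypothetical model of copy(): fill `buffer`, pass the stream through."""
    runs = groupby(marked_stream, key=lambda item: item[0] is not None)
    for is_marked, run in runs:
        run = list(run)
        if is_marked:
            if not accumulate:
                # Each new contiguous selection replaces the previous one.
                del buffer[:]
            buffer.extend(event for _mark, event in run)
        for item in run:
            yield item


stream = [('OUTSIDE', 'first'), (None, 'gap'), ('OUTSIDE', 'second')]
replaced, accumulated = [], []
list(copy_selection(stream, replaced))
list(copy_selection(stream, accumulated, accumulate=True))
print(replaced)
print(accumulated)
```

After the stream is consumed, replaced holds only ['second'] while accumulated holds ['first', 'second'].
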
cut(buffer, accumulate=False)

Copy selection into buffer and remove the selection from the stream.

>>> from genshi.builder import tag
>>> buffer = StreamBuffer()
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//em/text()').cut(buffer)
...     .end().select('.//em').after(tag.h1(buffer)))
<html><head><title>Some Title</title></head><body>Some
<em/><h1>body</h1> text.</body></html>

Specifying accumulate=True appends all selected intervals onto the buffer. Combining this with the .buffer() operation allows us to operate on all copied events rather than per-segment. See the documentation on buffer() for more information.

Parameters:buffer – the StreamBuffer in which the selection should be stored
Return type:Transformer
Note:this transformation will buffer the entire input stream
empty()

Empty selected elements of all content.

Example:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//em').empty())
<html><head><title>Some Title</title></head><body>Some <em/>
text.</body></html>
Return type:Transformer
end()

End current selection, allowing all events to be selected.

Example:

>>> html = HTML('<body>Some <em>test</em> text</body>', encoding='utf-8')
>>> print(html | Transformer('//em').end().trace())
('OUTSIDE', ('START', (QName('body'), Attrs()), (None, 1, 0)))
('OUTSIDE', ('TEXT', u'Some ', (None, 1, 6)))
('OUTSIDE', ('START', (QName('em'), Attrs()), (None, 1, 11)))
('OUTSIDE', ('TEXT', u'test', (None, 1, 15)))
('OUTSIDE', ('END', QName('em'), (None, 1, 19)))
('OUTSIDE', ('TEXT', u' text', (None, 1, 24)))
('OUTSIDE', ('END', QName('body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>
Returns:the stream augmented by transformation marks
Return type:Transformer
filter(filter)

Apply a normal stream filter to the selection. The filter is called once for each contiguous block of marked events.

>>> from genshi.filters.html import HTMLSanitizer
>>> html = HTML('<html><body>Some text<script>alert(document.cookie)'
...             '</script> and some more text</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('body/*').filter(HTMLSanitizer()))
<html><body>Some text and some more text</body></html>
Parameters:filter – The stream filter to apply.
Return type:Transformer
invert()

Invert selection so that marked events become unmarked, and vice versa.

Specifically, all marks are converted to null marks, and all null marks are converted to OUTSIDE marks.

>>> html = HTML('<body>Some <em>test</em> text</body>', encoding='utf-8')
>>> print(html | Transformer('//em').invert().trace())
('OUTSIDE', ('START', (QName('body'), Attrs()), (None, 1, 0)))
('OUTSIDE', ('TEXT', u'Some ', (None, 1, 6)))
(None, ('START', (QName('em'), Attrs()), (None, 1, 11)))
(None, ('TEXT', u'test', (None, 1, 15)))
(None, ('END', QName('em'), (None, 1, 19)))
('OUTSIDE', ('TEXT', u' text', (None, 1, 24)))
('OUTSIDE', ('END', QName('body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>
Return type:Transformer
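
The inversion rule is small enough to state as code. The following is a stand-alone sketch of that rule on hand-built data; the function name and the use of plain strings for marks and events are assumptions for illustration.

```python
def invert(marked_stream):
    """Sketch of the rule: any mark becomes None, None becomes 'OUTSIDE'."""
    for mark, event in marked_stream:
        yield ('OUTSIDE' if mark is None else None), event


# Marks as produced by selecting the <em> element in the example above.
before = [
    (None, 'START body'), (None, 'TEXT Some '),
    ('ENTER', 'START em'), ('INSIDE', 'TEXT test'), ('EXIT', 'END em'),
    (None, 'TEXT  text'), (None, 'END body'),
]
after = [mark for mark, _event in invert(before)]
print(after)
```

The resulting marks are OUTSIDE, OUTSIDE, None, None, None, OUTSIDE, OUTSIDE, matching the trace output above.
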
map(function, kind)

Applies a function to the data element of events of kind in the selection.

>>> html = HTML('<html><head><title>Some Title</title></head>'
...               '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('head/title').map(unicode.upper, TEXT))
<html><head><title>SOME TITLE</title></head><body>Some <em>body</em>
text.</body></html>
Parameters:
  • function – the function to apply
  • kind – the kind of event the function should be applied to
Return type:

Transformer

prepend(content)

Insert content after the ENTER event of the selection.

Inserting some new text at the start of the <body>:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//body').prepend('Some new body text. '))
<html><head><title>Some Title</title></head><body>Some new body text.
Some <em>body</em> text.</body></html>
Parameters:content – Either a callable, an iterable of events, or a string to insert.
Return type:Transformer
remove()

Remove selection from the stream.

Example:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//em').remove())
<html><head><title>Some Title</title></head><body>Some
text.</body></html>
Return type:Transformer
rename(name)

Rename matching elements.

>>> html = HTML('<html><body>Some text, some more text and '
...             '<b>some bold text</b></body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('body/b').rename('strong'))
<html><body>Some text, some more text and <strong>some bold text</strong></body></html>
replace(content)

Replace selection with content.

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//title/text()').replace('New Title'))
<html><head><title>New Title</title></head><body>Some <em>body</em>
text.</body></html>
Parameters:content – Either a callable, an iterable of events, or a string to insert.
Return type:Transformer
select(path)

Mark events matching the given XPath expression, within the current selection.

>>> html = HTML('<body>Some <em>test</em> text</body>', encoding='utf-8')
>>> print(html | Transformer().select('.//em').trace())
(None, ('START', (QName('body'), Attrs()), (None, 1, 0)))
(None, ('TEXT', u'Some ', (None, 1, 6)))
('ENTER', ('START', (QName('em'), Attrs()), (None, 1, 11)))
('INSIDE', ('TEXT', u'test', (None, 1, 15)))
('EXIT', ('END', QName('em'), (None, 1, 19)))
(None, ('TEXT', u' text', (None, 1, 24)))
(None, ('END', QName('body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>
Parameters:path – an XPath expression (as string) or a Path instance
Returns:the stream augmented by transformation marks
Return type:Transformer
substitute(pattern, replace, count=1)

Replace text matching a regular expression.

Refer to the documentation for re.sub() for details.

>>> html = HTML('<html><body>Some text, some more text and '
...             '<b>some bold text</b>\n'
...             '<i>some italicised text</i></body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('body/b').substitute('(?i)some', 'SOME'))
<html><body>Some text, some more text and <b>SOME bold text</b>
<i>some italicised text</i></body></html>
>>> tags = tag.html(tag.body('Some text, some more text and\n',
...      Markup('<b>some bold text</b>')))
>>> print(tags.generate() | Transformer('body').substitute(
...     '(?i)some', 'SOME'))
<html><body>SOME text, some more text and
<b>SOME bold text</b></body></html>
Parameters:
  • pattern – A regular expression object or string.
  • replace – Replacement pattern.
  • count – Number of replacements to make in each text fragment.
Return type:

Transformer
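
The effect of count on each text fragment can be reproduced with re.sub() alone. The sketch below applies the pattern from the example above to the two text fragments of the second document; note that for re.sub() itself a count of 0 means "replace all occurrences".

```python
import re

# The two TEXT fragments from the second example above.
fragments = ['Some text, some more text and\n', 'some bold text']

# count=1: only the first match in each fragment is replaced.
per_fragment = [re.sub('(?i)some', 'SOME', text, count=1)
                for text in fragments]
print(per_fragment)

# count=0: every match in the fragment is replaced.
replace_all = re.sub('(?i)some', 'SOME', fragments[0], count=0)
print(replace_all)
```

With count=1 the second "some" in the first fragment is left untouched, which is why the example output above still contains "some more text".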

trace(prefix='', fileobj=None)

Print events as they pass through the transform.

>>> html = HTML('<body>Some <em>test</em> text</body>', encoding='utf-8')
>>> print(html | Transformer('em').trace())
(None, ('START', (QName('body'), Attrs()), (None, 1, 0)))
(None, ('TEXT', u'Some ', (None, 1, 6)))
('ENTER', ('START', (QName('em'), Attrs()), (None, 1, 11)))
('INSIDE', ('TEXT', u'test', (None, 1, 15)))
('EXIT', ('END', QName('em'), (None, 1, 19)))
(None, ('TEXT', u' text', (None, 1, 24)))
(None, ('END', QName('body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>
Parameters:
  • prefix – a string to prefix each event with in the output
  • fileobj – the writable file-like object to write to; defaults to the standard output stream
Return type:

Transformer

unwrap()

Remove outermost enclosing elements from selection.

Example:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//em').unwrap())
<html><head><title>Some Title</title></head><body>Some body
text.</body></html>
Return type:Transformer
wrap(element)

Wrap selection in an element.

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>',
...             encoding='utf-8')
>>> print(html | Transformer('.//em').wrap('strong'))
<html><head><title>Some Title</title></head><body>Some
<strong><em>body</em></strong> text.</body></html>
Parameters:element – either a tag name (as string) or an Element object
Return type:Transformer
class genshi.filters.transform.StreamBuffer

Stream event buffer used for cut and copy transformations.

append(event)

Add an event to the buffer.

Parameters:event – the markup event to add
reset()

Empty the buffer of events.

class genshi.filters.transform.InjectorTransformation(content)

Abstract base class for transformations that inject content into a stream.

>>> class Top(InjectorTransformation):
...     def __call__(self, stream):
...         for event in self._inject():
...             yield event
...         for event in stream:
...             yield event
>>> html = HTML('<body>Some <em>test</em> text</body>', encoding='utf-8')
>>> print(html | Transformer('.//em').apply(Top('Prefix ')))
Prefix <body>Some <em>test</em> text</body>