"""HTML40 -- generate HTML conformant to the 4.0 standard. See:
http://www.w3.org/TR/REC-html40/
All HTML 4.0 elements are implemented except for a few which are
deprecated. All attributes should be implemented. HTML is generally
case-insensitive, whereas Python is not. All elements have been coded
in UPPER CASE, with attributes in lower case. General usage:
e = ELEMENT(*content, **attr)
i.e., the positional arguments become the content of the element, and
the keyword arguments set element attributes. All attributes MUST be
specified with keyword arguments, and the content MUST be a series of
positional arguments; if you use content="spam", it will set this as
the attribute content, not as the element content. Multiple content
arguments are simply joined with no separator. Example:
>>> t = TABLE(TR(TH('SPAM'), TH('EGGS')), TR(TD('foo','bar', colspan=2)) )
>>> print t
As with HTMLgen and other HTML generators, you can print the object
and it makes one monster string and writes that to stdout. Unlike
HTMLgen, these objects all have a writeto(fp=stdout, indent=0,
perlevel=2) method. This method may save memory, and it might be
faster possibly (many fewer string joins), plus you get progressive
output. If you want to alter the indentation on the string output,
try:
>> print t.__str__(indent=5, perlevel=6)
The output from either method (__str__ or writeto) SHOULD be the
lexically equivalent, regardless of indentation; not all elements are
made pretty, only those which are insensitive to the addition of
whitespace before or after the start/end tags. If you don't like the
indenting, use writeto(perlevel=0) (or __str__(perlevel=0)).
Element attributes can be set through the normal Python dictionary
operations on the object (they are not Python attributes).
Note: There are a few HTML attributes with a dash in them. In these
cases, substitute an underscore and the output will be corrected. HTML
4.0 also defines a class attribute, which conflicts with Python's
class statement; use klass instead. The new LABEL element has a for
attribute which also conflicts; use label_for instead.
>>> print META(http_equiv='refresh',content='60;/index2.html')
The output order of attributes is indeterminate (based on hash order),
but this is of no particular importance. The extent of attribute
checking is limited to checking that the attribute is legal for that
element; the values themselves are not checked, but must be
convertible to a string. The content items must be convertible to
strings and/or have a writeto() method. Some elements may have a few
attributes they shouldn't, particularly those which use intrinsic
events.
Valid attributes are defined for each element with dictionaries, with
the keys being the attributes. If the value is false, it's a boolean;
otherwise the value is printed.
Subclassing: If all you need to do is have some defaults, override the
defaults dictionary. You will also need to set name to the correct
element name. Example:
>>> class Refresh(META): defaults = {'http_equiv': 'refresh'}; name = 'META'
...
>>> print Refresh(content='10; /index2.html')
Weirdness with Netscape 4.x: It recognizes a border attribute for the
FRAMESET element, though it is not defined in the HTML 4.0 spec. It
seems to recognize the frameborder attribute for FRAME, but border
only changes from a 3D shaded border to a flat, unresizable grey
border. Because of this, there is a border attribute defined for
FRAMESET. Similarly, HTML 4.0 does not define a border attribute for
INPUT (for use with type="image"), but one has been added anyway.
Historical notes: My first experience with an HTML generator was with
the one which comes with "Internet Programming with Python" by Aaron
Watters, Guido van Rossum, and James C. Ahlstrom. I hate to dis it,
but the thing really drove me nuts after awhile. Horrible to debug
anything, but maybe my understanding of it was incomplete. I then
discovered HTMLgen by Robin Friedrich:
http://starship.skyport.net/crew/friedrich/HTMLgen/html/main.html
It worked much better, for me at least, good enough for a major
project. There were, however, some frustrations: Subclassing could
sometimes be difficult (in fairness, I think that was by design), and
there were some missing features I wanted. Plus the thing's huge, as
Python modules go. These are relatively minor gripes, and if you don't
like this module, definitely use HTMLgen.
Mainly I did this because the methodology to do it just sorta dawned
on me. The result is, I think, some pretty clean code. Really, there's
hardly any actual code at all. Hey, and when was the last time saw a
subclass inherit from only one parent class with only a pass statement
and no attributes defined? There's 27 of them here. There's almost no
logic to it at all; it's pretty much all driven by dictionaries.
Yes, there are a number of features missing which are present in
HTMLgen, namely the document classes. All the high-level abstractions
are going in another module or two.
"""
__version__ = "$Revision: 1.8 $"[11:-4]
import string
from string import lower, join, replace
from sys import stdout
coreattrs = {'id': 1, 'klass': 1, 'style': 1, 'title': 1}
i18n = {'lang': 1, 'dir': 1}
intrinsic_events = {'onload': 1, 'onunload': 1, 'onclick': 1,
'ondblclick': 1, 'onmousedown': 1, 'onmouseup': 1,
'onmouseover': 1, 'onmousemove': 1, 'onmouseout': 1,
'onfocus': 1, 'onblur': 1, 'onkeypress': 1,
'onkeydown': 1, 'onkeyup': 1, 'onsubmit': 1,
'onreset': 1, 'onselect': 1, 'onchange': 1 }
attrs = coreattrs.copy()
attrs.update(i18n)
attrs.update(intrinsic_events)
alternate_text = {'alt': 1}
image_maps = {'shape': 1, 'coords': 1}
anchor_reference = {'href': 1}
target_frame_info = {'target': 1}
tabbing_navigation = {'tabindex': 1}
access_keys = {'accesskey': 1}
tabbing_and_access = tabbing_navigation.copy()
tabbing_and_access.update(access_keys)
visual_presentation = {'height': 1, 'width': 1, 'border': 1, 'align': 1,
'hspace': 1, 'vspace': 1}
cellhalign = {'align': 1, 'char': 1, 'charoff': 1}
cellvalign = {'valign': 1}
font_modifiers = {'size': 1, 'color': 1, 'face': 1}
links_and_anchors = {'href': 1, 'hreflang': 1, 'type': 1, 'rel': 1, 'rev': 1}
borders_and_rules = {'frame': 1, 'rules': 1, 'border': 1}
from SGML import Markup, Comment
from XML import XMLPI
DOCTYPE = Markup("DOCTYPE",
'HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" ' \
'"http://www.w3.org/TR/REC-html40/loose.dtd"')
DOCTYPE_frameset = Markup("DOCTYPE",
'HTML PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN" ' \
'"http://www.w3.org/TR/REC-HTML/frameset.dtd"')
class Element(XMLPI):
defaults = {}
attr_translations = {'klass': 'class',
'label_for': 'for',
'http_equiv': 'http-equiv',
'accept_charset': 'accept-charset'}
def __init__(self, *content, **attr):
self.dict = {}
if not hasattr(self, 'name'): self.name = self.__class__.__name__
if self.defaults: self.update(self.defaults)
self.update(attr)
if not self.content_model and content:
raise TypeError, "No content for this element"
self.content = list(content)
def update(self, d2):
for k, v in d2.items(): self[k] = v
def __setitem__(self, k, v):
kl = lower(k)
if self.attlist.has_key(kl): self.dict[kl] = v
else: raise KeyError, "Invalid attribute for this element"
start_tag_string = "<%s %s>"
start_tag_no_attr_string = "<%s>"
end_tag_string = "%s>"
def str_attribute(self, k):
return self.attlist.get(k, 1) and '%s="%s"' % \
(self.attr_translations.get(k, k), str(self[k])) \
or self[k] and k or ''
def start_tag(self):
a = self.str_attribute_list()
return a and self.start_tag_string % (self.name, a) \
or self.start_tag_no_attr_string % self.name
def end_tag(self):
return self.content_model and self.end_tag_string % self.name or ''
class PrettyTagsMixIn:
def writeto(self, fp=stdout, indent=0, perlevel=2):
myindent = '\n' + " "*indent
fp.write(myindent+self.start_tag())
for c in self.content:
if hasattr(c, 'writeto'):
getattr(c, 'writeto')(fp, indent+perlevel, perlevel)
else:
fp.write(str(c))
fp.write(self.end_tag())
def __str__(self, indent=0, perlevel=2):
myindent = (perlevel and '\n' or '') + " "*indent
s = [myindent, self.start_tag()]
for c in self.content:
try: s.append(apply(c.__str__, (indent+perlevel, perlevel)))
except: s.append(str(c))
s.append(self.end_tag())
return join(s,'')
class CommonElement(Element): attlist = attrs
class PCElement(PrettyTagsMixIn, CommonElement): pass
class A(CommonElement):
attlist = {'name': 1, 'charset': 1}
attlist.update(CommonElement.attlist)
attlist.update(links_and_anchors)
attlist.update(image_maps)
attlist.update(target_frame_info)
attlist.update(tabbing_and_access)
class ABBR(CommonElement): pass
class ACRONYM(CommonElement): pass
class CITE(CommonElement): pass
class CODE(CommonElement): pass
class DFN(CommonElement): pass
class EM(CommonElement): pass
class KBD(CommonElement): pass
class PRE(CommonElement): pass
class SAMP(CommonElement): pass
class STRONG(CommonElement): pass
class VAR(CommonElement): pass
class ADDRESS(CommonElement): pass
class B(CommonElement): pass
class BIG(CommonElement): pass
class I(CommonElement): pass
class S(CommonElement): pass
class SMALL(CommonElement): pass
class STRIKE(CommonElement): pass
class TT(CommonElement): pass
class U(CommonElement): pass
class SUB(CommonElement): pass
class SUP(CommonElement): pass
class DD(PCElement): pass
class DL(PCElement): pass
class DT(PCElement): pass
class NOFRAMES(PCElement): pass
class NOSCRIPTS(PCElement): pass
class P(PCElement): pass
class AREA(PCElement):
attlist = {'name': 1, 'nohref': 0}
attlist.update(PCElement.attlist)
attlist.update(image_maps)
attlist.update(anchor_reference)
attlist.update(tabbing_and_access)
attlist.update(alternate_text)
class MAP(AREA): pass
class BASE(PrettyTagsMixIn, Element):
attlist = anchor_reference.copy()
attlist.update(target_frame_info)
content_model = None
class BDO(Element):
attlist = coreattrs.copy()
attlist.update(i18n)
class BLOCKQUOTE(CommonElement):
attlist = {'cite': 1}
attlist.update(CommonElement.attlist)
class Q(BLOCKQUOTE): pass
class BR(PrettyTagsMixIn, Element):
attlist = coreattrs
content_model = None
class BUTTON(CommonElement):
attlist = {'name': 1, 'value': 1, 'type': 1, 'disabled': 0}
attlist.update(CommonElement.attlist)
attlist.update(tabbing_and_access)
class CAPTION(Element):
attlist = {'align': 1}
attlist.update(attrs)
class COLGROUP(PCElement):
attlist = {'span': 1, 'width': 1}
attlist.update(PCElement.attlist)
attlist.update(cellhalign)
attlist.update(cellvalign)
class COL(COLGROUP): content_model = None
class DEL(Element):
attlist = {'cite': 1, 'datetime': 1}
attlist.update(attrs)
class INS(DEL): pass
class FIELDSET(PCElement): pass
class LEGEND(PCElement):
attlist = {'align': 1}
attlist.update(PCElement.attlist)
attlist.update(access_keys)
class BASEFONT(Element):
attlist = {'id': 1}
attlist.update(font_modifiers)
content_model = None
class FONT(Element):
attlist = font_modifiers.copy()
attlist.update(coreattrs)
attlist.update(i18n)
class FORM(PCElement):
attlist = {'action': 1, 'method': 1, 'enctype': 1, 'accept_charset': 1,
'target': 1}
attlist.update(PCElement.attlist)
class FRAME(PrettyTagsMixIn, Element):
attlist = {'longdesc': 1, 'name': 1, 'src': 1, 'frameborder': 1,
'marginwidth': 1, 'marginheight': 1, 'noresize': 0,
'scrolling': 1}
attlist.update(coreattrs)
content_model = None
class FRAMESET(PrettyTagsMixIn, Element):
attlist = {'rows': 1, 'cols': 1, 'border': 1}
attlist.update(coreattrs)
attlist.update(intrinsic_events)
class Heading(PCElement):
attlist = {'align': 1}
attlist.update(attrs)
def __init__(self, level, *content, **attr):
self.level = level
apply(PCElement.__init__, (self,)+content, attr)
def start_tag(self):
a = self.str_attribute_list()
return a and "" % (self.level, a) or "" % self.level
def end_tag(self):
return self.content_model and "\n" % self.level or ''
class HEAD(PrettyTagsMixIn, Element):
attlist = {'profile': 1}
attlist.update(i18n)
class HR(Element):
attlist = {'align': 1, 'noshade': 0, 'size': 1, 'width': 1}
attlist.update(coreattrs)
attlist.update(intrinsic_events)
content_model = None
class HTML(PrettyTagsMixIn, Element):
attlist = i18n
class TITLE(HTML): pass
class BODY(PCElement):
attlist = {'background': 1, 'text': 1, 'link': 1, 'vlink': 1, 'alink': 1,
'bgcolor': 1}
attlist.update(PCElement.attlist)
class IFRAME(PrettyTagsMixIn, Element):
attlist = {'longdesc': 1, 'name': 1, 'src': 1, 'frameborder': 1,
'marginwidth': 1, 'marginheight': 1, 'scrolling': 1,
'align': 1, 'height': 1, 'width': 1}
attlist.update(coreattrs)
class IMG(CommonElement):
attlist = {'src': 1, 'longdesc': 1, 'usemap': 1, 'ismap': 0}
attlist.update(PCElement.attlist)
attlist.update(visual_presentation)
attlist.update(alternate_text)
content_model = None
class INPUT(CommonElement):
attlist = {'type': 1, 'name': 1, 'value': 1, 'checked': 0, 'disabled': 0,
'readonly': 0, 'size': 1, 'maxlength': 1, 'src': 1,
'usemap': 1, 'accept': 1, 'border': 1}
attlist.update(CommonElement.attlist)
attlist.update(tabbing_and_access)
attlist.update(alternate_text)
content_model = None
class LABEL(CommonElement):
attlist = {'label_for': 1}
attlist.update(CommonElement.attlist)
attlist.update(access_keys)
class UL(PCElement):
attlist = {'compact': 0}
attlist.update(CommonElement.attlist)
class OL(UL):
attlist = {'start': 1}
attlist.update(UL.attlist)
class LI(UL):
attlist = {'value': 1, 'type': 1}
attlist.update(UL.attlist)
class LINK(PCElement):
attlist = {'charset': 1, 'media': 1}
attlist.update(PCElement.attlist)
attlist.update(links_and_anchors)
content_model = None
class META(PrettyTagsMixIn, Element):
attlist = {'http_equiv': 1, 'name': 1, 'content': 1, 'scheme': 1}
attlist.update(i18n)
content_model = None
class OBJECT(PCElement):
attlist = {'declare': 0, 'classid': 1, 'codebase': 1, 'data': 1,
'type': 1, 'codetype': 1, 'archive': 1, 'standby': 1,
'height': 1, 'width': 1, 'usemap': 1}
attlist.update(PCElement.attlist)
attlist.update(tabbing_navigation)
class SELECT(PCElement):
attlist = {'name': 1, 'size': 1, 'multiple': 0, 'disabled': 0}
attlist.update(CommonElement.attlist)
attlist.update(tabbing_navigation)
class OPTGROUP(PCElement):
attlist = {'disabled': 0, 'label': 1}
attlist.update(CommonElement.attlist)
class OPTION(OPTGROUP):
attlist = {'value': 1, 'selected': 0}
attlist.update(OPTGROUP.attlist)
class PARAM(Element):
attlist = {'id': 1, 'name': 1, 'value': 1, 'valuetype': 1, 'type': 1}
class SCRIPT(Element):
attlist = {'charset': 1, 'type': 1, 'src': 1, 'defer': 0}
class SPAN(CommonElement):
attlist = {'align': 1}
attlist.update(CommonElement.attlist)
class DIV(PrettyTagsMixIn, SPAN): pass
class STYLE(PrettyTagsMixIn, Element):
attlist = {'type': 1, 'media': 1, 'title': 1}
attlist.update(i18n)
class TABLE(PCElement):
attlist = {'cellspacing': 1, 'cellpadding': 1, 'summary': 1, 'align': 1,
'bgcolor': 1, 'width': 1}
attlist.update(CommonElement.attlist)
attlist.update(borders_and_rules)
class TBODY(PCElement):
attlist = CommonElement.attlist.copy()
attlist.update(cellhalign)
attlist.update(cellvalign)
class THEAD(TBODY): pass
class TFOOT(TBODY): pass
class TR(TBODY): pass
class TH(TBODY):
attlist = {'abbv': 1, 'axis': 1, 'headers': 1, 'scope': 1,
'rowspan': 1, 'colspan': 1, 'nowrap': 0, 'width': 1,
'height': 1}
attlist.update(TBODY.attlist)
class TD(TH): pass
class TEXTAREA(CommonElement):
attlist = {'name': 1, 'rows': 1, 'cols': 1, 'disabled': 0, 'readonly': 0}
attlist.update(CommonElement.attlist)
attlist.update(tabbing_and_access)
def CENTER(*content, **attr):
c = apply(DIV, content, attr)
c['align'] = 'center'
return c
def H1(content=[], **attr): return apply(Heading, (1, content), attr)
def H2(content=[], **attr): return apply(Heading, (2, content), attr)
def H3(content=[], **attr): return apply(Heading, (3, content), attr)
def H4(content=[], **attr): return apply(Heading, (4, content), attr)
def H5(content=[], **attr): return apply(Heading, (5, content), attr)
def H6(content=[], **attr): return apply(Heading, (6, content), attr)
class CSSRule(PrettyTagsMixIn, Element):
attlist = {'font': 1, 'font_family': 1, 'font_face': 1, 'font_size': 1,
'border': 1, 'border_width': 1, 'color': 1,
'background': 1, 'background_color': 1, 'background_image': 1,
'text_align': 1, 'text_decoration': 1, 'text_indent': 1,
'line_height': 1, 'margin_left': 1, 'margin_right': 1,
'clear': 1, 'list_style_type': 1}
content = []
content_model = None
def __init__(self, selector, **decl):
self.dict = {}
self.update(decl)
self.name = selector
start_tag_string = "%s { %s }"
def end_tag(self): return ''
def str_attribute(self, k):
kt = replace(k, '_', '-')
if self.attlist[k]: return '%s: %s' % (kt, str(self[k]))
else: return self[k] and kt or ''
def str_attribute_list(self):
return join(map(self.str_attribute, self.dict.keys()), '; ')
nbsp = " "
def quote_body(s):
r=replace; return r(r(r(s, '&', '&'), '<', '<'), '>', '>')
safe = string.letters + string.digits + '_,.-'
def url_encode(s):
l = []
for c in s:
if c in safe: l.append(c)
elif c == ' ': l.append('+')
else: l.append("%%%02x" % ord(c))
return join(l, '')
def URL(*args, **kwargs):
url_path = join(args, '/')
a = []
for k, v in kwargs.items():
a.append("%s=%s" % (url_encode(k), url_encode(v)))
url_vals = join(a, '&')
return url_vals and join([url_path, url_vals],'?') or url_path
def Options(options, selected=[], **attrs):
opts = []
for o, v in options:
opt = apply(OPTION, (o,), attrs)
opt['value'] = v
if v in selected: opt['selected'] = 1
opts.append(opt)
return opts
def Select(options, selected=[], **attrs):
return apply(SELECT, tuple(apply(Options, (options, selected))), attrs)
def Href(url, text, **attrs):
h = apply(A, (text,), attrs)
h['href'] = url
return h
def Mailto(address, text, subject='', **attrs):
if subject:
url = "mailto:%s?subject=%s" % (address, subject)
else:
url = "mailto:%s" % address
return apply(Href, (url, text), attrs)
def Image(src, **attrs):
i = apply(IMG, (), a)
i['src'] = src
return i
def StyledTR(element, row, klasses):
r = TR()
for i in range(len(row)):
r.append(klasses[i] and element(row[i], klass=klasses[i]) \
or element(row[i]))
return r
def StyledVTable(klasses, *rows, **attrs):
t = apply(TABLE, (), attrs)
t.append(COL(span=len(klasses)))
for row in rows:
r = StyledTR(TD, row[1:], klasses[1:])
h = klasses[0] and TH(row[0], klass=klasses[0]) \
or TH(row[0])
r.content.insert(0,h)
t.append(r)
return t
def VTable(*rows, **attrs):
t = apply(TABLE, (), attrs)
t.append(COL(span=len(rows[0])))
for row in rows:
r = apply(TR, tuple(map(TD, row[1:])))
r.content.insert(0, TH(row[0]))
t.append(r)
return t
def StyledHTable(klasses, headers, *rows, **attrs):
t = apply(TABLE, (), attrs)
t.append(COL(span=len(headers)))
t.append(StyledTR(TH, headers, klasses))
for row in rows: t.append(StyledTR(TD, row, klasses))
return t
def HTable(headers, *rows, **attrs):
t = apply(TABLE, (), attrs)
t.append(COL(span=len(headers)))
t.append(TR, tuple(map(TH, headers)))
for row in rows: t.append(TR(apply(TD, row)))
return t
def DefinitionList(*items, **attrs):
dl = apply(DL, (), attrs)
for dt, dd in items: dl.append(DT(dt), DD(dd))
return dl