Working draft, revised 2018-01-19 (UTC+01:00)
Published 2019-07-31T14:01Z
This document specifies the Open Braille Formatting Language (OBFL), a data format for representing text formatting layout. OBFL is an XML 1.0 application.
This document is a working draft.
The following changes have been introduced that break compatibility with preceding versions (listed by date of change):
[no changes]
After a period of one year from the time of the change, the change is no longer seen as incompatible. In practice, this means that:
In both cases, the change will be removed from the above list.
Changes to incubating features are not marked as compatibility breaking.
This section is informative.
The Open Braille Formatting Language (OBFL) is a document type that represents text formatting layout. It is designed for paged media, with additional support for media divided into physical volumes. Specifically, it is designed for braille, but it can also be used in other fixed character width contexts.
OBFL brings a number of things to braille production:
OBFL is a braille formatting language. This includes areas specific to braille formatting, such as volume splitting. However, there are many other issues in braille production that OBFL neither can nor should solve, for example controlling text to braille translation (except by means of formatting related abstractions).
This section is normative.
The following terms and definitions are used within this document.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
The use of axes in this document (such as "child", "parent", "ancestor", "descendant") is consistent with Section 2.2 of the XPath 1.0 Recommendation [XPath].
The use of node types in this document (such as "root", "element", "text", "attribute") is consistent with Section 5 of the XPath 1.0 Recommendation [XPath].
This specification is based on the specific versions of the standards and specifications referenced herein, which are used as defined except as noted in this document. Any refinement or replacement of a referenced specification by a newer or different version is not directly applicable to this standard. Conformance to this standard is based on the versions of the standards and specifications in effect at the time of writing.
This section is normative.
A conforming OBFL document is an XML document that requires only the facilities described as mandatory in this specification. Such a document must meet all of the following criteria:
The OBFL namespace may be used with other XML namespaces as per [XMLNS].
The Internet media type for OBFL is "application/x-obfl+xml".
It is recommended that OBFL files have the extension ".obfl" (all lowercase) on all platforms.
This section is normative.
While processing XML-documents, white space is often introduced unintentionally, i.e. without the intent to carry meaning.
White space can be introduced by a graphical editor. Incorrect style boundaries - i.e. boundaries that include surrounding white space - are visually undetectable and usually not corrected by WYSIWYG editors.
White space can be introduced as a result of pretty printing. Pretty printing is often employed by users of XML-editors as it improves readability and navigability. However, it is an inexact process which makes certain assumptions about the XML-document. Most algorithms do not change elements that evidently contain both non white space text nodes and elements. However, pretty printing algorithms often assume that an element that only has non-text children can be reformatted, even if:
White space can be introduced manually. For example, by adding an XML comment:
<block> <!-- Just an innocent comment at the start of a new block --> We offer free <style name="em">technical support</style> for subscribers.</block>
or when nesting block elements:
<block> <block>We offer free <style name="em">technical support</style> for subscribers.</block> </block>
That the above could have rendering implications is often unknown to users working with XML-files.
The implications of incorrect white space are more severe in braille than in visual media. This is true in particular for inline elements such as style markers, because they are added as separate characters in braille.
In a word processor this:
<p>We offer free<em> technical support </em>for subscribers.</p>
is visually equivalent to:
<p>We offer free <em>technical support</em> for subscribers.</p>
In braille, however, the former will be perceived as an error, as the braille markers will be disjoint from the text.
The following applies to all characters defined as white space by [Unicode] with the exception of 'NO-BREAK SPACE' (U+00A0). To affect rendering, a no-break space must be surrounded by non white space characters on both sides.
Note that line breaks are also white space characters. These do not constitute line breaks in OBFL. Authors should use appropriate elements and attributes to achieve formatting effects that involve white space, rather than space characters.
Also note that the Unicode characters 'ZERO WIDTH SPACE' (U+200B) and 'BRAILLE PATTERN BLANK' (U+2800) are not regarded as white space by the Unicode specification and should be preserved.
Sequences of white space separate "words" (we use the term "word" here to mean "sequences of non-white space characters"). When formatting text, user agents should identify these words and lay them out according to the conventions of the particular written language (script). This typically involves putting space between words.
Note that a sequence of white spaces between words in the source document may result in a different rendered inter-word spacing. In particular, user agents should collapse input white space sequences when producing output inter-word space.
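The collapsing behavior can be illustrated with a small sketch. Assuming a user agent collapses each sequence of white space between words into a single inter-word space, the two blocks below would be expected to render identically. The line break inside the first block is ordinary white space and does not produce a line break in the output. The text content is made up for illustration.

```xml
<!-- Source with a run of spaces and a line break between words -->
<block>We   offer
free support.</block>

<!-- Expected to be equivalent after white space collapsing -->
<block>We offer free support.</block>
```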
A user agent should by default process white space according to the following:
This section is normative.
The following describes the default pagination behavior and applies unless otherwise specified.
Pages in the main flow are counted together in document order, starting from 1.
Pages in pre-content and post-content are counted together within the boundaries of each instance of these containers, starting from 1 each time.
In a sequence where duplex is not enabled, only the front sides of sheets are counted and paginated. The back sides of sheets are neither counted nor paginated.
In a sequence where duplex is enabled, both sides of sheets are counted and paginated. However, if the last sheet of a duplex sequence only has content on the front side, the back side is counted, but not paginated.
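The following sketch shows how the duplex setting might be read together with the counting rules above. The master name and the page dimensions are hypothetical, and the statements in the comments are an interpretation of the default behavior described in this section rather than additional requirements.

```xml
<!-- Hypothetical master name; duplex="true" means both sides of each sheet are counted. -->
<layout-master name="double-sided" page-width="10" page-height="6" duplex="true">
  <default-template>
    <header><field><current-page/></field></header>
    <footer/>
  </default-template>
</layout-master>

<!-- If this sequence fills five pages, they occupy three sheets and are numbered 1 to 5.
     The empty back side of the third sheet is counted, but not paginated.
     With duplex="false", the same content would occupy five sheets, and only
     their front sides would be counted and paginated. -->
<sequence master="double-sided">
  <block>...</block>
</sequence>
```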
This section is normative.
The page model describes the relationships between regions on a page.
The following applies on the front side of a sheet or when only one side of the sheet is used.
The inner margin is to the left and the outer margin is to the right. The inner and outer margins span the full height of the page.
The left margin region is to the left of the text body. The right margin region is to the right of the text body. Stacked above and below these three areas (and spanning their full width) are the page areas, header and footer. The top and bottom page areas are closest to the center. The header and footer are closest to the edge of the paper.
Note that if text flow is allowed in the header/footer area, the area of the header/footer is reduced to align with the text body area so that the margin area can be used as well.
The following table is a graphical representation of this description.
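Inner margin | A | Header | A | Outer margin |
Inner margin | Top page area | Top page area | Top page area | Outer margin |
Inner margin | Left margin region(s) | Text body | Right margin region(s) | Outer margin |
Inner margin | Bottom page area | Bottom page area | Bottom page area | Outer margin |
Inner margin | A | Footer | A | Outer margin |
A label repeated across adjacent cells denotes a single region spanning those cells.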
A = Margin region if text flow is combined with header/footer and there is a margin region, otherwise this area is available to header/footer.
The only difference between the recto and the verso side is that the outer and inner margins have switched places.
This section is informative.
In this listing, unprefixed elements are in the OBFL namespace.
The obfl element is the root element of an OBFL document.
The top level element structure of an OBFL document is as follows:
<obfl>
  <meta>...</meta>
  <layout-master>...</layout-master>
  <file-reference>...</file-reference>
  <xml-processor>...</xml-processor>
  <renderer>...</renderer>
  <table-of-contents>...</table-of-contents>
  <volume-template>...</volume-template>
  <collection>...</collection>
  <sequence>...</sequence>
</obfl>
The meta element contains metadata, e.g. Dublin Core. Currently, any element is allowed; however, a stricter set of elements might be more useful.
The layout-master element defines the appearance of pages in the sequences where the layout-master is applied. It contains page template information, such as the width and height of pages.
The template element contains template information that applies under specific conditions. The template conditions are tested in document order, and the first template whose condition returns true is applied.
Headers, footers and page-areas span the full width of the page, whereas the text flow area is reduced by the combined width of the margin-regions.
The default-template element contains the same type of information as the template element and acts as a fallback when no previous template element applies.
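As a sketch of the template selection order, the layout-master below contains one conditional template followed by the fallback. The attribute name use-when, the variable $page and the % operator are assumptions that are not defined in this section; the prefix notation follows the expression style used with the evaluate element. The master name, page dimensions and field contents are hypothetical.

```xml
<layout-master name="body" page-width="10" page-height="6" duplex="true">
  <!-- Assumed: use-when holds the condition and $page is the current page number. -->
  <template use-when="(= (% $page 2) 0)">
    <header><field><string value="even"/></field></header>
    <footer/>
  </template>
  <!-- Fallback when no preceding template applies (here: odd pages). -->
  <default-template>
    <header><field><current-page/></field></header>
    <footer/>
  </default-template>
</layout-master>
```

Because conditions are tested in document order, a more specific template would need to precede a more general one to have any effect.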
The page-area element is a page anchored place-holder for items in a collection.
The fallback element is used to determine what should happen if rendering of the collection fails. The fallback element must include an instruction for the collection that triggered it, but it may also include additional instructions for other collections.
The rename element is used within the fallback element to specify a collection that should be renamed. The new name must not be in use as an identifier elsewhere (but it should be referenced).
The before element contains blocks to insert before the contents of the page area.
The after element contains blocks to insert after the contents of the page area.
The header element contains the page header information. The chunks of text created when resolving each child element's contents are distributed in an equally spaced cell grid over a single row.
The footer element contains the page footer information. The chunks of text created when resolving each child element's contents are distributed in an equally spaced cell grid over a single row.
The margin-region element adds a margin region.
The indicators element uses the full height of the margin-region to indicate rows where one of the specified conditions is met.
The conditions should be evaluated in the order they occur in this element. In other words, only the first match is rendered when more than one condition applies to a given row.
The marker-indicator element indicates that one or more of the specified markers are present on the indicated row.
Given the following input:
<layout-master name="default" ...>
  <default-template>
    <header/>
    <footer/>
    <margin-region align="left">
      <indicators>
        <marker-indicator markers="pages sections" indicator="!"/>
        <marker-indicator markers="images" indicator="?"/>
      </indicators>
    </margin-region>
  </default-template>
</layout-master>
...
<sequence master="default">
  <block margin-left="1"><marker class="sections" value="1"/>Page 1</block>
  <block margin-left="1">Page 2<marker class="pages" value="2"/></block>
</sequence>
The output would be:
! Page 1
! Page 2
The field element contains data.
The marker-reference element indicates that it should resolve to the value of a marker of the same class in the direction and scope specified by this element. If no reference is found, the empty string is returned.
The start-offset attribute offsets the start of the search by the specified number of pages. This applies irrespective of the scope of the search.
By default, the search begins on the page where the marker-reference element is placed. The search, once combined with a non-zero start-offset, could start on a non-existing page. If the search direction is the same as the offset direction, the search would never hit an existing page and the empty string is returned. However, if the search direction is opposite to the offset direction, the search could eventually hit an existing page. Depending on the scope, the following applies in this particular situation:
The string element indicates that it should resolve to the contents of its value attribute.
The current-page element indicates that it should resolve to the value of the current page, using the number format specified by this element.
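As a sketch of how these field contents combine, the header below places a string element and a current-page element in the same field. The number-format value is the one used in the page number counter example in this document; the literal text is hypothetical. On the first page, the field would be expected to resolve to "#A".

```xml
<header>
  <!-- string resolves to its value attribute; current-page to the formatted page number -->
  <field><string value="#"/><current-page number-format="upper-alpha"/></field>
</header>
```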
The sequence element contains the text body.
The dynamic-sequence is similar to a regular sequence. The main difference is that it can contain elements that are generated based on the placement of the body of text. Another difference is that its existence depends on the existence of generated content. Without any generated content, the sequence should be excluded entirely.
The block element defines a block of text.
The evaluate element inserts the result of evaluating an expression, see the OBFL evaluation language for more information.
<evaluate expression="(= 0 1)"/>
The leader element moves the location of the following piece of text within the current row.
The marker element provides locations for classes of values that may be referred to in headers and footers.
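The relationship between marker and marker-reference can be sketched as follows. The marker attributes (class, value) are taken from the examples in this document, whereas the marker-reference attribute names (marker, direction, scope) and their values are assumptions made for illustration. The master name and page dimensions are hypothetical.

```xml
<layout-master name="default" page-width="10" page-height="6" duplex="false">
  <default-template>
    <!-- Assumed attributes on marker-reference: marker, direction, scope -->
    <header><field><marker-reference marker="sections" direction="backward" scope="sequence"/></field></header>
    <footer/>
  </default-template>
</layout-master>

<sequence master="default">
  <block><marker class="sections" value="1"/>Section one</block>
  <block><marker class="sections" value="2"/>Section two</block>
</sequence>
```

Under these assumptions, the header would be expected to show the value of a marker of class "sections" found by the search, or the empty string if no such marker is found.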
The anchor element references an item in a collection, suggesting that the item has a connection to the text near the location of the anchor.
The br element provides a line break within a block of text (i.e. a paragraph).
The page-number element inserts the page number of the referenced element.
The span element represents a segment of text whose hyphenation policy or language is different from the surrounding text.
The style element represents a segment of text with a specific style, e.g. emphasis or strong.
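A minimal sketch combining the inline elements described above in a single block. The text and the address are hypothetical; the style name "em" is the one used in the white space examples in this document. Note that the inline elements are placed tightly against the text they apply to, so that no unintended white space ends up inside them.

```xml
<block>Opening hours<br/>We offer free <style name="em">technical support</style> at <span hyphenate="false">support.example.org</span>.</block>
```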
The table element represents data with more than one dimension, in the form of a table.
Table cell borders collapse unless table-col-spacing > 0 or table-row-spacing > 0.
Table borders will collapse with cell borders, if not separated by padding. Table borders take precedence over cell borders, unless the table's border style is 'none'.
Algorithm for selecting a border between a cell and the table border:
Table cell borders will combine to form a single border that is a mix of the two borders.
Algorithm for selecting a shared border for cells A and B:
Note that the presence of a border on one cell will offset the placement of all cells within that row and column to the same extent, even if they themselves do not have borders. Cells that do not have borders will have extra empty space in the border's place.
The thead element represents a row of repeating header cells in a table. The cells in the header will repeat at the start of each page that the table spans.
Note that the identifier attribute will be disregarded for all descendant elements of thead.
The tbody element represents the body cells in a table.
The tr element represents a row of cells in a table.
The td element represents a data cell in a table.
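The table elements above might be combined as in the following sketch. It assumes that thead and tbody both contain tr elements and that table-col-spacing is set on the table element; the cell contents are hypothetical.

```xml
<!-- table-col-spacing > 0, so cell borders (if any) would not collapse -->
<table table-col-spacing="1">
  <thead>
    <!-- Repeated at the start of each page that the table spans -->
    <tr><td>Item</td><td>Price</td></tr>
  </thead>
  <tbody>
    <tr><td>Paper</td><td>10</td></tr>
    <tr><td>Ink</td><td>25</td></tr>
  </tbody>
</table>
```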
The table-of-contents element defines the entries in a table of contents. Note that this element specifies the formatting of data in a table of contents; it does not by itself specify that a table of contents should be inserted. It is required that the order of the toc-entries is consistent with the blocks they reference.
The toc-entry element defines an entry in a table of contents.
The toc-sequence element contains the instructions needed to generate TOCs for braille books. Since the number of volumes and their contents is unknown prior to the layout process, and since the TOC in braille books may be different for each volume, it is not sufficient to define the TOC before applying the layout as in XSL-FO. Note that a toc-sequence only contains static contents inserted while generating the TOC (on-toc-start, on-volume-start, on-volume-end and on-toc-end). The actual TOC entries are collected from the referenced TOC data.
The on-toc-start element defines blocks of text to insert before the TOC data.
The on-volume-start element defines blocks of text to insert before the TOC data or collection items of a volume. Note that this element only applies to document range toc-sequences and list-of-references.
The on-volume-end element defines blocks of text to insert after the TOC data or collection items of a volume. Note that this element only applies to document range toc-sequences and list-of-references.
The on-toc-end element defines blocks of text to insert after the TOC data.
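The pieces described above might fit together as in the following sketch. The element nesting follows the descriptions in this document, but the attribute names name, ref-id, toc and range, as well as the master name and all text, are assumptions made for illustration.

```xml
<!-- Assumed attribute names: name, ref-id, toc, range. -->
<table-of-contents name="full-toc">
  <toc-entry ref-id="ch1">Chapter 1</toc-entry>
  <toc-entry ref-id="ch2">Chapter 2</toc-entry>
</table-of-contents>

<volume-template>
  <pre-content>
    <toc-sequence master="body" toc="full-toc" range="document">
      <on-toc-start>
        <block>Contents</block>
      </on-toc-start>
      <on-volume-start>
        <block>Volume</block>
      </on-volume-start>
      <on-toc-end>
        <block>End of contents</block>
      </on-toc-end>
    </toc-sequence>
  </pre-content>
</volume-template>
```

The toc-sequence itself carries only the static blocks; the entries are expected to be pulled from the referenced table-of-contents during layout.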
The volume-template element contains static content that is to be inserted before or after a braille volume's body of pages. In addition to the normal sequences that can be used to create static content, such as information about the author and title of the book, there is a special sequence called toc-sequence.
The pre-content element defines contents that should precede a braille volume's body of pages.
The post-content element defines contents that should follow a braille volume's body of pages.
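A minimal sketch of a volume-template with static content before and after the body of each volume. It assumes that pre-content and post-content hold ordinary sequences, as described above; the master name and the text are hypothetical.

```xml
<volume-template>
  <pre-content>
    <!-- Inserted before the volume's body of pages -->
    <sequence master="body">
      <block>Title of the book</block>
    </sequence>
  </pre-content>
  <post-content>
    <!-- Inserted after the volume's body of pages -->
    <sequence master="body">
      <block>End of volume</block>
    </sequence>
  </post-content>
</volume-template>
```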
This feature is incubating (2018-01-18). This means that the feature is currently a work-in-progress and may change or disappear at any time.
The volume-transition element defines a volume transition.
When this element is present, the last page or sheet in each volume (except the last volume) may be modified so that the volume break occurs earlier than usual: preferably between two blocks, or if that is not possible, between words. The leftover content is moved to the next volume. If volume-keep-priority is used, it will be used to determine a suitable break point.
When this element is not present, the last page is broken in the same way as all other pages.
<volume-transition>
  <!-- If the last block has to be split up, use the following. -->
  <block-interrupted>[The paragraph continues in the next volume. --Ed.]</block-interrupted>
  <block-resumed>[Paragraph continued. --Ed]</block-resumed>
  <!-- If the last block is short enough to fit on the page, use the following. -->
  <sequence-interrupted>
    <block>[The chapter continues in the next volume. --Ed.]</block>
  </sequence-interrupted>
  <sequence-resumed>
    <block>[Continued from previous volume. --Ed.]</block>
  </sequence-resumed>
  <any-interrupted>
    <block>--- End of volume ---</block>
  </any-interrupted>
  <any-resumed>
    <block>--- Volume begins ---</block>
  </any-resumed>
</volume-transition>
This feature is incubating (2018-01-18). This means that the feature is currently a work-in-progress and may change or disappear at any time.
The block-interrupted element contains text to insert when a block is interrupted before a volume break.
This feature is incubating (2018-01-18). This means that the feature is currently a work-in-progress and may change or disappear at any time.
The block-resumed element contains text to insert when a block is resumed after a volume break.
This feature is incubating (2018-01-18). This means that the feature is currently a work-in-progress and may change or disappear at any time.
The sequence-interrupted element contains blocks to insert when a sequence is interrupted before a volume break.
This feature is incubating (2018-01-18). This means that the feature is currently a work-in-progress and may change or disappear at any time.
The sequence-resumed element contains blocks to insert when a sequence is resumed after a volume break.
This feature is incubating (2018-09-24). This means that the feature is currently a work-in-progress and may change or disappear at any time.
The any-interrupted element contains blocks to insert in the main flow immediately before a volume break, regardless of where in the flow the volume is interrupted. Any conditional volume-transitions are rendered before this one.
This feature is incubating (2018-09-24). This means that the feature is currently a work-in-progress and may change or disappear at any time.
The any-resumed element contains blocks to insert in the main flow immediately after a volume break, regardless of where in the flow the volume was resumed. Any conditional volume-transitions are rendered after this one.
The collection element defines items to be placed somewhere within the flow, typically as page positioned elements, such as footnotes.
Defines the formatting of an item that should be inserted somewhere in the flow. The behavior of this element is similar to the toc-entry element, except that nesting of items is not allowed.
The list-of-references element contains instructions needed to render a collection of items ordered by page of reference. Note that a list-of-references only contains static contents inserted while generating the sequence. The actual items are collected from the referenced collection.
The on-collection-start element defines blocks of text to insert before the collection items.
The on-page-start element defines blocks of text to insert before the collection items on a page.
The on-page-end element defines blocks of text to insert after the collection items on a page.
The on-collection-end element defines blocks of text to insert after the collection items.
The xml-data element contains unprocessed XML in any namespace that may be represented in more than one way, depending on some aspect of the rendered result. Note that the contents of this element must be usable as a stand-alone source document.
<xml-data xmlns:html="http://www.w3.org/1999/xhtml">
  <html:table>
    <html:tr>
      <html:td>table cell</html:td>
    </html:tr>
  </html:table>
</xml-data>
The renderer element defines rendering options for unprocessed xml-data.
All scenarios that qualify will be rendered, and the scenario with the lowest cost will be selected. If a scenario cannot be rendered for some reason, it is assumed to have an infinite cost. It is an error if no scenario can be applied to a given xml-data.
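The selection rule can be sketched with a renderer that offers two scenarios for the same input. The attribute names (qualifier, processor, cost) and the variable $total-height are those used in the rendering-scenario example in this document; the processor names and the simplified qualifier are hypothetical.

```xml
<renderer name="table-renderer">
  <!-- Both scenarios are assumed to qualify for any html:table. -->
  <rendering-scenario xmlns:html="http://www.w3.org/1999/xhtml"
      qualifier="/html:table"
      processor="simple-table-processor"
      cost="$total-height"/>
  <rendering-scenario xmlns:html="http://www.w3.org/1999/xhtml"
      qualifier="/html:table"
      processor="list-table-processor"
      cost="(+ $total-height 10)"/>
</renderer>
```

If both scenarios render to the same height, the first has the lower cost and would be selected. If the first cannot be rendered, its cost is treated as infinite, so the second would be selected instead.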
The rendering-scenario element defines a rendering scenario.
<renderer name="table-renderer">
  <rendering-scenario xmlns:html="http://www.w3.org/1999/xhtml"
      qualifier="/html:table/max(html:tr/sum(html:td/(if (@colspan) then @colspan else 1)))&lt;=2"
      processor="simple-table-processor"
      cost="(+ (- 30 $min-block-width) $total-height)"/>
</renderer>
Defines a parameter for a rendering-scenario.
The xml-processor element defines a pre-processor for xml-data. The output of the processor must be elements in the obfl namespace and the elements must be valid in the context where they are inserted.
Currently, the only defined processor is xsl:stylesheet, i.e. an XSLT document as specified in the XSL Transformations Recommendation [XSLT]. However, additional elements might be supported in future versions.
<xml-processor name="example-processor">
  <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="no"/>
    <!-- Makes a verbatim copy of the input, which is assumed to be in obfl namespace -->
    <xsl:template match="/">
      <xsl:copy-of select="*"/>
    </xsl:template>
  </xsl:stylesheet>
</xml-processor>
The xml-processor-result element is an optional root node for the xml-processor result.
<xml-processor name="example-processor">
  <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
      <xml-processor-result>
        <xsl:apply-templates select="//tr"/>
      </xml-processor-result>
    </xsl:template>
  </xsl:stylesheet>
</xml-processor>
The file-reference element defines a file reference used in an xml-processor.
Currently, the only defined processor is xsl:stylesheet. Consequently, the only supported file reference content is an XSLT document. However, additional content might be supported in future versions.
<file-reference uri="import.xsl">
  <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="tr">
      <xsl:value-of select="string(.)"/>
    </xsl:template>
  </xsl:stylesheet>
</file-reference>
<xml-processor name="example-processor">
  <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:import href="import.xsl"/>
    <xsl:template match="/">
      <xml-processor-result>
        <xsl:apply-templates select="//tr"/>
      </xml-processor-result>
    </xsl:template>
  </xsl:stylesheet>
</xml-processor>
The overall hyphenation policy should be determined by the implementing application, for example per job or in a system setting. A user's preference should be respected, unless doing so would render the text difficult to read or interpret for some reason. The hyphenate attribute can be used when detailed control over hyphenation is required, e.g. to ensure proper rendering of content that may be difficult to interpret if hyphenated, such as hyperlinks.
The hyphenation specified by hyphenate applies to all elements in its content unless overridden with another instance of hyphenate. In particular, the empty value of hyphenate is used on an element B to override a specification of hyphenate on an enclosing element A, without specifying another hyphenation policy. Within B, the user's preference for hyphenation should be applied.
The value of the hyphenate attribute can either be 'true' or 'false'. The value 'true' indicates that a hyphenation algorithm should be applied to the text contents, i.e. add hyphenation information to the text. The value 'false' indicates that a hyphenation algorithm should NOT be applied to the text contents. However, obvious hyphenation points that were already in the text to begin with may be used to hyphenate. Examples of hyphenation points that may be used to hyphenate even when the value of the hyphenate attribute is 'false' include (but are not limited to): SOFT HYPHEN (U+00AD) and ZERO WIDTH SPACE (U+200B).
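The override mechanism can be sketched as follows, with the block playing the role of element A and the span the role of element B. The text and the address are hypothetical, and the use of hyphenate on both block and span is assumed.

```xml
<!-- A: no hyphenation algorithm is applied to the text of this block -->
<block hyphenate="false">Visit support.example.org for details. <span hyphenate="">Text in this span follows the user's preference again.</span></block>
```

The empty value on the span does not select a policy of its own; it only removes the effect of the enclosing hyphenate="false".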
The following table illustrates the output for different combinations of input text and hyphenate attribute value (the first row has room for three more characters; "/" marks a row break in the output):
input | hyphenate | output |
---|---|---|
ex-ample | true | ex- / ample |
ex-ample | false | ex- / ample |
example | true | ex- / ample |
example | false | |
The page-number-counter attribute specifies a name for a page number counter to use.
By default, all sequences in the text body are counted together, whereas sequences in pre-content and post-content are counted together only within the boundaries of each instance. This behavior can be changed by using this attribute.
Pages in sequences having the same value for this attribute will have consecutive page numbers in document order.
The value must be a token, but since this value is only used for grouping, no other restrictions apply.
Note that:
initial-page-number attribute.
initial-page-number attribute will affect the page numbers in following sequences with the same counter name.
The following will result in three pages with the page numbers: 1, A, 2:
<layout-master name="body" page-width="10" page-height="6" duplex="false">
  <default-template>
    <header><field><current-page/></field></header>
    <footer/>
  </default-template>
</layout-master>
<layout-master name="insert" page-width="10" page-height="6" duplex="false">
  <default-template>
    <header><field><current-page number-format="upper-alpha"/></field></header>
    <footer/>
  </default-template>
</layout-master>
<sequence master="body">
  <block>A</block>
</sequence>
<sequence master="insert" page-number-counter="insert-counter">
  <block>B</block>
</sequence>
<sequence master="body">
  <block>C</block>
</sequence>
The row-spacing attribute specifies the row spacing, in row heights. For example, 1.0 is normal row spacing and 2.0 is double row spacing.
Row spacing affects the appearance of the following vertical measurements that are expressed in rows:
For example, if row-spacing is 2.0, a vertical margin or padding of 2 will render a spacing equal to four row heights.
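The arithmetic in this example can be written out as a sketch. It assumes that row-spacing and margin-top can both be set on the same block; the text is hypothetical.

```xml
<!-- Assumed: row-spacing and margin-top are both set on the block.
     A margin-top of 2 rows at row-spacing 2.0 renders as 2 x 2.0 = 4 row heights. -->
<block row-spacing="2.0" margin-top="2">Double-spaced text.</block>
```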
The overall translation policy should be determined by the user of the application, for example as a job parameter or a system setting. A user's preference should be respected whenever possible. Setting the translate attribute on the obfl element is generally not recommended. The translate attribute should be used when detailed control over translation is required, e.g. when combining different languages or in books about braille code.
The value of the translate attribute can be 'pre-translated', 'grade0', 'grade1', 'grade2' or 'grade3'.
The translation specified by translate applies to all elements in its content unless overridden with another instance of translate. In particular, the empty value of translate is used on an element B to override a specification of translate on an enclosing element A, without specifying another translation. Within B, the user's preference for translation should be applied.
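A sketch of the override mechanism, with the block as element A and the second span as element B. The attribute values are taken from the list of allowed values above; the text is hypothetical, and the use of translate on both block and span is assumed.

```xml
<!-- A: grade2 applies to the block's text; the first span switches to grade1;
     B: the second span falls back to the user's preference -->
<block translate="grade2">Contracted text with an uncontracted term: <span translate="grade1">braille</span>. <span translate="">This part follows the user's preference.</span></block>
```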
The value 'pre-translated' indicates that the content is already braille and hence does not need processing by a complete braille translator to produce braille output. When using this value, text nodes MUST contain a combination of braille characters [U+2800-U+28FF] and the following:
Note that:
Adjoining margins of two or more blocks (which might or might not be siblings) can combine to form a single margin. Margins that combine this way are said to collapse, and the resulting combined margin is called a collapsed margin.
Horizontal margins never collapse.
Vertical margins collapse when:
When two or more margins collapse, the resulting margin width is the maximum of the collapsing margins' widths. If the top and bottom margins of a block are adjoining, then it is possible for margins to collapse through it.
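The maximum rule can be illustrated with two sibling blocks. The sketch assumes that the bottom margin of the first block and the top margin of the second block are adjoining, i.e. that nothing separates them; the master name and the text are hypothetical.

```xml
<sequence master="body">
  <!-- The margins collapse to max(2, 1) = 2 rows between the blocks, not 2 + 1 = 3 -->
  <block margin-bottom="2">First paragraph.</block>
  <block margin-top="1">Second paragraph.</block>
</sequence>
```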
The following figure illustrates some of the available block properties.
[Figure: a block with margin-top above it, margin-left and margin-right on either side, and margin-bottom below it.]
The rule sets mentioned below are included in the zip file for this specification. Users looking for local copies of the rule sets to work with should download and use this archive rather than using the specific references below.
This section is normative.
The Relax NG rule set (XML syntax) "validation/obfl.rng" forms a normative part of this specification.
This section is informative.
The XML schema rule set "validation/obfl.xsd" forms an informative part of this specification.
The Relax NG (Compact syntax) rule set "validation/obfl.rnc" forms an informative part of this specification.
This section is normative.
This section is informative.