Format XML: indent, align, beautify and clean up XML

To format XML data, open the file or paste the document in firstobject's free XML editor (foxe) and press F8 to indent the XML, or Shift+F8 to align it against the left margin. The XML formatter, or "XML beautifier," puts each element on its own line, which makes the document easier to read, and it is fast even on relatively large files of over a megabyte. In Tools -> Preferences you can set the type of indent and have attributes placed on separate lines.

Indent XML

The following example shows a simple unformatted XML document with line breaks at odd places.

<config><diagnostics><path>C:\temp</path>
<proxy usedefault="true"/></diagnostics></config>

The Indent (F8) function moves all elements, comments and processing instructions to start on separate lines and indents them to show their hierarchical relationship. Specify indentation by tabs or spaces in Tools -> Preferences. The content of elements with text content or CDATA sections is not modified.

<config>
    <diagnostics>
        <path>C:\temp</path>
        <proxy usedefault="true"/>
    </diagnostics>
</config>

Since text content is not modified even if it has multiple lines, the appearance will not be as attractive if the document contains multiline text content. However, for simple XML documents the indent XML function produces a very nice result.
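The Indent behavior described above can be approximated in a few lines of Python. This is a minimal sketch and not the editor's actual implementation: the function name indent_xml and the tokenizing regular expression are assumptions made for illustration, and the sketch does not handle mixed content, or comments, CDATA sections and attribute values that contain a ">" character.

```python
import re

# Hypothetical tokenizer: a tag is "<...>", anything else is text.
# Assumes no ">" inside comments, CDATA sections or attribute values.
TOKEN = re.compile(r"<[^>]+>|[^<]+")

def indent_xml(xml: str, indent: str = "    ") -> str:
    """Put each element on its own line, indented by nesting depth."""
    lines = []
    depth = 0
    inline = False  # True when the current line ends with text content
    for tok in TOKEN.findall(xml):
        if not tok.startswith("<"):
            if tok.strip() and lines:
                lines[-1] += tok  # text content is copied unmodified
                inline = True
            continue  # whitespace-only text between tags is dropped
        if tok.startswith("</"):
            depth = max(depth - 1, 0)
            if inline:
                lines[-1] += tok  # keep <path>text</path> on one line
            else:
                lines.append(indent * depth + tok)
        else:
            lines.append(indent * depth + tok)
            # Only a plain start tag opens a new nesting level.
            if not (tok.endswith("/>") or tok.startswith(("<?", "<!"))):
                depth += 1
        inline = False
    return "\n".join(lines)

unformatted = (
    r"<config><diagnostics><path>C:\temp</path>" "\n"
    r'<proxy usedefault="true"/></diagnostics></config>'
)
print(indent_xml(unformatted))
```

For the sample document this prints the same six indented lines shown above. Passing indent="" to the sketch gives output with every element on its own line against the left margin.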

Comment posted: cannot get indenting to work

Len Conrad 06-Feb-2009

Hate to bother you about free software, but ... FirstObject [editor] is really cool, just what we need to edit pfSense-exported firewall config but I cannot get the indenting nor data-on-separate-line to work.

  <?xml version="1.0" ?> 
- <pfsense>
  <version>3.0</version> 
  <lastchange /> 
  <theme>pfsense-dropdown</theme> 
- <system>
  <optimization>normal</optimization> 

Those dashes shouldn't be there; it looks like you copied and pasted the XML from Internet Explorer or another browser. If the XML file is presented with minus signs in a browser, you need to get the XML file itself by going to Save As from the File menu. With the above example, the firstobject XML editor does nothing when you tell it to format the XML document because the extra dashes confuse it into thinking the root element contains mixed content.

Copying and pasting from that type of browser view is not reliable because it is a displayable view of the XML and not the actual XML. You get not only extra dashes and plus signs but also decoded character references, such as quotes in attribute values, that make the document ill-formed. If you don't have access to the original XML, select the dash, the space and the less-than sign at the start of a tag (the "- <" in - <pfsense>), press Ctrl+H and Replace All with just a less-than sign. That will probably not disturb any legitimate values in your XML, and Indent (F8) will then work provided there are no problematic character references in the document.
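If the pasted text is already in a file, the same cleanup can be scripted. The following is a minimal Python sketch under stated assumptions: the function name strip_browser_markers is hypothetical, and the pattern only removes a dash or plus sign that begins a line and is followed by a space and a tag, which is slightly stricter than a plain Replace All.

```python
import re

# Matches a "-" or "+" tree marker at the start of a line, followed by a
# space and then the "<" that opens a tag. Anchoring to the line start is an
# assumption meant to avoid touching "- <" sequences inside text content.
MARKER = re.compile(r"(?m)^([ \t]*)[-+] (?=<)")

def strip_browser_markers(text: str) -> str:
    """Remove browser tree-view markers, keeping any leading whitespace."""
    return MARKER.sub(r"\1", text)

pasted = "\n".join([
    '  <?xml version="1.0" ?>',
    "- <pfsense>",
    "  <version>3.0</version>",
    "- <system>",
    "  <optimization>normal</optimization>",
])
print(strip_browser_markers(pasted))
```

This only removes the markers. It does not repair character references that the browser has already decoded, so the result can still be ill-formed.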

Align XML

The Align (Shift+F8) function moves all elements to start on separate lines, but the lines are not indented. This takes up less disk space than indentation and is sometimes preferred for usability with word wrap or when there are a lot of multi-line data elements.

<config>
<diagnostics>
<path>C:\temp</path>
<proxy usedefault="true"/>
</diagnostics>
</config>

Attributes on separate lines

Turning on "Attributes on separate lines" in Tools -> Preferences affects both indented and aligned formatting. The following example shows the result for an indented document.

<config>
    <diagnostics>
        <path>C:\temp</path>
        <proxy
            usedefault="true"
            />
    </diagnostics>
</config>
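The layout above can be reproduced for a single tag with a short Python sketch. The function name split_attributes and the attribute pattern are assumptions for illustration. Placing the closing delimiter on its own line follows the empty-element example above; how the editor lays out a start tag that is not an empty element is not shown in this document, so the sketch simply treats it the same way.

```python
import re

# Hypothetical attribute matcher: name="value" or name='value'.
ATTR = re.compile(r"""[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*')""")

def split_attributes(tag: str, base: str = "", indent: str = "    ") -> str:
    """Place each attribute of one tag on its own line, one level deeper."""
    attrs = ATTR.findall(tag)
    if not attrs:
        return base + tag  # nothing to split
    name = re.match(r"<([^\s/>]+)", tag).group(1)
    closer = "/>" if tag.endswith("/>") else ">"
    inner = base + indent
    parts = [base + "<" + name] + [inner + a for a in attrs] + [inner + closer]
    return "\n".join(parts)

print(split_attributes('<proxy usedefault="true"/>', base=" " * 8))
```

The attribute text is copied as is, so quoting and character references inside the values are left untouched.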

Manual indentation

Use the Tab and Shift+Tab keys to indent or unindent the selected lines by one level. This only works when the current selection spans more than one line. The same keys are also useful when editing program source code.

How the XML formatter works

Formatting works by modifying the insignificant whitespace between elements (and between attributes inside start tags). Formatting is primarily for XML without mixed content; elements containing mixed content are left as is. In general the editor's formatting functions should not disturb mixed content; however, significant whitespace in mixed content can be mistaken for insignificant whitespace and modified.

Formatting works even when there are errors in the XML such as non-ended elements, partially formed tags and lone end tags. Simple rules are applied to the parts containing errors to decide which elements will be represented as containing other elements (see Containment Hierarchy), so that the areas without errors can be formatted. The elements containing errors are not formatted.

Formatting generates the formatted document entirely by copying directly from the old document. Only whitespace is modified (spaces, tabs, carriage returns, and linefeeds). This means that escaped or unescaped characters in the content and attribute values will not be affected, and elements with end tags but containing no data will not be converted to empty elements. In addition, lone end tags and erroneous nodes will be copied over without modification.
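The whitespace-only guarantee can be spot-checked from outside the editor. The Python sketch below is an illustration and not part of the editor; the function name differs_only_in_whitespace is hypothetical. It removes the four whitespace characters listed above from both versions and compares what is left. Because it also ignores whitespace inside text content, it is a necessary check and not a complete one.

```python
import re

# The four characters the formatter is described as modifying.
XML_WHITESPACE = re.compile(r"[ \t\r\n]+")

def differs_only_in_whitespace(before: str, after: str) -> bool:
    """True if the two documents are identical once whitespace is removed."""
    return XML_WHITESPACE.sub("", before) == XML_WHITESPACE.sub("", after)

before = r'<config><path>C:\temp</path><proxy usedefault="true"/></config>'
after = (
    "<config>\n"
    r"    <path>C:\temp</path>" "\n"
    '    <proxy usedefault="true"/>\n'
    "</config>"
)
print(differs_only_in_whitespace(before, after))

# A tool that rewrote <a></a> as <a/> or changed escaping would fail the check.
print(differs_only_in_whitespace("<a></a>", "<a/>"))
print(differs_only_in_whitespace("<a>&lt;</a>", "<a><</a>"))
```

The first call reports a match, while the last two report a difference because the markup itself was changed.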

Mixed content

The XML formatter can only modify whitespace that appears insignificant. Whitespace means spaces, tabs, carriage returns and linefeeds. The formatter can only format sibling elements that are separated by nothing but whitespace.

The XML formatter cannot do much with HTML and XML documents with mixed content, though it can sometimes format pieces of them. The contents of elements containing a mix of elements and text (as in HTML paragraphs) will usually be left untouched. However, the formatter could erroneously format an HTML paragraph that looks like two discrete elements:

<p><em>run</em><strong>FAST</strong></p>

The em and strong elements would be moved to separate lines and the rendered HTML would show run FAST with a space between the two words instead of runFAST.
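The effect can be reproduced with Python's standard xml.etree.ElementTree module. This is a sketch for illustration: the helper name rendered_text is hypothetical, and collapsing runs of whitespace to a single space is only a rough stand-in for how a browser renders inline text.

```python
import xml.etree.ElementTree as ET

def rendered_text(xml: str) -> str:
    """Concatenate all text in the element, collapsing whitespace runs."""
    raw = "".join(ET.fromstring(xml).itertext())
    return " ".join(raw.split())

original = "<p><em>run</em><strong>FAST</strong></p>"
formatted = "<p>\n    <em>run</em>\n    <strong>FAST</strong>\n</p>"

print(rendered_text(original))   # runFAST
print(rendered_text(formatted))  # run FAST
```

The newline and indentation inserted between the two inline elements become part of the paragraph's text, which is why the two versions no longer render the same.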

See also:

CMarkup GetDocFormatted Method