Could actions make Freeway’s HTML import smarter?

Questions:

  1. Can Freeway Actions be applied to the content of imported HTML files
    before publishing?
  2. If yes, would it be possible to make an Action that converts
    character references for non-ASCII characters (such as &aring; or
    &#229;) into “real” characters (such as “å”)?
  3. Alternatively, would it be possible to have an Action that ensures
    that only the content of the body element is imported?

Background/Use case(s)

Use case for questions 1 and 2: converting character references before publishing

Via the Import submenu of the File menu, Freeway can import HTML files
from the hard disk. This (potentially) allows parts of the site to be
authored and updated in more convenient writing tools than Freeway (and
by more competent authors than the Freeway user …). Example: we can
create the file ‘fragment.html’ and fill it with the following content:

 <section xmlns="http://www.w3.org/1999/xhtml">
  <p>ABC</p><p>XYZ</p><p>ÆØÅ</p>
 </section>

The above file content makes up a conforming XML file in the XHTML
namespace which can, given the right authoring tool, be authored in a
WYSIWYG fashion. Finding such a tool can be difficult, as most (X)HTML
editors insist on creating complete HTML documents: they don’t allow
you to edit only a fragment. Luckily, I have just discovered a tool
that allows me to do what I want: XMLmind XML Editor. [1] This permits
(in principle) the file to be imported directly into Freeway without
any problems resulting from the import. (Note, by the way, that the
XHTML namespace declaration is permitted on any element in HTML5, hence
its presence on the section element does not make the document
invalid.)

This leaves only one problem to be solved: character encoding. Due to
restrictions in XML, this solution requires the document to be authored
in the UTF-8 encoding (or UTF-16, but forget about that). Why? Only then
does XML permit the XML declaration, which carries the encoding
declaration, to be dropped. (If it isn’t dropped, then Freeway imports
that too, which in turn leads to an invalid document.)

Fortunately, XMLmind XML Editor can automatically convert all non-ASCII
characters to character references even when the document uses the
UTF-8 encoding. Voilà: problem solved.
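For illustration only, here is what such an auto-conversion amounts to, as a minimal Python sketch (not XMLmind’s actual code; `to_ascii_references` is a hypothetical name). Every character outside US-ASCII becomes a decimal character reference, so the bytes on disk are pure ASCII and read identically whether Freeway assumes UTF-8 or Mac Roman, since both agree with ASCII for codes 0 to 127.

```python
def to_ascii_references(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference.

    Python's built-in "xmlcharrefreplace" error handler writes &#NNN; for
    each character the target codec (here ASCII) cannot represent.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    fragment = (
        '<section xmlns="http://www.w3.org/1999/xhtml">\n'
        "  <p>ABC</p><p>XYZ</p><p>ÆØÅ</p>\n"
        "</section>\n"
    )
    # The <p>ÆØÅ</p> part comes out as <p>&#198;&#216;&#197;</p>
    print(to_ascii_references(fragment))
```

Hex or named references would be equally valid output; decimal is simply what Python’s error handler produces.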

However, partly for purity reasons, but also because there is some
‘real’ value in preventing Freeway from publishing anything as character
references, I wonder whether an Action could convert the character
references back into UTF-8 characters before publishing.
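How an Action would hook into Freeway’s publishing step depends on the Action API, which is not shown here. The transformation itself is small, though. Below is a standalone Python sketch (the function name is hypothetical) that turns decimal, hexadecimal and named references for non-ASCII characters back into characters, while deliberately leaving references such as &lt; and &amp; alone, since decoding those would change the markup. The result then has to be saved, and declared, in an encoding that can hold the characters, such as UTF-8.

```python
import re
from html.entities import name2codepoint

# Decimal (&#229;), hexadecimal (&#xE5;) or named (&aring;) references.
_REFERENCE = re.compile(r"&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));")


def references_to_characters(markup: str) -> str:
    """Turn references to non-ASCII characters back into literal characters.

    References to ASCII characters (&lt;, &amp;, &#60;, ...) are kept, because
    decoding them would turn escaped text into markup. Unknown entity names,
    surrogates and out-of-range code points are kept as well.
    """

    def replace(match: re.Match) -> str:
        decimal, hexadecimal, name = match.groups()
        if decimal is not None:
            codepoint = int(decimal)
        elif hexadecimal is not None:
            codepoint = int(hexadecimal, 16)
        else:
            # name2codepoint covers the HTML 4 entity set (aring, eacute, ...).
            codepoint = name2codepoint.get(name, -1)
        if codepoint < 128 or 0xD800 <= codepoint <= 0xDFFF or codepoint > 0x10FFFF:
            return match.group(0)  # leave the reference exactly as written
        return chr(codepoint)

    return _REFERENCE.sub(replace, markup)


if __name__ == "__main__":
    print(references_to_characters("<p>&#198;&#xD8;&Aring; &amp; &lt;b&gt;</p>"))
```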

Use case for question 3: Importing selected parts of a document

However, most authoring tools (such as BlueGriffon) insist on working
with a complete HTML file, including the <html>, <head> and <body>
elements. This means that the above file would look like this:

 <!DOCTYPE HTML><html xmlns="http://www.w3.org/1999/xhtml">
 <head><title></title><meta charset="UTF-8"/></head><body>
 <section><p>ABC</p><p>XYZ</p><p>ÆØÅ</p>
 </section></body></html>

Because Freeway’s HTML import currently just takes in the whole file,
this causes <html>, <head> and <body> to be imported as well, which in
turn invalidates the document that Freeway publishes. So this is the
use case for question 3: could a Freeway Action be made that deletes
the unnecessary elements?

PS: A solution to the second use case could also add benefits to the
first use case. Consider the following document, whose encoding is
‘x-mac-roman’/‘macintosh’. Here there is no need to convert the
non-ASCII letters to character references, and Freeway gets what it
expects: the Macintosh encoding. Many languages (French, German, the
Scandinavian languages and more) can be expressed using the ‘macintosh’
encoding, so this solution might be preferable to converting characters
to references. However, we instead get a different problem: the XML
declaration gets imported, which makes the document that Freeway
publishes invalid:

 <?xml version="1.0" encoding="macintosh" ?>
 <section xmlns="http://www.w3.org/1999/xhtml">
  <p>ABC</p><p>XYZ</p><p>ÆØÅ</p>
 </section>
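Removing that leading XML declaration is the kind of narrow fix an Action (or a script run before import) would need here. A minimal Python sketch, assuming the file really is Mac Roman encoded as declared (Python calls that codec "mac_roman"); the function names are hypothetical. It returns Mac Roman bytes again, since that is what Freeway expects to read:

```python
import re

# A leading <?xml ...?> declaration plus the whitespace after it. "\s" after
# "xml" keeps processing instructions such as <?xml-stylesheet ...?> intact.
_XML_DECLARATION = re.compile(r"\A\s*<\?xml\s[^?]*\?>\s*")


def strip_xml_declaration(text: str) -> str:
    """Remove a leading XML declaration; leave all other content untouched."""
    return _XML_DECLARATION.sub("", text, count=1)


def strip_declaration_mac_roman(data: bytes) -> bytes:
    """Strip the declaration from Mac Roman bytes and return Mac Roman bytes."""
    return strip_xml_declaration(data.decode("mac_roman")).encode("mac_roman")


if __name__ == "__main__":
    source = (
        '<?xml version="1.0" encoding="macintosh" ?>\n'
        '<section xmlns="http://www.w3.org/1999/xhtml">\n'
        "  <p>ABC</p><p>XYZ</p><p>ÆØÅ</p>\n"
        "</section>\n"
    ).encode("mac_roman")
    print(strip_declaration_mac_roman(source).decode("mac_roman"))
```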

[1] XMLmind XML Editor: A strictly validating, near WYSIWYG, DocBook editor, DITA editor, XHTML editor, MathML editor, XML editor, aimed at technical writers

leif halvard silli


actionsdev mailing list
email@hidden
Update your subscriptions at:
https://freewaytalk.softpress.com/person/options

I did a lot of work in this area when I was building the stylesheet extension library system several years ago, and before that, when I was building the Template Helper Action. There are some possibilities here, and some areas where the Action API is really difficult to reason about.

First, you can probably do much of this XML-munging in an external library like libxml, called through the command line. But that uses XSLT 1, not 2 (which I believe is still mostly an academic exercise), and thus the character encoding of the output may still be forced on you. XSLT 1 is awfully fussy about non-ASCII characters.

Under the API hood, most things to do with external files deal with them either as a sequence of bytes or as a string of text in whatever character encoding you have told Freeway the file uses. I admit I don’t understand this interface as well as I could. While working on Template Helper, I discovered that Freeway was generating a file with Unicode contents but giving it the magic bits of a Mac Roman text file, or something like that. The result was predictably awful: anything above 127 in the ASCII table was converted into something else.

Walter

On Nov 26, 2015, at 7:17 AM, Leif Halvard Silli email@hidden wrote:

Questions:

  1. Can Freeway Actions be applied to the content of such files before publishing?

Hi Walter,

On 26 Nov 2015, at 14:38, Walter Lee Davis wrote:

While working on Template Helper, I discovered that Freeway was
generating a file with unicode contents but giving it the magic bits
of a Mac Roman text file, or something like that. Result was
predictably awful – anything above 127 in the ASCII table was
converted into something else.

Exactly. Freeway expects imported/included HTML to be in the Mac Roman
text encoding (even though only developers still living in the Mac OS 9
era would produce HTML in Mac Roman). For example, if the file is UTF-8
encoded, Freeway will read it as if it were a Mac Roman encoded file.
The result is, like you say, awful.
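The effect is easy to reproduce outside Freeway. The Python sketch below (hypothetical function name) simulates the misreading by decoding UTF-8 bytes with Python’s "mac_roman" codec: plain ASCII survives, but every two-byte UTF-8 sequence turns into two unrelated Mac Roman characters, so “ÆØÅ” comes out as “√Ü√ò√Ö”.

```python
def misread_as_mac_roman(text: str) -> str:
    """Simulate a UTF-8 file being read as if it were Mac Roman."""
    return text.encode("utf-8").decode("mac_roman")


if __name__ == "__main__":
    for sample in ("ABC", "ÆØÅ"):
        print(sample, "->", misread_as_mac_roman(sample))  # ÆØÅ -> √Ü√ò√Ö
```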

As a consequence (and as a workaround), it is only possible to include
files whose content is, or has been converted to be, compatible with
the Mac Roman or US-ASCII encoding. That does work, though.

And fortunately, any HTML file can be converted to be compatible with
Mac Roman or US-ASCII; it is just unexpected and impractical.
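Such a conversion can be sketched in a few lines of Python (hypothetical function name, not a Freeway feature): decode the UTF-8 file, then encode it as Mac Roman, letting the "xmlcharrefreplace" error handler turn any character Mac Roman lacks into a numeric reference. Characters Mac Roman does have, such as ÆØÅ, stay literal. One caveat: a <meta charset="UTF-8"> inside the file would no longer be true afterwards, and references are not decoded inside <script> or <style>.

```python
def utf8_to_mac_roman_compatible(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as Mac Roman.

    Characters that exist in Mac Roman (Æ, Ø, Å, é, ü, ...) are kept as single
    bytes; anything else (e.g. CJK) becomes a decimal character reference.
    """
    text = data.decode("utf-8")
    return text.encode("mac_roman", errors="xmlcharrefreplace")


if __name__ == "__main__":
    original = "<p>ÆØÅ 中</p>".encode("utf-8")
    converted = utf8_to_mac_roman_compatible(original)
    print(converted.decode("mac_roman"))  # <p>ÆØÅ &#20013;</p>
```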

… snip …

First, you can probably do much of this XML-munging in an external
library

It really sounded complicated … Fortunately, XMLmind XML Editor now
automatically takes care of the encoding for me. I am really glad I
found that workaround!

leif halvard silli


Thankfully, that hasn’t been true for quite a while now. Have a look at this line, where I am manipulating an external file with the Action API:

Using that method when opening the file for initial writing lets me declare the charset and Freeway will treat the file correctly. But I also have to take care to ensure that what I stuff into that file is actually in that encoding. That’s where the tag at the top of the Action comes in (also the fact that the Action file itself is saved in UTF-8, I think, which means that any HTML inserted from the Action carries that character encoding along with it as well).
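Outside the Action API, the same “is this really in the encoding I declared?” check can be sketched in Python (hypothetical helper; this is not how Freeway does it). It also shows why a wrong guess goes unnoticed: every possible byte is a valid Mac Roman character, so decoding as Mac Roman never fails, while UTF-8 decoding usually rejects Mac Roman text that contains non-ASCII characters.

```python
def matches_declared_encoding(data: bytes, declared: str) -> bool:
    """Return True if the bytes decode cleanly in the declared encoding."""
    try:
        data.decode(declared)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    mac_bytes = "å".encode("mac_roman")  # a single byte, 0x8C
    utf8_bytes = "å".encode("utf-8")  # two bytes, 0xC3 0xA5
    print(matches_declared_encoding(mac_bytes, "utf-8"))  # False
    print(matches_declared_encoding(utf8_bytes, "utf-8"))  # True
    print(matches_declared_encoding(utf8_bytes, "mac_roman"))  # True, yet wrong
```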

Walter

On Nov 26, 2015, at 5:04 PM, Leif Halvard Silli email@hidden wrote:

… snip …

