I can write them nearly as fast as you can read them (now)

While filling a few holes in the docs, I found myself waiting more than an hour to see the POD files converted to HTML. For a quick check of whether the HTML is presentable, that’s a wee bit long a wait. So I set out to solve that problem. I don’t need all the fancy bells and whistles; checking external links, looking for unclosed tags and the like will do. Having everything in one big file helps with full-text search. The official doc site doesn’t have a TOC covering all POD files, so that would be nice to have, to see whether anything is hidden by a lack of linkage. Using more than one core would be cool too (not doing so is the main reason the current run takes 81 minutes).

As it turns out, that’s surprisingly simple. POD is parsed by the Perl 6 compiler of your choosing (Rakudo in my case) and lands in the special variable $=pod. It’s a list of various Pod::* nodes that we can walk recursively with only a handful of multi subs.

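my proto sub handle ($node, Context $context = None) is export {
    {*}    # {*} re-dispatches with the original arguments, so this default
           # is not seen by the candidates; each multi declares its own
}

# A named block (=begin foo ... =end foo): render all of its children.
multi sub handle (Pod::Block::Named $node) is export {
    $node.contents>>.&handle();
}

...

# Plain text: escape & first, then <, so the &lt; we add is not re-escaped.
multi sub handle (Str $node, Context $context?) is export {
    $node.subst('&', '&amp;', :g).subst('<', '&lt;', :g);
}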
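To see the shape of such a walker in isolation, here is a minimal, self-contained sketch. It is not the code from the repository: the sub name `to-html` and the `<p>` wrapping are assumptions for illustration, and the tiny Pod tree is built by hand with `Pod::Block::Named.new` and `Pod::Block::Para.new` instead of being taken from `$=pod`, so the example does not depend on how the parser splits paragraphs.

```raku
# Hypothetical minimal POD-to-HTML walker (not the author's code):
# one multi candidate per node type, recursing into .contents.
proto sub to-html($node) {*}

# A named block such as =begin pod ... =end pod: render its children.
multi sub to-html(Pod::Block::Named $node) {
    $node.contents.map(&to-html).join("\n")
}

# A paragraph: wrap the rendered children in <p>.
multi sub to-html(Pod::Block::Para $node) {
    '<p>' ~ $node.contents.map(&to-html).join ~ '</p>'
}

# Plain text: escape & first, then <, so the new &lt; is not re-escaped.
multi sub to-html(Str $node) {
    $node.subst('&', '&amp;', :g).subst('<', '&lt;', :g)
}

# Build a tiny tree by hand instead of relying on the parser's exact output.
my $tree = Pod::Block::Named.new(
    name     => 'pod',
    contents => [ Pod::Block::Para.new(contents => ['Fish & <chips>']) ],
);

say to-html($tree);   # prints <p>Fish &amp; &lt;chips></p>
```

Supporting another node type, say `Pod::Heading`, is one more multi candidate; the existing ones stay untouched.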

There is the matter of context, though. A child node of a named Html block (a Pod::Block::Named with the name 'Html') needs a serious lack of HTML escaping. That’s what $context is for. The possible contexts are listed in an enum.

my enum Context (None => 0, Index => 1, Heading => 2, HTML => 3);

It could be a bitmask, but so far that hasn’t been needed. If the context is handed down the call tree, we can test for it in the signature.

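# A named Html block hands the HTML context down to its children ...
multi sub handle (Pod::Block::Named $node where $node.name eq 'Html') is export {
    $node.contents>>.&handle(HTML);
}
# ... and text in that context is passed through without escaping.
multi sub handle (Str $node, Context $context where * == HTML) is export {
    $node.Str;
}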

Poof! The html-escape is gone.
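Here is a stand-alone sketch of the same idea, using the enum from above and a hypothetical `render` sub in place of `handle`. Two choices differ from the snippets and are assumptions of this sketch: a one-argument candidate fills in `None` itself, and the two text candidates have mutually exclusive `where` clauses (`!= HTML` and `== HTML`), so exactly one of them can bind and the outcome does not depend on how tied candidates are ordered.

```raku
my enum Context (None => 0, Index => 1, Heading => 2, HTML => 3);

proto sub render(|) {*}

# No context given: treat it as None.
multi sub render(Str $node) { render($node, None) }

# Text outside an HTML context is escaped.
multi sub render(Str $node, Context $context where * != HTML) {
    $node.subst('&', '&amp;', :g).subst('<', '&lt;', :g)
}

# Text inside an HTML context passes through untouched.
multi sub render(Str $node, Context $context where * == HTML) {
    $node
}

# A named Html block hands the HTML context down to its children.
multi sub render(Pod::Block::Named $node where $node.name eq 'Html') {
    $node.contents.map({ render($_, HTML) }).join
}

say render('a < b & c');    # prints a &lt; b &amp; c
say render('<hr>', HTML);   # prints <hr>
say render(Pod::Block::Named.new(name => 'Html', contents => ['<hr>']));   # prints <hr>
```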

Since the code is quite functional, with no shared mutable state in the walker, we can multi-thread the whole thing fairly easily.

# One Promise per top-level node; await blocks until all of them are kept.
await do start { handle($_) } for $pod.flat;
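The fan-out pattern on its own, as a hedged sketch: `slow-render` is a hypothetical stand-in for `handle`, and the promises are created with `map` and an explicit parameter so it is obvious that each promise gets its own item. It also shows that `await` returns the results in input order, whichever thread finishes first.

```raku
# Hypothetical stand-in for handle(): slow enough that running the calls
# one after another would be noticeable.
sub slow-render(Int $n) {
    sleep 0.1;
    "<p>$n</p>"
}

my @items = 1..8;

# `start` schedules each call on the default thread pool and returns a
# Promise immediately; assigning to an array makes sure all are started.
my @promises = @items.map(-> $n { start { slow-render($n) } });

# `await` blocks until every Promise is kept and returns the results in
# the same order as @promises.
my @html = await @promises;

say @html.join(' ');   # prints <p>1</p> <p>2</p> <p>3</p> <p>4</p> <p>5</p> <p>6</p> <p>7</p> <p>8</p>
```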

That’s basically it. Applied to the top-level List in $=pod, that one-liner picks threads from the thread pool and runs the whole thing in parallel. There is a catch, though. While building the TOC Of All Things, there may be name collisions between headings. That is fine for the visible names in the TOC but not for the HTML link anchors. A simple way to cheat is to give each TOC entry a serial number and use that for inner-document links. Making that thread-safe is easily done with a Lock.

sub register-toc-entry($level, $text) {
    state $lock = Lock.new;              # shared by every call (and thread)
    state Int $global-toc-counter = 0;   # likewise shared
    my $clone;                           # fresh container for each call
    $lock.protect: {
        ++$global-toc-counter;
        $clone = $global-toc-counter.clone;
        @toc.push: $level => $text => $clone;   # @toc is declared elsewhere
    }
    't' ~ $clone
}

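Both the Lock and the serial counter are state variables, so every call on every thread shares them, which is why the counter needs the lock. The => operator binds the container of our serial number to the Pair that is pushed into @toc. Had we pushed $global-toc-counter itself, every Pair would share that one container and we would end up with exactly one serial number; copying the value into the per-call $clone gives each entry its own.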
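A self-contained sketch of the same registry that can be run to check the locking. It assumes file-scoped variables (`@toc`, `$toc-lock`, `$toc-counter` and `$id`, all hypothetical names) in place of the post’s `state` variables, registers 100 headings from the thread pool at once, and checks that no serial number was handed out twice.

```raku
# Shared state; file-scoped here instead of the post's `state` variables.
my @toc;
my $toc-lock = Lock.new;
my Int $toc-counter = 0;

sub register-toc-entry($level, $text) {
    my $id;                          # a fresh container for every call
    $toc-lock.protect: {
        # Only one thread at a time runs this block, so no serial number is
        # handed out twice and @toc is never pushed to concurrently.
        $id = ++$toc-counter;        # assignment copies the value into $id
        @toc.push: $level => $text => $id;
    }
    't' ~ $id
}

# Register 100 headings from the thread pool at once.
my @promises = (1..100).map(-> $n { start { register-toc-entry(1, "Heading $n") } });
my @anchors  = await @promises;

say @anchors.unique.elems;   # prints 100
say @toc.elems;              # prints 100
```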

With that little code, it scales quite well with the number of cores at your disposal. When running over many files and sending Rakudo’s parser into a thread for each file, the speedup is quite dramatic: the whole run is done in a little over a minute instead of 81.

The whole thing is on GitHub under the trusty Artistic License. It’s not a complete POD renderer, nor is the output overly pretty. However, it may serve you well for a quick preview of many POD files. If you feel adventurous, fork away!
