
All your idioms are belong to us

In the closing thought of my last post I postulated the need to find idioms. That worried me a bit, because finding things that are not there (yet) is no easy feat. By chance, that very day Hacker News linked to an article with well-written and well-explained Python code. We can’t quite translate idioms from one language to another. But if we can borrow features from other languages, maybe we can take inspiration for idioms too.

The article by Bart de Goede kindly links to a GitHub repo, where we can find the following piece of code.

import requests


def download_wikipedia_abstracts():
    URL = 'https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-abstract.xml.gz'
    with requests.get(URL, stream=True) as r:
        r.raise_for_status()
        with open('data/enwiki-latest-abstract.xml.gz', 'wb') as f:
            # stream the download in 1 MB chunks, reporting progress every 10 MB
            for i, chunk in enumerate(r.iter_content(chunk_size=1024*1024)):
                f.write(chunk)
                if i % 10 == 0:
                    print(f'Downloaded {i} megabytes', end='\r')

This is a very basic implementation of wget. I know very little about Python, but I doubt one could implement a complex one, for lack of horizontal space. Python may just be the driving force behind the proliferation of 4K monitors. Meanness aside, the whole idea is that the HTTP component can return an iterator that produces chunks of a given size. Those are written to disk and a progress message is printed.

In Raku, iterators are well hidden behind Seq. We also have thread-safe streams in the form of Supply and Channel. To take advantage of this hidden superpower, we need an HTTP client module that can return a Supply. As described in jnthn’s youngest video*, we can get a Supply via .body-byte-stream.

sub download_wikipedia_abstracts {
    use Cro::HTTP::Client;

    constant $abstract-url = 'http://dexhome/enwiki-latest-abstract.xml.gz';
    constant $abstract-file = '/tmp/enwiki-latest-abstract.xml.gz';

    sub MB(Int $i --> Str) {
        sprintf("%.2fMB", $i / 1024 ** 2)
    }

    sub has-time-passed(:$h = 0, :$m = 0, :$s = 1 --> Bool) {
        my $seconds = $h * 60*60 + $m * 60 + $s;
        state $last-time = now;

        if now - $last-time >= $seconds {
            $last-time = now;
            True
        } else {
            False
        }
    }

    with await Cro::HTTP::Client.get: $abstract-url -> $r {
        my $file-length = $r.header('content-length').Int // *;
        with open($abstract-file, :w, :bin) -> $fh {
            say "";
            react whenever $r.body-byte-stream -> Blob \data {
                LAST { progress; $fh.close };
                state $so-far += data.bytes;
                sub progress { print "\r{$so-far.&MB} of {$file-length.&MB} downloaded" }

                $fh.write: data;
                progress if has-time-passed;
            }
        }
    }
}

download_wikipedia_abstracts;

This Raku version is a bit longer because it shows the correct size of the download in a nicer way. I should not have made fun of Python for its horizontalism; we are only marginally better here, and not just with indentation. This is hardly readable, boilerplate-rich code. As functions are verbs and objects are nouns, we can try to be a bit more literate.

sub download_wikipedia_abstracts {
    use Cro::HTTP::Client;

    constant $abstract-url = 'http://dexhome/enwiki-latest-abstract.xml.gz';
    constant $abstract-file = '/tmp/enwiki-latest-abstract.xml.gz';

    my $r = await Cro::HTTP::Client.get: $abstract-url;

    print "\e[2Kdownloading abstract ";

    $r.body-byte-stream
        ==> progress-indicator(:max-bytes($r.header('content-length').Int // *), :bar-length(20))
        ==> supply-to-file(:path($abstract-file.IO));
}

By using the (sadly underused) feed operator we can nicely show the flow of data of the byte stream through our program. Since boilerplate never really goes away, we merely put it someplace else.

constant NOP = -> | {;};

multi sub progress-indicator(Supply:D $in, :$max-bytes!, :$bar-length = 10, :&prefix = NOP, :&suffix = NOP, :&speed is copy --> Supply:D) {
    constant @block-chars = (0x2589 .. 0x258F, 0x2591).flat.reverse».chr;
    constant &store-cursor = { print "\e[s" }
    constant &restore-cursor = { print "\e[u" }
    constant &hide-cursor = { print "\e[?25l" }
    constant &show-cursor = { print "\e[?25h" }
    constant &reset-terminal = { print "\ec" }

    sub mega-bits-per-second($so-far) {
        state $last-time = now;
        state $last-bytes = 0;

        if now > $last-time + 2 {
            print ' ', (($so-far - $last-bytes) / (now - $last-time) / 1024**2 * 8).fmt('%.2fMBit/s');

            $last-bytes = $so-far;
            $last-time = now;
        }
    }

    my $out = Supplier::Preserving.new;
    &speed //= &mega-bits-per-second;

    hide-cursor;

    start react whenever $in -> \v {
        LAST { show-cursor; $out.done; }

        state $so-far += v.bytes;
        my $percent = $so-far / $max-bytes * 100;
        my $fraction = (($percent / $bar-length - floor $percent / $bar-length) * 7).round;

        store-cursor;
        prefix $so-far, $max-bytes;

        print '[',
            @block-chars[*-1] x floor($percent / 100 * $bar-length),
            ($percent < 100 ?? @block-chars[$fraction] !! ''),
            "\c[0x2591]" x ($bar-length - floor($percent / 100 * $bar-length)) - 1,
            ']';

        speed($so-far);

        suffix $so-far, $max-bytes;

        restore-cursor;

        $out.emit: v;
    }

    $out.Supply
}

multi sub supply-to-file(Supply:D $in, IO::Path :$path, :$blocking) {
    my $io;

    react whenever $in -> \v {
        # Poor man's FIRST phaser: the state initializer runs only for the first
        # emitted value, so we open the file once, in binary mode iff the stream
        # carries Blobs.
        state $first = True and ( $first = False; $io = v ~~ Blob ?? $path.open(:w, :bin) !! $path.open(:w) );
        $io.write: v;
    }
}

The feed operator calls its RHS with the return value of the LHS as the last positional argument. By using named arguments for our stream processors, we end up with just one positional parameter and get more descriptive (and often optional) options. The progress indicator uses NOP so that :speed(NOP) can disable mega-bits-per-second. &prefix and &suffix are there in the hope of making functional composition easy. Checking the type of the first chunk against Blob to decide whether we need to open the file in binary mode is a hack. I’m not happy with that.
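The calling convention of the feed operator can be shown with a tiny, self-contained sketch (the sub names are made up for the demo): the fed value always lands in the last positional slot, so named arguments stay free for configuration.

```raku
sub double(@values) { @values.map(* * 2) }

# :$threshold is a named option; @values receives the fed-in list.
sub pick-large(:$threshold, @values) { @values.grep(* > $threshold) }

(1, 2, 3, 4)
    ==> double()
    ==> pick-large(:threshold(4))
    ==> my @result;

say @result; # [6 8]
```

Here pick-large(:threshold(4)) is really called as pick-large(:threshold(4), (2, 4, 6, 8)) — the same shape progress-indicator and supply-to-file rely on above.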

Sadly, Supply and Channel are not typed; they will pass whatever value is emitted to the other side. If they were parametrised roles that default to Mu, I could offload that decision onto multi-dispatch. The second hack, which emulates FIRST in the react block, would also go away. Further, Cro and other stream producers would be easier to document: --> Supply[Blob] would suffice. By moving the type check to .emit, consumers of streams would not need to worry about getting the wrong type, and any failed type check would happen closer to the point where the wrong value is produced. Hunting down errors in concurrent code is not fun; any help is well worth it in this regard.

As module authors we can mitigate that design shortcoming with mixins.

role Typed[::T] {
    method of { T }
}

role WithLength[$length = *] {
    has $.length = $length;
}

constant BlobSupply = Typed[Blob];

my $s = (Supplier.new.Supply but Typed[Blob]) but WithLength[42];

multi sub f(Supply $s where * ~~ Typed[Blob] & WithLength) {
    say „I got Blobs with a total size of {$s.length} bytes.“;
}

multi sub f(Supply $s) {
    fail(‚No can do!‘);
}

f $s;

With the new dispatcher, where-clauses are going to be less slow. Yet a solution in CORE would be much better; as a proper language feature it would see more use. IO::Handle does provide a Supply for read but not for write operations. With typed streams that would be much easier to implement.
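The read side already works today. A minimal sketch (the file path is made up for the demo): IO::Handle.Supply emits Str chunks, or Blob chunks when the handle is opened with :bin — exactly the kind of producer a typed Supply[Blob] would document for free.

```raku
my $path = '/tmp/supply-read-demo.txt'.IO;
$path.spurt: "line one\nline two\n";

# .Supply on a read handle emits chunks of up to :size characters
# and completes when the file is exhausted, ending the react block.
react whenever $path.open.Supply(:size(8)) -> $chunk {
    print $chunk;
}
```

A matching write-side Supply would let supply-to-file above collapse into a one-liner.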

Idioms form in natural languages to make communication more efficient and more precise at the same time. I believe the same is true for programming languages. The best idioms are those whose meaning is easy to guess. With the help of the feed operator and good names, that seems to be quite possible. Which leaves the question of where we document our idioms. So this needs more thought. Pretty much the only good thing about this pandemic is that we all got more time to do so. The fewer distractions, the better. I think I’m gonna play a game now. :-D

*) I was about to write “last video”, which, surprisingly, constitutes valid English. But we don’t want that. jnthn, please moar of the same! Your VMs and videos are really good.

Categories: Raku