Perl6 | Playing Perl 6␛b6xA Raku

Tracing whats missing

July 7, 2019 gfldex 1 comment

I have a logfile of the following form that I would like to parse.

[ 2016.03.09 20:40:28 ] (MessageType) Some message text that depends on the <MessageType>

Since the form of the text depends on the message type I need a rule to identify the message type and a rule to parse the message body itself. To aid my struggle through the Grammar in question I use Grammar::Tracer from jnthn’s Grammer::Debugger module. It’s a fine module that will tell until where a match was OK and at which point the Grammar gave up parsing. In the case of a successful match it shows part of the substring that was successfully parsed. If parsing a rule or token fails it will tell you but wont show the offending string. The whole purpose of Grammar wrangling is to identify the bits that wont match and change the Grammar until they go away. Not showing the offending string is not overly helpful.

But fear not as Grammars are classes and as such can have methods. Let’s define one and add it to a chain of options.

method parse-fail {
    # self is a subclass of Grammar
    say self.postmatch.substr(0, 100);
    exit 0;
}

rule body-line { '[' <timestamp> ']' [ <body-notify> | <body-question> | <body-info> | <body-warning> || <parse-fail> ] }

So when none of the known message types match the Grammar stops and shows the string that still needs to be handled. With that I could parse all 8768 files until I got them all covered. Also this is much faster then running with Grammar::Tracer.

It seems to be very useful to have folk implement a language they would like to use to implement that language.

Categories: Perl6

Whatever whenever does

May 31, 2019 gfldex 1 comment

Jnthn answered the question why $*IN.lines blocks in a react block. What isn’t explained is what whenever actually does before it starts blocking.

react {
    whenever $*IN.lines { .say }
}

Looking at the syntax of a whenever block, we see that whenever takes a variable immediatly followed by a block. The only place where a structure like that can be defined is Grammar.nqp.

rule statement_control:sym<whenever> {
    <sym><.kok>
    [
    || <?{
            nqp::getcomp('perl6').language_version eq '6.c'
          || $*WHENEVER_COUNT >= 0
       }>
    || <.typed_panic('X::Comp::WheneverOutOfScope')>
    ]
    { $*WHENEVER_COUNT++ }
    <xblock($PBLOCK_REQUIRED_TOPIC)>
}

Here the grammar just checks a few things without actually generating any code. So we head to Actions.nqp.

method statement_control:sym<whenever>($/) {
    my $xblock := $<xblock>.ast;
    make QAST::Op.new(
        :op<call>, :name<&WHENEVER>, :node($/),
        $xblock[0], block_closure($xblock[1])
    );
}

The whenever block is converted to a call to sub WHENEVER which we find in Supply.pm6.

sub WHENEVER(Supply() $supply, &block) {

There we go. A whenever block takes its first argument of any type and calles .Supply on it, as long as Any is a parent of that type. In the case of $*IN that type will typically be whatever IO::Handle.lines returns.

Seq.new(self!LINES-ITERATOR($close))

To turn a Seq into a Supply Any.Supply calls self.list.Supply. Nowhere in this fairly long chain of method lookups (this can’t be fast) are there any threads to be found. If we want to fix this we need to sneak a Channel into $*IN.lines which does exactly that.

$*IN.^can('lines')[1].wrap(my method {
    my $channel = Channel.new;
    start {
        for callsame() {
            last if $channel.closed;
            $channel.send($_)
        }
        LEAVE $channel.close unless $channel.closed;
    }
    $channel
});

Or if we want to be explicit:

use Concurrent::Channelify;

react {
    whenever signal(SIGINT) {
        say "Got signal";
        exit;
    }
    whenever $*IN.lines⇒ {
        say "got line";
    }
}

We already use ⚛ to indicate atomic operations. Maybe using prefix:<∥> to indicate concurrency makes sense. Anyway, we went lucky once again that Rakudo is implemented (mostly) in Perl 6 so we can find out where we need to poke it whenever we want to change it.

Categories: Perl6

Nil shall warn or fail but not both

May 14, 2019 gfldex 1 comment

As announced earlier I went to write a module to make Nil.list behave a little better. There are basicly two way Nil could be turned into a list. One should warn the same way as Nil.Str does and the other should end the program loudly. Doing both at the same time however does not make sense.

There are a few ways this could be done. One is augmenting Nil with a list method and have this method check a dynamic variable to pick the desired behaviour. That would be slow and might hurt if Nil.list is called in a loop. The other is by using a custom sub EXPORT and a given switch.

# lib/NoNilList/Warning.pm6
use NoNilList 'Warning';

# lib/NoNilList/Fatal.pm6
use NoNilList 'Fatal';

# lib/NoNilList.pm6

sub EXPORT($_?) {
    given $_ {
        when 'Warning' {
             # augment Nil with a warning .list
        }
        when 'Fatal' {
             # augment Nil with a failing .list
        }
        default {
            die 'Please use NoNilList::Warning or NoNilList::Fatal.';
        }
    }

    %() # Rakudo complains without this
}

Now use NoNilList; will yield a compile time error with a friedly hint how to avoid it.

I left the augmenting part out because it does not work. I thought I stepped on #2779 again but was corrected that this is acually a different bug. Jnthn++ fixed part of that new bug (Yes, Perl 6 bugs are so advanced they come in multiple parts.) and proposed the use of the MOP instead. That resulted in #2897. The tricky bit is that I have to delay augmentation of Nil to after the check on $_ because augment is a declarator and as such executed at compile time — in a module that can be months before the program starts to run. Both an augment in an EVAL string and the MOP route would lead there. I wanted to use this module as my debut on 6PAN. That will have to wait for another time.

If you find a bug please file it. It will lead to interresting discoveries for sure.

Categories: Perl6

MONKEY see no Nil

May 4, 2019 gfldex Leave a comment

In a for loop Nil is turned into a List with one Element that happens to be Any. This really buged me so I went to find out why. As it turns out the culprit is the very definition of Nil is Cool. To be able to turn any single value into a List Cool implements method list(). Which takes a single values and turns that value into a List with that one value. Nil indicates the absense of a value and turning it into a value doesn’t make sense. Luckily we can change that.

use MONKEY-TYPING;

augment class Nil {
    method list() {
        note 'Trying to turn Nil into a list.';
        note Backtrace.new.list.tail.Str;
        Empty
    }
}

Nil.HOW.compose(Nil);

sub niler() { Nil }

for niler() { say 'oi‽' }

We can’t just warn because that would show the wrong point in the stack trace. So we note (which also goes to $*ERR) and pull the last value out of the Backtrace.

Interestingly Failure throws both in .list and in .iterator. Nil implements push, append, unshift and prepend by immediatly die-ing. Adding more to nothing is deadly but turning nothing first into something vaguely undefined and then allowing to add more stuff to it is inconsistent at best. What leads me to believe that Nil.list as it is specced today is just an oversight.

At least I can now write a simple module to protect my code from surprising Nils.

Categories: Perl6

Parallel permutations

April 27, 2019 gfldex Leave a comment

Jo Christian Oterhals asked for a parallel solution for challenge 2. I believe he had problems to find one himself, because his code sports quite a few for loops. By changing those to method call chains, we can use .hyper to run at lease some code concurrently.

use v6.d;

constant CORES = $*KERNEL.cpu-cores;

# workaround for #1210
sub prefix:<[max]>(%h){ %h.sort(-*.value).first }

my %dict = "/usr/share/dict/words".IO.lines.map({ .lc => True });

my %seen;
%dict.keys».&{ %seen{.comb.sort.join}++; };

with [max] %seen {
    say .value, .key.comb.hyper(:batch(1024), :degree(CORES)).permutations».join.grep({ %dict{$_}:exists }).Str
}

My approach is a little different then Jo’s. I don’t try to keep all combinations around but just count the anagrams for each entry in the word list. Then I find a word with the most anagrams (there are more candidates with the same amount that I skip) and reconstruct the anagrams for that word.

The only operation where any form of computation happens is the generation of permutations. Anything else is just too memory bound to get a boost by spinning up threads. With the .hyper-call the program is a tiny wee bit faster then with just one thread on my Threadripper box. A system with slow cores/smaller caches should benefit a little more. The main issue is that the entire word list fits into the 3rd level cache. With a bigger dataset a fast system might benefit as well.

In many cases multi-core systems are fairy dust, which makes the wallets of chip makers sparkle. Wrangling Hashs seams to be one of those.

Categories: Perl6

Nil is a pessimist

April 24, 2019 gfldex Leave a comment

Guifa was unhappy with $xml.elements returning a list with one undefined element if there are no child nodes. That led me to the conclusion that Nil is only halve Empty.

Let’s consider this piece of code.

sub nilish() { Nil }; 
for nilish() { say 'oi‽' }

my $nil := nilish();
for $nil { say 'still oi‽' }

sub no-return() { }
for no-return() { say 'even more oi‽' }

sub return-a-list( --> List:D ) { Nil }
for return-a-list() { say 'Should this happen?' }

# OUTPUT:
# oi‽
# still oi‽
# even more oi‽
# Should this happen?

We are iterating over the special container-reset-value called Nil because there is no container to reset. Since for binds to its arguments it binds to Nil. A type object, even a very special one as Nil, is a single item which is treated as a list with one element by for.

We can solve this problem by a multi sub that turn unreasonable values into the empty list.

multi sub undefined-to-empty(Nil) { Empty }
multi sub undefined-to-empty(@a where all @a.map(!*.defined)) { Empty }
multi sub undefined-to-empty(\item) { item }

for undefined-to-empty(Nil) { say 'nothing happens' }
for undefined-to-empty((Any,)) { say 'nothing happens' }

By adding a candidate that checks if there are only undefined values in a list we can also solve guifa`s problem.

This is of cause just a shim. The real solution is to stop turning nullinto Nil in native bindings. If you write a sub that has to return a list but can’t either fail or return Empty if there is nothing to return. To help avoid that mistake in the future I filed #2721.

If it looks empty or sounds emtpy or tastes emtpy make it Empty!

Categories: Perl6

Wrapping a scope

April 21, 2019 gfldex 1 comment

Intricate instruments like *scopes can be quite temparatur sensitive. Wrapping them into some cosy insulator can help. With Perl 6 it is the other way around. When we wrap a Callable we need to add insulation to guard anything that is in a different scope.

On IRC AndroidKitKat asked a question about formatting Array-stringification. It was suggested to monkeytype another gist-method into Array. He was warned that precompilation would be disabled in this case. A wrapper would avoid this problem. For both solutions the problem of interference with other coders code (in doubt that is you halve a year younger) remains. Luckily we can use dynamic variables to take advantage of stack magic to solve this problem.

Array.^can('gist')[0].wrap(
    sub (\a){ 
        print 'wrapped: '; 
        $*dyn ?? a.join(',') !! nextsame 
    }
);

my @a = [1,2,3];

{
    my $*dyn = True;
    say @a;
}

say @a;

# output:
wrapped: 1,2,3
wrapped: [1 2 3]

Dynamic variables don’t really have a scope. They live on the stack and their assigned value travels up the call tree. A wrapper can check if that variable is defined or got a specific value and fall back to the default behaviour by calling nextsame if need be. Both.wrap and dynamic variables work across module bounderies. As such we can make the behaviour of our code much more predictable.

This paragraph was meant to wrap things up. But since blogs don’t support dynamic variables I better stop befor I mess something up.

Categories: Perl6

I like Rakudo 100x

March 25, 2019 gfldex 1 comment

One of my scripts stopped working without any change by my hands with a most peculiar error message:

Type check failed in binding to parameter '$s'; expected Str but got Int (42)
  in sub jpost at /home/bisect/.perl6/sources/674E3526955FCB738B7B736D9DBBD3BD5B162E5C (WWW) line 9
  in block <unit> at wrong-line-or-identifier.p6 line 3

Whereby line 9 looks like this:

@stations = | jpost "https://www.perl6.org", :limit(42);

Rakudo is missing the parameter $s and so am I. Because neither my script nor any routine in WWW does contain it. This is clearly a regression on a rather simple piece of code and in a popular module. Since I didn’t check that script for quite some time I can’t easily tell what Rakudo commit caused it.

In #perl6 we got bisectable6, a member of the ever growing army of useful bots. Yet it could not help me because it doesn’t come with the community modules installed. Testing against a few dozen Rakudo versions by hand was out of question. So I mustered the little bash-foo I have and wrote a few scripts to build Rakudos past. This resulted in #2779.

If you wish to go on a bug hunt for time travelers too, clone the scripts, install the modules your script needs and make sure it fails with an exit code greater 0. Then run ./build-head-to-tail.sh <nr-of-commits> to build as many Rakudos as you like. With ./run-head-to-tail <nr-of-commits> <your-script-name-here>. Up to the number of cores of the host tests are run in parallel. After a while you get a list of OK, FAILed and SKIPed commits. Any Rakudo commit that fails to build will be SKIPed.

Running as root may not work because the modules will be put in the wrong spot by zef. A single commit will take about 70MB of disk space with little hope for deduplication.

The brave folk who push Perl 5 ever forward have a whole CPAN worth of tests to check if anything breaks while they change the compiler. Our stretch of land is still quite small in comparison but I hope to have helped with testing it better.

Categories: Perl6

Threading nqp through a channel

February 3, 2019 gfldex 1 comment

Given that nqp is faster then plain Perl 6 and threads combining the two should give us some decent speed. Using a Supply as promised in the last post wouldn’t really help. The emit will block until the internal queue of the Supply is cleared. If we want to process files recursively the filesystem might stall just after the recursing thread is unblocked. If we are putting pressure on the filesystem in the consumer, we are better of with a Channel that is swiftly filled with file paths.

Let’s start with a simulated consumer that will stall every now end then and takes the Channel in $c.

my @files;
react {
    whenever $c -> $path {
        @files.push: $path;
        sleep 1 if rand < 0.00001;
    }
}

If we would pump out paths as quickly as possible we could fill quite a bit of RAM and put a lot of pressure on the CPU caches. After some trial and error I found that sleeping befor the .send on the Channel helps when there are more then 64 worker threads waiting to be put onto machine threads. That information is accessible via Telemetry::Instrument::ThreadPool::Snap.new<gtq>.

my $c = Channel.new;
start {
    my @dirs = '/snapshots/home-2019-01-29';
    while @dirs.shift -> str $dir {
        my Mu $dirh := nqp::opendir(nqp::unbox_s($dir));
        while my str $name = nqp::nextfiledir($dirh) {
            next if $name eq '.' | '..';
            my str $abs-path = nqp::concat( nqp::concat($dir, '/'), $name);
            next if nqp::fileislink($abs-path);
            if Telemetry::Instrument::ThreadPool::Snap.new<gtq> > 64 {
                say Telemetry::Instrument::ThreadPool::Snap.new<gtq>;
                say 'sleeping';
                sleep 0.1;
        }
        $c.send($abs-path) if nqp::stat($abs-path, nqp::const::STAT_ISREG);
        @dirs.push: $abs-path if nqp::stat($abs-path, nqp::const::STAT_ISDIR);
    }
    CATCH { default { put BOLD .Str, ' ⟨', $dir, '⟩' } }
    nqp::closedir($dirh); }
    $c.close;
}

Sleeping for 0.1s before sending the next value is a bit naive. It would be better to watch the number of waiting workers and only continue when it has dropped below 64. But that is a task for a differnt module. We don’t really have a middle ground in Perl 6 between Supply with it’s blocking nature and the value pumping Channel. So such a module might be actually quite useful.

But that will have to wait. I seam to have stepped on a bug in IO::Handle.read while working with large binary files. We got tons of tests on roast that deal with small data. Working with large data isn’t well tested and I wonder what monsters are lurking there.

Categories: Perl6, Uncategorized

nqp is faster then threads

February 2, 2019 gfldex 1 comment

After heaving to much fun with a 20 year old filesystem and the inability of unix commands to hande odd filenames, I decided to replace find /somewhere -type f | xargs -P 10 -n 1 do-stuff with a Perl 6 script.

The first step is to travers a directory tree. I don’t really need to keep the a list of paths but for sure run stuff in parallel. Generating a supply in a thread seams to be a reasonable thing to do.

start my $s = supply {
     for '/snapshots/home-2019-01-29/' {
         emit .IO if (.IO.f & ! .IO.l);
         .IO.dir()».&?BLOCK if (.IO.d & ! .IO.l);
         CATCH { default { put BOLD .Str } }
     }
 }

{
     my @files;
     react whenever $s {
         @files.push: $_;
     }
     say +@files;
     say now - ENTER now;
}

Recursion is done with by calling the for block on the topic with .&?BLOCK. It’s very short and very slow. It takes 21.3s for 200891 files — find will do the same in 0.296s.

The OS wont be the bottleneck here, so maybe threading will help. I don’t want to overwhelm the OS with filesystem requests though. The buildin Telemetry module can tell us how many worker threads are sitting on their hands at any given time. If we use Promise to start workers by hand, we can decide to avoid threading when workers are still idle.

sub recurse(IO() $_){
    my @ret;
    @ret.push: .Str if (.IO.f & ! .IO.l);
    if (.IO.d & ! .IO.l) {
        if Telemetry::Instrument::ThreadPool::Snap.new<gtq> > 4 {
            @ret.append: do for .dir() { recurse($_) }
        } else {
            @ret.append: await do for .dir() {
                Promise.start({ recurse($_) }) 
            }
        }
    } 
    CATCH { default { put BOLD .Str } }
    @ret.Slip
}
{
    say +recurse('/snapshots/home-2019-01-29');
    say now - ENTER now;
}

That takes 7.65s what is a big improvement but still miles from the performance of a 20 year old c implementation. Also find can that do the same and more on a single CPU core instead of producing a load of ~800%.

Poking around in Rakudos source, one can clearly see why. There are loads of IO::Path objects created and c-strings concatenated, just to unbox those c-strings and hand them over to some VM-opcodes. All I want are absolute paths I can call open with. We have to go deeper!

use nqp;

my @files;
my @dirs = '/snapshots/home-2019-01-29';
while @dirs.shift -> str $dir {
    my Mu $dirh := nqp::opendir(nqp::unbox_s($dir));
    while my str $name = nqp::nextfiledir($dirh) {
        next if $name eq '.' | '..';
        my str $abs-path = nqp::concat( nqp::concat($dir, '/'), $name);
        next if nqp::fileislink($abs-path);
        @files.push: $abs-path if nqp::stat($abs-path, nqp::const::STAT_ISREG);
        @dirs.push: $abs-path if nqp::stat($abs-path, nqp::const::STAT_ISDIR);
    }
    CATCH { default { put BOLD .Str, ' ⟨', $dir, '⟩' } }
    nqp::closedir($dirh);
}
say +@files; say now - ENTER now;

And this finishes in 2.58s with just 1 core and should play better in situations where not many filehandles are available. Still 9 times slower than find but workable. Wrapping it into a supply is a task for another day.

So for the time being — if you want fast you need nqp.

UPDATE: We need to check the currently waiting workers, not the number of spawned workers. Example changed to Snap.new<gtq>.

Categories: Perl6

Older Entries

Archive

use nqp;