Concurrent dogfood
I’m using zef --verbose test . in a Makefile triggered by the F2 key to run tests in Raku projects. While zef is very nice, it’s not fast yet. It could be a good bit faster if it could run tests in parallel. Since I split my tests up into individual files, running them concurrently isn’t all that hard. The biggest hurdle is to collect all the outputs and show them without mixing up lines. With Shell::Piping that is very easy indeed.
constant NL = $?NL;
my &RED  = { "\e[31m$_\e[0m" };
my &BOLD = { "\e[1m$_\e[0m" };
&RED = &BOLD = { $_ } unless $*OUT.t;

sub run-test(IO::Path $file where { .e & .f }) {
    my @out;
    my @err;
    my $failed;
    px«raku -Ilib $file» |» @out :stderr(@err) :done({ $failed = .exitcodes.so });
    („Testing: {$file}“.&BOLD, @out, @err».&RED).flat.join(NL);
}
I shell out to raku and let it run a single test file. The streams of STDOUT and STDERR end up in Arrays. These are then merged in the right order, with some colouring for good measure. Now I have to get a list of files and run run-test on them in parallel.
.put for dir(‚t/‘).grep(*.extension eq ‚t‘).sort.hyper(:batch(1), :degree(12)).map(*.&run-test);
The outputs are .put out in the right order thanks to the magic of .hyper. With a single raku process the tests need 11.3s. With 12 threads it’s down to 3s. I shall change the binding to F2 in vim at once!
The whole script can be found here and Shell::Piping here. The latter will land in the ecosystem shortly.
Dogfood time!
Shell::Piping now has all the features I had on my list. So it is time to use it. I still need to work on documentation in the form of README.md. It would be nice to render it to HTML without pushing to github. As it turns out there is a Ruby gem that takes a fancy Markdown file and outputs an HTML fragment. I already had a script that embeds HTML into an HTML page, with some effort to make it printable.
#! /usr/bin/env raku
use v6;

put q:to/EOH/;
    <html>
      <head>
        <style>
          body {
            margin: auto; margin-top: 4em; max-width: 80em;
          }
          @media print {
            @page { margin: 4em; }
            p  { break-inside: avoid; break-before: avoid; }
            h3 { break-after: avoid; break-before: avoid; }
            h2 { break-after: avoid-page; break-before: auto; }
          }
        </style>
      </head>
      <body>
    EOH
put $*ARGFILES.slurp;
put ‚  </body>‘;
put ‚</html>‘;
Since this script takes its input from STDIN by virtue of $*ARGFILES, it lends itself to be part of a unix pipe. Starting such a pipe by hand is way too much work. Writing to README.md creates all the data needed to decide whether and how to create a README.html.
#! /usr/bin/env raku
use v6;
use Shell::Piping;

react whenever Supply.merge(Promise.in(0).Supply, ‚.‘.IO.watch) {
    say "something changed";
    for dir(‚.‘).grep(*.extension eq ‚md‘) {
        my $src = .IO;
        my $dst = $src.extension(‚html‘);
        if $src.modified > (try $dst.modified // 0) {
            my @html;
            #`[MARK] px«commonmarker $src» |» px<html-container> |» $dst;
            say ‚spurted!‘;
        }
    }
}
The MARKed line saves me about a dozen lines of code, not counting error handling. Most errors will produce error messages by having exceptions thrown by Shell::Pipe. Since we can’t easily have dependencies across language borders, it would be nice to remind my future self what is needed.
CATCH {
    when X::Shell::CommandNotFound {
        when .cmd ~~ ‚commonmarker‘ {
            put ‚Please install commonmarker with `gem install commonmarker`.‘;
            exit 2;
        }
        default {
            .rethrow;
        }
    }
}
I’m specialising an exception, so to speak. There is the X::Shell::CommandNotFound type object and a conditional. If that conditional is not met we use the already provided exception. Otherwise we replace the message with a better one. I believe that is a pattern worth thinking about. There may be more boilerplate to remove.
Augmenting with Exitcode
In my last post I found a nice way to match against an Exitcode. I wanted to extend that to matching against STDERR if the exitcode is non-zero. I already have a way to capture all error streams of a pipe.
px«find /tmp» |» px<your-script-here> |» @a :stderr(Capture);
I’m using Capture (the type object) in the same way as we use * or Whatever. It just indicates that magic stuff should happen. That magic boils down to sticking all STDERR streams into a 2-dimensional array. If I want to handle errors I might want to match against the exitcode, the name of the shell command and parts of its output to STDERR. A syntax like the following would be nice.
my $ex = Exitcode.new: :STDERR(<abc def ghi>), :exitint(42), :command<find>;

given $ex {
    when ‚find‘ & 42 & /def\s(\S+)/ {
        note „find terminated with 42 and $0“;
    }
}
As it turns out, getting the match against Str, Numeric and Regex in a Junction is easily done. All we need to do is augment those classes.
use MONKEY-TYPING; # required for augment

augment class Regex {
    multi method ACCEPTS(Regex:D: Shell::Piping::Exitcode:D $ex) {
        ?$ex.STDERR.join(„\n“).match(self)
    }
}
augment class Int {
    multi method ACCEPTS(Int:D: Shell::Piping::Exitcode:D $ex) {
        self.ACCEPTS($ex.exitint)
    }
}
augment class Str {
    multi method ACCEPTS(Str:D: Shell::Piping::Exitcode:D $ex) {
        self.ACCEPTS($ex.command)
    }
}
This only works for matching. I don’t get (\S+) to capture into $0 that way. We know and love that Str.match does do that – seemingly with ease. Let’s steal some code from it!
proto method match(|) { $/ := nqp::getlexcaller('$/'); {*} }
So the first thing .match is doing is to bind its local $/ to the caller’s one. Thus any changes to the local version will actually change the caller’s. I tried to mimic that and it didn’t work. Neither the nqp-way nor the slightly cleaner Raku way.
augment class Regex {
    multi method ACCEPTS(Regex:D: Shell::Piping::Exitcode:D $ex) {
        CALLER::<$/> := $/;
        ?$ex.STDERR.join(„\n“).match(self)
    }
}
At least not in the given/when block. A simple say $ex ~~ /def\s(\S+)/; in the global scope did work just fine. I even got an error message that $/ does not exist in OUTER. Given that it should exist in every block by definition, that was rather strange.
We can inspect the lexical scope of the caller with the following construct.
say CALLER::.keys;
say Backtrace.new.gist;
This will output the lexicals and how many stack frames there are. And indeed, given/when does introduce an additional stack frame that contains $_ but not $/ (and a few other bits and bobs). Since CALLER is a Stash, which in turn is half a Hash, we can use :exists as usual and add another caller.
augment class Regex {
    multi method ACCEPTS(Regex:D: Shell::Piping::Exitcode:D $ex) {
        CALLER::<$/>:exists ?? (CALLER::<$/> := $/) !! (CALLER::CALLER::<$/> := $/);
        ?$ex.STDERR.join(„\n“).match(self)
    }
}
Now it works as envisioned.
However – actually HOWEVER – I am augmenting a builtin class. That is risky. I am not alone. We’ve got an ircbot that can grep the source code of the ecosystem.
20:01 < gfldex> greppable6: augment class
20:01 < greppable6> gfldex, 49 lines, 26 modules: https://gist.github.com/4088b5b8e7b51d94276b15500c240a5f
When we write modules we introduce scopes where our custom names reside. The user of a module can decide to import those names into a scope under the user’s control. When we augment, we introduce a new name into a globalish scope. What happens if two modules have the same idea? If the injected method is actually a multi then it will most likely work. But it does not have to. When two or more multi candidates have the same precedence, the first one found will win. By using augment on a non-multi we get a chance of an error message. If we add a method via the MOP we won’t. If I want to allow smartmatching in a when statement or block, I need to provide the ability to have my custom class on the LHS of ~~. So there is no way around augment. I would even go so far as to say that this is the reason augment was added to the design of the language. It gets worse when we consider that Raku will become less young and the language version keeps increasing. A module with sloppy tests might collide with a new method added in a language release. Should we enforce a use statement with a language version in a compunit with an augment statement?
This is really bothering me. We can of course use META6 and add the field augments:"Cool,Int,Regex". That way zef would have a chance to spot collisions and provide a warning. Sadly, there is no way to enforce this (because of EVAL). I will spend some more time thinking about this and might start a problem solving issue.
Handling Failure
After some back and forth I have found a practical way to handle error conditions in Shell::Piping. The practical way is to have more than one way. A process has an exit integer (called a code, because it can be quite cryptic) and text output to STDERR to indicate that something went wrong. Sometimes we need sloppy error handling, sometimes we need to look into the textual output and react to it.
I found a really nice way to use a Junction and Raku’s type system to remove some boilerplate from error handling. Combining both allows us to create a flexible type.
class Exitcode {
    has $.value;
    has $.command;
    method Numeric { $.value }
    method Str { $.command }
}
So this class produces objects that are both a number and a text. What is actually looked at depends on who is looking. We can use infix:<~~> to decide which comparison operator to use.
say $ex ~~ 42 && $ex ~~ ‚find‘; # OUTPUT: True
That’s still quite wordy. We can use a Junction because it binds tighter than ~~.
say $ex ~~ 42 & ‚find‘; # OUTPUT: True
Now we can CATCH an Exception and easily narrow down the command in a pipe that failed.
CATCH {
    when X::Shell::NonZeroExitcode {
        given .exitcode {
            when 42 & ‚find‘ {
                warn ‚Oh no! We found the answer!‘;
            }
        }
    }
}
Not all users of a module might like to use Exceptions. So we use a construct in a Shell::Pipe object to create a Failure to return from .sink. If the method Shell::Pipe.exitcode is called, we assume the user is dealing with them by hand. We can then call .handled to “abort” the Exception. This has to be easy or it might get skipped. Hence the unusual usage of the coercer methods in the class Exitcode.
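A minimal sketch of that non-exception path, assuming Shell::Pipe.exitcode returns the Exitcode objects of the pipees and marks the underlying Failure as handled (the method names follow the description above and are not checked against the released module):

```raku
my @lines;
my $pipe = px«find /tmp» |» @lines;
$pipe.start;
for $pipe.exitcode.grep(* != 0) -> $ex {
    # Numeric coercion yields the exit integer, Str the command name
    note „{$ex.Str} exited with {+$ex}“;
}
```

The coercers make both the grep and the interpolation read naturally, which is the whole point of the unusual class design.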
Mimicking quotes
My quest to shorten code is coming along nicely. One thing that is still quite wordy is object creation. To fill a pipe with life we need instances of Proc::Async.
my $find = Proc::Async.new: </usr/bin/find /tmp>;
As I showed earlier, operators are deboilerplaters. We will need the argument list that is currently going to .new. It would be nice to reduce everything else to just two letters. Let’s define a sub prefix:<px> to do so.
sub prefix:<px>(List:D \l) is looser(&infix:<,>) { say ‚Proc::Async.new: <‘, l.Str, ‚>‘; };
px</usr/bin/find /tmp>; # OUTPUT Proc::Async.new: </usr/bin/find /tmp>
px ‚/usr/bin/find‘, ‚/tmp‘; # OUTPUT Proc::Async.new: </usr/bin/find /tmp>
By using is looser(&infix:<,>) we tell the compiler to create a List and then call the sub called px with a funny syntax. By doing so we mimic a quote construct like the one used to implement qx and are able to leave out a space. There is a catch though.
px«/usr/bin/find /tmp»;
# OUTPUT:
# ===SORRY!=== Error while compiling /home/dex/projects/raku/lib/raku-shell-piping/EVAL_0
# Two terms in a row
# at /home/dex/projects/raku/lib/raku-shell-piping/EVAL_0:1
# ------> px«/usr/⏏bin/find /tmp;
# expecting any of:
# infix
# infix stopper
# statement end
# statement modifier
# statement modifier loop
The grammar seems to be confused here. Since there is an alternative to list quotes with interpolation, I shall file a bug report and soldier on.
My nagging via R#3799 has been fruitful and provided a solution for this case.
class C {
    has $.state is rw;
}
my \px = C.new;

multi postcircumfix:<{ }>(C:D, $s, Bool :$adverb = False) {
    Proc::Async.new: $s;
}
multi postcircumfix:<{ }>(C:D, @a, Bool :$adverb = False) {
    Proc::Async.new: @a;
}
multi infix:<|»>(Proc::Async:D $l, Proc::Async:D $r, :$different-adverb = "non-given") {
    dd $l;
    dd $r;
}
px<ls>;
px«ls»;
my $a = 42;
px<ls 1 2 3 $a>;
px«ls $a»;
px«ls $a»:adverb;
px{'foo' ~ 41.succ};
px«ls $a»:adverb |» (px«sort»:adverb) :different-adverb(42);
This provides all I need to mimic qx in most of its various forms. The instance of C is used as a placeholder for the compiler to hold onto. It can have state that is global to all calls to postcircumfix:<{ }>(C:D, ...). This might come in handy later.
Shell::Piping has caused 3 bug reports so far. I do feel like I’m the first to tread this swamp. If I don’t make it, please hug my friends and delete my browser history.
Awaiting a bugfix
When using Proc::Async we need to await while waiting for a fix for R#3817. While adding many tests to Shell::Piping I got a flapper on Travis. After my last misadventure while testing async code I learned to use stress -c 30 to make sure that OS threads are congested. And sure enough, I got the same tests to fail as on Travis. A workaround is to await after reading from Proc::Async.stdout. In my code that looks like the following.
for $.proc-out-stdout.lines {
    my $value := &.code.($_);
    my $processed = $value === Nil ?? ‚‘ !! $value ~ "\n";
    await $.proc-in.write: $processed.encode with $.proc-in;
    #     ^^^^^ WORKAROUND for R#3817
}
I then had to add a piece of odd code at another place to have something to await on.
method write($blob) { my $p = Promise.new; $p.keep; a.push: $blob.decode.chomp; $p }
The Promise is really just there so we can nudge Rakudo to have a good look at its threads. If you are using Proc::Async in your code, please check for .write and test it on a system with more work than cores. You won’t get an error with this bug. It will just silently drop values that are sent via .write to another process or fetched via .stdout.lines. Good hunting!
Deboilerplating
I agree with Damian that envy is clearly a virtue. We should add being boastful to that list. What good does it do that we can make easy things easy without much effort, if we never tell anybody? Hence this blog.
While working on Shell::Piping I realised that many languages use operators or operator overloading to get rid of plenty of boilerplate. Please allow me to boastfully illustrate.
my $find = Proc::Async.new('/usr/bin/find', '/tmp');
my $sort = Proc::Async.new('/usr/bin/sort', :w);

$find.stdout.lines.tap: -> $l {
    once await $sort.ready;
    $sort.write: „$l\n“.encode if $l ~~ /a/;
}

my $handle-sort-output = start { put $sort.stdout.lines.join(„\n“); }
my $sort-started = $sort.start;

{
    await $find.start;
    CATCH { default { } }
}

$sort.close-stdin;
await $handle-sort-output;
So we basically do find /tmp | grep "a" | sort. Sort is a bit unusual as it waits for its STDIN to be closed before it actually starts to do anything. We don’t use grep but do the filtering ourselves. If we didn’t, we could just shell-out and save ourselves the bother. I found a way to do the same with a little less code.
$find |> -> $l { $l ~~ /a/ ?? $l !! Nil } |> $sort :quiet;
Looking at the operators of Raku that pattern can be found all over the place. Especially the hyper operators replace a loop and/or a chain of method calls with a single expression. The same goes for subscripts.
my %a = flat (1..12) Z <January February March April May June July August September October November December>;
say %a<3 5 7>;
# OUTPUT: (March May July)
Here the postcircumfix operator iterates over 3 5 7 and for each element calls .AT-KEY on %a. The result is returned as a list of the produced values. If we did that by hand, we would quickly be at half a dozen lines of code.
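The by-hand version of that subscript might look something like this:

```raku
my %a = flat (1..12) Z <January February March April May June July
                        August September October November December>;
my @result;
for <3 5 7> -> $key {
    # the subscript sugar does exactly this call for each key
    @result.push: %a.AT-KEY($key);
}
say @result; # [March May July]
```

The one-liner %a<3 5 7> collapses all of this into a single expression.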
An instance of Proc::Async can only run once. When shell scripting in Raku that might become a burden. I need a way to declare a routine (or something that behaves like one) that will create objects from the same set of arguments. My goal was the following.
Shell::<&ls> = px<ls -l>;
Shell::<&grep> = px<grep>;
Shell::ls |> Shell::grep('a');
This would be quite easy to achieve if I didn’t step on a bug. So for now there is one extra space needed in px <grep>. While I was at it I added some error checking in px. It makes little sense to try to start a shell command if the file in question is not there or not executable.
Simple error handling however was easy to add because I rely on an infix to build the pipe. In contrast to postcircumfix they are well supported by Rakudo.
multi sub handle-stderr(|) { };
multi sub handle-stderr(0, $line) { say „ERR stream find: $line“ };
$find |> $errorer |> $sort :done({ say .exitcode if .exitcode }) :stderr(&handle-stderr);
The adverb :stderr registers a callback with the Shell::Pipe that is called with lines of any STDERR of the members of the pipe. As its first argument that callback receives the position of the command that produced the line. By using a multi we can offload the selection of the correct handler to the compiler. A single | in a signature declares a default candidate that will catch all otherwise unhandled outputs to STDERR. The operator doesn’t really do much here.
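The dispatch on the position argument can be seen in isolation; this standalone sketch mirrors the two candidates used above:

```raku
multi sub handle-stderr(|)        { note ‚some command wrote to STDERR‘ }
multi sub handle-stderr(0, $line) { note „ERR stream find: $line“ }

handle-stderr(0, ‚Permission denied‘); # literal 0 matches pipe position 0
handle-stderr(2, ‚boom‘);              # falls through to the default candidate
```

Literal values in signatures let the compiler pick the right handler, so no if/else chain is needed in the callback.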
my multi infix:«|>»(Shell::Pipe:D $pipe where $pipe.pipees.tail ~~ Shell::Pipe::BlockContainer, Proc::Async:D $in, :&done = Code, :$stderr = CodeOrChannel, Bool :$quiet) {
    ...
    $pipe.done = &done;
    $pipe.stderr = $stderr;
    $pipe.quiet = $quiet;
    ...
}
It just sets a public attribute of the Shell::Pipe object to the provided callback. So the pattern that I’m using here is actually quite simple. Use an infix to turn two operands into an intermediary type. If more infixes are used on that type, add to its state. Adverbs are used to trigger optional behaviour. The compiler will then call .sink on that intermediary to set everything in motion. As the first example in this blog post shows, that motion can actually be quite elaborate. Yet we can hide it behind very cursory syntax by defining a custom operator.
I managed to implement everything I need to start turning the whole thing into a module. Automatic testing of that module will be a bit challenging. Luckily Raku is well suited to test crashing programs, as jnthn kindly pointed out.
Judging by the weeklies we are producing more blog posts than ever. I welcome that move. We have no reason to be modest about Raku. Go forth and show off!
Indicating absence
I can’t do any smartassing on Stackoverflow because I’m incompatible with the license they use there. That won’t stop me from reading questions uzlxxxx did ask. In his last gainage of knowledge he sought to use Nil to indicate the absence of a value at the end of a linked list. Nil is an old friend of this blog and it took me a good while to get to like it.
Nil does indeed indicate the absence of a value. Indication is quite an active word and one can ask the question who is indicating to whom. I believe Nil is indicating the absence of a value to the compiler.
my $var = "value";
$var = Nil;
dd $var;
# Any $var = Any
To the debugger (that is you and me; the debugger doesn’t remove any bugs, we do) the absence is indicated by Any. As jnthn pointed out, in the case of a Node in a linked list a type object associated with that list makes more sense. That’s not what Rakudo is doing.
my constant IterationEnd = nqp::create(Mu);
# eqv to Mu.new;
It’s using an instance of Mu, which introduces some problems.
my $var = Mu.new;
say [$var.Bool, $var.defined];
# OUTPUT: [True True]
Requesting an element beyond the end of a list should be neither True nor defined. We could help that by mixing in a role.
my \Last = Mu.new but role {
    method defined { False };
    method Bool { False }
};
say [.Bool, .defined, .^name] given Last;
# OUTPUT: [False False Mu]
That’s better. It will work with if, with and //. But for debugging it’s not so nice. We don’t get a specific error message or any information where the undefined value came from. We can define a singleton with a better name.
constant \Last = class BeyondLast {
    method defined { False };
    method Bool { False }
}.new but role { method new { die 'Singleton is better left alone' } };

say [Last.WHAT, Last.defined, Last.so, Last ~~ Last, Last === Last];
# OUTPUT: [(BeyondLast+{<anon|1>}) False False True True]
Now we get something undefined, false and better named. But if there is a runtime error we won’t get a message telling us where it came from. There is a whole class of objects that are undefined and false. We can use an exception bottled up in a Failure as a default value.
constant Last = Failure.new(X::ValueAfterLast.new);
say [Last ~~ Last, Last === Last];
# OUTPUT: [True True]

my $node is default(Last); # line 3
$node = 42;
$node = Nil;
say $node === Last;
say $node;

CATCH { default { say .^name, ': ', .gist } }

# OUTPUT: True
#         X::ValueAfterLast: Value after last element requested.
#           in block at /home/dex/tmp/tmp.raku line 3
Sadly, is default does not allow a code object, or we could get a stacktrace that points to where the Nil was assigned. If the Failure object slips through we at least get a decent error message.
There are many ways to indicate unusual values. However, none of them should end up with the user of a module. We’ve got Failure for that.
Spinning up sort
My Linux box is transcoding from mp4 to av1 like a boss. I figured that shrinking the space hogs to half is cheaper than doubling disk space. Running with a load of 20.29 for a few days helped to uncover a bug and an ENODOCish.
I wanted the following code to DWIM.
my $obj = class AnonClass {
    has @.a;
    method push(\e) { self.a.push: e; self }
    method list { self.a.list }
}.new;

my $sort = Proc::Async.new('/usr/bin/sort');

{ (‚a‘..‚z‘).roll(10) } |> $sort |> $obj;
Have a block that returns a list (that’s the small one that might be lazy) and feed its values into sort. It’s not overly helpful to have a blocking shell command with a lazy list, but blocking did help to uncover a thinko of mine before. The result is then fed into something I call an Arrayish, a not-quite-type defined by a subset.
subset Arrayish of Any where { .^can(‚push‘) && .^can(‚list‘) }
Used in a Signature it basically means: “If you give me an object that has got .push and .list with the semantics of the builtin types, I will gladly take it.” The pipe operator handling this case looks as follows.
my multi infix:«|>»(Arrayish:D \a, Proc::Async:D $in) {
    my $pipe = Shell::Pipe.new;
    $pipe.pipees.push: a;
    $pipe.pipees.push: $in;
    # FIXME workaround R#3778
    $in.^attributes.grep(*.name eq '$!w')[0].set_value($in, True);
    $pipe.starters.push: -> {
        |($in.start, start {
            LEAVE try $in.close-stdin;
            await $in.ready;
            $in.write: „$_\n“.encode for a.list;
        })
    }
    $pipe
}
At first I did not have the blocking await $in.ready, which caused the block to spit text in the direction of sort before the latter was properly started and ready to read on its STDIN. The docs mention .write and .ready but don’t explain that you need to use them together. Had my system been idle, I might not have spotted it. So I can conclude that porn is useful, so long as it is made smaller. A surprising thought, because it’s usually used to make things bigger.
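For reference, a minimal round trip that awaits .ready before the first .write (plain Proc::Async, no Shell::Piping involved):

```raku
my $proc = Proc::Async.new: ‚sort‘, :w;        # :w opens STDIN for writing
$proc.stdout.tap: -> $chunk { print $chunk };
my $running = $proc.start;
await $proc.ready;                             # wait until sort accepts input
await $proc.write: „b\na\n“.encode;            # await the write, per the workaround above
$proc.close-stdin;                             # sort only runs once STDIN closes
await $running;
```

Dropping either await opens the door to the silent value loss described above.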
If you are using async behaviour it might help to run stress -c 32 in the background to push your heisenbugs over the edge. While you are at it, please push my bugs over the edge too. I don’t really want to have them.
Unrecursing
Moritz was unhappy with the power Raku gave him to wrestle with lists. And he is right. If easy things are easy, no wrestling is required. That made me think about the data structure I built in my last blog post. It’s a list of pairs of a list and a Proc::Async.
[[[Proc::Async],Proc::Async],Proc::Async]
Whereby the list has the method .start mixed in. That allows me to connect the shell commands in order and start them in reverse order without special casing to get .start called. After all, I need to connect STDOUT and STDIN before I start a pair of shell commands. However, any form of introspection becomes a burden. And I need to check that an Array is not in the middle of a pipe chain.
@a |> $grep |> $sort;    # this is fine
$find |> $sort |> @a;    # this too
$find |> @a |> $sort;    # this can not work
An Array is not a concurrent data structure. The left and right side of the chain are. So we can’t mix them. (I believe I can make this work when R#3778 is fixed.)
So I rewrote what I had so far. As a side effect we can store a pipe and start it later by hand and provide a nice gist.
my $find = Proc::Async.new('/usr/bin/find', '/tmp');
my $sort = Proc::Async.new('/usr/bin/sort');
my @a;
my $p = $find |> $sort |> @a;
say $p;
#OUTPUT: find ↦ sort ↦ @a
Whereby $p contains a Shell::Pipe which has @.pipees. So we can do something like this.
for $p.pipees -> $p { $p.stderr.tap(-> $ {}) if $p ~~ Proc::Async }; # silence is golden
$p.start;
I want to support Supplies, Channels and Callables as start and end of a pipe. Maybe even in between. Then I can move on to tackle error handling.
It is very tempting to build elaborate data structures because Raku is so good at deconstructing them. This seems to be an option best avoided. Elegance might just be the solution with the least moving parts.