You have to take what you can get
And sometimes you have to get it by any means necessary. If it’s a file, you could use Jonathan Stowe’s URI::FetchFile. Said module checks whether any of four modules is available and takes the first that sticks to turn a URI into a file on disk. There is one interesting line in his code that triggered an ENODOC.
$type = try require ::($class-name);
Here require returns a type object of a class declared by a module with the same name as that module.
Checking roast for that neat trick and playing with the whole dynamic module magic made me realise that we don’t really cover this in the docs. When I try to handle an ENODOC I like to start with an example that compiles. This time, we need two files.
# M.pm6
unit module M;
class C is export { method m { 'method C::m' } };
class D is export { method m { 'method D::m' } };
# dynamic-modules.p6
use v6;
use lib '.';
subset C where ::('M::C');
my C $context = try {
    CATCH { default { .note } };
    require ::('M');
    ::('M::C')
};
dd $context.HOW.^methods.elems;
dd $context.HOW.shortname($context);
Any symbol that is loaded via require will not be available at compile time. Consequently, we can’t have static type checks. Using a subset and dynamic lookup, we can get ourselves a type object to check against. The where-clause will smart match against the type object. Since dynamic lookups are slow it may be sensible to cache the type object like so:
subset C where $ //= ::('M::C');
Now we have a type constraint to guard against require not returning a type that matches the name we expect. Please note we check against a name, not a type or interface. If you have the chance to design the modules that are loaded dynamically, you may want to define a role (that may even be empty) that must be implemented by the classes you load dynamically, to make sure you can actually call the methods you expect, not just methods with the same name.
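The role idea can be sketched without a second file. The names Fetcher, Good, Bad and Loaded below are made up for illustration, and the two classes merely stand in for classes that would normally come out of a required module.

```raku
# Hypothetical marker role that every dynamically loaded class must do.
role Fetcher { }

# Stand-ins for classes that would normally be declared by a required module.
class Good does Fetcher { method m { 'Good::m' } }
class Bad               { method m { 'Bad::m'  } }

# The where-clause smart matches against the role instead of a name.
subset Loaded where Fetcher;

# Good does the role, so the dynamic lookup passes the constraint.
my Loaded $ok = ::('Good');
say $ok.m;

# Bad has a method of the same name but does not do the role,
# so the assignment throws and try returns Nil.
my $accepted = try { my Loaded $nope = ::('Bad'); True };
say $accepted.defined;
```

The constraint now checks an interface the author of the module opted into, which is a stronger promise than a matching name.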
Now we can actually load the module by its name, resolve one of the classes dynamically and return it from the try block. Since M.pm6 defines a Module (as in Perl6::Metamodel::ModuleHOW) as its top level package, we can’t simply take the return value of require because Module is not the most introspective thing we have in Perl 6. Please note that the symbols loaded by require are available via dynamic lookup outside the try-block. What happens if you go wild and load modules that have symbols with the same fully qualified name, I do not know. There may be dragons.
The case of loading any of a set of modules that may or may not be installed is quite a general one and to my limited knowledge we don’t have a module for that in the ecosystem yet. I therefore would like to challenge my three readers to write a module that sports the following interface.
sub load-any-module(*%module-name-to-adapter);
load-any-module({'Module::Name' => &Callable-adapter});
Whereby Callable-adapter provides a common interface to translate one sub or method call of the module to whatever user code requires. With such a module Jonathan may be able to boil URI::FetchFile down to 50 lines of code.
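To give the challenge a starting point, here is a rough sketch of how such a sub might look. It is an assumption on my part, not an existing module, and it deviates from the interface above in one respect: it takes an ordered list of Pairs instead of a slurpy hash, so that the first module that sticks wins in a predictable order. No::Such::Module is a made-up name that is expected to be missing, while Test ships with Rakudo.

```raku
# Hypothetical sketch: candidates are ordered pairs of module name => adapter.
sub load-any-module(*@candidates) {
    for @candidates -> Pair $candidate {
        # try swallows the exception thrown when the module is not installed
        my $loaded = try { require ::($candidate.key); True };
        return $candidate.value if $loaded;
    }
    fail 'none of the candidate modules could be loaded';
}

# Pairs with a quoted key are passed as positional arguments.
my &adapter = load-any-module(
    'No::Such::Module' => -> { 'never chosen' },
    'Test'             => -> { 'adapter for Test' },
);
say adapter();
```

A real module would hand the loaded symbols to the adapter as well. The sketch only shows the selection step.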
UPDATE:
Benchmarking a little revealed that `$` is not equivalent to `state $`, although it should be. To get the speedup right now, use the following.
subset C where state $ = ::('M::C');
UPDATE 2:
The behaviour is by design and has been documented.
Being lazy on this side of the Channel
While writing Concurrent::File::Find I started with a non-concurrent, lazy gather/take version that, once it worked mostly correctly, was turned into a concurrent, Channel-sporting version. This looked very nice and regular. A while ago I bragged about another nice and regular lazy program. With four lazy lists in a row it would be lovely to turn them all into Channels.
Simplified it looks like this:
my \l1 := gather for 1..10 { take .item };
my \l2 := gather for l1 { take .item * 2 };
.say for l2;
We loop over l1 to generate l2. Now we need to add a Channel, loop over the lazy lists and .send each value one by one, make sure the Channel gets closed and mix in a role to be able to close the Channel from the outside. While we are at it, only create Channels when Rakudo is told to use more than one thread to avoid the overhead of having a Channel in the mix.
my &channelify = %*ENV<RAKUDO_MAX_THREADS>:!exists || %*ENV<RAKUDO_MAX_THREADS>.Int <= 1
    ?? -> \c { c }
    !! -> \list, $channel = Channel.new {
        start {
            for list { $channel.send($_) }
            LEAVE $channel.close unless $channel.closed;
        }
        $channel.list but role :: { method channel { $channel } }
    };
my \l1 := channelify gather for 1..10 { take .item };
my \l2 := channelify gather for l1 { take .item * 2 };
.say for l2;
The method check script went from 0m2.978s down to 0m2.555s with a fair bit of Rakudo startup time in the mix. Not bad for some quick copypasta.
It pays to be lazy early on.
UPDATE: I improved on checking the environment variable. At some point in the future Rakudo will sport a sane default. It’s hardcoded to 16 worker threads, IIRC.
Things I found out while finding
Concurrent::File::Find is now in the ecosystem. Besides a lack of loop detection, thanks to a lack of readlink in Rakudo and my failed attempt to get __lxstat (GNU’s libc has a strange taste when it comes to symbol names) to cooperate, it’s working quite nicely. There are a couple of things I learned that I would like to share.
Having a where-clause on a sub-signature will not provide the variables inside the sub-signature. If a where-clause has to operate on all arguments, to check exclusiveness or other interdependence, it has to be applied to the very last argument. This limitation is by design, so I told the docs. That where-clauses don’t work on captures was not by design, and jnthn kindly fixed it the next day. I know that whining gets you stuff, but I didn’t expect it to be so fast.
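A minimal sketch of a where-clause that sits on the last parameter and looks at an earlier one. The sub ordered-pair and its parameters are made up for illustration.

```raku
# The where-block hangs on the last parameter, so $low is already bound
# by the time the constraint on $high is checked.
sub ordered-pair(Int $low, Int $high where { $low <= $high }) {
    "$low to $high"
}

say ordered-pair(1, 5);

# Violating the interdependence makes the call fail to bind; try turns
# the exception into Nil.
my $result = try ordered-pair(5, 1);
say $result.defined;
```

Putting the same block on $low would not work, because $high is not bound yet at that point.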
I also found that a method called .close almost always requires a LEAVE. Let’s have a look at some code.
sub find-simple ( IO(Str) $dir, :$keep-going = True, :$no-thread = False ) is export {
    my $channel = Channel.new;
    my &start = -> ( &c ) { c } if $no-thread;
    my $promise = start {
        for dir($dir) {
            CATCH { default { if $keep-going { note .Str } else { .rethrow } } }
            if .IO.l && !.IO.e {
                X::IO::StaleSymlink.new(path=>.Str).throw;
            }
            {
                CATCH { when X::Channel::SendOnClosed { last } }
                $channel.send(.IO) if .IO.f;
                $channel.send(.IO) if .IO.d;
            }
            .IO.dir()».&?BLOCK if .IO.e && .IO.d;
        }
        LEAVE $channel.close unless $channel.closed;
    }
    return $channel.list but role :: { method channel { $channel } };
}
This basically takes the List returned by dir and returns IO::Path for files and directories via a Channel. For any directory it recurses in the for-block (that’s what ».&?BLOCK means). Since any of those file-tests may fire an exception, we may leave the start-block. If there is a Promise, that would send us behind the return-statement, where there is nothing. The return-statement is actually in the main-thread and some consumer will be blocking on the returned .list.
.say for find-simple('/home/you');
For the consumer of that Channel it is by no means clear that this may block forever. Hence the LEAVE-statement that ensures a closed Channel if something goes wrong.
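The effect of the LEAVE can be shown in isolation. In this sketch the producer dies on purpose after three values (the failure is simulated, nothing here is taken from Concurrent::File::Find), and the consumer still gets a finite list instead of blocking.

```raku
my $channel = Channel.new;

my $producer = start {
    # Runs however the block is left, including via an exception.
    LEAVE $channel.close;
    for 1..10 {
        die 'simulated failure' if $_ > 3;
        $channel.send($_);
    }
}

# Returns once the Channel is closed; without the LEAVE this would hang.
my @received = $channel.list;
say @received;

# The exception is not lost, it breaks the Promise.
try await $producer;
say $producer.status;
```

The consumer sees only the values sent before the failure. Whether the failure itself gets reported is up to whoever holds the Promise.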
Mixing the role into the .listified Channel provides a side channel to hand the Channel over to the consumer, so it can be closed on that end.
my @l := find-simple(%*ENV<HOME>, :keep-going); # binding to avoid eagerness
for @l {
@l.channel.close if $++ > 5000; # hard-close the channel after 5000 found files
.say if $++ %% 100 # print every 100th file
}
The binding is required or the Array would eat up the entire Channel, and with a scalar in its place we would have to flatten. Please note how well the anonymous state variables count in a most untypo-possible manner. This is one of those cases where a boatload of Perl 6 features play together. Sigils allow us to indicate a variable, even if it has no name. Autovivification on Any by postfix:<++> gets us a 0 that is incremented to 1. Since we don’t have a name that is spilled into the local scope (the 2nd $++ introduces a separate variable), we don’t need a declarator. I never really liked one shot variables because they are so easy to miss when moving code around. Then the compiler would complain about a missing declaration, or worse, have a variable that is initialised with a wrong value.
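A small sketch of how those anonymous state variables behave on their own. The variables @every-third and @pairs exist only to collect results for this illustration.

```raku
# $++ starts at 0 and counts up once per iteration.
my @every-third;
for 1..10 {
    @every-third.push($_) if $++ %% 3;
}
say @every-third;

# Each $++ is its own variable, so both count from 0 independently.
my @pairs;
for <a b c> {
    @pairs.push("{$++}:{$++}");
}
say @pairs;
```

The first loop keeps the values seen at counts 0, 3, 6 and 9. The second shows that the two counters never influence each other.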
There are problems though. While testing in a clean VM (Debian Jessie) I found that symlink-loops (/sys/ has plenty of those) will cause the Promised-thread to consume 100% CPU while the main-thread is doing nothing. There seems to be something amiss with threads, maybe in conjunction with plenty of exceptions. I would not be surprised if there were a bug. Nobody is testing exception-spamming in a synthetic test.
That’s what $no-thread is for. Simply declaring &start to be a nearly empty pointy will disable any threading and turn the Channel into a glorified Array.
Anyway, both writing and using concurrent code is really easy, as long as we turn a gather/take into a start/send to channel a List through a Channel.
UPDATE: I just managed to golf and report the bug. At the time you read this it may be fixed already.