Home > Uncategorized > It’s lazy all the way down

It’s lazy all the way down

On my quest to help with the completion of the docs for Perl 6 I found the task to find undocumented methods quite cumbersome. There are currently 241 files that should contain method definitions and that number will likely grow with 6.d.

Luckily I wield the powers of Perl 6 what allows me to parse text swiftly and to introspect the build-in types.

71     my \methods := gather for types -> ($type-name, $path) {
72         take ($type-name, $path, ::($type-name).^methods(:local).grep({
73             my $name = .name;
74             # Some buildins like NQPRoutine don't support the introspection we need.
75             # We solve this the British way, complain and carry on.
76             CATCH { default { say "problematic method $name in $type-name" unless $name eq '<anon>'; False } }
77             (.package ~~ ::($type-name))
78         })».name)
79     }

We can turn a Str into a list of method objects by calling ::("YourTypeName").^methods(). The :local adverb will filter out all inherited methods but not methods that are provided by roles and mixins. To filter roles out we can check if the .package property of a method object is the same then the type name we got the methods from.

For some strange reason I started to define lazy lists and didn’t stop ’till the end.

52     my \ignore = LazyLookup.new(:path($ignore));

That creates a kind of lazy Hash with a type name as a key and a list of string of method names as values. I can go lazy here because I know the file those strings come from will never contain the same key twice. This works by overloading AT-KEY with a method that will check in a private Hash if a key is present. If not it will read a file line-by-line and store any found key/value pairs in the private Hash and stop when it found the key AT-KEY was asked for. In a bad case it has to read the entire file to the end. If we check only one type name (the script can and should be used to do that) we can go lucky and it wont pass the middle of the file.

54     my \type-pod-files := $source-path.ends-with('.pod6')

That lazy list produces IO::Path objects for every file under a directory that ends in ‘.pod6’ and descends into sub-directories in a recursive fashion. Or it contains no lazy list at all but a single IO::Path to the one file the script was asked to check. Perl 6 will turn a single value into a list with that value if ever possible to make it easy to use listy functions without hassle.

64     my \types := gather for type-pod-files».IO {

This lazy list is generated by turning IO::Path into a string and yank the type name out of it. Type names can be fairly long in Perl 6, e.g. X::Method::Private::Permission.

71     my \methods := gather for types -> ($type-name, $path) {

Here we gather a list of lists of method names found via introspection.

81     my \matched-methods := gather for methods -> ($type-name, $path, @expected-methods) {

This list is the business end of the script. It uses Set operators to get the method names that are in the list created by introspection (methods that are in the type object) but neither in the pod6-file nor in the ignore-methods file. If the ignore-methods is complete and correct, that will be a list of methods that we missed to write documentation for.

88     for matched-methods -> ($type-name, $path, Set $missing-methods) {
89         put "Type: {$type-name}, File: ⟨{$path}⟩";
90         put $missing-methods;
91         put "";
92     };

And then at the end we have a friendly for loop that takes those missing methods and puts them on the screen.

Looking at my work I realised that the whole script wouldn’t do squad all without that for loop. Well, it would allocate some RAM and setup a handful of objects, just to tell the GC to gobble them all up. Also there is a nice regularity with those lazy lists. They take the content of the previous list, use destructuring to stick some values into input variables that we can nicely name. Then it declares and fills a few output variables, again with proper speaking names, and returns a list of those. Ready to be destructured in the next lazy list. I can use the same output names as input names in the following list what makes good copypasta.

While testing the script and fixing a few bugs I found that any mistake I make that triggers and any pod6-file did terminate the program faster then I could start it. I got the error message I need to fix the problem as early as possible. The same is true for bugs that trigger only on some files. Walking a directory tree is not impressively fast yet, as Rakudo creates a lot of objects based on c-strings that drop out of libc, just to turn them right back into c-strings when you asked for the content of another directory of open a file. No big deal for 241 files, really, but I did have the pleasure to $work with an archive of 2.5 million files before. I couldn’t possibly drink all that coffee.

I like to ride the bicycle and as such could be dead tomorrow. I better keep programming in a lazy fashion so I get done as much as humanly possible.

EDIT: The docs where wrong about how gather/take behaves at the time of the writing of this blog post. They are now corrected what should lead to less confusion.

Categories: Uncategorized
  1. August 14, 2016 at 17:18

    Why are you using sygil-less variable names? Just because you don’t like sygils, or is there a deeper reason?
    I find this makes the code harder to read. On the very first line, “… gather for types -> …”, it is not clear what “types” is. It looks like a function call. Wouldnt “… gather for @types -> …” be much more readable?

    • August 14, 2016 at 18:06

      There are quite a few reasons. Firstly I don’t need those variables to be containers, hence the binding. Containers are just another level of indirection. So it’s a little faster that way. Then I don’t want to do any string interpolation with those lazy lists. They would explode into all values when .Str is called on them. That is likely a bit much for a single string. Lazy lists are a lot closer to a Callable then an Array. They produce a Seq, which is immutable, while @-sigiled containers are of type Array, what is very much mutable. Binding to a @-sigiled container doesn’t work because it’s the wrong type. On assignment to Array a Seq will generate all its values even if we touch every file only once. With 2.5M files that may be a bad idea. And lastly it’s optics. A common idiom would be for lines() { .say }; no sigil on this lazy list ($*IN.lines is lazy).

  1. August 15, 2016 at 23:20
  2. October 3, 2016 at 03:25

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: