Archive

Archive for August, 2016

One line of unicode at you fingertips

August 31, 2016 1 comment

I found myself making finger acrobatics to enter “»” on the CLI to test examples dealing with IO::* because camelia comes with a restricted setting. I’m using screen what does support digraphs but doesn’t include the fancy chars I need for my Perl 6 pleasure.

As it turns out one can tell screen to exec a shell command, let it take keyboard input and paste the result in the active screen window. There is a irssi plugin that allows the usage of digraphs and is implemented in Perl 5.

It’s not really doing what I wanted to do but got a list of digraphs to start with. A little while later I got a Perl 5 script that is doing what is needed to turn screens digraphs into something useful for a Perl 6 lubber. As a side effect irssi got convenient digraphs without fiddling with Josh Holland‘s plugin. And so does any program running inside screen. If you don’t like my intentionally RFC breaking digraphs – fork away.

♥ Happy␣typing! ♥¶

Advertisements
Categories: Perl6, unicode

It’s lazy all the way down

August 14, 2016 4 comments

On my quest to help with the completion of the docs for Perl 6 I found the task to find undocumented methods quite cumbersome. There are currently 241 files that should contain method definitions and that number will likely grow with 6.d.

Luckily I wield the powers of Perl 6 what allows me to parse text swiftly and to introspect the build-in types.

71     my \methods := gather for types -> ($type-name, $path) {
72         take ($type-name, $path, ::($type-name).^methods(:local).grep({
73             my $name = .name;
74             # Some buildins like NQPRoutine don't support the introspection we need.
75             # We solve this the British way, complain and carry on.
76             CATCH { default { say "problematic method $name in $type-name" unless $name eq '<anon>'; False } }
77             (.package ~~ ::($type-name))
78         })».name)
79     }

We can turn a Str into a list of method objects by calling ::("YourTypeName").^methods(). The :local adverb will filter out all inherited methods but not methods that are provided by roles and mixins. To filter roles out we can check if the .package property of a method object is the same then the type name we got the methods from.

For some strange reason I started to define lazy lists and didn’t stop ’till the end.

52     my \ignore = LazyLookup.new(:path($ignore));

That creates a kind of lazy Hash with a type name as a key and a list of string of method names as values. I can go lazy here because I know the file those strings come from will never contain the same key twice. This works by overloading AT-KEY with a method that will check in a private Hash if a key is present. If not it will read a file line-by-line and store any found key/value pairs in the private Hash and stop when it found the key AT-KEY was asked for. In a bad case it has to read the entire file to the end. If we check only one type name (the script can and should be used to do that) we can go lucky and it wont pass the middle of the file.

54     my \type-pod-files := $source-path.ends-with('.pod6')

That lazy list produces IO::Path objects for every file under a directory that ends in ‘.pod6’ and descends into sub-directories in a recursive fashion. Or it contains no lazy list at all but a single IO::Path to the one file the script was asked to check. Perl 6 will turn a single value into a list with that value if ever possible to make it easy to use listy functions without hassle.

64     my \types := gather for type-pod-files».IO {

This lazy list is generated by turning IO::Path into a string and yank the type name out of it. Type names can be fairly long in Perl 6, e.g. X::Method::Private::Permission.

71     my \methods := gather for types -> ($type-name, $path) {

Here we gather a list of lists of method names found via introspection.

81     my \matched-methods := gather for methods -> ($type-name, $path, @expected-methods) {

This list is the business end of the script. It uses Set operators to get the method names that are in the list created by introspection (methods that are in the type object) but neither in the pod6-file nor in the ignore-methods file. If the ignore-methods is complete and correct, that will be a list of methods that we missed to write documentation for.

88     for matched-methods -> ($type-name, $path, Set $missing-methods) {
89         put "Type: {$type-name}, File: ⟨{$path}⟩";
90         put $missing-methods;
91         put "";
92     };

And then at the end we have a friendly for loop that takes those missing methods and puts them on the screen.

Looking at my work I realised that the whole script wouldn’t do squad all without that for loop. Well, it would allocate some RAM and setup a handful of objects, just to tell the GC to gobble them all up. Also there is a nice regularity with those lazy lists. They take the content of the previous list, use destructuring to stick some values into input variables that we can nicely name. Then it declares and fills a few output variables, again with proper speaking names, and returns a list of those. Ready to be destructured in the next lazy list. I can use the same output names as input names in the following list what makes good copypasta.

While testing the script and fixing a few bugs I found that any mistake I make that triggers and any pod6-file did terminate the program faster then I could start it. I got the error message I need to fix the problem as early as possible. The same is true for bugs that trigger only on some files. Walking a directory tree is not impressively fast yet, as Rakudo creates a lot of objects based on c-strings that drop out of libc, just to turn them right back into c-strings when you asked for the content of another directory of open a file. No big deal for 241 files, really, but I did have the pleasure to $work with an archive of 2.5 million files before. I couldn’t possibly drink all that coffee.

I like to ride the bicycle and as such could be dead tomorrow. I better keep programming in a lazy fashion so I get done as much as humanly possible.

EDIT: The docs where wrong about how gather/take behaves at the time of the writing of this blog post. They are now corrected what should lead to less confusion.

Categories: Uncategorized

Sneaking into a loop

August 10, 2016 3 comments

Zoffix answered a question about Perl 5s <> operator (with a few steps in-between as it happens on IRC) with a one liner.

slurp.words.Bag.sort(-*.value).fmt("%10s => %3d\n").say;

While looking at how this actually works I stepped on a few idioms and a hole. But first lets look at how it works.

The sub slurp will read the whole “file” from STDIN and return a Str. The method Str::words will split the string into a list by some unicode-meaning of word. Coercing the list into a Bag creates a counting Hash and is a shortcut for the following expression.

my %h; %h{$_}++ for <peter paul marry>; dd %h
# OUTPUT«Hash %h = {:marry(1), :paul(1), :peter(1)}␤»

Calling .sort(-*.value) on an Associative will sort by values descending and return a ordered list of Pairs. List::fmt will call Pair::fmt what calls fmt with the .key as its 2nd and .value the parameter. Say will join with a space and output to STDOUT. The last step is a bit wrong because there will be an extra space in from of each line but the first.

slurp.words.Bag.sort(-*.value).fmt(“%10s => %3d”).join(“\n”).say;

Joining by hand is better. That’s a lot for a fairly short one-liner. There is a problem though that I tripped over before. The format string is using %10s to indent and right-align the first column what works nicely unless the values wont fit anymore. So we would need to find the longest word and use .chars to get the column width.

The first thing to solve the problem was a hack. Often you want to write a small CLI tool that reads stuff from STDIN and that requires a test. There is currently no way to have a IO::Handle that reads from a Str (there is an incomplete module for writing). We don’t really need a whole class in that case because slurp will simply call .slurp-rest on $*IN.

$*IN = <peter paul marry peter paul paul> but role { method slurp-rest { self.Str } };

It’s a hack because it will fail on any form of type check and it wont work for anything but slurp. Also we actually untie STDIN from $*IN. Don’t try this at homework.

Now we can happily slurp and start counting.

my %counted-words = slurp.words.Bag;
my $word-width = [max] %counted-words.keys».chars;

And continue the chain where we broke it apart.

%counted-words.sort(-*.value).fmt("%{$word-width}s => %3d").join("\n").say;

Solved but ugly. We broke a one-liner apart‽ Let’s fix fmt to have it whole again.

What we want is a method fmt that takes a Positional, a printf-style format string and a block per %* in the format string. Also we may need a separator to forward to self.fmt.

8 my multi method fmt(Positional:D: $fmt-str, *@width where *.all ~~ Callable, :$separator = " "){
9     self.fmt(
10         $fmt-str.subst(:g, "%*", {
11             my &width = @width[$++] // Failure.new("missing width block");
12             '%' ~ (&width.count == 2 ?? width(self, $_) !! width(self))
13         }), $separator);
14 }

The expression *.all ~~ Callable simply checks if all elements in the slurpy array implement CALL-ME (that’s the real method that is executed with you do foo()).

We then use subst on the format string to replace %*, whereby the replacement is a (closure) block that is called once per match. And here we have the nice idiom I stepped on.

say "1-a 2-b 3-c".subst(:g, /\d/, {<one two three>[$++]});
# OUTPUT«one-a two-b three-c␤»

The anonymous state variable $ is counting up by one from 0 for every execution of the block. What we actually do here is removing a loop by sneaking an extra counter and an array subscript into the loop subst must have. Or we could say that we inject an iterator pull into the loop inside subst. One could argue that subst should accept a Seq as its 2nd positional, what would make a call redundant. Anyway, we got hole plugged.

In line 11 we take one element out of the slurpy array or create a Failure if there is no element. We store the block in a variable because we want to introspect in line 12. If the block takes two Positionals, we feed the topic subst is calling the block with as a 2nd parameter to our stored block. That happens to be a Match and may be useful to react on what was matched. In our case we know that we matched on %* and the current position is counted by $++ anyway. With that done we got a format string augmented with a column provided by the user of our version of fmt.

The user supplied block is called with a list of Pairs. We have to go one level deeper to get the biggest key.

{[max] .values».keys».chars}

That’s what we have to supply to get the first columns width, in a list of Pairs dropping out of Bag.sort.

print %counted-words.sort(-*.value).&fmt("%*s => %3d", {[max] .values».keys».chars}, separator => "\n");

The fancy .&fmt call is required because our free floating method is not a method of List. That spares us to augment List.

And there you have it. Another hole in CORE plugged.

Categories: Perl6

Walk on the flats

August 1, 2016 1 comment

In #perl6 we are asked on a regular basis how to flatten deep nested lists. The naive approach of calling flat doesn’t do what beginners wish it to do.

my @a = <Peter Paul> Z <70 42> Z <green yellow>;

dd @a;
# Array @a = [("Peter", IntStr.new(70, "70"), "green"), ("Paul", IntStr.new(42, "42"), "yellow")]

dd @a.flat.flat;
# ($("Peter", IntStr.new(70, "70"), "green"), $("Paul", IntStr.new(42, "42"), "yellow")).Seq

The reason why @a is so resistant to flattening is that flat will stop on the first item, what happens to be a list in the example above. We can show that with dd.

for @a { .&dd }
# List @a = $("Peter", IntStr.new(70, "70"), "green")
# List @a = $("Paul", IntStr.new(42, "42"), "yellow")

The reason why this easy thing isn’t easy is that you are not meant to do it. Flattening lists may work well with short simple list literals but will become pricey on lazy lists, supplies or 2TB of BIG DATA. Leave the data as it is and use destructuring to untangle nested lists.

If you must iterate in a flattening fashion over your data, use a lazy iterator.

sub descend(Any:D \l){ gather l.deepmap: { .take } }

dd descend @a;
# ("Peter", IntStr.new(70, "70"), "green", "Paul", IntStr.new(42, "42"), "yellow").Seq

By maintaining the structure of your data you can reuse that structure later, what includes changes in halve a year’s time.

put @a.map({ '<tr>' ~ .&descend.map({"<td>{.Str}</td>"}).join ~ "</tr>" }).join("\n");
# <tr><td>Peter</td><td>70</td><td>green</td></tr>
# <tr><td>Paul</td><td>42</td><td>yellow</td></tr>

There are quite a few constructs that will require you to flatten to feed the result to operators and loops. If you concatenate list-alikes like Ranges or Seq returned by metaops, flat will come in handy.

"BenGoldberg++ for this pretty example".trans( [ flat "A".."Z", "a".."z"] => [ flat "𝓐".."𝓩", "𝓪".."𝔃" ] ).put
# OUTPUT«𝓑𝓮𝓷𝓖𝓸𝓵𝓭𝓫𝓮𝓻𝓰++ 𝓯𝓸𝓻 𝓽𝓱𝓲𝓼 𝓹𝓻𝓮𝓽𝓽𝔂 𝓮𝔁𝓪𝓶𝓹𝓵𝓮␤»
Categories: Perl6