How does lizmat know?

I didn’t know so I asked her.

15:25 <gfldex> How do you gather info for "Updated Raku Modules"?
17:40 <lizmat> https://twitter.com/raku_cpan_new
17:59 <gfldex> thx
23:06 <lizmat> well volunteered  :-)

That’s what you get for being nosey. So off I went into the land of mostly undocumented infrastructure.

The objective is simple: generate two lists of modules, where the first contains all modules newly added to the ecosystem and the second contains all updated modules. For both, the timespan of interest runs from Monday of last week until Monday of this week. Currently we have two collections of META files: our ecosystem and CPAN. The latter does not know about META6 and that sucks. But we will manage. Conveniently both lists are provided by ugexe on GitHub. Since there are commits, we can travel back in time and get a view of the ecosystem as it was at any point we need. To do so we first need to get a list of commits.
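The reporting window can be computed with Raku's built-in Date type. What follows is a minimal sketch, not code from the script: report-window is a hypothetical helper, and the names $old and $young are borrowed from the call further down. It relies on truncated-to('week') returning the Monday of the given week and formats both ends as the ISO 8601 timestamps that the GitHub API accepts for since and until.

```raku
# Hypothetical helper, not taken from the original script.
sub report-window(Date:D $today = Date.today) {
    my $this-monday = $today.truncated-to('week');   # weeks start on Monday
    my $last-monday = $this-monday.earlier(:1week);
    # GitHub expects timestamps such as 2020-08-24T00:00:00Z
    ($last-monday, $this-monday).map(* ~ 'T00:00:00Z').List
}

my ($old, $young) = report-window(Date.new(2020, 9, 3));
say "$old .. $young";   # 2020-08-24T00:00:00Z .. 2020-08-31T00:00:00Z
```

Passing the date in as a parameter keeps the helper easy to check against a known week.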

sub github-get-remote-commits($owner, $repo, :$since, :$until) is export(:GIT) {
    my $page = 1;
    my @response;
    loop {
        my $commits-url = $since && $until ?? „https://api.github.com/repos/$owner/$repo/commits?since=$since&until=$until&per_page=100&page=$page“ !! „https://api.github.com/repos/$owner/$repo/commits“;
        my $curl = Proc::Async::Timeout.new('curl', '--silent', '-X', 'GET', $commits-url);
        my $github-response;
        $curl.stdout.tap: { $github-response ~= .Str };

        await my $p = $curl.start: :$timeout;
        @response.append: from-json($github-response);

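        last unless from-json($github-response)[0].<commit>;
        # The fallback URL carries no page parameter, so asking again would return the same page forever.
        last unless $since && $until;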
        $page++;
    }

    if @response.flat.grep(*.<message>) && @response.flat.hash.<message>.starts-with('API rate limit exceeded') {
        dd @response.flat;
        die „github hourly rate limit hit.“;
    }

    @response.flat
}

my @ecosystems-commits = github-get-remote-commits(‚ugexe‘, ‚Perl6-ecosystems‘, :since($old), :until($young));
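To travel back in time we need the commit at each end of the window. The sketch below assumes the usual shape of the GitHub commits response, where every entry carries a sha and a commit.committer.date. The sample data is made up, and $oldest-commit is a name introduced here for illustration only.

```raku
# Made-up sample data in the shape the GitHub commits API returns.
my @commits =
    { sha => 'b2c3', commit => { committer => { date => '2020-08-30T21:14:05Z' } } },
    { sha => 'a1b2', commit => { committer => { date => '2020-08-24T06:02:11Z' } } },
    { sha => 'c3d4', commit => { committer => { date => '2020-08-27T12:40:00Z' } } };

# ISO 8601 timestamps in UTC sort correctly as plain strings.
my @by-date = @commits.sort: { .<commit><committer><date> };

my $oldest-commit   = @by-date.head.<sha>;
my $youngest-commit = @by-date.tail.<sha>;
say "$oldest-commit .. $youngest-commit";   # a1b2 .. b2c3
```

Sorting explicitly avoids relying on the order in which the API happens to return the commits.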

Now we can get a whole bunch of ex-JSON that was compiled from the META6.json and *.meta files. The two file formats are not compatible. The auth field of a CPAN module will differ from the auth of the upstream META6.json, there is no authors field and no URL to the upstream repo. Not pretty but fixable because tar is awesome.

my @meta6;
px«curl -s $source-url» |» px<tar -xz -O --no-wildcards-match-slash --wildcards */META6.json> |» @meta6;

my $meta6 = @meta6.join.chomp.&from-json;

(Well, GNU tar is awesome. BSD tar doesn’t sport --no-wildcards-match-slash and there is one module with two META6.json files. I think I can get around this with a two-pass run.)
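One way the two-pass idea might look: first list the members of the archive (for instance with tar -tz), then pick the META6.json that sits closest to the root and extract exactly that member. The selection step is plain Raku. This is only a sketch, top-level-meta is a hypothetical name, and the paths are invented.

```raku
# Hypothetical helper: pick the META6.json with the fewest directory levels.
sub top-level-meta(@paths) {
    @paths.grep({ .IO.basename eq 'META6.json' }).min: { .comb('/').elems }
}

# Invented archive listing with a second META6.json used as test data.
my @listing = <
    Foo-0.1/META6.json
    Foo-0.1/lib/Foo.rakumod
    Foo-0.1/t/data/META6.json
>;

say top-level-meta(@listing);   # Foo-0.1/META6.json
```

Because the second pass would ask for one exact member name, it would not need any wildcard options and should behave the same with GNU and BSD tar.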

This works nicely for all but one module. For some reason a Perl 5 module sneaked into the list of Raku modules on CPAN. It’s all just parsed JSON so we can filter those out.

my @ecosystems = fetch-ecosystem(:commit($youngest-commit)).grep(*.<perl>.?starts-with('6'));

Some modules don’t contain an auth field, some have an empty name. Others don’t have the authors field set. We don’t enforce proper metadata even though it’s very easy to add quality control. Just use Test::META in your tests. Here is an example.
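A test file along these lines is usually all it takes. This is a minimal sketch that assumes Test::META is installed and that the test is run from the root of the distribution, next to its META6.json. The file name is hypothetical.

```raku
# t/meta.rakutest (hypothetical file name)
use Test;
use Test::META;

plan 1;

# meta-ok checks the META6.json of the distribution under test
meta-ok;
```

Many authors only run this check as an author test, so that a missing Test::META does not block installation for users.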

I can’t let lizmat down though and github knows almost all authors.

sub github-realname(Str:D $handle) {
    my @github-response;

    my $url = 'https://api.github.com/users/' ~ $handle;
    px«curl -s -X GET $url» |» @github-response;

    @github-response.join.&from-json.<name>
}

If there is more than one author they won’t show up with this hack. I can’t win them all. I’m not the only one who suffers here. On modules.raku.org at least one module shows up twice with the same author. My guess is that this happens when a module is published both in our ecosystem and on CPAN. I don’t know what zef does if you try to nail a module down by author and version with such ambiguity.
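The fallback order described above can be sketched as a small function: prefer the authors field, otherwise take the handle from auth and ask GitHub only when the auth says github. author-for is a hypothetical name and not part of the script. The lookup is passed in as a callable so the logic can be tried without network access; in the script it could be &github-realname.

```raku
# Hypothetical helper, not taken from the original script.
sub author-for(%meta, &lookup = -> $handle { $handle }) {
    # Use the authors field when it holds at least one non-empty entry.
    my @authors = (%meta<authors> // ()).grep(*.so);
    return @authors.join(', ') if @authors;

    my $auth = %meta<auth>;
    return 'unknown' unless $auth;

    # auth looks like "github:gfldex" or "cpan:ELIZABETH"
    my ($source, $handle) = $auth.split(':', 2);
    return $auth without $handle;

    # Only GitHub handles can be resolved; keep the handle if no name comes back.
    $source eq 'github' ?? (lookup($handle) // $handle) !! $handle
}

say author-for({ authors => ['Jane Doe'], auth => 'github:jdoe' });   # Jane Doe
say author-for({ auth => 'github:jdoe' }, -> $h { 'J. Doe' });         # J. Doe
say author-for({ auth => 'cpan:JDOE' });                               # JDOE
```

The names and handles in the usage lines are invented.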

I added basic HTML support and am now able to give you a preview of next week’s new modules.

New Modules

Updated Modules

If your module is in the list and your name looks funny, you may want to have a look at the META6.json of your project.

Yesterday we had a discussion about where to publish modules. I will not use CPAN with the wrong language. Don’t get me wrong. I like CPAN. You can tie an aircraft carrier to it and it won’t move. But it’s a Comprehensive Perl Archive Network. It’s no wonder it doesn’t like our metadata.

Kudos to tony-o for taking on a sizeable task. I hope my lamentation is helpful in this regard.

The script can be found here. I plan to turn it into a more general module to query the ecosystem. Given that I spent the better part of a week on a 246-line file, the module might take a while.

Categories: Raku
  1. September 6, 2020 at 12:45

    That’s very cool… thought I would chuck in my 2p worth. Since we now have GitHub (which did not exist in the early days of CPAN), I think the infrastructure for a Comprehensive Raku Universal Module Base COULD just be:
    * a live directory with human and machine list of modules (viz. your post)
    * an installer that can lookup in the directory and get modules right out of GitHub (i.e. zef)
    * some repo name conventions on GitHub to ensure that inclusion is a deliberate act
    * some version conventions between GitHub release names, code header and META6.json
    * some care on checksums / certificates
    Errr – that’s it!

    Authors would be the owners of quality with suitable module tests. No doubt some purists will not want raku to depend on a commercial platform. No doubt wiser heads than mine may have a different view …

    • September 6, 2020 at 13:52

      github supports releases. So that is an option. As you spotted already, github is owned by Microsoft. They changed a lot in the last couple of years. But that means they can change and the next change may not be to our liking. I believe we still need a place to store releases and that place must not be under the control of a single entity. Just like CPAN. The whole thing requires a lot of thought because we are setting the course for the next 100 years.
