Archive

Archive for February, 2019

Threading nqp through a channel

February 3, 2019 1 comment

Given that nqp is faster then plain Perl 6 and threads combining the two should give us some decent speed. Using a Supply as promised in the last post wouldn’t really help. The emit will block until the internal queue of the Supply is cleared. If we want to process files recursively the filesystem might stall just after the recursing thread is unblocked. If we are putting pressure on the filesystem in the consumer, we are better of with a Channel that is swiftly filled with file paths.

Let’s start with a simulated consumer that will stall every now end then and takes the Channel in $c.

my @files;
react {
whenever $c -> $path {
@files.push: $path;
sleep 1 if rand < 0.00001;
}
}

If we would pump out paths as quickly as possible we could fill quite a bit of RAM and put a lot of pressure on the CPU caches. After some trial and error I found that sleeping befor the .send on the Channel helps when there are more then 64 worker threads waiting to be put onto machine threads. That information is accessible via Telemetry::Instrument::ThreadPool::Snap.new<gtq>.

my $c = Channel.new;
start {
my @dirs = '/snapshots/home-2019-01-29';
  while @dirs.shift -> str $dir {
  my Mu $dirh := nqp::opendir(nqp::unbox_s($dir));
  while my str $name = nqp::nextfiledir($dirh) {
  next if $name eq '.' | '..';
  my str $abs-path = nqp::concat( nqp::concat($dir, '/'), $name);
  next if nqp::fileislink($abs-path);
  if Telemetry::Instrument::ThreadPool::Snap.new<gtq> > 64 {
say Telemetry::Instrument::ThreadPool::Snap.new<gtq>;
  say 'sleeping';
sleep 0.1;
}
$c.send($abs-path) if nqp::stat($abs-path, nqp::const::STAT_ISREG);
@dirs.push: $abs-path if nqp::stat($abs-path, nqp::const::STAT_ISDIR);
  }
  CATCH { default { put BOLD .Str, ' ⟨', $dir, '⟩' } }
  nqp::closedir($dirh); }
  $c.close;
}

Sleeping for 0.1s before sending the next value is a bit naive. It would be better to watch the number of waiting workers and only continue when it has dropped below 64. But that is a task for a differnt module. We don’t really have a middle ground in Perl 6 between Supply with it’s blocking nature and the value pumping Channel. So such a module might be actually quite useful.

But that will have to wait. I seam to have stepped on a bug in IO::Handle.read while working with large binary files. We got tons of tests on roast that deal with small data. Working with large data isn’t well tested and I wonder what monsters are lurking there.

Categories: Perl6, Uncategorized

nqp is faster then threads

February 2, 2019 1 comment

After heaving to much fun with a 20 year old filesystem and the inability of unix commands to hande odd filenames, I decided to replace find /somewhere -type f | xargs -P 10 -n 1 do-stuff with a Perl 6 script.

The first step is to travers a directory tree. I don’t really need to keep the a list of paths but for sure run stuff in parallel. Generating a supply in a thread seams to be a reasonable thing to do.

start my $s = supply {
for '/snapshots/home-2019-01-29/' {
emit .IO if (.IO.f & ! .IO.l);
.IO.dir()».&?BLOCK if (.IO.d & ! .IO.l);
CATCH { default { put BOLD .Str } }
}
}

{
my @files;
react whenever $s {
@files.push: $_;
}
  say +@files;
  say now - ENTER now;
}

Recursion is done with by calling the for block on the topic with .&?BLOCK. It’s very short and very slow. It takes 21.3s for 200891 files — find will do the same in 0.296s.

The OS wont be the bottleneck here, so maybe threading will help. I don’t want to overwhelm the OS with filesystem requests though. The buildin Telemetry module can tell us how many worker threads are sitting on their hands at any given time. If we use Promise to start workers by hand, we can decide to avoid threading when workers are still idle.

sub recurse(IO() $_){
my @ret;
@ret.push: .Str if (.IO.f & ! .IO.l);
  if (.IO.d & ! .IO.l) {
if Telemetry::Instrument::ThreadPool::Snap.new<gtq> > 4 {
@ret.append: do for .dir() { recurse($_) }
} else {
@ret.append: await do for .dir() {
Promise.start({ recurse($_) })
}
}
}
CATCH { default { put BOLD .Str } }
@ret.Slip
}
{
say +recurse('/snapshots/home-2019-01-29');
say now - ENTER now;
}

That takes 7.65s what is a big improvement but still miles from the performance of a 20 year old c implementation. Also find can that do the same and more on a single CPU core instead of producing a load of ~800%.

Poking around in Rakudos source, one can clearly see why. There are loads of IO::Path objects created and c-strings concatenated, just to unbox those c-strings and hand them over to some VM-opcodes. All I want are absolute paths I can call open with. We have to go deeper!

use nqp;

my @files;
my @dirs = '/snapshots/home-2019-01-29';
while @dirs.shift -> str $dir {
my Mu $dirh := nqp::opendir(nqp::unbox_s($dir));
while my str $name = nqp::nextfiledir($dirh) {
next if $name eq '.' | '..';
my str $abs-path = nqp::concat( nqp::concat($dir, '/'), $name);
next if nqp::fileislink($abs-path);
@files.push: $abs-path if nqp::stat($abs-path, nqp::const::STAT_ISREG);
@dirs.push: $abs-path if nqp::stat($abs-path, nqp::const::STAT_ISDIR);
}
CATCH { default { put BOLD .Str, ' ⟨', $dir, '⟩' } }
nqp::closedir($dirh);
}
say +@files; say now - ENTER now;

And this finishes in 2.58s with just 1 core and should play better in situations where not many filehandles are available. Still 9 times slower than find but workable. Wrapping it into a supply is a task for another day.

So for the time being — if you want fast you need nqp.

UPDATE: We need to check the currently waiting workers, not the number of spawned workers. Example changed to Snap.new<gtq>.

Categories: Perl6