Running pipelines in the background
In the Unix world, we’re used to being able to run the different commands in a pipeline in different threads or processes. This is usually a lot more efficient when the producer and consumer rates vary, and it makes use of multiple cores or CPUs.
For example, cmd1 | cmd2 will run the two commands in separate processes.
PowerShell doesn’t do this. Everything runs in one thread, and pipelined objects are processed in batches, a few at a time. You can control the batch size with the -OutBuffer parameter, but things are still done sequentially, with one command in the pipe being busy whilst all the others sit idle.
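A small experiment makes the batching visible. This is only a sketch: the $log list and the "produced"/"consumed" labels are my own illustration, and the exact interleaving you see may vary with the buffer size and PowerShell version. Both script blocks run on the same thread, so however the entries are grouped, nothing is ever produced and consumed at the same moment.

```powershell
# Illustrative only: record the order in which objects are produced and consumed.
$log = New-Object System.Collections.ArrayList

1..6 |
    # -OutBuffer is a common parameter; it asks PowerShell to hold objects
    # back before handing them to the next command in the pipeline.
    ForEach-Object { [void]$log.Add("produced $_"); $_ } -OutBuffer 2 |
    ForEach-Object { [void]$log.Add("consumed $_") }

# Show the recorded order.
$log
```

Running it again without -OutBuffer and comparing the two logs shows how the buffer changes the grouping, without changing the fact that the work is sequential.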
Even the venerable CMD.exe will put the two commands in tree | more into separate processes.
This aside, what about simply running an arbitrary command (or scriptblock) in the background whilst doing something else?
PowerShell doesn’t have any obvious way to do this, but it is possible with runspaces. You can create a new runspace and then invoke a pipeline asynchronously.
Here is an example script, but let’s go through it step by step.
$rscfg = [management.automation.runspaces.runspaceconfiguration]::Create()
$rs = [management.automation.runspaces.runspacefactory]::CreateRunspace($rscfg)
$rs.Open()
Here, we create a runspace with a default configuration. There is an overloaded version of CreateRunspace that takes a PSHost object, so we could pass in $host. In this case we just use the default. We then open the runspace.
$block = {
    $sum = 0
    "adding numbers in background!"
    foreach ($i in $input) {
        $sum += $i
        start-sleep 1   # pretend it takes ages to compute this
        "adding number $i"
    }
    "that's it, sum is $sum"
}
$pipe = $rs.CreatePipeline($block)
Here we create the scriptblock $block that we want to run in the background and create a pipeline $pipe for it. The block itself simply takes its time adding up the numbers passed to it, yielding textual information to the pipe and ending with the result.
$writer = $pipe.Input
$pipe.InvokeAsync()
Here we simply grab hold of the pipe’s input then invoke it asynchronously.
$numwritten = $writer.Write(1..5,$true)
$writer.Close()
This is where we give the background pipe some data to play with. Write(1..5,$true) means write the numbers 1 through 5 into the pipeline ($true means send the numbers one at a time instead of all at once).
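To see what that second argument changes, here is a minimal sketch that sends the same range both ways and reports the count returned by Write alongside the number of objects the background script says it received. Send-Numbers is a hypothetical helper of mine, not part of PowerShell, and it uses the parameterless CreateRunspace() overload rather than the explicit configuration shown earlier.

```powershell
# Sketch only: Send-Numbers is a hypothetical helper for this illustration.
function Send-Numbers([bool]$enumerate) {
    $rs = [System.Management.Automation.Runspaces.RunspaceFactory]::CreateRunspace()
    $rs.Open()
    try {
        # The background script simply counts the objects it was given.
        $pipe = $rs.CreatePipeline('@($input).Count')
        $pipe.InvokeAsync()

        # Write returns the number of objects it put into the pipeline.
        $written = $pipe.Input.Write(1..5, $enumerate)
        $pipe.Input.Close()

        # ReadToEnd waits for the background pipeline to finish.
        $received = $pipe.Output.ReadToEnd()
        "enumerate=$enumerate written=$written received=$received"

        $pipe.Dispose()
    }
    finally {
        $rs.Close()
    }
}

Send-Numbers $true    # numbers sent one at a time
Send-Numbers $false   # the whole array sent as a single object
```

Comparing the two lines of output shows whether the background block sees five separate numbers or one array, which matters here because the example block adds up each item it receives.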
1..3 | foreach { start-sleep 1; "busy in foreground $_" }
The whole purpose of this example is to show how two different things can be done at once, so here we simply spend 3 seconds pretending to be busy 🙂
$reader = $pipe.Output
while (-not $reader.EndOfPipeline) {
    $o = $reader.Read()
    write-host "$o"
}
Here we get the output $reader from the pipeline and, whilst there is data, read an object and display it.
We didn’t have to do it this way; we could have read the whole thing in one go like this: $out = $reader.ReadToEnd(). The latter might be better if the background work produces a single result, but the line-by-line version may suit something with a lot of output. Although we simply do a write-host "$o" here, this could be a lengthy operation that would run whilst the background operation continues to do work.
$pipe.Error.ReadToEnd()
$pipe.Dispose()
$rs.Close()
Finally we consume any error output and clean up.
If you run the example script, you will see something like this:
busy in foreground 1
busy in foreground 2
busy in foreground 3
adding numbers in background!
adding number 1
adding number 2
adding number 3
adding number 4
adding number 5
that's it, sum is 15
This is great, but why is this post categorised as a rant? Well, it’s very long-winded and in my mind, async job scheduling is something that all shells should do.
It’s a pity PowerShell doesn’t let you do something like this:
{ foo | bar ; wibble}.InvokeAsync()
or
invoke-command -async { foo | bar ; wibble }
You can do this, however:
{ foo | bar ; wibble}.Invoke()
So how would all this be used? Would it make space invaders any quicker on a Core Duo PC? Only a profiler would know.
A single runspace can only have one asynchronous pipeline running at a time, and a pipeline cannot be reused once it has been invoked. The setup costs are quite high, so you’d probably have to mess about with a single background pipeline that reads work requests and writes the results back.