Running pipelines in the background



Posted on January 22, 2007, in Cool, PowerShell, Rant

In the unix world, we’re used to being able to run the different commands in a pipeline in different threads/processes. This is usually a lot more efficient when the producer/consumer rates vary, and makes use of multiple cores/cpus.

For example, cmd1 | cmd2 will run the two commands in separate processes.

PowerShell doesn’t do this. Everything runs in one thread, and pipelined objects are processed in batches, one batch at a time. You can control the batch size with the -OutBuffer common parameter, but things are still done sequentially, with one command in the pipe being busy whilst all the others sit idle.
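You can see the sequential hand-off for yourself. This little snippet is just an illustration (the write-host messages go straight to the host, so they show when each block actually runs):

```powershell
# The producer announces each number as it emits it; the consumer does
# the same. With -OutBuffer 2 the producer's output is accumulated and
# handed downstream in small batches - but it is all one thread, so the
# messages appear in groups rather than fully interleaved.
1..4 | foreach -OutBuffer 2 { write-host "produced $_"; $_ } |
       foreach { write-host "consumed $_" }
```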

Even the venerable CMD.exe will put these two commands tree | more into separate processes.

This aside, what about simply running an arbitrary command (or scriptblock) in the background whilst doing something else?

PowerShell doesn’t have any obvious way to do this, but it is possible with runspaces. You can create a new runspace and then invoke a pipeline asynchronously.

Here is an example script, but let’s go through it step by step.

$rscfg = [management.automation.runspaces.runspaceconfiguration]::Create()
$rs = [management.automation.runspaces.runspacefactory]::CreateRunspace($rscfg)
$rs.Open()

Here, we create a runspace with a default configuration. There is an overloaded version of CreateRunspace that takes a PSHost object, so we could pass in $host. In this case we just use the default. We then open the runspace.

$block = {
  "adding numbers in background!"
  $sum = 0
  foreach($i in $input) {
    start-sleep 1 # pretend it takes ages to compute this
    $sum += $i
    "adding number $i"
  }
  "that's it, sum is $sum"
}
$pipe = $rs.CreatePipeline($block)

Here we create the scriptblock $block that we want to run in the background and create a pipeline $pipe for it. The block itself simply takes its time adding up numbers passed to it, yielding textual information to the pipe, ending with the result.

$writer = $pipe.Input
$pipe.InvokeAsync()

Here we simply grab hold of the pipe’s input then invoke it asynchronously.

$numwritten = $writer.Write(1..5,$true)
$writer.Close()

This is where we give the background pipe some data to play with. Write(1..5,$true) means write the numbers 1 through 5 into the pipeline ($true means enumerate the collection, sending the numbers one at a time instead of as a single array). We then close the writer so the background pipeline knows no more input is coming.

1..3 | foreach { start-sleep 1; "busy in foreground $_" }

The whole purpose of this example is to show how two different things can be done at once, so here we simply spend 3 seconds pretending to be busy :-)

$reader = $pipe.Output
while (-not $reader.EndOfPipeline) {
  $o = $reader.Read()
  write-host "$o"
}

Here we get the output $reader from the pipeline and, whilst there is data, read an object and display it.

We didn’t have to do it this way; we could have read the whole thing in one go with $out = $reader.ReadToEnd(). The latter might be better if the background work produces a single result, whereas the line-by-line version suits something with a lot of output. Although we simply do a write-host "$o" here, this could be a lengthy operation that would run whilst the background operation continues to do work.


Finally we consume any error output and clean up.
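That cleanup might look something like this (a sketch using the $pipe and $rs variables from above):

```powershell
# Drain anything the background pipeline wrote to its error stream,
# then dispose of the pipeline and close the runspace.
$pipe.Error.ReadToEnd() | foreach { write-host "error: $_" }
$pipe.Dispose()
$rs.Close()
```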

If you run the example script, you will see something like this:

busy in foreground 1
busy in foreground 2
busy in foreground 3
adding numbers in background!
adding number 1
adding number 2
adding number 3
adding number 4
adding number 5
that's it, sum is 15

This is great, but why is this post categorised as a rant? Well, it’s very long-winded, and to my mind async job scheduling is something that all shells should do.

It’s a pity PowerShell doesn’t let you do something like this:

{ foo | bar ; wibble }.InvokeAsync()

or

invoke-command -async { foo | bar ; wibble }

You can do this, however:

{ foo | bar ; wibble}.Invoke()

So how would all this be used? Would it make space invaders any quicker on a core duo pc? Only a profiler would know.

A single runspace can only have one asynchronous pipeline running and the pipeline cannot be reused. The setup costs are quite high, so you’d probably have to mess about with a single background pipeline reading work requests and writing them back.
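If you find yourself doing this often, the boilerplate can at least be wrapped up. Here is a sketch of a hypothetical helper (the name Invoke-Background and its shape are my own invention, not anything built in):

```powershell
# A hypothetical convenience wrapper: run a scriptblock asynchronously
# in a fresh runspace, feed it $InputData, and return the pipeline so
# the caller can read its Output later and Dispose of it when done.
function Invoke-Background($Block, $InputData) {
  $rscfg = [management.automation.runspaces.runspaceconfiguration]::Create()
  $rs = [management.automation.runspaces.runspacefactory]::CreateRunspace($rscfg)
  $rs.Open()
  $pipe = $rs.CreatePipeline($Block)
  $pipe.InvokeAsync()
  $pipe.Input.Write($InputData, $true)
  $pipe.Input.Close()
  $pipe  # caller reads $pipe.Output, then calls $pipe.Dispose()
}
```

A caller could then do $p = Invoke-Background { $input | foreach { $_ * 2 } } (1..5) and later drain $p.Output.ReadToEnd(). Note that the runspace itself still needs closing; a fuller version would track it and close it too.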
