Text and path gotchas

Posted on January 9, 2007, in Rant, with 1 comment

PowerShell is now my default Windows shell, but every now and then little things crop up that make me want to go back to CMD.exe or, better, Cygwin/zsh.

Someone gave me an example of filtering an ASCII text file, stripping out the first 3 characters of each line.

On a Mac/UNIX box, you’d simply do:

cut -c4- infile.txt > outfile.txt

With PowerShell, you do:

gc infile.txt | % { $_.remove(0,3) } | sc outfile.txt

As an aside, on my machine the PowerShell version takes 80 times longer than the cut version to process an 8 MB text file, but we’ll skip over that 🙂 So, to the gotchas.

Gotcha 1 – text gets recoded

You might be tempted to do this instead:

gc infile.txt | % { $_.remove(0,3) } > outfile.txt

If you did, the resulting outfile.txt would be UTF-16, and so roughly double the size of the original (less the 3 characters cut from each line).

You might not notice this at first, as PowerShell handles this sort of thing transparently, but at some point something else might choke on the file. You can specify the output encoding explicitly, but it’s not something you would necessarily think to do.
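
For reference, here is a minimal sketch of being explicit about the encoding. It assumes the input really is plain ASCII, as in the example above, and that the -Encoding parameter of Out-File and Set-Content is available in your version of PowerShell. The > redirect is, as far as I know, equivalent to piping into Out-File with its default encoding, which is why the first form is the closest replacement for it.

```powershell
# Same pipeline as before, but with the output encoding named explicitly
# instead of relying on the default used by the > redirect.
gc infile.txt | % { $_.Remove(0,3) } | Out-File -Encoding ascii outfile.txt

# Set-Content (sc) also accepts -Encoding for file-system paths,
# if you would rather spell it out there as well.
gc infile.txt | % { $_.Remove(0,3) } | sc -Encoding ascii outfile.txt
```

Either form should give an output file about the same size as the input, less the 3 characters removed from each line.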

Gotcha 2 – path weirdness

My working area is: c:\Documents and Settings\adrian\My Documents\proj\top. That’s quite a lot to deal with, so I usually make it shorter by doing this:

new-psdrive -name doc: -psprovider filesystem -root "c:\Documents and Settings\adrian\My Documents"

so my working directory is actually doc:\proj\top. That is much shorter. However, when I run the first command above, I get this:

PS doc:\proj\top> gc infile.txt | % { $_.remove(0,3) } | sc outfile.txt
Set-Content : The given path's format is not supported.
At line:1 char:42
+ gc infile.txt | % { $_.remove(0,3) } | sc  <<<< outfile.txt
The pipeline has been stopped.
At line:1 char:30
+ gc 100.txt | % { $_.remove( <<<< 0,3) } | sc outfile.txt
Set-Content : The given path's format is not supported.
At line:1 char:42

Odd that gc didn’t mind, but sc did.

You can’t even use resolve-path to fix it, because if outfile.txt doesn’t already exist, you get an error… maybe you need to use new-item -type file or something first?
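
One possible way around this, sketched here as an untested assumption rather than a confirmed fix, is to skip resolve-path and build the output path from the real file-system location behind the current directory. The variable names $here and $out are made up for illustration. The ProviderPath property of the object returned by Get-Location should hold the underlying c:\... path, and Join-Path does not require the target file to exist.

```powershell
# Hypothetical workaround: hand Set-Content a plain file-system path
# instead of a path relative to the doc: drive.
$here = (Get-Location).ProviderPath      # underlying file-system path of the current directory
$out  = Join-Path $here 'outfile.txt'    # outfile.txt does not need to exist yet

gc infile.txt | % { $_.Remove(0,3) } | sc $out
```

Whether this avoids the "path's format is not supported" error on the setup described above is an assumption. It only sidesteps the custom drive for the output file.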

To be honest, I gave up. PowerShell should really be making things easier, not throwing up barriers at every turn.

One Comment so far:

  1. jsnover says:

    Please file a feature on this. Seems like we need a Cmdlet like Remove-String that takes either a FILENAME or a stream of strings and removed.
    Instructions for how to do this can be found at: http://blogs.msdn.com/powershell/archive/2006/05/09/filing-bugs.aspx .

    Apologies for the inconvenience.

    Jeffrey Snover [MSFT]
    Windows PowerShell/MMC Architect
    Visit the Windows PowerShell Team blog at: http://blogs.msdn.com/PowerShell
    Visit the Windows PowerShell ScriptCenter at: http://www.microsoft.com/technet/scriptcenter/hubs/msh.mspx