Text and path gotchas
PowerShell is now my default Windows shell, but every now and then little things crop up that make me want to go back to CMD.exe, or, better still, Cygwin/zsh.
Someone gave me an example of filtering an ASCII text file, stripping out the first 3 characters of each line.
On a Mac/UNIX box, you’d simply do:
cut -c4- infile.txt > outfile.txt
With PowerShell, you do:
gc infile.txt | % { $_.remove(0,3) } | sc outfile.txt
As an aside, on my machine the latter takes 80 times longer than the former to do the job on an 8MB text file, but we’ll skip over that 🙂 So, on to the gotchas.
Gotcha 1 – text gets recoded
You might be tempted to do this instead:
gc infile.txt | % { $_.remove(0,3) } > outfile.txt
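If you did, the resulting outfile.txt would be UTF-16, because > sends its output through Out-File, which defaults to Unicode (UTF-16) encoding. The file would therefore be double the size of the original (less the 3 characters cut from each line).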
You might not notice this at first, as PowerShell handles this sort of thing transparently, but at some point another tool might choke on it. You can specify the output encoding explicitly, but that’s not something you would necessarily think to do.
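A minimal sketch of the explicit-encoding route, assuming Windows PowerShell, where > behaves like Out-File with its default Unicode encoding. Here the pipeline ends in Out-File -Encoding ascii instead. The names infile.txt and outfile.txt are placeholders, and the sample input is made up so the snippet runs on its own. It also swaps $_.remove(0,3) for a length check plus Substring(3): String.Remove throws on lines shorter than 3 characters, whereas cut -c4- just prints an empty line.

```powershell
# Placeholder sample input so the sketch is self-contained (hypothetical content).
Set-Content -Path infile.txt -Value 'abcHello', 'xyzWorld', 'ab' -Encoding ascii

# Strip the first 3 characters of each line, like `cut -c4-`.
# Lines of 3 characters or fewer become empty instead of making String.Remove(0,3) throw.
Get-Content infile.txt |
    ForEach-Object { if ($_.Length -gt 3) { $_.Substring(3) } else { '' } } |
    Out-File -FilePath outfile.txt -Encoding ascii   # explicit encoding instead of the default used by `>`
```

In PowerShell 6 and later, > and Out-File default to UTF-8 without a BOM, so the doubled file size is mainly a Windows PowerShell surprise.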
Gotcha 2 – path weirdness
My working area is c:\Documents and Settings\adrian\My Documents\proj\top. That’s quite a lot to deal with, so I usually shorten it by doing this:
new-psdrive -name doc: -psprovider filesystem -root "c:\Documents and Settings\adrian\My Documents"
so my working directory is actually doc:\proj\top. Much shorter. However, when I run the first command above, I get this:
PS doc:\proj\top> gc infile.txt | % { $_.remove(0,3) } | sc outfile.txt
Set-Content : The given path's format is not supported.
At line:1 char:42
+ gc infile.txt | % { $_.remove(0,3) } | sc <<<< outfile.txt
The pipeline has been stopped.
At line:1 char:30
+ gc 100.txt | % { $_.remove( <<<< 0,3) } | sc outfile.txt
Set-Content : The given path's format is not supported.
At line:1 char:42
Odd that gc didn’t mind, but sc did.
You can’t even use resolve-path to fix it, because if outfile.txt didn’t already exist, you’d get an error… maybe you need to use new-item -type file or something first?
To be honest, I gave up. PowerShell should really be making things easier, not throwing up barriers at every turn.
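If you hit the same error, one possible workaround is to convert the PowerShell path into a plain file-system path before handing it to sc. This assumes the problem is that sc passes the drive-qualified path through to .NET unchanged. $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath does that conversion without requiring the file to exist, which sidesteps the resolve-path problem. The sketch maps a doc drive onto the current folder purely for demonstration. The drive root, sample input and file names are placeholders.

```powershell
# Demo setup (placeholder): map a `doc` drive onto the current folder and switch to it.
New-PSDrive -Name doc -PSProvider FileSystem -Root (Get-Location).ProviderPath | Out-Null
Set-Location doc:\
Set-Content -Path infile.txt -Value 'abcHello', 'xyzWorld' -Encoding ascii

# Translate the drive-relative PowerShell path into a real file-system path.
# Unlike Resolve-Path, this does not require outfile.txt to exist yet.
$outPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath('outfile.txt')

# Same line-trimming pipeline, now writing to the plain file-system path.
Get-Content infile.txt |
    ForEach-Object { if ($_.Length -gt 3) { $_.Substring(3) } else { '' } } |
    Set-Content -LiteralPath $outPath
```

-LiteralPath is used so that folder names containing wildcard characters such as [ or ] are not misinterpreted.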
Please file a feature request for this. It seems like we need a cmdlet like Remove-String that takes either a FILENAME or a stream of strings and removes the specified characters.
Instructions for how to do this can be found at: http://blogs.msdn.com/powershell/archive/2006/05/09/filing-bugs.aspx .
Apologies for the inconvenience.
Jeffrey Snover [MSFT]
Windows PowerShell/MMC Architect
Visit the Windows PowerShell Team blog at: http://blogs.msdn.com/PowerShell
Visit the Windows PowerShell ScriptCenter at: http://www.microsoft.com/technet/scriptcenter/hubs/msh.mspx