2015/05/21

PowerShell - Remove Diacritics (Accents) from a string

In the last few days, I have been working on a Onboarding automation process that need to handle both French and English and one of the step needed to remove the accents (also knows as Diacritics) from some strings passed by the users.

The following approach uses String.Normalize to split the input string into constituent glyphs (basically separating the "base" characters from the diacritics) and then scans the result and retains only the base characters. It's just a little complicated, but really you're looking at a complicated problem...

UPDATE: Thanks to Marcin Krzanowicz who provided another solution, see the Method 2 below. His version works with Polish characters too where the method 1 doesn't.

UPDATE (2016/10/10): Added an example to replace diacritics in multiple files



Method 1

The following code is available on my PowerShell GitHub repository.

function Remove-StringDiacritic
{
<#
    .SYNOPSIS
        This function will remove the diacritics (accents) characters from a string.
        
    .DESCRIPTION
        This function will remove the diacritics (accents) characters from a string.
    
    .PARAMETER String
        Specifies the String on which the diacritics need to be removed
    
    .PARAMETER NormalizationForm
        Specifies the normalization form to use
        https://msdn.microsoft.com/en-us/library/system.text.normalizationform(v=vs.110).aspx
    
    .EXAMPLE
        PS C:\> Remove-StringDiacritic "L'été de Raphaël"
        
        L'ete de Raphael
    
    .NOTES
        Francois-Xavier Cat
        @lazywinadm
        www.lazywinadmin.com
#>
    
    param
    (
        [ValidateNotNullOrEmpty()]
        [Alias('Text')]
        [System.String]$String,
        [System.Text.NormalizationForm]$NormalizationForm = "FormD"
    )
    
    BEGIN
    {
        $Normalized = $String.Normalize($NormalizationForm)
        $NewString = New-Object -TypeName System.Text.StringBuilder
        
    }
    PROCESS
    {
        $normalized.ToCharArray() | ForEach-Object -Process {
            if ([Globalization.CharUnicodeInfo]::GetUnicodeCategory($psitem) -ne [Globalization.UnicodeCategory]::NonSpacingMark)
            {
                [void]$NewString.Append($psitem)
            }
        }
    }
    END
    {
        Write-Output $($NewString -as [string])
    }
}


This code is available on my GitHub repository.


Method 2 (From Marcin Krzanowicz)



function Remove-StringLatinCharacters
{
    PARAM ([string]$String)
    [Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding("Cyrillic").GetBytes($String))
}



Extra: Remove Diacritics from multiple files


If you want to take this to the next level and remove diacritics from multiple files, you could do something like this:


# Modify the function to make it compatible with the pipeline
function Remove-StringLatinCharacters
{
    PARAM (
        [parameter(ValueFromPipeline = $true)]
        [string]$String
    )
    PROCESS
    {
        [Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding("Cyrillic").GetBytes($String))
    }
}


# Exemple with multiple Text files located in the directory c:\test\
Foreach ($file in (Get-ChildItem c:\test\*.txt))
{
    # Get the content of the current file and remove the diacritics
    $NewContent = Get-content $file | Remove-StringLatinCharacters
    
    # Overwrite the current file with the new content
    $NewContent | Set-Content $file
}



Thanks for reading! If you have any questions, leave a comment or send me an email at [email protected] I invite you to follow me on Twitter @lazywinadm / Google+ / LinkedIn. You can also follow the LazyWinAdmin Blog on Facebook Page and Google+ Page.

2015/05/18

PowerShell - Report Expiring User accounts

In the video game industry it is common practice to hire consultants to take care of the Quality Assurance, which consists of a means of the software engineering processes and methods used to ensure quality. Those people are most likely Testers and usually spend most of their day testing games in development to find bugs.

The problem is, once in a while managers forget to update the expiration dates of their Consultant/External Partners even if they got a couple of reminders, and since we have some automation process taking care of the off-boarding (thanks to PowerShell! ;-)...it is becoming fun when those guys can't connect to their accounts on Monday morning...and they lost all their access.


So I wrote a tiny script to report any expiring user accounts and send it to the IT department every Monday morning, just to give us a heads up.

Report Example

2015/05/04

Microsoft Ignite 2015 in Chicago

This week I will be in Chicago, Illinois to attend the Microsoft Ignite 2015. The last time I was in Chicago was for the Marathon in 2012. I kept a great souvenir of the city and people were super nice!

This time I'm going to attend the Microsoft Ignite all week but first I will meet my coworkers at the NetherRealm video games studio (where they work on Mortal Kombat) and visit their workplace.



If you want to meet me, I'm planning to attend sessions related to Nano servers, Docker/Containers, Windows Server/System Center and anything interesting on Automation.

I will also be at the Scripting Guys and Sapien Booth:

  • Scripting Guys Booth - Tuesday May 5 - 10:30am
  • Sapien Technologies Booth #355 - Wednesday May 6 - 3:00pm



Willis Tower



Running the Chicago's Marathon in 2012
Marathon is a 26 miles/42 km race, I finished in 3 hours and 20 minutes! :-)


Thanks for reading! If you have any questions, leave a comment or send me an email at [email protected] I invite you to follow me on Twitter @lazywinadm / Google+ / LinkedIn. You can also follow the LazyWinAdmin Blog on Facebook Page and Google+ Page.