Click on the banner to learn about and purchase my database training on Azure

SQL Server - How to Convert an RTF String to Text (Remove RTF Tags) Using CLR (C #) or Powershell

Views: 10.553 views
Reading Time: 6 minutes

Hello people,
Alright?

In this article, I will show you how to convert an RTF string to text (Remove RTF tags) using CLR (C #) or Powershell, which was a necessity I had last week, where a system would write the information to a table and the data would be Rich Text Format (RTF). I searched the internet a lot to find solutions that would help me convert RTF to text straight from the database and I didn't find many alternatives that I liked and that motivated me to write this post for you.

If you do not know the CLR or want to know more about it, understand how it works, its limitations, advantages, disadvantages and how to create and publish a new CLR project, read the post Introduction to SQL Common Language Runtime (CLR) in SQL Server.

The trivial solution with C #

If you do this search on Google, probably the most common solution you'll find is this:

Which is a really simple solution and it really works. However, I would like to use this solution for the database and so using the System.Windows.Forms library is not a very interesting idea, since this library is not loaded by default by SQL Server and so I would have to Import its assembly and its dependencies (which are not few) manually into the database, as shown below:

This would increase the complexity of your CLR (as well as publications) and can have negative effects on the stability of your instance, so I do not recommend this practice.

The C # Solution Using Helper RichTextStripper

After much searching the internet for similar solutions, I found a class called RichTextStripper, written in C # and developed by Chris Benard, which does just that. He found Python code that does this work and converted the code to C #.

Function source code in CLR

This is the function that will be generated in the database and you will use it to remove RTF formatting from strings and columns in SQL Server after your CLR is published and using the RichTextStripper class.

Function usage demo

As you can see below, the result of the function is the same as when you open the document (or string) in Microsoft Word.

Using the CLR function to remove RDF tags from string and return text

Viewing RTF Content in Word

The solution with Powershell

If you don't want to use CLR (C #) to accomplish this task, don't worry. You can also do this using Powershell (Posh) and you can integrate it with SQL Server in various ways by using xp_cmdshell and exporting the result to text or capturing the powershell return.

The Powershell script is very simple. Remember that simple C # script I posted at the beginning of the post that I didn't want to implement in CLR because of Windows.Forms dependencies? So .. In powershell this is no problem .. 🙂

Powershell Source Code:

Powershell Execution Result

Example 1:
With this you can use xp_cmdshell to run your Powershell as needed:

Example Using PowerShell -File and Entering the Path of a File I Previously Created with the Posh Commands Demonstrated Above

2 Example

Result:

Example Using PowerShell -Command and Entering Parameters Manually

That's it folks!
I hope you enjoyed the post and see you next time.

sql server clr converter convert remove remove rdf rich text format tags strings for plain text plain text C # csharp powershell posh ps

sql server clr converter convert remove remove rdf rich text format tags strings for plain text plain text C # csharp powershell posh ps