Hey guys!
All right with you? I hope so, huh!
In this post, I will demonstrate to you how to convert an HTML string to text (Remove HTML tags) using CLR (C #). If you are new to the blog or have never heard of CLR or don't know how to create your first project using this powerful SQL Server tool, which lets you create C # or VB.NET code and run it through the database, give it a try. list in post Introduction to SQL Common Language Runtime (CLR) in SQL Server.
In 2014, I made the post Removing HTML Tags from a String in SQL Server and therefore, you must be asking yourself: “Dirceu, if you have already made a post about it, why do another one using the CLR?” and the answer to that is very simple: Much simpler code and PERFORMANCE! As I explained in the post SQL Server - Performance Comparison between Scalar Function and CLR Scalar Function, CLR functions generally deliver much better performance than UDF T-SQL functions, up to 200 times faster.
Code used for record creation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | IF (OBJECT_ID('dirceuresende.dbo.Teste_HTML') IS NOT NULL) DROP TABLE dirceuresende.dbo.Teste_HTML CREATE TABLE dirceuresende.dbo.Teste_HTML ( texto VARCHAR(MAX) ) GO INSERT INTO dirceuresende.dbo.Teste_HTML SELECT '<!DOCTYPE html> <html> <body> <h1>My First Heading</h1> <p>My first paragraph.</p> </body> </html>' AS texto FROM sys.objects GO 40 |
To demonstrate the difference in performance of the two functions, I populated a table with only 6.000 records, all with the same string containing basic HTML code. And the larger the data volume, the greater the performance difference between the CLR and UDF T-SQL functions. The result you can check below:
Function Source Code:
To remove HTML tags, I use the HtmlDecode method of the WebUtility class, which belongs to the System.Net library. This method is available only from the .NET Framework 4.0 and therefore can only be used on SQL Server 2012 or later (SQL Server versions 2005 and 2008 use the .NET Framework 3.5)
The parameter Fl_Quebra_Linha serves to replace the tag (and its variants) for a line break in the text. If you enter a value of 0 (false) in this parameter, line breaks will be replaced by an empty string.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | using System; using System.Net; using System.Text.RegularExpressions; public partial class UserDefinedFunctions { [Microsoft.SqlServer.Server.SqlFunction] public static string fncRemove_Html_String(string Ds_String_HTML, bool Fl_Quebra_Linha) { if (string.IsNullOrEmpty(Ds_String_HTML)) return null; var html = Ds_String_HTML; html = WebUtility.HtmlDecode(html); if (Fl_Quebra_Linha) { html = Regex.Replace(html, "<br/>", Environment.NewLine, RegexOptions.IgnoreCase); html = Regex.Replace(html, "<br />", Environment.NewLine, RegexOptions.IgnoreCase); html = Regex.Replace(html, "<br>", Environment.NewLine, RegexOptions.IgnoreCase); } html = Regex.Replace(html, " ", " ", RegexOptions.IgnoreCase); html = Regex.Replace(html, "<.*?>", string.Empty); return html; } } |
That's it folks!
I hope you enjoyed the post and see you next time.
sql server converter convert string text html remove remove html tags for text plain text
sql server converter convert string text html remove remove html tags for text plain text