Staff,
Good afternoon!
Today I will show you how to remove HTML tags from a string in SQL Server using only T-SQL. This function was very useful for me because there was a need to include a column in a report that was exported to XLS (Excel), but this column was the HTML description of the system-generated calls and in Excel that lot of HTML tags. It was really bad to read .. That's when I developed this function and introduce you.
FncRemove_HTML function source code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 | CREATE FUNCTION [dbo].[fncRemove_HTML] (@HTMLText VARCHAR(MAX)) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @Start INT DECLARE @End INT DECLARE @Length INT -- Substitui a entidade HTML "&" pelo caracter '&' SET @Start = CHARINDEX('&', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '&') SET @Start = CHARINDEX('&', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 END -- Substitui a entidade HTML "<" pelo caracter '<' SET @Start = CHARINDEX('<', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '<') SET @Start = CHARINDEX('<', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 END -- Substitui a entidade HTML ">" pelo caracter '>' SET @Start = CHARINDEX('>', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '>') SET @Start = CHARINDEX('>', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 END -- Substitui a entidade HTML "&&" pelo caracter '&' SET @Start = CHARINDEX('&amp;', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '&') SET @Start = CHARINDEX('&amp;', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 END -- Substitui a entidade HTML " " pelo caracter ' ' SET @Start = CHARINDEX(' ', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, ' ') SET @Start = CHARINDEX(' ', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 END -- Substitui a tag <br> pela sequência de nova linha (CHR(13) + CHR(10)) SET @Start = CHARINDEX('<br>', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, CHAR(13) + CHAR(10)) SET @Start = CHARINDEX('<br>', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 END -- Substitui a tag <br/> pela sequência de nova linha (CHR(13) + CHR(10)) SET @Start = CHARINDEX('<br/>', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, 'CHAR(13) + CHAR(10)') SET @Start = CHARINDEX('<br/>', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 END -- Substitui a tag <br /> pela sequência de nova linha (CHR(13) + CHR(10)) SET @Start = CHARINDEX('<br />', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, 'CHAR(13) + CHAR(10)') SET @Start = CHARINDEX('<br />', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 END -- Remove os parâmetros contidos nas tags HTML SET @Start = CHARINDEX('<', @HTMLText) SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText)) SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '') SET @Start = CHARINDEX('<', @HTMLText) SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText)) SET @Length = (@End - @Start) + 1 END RETURN LTRIM(RTRIM(@HTMLText)) END |
FncRemove_Formatacao_Html function source code
Another way to remove HTML tags and formatting from strings is to use the fncRemove_Formatacao_Html function, which I demonstrate below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 | CREATE FUNCTION [dbo].[fncRemove_Formatacao_Html] ( @pe_sTexto VARCHAR(MAX) ) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @v_sTexto VARCHAR(MAX), @v_iPosicaoInicio INT = 0, @v_iPosicaoFim INT = 0, @v_iTotalPosicoes INT = 0 SET @v_sTexto = @pe_sTexto SET @v_sTexto = REPLACE(@v_sTexto, '<BR><BR><BR>', '<BR>') SET @v_sTexto = REPLACE(@v_sTexto, '<BR><BR>', '<BR>') SET @v_sTexto = REPLACE(@v_sTexto, '<BR>', '. ') WHILE (1 = 1) BEGIN SET @v_iPosicaoInicio = PATINDEX('%<%', @v_sTexto) SET @v_iPosicaoFim = ISNULL(NULLIF(PATINDEX('%>%', @v_sTexto), 0), LEN(@v_sTexto)) SET @v_iTotalPosicoes = ((@v_iPosicaoFim - @v_iPosicaoInicio) + 1) IF (@v_iPosicaoFim<@v_iPosicaoInicio) SET @v_sTexto = SUBSTRING(@v_sTexto, 1, @v_iPosicaoFim-1)+SUBSTRING(@v_sTexto, @v_iPosicaoFim+1, LEN(@v_sTexto)) ELSE IF (@v_iPosicaoInicio <> 0) SET @v_sTexto = REPLACE(@v_sTexto, SUBSTRING(@v_sTexto, @v_iPosicaoInicio, @v_iTotalPosicoes), '') ELSE BREAK; SET @v_iPosicaoInicio = 0 SET @v_iPosicaoFim = 0 SET @v_iTotalPosicoes = 0 END SET @v_sTexto = REPLACE(@v_sTexto, '..', '. ') SET @v_sTexto = REPLACE(@v_sTexto, ',.', ', ') SET @v_sTexto = REPLACE(@v_sTexto, '?.', '? ') SET @v_sTexto = REPLACE(@v_sTexto, '!.', '! ') SET @v_sTexto = REPLACE(@v_sTexto, ',.', ', ') SET @v_sTexto = REPLACE(@v_sTexto, '.. ', '. ') SET @v_sTexto = REPLACE(@v_sTexto, ',. ', ', ') SET @v_sTexto = REPLACE(@v_sTexto, '?. ', '? ') SET @v_sTexto = REPLACE(@v_sTexto, '!. ', '! ') SET @v_sTexto = REPLACE(@v_sTexto, ',. ', ', ') SET @v_sTexto = REPLACE(@v_sTexto, '. ', '. ') SET @v_sTexto = REPLACE(@v_sTexto, ', ', ', ') SET @v_sTexto = REPLACE(@v_sTexto, '? ', '? ') SET @v_sTexto = REPLACE(@v_sTexto, '! ', '! ') SET @v_sTexto = REPLACE(@v_sTexto, ' . ', '. ') SET @v_sTexto = REPLACE(@v_sTexto, ' ', '') SET @v_sTexto = RTRIM(@v_sTexto) SET @v_sTexto = LTRIM(@v_sTexto) RETURN @v_sTexto END |
If you want to use this same function, but with the source code in C # and using CLR, for a much better performance, see the post SQL Server - How to Convert an HTML String to Text (Remove HTML Tags) Using CLR (C #).
A big hug, see you later!