Speak guys!
In this blog post, I'd like to share with you a solution that allows you to analyze and generate Whatsapp group conversation statistics using SQL Server and Power BI. Unlike Telegram, which has an integration API and task automation bots, Whatsapp is completely late closed and does not allow any kind of legal integration into the platform, which makes the task of managing groups much more expensive for administrators, especially in large groups.
Wouldn't it be nice to have some statistics of who are the people who talk most in a group, which days and times people are most interacting with, and who are the users who never talk at all? While in Telegram there are bots (Combot, for example) that do this for you, in Whatsapp there is no way to generate this kind of statistics. I mean, there wasn't!
How to export group conversations
Since there is no integration API in Whatsapp, there is no way for you to access conversations directly from the Whatsapp platform. However, one way to access conversation data is by exporting conversations from your mobile phone (Whatsapp Web does not have this feature) and saving to your server.
A handy way to do this is to save it to your Google Drive / Dropbox / Email from your phone and download the file to your server.
How to import group conversations
View contentWhen you open the conversation zip file, you will see that there are several contact (.vcf) files and you will have a .txt file, which we are looking to import:
When you open the file you will see that its encoding is UTF-8 (Codepage 65001) and the line wrapping format is Unix (Linefeed or \ n). This will be very important to import the file correctly.
Now we need import this file into the database. Choose the method that suits you best and move on to the next step.
BULK INSERT (Did not identify the line break)
1 2 3 4 5 6 7 8 9 10 11 12 13 |
IF (OBJECT_ID('tempdb..#Teste') IS NOT NULL) DROP TABLE #Teste CREATE TABLE #Teste ( Linha VARCHAR(MAX) ) BULK INSERT #Teste FROM 'C:\Temporario\Conversas.txt' WITH( ROWTERMINATOR = '\n', -- Unix CODEPAGE = '65001' -- UTF-8 ) SELECT * FROM #Teste |
OPENROWSET (Did not identify line break and encoding)
1 2 3 4 5 6 7 8 9 10 |
IF (OBJECT_ID('tempdb..#Teste') IS NOT NULL) DROP TABLE #Teste CREATE TABLE #Teste ( Linha VARCHAR(MAX) ) INSERT INTO #Teste SELECT * FROM OPENROWSET(BULK 'C:\Temporario\Conversas.txt', CODEPAGE='65001', SINGLE_CLOB) AS Arquivo(linhas) SELECT * FROM #Teste |
BCP with xp_cmdshell (no line break and no encoding)
1 2 3 4 5 6 7 8 9 |
IF (OBJECT_ID('dirceuresende.dbo.Teste') IS NOT NULL) DROP TABLE dirceuresende.dbo.Teste CREATE TABLE dirceuresende.dbo.Teste ( Linha VARCHAR(MAX) ) INSERT INTO dirceuresende.dbo.Teste EXEC master.dbo.xp_cmdshell 'bcp dirceuresende.dbo.Teste IN "C:\Temporario\Conversas.txt" -T -Slocalhost\sql2016 -c' SELECT * FROM dirceuresende.dbo.Teste |
OLE Automation (worked, but does not read UTF-8 files)
1 2 3 4 5 6 7 8 9 |
IF (OBJECT_ID('dirceuresende.dbo.Teste') IS NOT NULL) DROP TABLE dirceuresende.dbo.Teste CREATE TABLE dirceuresende.dbo.Teste ( Linha VARCHAR(MAX) ) INSERT INTO #Teste SELECT * FROM dbo.fncLer_Arquivo_FSO('C:\Temporario\Conversas.txt') SELECT * FROM #Teste |
To view the source code of the fncLer_File_File function, visit the post. SQL Server - How to import text files into database (OLE Automation, CLR, BCP, BULK INSERT, OPENROWSET).
OPENROWSET with ACE OLEDB (worked)
One of the options where you can actually import the file correctly, including in UTF-8 encoding, is by using OPENROWSET and ACE OLEDB drivers. If you do not have ACE OLEDB drivers installed on your server, install them using the post tutorial. SQL Server - How to Install the Microsoft.ACE.OLEDB.12.0 and Microsoft.Jet.OLEDB.4.0 Drivers.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
IF (OBJECT_ID('tempdb..#Teste') IS NOT NULL) DROP TABLE #Teste CREATE TABLE #Teste ( Linha VARCHAR(MAX) ) INSERT INTO #Teste SELECT * FROM OPENROWSET( 'Microsoft.ACE.OLEDB.12.0', 'Text;Database=C:\Temporario\;HDR=No;CharacterSet=65001', 'SELECT [F1] FROM [Conversas.txt]' ) A SELECT * FROM #Teste |
SQLCLR (worked)
And lastly, the solution that for me is the best of options, SQLCLR allows you to import text files easily using the Stored Procedure shown below:
1 2 3 4 5 6 7 8 9 10 11 12 |
IF (OBJECT_ID('dirceuresende.dbo.Teste') IS NOT NULL) DROP TABLE dirceuresende.dbo.Teste CREATE TABLE dirceuresende.dbo.Teste ( Linha VARCHAR(MAX) ) INSERT INTO #Teste EXEC CLR.dbo.stpImporta_Txt_Encoding @caminho = N'C:\Temporario\Conversas.txt', -- nvarchar(max) @Ds_Encoding = N'utf-8' -- nvarchar(max) SELECT * FROM #Teste |
If you want to implement this SP in your database, just use the script below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
USE [dirceuresende] GO IF NOT EXISTS(SELECT NULL FROM sys.assemblies WHERE [name] = 'SQLCLR') BEGIN CREATE ASSEMBLY [SQLCLR] FROM 0x4D5A90000300000004000000FFFF0000B800000000000000400000000000000000000000000000000000000000000000000000000000000000000000800000000E1FBA0E00B409CD21B8014CCD21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E0D0D0A2400000000000000504500004C010300FDFD775B0000000000000000E00022200B013000000C00000006000000000000B62A0000002000000040000000000010002000000002000004000000000000000600000000000000008000000002000000000000030060850000100000100000000010000010000000000000100000000000000000000000642A00004F000000004000006803000000000000000000000000000000000000006000000C0000002C2900001C0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000080000000000000000000000082000004800000000000000000000002E74657874000000BC0A000000200000000C000000020000000000000000000000000000200000602E72737263000000680300000040000000040000000E0000000000000000000000000000400000402E72656C6F6300000C0000000060000000020000001200000000000000000000000000004000004200000000000000000000000000000000982A0000000000004800000002000500182100001408000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001B3006009D00000001000011000F00281000000A0F01281000000A281100000A731200000A0A00178D16000001251672010000701F0C281300000A731400000AA2731500000A0B281600000A14FE010C082C02DE53281600000A076F1700000A002B1C000716066F1800000A6F1900000A00281600000A076F1A00000A0000066F1B00000A16FE0416FE010D092DD4281600000A6F1C00000A0000DE0B062C07066F1D00000A00DC2A0000000110000002001A007791000B000000002202281E00000A002A00000042534A4201000100000000000C00000076342E302E33303331390000000005006C00000080020000237E0000EC020000B003000023537472696E6773000000009C0600001400000023555300B0060000100000002347554944000000C00600005401000023426C6F620000000000000002000001471502000900000000FA013300160000010000001B0000000200000002000000020000001E0000000F00000001000000010000000200000000007A020100000000000600820112030600EF0112030600A000E0020F00320300000600C8008D02060065018D02060046018D020600D6018D020600A2018D020600BB018D020600F5008D020600B400F30206009200F302060029018D020600100147020600670386020A00DF00BF020A00610241030600A70212000A004C00BF0206003E027F030A002800BF020A0080001C000A008B03BF020A007800BF020600B402120006005A0086020000000009000000000001000100010010005603000041000100010050200000000096002302650001000C21000000008618DA0206000300000001009F020000020017020900DA0201001100DA0206001900DA020A002900DA0210003100DA0210003900DA0210004100DA0210004900DA0210005100DA0210005900DA0210006100DA0215006900DA0210007100DA0210007900DA0210008900DA02060091000D022300A9003B0227009900DA022D00B100A5033400B100DA023800A100DA024000C1006F004700C9006E034C00D10066002300A1006B025200C90096034C00D10075025800C9003D000600D9008A0006008100DA02060020007B004E012E000B006D002E00130076002E001B0095002E0023009E002E002B00AB002E003300AB002E003B00AB002E0043009E002E004B00B1002E005300AB002E005B00AB002E006300C9002E006B00F3002E00730000011A00048000000100000000000000000000000000010000000400000000000000000000005C003400000000000400000000000000000000005C001C000000000000000053514C434C5232003C4D6F64756C653E0053797374656D2E494F0053797374656D2E446174610053716C4D65746144617461006D73636F726C69620053656E64526573756C7473456E640053716C446174615265636F72640049446973706F7361626C6500526561644C696E65006765745F506970650053716C506970650053716C44625479706500446973706F736500477569644174747269627574650044656275676761626C6541747472696275746500436F6D56697369626C6541747472696275746500417373656D626C795469746C654174747269627574650053716C50726F63656475726541747472696275746500417373656D626C7954726164656D61726B417474726962757465005461726765744672616D65776F726B41747472696275746500417373656D626C7946696C6556657273696F6E41747472696275746500417373656D626C79436F6E66696775726174696F6E41747472696275746500417373656D626C794465736372697074696F6E41747472696275746500436F6D70696C6174696F6E52656C61786174696F6E7341747472696275746500417373656D626C7950726F6475637441747472696275746500417373656D626C79436F7079726967687441747472696275746500417373656D626C79436F6D70616E794174747269627574650052756E74696D65436F6D7061746962696C697479417474726962757465006765745F56616C75650044735F456E636F64696E6700737470496D706F7274615F5478745F456E636F64696E6700476574456E636F64696E670053797374656D2E52756E74696D652E56657273696F6E696E670053716C537472696E6700536574537472696E67005065656B0053514C434C52322E646C6C0053797374656D0053797374656D2E5265666C656374696F6E0063616D696E686F0053747265616D5265616465720054657874526561646572004D6963726F736F66742E53716C5365727665722E536572766572002E63746F720053797374656D2E446961676E6F73746963730053797374656D2E52756E74696D652E496E7465726F7053657276696365730053797374656D2E52756E74696D652E436F6D70696C6572536572766963657300446562756767696E674D6F6465730053797374656D2E446174612E53716C54797065730053746F72656450726F63656475726573004F626A6563740053656E64526573756C747353746172740053797374656D2E546578740053716C436F6E746578740053656E64526573756C7473526F77006765745F4D6178000000000011440073005F004C0069006E00680061000000296D52E3C4A8364896D45A12676954DA00042001010803200001052001011111042001010E0420010102080704124D125102020320000E05000112550E062002010E12550300000A072003010E115D0A062001011D1259040000126505200101125105200201080E0320000808B77A5C561934E08907000201114911490801000800000000001E01000100540216577261704E6F6E457863657074696F6E5468726F7773010801000701000000000C01000753514C434C5232000005010000000017010012436F7079726967687420C2A920203230313800002901002436626662343334372D636164302D346433372D616665642D65656234353063356538616400000C010007312E302E302E3000004D01001C2E4E45544672616D65776F726B2C56657273696F6E3D76342E362E310100540E144672616D65776F726B446973706C61794E616D65142E4E4554204672616D65776F726B20342E362E3104010000000000000000FDFD775B00000000020000001C01000048290000480B000052534453B0EBB2DD9240B14AAA96BC6723A8AF2C01000000433A5C55736572735C646966696C5C446F63756D656E74735C56697375616C2053747564696F20323031375C50726F6A656374735C4461746162617365315C53514C434C52325C6F626A5C44656275675C53514C434C52322E7064620000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000008C2A00000000000000000000A62A0000002000000000000000000000000000000000000000000000982A0000000000000000000000005F436F72446C6C4D61696E006D73636F7265652E646C6C0000000000FF2500200010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001001000000018000080000000000000000000000000000001000100000030000080000000000000000000000000000001000000000048000000584000000C03000000000000000000000C0334000000560053005F00560045005200530049004F004E005F0049004E0046004F0000000000BD04EFFE00000100000001000000000000000100000000003F000000000000000400000002000000000000000000000000000000440000000100560061007200460069006C00650049006E0066006F00000000002400040000005400720061006E0073006C006100740069006F006E00000000000000B0046C020000010053007400720069006E006700460069006C00650049006E0066006F0000004802000001003000300030003000300034006200300000001A000100010043006F006D006D0065006E007400730000000000000022000100010043006F006D00700061006E0079004E0061006D0065000000000000000000380008000100460069006C0065004400650073006300720069007000740069006F006E0000000000530051004C0043004C00520032000000300008000100460069006C006500560065007200730069006F006E000000000031002E0030002E0030002E003000000038000C00010049006E007400650072006E0061006C004E0061006D0065000000530051004C0043004C00520032002E0064006C006C0000004800120001004C006500670061006C0043006F007000790072006900670068007400000043006F0070007900720069006700680074002000A90020002000320030003100380000002A00010001004C006500670061006C00540072006100640065006D00610072006B007300000000000000000040000C0001004F0072006900670069006E0061006C00460069006C0065006E0061006D0065000000530051004C0043004C00520032002E0064006C006C000000300008000100500072006F0064007500630074004E0061006D00650000000000530051004C0043004C00520032000000340008000100500072006F006400750063007400560065007200730069006F006E00000031002E0030002E0030002E003000000038000800010041007300730065006D0062006C0079002000560065007200730069006F006E00000031002E0030002E0030002E0030000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000C000000B83A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 WITH PERMISSION_SET = EXTERNAL_ACCESS END GO IF (OBJECT_ID('dbo.stpImporta_Txt_Encoding') IS NOT NULL) DROP PROCEDURE stpImporta_Txt_Encoding GO CREATE PROCEDURE [dbo].[stpImporta_Txt_Encoding] @caminho [nvarchar](max), @Ds_Encoding [nvarchar](max) WITH EXECUTE AS CALLER AS EXTERNAL NAME [SQLCLR].[StoredProcedures].[stpImporta_Txt_Encoding] GO |
Since this SP requires access to the filesystem, it must be created using the EXTERNAL_ACCESS permission. This requires the database to have the TRUSTWORTHY property set to ON before creating the assembly and Stored Procedure:
1 2 3 4 5 |
USE [master] GO ALTER DATABASE [dirceuresende] SET TRUSTWORTHY ON GO |
How to handle data in SQL Server
View contentIn summary, I took the following actions to handle the information:
- Generate the table that will store the data
- Imports the file
- Replaces the string "you" with your user
- Remove character "?"
- Formats the event date
- Remove unformatted lines (probably broken text on multiple lines)
- Remove date from text (already has specific column for it)
- Identifies the user who sent the common messages
- Identifies the user who left the group
- Identifies users who have been removed from the group.
- Identify users who joined the group via invite link
- Identifies users who were added to the group by another user.
- Identify common messages
- Creates the tables for use in Power BI (Group_Users, Never_Used_Users, Speak_More_Word, and Speak_More_Word)
- Ignore some common expressions of Portuguese language in most spoken word tables
Source code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 |
----------------------------------------------------------------------------------------- -- GERA A TABELA QUE IRÁ ARMAZENAR OS DADOS ----------------------------------------------------------------------------------------- IF (OBJECT_ID('dbo.Conversas_Whatsapp') IS NOT NULL) DROP TABLE dbo.Conversas_Whatsapp CREATE TABLE dbo.Conversas_Whatsapp ( Ds_Linha VARCHAR(MAX), Dt_Evento DATETIME, Ds_Situacao VARCHAR(50), Ds_Mensagem VARCHAR(MAX), Ds_Usuario_Removido VARCHAR(100) ) ----------------------------------------------------------------------------------------- -- IMPORTA O ARQUIVO ----------------------------------------------------------------------------------------- INSERT INTO dbo.Conversas_Whatsapp(Ds_Linha) SELECT [F1] FROM OPENROWSET( 'Microsoft.ACE.OLEDB.12.0', 'Text;Database=C:\Temporario\;HDR=No;CharacterSet=65001', 'SELECT [F1] FROM [Conversas.txt]' ) A ----------------------------------------------------------------------------------------- -- REMOVE CARACTERE "?" ----------------------------------------------------------------------------------------- UPDATE A SET A.Ds_Linha = REPLACE(REPLACE(REPLACE(Ds_Linha, ' - ?', ' - '), '?:', ':'), '? saiu', ' saiu') FROM dbo.Conversas_Whatsapp A ----------------------------------------------------------------------------------------- -- FORMATA A DATA DO EVENTO ----------------------------------------------------------------------------------------- UPDATE dbo.Conversas_Whatsapp SET Dt_Evento = TRY_CONVERT(DATETIME, LEFT(Ds_Linha, 6) + '20' + SUBSTRING(Ds_Linha, 7, 8), 103) ----------------------------------------------------------------------------------------- -- REMOVE LINHAS SEM DATA FORMATADA (PROVAVELMENTE TEXTOS QUEBRADOS EM VÁRIAS LINHAS) ----------------------------------------------------------------------------------------- DELETE A FROM dbo.Conversas_Whatsapp A WHERE Dt_Evento IS NULL ----------------------------------------------------------------------------------------- -- REMOVE A DATA DO TEXTO (JÁ POSSUI COLUNA ESPECÍFICA PARA ISSO) ----------------------------------------------------------------------------------------- UPDATE dbo.Conversas_Whatsapp SET Ds_Linha = SUBSTRING(Ds_Linha, 18, LEN(Ds_Linha)) ----------------------------------------------------------------------------------------- -- IDENTIFICA O USUÁRIO QUE ENVIOU AS MENSAGENS COMUNS ----------------------------------------------------------------------------------------- UPDATE dbo.Conversas_Whatsapp SET Ds_Mensagem = LTRIM(RTRIM(SUBSTRING(Ds_Linha, CHARINDEX(':', Ds_Linha) + 1, LEN(Ds_Linha)))) UPDATE dbo.Conversas_Whatsapp SET Ds_Linha = LTRIM(RTRIM(REPLACE(SUBSTRING(Ds_Linha, 1, CHARINDEX(':', Ds_Linha)), ':', ''))) ----------------------------------------------------------------------------------------- -- IDENTIFICA O USUÁRIO QUE SAIU DO GRUPO ----------------------------------------------------------------------------------------- UPDATE dbo.Conversas_Whatsapp SET Ds_Linha = REPLACE(LEFT(Ds_Mensagem, LEN(Ds_Mensagem) - 5), '?', ''), Ds_Mensagem = 'saiu', Ds_Situacao = 'saiu', Ds_Usuario_Removido = REPLACE(LEFT(Ds_Mensagem, LEN(Ds_Mensagem) - 5), '?', '') WHERE Ds_Mensagem LIKE '% saiu' ----------------------------------------------------------------------------------------- -- IDENTIFICA OS USUÁRIOS QUE FORAM REMOVIDOS DO GRUPO ----------------------------------------------------------------------------------------- UPDATE dbo.Conversas_Whatsapp SET Ds_Linha = REPLACE(LEFT(Ds_Mensagem, CHARINDEX(' removeu ', Ds_Mensagem)), '?', ''), Ds_Usuario_Removido = REPLACE(SUBSTRING(Ds_Mensagem, CHARINDEX(' removeu ', Ds_Mensagem) + 9, LEN(Ds_Mensagem)), '?', ''), Ds_Situacao = 'saiu' WHERE Ds_Mensagem LIKE '% removeu %' ----------------------------------------------------------------------------------------- -- IDENTIFICA OS USUÁRIOS QUE ENTRARAM NO GRUPO ATRAVÉS DE LINK DE CONVITE ----------------------------------------------------------------------------------------- UPDATE dbo.Conversas_Whatsapp SET Ds_Linha = LTRIM(RTRIM(REPLACE(LEFT(Ds_Mensagem, CHARINDEX(' entrou usando o link de convite deste grupo', Ds_Mensagem)), '?', ''))), Ds_Mensagem = LTRIM(RTRIM(REPLACE(SUBSTRING(Ds_Mensagem, CHARINDEX(' entrou usando o link de convite deste grupo', Ds_Mensagem), LEN(Ds_Mensagem)), '?', ''))), Ds_Situacao = 'entrou' WHERE Ds_Mensagem LIKE '% entrou usando o link de convite deste grupo' ----------------------------------------------------------------------------------------- -- IDENTIFICA OS USUÁRIOS QUE FORAM ADICIONADOS AO GRUPO POR OUTRO USUÁRIO ----------------------------------------------------------------------------------------- INSERT INTO dbo.Conversas_Whatsapp ( Ds_Linha, Dt_Evento, Ds_Situacao, Ds_Mensagem, Ds_Usuario_Removido ) SELECT LTRIM(RTRIM(REPLACE(LEFT(A.Ds_Mensagem, CHARINDEX(' adicionou ', A.Ds_Mensagem)), '?', ''))) AS Ds_Linha, A.Dt_Evento, 'entrou' AS Ds_Situacao, A.Ds_Mensagem, LTRIM(RTRIM(B.Palavra)) AS Ds_Usuario_Removido FROM dbo.Conversas_Whatsapp A CROSS APPLY dbo.fncSplitTexto(REPLACE(LTRIM(RTRIM(REPLACE(SUBSTRING(A.Ds_Mensagem, CHARINDEX(' adicionou ', A.Ds_Mensagem) + 11, LEN(A.Ds_Mensagem)), '?', ''))), ' e ', ','), ',') B WHERE A.Ds_Mensagem LIKE '% adicionou %' AND Ds_Linha = '' DELETE FROM dbo.Conversas_Whatsapp WHERE Ds_Mensagem LIKE '% adicionou %' AND Ds_Linha = '' ----------------------------------------------------------------------------------------- -- IDENTIFICA AS MENSAGENS COMUNS ----------------------------------------------------------------------------------------- UPDATE dbo.Conversas_Whatsapp SET Ds_Situacao = 'info' WHERE Ds_Situacao IS NULL AND Ds_Linha = '' UPDATE dbo.Conversas_Whatsapp SET Ds_Situacao = 'mensagem' WHERE Ds_Situacao IS NULL ----------------------------------------------------------------------------------------- -- SUBSTITUI A STRING "VOCÊ" PELO SEU USUÁRIO ----------------------------------------------------------------------------------------- UPDATE A SET A.Ds_Linha = REPLACE(A.Ds_Linha, 'Você', 'Dirceu Resende') FROM dbo.Conversas_Whatsapp A WHERE A.Ds_Situacao = 'mensagem' UPDATE A SET A.Ds_Usuario_Removido = REPLACE(A.Ds_Usuario_Removido, 'Você', 'Dirceu Resende') FROM dbo.Conversas_Whatsapp A WHERE A.Ds_Situacao IN ('entrou', 'saiu', 'info') ----------------------------------------------------------------------------------------- -- CRIA AS TABELAS PARA USAR NO POWERBI ----------------------------------------------------------------------------------------- IF (OBJECT_ID('dbo.Usuarios_No_Grupo') IS NOT NULL) DROP TABLE dbo.Usuarios_No_Grupo SELECT DISTINCT A.Ds_Linha AS Ds_Usuario, CAST(NULL AS DATETIME) AS Dt_Entrada, CAST(NULL AS INT) AS Qt_Dias_Sem_Falar, CAST(NULL AS INT) AS Qt_Mensagens INTO dbo.Usuarios_No_Grupo FROM dbo.Conversas_Whatsapp A LEFT JOIN dbo.Conversas_Whatsapp B ON A.Ds_Linha = B.Ds_Usuario_Removido AND B.Ds_Situacao = 'saiu' WHERE B.Ds_Usuario_Removido IS NULL AND A.Ds_Situacao IN ('entrou', 'mensagem') AND A.Ds_Linha IS NOT NULL UNION SELECT DISTINCT A.Ds_Usuario_Removido, CAST(NULL AS DATETIME) AS Dt_Entrada, CAST(NULL AS INT) AS Qt_Dias_Sem_Falar, CAST(NULL AS INT) AS Qt_Mensagens FROM dbo.Conversas_Whatsapp A LEFT JOIN dbo.Conversas_Whatsapp B ON A.Ds_Usuario_Removido = B.Ds_Usuario_Removido AND B.Ds_Situacao = 'saiu' WHERE B.Ds_Usuario_Removido IS NULL AND A.Ds_Situacao = 'entrou' AND A.Ds_Usuario_Removido IS NOT NULL UPDATE A SET A.Dt_Entrada = B.Dt_Evento, A.Qt_Dias_Sem_Falar = DATEDIFF(DAY, B.Dt_Ultimo_Evento, GETDATE()) FROM dbo.Usuarios_No_Grupo A JOIN ( SELECT ISNULL(Ds_Usuario_Removido, Ds_Linha) AS Ds_Usuario, MIN(Dt_Evento) AS Dt_Evento, MAX(Dt_Evento) AS Dt_Ultimo_Evento FROM dbo.Conversas_Whatsapp WHERE Ds_Situacao IN ('entrou', 'mensagem') GROUP BY ISNULL(Ds_Usuario_Removido, Ds_Linha) ) B ON A.Ds_Usuario = B.Ds_Usuario UPDATE A SET A.Qt_Mensagens = ISNULL(B.Qt_Mensagens, 0) FROM dbo.Usuarios_No_Grupo A LEFT JOIN ( SELECT Ds_Linha AS Ds_Usuario, COUNT(*) AS Qt_Mensagens FROM dbo.Conversas_Whatsapp WHERE Ds_Situacao = 'mensagem' GROUP BY Ds_Linha ) B ON A.Ds_Usuario = B.Ds_Usuario IF (OBJECT_ID('dbo.Usuarios_Nunca_Falaram') IS NOT NULL) DROP TABLE dbo.Usuarios_Nunca_Falaram SELECT A.Ds_Usuario INTO dbo.Usuarios_Nunca_Falaram FROM dbo.Usuarios_No_Grupo A LEFT JOIN dbo.Conversas_Whatsapp B ON A.Ds_Usuario = B.Ds_Linha AND B.Ds_Situacao = 'mensagem' WHERE B.Ds_Mensagem IS NULL IF (OBJECT_ID('dbo.Palavras_Mais_Faladas') IS NOT NULL) DROP TABLE dbo.Palavras_Mais_Faladas SELECT TOP 100 LTRIM(RTRIM(B.Palavra)) AS Palavra, COUNT(*) AS Qt_Ocorrencias INTO dbo.Palavras_Mais_Faladas FROM dbo.Conversas_Whatsapp A CROSS APPLY dbo.fncSplitTexto(A.Ds_Mensagem, ' ') B WHERE A.Ds_Situacao = 'mensagem' AND LEN(B.Palavra) >= 3 AND B.Palavra NOT IN ( 'que', 'nao', '<Arquivo', 'mídia', 'oculto>', 'para', 'com', 'uma', 'tem', 'Mas', 'pra', 'mais', 'por', 'isso', 'Esta', 'muito', 'Vai', 'como', 'Bom', 'Boa', 'aqui', 'Esse', 'ser', 'vou', 'fazer', 'quem', 'ele', 'pode', 'dia', 'Foi', 'alguem', 'bem', 'estou', 'mesmo', 'essa', 'até', 'ter', '??????', 'tenho', '??????', 'pessoal', 'Acho', 'Sim', 'também', 'Ainda', 'sobre', 'kkk', 'galera', 'são', 'Cara', 'meu', 'Obrigado', 'agora', 'nos', 'dos', 'tudo', 'Quando', 'sem', 'PESSOAL,', 'pelo', 'Depois', 'então', 'todos', 'Dar', 'minha', 'ver', 'sempre', 'parabéns', 'Nem', 'gente', 'usar', 'algum', 'mensagem', 'pois', 'qual', 'for', 'pela', 'Sei', 'faz', 'eles', 'das' ) AND B.Palavra NOT IN ( 'seu', 'sua', 'vamos', 'era', 'nada', 'problema', 'fiz', 'Pro', 'Show', 'Sou', 'apenas', 'seria', 'Valeu', 'assim', 'melhor', 'tipo', 'porque', 'coisa', 'fica', 'seja', 'falar', 'outro', 'quer', 'algo', 'hoje', 'Onde', 'sim,', 'usando', 'uns', 'tarde', 'será', 'vlw', 'estão', 'todo', 'top', 'deve', 'sabe', 'galera,', 'caso', 'cada', 'Fala', 'preciso', 'outra', 'tinha', 'precisa', 'pouco', 'tiver', 'fazendo', 'Nesse', 'ano', 'erro', 'deu', 'estava', 'ficar', 'qualquer', 'muita', 'noite', 'uso' , 'local', 'mundo', 'Olha', 'querendo', 'rapaz', 'Gostaria', 'foram', 'bastante', 'Excelente') AND B.Palavra NOT IN ( 'alguma', 'fora', 'alguns', 'Existe', 'nome', 'Posso', 'saber', 'Estamos', 'Legal', 'quero', 'hein', 'Nosso', 'Nunca', 'achei', 'menos', 'anos', 'podem', 'forma', 'grande', 'Manda', 'usa', 'conta', 'fez', 'pessoas', 'tentar', 'entendi', 'mercado', 'verdade', '(arquivo', 'anexado)', 'desse', 'vindo', 'certo', 'entrar', 'estar', 'outros', 'parece', 'temos', 'próximo', 'entre', 'Esses', 'isso,', 'parte', 'algumas', 'daí', 'exatamente', 'favor', 'hora', 'maior', 'mandar', 'ninguém', 'realmente', 'sábado', 'tenha', 'teria', 'vem', 'vezes', 'vida', 'umas', 'desde', 'dias', 'direto', 'dois' ) AND B.Palavra NOT IN ( 'Aos', 'nas', 'novo', 'tive', 'dele', 'deles', 'dela', 'delas', 'dessa', 'mesma', 'sendo', 'ele', 'ela', 'vez', 'final', 'quiser', 'vão', 'vários', 'díficil', 'duas', 'colocar', 'dentro', 'dizer', 'essas', 'Fui', 'Nessa', 'pegar', 'conseguir', 'dia.', 'outras', 'Poderia', 'todas', 'certeza', 'dia,', 'falando', 'muitos', 'olhada', 'pessoa', 'quase', 'questão', 'semana', 'tirar', 'amanhã', 'antes', 'não,', 'opção', 'talvez', 'coisas', 'daqui', 'massa', 'assunto', 'consigo', 'contato', 'faço', 'passar', 'Queria', 'somente', 'toda', 'vcs', 'Bora', 'enviar', 'isto', 'você', 'vc', 'vcs' ) AND B.Palavra NOT LIKE '?%' AND B.Palavra NOT LIKE '..%' AND B.Palavra NOT LIKE 'rs%' AND B.Palavra NOT LIKE 'kkk%' GROUP BY LTRIM(RTRIM(B.Palavra)) ORDER BY 2 DESC IF (OBJECT_ID('dbo.Palavras_Mais_Faladas_Por_Usuario') IS NOT NULL) DROP TABLE dbo.Palavras_Mais_Faladas_Por_Usuario SELECT A.Ds_Usuario AS Ds_Usuario, A.Palavra, A.Qt_Ocorrencias INTO dbo.Palavras_Mais_Faladas_Por_Usuario FROM ( SELECT A.Ds_Linha AS Ds_Usuario, LTRIM(RTRIM(B.Palavra)) AS Palavra, COUNT(*) AS Qt_Ocorrencias, ROW_NUMBER() OVER(PARTITION BY A.Ds_Linha ORDER BY COUNT(*) DESC) AS Ranking FROM dbo.Conversas_Whatsapp A CROSS APPLY dbo.fncSplitTexto(A.Ds_Mensagem, ' ') B WHERE A.Ds_Situacao = 'mensagem' AND LEN(B.Palavra) >= 3 AND B.Palavra NOT IN ( 'que', 'nao', '<Arquivo', 'mídia', 'oculto>', 'para', 'com', 'uma', 'tem', 'Mas', 'pra', 'mais', 'por', 'isso', 'Esta', 'muito', 'Vai', 'como', 'Bom', 'Boa', 'aqui', 'Esse', 'ser', 'vou', 'fazer', 'quem', 'ele', 'pode', 'dia', 'Foi', 'alguem', 'bem', 'estou', 'mesmo', 'essa', 'até', 'ter', '??????', 'tenho', '??????', 'pessoal', 'Acho', 'Sim', 'também', 'Ainda', 'sobre', 'kkk', 'galera', 'são', 'Cara', 'meu', 'Obrigado', 'agora', 'nos', 'dos', 'tudo', 'Quando', 'sem', 'PESSOAL,', 'pelo', 'Depois', 'então', 'todos', 'Dar', 'minha', 'ver', 'sempre', 'parabéns', 'Nem', 'gente', 'usar', 'algum', 'mensagem', 'pois', 'qual', 'for', 'pela', 'Sei', 'faz', 'eles', 'das' ) AND B.Palavra NOT IN ( 'seu', 'sua', 'vamos', 'era', 'nada', 'problema', 'fiz', 'Pro', 'Show', 'Sou', 'apenas', 'seria', 'Valeu', 'assim', 'melhor', 'tipo', 'porque', 'coisa', 'fica', 'seja', 'falar', 'outro', 'quer', 'algo', 'hoje', 'Onde', 'sim,', 'usando', 'uns', 'tarde', 'será', 'vlw', 'estão', 'todo', 'top', 'deve', 'sabe', 'galera,', 'caso', 'cada', 'Fala', 'preciso', 'outra', 'tinha', 'precisa', 'pouco', 'tiver', 'fazendo', 'Nesse', 'ano', 'erro', 'deu', 'estava', 'ficar', 'qualquer', 'muita', 'noite', 'uso' , 'local', 'mundo', 'Olha', 'querendo', 'rapaz', 'Gostaria', 'foram', 'bastante', 'Excelente') AND B.Palavra NOT IN ( 'alguma', 'fora', 'alguns', 'Existe', 'nome', 'Posso', 'saber', 'Estamos', 'Legal', 'quero', 'hein', 'Nosso', 'Nunca', 'achei', 'menos', 'anos', 'podem', 'forma', 'grande', 'Manda', 'usa', 'conta', 'fez', 'pessoas', 'tentar', 'entendi', 'mercado', 'verdade', '(arquivo', 'anexado)', 'desse', 'vindo', 'certo', 'entrar', 'estar', 'outros', 'parece', 'temos', 'próximo', 'entre', 'Esses', 'isso,', 'parte', 'algumas', 'daí', 'exatamente', 'favor', 'hora', 'maior', 'mandar', 'ninguém', 'realmente', 'sábado', 'tenha', 'teria', 'vem', 'vezes', 'vida', 'umas', 'desde', 'dias', 'direto', 'dois' ) AND B.Palavra NOT IN ( 'Aos', 'nas', 'novo', 'tive', 'dele', 'deles', 'dela', 'delas', 'dessa', 'mesma', 'sendo', 'ele', 'ela', 'vez', 'final', 'quiser', 'vão', 'vários', 'díficil', 'duas', 'colocar', 'dentro', 'dizer', 'essas', 'Fui', 'Nessa', 'pegar', 'conseguir', 'dia.', 'outras', 'Poderia', 'todas', 'certeza', 'dia,', 'falando', 'muitos', 'olhada', 'pessoa', 'quase', 'questão', 'semana', 'tirar', 'amanhã', 'antes', 'não,', 'opção', 'talvez', 'coisas', 'daqui', 'massa', 'assunto', 'consigo', 'contato', 'faço', 'passar', 'Queria', 'somente', 'toda', 'vcs', 'Bora', 'enviar', 'isto', 'você', 'vc', 'vcs' ) AND B.Palavra NOT LIKE '?%' AND B.Palavra NOT LIKE '..%' AND B.Palavra NOT LIKE 'rs%' AND B.Palavra NOT LIKE 'kkk%' GROUP BY A.Ds_Linha, LTRIM(RTRIM(B.Palavra)) ) A WHERE A.Ranking <= 10 ORDER BY A.Qt_Ocorrencias DESC |
How to view data in Power BI
Since we already treat the data in the database (which I think is the most correct and professional way), in Power BI we will have the job of just displaying and visualizing the data, which is the real purpose of the tool (not a tool for ETL).
First, let's upload the database data to Power BI:
Once the data has been imported (prefer the Import method over DirectQuery), I will need to create new 2 columns in Power BI, which I could have created in the database, but wanted to show how to do this in Power BI:
Time column:
DAX used: Time = HOUR (Conversations_Whatsapp [Event_Dt])
Weekday column:
DAX used: Dia_Semana = FORMAT (WEEKDAY (Conversas_Whatsapp [Dt_Evento]; 3); “dddd”)
As you may have noticed, as I am using the English version of Power BI Desktop, the dates are in English format as well. To fix this, use the Portuguese version of Power BI Desktop or use your own date tables or change the direct location in the PBIX file.
Now is the time to assemble the graphs and views in the report:
Well guys, I hope you enjoyed this article. I will start posting more BI content and hope you are enjoying the latest articles.
Regards and see you next post.
Hi Dirceu,
First congratulations on the topic.
I would like to take a doubt with you, I am following the procedure to generate a statistic and I came across that in the source code, the creation of the 'Conversas_Whatsapp' table has more columns than the 'Test' table, mentioned in the SQLCLR example. How can I import the chat file, in a table with more columns using its stpImporta_Txt_Encoding procedure?
very good! you are the man!
It was top prb.
WOW !! The graphics were very good Dirceu! TOP!! Congratulations!!