Click on the banner to learn about and purchase my database training on Azure

SQL Server and Power BI - How to analyze and generate WhatsApp group conversation statistics

Views: 2.988 views
Reading Time: 11 minutes

Speak guys!

In this blog post, I'd like to share with you a solution that allows you to analyze and generate Whatsapp group conversation statistics using SQL Server and Power BI. Unlike Telegram, which has an integration API and task automation bots, Whatsapp is completely late closed and does not allow any kind of legal integration into the platform, which makes the task of managing groups much more expensive for administrators, especially in large groups.

Wouldn't it be nice to have some statistics of who are the people who talk most in a group, which days and times people are most interacting with, and who are the users who never talk at all? While in Telegram there are bots (Combot, for example) that do this for you, in Whatsapp there is no way to generate this kind of statistics. I mean, there wasn't!

How to export group conversations

Since there is no integration API in Whatsapp, there is no way for you to access conversations directly from the Whatsapp platform. However, one way to access conversation data is by exporting conversations from your mobile phone (Whatsapp Web does not have this feature) and saving to your server.

A handy way to do this is to save it to your Google Drive / Dropbox / Email from your phone and download the file to your server.

How to import group conversations

View content
Once you have recorded the conversations and downloaded them to the server, we now need to import the file into the database.

When you open the conversation zip file, you will see that there are several contact (.vcf) files and you will have a .txt file, which we are looking to import:

When you open the file you will see that its encoding is UTF-8 (Codepage 65001) and the line wrapping format is Unix (Linefeed or \ n). This will be very important to import the file correctly.

Now we need import this file into the database. Choose the method that suits you best and move on to the next step.

BULK INSERT (Did not identify the line break)

Result:

OPENROWSET (Did not identify line break and encoding)

Result:

BCP with xp_cmdshell (no line break and no encoding)

Result:

OLE Automation (worked, but does not read UTF-8 files)

To view the source code of the fncLer_File_File function, visit the post. SQL Server - How to import text files into database (OLE Automation, CLR, BCP, BULK INSERT, OPENROWSET).

Result:

OPENROWSET with ACE OLEDB (worked)

One of the options where you can actually import the file correctly, including in UTF-8 encoding, is by using OPENROWSET and ACE OLEDB drivers. If you do not have ACE OLEDB drivers installed on your server, install them using the post tutorial. SQL Server - How to Install the Microsoft.ACE.OLEDB.12.0 and Microsoft.Jet.OLEDB.4.0 Drivers.

Result:

SQLCLR (worked)

And lastly, the solution that for me is the best of options, SQLCLR allows you to import text files easily using the Stored Procedure shown below:

Result:

If you want to implement this SP in your database, just use the script below:

Since this SP requires access to the filesystem, it must be created using the EXTERNAL_ACCESS permission. This requires the database to have the TRUSTWORTHY property set to ON before creating the assembly and Stored Procedure:

To learn more about SQLCLR, be sure to read the article. .

How to handle data in SQL Server

View content
For data processing, I will use the Stored Procedure stpImporta_Txt_Encodingusing OLE Automation to import the data as I demonstrated in the previous topic and also the function fncSplitText, which I shared in the post How to wrap a string in a substring table using a delimiter in SQL Server.

In summary, I took the following actions to handle the information:

  • Generate the table that will store the data
  • Imports the file
  • Replaces the string "you" with your user
  • Remove character "?"
  • Formats the event date
  • Remove unformatted lines (probably broken text on multiple lines)
  • Remove date from text (already has specific column for it)
  • Identifies the user who sent the common messages
  • Identifies the user who left the group
  • Identifies users who have been removed from the group.
  • Identify users who joined the group via invite link
  • Identifies users who were added to the group by another user.
  • Identify common messages
  • Creates the tables for use in Power BI (Group_Users, Never_Used_Users, Speak_More_Word, and Speak_More_Word)
  • Ignore some common expressions of Portuguese language in most spoken word tables

Source code:

How to view data in Power BI

Since we already treat the data in the database (which I think is the most correct and professional way), in Power BI we will have the job of just displaying and visualizing the data, which is the real purpose of the tool (not a tool for ETL).

First, let's upload the database data to Power BI:

Once the data has been imported (prefer the Import method over DirectQuery), I will need to create new 2 columns in Power BI, which I could have created in the database, but wanted to show how to do this in Power BI:

Time column:
DAX used: Time = HOUR (Conversations_Whatsapp [Event_Dt])

Weekday column:
DAX used: Dia_Semana = FORMAT (WEEKDAY (Conversas_Whatsapp [Dt_Evento]; 3); “dddd”)

As you may have noticed, as I am using the English version of Power BI Desktop, the dates are in English format as well. To fix this, use the Portuguese version of Power BI Desktop or use your own date tables or change the direct location in the PBIX file.

Now is the time to assemble the graphs and views in the report:

And this is the end result:

Well guys, I hope you enjoyed this article. I will start posting more BI content and hope you are enjoying the latest articles.

Regards and see you next post.