SMS Filtering APP Development
This article is published on Sohu Technology Products - SMS Filtering APP Development
I have always wanted to develop my own SMS filtering APP, but I never took concrete action. Now I finally have the time to calm down and document the entire development process while developing.
Spam SMS Samples
The first question I encountered was: since I want to filter spam SMS, how do I recognize which messages are spam?
Referring to my previous experience in training a model to recognize pipe counts, I decided to train a text classification model with Create ML and run it through CoreML. But then the question arose: where do I get the SMS dataset to train the model?
Initially, I planned to find spam SMS samples online, but after searching for a long time, I couldn't find any. So I thought of using the SMS messages on my and my family's phones. After all, SMS messages are generally not deleted, and there are several thousand of them, including spam, promotions, advertisements, and so on.
So the question became, how to export SMS from iPhone?
I also searched for a long time and found that most third-party software required payment. Eventually, I discovered a free export solution.
First, back up the phone to the computer without encryption. As shown in the figure below, select Back up all the data on your iPhone to this Mac, click Back Up Now, and wait for the backup to complete. After the backup is complete, click Manage Backups.
After clicking Manage Backups, the interface is as follows. You can see the backup records. Right-click a record and select Show In Finder to open it in the folder.
Then you can see that the backup directory has been opened. At this point, you need to find the file named 3d0d7e5fb2ce288813306e4d4636395e047a3d28. This file is the database file for the SMS backup. The backup directory contains one folder after another, which can be confusing, so the simplest way to find the file is to search: click the search button in the upper right corner and enter the file name. Note that the search range must be the current folder.
The search results are as follows:
Then copy this file to another location, such as the desktop, and open it with database software, such as SQLPro for SQLite, as shown below:
After observing this file, I found that the phone numbers and SMS records are distributed across different tables, so I need to write an SQL query to extract the required content. The query below is based on SQL to extract messages from backup. Select Query in the above image and input the command as follows:
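-- Join each message to its sender handle. message.date counts from 2001-01-01;
-- recent iOS versions store it in nanoseconds (drop the division if your backup uses seconds).
SELECT datetime(message.date / 1000000000 + 978307200, 'unixepoch', 'localtime') AS Timestamp,
       handle.id, message.text,
       CASE WHEN message.is_from_me THEN 'From me' ELSE 'To me' END AS Sender
FROM message JOIN handle ON message.handle_id = handle.ROWID WHERE message.text IS NOT NULL;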
Then click execute in the upper right corner, and you can see that all the SMS messages have been extracted.
Then select all rows, right-click and choose Export result set as to export as CSV, which produces a file that can be opened in Excel.
This way, I obtained the required SMS samples.
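The timestamp arithmetic in the query above works because message.date counts from 2001-01-01 rather than from the Unix epoch (1970-01-01), a gap of 31 years. The same conversion can be sketched in Swift. The function name `messageDate(fromRaw:)` and the threshold used to tell nanoseconds from seconds are assumptions made for illustration, not something taken from the backup format documentation.

```swift
import Foundation

/// Seconds between the Unix epoch (1970-01-01) and Apple's reference date (2001-01-01), both UTC.
let appleEpochOffset: TimeInterval = 978_307_200

/// Converts a raw `message.date` value to a `Date`.
/// Assumption: values above 10^11 are nanoseconds (newer backups), smaller ones are seconds.
func messageDate(fromRaw raw: Int64) -> Date {
    let seconds = raw > 100_000_000_000 ? Double(raw) / 1_000_000_000 : Double(raw)
    return Date(timeIntervalSince1970: seconds + appleEpochOffset)
}

// A raw value of 0 corresponds to the reference date itself.
print(messageDate(fromRaw: 0))
```

The timestamps are not needed for training, so this only matters if you want to sort or sample the messages by date.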
Spam SMS Training Recognition
With the samples in hand, let's see how to train for recognition. I plan to use Apple's CoreML for recognition. So how to use it? What are the format requirements for the samples? How long does training take?
First, let's create a Create ML project for text training. In Xcode, click Open Developer Tool and select Create ML to open it, as shown below:
Then select a folder and click New Document, as shown:
Then select Text Classification, as shown below:
Next, input the project name and description.
Click create in the lower right corner to enter the main interface, as shown below:
Click on the detailed description of Training Data, and you can see the format required by Create ML for text recognition, which supports JSON and CSV files. The format is as follows:
The JSON format is as follows:
// JSON file
[
{
"text": "The movie was fantastic!",
"label": "positive"
}, {
"text": "Very boring. Fell asleep.",
"label": "negative"
}, {
"text": "It was just OK.",
"label": "neutral"
} ...
]
The CSV format consists of one column for text and one column for label:
| text | label |
|---|---|
| This is a normal SMS | label1 |
| This is a spam SMS | label2 |
Since I already exported the SMS as CSV in the previous step, I just need to change the file to the two-column layout shown above. The only remaining question is: what values can the labels take?
To see what values the labels can take, I need to first understand the filtering logic of the system SMS. What filtering categories are supported? Otherwise, if I define my own categories and group them, I might find out that the system does not support them, which would be awkward.
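Reshaping the exported rows into the two-column text and label layout can be sketched as follows. This is a minimal sketch, not the tool used in the article: the `LabeledMessage` type and both helper functions are hypothetical names, and every field is quoted on the assumption that SMS text may contain commas or quotes.

```swift
import Foundation

/// One exported message plus the label assigned to it by hand.
struct LabeledMessage {
    let text: String
    let label: String
}

/// Quotes a field for CSV: wrap it in double quotes and double any embedded quotes.
func csvField(_ value: String) -> String {
    "\"" + value.replacingOccurrences(of: "\"", with: "\"\"") + "\""
}

/// Builds a two-column CSV with the `text` and `label` headers.
func trainingCSV(from messages: [LabeledMessage]) -> String {
    let rows = messages.map { csvField($0.text) + "," + csvField($0.label) }
    return (["text,label"] + rows).joined(separator: "\n") + "\n"
}

// Placeholder labels, as in the table above; the real values are decided later.
let sample = [
    LabeledMessage(text: "Your code is 1234, valid for 5 minutes", label: "label1"),
    LabeledMessage(text: "Big sale! Reply \"STOP\" to unsubscribe", label: "label2"),
]
print(trainingCSV(from: sample))
```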
SMS Filtering Categories
System SMS Filtering Logic
Referring to SMS and MMS Message Filtering, it can be seen that developers do not have the permission to create new groups. They can only intercept SMS or MMS received from unknown contacts and return one of the specified categories.
It should be noted that, according to the documentation, SMS filtering does not support filtering iMessages or SMS from contacts in the address book; it only supports SMS and MMS from unknown contacts.
SMS filtering is further divided into local judgment filtering and server-side judgment filtering, as illustrated below:
According to the documentation, even for server-side filtering, the APP cannot directly access the network; the system will interact with the configured server. Moreover, the App Extension cannot write data through the shared Group, so SMS can only be obtained in the App Extension, cannot be stored, and cannot be uploaded, thus ensuring privacy and security. For more implementations of server-side filtering, refer to Creating a Message Filter App Extension.
Next, let's look at the supported filtering types, ILMessageFilterAction.
The major categories support five types:
- none: Not enough information to judge; the message is displayed, or a further request is made for server-side judgment filtering.
- allow: Display the message normally.
- junk: Prevent normal display of the message; it is displayed under the junk SMS category.
- promotion: Prevent normal display of the message; it is displayed under the promotional information category.
- transaction: Prevent normal display of the message; it is displayed under the transaction information category.
Among these, there are also subcategories, ILMessageFilterSubAction. For specific meanings, refer to ILMessageFilterSubAction.
- none
- The subcategories supported by promotion include:
  - others
  - offers
  - coupons
- The subcategories supported by transaction include:
  - others
  - finance
  - orders
  - reminders
  - health
  - weather
  - carrier
  - rewards
  - publicServices
Here, we only handle the major categories, and the specific subcategories are not filtered in detail. Therefore, the values for the labels that need to be trained are very clear: filter spam SMS, promotional information, and transaction information. As for none and allow, they are not distinguished and are uniformly processed as allow. Thus, the total values for the labels that need to be trained are as follows:
- allow
- junk
- promotion
- transaction
Next, a label needs to be added to each SMS in the exported CSV file. This can only be done manually. The size of the sample and the quality of the labels will determine the subsequent accuracy of recognition. To keep later subcategory support possible, label each message as what it really is; for example, do not lump promotions into junk.
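Because the labels are typed by hand, a quick check before training can catch typos and show how balanced the classes are. The sketch below assumes the four labels listed above; the function name `auditLabels` is hypothetical. It is deliberately strict, so a label with different casing or stray whitespace is reported rather than silently accepted, since it would otherwise end up as a separate class.

```swift
import Foundation

/// The four labels chosen above for training.
let allowedLabels: Set<String> = ["allow", "junk", "promotion", "transaction"]

/// Counts samples per label and collects the indices of rows whose label is not allowed.
func auditLabels(_ labels: [String]) -> (counts: [String: Int], invalidRows: [Int]) {
    var counts: [String: Int] = [:]
    var invalidRows: [Int] = []
    for (index, label) in labels.enumerated() {
        if allowedLabels.contains(label) {
            counts[label, default: 0] += 1
        } else {
            invalidRows.append(index)
        }
    }
    return (counts, invalidRows)
}

// "Promotion " (wrong case, trailing space) and "spam" are both flagged.
let audit = auditLabels(["allow", "junk", "Promotion ", "spam", "transaction", "junk"])
print(audit.counts, audit.invalidRows)
```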
Once all the SMS samples are labeled, they can be imported into Create ML for training to generate the required model. The steps are as follows:
First, import the dataset.
Then click Train in the upper left corner.
Once training is complete, you can click Preview to simulate SMS text and see the output predictions, as shown in the figure below:
Finally, export the model for APP use.
APP Development
Create a new project, then use new bing to generate images to design the APP Icon, and use ChatGPT-4 to generate the APP name. Then add the Message Filter Extension Target, as shown in the figure below:
In MessageFilterExtension.swift, you can see that Apple has already implemented the basic framework. You only need to add the corresponding filtering logic at the // TODO: locations in that framework.
Then import the trained model into the project. Note that Target Membership should be checked for both the main project and the Message Filter Extension Target, as the model needs to be used in the extension Target for filtering.
The specific usage is as follows:
import Foundation
import IdentityLookup
import CoreML

/// Labels produced by the trained model, mapped onto the system filter actions.
enum SMSFilterActionType: String {
    case transaction
    case promotion
    case allow
    case junk

    func formatFilterAction() -> ILMessageFilterAction {
        switch self {
        case .transaction:
            return ILMessageFilterAction.transaction
        case .promotion:
            return ILMessageFilterAction.promotion
        case .allow:
            return ILMessageFilterAction.allow
        case .junk:
            return ILMessageFilterAction.junk
        }
    }
}

struct SMSFilterUtil {
    /// Classifies the message body with the bundled model; returns .none if anything fails.
    static func filter(with messageBody: String) -> ILMessageFilterAction {
        var filterAction: ILMessageFilterAction = .none
        let configuration = MLModelConfiguration()
        do {
            // SmsClassifier is the class Xcode generates from the imported model file.
            let model = try SmsClassifier(configuration: configuration)
            let resultLabel = try model.prediction(text: messageBody).label
            // Unknown labels leave the action at .none.
            if let resultFilterAction = SMSFilterActionType(rawValue: resultLabel)?.formatFilterAction() {
                filterAction = resultFilterAction
            }
        } catch {
            print(error)
        }
        return filterAction
    }
}
Then in MessageFilterExtension.swift, implement the offlineAction(for queryRequest: ILMessageFilterQueryRequest) method as follows:
@available(iOSApplicationExtension 16.0, *)
private func offlineAction(for queryRequest: ILMessageFilterQueryRequest) -> (ILMessageFilterAction, ILMessageFilterSubAction) {
    guard let messageBody = queryRequest.messageBody else {
        return (.none, .none)
    }
    // Classify with the bundled model; sub-actions are not used here.
    let action = SMSFilterUtil.filter(with: messageBody)
    return (action, .none)
}
Regarding the minimum version setting for the APP, note that ILMessageFilterSubAction is only supported on iOS 16 and above, while the promotion and transaction values of ILMessageFilterAction are supported on iOS 14 and above.
If you want to implement more refined SubAction filtering, then the labels of the SMS dataset above need to be changed to more refined labels, and a model needs to be trained on them.
Additionally, ILMessageFilterQueryRequest can retrieve sender and messageBody, so if you want to implement custom rules, such as setting corresponding rules for a specific phone number, you need to set the rules in the APP, share them to the Extension through the Group, and then match the rules in the above method.
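One way such sender rules could be represented and matched is sketched below. Everything here is an assumption made for illustration: the `SenderRule` type, the prefix-matching strategy, the JSON encoding, and the made-up sender prefixes. In a real project the encoded data would be written by the APP into the shared Group container and read back by the Extension; the sketch keeps it in memory so it stays self-contained.

```swift
import Foundation

/// Hypothetical rule: messages whose sender starts with `senderPrefix` get `action`.
struct SenderRule: Codable, Equatable {
    let senderPrefix: String
    let action: String   // one of "allow", "junk", "promotion", "transaction"
}

/// APP side: serialize the rules so they can be stored in the shared Group container.
func encodeRules(_ rules: [SenderRule]) throws -> Data {
    try JSONEncoder().encode(rules)
}

/// Extension side: decode the rules and return the action of the first matching rule, if any.
func matchRule(sender: String, in data: Data) throws -> String? {
    let rules = try JSONDecoder().decode([SenderRule].self, from: data)
    return rules.first { sender.hasPrefix($0.senderPrefix) }?.action
}

// Made-up prefixes, purely for demonstration.
let shared = try encodeRules([
    SenderRule(senderPrefix: "1069", action: "promotion"),
    SenderRule(senderPrefix: "9999", action: "transaction"),
])
let matched = try matchRule(sender: "1069012345", in: shared)
print(matched ?? "no rule matched")
```

When no rule matches, the method can fall through to the model-based classification shown earlier.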
Summary
I believe that through the above steps, everyone can develop their own SMS filtering APP.
The above steps are based on a fixed training model to match the logic, and the steps are:
- Obtain the SMS dataset.
- Use Create ML to train on the dataset and generate a CoreML model.
- Use the model in the project for judgment.
This method generates a model with fixed data. Each update of the model requires retraining and importing, followed by updating the APP. Is there a better way?
For example, can training and updating be done in the APP? Or can a combination of local rules, local models, and network models be used?
Assuming Solution One:
First, training and updating in the APP, the general idea is as follows:
To update the model, it is necessary to know the content of a piece of data and its classification. Therefore, if training the model in the APP, it is necessary to obtain the classification through another method. Otherwise, using the model to obtain the classification and then going back to train the model is not very meaningful. Therefore, obtaining data classification through custom rules and then using the data and its classification to update the model should be feasible.
Assuming Solution Two:
Then consider a more complete method, which is to use a combination of local rules, local models, and network models:
The logic is to first match using local rules. If no local rule matches, continue with the local model. If the local model also cannot decide, request the server, which has a continuously trained and updated model, to obtain the corresponding classification. Finally, with each APP update, the latest model from the server is bundled into the project.
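The fall-through order described for Solution Two can be sketched as a chain of stages, each of which either returns a verdict or passes. The `Verdict` enum, the `FilterStage` type, and the two stand-in stages are hypothetical; a real APP would back them with stored rules and the CoreML model. A `nil` result stands for "cannot decide locally", which in the Extension would correspond to returning none so that the system can defer to the configured server.

```swift
import Foundation

/// Verdict labels matching the trained categories.
enum Verdict: String {
    case allow, junk, promotion, transaction
}

/// A stage returns a verdict, or nil when it cannot decide and the next stage should try.
typealias FilterStage = (_ sender: String, _ body: String) -> Verdict?

/// Runs the stages in order and returns the first verdict; nil means "defer to the server".
func classify(sender: String, body: String, stages: [FilterStage]) -> Verdict? {
    for stage in stages {
        if let verdict = stage(sender, body) { return verdict }
    }
    return nil
}

// Stand-in stages for illustration only.
let localRules: FilterStage = { sender, _ in
    sender.hasPrefix("1069") ? .promotion : nil
}
let localModel: FilterStage = { _, body in
    body.lowercased().contains("verification code") ? .transaction : nil
}

let stages = [localRules, localModel]
print(classify(sender: "1069000", body: "Sale today", stages: stages) as Any)
print(classify(sender: "555", body: "Hello", stages: stages) as Any)
```

Keeping each stage behind the same signature makes it easy to reorder them or to drop the server step, which is the situation Solution Three considers.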
Assuming Solution Three:
Solution Two depends on a network model, that is, on the server having a continuously trained and updated model. What if that premise does not hold? If only local rules and local models are available, along with occasionally obtained updated datasets, is there a way to update the local model online?
Currently, the local model is directly added to the APP main Bundle. It can be considered to copy it to the shared Group of the APP and Extension during the first launch. Each time the APP is opened, check if the model has been updated. If there is an update, download and replace the model file in this directory. In the Extension, the model file in this directory can be accessed via URL for filtering.
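The copy-then-replace flow of Solution Three can be sketched with plain file operations. This is a sketch under stated assumptions: the function name `installModel` is hypothetical, temporary files stand in for the bundled model, the downloaded model, and the shared Group directory, and how an update is detected is left out. In the APP itself the destination would be a path inside the shared Group container, and the copying would happen in the main APP, since the Extension can only read from that location.

```swift
import Foundation

/// Copies `source` to `destination` when the destination is missing or `force` is true.
/// Returns true when a copy was made.
@discardableResult
func installModel(from source: URL, to destination: URL, force: Bool = false) throws -> Bool {
    let fm = FileManager.default
    let exists = fm.fileExists(atPath: destination.path)
    if exists && !force { return false }
    if exists { try fm.removeItem(at: destination) }
    try fm.copyItem(at: source, to: destination)
    return true
}

// Temporary files standing in for the bundle, a download, and the shared directory.
let base = FileManager.default.temporaryDirectory.appendingPathComponent(UUID().uuidString)
try FileManager.default.createDirectory(at: base, withIntermediateDirectories: true)
let bundled = base.appendingPathComponent("bundled.model")
let downloaded = base.appendingPathComponent("downloaded.model")
let sharedModel = base.appendingPathComponent("shared.model")
try Data("v1".utf8).write(to: bundled)
try Data("v2".utf8).write(to: downloaded)

try installModel(from: bundled, to: sharedModel)                  // first launch: seed from the bundle
try installModel(from: bundled, to: sharedModel)                  // later launches: nothing to do
try installModel(from: downloaded, to: sharedModel, force: true)  // update found: replace the file
let installed = try Data(contentsOf: sharedModel)
print(String(decoding: installed, as: UTF8.self))
```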
The flowcharts for several solutions are as follows:
Summary as follows: