All Freshdesk accounts have a spam reputation, which is based on their spam score. Spam scores are stored in our database in the 'conversion_metrics' table. The spam reputation of an account is shown on the account details page in the FreshopsAdmin console.


What is a spam score?


A spam score is a value maintained at the account level. It ranges from -2 to 5. A score between -2 and 3 is considered okay and has no impact on the account. A score of 4 (probable spam) or 5 (spam) has the consequences described in the sections below.
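
The thresholds above can be expressed as a small helper. This is a minimal sketch, not the production code: the method name 'spam_reputation' and the returned labels are assumptions made for illustration.

```ruby
# Illustrative only: names and labels are hypothetical, not taken from the codebase.
SPAM_SCORE_RANGE = (-2..5)

# Maps a spam score to the reputation described in this article.
def spam_reputation(score)
  unless SPAM_SCORE_RANGE.cover?(score)
    raise ArgumentError, "spam score out of range: #{score}"
  end

  case score
  when 5 then :spam
  when 4 then :probable_spam
  else :ok # scores from -2 to 3 have no impact
  end
end
```

For example, 'spam_reputation(3)' returns ':ok', while 'spam_reputation(4)' returns ':probable_spam'.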


When is a spam score set for an account?

For an account, the spam score will be determined and saved during the following events:

  • Account signup (one-time activity)
  • On publish of an article
  • Forums


Account signup


When a user signs up for a Freshdesk account, the spam score is calculated from the domain of the signup email address, the IP address, account domain, account name, city, phone, and referrer. If any of these checks marks the account as spam, we set the spam score directly to 5.

We also maintain a list of restricted email domains. If someone creates a Freshdesk account with an email address whose domain is on this list, we set the spam score to 4.

Both of these checks run when the account signup happens.
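
The two signup rules can be sketched as follows. This is a hedged illustration rather than the actual signup code: the method name, the shape of 'check_results', the sample restricted domain, the default score of 0, and the choice to let a score of 5 take precedence over 4 are all assumptions.

```ruby
# Illustrative only: the real restricted-domain list is not shown in this article.
RESTRICTED_EMAIL_DOMAINS = %w[restricted.example].freeze

# check_results: hypothetical hash of per-field verdicts, e.g.
#   { ip_address: false, account_name: true }
# where true means that check flagged the signup as spam.
def signup_spam_score(email:, check_results:, default_score: 0)
  # Any single failing check marks the account as spam outright.
  return 5 if check_results.values.any?

  # Otherwise, a restricted email domain marks it as probable spam.
  domain = email.split("@").last.to_s.downcase
  return 4 if RESTRICTED_EMAIL_DOMAINS.include?(domain)

  default_score # assumption: the source does not state the score for clean signups
end
```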

On Publish of an Article


When an article is published, its SEO data is published as well, and the article becomes discoverable through internet search. We have to make sure that search engines return the article only when a related or relevant search term is used.


For example, if an article contains the phrase 'gmail support' in any of its paragraphs or in its title, then this article will be returned by the search engine when a user searches for 'gmail support' on the web. They might be trying to search for a Gmail article, but the search engine returns this Freshdesk article as well. Hence, while publishing an article, we always check the article title and description.


Along with the title and description, we also check the account status (active or not) and check the SEO data against a regular expression. The title is matched as soon as the article is published. The description can be very long and take some time to match against the expression, so it is handled asynchronously by a Sidekiq job. When the job runs, it matches the article description against the regular expression and returns the spam probability. If the article is identified as spam, we set the spam score to 4.
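
The split between the inline title check and the deferred description check can be sketched as below. This is only an illustration under stated assumptions: a plain in-memory array stands in for the Sidekiq queue, 'SPAM_MATCHER' is a simplified placeholder pattern, all names are hypothetical, and never lowering an existing score of 5 is an assumption.

```ruby
# Stand-in for a background queue; production uses a Sidekiq job instead.
JOB_QUEUE = []

Account = Struct.new(:id, :spam_score)
Article = Struct.new(:id, :title, :description)

# Simplified placeholder pattern, not the real expression.
SPAM_MATCHER = ->(text) { text.to_s.match?(/gmail.*support/i) }

def mark_probable_spam(account)
  # Assumption: an account already at 5 is left untouched.
  account.spam_score = 4 if account.spam_score < 4
end

def on_publish(account, article)
  # The title is short, so it is matched inline at publish time.
  mark_probable_spam(account) if SPAM_MATCHER.call(article.title)

  # The description may be long, so its check is deferred to the queue.
  JOB_QUEUE << lambda do
    mark_probable_spam(account) if SPAM_MATCHER.call(article.description)
  end
end

# Runs every queued job, as a background worker eventually would.
def drain_queue
  JOB_QUEUE.shift.call until JOB_QUEUE.empty?
end
```

With this sketch, an article whose description alone matches leaves the score unchanged right after 'on_publish' and raises it to 4 only once 'drain_queue' has run, which mirrors the delay introduced by the background job.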


The regular expressions used to detect spam in the title or description of an article are as follows:


'ARTICLE_SPAM_REGEX': '(gmail|kindle|face.?book|apple|microsoft|google|aol|hotmail|aim|mozilla|quickbooks|norton).*(support|phone|number)'

'PHONE_NUMBER_SPAM_REGEX': '(1|I)..?8(1|I)8..?85(0|O)..?78(0|O)6|(1|I)..?877..?345..?3847|(1|I)..?877..?37(0|O)..?3(1|I)89|(1|I)..?8(0|O)(0|O)..?79(0|O)..?9(1|I)86|(1|I)..?8(0|O)(0|O)..?436..?(0|O)259|(1|I)..?8(0|O)(0|O)..?969..?(1|I)649|(1|I)..?844..?922..?7448|(1|I)..?8(0|O)(0|O)..?75(0|O)..?6584|(1|I)..?8(0|O)(0|O)..?6(0|O)4..?(1|I)88(0|O)|(1|I)..?877..?242..?364(1|I)|(1|I)..?844..?782..?8(0|O)96|(1|I)..?844..?895..?(0|O)4(1|I)(0|O)|(1|I)..?844..?2(0|O)4..?9294|(1|I)..?8(0|O)(0|O)..?2(1|I)3..?2(1|I)7(1|I)|(1|I)..?855..?58(0|O)..?(1|I)8(0|O)8|(1|I)..?877..?424..?6647|(1|I)..?877..?37(0|O)..?3(1|I)89|(1|I)..?844..?83(0|O)..?8555|(1|I)..?8(0|O)(0|O)..?6(1|I)(1|I)..?5(0|O)(0|O)7|(1|I)..?8(0|O)(0|O)..?584..?46(1|I)(1|I)|(1|I)..?844..?389..?5696|(1|I)..?844..?483..?(0|O)332|(1|I)..?844..?78(0|O)..?675(1|I)|(1|I)..?8(0|O)(0|O)..?596..?(1|I)(0|O)65|(1|I)..?888..?573..?5222|(1|I)..?855..?4(0|O)9..?(1|I)555|(1|I)..?844..?436..?(1|I)893|(1|I)..?8(0|O)(0|O)..?89(1|I)..?4(0|O)(0|O)8|(1|I)..?855..?662..?4436'


'CONTENT_SPAM_CHAR_REGEX': 'ℴ|ℕ|ℓ|ℳ|ℱ|ℋ|ℝ|ⅈ|ℯ|ℂ|○|ℬ|ℂ|ℙ|ℹ|ℒ|ⅉ|ℐ'
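
As a rough illustration of how two of these patterns behave, the sketch below compiles 'ARTICLE_SPAM_REGEX' and 'CONTENT_SPAM_CHAR_REGEX' and tests a piece of text against them. The helper name 'spam_text?' is hypothetical, and case-insensitive matching is an assumption; the article does not say which flags the production code uses.

```ruby
# Patterns copied from the list above. IGNORECASE is an assumption.
ARTICLE_SPAM_REGEX = Regexp.new(
  '(gmail|kindle|face.?book|apple|microsoft|google|aol|hotmail|aim|mozilla|quickbooks|norton).*(support|phone|number)',
  Regexp::IGNORECASE
)
CONTENT_SPAM_CHAR_REGEX = Regexp.new('ℴ|ℕ|ℓ|ℳ|ℱ|ℋ|ℝ|ⅈ|ℯ|ℂ|○|ℬ|ℂ|ℙ|ℹ|ℒ|ⅉ|ℐ')

# Hypothetical helper: true if the text matches either pattern.
def spam_text?(text)
  text = text.to_s
  ARTICLE_SPAM_REGEX.match?(text) || CONTENT_SPAM_CHAR_REGEX.match?(text)
end

puts spam_text?("Gmail support phone number")  # brand name followed by "support"
puts spam_text?("How to reset your password")  # no brand name, no look-alike characters
puts spam_text?("Claim your support ticket")   # "aim" inside "Claim" also matches
```

Because the brand names are not anchored to word boundaries, a harmless title such as "Claim your support ticket" matches through the 'aim' alternative. This is worth keeping in mind when investigating an account that looks genuine.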


Forums


Forums follow a similar approach to articles, but with a different set of regular expressions. We also use the spam checker 'akismetor' to determine the spam score. Unlike articles, all spam validations are handled asynchronously by a Sidekiq job. If the forum post is identified as spam, the spam score is set to 4.


What are the impacts of a high spam score?


When an account's spam score is set as 4 or 5, the following are the consequences:

  • Block ticket creation via the Support portal
  • Block ticket creation via the Feedback widget
  • Add no-index, no-follow to the customer portal
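
The consequences listed above could be gated on the score roughly as follows. This is a sketch under assumptions: the channel symbols and method names are hypothetical, and channels not named in the list are assumed to be unaffected.

```ruby
HIGH_SPAM_SCORE = 4

# Hypothetical channel names for the two blocked entry points.
BLOCKED_CHANNELS = %i[support_portal feedback_widget].freeze

def high_spam_score?(spam_score)
  spam_score >= HIGH_SPAM_SCORE
end

# Ticket creation is refused only for the blocked channels on flagged accounts.
def ticket_creation_allowed?(spam_score, channel)
  !(high_spam_score?(spam_score) && BLOCKED_CHANNELS.include?(channel))
end

# Returns the robots meta tag for the customer portal, or nil when none is needed.
def portal_robots_meta(spam_score)
  return nil unless high_spam_score?(spam_score)

  '<meta name="robots" content="noindex, nofollow">'
end
```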


Do we get notified when the spam score is increased?


When the spam score is increased for an account, an email is sent to mail-alerts@freshdesk.com, noc@freshdesk.com, and helpdesk@noc-alerts.freshservice.com. The email subject has the following format:

"Detected suspicious solution spam : Account id : #{account_id}, Account state : #{account.subscription.state}, Domain : #{account.full_domain}"


Additionally, the following information is logged:


:::::: Kbase spam content encountered - increased spam reputation for article ##{self.id} in account ##{self.account.id}  :::::::


So we can either check with the NOC team to find out when an account was spam blacklisted, or get the logs from Haystack using the following search query:


"Kbase spam content encountered" AND "account_id"


Here is an example:



We also get the article ID in these logs.
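
Once a matching log line has been retrieved, the article and account IDs can be pulled out with a small parser. This is a minimal sketch based on the log format quoted above. The method name is hypothetical, and it assumes both IDs are rendered as plain integers.

```ruby
# Matches the log message quoted above; assumes numeric IDs.
KBASE_SPAM_LOG_PATTERN =
  /Kbase spam content encountered - increased spam reputation for article #(\d+) in account #(\d+)/

# Returns { article_id:, account_id: } or nil if the line does not match.
def parse_kbase_spam_log(line)
  match = KBASE_SPAM_LOG_PATTERN.match(line.to_s)
  return nil unless match

  { article_id: match[1].to_i, account_id: match[2].to_i }
end

# Sample line with made-up IDs, for illustration only.
sample = ":::::: Kbase spam content encountered - increased spam reputation " \
         "for article #42 in account #1001  :::::::"
p parse_kbase_spam_log(sample)
```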

How to reduce the spam score?


The NOC team has the privileges to reduce the spam score and whitelist the account; leads, mentors, and L2s have the same access. If the spam score was increased because an article was published, reducing the score is not a permanent solution: if the article is published again, the score will increase again. We have to get the article ID from the logs, identify the article that caused the account to be spam blacklisted, and check it for generic or probable spam content (like 'gmail support', 'facebook support', etc.).

The customer then has to remove the phrase and republish the article so that the score is not increased every time the article is published.

However, there are customers with genuine reasons to use such phrases in their articles. In those cases, we can whitelist the entire knowledge base for an account (not recommended) after consulting with the product team. Once it is whitelisted, no further spam checks will be run for the articles in that account. Hence this call has to be taken by the product team.



If an account seems to be genuine and needs to be whitelisted for the Knowledge Base spam check, please enable the launch party 'kbase_spam_whitelist' from DevOps.


Useful references


Freshdesk also shares the spam detection service used by the Email team. Here are some references:

https://www.freshworks.com/company/practices-on-tackling-spam-blog/
https://www.ehawk.net/ (third party used for spam check)