How we do it




Our Methodology


Gator Analytics uses many different detection methods to determine whether a user is valid. Some of these methods are algorithmic; others are learned over time by detecting patterns in the data. To protect our intellectual property and to prevent reverse-engineering, not all methods are listed here.

Scoring

Some methods are fairly conclusive about whether a user is valid. Other methods produce a likelihood of validity, which we show as a score from 0 to 1000. The lower the score, the more likely the user is invalid. For example, a user proxying in through a data center's IP address is highly unlikely to be a valid user, so that method is close to conclusive. On the other hand, certain countries originate the bulk of invalid traffic but also have real users, so country of origin can only lower the likelihood of validity.

Scores will generally fall between zero and about 500. A score below 100 is considered 'invalid', meaning the user is almost certainly not real. The upper end of the scoring range is reserved for future whitelisting methods.
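The thresholds above can be sketched in a few lines of Python. This is only an illustration of the published numbers (a 0 to 1000 range, with scores below 100 treated as invalid); the names SCORE_MIN, SCORE_MAX, INVALID_BELOW and is_invalid are hypothetical and are not part of any Gator Analytics API.

```python
# Illustrative constants taken from the text; the names are hypothetical.
SCORE_MIN = 0
SCORE_MAX = 1000
INVALID_BELOW = 100  # a score below 100 is considered 'invalid'


def is_invalid(score: int) -> bool:
    """Return True when a score falls in the 'invalid' band (below 100)."""
    if not SCORE_MIN <= score <= SCORE_MAX:
        raise ValueError(f"score must be between {SCORE_MIN} and {SCORE_MAX}")
    return score < INVALID_BELOW


if __name__ == "__main__":
    for score in (0, 99, 100, 500):
        print(score, "invalid" if is_invalid(score) else "not flagged as invalid")
```

A score of exactly 100 is not flagged here, because the text defines invalid as below 100.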


Bots


A large and growing percentage of web traffic is generated by bots, spiders, extensions, headless browsers, toolbars and other means (collectively called bots). Bots have become increasingly sophisticated in how they disguise themselves, which requires continuously evolving detection methods.

Here are some of the methods we employ:

Method | Description
Block List | We check every IP address against our database of known infected machines. This detects machines that have been hijacked as spambots, as well as machines infected with viruses that generate large amounts of automated traffic and clicks. The database is maintained in real time so that emerging sources are detected.
Data Center Origin | We maintain a database of data center IP address ranges, since many bot networks use data centers to create or proxy traffic. A session from within, for example, an Amazon Web Services (AWS) data center address block is unlikely to be valid.
Public Web Proxies | Public web proxies are used to proxy traffic in much the same way as data centers. We maintain a real-time database of public web proxies in order to score sessions that come from them.
Tor | Tor has legitimate uses, but it hides the origin of the user, so it can be used to generate random sessions.
Spoofed User Agents | Bots often rotate their user agents in order to appear to be more than one device and to generate realistic-looking traffic. We have developed technology that matches the user agent against the browser's capabilities and detects sessions that have altered their user agent.
Invalid Searches | Bots often create fake referrer headers so that a session appears to come from a search engine. In many cases, these headers differ from the structure of real search engine referrers.
Collusion | This method detects when the same set of IP addresses coincides with the same set of publisher sites.
Other Proprietary Methods | We have developed several other methods for detecting fraudulent sessions, and this continues to be a primary focus of our research efforts.
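As an illustration of the Data Center Origin method, the sketch below tests whether an address falls inside a list of address ranges using Python's standard ipaddress module. It is a minimal sketch and not our production implementation: the DATA_CENTER_RANGES list and the in_data_center function are hypothetical, and the ranges shown are documentation-reserved blocks standing in for real data center ranges.

```python
import ipaddress

# Hypothetical stand-in for a data center range database.
# These are documentation-reserved blocks, not real data center ranges.
DATA_CENTER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def in_data_center(ip: str) -> bool:
    """Return True if the address lies inside any listed range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in DATA_CENTER_RANGES)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "203.0.113.5"):
        print(ip, in_data_center(ip))
```

A linear scan is fine for a short list. A database with many ranges would more likely use a sorted or tree-based lookup.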

Hidden Users


Hidden users come from sessions in which no page is ever visible on the screen. This is often, but not necessarily, due to bots: many hidden sessions are generated by search engines preloading pages in the background to improve performance. A page may also sit in a tab that is never shown, or be positioned offscreen. Because the page is never seen, hidden sessions score zero.

Primary reasons for hidden sessions:

Reason | Description
Preloading | Search engines will preload pages in the background while a user types a search query. The search engine attempts to predict which link or links the user will click and loads the pages behind those links. This improves the performance of web browsing, but many of the preloaded pages are never made visible and should not be counted.
Browser Window Hidden | This occurs when a browser window is behind another window.
Background Browser Tabs | A browser tab can be launched in the background and load pages. These pages are never visible unless the user opens the tab.
Bots | Even when a bot session is not detected by other methods, it will often never be visible and will therefore be scored as invalid.

Our technology tracks whether a session is ever viewed and updates its visibility accordingly. For example, if a page is hidden during a preload, it is initially recorded as hidden with a score of zero. If the user then clicks the link to view the preloaded page, the change is detected and the session is updated with a new score.
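The preload example can be modeled as a small state change. The sketch below is a simplified illustration under stated assumptions: the Session class and the record_preload and mark_viewed functions are hypothetical names, and the new score is passed in by the caller because the scoring itself is not described here.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical session record; starts hidden with a score of zero."""
    session_id: str
    visible: bool = False
    score: int = 0


def record_preload(session_id: str) -> Session:
    """Record a preloaded page: hidden, with a score of zero."""
    return Session(session_id=session_id)


def mark_viewed(session: Session, new_score: int) -> Session:
    """Mark the session as viewed and store its new score.

    new_score is assumed to come from the other detection methods.
    """
    session.visible = True
    session.score = new_score
    return session


if __name__ == "__main__":
    session = record_preload("example-session")
    print(session)
    print(mark_viewed(session, new_score=320))
```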

Every session is scored, and all reports have options to include or exclude users based on score. For example, you may want to view campaigns where the score is less than 100, which shows you the campaigns that are referring the lowest-quality users.
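A score filter of this kind might look like the following sketch, which counts sessions scoring below a threshold for each campaign. The data layout (pairs of campaign name and score), the sample values, and the low_score_campaigns function are assumptions made for illustration; they do not describe the actual report options.

```python
from collections import Counter

# Hypothetical sample data: (campaign name, session score) pairs.
SAMPLE_SESSIONS = [
    ("campaign-a", 0),
    ("campaign-a", 40),
    ("campaign-b", 250),
    ("campaign-c", 99),
    ("campaign-b", 100),
]


def low_score_campaigns(sessions, threshold=100):
    """Count sessions scoring below the threshold, grouped by campaign.

    Returns (campaign, count) pairs, highest count first.
    """
    counts = Counter(
        campaign for campaign, score in sessions if score < threshold
    )
    return counts.most_common()


if __name__ == "__main__":
    print(low_score_campaigns(SAMPLE_SESSIONS))
```

With the sample data, campaign-b does not appear in the result because neither of its sessions scores below 100.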


Contact Us

Gator.io
65 Enterprise
Aliso Viejo, CA 92656
Sales: support@gator.io
Support: support@gator.io
© 2016 Gator.io