Identification and Removal of Bots/Crawlers
The following entries are identified as bots or known search engine crawlers and should be removed from the analysis of malicious activity:
| Category | Entry to Remove | Reason |
| --- | --- | --- |
| User-Agent | l9tcpid/v1.1.0 | Explicitly identified as a known bot in the LLM Capture Analysis. (2 records) |
| IP Reputation | crawl-192-178-6-97.googlebot.com | Googlebot, a search engine crawler. (1 record) |
| IP Reputation | crawl-192-178-6-99.googlebot.com | Googlebot, a search engine crawler. (1 record) |
| IP Reputation | crawl-66-249-64-105.googlebot.com | Googlebot, a search engine crawler. (1 record) |
| IP Reputation | crawler190.deepfield.net | Clearly labeled as a crawler. (1 record) |
| IP Reputation | fwdproxy-ftw-022.fbsv.net | A Facebook crawler/proxy. (1 record) |
| IP Reputation | fwdproxy-pnb-112.fbsv.net | A Facebook crawler/proxy. (1 record) |
The total count of identified bots/crawlers is 9 records. However, since the LLM Capture Analysis attributes only a single analyzed record to the `l9tcpid/v1.1.0` bot (rather than the 2 records listed under User-Agents), and the IP Reputation data has separate entries, it is safer to consider only the specific IP Reputation entries and the bot mention from the LLM analysis.
Conservative Removal: I will remove the specific Google/Deepfield/Facebook crawler records from the IP Reputation Hostnames and the 2 records associated with the `l9tcpid/v1.1.0` User-Agent. This is a minimum of 8 records to exclude from the dataset.
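A minimal filtering sketch in Python is shown below. The record field names (`user_agent`, `hostname`) and the list-of-dicts shape are assumptions about the export format, not details taken from the honeypot dataset itself.

```python
# Sketch: exclude known bot/crawler records before analysing the malicious activity.
# Field names (user_agent, hostname) are assumed; adapt them to the actual export schema.

BOT_USER_AGENTS = {"l9tcpid/v1.1.0"}

# Reverse-DNS suffixes for the crawler infrastructure listed in the table above.
CRAWLER_HOSTNAME_SUFFIXES = (
    ".googlebot.com",  # Googlebot crawl hosts
    ".deepfield.net",  # Deepfield crawler
    ".fbsv.net",       # Facebook forward proxies
)

def is_bot_or_crawler(record: dict) -> bool:
    """Return True if the record matches a known bot UA or crawler hostname."""
    if record.get("user_agent") in BOT_USER_AGENTS:
        return True
    hostname = (record.get("hostname") or "").lower()
    return hostname.endswith(CRAWLER_HOSTNAME_SUFFIXES)

def remove_bots(records: list[dict]) -> list[dict]:
    """Keep only records not attributed to bots or crawlers."""
    return [r for r in records if not is_bot_or_crawler(r)]
```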
Analysis of Remaining Honeypot Data (Excluding Bots/Crawlers)
The vast majority of the recorded events appear to be malicious or targeted activity, primarily classified as scanners and bots.
Threat Summary
- Threat Type: The primary activity is Scanning, indicating automated attempts to discover and probe vulnerabilities. Bots account for a smaller share of the analyzed captures (a tallying sketch follows this list).
- Threat Level: Almost all analyzed captures are rated Medium threat, with a smaller portion rated High.
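The threat-type and threat-level shares can be reproduced with a simple tally once the bot/crawler records are removed. The sketch below assumes the same list-of-dicts shape and hypothetical `threat_type`/`threat_level` fields.

```python
from collections import Counter

# Sketch: tally threat types and levels for the filtered captures.
# "threat_type" and "threat_level" are assumed field names.
def summarize_threats(records: list[dict]) -> dict:
    total = len(records) or 1  # guard against an empty record set
    types = Counter(r.get("threat_type", "Other") for r in records)
    levels = Counter(r.get("threat_level", "Unknown") for r in records)
    return {
        "type_pct": {k: round(100 * v / total, 1) for k, v in types.items()},
        "level_pct": {k: round(100 * v / total, 1) for k, v in levels.items()},
    }
```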
Geographic Distribution (Pre-Removal)
The geographic data reflects the source of the connections and not necessarily the operator’s location.
| Continent | Count (Records with Location Data) |
| --- | --- |
| Europe | |
| Asia | |
| North America | |
| Total | |
- Europe is the dominant source, accounting for over four-fifths of the geographically identifiable malicious traffic.
- The top three contributing countries are all in Europe: the Netherlands (NL), Portugal (PT), and Bulgaria (BG). A sketch for rolling these per-country counts up by continent follows this list.
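The sketch below aggregates country codes into continent shares. The country-to-continent mapping covers only the countries named in this report, and the input shape (a list of ISO country codes) is an assumption about the dataset.

```python
from collections import Counter

# Sketch: roll per-record country codes up into continent shares.
# The mapping below covers only the countries named in this report.
COUNTRY_TO_CONTINENT = {"NL": "Europe", "PT": "Europe", "BG": "Europe"}

def continent_shares(country_codes: list[str]) -> dict[str, float]:
    """Return each continent's share (in %) of records that carry location data."""
    continents = Counter(
        COUNTRY_TO_CONTINENT.get(code, "Other") for code in country_codes
    )
    total = sum(continents.values()) or 1
    return {c: round(100 * n / total, 1) for c, n in continents.items()}
```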
Targeted Activity and Methods
The LLM Capture Analysis reveals the specific attack patterns, almost all of which indicate automated scanning:
- WordPress Scanning: One analysis highlights an IP systematically requesting `wlwmanifest.xml` files, a strong indicator of an automated WordPress installation scanner.
- Configuration File Probing: Multiple captured URLs show probing for common configuration files like `.env`, `.env.development`, and `.env.example` (e.g., `http://68.106.110.168/.env`), which could expose sensitive information.
- Vulnerability Scanning: Several analyses confirm probing for known web application vulnerabilities and targeted scans for vulnerabilities related to login functionality and potentially vulnerable JavaScript files.
- Bot Activity: The analysis confirms that one specific IP among the LLM-analyzed records is using the known bot User-Agent `l9tcpid/v1.1.0` (a pattern-classification sketch follows this list).
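The scanning patterns above can be detected mechanically. The sketch below classifies captured URLs with a few illustrative regular expressions; the labels and patterns are assumptions, not an exhaustive ruleset.

```python
import re

# Sketch: classify captured request URLs into the scanning patterns described above.
# The patterns and labels are illustrative, not an exhaustive ruleset.
URL_PATTERNS = [
    (re.compile(r"wlwmanifest\.xml$"), "wordpress-scan"),
    (re.compile(r"/\.env(\.\w+)?$"), "env-file-probe"),
    (re.compile(r"/wp-login\.php$|/login$", re.IGNORECASE), "login-probe"),
]

def classify_url(url: str) -> str:
    """Return a coarse label for a captured URL, or 'other' if nothing matches."""
    for pattern, label in URL_PATTERNS:
        if pattern.search(url):
            return label
    return "other"

# Example: classify_url("http://68.106.110.168/.env") -> "env-file-probe"
```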
Key User-Agents Associated with Malicious Activity
While many records are categorized as “Other,” the listed user agents not explicitly identified as bots or crawlers suggest the following activity:
- Browser-Mimicking Scanners: The most frequent non-bot user agents are those that mimic standard, slightly older browser versions (e.g., Chrome/60 on Windows NT 10.0, or Edge/90 on Windows NT 10.0). These are often used by scanning tools to blend in with legitimate traffic.
- Mobile/Legacy Mimicry: Agents mimicking older Android, Firefox Mobile, or even legacy IE/Opera are present, which could be an attempt to test for vulnerabilities specific to those platforms or to bypass simple bot detection (a heuristic for flagging such agents is sketched below).
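The sketch below flags user agents that advertise well-outdated Chrome or Edge builds, the trait described above for browser-mimicking scanners. The version thresholds are arbitrary assumptions and would need tuning against current release numbers.

```python
import re

# Sketch heuristic: flag user agents that claim well-outdated Chrome or Edge builds,
# a common trait of scanners mimicking real browsers. Thresholds are assumptions.
CHROME_VERSION = re.compile(r"Chrome/(\d+)")
EDGE_VERSION = re.compile(r"Edge?/(\d+)")  # matches both "Edge/90" and "Edg/103"

def looks_like_mimicking_scanner(user_agent: str,
                                 min_chrome: int = 100,
                                 min_edge: int = 100) -> bool:
    """Return True if the UA advertises a Chrome/Edge major version below the thresholds."""
    chrome = CHROME_VERSION.search(user_agent)
    if chrome and int(chrome.group(1)) < min_chrome:
        return True
    edge = EDGE_VERSION.search(user_agent)
    return bool(edge and int(edge.group(1)) < min_edge)
```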
