Identification and Removal of Bots/Crawlers
The following entries are identified as bots or known search engine crawlers and should be removed from the analysis of malicious activity:
| Category | Entry to Remove | Reason |
| --- | --- | --- |
| User-Agent | l9tcpid/v1.1.0 | Explicitly identified as a known bot in the LLM Capture Analysis. (2 records) |
| IP Reputation | crawl-192-178-6-97.googlebot.com | Googlebot, a search engine crawler. (1 record) |
| IP Reputation | crawl-192-178-6-99.googlebot.com | Googlebot, a search engine crawler. (1 record) |
| IP Reputation | crawl-66-249-64-105.googlebot.com | Googlebot, a search engine crawler. (1 record) |
| IP Reputation | crawler190.deepfield.net | Clearly labeled as a crawler. (1 record) |
| IP Reputation | fwdproxy-ftw-022.fbsv.net | A Facebook crawler/proxy. (1 record) |
| IP Reputation | fwdproxy-pnb-112.fbsv.net | A Facebook crawler/proxy. (1 record) |
The total count of identified bots/crawlers is 9 records. However, since the LLM Capture Analysis attributes only a single analyzed record to the `l9tcpid/v1.1.0` bot (rather than the 2 records listed under User-Agents), and the IP Reputation data has separate entries, it is safer to consider only the specific IP Reputation entries and the bot mention from the LLM analysis.
Conservative Removal: I will remove the specific Google/Deepfield/Facebook crawler records from the IP Reputation Hostnames and the 2 records associated with the `l9tcpid/v1.1.0` User-Agent. This is a minimum of 8 records to exclude from the dataset.
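A minimal filtering sketch in Python is shown below. The record field names (`user_agent`, `hostname`) and the list-of-dicts shape are assumptions about the export format, not details taken from the honeypot dataset itself.

```python
# Sketch: exclude known bot/crawler records before analysing the malicious activity.
# Field names (user_agent, hostname) are assumed; adapt them to the actual export schema.

BOT_USER_AGENTS = {"l9tcpid/v1.1.0"}

# Reverse-DNS suffixes for the crawler infrastructure listed in the table above.
CRAWLER_HOSTNAME_SUFFIXES = (
    ".googlebot.com",  # Googlebot crawl hosts
    ".deepfield.net",  # Deepfield crawler
    ".fbsv.net",       # Facebook forward proxies
)

def is_bot_or_crawler(record: dict) -> bool:
    """Return True if the record matches a known bot UA or crawler hostname."""
    if record.get("user_agent") in BOT_USER_AGENTS:
        return True
    hostname = (record.get("hostname") or "").lower()
    return hostname.endswith(CRAWLER_HOSTNAME_SUFFIXES)

def remove_bots(records: list[dict]) -> list[dict]:
    """Keep only records not attributed to bots or crawlers."""
    return [r for r in records if not is_bot_or_crawler(r)]
```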
Analysis of Remaining Honeypot Data (Excluding Bots/Crawlers)
The vast majority of the recorded events appear to be malicious or targeted activity, primarily classified as scanners and bots.
Threat Summary
- Threat Type: The primary activity is Scanning, indicating automated attempts to discover and probe vulnerabilities. Bots account for a smaller share of the analyzed captures (a tallying sketch follows this list).
- Threat Level: Almost all analyzed captures are rated Medium threat, with a smaller portion rated High.
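The threat-type and threat-level shares can be reproduced with a simple tally once the bot/crawler records are removed. The sketch below assumes the same list-of-dicts shape and hypothetical `threat_type`/`threat_level` fields.

```python
from collections import Counter

# Sketch: tally threat types and levels for the filtered captures.
# "threat_type" and "threat_level" are assumed field names.
def summarize_threats(records: list[dict]) -> dict:
    total = len(records) or 1  # guard against an empty record set
    types = Counter(r.get("threat_type", "Other") for r in records)
    levels = Counter(r.get("threat_level", "Unknown") for r in records)
    return {
        "type_pct": {k: round(100 * v / total, 1) for k, v in types.items()},
        "level_pct": {k: round(100 * v / total, 1) for k, v in levels.items()},
    }
```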
Geographic Distribution (Pre-Removal)
The geographic data reflects the source of the connections and not necessarily the operator’s location.
| Continent | Count (Records with Location Data) |
| --- | --- |
| Europe | |
| Asia | |
| North America | |
| Total | |
- Europe is the dominant source, accounting for over four-fifths of the geographically identifiable malicious traffic.
- The top three contributing countries are all in Europe: the Netherlands (NL), Portugal (PT), and Bulgaria (BG). A sketch for rolling these per-country counts up by continent follows this list.
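The sketch below aggregates country codes into continent shares. The country-to-continent mapping covers only the countries named in this report, and the input shape (a list of ISO country codes) is an assumption about the dataset.

```python
from collections import Counter

# Sketch: roll per-record country codes up into continent shares.
# The mapping below covers only the countries named in this report.
COUNTRY_TO_CONTINENT = {"NL": "Europe", "PT": "Europe", "BG": "Europe"}

def continent_shares(country_codes: list[str]) -> dict[str, float]:
    """Return each continent's share (in %) of records that carry location data."""
    continents = Counter(
        COUNTRY_TO_CONTINENT.get(code, "Other") for code in country_codes
    )
    total = sum(continents.values()) or 1
    return {c: round(100 * n / total, 1) for c, n in continents.items()}
```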
Targeted Activity and Methods
The LLM Capture Analysis reveals the specific attack patterns, almost all of which indicate automated scanning:
- WordPress Scanning: One analysis highlights an IP systematically requesting `wlwmanifest.xml` files, a strong indicator of an automated WordPress installation scanner.
- Configuration File Probing: Multiple captured URLs show probing for common configuration files like `.env`, `.env.development`, and `.env.example` (e.g., `http://68.106.110.168/.env`), which could expose sensitive information.
- Vulnerability Scanning: Several analyses confirm probing for known web application vulnerabilities and targeted scans for vulnerabilities related to login functionality and potentially vulnerable JavaScript files.
- Bot Activity: The analysis confirms that one specific IP among the LLM-analyzed records is using the known bot User-Agent `l9tcpid/v1.1.0` (a pattern-classification sketch follows this list).
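The scanning patterns above can be detected mechanically. The sketch below classifies captured URLs with a few illustrative regular expressions; the labels and patterns are assumptions, not an exhaustive ruleset.

```python
import re

# Sketch: classify captured request URLs into the scanning patterns described above.
# The patterns and labels are illustrative, not an exhaustive ruleset.
URL_PATTERNS = [
    (re.compile(r"wlwmanifest\.xml$"), "wordpress-scan"),
    (re.compile(r"/\.env(\.\w+)?$"), "env-file-probe"),
    (re.compile(r"/wp-login\.php$|/login$", re.IGNORECASE), "login-probe"),
]

def classify_url(url: str) -> str:
    """Return a coarse label for a captured URL, or 'other' if nothing matches."""
    for pattern, label in URL_PATTERNS:
        if pattern.search(url):
            return label
    return "other"

# Example: classify_url("http://68.106.110.168/.env") -> "env-file-probe"
```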
Key User-Agents Associated with Malicious Activity
While many records are categorized as “Other,” the listed user agents not explicitly identified as bots or crawlers suggest the following activity:
- Browser-Mimicking Scanners: The most frequent non-bot user agents are those that mimic standard, slightly older browser versions (e.g., Chrome/60 on Windows NT 10.0, or Edge/90 on Windows NT 10.0). These are often used by scanning tools to blend in with legitimate traffic.
- Mobile/Legacy Mimicry: Agents mimicking older Android, Firefox Mobile, or even legacy IE/Opera are present, which could be an attempt to test for vulnerabilities specific to those platforms or to bypass simple bot detection (a heuristic for flagging such agents is sketched below).
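The sketch below flags user agents that advertise well-outdated Chrome or Edge builds, the trait described above for browser-mimicking scanners. The version thresholds are arbitrary assumptions and would need tuning against current release numbers.

```python
import re

# Sketch heuristic: flag user agents that claim well-outdated Chrome or Edge builds,
# a common trait of scanners mimicking real browsers. Thresholds are assumptions.
CHROME_VERSION = re.compile(r"Chrome/(\d+)")
EDGE_VERSION = re.compile(r"Edge?/(\d+)")  # matches both "Edge/90" and "Edg/103"

def looks_like_mimicking_scanner(user_agent: str,
                                 min_chrome: int = 100,
                                 min_edge: int = 100) -> bool:
    """Return True if the UA advertises a Chrome/Edge major version below the thresholds."""
    chrome = CHROME_VERSION.search(user_agent)
    if chrome and int(chrome.group(1)) < min_chrome:
        return True
    edge = EDGE_VERSION.search(user_agent)
    return bool(edge and int(edge.group(1)) < min_edge)
```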
