DefPloreX - MACHINE-LEARNING TOOLKIT FOR LARGE-SCALE E-CRIME FORENSICS
At BlackHat USA 2017's Arsenal, we showcased DefPloreX, an Elasticsearch-based toolkit that our team uses for large-scale processing, analysis, and visualization of e-crime records. In particular, we have successfully applied DefPloreX to the analysis of deface records (e.g., from web compromises), hence its name: Def(acement) eXPlorer (DefPloreX).
DefPloreX automatically organizes deface records by the content and format of their web pages (what we call "template pages"). This lets an analyst easily investigate campaigns, for example by discovering websites targeted by the same campaign or by attributing one or more actors to the same hacking group, all without sacrificing the interactivity of the investigation.
The full version of DefPloreX includes:
- A thin wrapper to interact with an Elasticsearch backend (included in this release)
- A distributed data-processing pipeline based on Celery (example included in this release)
- An analysis component to extract information from deface web pages
- A feature-extraction component to produce a compact numerical and categorical representation of each web page
- A statistical machine-learning component to automatically find groups of similar web pages
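To make the last two components more concrete, here is a minimal sketch of how a web page might be reduced to a handful of numerical and categorical features and then grouped with look-alike pages. It is not DefPloreX's actual code: the feature set (`n_images`, `n_links`, `n_scripts`, `title`) and the function names are assumptions made for illustration, and grouping by identical feature tuples is a deliberately crude stand-in for the statistical machine-learning component, which is not described here.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags and collects the text of the <title> element."""

    def __init__(self):
        super().__init__()
        self.counts = defaultdict(int)
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


def extract_features(html):
    """Return a compact representation of one page (hypothetical feature set)."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return {
        "n_images": parser.counts["img"],      # numerical
        "n_links": parser.counts["a"],         # numerical
        "n_scripts": parser.counts["script"],  # numerical
        "title": parser.title.lower(),         # categorical
    }


def group_by_template(pages):
    """Group URLs whose pages share exactly the same feature tuple.

    `pages` maps a URL to its HTML source. Exact matching is only a
    placeholder for a real clustering algorithm.
    """
    groups = defaultdict(list)
    for url, html in pages.items():
        f = extract_features(html)
        key = (f["n_images"], f["n_links"], f["n_scripts"], f["title"])
        groups[key].append(url)
    return dict(groups)
```

Pages built from the same deface template tend to share structure even when file names or target-specific text differ, which is why structural counts plus a normalized title can already separate simple cases. A real deployment would need a richer feature set and a clustering method that tolerates small differences between pages.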
The input to DefPloreX is a feed of URLs describing the deface web pages, including metadata such as the (declared) attacker name, timestamp, reason for hacking that page, and so on. Separately, we also have a mirror of the web pages at the time of compromise.
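As an illustration of what one entry of such a feed could look like once parsed, the sketch below models a deface record as a small data class. The field names (`url`, `attacker`, `timestamp`, `reason`, `mirror_path`) and the assumption that timestamps arrive as Unix epoch seconds are hypothetical; the actual feed format is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DefaceRecord:
    """One entry of the input feed (hypothetical schema)."""
    url: str
    attacker: str                      # declared by the attacker, unverified
    timestamp: datetime                # time of the reported compromise
    reason: Optional[str] = None       # declared reason for the defacement
    mirror_path: Optional[str] = None  # location of the mirrored page, if any


def parse_record(raw):
    """Build a DefaceRecord from a dict of raw feed values.

    Assumes `timestamp` is given in Unix epoch seconds.
    """
    return DefaceRecord(
        url=raw["url"],
        attacker=raw.get("attacker", "").strip(),
        timestamp=datetime.fromtimestamp(int(raw["timestamp"]), tz=timezone.utc),
        reason=raw.get("reason") or None,
        mirror_path=raw.get("mirror_path"),
    )
```

Keeping the mirror location as a separate, optional field reflects the fact that the mirrored pages are stored apart from the feed metadata.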