Data Mining and
  Exploration
           David Carasso, Office of CTO, Chief Mind
AGENDA
What is data mining?

What’s the plan of attack?

What type of events do I have?

How do I mine fields?

How do I detect anomalous events?

Why do I need to visualize my data?
What is Data Mining?


                       3
Is this data mining?

This is an orange




                                   4
What is Data Mining?

Extracting implicit, previously unknown, and
potentially useful information from data.




                                               5
Better




         6
Data Preparation




                   Understanding
Data Exploration
Data Mining


                                   7
What’s the plan of attack?


                             8
Preparing the data
You've been thrown data you aren't familiar with…

Mar 7 12:40:01 willLaptop crond(pam_unix)[10696]: session opened for user root by (uid=0)
Mar 7 12:40:01 willLaptop crond(pam_unix)[10695]: session closed for user root
Mar 7 12:40:02 willLaptop crond(pam_unix)[10696]: session closed for user root
Mar 7 12:44:47 willLaptop gconfd (root-10750): starting (version 2.10.0), pid 10750 user 'root'
Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only config...
Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readwrite:/root/.gconf"…
Mar 7 12:45:01 willLaptop crond(pam_unix)[10754]: session opened for user root by (uid=0)
Mar 7 12:45:02 willLaptop crond(pam_unix)[10754]: session closed for user root
....
Eventtypes          Fields   Transactions   Anomalies
(closed sessions)   (pid)    (open-close)   (unexpected address)
                                                                                                9
Is Understanding Linear?

[Diagram: Event Groups, Events, Fields, and Anomalies arranged around reports]



                                       No.

                                  10
What type of events do I have?


                                 11
Given Some Unknown Data
Mar 7 12:40:01 willLaptop crond(pam_unix)[10696]: session opened for user root by (uid=0)
Mar 7 12:40:01 willLaptop crond(pam_unix)[10695]: session closed for user root
Mar 7 12:40:02 willLaptop crond(pam_unix)[10696]: session closed for user root
Mar 7 12:44:47 willLaptop gconfd (root-10750): starting (version 2.10.0), pid 10750 user 'root'
Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only config...
Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readwrite:/root/.gconf"…
Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration ...
Mar 7 12:45:01 willLaptop crond(pam_unix)[10754]: session opened for user root by (uid=0)
Mar 7 12:45:02 willLaptop crond(pam_unix)[10754]: session closed for user root
....




                                                                                            12
Find Broad Categories of Events


Group Events by Content, Format, and Time




                                            13
Group Events by Content
Cluster events with similar values.

Show 3 examples from each cluster, from the most
common cluster to the least:

…| cluster labelonly=t showcount=t
 | dedup 3 cluster_label
           sortby -cluster_count, cluster_label, _time
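
The grouping idea can be sketched outside Splunk. The Python below is only an illustration, not how the cluster command is implemented: it assumes a greedy single pass and token-set Jaccard similarity, and the names tokens, jaccard, cluster_events, and the 0.8 threshold are choices made for this example.

```python
import re

def tokens(line):
    """Lowercased alphabetic tokens; digits such as pids and times are ignored."""
    return set(re.findall(r"[a-z_]+", line.lower()))

def jaccard(a, b):
    """Share of tokens two events have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def cluster_events(lines, threshold=0.8):
    """Greedy pass: join the first cluster whose first event is similar enough."""
    clusters = []  # each cluster is a list of raw lines
    for line in lines:
        for members in clusters:
            if jaccard(tokens(line), tokens(members[0])) >= threshold:
                members.append(line)
                break
        else:
            clusters.append([line])
    # Most common cluster first, mirroring the sort order of the search above.
    return sorted(clusters, key=len, reverse=True)

events = [
    "Mar 7 12:40:01 willLaptop crond(pam_unix)[10696]: session opened for user root by (uid=0)",
    "Mar 7 12:40:01 willLaptop crond(pam_unix)[10695]: session closed for user root",
    "Mar 7 12:40:02 willLaptop crond(pam_unix)[10696]: session closed for user root",
    "Mar 7 12:45:01 willLaptop crond(pam_unix)[10754]: session opened for user root by (uid=0)",
    "Mar 7 12:45:02 willLaptop crond(pam_unix)[10754]: session closed for user root",
]

clusters = cluster_events(events)
for members in clusters:
    # Print the cluster size and up to 3 examples, like "dedup 3".
    for example in members[:3]:
        print(len(members), example)
```

With these five sample lines the "session closed" and "session opened" events fall into separate clusters, because the extra "by (uid=0)" tokens pull their similarity below the threshold.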
                                                         14
Events By Content
count   label   _raw
-----   -----   ---------------------------------------------------------------------------------------------
 1339       3   Mar 7 11:05:01 willLaptop crond(pam_unix)[6785]: session opened for user root by…
 1339       3   Mar 7 11:10:01 willLaptop crond(pam_unix)[1769]: session opened for user root by …
 1339       3   Mar 7 11:10:01 willLaptop crond(pam_unix)[1766]: session opened for user root by …

 1324       2   Mar 7 11:05:02 willLaptop crond(pam_unix)[6785]: session closed for user root
 1324       2   Mar 7 11:10:01 willLaptop crond(pam_unix)[1766]: session closed for user root
 1324       2   Mar 7 11:10:02 willLaptop crond(pam_unix)[1769]: session closed for user root

  136      13   Mar 7 20:05:08 willLaptop kernel: SELinux: initialized (dev selinuxfs, type selinuxfs)…
  136      13   Mar 7 20:05:09 willLaptop kernel: SELinux: initialized (dev usbfs, type usbfs), uses …
  136      13   Mar 7 20:05:09 willLaptop kernel: SELinux: initialized (dev sysfs, type sysfs), uses …




                                                                                                   15
Group by $%#! Format
Cluster events by first 7 punctuation chars:

…| rex field=punct "(?<smallpunct>.{7})"
 | eventstats count by smallpunct
 | sort -count, smallpunct
 | dedup 3 smallpunct
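
To see what a punctuation signature looks like, here is a rough Python approximation. It assumes the signature is built by dropping letters and digits, keeping punctuation, and showing whitespace as an underscore; the function names punct and smallpunct are made up for this sketch, and Splunk's own punct field may differ in details.

```python
from collections import Counter

def punct(line):
    """Keep punctuation, show whitespace as '_', drop letters and digits."""
    return "".join("_" if ch.isspace() else ch
                   for ch in line if not ch.isalnum())

def smallpunct(line, width=7):
    """First `width` characters of the punctuation signature."""
    return punct(line)[:width]

events = [
    "Mar 10 16:50:02 willLaptop crond(pam_unix)[9639]: session closed for user root",
    "Mar 10 15:30:25 willLaptop dhclient: DHCPACK from 10.1.1.50",
    "Mar 10 16:46:32 willLaptop ntpd[2544]: synchronized to 138.23.180.126, stratum 2",
    "Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session closed for user root",
]

# Count events per signature, most common first.
counts = Counter(smallpunct(e) for e in events)
for signature, count in counts.most_common():
    print(count, signature)
```

Seven characters is just enough here to get past the timestamp and host and reach the first character that distinguishes one log format from another.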
                                               16
Events by Format
count   smallpunct   _raw
-----   ----------   ------------------------------------------------------------------------------------
  637   __::__(      Mar 10 16:50:02 willLaptop crond(pam_unix)[9639]: session closed for user root
  637   __::__(      Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session closed for user root
  637   __::__(      Mar 10 16:50:01 willLaptop crond(pam_unix)[9639]: session opened for user root by …

  367   __::__:      Mar 10 15:30:25 willLaptop dhclient: bound to 10.1.1.194 -- renewal in 5788 seconds.
  367   __::__:      Mar 10 15:30:25 willLaptop dhclient: DHCPACK from 10.1.1.50
  367   __::__:      Mar 10 15:30:25 willLaptop dhclient: DHCPREQUEST on eth0 to 10.1.1.50 port 67

   57   __::__[      Mar 10 16:46:32 willLaptop ntpd[2544]: synchronized to 138.23.180.126, stratum 2
   57   __::__[      Mar 10 16:46:27 willLaptop ntpd[2544]: synchronized to LOCAL(0), stratum 10
   57   __::__[      Mar 10 16:42:09 willLaptop ntpd[2544]: time reset -0.236567 s




                                                                                                  17
Group by Time
Look for bursts of events

                       •   Turn on a computer
                       •   Load a web page
                       •   Detect a speeding car
                       •   Print a document
                       •   Scan a security badge



                                                  18
Group by Time Bursts
… | transaction maxpause=2s
  | search eventcount>1
Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session opened for user root by (uid=0)
Mar 10 16:50:01 willLaptop crond(pam_unix)[9639]: session opened for user root by (uid=0)
Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session closed for user root
Mar 10 16:50:02 willLaptop crond(pam_unix)[9639]: session closed for user root

Mar 10 15:30:25 willLaptop dhclient: DHCPREQUEST on eth0 to 10.1.1.50 port 67
Mar 10 15:30:25 willLaptop dhclient: DHCPACK from 10.1.1.50
Mar 10 15:30:25 willLaptop dhclient: bound to 10.1.1.194 -- renewal in 5788 seconds.

Mar 10 16:45:01 willLaptop crond(pam_unix)[9553]: session opened for user root by (uid=0)
Mar 10 16:45:02 willLaptop crond(pam_unix)[9553]: session closed for user root
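
The burst grouping can be sketched in a few lines of Python. This is a simplified stand-in for transaction maxpause=2s, assuming only timestamps matter and that a gap of more than maxpause seconds starts a new group; to_seconds and bursts are names invented for the example, and the timestamps are taken from the sample events in this deck.

```python
def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def bursts(times, maxpause=2):
    """Split sorted timestamps wherever the gap exceeds maxpause seconds."""
    groups = []
    for t in sorted(times):
        if groups and t - groups[-1][-1] <= maxpause:
            groups[-1].append(t)   # close enough: same burst
        else:
            groups.append([t])     # gap too large: start a new burst
    return groups

stamps = ["16:50:01", "16:50:01", "16:50:01", "16:50:02",
          "15:30:25", "15:30:25", "15:30:25",
          "16:45:01", "16:45:02", "16:46:27"]

groups = bursts([to_seconds(s) for s in stamps])
multi = [g for g in groups if len(g) > 1]    # analogous to eventcount>1
single = [g for g in groups if len(g) == 1]  # isolated events
print([len(g) for g in groups])
```

Filtering on group size afterwards gives either the real bursts or the isolated single events, depending on which comparison is used.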


                                                                                                              19
Multiple Sources
                   (not really correct)




                                          20
Now what?

1. ✓ Group your data
2.   Tell Splunk!




                          21
Telling Splunk
(about your groups of events)



Add eventtypes and tags



                                Huh?

                                22
SURPRISE TANGENT!

What is an eventtype?


                        23
Eventtype

A dynamic “tag” added to every event that matches
the search that defines the eventtype.



                                                 24
Eventtype:
  Name: “closed_root”
  Definition: “session closed”   root


Event:
  … session closed for user root …
  =>
  eventtype=closed_root
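
As a rough mental model, an eventtype behaves like a named search that is evaluated against each event. The Python sketch below assumes plain case-insensitive substring matching, which is a simplification of how Splunk searches actually match terms; EVENTTYPES and eventtypes_for are hypothetical names.

```python
EVENTTYPES = {
    # name -> terms that must all appear, mirroring: "session closed" root
    "closed_root": ["session closed", "root"],
}

def eventtypes_for(raw, definitions=EVENTTYPES):
    """Return the names of every eventtype whose terms all occur in the event."""
    text = raw.lower()
    return [name for name, terms in definitions.items()
            if all(term.lower() in text for term in terms)]

events = [
    "Mar 7 12:40:01 willLaptop crond(pam_unix)[10695]: session closed for user root",
    "Mar 7 12:40:01 willLaptop crond(pam_unix)[10696]: session opened for user root by (uid=0)",
]

tagged = {event: eventtypes_for(event) for event in events}
# Events that match no eventtype are candidates for a new classification.
unclassified = [event for event, names in tagged.items() if not names]

for event, names in tagged.items():
    print(names, event)
```

Because the tag is computed when the event is looked at, changing a definition changes how old events are labelled as well as new ones.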



                                        25
Create an Eventtype




                      26
Later, unrelated searches return events already tagged
with the eventtypes you defined, which helps classify those events.




                                                      27
Create reports on the classifications you’ve made




                                                    Ok, it
                                                    wasn’t a
                                                    tangent.

                                                     28
How do I mine fields?


                        29
Fields Correlation

Discover correlations to remove uninteresting
fields and narrow in on promising reports.



                                                haiku

                                                30
Fields Correlation Haiku

Discover patterns
in fields with a correlation:
co-occurring fields.

                                   indulgence

                                      31
Splunkd.log Sample File
09-05-2012   15:34:11.886   -0700   INFO    ExecProcessor - Ran script: python /opt/splunk/etc/apps/...
09-05-2012   15:34:02.467   -0700   ERROR   TcpOutputProc - Can't find or illegal IP address or ...
09-05-2012   15:32:03.397   -0700   INFO    ProcessTracker - Process ran long; type=SplunkOptimize ...
09-05-2012   15:30:20.016   -0700   WARN    DispatchCommand - The system is approaching the maximum ...




                                                                                               fascinating

                                                                                                  32
Field Correlation
… | correlate
RowField                      C     CN   Component   Context      L   ...
------------------------   ----   ----   ---------   -------   ----
C                          1.00   1.00        0.00      0.00   1.00
CN                         1.00   1.00        0.00      0.00   1.00
Component                  0.00   0.00        1.00      0.06   0.00
Context                    0.00   0.00        0.06      1.00   0.00
L                          1.00   1.00        0.00      0.00   1.00
Log_Level                  0.00   0.00        1.00      0.06   0.00
…
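
One way to read a matrix like this is as field co-occurrence: how often two fields show up in the same events. The Python sketch below uses Jaccard overlap of field presence as the score. That formula is an assumption for illustration (the slides do not say how correlate computes its numbers), and the events and their values are invented.

```python
from itertools import combinations

def cooccurrence(events):
    """Jaccard overlap of field presence for every pair of fields."""
    fields = sorted({field for event in events for field in event})
    # For each field, the set of event indexes in which it is present.
    present = {f: {i for i, e in enumerate(events) if f in e} for f in fields}
    matrix = {(f, f): 1.0 for f in fields}
    for a, b in combinations(fields, 2):
        score = len(present[a] & present[b]) / len(present[a] | present[b])
        matrix[(a, b)] = matrix[(b, a)] = score
    return matrix

# Hypothetical events: two log lines and one certificate-style record.
events = [
    {"Component": "loader", "Log_Level": "INFO"},
    {"Component": "HotDBManager", "Log_Level": "INFO"},
    {"C": "US", "CN": "example", "L": "SF"},
]

matrix = cooccurrence(events)
print(matrix[("Component", "Log_Level")], matrix[("C", "Component")])
```

Fields that always appear together score 1.0 and fields that never share an event score 0.0, which is the same block pattern visible in the table above.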




                                                                            33
Field Associations
Automatically deduce correlations and
implications of field values:

…| associate Log_Level Component




                                        34
Field Association Summary
                                                              Uncond    Cond
Ref_Key     Ref_Value                  Target_Key   Support   Entropy   Entropy   Increase   Top_Conditional_Value
---------   ------------------------   ----------   -------   -------   -------   --------   ------------------------
Component   DatabaseDirectoryManager   Log_Level    34.67%    1.182     0.000     1.182201   WARN (62.25% -> 100.00%)
Component   HotDBManager               Log_Level    38.25%    1.182     0.000     1.182201   INFO (33.15% -> 100.00%)
Component   SavedSplunker              Log_Level    394.31%   1.182     0.000     1.182201   WARN (62.25% -> 100.00%)
Component   databasePartitionPolicy    Log_Level    95.50%    1.182     0.417     0.765017   INFO (33.15% -> 91.57%)
Component   loader                     Log_Level    79.17%    1.182     0.050     1.131883   INFO (33.15% -> 99.44%)
Component   timeinvertedIndex          Log_Level    44.28%    1.182     0.000     1.182201   INFO (33.15% -> 100.00%)
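
The entropy columns can be illustrated with a small calculation: how uncertain is the target field overall, and how uncertain is it once a reference field is pinned to one value? The Python sketch below uses Shannon entropy in bits on invented data; entropy and entropy_drop are names made up for the example, and this is not claimed to be the exact computation associate performs.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy, in bits, of the distribution of values."""
    counts = Counter(values)
    total = len(values)
    return sum(c / total * log2(total / c) for c in counts.values())

def entropy_drop(events, ref_key, ref_value, target_key):
    """Unconditional entropy, conditional entropy, and their difference."""
    all_targets = [e[target_key] for e in events]
    cond_targets = [e[target_key] for e in events if e[ref_key] == ref_value]
    uncond = entropy(all_targets)
    cond = entropy(cond_targets)
    return uncond, cond, uncond - cond

# Hypothetical events: component A always logs INFO, component B varies.
events = [
    {"Component": "A", "Log_Level": "INFO"},
    {"Component": "A", "Log_Level": "INFO"},
    {"Component": "B", "Log_Level": "WARN"},
    {"Component": "B", "Log_Level": "ERROR"},
]

for component in ("A", "B"):
    print(component, entropy_drop(events, "Component", component, "Log_Level"))
```

A conditional entropy of 0 means the reference value fully determines the target, which is what the rows ending in "-> 100.00%" show.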




                                                                                                               35
Top Fields by Fields
Most common Log_Level by Component:

  ... | top Log_Level by Component
Component                            Log_Level   count      percent
----------------------------------   ---------   -----   ----------
AdminManager                         WARN            1   100.000000
DatabaseDirectoryManager             WARN          153   100.000000
DateParserVerbose                    WARN          262   100.000000
DedupProcessor                       ERROR           1   100.000000
DeploymentClient                     DEBUG          60    85.714286
DeploymentClient                     WARN            5     7.142857
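
The count and percent columns are simple to reproduce by hand. The Python sketch below groups events by one field and ranks the values of another within each group; top_by is a hypothetical helper, and the counts are invented rather than taken from the table above.

```python
from collections import Counter, defaultdict

def top_by(events, field, by):
    """Rows of (group, value, count, percent), most common value first per group."""
    groups = defaultdict(Counter)
    for event in events:
        groups[event[by]][event[field]] += 1
    rows = []
    for key in sorted(groups):
        total = sum(groups[key].values())
        for value, count in groups[key].most_common():
            rows.append((key, value, count, 100.0 * count / total))
    return rows

# Hypothetical counts, not the ones from the slide.
events = (
    [{"Component": "DeploymentClient", "Log_Level": "DEBUG"}] * 3
    + [{"Component": "DeploymentClient", "Log_Level": "WARN"}]
    + [{"Component": "DedupProcessor", "Log_Level": "ERROR"}]
)

rows = top_by(events, "Log_Level", "Component")
for row in rows:
    print(row)
```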


                                                                      36
How do I detect anomalous events?


                                 37
Types of Anomalies

Anomalies you know about

Anomalies you don’t know about


                                 38
Handling Known Anomalies
Easy: define a search for the anomalous condition
and create an alert to detect it.

ip=10.* NOT domain=mycompany.com

… | stats perc99(spent)   =>   500ms
      Alert on “spent>500”
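
The percentile-then-alert pattern looks like this in plain Python. The sketch assumes a nearest-rank percentile, which may not match how Splunk's perc99 estimates the value, and the spent durations are made up.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical durations in milliseconds: 1, 2, ..., 100.
spent = list(range(1, 101))

threshold = percentile(spent, 99)
# The "alert" is simply every value above the learned threshold.
alerts = [value for value in spent if value > threshold]
print(threshold, alerts)
```

The threshold is measured once from normal traffic and then hard-coded into the alert, so it should be re-measured when the system's normal behavior changes.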

                                                    39
Finding Unknown Anomalies
Look for Abnormal
• Single-Field Values
• Multi-Field Values
• Contexts
• Visual Inspections…

                                40
Anomalies by Single Field Values
Identify anomalous values in a given field either by
frequency of occurrence or number of standard
deviations from the mean.

… | anomalousvalue action=summary pthresh=0.02
  | search isNum=YES
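
Both notions of "anomalous" named above, rare by frequency and far from the mean, can be sketched in a few lines. The Python below is an illustration, not the anomalousvalue algorithm: the max_sigma cutoff, the reading of pthresh as a relative-frequency cutoff, and the sample data are all assumptions.

```python
from collections import Counter
from statistics import mean, pstdev

def far_from_mean(values, max_sigma=2.0):
    """Numeric values more than max_sigma standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > max_sigma]

def rare_values(values, pthresh=0.02):
    """Values whose relative frequency is below pthresh."""
    counts = Counter(values)
    return sorted(v for v, c in counts.items() if c / len(values) < pthresh)

sizes = [10, 11, 9, 10, 12, 10, 11, 9, 10, 95]   # hypothetical numeric field
methods = ["GET"] * 99 + ["TRACE"]               # hypothetical categorical field

print(far_from_mean(sizes))
print(rare_values(methods))
```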

                                                       41
Anomalies by Single Field Values




                                   42
Anomalous by Many Values

Look for small clusters – by content, format, and
time – to find anomalies. For example…

…| cluster …| sort cluster_count

                                                43
Smallest Clusters by Content
count   label   uri

1       7    /img/skins/default/bolt.png

1       37   /en-US/search/inspector?sid=1345075042.125&namespace=search

1       45   /services/admin/summarization?count=10

1       53   /services/pdfgen/is_available?viewId=index_status_health&...

1       57   /static/splunkrc_cmds.xml




                                                                            44
Small Clusters: Bursts of One
Find bursts of just a single event, with a pause of more than
2 seconds before and after it.

… |transaction maxpause=2s | search eventcount = 1


Mar 10 16:46:32 willLaptop ntpd[2544]: synchronized to 138.23.180.126…

Mar 10 16:46:27 willLaptop ntpd[2544]: synchronized to LOCAL(0), stratum…

Mar 10 16:42:09 willLaptop ntpd[2544]: time reset -0.236567…


                                                                            45
Burst of One
Same idea, different data source: splunk
[11:58:08] "POST /services/search/jobs/export HTTP/1.1" 200 201630 …

[11:12:51] "POST /services/search/jobs/export HTTP/1.1" 200 459441 …

[10:00:58] "GET /servicesNS/nobody/SplunkDeploymentMonitor/backfill/…




                                                                        46
Anomalous by Context
Identify values that are unexpected given the context of
surrounding events.

… | anomalies field=file labelonly=true maxvalues=10




                                                       47
Anomalous by Context
  Unexpectedness    file
  0.00             shelper
  0.16             shelper
  0.00             1345502591.356
  0.00             1345502591.356
  0.00             1345074401.191
  0.00             1345074031.153     time
  0.03             1345074328.186
  0.00             1345502591.356
  0.35             conf-dm_backfill
  0.00             1345074309.185
  0.00             1345502591.356




                                             48
Surprise Eventtype: Part Deux!
Classified major categories of your data with
eventtypes?

-- just search for things that don’t match those
eventtypes

                                                   49
50
Once you can describe anomalous
     behavior as a search…


                              51
52
Other mining commands
• kmeans: Performs k-means clustering on selected
  fields.
• outlier: Removes outlying numerical values.
• af (analyze fields): Analyzes numerical fields for their
  ability to predict another discrete field.
• fieldsummary: Generates summary information for fields.
• shape: Produces a symbolic 'shape' attribute describing
  the shape of a numeric multivalued field.
                                                        53
Why do I need to visualize my data?


                                  54
Data Mining by Visualization

Visualization can capture nuances in the data that
numerical or linguistic summaries cannot easily capture.




                                                           55
These data points are radically different.




                                         *Source: Anscombe’s Quartet (Anscombe 1973)



                                                                          56
Why visualize?
Because all four datasets have exactly the same

  • average of y (7.50)
  • standard deviation of y (2.03)
  • least-squares fit (y = 3 + 0.5x).

Do not just rely on numerical summarization.
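
The numbers are easy to check. The Python below recomputes the three summaries for each of the four datasets, using the values published in Anscombe (1973); least_squares is a small helper written for this example.

```python
from statistics import mean, stdev

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def least_squares(xs, ys):
    """Intercept and slope of the ordinary least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

summary = {}
for name, (xs, ys) in QUARTET.items():
    intercept, slope = least_squares(xs, ys)
    # (mean of y, sample std dev of y, intercept, slope), rounded for display
    summary[name] = (round(mean(ys), 2), round(stdev(ys), 2),
                     round(intercept, 1), round(slope, 2))
    print(name, summary[name])
```

All four rows print the same rounded summary, even though plotting the datasets shows a noisy line, a curve, a line with one outlier, and a vertical cluster with one outlier.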
                                               57
But I already have charts!
You don’t graph enough.
Data Exploration
    Don’t decide ahead of time what graphs you want.
    Regularly graph out-of-the-box scenarios.




                                                        58
Data Exploration
Variations:
• Subsets of Events (paying customers vs lookers)
• Fields by Fields (including eventtypes and tags)
• Ignored fields
• Min/max/avg/count
• Compare to other time windows
• Transactions
                                                     59
Visual Arrangement
Sorting data, changing scales (linear/log), and adjusting
min/max can make a huge difference in how the same
data looks.




                                                   60
Visual Considerations
         Pick representations that make
         obvious the distinctions you
         need to care about.




                                          61
Summary


          62
Summary
• Discovery is an iterative process.
• Group events by content, format, and time, and
  define classifications with eventtypes and tags
• Focus on promising fields with correlations
• Discover unknown anomalies with small clusters.
• Visualize your data, from a dozen angles.
                                                63
But wait!



            64
More to come: Predictive Analytics
… | forecast foo




                                     65
The End
                            Mine the Gap.



                                         Golf clapping at #datamining




                                                                        66

More Related Content

What's hot

Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...HostedbyConfluent
 
Amazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage OverviewAmazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage OverviewAmazon Web Services
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
How Government Agencies are Using MongoDB to Build Data as a Service Solutions
How Government Agencies are Using MongoDB to Build Data as a Service SolutionsHow Government Agencies are Using MongoDB to Build Data as a Service Solutions
How Government Agencies are Using MongoDB to Build Data as a Service SolutionsMongoDB
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processingconfluent
 
Warranty Predictive Analytics solution
Warranty Predictive Analytics solutionWarranty Predictive Analytics solution
Warranty Predictive Analytics solutionRevolution Analytics
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Amazon Web Services
 
SAP Global Available to Promise (gATP) 101: Global Visibility vs. Global Avai...
SAP Global Available to Promise (gATP) 101: Global Visibility vs. Global Avai...SAP Global Available to Promise (gATP) 101: Global Visibility vs. Global Avai...
SAP Global Available to Promise (gATP) 101: Global Visibility vs. Global Avai...Plan4Demand
 
How to Implement Snowflake Security Best Practices with Panther
How to Implement Snowflake Security Best Practices with PantherHow to Implement Snowflake Security Best Practices with Panther
How to Implement Snowflake Security Best Practices with PantherPanther Labs
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming JobsDatabricks
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationVolodymyr Rovetskiy
 
Executive S&Op Case Study Gpseg
Executive S&Op Case Study GpsegExecutive S&Op Case Study Gpseg
Executive S&Op Case Study Gpsegguest268716d
 
Mainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best PracticesMainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best PracticesAmazon Web Services
 
Warehouse Storage Managment System Database Schema
Warehouse Storage Managment System Database SchemaWarehouse Storage Managment System Database Schema
Warehouse Storage Managment System Database SchemaMatt Saragusa
 

What's hot (20)

Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
 
Amazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage OverviewAmazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage Overview
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Event Storming and Saga
Event Storming and SagaEvent Storming and Saga
Event Storming and Saga
 
How Government Agencies are Using MongoDB to Build Data as a Service Solutions
How Government Agencies are Using MongoDB to Build Data as a Service SolutionsHow Government Agencies are Using MongoDB to Build Data as a Service Solutions
How Government Agencies are Using MongoDB to Build Data as a Service Solutions
 
Apache Arrow - An Overview
Apache Arrow - An OverviewApache Arrow - An Overview
Apache Arrow - An Overview
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
 
Warranty Predictive Analytics solution
Warranty Predictive Analytics solutionWarranty Predictive Analytics solution
Warranty Predictive Analytics solution
 
Building Data Lakes with AWS
Building Data Lakes with AWSBuilding Data Lakes with AWS
Building Data Lakes with AWS
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
SAP Global Available to Promise (gATP) 101: Global Visibility vs. Global Avai...
SAP Global Available to Promise (gATP) 101: Global Visibility vs. Global Avai...SAP Global Available to Promise (gATP) 101: Global Visibility vs. Global Avai...
SAP Global Available to Promise (gATP) 101: Global Visibility vs. Global Avai...
 
How to Implement Snowflake Security Best Practices with Panther
How to Implement Snowflake Security Best Practices with PantherHow to Implement Snowflake Security Best Practices with Panther
How to Implement Snowflake Security Best Practices with Panther
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentation
 
Executive S&Op Case Study Gpseg
Executive S&Op Case Study GpsegExecutive S&Op Case Study Gpseg
Executive S&Op Case Study Gpseg
 
Mainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best PracticesMainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best Practices
 
Warehouse Storage Managment System Database Schema
Warehouse Storage Managment System Database SchemaWarehouse Storage Managment System Database Schema
Warehouse Storage Managment System Database Schema
 

Viewers also liked

Splunk conf2014 - Detecting Fraud and Suspicious Events Using Risk Scoring
Splunk conf2014 - Detecting Fraud and Suspicious Events Using Risk ScoringSplunk conf2014 - Detecting Fraud and Suspicious Events Using Risk Scoring
Splunk conf2014 - Detecting Fraud and Suspicious Events Using Risk ScoringSplunk
 
SplunkLive! Splunk for Insider Threats and Fraud Detection
SplunkLive! Splunk for Insider Threats and Fraud DetectionSplunkLive! Splunk for Insider Threats and Fraud Detection
SplunkLive! Splunk for Insider Threats and Fraud DetectionSplunk
 
Machine Learning + Analytics in Splunk
Machine Learning + Analytics in SplunkMachine Learning + Analytics in Splunk
Machine Learning + Analytics in SplunkSplunk
 
Splunk FISMA for Continuous Monitoring
Splunk FISMA for Continuous Monitoring Splunk FISMA for Continuous Monitoring
Splunk FISMA for Continuous Monitoring Greg Hanchin
 
SplunkLive! Data Models 101
SplunkLive! Data Models 101SplunkLive! Data Models 101
SplunkLive! Data Models 101Splunk
 
Virtual SplunkLive! for Higher Education Overview/Customers
Virtual SplunkLive! for Higher Education Overview/CustomersVirtual SplunkLive! for Higher Education Overview/Customers
Virtual SplunkLive! for Higher Education Overview/CustomersSplunk
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatternsgrepalex
 
How to integrate Splunk with any data solution
How to integrate Splunk with any data solutionHow to integrate Splunk with any data solution
How to integrate Splunk with any data solutionJulian Hyde
 
.conf2011: Web Analytics Throwdown: with NPR and Intuit
.conf2011: Web Analytics Throwdown: with NPR and Intuit.conf2011: Web Analytics Throwdown: with NPR and Intuit
.conf2011: Web Analytics Throwdown: with NPR and IntuitErin Sweeney
 
HawkEye : A Real-time Anomaly Detection System
HawkEye : A Real-time Anomaly Detection SystemHawkEye : A Real-time Anomaly Detection System
HawkEye : A Real-time Anomaly Detection SystemSatnam Singh
 
Internship_presentation
Internship_presentationInternship_presentation
Internship_presentationAditya Gautam
 
Splunk Fundamentals: Investigations with Core Splunk - Splunk Tech Day
Splunk Fundamentals: Investigations with Core Splunk - Splunk Tech DaySplunk Fundamentals: Investigations with Core Splunk - Splunk Tech Day
Splunk Fundamentals: Investigations with Core Splunk - Splunk Tech DayZivaro Inc
 
Splunk | Reporting Use Cases
Splunk | Reporting Use CasesSplunk | Reporting Use Cases
Splunk | Reporting Use CasesBeth Goldman
 
Analytics for large-scale time series and event data
Analytics for large-scale time series and event dataAnalytics for large-scale time series and event data
Analytics for large-scale time series and event dataAnodot
 
Science of Anomaly Detection
Science of Anomaly Detection Science of Anomaly Detection
Science of Anomaly Detection Numenta
 
Splunk .conf2011: Real Time Alerting and Monitoring
Splunk .conf2011: Real Time Alerting and MonitoringSplunk .conf2011: Real Time Alerting and Monitoring
Splunk .conf2011: Real Time Alerting and MonitoringErin Sweeney
 
Splunk at Expedia - Gartner Symposium
Splunk at Expedia - Gartner SymposiumSplunk at Expedia - Gartner Symposium
Splunk at Expedia - Gartner SymposiumEddie Satterly
 
"Building Anomaly Detection For Large Scale Analytics", Yonatan Ben Shimon, A...
"Building Anomaly Detection For Large Scale Analytics", Yonatan Ben Shimon, A..."Building Anomaly Detection For Large Scale Analytics", Yonatan Ben Shimon, A...
"Building Anomaly Detection For Large Scale Analytics", Yonatan Ben Shimon, A...Dataconomy Media
 
Data Mining and Exploration Events, Fields, and Anomalies

  • 1. Data Mining and Exploration David Carasso, Office of CTO, Chief Mind
  • 2. AGENDA What is data mining? What’s the plan of attack? What type of events do I have? How do I mine fields? How do I detect anomalous events? Why do I need to visualize my data?
  • 3. What is Data Mining?
  • 4. Is this data mining? This is an orange.
  • 5. What is Data Mining? Extracting implicit, previously unknown, and potentially useful information from data.
  • 6. Better
  • 7. Data Preparation, Data Exploration, Data Mining (side label: Understanding)
  • 8. What’s the plan of attack?
  • 9. Preparing the data
      You've been thrown data you aren't familiar with…
      Mar 7 12:40:01 willLaptop crond(pam_unix)[10696]: session opened for user root by (uid=0)
      Mar 7 12:40:01 willLaptop crond(pam_unix)[10695]: session closed for user root
      Mar 7 12:40:02 willLaptop crond(pam_unix)[10696]: session closed for user root
      Mar 7 12:44:47 willLaptop gconfd (root-10750): starting (version 2.10.0), pid 10750 user 'root'
      Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only config...
      Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readwrite:/root/.gconf"…
      Mar 7 12:45:01 willLaptop crond(pam_unix)[10754]: session opened for user root by (uid=0)
      Mar 7 12:45:02 willLaptop crond(pam_unix)[10754]: session closed for user root
      ....
      Eventtypes (closed sessions) | Fields (pid) | Transactions (open-close) | Anomalies (unexpected address)
  • 10. Is Understanding Linear? [Diagram: Event Groups, Events, Reports, Anomalies, Fields] No.
  • 11. What type of events do I have?
  • 12. Given Some Unknown Data
      Mar 7 12:40:01 willLaptop crond(pam_unix)[10696]: session opened for user root by (uid=0)
      Mar 7 12:40:01 willLaptop crond(pam_unix)[10695]: session closed for user root
      Mar 7 12:40:02 willLaptop crond(pam_unix)[10696]: session closed for user root
      Mar 7 12:44:47 willLaptop gconfd (root-10750): starting (version 2.10.0), pid 10750 user 'root'
      Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only config...
      Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readwrite:/root/.gconf"…
      Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration ...
      Mar 7 12:45:01 willLaptop crond(pam_unix)[10754]: session opened for user root by (uid=0)
      Mar 7 12:45:02 willLaptop crond(pam_unix)[10754]: session closed for user root
      ....
  • 13. Find Broad Categories of Events: Group Events by Content, Format, and Time
  • 14. Group Events by Content
      Cluster events with similar values. Show 3 examples from each cluster, from the most common cluster to the least:
      … | cluster labelonly=t showcount=t | dedup 3 cluster_label sortby -cluster_count, cluster_label, _time
  • 15. Events By Content
      count  label  _raw
      -----  -----  ------------------------------------------------------------------------------------------
      1339   3      Mar 7 11:05:01 willLaptop crond(pam_unix)[6785]: session opened for user root by…
      1339   3      Mar 7 11:10:01 willLaptop crond(pam_unix)[1769]: session opened for user root by …
      1339   3      Mar 7 11:10:01 willLaptop crond(pam_unix)[1766]: session opened for user root by …
      1324   2      Mar 7 11:05:02 willLaptop crond(pam_unix)[6785]: session closed for user root
      1324   2      Mar 7 11:10:01 willLaptop crond(pam_unix)[1766]: session closed for user root
      1324   2      Mar 7 11:10:02 willLaptop crond(pam_unix)[1769]: session closed for user root
      136    13     Mar 7 20:05:08 willLaptop kernel: SELinux: initialized (dev selinuxfs, type selinuxfs)…
      136    13     Mar 7 20:05:09 willLaptop kernel: SELinux: initialized (dev usbfs, type usbfs), uses …
      136    13     Mar 7 20:05:09 willLaptop kernel: SELinux: initialized (dev sysfs, type sysfs), uses …
  • 16. Group by $%#! Format
      Cluster events by the first 7 punctuation characters:
      … | rex field=punct "(?<smallpunct>.{7})" | eventstats count by smallpunct | sort -count, smallpunct | dedup 3 smallpunct
  • 17. Events by Format
      count  smallpunct  _raw
      -----  ----------  -------------------------------------------------------------------------------------
      637    __::__(     Mar 10 16:50:02 willLaptop crond(pam_unix)[9639]: session closed for user root
      637    __::__(     Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session closed for user root
      637    __::__(     Mar 10 16:50:01 willLaptop crond(pam_unix)[9639]: session opened for user root by …
      367    __::__:     Mar 10 15:30:25 willLaptop dhclient: bound to 10.1.1.194 -- renewal in 5788 seconds.
      367    __::__:     Mar 10 15:30:25 willLaptop dhclient: DHCPACK from 10.1.1.50
      367    __::__:     Mar 10 15:30:25 willLaptop dhclient: DHCPREQUEST on eth0 to 10.1.1.50 port 67
      57     __::__[     Mar 10 16:46:32 willLaptop ntpd[2544]: synchronized to 138.23.180.126, stratum 2
      57     __::__[     Mar 10 16:46:27 willLaptop ntpd[2544]: synchronized to LOCAL(0), stratum 10
      57     __::__[     Mar 10 16:42:09 willLaptop ntpd[2544]: time reset -0.236567 s
  • 18. Group by Time
      Look for bursts of events:
      • Turn on computer
      • Load a web page
      • Detect speeding car
      • Print document
      • Scan security badge
  • 19. Group by Time Bursts
      … | transaction maxpause=2s | search eventcount>1
      Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session opened for user root by (uid=0)
      Mar 10 16:50:01 willLaptop crond(pam_unix)[9639]: session opened for user root by (uid=0)
      Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session closed for user root
      Mar 10 16:50:02 willLaptop crond(pam_unix)[9639]: session closed for user root
      Mar 10 15:30:25 willLaptop dhclient: DHCPREQUEST on eth0 to 10.1.1.50 port 67
      Mar 10 15:30:25 willLaptop dhclient: DHCPACK from 10.1.1.50
      Mar 10 15:30:25 willLaptop dhclient: bound to 10.1.1.194 -- renewal in 5788 seconds.
      Mar 10 16:45:01 willLaptop crond(pam_unix)[9553]: session opened for user root by (uid=0)
      Mar 10 16:45:02 willLaptop crond(pam_unix)[9553]: session closed for user root
  • 20. Multiple Sources (not really correct)
  • 21. Now what? 1. ✓ Group your data. 2. Tell Splunk!
  • 22. Telling Splunk (about your groups of events): Add eventtypes and tags. Huh?
  • 23. SURPRISE TANGENT! What is an eventtype?
  • 24. Eventtype: A dynamic “tag” added to events that match the search defining the eventtype.
  • 25. Eventtype:
      Name: “closed_root”
      Definition: “session closed” root
      Event: … session closed for user root … => eventtype=closed_root
  • 27. Independent searches will return events tagged with previous eventtypes that help classify events.
  • 28. Create reports on the classifications you’ve made. Ok, it wasn’t a tangent.
  • 29. How do I mine fields?
  • 30. Fields Correlation: Discover correlations to remove uninteresting fields and narrow in on promising reports. haiku
  • 31. Fields Correlation Haiku: Discover patterns in fields with a correlation: co-occurring fields. indulgence
  • 32. Splunkd.log Sample File
      09-05-2012 15:34:11.886 -0700 INFO ExecProcessor - Ran script: python /opt/splunk/etc/apps/...
      09-05-2012 15:34:02.467 -0700 ERROR TcpOutputProc - Can't find or illegal IP address or ...
      09-05-2012 15:32:03.397 -0700 INFO ProcessTracker - Process ran long; type=SplunkOptimize ...
      09-05-2012 15:30:20.016 -0700 WARN DispatchCommand - The system is approaching the maximum ...
      fascinating
  • 33. Field Correlation
      … | correlate
      RowField   C     CN    Component  Context  L     ...
      ---------  ----  ----  ---------  -------  ----
      C          1.00  1.00  0.00       0.00     1.00
      CN         1.00  1.00  0.00       0.00     1.00
      Component  0.00  0.00  1.00       0.06     0.00
      Context    0.00  0.00  0.06       1.00     0.00
      L          1.00  1.00  0.00       0.00     1.00
      Log_Level  0.00  0.00  1.00       0.06     0.00
      …
  • 34. Field Associations
      Automatically deduce correlations and implications of field values:
      … | associate Log_Level Component
  • 35. Field Association Summary
                                                                Uncond   Cond
      Ref_Key    Ref_Value                 Target_Key  Support  Entropy  Entropy  Increase  Top_Conditional_Value
      ---------  ------------------------  ----------  -------  -------  -------  --------  ------------------------
      Component  DatabaseDirectoryManager  Log_Level   34.67%   1.182    0.000    1.182201  WARN (62.25% -> 100.00%)
      Component  HotDBManager              Log_Level   38.25%   1.182    0.000    1.182201  INFO (33.15% -> 100.00%)
      Component  SavedSplunker             Log_Level   394.31%  1.182    0.000    1.182201  WARN (62.25% -> 100.00%)
      Component  databasePartitionPolicy   Log_Level   95.50%   1.182    0.417    0.765017  INFO (33.15% -> 91.57%)
      Component  loader                    Log_Level   79.17%   1.182    0.050    1.131883  INFO (33.15% -> 99.44%)
      Component  timeinvertedIndex         Log_Level   44.28%   1.182    0.000    1.182201  INFO (33.15% -> 100.00%)
  • 36. Top Fields by Fields
      Most common Log_Level by Component:
      ... | top Log_Level by Component
      Component                 Log_Level  count  percent
      ------------------------  ---------  -----  ----------
      AdminManager              WARN       1      100.000000
      DatabaseDirectoryManager  WARN       153    100.000000
      DateParserVerbose         WARN       262    100.000000
      DedupProcessor            ERROR      1      100.000000
      DeploymentClient          DEBUG      60     85.714286
      DeploymentClient          WARN       5      7.142857
  • 37. How do I detect anomalous events?
  • 38. Types of Anomalies: anomalies you know about, and anomalies you don’t know about.
  • 39. Handling Known Anomalies. Easy.
      Define a search for the anomalous condition and make an alert to detect it.
      ip=10.* NOT domain=mycompany.com
      … | stats perc99(spent) => 500ms. Alert on “spent>500”
  • 40. Finding Unknown Anomalies
      Look for abnormal:
      • Single-Field Values
      • Multi-Field Values
      • Contexts
      • Visual Inspections…
  • 41. Anomalies by Single Field Values
      Identify anomalous values in a given field either by frequency of occurrence or number of standard deviations from the mean.
      … | anomalousvalue action=summary pthresh=0.02 | search isNum=YES
  • 42. Anomalies by Single Field Values
  • 43. Anomalous by Many Values
      Look for small clusters – by content, format, and time – to find anomalies. For example…
      … | cluster … | sort cluster_count
  • 44. Smallest Clusters by Content
      count  label  uri
      1      7      /img/skins/default/bolt.png
      1      37     /en-US/search/inspector?sid=1345075042.125&namespace=search
      1      45     /services/admin/summarization?count=10
      1      53     /services/pdfgen/is_available?viewId=index_status_health&...
      1      57     /static/splunkrc_cmds.xml
  • 45. Small Clusters: Bursts of One
      Find bursts of just a single event, with a pause of more than 2 seconds on either side of it.
      … | transaction maxpause=2s | search eventcount = 1
      Mar 10 16:46:32 willLaptop ntpd[2544]: synchronized to 138.23.180.126…
      Mar 10 16:46:27 willLaptop ntpd[2544]: synchronized to LOCAL(0), stratum…
      Mar 10 16:42:09 willLaptop ntpd[2544]: time reset -0.236567…
  • 46. Burst of One
      Same idea, different data source: splunk
      [11:58:08] "POST /services/search/jobs/export HTTP/1.1" 200 201630 …
      [11:12:51] "POST /services/search/jobs/export HTTP/1.1" 200 459441 …
      [10:00:58] "GET /servicesNS/nobody/SplunkDeploymentMonitor/backfill/…
  • 47. Anomalous by Context
      Identify values not expected by the context of other events.
      … | anomalies field=file labelonly=true maxvalues=10
  • 48. Anomalous by Context
      Unexpectedness  file
      0.00            shelper
      0.16            shelper
      0.00            1345502591.356
      0.00            1345502591.356
      0.00            1345074401.191
      0.00            1345074031.153
      0.03            1345074328.186
      0.00            1345502591.356
      0.35            conf-dm_backfill
      0.00            1345074309.185
      0.00            1345502591.356
      (The slide also carries a “time” label beside the rows.)
  • 49. Surprise Eventtype: Part Deux! Classified major categories of your data with eventtypes? Just search for things that don’t match those eventtypes.
  • 50.
  • 51. Once you can describe anomalous behavior as a search…
  • 52.
  • 53. Other mining commands
      • kmeans: Performs k-means clustering on selected fields.
      • outlier: Removes outlying numerical values.
      • af (analyze fields): Analyzes numerical fields for their ability to predict another discrete field.
      • fieldsummary: Generates summary information for fields.
      • shape: Produces a symbolic 'shape' attribute describing the shape of a numeric multivalued field.
  • 54. Why do I need to visualize my data?
  • 55. Data Mining by Visualization: Visualization can capture nuances in the data that numerical or linguistic summaries cannot easily capture.
  • 56. These data points are radically different. *Source: Anscombe’s Quartet (Anscombe 1973)
  • 57. Why visualize? Because they all have the exact same
      • average (7.50)
      • standard deviation (2.03)
      • least-squares fit (3 + 0.5x).
      Do not just rely on numerical summarization.
  • 58. But I already have charts! You don’t graph enough. Data Exploration: Don’t decide ahead of time what graphs you want. Regularly do out-of-the-box scenarios with graphs.
  • 59. Data Exploration Variations:
      • Subsets of Events (paying customers vs lookers)
      • Fields by Fields (including eventtypes and tags)
      • Ignored fields
      • Min/max/avg/count
      • Compare to other time windows
      • Transactions
  • 60. Visual Arrangement: Sorting data, changing scales (linear/log), and setting min/max can make a huge difference when looking at the same data.
  • 61. Visual Considerations: Pick representations that make obvious the distinctions you need to care about.
  • 62. Summary
  • 63. Summary • Discovery is an iterative process. • Group events by content, format, and time, and define classifications with eventtypes and tags • Focus on promising fields with correlations • Discover unknown anomalies with small clusters. • Visualize your data, from a dozen angles.
  • 64. But wait!
  • 65. More to come: Predictive Analytics … | forecast foo
  • 66. The End Mine the Gap. [ASCII-art banner] Golf clapping at #datamining
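
The claim on slides 56 and 57 can be checked directly. The sketch below, in Python rather than a Splunk search, recomputes the three summaries for the four published Anscombe (1973) datasets. The helper names (`least_squares`, `summarize`, `SUMMARIES`) are illustrative and do not come from the slides.

```python
from statistics import mean, stdev

# Anscombe's quartet (Anscombe 1973): datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def summarize(xs, ys):
    """Numerical summary of one dataset, rounded the way the slide reports it."""
    intercept, slope = least_squares(xs, ys)
    return {
        "mean_y": round(mean(ys), 2),
        "stdev_y": round(stdev(ys), 2),  # sample standard deviation
        "intercept": round(intercept, 1),
        "slope": round(slope, 1),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, summary in SUMMARIES.items():
        print(name, summary)
```

All four datasets produce the same rounded summary, yet a scatter plot of each looks nothing like the others, which is the point of the slide.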

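Slide 47 scores values by how unexpected they are in the context of other events, and the editor's notes describe the idea as checking whether a new value compresses well against a window of the last N events. The Python sketch below illustrates that idea only. It is not Splunk's implementation of the anomalies command: the scoring formula, the use of zlib, and the names `compressed_size`, `unexpectedness`, and `WINDOW` are assumptions, and its scores are not comparable to the numbers on slide 48.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8")))


def unexpectedness(window, value):
    """Score how poorly `value` compresses given a window of recent values.

    The score is the number of extra compressed bytes that `value` adds to
    the window, divided by the compressed size of `value` on its own.
    Values already seen in the window add almost nothing and score near 0.
    """
    context = "\n".join(window)
    alone = compressed_size(value)
    extra = compressed_size(context + "\n" + value) - compressed_size(context)
    return max(0.0, extra / alone)


# Hypothetical window of recent `file` values, all identical.
WINDOW = ["shelper"] * 20

if __name__ == "__main__":
    for candidate in ("shelper", "conf-dm_backfill"):
        print(candidate, round(unexpectedness(WINDOW, candidate), 2))
```

A value that repeats the window costs the compressor almost nothing, while a value never seen before has to be stored nearly in full, so it scores higher.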
Editor's Notes

  1. ----- Meeting Notes (9/7/12 14:21) -----[ASK AUDIENCE -- WHAT IS DATA MINING?]
  2. No. Explicit. Learning nothing new. Not significant in meaning. I’m explicitly telling you what it is. You’re not mining it. By looking at the data, you’re not learning anything new by me saying this is an orange. And frankly it’s not useful.
  3. Regularities, patterns, anomalies that are interesting, meaning not obvious, explicit inferences, and at the same time not coincidental or noisy inferences.
  4. Yellow is Soda. Blue is Pop. Red is Coke.
  5. Before we can really mine a bunch of text for valuable information, we need to do some prep work. We need to understand our data: the dimensions, the sets of values. In Splunk terms: create fields, eventtypes, transactions, etc. By adding fields, you’re mining out dimensions; by adding eventtypes, you’re mining classes; by adding transactions, you’re mining correlations; etc. BUT… Prepping the data for mining is a data mining task of sorts in itself, and the line between understanding your data and mining is really non-existent. This before-work is sometimes called Data Exploration.
  6. The more knowledge you can add to Splunk about your data, the more options you’ll have to analyze it. There may be data cleaning involved.
  7. You can go from groups of events to understanding events to understanding fields to understanding normality/anomalies to generating reports. But the truth is, this is an iterative process. Each step tells you more about something else. (Un)fortunately, this presentation is linear.
  8. Raw values, like raw text.
  9. Make eventtypes for “session opened”, “session closed”, “linux initialized”. Tag them. Then mine out questions like “how long is the average session?”, “how much churn is there?”, etc.
  10. Consider linecount as well.
  11. Make eventtypes or tags for cron jobs, ntpd, dhclient. Then mine out questions like “who is running what jobs?” and “which are the most common?”
  12. One of the most useful ways to see how your individual events relate to each other is to look for pauses in your events, as real, physical events often happen in bursts. For example, there are bursts of log activity: when you shut down a computer; when you access a web page, which has many images; when a car factory robot detects the next car; when you turn on a printer and it connects to your computer; when you scan your security badge.
  13. Make transactions for sessions opening and closing. Find unclosed transactions. How often, how many, by whom?
  14. No reason to limit correlations to a particular data source. Splunk can easily correlate them together in one search. The search shown isn’t correct, in that the dedup is removing important consecutive events, but it was useful for showing small correlated events across sources.
  15. If Facebook had eventtypes, you’d define any picture that has any of your family members but no co-workers as a ‘family’ pix that you could then have a virtual photo album for. Any pix with a family member outside the Bay Area as a “family vacation” pix. When you search your data, you’re essentially weeding out all unwanted events; the results of your search are events that share common characteristics, and you can give them a collective name or “event type”. The names of your event types are added as values into an eventtype field. This means that you can search for, and report on, these groups of events the same way you search for any field. The following example takes you through the steps to save a search as an eventtype and then search for that field. If you run frequent searches to investigate SSH and firewall activities, such as sshd logins or firewall denies, you can save these searches as an event type. Also, if you see error messages that are cryptic, you can save them as an event type with a more descriptive name.
  16. If Facebook had eventtypes, you’d define any picture that has any of your family members but no co-workers as a ‘family’ pix that you could then have a virtual photo album for. Any pix with a family member outside the Bay Area as a “family vacation” pix. When you search your data, you’re essentially weeding out all unwanted events; the results of your search are events that share common characteristics, and you can give them a collective name or “event type”. The names of your event types are added as values into an eventtype field. This means that you can search for, and report on, these groups of events the same way you search for any field. The following example takes you through the steps to save a search as an eventtype and then search for that field. If you run frequent searches to investigate SSH and firewall activities, such as sshd logins or firewall denies, you can save these searches as an event type. Also, if you see error messages that are cryptic, you can save them as an event type with a more descriptive name.
  17. If Facebook had eventtypes, you’d define any picture that has any of your family members but no co-workers as a ‘family’ pix that you could then have a virtual photo album for. Any pix with a family member outside the Bay Area as a “family vacation” pix.
  18. If Facebook had eventtypes, you’d define any picture that has any of your family members but no co-workers as a ‘family’ pix that you could then have a virtual photo album for. Any pix with a family member outside the Bay Area as a “family vacation” pix.
  19. If Facebook had eventtypes, you’d define any picture that has any of your family members but no co-workers as a ‘family’ pix that you could then have a virtual photo album for. Any pix with a family member outside the Bay Area as a “family vacation” pix.
  20. Why? To reduce the number of fields you focus on to those with the most value for analysis and graphing.
  21. Why? To reduce the number of fields you focus on to those with the most value for analysis and graphing.
  22. A 1.0 means two fields always co-occur. For example, Component and Log_Level always co-occur in splunkd.log. You can filter out fields to make this table more manageable.
  23. ----- Meeting Notes (9/4/12 11:49) -----give splunkd example output first to show log
  24. This shows that before we know the component is SavedSplunker, the odds of a WARN Log_Level are 62.25%; afterwards, the odds are 100%. Before we know the component is loader, the odds of an INFO Log_Level are 33.15%; afterwards, 99.44%.
  25. What are anomalies/outliers? The set of data points that are considerably different. Applications: network intrusion detection, fault detection, credit card fraud detection, telecommunication fraud detection. Build a profile of the “normal” behavior (patterns, stats) to detect anomalies. Very often you want to find “problems” in your IT data, but you don’t know what to look for. If you know what to look for, by all means, look.
  26. Very often you want to find “problems” in your IT data, but you don’t know what to look for. If you know what to look for, by all means, look. … | eventstats perc99(spent) as bigspender | where spent > bigspender
  27. Very often you want to find anomalies/problems in your IT data, but you don’t know what to look for. Single Value: ‘port’ value is highly irregular. Many Values: many values look different than others. Anomalous: many values were unexpected by context. Everything applies to transactions as well. Look for anomalies.
  28. Identifies values in the data that are anomalous either by frequency of occurrence or number of standard deviations from the mean. Make searches to find these anomalous values and create alerts.
  29. catNormFreq = the average frequency of non-anomalous values. isNum means all values of the field were numerical. Basically we assume a normal distribution, but if we find that ends up causing too many values to be anomalous, we don’t use it.
  30. Earlier we looked for large clusters to get a broad understanding of the events. We grouped by content, format, and time. Now, just flip it. Make searches to find these anomalous values and create alerts.
  31. Same for form (looking for unusual punctuation) or especially long pauses between events (10 seconds?). Make searches to find these anomalous values and create alerts.
  32. These slow events are often important and indicate longer tasks.
  33. Make eventtypes or tags for these slow, important events. Who runs them most? Are they a problem? Why is someone exporting, or backfilling their data? Make an alert when it happens.
  34. Experimental search command that uses compression and a window of N last events to see if a new command compresses well with past events, or if it looks unexpected. Make searches to find these anomalous values and create alerts.
  35. Make searches to find these anomalous values and create alerts.
  36. One of the most obvious and important methods of discovering what your data is saying is to simply graph your data. Humans have a well-developed ability to analyze large amounts of data presented visually, detecting general patterns and trends, as well as outliers and unusual patterns.
  37. What data points are outliers? What inferences would you make? Radically different.
  38. Limitations of statistical approaches: they usually test a single attribute; distributions aren’t known; for many dimensions, it is hard to estimate the true distribution. Do not just rely on numerical summarization, or you won’t see what’s going on.
  39. Same for transactions of events, and classes of events (eventtypes) and field-values (tags)
  40. Eventually you’ll tweak out little nuggets of knowledge. Over time, what is the average duration users spend on my website by language of country, compared to last month? How does the time on the website correlate with the time of day, or browser? Does the max delay for each server vary over time by language? Same for transactions of events, and classes of events (eventtypes) and field-values (tags).
  41. So reducing the number of dimensions down to 2 or 3 for visualization and limiting the data shown.
  42. Heat map vs much more useful chart
  43. Discovery: Each step tells you more about everything else.
  44. Predicting foo and getting better and better at it, and towards the right edge you can see it’s predicting values that haven’t happened yet.
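
Note 24 reads the field-correlation output as the odds of a Log_Level value before and after the Component is known. The Python sketch below shows that before/after calculation on a handful of made-up events. It is an illustration under stated assumptions, not the Splunk command behind the slide: the function name `prior_and_conditional` and the `EVENTS` sample are hypothetical, and the percentages in note 24 come from real splunkd.log data that is not reproduced here.

```python
def prior_and_conditional(events, given_field, given_value, target_field, target_value):
    """Return (P(target), P(target | given)) estimated from a list of event dicts.

    The first number is the share of all events where target_field equals
    target_value; the second is that share among only the events where
    given_field equals given_value.
    """
    total = len(events)
    target_hits = sum(1 for e in events if e.get(target_field) == target_value)
    matching = [e for e in events if e.get(given_field) == given_value]
    joint = sum(1 for e in matching if e.get(target_field) == target_value)
    prior = target_hits / total if total else 0.0
    conditional = joint / len(matching) if matching else 0.0
    return prior, conditional


# Made-up sample events; the field names follow the ones mentioned in note 22.
EVENTS = [
    {"Component": "SavedSplunker", "Log_Level": "WARN"},
    {"Component": "SavedSplunker", "Log_Level": "WARN"},
    {"Component": "loader", "Log_Level": "INFO"},
    {"Component": "Metrics", "Log_Level": "INFO"},
]

if __name__ == "__main__":
    before, after = prior_and_conditional(
        EVENTS, "Component", "SavedSplunker", "Log_Level", "WARN"
    )
    print(f"P(WARN) = {before:.2%}, P(WARN | SavedSplunker) = {after:.2%}")
```

A large jump from the first number to the second marks a field pair worth keeping for analysis and graphing, which is the filtering goal stated in notes 20 and 21.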