Welcome back to my Splunk series. Let’s continue our journey with Splunk.
What data can Splunk ingest? Let us take a view of Splunk from 1,000 feet.
One thing Splunk strives for is the ability to ingest any data. Splunk software collects and indexes data from virtually any source, whether structured in databases, unstructured in a data lake, or previously unknown (dark). Machine data is dark: this high-volume, high-velocity data is highly variable and incredibly diverse, and it simply overwhelms traditional system management, SIEM, CEP/ECA, and log management tools.
The green box in the image gives you an idea of devices and infrastructure that you can gather data from.
Why do I need infrastructure? You need to make this infrastructure decision up front: do you want your data in a single instance, or distributed? If you choose a single instance, you cannot simply change it later; you have to build a new infrastructure for a distributed deployment.
In the image above you can see some recommended specs; we will talk about those later in this blog.
Architecture is a big subject when it comes to Splunk. Each piece has a responsibility, and not all pieces are required. Here is a high-level architecture of Splunk.
Required fundamental components
The indexer processes the incoming data in real time. It also stores and indexes the data on disk. An indexer has two types:
- independent (non-clustered) indexer
- peer node in a cluster
End users interact with Splunk through search head(s). A search head lets users search, analyze, and visualize data. A search head has five types:
- Independent search head
- A search head node of an indexer cluster
- A member of a search head cluster
- A search head node of an indexer cluster and a member of a search head cluster
- A member of a search head pool
A forwarder collects data from remote machines and then forwards it to the indexer in real time. A forwarder has three types:
- Universal Forwarders provide reliable, secure data collection from remote sources and forward that data into Splunk software for indexing and consolidation. They can scale to tens of thousands of remote systems, collecting terabytes of data.
- A heavy forwarder has a smaller footprint than a Splunk Enterprise indexer but retains most of the capabilities of an indexer. An exception is that it cannot perform distributed searches. You can disable some services, such as Splunk Web, to further reduce its footprint size. Unlike other forwarder types, a heavy forwarder parses data before forwarding it and can route data based on criteria such as source or type of event. It can also index data locally while forwarding the data to another indexer.
- A light forwarder has a smaller footprint with much more limited functionality. It forwards only unparsed data. The universal forwarder, which provides very similar functionality, supersedes it. The light forwarder has been deprecated but continues to be available mainly to meet legacy needs.
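To make the routing idea concrete, here is a minimal Python sketch of how a heavy forwarder might pick a destination based on the source or type of an event. This is a toy model, not Splunk's API or configuration format: the `Event` class, the `ROUTES` table, and the indexer group names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    source: str      # where the data came from, e.g. a file path
    sourcetype: str  # the kind of data, e.g. "syslog"
    raw: str         # the raw event text


# Hypothetical routing table: (event field, value to match, destination group).
ROUTES = [
    ("sourcetype", "syslog", "security_indexers"),
    ("source", "/var/log/nginx/access.log", "web_indexers"),
]
DEFAULT_GROUP = "default_indexers"


def route(event: Event) -> str:
    """Return the (made-up) indexer group this event would be forwarded to."""
    for field, value, group in ROUTES:
        if getattr(event, field) == value:
            return group
    return DEFAULT_GROUP


events = [
    Event("/var/log/messages", "syslog", "sshd: failed login"),
    Event("/var/log/nginx/access.log", "nginx", "GET /index.html 200"),
    Event("/opt/app/app.log", "app", "service started"),
]
destinations = [route(e) for e in events]
print(destinations)
```

The first matching rule wins, and anything that matches no rule falls through to the default group. A universal forwarder, by contrast, would send the unparsed data on without this kind of per-event decision.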
The deployment server helps to deploy configuration. For example, it can update the universal forwarder configuration file. We can use a deployment server to share configuration between components.
The master node manages the cluster. It coordinates the replicating activities of the peer nodes and tells the search head where to find data. It also helps manage the configuration of peer nodes and orchestrates remedial activities if a peer goes offline.
Unlike the peer nodes, the master does not index external data. A cluster has exactly one master node.
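The coordination role of the master node can be pictured with a small Python sketch. It is only a toy model under my own assumptions: the function names, the round-robin placement, and the `replication_factor` parameter are illustrative and do not reflect how Splunk actually schedules replication.

```python
from typing import Dict, List, Set


def assign_copies(buckets: List[str], peers: List[str],
                  replication_factor: int) -> Dict[str, Set[str]]:
    """Place each bucket on `replication_factor` distinct peers, round-robin."""
    if replication_factor > len(peers):
        raise ValueError("replication factor exceeds number of peers")
    placement: Dict[str, Set[str]] = {}
    for i, bucket in enumerate(buckets):
        placement[bucket] = {
            peers[(i + k) % len(peers)] for k in range(replication_factor)
        }
    return placement


def handle_peer_loss(placement: Dict[str, Set[str]], peers: List[str],
                     lost_peer: str, replication_factor: int) -> Dict[str, Set[str]]:
    """Drop the lost peer, then top up under-replicated buckets from survivors."""
    survivors = [p for p in peers if p != lost_peer]
    for holders in placement.values():
        holders.discard(lost_peer)
        for candidate in survivors:
            if len(holders) >= replication_factor:
                break
            holders.add(candidate)  # no-op if the candidate already holds a copy
    return placement


peers = ["peer1", "peer2", "peer3"]
placement = assign_copies(["b1", "b2", "b3"], peers, replication_factor=2)
placement = handle_peer_loss(placement, peers, "peer2", replication_factor=2)
print(placement)
```

Note that the coordinator here never stores bucket data itself; it only tracks which peer holds which copy, which mirrors the point that the master does not index external data.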
The deployer is a Splunk Enterprise instance that distributes apps and other configurations to the members of a search head cluster. It stands outside the cluster and cannot run on the same instance as a cluster member. It can, however, under some circumstances, reside on the same instance as other Splunk Enterprise components, such as a deployment server or an indexer cluster master node.
The license master controls one or more license slaves. From the license master, you can define stacks, pools, add licensing capacity, and manage license slaves.
The Monitoring Console is a search-based monitoring tool that lets you view detailed information about the topology and performance of your Splunk Enterprise deployment. The Monitoring Console provides pre-built dashboards that give you visibility into many areas of your deployment, including search and indexing performance, resource usage, license usage, and more. You can use the Monitoring Console to track the status of all types of deployment topologies, from single-instance (standalone) deployments to complex multi-site indexer clusters.
To support larger environments where data originates on many machines in many locations and where many users across the globe need to search the data, you can scale your deployment by distributing Splunk instances across multiple machines and multiple locations.
Splunk performs three key functions as it processes data:
- It ingests data
- It parses and indexes the data
- It runs searches on the indexed data
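The three functions above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how Splunk implements them: the "timestamp message" line format, the function names, and the word-level inverted index are all assumptions made for the example.

```python
from collections import defaultdict


def ingest(raw_text):
    """Ingest: split a raw stream into non-empty lines."""
    return [line for line in raw_text.splitlines() if line.strip()]


def parse(line):
    """Parse: split an assumed 'timestamp message' line into an event dict."""
    timestamp, _, message = line.partition(" ")
    return {"time": timestamp, "message": message}


def build_index(events):
    """Index: map each lowercase word to the positions of events containing it."""
    index = defaultdict(set)
    for pos, event in enumerate(events):
        for token in event["message"].lower().split():
            index[token].add(pos)
    return index


def search(index, events, term):
    """Search: look the term up in the index instead of scanning every event."""
    return [events[pos] for pos in sorted(index.get(term.lower(), set()))]


raw = """2024-01-01T00:00:00 ERROR disk full
2024-01-01T00:00:05 INFO backup started
2024-01-01T00:00:09 ERROR backup failed
"""
events = [parse(line) for line in ingest(raw)]
index = build_index(events)
hits = search(index, events, "error")
print([hit["time"] for hit in hits])
```

Because each stage only depends on the output of the previous one, the stages can in principle run on different machines, which is the idea behind splitting them across specialized instances.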
To scale Splunk, you can split this functionality across multiple specialized instances of Splunk. These instances can range in number from just a few to many thousands, depending on the quantity of data that you are dealing with and other variables in your environment. Below is an image of a possible Splunk deployment across multiple regions.
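As a rough picture of what splitting search across instances means, the Python sketch below has a "search head" function send the same query to several "indexers" and merge the partial results by time. Everything here is hypothetical: the indexer names, the in-memory event lists, and the substring match stand in for real instances and real searches.

```python
import heapq

# Each hypothetical indexer holds its own slice of the data as
# (timestamp, message) tuples, already sorted by timestamp.
INDEXERS = {
    "idx-us": [(1, "login ok"), (4, "error: timeout")],
    "idx-eu": [(2, "error: disk full"), (5, "logout")],
    "idx-ap": [(3, "error: denied")],
}


def search_peer(events, term):
    """Run the search locally on one indexer's slice of the data."""
    return [event for event in events if term in event[1]]


def distributed_search(indexers, term):
    """Search head role: fan the query out, then merge the sorted partial results."""
    partials = [search_peer(events, term) for events in indexers.values()]
    return list(heapq.merge(*partials))


results = distributed_search(INDEXERS, "error")
print(results)
```

Each indexer only scans its own data, so adding indexers spreads both the storage and the search work, while the search head stays responsible for combining the answers.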
You should now have a 101-level understanding of Splunk and its architecture. In the next Splunk blogs, I will dive deeper into the information that is required to size a Splunk environment correctly.