Splunk Interview Q & A

Part 1:

1) Define Splunk

It is a software technology that is used for searching, visualizing, and monitoring machine-generated big data. It monitors and reads different types of log files and stores the data in indexers.

2) List out common ports used by Splunk.

Common ports used by Splunk are as follows:

  • Web Port: 8000
  • Management Port: 8089
  • Network port: 514
  • Index Replication Port: 8080
  • Indexing Port: 9997
  • KV store: 8191

3) Explain Splunk components

The fundamental components of Splunk are:

  • Universal forwarder: A lightweight component that forwards data to a Splunk indexer or to another forwarder.
  • Heavy forwarder: A heavier component that can parse the data and filter it so that only the required data is forwarded.
  • Search head: This component is used to gain intelligence and perform reporting.
  • License manager: Licensing is based on data volume and usage, for example 50 GB per day. Splunk regularly checks the licensing details.
  • Load balancer: In addition to the default Splunk load balancer, you can also use your own load balancer.

4) What do you mean by Splunk indexer?

It is a component of Splunk Enterprise which creates and manages indexes. The primary functions of an indexer are 1) Indexing raw data into an index and 2) Search and manage Indexed data.

5) What are the disadvantages of using Splunk?

Some disadvantages of using Splunk tool are:

  • Splunk can prove expensive for large data volumes.
  • Dashboards are functional but not as effective as some other monitoring tools.
  • Its learning curve is steep, and you need Splunk training because it has a multi-tier architecture, so you need to spend a lot of time learning the tool.
  • Searches are difficult to understand, especially regular expressions and search syntax.

6) What are the pros of getting data into a Splunk instance using forwarders?

The advantages of getting data into Splunk via forwarders are TCP connection, bandwidth throttling, and secure SSL connection for transferring crucial data from a forwarder to an indexer.

7) What is the importance of license master in Splunk?

License master in Splunk ensures that the right amount of data gets indexed. It ensures that the environment remains within the limits of the purchased volume as Splunk license depends on the data volume, which comes to the platform within a 24-hour window.

8) Name some important configuration files of Splunk

Commonly used Splunk configuration files are:

  • Inputs file (inputs.conf)
  • Transforms file (transforms.conf)
  • Server file (server.conf)
  • Indexes file (indexes.conf)
  • Props file (props.conf)

9)  Explain license violation in Splunk.

A license violation occurs when you exceed your licensed data limit. Each time this happens, Splunk raises a license warning, which persists for 14 days. With a commercial license, you may have 5 warnings within a rolling 1-month window before search results and reports on the indexer stop triggering.

In the free version, however, only 3 warnings are allowed.

10) What is the use of Splunk alert?

Alerts can be used when you have to monitor for and respond to specific events. For example, sending an email notification to the user when there are more than three failed login attempts in a 24-hour period.
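The failed-login example can be sketched in a few lines of Python. This is only an illustration of the alert condition, not how Splunk evaluates alerts internally; the function name, the threshold, and the sample timestamps are assumptions made for the example.

```python
from datetime import datetime, timedelta

def should_alert(failed_logins, now, threshold=3, window=timedelta(hours=24)):
    """Return True when more than `threshold` failures fall inside the window."""
    recent = [t for t in failed_logins if now - window <= t <= now]
    return len(recent) > threshold

# Hypothetical failed-login timestamps: 1, 5, 10, 23, and 30 hours ago.
now = datetime(2021, 6, 21, 12, 0)
attempts = [now - timedelta(hours=h) for h in (1, 5, 10, 23, 30)]
print(should_alert(attempts, now))
```

In Splunk itself, the same condition would be expressed as a saved search with a trigger condition and an email action.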

11) Explain map-reduce algorithm

The map-reduce algorithm is a technique used by Splunk to increase data searching speed. It is inspired by two functional programming functions: 1) map() and 2) reduce().

Here map() function is associated with Mapper class and reduce() function is associated with a Reducer class.
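As a rough illustration of the two functions (not Splunk's internal implementation), the Python sketch below maps each event to a (host, 1) pair and then reduces the pairs into per-host counts. The events and field names are made up for the example.

```python
from functools import reduce

# Hypothetical events; in Splunk these would be read from the indexers.
events = [
    {"host": "web01", "status": 200},
    {"host": "web02", "status": 500},
    {"host": "web01", "status": 404},
]

def to_pair(event):
    """Map step: emit a (key, 1) pair for each event."""
    return (event["host"], 1)

def merge(counts, pair):
    """Reduce step: fold one pair into the running totals."""
    key, value = pair
    counts[key] = counts.get(key, 0) + value
    return counts

host_counts = reduce(merge, map(to_pair, events), {})
print(host_counts)
```

The map step can run independently on each chunk of data, which is what makes the approach suitable for distributed searches.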

12) Explain different types of data inputs in Splunk?

Following are different types of data inputs in Splunk:

  • Using files and directories as input
  • Configuring Network ports to receive inputs automatically
  • Add windows inputs. These windows inputs are of four types: 1) active directory monitor, 2) printer monitor, 3) network monitor, and 4) registry inputs monitor.

13) How Splunk avoids duplicate log indexing?

Splunk keeps track of indexed files in the fishbucket directory. It contains CRCs and seek pointers for the files you are indexing, so Splunk can tell if it has read them already.
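The idea of pairing a CRC with a seek pointer can be sketched as follows. This is a toy model, not Splunk's actual on-disk format: the class name, the in-memory dictionary, and the 8-byte checksum length are assumptions chosen to keep the example small.

```python
import zlib

CRC_BYTES = 8  # assumption for the demo: checksum only the first 8 bytes

class FishbucketSketch:
    """Toy tracker that maps a file-head CRC to a seek pointer."""

    def __init__(self):
        self.seek_pointers = {}

    def read_new(self, data: bytes) -> bytes:
        """Return only the bytes not yet seen for this file."""
        crc = zlib.crc32(data[:CRC_BYTES])
        offset = self.seek_pointers.get(crc, 0)
        self.seek_pointers[crc] = len(data)
        return data[offset:]

tracker = FishbucketSketch()
first = tracker.read_new(b"line1\nline2\n")          # new file: everything is read
second = tracker.read_new(b"line1\nline2\nline3\n")  # same file grew: only the tail is read
print(first, second)
```

Because the checksum identifies the file and the pointer records how far it has been read, re-reading an unchanged file yields nothing new, which is how duplicate indexing is avoided.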

14) Explain pivot and data models.

Pivots are used to create the front views of your output and then choose the proper filter for a better view of this output. Both options are beneficial for the people from a semi-technical or non-technical background.

Data models are most commonly used for creating a hierarchical model of data. However, it can also be used when you have a large amount of unstructured data. It helps you make use of that information without using complicated search queries.

15) Explain search factor and replication factor?

The search factor determines the number of searchable copies of data that the indexer cluster maintains, that is, how many copies of each bucket are searchable.

Replication factor determines the number of copies maintained by the cluster as well as the number of copies that each site maintains.

16) What is the use of lookup command?

Lookup command is generally used when you want to get some fields from an external file. It helps you to narrow the search results as it helps to reference fields in an external file that match fields in your event data.

17) Explain default fields for an event in Splunk

There are 5 default fields that are attached to every event indexed by Splunk. They are: 1) host, 2) source, 3) sourcetype, 4) index, and 5) timestamp.

18) How can you extract fields?

You can extract fields through the UI from the fields sidebar, the event list, or the Settings menu.

Another way to extract fields in Splunk is to write your regular expressions in a props configuration file.

19) What do you mean by summary index?

A summary index is a special index that stores the results calculated by Splunk. It is a fast and cheap way to run a query over a longer period of time.

20) How to prevent events from being indexed by Splunk?

You can prevent events, such as debug messages, from being indexed by routing them to the null queue. The null queue routing is defined in the transforms.conf file on the instance that parses the data, typically a heavy forwarder or an indexer.

21) Define Splunk DB connect

It is a SQL database plugin that enables you to import tables, rows, and columns from a database. Splunk DB Connect helps provide reliable and scalable integration between databases and Splunk Enterprise.

22) Define Splunk buckets

A bucket is a directory that Splunk Enterprise uses to store raw data and index files. An index consists of many buckets, which are managed according to the age of the data.

23) What is the function of Alert Manager?

The alert manager adds workflow to Splunk. The purpose of the alert manager is to provide a common app with dashboards to search for alerts or events.

24) How can you troubleshoot Splunk performance issues?

There are three ways to troubleshoot Splunk performance issues:

  • Check for server performance issues.
  • Check for errors in splunkd.log.
  • Install the Splunk on Splunk (SOS) app and check for warnings and errors in its dashboard.

25) What is the difference between Index time and Search time?

Index time is the period between when the data is consumed and when it is written to disk. Search time is when a search is run, as events are retrieved by the search.

26) How to reset the Splunk administrator password?

In order to reset the administrator password, perform the following steps:

  1. Log in to the server on which Splunk is installed.
  2. Rename the password file and then restart Splunk.
  3. After this, you can sign in with the username admin (or administrator) and the password changeme.

27)  Name the commands in the “filtering results” category

The commands in the “filtering results” category are “where,” “sort,” “rex,” and “search.”

28)  List out different types of Splunk licenses

The types of Splunk licenses are as follows:

  • Free license
  • Beta license
  • Search heads license
  • Cluster members license
  • Forwarder license
  • Enterprise license

29) List out the number of categories of the SPL commands.

The SPL commands are classified into five categories:

1) Filtering Results, 2) Sorting Results, 3) Grouping Results, 4) Adding Fields, and 5) Reporting Results.

30) What is eval command?

This command is used to calculate an expression. The eval command evaluates Boolean, string, and mathematical expressions. You can use multiple eval expressions in a single search by separating them with commas.

31) Name commands which are included in the reporting results category

Following are the commands which are included in the reporting results category:

  • rare
  • chart
  • timechart
  • top
  • stats

32) What is SOS?

Splunk on Splunk or SOS is a Splunk app that helps you to analyze and troubleshoot Splunk environment performance and issues.

33) What is a replace command?

This command searches and replaces specified field values with replacement values.

34) Name features which are not available in Splunk free version?

Splunk free version lacks the following features:

  • Distributed searching
  • Forwarding in HTTP or TCP
  • Agile statistics and reporting with Real-time architecture
  • Offers analysis, search, and visualization capabilities to empower users of all types.
  • Generate ROI faster

35) What is a null queue?

A null queue is an approach to filter out unwanted incoming events before Splunk Enterprise indexes them.

36) Explain types of search modes in Splunk?

There are three types of search modes. They are:

  • Fast mode: It increases the searching speed by limiting search data.
  • Verbose mode: This mode returns all possible fields and event data.
  • Smart mode: It is a default setting in a Splunk app. Smart mode toggles the search behavior based on transforming commands.

37) What is the main difference between source & source type

The source identifies where a particular event originates, while the sourcetype determines how Splunk processes the incoming data stream into events according to its nature.

38) What is a join command?

It is used to combine the results of a sub search with the results of the actual search. Here the fields must be common to each result set. You can also combine a search set of results to itself using the selfjoin command in Splunk.

39) How to start and stop Splunk service?

To start and stop Splunk services, you can use the following commands:

./splunk start

./splunk stop

40)  Where to download Splunk Cloud?

Visit website: https://www.splunk.com/ to download a free trial of Splunk Cloud.

41) What is the difference between stats and timechart command?

Parameter | Stats | Timechart
Purpose | Represents numerical data in tabular format. | Represents search results in a graphical view.
Fields usage | Can use more than one field. | Uses _time as the default field in the graph.

42) Define deployment server

Deployment server is a Splunk instance that acts as a centralized configuration manager. It is used to deploy the configuration to other Splunk instances.

43) What is Time Zone property in Splunk?

Time zone property provides the output for a specific time zone. Splunk takes the default time zone from browser settings. The browser takes the current time zone from the computer system, which is currently in use. Splunk takes that time zone when users are searching and correlating bulk data coming from other sources.

44) What is Splunk DB Connect?

Splunk DB Connect is a plugin that allows you to add database information to Splunk reports. It helps provide reliable and scalable integration between relational databases and Splunk Enterprise.

45) How to install forwarder remotely?

You can make use of a bash script in order to install forwarder remotely.

46) What is the use of syslog server?

A syslog server is used to collect data from various devices like routers and switches, as well as application logs from web servers. You can use rsyslog or syslog-ng to configure a syslog server.

47) How to monitor forwarders?

Use the forwarder tab available on the DMC (Distributed Management Console) to monitor the status of forwarders and the deployment server to manage them.

48) What is the use of Splunk btool?

It is a command-line tool that is designed to solve configuration related issues.

49) Name Splunk alternatives

Some Splunk alternatives are:

  • Sumo logic
  • Loglogic
  • Loggly
  • Logstash

50) What is KV store in Splunk?

The Key Value (KV) store allows you to store and retrieve data inside Splunk. The KV store also helps you to:

  • Manage job queue
  • Store metadata
  • Examine the workflow

51) What do you mean by deployer in Splunk?

The deployer is a Splunk Enterprise instance that is used to deploy apps to search head cluster members. It can also be used to distribute app and user configurations.

52) When to use auto_high_volume in Splunk?

It is used for high-volume indexes, i.e., indexes that receive roughly 10 GB of data or more per day.

53) What is the stats command?

It is a Splunk command that is used to arrange report data in tabular format.

54) What is a regex command?

Regex command removes results which do not match with desired regular expression.

55) What is the inputlookup command?

This Splunk command returns the contents of a lookup table as search results.

56) What is the outputlookup command?

The outputlookup command writes search results to a lookup table on disk.

57) List out various stages of bucket lifecycle

Stages of bucket lifecycle are as follows:

  • Hot
  • Warm
  • Cold
  • Frozen
  • Thawed

58) Name stages of Splunk indexer

Stages of Splunk indexer are:

  • Input
  • Parsing
  • Indexing
  • Searching

59) Explain the distinction between Splunk and Spark

Parameter | Splunk | Spark
Purpose | Collects a large amount of computer-generated data. | Used for big data processing.
Preference | Can be integrated easily with Hadoop. | More widely preferred and can be used with Apache projects.
Mode | Streaming mode | Streaming as well as batch mode

60) Explain how Splunk works?

There are three phases in which Splunk works:

  • First phase: It gathers the data needed to answer a query from various sources.
  • Second phase: It uses the data to answer the query.
  • Third phase: It displays the answers as a graph, report, or chart that the audience can understand.

61) What are the three versions of Splunk?

Splunk is available in three different versions. These versions are 1) Splunk Enterprise, 2) Splunk Light, and 3) Splunk Cloud.

  • Splunk Enterprise: The Enterprise edition is used by many IT organizations. It helps you analyze data from various websites and applications.
  • Splunk Cloud: Splunk Cloud is a SaaS (Software as a Service) offering. It provides almost the same features as the Enterprise version, including APIs, SDKs, and apps.
  • Splunk Light: Splunk Light is a free version that allows you to search, report on, and edit your log data. It has limited functionality and features compared to the other versions.

62) Name companies which are using Splunk

Well known companies which are using Splunk tool are:

  • Cisco
  • Facebook
  • Bosch
  • Adobe
  • IBM
  • Walmart
  • Salesforce

63) What is SPL?

Search Processing Language, or SPL, is a language that contains functions, commands, and arguments. It is used to get the desired output from the indexed data.

64) Define monitoring in Splunk

Monitoring is a term related to reports you can visually monitor.

65) Name the domain in which knowledge objects can be used

Following are a few domains in which knowledge objects can be used:

  • Application Monitoring
  • Employee Management
  • Physical Security
  • Network Security

66) How many roles are there in Splunk?

There are three roles in Splunk: 1) Admin, 2) Power, and 3) User.

67) Are search terms in Splunk case sensitive?

No, Search terms in Splunk are not case sensitive.

68) Can search results be used to change the existing search?

Yes, the search result can be used to make changes in an existing search.

69)  List out layout options for search results.

Following are a few layout options for search result:

  • List
  • Table
  • Raw

70) What are the formats in which search result be exported?

The search result can be exported into JSON, CSV, XML, and PDF.

71) Explain types of Boolean operators in Splunk.

Splunk supports three types of Boolean operators; they are:

  • AND: It is implied between two terms, so you do not need to write it.
  • OR: It determines that either one of the two arguments should be true.
  • NOT: used to filter out events having a specific word.

72) Explain the use of top command in Splunk

The top command is used to display the common values of a field, with their percentage and count.
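The output shape described above (value, count, and percentage) can be imitated in a few lines of Python. This is only a sketch of the idea; the function name, the limit parameter, and the sample status codes are assumptions for illustration.

```python
from collections import Counter

def top(values, limit=10):
    """Return the most common values with their count and percentage."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        {"value": v, "count": c, "percent": round(100.0 * c / total, 2)}
        for v, c in counts.most_common(limit)
    ]

# Hypothetical values of a "status" field across five events.
statuses = ["200", "200", "404", "200", "500"]
rows = top(statuses, limit=2)
print(rows)
```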

73) What is the use of stats command?

It calculates aggregate statistics over a dataset, such as count, sum, and average.

74) What are the types of alerts in Splunk?

There are mainly three types of alerts available in Splunk:

  • Scheduled alert: It is an alert that is based on a historical search. It runs periodically with a set schedule.
  • Per-result alert: This alert is based on a real-time search that runs over all time.
  • Rolling window alert: An alert that is based on real-time search. This search is set to run within a specific rolling time window that you define.

75) List various types of Splunk dashboards.

  • Dynamic form-based dashboards
  • Dashboards as scheduled reports
  • Real time dashboards

76) What is the use of tags in Splunk?

They are used to assign names to specific field and value pairs. The field can be event type, source, source type, or host.

77) How to increase the size of Splunk data storage?

In order to increase the size of data storage, you can either add more space to index or add more indexers.

78) Distinguish between Splunk apps and add-ons

The main difference between Splunk apps and add-ons is that apps contain built-in reports, configurations, and dashboards, whereas add-ons contain only built-in configurations and no dashboards or reports.

79) Define dispatch directory in Splunk?

The dispatch directory stores the status of searches, such as running or completed, along with their results.

80) What is the primary difference between stats and eventstats commands

The stats command provides summary statistics of the existing fields in the search output and stores them as values in new fields. With the eventstats command, on the other hand, the aggregation results are added inline to every event, and only if the aggregation applies to that particular event.
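The difference is easier to see side by side. The Python sketch below imitates the two behaviors on a small, made-up list of events; the function and field names are hypothetical and the code is not how Splunk implements either command.

```python
from collections import defaultdict

# Hypothetical events with a numeric "bytes" field.
events = [
    {"host": "web01", "bytes": 100},
    {"host": "web01", "bytes": 300},
    {"host": "web02", "bytes": 50},
]

def avg_by_host(rows):
    """Compute the average of "bytes" for each host."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        totals[row["host"]][0] += row["bytes"]
        totals[row["host"]][1] += 1
    return {host: s / n for host, (s, n) in totals.items()}

def stats_like(rows):
    """One output row per group; the original events are not returned."""
    return [{"host": h, "avg_bytes": a} for h, a in avg_by_host(rows).items()]

def eventstats_like(rows):
    """Every original event is kept and gains the aggregate as a new field."""
    averages = avg_by_host(rows)
    return [{**row, "avg_bytes": averages[row["host"]]} for row in rows]

print(stats_like(events))
print(eventstats_like(events))
```

The first call returns two summary rows, while the second returns all three events, each annotated with its host's average.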

81) What do you mean by source type in Splunk?

Source type is a default field that identifies the data structure of an event. It determines how Splunk formats the data while indexing.

82) Define calculated fields?

Calculated fields are fields that perform a calculation with the values of two fields available in a specific event.

83) List out some Splunk search commands

Following are some search commands available in Splunk:

  • Abstract
  • Erex
  • Addtotals
  • Accum
  • Filldown
  • Typer
  • Rename
  • Anomalies

84) What does xyseries command do?

xyseries command converts the search results into a format that is suitable for graphing.

85)  What is the use of spath command?

spath command is used to extract fields from structured data formats like JSON and XML.

86) How to add summary statistics to all results in a streaming manner?

In order to add summary statistics in results, you can use streamstats.
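"Streaming" here means that each result is annotated using only the results seen so far. A minimal Python sketch of a running total, with a hypothetical function name and made-up field names, could look like this:

```python
def streamstats_sum(rows, field, out_field="running_total"):
    """Yield each row with a cumulative sum of `field` up to that row."""
    total = 0
    for row in rows:
        total += row[field]
        yield {**row, out_field: total}

# Hypothetical events in the order they are returned by a search.
events = [{"bytes": 10}, {"bytes": 20}, {"bytes": 5}]
for row in streamstats_sum(events, "bytes"):
    print(row)
```

Unlike an aggregate computed over the full result set, the value attached to each row depends on the row's position in the stream.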

87) Where to create knowledge objects, dashboards, and reports?

You can create knowledge objects, reports, and dashboards in the Search & Reporting app.

88) What is table command?

This command returns a table that contains only the fields specified in the argument list.

89) How to remove duplicate events having common values?

Use dedup command to remove duplicate events having common values.

90) What is the main difference between sort + and sort -?

  • sort + displays search results in ascending order.
  • sort - displays search results in descending order.

91) Define reports in Splunk

They are results saved from a search action that shows the visualization and statistic of a particular event.

92) Define dashboard in Splunk

The dashboard is defined as a collection of views that are made of various panels.

93) What is the use of instant pivot in Splunk?

It is used to work with data without creating any data model.

Instant pivot is available to all users.

94) How is it possible to use the host value and not IP address or the DNS name for a TCP input?

Under stanza in the input configuration file, set the connection_host to none and mention the host value.

95) What is the full form of LDAP?

LDAP stands for Lightweight Directory Access Protocol

96) Define search head pooling

It is a group of servers connected with each other. These servers are used to share configuration, user data, and load.

97) Define search head clustering

It is a group of Splunk enterprise search heads that serves as a central resource for searching.

98) What is the full form of REST?

REST stands for Representational State Transfer.

99) Explain Splunk SDKs

Splunk SDKs are written on the base of Splunk REST APIs. Various languages supported by SDKs are: 1) Java, 2) Python, 3) JavaScript, and 4) C#.

100) Explain Splunk REST API

The Splunk REST API offers various processes for accessing every feature available in the product. Your program communicates to Splunk enterprise using HTTP or HTTPS. It uses the same protocols that any web browser uses to interact with web pages.
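To make the HTTP interaction concrete, the sketch below builds (but does not send) a request that would create a search job. It assumes the management port 8089, the /services/search/jobs endpoint, and basic authentication; the host, credentials, and query are placeholders, and the details should be checked against the REST API reference for your Splunk version.

```python
import base64
import urllib.parse
import urllib.request

def build_search_request(base_url, username, password, query):
    """Build a POST request that would create a search job (nothing is sent)."""
    body = urllib.parse.urlencode({"search": query, "output_mode": "json"}).encode()
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/services/search/jobs",  # assumed endpoint path
        data=body,
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )

# Placeholder host and credentials, for illustration only.
request = build_search_request(
    "https://localhost:8089", "admin", "changeme", "search index=_internal | head 5"
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; a real deployment would also need to handle TLS certificates and should avoid hard-coded credentials.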

101) What is data model acceleration in Splunk Enterprise Security?

Splunk Enterprise Security accelerates data models to provide results for panels, dashboards, and correlation searches. Data model acceleration uses the indexers for processing and storage. The accelerated data is stored within each index by default.

102) Explain how indexer stores various indexes?

Indexers create various files which contain two types of data: 1) Raw data and 2) metadata index file. Both these files are used to constitute Splunk enterprise index.

Part 2:

  1. Compare Splunk with Spark.

Criteria | Splunk | Spark
Deployment area | Collecting large amounts of machine-generated data | Iterative applications and in-memory processing
Nature of tool | Proprietary | Open-source
Working mode | Streaming mode | Both streaming and batch modes

  1. What is Splunk?

Splunk is ‘Google’ for our machine-generated data. It’s a software/engine that can be used for searching, visualizing, monitoring, reporting, etc. of our enterprise data. Splunk takes valuable machine data and turns it into powerful operational intelligence by providing real-time insights into our data through charts, alerts, reports, etc.

  1. What are the common port numbers used by Splunk?

Below are the common port numbers used by Splunk. However, we can change them if required.

Service | Port Number Used
Splunk Web port | 8000
Splunk Management port | 8089
Splunk Indexing port | 9997
Splunk Index Replication port | 8080
Splunk Network port | 514 (Used to get data from the Network port, i.e., UDP data)
KV Store | 8191

  1. What are the components of Splunk? Explain Splunk architecture.

This is one of the most frequently asked Splunk interview questions. Below are the components of Splunk:

  • Search Head: Provides the GUI for searching
  • Indexer: Indexes the machine data
  • Forwarder: Forwards logs to the Indexer
  • Deployment Server: Manages Splunk components in a distributed environment
  1. Which is the latest Splunk version in use?

Splunk 8.2.1 (as of June 21, 2021)

  1. What is Splunk Indexer? What are the stages of Splunk Indexing?

Splunk Indexer is the Splunk Enterprise component that creates and manages indexes. The primary functions of an indexer are:

  • Indexing incoming data
  • Searching the indexed data

  1. What is a Splunk Forwarder? What are the types of Splunk Forwarders?

There are two types of Splunk Forwarders as below:

  • Universal Forwarder (UF): The Splunk agent installed on a non-Splunk system to gather data locally; it can’t parse or index data.
  • Heavyweight Forwarder (HWF): A full instance of Splunk with advanced functionalities. It generally works as a remote collector, intermediate forwarder, and possible data filter. Since it parses data, it is not recommended for production systems.

  1. Can you name a few most important configuration files in Splunk?
  • props.conf
  • indexes.conf
  • inputs.conf
  • transforms.conf
  • server.conf
  1. What are the types of Splunk Licenses?
  • Enterprise license
  • Free license
  • Forwarder license
  • Beta license
  • Licenses for search heads (for distributed search)
  • Licenses for cluster members (for index replication)
  1. What is Splunk App?

Splunk app is a container/directory of configurations, searches, dashboards, etc. in Splunk.

  1. Where is Splunk Default Configuration stored?

$SPLUNK_HOME/etc/system/default

  1. What are the features not available in Splunk Free?

Splunk Free does not include below features:

  • Authentication and scheduled searches/alerting
  • Distributed search
  • Forwarding in TCP/HTTP (to non-Splunk)
  • Deployment management
  1. What happens if the License Master is unreachable?

If the license master is not available, the license slave will start a 72-hour timer, after which search will be blocked on the license slave (though indexing continues). Users will not be able to search for data on that slave until it can reach the license master again.

  1. What is Summary Index in Splunk?

The summary index is the default index that Splunk Enterprise uses for summary data if we do not indicate another one.

If we plan to run a variety of summary index reports, we may need to create additional summary indexes.

  1. What is Splunk DB Connect?

Splunk DB Connect is a generic SQL database plugin for Splunk that allows us to easily integrate database information with Splunk queries and reports.

  1. Can you write down a general regular expression for extracting the IP address from logs?

There are multiple ways in which we can extract the IP address from logs. Below are a few examples:

By using a regular expression:

rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"

OR

rex field=_raw "(?<ip_address>([0-9]{1,3}[\.]){3}[0-9]{1,3})"
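The same pattern can be tried outside Splunk. The Python sketch below uses the standard re module, where named groups are written (?P<name>...) rather than (?<name>...); the sample log line is made up. Like the rex examples, it matches anything shaped like an IPv4 address and does not check that each octet is at most 255.

```python
import re

# Same idea as the rex pattern, using Python's named-group syntax.
IP_PATTERN = re.compile(r"(?P<ip_address>(?:[0-9]{1,3}\.){3}[0-9]{1,3})")

def extract_ip(raw_event: str):
    """Return the first IPv4-shaped string in the event, or None."""
    match = IP_PATTERN.search(raw_event)
    return match.group("ip_address") if match else None

# Hypothetical log line.
sample = "Failed password for root from 192.168.10.25 port 22 ssh2"
print(extract_ip(sample))
```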

  1. Explain Stats vs Transaction commands.

The transaction command is most useful in two specific cases:

  • When the unique ID (from one or more fields) alone is not sufficient to discriminate between two transactions. This is the case when the identifier is reused, for example, web sessions identified by a cookie/client IP. In this case, the time span or pauses are also used to segment the data into transactions. In other cases where an identifier is reused, say in DHCP logs, a particular message identifies the beginning or end of a transaction.
  • When it is desirable to see the raw text of events combined rather than an analysis of the constituent fields of the events.

In other cases, it’s usually better to use stats:

  • The stats command performs better, especially in a distributed search environment.
  • If there is a unique ID, the stats command can be used.

  1. How to troubleshoot Splunk performance issues?

The answer to this question would be very wide, but mostly an interviewer would be looking for the following keywords:

  • Check splunkd.log for errors
  • Check server performance issues, i.e., CPU, memory usage, disk I/O, etc.
  • Install the SOS (Splunk on Splunk) app and check for warnings and errors in its dashboard
  • Check the number of saved searches currently running and their consumption of system resources
  • Install and enable Firebug, a Firefox extension. Log into Splunk (using Firefox) and open Firebug’s panels. Then, switch to the ‘Net’ panel (we will have to enable it). The Net panel will show us the HTTP requests and responses, along with the time spent in each. This will give us a lot of information quickly such as which requests are hanging Splunk, which requests are blameless, etc.
  1. What are Buckets? Explain Splunk Bucket Lifecycle.

Splunk places indexed data in directories, called ‘buckets.’ It is physically a directory containing events of a certain period.

A bucket moves through several stages as it ages. Below are the various stages it goes through:

  • Hot: A hot bucket contains newly indexed data. It is open for writing. There can be one or more hot buckets for each index.
  • Warm: A warm bucket consists of data rolled out from a hot bucket. There are many warm buckets.
  • Cold: A cold bucket has data that is rolled out from a warm bucket. There are many cold buckets.
  • Frozen: A frozen bucket is comprised of data rolled out from a cold bucket. The indexer deletes frozen data by default, but we can archive it. Archived data can later be thawed (data in a frozen bucket is not searchable).

By default, the buckets are located in:

$SPLUNK_HOME/var/lib/splunk/defaultdb/db

We should see the hot-db there, and any warm buckets we have. By default, Splunk sets the bucket size to 10 GB for 64-bit systems and 750 MB on 32-bit systems.

  1. What is the difference between stats and eventstats commands?
  • The stats command generates summary statistics of all the existing fields in the search results and saves them as values in new fields.
  • Eventstats is similar to the stats command, except that the aggregation results are added inline to each event and only if the aggregation is pertinent to that event. The eventstats command computes the requested statistics, like stats does, but adds them to the original raw data.
  1. Who are the top direct competitors to Splunk?

Logstash, Loggly, LogLogic, Sumo Logic, etc. are some of the top direct competitors to Splunk.

  1. What do Splunk Licenses specify?

Splunk licenses specify how much data we can index per calendar day.

  1. How does Splunk determine 1 day, from a licensing perspective?

In terms of licensing, for Splunk, 1 day is from midnight to midnight on the clock of the license master.
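The midnight-to-midnight rule means that usage is summed per calendar day rather than over any rolling 24 hours. The Python sketch below illustrates this with made-up ingestion records and a hypothetical 50 GB quota; it is not how the license master is implemented.

```python
from collections import defaultdict
from datetime import datetime

def daily_usage_gb(records):
    """Sum indexed volume per calendar day (midnight to midnight)."""
    usage = defaultdict(float)
    for timestamp, gigabytes in records:
        usage[timestamp.date()] += gigabytes
    return dict(usage)

def days_over_quota(records, quota_gb):
    """Return the calendar days on which usage exceeded the quota."""
    return sorted(day for day, used in daily_usage_gb(records).items() if used > quota_gb)

# Hypothetical ingestion records: (time on the license master's clock, GB indexed).
records = [
    (datetime(2021, 6, 1, 23, 30), 30.0),
    (datetime(2021, 6, 2, 0, 15), 30.0),
    (datetime(2021, 6, 2, 12, 0), 25.0),
]
print(days_over_quota(records, quota_gb=50))
```

The first two records are only 45 minutes apart, yet they count against different days, so only June 2 exceeds the quota.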

  1. How are Forwarder Licenses purchased?

They are included with Splunk. Therefore, no need to purchase separately.

  1. What is the command for restarting Splunk web server?

We can restart the Splunk web server by using the following command:

splunk restart splunkweb

  1. What is the command for restarting Splunk Daemon?

The Splunk daemon can be restarted with the below command:

splunk restart splunkd

  1. What is the command used to check the running Splunk processes on Unix/Linux?

If we want to check the running Splunk Enterprise processes on Unix/Linux, we can make use of the following command:

ps aux | grep splunk

  1. What is the command used for enabling Splunk to boot start?

To boot start Splunk, we have to use the following command:

$SPLUNK_HOME/bin/splunk enable boot-start

  1. How to disable Splunk boot-start?

In order to disable Splunk boot-start, we can use the following:

$SPLUNK_HOME/bin/splunk disable boot-start

  1. What is Source Type in Splunk?

Source type is Splunk’s way of identifying data.

  1. How to reset Splunk Admin password?

Resetting Splunk Admin password depends on the version of Splunk. If we are using Splunk 7.1 and above, then we have to follow the below steps:

  • First, we have to stop our Splunk Enterprise
  • Now, we need to find the ‘passwd’ file and rename it to ‘passwd.bk’
  • Then, we have to create a file named ‘user-seed.conf’ in the below directory:

$SPLUNK_HOME/etc/system/local/

In the file, we will add the following stanza (replacing ‘NEW_PASSWORD’ with our own new password):

[user_info]
PASSWORD = NEW_PASSWORD

  • After that, we can just restart the Splunk Enterprise and use the new password to log in

Now, if we are using the versions prior to 7.1, we will follow the below steps:

  • First, stop Splunk Enterprise
  • Find the ‘passwd’ file and rename it to ‘passwd.bk’
  • Start Splunk Enterprise and log in using the default credentials of admin/changeme
  • When prompted to enter a new password for the admin account, follow the instructions

Note: In case we have created other users earlier and know their login details, copy and paste their credentials from the passwd.bk file into the passwd file and restart Splunk.

  1. How to disable Splunk Launch Message?

Set the value OFFENSIVE=Less in splunk-launch.conf

  1. How to clear Splunk Search History?

We can clear Splunk search history by deleting the following file from Splunk server:

$SPLUNK_HOME/var/log/splunk/searches.log

  1. What is Btool? How will you troubleshoot Splunk configuration files?

Splunk Btool is a command-line tool that helps us troubleshoot configuration file issues or just see what values are being used by our Splunk Enterprise installation in the existing environment.

  1. What is the difference between Splunk App and Splunk Add-on?

Both contain preconfigured configurations, reports, etc., but a Splunk add-on does not have a visual app. A Splunk app, on the other hand, has a preconfigured visual app.

  1. What is .conf files precedence in Splunk?

File precedence is as follows:

System local directory — highest priority

App local directories

App default directories

System default directory — lowest priority
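
To make the precedence order concrete, here is a minimal Python sketch of how layered settings could be resolved. It only illustrates the ordering listed above and is not how splunkd is implemented; the layer names and the sample setting values are hypothetical.

```python
# Illustrative only: resolve one stanza's settings across config layers.
# Layers are listed from highest to lowest priority, as in the list above.
PRECEDENCE = ["system_local", "app_local", "app_default", "system_default"]


def resolve(layers):
    """Merge per-layer settings; a higher-priority layer wins on conflicts."""
    merged = {}
    # Walk from lowest to highest priority so later updates overwrite earlier ones.
    for name in reversed(PRECEDENCE):
        merged.update(layers.get(name, {}))
    return merged


# Hypothetical sample values for a single stanza.
layers = {
    "system_default": {"maxDataSize": "auto", "frozenTimePeriodInSecs": "188697600"},
    "app_default": {"maxDataSize": "auto_high_volume"},
    "system_local": {"frozenTimePeriodInSecs": "2592000"},
}
effective = resolve(layers)
```

In this sketch the system local value wins for frozenTimePeriodInSecs, while maxDataSize comes from the app default layer because no higher layer sets it.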

  1. What is Fishbucket? What is Fishbucket Index?

Fishbucket is a directory or index at the default location:

/opt/splunk/var/lib/splunk

It contains seek pointers and CRCs for the files we are indexing, so ‘splunkd’ can tell whether it has already read them. We can access it through the GUI by searching for:

index=_thefishbucket

  1. How do I exclude some events from being indexed by Splunk?

This can be done by defining a regex to match the necessary event(s) and send everything else to NullQueue. Here is a basic example that will drop everything except events that contain the string login:
In props.conf:

[source::/var/log/foo]
# Transforms must be applied in this order
# to make sure events are dropped on the
# floor prior to making their way to the
# index processor
TRANSFORMS-set = setnull,setparsing

In transforms.conf:

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = login
DEST_KEY = queue
FORMAT = indexQueue
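
The reason the order matters can be sketched in a few lines of Python. This is a simplified model of the routing logic, assuming that each matching transform overwrites the queue chosen by the previous one; it is not Splunk's implementation, and the sample events are hypothetical.

```python
import re

# Transforms in the order listed in TRANSFORMS-set (setnull, then setparsing).
TRANSFORMS = [
    ("setnull", re.compile(r"."), "nullQueue"),
    ("setparsing", re.compile(r"login"), "indexQueue"),
]


def route(event, default_queue="indexQueue"):
    """Return the queue an event ends up in; the last matching transform wins."""
    queue = default_queue
    for _name, regex, target in TRANSFORMS:
        if regex.search(event):
            queue = target
    return queue


# Hypothetical sample events.
events = ["user alice login ok", "cron job finished", "failed login for bob"]
kept = [e for e in events if route(e) == "indexQueue"]
```

Every event first matches setnull and is marked for the null queue; only events containing login are then moved back to the index queue by setparsing. Reversing the order would send everything to the null queue.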

  1. How can I understand when Splunk has finished indexing a log file?

We can figure this out:
By watching data from Splunk’s metrics log in real time:

index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series="<your_sourcetype_here>" | eval MB=kb/1024 | chart sum(MB)

By watching everything split by source type:

index="_internal" source="*metrics.log" group="per_sourcetype_thruput" | eval MB=kb/1024 | chart sum(MB) avg(eps) over series

If we are having trouble with a data input and we want a way to troubleshoot it, particularly if our whitelist/blacklist rules are not working the way we expected, we will go to the following URL:

https://yoursplunkhost:8089/services/admin/inputstatus

  1. How to set the default search time in Splunk 6?

To do this in Splunk Enterprise 6.0, we have to use ‘ui-prefs.conf’. If we set the value in the following directory, all our users will see it as the default setting:

$SPLUNK_HOME/etc/system/local

For example, if our $SPLUNK_HOME/etc/system/local/ui-prefs.conf file includes:

[search]
dispatch.earliest_time = @d
dispatch.latest_time = now

then the default time range that all users will see in the search app will be today.

The configuration file reference for ui-prefs.conf is here:

http://docs.splunk.com/Documentation/Splunk/latest/Admin/Ui-prefsconf

  1. What is Dispatch Directory?

$SPLUNK_HOME/var/run/splunk/dispatch

contains a directory for each search that is running or has completed. For example, a directory named 1434308943.358 will contain a CSV file of its search results, a search.log with details about the search execution, and other artifacts. Using the defaults (which we can override in limits.conf), these directories will be deleted 10 minutes after the search completes, unless the user saves the search results, in which case the results will be deleted after 7 days.
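
The retention rule above can be sketched as a small Python helper. This is an illustration under stated assumptions, not Splunk code: the function name is hypothetical, and reading the example directory name as a Unix epoch timestamp (and using it as the completion time) is an assumption made only for the demo.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL = timedelta(minutes=10)  # unsaved search results (default from the text)
SAVED_TTL = timedelta(days=7)        # results the user chose to save


def dispatch_expiry(finished_at, saved=False):
    """Return when a finished search's dispatch directory is due for deletion."""
    return finished_at + (SAVED_TTL if saved else DEFAULT_TTL)


# Assumption: the integer part of the directory name is an epoch timestamp.
finished = datetime.fromtimestamp(1434308943, tz=timezone.utc)
unsaved_expiry = dispatch_expiry(finished)
saved_expiry = dispatch_expiry(finished, saved=True)
```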

  1. What is the difference between Search Head Pooling and Search Head Clustering?

Both are features provided by Splunk for the high availability of the search head tier in case any search head goes down. However, search head clustering is the newer feature, and search head pooling is deprecated and will be removed in upcoming versions.

The search head cluster is managed by a captain, which coordinates the other cluster members. The search head cluster is more reliable and efficient than search head pooling.

  1. If I want to add folder access logs from a windows machine to Splunk, how do I do it?

Below are the steps to add folder access logs to Splunk:

  1. Enable Object Access Audit through group policy on the Windows machine on which the folder is located
  2. Enable auditing on a specific folder for which we want to monitor logs
  3. Install Splunk universal forwarder on the Windows machine
  4. Configure universal forwarder to send security logs to Splunk indexer

  1. How would you handle/troubleshoot Splunk License Violation Warning?

A license violation warning means that Splunk has indexed more data than our purchased license quota allows. To troubleshoot it, we first check the pool-wise available quota on the Splunk license master and identify the pool in which the violation occurred. Next, we identify the index/source type in that pool that has recently received more data than its usual daily volume. Once the source type is identified, we find the source machine that is sending the unusually large number of logs, determine the root cause, and troubleshoot accordingly.

  1. What is MapReduce algorithm?

MapReduce algorithm is the secret behind Splunk’s faster data searching. It’s an algorithm typically used for batch-based large-scale parallelization. It’s inspired by functional programming’s map() and reduce() functions.
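
As a rough illustration of the map() and reduce() idea, the Python sketch below counts events per host. Each group of events is summarized independently (map), and the partial summaries are then merged (reduce). Treating each group as the data held by one indexer is a simplification, and the sample events are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical events spread across two indexers (search peers).
peer_events = [
    [{"host": "web1"}, {"host": "web2"}, {"host": "web1"}],
    [{"host": "web2"}, {"host": "db1"}],
]


def map_phase(events):
    """Map: count one peer's own events per host."""
    return Counter(e["host"] for e in events)


def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


partials = map(map_phase, peer_events)
totals = reduce(reduce_phase, partials, Counter())
```

Because each map step only needs its own slice of the data, the slices can be processed in parallel and only the small partial counts need to be combined.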

  1. How does Splunk avoid the duplicate indexing of logs?

At the indexer, Splunk keeps track of the indexed events in a directory called fishbucket with the default location:

/opt/splunk/var/lib/splunk

It contains seek pointers and CRCs for the files we are indexing, so splunkd can tell us if it has read them already.
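
The idea of combining a CRC with a seek pointer can be sketched in Python as follows. This is a toy model and not the fishbucket's real format: zlib.crc32 stands in for whatever checksum Splunk uses, and the class name and the short HEAD_BYTES value are hypothetical choices that keep the demo small.

```python
import zlib

HEAD_BYTES = 8  # hypothetical, shortened for the demo: bytes of the file head to checksum


class SeekTracker:
    """Toy stand-in for the fishbucket: maps a head CRC to a seek pointer."""

    def __init__(self):
        self.seen = {}

    def read_new(self, data: bytes) -> bytes:
        """Return only the bytes of this file that have not been read before."""
        key = zlib.crc32(data[:HEAD_BYTES])
        offset = self.seen.get(key, 0)  # where the previous read stopped
        new_bytes = data[offset:]
        self.seen[key] = len(data)      # remember the new seek pointer
        return new_bytes


tracker = SeekTracker()
first = tracker.read_new(b"line1\nline2\n")
again = tracker.read_new(b"line1\nline2\n")
appended = tracker.read_new(b"line1\nline2\nline3\n")
```

The first read returns the whole file, the second returns nothing because the head CRC is known and the seek pointer is already at the end, and the third returns only the appended line.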

  1. What is the difference between Splunk SDK and Splunk Framework?

Splunk SDKs are designed to allow us to develop applications from scratch and they do not require Splunk Web or any components from the Splunk App Framework. These are separately licensed from Splunk and do not alter the Splunk Software.

The Splunk App Framework resides within the Splunk web server and permits us to customize the Splunk Web UI that comes with the product and to develop Splunk apps using the Splunk web server. It is an important part of the features and functionalities of Splunk, but it does not give users a license to modify anything in the Splunk software itself.

  1. For what purpose inputlookup and outputlookup are used in Splunk Search?

The inputlookup command is used to search the contents of a lookup table. The lookup table can be a CSV lookup or a KV store lookup. The inputlookup command is considered to be an event-generating command. An event-generating command generates events or reports from one or more indexes without transforming them. There are many commands that come under the event-generating commands such as metadata, loadjob, inputcsv, etc. The inputlookup command is one of them.

Syntax:

inputlookup [append=<bool>] [start=<int>] [max=<int>] [<filename> | <tablename>] [WHERE <search-query>]

Now coming to the outputlookup command, it writes the search results to a static lookup table, or KV store collection, that we specify. The outputlookup command cannot be used with external lookups.

Syntax:

outputlookup [append=<bool>] [create_empty=<bool>] [max=<int>] [key_field=<field_name>] [createinapp=<bool>] [override_if_empty=<bool>] (<filename> | <tablename>)

  1. Explain how Splunk works?

We can divide the working of Splunk into three main parts:

  • Forwarder: A lightweight agent whose main task is to collect data from various sources, such as remote machines, and transfer it to the indexer.
  • Indexer: The indexer processes the data in real time, then indexes and stores it on the local host or a cloud server.
  • Search Head: It allows the end user to interact with the data and perform various operations like searching, analyzing, and visualizing the information.

  1. How to add the colors in Splunk UI based on the field names?

Splunk UI has a number of features that allow the administrator to make the reports more presentable. One such feature that proves to be very useful for presenting distinguished results is the custom colors. For example, if the sales of a product drop below a threshold value, then as an administrator you can set the chart to display the values in red color.

The administrator can also change chart colors in the Splunk Web UI by editing the panels from the panel settings mentioned above the dashboard. Moreover, you can write the codes and use hexadecimal values to choose a color from the palette.

  1. How the Data Ages in Splunk?

Data entering an indexer is stored in directories, also known as buckets. Over time, these buckets roll through different stages: hot, warm, cold, frozen, and, if restored from an archive, thawed. Before it is stored, the data goes through a pipeline where event processing takes place. This occurs in two stages: parsing breaks the data into individual events, while indexing writes these events to the index.

This is what happens to the data at each stage of the indexing pipeline:

  • As soon as the data enters the index, it goes to a hot bucket. There can be multiple hot buckets at any point in time, which you can both search and write to.
  • If Splunk is restarted or a hot bucket reaches a certain threshold size, a new hot bucket is created in its place and the existing one rolls to become a warm bucket. Warm buckets are searchable, but you cannot write anything to them.
  • Further, when the index reaches its configured limit of warm buckets, the oldest warm bucket is automatically rolled to cold. Splunk does not rename the bucket when it does so. By default, hot and warm buckets are stored in ‘$SPLUNK_HOME/var/lib/splunk/defaultdb/db/’ and cold buckets in ‘$SPLUNK_HOME/var/lib/splunk/defaultdb/colddb/’.
  • After a certain period of time, the cold bucket rolls to become a frozen bucket. Frozen buckets do not share a location with the previous buckets and are not searchable. They can either be archived or deleted, depending on the configuration.
  • You can’t do anything if the bucket is deleted, but you can retrieve a frozen bucket if it has been archived. The process of retrieving an archived bucket is known as thawing. Once a bucket is thawed, it becomes searchable again and is stored in a new location, ‘$SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/’.

  1. What are pivots and data models in Splunk?

Data models in Splunk are used when you have to process huge amounts of unstructured data and want to create a hierarchical model of it without executing complex search queries on the data. Data models are widely used for creating sales reports, adding access levels, and creating a structure of authentication for various applications.

Pivots, on the other hand, give you the flexibility to create multiple views and see the results as per the requirements. With pivots, even managers or stakeholders from non-technical backgrounds can create views and get more details about their departments.

  1. Explain Workflow Actions?

Workflow actions in Splunk are highly configurable knowledge objects that enable you to interact with web resources and other fields. Splunk workflow actions can be used to create HTML links that search on field values, send HTTP POST requests to specific URLs, and run secondary searches for selected events.

  1. How many types of dashboards are available in Splunk?

There are three types of dashboards available in Splunk:

  • Real-time dashboards
  • Dynamic form-based dashboards
  • Dashboards for scheduled reports

  1. What are the types of alerts available in Splunk?

Alerts are actions triggered by the results of a saved search. Once an alert has been triggered, subsequent actions such as an email or a message are also triggered. There are two types of alerts available in Splunk:

  • Real-time alerts: These can be divided into per-result and rolling-window alerts. A per-result alert is triggered every time the search returns a result, while a rolling-window alert is triggered when a specific condition is met within a rolling time window.
  • Scheduled alerts: As the name suggests, the search runs on a schedule, and the alert is triggered when the results meet the set criteria.

  1. Define the term “Search factor” and “Replication factor”

Search factor: The search factor (SF) determines the number of searchable copies of each bucket that an indexer cluster maintains. For example, a search factor of 3 means that the cluster maintains 3 searchable copies of each bucket.

Replication factor: The replication factor (RF) determines the number of copies of each bucket that the indexer cluster maintains. The search factor must not be greater than the replication factor.

  1. How to stop/start the Splunk service?

The command for starting Splunk service:

./splunk start

The command for stopping Splunk service:

./splunk stop

  1. What is the use of Time Zone property in Splunk?

Time Zone is an important property that helps you search for events when investigating fraud or a security issue. The default time zone is taken from the browser settings or the machine you are using. Apart from event searching, it is also used to align data pouring in from multiple sources that are in different time zones.

  1. What are the important Search commands in Splunk?

Below are some of the important search commands in Splunk:

  • erex
  • abstract
  • typer
  • rename
  • anomalies
  • filldown
  • accum
  • addtotals

  1. How many types of search modes are there in Splunk?

There are three types of search modes in Splunk:

  • Fast mode: Speeds up your search by limiting the types of data returned.
  • Verbose mode: Slower than fast mode, but returns as much information as possible for each event.
  • Smart mode: It toggles between the search behaviours of the other modes, depending on the type of search, to provide the best results in the shortest period of time.

Part 3:

  1. Define Splunk

Splunk is a software platform that allows users to analyze machine-generated data (from hardware devices, networks, servers, IoT devices, etc.). Splunk is widely used for searching, visualizing, monitoring, and reporting enterprise data. It processes and analyzes machine data and converts it into powerful operational intelligence by offering real-time insights into the data through accurate visualizations.

 

Splunk is used for analyzing machine data because:

  • It offers business insights – Splunk understands the patterns hidden within the data and turns it into real-time business insights that can be used to make informed business decisions.
  • It provides operational visibility – Splunk leverages machine data to get end-to-end visibility into company operations and then breaks it down across the infrastructure.
  • It facilitates proactive monitoring – Splunk uses machine data to monitor systems in real-time to identify system issues and vulnerabilities (external/internal breaches and attacks).

 

  1. Name the common port numbers used by Splunk.

The common port numbers for Splunk are:

  • Splunk Web Port: 8000
  • Splunk Management Port: 8089
  • Splunk Network port: 514
  • Splunk Index Replication Port: 8080
  • Splunk Indexing Port: 9997
  • KV store: 8191
  1. Name the components of Splunk architecture.

The Splunk architecture is made of the following components:

  • Search Head – It provides GUI for searching
  • Indexer – It indexes the machine data
  • Forwarder – It forwards logs to the Indexer
  • Deployment server – It manages the Splunk components in a distributed environment and distributes configuration apps.

 

  1. What are the different types of Splunk dashboards?

There are three different kinds of Splunk dashboards:

  • Real-time dashboards
  • Dynamic form-based dashboards
  • Dashboards for scheduled reports

 

  1. Name the types of search modes supported in Splunk.

Splunk supports three types of search modes, namely:

  • Fast mode
  • Smart mode
  • Verbose mode

 

  1. Name the different kinds of Splunk Forwarders.

There are two types of Splunk Forwarders:

  • Universal Forwarder (UF)– It is a lightweight Splunk agent installed on a non-Splunk system to gather data locally. UF cannot parse or index data.
  • Heavyweight Forwarder (HWF)– It is a heavyweight Splunk agent with advanced functionalities, including parsing and indexing capabilities. It is used for filtering data.

 

  1. What are the benefits of feeding data into a Splunk instance through Splunk Forwarders?

If you feed the data into a Splunk instance via Splunk Forwarders, you can reap three significant benefits – TCP connection, bandwidth throttling, and an encrypted SSL connection to transfer data from a Forwarder to an Indexer. Splunk’s architecture is such that the data forwarded to the Indexer is load-balanced by default.

So, even if one Indexer goes down for some reason, the data can quickly be re-routed via another Indexer instance. Furthermore, Splunk Forwarders cache the events locally before forwarding them, thereby creating a temporary backup of the data.

 

  1. What is the “Summary Index” in Splunk?

In Splunk, the Summary Index refers to the default index (named summary) that stores the results of scheduled searches over time. Essentially, it is the index that Splunk Enterprise uses for summary indexing if a user does not specify another one.

The most significant advantage of the Summary Index is that it allows you to retain the analytics and reports even after your data has aged.

 

  1. What is the purpose of Splunk DB Connect?

Splunk DB Connect is a generic SQL database plugin designed for Splunk. It enables users to integrate database information with Splunk queries and reports seamlessly.

 

  1. What is the function of the Splunk Indexer?

As the name suggests, the Splunk Indexer creates and manages indexes. It has two core functions – to index raw data into an index and to search and manage the indexed data.

 

  1. Name a few important Splunk search commands.

Some of the important search commands in Splunk are:

  • abstract
  • erex
  • addtotals
  • accum
  • filldown
  • typer
  • rename
  • anomalies

 

  1. What are some of the most important configuration files in Splunk?

The most crucial configuration files in Splunk are:

  • conf
  • conf
  • conf
  • conf
  • conf

 

  1. What is the importance of the License Master in Splunk? What happens if the License Master is unreachable?

 

In Splunk, the License Master ensures that the right amount of data gets indexed. Since the Splunk license is based on the data volume that reaches the platform within a 24hr-window, the License Master ensures that your Splunk environment stays within the constraints of the purchased volume.

 

If the License Master is unreachable for an extended period (72 hours), users can no longer search the data on the affected instance. However, this will not affect the data flowing into the Indexer: data will continue to flow into the Splunk deployment, and the Indexers will continue to index it. Separately, if the indexing volume is exceeded, a warning message is displayed at the top of the Search Head; in that case, users must either reduce the amount of data flowing in or purchase additional capacity for the Splunk license.

 

  1. Explain ‘license violation’ in the Splunk perspective.

Anytime you exceed the data limit, a ‘license violation’ warning is shown on the dashboard. This warning remains for 14 days. With a commercial Splunk license, users can have five warnings in a 30-day window, after which searches and reports on the Indexer will no longer run. With the free version, users get only three warnings.
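
The rolling-window rule can be sketched in Python as below. This only models the counting described above (five warnings for a commercial license, three for the free version, within 30 days); the function and constant names are hypothetical, and it is not how the license master actually tracks warnings.

```python
from datetime import date, timedelta

WINDOW_DAYS = 30
# Warning limits quoted in the answer above.
MAX_WARNINGS = {"enterprise": 5, "free": 3}


def in_violation(warning_days, today, license_type="enterprise"):
    """True if the warnings in the trailing 30-day window reach the limit."""
    window_start = today - timedelta(days=WINDOW_DAYS - 1)
    recent = [d for d in warning_days if window_start <= d <= today]
    return len(recent) >= MAX_WARNINGS[license_type]


# Hypothetical warning dates: one per week in January 2024.
today = date(2024, 1, 31)
warnings = [date(2024, 1, d) for d in (3, 10, 17, 24, 31)]
violated = in_violation(warnings, today)
```

Warnings older than the window simply stop counting, which is why a deployment recovers on its own once enough days pass without a new warning.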

 

  1. What is the general expression for extracting IP address from logs?

Although you can extract the IP address from logs in many ways, a regular expression for it would be:

rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"

OR

rex field=_raw "(?<ip_address>([0-9]{1,3}[\.]){3}[0-9]{1,3})"
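
To try these patterns outside Splunk, the same expressions can be tested in Python. Note that Python writes named groups as (?P<name>...) rather than (?<name>...); the helper name and the sample log line are hypothetical.

```python
import re

# Python spells named groups (?P<name>...); SPL's rex uses (?<name>...).
LOOSE = re.compile(r"(?P<ip_address>\d+\.\d+\.\d+\.\d+)")
BOUNDED = re.compile(r"(?P<ip_address>([0-9]{1,3}\.){3}[0-9]{1,3})")


def extract_ip(raw, pattern=BOUNDED):
    """Return the first IP-like substring in a raw event, or None."""
    match = pattern.search(raw)
    return match.group("ip_address") if match else None


# Hypothetical log line for illustration.
sample = "Jun 14 19:09:03 sshd[311]: Failed password for root from 192.168.1.25 port 22"
ip = extract_ip(sample)
```

Neither pattern checks that each octet is between 0 and 255, so a string such as 999.999.999.999 still matches; both are extraction patterns rather than validators.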

 

  1. How can you troubleshoot Splunk performance issues?

To troubleshoot Splunk performance issues, perform the following steps:

  • Check splunkd.log to find any errors
  • Check server performance issues (CPU/memory usage, disk i/o, etc.)
  • Check the number of saved searches that are running at present and also their system resources consumption.
  • Install the SOS (Splunk on Splunk) app and see if the dashboard displays any warning or errors.
  • Install Firebug (a Firefox extension) and enable it in your system. After that, log into Splunk using Firefox, open Firebug’s panels, and go to the ‘Net’ panel (you have to enable it). The Net panel displays the HTTP requests and responses, along with the time spent in each. This will allow you to see which requests are slowing down Splunk and affecting the overall performance.

 

  1. What are Buckets? Explain Splunk Bucket Lifecycle.

Buckets are directories that store the indexed data in Splunk. So, it is a physical directory that chronicles the events of a specific period. A bucket undergoes several stages of transformation over time. They are:

  • Hot – A hot bucket contains the newly indexed data, and hence, it is open for writing and new additions. An index can have one or more hot buckets.
  • Warm – A warm bucket contains the data that is rolled out from a hot bucket. 
  • Cold – A cold bucket has data that is rolled out from a warm bucket. 
  • Frozen – A frozen bucket contains the data rolled out from a cold bucket. The Splunk Indexer deletes the frozen data by default. However, there’s an option to archive it. An important thing to remember here is that frozen data is not searchable.
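
The lifecycle can be modelled as a small state machine. The Python sketch below is only an illustration of the stages listed above; it adds the thawed stage (an archived frozen bucket that has been restored), and all names in it are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"
    FROZEN = "frozen"
    THAWED = "thawed"


# Allowed one-step transitions between stages.
NEXT_STAGE = {
    Stage.HOT: Stage.WARM,
    Stage.WARM: Stage.COLD,
    Stage.COLD: Stage.FROZEN,
    Stage.FROZEN: Stage.THAWED,  # only possible if the bucket was archived
}

SEARCHABLE = {Stage.HOT, Stage.WARM, Stage.COLD, Stage.THAWED}
WRITABLE = {Stage.HOT}


def roll(stage, archived=True):
    """Advance a bucket one stage; deleted frozen buckets cannot be thawed."""
    if stage is Stage.FROZEN and not archived:
        raise ValueError("frozen bucket was deleted, nothing to thaw")
    if stage not in NEXT_STAGE:
        raise ValueError(f"no further stage after {stage.value}")
    return NEXT_STAGE[stage]
```

For example, roll(Stage.HOT) gives the warm stage, and only the hot stage appears in WRITABLE, which matches the rule that new data is written to hot buckets only.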

 

  1. What purpose does the Time Zone property serve in Splunk?

In Splunk, Time Zone is crucial for searching for events from a security or fraud perspective. Splunk sets the default Time Zone for you from your browser settings. The browser further picks up the current Time Zone from the machine you are using. So, if you search for any event with the wrong Time Zone, you will not find anything relevant for that search.

The Time Zone becomes extremely important when you are searching and correlating data pouring in from multiple different sources.

 

  1. Define Sourcetype in Splunk.

In Splunk, Sourcetype refers to the default field that is used to identify the data structure of an incoming event. Sourcetype should be set at the forwarder level for indexer extraction to help identify different data formats. It determines how Splunk Enterprise formats the data during the indexing process. This being the case, you must make sure to assign the correct Sourcetype to your data. To make data searching even easier, you should also provide accurate timestamps and event breaks for the indexed data (the event data).

  1. Explain the difference between Stats and Eventstats commands.

In Splunk, the Stats command generates summary statistics over the fields in the search results and returns them as values in newly created fields, replacing the original events with a results table. Although the Eventstats command is similar, it adds the aggregation results inline to each event (and only if the aggregation is pertinent to that particular event). So, while both commands compute the requested statistics, Eventstats keeps the original events and attaches the statistics to them.
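
The difference is easiest to see on a tiny data set. The Python sketch below mimics the two behaviours for an average grouped by host; it is an analogy rather than Splunk's implementation, and the field names and events are hypothetical.

```python
from collections import defaultdict

# Hypothetical events: one response time per request, tagged by host.
events = [
    {"host": "web1", "ms": 100},
    {"host": "web1", "ms": 300},
    {"host": "web2", "ms": 50},
]


def avg_by_host(rows):
    """Compute the average of 'ms' per host."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["host"]].append(row["ms"])
    return {host: sum(vals) / len(vals) for host, vals in buckets.items()}


def stats(rows):
    """Like 'stats avg(ms) by host': one summary row per group."""
    return [{"host": h, "avg_ms": a} for h, a in avg_by_host(rows).items()]


def eventstats(rows):
    """Like 'eventstats avg(ms) as avg_ms by host': events kept, average added."""
    averages = avg_by_host(rows)
    return [{**row, "avg_ms": averages[row["host"]]} for row in rows]
```

Here stats(events) collapses three events into two summary rows, while eventstats(events) still returns three events, each carrying its host's average in an extra field.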

 

  1. Differentiate between Splunk App and Add-on.

Splunk Apps refer to the complete collection of reports, dashboards, alerts, field extractions, and lookups. However, Splunk Add-ons only contain built-in configurations – they do not have dashboards or reports.

 

  1. What is the command to stop and start Splunk service?

The command to start Splunk service is: ./splunk start

The command to stop Splunk service is: ./splunk stop

 

  1. How can you clear the Splunk search history?

To clear the Splunk search history, you need to delete the following file from Splunk server:

$SPLUNK_HOME/var/log/splunk/searches.log

 

  1. What is Btool in Splunk?

Btool in Splunk is a command-line tool that is used for troubleshooting configuration file issues. It also helps check what values are being used by a user’s Splunk Enterprise installation in the existing environment.

 

  1. What is the need for Splunk Alert? Specify the type of options you get while setting up Splunk Alerts.

Splunk Alerts help notify users of any erroneous condition in their systems. For instance, a user can set up Alerts for email notification to be sent to the admin in case there are more than three failed login attempts within 24 hours.

The different options you get while setting up Alerts include:

  • You can create a webhook. This will allow you to post to tools such as HipChat or GitHub. You can also send an email to a group of recipients, containing your subject, priorities, and the body of your email.
  • You can add results in CSV or PDF format, or inline in the body of the message, to help the recipient understand the location and conditions of the alert that has been triggered and what actions have been taken for the same.
  • You can create tickets and throttle alerts based on specific conditions such as the machine name or IP address. These alerts can be controlled from the alert window.

  1. What is a Fishbucket and what is the Index for it?

Fishbucket is an index directory resting at the default location that is:

/opt/splunk/var/lib/splunk

Fishbucket includes seek pointers and CRCs for the indexed files. To access the Fishbucket, you can use the GUI for searching:

index=_thefishbucket

  1. How to know when Splunk has completed indexing a log file?

You can figure out whether or not Splunk has completed indexing a log file in two ways:

  1. By monitoring the data from Splunk’s metrics log in real-time:

index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series="<your_sourcetype_here>" | eval MB=kb/1024 | chart sum(MB)

  2. By monitoring all the metrics split by source type:

index="_internal" source="*metrics.log" group="per_sourcetype_thruput" | eval MB=kb/1024 | chart sum(MB) avg(eps) over series

 

  1. What is the Dispatch Directory?

The Dispatch Directory includes a directory for each search that is either running or has completed. Its default location is:

$SPLUNK_HOME/var/run/splunk/dispatch

Let’s assume there is a directory named 1434308943.358. This directory will contain a CSV file of all the search results, a search.log containing the details about the search execution, and other relevant information. With the default configuration, this directory is deleted 10 minutes after the search completes. If you save the search results, they will be deleted after seven days.

 

  1. How can you add folder access logs from a Windows machine to Splunk?

To add folder access logs from a Windows machine to Splunk, you must follow the steps listed below:

  • Go to Group Policy and enable Object Access Audit on the Windows machine where the folder is located.
  • Now you have to enable auditing on the specific folder for which you want to monitor access logs.
  • Install Splunk Universal Forwarder on the Windows machine.
  • Configure the Universal Forwarder to send security logs to the Splunk Indexer.

 

  1. How does Splunk avoid duplicate indexing of logs?

The Splunk Indexer keeps track of all the indexed events in a directory called the fishbucket, which contains seek pointers and CRCs for all the files being indexed. So, if a file has already been read, splunkd can tell from its CRC and seek pointer and will not index it again.

 

  1. What is the configuration files precedence in Splunk?

The precedence of configuration files in Splunk is as follows:

  • System Local Directory (highest priority)
  • App Local Directories
  • App Default Directories
  • System Default Directory (lowest priority)

 

  1. Define “Search Factor” and “Replication Factor.”

Both Search Factor (SF) and Replication Factor (RF) are clustering terminologies in Splunk. While the SF (with a default value of 2) determines the number of searchable copies of data maintained by the Indexer cluster, the RF represents the total number of copies of data maintained by the Indexer cluster. An important thing to remember is that the SF must always be less than or equal to the RF. Also, an Indexer cluster has both SF and RF, whereas a Search Head cluster has only a Replication Factor (for search artifacts).
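
The relationship between the two factors can be expressed as a simple check. The Python sketch below is a hypothetical helper, not part of Splunk; the rule that the cluster needs at least as many peers as the replication factor is included as an additional assumption.

```python
def validate_cluster(replication_factor, search_factor, peer_count):
    """Return a list of problems with an indexer cluster's RF/SF settings."""
    problems = []
    if search_factor < 1 or replication_factor < 1:
        problems.append("factors must be at least 1")
    if search_factor > replication_factor:
        problems.append("search factor must not exceed replication factor")
    # Assumption: every copy lives on a different peer.
    if replication_factor > peer_count:
        problems.append("not enough peers to hold every copy")
    return problems


ok = validate_cluster(replication_factor=3, search_factor=2, peer_count=3)
bad = validate_cluster(replication_factor=2, search_factor=3, peer_count=3)
```

An empty list means the settings are consistent; in the second call the search factor is larger than the replication factor, so one problem is reported.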

 

  1. Why is the lookup command used? Differentiate between inputlookup and outputlookup commands.

In Splunk, lookup commands are used when you want to enrich events with specific fields from an external source (for example, a CSV file or a Python-based script). A lookup matches fields in the event data against fields in the external source and adds the corresponding values to the events.

The inputlookup command is used to read the contents of a lookup table, for instance to list the product names and prices stored in it. In contrast, the outputlookup command is used to write search results out to a lookup table.

 

  1. Differentiate between Splunk SDK and Splunk Framework.

Splunk SDKs are primarily designed to help users develop applications from scratch. They do not require Splunk Web or any other component from the Splunk App Framework to function. Splunk SDKs are separately licensed from Splunk. As opposed to this, the Splunk App Framework rests within the Splunk Web Server. It allows users to customize the Splunk Web UI that accompanies the product. Although it lets you develop Splunk apps, you have to do so by using the Splunk Web Server. 

 

By bpci