Splunk in 5 minutes

One trend in software systems is that they have become more and more distributed. You host your web application in the Amazon cloud, the application consumes various REST services hosted in your corporate data center, and the REST services connect to database server clusters for business data. That's just half of the story -- your web client might be a browser running on a PC, Android, iPhone, or BlackBerry. Sometimes the client is a script or a malicious bot. Whenever something happens and you want to figure out what's going on, you have to search through tons of logs spread across many hosts, each with its own folder structure and file format.

In the early days, machine data warriors checked logs manually: they would "ssh" into routers, firewalls, cloud virtual machines, application servers, database servers, LDAP servers, authentication servers..., run a "grep" or "zgrep" over gcc.log, access.log, catalina.out, petstore-ui_18_01_12.log, psvs-aaa_18_01_12.log, psvs-prov_18_01_12.log, psvs-audit_18_01_12.log, mysql.log..., then mentally correlate the hints they grepped out into theories, and verify those theories with more greps.
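The "correlate hints" step above can be sketched in a few lines of Python: search several log files for a request ID and merge the hits into one time-ordered view. This is only an illustration of the manual workflow -- the file names and the timestamp format (YYYY-MM-DD HH:MM:SS at the start of each line) are assumptions; real logs vary per application.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed timestamp format at the start of each log line.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

def grep_and_correlate(log_files, needle):
    """Return (timestamp, filename, line) tuples matching needle, time-ordered."""
    hits = []
    for path in log_files:
        for line in Path(path).read_text().splitlines():
            if needle in line:
                m = TS_RE.match(line)
                # Lines without a recognizable timestamp sort first.
                ts = (datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
                      if m else datetime.min)
                hits.append((ts, str(path), line))
    return sorted(hits)
```

Even this tiny script assumes the logs are already copied to one machine -- which is exactly the part that does not scale, and the part Splunk automates.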

As you can see, this process is slow, so there has been huge demand for speeding up the ssh-and-grep workflow. Various clever solutions have been tried: tools such as MTPuTTY and iTerm let you open multiple ssh sessions in tabs; shell scripts with expect can automate ssh login and run grep on remote machines; Robot Framework was developed for executing scripts on remote hosts... Splunk is one of the most successful solutions that have been tried, and it prevailed.

Splunk has three main components: the forwarder, the indexer, and the search head. In a typical setup, you install a forwarder program on each application host; the forwarder collects all sorts of logs on the host and forwards them to the Splunk server, where the indexer program is running. The indexer processes the received raw logs and indexes them into time-ordered events, so that the search head can later find the wanted events faster. As you can see, you don't have to remote-login to servers anymore -- the servers send the logs to you. And searching the logs is faster because of the indexing.

In a nutshell, Splunk creates a centralized database/repository of all your log data for easy search and reporting. It is a simple idea, but Splunk made it happen at large scale for the first time in IT history. The Splunk user interface is impressive: you log in to Splunk with a browser and write a query to find events, and Splunk streamlines the process of building tables, charts, and statistics from those events.
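To make this concrete, here is what one of those searches might look like in Splunk's query language (SPL). It replaces the whole ssh-and-grep round trip with a single query on the search head; the index and sourcetype names below are hypothetical and depend on how your forwarders are configured.

```
index=prod sourcetype=catalina "OutOfMemoryError"
| stats count by host
```

The first line finds the matching events across every host that forwards into the `prod` index; the `stats` command then turns them into a per-host count that Splunk can render as a table or chart.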