MLops for beginners

Day 18: Monitoring containerized webapp logs with Splunk

What is Splunk? Why should we use it? How do we use it? How can we do something cool with it? REAL-TIME ALERTS!!

Rishabh Umrao

Machine data has grown exponentially over the last decade, and it is still growing, partly because of the increasing number of IT infrastructure systems and partly because of IoT devices. We need a way to analyze this data effectively to get better insight into the productivity and visibility of the business.

Machine data is mostly:

  • Not suitable for direct analysis
  • Complex to understand
  • Sometimes unstructured
  • Huge, and it keeps growing over time

So, what is Splunk and how can it help?

According to Wikipedia,

Splunk (the product) captures, indexes, and correlates real-time data in a searchable repository from which it can generate graphs, reports, alerts, dashboards, and visualizations.

Splunk makes machine data accessible across an organization by identifying data patterns, providing metrics, diagnosing problems, and providing intelligence for business operations. Splunk is a horizontal technology used for application management, security and compliance, as well as business and web analytics.

Basically, Splunk helps you collect data from all your machines and lets you monitor it from the web UI. It also lets you build visualizations, graphs, reports, and other statistics. On top of that, you can set alerts for almost anything. (We'll see that in a later section of the article.)

Splunk is too big a topic to cover in a single article. There are many resources through which you can learn more about Splunk. (I prefer the ones below.)

Splunk has all the cool features and is a great tool to work with. The only issue is that Splunk is costly: you need to buy a license to use it in the enterprise. However, you can get a free version or a developer license to learn Splunk and use it for your personal projects.

Time for some action

Here I'll be using two different machines:

  • a RHEL/CentOS 8 server as my Splunk server
  • an Ubuntu 18 server hosting a webapp in Docker along with the Splunk universal forwarder

As always, I'll break the whole process into small, achievable steps.

  1. Setup Splunk server
  2. Setup a web app (I'll use DVWA in Docker)
  3. Setup the Splunk forwarder to send logs to the Splunk server.

Setup Splunk server

(I am using a RHEL 8 instance to set up the Splunk server.)

Here I have to download the Splunk server package, install it on the system, and then configure it to accept log data on some port.

  • Install wget
yum install wget -y
  • Download the Splunk .rpm file (rpm files are only for Red Hat family distros)
wget -O splunk-8.0.4.1-ab7a85abaa98-linux-2.6-x86_64.rpm 'https://www.splunk.com/bin/splunk/DownloadActivityServlet?architecture=x86_64&platform=linux&version=8.0.4.1&product=splunk&filename=splunk-8.0.4.1-ab7a85abaa98-linux-2.6-x86_64.rpm&wget=true'
  • Install Splunk
rpm -i splunk-8.0.4.1-ab7a85abaa98-linux-2.6-x86_64.rpm
  • Go to the Splunk directory
cd /opt/splunk/bin/
  • Start the Splunk server
./splunk start --accept-license
  • This will ask you to create a username and password for the Splunk server. You will need them to log into the web UI.
  • Open the web UI in your favorite browser and log in with that username and password. (If you are on a cloud instance, use its public IP with port 8000.)
  • Enable the service at boot time
./splunk enable boot-start
  • Configure Splunk from the web portal to receive log data.
  • Go to Settings > Forwarding and receiving
  • Add a new receiver
  • Make it listen on port 9997 (or any other available port)
  • Check for listening ports (optional)
netstat -auntp | grep 9997
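
The receiver added through the web UI can also be expressed as an inputs.conf stanza on the Splunk server. The following is only a sketch under assumptions: the function name receiver_stanza is made up for illustration, and the target path in the comment assumes the default /opt/splunk install location used above.

```shell
# Print an inputs.conf stanza that makes a Splunk server listen for
# forwarder traffic on the given TCP port (defaults to 9997, as in this article).
# receiver_stanza is a hypothetical helper name, not a Splunk command.
receiver_stanza() {
    port="${1:-9997}"
    printf '[splunktcp://%s]\ndisabled = 0\n' "$port"
}

# Example usage (not run here; the path is an assumption about your install):
#   receiver_stanza 9997 >> /opt/splunk/etc/system/local/inputs.conf
#   /opt/splunk/bin/splunk restart
```

Keeping the stanza in a file makes the receiver setup repeatable if you rebuild the server, whereas the web UI steps have to be clicked through again.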

Setup the web app

  • Install Docker

https://www.google.com/?q=how+to+install+docker+in+ubuntu+18

  • Create a Dockerfile
FROM tutum/lamp:latest
ENV DEBIAN_FRONTEND noninteractive
# Preparation
RUN \
rm -fr /app/* && \
apt-get update && apt-get install -yqq wget unzip php5-gd && \
rm -rf /var/lib/apt/lists/* && \
wget https://github.com/ethicalhack3r/DVWA/archive/v1.9.zip && \
unzip /v1.9.zip && \
rm -rf app/* && \
cp -r /DVWA-1.9/* /app && \
rm -rf /DVWA-1.9 && \
sed -i "s/^\$_DVWA\[ 'db_user' \] = 'root'/\$_DVWA[ 'db_user' ] = 'admin'/g" /app/config/config.inc.php && \
echo "sed -i \"s/p@ssw0rd/\$PASS/g\" /app/config/config.inc.php" >> /create_mysql_admin_user.sh && \
echo 'session.save_path = "/tmp"' >> /etc/php5/apache2/php.ini && \
sed -ri -e "s/^allow_url_include.*/allow_url_include = On/" /etc/php5/apache2/php.ini && \
chmod a+w /app/hackable/uploads && \
chmod a+w /app/external/phpids/0.6/lib/IDS/tmp/phpids_log.txt
EXPOSE 80 3306
CMD ["/run.sh"]
  • Build the docker image
docker build -t "mydvwa:v1" .
  • Create a docker volume to store server logs
docker volume create httpd_logs
  • Start the docker container
docker run -d --volume httpd_logs:/var/log/apache2 --name dvwawebserver -p 80:80 -P mydvwa:v1
  • Find where your logs are stored (the Docker volume's mount point)
docker volume inspect httpd_logs
  • For me, the location is the following (cat the log file to see the data)
cat /var/lib/docker/volumes/httpd_logs/_data/access.log

Setup the Splunk forwarder

  • Download the Splunk universal forwarder
wget -O splunkforwarder-8.0.4-767223ac207f-linux-2.6-amd64.deb 'https://www.splunk.com/bin/splunk/DownloadActivityServlet?architecture=x86_64&platform=linux&version=8.0.4&product=universalforwarder&filename=splunkforwarder-8.0.4-767223ac207f-linux-2.6-amd64.deb&wget=true'
  • Install it (use the file name you just downloaded)
dpkg -i ./splunkforwarder-8.0.4-767223ac207f-linux-2.6-amd64.deb
  • Go to the folder where it is installed
cd /opt/splunkforwarder/bin
  • Start the program (this will also ask for a username and password; these are the forwarder's own client credentials and will be needed later)
./splunk start --accept-license
  • Add the forwarding server location (my Splunk server on RHEL 8). When prompted, give the client username and password, not the server's; otherwise the login will fail.
./splunk add forward-server 34.229.67.168:9997
  • Add the data to monitor (the access_combined source type is used for httpd and similar web server access logs)
./splunk add monitor /var/lib/docker/volumes/httpd_logs/_data/access.log -sourcetype access_combined
  • To keep this configuration explicit and persistent, also define the monitor in inputs.conf.
vi /opt/splunkforwarder/etc/system/local/inputs.conf
  • Add an entry like the one below to this file. The first line names the file to monitor; the second line gives its source type. You can look up the source type for your log file; source types are basically parser types.
[monitor:///var/lib/docker/volumes/httpd_logs/_data/access.log]
sourcetype=access_combined
  • The forwarder's host name will be used by the Splunk server to identify the source.
  • Now restart the forwarder and enable it at boot (run these commands in the terminal).
/opt/splunkforwarder/bin/splunk restart
/opt/splunkforwarder/bin/splunk enable boot-start
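
The add forward-server step records the Splunk server's address in the forwarder's outputs.conf. A minimal sketch of an equivalent file is shown below, in case you prefer to manage it by hand alongside inputs.conf. The helper name forwarder_outputs_conf is hypothetical, and the output path in the comment assumes the default /opt/splunkforwarder install used above.

```shell
# Print an outputs.conf that sends everything the forwarder collects to a
# single Splunk server. forwarder_outputs_conf is a hypothetical helper name.
# $1: host:port of the receiver, e.g. 34.229.67.168:9997
forwarder_outputs_conf() {
    indexer="$1"
    cat <<EOF
[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]
server = $indexer
EOF
}

# Example usage (not run here; the path is an assumption about your install):
#   forwarder_outputs_conf 34.229.67.168:9997 \
#       > /opt/splunkforwarder/etc/system/local/outputs.conf
#   /opt/splunkforwarder/bin/splunk restart
```

The port here must match the receiving port configured on the Splunk server (9997 in this walkthrough), otherwise the forwarder will start but no events will arrive.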

You are now done with all the settings that need to be made on the machines. From now on, whatever log data the webapp generates will be forwarded to the Splunk server for monitoring.

Now let's see how to use the web UI to analyze that log data.

  • Log into the web UI with your server username and password (the ones you set on RHEL 8 earlier).
  • Click on Search & Reporting in the side panel to open the search page.
  • The number of events and the timestamps you see may differ from mine, but that's nothing to worry about right now.
  • Click on Data Summary.
  • The host listed there is the universal forwarder we set up on Ubuntu. This is your data source for now.
  • Click on the host to see its data. Analyze the log and play around a bit to get familiar with it and with Splunk's smart parsing.
  • Let's generate some dirty logs for the web server. I am requesting a page that does not exist; in technical terms, a page for which the server returns a 404 or a similar 4xx error. The incident will be recorded in the log for analysis.
  • Now if we search for status="4**", there is an entry for GET /abc, which returned a 4xx error. (In Splunk search, * is a wildcard rather than a regex, so this matches any status starting with 4. All hail wildcards!)
  • Now let's try some more examples, say /xyz. Hit this URL on your web application to generate some more log data.
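
To sanity-check what Splunk shows, the same 4xx filter can be run directly on the raw log on the Ubuntu host. The sketch below is only an illustration: count_4xx_by_ip is a made-up helper name, and it assumes the Apache combined log layout, where the client IP is the first whitespace-separated field and the status code is the ninth (a request line containing literal spaces would shift the fields).

```shell
# Count 4xx responses per client IP in an Apache combined-format access log.
# count_4xx_by_ip is a hypothetical helper name.
# $1: path to the access log
count_4xx_by_ip() {
    # Field 1 is the client IP, field 9 is the HTTP status code.
    awk '$9 ~ /^4[0-9][0-9]$/ { hits[$1]++ }
         END { for (ip in hits) print hits[ip], ip }' "$1" | sort -rn
}

# Example usage with the log path from this article (not run here):
#   count_4xx_by_ip /var/lib/docker/volumes/httpd_logs/_data/access.log
```

Each output line is a count followed by an IP, highest count first. A single address with a large number of 4xx responses in a short log is the pattern a directory brute-force leaves behind.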

Monitoring such entries is a good way to tell whether someone is trying to perform a directory brute-force attack. But we can't keep checking for the attack every second.

This is where alerts come in. Splunk lets us save a search query as an alert. This alert will be triggered whenever someone tries to perform a directory brute-force attack.

  • Click on the Save As button at the top and then click on Alert.
  • A pop-up will appear. Fill in this form according to what you need from the alert.
  • I am using the trigger condition Number of Results.
  • Splunk offers many trigger actions. You can even run a script that executes any code you have written. Here I am using the action that adds the event to the triggered alerts log.
  • Now save it.
  • I am using the free version, so it warns me that my scheduled search will stop working when the license expires.
  • Now the alert is set, so it will fire whenever someone performs a directory brute-force on our webapp.
  • I performed a directory brute-force, and it alerted me to the attack in real time.
  • Now I can simply write a script that blocks the attacker's IP and set it as the trigger action for the alert, or do anything else I can imagine.
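
As a rough idea of what such a blocking script could look like, here is a hedged sketch. The function name block_ip and the DRY_RUN switch are made up for illustration, and how the attacker's IP gets from the Splunk alert into the script is left out on purpose. Because the webapp is published from a Docker container, the rule goes into the DOCKER-USER chain, which Docker evaluates before its own forwarding rules; a rule in INPUT would not see traffic forwarded to the container.

```shell
# Block a single IPv4 address from reaching published Docker ports.
# block_ip and DRY_RUN are hypothetical names for this sketch.
# DRY_RUN=1 (the default) only prints the iptables command.
block_ip() {
    ip="$1"
    # Reject anything that is not digits and dots, so text taken from a log
    # can never smuggle in shell syntax or extra iptables options.
    case "$ip" in
        ''|*[!0-9.]*)
            echo "refusing to block invalid address: $ip" >&2
            return 1 ;;
    esac
    # Check the dotted-quad shape (octet ranges are not validated here).
    if ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
        echo "refusing to block invalid address: $ip" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "iptables -I DOCKER-USER -s $ip -j DROP"
    else
        iptables -I DOCKER-USER -s "$ip" -j DROP
    fi
}

# Example usage (not run here): print the rule for a documentation address.
#   block_ip 203.0.113.7
```

Starting in dry-run mode lets you watch which addresses the alert would block before letting it touch the firewall, which matters because a false positive here locks out a real user.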

Splunk is not only for analyzing server logs. You can use it to monitor the progress of your ML/DL model, a stock price, or any other data you want to analyze, visualize, or run some nitty-gritty statistics on.

Read more about splunk here.

That’s it for now !!
