
ChaosSearch Blog


Identify Anomalies in your AWS CloudTrail Data

Released in 2013, AWS CloudTrail is an Amazon Web Services (AWS) service that keeps a record of the API calls made within your AWS account. CloudTrail gives you deep visibility into the activity in your account, so you can see exactly who did what and when. You can use CloudTrail logs not only to audit user access for security purposes but also for operational troubleshooting. There are no charges to use the CloudTrail service, but since all the data is logged into a bucket in your AWS account, standard S3 charges will apply.

CloudTrail is only an API logging tool. It does not come with a native way to analyze the data beyond a simple UI that lets you run basic searches on the last 90 days of events. If you want to use native AWS services to visualize usage trends, you need to load your data into Redshift, or use Athena and QuickSight to build dashboards and other visualizations. If you want to run free-form queries on your data, such as wildcard searches on various fields, you need to ingest the data into an Elasticsearch cluster.

Elasticsearch is a powerful distributed search engine, but consuming and indexing this data involves a lot of technical complexity. The biggest complaint our customers mention when they talk about trying to get value out of their CloudTrail data is the time it takes to build a schema for the data. CloudTrail data is notoriously sparse, with hundreds of fields that each require a distinct mapping. Additionally, if you make a mistake during index creation, you need to spend time and energy updating that schema and reindexing all of your data.
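To get a feel for what "sparse" means here, the following is a minimal Python sketch that flattens nested event records into dotted field paths and reports what fraction of records contain each path. The helper names (`flatten`, `field_coverage`) and the two sample records are made up for illustration; real CloudTrail events carry many more fields.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Iterator, Tuple


def flatten(value: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, leaf_value) pairs for a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(child, path)
    elif isinstance(value, list):
        # List items share the path of the list itself.
        for child in value:
            yield from flatten(child, prefix)
    else:
        yield prefix, value


def field_coverage(records: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Return, for every field path, the fraction of records that contain it."""
    seen: Counter = Counter()
    total = 0
    for record in records:
        total += 1
        # Use a set so a path is counted once per record.
        seen.update({path for path, _ in flatten(record)})
    if total == 0:
        return {}
    return {path: count / total for path, count in seen.items()}


# Hypothetical, heavily trimmed sample events.
sample = [
    {"eventName": "DeleteBucket", "eventSource": "s3.amazonaws.com",
     "requestParameters": {"bucketName": "example-logs"}},
    {"eventName": "ConsoleLogin", "eventSource": "signin.amazonaws.com",
     "responseElements": {"ConsoleLogin": "Success"}},
]
coverage = field_coverage(sample)
```

Fields that belong to only one kind of API call, such as `requestParameters.bucketName`, show up in a small share of records, and every one of them would still need its own mapping in a hand-built schema.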

 


CHAOSSEARCH is the first technology that turns your data on Amazon S3 into a fully searchable cluster with support for the Elasticsearch API as well as a fully integrated Kibana interface. After you get started with CHAOSSEARCH and integrate your account with a Read-Only IAM access role, you can create a Virtual Bucket grouping together all of your CloudTrail data for indexing.

CHAOSSEARCH will identify this data as GZIP JSON. If you want this data continually indexed as AWS writes logs to your S3 bucket, you can enable SQS notifications, which let CHAOSSEARCH know when new objects are available in the bucket for indexing.
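If you are curious what "GZIP JSON" looks like in practice, here is a small Python sketch, independent of CHAOSSEARCH, that decompresses one CloudTrail log object and returns its events. CloudTrail wraps the events of each log file in a top-level `Records` array. The function name and the sample event are assumptions for illustration, and the object is built in memory so the sketch runs without AWS access.

```python
import gzip
import json
from typing import Any, Dict, List


def load_cloudtrail_records(raw: bytes) -> List[Dict[str, Any]]:
    """Decompress one CloudTrail log object and return its event records."""
    document = json.loads(gzip.decompress(raw).decode("utf-8"))
    # Each log file holds its events under the top-level "Records" key.
    return document.get("Records", [])


# Hypothetical log object, built locally instead of being read from S3.
example_object = gzip.compress(json.dumps({
    "Records": [
        {"eventTime": "2019-04-28T14:03:11Z",
         "eventSource": "s3.amazonaws.com",
         "eventName": "ListBuckets"},
    ]
}).encode("utf-8"))

records = load_cloudtrail_records(example_object)
```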

You won’t need to spend any time creating an index schema and mapping for the data with CHAOSSEARCH. We can automatically identify which fields are strings, integers, or time values. Since we leverage a revolutionary Schema on Read approach to data indexing, you can modify the schema for the data anytime without ever needing to reindex your data.
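As a rough illustration of field type detection, the sketch below guesses whether a value is an integer, a time value, or a string. This is not how CHAOSSEARCH does it internally; it is a simplified stand-in, the function name `infer_type` is hypothetical, and the extra `boolean` category is an addition of this example.

```python
import re
from datetime import datetime
from typing import Any


def infer_type(value: Any) -> str:
    """Guess a coarse field type: 'boolean', 'integer', 'time', or 'string'."""
    # bool is a subclass of int in Python, so check it first.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    text = str(value)
    if re.fullmatch(r"-?\d+", text):
        return "integer"
    try:
        # CloudTrail timestamps look like 2019-04-28T14:03:11Z.
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
        return "time"
    except ValueError:
        return "string"
```

With a schema-on-read approach, a guess like this can be revised later without rewriting the stored data, which is the point the paragraph above makes.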

After indexing, this data is now available within our integrated Kibana interface and you can see that we have indexed 455 separate fields within this very sparse dataset. Everything looks and feels like a normal Kibana interface, except all this data exists within your Amazon S3 infrastructure.

From here we can navigate to the Discover screen to start analyzing API usage over time. Since all the data lives within your Amazon S3 account, you can cost-effectively retain months or even years’ worth of log and event data. In the event of an operational issue or a security event, you will always be able to go back to that data in the platform, no matter how long ago the event occurred.

In this scenario, I want to analyze my AWS S3 API calls for any potentially anomalous activity. I can immediately run an aggregation for all events where the source is an Amazon S3 API call.

Sometimes you may want to order by the most frequently used API calls, but in this case, I want to investigate the least common API calls by type. When I adjust my query I can see two DeleteBucket API calls. Let’s continue diving in to see which buckets were deleted.
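The same "rarest first" aggregation can be sketched in plain Python over parsed event records. This is an illustration of the idea rather than the query Kibana runs. The function name and the sample counts are made up, except for the two DeleteBucket calls, which mirror the result described above.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def least_common_calls(
    records: Iterable[Dict[str, Any]],
    event_source: str = "s3.amazonaws.com",
    limit: int = 5,
) -> List[Tuple[str, int]]:
    """Count events per eventName for one service, rarest first."""
    counts = Counter(
        r.get("eventName", "unknown")
        for r in records
        if r.get("eventSource") == event_source
    )
    # Sort ascending by count, breaking ties by name for a stable result.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))[:limit]


# Hypothetical sample: counts are invented for illustration.
sample = (
    [{"eventSource": "s3.amazonaws.com", "eventName": "GetBucketAcl"}] * 40
    + [{"eventSource": "s3.amazonaws.com", "eventName": "ListBuckets"}] * 12
    + [{"eventSource": "s3.amazonaws.com", "eventName": "DeleteBucket"}] * 2
    + [{"eventSource": "signin.amazonaws.com", "eventName": "ConsoleLogin"}] * 7
)
rarest = least_common_calls(sample)
```

Sorting in ascending order is the design choice that matters here: destructive calls tend to be rare, so they surface at the top of the list instead of being buried under routine reads.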

I can use the native Kibana tools to pin the filter and go back to our raw event view to identify which buckets were deleted.

In my case, I see a bucket related to logs for my blog was deleted, and the bucket that the CHAOSSEARCH indexes were stored in was deleted as well.
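For readers who prefer code to clicks, the drill-down from "two DeleteBucket calls" to "which buckets" might look like the sketch below. It assumes the bucket name sits under `requestParameters.bucketName` in each event, which is where S3 bucket-level calls normally record it. The function name, bucket names, and timestamps are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def deleted_buckets(
    records: Iterable[Dict[str, Any]],
) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (eventTime, bucketName) for every S3 DeleteBucket call."""
    hits = []
    for r in records:
        if r.get("eventSource") != "s3.amazonaws.com":
            continue
        if r.get("eventName") != "DeleteBucket":
            continue
        # requestParameters can be null in CloudTrail events.
        params = r.get("requestParameters") or {}
        hits.append((r.get("eventTime"), params.get("bucketName")))
    # Oldest first; events without a timestamp sort to the front.
    return sorted(hits, key=lambda hit: hit[0] or "")


# Hypothetical sample events.
sample = [
    {"eventTime": "2019-03-02T10:15:00Z", "eventSource": "s3.amazonaws.com",
     "eventName": "DeleteBucket",
     "requestParameters": {"bucketName": "example-index-bucket"}},
    {"eventTime": "2019-02-11T08:00:00Z", "eventSource": "s3.amazonaws.com",
     "eventName": "DeleteBucket",
     "requestParameters": {"bucketName": "example-blog-logs"}},
    {"eventTime": "2019-02-11T08:01:00Z", "eventSource": "s3.amazonaws.com",
     "eventName": "ListBuckets", "requestParameters": None},
]
deleted = deleted_buckets(sample)
```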

Let’s now continue diving into my CloudTrail data to see if we can identify any anomalous user console logins to the AWS platform.

When I search for the ConsoleLogin event name, I can see that the majority of logins to the Console were made by my IAM user “petecheslock”, but there were also 5 “root” user logins. I can’t think of a good reason for routinely logging in as the root user of an AWS account, so let’s identify when these logins occurred and what activity took place.
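A minimal sketch of that breakdown in Python, assuming parsed event records: root sign-ins carry `userIdentity.type` set to `Root`, while IAM users are identified by `userIdentity.userName`. The function name, the user name `example-user`, and its login count are invented for illustration; the five root logins mirror the result above.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def console_logins_by_identity(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count ConsoleLogin events per identity, grouping root sign-ins as 'root'."""
    counts: Counter = Counter()
    for r in records:
        if r.get("eventName") != "ConsoleLogin":
            continue
        identity = r.get("userIdentity") or {}
        if identity.get("type") == "Root":
            counts["root"] += 1
        else:
            counts[identity.get("userName", "unknown")] += 1
    return counts


# Hypothetical sample events.
sample = (
    [{"eventName": "ConsoleLogin",
      "userIdentity": {"type": "IAMUser", "userName": "example-user"}}] * 12
    + [{"eventName": "ConsoleLogin", "userIdentity": {"type": "Root"}}] * 5
    + [{"eventName": "ListBuckets",
        "userIdentity": {"type": "IAMUser", "userName": "example-user"}}] * 3
)
logins = console_logins_by_identity(sample)
```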

You can see those 5 logins occurred at random times over the last 6 months, but I’m most interested in the most recent one: what did the root user do around the end of April?

By adjusting my query and time scale I can see that the root user was used in order to redeem an AWS promo code for my account.
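Narrowing the time scale around a login can be sketched as a simple window filter over root events. Everything specific in this example is an assumption: the function names, the dates, and in particular the `example.amazonaws.com` / `ExampleAction` placeholders, which stand in for whatever the root user actually did.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List, Optional, Tuple

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # CloudTrail eventTime format (UTC)


def parse_time(text: str) -> datetime:
    """Parse a CloudTrail eventTime string into an aware UTC datetime."""
    return datetime.strptime(text, TIME_FORMAT).replace(tzinfo=timezone.utc)


def root_activity_near(
    records: Iterable[Dict[str, Any]],
    login_time: str,
    window_minutes: int = 60,
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return root events within +/- window_minutes of login_time, oldest first."""
    center = parse_time(login_time)
    window = timedelta(minutes=window_minutes)
    hits = []
    for r in records:
        identity = r.get("userIdentity") or {}
        if identity.get("type") != "Root" or "eventTime" not in r:
            continue
        if abs(parse_time(r["eventTime"]) - center) <= window:
            hits.append((r["eventTime"], r.get("eventSource"), r.get("eventName")))
    return sorted(hits, key=lambda hit: hit[0])


# Hypothetical sample events; the second one uses placeholder names.
sample = [
    {"eventTime": "2019-04-28T14:03:11Z", "userIdentity": {"type": "Root"},
     "eventSource": "signin.amazonaws.com", "eventName": "ConsoleLogin"},
    {"eventTime": "2019-04-28T14:10:00Z", "userIdentity": {"type": "Root"},
     "eventSource": "example.amazonaws.com", "eventName": "ExampleAction"},
    {"eventTime": "2019-04-28T14:12:00Z",
     "userIdentity": {"type": "IAMUser", "userName": "example-user"},
     "eventSource": "s3.amazonaws.com", "eventName": "ListBuckets"},
    {"eventTime": "2019-01-02T09:00:00Z", "userIdentity": {"type": "Root"},
     "eventSource": "signin.amazonaws.com", "eventName": "ConsoleLogin"},
]
nearby = root_activity_near(sample, "2019-04-28T14:03:11Z")
```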

This entire process, from raw data in my AWS account to actionable insights, took only minutes from the moment I first set up my AWS account with CHAOSSEARCH. I didn’t have to spend any time deploying a database like Elasticsearch, sizing it, sharding it, or creating schemas and mappings for my data. I can leave all my data within my own AWS account, let CHAOSSEARCH index that data, and have the indexed data written back into my AWS S3 buckets. Reach out today and get set up in minutes to start getting answers to your CloudTrail data questions.

About the Author, Pete Cheslock

Pete Cheslock was the VP of Product for ChaosSearch, where he was brought on as one of the founding executives. In his role, Pete helped to define the go-to-market strategy and refine product direction for the initial ChaosSearch launch. To see what Pete’s up to now, connect with him on LinkedIn.