In this article, we discuss how admins can programmatically access Spotlight using a utility called osquery, and we demonstrate useful queries you can run to find evidence of compromise in your organization.
Have you ever wanted to find exactly the right file on your system in a hurry?
That’s why those clever folks at Apple built Spotlight.
Originally introduced in Mac OS X Tiger, Spotlight continuously maintains an index of all of the files on your Mac, which allows you to instantly search for files not just by their names, but by their metadata, and even the text content inside of them.
I lean on Spotlight heavily every day, to quickly locate and pull up the right design assets from the 1000+ Sketch files on my system.
While the benefits of Spotlight as a user of macOS are obvious and intuitive, admins may be surprised to learn you can leverage this powerful feature across your Mac fleet to hunt for evidence of malware, data breaches, and other undesirable artifacts in your end-users' devices.
In this article, we will discuss how you can programmatically access Spotlight using a utility called osquery, and we’ll demonstrate useful queries you can run to find evidence of compromise in your organization.
What is osquery?
Osquery is a free open-source project that allows you to query a device with SQL as if it were a real relational database.
For example, if you wanted to list all of the apps installed on a device, you can simply open your terminal, type osqueryi
(osquery’s command line utility) and in the prompt run the following:
SELECT * FROM apps LIMIT 1;
name = 1Password 7.app
path = /Applications/1Password 7.app
bundle_executable = 1Password 7
bundle_identifier = com.agilebits.onepassword7
bundle_name = 1Password 7
bundle_short_version = 7.0.7
bundle_version = 70007000
bundle_package_type = APPL
environment =
element =
compiler = com.apple.compilers.llvm.clang.1_0
development_region = en
display_name =
info_string =
minimum_system_version = 10.12.6
category = public.app-category.productivity
applescript_enabled = 0
copyright = Copyright © AgileBits Inc.
last_opened_time = 1533860585.5755
When I run this query, osquery translates the SQL into live API calls that list all of the apps currently installed on my device. If I installed more apps and ran this command again, the output would immediately reflect those changes.
In our query above, apps
is called a “virtual table” in osquery terminology. Osquery has hundreds of these virtual tables, many of them which work across platforms. To see them all, check out osquery’s schema documentation.
To download osquery on your device, simply visit the official website and find the right package for your platform.
This article focuses on a special Virtual Table called mdfind. This Virtual Table (named after the CLI utility mdfind
) allows us to use Spotlight to locate files on a Mac that meet our search criteria.
The mdfind virtual table
Prior to mdfind’s inclusion into osquery, searching for files across the file-system, when their location was unknown, necessitated great deals of recursion, and was best avoided to prevent undue strain on the device being queried.
You can read more about the file
table in my previous blog post: The File Table: Osquery’s Secret Weapon.
Enter macOS Spotlight, and the mdfind table!
macOS Spotlight (mdfind) is like a lightweight grep
without the p
. The best part is, because it’s built around an index, it’s insanely fast and well suited to locating files where the precise location or name is unknown.
The mdfind
virtual table was born first as a custom osquery go table written by Victor Vrantchan (groob) and later added (v3.2.6) to the core Osquery open-source project by Facebook developer Mitchell Grenier (obelisk).
Unlike most osquery virtual tables, where querying is straightforward if you are familiar with SQL, mdfind
requires some Apple developer-level knowledge of Spotlight’s own unique query language. You should think of mdfind
like a pass-through that allows you to access the raw power of Spotlight from osquery.
What can mdfind find?
Spotlight indexes an absolutely incredible breadth of data across your file system. There are over 125 published metadata attributes that Spotlight is capable of indexing and which you can search by. However most systems actually have closer to double that number, and 3rd-party applications can add even more via custom attributes. You can check to see what metadata attributes are present on your own device by running the following command in the terminal:
mdimport -A
The naming schema is straightforward, and all attributes are prepended by kMDItem
:
k - (Hungarian Notation for constant, used by Apple since Pascal)
MD - (metadata)
Item
AttributeName
The standard stuff is all there of course:
File Name (
kMDItemFSName
)File Size (
kMDItemFSSize
)File Creation Date (
kMDItemFSCreationDate
)
Spotlight also gives you access to metadata that you may not realize even exists across your files, including:
Downloaded File Source (
kMDItemWhereFroms
)File EXIF Altitude in Meters above sea-level (
kMDItemAltitude
)PDF Password Security Method (
kMDItemSecurityMethod
)
While some of these are useful (and others just strange), they still require some basic level of knowledge about the files themselves. What if you’re just looking for files of any type that contain specific phrases or confidential information? This is where Spotlight really shines (pun intended). This is accessed through the attribute…
- File Text Contents (
KMDItemTextContent
)
Spotlight can index the text contents of any ASCII plain-text files, PDFs, Messages, emails, text files, csv’s, python files, shell scripts, JSON and other compatible formats. They just have to bewithin an indexable location.
That’s right, it’s pretty much the bee’s knees! Let’s discuss some practical applications of that utility.
Using mdfind to prevent a data breach
Unlike other types of devastating cyber security incidents that involve advanced threat actors and malware, most data breaches for SaaS companies can be attributed to innocuous events. For example, a well-intentioned software engineer troubleshoots a customer issue, and then simply forgets to delete a production database backup on their device afterwards.
Once a production database backup is on a device, forgetting to encrypt that device and leaving it in the back of a cab is all it takes for a simple oversight to turn into a disastrous headline.
So, let’s say you were afraid your engineers had unintentionally left a copy of your production database on their laptop. How would we find it? For this example, I will be using 1Password Device Trust’s Live Query feature to run a search across a few of our own devices. You could use osqueryi
to test these queries yourself, but it’s worth mentioning that we already have this set up as a check template called “Sensitive Files” in 1Password Device Trust. If you want a simple osquery solution to deploy across your fleet, we’ve got your back.
For now, I have created a mock production db dump on my device to illustrate the process, located at:
/Users/fritz-imac/dev/pg/backups/backup_2018-07-11T06-57-36Z
👀 Let’s see if we can find it!
For the purposes of this post I am going to assume little experience in writing osquery SQL queries, and share a couple useful tips along the way. If you want to skip ahead to the finished query, feel free to jump ahead to the final attempt.
Attempt 1 — A basic mdfind query: Single condition, zero complexity
The most basic approach would be to search for any file containing the string 'backup'
. All mdfind queries have the same basic building blocks, and they are all joined on a table that contains a path
column, most typically the file
table.
SELECT f.path FROM file AS f JOIN mdfind ON mdfind.path = f.path
This segment tells osquery to join file
against mdfind
on path
and to return the file.path
.
Next, we need to provide the mdfind.query
component. This will be the method we use to search the device for the desired files and pass the appropriate path
to the joined file
table.
Our condition is represented here as:
AND mdfind.query = "kMDItemFSName == '*backup*'"
In it, we are telling the mdfind API to look for any files whose filename contains the partial string '*backup*'
with the *
characters representing wildcards which would allow us to match things like: 01-02-22-backup.zip
or fritzsbackupfile.gzip
It’s important to note that the mdfind.query
must be made within double quotations; individual operator comparison strings such as ‘backup’ must be within single quotes.
This gives us a complete and valid mdfind osquery query:
SELECT
f.path
FROM file AS f
JOIN mdfind ON mdfind.path = f.path
AND mdfind.query = "kMDItemFSName == '*backup*'"
Let’s give it a try now to see what is returned:
Attempt 1 - 2016 Results
Oof!
With only 9 Devices Targeted, it’s going to take a fair bit of sifting to find our needle in the haystack. You can imagine how much larger the result set becomes when you are querying thousands of machines.
Furthermore, because the only item returned by mdfind is the path, we need to ask the file table to return some more relevant information in our results:
“Tell me more, tell me more”
Attempt 2 — Two conditions and boolean logic
In order to expedite our ability to parse these results, let’s return some additional metadata such as:
file size:
f.size
creation time:
f.btime
last modified time:
f.mtime
Additionally, we will use the datetime
function to return time in the standard ISO-8601 format, and we will use the ROUND
function on f.size
so that it returns in MB instead of bytes.
Alright, now that we have some data about the files that we can quickly scan by eye, let’s add some more conditional logic.
As we discussed earlier, Spotlight has a truly wild feature, which is the ability to search an item’s text content (across a pretty wide array of file types, including .pdf, .olm, .py, etc.)
We can run our search against a file’s contents by calling the kMDItemTextContent
metadata attribute in our query:
...AND mdfind.query = "kMDItemTextContent == 'foo'"
We need to think like a production database backup, and consider what strings we would give a strong signal to noise ratio for filtering down results. For instance, we might look for a CREATE TABLE
statement that would indicate the presence of a standard SQL DB.
SELECT
f.path,
f.size AS size_bytes,
ROUND((f.size * 10e-7),2) AS size_megabytes,
f.btime AS file_created_epoch,
datetime(f.btime, "unixepoch") AS file_created
FROM file f
JOIN mdfind ON mdfind.path = f.path
AND mdfind.query =
"kMDItemFSName == '*backup*' && kMDItemTextContent == 'CREATE TABLE'"
Attempt 2 - 9 Results
🔔 Ding, ding, ding! 🔔 - I see our file!
But we can do better!!
Attempt 3 — Filtering down results
As you might have seen in our results set, there is an additional component we could be scoping our query with. Any uncompressed DB backup will likely range in size from hundreds of MB to hundreds of GB. Therefore, we can add a condition to our query to filter out any results that are below a certain threshold in size. For our example, we will ignore any file that is less than 100 MB.
SELECT
f.path,
f.size AS size_bytes,
ROUND((f.size * 10e-7),2) AS size_megabytes,
f.btime AS file_created_epoch,
datetime(f.btime, "unixepoch") AS file_created
FROM file f
JOIN mdfind ON mdfind.path = f.path
AND mdfind.query =
"kMDItemFSName == '*backup*' && kMDItemTextContent == 'CREATE TABLE'"
AND size_megabytes > 100
Attempt 3 - 1 Result
We did it! Using three clauses, we’ve isolated our test file!
Attempt 4 — But wait there’s more…
What if I told you there was another identical backup file without such a convenient naming schema located on the device? Well, we are going to miss it with that kMDItemFSName
condition. Let’s try killing that condition and seeing what happens…
🥜 Nuts! — Some false positives in there!
But I can see our other file. Let’s see if we can filter it down by further refining the kMDItemTextContent argument with some knowledge of the database we’re looking for.
I happen to know our production database should have a table called alerts, so let’s change the string to CREATE TABLE alerts:
SELECT
f.path,
f.size AS size_bytes,
ROUND((f.size * 10e-7),2) AS size_megabytes,
f.btime AS file_created_epoch,
datetime(f.btime, "unixepoch") AS file_created
FROM file f
JOIN mdfind ON mdfind.path = f.path
AND mdfind.query =
"kMDItemFSName == '*backup*' && kMDItemTextContent == 'CREATE TABLE alerts'"
AND size_megabytes > 100
Attempt 4 - 2 Results
Boom goes the dynamite!
We have found our db backups, and can now reach out to the responsible end-user and ask that they delete the unused file.
Hopefully this iterative example will give you some inspiration into how you might leverage the mdfind
table to locate and prevent unintentional catastrophic data breaches. And of course, just to reiterate…1Password Device Trust does have templates that will let you make these checks a little more easily.
In the next section, we will discuss some of the basic syntax that you can use within the mdfind.query
mdfind query syntax tips
Apple Support Documentation: File Metadata Query Expression Syntax
Comparative logic works using the following operators
==
equals
!=
not equal
<
less than
>
greater than
<=
less than or equal to
>=
greater than or equal to
c
makes string case-insensitive
...mdfind.query = "kMDItemFSName = '*FoO'c"
d
ignores diacritical marks (such as à, ê, ñ, ß, etc.)
...mdfind.query = "kMDItemFSName = '*föo'd"
*
Wildcard lets you search for partial matches on either side of a string ...mdfind.query = "kMDItemFSName = '*foo'" &&
AND condition
...mdfind.query = "kMDItemFSName = 'foo' && kMDItemTextContent = 'bar'"
||
OR condition
...mdfind.query = "kMDItemFSName = 'foo' || kMDItemFSName = 'bar'"
(
& )
Use parentheses to enclose multiple groups of conditions:
...mdfind.query = "(kMDItemFSName = 'foo' || kMDItemFSName = 'bar') && (kMDItemTextContent = 'paris' || kMDItemTextContent = 'france')
$time.
Like constraining to time, for files created in a time range: (eg. $time.now, $time.today, $time.yesterday, $time.this_week, $time.this_month, $time.this_year
)
These can be further modified by providing a parenthetical number afterwards. The number in parenthesis refers to the unit of time measurement: (eg. now
is registered in seconds, today
in days, this_week
weeks, etc.)
Let’s modify the $time.now
example to search for files created in the last hour.
...mdfindquery = "kMDItemFSCreationDate >= $time.now(-3600)"
“Wow! This mdfind thing can do it all! There’s no stopping me!"
🐯 Easy tiger…there are a couple of gotchas that you have to look out for when using mdfind with osquery!
Caveats to the mdfind Osquery Table:
Spotlight can only give you the paths of matching files
“There’s no p in this gre”
While the contents of files can be read by mdfind, they cannot be printed (output) as part of the results set.
This is an intentional feature on the part of the osquery team. This limitation keeps the mdfind table within the scope of osquery’s privacy-minded development.
“It’s kind of like playing 20 questions”
You cannot print the surrounding strings that match within a file when querying kMDItemTextContent
, or examine the file by arbitrarily reading it in osquery. Therefore, you must construct your query very intentionally in order to avoid false positives.
Remember our earlier example. If you wanted to look for downloaded copies of your production database across your infrastructure, you could not simply search for the string CREATE TABLE
because it would net too many false positives. Instead, you would want to specify a string that would be found only in a real db backup, and strengthen your argument by adding exclusionary criteria such as scoping to file size.
No regular expressions
Regular expressions. I want them. You want them. Sadly, none of us can have them.
As useful as regex would be, it is currently unsupported by Spotlight and Apple has shown no sign of intending to add regex functionality ever.
This means, no matter how cool it would be, you can’t run the following query within mdfind to search for plain-text files containing credit card numbers:
...AND mdfind.query = "-regex '^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})$'
You have to get more creative when it comes to finding things like files containing credit cards or social security numbers. The easiest way is typically looking for files with strings like cvv
, or ssn
, but those may produce large quantities of false positives.
Another approach is including canary fingerprint values in your db backups, which you can use to key off of (eg. a fake credit card # of a known value that always belongs to a fake user).
mdfind only indexes *some of your files
Because the intended usage of Spotlight is to quickly locate relevant user actionable items (apps, system preferences, files), Spotlight does not index hidden folders or files by default. While mdfind can be forced to import from new directories, Apple prevents system files and hidden directories from being indexed. This means you cannot run the following query to find all of your locally cloned repositories:
...AND mdfind.query = "kMDItemFSName = '.git'"
This also means you should forget about indexing your .ssh
folders or any items located within hidden folders (those prepended with the “.
” character).
macOS Spotlight must not be disabled
It should go without saying that you cannot use the mdfind table on any operating system other than macOS.
I would love to see someone take on the challenge of developing a similar table for Windows, (the Linux space has too many options to choose only one) but for the moment we just have to satisfy ourselves with querying solely the Macs in our fleet via this method.
Furthermore, due to indexing performance issues in earlier iterations of OS X, there are still those (grumpy developers) who disable mdfind on their system.
Users can also exclude directories from being indexed, and globally limit what types of files are imported on their system.
To output any user-specified excluded directories, you can run the following query:
SELECT
value
FROM plist
WHERE path = '/.Spotlight-V100/VolumeConfiguration.plist'
AND key = 'Exclusions'
AND value IS NOT NULL
AND value != ''
Categories of files can also be excluded irrespective of their parent directory. For instance, some folks don’t love Spotlight indexing their Mail.app, Outlook.app or Messages.app conversation and email history. As a result, querying for these files will often not return results.
To output the Spotlight preferences of a user, you can run the following query against their device. Unfortunately, due to the handling of nested keys by osquery, the plist XML output will need to be humanly parsed and cannot be procedurally checked.
SELECT *
FROM plist
WHERE path LIKE '/Users/%/Library/Preferences/com.apple.Spotlight.plist';
Easy mode isn’t always so easy
The mdfind table can be queried in one of two ways: explicitly, wherein you specify the kMDItem
attributes and their desired criteria, or implicitly wherein you simply provide a simple string of text like a user would in the Spotlight Search bar.
I didn’t discuss the implicit method earlier, because it is my firm belief that it doesn’t work as well and results in too many false-positives. But I will demonstrate it here for the sake of being comprehensive.
Query for any file that contains foo
:
...AND mdfind.query = "foo"
Query for any file that contains BOTH foo
and bar
*
*By default, all space delimited strings are treated as AND’ed conditions when simple querying Spotlight
...AND mdfind.query = "foo bar"
Query for any file that contains foo
but NOT bar
...AND mdfind.query = "foo(-bar)"
Use the metadata attributes that work
While using Spotlight in the UI of macOS allows intelligent full-text search across available attributes, the mdfind table in osquery is much more useful if you specify the attributes you wish to search and the criteria you want to match: (eg. kMDItemFSName, kMDItemTextContent,
etc.)
Because you need to know your metadata attributes by name, it helps to keep a cheat-sheet like this one handy so that you can find what you are looking for.
In general, however, the most useful items I have found are:
1. kMDItemFSName
Great for finding files of a certain extension type.
2. kMDItemTextContent
Similar to our article’s production backup example, great for finding strings inside of compatible documents.
...AND mdfind.query = "kMDItemTextContent == '*ssn,*'"
3. kMDItemFSCreationDate
& kMDItemFSContentChangeDate
Great for finding any file that was created / modified on, or within a range.
...AND mdfind.query =
"( kMDItemFSCreationDate >= $time.iso(2018-08-01T00:00Z)
&& kMDItemFSCreationDate <= $time.iso(2018-08-20T00:00Z))
&& (kMDItemFSName = '*.csv')"
Some mdfind operators do not work as documented
onlyin
has only spotty support
mdfind has an argument called: -onlyin /path/you/want
which would typically constrain the results to only items which are within the specified parent directory (any level of nesting below that parent directory). This argument can be used, but must be formatted accordingly with the string first followed by -onlyin
:
...AND mdfind.query = "foo -onlyin /path/you/want/"
Because of the way that we are forced to write our query within double quotations, you cannot (to my frustration) constrain to a parent directory with a space in the path. Enclosing the path in single quotes does not work, and \ escaping the spaces does not work. If you can figure out a way, I would love to hear about it.
kMDItemKind
is kind of garbage
You may be tempted to search for files that are pdfs by typing:
...AND mdfind.query = "kMDFSItemKind == 'pdf'"
but DON’T!
For some totally unclear reason, the ItemKind metadata attribute is inconsistent at best, and will miss files that it shouldn’t.
You should instead rely on kMDItemFSName with wildcards, and explicitly name the desired extension, as it is more reliable:
...AND mdfind.query = "kMDItemFSName == '*.pdf'"
Chained OR
conditions
mdfind
in the terminal supports strings such as: mdfind foo|bar(-baz)
This would return items that matched foo
OR bar
, but NOT baz
These pipe |
OR conditions cannot be used in osquery when using the simple syntax.
In order to OR
conditions, you must use explicit kMDItem
conditions separated by double pipes: ||
eg.
...AND mdfind.query "kMDItemFSName == 'foo' || kMDItemFSName == 'bar'"
And on that note…
Maximum 5 OR conditions for a single metadata attribute
Let’s say you wanted to find files that contain any of the following strings.
fritz@acme.com
jane@acme.com
john@acme.com
frank@acme.com
stella@acme.com
joyce@acme.com
rupert@acme.com
You would only be able to specify 5 OR’ed conditions using: kMDItemTextContent = 'fritz@acme.com' || kMDItemTextContent = 'jane@...
With more than 5 Conditions, mdfind
stops returning results entirely.
If you feel demoralized regarding the usage of mdfind, I am here to tell you that despite these limitations, mdfind has allowed us to build over one hundred compelling queries for our 1Password® Extended Access Management customers.
Wrapping Things Up
- As you can see, there is a wealth of possibility in the mdfind table for quickly locating files or performing aggregation functions.
And if this article got you excited about the possibilities, more mdfind queries can be found in 1Password Device Trust, powering our Checks feature. That’s how we keep our customers safe by looking for potential sources of data compromise across their fleets.
Tweet about this post