- Shodan can be used to find Calibre servers.
- I wrote an nmap script for identification and metadata analysis.
- 2.5 million titles are available on identified servers
- An average of ~10k titles per server
- If you use the Calibre web server, verify its authentication settings and its network exposure.
- The Calibre developers don't have the best history of dealing with security issues.
I love reading, and I especially like my e-readers. They allow you to carry and travel with hundreds of books. Calibre is an open source e-book management application, and probably one of the most popular. It's capable of running a server to allow remote users to browse and download books. Knowing this and being a pentester by trade, I became quite curious whether there was any notable presence of Calibre on the internet. In its default configuration, Calibre does not require any authentication to access the web interface. Using Shodan.io, we can search for the keyword Calibre in the HTTP Server header.
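As a rough illustration of what that header match means, here is a minimal Python sketch that checks a raw HTTP response banner for the keyword in its Server header. The function names are hypothetical, and the assumption is that the banner is the plain response text (status line, headers, blank line, body); this is not how Shodan itself performs the match.

```python
def server_header(banner: str) -> str:
    """Return the Server header value from a raw HTTP response, or ''."""
    for line in banner.splitlines()[1:]:  # skip the status line
        if not line.strip():
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        if name.strip().lower() == "server":
            return value.strip()
    return ""


def looks_like_calibre(banner: str) -> bool:
    """True if the Server header (not the body) mentions calibre."""
    return "calibre" in server_header(banner).lower()
```

Matching on the header rather than the whole response avoids counting pages that merely mention Calibre in their body.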
Using the export function, we can gather a large number of possible Calibre web servers quickly. Depending on the version of Calibre, it's possible to extract the entire manifest of all the books. This includes the title, author, genre, etc. For older versions, it's possible to scrape the total number of books from the mobile interface with a regex.
To speed up the process of identifying Calibre web servers, I wrote a simple nmap script that reports:
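The regex approach can be sketched in Python. This assumes the legacy mobile page contains a pagination label of the form "Books 1 to 25 of N"; that wording, the pattern, and the function name are assumptions for illustration and may not match every version.

```python
import re
from typing import Optional

# Assumed wording of the pagination label on the legacy mobile page.
TOTAL_RE = re.compile(r"Books\s+\d+\s+to\s+\d+\s+of\s+([\d,]+)")


def total_books(html: str) -> Optional[int]:
    """Return the total book count found in the page, or None if absent."""
    match = TOTAL_RE.search(html)
    if match is None:
        return None
    # Tolerate thousands separators such as "1,204".
    return int(match.group(1).replace(",", ""))
```

Returning None instead of 0 keeps "no label found" distinct from an empty library.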
- If authentication is required
- Number of books
I opened a pull request to have it integrated into a future nmap release; it is pending approval.
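The authentication check is simple enough to sketch outside nmap. The Python below is not the nmap script; it is a minimal stand-in that assumes a server requiring authentication answers an unauthenticated request with HTTP 401. The function names and the timeout value are hypothetical.

```python
import urllib.error
import urllib.request


def requires_auth(status: int) -> bool:
    """Assumption: a protected server replies 401 to an anonymous request."""
    return status == 401


def probe(url: str, timeout: float = 5.0) -> bool:
    """Fetch url without credentials and report whether auth is required."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return requires_auth(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status is in err.code.
        return requires_auth(err.code)
```

Connection errors and timeouts are deliberately left to propagate, so an unreachable host is never mistaken for an open one.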
Using my script, I enumerated roughly 2,500,000 titles on unauthenticated Calibre servers.
Of the original 1,800 or so servers from Shodan, I was able to download the manifest file from 225. Note that this doesn't include unauthenticated servers that don't offer the manifest file: I didn't write a crawler to parse individual titles, since that would mean requesting potentially hundreds of pages from a single host.
From the 225 Calibre servers, I was able to identify about 10,000 unique titles. Some interesting observations:
- Ironically, there are a number of "cybersecurity" titles.
- I tried searching through the titles for sensitive documents such as "receipt" or "invoice" or "tax". Nothing.
- Unsurprisingly, given world demographics, a large number of titles are not in English. This might have hamstrung my manual analysis.
This is just a jumping-off point that came from a lazy weekend morning. Some interesting takeaways and ideas for next steps:
- Build an actual script to harvest all 2.5 million books.
- PDFs, EPUBs, and MOBIs can contain metadata. It may be possible to disclose sensitive information.
- There are some poorly documented API endpoints available via /cdb/ (unauth'd /cdb/cmd anyone?) which can be used to interface with the server programmatically. If I were to review Calibre more closely, I would start here.
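For the embedded-metadata idea above, here is a minimal Python sketch of what pulling metadata out of an EPUB could look like. It relies on the standard EPUB layout (a ZIP archive whose META-INF/container.xml points at an OPF package file with Dublin Core fields); the function name is hypothetical, and PDFs and MOBIs would need different parsers.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core namespace prefix


def epub_metadata(data: bytes) -> dict:
    """Return Dublin Core fields (title, creator, ...) found in an EPUB."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            return {}
        opf = ET.fromstring(zf.read(rootfile.attrib["full-path"]))
    fields = {}
    for el in opf.iter():
        if el.tag.startswith(DC) and el.text:
            # Strip the namespace so keys read "title", "creator", etc.
            fields.setdefault(el.tag[len(DC):], []).append(el.text.strip())
    return fields
```

Fields such as creator, publisher, or contributor are where a name or tool string left behind by whoever produced the file would most likely appear.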