Project Gutenberg SQL Queries
Here you can find some queries which you can run against a database with Project Gutenberg data created in this tutorial.
Total number of books
SELECT count( * ) FROM books;
Expected result: 76553
10 books without a text version
Some books, such as Tom Sawyer Abroad by Mark Twain, exist only as audio books. You can get a list of ten such books using the query below.
SELECT * FROM books WHERE TRIM(textUrl) = '' LIMIT 10;
Total number of books for which a text version exists
SELECT count( * ) FROM books WHERE (TRIM(textUrl) != '');
Expected value: 75042
Number of books in English with a text version
SELECT count( * )
FROM books
WHERE
(langCode = 'en') AND
(textUrl IS NOT NULL ) AND
(textUrl != '') AND
(TRIM(textUrl) != '');
Expected result: 59477
Number of prose fiction books in English with a text version
SELECT count( * )
FROM books
WHERE (isFiction = true) AND
(langCode = 'en') AND
(textUrl IS NOT NULL ) AND
(textUrl != '') AND
(TRIM(textUrl) != '');
Expected result: 22613
Number of poetry books in English with a text version
SELECT count( * )
FROM books
WHERE (isPoetry = true) AND
(langCode = 'en') AND
(textUrl IS NOT NULL ) AND
(textUrl != '') AND
(TRIM(textUrl) != '');
Expected result: 2816
How many books are there in each language in Project Gutenberg?
SELECT langCode,
count( * ) as BookCount
FROM books
WHERE (textUrl IS NOT NULL) AND
(textUrl != '') AND
(TRIM(textUrl) != '')
GROUP BY langCode
ORDER BY BookCount DESC;
Expected results:
Surprisingly, there are more books in Finnish (fi) than in German (de), Italian (it), and Spanish (es).
Random fiction book in English with a text version
SELECT *
FROM books
WHERE (isFiction = true) AND
(langCode = 'en') AND
(textUrl IS NOT NULL) AND
(textUrl != '') AND
(TRIM(textUrl) != '')
ORDER BY RANDOM()
LIMIT 1;
Random poetry book in English with a text version
SELECT *
FROM books
WHERE (isPoetry = true) AND
(langCode = 'en') AND
(textUrl IS NOT NULL) AND
(textUrl != '') AND
(TRIM(textUrl) != '')
ORDER BY RANDOM()
LIMIT 1;
Which creators created the most books in English prose fiction?
SELECT creators,
count( * ) AS BookCount
FROM books
WHERE (isFiction = true) AND
(langCode = 'en') AND
(textUrl IS NOT NULL) AND
(textUrl != '') AND
(TRIM(textUrl) != '')
GROUP BY creators
ORDER BY BookCount DESC;
Expected result:
How many writers are represented (English fiction prose only)?
SELECT count( * )
FROM (
SELECT DISTINCT creators
FROM books
WHERE (isFiction = true) AND
(langCode = 'en') AND
(textUrl IS NOT NULL) AND
(textUrl != '') AND
(TRIM(textUrl) != '')
)
Expected result: 5142
How to find a book by its title?
Imagine I want to find out the URL of The Brothers Karamazov by Fyodor Dostoevsky.
You can use the following query:
SELECT *
FROM books
WHERE (isFiction = true) AND
(langCode = 'en') AND
(title LIKE '%brothers kara%');
Expected result: Book nr. 28054
P.S.: Here are some other articles you may find interesting.