Project Gutenberg SQL Queries

Here you can find some queries which you can run against a database with Project Gutenberg data created in this tutorial.

Total number of books

SELECT count( * ) 
  FROM books;

Expected result: 76553

10 books without a text version

Some books, such as Tom Sawyer Abroad by Mark Twain, exist only as audio books. You can get a list of ten such books using the query below.

SELECT *
  FROM books
 WHERE TRIM(textUrl) = ''
 LIMIT 10;

Total number of books for which a text version exists

SELECT count( * ) 
  FROM books
 WHERE (TRIM(textUrl) != '');

Expected value: 75042

Number of books in English with a text version

SELECT count( * ) 
  FROM books
 WHERE 
       (langCode = 'en') AND
       (textUrl IS NOT NULL ) AND
       (textUrl != '') AND
       (TRIM(textUrl) != '');

Expected result: 59477

Number of prose fiction books in English with a text version

SELECT count( * ) 
  FROM books
 WHERE (isFiction = true) AND 
       (langCode = 'en') AND
       (textUrl IS NOT NULL ) AND
       (textUrl != '') AND
       (TRIM(textUrl) != '');

Expected result: 22613

Number of poetry books in English with a text version

SELECT count( * ) 
  FROM books
 WHERE (isPoetry = true) AND 
       (langCode = 'en') AND
       (textUrl IS NOT NULL ) AND
       (textUrl != '') AND
       (TRIM(textUrl) != '');

Expected result: 2816

How many books are there in each language in Project Gutenberg?

SELECT langCode,
       count( * ) as BookCount
  FROM books
 WHERE (textUrl IS NOT NULL) AND 
       (textUrl != '') AND 
       (TRIM(textUrl) != '') 
 GROUP BY langCode
 ORDER BY BookCount DESC;

Expected results:

/ref/en/project-gutenberg-sql-queries/img01.png

Surprisingly, there are more books in Finnish (fi) than in German (de), Italian (it), and Spanish (es).

Random fiction book in English with a text version

SELECT *
  FROM books
 WHERE (isFiction = true) AND 
       (langCode = 'en') AND 
       (textUrl IS NOT NULL) AND 
       (textUrl != '') AND 
       (TRIM(textUrl) != '') 
 ORDER BY RANDOM() 
 LIMIT 1;

Random poetry book in English with a text version

SELECT *
  FROM books
 WHERE (isPoetry = true) AND 
       (langCode = 'en') AND 
       (textUrl IS NOT NULL) AND 
       (textUrl != '') AND 
       (TRIM(textUrl) != '') 
 ORDER BY RANDOM() 
 LIMIT 1;

Which creators created the most books in English prose fiction?

SELECT creators,
       count( * ) AS BookCount
  FROM books
 WHERE (isFiction = true) AND 
       (langCode = 'en') AND 
       (textUrl IS NOT NULL) AND 
       (textUrl != '') AND 
       (TRIM(textUrl) != '') 
 GROUP BY creators
 ORDER BY BookCount DESC;

Expected result:

/ref/en/project-gutenberg-sql-queries/img02.png

How many writers are represented (English fiction prose only)?

SELECT count( * ) 
  FROM (
           SELECT DISTINCT creators
             FROM books
            WHERE (isFiction = true) AND 
                  (langCode = 'en') AND 
                  (textUrl IS NOT NULL) AND 
                  (textUrl != '') AND 
                  (TRIM(textUrl) != '') 
       )

Expected result: 5142

How to find a book by its title?

Imagine I want to find out the URL of The Brothers Karamazov by Fyodor Dostoevsky.

You can use the following query:

SELECT *
  FROM books
 WHERE (isFiction = true) AND 
       (langCode = 'en') AND 
       (title LIKE '%brothers kara%');

Expected result: Book nr. 28054


P.S.: Here are some other articles you may find interesting.