How to Extract English Fiction Prose from Project Gutenberg

Step 4: Create SQLite database

Next, create an SQLite database from the data in the RDF files. First, copy the SQL file from rdf/cache into the directory where the database should live (cp rdf/cache/rdf.sql .).

Then, run the following command:

sqlite3 books.db < rdf.sql

Afterwards, the file books.db should exist in the current directory.
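
If the sqlite3 command-line tool is not available, the same import can be sketched in Python with the standard-library sqlite3 module. This is only an illustration: the function name build_database is hypothetical, and the demo uses a tiny stand-in script, because the real rdf.sql defines its own schema, of which only the table name books is assumed here.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_database(sql_path, db_path):
    """Execute every statement in sql_path against the database at db_path."""
    script = Path(sql_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(script)  # runs all statements in the script
        conn.commit()
    finally:
        conn.close()


if __name__ == "__main__":
    # Demo with a stand-in script; the columns below are made up for the
    # example and are NOT taken from the real rdf.sql.
    with tempfile.TemporaryDirectory() as tmp:
        sql_file = Path(tmp) / "rdf.sql"
        sql_file.write_text(
            "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);\n"
            "INSERT INTO books (title) VALUES ('Example A'), ('Example B');\n",
            encoding="utf-8",
        )
        db_file = Path(tmp) / "books.db"
        build_database(sql_file, db_file)
        print(db_file.exists())
```

With the real files, the call would be build_database("rdf.sql", "books.db"), run from the directory that contains rdf.sql.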

Step 5: Open the database in SQLite Studio

Next, add books.db to SQLite Studio so that queries can be run more conveniently.

After launching SQLite Studio, select Add database from the Database menu.

/ref/en/how-to-extract-english-fiction-prose-from-project-gutenberg-4/img01.png

A dialog box will open. Click on the folder button and select the books.db file.

/ref/en/how-to-extract-english-fiction-prose-from-project-gutenberg-4/img02.png

Then press the OK button. An icon will appear in the tree view on the left side. Right-click on it and select Connect to the database from the menu.

/ref/en/how-to-extract-english-fiction-prose-from-project-gutenberg-4/img03.png

Press the Open SQL Editor button in the toolbar.

/ref/en/how-to-extract-english-fiction-prose-from-project-gutenberg-4/img04.png

Enter the following query:

SELECT count( * ) 
  FROM books;

You should see 76553 as the result, which is the total number of books in the database. The exact figure depends on when the catalog was downloaded.

/ref/en/how-to-extract-english-fiction-prose-from-project-gutenberg-4/img05.png

If you do, you can now run various SQL queries against this database.
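
The same check can also be scripted without the GUI. Below is a minimal sketch using Python's standard sqlite3 module. The helper names count_books and list_tables are hypothetical, and only the table name books comes from the query above. Listing the tables in sqlite_master is one way to find out what else rdf.sql created, since the schema is not shown here.

```python
import sqlite3


def count_books(conn):
    """Return the number of rows in the books table."""
    return conn.execute("SELECT count(*) FROM books").fetchone()[0]


def list_tables(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    # In practice: conn = sqlite3.connect("books.db")
    # An in-memory stand-in with made-up columns keeps this sketch runnable.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO books (title) VALUES (?)", [("A",), ("B",)])
    print(count_books(conn))   # 2
    print(list_tables(conn))   # ['books']
    conn.close()
```

Note that sqlite3.connect creates an empty database file if the given path does not exist, so a "no such table: books" error usually means the script was run from the wrong directory.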


If you are interested in further experiments with books from Project Gutenberg, consider subscribing to my Substack. You can also read my other articles on exploratory stylometry.