How to Extract English Fiction Prose from Project Gutenberg
Step 4: Create SQLite database
Now we need to create a SQLite database with the data from the RDF files. First, copy the SQL file from rdf/cache to the directory where the database should be (cp rdf/cache/rdf.sql .).
Then, run the following command:
sqlite3 books.db < rdf.sql
Afterwards, you should have the file books.db in the current directory.
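If you prefer to script this step rather than use the sqlite3 command-line tool, the following is a minimal sketch of the same idea using Python's standard sqlite3 module. It assumes that rdf.sql contains only plain SQL statements (no sqlite3 shell dot-commands such as .mode or .import) and that the file fits in memory. The function name build_database is hypothetical and chosen only for illustration.

```python
import sqlite3
from pathlib import Path


def build_database(sql_path: str, db_path: str) -> int:
    """Run the SQL script at sql_path against db_path.

    Returns the number of tables in the database afterwards.
    Assumption: the script holds plain SQL only (no sqlite3 dot-commands).
    """
    script = Path(sql_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)  # creates the file if it does not exist
    try:
        conn.executescript(script)  # executes every statement in the script
        conn.commit()
        row = conn.execute(
            "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
        ).fetchone()
        return row[0]
    finally:
        conn.close()


# Only runs when rdf.sql is present in the current directory.
if __name__ == "__main__" and Path("rdf.sql").exists():
    print("tables created:", build_database("rdf.sql", "books.db"))
```

Calling build_database("rdf.sql", "books.db") should leave you with the same books.db file as the command above, provided the assumption about the script's contents holds.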
Step 5: Open the database in SQLite Studio
Now we will add books.db to SQLite Studio so that we can run queries more conveniently.
After launching SQLite Studio, select Add database from the Database menu.
A dialog box will open. Click on the folder button and select the books.db file.
Then press the OK button. An icon will appear in the tree view on the left side. Right-click on it and select Connect to the database from the menu.
Press the Open SQL Editor button in the toolbar.
Enter the following query and execute it:
SELECT COUNT(*) FROM books;
You should see 76553 as the result (the total number of books in the database).
If you do, you can now run various SQL queries against this database.
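The same sanity check can also be run without a GUI. The sketch below uses Python's standard sqlite3 module to count the rows in the books table. The helper name count_rows is hypothetical, and the default table name books is taken from the query above.

```python
import sqlite3
from pathlib import Path


def count_rows(db_path: str, table: str = "books") -> int:
    """Return the number of rows in `table` of the SQLite file at db_path."""
    if not Path(db_path).is_file():
        # sqlite3.connect would silently create an empty database otherwise.
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        # Check the table name against the schema before using it in a query.
        names = [
            r[0]
            for r in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        ]
        if table not in names:
            raise ValueError(f"no table named {table!r} in {db_path}")
        return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    finally:
        conn.close()


# Only runs when books.db is present in the current directory.
if __name__ == "__main__" and Path("books.db").is_file():
    print(count_rows("books.db"))
```

If the database was built as described in Step 4, count_rows("books.db") should return the same number that the query reports in SQLite Studio.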
If you are interested in further experiments with books from Project Gutenberg, consider subscribing to my Substack. You can also read my other articles on exploratory stylometry.