On 2015-09-17 05:13 AM, Nicolas J?ger wrote: > @Ryan, >> To get you started on this road, may I ask you take a leap of faith >> and explain to us from A to Z what you would like to achieve with the >> tables and data you are asking about above - i.e don't show us the >> programming you think will solve the problem and ask us if that is >> good SQL, rather explain us the problem to be solved and allow us to >> show you the good SQL that will do it. >> >> Every next task should be clearer and easier. > Actualy, it's a personal open source project. So the code could be see > by anyone. The only thing is, the most recent part isn't yet on the > repo (last push was before the summer). But, if for some reason you > want to see the code as it is today on my computer (almost all files > were heavily modified during the summer), I can push it. > > this project is about helping me to manage all files I have on my > computer. I'm a metalhead and science nerd at the same time. > So, I have tons of music and tons of ebooks and publications... So my > project as currently two goals: > > 1) store the data on disk, I should not access the files directly. > > 2) using a DB to retrieve any document trough a search engine. > > (actually I want to add a third goal: I want to create something I > don't know how I will call it, but it's something like a binder. For > example if you open a binder about 'modelling demons', you can link data > of differents files even if they don't share tags or whatever (can be > in that example some texture, some .blend or pdf about modelling > humanoid...).) > > So, I'm writing some custom GTKmm widgets wich interact with the DB > (but they don't directly access any sqlite function, I'm working with > engines communication (gtkmm_engine, sqlite_engine, etc.)). > > I'm currently thinking to make that program modular. For example, the > user (usualy me), set what kind of data (trough some config file (in > my head I call these files 'collection')) should be store in the > DB. For each data will correspond a widget. > > Let's took an example, > > for ebooks: > > I want to save the filename (filename is ruled by the program, the > user has no direct way to modify it), title, some tags and a description > for each ebook (for the last three the program shows some widgets, > wich allow the user to modify the values). > > At this point, when the program read the ebook.collection, it will > create a first table EBOOK with four columns > (ID,FILENAME,TITLE,DESCRIPTION) > > for the tags, things are more complicated. I need two tables: > - TAGS with three columns (ID,NAME,COUNT) > - TAGSMAP with two columns (ID_EBOOK,ID_TAGS) (I explain the use below) > > so if I take my old example : > > (id|NAME|COUNT) > 53|black metal|3 > 55|occult rock|3 > 61|satanic smurfs|1 > 62|beer|0 > 63|pizza|0 > > if I'm seeking to get all files with the tags 'black metal' the program > check all entries in TAGSMAP with ID_TAGS = 53, > > SELECT * FROM TAGSMAP WHERE ID_TAGS = 53 would returns something like, > > (ID_EBOOK|ID_TAGS) > 3|53 > 5|53 > 9|53 > > then the ID_EBOOKs refer to the IDs of the data (title and description > in this case) for some files in EBOOK. > > now my problems come when you want to add/delete a tags. If for a file > you want (eg) to delete the tag 'black metal' you click on the button I > setup to delete that tag, the program at this time has two > informations: > > the current EBOOK.FILENAME and the TAGS.NAME of the tag. > > So I have to query the DB to get their respective IDs (ID of that > FILENAME in EBOOK and ID of that NAME in TAGS). > > Then, the program has to delete the entry in the TAGSMAP where ID_EBOOK > is the same than ID for the current file in EBOOK and where ID_TAGS > equal the ID in TAGS. > > For TAGS, the program has first to check COUNT. Before the check I have > to get the value from the DB. > > If COUNT = 1, the program has to delete the tag with NAME 'black > metal'. > > If COUNT > 1, the program has to decrement count. > > My first manner to do that, was to just store/read the data to/from the > DB. All conditions (like what I try to do for some days now...) were > executed by the program, not by queries. But the problem (wich is > not really a big issue), I do a lot of queries because of the IFs in my > program then I get one/two seconds of lags (wich may also be here > because I wrote all operation I do in the console for debug > purpose...)... > > So, as I wrote in an earlier email, since I know I can use > transaction, I try to replace all queries in one giant query send to > the DB when I want to update the data for a file (title, tags and > description). > > Maybe, I should ask if what I want to do is really possible by > SQL or if I should better let the IFs in the program better than > struggling like I did in my last emails... > > > I really hope I was clear. If you need further explanation about > something just tell it. Anyway thank you for the time you spend to > explain your magic to some "muggle" like me...
Ok, firstly, these things you wish to do is exactly the most common use of Database engines. Is it suitable to do in a database?, heck these are the exact things that databases exist for. You will also be happy to hear there already exists a myriad of programs doing exactly what you are designing here, perhaps with a slightly different paradigm or use-case, but in terms of data-usage, much the same. Some initial things. Data needs to be kept in 1NF (or 1st-Normal-Form) which is relational theory speak for "Everything without a 1-to-1 relationship with the key field in a table, should be in another table". Many reasons for this (if you care to read up on some RT) but the most simple reason is: We can make complex extractions from 1NF data tables via JOINs, sub-queries and the like, but we cannot extract simple information (or de-complicate it) from convoluted tables. Example: You can store a person's ID, name, surname, hair colour, age all in 1 table, but you can't put the names of his/her children in there, because no longer a 1-to-1 relationship. In your case, all files can go in 1 table with their respective types, genres, usage, path, etc. but not the tags. Tags will need to be in another table. This also has a reverse-implication: Do not put things which are in effect similar, and might differ only by simple properties with 1-to-1 relations, in different tables. Put them together. Example: In a "People" database, you can list everyone's occupation. There is no need to make a table for only scientists, and another for only Entertainers. Just use 1 table and make sure it has a property (field) for "Occupation". In your use-case, there is no need to have different tables for E-Books or Videos or Music... they are all just files with slightly different properties. Sure, an e-book might have an edition while a music file would typically not, and a music file might have both an artist and a composer (and perhaps an author) while an e-book only have an author, but these are minute differences and will manifest in very few fields. There is no harm in having fields for all of Author, Composer and Artist, even if some entries do not use all of it. Then again, you might wish to record a lot more information about say the Author of anything, like gender, date of birth, country, etc. - in which case Author information will definitely move to another table, but you will still have an "Author" field in the main table to know how to link to the other table. Making combinations later (your idea about a sort of binder) will be especially easy when all relevant files are maintained in a single table with foreign keys to other tables as needed. Most combinations can be achieved by a simple table with double-keys in it. For instance the TAGS table as you've described it will be a simple table with 2 fields (columns) containing the FILE-table key in one, and the TAG key in the other, with the primary key for this TAGS table made over both columns. That way it is impossible to duplicate things, and none of all that delete tags stuff is needed that you noted in your design. This will also make everything easy w.r.t. counting how many tags of which kinds exist and so on. And if you need to know how many tags of a certain kind exists but only for e-books, then that is achieved with a simple join lookup. This brings me to a good next point: When programming, we always try to make the data as meaningful as possible, such as storing reference counts etc. But in a database, we try to make the data as plain as possible - i.e. you should never store data values which can be attained from a query inside the database. (Unless the best possible method of obtaining it is still too slow, but that is an exercise for much later and will require optimizations which should never be entertained prematurely). I can envision 3 initial tables you will need: FILES - to keep all file-references with their id, file_Name, file_Path, file_extension, kind (e-Book, Music, etc.), size, opens_with, run_length, author, composer, artist, etc. TAGNAMES - to keep a simple list of Tag IDs and their Names/Captions/Descriptions TAGS - a simple reference table with 2 columns (file_id, tag_id) to keep track of which tags are added to which files. (More may be needed later) Note: Things like "Genre" can be envisioned as a Tag rather - this way you can add multiple genres (or none) to a single file. A first-presentation schema might look something like this: CREATE TABLE "Files" ( id INTEGER PRIMARY KEY, file_kind TEXT NOT NULL COLLATE NOCASE, size INT NOT NULL, run_length INT, rating INT, file_name TEXT NOT NULL COLLATE NOCASE, file_extension TEXT COLLATE NOCASE, file_path TEXT NOT NULL COLLATE NOCASE, artist TEXT, author TEXT, composer TEXT, synopsis TEXT, CONSTRAINT UFile_Key UNIQUE (file_name, file_extension, file_path) ); CREATE TABLE "TagNames" ( name TEXT NOT NULL COLLATE NOCASE PRIMARY KEY, kind TEXT NOT NULL COLLATE NOCASE DEFAULT '', description TEXT COLLATE NOCASE ) WITHOUT ROWID; CREATE TABLE "Tags" ( file_id INTEGER, tag_name TEXT NOT NULL COLLATE NOCASE, PRIMARY KEY (file_id, tag_name), FOREIGN KEY (file_id) REFERENCES "Files"(id) ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY (tag_name) REFERENCES "TagNames"(name) ON DELETE CASCADE ON UPDATE CASCADE ) WITHOUT ROWID; Note: the COLLATE NOCASE options added specifically to the file_* fields above assumes NTFS or such file system that is NOT case-sensitive. If this is to run on Linux or indeed anything with a case-sensitive file-system, then best is to keep the file name, path and extension also case-sensitive by removing those NOCASE options. Other NOCASE fields should remain as is. Some Queries for basic tasks ---------------------------------------- 1 - Add a normal Tag-name called "SciFi": REPLACE INTO "TagNames" (name) VALUES ('SciFi'); 2 - Add a genre-specific Tag-name called "SciFi" with some description (this will overwrite the previous entry of Sci-Fi with more detail): REPLACE INTO "TagNames" (kind, name, description) VALUES ('genre', 'SciFi', 'Science-Fiction'); 3 - Add a File for an e-book called "The Martian" REPLACE INTO "Files" (id, file_kind, size, file_name, file_extension, file_path, author, synopsis) VALUES (1, 'ebook', 10424, 'the_martian_book.txt', 'txt', 'd:\documents\books', 'Andy Weir', 'Six days ago, astronaut Mark Watney became one of the first people to walk on Mars. Now, he's sure he'll be the first person to die there.'); 4 - Assign the genre-specific tag we already added to the tag-names list to this book: INSERT INTO "Tags" VALUES (1, 'SciFi'); Note: This will fail if either the book ID 1 or the tag name 'SciFi' does not exist yet. Note2: It will also fail if tried a second time, since the combination of file id and tag name must be unique in the table. Note3: There are different way of attaining an inserted id, if you prefer not to control the id yourself but relinquish it to the AUTOINC mechanism. 5 - Change the Tag-Name 'SciFi' to 'Sci-Fi': UPDATE "TagNames" SET name = 'Sci-Fi' WHERE name = 'SciFi'; Note: The above will change all SciFi tags in the "Tags" table to 'Sci-Fi' so that the relationship is not broken - all thanks to the ON UPDATE CASCADE in the foreign key. Note2: This requires Foreign-Key checking is ON for your database connection (It is off by default, run SQL "PRAGMA foreign_keys = ON;" to enable. Read more at: https://www.sqlite.org/foreignkeys.html 6 - Delete the Tag-name "Sci-Fi" DELETE FROM "TagNames" WHERE name = 'Sci-Fi'; Again, this will remove all such tags throughout the "Tags" table too because of the foreign-key relationship, but you are equally welcome to run: DELETE FROM "Tags" WHERE tag_name = 'Sci-Fi'; 7 - Find all e-books and their genres with txt extensions that resides somewhere on the D drive: SELECT F.file_path, F.file_name, max(F.size) AS Size, group_concat(G.name) AS Genres FROM Files LEFT JOIN Tags AS T ON T.file_id = F.id LEFT JOIN TagNames AS G ON G.name = F.tag_name AND G.kind = 'genre' WHERE F.kind = 'ebook' AND F.file_path LIKE 'D:%' AND F.file_extension = 'txt' GROUP BY F.file_path, F.file_name Note: The max(F.size) is just to satisfy SQL grouping semantics, it is imposible in the schema to have more than one file with same path and same name, so size will always be the same for all items in such a group. Ok, enough for day 1. I've been as brief as possible, but lots of these conventions may need some elaboration (not enough space here to add all of it), but please do ask where needed. Obviously your schema will be more complete (I imagine adding such things as release_date, IMDB_link, perhaps youtube link etc.), but once you have a good schema, we can elaborate more on the specifics. Another table might keep a list of reviews or references found online - The World is your oyster.... btw: Your binder idea is equally easy implementable with a similar set of tables as the Tags above. One table to record a Binder name, purpose and such, and another foreign-keyed dual-reference list connecting every included file to its binder. I'm hoping the idea is shining through on how this is different from normal programming but much much more powerful for handling data. Keep us posted :) Ryan