I hope someone can help me to improve my SQL.

The solution I came up with becomes unreadable after 3 dots in a website address (at least for me).

This is my approach:

The domain/subdomain name of a website address is enclosed between '://' and the following '/'. So I trim any text before and including '://' and after '/' in line 17 - AS long

I calculate the number of dots in long - line 16 - AS dots

Because I could not find a way to figure out the position of the 2nd to last dot directly (pseudo: substr(string, '2nd to last dot')  or substr(string, nth('.')) ), I work with nested string manipulation. The problem is that the nested string manipulation becomes unreadable very quickly.

The case statement (lines 6 to 14) then strips all unwanted text depending on the number of dots.
For 1 dot:  no stripping required
For 2 dots: a subdomain and the additional dot must be stripped
For 3 dots: the sub-subdomain and the additional w dots must be stripped
etc.
(CASE WHEN dots = 1  isn't really required, because the ELSE clause will cover this case as well.)

Sample data:

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE links(link);
INSERT INTO "links" VALUES('https://stackoverflow.com/questions/7826876/sqlite-reverse-string-function'); INSERT INTO "links" VALUES('https://www.pcwelt.de/ratgeber/Virtuelle-Netzwerke-mit-Open-VPN-aufbauen-9889432.html?utm_source=best-of-pc-welt-manuell&utm_medium=email&utm_campaign=newsletter&ext_id=3291863&pm_cat%5B0%5D=Netzwerktechnologie&pm_cat%5B1%5D=Netzwerke+allgemein&pm_cat%5B2%5D=Personal+Computer&pm_cat%5B3%5D=Netzwerk+Hardware&pm_cat%5B4%5D=Apps&pm_cat%5B5%5D=Client+Hardware&r=3677484228301571&lid=784281&pm_ln=3');
INSERT INTO "links" VALUES('http://dot3.dot2.memyself.eu/blabl.html');
INSERT INTO "links" VALUES('http://dot4.dot3.dot2.memyself.eu/miamia/blabl.html'); INSERT INTO "links" VALUES('http://dot5.dot4.dot3.dot2.memyself.eu/miamia/blabl.html'); INSERT INTO "links" VALUES('http://dot6.dot5.dot4.dot3.dot2.memyself.eu/miamia/blabl.html'); INSERT INTO "links" VALUES('http://dot7.dot6.dot5.dot4.dot3.dot2.memyself.eu/miamia/blabl.html'); INSERT INTO "links" VALUES('http://dot8.dot7.dot6.dot5.dot4.dot3.dot2.memyself.eu/miamia/blabl.html'); INSERT INTO "links" VALUES('http://dot9.dot8.dot7.dot6.dot5.dot4.dot3.dot2.memyself.eu/miamia/blabl.html');
COMMIT;

The example below is limited to 5 dots (domain plus 4 subdomain levels).
I did it for 9 dots as well and then the query blows up to 3 screen pages.

links:        name of source table:
link:         field name of website address
long:        field name of trimmed string
dots:        field name of calculated dots
domain:   field name of resulting domain.tld-string (stripped of any subdomain)

.mode column
.header on
.width 51 -4 51
-- up to 5 dots (= 4 subdomain levels)
SELECT long, dots,
(CASE
WHEN dots = 1 THEN long
WHEN dots = 2 THEN substr(long, instr(long, '.')+1)
WHEN dots = 3 THEN substr(substr(long, instr(long, '.')+1), instr(substr(long, instr(long, '.')+1), '.')+1) WHEN dots = 4 THEN substr(substr(substr(long, instr(long, '.')+1), instr(substr(long, instr(long, '.')+1), '.')+1), instr(substr(substr(long, instr(long, '.')+1), instr(substr(long, instr(long, '.')+1), '.')+1), '.')+1) WHEN dots = 5 THEN substr(substr(substr(substr(long, instr(long, '.')+1), instr(substr(long, instr(long, '.')+1), '.')+1), instr(substr(substr(long, instr(long, '.')+1), instr(substr(long, instr(long, '.')+1), '.')+1), '.')+1), instr(substr(substr(substr(long, instr(long, '.')+1), instr(substr(long, instr(long, '.')+1), '.')+1), instr(substr(substr(long, instr(long, '.')+1), instr(substr(long, instr(long, '.')+1), '.')+1), '.')+1), '.')+1)
ELSE long
END) AS domain
FROM
(SELECT long, length(long) - length(replace(long, '.', '')) AS dots
FROM (SELECT substr(link, instr(link, '://')+3, instr(substr(link, instr(link, '://')+3), '/')-1) AS long FROM links)
);

Is there a way to do this better/smarter?

Klaus
------------------------------------------------------------------------
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to