I hope someone can help me to improve my SQL.
The solution I came up with becomes unreadable after 3 dots in a website
address (at least for me).
This is my approach:
The domain/subdomain name of a website address is enclosed between '://'
and the following '/'.
So I trim any text before and including '://' and after '/' in line 17 -
AS long
I calculate the number of dots in long - line 16 - AS dots
Because I could not find a way to figure out the position of the 2nd to
last dot directly (pseudo: substr(string, '2nd to last dot') or
substr(string, nth('.')) ), I work with nested string manipulation.
The problem is that the nested string manipulation becomes unreadable
very quickly.
The case statement (lines 6 to 14) then strips all unwanted text
depending on the number of dots.
For 1 dot: no stripping required
For 2 dots: a subdomain and the additional dot must be stripped
For 3 dots: the sub-subdomain and the additional w dots must be stripped
etc.
(CASE WHEN dots = 1 isn't really required, because the ELSE clause will
cover this case as well.)
Sample data:
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE links(link);
INSERT INTO "links"
VALUES('https://stackoverflow.com/questions/7826876/sqlite-reverse-string-function');
INSERT INTO "links"
VALUES('https://www.pcwelt.de/ratgeber/Virtuelle-Netzwerke-mit-Open-VPN-aufbauen-9889432.html?utm_source=best-of-pc-welt-manuell&utm_medium=email&utm_campaign=newsletter&ext_id=3291863&pm_cat%5B0%5D=Netzwerktechnologie&pm_cat%5B1%5D=Netzwerke+allgemein&pm_cat%5B2%5D=Personal+Computer&pm_cat%5B3%5D=Netzwerk+Hardware&pm_cat%5B4%5D=Apps&pm_cat%5B5%5D=Client+Hardware&r=3677484228301571&lid=784281&pm_ln=3');
INSERT INTO "links" VALUES('http://dot3.dot2.memyself.eu/blabl.html');
INSERT INTO "links"
VALUES('http://dot4.dot3.dot2.memyself.eu/miamia/blabl.html');
INSERT INTO "links"
VALUES('http://dot5.dot4.dot3.dot2.memyself.eu/miamia/blabl.html');
INSERT INTO "links"
VALUES('http://dot6.dot5.dot4.dot3.dot2.memyself.eu/miamia/blabl.html');
INSERT INTO "links"
VALUES('http://dot7.dot6.dot5.dot4.dot3.dot2.memyself.eu/miamia/blabl.html');
INSERT INTO "links"
VALUES('http://dot8.dot7.dot6.dot5.dot4.dot3.dot2.memyself.eu/miamia/blabl.html');
INSERT INTO "links"
VALUES('http://dot9.dot8.dot7.dot6.dot5.dot4.dot3.dot2.memyself.eu/miamia/blabl.html');
COMMIT;
The example below is limited to 5 dots (domain plus 4 subdomain levels).
I did it for 9 dots as well and then the query blows up to 3 screen pages.
links: name of source table:
link: field name of website address
long: field name of trimmed string
dots: field name of calculated dots
domain: field name of resulting domain.tld-string (stripped of any
subdomain)
.mode column
.header on
.width 51 -4 51
-- up to 5 dots (= 4 subdomain levels)
SELECT long, dots,
(CASE
WHEN dots = 1 THEN long
WHEN dots = 2 THEN substr(long, instr(long, '.')+1)
WHEN dots = 3 THEN substr(substr(long, instr(long, '.')+1),
instr(substr(long, instr(long, '.')+1), '.')+1)
WHEN dots = 4 THEN substr(substr(substr(long, instr(long, '.')+1),
instr(substr(long, instr(long, '.')+1), '.')+1),
instr(substr(substr(long, instr(long, '.')+1), instr(substr(long,
instr(long, '.')+1), '.')+1), '.')+1)
WHEN dots = 5 THEN substr(substr(substr(substr(long, instr(long,
'.')+1), instr(substr(long, instr(long, '.')+1), '.')+1),
instr(substr(substr(long, instr(long, '.')+1), instr(substr(long,
instr(long, '.')+1), '.')+1), '.')+1), instr(substr(substr(substr(long,
instr(long, '.')+1), instr(substr(long, instr(long, '.')+1), '.')+1),
instr(substr(substr(long, instr(long, '.')+1), instr(substr(long,
instr(long, '.')+1), '.')+1), '.')+1), '.')+1)
ELSE long
END) AS domain
FROM
(SELECT long, length(long) - length(replace(long, '.', '')) AS dots
FROM (SELECT substr(link, instr(link, '://')+3, instr(substr(link,
instr(link, '://')+3), '/')-1) AS long FROM links)
);
Is there a way to do this better/smarter?
Klaus
------------------------------------------------------------------------
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users