cookiecutter template for Postgres extensions

2024-02-08 Thread Florents Tselai
Hi all, I built a cookiecutter template for Postgres extensions. This cookiecutter prompts for some basic extension metadata (e.g., name, Postgres version) and then automatically generates the necessary boilerplate files. These include: * Standard extension files (Makefile, .control, c, .h

Released vasco: a PG extension to compute MIC and MINE correlation statistics

2023-08-22 Thread Florents Tselai
Hello, If you're looking for a more powerful alternative to corr(X, Y), I just open - sourced an vasco [0] vasco is a Postgres extension that helps you discover hidden correlations in your data. It is based on the MIC and MINE family of statistics. [0] https://github.com/Florents-Tselai/vasco

No Greek stop words in FTS ?

2023-06-03 Thread Florents Tselai
Hi, I maintain a project (diofanti.org ) that tracks public spending in Greece. It’s a PG instance hosting 55M+ json documents with searching functionality on top of them. It relies heavily on to_tsvector(‘greek’, ..), as users search for company names, invoice descriptio

Multilang text search. Is this correct?

2023-03-30 Thread Florents Tselai
In my table (mix of text and jsonb columns ) I have text in multiple languages. I’d like search in all the possible regconfigs, so I’ve come up with the following recipe: CREATE FUNCTION to_tsvector_multilang(text) RETURNS tsvector AS $$ SELECT to_tsvector('english', $1) || to_tsvector('g

Re: ADD COLUMN ts tsvector GENERATED too slow

2022-07-06 Thread Florents Tselai
> On 6 Jul 2022, at 1:35 PM, Florents Tselai wrote: > > > >> On 6 Jul 2022, at 1:11 PM, Francisco Olarte wrote: >> >> On Wed, 6 Jul 2022 at 11:55, Florents Tselai >> wrote: >>> Also, fwiw looking at top the CPU% and MEM% activity

Re: ADD COLUMN ts tsvector GENERATED too slow

2022-07-06 Thread Florents Tselai
> On 6 Jul 2022, at 1:11 PM, Francisco Olarte wrote: > > On Wed, 6 Jul 2022 at 11:55, Florents Tselai > wrote: >> Also, fwiw looking at top the CPU% and MEM% activity, looks like it does >> data crunching work. > ... >>>> On 06.07.22 10:42, Flor

Re: ADD COLUMN ts tsvector GENERATED too slow

2022-07-06 Thread Florents Tselai
Also, fwiw looking at top the CPU% and MEM% activity, looks like it does data crunching work. > On 6 Jul 2022, at 12:48 PM, Florents Tselai wrote: > > > >> On 6 Jul 2022, at 12:38 PM, Peter Eisentraut >> wrote: >> >> On 06.07.22 10:42, Florents Tselai

Re: ADD COLUMN ts tsvector GENERATED too slow

2022-07-06 Thread Florents Tselai
> On 6 Jul 2022, at 12:38 PM, Peter Eisentraut > wrote: > > On 06.07.22 10:42, Florents Tselai wrote: >> I have a beefy server (40+ worker processes , 40GB+ shared buffers) and a >> table holding (key text, text text,) of around 50M rows. >> These are text

ADD COLUMN ts tsvector GENERATED too slow

2022-07-06 Thread Florents Tselai
I have a beefy server (40+ worker processes , 40GB+ shared buffers) and a table holding (key text, text text,) of around 50M rows. These are text fields extracted from 4-5 page pdfs each. I’m adding the following generated col to keep up with tsvectors ALTER TABLE docs_text ADD COLUMN ts tsvect

Re: Table space not returned to the OS ?

2022-06-27 Thread Florents Tselai
> On 27 Jun 2022, at 12:38 PM, Magnus Hagander wrote: > > > > On Mon, Jun 27, 2022 at 11:30 AM Florents Tselai <mailto:florents.tse...@gmail.com>> wrote: > Hi, > > A few months back (October) I had upgraded a Postgres instance from v12 —> 14. >

Table space not returned to the OS ?

2022-06-27 Thread Florents Tselai
Hi, A few months back (October) I had upgraded a Postgres instance from v12 —> 14. The database disk size under /var/lib/postgresql/12 was around 800GB+ back then. Note, that IIRC I had used hard-linking during the upgrade. In the database itself, lots of things have changed since. In fact, tha

Lazier alternative to row_to_json ?

2021-11-01 Thread Florents Tselai
I have the following simple query select row_to_json(d) from documents d The output of this goes to script that expects new-line-delimited stream of JSON objects. But as-is, ti looks like the server’s memory fills-up before ti starts emitting results. Any ideas how I could bypass this?

Re: Force re-compression with lz4

2021-10-17 Thread Florents Tselai
, even allow an explicit option to be defined during VACUUM ? > On 18 Oct 2021, at 8:18 AM, Michael Paquier wrote: > > On Sun, Oct 17, 2021 at 10:13:48PM +0300, Florents Tselai wrote: >> I did look into VACUUM(full) for it’s PROCESS_TOAST option which >> makes sense, but t

Re: Force re-compression with lz4

2021-10-17 Thread Florents Tselai
some hassle. Thus, the external script approach makes more sense. > On 17 Oct 2021, at 8:28 PM, Adrian Klaver wrote: > > On 10/17/21 10:17, Magnus Hagander wrote: >> On Sun, Oct 17, 2021 at 5:12 PM Florents Tselai > <mailto:florents.tse...@gmail.com>> wrote: > >

Re: Force re-compression with lz4

2021-10-17 Thread Florents Tselai
Yes, That COPY-delete-COPY sequence is what I ended up doing. Unfortunately can’t use ranges as the PK its a text string. > On 17 Oct 2021, at 7:36 PM, Ron wrote: > > On 10/17/21 10:12 AM, Florents Tselai wrote: >> Hello, >> >> I have a table storing mostly tex

Force re-compression with lz4

2021-10-17 Thread Florents Tselai
Hello, I have a table storing mostly text data (40M+ rows) that has pg_total_relation_size ~670GB. I’ve just upgraded to postgres 14 and I’m now eager to try the new LZ4 compression. I’ve altered the column to use the new lz4 compression, but that only applies to new rows. What’s the recommen

Re: Is there a good discussion of optimizations?

2021-01-07 Thread Florents Tselai
ents could actually "steal and tweak" my code. <https://github.com/Florents-Tselai/SQLite-for-Data-Scientists/blob/master/notebooks/5_advanced_SQL.ipynb> *Florents Tselai* *Data Scientist & Engineer* Jack of All Trades and Master of Some http://tselai.com <http://ts