[weewx-user] Re: weectl import skip existing data?

[email protected] Thu, 20 Mar 2025 11:48:28 -0700

Attached is an updated query.  See the comments in it.  It can find missing 
records by comparing two databases.  It reports the extra records in the 
more complete database.  That's the same as saying it reports the missing 
records in the less complete database.


The situation with timestamps not matching up exactly between the two 
databases raises an issue.  If you create a csv of the missing records to 
import, do you need to correct the timestamps to match the interval Weewx 
is expecting?  For example, if you determine that the 09:00:00 entry is 
missing in weewx, but the 9 o'clock entry in the other database is actually 
timestamped 09:00:10, do you need to fix that before importing it?  Or does 
Weewx not really care about that?

Walt
On Wednesday, March 19, 2025 at 2:33:58 PM UTC-5 vince wrote:

> oops - typo'd the field names in multiple places previously.  Sorry.  I 
> was working off an alternate db with different field names than weewx. 
>  This one matches the weewx schema.
>
> select dateTime,datetime(dateTime,'unixepoch','localtime') from archive 
> where datetime(dateTime,'unixepoch','localtime') like "2024-12-28 09%" 
> limit 2;
> 1735405200|2024-12-28 09:00:00
> 1735405500|2024-12-28 09:05:00
>
>
> On Wednesday, March 19, 2025 at 12:23:53 PM UTC-7 vince wrote:
>
>> Ryan - here's an example.  It looks for the first two records from 
>> 2024-12-18 starting with 9am localtime.
>>
>> select datetime,datetime(datetime,'unixepoch','localtime') from archive 
>> where datetime(datetime,'unixepoch','localtime') like "2024-12-28 09%" 
>> limit 2;
>> 1735405200|2024-12-28 09:00:00
>> 1735405500|2024-12-28 09:05:00
>>
>
>> On Tuesday, March 18, 2025 at 9:16:49 PM UTC-7 vince wrote:
>>
>>>
>>> … where datetime(dateTime,'unixepoch','localtime') … gets you a more 
>>> human friendly datetime but you’ll have to work out the syntax for 
>>> comparison to a date string that does what you want.
>>>
>>> Still not understanding why you’re working so hard on this since you 
>>> know which days your network was down. Heck you can pretty close visually 
>>> just by looking at the graphs. But see if the syntax above helps any.
>>>
>>> Away from computer so I can’t give you a complete example…
>>>
>>> On Tuesday, March 18, 2025 at 8:05:50 PM UTC-7 Ryan Stasel wrote:
>>>
>>>> I'm drawing a blank here... I'm fine altering my import data to not 
>>>> include overlapping data... but how the heck do I figure out what's 
>>>> missing? 
>>>>
>>>> While I can convert unixepoch to a readable date, is there an easy way 
>>>> to just do something like
>>>>
>>>> select * from archive where datetime -like "2025-02-28";
>>>>
>>>> Then I can see what's missing and alter the import file.
>>>>
>>>> Thanks! 
>>>> On Tuesday, March 18, 2025 at 5:11:07 AM UTC-7 [email protected] wrote:
>>>>
>>>>> Attached is a simple sqlite query for finding extra database rows.  In 
>>>>> it, both databases are weewx databases, so if you're trying to find 
>>>>> differences between weewx and some other database, you'll have to modify 
>>>>> the query, but the idea remains the same.
>>>>>
>>>>> Note that the order of the databases is important.  It reports rows 
>>>>> that are IN db1 but NOT IN db2.  If you swapped the order of the 
>>>>> databases, 
>>>>> the counts would still be correct but it would print no rows because 
>>>>> every 
>>>>> record in the smaller database is in the larger database.
>>>>>
>>>>> Walt
>>>>> On Monday, March 17, 2025 at 5:07:39 PM UTC-5 gjr80 wrote:
>>>>>
>>>>>> If you want to tell what data already exists you will need to get 
>>>>>> your SQL hands dirty (you could use Cheetah produce a WeeWX report that 
>>>>>> lists all the timestamps in some given period, but that's an awful lot 
>>>>>> of 
>>>>>> messing around for little gain). WeeWX archive record timestamps are 
>>>>>> saved 
>>>>>> as an integers, with the DBMS enforcing the unique constraint. So a 
>>>>>> source 
>>>>>> record with one second difference to an existing record would be 
>>>>>> imported.
>>>>>>
>>>>>> The import process does report on what records were imported and what 
>>>>>> records were not via the log; imported records will be logged as being 
>>>>>> added and records that were rejected due to duplicate timestamps will be 
>>>>>> reported with a unique key error. Arguably a little crude, but to do any 
>>>>>> more detailed user friendly analysis and reporting would have to 
>>>>>> expensive 
>>>>>> time wise. Of course the reporting is after the fact not before.
>>>>>>
>>>>>> Gary
>>>>>>
>>>>>> On Tuesday, 18 March 2025 at 02:14:24 UTC+10 [email protected] wrote:
>>>>>>
>>>>>>> Thanks gjr80. I guess the question is, how do I tell easily what 
>>>>>>> data already exists (the time stamps will have to match, I assume). So 
>>>>>>> "2025-02-28 08:15:00" existing (for example) I assume won't match 
>>>>>>> against 
>>>>>>> an import of "2025-02-28 08:16:00" (let alone if there's seconds other 
>>>>>>> than 
>>>>>>> 00). 
>>>>>>>
>>>>>>> Is there an "easy" way to tell? or am I stuck going into sqlite3 and 
>>>>>>> doing some selects (not hard, just was kind of hoping the import 
>>>>>>> process 
>>>>>>> WOULD report on records already existing). =)
>>>>>>>
>>>>>>> On Sunday, March 16, 2025 at 5:59:03 PM UTC-7 gjr80 wrote:
>>>>>>>
>>>>>>>> The default weectl import action is to *not* import records where 
>>>>>>>> there is already an archive record in the database with the same 
>>>>>>>> timestamp. weectl 
>>>>>>>> import will appear to import such records but if you look at the 
>>>>>>>> WeeWX log you will see entries similar to:
>>>>>>>>
>>>>>>>> 2023-11-04 15:33:01 weectl-import[3795]: ERROR weewx.manager: 
>>>>>>>> Unable to add record 2018-09-04 04:20:00 AEST (1535998800) to database 
>>>>>>>> 'weewx.sdb': UNIQUE constraint failed: archive.dateTime
>>>>>>>>
>>>>>>>> indicating a record with the same timestamp (in this case 
>>>>>>>> 2018-09-04 04:20:00 AEST (1535998800)) already exists in the database 
>>>>>>>> and 
>>>>>>>> the imported record was discarded. Long story, but it was too 
>>>>>>>> inefficient 
>>>>>>>> to check every record before attempting to add it to the archive, 
>>>>>>>> hence the 
>>>>>>>> somewhat cumbersome import and notification. Note that if you use the 
>>>>>>>> (presently) undocumented --update command line option with weectl 
>>>>>>>> import pre-existing records will be overwritten.
>>>>>>>>
>>>>>>>> Bottom line - just run weectl import as per the docs, and whatever 
>>>>>>>> you do don't include --update on your weectl import command line. 
>>>>>>>> And of course always make a backup of your database before importing.
>>>>>>>>
>>>>>>>> Gary
>>>>>>>> On Monday, 17 March 2025 at 09:56:17 UTC+10 [email protected] 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I lost a few days of data due to a network issue, but I have the 
>>>>>>>>> data in another system. Am trying to import from csv, but I think my 
>>>>>>>>> csv 
>>>>>>>>> overlaps a bit with what's already in the weewx database. 
>>>>>>>>>
>>>>>>>>> Doing a dryrun, it says it will import every line in the csv... 
>>>>>>>>>
>>>>>>>>> Is there a way to get weectl import to NOT import duplicate data, 
>>>>>>>>> or is there an easy way to see what data is in those dates so I can 
>>>>>>>>> modify 
>>>>>>>>> csv to exclude those entries?
>>>>>>>>>
>>>>>>>>> Thanks! 
>>>>>>>>>
>>>>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"weewx-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/weewx-user/0847fbee-52b3-49a4-b7a8-aac0a0fb7fd8n%40googlegroups.com.

-- Find missing or extra observations between weather databases based on 
observation time.

-- This query can only filter out observations that exist in both databases if 
the observation times
-- match exactly or they match if you disregard the seconds.  For example, this 
query can't match an observation 
-- at 11:59:59 with an observation at 12:00:00, or one at 12:00:00 with one at 
12:01:00.


-- The more complete database should be db1.  Extra rows in db1 will be 
reported, which means they're missing in db2.


-- Provide a db file name for the more complete database as db1.  It can be 
another weewx db or some other db.
-- db2 must be a weewx db.  If you want to use a non-weewx db as db2, you will 
have to modify the select statements below.
 
attach 'my_more_complete_database' as db1
;
attach 'weewx.sdb' as db2
;



select 'row count in db1 is ' || printf('%,i', count(*)) 
  from db1.archive
;

select 'row count in db2 is ' || printf('%,i', count(*)) 
  from db2.archive
;

-- Uncomment all the lines for one of the queries below


-- Compare two weewx databases

-- with t1 as (
--     select datetime from db1.archive
--     except
--     select datetime from db2.archive
-- )
-- select datetime(datetime, 'unixepoch', 'localtime') as dt1
--   from t1
;


-- Compare a non-weewx database to a weewx database on exact observation times
-- You will need to provide sql below to return the db1 obs date/time as a 
string of 'YYYY-MM-DD HH:MM:SS'
-- That might be just a column name if the datetime is already stored as a 
'YYYY-MM-DD HH:MM:SS' string,
-- or maybe an strftime(), or a concatenation of separate columns holding 
pieces of datetime values

-- with t1 as (
--     select ADD-SQL-HERE 
--       from db1.archive
--     except
--     select datetime(datetime, 'unixepoch', 'localtime')
--       from db2.archive
-- )
-- select *
--   from t1
;


-- Compare a non-weewx database to a weewx database on observation times 
ignoring seconds
-- You will need to provide sql below to return the db1 obs date/time as string 
of 'YYYY-MM-DD HH:MM'
-- That might be just a column name or a substring() of it where the datetime 
is stored as a string,
-- or maybe an strftime(), or a concatenation of separate columns holding 
pieces of datetime values

-- with t1 as (
--     select ADD-SQL-HERE 
--       from db1.archive
--     except
--     select strftime('%Y-%m-%d %H:%M', datetime, 'unixepoch', 'localtime')
--       from db2.archive
-- )
-- select *
--   from t1
;

[weewx-user] Re: weectl import skip existing data?

Reply via email to