Re: [Tutor] Reading .csv data vs. reading an array

Mats Wichmann Mon, 15 Jul 2019 12:05:48 -0700

On 7/15/19 12:35 PM, Chip Wachob wrote:
> Oscar and Mats,
> 
> Thank you for your comments and taking time to look at the snips.
> 
> Yes, I think I had commented that the avg+trigger was equal to
> triggervolts in my original post.
> 
> I did find that there was an intermediary process which I had forgotten
> to comment out that was adversely affecting the data in one instance and
> not the other.  So it WAS a case of becoming code blind.  But I didn't
> give y'all all of the code so you would not have known that.  My apologies.
> 
> Mats, I'd like to get a better handle on your suggestions about
> improving the code.  Turns out, I've got another couple of 4GByte files
> to sift through, and they are less 'friendly' when it comes to
> determining the start and stop points.  So, I have to basically redo
> about half of my code and I'd like to improve on my Python coding skills.
> 
> Unfortunately, I have gaps in my coding time, and I end up forgetting
> the details of a particular language, especially a new language to me,
> Python.
> 
> I'll admit that my 'C' background keeps me thinking of these data sets
> as arrays... in fact they are lists, e.g.:
> 
> [
> [t0, v0],
> [t1, v1],
> [t2, v2],
> .
> .
> .
> [tn, vn]
> ]
> 
> Time and volts are floats and need to be converted from the csv file
> entries.
> 
> I'm not sure that I follow the "unpack" assignment in your example of:
> 
> for row in TrigWind:
>     time, voltage = row  # unpack
> 
> I think I 'see' what is happening, but when I read up on unpacking, I
> see it referring to using * and ** when passing arguments to a
> function...


That's a different aspect of unpacking.  This one is sequence unpacking,
sometimes called tuple (or sequence) assignment.  In the official
Python docs it is described in the latter part of this section:

https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
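A quick illustration of the difference, using made-up values lifted from your data: sequence unpacking assigns the items of one sequence to several names, while * and ** spread a sequence or a dict into the arguments of a function call.  The show() function is just a throwaway for the demo.

```python
# Sequence unpacking: the number of names on the left must match
# the number of items in the sequence on the right.
row = ["-0.005640001706", "1.6363"]    # one made-up [time, voltage] pair
time, voltage = row
print(time, voltage)                   # -0.005640001706 1.6363

# The same thing happens once per row when you unpack in the for statement.
rows = [[0.0, 1.6363], [5e-09, 1.65291]]
for t, v in rows:
    print(t, v)

# Argument unpacking (* and **) is a separate feature, used in calls.
def show(t, v):
    """Throwaway helper, only here to receive unpacked arguments."""
    return f"t={t} v={v}"

print(show(*row))                      # t=-0.005640001706 v=1.6363
print(show(**{"t": 0.0, "v": 1.6}))    # t=0.0 v=1.6
```

Note that unpacking never changes types: time and voltage are still strings above, simply because row held strings.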


> I tried it anyhow, with this being an example of my source data:
> 
> "Record Length",2000002,"Points",-0.005640001706,1.6363
> "Sample Interval",5e-09,s,-0.005639996706,1.65291
> "Trigger Point",1128000,"Samples",-0.005639991706,1.65291
> "Trigger Time",0.341197,s,-0.005639986706,1.60309
> ,,,-0.005639981706,1.60309
> "Horizontal Offset",-0.00564,s,-0.005639976706,1.6363
> ,,,-0.005639971706,1.65291
> ,,,-0.005639966706,1.65291
> ,,,-0.005639961706,1.6363
> .
> .
> .
> 
> Note that I want the items in the fourth and fifth columns of the csv
> file (indices 3 and 4) for my time and voltage.
> 
> When I tried the unpacking, the values all came over as strings.  I can't
> seem to convert them selectively.

That's what the csv module does, unless you tell it not to. Maybe this
will help:

https://docs.python.org/3/library/csv.html#csv.reader

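There's an option (csv.QUOTE_NONNUMERIC) that converts unquoted values to
floats and leaves quoted values alone as strings.  That comes close to
matching your data above, but not quite: the unquoted s units in the
third column would be handed to the float conversion as well and raise a
ValueError, so for this file it's simpler to convert just the two columns
you want.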
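A small sketch of what QUOTE_NONNUMERIC does, with io.StringIO standing in for a real file; both one-line samples are adapted from the data above, and the second keeps its unquoted s.

```python
import csv
import io

# A made-up line in which every unquoted field is a number.
clean = io.StringIO('"Points",2000002,-0.005640001706,1.6363\n')
clean_rows = list(csv.reader(clean, quoting=csv.QUOTE_NONNUMERIC))
print(clean_rows)   # [['Points', 2000002.0, -0.005640001706, 1.6363]]

# A line shaped like the scope export: the unquoted s is not a number,
# so the reader's conversion to float fails.
scope = io.StringIO('"Sample Interval",5e-09,s,-0.005639996706,1.65291\n')
try:
    list(csv.reader(scope, quoting=csv.QUOTE_NONNUMERIC))
    nonnumeric_ok = True
except ValueError as exc:
    nonnumeric_ok = False
    print("QUOTE_NONNUMERIC can't read this line:", exc)
```

Integer-looking fields such as 2000002 come back as floats too.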

> Desc1, Val1, Desc2, TimeVal, VoltVal = row
> 
> TimeVal and VoltVal return type of str, which makes sense.
> 
> Must I go through yet another iteration of scanning TimeVal and VoltVal
> and converting them using float() by saving them to another array?
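No second pass is needed: convert the two fields in the same loop that unpacks the row, so the list holds floats from the start.  A sketch, assuming every row has exactly five columns as in your sample; read_trace is a made-up name and SAMPLE is a trimmed copy of your data.

```python
import csv
import io

# Trimmed copy of the sample export (assumed: always five columns per row).
SAMPLE = '''"Record Length",2000002,"Points",-0.005640001706,1.6363
"Sample Interval",5e-09,s,-0.005639996706,1.65291
,,,-0.005639981706,1.60309
'''

def read_trace(f):
    """Hypothetical helper: return [time, voltage] float pairs from columns 4 and 5."""
    trace = []
    for row in csv.reader(f):
        # Unpack all five fields; _ marks the three we don't need.
        _, _, _, time_val, volt_val = row
        # Convert only the two wanted fields, once, as they are read.
        trace.append([float(time_val), float(volt_val)])
    return trace

trace = read_trace(io.StringIO(SAMPLE))
print(trace[0])   # [-0.005640001706, 1.6363]
```

With a real file, open it with newline="" as the csv docs recommend, e.g. `with open("capture.csv", newline="") as f: trace = read_trace(f)` (the filename is a placeholder).  For a 4 GByte file you might make read_trace a generator (yield each pair instead of appending) so you never hold every row in memory at once.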
> 
> 
> Thanks for your patience.
> 
> Chip
> 
> On Sat, Jul 13, 2019 at 9:36 AM Mats Wichmann <[email protected]> wrote:
> 
>     On 7/11/19 8:15 AM, Chip Wachob wrote:
> 
>     Kinda restating what Oscar said (he came to the same conclusions); I'm
>     just being a lot more wordy:
> 
> 
>     > So, here's where it gets interesting.  And, I'm presuming that someone
>     > out there knows exactly what is going on and can help me get past
>     > this hurdle.
> 
>     Well, each snippet has some "magic" variables (from our point of view,
>     since we don't see where they are set up):
> 
>     1: if(voltage > (avg + triglevel)
> 
>     2: if((voltage > triggervolts)
> 
>     Since the value you compare voltage against decides when you detect a
>     transition, and thus what gets added to the transition list you're
>     building; and since the list sizes come out different even though you
>     say the data are the same: guess where a process of elimination says
>     the difference is coming from?
> 
>     ===
> 
>     A stylistic comment, though I know this wasn't your question.
> 
>     >         for row in range (len(TrigWind)):
> 
>     Don't do this.  It's not a coding error giving you wrong results, but
>     it's not efficient and makes for harder-to-read code.  You already have
>     an iterable in TrigWind.  You then find the size of the iterable and use
>     that size to generate a range object, which you then iterate over,
>     producing index values which you use to index into the original
>     iterable.  Why not skip all that?  Just do
> 
>     for row in TrigWind:
> 
>     now row is actually a row, as the variable name suggests, rather than an
>     index you use to go retrieve the row.
> 
>     Further, the "row" entries in TrigWind are lists (or tuples, or some
>     other indexable iterable, we can't tell), which means you end up
>     indexing into two things: into the "array" to get the row, then into
>     the row to get the individual values.  It's nicer if you unpack the rows
>     into variables so they can have meaningful names; indeed you already do
>     that with one of them.  That lets you avoid code snips like "x[7][1]".
> 
>     Conceptually then, you can take this:
> 
>     for row in range(len(TrigWind)):
>         voltage = float(TrigWind[row][1])
>         ...
>             edgearray.append([float(TrigWind[row][0]), float(TrigWind[row][1])])
>         ...
> 
>     and change to this:
> 
>     for row in TrigWind:
>         time, voltage = row  # unpack
>         ...
>             edgearray.append([float(time), float(voltage)])
> 
>     or even more compactly you can unpack directly at the top:
> 
>     for time, voltage in TrigWind:
>         ...
>             edgearray.append([float(time), float(voltage)])
>         ...
> 
>     Now I left an issue to resolve with conversion: voltage is not
>     converted before its use in the not-shown comparisons.  Does it need to
>     be?  Every usage of the values from the individual rows here uses them
>     immediately after converting them to float.  It's usually better not to
>     convert all over the place, and since the creation of TrigWind is under
>     your own control, you should do that at the point the data enters the
>     program, that is, as TrigWind is created; then you just consume data
>     from it in its intended form.  If not, just convert voltage before
>     using it, as your original code does.  You then don't need to convert
>     voltage a second time in the list append statements.
> 
>     for time, voltage in TrigWind:
>         voltage = float(voltage)
>         ...
>             edgearray.append([float(time), voltage])
>         ...
> 
> 
>     _______________________________________________
>     Tutor maillist  -  [email protected]
>     To unsubscribe or change subscription options:
>     https://mail.python.org/mailman/listinfo/tutor
> 
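Putting the earlier suggestions together (floats stored in TrigWind when it is built, unpacking in the for statement, no indexing) might look like the sketch below.  The real comparisons were never shown, so this assumes a simple rising-edge test against a single threshold; rising_edges, the threshold and the sample data are all hypothetical.

```python
def rising_edges(trace, threshold):
    """Hypothetical helper: return [time, voltage] where voltage rises above threshold."""
    edges = []
    above = False                      # state of the previous sample
    for time, voltage in trace:        # unpack each [time, voltage] pair directly
        now_above = voltage > threshold
        if now_above and not above:    # below -> above: a rising transition
            edges.append([time, voltage])
        above = now_above
    return edges

# Made-up trace, already converted to floats when it was built.
TrigWind = [[0.0, 1.60], [5e-09, 1.65], [1e-08, 1.70], [1.5e-08, 1.62], [2e-08, 1.66]]
print(rising_edges(TrigWind, 1.64))    # [[5e-09, 1.65], [2e-08, 1.66]]
```

Starting with above = False means a trace that begins above the threshold reports its first sample as an edge; initialise it from the first sample instead if that matters for your captures.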
