[Mark Lawrence please press DELETE now in case the rest of this message is all about you.] [[If that is not working, if on Windows, try Control-ALT-DELETE as that will really get rid of my message.]]
Back to replying to Steven: of course I want to be corrected when I am wrong. I think everyone here knows I tend to be quite expansive in my thoughts, sometimes to the point where it looks as if I am free-associating. I am trying to get to the point faster and stay there. So if what I write is not wrong as a general point and you want to bring up every exception, fine, but I reserve the right not to follow you there, especially not on the forum. I may, of course, continue a discussion with you in private.

I often have a problem in real life (not talking about you, let alone whoever Mark is) where I think I said something clearly by using words like "if", and the other person simply acts as if I had left that out. You know: we can go to the park IF it is not raining tomorrow. The reply is that the weather report says it will rain, so why am I suggesting we go to the park? Duh. I had not seen the weather report, BUT I clearly suggested the weather was something we should check before deciding.

A more obvious error, on the other hand, should be pointed out. EXAMPLE: I am driving to Pennsylvania this weekend, not far from a National Park, and will have some hours to kill. I suggested we might visit Valley Forge National Historical Park and did not add "only if it is open". Well, in the U.S. there is a very real possibility the park will be closed because it was deemed non-essential during a so-called Government Shutdown, so such a reply IS reasonable. I did not consider that and stand corrected.

But Chris, you point out that I reacted similarly to what you said. Indeed, you said that sometimes we don't need to focus on efficiency, as opposed to saying we should always ignore it, or something like that. I think we are actually in relative agreement on how we might approach a problem like this.
We might try to solve it in a reasonable way first and not worry about efficiency at first, especially now that some equipment runs so fast, with so much memory, that results appear faster than we can get to them. But with experience, and need, we may fine-tune code that is causing issues. As I have mentioned, I have applications that regularly need huge random samples, so a list of millions of items being created millions of times, with all of that done thousands of times, adds up. Many cheaper methods might then be considered, especially just switching to a better data structure ONCE.

I will stop this message here as I suspect Mark is still reading and fuming. Note: I do not intend to mention Mark again in future messages. I do not actually want to annoy him and wish he would live and let live.

-----Original Message-----
From: Tutor <tutor-bounces+avigross=verizon....@python.org> On Behalf Of Steven D'Aprano
Sent: Thursday, December 27, 2018 5:38 PM
To: tutor@python.org
Subject: Re: [Tutor] decomposing a problem

On Wed, Dec 26, 2018 at 11:02:07AM -0500, Avi Gross wrote:

> I often find that I try to make a main point and people then focus on
> something else, like an example.

I can't speak for others, but for me, that could be because of a number of reasons:

- I agree with what you say, but don't feel like adding "I agree!!!!" after each paragraph of yours;
- I disagree, but can't be bothered arguing;
- I don't understand the point you intend to make, so just move on.

But when you make an obvious error, I tend to respond. This is supposed to be a list for teaching people to use Python better, after all.

> So, do we agree on the main point that choosing a specific data structure or
> algorithm (or even computer language) too soon can lead to problems that can
> be avoided if we first map out the problem and understand it better?

Sure, why not? That's vague and generic enough that it has to be true.
But if it's meant as advice, you don't really offer anything concrete. How does one decide what is "too soon"? How does one avoid design paralysis?

> I do not concede that efficiency can be ignored because computers are fast.

That's good, but I'm not sure why you think it is relevant, as I never suggested that efficiency can be ignored. Only that what people *guess* is "lots of data" and what actually *is* lots of data may not be the same thing.

> I do concede that it is often not worth the effort or that you can
> inadvertently make things worse and there are tradeoffs.

Okay.

> Let me be specific. The side topic was asking how to get a random key from
> an existing dictionary. If you do this ONCE, it may be no big deal to make a
> list of all keys, index it by a random number, and move on. I did supply a
> solution that might (or might not) run faster by using a generator to get one
> item at a time and stopping when found. Less space but not sure if less
> time.

Why don't you try it and find out?

> But what I often need to do is to segment lots of data into two piles. One
> is for training purposes using some machine learning algorithm and the
> remainder is to be used for verifications. The choice must be random or the
> entire project may become meaningless. So if your data structure was a
> dictionary with key names promptly abandoned, you cannot just call pop()
> umpteen times to get supposedly random results as they may come in a very
> specific order.

Fortunately I never suggested doing that.

> If you want to have 75% of the data in the training section,
> and 25% reserved, and you have millions of records, what is a good way to
> go?

The obvious solution:

    import random

    keys = list(mydict.keys())
    random.shuffle(keys)
    index = len(keys)*3//4
    training_data = keys[:index]
    reserved = keys[index:]

Now you have the keys split into training data and reserved data. To extract the value, you can just call mydict[some_key].
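For the single-random-key side question quoted above, "try it and find out" could look something like the following sketch. It is an illustration, not code from either poster. The two function names and the million-entry test dict are hypothetical, and the "generator, stop when found" idea is approximated here with itertools.islice, which is an assumption about how that approach might be written.

```python
import random
import timeit
from itertools import islice


def random_key_via_list(d):
    """Build a list of every key, then pick one at a random index."""
    return random.choice(list(d))


def random_key_via_islice(d):
    """Walk the key iterator and stop at a random position (no full list built)."""
    if not d:
        raise IndexError("cannot choose from an empty dict")
    position = random.randrange(len(d))
    # islice skips `position` keys lazily; next() returns the key at that position.
    return next(islice(d, position, None))


if __name__ == "__main__":
    mydict = {n: str(n) for n in range(1_000_000)}  # hypothetical test data
    for func in (random_key_via_list, random_key_via_islice):
        seconds = timeit.timeit(lambda: func(mydict), number=10)
        print(f"{func.__name__}: {seconds:.3f} s for 10 calls")
```

Both versions do O(n) work per call: the islice version avoids allocating a full list but still walks past, on average, half of the keys. Which one wins, and by how much, depends on the machine and Python version, so the measurement is the point rather than any particular number. If many keys are needed from a dict that does not change, building list(mydict) once and reusing it is likely cheaper than either.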
If you prefer, you can generate two distinct dicts:

    training_data = {key: mydict[key] for key in training_data}

and similarly for the reserved data, and then mydict becomes redundant and you are free to delete it (or just ignore it).

Anything more complex than this solution should not even be attempted until you have tried the simple, obvious solution and discovered that it isn't satisfactory.

Keep it simple. Try the simplest thing that works first, and don't add complexity until you know that you need it.

By the way, your comments would be more credible if you had actual working code that demonstrates your point, rather than making vague comments that something "may" be faster. Sure, anything "may" be faster. We can say that about literally anything. Walking to Alaska from the southernmost tip of Chile while dragging a grand piano behind you "may" be faster than flying, but probably isn't.

Unless you have actual code backing up your assertions, they're pretty meaningless. And the advantage of working code is that people might actually learn some Python too.

--
Steve
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
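A self-contained version of the shuffle-and-slice split from Steven's reply, returning the two dicts directly, might look like this. The function name split_dict and the seed parameter are illustrative assumptions. Using a private random.Random(seed) generator is one way to make the split repeatable without touching the module-level random state. It keeps the same 3/4 integer split as the quoted code.

```python
import random


def split_dict(mydict, seed=None):
    """Randomly split a dict into (training, reserved) dicts, 75% / 25%.

    Hypothetical helper: same shuffle-then-slice idea as the quoted reply.
    """
    keys = list(mydict)
    rng = random.Random(seed)      # private generator; same seed -> same split
    rng.shuffle(keys)
    index = len(keys) * 3 // 4     # integer arithmetic, rounds down
    training = {key: mydict[key] for key in keys[:index]}
    reserved = {key: mydict[key] for key in keys[index:]}
    return training, reserved


if __name__ == "__main__":
    data = {f"record{n}": n for n in range(20)}  # hypothetical records
    training, reserved = split_dict(data, seed=42)
    print(len(training), len(reserved))  # 15 5
```

Passing seed=None gives a fresh random split each run. Whether a fixed seed is desirable (for repeatable experiments) is a project decision rather than a requirement.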