>105000 bugs in the Ubuntu Linux database currently.
>https://bugs.launchpad.net/ubuntu
From: hale.michael...@live.com
To: wikidata-l@lists.wikimedia.org
Date: Sun, 7 Jul 2013 17:07:23 -0400
Subject: Re: [Wikidata-l] Accelerating software innovation with Wikidata and improved Wikicode

I don't think there is anything fundamentally different about writing code, as opposed to writing in a natural language, that should prevent such a project. I just think that in practice we aren't as good at it yet. All software has bugs, but we can often use it for extended periods of time without encountering one. I can read many Wikipedia articles before I encounter an inconsistent statement. I think a project like this would just start with the common source-control restrictions found in open-source and proprietary software: you have to have good code coverage in the test cases, and you can't check in changes that break tests. That would require users to understand the code before they change it. People know that Wikipedia isn't perfect (neither are/were traditional encyclopedias), but it provides incomparable value regardless. Studies show that commercial software averages 20-30 bugs per 1000 lines of code.
http://www.wired.com/software/coolapps/news/2004/12/66022

> From: j...@sahnwaldt.de
> Date: Sun, 7 Jul 2013 22:47:07 +0200
> To: wikidata-l@lists.wikimedia.org
> Subject: Re: [Wikidata-l] Accelerating software innovation with Wikidata and improved Wikicode
>
> I think software and most other engineering products need a much higher level of coherence than artefacts that are consumed by humans, like Wikipedia. Wikipedia is full of inconsistencies and even contradictions. When I browse Wikipedia, I often stumble upon statements on one page that are contradicted by another page. That's not a big deal - sometimes I can easily tell which statement is correct (and fix the other pages), sometimes I can't, but either way, I am not a computer: I don't follow these statements blindly.
> In a computer program, such inconsistencies would lead to erratic behavior, i.e., bugs. This means that a completely open wiki process will not work for software.
>
> In a way, a lot of open source software is developed in a restricted wiki way: someone proposes a change, but before it is merged, it is checked by people who (hopefully) know all the nooks and crannies of the existing code. A bit like edit-protected pages in Wikipedia: everyone can propose changes on the talk page, but only admins can actually make these changes.
>
> Christopher
>
> On 7 July 2013 04:16, Michael Hale <hale.michael...@live.com> wrote:
> > I'm glad you mentioned that the same issue applies to electronics. I suppose I could have just referred to Moore's law instead of the relatively recent increasing size of datacenters. I like asking computers to work hard, but I find it hard to think of valuable things for them to do. You can play a new game or donate time to BOINC, but not very many great games are produced each year and BOINC typically runs algorithms that benefit humanity but not specifically you. For example, my genetics tests say I have an increased risk of prostate cancer, so I'd like to be able to tell Folding@home to focus on the proteins that are most relevant for the diseases I'm most likely to get.
> >
> > I still have hope that a more wiki-like model could work for developing software libraries though. The problems of technical design in software and hardware are similar, but software can be developed more fluidly and rapidly due to the lower barrier to entry and non-existent manufacturing costs. Essentially all electronics are designed and simulated with software prior to constructing physical prototypes these days.
> >
> > I've thought about the integration problem some, but I haven't ironed out how it would all work yet.
> > I think standard object-oriented programming and modeling techniques have been absorbed by enough programmers that it might be worth a shot though. Essentially, each article would have a standard class and supporting data structures or file formats for the inputs and outputs of its algorithms. It would be like the typical flow chart or visual programming languages you can use with libraries like Modelica, but on a larger scale and the formats would often be more complex. So, like, you would have a class representing a cloud, with flags for different representations (density values in a cubic grid, collections of point particles, polygonal shape approximations, etc) which are used for different algorithms. So then you would have code that can convert between all of the representations, code for generating random clouds (with potentially lots of optional parameters to specify atmospheric conditions), code for outputting images of the generated clouds in different styles, and algorithms for manipulating them through time. Then if I wanted to see the effects on a specific cloud I've made drifting over the ocean in different atmospheric conditions, I could grab the code to instantiate 3D Euclidean space with a virtual camera, add some gravity, add some ground, add some water, add an atmosphere, add my cloud, and then simulate it with adjustable parameters for the accuracy and speed of computation. Now, there are a lot of details that leaves out, but I don't know of another way to easily mix capabilities from high-end graphics software and various specialized simulation algorithms in lots of ways. Graphics software typically gives you some simulation capabilities, and simulation software typically gives you some graphics functionality, but I want lots of both.
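
The "one article, one class, several representations" idea above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not a proposed design: the names `PointCloud`, `DensityGrid`, `random_cloud`, `points_to_grid` and `grid_to_points` are hypothetical, a Gaussian blob stands in for real atmospheric modelling, and only two of the representations mentioned (point particles and a cubic density grid) are covered. It uses two small classes rather than one class with flags, which is a design choice of the sketch.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical names throughout; nothing here comes from an existing library.
Point = tuple[float, float, float]


@dataclass
class PointCloud:
    """Cloud represented as point particles inside the unit cube."""
    points: list[Point]


@dataclass
class DensityGrid:
    """Cloud represented as particle counts in an n x n x n grid over the unit cube."""
    n: int
    cells: list[list[list[int]]]


def random_cloud(num_points: int, spread: float = 0.15,
                 seed: Optional[int] = None) -> PointCloud:
    """Generate a blob of particles around the cube center (illustrative only)."""
    rng = random.Random(seed)

    def coord() -> float:
        # Clamp so every particle stays inside the unit cube.
        return min(max(rng.gauss(0.5, spread), 0.0), 1.0)

    return PointCloud([(coord(), coord(), coord()) for _ in range(num_points)])


def points_to_grid(cloud: PointCloud, n: int) -> DensityGrid:
    """Convert point particles to per-cell particle counts."""
    cells = [[[0] * n for _ in range(n)] for _ in range(n)]
    for x, y, z in cloud.points:
        # A coordinate of exactly 1.0 is folded into the last cell.
        i, j, k = (min(int(c * n), n - 1) for c in (x, y, z))
        cells[i][j][k] += 1
    return DensityGrid(n, cells)


def grid_to_points(grid: DensityGrid) -> PointCloud:
    """Lossy inverse: put each counted particle at the center of its cell."""
    step = 1.0 / grid.n
    points: list[Point] = []
    for i, plane in enumerate(grid.cells):
        for j, row in enumerate(plane):
            for k, count in enumerate(row):
                center = ((i + 0.5) * step, (j + 0.5) * step, (k + 0.5) * step)
                points.extend([center] * count)
    return PointCloud(points)


if __name__ == "__main__":
    cloud = random_cloud(1000, seed=42)
    grid = points_to_grid(cloud, n=8)
    total = sum(c for plane in grid.cells for row in plane for c in row)
    print(total)  # 1000: every particle lands in exactly one cell
```

The conversion to a grid discards exact positions, so converting back is lossy. A real library would have to document which conversions preserve which information.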
> >
> > I think having more semantic annotation tools will be great, but I don't spend most of my time doing searches. There is an astounding amount of information, data, and media on the internet, but it's not hard to find the edge if you really try. It's pretty crazy if you search for images of "blue bear" how many results come up, but if you search for "blue bear and green gorilla" you don't get anything useful. Then you get to face the craziness of how many options you have for combining a picture of a blue bear and a different picture of a green gorilla into one picture. I think it's interesting what they are trying with the Wolfram Alpha website, but they will always have to limit the time of the computations they allow you to do on their servers, so that's why I think we need better libraries to more easily program the computers we have direct control over.
> >
> > ________________________________
> > From: dacu...@gmail.com
> > Date: Sat, 6 Jul 2013 17:49:41 -0400
> > To: wikidata-l@lists.wikimedia.org
> > Subject: Re: [Wikidata-l] Accelerating software innovation with Wikidata and improved Wikicode
> >
> > Thanks for sharing your thoughts Michael, it is also something that has been bothering me for a while and not only in programming, also in other technical domains like electronics.
> >
> > In my opinion, the reason why programming (or technical design in general) couldn't follow the wiki world is because it has some structural differences that require a different approach. To start with, there is the problem of integration, where code solutions are usually part of a larger system and they cannot be isolated or combined with other blocks as easily as you would combine text fragments like in Wikipedia. I'm sure that all those 10 open file examples have some particularities about the operating system, method, supporting libraries, etc.
> > The part of scavenging and gluing will always be there unless you follow the approach used in hardware design (wp: semiconductor intellectual property core).
> >
> > Since that kind of modularity trend is hard to set up at large scale other than what is already established, it would be more practical to focus on what can be improved more easily, which is the scavenging. Instead of copying code fragments, it would be better to point to the fragment in the source code project itself, while at the same time providing the semantic tags necessary for describing that fragment. This can be done (more or less) with current existing semantic annotation technology (see thepund.it and DBpedia Spotlight).
> >
> > If this has not been done before it is maybe because semantic tools are now in the transition from "adaptation of an emerging technology" into "social appropriation of that technology". For the wiki concept it took 6 years for it to be transformed into Wikipedia, more or less the same amount of years between SMW and Wikidata. Semantic annotation of code will eventually happen; how fast will depend on interest in such a tool and the success of the supporting technologies.
> >
> > Micru
> >
> >
> > On Sat, Jul 6, 2013 at 3:10 PM, Michael Hale <hale.michael...@live.com> wrote:
> >
> > I have been pondering this for some time, and I would like some feedback. I figure there are many programmers on this list, but I think others might find it interesting as well.
> >
> > Are you satisfied with our progress in increasing software sophistication as compared to, say, increasing the size of datacenters? Personally, I think there is still too much "reinventing the wheel" going on, and the best way to get to software that is complex enough to do things like high-fidelity simulations of virtual worlds is to essentially crowd-source the translation of Wikipedia into code.
> > The existing structure of the Wikipedia articles would serve as a scaffold for a large, consistently designed, open-source software library. Then, whether I was making software for weather prediction and I needed code to slowly simulate physically accurate clouds or I was making a game and I needed code to quickly draw stylized clouds I could just go to the article for clouds, click on C++ (or whatever programming language is appropriate) and then find some useful chunks of code. Every article could link to useful algorithms, data structures, and interface designs that are relevant to the subject of the article. You could also find data-centric programs too. Like, maybe a JavaScript weather statistics browser and visualizer that accesses Wikidata. The big advantage would be that constraining the design of the library to the structure of Wikipedia would handle the encapsulation and modularity aspects of the software engineering so that the components could improve independently. Creating a simulation or visualization where you zoom in from a whole cloud to see its constituent microscopic particles is certainly doable right now, but it would be a lot easier with a function library like this.
> >
> > If you look at the existing Wikicode and Rosetta Code the code samples are small and isolated. They will show, for example, how to open a file in 10 different languages. However, the search engines already do a great job of helping us find those types of code samples across blog posts of people who have had to do that specific task before.
> > However, a problem that I run into frequently that the search engines don't help me solve is if I read a nanoelectronics paper and I want to do a simulation of the physical system they describe I often have to go to the websites of several different professors and do a fair bit of manual work to assemble their different programs into a pipeline, and then the result of my hacking is not easy to expand to new scenarios. We've made enough progress on Wikipedia that I can often just click on a couple of articles to get an understanding of the paper, but if I want to experiment with the ideas in a software context I have to do a lot of scavenging and gluing.
> >
> > I'm not yet convinced that this could work. Maybe Wikipedia works so well because the internet reached a point where there was so much redundant knowledge listed in many places that there was immense social and economic pressure to utilize knowledgeable people to summarize it in a free encyclopedia. Maybe the total amount of software that has been written is still too small, there are still too few programmers, and it's still too difficult compared to writing natural languages for the crowdsourcing dynamics to work. There have been a lot of successful open-source software projects of course, but most of them are focused on creating software for a specific task instead of library components that cover all of the knowledge in the encyclopedia.
> >
> > _______________________________________________
> > Wikidata-l mailing list
> > Wikidata-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata-l
> >
> > --
> > Etiamsi omnes, ego non