Re: Cello: a library of string algorithms using succinct data structures

2017-04-11 Thread andrea
@Krux02 I could just use a concept and be done. BUT it turns out I actually need to know that I am working with data from disk, because in this case I want to use disk-based sequences also for the intermediate data structures that I use in the constructions - something that I am not doing right

Re: Cello: a library of string algorithms using succinct data structures

2017-04-11 Thread Krux02
@andrea About the issue: you are right, I should explain a bit more. I am trying to get something like [std::span](https://forum.nim-lang.org/open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0122r1.pdf) from C++ into the standard library of Nim. That document explains it much better than I could do it in

Re: Cello: a library of string algorithms using succinct data structures

2017-04-11 Thread andrea
@Krux02 As bpr remarked, the library is really about strings, especially large ones. One key application is searching, both exact and approximate, but I have also started adding similarity measures for strings, and data structures such as suffix arrays have many other applications (for

Re: Cello: a library of string algorithms using succinct data structures

2017-04-11 Thread Krux02
@bpr Sorry, bioinformatics then. What I originally meant was 'people who write programs that handle genes', and I naïvely used the term "genetic engineer", which is of course wrong. I am just an application developer. All strings I handle are the strings I have written manually that are necessary

Re: Cello: a library of string algorithms using succinct data structures

2017-04-10 Thread bpr
> Well it is for genetics. No wonder that I initially didn't get what it was about. I am not a genetic engineer.

Genetics is one of the applications, but these algorithms are generally applicable in text analysis. In the C++ world, [SDSL](https://github.com/simongog/sdsl-lite) is a

Re: Cello: a library of string algorithms using succinct data structures

2017-04-10 Thread bpr
> ... and I really don't want to hijack this thread

I think a better solution is for you to post in another thread which refers to this one, where you can put forth your arguments. Be careful though, as you nag you run the risk of alienating people who might otherwise be sympathetic. Please

Re: Cello: a library of string algorithms using succinct data structures

2017-04-10 Thread Krux02
Well it is for genetics. No wonder that I initially didn't get what it was about. I am not a genetic engineer. I saw that you had a spill datatype. I think it is related to this issue; it would be nice if you could also comment on it, because I think not all libraries should define this

Re: Cello: a library of string algorithms using succinct data structures

2017-04-10 Thread andrea
I ask you the favour of being polite and leaving this thread for discussion of the library itself.

Re: Cello: a library of string algorithms using succinct data structures

2017-04-10 Thread Libman
You are of course free to release your software under any license you want, and I really don't want to hijack this thread, but your summation of the Apache License is not accurate. Your summation ("it is as free as you can get, with the only requirement that if you actually do modifications to

Re: Cello: a library of string algorithms using succinct data structures

2017-04-10 Thread andrea
Yes, like all my other libraries, this one is licensed under Apache 2. It is as free as you can get, with the only requirement that if you actually do modifications to the library itself, you have to give credit to the original authors. It only seems fair to me, but in any case it is the

Re: Cello: a library of string algorithms using succinct data structures

2017-04-09 Thread Libman
Sorry to be the resident annoying guardian of [copyfree](http://copyfree.org/standard/rejected)-dom, but [the Apache License on this library](https://github.com/unicredit/cello/issues/1) means that it **and everything that uses it** would have to be excluded from all pure-copyfree projects

Re: Cello: a library of string algorithms using succinct data structures

2017-04-06 Thread bpr
@andrea Yes, I think you understand me, the alphabet of DNA and RNA can and should be represented with two bits per character. There's a ['standard'](https://genome.ucsc.edu/goldenpath/help/twoBit.html) for storing FASTA files as .2bit files for compression, but I am befuddled as to why they
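
A minimal sketch of the two-bits-per-character idea mentioned above, written in Python for illustration. The names `CODES`, `pack` and `unpack` are hypothetical, the code assignment is arbitrary, and this is neither Cello's API nor the .2bit file layout.

```python
# Hypothetical 2-bit codes for the four DNA bases (assignment chosen arbitrarily).
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> tuple[bytes, int]:
    """Pack a DNA string into 2 bits per base; returns (data, number of bases)."""
    out = bytearray((len(seq) + 3) // 4)  # four bases fit in one byte
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out), len(seq)


def unpack(data: bytes, n: int) -> str:
    """Recover the first n bases from packed data."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(n))


packed, n = pack("GATTACA")
restored = unpack(packed, n)  # same string as the input
```

The length is stored separately because the last byte may be only partly used.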

Re: Cello: a library of string algorithms using succinct data structures

2017-04-06 Thread andrea
@cdunn2001 Thank you. I am no bioinformatician myself, but I am trying to learn some of the standard algorithms as they may have applications elsewhere. Hope Cello turns out to be useful, though! @bpr If you see [here](https://github.com/unicredit/cello#wavelet-tree), one can turn a string

Re: Cello: a library of string algorithms using succinct data structures

2017-04-06 Thread bpr
Wow, this is really great! Do you support the 2-bit representation for DNA/RNA sequences? I looked and didn't see it. I think writing tools for genome assembly in Nim would be worthwhile. Bioinformaticians use Python a lot but of course need to write fast code in C and C++. Nim's Pythonesque

Re: Cello: a library of string algorithms using succinct data structures

2017-04-06 Thread cdunn2001
This is really good. Forked into bio-nim. You're welcome to join that GitHub org with me and bpr, if you want.

* [https://github.com/bio-nim](https://github.com/bio-nim)

Re: Cello: a library of string algorithms using succinct data structures

2017-04-03 Thread andrea
Thank you for your feedback! I will try to improve the wording in the next few days. The terminology is quite standard in the field, but I agree that there is room for improvement!

Re: Cello: a library of string algorithms using succinct data structures

2017-04-03 Thread Krux02
Thanks a lot for the improvement, but I am still a bit puzzled when I get to the part where you start talking about `rank` and `select`. By rereading it several times I think I know now what this is all about. But the intro could be much smoother (if you care about that). For example here you
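
For readers who are also new to `rank` and `select`, here is a naive Python sketch of what the two operations usually mean on a sequence of bits. The indexing convention (prefix length for `rank`, 1-based count for `select`) is an assumption made for this example and may differ from Cello's. Succinct structures answer the same queries in constant or near-constant time, while this version simply scans the list.

```python
def rank(bits: list[int], i: int) -> int:
    """Number of 1 bits among the first i positions, bits[0:i]."""
    return sum(bits[:i])


def select(bits: list[int], k: int) -> int:
    """Index of the k-th 1 bit (k counts from 1), or -1 if there is none."""
    count = 0
    for pos, bit in enumerate(bits):
        count += bit
        if bit and count == k:
            return pos
    return -1


bits = [0, 1, 1, 0, 1]
ones_before_3 = rank(bits, 3)   # counts the 1 bits in [0, 1, 1]
third_one_at = select(bits, 3)  # position of the third 1 bit
```

Under this convention the two are inverse in the sense that `rank(bits, select(bits, k) + 1) == k` whenever the k-th 1 bit exists.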

Re: Cello: a library of string algorithms using succinct data structures

2017-03-31 Thread mashingan
I thought of [this](http://libcello.org/) after glancing at the thread title

Re: Cello: a library of string algorithms using succinct data structures

2017-03-31 Thread Krux02
I just looked over the introduction you just posted. I am not really into string algorithms; I would rather avoid all string operations in my applications except string formatting for output. Therefore I can already see that this library is not for me, but still it would be nice to read one or two

Cello: a library of string algorithms using succinct data structures

2017-03-31 Thread andrea
I am releasing [Cello](https://unicredit.github.io/cello/), a library of string algorithms using succinct data structures. It should appear shortly on Nimble. For now it implements wavelet trees, FM indices, the Burrows-Wheeler transform, suffix arrays and a little more. Check out the README
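
As a rough illustration of two of the structures named in the announcement, the Python sketch below builds a suffix array by plain sorting and derives the Burrows-Wheeler transform from it. It is a textbook construction for small inputs, not Cello's implementation or API; the function names and the `$` sentinel are assumptions made for the example.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of text, in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform of text, using a sentinel smaller than every symbol."""
    assert sentinel not in text
    s = text + sentinel
    # For each suffix in sorted order, emit the character just before it;
    # index -1 wraps around to the sentinel for the suffix starting at 0.
    return "".join(s[i - 1] for i in suffix_array(s))


print(bwt("banana"))  # annb$aa
```

Sorting whole suffixes like this costs far more time and memory than the constructions a real library would use, which is why it only serves to show what the output looks like.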