@Krux02 I could just use a concept and be done. But it turns out I actually
need to know that I am working with data from disk, because in that case I want
to use disk-based sequences for the intermediate data structures in the
constructions as well - something that I am not doing right
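To make the distinction concrete: a generic routine only needs `len` and indexing (Python's duck typing stands in loosely for a Nim concept here), whereas a disk-backed sequence keeps its items in a file so that large intermediate arrays need not fit in RAM. The sketch below assumes fixed-width little-endian int64 items and uses `mmap`. `DiskIntSeq` and `sum_seq` are hypothetical names for illustration, not part of Cello or Nim.

```python
import mmap
import os
import struct
import tempfile


class DiskIntSeq:
    """Hypothetical fixed-length sequence of int64 values stored in a file.

    Items live in a memory-mapped file; the OS pages them in and out on access.
    """

    _ITEM = struct.Struct("<q")  # one little-endian signed 64-bit integer

    def __init__(self, path, length):
        if length <= 0:
            raise ValueError("mmap cannot map an empty file")
        # Create (or resize) the backing file; new bytes are zero-filled on most platforms.
        with open(path, "wb") as f:
            f.truncate(length * self._ITEM.size)
        # The mapping stays valid after the file object is closed.
        with open(path, "r+b") as f:
            self._mm = mmap.mmap(f.fileno(), 0)
        self._length = length

    def __len__(self):
        return self._length

    def _offset(self, i):
        if not 0 <= i < self._length:
            raise IndexError(i)
        return i * self._ITEM.size

    def __getitem__(self, i):
        return self._ITEM.unpack_from(self._mm, self._offset(i))[0]

    def __setitem__(self, i, value):
        self._ITEM.pack_into(self._mm, self._offset(i), value)

    def close(self):
        self._mm.flush()
        self._mm.close()


def sum_seq(seq):
    """Generic code: works the same for a list or a DiskIntSeq."""
    return sum(seq[i] for i in range(len(seq)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        seq = DiskIntSeq(os.path.join(tmp, "ints.bin"), 5)
        for i in range(len(seq)):
            seq[i] = i * i
        print(sum_seq(seq), sum_seq([0, 1, 4, 9, 16]))
        seq.close()  # release the mapping before the directory is removed
```

The generic `sum_seq` cannot tell the two apart, which is exactly why code that wants to *choose* disk-backed storage for its own temporaries needs to know where its input lives.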
@andrea
About the issue: you are right, I should explain a bit more. I am trying to get
something like
[std::span](http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0122r1.pdf)
from C++ into the standard library of Nim. That document explains it much
better than I could in
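For readers who have not used it, `std::span` is a non-owning view (essentially a pointer plus a length) over a contiguous sequence, so sub-ranges can be passed around without copying. Python's built-in `memoryview` behaves similarly over byte buffers; the snippet below only illustrates the concept and is not the Nim API being proposed.

```python
def span_demo():
    data = bytearray(b"ACGTACGT")  # the owner of the memory
    view = memoryview(data)        # non-owning view over the whole buffer
    window = view[2:6]             # sub-view of 4 items; no bytes are copied
    window[0] = ord("T")           # writes go through to the owning buffer
    return bytes(data), len(window), window.tobytes()


print(span_demo())  # (b'ACTTACGT', 4, b'TTAC')
```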
@Krux02 As bpr remarked, the library is really about strings, especially
large ones. One key application is searching, both exact and approximate, but I
have also started adding similarity measures for strings, and data structures
such as suffix arrays have many other applications (for
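To illustrate why suffix arrays help with exact search: once all suffixes are sorted, the occurrences of a pattern correspond to one contiguous block of suffixes that start with it, and two binary searches find that block. This is a deliberately naive sketch (construction by plain sorting), not how Cello or SDSL build suffix arrays.

```python
def suffix_array(text):
    """Start positions of all suffixes, in lexicographic order of the suffixes.

    Naive: materialises every suffix (O(n^2) memory); illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Sorted start positions of pattern in text, via binary search on sa."""
    m = len(pattern)
    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


sa = suffix_array("banana")
print(sa)                                      # [5, 3, 1, 0, 4, 2]
print(find_occurrences("banana", sa, "ana"))   # [1, 3]
```

Truncating sorted suffixes to their first `m` characters keeps them in non-decreasing order, which is what makes both binary searches valid.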
@bpr Sorry, bioinformatics then. What I originally meant was 'people who write
programs that handle genes', and I naïvely used the term "genetic engineer",
which is of course wrong. I am just an application developer. All strings I
handle are the strings I have written manually that are necessary
> Well it is for genetics. No wonder that I initially didn't get what it was
> about. I am not a genetic engineer.
Genetics is one of the applications, but these algorithms are generally
applicable in text analysis. In the C++ world,
[SDSL](https://github.com/simongog/sdsl-lite) is a
> ... and I really don't want to hijack this thread
I think a better solution is for you to post in another thread which refers to
this one, where you can put forth your arguments. Be careful though: if you keep
nagging, you run the risk of alienating people who might otherwise be
sympathetic. Please
Well it is for genetics. No wonder that I initially didn't get what it was
about. I am not a genetic engineer. I saw that you had a spill datatype. I
think it is related to this issue; it would be nice if you could also
comment on it, because I think not all libraries should define this
I ask you, as a favour, to be polite and to leave this thread for discussion of
the library itself
You are of course free to release your software under any license you want, and
I really don't want to hijack this thread, but your summation of the Apache
License is not accurate.
Your summation ("it is as free as you can get, with the only requirement that
if you actually do modifications to
Yes, like all my other libraries, this one is licensed under Apache 2. It is as
free as you can get, with the only requirement that if you actually do
modifications to the library itself, you have to give credit to the original
authors.
It only seems fair to me, but in any case it is the
Sorry to be the resident annoying guardian of
[copyfree](http://copyfree.org/standard/rejected)-dom, but [the Apache License
on this library](https://github.com/unicredit/cello/issues/1) means that it
**and everything that uses it** would have to be excluded from all
pure-copyfree projects
@andrea Yes, I think you understand me, the alphabet of DNA and RNA can and
should be represented with two bits per character. There's a
['standard'](https://genome.ucsc.edu/goldenpath/help/twoBit.html) for storing
FASTA files as .2bit files for compression, but I am befuddled as to why they
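A minimal sketch of the two-bits-per-base idea, packing four bases into each byte. The mapping A=0, C=1, G=2, T=3 is an arbitrary choice for this example; the UCSC .2bit format defines its own mapping and also records runs of N and lowercase (masked) regions, which this sketch does not handle. `pack_2bit` and `unpack_2bit` are hypothetical helpers, not part of Cello.

```python
# Hypothetical 2-bit code for this sketch only.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq):
    """Pack an A/C/G/T string four bases per byte, first base in the high bits.

    Returns the packed bytes and the base count, which must be kept alongside
    because the last byte may be only partially used. Any other character
    (such as N) raises KeyError.
    """
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        shift = 6 - 2 * (i % 4)
        out[i // 4] |= CODE[base] << shift
    return bytes(out), len(seq)


def unpack_2bit(packed, n):
    """Inverse of pack_2bit: decode the first n bases."""
    return "".join(
        BASES[(packed[i // 4] >> (6 - 2 * (i % 4))) & 0b11] for i in range(n)
    )


packed, n = pack_2bit("GATTACA")
print(packed.hex(), n)  # 8f10 7
```

Seven bases fit in two bytes instead of seven; the price is that any symbol outside the four-letter alphabet needs a side channel.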
@cdunn2001 Thank you. I am no bioinformatician myself, but I am trying to learn
some of the standard algorithms as they may have applications elsewhere. Hope
Cello turns out to be useful, though!
@bpr If you see [here](https://github.com/unicredit/cello#wavelet-tree), one
can turn a string
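The README section linked above is about wavelet trees. As a rough, hypothetical illustration (not Cello's implementation, which would use succinct bit vectors), a wavelet tree reduces rank over an arbitrary alphabet to rank on bit vectors, halving the alphabet at each level so a query costs O(log σ) bit-vector ranks:

```python
class WaveletTree:
    """Pointer-based wavelet tree supporting rank(c, i) = occurrences of c in s[:i].

    Each node splits its sorted alphabet in half and conceptually stores one bit
    per symbol: 0 if it belongs to the left half, 1 if it belongs to the right.
    Bit-vector rank is answered from a plain prefix-count list.
    """

    def __init__(self, s, alphabet=None):
        self.alphabet = sorted(set(s)) if alphabet is None else list(alphabet)
        self.is_leaf = len(self.alphabet) <= 1
        if self.is_leaf:
            return
        half = len(self.alphabet) // 2
        self.left_symbols = set(self.alphabet[:half])
        # ones[i] = number of 1 bits (right-going symbols) among the first i symbols
        self.ones = [0]
        for ch in s:
            self.ones.append(self.ones[-1] + (ch not in self.left_symbols))
        self.left = WaveletTree([ch for ch in s if ch in self.left_symbols],
                                self.alphabet[:half])
        self.right = WaveletTree([ch for ch in s if ch not in self.left_symbols],
                                 self.alphabet[half:])

    def rank(self, c, i):
        if self.is_leaf:
            # A leaf holds a single repeated symbol (or nothing at all).
            return i if c in self.alphabet else 0
        ones = self.ones[i]
        if c in self.left_symbols:
            return self.left.rank(c, i - ones)  # rank0 maps i into the left child
        return self.right.rank(c, ones)         # rank1 maps i into the right child


wt = WaveletTree("abracadabra")
print(wt.rank("a", 11), wt.rank("r", 11), wt.rank("a", 4))  # 5 2 2
```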
Wow, this is really great! Do you support the 2-bit representation for DNA/RNA
sequences? I looked and didn't see it.
I think writing tools for genome assembly in Nim would be worthwhile.
Bioinformaticians use Python a lot but of course need to write fast code in C
and C++. Nim's Pythonesque
This is really good. Forked into bio-nim. You're welcome to join that GitHub
org with me and bpr, if you want.
* [https://github.com/bio-nim](https://github.com/bio-nim)
Thank you for your feedback! I will try to improve the wording in the next
days. The terminology is quite standard in the field, but I agree that there is
room for improvement!
Thanks a lot for the improvement, but I am still a bit puzzled when I get to
the part where you start talking about `rank` and `select`. After rereading it
several times I think I now know what this is all about. But the intro could be
much smoother (if you care about that). For example here you
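In case it helps other readers: on a bit vector, `rank1(i)` counts the 1s before position `i` and `select1(k)` finds the position of the `k`-th 1, so the two undo each other. Conventions (0- or 1-based, inclusive or exclusive bounds) differ between papers and libraries; the ones below are an assumption, and `BitVector` is a plain illustration rather than Cello's API.

```python
from bisect import bisect_left


class BitVector:
    """Bit vector with rank/select backed by a plain prefix-count list.

    Succinct implementations answer the same queries with only o(n) extra
    bits; this version spends a full integer per position for clarity.
    """

    def __init__(self, bits):
        self.bits = list(bits)
        # prefix[i] = number of 1s in bits[:i]
        self.prefix = [0]
        for b in self.bits:
            self.prefix.append(self.prefix[-1] + b)

    def rank1(self, i):
        """Number of 1s among positions 0 .. i-1."""
        return self.prefix[i]

    def rank0(self, i):
        """Number of 0s among positions 0 .. i-1."""
        return i - self.prefix[i]

    def select1(self, k):
        """Position of the k-th 1, with k counted from 1."""
        if not 1 <= k <= self.prefix[-1]:
            raise ValueError("fewer than k ones")
        # Smallest j with prefix[j] >= k; the k-th 1 sits at position j - 1.
        return bisect_left(self.prefix, k) - 1


bv = BitVector([1, 0, 1, 1, 0, 0, 1])
print(bv.rank1(4), [bv.select1(k) for k in range(1, 5)])  # 3 [0, 2, 3, 6]
```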
I thought of [this](http://libcello.org/) after glancing at the thread title
I just looked over the introduction you posted. I am not really into
string algorithms; I would rather avoid all string operations in my applications
except string formatting for output. Therefore I can already see that this
library is not for me, but it would still be nice to read one or two
I am releasing [Cello](https://unicredit.github.io/cello/), a library of string
algorithms using succinct data structures.
It should appear shortly on Nimble. For now it implements wavelet trees, FM
indices, the Burrows-Wheeler transform, suffix arrays and a little more.
Check out the README
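For readers following the list above, here is a compact, naive sketch of how two of those pieces fit together: the Burrows-Wheeler transform computed from sorted suffixes, and FM-index backward search, which counts pattern occurrences using only the BWT, a table of symbol counts and a rank function. It rests on simplifying assumptions (a `$` sentinel, quadratic construction, naive rank) and is not Cello's code.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted suffixes of text + sentinel.

    Assumes the sentinel does not occur in text and sorts before every other
    symbol ("$" precedes ASCII letters and digits). Naive: illustration only.
    """
    s = text + sentinel
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # Character preceding each sorted suffix; s[-1] (the sentinel) when i == 0.
    return "".join(s[i - 1] for i in sa)


def fm_count(bwt_str, pattern):
    """Count occurrences of pattern by FM-index backward search over bwt_str."""
    # C[c] = number of symbols in the text that are strictly smaller than c.
    C, total = {}, 0
    for ch in sorted(set(bwt_str)):
        C[ch] = total
        total += bwt_str.count(ch)

    def occ(ch, i):
        # Naive rank: occurrences of ch in bwt_str[:i]. A real FM index answers
        # this with a wavelet tree or a similar rank structure.
        return bwt_str[:i].count(ch)

    lo, hi = 0, len(bwt_str)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


print(bwt("banana"))                   # annb$aa
print(fm_count(bwt("banana"), "ana"))  # 2
```

The only operation the search loop needs on the BWT is rank, which is why wavelet trees and FM indices tend to appear together.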