Thank you, Skip!
I tried to publish an Ordinal Fraction article on Wikipedia, but it was removed
because original research is not allowed on Wikipedia. However, somebody copied
it into this link: StateMaster - Encyclopedia: Ordinal fraction. I think that
the most obvious use of ordinal fractions
Skip,
Are you sure you have the word2vec description right?
https://en.wikipedia.org/wiki/Word2vec claims the dimensionality of
word2vec is typically in the range of 100 ... 1000, which would allow
treatment of a rather limited vocabulary if each dimension
corresponded to a distinct word.
The i
I read in a text file of word vectors using fread. The format looks like
this:
bell 0.0264 -0.2927 -0.0254 -0.1034 0.1672 -0.0440 -0.0019 0.1210 ...
bell_tower -0.1252 -0.1233 0.1351 0.1897 0.0242 0.0014 0.1942 -0.0237 ...
belt 0.1332 0.0142 -0.1208 -0.0574 0.1451 -0.0731 -0.1293 0.0855 ...
bel
a=:'belt 0.1332 0.0142 -0.1208 -0.0574 0.1451 -0.0731 -0.1293 0.0855'
(}.~ #@:>@:{.@:;: ) a
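A minimal sketch of what the expression above does, assuming the label is a single J token (a plain name such as belt or bell_tower): ;: tokenizes the line, the length of the first token is measured, and that many characters are dropped. The conversion with _ ". is added here for illustration only and is not part of the original message.

```j
a=: 'belt 0.1332 0.0142 -0.1208'
rest=: (}.~ #@:>@:{.@:;:) a    NB. drop the leading word, keep the numeric text
nums=: _ ". rest               NB. dyadic ". reads - as a minus sign
```

Here rest still carries the leading space, which the numeric conversion ignores.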
On Wed, 2/21/18, Skip Cave wrote:
Subject: [Jprogramming] File Cleanup
To: "[email protected]"
Date: Wednesday, February 21, 2018, 5:36 PM
I think you want this pair of expressions:
<@({.~ i.&' ');._2 text
0 1 }. _&".;._2 text
(Note that giving ". a left argument tells J that you only want
numeric results. When you do this, you don't need to do the search and
replace, because J recognizes - as being a minus sign. Also, since
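As a rough illustration of the second expression, here it is applied to two made-up lines held in memory rather than read from a file (the name text and its contents are assumptions). The label in each line is not numeric, so it becomes _ and is then dropped as the first column.

```j
text=: 'bell 0.0264 -0.2927',LF,'belt 0.1332 0.0142',LF   NB. hypothetical sample
raw=: _&".;._2 text      NB. one row per line; the word converts to _
numbers=: 0 1 }. raw     NB. drop no rows and one column (the label)
```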
txt here is a set of lines from your example with trailing ... removed;
here are the first two:
,.2{.txt
+--+
|bell 0.0264 -0.2927 -0.0254 -0.1034 0.1672 -0.0440 -0.0019 0.1210 |
+--
Raul,
Well, one way to start word2vec is to assign a boolean vector to each
word, with a single boolean one in a different place for each unique word.
That's why it's called 'one hot' embedding. However, after training the
word set with a shallow, two-layer neural network, and doing significant
Another suggestion, using some of J's built-in utilities:
dat=: freads 'yourfile.txt'
labels=: <@(' '&taketo);._2 dat
numbers=: _ ". (' '&takeafter);._2 dat
HTH
Ric
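A small sketch of the taketo/takeafter approach above, run on an in-memory string instead of a file. The sample text is made up, and it assumes each line ends in LF, which is the form the cut ;._2 needs.

```j
NB. two hypothetical sample lines, each terminated by LF
dat=: 'bell 0.0264 -0.2927',LF,'belt 0.1332 0.0142',LF
labels=: <@(' '&taketo);._2 dat          NB. boxed words before the first space
numbers=: _ ". (' '&takeafter);._2 dat   NB. numeric table from the rest of each line
```

labels is a list of boxed words and numbers is a table with one row per line, so the two can be indexed in parallel.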
On Wed, Feb 21, 2018 at 9:57 PM, 'Mike Day' via Programming <
[email protected]> wrote:
> txt here is a set of lines from y
Or using the tables/dsv addon:
load 'tables/dsv'
dat=: makenum ' ' readdsv 'yourfile.txt'
Note that although they're boxed the numbers are actually numeric.
To split them you could do:
labels=: {."1 dat
numbers=: > }."1 dat
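To see the split without the addon or a file, the sketch below builds a small boxed table by hand as a stand-in for what makenum and readdsv are described as returning (the table contents are hypothetical).

```j
NB. hypothetical stand-in for the boxed table: a word followed by boxed numbers
dat=: 2 3 $ 'bell';0.0264;_0.2927;'belt';0.1332;0.0142
labels=: {."1 dat       NB. first column: boxed words
numbers=: > }."1 dat    NB. remaining columns, opened into a numeric table
```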
On Wed, Feb 21, 2018 at 11:03 PM, Ric Sherlock wrote:
> Another sugge
Sure, you can represent words that way.
That's basically the same kind of representation that you get encoding
each word as a unique integer (index). Just less space efficient.
So, for example, in J, if you had 8 words, you could represent the
unique sequence of words as i.8 or you cou
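One way to picture the comparison between an integer index and a one-hot boolean vector, as a sketch only (the names idx, onehot and back are illustrative, and this is not necessarily the expression the truncated message went on to give):

```j
idx=: i. 8            NB. each of 8 words encoded as a unique integer
onehot=: = idx        NB. 8 by 8 boolean table with a single 1 in each row
back=: i.&1"1 onehot  NB. recover each index from the position of the 1
```

The table holds 64 booleans to carry the same information as 8 integers, which is the space cost mentioned above.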
Thanks to Raul and Mike for the suggestions.
I read in the data:
nb =: <'C:\numberbatch-en.txt'
nbs =. fread nb
Then I tried to clean it up:
Mike's method ran out of memory:
nbs4 =. ( i.&' ' ({.;0 ". }.)] ) every nbs
|out of memory
When I tried to run it on a smaller set:
nbs4=: (i.&' '
You need to convert words to a list. Also, you might use &> instead of each, as
it needs to be unboxed to use as an index.
On Feb 21, 2018 9:09 AM, "Skip Cave" wrote:
> Thanks to Raul and Mike for the suggestions.
>
> I read in the data:
>
>
> nb =: <'C:\numberbatch-en.txt'
>
> nbs =. fread nb
>
>
>
vec {~ (<'adults') i.~ words
is perhaps what you are looking for
R.E. Boss
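A minimal sketch of the lookup above on toy data; the contents of words and vec here are invented, assuming words is a list of boxed strings and vec has one row per word.

```j
words=: 'bell';'belt';'adults'           NB. hypothetical vocabulary
vec=: 3 2 $ 0.1 0.2 0.3 0.4 0.5 0.6      NB. hypothetical vectors, one row per word
row=: vec {~ (<'adults') i.~ words       NB. find the boxed word, then fetch its row
```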
> -Original Message-
> From: Programming [mailto:[email protected]]
> On Behalf Of Skip Cave
> Sent: woensdag 21 februari 2018 17:09
> To: [email protected]
> Subject: Re: [Jprogra
Why not submit it to arXiv.org?
R.E. Boss
> -Original Message-
> From: Programming [mailto:[email protected]]
> On Behalf Of 'Bo Jacoby' via Programming
> Sent: woensdag 21 februari 2018 09:07
> To: [email protected]
> Subject: Re: [Jprogramming] Vector Si
For what it's worth, "obverse" essentially means "best approximation to
inverse".
I want to know if two points are the same, meaning one lies within a
sufficiently small sphere about the other. The correct way for my
application would be to test if the Euclidean distance between them is
suff
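The Euclidean test described above could be sketched as follows. The names dist, tol and same, and the tolerance value, are assumptions for illustration and do not come from the original message.

```j
dist=: +/&.:*:@:-      NB. square root of the sum of squared differences
tol=: 1e_9             NB. hypothetical radius of the small sphere
same=: tol >: dist     NB. 1 if the two points lie within tol of each other
```

For example, 0 0 dist 3 4 is 5, so 0 0 same 3 4 is 0.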
Yes, you need to box the name, when comparing it to the value of
'words' because the names in 'words' are all boxed.
I'd do it like this:
get=:3 :'vec{~words i. wrote:
> Thanks to Raul and Mike for the suggestions.
>
> I read in the data:
>
>
> nb =: <'C:\numberbatch-en.txt'
>
> nbs =. fread n
Defining a verb get to retrieve the index of the desired word as tacit does
make get pretty much unreadable; however, there is a possible performance
gain as the hash table for i. gets built only once when get is defined. If
you will be running get many times this could result in a significant
perf
That's an interesting point.
That said, if you give get a large list of words to look up, it's the
sort of issue which might be buried in everything else that's going on
(the cost per word gets divided by the number of words being looked up
at once).
Thanks,
--
Raul
On Wed, Feb 21, 2018 at 12
How about defining get as
get=:13 : '(,words)i.boxopen y'
Then it can take a single word unboxed or a boxed list.
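A quick check of that claim on a toy vocabulary (the contents of words are made up): boxopen boxes an unboxed argument and leaves a boxed one alone, so both call styles reach i. in the same form.

```j
words=: 'bell';'belt';'adults'       NB. hypothetical vocabulary
get=: 13 : '(,words)i.boxopen y'
one=: get 'belt'                     NB. single unboxed word
many=: get 'adults';'bell'           NB. boxed list of words
```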
On Wed, Feb 21, 2018 at 10:18 AM, Raul Miller wrote:
> That's an interesting point.
>
> That said, if you give get a large list of words to look up, it's the
> sort of issue whic
Sure, that could work, but this definition only returns the word
index, not the row(s) from vec. So it should get a different name.
That said, you could also do something like (what I had been
originally thinking of, though not the implementation I had posted):
get=: 13 :'vec{~words i. ;:^:(0=
I don't think this prescription is accurate. When m&i. is executed to
create a fast search verb, the value of m is put into the new verb. If
m is a name, the value of the name is NOT copied, but instead referred
to. If the name m is subsequently reassigned, the old value is
retained, referre
So words should be a list instead of a one column table. So we would have
words&i.
instead of
(,words)&i.
Correct? Doesn't the raveling prevent sharing of the contents of words in
the new verb?
And perhaps get should be
get=:13 : 'words&i.boxopen y'
instead of
get=:13 : 'words i.boxopen y'
I suppose the shape of the variable 'words' depends on how you built
the value for 'words'?
If you used
words=: <@({.~ i.&' ');._2 text
the noun will be rank 1, and there's no need to ravel it.
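This can be checked directly on a small sample; the text below is a made-up two-line stand-in for the file contents.

```j
text=: 'bell 0.0264 -0.2927',LF,'belt 0.1332 0.0142',LF   NB. hypothetical sample
words=: <@({.~ i.&' ');._2 text   NB. one boxed word per line
rank=: # $ words                  NB. number of axes of the result
```

Since the result already has a single axis, raveling it changes nothing.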
If you built it at rank 2, rather than rank 1... well... I'm not sure
why you would want to do that
Wow! Thanks so much for all the help on cleaning up and parsing the numberbatch
text file, as well as the various methods for extracting words and their associated
vectors from the data. It will take me a bit to digest all this, as well as some time to
test the various suggestions, to see which sch
This is a good teaching example.
Consider
words&i. string
there are TWO executions. First, (words&i.) is executed to produce an
anonymous verb. This execution of & does far more than just save the
value of words. It builds a hash table, saves the value of words, and
does some other cla
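The two executions can be separated by naming the intermediate verb, as in this sketch (the vocabulary and the names find, hit and hits are hypothetical):

```j
words=: 'bell';'bell_tower';'belt'   NB. hypothetical vocabulary
find=: words&i.                      NB. first execution: & creates the search verb once
hit=: find <'belt'                   NB. second execution: apply it to an argument
hits=: find 'belt';'missing'         NB. later lookups reuse the same verb
```

A word that is not present yields the length of words, here 3.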
I didn’t do a full check on my offering. I wonder, without being able to
check easily right now, whether your “nbs” is a rectangular char array rather
than a boxed array, like my example, “txt”. I vaguely recall that freadb
returns lines as boxes, but I may be wrong.
In any case, I did mean t
The first 807 beta is available. To install, browse to
code.jsoftware.com/wiki/System/Installation/Beta.
This includes:
* an updated regex
* better support for the Qt WebEngine which allows GUI applications to be a
mixture of the usual Qt desktop controls plus web browser controls
* support for