Re: [DISCUSSION] second release schedule and scope

DO YUNG YOON Sat, 08 Jul 2017 18:26:33 -0700

Thanks for your feedback. Here are my questions.

1. Release schedule:
- Do you think we should wait until all issues with TinkerPop support are
resolved before releasing?

What do others think about the release schedule?

Should we wait until all TinkerPop-related issues are resolved?
Could you list the "must resolve" issues for our second release?
The reason I mentioned the index is that I think it is the only blocking
issue on the list for the next release.

2. Full-text search:
- There would be two types of index, one of which has two variations
(composite/mixed):
  - Graph index: S2Graph does not have this type of index yet.
    - Composite index
    - Mixed index
  - Vertex-centric index: S2Graph already has this type of index.

Since they are two different types of index, it is inevitable that we
provide them as separate options.

I doubt there will be much confusion between the graph index and the
vertex-centric index, and we can always clarify the difference in the
documentation.

If we agree that a graph index layer is necessary, then let's develop the
feature first, see whether it causes confusion, and then decide how to
clarify it. I think you agree that the graph index is a necessary addition
to the project (tell me if you don't).

Continuing with more details on the index topic.

The following is what Titan provides. I think it would be nice if we could
provide the same in S2Graph, so let me explain it briefly. (I suggest
reading through http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
if you are not familiar with the notation.)

1. Composite index

Composite indexes retrieve vertices or edges by one key or by a fixed
composition of multiple keys.

This example shows how a user can create a composite index in Titan:

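```
// open the management system and look up the existing property keys
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
// one composite index over both keys; the definition takes effect on commit
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()
```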

Then the following traversal can benefit from the `byNameAndAgeComposite` index:

```
g.V().has('age', 30).has('name', 'hercules')
```

We could use HBase to store this index by building the row key as
("age", 30, "name", "hercules").

```
g.V().has('name', 'hercules').has('age', 30)
```

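To answer the traversal above with the same index, it seems we need to sort
the property keys and values in the composite index, so that both orders of
the `has()` steps map to the same row key.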
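The sorting idea can be sketched roughly as follows. `CompositeIndexKey` is a hypothetical helper, not S2Graph code, and the key is rendered as a string only for readability (a real HBase row key would be an encoded byte array):

```scala
// Hypothetical helper, not S2Graph code: builds a canonical composite index key
// by sorting the equality predicates by property key name, so the order of the
// has() steps in a traversal does not change which row key is looked up.
object CompositeIndexKey {
  // One equality predicate, e.g. has('age', 30) becomes ("age", 30).
  type Predicate = (String, Any)

  // Sort by property key name; each value stays attached to its key.
  def canonicalize(preds: Seq[Predicate]): Seq[Predicate] =
    preds.sortBy(_._1)

  // Quote strings, leave other values (numbers) as they are.
  private def render(v: Any): String = v match {
    case s: String => "\"" + s + "\""
    case other     => other.toString
  }

  // Render as a flat tuple, e.g. ("age", 30, "name", "hercules"), prefixed by
  // the index name. A real HBase row key would be an encoded byte array.
  def rowKey(indexName: String, preds: Seq[Predicate]): String = {
    val parts = canonicalize(preds).flatMap { case (k, v) => Seq(render(k), render(v)) }
    indexName + ":(" + parts.mkString(", ") + ")"
  }
}
```

With this, `has('age', 30).has('name', 'hercules')` and `has('name', 'hercules').has('age', 30)` both map to `byNameAndAgeComposite:("age", 30, "name", "hercules")`.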

We could also build partial composite indexes such as the following:

```
("age", 30)
("name", "hercules")
```

I am not sure this is necessary, since a user can explicitly create these
as separate indexes, such as 'byName' and 'byAge'.

One more suggestion is to provide an option to partition an index, since
there could be lots of vertices/edges that share a specific value. For
example, a 'byCountryGender' index can contain lots of vertices/edges, and
storing them all in the same HBase region is problematic. We need to
auto-partition them into a user-specified number of partitions by a prefix
salt. This is an optimization step, so it can be revisited once we have the
functionality working.
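A minimal sketch of the prefix salt, assuming the salt is derived from the element id and a three-digit prefix (so fewer than 1000 partitions); `SaltedIndexKey` and its methods are hypothetical names:

```scala
// Hypothetical sketch, not S2Graph code: prefix salting for a composite index
// whose values are shared by many elements (e.g. 'byCountryGender').
// numPartitions stands for the user-specified number of partitions; the
// three-digit prefix assumes fewer than 1000 partitions.
object SaltedIndexKey {
  // Derive the salt from the element id, so elements that share the same
  // indexed value are spread over numPartitions different key ranges.
  def salt(elementId: String, numPartitions: Int): Int =
    Math.floorMod(elementId.hashCode, numPartitions)

  // Write path: salt prefix, then the (unsalted) index key, then the element id.
  def writeKey(indexKey: String, elementId: String, numPartitions: Int): String =
    f"${salt(elementId, numPartitions)}%03d|$indexKey|$elementId"

  // Read path: an equality lookup no longer maps to a single prefix, so it has
  // to scan every salted prefix and merge the results.
  def scanPrefixes(indexKey: String, numPartitions: Int): Seq[String] =
    (0 until numPartitions).map(p => f"$p%03d|$indexKey|")
}
```

The trade-off is that writes for a hot value spread across regions, while every read fans out to all `numPartitions` prefixes.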

Note that a composite index only supports equality comparisons, so the
following traversal can't take advantage of the `byNameAndAgeComposite` index:

```
g.V().has('name', 'hercules').has('age', inside(20, 50))
```
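The equality-only rule can be sketched as an index-selection check. This is a hypothetical model (`CompositeIndexSelector`, `Eq`, `Inside` are illustrative names, not Titan or S2Graph code), and it also shows why a partial 'byName' index would still help the `inside()` traversal above:

```scala
// Hypothetical sketch, not Titan or S2Graph code: can a composite index answer
// a traversal? Every key of the index must be bound by an equality predicate;
// range predicates such as inside() do not count.
object CompositeIndexSelector {
  sealed trait Pred { def key: String }
  final case class Eq(key: String, value: Any) extends Pred
  final case class Inside(key: String, low: Int, high: Int) extends Pred

  final case class CompositeIndex(name: String, keys: Set[String])

  // Keys constrained by equality in the traversal's has() steps.
  private def equalityKeys(preds: Seq[Pred]): Set[String] =
    preds.collect { case Eq(k, _) => k }.toSet

  def canUse(index: CompositeIndex, preds: Seq[Pred]): Boolean =
    index.keys.subsetOf(equalityKeys(preds))

  // Usable indexes for a traversal; remaining predicates would be filtered afterwards.
  def candidates(indexes: Seq[CompositeIndex], preds: Seq[Pred]): Seq[CompositeIndex] =
    indexes.filter(canUse(_, preds))
}
```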

2. Mixed index

Mixed indexes retrieve vertices or edges by any combination of previously
added property keys. Full-text search can be powered by a mixed index, but
it may be slower than a composite index since it involves a search on an
external index backend (Lucene, Solr, Elasticsearch, ...).

This example shows how a user can create a mixed index in Titan:

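```
// name is a string key mapped as TEXT (tokenized for full-text search);
// age is numeric, so it is indexed without a string mapping
mgmt.buildIndex('nameAndAge', Vertex.class).addKey(name, Mapping.TEXT.getParameter()).addKey(age).buildMixedIndex("search")
mgmt.commit()
```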

A user can decide whether the search engine index (named "search") uses a
tokenizer by specifying the Mapping (STRING or TEXT; the default, TEXT,
provides full-text search).
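To make the TEXT/STRING difference concrete, here is a rough Scala model. It is not Titan code, and the tokenizer (lower-case, split on non-word characters) is my own assumption; real backends such as Elasticsearch use their own analyzers:

```scala
// Hypothetical model of the two string mappings (not Titan code): TEXT splits
// the value into lower-cased word tokens, so a single word can match; STRING
// indexes the whole value as one term, so only the exact value matches.
object StringMappingSketch {
  sealed trait Mapping
  case object Text extends Mapping
  case object StringMapping extends Mapping // named to avoid shadowing scala.Predef.String

  // Terms the search backend would store for one property value.
  def indexTerms(value: String, mapping: Mapping): Set[String] = mapping match {
    case Text          => value.toLowerCase.split("\\W+").filter(_.nonEmpty).toSet
    case StringMapping => Set(value)
  }

  // Normalize the query term the same way the value was indexed.
  private def queryTerm(term: String, mapping: Mapping): String = mapping match {
    case Text          => term.toLowerCase
    case StringMapping => term
  }

  def matches(value: String, mapping: Mapping, term: String): Boolean =
    indexTerms(value, mapping).contains(queryTerm(term, mapping))
}
```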

Then the following traversals can benefit from the `nameAndAge` index:

```
g.V().has('name', textContains('hercules')).has('age', inside(20, 50))
g.V().has('name', textContains('hercules'))
g.V().has('age', lt(50))
```
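Here is a toy in-memory stand-in for a mixed index, only to illustrate "any combination of keys": each key is indexed on its own (an inverted index for text, a sorted map for numbers) and a multi-predicate query intersects the per-key results. All names are hypothetical, and this is not how Elasticsearch/Lucene/Solr work internally:

```scala
import scala.collection.immutable.TreeMap

// Hypothetical in-memory stand-in for a mixed index backend (not Elasticsearch,
// Lucene or Solr code). Each key is indexed on its own, so any combination of
// indexed keys can be queried; a multi-predicate query intersects the results.
object MixedIndexSketch {
  sealed trait Pred
  final case class TextContains(key: String, word: String) extends Pred
  final case class Inside(key: String, low: Int, high: Int) extends Pred // exclusive bounds, like TinkerPop's inside()
  final case class Lt(key: String, bound: Int) extends Pred

  final case class Doc(id: Long, text: Map[String, String], ints: Map[String, Int])

  private def tokens(s: String): Set[String] =
    s.toLowerCase.split("\\W+").filter(_.nonEmpty).toSet

  final class MixedIndex(
      text: Map[(String, String), Set[Long]],       // (key, token) -> element ids
      nums: Map[String, TreeMap[Int, Set[Long]]]) { // key -> sorted value -> element ids

    // Ids whose value for `key` lies in [from, until).
    private def range(key: String, from: Int, until: Int): Set[Long] =
      nums.get(key).map(_.range(from, until).values.flatten.toSet).getOrElse(Set.empty)

    private def ids(p: Pred): Set[Long] = p match {
      case TextContains(k, w) => text.getOrElse((k, w.toLowerCase), Set.empty)
      case Inside(k, lo, hi)  => range(k, lo + 1, hi)
      case Lt(k, b)           => range(k, Int.MinValue, b)
    }

    def query(preds: Seq[Pred]): Set[Long] =
      if (preds.isEmpty) Set.empty else preds.map(ids).reduce(_ intersect _)
  }

  def build(docs: Seq[Doc]): MixedIndex = {
    val text = docs
      .flatMap(d => d.text.toSeq.flatMap { case (k, v) => tokens(v).map(t => ((k, t), d.id)) })
      .groupBy(_._1)
      .map { case (kt, pairs) => kt -> pairs.map(_._2).toSet }
    val nums = docs
      .flatMap(d => d.ints.toSeq.map { case (k, v) => (k, v, d.id) })
      .groupBy(_._1)
      .map { case (k, triples) =>
        k -> (TreeMap.empty[Int, Set[Long]] ++ triples.groupBy(_._2).map { case (v, ts) => v -> ts.map(_._3).toSet })
      }
    new MixedIndex(text, nums)
  }
}
```

In this toy model one `nameAndAge` index answers all three traversals above; how S2Graph would actually delegate these predicates to an external backend is what still needs to be designed.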

We can use Elasticsearch, Lucene, or Solr as the index backend for this
type of index.

If there is no objection, I will create an index task on JIRA and list the
subtasks below under it.

One possible task list is as follows:

1. Management client:
- add an option to specify the index type when creating a ServiceColumn/Label.
2. Storage:
- add a method to build mutations for the storage backend when a set of
vertices/edges is given.
- add a method to call the index backend with the built mutations.
3. Serializer/Deserializer:
- serializer: when an edge/vertex is given, build SKeyValues that can be
used by the storage methods.
- deserializer: when a byte array is given, build a Vertex/Edge that can be
used by the storage methods.
4. Provider optimization:
- TinkerPop asks the provider to translate a given traversal into
implementation-specific functions.
- With my limited knowledge so far I am not sure this is necessary, but we
need to check once S2Graph internally provides composite/mixed indexes.
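To make items 1 to 3 a bit more concrete, here is a rough Scala sketch of how the pieces could fit together. Every name in it (`GraphIndexDef`, `IndexSerDe`, `IndexStorage`, `IndexKeyValue`, ...) is hypothetical, `IndexKeyValue` is only a simplified stand-in for `SKeyValue`, and provider optimization (item 4) is left out:

```scala
import scala.collection.mutable

// Hypothetical sketch of the pieces in the task list; none of these names
// exist in S2Graph. IndexKeyValue is a simplified stand-in for SKeyValue.
object IndexTaskSketch {
  // 1. Management client: index type chosen when creating a ServiceColumn/Label.
  sealed trait GraphIndexType
  case object Composite extends GraphIndexType
  case object Mixed extends GraphIndexType
  final case class GraphIndexDef(name: String, indexType: GraphIndexType, keys: Seq[String])

  // Simplified vertex/edge: an id plus string properties.
  final case class Element(id: String, props: Map[String, String])
  final case class IndexKeyValue(row: String, qualifier: String, value: String)

  // 3. Serializer/deserializer between elements and key-values.
  trait IndexSerDe {
    def serialize(index: GraphIndexDef, e: Element): Seq[IndexKeyValue]
    def deserialize(kv: IndexKeyValue): Element
  }

  // Row = index name + sorted "key=value" pairs, qualifier = element id.
  // Assumes names and values contain no ':', ',' or '='; a real encoding
  // would use length-prefixed bytes.
  object SimpleSerDe extends IndexSerDe {
    def serialize(index: GraphIndexDef, e: Element): Seq[IndexKeyValue] =
      if (!index.keys.forall(e.props.contains)) Seq.empty // element not covered by this index
      else {
        val pairs = index.keys.sorted.map(k => s"$k=${e.props(k)}")
        Seq(IndexKeyValue(index.name + ":" + pairs.mkString(","), e.id, ""))
      }

    def deserialize(kv: IndexKeyValue): Element = {
      val encoded = kv.row.substring(kv.row.indexOf(':') + 1)
      val props = encoded.split(",").filter(_.nonEmpty).map { p =>
        val i = p.indexOf('=')
        p.substring(0, i) -> p.substring(i + 1)
      }.toMap
      Element(kv.qualifier, props)
    }
  }

  // 2. Storage: build mutations for a batch of elements, then hand them to the backend.
  trait IndexStorage {
    def buildMutations(index: GraphIndexDef, elements: Seq[Element]): Seq[IndexKeyValue]
    def write(mutations: Seq[IndexKeyValue]): Unit
  }

  // In-memory backend standing in for HBase or a search engine.
  final class InMemoryIndexStorage(serDe: IndexSerDe) extends IndexStorage {
    private val rows = mutable.Map.empty[String, Set[String]]
    def buildMutations(index: GraphIndexDef, elements: Seq[Element]): Seq[IndexKeyValue] =
      elements.flatMap(serDe.serialize(index, _))
    def write(mutations: Seq[IndexKeyValue]): Unit =
      mutations.foreach(kv => rows(kv.row) = rows.getOrElse(kv.row, Set.empty[String]) + kv.qualifier)
    def lookup(row: String): Set[String] = rows.getOrElse(row, Set.empty[String])
  }
}
```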

Any feedback would be appreciated.

On Sat, Jul 8, 2017 at 11:48 AM Hwansung Yu <deejayf...@gmail.com> wrote:

> Sorry for the late reply.
>
> I think implementing TinkerPop is important both for the functionality of
> S2Graph and for activating the community.
> I agree with your suggestion to concentrate on TinkerPop implementation
> issues in the second release.
> In my opinion, we should release when the TinkerPop implementation
> issues are cleaned up.
>
> And with regard to full-text search...
> If full-text search is supported, we expect the current constraint, that a
> traversal is only possible when the starting vertex is known, to disappear.
> If it is supported, it would be better to keep it as a separate option to
> avoid confusion with the existing indexes.
>
> On Sat, Jul 8, 2017 at 9:10 AM, DO YUNG YOON <sho...@gmail.com> wrote:
>
> > I guess there is no objection to my suggestion, so I am going to try to
> > list the issues in more detail while preparing the 0.2.0 release late
> > this month.
> >
> > Before listing the above issues as tasks on JIRA, I want to discuss the
> > index in more detail.
> >
> > The following is my understanding of the indexes needed to support
> > TinkerPop fully and efficiently:
> > - reference: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
> >
> > 1. graph index: traversal from a list of vertices or edges that are
> > identified by their properties
> >
> > 2. vertex-centric index: traversal through vertices with many incident
> > edges.
> >
> > I believe S2Graph already has a vertex-centric index, but it does not
> > have a graph index layer, so full-text predicates and range search in
> > TinkerPop run very inefficiently.
> >
> > For example, the following traversals run a full scan:
> >
> > - g.V().has('name', 'hercules')
> > - g.E().has('reason', textContains('loves'))
> >
> > To support all TinkerPop features efficiently, we need to add a graph
> > index layer, and I want to discuss how we are going to achieve this.
> > Using an external search engine, as suggested here
> > (http://markmail.org/message/2vn2bwrwh5zbeie4), totally makes sense to me.
> >
> > I suggest designing the index management interface first, since a graph
> > index has never existed in S2Graph before. Then decisions about the index
> > storage backend and the implementation can be discussed in more detail
> > (the other way around is also possible).
> >
> > The following is how a user currently creates an index in S2Graph:
> >
> > Management.createServiceColumn(
> >     serviceName = serviceName, columnName = "person", columnType = "integer",
> >     props = Seq(
> >         Prop("name", "-", "string"),
> >         Prop("age", "0", "integer"),
> >         Prop("location", "-", "string")
> >     )
> > )
> >
> > management.createLabel(
> >     label = "bought",
> >     srcServiceName = serviceName, srcColumnName = "person", srcColumnType = "integer",
> >     tgtServiceName = serviceName, tgtColumnName = "product", tgtColumnType = "integer",
> >     isDirected = true,
> >     serviceName = serviceName,
> >     indices = Seq(
> >         Index("PK", Seq("amount", "created_at"))
> >     ),
> >     props = Seq(
> >         Prop("amount", "0.0", "double"),
> >         Prop("created_at", "2000-01-01", "string")
> >     ),
> >     consistencyLevel = "strong"
> > )
> >
> > How are we going to let users create a graph index? Should we add extra
> > parameters to the existing methods, or provide separate methods?
> >
> >
> > On Mon, Jul 3, 2017 at 10:11 PM DO YUNG YOON <sho...@gmail.com> wrote:
> >
> > > Hi folks.
> > >
> > > It's been a while since our first release.
> > > It seems that the demand for implementing the TinkerPop interface has
> > > been high, but we have not finished it. I have been working on
> > > https://issues.apache.org/jira/browse/S2GRAPH-136 since April and
> > > recently merged it into master.
> > >
> > > I think gremlin-core is tested, but the following is what I think we
> > > have to improve so that TinkerPop users can try out S2Graph easily:
> > >
> > > 1. Provide provider optimization; we have none currently.
> > > 2. The full-text search predicate is not currently supported (as
> > > @echarles pointed out).
> > > 3. Provide a Gremlin plugin.
> > > 4. Make sure the TinkerPop stack works correctly.
> > >
> > > Any help with the above issues would be highly appreciated (help with
> > > any other issue would also be highly appreciated).
> > >
> > > By the way, what I want to discuss is the schedule and what will be
> > > included in our second release.
> > >
> > > I suggest focusing on TinkerPop integration in our second release. It
> > > would be best if we could address the above issues within this month,
> > > but I doubt that is possible.
> > >
> > > I suggest fixing our release date for late this month, then focusing on
> > > the above issues with high priority. If we can address them all, great;
> > > if we can't, we release with as much as we can deliver in time and move
> > > the rest to the next release, and so on.
> > >
> > > I want to hear what other folks think about the focus and schedule of
> > > our second release, and I am happy to volunteer as release manager this
> > > time if there are no other volunteers.
> > >
> > > If there are other issues that anyone thinks should be included in the
> > > next release, please list them in this thread.
> > >
> > > Thanks
> > >
> > > DO YUNG YOON
> > >
> > >
> >
>
