Sorry, but I did not understand. From what I can see, case classes are a Scala feature, and I'm using Java (I could consider learning Scala and switching to it, because I have not started yet and this is for learning purposes only).

What do you mean by "known formats"? When the user creates a channel, he only has some basic types (String, Long, Timestamp, etc.) and the channels he has previously created to choose from.

Example: the user first creates 2 simple channels (Coordinate and Temperature):

Coordinate = {
  "X" : "Float",
  "Y" : "Float",
  "instant" : "Timestamp"
}

Temperature = {
  "value" : "Float",
  "measurement_unit" : "String",
  "instant" : "Timestamp"
}

Then the user creates a new channel using the 2 previously created ones:

Measurement = {
  "coord" : "Coordinate",
  "temp" : "Temperature",
  "instant" : "Timestamp"
}

Now, when data comes in, I validate its format against the defined channel's format; if it doesn't match, I throw an error. Example:

{
  "coord" : {
    "X" : 31.75,
    "Y" : "32.75",
    "instant" : "2016-06-20T13:28:06.419Z"
  },
  "temp" : {
    "value" : 25.6,
    "measurement_unit" : "Celsius",
    "instant" : "2016-06-20T13:28:06.419Z"
  },
  "instant" : "2016-06-20T13:28:06.419Z"
}

That piece of data will fail validation because the "Y" value is a String, not a Float (as defined in the Coordinate channel).

Is there a chance you could explain a little more what you said previously? It would really help me.

Thank you

2016-07-07 20:54 GMT-03:00 Ted Yu <yuzhih...@gmail.com>:

> For 1) you don't have to introduce external storage.
>
> You can define case classes for the known formats.
>
> FYI
>
> On Thu, Jul 7, 2016 at 4:40 PM, venito camelas <robotirlan...@gmail.com>
> wrote:
>
>> I'm pretty new to this and I have a use case I'm not sure how to
>> implement. I'll try to explain it, and I'd appreciate it if anyone could
>> point me in the right direction.
>>
>> The case has these requirements:
>> 1 - Any user should be able to define the format of the information they
>> want to store (channel).
>> For example, user X defines a channel named "coordinate":
>> coordinate = {
>>   "X" : "Float",
>>   "Y" : "Float",
>>   "instant" : "Timestamp"
>> }
>> Every channel has some time value; it can be an instant (like above) or
>> a period of time ("start" : "Timestamp", "end" : "Timestamp").
>>
>> 2 - Given the previous example, the user should be able to ask the
>> following questions:
>> 2.1 When was the last time I went near {X : x, Y : y}? --> Process the
>> information in order to get the "near" places and return the newest one.
>> 2.2 Where was I on March 6th between 1pm and 2pm? --> Query by time
>>
>> For 1) I was thinking of using some document-oriented storage because of
>> the channels' lack of structure; not sure that's the only thing to
>> consider, though.
>>
>> For 2.1) I'd use some MR job.
>>
>> For 2.2) I think it would be better to have the information in the
>> document storage and make the queries there.
>>
>> Is it a good approach to have the information stored both in HDFS and
>> in the document-oriented storage (for processing and querying,
>> respectively)?
>>
>> As I mentioned in the beginning, I'm really new to this and I'm just
>> trying to learn, so sorry if my doubts are silly.
>>
>> Any suggestion or any good reference related to this will be much
>> appreciated.
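
To make the validation rule from the Measurement example concrete, here is a rough Java sketch. It assumes the incoming JSON has already been parsed into nested java.util.Map objects (the parsing library is left out), that "Float" accepts any JSON number, and that "Timestamp" is an ISO-8601 string. ChannelValidator and all of its method names are made up for illustration; this is not an existing API.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class ChannelValidator {
    // channel name -> (field name -> type name). A type name is either a basic
    // type or the name of a previously defined channel.
    private final Map<String, Map<String, String>> channels = new HashMap<>();

    public void define(String name, Map<String, String> fields) {
        channels.put(name, fields);
    }

    /** Throws IllegalArgumentException if data does not match the channel. */
    public void validate(String channel, Map<String, Object> data) {
        Map<String, String> fields = channels.get(channel);
        if (fields == null) {
            throw new IllegalArgumentException("Unknown channel: " + channel);
        }
        if (!data.keySet().equals(fields.keySet())) {
            throw new IllegalArgumentException(channel + ": expected fields "
                    + fields.keySet() + " but got " + data.keySet());
        }
        for (Map.Entry<String, String> f : fields.entrySet()) {
            checkValue(channel + "." + f.getKey(), f.getValue(), data.get(f.getKey()));
        }
    }

    @SuppressWarnings("unchecked")
    private void checkValue(String path, String type, Object value) {
        boolean ok;
        switch (type) {
            case "Float":     ok = value instanceof Number; break;
            case "Long":      ok = value instanceof Long || value instanceof Integer; break;
            case "String":    ok = value instanceof String; break;
            case "Timestamp": ok = value instanceof String && isInstant((String) value); break;
            default:
                // Not a basic type, so treat it as a reference to another channel.
                if (value instanceof Map) {
                    validate(type, (Map<String, Object>) value);
                    return;
                }
                ok = false;
        }
        if (!ok) {
            throw new IllegalArgumentException(path + ": expected " + type + " but got " + value);
        }
    }

    private static boolean isInstant(String s) {
        try {
            Instant.parse(s);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Small helper to build maps from key/value pairs.
    @SuppressWarnings("unchecked")
    static <V> Map<String, V> map(Object... kv) {
        Map<String, V> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], (V) kv[i + 1]);
        }
        return m;
    }

    /** Validator holding the three channels from the example. */
    static ChannelValidator sample() {
        ChannelValidator v = new ChannelValidator();
        v.define("Coordinate", map("X", "Float", "Y", "Float", "instant", "Timestamp"));
        v.define("Temperature",
                map("value", "Float", "measurement_unit", "String", "instant", "Timestamp"));
        v.define("Measurement",
                map("coord", "Coordinate", "temp", "Temperature", "instant", "Timestamp"));
        return v;
    }

    /** The example record, with the "Y" value supplied by the caller. */
    static Map<String, Object> sampleRecord(Object y) {
        String ts = "2016-06-20T13:28:06.419Z";
        return map(
                "coord", map("X", 31.75, "Y", y, "instant", ts),
                "temp", map("value", 25.6, "measurement_unit", "Celsius", "instant", ts),
                "instant", ts);
    }

    public static void main(String[] args) {
        ChannelValidator v = sample();
        v.validate("Measurement", sampleRecord(32.75)); // numeric Y: passes
        try {
            v.validate("Measurement", sampleRecord("32.75")); // string Y: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Because a field's type can name another channel, the check recurses into nested objects, so the bad value is reported against the Coordinate channel's "Y" field rather than against Measurement as a whole.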