Hi, I would like to develop a simulator (or log data generator) using Hadoop modules.
The simulator consists of a large number of parallel, time-synchronized state machines (possibly millions), each of which writes its own log file with timestamped rows. The state machines also use common data structures (such as a database of key-value pairs). Finally, the events (rows) of all the log files should be merged in time order into one very large log file; in practice, the combined file could also be split into smaller ones.

Is this a good or a bad idea? If it is feasible, how can it be done with Hadoop modules (or other modules/libraries/tools)? I would also be interested in how the simulator could produce a real-time event stream instead of log files.
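To make the first part concrete, here is a minimal plain-Java sketch of what I mean by time-synchronized state machines: all machines advance on the same simulated clock ticks and share one key-value store. A `ConcurrentHashMap` stands in for the real database, and the class and field names are just illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

// Minimal stand-in for the time-synchronized state machines.
// All machines advance through the same simulated clock ticks and
// share one key-value store (a ConcurrentHashMap instead of a real
// database), so the timestamps in all logs are comparable.
public class StateMachineSim {
    enum State { IDLE, ACTIVE, ERROR }

    static final Map<String, String> sharedStore = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        int machines = 3;   // in the real simulator: up to millions
        long ticks = 5;     // simulated, synchronized time steps
        Random rnd = new Random(42);

        // One log buffer per state machine, standing in for its log file.
        List<StringBuilder> logs = new ArrayList<>();
        for (int m = 0; m < machines; m++) logs.add(new StringBuilder());

        State[] state = new State[machines];
        java.util.Arrays.fill(state, State.IDLE);

        // One outer iteration = one synchronized tick for every machine.
        for (long t = 0; t < ticks; t++) {
            for (int m = 0; m < machines; m++) {
                state[m] = State.values()[rnd.nextInt(State.values().length)];
                sharedStore.put("machine-" + m, state[m].name());
                logs.get(m).append(String.format("%d\tmachine-%d\t%s%n",
                        t, m, state[m]));
            }
        }
        for (int m = 0; m < machines; m++) {
            System.out.println("--- log of machine " + m + " ---");
            System.out.print(logs.get(m));
        }
    }
}
```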
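For the merge step, my current idea is a MapReduce job that keys every row by its timestamp and lets the shuffle phase do the sorting. The sketch below assumes each row starts with a tab-separated numeric timestamp (as in the sketch above); with a single reducer the output is one globally sorted file, and I suppose several reducers plus a `TotalOrderPartitioner` would instead produce sorted splits that concatenate in order:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Merges many timestamped log files into one time-ordered file by
// keying each row on its timestamp and letting the shuffle sort them.
public class TimeOrderedMerge {

    public static class TsMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Assumes each row starts with a tab-separated numeric timestamp.
            String[] parts = line.toString().split("\t", 2);
            ctx.write(new LongWritable(Long.parseLong(parts[0])), line);
        }
    }

    public static class PassThroughReducer
            extends Reducer<LongWritable, Text, Text, Text> {
        @Override
        protected void reduce(LongWritable ts, Iterable<Text> lines, Context ctx)
                throws IOException, InterruptedException {
            for (Text line : lines) {
                ctx.write(line, null);  // rows arrive already sorted by timestamp
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "time-ordered merge");
        job.setJarByClass(TimeOrderedMerge.class);
        job.setMapperClass(TsMapper.class);
        job.setReducerClass(PassThroughReducer.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(1);  // one reducer = one globally sorted file
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

I am aware the single reducer is a bottleneck for a very large merge, which is part of why I am asking whether this whole approach is a good idea.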