Weird, inputs with tabs work for me right out of the box. So either the "\t" is not the cause, or it's some Java-version-specific issue. Try this toy program:

import java.util.regex.Pattern;

public class Test {
  public static void main(String[] args) {
    Pattern SEPARATOR = Pattern.compile("[\t ]");
    String[] tokens = SEPARATOR.split("3 4 5 6 7");
    for (int i = 0; i < tokens.length; i++) {
      System.out.println("--" + tokens[i] + "--");
    }
  }
}

Does it split the tabs properly for your Java?

Young


On Mon, Mar 31, 2014 at 4:19 PM, ghufran malik <ghufran1ma...@gmail.com> wrote:

> Yep you right it is a bug with all the InputFormats I believe, I just
> checked it with the Giraph 1.1.0 jar using the IntIntNullVertexInputFormat
> and the example ConnectedComponents class and it worked like a charm with
> just the normal spacing.
>
>
> On Mon, Mar 31, 2014 at 9:15 PM, Young Han <young....@uwaterloo.ca> wrote:
>
>> Huh, it might be a bug in the code. Could it be that Pattern.compile has
>> to take "[\\t ]" (note the double backslash) to properly match tabs? If so,
>> that bug is in all the input formats...
>>
>> Happy to help :)
>>
>> Young
>>
>>
>> On Mon, Mar 31, 2014 at 4:07 PM, ghufran malik
>> <ghufran1ma...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I removed the spaces and it worked! I don't understand though. I'm sure
>>> the separator pattern means that it splits it by tab spaces?.
>>>
>>> Thanks for all your help though some what relieved now!
>>>
>>> Kind regards,
>>>
>>> Ghufran
>>>
>>>
>>> On Mon, Mar 31, 2014 at 8:15 PM, Young Han <young....@uwaterloo.ca> wrote:
>>>
>>>> Hi,
>>>>
>>>> That looks like an error with the algorithm... What do the Hadoop
>>>> userlogs say?
>>>>
>>>> And just to rule out weirdness, what happens if you use spaces instead
>>>> of tabs (for your input graph)?
>>>>
>>>> Young
>>>>
>>>>
>>>> On Mon, Mar 31, 2014 at 2:04 PM, ghufran malik <ghufran1ma...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> No even after I added the .txt it gets to map 100% then drops back
>>>>> down to 50 and gives me the error:
>>>>>
>>>>> 14/03/31 18:22:56 INFO utils.ConfigurationUtils: No edge input format
>>>>> specified. Ensure your InputFormat does not require one.
>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format
>>>>> vertex index type is not known
>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format
>>>>> vertex value type is not known
>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format
>>>>> edge value type is not known
>>>>> 14/03/31 18:22:56 INFO job.GiraphJob: run: Since checkpointing is
>>>>> disabled (default), do not allow any task retries (setting
>>>>> mapred.map.max.attempts = 0, old value = 4)
>>>>> 14/03/31 18:22:57 INFO mapred.JobClient: Running job:
>>>>> job_201403311622_0004
>>>>> 14/03/31 18:22:58 INFO mapred.JobClient: map 0% reduce 0%
>>>>> 14/03/31 18:23:16 INFO mapred.JobClient: map 50% reduce 0%
>>>>> 14/03/31 18:23:19 INFO mapred.JobClient: map 100% reduce 0%
>>>>> 14/03/31 18:33:25 INFO mapred.JobClient: map 50% reduce 0%
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job complete:
>>>>> job_201403311622_0004
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Counters: 6
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job Counters
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=1238858
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Total time spent by all
>>>>> reduces waiting after reserving slots (ms)=0
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Total time spent by all
>>>>> maps waiting after reserving slots (ms)=0
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Launched map tasks=2
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Failed map tasks=1
>>>>>
>>>>>
>>>>> I did a check to make sure the graph was being stored correctly by
>>>>> doing:
>>>>>
>>>>> ghufran@ghufran:~/Downloads/hadoop-0.20.203.0/bin$ hadoop dfs -cat
>>>>> input/*
>>>>> 1 2
>>>>> 2 1 3 4
>>>>> 3 2
>>>>> 4 2
>>>>>
>>>>
>>>>
>>>
>>
>
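
P.S. On the "[\t ]" versus "[\\t ]" question from earlier in the thread: as far as I know both forms should match a tab, since the first puts a literal tab character into the character class and the second hands the \t escape to the regex engine. Here is a rough sketch that checks this with explicit "\t" escapes in the test string, so nothing depends on how the mail client treats whitespace. The class name TabSplitCheck and the sample lines are made up for illustration and are not from Giraph. The last call is only a guess at what might have gone wrong with the original input: a single-character separator class produces an empty token between two consecutive separators, and an input format that parses every token as a number would likely choke on that.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class TabSplitCheck {
    // Java string escape: the compiled regex contains a literal tab character.
    static final Pattern LITERAL_TAB = Pattern.compile("[\t ]");
    // Regex escape: the regex engine itself interprets \t as a tab.
    static final Pattern ESCAPED_TAB = Pattern.compile("[\\t ]");

    // Hypothetical helper, just to keep the calls below short.
    static String[] split(Pattern separator, String line) {
        return separator.split(line);
    }

    public static void main(String[] args) {
        String mixed = "2\t1 3\t4"; // tabs and single spaces mixed
        System.out.println(Arrays.toString(split(LITERAL_TAB, mixed))); // [2, 1, 3, 4]
        System.out.println(Arrays.toString(split(ESCAPED_TAB, mixed))); // [2, 1, 3, 4]

        // Two consecutive spaces: the pattern matches one separator at a time,
        // so an empty token appears between them.
        String doubled = "1  2";
        System.out.println(Arrays.toString(split(LITERAL_TAB, doubled))); // [1, , 2]
    }
}
```

If the second and third lines of output differ from the comments on your machine, that would point at something Java-version specific. Otherwise stray or doubled whitespace in the input file seems like the more likely suspect.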